author     Costa Tsaousis <costa@netdata.cloud>       2023-01-10 19:59:21 +0200
committer  GitHub <noreply@github.com>                2023-01-10 19:59:21 +0200
commit     368a26cfee6887ca0cb2301d93138f63b75e353a (patch)
tree       b57e39fdb78dc57f7a2c1fcc3d9b6bf3c2a2a113 /libnetdata
parent     b513888be389f92b2323d1bb3fdf55c22d4e4bad (diff)
DBENGINE v2 (#14125)
* count open cache pages referring to datafile
* eliminate wasted flush attempts
* remove eliminated variable
* journal v2 scanning split functions
* avoid locking open cache for a long time while migrating to journal v2
* don't acquire datafile for the loop; disable thread cancelability while a query is running
* work on datafile acquiring
* work on datafile deletion
* work on datafile deletion again
* logs of dbengine should start with DBENGINE
* thread specific key for queries to check if a query finishes without a finalize
* page_uuid is not used anymore
* Cleanup Judy traversal when building the new v2
Remove unneeded calls to the metric registry
* metric is 8 bytes smaller; timestamps are protected with a spinlock; timestamps in metric are now always coherent
* disable checks for invalid time-ranges
* Remove type from page details
* report scanning time
* remove infinite loop from datafile acquire for deletion
* remove infinite loop from datafile acquire for deletion again
* trace query handles
* properly allocate array of dimensions in replication
* metrics cleanup
* metrics registry uses arrayalloc
* arrayalloc free should be protected by lock
* use array alloc in page cache
* journal v2 scanning fix
* datafile reference leak hunting
* do not load metrics of future timestamps
* initialize reasons
* fix datafile reference leak
* do not load pages that are entirely overlapped by others
* expand metric retention atomically
* split replication logic in initialization and execution
* replication prepare ahead queries
* replication prepare ahead queries fixed
* fix replication workers accounting
* add router active queries chart
* restore accounting of pages metadata sources; cleanup replication
* don't count skipped pages as unroutable
* notes on services shutdown
* do not migrate to journal v2 too early, while it has pending dirty pages in the main cache for the specific journal file
* do not add pages we don't need to the pdc
* time in range re-work to provide info about past and future matches
* finer control on the pages selected for processing; accounting of page-related issues
* fix invalid reference to handle->page
* eliminate data collection handle of pg_lookup_next
* accounting for queries with gaps
* query preprocessing the same way the processing is done; cache now supports all operations on Judy
* dynamic libuv workers based on number of processors; minimum libuv workers 8; replication query init ahead uses libuv workers - reserved ones (3)
* get into pdc all matching pages from main cache and open cache; do not do v2 scan if main cache and open cache can satisfy the query
* finer gaps calculation; accounting of overlapping pages in queries
* fix gaps accounting
* move datafile deletion to worker thread
* tune libuv workers and thread stack size
* stop netdata threads gradually
* run indexing together with cache flush/evict
* more work on clean shutdown
* limit the number of pages to evict per run
* do not lock the clean queue for accesses if it is not possible at that time - the page will be moved to the back of the list during eviction
* economies on flags for smaller page footprint; cleanup and renames
* eviction moves referenced pages to the end of the queue
* use murmur hash for indexing partition
* murmur should be static
* use more indexing partitions
* revert number of partitions to number of cpus
* cancel threads first, then stop services
* revert default thread stack size
* don't execute replication requests of disconnected senders
* wait more time for services that are exiting gradually
* fixed last commit
* finer control on page selection algorithm
* default stacksize of 1MB
* fix formatting
* fix worker utilization going crazy when the number is rotating
* avoid buffer full due to replication preprocessing of requests
* support query priorities
* add count of spins in spinlock when compiled with netdata internal checks
* remove prioritization from dbengine queries; cache now uses mutexes for the queues
* hot pages are now in sections judy arrays, like dirty
* align replication queries to optimal page size
* during flushing add to clean and evict in batches
* Revert "during flushing add to clean and evict in batches"
This reverts commit 8fb2b69d068499eacea6de8291c336e5e9f197c7.
* don't lock clean while evicting pages during flushing
* Revert "don't lock clean while evicting pages during flushing"
This reverts commit d6c82b5f40aeba86fc7aead062fab1b819ba58b3.
* Revert "Revert "during flushing add to clean and evict in batches""
This reverts commit ca7a187537fb8f743992700427e13042561211ec.
* don't cross locks during flushing, for the fastest flushes possible
* low-priority queries load pages synchronously
* Revert "low-priority queries load pages synchronously"
This reverts commit 1ef2662ddcd20fe5842b856c716df134c42d1dc7.
* cache uses spinlock again
* during flushing, don't lock the clean queue at all; each item is added atomically
* do smaller eviction runs
* evict one page at a time to minimize lock contention on the clean queue
* fix eviction statistics
* fix last commit
* plain should be main cache
* event loop cleanup; evictions and flushes can now happen concurrently
* run flush and evictions from tier0 only
* remove not needed variables
* flushing open cache is not needed; flushing protection is irrelevant since flushing is global for all tiers; added protection to datafiles so that only one flusher can run per datafile at any given time
* added worker jobs in timer to find the slow part of it
* support fast eviction of pages when all_of_them is set
* revert default thread stack size
* bypass event loop for dispatching read extent commands to workers - send them directly
* Revert "bypass event loop for dispatching read extent commands to workers - send them directly"
This reverts commit 2c08bc5bab12881ae33bc73ce5dea03dfc4e1fce.
* cache work requests
* minimize memory operations during flushing; caching of extent_io_descriptors and page_descriptors
* publish flushed pages to open cache in the thread pool
* prevent eventloop requests from getting stacked in the event loop
* single threaded dbengine controller; support priorities for all queries; major cleanup and restructuring of rrdengine.c
* more rrdengine.c cleanup
* enable db rotation
* do not log when there is a filter
* do not run multiple migration to journal v2
* load all extents async
* fix wrong paste
* report opcodes waiting, work dispatched, work executing
* cleanup event loop memory every 10 minutes
* don't dispatch more work requests than the number of threads available
* use the dispatched counter instead of the executing counter to check if the worker thread pool is full
* remove UV_RUN_NOWAIT
* replication to fill the queues
* caching of extent buffers; code cleanup
* caching of pdc and pd; rework on journal v2 indexing, datafile creation, database rotation
* single transaction wal
* synchronous flushing
* first cancel the threads, then signal them to exit
* caching of rrdeng query handles; added priority to query target; health is now low prio
* add priority to the missing points; do not allow critical priority in queries
* offload query preparation and routing to libuv thread pool
* updated timing charts for the offloaded query preparation
* caching of WALs
* accounting for struct caches (buffers); do not load extents with invalid sizes
* protection against memory ballooning during replication due to the optimal alignment of pages; sender thread buffer is now also reset when the circular buffer is reset
* also check that the expanded `before` is not beyond the chart's last updated time
* also check that the expanded `before` is not after the wall clock time of when the query started
* Remove unused variable
* replication to queue less queries; cleanup of internal fatals
* Mark dimension to be updated async
* caching of extent_page_details_list (epdl) and datafile_extent_offset_list (deol)
* disable pgc stress test, under an ifdef
* disable mrg stress test under an ifdef
* Mark chart and host labels, host info for async check and store in the database
* dictionary items use arrayalloc
* cache section pages structure is allocated with arrayalloc
* Add a function to wake up the aclk query threads and check for exit
Register a function to be called during shutdown, after signaling the service to exit
* parallel preparation of all dimensions of queries
* be more sensitive to enable streaming after replication
* atomically finish chart replication
* fix last commit
* fix last commit again
* fix last commit again again
* fix last commit again again again
* unify the normalization of retention calculation for collected charts; do not enable streaming if more than 60 points are to be transferred; eliminate an allocation during replication
* do not cancel start streaming; use high priority queries when we have locked chart data collection
* prevent starvation on opcodes execution by allowing 2% of the requests to be re-ordered (a sketch of this idea follows the commit message)
* opcodes now use 2 spinlocks: one for the caching of allocations and one for the waiting queue
* Remove check locks and NETDATA_VERIFY_LOCKS as it is not needed anymore
* Fix bad memory allocation / cleanup
* Cleanup ACLK sync initialization (part 1)
* Don't update metric registry during shutdown (part 1)
* Prevent crash when dashboard is refreshed and host goes away
* Mark ctx that is shutting down.
Test not adding flushed pages to open cache as hot if we are shutting down
* make ML work
* Fix compile without NETDATA_INTERNAL_CHECKS
* shutdown each ctx independently
* fix completion of quiesce
* do not update shared ML charts
* Create ML charts on child hosts.
When a parent runs ML for a child, the relevant ML charts
should be created on the child host. These charts should use
the parent's hostname to differentiate multiple parents that might
run ML for a child.
The only exception to this rule is the training/prediction resource
usage charts. These are created on the localhost of the parent host,
because they provide information specific to said host.
* check new ml code
* first save the database, then free all memory
* dbengine prep exit before freeing all memory; fixed deadlock in cache hot to dirty; added missing check to query engine about metrics without any data in the db
* Cleanup metadata thread (part 2)
* increase refcount before dispatching prep command
* Do not try to stop anomaly detection threads twice.
A separate function call has been added to stop anomaly detection threads.
This commit removes the left over function calls that were made
internally when a host was being created/destroyed.
* Remove allocations when smoothing samples buffer
The number of dims per sample is always 1, i.e. we are training and
predicting only individual dimensions.
* set the orphan flag when loading archived hosts
* track worker dispatch callbacks and threadpool worker init
* make ML threads joinable; mark ctx having flushing in progress as early as possible
* fix allocation counter
* Cleanup metadata thread (part 3)
* Cleanup metadata thread (part 4)
* Skip metadata host scan when running unittest
* unittest support during init
* don't use all the libuv threads for queries
* break an infinite loop when sleep_usec() is interrupted
* ml prediction is a collector for several charts
* sleep_usec() now makes sure it will never loop if it passes the time expected; sleep_usec() now uses nanosleep() because clock_nanosleep() misses signals on netdata exit
* worker_unregister() in netdata threads cleanup
* moved pdc/epdl/deol/extent_buffer related code to pdc.c and pdc.h
* fixed ML issues
* removed engine2 directory
* added dbengine2 files in CMakeLists.txt
* move query plan data to query target, so that they can be exposed in jsonwrap
* uniform definition of query plan according to the other query target members
* event_loop should be in daemon, not libnetdata
* metric_retention_by_uuid() is now part of the storage engine abstraction
* unify time_t variables to have the suffix _s (meaning: seconds)
* old dbengine statistics become "dbengine io"
* do not enable ML resource usage charts by default
* unify ml chart families, plugins and modules
* cleanup query plans from query target
* cleanup all extent buffers
* added debug info for rrddim slot to time
* rrddim now does proper gap management
* full rewrite of the mem modes
* use library functions for madvise
* use CHECKSUM_SZ for the checksum size
* fix coverity warning about the impossible case of returning a page that is entirely in the past of the query
* fix dbengine shutdown
* keep the old datafile lock until a new datafile has been created, to avoid creating multiple datafiles concurrently
* fine tune cache evictions
* don't initialize health if the health service is not running, to prevent a crash on shutdown while children get connected
* rename AS threads to ACLK[hostname]
* prevent re-use of uninitialized memory in queries
* use JulyL instead of JudyL for PDC operations - to test it first
* add also JulyL files
* fix July memory accounting
* disable July for PDC (use Judy)
* use the function to remove datafiles from linked list
* fix july and event_loop
* add july to libnetdata subdirs
* rename time_t variables that end in _t to end in _s
* replicate when there is a gap at the beginning of the replication period
* reset postponing of sender connections when a receiver is connected
* Adjust update_every properly
* fix replication infinite loop due to last change
* packed enums in rrd.h and cleanup of obsolete rrd structure members
* prevent deadlock in replication: replication_recalculate_buffer_used_ratio_unsafe() deadlocking with replication_sender_delete_pending_requests()
* void unused variable
* void unused variables
* fix indentation
* entries_by_time calculation in VD was wrong; restored internal checks for checking future timestamps
* macros to calculate page entries by time and size
* prevent statsd cleanup crash on exit
* cleanup health thread related variables
Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
Co-authored-by: vkalintiris <vasilis@netdata.cloud>
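The "prevent starvation" item above deserves a concrete illustration. Below is a minimal, self-contained sketch of that idea only, not the actual dbengine opcode queue: requests are normally served strictly by priority, but roughly 2% of dequeues take the oldest request instead, so low-priority work can never wait forever. Every name in this sketch is hypothetical.

/*
 * Illustrative sketch only, NOT netdata code: anti-starvation by letting
 * ~2% of dequeues ignore priority and serve the oldest request instead.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct request { int priority; int id; };   // lower number = higher priority

static struct request queue[16];            // kept in arrival order
static size_t queued = 0;

static size_t pick_next(void) {
    if (rand() % 100 < 2)
        return 0;                           // ~2% of the time: oldest first

    size_t best = 0;                        // otherwise: best priority wins
    for (size_t i = 1; i < queued; i++)
        if (queue[i].priority < queue[best].priority)
            best = i;

    return best;
}

int main(void) {
    for (int i = 0; i < 10; i++)
        queue[queued++] = (struct request){ .priority = i % 3, .id = i };

    while (queued) {
        size_t i = pick_next();
        printf("serving request %d (priority %d)\n", queue[i].id, queue[i].priority);
        memmove(&queue[i], &queue[i + 1], (queued - i - 1) * sizeof(queue[0]));
        queued--;                           // compact, preserving arrival order
    }
    return 0;
}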
Diffstat (limited to 'libnetdata')
-rw-r--r--  libnetdata/Makefile.am              |   1
-rw-r--r--  libnetdata/arrayalloc/arrayalloc.c  |   2
-rw-r--r--  libnetdata/clocks/clocks.c          |  42
-rw-r--r--  libnetdata/clocks/clocks.h          |   3
-rw-r--r--  libnetdata/completion/completion.c  |  30
-rw-r--r--  libnetdata/completion/completion.h  |   5
-rw-r--r--  libnetdata/dictionary/dictionary.c  |  32
-rw-r--r--  libnetdata/july/Makefile.am         |   8
-rw-r--r--  libnetdata/july/README.md           |  10
-rw-r--r--  libnetdata/july/july.c              | 447
-rw-r--r--  libnetdata/july/july.h              |  40
-rw-r--r--  libnetdata/libnetdata.c             |  25
-rw-r--r--  libnetdata/libnetdata.h             |  79
-rw-r--r--  libnetdata/locks/locks.c            | 439
-rw-r--r--  libnetdata/locks/locks.h            |  29
-rw-r--r--  libnetdata/required_dummies.h       |   1
-rw-r--r--  libnetdata/socket/socket.c          |   3
-rw-r--r--  libnetdata/socket/socket.h          |  25
-rw-r--r--  libnetdata/threads/threads.c        |  39
19 files changed, 840 insertions, 420 deletions
diff --git a/libnetdata/Makefile.am b/libnetdata/Makefile.am
index 1208d16c21..b5bbb79e03 100644
--- a/libnetdata/Makefile.am
+++ b/libnetdata/Makefile.am
@@ -15,6 +15,7 @@ SUBDIRS = \
     ebpf \
     eval \
     json \
+    july \
     health \
     locks \
     log \
diff --git a/libnetdata/arrayalloc/arrayalloc.c b/libnetdata/arrayalloc/arrayalloc.c
index 35613baf0b..482317f750 100644
--- a/libnetdata/arrayalloc/arrayalloc.c
+++ b/libnetdata/arrayalloc/arrayalloc.c
@@ -210,7 +210,7 @@ static void arrayalloc_add_page(ARAL *ar TRACE_ALLOCATIONS_FUNCTION_DEFINITION_P
     char filename[FILENAME_MAX + 1];
     snprintfz(filename, FILENAME_MAX, "%s/array_alloc.mmap/%s.%zu", *ar->cache_dir, ar->filename, ar->internal.file_number);
     page->filename = strdupz(filename);
-    page->data = netdata_mmap(page->filename, page->size, MAP_SHARED, 0);
+    page->data = netdata_mmap(page->filename, page->size, MAP_SHARED, 0, false);
     if (unlikely(!page->data))
         fatal("Cannot allocate arrayalloc buffer of size %zu on filename '%s'", page->size, page->filename);
 }
diff --git a/libnetdata/clocks/clocks.c b/libnetdata/clocks/clocks.c
index 3c036facb8..19c66f0a5a 100644
--- a/libnetdata/clocks/clocks.c
+++ b/libnetdata/clocks/clocks.c
@@ -189,9 +189,13 @@ void sleep_to_absolute_time(usec_t usec) {
         .tv_nsec = (suseconds_t)((usec % USEC_PER_SEC) * NSEC_PER_USEC)
     };
 
+    errno = 0;
     int ret = 0;
     while( (ret = clock_nanosleep(clock, TIMER_ABSTIME, &req, NULL)) != 0 ) {
-        if(ret == EINTR) continue;
+        if(ret == EINTR) {
+            errno = 0;
+            continue;
+        }
         else {
             if (ret == EINVAL) {
                 if (!einval_printed) {
@@ -313,7 +317,7 @@ usec_t heartbeat_next(heartbeat_t *hb, usec_t tick) {
         // sleep_usec() has a loop to guarantee we will sleep for at least the requested time.
         // According the specs, when we sleep for a relative time, clock adjustments should not affect the duration
         // we sleep.
-        sleep_usec(next - now);
+        sleep_usec_with_now(next - now, now);
         now = now_realtime_usec();
 
         dt = now - hb->realtime;
@@ -342,7 +346,7 @@ usec_t heartbeat_next(heartbeat_t *hb, usec_t tick) {
     return dt;
 }
 
-void sleep_usec(usec_t usec) {
+void sleep_usec_with_now(usec_t usec, usec_t started_ut) {
     // we expect microseconds (1.000.000 per second)
     // but timespec is nanoseconds (1.000.000.000 per second)
     struct timespec rem = { 0, 0 }, req = {
@@ -350,21 +354,37 @@ void sleep_usec(usec_t usec) {
         .tv_nsec = (suseconds_t) ((usec % USEC_PER_SEC) * NSEC_PER_USEC)
     };
 
-#ifdef __linux__
-    while (clock_nanosleep(CLOCK_REALTIME, 0, &req, &rem) != 0) {
-#else
+    // make sure errno is not EINTR
+    errno = 0;
+
+    if(!started_ut)
+        started_ut = now_realtime_usec();
+
+    usec_t end_ut = started_ut + usec;
+
     while (nanosleep(&req, &rem) != 0) {
-#endif
         if (likely(errno == EINTR && (rem.tv_sec || rem.tv_nsec))) {
             req = rem;
             rem = (struct timespec){ 0, 0 };
+
+            // break an infinite loop
+            errno = 0;
+
+            usec_t now_ut = now_realtime_usec();
+            if(now_ut >= end_ut)
+                break;
+
+            usec_t remaining_ut = (usec_t)req.tv_sec * USEC_PER_SEC + (usec_t)req.tv_nsec * NSEC_PER_USEC > usec;
+            usec_t check_ut = now_ut - started_ut;
+            if(remaining_ut > check_ut) {
+                req = (struct timespec){
+                    .tv_sec = (time_t) ( check_ut / USEC_PER_SEC),
+                    .tv_nsec = (suseconds_t) ((check_ut % USEC_PER_SEC) * NSEC_PER_USEC)
+                };
+            }
         }
         else {
-#ifdef __linux__
-            error("Cannot clock_nanosleep(CLOCK_REALTIME) for %llu microseconds.", usec);
-#else
             error("Cannot nanosleep() for %llu microseconds.", usec);
-#endif
             break;
         }
     }
diff --git a/libnetdata/clocks/clocks.h b/libnetdata/clocks/clocks.h
index 7738a2c8ed..b050b62548 100644
--- a/libnetdata/clocks/clocks.h
+++ b/libnetdata/clocks/clocks.h
@@ -141,7 +141,8 @@ usec_t heartbeat_next(heartbeat_t *hb, usec_t tick);
 
 void heartbeat_statistics(usec_t *min_ptr, usec_t *max_ptr, usec_t *average_ptr, size_t *count_ptr);
 
-void sleep_usec(usec_t usec);
+void sleep_usec_with_now(usec_t usec, usec_t started_ut);
+#define sleep_usec(usec) sleep_usec_with_now(usec, 0);
 
 void clocks_init(void);
diff --git a/libnetdata/completion/completion.c b/libnetdata/completion/completion.c
index b5ac86e4f1..6257e02998 100644
--- a/libnetdata/completion/completion.c
+++ b/libnetdata/completion/completion.c
@@ -5,6 +5,7 @@
 void completion_init(struct completion *p)
 {
     p->completed = 0;
+    p->completed_jobs = 0;
     fatal_assert(0 == uv_cond_init(&p->cond));
     fatal_assert(0 == uv_mutex_init(&p->mutex));
 }
@@ -32,3 +33,32 @@ void completion_mark_complete(struct completion *p)
     uv_cond_broadcast(&p->cond);
     uv_mutex_unlock(&p->mutex);
 }
+
+unsigned completion_wait_for_a_job(struct completion *p, unsigned completed_jobs)
+{
+    uv_mutex_lock(&p->mutex);
+    while (0 == p->completed && p->completed_jobs <= completed_jobs) {
+        uv_cond_wait(&p->cond, &p->mutex);
+    }
+    completed_jobs = p->completed_jobs;
+    uv_mutex_unlock(&p->mutex);
+
+    return completed_jobs;
+}
+
+void completion_mark_complete_a_job(struct completion *p)
+{
+    uv_mutex_lock(&p->mutex);
+    p->completed_jobs++;
+    uv_cond_broadcast(&p->cond);
+    uv_mutex_unlock(&p->mutex);
+}
+
+bool completion_is_done(struct completion *p)
+{
+    bool ret;
+    uv_mutex_lock(&p->mutex);
+    ret = p->completed;
+    uv_mutex_unlock(&p->mutex);
+    return ret;
+}
diff --git a/libnetdata/completion/completion.h b/libnetdata/completion/completion.h
index 667360a424..723f736889 100644
--- a/libnetdata/completion/completion.h
+++ b/libnetdata/completion/completion.h
@@ -9,6 +9,7 @@ struct completion {
     uv_mutex_t mutex;
     uv_cond_t cond;
     volatile unsigned completed;
+    volatile unsigned completed_jobs;
 };
 
 void completion_init(struct completion *p);
@@ -19,4 +20,8 @@ void completion_wait_for(struct completion *p);
 
 void completion_mark_complete(struct completion *p);
 
+unsigned completion_wait_for_a_job(struct completion *p, unsigned completed_jobs);
+void completion_mark_complete_a_job(struct completion *p);
+bool completion_is_done(struct completion *p);
+
 #endif /* NETDATA_COMPLETION_H */
diff --git a/libnetdata/dictionary/dictionary.c b/libnetdata/dictionary/dictionary.c
index 0277e067f3..045dbbe55b 100644
--- a/libnetdata/dictionary/dictionary.c
+++ b/libnetdata/dictionary/dictionary.c
@@ -1234,11 +1234,29 @@ static inline size_t item_get_name_len(const DICTIONARY_ITEM *item) {
         return strlen(item->caller_name);
 }
 
+static ARAL dict_items_aral = {
+    .filename = NULL,
+    .cache_dir = NULL,
+    .use_mmap = false,
+    .initial_elements = 65536 / sizeof(DICTIONARY_ITEM),
+    .requested_element_size = sizeof(DICTIONARY_ITEM),
+};
+
+static ARAL dict_shared_items_aral = {
+    .filename = NULL,
+    .cache_dir = NULL,
+    .use_mmap = false,
+    .initial_elements = 65536 / sizeof(DICTIONARY_ITEM_SHARED),
+    .requested_element_size = sizeof(DICTIONARY_ITEM_SHARED),
+};
+
 static DICTIONARY_ITEM *dict_item_create(DICTIONARY *dict __maybe_unused, size_t *allocated_bytes, DICTIONARY_ITEM *master_item) {
     DICTIONARY_ITEM *item;
 
     size_t size = sizeof(DICTIONARY_ITEM);
-    item = callocz(1, size);
+//    item = callocz(1, size);
+    item = arrayalloc_mallocz(&dict_items_aral);
+    memset(item, 0, sizeof(DICTIONARY_ITEM));
 
 #ifdef NETDATA_INTERNAL_CHECKS
     item->creator_pid = gettid();
@@ -1257,7 +1275,10 @@ static DICTIONARY_ITEM *dict_item_create(DICTIONARY *dict __maybe_unused, size_t
     }
     else {
         size = sizeof(DICTIONARY_ITEM_SHARED);
-        item->shared = callocz(1, size);
+        // item->shared = callocz(1, size);
+        item->shared = arrayalloc_mallocz(&dict_shared_items_aral);
+        memset(item->shared, 0, sizeof(DICTIONARY_ITEM_SHARED));
+
         item->shared->links = 1;
         *allocated_bytes += size;
     }
@@ -1396,12 +1417,15 @@ static size_t dict_item_free_with_hooks(DICTIONARY *dict, DICTIONARY_ITEM *item)
         }
 
         value_size += item->shared->value_len;
-        freez(item->shared);
+        // freez(item->shared);
+        arrayalloc_freez(&dict_shared_items_aral, item->shared);
         item->shared = NULL;
         item_size += sizeof(DICTIONARY_ITEM_SHARED);
     }
 
-    freez(item);
+    // freez(item);
+    arrayalloc_freez(&dict_items_aral, item);
+
     item_size += sizeof(DICTIONARY_ITEM);
 
     DICTIONARY_STATS_MINUS_MEMORY(dict, key_size, item_size, value_size);
diff --git a/libnetdata/july/Makefile.am b/libnetdata/july/Makefile.am
new file mode 100644
index 0000000000..161784b8f6
--- /dev/null
+++ b/libnetdata/july/Makefile.am
@@ -0,0 +1,8 @@
+# SPDX-License-Identifier: GPL-3.0-or-later
+
+AUTOMAKE_OPTIONS = subdir-objects
+MAINTAINERCLEANFILES = $(srcdir)/Makefile.in
+
+dist_noinst_DATA = \
+    README.md \
+    $(NULL)
diff --git a/libnetdata/july/README.md b/libnetdata/july/README.md
new file mode 100644
index 0000000000..5cf4fb22f4
--- /dev/null
+++ b/libnetdata/july/README.md
@@ -0,0 +1,10 @@
+<!--
+custom_edit_url: https://github.com/netdata/netdata/edit/master/libnetdata/july/README.md
+-->
+
+
+# July
+
+An interface similar to `Judy` that uses minimal allocations (that can be cached)
+for items that are mainly appended (just a few insertions in the middle)
+
diff --git a/libnetdata/july/july.c b/libnetdata/july/july.c
new file mode 100644
index 0000000000..eda2f7ed34
--- /dev/null
+++ b/libnetdata/july/july.c
@@ -0,0 +1,447 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#include "july.h"
+
+#define JULYL_MIN_ENTRIES 10
+
+struct JulyL_item {
+    Word_t index;
+    void *value;
+};
+
+struct JulyL {
+    size_t entries;
+    size_t used;
+
+    // statistics
+    size_t bytes;
+    size_t bytes_moved;
+    size_t reallocs;
+
+    struct {
+        struct JulyL *prev;
+        struct JulyL *next;
+    } cache;
+
+    struct JulyL_item array[];
+};
+
+// ----------------------------------------------------------------------------
+// JulyL cache
+
+static struct {
+    struct {
+        SPINLOCK spinlock;
+        struct JulyL *available_items;
+        size_t available;
+    } protected;
+
+    struct {
+        size_t bytes;
+        size_t allocated;
+        size_t bytes_moved;
+        size_t reallocs;
+    } atomics;
+} julyl_globals = {
+    .protected = {
+        .spinlock = NETDATA_SPINLOCK_INITIALIZER,
+        .available_items = NULL,
+        .available = 0,
+    },
+    .atomics = {
+        .bytes = 0,
+        .allocated = 0,
+        .bytes_moved = 0,
+        .reallocs = 0,
+    },
+};
+
+void julyl_cleanup(void) {
+    netdata_spinlock_lock(&julyl_globals.protected.spinlock);
+
+    while(julyl_globals.protected.available_items && julyl_globals.protected.available > 10) {
+        struct JulyL *item = julyl_globals.protected.available_items;
+        DOUBLE_LINKED_LIST_REMOVE_UNSAFE(julyl_globals.protected.available_items, item, cache.prev, cache.next);
+        size_t bytes = item->bytes;
+        freez(item);
+        julyl_globals.protected.available--;
+        __atomic_sub_fetch(&julyl_globals.atomics.bytes, bytes, __ATOMIC_RELAXED);
+        __atomic_sub_fetch(&julyl_globals.atomics.allocated, 1, __ATOMIC_RELAXED);
+    }
+
+    netdata_spinlock_unlock(&julyl_globals.protected.spinlock);
+}
+
+struct JulyL *julyl_get(void) {
+    struct JulyL *j;
+
+    netdata_spinlock_lock(&julyl_globals.protected.spinlock);
+
+    j = julyl_globals.protected.available_items;
+    if(likely(j)) {
+        DOUBLE_LINKED_LIST_REMOVE_UNSAFE(julyl_globals.protected.available_items, j, cache.prev, cache.next);
+        julyl_globals.protected.available--;
+    }
+
+    netdata_spinlock_unlock(&julyl_globals.protected.spinlock);
+
+    if(unlikely(!j)) {
+        size_t bytes = sizeof(struct JulyL) + JULYL_MIN_ENTRIES * sizeof(struct JulyL_item);
+        j = mallocz(bytes);
+        j->bytes = bytes;
+        j->entries = JULYL_MIN_ENTRIES;
+        __atomic_add_fetch(&julyl_globals.atomics.bytes, bytes, __ATOMIC_RELAXED);
+        __atomic_add_fetch(&julyl_globals.atomics.allocated, 1, __ATOMIC_RELAXED);
+    }
+
+    j->used = 0;
+    j->bytes_moved = 0;
+    j->reallocs = 0;
+    j->cache.next = j->cache.prev = NULL;
+    return j;
+}
+
+static void julyl_release(struct JulyL *j) {
+    if(unlikely(!j)) return;
+
+    __atomic_add_fetch(&julyl_globals.atomics.bytes_moved, j->bytes_moved, __ATOMIC_RELAXED);
+    __atomic_add_fetch(&julyl_globals.atomics.reallocs, j->reallocs, __ATOMIC_RELAXED);
+
+    netdata_spinlock_lock(&julyl_globals.protected.spinlock);
+    DOUBLE_LINKED_LIST_APPEND_UNSAFE(julyl_globals.protected.available_items, j, cache.prev, cache.next);
+    julyl_globals.protected.available++;
+    netdata_spinlock_unlock(&julyl_globals.protected.spinlock);
+}
+
+size_t julyl_cache_size(void) {
+    return __atomic_load_n(&julyl_globals.atomics.bytes, __ATOMIC_RELAXED);
+}
+
+size_t julyl_bytes_moved(void) {
+    return __atomic_load_n(&julyl_globals.atomics.bytes_moved, __ATOMIC_RELAXED);
+}
+
+// ----------------------------------------------------------------------------
+// JulyL
+
+size_t JulyLGet_binary_search_position_of_index(const struct JulyL *July, Word_t Index) {
+    // return the position of the first item >= Index
+
+    size_t left = 0;
+    size_t right = July->used;
+    while(left < right) {
+        size_t middle = (left + right) >> 1;
+
+        if(July->array[middle].index > Index)
+            right = middle;
+
+        else
+            left = middle + 1;
+    }
+
+    internal_fatal(left > July->used, "JULY: invalid position returned");
+
+    if(left > 0 && July->array[left - 1].index == Index)
+        return left - 1;
+
+    internal_fatal( (left < July->used && July->array[left].index < Index) ||
+                    (left > 0 && July->array[left - 1].index >= Index)
+                   , "JULY: wrong item returned");
+
+    return left;
+}
+
+PPvoid_t JulyLGet(Pcvoid_t PArray, Word_t Index, PJError_t PJError __maybe_unused) {
+    const struct JulyL *July = PArray;
+    if(!July)
+        return NULL;
+
+    size_t pos = JulyLGet_binary_search_position_of_index(July, Index);
+
+    if(unlikely(pos >= July->used || July->array[pos].index != Index))
+        return NULL;
+
+    return (PPvoid_t)&July->array[pos].value;
+}
+
+PPvoid_t JulyLIns(PPvoid_t PPArray, Word_t Index, PJError_t PJError __maybe_unused) {
+    struct JulyL *July = *PPArray;
+    if(unlikely(!July)) {
+        July = julyl_get();
+        July->used = 0;
+        *PPArray = July;
+    }
+
+    size_t pos = JulyLGet_binary_search_position_of_index(July, Index);
+
+    if((pos == July->used || July->array[pos].index != Index)) {
+        // we have to add this entry
+
+        if (unlikely(July->used == July->entries)) {
+            // we have to expand the array
+            size_t bytes = sizeof(struct JulyL) + July->entries * 2 * sizeof(struct JulyL_item);
+            __atomic_add_fetch(&julyl_globals.atomics.bytes, bytes - July->bytes, __ATOMIC_RELAXED);
+            July = reallocz(July, bytes);
+            July->bytes = bytes;
+            July->entries *= 2;
+            July->reallocs++;
+            *PPArray = July;
+        }
+
+        if (unlikely(pos != July->used)) {
+            // we have to shift some members to make room
+            size_t size = (July->used - pos) * sizeof(struct JulyL_item);
+            memmove(&July->array[pos + 1], &July->array[pos], size);
+            July->bytes_moved += size;
+        }
+
+        July->used++;
+        July->array[pos].value = NULL;
+        July->array[pos].index = Index;
+    }
+
+    return &July->array[pos].value;
+}
+
+PPvoid_t JulyLFirst(Pcvoid_t PArray, Word_t *Index, PJError_t PJError __maybe_unused) {
+    const struct JulyL *July = PArray;
+    if(!July)
+        return NULL;
+
+    size_t pos = JulyLGet_binary_search_position_of_index(July, *Index);
+    // pos is >= Index
+
+    if(unlikely(pos == July->used))
+        return NULL;
+
+    *Index = July->array[pos].index;
+    return (PPvoid_t)&July->array[pos].value;
+}
+
+PPvoid_t JulyLNext(Pcvoid_t PArray, Word_t *Index, PJError_t PJError __maybe_unused) {
+    const struct JulyL *July = PArray;
+    if(!July)
+        return NULL;
+
+    size_t pos = JulyLGet_binary_search_position_of_index(July, *Index);
+    // pos is >= Index
+
+    if(unlikely(pos == July->used))
+        return NULL;
+
+    if(July->array[pos].index == *Index) {
+        pos++;
+
+        if(unlikely(pos == July->used))
+            return NULL;
+    }
+
+    *Index = July->array[pos].index;
+    return (PPvoid_t)&July->array[pos].value;
+}
+
+PPvoid_t JulyLLast(Pcvoid_t PArray, Word_t *Index, PJError_t PJError __maybe_unused) {
+    const struct JulyL *July = PArray;
+    if(!July)
+        return NULL;
+
+    size_t pos = JulyLGet_binary_search_position_of_index(July, *Index);
+    // pos is >= Index
+
+    if(pos > 0 && (pos == July->used || July->array[pos].index > *Index))
+        pos--;
+
+    if(unlikely(pos == 0 && July->array[0].index > *Index))
+        return NULL;
+
+    *Index = July->array[pos].index;
+    return (PPvoid_t)&July->array[pos].value;
+}
+
+PPvoid_t JulyLPrev(Pcvoid_t PArray, Word_t *Index, PJError_t PJError __maybe_unused) {
+    const struct JulyL *July = PArray;
+    if(!July)
+        return NULL;
+
+    size_t pos = JulyLGet_binary_search_position_of_index(July, *Index);
+    // pos is >= Index
+
+    if(unlikely(pos == 0 || July->used == 0))
+        return NULL;
+
+    // get the previous one
+    pos--;
+
+    *Index = July->array[pos].index;
+    return (PPvoid_t)&July->array[pos].value;
+}
+
+Word_t JulyLFreeArray(PPvoid_t PPArray, PJError_t PJError __maybe_unused) {
+    struct JulyL *July = *PPArray;
+    if(unlikely(!July))
+        return 0;
+
+    size_t bytes = July->bytes;
+    julyl_release(July);
+    *PPArray = NULL;
+    return bytes;
+}
+
+// --------------------------------------------
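Two of the additions above are easier to read with a usage sketch. First, the job-counting completion API from completion.c/completion.h: a dispatcher can sleep until the completed_jobs counter advances, instead of polling or waiting for the single completed flag. The sketch below is hypothetical usage, not code from this commit; dispatch_job() is a placeholder, the include path is assumed, and it relies on the pre-existing completion_init()/completion_destroy() helpers.

// hypothetical usage of the job-counting completion API added above
#include "completion.h"   // assumed path: libnetdata/completion/completion.h

void dispatch_job(struct completion *p, unsigned job);  // placeholder: each worker must
                                                        // end with completion_mark_complete_a_job(p)

void run_jobs(struct completion *p, unsigned total) {
    completion_init(p);

    for (unsigned i = 0; i < total; i++)
        dispatch_job(p, i);

    // block until the counter advances past the last value we saw;
    // the return value is the updated counter
    unsigned seen = 0;
    while (seen < total)
        seen = completion_wait_for_a_job(p, seen);

    completion_destroy(p);
}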
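Second, the July interface: as the new README says, it mimics Judy with minimal, cacheable allocations, and the signatures mirror JudyL one-for-one, so Judy-style loops port unchanged. A hypothetical traversal using only the functions visible in the diff above (PJE0 is Judy's NULL error-pointer macro, so plain NULL works too; the include path is assumed):

#include "july.h"   // assumed path: libnetdata/july/july.h

void julyl_demo(void) {
    Pvoid_t array = NULL;
    Word_t index;
    PPvoid_t slot;

    // JulyLIns() returns a slot (a pointer to the value) for the given index
    for (index = 10; index <= 50; index += 10) {
        slot = JulyLIns(&array, index, PJE0);
        *slot = (void *)(index * 2);        // any pointer-sized payload
    }

    // point lookup: NULL when the index is absent
    slot = JulyLGet(array, 30, PJE0);
    (void)slot;

    // ordered traversal, exactly like JudyLFirst()/JudyLNext()
    index = 0;
    for (slot = JulyLFirst(array, &index, PJE0);
         slot;
         slot = JulyLNext(array, &index, PJE0))
        ;                                    // visit (index, *slot) here

    // hand the (cached) allocation back to the july pool
    JulyLFreeArray(&array, PJE0);
}

Because entries are kept sorted in one contiguous array, JulyLIns() is O(n) for random inserts but nearly free for the mostly-append pattern the README describes, which is what the query pipeline feeds it.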