author     Costa Tsaousis <costa@netdata.cloud>       2023-01-10 19:59:21 +0200
committer  GitHub <noreply@github.com>                2023-01-10 19:59:21 +0200
commit     368a26cfee6887ca0cb2301d93138f63b75e353a (patch)
tree       b57e39fdb78dc57f7a2c1fcc3d9b6bf3c2a2a113 /libnetdata
parent     b513888be389f92b2323d1bb3fdf55c22d4e4bad (diff)
DBENGINE v2 (#14125)
* count open cache pages referring to datafile
* eliminate wasted flush attempts
* remove eliminated variable
* journal v2 scanning split functions
* avoid locking open cache for a long time while migrating to journal v2
* don't acquire datafile for the loop; disable thread cancelability while a query is running
* work on datafile acquiring
* work on datafile deletion
* work on datafile deletion again
* logs of dbengine should start with DBENGINE
* thread specific key for queries to check if a query finishes without a finalize
* page_uuid is not used anymore
* Cleanup Judy traversal when building the new v2
Remove unneeded calls to the metric registry
* metric is 8 bytes smaller; timestamps are protected with a spinlock; timestamps in metric are now always coherent
* disable checks for invalid time-ranges
* Remove type from page details
* report scanning time
* remove infinite loop from datafile acquire for deletion
* remove infinite loop from datafile acquire for deletion again
* trace query handles
* properly allocate array of dimensions in replication
* metrics cleanup
* metrics registry uses arrayalloc
* arrayalloc free should be protected by lock
* use array alloc in page cache
* journal v2 scanning fix
* datafile reference leak hunting
* do not load metrics of future timestamps
* initialize reasons
* fix datafile reference leak
* do not load pages that are entirely overlapped by others
* expand metric retention atomically
* split replication logic in initialization and execution
* replication prepare ahead queries
* replication prepare ahead queries fixed
* fix replication workers accounting
* add router active queries chart
* restore accounting of pages metadata sources; cleanup replication
* don't count skipped pages as unroutable
* notes on services shutdown
* do not migrate to journal v2 too early, while it has pending dirty pages in the main cache for the specific journal file
* do not add pages we don't need to the pdc
* time in range re-work to provide info about past and future matches
* finer control on the pages selected for processing; accounting of page-related issues
* fix invalid reference to handle->page
* eliminate data collection handle of pg_lookup_next
* accounting for queries with gaps
* query preprocessing the same way the processing is done; cache now supports all operations on Judy
* dynamic libuv workers based on number of processors; minimum libuv workers 8; replication query init ahead uses libuv workers - reserved ones (3)
* get into pdc all matching pages from main cache and open cache; do not do v2 scan if main cache and open cache can satisfy the query
* finer gaps calculation; accounting of overlapping pages in queries
* fix gaps accounting
* move datafile deletion to worker thread
* tune libuv workers and thread stack size
* stop netdata threads gradually
* run indexing together with cache flush/evict
* more work on clean shutdown
* limit the number of pages to evict per run
* do not lock the clean queue for accesses if it is not possible at that time - the page will be moved to the back of the list during eviction
* economies on flags for smaller page footprint; cleanup and renames
* eviction moves referenced pages to the end of the queue
* use murmur hash for indexing partition
* murmur should be static
* use more indexing partitions
* revert number of partitions to number of cpus
* cancel threads first, then stop services
* revert default thread stack size
* don't execute replication requests of disconnected senders
* wait more time for services that are exiting gradually
* fixed last commit
* finer control on page selection algorithm
* default stacksize of 1MB
* fix formatting
* fix worker utilization going crazy when the number is rotating
* avoid buffer full due to replication preprocessing of requests
* support query priorities
* add count of spins in spinlock when compiled with netdata internal checks
* remove prioritization from dbengine queries; cache now uses mutexes for the queues
* hot pages are now in sections judy arrays, like dirty
* align replication queries to optimal page size
* during flushing add to clean and evict in batches
* Revert "during flushing add to clean and evict in batches"
This reverts commit 8fb2b69d068499eacea6de8291c336e5e9f197c7.
* don't lock clean while evicting pages during flushing
* Revert "don't lock clean while evicting pages during flushing"
This reverts commit d6c82b5f40aeba86fc7aead062fab1b819ba58b3.
* Revert "Revert "during flushing add to clean and evict in batches""
This reverts commit ca7a187537fb8f743992700427e13042561211ec.
* don't cross locks during flushing, for the fastest flushes possible
* low-priority queries load pages synchronously
* Revert "low-priority queries load pages synchronously"
This reverts commit 1ef2662ddcd20fe5842b856c716df134c42d1dc7.
* cache uses spinlock again
* during flushing, don't lock the clean queue at all; each item is added atomically
* do smaller eviction runs
* evict one page at a time to minimize lock contention on the clean queue
* fix eviction statistics
* fix last commit
* plain should be main cache
* event loop cleanup; evictions and flushes can now happen concurrently
* run flush and evictions from tier0 only
* remove not needed variables
* flushing open cache is not needed; flushing protection is irrelevant since flushing is global for all tiers; added protection to datafiles so that only one flusher can run per datafile at any given time
* added worker jobs in timer to find the slow part of it
* support fast eviction of pages when all_of_them is set
* revert default thread stack size
* bypass event loop for dispatching read extent commands to workers - send them directly
* Revert "bypass event loop for dispatching read extent commands to workers - send them directly"
This reverts commit 2c08bc5bab12881ae33bc73ce5dea03dfc4e1fce.
* cache work requests
* minimize memory operations during flushing; caching of extent_io_descriptors and page_descriptors
* publish flushed pages to open cache in the thread pool
* prevent eventloop requests from getting stacked in the event loop
* single threaded dbengine controller; support priorities for all queries; major cleanup and restructuring of rrdengine.c
* more rrdengine.c cleanup
* enable db rotation
* do not log when there is a filter
* do not run multiple migration to journal v2
* load all extents async
* fix wrong paste
* report opcodes waiting, work dispatched, work executing
* cleanup event loop memory every 10 minutes
* don't dispatch more work requests than the number of threads available
* use the dispatched counter instead of the executing counter to check if the worker thread pool is full
* remove UV_RUN_NOWAIT
* replication to fill the queues
* caching of extent buffers; code cleanup
* caching of pdc and pd; rework on journal v2 indexing, datafile creation, database rotation
* single transaction wal
* synchronous flushing
* first cancel the threads, then signal them to exit
* caching of rrdeng query handles; added priority to query target; health is now low prio
* add priority to the missing points; do not allow critical priority in queries
* offload query preparation and routing to libuv thread pool
* updated timing charts for the offloaded query preparation
* caching of WALs
* accounting for struct caches (buffers); do not load extents with invalid sizes
* protection against memory ballooning during replication due to the optimal alignment of pages; sender thread buffer is now also reset when the circular buffer is reset
* also check that the expanded `before` is not beyond the chart's last updated time
* also check that the expanded `before` is not after the wall clock time of when the query started
* Remove unused variable
* replication to queue less queries; cleanup of internal fatals
* Mark dimension to be updated async
* caching of extent_page_details_list (epdl) and datafile_extent_offset_list (deol)
* disable pgc stress test, under an ifdef
* disable mrg stress test under an ifdef
* Mark chart and host labels, host info for async check and store in the database
* dictionary items use arrayalloc
* cache section pages structure is allocated with arrayalloc
* Add a function to wake up the aclk query threads and check for exit
Register a function to be called during shutdown, after signaling the service to exit
* parallel preparation of all dimensions of queries
* be more sensitive to enable streaming after replication
* atomically finish chart replication
* fix last commit
* fix last commit again
* fix last commit again again
* fix last commit again again again
* unify the normalization of retention calculation for collected charts; do not enable streaming if more than 60 points are to be transferred; eliminate an allocation during replication
* do not cancel start streaming; use high priority queries when we have locked chart data collection
* prevent starvation on opcodes execution by allowing 2% of the requests to be re-ordered (a sketch of this idea follows the commit message)
* opcodes now use 2 spinlocks: one for the caching of allocations and one for the waiting queue
* Remove check locks and NETDATA_VERIFY_LOCKS as it is not needed anymore
* Fix bad memory allocation / cleanup
* Cleanup ACLK sync initialization (part 1)
* Don't update metric registry during shutdown (part 1)
* Prevent crash when dashboard is refreshed and host goes away
* Mark ctx that is shutting down.
Test not adding flushed pages to open cache as hot if we are shutting down
* make ML work
* Fix compile without NETDATA_INTERNAL_CHECKS
* shutdown each ctx independently
* fix completion of quiesce
* do not update shared ML charts
* Create ML charts on child hosts.
When a parent runs ML for a child, the relevant ML charts
should be created on the child host. These charts should use
the parent's hostname to differentiate multiple parents that might
run ML for a child.
The only exception to this rule is the training/prediction resource
usage charts. These are created on the localhost of the parent host,
because they provide information specific to said host.
* check new ml code
* first save the database, then free all memory
* dbengine prep exit before freeing all memory; fixed deadlock in cache hot to dirty; added missing check to query engine about metrics without any data in the db
* Cleanup metadata thread (part 2)
* increase refcount before dispatching prep command
* Do not try to stop anomaly detection threads twice.
A separate function call has been added to stop anomaly detection threads.
This commit removes the left over function calls that were made
internally when a host was being created/destroyed.
* Remove allocations when smoothing samples buffer
The number of dims per sample is always 1, i.e. we are training and
predicting only individual dimensions.
* set the orphan flag when loading archived hosts
* track worker dispatch callbacks and threadpool worker init
* make ML threads joinable; mark ctx having flushing in progress as early as possible
* fix allocation counter
* Cleanup metadata thread (part 3)
* Cleanup metadata thread (part 4)
* Skip metadata host scan when running unittest
* unittest support during init
* don't use all the libuv threads for queries
* break an infinite loop when sleep_usec() is interrupted
* ml prediction is a collector for several charts
* sleep_usec() now makes sure it will never loop if it passes the time expected; sleep_usec() now uses nanosleep() because clock_nanosleep() misses signals on netdata exit
* worker_unregister() in netdata threads cleanup
* moved pdc/epdl/deol/extent_buffer related code to pdc.c and pdc.h
* fixed ML issues
* removed engine2 directory
* added dbengine2 files in CMakeLists.txt
* move query plan data to query target, so that they can be exposed in jsonwrap
* uniform definition of query plan according to the other query target members
* event_loop should be in daemon, not libnetdata
* metric_retention_by_uuid() is now part of the storage engine abstraction
* unify time_t variables to have the suffix _s (meaning: seconds)
* old dbengine statistics become "dbengine io"
* do not enable ML resource usage charts by default
* unify ml chart families, plugins and modules
* cleanup query plans from query target
* cleanup all extent buffers
* added debug info for rrddim slot to time
* rrddim now does proper gap management
* full rewrite of the mem modes
* use library functions for madvise
* use CHECKSUM_SZ for the checksum size
* fix coverity warning about the impossible case of returning a page that is entirely in the past of the query
* fix dbengine shutdown
* keep the old datafile lock until a new datafile has been created, to avoid creating multiple datafiles concurrently
* fine tune cache evictions
* don't initialize health if the health service is not running, to prevent a crash on shutdown while children get connected
* rename AS threads to ACLK[hostname]
* prevent re-use of uninitialized memory in queries
* use JulyL instead of JudyL for PDC operations - to test it first
* add also JulyL files
* fix July memory accounting
* disable July for PDC (use Judy)
* use the function to remove datafiles from linked list
* fix july and event_loop
* add july to libnetdata subdirs
* rename time_t variables that end in _t to end in _s
* replicate when there is a gap at the beginning of the replication period
* reset postponing of sender connections when a receiver is connected
* Adjust update_every properly
* fix replication infinite loop due to last change
* packed enums in rrd.h and cleanup of obsolete rrd structure members
* prevent deadlock in replication: replication_recalculate_buffer_used_ratio_unsafe() deadlocking with replication_sender_delete_pending_requests()
* void unused variable
* void unused variables
* fix indentation
* entries_by_time calculation in VD was wrong; restored internal checks for checking future timestamps
* macros to calculate page entries by time and size
* prevent statsd cleanup crash on exit
* cleanup health thread related variables
Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
Co-authored-by: vkalintiris <vasilis@netdata.cloud>
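The "prevent starvation" item above deserves a concrete illustration. Below is a minimal, self-contained sketch of that idea only, not the actual dbengine opcode queue: requests are normally served strictly by priority, but roughly 2% of dequeues take the oldest request instead, so low-priority work can never wait forever. Every name in this sketch is hypothetical.

/*
 * Illustrative sketch only, NOT netdata code: anti-starvation by letting
 * ~2% of dequeues ignore priority and serve the oldest request instead.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct request { int priority; int id; };   // lower number = higher priority

static struct request queue[16];            // kept in arrival order
static size_t queued = 0;

static size_t pick_next(void) {
    if (rand() % 100 < 2)
        return 0;                           // ~2% of the time: oldest first

    size_t best = 0;                        // otherwise: best priority wins
    for (size_t i = 1; i < queued; i++)
        if (queue[i].priority < queue[best].priority)
            best = i;

    return best;
}

int main(void) {
    for (int i = 0; i < 10; i++)
        queue[queued++] = (struct request){ .priority = i % 3, .id = i };

    while (queued) {
        size_t i = pick_next();
        printf("serving request %d (priority %d)\n", queue[i].id, queue[i].priority);
        memmove(&queue[i], &queue[i + 1], (queued - i - 1) * sizeof(queue[0]));
        queued--;                           // compact, preserving arrival order
    }
    return 0;
}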
Diffstat (limited to 'libnetdata')
-rw-r--r--  libnetdata/Makefile.am              |   1
-rw-r--r--  libnetdata/arrayalloc/arrayalloc.c  |   2
-rw-r--r--  libnetdata/clocks/clocks.c          |  42
-rw-r--r--  libnetdata/clocks/clocks.h          |   3
-rw-r--r--  libnetdata/completion/completion.c  |  30
-rw-r--r--  libnetdata/completion/completion.h  |   5
-rw-r--r--  libnetdata/dictionary/dictionary.c  |  32
-rw-r--r--  libnetdata/july/Makefile.am         |   8
-rw-r--r--  libnetdata/july/README.md           |  10
-rw-r--r--  libnetdata/july/july.c              | 447
-rw-r--r--  libnetdata/july/july.h              |  40
-rw-r--r--  libnetdata/libnetdata.c             |  25
-rw-r--r--  libnetdata/libnetdata.h             |  79
-rw-r--r--  libnetdata/locks/locks.c            | 439
-rw-r--r--  libnetdata/locks/locks.h            |  29
-rw-r--r--  libnetdata/required_dummies.h       |   1
-rw-r--r--  libnetdata/socket/socket.c          |   3
-rw-r--r--  libnetdata/socket/socket.h          |  25
-rw-r--r--  libnetdata/threads/threads.c        |  39
19 files changed, 840 insertions, 420 deletions
diff --git a/libnetdata/Makefile.am b/libnetdata/Makefile.am
index 1208d16c21..b5bbb79e03 100644
--- a/libnetdata/Makefile.am
+++ b/libnetdata/Makefile.am
@@ -15,6 +15,7 @@ SUBDIRS = \
     ebpf \
     eval \
     json \
+    july \
     health \
     locks \
     log \
diff --git a/libnetdata/arrayalloc/arrayalloc.c b/libnetdata/arrayalloc/arrayalloc.c
index 35613baf0b..482317f750 100644
--- a/libnetdata/arrayalloc/arrayalloc.c
+++ b/libnetdata/arrayalloc/arrayalloc.c
@@ -210,7 +210,7 @@ static void arrayalloc_add_page(ARAL *ar TRACE_ALLOCATIONS_FUNCTION_DEFINITION_P
     char filename[FILENAME_MAX + 1];
     snprintfz(filename, FILENAME_MAX, "%s/array_alloc.mmap/%s.%zu", *ar->cache_dir, ar->filename, ar->internal.file_number);
     page->filename = strdupz(filename);
-    page->data = netdata_mmap(page->filename, page->size, MAP_SHARED, 0);
+    page->data = netdata_mmap(page->filename, page->size, MAP_SHARED, 0, false);
     if (unlikely(!page->data))
         fatal("Cannot allocate arrayalloc buffer of size %zu on filename '%s'", page->size, page->filename);
 }
diff --git a/libnetdata/clocks/clocks.c b/libnetdata/clocks/clocks.c
index 3c036facb8..19c66f0a5a 100644
--- a/libnetdata/clocks/clocks.c
+++ b/libnetdata/clocks/clocks.c
@@ -189,9 +189,13 @@ void sleep_to_absolute_time(usec_t usec) {
         .tv_nsec = (suseconds_t)((usec % USEC_PER_SEC) * NSEC_PER_USEC)
     };
 
+    errno = 0;
     int ret = 0;
     while( (ret = clock_nanosleep(clock, TIMER_ABSTIME, &req, NULL)) != 0 ) {
-        if(ret == EINTR) continue;
+        if(ret == EINTR) {
+            errno = 0;
+            continue;
+        }
         else {
             if (ret == EINVAL) {
                 if (!einval_printed) {
@@ -313,7 +317,7 @@ usec_t heartbeat_next(heartbeat_t *hb, usec_t tick) {
         // sleep_usec() has a loop to guarantee we will sleep for at least the requested time.
         // According the specs, when we sleep for a relative time, clock adjustments should not affect the duration
         // we sleep.
-        sleep_usec(next - now);
+        sleep_usec_with_now(next - now, now);
         now = now_realtime_usec();
 
         dt = now - hb->realtime;
@@ -342,7 +346,7 @@ usec_t heartbeat_next(heartbeat_t *hb, usec_t tick) {
     return dt;
 }
 
-void sleep_usec(usec_t usec) {
+void sleep_usec_with_now(usec_t usec, usec_t started_ut) {
     // we expect microseconds (1.000.000 per second)
     // but timespec is nanoseconds (1.000.000.000 per second)
     struct timespec rem = { 0, 0 }, req = {
@@ -350,21 +354,37 @@ void sleep_usec(usec_t usec) {
         .tv_nsec = (suseconds_t) ((usec % USEC_PER_SEC) * NSEC_PER_USEC)
     };
 
-#ifdef __linux__
-    while (clock_nanosleep(CLOCK_REALTIME, 0, &req, &rem) != 0) {
-#else
+    // make sure errno is not EINTR
+    errno = 0;
+
+    if(!started_ut)
+        started_ut = now_realtime_usec();
+
+    usec_t end_ut = started_ut + usec;
+
     while (nanosleep(&req, &rem) != 0) {
-#endif
         if (likely(errno == EINTR && (rem.tv_sec || rem.tv_nsec))) {
             req = rem;
             rem = (struct timespec){ 0, 0 };
+
+            // break an infinite loop
+            errno = 0;
+
+            usec_t now_ut = now_realtime_usec();
+            if(now_ut >= end_ut)
+                break;
+
+            usec_t remaining_ut = (usec_t)req.tv_sec * USEC_PER_SEC + (usec_t)req.tv_nsec * NSEC_PER_USEC > usec;
+            usec_t check_ut = now_ut - started_ut;
+            if(remaining_ut > check_ut) {
+                req = (struct timespec){
+                    .tv_sec = (time_t) ( check_ut / USEC_PER_SEC),
+                    .tv_nsec = (suseconds_t) ((check_ut % USEC_PER_SEC) * NSEC_PER_USEC)
+                };
+            }
         }
         else {
-#ifdef __linux__
-            error("Cannot clock_nanosleep(CLOCK_REALTIME) for %llu microseconds.", usec);
-#else
             error("Cannot nanosleep() for %llu microseconds.", usec);
-#endif
             break;
         }
     }
diff --git a/libnetdata/clocks/clocks.h b/libnetdata/clocks/clocks.h
index 7738a2c8ed..b050b62548 100644
--- a/libnetdata/clocks/clocks.h
+++ b/libnetdata/clocks/clocks.h
@@ -141,7 +141,8 @@ usec_t heartbeat_next(heartbeat_t *hb, usec_t tick);
 
 void heartbeat_statistics(usec_t *min_ptr, usec_t *max_ptr, usec_t *average_ptr, size_t *count_ptr);
 
-void sleep_usec(usec_t usec);
+void sleep_usec_with_now(usec_t usec, usec_t started_ut);
+#define sleep_usec(usec) sleep_usec_with_now(usec, 0);
 
 void clocks_init(void);
diff --git a/libnetdata/completion/completion.c b/libnetdata/completion/completion.c
index b5ac86e4f1..6257e02998 100644
--- a/libnetdata/completion/completion.c
+++ b/libnetdata/completion/completion.c
@@ -5,6 +5,7 @@
 void completion_init(struct completion *p)
 {
     p->completed = 0;
+    p->completed_jobs = 0;
     fatal_assert(0 == uv_cond_init(&p->cond));
     fatal_assert(0 == uv_mutex_init(&p->mutex));
 }
@@ -32,3 +33,32 @@ void completion_mark_complete(struct completion *p)
     uv_cond_broadcast(&p->cond);
     uv_mutex_unlock(&p->mutex);
 }
+
+unsigned completion_wait_for_a_job(struct completion *p, unsigned completed_jobs)
+{
+    uv_mutex_lock(&p->mutex);
+    while (0 == p->completed && p->completed_jobs <= completed_jobs) {
+        uv_cond_wait(&p->cond, &p->mutex);
+    }
+    completed_jobs = p->completed_jobs;
+    uv_mutex_unlock(&p->mutex);
+
+    return completed_jobs;
+}
+
+void completion_mark_complete_a_job(struct completion *p)
+{
+    uv_mutex_lock(&p->mutex);
+    p->completed_jobs++;
+    uv_cond_broadcast(&p->cond);
+    uv_mutex_unlock(&p->mutex);
+}
+
+bool completion_is_done(struct completion *p)
+{
+    bool ret;
+    uv_mutex_lock(&p->mutex);
+    ret = p->completed;
+    uv_mutex_unlock(&p->mutex);
+    return ret;
+}
diff --git a/libnetdata/completion/completion.h b/libnetdata/completion/completion.h
index 667360a424..723f736889 100644
--- a/libnetdata/completion/completion.h
+++ b/libnetdata/completion/completion.h
@@ -9,6 +9,7 @@ struct completion {
     uv_mutex_t mutex;
     uv_cond_t cond;
     volatile unsigned completed;
+    volatile unsigned completed_jobs;
 };
 
 void completion_init(struct completion *p);
@@ -19,4 +20,8 @@ void completion_wait_for(struct completion *p);
 
 void completion_mark_complete(struct completion *p);
 
+unsigned completion_wait_for_a_job(struct completion *p, unsigned completed_jobs);
+void completion_mark_complete_a_job(struct completion *p);
+bool completion_is_done(struct completion *p);
+
 #endif /* NETDATA_COMPLETION_H */
diff --git a/libnetdata/dictionary/dictionary.c b/libnetdata/dictionary/dictionary.c
index 0277e067f3..045dbbe55b 100644
--- a/libnetdata/dictionary/dictionary.c
+++ b/libnetdata/dictionary/dictionary.c
@@ -1234,11 +1234,29 @@ static inline size_t item_get_name_len(const DICTIONARY_ITEM *item) {
         return strlen(item->caller_name);
 }
 
+static ARAL dict_items_aral = {
+    .filename = NULL,
+    .cache_dir = NULL,
+    .use_mmap = false,
+    .initial_elements = 65536 / sizeof(DICTIONARY_ITEM),
+    .requested_element_size = sizeof(DICTIONARY_ITEM),
+};
+
+static ARAL dict_shared_items_aral = {
+    .filename = NULL,
+    .cache_dir = NULL,
+    .use_mmap = false,
+    .initial_elements = 65536 / sizeof(DICTIONARY_ITEM_SHARED),
+    .requested_element_size = sizeof(DICTIONARY_ITEM_SHARED),
+};
+
 static DICTIONARY_ITEM *dict_item_create(DICTIONARY *dict __maybe_unused, size_t *allocated_bytes, DICTIONARY_ITEM *master_item) {
     DICTIONARY_ITEM *item;
 
     size_t size = sizeof(DICTIONARY_ITEM);
-    item = callocz(1, size);
+//    item = callocz(1, size);
+    item = arrayalloc_mallocz(&dict_items_aral);
+    memset(item, 0, sizeof(DICTIONARY_ITEM));
 
 #ifdef NETDATA_INTERNAL_CHECKS
     item->creator_pid = gettid();
@@ -1257,7 +1275,10 @@ static DICTIONARY_ITEM *dict_item_create(DICTIONARY *dict __maybe_unused, size_t
     }
     else {
         size = sizeof(DICTIONARY_ITEM_SHARED);
-        item->shared = callocz(1, size);
+        // item->shared = callocz(1, size);
+        item->shared = arrayalloc_mallocz(&dict_shared_items_aral);
+        memset(item->shared, 0, sizeof(DICTIONARY_ITEM_SHARED));
+
         item->shared->links = 1;
         *allocated_bytes += size;
     }
@@ -1396,12 +1417,15 @@ static size_t dict_item_free_with_hooks(DICTIONARY *dict, DICTIONARY_ITEM *item)
         }
 
         value_size += item->shared->value_len;
-        freez(item->shared);
+        // freez(item->shared);
+        arrayalloc_freez(&dict_shared_items_aral, item->shared);
         item->shared = NULL;
         item_size += sizeof(DICTIONARY_ITEM_SHARED);
     }
 
-    freez(item);
+    // freez(item);
+    arrayalloc_freez(&dict_items_aral, item);
+
     item_size += sizeof(DICTIONARY_ITEM);
 
     DICTIONARY_STATS_MINUS_MEMORY(dict, key_size, item_size, value_size);
diff --git a/libnetdata/july/Makefile.am b/libnetdata/july/Makefile.am
new file mode 100644
index 0000000000..161784b8f6
--- /dev/null
+++ b/libnetdata/july/Makefile.am
@@ -0,0 +1,8 @@
+# SPDX-License-Identifier: GPL-3.0-or-later
+
+AUTOMAKE_OPTIONS = subdir-objects
+MAINTAINERCLEANFILES = $(srcdir)/Makefile.in
+
+dist_noinst_DATA = \
+    README.md \
+    $(NULL)
diff --git a/libnetdata/july/README.md b/libnetdata/july/README.md
new file mode 100644
index 0000000000..5cf4fb22f4
--- /dev/null
+++ b/libnetdata/july/README.md
@@ -0,0 +1,10 @@
+<!--
+custom_edit_url: https://github.com/netdata/netdata/edit/master/libnetdata/july/README.md
+-->
+
+
+# July
+
+An interface similar to `Judy` that uses minimal allocations (that can be cached)
+for items that are mainly appended (just a few insertions in the middle)
+
diff --git a/libnetdata/july/july.c b/libnetdata/july/july.c
new file mode 100644
index 0000000000..eda2f7ed34
--- /dev/null
+++ b/libnetdata/july/july.c
@@ -0,0 +1,447 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#include "july.h"
+
+#define JULYL_MIN_ENTRIES 10
+
+struct JulyL_item {
+    Word_t index;
+    void *value;
+};
+
+struct JulyL {
+    size_t entries;
+    size_t used;
+
+    // statistics
+    size_t bytes;
+    size_t bytes_moved;
+    size_t reallocs;
+
+    struct {
+        struct JulyL *prev;
+        struct JulyL *next;
+    } cache;
+
+    struct JulyL_item array[];
+};
+
+// ----------------------------------------------------------------------------
+// JulyL cache
+
+static struct {
+    struct {
+        SPINLOCK spinlock;
+        struct JulyL *available_items;
+        size_t available;
+    } protected;
+
+    struct {
+        size_t bytes;
+        size_t allocated;
+        size_t bytes_moved;
+        size_t reallocs;
+    } atomics;
+} julyl_globals = {
+    .protected = {
+        .spinlock = NETDATA_SPINLOCK_INITIALIZER,
+        .available_items = NULL,
+        .available = 0,
+    },
+    .atomics = {
+        .bytes = 0,
+        .allocated = 0,
+        .bytes_moved = 0,
+        .reallocs = 0,
+    },
+};
+
+void julyl_cleanup(void) {
+    netdata_spinlock_lock(&julyl_globals.protected.spinlock);
+
+    while(julyl_globals.protected.available_items && julyl_globals.protected.available > 10) {
+        struct JulyL *item = julyl_globals.protected.available_items;
+        DOUBLE_LINKED_LIST_REMOVE_UNSAFE(julyl_globals.protected.available_items, item, cache.prev, cache.next);
+        size_t bytes = item->bytes;
+        freez(item);
+        julyl_globals.protected.available--;
+        __atomic_sub_fetch(&julyl_globals.atomics.bytes, bytes, __ATOMIC_RELAXED);
+        __atomic_sub_fetch(&julyl_globals.atomics.allocated, 1, __ATOMIC_RELAXED);
+    }
+
+    netdata_spinlock_unlock(&julyl_globals.protected.spinlock);
+}
+
+struct JulyL *julyl_get(void) {
+    struct JulyL *j;
+
+    netdata_spinlock_lock(&julyl_globals.protected.spinlock);
+
+    j = julyl_globals.protected.available_items;
+    if(likely(j)) {
+        DOUBLE_LINKED_LIST_REMOVE_UNSAFE(julyl_globals.protected.available_items, j, cache.prev, cache.next);
+        julyl_globals.protected.available--;
+    }
+
+    netdata_spinlock_unlock(&julyl_globals.protected.spinlock);
+
+    if(unlikely(!j)) {
+        size_t bytes = sizeof(struct JulyL) + JULYL_MIN_ENTRIES * sizeof(struct JulyL_item);
+        j = mallocz(bytes);
+        j->bytes = bytes;
+        j->entries = JULYL_MIN_ENTRIES;
+        __atomic_add_fetch(&julyl_globals.atomics.bytes, bytes, __ATOMIC_RELAXED);
+        __atomic_add_fetch(&julyl_globals.atomics.allocated, 1, __ATOMIC_RELAXED);
+    }
+
+    j->used = 0;
+    j->bytes_moved = 0;
+    j->reallocs = 0;
+    j->cache.next = j->cache.prev = NULL;
+    return j;
+}
+
+static void julyl_release(struct JulyL *j) {
+    if(unlikely(!j)) return;
+
+    __atomic_add_fetch(&julyl_globals.atomics.bytes_moved, j->bytes_moved, __ATOMIC_RELAXED);
+    __atomic_add_fetch(&julyl_globals.atomics.reallocs, j->reallocs, __ATOMIC_RELAXED);
+
+    netdata_spinlock_lock(&julyl_globals.protected.spinlock);
+    DOUBLE_LINKED_LIST_APPEND_UNSAFE(julyl_globals.protected.available_items, j, cache.prev, cache.next);
+    julyl_globals.protected.available++;
+    netdata_spinlock_unlock(&julyl_globals.protected.spinlock);
+}
+
+size_t julyl_cache_size(void) {
+    return __atomic_load_n(&julyl_globals.atomics.bytes, __ATOMIC_RELAXED);
+}
+
+size_t julyl_bytes_moved(void) {
+    return __atomic_load_n(&julyl_globals.atomics.bytes_moved, __ATOMIC_RELAXED);
+}
+
+// ----------------------------------------------------------------------------
+// JulyL
+
+size_t JulyLGet_binary_search_position_of_index(const struct JulyL *July, Word_t Index) {
+    // return the position of the first item >= Index
+
+    size_t left = 0;
+    size_t right = July->used;
+    while(left < right) {
+        size_t middle = (left + right) >> 1;
+
+        if(July->array[middle].index > Index)
+            right = middle;
+
+        else
+            left = middle + 1;
+    }
+
+    internal_fatal(left > July->used, "JULY: invalid position returned");
+
+    if(left > 0 && July->array[left - 1].index == Index)
+        return left - 1;
+
+    internal_fatal( (left < July->used && July->array[left].index < Index) ||
+                    (left > 0 && July->array[left - 1].index >= Index)
+                   , "JULY: wrong item returned");
+
+    return left;
+}
+
+PPvoid_t JulyLGet(Pcvoid_t PArray, Word_t Index, PJError_t PJError __maybe_unused) {
+    const struct JulyL *July = PArray;
+    if(!July)
+        return NULL;
+
+    size_t pos = JulyLGet_binary_search_position_of_index(July, Index);
+
+    if(unlikely(pos >= July->used || July->array[pos].index != Index))
+        return NULL;
+
+    return (PPvoid_t)&July->array[pos].value;
+}
+
+PPvoid_t JulyLIns(PPvoid_t PPArray, Word_t Index, PJError_t PJError __maybe_unused) {
+    struct JulyL *July = *PPArray;
+    if(unlikely(!July)) {
+        July = julyl_get();
+        July->used = 0;
+        *PPArray = July;
+    }
+
+    size_t pos = JulyLGet_binary_search_position_of_index(July, Index);
+
+    if((pos == July->used || July->array[pos].index != Index)) {
+        // we have to add this entry
+
+        if (unlikely(July->used == July->entries)) {
+            // we have to expand the array
+            size_t bytes = sizeof(struct JulyL) + July->entries * 2 * sizeof(struct JulyL_item);
+            __atomic_add_fetch(&julyl_globals.atomics.bytes, bytes - July->bytes, __ATOMIC_RELAXED);
+            July = reallocz(July, bytes);
+            July->bytes = bytes;
+            July->entries *= 2;
+            July->reallocs++;
+            *PPArray = July;
+        }
+
+        if (unlikely(pos != July->used)) {
+            // we have to shift some members to make room
+            size_t size = (July->used - pos) * sizeof(struct JulyL_item);
+            memmove(&July->array[pos + 1], &July->array[pos], size);
+            July->bytes_moved += size;
+        }
+
+        July->used++;
+        July->array[pos].value = NULL;
+        July->array[pos].index = Index;
+    }
+
+    return &July->array[pos].value;
+}
+
+PPvoid_t JulyLFirst(Pcvoid_t PArray, Word_t *Index, PJError_t PJError __maybe_unused) {
+    const struct JulyL *July = PArray;
+    if(!July)
+        return NULL;
+
+    size_t pos = JulyLGet_binary_search_position_of_index(July, *Index);
+    // pos is >= Index
+
+    if(unlikely(pos == July->used))
+        return NULL;
+
+    *Index = July->array[pos].index;
+    return (PPvoid_t)&July->array[pos].value;
+}
+
+PPvoid_t JulyLNext(Pcvoid_t PArray, Word_t *Index, PJError_t PJError __maybe_unused) {
+    const struct JulyL *July = PArray;
+    if(!July)
+        return NULL;
+
+    size_t pos = JulyLGet_binary_search_position_of_index(July, *Index);
+    // pos is >= Index
+
+    if(unlikely(pos == July->used))
+        return NULL;
+
+    if(July->array[pos].index == *Index) {
+        pos++;
+
+        if(unlikely(pos == July->used))
+            return NULL;
+    }
+
+    *Index = July->array[pos].index;
+    return (PPvoid_t)&July->array[pos].value;
+}
+
+PPvoid_t JulyLLast(Pcvoid_t PArray, Word_t *Index, PJError_t PJError __maybe_unused) {
+    const struct JulyL *July = PArray;
+    if(!July)
+        return NULL;
+
+    size_t pos = JulyLGet_binary_search_position_of_index(July, *Index);
+    // pos is >= Index
+
+    if(pos > 0 && (pos == July->used || July->array[pos].index > *Index))
+        pos--;
+
+    if(unlikely(pos == 0 && July->array[0].index > *Index))
+        return NULL;
+
+    *Index = July->array[pos].index;
+    return (PPvoid_t)&July->array[pos].value;
+}
+
+PPvoid_t JulyLPrev(Pcvoid_t PArray, Word_t *Index, PJError_t PJError __maybe_unused) {
+    const struct JulyL *July = PArray;
+    if(!July)
+        return NULL;
+
+    size_t pos = JulyLGet_binary_search_position_of_index(July, *Index);
+    // pos is >= Index
+
+    if(unlikely(pos == 0 || July->used == 0))
+        return NULL;
+
+    // get the previous one
+    pos--;
+
+    *Index = July->array[pos].index;
+    return (PPvoid_t)&July->array[pos].value;
+}
+
+Word_t JulyLFreeArray(PPvoid_t PPArray, PJError_t PJError __maybe_unused) {
+    struct JulyL *July = *PPArray;
+    if(unlikely(!July))
+        return 0;
+
+    size_t bytes = July->bytes;
+    julyl_release(July);
+    *PPArray = NULL;
+    return bytes;
+}
+
+// --------------------------------------------
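Two of the additions above are easier to read with a usage sketch. First, the job-counting completion API from completion.c/completion.h: a dispatcher can sleep until the completed_jobs counter advances, instead of polling or waiting for the single completed flag. The sketch below is hypothetical usage, not code from this commit; dispatch_job() is a placeholder, the include path is assumed, and it relies on the pre-existing completion_init()/completion_destroy() helpers.

// hypothetical usage of the job-counting completion API added above
#include "completion.h"   // assumed path: libnetdata/completion/completion.h

void dispatch_job(struct completion *p, unsigned job);  // placeholder: each worker must
                                                        // end with completion_mark_complete_a_job(p)

void run_jobs(struct completion *p, unsigned total) {
    completion_init(p);

    for (unsigned i = 0; i < total; i++)
        dispatch_job(p, i);

    // block until the counter advances past the last value we saw;
    // the return value is the updated counter
    unsigned seen = 0;
    while (seen < total)
        seen = completion_wait_for_a_job(p, seen);

    completion_destroy(p);
}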
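Second, the July interface: as the new README says, it mimics Judy with minimal, cacheable allocations, and the signatures mirror JudyL one-for-one, so Judy-style loops port unchanged. A hypothetical traversal using only the functions visible in the diff above (PJE0 is Judy's NULL error-pointer macro, so plain NULL works too; the include path is assumed):

#include "july.h"   // assumed path: libnetdata/july/july.h

void julyl_demo(void) {
    Pvoid_t array = NULL;
    Word_t index;
    PPvoid_t slot;

    // JulyLIns() returns a slot (a pointer to the value) for the given index
    for (index = 10; index <= 50; index += 10) {
        slot = JulyLIns(&array, index, PJE0);
        *slot = (void *)(index * 2);        // any pointer-sized payload
    }

    // point lookup: NULL when the index is absent
    slot = JulyLGet(array, 30, PJE0);
    (void)slot;

    // ordered traversal, exactly like JudyLFirst()/JudyLNext()
    index = 0;
    for (slot = JulyLFirst(array, &index, PJE0);
         slot;
         slot = JulyLNext(array, &index, PJE0))
        ;                                    // visit (index, *slot) here

    // hand the (cached) allocation back to the july pool
    JulyLFreeArray(&array, PJE0);
}

Because entries are kept sorted in one contiguous array, JulyLIns() is O(n) for random inserts but nearly free for the mostly-append pattern the README describes, which is what the query pipeline feeds it.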