author Costa Tsaousis <costa@netdata.cloud> 2023-01-10 19:59:21 +0200
committer GitHub <noreply@github.com> 2023-01-10 19:59:21 +0200
commit 368a26cfee6887ca0cb2301d93138f63b75e353a (patch)
tree b57e39fdb78dc57f7a2c1fcc3d9b6bf3c2a2a113 /daemon
parent b513888be389f92b2323d1bb3fdf55c22d4e4bad (diff)
DBENGINE v2 (#14125)
* count open cache pages referring to datafile
* eliminate waste flush attempts
* remove eliminated variable
* journal v2 scanning split functions
* avoid locking open cache for a long time while migrating to journal v2
* dont acquire datafile for the loop; disable thread cancelability while a query is running
* work on datafile acquiring
* work on datafile deletion
* work on datafile deletion again
* logs of dbengine should start with DBENGINE
* thread specific key for queries to check if a query finishes without a finalize
* page_uuid is not used anymore
* Cleanup judy traversal when building new v2 Remove not needed calls to metric registry
* metric is 8 bytes smaller; timestamps are protected with a spinlock; timestamps in metric are now always coherent
* disable checks for invalid time-ranges
* Remove type from page details
* report scanning time
* remove infinite loop from datafile acquire for deletion
* remove infinite loop from datafile acquire for deletion again
* trace query handles
* properly allocate array of dimensions in replication
* metrics cleanup
* metrics registry uses arrayalloc
* arrayalloc free should be protected by lock
* use array alloc in page cache
* journal v2 scanning fix
* datafile reference leaking hunting
* do not load metrics of future timestamps
* initialize reasons
* fix datafile reference leak
* do not load pages that are entirely overlapped by others
* expand metric retention atomically
* split replication logic in initialization and execution
* replication prepare ahead queries
* replication prepare ahead queries fixed
* fix replication workers accounting
* add router active queries chart
* restore accounting of pages metadata sources; cleanup replication
* dont count skipped pages as unroutable
* notes on services shutdown
* do not migrate to journal v2 too early, while it has pending dirty pages in the main cache for the specific journal file
* do not add pages we dont need to pdc
* time in range re-work to provide info about past and future matches
* finer control on the pages selected for processing; accounting of page related issues
* fix invalid reference to handle->page
* eliminate data collection handle of pg_lookup_next
* accounting for queries with gaps
* query preprocessing the same way the processing is done; cache now supports all operations on Judy
* dynamic libuv workers based on number of processors; minimum libuv workers 8; replication query init ahead uses libuv workers - reserved ones (3)
* get into pdc all matching pages from main cache and open cache; do not do v2 scan if main cache and open cache can satisfy the query
* finer gaps calculation; accounting of overlapping pages in queries
* fix gaps accounting
* move datafile deletion to worker thread
* tune libuv workers and thread stack size
* stop netdata threads gradually
* run indexing together with cache flush/evict
* more work on clean shutdown
* limit the number of pages to evict per run
* do not lock the clean queue for accesses if it is not possible at that time - the page will be moved to the back of the list during eviction
* economies on flags for smaller page footprint; cleanup and renames
* eviction moves referenced pages to the end of the queue
* use murmur hash for indexing partition
* murmur should be static
* use more indexing partitions
* revert number of partitions to number of cpus
* cancel threads first, then stop services
* revert default thread stack size
* dont execute replication requests of disconnected senders
* wait more time for services that are exiting gradually
* fixed last commit
* finer control on page selection algorithm
* default stacksize of 1MB
* fix formatting
* fix worker utilization going crazy when the number is rotating
* avoid buffer full due to replication preprocessing of requests
* support query priorities
* add count of spins in spinlock when compiled with netdata internal checks
* remove prioritization from dbengine queries; cache now uses mutexes for the queues
* hot pages are now in sections judy arrays, like dirty
* align replication queries to optimal page size
* during flushing add to clean and evict in batches
* Revert "during flushing add to clean and evict in batches" This reverts commit 8fb2b69d068499eacea6de8291c336e5e9f197c7.
* dont lock clean while evicting pages during flushing
* Revert "dont lock clean while evicting pages during flushing" This reverts commit d6c82b5f40aeba86fc7aead062fab1b819ba58b3.
* Revert "Revert "during flushing add to clean and evict in batches"" This reverts commit ca7a187537fb8f743992700427e13042561211ec.
* dont cross locks during flushing, for the fastest flushes possible
* low-priority queries load pages synchronously
* Revert "low-priority queries load pages synchronously" This reverts commit 1ef2662ddcd20fe5842b856c716df134c42d1dc7.
* cache uses spinlock again
* during flushing, dont lock the clean queue at all; each item is added atomically
* do smaller eviction runs
* evict one page at a time to minimize lock contention on the clean queue
* fix eviction statistics
* fix last commit
* plain should be main cache
* event loop cleanup; evictions and flushes can now happen concurrently
* run flush and evictions from tier0 only
* remove not needed variables
* flushing open cache is not needed; flushing protection is irrelevant since flushing is global for all tiers; added protection to datafiles so that only one flusher can run per datafile at any given time
* added worker jobs in timer to find the slow part of it
* support fast eviction of pages when all_of_them is set
* revert default thread stack size
* bypass event loop for dispatching read extent commands to workers - send them directly
* Revert "bypass event loop for dispatching read extent commands to workers - send them directly" This reverts commit 2c08bc5bab12881ae33bc73ce5dea03dfc4e1fce.
* cache work requests
* minimize memory operations during flushing; caching of extent_io_descriptors and page_descriptors
* publish flushed pages to open cache in the thread pool
* prevent eventloop requests from getting stacked in the event loop
* single threaded dbengine controller; support priorities for all queries; major cleanup and restructuring of rrdengine.c
* more rrdengine.c cleanup
* enable db rotation
* do not log when there is a filter
* do not run multiple migration to journal v2
* load all extents async
* fix wrong paste
* report opcodes waiting, works dispatched, works executing
* cleanup event loop memory every 10 minutes
* dont dispatch more work requests than the number of threads available
* use the dispatched counter instead of the executing counter to check if the worker thread pool is full
* remove UV_RUN_NOWAIT
* replication to fill the queues
* caching of extent buffers; code cleanup
* caching of pdc and pd; rework on journal v2 indexing, datafile creation, database rotation
* single transaction wal
* synchronous flushing
* first cancel the threads, then signal them to exit
* caching of rrdeng query handles; added priority to query target; health is now low prio
* add priority to the missing points; do not allow critical priority in queries
* offload query preparation and routing to libuv thread pool
* updated timing charts for the offloaded query preparation
* caching of WALs
* accounting for struct caches (buffers); do not load extents with invalid sizes
* protection against memory booming during replication due to the optimal alignment of pages; sender thread buffer is now also reset when the circular buffer is reset
* also check if the expanded before is not the chart later updated time
* also check if the expanded before is not after the wall clock time of when the query started
* Remove unused variable
* replication to queue less queries; cleanup of internal fatals
* Mark dimension to be updated async
* caching of extent_page_details_list (epdl) and datafile_extent_offset_list (deol)
* disable pgc stress test, under an ifdef
* disable mrg stress test under an ifdef
* Mark chart and host labels, host info for async check and store in the database
* dictionary items use arrayalloc
* cache section pages structure is allocated with arrayalloc
* Add function to wakeup the aclk query threads and check for exit Register function to be called during shutdown after signaling the service to exit
* parallel preparation of all dimensions of queries
* be more sensitive to enable streaming after replication
* atomically finish chart replication
* fix last commit
* fix last commit again
* fix last commit again again
* fix last commit again again again
* unify the normalization of retention calculation for collected charts; do not enable streaming if more than 60 points are to be transferred; eliminate an allocation during replication
* do not cancel start streaming; use high priority queries when we have locked chart data collection
* prevent starvation on opcodes execution, by allowing 2% of the requests to be re-ordered
* opcode now uses 2 spinlocks one for the caching of allocations and one for the waiting queue
* Remove check locks and NETDATA_VERIFY_LOCKS as it is not needed anymore
* Fix bad memory allocation / cleanup
* Cleanup ACLK sync initialization (part 1)
* Don't update metric registry during shutdown (part 1)
* Prevent crash when dashboard is refreshed and host goes away
* Mark ctx that is shutting down. Test not adding flushed pages to open cache as hot if we are shutting down
* make ML work
* Fix compile without NETDATA_INTERNAL_CHECKS
* shutdown each ctx independently
* fix completion of quiesce
* do not update shared ML charts
* Create ML charts on child hosts. When a parent runs a ML for a child, the relevant-ML charts should be created on the child host. These charts should use the parent's hostname to differentiate multiple parents that might run ML for a child. The only exception to this rule is the training/prediction resource usage charts. These are created on the localhost of the parent host, because they provide information specific to said host.
* check new ml code
* first save the database, then free all memory
* dbengine prep exit before freeing all memory; fixed deadlock in cache hot to dirty; added missing check to query engine about metrics without any data in the db
* Cleanup metadata thread (part 2)
* increase refcount before dispatching prep command
* Do not try to stop anomaly detection threads twice. A separate function call has been added to stop anomaly detection threads. This commit removes the left over function calls that were made internally when a host was being created/destroyed.
* Remove allocations when smoothing samples buffer The number of dims per sample is always 1, ie. we are training and predicting only individual dimensions.
* set the orphan flag when loading archived hosts
* track worker dispatch callbacks and threadpool worker init
* make ML threads joinable; mark ctx having flushing in progress as early as possible
* fix allocation counter
* Cleanup metadata thread (part 3)
* Cleanup metadata thread (part 4)
* Skip metadata host scan when running unittest
* unittest support during init
* dont use all the libuv threads for queries
* break an infinite loop when sleep_usec() is interrupted
* ml prediction is a collector for several charts
* sleep_usec() now makes sure it will never loop if it passes the time expected; sleep_usec() now uses nanosleep() because clock_nanosleep() misses signals on netdata exit
* worker_unregister() in netdata threads cleanup
* moved pdc/epdl/deol/extent_buffer related code to pdc.c and pdc.h
* fixed ML issues
* removed engine2 directory
* added dbengine2 files in CMakeLists.txt
* move query plan data to query target, so that they can be exposed by in jsonwrap
* uniform definition of query plan according to the other query target members
* event_loop should be in daemon, not libnetdata
* metric_retention_by_uuid() is now part of the storage engine abstraction
* unify time_t variables to have the suffix _s (meaning: seconds)
* old dbengine statistics become "dbengine io"
* do not enable ML resource usage charts by default
* unify ml chart families, plugins and modules
* cleanup query plans from query target
* cleanup all extent buffers
* added debug info for rrddim slot to time
* rrddim now does proper gap management
* full rewrite of the mem modes
* use library functions for madvise
* use CHECKSUM_SZ for the checksum size
* fix coverity warning about the impossible case of returning a page that is entirely in the past of the query
* fix dbengine shutdown
* keep the old datafile lock until a new datafile has been created, to avoid creating multiple datafiles concurrently
* fine tune cache evictions
* dont initialize health if the health service is not running - prevent crash on shutdown while children get connected
* rename AS threads to ACLK[hostname]
* prevent re-use of uninitialized memory in queries
* use JulyL instead of JudyL for PDC operations - to test it first
* add also JulyL files
* fix July memory accounting
* disable July for PDC (use Judy)
* use the function to remove datafiles from linked list
* fix july and event_loop
* add july to libnetdata subdirs
* rename time_t variables that end in _t to end in _s
* replicate when there is a gap at the beginning of the replication period
* reset postponing of sender connections when a receiver is connected
* Adjust update every properly
* fix replication infinite loop due to last change
* packed enums in rrd.h and cleanup of obsolete rrd structure members
* prevent deadlock in replication: replication_recalculate_buffer_used_ratio_unsafe() deadlocking with replication_sender_delete_pending_requests()
* void unused variable
* void unused variables
* fix indentation
* entries_by_time calculation in VD was wrong; restored internal checks for checking future timestamps
* macros to calculate page entries by time and size
* prevent statsd cleanup crash on exit
* cleanup health thread related variables

Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
Co-authored-by: vkalintiris <vasilis@netdata.cloud>
Diffstat (limited to 'daemon')
-rw-r--r-- daemon/analytics.c 12
-rw-r--r-- daemon/commands.c 6
-rw-r--r-- daemon/common.h 1
-rw-r--r-- daemon/event_loop.c 49
-rw-r--r-- daemon/event_loop.h 44
-rw-r--r-- daemon/global_statistics.c 1492
-rw-r--r-- daemon/main.c 387
-rw-r--r-- daemon/main.h 30
-rw-r--r-- daemon/service.c 32
-rw-r--r-- daemon/static_threads.c 12
-rw-r--r-- daemon/unit_test.c 25
11 files changed, 1774 insertions, 316 deletions
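
Note on the sleep_usec() items in the commit message: the message says sleep_usec() now uses nanosleep() and must never loop once the expected time has passed. The actual function is not part of this diff, so the following is only a minimal sketch of that idea under stated assumptions. The names sketch_now_usec() and sketch_sleep_usec() are hypothetical, and the choice of CLOCK_MONOTONIC for the deadline is an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

// Hypothetical helper: monotonic clock in microseconds.
static uint64_t sketch_now_usec(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000ULL + (uint64_t)ts.tv_nsec / 1000ULL;
}

// Hypothetical sketch, not netdata's sleep_usec().
// Sleeps for about `usec` microseconds and returns how many nanosleep() calls were made.
static int sketch_sleep_usec(uint64_t usec) {
    uint64_t deadline = sketch_now_usec() + usec;
    struct timespec req = {
        .tv_sec  = (time_t)(usec / 1000000ULL),
        .tv_nsec = (long)((usec % 1000000ULL) * 1000ULL),
    };
    struct timespec rem;
    int calls = 0;

    assert(req.tv_nsec < 1000000000L);

    for (;;) {
        calls++;
        if (nanosleep(&req, &rem) == 0) break;      // slept the full interval
        if (errno != EINTR) break;                  // unexpected error: give up instead of spinning
        if (sketch_now_usec() >= deadline) break;   // interrupted, but the expected time has passed
        req = rem;                                  // interrupted early: sleep only the remainder
    }
    return calls;
}
```

The deadline check is what bounds the loop: a signal that keeps interrupting the sleep can cause at most a few extra iterations before the function returns.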
diff --git a/daemon/analytics.c b/daemon/analytics.c
index 3d0e514d66..dd7eeebbd7 100644
--- a/daemon/analytics.c
+++ b/daemon/analytics.c
@@ -223,9 +223,7 @@ void analytics_mirrored_hosts(void)
if (rrdhost_flag_check(host, RRDHOST_FLAG_ARCHIVED))
continue;
- netdata_mutex_lock(&host->receiver_lock);
- ((host->receiver || host == localhost) ? reachable++ : unreachable++);
- netdata_mutex_unlock(&host->receiver_lock);
+ ((host == localhost || !rrdhost_flag_check(host, RRDHOST_FLAG_ORPHAN)) ? reachable++ : unreachable++);
count++;
}
@@ -554,7 +552,7 @@ void analytics_gather_mutable_meta_data(void)
snprintfz(b, 6, "%d", analytics_data.dashboard_hits);
analytics_set_data(&analytics_data.netdata_dashboard_used, b);
- snprintfz(b, 6, "%zu", rrd_hosts_available);
+ snprintfz(b, 6, "%zu", rrdhost_hosts_available());
analytics_set_data(&analytics_data.netdata_config_hosts_available, b);
}
}
@@ -587,12 +585,12 @@ void *analytics_main(void *ptr)
debug(D_ANALYTICS, "Analytics thread starts");
//first delay after agent start
- while (!netdata_exit && likely(sec <= ANALYTICS_INIT_SLEEP_SEC)) {
+ while (service_running(SERVICE_ANALYTICS) && likely(sec <= ANALYTICS_INIT_SLEEP_SEC)) {
heartbeat_next(&hb, step_ut);
sec++;
}
- if (unlikely(netdata_exit))
+ if (unlikely(!service_running(SERVICE_ANALYTICS)))
goto cleanup;
analytics_gather_immutable_meta_data();
@@ -605,7 +603,7 @@ void *analytics_main(void *ptr)
heartbeat_next(&hb, step_ut * 2);
sec += 2;
- if (unlikely(netdata_exit))
+ if (unlikely(!service_running(SERVICE_ANALYTICS)))
break;
if (likely(sec < ANALYTICS_HEARTBEAT))
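
Note on the analytics.c hunks above: the global netdata_exit test is replaced by service_running(SERVICE_ANALYTICS), so each service can be asked to stop on its own. The definition of service_running() is not shown in this part of the diff. The following is a minimal sketch of one way such a check could work, assuming a bitmask of exit requests; every sketch_ name and both service values are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

// Hypothetical service identifiers; the real SERVICE_* values are not shown in this diff.
typedef enum {
    SKETCH_SERVICE_ANALYTICS = 1u << 0,
    SKETCH_SERVICE_HEALTH    = 1u << 1,
} sketch_service_t;

// A set bit means the corresponding service has been asked to exit.
static _Atomic uint32_t sketch_exit_requested = 0;

// Ask one or more services to stop.
static void sketch_service_signal_exit(uint32_t services) {
    atomic_fetch_or(&sketch_exit_requested, services);
}

// True while the given service has not been asked to exit.
static bool sketch_service_running(sketch_service_t service) {
    return (atomic_load(&sketch_exit_requested) & (uint32_t)service) == 0;
}

// Same shape as the initial delay loop in analytics_main().
// `stop_at` is only a hook for the demonstration: it requests the exit
// from inside the loop once `sec` reaches that value.
static int sketch_wait_loop(int max_iterations, int stop_at) {
    assert(max_iterations >= 0);
    int sec = 0;
    while (sketch_service_running(SKETCH_SERVICE_ANALYTICS) && sec < max_iterations) {
        if (sec == stop_at)
            sketch_service_signal_exit(SKETCH_SERVICE_ANALYTICS);
        sec++;
    }
    return sec;
}
```

With a per-service bit, stopping analytics leaves the other services running, which a single global exit flag cannot express.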
diff --git a/daemon/commands.c b/daemon/commands.c
index fae88dabf9..8f09e63e11 100644
--- a/daemon/commands.c
+++ b/daemon/commands.c
@@ -470,9 +470,13 @@ static void after_schedule_command(uv_work_t *req, int status)
static void schedule_command(uv_work_t *req)
{
- struct command_context *cmd_ctx = req->data;
+ register_libuv_worker_jobs();
+ worker_is_busy(UV_EVENT_SCHEDULE_CMD);
+ struct command_context *cmd_ctx = req->data;
cmd_ctx->status = execute_command(cmd_ctx->idx, cmd_ctx->args, &cmd_ctx->message);
+
+ worker_is_idle();
}
/* This will alter the state of the command_info_array.cmd_str
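
Note on the schedule_command() hunk above: the work callback now brackets its body with worker_is_busy(UV_EVENT_SCHEDULE_CMD) and worker_is_idle(). The worker accounting code itself is outside this diff, so the following is only a rough sketch of the bracket pattern, assuming per-thread state and a simple per-job counter. All sketch_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_JOBS 8
#define SKETCH_IDLE ((size_t)-1)

// Hypothetical per-thread worker state.
struct sketch_worker {
    size_t current_job;                    // SKETCH_IDLE when no job is running
    size_t jobs_started[SKETCH_MAX_JOBS];  // how many times each job id was entered
};

static _Thread_local struct sketch_worker sketch_self = { .current_job = SKETCH_IDLE };

// Mark this thread as running the given job.
static void sketch_worker_is_busy(size_t job_id) {
    assert(job_id < SKETCH_MAX_JOBS);
    sketch_self.current_job = job_id;
    sketch_self.jobs_started[job_id]++;
}

// Mark this thread as idle again.
static void sketch_worker_is_idle(void) {
    sketch_self.current_job = SKETCH_IDLE;
}

// Placeholder command used by the demonstration; always succeeds.
static int sketch_noop_command(void) {
    return 0;
}

// Shape of schedule_command() after the change: mark busy, do the work, mark idle.
static int sketch_run_command(size_t job_id, int (*command)(void)) {
    sketch_worker_is_busy(job_id);
    int status = command();
    sketch_worker_is_idle();
    return status;
}
```

Keeping the state thread-local means the bracket needs no locking, at the cost that some other thread has to read these structures to publish the statistics.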
diff --git a/daemon/common.h b/daemon/common.h
index f3d868661e..8a775fb837 100644
--- a/daemon/common.h
+++ b/daemon/common.h
@@ -4,6 +4,7 @@
#define NETDATA_COMMON_H 1
#include "libnetdata/libnetdata.h"
+#include "event_loop.h"
// ----------------------------------------------------------------------------
// shortcuts for the default netdata configuration
diff --git a/daemon/event_loop.c b/daemon/event_loop.c
new file mode 100644
index 0000000000..13d3a5822c
--- /dev/null
+++ b/daemon/event_loop.c
@@ -0,0 +1,49 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#include <daemon/main.h>
+#include "event_loop.h"
+
+// Register workers
+void register_libuv_worker_jobs() {
+ static __thread bool registered = false;
+
+ if(likely(registered))
+ return;
+
+ registered = true;
+
+ worker_register("LIBUV");
+ worker_register_job_name(UV_EVENT_READ_PAGE_CB, "read page cb");
+ worker_register_job_name(UV_EVENT_READ_EXTENT_CB, "read extent cb");
+ worker_register_job_name(UV_EVENT_COMMIT_PAGE_CB, "commit cb");
+ worker_register_job_name(UV_EVENT_FLUSH_PAGES_CB, "flush cb");
+ worker_register_job_name(UV_EVENT_PAGE_LOOKUP, "page lookup");
+ worker_register_job_name(UV_EVENT_METRIC_LOOKUP, "metric lookup");
+ worker_register_job_name(UV_EVENT_PAGE_POPULATION, "populate page");
+ worker_register_job_name(UV_EVENT_EXT_DECOMPRESSION, "extent decompression");
+ worker_register_job_name(UV_EVENT_READ_MMAP_EXTENT, "read extent (mmap)");
+ worker_register_job_name(UV_EVENT_EXTENT_PROCESSING, "extent processing");
+ worker_register_job_name(UV_EVENT_METADATA_STORE, "store host metadata");
+ worker_register_job_name(UV_EVENT_JOURNAL_INDEX_WAIT, "journal v2 wait");
+ worker_register_job_name(UV_EVENT_JOURNAL_INDEX, "journal v2 indexing");
+ worker_register_job_name(UV_EVENT_SCHEDULE_CMD, "schedule command");
+ worker_register_job_name(UV_EVENT_METADATA_CLEANUP, "metadata cleanup");
+ worker_register_job_name(UV_EVENT_EXTENT_CACHE, "extent cache");
+ worker_register_job_name(UV_EVENT_EXTENT_MMAP, "extent mmap");
+ worker_register_job_name(UV_EVENT_PAGE_DISPATCH, "dispatch page list");
+ worker_register_job_name(UV_EVENT_FLUSH_CALLBACK, "flush callback");
+ worker_register_job_name(UV_EVENT_FLUSH_MAIN, "flush main");
+ worker_register_job_name(UV_EVENT_FLUSH_OPEN, "flush open");
+ worker_register_job_name(UV_EVENT_EVICT_MAIN, "evict main");
+ worker_register_job_name(UV_EVENT_DELETING_FILE, "delete datafiles");
+ worker_register_job_name(UV_EVENT_ANALYZE_V2, "analyze journalfile");
+ worker_register_job_name(UV_EVENT_RETENTION_V2, "calculate retention");
+ worker_register_job_name(UV_EVENT_RETENTION_UPDATE, "update retention");
+ worker_register_job_name(UV_EVENT_DATAFILE_ACQUIRE, "datafile acquire");
+ worker_register_job_name(UV_EVENT_DATAFILE_DELETE, "datafile deletion");
+ worker_register_job_name(UV_EVENT_FLUSHED_TO_OPEN, "flushed to open");
+ worker_register_job_name(UV_EVENT_PREP_QUERY, "prep query");
+ worker_register_job_name(UV_EVENT_WORKER_INIT, "worker init");
+
+ uv_thread_set_name_np(pthread_self(), "LIBUV_WORKER");
+}
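
Note on register_libuv_worker_jobs() above: the function uses a `static __thread bool registered` flag so that every thread pool callback can call it, while the registration work runs only once per thread. The following is a minimal sketch of that guard in isolation. It uses the C11 spelling _Thread_local instead of the GCC __thread keyword, and the sketch_ names are hypothetical stand-ins for worker_register() and the worker_register_job_name() calls.

```c
#include <assert.h>
#include <stdbool.h>

// Counts how many times the registration body ran on the current thread.
static _Thread_local int sketch_registrations = 0;

// Hypothetical stand-in for the real registration calls.
static void sketch_do_register(void) {
    assert(sketch_registrations == 0);  // must run at most once per thread
    sketch_registrations++;
}

// Safe to call at the top of every work callback:
// only the first call on each thread does the registration.
static void sketch_register_once_per_thread(void) {
    static _Thread_local bool registered = false;

    if (registered)
        return;

    registered = true;
    sketch_do_register();
}
```

Because the flag is thread-local, no lock or atomic is needed, and each thread that runs a callback registers itself independently the first time it is used.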
diff --git a/daemon/event_loop.h b/daemon/event_loop.h
new file mode 100644
index 0000000000..e332c253cd
--- /dev/null
+++ b/daemon/event_loop.h
@@ -0,0 +1,44 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#ifndef NETDATA_EVENT_LOOP_H
+#define NETDATA_EVENT_LOOP_H
+
+enum event_loop_job {
+ UV_EVENT_JOB_NONE = 0,
+ UV_EVENT_READ_PAGE_CB,
+ UV_EVENT_READ_EXTENT_CB,
+ UV_EVENT_COMMIT_PAGE_CB,
+ UV_EVENT_FLUSH_PAGES_CB,
+ UV_EVENT_EXT_DECOMPRESSION,
+ UV_EVENT_PAGE_LOOKUP,
+ UV_EVENT_METRIC_LOOKUP,
+ UV_EVENT_PAGE_POPULATION,
+ UV_EVENT_READ_MMAP_EXTENT,
+ UV_EVENT_EXTENT_PROCESSING,
+ UV_EVENT_METADATA_STORE,
+ UV_EVENT_JOURNAL_INDEX_WAIT,
+ UV_EVENT_JOURNAL_INDEX,
+ UV_EVENT_SCHEDULE_CMD,
+ UV_EVENT_METADATA_CLEANUP,
+ UV_EVENT_EXTENT_CACHE,
+ UV_EVENT_EXTENT_MMAP,
+ UV_EVENT_FLUSH_CALLBACK,
+ UV_EVENT_EXTEXT_DISPATCH,
+ UV_EVENT_FLUSH_MAIN,
+ UV_EVENT_FLUSH_OPEN,
+ UV_EVENT_EVICT_MAIN,
+ UV_EVENT_PAGE_DISPATCH,
+ UV_EVENT_DELETING_FILE,
+ UV_EVENT_ANALYZE_V2,
+ UV_EVENT_RETENTION_V2,
+ UV_EVENT_RETENTION_UPDATE,
+ UV_EVENT_DATAFILE_ACQUIRE,
+ UV_EVENT_DATAFILE_DELETE,
+ UV_EVENT_FLUSHED_TO_OPEN,
+ UV_EVENT_PREP_QUERY,
+ UV_EVENT_WORKER_INIT,
+};
+
+void register_libuv_worker_jobs();
+
+#endif //NETDATA_EVENT_LOOP_H
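
Note on the two new files above: enum event_loop_job in event_loop.h and the worker_register_job_name() calls in event_loop.c are maintained as separate lists. As shown, every enum member except UV_EVENT_JOB_NONE and UV_EVENT_EXTEXT_DISPATCH has a name registered. The following sketch shows an alternative technique, an X-macro, that generates both lists from one table. It is not what this commit does, and the job ids and names in it are hypothetical.

```c
#include <assert.h>
#include <string.h>

// Single source of truth: one row per job (hypothetical ids and names).
#define SKETCH_JOBS(X)                        \
    X(SKETCH_JOB_NONE,        "none")         \
    X(SKETCH_JOB_READ_EXTENT, "read extent")  \
    X(SKETCH_JOB_FLUSH_MAIN,  "flush main")   \
    X(SKETCH_JOB_EVICT_MAIN,  "evict main")

// Expand the table into the enum.
enum sketch_job {
#define X(id, name) id,
    SKETCH_JOBS(X)
#undef X
    SKETCH_JOB_COUNT
};

// Expand the same table into the name array, indexed by enum value.
static const char *const sketch_job_names[SKETCH_JOB_COUNT] = {
#define X(id, name) [id] = name,
    SKETCH_JOBS(X)
#undef X
};

// Look up the display name of a job id.
static const char *sketch_job_name(enum sketch_job job) {
    assert(job < SKETCH_JOB_COUNT);
    return sketch_job_names[job];
}
```

With this layout, adding a job means adding one row, so an id cannot be left without a name. The trade-off is that the enum members no longer appear literally in the header, which makes them harder to find with a plain text search.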
diff --git a/daemon/global_statistics.c b/daemon/global_statistics.c
index 49d7086956..15f3a5275f 100644
--- a/daemon/global_statistics.c
+++ b/daemon/global_statistics.c
@@ -669,15 +669,15 @@ static void global_statistics_charts(void) {
"netdata" // type
, "ml_models_consulted" // id
, NULL // name
- , "ml" // family
+ , NETDATA_ML_CHART_FAMILY // family
, NULL // context
, "KMeans models used for prediction" // title
, "models" // units
- , "netdata" // plugin
- , "ml" // module
- , 131004 // priority
+ , NETDATA_ML_PLUGIN // plugin
+ , NETDATA_ML_MODULE_DETECTION // module
+ , NETDATA_ML_CHART_PRIO_MACHINE_LEARNING_STATUS // priority
, localhost->rrd_update_every // update_every
- , RRDSET_TYPE_STACKED // chart_type
+ , RRDSET_TYPE_AREA // chart_type
);
rd = rrddim_add(st, "num_models_consulted", NULL, 1, 1, RRD_ALGORITHM_INCREMENTAL);
@@ -996,9 +996,1119 @@ static void sqlite3_statistics_charts(void) {
// ----------------------------------------------------------------
}
-static void dbengine_statistics_charts(void) {
-#ifdef ENABLE_DBENGINE
+struct dbengine2_cache_pointers {
+ RRDSET *st_cache_hit_ratio;
+ RRDDIM *rd_hit_ratio_closest;
+ RRDDIM *rd_hit_ratio_exact;
+
+ RRDSET *st_operations;
+ RRDDIM *rd_searches_closest;
+ RRDDIM *rd_searches_exact;
+ RRDDIM *rd_add_hot;
+ RRDDIM *rd_add_clean;
+ RRDDIM *rd_evictions;
+ RRDDIM *rd_flushes;
+ RRDDIM *rd_acquires;
+ RRDDIM *rd_releases;
+ RRDDIM *rd_acquires_for_deletion;
+
+ RRDSET *st_pgc_memory;
+ RRDDIM *rd_pgc_memory_free;
+ RRDDIM *rd_pgc_memory_clean;
+ RRDDIM *rd_pgc_memory_hot;
+ RRDDIM *rd_pgc_memory_dirty;
+ RRDDIM *rd_pgc_memory_index;
+ RRDDIM *rd_pgc_memory_evicting;
+ RRDDIM *rd_pgc_memory_flushing;
+
+ RRDSET *st_pgc_pages;
+ RRDDIM *rd_pgc_pages_clean;
+ RRDDIM *rd_pgc_pages_hot;
+ RRDDIM *rd_pgc_pages_dirty;
+ RRDDIM *rd_pgc_pages_referenced;
+
+ RRDSET *st_pgc_memory_changes;
+ RRDDIM *rd_pgc_memory_new_hot;
+ RRDDIM *rd_pgc_memory_new_clean;
+ RRDDIM *rd_pgc_memory_clean_evictions;
+
+ RRDSET *st_pgc_memory_migrations;
+ RRDDIM *rd_pgc_memory_hot_to_dirty;
+ RRDDIM *rd_pgc_memory_dirty_to_clean;
+
+ RRDSET *st_pgc_workers;
+ RRDDIM *rd_pgc_workers_evictors;
+ RRDDIM *rd_pgc_workers_flushers;
+ RRDDIM *rd_pgc_workers_adders;
+ RRDDIM *rd_pgc_workers_searchers;
+ RRDDIM *rd_pgc_workers_jv2_flushers;
+ RRDDIM *rd_pgc_workers_hot2dirty;
+
+ RRDSET *st_pgc_memory_events;
+ RRDDIM *rd_pgc_memory_evictions_critical;
+ RRDDIM *rd_pgc_memory_evictions_aggressive;
+ RRDDIM *rd_pgc_memory_flushes_critical;
+
+ RRDSET *st_pgc_waste;
+ RRDDIM *rd_pgc_waste_evictions_skipped;
+ RRDDIM *rd_pgc_waste_flushes_cancelled;
+ RRDDIM *rd_pgc_waste_insert_spins;
+ RRDDIM *rd_pgc_waste_evict_spins;
+ RRDDIM *rd_pgc_waste_release_spins;
+ RRDDIM *rd_pgc_waste_acquire_spins;
+ RRDDIM *rd_pgc_waste_delete_spins;
+ RRDDIM *rd_pgc_waste_flush_spins;
+
+};
+
+static void dbengine2_cache_statistics_charts(struct dbengine2_cache_pointers *ptrs, struct pgc_statistics *pgc_stats, struct pgc_statistics *pgc_stats_old __maybe_unused, const char *name, int priority) {
+
+ {
+ if (unlikely(!ptrs->st_cache_hit_ratio)) {
+ BUFFER *id = buffer_create(100);
+ buffer_sprintf(id, "dbengine_%s_cache_hit_ratio", name);
+
+ BUFFER *family = buffer_create(100);
+ buffer_sprintf(family, "dbengine %s cache", name);
+
+ BUFFER *title = buffer_create(100);
+ buffer_sprintf(title, "Netdata %s Cache Hit Ratio", name);
+
+ ptrs->st_cache_hit_ratio = rrdset_create_localhost(
+ "netdata",
+ buffer_tostring(id),
+ NULL,
+ buffer_tostring(family),
+ NULL,
+ buffer_tostring(title),
+ "%",
+ "netdata",
+ "stats",
+ priority,
+ localhost->rrd_update_every,
+ RRDSET_TYPE_LINE);
+
+ ptrs->rd_hit_ratio_closest = rrddim_add(ptrs->st_cache_hit_ratio, "closest", NULL, 1, 10000, RRD_ALGORITHM_ABSOLUTE);
+ ptrs->rd_hit_ratio_exact = rrddim_add(ptrs->st_cache_hit_ratio, "exact", NULL, 1, 10000, RRD_ALGORITHM_ABSOLUTE);
+
+ buffer_free(id);
+ buffer_free(family);
+ buffer_free(title);
+ priority++;
+ }
+
+ size_t closest_percent = 100 * 10000;
+ if(pgc_stats->searches_closest > pgc_stats_old->searches_closest)
+ closest_percent = (pgc_stats->searches_closest_hits - pgc_stats_old->searches_closest_hits) * 100 * 10000 / (pgc_stats->searches_closest - pgc_stats_old->searches_closest);
+
+ size_t exact_percent = 100 * 10000;
+ if(pgc_stats->searches_exact > pgc_stats_old->searches_exact)
+ exact_percent = (pgc_stats->searches_exact_hits - pgc_stats_old->searches_exact_hits) * 100 * 10000 / (pgc_stats->searches_exact - pgc_stats_old->searches_exact);
+
+ rrddim_set_by_pointer(ptrs->st_cache_hit_ratio, ptrs->rd_hit_ratio_closest, (collected_number)closest_percent);
+ rrddim_set_by_pointer(ptrs->st_cache_hit_ratio, ptrs->rd_hit_ratio_exact, (collected_number)exact_percent);
+
+ rrdset_done(ptrs->st_cache_hit_ratio);
+ }
+
+ {
+ if (unlikely(!ptrs->st_operations)) {
+ BUFFER *id = buffer_create(100);
+ buffer_sprintf(id, "dbengine_%s_cache_operations", name);
+
+ BUFFER *family = buffer_create(100);
+ buffer_sprintf(family, "dbengine %s cache", name);
+
+ BUFFER *title = buffer_create(100);
+ buffer_sprintf(title, "Netdata %s Cache Operations", name);
+
+ ptrs->st_operations = rrdset_create_localhost(
+ "netdata",
+ buffer_tostring(id),
+ NULL,
+ buffer_tostring(family),
+ NULL,
+ buffer_tostring(title),
+ "ops/s",
+ "netdata",
+ "stats",
+ priority,
+ localhost->rrd_update_every,
+ RRDSET_TYPE_LINE);
+
+ ptrs->rd_searches_closest = rrddim_add(ptrs->st_operations, "search closest", NULL, 1, 1, RRD_ALGORITHM_INCREMENTAL);
+ ptrs->rd_searches_exact = rrddim_add(ptrs->st_operations, "search exact", NULL, 1, 1, RRD_ALGORITHM_INCREMENTAL);
+ ptrs->rd_add_hot = rrddim_add(ptrs->st_operations, "add hot", NULL, 1, 1, RRD_ALGORITHM_INCREMENTAL);
+ ptrs->rd_add_clean = rrddim_add(ptrs->st_operations, "add clean", NULL, 1, 1, RRD_ALGORITHM_INCREMENTAL);
+ ptrs->rd_evictions = rrddim_add(ptrs->st_operations, "evictions", NULL, 1, 1, RRD_ALGORITHM_INCREMENTAL);
+ ptrs->rd_flushes = rrddim_add(ptrs->st_operations, "flushes", NULL, 1, 1, RRD_ALGORITHM_INCREMENTAL);
+ ptrs->rd_acquires = rrddim_add(ptrs->st_operations, "acquires", NULL, 1, 1, RRD_ALGORITHM_INCREMENTAL);
+ ptrs->rd_releases = rrddim_add(ptrs->st_operations, "releases", NULL, 1, 1, RRD_ALGORITHM_INCREMENTAL);
+ ptrs->rd_acquires_for_deletion = rrddim_add(ptrs->st_operations, "del acquires", NULL, 1, 1, RRD_ALGORITHM_INCREMENTAL);
+
+ buffer_free(id);
+ buffer_free(family);
+ buffer_free(title);
+ priority++;
+ }
+
+ rrddim_set_by_pointer(ptrs->st_operations, ptrs->rd_searches_closest, (collected_number)pgc_stats->searches_closest);
+ rrddim_set_by_pointer(ptrs->st_operations, ptrs->rd_searches_exact, (collected_number)pgc_stats->searches_exact);
+ rrddim_set_by_pointer(ptrs->st_operations, ptrs->rd_add_hot, (collected_number)pgc_stats->queues.hot.added_entries);
+ rrddim_set_by_pointer(ptrs->st_operations, ptrs->rd_add_clean, (collected_number)(pgc_stats->added_entries - pgc_stats->queues.hot.added_entries));
+ rrddim_set_by_pointer(ptrs->st_operations, ptrs->rd_evictions, (collected_number)pgc_stats->queues.clean.removed_entries);
+ rrddim_set_by_pointer(ptrs->st_operations, ptrs->rd_flushes, (collected_number)pgc_stats->flushes_completed);
+ rrddim_set_by_pointer(ptrs->st_operations, ptrs->rd_acquires, (collected_number)pgc_stats->acquires);
+ rrddim_set_by_pointer(ptrs->st_operations, ptrs->rd_releases, (collected_number)pgc_stats->releases);
+ rrddim_set_by_pointer(ptrs->st_operations, ptrs->rd_acquires_for_deletion, (collected_number)pgc_stats->acquires_for_deletion);
+
+ rrdset_done(ptrs->st_operations);
+ }
+
+ {
+
+ if (unlikely(!ptrs->st_pgc_memory)) {
+ BUFFER *id = buffer_create(100);
+ buffer_sprintf(id, "dbengine_%s_cache_memory", name);
+
+ BUFFER *family = buffer_create(100);
+ buffer_sprintf(family, "dbengine %s cache", name);
+
+ BUFFER *title = buffer_create(100);
+ buffer_sprintf(title, "Netdata %s Cache Memory", name);
+
+ ptrs->st_pgc_memory = rrdset_create_localhost(
+ "netdata",
+ buffer_tostring(id),
+ NULL,
+ buffer_tostring(family),
+ NULL,
+ buffer_tostring(title),
+ "bytes",
+ "netdata",
+ "stats",
+ priority,
+ localhost->rrd_update_every,
+ RRDSET_TYPE_STACKED);
+
+ ptrs->rd_pgc_memory_free = rrddim_add(ptrs->st_pgc_memory, "free", NULL, 1, 1, RRD_ALGORITHM_ABSOLUTE);
+ ptrs->rd_pgc_memory_hot = rrddim_add(ptrs->st_pgc_memory, "hot", NULL, 1, 1, RRD_ALGORITHM_ABSOLUTE);
+ ptrs->rd_pgc_memory_dirty = rrddim_add(ptrs->st_pgc_memory, "dirty", NULL, 1, 1, RRD_ALGORITHM_ABSOLUTE);
+ ptrs->rd_pgc_memory_clean = rrddim_add(ptrs->st_pgc_memory, "clean", NULL, 1, 1, RRD_ALGORITHM_ABSOLUTE);
+ ptrs->rd_pgc_memory_index = rrddim_add(ptrs->st_pgc_memory, "index", NULL, 1, 1, RRD_ALGORITHM_ABSOLUTE);
+ ptrs->rd_pgc_memory_evicting = rrddim_add(ptrs->st_pgc_memory, "evicting", NULL, 1, 1, RRD_ALGORITHM_ABSOLUTE);
+ ptrs->rd_pgc_memory_flushing = rrddim_add(ptrs->st_pgc_memory, "flushing", NULL, 1, 1, RRD_ALGORITHM_ABSOLUTE);
+
+ buffer_free(id);
+ buffer_free(family);
+ buffer_free(title);
+ priority++;
+ }
+
+ collected_number free = (pgc_stats->current_cache_size > pgc_stats->wanted_cache_size) ? 0 :
+ (collected_number)(pgc_stats->wanted_cache_size - pgc_stats->current_cache_size);
+
+ rrddim_set_by_pointer(ptrs->st_pgc_memory, ptrs->rd_pgc_memory_free, free);
+ rrddim_set_by_pointer(ptrs->st_pgc_memory, ptrs->rd_pgc_memory_hot, (collected_number)pgc_stats->queues.hot.size);
+ rrddim_set_by_pointer(ptrs->st_pgc_memory, ptrs->rd_pgc_memory_dirty, (collected_number)pgc_stats->queues.dirty.size);
+ rrddim_set_by_pointer(ptrs->st_pgc_memory, ptrs->rd_pgc_memory_clean, (collected_number)pgc_stats->queues.clean.size);
+ rrddim_set_by_pointer(ptrs->st_pgc_memory, ptrs->rd_pgc_memory_evicting, (collected_number)pgc_stats->evicting_size);
+ rrddim_set_by_pointer(ptrs->st_pgc_memory, ptrs->rd_pgc_memory_flushing, (collected_number)pgc_stats->flushing_size);
+ rrddim_set_by_pointer(ptrs->st_pgc_memory, ptrs->rd_pgc_memory_index,
+ (collected_number)(pgc_stats->size - pgc_stats->queues.clean.size - pgc_stats->queues.hot.size - pgc_stats->queues.dirty.size - pgc_stats->evicting_size - pgc_stats->flushing_size));
+
+ rrdset_done(ptrs->st_pgc_memory);
+ }
+
+ {
+ if (unlikely(!ptrs->st_pgc_pages)) {
+ BUFFER *id = buffer_create(100);
+ buffer_sprintf(id, "dbengine_%s_cache_pages", name);
+
+ BUFFER *family = buffer_create(100);
+ buffer_sprintf(family, "dbengine %s cache", name);
+
+ BUFFER *title = buffer_create(100);
+ buffer_sprintf(title, "Netdata %s Cache Pages", name);
+
+ ptrs->st_pgc_pages = rrdset_create_localhost(
+ "netdata",
+ buffer_tostring(id),
+ NULL,
+ buffer_tostring(family),
+ NULL,
+ buffer_tostring(title),
+ "pages",
+ "netdata",
+ "stats",
+ priority,
+ localhost->rrd_update_every,
+ RRDSET_TYPE_LINE);
+
+ ptrs->rd_pgc_pages_clean = rrddim_add(ptrs->st_pgc_pages, "clean", NULL, 1, 1, RRD_ALGORITHM_ABSOLUTE);
+ ptrs->rd_pgc_pages_hot = rrddim_add(ptrs->st_pgc_pages, "hot", NULL, 1, 1, RRD_ALGORITHM_ABSOLUTE);
+ ptrs->rd_pgc_pages_dirty = rrddim_add(ptrs->st_pgc_pages, "dirty", NULL, 1, 1, RRD_ALGORITHM_ABSOLUTE);
+ ptrs->rd_pgc_pages_referenced = rrddim_add(ptrs->st_pgc_pages, "referenced", NULL, 1, 1, RRD_ALGORITHM_ABSOLUTE);
+
+ buffer_free(id);
+ buffer_free(family);
+ buffer_free(title);
+ priority++;
+ }
+
+ rrddim_set_by_pointer(ptrs->st_pgc_pages, ptrs->rd_pgc_pages_clean, (collected_number)pgc_stats->queues.clean.entries);
+ rrddim_set_by_pointer(ptrs->st_pgc_pages, ptrs->rd_pgc_pages_hot, (collected_number)pgc_stats->queues.hot.entries);
+ rrddim_set_by_pointer(ptrs->st_pgc_pages, ptrs->rd_pgc_pages_dirty, (collected_number)pgc_stats->queues.dirty.entries);
+ rrddim_set_by_pointer(ptrs->st_pgc_pages, ptrs->rd_pgc_pages_referenced, (collected_number)pgc_stats->referenced_entries);
+
+ rrdset_done(ptrs->st_pgc_pages);
+ }
+
+ {
+ if (unlikely(!ptrs->st_pgc_memory_changes)) {
+ BUFFER *id = buffer_create(100);
+ buffer_sprintf(id, "dbengine_%s_cache_memory_changes", name);
+
+ BUFFER *family = buffer_create(100);
+ buffer_sprintf(family, "dbengine %s cache", name);
+
+ BUFFER *title = buffer_create(100);
+ buffer_sprintf(title, "Netdata %s Cache Memory Changes", name);
+
+ ptrs->st_pgc_memory_changes = rrdset_create_localhost(
+ "netdata",
+ buffer_tostring(id),
+ NULL,
+ buffer_tostring(family),
+ NULL,
+ buffer_tostring(title),
+ "bytes/s",
+ "netdata",
+ "stats",
+ priority,
+ localhost->rrd_update_every,
+ RRDSET_TYPE_AREA);
+
+ ptrs->rd_pgc_memory_new_clean = rrddim_add(ptrs->st_pgc_memory_changes, "new clean", NULL, 1, 1, RRD_ALGORITHM_INCREMENTAL);
+ ptrs->rd_pgc_memory_clean_evictions = rrddim_add(ptrs->st_pgc_memory_changes, "evictions", NULL, -1, 1, RRD_ALGORITHM_INCREMENTAL);
+ ptrs->rd_pgc_memory_new_hot = rrddim_add(ptrs->st_pgc_memory_changes, "new hot", NULL, 1, 1, RRD_ALGORITHM_INCREMENTAL);
+
+ buffer_free(id);
+ buffer_free(family);
+ buffer_free(title);
+ priority++;
+ }
+
+ rrddim_set_by_pointer(ptrs->st_pgc_memory_changes, ptrs->rd_pgc_memory_new_clean, (collected_number)(pgc_stats->added_size - pgc_stats->queues.hot.added_size));
+ rrddim_set_by_pointer(ptrs->st_pgc_memory_changes, ptrs->rd_pgc_memory_clean_evictions, (collected_number)pgc_stats->queues.clean.removed_size);
+ rrddim_set_by_pointer(ptrs->st_pgc_memory_changes, ptrs->rd_pgc_memory_new_hot, (collected_number)pgc_stats->queues.hot.added_size);
+
+ rrdset_done(ptrs->st_pgc_memory_changes);
+ }
+
+ {
+ if (unlikely(!ptrs->st_pgc_memory_migrations)) {
+ BUFFER *id = buffer_create(100);
+ buffer_sprintf(id, "dbengine_%s_cache_memory_migrations", name);
+
+ BUFFER *family = buffer_create(100);
+ buffer_sprintf(family, "dbengine %s cache", name);
<