author     Costa Tsaousis <costa@netdata.cloud>      2023-01-10 19:59:21 +0200
committer  GitHub <noreply@github.com>               2023-01-10 19:59:21 +0200
commit     368a26cfee6887ca0cb2301d93138f63b75e353a (patch)
tree       b57e39fdb78dc57f7a2c1fcc3d9b6bf3c2a2a113 /daemon
parent     b513888be389f92b2323d1bb3fdf55c22d4e4bad (diff)
DBENGINE v2 (#14125)
* count open cache pages referring to datafile
* eliminate wasted flush attempts
* remove eliminated variable
* journal v2 scanning split functions
* avoid locking open cache for a long time while migrating to journal v2
* don't acquire datafile for the loop; disable thread cancelability while a query is running
* work on datafile acquiring
* work on datafile deletion
* work on datafile deletion again
* logs of dbengine should start with DBENGINE
* thread specific key for queries to check if a query finishes without a finalize
* page_uuid is not used anymore
* Cleanup judy traversal when building new v2
Remove unneeded calls to the metric registry
* metric is 8 bytes smaller; timestamps are protected with a spinlock; timestamps in metric are now always coherent
* disable checks for invalid time-ranges
* Remove type from page details
* report scanning time
* remove infinite loop from datafile acquire for deletion
* remove infinite loop from datafile acquire for deletion again
* trace query handles
* properly allocate array of dimensions in replication
* metrics cleanup
* metrics registry uses arrayalloc
* arrayalloc free should be protected by lock
* use array alloc in page cache
* journal v2 scanning fix
* datafile reference leak hunting
* do not load metrics of future timestamps
* initialize reasons
* fix datafile reference leak
* do not load pages that are entirely overlapped by others
* expand metric retention atomically
* split replication logic in initialization and execution
* replication prepare ahead queries
* replication prepare ahead queries fixed
* fix replication workers accounting
* add router active queries chart
* restore accounting of pages metadata sources; cleanup replication
* don't count skipped pages as unroutable
* notes on services shutdown
* do not migrate to journal v2 too early, while it has pending dirty pages in the main cache for the specific journal file
* do not add pages we don't need to pdc
* time in range re-work to provide info about past and future matches
* finer control on the pages selected for processing; accounting of page-related issues
* fix invalid reference to handle->page
* eliminate data collection handle of pg_lookup_next
* accounting for queries with gaps
* query preprocessing the same way the processing is done; cache now supports all operations on Judy
* dynamic libuv workers based on number of processors; minimum libuv workers 8; replication query init ahead uses libuv workers - reserved ones (3)
* get into pdc all matching pages from main cache and open cache; do not do v2 scan if main cache and open cache can satisfy the query
* finer gaps calculation; accounting of overlapping pages in queries
* fix gaps accounting
* move datafile deletion to worker thread
* tune libuv workers and thread stack size
* stop netdata threads gradually
* run indexing together with cache flush/evict
* more work on clean shutdown
* limit the number of pages to evict per run
* do not lock the clean queue for accesses if it is not possible at that time - the page will be moved to the back of the list during eviction
* economies on flags for smaller page footprint; cleanup and renames
* eviction moves referenced pages to the end of the queue
* use murmur hash for indexing partition
* murmur should be static
* use more indexing partitions
* revert number of partitions to number of cpus
* cancel threads first, then stop services
* revert default thread stack size
* don't execute replication requests of disconnected senders
* wait more time for services that are exiting gradually
* fixed last commit
* finer control on page selection algorithm
* default stacksize of 1MB
* fix formatting
* fix worker utilization going crazy when the number is rotating
* avoid buffer full due to replication preprocessing of requests
* support query priorities
* add count of spins in spinlock when compiled with netdata internal checks
* remove prioritization from dbengine queries; cache now uses mutexes for the queues
* hot pages are now in sections judy arrays, like dirty
* align replication queries to optimal page size
* during flushing add to clean and evict in batches
* Revert "during flushing add to clean and evict in batches"
This reverts commit 8fb2b69d068499eacea6de8291c336e5e9f197c7.
* dont lock clean while evicting pages during flushing
* Revert "dont lock clean while evicting pages during flushing"
This reverts commit d6c82b5f40aeba86fc7aead062fab1b819ba58b3.
* Revert "Revert "during flushing add to clean and evict in batches""
This reverts commit ca7a187537fb8f743992700427e13042561211ec.
* don't cross locks during flushing, for the fastest flushes possible
* low-priority queries load pages synchronously
* Revert "low-priority queries load pages synchronously"
This reverts commit 1ef2662ddcd20fe5842b856c716df134c42d1dc7.
* cache uses spinlock again
* during flushing, don't lock the clean queue at all; each item is added atomically
* do smaller eviction runs
* evict one page at a time to minimize lock contention on the clean queue
* fix eviction statistics
* fix last commit
* plain should be main cache
* event loop cleanup; evictions and flushes can now happen concurrently
* run flush and evictions from tier0 only
* remove not needed variables
* flushing open cache is not needed; flushing protection is irrelevant since flushing is global for all tiers; added protection to datafiles so that only one flusher can run per datafile at any given time
* added worker jobs in timer to find the slow part of it
* support fast eviction of pages when all_of_them is set
* revert default thread stack size
* bypass event loop for dispatching read extent commands to workers - send them directly
* Revert "bypass event loop for dispatching read extent commands to workers - send them directly"
This reverts commit 2c08bc5bab12881ae33bc73ce5dea03dfc4e1fce.
* cache work requests
* minimize memory operations during flushing; caching of extent_io_descriptors and page_descriptors
* publish flushed pages to open cache in the thread pool
* prevent eventloop requests from getting stacked in the event loop
* single threaded dbengine controller; support priorities for all queries; major cleanup and restructuring of rrdengine.c
* more rrdengine.c cleanup
* enable db rotation
* do not log when there is a filter
* do not run multiple migration to journal v2
* load all extents async
* fix wrong paste
* report opcodes waiting, works dispatched, works executing
* cleanup event loop memory every 10 minutes
* don't dispatch more work requests than the number of threads available
* use the dispatched counter instead of the executing counter to check if the worker thread pool is full
* remove UV_RUN_NOWAIT
* replication to fill the queues
* caching of extent buffers; code cleanup
* caching of pdc and pd; rework on journal v2 indexing, datafile creation, database rotation
* single transaction wal
* synchronous flushing
* first cancel the threads, then signal them to exit
* caching of rrdeng query handles; added priority to query target; health is now low prio
* add priority to the missing points; do not allow critical priority in queries
* offload query preparation and routing to libuv thread pool
* updated timing charts for the offloaded query preparation
* caching of WALs
* accounting for struct caches (buffers); do not load extents with invalid sizes
* protection against memory booming during replication due to the optimal alignment of pages; sender thread buffer is now also reset when the circular buffer is reset
* also check if the expanded before is not the chart later updated time
* also check if the expanded before is not after the wall clock time of when the query started
* Remove unused variable
* replication to queue less queries; cleanup of internal fatals
* Mark dimension to be updated async
* caching of extent_page_details_list (epdl) and datafile_extent_offset_list (deol)
* disable pgc stress test, under an ifdef
* disable mrg stress test under an ifdef
* Mark chart and host labels, host info for async check and store in the database
* dictionary items use arrayalloc
* cache section pages structure is allocated with arrayalloc
* Add function to wakeup the aclk query threads and check for exit
Register function to be called during shutdown after signaling the service to exit
* parallel preparation of all dimensions of queries
* be more sensitive to enable streaming after replication
* atomically finish chart replication
* fix last commit
* fix last commit again
* fix last commit again again
* fix last commit again again again
* unify the normalization of retention calculation for collected charts; do not enable streaming if more than 60 points are to be transferred; eliminate an allocation during replication
* do not cancel start streaming; use high priority queries when we have locked chart data collection
* prevent starvation on opcodes execution, by allowing 2% of the requests to be re-ordered
* opcode now uses 2 spinlocks one for the caching of allocations and one for the waiting queue
* Remove check locks and NETDATA_VERIFY_LOCKS as it is not needed anymore
* Fix bad memory allocation / cleanup
* Cleanup ACLK sync initialization (part 1)
* Don't update metric registry during shutdown (part 1)
* Prevent crash when dashboard is refreshed and host goes away
* Mark ctx that is shutting down.
Test not adding flushed pages to open cache as hot if we are shutting down
* make ML work
* Fix compile without NETDATA_INTERNAL_CHECKS
* shutdown each ctx independently
* fix completion of quiesce
* do not update shared ML charts
* Create ML charts on child hosts.
When a parent runs a ML for a child, the relevant-ML charts
should be created on the child host. These charts should use
the parent's hostname to differentiate multiple parents that might
run ML for a child.
The only exception to this rule is the training/prediction resource
usage charts. These are created on the localhost of the parent host,
because they provide information specific to said host.
* check new ml code
* first save the database, then free all memory
* dbengine prep exit before freeing all memory; fixed deadlock in cache hot to dirty; added missing check to query engine about metrics without any data in the db
* Cleanup metadata thread (part 2)
* increase refcount before dispatching prep command
* Do not try to stop anomaly detection threads twice.
A separate function call has been added to stop anomaly detection threads.
This commit removes the left over function calls that were made
internally when a host was being created/destroyed.
* Remove allocations when smoothing samples buffer
The number of dims per sample is always 1, ie. we are training and
predicting only individual dimensions.
* set the orphan flag when loading archived hosts
* track worker dispatch callbacks and threadpool worker init
* make ML threads joinable; mark ctx having flushing in progress as early as possible
* fix allocation counter
* Cleanup metadata thread (part 3)
* Cleanup metadata thread (part 4)
* Skip metadata host scan when running unittest
* unittest support during init
* don't use all the libuv threads for queries
* break an infinite loop when sleep_usec() is interrupted
* ml prediction is a collector for several charts
* sleep_usec() now makes sure it will never loop if it passes the time expected; sleep_usec() now uses nanosleep() because clock_nanosleep() misses signals on netdata exit
* worker_unregister() in netdata threads cleanup
* moved pdc/epdl/deol/extent_buffer related code to pdc.c and pdc.h
* fixed ML issues
* removed engine2 directory
* added dbengine2 files in CMakeLists.txt
* move query plan data to query target, so that they can be exposed by in jsonwrap
* uniform definition of query plan according to the other query target members
* event_loop should be in daemon, not libnetdata
* metric_retention_by_uuid() is now part of the storage engine abstraction
* unify time_t variables to have the suffix _s (meaning: seconds)
* old dbengine statistics become "dbengine io"
* do not enable ML resource usage charts by default
* unify ml chart families, plugins and modules
* cleanup query plans from query target
* cleanup all extent buffers
* added debug info for rrddim slot to time
* rrddim now does proper gap management
* full rewrite of the mem modes
* use library functions for madvise
* use CHECKSUM_SZ for the checksum size
* fix coverity warning about the impossible case of returning a page that is entirely in the past of the query
* fix dbengine shutdown
* keep the old datafile lock until a new datafile has been created, to avoid creating multiple datafiles concurrently
* fine tune cache evictions
* don't initialize health if the health service is not running - prevent crash on shutdown while children get connected
* rename AS threads to ACLK[hostname]
* prevent re-use of uninitialized memory in queries
* use JulyL instead of JudyL for PDC operations - to test it first
* add also JulyL files
* fix July memory accounting
* disable July for PDC (use Judy)
* use the function to remove datafiles from linked list
* fix july and event_loop
* add july to libnetdata subdirs
* rename time_t variables that end in _t to end in _s
* replicate when there is a gap at the beginning of the replication period
* reset postponing of sender connections when a receiver is connected
* Adjust update every properly
* fix replication infinite loop due to last change
* packed enums in rrd.h and cleanup of obsolete rrd structure members
* prevent deadlock in replication: replication_recalculate_buffer_used_ratio_unsafe() deadlocking with replication_sender_delete_pending_requests()
* void unused variable
* void unused variables
* fix indentation
* entries_by_time calculation in VD was wrong; restored internal checks for checking future timestamps
* macros to calculate page entries by time and size
* prevent statsd cleanup crash on exit
* cleanup health thread related variables
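Several of the commits above instrument spinlocks ("add count of spins in spinlock when compiled with netdata internal checks"). As a minimal sketch of that technique, assuming C11 atomics — the `sketch_spinlock_t` type and its functions are hypothetical names for illustration, not Netdata's actual spinlock API:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

// Hypothetical spinlock that counts contended spins only when
// internal checks are compiled in (NETDATA_INTERNAL_CHECKS is the
// real build flag; everything else here is a sketch).
typedef struct {
    atomic_flag locked;
#ifdef NETDATA_INTERNAL_CHECKS
    size_t spins;       // failed acquisition attempts observed so far
#endif
} sketch_spinlock_t;

static inline void sketch_spinlock_init(sketch_spinlock_t *sl) {
    atomic_flag_clear(&sl->locked);
#ifdef NETDATA_INTERNAL_CHECKS
    sl->spins = 0;
#endif
}

static inline void sketch_spinlock_lock(sketch_spinlock_t *sl) {
    // each failed test-and-set is one spin; the counter costs nothing
    // in release builds because the field does not exist there
    while (atomic_flag_test_and_set_explicit(&sl->locked, memory_order_acquire)) {
#ifdef NETDATA_INTERNAL_CHECKS
        sl->spins++;
#endif
    }
}

static inline void sketch_spinlock_unlock(sketch_spinlock_t *sl) {
    atomic_flag_clear_explicit(&sl->locked, memory_order_release);
}
```

Keeping the counter behind the internal-checks flag means production builds pay no extra memory or cache traffic for the instrumentation.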
Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
Co-authored-by: vkalintiris <vasilis@netdata.cloud>
Diffstat (limited to 'daemon')
-rw-r--r--  daemon/analytics.c            12
-rw-r--r--  daemon/commands.c              6
-rw-r--r--  daemon/common.h                1
-rw-r--r--  daemon/event_loop.c           49
-rw-r--r--  daemon/event_loop.h           44
-rw-r--r--  daemon/global_statistics.c  1492
-rw-r--r--  daemon/main.c                387
-rw-r--r--  daemon/main.h                 30
-rw-r--r--  daemon/service.c              32
-rw-r--r--  daemon/static_threads.c       12
-rw-r--r--  daemon/unit_test.c            25
11 files changed, 1774 insertions, 316 deletions
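Among the commits above, cache index partitioning moves to a murmur hash, with the partition count eventually reverted to the number of CPUs. A minimal sketch of that technique using the murmur3 32-bit finalizer — the function names here are hypothetical, not the actual Netdata implementation:

```c
#include <stdint.h>
#include <stddef.h>

// murmur3 32-bit finalizer: cheaply decorrelates the low bits of an
// integer key so a modulo spreads entries evenly across partitions.
static inline uint32_t murmur32_fmix(uint32_t h) {
    h ^= h >> 16;
    h *= 0x85ebca6b;
    h ^= h >> 13;
    h *= 0xc2b2ae35;
    h ^= h >> 16;
    return h;
}

// Pick an indexing partition for a cache entry id; per the commit
// notes, partitions is expected to equal the number of CPUs.
static inline size_t indexing_partition(uintptr_t id, size_t partitions) {
    return murmur32_fmix((uint32_t)id) % partitions;
}
```

The point of hashing before the modulo is that raw pointers and ids tend to share alignment in their low bits, which would otherwise concentrate entries on a few partitions and their locks.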
diff --git a/daemon/analytics.c b/daemon/analytics.c
index 3d0e514d66..dd7eeebbd7 100644
--- a/daemon/analytics.c
+++ b/daemon/analytics.c
@@ -223,9 +223,7 @@ void analytics_mirrored_hosts(void)
         if (rrdhost_flag_check(host, RRDHOST_FLAG_ARCHIVED))
             continue;

-        netdata_mutex_lock(&host->receiver_lock);
-        ((host->receiver || host == localhost) ? reachable++ : unreachable++);
-        netdata_mutex_unlock(&host->receiver_lock);
+        ((host == localhost || !rrdhost_flag_check(host, RRDHOST_FLAG_ORPHAN)) ? reachable++ : unreachable++);

         count++;
     }
@@ -554,7 +552,7 @@ void analytics_gather_mutable_meta_data(void)
         snprintfz(b, 6, "%d", analytics_data.dashboard_hits);
         analytics_set_data(&analytics_data.netdata_dashboard_used, b);

-        snprintfz(b, 6, "%zu", rrd_hosts_available);
+        snprintfz(b, 6, "%zu", rrdhost_hosts_available());
         analytics_set_data(&analytics_data.netdata_config_hosts_available, b);
     }
 }
@@ -587,12 +585,12 @@ void *analytics_main(void *ptr)
     debug(D_ANALYTICS, "Analytics thread starts");

     //first delay after agent start
-    while (!netdata_exit && likely(sec <= ANALYTICS_INIT_SLEEP_SEC)) {
+    while (service_running(SERVICE_ANALYTICS) && likely(sec <= ANALYTICS_INIT_SLEEP_SEC)) {
         heartbeat_next(&hb, step_ut);
         sec++;
     }

-    if (unlikely(netdata_exit))
+    if (unlikely(!service_running(SERVICE_ANALYTICS)))
         goto cleanup;

     analytics_gather_immutable_meta_data();
@@ -605,7 +603,7 @@ void *analytics_main(void *ptr)
         heartbeat_next(&hb, step_ut * 2);
         sec += 2;

-        if (unlikely(netdata_exit))
+        if (unlikely(!service_running(SERVICE_ANALYTICS)))
             break;

         if (likely(sec < ANALYTICS_HEARTBEAT))
diff --git a/daemon/commands.c b/daemon/commands.c
index fae88dabf9..8f09e63e11 100644
--- a/daemon/commands.c
+++ b/daemon/commands.c
@@ -470,9 +470,13 @@ static void after_schedule_command(uv_work_t *req, int status)

 static void schedule_command(uv_work_t *req)
 {
-    struct command_context *cmd_ctx = req->data;
+    register_libuv_worker_jobs();
+    worker_is_busy(UV_EVENT_SCHEDULE_CMD);

+    struct command_context *cmd_ctx = req->data;
     cmd_ctx->status = execute_command(cmd_ctx->idx, cmd_ctx->args, &cmd_ctx->message);
+
+    worker_is_idle();
 }

 /* This will alter the state of the command_info_array.cmd_str
diff --git a/daemon/common.h b/daemon/common.h
index f3d868661e..8a775fb837 100644
--- a/daemon/common.h
+++ b/daemon/common.h
@@ -4,6 +4,7 @@
 #define NETDATA_COMMON_H 1

 #include "libnetdata/libnetdata.h"
+#include "event_loop.h"

 // ----------------------------------------------------------------------------
 // shortcuts for the default netdata configuration
diff --git a/daemon/event_loop.c b/daemon/event_loop.c
new file mode 100644
index 0000000000..13d3a5822c
--- /dev/null
+++ b/daemon/event_loop.c
@@ -0,0 +1,49 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#include <daemon/main.h>
+#include "event_loop.h"
+
+// Register workers
+void register_libuv_worker_jobs() {
+    static __thread bool registered = false;
+
+    if(likely(registered))
+        return;
+
+    registered = true;
+
+    worker_register("LIBUV");
+    worker_register_job_name(UV_EVENT_READ_PAGE_CB, "read page cb");
+    worker_register_job_name(UV_EVENT_READ_EXTENT_CB, "read extent cb");
+    worker_register_job_name(UV_EVENT_COMMIT_PAGE_CB, "commit cb");
+    worker_register_job_name(UV_EVENT_FLUSH_PAGES_CB, "flush cb");
+    worker_register_job_name(UV_EVENT_PAGE_LOOKUP, "page lookup");
+    worker_register_job_name(UV_EVENT_METRIC_LOOKUP, "metric lookup");
+    worker_register_job_name(UV_EVENT_PAGE_POPULATION, "populate page");
+    worker_register_job_name(UV_EVENT_EXT_DECOMPRESSION, "extent decompression");
+    worker_register_job_name(UV_EVENT_READ_MMAP_EXTENT, "read extent (mmap)");
+    worker_register_job_name(UV_EVENT_EXTENT_PROCESSING, "extent processing");
+    worker_register_job_name(UV_EVENT_METADATA_STORE, "store host metadata");
+    worker_register_job_name(UV_EVENT_JOURNAL_INDEX_WAIT, "journal v2 wait");
+    worker_register_job_name(UV_EVENT_JOURNAL_INDEX, "journal v2 indexing");
+    worker_register_job_name(UV_EVENT_SCHEDULE_CMD, "schedule command");
+    worker_register_job_name(UV_EVENT_METADATA_CLEANUP, "metadata cleanup");
+    worker_register_job_name(UV_EVENT_EXTENT_CACHE, "extent cache");
+    worker_register_job_name(UV_EVENT_EXTENT_MMAP, "extent mmap");
+    worker_register_job_name(UV_EVENT_PAGE_DISPATCH, "dispatch page list");
+    worker_register_job_name(UV_EVENT_FLUSH_CALLBACK, "flush callback");
+    worker_register_job_name(UV_EVENT_FLUSH_MAIN, "flush main");
+    worker_register_job_name(UV_EVENT_FLUSH_OPEN, "flush open");
+    worker_register_job_name(UV_EVENT_EVICT_MAIN, "evict main");
+    worker_register_job_name(UV_EVENT_DELETING_FILE, "delete datafiles");
+    worker_register_job_name(UV_EVENT_ANALYZE_V2, "analyze journalfile");
+    worker_register_job_name(UV_EVENT_RETENTION_V2, "calculate retention");
+    worker_register_job_name(UV_EVENT_RETENTION_UPDATE, "update retention");
+    worker_register_job_name(UV_EVENT_DATAFILE_ACQUIRE, "datafile acquire");
+    worker_register_job_name(UV_EVENT_DATAFILE_DELETE, "datafile deletion");
+    worker_register_job_name(UV_EVENT_FLUSHED_TO_OPEN, "flushed to open");
+    worker_register_job_name(UV_EVENT_PREP_QUERY, "prep query");
+    worker_register_job_name(UV_EVENT_WORKER_INIT, "worker init");
+
+    uv_thread_set_name_np(pthread_self(), "LIBUV_WORKER");
+}
diff --git a/daemon/event_loop.h b/daemon/event_loop.h
new file mode 100644
index 0000000000..e332c253cd
--- /dev/null
+++ b/daemon/event_loop.h
@@ -0,0 +1,44 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#ifndef NETDATA_EVENT_LOOP_H
+#define NETDATA_EVENT_LOOP_H
+
+enum event_loop_job {
+    UV_EVENT_JOB_NONE = 0,
+    UV_EVENT_READ_PAGE_CB,
+    UV_EVENT_READ_EXTENT_CB,
+    UV_EVENT_COMMIT_PAGE_CB,
+    UV_EVENT_FLUSH_PAGES_CB,
+    UV_EVENT_EXT_DECOMPRESSION,
+    UV_EVENT_PAGE_LOOKUP,
+    UV_EVENT_METRIC_LOOKUP,
+    UV_EVENT_PAGE_POPULATION,
+    UV_EVENT_READ_MMAP_EXTENT,
+    UV_EVENT_EXTENT_PROCESSING,
+    UV_EVENT_METADATA_STORE,
+    UV_EVENT_JOURNAL_INDEX_WAIT,
+    UV_EVENT_JOURNAL_INDEX,
+    UV_EVENT_SCHEDULE_CMD,
+    UV_EVENT_METADATA_CLEANUP,
+    UV_EVENT_EXTENT_CACHE,
+    UV_EVENT_EXTENT_MMAP,
+    UV_EVENT_FLUSH_CALLBACK,
+    UV_EVENT_EXTEXT_DISPATCH,
+    UV_EVENT_FLUSH_MAIN,
+    UV_EVENT_FLUSH_OPEN,
+    UV_EVENT_EVICT_MAIN,
+    UV_EVENT_PAGE_DISPATCH,
+    UV_EVENT_DELETING_FILE,
+    UV_EVENT_ANALYZE_V2,
+    UV_EVENT_RETENTION_V2,
+    UV_EVENT_RETENTION_UPDATE,
+    UV_EVENT_DATAFILE_ACQUIRE,
+    UV_EVENT_DATAFILE_DELETE,
+    UV_EVENT_FLUSHED_TO_OPEN,
+    UV_EVENT_PREP_QUERY,
+    UV_EVENT_WORKER_INIT,
+};
+
+void register_libuv_worker_jobs();
+
+#endif //NETDATA_EVENT_LOOP_H
diff --git a/daemon/global_statistics.c b/daemon/global_statistics.c
index 49d7086956..15f3a5275f 100644
--- a/daemon/global_statistics.c
+++ b/daemon/global_statistics.c
@@ -669,15 +669,15 @@ static void global_statistics_charts(void) {
                 "netdata" // type
                 , "ml_models_consulted" // id
                 , NULL // name
-                , "ml" // family
+                , NETDATA_ML_CHART_FAMILY // family
                 , NULL // context
                 , "KMeans models used for prediction" // title
                 , "models" // units
-                , "netdata" // plugin
-                , "ml" // module
-                , 131004 // priority
+                , NETDATA_ML_PLUGIN // plugin
+                , NETDATA_ML_MODULE_DETECTION // module
+                , NETDATA_ML_CHART_PRIO_MACHINE_LEARNING_STATUS // priority
                 , localhost->rrd_update_every // update_every
-                , RRDSET_TYPE_STACKED // chart_type
+                , RRDSET_TYPE_AREA // chart_type
             );

             rd = rrddim_add(st, "num_models_consulted", NULL, 1, 1, RRD_ALGORITHM_INCREMENTAL);
@@ -996,9 +996,1119 @@ static void sqlite3_statistics_charts(void) {
     // ----------------------------------------------------------------
 }

-static void dbengine_statistics_charts(void) {
-#ifdef ENABLE_DBENGINE
+struct dbengine2_cache_pointers {
+    RRDSET *st_cache_hit_ratio;
+    RRDDIM *rd_hit_ratio_closest;
+    RRDDIM *rd_hit_ratio_exact;
+
+    RRDSET *st_operations;
+    RRDDIM *rd_searches_closest;
+    RRDDIM *rd_searches_exact;
+    RRDDIM *rd_add_hot;
+    RRDDIM *rd_add_clean;
+    RRDDIM *rd_evictions;
+    RRDDIM *rd_flushes;
+    RRDDIM *rd_acquires;
+    RRDDIM *rd_releases;
+    RRDDIM *rd_acquires_for_deletion;
+
+    RRDSET *st_pgc_memory;
+    RRDDIM *rd_pgc_memory_free;
+    RRDDIM *rd_pgc_memory_clean;
+    RRDDIM *rd_pgc_memory_hot;
+    RRDDIM *rd_pgc_memory_dirty;
+    RRDDIM *rd_pgc_memory_index;
+    RRDDIM *rd_pgc_memory_evicting;
+    RRDDIM *rd_pgc_memory_flushing;
+
+    RRDSET *st_pgc_pages;
+    RRDDIM *rd_pgc_pages_clean;
+    RRDDIM *rd_pgc_pages_hot;
+    RRDDIM *rd_pgc_pages_dirty;
+    RRDDIM *rd_pgc_pages_referenced;
+
+    RRDSET *st_pgc_memory_changes;
+    RRDDIM *rd_pgc_memory_new_hot;
+    RRDDIM *rd_pgc_memory_new_clean;
+    RRDDIM *rd_pgc_memory_clean_evictions;
+
+    RRDSET *st_pgc_memory_migrations;
+    RRDDIM *rd_pgc_memory_hot_to_dirty;
+    RRDDIM *rd_pgc_memory_dirty_to_clean;
+
+    RRDSET *st_pgc_workers;
+    RRDDIM *rd_pgc_workers_evictors;
+    RRDDIM *rd_pgc_workers_flushers;
+    RRDDIM *rd_pgc_workers_adders;
+    RRDDIM *rd_pgc_workers_searchers;
+    RRDDIM *rd_pgc_workers_jv2_flushers;
+    RRDDIM *rd_pgc_workers_hot2dirty;
+
+    RRDSET *st_pgc_memory_events;
+    RRDDIM *rd_pgc_memory_evictions_critical;
+    RRDDIM *rd_pgc_memory_evictions_aggressive;
+    RRDDIM *rd_pgc_memory_flushes_critical;
+
+    RRDSET *st_pgc_waste;
+    RRDDIM *rd_pgc_waste_evictions_skipped;
+    RRDDIM *rd_pgc_waste_flushes_cancelled;
+    RRDDIM *rd_pgc_waste_insert_spins;
+    RRDDIM *rd_pgc_waste_evict_spins;
+    RRDDIM *rd_pgc_waste_release_spins;
+    RRDDIM *rd_pgc_waste_acquire_spins;
+    RRDDIM *rd_pgc_waste_delete_spins;
+    RRDDIM *rd_pgc_waste_flush_spins;
+
+};
+
+static void dbengine2_cache_statistics_charts(struct dbengine2_cache_pointers *ptrs, struct pgc_statistics *pgc_stats, struct pgc_statistics *pgc_stats_old __maybe_unused, const char *name, int priority) {
+
+    {
+        if (unlikely(!ptrs->st_cache_hit_ratio)) {
+            BUFFER *id = buffer_create(100);
+            buffer_sprintf(id, "dbengine_%s_cache_hit_ratio", name);
+
+            BUFFER *family = buffer_create(100);
+            buffer_sprintf(family, "dbengine %s cache", name);
+
+            BUFFER *title = buffer_create(100);
+            buffer_sprintf(title, "Netdata %s Cache Hit Ratio", name);
+
+            ptrs->st_cache_hit_ratio = rrdset_create_localhost(
+                "netdata",
+                buffer_tostring(id),
+                NULL,
+                buffer_tostring(family),
+                NULL,
+                buffer_tostring(title),
+                "%",
+                "netdata",
+                "stats",
+                priority,
+                localhost->rrd_update_every,
+                RRDSET_TYPE_LINE);
+
+            ptrs->rd_hit_ratio_closest = rrddim_add(ptrs->st_cache_hit_ratio, "closest", NULL, 1, 10000, RRD_ALGORITHM_ABSOLUTE);
+            ptrs->rd_hit_ratio_exact = rrddim_add(ptrs->st_cache_hit_ratio, "exact", NULL, 1, 10000, RRD_ALGORITHM_ABSOLUTE);
+
+            buffer_free(id);
+            buffer_free(family);
+            buffer_free(title);
+            priority++;
+        }
+
+        size_t closest_percent = 100 * 10000;
+        if(pgc_stats->searches_closest > pgc_stats_old->searches_closest)
+            closest_percent = (pgc_stats->searches_closest_hits - pgc_stats_old->searches_closest_hits) * 100 * 10000 / (pgc_stats->searches_closest - pgc_stats_old->searches_closest);
+
+        size_t exact_percent = 100 * 10000;
+        if(pgc_stats->searches_exact > pgc_stats_old->searches_exact)
+            exact_percent = (pgc_stats->searches_exact_hits - pgc_stats_old->searches_exact_hits) * 100 * 10000 / (pgc_stats->searches_exact - pgc_stats_old->searches_exact);
+
+        rrddim_set_by_pointer(ptrs->st_cache_hit_ratio, ptrs->rd_hit_ratio_closest, (collected_number)closest_percent);
+        rrddim_set_by_pointer(ptrs->st_cache_hit_ratio, ptrs->rd_hit_ratio_exact, (collected_number)exact_percent);
+
+        rrdset_done(ptrs->st_cache_hit_ratio);
+    }
+
+    {
+        if (unlikely(!ptrs->st_operations)) {
+            BUFFER *id = buffer_create(100);
+            buffer_sprintf(id, "dbengine_%s_cache_operations", name);
+
+            BUFFER *family = buffer_create(100);
+            buffer_sprintf(family, "dbengine %s cache", name);
+
+            BUFFER *title = buffer_create(100);
+            buffer_sprintf(title, "Netdata %s Cache Operations", name);
+
+            ptrs->st_operations = rrdset_create_localhost(
+                "netdata",
+                buffer_tostring(id),
+                NULL,
+                buffer_tostring(family),
+                NULL,
+                buffer_tostring(title),
+                "ops/s",
+                "netdata",
+                "stats",
+                priority,
+                localhost->rrd_update_every,
+                RRDSET_TYPE_LINE);
+
+            ptrs->rd_searches_closest = rrddim_add(ptrs->st_operations, "search closest", NULL, 1, 1, RRD_ALGORITHM_INCREMENTAL);
+            ptrs->rd_searches_exact = rrddim_add(ptrs->st_operations, "search exact", NULL, 1, 1, RRD_ALGORITHM_INCREMENTAL);
+            ptrs->rd_add_hot = rrddim_add(ptrs->st_operations, "add hot", NULL, 1, 1, RRD_ALGORITHM_INCREMENTAL);
+            ptrs->rd_add_clean = rrddim_add(ptrs->st_operations, "add clean", NULL, 1, 1, RRD_ALGORITHM_INCREMENTAL);
+            ptrs->rd_evictions = rrddim_add(ptrs->st_operations, "evictions", NULL, 1, 1, RRD_ALGORITHM_INCREMENTAL);
+            ptrs->rd_flushes = rrddim_add(ptrs->st_operations, "flushes", NULL, 1, 1, RRD_ALGORITHM_INCREMENTAL);
+            ptrs->rd_acquires = rrddim_add(ptrs->st_operations, "acquires", NULL, 1, 1, RRD_ALGORITHM_INCREMENTAL);
+            ptrs->rd_releases = rrddim_add(ptrs->st_operations, "releases", NULL, 1, 1, RRD_ALGORITHM_INCREMENTAL);
+            ptrs->rd_acquires_for_deletion = rrddim_add(ptrs->st_operations, "del acquires", NULL, 1, 1, RRD_ALGORITHM_INCREMENTAL);
+
+            buffer_free(id);
+            buffer_free(family);
+            buffer_free(title);
+            priority++;
+        }
+
+        rrddim_set_by_pointer(ptrs->st_operations, ptrs->rd_searches_closest, (collected_number)pgc_stats->searches_closest);
+        rrddim_set_by_pointer(ptrs->st_operations, ptrs->rd_searches_exact, (collected_number)pgc_stats->searches_exact);
+        rrddim_set_by_pointer(ptrs->st_operations, ptrs->rd_add_hot, (collected_number)pgc_stats->queues.hot.added_entries);
+        rrddim_set_by_pointer(ptrs->st_operations, ptrs->rd_add_clean, (collected_number)(pgc_stats->added_entries - pgc_stats->queues.hot.added_entries));
+        rrddim_set_by_pointer(ptrs->st_operations, ptrs->rd_evictions, (collected_number)pgc_stats->queues.clean.removed_entries);
+        rrddim_set_by_pointer(ptrs->st_operations, ptrs->rd_flushes, (collected_number)pgc_stats->flushes_completed);
+        rrddim_set_by_pointer(ptrs->st_operations, ptrs->rd_acquires, (collected_number)pgc_stats->acquires);
+        rrddim_set_by_pointer(ptrs->st_operations, ptrs->rd_releases, (collected_number)pgc_stats->releases);
+        rrddim_set_by_pointer(ptrs->st_operations, ptrs->rd_acquires_for_deletion, (collected_number)pgc_stats->acquires_for_deletion);
+
+        rrdset_done(ptrs->st_operations);
+    }
+
+    {
+
+        if (unlikely(!ptrs->st_pgc_memory)) {
+            BUFFER *id = buffer_create(100);
+            buffer_sprintf(id, "dbengine_%s_cache_memory", name);
+
+            BUFFER *family = buffer_create(100);
+            buffer_sprintf(family, "dbengine %s cache", name);
+
+            BUFFER *title = buffer_create(100);
+            buffer_sprintf(title, "Netdata %s Cache Memory", name);
+
+            ptrs->st_pgc_memory = rrdset_create_localhost(
+                "netdata",
+                buffer_tostring(id),
+                NULL,
+                buffer_tostring(family),
+                NULL,
+                buffer_tostring(title),
+                "bytes",
+                "netdata",
+                "stats",
+                priority,
+                localhost->rrd_update_every,
+                RRDSET_TYPE_STACKED);
+
+            ptrs->rd_pgc_memory_free = rrddim_add(ptrs->st_pgc_memory, "free", NULL, 1, 1, RRD_ALGORITHM_ABSOLUTE);
+            ptrs->rd_pgc_memory_hot = rrddim_add(ptrs->st_pgc_memory, "hot", NULL, 1, 1, RRD_ALGORITHM_ABSOLUTE);
+            ptrs->rd_pgc_memory_dirty = rrddim_add(ptrs->st_pgc_memory, "dirty", NULL, 1, 1, RRD_ALGORITHM_ABSOLUTE);
+            ptrs->rd_pgc_memory_clean = rrddim_add(ptrs->st_pgc_memory, "clean", NULL, 1, 1, RRD_ALGORITHM_ABSOLUTE);
+            ptrs->rd_pgc_memory_index = rrddim_add(ptrs->st_pgc_memory, "index", NULL, 1, 1, RRD_ALGORITHM_ABSOLUTE);
+            ptrs->rd_pgc_memory_evicting = rrddim_add(ptrs->st_pgc_memory, "evicting", NULL, 1, 1, RRD_ALGORITHM_ABSOLUTE);
+            ptrs->rd_pgc_memory_flushing = rrddim_add(ptrs->st_pgc_memory, "flushing", NULL, 1, 1, RRD_ALGORITHM_ABSOLUTE);
+
+            buffer_free(id);
+            buffer_free(family);
+            buffer_free(title);
+            priority++;
+        }
+
+        collected_number free = (pgc_stats->current_cache_size > pgc_stats->wanted_cache_size) ? 0 :
+            (collected_number)(pgc_stats->wanted_cache_size - pgc_stats->current_cache_size);
+
+        rrddim_set_by_pointer(ptrs->st_pgc_memory, ptrs->rd_pgc_memory_free, free);
+        rrddim_set_by_pointer(ptrs->st_pgc_memory, ptrs->rd_pgc_memory_hot, (collected_number)pgc_stats->queues.hot.size);
+        rrddim_set_by_pointer(ptrs->st_pgc_memory, ptrs->rd_pgc_memory_dirty, (collected_number)pgc_stats->queues.dirty.size);
+        rrddim_set_by_pointer(ptrs->st_pgc_memory, ptrs->rd_pgc_memory_clean, (collected_number)pgc_stats->queues.clean.size);
+        rrddim_set_by_pointer(ptrs->st_pgc_memory, ptrs->rd_pgc_memory_evicting, (collected_number)pgc_stats->evicting_size);
+        rrddim_set_by_pointer(ptrs->st_pgc_memory, ptrs->rd_pgc_memory_flushing, (collected_number)pgc_stats->flushing_size);
+        rrddim_set_by_pointer(ptrs->st_pgc_memory, ptrs->rd_pgc_memory_index,
+            (collected_number)(pgc_stats->size - pgc_stats->queues.clean.size - pgc_stats->queues.hot.size - pgc_stats->queues.dirty.size - pgc_stats->evicting_size - pgc_stats->flushing_size));
+
+        rrdset_done(ptrs->st_pgc_memory);
+    }
+
+    {
+        if (unlikely(!ptrs->st_pgc_pages)) {
+            BUFFER *id = buffer_create(100);
+            buffer_sprintf(id, "dbengine_%s_cache_pages", name);
+
+            BUFFER *family = buffer_create(100);
+            buffer_sprintf(family, "dbengine %s cache", name);
+
+            BUFFER *title = buffer_create(100);
+            buffer_sprintf(title, "Netdata %s Cache Pages", name);
+
+            ptrs->st_pgc_pages = rrdset_create_localhost(
+                "netdata",
+                buffer_tostring(id),
+                NULL,
+                buffer_tostring(family),
+                NULL,
+                buffer_tostring(title),
+                "pages",
+                "netdata",
+                "stats",
+                priority,
+                localhost->rrd_update_every,
+                RRDSET_TYPE_LINE);
+
+            ptrs->rd_pgc_pages_clean = rrddim_add(ptrs->st_pgc_pages, "clean", NULL, 1, 1, RRD_ALGORITHM_ABSOLUTE);
+            ptrs->rd_pgc_pages_hot = rrddim_add(ptrs->st_pgc_pages, "hot", NULL, 1, 1, RRD_ALGORITHM_ABSOLUTE);
+            ptrs->rd_pgc_pages_dirty = rrddim_add(ptrs->st_pgc_pages, "dirty", NULL, 1, 1, RRD_ALGORITHM_ABSOLUTE);
+            ptrs->rd_pgc_pages_referenced = rrddim_add(ptrs->st_pgc_pages, "referenced", NULL, 1, 1, RRD_ALGORITHM_ABSOLUTE);
+
+            buffer_free(id);
+            buffer_free(family);
+            buffer_free(title);
+            priority++;
+        }
+
+        rrddim_set_by_pointer(ptrs->st_pgc_pages, ptrs->rd_pgc_pages_clean, (collected_number)pgc_stats->queues.clean.entries);
+        rrddim_set_by_pointer(ptrs->st_pgc_pages, ptrs->rd_pgc_pages_hot, (collected_number)pgc_stats->queues.hot.entries);
+        rrddim_set_by_pointer(ptrs->st_pgc_pages, ptrs->rd_pgc_pages_dirty, (collected_number)pgc_stats->queues.dirty.entries);
+        rrddim_set_by_pointer(ptrs->st_pgc_pages, ptrs->rd_pgc_pages_referenced, (collected_number)pgc_stats->referenced_entries);
+
+        rrdset_done(ptrs->st_pgc_pages);
+    }
+
+    {
+        if (unlikely(!ptrs->st_pgc_memory_changes)) {
+            BUFFER *id = buffer_create(100);
+            buffer_sprintf(id, "dbengine_%s_cache_memory_changes", name);
+
+            BUFFER *family = buffer_create(100);
+            buffer_sprintf(family, "dbengine %s cache", name);
+
+            BUFFER *title = buffer_create(100);
+            buffer_sprintf(title, "Netdata %s Cache Memory Changes", name);
+
+            ptrs->st_pgc_memory_changes = rrdset_create_localhost(
+                "netdata",
+                buffer_tostring(id),
+                NULL,
+                buffer_tostring(family),
+                NULL,
+                buffer_tostring(title),
+                "bytes/s",
+                "netdata",
+                "stats",
+                priority,
+                localhost->rrd_update_every,
+                RRDSET_TYPE_AREA);
+
+            ptrs->rd_pgc_memory_new_clean = rrddim_add(ptrs->st_pgc_memory_changes, "new clean", NULL, 1, 1, RRD_ALGORITHM_INCREMENTAL);
+            ptrs->rd_pgc_memory_clean_evictions = rrddim_add(ptrs->st_pgc_memory_changes, "evictions", NULL, -1, 1, RRD_ALGORITHM_INCREMENTAL);
+            ptrs->rd_pgc_memory_new_hot = rrddim_add(ptrs->st_pgc_memory_changes, "new hot", NULL, 1, 1, RRD_ALGORITHM_INCREMENTAL);
+
+            buffer_free(id);
+            buffer_free(family);
+            buffer_free(title);
+            priority++;
+        }
+
+        rrddim_set_by_pointer(ptrs->st_pgc_memory_changes, ptrs->rd_pgc_memory_new_clean, (collected_number)(pgc_stats->added_size - pgc_stats->queues.hot.added_size));
+        rrddim_set_by_pointer(ptrs->st_pgc_memory_changes, ptrs->rd_pgc_memory_clean_evictions, (collected_number)pgc_stats->queues.clean.removed_size);
+        rrddim_set_by_pointer(ptrs->st_pgc_memory_changes, ptrs->rd_pgc_memory_new_hot, (collected_number)pgc_stats->queues.hot.added_size);
+
+        rrdset_done(ptrs->st_pgc_memory_changes);
+    }
+
+    {
+        if (unlikely(!ptrs->st_pgc_memory_migrations)) {
+            BUFFER *id = buffer_create(100);
+            buffer_sprintf(id, "dbengine_%s_cache_memory_migrations", name);
+
+            BUFFER *family = buffer_create(100);
+            buffer_sprintf(family, "dbengine %s cache", name);
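The cache hit-ratio hunk above stores percentages as fixed-point values with a 10000 divisor, computed over the deltas between two statistics snapshots. As a standalone sketch of that calculation (the helper name is hypothetical; 100.0000% is represented as 1000000):

```c
#include <stddef.h>

// Delta-based hit ratio, scaled by 10000 as in the chart's rrddim divisor.
// When no new searches happened in the interval, report 100% (1000000).
static inline size_t hit_ratio_fixed(size_t searches_now, size_t searches_old,
                                     size_t hits_now, size_t hits_old) {
    size_t pct = 100 * 10000;   // default: 100% when the cache was idle
    if (searches_now > searches_old)
        pct = (hits_now - hits_old) * 100 * 10000 / (searches_now - searches_old);
    return pct;
}
```

Working on interval deltas rather than lifetime totals is what makes the chart react to recent behavior instead of converging to a long-run average.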