path: root/database/sqlite/sqlite_functions.c
Age | Commit message | Author
2024-01-15 | Add additional fail reason and source during database initialization (#16794) | Stelios Fragkakis
2023-12-12 | code cleanup (#16542) | Costa Tsaousis
fixed minor code cleanup warnings
2023-12-01 | Code cleanup (#16448) | Stelios Fragkakis
* Code cleanup
* More cleanup
* More cleanup
* Use FILENAME_MAX
* query fix
2023-11-21 | Remove queue limit from ACLK sync event loop (#16411) | Stelios Fragkakis
Code cleanup
2023-11-07 | Better database corruption detection during runtime (#16343) | Stelios Fragkakis
Detect database corruption during query execution and schedule recovery on next restart
2023-11-01 | Optimize database before agent shutdown (#16317) | Stelios Fragkakis
Optimize database before shutdown
2023-10-31 | Improve dimension ML model load (#16262) | Stelios Fragkakis
* Prepare metadata sync thread cleanup earlier in the shutdown process
* Set flag for the dimensions that need ML model load instead of queueing a message in the event loop
* Process the dimension ML load during the normal dimension metadata save loop
* Use spinlock for cmd queue / dequeue instead of mutex
  Cleanup queue structure
* Remove old ML model load code
* Rebase and cleanup
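The #16262 entry mentions guarding the command queue / dequeue with a spinlock instead of a mutex. The sketch below shows that idea with C11 `atomic_flag`, assuming a small fixed-capacity ring buffer; the type and function names (`cmd_queue`, `cmd_enqueue`, `cmd_dequeue`) are hypothetical and not taken from the netdata sources.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fixed-capacity command queue guarded by a spinlock. */
#define CMD_QUEUE_CAPACITY 64

typedef struct {
    int opcode;
    void *param;
} cmd_t;

typedef struct {
    atomic_flag lock;               /* set while a thread holds the queue */
    cmd_t items[CMD_QUEUE_CAPACITY];
    size_t head;                    /* index of the next item to dequeue */
    size_t count;                   /* number of queued items */
} cmd_queue;

static void cmd_queue_init(cmd_queue *q) {
    atomic_flag_clear(&q->lock);
    q->head = 0;
    q->count = 0;
}

static void spin_lock(atomic_flag *lock) {
    /* Busy-wait until we are the thread that flipped the flag from clear to set. */
    while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
        ;
}

static void spin_unlock(atomic_flag *lock) {
    atomic_flag_clear_explicit(lock, memory_order_release);
}

/* Appends at the tail; returns false if the queue is full. */
static bool cmd_enqueue(cmd_queue *q, cmd_t cmd) {
    bool ok = false;
    spin_lock(&q->lock);
    if (q->count < CMD_QUEUE_CAPACITY) {
        q->items[(q->head + q->count) % CMD_QUEUE_CAPACITY] = cmd;
        q->count++;
        ok = true;
    }
    spin_unlock(&q->lock);
    return ok;
}

/* Removes from the head (FIFO); returns false if the queue is empty. */
static bool cmd_dequeue(cmd_queue *q, cmd_t *out) {
    bool ok = false;
    spin_lock(&q->lock);
    if (q->count > 0) {
        *out = q->items[q->head];
        q->head = (q->head + 1) % CMD_QUEUE_CAPACITY;
        q->count--;
        ok = true;
    }
    spin_unlock(&q->lock);
    return ok;
}
```

A spinlock only pays off when the critical section is this short; a thread that blocked while holding it would make every waiting thread burn CPU.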
2023-10-27 | Faster parents (#16127) | Costa Tsaousis
* cache ctx in collection handle
* cache rd together with rda
* do not repeatedly call rrdcontexts - cached collection status; optimize pluginsd_acquire_dimension()
* fix unit tests
* do the absolute minimum while updating timestamps; ensure validity while reading them
* when the stream is INTERPOLATED, buffer outstanding data for up to 50ms if the buffer contains DATA only.
* remove the spinlock from mrg
* remove the metric flags that are not used any more
* mrg writers can be different threads
* update first time when latest clean is also updated
* cleanup
* set hot page with a simple atomic operation
* sender sets chart slot for every chart
* work on senders without SLOT
* enable SLOT capability
* send slot at BEGIN when SLOT is enabled
* fix slot generation and parsing
* send slot while re-streaming
* use the sender capabilities, not the receiver
* cleanup
* add slots support to all chart and dimension related plugin commands
* fix condition
* fix calculation
* check sender capabilities
* assign slots in constructors
* we need the dimension slot at the DIMENSION keyword
* more debug info in case of dimension mismatch
* ensure the RRDDIM EXPOSED flag is multi-threaded and set it after the sender buffer has been committed, so that replication will not send dimensions prematurely
* fix renumbering on child restart
* reset rda caching when receiving a chart definition
* optimize pluginsd_end_v2()
* do not do zero-sized allocations
* trust the chart slot id of the child
* cleanup charts on pluginsd thread exit
* better cleanup
* find the chart and put it in the slot, if it is not already there
* move slots array to host
* initialize pluginsd slots properly
* add slots to replay begin; do not cleanup slots that don't belong to a chart
* cleanup on obsolete
* cleanup slots on obsoletions
* cleanup and renames about obsoletion
* rewrite obsoletion service code to remove race conditions
* better service obsoletion log
* added debugging
* more debug
* exposed flag now compares versions
* removed debugging messages
* resolve conflicts
* fix replication check for unsent dimensions
2023-10-27 | Small optimization of alert queries (#16282) | Emmanuel Vasilakis
2023-10-20 | Drop an unused index from aclk_alert table (#16242) | Stelios Fragkakis
* Drop unused aclk_alert index
* Log messages only when compiled with NETDATA_INTERNAL_CHECKS
2023-10-13 | Fix access of memory after free (#16185) | Stelios Fragkakis
* Proper init to avoid use after free
* CID 400083 Unchecked return value
2023-10-10 | Fix compilation warnings (#16158) | Stelios Fragkakis
Drop warning when parent is not accepting job status updates
2023-10-06 | Code improvements (#16104) | Stelios Fragkakis
* Remove unused functions
* No need for a prepared statement because the function is not used frequently
* Remove db_meta check, already assumed valid
* Remove D_ACLK_SYNC and D_METADATALOG, fix log message
* Reuse prepared statements per run to avoid parsing SQL every time
* Keep rowid in charts and dimensions
* Host and chart labels keep rowids
* Don't store internal flags
* Remove commented out code
* Formatting
* Fix algorithm when updating dimension
2023-10-03 | Skip database migration steps in new installation (#16071) | Stelios Fragkakis
* For new installations, skip database migration steps
* Simplify logging
* Count database tables to determine if database is empty
* Report extended error message
2023-09-28 | Convert the ML database (#16046) | Stelios Fragkakis
* Convert a db to WAL with auto vacuum
* Use single sqlite configuration function
* Remove UNUSED statements
2023-09-26 | Maintain node's last connected timestamp in the db (#15979) | Stelios Fragkakis
* Maintain node's last connected timestamp in the db
* Rebase -- switch to database version v14
2023-09-26 | Fix summary field in table (#16050) | Emmanuel Vasilakis
fix summary field in table
2023-09-19 | Add a summary field to alerts (#15886) | Emmanuel Vasilakis
* add a summary field to alerts
* add summary field to db
* rebase
* better migration
* rebase
* change email notification
* revert to silent
* use macro
* add the summary field to some alerts
* add more summary fields
* change migration function
* add to postgres alerts
* add summary to vernemq
* more summary fields
* more summary fields
* fixes
* add doc
2023-09-05 | Reduce workload during cleanup (#15919) | Stelios Fragkakis
* Add index to improve health cleanup
* Rearrange query to use index
* Check fewer entries during cleanup to prevent CPU spike
2023-09-01 | Add better recovery for corrupted metadata (#15891) | Stelios Fragkakis
* Add sqlite-meta-recover command line option
  Remove the old recovery that would attempt to fix only chart and dimension
  Mark recovery for metadata (for now)
  Simplify the database init function
* Reduce variable scope, formatting
2023-09-01 | Reduce label memory (#15255) | Stelios Fragkakis
2023-08-25 | Metadata cleanup improvements (#15462) | Stelios Fragkakis
* Cleanup improvements
  Cleanup for charts and chart labels
  Code Formatting
  Run health cleanup every hour
  Generic cleanup function with appropriate callbacks
* Cleanup and better logging
* Start metadata cleanup job faster
* Improve logging message
* Do cleanup after storing metadata as needed
* First check after 30 minutes
* First check after 30 minutes
  Cleanup
2023-08-23 | Add a fail reason to pinpoint exactly what went wrong (#15866) | Stelios Fragkakis
* Add a fail reason to pinpoint exactly what went wrong
* Drop the env for setting the fail reason. Always pass netdata_fail_reason
2023-08-22 | Misc code cleanup (#15665) | Stelios Fragkakis
* Cleanup code
* Add SQLITE3_COLUMN_STRDUPZ_OR_NULL for readability
* Bind unique id properly
* Cleanup with is_claimed parameter to decide which cleanup to use
  Unify cleanup function sql_health_alarm_log_cleanup
  Add SQLITE3_BIND_STRING_OR_NULL and SQLITE3_COLUMN_STRINGDUP_OR_NULL
  sql_health_alarm_log_count returns number of rows instead of updating host->health.health_log_entries_written
  Reformat queries for clarity
* Try to fix codacy issue
* Try to fix codacy issue -- issue small warning
* Change label from fail to done
* Drop index on unique_id and health_log_id and create one on both
* Update database/sqlite/sqlite_aclk_alert.c
  Co-authored-by: Emmanuel Vasilakis <mrzammler@gmail.com>
* Fix double bind
---------
Co-authored-by: Emmanuel Vasilakis <mrzammler@gmail.com>
2023-08-18 | Fix warning when compiling with -flto (#15838) | Stelios Fragkakis
Fix warning when compiling with -flto
2023-08-03 | Revert "Refactor RRD code. (#15423)" (#15723) | vkalintiris
This reverts commit 440bd51e08fdfa2a4daa191fb68643456028a753.
dbengine was still being used for non-zero tiers even on non-dbengine modes.
2023-07-27 | Drop duplicate / unused index (#15568) | Stelios Fragkakis
2023-07-26 | Refactor RRD code. (#15423) | vkalintiris
* Storage engine.
* Host indexes to rrdb
* Move globals to rrdb
* Move storage_tiers_backfill to rrdb
* default_rrd_update_every to rrdb
* default_rrd_history_entries to rrdb
* gap_when_lost_iterations_above to rrdb
* rrdset_free_obsolete_time_s to rrdb
* libuv_worker_threads to rrdb
* ieee754_doubles to rrdb
* rrdhost_free_orphan_time_s to rrdb
* rrd_rwlock to rrdb
* localhost to rrdb
* rm extern from func decls
* mv rrd macro under rrd.h
* default_rrdeng_page_cache_mb to rrdb
* default_rrdeng_extent_cache_mb to rrdb
* db_engine_journal_check to rrdb
* default_rrdeng_disk_quota_mb to rrdb
* default_multidb_disk_quota_mb to rrdb
* multidb_ctx to rrdb
* page_type_size to rrdb
* tier_page_size to rrdb
* No storage_engine_id in rrdim functions
* storage_engine_id is provided by st
* Update to fix merge conflict.
* Update field name
* Remove unnecessary macros from rrd.h
* Rm unused type decls
* Rm duplicate func decls
* make internal function static
* Make the rest of public dbengine funcs accept a storage_instance.
* No more rrdengine_instance :)
* rm rrdset_debug from rrd.h
* Use rrdb to access globals in ML and ACLK
  Missed due to not having the submodules in the worktree.
* rm total_number
* rm RRDVAR_TYPE_TOTAL
* rm unused inline
* Rm names from typedef'd enums
* rm unused header include
* Move include
* Rm unused header include
* s/rrdhost_find_or_create/rrdhost_get_or_create/g
* s/find_host_by_node_id/rrdhost_find_by_node_id/
  Also, remove duplicate definition in rrdcontext.c
* rm macro used only once
* rm macro used only once
* Reduce rrd.h api by moving funcs into a collector specific utils header
* Remove unused func
* Move parser specific function out of rrd.h
* return storage_number instead of void pointer
* move code related to rrd initialization out of rrdhost.c
* Remove tier_grouping from rrdim_tier
  Saves 8 * storage_tiers bytes per dimension.
* Fix rebase
* s/rrd_update_every/update_every/
* Mark functions as static and constify args
* Add license notes and file to build systems.
* Remove remaining non-log/config mentions of memory mode
* Move rrdlabels api to separate file.
  Also, move localhost functions that load labels outside of database/ and into daemon/
* Remove function decl in rrd.h
* merge rrdhost_cache_dir_for_rrdset_alloc into rrdset_cache_dir
* Do not expose internal function from rrd.h
* Rm NETDATA_RRD_INTERNALS
  Only one function decl is covered. We have more database internal functions that we currently expose for no good reason. These will be placed in a separate internal header in follow-up PRs.
* Add license note
* Include libnetdata.h instead of aral.h
* Use rrdb to access localhost
* Fix builds without dbengine
* Add header to build system files
* Add rrdlabels.h to build systems
* Move func def from rrd.h to rrdhost.c
* Fix macOS build
* Rm non-existing function
* Rebase master
* Define buffer length macro in ad_charts.
* Fix FreeBSD builds.
* Mark functions static
* Rm func decls without definitions
* Rebase master
* Rebase master
* Properly initialize value of storage tiers.
* Fix build after rebase.
2023-07-25 | wait for node_id while claiming (#15526) | Costa Tsaousis
2023-07-20 | Store and transmit chart_name to cloud in alert events (#15441) | Emmanuel Vasilakis
2023-07-12 | Keep health log history in seconds (#15314) | Emmanuel Vasilakis
* rebase
* changes queries to delete based on when
* readme changes
* no need to do migration
* wip, protect un-updated events from cleanup
* remove index on when_key
* fix query for claimed cleanup
* if set less than minimum, set minimum
* fix query
* correct config assign
2023-07-11 | Rename log Macros (debug) (#15322) | thiagoftsm
2023-07-03 | Send alert chart labels config key to cloud (#15283) | Emmanuel Vasilakis
* add chart_labels to alert_hash
* store chart_labels in alert_hash
* transmit to cloud
2023-06-30 | Replace `info` macro with a less generic name (#15266) | Carlo Cabrera
2023-06-22 | Create index for health log migration (#15233) | Stelios Fragkakis
Create health_log_id index
2023-06-21 | Use a single health log table (#15157) | Emmanuel Vasilakis
* move old health log tables to one
* change table in sqlite_health
* remove check for off period of agent
* changes in aclk_alert
* fixes
* add new field insert_mark_timestamp
* cleanup
* remove hostname, create the health log table during sqlite init
* create the health_log during migration
* move source from health_log to alert_hash. Remove class, component and type field from health_log
* Register now_usec sqlite function
* use global_id instead of insert_mark_timestamp. Use function now_usec to populate it
* create functions earlier to have them during migration
* small unit test fix
* create additional health_log_detail table. Do the insert of an alert event on both
* do the update on health_log_detail
* change more queries
* more indexes, fix inject removed
* change last executed and select health log queries
* random uuid for sqlite
* do migration from old tables
* queries to send alerts to cloud
* cleanup queries
* get an alarm id from db if not found in memory
* small fix on query
* add info when migration completes
* don't pick health_log_detail during migration
* check proper old health_log table
* safer migration
* proper log sent alerts. small fix in claimed cleanup
* cleanups
* extra check for cleanup
* also get an alarm_event_id from sql
* check for empty source
* remove cleanup of main health log table
---------
Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
2023-06-05 | Generate, store and transmit a unique alert event_hash_id (#15111) | Emmanuel Vasilakis
* generate and store an event_hash_id
* transmit to cloud
* transmit to the cloud
2023-05-02 | Load/Store ML models (#14981) | vkalintiris
* Pass DB connection in db_execute()
* Add support for loading/saving models.
* Fix ML stats when no training takes place.
* Make model flushing batch size configurable.
* Delete unused function
* Update ML config.
* Restore threshold for logs/period.
* Rm whitespace.
* Add missing dummy function.
* Update function call arguments
* Guard transactions with a lock when flushing ML models.
* Mark dimensions with loaded models as trained.
2023-03-22 | uuid_compare() replaced with uuid_memcmp() (#14787) | Costa Tsaousis
replace uuid_compare() with uuid_memcmp() everywhere where the order is not important but equality is
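The #14787 entry relies on the fact that, when only equality matters, comparing a UUID's 16 raw bytes with `memcmp()` is sufficient; `uuid_compare()` in libuuid instead unpacks both UUIDs and compares them field by field. A minimal sketch, assuming a hypothetical `uuid_bytes_t` type in place of libuuid's `uuid_t`; `uuid_memcmp_sketch()` and `uuid_equal_sketch()` are illustrative names, not netdata's helpers.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for libuuid's uuid_t: 16 raw bytes. */
typedef unsigned char uuid_bytes_t[16];

/* Raw byte comparison; callers should rely only on zero vs non-zero. */
static inline int uuid_memcmp_sketch(const uuid_bytes_t *a, const uuid_bytes_t *b) {
    return memcmp(a, b, sizeof(uuid_bytes_t));
}

/* Equality test built on the raw comparison. */
static inline bool uuid_equal_sketch(const uuid_bytes_t *a, const uuid_bytes_t *b) {
    return uuid_memcmp_sketch(a, b) == 0;
}
```

Restricting callers to the zero / non-zero result keeps the sketch honest about what the commit needed: equality, not a particular ordering.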
2023-03-21 | Update journal v2 (#14750) | Stelios Fragkakis
* Add update every in the metric index (new v2 version)
  Switch to using memcmp instead of uuid_compare to build and search v2 index files
* Remove chart label cleanup during startup
2023-03-16 | Use one thread for ACLK synchronization (#14281) | Stelios Fragkakis
* Remove aclk sync threads
* Disable functions if compiled with --disable-cloud
* Allocate and reuse buffer when scanning hosts
  Tune transactions when writing metadata
  Error checking when executing db_execute (it is already within a loop with retries)
* Schedule host context load in parallel
  Child connection will be delayed if context load is not complete
  Event loop cleanup
* Delay retention check if context is not loaded
  Remove context load check from regular metadata host scan
* Improve checks to check finished threads
* Cleanup warnings when compiling with --disable-cloud
* Clean chart labels that were created before our current maximum retention
* Fix sql statement
* Remove structure members that are of no use
  Remove buffer allocations when not needed
* Fix compilation error
* Don't check for service running when not from a worker
* Code cleanup if agent is compiled with --disable-cloud
  Setup ACLK tables in the database if needed
  Submit node status update messages to the cloud
* Fix compilation warning when --disable-cloud is specified
* Address codacy issues
* Remove empty file -- has already been moved under contexts
* Use enum instead of numbers
* Use UUID_STR_LEN
* Add newline at the end of file
* Release node_id to prevent memory leak under certain cases
* Add queries in defines
* Ignore rc from transaction start -- if there is an active transaction, we will use it (same with commit); should be improved further in a future PR
* Remove commented out code
* If host is null (it should not be) do not allocate config (coverity reports Resource leak)
* Do garbage collection when contexts is initialized
* Handle the case when config is not yet available for a host
2023-03-08 | Fix cloud node stale status when a virtual host is created (#14660) | Stelios Fragkakis
* Schedule direct metadata update on host creation
  Virtual hosts do not have a receiver but they are not orphan
  Schedule node info update on host activation
  New function to store host info and host_system_info
  If the host is just created, create tables and sync thread
  If the host exists during startup it is not live but reschedule node update if it is reactivated
* New opcode to send current node state
* Remove debug messages
* Fix system host info
2023-03-07 | Guard for null host when sending node instances (#14673) | Emmanuel Vasilakis
* guard for null host when sending node instances
* also add a default value when migrating
2023-02-24 | Prevent core dump when the agent is performing a quick shutdown (#14587) | Stelios Fragkakis
* Prevent core dump when the agent is performing a quick shutdown (e.g. when rrd_init fails)
* Threads that have not started during shutdown are immediately marked as EXITED
* Do not attempt to get statistics if database is not initialized
* Do not attempt to get context db statistics if the context database is not initialized
2023-02-23 | Fix context unittest coredump (#14595) | Stelios Fragkakis
* Fix compilation warning
* Fix memory leak
* Fix crash when calling -W ctxtest
  pthread keys not properly initialized when running only this test
* Code cleanup
2023-01-27 | DBENGINE v2 - improvements part 10 (#14332) | Costa Tsaousis
* replication cancels pending queries on exit
* log when waiting for inflight queries
* when there are collected and not-collected metrics, use the context priority from the collected only
* Write metadata with a faster pace
* Remove journal file size limit and sync mode to 0 / Drop wal checkpoint for now
* Wrap in a big transaction remaining metadata writes (test 1)
* fix higher tiers when tiering iterations = 2
* dbengine always returns db-aligned points; query engine expands the queries by 2 points in every direction to have enough data for interpolation
* Wrap in a big transaction metadata writes (test 2)
* replication cancelling fix
* do not first and last entry in replication when the db has no retention
* fix internal check condition
* Increase metadata write batch size
* always apply error limit to dbengine logs
* Remove code that processes the obsolete health.db files
* cleanup in query.c
* do not allow queries to go beyond db boundaries
* prevent internal log for +1 delta in timestamp
* detect gap pages in conflicts
* double protection for gap injection in main cache
* Add checkpoint to prevent large WAL while running
  Remove unused and duplicate functions
* do not allocate chart cache dir if not needed
* add more info to unittests
* revert query expansion to satisfy unittests
Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
2023-01-19 | Remove archivedcharts endpoint, optimize indices (#14296) | Stelios Fragkakis
Remove undocumented archivedcharts endpoint. Use context endpoint instead
Remove unused functions to lookup chart and dimension UUIDs
Drop/Add new index for dimension and chart tables
2023-01-10 | DBENGINE v2 (#14125) | Costa Tsaousis
* count open cache pages referring to datafile
* eliminate wasted flush attempts
* remove eliminated variable
* journal v2 scanning split functions
* avoid locking open cache for a long time while migrating to journal v2
* don't acquire datafile for the loop; disable thread cancelability while a query is running
* work on datafile acquiring
* work on datafile deletion
* work on datafile deletion again
* logs of dbengine should start with DBENGINE
* thread-specific key for queries to check if a query finishes without a finalize
* page_uuid is not used anymore
* Cleanup judy traversal when building new v2
  Remove not needed calls to metric registry
* metric is 8 bytes smaller; timestamps are protected with a spinlock; timestamps in metric are now always coherent
* disable checks for invalid time-ranges
* Remove type from page details
* report scanning time
* remove infinite loop from datafile acquire for deletion
* remove infinite loop from datafile acquire for deletion again
* trace query handles
* properly allocate array of dimensions in replication
* metrics cleanup
* metrics registry uses arrayalloc
* arrayalloc free should be protected by lock
* use array alloc in page cache
* journal v2 scanning fix
* datafile reference leaking hunting
* do not load metrics of future timestamps
* initialize reasons
* fix datafile reference leak
* do not load pages that are entirely overlapped by others
* expand metric retention atomically
* split replication logic in initialization and execution
* replication prepare ahead queries
* replication prepare ahead queries fixed
* fix replication workers accounting
* add router active queries chart
* restore accounting of pages metadata sources; cleanup replication
* don't count skipped pages as unroutable
* notes on services shutdown
* do not migrate to journal v2 too early, while it has pending dirty pages in the main cache for the specific journal file
* do not add pages we don't need to pdc
* time in range re-work to provide info about past and future matches
* finer control on the pages selected for processing; accounting of page related issues
* fix invalid reference to handle->page
* eliminate data collection handle of pg_lookup_next
* accounting for queries with gaps
* query preprocessing the same way the processing is done; cache now supports all operations on Judy
* dynamic libuv workers based on number of processors; minimum libuv workers 8; replication query init ahead uses libuv workers - reserved ones (3)
* get into pdc all matching pages from main cache and open cache; do not do v2 scan if main cache and open cache can satisfy the query
* finer gaps calculation; accounting of overlapping pages in queries
* fix gaps accounting
* move datafile deletion to worker thread
* tune libuv workers and thread stack size
* stop netdata threads gradually
* run indexing together with cache flush/evict
* more work on clean shutdown
* limit the number of pages to evict per run
* do not lock the clean queue for accesses if it is not possible at that time - the page will be moved to the back of the list during eviction
* economies on flags for smaller page footprint; cleanup and renames
* eviction moves referenced pages to the end of the queue
* use murmur hash for indexing partition
* murmur should be static
* use more indexing partitions
* revert number of partitions to number of cpus
* cancel threads first, then stop services
* revert default thread stack size
* don't execute replication requests of disconnected senders
* wait more time for services that are exiting gradually
* fixed last commit
* finer control on page selection algorithm
* default stacksize of 1MB
* fix formatting
* fix worker utilization going crazy when the number is rotating
* avoid buffer full due to replication preprocessing of requests
* support query priorities
* add count of spins in spinlock when compiled with netdata internal checks
* remove prioritization from dbengine queries; cache now uses mutexes for the queues
* hot pages are now in sections judy arrays, like dirty
* align replication queries to optimal page size
* during flushing add to clean and evict in batches
* Revert "during flushing add to clean and evict in batches"
  This reverts commit 8fb2b69d068499eacea6de8291c336e5e9f197c7.
* don't lock clean while evicting pages during flushing
* Revert "dont lock clean while evicting pages during flushing"
  This reverts commit d6c82b5f40aeba86fc7aead062fab1b819ba58b3.
* Revert "Revert "during flushing add to clean and evict in batches""
  This reverts commit ca7a187537fb8f743992700427e13042561211ec.
* don't cross locks during flushing, for the fastest flushes possible
* low-priority queries load pages synchronously
* Revert "low-priority queries load pages synchronously"
  This reverts commit 1ef2662ddcd20fe5842b856c716df134c42d1dc7.
* cache uses spinlock again
* during flushing, don't lock the clean queue at all; each item is added atomically
* do smaller eviction runs
* evict one page at a time to minimize lock contention on the clean queue
* fix eviction statistics
* fix last commit
* plain should be main cache
* event loop cleanup; evictions and flushes can now happen concurrently
* run flush and evictions from tier0 only
* remove not needed variables
* flushing open cache is not needed; flushing protection is irrelevant since flushing is global for all tiers; added protection to datafiles so that only one flusher can run per datafile at any given time
* added worker jobs in timer to find the slow part of it
* support fast eviction of pages when all_of_them is set
* revert default thread stack size
* bypass event loop for dispatching read extent commands to workers - send them directly
* Revert "bypass event loop for dispatching read extent commands to workers - send them directly"
  This reverts commit 2c08bc5bab12881ae33bc73ce5dea03dfc4e1fce.
* cache work requests
* minimize memory operations during flushing; caching of extent_io_descriptors and page_descriptors
* publish flushed pages to open cache in the thread pool
* prevent eventloop requests from getting stacked in the event loop
* single threaded dbengine controller; support priorities for all queries; major cleanup and restructuring of rrdengine.c
* more rrdengine.c cleanup
* enable db rotation
* do not log when there is a filter
* do not run multiple migrations to journal v2
* load all extents async
* fix wrong paste
* report opcodes waiting, works dispatched, works executing
* cleanup event loop memory every 10 minutes
* don't dispatch more work requests than the number of threads available
* use the dispatched counter instead of the executing counter to check if the worker thread pool is full
* remove UV_RUN_NOWAIT
* replication to fill the queues
* caching of extent buffers; code cleanup
* caching of pdc and pd; rework on journal v2 indexing, datafile creation, database rotation
* single transaction wal
* synchronous flushing
* first cancel the threads, then signal them to exit
* caching of rrdeng query handles; added priority to query target; health is now low prio
* add priority to the missing points; do not allow critical priority in queries
* offload query preparation and routing to libuv thread pool
* updated timing charts for the offloaded query preparation
* caching of WALs
* accounting for struct caches (buffers); do not load extents with invalid sizes
* protection against memory booming during replication due to the optimal alignment of pages; sender thread buffer is now also reset when the circular buffer is reset
* also check if the expanded before is not the chart later updated time
* also check if the expanded before is not after the wall clock time of when the query started
* Remove unused variable
* replication to queue fewer queries; cleanup of internal fatals
* Mark dimension to be updated async
* caching of extent_page_details_list (epdl) and datafile_extent_offset_list (deol)
* disable pgc stress test, under an ifdef
* disable mrg stress test under an ifdef
* Mark chart and host labels, host info for async check and store in the database
* dictionary items use arrayalloc
* cache section pages structure is allocated with arrayalloc
* Add function to wake up the aclk query threads and check for exit
  Register function to be called during shutdown after signaling the service to exit
* parallel preparation of all dimensions of queries
* be more sensitive to enable streaming after replication
* atomically finish chart replication
* fix last commit
* fix last commit again
* fix last commit again again
* fix last commit again again again
* unify the normalization of retention calculation for collected charts; do not enable streaming if more than 60 points are to be transferred; eliminate an allocation during replication
* do not cancel start streaming; use high priority queries when we have locked chart data collection
* prevent starvation on opcodes execution, by allowing 2% of the requests to be re-ordered
* opcode now uses 2 spinlocks: one for the caching of allocations and one for the waiting queue
* Remove check locks and NETDATA_VERIFY_LOCKS as it is not needed anymore
* Fix bad memory allocation / cleanup
* Cleanup ACLK sync initialization (part 1)
* Don't update metric registry during shutdown (part 1)
* Prevent crash when dashboard is refreshed and host goes away
* Mark ctx that is shutting down.
  Test not adding flushed pages to open cache as hot if we are shutting down
* make ML work
* Fix compile without NETDATA_INTERNAL_CHECKS
* shutdown each ctx independently
* fix completion of quiesce
* do not update shared ML charts
* Create ML charts on child hosts.
  When a parent runs ML for a child, the relevant-ML charts should be created on the child host. These charts should use the parent's hostname to differentiate multiple parents that might run ML for a child. The only exception to this rule is the training/prediction resource usage charts. These are created on the localhost of the parent host, because they provide information specific to said host.
* check new ml code
* first save the database, then free all memory
* dbengine prep exit before freeing all memory; fixed deadlock in cache hot to dirty; added missing check to query engine about metrics without any data in the db
* Cleanup metadata thread (part 2)
* increase refcount before dispatching prep command
* Do not try to stop anomaly detection threads twice.
  A separate function call has been added to stop anomaly detection threads. This commit removes the leftover function calls that were made internally when a host was being created/destroyed.
* Remove allocations when smoothing samples buffer
  The number of dims per sample is always 1, i.e. we are training and predicting only individual dimensions.
* set the orphan flag when loading archived hosts
* track worker dispatch callbacks and threadpool worker init
* make ML threads joinable; mark ctx having flushing in progress as early as possible
* fix allocation counter
* Cleanup metadata thread (part 3)
* Cleanup metadata thread (part 4)
* Skip metadata host scan when running unittest
* unittest support during init
* don't use all the libuv threads for queries
* break an infinite loop when sleep_usec() is interrupted
* ml prediction is a collector for several charts
* sleep_usec() now makes sure it will never loop if it passes the time expected; sleep_usec() now uses nanosleep() because clock_nanosleep() misses signals on netdata exit
* worker_unregister() in netdata threads cleanup
* moved pdc/epdl/deol/extent_buffer related code to pdc.c and pdc.h
* fixed ML issues
* removed engine2 directory
* added dbengine2 files in CMakeLists.txt
* move query plan data to query target, so that they can be exposed in jsonwrap
* uniform definition of query plan according to the other query target members
* event_loop should be in daemon, not libnetdata
* metric_retention_by_uuid() is now part of the storage engine abstraction
* unify time_t variables to have the suffix _s (meaning: seconds)
* old dbengine statistics become "dbengine io"
* do not enable ML resource usage charts by default
* unify ml chart families, plugins and modules
* cleanup query plans from query target
* cleanup all extent buffers
* added debug info for rrddim slot to time
* rrddim now does proper gap management
* full rewrite of the mem modes
* use library functions for madvise
* use CHECKSUM_SZ for the checksum size
* fix coverity warning about the impossible case of returning a page that is entirely in the past of the query
* fix dbengine shutdown
* keep the old datafile lock until a new datafile has been created, to avoid creating multiple datafiles concurrently
* fine tune cache evictions
* don't initialize health if the health service is not running - prevent crash on shutdown while children get connected
* rename AS threads to ACLK[hostname]
* prevent re-use of uninitialized memory in queries
* use JulyL instead of JudyL for PDC operations - to test it first
* also add JulyL files
* fix July memory accounting
* disable July for PDC (use Judy)
* use the function to remove datafiles from linked list
* fix july and event_loop
* add july to libnetdata subdirs
* rename time_t variables that end in _t to end in _s
* replicate when there is a gap at the beginning of the replication period
* reset postponing of sender connections when a receiver is connected
* Adjust update every properly
* fix replication infinite loop due to last change
* packed enums in rrd.h and cleanup of obsolete rrd structure members
* prevent deadlock in replication: replication_recalculate_buffer_used_ratio_unsafe() deadlocking with replication_sender_delete_pending_requests()
* void unused variable
* void unused variables
* fix indentation
* entries_by_time calculation in VD was wrong; restored internal checks for checking future timestamps
* macros to calculate page entries by time and size
* prevent statsd cleanup crash on exit
* cleanup health thread related variables
Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
Co-authored-by: vkalintiris <vasilis@netdata.cloud>
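The DBENGINE v2 entry (#14125) notes that `sleep_usec()` switched to `nanosleep()` and must never keep looping once the expected time has passed. The sketch below shows one way to get that behaviour, assuming POSIX `nanosleep()` and `clock_gettime(CLOCK_MONOTONIC)`; `sleep_usec_sketch()` and `monotonic_usec()` are hypothetical names, not netdata's implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: current CLOCK_MONOTONIC time in microseconds. */
static uint64_t monotonic_usec(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000ULL + (uint64_t)ts.tv_nsec / 1000ULL;
}

/* Sleep for about `usec` microseconds with nanosleep().
   On EINTR, resume with the remaining time reported by nanosleep(),
   but stop as soon as the monotonic deadline has passed, so the loop
   cannot keep going after the expected time.
   Returns 0 on success, -1 on a non-EINTR error. */
static int sleep_usec_sketch(uint64_t usec) {
    const uint64_t deadline = monotonic_usec() + usec;
    struct timespec req = {
        .tv_sec = (time_t)(usec / 1000000ULL),
        .tv_nsec = (long)((usec % 1000000ULL) * 1000ULL),
    };
    struct timespec rem = { 0, 0 };

    while (nanosleep(&req, &rem) == -1) {
        if (errno != EINTR)
            return -1;          /* e.g. EINVAL: give up instead of retrying */
        if (monotonic_usec() >= deadline)
            break;              /* interrupted, but the time has already passed */
        req = rem;              /* resume with whatever time is left */
    }
    return 0;
}
```

Checking a monotonic deadline after each interruption bounds the total sleep even if signals keep arriving, while `rem` lets an early wake-up resume without restarting the full interval.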
2022-12-02 | Fix 1.37 crashes (#14081) | Stelios Fragkakis
* Wait for pending read to complete before destroying the page
* fix page alignment crash
* Compare copy of descriptor
* prevent worker crashes by disabling cancellability in critical areas and separate sqlite3 statistics into their own worker job
* do not update sqlite3 stats when they are slow
* do not query sqlite3 statistics when they are slow
* flipped condition
* sqlite3 proper timeout calculation
Co-authored-by: Costa Tsaousis <costa@netdata.cloud>
2022-11-28 | replication fixes No 7 (#14053) | Costa Tsaousis
* move global statistics workers to a separate thread; query statistics per query source; query statistics for ML, exporters, backfilling; reset replication point in time every 10 seconds, instead of every 1; fix compilation warnings; optimize the replication queries code; prevent long tail of replication requests (big sleeps); provide query statistics about replication; optimize replication sender when most senders are full; optimize replication_request_get_first_available(); reset replication completion calculation
* remove workers utilization from global statistics thread