summaryrefslogtreecommitdiffstats
path: root/libnetdata
AgeCommit message (Collapse)Author
2022-10-13overload libc memory allocators with custom ones to trace all allocations ↵Costa Tsaousis
(#13810) * overload libc memory allocators with custom ones to trace all allocations * grab libc pointers for external c plugins * use -ldl when necessary; fallback to work without dlsym when it is not available * initialize global variable * add optional dl libs * dynamically link every library function when needed for the first time * prevent crashes on musl libc * another attempt * dont dereference function * attempt no 3 * attempt no 4 * cleanup - all attempts failed * dont enable tracing of allocations * missing parenthesis
2022-10-10ARAL optimal alloc size (#13804)Costa Tsaousis
aral optimal alloc size
2022-10-10internal log error, when passing NULL dictionary (#13803)Costa Tsaousis
* fixes internal log error, when passing NULL dictionary * item is needed - fatal provides enough info
2022-10-10Return memory freed properly (#13799)Stelios Fragkakis
2022-10-10Array Allocator Memory Leak Fix (#13792)Costa Tsaousis
* aral cleans up previous mmap files; aral uses double linked list macros; aral got more internal checks * fixed aral memory leak * updated macro
2022-10-09full memory tracking and profiling of Netdata Agent (#13789)Costa Tsaousis
* full memory tracking and profiling of Netdata Agent * initialize dbengine only when it is needed * handling of dbengine compiled but not available * restore unittest * restore unittest again * more improvements about ifdef dbengine * fix compilation when dbengine is not enabled * check if dbengine is enabled on exit * call freez() not free() * aral unittest * internal checks activate trace allocations; dev mode activates internal checks
2022-10-09Remove extern from function declared in headers. (#13790)vkalintiris
By default functions are declared as extern in C/C++ headers. The goal of this PR is to reduce the wall of text that many headers have and, more importantly, to make the declaration of extern'd variables - of which we have many dispersed in various places - easily and quickly identifiable. Automatically generated with: $ git grep -l '^extern.*(' '**.h' | \ grep -v libjudy | \ grep -v 'sqlite3.h' | \ xargs sed -i -e 's/extern \(.*(.*$\)/\1/' This is a NFC.
2022-10-06Rename variable for old CentOS version (#13775)thiagoftsm
2022-10-05fix bad merge (#13764)Costa Tsaousis
* fix bad merge * fix for ssl_security_cert
2022-10-05Allow netdata plugins to expose functions for querying more information ↵Costa Tsaousis
about specific charts (#13720) * function renames and code cleanup in popen.c; no actual code changes * netdata popen() now opens both child process stdin and stdout and returns FILE * for both * pass both input and output to parser structures * updated rrdset to call custom functions * RRDSET FUNCTION leading calls for both sync and async operation * put RRDSET functions to a separate file * added format and timeout at function definition * support for synchronous (internal plugins) and asynchronous (external plugins and children) functions * /api/v1/function endpoint * functions are now attached to the host and there is a dictionary view per chart * functions implemented at plugins.d * remove the defer until keyword hook from plugins.d when it is done * stream sender implementation of functions * sanitization of all functions so that certain characters are only allowed * strictier sanitization * common max size * 1st working plugins.d example * always init inflight dictionary * properly destroy dictionaries to avoid parallel insertion of items * add more debugging on disconnection reasons * add more debugging on disconnection reasons again * streaming receiver respects newlines * dont use the same fp for both streaming receive and send * dont free dbengine memory with internal checks * make sender proceed in the buffer * added timing info and garbage collection at plugins.d * added info about routing nodes * added info about routing nodes with delay * added more info about delays * added more info about delays again * signal sending thread to wake up * streaming version labeling and commented code to support capabilities * added functions to /api/v1/data, /api/v1/charts, /api/v1/chart, /api/v1/info * redirect top output to stdout * address coverity findings * fix resource leaks of popen * log attempts to connect to individual destinations * better messages * properly parse destinations * try to find a function from the most matching to the least matching * log added streaming destinations * rotate destinations bypassing a node in the middle that does not accept our connection * break the loops properly * use typedef to define callbacks * capabilities negotiation during streaming * functions exposed upstream based on capabilities; compression disabled per node persisting reconnects; always try to connect with all capabilities * restore functionality to lookup functions * better logging of capabilities * remove old versions from capabilities when a newer version is there * fix formatting * optimization for plugins.d rrdlabels to avoid creating and destructing dictionaries all the time * delayed health initialization for rrddim and rrdset * cleanup health initialization * fix for popen() not returning the right value * add health worker jobs for initializing rrdset and rrddim * added content type support for functions; apps.plugin permanent function to display all the processes * fixes for functions parameters parsing in apps.plugin * fix for process matching in apps.plugiin * first working function for apps.plugin * Dashboard ACL is disabled for functions; Function errors are all in JSON format * apps.plugin function processes returns json table * use json_escape_string() to escape message * fix formatting * apps.plugin exposes all its metrics to function processes * fix json formatting when filtering out some rows * reopen the internal pipe of rrdpush in case of errors * misplaced statement * do not use buffer->len * support for GLOBAL functions (functions that are not linked to a chart * added /api/v1/functions endpoint; removed format from the FUNCTIONS api; * swagger documentation about the new api end points * added plugins.d documentation about functions * never re-close a file * remove uncessesary ifdef * fixed issues identified by codacy * fix for null label value * make edit-config copy-and-paste friendly * Revert "make edit-config copy-and-paste friendly" This reverts commit 54500c0e0a97f65a0c66c4d34e966f6a9056698e. * reworked sender handshake to fix coverity findings * timeout is zero, for both send_timeout() and recv_timeout() * properly detect that parent closed the socket * support caching of function responses; limit function response to 10MB; added protection from malformed function responses * disabled excessive logging * added units to apps.plugin function processes and normalized all values to be human readable * shorter field names * fixed issues reported * fixed apps.plugin error response; tested that pluginsd can properly handle faulty responses * use double linked list macros for double linked list management * faster apps.plugin function printing by minimizing file operations * added memory percentage * fix compatibility issues with older compilers and FreeBSD * rrdpush sender code cleanup; rrhost structure cleanup from sender flags and variables; * fix letftover variable in ifdef * apps.plugin: do not call detach from the thread; exit immediately when input is broken * exclude AR charts from health * flush cleaner; prefer sender output * clarity * do not fill the cbuffer if not connected * fix * dont enabled host->sender if streaming is not enabled; send host label updates to parent; * functions are only available through ACLK * Prepared statement reports only in dev mode * fix AR chart detection * fix for streaming not being enabling itself * more cleanup of sender and receiver structures * moved read-only flags and configuration options to rrdhost->options * fixed merge with master * fix for incomplete rename * prevent service thread from working on charts that are being collected Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
2022-09-26Fix warnings during compilation time on ARM (32 bits) (#13681)thiagoftsm
2022-09-24Faster streaming by 25% on the child (#13708)Costa Tsaousis
* faster printing of BEGIN, SET, END * fewer conditions * faster buffer_fast_strcat() * faster buffer_fast_strcat() fix * eliminate atomic operations and conditions in the BEGIN, SET, END flow * removed unecessary condition
2022-09-20dictionary updated documentation and cosmetics (#13679)Costa Tsaousis
* dictionary updated documentation and cosmetics * improved dictionaries views unit test
2022-09-20disable internal log (#13678)Costa Tsaousis
2022-09-19RRD structures managed by dictionaries (#13646)Costa Tsaousis
* rrdset - in progress * rrdset optimal constructor; rrdset conflict * rrdset final touches * re-organization of rrdset object members * prevent use-after-free * dictionary dfe supports also counting of iterations * rrddim managed by dictionary * rrd.h cleanup * DICTIONARY_ITEM now is referencing actual dictionary items in the code * removed rrdset linked list * Revert "removed rrdset linked list" This reverts commit 690d6a588b4b99619c2c5e10f84e8f868ae6def5. * removed rrdset linked list * added comments * Switch chart uuid to static allocation in rrdset Remove unused functions * rrdset_archive() and friends... * always create rrdfamily * enable ml_free_dimension * rrddim_foreach done with dfe * most custom rrddim loops replaced with rrddim_foreach * removed accesses to rrddim->dimensions * removed locks that are no longer needed * rrdsetvar is now managed by the dictionary * set rrdset is rrdsetvar, fixes https://github.com/netdata/netdata/pull/13646#issuecomment-1242574853 * conflict callback of rrdsetvar now properly checks if it has to reset the variable * dictionary registered callbacks accept as first parameter the DICTIONARY_ITEM * dictionary dfe now uses internal counter to report; avoided excess variables defined with dfe * dictionary walkthrough callbacks get dictionary acquired items * dictionary reference counters that can be dupped from zero * added advanced functions for get and del * rrdvar managed by dictionaries * thread safety for rrdsetvar * faster rrdvar initialization * rrdvar string lengths should match in all add, del, get functions * rrdvar internals hidden from the rest of the world * rrdvar is now acquired throughout netdata * hide the internal structures of rrdsetvar * rrdsetvar is now acquired through out netdata * rrddimvar managed by dictionary; rrddimvar linked list removed; rrddimvar structures hidden from the rest of netdata * better error handling * dont create variables if not initialized for health * dont create variables if not initialized for health again * rrdfamily is now managed by dictionaries; references of it are acquired dictionary items * type checking on acquired objects * rrdcalc renaming of functions * type checking for rrdfamily_acquired * rrdcalc managed by dictionaries * rrdcalc double free fix * host rrdvars is always needed * attempt to fix deadlock 1 * attempt to fix deadlock 2 * Remove unused variable * attempt to fix deadlock 3 * snprintfz * rrdcalc index in rrdset fix * Stop storing active charts and computing chart hashes * Remove store active chart function * Remove compute chart hash function * Remove sql_store_chart_hash function * Remove store_active_dimension function * dictionary delayed destruction * formatting and cleanup * zero dictionary base on rrdsetvar * added internal error to log delayed destructions of dictionaries * typo in rrddimvar * added debugging info to dictionary * debug info * fix for rrdcalc keys being empty * remove forgotten unlock * remove deadlock * Switch to metadata version 5 and drop chart_hash chart_hash_map chart_active dimension_active v_chart_hash * SQL cosmetic changes * do not busy wait while destroying a referenced dictionary * remove deadlock * code cleanup; re-organization; * fast cleanup and flushing of dictionaries * number formatting fixes * do not delete configured alerts when archiving a chart * rrddim obsolete linked list management outside dictionaries * removed duplicate contexts call * fix crash when rrdfamily is not initialized * dont keep rrddimvar referenced * properly cleanup rrdvar * removed some locks * Do not attempt to cleanup chart_hash / chart_hash_map * rrdcalctemplate managed by dictionary * register callbacks on the right dictionary * removed some more locks * rrdcalc secondary index replaced with linked-list; rrdcalc labels updates are now executed by health thread * when looking up for an alarm look using both chart id and chart name * host initialization a bit more modular * init rrdlabels on host update * preparation for dictionary views * improved comment * unused variables without internal checks * service threads isolation and worker info * more worker info in service thread * thread cancelability debugging with internal checks * strings data races addressed; fixes https://github.com/netdata/netdata/issues/13647 * dictionary modularization * Remove unused SQL statement definition * unit-tested thread safety of dictionaries; removed data race conditions on dictionaries and strings; dictionaries now can detect if the caller is holds a write lock and automatically all the calls become their unsafe versions; all direct calls to unsafe version is eliminated * remove worker_is_idle() from the exit of service functions, because we lose the lock time between loops * rewritten dictionary to have 2 separate locks, one for indexing and another for traversal * Update collectors/cgroups.plugin/sys_fs_cgroup.c Co-authored-by: Vladimir Kobal <vlad@prokk.net> * Update collectors/cgroups.plugin/sys_fs_cgroup.c Co-authored-by: Vladimir Kobal <vlad@prokk.net> * Update collectors/proc.plugin/proc_net_dev.c Co-authored-by: Vladimir Kobal <vlad@prokk.net> * fix memory leak in rrdset cache_dir * minor dictionary changes * dont use index locks in single threaded * obsolete dict option * rrddim options and flags separation; rrdset_done() optimization to keep array of reference pointers to rrddim; * fix jump on uninitialized value in dictionary; remove double free of cache_dir * addressed codacy findings * removed debugging code * use the private refcount on dictionaries * make dictionary item desctructors work on dictionary destruction; strictier control on dictionary API; proper cleanup sequence on rrddim; * more dictionary statistics * global statistics about dictionary operations, memory, items, callbacks * dictionary support for views - missing the public API * removed warning about unused parameter * chart and context name for cloud * chart and context name for cloud, again * dictionary statistics fixed; first implementation of dictionary views - not currently used * only the master can globally delete an item * context needs netdata prefix * fix context and chart it of spins * fix for host variables when health is not enabled * run garbage collector on item insert too * Fix info message; remove extra "using" * update dict unittest for new placement of garbage collector * we need RRDHOST->rrdvars for maintaining custom host variables * Health initialization needs the host->host_uuid * split STRING to its own files; no code changes other than that * initialize health unconditionally * unit tests do not pollute the global scope with their variables * Skip initialization when creating archived hosts on startup. When a child connects it will initialize properly Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com> Co-authored-by: Vladimir Kobal <vlad@prokk.net>
2022-09-07fix rrdcontexts left in the post-processing queue from the garbage collector ↵Costa Tsaousis
(#13645) * fix rrdcontexts left in the post-processing queue from the garbage collector * set the queuing flags atomically, using the dictionary callbacks
2022-09-07eBPF different improvements (#13624)thiagoftsm
2022-09-07fix compile issues (#13640)Costa Tsaousis
2022-09-07Updated tc.plugin (linux bandwidth QoS) (#13634)Costa Tsaousis
* modernized tc plugins (replacing AVL with DICTIONARY, char * with STRING *, linked list with DICTIONARY) and added labels * replace custom metrics with worker metrics and add monitoring number of devices and classes * revert bool to char * Revert "revert bool to char" This reverts commit fae9b92dc0be4b00a5694f8b90972787879ae8a7. * fixed typo * removed bitfield from bool * commented unused function * commented unused function again
2022-09-06Faster rrdcontext (#13629)Costa Tsaousis
* moved rrdcontexts processing to worker thread * added loggings * check for aclk deeper in the code * removed unessesary logs * code re-organization; cleanup; more comments; better error handling; rrdcontext locks optimization; more clarity * updated 2 comments * make instances walkthrough reentrant; move context lock to the place is really needed * created macro for reentrant dictionary walkthrough * incremental updates on instances and metrics * renamed family of rrdcontext workers * prevent crash in case RRDINSTANCE or RRDMETRIC is freed during shutdown * prevent crash during rrddim save, on out of memory fatal() * always post-process contexts * added tracing for tracking the caller that trigger updates * more details on tracing info * fix for charts that are collected without metrics
2022-09-05Deduplicate all netdata strings (#13570)Costa Tsaousis
* rrdfamily * rrddim * rrdset plugin and module names * rrdset units * rrdset type * rrdset family * rrdset title * rrdset title more * rrdset context * rrdcalctemplate context and removal of context hash from rrdset * strings statistics * rrdset name * rearranged members of rrdset * eliminate rrdset name hash; rrdcalc chart converted to STRING * rrdset id, eliminated rrdset hash * rrdcalc, alarm_entry, alert_config and some of rrdcalctemplate * rrdcalctemplate * rrdvar * eval_variable * rrddimvar and rrdsetvar * rrdhost hostname, os and tags * fix master commits * added thread cache; implemented string_dup without locks * faster thread cache * rrdset and rrddim now use dictionaries for indexing * rrdhost now uses dictionary * rrdfamily now uses DICTIONARY * rrdvar using dictionary instead of AVL * allocate the right size to rrdvar flag members * rrdhost remaining char * members to STRING * * better error handling on indexing * strings now use a read/write lock to allow parallel searches to the index * removed AVL support from dictionaries; implemented STRING with native Judy calls * string releases should be negative * only 31 bits are allowed for enum flags * proper locking on strings * string threading unittest and fixes * fix lgtm finding * fixed naming * stream chart/dimension definitions at the beginning of a streaming session * thread stack variable is undefined on thread cancel * rrdcontext garbage collect per host on startup * worker control in garbage collection * relaxed deletion of rrdmetrics * type checking on dictfe * netdata chart to monitor rrdcontext triggers * Group chart label updates * rrdcontext better handling of collected rrdsets * rrdpush incremental transmition of definitions should use as much buffer as possible * require 1MB per chart * empty the sender buffer before enabling metrics streaming * fill up to 50% of buffer * reset signaling metrics sending * use the shared variable for status * use separate host flag for enabling streaming of metrics * make sure the flag is clear * add logging for streaming * add logging for streaming on buffer overflow * circular_buffer proper sizing * removed obsolete logs * do not execute worker jobs if not necessary * better messages about compression disabling * proper use of flags and updating rrdset last access time every time the obsoletion flag is flipped * monitor stream sender used buffer ratio * Update exporting unit tests * no need to compare label value with strcmp * streaming send workers now monitor bandwidth * workers now use strings * streaming receiver monitors incoming bandwidth * parser shift of worker ids * minor fixes * Group chart label updates * Populate context with dimensions that have data * Fix chart id * better shift of parser worker ids * fix for streaming compression * properly count received bytes * ensure LZ4 compression ring buffer does not wrap prematurely * do not stream empty charts; do not process empty instances in rrdcontext * need_to_send_chart_definition() does not need an rrdset lock any more * rrdcontext objects are collected, after data have been written to the db * better logging of RRDCONTEXT transitions * always set all variables needed by the worker utilization charts * implemented double linked list for most objects; eliminated alarm indexes from rrdhost; and many more fixes * lockless strings design - string_dup() and string_freez() are totally lockless when they dont need to touch Judy - only Judy is protected with a read/write lock * STRING code re-organization for clarity * thread_cache improvements; double numbers precision on worker threads * STRING_ENTRY now shadown STRING, so no duplicate definition is required; string_length() renamed to string_strlen() to follow the paradigm of all other functions, STRING internal statistics are now only compiled with NETDATA_INTERNAL_CHECKS * rrdhost index by hostname now cleans up; aclk queries of archieved hosts do not index hosts * Add index to speed up database context searches * Removed last_updated optimization (was also buggy after latest merge with master) Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com> Co-authored-by: Vladimir Kobal <vlad@prokk.net>
2022-08-23Improve PID monitoring (step 2) (#13530)thiagoftsm
2022-08-12Modify PID monitoring (ebpf.plugin) (#13397)thiagoftsm
2022-08-05Remove SIGSEGV and SIGABRT (ebpf.plugin) (#13407)thiagoftsm
2022-07-26Tiering statistics API endpoint (#13420)Costa Tsaousis
* calculator statistics * added metrics and metrics_pages counters * implemented API * updates to match sheet * updates to match sheet No2 * fix update every calculation for single point pages * fix lgtm finding
2022-07-26Set value to SN_EMPTY_SLOT if flags is SN_EMPTY_SLOT (#13417)Emmanuel Vasilakis
* set value to SN_EMPTY_SLOT if flags is SN_EMPTY_SLOT * SN_EMPTY_SLOT should be SN_ANOMALOUS_ZERO * added the const attribute to pack_storage_number() * tier1 uses floats * zero should not be empty slot * add unlikely * rename all SN flags to be more meaningful * proper check for zero double value * properly check if pages are full with empty points Co-authored-by: Costa Tsaousis <costa@netdata.cloud>
2022-07-24Rrdcontext (#13335)Costa Tsaousis
* type checking on dictionary return values * first STRING implementation, used by DICTIONARY and RRDLABEL * enable AVL compilation of STRING * Initial functions to store context info * Call simple test functions * Add host_id when getting charts * Allow host to be null and in this case it will process the localhost * Simplify init Do not use strdupz - link directly to sqlite result set * Init the database during startup * make it compile - no functionality yet * intermediate commit * intermidiate * first interface to sql * loading instances * check if we need to update cloud * comparison of rrdcontext on conflict * merge context titles * rrdcontext public interface; statistics on STRING; scratchpad on DICTIONARY * dictionaries maintain version numbers; rrdcontext api * cascading changes * first operational cleanup * string unittest * proper cleanup of referenced dictionaries * added rrdmetrics * rrdmetric starting retention * Add fields to context Adjuct context creation and delete * Memory cleanup * Fix get context list Fix memory double free in tests Store context with two hosts * calculated retention * rrdcontext retention with collection * Persist database and shutdown * loading all from sql * Get chart list and dimension list changes * fully working attempt 1 * fully working attempt 2 * missing archived flag from log * fixed archived / collected * operational * proper cleanup * cleanup - implemented all interface functions - dictionary react callback triggers after the dictionary is unlocked * track all reasons for changes * proper tracking of reasons of changes * fully working thread * better versioning of contexts * fix string indexing with AVL * running version per context vs hub version; ifdef dbengine * added option to disable rrdmetrics * release old context when a chart changes context * cleanup properly * renamed config * cleanup contexts; general cleanup; * deletion inline with dequeue; lots of cleanup; child connected/disconnected * ml should start after rrdcontext * added missing NULL to ri->rrdset; rrdcontext flags are now only changed under a mutex lock * fix buggy STRING under AVL * Rework database initialization Add migration logic to the context database * fix data race conditions during context deletion * added version hash algorithm * fix string over AVL * update aclk-schemas * compile new ctx related protos * add ctx stream message utils * add context messages * add dummy rx message handlers * add the new topics * add ctx capability * add helper functions to send the new messages * update cmake build to not fail * update topic names * handle rrdcontext_enabled * add more functions * fatal on OOM cases instead of return NULL * silence unknown query type error * fully working attempt 1 * fully working attempt 2 * allow compiling without ACLK * added family to the context * removed excess character in UUID * smarter merging of titles and families * Database migration code to add family Add family to SQL_CHART_DATA and VERSIONED_CONTEXT_DATA * add family to context message * enable ctx in communication * hardcoded enabled contexts * Add hard code for CTX * add update node collectors to json * add context message log * fix log about last_time_t * fix collected flags for queued items * prevent crash on charts cleanup * fix bug in AVL indexing of dictionaries; make sure react callback of dictionaries has a reference counter, which is acquired while the dictionary is locked * fixed dictionary unittest * strict policy to cleanup and garbage collector * fix db rotation and garbage collection timings * remove deadlock * proper garbage collection - a lot faster retention recalculation * Added not NULL in database columns Remove migration code for context -- we will ship with version 1 of the table schema Added define for query in tests to detect localhost * Use UUID_STR_LEN instead of GUID_LEN + 1 Use realistic timestamps when adding test data in the database * Add NULL checks for passed parameters * Log deleted context when compiled with NETDATA_INTERNAL_CHECKS * Error checking for null host id * add missing ContextsCheckpoint log convertor * Fix spelling in VACCUM * Hold additional information for host -- prepare to load archived hosts on startup * Make sure claim id is valid * is_get_claimed is actually get the current claim id * Simplify ctx get chart list query * remove env negotiation * fix string unittest when there are some strings already in the index * propagate live-retention flag upstream; cleanup all update reasons; updated instances logging; automated attaching started/stopped collecting flags; * first implementation of /api/v1/contexts * full contexts API; updated swagger * disabled debugging; rrdcontext enabled by default * final cleanup and renaming of global variables * return current time on currently collected contexts, charts and dimensions * added option "deepscan" to the API to have the server refresh the retention and recalculate the contexts on the fly * fixed identation of yaml * Add constrains to the host table * host->node_id may not be available * new capabilities * lock the context while rendering json * update aclk-schemas * added permanent labels to all charts about plugin, module and family; added labels to all proc plugin modules * always add the labels * allow merging of families down to [x] * dont show uuids by default, added option to enable them; response is now accepting after,before to show only data for a specific timeframe; deleted items are only shown when "deleted" is requested; hub version is now shown when "queue" is requested * Use the localhost claim id * Fix to handle host constrains better * cgroups: add "k8s." prefix to chart context in k8s * Improve sqlite metadata version migration check * empty values set to "[none]"; fix labels unit test to reflect that * Check if we reached the version we want first (address CODACY report re: Array index 'i' is used before limits check) * Rewrite condition to address CODACY report (Redundant condition: t->filter_callback. '!A || (A && B)' is equivalent to '!A || B') * Properly unlock context * fixed memory leak on rrdcontexts - it was not freeing all dictionaries in rrdhost; added wait of up to 100ms on dictionary_destroy() to give time to dictionaries to release their items before destroying them * fixed memory leak on rrdlabels not freed on rrdinstances * fixed leak when dimensions and charts are redefined * Mark entries for charts and dimensions as submitted to the cloud 3600 seconds after their creation Mark entries for charts and dimensions as updated (confirmed by the cloud) 1800 seconds after their submission * renamed struct string * update cgroups alarms * fixed codacy suggestions * update dashboard info * fix k8s_cgroup_10s_received_packets_storm alarm * added filtering options to /api/v1/contexts and /api/v1/context * fix eslint * fix eslint * Fix pointer binding for host / chart uuids * Fix cgroups unit tests * fixed non-retention updates not propagated upstream * removed non-fatal fatals * Remove context from 2 way string merge. * Move string_2way_merge to dictionary.c * Add 2-way string merge tests. * split long lines * fix indentation in netdata-swagger.yaml * update netdata-swagger.json * yamllint please * remove the deleted flag when a context is collected * fix yaml warning in swagger * removed non-fatal fatals * charts should now be able to switch contexts * allow deletion of unused metrics, instances and contexts * keep the queued flag * cleanup old rrdinstance labels * dont hide objects when there is no filter; mark objects as deleted when there are no sub-objects * delete old instances once they changed context * delete all instances and contexts that do not have sub-objects * more precise transitions * Load archived hosts on startup (part 1) * update the queued time every time * disable by default; dedup deleted dimensions after snapshot * Load archived hosts on startup (part 2) * delayed processing of events until charts are being collected * remove dont-trigger flag when object is collected * polish all triggers given the new dont_process flag * Remove always true condition Enums for readbility / create_host_callback only if ACLK is enabled (for now) * Skip retention message if context streaming is enabled Add messages in the access log if context streaming is enabled * Check for node id being a UUID that can be parsed Improve error check / reporting when loading archived hosts and creating ACLK sync threads * collected, archived, deleted are now mutually exclusive * Enable the "orphan" handling for now Remove dead code Fix memory leak on free host * Queue charts and dimensions will be no-op if host is set to stream contexts * removed unused parameter and made sure flags are set on rrdcontext insert * make the rrdcontext thread abort mid-work when exiting * Skip chart hash computation and storage if contexts streaming is enabled Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com> Co-authored-by: Timo <timotej@netdata.cloud> Co-authored-by: ilyam8 <ilya@netdata.cloud> Co-authored-by: Vladimir Kobal <vlad@prokk.net> Co-authored-by: Vasilis Kalintiris <vasilis@netdata.cloud>
2022-07-22include Judy into our source tree (#13362)Timotej S
2022-07-19Fix coverity issue 379240 (Unchecked return value) (#13401)Stelios Fragkakis
2022-07-18Fix chart update ebpf.plugin (#13351)thiagoftsm
2022-07-12Address Coverity issues (#13364)Stelios Fragkakis
2022-07-11Detect stored metric size by page type (#13334)Stelios Fragkakis
* Report unknown page only once Get metric storage size by the page type Verify validity of the page and skip problematic ones * Change PAGE_SIZE to PAGE_POINT_SIZE_BYTES * Add bitmap256 and unittests * Fix unit test tier_page_type array page_type_size arrays * Add another counter to not rely on uint8_t overflow to stop the test loop
2022-07-08fix 32bit calculation on array allocator (#13343)Costa Tsaousis
fix aral on 31bit
2022-07-08Better ACLK debug communication log (#13281)Timotej S
2022-07-08array allocator for dbengine page descriptors (#13312)Costa Tsaousis
* array allocator for dbengine page descriptors * full implementation of array allocator with cleanup * faster deallocations * eliminate entierely the need for loops during free * addressed comments * lower the min number of elements to 10
2022-07-07Protect shared variables with log lock. (#13306)vkalintiris
Fixes reports by helgrind: ``` ==00:00:00:01.769 44512== Possible data race during read of size 8 at 0x9767B0 by thread #4 ==00:00:00:01.769 44512== Locks held: none ==00:00:00:01.769 44512== at 0x17CB56: error_log_limit (log.c:627) ==00:00:00:01.769 44512== by 0x17CEC0: info_int (log.c:716) ==00:00:00:01.769 44512== by 0x18949F: thread_start (threads.c:173) ==00:00:00:01.769 44512== by 0x484A486: ??? (in /usr/libexec/valgrind/vgpreload_helgrind-amd64-linux.so) ==00:00:00:01.769 44512== by 0x4E9CD7F: start_thread (pthread_create.c:481) ==00:00:00:01.769 44512== by 0x532F76E: clone (clone.S:95) ==00:00:00:01.769 44512== ==00:00:00:01.769 44512== This conflicts with a previous write of size 8 by thread #3 ==00:00:00:01.769 44512== Locks held: none ==00:00:00:01.769 44512== at 0x17CB61: error_log_limit (log.c:627) ==00:00:00:01.769 44512== by 0x17CEC0: info_int (log.c:716) ==00:00:00:01.769 44512== by 0x18949F: thread_start (threads.c:173) ==00:00:00:01.769 44512== by 0x484A486: ??? (in /usr/libexec/valgrind/vgpreload_helgrind-amd64-linux.so) ==00:00:00:01.769 44512== by 0x4E9CD7F: start_thread (pthread_create.c:481) ==00:00:00:01.769 44512== by 0x532F76E: clone (clone.S:95) ==00:00:00:01.769 44512== Address 0x9767b0 is 0 bytes inside data symbol "counter.1" ``` ``` ==00:00:00:44.536 47685== Lock at 0x976720 was first observed ==00:00:00:44.536 47685== at 0x48477EF: ??? (in /usr/libexec/valgrind/vgpreload_helgrind-amd64-linux.so) ==00:00:00:44.536 47685== by 0x17BBF4: __netdata_mutex_lock (locks.c:86) ==00:00:00:44.536 47685== by 0x17C514: log_lock (log.c:471) ==00:00:00:44.536 47685== by 0x17CEC0: info_int (log.c:715) ==00:00:00:44.536 47685== by 0x458C9E: compute_multidb_diskspace (rrdenginelib.c:279) ==00:00:00:44.536 47685== by 0x15B170: get_netdata_configured_variables (main.c:671) ==00:00:00:44.536 47685== by 0x15CE6C: main (main.c:1263) ==00:00:00:44.536 47685== Address 0x976720 is 0 bytes inside data symbol "log_mutex" ==00:00:00:44.536 47685== ==00:00:00:44.536 47685== Possible data race during write of size 8 at 0x9767A0 by thread #1 ==00:00:00:44.536 47685== Locks held: none ==00:00:00:44.536 47685== at 0x17CB39: error_log_limit (log.c:621) ==00:00:00:44.536 47685== by 0x15E234: signals_handle (signals.c:258) ==00:00:00:44.536 47685== by 0x15D880: main (main.c:1534) ==00:00:00:44.536 47685== ==00:00:00:44.536 47685== This conflicts with a previous read of size 8 by thread #9 ==00:00:00:44.536 47685== Locks held: 1, at address 0x976720 ==00:00:00:44.536 47685== at 0x17CAA3: error_log_limit (log.c:604) ==00:00:00:44.536 47685== by 0x17CECA: info_int (log.c:718) ==00:00:00:44.536 47685== by 0x4624D2: rrdset_done_push (rrdpush.c:344) ==00:00:00:44.536 47685== by 0x36190C: rrdset_done (rrdset.c:1351) ==00:00:00:44.536 47685== by 0x1B07E7: Chart::update(unsigned long) (plugin_profile.cc:82) ==00:00:00:44.536 47685== by 0x1B01D4: updateCharts(std::vector<Chart*, std::allocator<Chart*> >, unsigned long) (plugin_profile.cc:126) ==00:00:00:44.536 47685== by 0x1B02AC: profile_main (plugin_profile.cc:144) ==00:00:00:44.536 47685== by 0x1895D4: thread_start (threads.c:185) ==00:00:00:44.536 47685== Address 0x9767a0 is 0 bytes inside data symbol "start.3" ```
2022-07-06Multi-Tier database backend for long term metrics storage (#13263)Stelios Fragkakis
* Tier part 1 * Tier part 2 * Tier part 3 * Tier part 4 * Tier part 5 * Fix some ML compilation errors * fix more conflicts * pass proper tier * move metric_uuid from state to RRDDIM * move aclk_live_status from state to RRDDIM * move ml_dimension from state to RRDDIM * abstracted the data collection interface * support flushing for mem db too * abstracted the query api * abstracted latest/oldest time per metric * cleanup * store_metric for tier1 * fix for store_metric * allow multiple tiers, more than 2 * state to tier * Change storage type in db. Query param to request min, max, sum or average * Store tier data correctly * Fix skipping tier page type * Add tier grouping in the tier * Fix to handle archived charts (part 1) * Temp fix for query granularity when requesting tier1 data * Fix parameters in the correct order and calculate the anomaly based on the anomaly count * Proper tiering grouping * Anomaly calculation based on anomaly count * force type checking on storage handles * update cmocka tests * fully dynamic number of storage tiers * fix static allocation * configure grouping for all tiers; disable tiers for unittest; disable statsd configuration for private charts mode * use default page dt using the tiering info * automatic selection of tier * fix for automatic selection of tier * working prototype of dynamic tier selection * automatic selection of tier done right (I hope) * ask for the proper tier value, based on the grouping function * fixes for unittests and load_metric_next() * fixes for lgtm findings * minor renames * add dbengine to page cache size setting * add dbengine to page cache with malloc * query engine optimized to loop as little are required based on the view_update_every * query engine grouping methods now do not assume a constant number of points per group and they allocate memory with OWA * report db points per tier in jsonwrap * query planer that switches database tiers on the fly to satisfy the query for the entire timeframe * dbegnine statistics and documentation (in progress) * calculate average point duration in db * handle single point pages the best we can * handle single point pages even better * Keep page type in the rrdeng_page_descr * updated doc * handle future backwards compatibility - improved statistics * support &tier=X in queries * enfore increasing iterations on tiers * tier 1 is always 1 iteration * backfilling higher tiers on first data collection * reversed anomaly bit * set up to 5 tiers * natural points should only be offered on tier 0, except a specific tier is selected * do not allow more than 65535 points of tier0 to be aggregated on any tier * Work only on actually activated tiers * fix query interpolation * fix query interpolation again * fix lgtm finding * Activate one tier for now * backfilling of higher tiers using raw metrics from lower tiers * fix for crash on start when storage tiers is increased from the default * more statistics on exit * fix bug that prevented higher tiers to get any values; added backfilling options * fixed the statistics log line * removed limit of 255 iterations per tier; moved the code of freezing rd->tiers[x]->db_metric_handle * fixed division by zero on zero points_wanted * removed dead code * Decide on the descr->type for the type of metric * dont store metrics on unknown page types * free db_metric_handle on sql based context queries * Disable STORAGE_POINT value check in the exporting engine unit tests * fix for db modes other than dbengine * fix for aclk archived chart queries destroying db_metric_handles of valid rrddims * fix left-over freez() instead of OWA freez on median queries Co-authored-by: Costa Tsaousis <costa@netdata.cloud> Co-authored-by: Vladimir Kobal <vlad@prokk.net>
2022-06-30Remove warnings when openssl 3 is used. (#13170)thiagoftsm
* remove_warnings_openssl_v3: Add new macro to define latest OpenSSL version * remove_warnings_openssl_v3: Add headers necessary for new API * remove_warnings_openssl_v3: Add compatible variables and adjst code inside load_private_key * remove_warnings_openssl_v3: Adjust function aclk_get_mqtt_otp according to openssl version * remove_warnings_openssl_v3: Adjust function private_decrypt * remove_warnings_openssl_v3: Fix function private_decrypt * remove_warnings_openssl_v3: Update error message * remove_warnings_openssl_v3: Update missing error message
2022-06-28Dictionaries with reference counters and full deletion support during ↵Costa Tsaousis
traversal (#13195) * dont use atomic operations when not needed; detect misuse of the the unsafe functions * use relaxed atomic operations for statistics * use relaxed atomic operations for statistics * dictionaries now use reference counters, allowing deletetions of any item while traversing it * added acquire/release interface to dictionaries * added unittest for reference counters * added NETDATA_INTERNAL_CHECKS logs to detect non-exclusive access to crusial parts of the dictionaries * dictionaries cannot be deleted while there are referenced items in them - they will be deleted once the last item gets unreferenced * cleanup * properly cleanup released items * maintain counters for readers and writers; defer all deletes on sorted walkthrough; cleaner internal_error(); * somewhat faster reference counters on single threaded dictionaries * minor optimizations; allow compiling without internal checks
2022-06-28netdata doubles (#13217)Costa Tsaousis
* netdata doubles * fix cmocka test * fix cmocka test again * fix left-overs of long double to NETDATA_DOUBLE * RRDDIM detached from disk representation; db settings in [db] section of netdata.conf * update the memory before saving * rrdset is now detached from file structures too * on memory mode map, update the memory mapped structures on every iteration * allow RRD_ID_LENGTH_MAX to be changed * granularity secs, back to update every * fix formatting * more formatting
2022-06-22Query Engine multi-granularity support (and MC improvements) (#13155)Costa Tsaousis
* set grouping functions * storage engine should check the validity of timestamps, not the query engine * calculate and store in RRDR anomaly rates for every query * anomaly rate used by volume metric correlations * mc volume should use absolute data, to avoid cancelling effect * return anomaly-rates in jasonwrap with jw-anomaly-rates option to data queries * dont return null on anomaly rates * allow passing group query options from the URL * added countif to the query engine and used it in metric correlations * fix configure * fix countif and anomaly rate percentages * added group_options to metric correlations; updated swagger * added newline at the end of yaml file * always check the time the highlighted window was above/below the highlighted window * properly track time in memory queries * error for internal checks only * moved pack_storage_number() into the storage engines * moved unpack_storage_number() inside the storage engines * remove old comment * pass unit tests * properly detect zero or subnormal values in pack_storage_number() * fill nulls before the value, not after * make sure math.h is included * workaround for isfinite() * fix for isfinite() * faster isfinite() alternative * fix for faster isfinite() alternative * next_metric() now returns end_time too * variable step implemented in a generic way * remove left-over variables * ensure we always complete the wanted number of points * fixes * ensure no infinite loop * mc-volume-improvements: Add information about invalid condition * points should have a duration in the past * removed unneeded info() line * Fix unit tests for exporting engine * new_point should only be checked when it is fetched from the db; better comment about the premature breaking of the main query loop Co-authored-by: Thiago Marques <thiagoftsm@gmail.com> Co-authored-by: Vladimir Kobal <vlad@prokk.net>
2022-06-17Fix coverity 378587 (#13024)Emmanuel Vasilakis
* check for return value of sysconf * if sysconf fails set OWA_NATURAL_PAGE_SIZE to 4096
2022-06-17