path: root/libnetdata
Age | Commit message | Author
2022-07-22 | include Judy into our source tree (#13362) | Timotej S
2022-07-19 | Fix coverity issue 379240 (Unchecked return value) (#13401) | Stelios Fragkakis
2022-07-18 | Fix chart update ebpf.plugin (#13351) | thiagoftsm
2022-07-12 | Address Coverity issues (#13364) | Stelios Fragkakis
2022-07-11 | Detect stored metric size by page type (#13334) | Stelios Fragkakis
* Report unknown page only once
  Get metric storage size by the page type
  Verify validity of the page and skip problematic ones
* Change PAGE_SIZE to PAGE_POINT_SIZE_BYTES
* Add bitmap256 and unittests
* Fix unit test tier_page_type array page_type_size arrays
* Add another counter to not rely on uint8_t overflow to stop the test loop
2022-07-08 | fix 32bit calculation on array allocator (#13343) | Costa Tsaousis
fix aral on 32bit
2022-07-08 | Better ACLK debug communication log (#13281) | Timotej S
2022-07-08 | array allocator for dbengine page descriptors (#13312) | Costa Tsaousis
* array allocator for dbengine page descriptors
* full implementation of array allocator with cleanup
* faster deallocations
* eliminate entirely the need for loops during free
* addressed comments
* lower the min number of elements to 10
2022-07-07 | Protect shared variables with log lock. (#13306) | vkalintiris
Fixes reports by helgrind:
```
==00:00:00:01.769 44512== Possible data race during read of size 8 at 0x9767B0 by thread #4
==00:00:00:01.769 44512== Locks held: none
==00:00:00:01.769 44512==    at 0x17CB56: error_log_limit (log.c:627)
==00:00:00:01.769 44512==    by 0x17CEC0: info_int (log.c:716)
==00:00:00:01.769 44512==    by 0x18949F: thread_start (threads.c:173)
==00:00:00:01.769 44512==    by 0x484A486: ??? (in /usr/libexec/valgrind/vgpreload_helgrind-amd64-linux.so)
==00:00:00:01.769 44512==    by 0x4E9CD7F: start_thread (pthread_create.c:481)
==00:00:00:01.769 44512==    by 0x532F76E: clone (clone.S:95)
==00:00:00:01.769 44512==
==00:00:00:01.769 44512== This conflicts with a previous write of size 8 by thread #3
==00:00:00:01.769 44512== Locks held: none
==00:00:00:01.769 44512==    at 0x17CB61: error_log_limit (log.c:627)
==00:00:00:01.769 44512==    by 0x17CEC0: info_int (log.c:716)
==00:00:00:01.769 44512==    by 0x18949F: thread_start (threads.c:173)
==00:00:00:01.769 44512==    by 0x484A486: ??? (in /usr/libexec/valgrind/vgpreload_helgrind-amd64-linux.so)
==00:00:00:01.769 44512==    by 0x4E9CD7F: start_thread (pthread_create.c:481)
==00:00:00:01.769 44512==    by 0x532F76E: clone (clone.S:95)
==00:00:00:01.769 44512== Address 0x9767b0 is 0 bytes inside data symbol "counter.1"
```
```
==00:00:00:44.536 47685== Lock at 0x976720 was first observed
==00:00:00:44.536 47685==    at 0x48477EF: ??? (in /usr/libexec/valgrind/vgpreload_helgrind-amd64-linux.so)
==00:00:00:44.536 47685==    by 0x17BBF4: __netdata_mutex_lock (locks.c:86)
==00:00:00:44.536 47685==    by 0x17C514: log_lock (log.c:471)
==00:00:00:44.536 47685==    by 0x17CEC0: info_int (log.c:715)
==00:00:00:44.536 47685==    by 0x458C9E: compute_multidb_diskspace (rrdenginelib.c:279)
==00:00:00:44.536 47685==    by 0x15B170: get_netdata_configured_variables (main.c:671)
==00:00:00:44.536 47685==    by 0x15CE6C: main (main.c:1263)
==00:00:00:44.536 47685== Address 0x976720 is 0 bytes inside data symbol "log_mutex"
==00:00:00:44.536 47685==
==00:00:00:44.536 47685== Possible data race during write of size 8 at 0x9767A0 by thread #1
==00:00:00:44.536 47685== Locks held: none
==00:00:00:44.536 47685==    at 0x17CB39: error_log_limit (log.c:621)
==00:00:00:44.536 47685==    by 0x15E234: signals_handle (signals.c:258)
==00:00:00:44.536 47685==    by 0x15D880: main (main.c:1534)
==00:00:00:44.536 47685==
==00:00:00:44.536 47685== This conflicts with a previous read of size 8 by thread #9
==00:00:00:44.536 47685== Locks held: 1, at address 0x976720
==00:00:00:44.536 47685==    at 0x17CAA3: error_log_limit (log.c:604)
==00:00:00:44.536 47685==    by 0x17CECA: info_int (log.c:718)
==00:00:00:44.536 47685==    by 0x4624D2: rrdset_done_push (rrdpush.c:344)
==00:00:00:44.536 47685==    by 0x36190C: rrdset_done (rrdset.c:1351)
==00:00:00:44.536 47685==    by 0x1B07E7: Chart::update(unsigned long) (plugin_profile.cc:82)
==00:00:00:44.536 47685==    by 0x1B01D4: updateCharts(std::vector<Chart*, std::allocator<Chart*> >, unsigned long) (plugin_profile.cc:126)
==00:00:00:44.536 47685==    by 0x1B02AC: profile_main (plugin_profile.cc:144)
==00:00:00:44.536 47685==    by 0x1895D4: thread_start (threads.c:185)
==00:00:00:44.536 47685== Address 0x9767a0 is 0 bytes inside data symbol "start.3"
```
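The traces above show function-local static variables (`counter.1`, `start.3`) being read and written by several threads with no lock held. A minimal sketch of the general fix pattern follows, assuming a plain pthread mutex; `log_mutex_sketch` and `log_limit_sketch()` are hypothetical names and do not reproduce the actual code in log.c.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for the log mutex named in the traces. */
static pthread_mutex_t log_mutex_sketch = PTHREAD_MUTEX_INITIALIZER;

/* Counts calls from all threads. The static counter is shared state, so
 * every read and write of it happens while the mutex is held. */
static unsigned long log_limit_sketch(void) {
    static unsigned long counter = 0;

    int rc = pthread_mutex_lock(&log_mutex_sketch);
    assert(rc == 0);
    unsigned long seen = ++counter;   /* read-modify-write under the lock */
    pthread_mutex_unlock(&log_mutex_sketch);

    return seen;                      /* a private copy, safe to use unlocked */
}
```

Copying the value into a local before unlocking keeps the caller from touching the shared variable outside the critical section.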
2022-07-06 | Multi-Tier database backend for long term metrics storage (#13263) | Stelios Fragkakis
* Tier part 1
* Tier part 2
* Tier part 3
* Tier part 4
* Tier part 5
* Fix some ML compilation errors
* fix more conflicts
* pass proper tier
* move metric_uuid from state to RRDDIM
* move aclk_live_status from state to RRDDIM
* move ml_dimension from state to RRDDIM
* abstracted the data collection interface
* support flushing for mem db too
* abstracted the query api
* abstracted latest/oldest time per metric
* cleanup
* store_metric for tier1
* fix for store_metric
* allow multiple tiers, more than 2
* state to tier
* Change storage type in db. Query param to request min, max, sum or average
* Store tier data correctly
* Fix skipping tier page type
* Add tier grouping in the tier
* Fix to handle archived charts (part 1)
* Temp fix for query granularity when requesting tier1 data
* Fix parameters in the correct order and calculate the anomaly based on the anomaly count
* Proper tiering grouping
* Anomaly calculation based on anomaly count
* force type checking on storage handles
* update cmocka tests
* fully dynamic number of storage tiers
* fix static allocation
* configure grouping for all tiers; disable tiers for unittest; disable statsd configuration for private charts mode
* use default page dt using the tiering info
* automatic selection of tier
* fix for automatic selection of tier
* working prototype of dynamic tier selection
* automatic selection of tier done right (I hope)
* ask for the proper tier value, based on the grouping function
* fixes for unittests and load_metric_next()
* fixes for lgtm findings
* minor renames
* add dbengine to page cache size setting
* add dbengine to page cache with malloc
* query engine optimized to loop as little as required based on the view_update_every
* query engine grouping methods now do not assume a constant number of points per group and they allocate memory with OWA
* report db points per tier in jsonwrap
* query planner that switches database tiers on the fly to satisfy the query for the entire timeframe
* dbengine statistics and documentation (in progress)
* calculate average point duration in db
* handle single point pages the best we can
* handle single point pages even better
* Keep page type in the rrdeng_page_descr
* updated doc
* handle future backwards compatibility - improved statistics
* support &tier=X in queries
* enforce increasing iterations on tiers
* tier 1 is always 1 iteration
* backfilling higher tiers on first data collection
* reversed anomaly bit
* set up to 5 tiers
* natural points should only be offered on tier 0, except a specific tier is selected
* do not allow more than 65535 points of tier0 to be aggregated on any tier
* Work only on actually activated tiers
* fix query interpolation
* fix query interpolation again
* fix lgtm finding
* Activate one tier for now
* backfilling of higher tiers using raw metrics from lower tiers
* fix for crash on start when storage tiers is increased from the default
* more statistics on exit
* fix bug that prevented higher tiers to get any values; added backfilling options
* fixed the statistics log line
* removed limit of 255 iterations per tier; moved the code of freezing rd->tiers[x]->db_metric_handle
* fixed division by zero on zero points_wanted
* removed dead code
* Decide on the descr->type for the type of metric
* dont store metrics on unknown page types
* free db_metric_handle on sql based context queries
* Disable STORAGE_POINT value check in the exporting engine unit tests
* fix for db modes other than dbengine
* fix for aclk archived chart queries destroying db_metric_handles of valid rrddims
* fix left-over freez() instead of OWA freez on median queries
Co-authored-by: Costa Tsaousis <costa@netdata.cloud>
Co-authored-by: Vladimir Kobal <vlad@prokk.net>
2022-06-30 | Remove warnings when openssl 3 is used. (#13170) | thiagoftsm
* remove_warnings_openssl_v3: Add new macro to define latest OpenSSL version
* remove_warnings_openssl_v3: Add headers necessary for new API
* remove_warnings_openssl_v3: Add compatible variables and adjust code inside load_private_key
* remove_warnings_openssl_v3: Adjust function aclk_get_mqtt_otp according to openssl version
* remove_warnings_openssl_v3: Adjust function private_decrypt
* remove_warnings_openssl_v3: Fix function private_decrypt
* remove_warnings_openssl_v3: Update error message
* remove_warnings_openssl_v3: Update missing error message
2022-06-28 | Dictionaries with reference counters and full deletion support during traversal (#13195) | Costa Tsaousis
* dont use atomic operations when not needed; detect misuse of the unsafe functions
* use relaxed atomic operations for statistics
* use relaxed atomic operations for statistics
* dictionaries now use reference counters, allowing deletions of any item while traversing it
* added acquire/release interface to dictionaries
* added unittest for reference counters
* added NETDATA_INTERNAL_CHECKS logs to detect non-exclusive access to crucial parts of the dictionaries
* dictionaries cannot be deleted while there are referenced items in them - they will be deleted once the last item gets unreferenced
* cleanup
* properly cleanup released items
* maintain counters for readers and writers; defer all deletes on sorted walkthrough; cleaner internal_error();
* somewhat faster reference counters on single threaded dictionaries
* minor optimizations; allow compiling without internal checks
2022-06-28 | netdata doubles (#13217) | Costa Tsaousis
* netdata doubles
* fix cmocka test
* fix cmocka test again
* fix left-overs of long double to NETDATA_DOUBLE
* RRDDIM detached from disk representation; db settings in [db] section of netdata.conf
* update the memory before saving
* rrdset is now detached from file structures too
* on memory mode map, update the memory mapped structures on every iteration
* allow RRD_ID_LENGTH_MAX to be changed
* granularity secs, back to update every
* fix formatting
* more formatting
2022-06-22 | Query Engine multi-granularity support (and MC improvements) (#13155) | Costa Tsaousis
* set grouping functions
* storage engine should check the validity of timestamps, not the query engine
* calculate and store in RRDR anomaly rates for every query
* anomaly rate used by volume metric correlations
* mc volume should use absolute data, to avoid cancelling effect
* return anomaly-rates in jsonwrap with jw-anomaly-rates option to data queries
* dont return null on anomaly rates
* allow passing group query options from the URL
* added countif to the query engine and used it in metric correlations
* fix configure
* fix countif and anomaly rate percentages
* added group_options to metric correlations; updated swagger
* added newline at the end of yaml file
* always check the time the highlighted window was above/below the highlighted window
* properly track time in memory queries
* error for internal checks only
* moved pack_storage_number() into the storage engines
* moved unpack_storage_number() inside the storage engines
* remove old comment
* pass unit tests
* properly detect zero or subnormal values in pack_storage_number()
* fill nulls before the value, not after
* make sure math.h is included
* workaround for isfinite()
* fix for isfinite()
* faster isfinite() alternative
* fix for faster isfinite() alternative
* next_metric() now returns end_time too
* variable step implemented in a generic way
* remove left-over variables
* ensure we always complete the wanted number of points
* fixes
* ensure no infinite loop
* mc-volume-improvements: Add information about invalid condition
* points should have a duration in the past
* removed unneeded info() line
* Fix unit tests for exporting engine
* new_point should only be checked when it is fetched from the db; better comment about the premature breaking of the main query loop
Co-authored-by: Thiago Marques <thiagoftsm@gmail.com>
Co-authored-by: Vladimir Kobal <vlad@prokk.net>
2022-06-17 | Fix coverity 378587 (#13024) | Emmanuel Vasilakis
* check for return value of sysconf
* if sysconf fails set OWA_NATURAL_PAGE_SIZE to 4096
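The two items above describe checking the return value of `sysconf()` and falling back to 4096. A minimal sketch of that pattern, assuming the page size is queried with `_SC_PAGESIZE`; the function name `natural_page_size_sketch()` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Return the system page size, or 4096 when sysconf() cannot report it.
 * sysconf() returns -1 on error; any non-positive value is treated as a
 * failure here so the result can never be zero or negative. */
static size_t natural_page_size_sketch(void) {
    long ret = sysconf(_SC_PAGESIZE);
    if (ret <= 0)
        return 4096;
    return (size_t)ret;
}
```

Without the check, a failed call would turn -1 into a huge unsigned size once converted to `size_t`.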
2022-06-17 | allow traversing null-value dictionaries (#13162) | Costa Tsaousis
* allow traversing null-value dictionaries
* fix lgtm report
* void the value too
* removed NEVERNULL directive
2022-06-15 | fix crashes due to misaligned allocations (#13137) | Costa Tsaousis
2022-06-13 | 73x times faster metrics correlations at the agent (#13107) | Costa Tsaousis
* faster correlations
* 4x times faster correlations
* a little bit more help
* 10x times faster metrics correlations
* 6 digits precision; better comments
* enabled metrics correlations by default
* abstracted DIFFS_NUMBER to allow easily changing it
* reworked the entire logic to have more accuracy and support a baseline that is power of two multiple of highlight
* properly calculate shifts
* even more improved version
* added support for timeout; fixed another memory leak; skipped hidden dimensions
* default timeout 1min
* reduce memory even further
* use dictionary for the list of charts and optimize locks
* return 403 forbidden, when mc is not enabled
* added query options
* dont process zero dimensions
* added volume method as an option to metric correlations; now metric correlations can support multiple implementations
* make sure we will never crash
* spread results evenly for both kstwo and volume
* fixed bug in query engine that was missing misaligned queries when a single point was requested from the db; improved comments; improved query flags
* updated swagger and added sane defaults; query options are now supported, including anomaly-bit
* added "raw" option to allow cross node correlations; added "group" option to allow different time aggregations; allowed calling metric correlations without any parameters; allowed calling metric correlations with relative timestamps; added timeout to volume method; properly handled timeout on ks2 method; json output now sends all parameters back - same for json_wrap; modified query engine to use present time for relative timestamps; modified "allow_past" to mean both past backwards and forwards
* emulate the old behaviour about zero points
* 100% accuracy against python ks_2samp(); now the default is volume and the default points are 500
* added config option to change default metric correlations method
* removed work-arounds now that rrdlabels are merged
2022-06-13 | Labels with dictionary (#13070) | Costa Tsaousis
* squashed and rebased to master
* fix overflow and single character bug in sanitize; include rrd.h instead of node_info.h
* added unittest for UTF-8 multibyte sanitization
* Fix unit test compilation
* Fix CMake build
* remove double sanitizer for opentsdb; cleanup sanitize_json_string()
* rename error_description to error_message to avoid conflict with json-c
* revert last and undef error_description from json-c
* more unittests; attempt to fix protobuf map issue
* get rid of rrdlabels_get() and replace it with a safe version that writes the value to a buffer
* added dictionary sorting unittest; rrdlabels_to_buffer() now is sorted
* better sorted dictionary checking
* proper unittesting for sorted dictionaries
* call dictionary deletion callback when destroying the dictionary
* remove obsolete variable
* Fix exporting unit tests
* Fix k8s label parsing test
* workaround for cmocka and strdupz()
* Bypass cmocka memory allocation check
* Revert "Bypass cmocka memory allocation check"
  This reverts commit 4c49923839d9229bea23ca914dd8a0be1ebe2bf4.
* Revert "workaround for cmocka and strdupz()"
  This reverts commit 7bebee04801db1865c748a7896d5fa54bb7104a5.
* Bypass cmocka memory allocation checks
* respect json formatting for chart labels
* cloud sends colons
* print the value only once
* allow parenthesis in values and spaces; make stream sender send quotes for values
Co-authored-by: Vladimir Kobal <vlad@prokk.net>
2022-06-02 | add the ability to merge dictionary items (#13054) | Costa Tsaousis
* add the ability to merge old value and new value
* docs
* merge to conflict
2022-06-02 | dictionary improvements (#13052) | Costa Tsaousis
* fix typo in foreach write; added unit tests to traverse empty dictionaries
* rename variable dfe in macro to be uniform with name variable
2022-06-02 | Fix dictionary crash walkthrough empty (#13051) | Costa Tsaousis
fix dictionary walkthrough crash when the dictionary is empty
2022-06-01 | coverity fixes about statsd; removal of strsame (#13049) | Costa Tsaousis
2022-06-01 | Dictionary with JudyHS and double linked list (#13032) | Costa Tsaousis
* dictionary internals isolation
* more dictionary cleanups
* added unit test
* we should use DICT internally
* disable cups in cmake
* implement DICTIONARY with Judy arrays
* operational JUDY implementation
* JUDY cleanup
* JUDY summary added
* JudyHS implementation with double linked list
* test negative searches too
* optimize destruction
* optimize set to insert first without lookup
* updated stats
* code cleanup; better organization; updated info
* more code cleanup and commenting
* more cleanup, renames and comments
* fix rename
* more cleanups
* use Judy.h from system paths
* added foreach traversal; added flag to add item in front; isolated locks to their own functions; destruction returns the number of bytes freed
* more comments; flags are now 16-bit
* completed unittesting
* addressed comments and added reference counters maintenance
* added unittest in main; tested removal of items in front, back and middle
* added read/write walkthrough and foreach; allowed walkthrough and foreach in write mode to delete the current element (used by cups.plugin); referenced counters removed from the API
* DICTFE.name should be const too
* added API calls for exposing all statistics
* dictionary flags as enum and reference counters as atomic operations
* more comments; improved error handling at unit tests
* added functions to allow unsafe access while traversing the dictionary with locks in place
* check for libcups in cmake
* added delete callback; implemented statsd with this dictionary
* added missing dfe_done()
* added alternative implementation with AVL
* added documentation
* added comments and warning about AVL
* dictionary walkthrough on new code
* simplified foreach; updated docs
* updated docs
* AVL is much faster without hashes
* AVL should follow DBENGINE
2022-06-01 | Fix disabled apps (ebpf.plugin) (#13044) | thiagoftsm
2022-05-25 | add dictionary support to statsd (#12980) | Costa Tsaousis
* add dictionary support to statsd
* faster statsd sets and dictionaries; disabled events dimensions by default
* properly detect tags, even without a sampling rate
* statsd assumes a pipe between fields
* missing param
* allow names without values and support more unknown fields
* more parser fixes
* support multiple tags; remove the sum from the dimensions of histograms and timers, but keep it for synthetic charts
* Parse statsd tags and support changing units of private charts
* remove debug code
* added support for naming dimensions too
* updated docs
* added support for family tags
* updated docs
2022-05-24 | Fix nanosleep on platforms other than Linux (#12991) | Vladimir Kobal
2022-05-24 | Fix compilation warnings (#12993) | Vladimir Kobal
2022-05-24 | Faster queries (#12988) | Costa Tsaousis
* faster rrdeng_load_metric_next()
* no need to check validity for number - already done at the query side
* solve discrepancy between query create and free
* inline unpack_storage_number
2022-05-21 | optimize poll_events() to spread the work over the threads more evenly (#12975) | Costa Tsaousis
* optimize poll_events() to spread the work over the threads more evenly
* fixed typos, code cleanup
* better error handling
* prevent crash in case callbacks manipulate the sockets arrays - added warnings
2022-05-19 | Suppress warning when freeing a NULL pointer in onewayalloc_freez (#12955) | Stelios Fragkakis
2022-05-18 | Prevent command_to_be_logged from overflowing (#12947) | Emmanuel Vasilakis
* prevent command_to_be_logged from overflowing
* dont access with size
2022-05-17 | feat: move dirs, logs, and env vars config options to separate sections (#12935) | Ilya Mashchenko
2022-05-16 | user configurable sqlite PRAGMAs (#12917) | Costa Tsaousis
* user configurable sqlite PRAGMAs
* added cache size
2022-05-16 | fix `[global statistics]` section in netdata.conf (#12916) | Ilya Mashchenko
2022-05-14 | Fix compilation warnings in FreeBSD (#12887) | Vladimir Kobal
2022-05-13 | chore(worker_utilization): log an error when re-registering an already registered job (#12903) | Ilya Mashchenko
2022-05-10 | fix for negative per job busy time (#12867) | Costa Tsaousis
2022-05-10 | workers fixes and improvements (#12863) | Costa Tsaousis
2022-05-09 | Workers utilization charts (#12807) | Costa Tsaousis
* initial version of worker utilization
* working example
* without mutexes
* monitoring DBENGINE, ACLKSYNC, WEB workers
* added charts to monitor worker usage
* fixed charts units
* updated contexts
* updated priorities
* added documentation
* converted threads to stacked chart
* One query per query thread
* Revert "One query per query thread"
  This reverts commit 6aeb391f5987c3c6ba2864b559fd7f0cd64b14d3.
* fixed priority for web charts
* read worker cpu utilization from proc
* read workers cpu utilization via /proc/self/task/PID/stat, so that we have cpu utilization even when the jobs are too long to finish within our update_every frequency
* disabled web server cpu utilization monitoring - it is now monitored by worker utilization
* tight integration of worker utilization to web server
* monitoring statsd worker threads
* code cleanup and renaming of variables
* constrained worker and statistics conflict to just one variable
* support for rendering jobs per type
* better priorities and removed the total jobs chart
* added busy time in ms per job type
* added proc.plugin monitoring, switch clock to MONOTONIC_RAW if available, global statistics now cleans up old worker threads
* isolated worker thread families
* added cgroups.plugin workers
* remove unneeded dimensions when the expected worker is just one
* plugins.d and streaming monitoring
* rebased; support worker_is_busy() to be called one after another
* added diskspace plugin monitoring
* added tc.plugin monitoring
* added ML threads monitoring
* dont create dimensions and charts that are not needed
* fix crash when job types are added on the fly
* added timex and idlejitter plugins; collected heartbeat statistics; reworked heartbeat according to the POSIX
* the right name is heartbeat for this chart
* monitor streaming senders
* added streaming senders to global stats
* prevent division by zero
* added clock_init() to external C plugins
* added freebsd and macos plugins
* added freebsd and macos to global statistics
* dont use new as a variable; address compiler warnings on FreeBSD and MacOS
* refactored contexts to be unique; added health threads monitoring
Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
2022-05-06 | Remove sync warning (#12831) | thiagoftsm
2022-05-05 | Improve storage number unpacking by using a lookup table. (#11048) | vkalintiris
The LUT contains precomputed values of multiplier/divisors that are used to unpack storage numbers into calculated numbers efficiently.
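The lookup-table idea can be sketched as follows. The packed layout used here is hypothetical (a 24-bit mantissa, a 3-bit decimal exponent and a 1-bit divide flag) and is not netdata's storage_number format; the point is only that the multiplier or divisor is computed once per table slot instead of once per unpacked value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout, for illustration only:
 *   bits  0..23  mantissa
 *   bits 24..26  decimal exponent (0..7)
 *   bit  27      1 = divide by 10^exponent, 0 = multiply by it */
static double unpack_lut_sketch[16];

/* Precompute every factor once, indexed by the 4 control bits. */
static void unpack_lut_sketch_init(void) {
    for (int flag = 0; flag < 2; flag++) {
        double factor = 1.0;
        for (int exponent = 0; exponent < 8; exponent++) {
            unpack_lut_sketch[(flag << 3) | exponent] = flag ? 1.0 / factor : factor;
            factor *= 10.0;
        }
    }
}

/* Unpacking is one mask, one shift, one table load and one multiplication;
 * no loop or pow() call is needed per value. */
static double unpack_sketch(uint32_t packed) {
    uint32_t mantissa = packed & 0x00FFFFFFu;
    return (double)mantissa * unpack_lut_sketch[(packed >> 24) & 0x0Fu];
}
```

`unpack_lut_sketch_init()` must run once before the first call to `unpack_sketch()`.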
2022-05-04 | Broadcast completion before unlocking condition variable's mutex (#12822) | vkalintiris
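A minimal sketch of the ordering this commit title describes, assuming a completion object built from a pthread mutex and condition variable. The `completion_sketch` names are hypothetical, and the rationale given in the comments is a common general one, not taken from the commit itself.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct completion_sketch {
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    bool completed;
};

static void completion_sketch_init(struct completion_sketch *c) {
    assert(c != NULL);
    pthread_mutex_init(&c->mutex, NULL);
    pthread_cond_init(&c->cond, NULL);
    c->completed = false;
}

/* Broadcast while still holding the mutex. If the broadcast came after the
 * unlock, a waiter could observe completed == true, return, and release the
 * object before the broadcast touches the condition variable. */
static void completion_sketch_mark_complete(struct completion_sketch *c) {
    pthread_mutex_lock(&c->mutex);
    c->completed = true;
    pthread_cond_broadcast(&c->cond);
    pthread_mutex_unlock(&c->mutex);
}

/* Wait in a loop: condition variables may wake up spuriously. */
static void completion_sketch_wait(struct completion_sketch *c) {
    pthread_mutex_lock(&c->mutex);
    while (!c->completed)
        pthread_cond_wait(&c->cond, &c->mutex);
    pthread_mutex_unlock(&c->mutex);
}

/* Thread entry point used to signal the completion from another thread. */
static void *completion_sketch_worker(void *arg) {
    completion_sketch_mark_complete(arg);
    return NULL;
}
```

A waiter calls `completion_sketch_wait()` and a worker thread calls `completion_sketch_mark_complete()` when its job is done.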
2022-05-03 | onewayallocator to use mallocz() instead of mmap() (#12810) | Costa Tsaousis
2022-05-03 | Trace rwlocks of netdata (#12785) | Costa Tsaousis
* with -DNETDATA_INTERNAL_CHECKS=1 enable rwlocks tracing
* fix strings alignment on terminal
* remove wrong addition
* removed formatting warning; now counting active locks per thread; tracing is enabled with -DNETDATA_TRACE_RWLOCKS=1
* added the missing netdata_mutex_destroy()
* optimized clocks usage in locks
* added also main
* fixed formatting warning
* add compiler warning when compiling with -DNETDATA_TRACE_RWLOCKS=1
* cleanup and documentation
* fix for old variable
* >= not just > to allow proper comparisons
* dont print 0x twice and print the lock pointer on every line
* trace locks deeper
2022-05-03 | One way allocator to double the speed of parallel context queries (#12787) | Costa Tsaousis
* one way allocator to speed up context queries
* fixed a bug while expanding memory pages
* reworked for clarity and finally fixed the bug of allocating memory beyond the page size
* further optimize allocation step to minimize the number of allocations made
* implement strdup with memcpy instead of strcpy
* added documentation
* prevent an uninitialized use of owa
* added callocz() interface
* integrate onewayalloc everywhere - apart sql queries
* one way allocator is now used in context queries using archived charts in sql
* align on the size of pointers
* forgotten freez()
* removed not needed memcpys
* give unique names to global variables to avoid conflicts with system definitions
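The items above (pages, pointer-size alignment, strdup via memcpy, no per-item free) can be illustrated with a minimal bump-allocator sketch. All `owa_sketch` names are hypothetical and the design is an assumption for illustration, not the actual onewayalloc code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* One page of the arena; the payload bytes follow the header. */
typedef struct owa_sketch_page {
    struct owa_sketch_page *next;
    size_t size;   /* payload capacity in bytes */
    size_t used;   /* payload bytes handed out so far */
} owa_sketch_page;

typedef struct {
    owa_sketch_page *head;  /* most recent page; allocations come from here */
    size_t page_size;       /* default payload size for new pages */
} owa_sketch;

static void *owa_sketch_alloc(owa_sketch *owa, size_t size) {
    assert(owa != NULL);
    size_t align = sizeof(void *);
    size = (size + align - 1) & ~(align - 1);   /* round up to pointer size */

    owa_sketch_page *p = owa->head;
    if (!p || p->size - p->used < size) {
        /* Oversized requests get a page of their own, so an allocation
         * never runs past the end of its page. */
        size_t payload = size > owa->page_size ? size : owa->page_size;
        p = malloc(sizeof(*p) + payload);
        if (!p) abort();
        p->size = payload;
        p->used = 0;
        p->next = owa->head;
        owa->head = p;
    }
    void *mem = (char *)(p + 1) + p->used;      /* bump the page cursor */
    p->used += size;
    return mem;
}

/* strdup built on memcpy: the length is known, so one bulk copy suffices. */
static char *owa_sketch_strdup(owa_sketch *owa, const char *s) {
    size_t len = strlen(s) + 1;
    return memcpy(owa_sketch_alloc(owa, len), s, len);
}

/* Individual frees are no-ops; everything is released in one pass. */
static void owa_sketch_destroy(owa_sketch *owa) {
    while (owa->head) {
        owa_sketch_page *next = owa->head->next;
        free(owa->head);
        owa->head = next;
    }
}
```

The trade-off is that memory is only returned when the whole arena is destroyed, which suits short-lived work such as a single query.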
2022-05-03 | Speed up BUFFER increases (minimize reallocs) (#12792) | Costa Tsaousis
* speedup BUFFER increases by forward looking reallocs
* implemented buffer_vsprintf() and optimized buffer_sprintf() to minimize calls to vsnprintfz()
* optimize json generation for well known strings
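One way to read "forward looking reallocs" is to grow the buffer by more than the immediate need, so that later appends fit without another `realloc()`. The sketch below assumes a doubling policy and a 64-byte starting size; both are assumptions, and `buffer_sketch` is a hypothetical type, not netdata's BUFFER.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char *data;    /* NUL-terminated contents, or NULL when empty */
    size_t len;    /* bytes used, excluding the terminator */
    size_t size;   /* bytes allocated */
} buffer_sketch;

/* Make room for `extra` more bytes plus the terminator. The capacity is
 * doubled until it fits, so n appends cost O(log n) reallocs, not n. */
static void buffer_sketch_need(buffer_sketch *b, size_t extra) {
    assert(b != NULL);
    size_t wanted = b->len + extra + 1;
    if (wanted <= b->size)
        return;

    size_t new_size = b->size ? b->size : 64;
    while (new_size < wanted)
        new_size *= 2;

    char *p = realloc(b->data, new_size);
    if (!p) abort();
    b->data = p;
    b->size = new_size;
}

static void buffer_sketch_strcat(buffer_sketch *b, const char *s) {
    size_t n = strlen(s);
    buffer_sketch_need(b, n);
    memcpy(b->data + b->len, s, n + 1);   /* copies the terminator too */
    b->len += n;
}
```

A zero-initialized `buffer_sketch` is a valid empty buffer; the caller releases `data` with `free()` when done.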
2022-05-02 | procfile: more comfortable initial settings and faster/fewer reallocs (#12791) | Costa Tsaousis
2022-05-02 | Don't use MADV_DONTDUMP on non-linux builds (#12795) | vkalintiris
2022-04-28 | faster execution of external programs (#12759) | Costa Tsaousis
* faster invocation of external plugins by eliminating the need for starting /bin/sh and then the command
* added missing parameter
* prefer the z function
* cleanup and clarity - addressed LGTM issue
* simplified the popen() interface a bit, to make it more predictable for future uses
* removed commented old code
* more comments cleanup
* mypopen_raw() added for completeness - it is not currently used
* simplified the mypopen_raw() interface even further
* Update libnetdata/popen/popen.c
  Co-authored-by: Vladimir Kobal <vlad@prokk.net>
* restored 0 flags for netdata_spawn() and cosmetic changes
* added more clarity to the code and reverted old behavior of all other execution of commands
Co-authored-by: Vladimir Kobal <vlad@prokk.net>
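The first item above describes running the command directly instead of starting /bin/sh first. A minimal sketch of that idea using POSIX `posix_spawnp()` with an argument vector; `run_direct_sketch()` is a hypothetical helper and does not reproduce the mypopen interface in popen.c.

```c
#include <assert.h>
#include <spawn.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Run argv[0] directly with the given arguments and wait for it.
 * No shell is involved: one process is created instead of two, and the
 * arguments are passed as-is, without shell parsing or quoting.
 * Returns the exit code, or -1 if the program could not be run or did
 * not exit normally. */
static int run_direct_sketch(char *const argv[]) {
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid;
    if (posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ) != 0)
        return -1;

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The caller has to split the command into an argument vector itself, since nothing expands wildcards, pipes or variables on its behalf.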