summaryrefslogtreecommitdiffstats
path: root/streaming/receiver.c
AgeCommit message (Collapse)Author
2022-10-05Allow netdata plugins to expose functions for querying more information ↵Costa Tsaousis
about specific charts (#13720) * function renames and code cleanup in popen.c; no actual code changes * netdata popen() now opens both child process stdin and stdout and returns FILE * for both * pass both input and output to parser structures * updated rrdset to call custom functions * RRDSET FUNCTION leading calls for both sync and async operation * put RRDSET functions to a separate file * added format and timeout at function definition * support for synchronous (internal plugins) and asynchronous (external plugins and children) functions * /api/v1/function endpoint * functions are now attached to the host and there is a dictionary view per chart * functions implemented at plugins.d * remove the defer until keyword hook from plugins.d when it is done * stream sender implementation of functions * sanitization of all functions so that certain characters are only allowed * strictier sanitization * common max size * 1st working plugins.d example * always init inflight dictionary * properly destroy dictionaries to avoid parallel insertion of items * add more debugging on disconnection reasons * add more debugging on disconnection reasons again * streaming receiver respects newlines * dont use the same fp for both streaming receive and send * dont free dbengine memory with internal checks * make sender proceed in the buffer * added timing info and garbage collection at plugins.d * added info about routing nodes * added info about routing nodes with delay * added more info about delays * added more info about delays again * signal sending thread to wake up * streaming version labeling and commented code to support capabilities * added functions to /api/v1/data, /api/v1/charts, /api/v1/chart, /api/v1/info * redirect top output to stdout * address coverity findings * fix resource leaks of popen * log attempts to connect to individual destinations * better messages * properly parse destinations * try to find a function from the most matching to the least matching * log added streaming destinations * rotate destinations bypassing a node in the middle that does not accept our connection * break the loops properly * use typedef to define callbacks * capabilities negotiation during streaming * functions exposed upstream based on capabilities; compression disabled per node persisting reconnects; always try to connect with all capabilities * restore functionality to lookup functions * better logging of capabilities * remove old versions from capabilities when a newer version is there * fix formatting * optimization for plugins.d rrdlabels to avoid creating and destructing dictionaries all the time * delayed health initialization for rrddim and rrdset * cleanup health initialization * fix for popen() not returning the right value * add health worker jobs for initializing rrdset and rrddim * added content type support for functions; apps.plugin permanent function to display all the processes * fixes for functions parameters parsing in apps.plugin * fix for process matching in apps.plugiin * first working function for apps.plugin * Dashboard ACL is disabled for functions; Function errors are all in JSON format * apps.plugin function processes returns json table * use json_escape_string() to escape message * fix formatting * apps.plugin exposes all its metrics to function processes * fix json formatting when filtering out some rows * reopen the internal pipe of rrdpush in case of errors * misplaced statement * do not use buffer->len * support for GLOBAL functions (functions that are not linked to a chart * added /api/v1/functions endpoint; removed format from the FUNCTIONS api; * swagger documentation about the new api end points * added plugins.d documentation about functions * never re-close a file * remove uncessesary ifdef * fixed issues identified by codacy * fix for null label value * make edit-config copy-and-paste friendly * Revert "make edit-config copy-and-paste friendly" This reverts commit 54500c0e0a97f65a0c66c4d34e966f6a9056698e. * reworked sender handshake to fix coverity findings * timeout is zero, for both send_timeout() and recv_timeout() * properly detect that parent closed the socket * support caching of function responses; limit function response to 10MB; added protection from malformed function responses * disabled excessive logging * added units to apps.plugin function processes and normalized all values to be human readable * shorter field names * fixed issues reported * fixed apps.plugin error response; tested that pluginsd can properly handle faulty responses * use double linked list macros for double linked list management * faster apps.plugin function printing by minimizing file operations * added memory percentage * fix compatibility issues with older compilers and FreeBSD * rrdpush sender code cleanup; rrhost structure cleanup from sender flags and variables; * fix letftover variable in ifdef * apps.plugin: do not call detach from the thread; exit immediately when input is broken * exclude AR charts from health * flush cleaner; prefer sender output * clarity * do not fill the cbuffer if not connected * fix * dont enabled host->sender if streaming is not enabled; send host label updates to parent; * functions are only available through ACLK * Prepared statement reports only in dev mode * fix AR chart detection * fix for streaming not being enabling itself * more cleanup of sender and receiver structures * moved read-only flags and configuration options to rrdhost->options * fixed merge with master * fix for incomplete rename * prevent service thread from working on charts that are being collected Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
2022-09-05Deduplicate all netdata strings (#13570)Costa Tsaousis
* rrdfamily * rrddim * rrdset plugin and module names * rrdset units * rrdset type * rrdset family * rrdset title * rrdset title more * rrdset context * rrdcalctemplate context and removal of context hash from rrdset * strings statistics * rrdset name * rearranged members of rrdset * eliminate rrdset name hash; rrdcalc chart converted to STRING * rrdset id, eliminated rrdset hash * rrdcalc, alarm_entry, alert_config and some of rrdcalctemplate * rrdcalctemplate * rrdvar * eval_variable * rrddimvar and rrdsetvar * rrdhost hostname, os and tags * fix master commits * added thread cache; implemented string_dup without locks * faster thread cache * rrdset and rrddim now use dictionaries for indexing * rrdhost now uses dictionary * rrdfamily now uses DICTIONARY * rrdvar using dictionary instead of AVL * allocate the right size to rrdvar flag members * rrdhost remaining char * members to STRING * * better error handling on indexing * strings now use a read/write lock to allow parallel searches to the index * removed AVL support from dictionaries; implemented STRING with native Judy calls * string releases should be negative * only 31 bits are allowed for enum flags * proper locking on strings * string threading unittest and fixes * fix lgtm finding * fixed naming * stream chart/dimension definitions at the beginning of a streaming session * thread stack variable is undefined on thread cancel * rrdcontext garbage collect per host on startup * worker control in garbage collection * relaxed deletion of rrdmetrics * type checking on dictfe * netdata chart to monitor rrdcontext triggers * Group chart label updates * rrdcontext better handling of collected rrdsets * rrdpush incremental transmition of definitions should use as much buffer as possible * require 1MB per chart * empty the sender buffer before enabling metrics streaming * fill up to 50% of buffer * reset signaling metrics sending * use the shared variable for status * use separate host flag for enabling streaming of metrics * make sure the flag is clear * add logging for streaming * add logging for streaming on buffer overflow * circular_buffer proper sizing * removed obsolete logs * do not execute worker jobs if not necessary * better messages about compression disabling * proper use of flags and updating rrdset last access time every time the obsoletion flag is flipped * monitor stream sender used buffer ratio * Update exporting unit tests * no need to compare label value with strcmp * streaming send workers now monitor bandwidth * workers now use strings * streaming receiver monitors incoming bandwidth * parser shift of worker ids * minor fixes * Group chart label updates * Populate context with dimensions that have data * Fix chart id * better shift of parser worker ids * fix for streaming compression * properly count received bytes * ensure LZ4 compression ring buffer does not wrap prematurely * do not stream empty charts; do not process empty instances in rrdcontext * need_to_send_chart_definition() does not need an rrdset lock any more * rrdcontext objects are collected, after data have been written to the db * better logging of RRDCONTEXT transitions * always set all variables needed by the worker utilization charts * implemented double linked list for most objects; eliminated alarm indexes from rrdhost; and many more fixes * lockless strings design - string_dup() and string_freez() are totally lockless when they dont need to touch Judy - only Judy is protected with a read/write lock * STRING code re-organization for clarity * thread_cache improvements; double numbers precision on worker threads * STRING_ENTRY now shadown STRING, so no duplicate definition is required; string_length() renamed to string_strlen() to follow the paradigm of all other functions, STRING internal statistics are now only compiled with NETDATA_INTERNAL_CHECKS * rrdhost index by hostname now cleans up; aclk queries of archieved hosts do not index hosts * Add index to speed up database context searches * Removed last_updated optimization (was also buggy after latest merge with master) Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com> Co-authored-by: Vladimir Kobal <vlad@prokk.net>
2022-07-24Rrdcontext (#13335)Costa Tsaousis
* type checking on dictionary return values * first STRING implementation, used by DICTIONARY and RRDLABEL * enable AVL compilation of STRING * Initial functions to store context info * Call simple test functions * Add host_id when getting charts * Allow host to be null and in this case it will process the localhost * Simplify init Do not use strdupz - link directly to sqlite result set * Init the database during startup * make it compile - no functionality yet * intermediate commit * intermidiate * first interface to sql * loading instances * check if we need to update cloud * comparison of rrdcontext on conflict * merge context titles * rrdcontext public interface; statistics on STRING; scratchpad on DICTIONARY * dictionaries maintain version numbers; rrdcontext api * cascading changes * first operational cleanup * string unittest * proper cleanup of referenced dictionaries * added rrdmetrics * rrdmetric starting retention * Add fields to context Adjuct context creation and delete * Memory cleanup * Fix get context list Fix memory double free in tests Store context with two hosts * calculated retention * rrdcontext retention with collection * Persist database and shutdown * loading all from sql * Get chart list and dimension list changes * fully working attempt 1 * fully working attempt 2 * missing archived flag from log * fixed archived / collected * operational * proper cleanup * cleanup - implemented all interface functions - dictionary react callback triggers after the dictionary is unlocked * track all reasons for changes * proper tracking of reasons of changes * fully working thread * better versioning of contexts * fix string indexing with AVL * running version per context vs hub version; ifdef dbengine * added option to disable rrdmetrics * release old context when a chart changes context * cleanup properly * renamed config * cleanup contexts; general cleanup; * deletion inline with dequeue; lots of cleanup; child connected/disconnected * ml should start after rrdcontext * added missing NULL to ri->rrdset; rrdcontext flags are now only changed under a mutex lock * fix buggy STRING under AVL * Rework database initialization Add migration logic to the context database * fix data race conditions during context deletion * added version hash algorithm * fix string over AVL * update aclk-schemas * compile new ctx related protos * add ctx stream message utils * add context messages * add dummy rx message handlers * add the new topics * add ctx capability * add helper functions to send the new messages * update cmake build to not fail * update topic names * handle rrdcontext_enabled * add more functions * fatal on OOM cases instead of return NULL * silence unknown query type error * fully working attempt 1 * fully working attempt 2 * allow compiling without ACLK * added family to the context * removed excess character in UUID * smarter merging of titles and families * Database migration code to add family Add family to SQL_CHART_DATA and VERSIONED_CONTEXT_DATA * add family to context message * enable ctx in communication * hardcoded enabled contexts * Add hard code for CTX * add update node collectors to json * add context message log * fix log about last_time_t * fix collected flags for queued items * prevent crash on charts cleanup * fix bug in AVL indexing of dictionaries; make sure react callback of dictionaries has a reference counter, which is acquired while the dictionary is locked * fixed dictionary unittest * strict policy to cleanup and garbage collector * fix db rotation and garbage collection timings * remove deadlock * proper garbage collection - a lot faster retention recalculation * Added not NULL in database columns Remove migration code for context -- we will ship with version 1 of the table schema Added define for query in tests to detect localhost * Use UUID_STR_LEN instead of GUID_LEN + 1 Use realistic timestamps when adding test data in the database * Add NULL checks for passed parameters * Log deleted context when compiled with NETDATA_INTERNAL_CHECKS * Error checking for null host id * add missing ContextsCheckpoint log convertor * Fix spelling in VACCUM * Hold additional information for host -- prepare to load archived hosts on startup * Make sure claim id is valid * is_get_claimed is actually get the current claim id * Simplify ctx get chart list query * remove env negotiation * fix string unittest when there are some strings already in the index * propagate live-retention flag upstream; cleanup all update reasons; updated instances logging; automated attaching started/stopped collecting flags; * first implementation of /api/v1/contexts * full contexts API; updated swagger * disabled debugging; rrdcontext enabled by default * final cleanup and renaming of global variables * return current time on currently collected contexts, charts and dimensions * added option "deepscan" to the API to have the server refresh the retention and recalculate the contexts on the fly * fixed identation of yaml * Add constrains to the host table * host->node_id may not be available * new capabilities * lock the context while rendering json * update aclk-schemas * added permanent labels to all charts about plugin, module and family; added labels to all proc plugin modules * always add the labels * allow merging of families down to [x] * dont show uuids by default, added option to enable them; response is now accepting after,before to show only data for a specific timeframe; deleted items are only shown when "deleted" is requested; hub version is now shown when "queue" is requested * Use the localhost claim id * Fix to handle host constrains better * cgroups: add "k8s." prefix to chart context in k8s * Improve sqlite metadata version migration check * empty values set to "[none]"; fix labels unit test to reflect that * Check if we reached the version we want first (address CODACY report re: Array index 'i' is used before limits check) * Rewrite condition to address CODACY report (Redundant condition: t->filter_callback. '!A || (A && B)' is equivalent to '!A || B') * Properly unlock context * fixed memory leak on rrdcontexts - it was not freeing all dictionaries in rrdhost; added wait of up to 100ms on dictionary_destroy() to give time to dictionaries to release their items before destroying them * fixed memory leak on rrdlabels not freed on rrdinstances * fixed leak when dimensions and charts are redefined * Mark entries for charts and dimensions as submitted to the cloud 3600 seconds after their creation Mark entries for charts and dimensions as updated (confirmed by the cloud) 1800 seconds after their submission * renamed struct string * update cgroups alarms * fixed codacy suggestions * update dashboard info * fix k8s_cgroup_10s_received_packets_storm alarm * added filtering options to /api/v1/contexts and /api/v1/context * fix eslint * fix eslint * Fix pointer binding for host / chart uuids * Fix cgroups unit tests * fixed non-retention updates not propagated upstream * removed non-fatal fatals * Remove context from 2 way string merge. * Move string_2way_merge to dictionary.c * Add 2-way string merge tests. * split long lines * fix indentation in netdata-swagger.yaml * update netdata-swagger.json * yamllint please * remove the deleted flag when a context is collected * fix yaml warning in swagger * removed non-fatal fatals * charts should now be able to switch contexts * allow deletion of unused metrics, instances and contexts * keep the queued flag * cleanup old rrdinstance labels * dont hide objects when there is no filter; mark objects as deleted when there are no sub-objects * delete old instances once they changed context * delete all instances and contexts that do not have sub-objects * more precise transitions * Load archived hosts on startup (part 1) * update the queued time every time * disable by default; dedup deleted dimensions after snapshot * Load archived hosts on startup (part 2) * delayed processing of events until charts are being collected * remove dont-trigger flag when object is collected * polish all triggers given the new dont_process flag * Remove always true condition Enums for readbility / create_host_callback only if ACLK is enabled (for now) * Skip retention message if context streaming is enabled Add messages in the access log if context streaming is enabled * Check for node id being a UUID that can be parsed Improve error check / reporting when loading archived hosts and creating ACLK sync threads * collected, archived, deleted are now mutually exclusive * Enable the "orphan" handling for now Remove dead code Fix memory leak on free host * Queue charts and dimensions will be no-op if host is set to stream contexts * removed unused parameter and made sure flags are set on rrdcontext insert * make the rrdcontext thread abort mid-work when exiting * Skip chart hash computation and storage if contexts streaming is enabled Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com> Co-authored-by: Timo <timotej@netdata.cloud> Co-authored-by: ilyam8 <ilya@netdata.cloud> Co-authored-by: Vladimir Kobal <vlad@prokk.net> Co-authored-by: Vasilis Kalintiris <vasilis@netdata.cloud>
2022-06-27Removes Legacy JSON Cloud Protocol Support In Agent (#13111)Timotej S
* removes old protocol support (cloud removed support already)
2022-06-15Add missing control to streaming (#13112)thiagoftsm
fix_tls_stream: Add call to SSL_get_error to avoid an infinite loop
2022-06-13Labels with dictionary (#13070)Costa Tsaousis
* squashed and rebased to master * fix overflow and single character bug in sanitize; include rrd.h instead of node_info.h * added unittest for UTF-8 multibyte sanitization * Fix unit test compilation * Fix CMake build * remove double sanitizer for opentsdb; cleanup sanitize_json_string() * rename error_description to error_message to avoid conflict with json-c * revert last and undef error_description from json-c * more unittests; attempt to fix protobuf map issue * get rid of rrdlabels_get() and replace it with a safe version that writes the value to a buffer * added dictionary sorting unittest; rrdlabels_to_buffer() now is sorted * better sorted dictionary checking * proper unittesting for sorted dictionaries * call dictionary deletion callback when destroying the dictionary * remove obsolete variable * Fix exporting unit tests * Fix k8s label parsing test * workaround for cmocka and strdupz() * Bypass cmocka memory allocation check * Revert "Bypass cmocka memory allocation check" This reverts commit 4c49923839d9229bea23ca914dd8a0be1ebe2bf4. * Revert "workaround for cmocka and strdupz()" This reverts commit 7bebee04801db1865c748a7896d5fa54bb7104a5. * Bypass cmocka memory allocation checks * respect json formatting for chart labels * cloud sends colons * print the value only once * allow parenthesis in values and spaces; make stream sender send quotes for values Co-authored-by: Vladimir Kobal <vlad@prokk.net>
2022-06-06Allow usage of the new MQTT 5 implementation (#12838)Timotej S
* adds support for new MQTT5 implementation in agent, currently by default disabled as Tech Preview
2022-05-25Delay children chart obsoletion check (#12992)Emmanuel Vasilakis
* wait untill after 2 minutes of last chart received to run obsoletion check * turn write to read locks
2022-05-20Apply some logic to possible streaming destinations (#12866)Emmanuel Vasilakis
* replace connect_to_one_of with connect_to_one_of_destinations * move functions from socket.c * use sizeof * move current destination pointer to host * formatting * use snprintfz * get entries in same order * handle single destination as before (or when it is the last of the list), instead of skiping it every other loop * try other destinations on ssl problem
2022-05-11Set trust durations to have data from children properly aligned (#12870)Stelios Fragkakis
2022-05-09Workers utilization charts (#12807)Costa Tsaousis
* initial version of worker utilization * working example * without mutexes * monitoring DBENGINE, ACLKSYNC, WEB workers * added charts to monitor worker usage * fixed charts units * updated contexts * updated priorities * added documentation * converted threads to stacked chart * One query per query thread * Revert "One query per query thread" This reverts commit 6aeb391f5987c3c6ba2864b559fd7f0cd64b14d3. * fixed priority for web charts * read worker cpu utilization from proc * read workers cpu utilization via /proc/self/task/PID/stat, so that we have cpu utilization even when the jobs are too long to finish within our update_every frequency * disabled web server cpu utilization monitoring - it is now monitored by worker utilization * tight integration of worker utilization to web server * monitoring statsd worker threads * code cleanup and renaming of variables * contrained worker and statistics conflict to just one variable * support for rendering jobs per type * better priorities and removed the total jobs chart * added busy time in ms per job type * added proc.plugin monitoring, switch clock to MONOTONIC_RAW if available, global statistics now cleans up old worker threads * isolated worker thread families * added cgroups.plugin workers * remove unneeded dimensions when then expected worker is just one * plugins.d and streaming monitoring * rebased; support worker_is_busy() to be called one after another * added diskspace plugin monitoring * added tc.plugin monitoring * added ML threads monitoring * dont create dimensions and charts that are not needed * fix crash when job types are added on the fly * added timex and idlejitter plugins; collected heartbeat statistics; reworked heartbeat according to the POSIX * the right name is heartbeat for this chart * monitor streaming senders * added streaming senders to global stats * prevent division by zero * added clock_init() to external C plugins * added freebsd and macos plugins * added freebsd and macos to global statistics * dont use new as a variable; address compiler warnings on FreeBSD and MacOS * refactored contexts to be unique; added health threads monitoring Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
2022-05-07fix memory leaks and mismatches of the use of the z functions for ↵Costa Tsaousis
allocations (#12841) * fix mismatches of the use of the z functions for allocations * when there was no memory; the original name of the dimensions was freed, and with mismatching deallocator.. * fixed memory leak at rrdeng_load_metric_*() functions * fixed memory leak on exit of plugins.d parser * fixed memory leak on plugins and streaming receiver threads exit * fixed compiler warnings
2022-05-03Check for chart obsoletion on children re-connections (#12707)Emmanuel Vasilakis
* check for chart obsoletion on children connections * use rrdset_is_obsolete
2022-04-28Check the return of setsockopt (#12772)Emmanuel Vasilakis
2022-04-20Add a 2 minute timeout to stream receiver socket (#12673)Emmanuel Vasilakis
* add a 10min timeout to receiver socket * reduce timeout to 2 min
2022-01-26re-instate plugins_action for clabels (#12039)Emmanuel Vasilakis
2022-01-19Add code for LZ4 streaming data compression (#11821)avstrakhov
* Add code for LZ4 streaming data compression * Fix LGTM alert * Add lz4 library for link when compression enabled * Add LZ4_resetStream_fast presence detection * Disable compression for older LZ4 libraries * Correct LZ4 API check * [Testing Stream Compression] Debug msgs and report.md * Add LZ4 library version using LZ4_initStream * Fixed bug in SSL mode * [Testing compression] - Add compression info messages * Set compression enabled by default, update doc * Update streaming/README.md Co-authored-by: DShreve2 <david@netdata.cloud> * [Agent Negotiation] Compression as separate capability * [Agent Negotiation] Compression as separate capability - default compression variable always active * Add code to negotiate compression * [Agent Negotiation] Based on stream version * [Agent Negotiation] Version based - fix compilation error * [Agent Negotiation] Fix glob var default_compression_enbaled=0 affects all the connections - Handle compression - stream version based * [Agent Negotiation - Compression] - Add control flag in 1. sender/receiver state & 2. stream.conf per child * [Agent Negotiation - Compression] Fix stream.conf key, mguid control * [Agent Negotiate Compression] Fine control on stream.conf per key,mguid for each child * [Agent Negotiation Compression] Stop destroying compressor for runtime configuration + Update Readme.md * [Agent Negotiation Compression] Use stream_version 4 if compression is disabled * Correct child's compression check * [Agent Negotiation Compression] Create streaming compression section in docs. * [Agent Negotiation Compresion] Remove redundant debug msgs * [Stream Compression] - integrate compression build info & config info in api/v1/info endpoint. * [Agent Negotiation] Finalize README.md * [Agent Stream Compression] Fix buildinfo json, Finalize readme.md * [Agent Stream Compression] Negotiate compression based on stream version * [Agent Stream Compression] Stream compression control per child in stream.conf | per AP_KEY, MACHINE_GUID * [Agent Stream Compression] Avoid destroying compressor enabling runtime configuration + Update Readme.md * [Agent Stream Compression] - Provide stream compression build info & config info in api/v1/info endpoint + Update Readme.md * [Agent Stream Compression] Fix rebase conflicts * [Agent Stream Compression] Fix more rebase conflicts * [Agent Stream Compression] 1. Stream version based negotiation 2. per child stream.conf control 3. finalize docs 4. stream compression build info in web api * [Agent Stream Compression] 1. Stream version based negotiation 2. per child stream.conf control 3. finalize docs 4. stream compression build info in web api * [Agent Stream Compression] Change unsuccessful buffer check to error * [Agent Stream Compression] Readme.md proof-read corrections, downgrade to stream_version_clabels, add shields for supported versions, EOF lint * [Agent Stream Compression] Fix missed lz4 library on Alpine Linux * Phrasal review Co-authored-by: odynik <odynik.ee@gmail.com> Co-authored-by: DShreve2 <david@netdata.cloud> Co-authored-by: Tina Lüdtke <tina@kickoke.com>
2022-01-14Perform a host metadata update on reconnection (#11965)Stelios Fragkakis
2022-01-11Fix time_t format (#11897)Vladimir Kobal
2021-11-19Cleanup compilation warnings (#11810)Stelios Fragkakis
* Fix compilation warnings (variables used when debugging is enabled using NETDATA_INTERNAL_CHECKS) * Fix compilation warning (casting)
2021-10-25Stream chart labels (#11675)Emmanuel Vasilakis
* stream chart labels * update stream protocol to 4 * only send CLABEL_COMMIT when there are labels * mark host as UNUSED * log error for stray CLABEL_COMMIT * remove commented define
2021-07-07ACLK-NG New Cloud NodeInstance related msgs (#11234)Timotej S
Adds new cloud arch NodeInstance messages as per design. Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
2021-05-31Provide UTC offset in seconds and edit health config command (#11051)Emmanuel Vasilakis
* add abbreviated timezone, utc offset in seconds, and edit health alarm command rebased * formating * use str2i instead of atoi
2021-05-24Remove unecessary relative paths when including headers. (#11124)vkalintiris
Currently, we add the repository's top-level dir in the compiler's header search path. This means that code in every top-level directory within the repo can include headers sibling top-level directories. This patch makes header inclusion consistent when it comes to files that are included from sibling top-level directories within the repo.
2021-04-26Store null claim id in the database for non claimed children (#11036)Stelios Fragkakis
2021-04-22Persist claim ids in local database for parent and children (#10993)Stelios Fragkakis
2021-04-14Spelling streaming (#10919)Josh Soref
2021-03-19Add check for children connecting to a parent agent with unsupported memory ↵Stelios Fragkakis
mode (#10787)
2021-03-16Adds ACLK-NG as fallback(#10315)Timotej S
* adds a new implementation of ACLK written almost from scratch * external dependencies only OpenSSL and JSON-C * fallback for systems where ACLK Legacy can't build (for technical or philosophical reasons) * can be forced to build by giving "--aclk-ng" to the installer
2020-12-14Kubernetes labels (#10107)Ilya Mashchenko
Co-authored-by: Markos Fountoulakis <markos.fountoulakis.senior@gmail.com> Co-authored-by: Vladimir Kobal <vlad@prokk.net>
2020-11-26ACLK Child Availability Messages (#9918)Timotej S
* new ACLK messages for Claiming MVP1
2020-10-22Disregard host tags configuration pointer and keep duplicate string from ↵Markos Fountoulakis
streaming handshake. (#10121)
2020-10-20Replace memcpy() with memmove() for overlapping memory ranges. (#10097)Markos Fountoulakis
2020-09-07Fix lock order reversal. (#9888)Markos Fountoulakis
2020-09-04Fix race condition with orphan hosts (#9862)Markos Fountoulakis
* Fix race condition between orphan host cleanup and new streaming connections. * Remove health enabling from log replay, it will be handled at streaming connection time.
2020-08-26Adds claimed_id streaming (#9804)Timotej S
* streams claimed_id of child nodes to parents * adds this information into /api/v1/info
2020-07-28Implemented multihost database (#9556)Stelios Fragkakis
* Hard code a node for non-legacy multidb test Skip dbengine initialization for new incoming children Add code to switch to multidb ctx when accessing the dbengine * When a non-legacy streaming connection is detected, use the multidb metadata log context * Clear the superblock memory to avoid random data written in the metadata log * Activate the host detection during compaction Activate the host detection during metadata log chart updates Keep the host in the user object during replay of the HOST command * Add defaults for health / rrdpush on HOST metadata replay Check for legacy status on host creation by checking is_archived and if not conclusive, call is_legacy_child() Use defaults from the stream.conf * Count hosts only if not archived When host switches from archived to active update rrd_hosts_available Remove archived hosts from charts and info * Change parameter from "multidb disk space" to "dbengine multihost disk space" Remove unused variables Fix compilation error when dbengine is disabled Fix condition for machine_guid directory creation under cache_dir * Enable multidb disk space file creation. * Stop deleting dimensions when rotating archived metrics if the dimension is active in a different database engine. * Fix old bug in the code that confused obsolete hosts with orphan hosts. * Do not delete multi-host DB host files. * Discard dbengine state when a legacy memory mode instantiates to avoid inconsistencies. * Identify metadata that collide with non-dbengine memory mode hosts and ignore them. * Handle non-dbengine localhost with dbengine archived charts in localhost and streaming. * Ignore archived hosts in streaming. * Add documentation before merging to master. Co-authored-by: Markos Fountoulakis <markos.fountoulakis.senior@gmail.com>
2020-06-29Fix children version on stream (#9438)thiagoftsm
Store Netdata child version instead parent version inside `HOST` structure.
2020-06-19Fixes the race-hazard in streaming during the shutdown sequence (#9370)Andrew Moss
The streaming component detects when a receiver stream has closed, and stops an attached sender on the same host. This is to support proxy configurations where the stream is passed through. During the shutdown sequence, once netdata_exit has been set no thread should touch any RRDHOST structure as the non-static threads are not joined before the database shuts down. The destruction of the thread state has been separated from the cleanup and can be called from two points. If the thread can detach itself from the host (i.e. it is not during the shutdown sequence) then it does so and destroys the state. During shutdown the thread leaves the state intact so that it can be destroyed during the host destruction, and the host destruction now cancels the thread to ensure a consistent sequence of events.
2020-06-12Change streaming terminology to parent-child in the code (#9323)Andrew Moss
2020-06-12Add support for persistent metadata (#9324)Stelios Fragkakis
* Implemented collector metadata logging * Added persistent GUIDs for charts and dimensions * Added metadata log replay and automatic compaction * Added detection of charts with no active collector (archived) * Added new endpoint to report archived charts via `/api/v1/archivedcharts` * Added support for collector metadata update Co-authored-by: Markos Fountoulakis <44345837+mfundul@users.noreply.github.com>
2020-06-08Fix crash in #9291 (#9297)Andrew Moss
Did not account for the path where rrdhost_find_or_create fails and terminates the thread before setting rpt->host to a valid pointer.
2020-06-03Fix bugs in streaming and enable support for gap filling (#9214)Andrew Moss
This PR adds (inactive) support that we will use to fill the gaps on chart when a receiving agent goes offline and the sender reconnects. The streaming component has been reworked to make the connection bi-directional and fix several outstanding bugs in the area. * Fixed an incorrect case of version negotiation. Removed fatal() on exhaustion of fds. * Fixed cases that fell through to polling the socket after closing. * Fixed locking of data related to sender and receiver in the host structure. * Added fine-grained locks to reduce contention. * Added circular buffer to sender to prevent starvation in high-latency conditions. * Fixed case where agent is a proxy and negotiated different streaming versions with sender and receiver. * Changed interface to new parser to put the buffering code in streaming. * Fixed the bug that stopped senders from reconnecting after their socket times out - this was part of the scaling fixes that provide an early shortcut path for rejecting connections without lock contention. * Uses fine-grained locking and a different approach to thread shutdown instead. * Added liveness detection to connections to allow selection of the best connection.