From 282e0dfaa97289cc6542742e9e389bd76b7e4164 Mon Sep 17 00:00:00 2001
From: vkalintiris
Date: Mon, 31 Oct 2022 19:53:20 +0200
Subject: Replication of metrics (gaps filling) during streaming (#13873)

* Revert "Use llvm's ar and ranlib when compiling with clang (#13854)"

This reverts commit a9135f47bbb36e9cb437b18a7109607569580db7.

* Profile plugin

* Fix macos static thread

* Add support for replication

- Add a new capability for replication. When it is not supported, the agent
  should behave as before.
- When replication is supported, the text protocol supports the following new
  commands:
  - CHART_DEFINITION_END: sends the first/last entry of the child
  - REPLAY_RRDSET_BEGIN: sends the name of the chart we are replicating
  - REPLAY_RRDSET_HEADER: sends a line describing the columns of the following
    command (i.e. start-time, end-time, dim1-name, ...)
  - REPLAY_RRDSET_DONE: sends values to push for a specific start/end time
  - REPLAY_RRDSET_END: sends (a) the update every of the chart, (b) the
    first/last entries in the DB, (c) whether the child has been told to start
    streaming, (d) the original after/before period to replicate.
  - REPLAY_CHART: sent from a parent to a child, specifying (a) the chart name
    we want data for, (b) whether the child should start streaming once it has
    fulfilled the request with the aforementioned commands, (c) the
    after/before range of the data the parent wants
- As a consequence of the new protocol, streaming is disabled for all charts on
  a new connection. It is enabled once replication is finished.
- The configuration parameters are specified in stream.conf:
  - "enable replication = yes|no"
  - "seconds to replicate = 3600"
  - "replication step = 600" (i.e. how many seconds to fill per round-trip
    request)

* Minor fixes

- quote set and dim ids
- start streaming after writing replicated data to the buffer
- write replicated data only when the buffer is less than 50% full
- use reentrant iteration for charts

* Do not send chart definitions on connection.
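The `replication step` option bounds how many seconds a single round-trip request fills. A minimal C sketch of how a replication range could be split into such steps, assuming each window starts where the previous one ended and the end of the range is clamped to the current time; `replication_window` and `split_replication_range()` are hypothetical names, not part of netdata's replication code.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical type: the time range one replication round trip asks for. */
typedef struct {
    time_t after;   /* start of the window */
    time_t before;  /* end of the window; also where the next window starts */
} replication_window;

/*
 * Split [after, before] into consecutive windows of at most `step` seconds
 * (the role the "replication step" option plays). `before` is first clamped
 * to `now`, so no future data is requested. The last window takes whatever
 * remains, even if it is shorter than `step`.
 * Returns the number of windows written to `out` (at most `max`).
 */
static size_t split_replication_range(time_t after, time_t before,
                                      time_t step, time_t now,
                                      replication_window *out, size_t max)
{
    assert(step > 0);

    if (before > now)
        before = now;               /* never ask for data from the future */

    size_t n = 0;
    for (time_t start = after; start < before && n < max; start += step) {
        time_t end = start + step;
        if (end > before)
            end = before;           /* final step: replicate the remainder */
        out[n].after = start;
        out[n].before = end;
        n++;
    }
    return n;
}
```

With `seconds to replicate = 3600` and `replication step = 600`, as listed above, a call covering the last hour produces 6 windows of 600 seconds each in this sketch, and the last one ends exactly at `now`.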
* Track replication status through rrdset flags.

* Add debug flag for noisy log messages.

* Add license notice.

* Iterate charts with reentrant loop

* Set replication finished flag when streaming is disabled.

* Revert "Profile plugin"

This reverts commit 468fc9386e5283e0865fae56e9989b8ec83de14d.

Used only for testing purposes.

* Revert "Revert "Use llvm's ar and ranlib when compiling with clang (#13854)""

This reverts commit 27c955c58d95aed6c44d42e8b675f0cf3ca45c6d.

Reapply commit that I had to revert in order to be able to build the agent on
MacOS.

* Build replication source files with CMake.

* Pass number of words in pluginsd functions.

* Use get_word instead of indexing words.

* Use size_t instead of int.

* Pay only for what we use when splitting words.

* no need to redefine PLUGINSD_MAX_WORDS

* fix formatting warning

* all usages of pluginsd_split_words() should use the return value to ensure
  non-cached results reuse; no need to lock the host to find a chart

* keep a sender dictionary with all the replication commands received and
  remove replication commands from charts

* do not replicate future data

* use last_updated to find the end of the db

* uniformity of replication logs

* rewrite of the query logic

* replication.c in C; debug info in human-readable dates

* update the chart on every replication row

* update all chart members so that rrdset_done() can continue

* update the protocol to push one dimension per line and transfer data
  collection state to parent

* fix formatting

* remove replication object from pluginsd

* shorter communication

* fix typo

* support for replication proxies

* proper use of flags

* set receiver replication finished flag on charts created after the sender
  has been connected

* clear RRDSET_FLAG_SYNC_CLOCK on replicated charts

* log storing of nulls

* log first store

* log update every switches

* test ignoring timestamps but sending a point just after replication end

* replication should work on end_time

* use replicated timestamps

* at the final replication step, replicate all the remaining points

* cleanup code from tests

* print timestamps as unsigned long long

* more formatting changes; fix conflicting type of replicate_chart_response()

* updated stream.conf

* always respond to replication requests

* in non-dbengine db modes, do not replicate more than the database size

* advance the db pointer of legacy db modes

* should be multiplied by update_every

* fix buggy label parsing - identified by codacy

* don't log error on history mismatches for db mode dbengine

* allow SSL requests to streaming children

* don't use ssl variable

Co-authored-by: Costa Tsaousis
---
 streaming/stream.conf | 53 +++++++++++++++++++++++++++++++-------------------
 1 file changed, 32 insertions(+), 21 deletions(-)

(limited to 'streaming/stream.conf')

diff --git a/streaming/stream.conf b/streaming/stream.conf
index 33172bbcbe..cfaf7ebe7b 100644
--- a/streaming/stream.conf
+++ b/streaming/stream.conf
@@ -33,24 +33,19 @@
     destination =
 
     # Skip Certificate verification?
-    #
     # The netdata child is configurated to avoid invalid SSL/TLS certificate,
     # so certificates that are self-signed or expired will stop the streaming.
     # Case the server certificate is not valid, you can enable the use of
     # 'bad' certificates setting the next option as 'yes'.
-    #
     #ssl skip certificate verification = yes
 
     # Certificate Authority Path
-    #
     # OpenSSL has a default directory where the known certificates are stored,
     # case it is necessary it is possible to change this rule using the variable
     # "CApath"
-    #
     #CApath = /etc/ssl/certs/
 
     # Certificate Authority file
-    #
     # When the Netdata parent has certificate, that is not recognized as valid,
     # we can add this certificate in the list of known certificates in CApath
     # and give for Netdata as argument.
@@ -61,8 +56,7 @@
     api key =
 
     # Stream Compression
-    #
-    # The netdata child is configurated to enable stream compression by default.
+    # The default is enabled
     # You can control stream compression in this agent with options: yes | no
     #enable compression = yes
 
@@ -91,6 +85,7 @@
     reconnect delay seconds = 5
 
     # Sync the clock of the charts for that many iterations, when starting.
+    # It is ignored when replication is enabled
     initial clock resync iterations = 60
 
 # -----------------------------------------------------------------------------
@@ -127,9 +122,8 @@
 
     # The default history in entries, for all hosts using this API key.
     # You can also set it per host below.
-    # If you don't set it here, the history size of the central netdata
-    # will be used.
-    default history = 3600
+    # For the default db mode (dbengine), this is ignored.
+    #default history = 3600
 
     # The default memory mode to be used for all hosts using this API key.
     # You can also set it per host below.
@@ -140,7 +134,7 @@
     # ram keep it in RAM, don't touch the disk
     # none no database at all (use this on headless proxies)
     # dbengine like a traditional database
-    default memory mode = ram
+    #default memory mode = dbengine
 
     # Shall we enable health monitoring for the hosts using this API key?
     # 3 possible values:
@@ -150,7 +144,7 @@
     # ensure that the netdata process on the child is gracefully stopped, to prevent invalid last_collected alarms
     # You can also set it per host, below.
     # The default is taken from [health].enabled of netdata.conf
-    health enabled by default = auto
+    #health enabled by default = auto
 
     # postpone alarms for a short period after the sender is connected
     default postpone alarms on connect seconds = 60
@@ -163,11 +157,19 @@
     #default proxy send charts matching = *
 
     # Stream Compression
-    #
-    # The stream with the child can be configurated to enable stream compression.
+    # By default it is enabled.
     # You can control stream compression in this parent agent stream with options: yes | no
     #enable compression = yes
 
+    # Replication
+    # Enable replication for all hosts using this api key. Default: enabled
+    #enable replication = yes
+
+    # How many seconds to replicate from each child. Default: a day
+    #seconds to replicate = 86400
+
+    # The duration we want to replicate per each step.
+    #replication_step = 600
 
 # -----------------------------------------------------------------------------
 # 3. PER SENDING HOST SETTINGS, ON PARENT NETDATA
@@ -197,14 +199,15 @@
     # and at stream.conf [API_KEY].allow from
     allow from = *
 
-    # The number of entries in the database
-    history = 3600
+    # The number of entries in the database.
+    # This is ignored for db mode dbengine.
+    #history = 3600
 
     # The memory mode of the database: save | map | ram | none | dbengine
-    memory mode = save
+    #memory mode = dbengine
 
     # Health / alarms control: yes | no | auto
-    health enabled = yes
+    #health enabled = auto
 
     # postpone alarms when the sender connects
     postpone alarms on connect seconds = 60
@@ -217,8 +220,16 @@
     #proxy send charts matching = *
 
     # Stream Compression
-    #
-    # The stream with the child can be configurated to enable stream compression.
+    # By default, enabled.
     # You can control stream compression in this parent agent stream with options: yes | no
     #enable compression = yes
-    
\ No newline at end of file
+
+    # Replication
+    # Enable replication for all hosts using this api key.
+    #enable replication = yes
+
+    # How many seconds to replicate from each child.
+    #seconds to replicate = 86400
+
+    # The duration we want to replicate per each step.
+    #replication_step = 600
-- 
cgit v1.2.3
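The diff marks `history` as ignored for db mode dbengine, and the commit message notes that in non-dbengine db modes replication should not go back further than the database size, which must be multiplied by update_every. A minimal C sketch of that cap, assuming `history` is counted in entries and `update_every` in seconds; `effective_seconds_to_replicate()` is a hypothetical helper, not netdata's actual API.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/*
 * Hypothetical helper: how many seconds of history a parent can usefully
 * ask a child to replicate for one chart.
 *
 * - dbengine: "history" is ignored, so the configured value is used as-is.
 * - other db modes (e.g. save, map, ram): the database holds only `history`
 *   entries, i.e. history * update_every seconds, so the request is capped
 *   at that size.
 */
static time_t effective_seconds_to_replicate(bool is_dbengine,
                                             time_t seconds_to_replicate,
                                             long history,
                                             int update_every)
{
    assert(update_every > 0);

    if (is_dbengine)
        return seconds_to_replicate;

    /* database size in seconds, not in entries */
    time_t db_seconds = (time_t)history * update_every;
    return seconds_to_replicate < db_seconds ? seconds_to_replicate : db_seconds;
}
```

In this sketch, a parent in db mode ram with `history = 3600` and a 1-second update every is capped at 3600 seconds even when `seconds to replicate = 86400`, while with dbengine the full 86400 seconds are requested.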