summaryrefslogtreecommitdiffstats
path: root/database/rrdhost.c
AgeCommit message (Collapse)Author
2024-02-01Create a top-level directory to contain source code. (#16896)vkalintiris
* Move ML under src * Move spwan under src * Move cli/ under src/ * move registry/ under src/ * move streaming/ under src/ * Move claim under src. Update docs * Move database/ under src/ * Move libnetdata/ under src/ * Update references to libnetdata * Fix logsmanagement includes * Update generated script path.
2024-01-29New Permissions System (#16837)Costa Tsaousis
* wip of migrating to bitmap permissions * replace role with bitmapped permissions * formatting permissions using macros * accept view and edit permissions for all dynamic configuration * work on older compilers * parse the header in hex * agreed permissions updates * map permissions to old roles * new permissions management * fix function rename * build libdatachannel when enabled - currently for code maintainance * dyncfg now keeps 2 sets of statuses, to keep track of what happens to dyncfg and what actually happens with the plugin * complete the additions of jobs and solve unittests * fix renumbering of ACL bits * processes function shows the cmdline based on permissions and the presence of the sensitive data permission * now the agent returns 412 when authorization is missing, 403 when authorization exists but permissions are not enough, 451 when access control list prevents the user from accessing the dashboard * enable cmdline on processes with thhe HTTP_ACCESS_VIEW_AGENT_CONFIG permission * by default functions require anonymous-data access * fix compilation on debian * fix left-over renamed define * updated schema for alerts * updated permissions * require a name when loading json payloads, if the name is not provided by dyncfg
2024-01-23DYNCFG: dynamically configured alerts (#16779)Costa Tsaousis
* cleanup alerts * fix references * fix references * fix references * load alerts once and apply them to each node * simplify health_create_alarm_entry() * Compile without warnings with compiler flags: -Wall -Wextra -Wformat=2 -Wshadow -Wno-format-nonliteral -Winit-self * code re-organization and cleanup * generate patterns when applying prototypes; give unique dyncfg names to all alerts * eval expressions keep the source and the parsed_as as STRING pointers * renamed host to node in dyncfg ids * renamed host to node in dyncfg ids * add all cloud roles to the list of parsed X-Netdata-Role header and also default to member access level * working functionality * code re-organization: moved health event-loop to a new file, moved health globals to health.c * rrdcalctemplate is removed; alert_cfg is removed; foreach dimension is removed; RRDCALCs are now instanciated only when they are linked to RRDSETs * dyncfg alert prototypes initialization for alerts * health dyncfg split to separate file * cleanup not-needed code * normalize matches between parsing and json * also detect !* for disabled alerts * dyncfg capability disabled * Store alert config part1 * Add rrdlabels_common_count * wip health variables lookup without indexes * Improve rrdlabels_common_count by reusing rrdlabels_find_label_with_key_unsafe with an additional parameter * working variables with runtime lookup * working variables with runtime lookup * delete rrddimvar and rrdfamily index * remove rrdsetvar; now all variables are in RRDVARs inside hosts and charts * added /api/v1/variable that resolves a variable the same way alerts do * remove rrdcalc from eval * remove debug code * remove duplicate assignment * Fix memory leak * all alert variables are now handled by alert_variable_lookup() and EVAL is now independent of alerts * hide all internal structures of EVAL * Enable -Wformat flag Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud> * Adjust binding for calculation, warning, critical * Remove unused macro * Update config hash id * use the right info and summary in alerts log * use synchronous queries for alerts * Handle cases when config_hash_id is missing from health_log * remove deadlock from health worker * parsing to json payload for health alert prototypes * cleaner parsing and avoiding memory leaks in case of duplicate members in json * fix left-over rename of function * Keep original lookup field to send to the cloud Cleanup / rename function to store config Remove unused DEFINEs, functions * Use ac->lookup * link jobs to the host when the template is registered; do not accept running a function without a host * full dyncfg support for health alerts, except action TEST * working dyncfg additions, updates, removals * fixed missing source, wrong status updates * add alerts by type, component, classification, recipient and module at the /api/v2/alerts endpoint * fix dyncfg unittest * rename functions * generalize the json-c parser macros and move them to libnetdata * report progress when enabling and disabling dyncfg templates * moved rrdcalc and rrdvar to health * update alarms * added schema for alerts; separated alert_action_options from rrdr_options; restructured the json payload for alerts * enable parsed json alerts; allow sending back accepted but disabled * added format_version for alerts payload; enables/disables status now is also inheritted by the status of the rules; fixed variable names in json output * remove the RRDHOST pointer from DYNCFG * Fix command field submitted to the cloud * do not send updates to creation requests, for DYNCFG jobs --------- Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud> Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com> Co-authored-by: Tasos Katsoulas <tasos@netdata.cloud> Co-authored-by: ilyam8 <ilya@netdata.cloud>
2024-01-15Improve context load (#16659)Stelios Fragkakis
* Improve single thread load. Handle thread creation failure as well Remove RRDHOST_FLAG_CONTEXT_LOAD_IN_PROGRESS Improve chart label cleanup * Init thread index
2024-01-11Delete memory mode "map" and "save". (#16604)vkalintiris
* Delete memory modes "map" and "save". * Remove unmaintained exporting tests * Remove references of map/save modes in docs. * Remove more references to map/save from docs.
2024-01-11dyncfg v2 (#16702)Costa Tsaousis
* split rrdfunctions streaming and progress * simplified internal inline functions API * split rrdfunctions inflight management * split rrd functions exporters * renames * base dyncfg structure * config pluginsd * intercept dyncfg function calls * loading and saving of dyncfg metadata and data * save metadata and payload to a single file; added code to update the plugins with jobs and saved configs * basic working unit test * added payload to functions execution * removed old dyncfg code that is not needed any more * more cleanup * cleanup sender for functions with payload * dyncfg functions are not exposed as functions * remaining work to avoid indexing the \0 terminating character in dictionary keys * added back old dyncfg plugins.d commands as noop, to allow plugins continue working * working api; working streaming; * updated plugins.d documentation * aclk and http api requests share the same header parsing logic * added source type internal * fixed crashes * added god mode for tests * fixes * fixed messages * save host machine guids to configs * cleaner manipulation of supported commands * the functions event loop for external plugins can now process dyncfg requests * unified internal and external plugins dyncfg API * Netdata serves schema requests from /etc/netdata/schema.d and /var/lib/netdata/conf.d/schema.d * cleanup and various fixes; fixed bug in previous dyncfg implementation on streaming that was sending the paylod in a way that allowed other streaming commands to be multiplexed * internals go to a separate header file * fix duplicate ACLK requests sent by aclk queue mechanism * use fstat instead of stat * working api * plugin actions renamed to create and delete; dyncfg files are removed only from user actions * prevent deadlock by using the react callback * fix for string_strndupz() * better dyncfg unittests * more tests at the unittests * properly detect dyncfg functions * hide config functions from the UI * tree response improvements * send the initial update with payload * determine tty using stdout, not stderr * changes to statuses, cleanup and the code to bring all business logic into interception * do not crash when the status is empty * functions now propagate the source of the requests to plugins * avoid warning about unused functions * in the count at items for attention, do not count the orphan entries * save source into dyncfg * make the list null terminated * fixed invalid comparison * prevent memory leak on duplicated headers; log x-forwarded-for * more unit tests * added dyncfg unittests into the default unittests * more unit tests and fixes * more unit tests and fixes * fix dictionary unittests * config functions require admin access
2024-01-11Name storage engine variables consistently. (#16753)vkalintiris
* Consistent naming of STORAGE_INSTANCE instances. Replace usages of `db_instance` and `instance` with `si`. * Rename array `storage_metrics_groups[tier]` to `smg[tier]` * Rename db_metric_handle to smh * Rename instances of `storage_engine_query_handle` to `seqh`. * Rename instances of STORAGE_ENGINE_BACKEND to `seb`. * Rename instances of STORAGE_COLLECT_HANDLE to `sch`.
2023-12-15Queries Progress (#16574)Costa Tsaousis
* track the progress of queries * add query_progress in libnetdata Makefile.am * add acl, response size and response code to the tracking * define the required functions * fix the last commit * added /api/v2/progress?transaction=ID to report the progress of queries * added function to report netdata-queries * track hashtable additions * when resusing a transaction, maintain the counter * keep track of linked and indexing * added X-Forwarded-Host and X-Forwarded-For to logs. X-Forwarded-For is also added in progress tracking * report compact uuids to match logs; register the actual duration of the transaction * added rowOptions to function; now web_client keeps track if it tracks progress or not * add http request method to progress * add tags per function; /api/vX/functions is now not protected * compact the sanitization array * split pluginsd_parser into multiple files * cleanup keyword definitions * code cleanup * extracted rrd_collector to separate files * added http access level to functions * renamed access "all" to "any" * implemented optional protection on functions * add priority to functions, to allow the UI select the best function (lower priority) when the user has not selected a function * added progress report from the plugins to netdata and from children to parents - untested * added progress reporting in systemd-journal * query timeout is now handled by evloop for external plugins * propagate progress reports to children and plugins * fix codeql warning * adapt to cmake * minor changes * extend function timeout when progress is received; added streaming capability to propagate progress reports to parents and send progress requests to children * revert change in dictionary.h * add log when access level is invalid * update access level of functions * added logs when processing progress updates * log when the deferred response is too big * comment out sender progress to find the issue * added missing newline in streaming progress reports * propogate progress reports to functions * fix logs
2023-12-01Code cleanup (#16448)Stelios Fragkakis
* Code cleanup * More cleanup * More cleanup * Use FILENAME_MAX * query fix
2023-11-23Handle ephemeral hosts (#16381)Stelios Fragkakis
* Handle ephemeral hosts * Node empheral removal timeout 86400 seconds (1 day) * Move config from health to global section * Set a node to queryable false when it is ephemeral and is removed * Log queryable. Send queryable=0 only when forcing host deletion (the node is ephemeral) * Switch to "is ephemeral node" Document stream.conf * Unregister node id
2023-11-22New logging layer (#16357)Costa Tsaousis
* cleanup of logging - wip * first working iteration * add errno annotator * replace old logging functions with netdata_logger() * cleanup * update error_limit * fix remanining error_limit references * work on fatal() * started working on structured logs * full cleanup * default logging to files; fix all plugins initialization * fix formatting of numbers * cleanup and reorg * fix coverity issues * cleanup obsolete code * fix formatting of numbers * fix log rotation * fix for older systems * add detection of systemd journal via stderr * finished on access.log * remove left-over transport * do not add empty fields to the logs * journal get compact uuids; X-Transaction-ID header is added in web responses * allow compiling on systems without memfd sealing * added libnetdata/uuid directory * move datetime formatters to libnetdata * add missing files * link the makefiles in libnetdata * added uuid_parse_flexi() to parse UUIDs with and without hyphens; the web server now read X-Transaction-ID and uses it for functions and web responses * added stream receiver, sender, proc plugin and pluginsd log stack * iso8601 advanced usage; line_splitter module in libnetdata; code cleanup * add message ids to streaming inbound and outbound connections * cleanup line_splitter between lines to avoid logging garbage; when killing children, kill them with SIGABRT if internal checks is enabled * send SIGABRT to external plugins only if we are not shutting down * fix cross cleanup in pluginsd parser * fatal when there is a stack error in logs * compile netdata with -fexceptions * do not kill external plugins with SIGABRT * metasync info logs to debug level * added severity to logs * added json output; added options per log output; added documentation; fixed issues mentioned * allow memfd only on linux * moved journal low level functions to journal.c/h * move health logs to daemon.log with proper priorities * fixed a couple of bugs; health log in journal * updated docs * systemd-cat-native command to push structured logs to journal from the command line * fix makefiles * restored NETDATA_LOG_SEVERITY_LEVEL * fix makefiles * systemd-cat-native can also work as the logger of Netdata scripts * do not require a socket to systemd-journal to log-as-netdata * alarm notify logs in native format * properly compare log ids * fatals log alerts; alarm-notify.sh working * fix overflow warning * alarm-notify.sh now logs the request (command line) * anotate external plugins logs with the function cmd they run * added context, component and type to alarm-notify.sh; shell sanitization removes control character and characters that may be expanded by bash * reformatted alarm-notify logs * unify cgroup-network-helper.sh * added quotes around params * charts.d.plugin switched logging to journal native * quotes for logfmt * unify the status codes of streaming receivers and senders * alarm-notify: dont log anything, if there is nothing to do * all external plugins log to stderr when running outside netdata; alarm-notify now shows an error when notifications menthod are needed but are not available * migrate cgroup-name.sh to new logging * systemd-cat-native now supports messages with newlines * socket.c logs use priority * cleanup log field types * inherit the systemd set INVOCATION_ID if found * allow systemd-cat-native to send messages to a systemd-journal-remote URL * log2journal command that can convert structured logs to journal export format * various fixes and documentation of log2journal * updated log2journal docs * updated log2journal docs * updated documentation of fields * allow compiling without libcurl * do not use socket as format string * added version information to newly added tools * updated documentation and help messages * fix the namespace socket path * print errno with error * do not timeout * updated docs * updated docs * updated docs * log2journal updated docs and params * when talking to a remote journal, systemd-cat-native batches the messages * enable lz4 compression for systemd-cat-native when sending messages to a systemd-journal-remote * Revert "enable lz4 compression for systemd-cat-native when sending messages to a systemd-journal-remote" This reverts commit b079d53c11f6687cd64d804fdd7b24c0492bf245. * note about uncompressed traffic * log2journal: code reorg and cleanup to make modular * finished rewriting log2journal * more comments * rewriting rules support * increased limits * updated docs * updated docs * fix old log call * use journal only when stderr is connected to journal * update netdata.spec for libcurl, libpcre2 and log2journal * pcre2-devel * do not require pcre2 in centos < 8, amazonlinux < 2023, open suse * log2journal only on systems pcre2 is available * ignore log2journal in .gitignore * avoid log2journal on centos 7, amazonlinux 2 and opensuse * add pcre2-8 to static build * undo last commit * Bundle to static Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud> * Add build deps for deb packages Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud> * Add dependencies; build from source Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud> * Test build for amazon linux and centos expect to fail for suse Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud> * fix minor oversight Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud> * Reorg code * Add the install from source (deps) as a TODO * Not enable the build on suse ecosystem Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud> --------- Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud> Co-authored-by: Tasos Katsoulas <tasos@netdata.cloud>
2023-11-06give the streaming function to nightly users (#16346)Costa Tsaousis
2023-10-27Faster parents (#16127)Costa Tsaousis
* cache ctx in collection handle * cache rd together with rda * do not repeatedy call rrdcontexts - cached collection status; optimize pluginsd_acquire_dimension() * fix unit tests * do the absolutely minimum while updating timestamps, ensure validity during reading them * when the stream is INTERPOLATED, buffer outstanding data for up to 50ms if the buffer contains DATA only. * remove the spinlock from mrg * remove the metric flags that are not used any more * mrg writers can be different threads * update first time when latest clean is also updated * cleanup * set hot page with a simple atomic operation * sender sets chart slot for every chart * work on senders without SLOT * enable SLOT capability * send slot at BEGIN when SLOT is enabled * fix slot generation and parsing * send slot while re-streaming * use the sender capabilities, not the receiver * cleanup * add slots support to all chart and dimension related plugin commands * fix condition * fix calculation * check sender capabilties * assign slots in constructors * we need the dimension slot at the DIMENSION keyword * more debug info in case of dimension mismatch * ensure the RRDDIM EXPOSED flag is multi-threaded and set it after the sender buffer has been committed, so that replication will not send dimensions prematurely * fix renumbering on child restart * reset rda caching when receiving a chart definition * optimize pluginsd_end_v2() * do not do zero sized allocations * trust the chart slot id of the child * cleanup charts on pluginsd thread exit * better cleanup * find the chart and put it in the slot, if it not already there * move slots array to host * initialize pluginsd slots properly * add slots to replay begin; do not cleanup slots that dont belong to a chart * cleanup on obsolete * cleanup slots on obsoletions * cleanup and renames about obsoletion * rewrite obsolation service code to remove race conditions * better service obsoletion log * added debugging * more debug * exposed flag now compares versions * removed debugging messages * respolve conflicts * fix replication check for unsent dimensions
2023-10-27ZSTD and GZIP/DEFLATE streaming support (#16268)Costa Tsaousis
* move compression header to compression.h * prototype with zstd compression * updated capabilities * no need for resetting compression * left-over reset function * use ZSTD_compressStream() instead of ZSTD_compressStream2() for backwards compatibility * remove call to LZ4_decoderRingBufferSize() * debug signature failures * fix the buffers of lz4 * fix decoding of zstd * detect compression based on initialization; prefer ZSTD over LZ4 * allow both lz4 and zstd * initialize zstd streams * define missing ZSTD_CLEVEL_DEFAULT * log zero compressed size * debug log * flush compression buffer * add sender compression statistics * removed debugging messages * do not fail if zstd is not available * cleanup and buildinfo * fix max message size, use zstd level 1, add compressio ratio reporting * use compression level 1 * fix ratio title * better compression error logs * for backwards compatibility use buffers of COMPRESSION_MAX_CHUNK * switch to default compression level * additional streaming error conditions detection * do not expose compression stats when compression is not enabled * test for the right lz4 functions * moved lz4 and zstd to their own files * add gzip streaming compression * gzip error handling * added unittest for streaming compression * eliminate a copy of the uncompressed data during zstd compression * eliminate not needed zstd allocations * cleanup * decode gzip with Z_SYNC_FLUSH * set the decoding gzip algorithm * user configuration for compression levels and compression algorithms order * fix exclusion of not preferred compressions * remove now obsolete compression define, since gzip is always available * rename compression algorithms order in stream.conf * move common checks in compression.c * cleanup * backwards compatible error checking
2023-09-29Dyncfg add streaming support (#15791)Timotej S
* dyncfg fncnames as constants * add helper macros to know parser streaming/plugin * plugins dictionary per RRDHOST * api_request_v2_config add support for /host/ * streamify pluginsd_register_plugin * streamify pluginsd_register_module * streamify report_job_status * streamify dyncfg get functions * module_type2str * add job type and flags * add DYNCFG_REGISTER_JOB * implement register job * push all to parent at startup * add helper function is_dyncfg_function * forward virtual functions trough streaming * separate job2json * add api/v2/job_statuses * do cleanup on streaming * streamify set functions * support FUNCTION_PAYLOAD trough streaming * WIP tests * dont attempt loading non-localhost configs * move cfg persistence to proper place * prevent race * properly update job state at runtime * cleanup 1 * job2json add missing reason * add tests * correct HTTP code * add test * streamify delete_job_cb * add DELETE_JOB keyword * job delete over streaming * add tests for create and delete job over parent * rrdpush common checks to macro * add missing forwarders * fix jobs according to test results * more tests * review comment 1 * codacy remove valid warning * codacy ruby fixes * fix wrong rc check * minimal test plugin for child * add test * dict walk insted of master lock * minor - english spelling fixes * thiago comments 1 * minor - rename folder to dynconf * enable only when built with -DNETDATA_TEST_DYNCFG * minor - compiler warning * create dir post daemonization * stricter URL check
2023-09-26Maintain node's last connected timestamp in the db (#15979)Stelios Fragkakis
* Maintain node's last connected timestamp in the db * Rebase -- switch to version database v14
2023-09-18functions cancelling (#15977)Costa Tsaousis
2023-09-01Reduce label memory (#15255)Stelios Fragkakis
2023-08-31Fix build with --disable-https (#15395)Emmanuel Vasilakis
rebased
2023-08-17Fix use after free (#15825)Stelios Fragkakis
2023-08-17Fix coverity 393052: API usage errors (LOCK) (#15832)Stelios Fragkakis
2023-08-03Revert "Refactor RRD code. (#15423)" (#15723)vkalintiris
This reverts commit 440bd51e08fdfa2a4daa191fb68643456028a753. dbengine was still being used for non-zero tiers even on non-dbengine modes.
2023-07-26Refactor RRD code. (#15423)vkalintiris
* Storage engine. * Host indexes to rrdb * Move globals to rrdb * Move storage_tiers_backfill to rrdb * default_rrd_update_every to rrdb * default_rrd_history_entries to rrdb * gap_when_lost_iterations_above to rrdb * rrdset_free_obsolete_time_s to rrdb * libuv_worker_threads to rrdb * ieee754_doubles to rrdb * rrdhost_free_orphan_time_s to rrdb * rrd_rwlock to rrdb * localhost to rrdb * rm extern from func decls * mv rrd macro under rrd.h * default_rrdeng_page_cache_mb to rrdb * default_rrdeng_extent_cache_mb to rrdb * db_engine_journal_check to rrdb * default_rrdeng_disk_quota_mb to rrdb * default_multidb_disk_quota_mb to rrdb * multidb_ctx to rrdb * page_type_size to rrdb * tier_page_size to rrdb * No storage_engine_id in rrdim functions * storage_engine_id is provided by st * Update to fix merge conflict. * Update field name * Remove unnecessary macros from rrd.h * Rm unused type decls * Rm duplicate func decls * make internal function static * Make the rest of public dbengine funcs accept a storage_instance. * No more rrdengine_instance :) * rm rrdset_debug from rrd.h * Use rrdb to access globals in ML and ACLK Missed due to not having the submodules in the worktree. * rm total_number * rm RRDVAR_TYPE_TOTAL * rm unused inline * Rm names from typedef'd enums * rm unused header include * Move include * Rm unused header include * s/rrdhost_find_or_create/rrdhost_get_or_create/g * s/find_host_by_node_id/rrdhost_find_by_node_id/ Also, remove duplicate definition in rrdcontext.c * rm macro used only once * rm macro used only once * Reduce rrd.h api by moving funcs into a collector specific utils header * Remove unused func * Move parser specific function out of rrd.h * return storage_number instead of void pointer * move code related to rrd initialization out of rrdhost.c * Remove tier_grouping from rrdim_tier Saves 8 * storage_tiers bytes per dimension. * Fix rebase * s/rrd_update_every/update_every/ * Mark functions as static and constify args * Add license notes and file to build systems. * Remove remaining non-log/config mentions of memory mode * Move rrdlabels api to separate file. Also, move localhost functions that loads labels outside of database/ and into daemon/ * Remove function decl in rrd.h * merge rrdhost_cache_dir_for_rrdset_alloc into rrdset_cache_dir * Do not expose internal function from rrd.h * Rm NETDATA_RRD_INTERNALS Only one function decl is covered. We have more database internal functions that we currently expose for no good reason. These will be placed in a separate internal header in follow up PRs. * Add license note * Include libnetdata.h instead of aral.h * Use rrdb to access localhost * Fix builds without dbengine * Add header to build system files * Add rrdlabels.h to build systems * Move func def from rrd.h to rrdhost.c * Fix macos build * Rm non-existing function * Rebase master * Define buffer length macro in ad_charts. * Fix FreeBSD builds. * Mark functions static * Rm func decls without definitions * Rebase master * Rebase master * Properly initialize value of storage tiers. * Fix build after rebase.
2023-07-14Pre release fixes (#15405)Costa Tsaousis
2023-07-13Fix CodeQL alert (#15384)Stelios Fragkakis
Fix CodeQL alert -- Multiplication result converted to larger type
2023-07-11Rename log Macros (debug) (#15322)thiagoftsm
2023-07-10Use spinlock in host and chart (#15328)Stelios Fragkakis
* Switch alarm log lock to spinlock * Switch the alerts lock in the chart structure to spinlock * Proper lock usage
2023-07-06Rename generic `error` function (#15296)thiagoftsm
2023-07-06Code reorg and cleanup - enrichment of /api/v2 (#15294)Costa Tsaousis
* claim script now accepts the same params as the kickstart * rewrote buildinfo to unify all methods * added cloud unavailable in cloud status * added all exporters * renamed httpd to h2o * rename ENABLE_COMPRESSION to ENABLE_LZ4 * rename global variable * rename ENABLE_HTTPS to ENABLE_OPENSSL * fix coverity-scan for openssl * add lz4 to coverity-scan * added all plugins and most of the features * added all plugins and most of the features * generalize bitmap code so that we can have any size of bitmaps * cleanup * fix compilation without protobuf * fix compilation with others allocators * fix bitmap * comprehensive bitmaps unit test * bitmap as macros * added developer mode * added system info to build info * cloud available/unavailable * added /api/v2/info * added units and ni to transitions * when showing instances and transitions, show only the instances that have transitions * cleanup * add missing quotes * add anchor to transitions * added more to build info * calculate retention per tier and expose it to /api/v2/info * added currently collected metrics * do not show space and retention when no numbers are available * fix impossible overflow * Add function for transitions and execute callback * In case of error, reset and try next dictionary entry * Fix error message * simpler logic to maintain retention per tier * /api/v2/alert_transitions * Handle case of recipient null Convert after and before to usec * Add classification, type and component * working /api/v2/alert_transitions * Fix query to properly handle context and alert name * cleanup * Add search with transition * accept transition in /api/v2/alert_transitions * totaly dynamic facets * fixed debug info * restructured facets * cleanup; removal of options=transitions * updated alert entries flags * method to exec * Return also exec run timestamp Temp table cleanup only when we don't execute with a transition * cleanup obsolete anchor parameter * Add sql_get_alert_configuration function * added options=config to alert_transitions * added /api/v2/alert_config * preliminary work for /api/v2/claim * initialize variables; do not expose expected retention if no disk space info is available; do not report aclk as initializing when not claimed * fix claim session key filename * put a newline into the session key file * more progress on claiming * final /api/v2/claim endpoint * after claiming, refresh our state at the output * Fix query to fetch config * Remove debug log * add configuration objects * add configuration objects - fixed * respect the NETDATA_DISABLE_CLOUD env variable * NETDATA_DISABLE_CLOUD env variable sets the default, but the config sets the final value * use a new claimed_id on every claiming * regenerate random key on claiming and wait for online status * ignore write() return value when writing a newline * dont show cloud status disabled when claimed_id is missing * added ctx to alert instances * cleanup config and transitions from /api/v2/alerts * fix unused variable * in /api/v2/alert_config show 1 config without an array * show alert values conditionally, by appending options=values * When storing host info if the key value is empty, store unknown * added options=summary to control when the alerts summary is shown * increased http_api_v2 to version 5 * claming random key file is now not world readable * added local-listeners binary that detects all the listening ports, their IPs and their command lines --------- Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
2023-06-30Replace `info` macro with a less generic name (#15266)Carlo Cabrera
2023-06-29Optimizations part 2 (#15280)Costa Tsaousis
* make all pluginsd functions inline, instead of function pointers * dynamic MRG partitions based on the number of CPUs * report the right size of the MRG * prevent invalid read on pluginsd exit * faster service_running() check; fix compiler warnings; shutdown replication after streaming to prevent crash on shutdown * sender is now using a spinlock * rrdcontext uses spinlock * replace select() with poll() * signed calculation of threads * disable read-ahead on jnfv2 files during scan
2023-06-26use gperf for the pluginsd/streaming parser hashtable (#15251)Costa Tsaousis
* use gperf for the pluginsd parser * simplify pluginsd_parser by removing void pointers to user * pluginsd_split_words() with inlined pluginsd_space() * quoted_string_splitter() now uses a map instead of a function for determining spaces * add stress test for pluginsd parser * optimized BITMAP256 * optimized rrdpush receiver reception * optimized rrdpush sender compression * renames and cleanup * remove wrong negation * unify handshake and disconnection reasons * use parser_find_keyword * register job names only for the current repertoire
2023-06-19Obvious memory reductions (#15204)Costa Tsaousis
* remove rd->update_every * reduce amount of memory for RRDDIM * reorgnize rrddim->db entries * optimize rrdset and statsd * optimize dictionaries * RW_SPINLOCK for dictionaries * fix codeql warning * rw_spinlock improvements * remove obsolete assertion * fix crash on health_alarm_log_process() * use RW_SPINLOCK for AVL trees * add RW_SPINLOCK read/write trylock * pgc and mrg now use rw_spinlocks; cache line optimizations for mrg * thread tag of dbegnine init * append created datafile, lockless * make DOUBLE_LINKED_LIST_APPEND_ITEM_UNSAFE friendly for lockless use * thread cancelability in spinlocks; optimize thread cancelability management * introduce a JudyL to index datafiles and use it during queries to quickly find the relevant files * use the last timestamp of each journal file for indexing * when the previous cannot be found, start from the beginning * add more stats to PDC to trace routing easier * rename spinlock functions * fix for spinlock renames * revert statsd socket statistics to size_t * turn fatal into internal_fatal() * show candidates always * show connected status and connection attempts
2023-06-19/api/v2/nodes and streaming function (#15168)Costa Tsaousis
* dummy streaming function * expose global functions upstream * separate function for pushing global functions * add missing conditions * allow streaming function to run async * started internal API for functions * cache host retention and expose it to /api/v2/nodes * internal API for function table fields; more progress on streaming status * abstracted and unified rrdhost status * port old coverity warning fix - although it is not needed * add ML information to rrdhost status * add ML capability to streaming to signal the transmission of ML information; added ML information to host status * protect host->receiver * count metrics and instances per host * exposed all inbound and outbound streaming * fix for ML status and dependency of DATA_WITH_ML to INTERPOLATED, not IEEE754 * update ML dummy * added all fields * added streaming group by and cleaned up accepted values by cloud * removed type * Revert "removed type" This reverts commit faae4177e603d4f85b7433f33f92ef3ccd23976e. * added context to db summary * new /api/v2/nodes schema * added ML type * change default function charts * log to trace new capa * add more debug * removed debugging code * retry on receive interrupted read; respect sender reconnect delay in all cases * set disconnected host flag and manipulate localhost child count atomically, inside set/clear receiver * fix infinite loop * send_to_plugin() now has a spinlock to ensure that only 1 thread is writing to the plugin/child at the same time * global cloud_status() call * cloud should be a section, since it will contain error information * put cloud capabilities into cloud * aclk status in /api/v2 agents sections * keep aclk_connection_counter * updates on /api/v2/nodes * final /api/v2/nodes and addition of /api/v2/nodes_instances * parametrize all /api/v2/xxx output to control which info is outputed per endpoint * always accept nodes selector * st needs to be per instance, not per node * fix merging of contexts; fix cups plugin priorities * add after and before parameters to /api/v2/contexts/nodes/nodes_instances/q * give each libuv worker a unique id * aclk http_api_v2 version 4
2023-06-08api v2 nodes for streaming statuses (#15162)Costa Tsaousis
* api v2 nodes for streaming statuses * remove test * move parts of the output * in api/v2/data return 5 values per point when aggregation=percentage and raw option is given; return final values when aggregation=percentage is not the final grouping
2023-06-07Re-write of SSL support in Netdata; restoration of SIGCHLD; detection of ↵Costa Tsaousis
stale plugins; streaming improvements (#15113) * add information about streaming connections to /api/v2/nodes; reset defer time when sender or receivers connect or disconnect * make each streaming destination respect its SSL settings * to not send SSL traffic over non-SSL connection * keep track of outgoing streaming connection attempts * retry SSL reads when SSL_read() returns SSL_ERROR_WANT_READ * Revert "retry SSL reads when SSL_read() returns SSL_ERROR_WANT_READ" This reverts commit 14c858677c6f2d3b08c94f298e2f45ecdb74c801. * cleanup SSL connections properly * initialize SSL in rpt before takeover * sender should free SSL when talking to a non-SSL destination * do not shutdown SSL when receiver exits * restore operation of SIGCHLD when the reaper is not enabled * create an fgets function that checks for data and times out * work on error handling of plugins exiting * remove newlines from logs * global call to waitid(), caching the result for netdata_pclose() to process * receiver tid * parser timeouts in 2 minutes instead of 10 * fix crash when UUID is NULL in SQLite * abstract sqlite3 parsing for uuid and text * write proper ssl errors on read and write * fix for SSL_ERROR_WANT_RETRY_VERIFY * SSL WANT per function * unified SSL error logging * fix compilation warning * additional logging about parser cleanup * streaming parser should call the pluginsd parser cleanup * SSL error handling work * SSL initialization unification * check for pending data when receiving SSL response with timeout * macro to check if an SSL connection has been established * remove SSL_pending() * check for SSL macros * use SSL_peek() to find if there is a response * SSL renames * more SSL renames & cleanup * rrdpush ssl connection function * abstract all SSL functions into security.c * keep track of SSL connections and always attempt to use SSL read/write when on SSL connection * signal openssl to skip certificate validation when configured to do so * better SSL error handling and logging * SSL code cleanup * SSL retry on SSL_connect and SSL_accept * SSL provide default return value for old compilers * SSL read/write functions emulate system read/write functions * fix receive/send timeout and switch from SSL_peek() to SSL_pending() * remove SSL_pending() * removed sender auto-retry and debug info for initial recevier response * ssl skip certificate verification config for web server * ssl errors log ip and port of the peer * keep ssl with web_client for its whole lifetime * thread safe socket peers to text * use error_limit() for common ssl errors * cleanup * more cleanup * coverity fixes * ssl error logs include both local and remote ip/port info * remove obsolete code
2023-04-26Add support for acquire/release operations on RRDSETs (#14945)vkalintiris
* Add acquire/release support for RRDSET * Release/acquire RRDSET when training. * Fix function name in log message. * Use proper function name to get the hostname. * Add acquire/release for hosts and skip training orphan/obsolete hosts/charts. * Fix variable name
2023-04-25Reject child when context is loading (#14960)Stelios Fragkakis
* Reject child when context is loading * Switch check position
2023-04-20WEBRTC for communication between agents and browsers (#14874)Costa Tsaousis
* initial webrtc setup * missing files * rewrite of webrtc integration * initialization and cleanup of webrtc connections * make it compile without libdatachannel * add missing webrtc_initialize() function when webrtc is not enabled * make c++17 optional * add build/m4/ax_compiler_vendor.m4 * add ax_cxx_compile_stdcxx.m4 * added new m4 files to makefile.am * id all webrtc connections * show warning when webrtc is disabled * fixed message * moved all webrtc error checking inside webrtc.cpp * working webrtc connection establishment and cleanup * remove obsolete code * rewrote webrtc code in C to remove dependency for c++17 * fixed left-over reference * detect binary and text messages * minor fix * naming of webrtc threads * added webrtc configuration * fix for thread_get_name_np() * smaller web_client memory footprint * universal web clients cache * free web clients every 100 uses * webrtc is now enabled by default only when compiled with internal checks * webrtc responses to /api/ requests, including LZ4 compression * fix for binary and text messages * web_client_cache is now global * unification of the internal web server API, for web requests, aclk request, webrtc requests * more cleanup and unification of web client timings * fixed compiler warnings * update sent and received bytes * eliminated of almost all big buffers in web client * registry now uses the new json generation * cookies are now an array; fixed redirects * fix redirects, again * write cookies directly to the header buffer, eliminating the need for cookie structures in web client * reset the has_cookies flag * gathered all web client cleanup to one function * fixes redirects * added summary.globals in /api/v2/data response * ars to arc in /api/v2/data * properly handle host impersonation * set the context of mem.numa_nodes
2023-04-20Initialize machine GUID earlier in the agent startup sequence (#14922)Stelios Fragkakis
* Read (cache) machine guid earlier, so that a fatal event during init will have the correct machine guid * Setup analytics global environment explicitly if we fail to initialize before localhost is created
2023-04-20Skip ML initialization when it's been disabled in netdata.conf (#14920)vkalintiris
* Apply ML changes again. The ML changes in 003df5f2 wheere reverted with 556bdad9 because we were partially initializing ML even when it was explicitly disabled in netdata.conf, causing the agent to crash on startup. * Do not start/stop ML threads when ML is disabled. * Restore default config settings.
2023-04-14Revert ML changes. (#14908)vkalintiris
2023-04-13Save and load ML models (#14810)vkalintiris
* Revert "Revert "Use static thread-pool for training. (#14702)" (#14782)" This reverts commit 5321ca8d1ef8d974a6a2b2128ca8804de6acb693. * Model I/O. * Minor changes Meant to make debugging a crash issues easier on cloud VMs: - Less verbose logging - Higher logging history - Modify installer to use debug info by default * Fix ML initialization order. * read lock hosts when running detection. * Revert debugging changes. * Update ml/Config.cc Co-authored-by: Andrew Maguire <andrewm4894@gmail.com> --------- Co-authored-by: Andrew Maguire <andrewm4894@gmail.com>
2023-03-23Schedule node info to the cloud after child connection (#14790)Stelios Fragkakis
* Schedule node info to the cloud after child connection * Remove debug code * Schedule localhost node info within 5 seconds of startup. If no children are detected Or a child connects (switch to immediate localhost node info update)
2023-03-21Revert "Use static thread-pool for training. (#14702)" (#14782)vkalintiris
This reverts commit 5046e034212c008557dd014196b6f6204eda24b2. Will re-apply once we investigate an issue that occurs during the shutdown of the agent.
2023-03-21Use static thread-pool for training. (#14702)vkalintiris
* Use static thread-pool for training. * Add missing function definition * disable training stats chart * Add config option to explicitly enable ML stats charts. --------- Co-authored-by: Costa