summaryrefslogtreecommitdiffstats
path: root/web/api/queries/query.c
AgeCommit message (Collapse)Author
2024-01-11Name storage engine variables consistently. (#16753)vkalintiris
* Consistent naming of STORAGE_INSTANCE instances. Replace usages of `db_instance` and `instance` with `si`. * Rename array `storage_metrics_groups[tier]` to `smg[tier]` * Rename db_metric_handle to smh * Rename instances of `storage_engine_query_handle` to `seqh`. * Rename instances of STORAGE_ENGINE_BACKEND to `seb`. * Rename instances of STORAGE_COLLECT_HANDLE to `sch`.
2023-12-15Queries Progress (#16574)Costa Tsaousis
* track the progress of queries * add query_progress in libnetdata Makefile.am * add acl, response size and response code to the tracking * define the required functions * fix the last commit * added /api/v2/progress?transaction=ID to report the progress of queries * added function to report netdata-queries * track hashtable additions * when resusing a transaction, maintain the counter * keep track of linked and indexing * added X-Forwarded-Host and X-Forwarded-For to logs. X-Forwarded-For is also added in progress tracking * report compact uuids to match logs; register the actual duration of the transaction * added rowOptions to function; now web_client keeps track if it tracks progress or not * add http request method to progress * add tags per function; /api/vX/functions is now not protected * compact the sanitization array * split pluginsd_parser into multiple files * cleanup keyword definitions * code cleanup * extracted rrd_collector to separate files * added http access level to functions * renamed access "all" to "any" * implemented optional protection on functions * add priority to functions, to allow the UI select the best function (lower priority) when the user has not selected a function * added progress report from the plugins to netdata and from children to parents - untested * added progress reporting in systemd-journal * query timeout is now handled by evloop for external plugins * propagate progress reports to children and plugins * fix codeql warning * adapt to cmake * minor changes * extend function timeout when progress is received; added streaming capability to propagate progress reports to parents and send progress requests to children * revert change in dictionary.h * add log when access level is invalid * update access level of functions * added logs when processing progress updates * log when the deferred response is too big * comment out sender progress to find the issue * added missing newline in streaming progress reports * propogate progress reports to functions * fix logs
2023-12-12code cleanup (#16542)Costa Tsaousis
fixed minor code cleanup warnings
2023-11-22New logging layer (#16357)Costa Tsaousis
* cleanup of logging - wip * first working iteration * add errno annotator * replace old logging functions with netdata_logger() * cleanup * update error_limit * fix remanining error_limit references * work on fatal() * started working on structured logs * full cleanup * default logging to files; fix all plugins initialization * fix formatting of numbers * cleanup and reorg * fix coverity issues * cleanup obsolete code * fix formatting of numbers * fix log rotation * fix for older systems * add detection of systemd journal via stderr * finished on access.log * remove left-over transport * do not add empty fields to the logs * journal get compact uuids; X-Transaction-ID header is added in web responses * allow compiling on systems without memfd sealing * added libnetdata/uuid directory * move datetime formatters to libnetdata * add missing files * link the makefiles in libnetdata * added uuid_parse_flexi() to parse UUIDs with and without hyphens; the web server now read X-Transaction-ID and uses it for functions and web responses * added stream receiver, sender, proc plugin and pluginsd log stack * iso8601 advanced usage; line_splitter module in libnetdata; code cleanup * add message ids to streaming inbound and outbound connections * cleanup line_splitter between lines to avoid logging garbage; when killing children, kill them with SIGABRT if internal checks is enabled * send SIGABRT to external plugins only if we are not shutting down * fix cross cleanup in pluginsd parser * fatal when there is a stack error in logs * compile netdata with -fexceptions * do not kill external plugins with SIGABRT * metasync info logs to debug level * added severity to logs * added json output; added options per log output; added documentation; fixed issues mentioned * allow memfd only on linux * moved journal low level functions to journal.c/h * move health logs to daemon.log with proper priorities * fixed a couple of bugs; health log in journal * updated docs * systemd-cat-native command to push structured logs to journal from the command line * fix makefiles * restored NETDATA_LOG_SEVERITY_LEVEL * fix makefiles * systemd-cat-native can also work as the logger of Netdata scripts * do not require a socket to systemd-journal to log-as-netdata * alarm notify logs in native format * properly compare log ids * fatals log alerts; alarm-notify.sh working * fix overflow warning * alarm-notify.sh now logs the request (command line) * anotate external plugins logs with the function cmd they run * added context, component and type to alarm-notify.sh; shell sanitization removes control character and characters that may be expanded by bash * reformatted alarm-notify logs * unify cgroup-network-helper.sh * added quotes around params * charts.d.plugin switched logging to journal native * quotes for logfmt * unify the status codes of streaming receivers and senders * alarm-notify: dont log anything, if there is nothing to do * all external plugins log to stderr when running outside netdata; alarm-notify now shows an error when notifications menthod are needed but are not available * migrate cgroup-name.sh to new logging * systemd-cat-native now supports messages with newlines * socket.c logs use priority * cleanup log field types * inherit the systemd set INVOCATION_ID if found * allow systemd-cat-native to send messages to a systemd-journal-remote URL * log2journal command that can convert structured logs to journal export format * various fixes and documentation of log2journal * updated log2journal docs * updated log2journal docs * updated documentation of fields * allow compiling without libcurl * do not use socket as format string * added version information to newly added tools * updated documentation and help messages * fix the namespace socket path * print errno with error * do not timeout * updated docs * updated docs * updated docs * log2journal updated docs and params * when talking to a remote journal, systemd-cat-native batches the messages * enable lz4 compression for systemd-cat-native when sending messages to a systemd-journal-remote * Revert "enable lz4 compression for systemd-cat-native when sending messages to a systemd-journal-remote" This reverts commit b079d53c11f6687cd64d804fdd7b24c0492bf245. * note about uncompressed traffic * log2journal: code reorg and cleanup to make modular * finished rewriting log2journal * more comments * rewriting rules support * increased limits * updated docs * updated docs * fix old log call * use journal only when stderr is connected to journal * update netdata.spec for libcurl, libpcre2 and log2journal * pcre2-devel * do not require pcre2 in centos < 8, amazonlinux < 2023, open suse * log2journal only on systems pcre2 is available * ignore log2journal in .gitignore * avoid log2journal on centos 7, amazonlinux 2 and opensuse * add pcre2-8 to static build * undo last commit * Bundle to static Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud> * Add build deps for deb packages Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud> * Add dependencies; build from source Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud> * Test build for amazon linux and centos expect to fail for suse Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud> * fix minor oversight Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud> * Reorg code * Add the install from source (deps) as a TODO * Not enable the build on suse ecosystem Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud> --------- Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud> Co-authored-by: Tasos Katsoulas <tasos@netdata.cloud>
2023-10-24QUERIES: use tiers only when they have useful data (#16263)Costa Tsaousis
use tiers only when they have useful data
2023-10-04journal: fix the 1 second latency in play mode (#16123)Costa Tsaousis
provide a relative_to_absolute function that does not touch the current realtime time
2023-10-02fix querying out of retention (#16094)Costa Tsaousis
if the db does not have enough data to satisfy a query, cancel it
2023-09-01Reduce label memory (#15255)Stelios Fragkakis
2023-08-08extend the trimming window to avoid empty points at the end of queries (#15773)Costa Tsaousis
2023-08-03systemd-journal plugin (#15363)Costa Tsaousis
2023-08-03Revert "Refactor RRD code. (#15423)" (#15723)vkalintiris
This reverts commit 440bd51e08fdfa2a4daa191fb68643456028a753. dbengine was still being used for non-zero tiers even on non-dbengine modes.
2023-07-2810 points per query min (#15595)Costa Tsaousis
2023-07-26Refactor RRD code. (#15423)vkalintiris
* Storage engine. * Host indexes to rrdb * Move globals to rrdb * Move storage_tiers_backfill to rrdb * default_rrd_update_every to rrdb * default_rrd_history_entries to rrdb * gap_when_lost_iterations_above to rrdb * rrdset_free_obsolete_time_s to rrdb * libuv_worker_threads to rrdb * ieee754_doubles to rrdb * rrdhost_free_orphan_time_s to rrdb * rrd_rwlock to rrdb * localhost to rrdb * rm extern from func decls * mv rrd macro under rrd.h * default_rrdeng_page_cache_mb to rrdb * default_rrdeng_extent_cache_mb to rrdb * db_engine_journal_check to rrdb * default_rrdeng_disk_quota_mb to rrdb * default_multidb_disk_quota_mb to rrdb * multidb_ctx to rrdb * page_type_size to rrdb * tier_page_size to rrdb * No storage_engine_id in rrdim functions * storage_engine_id is provided by st * Update to fix merge conflict. * Update field name * Remove unnecessary macros from rrd.h * Rm unused type decls * Rm duplicate func decls * make internal function static * Make the rest of public dbengine funcs accept a storage_instance. * No more rrdengine_instance :) * rm rrdset_debug from rrd.h * Use rrdb to access globals in ML and ACLK Missed due to not having the submodules in the worktree. * rm total_number * rm RRDVAR_TYPE_TOTAL * rm unused inline * Rm names from typedef'd enums * rm unused header include * Move include * Rm unused header include * s/rrdhost_find_or_create/rrdhost_get_or_create/g * s/find_host_by_node_id/rrdhost_find_by_node_id/ Also, remove duplicate definition in rrdcontext.c * rm macro used only once * rm macro used only once * Reduce rrd.h api by moving funcs into a collector specific utils header * Remove unused func * Move parser specific function out of rrd.h * return storage_number instead of void pointer * move code related to rrd initialization out of rrdhost.c * Remove tier_grouping from rrdim_tier Saves 8 * storage_tiers bytes per dimension. * Fix rebase * s/rrd_update_every/update_every/ * Mark functions as static and constify args * Add license notes and file to build systems. * Remove remaining non-log/config mentions of memory mode * Move rrdlabels api to separate file. Also, move localhost functions that loads labels outside of database/ and into daemon/ * Remove function decl in rrd.h * merge rrdhost_cache_dir_for_rrdset_alloc into rrdset_cache_dir * Do not expose internal function from rrd.h * Rm NETDATA_RRD_INTERNALS Only one function decl is covered. We have more database internal functions that we currently expose for no good reason. These will be placed in a separate internal header in follow up PRs. * Add license note * Include libnetdata.h instead of aral.h * Use rrdb to access localhost * Fix builds without dbengine * Add header to build system files * Add rrdlabels.h to build systems * Move func def from rrd.h to rrdhost.c * Fix macos build * Rm non-existing function * Rebase master * Define buffer length macro in ad_charts. * Fix FreeBSD builds. * Mark functions static * Rm func decls without definitions * Rebase master * Rebase master * Properly initialize value of storage tiers. * Fix build after rebase.
2023-07-20fix the calculation of incremental-sum (#15468)Costa Tsaousis
* fix the calculation of incremental-sum * for query planning use at least 400 points
2023-07-13Rename log_access and log_health (#15368)Emmanuel Vasilakis
2023-06-30Replace `info` macro with a less generic name (#15266)Carlo Cabrera
2023-06-26use gperf for the pluginsd/streaming parser hashtable (#15251)Costa Tsaousis
* use gperf for the pluginsd parser * simplify pluginsd_parser by removing void pointers to user * pluginsd_split_words() with inlined pluginsd_space() * quoted_string_splitter() now uses a map instead of a function for determining spaces * add stress test for pluginsd parser * optimized BITMAP256 * optimized rrdpush receiver reception * optimized rrdpush sender compression * renames and cleanup * remove wrong negation * unify handshake and disconnection reasons * use parser_find_keyword * register job names only for the current repertoire
2023-06-19Obvious memory reductions (#15204)Costa Tsaousis
* remove rd->update_every * reduce amount of memory for RRDDIM * reorgnize rrddim->db entries * optimize rrdset and statsd * optimize dictionaries * RW_SPINLOCK for dictionaries * fix codeql warning * rw_spinlock improvements * remove obsolete assertion * fix crash on health_alarm_log_process() * use RW_SPINLOCK for AVL trees * add RW_SPINLOCK read/write trylock * pgc and mrg now use rw_spinlocks; cache line optimizations for mrg * thread tag of dbegnine init * append created datafile, lockless * make DOUBLE_LINKED_LIST_APPEND_ITEM_UNSAFE friendly for lockless use * thread cancelability in spinlocks; optimize thread cancelability management * introduce a JudyL to index datafiles and use it during queries to quickly find the relevant files * use the last timestamp of each journal file for indexing * when the previous cannot be found, start from the beginning * add more stats to PDC to trace routing easier * rename spinlock functions * fix for spinlock renames * revert statsd socket statistics to size_t * turn fatal into internal_fatal() * show candidates always * show connected status and connection attempts
2023-06-07Re-write of SSL support in Netdata; restoration of SIGCHLD; detection of ↵Costa Tsaousis
stale plugins; streaming improvements (#15113) * add information about streaming connections to /api/v2/nodes; reset defer time when sender or receivers connect or disconnect * make each streaming destination respect its SSL settings * to not send SSL traffic over non-SSL connection * keep track of outgoing streaming connection attempts * retry SSL reads when SSL_read() returns SSL_ERROR_WANT_READ * Revert "retry SSL reads when SSL_read() returns SSL_ERROR_WANT_READ" This reverts commit 14c858677c6f2d3b08c94f298e2f45ecdb74c801. * cleanup SSL connections properly * initialize SSL in rpt before takeover * sender should free SSL when talking to a non-SSL destination * do not shutdown SSL when receiver exits * restore operation of SIGCHLD when the reaper is not enabled * create an fgets function that checks for data and times out * work on error handling of plugins exiting * remove newlines from logs * global call to waitid(), caching the result for netdata_pclose() to process * receiver tid * parser timeouts in 2 minutes instead of 10 * fix crash when UUID is NULL in SQLite * abstract sqlite3 parsing for uuid and text * write proper ssl errors on read and write * fix for SSL_ERROR_WANT_RETRY_VERIFY * SSL WANT per function * unified SSL error logging * fix compilation warning * additional logging about parser cleanup * streaming parser should call the pluginsd parser cleanup * SSL error handling work * SSL initialization unification * check for pending data when receiving SSL response with timeout * macro to check if an SSL connection has been established * remove SSL_pending() * check for SSL macros * use SSL_peek() to find if there is a response * SSL renames * more SSL renames & cleanup * rrdpush ssl connection function * abstract all SSL functions into security.c * keep track of SSL connections and always attempt to use SSL read/write when on SSL connection * signal openssl to skip certificate validation when configured to do so * better SSL error handling and logging * SSL code cleanup * SSL retry on SSL_connect and SSL_accept * SSL provide default return value for old compilers * SSL read/write functions emulate system read/write functions * fix receive/send timeout and switch from SSL_peek() to SSL_pending() * remove SSL_pending() * removed sender auto-retry and debug info for initial recevier response * ssl skip certificate verification config for web server * ssl errors log ip and port of the peer * keep ssl with web_client for its whole lifetime * thread safe socket peers to text * use error_limit() for common ssl errors * cleanup * more cleanup * coverity fixes * ssl error logs include both local and remote ip/port info * remove obsolete code
2023-05-31Percentage of group aggregatable at cloud - fixed for backwards ↵Costa Tsaousis
compatibility (#15126) * percentage of group is now aggregatable at cloud across multiple nodes * do not break backwards compatibility with percentage-of-instance * calculate the percentage when percentage-of-instance is requested * increase capability version
2023-05-31Revert "percentage of group is now aggregatable at cloud across multiple ↵Costa Tsaousis
nodes" (#15122) Revert "percentage of group is now aggregatable at cloud across multiple nodes (#15109)" This reverts commit 44b6c223b3e13774df45a96dd48588aa8a66ba42.
2023-05-30percentage of group is now aggregatable at cloud across multiple nodes (#15109)Costa Tsaousis
2023-05-27percentage-of-group: fix uninitialized array vh (#15106)Costa Tsaousis
fix uninitialized array vh
2023-05-25/api/v2/data percentage calculation on grouped queries (#15100)Costa Tsaousis
allow aggregation=percentage to calculate the percentage over any grouping
2023-04-26do not convert to percentage, when the raw option is given (#14969)Costa Tsaousis
2023-04-20WEBRTC for communication between agents and browsers (#14874)Costa Tsaousis
* initial webrtc setup * missing files * rewrite of webrtc integration * initialization and cleanup of webrtc connections * make it compile without libdatachannel * add missing webrtc_initialize() function when webrtc is not enabled * make c++17 optional * add build/m4/ax_compiler_vendor.m4 * add ax_cxx_compile_stdcxx.m4 * added new m4 files to makefile.am * id all webrtc connections * show warning when webrtc is disabled * fixed message * moved all webrtc error checking inside webrtc.cpp * working webrtc connection establishment and cleanup * remove obsolete code * rewrote webrtc code in C to remove dependency for c++17 * fixed left-over reference * detect binary and text messages * minor fix * naming of webrtc threads * added webrtc configuration * fix for thread_get_name_np() * smaller web_client memory footprint * universal web clients cache * free web clients every 100 uses * webrtc is now enabled by default only when compiled with internal checks * webrtc responses to /api/ requests, including LZ4 compression * fix for binary and text messages * web_client_cache is now global * unification of the internal web server API, for web requests, aclk request, webrtc requests * more cleanup and unification of web client timings * fixed compiler warnings * update sent and received bytes * eliminated of almost all big buffers in web client * registry now uses the new json generation * cookies are now an array; fixed redirects * fix redirects, again * write cookies directly to the header buffer, eliminating the need for cookie structures in web client * reset the has_cookies flag * gathered all web client cleanup to one function * fixes redirects * added summary.globals in /api/v2/data response * ars to arc in /api/v2/data * properly handle host impersonation * set the context of mem.numa_nodes
2023-04-07Boost dbengine (#14832)Costa Tsaousis
* configure extent cache size * workers can now execute up to 10 jobs in a run, boosting query prep and extent reads * fix dispatched and executing counters * boost to the max * increase libuv worker threads * query prep always get more prio than extent reads; stop processing in batch when dbengine is queue is critical * fix accounting of query prep * inlining of time-grouping functions, to speed up queries with billions of points * make switching based on a local const variable * print one pending contexts loading message per iteration * inlined store engine query API * inlined storage engine data collection api * inlined all storage engine query ops * eliminate and inline data collection ops * simplified query group-by * more error handling * optimized partial trimming of group-by queries * preparative work to support multiple passes of group-by * more preparative work to support multiple passes of group-by (accepts multiple group-by params) * unified query timings * unified query timings - weights endpoint * query target is no longer a static thread variable - there is a list of cached query targets, each of which of freed every 1000 queries * fix query memory accounting * added summary.dimension[].pri and sorted summary.dimensions based on priority and then name * limit max ACLK WEB response size to 30MB * the response type should be text/plain * more preparative work for multiple group-by passes * create functions for generating group by keys, ids and names * multiple group-by passes are now supported * parse group-by options array also with an index * implemented percentage-of-instance group by function * family is now merged in multi-node contexts * prevent uninitialized use
2023-03-28/api/v2/X part 7 (#14797)Costa Tsaousis
* /api/v2/weights, points key renamed to result * /api/v2/weights, add node ids in response * /api/v2/data remove NONZERO flag when all dimensions are zero and fix MIN/MAX grouping and statistics * /api/v2/data expose view.dimensions.sts{} * /api/v2 endpoints expose agents and additional info per node, that is needed to unify cloud responses * /api/v2 nodes output now includes the duration of time spent per node * jsonwrap view object renames and cleanup * rework of the statistics returned by the query engine * swagger work * swagger work * more swagger work * updated swagger json * added the remaining of the /api/v2 endpoints to swagger * point.ar has been renamed point.arp * updated weights endpoint * fix compilation warnings
2023-03-21/api/v2/X part 5 (#14718)Costa Tsaousis
* query timestamps are now pre-determined and alignment on timestamps is guarranteed * turn internal_fatal() to internal_error() to investigate the issue * handle query when no data exist in the db * check for non NULL dict when running dictionary garbage collect * support API v2 requests via ACLK * add nodes detailed information to /api/v2/nodes * fixed keys and added dummy nodes for completeness * added nodes_hard_hash, alerts_hard_hash, alerts_soft_hash; started building a nodes status object to reflect the current status of a node * make sure replication does not double count charts that are already being replicated * expose min and max in sts structures * added view_minimum_value and view_maximum_value; percentage calculation is now an additional pass on the data, removed from formatters; absolute value calculation is now done at the query level, removed from formatters * respect trimming in percentage calculation; updated swagger * api/v2/weights preparative work to support multi-node queries - still single node though * multi-node /api/v2/weights endpoint, supporting all the filtering parameters of /api/v2/data * when passing the raw option, the query exposes the hidden dimensions * fix compilation issues on older systems * the query engine now calculates per dimension min, max, sum, count, anomaly count * use the macro to calculate storage point anomaly rate * weights endpoint exposing version hashes * weights method=value shows min, max, average, sum, count, anomaly count, anomaly rate * query: expose RESET flag; do not add the same point multiple times to the aggregated point * weights: more compact output * weights requests can be interrupted * all /api/v2 requests can be interrupted and timeout * allow relative timestamps in weights * fix macos compilation warnings * Revert "fix macos compilation warnings" This reverts commit 8a1d24e41e9b58de566ac59f0c4b1c465bcc0592. * /api/v2/data group-by now works on dimension names, not ids * /api/v2/weights does not query metrics without retention and new output format * /api/v2/weights value and anomaly queries do context queries when contexts are filtered; query timeout is now always in ms
2023-03-13/api/v2 part 4 (#14706)Costa Tsaousis
* expose the order of group by * key renames in json wrapper v2 * added group by context and group by units * added view_average_values * fix for view_average_values when percentage is specified * option group-by-labels is enabling the exposure of all the labels that are used for each of the final grouped dimensions * when executing group by queries, allocate one dimension data at a time - not all of them * respect hidden dimensions * cancel running data query on socket error * use poll to detect socket errors * use POLLRDHUP to detect half closed connections * make sure POLLRDHUP is available * do not destroy aral-by-size arals * completed documentation of /api/v2/data. * moved min, max back to view; updated swagger yaml and json * default format for /api/v2/data is json2
2023-03-10/api/v2/X improvements part 3 (#14665)Costa Tsaousis
* max web request size to 64KB * fix the request too big message * increase max request reading tries to 100 * support for bigger web requests * add "avg" as a shortcut for "average" to both group by aggregation and time aggregation; discard the last partial points of a query in play mode, up to max update every; group by hidden dimensions too * better implementation for partial data trimming * added group_by=selected to return only one dimension for all selected metrics * fix acceptance of group_by=selected * passing option "raw" disables partial data trimming * remove obsolete option "plan"; use "debug" * fix view.min and view.max calculation - there were 2 bugs: a) min and max were reset for every row and b) min and max were corrupted by GBC and AR printing * per row annotations * added time column to point annotations * disable caching for /api/v2/contexts responses * added api format json2 that returns an array for each points, having all the point values and annotations in them * work on swagger about /api/v2 * prevent infinite loop * cleanup and swagger work * allow negative simple pattern expressions to work as expected * do not lookup in the dictionary empty names * garbage collect dictionaries * make query_target allocate less aggressively; queries fill the remaining points with nulls * reusable query ops to save memory on huge queries * move parts of query plans into query ops to save query target memory * remove storage engine from query metric tiers, to save memory, and recalculate it when it is needed
2023-03-02/api/v2/contexts (#14592)Costa Tsaousis
* preparation for /api/v2/contexts * working /api/v2/contexts * add anomaly rate information in all statistics; when sum-count is requested, return sums and counts instead of averages * minor fix * query targegt now accurately counts hosts, contexts, instances, dimensions, metrics * cleanup /api/v2/contexts * full text search with /api/v2/contexts * simple patterns now support the option to search ignoring case * full text search API with /api/v2/q * simple pattern execution optimization * do not show q when not given * full text search accounting * separated /api/v2/nodes from /api/v2/contexts * fix ssv queries for group_by * count query instances queried and failed per context and host * split rrdcontext.c to multiple files * add query totals * fix anomaly rate calculation; provide "ni" for indexing hosts * do not generate zero valued members * faster calculation of anomaly rate; by just summing integers for each db points and doing math once for every generated point * fix typo when printing dimensions totals * added option minify to remove spaces and newlines fron JSON output * send instance ids and names when they differ * do not add in query target dimensions, instances, contexts and hosts for which there is no retention in the current timeframe * fix for the previous + renames and code cleanup * when a dimension is filtered, include in the response all the other dimensions that are selectable * do not add nodes that do not have retention in the current window * move selection of dimensions to query_dimension_add(), instead of query_metric_add() * increase the pre-processing capacity of queries * generate instance fqdn ids and names only when they are needed * provide detailed statistics about tiers retention, queries, points, update_every * late allocation of query dimensions * cleanup * more cleanup * support for annotations per displayed point, RESET and PARTIAL * new type annotations * if a chart is not linked to contexts and it is collected, link it when it is collected * make ML run reentrant * make ML rrdr query synchronous * optimize replication memory allocation of replication_sort_entry * change units to percentage, when requesting a coefficinet of variation, or a percentage query * initialize replication before starting main threads * properly decrement no room requests counter * propagate the non-zero flag to group-by * the same by avoiding the extra loop * respect non-zero in all dimension arrays * remove dictionary garbage collection from dictionary_entries() and dictionary_version() * be more verbose when jv2 indexing is postponed * prevent infinite loop * use hidden dimensions even when dimensions pattern is unset * traverse hosts using dictionaries * fix dictionary unittests
2023-02-22/api/v2/data - multi-host/context/instance/dimension/label queries (#14564)Costa Tsaousis
* fundamentals for having /api/v2/ working * use an atomic to prevent writing to internal pipe too much * first attempt of multi-node, multi-context, multi-chart, multi-dimension queries * v2 jsonwrap * first attempt for group by * cleaned up RRDR and fixed group by * improvements to /api/v2/api * query instance may be realloced, so pointers to it get invalid; solved memory leaks * count of quried metrics in summary information * provide detailed information about selected, excluded, queried and failed metrics for each entity * select instances by fqdn too * add timing information to json output * link charts to rrdcontexts, if a query comes in and it is found unlinked * calculate min, max, sum, average, volume, count per metric * api v2 parameters naming * renders alerts and units * render machine_guid and node_id in all sections it is relevant * unified keys * group by now takes into account units and when there are multiple units involved, it creates a dimension per unit * request and detailed are hidden behind an option * summary includes only a flattened list of alerts * alert counts per host and instance * count of grouped metrics per dimension * added contexts to summary * added chart title * added dimension priorities and chart type * support for multiple group by at the same time * minor fixes * labels are now a tree * keys uniformity * filtering by alerts, both having a specific alert and having a specific alert in a specific status * added scope of hosts and contexts * count of instances on contexts and hosts * make the api return valid responses even when the response contains no data * calculate average and contribution % for every item in the summary * fix compilation warnings * fix compilation warnings - again
2023-02-15JSON internal API, IEEE754 base64/hex streaming, weights endpoint ↵Costa Tsaousis
optimization (#14493) * first work on standardizing json formatting * renamed old grouping to time_grouping and added group_by * add dummy functions to enable compilation * buffer json api work * jsonwrap opening with buffer_json_X() functions * cleanup * storage for quotes * optimize buffer printing for both numbers and strings * removed ; from define * contexts json generation using the new json functions * fix buffer overflow at unit test * weights endpoint using new json api * fixes to weights endpoint * check buffer overflow on all buffer functions * do synchronous queries for weights * buffer_flush() now resets json state too * content type typedef * print double values that are above the max 64-bit value * str2ndd() can now parse values above UINT64_MAX * faster number parsing by avoiding double calculations as much as possible * faster number parsing * faster hex parsing * accurate printing and parsing of double values, even for very large numbers that cannot fit in 64bit integers * full printing and parsing without using library functions - and related unit tests * added IEEE754 streaming capability to enable streaming of double values in hex * streaming and replication to transfer all values in hex * use our own str2ndd for set2 * remove subnormal check from ieee * base64 encoding for numbers, instead of hex * when increasing double precision, also make sure the fractional number printed is aligned to the wanted precision * str2ndd_encoded() parses all encoding formats, including integers * prevent uninitialized use * /api/v1/info using the new json API * Fix error when compiling with --disable-ml * Remove redundant 'buffer_unittest' declaration * Fix formatting * Fix formatting * Fix formatting * fix buffer unit test * apps.plugin using the new JSON API * make sure the metrics registry does not accept negative timestamps * do not allow pages with negative timestamps to be loaded from db files; do not accept pages with negative timestamps in the cache * Fix more formatting --------- Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
2023-02-07do not report dimensions that failed to be queried (#14447)Costa Tsaousis
* do not report dimensions that failed to be queried * renamed SELECTED to QUERIED to have clarity on what it means * fix wrong placement of continue
2023-02-01Remove equality when deciding how to use point (#14402)Emmanuel Vasilakis
remove equality
2023-01-30DBENGINE v2 - improvements part 11 (#14337)Costa Tsaousis
* acquiring / releasing interface for metrics * metrics registry statistics * cleanup metrics registry by deleting metrics when they dont have retention anymore; do not double copy the data of pages to be flushed * print the tier in retention summary * Open files with buffered instead of direct I/O (test) * added more metrics stats and fixed unittest * rename writer functions to avoid confusion with refcounting * do not release a metric that is not acquired * Revert to use direct I/O on write -- use direct I/O on read as well * keep track of ARAL overhead and add it to the memory chart * aral full check via api * Cleanup * give names to ARALs and PGCs * aral improvements * restore query expansion to the future * prefer higher resolution tier when switching plans * added extent read statistics * smoother joining of tiers at query engine * fine tune aral max allocation size * aral restructuring to hide its internals from the rest of netdata * aral restructuring; addtion of defrag option to aral to keep the linked list sorted - enabled by default to test it * fully async aral * some statistics and cleanup * fix infinite loop while calculating retention * aral docs and defragmenting disabled by default * fix bug and add optimization when defragmenter is not enabled * aral stress test * aral speed report and documentation * added internal checks that all pages are full * improve internal log about metrics deletion * metrics registry uses one aral per partition * metrics registry aral max size to 512 elements per page * remove data_structures/README.md dependency --------- Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
2023-01-27DBENGINE v2 - improvements part 10 (#14332)Costa Tsaousis
* replication cancels pending queries on exit * log when waiting for inflight queries * when there are collected and not-collected metrics, use the context priority from the collected only * Write metadata with a faster pace * Remove journal file size limit and sync mode to 0 / Drop wal checkpoint for now * Wrap in a big transaction remaining metadata writes (test 1) * fix higher tiers when tiering iterations = 2 * dbengine always returns db-aligned points; query engine expands the queries by 2 points in every direction to have enough data for interpolation * Wrap in a big transaction metadata writes (test 2) * replication cancelling fix * do not first and last entry in replication when the db has no retention * fix internal check condition * Increase metadata write batch size * always apply error limit to dbengine logs * Remove code that processes the obsolete health.db files * cleanup in query.c * do not allow queries to go beyond db boundaries * prevent internal log for +1 delta in timestamp * detect gap pages in conflicts * double protection for gap injection in main cache * Add checkpoint to prevent large WAL while running Remove unused and duplicate functions * do not allocate chart cache dir if not needed * add more info to unittests * revert query expansion to satisfy unittests Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
2023-01-25DBENGINE v2 - improvements part 8 (#14319)Costa Tsaousis
* cache 100 pages for each size our tiers need * smarter page caching * account the caching structures * dynamic max number of cached pages * make variables const to ensure they are not changed * make sure replication timestamps do not go to the future * replication now sends chart and dimension states atomically; replication receivers ignores chart and dimension states when rbegin is also ignored * make sure all pages are flushed on shutdown * take into account empty points too * when recalculating retention update first_time_s on metrics only when they are bigger * Report the datafile number we use to recalculate retention * Report the datafile number we use to recalculate retention * rotate db at startup * make query plans overlap * Calculate properly first time s * updated event labels * negative page caching fix * Atempt to create missing tables on query failure * Atempt to create missing tables on query failure (part 2) * negative page caching for all gaps, to eliminate jv2 scans * Fix unittest Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
2023-01-20track memory footprint of Netdata (#14294)Costa Tsaousis
* track memory footprint of Netdata * track db modes alloc/ram/save/map * track system info; track sender and receiver * fixes * more fixes * track workers memory, onewayalloc memory; unify judyhs size estimation * track replication structures and buffers * Properly clear host RRDHOST_FLAG_METADATA_UPDATE flag * flush the replication buffer every 1000 times the circular buffer is found empty * dont take timestamp too frequently in sender loop * sender buffers are not used by the same thread as the sender, so they were never recreated - fixed it * free sender thread buffer on replication threads when replication is idle * use the last sender flag as a timestamp of the last buffer recreation * free cbuffer before reconnecting * recreate cbuffer on every flush * timings for journal v2 loading * inlining of metric and cache functions * aral likely/unlikely * free left-over thread buffers * fix NULL pointer dereference in replication * free sender thread buffer on sender thread too * mark ctx as used before flushing * better logging on ctx datafiles closing Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
2023-01-14More 32bit fixes (#14264)Costa Tsaousis
* query planer weight calculation using long long * adjust replication query ahead pipeline for smaller systems * do not generate huge replication messages * add message to indicate replication message was interrupted * improved message * max replication size 25% of sender buffer * fix for last commit * use less cache and smaller page sizes and fewer threads on 32-bits * fix reserved libuv workers for 32bits * fix detection of 32/64 bit
2023-01-13DBENGINE v2 - improvements 2 (#14257)Costa Tsaousis
* allow extents to be merged for as long as possible * do not block the event loop while recalculating retention due to datafile rotation * buffers are incrementally cleaned up, every second, by just 1 entry * fix order of commands * remove newline * measure cancelled extent read requests * count all cancelled extent requests * do not double count failed pages * fixed cancelled name * Fix error and warnings when compiling with --disable-dbengine * when the timeframe is outside retention and whole query should fail * do not mark as failed pages that have been loaded but have been skipped * added chart to show cache memory calculation variables * LONG_MAX for 32-bit compatibility * fix cache size calculation on 32-bit * fix cache size calculation on 32-bit - use unsinged long long * fix compilation warnings on 32-bits * fix another compilation warning on 32-bits * fix compilation warnings on older 32-bit compilers * fix compilation warnings on older 32-bit compilers - more of them * disable ML threads joining Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
2023-01-12DBENGINE v2 - improvements part 1 (#14251)Costa Tsaousis
* allow running multiple evictors and flushers * flipped aggressive and critical evictions * dont run more than 1 evictor * switch to batch evictions when the size of the cache is critical * remove batching of evictions * dedup extent load pending requests * accounting for merged extents * always use double linked list * add extent merging to the overall cache hit ratio * support requeuing merged extents to higher priorities * fix function name * query planner now prefers higher tiers even when they miss some data at the end, which it fills from lower tiers; adding the option "plan" to jsonwrap now renders the query plan * update statistics after every dimension completes * use the retention of all tiers to calculate coverage per tier * use the original window of the query for the planner * give 2.5% befenit for each higher tier * update cmd->priority so that it be requeued multiple times * merged extent pages is a cache hit * fixed dbegnine cache hit stats
2023-01-12allow the cache to grow when huge queries are running that exceed the cache ↵Costa Tsaousis
size (#14247) * allow the cache to grow when huge queries are running that exceed the size of the cache * queue preloading of queries * finalize prepared queries on timeout