* Consistent naming of STORAGE_INSTANCE instances.
Replace usages of `db_instance` and `instance` with
`si`.
* Rename array `storage_metrics_groups[tier]` to `smg[tier]`
* Rename db_metric_handle to smh
* Rename instances of `storage_engine_query_handle` to `seqh`.
* Rename instances of STORAGE_ENGINE_BACKEND to `seb`.
* Rename instances of STORAGE_COLLECT_HANDLE to `sch`.
|
|
* track the progress of queries
* add query_progress in libnetdata Makefile.am
* add acl, response size and response code to the tracking
* define the required functions
* fix the last commit
* added /api/v2/progress?transaction=ID to report the progress of queries
* added function to report netdata-queries
* track hashtable additions
* when reusing a transaction, maintain the counter
* keep track of linked and indexing
* added X-Forwarded-Host and X-Forwarded-For to logs. X-Forwarded-For is also added in progress tracking
* report compact uuids to match logs; register the actual duration of the transaction
* added rowOptions to function; now web_client keeps track of whether it tracks progress or not
* add http request method to progress
* add tags per function; /api/vX/functions is no longer protected
* compact the sanitization array
* split pluginsd_parser into multiple files
* cleanup keyword definitions
* code cleanup
* extracted rrd_collector to separate files
* added http access level to functions
* renamed access "all" to "any"
* implemented optional protection on functions
* add priority to functions, to allow the UI to select the best function (lower priority) when the user has not selected a function
* added progress report from the plugins to netdata and from children to parents - untested
* added progress reporting in systemd-journal
* query timeout is now handled by evloop for external plugins
* propagate progress reports to children and plugins
* fix codeql warning
* adapt to cmake
* minor changes
* extend function timeout when progress is received; added streaming capability to propagate progress reports to parents and send progress requests to children
* revert change in dictionary.h
* add log when access level is invalid
* update access level of functions
* added logs when processing progress updates
* log when the deferred response is too big
* comment out sender progress to find the issue
* added missing newline in streaming progress reports
* propagate progress reports to functions
* fix logs
|
|
fixed minor code cleanup warnings
|
|
* cleanup of logging - wip
* first working iteration
* add errno annotator
* replace old logging functions with netdata_logger()
* cleanup
* update error_limit
* fix remaining error_limit references
* work on fatal()
* started working on structured logs
* full cleanup
* default logging to files; fix all plugins initialization
* fix formatting of numbers
* cleanup and reorg
* fix coverity issues
* cleanup obsolete code
* fix formatting of numbers
* fix log rotation
* fix for older systems
* add detection of systemd journal via stderr
* finished on access.log
* remove left-over transport
* do not add empty fields to the logs
* journal gets compact uuids; X-Transaction-ID header is added in web responses
* allow compiling on systems without memfd sealing
* added libnetdata/uuid directory
* move datetime formatters to libnetdata
* add missing files
* link the makefiles in libnetdata
* added uuid_parse_flexi() to parse UUIDs with and without hyphens; the web server now reads X-Transaction-ID and uses it for functions and web responses
* added stream receiver, sender, proc plugin and pluginsd log stack
* iso8601 advanced usage; line_splitter module in libnetdata; code cleanup
* add message ids to streaming inbound and outbound connections
* cleanup line_splitter between lines to avoid logging garbage; when killing children, kill them with SIGABRT if internal checks is enabled
* send SIGABRT to external plugins only if we are not shutting down
* fix cross cleanup in pluginsd parser
* fatal when there is a stack error in logs
* compile netdata with -fexceptions
* do not kill external plugins with SIGABRT
* metasync info logs to debug level
* added severity to logs
* added json output; added options per log output; added documentation; fixed issues mentioned
* allow memfd only on linux
* moved journal low level functions to journal.c/h
* move health logs to daemon.log with proper priorities
* fixed a couple of bugs; health log in journal
* updated docs
* systemd-cat-native command to push structured logs to journal from the command line
* fix makefiles
* restored NETDATA_LOG_SEVERITY_LEVEL
* fix makefiles
* systemd-cat-native can also work as the logger of Netdata scripts
* do not require a socket to systemd-journal to log-as-netdata
* alarm notify logs in native format
* properly compare log ids
* fatals log alerts; alarm-notify.sh working
* fix overflow warning
* alarm-notify.sh now logs the request (command line)
* annotate external plugin logs with the function cmd they run
* added context, component and type to alarm-notify.sh; shell sanitization removes control characters and characters that may be expanded by bash
* reformatted alarm-notify logs
* unify cgroup-network-helper.sh
* added quotes around params
* charts.d.plugin switched logging to journal native
* quotes for logfmt
* unify the status codes of streaming receivers and senders
* alarm-notify: don't log anything if there is nothing to do
* all external plugins log to stderr when running outside netdata; alarm-notify now shows an error when notification methods are needed but are not available
* migrate cgroup-name.sh to new logging
* systemd-cat-native now supports messages with newlines
* socket.c logs use priority
* cleanup log field types
* inherit the INVOCATION_ID set by systemd, if found
* allow systemd-cat-native to send messages to a systemd-journal-remote URL
* log2journal command that can convert structured logs to journal export format
* various fixes and documentation of log2journal
* updated log2journal docs
* updated log2journal docs
* updated documentation of fields
* allow compiling without libcurl
* do not use socket as format string
* added version information to newly added tools
* updated documentation and help messages
* fix the namespace socket path
* print errno with error
* do not timeout
* updated docs
* updated docs
* updated docs
* log2journal updated docs and params
* when talking to a remote journal, systemd-cat-native batches the messages
* enable lz4 compression for systemd-cat-native when sending messages to a systemd-journal-remote
* Revert "enable lz4 compression for systemd-cat-native when sending messages to a systemd-journal-remote"
This reverts commit b079d53c11f6687cd64d804fdd7b24c0492bf245.
* note about uncompressed traffic
* log2journal: code reorg and cleanup to make modular
* finished rewriting log2journal
* more comments
* rewriting rules support
* increased limits
* updated docs
* updated docs
* fix old log call
* use journal only when stderr is connected to journal
* update netdata.spec for libcurl, libpcre2 and log2journal
* pcre2-devel
* do not require pcre2 in centos < 8, amazonlinux < 2023, open suse
* log2journal only on systems where pcre2 is available
* ignore log2journal in .gitignore
* avoid log2journal on centos 7, amazonlinux 2 and opensuse
* add pcre2-8 to static build
* undo last commit
* Bundle to static
Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud>
* Add build deps for deb packages
Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud>
* Add dependencies; build from source
Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud>
* Test build for Amazon Linux and CentOS; expect it to fail for SUSE
Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud>
* fix minor oversight
Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud>
* Reorg code
* Add the install from source (deps) as a TODO
* Do not enable the build on the SUSE ecosystem
Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud>
---------
Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud>
Co-authored-by: Tasos Katsoulas <tasos@netdata.cloud>
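The `uuid_parse_flexi()` entry above describes accepting UUIDs both with and without hyphens. A minimal sketch of that idea in C, assuming that a compact UUID is simply the 32 hex digits of the canonical form with the hyphens dropped; `parse_uuid_flexi_sketch()` is a hypothetical name and this is not Netdata's actual implementation:

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: value of one hex digit, or -1 if not a hex digit. */
static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    c = (char)tolower((unsigned char)c);
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Hypothetical sketch (not Netdata's uuid_parse_flexi()): parse a UUID written
 * either as "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" or as 32 compact hex digits.
 * Hyphens are skipped wherever they appear; exactly 32 hex digits are required.
 * Returns 0 on success, -1 on malformed input. */
int parse_uuid_flexi_sketch(const char *in, unsigned char out[16]) {
    size_t digits = 0;
    for (const char *p = in; *p; p++) {
        if (*p == '-') continue;                 /* accept hyphens anywhere */
        int v = hex_value(*p);
        if (v < 0 || digits >= 32) return -1;    /* bad char or too many digits */
        if (digits % 2 == 0) out[digits / 2] = (unsigned char)(v << 4);
        else out[digits / 2] |= (unsigned char)v;
        digits++;
    }
    return digits == 32 ? 0 : -1;
}
```

This sketch is deliberately lenient and skips hyphens in any position; a stricter parser would only accept them at the canonical 8-4-4-4-12 boundaries.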
|
|
use tiers only when they have useful data
|
|
provide a relative_to_absolute function that does not touch the current realtime time
|
|
if the db does not have enough data to satisfy a query, cancel it
|
|
This reverts commit 440bd51e08fdfa2a4daa191fb68643456028a753.
dbengine was still being used for non-zero tiers
even on non-dbengine modes.
|
|
* Storage engine.
* Host indexes to rrdb
* Move globals to rrdb
* Move storage_tiers_backfill to rrdb
* default_rrd_update_every to rrdb
* default_rrd_history_entries to rrdb
* gap_when_lost_iterations_above to rrdb
* rrdset_free_obsolete_time_s to rrdb
* libuv_worker_threads to rrdb
* ieee754_doubles to rrdb
* rrdhost_free_orphan_time_s to rrdb
* rrd_rwlock to rrdb
* localhost to rrdb
* rm extern from func decls
* mv rrd macro under rrd.h
* default_rrdeng_page_cache_mb to rrdb
* default_rrdeng_extent_cache_mb to rrdb
* db_engine_journal_check to rrdb
* default_rrdeng_disk_quota_mb to rrdb
* default_multidb_disk_quota_mb to rrdb
* multidb_ctx to rrdb
* page_type_size to rrdb
* tier_page_size to rrdb
* No storage_engine_id in rrdim functions
* storage_engine_id is provided by st
* Update to fix merge conflict.
* Update field name
* Remove unnecessary macros from rrd.h
* Rm unused type decls
* Rm duplicate func decls
* make internal function static
* Make the rest of public dbengine funcs accept a storage_instance.
* No more rrdengine_instance :)
* rm rrdset_debug from rrd.h
* Use rrdb to access globals in ML and ACLK
Missed due to not having the submodules in the
worktree.
* rm total_number
* rm RRDVAR_TYPE_TOTAL
* rm unused inline
* Rm names from typedef'd enums
* rm unused header include
* Move include
* Rm unused header include
* s/rrdhost_find_or_create/rrdhost_get_or_create/g
* s/find_host_by_node_id/rrdhost_find_by_node_id/
Also, remove duplicate definition in rrdcontext.c
* rm macro used only once
* rm macro used only once
* Reduce rrd.h api by moving funcs into a collector specific utils header
* Remove unused func
* Move parser specific function out of rrd.h
* return storage_number instead of void pointer
* move code related to rrd initialization out of rrdhost.c
* Remove tier_grouping from rrdim_tier
Saves 8 * storage_tiers bytes per dimension.
* Fix rebase
* s/rrd_update_every/update_every/
* Mark functions as static and constify args
* Add license notes and file to build systems.
* Remove remaining non-log/config mentions of memory mode
* Move rrdlabels api to separate file.
Also, move localhost functions that load
labels outside of database/ and into daemon/
* Remove function decl in rrd.h
* merge rrdhost_cache_dir_for_rrdset_alloc into rrdset_cache_dir
* Do not expose internal function from rrd.h
* Rm NETDATA_RRD_INTERNALS
Only one function decl is covered. We have more
database internal functions that we currently
expose for no good reason. These will be placed
in a separate internal header in follow up PRs.
* Add license note
* Include libnetdata.h instead of aral.h
* Use rrdb to access localhost
* Fix builds without dbengine
* Add header to build system files
* Add rrdlabels.h to build systems
* Move func def from rrd.h to rrdhost.c
* Fix macos build
* Rm non-existing function
* Rebase master
* Define buffer length macro in ad_charts.
* Fix FreeBSD builds.
* Mark functions static
* Rm func decls without definitions
* Rebase master
* Rebase master
* Properly initialize value of storage tiers.
* Fix build after rebase.
|
|
* fix the calculation of incremental-sum
* for query planning use at least 400 points
|
|
* use gperf for the pluginsd parser
* simplify pluginsd_parser by removing void pointers to user
* pluginsd_split_words() with inlined pluginsd_space()
* quoted_string_splitter() now uses a map instead of a function for determining spaces
* add stress test for pluginsd parser
* optimized BITMAP256
* optimized rrdpush receiver reception
* optimized rrdpush sender compression
* renames and cleanup
* remove wrong negation
* unify handshake and disconnection reasons
* use parser_find_keyword
* register job names only for the current repertoire
|
|
* remove rd->update_every
* reduce amount of memory for RRDDIM
* reorganize rrddim->db entries
* optimize rrdset and statsd
* optimize dictionaries
* RW_SPINLOCK for dictionaries
* fix codeql warning
* rw_spinlock improvements
* remove obsolete assertion
* fix crash on health_alarm_log_process()
* use RW_SPINLOCK for AVL trees
* add RW_SPINLOCK read/write trylock
* pgc and mrg now use rw_spinlocks; cache line optimizations for mrg
* thread tag of dbengine init
* append created datafile, lockless
* make DOUBLE_LINKED_LIST_APPEND_ITEM_UNSAFE friendly for lockless use
* thread cancelability in spinlocks; optimize thread cancelability management
* introduce a JudyL to index datafiles and use it during queries to quickly find the relevant files
* use the last timestamp of each journal file for indexing
* when the previous cannot be found, start from the beginning
* add more stats to PDC to make routing easier to trace
* rename spinlock functions
* fix for spinlock renames
* revert statsd socket statistics to size_t
* turn fatal into internal_fatal()
* show candidates always
* show connected status and connection attempts
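The JudyL index of datafiles described above lets a query jump straight to the first file that can contain its data, using each file's last timestamp as the key. The sketch below illustrates the same lookup with a sorted array and a binary search instead of JudyL (a swapped-in technique, not Netdata's code); the function name and the array layout are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: datafiles are kept in an array sorted by the last
 * timestamp each one contains (ascending). Returns the index of the first
 * datafile that may hold data at or after `after_s`, or `count` when every
 * file ends before the query window starts. */
size_t first_relevant_datafile(const int64_t *last_time_s, size_t count, int64_t after_s) {
    size_t lo = 0, hi = count;          /* search in [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (last_time_s[mid] < after_s)
            lo = mid + 1;               /* file ends before the window: go right */
        else
            hi = mid;                   /* candidate; keep looking left */
    }
    return lo;
}
```

With JudyL, the equivalent operation is a "first index greater than or equal to the key" search; the array version only shows the ordering logic.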
|
|
stale plugins; streaming improvements (#15113)
* add information about streaming connections to /api/v2/nodes; reset defer time when sender or receivers connect or disconnect
* make each streaming destination respect its SSL settings
* do not send SSL traffic over non-SSL connections
* keep track of outgoing streaming connection attempts
* retry SSL reads when SSL_read() returns SSL_ERROR_WANT_READ
* Revert "retry SSL reads when SSL_read() returns SSL_ERROR_WANT_READ"
This reverts commit 14c858677c6f2d3b08c94f298e2f45ecdb74c801.
* cleanup SSL connections properly
* initialize SSL in rpt before takeover
* sender should free SSL when talking to a non-SSL destination
* do not shutdown SSL when receiver exits
* restore operation of SIGCHLD when the reaper is not enabled
* create an fgets function that checks for data and times out
* work on error handling of plugins exiting
* remove newlines from logs
* global call to waitid(), caching the result for netdata_pclose() to process
* receiver tid
* parser times out in 2 minutes instead of 10
* fix crash when UUID is NULL in SQLite
* abstract sqlite3 parsing for uuid and text
* write proper ssl errors on read and write
* fix for SSL_ERROR_WANT_RETRY_VERIFY
* SSL WANT per function
* unified SSL error logging
* fix compilation warning
* additional logging about parser cleanup
* streaming parser should call the pluginsd parser cleanup
* SSL error handling work
* SSL initialization unification
* check for pending data when receiving SSL response with timeout
* macro to check if an SSL connection has been established
* remove SSL_pending()
* check for SSL macros
* use SSL_peek() to find if there is a response
* SSL renames
* more SSL renames & cleanup
* rrdpush ssl connection function
* abstract all SSL functions into security.c
* keep track of SSL connections and always attempt to use SSL read/write when on SSL connection
* signal openssl to skip certificate validation when configured to do so
* better SSL error handling and logging
* SSL code cleanup
* SSL retry on SSL_connect and SSL_accept
* SSL provide default return value for old compilers
* SSL read/write functions emulate system read/write functions
* fix receive/send timeout and switch from SSL_peek() to SSL_pending()
* remove SSL_pending()
* removed sender auto-retry and debug info for initial receiver response
* ssl skip certificate verification config for web server
* ssl errors log ip and port of the peer
* keep ssl with web_client for its whole lifetime
* thread safe socket peers to text
* use error_limit() for common ssl errors
* cleanup
* more cleanup
* coverity fixes
* ssl error logs include both local and remote ip/port info
* remove obsolete code
|
|
compatibility (#15126)
* percentage of group is now aggregatable at cloud across multiple nodes
* do not break backwards compatibility with percentage-of-instance
* calculate the percentage when percentage-of-instance is requested
* increase capability version
|
|
nodes" (#15122)
Revert "percentage of group is now aggregatable at cloud across multiple nodes (#15109)"
This reverts commit 44b6c223b3e13774df45a96dd48588aa8a66ba42.
|
|
fix uninitialized array vh
|
|
allow aggregation=percentage to calculate the percentage over any grouping
|
|
* initial webrtc setup
* missing files
* rewrite of webrtc integration
* initialization and cleanup of webrtc connections
* make it compile without libdatachannel
* add missing webrtc_initialize() function when webrtc is not enabled
* make c++17 optional
* add build/m4/ax_compiler_vendor.m4
* add ax_cxx_compile_stdcxx.m4
* added new m4 files to makefile.am
* id all webrtc connections
* show warning when webrtc is disabled
* fixed message
* moved all webrtc error checking inside webrtc.cpp
* working webrtc connection establishment and cleanup
* remove obsolete code
* rewrote webrtc code in C to remove the dependency on c++17
* fixed left-over reference
* detect binary and text messages
* minor fix
* naming of webrtc threads
* added webrtc configuration
* fix for thread_get_name_np()
* smaller web_client memory footprint
* universal web clients cache
* free web clients every 100 uses
* webrtc is now enabled by default only when compiled with internal checks
* webrtc responses to /api/ requests, including LZ4 compression
* fix for binary and text messages
* web_client_cache is now global
* unification of the internal web server API, for web requests, aclk requests, webrtc requests
* more cleanup and unification of web client timings
* fixed compiler warnings
* update sent and received bytes
* eliminated almost all big buffers in web client
* registry now uses the new json generation
* cookies are now an array; fixed redirects
* fix redirects, again
* write cookies directly to the header buffer, eliminating the need for cookie structures in web client
* reset the has_cookies flag
* gathered all web client cleanup into one function
* fixes redirects
* added summary.globals in /api/v2/data response
* ars to arc in /api/v2/data
* properly handle host impersonation
* set the context of mem.numa_nodes
|
|
* configure extent cache size
* workers can now execute up to 10 jobs in a run, boosting query prep and extent reads
* fix dispatched and executing counters
* boost to the max
* increase libuv worker threads
* query prep always gets more priority than extent reads; stop processing in batch when the dbengine queue is critical
* fix accounting of query prep
* inlining of time-grouping functions, to speed up queries with billions of points
* make switching based on a local const variable
* print one pending contexts loading message per iteration
* inlined storage engine query API
* inlined storage engine data collection api
* inlined all storage engine query ops
* eliminate and inline data collection ops
* simplified query group-by
* more error handling
* optimized partial trimming of group-by queries
* preparative work to support multiple passes of group-by
* more preparative work to support multiple passes of group-by (accepts multiple group-by params)
* unified query timings
* unified query timings - weights endpoint
* query target is no longer a static thread variable - there is a list of cached query targets, each of which is freed every 1000 queries
* fix query memory accounting
* added summary.dimension[].pri and sorted summary.dimensions based on priority and then name
* limit max ACLK WEB response size to 30MB
* the response type should be text/plain
* more preparative work for multiple group-by passes
* create functions for generating group by keys, ids and names
* multiple group-by passes are now supported
* parse group-by options array also with an index
* implemented percentage-of-instance group by function
* family is now merged in multi-node contexts
* prevent uninitialized use
|
|
* /api/v2/weights, points key renamed to result
* /api/v2/weights, add node ids in response
* /api/v2/data remove NONZERO flag when all dimensions are zero and fix MIN/MAX grouping and statistics
* /api/v2/data expose view.dimensions.sts{}
* /api/v2 endpoints expose agents and additional info per node, that is needed to unify cloud responses
* /api/v2 nodes output now includes the duration of time spent per node
* jsonwrap view object renames and cleanup
* rework of the statistics returned by the query engine
* swagger work
* swagger work
* more swagger work
* updated swagger json
* added the remaining of the /api/v2 endpoints to swagger
* point.ar has been renamed point.arp
* updated weights endpoint
* fix compilation warnings
|
|
* query timestamps are now pre-determined and alignment on timestamps is guaranteed
* turn internal_fatal() to internal_error() to investigate the issue
* handle query when no data exist in the db
* check for non NULL dict when running dictionary garbage collect
* support API v2 requests via ACLK
* add nodes detailed information to /api/v2/nodes
* fixed keys and added dummy nodes for completeness
* added nodes_hard_hash, alerts_hard_hash, alerts_soft_hash; started building a nodes status object to reflect the current status of a node
* make sure replication does not double count charts that are already being replicated
* expose min and max in sts structures
* added view_minimum_value and view_maximum_value; percentage calculation is now an additional pass on the data, removed from formatters; absolute value calculation is now done at the query level, removed from formatters
* respect trimming in percentage calculation; updated swagger
* api/v2/weights preparative work to support multi-node queries - still single node though
* multi-node /api/v2/weights endpoint, supporting all the filtering parameters of /api/v2/data
* when passing the raw option, the query exposes the hidden dimensions
* fix compilation issues on older systems
* the query engine now calculates per dimension min, max, sum, count, anomaly count
* use the macro to calculate storage point anomaly rate
* weights endpoint exposing version hashes
* weights method=value shows min, max, average, sum, count, anomaly count, anomaly rate
* query: expose RESET flag; do not add the same point multiple times to the aggregated point
* weights: more compact output
* weights requests can be interrupted
* all /api/v2 requests can be interrupted and timeout
* allow relative timestamps in weights
* fix macos compilation warnings
* Revert "fix macos compilation warnings"
This reverts commit 8a1d24e41e9b58de566ac59f0c4b1c465bcc0592.
* /api/v2/data group-by now works on dimension names, not ids
* /api/v2/weights does not query metrics without retention and new output format
* /api/v2/weights value and anomaly queries do context queries when contexts are filtered; query timeout is now always in ms
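The first entry of this commit states that query timestamps are pre-determined and aligned. One common way to get that property, shown here as an assumption rather than Netdata's exact rule, is to snap the query window to multiples of the point duration before running the query; the function name is hypothetical:

```c
#include <stdint.h>

/* Hypothetical sketch (not Netdata's query planner): snap a query window to
 * multiples of `step` seconds, so point timestamps are known before the query
 * runs and line up across queries. `before` is rounded up, `after` is rounded
 * down; timestamps are assumed non-negative. Returns the number of points of
 * width `step`, or 0 for a non-positive step. */
int64_t align_query_window_sketch(int64_t *after, int64_t *before, int64_t step) {
    if (step <= 0) return 0;
    if (*before % step) *before += step - (*before % step);  /* round up   */
    *after -= *after % step;                                 /* round down */
    return (*before - *after) / step;
}
```

For example, a 1001..1599 window with 60-second points becomes 960..1620, i.e. 11 points whose boundaries are all multiples of 60.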
|
|
* expose the order of group by
* key renames in json wrapper v2
* added group by context and group by units
* added view_average_values
* fix for view_average_values when percentage is specified
* option group-by-labels enables the exposure of all the labels that are used for each of the final grouped dimensions
* when executing group by queries, allocate one dimension data at a time - not all of them
* respect hidden dimensions
* cancel running data query on socket error
* use poll to detect socket errors
* use POLLRDHUP to detect half closed connections
* make sure POLLRDHUP is available
* do not destroy aral-by-size arals
* completed documentation of /api/v2/data.
* moved min, max back to view; updated swagger yaml and json
* default format for /api/v2/data is json2
|
|
* max web request size to 64KB
* fix the request too big message
* increase max request reading tries to 100
* support for bigger web requests
* add "avg" as a shortcut for "average" to both group by aggregation and time aggregation; discard the last partial points of a query in play mode, up to max update every; group by hidden dimensions too
* better implementation for partial data trimming
* added group_by=selected to return only one dimension for all selected metrics
* fix acceptance of group_by=selected
* passing option "raw" disables partial data trimming
* remove obsolete option "plan"; use "debug"
* fix view.min and view.max calculation - there were 2 bugs: a) min and max were reset for every row and b) min and max were corrupted by GBC and AR printing
* per row annotations
* added time column to point annotations
* disable caching for /api/v2/contexts responses
* added api format json2 that returns an array for each point, having all the point values and annotations in them
* work on swagger about /api/v2
* prevent infinite loop
* cleanup and swagger work
* allow negative simple pattern expressions to work as expected
* do not lookup in the dictionary empty names
* garbage collect dictionaries
* make query_target allocate less aggressively; queries fill the remaining points with nulls
* reusable query ops to save memory on huge queries
* move parts of query plans into query ops to save query target memory
* remove storage engine from query metric tiers, to save memory, and recalculate it when it is needed
|
|
* preparation for /api/v2/contexts
* working /api/v2/contexts
* add anomaly rate information in all statistics; when sum-count is requested, return sums and counts instead of averages
* minor fix
* query target now accurately counts hosts, contexts, instances, dimensions, metrics
* cleanup /api/v2/contexts
* full text search with /api/v2/contexts
* simple patterns now support the option to search ignoring case
* full text search API with /api/v2/q
* simple pattern execution optimization
* do not show q when not given
* full text search accounting
* separated /api/v2/nodes from /api/v2/contexts
* fix ssv queries for group_by
* count query instances queried and failed per context and host
* split rrdcontext.c to multiple files
* add query totals
* fix anomaly rate calculation; provide "ni" for indexing hosts
* do not generate zero valued members
* faster calculation of anomaly rate, by just summing integers for each db point and doing the math once for every generated point
* fix typo when printing dimensions totals
* added option minify to remove spaces and newlines from JSON output
* send instance ids and names when they differ
* do not add in query target dimensions, instances, contexts and hosts for which there is no retention in the current timeframe
* fix for the previous + renames and code cleanup
* when a dimension is filtered, include in the response all the other dimensions that are selectable
* do not add nodes that do not have retention in the current window
* move selection of dimensions to query_dimension_add(), instead of query_metric_add()
* increase the pre-processing capacity of queries
* generate instance fqdn ids and names only when they are needed
* provide detailed statistics about tiers retention, queries, points, update_every
* late allocation of query dimensions
* cleanup
* more cleanup
* support for annotations per displayed point, RESET and PARTIAL
* new type annotations
* if a chart is not linked to contexts and it is collected, link it when it is collected
* make ML run reentrant
* make ML rrdr query synchronous
* optimize replication memory allocation of replication_sort_entry
* change units to percentage when requesting a coefficient of variation or a percentage query
* initialize replication before starting main threads
* properly decrement no room requests counter
* propagate the non-zero flag to group-by
* the same by avoiding the extra loop
* respect non-zero in all dimension arrays
* remove dictionary garbage collection from dictionary_entries() and dictionary_version()
* be more verbose when jv2 indexing is postponed
* prevent infinite loop
* use hidden dimensions even when dimensions pattern is unset
* traverse hosts using dictionaries
* fix dictionary unittests
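The anomaly-rate entry above describes summing integers for each db point and doing the math once per generated point. A minimal sketch of that accumulation, assuming each stored point carries a sample count and an anomalous-sample count (the struct, its field names and the function name are hypothetical, not Netdata's storage point layout):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical storage point: how many samples it aggregates and how many of
 * those were flagged anomalous. Field names are illustrative only. */
typedef struct {
    uint32_t count;
    uint32_t anomaly_count;
} storage_point_sketch;

/* Aggregate the db points that fall into one output point: only integer
 * additions inside the loop, a single floating-point division at the end. */
double anomaly_rate_percent(const storage_point_sketch *points, size_t n) {
    uint64_t total = 0, anomalous = 0;
    for (size_t i = 0; i < n; i++) {
        total += points[i].count;
        anomalous += points[i].anomaly_count;
    }
    if (total == 0) return 0.0;   /* no samples: report 0% rather than divide by zero */
    return 100.0 * (double)anomalous / (double)total;
}
```

Keeping the inner loop to integer additions avoids a floating-point division per db point; only the final rate of each output point needs one.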
|
|
* fundamentals for having /api/v2/ working
* use an atomic to prevent writing to internal pipe too much
* first attempt of multi-node, multi-context, multi-chart, multi-dimension queries
* v2 jsonwrap
* first attempt for group by
* cleaned up RRDR and fixed group by
* improvements to /api/v2/api
* query instance may be realloced, so pointers to it get invalid; solved memory leaks
* count of queried metrics in summary information
* provide detailed information about selected, excluded, queried and failed metrics for each entity
* select instances by fqdn too
* add timing information to json output
* link charts to rrdcontexts, if a query comes in and it is found unlinked
* calculate min, max, sum, average, volume, count per metric
* api v2 parameters naming
* renders alerts and units
* render machine_guid and node_id in all sections it is relevant
* unified keys
* group by now takes into account units and when there are multiple units involved, it creates a dimension per unit
* request and detailed are hidden behind an option
* summary includes only a flattened list of alerts
* alert counts per host and instance
* count of grouped metrics per dimension
* added contexts to summary
* added chart title
* added dimension priorities and chart type
* support for multiple group by at the same time
* minor fixes
* labels are now a tree
* keys uniformity
* filtering by alerts, both having a specific alert and having a specific alert in a specific status
* added scope of hosts and contexts
* count of instances on contexts and hosts
* make the api return valid responses even when the response contains no data
* calculate average and contribution % for every item in the summary
* fix compilation warnings
* fix compilation warnings - again
|
|
optimization (#14493)
* first work on standardizing json formatting
* renamed old grouping to time_grouping and added group_by
* add dummy functions to enable compilation
* buffer json api work
* jsonwrap opening with buffer_json_X() functions
* cleanup
* storage for quotes
* optimize buffer printing for both numbers and strings
* removed ; from define
* contexts json generation using the new json functions
* fix buffer overflow at unit test
* weights endpoint using new json api
* fixes to weights endpoint
* check buffer overflow on all buffer functions
* do synchronous queries for weights
* buffer_flush() now resets json state too
* content type typedef
* print double values that are above the max 64-bit value
* str2ndd() can now parse values above UINT64_MAX
* faster number parsing by avoiding double calculations as much as possible
* faster number parsing
* faster hex parsing
* accurate printing and parsing of double values, even for very large numbers that cannot fit in 64bit integers
* full printing and parsing without using library functions - and related unit tests
* added IEEE754 streaming capability to enable streaming of double values in hex
* streaming and replication to transfer all values in hex
* use our own str2ndd for set2
* remove subnormal check from ieee
* base64 encoding for numbers, instead of hex
* when increasing double precision, also make sure the fractional number printed is aligned to the wanted precision
* str2ndd_encoded() parses all encoding formats, including integers
* prevent uninitialized use
* /api/v1/info using the new json API
* Fix error when compiling with --disable-ml
* Remove redundant 'buffer_unittest' declaration
* Fix formatting
* Fix formatting
* Fix formatting
* fix buffer unit test
* apps.plugin using the new JSON API
* make sure the metrics registry does not accept negative timestamps
* do not allow pages with negative timestamps to be loaded from db files; do not accept pages with negative timestamps in the cache
* Fix more formatting
---------
Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
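The IEEE754 entries above stream doubles in hex so that the receiving side reconstructs bit-identical values (later entries switch the encoding to base64). A minimal sketch of the hex variant, assuming `double` is a 64-bit IEEE 754 binary64 value; the function names and the plain 16-digit format are hypothetical and not Netdata's wire format:

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: encode a double as the 16-digit hex form of its IEEE 754
 * bit pattern, so the receiver reconstructs exactly the same value (no decimal
 * rounding). buf must hold at least 17 bytes. */
void double_to_hex_sketch(double v, char buf[17]) {
    uint64_t bits;
    memcpy(&bits, &v, sizeof(bits));          /* copy the bits without aliasing issues */
    snprintf(buf, 17, "%016" PRIx64, bits);
}

/* Inverse of double_to_hex_sketch(): parse the hex digits back into the bits. */
double hex_to_double_sketch(const char *s) {
    uint64_t bits = (uint64_t)strtoull(s, NULL, 16);
    double v;
    memcpy(&v, &bits, sizeof(v));
    return v;
}
```

A decimal `%f`/`%g` print has to pick a precision and may round; copying the raw bits keeps the round trip exact, including for values above `UINT64_MAX` such as `1e300`.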
|
|
* do not report dimensions that failed to be queried
* renamed SELECTED to QUERIED to have clarity on what it means
* fix wrong placement of continue
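The rule in this commit (report only dimensions that were queried and did not fail) can be sketched as a flag filter. The flag and function names below are hypothetical and only mirror the wording of the commit message. Note that each `continue` must come after its own check; placing it wrongly is the kind of bug the last bullet mentions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-dimension flags, modeled on the commit message. */
typedef unsigned dim_flags_t;
enum {
    DIM_QUERIED = 1u << 0, /* a query was attempted for this dimension */
    DIM_FAILED  = 1u << 1, /* the query for this dimension failed */
};

/* Store the indexes of dimensions that should appear in the response
 * into `out` and return how many were stored. */
static size_t reportable_dimensions(const dim_flags_t *flags, size_t n, size_t *out) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (!(flags[i] & DIM_QUERIED))
            continue; /* never queried: nothing to report */
        if (flags[i] & DIM_FAILED)
            continue; /* queried but failed: do not report it */
        out[count++] = i;
    }
    return count;
}
```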
|
|
remove equality
|
|
* acquiring / releasing interface for metrics
* metrics registry statistics
* cleanup metrics registry by deleting metrics when they don't have retention anymore; do not double copy the data of pages to be flushed
* print the tier in retention summary
* Open files with buffered instead of direct I/O (test)
* added more metrics stats and fixed unittest
* rename writer functions to avoid confusion with refcounting
* do not release a metric that is not acquired
* Revert to use direct I/O on write -- use direct I/O on read as well
* keep track of ARAL overhead and add it to the memory chart
* aral full check via api
* Cleanup
* give names to ARALs and PGCs
* aral improvements
* restore query expansion to the future
* prefer higher resolution tier when switching plans
* added extent read statistics
* smoother joining of tiers at query engine
* fine tune aral max allocation size
* aral restructuring to hide its internals from the rest of netdata
* aral restructuring; addition of defrag option to aral to keep the linked list sorted - enabled by default to test it
* fully async aral
* some statistics and cleanup
* fix infinite loop while calculating retention
* aral docs and defragmenting disabled by default
* fix bug and add optimization when defragmenter is not enabled
* aral stress test
* aral speed report and documentation
* added internal checks that all pages are full
* improve internal log about metrics deletion
* metrics registry uses one aral per partition
* metrics registry aral max size to 512 elements per page
* remove data_structures/README.md dependency
---------
Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
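ARAL's internals are not shown in this log. As a rough illustration of the general idea behind an array allocator (fixed-size elements carved out of larger pages, with freed elements reused), here is a minimal single-threaded pool. All names are hypothetical. The real ARAL is described above as fully async, with optional defragmentation, per-page element limits and statistics, none of which is modeled here.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical minimal fixed-size element pool; not Netdata's ARAL. */
typedef struct page {
    struct page *next;    /* next page owned by the pool */
    unsigned char data[]; /* element storage */
} page_t;

typedef struct {
    size_t elem_size;      /* element size, rounded up to pointer size */
    size_t elems_per_page; /* elements carved from each page */
    page_t *pages;         /* all pages, so they can be released */
    void *free_list;       /* singly linked list of free elements */
} pool_t;

static void pool_init(pool_t *p, size_t elem_size, size_t elems_per_page) {
    assert(elems_per_page > 0);
    size_t a = sizeof(void *);
    if (elem_size < a) elem_size = a;      /* must hold a free-list pointer */
    elem_size = (elem_size + a - 1) / a * a; /* keep elements pointer-aligned */
    p->elem_size = elem_size;
    p->elems_per_page = elems_per_page;
    p->pages = NULL;
    p->free_list = NULL;
}

/* Allocate one page and push all of its elements onto the free list. */
static int pool_grow(pool_t *p) {
    page_t *pg = malloc(sizeof(page_t) + p->elem_size * p->elems_per_page);
    if (!pg) return -1;
    pg->next = p->pages;
    p->pages = pg;
    for (size_t i = 0; i < p->elems_per_page; i++) {
        void *elem = pg->data + i * p->elem_size;
        *(void **)elem = p->free_list;
        p->free_list = elem;
    }
    return 0;
}

static void *pool_alloc(pool_t *p) {
    if (!p->free_list && pool_grow(p) != 0) return NULL;
    void *elem = p->free_list;
    p->free_list = *(void **)elem;
    return elem;
}

static void pool_free(pool_t *p, void *elem) {
    *(void **)elem = p->free_list; /* LIFO reuse of freed elements */
    p->free_list = elem;
}

static void pool_destroy(pool_t *p) {
    while (p->pages) {
        page_t *next = p->pages->next;
        free(p->pages);
        p->pages = next;
    }
    p->free_list = NULL;
}
```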
|
|
* replication cancels pending queries on exit
* log when waiting for inflight queries
* when there are collected and not-collected metrics, use the context priority from the collected only
* Write metadata at a faster pace
* Remove journal file size limit and set sync mode to 0; drop WAL checkpoint for now
* Wrap in a big transaction remaining metadata writes (test 1)
* fix higher tiers when tiering iterations = 2
* dbengine always returns db-aligned points; query engine expands the queries by 2 points in every direction to have enough data for interpolation
* Wrap in a big transaction metadata writes (test 2)
* replication cancelling fix
* do not first and last entry in replication when the db has no retention
* fix internal check condition
* Increase metadata write batch size
* always apply error limit to dbengine logs
* Remove code that processes the obsolete health.db files
* cleanup in query.c
* do not allow queries to go beyond db boundaries
* prevent internal log for +1 delta in timestamp
* detect gap pages in conflicts
* double protection for gap injection in main cache
* Add checkpoint to prevent large WAL while running
* Remove unused and duplicate functions
* do not allocate chart cache dir if not needed
* add more info to unittests
* revert query expansion to satisfy unittests
Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
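The bullets above describe db-aligned points, expanding queries by 2 points in every direction for interpolation, and never letting queries go beyond db boundaries (the log also shows the expansion being reverted to satisfy unit tests). The sketch below combines these ideas in one hypothetical helper. It assumes non-negative timestamps and a single `update_every`; the names are not Netdata's.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical type; not Netdata's query engine API. */
typedef struct {
    time_t after;  /* first timestamp to read */
    time_t before; /* last timestamp to read */
} query_window_t;

/* Align [after, before] to the update_every grid, widen it by `extra_points`
 * on each side so interpolation has neighbours, then clamp it to the db
 * retention [db_first, db_last]. Assumes non-negative timestamps. */
static query_window_t plan_query_window(time_t after, time_t before, time_t update_every,
                                        int extra_points, time_t db_first, time_t db_last) {
    assert(update_every > 0 && after >= 0 && after <= before);
    query_window_t w;

    w.after  = after - after % update_every;                                   /* round down */
    w.before = before + (update_every - before % update_every) % update_every; /* round up */

    w.after  -= (time_t)extra_points * update_every;
    w.before += (time_t)extra_points * update_every;

    if (w.after < db_first) w.after = db_first; /* do not go beyond db boundaries */
    if (w.before > db_last) w.before = db_last;
    return w;
}
```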
|
|
* cache 100 pages for each size our tiers need
* smarter page caching
* account the caching structures
* dynamic max number of cached pages
* make variables const to ensure they are not changed
* make sure replication timestamps do not go to the future
* replication now sends chart and dimension states atomically; the replication receiver ignores chart and dimension states when rbegin is also ignored
* make sure all pages are flushed on shutdown
* take into account empty points too
* when recalculating retention update first_time_s on metrics only when they are bigger
* Report the datafile number we use to recalculate retention
* Report the datafile number we use to recalculate retention
* rotate db at startup
* make query plans overlap
* Properly calculate first_time_s
* updated event labels
* negative page caching fix
* Attempt to create missing tables on query failure
* Attempt to create missing tables on query failure (part 2)
* negative page caching for all gaps, to eliminate jv2 scans
* Fix unittest
Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
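Negative page caching means remembering that a lookup found no page, so repeating the same lookup does not trigger another journal v2 (jv2) scan. A minimal direct-mapped sketch follows. The slot count, key layout and all names are hypothetical, and the real invalidation rules are not described in this log; `gap_forget()` simply models "data for this range was written later".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define GAP_CACHE_SLOTS 64 /* hypothetical fixed size */

typedef struct {
    uint64_t metric_id; /* metric the gap belongs to */
    int64_t start_s;    /* page start time that was looked up and found empty */
    bool used;
} gap_entry_t;

typedef struct {
    gap_entry_t slots[GAP_CACHE_SLOTS];
} gap_cache_t;

static void gap_cache_init(gap_cache_t *c) { memset(c, 0, sizeof(*c)); }

static size_t gap_slot(uint64_t metric_id, int64_t start_s) {
    uint64_t h = (metric_id * 0x9E3779B97F4A7C15ULL) ^ (uint64_t)start_s;
    return (size_t)(h % GAP_CACHE_SLOTS);
}

/* Remember that (metric_id, start_s) has no page; a collision just evicts. */
static void gap_remember(gap_cache_t *c, uint64_t metric_id, int64_t start_s) {
    gap_entry_t *e = &c->slots[gap_slot(metric_id, start_s)];
    e->metric_id = metric_id;
    e->start_s = start_s;
    e->used = true;
}

/* True when an earlier lookup already proved there is no page: skip the scan. */
static bool gap_is_known(const gap_cache_t *c, uint64_t metric_id, int64_t start_s) {
    const gap_entry_t *e = &c->slots[gap_slot(metric_id, start_s)];
    return e->used && e->metric_id == metric_id && e->start_s == start_s;
}

/* Drop the negative entry, e.g. when data for that range appears later. */
static void gap_forget(gap_cache_t *c, uint64_t metric_id, int64_t start_s) {
    gap_entry_t *e = &c->slots[gap_slot(metric_id, start_s)];
    if (e->used && e->metric_id == metric_id && e->start_s == start_s)
        e->used = false;
}
```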
|
|
* track memory footprint of Netdata
* track db modes alloc/ram/save/map
* track system info; track sender and receiver
* fixes
* more fixes
* track workers memory, onewayalloc memory; unify judyhs size estimation
* track replication structures and buffers
* Properly clear host RRDHOST_FLAG_METADATA_UPDATE flag
* flush the replication buffer every 1000 times the circular buffer is found empty
* don't take timestamp too frequently in sender loop
* sender buffers are not used by the same thread as the sender, so they were never recreated - fixed it
* free sender thread buffer on replication threads when replication is idle
* use the last sender flag as a timestamp of the last buffer recreation
* free cbuffer before reconnecting
* recreate cbuffer on every flush
* timings for journal v2 loading
* inlining of metric and cache functions
* aral likely/unlikely
* free left-over thread buffers
* fix NULL pointer dereference in replication
* free sender thread buffer on sender thread too
* mark ctx as used before flushing
* better logging on ctx datafiles closing
Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
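One common way to track the memory footprint of a subsystem is to route its allocations through a wrapper that keeps an atomic byte counter. The sketch below is an assumption about the general technique, not how Netdata implements it. It counts requested bytes only (allocator overhead and the small size header are excluded) and uses hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-subsystem accounting; not Netdata's API. */
typedef struct {
    atomic_size_t bytes; /* requested bytes currently allocated */
} mem_account_t;

/* Header stored before each block so free() can un-account its size. */
typedef union {
    size_t size;       /* requested size of the block */
    max_align_t align; /* keeps the returned pointer suitably aligned */
} mem_header_t;

static void mem_account_init(mem_account_t *acc) { atomic_init(&acc->bytes, 0); }

static void *tracked_malloc(mem_account_t *acc, size_t size) {
    mem_header_t *h = malloc(sizeof(*h) + size);
    if (!h) return NULL;
    h->size = size;
    atomic_fetch_add(&acc->bytes, size);
    return h + 1; /* user memory starts right after the header */
}

static void tracked_free(mem_account_t *acc, void *ptr) {
    if (!ptr) return;
    mem_header_t *h = (mem_header_t *)ptr - 1;
    atomic_fetch_sub(&acc->bytes, h->size);
    free(h);
}

static size_t mem_account_bytes(mem_account_t *acc) { return atomic_load(&acc->bytes); }
```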
|
|
* query planner weight calculation using long long
* adjust replication query ahead pipeline for smaller systems
* do not generate huge replication messages
* add message to indicate replication message was interrupted
* improved message
* max replication size 25% of sender buffer
* fix for last commit
* use less cache and smaller page sizes and fewer threads on 32-bits
* fix reserved libuv workers for 32bits
* fix detection of 32/64 bit
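Two rules from this commit, capping a replication message at 25% of the sender buffer and detecting 32 vs 64 bit, can be sketched as follows. The names are hypothetical. Detecting the platform from `UINTPTR_MAX` is one portable compile-time approach and is not necessarily the check Netdata uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Compile-time 32/64-bit detection based on pointer width (one possible approach). */
#if UINTPTR_MAX > 0xFFFFFFFFu
#define PLATFORM_IS_64BIT 1
#else
#define PLATFORM_IS_64BIT 0
#endif

/* Hypothetical: a single replication message may use at most 25% of the sender buffer. */
static size_t replication_max_message_size(size_t sender_buffer_size) {
    return sender_buffer_size / 4;
}

/* Hypothetical: once the message reaches the limit, interrupt it so the
 * remaining data can follow in a later message. */
static bool replication_should_interrupt(size_t message_bytes, size_t sender_buffer_size) {
    return message_bytes >= replication_max_message_size(sender_buffer_size);
}
```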
|
|
* allow extents to be merged for as long as possible
* do not block the event loop while recalculating retention due to datafile rotation
* buffers are incrementally cleaned up, every second, by just 1 entry
* fix order of commands
* remove newline
* measure cancelled extent read requests
* count all cancelled extent requests
* do not double count failed pages
* fixed cancelled name
* Fix error and warnings when compiling with --disable-dbengine
* when the timeframe is outside retention, the whole query should fail
* do not mark as failed pages that have been loaded but have been skipped
* added chart to show cache memory calculation variables
* LONG_MAX for 32-bit compatibility
* fix cache size calculation on 32-bit
* fix cache size calculation on 32-bit - use unsigned long long
* fix compilation warnings on 32-bits
* fix another compilation warning on 32-bits
* fix compilation warnings on older 32-bit compilers
* fix compilation warnings on older 32-bit compilers - more of them
* disable ML threads joining
Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
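The 32-bit cache size fixes come down to a general C pitfall: on 32-bit systems `size_t` and `unsigned long` are usually 32 bits, so multiplying two counts can silently wrap. Widening one operand to `unsigned long long` (at least 64 bits) before multiplying avoids this. The function names below are hypothetical and only illustrate the technique.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: total cache bytes, computed in at least 64 bits. */
static unsigned long long cache_bytes(uint32_t pages, uint32_t page_size) {
    return (unsigned long long)pages * page_size; /* widen before multiplying */
}

/* Hypothetical: clamp a 64-bit size to what this platform can address. */
static size_t clamp_to_size_t(unsigned long long bytes) {
    return bytes > SIZE_MAX ? SIZE_MAX : (size_t)bytes;
}
```

With 2^20 pages of 8 KiB, the widened product is 8 GiB, while the same product computed in 32 bits wraps to 0.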
|
|
* allow running multiple evictors and flushers
* flipped aggressive and critical evictions
* don't run more than 1 evictor
* switch to batch evictions when the size of the cache is critical
* remove batching of evictions
* dedup extent load pending requests
* accounting for merged extents
* always use doubly linked list
* add extent merging to the overall cache hit ratio
* support requeuing merged extents to higher priorities
* fix function name
* query planner now prefers higher tiers even when they miss some data at the end, which it fills from lower tiers; adding the option "plan" to jsonwrap now renders the query plan
* update statistics after every dimension completes
* use the retention of all tiers to calculate coverage per tier
* use the original window of the query for the planner
* give 2.5% benefit for each higher tier
* update cmd->priority so that it can be requeued multiple times
* merged extent pages is a cache hit
* fixed dbengine cache hit stats
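The planner bullets above say that coverage per tier is computed from each tier's retention over the original query window, and that each higher tier gets a 2.5% benefit. The exact formula is not given in this log. The sketch below assumes `weight = coverage * (1 + 0.025 * tier)` and uses hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fraction of the query window [after, before] covered by a tier's
 * retention [first, last]. */
static double tier_coverage(int64_t after, int64_t before, int64_t first, int64_t last) {
    int64_t lo = after > first ? after : first;
    int64_t hi = before < last ? before : last;
    if (before <= after || hi <= lo) return 0.0;
    return (double)(hi - lo) / (double)(before - after);
}

/* Hypothetical scoring: every higher tier gets a 2.5% benefit, so a higher
 * tier wins unless it covers noticeably less of the window. */
static size_t pick_tier(const double *coverage, size_t tiers) {
    size_t best = 0;
    double best_weight = -1.0;
    for (size_t t = 0; t < tiers; t++) {
        double weight = coverage[t] * (1.0 + 0.025 * (double)t);
        if (weight > best_weight) {
            best_weight = weight;
            best = t;
        }
    }
    return best;
}
```

With coverages of 1.0, 0.99 and 0.5 for tiers 0 to 2, tier 1 is chosen, because 0.99 × 1.025 is greater than 1.0.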
|
|
size (#14247)
* allow the cache to grow while running huge queries that exceed the size of the cache
* queue preloading of queries
* finalize prepared queries on timeout
|