Age | Commit message (Collapse) | Author |
|
* split rrdfunctions streaming and progress
* simplified internal inline functions API
* split rrdfunctions inflight management
* split rrd functions exporters
* renames
* base dyncfg structure
* config pluginsd
* intercept dyncfg function calls
* loading and saving of dyncfg metadata and data
* save metadata and payload to a single file; added code to update the plugins with jobs and saved configs
* basic working unit test
* added payload to functions execution
* removed old dyncfg code that is not needed any more
* more cleanup
* cleanup sender for functions with payload
* dyncfg functions are not exposed as functions
* remaining work to avoid indexing the \0 terminating character in dictionary keys
* added back old dyncfg plugins.d commands as noop, to allow plugins continue working
* working api; working streaming;
* updated plugins.d documentation
* aclk and http api requests share the same header parsing logic
* added source type internal
* fixed crashes
* added god mode for tests
* fixes
* fixed messages
* save host machine guids to configs
* cleaner manipulation of supported commands
* the functions event loop for external plugins can now process dyncfg requests
* unified internal and external plugins dyncfg API
* Netdata serves schema requests from /etc/netdata/schema.d and /var/lib/netdata/conf.d/schema.d
* cleanup and various fixes; fixed bug in previous dyncfg implementation on streaming that was sending the paylod in a way that allowed other streaming commands to be multiplexed
* internals go to a separate header file
* fix duplicate ACLK requests sent by aclk queue mechanism
* use fstat instead of stat
* working api
* plugin actions renamed to create and delete; dyncfg files are removed only from user actions
* prevent deadlock by using the react callback
* fix for string_strndupz()
* better dyncfg unittests
* more tests at the unittests
* properly detect dyncfg functions
* hide config functions from the UI
* tree response improvements
* send the initial update with payload
* determine tty using stdout, not stderr
* changes to statuses, cleanup and the code to bring all business logic into interception
* do not crash when the status is empty
* functions now propagate the source of the requests to plugins
* avoid warning about unused functions
* in the count at items for attention, do not count the orphan entries
* save source into dyncfg
* make the list null terminated
* fixed invalid comparison
* prevent memory leak on duplicated headers; log x-forwarded-for
* more unit tests
* added dyncfg unittests into the default unittests
* more unit tests and fixes
* more unit tests and fixes
* fix dictionary unittests
* config functions require admin access
|
|
|
|
* cgroups: update default filter to allow docker rootless containers
* add Rootless mode to docker readme
|
|
|
|
* track the progress of queries
* add query_progress in libnetdata Makefile.am
* add acl, response size and response code to the tracking
* define the required functions
* fix the last commit
* added /api/v2/progress?transaction=ID to report the progress of queries
* added function to report netdata-queries
* track hashtable additions
* when resusing a transaction, maintain the counter
* keep track of linked and indexing
* added X-Forwarded-Host and X-Forwarded-For to logs. X-Forwarded-For is also added in progress tracking
* report compact uuids to match logs; register the actual duration of the transaction
* added rowOptions to function; now web_client keeps track if it tracks progress or not
* add http request method to progress
* add tags per function; /api/vX/functions is now not protected
* compact the sanitization array
* split pluginsd_parser into multiple files
* cleanup keyword definitions
* code cleanup
* extracted rrd_collector to separate files
* added http access level to functions
* renamed access "all" to "any"
* implemented optional protection on functions
* add priority to functions, to allow the UI select the best function (lower priority) when the user has not selected a function
* added progress report from the plugins to netdata and from children to parents - untested
* added progress reporting in systemd-journal
* query timeout is now handled by evloop for external plugins
* propagate progress reports to children and plugins
* fix codeql warning
* adapt to cmake
* minor changes
* extend function timeout when progress is received; added streaming capability to propagate progress reports to parents and send progress requests to children
* revert change in dictionary.h
* add log when access level is invalid
* update access level of functions
* added logs when processing progress updates
* log when the deferred response is too big
* comment out sender progress to find the issue
* added missing newline in streaming progress reports
* propogate progress reports to functions
* fix logs
|
|
fix differ in signedness warn in cgroup
|
|
* cgroups collect pids/pids.current
* change dim name
|
|
|
|
rename functions
|
|
* split cgroups discovery to a separate file
* base for implementing cgroup-top
* working cgtop
* removed off_t casting
* update
* fix cpu_full_pressure_stall_time ctx
* check disk read/write dims is not NULL in functions
* fixes
* fix traffic
* fix codacy warnings
* upd
---------
Co-authored-by: ilyam8 <ilya@netdata.cloud>
|
|
* cache ctx in collection handle
* cache rd together with rda
* do not repeatedy call rrdcontexts - cached collection status; optimize pluginsd_acquire_dimension()
* fix unit tests
* do the absolutely minimum while updating timestamps, ensure validity during reading them
* when the stream is INTERPOLATED, buffer outstanding data for up to 50ms if the buffer contains DATA only.
* remove the spinlock from mrg
* remove the metric flags that are not used any more
* mrg writers can be different threads
* update first time when latest clean is also updated
* cleanup
* set hot page with a simple atomic operation
* sender sets chart slot for every chart
* work on senders without SLOT
* enable SLOT capability
* send slot at BEGIN when SLOT is enabled
* fix slot generation and parsing
* send slot while re-streaming
* use the sender capabilities, not the receiver
* cleanup
* add slots support to all chart and dimension related plugin commands
* fix condition
* fix calculation
* check sender capabilties
* assign slots in constructors
* we need the dimension slot at the DIMENSION keyword
* more debug info in case of dimension mismatch
* ensure the RRDDIM EXPOSED flag is multi-threaded and set it after the sender buffer has been committed, so that replication will not send dimensions prematurely
* fix renumbering on child restart
* reset rda caching when receiving a chart definition
* optimize pluginsd_end_v2()
* do not do zero sized allocations
* trust the chart slot id of the child
* cleanup charts on pluginsd thread exit
* better cleanup
* find the chart and put it in the slot, if it not already there
* move slots array to host
* initialize pluginsd slots properly
* add slots to replay begin; do not cleanup slots that dont belong to a chart
* cleanup on obsolete
* cleanup slots on obsoletions
* cleanup and renames about obsoletion
* rewrite obsolation service code to remove race conditions
* better service obsoletion log
* added debugging
* more debug
* exposed flag now compares versions
* removed debugging messages
* respolve conflicts
* fix replication check for unsent dimensions
|
|
|
|
|
|
Co-authored-by: ilyam8 <ilya@netdata.cloud>
|
|
|
|
This reverts commit 440bd51e08fdfa2a4daa191fb68643456028a753.
dbengine was still being used for non-zero tiers
even on non-dbengine modes.
|
|
|
|
* Storage engine.
* Host indexes to rrdb
* Move globals to rrdb
* Move storage_tiers_backfill to rrdb
* default_rrd_update_every to rrdb
* default_rrd_history_entries to rrdb
* gap_when_lost_iterations_above to rrdb
* rrdset_free_obsolete_time_s to rrdb
* libuv_worker_threads to rrdb
* ieee754_doubles to rrdb
* rrdhost_free_orphan_time_s to rrdb
* rrd_rwlock to rrdb
* localhost to rrdb
* rm extern from func decls
* mv rrd macro under rrd.h
* default_rrdeng_page_cache_mb to rrdb
* default_rrdeng_extent_cache_mb to rrdb
* db_engine_journal_check to rrdb
* default_rrdeng_disk_quota_mb to rrdb
* default_multidb_disk_quota_mb to rrdb
* multidb_ctx to rrdb
* page_type_size to rrdb
* tier_page_size to rrdb
* No storage_engine_id in rrdim functions
* storage_engine_id is provided by st
* Update to fix merge conflict.
* Update field name
* Remove unnecessary macros from rrd.h
* Rm unused type decls
* Rm duplicate func decls
* make internal function static
* Make the rest of public dbengine funcs accept a storage_instance.
* No more rrdengine_instance :)
* rm rrdset_debug from rrd.h
* Use rrdb to access globals in ML and ACLK
Missed due to not having the submodules in the
worktree.
* rm total_number
* rm RRDVAR_TYPE_TOTAL
* rm unused inline
* Rm names from typedef'd enums
* rm unused header include
* Move include
* Rm unused header include
* s/rrdhost_find_or_create/rrdhost_get_or_create/g
* s/find_host_by_node_id/rrdhost_find_by_node_id/
Also, remove duplicate definition in rrdcontext.c
* rm macro used only once
* rm macro used only once
* Reduce rrd.h api by moving funcs into a collector specific utils header
* Remove unused func
* Move parser specific function out of rrd.h
* return storage_number instead of void pointer
* move code related to rrd initialization out of rrdhost.c
* Remove tier_grouping from rrdim_tier
Saves 8 * storage_tiers bytes per dimension.
* Fix rebase
* s/rrd_update_every/update_every/
* Mark functions as static and constify args
* Add license notes and file to build systems.
* Remove remaining non-log/config mentions of memory mode
* Move rrdlabels api to separate file.
Also, move localhost functions that loads
labels outside of database/ and into daemon/
* Remove function decl in rrd.h
* merge rrdhost_cache_dir_for_rrdset_alloc into rrdset_cache_dir
* Do not expose internal function from rrd.h
* Rm NETDATA_RRD_INTERNALS
Only one function decl is covered. We have more
database internal functions that we currently
expose for no good reason. These will be placed
in a separate internal header in follow up PRs.
* Add license note
* Include libnetdata.h instead of aral.h
* Use rrdb to access localhost
* Fix builds without dbengine
* Add header to build system files
* Add rrdlabels.h to build systems
* Move func def from rrd.h to rrdhost.c
* Fix macos build
* Rm non-existing function
* Rebase master
* Define buffer length macro in ad_charts.
* Fix FreeBSD builds.
* Mark functions static
* Rm func decls without definitions
* Rebase master
* Rebase master
* Properly initialize value of storage tiers.
* Fix build after rebase.
|
|
Co-authored-by: ilyam8 <ilya@netdata.cloud>
|
|
|
|
* Adjust buffers to prevent overflow
* Adjust strncat parameter to prevent buffer overflow
|
|
|
|
* initial webrtc setup
* missing files
* rewrite of webrtc integration
* initialization and cleanup of webrtc connections
* make it compile without libdatachannel
* add missing webrtc_initialize() function when webrtc is not enabled
* make c++17 optional
* add build/m4/ax_compiler_vendor.m4
* add ax_cxx_compile_stdcxx.m4
* added new m4 files to makefile.am
* id all webrtc connections
* show warning when webrtc is disabled
* fixed message
* moved all webrtc error checking inside webrtc.cpp
* working webrtc connection establishment and cleanup
* remove obsolete code
* rewrote webrtc code in C to remove dependency for c++17
* fixed left-over reference
* detect binary and text messages
* minor fix
* naming of webrtc threads
* added webrtc configuration
* fix for thread_get_name_np()
* smaller web_client memory footprint
* universal web clients cache
* free web clients every 100 uses
* webrtc is now enabled by default only when compiled with internal checks
* webrtc responses to /api/ requests, including LZ4 compression
* fix for binary and text messages
* web_client_cache is now global
* unification of the internal web server API, for web requests, aclk request, webrtc requests
* more cleanup and unification of web client timings
* fixed compiler warnings
* update sent and received bytes
* eliminated of almost all big buffers in web client
* registry now uses the new json generation
* cookies are now an array; fixed redirects
* fix redirects, again
* write cookies directly to the header buffer, eliminating the need for cookie structures in web client
* reset the has_cookies flag
* gathered all web client cleanup to one function
* fixes redirects
* added summary.globals in /api/v2/data response
* ars to arc in /api/v2/data
* properly handle host impersonation
* set the context of mem.numa_nodes
|
|
|
|
|
|
|
|
* preparation for /api/v2/contexts
* working /api/v2/contexts
* add anomaly rate information in all statistics; when sum-count is requested, return sums and counts instead of averages
* minor fix
* query targegt now accurately counts hosts, contexts, instances, dimensions, metrics
* cleanup /api/v2/contexts
* full text search with /api/v2/contexts
* simple patterns now support the option to search ignoring case
* full text search API with /api/v2/q
* simple pattern execution optimization
* do not show q when not given
* full text search accounting
* separated /api/v2/nodes from /api/v2/contexts
* fix ssv queries for group_by
* count query instances queried and failed per context and host
* split rrdcontext.c to multiple files
* add query totals
* fix anomaly rate calculation; provide "ni" for indexing hosts
* do not generate zero valued members
* faster calculation of anomaly rate; by just summing integers for each db points and doing math once for every generated point
* fix typo when printing dimensions totals
* added option minify to remove spaces and newlines fron JSON output
* send instance ids and names when they differ
* do not add in query target dimensions, instances, contexts and hosts for which there is no retention in the current timeframe
* fix for the previous + renames and code cleanup
* when a dimension is filtered, include in the response all the other dimensions that are selectable
* do not add nodes that do not have retention in the current window
* move selection of dimensions to query_dimension_add(), instead of query_metric_add()
* increase the pre-processing capacity of queries
* generate instance fqdn ids and names only when they are needed
* provide detailed statistics about tiers retention, queries, points, update_every
* late allocation of query dimensions
* cleanup
* more cleanup
* support for annotations per displayed point, RESET and PARTIAL
* new type annotations
* if a chart is not linked to contexts and it is collected, link it when it is collected
* make ML run reentrant
* make ML rrdr query synchronous
* optimize replication memory allocation of replication_sort_entry
* change units to percentage, when requesting a coefficinet of variation, or a percentage query
* initialize replication before starting main threads
* properly decrement no room requests counter
* propagate the non-zero flag to group-by
* the same by avoiding the extra loop
* respect non-zero in all dimension arrays
* remove dictionary garbage collection from dictionary_entries() and dictionary_version()
* be more verbose when jv2 indexing is postponed
* prevent infinite loop
* use hidden dimensions even when dimensions pattern is unset
* traverse hosts using dictionaries
* fix dictionary unittests
|
|
optimization (#14493)
* first work on standardizing json formatting
* renamed old grouping to time_grouping and added group_by
* add dummy functions to enable compilation
* buffer json api work
* jsonwrap opening with buffer_json_X() functions
* cleanup
* storage for quotes
* optimize buffer printing for both numbers and strings
* removed ; from define
* contexts json generation using the new json functions
* fix buffer overflow at unit test
* weights endpoint using new json api
* fixes to weights endpoint
* check buffer overflow on all buffer functions
* do synchronous queries for weights
* buffer_flush() now resets json state too
* content type typedef
* print double values that are above the max 64-bit value
* str2ndd() can now parse values above UINT64_MAX
* faster number parsing by avoiding double calculations as much as possible
* faster number parsing
* faster hex parsing
* accurate printing and parsing of double values, even for very large numbers that cannot fit in 64bit integers
* full printing and parsing without using library functions - and related unit tests
* added IEEE754 streaming capability to enable streaming of double values in hex
* streaming and replication to transfer all values in hex
* use our own str2ndd for set2
* remove subnormal check from ieee
* base64 encoding for numbers, instead of hex
* when increasing double precision, also make sure the fractional number printed is aligned to the wanted precision
* str2ndd_encoded() parses all encoding formats, including integers
* prevent uninitialized use
* /api/v1/info using the new json API
* Fix error when compiling with --disable-ml
* Remove redundant 'buffer_unittest' declaration
* Fix formatting
* Fix formatting
* Fix formatting
* fix buffer unit test
* apps.plugin using the new JSON API
* make sure the metrics registry does not accept negative timestamps
* do not allow pages with negative timestamps to be loaded from db files; do not accept pages with negative timestamps in the cache
* Fix more formatting
---------
Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
|
|
Check MMAP failure. If fail to create semaphore, clear the shm_cgroup_ebpf.header to avoid crash during shutdown on thread cleanup
|
|
|
|
* parallel initialization of tiers
* do not spawn multiple dbengine event loops
* user configurable dbengine parallel initialization
* size netdata based on the real cpu cores available on the system netdata runs, not on the system monitored
* user configurable system cpus
* move cpuset parsing to os.c/.h
* fix replication of misaligned chart dimensions
* give a different path to each tier thread
* statically allocate the path into the initialization structure
* use aral for reusing dbengine pages
* dictionaries uses ARAL for fixed sized values
* fix compilation without internal checks
* journal v2 index uses aral
* test to see judy allocations
* judy allocations using aral
* Add config option to select if dbengine will use direct I/O (default is yes)
* V1 journafiles will use uv_fs_read instead of mmap (respect the direct I/O setting)
* Remove sqlite3IsMemdb as it is unused
* Fix compilation error when --disable-dbengine is used
* use aral for dbengine work_cmds
* changed aral API to support new features
* pgc and mrg aral overheads
* rrdeng opcodes using aral
* better structuring and naming
* dbegnine query handles using aral
* page descriptors using aral
* remove obsolete linking
* extent io descriptors using aral
* aral keeps one last page alive
* add missing return value
* added judy aral overhead
* pdc now uses aral
* page_details now use aral
* epdl and deol using aral - make sure ARALs are initialized before spawning the event loop
* remove unused linking
* pgc now uses one aral per partition
* aral measure maximum allocation queue
* aral to allocate pages in parallel
* aral parallel pages allocation when needed
* aral cleanup
* track page allocation and page population separately
---------
Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
|
|
|
|
* count open cache pages refering to datafile
* eliminate waste flush attempts
* remove eliminated variable
* journal v2 scanning split functions
* avoid locking open cache for a long time while migrating to journal v2
* dont acquire datafile for the loop; disable thread cancelability while a query is running
* work on datafile acquiring
* work on datafile deletion
* work on datafile deletion again
* logs of dbengine should start with DBENGINE
* thread specific key for queries to check if a query finishes without a finalize
* page_uuid is not used anymore
* Cleanup judy traversal when building new v2
Remove not needed calls to metric registry
* metric is 8 bytes smaller; timestamps are protected with a spinlock; timestamps in metric are now always coherent
* disable checks for invalid time-ranges
* Remove type from page details
* report scanning time
* remove infinite loop from datafile acquire for deletion
* remove infinite loop from datafile acquire for deletion again
* trace query handles
* properly allocate array of dimensions in replication
* metrics cleanup
* metrics registry uses arrayalloc
* arrayalloc free should be protected by lock
* use array alloc in page cache
* journal v2 scanning fix
* datafile reference leaking hunding
* do not load metrics of future timestamps
* initialize reasons
* fix datafile reference leak
* do not load pages that are entirely overlapped by others
* expand metric retention atomically
* split replication logic in initialization and execution
* replication prepare ahead queries
* replication prepare ahead queries fixed
* fix replication workers accounting
* add router active queries chart
* restore accounting of pages metadata sources; cleanup replication
* dont count skipped pages as unroutable
* notes on services shutdown
* do not migrate to journal v2 too early, while it has pending dirty pages in the main cache for the specific journal file
* do not add pages we dont need to pdc
* time in range re-work to provide info about past and future matches
* finner control on the pages selected for processing; accounting of page related issues
* fix invalid reference to handle->page
* eliminate data collection handle of pg_lookup_next
* accounting for queries with gaps
* query preprocessing the same way the processing is done; cache now supports all operations on Judy
* dynamic libuv workers based on number of processors; minimum libuv workers 8; replication query init ahead uses libuv workers - reserved ones (3)
* get into pdc all matching pages from main cache and open cache; do not do v2 scan if main cache and open cache can satisfy the query
* finner gaps calculation; accounting of overlapping pages in queries
* fix gaps accounting
* move datafile deletion to worker thread
* tune libuv workers and thread stack size
* stop netdata threads gradually
* run indexing together with cache flush/evict
* more work on clean shutdown
* limit the number of pages to evict per run
* do not lock the clean queue for accesses if it is not possible at that time - the page will be moved to the back of the list during eviction
* economies on flags for smaller page footprint; cleanup and renames
* eviction moves referenced pages to the end of the queue
* use murmur hash for indexing partition
* murmur should be static
* use more indexing partitions
* revert number of partitions to number of cpus
* cancel threads first, then stop services
* revert default thread stack size
* dont execute replication requests of disconnected senders
* wait more time for services that are exiting gradually
* fixed last commit
* finer control on page selection algorithm
* default stacksize of 1MB
* fix formatting
* fix worker utilization going crazy when the number is rotating
* avoid buffer full due to replication preprocessing of requests
* support query priorities
* add count of spins in spinlock when compiled with netdata internal checks
* remove prioritization from dbengine queries; cache now uses mutexes for the queues
* hot pages are now in sections judy arrays, like dirty
* align replication queries to optimal page size
* during flushing add to clean and evict in batches
* Revert "during flushing add to clean and evict in batches"
This reverts commit 8fb2b69d068499eacea6de8291c336e5e9f197c7.
* dont lock clean while evicting pages during flushing
* Revert "dont lock clean while evicting pages during flushing"
This reverts commit d6c82b5f40aeba86fc7aead062fab1b819ba58b3.
* Revert "Revert "during flushing add to clean and evict in batches""
This reverts commit ca7a187537fb8f743992700427e13042561211ec.
* dont cross locks during flushing, for the fastest flushes possible
* low-priority queries load pages synchronously
* Revert "low-priority queries load pages synchronously"
This reverts commit 1ef2662ddcd20fe5842b856c716df134c42d1dc7.
* cache uses spinlock again
* during flushing, dont lock the clean queue at all; each item is added atomically
* do smaller eviction runs
* evict one page at a time to minimize lock contention on the clean queue
* fix eviction statistics
* fix last commit
* plain should be main cache
* event loop cleanup; evictions and flushes can now happen concurrently
* run flush and evictions from tier0 only
* remove not needed variables
* flushing open cache is not needed; flushing protection is irrelevant since flushing is global for all tiers; added protection to datafiles so that only one flusher can run per datafile at any given time
* added worker jobs in timer to find the slow part of it
* support fast eviction of pages when all_of_them is set
* revert default thread stack size
* bypass event loop for dispatching read extent commands to workers - send them directly
* Revert "bypass event loop for dispatching read extent commands to workers - send them directly"
This reverts commit 2c08bc5bab12881ae33bc73ce5dea03dfc4e1fce.
* cache work requests
* minimize memory operations during flushing; caching of extent_io_descriptors and page_descriptors
* publish flushed pages to open cache in the thread pool
* prevent eventloop requests from getting stacked in the event loop
* single threaded dbengine controller; support priorities for all queries; major cleanup and restructuring of rrdengine.c
* more rrdengine.c cleanup
* enable db rotation
* do not log when there is a filter
* do not run multiple migration to journal v2
* load all extents async
* fix wrong paste
* report opcodes waiting, works dispatched, works executing
* cleanup event loop memory every 10 minutes
* dont dispatch more work requests than the number of threads available
* use the dispatched counter instead of the executing counter to check if the worker thread pool is full
* remove UV_RUN_NOWAIT
* replication to fill the queues
* caching of extent buffers; code cleanup
* caching of pdc and pd; rework on journal v2 indexing, datafile creation, database rotation
* single transaction wal
* synchronous flushing
* first cancel the threads, then signal them to exit
* caching of rrdeng query handles; added priority to query target; health is now low prio
* add priority to the missing points; do not allow critical priority in queries
* offload query preparation and routing to libuv thread pool
* updated timing charts for the offloaded query preparation
* caching of WALs
* accounting for struct caches (buffers); do not load extents with invalid sizes
* protection against memory booming during replication due to the optimal alignment of pages; sender thread buffer is now also reset when the circular buffer is reset
* also check if the expanded before is not the chart later updated time
* also check if the expanded before is not after the wall clock time of when the query started
* Remove unused variable
* replication to queue less queries; cleanup of internal fatals
* Mark dimension to be updated async
* caching of extent_page_details_list (epdl) and datafile_extent_offset_list (deol)
* disable pgc stress test, under an ifdef
* disable mrg stress test under an ifdef
* Mark chart and host labels, host info for async check and store in the database
* dictionary items use arrayalloc
* cache section pages structure is allocated with arrayalloc
* Add function to wakeup the aclk query threads and check for exit
Register function to be called during shutdown after signaling the service to exit
* parallel preparation of all dimensions of queries
* be more sensitive to enable streaming after replication
* atomically finish chart replication
* fix last commit
* fix last commit again
* fix last commit again again
* fix last commit again again again
* unify the normalization of retention calculation for collected charts; do not enable streaming if more than 60 points are to be transferred; eliminate an allocation during replication
* do not cancel start streaming; use high priority queries when we have locked chart data collection
* prevent starvation on opcodes execution, by allowing 2% of the requests to be re-ordered
* opcode now uses 2 spinlocks one for the caching of allocations and one for the waiting queue
* Remove check locks and NETDATA_VERIFY_LOCKS as it is not needed anymore
* Fix bad memory allocation / cleanup
* Cleanup ACLK sync initialization (part 1)
* Don't update metric registry during shutdown (part 1)
* Prevent crash when dashboard is refreshed and host goes away
* Mark ctx that is shutting down.
Test not adding flushed pages to open cache as hot if we are shutting down
* make ML work
* Fix compile without NETDATA_INTERNAL_CHECKS
* shutdown each ctx independently
* fix completion of quiesce
* do not update shared ML charts
* Create ML charts on child hosts.
When a parent runs a ML for a child, the relevant-ML charts
should be created on the child host. These charts should use
the parent's hostname to differentiate multiple parents that might
run ML for a child.
The only exception to this rule is the training/prediction resource
usage charts. These are created on the localhost of the parent host,
because they provide information specific to said host.
* check new ml code
* first save the database, then free all memory
* dbengine prep exit before freeing all memory; fixed deadlock in cache hot to dirty; added missing check to query engine about metrics without any data in the db
* Cleanup metadata thread (part 2)
* increase refcount before dispatching prep command
* Do not try to stop anomaly detection threads twice.
A separate function call has been added to stop anomaly detection threads.
This commit removes the left over function calls that were made
internally when a host was being created/destroyed.
* Remove allocations when smoothing samples buffer
The number of dims per sample is always 1, ie. we are training and
predicting only individual dimensions.
* set the orphan flag when loading archived hosts
* track worker dispatch callbacks and threadpool worker init
* make ML threads joinable; mark ctx having flushing in progress as early as possible
* fix allocation counter
* Cleanup metadata thread (part 3)
* Cleanup metadata thread (part 4)
* Skip metadata host scan when running unittest
* unittest support during init
* dont use all the libuv threads for queries
* break an infinite loop when sleep_usec() is interrupted
* ml prediction is a collector for several charts
* sleep_usec() now makes sure it will never loop if it passes the time expected; sleep_usec() now uses nanosleep() because clock_nanosleep() misses signals on netdata exit
* worker_unregister() in netdata threads cleanup
* moved pdc/epdl/deol/extent_buffer related code to pdc.c and pdc.h
* fixed ML issues
* removed engine2 directory
* added dbengine2 files in CMakeLists.txt
* move query plan data to query target, so that they can be exposed by in jsonwrap
* uniform definition of query plan according to the other query target members
* event_loop should be in daemon, not libnetdata
* metric_retention_by_uuid() is now part of the storage engine abstraction
* unify time_t variables to have the suffix _s (meaning: seconds)
* old dbengine statistics become "dbengine io"
* do not enable ML resource usage charts by default
* unify ml chart families, plugins and modules
* cleanup query plans from query target
* cleanup all extent buffers
* added debug info for rrddim slot to time
* rrddim now does proper gap management
* full rewrite of the mem modes
* use library functions for madvise
* use CHECKSUM_SZ for the checksum size
* fix coverity warning about the impossible case of returning a page that is entirely in the past of the query
* fix dbengine shutdown
* keep the old datafile lock until a new datafile has been created, to avoid creating multiple datafiles concurrently
* fine tune cache evictions
* dont initialize health if the health service is not running - prevent crash on shutdown while children get connected
* rename AS threads to ACLK[hostname]
* prevent re-use of uninitialized memory in queries
* use JulyL instead of JudyL for PDC operations - to test it first
* add also JulyL files
* fix July memory accounting
* disable July for PDC (use Judy)
* use the function to remove datafiles from linked list
* fix july and event_loop
* add july to libnetdata subdirs
* rename time_t variables that end in _t to end in _s
* replicate when there is a gap at the beginning of the replication period
* reset postponing of sender connections when a receiver is connected
* Adjust update every properly
* fix replication infinite loop due to last change
* packed enums in rrd.h and cleanup of obsolete rrd structure members
* prevent deadlock in replication: replication_recalculate_buffer_used_ratio_unsafe() deadlocking with replication_sender_delete_pending_requests()
* void unused variable
* void unused variables
* fix indentation
* entries_by_time calculation in VD was wrong; restored internal checks for checking future timestamps
* macros to caclulate page entries by time and size
* prevent statsd cleanup crash on exit
* cleanup health thread related variables
Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
Co-authored-by: vkalintiris <vasilis@netdata.cloud>
|
|
* Remove calls to rrdset_next().
* Rm checks plugin
* Update documentantion
* Call rrdset_next from within rrdset_done
This wraps up the removal of rrdset_next from internal collectors, which
removes a lot of unecessary code and the need for if/else clauses in
every place.
The pluginsd parser is the only component that calls rrdset_next*()
functions because it's not strictly speaking a collector but more of a
collector manager/proxy.
With the current changes it's possible to simplify the API we expose
from RRD significantly, but this will be follow-up work in the future.
* Remove stale reference to checks.plugin
* Fix RRD unit test
rrdset_next is not meant to be called from these tests.
* Fix db engine unit test.
* Schedule rrdset_next when we have completed at least one collection.
* Mark chart creation clauses as unlikely.
* Add missing brace to fix FreeBSD plugin.
|
|
about specific charts (#13720)
* function renames and code cleanup in popen.c; no actual code changes
* netdata popen() now opens both child process stdin and stdout and returns FILE * for both
* pass both input and output to parser structures
* updated rrdset to call custom functions
* RRDSET FUNCTION leading calls for both sync and async operation
* put RRDSET functions to a separate file
* added format and timeout at function definition
* support for synchronous (internal plugins) and asynchronous (external plugins and children) functions
* /api/v1/function endpoint
* functions are now attached to the host and there is a dictionary view per chart
* functions implemented at plugins.d
* remove the defer until keyword hook from plugins.d when it is done
* stream sender implementation of functions
* sanitization of all functions so that certain characters are only allowed
* strictier sanitization
* common max size
* 1st working plugins.d example
* always init inflight dictionary
* properly destroy dictionaries to avoid parallel insertion of items
* add more debugging on disconnection reasons
* add more debugging on disconnection reasons again
* streaming receiver respects newlines
* dont use the same fp for both streaming receive and send
* dont free dbengine memory with internal checks
* make sender proceed in the buffer
* added timing info and garbage collection at plugins.d
* added info about routing nodes
* added info about routing nodes with delay
* added more info about delays
* added more info about delays again
* signal sending thread to wake up
* streaming version labeling and commented code to support capabilities
* added functions to /api/v1/data, /api/v1/charts, /api/v1/chart, /api/v1/info
* redirect top output to stdout
* address coverity findings
* fix resource leaks of popen
* log attempts to connect to individual destinations
* better messages
* properly parse destinations
* try to find a function from the most matching to the least matching
* log added streaming destinations
* rotate destinations bypassing a node in the middle that does not accept our connection
* break the loops properly
* use typedef to define callbacks
* capabilities negotiation during streaming
* functions exposed upstream based on capabilities; compression disabled per node persisting reconnects; always try to connect with all capabilities
* restore functionality to lookup functions
* better logging of capabilities
* remove old versions from capabilities when a newer version is there
* fix formatting
* optimization for plugins.d rrdlabels to avoid creating and destructing dictionaries all the time
* delayed health initialization for rrddim and rrdset
* cleanup health initialization
* fix for popen() not returning the right value
* add health worker jobs for initializing rrdset and rrddim
* added content type support for functions; apps.plugin permanent function to display all the processes
* fixes for functions parameters parsing in apps.plugin
* fix for process matching in apps.plugiin
* first working function for apps.plugin
* Dashboard ACL is disabled for functions; Function errors are all in JSON format
* apps.plugin function processes returns json table
* use json_escape_string() to escape message
* fix formatting
* apps.plugin exposes all its metrics to function processes
* fix json formatting when filtering out some rows
* reopen the internal pipe of rrdpush in case of errors
* misplaced statement
* do not use buffer->len
* support for GLOBAL functions (functions that are not linked to a chart
* added /api/v1/functions endpoint; removed format from the FUNCTIONS api;
* swagger documentation about the new api end points
* added plugins.d documentation about functions
* never re-close a file
* remove uncessesary ifdef
* fixed issues identified by codacy
* fix for null label value
* make edit-config copy-and-paste friendly
* Revert "make edit-config copy-and-paste friendly"
This reverts commit 54500c0e0a97f65a0c66c4d34e966f6a9056698e.
* reworked sender handshake to fix coverity findings
* timeout is zero, for both send_timeout() and recv_timeout()
* properly detect that parent closed the socket
* support caching of function responses; limit function response to 10MB; added protection from malformed function responses
* disabled excessive logging
* added units to apps.plugin function processes and normalized all values to be human readable
* shorter field names
* fixed issues reported
* fixed apps.plugin error response; tested that pluginsd can properly handle faulty responses
* use double linked list macros for double linked list management
* faster apps.plugin function printing by minimizing file operations
* added memory percentage
* fix compatibility issues with older compilers and FreeBSD
* rrdpush sender code cleanup; rrhost structure cleanup from sender flags and variables;
* fix letftover variable in ifdef
* apps.plugin: do not call detach from the thread; exit immediately when input is broken
* exclude AR charts from health
* flush cleaner; prefer sender output
* clarity
* do not fill the cbuffer if not connected
* fix
* dont enabled host->sender if streaming is not enabled; send host label updates to parent;
* functions are only available through ACLK
* Prepared statement reports only in dev mode
* fix AR chart detection
* fix for streaming not being enabling itself
* more cleanup of sender and receiver structures
* moved read-only flags and configuration options to rrdhost->options
* fixed merge with master
* fix for incomplete rename
* prevent service thread from working on charts that are being collected
Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
|
|
* rrdset - in progress
* rrdset optimal constructor; rrdset conflict
* rrdset final touches
* re-organization of rrdset object members
* prevent use-after-free
* dictionary dfe supports also counting of iterations
* rrddim managed by dictionary
* rrd.h cleanup
* DICTIONARY_ITEM now is referencing actual dictionary items in the code
* removed rrdset linked list
* Revert "removed rrdset linked list"
This reverts commit 690d6a588b4b99619c2c5e10f84e8f868ae6def5.
* removed rrdset linked list
* added comments
* Switch chart uuid to static allocation in rrdset
Remove unused functions
* rrdset_archive() and friends...
* always create rrdfamily
* enable ml_free_dimension
* rrddim_foreach done with dfe
* most custom rrddim loops replaced with rrddim_foreach
* removed accesses to rrddim->dimensions
* removed locks that are no longer needed
* rrdsetvar is now managed by the dictionary
* set rrdset is rrdsetvar, fixes https://github.com/netdata/netdata/pull/13646#issuecomment-1242574853
* conflict callback of rrdsetvar now properly checks if it has to reset the variable
* dictionary registered callbacks accept as first parameter the DICTIONARY_ITEM
* dictionary dfe now uses internal counter to report; avoided excess variables defined with dfe
* dictionary walkthrough callbacks get dictionary acquired items
* dictionary reference counters that can be dupped from zero
* added advanced functions for get and del
* rrdvar managed by dictionaries
* thread safety for rrdsetvar
* faster rrdvar initialization
* rrdvar string lengths should match in all add, del, get functions
* rrdvar internals hidden from the rest of the world
* rrdvar is now acquired throughout netdata
* hide the internal structures of rrdsetvar
* rrdsetvar is now acquired through out netdata
* rrddimvar managed by dictionary; rrddimvar linked list removed; rrddimvar structures hidden from the rest of netdata
* better error handling
* dont create variables if not initialized for health
* dont create variables if not initialized for health again
* rrdfamily is now managed by dictionaries; references of it are acquired dictionary items
* type checking on acquired objects
* rrdcalc renaming of functions
* type checking for rrdfamily_acquired
* rrdcalc managed by dictionaries
* rrdcalc double free fix
* host rrdvars is always need |