Age | Commit message (Collapse) | Author |
|
stale plugins; streaming improvements (#15113)
* add information about streaming connections to /api/v2/nodes; reset defer time when sender or receivers connect or disconnect
* make each streaming destination respect its SSL settings
* to not send SSL traffic over non-SSL connection
* keep track of outgoing streaming connection attempts
* retry SSL reads when SSL_read() returns SSL_ERROR_WANT_READ
* Revert "retry SSL reads when SSL_read() returns SSL_ERROR_WANT_READ"
This reverts commit 14c858677c6f2d3b08c94f298e2f45ecdb74c801.
* cleanup SSL connections properly
* initialize SSL in rpt before takeover
* sender should free SSL when talking to a non-SSL destination
* do not shutdown SSL when receiver exits
* restore operation of SIGCHLD when the reaper is not enabled
* create an fgets function that checks for data and times out
* work on error handling of plugins exiting
* remove newlines from logs
* global call to waitid(), caching the result for netdata_pclose() to process
* receiver tid
* parser timeouts in 2 minutes instead of 10
* fix crash when UUID is NULL in SQLite
* abstract sqlite3 parsing for uuid and text
* write proper ssl errors on read and write
* fix for SSL_ERROR_WANT_RETRY_VERIFY
* SSL WANT per function
* unified SSL error logging
* fix compilation warning
* additional logging about parser cleanup
* streaming parser should call the pluginsd parser cleanup
* SSL error handling work
* SSL initialization unification
* check for pending data when receiving SSL response with timeout
* macro to check if an SSL connection has been established
* remove SSL_pending()
* check for SSL macros
* use SSL_peek() to find if there is a response
* SSL renames
* more SSL renames & cleanup
* rrdpush ssl connection function
* abstract all SSL functions into security.c
* keep track of SSL connections and always attempt to use SSL read/write when on SSL connection
* signal openssl to skip certificate validation when configured to do so
* better SSL error handling and logging
* SSL code cleanup
* SSL retry on SSL_connect and SSL_accept
* SSL provide default return value for old compilers
* SSL read/write functions emulate system read/write functions
* fix receive/send timeout and switch from SSL_peek() to SSL_pending()
* remove SSL_pending()
* removed sender auto-retry and debug info for initial recevier response
* ssl skip certificate verification config for web server
* ssl errors log ip and port of the peer
* keep ssl with web_client for its whole lifetime
* thread safe socket peers to text
* use error_limit() for common ssl errors
* cleanup
* more cleanup
* coverity fixes
* ssl error logs include both local and remote ip/port info
* remove obsolete code
|
|
* generate and store an event_hash_id
* transmit to cloud
* transmit to the cloud
|
|
compatibility (#15126)
* percentage of group is now aggregatable at cloud across multiple nodes
* do not break backwards compatibility with percentage-of-instance
* calculate the percentage when percentage-of-instance is requested
* increase capability version
|
|
Free context
|
|
|
|
* fix memleak
* minor simplify
|
|
zlib compulsory
|
|
|
|
Increase api/v2 version
|
|
* pull aclk schemas
* resolve capas
* handle checkpoints and removed from health
* build with disable-cloud
* codacy 1
* misc changes
* one more char in hash
* free buffer
* change topic
* misc fixes
* skip removed alert variables
* change hash functions
* use create and destroy for compatibility with older openssl
|
|
* initial webrtc setup
* missing files
* rewrite of webrtc integration
* initialization and cleanup of webrtc connections
* make it compile without libdatachannel
* add missing webrtc_initialize() function when webrtc is not enabled
* make c++17 optional
* add build/m4/ax_compiler_vendor.m4
* add ax_cxx_compile_stdcxx.m4
* added new m4 files to makefile.am
* id all webrtc connections
* show warning when webrtc is disabled
* fixed message
* moved all webrtc error checking inside webrtc.cpp
* working webrtc connection establishment and cleanup
* remove obsolete code
* rewrote webrtc code in C to remove dependency for c++17
* fixed left-over reference
* detect binary and text messages
* minor fix
* naming of webrtc threads
* added webrtc configuration
* fix for thread_get_name_np()
* smaller web_client memory footprint
* universal web clients cache
* free web clients every 100 uses
* webrtc is now enabled by default only when compiled with internal checks
* webrtc responses to /api/ requests, including LZ4 compression
* fix for binary and text messages
* web_client_cache is now global
* unification of the internal web server API, for web requests, aclk request, webrtc requests
* more cleanup and unification of web client timings
* fixed compiler warnings
* update sent and received bytes
* eliminated of almost all big buffers in web client
* registry now uses the new json generation
* cookies are now an array; fixed redirects
* fix redirects, again
* write cookies directly to the header buffer, eliminating the need for cookie structures in web client
* reset the has_cookies flag
* gathered all web client cleanup to one function
* fixes redirects
* added summary.globals in /api/v2/data response
* ars to arc in /api/v2/data
* properly handle host impersonation
* set the context of mem.numa_nodes
|
|
remove RX_MSGLEN_MAX
|
|
* change docusaurus admonitions to our style of admonitions
* Apply suggestions from code review
Co-authored-by: Shyam Sreevalsan <shyam@netdata.cloud>
---------
Co-authored-by: Shyam Sreevalsan <shyam@netdata.cloud>
|
|
* configure extent cache size
* workers can now execute up to 10 jobs in a run, boosting query prep and extent reads
* fix dispatched and executing counters
* boost to the max
* increase libuv worker threads
* query prep always get more prio than extent reads; stop processing in batch when dbengine is queue is critical
* fix accounting of query prep
* inlining of time-grouping functions, to speed up queries with billions of points
* make switching based on a local const variable
* print one pending contexts loading message per iteration
* inlined store engine query API
* inlined storage engine data collection api
* inlined all storage engine query ops
* eliminate and inline data collection ops
* simplified query group-by
* more error handling
* optimized partial trimming of group-by queries
* preparative work to support multiple passes of group-by
* more preparative work to support multiple passes of group-by (accepts multiple group-by params)
* unified query timings
* unified query timings - weights endpoint
* query target is no longer a static thread variable - there is a list of cached query targets, each of which of freed every 1000 queries
* fix query memory accounting
* added summary.dimension[].pri and sorted summary.dimensions based on priority and then name
* limit max ACLK WEB response size to 30MB
* the response type should be text/plain
* more preparative work for multiple group-by passes
* create functions for generating group by keys, ids and names
* multiple group-by passes are now supported
* parse group-by options array also with an index
* implemented percentage-of-instance group by function
* family is now merged in multi-node contexts
* prevent uninitialized use
|
|
* capa apiv2
* build new cancellation proto
* commited on wrong branch :D Revert "build new cancellation proto"
This reverts commit 8290422de4074b9c383e29ad10e041668056679a.
* use common source of truth for capas in apiv2
* fix for possible races
|
|
Update privacy link
|
|
* Schedule node info to the cloud after child connection
* Remove debug code
* Schedule localhost node info within 5 seconds of startup. If no children are detected Or a child connects (switch to immediate localhost node info update)
|
|
* query timestamps are now pre-determined and alignment on timestamps is guarranteed
* turn internal_fatal() to internal_error() to investigate the issue
* handle query when no data exist in the db
* check for non NULL dict when running dictionary garbage collect
* support API v2 requests via ACLK
* add nodes detailed information to /api/v2/nodes
* fixed keys and added dummy nodes for completeness
* added nodes_hard_hash, alerts_hard_hash, alerts_soft_hash; started building a nodes status object to reflect the current status of a node
* make sure replication does not double count charts that are already being replicated
* expose min and max in sts structures
* added view_minimum_value and view_maximum_value; percentage calculation is now an additional pass on the data, removed from formatters; absolute value calculation is now done at the query level, removed from formatters
* respect trimming in percentage calculation; updated swagger
* api/v2/weights preparative work to support multi-node queries - still single node though
* multi-node /api/v2/weights endpoint, supporting all the filtering parameters of /api/v2/data
* when passing the raw option, the query exposes the hidden dimensions
* fix compilation issues on older systems
* the query engine now calculates per dimension min, max, sum, count, anomaly count
* use the macro to calculate storage point anomaly rate
* weights endpoint exposing version hashes
* weights method=value shows min, max, average, sum, count, anomaly count, anomaly rate
* query: expose RESET flag; do not add the same point multiple times to the aggregated point
* weights: more compact output
* weights requests can be interrupted
* all /api/v2 requests can be interrupted and timeout
* allow relative timestamps in weights
* fix macos compilation warnings
* Revert "fix macos compilation warnings"
This reverts commit 8a1d24e41e9b58de566ac59f0c4b1c465bcc0592.
* /api/v2/data group-by now works on dimension names, not ids
* /api/v2/weights does not query metrics without retention and new output format
* /api/v2/weights value and anomaly queries do context queries when contexts are filtered; query timeout is now always in ms
|
|
* Remove aclk sync threads
* Disable functions if compiled with --disable-cloud
* Allocate and reuse buffer when scanning hosts
Tune transactions when writing metadata
Error checking when executing db_execute (it is already within a loop with retries)
* Schedule host context load in parallel
Child connection will be delayed if context load is not complete
Event loop cleanup
* Delay retention check if context is not loaded
Remove context load check from regular metadata host scan
* Improve checks to check finished threads
* Cleanup warnings when compiling with --disable-cloud
* Clean chart labels that were created before our current maximum retention
* Fix sql statement
* Remove structures members that of no use
Remove buffer allocations when not needed
* Fix compilation error
* Don't check for service running when not from a worker
* Code cleanup if agent is compiled with --disable-cloud
Setup ACLK tables in the database if needed
Submit node status update messages to the cloud
* Fix compilation warning when --disable-cloud is specified
* Address codacy issues
* Remove empty file -- has already been moved under contexts
* Use enum instead of numbers
* Use UUID_STR_LEN
* Add newline at the end of file
* Release node_id to prevent memory leak under certain cases
* Add queries in defines
* Ignore rc from transaction start -- if there is an active transaction, we will use it (same with commit) should further improve in a future PR
* Remove commented out code
* If host is null (it should not be) do not allocate config (coverity reports Resource leak)
* Do garbage collection when contexts is initialized
* Handle the case when config is not yet available for a host
|
|
* max web request size to 64KB
* fix the request too big message
* increase max request reading tries to 100
* support for bigger web requests
* add "avg" as a shortcut for "average" to both group by aggregation and time aggregation; discard the last partial points of a query in play mode, up to max update every; group by hidden dimensions too
* better implementation for partial data trimming
* added group_by=selected to return only one dimension for all selected metrics
* fix acceptance of group_by=selected
* passing option "raw" disables partial data trimming
* remove obsolete option "plan"; use "debug"
* fix view.min and view.max calculation - there were 2 bugs: a) min and max were reset for every row and b) min and max were corrupted by GBC and AR printing
* per row annotations
* added time column to point annotations
* disable caching for /api/v2/contexts responses
* added api format json2 that returns an array for each points, having all the point values and annotations in them
* work on swagger about /api/v2
* prevent infinite loop
* cleanup and swagger work
* allow negative simple pattern expressions to work as expected
* do not lookup in the dictionary empty names
* garbage collect dictionaries
* make query_target allocate less aggressively; queries fill the remaining points with nulls
* reusable query ops to save memory on huge queries
* move parts of query plans into query ops to save query target memory
* remove storage engine from query metric tiers, to save memory, and recalculate it when it is needed
|
|
* guard for null host when sending node instances
* also add a default value when migrating
|
|
* make the title metadta the H1
* Update collectors/python.d.plugin/zscores/README.md
* Update libnetdata/ebpf/README.md
* Update ml/README.md
* Update libnetdata/string/README.md
---------
Co-authored-by: Chris Akritidis <43294513+cakrit@users.noreply.github.com>
|
|
* Fix coverity 383236: Resource leak
* Fix coverity 382915 : Logically dead code
* Fix coverity 379133 : Division or modulo by float zero
* Fix coverity 382783 : Copy into fixed size buffer
* Fix coverity 381151 : Missing unlock
* Fix coverity 381903 : Dereference after null check
|
|
* fix broken link in ml/README.md
* fix broken link across all files
* fix broken link across all files
* fix broken links and remove what's next sections
* fix broken links and remove what's next section
* Remove related links sections with broken links that link to removed files
* fix broken links
|
|
* Change titles of agent alert notifications
* Reintroduce netdata for iot
* Eliminate guides category, merge health config docs
* Rename setup to configuration
* Codacy fixes and move health config reference
|
|
optimization (#14493)
* first work on standardizing json formatting
* renamed old grouping to time_grouping and added group_by
* add dummy functions to enable compilation
* buffer json api work
* jsonwrap opening with buffer_json_X() functions
* cleanup
* storage for quotes
* optimize buffer printing for both numbers and strings
* removed ; from define
* contexts json generation using the new json functions
* fix buffer overflow at unit test
* weights endpoint using new json api
* fixes to weights endpoint
* check buffer overflow on all buffer functions
* do synchronous queries for weights
* buffer_flush() now resets json state too
* content type typedef
* print double values that are above the max 64-bit value
* str2ndd() can now parse values above UINT64_MAX
* faster number parsing by avoiding double calculations as much as possible
* faster number parsing
* faster hex parsing
* accurate printing and parsing of double values, even for very large numbers that cannot fit in 64bit integers
* full printing and parsing without using library functions - and related unit tests
* added IEEE754 streaming capability to enable streaming of double values in hex
* streaming and replication to transfer all values in hex
* use our own str2ndd for set2
* remove subnormal check from ieee
* base64 encoding for numbers, instead of hex
* when increasing double precision, also make sure the fractional number printed is aligned to the wanted precision
* str2ndd_encoded() parses all encoding formats, including integers
* prevent uninitialized use
* /api/v1/info using the new json API
* Fix error when compiling with --disable-ml
* Remove redundant 'buffer_unittest' declaration
* Fix formatting
* Fix formatting
* Fix formatting
* fix buffer unit test
* apps.plugin using the new JSON API
* make sure the metrics registry does not accept negative timestamps
* do not allow pages with negative timestamps to be loaded from db files; do not accept pages with negative timestamps in the cache
* Fix more formatting
---------
Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
|
|
* Reorge exporter integrations
* Reorg alert notifications.
change mdx to md for alert related files
* Move all .mdx to .md, including links.
|
|
Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud>
|
|
* parallel initialization of tiers
* do not spawn multiple dbengine event loops
* user configurable dbengine parallel initialization
* size netdata based on the real cpu cores available on the system netdata runs, not on the system monitored
* user configurable system cpus
* move cpuset parsing to os.c/.h
* fix replication of misaligned chart dimensions
* give a different path to each tier thread
* statically allocate the path into the initialization structure
* use aral for reusing dbengine pages
* dictionaries uses ARAL for fixed sized values
* fix compilation without internal checks
* journal v2 index uses aral
* test to see judy allocations
* judy allocations using aral
* Add config option to select if dbengine will use direct I/O (default is yes)
* V1 journafiles will use uv_fs_read instead of mmap (respect the direct I/O setting)
* Remove sqlite3IsMemdb as it is unused
* Fix compilation error when --disable-dbengine is used
* use aral for dbengine work_cmds
* changed aral API to support new features
* pgc and mrg aral overheads
* rrdeng opcodes using aral
* better structuring and naming
* dbegnine query handles using aral
* page descriptors using aral
* remove obsolete linking
* extent io descriptors using aral
* aral keeps one last page alive
* add missing return value
* added judy aral overhead
* pdc now uses aral
* page_details now use aral
* epdl and deol using aral - make sure ARALs are initialized before spawning the event loop
* remove unused linking
* pgc now uses one aral per partition
* aral measure maximum allocation queue
* aral to allocate pages in parallel
* aral parallel pages allocation when needed
* aral cleanup
* track page allocation and page population separately
---------
Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
|
|
* Moving the cloud docs under /docs/cloud (previous location: netdata/learn/*)
* Added metadata on almost every document of the old learn site for the new ingest process of learn.
* Map old learn document to their best fit as topic related docs.
Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud>
Co-authored-by: DShreve2 <david@netdata.cloud>
Co-authored-by: hugovalente-pm <hugo@netdata.cloud>
|
|
* remove MQTT-C (MQTT 3 implementation) from buildsystem
|
|
* track memory footprint of Netdata
* track db modes alloc/ram/save/map
* track system info; track sender and receiver
* fixes
* more fixes
* track workers memory, onewayalloc memory; unify judyhs size estimation
* track replication structures and buffers
* Properly clear host RRDHOST_FLAG_METADATA_UPDATE flag
* flush the replication buffer every 1000 times the circular buffer is found empty
* dont take timestamp too frequently in sender loop
* sender buffers are not used by the same thread as the sender, so they were never recreated - fixed it
* free sender thread buffer on replication threads when replication is idle
* use the last sender flag as a timestamp of the last buffer recreation
* free cbuffer before reconnecting
* recreate cbuffer on every flush
* timings for journal v2 loading
* inlining of metric and cache functions
* aral likely/unlikely
* free left-over thread buffers
* fix NULL pointer dereference in replication
* free sender thread buffer on sender thread too
* mark ctx as used before flushing
* better logging on ctx datafiles closing
Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
|
|
* cleanup journal v2 mounts periodically
* fix for last commit
* re-enable loading page from disk when the arrangement of pages requires it
* Remove unused statistics
* Estimate diskspace when the current datafile is full and queue a rotate command (Currently it will not attempt to estimate end size for journals)
Queue a command to check quota on startup per tier
* apps.plugin now exposes RSS chart
* shorter thread names to make debugging easier, since thread names can only be 15 characters
* more thread names fixes
* allow an apps_groups.conf target to be pid 0 or 1
Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
|
|
* revert health to single thread
* remove getting now
* use a health struct
* remove commented code
* cleanup health log from metdata
* dont check for METADATA_UPDATE
|
|
|
|
* allow extents to be merged for as long as possible
* do not block the event loop while recalculating retention due to datafile rotation
* buffers are incrementally cleaned up, every second, by just 1 entry
* fix order of commands
* remove newline
* measure cancelled extent read requests
* count all cancelled extent requests
* do not double count failed pages
* fixed cancelled name
* Fix error and warnings when compiling with --disable-dbengine
* when the timeframe is outside retention and whole query should fail
* do not mark as failed pages that have been loaded but have been skipped
* added chart to show cache memory calculation variables
* LONG_MAX for 32-bit compatibility
* fix cache size calculation on 32-bit
* fix cache size calculation on 32-bit - use unsinged long long
* fix compilation warnings on 32-bits
* fix another compilation warning on 32-bits
* fix compilation warnings on older 32-bit compilers
* fix compilation warnings on older 32-bit compilers - more of them
* disable ML threads joining
Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
|
|
* count open cache pages refering to datafile
* eliminate waste flush attempts
* remove eliminated variable
* journal v2 scanning split functions
* avoid locking open cache for a long time while migrating to journal v2
* dont acquire datafile for the loop; disable thread cancelability while a query is running
* work on datafile acquiring
* work on datafile deletion
* work on datafile deletion again
* logs of dbengine should start with DBENGINE
* thread specific key for queries to check if a query finishes without a finalize
* page_uuid is not used anymore
* Cleanup judy traversal when building new v2
Remove not needed calls to metric registry
* metric is 8 bytes smaller; timestamps are protected with a spinlock; timestamps in metric are now always coherent
* disable checks for invalid time-ranges
* Remove type from page details
* report scanning time
* remove infinite loop from datafile acquire for deletion
* remove infinite loop from datafile acquire for deletion again
* trace query handles
* properly allocate array of dimensions in replication
* metrics cleanup
* metrics registry uses arrayalloc
* arrayalloc free should be protected by lock
* use array alloc in page cache
* journal v2 scanning fix
* datafile reference leaking hunding
* do not load metrics of future timestamps
* initialize reasons
* fix datafile reference leak
* do not load pages that are entirely overlapped by others
* expand metric retention atomically
* split replication logic in initialization and execution
* replication prepare ahead queries
* replication prepare ahead queries fixed
* fix replication workers accounting
* add router active queries chart
* restore accounting of pages metadata sources; cleanup replication
* dont count skipped pages as unroutable
* notes on services shutdown
* do not migrate to journal v2 too early, while it has pending dirty pages in the main cache for the specific journal file
* do not add pages we dont need to pdc
* time in range re-work to provide info about past and future matches
* finner control on the pages selected for processing; accounting of page related issues
* fix invalid reference to handle->page
* eliminate data collection handle of pg_lookup_next
* accounting for queries with gaps
* query preprocessing the same way the processing is done; cache now supports all operations on Judy
* dynamic libuv workers based on number of processors; minimum libuv workers 8; replication query init ahead uses libuv workers - reserved ones (3)
* get into pdc all matching pages from main cache and open cache; do not do v2 scan if main cache and open cache can satisfy the query
* finner gaps calculation; accounting of overlapping pages in queries
* fix gaps accounting
* move datafile deletion to worker thread
* tune libuv workers and thread stack size
* stop netdata threads gradually
* run indexing together with cache flush/evict
* more work on clean shutdown
* limit the number of pages to evict per run
* do not lock the clean queue for accesses if it is not possible at that time - the page will be moved to the back of the list during eviction
* economies on flags for smaller page footprint; cleanup and renames
* eviction moves referenced pages to the end of the queue
* use murmur hash for indexing partition
* murmur should be static
* use more indexing partitions
* revert number of partitions to number of cpus
* cancel threads first, then stop services
* revert default thread stack size
* dont execute replication requests of disconnected senders
* wait more time for services that are exiting gradually
* fixed last commit
* finer control on page selection algorithm
* default stacksize of 1MB
* fix formatting
* fix worker utilization going crazy when the number is rotating
* avoid buffer full due to replication preprocessing of requests
* support query priorities
* add count of spins in spinlock when compiled with netdata internal checks
* remove prioritization from dbengine queries; cache now uses mutexes for the queues
* hot pages are now in sections judy arrays, like dirty
* align replication queries to optimal page size
* during flushing add to clean and evict in batches
* Revert "during flushing add to clean and evict in batches"
This reverts commit 8fb2b69d068499eacea6de8291c336e5e9f197c7.
* dont lock clean while evicting pages during flushing
* Revert "dont lock clean while evicting pages during flushing"
This reverts commit d6c82b5f40aeba86fc7aead062fab1b819ba58b3.
* Revert "Revert "during flushing add to clean and evict in batches""
This reverts commit ca7a187537fb8f743992700427e13042561211ec.
* dont cross locks during flushing, for the fastest flushes possible
* low-priority queries load pages synchronously
* Revert "low-priority queries load pages synchronously"
This reverts commit 1ef2662ddcd20fe5842b856c716df134c42d1dc7.
* cache uses spinlock again
* during flushing, dont lock the clean queue at all; each item is added atomically
* do smaller eviction runs
* evict one page at a time to minimize lock contention on the clean queue
* fix eviction statistics
* fix last commit
* plain should be main cache
* event loop cleanup; evictions and flushes can now happen concurrently
* run flush and evictions from tier0 only
* remove not needed variables
* flushing open cache is not needed; flushing protection is irrelevant since flushing is global for all tiers; added protection to datafiles so that only one flusher can run per datafile at any given time
* added worker jobs in timer to find the slow part of it
* support fast eviction of pages when all_of_them is set
* revert default thread stack size
* bypass event loop for dispatching read extent commands to workers - send them directly
* Revert "bypass event loop for dispatching read extent commands to workers - send them directly"
This reverts commit 2c08bc5bab12881ae33bc73ce5dea03dfc4e1fce.
* cache work requests
* minimize memory operations during flushing; caching of extent_io_descriptors and page_descriptors
* publish flushed pages to open cache in the thread pool
* prevent eventloop requests from getting stacked in the event loop
* single threaded dbengine controller; support priorities for all queries; major cleanup and restructuring of rrdengine.c
* more rrdengine.c cleanup
* enable db rotation
* do not log when there is a filter
* do not run multiple migration to journal v2
* load all extents async
* fix wrong paste
* report opcodes waiting, works dispatched, works executing
* cleanup event loop memory every 10 minutes
* dont dispatch more work requests than the number of threads available
* use the dispatched counter instead of the executing counter to check if the worker thread pool is full
* remove UV_RUN_NOWAIT
* replication to fill the queues
* caching of extent buffers; code cleanup
* caching of pdc and pd; rework on journal v2 indexing, datafile creation, database rotation
* single transaction wal
* synchronous flushing
* first cancel the threads, then signal them to exit
* caching of rrdeng query handles; added priority to query target; health is now low prio
* add priority to the missing points; do not allow critical priority in queries
* offload query preparation and routing to libuv thread pool
* updated timing charts for the offloaded query preparation
* caching of WALs
* accounting for struct caches (buffers); do not load extents with invalid sizes
* protection against memory booming during replication due to the optimal alignment of pages; sender thread buffer is now also reset when the circular buffer is reset
* also check if the expanded before is not the chart later updated time
* also check if the expanded before is not after the wall clock time of when the query started
* Remove unused variable
* replication to queue less queries; cleanup of internal fatals
* Mark dimension to be updated async
* caching of extent_page_details_list (epdl) and datafile_extent_offset_list (deol)
* disable pgc stress test, under an ifdef
* disable mrg stress test under an ifdef
* Mark chart and host labels, host info for async check and store in the database
* dictionary items use arrayalloc
* cache section pages structure is allocated with arrayalloc
* Add function to wakeup the aclk query threads and check for exit
Register function to be called during shutdown after signaling the service to exit
* parallel preparation of all dimensions of queries
* be more sensitive to enable streaming after replication
* atomically finish chart replication
* fix last commit
* fix last commit again
* fix last commit again again
* fix last commit again again again
* unify the normalization of retention calculation for collected charts; do not enable streaming if more than 60 points are to be transferred; eliminate an allocation during replication
* do not cancel start streaming; use high priority queries when we have locked chart data collection
* prevent starvation on opcodes execution, by allowing 2% of the requests to be re-ordered
* opcode now uses 2 spinlocks one for the caching of allocations and one for the waiting queue
* Remove check locks and NETDATA_VERIFY_LOCKS as it is not needed anymore
* Fix bad memory allocation / cleanup
* Cleanup ACLK sync initialization (part 1)
* Don't update metric registry during shutdown (part 1)
* Prevent crash when dashboard is refreshed and host goes away
* Mark ctx that is shutting down.
Test not adding flushed pages to open cache as hot if we are shutting down
* make ML work
* Fix compile without NETDATA_INTERNAL_CHECKS
* shutdown each ctx independently
* fix completion of quiesce
* do not update shared ML charts
* Create ML charts on child hosts.
When a parent runs a ML for a child, the relevant-ML charts
should be created on the child host. These charts should use
the parent's hostname to differentiate multiple parents that might
run ML for a child.
The only exception to this rule is the training/prediction resource
usage charts. These are created on the localhost of the parent host,
because they provide information specific to said host.
* check new ml code
* first save the database, then free all memory
* dbengine prep exit before freeing all memory; fixed deadlock in cache hot to dirty; added missing check to query engine about metrics without any data in the db
* Cleanup metadata thread (part 2)
* increase refcount before dispatching prep command
* Do not try to stop anomaly detection threads twice.
A separate function call has been added to stop anomaly detection threads.
This commit removes the left over function calls that were made
internally when a host was being created/destroyed.
* Remove allocations when smoothing samples buffer
The number of dims per sample is always 1, ie. we are training and
predicting only individual dimensions.
* set the orphan flag when loading archived hosts
* track worker dispatch callbacks and threadpool worker init
* make ML threads joinable; mark ctx having flushing in progress as early as possible
* fix allocation counter
* Cleanup metadata thread (part 3)
* Cleanup metadata thread (part 4)
* Skip metadata host scan when running unittest
* unittest support during init
* dont use all the libuv threads for queries
* break an infinite loop when sleep_usec() is interrupted
* ml prediction is a collector for several charts
* sleep_usec() now makes sure it will never loop if it passes the time expected; sleep_usec() now uses nanosleep() because clock_nanosleep() misses signals on netdata exit
* worker_unregister() in netdata threads cleanup
* moved pdc/epdl/deol/extent_buffer related code to pdc.c and pdc.h
* fixed ML issues
* removed engine2 directory
* added dbengine2 files in CMakeLists.txt
* move query plan data to query target, so that they can be exposed by in jsonwrap
* uniform definition of query plan according to the other query target members
* event_loop should be in daemon, not libnetdata
* metric_retention_by_uuid() is now part of the storage engine abstraction
* unify time_t variables to have the suffix _s (meaning: seconds)
* old dbengine statistics become "dbengine io"
* do not enable ML resource usage charts by default
* unify ml chart families, plugins and modules
* cleanup query plans from query target
* cleanup all extent buffers
* added debug info for rrddim slot to time
* rrddim now does proper gap management
* full rewrite of the mem modes
* use library functions for madvise
* use CHECKSUM_SZ for the checksum size
* fix coverity warning about the impossible case of returning a page that is entirely in the past of the query
* fix dbengine shutdown
* keep the old datafile lock until a new datafile has been created, to avoid creating multiple datafiles concurrently
* fine tune cache evictions
* dont initialize health if the health service is not running - prevent crash on shutdown while children get connected
* rename AS threads to ACLK[hostname]
* prevent re-use of uninitialized memory in queries
* use JulyL instead of JudyL for PDC operations - to test it first
* add also JulyL files
* fix July memory accounting
* disable July for PDC (use Judy)
* use the function to remove datafiles from linked list
* fix july and event_loop
* add july to libnetdata subdirs
* rename time_t variables that end in _t to end in _s
* replicate when there is a gap at the beginning of the replication period
* reset postponing of sender connections when a receiver is connected
* Adjust update every properly
* fix replication infinite loop due to last change
* packed enums in rrd.h and cleanup of obsolete rrd structure members
* prevent deadlock in replication: replication_recalculate_buffer_used_ratio_unsafe() deadlocking with replication_sender_delete_pending_requests()
* void unused variable
* void unused variables
* fix indentation
* entries_by_time calculation in VD was wrong; restored internal checks for checking future timestamps
* macros to caclulate page entries by time and size
* prevent statsd cleanup crash on exit
* cleanup health thread related variables
Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
Co-authored-by: vkalintiris <vasilis@netdata.cloud>
|
|
* bump websockets
* add new files to makefile
* set topic aliases for used topics
|
|
add query type functions for statistics
|
|
|
|
fixes race on startup of query threads
|
|
|
|
Revert "MQTT 5 publish topic alias support (#14067)"
This reverts commit a749ab00a63b229d99f6bc82a965206e5481db3e.
|
|
* add possibility to decypt trafic for developers for debugging purposes
* requires netdata to be explicitly built with this feature enabled
|
|
* mqtt_websockets bumps version
* use the new topic alias support in netdata
|
|
|
|
* Remove calls to rrdset_next().
* Rm checks plugin
* Update documentantion
* Call rrdset_next from within rrdset_done
This wraps up the removal of rrdset_next from internal collectors, which
removes a lot of unecessary code and the need for if/else clauses in
every place.
The pluginsd parser is the only component that calls rrdset_next*()
functions because it's not strictly speaking a collector but more of a
collector manager/proxy.
With the current changes it's possible to simplify the API we expose
from RRD significantly, but this will be follow-up work in the future.
* Remove stale reference to checks.plugin
* Fix RRD unit test
rrdset_next is not meant to be called from these tests.
* Fix db engine unit test.
* Schedule rrdset_next when we have completed at least one collection.
* Mark chart creation clauses as unlikely.
* Add missing brace to fix FreeBSD plugin.
|
|
silence missleading error
|
|
* cleanup capas + add func capa
* make it const
* fixes
* freez
|
|
enable aclk conversation without INTERNAL CHECKS
|