Age | Commit message (Collapse) | Author |
|
add proper metadata to the file
|
|
* add metadata for learn
* first batch of adding metadata to md files
* second batch of adding metadata to md files
* third batch of adding metadata to md files
* test one sidebar_label
* add missing sidebar_labels
* add missing sidebar_labels to files left behind
* test, ansible doc is stubborn
* fix
* fix
* fix
* don't use questionmarks in the sidebar label
* don't use exclamation marks and symbols in the sidebar label
* fix style guide
* fixes
* fixes
|
|
* first commit - untested
* fix wrong begin command
* added set v2 too
* debug to log stream buffer
* debug to log stream buffer
* faster streaming printing
* mark charts and dimensions as collected
* use stream points even if sender is not enabled
* comment out stream debug log
* parse null as nan
* custom begin v2
* custom set v2; replication now copies the anomalous flag too
* custom end v2
* enabled stream log test
* renamed to BEGIN2, SET2, END2
* dont mix up replay and v2 members in user object
* fix typo
* cleanup
* support to v2 to v1 proxying
* mark updated dimensions as such
* do not log unknown flags
* comment out stream debug log
* send also the chart id on BEGIN2, v2 to v2
* update the data collections counter
* v2 values are transferred in hex
* faster hex parsing
* a little more generic hex and dec printing and parsing
* fix hex parsing
* minor optimization in dbengine api
* turn debugging into info message
* generalized the timings tracking, so that it can be used in more places
* commented out debug info
* renamed conflicting variable with macro
* remove wrong edits
* integrated ML and added cleanup in case parsing is interrupted
* disable data collection locking during v2
* cleanup stale ML locks; send updated chart variables during v2; add info to find stale locks
* inject an END2 between repeated BEGIN2 from rrdset_done()
* test: remove lockless single-threaded logic from dictionary and aral and apply the right acquire/release memory order to reference counters
* more fine grained dictionary atomics
* remove unecessary return values
* pointer validation under NETDATA_DICTIONARY_VALIDATE_POINTERS
* Revert "pointer validation under NETDATA_DICTIONARY_VALIDATE_POINTERS"
This reverts commit 846cdf2713e2a7ee2ff797f38db11714228800e9.
* Revert "remove unecessary return values"
This reverts commit 8c87d30f4d86f0f5d6b4562cf74fe7447138bbff.
* Revert "more fine grained dictionary atomics"
This reverts commit 984aec4234a340d197d45239ff9a10fd479fcf3c.
* Revert "test: remove lockless single-threaded logic from dictionary and aral and apply the right acquire/release memory order to reference counters"
This reverts commit c460b3d0ad497d2641bd0ea1d63cec7c052e74e4.
* Apply again "pointer validation under NETDATA_DICTIONARY_VALIDATE_POINTERS" while keeping the improved atomic operations.
This reverts commit f158d009
* fix last commit
* fix last commit again
* optimizations in dbengine
* do not send anomaly bit on non-supporting agents (send it when the INTERPOLATED capability is available)
* break long empty-points-loops in rrdset_done()
* decide page alignment on new page allocation, not on every point collected
* create max size pages but no smaller than 1/3
* Fix compilation when --disable-ml is specified
* Return false
* fixes for NETDATA_LOG_REPLICATION_REQUESTS
* added compile option NETDATA_WITHOUT_WORKERS_LATENCY
* put timings in BEGIN2, SET2, END2
* isolate begin2 ml
* revert repositioning data collection lock
* fixed multi-threading of statistics
* do not lookup dimensions all the time if they come in the same order
* update used on iteration, not on every points; also do better error handling
---------
Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
|
|
* acquiring / releasing interface for metrics
* metrics registry statistics
* cleanup metrics registry by deleting metrics when they dont have retention anymore; do not double copy the data of pages to be flushed
* print the tier in retention summary
* Open files with buffered instead of direct I/O (test)
* added more metrics stats and fixed unittest
* rename writer functions to avoid confusion with refcounting
* do not release a metric that is not acquired
* Revert to use direct I/O on write -- use direct I/O on read as well
* keep track of ARAL overhead and add it to the memory chart
* aral full check via api
* Cleanup
* give names to ARALs and PGCs
* aral improvements
* restore query expansion to the future
* prefer higher resolution tier when switching plans
* added extent read statistics
* smoother joining of tiers at query engine
* fine tune aral max allocation size
* aral restructuring to hide its internals from the rest of netdata
* aral restructuring; addtion of defrag option to aral to keep the linked list sorted - enabled by default to test it
* fully async aral
* some statistics and cleanup
* fix infinite loop while calculating retention
* aral docs and defragmenting disabled by default
* fix bug and add optimization when defragmenter is not enabled
* aral stress test
* aral speed report and documentation
* added internal checks that all pages are full
* improve internal log about metrics deletion
* metrics registry uses one aral per partition
* metrics registry aral max size to 512 elements per page
* remove data_structures/README.md dependency
---------
Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
|
|
* track memory footprint of Netdata
* track db modes alloc/ram/save/map
* track system info; track sender and receiver
* fixes
* more fixes
* track workers memory, onewayalloc memory; unify judyhs size estimation
* track replication structures and buffers
* Properly clear host RRDHOST_FLAG_METADATA_UPDATE flag
* flush the replication buffer every 1000 times the circular buffer is found empty
* dont take timestamp too frequently in sender loop
* sender buffers are not used by the same thread as the sender, so they were never recreated - fixed it
* free sender thread buffer on replication threads when replication is idle
* use the last sender flag as a timestamp of the last buffer recreation
* free cbuffer before reconnecting
* recreate cbuffer on every flush
* timings for journal v2 loading
* inlining of metric and cache functions
* aral likely/unlikely
* free left-over thread buffers
* fix NULL pointer dereference in replication
* free sender thread buffer on sender thread too
* mark ctx as used before flushing
* better logging on ctx datafiles closing
Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
|
|
* allow spinlock to be compiled with gcc 4.9
* allow compiling without dbengine
|
|
* find the issue with workers performance and optimize it
* have 2 independent locks in workers
* break down workers statistics work to measure the time of each module
* optimize reading cpu for workers; cleanup procfile from unecessary allocations and operations
* cleanup workers
* fix the workers statistics dimension name
|
|
use the faster monotonic clock in workers and replication; avoid unecessary statistics function on every request on replication - gather them all together once every second; check the chart flags on all mirrored hosts, not only the ones that have a sender; cleanup and unify replication logs; added child world time to REND; fix first BEGIN been transmitted when replication starts;
|
|
|
|
* streaming compression, query planner and replication fixes
* remove journal v2 stats from global statistics
* disable sql for checking past sql UUIDs
* single threaded replication
* final replication thread using dictionaries and JudyL for sorting the pending requests
* do not timeout the sending socket when there are pending replication requests
* streaming receiver using read() instead of fread()
* remove FILE * from streaming - now using posix read() and write()
* increase timeouts to 10 minutes
* apply sender timeout only when there are metrics that are supposed to be streamed
* error handling in replication
* remove retries on socket read timeout; better error messages
* take into account inbound traffic too to detect that a connection is stale
* remove race conditions from replication thread
* make sure deleted entries are marked as executed, so that even if deletion fails, they will not be executed
* 2 minutes timeout to retry streaming to a parent that already has this node
* remove unecessary condition check
* fix compilation warnings
* include judy in replication
* wrappers to handle retries for SSL_read and SSL_write
* compressed bytes read monitoring
* recursive locks on replication to make it faster during flush or cleanup
* replication completion chart at the receiver side
* simplified recursive mutex
* simplified recursive mutex again
|
|
By default functions are declared as extern in C/C++ headers. The goal
of this PR is to reduce the wall of text that many headers have and,
more importantly, to make the declaration of extern'd variables - of
which we have many dispersed in various places - easily and quickly
identifiable.
Automatically generated with:
$ git grep -l '^extern.*(' '**.h' | \
grep -v libjudy | \
grep -v 'sqlite3.h' | \
xargs sed -i -e 's/extern \(.*(.*$\)/\1/'
This is a NFC.
|
|
* rrdfamily
* rrddim
* rrdset plugin and module names
* rrdset units
* rrdset type
* rrdset family
* rrdset title
* rrdset title more
* rrdset context
* rrdcalctemplate context and removal of context hash from rrdset
* strings statistics
* rrdset name
* rearranged members of rrdset
* eliminate rrdset name hash; rrdcalc chart converted to STRING
* rrdset id, eliminated rrdset hash
* rrdcalc, alarm_entry, alert_config and some of rrdcalctemplate
* rrdcalctemplate
* rrdvar
* eval_variable
* rrddimvar and rrdsetvar
* rrdhost hostname, os and tags
* fix master commits
* added thread cache; implemented string_dup without locks
* faster thread cache
* rrdset and rrddim now use dictionaries for indexing
* rrdhost now uses dictionary
* rrdfamily now uses DICTIONARY
* rrdvar using dictionary instead of AVL
* allocate the right size to rrdvar flag members
* rrdhost remaining char * members to STRING *
* better error handling on indexing
* strings now use a read/write lock to allow parallel searches to the index
* removed AVL support from dictionaries; implemented STRING with native Judy calls
* string releases should be negative
* only 31 bits are allowed for enum flags
* proper locking on strings
* string threading unittest and fixes
* fix lgtm finding
* fixed naming
* stream chart/dimension definitions at the beginning of a streaming session
* thread stack variable is undefined on thread cancel
* rrdcontext garbage collect per host on startup
* worker control in garbage collection
* relaxed deletion of rrdmetrics
* type checking on dictfe
* netdata chart to monitor rrdcontext triggers
* Group chart label updates
* rrdcontext better handling of collected rrdsets
* rrdpush incremental transmition of definitions should use as much buffer as possible
* require 1MB per chart
* empty the sender buffer before enabling metrics streaming
* fill up to 50% of buffer
* reset signaling metrics sending
* use the shared variable for status
* use separate host flag for enabling streaming of metrics
* make sure the flag is clear
* add logging for streaming
* add logging for streaming on buffer overflow
* circular_buffer proper sizing
* removed obsolete logs
* do not execute worker jobs if not necessary
* better messages about compression disabling
* proper use of flags and updating rrdset last access time every time the obsoletion flag is flipped
* monitor stream sender used buffer ratio
* Update exporting unit tests
* no need to compare label value with strcmp
* streaming send workers now monitor bandwidth
* workers now use strings
* streaming receiver monitors incoming bandwidth
* parser shift of worker ids
* minor fixes
* Group chart label updates
* Populate context with dimensions that have data
* Fix chart id
* better shift of parser worker ids
* fix for streaming compression
* properly count received bytes
* ensure LZ4 compression ring buffer does not wrap prematurely
* do not stream empty charts; do not process empty instances in rrdcontext
* need_to_send_chart_definition() does not need an rrdset lock any more
* rrdcontext objects are collected, after data have been written to the db
* better logging of RRDCONTEXT transitions
* always set all variables needed by the worker utilization charts
* implemented double linked list for most objects; eliminated alarm indexes from rrdhost; and many more fixes
* lockless strings design - string_dup() and string_freez() are totally lockless when they dont need to touch Judy - only Judy is protected with a read/write lock
* STRING code re-organization for clarity
* thread_cache improvements; double numbers precision on worker threads
* STRING_ENTRY now shadown STRING, so no duplicate definition is required; string_length() renamed to string_strlen() to follow the paradigm of all other functions, STRING internal statistics are now only compiled with NETDATA_INTERNAL_CHECKS
* rrdhost index by hostname now cleans up; aclk queries of archieved hosts do not index hosts
* Add index to speed up database context searches
* Removed last_updated optimization (was also buggy after latest merge with master)
Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
Co-authored-by: Vladimir Kobal <vlad@prokk.net>
|
|
|
|
registered job (#12903)
|
|
|
|
|
|
* initial version of worker utilization
* working example
* without mutexes
* monitoring DBENGINE, ACLKSYNC, WEB workers
* added charts to monitor worker usage
* fixed charts units
* updated contexts
* updated priorities
* added documentation
* converted threads to stacked chart
* One query per query thread
* Revert "One query per query thread"
This reverts commit 6aeb391f5987c3c6ba2864b559fd7f0cd64b14d3.
* fixed priority for web charts
* read worker cpu utilization from proc
* read workers cpu utilization via /proc/self/task/PID/stat, so that we have cpu utilization even when the jobs are too long to finish within our update_every frequency
* disabled web server cpu utilization monitoring - it is now monitored by worker utilization
* tight integration of worker utilization to web server
* monitoring statsd worker threads
* code cleanup and renaming of variables
* contrained worker and statistics conflict to just one variable
* support for rendering jobs per type
* better priorities and removed the total jobs chart
* added busy time in ms per job type
* added proc.plugin monitoring, switch clock to MONOTONIC_RAW if available, global statistics now cleans up old worker threads
* isolated worker thread families
* added cgroups.plugin workers
* remove unneeded dimensions when then expected worker is just one
* plugins.d and streaming monitoring
* rebased; support worker_is_busy() to be called one after another
* added diskspace plugin monitoring
* added tc.plugin monitoring
* added ML threads monitoring
* dont create dimensions and charts that are not needed
* fix crash when job types are added on the fly
* added timex and idlejitter plugins; collected heartbeat statistics; reworked heartbeat according to the POSIX
* the right name is heartbeat for this chart
* monitor streaming senders
* added streaming senders to global stats
* prevent division by zero
* added clock_init() to external C plugins
* added freebsd and macos plugins
* added freebsd and macos to global statistics
* dont use new as a variable; address compiler warnings on FreeBSD and MacOS
* refactored contexts to be unique; added health threads monitoring
Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
|