Age | Commit message (Collapse) | Author |
|
* use a simple pattern to choose what alerts to load
* rename config to alarms
* add config entry doc
* fix link
* change wording
* change wording 2
* add to analytics
* build a list of disabled alarms for analytics
* add silenced alarms
* remove analytics related
* remove analytics related 2
|
|
Closes https://github.com/netdata/netdata/issues/14152
|
|
|
|
|
|
(#14073)
|
|
Fixes https://github.com/netdata/netdata/issues/14084
|
|
* Sanitize bash arguments.
Remove leading dashes and escape single quotes in command arguments.
* Quote expanded variable in test
|
|
* move global statistics workers to a separate thread; query statistics per query source; query statistics for ML, exporters, backfilling; reset replication point in time every 10 seconds, instead of every 1; fix compilation warnings; optimize the replication queries code; prevent long tail of replication requests (big sleeps); provide query statistics about replication ; optimize replication sender when most senders are full; optimize replication_request_get_first_available(); reset replication completion calculation;
* remove workers utilization from global statistics thread
|
|
change image labels to app.netdata.cloud
|
|
* streaming compression, query planner and replication fixes
* remove journal v2 stats from global statistics
* disable sql for checking past sql UUIDs
* single threaded replication
* final replication thread using dictionaries and JudyL for sorting the pending requests
* do not timeout the sending socket when there are pending replication requests
* streaming receiver using read() instead of fread()
* remove FILE * from streaming - now using posix read() and write()
* increase timeouts to 10 minutes
* apply sender timeout only when there are metrics that are supposed to be streamed
* error handling in replication
* remove retries on socket read timeout; better error messages
* take into account inbound traffic too to detect that a connection is stale
* remove race conditions from replication thread
* make sure deleted entries are marked as executed, so that even if deletion fails, they will not be executed
* 2 minutes timeout to retry streaming to a parent that already has this node
* remove unecessary condition check
* fix compilation warnings
* include judy in replication
* wrappers to handle retries for SSL_read and SSL_write
* compressed bytes read monitoring
* recursive locks on replication to make it faster during flush or cleanup
* replication completion chart at the receiver side
* simplified recursive mutex
* simplified recursive mutex again
|
|
* remove health_thread_stop
* soft sleeps
|
|
|
|
|
|
|
|
|
|
Co-authored-by: Shyam Sreevalsan <shyam@netdata.cloud>
|
|
|
|
|
|
* Rebased
* rebased
* health_execute_pending_updates -> health_execute_delayed_initializations
* fix labels for current host only
* missing bracket
* misc fixes, reload health for disconnected hosts
* remove volatile, add comment
|
|
|
|
|
|
(#13784)
* add host labels for ephemerality and nodes with unstable connections
|
|
|
|
also enable rrdvars from health
|
|
* init st->rrdvars from rrdset callback
* also init rrdfamily and rrdimvar from rrdset
* remove empty line
|
|
By default functions are declared as extern in C/C++ headers. The goal
of this PR is to reduce the wall of text that many headers have and,
more importantly, to make the declaration of extern'd variables - of
which we have many dispersed in various places - easily and quickly
identifiable.
Automatically generated with:
$ git grep -l '^extern.*(' '**.h' | \
grep -v libjudy | \
grep -v 'sqlite3.h' | \
xargs sed -i -e 's/extern \(.*(.*$\)/\1/'
This is a NFC.
|
|
about specific charts (#13720)
* function renames and code cleanup in popen.c; no actual code changes
* netdata popen() now opens both child process stdin and stdout and returns FILE * for both
* pass both input and output to parser structures
* updated rrdset to call custom functions
* RRDSET FUNCTION leading calls for both sync and async operation
* put RRDSET functions to a separate file
* added format and timeout at function definition
* support for synchronous (internal plugins) and asynchronous (external plugins and children) functions
* /api/v1/function endpoint
* functions are now attached to the host and there is a dictionary view per chart
* functions implemented at plugins.d
* remove the defer until keyword hook from plugins.d when it is done
* stream sender implementation of functions
* sanitization of all functions so that certain characters are only allowed
* strictier sanitization
* common max size
* 1st working plugins.d example
* always init inflight dictionary
* properly destroy dictionaries to avoid parallel insertion of items
* add more debugging on disconnection reasons
* add more debugging on disconnection reasons again
* streaming receiver respects newlines
* dont use the same fp for both streaming receive and send
* dont free dbengine memory with internal checks
* make sender proceed in the buffer
* added timing info and garbage collection at plugins.d
* added info about routing nodes
* added info about routing nodes with delay
* added more info about delays
* added more info about delays again
* signal sending thread to wake up
* streaming version labeling and commented code to support capabilities
* added functions to /api/v1/data, /api/v1/charts, /api/v1/chart, /api/v1/info
* redirect top output to stdout
* address coverity findings
* fix resource leaks of popen
* log attempts to connect to individual destinations
* better messages
* properly parse destinations
* try to find a function from the most matching to the least matching
* log added streaming destinations
* rotate destinations bypassing a node in the middle that does not accept our connection
* break the loops properly
* use typedef to define callbacks
* capabilities negotiation during streaming
* functions exposed upstream based on capabilities; compression disabled per node persisting reconnects; always try to connect with all capabilities
* restore functionality to lookup functions
* better logging of capabilities
* remove old versions from capabilities when a newer version is there
* fix formatting
* optimization for plugins.d rrdlabels to avoid creating and destructing dictionaries all the time
* delayed health initialization for rrddim and rrdset
* cleanup health initialization
* fix for popen() not returning the right value
* add health worker jobs for initializing rrdset and rrddim
* added content type support for functions; apps.plugin permanent function to display all the processes
* fixes for functions parameters parsing in apps.plugin
* fix for process matching in apps.plugiin
* first working function for apps.plugin
* Dashboard ACL is disabled for functions; Function errors are all in JSON format
* apps.plugin function processes returns json table
* use json_escape_string() to escape message
* fix formatting
* apps.plugin exposes all its metrics to function processes
* fix json formatting when filtering out some rows
* reopen the internal pipe of rrdpush in case of errors
* misplaced statement
* do not use buffer->len
* support for GLOBAL functions (functions that are not linked to a chart
* added /api/v1/functions endpoint; removed format from the FUNCTIONS api;
* swagger documentation about the new api end points
* added plugins.d documentation about functions
* never re-close a file
* remove uncessesary ifdef
* fixed issues identified by codacy
* fix for null label value
* make edit-config copy-and-paste friendly
* Revert "make edit-config copy-and-paste friendly"
This reverts commit 54500c0e0a97f65a0c66c4d34e966f6a9056698e.
* reworked sender handshake to fix coverity findings
* timeout is zero, for both send_timeout() and recv_timeout()
* properly detect that parent closed the socket
* support caching of function responses; limit function response to 10MB; added protection from malformed function responses
* disabled excessive logging
* added units to apps.plugin function processes and normalized all values to be human readable
* shorter field names
* fixed issues reported
* fixed apps.plugin error response; tested that pluginsd can properly handle faulty responses
* use double linked list macros for double linked list management
* faster apps.plugin function printing by minimizing file operations
* added memory percentage
* fix compatibility issues with older compilers and FreeBSD
* rrdpush sender code cleanup; rrhost structure cleanup from sender flags and variables;
* fix letftover variable in ifdef
* apps.plugin: do not call detach from the thread; exit immediately when input is broken
* exclude AR charts from health
* flush cleaner; prefer sender output
* clarity
* do not fill the cbuffer if not connected
* fix
* dont enabled host->sender if streaming is not enabled; send host label updates to parent;
* functions are only available through ACLK
* Prepared statement reports only in dev mode
* fix AR chart detection
* fix for streaming not being enabling itself
* more cleanup of sender and receiver structures
* moved read-only flags and configuration options to rrdhost->options
* fixed merge with master
* fix for incomplete rename
* prevent service thread from working on charts that are being collected
Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
|
|
|
|
add node level anomaly rate based example
|
|
|
|
|
|
|
|
This renames the alert classes from Error to Errors, to align with the
125 occurances of the latter, vs. these three of the former.
|
|
|
|
* rrdset - in progress
* rrdset optimal constructor; rrdset conflict
* rrdset final touches
* re-organization of rrdset object members
* prevent use-after-free
* dictionary dfe supports also counting of iterations
* rrddim managed by dictionary
* rrd.h cleanup
* DICTIONARY_ITEM now is referencing actual dictionary items in the code
* removed rrdset linked list
* Revert "removed rrdset linked list"
This reverts commit 690d6a588b4b99619c2c5e10f84e8f868ae6def5.
* removed rrdset linked list
* added comments
* Switch chart uuid to static allocation in rrdset
Remove unused functions
* rrdset_archive() and friends...
* always create rrdfamily
* enable ml_free_dimension
* rrddim_foreach done with dfe
* most custom rrddim loops replaced with rrddim_foreach
* removed accesses to rrddim->dimensions
* removed locks that are no longer needed
* rrdsetvar is now managed by the dictionary
* set rrdset is rrdsetvar, fixes https://github.com/netdata/netdata/pull/13646#issuecomment-1242574853
* conflict callback of rrdsetvar now properly checks if it has to reset the variable
* dictionary registered callbacks accept as first parameter the DICTIONARY_ITEM
* dictionary dfe now uses internal counter to report; avoided excess variables defined with dfe
* dictionary walkthrough callbacks get dictionary acquired items
* dictionary reference counters that can be dupped from zero
* added advanced functions for get and del
* rrdvar managed by dictionaries
* thread safety for rrdsetvar
* faster rrdvar initialization
* rrdvar string lengths should match in all add, del, get functions
* rrdvar internals hidden from the rest of the world
* rrdvar is now acquired throughout netdata
* hide the internal structures of rrdsetvar
* rrdsetvar is now acquired through out netdata
* rrddimvar managed by dictionary; rrddimvar linked list removed; rrddimvar structures hidden from the rest of netdata
* better error handling
* dont create variables if not initialized for health
* dont create variables if not initialized for health again
* rrdfamily is now managed by dictionaries; references of it are acquired dictionary items
* type checking on acquired objects
* rrdcalc renaming of functions
* type checking for rrdfamily_acquired
* rrdcalc managed by dictionaries
* rrdcalc double free fix
* host rrdvars is always needed
* attempt to fix deadlock 1
* attempt to fix deadlock 2
* Remove unused variable
* attempt to fix deadlock 3
* snprintfz
* rrdcalc index in rrdset fix
* Stop storing active charts and computing chart hashes
* Remove store active chart function
* Remove compute chart hash function
* Remove sql_store_chart_hash function
* Remove store_active_dimension function
* dictionary delayed destruction
* formatting and cleanup
* zero dictionary base on rrdsetvar
* added internal error to log delayed destructions of dictionaries
* typo in rrddimvar
* added debugging info to dictionary
* debug info
* fix for rrdcalc keys being empty
* remove forgotten unlock
* remove deadlock
* Switch to metadata version 5 and drop
chart_hash
chart_hash_map
chart_active
dimension_active
v_chart_hash
* SQL cosmetic changes
* do not busy wait while destroying a referenced dictionary
* remove deadlock
* code cleanup; re-organization;
* fast cleanup and flushing of dictionaries
* number formatting fixes
* do not delete configured alerts when archiving a chart
* rrddim obsolete linked list management outside dictionaries
* removed duplicate contexts call
* fix crash when rrdfamily is not initialized
* dont keep rrddimvar referenced
* properly cleanup rrdvar
* removed some locks
* Do not attempt to cleanup chart_hash / chart_hash_map
* rrdcalctemplate managed by dictionary
* register callbacks on the right dictionary
* removed some more locks
* rrdcalc secondary index replaced with linked-list; rrdcalc labels updates are now executed by health thread
* when looking up for an alarm look using both chart id and chart name
* host initialization a bit more modular
* init rrdlabels on host update
* preparation for dictionary views
* improved comment
* unused variables without internal checks
* service threads isolation and worker info
* more worker info in service thread
* thread cancelability debugging with internal checks
* strings data races addressed; fixes https://github.com/netdata/netdata/issues/13647
* dictionary modularization
* Remove unused SQL statement definition
* unit-tested thread safety of dictionaries; removed data race conditions on dictionaries and strings; dictionaries now can detect if the caller is holds a write lock and automatically all the calls become their unsafe versions; all direct calls to unsafe version is eliminated
* remove worker_is_idle() from the exit of service functions, because we lose the lock time between loops
* rewritten dictionary to have 2 separate locks, one for indexing and another for traversal
* Update collectors/cgroups.plugin/sys_fs_cgroup.c
Co-authored-by: Vladimir Kobal <vlad@prokk.net>
* Update collectors/cgroups.plugin/sys_fs_cgroup.c
Co-authored-by: Vladimir Kobal <vlad@prokk.net>
* Update collectors/proc.plugin/proc_net_dev.c
Co-authored-by: Vladimir Kobal <vlad@prokk.net>
* fix memory leak in rrdset cache_dir
* minor dictionary changes
* dont use index locks in single threaded
* obsolete dict option
* rrddim options and flags separation; rrdset_done() optimization to keep array of reference pointers to rrddim;
* fix jump on uninitialized value in dictionary; remove double free of cache_dir
* addressed codacy findings
* removed debugging code
* use the private refcount on dictionaries
* make dictionary item desctructors work on dictionary destruction; strictier control on dictionary API; proper cleanup sequence on rrddim;
* more dictionary statistics
* global statistics about dictionary operations, memory, items, callbacks
* dictionary support for views - missing the public API
* removed warning about unused parameter
* chart and context name for cloud
* chart and context name for cloud, again
* dictionary statistics fixed; first implementation of dictionary views - not currently used
* only the master can globally delete an item
* context needs netdata prefix
* fix context and chart it of spins
* fix for host variables when health is not enabled
* run garbage collector on item insert too
* Fix info message; remove extra "using"
* update dict unittest for new placement of garbage collector
* we need RRDHOST->rrdvars for maintaining custom host variables
* Health initialization needs the host->host_uuid
* split STRING to its own files; no code changes other than that
* initialize health unconditionally
* unit tests do not pollute the global scope with their variables
* Skip initialization when creating archived hosts on startup. When a child connects it will initialize properly
Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
Co-authored-by: Vladimir Kobal <vlad@prokk.net>
|
|
|
|
|
|
* move chart_labels to rrdset
* rename chart_labels to rrdlabels
* renamed hash_id to uuid
* turned is_ar_chart into an rrdset flag
* removed rrdset state
* removed unused senders_connected member of rrdhost
* removed unused host flag RRDHOST_FLAG_MULTIHOST
* renamed rrdhost host_labels to rrdlabels
* Update exporting unit tests
Co-authored-by: Vladimir Kobal <vlad@prokk.net>
|
|
* rrdfamily
* rrddim
* rrdset plugin and module names
* rrdset units
* rrdset type
* rrdset family
* rrdset title
* rrdset title more
* rrdset context
* rrdcalctemplate context and removal of context hash from rrdset
* strings statistics
* rrdset name
* rearranged members of rrdset
* eliminate rrdset name hash; rrdcalc chart converted to STRING
* rrdset id, eliminated rrdset hash
* rrdcalc, alarm_entry, alert_config and some of rrdcalctemplate
* rrdcalctemplate
* rrdvar
* eval_variable
* rrddimvar and rrdsetvar
* rrdhost hostname, os and tags
* fix master commits
* added thread cache; implemented string_dup without locks
* faster thread cache
* rrdset and rrddim now use dictionaries for indexing
* rrdhost now uses dictionary
* rrdfamily now uses DICTIONARY
* rrdvar using dictionary instead of AVL
* allocate the right size to rrdvar flag members
* rrdhost remaining char * members to STRING *
* better error handling on indexing
* strings now use a read/write lock to allow parallel searches to the index
* removed AVL support from dictionaries; implemented STRING with native Judy calls
* string releases should be negative
* only 31 bits are allowed for enum flags
* proper locking on strings
* string threading unittest and fixes
* fix lgtm finding
* fixed naming
* stream chart/dimension definitions at the beginning of a streaming session
* thread stack variable is undefined on thread cancel
* rrdcontext garbage collect per host on startup
* worker control in garbage collection
* relaxed deletion of rrdmetrics
* type checking on dictfe
* netdata chart to monitor rrdcontext triggers
* Group chart label updates
* rrdcontext better handling of collected rrdsets
* rrdpush incremental transmition of definitions should use as much buffer as possible
* require 1MB per chart
* empty the sender buffer before enabling metrics streaming
* fill up to 50% of buffer
* reset signaling metrics sending
* use the shared variable for status
* use separate host flag for enabling streaming of metrics
* make sure the flag is clear
* add logging for streaming
* add logging for streaming on buffer overflow
* circular_buffer proper sizing
* removed obsolete logs
* do not execute worker jobs if not necessary
* better messages about compression disabling
* proper use of flags and updating rrdset last access time every time the obsoletion flag is flipped
* monitor stream sender used buffer ratio
* Update exporting unit tests
* no need to compare label value with strcmp
* streaming send workers now monitor bandwidth
* workers now use strings
* streaming receiver monitors incoming bandwidth
* parser shift of worker ids
* minor fixes
* Group chart label updates
* Populate context with dimensions that have data
* Fix chart id
* better shift of parser worker ids
* fix for streaming compression
* properly count received bytes
* ensure LZ4 compression ring buffer does not wrap prematurely
* do not stream empty charts; do not process empty instances in rrdcontext
* need_to_send_chart_definition() does not need an rrdset lock any more
* rrdcontext objects are collected, after data have been written to the db
* better logging of RRDCONTEXT transitions
* always set all variables needed by the worker utilization charts
* implemented double linked list for most objects; eliminated alarm indexes from rrdhost; and many more fixes
* lockless strings design - string_dup() and string_freez() are totally lockless when they dont need to touch Judy - only Judy is protected with a read/write lock
* STRING code re-organization for clarity
* thread_cache improvements; double numbers precision on worker threads
* STRING_ENTRY now shadown STRING, so no duplicate definition is required; string_length() renamed to string_strlen() to follow the paradigm of all other functions, STRING internal statistics are now only compiled with NETDATA_INTERNAL_CHECKS
* rrdhost index by hostname now cleans up; aclk queries of archieved hosts do not index hosts
* Add index to speed up database context searches
* Removed last_updated optimization (was also buggy after latest merge with master)
Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
Co-authored-by: Vladimir Kobal <vlad@prokk.net>
|
|
|
|
make time in warn/crit predictable (#13563)
|
|
|
|
Co-authored-by: ilyam8 <ilya@netdata.cloud>
|
|
|
|
* chart labels for alerts
* proper termination
* use strchr
* change if statement
* change label variable. add docs
* change doc
* assign buf to temp
* use new dictionary functions
* reduce variable scope
* reduce line length
* make sure rrdcalc updates labels after inserted
* reduce var scope
* add rrdcalc.c for cmocka tests
* Revert "add rrdcalc.c for cmocka tests"
This reverts commit 5fe122adcf7abcbe6d67fa2ebd7c4ff8620cf9c8.
* Fix cmocka unit tests
* valgrind errors
Co-authored-by: Vladimir Kobal <vlad@prokk.net>
|
|
do the hash after rrdvar_fix_name
|
|
|
|
* add chart context to alert events
* migrate health log tables to add chart_context
* send it via proto message
* add from v3 to v4
* free table
* free chart_context
|
|
* /api/v1/weights endpoints
* high resolution anomaly rate in parallel with queries; points and options in /api/v1/weights reflect the truth
* context printing
* merged metric_correlations with weights API; added parameter tier to select the tier to run the query; weight api now returns points per tier; added swagger info about weights api
* moved metric_correlations files to web/api/queries as weights
* added contexts filtering; renamed correlated_dimensions; weights API is always enabled; code cleanup
* allow returning zero results
|
|
* show last 15 alerts in notification
* change to alerts
* fix shellcheck
|