Age | Commit message | Author |
|
No write lock required
|
|
check for host labels when linking alerts
|
|
* Move aclk_update_retention to the proper header file
* Do a scan but avoid going through all the dimensions if we have too much to delete -- do not generate a retention message in that case
* Schedule the retention calculation to a worker
* Adjust messages in the access log
* Fix compilation errors with --disable-cloud
|
|
command (#13040)
* Move retry count to the header file
* Add SQL_MAX_RETRY count and fix the netdata_exit check
|
|
created_at timestamp (#13035)
|
|
|
|
trigger queue removed on health log exchange with cloud
|
|
* wait until 2 minutes after the last chart is received before running the obsoletion check
* turn write to read locks
|
|
change (#12990)
* Only clear the RRDSET_FLAG_UPSTREAM_EXPOSED chart flag if metadata has changed
* Handle modification of units as well
* Initialize old_units in the chart state
|
|
* stream and advertise mc to the cloud
* better reporting
* remove log
* remove aclk debug
|
|
* faster rrdeng_load_metric_next()
* no need to check validity for number - already done at the query side
* solve discrepancy between query create and free
* inline unpack_storage_number
|
|
|
|
* cleanup and optimize rrdeng_load_metric_next()
* fixed typo
|
|
* replace connect_to_one_of with connect_to_one_of_destinations
* move functions from socket.c
* use sizeof
* move current destination pointer to host
* formatting
* use snprintfz
* get entries in same order
* handle single destination as before (or when it is the last of the list), instead of skipping it every other loop
* try other destinations on ssl problem
|
|
|
|
Defer payload check to the aclk sync thread
|
|
* Add a flag to "cache" the latest hidden status written in the database
* rrddim hide and unhide will check "cached" state, update the database if needed and set the cache flag accordingly
* Check the dimension option and only do the database update if the cached state is different
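The caching idea above can be sketched roughly as follows. This is a minimal illustration and not the agent's code: `struct dim`, `DIM_OPT_HIDDEN`, `DIM_FLAG_META_HIDDEN`, `store_hidden_state()` and `dim_set_hidden()` are all hypothetical names, and the database update is replaced by a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; this does not mirror the agent's structures. */
#define DIM_OPT_HIDDEN       (1u << 0)  /* hidden option as requested by the collector */
#define DIM_FLAG_META_HIDDEN (1u << 1)  /* "cached" copy of the state last written to the database */

struct dim {
    unsigned options;
    unsigned flags;
    int db_writes;  /* counts simulated database updates */
};

/* Stand-in for the real database update. */
static void store_hidden_state(struct dim *d, bool hidden) {
    (void)hidden;
    d->db_writes++;
}

/* Set or clear the hidden option; touch the database only when the
 * cached state differs from the requested one. */
static void dim_set_hidden(struct dim *d, bool hide) {
    assert(d != NULL);

    if (hide) d->options |= DIM_OPT_HIDDEN;
    else      d->options &= ~DIM_OPT_HIDDEN;

    bool cached = (d->flags & DIM_FLAG_META_HIDDEN) != 0;
    if (cached == hide)
        return;  /* the database already holds this state */

    store_hidden_state(d, hide);

    if (hide) d->flags |= DIM_FLAG_META_HIDDEN;
    else      d->flags &= ~DIM_FLAG_META_HIDDEN;
}
```

Calling `dim_set_hidden(&d, true)` twice in a row results in a single simulated write; the second call returns early because the cached flag already matches.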
|
|
* Mark a chart to be exposed only if dimension is created or metadata changes
* Add a liveness calculation for the dimension when it goes from collected to non-collected (live -> stale) and vice versa
* queue_dimension_to_aclk will have the rrdset and either 0 or the last collected time
If 0, it will be marked as live; otherwise it will be marked as stale and the last collected time will be sent to the cloud
* Add an extra parameter to indicate if the payload check should be done in the database or it has been done already
* Queue dimension sets dimension liveness and queues the exact payload to store in the database
* Fix compilation error when --disable-cloud is specified
|
|
|
|
* user configurable sqlite PRAGMAs
* added cache size
|
|
Add the correct requested chart sequence id from the cloud and also record the local one we have
|
|
Fix release channel in the node info message (was hardcoded)
|
|
use new capability fields
|
|
* pause and unpause alert pushes to the cloud
* move the check to when creating opcode
* check for worker
* remove previous checks for dbsync_workers. queue and clean aclk_alert tables even if no workers are up. Get wc then check before setting pause
* remove sync_syncronize
* remove sync_synchronize_2
|
|
|
|
|
|
* Remove error (no real value)
* Add a parameter to create an in-memory database for stress testing
* Add a new parameter to the stresstest command to set the number of desired libuv worker threads
|
|
|
|
* initial version of worker utilization
* working example
* without mutexes
* monitoring DBENGINE, ACLKSYNC, WEB workers
* added charts to monitor worker usage
* fixed charts units
* updated contexts
* updated priorities
* added documentation
* converted threads to stacked chart
* One query per query thread
* Revert "One query per query thread"
This reverts commit 6aeb391f5987c3c6ba2864b559fd7f0cd64b14d3.
* fixed priority for web charts
* read worker cpu utilization from proc
* read workers cpu utilization via /proc/self/task/PID/stat, so that we have cpu utilization even when the jobs are too long to finish within our update_every frequency
* disabled web server cpu utilization monitoring - it is now monitored by worker utilization
* tight integration of worker utilization to web server
* monitoring statsd worker threads
* code cleanup and renaming of variables
* constrained worker and statistics conflict to just one variable
* support for rendering jobs per type
* better priorities and removed the total jobs chart
* added busy time in ms per job type
* added proc.plugin monitoring, switch clock to MONOTONIC_RAW if available, global statistics now cleans up old worker threads
* isolated worker thread families
* added cgroups.plugin workers
* remove unneeded dimensions when only one worker is expected
* plugins.d and streaming monitoring
* rebased; support worker_is_busy() to be called one after another
* added diskspace plugin monitoring
* added tc.plugin monitoring
* added ML threads monitoring
* don't create dimensions and charts that are not needed
* fix crash when job types are added on the fly
* added timex and idlejitter plugins; collected heartbeat statistics; reworked heartbeat according to POSIX
* the right name is heartbeat for this chart
* monitor streaming senders
* added streaming senders to global stats
* prevent division by zero
* added clock_init() to external C plugins
* added freebsd and macos plugins
* added freebsd and macos to global statistics
* don't use new as a variable; address compiler warnings on FreeBSD and MacOS
* refactored contexts to be unique; added health threads monitoring
Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
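Reading per-thread CPU time from `/proc/self/task/PID/stat`, as mentioned in the list above, could look roughly like the sketch below. It is an assumption-laden illustration, not the code from this commit: `parse_task_stat()` is a hypothetical helper that takes the already-read contents of the stat file and extracts `utime` and `stime` (fields 14 and 15 of the Linux stat format, in clock ticks).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: extract utime and stime (fields 14 and 15, in clock
 * ticks) from the contents of a /proc/<pid>/task/<tid>/stat file.
 * Returns 0 on success, -1 if the line cannot be parsed. */
static int parse_task_stat(const char *line,
                           unsigned long long *utime,
                           unsigned long long *stime) {
    assert(line != NULL && utime != NULL && stime != NULL);

    /* Field 2 (comm) may contain spaces and parentheses, so continue
     * parsing after the LAST ')' in the line. */
    const char *p = strrchr(line, ')');
    if (p == NULL)
        return -1;
    p++;

    /* Field 3 is the state character; fields 4..13 are skipped:
     * five signed integers, then flags and four fault counters. */
    char state;
    int n = sscanf(p, " %c %*d %*d %*d %*d %*d %*u %*lu %*lu %*lu %*lu %llu %llu",
                   &state, utime, stime);

    return (n == 3) ? 0 : -1;
}
```

The tick counts can be converted to seconds by dividing by `sysconf(_SC_CLK_TCK)`. Sampling the file on every iteration and taking the difference gives utilization even when a job runs longer than one collection interval.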
|
|
- Variable "hostname" going out of scope leaks the storage it points to.
- Null-checking "rd->name" suggests that it may be null, but it has already been dereferenced on all paths leading to the check.
|
|
allocations (#12841)
* fix mismatches of the use of the z functions for allocations
* when there was no memory, the original name of the dimension was freed, and with a mismatching deallocator
* fixed memory leak at rrdeng_load_metric_*() functions
* fixed memory leak on exit of plugins.d parser
* fixed memory leak on plugins and streaming receiver threads exit
* fixed compiler warnings
|
|
|
|
Retry 3 times, to queue the page request before giving up
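A bounded retry of this kind might be sketched as below. The names `MAX_QUEUE_ATTEMPTS`, `enqueue_fn`, `enqueue_with_retry()` and `fail_n_times()` are hypothetical, and the sketch assumes "3 times" means three attempts in total, which the commit message does not spell out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The value 3 comes from the commit message; the name is hypothetical. */
#define MAX_QUEUE_ATTEMPTS 3

/* Callback that tries to queue one request; returns true on success. */
typedef bool (*enqueue_fn)(void *ctx);

/* Returns the number of attempts used on success,
 * or 0 if every attempt failed and the caller should give up. */
static int enqueue_with_retry(enqueue_fn try_enqueue, void *ctx) {
    assert(try_enqueue != NULL);

    for (int attempt = 1; attempt <= MAX_QUEUE_ATTEMPTS; attempt++) {
        if (try_enqueue(ctx))
            return attempt;
    }
    return 0;
}

/* Example callback: fails until the counter in ctx reaches zero. */
static bool fail_n_times(void *ctx) {
    int *remaining = ctx;
    if (*remaining > 0) {
        (*remaining)--;
        return false;
    }
    return true;
}
```

Returning the attempt count rather than a plain boolean lets the caller log how often the queue was full before the request went through.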
|
|
metrics ignored (#12829)
|
|
* Add chart filtering in the allmetrics API call
* Fix compilation warnings
* Remove unnecessary function
* Update the documentation
* Apply suggestions from code review
* Check for filter instead of filter_string
* Do not check both - chart id and name for prometheus and shell formats
* Fix unit tests
Co-authored-by: Ilya Mashchenko <ilya@netdata.cloud>
|
|
|
|
* inject removed events when missing from sqlite
* pass flag
* remove log message
|
|
* Optimize linking of foreach alarms to dimensions.
Keep the write-lock on host but use read-lock for charts because it's
easy to verify that they aren't modified by the linking of foreach
alarms to dimensions.
* Protect alarm log modifications with write-lock.
|
|
* Set the thread name for libuv threads to LIBUV_WORKER
* Make sure the dbengine thread has the correct name
|
|
* initial attempt at metric correlations
* fix loop
* simplify struct
* change json
* get points from query
* comment
* don't lock the host as much
* add a configuration option to enable/disable metric correlations
* remove KSfbar from header file
* lock charts
* add timeout
* cast multiplication
* add licensing info
* better licensing
* use onewayalloc
* destroy owa
|
|
|
|
* Try to queue dimension always when:
Trying to clean obsolete charts
If chart has been sent and liveness apparently changed
* delay rotation and skip chart check if not sent to cloud
* No need to CLEAR flag during database rotation
Do not clear chart ACLK status for dimension requests
* Change payload_sent to return timestamp of submitted message
* Clear the dimension ACLK flag if we are processing all the charts again
* Check if dimension is already queued to ACLK and ignore it
If queue fails then reset it to retry
Already try to queue the dimension
* Improve dimension cleanup during the retention message calculation
* Change queue_dimension_to_aclk to return void
* If no time range for this dimension then assume it is deleted
* Start streaming for inactive nodes
* Remove dead code
* Correctly report hostname in the access log
* Schedule a dimension deletion without trying to submit a message immediately
* Enable dimension cleanup -- also delete dimension if not found in the dbengine files
Free hostname
|
|
After https://github.com/netdata/netdata/pull/12209 per-chart configuration
was used for (a) enabling/disabling a chart, and (b) renaming dimensions.
Regarding the first use case: We already have component-specific
configuration options/flags to finely control how a chart should behave.
E.g. "send charts matching" in streaming, "charts to skip from training"
in ML, etc. If we really need the concept of a disabled chart, we can
add a host-level simple pattern to match these charts.
Regarding the second use case: It's not obvious why we'd need to provide
support for remapping dimension names through a chart-specific configuration
from the core agent. If the need arises, we could add such support at
the right place, i.e. an exporter/streaming config section.
This will allow each flag to act independently of the others and
avoid managing flag-state manually at various places, e.g.:
```
if(unlikely(!rrdset_flag_check(st, RRDSET_FLAG_ENABLED))) {
    rrdset_flag_clear(st, RRDSET_FLAG_UPSTREAM_SEND);
    rrdset_flag_set(st, RRDSET_FLAG_UPSTREAM_IGNORE);
} ...
```
|
|
|
|
* rrd: move API structures out of rrddim_volatile
In C, unlike C++, it's not possible to reference a nested structure
from outside this structure.
Since we later want to use rrddim_query_ops and rrddim_collect_ops
separately from rrddim_volatile, move these nested structures out.
* rrd: use opaque handle types for different memory modes
|
|
* check for chart obsoletion on children connections
* use rrdset_is_obsolete
|
|
* one way allocator to speed up context queries
* fixed a bug while expanding memory pages
* reworked for clarity and finally fixed the bug of allocating memory beyond the page size
* further optimize allocation step to minimize the number of allocations made
* implement strdup with memcpy instead of strcpy
* added documentation
* prevent an uninitialized use of owa
* added callocz() interface
* integrate onewayalloc everywhere - apart from sql queries
* one way allocator is now used in context queries using archived charts in sql
* align on the size of pointers
* forgotten freez()
* removed unneeded memcpys
* give unique names to global variables to avoid conflicts with system definitions
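The general shape of a one-way (arena) allocator, as described in the list above, can be sketched as follows. This is only an illustration under stated assumptions: the `arena_*` names are hypothetical and do not mirror the agent's onewayalloc interface. It shows pointer-size alignment, a dedicated page for requests larger than the page size, a `memcpy`-based strdup, and a single destroy call that frees everything at once.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal arena; not the agent's onewayalloc API. */
typedef struct arena_page {
    struct arena_page *next;
    size_t size;  /* usable payload bytes following this header */
    size_t used;  /* payload bytes already handed out */
} arena_page;

typedef struct {
    arena_page *head;   /* most recently added page */
    size_t page_size;   /* default payload size for new pages */
} arena;

/* Round n up to a multiple of the pointer size. */
static size_t align_up(size_t n) {
    size_t a = sizeof(void *);
    return (n + a - 1) / a * a;
}

static void *arena_alloc(arena *a, size_t n) {
    assert(a != NULL);
    n = align_up(n ? n : 1);

    arena_page *p = a->head;
    if (p == NULL || p->size - p->used < n) {
        /* Never hand out memory beyond the page: oversized requests
         * get a page large enough to hold them. */
        size_t payload = (n > a->page_size) ? n : a->page_size;
        arena_page *np = malloc(align_up(sizeof(*np)) + payload);
        if (np == NULL)
            return NULL;
        np->size = payload;
        np->used = 0;
        np->next = a->head;
        a->head = np;
        p = np;
    }

    void *ptr = (unsigned char *)p + align_up(sizeof(*p)) + p->used;
    p->used += n;
    return ptr;
}

/* strdup implemented with memcpy: the length is computed once. */
static char *arena_strdup(arena *a, const char *s) {
    size_t len = strlen(s) + 1;
    char *copy = arena_alloc(a, len);
    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}

/* There is no per-allocation free; everything is released together. */
static void arena_destroy(arena *a) {
    arena_page *p = a->head;
    while (p != NULL) {
        arena_page *next = p->next;
        free(p);
        p = next;
    }
    a->head = NULL;
}
```

A query would create one arena, allocate all its temporary objects from it, and call `arena_destroy()` when done, which is what makes this pattern cheap for short-lived work such as context queries.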
|
|
* filter
* update filter
* queue removed directly
* more
* logging
* cleanup
* cleanup 2
* cleanup 3
* finalize instead of reset
|
|
If memory mode is save, map or ram the set's flags are initialized to 0.
Otherwise, the set is calloc'd which will make the set have 0 flags.
|
|
They are used extensively throughout our code base, and without
support for them the agent is not thread-safe.
|