path: root/database
Age Commit message Author
2022-06-03Fix locking access to chart labels (#13064)Stelios Fragkakis
No write lock required
2022-06-02Check for host labels when linking alerts for children (#13053)Emmanuel Vasilakis
check for host labels when linking alerts
2022-06-01Schedule retention message calculation to a worker thread (#13039)Stelios Fragkakis
* Move aclk_update_retention to the proper header file * Do a scan but avoid going through all the dimensions if we have too much to delete -- do not generate a retention message in that case * Schedule the retention calculation to a worker * Adjust messages in the access log * Fix compilation errors with --disable-cloud
2022-05-31Fix the retry count and netdata_exit check when running an sqlite3_step command (#13040)Stelios Fragkakis
* Move retry count to the header file * Add SQL_MAX_RETRY count and fix the netdata_exit check
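The retry-and-exit-check pattern named in this commit can be sketched roughly as follows. This is a minimal illustration, not the project's code: the value of `SQL_MAX_RETRY`, the `exit_requested` flag (standing in for `netdata_exit`), and the `step_fn` callback (standing in for `sqlite3_step()`) are all assumptions made for the example.

```c
#include <stdbool.h>
#include <stddef.h>

#define SQL_MAX_RETRY 100  /* assumed value; the log only names the constant */
#define STEP_BUSY 5        /* same numeric value as SQLITE_BUSY */
#define STEP_DONE 101      /* same numeric value as SQLITE_DONE */

/* Hypothetical shutdown flag, standing in for netdata_exit. */
static volatile bool exit_requested = false;

/* Hypothetical step callback, standing in for sqlite3_step(). */
typedef int (*step_fn)(void *stmt);

/* Call step() while it reports "busy", for at most SQL_MAX_RETRY attempts,
 * and stop as soon as shutdown has been requested.
 * Returns the last status seen (STEP_BUSY if no attempt was made). */
static int step_with_retry(step_fn step, void *stmt)
{
    int rc = STEP_BUSY;
    for (int attempt = 0; attempt < SQL_MAX_RETRY && !exit_requested; attempt++) {
        rc = step(stmt);
        if (rc != STEP_BUSY)
            break;
        /* a real implementation would likely sleep briefly before retrying */
    }
    return rc;
}

/* Illustrative step function: reports busy until the counter reaches zero. */
static int busy_n_times(void *stmt)
{
    int *remaining = (int *)stmt;
    if (*remaining > 0) {
        (*remaining)--;
        return STEP_BUSY;
    }
    return STEP_DONE;
}
```

Keeping the limit in one shared constant means every caller gives up after the same number of attempts, and checking the shutdown flag in the loop condition keeps a busy database from delaying exit.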
2022-05-31When sending a dimension for the first time, make sure there is a non zero created_at timestamp (#13035)Stelios Fragkakis
2022-05-30Check return value and log an error on failure (#13037)Stelios Fragkakis
2022-05-30Trigger queue removed alerts on health log exchange with cloud (#12954)Emmanuel Vasilakis
trigger queue removed on health log exchange with cloud
2022-05-25Delay children chart obsoletion check (#12992)Emmanuel Vasilakis
* wait until 2 minutes after the last chart was received to run obsoletion check * turn write to read locks
2022-05-24Don't expose the chart definition to streaming if there is no metadata change (#12990)Stelios Fragkakis
* Only clear the RRDSET_FLAG_UPSTREAM_EXPOSED chart flag if metadata has changed * Handle modification of units as well * Initialize old_units in the chart state
2022-05-24Stream and advertise metric correlations to the cloud (#12940)Emmanuel Vasilakis
* stream and advertise mc to the cloud * better reporting * remove log * remove aclk debug
2022-05-24Faster queries (#12988)Costa Tsaousis
* faster rrdeng_load_metric_next() * no need to check validity for number - already done at the query side * solve discrepancy between query create and free * inline unpack_storage_number
2022-05-23modify code to resolve compile warning issue (#12969)kklionz
2022-05-20cleanup and optimize rrdeng_load_metric_next() (#12966)Costa Tsaousis
* cleanup and optimize rrdeng_load_metric_next() * fixed typo
2022-05-20Apply some logic to possible streaming destinations (#12866)Emmanuel Vasilakis
* replace connect_to_one_of with connect_to_one_of_destinations * move functions from socket.c * use sizeof * move current destination pointer to host * formatting * use snprintfz * get entries in same order * handle single destination as before (or when it is the last of the list), instead of skipping it every other loop * try other destinations on ssl problem
2022-05-19Cleanup chart hash and map tables on startup (#12956)Stelios Fragkakis
2022-05-18Defer the dimension payload check to the ACLK sync thread (#12951)Stelios Fragkakis
Defer payload check to the aclk sync thread
2022-05-18Optimize the dimensions option store to the metadata database (#12952)Stelios Fragkakis
* Add a flag to "cache" the latest hidden status written in the database * rrddim hide and unhide will check "cached" state, update the database if needed and set the cache flag accordingly * Check the dimension option and only do the database update if the cached state is different
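The caching idea described in this commit can be illustrated with a small sketch. The flag names, the `struct dimension` layout, and the `store_hidden_option()` stand-in for the metadata database update are hypothetical; only the approach (remember the last state written and skip the update when nothing changed) comes from the commit message.

```c
#include <stdbool.h>

/* Hypothetical flag bits; the real flag names are not given in the log. */
#define DIM_OPT_HIDDEN       (1u << 0)  /* current in-memory hidden state */
#define DIM_FLAG_META_HIDDEN (1u << 1)  /* last hidden state written to the database */

struct dimension {
    unsigned flags;
    int db_writes;  /* counts simulated database updates */
};

/* Stand-in for the metadata database update. */
static void store_hidden_option(struct dimension *d, bool hidden)
{
    (void)hidden;
    d->db_writes++;
}

/* Set or clear the hidden option, touching the database only when the
 * cached "last written" state differs from the requested one. */
static void dimension_set_hidden(struct dimension *d, bool hidden)
{
    if (hidden) d->flags |= DIM_OPT_HIDDEN;
    else        d->flags &= ~DIM_OPT_HIDDEN;

    bool cached = (d->flags & DIM_FLAG_META_HIDDEN) != 0;
    if (cached == hidden)
        return;  /* the database already holds this state */

    store_hidden_option(d, hidden);
    if (hidden) d->flags |= DIM_FLAG_META_HIDDEN;
    else        d->flags &= ~DIM_FLAG_META_HIDDEN;
}
```

Repeated hide or unhide calls with an unchanged state then cost one flag comparison instead of a database write.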
2022-05-17Adjust the dimension liveness status check (#12933)Stelios Fragkakis
* Mark a chart to be exposed only if dimension is created or metadata changes * Add a liveness calculation for the dimension when it goes from collected to non collected (live -> stale) and vice versa * queue_dimension_to_aclk will have the rrdset and either 0 or last collected time. If 0, it will be marked as live; otherwise it will be marked as stale and the last collected time will be sent to the cloud * Add an extra parameter to indicate if the payload check should be done in the database or it has been done already * Queue dimension sets dimension liveness and queues the exact payload to store in the database * Fix compilation error when --disable-cloud is specified
2022-05-16chore: add links to SQLite init options in the src code (#12920)Ilya Mashchenko
2022-05-16user configurable sqlite PRAGMAs (#12917)Costa Tsaousis
* user configurable sqlite PRAGMAs * added cache size
2022-05-16Fix the log entry for incoming cloud start streaming commands (#12908)Stelios Fragkakis
Add the correct requested chart sequence id from the cloud and also record the local one we have
2022-05-14Fix release channel in the node info message (#12905)Stelios Fragkakis
Fix release channel in the node info message (was hardcoded)
2022-05-13Implements new capability fields in aclk_schemas (#12602)Timotej S
use new capability fields
2022-05-12Pause alert pushes to the cloud (#12852)Emmanuel Vasilakis
* pause and unpause alert pushes to the cloud * move the check to when creating opcode * check for worker * remove previous checks for dbsync_workers. queue and clean aclk_alert tables even if no workers are up. Get wc then check before setting pause * remove sync_syncronize * remove sync_synchronize_2
2022-05-12Fix compilation warnings (#12886)Vladimir Kobal
2022-05-11Configurable storage engine for Netdata agents: step 2 (#12808)Adrien Béraud
2022-05-10Initialize the metadata database when performing dbengine stress test (#12861)Stelios Fragkakis
* Remove error (no real value) * Add a parameter to create an in-memory database for stress testing * Add a new parameter to the stresstest command to set the number of desired libuv worker threads
2022-05-09Add a database checkpoint command (#12859)Stelios Fragkakis
2022-05-09Workers utilization charts (#12807)Costa Tsaousis
* initial version of worker utilization * working example * without mutexes * monitoring DBENGINE, ACLKSYNC, WEB workers * added charts to monitor worker usage * fixed charts units * updated contexts * updated priorities * added documentation * converted threads to stacked chart * One query per query thread * Revert "One query per query thread" This reverts commit 6aeb391f5987c3c6ba2864b559fd7f0cd64b14d3. * fixed priority for web charts * read worker cpu utilization from proc * read workers cpu utilization via /proc/self/task/PID/stat, so that we have cpu utilization even when the jobs are too long to finish within our update_every frequency * disabled web server cpu utilization monitoring - it is now monitored by worker utilization * tight integration of worker utilization to web server * monitoring statsd worker threads * code cleanup and renaming of variables * constrained worker and statistics conflict to just one variable * support for rendering jobs per type * better priorities and removed the total jobs chart * added busy time in ms per job type * added proc.plugin monitoring, switch clock to MONOTONIC_RAW if available, global statistics now cleans up old worker threads * isolated worker thread families * added cgroups.plugin workers * remove unneeded dimensions when the expected worker is just one * plugins.d and streaming monitoring * rebased; support worker_is_busy() to be called one after another * added diskspace plugin monitoring * added tc.plugin monitoring * added ML threads monitoring * dont create dimensions and charts that are not needed * fix crash when job types are added on the fly * added timex and idlejitter plugins; collected heartbeat statistics; reworked heartbeat according to the POSIX * the right name is heartbeat for this chart * monitor streaming senders * added streaming senders to global stats * prevent division by zero * added clock_init() to external C plugins * added freebsd and macos plugins * added freebsd and macos to global statistics * dont use new as a variable; address compiler warnings on FreeBSD and MacOS * refactored contexts to be unique; added health threads monitoring Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
2022-05-09Resolve coverity issues (#12846)Stelios Fragkakis
- Variable "hostname" going out of scope leaks the storage it points to. - Null-checking "rd->name" suggests that it may be null, but it has already been dereferenced on all paths leading to the check.
2022-05-07fix memory leaks and mismatches of the use of the z functions for allocations (#12841)Costa Tsaousis
* fix mismatches of the use of the z functions for allocations * when there was no memory; the original name of the dimensions was freed, and with mismatching deallocator.. * fixed memory leak at rrdeng_load_metric_*() functions * fixed memory leak on exit of plugins.d parser * fixed memory leak on plugins and streaming receiver threads exit * fixed compiler warnings
2022-05-07speedup queries by providing optimization in the main loop (#12811)Costa Tsaousis
2022-05-06Set a page wait timeout to 1 second (#12836)Stelios Fragkakis
Retry 3 times, to queue the page request before giving up
2022-05-05Reduce the number of messages logged to one that sums up the number of metrics ignored (#12829)Stelios Fragkakis
2022-05-05Add chart filtering parameter to the allmetrics API query (#12820)Vladimir Kobal
* Add chart filtering in the allmetrics API call * Fix compilation warnings * Remove unnecessary function * Update the documentation * Apply suggestions from code review * Check for filter instead of filter_string * Do not check both - chart id and name for prometheus and shell formats * Fix unit tests Co-authored-by: Ilya Mashchenko <ilya@netdata.cloud>
2022-05-05Cleanup node instance (#12825)Stelios Fragkakis
2022-05-05Fill missing removed events after a crash (#12803)Emmanuel Vasilakis
* inject removed events when missing from sqlite * pass flag * remove log message
2022-05-04Optimize linking of foreach alarms to dimensions. (#12813)vkalintiris
* Optimize linking of foreach alarms to dimensions. Keep the write-lock on host but use read-lock for charts because it's easy to verify that they aren't modified by the linking of foreach alarms to dimensions. * Protect alarm log modifications with write-lock.
2022-05-04* Add a parameter for the libuv worker threads to pre-initialize (#12814)Stelios Fragkakis
* Set the thread name for libuv threads to LIBUV_WORKER * Make sure the dbengine thread has the correct name
2022-05-04Metric correlations (#12582)Emmanuel Vasilakis
* initial attempt at metric correlations * fix loop * simplify struct * change json * get points from query * comment * dont lock the host as much * add a configuration option to enable/disable metric correlations * remove KSfbar from header file * lock charts * add timeout * cast multiplication * add licencing info * better licencing * use onewayalloc * destroy owa
2022-05-04fix!: do not replace a hyphen in the chart name with an underscore (#12812)Ilya Mashchenko
2022-05-03Improve agent cloud chart synchronization (#12655)Stelios Fragkakis
* Try to queue dimension always when: Trying to clean obsolete charts If chart has been sent and liveness apparently changed * delay rotation and skip chart check if not sent to cloud * No need to CLEAR flag during database rotation Do not clear chart ACLK status for dimension requests * Change payload_sent to return timestamp of submitted message * Clear the dimension ACLK flag if we are processing all the charts again * Check if dimension is already queued to ACLK and ignore it If queue fails then reset it to retry Already try to queue the dimension * Improve dimension cleanup during the retention message calculation * Change queue_dimension_to_aclk to return void * If no time range for this dimension then assume it is deleted * Start streaming for inactive nodes * Remove dead code * Correctly report hostname in the access log * Schedule a dimension deletion without trying to submit a message immediately * Enable dimension cleanup -- also delete dimension if not found in the dbengine files Free hostname
2022-05-03Remove per chart configuration. (#12728)vkalintiris
After https://github.com/netdata/netdata/pull/12209 per-chart configuration was used for (a) enabling/disabling a chart, and (b) renaming dimensions.
Regarding the first use case: We already have component-specific configuration options|flags to finely control how a chart should behave. Eg. "send charts matching" in streaming, "charts to skip from training" in ML, etc. If we really need the concept of a disabled chart, we can add a host-level simple pattern to match these charts.
Regarding the second use case: It's not obvious why we'd need to provide support for remapping dimension names through a chart-specific configuration from the core agent. If the need arises, we could add such support at the right place, ie. an exporter/streaming config section.
This will allow each flag to act independently from each other and avoid managing flag-state manually at various places, eg:
```
if(unlikely(!rrdset_flag_check(st, RRDSET_FLAG_ENABLED))) {
    rrdset_flag_clear(st, RRDSET_FLAG_UPSTREAM_SEND);
    rrdset_flag_set(st, RRDSET_FLAG_UPSTREAM_IGNORE);
}
...
```
2022-05-03Skip dimension deletion on free (temp fix) (#12777)Stelios Fragkakis
2022-05-03Configurable storage engine for Netdata agents: step 1 (#12776)Adrien Béraud
* rrd: move API structures out of rrddim_volatile In C, unlike C++, it's not possible to reference a nested structure from outside this structure. Since we later want to use rrddim_query_ops and rrddim_collect_ops separately from rrddim_volatile, move these nested structures out. * rrd: use opaque handle types for different memory modes
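The two changes in this commit (a standalone operations structure, and opaque handle types per memory mode) might look roughly like the sketch below. All names here (`query_ops`, `query_handle`, `dim_state`, the `ram_*` functions, `sum_all`) are hypothetical and chosen for illustration; they are not the identifiers used in the Netdata source.

```c
#include <stddef.h>
#include <stdlib.h>

/* Opaque handle: callers hold only a pointer; each engine defines the body. */
struct query_handle;

/* Operations table declared at file scope, so it can be used on its own
 * or embedded in the per-dimension state below. */
struct query_ops {
    struct query_handle *(*init)(const double *values, size_t count);
    int (*next)(struct query_handle *h, double *out);  /* 1 = value stored, 0 = end */
    void (*finalize)(struct query_handle *h);
};

/* Per-dimension state embeds the table instead of declaring it inline. */
struct dim_state {
    struct query_ops query;
};

/* One illustrative engine that reads values from a plain array.
 * In a real layout this definition would live in the engine's own source file. */
struct query_handle {
    const double *values;
    size_t count, pos;
};

static struct query_handle *ram_init(const double *values, size_t count)
{
    struct query_handle *h = malloc(sizeof(*h));
    if (!h) return NULL;
    h->values = values;
    h->count = count;
    h->pos = 0;
    return h;
}

static int ram_next(struct query_handle *h, double *out)
{
    if (h->pos >= h->count) return 0;
    *out = h->values[h->pos++];
    return 1;
}

static void ram_finalize(struct query_handle *h) { free(h); }

static const struct query_ops ram_query_ops = { ram_init, ram_next, ram_finalize };

/* Engine-agnostic caller: works with any query_ops table. */
static double sum_all(const struct query_ops *ops, const double *values, size_t count)
{
    double total = 0.0, v;
    struct query_handle *h = ops->init(values, count);
    if (!h) return 0.0;
    while (ops->next(h, &v)) total += v;
    ops->finalize(h);
    return total;
}
```

Because `sum_all` only sees the operations table and an opaque pointer, a different storage engine can be swapped in by supplying another `query_ops` instance.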
2022-05-03Check for chart obsoletion on children re-connections (#12707)Emmanuel Vasilakis
* check for chart obsoletion on children connections * use rrdset_is_obsolete
2022-05-03One way allocator to double the speed of parallel context queries (#12787)Costa Tsaousis
* one way allocator to speed up context queries * fixed a bug while expanding memory pages * reworked for clarity and finally fixed the bug of allocating memory beyond the page size * further optimize allocation step to minimize the number of allocations made * implement strdup with memcpy instead of strcpy * added documentation * prevent an uninitialized use of owa * added callocz() interface * integrate onewayalloc everywhere - apart sql queries * one way allocator is now used in context queries using archived charts in sql * align on the size of pointers * forgotten freez() * removed not needed memcpys * give unique names to global variables to avoid conflicts with system definitions
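A one-way allocator of the kind this commit describes hands out memory from large pages and releases everything in a single call. The sketch below is a simplified illustration under that assumption; the `owa_*` function names and struct layout are hypothetical and do not reproduce the project's `onewayalloc` interface. It does follow three details mentioned in the commit message: requests are aligned to pointer size, a request larger than the page size gets its own page, and the string copy uses `memcpy`.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* One page of the arena; pages are chained so they can be freed together. */
struct owa_page {
    struct owa_page *next;
    size_t size;           /* usable bytes in data[] */
    size_t used;           /* bytes handed out so far */
    unsigned char data[];
};

struct owa {
    struct owa_page *head;
    size_t page_size;
};

/* Round n up to a multiple of the pointer size. */
static size_t owa_align(size_t n)
{
    size_t a = sizeof(void *);
    return (n + a - 1) / a * a;
}

static struct owa *owa_create(size_t page_size)
{
    struct owa *o = malloc(sizeof(*o));
    if (!o) return NULL;
    o->head = NULL;
    o->page_size = owa_align(page_size);
    return o;
}

/* Bump allocation: no per-allocation free, only owa_destroy(). */
static void *owa_alloc(struct owa *o, size_t n)
{
    n = owa_align(n);
    struct owa_page *p = o->head;
    if (!p || p->size - p->used < n) {
        /* never allocate beyond the page: oversized requests get a bigger page */
        size_t size = n > o->page_size ? n : o->page_size;
        p = malloc(sizeof(*p) + size);
        if (!p) return NULL;
        p->next = o->head;
        p->size = size;
        p->used = 0;
        o->head = p;
    }
    void *ret = p->data + p->used;
    p->used += n;
    return ret;
}

/* strdup built on memcpy, since the length is already known. */
static char *owa_strdup(struct owa *o, const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = owa_alloc(o, len);
    if (copy) memcpy(copy, s, len);
    return copy;
}

static void owa_destroy(struct owa *o)
{
    struct owa_page *p = o->head;
    while (p) {
        struct owa_page *next = p->next;
        free(p);
        p = next;
    }
    free(o);
}
```

A query would typically create one allocator, make all of its temporary allocations from it, and call the destroy function once at the end. That replaces many individual `malloc`/`free` pairs with a few page allocations.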
2022-05-02Reduce alert events sent to the cloud. (#12544)Emmanuel Vasilakis
* filter * update filter * queue removed directly * more * logging * cleanup * cleanup 2 * cleanup 3 * finalize instead of reset
2022-05-02Avoid clearing already unset flags. (#12727)vkalintiris
If memory mode is save, map or ram the set's flags are initialized to 0. Otherwise, the set is calloc'd which will make the set have 0 flags.
2022-05-02Make atomics a hard-dep. (#12730)vkalintiris
They are used extensively throughout our code base, and not having support for them does not generate a thread-safe agent.