path: root/database/rrddim.c
Age | Commit message | Author
2022-05-03 | Configurable storage engine for Netdata agents: step 1 (#12776) | Adrien Béraud
* rrd: move API structures out of rrddim_volatile. In C, unlike C++, it's not possible to reference a nested structure from outside this structure. Since we later want to use rrddim_query_ops and rrddim_collect_ops separately from rrddim_volatile, move these nested structures out.
* rrd: use opaque handle types for different memory modes
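The two bullets above can be illustrated with a minimal C sketch of a top-level (non-nested) operations table combined with an opaque handle per memory mode. Every name in it (`collect_ops`, `STORAGE_COLLECT_HANDLE`, the `ram_*` functions) is hypothetical and not taken from the Netdata source; it only shows the general pattern.

```c
#include <assert.h>
#include <stdlib.h>

/* Opaque handle (hypothetical name): callers only ever hold a pointer to this
 * type. Each memory mode provides its own private definition behind it. */
typedef struct storage_collect_handle STORAGE_COLLECT_HANDLE;

/* Ops table declared at top level, so it can be used on its own,
 * independently of any per-dimension state structure. */
struct collect_ops {
    STORAGE_COLLECT_HANDLE *(*init)(void);
    void (*store_metric)(STORAGE_COLLECT_HANDLE *handle, double value);
    int (*finalize)(STORAGE_COLLECT_HANDLE *handle);
};

/* Hypothetical in-memory backend. In a real build this definition would live
 * only in the backend's .c file, keeping the layout private. */
struct storage_collect_handle {
    double last_value;
    size_t stored;
};

static STORAGE_COLLECT_HANDLE *ram_collect_init(void) {
    return calloc(1, sizeof(STORAGE_COLLECT_HANDLE));
}

static void ram_store_metric(STORAGE_COLLECT_HANDLE *handle, double value) {
    assert(handle != NULL);
    handle->last_value = value;
    handle->stored++;
}

/* Releases the handle and returns how many values were stored through it. */
static int ram_collect_finalize(STORAGE_COLLECT_HANDLE *handle) {
    assert(handle != NULL);
    int stored = (int)handle->stored;
    free(handle);
    return stored;
}

static const struct collect_ops ram_collect_ops = {
    .init = ram_collect_init,
    .store_metric = ram_store_metric,
    .finalize = ram_collect_finalize,
};
```

Because callers only see `STORAGE_COLLECT_HANDLE *` and a `struct collect_ops`, another memory mode can supply a different handle layout and a different ops table without changing the calling code.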
2022-04-28 | feat(dbengine): make dbengine page cache undumpable and dedupable (#12765) | Ilya Mashchenko
* make netdata more awesome
* rework the madvise and mmap code for clarity
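Assuming "undumpable" and "dedupable" refer to the Linux `madvise()` advice values `MADV_DONTDUMP` (exclude the range from core dumps) and `MADV_MERGEABLE` (allow KSM to merge identical pages), the idea can be sketched as below. The function names are hypothetical, and this is not the dbengine page cache code itself.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>

/* Hypothetical helper: map an anonymous region and ask the kernel to
 * (a) leave it out of core dumps and (b) consider it for KSM page merging.
 * Both madvise() calls are best-effort: a failure is reported, but the
 * mapping is still returned and usable. */
static void *cache_region_alloc(size_t size) {
    assert(size > 0);
    void *mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return NULL;

#ifdef MADV_DONTDUMP
    if (madvise(mem, size, MADV_DONTDUMP) != 0)
        perror("madvise(MADV_DONTDUMP)");
#endif
#ifdef MADV_MERGEABLE
    if (madvise(mem, size, MADV_MERGEABLE) != 0)
        perror("madvise(MADV_MERGEABLE)"); /* e.g. kernel built without KSM */
#endif
    return mem;
}

static void cache_region_free(void *mem, size_t size) {
    if (mem != NULL)
        munmap(mem, size);
}
```

The advice values are passed in separate calls because each `madvise()` call takes a single advice value, not a bit mask.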
2022-04-01 | Fix memory leaks on Netdata exit (#12511) | Vladimir Kobal
* Fix memory leaks in dimensions and charts
* Initialize superblock memory regions
* Clean up static threads
* Fix memory leaks in compression
* Fix memory leaks in rrdcalctemplate
* Fix memory leaks in health config
* Fix ACLK memory leaks
2022-03-31 | Initialize foreach alarms of dimensions in health thread. (#12452) | vkalintiris
The previous approach required us to try to write-lock the host after locking a chart, sleeping on failure. Lock contention would cause alarms not to be created and the agent to become unresponsive.
2022-03-28 | Skip `foreach` alarms for dimensions of anomaly rate chart. (#12441) | vkalintiris
Health is not enabled for the anomaly rates chart. This was missed in the original PR that added support for tracking anomaly rates with dbengine. The side-effect was that the agent would block when opening the dashboard before its initialization was done.
2022-03-08 | Adjust cloud dimension update frequency (#12284) | Stelios Fragkakis
* Queue a chart immediately to the cloud
* Do not inform the cloud immediately if a dimension stopped collecting; use MAX(obsoletion time, 1.5 * update_every)
* Notify cloud immediately on dimension deletion
* Add debug messages
* Do not schedule an update if we are shutting down
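The delay rule from the second bullet, MAX(obsoletion time, 1.5 * update_every), amounts to the small calculation below. The function name is hypothetical, and treating both inputs as seconds is an assumption.

```c
#include <assert.h>

#ifndef MAX
#define MAX(a, b) ((a) > (b) ? (a) : (b))
#endif

/* Hypothetical helper: how long to wait (seconds, assumed) before telling the
 * cloud that a dimension stopped collecting. The result is never shorter than
 * 1.5 collection intervals, nor shorter than the obsoletion time. */
static double stopped_dimension_delay(double obsoletion_time, int update_every) {
    assert(update_every > 0);
    double interval_based = 1.5 * update_every;
    return MAX(obsoletion_time, interval_based);
}
```

For a chart collected every 2 seconds with no obsoletion time configured, the delay is 3 seconds; a larger obsoletion time always wins.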
2022-02-24 | Track anomaly rates with DBEngine. (#12083) | vkalintiris
* Track anomaly rates with DBEngine. This commit adds support for tracking anomaly rates with DBEngine. We do so by creating a single chart with id "anomaly_detection.anomaly_rates" for each trainable/predictable host, which is responsible for tracking the anomaly rate of each dimension that we train/predict for that host. The rrdset->state->is_ar_chart boolean flag is set to true only for anomaly rates charts. We use this flag to:
  - Disable exposing the anomaly rates charts through the functionality in backends/, exporting/ and streaming/.
  - Skip generation of configuration options for the name, algorithm, multiplier and divisor of each dimension in an anomaly rates chart.
  - Skip the creation of health variables for anomaly rates dimensions.
  - Skip the chart/dim queue of ACLK.
  - Post-process the RRDR result of an anomaly rates chart, so that we can return a sorted, trimmed number of anomalous dimensions.
  In a child/parent configuration where both the child and the parent run ML for the child, we want to be able to stream the rest of the ML-related charts to the parent. To do this without chart name collisions, the charts are now created on localhost, and their IDs and titles have the node's machine_guid and hostname as a suffix, respectively.
* Fix exporting_engine tests.
* Restore default ML configuration. The reverted changes were meant for local testing only. This commit restores the default values that we want to have when someone runs anomaly detection on their node.
* Set context for anomaly_detection.* charts.
* Check for anomaly rates chart only with a valid pointer.
* Remove duplicate code.
* Use a more descriptive name for id/title pair variable
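The last post-processing step (returning a sorted, trimmed list of anomalous dimensions) can be sketched in C as follows. The struct, the function names, and the choice to drop dimensions whose anomaly rate is zero are assumptions for illustration; the real code operates on an RRDR result.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-dimension result row. */
struct dim_rate {
    const char *name;
    double anomaly_rate; /* percentage of anomalous samples, 0..100 */
};

/* qsort comparator: larger anomaly rates sort first. */
static int by_rate_desc(const void *a, const void *b) {
    double ra = ((const struct dim_rate *)a)->anomaly_rate;
    double rb = ((const struct dim_rate *)b)->anomaly_rate;
    return (ra < rb) - (ra > rb);
}

/* Keeps only dimensions with a non-zero rate, sorts them by rate (highest
 * first) and trims the result to at most max_dims entries. Works in place;
 * returns the number of entries left at the front of the array. */
static size_t sort_and_trim(struct dim_rate *dims, size_t n, size_t max_dims) {
    assert(dims != NULL || n == 0);
    size_t kept = 0;
    for (size_t i = 0; i < n; i++)
        if (dims[i].anomaly_rate > 0.0)
            dims[kept++] = dims[i];

    qsort(dims, kept, sizeof(*dims), by_rate_desc);
    return kept < max_dims ? kept : max_dims;
}
```

Sorting before trimming guarantees that the entries cut off are the least anomalous ones.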
2022-02-23 | Store dimension hidden option in the metadata db (#12196) | Stelios Fragkakis
* Add a function to update dimension options in the metadata database
* Update the option for a dimension to be hidden/unhidden when rrddim_hide/rrddim_unhide is called
* Store the hidden option for dimensions in the database
2022-02-22 | Remove chart specific configuration from netdata.conf except enabled (#12209) | Stelios Fragkakis
2021-10-27 | Anomaly Detection MVP (#11548) | vkalintiris
* Add support for feature extraction and K-Means clustering. This patch adds support for performing feature extraction and running the K-Means clustering algorithm on the extracted features. We use the open-source dlib library, which has been added as a new git submodule, to compute the K-Means cluster centers. The build system has been updated to recognize two new options: 1) --enable-ml: build an agent with ML functionality, and 2) --enable-ml-tests: support running tests with the `-W mltest` option in netdata. The second flag is meant only for internal use. To build the tests successfully, you need to install the GoogleTest framework on your machine.
* Boilerplate code to track hosts/dims and init ML config options. A new opaque pointer field is added to the database's host and dimension data structures. The fields point to C++ wrapper classes that will be used to store ML-related information in follow-up patches. The ML functionality needs to iterate over all tracked dimensions twice per second. To avoid locking the entire DB multiple times, we use a separate dictionary to add/remove dimensions as they are created/deleted by the database. A global configuration object is initialized during agent startup. It will allow our users to specify ML-related configuration options, e.g. hosts/charts to skip from training.
* Add support for training and prediction of dimensions. Every new host spawns a training thread which is used to train the model of each dimension. Training of dimensions is done in a non-batching mode, so that the CPU, RAM and disk utilization of the training code itself does not impact the generated ML model. For performance reasons, prediction is done at the time a new value is pushed into the database. The alternative, i.e. maintaining a separate thread for prediction, would be ~3-4x slower and would increase lock contention considerably.
For similar reasons, we use a custom function to unpack storage_numbers into doubles instead of long doubles.
* Add data structures required by the anomaly detector. This patch adds two data structures that will be used by the anomaly detector in follow-up patches. The first is a circular bit buffer used to count the number of set bits over time. The second represents an expandable, rolling window that tracks set/unset bits. It is explicitly modeled as a finite-state machine to make the anomaly detector's behaviour easier to test and reason about.
* Add anomaly detection thread. This patch creates a new anomaly detection thread per host. Each thread maintains a BitRateWindow which is updated every second based on the anomaly status of the corresponding host. Based on the updated status of the anomaly window, we can identify the existence/absence of an anomaly event, its start/end time and the dimensions that participate in it.
* Create/insert/query anomaly events from SQLite DB.
* Create anomaly event endpoints. This patch adds two endpoints that expose information about anomaly events. The first endpoint returns the list of anomalous events within a specified time range. The second endpoint provides detailed information about a single anomaly event, i.e. the list of anomalous dimensions in that event along with their anomaly rates. The `anomaly-bit` option has been added to the `/data` endpoint to allow users to get the anomaly status of individual dimensions per second.
* Fix build failures on Ubuntu 16.04 & CentOS 7. These distros do not have toolchains with C++11 enabled by default. Replacing nullptr with NULL should fix the build problems on these platforms when the ML feature is not enabled.
* Fix `make dist` to include ML makefiles and dlib sources. Currently, we add ml/kmeans/dlib to EXTRA_DIST.
We might want to generate an explicit list of source files in the future, to bring down the generated archive's file size.
* Small changes to make the LGTM & Codacy bots happy.
  - Cast unused results of function calls to void.
  - Pass a const-ref string to Database's constructor.
  - Reduce the scope of a local variable in the anomaly detector.
* Add user configuration option to enable/disable anomaly detection.
* Do not log dimension-specific operations. Training and prediction operations happen every second for each dimension. To make it easier to run anomaly detection with this PR on many charts & dimensions, I've removed logs that would cause log flooding.
* Reset dimensions' bit counter when not above the anomaly rate threshold.
* Update the default config options with real values. With this patch, the default configuration options match the ones we want our users to use by default.
* Update conditions for creating new ML dimensions: 1. skip dimensions with update_every != 1, 2. skip dimensions that come from the ML charts. With this filtering in place, any configuration value for the relevant simple_pattern expressions will work correctly.
* Teach buildinfo{,json} about the ML feature.
* Set --enable-ml by default in the configuration options. This patch is only meant for testing the building of the ML functionality on GitHub. It will be reverted once the tests pass.
* Minor build system fixes.
  - Add path to json header
  - Enable C++ linker when ML functionality is enabled
  - Rename ml/ml-dummy.cc to ml/ml-dummy.c
* Revert "Set --enable-ml by default in the configuration options." This reverts commit 28206952a59a577675c86194f2590ec63b60506c. We pass all GitHub checks when building the ML functionality, except for those that run on CentOS 7 due to not having a C++11 toolchain.
* Check for missing dlib and nlohmann files. We simply check the single-source files upon which our build system depends.
If they are missing, an error message notifies the user about the missing git submodules, which are required for the ML functionality.
* Allow users to specify the maximum number of KMeans iterations.
* Use dlib v19.10. v19.22 broke compatibility with CentOS 7's g++. Development of the anomaly detection used v19.10, which is the version used by most Debian and Ubuntu distribution versions that are not past EOL. No observable performance improvements/regressions specific to the K-Means algorithm occur between the two versions.
* Detect and use the -std=c++11 flag when building anomaly detection. This patch automatically adds the -std=c++11 flag when building netdata with the ML functionality, if the user's toolchain supports it. With this change we are able to build the agent correctly on CentOS 7.
* Restructure configuration options:
  - update default values,
  - clamp values to min/max defaults,
  - validate and identify conflicting values.
* Add update_every configuration option. Considering that the MVP does not support per-host configuration options, the update_every option will be used to filter the hosts to train. With this change anomaly detection will be supported on:
  - single nodes with update_every != 1, and
  - children nodes with a common update_every value that might differ from the value of the parent node.
* Reorganize anomaly detection charts. This follows Andrew's suggestion to have four charts showing the number of anomalous/normal dimensions, the anomaly rate, the detector's window length, and the events that occur in the prediction step. Context and family values, along with the necessary information in the dashboard_info.js file, will be updated in a follow-up commit.
* Do not dump anomaly event info in logs.
* Automatically handle low "train every secs" configuration values. If a user specifies a very low value for "train every secs", the time it takes to train a dimension may exceed its allotted time.
In that case, we want the training thread to:
  - reduce its CPU usage per second, and
  - allow the prediction thread to proceed.
We achieve this by limiting the training time of a single dimension to half the time allotted to it. This means that the training thread will never consume more than 50% of a single core.
* Automatically detect if ML functionality should be enabled. With these changes, we enable ML if:
  - the user has not explicitly specified --disable-ml, and
  - git submodules have been checked out properly, and
  - the toolchain supports C++11.
  If the user has explicitly specified --enable-ml, the build fails if git submodules are missing or the toolchain does not support C++11.
* Disable anomaly detection by default.
* Do not update charts in locked region.
* Clean up code reading configuration options.
* Enable C++ linker when building ML.
* Disable ML functionality for CMake builds.
* Skip LGTM for dlib and nlohmann libraries.
* Do not build ML if libuuid is missing.
* Fix dlib path in LGTM's yaml config file.
* Add chart to track duration of prediction step.
* Add chart to track duration of training step.
* Limit the number of dimensions in an anomaly event. This ensures our JSON results won't grow without limit. The default ML configuration options train approximately 1700 dimensions in a newly-installed Netdata agent. The hard limit is set to 2000 dimensions, because:
  - it is well above the default number of dimensions we train,
  - if it is ever reached, it means that the user accidentally set a very low anomaly rate threshold, and
  - since we sort the result by anomaly score, the cut-off dimensions will be the least anomalous, i.e. the least important to investigate.
* Add information about the ML charts.
* Update family value in ML charts. This fix will allow us to show the individual charts in the RHS Anomaly Detection submenu.
* Rename chart type s/anomalydetection/anomaly_detection/g
* Expose ML feature in /info endpoint.
* Export ML config through /info endpoint.
* Fix CentOS 7 build.
* Reduce the critical region of a host's lock. Before this change, each host had a single, dedicated lock that protected its map of dimensions from insertions/deletions while training and detecting anomalies. This was problematic because training a single dimension can take several seconds on nodes that are under heavy load. After this change, the host's lock protects only the insertion/deletion of dimensions and the prediction step. For training, we use a dedicated lock per dimension, which protects the dimension from deletion while it is being trained. Prediction is fast enough, even on slow machines or under heavy load, which allows us to use the host's main lock and avoid increasing the complexity of the anomaly detector's implementation.
* Improve the way we track the anomaly detector's performance. This change allows us to:
  - track the total training time per update_every period,
  - track the maximum training time of a single dimension per update_every period, and
  - export the current number of total, anomalous and normal dimensions to the /info endpoint.
  Also, now that we use dedicated locks per dimension, we can train continuously under heavy load without having to sleep in order to yield the training thread and let the prediction thread progress.
* Use samples instead of seconds in ML configuration. This commit changes the way we handle ML configuration options input by the user. Instead of treating values as seconds, we interpret all inputs as a number of update_every periods. This allows us to enable anomaly detection on hosts that have update_every != 1 second and still produce a model for training/prediction & detection that behaves as expected. Tested by running anomaly detection on an agent with update_every = [1, 2, 4] seconds.
* Remove unnecessary log message in detection thread
* Move ML configuration to global section.
* Update web/gui/dashboard_info.js
  Co-authored-by: Andrew Maguire <andrewm4894@gmail.com>
* Fix typo
  Co-authored-by: Andrew Maguire <andrewm4894@gmail.com>
* Rebase.
* Use negative logic for anomaly bit.
* Add info for prediction_stats and training_stats charts.
* Disable ML on PPC64EL. The CI test fails with -std=c++11 and requires -std=gnu++11 instead. However, it's not easy to quickly append the required flag to CXXFLAGS. For the time being, simply disable ML on PPC64EL; if any users require this functionality, we can fix it in the future.
* Add comment on why we disable ML on PPC64EL.

Co-authored-by: Andrew Maguire <andrewm4894@gmail.com>
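The circular bit buffer from the "Add data structures required by the anomaly detector" bullet keeps a running count, so the number of set bits over the last N samples is available in constant time. The ML code in this PR is C++; the sketch below is plain C, and all names in it are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Fixed-capacity ring of bits with a running count of set bits, so "how many
 * of the last N samples were anomalous" costs O(1) per update and query. */
struct bit_ring {
    bool *bits;
    size_t capacity;
    size_t head;   /* next slot to overwrite */
    size_t filled; /* number of valid slots, <= capacity */
    size_t set;    /* number of true values among the valid slots */
};

static bool bit_ring_init(struct bit_ring *r, size_t capacity) {
    assert(capacity > 0);
    r->bits = calloc(capacity, sizeof(bool));
    r->capacity = capacity;
    r->head = r->filled = r->set = 0;
    return r->bits != NULL;
}

static void bit_ring_push(struct bit_ring *r, bool bit) {
    if (r->filled == r->capacity) {
        /* Once the ring is full, the oldest bit sits at head: evict it. */
        if (r->bits[r->head])
            r->set--;
    } else {
        r->filled++;
    }
    r->bits[r->head] = bit;
    if (bit)
        r->set++;
    r->head = (r->head + 1) % r->capacity;
}

static size_t bit_ring_set_count(const struct bit_ring *r) { return r->set; }

static void bit_ring_free(struct bit_ring *r) {
    free(r->bits);
    r->bits = NULL;
}
```

Dividing `bit_ring_set_count()` by the number of filled slots gives the anomaly rate over the window.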
2021-10-22 | Reuse the SN_EXISTS bit to track anomaly status. (#11154) | vkalintiris
* Replace all usages of SN_EXISTS with SN_DEFAULT_FLAGS.
* Remove references to SN_NOT_EXISTS in comments.
* Replace raw zero constant with SN_EMPTY_SLOT.
* Use get_storage_number_flags only in storage_number.{c,h}
* Compare against SN_EMPTY_SLOT to check if a storage_number exists. This is safe because:
  1. rrdset_done_interpolate() is the only place where we call store_metric(),
  2. all store_metric() calls, except for one, store an SN_EMPTY_SLOT value, and
  3. when we are not storing an SN_EMPTY_SLOT value, the flags that we pass to pack_storage_number() can be either SN_EXISTS *or* SN_EXISTS_RESET.
* Compare only the SN_EXISTS_RESET bit to find reset values.
* Remove get_storage_number_flags from storage_number.h
* Do not set storage_number flags outside of rrdset_done_interpolate(). This is an NFC (no functional change) intended to limit the scope of storage_number flag processing to just one function.
* Set reset bit without overwriting the rest of the flags.
* Rename SN_EXISTS to SN_ANOMALY_BIT.
* Use GOTOs in pack_storage_number to return from a single place.
* Teach pack_storage_number how to handle anomalous zero values. Up until now, a storage_number always had either the SN_EXISTS or the SN_EXISTS_RESET bit set. This meant that no packed storage_number could compare equal to SN_EMPTY_SLOT. However, SN_ANOMALY_BIT can be zero. This is fine for every value other than the anomalous 0 value, which would compare equal to SN_EMPTY_SLOT. We address this issue by mapping the anomalous zero value to SN_EXISTS_100 (a number that previous versions of the agent could not generate, i.e. it won't exist in older dbengine files). This change was tested manually by intentionally flipping the anomaly bit on odd/even iterations in rrdset_done_interpolate. Prior to this change, charts whose dimensions had 0 values were showing up in the dashboard as gaps (SN_EMPTY_SLOT), whereas with this commit the values are displayed correctly.
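The anomalous-zero problem from the last bullet can be shown with a toy encoding. The bit positions, the 24-bit integer value field, and the function names below are simplified assumptions for illustration, not the real storage_number layout. Following the later "Use negative logic for anomaly bit" change, the sketch treats a set SN_ANOMALY_BIT as "not anomalous".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t storage_number;

/* Toy layout (assumption): low 24 bits hold a small non-negative integer. */
#define SN_EMPTY_SLOT  ((storage_number)0x00000000u)
#define SN_ANOMALY_BIT ((storage_number)1u << 24) /* set == NOT anomalous */
#define SN_EXISTS_100  ((storage_number)1u << 26)
#define SN_VALUE_MASK  ((storage_number)0x00FFFFFFu)

static storage_number pack_toy(uint32_t value, bool anomalous) {
    assert(value <= SN_VALUE_MASK);
    storage_number r = value & SN_VALUE_MASK;
    if (!anomalous)
        r |= SN_ANOMALY_BIT;

    /* An anomalous zero would pack to all-zero bits and be indistinguishable
     * from an empty slot, so map it to a pattern older code never produced. */
    if (r == SN_EMPTY_SLOT)
        r = SN_EXISTS_100;
    return r;
}

static bool does_exist(storage_number n) { return n != SN_EMPTY_SLOT; }

static bool is_anomalous(storage_number n) {
    return does_exist(n) && !(n & SN_ANOMALY_BIT);
}

/* The remapped anomalous zero still decodes to 0, because bit 26 lies
 * outside the toy value mask. */
static uint32_t unpack_toy(storage_number n) { return n & SN_VALUE_MASK; }
```

With this mapping, an existence check can remain a plain comparison against SN_EMPTY_SLOT.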
2021-10-06 | Enable additional functionality for the new cloud architecture (#11579) | Stelios Fragkakis
2021-06-01 | Store uuid_t metric_uuid in the dimension state structure instead of `uuid_t *` (#11212) | Stelios Fragkakis
2021-04-14 | Spelling database (#10914) | Josh Soref
2021-03-17 | Rename abs to ABS to avoid clash with standard definitions. Fixes #10353. (#10354) | Tomáš Kopal
2021-03-15 | Enable metadata persistence in all memory modes (#10742) | Stelios Fragkakis
2021-03-10 | Rename struct avl to avl_element and the typedef to avl_t (#10735) | vkalintiris
Before:
```c
struct foobar {
    avl avl;
    /* ... */
};
```
After:
```c
struct foobar {
    avl_t avl;
    /* ... */
};
```
This makes it easier to tell the type from the field name.
2020-12-14 | Fix issue with chart metadata sent multiple times over ACLK (#10381) | Stelios Fragkakis
* Add a flag RRDSET_FLAG_ACLK to mark that a chart needs to go to the cloud
* Change calls to aclk_update_chart to set RRDSET_FLAG_ACLK instead. Make the call to aclk_update_chart only in rrdset_done (and in case the chart is deleted)
* Fix compilation error when cloud is disabled
* Skip the netdata_cloud_setting check when setting the flag / calling aclk_update_chart (it is checked in there)
2020-11-28 | Fix race condition in rrdset_first_entry_t() and rrdset_last_entry_t() (#10276) | Markos Fountoulakis
2020-11-24 | Migrate metadata log to SQLite (#10139) | Stelios Fragkakis
2020-09-11 | Fix memory mode none not dropping stale dimension data in general when streaming to a parent. (#9917) | Markos Fountoulakis
2020-09-11 | Fix memory mode none not marking dimensions as obsolete. (#9912) | Markos Fountoulakis
* Fix memory mode none not marking dimensions as obsolete.
2020-08-20 | Added code to release memory used by the global GUID map (#9729) | Stelios Fragkakis
Fixed memory leak issues associated with the global GUID map during agent shutdown
2020-08-11 | Fixed issue with missing alarms (#9712) | Stelios Fragkakis
Fixed the alarm configuration when dimensions switch from archived to active
2020-07-28 | Implemented multihost database (#9556) | Stelios Fragkakis
* Hard code a node for non-legacy multidb test
  Skip dbengine initialization for new incoming children
  Add code to switch to multidb ctx when accessing the dbengine
* When a non-legacy streaming connection is detected, use the multidb metadata log context
* Clear the superblock memory to avoid random data being written to the metadata log
* Activate the host detection during compaction
  Activate the host detection during metadata log chart updates
  Keep the host in the user object during replay of the HOST command
* Add defaults for health / rrdpush on HOST metadata replay
  Check for legacy status on host creation by checking is_archived and, if not conclusive, call is_legacy_child()
  Use defaults from the stream.conf
* Count hosts only if not archived
  When a host switches from archived to active, update rrd_hosts_available
  Remove archived hosts from charts and info
* Change parameter from "multidb disk space" to "dbengine multihost disk space"
  Remove unused variables
  Fix compilation error when dbengine is disabled
  Fix condition for machine_guid directory creation under cache_dir
* Enable multidb disk space file creation.
* Stop deleting dimensions when rotating archived metrics if the dimension is active in a different database engine.
* Fix old bug in the code that confused obsolete hosts with orphan hosts.
* Do not delete multi-host DB host files.
* Discard dbengine state when a legacy memory mode instantiates, to avoid inconsistencies.
* Identify metadata that collide with non-dbengine memory mode hosts and ignore them.
* Handle non-dbengine localhost with dbengine archived charts in localhost and streaming.
* Ignore archived hosts in streaming.
* Add documentation before merging to master.

Co-authored-by: Markos Fountoulakis <markos.fountoulakis.senior@gmail.com>
2020-07-11 | Remove health from archived metrics (#9520) | Markos Fountoulakis
* Disassociate health variables and alarms from archived charts and dimensions.
* Ignore archived charts during health reload.
2020-06-29 | Disallow dimensions or charts being obsoleted and archived simultaneously. (#9436) | Markos Fountoulakis
2020-06-15 | Fixed compiler warnings (#9337) | Stelios Fragkakis
Fixed compiler warnings
2020-06-12 | Fixed invalid pointer access `rrddim_free_custom` (#9326) | Stelios Fragkakis
Fixed invalid memory access in host creation and dimension deletion
2020-06-12 | Add support for persistent metadata (#9324) | Stelios Fragkakis
* Implemented collector metadata logging
* Added persistent GUIDs for charts and dimensions
* Added metadata log replay and automatic compaction
* Added detection of charts with no active collector (archived)
* Added new endpoint to report archived charts via `/api/v1/archivedcharts`
* Added support for collector metadata update

Co-authored-by: Markos Fountoulakis <44345837+mfundul@users.noreply.github.com>
2020-03-31 | Switching over to soft feature flag (#8545) | Andrew Moss
Preparing for the cloud release. This changes how we handle the feature flag so that it no longer requires installer switches and can be set from the config file. This still requires internal access to use and is not ready for public access yet.
2020-02-24 | Merging the feature branch for the ACLK in the previous sprint. (#8179) | Andrew Moss
* ACLK connection and protocol improvements (#8139)
* Adding ACLK retry on connection failure (#8147)
* Fixed reconnect issues on the ACLK. (#8163)
* Cleaning up ACLK - part 1 (#8167)

Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
2020-02-06 | ACLK agent 1 (#7894) | Stelios Fragkakis
* - Add initial mqtt support
* [WIP] Agent cloud link
  - Set up main mqtt thread to connect to a broker using V5 of the MQTT protocol (TBD)
  - Send alarms to "netdata/alarm"
  - Add error checks to handle connection failures
  - Add params for broker, port, maximum concurrent sent / received messages
  - Dummy function to check claiming status
  - Generic mqtt_send command to publish a message to a base topic and sub topic; it will end up in the form base_topic/sub_topic
  - Add host/port in the connection failure error message
* Test libmosquitto libs
* connect to broker locally (assume localhost:1883)
* subscribe to channel netdata/command
* Test: try a reload command to trigger health reload
* publish alerts to netdata/alarm
* - Fix compile issues
* - Use sleep_usec instead of usleep
* - Delay reconnection on failure due to misconfiguration (high CPU usage)
* - Remove the TLS connection config
* - Fix NETDATA_MQTT_INITIALIZATION_SLEEP_WAIT to use seconds
* - Gather ACLK related code under aclk folder
  - Add aclk_ functions for abstract layer
  - Moved low level lib integration into mqtt.c
* - Add README.md file with initial comment
* - Clean MQTT v5
* - Code cleanup
* - Remove alarm log for now
  - Remove the heart beat
* - Remove message properties for V5
* - Remove message properties for V5 (header)
* Fixed the netdata target to use a local static version of libmosquitto. The installer does not yet have steps to pull and build the local library.
    cd project_root
    git clone ssh://git@github.com/netdata/mosquitto mosquitto/
    (cd mosquitto/lib && make) # Ignore the cpp error
  This will leave mosquitto/lib/libmosquitto.a for the build process to use.
* - Fix compile issues with older < 1.6 libmosquitto lib
* - Enable alarm events to check it works
  - Rearrange includes
  - Rework topic to be agent/guid/.
    The actual id will be returned by is_agent_claimed
* - Add initial metadata info
  - Added helper function in web_api
  - Added a debug command (info)
* Update the claiming state to retrieve the claimed id.
* - Use define for constants like command and metadata topics
  - Function to wait for initialization of the ACLK link
  - New aclk_subscribe command with QOS parameter for the mqtt subscription
  - Use the is_agent_claimed function to get the real claim id and use it to build the topics that will be used for the cloud communication
  - Change in netdata-claim.sh.in to write the claim id without a trailing \n
* - Use define for constants like command and metadata topics
  - Function to wait for initialization of the ACLK link
  - New aclk_subscribe command with QOS parameter for the mqtt subscription
  - Use the is_agent_claimed function to get the real claim id and use it to build the topics that will be used for the cloud communication
  - Change in netdata-claim.sh.in to write the claim id without a trailing \n
* - Remove the alarm log for now
  - Add code (but disabled) to send charts
* - Use dummy anon, anon as username and password for testing purposes
* - Use client id anon as well
* Testing without TLS
* Switching TLS back on to fix docker environment.
* - Added query processing
    An incoming URL now calls web_client_api_request_v1_data to handle a request and push the results back to the "data" topic
  - Move the above processing from the message callback to the query handle loop
  - Added helper "pause", "resume" commands to stop and resume query processing, to stress-test loading the queue with queries before executing them
  - Changed the endpoint topics to "meta" and "cmd" (previously metadata and command)
* make info message follow protocol
* move metadata msg generation into new func
* move metadata msg generation into new func
* - Add metadata to the responses
  - Add hook to queue chart changes on creation and dimensions
  - Changed the queue mechanism to include a delay of X seconds
  - Add delayed submission of charts to the cloud so that all DIMs are defined, to avoid resubmission
* - Add additional data info for aclk_queue command
* - Use web_client_api_request_v1 to handle the incoming request
    This will handle all requests coming from the cloud
* - Clean up aclk_query structure
  - Add msg_id parameter
  - Enable the incoming JSON request
  - Enable the outgoing JSON response
* - Added new thread to handle query processing
  - Add lock and cond wait to wake up thread when queries are submitted
  - Cleanup on the main init function
* - Add wait time on agent init, to allow for chart, alarms and other definitions to be completed.
  - During the wait time, no queries will be queued
* - Send metadata on query thread init
  - New generic create-header function for the JSON response
  - Pack info and charts into one message
  - Modified chart to remove entries (test)
  - Modified charts mod to remove entries, e.g. alarms and volatile info
  - Change input to aclk_update_chart (RRDHOST / instead of hostname)
* - When a request fails, add to the payload
  - We may need to handle in a different key
  - Error check in json parsing
* - Add dummy aclk_update_alarm command
* - Move incoming request JSON parsing code away from mqtt.c
  - Added #ifdef ACLK_ENABLE so that we can have code merged but disabled by default
  - Added version in incoming and outgoing JSON dict
* - Disable code if ACLK_ENABLE is not defined
  - Remove references to the mqtt (mosquitto) lib
  - Add dummy stubs in mqtt.c for completeness if ACLK_ENABLE is not defined
* - Disable challenge sample code for now
* - Remove libmosquitto from makefile
* - Fix spaces in Makefile.am
  - Remove ifdef to avoid warning from LGTM
* - Remove for now the code that builds an along log test message to send to the cloud
* - Add check for ACLK_ENABLE definition and avoid calling the chart update functions
* - Remove commented code
* - Move source files to the correct place (ACLK_PLUGIN_FILES)
* - Remove include file that's not needed
* - Remove include file that's not needed
  - Add improved checks for load_claiming_state()
* - Fix error message.
    Used error(), which also logs errno and the message
* - Fix some codacy issues
* - Fix more codacy issues, code cleanup
* - Revert code to address codacy warnings
* - Revert spaces added in a previous commit by mistake
* clean up if/else nest
* print error if fopen fails
* minor - error already logs errno
* - Fix version formatting
* - Clean up all ACLK related compiler warnings
  - Re-arrange include files
  - Removed unused defines
* - More compilation warnings fixed
  - Bug with thread creation fixed
* - Add condition to skip compilation of the ACLK code entirely. Add env variable ACLK="yes" to enable
* - Add condition to skip the libmosquitto
* - Change feature flag from ACLK_ENABLE to ENABLE_ACLK in accordance with the rest of the ENABLE_xx flags
  - Typo in info message fix

Co-authored-by: Andrew Moss <1043609+amoss@users.noreply.github.com>
Co-authored-by: Timo <6674623+underhood@users.noreply.github.com>
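The generic mqtt_send described above publishes to a topic of the form base_topic/sub_topic, with the base built from the agent's claim id. A sketch of just the topic construction follows; `build_topic` is a hypothetical name, and the claim id used in the example is made up.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "<base_topic>/<sub_topic>" into buf.
 * Returns 0 on success, or -1 if the result would not fit in len bytes. */
static int build_topic(char *buf, size_t len, const char *base_topic, const char *sub_topic) {
    assert(buf != NULL && base_topic != NULL && sub_topic != NULL);
    /* base + '/' + sub + terminating NUL */
    size_t needed = strlen(base_topic) + 1 + strlen(sub_topic) + 1;
    if (needed > len)
        return -1;
    snprintf(buf, len, "%s/%s", base_topic, sub_topic);
    return 0;
}
```

For example, a base topic of "agent/1234" and the sub topic "meta" yield "agent/1234/meta". Checking the length up front avoids publishing a silently truncated topic.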
2019-10-14 | template_foreach_fix: Fix underscore dash (#7069) | thiagoftsm
Netdata was not able to create charts when the id and name were not the same, which could happen when we were using templates. This commit fixes this specific problem, but it does not fix the problems that we have with dash and underscore.
2019-10-11 | Fix dbengine not working when mmap fails (#7065) | Markos Fountoulakis
2019-10-01 | Fix coverity error (CID 349552) double lock (#6970) | thiagoftsm
* dim_template_fix: Fix lock. We had a double lock before; this commit fixes it.
* dim_template_fix: Fix order. This commit fixes the processing order.
* dim_template_fix: Return. I am returning to the first solution, because the others are generating:
* dim_template_fix: Try to lock. This solution tries to lock the host before moving forward.
* dim_template_fix: Move chart lock. To prevent the chart from being deleted while we are linking the alarm, I am moving back the chart lock.
* dim_template_fix: Fix grammar. This commit fixes the grammar of an error message.
* dim_template_fix: bring pattern. Bring the defined pattern to the code and use netdata_rwlock_trywrlock.
* dim_template_fix: Fix format. This commit fixes a missing format.
2019-09-27Create a template for all dimensions (#6560)thiagoftsm
* health_connection: Comments inside Health Config To better understand what needs to change inside the health code, and where, I commented the functions inside this file * health_connection: Comments about Health in other files This commit adds the rest of the health comments that were missing * health_connection: Comments on health_log I had to add more comments to health_log * health_connection: Create a new variable A new variable is created to work with foreach * health_connection: Fix new option and doc The first implementation of 'foreach' had a problem; this fixes the error. This commit also brings the documentation updates * health_connection: Understanding health This commit saves the place where I am working; it contains the map for understanding the whole alarm process * health_connection: Update map I changed the position of the error message to identify the correct place to add new alarms * health_connection: End of simple alarm This commit finishes what is necessary to apply the same lookup to different dimensions in a single line * health_connection: Documentation and template steps This commit brings the missing documentation for templates, plus comments to help with the next step of applying a template to create an alarm. * health_connection: Restoring After some tests, it was found that the alarms were not working as expected * health_connection: Fix bug and bring dimension to template This commit fixes an old Netdata bug: before this, Netdata always tried to create a new entry in an index with the same id, raising an error. It also makes it possible to use 'foreach' in templates * health_connection: Fix cmake compilation There was a problem with cmake compilation, fixed by this commit * health_connection: shell script Finalize the shell script to test the PR * health_connection: Remove debug message During development, I used some messages to understand the code; this commit removes the last one * health_connection: Fix bugs This commit fixes bugs reported by tests * health_connection: Alarm working This commit brings the changes necessary for the alarms to work, but it is still missing the unlink from the newest list * health_connection: Template code written This commit finishes the creation of alarms from templates, but it has not been tested yet. * health_connection: Remove comments I am removing the comments from this PR to bring them back later * health_connection: Remove lines Another commit to restore the files to their state before they were commented * health_connection: New alarm and remove messages I am adding a new alarm to test templates with SP and removing comments used during development * health_connection: Functional test review After reviewing the functional test script, a small adjustment was necessary to test all the features available in the new version * health_connection: Free structure I am moving the list free to the correct place; the previous place was not safe * health_connection: ShellCheck This commit fixes the problems reported by ShellCheck * health_connection: Fix hash This commit fixes the hash calculation, which was using the wrong input * health_connection: Fix message error The system was showing a wrong message because, with foreach, the alarm created from a template is added to the index in a second stage * health_connection: Fix documentation In this commit I am fixing the grammar of the previous doc and adding two examples * health_connection: Fix examples This commit fixes the last two examples that were added in this PR * health_connection: Fix example doc When I corrected the grammar in the last commit, I lost a mark * health_connection: Grammar fix Fixing the grammar of the documentation * health_connection: Memory leak This commit fixes the memory leak that was present in the PR * health_connection: Reload This commit fixes the problem where alarms were not linked after receiving a SIGUSR2 * health_connection: False Positive from codacy Codacy was giving a false positive, so I changed the function to avoid it. * health_connection: dead code Remove dead code. * health_connection: Memory Leak Remove a memory leak when cleaning a simple pattern * health_connection: Script format With this commit I am formatting the last message so the terminal returns to its default color * health_connection: Script format 2 With this commit I am formatting the last message so the terminal returns to its default color * health_connection: Script format 3 With this commit I am formatting the error message so the terminal returns to its default color
2019-08-28Variable Granularity support for data collection (#6430)Markos Fountoulakis
* Variable Granularity support for data collection in the dbengine. * Variable Granularity support for data collection in the daemon. * Added tests to validate the data being queried after having been collected by changing data collection interval * Fix memory corruption * Updated database engine documentation about data collection frequency behaviour
2019-07-23Make use of GCC's __attribute__((unused)) (#6392)Andrew Clayton
* configure.ac: Add support for GCC's __attribute__((unused)) When compiling under GCC with -Wextra (along with -Wall) there are a lot of compiler warnings such as collectors/cgroups.plugin/cgroup-network.c:89:45: warning: unused parameter ‘scope’ [-Wunused-parameter] 89 | struct iface *read_proc_net_dev(const char *scope, const char *prefix) { | ~~~~~~~~~~~~^~~~~ Some of these arguments can probably be removed; others can't. GCC (and at least clang[0]) provide an __attribute__((unused)) annotation that can be used on function parameters (also on variables, functions, labels, enums, structs etc) to inform the compiler of this, which squashes warnings of the above nature. A check is added to configure.ac for the use of GCC (I believe $GCC will be set to 'yes' for clang also) and, if found, it creates __always_unused & __maybe_unused #defines set to __attribute__((unused)); otherwise it just sets them empty. If other compilers have a similar feature, this check can be adjusted to accommodate them. The reason for the two defines is that some variables may always be unused in a function, while others may or may not be, depending on #ifdefs for example. So we are able to document both cases. Subsequent commits will start making use of these to squash such compiler warnings. [0]: https://clang.llvm.org/docs/AttributeReference.html#maybe-unused-unused Signed-off-by: Andrew Clayton <andrew@zeta.digital-domain.net> * collectors/statsd.plugin: Mark a function argument as __maybe_unused In collectors/statsd.plugin/statsd.c the app function argument to STATSD_APP_CHART_DIM() might be unused if NETDATA_INTERNAL_CHECKS is not defined, in which case the debug() macro it is used in, from libnetdata/log/log.h, will be defined to a dummy function where none of the arguments are used. This fixes a compiler warning [-Wunused-parameter] when compiling with -Wextra *and* -Wall. 
Signed-off-by: Andrew Clayton <andrew@zeta.digital-domain.net> * collectors/apps.plugin: Mark a function argument as __maybe_unused In collectors/apps.plugin/apps_plugin.c the function debug_print_process_tree() takes an argument 'msg' that might be unused if NETDATA_INTERNAL_CHECKS is not defined, in which case debug_log() will be set to a dummy function that takes no arguments. This fixes a compiler warning [-Wunused-parameter] when compiling with -Wextra *and* -Wall. Signed-off-by: Andrew Clayton <andrew@zeta.digital-domain.net> * libnetdata/locks/locks: Mark function arguments as __maybe_unused In libnetdata/locks/locks.c there are a number of functions that take arguments 'file', 'function' & 'line' that might be unused if NETDATA_INTERNAL_CHECKS is not defined, in which case the debug() macro they are used in, from libnetdata/log/log.h, will be defined to a dummy function where none of the arguments are used. This fixes compiler warnings [-Wunused-parameter] when compiling with -Wextra *and* -Wall. Signed-off-by: Andrew Clayton <andrew@zeta.digital-domain.net> * libnetdata/socket/security: Mark a function argument as __maybe_unused In libnetdata/socket/security.c the function security_info_callback() takes an argument 'ret' that might be unused if NETDATA_INTERNAL_CHECKS is not defined, in which case the debug() macro it is used in, from libnetdata/log/log.h, will be defined to a dummy function where none of the arguments are used. This fixes a compiler warning [-Wunused-parameter] when compiling with -Wextra *and* -Wall. Signed-off-by: Andrew Clayton <andrew@zeta.digital-domain.net> * collectors/cgroups.plugin: Mark a function argument as __maybe_unused In collectors/cgroups.plugin/cgroup-network.c the function read_proc_net_dev() takes an argument 'scope' that might be unused if NETDATA_INTERNAL_CHECKS is not defined. This fixes a compiler warning [-Wunused-parameter] when compiling with -Wextra *and* -Wall. 
Signed-off-by: Andrew Clayton <andrew@zeta.digital-domain.net> * database/rrddim: Mark function arguments as __maybe_unused In database/rrddim.c there are a couple of functions that take an 'st' argument that might be unused if NETDATA_INTERNAL_CHECKS is not defined, in which case the debug() macro it is used in, from libnetdata/log/log.h, will be defined to a dummy function where none of the arguments are used. This fixes compiler warnings [-Wunused-parameter] when compiling with -Wextra *and* -Wall. Signed-off-by: Andrew Clayton <andrew@zeta.digital-domain.net> * database/rrdvar: Mark a function argument as __maybe_unused In database/rrdvar.c the function rrdvar_create_and_index() takes an argument 'scope' that might be unused if NETDATA_INTERNAL_CHECKS is not defined, in which case the debug() macro it is used in, from libnetdata/log/log.h, will be defined to a dummy function where none of the arguments are used. This fixes a compiler warning [-Wunused-parameter] when compiling with -Wextra *and* -Wall. Signed-off-by: Andrew Clayton <andrew@zeta.digital-domain.net>
2019-06-07Prometheus remote write backend (#6062)Vladimir Kobal
* Add Prometheus remote write backend prototype * Fix autotools issues * Send HTTP POST request * Add parameters to HTTP header * Discard HTTP response 200 * Update CMake build configuration * Fix Codacy issue * Check for C++ binary * Fix compilation without remote write backend * Add options to the installer script * Fix configure script warning * Fix make dist * Downgrade to ByteSize for better compatibility * Integrate remote write more tightly into the existing backends code * Cleanup * Fix build error * Parse host tags * Fix Codacy issue * Fix counters for buffered data * Rename preprocessor symbol * Better error handling * Cleanup * Update the documentation
2019-05-15Database engine (#5282)Markos Fountoulakis
* Database engine prototype version 0 * Database engine initial integration with netdata POC * Scalable database engine with file and memory management. * Database engine integration with netdata * Added MIN MAX definitions to fix alpine build of travis CI * Bugfix for backends and new DB engine, remove useless rrdset_time2slot() calls and erroneous checks * DB engine disk protocol correction * Moved DB engine storage file location to /var/cache/netdata/{host}/dbengine * Fix configure to require openSSL for DB engine * Fix netdata daemon health not holding read lock when iterating chart dimensions * Optimized query API for new DB engine and old netdata DB fallback code-path * netdata database internal query API improvements and cleanup * Bugfix for DB engine queries returning empty values * Added netdata internal check for data queries for old and new DB * Added statistics to DB engine and fixed memory corruption bug * Added preliminary charts for DB engine statistics * Changed DB engine ratio statistics to incremental * Added netdata statistics charts for DB engine internal statistics * Fix for netdata not compiling successfully when missing dbengine dependencies * Added DB engine functional test to netdata unittest command parameter * Implemented DB engine dataset generator based on example.random chart * Fix build error in CI * Support older versions of libuv1 * Fixes segmentation fault when using multiple DB engine instances concurrently * Fix memory corruption bug * Fixed createdataset advanced option not exiting * Fix for DB engine not working on FreeBSD * Support FreeBSD library paths of new dependencies * Workaround for unsupported O_DIRECT in OS X * Fix unittest crashing during cleanup * Disable DB engine FS caching in Apple OS X since O_DIRECT is not available * Fix segfault when unittest and DB engine dataset generator don't have permissions to create temporary host * Modified DB engine dataset generator to create multiple files * Toned down overzealous 
page cache prefetcher * Reduce internal memory fragmentation for page-cache data pages * Added documentatio