path: root/daemon
Age  Commit message  Author
2020-02-11  Docs: Promote DB engine/long-term metrics storage more heavily (#8031)  Joel Hans
* Fixes to DOCS home and README * Edit conf-guide and getting-started * Add dbengine settings to map * Fix tutorial and step-by-step * Fix artifacts of old memory mode types * A few tweaks * Push a little harder on README * Fix for Markos
2020-02-08  Fix variety of linter errors across docs (#7944)  Joel Hans
* Fixes for database/readme.md * Fixes for registry/readme.md * Fixes for daemon/readme.md * Fixes for database/engine/readme.md * Fixes for registry/readme.md * Fix for cli/readme.md * Fixes on docs/a-github-star-is-important.md * A few more documents
2020-02-07  Update `api/v1/info` (#7862)  thiagoftsm
* update_info: New variables This commit creates new variables inside the script and reads them into Netdata * update_info: API This commit changes the web api response * update_info: Disk space This commit brings the disk space to info and renames the environment variables inside Netdata * update_info: Rename variable This commit renames the environment variable * update_info: Rename response variable This commit renames a response variable * update_info: Labels This commit creates the missing labels * update_info: test before free * update_info: Doc function This commit brings documentation to the functions to give instructions to developers * update_info: Fix info message This commit removes some info messages from the error.log * update_info: Remove unnecessary ifs, considering free manual
2020-02-06  Drop dirty dbengine pages if disk cannot keep up (#7777)  Markos Fountoulakis
* Introduce dirty page pressure handling in the dbengine page cache that invalidates pages when the disk cannot keep up with the flushing speed.
2020-02-06  ACLK agent 1 (#7894)  Stelios Fragkakis
* - Add initial mqtt support * [WIP] Agent cloud link - Setup main mqtt thread to connect to a broker using V5 of the MQTT protocol (TBD) - Send alarms to "netdata/alarm" - Add error checks to handle connection failures - Add params for Broker, port Maximum concurrent sent / received messages - Dummy function to check claiming status - Generic mqtt_send command to publish message to a base topic , sub topic It will end up in the form base_topic/sub_topic - Add host/port in the connection failure error message * Test libmosquitto libs * connect to broker locally (assume localhost:1883) * subscribe to channel netdata/command * Test try a reload command to trigger health reload * publish alerts to netdata/alarm * - Fix compile issues * - Use sleep_usec instead of usleep * - Delay reconnection on failure due to misconfiguration (high cpu usage) * - Remove the TLS connection config * - Fix NETDATA_MQTT_INITIALIZATION_SLEEP_WAIT to use seconds * - Gather ACLK related code under aclk folder - Add aclk_ functions for abstract layer - Moved low level libs integration in mqtt.c * - Add README.md file with initial comment * - Clean MQTT v5 * - Code cleanup * - Remove alarm log for now - Remove the heart beat * - Remove message properties for V5 * - Remove message properties for V5 (header) * Fixed the netdata target to use a local static version of libmosquitto. The installer does not yet have steps to pull and build the local library. cd project_root git clone ssh://git@github.com/netdata/mosquitto mosquitto/ (cd mosquitto/lib && make) # Ignore the cpp error This will leave mosquitto/lib/libmosquitto.a for the build process to use. * - Fix compile issues with older < 1.6 libmosquitto lib * - Enable alarm events to check it works - Re arrange includes - Rework topic to be agent/guid/.
Actual id will be returned by the is_agent_claimed * - Add initial metadata info - Added helper function in web_api - Added a debug command (info) * Update the claiming state to retrieve the claimed id. * - Use define for constants like command and metadata topics - Function to wait for initialization of the ACLK link - New aclk_subscribe command with QOS parameter for the mqtt subscription - Use the is_agent_claimed function to get the real claim id and use it to build the topics that will be used for the cloud communication - Change in netdata-claim.sh.in to write the claim id without a trailing \n * - Remove the alarm log for now - Add code (but disabled) to send charts * - Use dummy anon, anon as username and password for testing purposes * - Use client id anon as well * Testing without TLS * Switching TLS back on to fix docker environment.
* - Added query processing An incoming URL now calls web_client_api_request_v1_data to handle a request and push the results back to the "data" topic - Move the above processing from the message callback to the query handle loop - Added helper "pause" , "resume" commands to stop and resume query processing to stress test loading the queue with queries before executing them - Changed the endpoint topics to "meta", and "cmd" (previously metadata and command) * make info message follow protocol * move metadata msg generation into new func * - Add metadata to the responses - Add hook to queue chart changes on creation and dimensions - Changed the queue mechanism to include delay for X seconds - Add delayed submission of charts to the cloud so that all DIMs are defined to avoid resubmission * - Add additional data info for aclk_queue command * - Use web_client_api_request_v1 to handle the incoming request This will handle all requests coming from the cloud * - Cleanup and aclk_query structure - Add msg_id parameter - Enable the incoming JSON request - Enable the outgoing JSON response * - Added new thread to handle query processing - Add lock and cond wait to wakeup thread when queries are submitted - Cleanup on the main init function * - Add wait time on agent init, to allow for chart, alarms and other definitions to be completed.
- During the wait time, no queries will be queued * - Send metadata on query thread init - New generic create header function for the JSON response - Pack info and charts into one message - Modified chart to remove entries (test) - Modified charts mod to remove entries e.g alarms and volatile info - Change input to aclk_update_chart (RRDHOST / instead of hostname) * - When a request fails, add to the payload - We may need to handle in a different key - Error check in json parsing * - Add dummy aclk_update_alarm command * - Move incoming request JSON parsing code away from mqtt.c - Added #ifdef ACLK_ENABLE so that we can have code merged but disabled by default - Added version in incoming and outgoing JSON dict * - Disable code if ACLK_ENABLE is not defined - Remove references to the mqtt (mosquitto) lib - Add dummy stubs in mqtt.c for completeness if ACLK_ENABLE is not defined * - Disable challenge sample code for now * - Remove libmosquitto from makefile * - Fix spaces in Makefile.am - Remove ifdef to avoid warning from LGTM * - Remove for now the code that builds an along log test message to send to the cloud * - Add check for ACLK_ENABLE definition and avoid calling the chart update functions * - Remove commented code * - Move source files to the correct place (ACLK_PLUGIN_FILES) * - Remove include file thats not needed * - Remove include file thats not needed - Add improved checks for load_claiming_state() * - Fix error message. 
Used error() that also logs errno and message * - Fix some codacy issues * - Fix more codacy issues, code cleanup * - Revert code to address codacy warnings * - Revert spaces added in a previous commit by mistake * clean up if/else nest * print error if fopen fails * minor - error already logs errno * - Fix version formatting * - Cleanup all ACLK related compiler warnings - Re-arrange include files - Removed unused defines * - More compilation warnings fixed - Bug with thread creation fixed * - Add condition to skip compilation of the ACLK code entirely. Add env variable ACLK="yes" to enable * - Add condition to skip the libmosquitto * - Change feature flag from ACLK_ENABLE to ENABLE_ACLK in accordance with the rest of ENABLE_xx flags - Typo in info message fix Co-authored-by: Andrew Moss <1043609+amoss@users.noreply.github.com> Co-authored-by: Timo <6674623+underhood@users.noreply.github.com>
2020-02-03  Fix cmake build error (#7960)  Markos Fountoulakis
2020-02-01  Parse host tags (#7702)  Vladimir Kobal
* Fix memory leaks * Check for configuration options * Parse simple tags * Parse JSON tags * Remove an unnecessary check * Parse a JSON object * Parse a JSON array * Update the documentation * Fix host locks
2020-01-31  Fixes a bug in DO_NOT_TRACK expression (#7929)  James Mills
* Fixed bug in DO_NOT_TRACK expression * Fix kickstart-static64 checksum in docs. Co-authored-by: Austin S. Hemmelgarn <ahferroin7@gmail.com>
2020-01-30  Adds support for opting out of telemetry via the DO_NOT_TRACK environment variable (#7846)  James Mills
* Added support for opting out of telemetry via the DO_NOT_TRACK environment variable * Added support for DO_NOT_TRACK=1 in anonymous-statistics.sh and minor cleanup in Dockerfile and run.sh entrypoint * Allow DO_NOT_TRACK to be either non-zero or non-empty * Update md5sum of kickstart-static64.sh in docs * Fixed a bug in netdata-installer.sh * Revert changes to daemon/main.c (testing only) * Update docs/anonymous-statistics.md Co-authored-by: Mansour Behabadi <57921115+ncmans@users.noreply.github.com>
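The opt-out check described in this commit could look roughly like the sketch below. It is an illustration only, not the code from anonymous-statistics.sh: the function name `telemetry_opted_out` is hypothetical, and the reading of "non-zero or non-empty" as "any non-empty value other than 0" is an assumption.

```python
import os


def telemetry_opted_out(environ=None):
    """Return True when DO_NOT_TRACK asks for telemetry to be disabled.

    Assumption: any non-empty value other than "0" counts as an opt-out.
    """
    env = os.environ if environ is None else environ
    value = env.get("DO_NOT_TRACK", "").strip()
    return value not in ("", "0")


if __name__ == "__main__":
    print("telemetry disabled:", telemetry_opted_out())
```

Passing the environment as a parameter keeps the check easy to exercise without touching the real process environment.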
2020-01-29  Add disk size detection to system-info.sh. (#7866)  Austin S. Hemmelgarn
* Add rudimentary disk size detection to system-info.sh. This adds really basic detection of total disk size to the system-info.sh script. This detection is _wildly_ inaccurate in many cases, but it's not realistically possible to make it much better than what's added here without making the script exponentially more complicated (there are just way too many edge cases and special conditions to worry about). This adds the following keys to the output of the script: * NETDATA_TOTAL_DISK_SIZE: This indicates the total detected disk size in bytes. * NETDATA_DISK_DETECTION: This indicates what detection method was used for determining the total disk size. It may return either `df` (which works almost everywhere but is wildly inaccurate) or `sysfs` (which is Linux-specific, but is 100% accurate for a majority of cases). On most platforms, this parses the output of the `df` command, limiting only to filesystems that are used on fixed disks, which provides reasonably accurate results in most trivial cases, but kind of falls apart the moment people are doing anything remotely complicated in terms of storage (like using a logical volume manager or under-committing their disk space). On Linux, it will preferentially try to parse the info out of `/sys/block`, filtering on device major numbers that are actually used for fixed disks and excluding devices that are indicated to be removable. This provides a very accurate result if the system does not use removable media as primary storage, but requires that the user who runs the script can read the contents of `/sys/block`. * Add VirtIO block device major number to the list of scanned devices. * Actually handle VirtIO block devices correctly. * Fixes for macOS handling.
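The `/sys/block` approach described above can be sketched as follows. This is not the shell code from system-info.sh: the function name and the set of major numbers are assumptions for illustration (3, 8 and 259 are the usual IDE, SCSI/SATA and extended block majors; the VirtIO major mentioned in the commit is not included because it is not stated here). The sketch relies on the sysfs convention that `size` is reported in 512-byte sectors.

```python
import os

# Assumption: majors treated as "fixed disks" in this sketch only.
FIXED_DISK_MAJORS = {3, 8, 259}


def total_disk_size(sys_block="/sys/block", majors=FIXED_DISK_MAJORS):
    """Sum the size in bytes of non-removable devices with a known major."""
    total = 0
    for name in sorted(os.listdir(sys_block)):
        base = os.path.join(sys_block, name)
        try:
            with open(os.path.join(base, "removable")) as f:
                removable = f.read().strip() == "1"
            with open(os.path.join(base, "dev")) as f:
                major = int(f.read().split(":")[0])  # "major:minor"
            with open(os.path.join(base, "size")) as f:
                sectors = int(f.read().strip())
        except (OSError, ValueError):
            continue  # unreadable or malformed entry: skip it
        if removable or major not in majors:
            continue
        total += sectors * 512  # sysfs reports 512-byte sectors
    return total
```

Taking the directory as a parameter makes it possible to point the function at a fake tree when the real `/sys/block` is not readable.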
2020-01-28  Missing extern (#7877)  thiagoftsm
* missing_extern: Fix missing Fix few externs that were missing in global variables * missing_extern: Variables This commit declares the variables inside .c files
2020-01-24  Improve the system-info.sh script to report CPU and RAM meta-data. (#7815)  Austin S. Hemmelgarn
* Add CPU information collection for Linux and FreeBSD. This adds logic to system-info.sh to collect info about the system's CPU. It adds the following keys to the output of the script: * NETDATA_CPU_LOGICAL_CPU_COUNT: This reports the number of logical CPU cores the system is actually using (including offline ones). This may differ from the CPU's advertised core count, but is what most people actually care about. * NETDATA_CPU_VENDOR: This reports the CPU manufacturer. This is needed because some systems do not include the manufacturer name in the CPU model name. * NETDATA_CPU_MODEL: This reports the CPU model. It may or may not include any of a model number, intended operating frequency, and manufacturer name. * NETDATA_CPU_FREQ: This reports a best guess at the design frequency for the CPU. It may instead be the max boost frequency. This is reported as a number with associated units, which will usually be either hertz or megahertz. * NETDATA_CPU_DETECTION: This reports the method used to detect the CPU information. It will be either 'none' if no detection was successful, or a space-separated list of detection methods. This may potentially use any of the following detection methods: * lscpu: Uses a mix of information from across the system. Requires the `lscpu` command to be installed. * dmidecode: Uses the information from the DMI tables. Requires hardware support as well as the `dmidecode` command. * nproc: Uses the `nproc` command from the GNU coreutils to get a count of logical processors. * sysctl: Uses the `sysctl` command on FreeBSD to fetch information. * sysfs: Uses /sys on Linux to fetch information. * procfs: Uses /proc/cpuinfo on Linux to fetch information. * uname: Uses the `uname` command from the GNU coreutils to get CPU model and vendor information. All values that were not successfully detected should read back as 'unknown'. Some values may have spaces present, and thus are quoted in the output.
* Collect total system RAM info in system-info.sh This collects info about the total usable system RAM in the system-info.sh script. It adds the following two keys to the output of the script: * NETDATA_TOTAL_RAM: Reports the total usable system RAM as a number with an associated unit, usually as bytes or kilobytes. Reports 'unknown' if this couldn't be determined. * NETDATA_RAM_DETECTION: Indicates how we detected the total RAM, or 'none' if we couldn't figure out the total RAM. * Make lscpu output parsing more robust. * Remove extra quotes. The output is not parsed as shell variables, but using a special parser that just reads everything from the `=` to EOL as the value. * Coerce output to base units. This properly converts the output for CPU frequencies and RAM sizes to use base units of Hertz or bytes, allowing for simpler parsing of the output. * Fix incorrect number handling in total RAM parsing. * Correctly fix incorrect number handling in total RAM parsing. * Fix parsing of `lscpu` output. This properly recognizes the CPU frequency value as MHz and truncates the value to an integer.
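The parsing rule described above (everything from the `=` to the end of the line is the value) and the coercion to base units can be illustrated with a short sketch. The function names and the unit table are hypothetical and only mirror the behaviour the commit message describes; the agent's real parser is written in C and is not shown here.

```python
# Assumption: frequency units this sketch understands.
_HZ_UNITS = {"hz": 1, "khz": 10**3, "mhz": 10**6, "ghz": 10**9}


def parse_system_info(text):
    """Parse KEY=VALUE lines, taking everything after the first '=' as the value."""
    info = {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        info[key.strip()] = value  # no shell unquoting, no trimming of the value
    return info


def to_hertz(text):
    """Convert e.g. '2400.000 MHz' to an integer number of hertz (truncated)."""
    number, unit = text.split()
    return int(float(number) * _HZ_UNITS[unit.lower()])


if __name__ == "__main__":
    sample = "NETDATA_CPU_MODEL=Example CPU @ 2.40GHz\nNETDATA_TOTAL_RAM=8589934592\n"
    print(parse_system_info(sample))
    print(to_hertz("2400.000 MHz"))
```

Splitting on the first `=` only means values that themselves contain `=` or spaces survive intact, which is why the extra quotes could be dropped from the script output.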
2020-01-21  Issue 7488 docker labels (#7770)  Andrew Moss
Improve the metadata detection for containers. The system_info structure has been updated to hold separate copies of OS_NAME, OS_ID, OS_ID_LIKE, OS_VERSION, OS_VERSION_ID and OS_DETECTION for both the container environment and the host. This new information is communicated through the /api/v1/info endpoint. For the streaming interface a partial copy of the info is carried until the stream protocol is upgraded. The anonymous_statistics script has been updated to carry the new data to Google Analytics. Some minor improvements have been made to OS-X / FreeBSD detection, and the detection of virtualization. The docs have been updated to explain how to pass the host environment to the docker container running Netdata.
2020-01-17  Fix libuv IPC pipe cleanup problem (#7778)  Markos Fountoulakis
* Fix libuv IPC pipe cleanup problem
2020-01-09  error exit when rrdhost localhost init fails #7504 (#7663)  Timo
* error exit when rrdhost localhost init fails #7504
2019-12-20  Set standard name to non-libnetdata threads (libuv, pthread) (#7584)  Adrien Mahieux
* [libnetdata/threads] Add uv_thread_set_name This is inspired from thread_set_name() but for libuv threads. Both are based on pthread, but for uv we need to call it with the uv_thread_t pointer, instead of being the thread that calls the function for itself. * [exporting] Set libuv threadname to "EXPORTING-index" * [database/engine] Set libuv thread name to "DBENGINE" * [daemon/command] Set libuv thread name to "DAEMON-COMMAND" * [collectors/proc] Set pthread name to "PLUGIN[cpuidle]" * Use new 'thread_set_name_np' name
2019-12-19  Agent claiming (#7525)  Markos Fountoulakis
Initial infrastructure support for agent claiming. This feature is not currently enabled as we are still finalizing the details of the cloud infrastructure w.r.t. agent claiming. The feature will be enabled when we are ready to release it.
2019-12-18  Fix race condition in dbengine (#7565)  thiagoftsm
* fix_db_race_condition: unit test Adjust unit test for dbengine * fix_db_race_condition: page cache Fix database * fix_db_race_condition: Missing function call This commit brings the correct function call inside rrdengine.c
2019-12-17  Revert "Fix race condition in dbengine (#7533)" (#7560)  Andrew Moss
We are removing this fix for further internal testing, it will be returning after we iron out some bugs. This reverts commit 53ab093d84919c743450199a31bca9a13412e451.
2019-12-16  Fix race condition in dbengine (#7533)  Markos Fountoulakis
2019-12-16  Labels issues (#7515)  Andrew Moss
Initial work on host labels from the dedicated branch. Includes work for issues #7096, #7400, #7411, #7369, #7410, #7458, #7459, #7412 and #7408 by @vlvkobal, @thiagoftsm, @cakrit and @amoss.
2019-12-15  Fix valgrind errors (#7532)  Markos Fountoulakis
* Add callbacks to pipe handle destructors * Fix valgrind errors
2019-12-12  Implement the main flow for the Exporting Engine (#7149)  Vladimir Kobal
* Add top level tests * Add a skeleton for preparing buffers * Initialize graphite instance * Prepare buffers for all instances * Add Graphite collected value formatter * Add support for exporting.conf read and parsing * - Use new exporting_config instead of netdata_config * Implement Graphite worker * Disable exporting engine compilation if libuv is not available * Add mutex locks - Configure connectors as connector_<type> in sections of exporting.conf - Change exporting_select_type to check for connector_ fields * - Override exporting_config structure if there is no exporting.conf so that look ups don't fail and we maintain backwards compatibility * Separate fixtures in unit tests * Test exporting_discard_responce * Test response receiving * Test buffer sending * Test simple connector worker - Instance section has the format connector:instance_name e.g graphite:my_graphite_instance - Connectors with : in their name e.g graphite:plaintext are reserved So graphite:plaintext is not accepted because it would activate an instance with name "plaintext" It should be graphite:plaintext:instance_name * - Enable the add_connector_instance to cleanup the internal structure by passing NULL,not NULL arguments * Implement configurable update interval - Add additional check to verify instance uniqueness across connectors * Add host and chart filters * Add the value calculation over a database series * Add the calculated over stored data graphite connector * Add tests for graphite connector * Add JSON connector * Add tests for JSON formatting functions * Add OpenTSDB connector * Add tests for the OpenTSDB connector * Add temporary notes to the documentation
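The `connector:instance_name` section naming described above can be illustrated by splitting on the last colon, which reproduces the stated behaviour: `graphite:plaintext` is read as an instance named "plaintext", while `graphite:plaintext:instance_name` keeps `graphite:plaintext` as the connector. The function name is hypothetical and this is a sketch of the naming rule, not the exporting engine's C parser.

```python
def split_instance_section(section):
    """Split an exporting.conf section name into (connector, instance_name).

    The text after the last ':' is taken as the instance name, so connector
    types that contain ':' themselves need an extra ':instance_name' suffix.
    """
    connector, sep, instance = section.rpartition(":")
    if not sep or not connector or not instance:
        raise ValueError("expected connector:instance_name, got %r" % section)
    return connector, instance


if __name__ == "__main__":
    for name in ("graphite:my_graphite_instance",
                 "graphite:plaintext:my_instance",
                 "graphite:plaintext"):
        print(name, "->", split_instance_section(name))
```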
2019-12-04  Implement netdata command server and cli tool (#7325)  Markos Fountoulakis
* Checkpoint commit (POC) * Implemented command server in the daemon * Add netdatacli implementation * Added prints in command server setup functions * Make libuv version 1 a hard dependency for the agent * Additional documentation * Improved accuracy of names and documentation * Fixed documentation * Fixed buffer overflow * Added support for exit status in cli. Added prefixes for exit code, stdout and stderr. Fixed parsers. * Fix compilation errors * Fix compile errors * Fix compile errors * Fix compile error * Fix linker error for muslc
2019-12-04  Docs: Fixes to new health documentation structure (#7419)  Joel Hans
* Fixed link * Added GA links
2019-12-03  Documentation on per-chart configuration options (#7345)  Joel Hans
* Initial re-setup * Working on dimension settings * Finished with dimension settings * Grammar fixes and better incremental description * Final few fixes * Fix for Markos
2019-11-14  Updating the Travis pipeline (issue 7189) (#7312)  Andrew Moss
Added a linting pass. This is non-blocking but will output a measure of how much each .c and .h file deviates from our current .clang-format. Changed the standard build to include all of the warnings that we are using in dev. Added the dependencies for dbengine and confirmed that the standard build enables dbengine. Fixed the original warnings.
2019-11-11  Makefile.am files indentation (#7252)  Konstantinos Natsakis
* Use 4 spaces for indentation of non-recipe lines in Makefile.am files * Be more consistent in the use of space before = in Makefile.am files
2019-10-30  Fix counter reset detection (#7220)  Markos Fountoulakis
* Removed support for 16-bit and 8-bit counter overflow * Improve behaviour of counter overflow detection versus counter resets. * Added support for signed 32-bit and 64-bit limits for counter overflows. * Fixed signed incremental counter issues and added unit tests.
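One way to picture the overflow-versus-reset distinction is the sketch below. It is not the agent's implementation: the function name, the `max_delta` plausibility threshold, and the rule "pick the smallest supported limit that still fits the previous value" are all assumptions made for illustration. Only the set of limits (signed and unsigned 32-bit and 64-bit) comes from the commit message.

```python
# Supported wrap-around limits: signed/unsigned 32-bit and 64-bit.
COUNTER_LIMITS = (2**31 - 1, 2**32 - 1, 2**63 - 1, 2**64 - 1)


def counter_delta(last, new, max_delta):
    """Return the increment between two samples of an incremental counter.

    If the counter went backwards, first assume it wrapped at the smallest
    limit that can hold `last`. If the resulting increment is implausibly
    large (hypothetical `max_delta` threshold), treat it as a reset and
    count from zero instead.
    """
    if new >= last:
        return new - last
    limit = next((lim for lim in COUNTER_LIMITS if last <= lim), None)
    if limit is not None:
        wrapped = (limit - last) + new + 1  # +1 for the step from limit to 0
        if wrapped <= max_delta:
            return wrapped
    return new  # reset: the counter restarted from zero


if __name__ == "__main__":
    print(counter_delta(2**32 - 10, 5, 1000))  # wrapped at the 32-bit limit
    print(counter_delta(5000, 20, 1000))       # treated as a reset
```

With no 8-bit or 16-bit limits in the table, a small counter that goes backwards is treated as a reset rather than as an overflow.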
2019-10-24  Fixing DNS-lookup performance issue on FreeBSD. (#7132)  Andrew Moss
Our default configuration includes: allow connections from = localhost * allow management from = localhost The problem occurs when a connection is received that passes the `allow connections` pattern match, but fails the ACL check for `allow management`. During the failure processing path the DNS lookup is triggered to allow the FQDN to be checked against the pattern. On a FreeBSD system this lookup fails more slowly than linux and causes a visible performance problem during stress-testing. The fix adds a heuristic to analyse the patterns and determine if it is possible to match a DNS name, or only match a numeric IP address (either IPv4 or IPv6), or only match a constant value. This heuristic is used to disable the DNS checks when they cannot produce anything that may match the pattern. Each heuristic is evaluated once, when the configuration is loaded, not per-connection to the agent. Because the heuristic is not exact it can be overridden using the new config options for each of the ACL connection filters to set it to "yes", "no" or "heuristic". The default for everything *except* the netdata.conf ACL is "heuristic". Because of the numeric-patterns in the netdata.conf ACL the default is set to "no".
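A rough sketch of such a heuristic follows. The classification rules here are assumptions chosen for illustration and are not taken from the Netdata source: a token made only of hex digits, dots, colons, slashes and wildcards is treated as numeric, a token without wildcards is treated as a constant, and anything else is treated as a pattern that could match a DNS name. Like the heuristic in the commit, it is inexact (for example, a name made only of hex letters is misread as numeric).

```python
import re

# Assumption: characters that can appear in a numeric IPv4/IPv6 pattern.
_NUMERIC = re.compile(r"^[0-9a-fA-F.:/*]+$")


def classify_token(token):
    """Classify one ACL pattern token as 'numeric', 'constant' or 'name'."""
    token = token.lstrip("!")  # ignore the negation prefix
    if _NUMERIC.match(token):
        return "numeric"
    if "*" not in token:
        return "constant"
    return "name"


def dns_lookup_needed(pattern):
    """Evaluate once per ACL at configuration load, not per connection."""
    return any(classify_token(tok) == "name" for tok in pattern.split())


if __name__ == "__main__":
    for acl in ("localhost *", "10.* 192.168.*", "*.example.com"):
        print(repr(acl), "->", dns_lookup_needed(acl))
```

Because the result depends only on the configured pattern, it can be cached alongside the ACL and overridden by an explicit "yes" or "no" setting.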
2019-10-24  detect if the disk cannot keep up with data collection (#7139)  Markos Fountoulakis
* Adjust dbengine flushing speed more dynamically * Added error tracking statistics for failure to flush events * Added alarm for dbengine flushing errors * Improved dbengine accounting for pages committed to be written
2019-10-17  feat(reaper): Add process reaper support (#7059)  Steven Hartland
##### Summary Add a child process reaper to the main netdata app if running as init (pid = 1). This prevents zombie processes when a child is re-parented to netdata when it is running in a container. Also: * Few style cleanups to match surrounding code. Fixes: #6033 ##### Component Name netdata binary ##### Additional Information This re-purposes old commented out code in `popen.c`, which already implemented part of the required process tracking. Without this on a standard netdata docker install we saw at least one zombie `timeout` process straight after the container was started.
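The general reaping technique can be sketched with `waitpid` in non-blocking mode. The sketch below uses Python's `os.waitpid` rather than the C code in `popen.c`, and the function names are hypothetical; it only shows the idea of collecting every exited child when the process runs as PID 1.

```python
import os


def running_as_init():
    """True when this process is PID 1, e.g. the entrypoint of a container."""
    return os.getpid() == 1


def reap_children():
    """Collect all children that have already exited, without blocking.

    Returns the list of reaped PIDs so the caller can update its own tracking.
    """
    reaped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # no children at all
        if pid == 0:
            break  # children exist, but none has exited yet
        reaped.append(pid)
    return reaped


if __name__ == "__main__":
    if running_as_init():
        print("reaped:", reap_children())
```

A real daemon would call this from a SIGCHLD handler or a periodic loop, and would need to avoid stealing exit statuses from children it waits for explicitly elsewhere.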
2019-10-15  Add CMocka unit tests (#6985)  Vladimir Kobal
* Add str2ld test * Build test with Autotools * Add storage_number test * Configure tests in CMake
2019-10-14  Add dbengine RAM usage statistics (#7038)  Markos Fountoulakis
* Add dbengine RAM usage statistics * Fix code style * Aggregate dbengine statistics across all slave hosts and localhost
2019-10-07  Remove hard cap from page cache size to eliminate deadlocks. (#7006)  Markos Fountoulakis
* Remove page cache error detection and deadlock resolution * Change page cache logic to disallow deadlocks due to too many API users * Updated documentation * Changed default and minimum page cache size values to 32 and 8 MiB respectively
2019-10-03  Fix remark warnings for Daemon README (#6920)  Promise Akpan
* fix remark warnings for daemon README * rewrap and indent to make it easier to read * make character limit to 120 not 80
2019-10-03  Make dbengine the default memory mode (#6977)  Markos Fountoulakis
* Basic functionality for dbengine stress test. * Fix coverity defects * Refactored dbengine stress test to be configurable * Added benchmark results and evaluation in dbengine documentation * Make dbengine the default memory mode
2019-10-02  Coverity 20190924 (#6941)  thiagoftsm
* coverity_20190924: Fix 215633 In the switch, the library stops if this pointer is NULL, so there is no need to proceed with the tests * coverity_20190924: Fix 338067 The current code tries to copy the same size as the variable; another possible solution would be to use a function to sanitize the code, but I will try this first * coverity_20190924: Fix 348638 Since we test the variable's value one line above, the division will always happen * coverity_20190924: Fix 348640 In this specific case a memory leak is not possible, and valgrind confirms this, but I am adding a new variable on the stack to silence the warning
2019-09-24  Detect deadlock in dbengine page cache (#6911)  Markos Fountoulakis
* Detect deadlock in dbengine page cache when there are too many metrics and print error message * Resolve dbengine deadlock by dropping metrics when page cache is too small and define relevant alarms * Changed printing deadlock errors to only happen once per dbengine instance
2019-09-12  Stress test insertions into dbengine and bugfixes (#6814)  Markos Fountoulakis
* Fix memory corruption during deallocation of page cache * Refactored dataset generator in order to support the upcoming self-validating stress test and multithreading. * Fix starvation in database engine loop when the command queues are continuously populated * Fixing disk quota limits for dbengine dataset generator
2019-09-04  Add high level explanation of dashboard contents (#6648)  Joel Hans
* Moved content about charts/families/contexts to web * Working on dashboard docs * Working on dashboard docs * Improvements to charts, families, contexts * Working more on dashboard overview * More improvements to web dashboards docs * Fixing broken links * More fixes to the dashboard areas * Grammar check on revised docs * Fixing broken table * Addressing Chris' comments * Addressing Cosmix's comments plus a few additions * Fixing lint issues * Fixing linter errors and re-adding lost links * Addressing Cosmix' requests * Fixing context issue
2019-08-28  Variable Granularity support for data collection (#6430)  Markos Fountoulakis
* Variable Granularity support for data collection in the dbengine. * Variable Granularity support for data collection in the daemon. * Added tests to validate the data being queried after having been collected by changing data collection interval * Fix memory corruption * Updated database engine documentation about data collection frequency behaviour
2019-08-15  Fix Markdown Lint warnings (#6664)  Promise Akpan
* make remark access all directories * detailed fix after autofix by remark lint * cross check autofix for this set of files * crosscheck more files * crosschecking and small fixes * crosscheck autofixed md files
2019-08-13  Change "netdata" to "Netdata" in all docs (#6621)  Joel Hans
* First pass of changing netdata to Netdata * Second pass of netdata -> Netdata * Starting work on netdata with no whitespace after * Pass for netdata with no whitespace at the end * Pass for netdata with no whitespace at the front
2019-08-09  Better system OS detection for RHEL6 and Mac OS X (#6612)  Piotr Roszatycki
* On RHEL 6 file /etc/lsb-release is pretty useless * Better OS version detection for Mac OS X
2019-07-18  Do not try to write log in /tmp (#6491)  Chris Akritidis
2019-07-12  Add global configuration option for zero metrics (#6419)  Vladimir Kobal
* Add global configuration option for zero metrics * Add the option to the cgroup plugin * Add the option to the proc plugin (diskstats, meminfo, net_dev, netstat, sctp_snmp, snmp, snmp6, sockstat, sockstat6, synproxy, vmstat, system_edac_mc, system_node, btrfs, ksm, zfs) * Add the option to the macos plugin * Add the option to the freebsd plugin (devstat, getifaddrs, getmntinfo, sysctl) * Change the option behaviour with the 'auto' value * Add the option to the tc plugin * Update the documentation
2019-07-01  Easily disable alarms, by persisting the silencers configuration (#6360)  thiagoftsm
This PR was created to fix #3414, completing the job initiated by Christopher. Among the newest features: JSON inside the core - We are bringing to the core the capacity to work with JSON files, either using the JSON-C library if it is present on the system or using the JSMN library, which was incorporated into our core. JSON-C is preferred because it is a more complete library, but if the user does not have it installed we keep JSMN so that we do not lose the feature. Health LIST - We are adding one more command to the Health API: with LIST it is possible to get the alarms active in Netdata in JSON format. Health reorganized - Previously we had duplicated code in different files; this PR fixes this (Thanks @cakrit !), and Health is now better organized. Removing memory leak - The first implementation of json.c was creating SILENCERS without linking it anywhere. Now it has been linked properly. Script updated - We are bringing some changes to the script that tests Health. This PR also fixes the race condition created by the previous new position of the SILENCERS creation. I had to move it to daemon/main.c because, after various tests, it was confirmed that the error could happen in different parts of the code if it was not initialized before the threads start. Component Name health directory health-cmd Additional Information Fixes #6356 and #3414
2019-07-01  Repeating alarm notifications (#6309)  thiagoftsm
* Alarm_repeat merging the original! * Alarm_repeat binary tree! * Alarm_repeat binary tree finished! * Alarm_repeat move function and format string * Alarms bringing a new Binary tree * Alarms fixing the last two * Alarm_repeat useless var! * Alarm fix format and repeat alarm! * Alarm_backend steps! * Alarm_repeat stopping to test cloud! * Alarm_repeat stopping to test cloud 2! * Alarm_repeat fixing when restart!
2019-06-21  Handle file descriptors running out (#6303)  Markos Fountoulakis
* Handle file descriptors running out * Added alarm for dbengine FS and I/O errors * more verbose alarm message * * Added File-Descriptor budget to Database Engine instances. * Changed FD budget of the web server from 50% to 25%. * Allocated 25% of FDs to dbengine. * Created a new dbengine global FD utilization chart.
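The 25% / 25% split described above can be illustrated with a small calculation against the process file-descriptor limit. The function name and the returned dictionary are hypothetical, and reading the limit with `resource.getrlimit(RLIMIT_NOFILE)` is an assumption about where the total comes from; only the two 25% shares are taken from the commit message.

```python
import resource


def fd_budgets(soft_limit=None):
    """Split the open-file limit: 25% web server, 25% dbengine, rest for others."""
    if soft_limit is None:
        soft_limit, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    quarter = soft_limit // 4
    return {
        "web": quarter,
        "dbengine": quarter,
        "other": soft_limit - 2 * quarter,
    }


if __name__ == "__main__":
    print(fd_budgets(1024))
```

Keeping a fixed share per subsystem means one component exhausting its budget cannot starve the others of descriptors.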