path: root/health
Age Commit message Author
2019-02-22 Added rocketchat to method-list (#5471) tctovsli
2019-02-20 Delay raising the linux power supply alarm, to prevent errors during netdata startup (#5447) Chris Akritidis
2019-02-20 Correct duplicate flag enum in health.h (#5441) Chris Akritidis
2019-02-14 Add documentation for network interfaces (#5381) Vladimir Kobal
* Add documentation for network interfaces * Minor fix * Format chart names * Add an example
2019-02-12 automatic shellchecking on .sh.in scripts (#5165) Paweł Krupa
* CI linting .sh.in files * make linter passing
2019-02-12 Lint alarm notify (#5164) Paweł Krupa
* manually lint alarm-notify.sh code * run automatic bash formatter (shfmt) * bring back unused variable
2019-02-11 Add cgroup cpu and memory limits and alarms (#5172) Vladimir Kobal
* Add memory limit variables * Add memory usage alarms * Add CPU limit variables * Add cpu usage alarm * Fix quota calculation, minor cleanup * Update the documentation * Add charts with limits * Fix Codacy issues * Change units for the mem_usage_limit chart * Change the behaviour of the cpu_limit chart
2019-02-08 fix warning condition for mem.available (#5353) Chris Akritidis
* fix warning condition for mem.available * Fix system.ram ram_available as well
2019-01-31 registrypath is not used and causes issues in slack (#5302) Chris Akritidis
2019-01-28 Cloud Sign-In (#5095) George Moschovitis
* Manually merged changes from old hub-support branch, tracking #131 Call claim url #4771 Claim ui improvements #4771 Cleanup Implement Sign Out Introduced sign-in modal #3990 Added sign-in button More work on the iframe trick More work More work on the logic, removed old obsolete stuff Close modal Implement account menu Minor rename Renamed my-netdata to My Agents Show migrate button Collect known agents Work on migrateRegistryDidClick Minor Actually show agents from netdata cloud in the menu Some cleanup Keep all the alternate_urls for each agent Fix for tooltips over SignIn/AccountMenu * Actually use NETDATA.registry.cloudBaseURL Tricky! * Hide switch identity when signed-in #153 * Cleanup * Refresh menu on sign-in * Disable cloud functionality if cloud base url is not set. This will allow the merging of the branch into master, so we can avoid nasty rebases. * Updated to use the latest API endpoints * Fixed a couple of LGTM warnings * Improved migration algorithm, some cleanup.
* Update My-Netdata menu on sign-out * Minor * Replaced modal with window * Update the My-Agents menu after migration, cleanup * Make the agent work after switching cloudBaseURL, cleanup * Introduced event tracing for analytics * Minor * Removed GA * Fixed error reported by LGTM * Only send the diff when syncing agents to ameliorate the load on the backend, cleanup * Reverted My-Netdata name, added some logging * Add Netdata Cloud menu item * Minor * Use the merge: false option and a fix * Added loading message in my-netdata menu * Show error if we cannot connect to netdata.cloud * Minor * Implemented deleteCloudKnownAgentURL api call, use it in my-netdata menu. * Removed menu entry * Disable my-netdata menu if user is not signed-in and using the global registry * Stop accessing the registry if it's not used. * Mask the agent url if the registry is in 'disabled' mode * Filter masked urls * Improved filtering of masked urls * Try to eagerly initialize the account ui to improve perceived performance * Minor * Don't search for other people's urls in cloud-enabled mode. * Added basic my-netdata filtering * Filter streamed host, aesthetic fixes * Minor * Some improvements of the filter ui * Removed What is this * Added placeholder to input, other fixes #240 * Show message if no databases match filter criteria * Fixed bug where agent lists were not merged * Minor * Hide modal if it redirects to self.
* autocomplete off for filter input * Enable delete for custom registries, don't show error if delete fails * Filter agents without urls * Fix LGTM warning * Minor * Concatenate at client side, used the faster merge: false path * Added a clear button to the filter for extra usability * Minor * Minor * Improvements for small screens (more needed) * Combined my-netdata menu and hostname * Re-enabled registry masking * Show agent-filter only when signed-in * Improved syncAgents * Don't mask if using custom registry * Reject agents with empty urls * Filter valid agents * Fixed a couple of bugs * Applied Chris' fixes * Fix in registry.c * Cleanup * Only sync once * Implemented forceSync * Added what is this * sso, wip * Working SSO sign-in/sign-out, cleanup * Added Chris' patch * Added a modal that explains what synchronize is doing * Use sso-agent * Use origin as query param in sign-in * iframe -> origin * Pass machine_guid to sso * Make sure that the current netdata agent is synchronized hub#262 * Normalize originURL * Reenable tryFastInitCloud() * Updated to the latest endpoints * Support synchronizing to multiple cloud accounts * Set default cloud base url to netdata.cloud * Fix filter issues with Firefox * Fix for double tooltip on sign-in * Show known servers in console for debugging purposes * Don't block on errors to delete from registry when signed in * Disable tryFastInitCloud * Improved styling for filter input * Improved styling in my-netdata menu * Display the registry url in the sync-registry modal * agents -> nodes in texts * Support for sso-precheck * Do not implicitly synchronize custom registries. * Improvement to syncAgents (more coming) * More fixes * Don't sign in users with private registries if they don't consent on the sync * Set netdataRegistryAfterMs = 0 * Don't pass url to sso-agent * Added Chris' patch to alarm-notify * Refactored syncAgent/mergeAgents, make sure current Agent is synced on sign-in. 
* Fix for LGTM warning * Minor * Fix for a XSS warning * Extra check for dataLayer
2019-01-24 Update alarm-notify.sh.in (#5263) Chris Akritidis
Fixes #5261
2019-01-22 alarm-notify: Add Prowl integration for iOS users. (#5132) Austin S. Hemmelgarn
* alarm-notify: Add Prowl integration for iOS users. This adds the ability for Netdata to send push notifications to iOS devices via prowlapp.com. Delivered notifications have a similar format to that used with Fleep.io, and include a URL that links back to the Netdata system that originally issued the alert. There is a hard limit on API calls to Prowl of 1000 per IP per hour. Prowl provides support for issuing notifications to multiple recipients simultaneously by specifying more than one API key with a request, and this code takes advantage of that so that any notification only results in at most one API call. Fixes: #3788 * Numerous small fixes.
2019-01-15 Port ACLs, Management API and Health commands (#4969) Chris Akritidis
##### Summary
fixes #2673 fixes #2149 fixes #5017 fixes #3830 fixes #3187 fixes #5154

Implements a command API for health which will accept commands via a socket to selectively suppress health checks. Allows different ports to accept different request types (streaming, dashboard, api, registry, netdata.conf, badges, management). Removes support for multi-threaded and single-threaded web servers.

##### Component Name
health, daemon
2019-01-03 Add variables to alarm-notify.sh to show number of warning and critical alarms, evaluated expression and expression variable values. Use them in the email notifications (#5096) Chris Akritidis
2018-12-23 Fixed typo (#5054) SAMUEL NELA
Fixed typo in Alerta Readme
2018-12-20 New option clear_alarm_always (#5050) Piotr Roszatycki
##### Summary
After a restart, Netdata might reset the status of a previous alarm to UNINITIALIZED. In that case the alarm status in Alerta may not be cleared correctly. Sending the notification once again is not a problem, because Alerta deduplicates alerts by itself.

##### Component Name
health/notifications

##### Additional Information
It is just an additional variable in health_alarm_notify.conf
2018-12-18 remove cross-directory dependency in build system (#5012) Paweł Krupa
* remove cross-directory dependency in build system * remove unused pythondir_POST replace sysconfdir/netdata with configdir
2018-12-17 Kavenegar returns 200 (#5020) SS Salehi
##### Kavenegar returns 200 when the request has been successful. ##### Kavenegar Notification
2018-12-17 Fix missing method_name: kavenegar (#5019) SS Salehi
##### Kavenegar was missing in new alarm-notify.sh ##### Just method_names has been changed and related configs. ##### Add missing kavenegar in method_names variable
2018-12-13 Add support for nonredundant arrays (#4923) Vladimir Kobal
* Add support for nonredundant arrays * Change charts representation and add an alarm * Minor cleanup for chart priorities * Add configuration options * Make charts obsolete when raid disappears * Fix small bugs, add config option for making charts obsolete * Fix mdstat parsing
2018-12-13 Move power supply python module to proc plugin (#4960) Vladimir Kobal
* Add power supply module * Use linked list * Add charts for all properties * Organize charts menu * Fix line endings * Keep files open * Add options for chart disabling * Final cleanup * Add documentation, disable python module * Fix file attributes * Mark python module as obsolete * Allow symbolic links to power source directories
2018-12-13 Fix to #4968, custom recipients were not working properly (#4978) Chris Akritidis
##### Summary
Fixes #4968

The custom recipient variable substitution wasn't working properly

##### Component Name
health notifications
2018-12-09 Add support for providing FQDN in alarm notifications. (#4943) Austin S. Hemmelgarn
##### Summary
This adds an option to alarm-notify.sh to make it use the system's FQDN instead of its simple hostname when sending alarm notifications. This can be enabled by adding `use_fqdn="YES"` to `health_alarm_notify.conf`. This does not work correctly for alarms being sent by a master node on behalf of a slave system, and includes an explicit check so that it falls back to just sending the simple hostname in such cases. This commit also cleans up misuse of the `${this_host}` variable, which is supposed to just be a temporary variable.

##### Component Name
health

##### Additional Information
Minimally tested, it runs correctly on my systems and does as advertised. Relevant to: #809 Fixes: #2477
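
The fallback described in this commit message can be illustrated with a short Python sketch. It is not the code from alarm-notify.sh (which is a shell script); the function name, its parameters, and the injectable `lookup_fqdn` helper are hypothetical and exist only to make the decision visible. The sketch assumes the option is enabled only by the exact value `YES`, as in the example setting quoted above.

```python
import socket
from typing import Callable


def notification_hostname(
    use_fqdn: str,
    alarm_host: str,
    local_host: str,
    lookup_fqdn: Callable[[], str] = socket.getfqdn,
) -> str:
    """Pick the host name to show in a notification (illustrative only)."""
    # The alarm is sent on behalf of another (slave) system: the local
    # FQDN would be wrong here, so fall back to the simple hostname.
    if alarm_host != local_host:
        return alarm_host
    # Assumption: only the literal value "YES" turns the option on.
    if use_fqdn == "YES":
        return lookup_fqdn() or alarm_host
    return alarm_host
```

Passing the lookup as a parameter keeps the sketch testable without depending on the DNS setup of the machine it runs on.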
2018-12-07 Ga (#4938) Chris Akritidis
* Added GA tags to markdowns * Add GA tags to mds
2018-12-03 fix(pagerduty): Use cURL instead of PagerDuty agent to send alarms. (#4694) Elisiário Couto
2018-12-03 Added info on health configuration, with a separate page for Charts, Dimensions, Alarms, Contexts (#4895) Chris Akritidis
2018-12-02 added debug statements when loading health config files (#4896) Costa Tsaousis
2018-11-28 Generalize the recipient finding logic and reduce the boilerplate code. (#3960) Austin S. Hemmelgarn
* Generalize the recipient finding logic and reduce the boilerplate code.

  This generalizes the recipient finding logic in alarm-notify.sh by converting it to utilize computed variable names in a loop instead of having it all serialized. It also handles declaration of the `SEND_*`, `DEFAULT_RECIPIENT_*`, and `role_recipients_*` variables in a similar manner.

  Overall, this significantly reduces the amount of boilerplate code needed to add a new notification method. By just adding the variable tag (in lowercase) to the `method_names` variable near the top, you get automatic declarations of the above mentioned variables and are automatically hooked into the recipient finding logic.

  There are two oddities that this has had to work around:

  * Kafka: The kafka notification support does not do anything with recipients. I know nothing about Kafka myself, so I'm just leaving this alone. Because it doesn't want the whole recipient handling, it's explicitly left out of the `method_names` variable, and the `SEND_KAFKA` variable is explicitly declared.
  * EMail: The email handling has two peculiarities to it:
    - There is a default value for `DEFAULT_RECIPIENT_EMAIL` explicitly defined in alarm-notify.sh. We just declare the variable in the normal loop, and then assign it down with the other email related config variables.
    - Recipient names need to be separated by a comma and a space, not just a space like everything else wants. This is achieved by some creative usage of the `cut` command and the `$IFS` variable. Ideally, this wouldn't be needed, but spaces in email addresses are valid (as stupid as that fact is), so we have to account for that.

* Address code comments by @ktsaou.

* Improve the efficiency of the recipient finding loop.

  This relocates the loop that finds recipients for each method after all the other checks for each method, and adds a bit of logic at the top of the loop to skip methods that are disabled. This avoids running the recipient finding logic for methods that are disabled for any other reason, which should significantly improve performance for the common case of users only configuring one of the notification methods.

* Fix one more code issue pointed out by @ktsaou.

* Avoid multiple forks when finding email recipients.

* Add a unit test mode for the recipient parsing logic.

  This adds a unit-test mode for alarm-notify.sh that covers the recipient parsing logic and the criticality filtering logic. Invocation for this testing is as follows:

      alarm-notify.sh unittest <role> <cfg> <status> <old_status>

  Where `<role>` is the name of the role to use for testing, `<cfg>` is the path to the config file to load for testing, and `<status>` and `<old_status>` are the simulated current and previous status for the test alarm.

  The unit testing mode will run all the logic up to and including the recipient parsing loop, print out the parsed recipient lists, and then exit. The lines for the parsed recipient lists look like this:

      results: <method>: foo bar baz

  Where `<method>` is the delivery method for these recipients.

* Renamed variables to improve clarity as suggested by @ktsaou.

* Split email handling from the main recipient loop.

* Fixes to unittest mode.

* Fix incorrect variable name.

* Fix typo.
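
The table-driven lookup described in this commit message can be sketched in Python as follows. This is an illustration of the idea (one loop over method names with computed configuration keys), not the shell code from alarm-notify.sh. The `METHOD_NAMES` subset, the dictionary-based `config`, the `role_recipients_<method>[<role>]` key spelling, and the default of `YES` for a missing `SEND_*` entry are all assumptions made for the example. The criticality filtering is left out, and the sketch splits recipients on whitespace, so it does not handle email addresses that contain spaces.

```python
from typing import Dict

# Illustrative subset; the real list lives in the `method_names` variable.
METHOD_NAMES = ["email", "slack", "prowl"]


def find_recipients(config: Dict[str, str], role: str) -> Dict[str, str]:
    """Return the recipient string per enabled method for one role."""
    results: Dict[str, str] = {}
    for method in METHOD_NAMES:
        tag = method.upper()
        # Skip disabled methods before doing any recipient work.
        # Assumption: a missing SEND_<TAG> entry counts as enabled.
        if config.get(f"SEND_{tag}", "YES") != "YES":
            continue
        # Computed key names replace one hand-written block per method.
        raw = config.get(f"role_recipients_{method}[{role}]") or config.get(
            f"DEFAULT_RECIPIENT_{tag}", ""
        )
        names = raw.split()
        if not names:
            continue
        # Email wants "a, b"; every other method wants "a b".
        separator = ", " if method == "email" else " "
        results[method] = separator.join(names)
    return results
```

With this shape, supporting another method in the sketch only requires adding its lowercase tag to `METHOD_NAMES`, which mirrors the reduction in boilerplate the commit message describes.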
2018-11-28 Added a few more debugging instructions for notifications (#4774) Chris Akritidis
Fixes #4319
2018-11-28 Improve support for slack recipients (#4765) Chris Akritidis
Fixes #3722 Fixes #4755

##### Summary
- Removed the default addition of '#' to the "channel", so it can support both channels and users
- Add the '#' if it's not already there in the recipient, to specify a channel (for backwards compatibility).
- If the recipient is just a '#', netdata will not be sending the channel at all. This means that users only need to configure the channel or the user on the Slack webhook.

##### Component Name
health/notifications
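
The three recipient rules in this summary can be sketched in Python. This is an illustration rather than the shell code in alarm-notify.sh; the function names are hypothetical, and treating a leading `@` as the marker for a user is an assumption based on the usual Slack convention, since the commit message only says that both channels and users are supported.

```python
from typing import Dict, Optional


def slack_channel(recipient: str) -> Optional[str]:
    """Map a configured recipient to the channel value, or None to omit it."""
    # A bare '#' means: send no channel, let the webhook's own setting apply.
    if recipient == "#":
        return None
    # Already a channel ('#...') or, by assumption, a user ('@...').
    if recipient.startswith(("#", "@")):
        return recipient
    # Backwards compatibility: a plain name is treated as a channel.
    return "#" + recipient


def slack_payload(recipient: str, text: str) -> Dict[str, str]:
    """Build a minimal webhook payload, leaving out 'channel' when unset."""
    payload = {"text": text}
    channel = slack_channel(recipient)
    if channel is not None:
        payload["channel"] = channel
    return payload
```

For example, a recipient of `alarms` and a recipient of `#alarms` both produce the channel `#alarms`, while a recipient of `#` produces a payload with no `channel` key at all.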
2018-11-27 web_log: add alarm on unmatched lines (#4757) Ilya Mashchenko
2018-11-23 Sanitize headers and htmlstructure (#4713) Chris Akritidis
* Restructured html site, corrected header in REDISTRIBUTED * Added header * Header updates and restructuring * Move requirements and runtime txts to htmldoc, by adding a netlify.toml that changes the base directory * Minor corrections to support the html doc restructuring * Debugging netlify * Debugging netlify * Debugging netlify * Beautify headers, comment in buildhtml * Beautify headers * Sanitize headers and reorganize static html site * Updated Makefile with moved and created htmldoc scripts
2018-11-22 Documentation links sanity checker (#4701) Chris Akritidis
* Fix broken links * Fixed link * Added links checker * link updates from the link checker * Final corrections to allow checklinks to run without errors * Removed whitespace * Fixed codacy errors/warnings
2018-11-14 Added new branding material #4598 (#4656) George Moschovitis
* Moved name2id to utils.js * Added new favicon and other branding material #4598 * Added improved favicon, cleanup #4598 * Removed some older icons * Removed seo-performance-128 icon refs, and refs to soylent icon * Regenerated dashboard.js * Use base64 encoded favicon #4598 Avoids refetching when hash gets updated
2018-11-12 Htmldoc (#4607) Chris Akritidis
* First html documentation debug set * Test 2 * Relative path changed * Updated comments * Cleanup, installation draft added * fixes * test * test * test * First html documentation debug set * Test 2 * Relative path changed * Updated comments * Cleanup, installation draft added * fixes * test * test * test * First set of major cleanup/deduplication * 2nd major cleanup * update getting started structure * Cleanup in using netdata * Final cleanup/deduplication * Added initial CONTRIBUTING.md, updated some info related to contributing on the orchestrators * Removed Why-Netdata (included in new README in master), added link to CONTRIBUTING.md * First html documentation debug set * Updated Makefile.am to ignore the new md and htmldoc generation files * Removing files from rebase * First html documentation debug set * Test 2 * Relative path changed * Updated comments * Cleanup, installation draft added * fixes * test * test * test * First html documentation debug set * Test 2 * Relative path changed * Updated comments * Cleanup, installation draft added * test * test * First set of major cleanup/deduplication * 2nd major cleanup * update getting started structure * Cleanup in using netdata * Final cleanup/deduplication * Added initial CONTRIBUTING.md, updated some info related to contributing on the orchestrators * Removed Why-Netdata (included in new README in master), added link to CONTRIBUTING.md * First html documentation debug set * Updated Makefile.am to ignore the new md and htmldoc generation files * Removing files from rebase * Fixed Makefile.am * Same line header and badges * Fixed broken link * CPU monitoring is in apps plugin * Removed obsolete files * Remove obsolete files * - Make the Health API part of health/README.md new file web/api/health/README.md - Make installer/LAUNCH.md part of deamon/README.md - Move installer/MAINTAINERS.md to packaging/maintainers/README.md - Move installer/DOCKER.md to docker/README.md - Move system/README.md to 
daemon/config/README.md - Move web/CUSTOM-DASHBOARDS.md to web/gui/custom/README.md - Move web/CONFLUENCE-DASHBOARDS.md to web/gui/confluence/README.md * Resolve codacy issue $(..) syntax instead of `..` * Fix following warnings and add svgs to the data_structures/README.md - CHANGELOG.md - CODE_OF_CONDUCT.md - CONTRIBUTORS.md - REDISTRIBUTED.md - diagrams/data_structures/README.md - docker/README.md WARNING - Documentation file 'README.md' contains a link to 'collectors/plugins.d' which does not exist in the documentation directory. WARNING - Documentation file 'README.md' contains a link to 'collectors/statsd.plugin' which does not exist in the documentation directory. WARNING - Documentation file 'CONTRIBUTING.md' contains a link to 'web/CUSTOM-DASHBOARDS.md' which does not exist in the documentation directory. WARNING - Documentation file 'CONTRIBUTING.md' contains a link to 'web/CONFLUENCE-DASHBOARDS.md' which does not exist in the documentation directory. * Wrong urls in data_structures/README.md svgs * Fix svg URLs number 2 * Modify the first line of the main README.md, to enable proper static html generation. Executed after copying the file to htmldoc/src * Added back Why Netdata * Fixed link to registry in Why-Netdata.md * Added Why-Netdata to buildyaml and to Makefile.am * Replaced http links causing mixed content warnings * Made buildhtml ignore the directory node_modules created by Netlify * Corrected CONTRIBUTING.MD to CONTRIBUTING.md
2018-11-10 Alerta.io notification improvements (#4576) Nick Satterly
* Enhance support of Alerta notifications * use 'indeterminate' severity as catch-all * move $family to "resource" * set "service" to 'Netdata' not $host * set "group" to 'Performance' * set "value" to $value_string not $alarm * add $alarm_id as a "tag" * add $roles, $name, $chart, $family and $src as "attributes" * don't set auth header unless $ALERTA_API_KEY is set * log suppressed alerts (202 Accepted) as info not error * use $chart as resource for httpcheck alarms * Update alerta wiki page
2018-11-10 fixed max interface speed calculation (#4594) Costa Tsaousis
2018-11-08 Minor documentation improvements (#4566) George Moschovitis
* Formatting updates in database/README.md * More formatting in README.md files * README.md formatting * Minor formatting change * Minor changes in registry/README.md * Minor formatting * Minor formatting change
2018-11-02 Switch e-mail threading to be enabled by default. (#3780) Austin S. Hemmelgarn
As discussed in the PR that added email threading support, this switches it to be enabled by default.
2018-10-24 Feat: detect NIC speed and alarm on each device for net traffic overflow (#4430) Dylan Wang
* add chart local variable nic_speed_max and alarm for net traffic overflow * simplify code and respect with netdata host prefix * split sent/received traffic alarm * evaluate alarm only when nic_speed_max is set above 0
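
The guard in the last item ("evaluate alarm only when nic_speed_max is set above 0") can be sketched in Python. The commit message does not give the alarm expression, so everything beyond that guard is an assumption made for illustration: the function name, the units (kilobits per second for traffic, megabits per second for the NIC speed), and the hypothetical `ratio` threshold.

```python
from typing import Optional


def traffic_overflow(
    avg_kbps: float, nic_speed_max_mbps: float, ratio: float = 0.9
) -> Optional[bool]:
    """Return None when the alarm is not evaluated, else whether it fires."""
    # Unknown or undetected NIC speed: skip the check instead of
    # comparing against a meaningless capacity.
    if nic_speed_max_mbps <= 0:
        return None
    capacity_kbps = nic_speed_max_mbps * 1000
    # abs() lets the same check serve a direction charted as negative.
    return abs(avg_kbps) > ratio * capacity_kbps
```

Because the sent and received alarms are split, the sketch would be called once per direction, each with its own traffic average.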
2018-10-23 adaptec_raid python module (#4429) Ilya Mashchenko
* adaptec_raid module init version * adaptec_raid minor * adaptec_raid minor * adaptec_raid minor * adaptec_raid arcconf command fix * adaptec_raid minor fixes * adaptec_raid add alarms * adaptec_raid add link to screenshot to the readme
2018-10-23 modularize the query api (#4443) Costa Tsaousis
* modularized exporters * modularized API data queries * optimized queries * modularized API data reduction methods * modularized api queries * added new directories in makefiles * added median db query * moved all RRDR_GROUPING related to query.h * added stddev query * operational median and stddev * working simple exponential smoothing * too complex to do it right * fixed ses * fixed ses * rewrote query engine * fix double-exponential-smoothing * cleanup * fixed bug identified by @vlvkobal at rrdset_first_slot() * enable freeipmi on systems with libipmimonitoring; #4440
2018-10-18 moved related wiki pages into the repo (#4428) Costa Tsaousis
* moved related wiki pages into the repo * updated web server docs * fixed typos
2018-10-17 Evaluate $used_ram_to_ignore on FreeBSD (#4419) openspork
Fix un-evaluated $used_ram_to_ignore variable on FreeBSD.
2018-10-16 fix netdata.spec for new directory structure (#4410) Costa Tsaousis
* fix netdata.spec; fixes #4409 * second pass * more netdata.spec fixes * more netdata.spec fixes * more netdata.spec fixes * more netdata.spec fixes * more netdata.spec fixes
2018-10-16 Account "Laundry" pages as a separate RAM dimension on FreeBSD. (#4390) Vlad Movchan
Laundry pages are dirty pages queued for laundering. The "Laundry" page type was introduced in FreeBSD CURRENT about two years ago and was later backported to the FreeBSD 11 branch.
2018-10-15 modularized all source code (#4391) Costa Tsaousis
* modularized all external plugins * added README.md in plugins * fixed title * fixed typo * relative link to external plugins * external plugins configuration README * added plugins link * remove plugins link * plugin names are links * added links to external plugins * removed unnecessary spacing * list to table * added language * fixed typo * list to table on internal plugins * added more documentation to internal plugins * moved python, node, and bash code and configs into the external plugins * added statsd README * fix bug with corrupting config.h every 2nd compilation * moved all config files together with their code * more documentation * diskspace info * fixed broken links in apps.plugin * added backends docs * updated plugins readme * move nc-backend.sh to backends * created daemon directory * moved all code outside src/ * fixed readme indentation * renamed plugins.d.plugin to plugins.d * updated readme * removed linux- from linux plugins * updated readme * updated readme * updated readme * updated readme * updated readme * updated readme * fixed README.md links * fixed netdata tree links * updated codacy, codeclimate and lgtm excluded paths * update CMakeLists.txt * updated automake options at top directory * libnetdata split into directories * updated READMEs * updated READMEs * updated ARL docs * updated ARL docs * moved /plugins to /collectors * moved all external plugins outside plugins.d * updated codacy, codeclimate, lgtm * updated README * updated url * updated readme * updated readme * updated readme * updated readme * moved api and web into webserver * web/api web/gui web/server * modularized webserver * removed web/gui/version.txt