author | Chris Akritidis <43294513+cakrit@users.noreply.github.com> | 2023-02-26 09:38:33 -0800 |
---|---|---|
committer | GitHub <noreply@github.com> | 2023-02-26 09:38:33 -0800 |
commit | 20745bf78ba1504591711b0780517f405158503a | |
tree | a00670f9cf7cb61683c6308b635536ce0f2fc1e6 /health | |
parent | c906ddafe6fe964426e5cb007437ac37dc9d48f4 | |
Reorg learn 0226 (#14610)
* Reorg getting started
* Streaming
* Remove blanks
* Fix up to cloud alerts
Diffstat (limited to 'health')
-rw-r--r-- | health/README.md | 6 |
-rw-r--r-- | health/REFERENCE.md | 93 |
2 files changed, 65 insertions, 34 deletions
````diff
diff --git a/health/README.md b/health/README.md
index e8125e29bb..9776600415 100644
--- a/health/README.md
+++ b/health/README.md
@@ -18,6 +18,6 @@ community-configured alarms for every app/service [the Agent collects metrics fr
 silence anything you're not interested in. You can even power complex lookups by running statistical algorithms against your metrics.

-Ready to take the next steps with health monitoring?
-
-[Configuration reference](https://github.com/netdata/netdata/blob/master/health/REFERENCE.md)
+You can [use various alert notification methods](https://github.com/netdata/netdata/edit/master/docs/monitor/enable-notifications.md),
+[customize alerts](https://github.com/netdata/netdata/blob/master/health/REFERENCE.md), and
+[disable/silence](https://github.com/netdata/netdata/blob/master/health/REFERENCE.md#disable-or-silence-alerts) alerts.
diff --git a/health/REFERENCE.md b/health/REFERENCE.md
index df011d6a6f..306db89350 100644
--- a/health/REFERENCE.md
+++ b/health/REFERENCE.md
@@ -37,6 +37,8 @@ You can configure the Agent's health watchdog service by editing files in two lo
 Navigate to your [Netdata config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md) and use `edit-config` to make changes to any of these files.

+### Edit individual alerts
+
 For example, to edit the `cpu.conf` health configuration file, run:

 ```bash
@@ -69,17 +71,53 @@ to the values of your choosing. For example:
 crit: $this > (($status == $CRITICAL) ? (75) : (85))
 ```

-Save the file and [reload Netdata's health configuration](#reload-health-configuration) to make your changes live.
+Save the file and [reload Netdata's health configuration](#reload-health-configuration) to apply your changes.
+
+## Disable or silence alerts
+
+Alerts and notifications can be disabled permanently via configuration changes, or temporarily, via the
+[health management API](https://github.com/netdata/netdata/blob/master/web/api/health/README.md). The
+available options are described below.
+
+### Disable all alerts
+
+In the `netdata.conf` `[health]` section, set `enabled` to `no`, and restart the agent.
+
+### Disable some alerts
+
+In the `netdata.conf` `[health]` section, set `enabled alerms` to a
+[simple pattern](https://github.com/netdata/netdata/edit/master/libnetdata/simple_pattern/README.md) that
+excludes one or more alerts. e.g. `enabled alarms = !oom_kill *` will load all alarms except `oom_kill`.
+
+You can also [edit the file where the alert is defined](#edit-individual-alerts), comment out its definition,
+and [reload Netdata's health configuration](#reload-health-configuration).

-### Silence an individual alarm
+### Silence an individual alert

-Instead of disabling an alarm altogether, or even disabling _all_ alarms, you can silence individual alarms by changing
-one line in a given health entity. To silence any single alarm, change the `to:` line in its entity to `silent`.
+You can stop receiving notification for an individual alert by [changing](#edit-individual-alerts) the `to:` line to `silent`.

 ```yaml
 to: silent
 ```

+This action requires that you [reload Netdata's health configuration](#reload-health-configuration).
+
+### Temporarily disable alerts at runtime
+
+When you need to frequently disable all or some alerts from triggering during certain times (for instance
+when running backups) you can use the
+[health management API](https://github.com/netdata/netdata/blob/master/web/api/health/README.md).
+The API allows you to issue commands to control the health engine's behavior without changing configuration,
+or restarting the agent.
+
+### Temporarily silence notifications at runtime
+
+If you want health checks to keep running and alerts to keep getting triggered, but notifications to be
+suppressed temporarily, you can use the
+[health management API](https://github.com/netdata/netdata/blob/master/web/api/health/README.md).
+The API allows you to issue commands to control the health engine's behavior without changing configuration,
+or restarting the agent.
+
 ## Write a new health entity

 While tuning existing alarms may work in some cases, you may need to write entirely new health entities based on how
@@ -1124,33 +1162,6 @@ template: ml_5min_node
 The `lookup` line will use the `anomaly_rate` dimension of the `anomaly_detection.anomaly_rate` ML chart to calculate the average [node level anomaly rate](https://learn.netdata.cloud/docs/agent/ml#node-anomaly-rate) over the last 5 minues.

-## Troubleshooting
-
-You can compile Netdata with [debugging](https://github.com/netdata/netdata/blob/master/daemon/README.md#debugging) and then set in `netdata.conf`:
-
-```yaml
-[global]
-    debug flags = 0x0000000000800000
-```
-
-Then check your `/var/log/netdata/debug.log`. It will show you how it works. Important: this will generate a lot of
-output in debug.log.
-
-You can find the context of charts by looking up the chart in either `http://NODE:19999/netdata.conf` or
-`http://NODE:19999/api/v1/charts`, replacing `NODE` with the IP address or hostname for your Agent dashboard.
-
-You can find how Netdata interpreted the expressions by examining the alarm at
-`http://NODE:19999/api/v1/alarms?all`. For each expression, Netdata will return the expression as given in its
-config file, and the same expression with additional parentheses added to indicate the evaluation flow of the
-expression.
-
-## Disabling health checks or silencing notifications at runtime
-
-It's currently not possible to schedule notifications from within the alarm template. For those scenarios where you need
-to temporary disable notifications (for instance when running backups triggers a disk alert) you can disable or silence
-notifications are runtime. The health checks can be controlled at runtime via the
-[health management API](https://github.com/netdata/netdata/blob/master/web/api/health/README.md).
-
 ## Use dimension templates to create dynamic alarms

 In v1.18 of Netdata, we introduced **dimension templates** for alarms, which simplifies the process of
@@ -1311,3 +1322,23 @@ And how just a few of those dimension template-generated alarms look like in the

 All in all, this single entity creates 36 individual alarms. Much easier than writing 36 separate entities in your health configuration files!
+
+## Troubleshooting
+
+You can compile Netdata with [debugging](https://github.com/netdata/netdata/blob/master/daemon/README.md#debugging) and then set in `netdata.conf`:
+
+```yaml
+[global]
+    debug flags = 0x0000000000800000
+```
+
+Then check your `/var/log/netdata/debug.log`. It will show you how it works. Important: this will generate a lot of
+output in debug.log.
+
+You can find the context of charts by looking up the chart in either `http://NODE:19999/netdata.conf` or
+`http://NODE:19999/api/v1/charts`, replacing `NODE` with the IP address or hostname for your Agent dashboard.
+
+You can find how Netdata interpreted the expressions by examining the alarm at
+`http://NODE:19999/api/v1/alarms?all`. For each expression, Netdata will return the expression as given in its
+config file, and the same expression with additional parentheses added to indicate the evaluation flow of the
+expression.
````
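The "Disable some alerts" text added by this commit says `enabled alarms = !oom_kill *` loads every alarm except `oom_kill`. That only holds because of how simple patterns are evaluated, so here is a minimal bash sketch of the matching rule. It assumes the semantics described in Netdata's simple pattern documentation: patterns are space-separated, `*` is a wildcard, a leading `!` negates, patterns are checked left to right and the first match decides, and a name that matches nothing is not selected. `simple_pattern_matches` is a hypothetical helper for illustration, not a Netdata command.

```bash
#!/usr/bin/env bash
# Sketch of Netdata "simple pattern" matching as used by `enabled alarms`.
# Assumed semantics: space-separated patterns, '*' wildcard, leading '!' negates,
# checked left to right with the first match deciding; no match means "not selected".
# simple_pattern_matches is a hypothetical helper, not part of Netdata.

set -f  # keep unquoted "$patterns" from expanding '*' against files in the cwd

# simple_pattern_matches PATTERNS NAME
# Returns 0 if NAME is selected by PATTERNS, 1 otherwise.
simple_pattern_matches() {
  local patterns="$1" name="$2" pat negate
  for pat in $patterns; do          # unquoted on purpose: split on whitespace
    negate=0
    if [[ $pat == '!'* ]]; then     # a leading '!' marks a negative pattern
      negate=1
      pat="${pat#!}"
    fi
    # Unquoted $pat on the right of == is matched as a glob, so '*' is a wildcard.
    if [[ $name == $pat ]]; then
      if (( negate )); then
        return 1                    # first match is negative: excluded
      fi
      return 0                      # first match is positive: included
    fi
  done
  return 1                          # no pattern matched
}
```

Under these assumptions, reordering the same patterns to `* !oom_kill` would load `oom_kill` as well, because `*` matches first and the negative pattern is never reached. Bash's `[[ == ]]` also treats `?` and `[...]` as glob characters, so the sketch is only faithful for patterns that use `*` alone.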
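The new "Temporarily disable alerts at runtime" and "Temporarily silence notifications at runtime" sections point to the health management API without showing a request. The bash sketch below shows one way to call it. It assumes the endpoint `/api/v1/manage/health` on port 19999, `cmd` values such as `DISABLE ALL`, `SILENCE ALL`, and `RESET`, and a token sent in the `X-Auth-Token` header, read by default from `/var/lib/netdata/netdata.api.key`. Check the linked `web/api/health/README.md` for the commands and token location your Agent version actually uses. The function names and the `NETDATA_API_KEY_FILE` variable are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of calling the Netdata health management API.
# Assumptions (verify against web/api/health/README.md for your Agent version):
#   - endpoint: http://NODE:19999/api/v1/manage/health?cmd=...
#   - commands such as "DISABLE ALL", "SILENCE ALL" and "RESET"
#   - token in the X-Auth-Token header, read from /var/lib/netdata/netdata.api.key
# health_api_url, health_api and NETDATA_API_KEY_FILE are hypothetical names.

# health_api_url NODE CMD: print the request URL, encoding spaces in CMD as %20.
health_api_url() {
  local node="$1" cmd="$2"
  printf 'http://%s:19999/api/v1/manage/health?cmd=%s\n' "$node" "${cmd// /%20}"
}

# health_api NODE CMD: send the command with the management API token.
health_api() {
  local node="$1" cmd="$2"
  local key_file="${NETDATA_API_KEY_FILE:-/var/lib/netdata/netdata.api.key}"
  # -f: fail on HTTP errors, -sS: quiet but still report errors
  curl -fsS -H "X-Auth-Token: $(cat "$key_file")" "$(health_api_url "$node" "$cmd")"
}
```

For example, running `health_api localhost 'SILENCE ALL'` before a backup and `health_api localhost RESET` afterwards would, under these assumptions, suppress notifications for the duration of the job while health checks keep running, which is the distinction the two new sections draw between disabling and silencing.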