Reorg learn 0226 (#14610)

* Reorg getting started
* Streaming
* Remove blanks
* Fix up to cloud alerts
author: Chris Akritidis <43294513+cakrit@users.noreply.github.com> 2023-02-26 09:38:33 -0800
committer: GitHub <noreply@github.com> 2023-02-26 09:38:33 -0800
commit: 20745bf78ba1504591711b0780517f405158503a (patch)
tree: a00670f9cf7cb61683c6308b635536ce0f2fc1e6 /health
parent: c906ddafe6fe964426e5cb007437ac37dc9d48f4 (diff)
2 files changed, 65 insertions, 34 deletions
diff --git a/health/README.md b/health/README.md
index e8125e29bb..9776600415 100644
--- a/health/README.md
+++ b/health/README.md
@@ -18,6 +18,6 @@ community-configured alarms for every app/service [the Agent collects metrics fr
 silence anything you're not interested in. You can even power complex lookups by running statistical algorithms against
 your metrics.
 
-Ready to take the next steps with health monitoring?
-
-[Configuration reference](https://github.com/netdata/netdata/blob/master/health/REFERENCE.md)
+You can [use various alert notification methods](https://github.com/netdata/netdata/blob/master/docs/monitor/enable-notifications.md),
+[customize alerts](https://github.com/netdata/netdata/blob/master/health/REFERENCE.md), and 
+[disable/silence](https://github.com/netdata/netdata/blob/master/health/REFERENCE.md#disable-or-silence-alerts) alerts.
diff --git a/health/REFERENCE.md b/health/REFERENCE.md
index df011d6a6f..306db89350 100644
--- a/health/REFERENCE.md
+++ b/health/REFERENCE.md
@@ -37,6 +37,8 @@ You can configure the Agent's health watchdog service by editing files in two lo
 Navigate to your [Netdata config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md) and
 use `edit-config` to make changes to any of these files.
 
+### Edit individual alerts
+
 For example, to edit the `cpu.conf` health configuration file, run:
 
 ```bash
@@ -69,17 +71,53 @@ to the values of your choosing. For example:
     crit: $this > (($status == $CRITICAL) ? (75) : (85))
 ```
 
-Save the file and [reload Netdata's health configuration](#reload-health-configuration) to make your changes live.
+Save the file and [reload Netdata's health configuration](#reload-health-configuration) to apply your changes.
+
+## Disable or silence alerts
+
+Alerts and notifications can be disabled permanently via configuration changes, or temporarily via the
+[health management API](https://github.com/netdata/netdata/blob/master/web/api/health/README.md). The
+available options are described below.
+
+### Disable all alerts
+
+In the `netdata.conf` `[health]` section, set `enabled` to `no`, and restart the agent.
+
+### Disable some alerts
+
+In the `netdata.conf` `[health]` section, set `enabled alarms` to a
+[simple pattern](https://github.com/netdata/netdata/blob/master/libnetdata/simple_pattern/README.md) that
+excludes one or more alerts. For example, `enabled alarms = !oom_kill *` loads all alarms except `oom_kill`.
+
+You can also [edit the file where the alert is defined](#edit-individual-alerts), comment out its definition, 
+and [reload Netdata's health configuration](#reload-health-configuration).
 
-### Silence an individual alarm
+### Silence an individual alert
 
-Instead of disabling an alarm altogether, or even disabling _all_ alarms, you can silence individual alarms by changing
-one line in a given health entity. To silence any single alarm, change the `to:` line in its entity to `silent`.
+You can stop receiving notifications for an individual alert by [changing](#edit-individual-alerts) the `to:` line to `silent`.
 
 ```yaml
       to: silent
 ```
 
+This action requires that you [reload Netdata's health configuration](#reload-health-configuration).
+
+### Temporarily disable alerts at runtime
+
+When you need to frequently prevent all or some alerts from triggering during certain times (for instance,
+when running backups), you can use the
+[health management API](https://github.com/netdata/netdata/blob/master/web/api/health/README.md).
+The API allows you to issue commands to control the health engine's behavior without changing configuration
+or restarting the agent.
+
+### Temporarily silence notifications at runtime
+
+If you want health checks to keep running and alerts to keep triggering, but notifications to be
+suppressed temporarily, you can use the
+[health management API](https://github.com/netdata/netdata/blob/master/web/api/health/README.md).
+The API allows you to issue commands to control the health engine's behavior without changing configuration
+or restarting the agent.
+
 ## Write a new health entity
 
 While tuning existing alarms may work in some cases, you may need to write entirely new health entities based on how
@@ -1124,33 +1162,6 @@ template: ml_5min_node
 
 The `lookup` line will use the `anomaly_rate` dimension of the `anomaly_detection.anomaly_rate` ML chart to calculate the average [node level anomaly rate](https://learn.netdata.cloud/docs/agent/ml#node-anomaly-rate) over the last 5 minues.
 
-## Troubleshooting
-
-You can compile Netdata with [debugging](https://github.com/netdata/netdata/blob/master/daemon/README.md#debugging) and then set in `netdata.conf`:
-
-```yaml
-[global]
-   debug flags = 0x0000000000800000
-```
-
-Then check your `/var/log/netdata/debug.log`. It will show you how it works. Important: this will generate a lot of
-output in debug.log.
-
-You can find the context of charts by looking up the chart in either `http://NODE:19999/netdata.conf` or
-`http://NODE:19999/api/v1/charts`, replacing `NODE` with the IP address or hostname for your Agent dashboard.
-
-You can find how Netdata interpreted the expressions by examining the alarm at
-`http://NODE:19999/api/v1/alarms?all`. For each expression, Netdata will return the expression as given in its
-config file, and the same expression with additional parentheses added to indicate the evaluation flow of the
-expression.
-
-## Disabling health checks or silencing notifications at runtime
-
-It's currently not possible to schedule notifications from within the alarm template. For those scenarios where you need
-to temporary disable notifications (for instance when running backups triggers a disk alert) you can disable or silence
-notifications are runtime. The health checks can be controlled at runtime via the 
-[health management API](https://github.com/netdata/netdata/blob/master/web/api/health/README.md).
-
 ## Use dimension templates to create dynamic alarms
 
 In v1.18 of Netdata, we introduced **dimension templates** for alarms, which simplifies the process of 
@@ -1311,3 +1322,23 @@ And how just a few of those dimension template-generated alarms look like in the
 
 All in all, this single entity creates 36 individual alarms. Much easier than writing 36 separate entities in your
 health configuration files!
+
+## Troubleshooting
+
+You can compile Netdata with [debugging](https://github.com/netdata/netdata/blob/master/daemon/README.md#debugging) and then set in `netdata.conf`:
+
+```yaml
+[global]
+   debug flags = 0x0000000000800000
+```
+
+Then check your `/var/log/netdata/debug.log`. It will show you how it works. Important: this will generate a lot of
+output in debug.log.
+
+You can find the context of charts by looking up the chart in either `http://NODE:19999/netdata.conf` or
+`http://NODE:19999/api/v1/charts`, replacing `NODE` with the IP address or hostname for your Agent dashboard.
+
+You can find how Netdata interpreted the expressions by examining the alarm at
+`http://NODE:19999/api/v1/alarms?all`. For each expression, Netdata will return the expression as given in its
+config file, and the same expression with additional parentheses added to indicate the evaluation flow of the
+expression.