Add ml alerts examples (#13173)

* add ml alarm examples
* Update Makefile.am
* add hyperlinks and node level AR example
author: Andrew Maguire <andrewm4894@gmail.com> 2022-06-21 10:19:58 +0100
committer: GitHub <noreply@github.com> 2022-06-21 12:19:58 +0300
commit: 73f803fbc8900d0b004a87369983d997b67c094d (patch)
tree: 691c0075d84c8b9fda33351e269f69c78ae372f2 /health
parent: 03de79ed1a0b9d7126f05b113b5de7d026ccf232 (diff)
3 files changed, 99 insertions, 0 deletions
diff --git a/health/Makefile.am b/health/Makefile.am
index d5eb884688..777b35858b 100644
--- a/health/Makefile.am
+++ b/health/Makefile.am
@@ -61,6 +61,7 @@ dist_healthconfig_DATA = \
     health.d/megacli.conf \
     health.d/memcached.conf \
     health.d/memory.conf \
+    health.d/ml.conf \
     health.d/mysql.conf \
     health.d/net.conf \
     health.d/netfilter.conf \
diff --git a/health/REFERENCE.md b/health/REFERENCE.md
index 3c1e53b2a3..d1af747676 100644
--- a/health/REFERENCE.md
+++ b/health/REFERENCE.md
@@ -895,6 +895,68 @@ lookup: mean -10s of user
 
 Since [`z = (x - mean) / stddev`](https://en.wikipedia.org/wiki/Standard_score) we create two input alarms, one for `mean` and one for `stddev` and then use them both as inputs in our final `cpu_user_zscore` alarm.
 
+### Example 8 - [Anomaly rate](https://learn.netdata.cloud/docs/agent/ml#anomaly-rate) based CPU dimensions alarm
+
+Warning if the 5-minute rolling [anomaly rate](https://learn.netdata.cloud/docs/agent/ml#anomaly-rate) for any CPU dimension is above 5%, critical if it goes above 20%:
+
+```yaml
+template: ml_5min_cpu_dims
+      on: system.cpu
+      os: linux
+   hosts: *
+  lookup: average -5m anomaly-bit foreach *
+    calc: $this
+   units: %
+   every: 30s
+    warn: $this > (($status >= $WARNING)  ? (5) : (20))
+    crit: $this > (($status == $CRITICAL) ? (20) : (100))
+    info: rolling 5min anomaly rate for each system.cpu dimension
+```
+
+The `lookup` line calculates the average anomaly rate of each `system.cpu` dimension over the last 5 minutes. Because of `foreach *`,
+Netdata creates a separate alarm for every dimension of the chart.
+
+### Example 9 - [Anomaly rate](https://learn.netdata.cloud/docs/agent/ml#anomaly-rate) based CPU chart alarm
+
+Warning if the 5-minute rolling [anomaly rate](https://learn.netdata.cloud/docs/agent/ml#anomaly-rate) averaged across all CPU dimensions is above 5%, critical if it goes above 20%:
+
+```yaml
+template: ml_5min_cpu_chart
+      on: system.cpu
+      os: linux
+   hosts: *
+  lookup: average -5m anomaly-bit of *
+    calc: $this
+   units: %
+   every: 30s
+    warn: $this > (($status >= $WARNING)  ? (5) : (20))
+    crit: $this > (($status == $CRITICAL) ? (20) : (100))
+    info: rolling 5min anomaly rate for system.cpu chart
+```
+
+The `lookup` line calculates the average anomaly rate across all `system.cpu` dimensions over the last 5 minutes. Because of `of *`,
+Netdata creates a single alarm for the whole chart.
+
+### Example 10 - [Anomaly rate](https://learn.netdata.cloud/docs/agent/ml#anomaly-rate) based node level alarm
+
+Warning if the 5-minute rolling [anomaly rate](https://learn.netdata.cloud/docs/agent/ml#anomaly-rate) averaged across all ML-enabled dimensions is above 5%, critical if it goes above 20%:
+
+```yaml
+template: ml_5min_node
+      on: anomaly_detection.anomaly_rate
+      os: linux
+   hosts: *
+  lookup: average -5m of anomaly_rate
+    calc: $this
+   units: %
+   every: 30s
+    warn: $this > (($status >= $WARNING)  ? (5) : (20))
+    crit: $this > (($status == $CRITICAL) ? (20) : (100))
+    info: rolling 5min anomaly rate for all ML enabled dims
+```
+
+The `lookup` line uses the `anomaly_rate` dimension of the `anomaly_detection.anomaly_rate` ML chart to calculate the average [node-level anomaly rate](https://learn.netdata.cloud/docs/agent/ml#node-anomaly-rate) over the last 5 minutes.
+
 ## Troubleshooting
 
 You can compile Netdata with [debugging](/daemon/README.md#debugging) and then set in `netdata.conf`:
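The `lookup` lines in Examples 8-10 differ mainly in what they average. The Python sketch below illustrates that difference under stated assumptions:

- Each ML-enabled dimension produces one anomaly bit per second: 100 when anomalous, 0 when normal.
- `foreach *` is modelled as a per-dimension average and `of *` as one average over all dimensions, following the descriptions in Examples 8 and 9.
- The node-level rate of Example 10 is assumed to be the percentage of dimensions flagged anomalous at each second.

The helper names are hypothetical, and this is not Netdata's implementation.

```python
from statistics import mean

# Hypothetical type: dimension name -> anomaly bits for the window, oldest first.
Bits = dict[str, list[int]]


def rate_per_dimension(bits: Bits) -> dict[str, float]:
    """Like `foreach *`: one rate, and therefore one alarm, per dimension."""
    return {dim: mean(samples) for dim, samples in bits.items()}


def rate_per_chart(bits: Bits) -> float:
    """Like `of *` as Example 9 describes it: one rate over every sample of
    every dimension, so the chart gets a single alarm."""
    return mean(b for samples in bits.values() for b in samples)


def node_rate(bits: Bits) -> float:
    """Example 10 (assumed definition): the percentage of dimensions that are
    anomalous at each second, averaged over the window."""
    per_second = [
        100 * sum(1 for b in column if b) / len(column)
        for column in zip(*bits.values())
    ]
    return mean(per_second)


# A 5-minute window of 300 one-second samples for two CPU dimensions.
window = {
    "user": [100] * 30 + [0] * 270,  # anomalous for 30 of 300 seconds
    "system": [0] * 300,             # never anomalous
}
print(rate_per_dimension(window))
print(rate_per_chart(window))
print(node_rate(window))
```

With these inputs, the `user` dimension alone sits at 10%. The chart-level and node-level figures dilute that to 5% because the quiet `system` dimension is averaged in. That dilution is one reason you might prefer `foreach *` when a single misbehaving dimension should be able to trigger an alert.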
diff --git a/health/health.d/ml.conf b/health/health.d/ml.conf
new file mode 100644
index 0000000000..9bcc81e76b
--- /dev/null
+++ b/health/health.d/ml.conf
@@ -0,0 +1,36 @@
+# Below are some examples of using the `anomaly-bit` option to define alerts based on anomaly
+# rates rather than raw metric values. You can read more about the anomaly bit and Netdata's
+# native anomaly detection here:
+# https://learn.netdata.cloud/docs/configure/machine-learning#anomaly-bit---100--anomalous-0--normal
+
+# The examples below are commented out; uncomment and adjust them as desired to enable them.
+
+# alert per dimension example
+# If the anomaly rate is between 5% and 20%, raise a warning (pick the threshold that works best for you via trial and error).
+# If the anomaly rate is above 20%, raise a critical alert (pick the threshold that works best for you via trial and error).
+# template: ml_5min_cpu_dims
+#       on: system.cpu
+#       os: linux
+#    hosts: *
+#   lookup: average -5m anomaly-bit foreach *
+#     calc: $this
+#    units: %
+#    every: 30s
+#     warn: $this > (($status >= $WARNING)  ? (5) : (20))
+#     crit: $this > (($status == $CRITICAL) ? (20) : (100))
+#     info: rolling 5min anomaly rate for each system.cpu dimension
+
+# alert per chart example
+# If the anomaly rate is between 5% and 20%, raise a warning (pick the threshold that works best for you via trial and error).
+# If the anomaly rate is above 20%, raise a critical alert (pick the threshold that works best for you via trial and error).
+# template: ml_5min_cpu_chart
+#       on: system.cpu
+#       os: linux
+#    hosts: *
+#   lookup: average -5m anomaly-bit of *
+#     calc: $this
+#    units: %
+#    every: 30s
+#     warn: $this > (($status >= $WARNING)  ? (5) : (20))
+#     crit: $this > (($status == $CRITICAL) ? (20) : (100))
+#     info: rolling 5min anomaly rate for system.cpu chart
+\ No newline at end of file
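The `lookup: average -5m` and `every: 30s` pair in these examples produces a rolling rate: every 30 seconds the check re-averages the most recent 5 minutes of anomaly bits. The Python sketch below models that under two assumptions:

- There is one anomaly bit per second, 100 for anomalous and 0 for normal, as the comment at the top of `ml.conf` describes.
- The window simply holds the latest 300 bits.

`rolling_rates` is a hypothetical helper, not part of Netdata.

```python
from collections import deque
from statistics import mean

WINDOW_SECONDS = 5 * 60  # mirrors `lookup: average -5m`
EVERY_SECONDS = 30       # mirrors `every: 30s`


def rolling_rates(bits):
    """Yield (second, anomaly rate in %) each time the check would run.

    `bits` is an iterable of per-second anomaly bits (100 or 0), oldest first.
    """
    window = deque(maxlen=WINDOW_SECONDS)  # oldest bits fall out automatically
    for second, bit in enumerate(bits, start=1):
        window.append(bit)
        if second % EVERY_SECONDS == 0:
            yield second, mean(window)


# 5 normal minutes, a 1-minute anomalous burst, then 4 more normal minutes.
bits = [0] * 300 + [100] * 60 + [0] * 240
for second, rate in rolling_rates(bits):
    print(f"t={second:>3}s  anomaly rate={rate}%")
```

Because the one-minute burst stays inside the 5-minute window, the computed rate holds at 20% for every check from t=360s to t=600s. It only drops once the burst ages out, so a short spike can keep the rolling rate elevated for several minutes.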