author | Ilya Mashchenko <ilya@netdata.cloud> | 2023-08-15 20:56:24 +0300 |
---|---|---|
committer | GitHub <noreply@github.com> | 2023-08-15 20:56:24 +0300 |
commit | d5bdb7cf15b73ef4e761d31298eda9b7567bc8a8 | |
tree | 8c42bc12a3d491849933f1ac2d756129312d5ab4 /health | |
parent | 4040a16ba2e68191b237b3501599bfa3c585f655 |
docs rename alarm to alert (#15812)
Diffstat (limited to 'health')
-rw-r--r-- | health/README.md | 6 |
-rw-r--r-- | health/REFERENCE.md | 429 |
-rw-r--r-- | health/notifications/README.md | 20 |
-rw-r--r-- | health/notifications/awssns/README.md | 30 |
-rw-r--r-- | health/notifications/custom/README.md | 32 |
-rw-r--r-- | health/notifications/dynatrace/README.md | 4 |
-rw-r--r-- | health/notifications/email/README.md | 2 |
-rw-r--r-- | health/notifications/flock/README.md | 2 |
-rw-r--r-- | health/notifications/gotify/README.md | 2 |
-rw-r--r-- | health/notifications/hangouts/README.md | 10 |
-rw-r--r-- | health/notifications/irc/README.md | 10 |
-rw-r--r-- | health/notifications/matrix/README.md | 2 |
-rw-r--r-- | health/notifications/ntfy/README.md | 4 |
-rw-r--r-- | health/notifications/opsgenie/README.md | 5 |
-rw-r--r-- | health/notifications/rocketchat/README.md | 4 |
-rw-r--r-- | health/notifications/slack/README.md | 4 |
-rw-r--r-- | health/notifications/stackpulse/README.md | 34 |
17 files changed, 299 insertions, 301 deletions
diff --git a/health/README.md b/health/README.md index 96f71f87a2..eec8ad06ff 100644 --- a/health/README.md +++ b/health/README.md @@ -2,10 +2,10 @@ The Netdata Agent is a health watchdog for the health and performance of your systems, services, and applications. We've worked closely with our community of DevOps engineers, SREs, and developers to define hundreds of production-ready -alarms that work without any configuration. +alerts that work without any configuration. -The Agent's health monitoring system is also dynamic and fully customizable. You can write entirely new alarms, tune the -community-configured alarms for every app/service [the Agent collects metrics from](https://github.com/netdata/netdata/blob/master/collectors/COLLECTORS.md), or +The Agent's health monitoring system is also dynamic and fully customizable. You can write entirely new alerts, tune the +community-configured alerts for every app/service [the Agent collects metrics from](https://github.com/netdata/netdata/blob/master/collectors/COLLECTORS.md), or silence anything you're not interested in. You can even power complex lookups by running statistical algorithms against your metrics. diff --git a/health/REFERENCE.md b/health/REFERENCE.md index e5179b4e54..a451f671b4 100644 --- a/health/REFERENCE.md +++ b/health/REFERENCE.md @@ -1,15 +1,15 @@ # Configure alerts -Netdata's health watchdog is highly configurable, with support for dynamic thresholds, hysteresis, alarm templates, and -more. You can tweak any of the existing alarms based on your infrastructure's topology or specific monitoring needs, or +Netdata's health watchdog is highly configurable, with support for dynamic thresholds, hysteresis, alert templates, and +more. You can tweak any of the existing alerts based on your infrastructure's topology or specific monitoring needs, or create new entities. 
-You can use health alarms in conjunction with any of Netdata's [collectors](https://github.com/netdata/netdata/blob/master/collectors/README.md) (see +You can use health alerts in conjunction with any of Netdata's [collectors](https://github.com/netdata/netdata/blob/master/collectors/README.md) (see the [supported collector list](https://github.com/netdata/netdata/blob/master/collectors/COLLECTORS.md)) to monitor the health of your systems, containers, and applications in real time. -While you can see active alarms both on the local dashboard and Netdata Cloud, all health alarms are configured _per -node_ via individual Netdata Agents. If you want to deploy a new alarm across your +While you can see active alerts both on the local dashboard and Netdata Cloud, all health alerts are configured _per +node_ via individual Netdata Agents. If you want to deploy a new alert across your [infrastructure](https://github.com/netdata/netdata/blob/master/docs/quickstart/infrastructure.md), you must configure each node with the same health configuration files. @@ -55,7 +55,7 @@ template: 10min_cpu_usage to: sysadmin ``` -To tune this alarm to trigger warning and critical alarms at a lower CPU utilization, change the `warn` and `crit` lines +To tune this alert to trigger warning and critical alerts at a lower CPU utilization, change the `warn` and `crit` lines to the values of your choosing. For example: ```yaml @@ -79,7 +79,7 @@ In the `netdata.conf` `[health]` section, set `enabled` to `no`, and restart the In the `netdata.conf` `[health]` section, set `enabled alarms` to a [simple pattern](https://github.com/netdata/netdata/edit/master/libnetdata/simple_pattern/README.md) that -excludes one or more alerts. e.g. `enabled alarms = !oom_kill *` will load all alarms except `oom_kill`. +excludes one or more alerts. e.g. `enabled alarms = !oom_kill *` will load all alerts except `oom_kill`. 
You can also [edit the file where the alert is defined](#edit-individual-alerts), comment out its definition, and [reload Netdata's health configuration](#reload-health-configuration). @@ -112,7 +112,7 @@ or restarting the agent. ## Write a new health entity -While tuning existing alarms may work in some cases, you may need to write entirely new health entities based on how +While tuning existing alerts may work in some cases, you may need to write entirely new health entities based on how your systems, containers, and applications work. Read the [health entity reference](#health-entity-reference) for a full listing of the format, @@ -128,8 +128,8 @@ sudo touch health.d/ram-usage.conf sudo ./edit-config health.d/ram-usage.conf ``` -For example, here is a health entity that triggers a warning alarm when a node's RAM usage rises above 80%, and a -critical alarm above 90%: +For example, here is a health entity that triggers a warning alert when a node's RAM usage rises above 80%, and a +critical alert above 90%: ```yaml alarm: ram_usage @@ -151,7 +151,7 @@ Let's look into each of the lines to see how they create a working health entity - `on`: Which chart the entity listens to. -- `lookup`: Which metrics the alarm monitors, the duration of time to monitor, and how to process the metrics into a +- `lookup`: Which metrics the alert monitors, the duration of time to monitor, and how to process the metrics into a usable format. - `average`: Calculate the average of all the metrics collected. - `-1m`: Use metrics from 1 minute ago until now to calculate that average. @@ -160,13 +160,13 @@ Let's look into each of the lines to see how they create a working health entity - `units`: Use percentages rather than absolute units. -- `every`: How often to perform the `lookup` calculation to decide whether or not to trigger this alarm. +- `every`: How often to perform the `lookup` calculation to decide whether to trigger this alert. 
-- `warn`/`crit`: The value at which Netdata should trigger a warning or critical alarm. This example uses simple +- `warn`/`crit`: The value at which Netdata should trigger a warning or critical alert. This example uses simple syntax, but most pre-configured health entities use [hysteresis](#special-use-of-the-conditional-operator) to avoid superfluous notifications. -- `info`: A description of the alarm, which will appear in the dashboard and notifications. +- `info`: A description of the alert, which will appear in the dashboard and notifications. In human-readable format: @@ -174,8 +174,8 @@ In human-readable format: > metrics from the **used** dimension and calculates the **average** of all those metrics in a **percentage** format, > using a **% unit**. The entity performs this lookup **every minute**. > -> If the average RAM usage percentage over the last 1 minute is **more than 80%**, the entity triggers a warning alarm. -> If the usage is **more than 90%**, the entity triggers a critical alarm. +> If the average RAM usage percentage over the last 1 minute is **more than 80%**, the entity triggers a warning alert. +> If the usage is **more than 90%**, the entity triggers a critical alert. When you finish writing this new health entity, [reload Netdata's health configuration](#reload-health-configuration) to see it live on the local dashboard or Netdata Cloud. @@ -188,20 +188,20 @@ without restarting all of Netdata, run `netdatacli reload-health` or `killall -U ## Health entity reference The following reference contains information about the syntax and options of _health entities_, which Netdata attaches -to charts in order to trigger alarms. +to charts in order to trigger alerts. ### Entity types There are two entity types: **alarms** and **templates**. They have the same format and feature set—the only difference is their label. -**Alarms** are attached to specific charts and use the `alarm` label. 
+**Alerts** are attached to specific charts and use the `alarm` label. **Templates** define rules that apply to all charts of a specific context, and use the `template` label. Templates help you apply one entity to all disks, all network interfaces, all MySQL databases, and so on. -Alarms have higher precedence and will override templates. If an alarm and template entity have the same name and attach -to the same chart, Netdata will use the alarm. +Alerts have higher precedence and will override templates. +If the `alert` and `template` entities have the same name and are attached to the same chart, Netdata will use `alarm`. ### Entity format @@ -219,39 +219,39 @@ Netdata parses the following lines. Beneath the table is an in-depth explanation This comes in handy if your `info` line consists of several sentences. | line | required | functionality | -| --------------------------------------------------- | --------------- | ------------------------------------------------------------------------------------- | -| [`alarm`/`template`](#alarm-line-alarm-or-template) | yes | Name of the alarm/template. | -| [`on`](#alarm-line-on) | yes | The chart this alarm should attach to. | -| [`class`](#alarm-line-class) | no | The general alarm classification. | -| [`type`](#alarm-line-type) | no | What area of the system the alarm monitors. | -| [`component`](#alarm-line-component) | no | Specific component of the type of the alarm. | -| [`os`](#alarm-line-os) | no | Which operating systems to run this chart. | -| [`hosts`](#alarm-line-hosts) | no | Which hostnames will run this alarm. | -| [`plugin`](#alarm-line-plugin) | no | Restrict an alarm or template to only a certain plugin. | -| [`module`](#alarm-line-module) | no | Restrict an alarm or template to only a certain module. | -| [`charts`](#alarm-line-charts) | no | Restrict an alarm or template to only certain charts. | -| [`families`](#alarm-line-families) | no | Restrict a template to only certain families. 
| -| [`lookup`](#alarm-line-lookup) | yes | The database lookup to find and process metrics for the chart specified through `on`. | -| [`calc`](#alarm-line-calc) | yes (see above) | A calculation to apply to the value found via `lookup` or another variable. | -| [`every`](#alarm-line-every) | no | The frequency of the alarm. | -| [`green`/`red`](#alarm-lines-green-and-red) | no | Set the green and red thresholds of a chart. | -| [`warn`/`crit`](#alarm-lines-warn-and-crit) | yes (see above) | Expressions evaluating to true or false, and when true, will trigger the alarm. | -| [`to`](#alarm-line-to) | no | A list of roles to send notifications to. | -| [`exec`](#alarm-line-exec) | no | The script to execute when the alarm changes status. | -| [`delay`](#alarm-line-delay) | no | Optional hysteresis settings to prevent floods of notifications. | -| [`repeat`](#alarm-line-repeat) | no | The interval for sending notifications when an alarm is in WARNING or CRITICAL mode. | -| [`options`](#alarm-line-options) | no | Add an option to not clear alarms. | -| [`host labels`](#alarm-line-host-labels) | no | Restrict an alarm or template to a list of matching labels present on a host. | -| [`chart labels`](#alarm-line-chart-labels) | no | Restrict an alarm or template to a list of matching labels present on a host. | -| [`info`](#alarm-line-info) | no | A brief description of the alarm. | +|-----------------------------------------------------|-----------------|---------------------------------------------------------------------------------------| +| [`alarm`/`template`](#alert-line-alarm-or-template) | yes | Name of the alert/template. | +| [`on`](#alert-line-on) | yes | The chart this alert should attach to. | +| [`class`](#alert-line-class) | no | The general alert classification. | +| [`type`](#alert-line-type) | no | What area of the system the alert monitors. | +| [`component`](#alert-line-component) | no | Specific component of the type of the alert. 
| +| [`os`](#alert-line-os) | no | Which operating systems to run this chart. | +| [`hosts`](#alert-line-hosts) | no | Which hostnames will run this alert. | +| [`plugin`](#alert-line-plugin) | no | Restrict an alert or template to only a certain plugin. | +| [`module`](#alert-line-module) | no | Restrict an alert or template to only a certain module. | +| [`charts`](#alert-line-charts) | no | Restrict an alert or template to only certain charts. | +| [`families`](#alert-line-families) | no | Restrict a template to only certain families. | +| [`lookup`](#alert-line-lookup) | yes | The database lookup to find and process metrics for the chart specified through `on`. | +| [`calc`](#alert-line-calc) | yes (see above) | A calculation to apply to the value found via `lookup` or another variable. | +| [`every`](#alert-line-every) | no | The frequency of the alert. | +| [`green`/`red`](#alert-lines-green-and-red) | no | Set the green and red thresholds of a chart. | +| [`warn`/`crit`](#alert-lines-warn-and-crit) | yes (see above) | Expressions evaluating to true or false, and when true, will trigger the alert. | +| [`to`](#alert-line-to) | no | A list of roles to send notifications to. | +| [`exec`](#alert-line-exec) | no | The script to execute when the alert changes status. | +| [`delay`](#alert-line-delay) | no | Optional hysteresis settings to prevent floods of notifications. | +| [`repeat`](#alert-line-repeat) | no | The interval for sending notifications when an alert is in WARNING or CRITICAL mode. | +| [`options`](#alert-line-options) | no | Add an option to not clear alerts. | +| [`host labels`](#alert-line-host-labels) | no | Restrict an alert or template to a list of matching labels present on a host. | +| [`chart labels`](#alert-line-chart-labels) | no | Restrict an alert or template to a list of matching labels present on a host. | +| [`info`](#alert-line-info) | no | A brief description of the alert. 
| The `alarm` or `template` line must be the first line of any entity. -#### Alarm line `alarm` or `template` +#### Alert line `alarm` or `template` -This line starts an alarm or template based on the [entity type](#entity-types) you're interested in creating. +This line starts an alert or template based on the [entity type](#entity-types) you're interested in creating. -**Alarm:** +**Alert:** ```yaml alarm: NAME @@ -266,11 +266,11 @@ template: NAME `NAME` can be any alpha character, with `.` (period) and `_` (underscore) as the only allowed symbols, but the names cannot be `chart name`, `dimension name`, `family name`, or `chart variables names`. -#### Alarm line `on` +#### Alert line `on` -This line defines the chart this alarm should attach to. +This line defines the chart this alert should attach to. -**Alarms:** +**Alerts:** ```yaml on: CHART @@ -297,40 +297,40 @@ shows a disk I/O chart, the tooltip reads: `proc:/proc/diskstats, disk.io`. You're interested in what comes after the comma: `disk.io`. That's the name of the chart's context. -If you create a template using the `disk.io` context, it will apply an alarm to every disk available on your system. +If you create a template using the `disk.io` context, it will apply an alert to every disk available on your system. -#### Alarm line `class` +#### Alert line `class` -This indicates the type of error (or general problem area) that the alarm or template applies to. For example, `Latency` can be used for alarms that trigger on latency issues on network interfaces, web servers, or database systems. Example: +This indicates the type of error (or general problem area) that the alert or template applies to. For example, `Latency` can be used for alerts that trigger on latency issues on network interfaces, web servers, or database systems. 
Example: ```yaml class: Latency ``` <details> -<summary>Netdata's stock alarms use the following `class` attributes by default:</summary> +<summary>Netdata's stock alerts use the following `class` attributes by default:</summary> -| Class | -| ----------------| -| Errors | -| Latency | -| Utilization | -| Workload | +| Class | +|-------------| +| Errors | +| Latency | +| Utilization | +| Workload | </details> -`class` will default to `Unknown` if the line is missing from the alarm configuration. +`class` will default to `Unknown` if the line is missing from the alert configuration. -#### Alarm line `type` +#### Alert line `type` -Type can be used to indicate the broader area of the system that the alarm applies to. For example, under the general `Database` type, you can group together alarms that operate on various database systems, like `MySQL`, `CockroachDB`, `CouchDB` etc. Example: +Type can be used to indicate the broader area of the system that the alert applies to. For example, under the general `Database` type, you can group together alerts that operate on various database systems, like `MySQL`, `CockroachDB`, `CouchDB` etc. Example: ```yaml type: Database ``` <details> -<summary>Netdata's stock alarms use the following `type` attributes by default, but feel free to adjust for your own requirements.</summary> +<summary>Netdata's stock alerts use the following `type` attributes by default, but feel free to adjust for your own requirements.</summary> | Type | Description | |-----------------|------------------------------------------------------------------------------------------------| @@ -352,7 +352,7 @@ type: Database | Power Supply | Alerts from power supply related services (e.g. apcupsd) | | Search engine | Alerts for search services (e.g. elasticsearch) | | Storage | Class for alerts dealing with storage services (storage devices typically live under `System`) | -| System | General system alarms (e.g. cpu, network, etc.) 
| +| System | General system alerts (e.g. cpu, network, etc.) | | Virtual Machine | Virtual Machine software | | Web Proxy | Web proxy software (e.g. squid) | | Web Server | Web server software (e.g. Apache, ngnix, etc.) | @@ -360,11 +360,11 @@ type: Database </details> -If an alarm configuration is missing the `type` line, its value will default to `Unknown`. +If an alert configuration is missing the `type` line, its value will default to `Unknown`. -#### Alarm line `component` +#### Alert line `component` -Component can be used to narrow down what the previous `type` value specifies for each alarm or template. Continuing from the previous example, `component` might include `MySQL`, `CockroachDB`, `MongoDB`, all under the same `Database` type. Example: +Component can be used to narrow down what the previous `type` value specifies for each alert or template. Continuing from the previous example, `component` might include `MySQL`, `CockroachDB`, `MongoDB`, all under the same `Database` type. Example: ```yaml component: MySQL @@ -372,9 +372,9 @@ component: MySQL As with the `class` and `type` line, if `component` is missing from the configuration, its value will default to `Unknown`. -#### Alarm line `os` +#### Alert line `os` -The alarm or template will be used only if the operating system of the host matches this list specified in `os`. The +The alert or template will be used only if the operating system of the host matches this list specified in `os`. The value is a space-separated list. The following example enables the entity on Linux, FreeBSD, and macOS, but no other operating systems. @@ -383,9 +383,9 @@ The following example enables the entity on Linux, FreeBSD, and macOS, but no ot os: linux freebsd macos ``` -#### Alarm line `hosts` +#### Alert line `hosts` -The alarm or template will be used only if the hostname of the host matches this space-separated list. 
+The alert or template will be used only if the hostname of the host matches this space-separated list. The following example will load on systems with the hostnames `server` and `server2`, and any system with hostnames that begin with `database`. It _will not load_ on the host `redis3`, but will load on any _other_ systems with hostnames that @@ -395,47 +395,47 @@ begin with `redis`. hosts: server1 server2 database* !redis3 redis* ``` -#### Alarm line `plugin` +#### Alert line `plugin` -The `plugin` line filters which plugin within the context this alarm should apply to. The value is a space-separated +The `plugin` line filters which plugin within the context this alert should apply to. The value is a space-separated list of [simple patterns](https://github.com/netdata/netdata/blob/master/libnetdata/simple_pattern/README.md). For example, -you can create a filter for an alarm that applies specifically to `python.d.plugin`: +you can create a filter for an alert that applies specifically to `python.d.plugin`: ```yaml plugin: python.d.plugin ``` The `plugin` line is best used with other options like `module`. When used alone, the `plugin` line creates a very -inclusive filter that is unlikely to be of much use in production. See [`module`](#alarm-line-module) for a +inclusive filter that is unlikely to be of much use in production. See [`module`](#alert-line-module) for a comprehensive example using both. -#### Alarm line `module` +#### Alert line `module` -The `module` line filters which module within the context this alarm should apply to. The value is a space-separated +The `module` line filters which module within the context this alert should apply to. The value is a space-separated list of [simple patterns](https://github.com/netdata/netdata/blob/master/libnetdata/simple_pattern/README.md). 
For -example, you can create an alarm that applies only on the `isc_dhcpd` module started by `python.d.plugin`: +example, you can create an alert that applies only on the `isc_dhcpd` module started by `python.d.plugin`: ```yaml plugin: python.d.plugin module: isc_dhcpd ``` -#### Alarm line `charts` +#### Alert line `charts` -The `charts` line filters which chart this alarm should apply to. It is only available on entities using the -[`template`](#alarm-line-alarm-or-template) line. +The `charts` line filters which chart this alert should apply to. It is only available on entities using the +[`template`](#alert-line-alarm-or-template) line. The value is a space-separated list of [simple patterns](https://github.com/netdata/netdata/blob/master/libnetdata/simple_pattern/README.md). For -example, a template that applies to `disk.svctm` (Average Service Time) context, but excludes the disk `sdb` from alarms: +example, a template that applies to `disk.svctm` (Average Service Time) context, but excludes the disk `sdb` from alerts: ```yaml -template: disk_svctm_alarm +template: disk_svctm_alert on: disk.svctm charts: !*sdb* * ``` -#### Alarm line `families` +#### Alert line `families` -The `families` line, used only alongside templates, filters which families within the context this alarm should apply +The `families` line, used only alongside templates, filters which families within the context this alert should apply to. The value is a space-separated list. The value is a space-separate list of simple patterns. See our [simple patterns docs](https://github.com/netdata/netdata/blob/master/libnetdata/simple_pattern/README.md) for @@ -448,9 +448,9 @@ families: sda sdb ``` Please note that the use of the `families` filter is planned to be deprecated in upcoming Netdata releases. -Please use [`chart labels`](#alarm-line-chart-labels) instead. +Please use [`chart labels`](#alert-line-chart-labels) instead. 
-#### Alarm line `lookup` +#### Alert line `lookup` This line makes a database lookup to find a value. This result of this lookup is available as `$this`. @@ -485,17 +485,17 @@ The full [database query API](https://github.com/netdata/netdata/blob/master/web `,` or `|` instead of spaces)_ and the `match-ids` and `match-names` options affect the searches for dimensions. -- `foreach DIMENSIONS` is optional and works only with [templates](#alarm-line-alarm-or-template), will always be the last parameter, and uses the same `,`/`|` +- `foreach DIMENSIONS` is optional and works only with [templates](#alert-line-alarm-or-template), will always be the last parameter, and uses the same `,`/`|` rules as the `of` parameter. Each dimension you specify in `foreach` will use the same rule - to trigger an alarm. If you set both `of` and `foreach`, Netdata will ignore the `of` parameter + to trigger an alert. If you set both `of` and `foreach`, Netdata will ignore the `of` parameter and replace it with one of the dimensions you gave to `foreach`. This option allows you to - [use dimension templates to create dynamic alarms](#use-dimension-templates-to-create-dynamic-alarms). + [use dimension templates to create dynamic alerts](#use-dimension-templates-to-create-dynamic-alerts). The result of the lookup will be available as `$this` and `$NAME` in expressions. The timestamps of the timeframe evaluated by the database lookup is available as variables `$after` and `$before` (both are unix timestamps). -#### Alarm line `calc` +#### Alert line `calc` A `calc` is designed to apply some calculation to the values or variables available to the entity. The result of the calculation will be made available at the `$this` variable, overwriting the value from your `lookup`, to use in warning @@ -512,9 +512,9 @@ The `calc` line uses [expressions](#expressions) for its syntax. calc: EXPRESSION ``` -#### Alarm line `every` +#### Alert line `every` -Sets the update frequency of this alarm. 
This is the same to the `every DURATION` given +Sets the update frequency of this alert. This is the same to the `every DURATION` given in the `lookup` lines. Format: @@ -525,11 +525,11 @@ every: DURATION `DURATION` accepts `s` for seconds, `m` is minutes, `h` for hours, `d` for days. -#### Alarm lines `green` and `red` +#### Alert lines `green` and `red` Set the green and red thresholds of a chart. Both are available as `$green` and `$red` in expressions. If multiple -alarms define different thresholds, the ones defined by the first alarm will be used. These will eventually visualized -on the dashboard, so only one set of them is allowed. If you need multiple sets of them in different alarms, use +alerts define different thresholds, the ones defined by the first alert will be used. Eventually it will be visualized +on the dashboard, so only one set of them is allowed If you need multiple sets of them in different alerts, use absolute numbers instead of `$red` and `$green`. Format: @@ -539,9 +539,9 @@ green: NUMBER red: NUMBER ``` -#### Alarm lines `warn` and `crit` +#### Alert lines `warn` and `crit` -Define the expression that triggers either a warning or critical alarm. These are optional, and should evaluate to +Define the expression that triggers either a warning or critical alert. These are optional, and should evaluate to either true or false (or zero/non-zero). The format uses Netdata's [expressions syntax](#expressions). @@ -551,9 +551,9 @@ warn: EXPRESSION crit: EXPRESSION ``` -#### Alarm line `to` +#### Alert line `to` -This will be the first parameter of the script to be executed when the alarm switches status. Its meaning is left up to +This will be the first script parameter that will be executed when the alert changes its status. Its meaning is left up to the `exec` script. The default `exec` script, `alarm-notify.sh`, uses this field as a space separated list of roles, which are then @@ -565,9 +565,9 @@ Format: to: ROLE1 ROLE2 ROLE3 ... 
``` -#### Alarm line `exec` +#### Alert line `exec` -The script that will be executed when the alarm changes status. +Script to be executed when the alert status changes. Format: @@ -578,10 +578,10 @@ exec: SCRIPT The default `SCRIPT` is Netdata's `alarm-notify.sh`, which supports all the notifications methods Netdata supports, including custom hooks. -#### Alarm line `delay` +#### Alert line `delay` This is used to provide optional hysteresis settings for the notifications, to defend against notification floods. These -settings do not affect the actual alarm - only the time the `exec` script is executed. +settings do not affect the actual alert - only the time the `exec` script is executed. Format: @@ -589,45 +589,45 @@ Format: delay: [[[up U] [down D] multiplier M] max X] ``` -- `up U` defines the delay to be applied to a notification for an alarm that raised its status +- `up U` defines the delay to be applied to a notification for an alert that raised its status (i.e. CLEAR to WARNING, CLEAR to CRITICAL, WARNING to CRITICAL). For example, `up 10s`, the notification for this event will be sent 10 seconds after the actual event. This is used in - hope the alarm will get back to its previous state within the duration given. The default `U` + hope the alert will get back to its previous state within the duration given. The default `U` is zero. -- `down D` defines the delay to be applied to a notification for an alarm that moves to lower +- `down D` defines the delay to be applied to a notification for an alert that moves to lower state (i.e. CRITICAL to WARNING, CRITICAL to CLEAR, WARNING to CLEAR). For example, `down 1m` will delay the notification by 1 minute. This is used to prevent notifications for flapping - alarms. The default `D` is zero. + alerts. The default `D` is zero. 
-- `multiplier M` multiplies `U` and `D` when an alarm changes state, while a notification is +- `multiplier M` multiplies `U` and `D` when an alert changes state, while a notification is delayed. The default multiplier is `1.0`. -- `max X` defines the maximum absolute notification delay an alarm may get. The default `X` +- `max X` defines the maximum absolute notification delay an alert may get. The default `X` is `max(U * M, D * M)` (i.e. the max duration of `U` or `D` multiplied once with `M`). Example: `delay: up 10s down 15m multiplier 2 max 1h` - The time is `00:00:00` and the status of the alarm is CLEAR. + The time is `00:00:00` and the status of the alert is CLEAR. | time of event | new status | delay | notification will be sent | why | - | ------------- | ---------- | --- | ------------------------- | --- | + |---------------|------------|---------------------|---------------------------|-------------------------------------------------------------------------------| | 00:00:01 | WARNING | `up 10s` | 00:00:11 | first state switch | - | 00:00:05 | CLEAR | `down 15m x2` | 00:30:05 | the alarm changes state while a notification is delayed, so it was multiplied | + | 00:00:05 | CLEAR | `down 15m x2` | 00:30:05 | the alert changes state while a notification is delayed, so it was multiplied | | 00:00:06 | WARNING | `up 10s x2 x2` | 00:00:26 | multiplied twice | | 00:00:07 | CLEAR | `down 15m x2 x2 x2` | 00:45:07 | multiplied 3 times. | So: - - `U` and `D` are multiplied by `M` every time the alarm changes state (any state, not just + - `U` and `D` are multiplied by `M` every time the alert changes state (any state, not just their matching one) and a delay is in place. - - All are reset to their defaults when the alarm switches state without a delay in place. + - All are reset to their defaults when the alert switches state without a delay in place. 
-#### Alarm line `repeat` +#### Alert line `repeat` -Defines the interval between repeating notifications for the alarms in CRITICAL or WARNING mode. This will override the +Defines the interval between repeating notifications for the alerts in CRITICAL or WARNING mode. This will override the default interval settings inherited from health settings in `netdata.conf`. The default settings for repeating notifications are `default repeat warning = DURATION` and `default repeat critical = DURATION` which can be found in health stock configuration, when one of these interval is bigger than 0, Netdata will activate the repeat notification @@ -639,14 +639,14 @@ Format: repeat: [off] [warning DURATION] [critical DURATION] ``` -- `off`: Turns off the repeating feature for the current alarm. This is effective when the default repeat settings has +- `off`: Tur |