author | Dimitris P <Dim-P@users.noreply.github.com> | 2023-04-12 15:23:33 +0300
---|---|---
committer | GitHub <noreply@github.com> | 2023-04-12 15:23:33 +0300
commit | 52f2df7ef56242a47821b0657f311f19490fe9ef |
tree | cfa48b51b8d7665573d31e23a70c8567a827a620 |
parent | f269cdb88070feed23ab5999a119223e54120b8e |
Collect additional BTRFS metrics (#14636)
* Add commit_stats metrics to BTRFS section
* Add error_stats metrics (per device) to BTRFS section
* Simplify commit stats variables and chart ids/names
* Add basic BTRFS error alarms.
Configured to trip whenever any of the error dimensions is non-zero.
* Add chart descriptions for new charts.
* Remove duplicate code
* Comment out some debugging code
* Always create error stats dimensions, even if zero
* Show rate of commits and commit duration instead of totals
* Change current commit metrics to absolute from incremental
* Change commits dimension to absolute and add separate commits time share chart
* Rename 'device_' rrdlabels to 'filesystem_'
* Replace all snprintf() calls with snprintfz()
* Fix Codacy warning
* Provide separate error charts for each filesystem device
* Accept code review suggestions for more descriptive context and labels
Co-authored-by: Ilya Mashchenko <ilya@netdata.cloud>
* Add 'device' prefix to id, name, title of errors chart
* Add 'device_id' label to device_errors
* Update health.d/btrfs.conf to match new errors charts
* Remove commented out code
* Do not disable all BTRFS metrics collection if only commit_stats is missing
* Do not disable all BTRFS metrics collection if only error_stats is missing
* Fix BTRFS device add/remove not being detected properly
* Fix double free() error when deleting a device
* Update dashboard info with bold tags
Co-authored-by: Ilya Mashchenko <ilya@netdata.cloud>
---------
Co-authored-by: Austin S. Hemmelgarn <austin@netdata.cloud>
Co-authored-by: Ilya Mashchenko <ilya@netdata.cloud>
Diffstat (limited to 'health')
-rw-r--r-- | health/health.d/btrfs.conf | 75 |
1 file changed, 75 insertions, 0 deletions
diff --git a/health/health.d/btrfs.conf b/health/health.d/btrfs.conf
index 8d197aa8d2..ab63ff28d0 100644
--- a/health/health.d/btrfs.conf
+++ b/health/health.d/btrfs.conf
@@ -66,3 +66,78 @@ component: File system
     delay: up 1m down 15m multiplier 1.5 max 1h
      info: utilization of BTRFS system space
        to: sysadmin
+
+ template: btrfs_device_read_errors
+       on: btrfs.device_errors
+    class: Errors
+     type: System
+component: File system
+       os: *
+    hosts: *
+ families: *
+    units: errors
+   lookup: max -10m every 1m of read_errs
+     warn: $this > 0
+    delay: up 1m down 15m multiplier 1.5 max 1h
+     info: number of encountered BTRFS read errors
+       to: sysadmin
+
+ template: btrfs_device_write_errors
+       on: btrfs.device_errors
+    class: Errors
+     type: System
+component: File system
+       os: *
+    hosts: *
+ families: *
+    units: errors
+   lookup: max -10m every 1m of write_errs
+     warn: $this > 0
+    delay: up 1m down 15m multiplier 1.5 max 1h
+     info: number of encountered BTRFS write errors
+       to: sysadmin
+
+ template: btrfs_device_flush_errors
+       on: btrfs.device_errors
+    class: Errors
+     type: System
+component: File system
+       os: *
+    hosts: *
+ families: *
+    units: errors
+   lookup: max -10m every 1m of flush_errs
+     warn: $this > 0
+    delay: up 1m down 15m multiplier 1.5 max 1h
+     info: number of encountered BTRFS flush errors
+       to: sysadmin
+
+ template: btrfs_device_corruption_errors
+       on: btrfs.device_errors
+    class: Errors
+     type: System
+component: File system
+       os: *
+    hosts: *
+ families: *
+    units: errors
+   lookup: max -10m every 1m of corruption_errs
+     warn: $this > 0
+    delay: up 1m down 15m multiplier 1.5 max 1h
+     info: number of encountered BTRFS corruption errors
+       to: sysadmin
+
+ template: btrfs_device_generation_errors
+       on: btrfs.device_errors
+    class: Errors
+     type: System
+component: File system
+       os: *
+    hosts: *
+ families: *
+    units: errors
+   lookup: max -10m every 1m of generation_errs
+     warn: $this > 0
+    delay: up 1m down 15m multiplier 1.5 max 1h
+     info: number of encountered BTRFS generation errors
+       to: sysadmin
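All five alarm templates in this diff share one pattern: every minute (`every 1m`), take the maximum of a single error dimension of `btrfs.device_errors` over the last 10 minutes (`max -10m`), and raise a warning if that value is above zero (`warn: $this > 0`). The Python sketch below approximates that evaluation for one dimension. It is an illustration under simplifying assumptions, not Netdata's implementation: the `(timestamp, value)` sample format and the function names are hypothetical, missing data is modeled as `None` instead of Netdata's own handling, and the `delay` hysteresis is ignored.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical sample format: (timestamp in seconds, error count) for one
# dimension, e.g. read_errs of one btrfs.device_errors chart.
Sample = Tuple[int, int]

WINDOW_SECONDS = 10 * 60  # models "lookup: max -10m"


def lookup_max(samples: Sequence[Sample], now: int,
               window: int = WINDOW_SECONDS) -> Optional[int]:
    """Return the largest value collected in (now - window, now], or None if none."""
    values = [value for t, value in samples if now - window < t <= now]
    return max(values) if values else None


def is_warning(value: Optional[int]) -> bool:
    """Models "warn: $this > 0": any non-zero error count in the window warns."""
    return value is not None and value > 0
```

For example, `lookup_max([(0, 0), (60, 0), (120, 2)], now=120)` returns `2`, so `is_warning` returns `True`. Taking the maximum rather than the latest value means a single non-zero sample anywhere in the 10-minute window is enough to keep the alarm raised.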