author: Dimitris P <Dim-P@users.noreply.github.com> 2023-04-12 15:23:33 +0300
committer: GitHub <noreply@github.com> 2023-04-12 15:23:33 +0300
commit: 52f2df7ef56242a47821b0657f311f19490fe9ef
tree: cfa48b51b8d7665573d31e23a70c8567a827a620 /health
parent: f269cdb88070feed23ab5999a119223e54120b8e
Collect additional BTRFS metrics (#14636)
* Add commit_stats metrics to BTRFS section
* Add error_stats metrics (per device) to BTRFS section
* Simplify commit stats variables and chart ids/names
* Add basic BTRFS error alarms. Configured to trip whenever any of the error dimensions is non-zero.
* Add chart descriptions for new charts.
* Remove duplicate code
* Comment out some debugging code
* Always create error stats dimensions, even if zero
* Show rate of commits and commit duration instead of totals
* Change current commit metrics to absolute from incremental
* Change commits dimension to absolute and add separate commits time share chart
* Rename 'device_' rrdlabels to 'filesystem_'
* Replace all snprintf() calls with snprintfz()
* Fix codacy warning
* Provide separate error charts for each filesystem device
* Accept code review suggestions for more descriptive context and labels

Co-authored-by: Ilya Mashchenko <ilya@netdata.cloud>

* Add 'device' prefix to id, name, title of errors chart
* Add 'device_id' label to device_errors
* Update health.d/btrfs.conf to match new errors charts
* Remove commented out code
* Do not disable all BTRFS metrics collection if only commit_stats is missing
* Do not disable all BTRFS metrics collection if only error_stats is missing
* Fix bug of BTRFS device add/remove not being detected properly
* Fix double free() error when deleting a device
* Update dashboard info with bold tags

Co-authored-by: Ilya Mashchenko <ilya@netdata.cloud>

---------

Co-authored-by: Austin S. Hemmelgarn <austin@netdata.cloud>
Co-authored-by: Ilya Mashchenko <ilya@netdata.cloud>
Diffstat (limited to 'health')
-rw-r--r-- health/health.d/btrfs.conf | 75
1 files changed, 75 insertions, 0 deletions
diff --git a/health/health.d/btrfs.conf b/health/health.d/btrfs.conf
index 8d197aa8d2..ab63ff28d0 100644
--- a/health/health.d/btrfs.conf
+++ b/health/health.d/btrfs.conf
@@ -66,3 +66,78 @@ component: File system
delay: up 1m down 15m multiplier 1.5 max 1h
info: utilization of BTRFS system space
to: sysadmin
+
+ template: btrfs_device_read_errors
+ on: btrfs.device_errors
+ class: Errors
+ type: System
+component: File system
+ os: *
+ hosts: *
+ families: *
+ units: errors
+ lookup: max -10m every 1m of read_errs
+ warn: $this > 0
+ delay: up 1m down 15m multiplier 1.5 max 1h
+ info: number of encountered BTRFS read errors
+ to: sysadmin
+
+ template: btrfs_device_write_errors
+ on: btrfs.device_errors
+ class: Errors
+ type: System
+component: File system
+ os: *
+ hosts: *
+ families: *
+ units: errors
+ lookup: max -10m every 1m of write_errs
+ warn: $this > 0
+ delay: up 1m down 15m multiplier 1.5 max 1h
+ info: number of encountered BTRFS write errors
+ to: sysadmin
+
+ template: btrfs_device_flush_errors
+ on: btrfs.device_errors
+ class: Errors
+ type: System
+component: File system
+ os: *
+ hosts: *
+ families: *
+ units: errors
+ lookup: max -10m every 1m of flush_errs
+ warn: $this > 0
+ delay: up 1m down 15m multiplier 1.5 max 1h
+ info: number of encountered BTRFS flush errors
+ to: sysadmin
+
+ template: btrfs_device_corruption_errors
+ on: btrfs.device_errors
+ class: Errors
+ type: System
+component: File system
+ os: *
+ hosts: *
+ families: *
+ units: errors
+ lookup: max -10m every 1m of corruption_errs
+ warn: $this > 0
+ delay: up 1m down 15m multiplier 1.5 max 1h
+ info: number of encountered BTRFS corruption errors
+ to: sysadmin
+
+ template: btrfs_device_generation_errors
+ on: btrfs.device_errors
+ class: Errors
+ type: System
+component: File system
+ os: *
+ hosts: *
+ families: *
+ units: errors
+ lookup: max -10m every 1m of generation_errs
+ warn: $this > 0
+ delay: up 1m down 15m multiplier 1.5 max 1h
+ info: number of encountered BTRFS generation errors
+ to: sysadmin
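
The five templates added in this diff share one shape: `lookup: max -10m every 1m of <dimension>` followed by `warn: $this > 0`. As a rough illustration of that rule (not Netdata's implementation), the Python sketch below takes the maximum of the last ten samples of one dimension and warns if it is above zero. It assumes one sample per minute; the function names `lookup_max` and `should_warn` are hypothetical, and the `delay:` hysteresis line is not modelled.

```python
from typing import Sequence


def lookup_max(samples: Sequence[float], window: int = 10) -> float:
    """Return the maximum of the most recent `window` samples.

    Assumption: one sample per minute, so window=10 stands in for
    the `-10m` part of `lookup: max -10m every 1m of read_errs`.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    recent = samples[-window:]
    if not recent:
        raise ValueError("no samples available")
    return max(recent)


def should_warn(this: float) -> bool:
    """Mirror `warn: $this > 0`: any non-zero value in the window warns."""
    return this > 0


if __name__ == "__main__":
    # Hypothetical per-minute values of the read_errs dimension.
    read_errs = [0] * 12 + [3] + [0] * 4
    value = lookup_max(read_errs)
    print(value, should_warn(value))
```

Under these assumptions, a single non-zero sample keeps the warning condition true until that sample falls out of the ten-sample window.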