author Markos Fountoulakis <44345837+mfundul@users.noreply.github.com> 2019-10-24 19:43:09 +0300
committer GitHub <noreply@github.com> 2019-10-24 19:43:09 +0300
commit 88f966593abc5c7888e7c0be83780a97d4326ac2
tree ea40e93b143cb18c6da12357e1e2a64e0095ce78 /health
parent a6229b245cf42b332d3f11199488f175a8a80a7c
detect if the disk cannot keep up with data collection (#7139)
* Adjust dbengine flushing speed more dynamically
* Added error tracking statistics for failure to flush events
* Added alarm for dbengine flushing errors
* Improved dbengine accounting for pages committed to be written
Diffstat (limited to 'health')
-rw-r--r-- health/health.d/dbengine.conf 14
1 files changed, 13 insertions, 1 deletions
diff --git a/health/health.d/dbengine.conf b/health/health.d/dbengine.conf
index ce6427cd25..ce9839ef1d 100644
--- a/health/health.d/dbengine.conf
+++ b/health/health.d/dbengine.conf
@@ -23,4 +23,16 @@ lookup: sum -10m unaligned of I/O errors
crit: $this > 0
delay: down 1h multiplier 1.5 max 3h
info: number of IO errors dbengine came across the last 10 minutes (CRC errors, out of space, bad disk etc)
- to: sysadmin
\ No newline at end of file
+ to: sysadmin
+
+ alarm: 10min_dbengine_global_flushing_errors
+ on: netdata.dbengine_global_errors
+ os: linux freebsd macos
+ hosts: *
+lookup: sum -10m unaligned of flushing errors
+ units: errors
+ every: 3s
+ crit: $this > 0
+ delay: down 1h multiplier 1.5 max 3h
+ info: number of times in the last 10 minutes that the dbengine failed to completely flush data to disk, metric data will not be stored in the database, please reduce disk load or use a faster disk
+ to: sysadmin
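The new alarm sums the `flushing errors` dimension of `netdata.dbengine_global_errors` over the last 10 minutes, is re-evaluated every 3 seconds, and goes critical as soon as that sum is above zero. The Python sketch below models only that lookup-and-threshold logic, under stated assumptions: each sample is a hypothetical `(timestamp, errors)` pair holding the errors counted in that second, `FlushErrorWindow`, `status` and `evaluate` are illustrative names rather than netdata code, and neither the `unaligned` option nor the `delay` notification back-off is modeled.

```python
from collections import deque
from dataclasses import dataclass, field

# "lookup: sum -10m": the window covers the last 10 minutes, in seconds.
WINDOW_SECONDS = 10 * 60


@dataclass
class FlushErrorWindow:
    """Hypothetical model of `lookup: sum -10m unaligned of flushing errors`.

    Assumption: one sample per second, each holding the number of
    flushing errors counted in that second. Alignment is ignored.
    """

    window: int = WINDOW_SECONDS
    samples: deque = field(default_factory=deque)  # (timestamp, errors)

    def add(self, ts: int, errors: int) -> None:
        """Record the errors observed at second `ts` (timestamps ascending)."""
        self.samples.append((ts, errors))

    def value(self, now: int) -> int:
        """Return $this: the sum of errors in the window (now - window, now]."""
        # Evict samples that have fallen out of the 10-minute window.
        while self.samples and self.samples[0][0] <= now - self.window:
            self.samples.popleft()
        return sum(errors for ts, errors in self.samples if ts <= now)


def status(this: int) -> str:
    """Apply "crit: $this > 0": any flushing error in the window is critical."""
    return "CRITICAL" if this > 0 else "CLEAR"


def evaluate(window: FlushErrorWindow, now: int) -> tuple[int, str]:
    """One evaluation, as netdata would run it every 3 s ("every: 3s")."""
    this = window.value(now)
    return this, status(this)


if __name__ == "__main__":
    w = FlushErrorWindow()
    w.add(100, 2)  # two failed flushes at t=100 s
    for now in (102, 699, 700):
        print(now, evaluate(w, now))
```

With a single sample of 2 errors at t=100, the alarm stays CRITICAL through the evaluation at t=699 and returns to CLEAR at t=700, when that sample leaves the 10-minute window.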