author | Markos Fountoulakis <44345837+mfundul@users.noreply.github.com> | 2019-10-24 19:43:09 +0300 |
---|---|---|
committer | GitHub <noreply@github.com> | 2019-10-24 19:43:09 +0300 |
commit | 88f966593abc5c7888e7c0be83780a97d4326ac2 (patch) | |
tree | ea40e93b143cb18c6da12357e1e2a64e0095ce78 /health | |
parent | a6229b245cf42b332d3f11199488f175a8a80a7c (diff) |
detect if the disk cannot keep up with data collection (#7139)
* Adjust dbengine flushing speed more dynamically
* Added error tracking statistics for failure to flush events
* Added alarm for dbengine flushing errors
* Improved dbengine accounting for pages committed to be written
Diffstat (limited to 'health')
-rw-r--r-- | health/health.d/dbengine.conf | 14 |
1 files changed, 13 insertions, 1 deletions
diff --git a/health/health.d/dbengine.conf b/health/health.d/dbengine.conf
index ce6427cd25..ce9839ef1d 100644
--- a/health/health.d/dbengine.conf
+++ b/health/health.d/dbengine.conf
@@ -23,4 +23,16 @@ lookup: sum -10m unaligned of I/O errors
   crit: $this > 0
  delay: down 1h multiplier 1.5 max 3h
   info: number of IO errors dbengine came across the last 10 minutes (CRC errors, out of space, bad disk etc)
-    to: sysadmin
\ No newline at end of file
+    to: sysadmin
+
+ alarm: 10min_dbengine_global_flushing_errors
+    on: netdata.dbengine_global_errors
+    os: linux freebsd macos
+ hosts: *
+lookup: sum -10m unaligned of flushing errors
+ units: errors
+ every: 3s
+  crit: $this > 0
+ delay: down 1h multiplier 1.5 max 3h
+  info: number of times in the last 10 minutes that the dbengine failed to completely flush data to disk, metric data will not be stored in the database, please reduce disk load or use a faster disk
+    to: sysadmin
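
The new alarm sums the flushing errors seen over the last 10 minutes and goes critical as soon as that sum is above zero. The rough Python sketch below models only that `lookup` and `crit` pair. It is not Netdata's implementation: the class name, method names, and status strings are hypothetical, and the `every`, `delay`, and `to` lines are not modeled.

```python
from collections import deque

WINDOW_SECONDS = 10 * 60  # corresponds to "-10m" in the lookup line


class FlushingErrorAlarm:
    """Hypothetical model of `lookup: sum -10m` combined with `crit: $this > 0`."""

    def __init__(self, window_seconds=WINDOW_SECONDS):
        self.window_seconds = window_seconds
        self.samples = deque()  # (timestamp_seconds, flushing_error_count)

    def record(self, timestamp, flushing_errors):
        """Store one collected sample of the flushing-errors dimension."""
        self.samples.append((timestamp, flushing_errors))

    def evaluate(self, now):
        """Sum the samples inside the window and apply the crit expression."""
        # Discard samples that have fallen out of the 10-minute window.
        while self.samples and self.samples[0][0] <= now - self.window_seconds:
            self.samples.popleft()
        this = sum(count for _, count in self.samples)  # plays the role of $this
        return "CRITICAL" if this > 0 else "CLEAR"


alarm = FlushingErrorAlarm()
alarm.record(0, 0)       # no flushing errors yet
status_before = alarm.evaluate(3)
alarm.record(6, 2)       # two failed flushes reported
status_after = alarm.evaluate(9)
status_expired = alarm.evaluate(6 + WINDOW_SECONDS)
```

In this model a single failed flush keeps the alarm critical until the sample ages out of the window, which matches the intent of `crit: $this > 0`: any lost metric data in the last 10 minutes is worth reporting.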