From 52f2df7ef56242a47821b0657f311f19490fe9ef Mon Sep 17 00:00:00 2001
From: Dimitris P
Date: Wed, 12 Apr 2023 15:23:33 +0300
Subject: Collect additional BTRFS metrics (#14636)

* Add commit_stats metrics to BTRFS section
* Add error_stats metrics (per device) to BTRFS section
* Simplify commit stats variables and chart ids/names
* Add basic BTRFS error alarms.
  Configured to trip whenever any of the error dimensions is non-zero.
* Add chart descriptions for new charts.
* Remove duplicate code
* Comment out some debugging code
* Always create error stats dimensions, even if zero
* Show rate of commits and commit duration instead of totals
* Change current commit metrics to absolute from incremental
* Change commits dimension to absolute and add separate commits time share chart
* Rename 'device_' rrdlabels to 'filesystem_'
* Replace all snprintf() calls with snprintfz()
* Fix codacy warning
* Provide separate error charts for each filesystem device
* Accept code review suggestions for more descriptive context and labels

Co-authored-by: Ilya Mashchenko

* Add 'device' prefix to id, name, title of errors chart
* Add 'device_id' label to device_errors
* Update health.d/btrfs.conf to match new errors charts
* Remove commented out code
* Do not disable all BTRFS metrics collection if only commit_stats is missing
* Do not disable all BTRFS metrics collection if only error_stats is missing
* Fix bug of BTRFS device add/remove not being detected properly
* Fix double free() error when deleting a device
* Update dashboard info with bold tags

Co-authored-by: Ilya Mashchenko

---------

Co-authored-by: Austin S. Hemmelgarn
Co-authored-by: Ilya Mashchenko
---
 web/gui/dashboard_info.js | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

(limited to 'web')

diff --git a/web/gui/dashboard_info.js b/web/gui/dashboard_info.js
index 2894293b76..3dfbd61664 100644
--- a/web/gui/dashboard_info.js
+++ b/web/gui/dashboard_info.js
@@ -6445,6 +6445,22 @@ netdataDashboard.context = {
         info: 'Logical disk usage for BTRFS system. System chunks store information about the allocation of other chunks. The disk space reported here is the usable allocation (i.e. after any striping or replication). The values reported here should be relatively small compared to Data and Metadata, and will scale with the volume size and overall space usage.'
     },
 
+    'btrfs.commits': {
+        info: 'Tracks filesystem wide commits. Commits mark fully consistent synchronization points for the filesystem, and are triggered automatically when certain events happen or when enough time has elapsed since the last commit.'
+    },
+
+    'btrfs.commits_perc_time': {
+        info: 'Tracks commits time share. The reported time share metrics are valid only when BTRFS commit interval is longer than Netdata\'s update_every interval.'
+    },
+
+    'btrfs.commit_timings': {
+        info: 'Tracks timing information for commits. last dimension metrics are valid only when BTRFS commit interval is longer than Netdata\'s update_every interval.'
+    },
+
+    'btrfs.device_errors': {
+        info: 'Tracks per-device error counts. Five types of errors are tracked: read errors, write errors, flush errors, corruption errors, and generation errors. Read, write, and flush are errors reported by the underlying block device when trying to perform the associated operations on behalf of BTRFS. Corruption errors count checksum mismatches, which usually are a result of either at-rest data corruption or hardware problems. Generation errors count generational mismatches within the internal data structures of the volume, and are also usually indicative of at-rest data corruption or hardware problems. Note that errors reported here may not trigger an associated IO error in userspace, as BTRFS has relatively robust error recovery that allows it to return correct data in most multi-device setups.'
+    },
+
     // ------------------------------------------------------------------------
     // RabbitMQ
 
-- 
cgit v1.2.3
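
Note on the alarm rule mentioned in the commit message ("trip whenever any of the error dimensions is non-zero"): the health.d/btrfs.conf changes are not part of this path-limited diff, so the following is only a minimal JavaScript sketch of that rule, not the actual alarm definition. The dimension names are assumptions derived from the five error types listed in the 'btrfs.device_errors' description, and both function names are hypothetical.

```javascript
// Hypothetical dimension names, taken from the five error types listed in the
// 'btrfs.device_errors' description; the real chart may name them differently.
const ERROR_DIMENSIONS = ['read', 'write', 'flush', 'corruption', 'generation'];

// Returns the names of all known error dimensions whose counter is non-zero.
// `counters` is an assumed plain object such as { read: 0, write: 2 };
// missing or non-numeric entries are treated as zero, and unknown keys are ignored.
function nonZeroErrorDimensions(counters) {
    return ERROR_DIMENSIONS.filter(function (name) {
        const value = Number(counters[name]);
        return Number.isFinite(value) && value !== 0;
    });
}

// Mirrors the rule stated in the commit message: trip when any dimension is non-zero.
function shouldTripErrorAlarm(counters) {
    return nonZeroErrorDimensions(counters).length > 0;
}
```

Returning the list of offending dimensions, rather than only a boolean, makes it possible to report which error type caused the alarm.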