author: thiagoftsm <thiagoftsm@gmail.com> 2019-09-27 12:24:54 +0000
committer: Chris Akritidis <43294513+cakrit@users.noreply.github.com> 2019-09-27 14:24:54 +0200
commit: e3471fa5727bcf286dd3b52ec0cdecd8fdf7067e
tree: f2fa1e28143ecdae85536e8488965a7be6c7bfe2 /health
parent: a8b28bfbd2fe5a1814e6ddbb211961158f221fda
Create a template for all dimensions (#6560)
* health_connection: Comments inside Health Config
  To better understand what needs to change inside health, and where, I commented the functions inside this file.
* health_connection: Comments about Health in other files
  This commit brings the rest of the comments that were missing for health.
* health_connection: Comments on health_log
  I had to append more comments on health_log.
* health_connection: Create a new variable
  A new variable is created to work with foreach.
* health_connection: Fix new option and doc
  The first implementation of 'foreach' had a problem; this fixes the error. This commit also updates the documentation.
* health_connection: Understanding health
  This commit saves the place where I am working; it has the map to understand the whole alarm process.
* health_connection: Update map
  I changed the position of the error message to identify the correct place to add new alarms.
* health_connection: End of simple alarm
  This commit finishes what is necessary to bring the same lookup for different dimensions in one single line.
* health_connection: Documentation and template steps
  This commit brings the documentation that was missing for templates, and comments to help in the next step of applying a template to create an alarm.
* health_connection: Restoring
  After some tests, it was detected that the alarms were not working as expected.
* health_connection: Fix bug and bring dimension to template
  This commit fixes an old Netdata bug: before this, Netdata always tried to create a new entry in an index with the same id, raising an error. It also brings the possibility to use 'foreach' in a template.
* health_connection: Fix cmake compilation
  There was a problem with cmake compilation, fixed by this commit.
* health_connection: shell script
  Finalize the shell script to test the PR.
* health_connection: Remove debug message
  During development I used some messages to understand the code; this commit removes the last message.
* health_connection: Fix bugs
  This commit fixes bugs reported by tests.
* health_connection: Alarm working
  This commit brings the change necessary for the alarms to work, but it is missing the unlink from the newest list.
* health_connection: Template code written
  This commit finishes the creation of an alarm from a template, but it has not been tested yet.
* health_connection: Remove comments
  I am removing the comments from this PR to bring them back later.
* health_connection: Remove lines
  Another commit to restore the files to their state before they were commented.
* health_connection: New alarm and remove messages
  I am bringing a new alarm to test templates with SP and removing comments used during development.
* health_connection: Functional test review
  After reviewing the functional test script, a small adjustment was necessary to test all the features available with the new version.
* health_connection: Free structure
  I am moving the freeing of the list to the correct place; the previous place was not safe.
* health_connection: ShellCheck
  This commit fixes the problems reported by shellcheck.
* health_connection: Fix hash
  This commit fixes the hash calculation, which was using the wrong input.
* health_connection: Fix message error
  The system was showing a wrong message, because with foreach the alarm created from a template is added to the index in a second stage.
* health_connection: Fix documentation
  In this commit I am fixing the grammar of the previous doc and bringing two examples.
* health_connection: Fix examples
  This commit fixes the last two examples that were brought in this PR.
* health_connection: Fix example doc
  When I corrected the grammar in the last commit, I lost a mark.
* health_connection: Grammar fix
  Fixing the grammar of the documentation.
* health_connection: Memory leak
  This commit fixes the memory leak that was present in the PR.
* health_connection: Reload
  This commit fixes the problem that the alarms were not linked after receiving a SIGUSR2.
* health_connection: False positive from codacy
  Codacy was giving a false positive; I changed the function to avoid it.
* health_connection: dead code
  Remove dead code.
* health_connection: Memory leak
  Remove a memory leak when cleaning the simple pattern.
* health_connection: Script format
  With this commit I am formatting the last message to return to the default color on the terminal.
* health_connection: Script format 2
  With this commit I am formatting the last message to return to the default color on the terminal.
* health_connection: Script format 3
  With this commit I am formatting the error message to return to the default color on the terminal.
Diffstat (limited to 'health')
-rw-r--r-- health/README.md 44
-rw-r--r-- health/health.c 23
-rw-r--r-- health/health.h 3
-rw-r--r-- health/health_config.c 183
4 files changed, 200 insertions, 53 deletions
diff --git a/health/README.md b/health/README.md
index ab8d6882a1..0ffbbdb51d 100644
--- a/health/README.md
+++ b/health/README.md
@@ -163,7 +163,7 @@ This line makes a database lookup to find a value. This result of this lookup is
The format is:
```
-lookup: METHOD AFTER [at BEFORE] [every DURATION] [OPTIONS] [of DIMENSIONS]
+lookup: METHOD AFTER [at BEFORE] [every DURATION] [OPTIONS] [of DIMENSIONS] [foreach DIMENSIONS]
```
Everything is the same with [badges](../web/api/badges/). In short:
@@ -190,6 +190,11 @@ Everything is the same with [badges](../web/api/badges/). In short:
have spaces in their names). This accepts Netdata simple patterns and the `match-ids` and
`match-names` options affect the searches for dimensions.
+- `foreach DIMENSIONS` is optional, will always be the last parameter, and uses the same `,`/`|`
+ rules as the `of` parameter. Each dimension you specify in `foreach` will use the same rule
+ to trigger an alarm. If you set both `of` and `foreach`, Netdata will ignore the `of` parameter
+ and replace it with one of the dimensions you gave to `foreach`.
+
The result of the lookup will be available as `$this` and `$NAME` in expressions.
The timestamps of the timeframe evaluated by the database lookup is available as variables
`$after` and `$before` (both are unix timestamps).
@@ -660,6 +665,43 @@ Note that the drops chart does not exist if a network interface has never droppe
When Netdata detects a dropped packet, it will add the chart and it will automatically attach this
alarm to it.
+### Example 5
+
+Check if user or system dimension is using more than 50% of cpu:
+
+```
+ alarm: dim_template
+ on: system.cpu
+ os: linux
+lookup: average -3s percentage foreach system,user
+ units: %
+ every: 10s
+ warn: $this > 50
+ crit: $this > 80
+```
+
+The `lookup` line will calculate the average CPU usage from system and user in the last 3 seconds. Because we have
+the foreach in the `lookup` line, Netdata will create two independent alarms called `dim_template_system`
+and `dim_template_user` that will have all the other parameters shared among them.
+
+### Example 6
+
+Check if all dimensions are using more than 50% of cpu:
+
+```
+ alarm: dim_template
+ on: system.cpu
+ os: linux
+lookup: average -3s percentage foreach *
+ units: %
+ every: 10s
+ warn: $this > 50
+ crit: $this > 80
+```
+
+The `lookup` line will calculate the average CPU usage of each dimension in the last 3 seconds. In this case
+Netdata will create alarms for all dimensions of the chart.
+
## Troubleshooting
You can compile Netdata with [debugging](../daemon#debugging) and then set in `netdata.conf`:
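
The README text added above says that `foreach system,user` on an alarm named `dim_template` yields two alarms, `dim_template_system` and `dim_template_user`. The following is only a rough sketch of that naming rule, not the code Netdata uses: `foreach_alarm_name`, `foreach_expand` and `FOREACH_NAME_LEN` are hypothetical names introduced for illustration, and wildcard patterns such as `*` are not handled here because they need the chart's real dimension list.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define FOREACH_NAME_LEN 64   /* hypothetical limit, chosen for this sketch only */

/* Build "<alarm>_<dimension>" in out. Returns 0 on success, -1 if it does not fit. */
int foreach_alarm_name(char *out, size_t out_len, const char *alarm, const char *dimension) {
    assert(out && alarm && dimension);
    int n = snprintf(out, out_len, "%s_%s", alarm, dimension);
    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}

/* Split a literal dimension list on ',' or '|' and produce one alarm name per
 * dimension. Returns how many names were written to names[]. */
size_t foreach_expand(const char *alarm, const char *dims,
                      char names[][FOREACH_NAME_LEN], size_t max_names) {
    assert(alarm && dims && names);

    char copy[256];
    if (strlen(dims) >= sizeof(copy))
        return 0;                       /* list too long for this sketch */
    strcpy(copy, dims);                 /* strtok() modifies its input */

    size_t count = 0;
    for (char *tok = strtok(copy, ",|"); tok && count < max_names; tok = strtok(NULL, ",|")) {
        if (foreach_alarm_name(names[count], FOREACH_NAME_LEN, alarm, tok) == 0)
            count++;
    }
    return count;
}
```

Calling `foreach_expand("dim_template", "system,user", names, 4)` fills `names` with the two alarm names from Example 5. Every other setting (`units`, `every`, `warn`, `crit`) would be shared by the generated alarms, as the README describes.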
diff --git a/health/health.c b/health/health.c
index 592e6a5be2..329191fb88 100644
--- a/health/health.c
+++ b/health/health.c
@@ -113,9 +113,23 @@ void health_reload_host(RRDHOST *host) {
while(host->templates)
rrdcalctemplate_unlink_and_free(host, host->templates);
+ RRDCALCTEMPLATE *rt,*next;
+ for(rt = host->alarms_template_with_foreach; rt ; rt = next) {
+ next = rt->next;
+ rrdcalctemplate_free(rt);
+ }
+ host->alarms_template_with_foreach = NULL;
+
while(host->alarms)
rrdcalc_unlink_and_free(host, host->alarms);
+ RRDCALC *rc,*nc;
+ for(rc = host->alarms_with_foreach; rc ; rc = nc) {
+ nc = rc->next;
+ rrdcalc_free(rc);
+ }
+ host->alarms_with_foreach = NULL;
+
rrdhost_unlock(host);
// invalidate all previous entries in the alarm log
@@ -139,9 +153,17 @@ void health_reload_host(RRDHOST *host) {
health_readdir(host, user_path, stock_path, NULL);
// link the loaded alarms to their charts
+ RRDDIM *rd;
rrdset_foreach_write(st, host) {
rrdsetcalc_link_matching(st);
rrdcalctemplate_link_matching(st);
+
+ //This loop must be the last, because ` rrdcalctemplate_link_matching` will create alarms related to it.
+ rrdset_rdlock(st);
+ rrddim_foreach_read(rd, st) {
+ rrdcalc_link_to_rrddim(rd, st, host);
+ }
+ rrdset_unlock(st);
}
rrdhost_unlock(host);
@@ -888,6 +910,7 @@ void *health_main(void *ptr) {
}
}
}
+
if(unlikely(repeat_every > 0 && (rc->last_repeat + repeat_every) <= now)) {
rc->last_repeat = now;
ALARM_ENTRY *ae = health_create_alarm_entry(
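
The two new loops in `health_reload_host` free a singly linked list by saving the `next` pointer before releasing the current node, then resetting the head to `NULL`. A minimal standalone sketch of that pattern follows; the `NODE` type and both functions are hypothetical stand-ins, not Netdata's `RRDCALC` or `RRDCALCTEMPLATE` structures.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical list node standing in for an alarm or template entry. */
typedef struct node {
    int id;
    struct node *next;
} NODE;

/* Push a new node at the head of the list; returns the new head. */
NODE *node_push(NODE *head, int id) {
    NODE *n = calloc(1, sizeof(*n));
    if (!n)
        return head;        /* allocation failed: leave the list unchanged */
    n->id = id;
    n->next = head;
    return n;
}

/* Free every node and reset the head. Returns the number of nodes freed. */
size_t node_free_all(NODE **head) {
    assert(head);
    size_t freed = 0;
    NODE *n, *next;
    for (n = *head; n; n = next) {
        next = n->next;     /* read the link before the node is released */
        free(n);
        freed++;
    }
    *head = NULL;           /* avoid leaving a dangling head pointer */
    return freed;
}
```

Reading `n->next` after `free(n)` would be undefined behavior, which is why the link is copied first. Resetting the head matters on reload, since the same host structure is reused afterwards.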
diff --git a/health/health.h b/health/health.h
index 8e4d0f7cb3..ab367e9033 100644
--- a/health/health.h
+++ b/health/health.h
@@ -48,6 +48,7 @@ extern unsigned int default_health_enabled;
#define HEALTH_INFO_KEY "info"
#define HEALTH_DELAY_KEY "delay"
#define HEALTH_OPTIONS_KEY "options"
+#define HEALTH_FOREACH_KEY "foreach"
#define HEALTH_SILENCERS_MAX_FILE_LEN 10000
@@ -106,4 +107,6 @@ extern void health_alarm_log_free_one_nochecks_nounlink(ALARM_ENTRY *ae);
extern void *health_cmdapi_thread(void *ptr);
+extern SIMPLE_PATTERN *health_pattern_from_foreach(char *s);
+
#endif //NETDATA_HEALTH_H
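
`health_pattern_from_foreach`, declared above, duplicates its input, passes the copy through `dimension_remove_pipe_comma`, and builds a simple pattern from the result. The body of `dimension_remove_pipe_comma` is not part of this diff, so the sketch below only assumes that it replaces `,` and `|` with spaces; `foreach_normalize` is a hypothetical stand-in that combines the copy and the replacement step.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return a heap-allocated copy of s with ',' and '|' replaced by spaces.
 * The caller owns the result and must free() it. Returns NULL on allocation failure.
 * Assumption: this mirrors what the separator-removal helper does. */
char *foreach_normalize(const char *s) {
    assert(s);
    size_t len = strlen(s);
    char *copy = malloc(len + 1);
    if (!copy)
        return NULL;
    memcpy(copy, s, len + 1);           /* includes the terminating NUL */

    for (char *p = copy; *p; p++) {
        if (*p == ',' || *p == '|')
            *p = ' ';
    }
    return copy;
}
```

Under this assumption, `"system,user|nice"` becomes `"system user nice"`, a space-separated list that a pattern matcher can then consume.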
diff --git a/health/health_config.c b/health/health_config.c
index 0d6e77a9e4..65c6d8bd7f 100644
--- a/health/health_config.c
+++ b/health/health_config.c
@@ -46,7 +46,7 @@ static inline int rrdcalc_add_alarm_from_config(RRDHOST *host, RRDCALC *rc) {
rc->id = rrdcalc_get_unique_id(host, rc->chart, rc->name, &rc->next_event_id);
- debug(D_HEALTH, "Health configuration adding alarm '%s.%s' (%u): exec '%s', recipient '%s', green " CALCULATED_NUMBER_FORMAT_AUTO ", red " CALCULATED_NUMBER_FORMAT_AUTO ", lookup: group %d, after %d, before %d, options %u, dimensions '%s', update every %d, calculation '%s', warning '%s', critical '%s', source '%s', delay up %d, delay down %d, delay max %d, delay_multiplier %f, warn_repeat_every %u, crit_repeat_every %u",
+ debug(D_HEALTH, "Health configuration adding alarm '%s.%s' (%u): exec '%s', recipient '%s', green " CALCULATED_NUMBER_FORMAT_AUTO ", red " CALCULATED_NUMBER_FORMAT_AUTO ", lookup: group %d, after %d, before %d, options %u, dimensions '%s', for each dimension '%s', update every %d, calculation '%s', warning '%s', critical '%s', source '%s', delay up %d, delay down %d, delay max %d, delay_multiplier %f, warn_repeat_every %u, crit_repeat_every %u",
rc->chart?rc->chart:"NOCHART",
rc->name,
rc->id,
@@ -59,6 +59,7 @@ static inline int rrdcalc_add_alarm_from_config(RRDHOST *host, RRDCALC *rc) {
rc->before,
rc->options,
(rc->dimensions)?rc->dimensions:"NONE",
+ (rc->foreachdim)?rc->foreachdim:"NONE",
rc->update_every,
(rc->calculation)?rc->calculation->parsed_as:"NONE",
(rc->warning)?rc->warning->parsed_as:"NONE",
@@ -73,6 +74,7 @@ static inline int rrdcalc_add_alarm_from_config(RRDHOST *host, RRDCALC *rc) {
);
rrdcalc_add_to_host(host, rc);
+
return 1;
}
@@ -93,48 +95,70 @@ static inline int rrdcalctemplate_add_template_from_config(RRDHOST *host, RRDCAL
}
RRDCALCTEMPLATE *t, *last = NULL;
- for (t = host->templates; t ; last = t, t = t->next) {
- if(unlikely(t->hash_name == rt->hash_name
- && !strcmp(t->name, rt->name)
- && !strcmp(t->family_match?t->family_match:"*", rt->family_match?rt->family_match:"*")
- )) {
- error("Health configuration template '%s' already exists for host '%s'.", rt->name, host->hostname);
- return 0;
+ if(!rt->foreachdim) {
+ for (t = host->templates; t ; last = t, t = t->next) {
+ if(unlikely(t->hash_name == rt->hash_name
+ && !strcmp(t->name, rt->name)
+ && !strcmp(t->family_match?t->family_match:"*", rt->family_match?rt->family_match:"*")
+ )) {
+ error("Health configuration template '%s' already exists for host '%s'.", rt->name, host->hostname);
+ return 0;
+ }
+ }
+
+ if(likely(last)) {
+ last->next = rt;
+ }
+ else {
+ rt->next = host->templates;
+ host->templates = rt;
+ }
+ } else {
+ for (t = host->alarms_template_with_foreach; t ; last = t, t = t->next) {
+ if(unlikely(t->hash_name == rt->hash_name
+ && !strcmp(t->name, rt->name)
+ && !strcmp(t->family_match?t->family_match:"*", rt->family_match?rt->family_match:"*")
+ )) {
+ error("Health configuration template '%s' already exists for host '%s'.", rt->name, host->hostname);
+ return 0;
+ }
+ }
+
+ if(likely(last)) {
+ last->next = rt;
+ }
+ else {
+ rt->next = host->alarms_template_with_foreach;
+ host->alarms_template_with_foreach = rt;
}
}
- debug(D_HEALTH, "Health configuration adding template '%s': context '%s', exec '%s', recipient '%s', green " CALCULATED_NUMBER_FORMAT_AUTO ", red " CALCULATED_NUMBER_FORMAT_AUTO ", lookup: group %d, after %d, before %d, options %u, dimensions '%s', update every %d, calculation '%s', warning '%s', critical '%s', source '%s', delay up %d, delay down %d, delay max %d, delay_multiplier %f, warn_repeat_every %u, crit_repeat_every %u",
- rt->name,
- (rt->context)?rt->context:"NONE",
- (rt->exec)?rt->exec:"DEFAULT",
- (rt->recipient)?rt->recipient:"DEFAULT",
- rt->green,
- rt->red,
- (int)rt->group,
- rt->after,
- rt->before,
- rt->options,
- (rt->dimensions)?rt->dimensions:"NONE",
- rt->update_every,
- (rt->calculation)?rt->calculation->parsed_as:"NONE",
- (rt->warning)?rt->warning->parsed_as:"NONE",
- (rt->critical)?rt->critical->parsed_as:"NONE",
- rt->source,
- rt->delay_up_duration,
- rt->delay_down_duration,
- rt->delay_max_duration,
- rt->delay_multiplier,
- rt->warn_repeat_every,
- rt->crit_repeat_every
+ debug(D_HEALTH, "Health configuration adding template '%s': context '%s', exec '%s', recipient '%s', green " CALCULATED_NUMBER_FORMAT_AUTO ", red " CALCULATED_NUMBER_FORMAT_AUTO ", lookup: group %d, after %d, before %d, options %u, dimensions '%s', for each dimension '%s', update every %d, calculation '%s', warning '%s', critical '%s', source '%s', delay up %d, delay down %d, delay max %d, delay_multiplier %f, warn_repeat_every %u, crit_repeat_every %u",
+ rt->name,
+ (rt->context)?rt->context:"NONE",
+ (rt->exec)?rt->exec:"DEFAULT",
+ (rt->recipient)?rt->recipient:"DEFAULT",
+ rt->green,
+ rt->red,
+ (int)rt->group,
+ rt->after,
+ rt->before,
+ rt->options,
+ (rt->dimensions)?rt->dimensions:"NONE",
+ (rt->foreachdim)?rt->foreachdim:"NONE",
+ rt->update_every,
+ (rt->calculation)?rt->calculation->parsed_as:"NONE",
+ (rt->warning)?rt->warning->parsed_as:"NONE",
+ (rt->critical)?rt->critical->parsed_as:"NONE",
+ rt->source,
+ rt->delay_up_duration,
+ rt->delay_down_duration,
+ rt->delay_max_duration,
+ rt->delay_multiplier,
+ rt->warn_repeat_every,
+ rt->crit_repeat_every
);
- if(likely(last)) {
- last->next = rt;
- }
- else {
- rt->next = host->templates;
- host->templates = rt;
- }
return 1;
}
@@ -291,16 +315,37 @@ static inline int health_parse_repeat(
return 1;
}
+/**
+ * Health pattern from Foreach
+ *
+ * Create a new simple pattern using the user input
+ *
+ * @param s the string that will be used to create the simple pattern.
+ */
+SIMPLE_PATTERN *health_pattern_from_foreach(char *s) {
+ char *convert= strdupz(s);
+ SIMPLE_PATTERN *val = NULL;
+ if(convert) {
+ dimension_remove_pipe_comma(convert);
+ val = simple_pattern_create(convert, NULL, SIMPLE_PATTERN_EXACT);
+
+ freez(convert);
+ }
+
+ return val;
+}
static inline int health_parse_db_lookup(
size_t line, const char *filename, char *string,
RRDR_GROUPING *group_method, int *after, int *before, int *every,
- uint32_t *options, char **dimensions
+ uint32_t *options, char **dimensions, char **foreachdim
) {
debug(D_HEALTH, "Health configuration parsing database lookup %zu@%s: %s", line, filename, string);
if(*dimensions) freez(*dimensions);
+ if(*foreachdim) freez(*foreachdim);
*dimensions = NULL;
+ *foreachdim = NULL;
*after = 0;
*before = 0;
*every = 0;
@@ -387,8 +432,22 @@ static inline int health_parse_db_lookup(
*options |= RRDR_OPTION_MATCH_NAMES;
}
else if(!strcasecmp(key, "of")) {
- if(*s && strcasecmp(s, "all") != 0)
+ char *find = NULL;
+ if(*s && strcasecmp(s, "all") != 0) {
+ find = strcasestr(s, " foreach");
+ if(find) {
+ *find = '\0';
+ }
*dimensions = strdupz(s);
+ }
+
+ if(!find) {
+ break;
+ }
+ s = ++find;
+ }
+ else if(!strcasecmp(key, HEALTH_FOREACH_KEY )) {
+ *foreachdim = strdupz(s);
break;
}
else {
@@ -521,8 +580,12 @@ static int health_readfile(const char *filename, void *data) {
uint32_t hash = simple_uhash(key);
if(hash == hash_alarm && !strcasecmp(key, HEALTH_ALARM_KEY)) {
- if (rc && (ignore_this || !rrdcalc_add_alarm_from_config(host, rc)))
- rrdcalc_free(rc);
+ if(rc) {
+ if(ignore_this || !rrdcalc_add_alarm_from_config(host, rc)) {
+ rrdcalc_free(rc);
+ }
+ // health_add_alarms_loop(host, rc, ignore_this) ;
+ }
if(rt) {
if (ignore_this || !rrdcalctemplate_add_template_from_config(host, rt))
@@ -552,14 +615,18 @@ static int health_readfile(const char *filename, void *data) {
}
else if(hash == hash_template && !strcasecmp(key, HEALTH_TEMPLATE_KEY)) {
if(rc) {
- if(ignore_this || !rrdcalc_add_alarm_from_config(host, rc))
+// health_add_alarms_loop(host, rc, ignore_this) ;
+ if(ignore_this || !rrdcalc_add_alarm_from_config(host, rc)) {
rrdcalc_free(rc);
+ }
rc = NULL;
}
- if(rt && (ignore_this || !rrdcalctemplate_add_template_from_config(host, rt)))
- rrdcalctemplate_free(rt);
+ if(rt) {
+ if(ignore_this || !rrdcalctemplate_add_template_from_config(host, rt))
+ rrdcalctemplate_free(rt);
+ }
rt = callocz(1, sizeof(RRDCALCTEMPLATE));
rt->name = strdupz(value);
@@ -622,8 +689,10 @@ static int health_readfile(const char *filename, void *data) {
}
else if(hash == hash_lookup && !strcasecmp(key, HEALTH_LOOKUP_KEY)) {
health_parse_db_lookup(line, filename, value, &rc->group, &rc->after, &rc->before,
- &rc->update_every,
- &rc->options, &rc->dimensions);
+ &rc->update_every, &rc->options, &rc->dimensions, &rc->foreachdim);
+ if(rc->foreachdim) {
+ rc->spdim = health_pattern_from_foreach(rc->foreachdim);
+ }
}
else if(hash == hash_every && !strcasecmp(key, HEALTH_EVERY_KEY)) {
if(!config_parse_duration(value, &rc->update_every))
@@ -752,7 +821,10 @@ static int health_readfile(const char *filename, void *data) {
}
else if(hash == hash_lookup && !strcasecmp(key, HEALTH_LOOKUP_KEY)) {
health_parse_db_lookup(line, filename, value, &rt->group, &rt->after, &rt->before,
- &rt->update_every, &rt->options, &rt->dimensions);
+ &rt->update_every, &rt->options, &rt->dimensions, &rt->foreachdim);
+ if(rt->foreachdim) {
+ rt->spdim = health_pattern_from_foreach(rt->foreachdim);
+ }
}
else if(hash == hash_every && !strcasecmp(key, HEALTH_EVERY_KEY)) {
if(!config_parse_duration(value, &rt->update_every))
@@ -866,11 +938,17 @@ static int health_readfile(const char *filename, void *data) {
}
}
- if(rc && (ignore_this || !rrdcalc_add_alarm_from_config(host, rc)))
- rrdcalc_free(rc);
+ if(rc) {
+ //health_add_alarms_loop(host, rc, ignore_this) ;
+ if(ignore_this || !rrdcalc_add_alarm_from_config(host, rc)) {
+ rrdcalc_free(rc);
+ }
+ }
- if(rt && (ignore_this || !rrdcalctemplate_add_template_from_config(host, rt)))
- rrdcalctemplate_free(rt);
+ if(rt) {
+ if(ignore_this || !rrdcalctemplate_add_template_from_config(host, rt))
+ rrdcalctemplate_free(rt);
+ }
fclose(fp);
return 1;
@@ -881,5 +959,6 @@ void health_readdir(RRDHOST *host, const char *user_path, const char *stock_path
debug(D_HEALTH, "CONFIG health is not enabled for host '%s'", host->hostname);
return;
}
+
recursive_config_double_dir_load(user_path, stock_path, subpath, health_readfile, (void *) host, 0);
}