author Costa Tsaousis <costa@netdata.cloud> 2024-01-23 20:20:41 +0200
committer GitHub <noreply@github.com> 2024-01-23 20:20:41 +0200
commit f466b8aef52c1ea394651f5fe6b4586a5e39e5af
tree 96130302033cc5df6b2a5683931138c72e4ab88e
parent 33412db1f50d833e56889ea4762725dfde5d6d8f
DYNCFG: dynamically configured alerts (#16779)
* cleanup alerts
* fix references
* fix references
* fix references
* load alerts once and apply them to each node
* simplify health_create_alarm_entry()
* Compile without warnings with compiler flags: -Wall -Wextra -Wformat=2 -Wshadow -Wno-format-nonliteral -Winit-self
* code re-organization and cleanup
* generate patterns when applying prototypes; give unique dyncfg names to all alerts
* eval expressions keep the source and the parsed_as as STRING pointers
* renamed host to node in dyncfg ids
* renamed host to node in dyncfg ids
* add all cloud roles to the list of parsed X-Netdata-Role header and also default to member access level
* working functionality
* code re-organization: moved health event-loop to a new file, moved health globals to health.c
* rrdcalctemplate is removed; alert_cfg is removed; foreach dimension is removed; RRDCALCs are now instantiated only when they are linked to RRDSETs
* dyncfg alert prototypes initialization for alerts
* health dyncfg split to separate file
* cleanup not-needed code
* normalize matches between parsing and json
* also detect !* for disabled alerts
* dyncfg capability disabled
* Store alert config part1
* Add rrdlabels_common_count
* wip health variables lookup without indexes
* Improve rrdlabels_common_count by reusing rrdlabels_find_label_with_key_unsafe with an additional parameter
* working variables with runtime lookup
* working variables with runtime lookup
* delete rrddimvar and rrdfamily index
* remove rrdsetvar; now all variables are in RRDVARs inside hosts and charts
* added /api/v1/variable that resolves a variable the same way alerts do
* remove rrdcalc from eval
* remove debug code
* remove duplicate assignment
* Fix memory leak
* all alert variables are now handled by alert_variable_lookup() and EVAL is now independent of alerts
* hide all internal structures of EVAL
* Enable -Wformat flag
  Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud>
* Adjust binding for calculation, warning, critical
* Remove unused macro
* Update config hash id
* use the right info and summary in alerts log
* use synchronous queries for alerts
* Handle cases when config_hash_id is missing from health_log
* remove deadlock from health worker
* parsing to json payload for health alert prototypes
* cleaner parsing and avoiding memory leaks in case of duplicate members in json
* fix left-over rename of function
* Keep original lookup field to send to the cloud
  Cleanup / rename function to store config
  Remove unused DEFINEs, functions
* Use ac->lookup
* link jobs to the host when the template is registered; do not accept running a function without a host
* full dyncfg support for health alerts, except action TEST
* working dyncfg additions, updates, removals
* fixed missing source, wrong status updates
* add alerts by type, component, classification, recipient and module at the /api/v2/alerts endpoint
* fix dyncfg unittest
* rename functions
* generalize the json-c parser macros and move them to libnetdata
* report progress when enabling and disabling dyncfg templates
* moved rrdcalc and rrdvar to health
* update alarms
* added schema for alerts; separated alert_action_options from rrdr_options; restructured the json payload for alerts
* enable parsed json alerts; allow sending back accepted but disabled
* added format_version for alerts payload; enables/disables status now is also inherited by the status of the rules; fixed variable names in json output
* remove the RRDHOST pointer from DYNCFG
* Fix command field submitted to the cloud
* do not send updates to creation requests, for DYNCFG jobs

---------

Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud>
Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
Co-authored-by: Tasos Katsoulas <tasos@netdata.cloud>
Co-authored-by: ilyam8 <ilya@netdata.cloud>
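
The commit message says that alert variables are resolved through alert_variable_lookup() and that EVAL is now independent of alerts. The actual signatures are not shown on this page, so the following is only a minimal sketch of that kind of decoupling, assuming a caller-supplied lookup callback. Every name in it (sketch_eval, variable_lookup_cb, sketch_table_lookup, and so on) is hypothetical and is not taken from the netdata sources.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type: resolves a variable name to a value.
 * Returns true and sets *result when the name is known. */
typedef bool (*variable_lookup_cb)(const char *name, void *data, double *result);

/* Hypothetical evaluator state: it only knows about the callback,
 * not about alerts, charts, or hosts. */
typedef struct {
    variable_lookup_cb lookup;
    void *lookup_data;
} sketch_eval;

/* Evaluate "a > b", where both operands are variable names.
 * Returns 1 (true), 0 (false), or -1 when a variable cannot be resolved. */
static int sketch_eval_greater(const sketch_eval *ev, const char *a, const char *b) {
    double va, vb;
    if (!ev->lookup(a, ev->lookup_data, &va)) return -1;
    if (!ev->lookup(b, ev->lookup_data, &vb)) return -1;
    return va > vb ? 1 : 0;
}

/* Example resolver owned by the alerting side: a fixed name/value table
 * terminated by an entry whose name is NULL. */
typedef struct {
    const char *name;
    double value;
} sketch_var;

static bool sketch_table_lookup(const char *name, void *data, double *result) {
    const sketch_var *vars = data;
    for (size_t i = 0; vars[i].name != NULL; i++) {
        if (strcmp(vars[i].name, name) == 0) {
            *result = vars[i].value;
            return true;
        }
    }
    return false;
}
```

In this sketch the evaluator never touches alert structures directly; the side that owns the variables passes in a resolver, so the same evaluator could be reused with any other source of values.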
-rw-r--r-- CMakeLists.txt 46
-rw-r--r-- aclk/aclk_capas.c 2
-rw-r--r-- aclk/mqtt_websockets/mqtt_wss_client.c 10
-rw-r--r-- collectors/apps.plugin/apps_plugin.c 4
-rw-r--r-- collectors/cgroups.plugin/cgroup-charts.c 2
-rw-r--r-- collectors/cgroups.plugin/cgroup-internals.h 6
-rw-r--r-- collectors/cgroups.plugin/sys_fs_cgroup.c 16
-rw-r--r-- collectors/cgroups.plugin/tests/test_doubles.c 4
-rw-r--r-- collectors/freeipmi.plugin/freeipmi_plugin.c 133
-rw-r--r-- collectors/plugins.d/README.md 3
-rw-r--r-- collectors/plugins.d/pluginsd_parser.c 14
-rw-r--r-- collectors/proc.plugin/ipc.c 10
-rw-r--r-- collectors/proc.plugin/proc_loadavg.c 6
-rw-r--r-- collectors/proc.plugin/proc_net_dev.c 28
-rw-r--r-- collectors/proc.plugin/proc_net_netstat.c 4
-rw-r--r-- collectors/proc.plugin/proc_net_sockstat.c 16
-rw-r--r-- collectors/proc.plugin/proc_net_stat_conntrack.c 4
-rw-r--r-- collectors/proc.plugin/proc_stat.c 4
-rw-r--r-- collectors/proc.plugin/sys_class_drm.c 2
-rw-r--r-- collectors/proc.plugin/sys_class_infiniband.c 6
-rw-r--r-- collectors/statsd.plugin/statsd.c 3
-rw-r--r-- collectors/systemd-journal.plugin/systemd-journal-dyncfg.c 1
-rw-r--r-- collectors/systemd-journal.plugin/systemd-units.c 26
-rw-r--r-- daemon/analytics.c 4
-rw-r--r-- daemon/commands.c 2
-rw-r--r-- daemon/common.h 5
-rw-r--r-- daemon/config/dyncfg-echo.c 96
-rw-r--r-- daemon/config/dyncfg-files.c 5
-rw-r--r-- daemon/config/dyncfg-inline.c 23
-rw-r--r-- daemon/config/dyncfg-intercept.c 114
-rw-r--r-- daemon/config/dyncfg-internals.h 8
-rw-r--r-- daemon/config/dyncfg-tree.c 10
-rw-r--r-- daemon/config/dyncfg-unittest.c 5
-rw-r--r-- daemon/config/dyncfg.c 64
-rw-r--r-- daemon/main.c 16
-rw-r--r-- daemon/service.c 6
-rw-r--r-- daemon/unit_test.c 22
-rw-r--r-- database/contexts/api_v2.c 242
-rw-r--r-- database/contexts/query_target.c 4
-rw-r--r-- database/engine/rrdengine.c 6
-rw-r--r-- database/rrd.h 34
-rw-r--r-- database/rrdcalc.c 867
-rw-r--r-- database/rrdcalc.h 271
-rw-r--r-- database/rrdcalctemplate.c 242
-rw-r--r-- database/rrdcalctemplate.h 130
-rw-r--r-- database/rrddim.c 4
-rw-r--r-- database/rrddimvar.c 273
-rw-r--r-- database/rrddimvar.h 21
-rw-r--r-- database/rrdfamily.c 69
-rw-r--r-- database/rrdfunctions-inflight.c 13
-rw-r--r-- database/rrdhost.c 147
-rw-r--r-- database/rrdlabels.c 44
-rw-r--r-- database/rrdlabels.h 1
-rw-r--r-- database/rrdset.c 42
-rw-r--r-- database/rrdsetvar.c 299
-rw-r--r-- database/rrdsetvar.h 30
-rw-r--r-- database/rrdvar.c 392
-rw-r--r-- database/rrdvar.h 77
-rw-r--r-- database/sqlite/sqlite_aclk_alert.c 8
-rw-r--r-- database/sqlite/sqlite_health.c 305
-rw-r--r-- database/sqlite/sqlite_health.h 6
-rw-r--r-- exporting/prometheus/prometheus.c 88
-rw-r--r-- exporting/prometheus/remote_write/remote_write.c 28
-rw-r--r-- health/health.c 1730
-rw-r--r-- health/health.d/adaptec_raid.conf 4
-rw-r--r-- health/health.d/anomalies.conf 40
-rw-r--r-- health/health.d/file_descriptors.conf 2
-rw-r--r-- health/health.d/megacli.conf 6
-rw-r--r-- health/health.d/redis.conf 3
-rw-r--r-- health/health.h 41
-rw-r--r-- health/health_config.c 1199
-rw-r--r-- health/health_dyncfg.c 603
-rw-r--r-- health/health_event_loop.c 751
-rw-r--r-- health/health_internals.h 130
-rw-r--r-- health/health_json.c 107
-rw-r--r-- health/health_log.c 35
-rw-r--r-- health/health_notifications.c 569
-rw-r--r-- health/health_prototypes.c 616
-rw-r--r-- health/health_prototypes.h 120
-rw-r--r-- health/health_silencers.c 495
-rw-r--r-- health/health_silencers.h (renamed from libnetdata/health/health.h) 28
-rw-r--r-- health/health_variable.c 486
-rw-r--r-- health/rrdcalc.c 539
-rw-r--r-- health/rrdcalc.h 146
-rw-r--r-- health/rrdvar.c 342
-rw-r--r-- health/rrdvar.h 44
-rw-r--r-- health/schema.d/health:alert:prototype.json 509
-rw-r--r-- libnetdata/clocks/clocks.c 2
-rw-r--r-- libnetdata/config/dyncfg.c 3
-rw-r--r-- libnetdata/config/dyncfg.h 3
-rw-r--r-- libnetdata/eval/eval.c 190
-rw-r--r-- libnetdata/eval/eval.h 44
-rw-r--r-- libnetdata/health/health.c 169
-rw-r--r-- libnetdata/inlined.h 10
-rw-r--r-- libnetdata/json/json-c-parser-inline.h 154
-rw-r--r-- libnetdata/libnetdata.c 19
-rw-r--r-- libnetdata/libnetdata.h 6
-rw-r--r-- libnetdata/line_splitter/line_splitter.c 9
-rw-r--r-- libnetdata/line_splitter/line_splitter.h 5
-rw-r--r-- libnetdata/required_dummies.h 11
-rw-r--r-- libnetdata/simple_pattern/simple_pattern.c 2
-rw-r--r-- libnetdata/simple_pattern/simple_pattern.h 2
-rw-r--r-- libnetdata/uuid/uuid.c 20
-rw-r--r-- libnetdata/uuid/uuid.h 18
-rw-r--r-- logsmanagement/db_api.c 10
-rw-r--r-- logsmanagement/flb