path: root/health
Age  Commit message  Author
2024-02-06  fix exporting internal charts context and family (#16683)  Ilya Mashchenko
* fix exporting internal charts context * update alerts * fix exporting charts family (cherry picked from commit f6c49c4ffe6bb0516a91a388ffd01b188f2acb6b)
2023-12-01  Code cleanup (#16448)  Stelios Fragkakis
* Code cleanup * More cleanup * More cleanup * Use FILENAME_MAX * query fix
2023-11-30  convert some error messages to info (#16508)  Ilya Mashchenko
2023-11-23  Handle ephemeral hosts (#16381)  Stelios Fragkakis
* Handle ephemeral hosts * Node ephemeral removal timeout 86400 seconds (1 day) * Move config from health to global section * Set a node to queryable false when it is ephemeral and is removed * Log queryable. Send queryable=0 only when forcing host deletion (the node is ephemeral) * Switch to "is ephemeral node" Document stream.conf * Unregister node id
2023-11-22  fixes for logging (#16459)  Costa Tsaousis
* fixes for logging * added log environment variables to cgroup-network * fix wrong condition * rename variable * fix leftovers * fix log2journal docs and logs
2023-11-22  add sbindir_POST to PATH of bash scripts that use `systemd-cat-native` (#16456)  Ilya Mashchenko
* use sbindir_POST in charts.d and alarm-notify * convert cgroup-name.sh to in * convert cgroup-network-helper.sh to in * simplify cgroup-network-helper
2023-11-22  New logging layer (#16357)  Costa Tsaousis
* cleanup of logging - wip * first working iteration * add errno annotator * replace old logging functions with netdata_logger() * cleanup * update error_limit * fix remanining error_limit references * work on fatal() * started working on structured logs * full cleanup * default logging to files; fix all plugins initialization * fix formatting of numbers * cleanup and reorg * fix coverity issues * cleanup obsolete code * fix formatting of numbers * fix log rotation * fix for older systems * add detection of systemd journal via stderr * finished on access.log * remove left-over transport * do not add empty fields to the logs * journal get compact uuids; X-Transaction-ID header is added in web responses * allow compiling on systems without memfd sealing * added libnetdata/uuid directory * move datetime formatters to libnetdata * add missing files * link the makefiles in libnetdata * added uuid_parse_flexi() to parse UUIDs with and without hyphens; the web server now read X-Transaction-ID and uses it for functions and web responses * added stream receiver, sender, proc plugin and pluginsd log stack * iso8601 advanced usage; line_splitter module in libnetdata; code cleanup * add message ids to streaming inbound and outbound connections * cleanup line_splitter between lines to avoid logging garbage; when killing children, kill them with SIGABRT if internal checks is enabled * send SIGABRT to external plugins only if we are not shutting down * fix cross cleanup in pluginsd parser * fatal when there is a stack error in logs * compile netdata with -fexceptions * do not kill external plugins with SIGABRT * metasync info logs to debug level * added severity to logs * added json output; added options per log output; added documentation; fixed issues mentioned * allow memfd only on linux * moved journal low level functions to journal.c/h * move health logs to daemon.log with proper priorities * fixed a couple of bugs; health log in journal * updated docs * systemd-cat-native 
command to push structured logs to journal from the command line * fix makefiles * restored NETDATA_LOG_SEVERITY_LEVEL * fix makefiles * systemd-cat-native can also work as the logger of Netdata scripts * do not require a socket to systemd-journal to log-as-netdata * alarm notify logs in native format * properly compare log ids * fatals log alerts; alarm-notify.sh working * fix overflow warning * alarm-notify.sh now logs the request (command line) * anotate external plugins logs with the function cmd they run * added context, component and type to alarm-notify.sh; shell sanitization removes control character and characters that may be expanded by bash * reformatted alarm-notify logs * unify cgroup-network-helper.sh * added quotes around params * charts.d.plugin switched logging to journal native * quotes for logfmt * unify the status codes of streaming receivers and senders * alarm-notify: dont log anything, if there is nothing to do * all external plugins log to stderr when running outside netdata; alarm-notify now shows an error when notifications menthod are needed but are not available * migrate cgroup-name.sh to new logging * systemd-cat-native now supports messages with newlines * socket.c logs use priority * cleanup log field types * inherit the systemd set INVOCATION_ID if found * allow systemd-cat-native to send messages to a systemd-journal-remote URL * log2journal command that can convert structured logs to journal export format * various fixes and documentation of log2journal * updated log2journal docs * updated log2journal docs * updated documentation of fields * allow compiling without libcurl * do not use socket as format string * added version information to newly added tools * updated documentation and help messages * fix the namespace socket path * print errno with error * do not timeout * updated docs * updated docs * updated docs * log2journal updated docs and params * when talking to a remote journal, systemd-cat-native batches the messages * 
enable lz4 compression for systemd-cat-native when sending messages to a systemd-journal-remote * Revert "enable lz4 compression for systemd-cat-native when sending messages to a systemd-journal-remote" This reverts commit b079d53c11f6687cd64d804fdd7b24c0492bf245. * note about uncompressed traffic * log2journal: code reorg and cleanup to make modular * finished rewriting log2journal * more comments * rewriting rules support * increased limits * updated docs * updated docs * fix old log call * use journal only when stderr is connected to journal * update netdata.spec for libcurl, libpcre2 and log2journal * pcre2-devel * do not require pcre2 in centos < 8, amazonlinux < 2023, open suse * log2journal only on systems pcre2 is available * ignore log2journal in .gitignore * avoid log2journal on centos 7, amazonlinux 2 and opensuse * add pcre2-8 to static build * undo last commit * Bundle to static Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud> * Add build deps for deb packages Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud> * Add dependencies; build from source Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud> * Test build for amazon linux and centos expect to fail for suse Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud> * fix minor oversight Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud> * Reorg code * Add the install from source (deps) as a TODO * Not enable the build on suse ecosystem Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud> --------- Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud> Co-authored-by: Tasos Katsoulas <tasos@netdata.cloud>
2023-11-21  Remove queue limit from ACLK sync event loop (#16411)  Stelios Fragkakis
Code cleanup
2023-11-21  Minor: Small health docs typo fix (#16439)  Emmanuel Vasilakis
small typo fix
2023-11-17  proc_net_dev: keep nic_speed_max in kilobits (#16429)  Ilya Mashchenko
2023-11-16  Minor: Remove backtick from doc (#16423)  Emmanuel Vasilakis
remove backtick
2023-11-15  fix proc net dev: keep iface speed chart var in Mbits (#16418)  Ilya Mashchenko
2023-11-15  Don't print errors from reading filtered alerts (#16417)  Emmanuel Vasilakis
2023-11-13  Add a apcupsd status code metric (#16361)  thomasbeaudry
Co-authored-by: ilyam8 <ilya@netdata.cloud>
2023-11-12  health guides: remove guides for alerts that don't exist in the repo (#16375)  Ilya Mashchenko
2023-11-10  docs: remove 'families' from health reference (#16380)  Ilya Mashchenko
2023-11-08  health: put guides into subdirs (#16358)  Ilya Mashchenko
2023-11-08  Import alert guides from Netdata Assistant (#16355)  Ralph Meijer
2023-11-07  Copy outdated alert guides to health/guides (#16352)  Fotis Voutsas
2023-10-30  Apcupsd selftest metric (#16286)  thomasbeaudry
* add the apcupsd selftest chart * add an alarm for selftest failing * add selftest chart with each possible value being reported as 0/1 (inactive/active) * add a warning for SELFTEST code being BT or NG * Update health/health.d/apcupsd.conf Co-authored-by: Ilya Mashchenko <ilya@netdata.cloud> * add reference for SELFTEST codes * rename fields * escape quotation marks in awk command * fix template on
2023-10-29  Regenerate integrations.js (#16291)  Netdata bot
Co-authored-by: ilyam8 <ilyam8@users.noreply.github.com>
2023-10-22  Fix Discord webhook payload (#16257)  luchaos
2023-10-18  remove charts.d/nut (#16230)  Ilya Mashchenko
2023-10-14  health: attach drops ratio alarms to net.drops (#16199)  Ilya Mashchenko
* health: attach drops ratio alarms to net.drops * update meta * enable net drops on all os * update freebsd meta
2023-10-13  Suppress "families" log (#16186)  Stelios Fragkakis
2023-10-13  Regenerate integrations.js (#16176)  Netdata bot
Co-authored-by: Ancairon <Ancairon@users.noreply.github.com>
2023-10-06  health: don't log an unknown key error for "families" (#16145)  Ilya Mashchenko
2023-10-06  Regenerate integrations.js (#16138)  Netdata bot
2023-10-06  Remove family from alerts (#16025)  Emmanuel Vasilakis
* remove loading and storing families from alert configs * remove families from silencers * remove from alarm log * start remove from alarm-notify.sh.in * fix test alarm * rebase * remove from api/v1/alarm_log * remove from alert stream * remove from config stream * remove from more * remove from swagger for health api * revert md changes * remove from health cmd api test
2023-10-06  Add summary to alerts configurations (#16129)  Emmanuel Vasilakis
* add to web log alerts * add more * more adds * more adds * more * more * more * more * more * more * more * more * updates --------- Co-authored-by: ilyam8 <ilya@netdata.cloud>
2023-10-05  fix proc netstat metrics (#16122)  Ilya Mashchenko
2023-10-03  external plugins: respect env NETDATA_LOG_SEVERITY_LEVEL (#16089)  Ilya Mashchenko
* func to set global log sev level for ext plugins * apps: set log sev level * cgroup-network: set log sev level * cups: set log sev level * debugfs: set log sev level * freeipmi: set log sev level * nfacct: set log sev level * perf: set log sev level * slabinfo: set log sev level * xenstat: set log sev level * cgroup-name.sh: handle log sev level * alarm-notify.sh: handle log sev level * systemd-journal: set log sev level * ebpf.plugin: set log sev level * ioping: handle log sev level * cgroup-network-helper.sh: handle log sev level * fix for cgroup-network-helper.sh
2023-10-02  Regenerate integrations.js (#16062)  Netdata bot
Co-authored-by: ilyam8 <ilyam8@users.noreply.github.com> Co-authored-by: Fotis Voutsas <fotis@netdata.cloud>
2023-09-27  health: add upsd alerts (#16036)  Ilya Mashchenko
2023-09-26  Run health queries from tier 0 (#16032)  Emmanuel Vasilakis
run health queries from tier 0
2023-09-26  Remove discontinued Hangouts and StackPulse notification methods (#16041)  Fotis Voutsas
2023-09-19  Add a summary field to alerts (#15886)  Emmanuel Vasilakis
* add a summary field to alerts * add summary field to db * rebase * better migration * rebase * change email notification * revert to silent * use macro * add the summary field to some alerts * add more summary fields * change migration function * add to postgres alerts * add summary to vernemq * more summary fields * more summary fields * fixes * add doc
2023-09-18  Re-store rrdvars on late dimensions (#15984)  Emmanuel Vasilakis
2023-09-15  update go.d.plugin to v0.55.0 (#15981)  Ilya Mashchenko
2023-09-13  enable `ml_1min_node_ar` as a default alert (#14687)  Andrew Maguire
Co-authored-by: Ilya Mashchenko <ilya@netdata.cloud>
2023-09-12  feat: Adds access control configuration for ntfy (#15932)  Mike Iversen
Co-authored-by: ilyam8 <ilya@netdata.cloud>
2023-09-06  Replace _ with spaces for name variable for ntfy (#15909)  МАН69К
2023-09-06  Reset the obsolete flag on service thread (#15892)  Emmanuel Vasilakis
* reset the RRDHOST_FLAG_PENDING_OBSOLETE_CHARTS flag * do rrdset_free even when in dbengine mode * remove RRDSET_FLAG_ARCHIVED * remove commented line * use is_available_for_viewers
2023-09-01  Reduce label memory (#15255)  Stelios Fragkakis
2023-08-22  Misc code cleanup (#15665)  Stelios Fragkakis
* Cleanup code * Add SQLITE3_COLUMN_STRDUPZ_OR_NULL for readability * Bind unique id properly * Cleanup with is_claimed parameter to decide which cleanup to use Unify cleanup function sql_health_alarm_log_cleanup Add SQLITE3_BIND_STRING_OR_NULL and SQLITE3_COLUMN_STRINGDUP_OR_NULL sql_health_alarm_log_count returns number of rows instead of updating host->health.health_log_entries_written Reformat queries for clarity * Try to fix codacy issue * Try to fix codacy issue -- issue small warning * Change label from fail to done * Drop index on unique_id and health_log_id and create one on both * Update database/sqlite/sqlite_aclk_alert.c Co-authored-by: Emmanuel Vasilakis <mrzammler@gmail.com> * Fix double bind --------- Co-authored-by: Emmanuel Vasilakis <mrzammler@gmail.com>
2023-08-15  docs rename alarm to alert (#15812)  Ilya Mashchenko
2023-08-03  disable systemdunits alarms (#15726)  Ilya Mashchenko
2023-08-03  Revert "Refactor RRD code. (#15423)" (#15723)  vkalintiris
This reverts commit 440bd51e08fdfa2a4daa191fb68643456028a753. dbengine was still being used for non-zero tiers even on non-dbengine modes.
2023-08-02  Assorted fixes for integrations templates. (#15702)  Austin S. Hemmelgarn
* Fix missing endif in template. * Add h2 to setup template. * Reduce duplication within the troubleshooting template. * Add missing troubleshooting section for agent notifications. * Fix path checking in troubleshooting template.
2023-08-01  Update metadata.yaml (#15679)  Shyam Sreevalsan