path: root/health/health_json.c
Age  Commit message  Author
2022-09-28  Change cast to remove coverity warnings (#13735)  thiagoftsm
2022-09-26  Fix warnings during compilation time on ARM (32 bits) (#13681)  thiagoftsm
2022-09-19  RRD structures managed by dictionaries (#13646)  Costa Tsaousis
* rrdset - in progress * rrdset optimal constructor; rrdset conflict * rrdset final touches * re-organization of rrdset object members * prevent use-after-free * dictionary dfe supports also counting of iterations * rrddim managed by dictionary * rrd.h cleanup * DICTIONARY_ITEM now is referencing actual dictionary items in the code * removed rrdset linked list * Revert "removed rrdset linked list" This reverts commit 690d6a588b4b99619c2c5e10f84e8f868ae6def5. * removed rrdset linked list * added comments * Switch chart uuid to static allocation in rrdset Remove unused functions * rrdset_archive() and friends... * always create rrdfamily * enable ml_free_dimension * rrddim_foreach done with dfe * most custom rrddim loops replaced with rrddim_foreach * removed accesses to rrddim->dimensions * removed locks that are no longer needed * rrdsetvar is now managed by the dictionary * set rrdset is rrdsetvar, fixes https://github.com/netdata/netdata/pull/13646#issuecomment-1242574853 * conflict callback of rrdsetvar now properly checks if it has to reset the variable * dictionary registered callbacks accept as first parameter the DICTIONARY_ITEM * dictionary dfe now uses internal counter to report; avoided excess variables defined with dfe * dictionary walkthrough callbacks get dictionary acquired items * dictionary reference counters that can be dupped from zero * added advanced functions for get and del * rrdvar managed by dictionaries * thread safety for rrdsetvar * faster rrdvar initialization * rrdvar string lengths should match in all add, del, get functions * rrdvar internals hidden from the rest of the world * rrdvar is now acquired throughout netdata * hide the internal structures of rrdsetvar * rrdsetvar is now acquired through out netdata * rrddimvar managed by dictionary; rrddimvar linked list removed; rrddimvar structures hidden from the rest of netdata * better error handling * dont create variables if not initialized for health * dont create variables if not 
initialized for health again * rrdfamily is now managed by dictionaries; references of it are acquired dictionary items * type checking on acquired objects * rrdcalc renaming of functions * type checking for rrdfamily_acquired * rrdcalc managed by dictionaries * rrdcalc double free fix * host rrdvars is always needed * attempt to fix deadlock 1 * attempt to fix deadlock 2 * Remove unused variable * attempt to fix deadlock 3 * snprintfz * rrdcalc index in rrdset fix * Stop storing active charts and computing chart hashes * Remove store active chart function * Remove compute chart hash function * Remove sql_store_chart_hash function * Remove store_active_dimension function * dictionary delayed destruction * formatting and cleanup * zero dictionary base on rrdsetvar * added internal error to log delayed destructions of dictionaries * typo in rrddimvar * added debugging info to dictionary * debug info * fix for rrdcalc keys being empty * remove forgotten unlock * remove deadlock * Switch to metadata version 5 and drop chart_hash chart_hash_map chart_active dimension_active v_chart_hash * SQL cosmetic changes * do not busy wait while destroying a referenced dictionary * remove deadlock * code cleanup; re-organization; * fast cleanup and flushing of dictionaries * number formatting fixes * do not delete configured alerts when archiving a chart * rrddim obsolete linked list management outside dictionaries * removed duplicate contexts call * fix crash when rrdfamily is not initialized * dont keep rrddimvar referenced * properly cleanup rrdvar * removed some locks * Do not attempt to cleanup chart_hash / chart_hash_map * rrdcalctemplate managed by dictionary * register callbacks on the right dictionary * removed some more locks * rrdcalc secondary index replaced with linked-list; rrdcalc labels updates are now executed by health thread * when looking up for an alarm look using both chart id and chart name * host initialization a bit more modular * init rrdlabels on host 
update * preparation for dictionary views * improved comment * unused variables without internal checks * service threads isolation and worker info * more worker info in service thread * thread cancelability debugging with internal checks * strings data races addressed; fixes https://github.com/netdata/netdata/issues/13647 * dictionary modularization * Remove unused SQL statement definition * unit-tested thread safety of dictionaries; removed data race conditions on dictionaries and strings; dictionaries now can detect if the caller is holds a write lock and automatically all the calls become their unsafe versions; all direct calls to unsafe version is eliminated * remove worker_is_idle() from the exit of service functions, because we lose the lock time between loops * rewritten dictionary to have 2 separate locks, one for indexing and another for traversal * Update collectors/cgroups.plugin/sys_fs_cgroup.c Co-authored-by: Vladimir Kobal <vlad@prokk.net> * Update collectors/cgroups.plugin/sys_fs_cgroup.c Co-authored-by: Vladimir Kobal <vlad@prokk.net> * Update collectors/proc.plugin/proc_net_dev.c Co-authored-by: Vladimir Kobal <vlad@prokk.net> * fix memory leak in rrdset cache_dir * minor dictionary changes * dont use index locks in single threaded * obsolete dict option * rrddim options and flags separation; rrdset_done() optimization to keep array of reference pointers to rrddim; * fix jump on uninitialized value in dictionary; remove double free of cache_dir * addressed codacy findings * removed debugging code * use the private refcount on dictionaries * make dictionary item desctructors work on dictionary destruction; strictier control on dictionary API; proper cleanup sequence on rrddim; * more dictionary statistics * global statistics about dictionary operations, memory, items, callbacks * dictionary support for views - missing the public API * removed warning about unused parameter * chart and context name for cloud * chart and context name for cloud, again 
* dictionary statistics fixed; first implementation of dictionary views - not currently used * only the master can globally delete an item * context needs netdata prefix * fix context and chart it of spins * fix for host variables when health is not enabled * run garbage collector on item insert too * Fix info message; remove extra "using" * update dict unittest for new placement of garbage collector * we need RRDHOST->rrdvars for maintaining custom host variables * Health initialization needs the host->host_uuid * split STRING to its own files; no code changes other than that * initialize health unconditionally * unit tests do not pollute the global scope with their variables * Skip initialization when creating archived hosts on startup. When a child connects it will initialize properly Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com> Co-authored-by: Vladimir Kobal <vlad@prokk.net>
2022-09-05  Deduplicate all netdata strings (#13570)  Costa Tsaousis
* rrdfamily * rrddim * rrdset plugin and module names * rrdset units * rrdset type * rrdset family * rrdset title * rrdset title more * rrdset context * rrdcalctemplate context and removal of context hash from rrdset * strings statistics * rrdset name * rearranged members of rrdset * eliminate rrdset name hash; rrdcalc chart converted to STRING * rrdset id, eliminated rrdset hash * rrdcalc, alarm_entry, alert_config and some of rrdcalctemplate * rrdcalctemplate * rrdvar * eval_variable * rrddimvar and rrdsetvar * rrdhost hostname, os and tags * fix master commits * added thread cache; implemented string_dup without locks * faster thread cache * rrdset and rrddim now use dictionaries for indexing * rrdhost now uses dictionary * rrdfamily now uses DICTIONARY * rrdvar using dictionary instead of AVL * allocate the right size to rrdvar flag members * rrdhost remaining char * members to STRING * * better error handling on indexing * strings now use a read/write lock to allow parallel searches to the index * removed AVL support from dictionaries; implemented STRING with native Judy calls * string releases should be negative * only 31 bits are allowed for enum flags * proper locking on strings * string threading unittest and fixes * fix lgtm finding * fixed naming * stream chart/dimension definitions at the beginning of a streaming session * thread stack variable is undefined on thread cancel * rrdcontext garbage collect per host on startup * worker control in garbage collection * relaxed deletion of rrdmetrics * type checking on dictfe * netdata chart to monitor rrdcontext triggers * Group chart label updates * rrdcontext better handling of collected rrdsets * rrdpush incremental transmition of definitions should use as much buffer as possible * require 1MB per chart * empty the sender buffer before enabling metrics streaming * fill up to 50% of buffer * reset signaling metrics sending * use the shared variable for status * use separate host flag for enabling streaming 
of metrics * make sure the flag is clear * add logging for streaming * add logging for streaming on buffer overflow * circular_buffer proper sizing * removed obsolete logs * do not execute worker jobs if not necessary * better messages about compression disabling * proper use of flags and updating rrdset last access time every time the obsoletion flag is flipped * monitor stream sender used buffer ratio * Update exporting unit tests * no need to compare label value with strcmp * streaming send workers now monitor bandwidth * workers now use strings * streaming receiver monitors incoming bandwidth * parser shift of worker ids * minor fixes * Group chart label updates * Populate context with dimensions that have data * Fix chart id * better shift of parser worker ids * fix for streaming compression * properly count received bytes * ensure LZ4 compression ring buffer does not wrap prematurely * do not stream empty charts; do not process empty instances in rrdcontext * need_to_send_chart_definition() does not need an rrdset lock any more * rrdcontext objects are collected, after data have been written to the db * better logging of RRDCONTEXT transitions * always set all variables needed by the worker utilization charts * implemented double linked list for most objects; eliminated alarm indexes from rrdhost; and many more fixes * lockless strings design - string_dup() and string_freez() are totally lockless when they dont need to touch Judy - only Judy is protected with a read/write lock * STRING code re-organization for clarity * thread_cache improvements; double numbers precision on worker threads * STRING_ENTRY now shadown STRING, so no duplicate definition is required; string_length() renamed to string_strlen() to follow the paradigm of all other functions, STRING internal statistics are now only compiled with NETDATA_INTERNAL_CHECKS * rrdhost index by hostname now cleans up; aclk queries of archieved hosts do not index hosts * Add index to speed up database context 
searches * Removed last_updated optimization (was also buggy after latest merge with master) Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com> Co-authored-by: Vladimir Kobal <vlad@prokk.net>
2022-08-16  Support chart labels in alerts (#13290)  Emmanuel Vasilakis
* chart labels for alerts * proper termination * use strchr * change if statement * change label variable. add docs * change doc * assign buf to temp * use new dictionary functions * reduce variable scope * reduce line length * make sure rrdcalc updates labels after inserted * reduce var scope * add rrdcalc.c for cmocka tests * Revert "add rrdcalc.c for cmocka tests" This reverts commit 5fe122adcf7abcbe6d67fa2ebd7c4ff8620cf9c8. * Fix cmocka unit tests * valgrind errors Co-authored-by: Vladimir Kobal <vlad@prokk.net>
2022-08-04  Send chart context with alert events to the cloud (#13409)  Emmanuel Vasilakis
* add chart context to alert events * migrate health log tables to add chart_context * send it via proto message * add from v3 to v4 * free table * free chart_context
2022-05-24  Return rc->last_update from alarms_values api (#12968)  Emmanuel Vasilakis
* add rc last_update as when in alarms values api * rename when to last_updated * update swagger
2022-05-17  Adjust alarms count (#12896)  Emmanuel Vasilakis
* check for is_available when counting * remove empty line
2022-04-20  Store alert log entries even if alert is repeating. (#12226)  Emmanuel Vasilakis
* store alarm log entries even if it is repeating * log times repeated for an alert
2022-01-17  Add localhost hostname to the edit_command (#11793)  Emmanuel Vasilakis
* include localhost hostname in edit_command * since the edit_command now contains the localhost name, don't pass it again to the script
2021-09-19  Use sqlite to store the health log and alert configurations. (#11399)  Emmanuel Vasilakis
* Rebased * use sql health log if it exists * store alert config in sqlite * move unlock before loop * fix warnings * remove hash message * check return from counting health log * remove check of hostname when reading log * try to create the health log table to catch accidental removals of it * fix warnings, cast values, report config_hash_id * use snprintfz, add info logging * remove unnecessary strdup and free * check if stored config hash is null * return if prepare statement fails * replace with static variables * remove replace info, free edit_command * remove setting cfg entries to NULL * change uuid_copy * check return of uuid_parse, and exit if it's not valid * also free cfg * use address * removed health_alarm_entry_sql2json and sql_health_alarm_log_select_all * remove check for is_valid_alarm_id * replace lengths with GUID_LEN * use uuid_unparse_lower_fix * removed web api endpoint to get alert config * check for non null values for name, chart and family * include a date_updated field in alert_hash * for config hash, digest NULL string if value to digest is null * Use empty string instead of null
2021-05-31  Provide UTC offset in seconds and edit health config command (#11051)  Emmanuel Vasilakis
* add abbreviated timezone, UTC offset in seconds, and edit health alarm command rebased * formatting * use str2i instead of atoi
2021-04-23  Rename struct fields from class to classification. (#11019)  vkalintiris
These fields made our headers incompatible with C++, because `class` is a reserved keyword.
2021-04-20  Provide new attributes in health conf files (#10961)  Emmanuel Vasilakis
* read and store new attributes (class, component, type) from health conf files. Replace family variable in info strings * provide the attributes to jsons * remove extra semicolon * populate conf files with new attributes * added newline * remove extra defines from health.h * remove empty line * remove realloc * use helper variables for find_and_replace. Adjust position for next strstr * remove comments * Add type to mysql.conf and vcsa.conf * fix formatting * add parenthesis * remove extra assignment * changes to mysql_galera_cluster_state from master * add type Errors to unbound_request_list_overwritten * fix indentation for info strings spanning more than one line * check for null, replace with empty string if true * add class, component, type to systemdunits.conf
2021-04-07  Fix incorrect health log entries (#10822)  Stelios Fragkakis
2021-03-26  Add a new parameter 'chart' to the /api/v1/alarm_log. (#10788)  Emmanuel Vasilakis
* add a chart parameter to api alarm_log * Use hash_chart instead * also do the strcmp * cleaner? * save an if * move simple_hash out of the loop * Changed if * formatting changes * fix formatting
2021-03-18  Don't show alarms for charts without data (#10804)  Vladimir Kobal
2021-03-16  Adds ACLK-NG as fallback (#10315)  Timotej S
* adds a new implementation of ACLK written almost from scratch * external dependencies only OpenSSL and JSON-C * fallback for systems where ACLK Legacy can't build (for technical or philosophical reasons) * can be forced to build by giving "--aclk-ng" to the installer
2021-02-11  Allow the REMOVED status to be sent if a previous status was WARN/CRIT (#10533)  Stelios Fragkakis
2021-01-18  [health][NFC] Mark internal functions as static. (#10518)  vkalintiris
2020-05-12  Remove check for old alarm status (#8978)  Stelios Fragkakis
Fixed coverity issue (CID 358436)
2020-05-11  Enable support for Netdata Cloud.  Andrew Moss
This PR merges the feature-branch to make the cloud live. It contains the following work:
Co-authored-by: Andrew Moss <1043609+amoss@users.noreply.github.com>
Co-authored-by: Jacek Kolasa <jacek.kolasa@gmail.com>
Co-authored-by: Austin S. Hemmelgarn <austin@netdata.cloud>
Co-authored-by: James Mills <prologic@shortcircuit.net.au>
Co-authored-by: Markos Fountoulakis <44345837+mfundul@users.noreply.github.com>
Co-authored-by: Timotej S <6674623+underhood@users.noreply.github.com>
Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
* dashboard with new navbars, v1.0-alpha.9: PR #8478
* dashboard v1.0.11: netdata/dashboard#76 Co-authored-by: Jacek Kolasa <jacek.kolasa@gmail.com>
* Added installer code to bundle JSON-c if it's not present. PR #8836 Co-authored-by: James Mills <prologic@shortcircuit.net.au>
* Fix claiming config PR #8843
* Adds JSON-c as hard dep. for ACLK PR #8838
* Fix SSL renegotiation errors in old versions of openssl. PR #8840. Also - we have a transient problem with opensuse CI so this PR disables them with a commit from @prologic. Co-authored-by: James Mills <prologic@shortcircuit.net.au>
* Fix claiming error handling PR #8850
* Added CI to verify JSON-C bundling code in installer PR #8853
* Make cloud-enabled flag in web/api/v1/info be independent of ACLK build success PR #8866
* Reduce ACLK_STABLE_TIMEOUT from 10 to 3 seconds PR #8871
* remove old-cloud related UI from old dashboard (accessible now via /old suffix) PR #8858
* dashboard v1.0.13 PR #8870
* dashboard v1.0.14 PR #8904
* Provide feedback on proxy setting changes PR #8895
* Change the name of the connect message to update during an ongoing session PR #8927
* Fetch active alarms from alarm_log PR #8944
2020-02-24  Merging the feature branch for the ACLK in the previous sprint. (#8179)  Andrew Moss
* ACLK connection and protocol improvements (#8139) * Adding ACLK retry on connection failure (#8147) * Fixed reconnect issues on the ACLK. (#8163) * Cleaning up ACLK - part 1 (#8167) Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
2020-02-07  alarms_values: New endpoint (#7836)  thiagoftsm
* alarms_values: New endpoint. This commit brings the new endpoint to Netdata.
* alarms_values: Documentation. This commit brings the missing documentation for the PR.
* alarms_values: New function. This commit brings new code that removes duplication.
* alarms_values: Fix typo
* alarms_values: Fix missing word. This commit fixes the missing word inside the documentation.
* alarms_values: Fix order. This commit fixes the order of the alarm answer.
* alarms_values: Fixes typo and removes unnecessary variable
* alarms_values: Fixes doc. Describe all parameters present in the endpoint.
* alarms_values: Same options. This commit brings the same input pattern for alarms and alarms_values.
* alarms_values: Update swagger. This commit brings the missing information to swagger json.
* alarms_values: Update swagger. This commit brings the missing information to swagger yaml.
2020-01-06  Clean up host labels in API responses (#7616)  Vladimir Kobal
* Remove host labels from the Swagger specification * Remove host labels from the api responses
2019-12-16  Labels issues (#7515)  Andrew Moss
Initial work on host labels from the dedicated branch. Includes work for issues #7096, #7400, #7411, #7369, #7410, #7458, #7459, #7412 and #7408 by @vlvkobal, @thiagoftsm, @cakrit and @amoss.
2019-09-03  Fix clear notification missing (#6638)  thiagoftsm
* alarm_clear: Mapping. In this PR I mapped all the steps necessary to find the solution for issue 6581.
* alarm_clear: Documentation and fixes. This commit fixes the problems that were present in Netdata and also updates the documentation of the functions and Netdata.
* alarm_clear: shell script. The original implementation did not have a shell script; here I begin to fix this.
* alarm_clear: shell script. It is necessary to verify why make is not producing the same binary as cmake and to finish the changes in the script.
* alarm_clear: adjust in health.c. I rewrote health.c to be more readable, but I discovered that the problem I had in the last few hours was due to a kernel update.
* alarm_clear: script changes. In this commit I am bringing the final version of the script that tests the alarm repetition.
* alarm_clear: script fix and remove comments. In this commit I am fixing the shellcheck errors and removing some debug messages that were present in the code while I was developing.
* alarm_clear: Format. health.c had wrong tabulation; this PR brings back the pattern of spaces as tabs for this file.
* alarm_clear: Script. The script was using killall, which is no longer present in all Linux distributions; this commit removes it and brings a new way to stop Netdata.
* alarm_clear: return to previous tabulation. I am bringing back the old tabulation here and I will create a new PR exclusively for this.
* alarm_clear: Remove comments. I am removing comments from this PR to keep the focus on the major problem.
* alarm_clear: Remove comments 2. I forgot one comment.
* alarm_clear: New variable. I am appending a new variable in the check before the rebase, because the health.c changes made in another file have a direct relationship with what I did here until now.
* alarm_clear: Fix clear repetition. With this last commit, I am bringing a new way to raise the clear alarm. With this fix it no longer repeats: it is displayed once when the alarm is cleared, and it will be displayed again if, and only if, the alarm is raised.
2019-08-23  Add alarm status counter api call (#6554)  Valentin Rakush
##### Summary
This implements a prerequisite for the requested feature #6536 (Generate an overall status badge/chart for the health of category).
##### Component Name
web/api/ health/
##### Details
Provides a new `alarm_count` API call that returns the total number of alarms for the given contexts and alarm states. The default is the total number of raised alarms, for all contexts.
2019-07-01  Repeating alarm notifications (#6309)  thiagoftsm
* Alarm_repeat merging the original! * Alarm_repeat binary tree! * Alarm_repeat binary tree finished! * Alarm_repeat move function and format string * Alarms bringing a new Binary tree * Alarms fixing the last two * Alarm_repeat useless var! * Alarm fix format and repeat alarm! * Alarm_backend steps! * Alarm_repeat stopping to test cloud! * Alarm_repeat stopping to test cloud 2! * Alarm_repeat fixing when restart!
2019-01-15  Port ACLs, Management API and Health commands (#4969)  Chris Akritidis
##### Summary
fixes #2673 fixes #2149 fixes #5017 fixes #3830 fixes #3187 fixes #5154
Implements a command API for health which will accept commands via a socket to selectively suppress health checks. Allows different ports to accept different request types (streaming, dashboard, api, registry, netdata.conf, badges, management). Removes support for multi-threaded and single-threaded web servers.
##### Component Name
health, daemon
2018-10-23  modularize the query api (#4443)  Costa Tsaousis
* modularized exporters * modularized API data queries * optimized queries * modularized API data reduction methods * modularized api queries * added new directories in makefiles * added median db query * moved all RRDR_GROUPING related to query.h * added stddev query * operational median and stddev * working simple exponential smoothing * too complex to do it right * fixed ses * fixed ses * rewrote query engine * fix double-exponential-smoothing * cleanup * fixed bug identified by @vlvkobal at rrdset_first_slot() * enable freeipmi on systems with libipmimonitoring; #4440
2018-10-15  modularized all source code (#4391)  Costa Tsaousis
* modularized all external plugins * added README.md in plugins * fixed title * fixed typo * relative link to external plugins * external plugins configuration README * added plugins link * remove plugins link * plugin names are links * added links to external plugins * removed unnecessary spacing * list to table * added language * fixed typo * list to table on internal plugins * added more documentation to internal plugins * moved python, node, and bash code and configs into the external plugins * added statsd README * fix bug with corrupting config.h every 2nd compilation * moved all config files together with their code * more documentation * diskspace info * fixed broken links in apps.plugin * added backends docs * updated plugins readme * move nc-backend.sh to backends * created daemon directory * moved all code outside src/ * fixed readme indentation * renamed plugins.d.plugin to plugins.d * updated readme * removed linux- from linux plugins * updated readme * updated readme * updated readme * updated readme * updated readme * updated readme * fixed README.md links * fixed netdata tree links * updated codacy, codeclimate and lgtm excluded paths * update CMakeLists.txt * updated automake options at top directory * libnetdata split into directories * updated READMEs * updated READMEs * updated ARL docs * updated ARL docs * moved /plugins to /collectors * moved all external plugins outside plugins.d * updated codacy, codeclimate, lgtm * updated README * updated url * updated readme * updated readme * updated readme * updated readme * moved api and web into webserver * web/api web/gui web/server * modularized webserver * removed web/gui/version.txt