path: root/database
Age | Commit message | Author
2023-07-14 | Pre release fixes (#15405) | Costa Tsaousis
2023-07-13 | Fix CodeQL alert (#15384) | Stelios Fragkakis
Fix CodeQL alert -- Multiplication result converted to larger type
2023-07-13 | Rename log_access and log_health (#15368) | Emmanuel Vasilakis
2023-07-12 | Keep health log history in seconds (#15314) | Emmanuel Vasilakis
* rebase * changes queries to delete based on when * readme changes * no need to do migration * wip, protect un-updated events from cleanup * remove index on when_key * fix query for claimed cleanup * if set less than minimum, set minimum * fix query * correct config assign
2023-07-11 | Rename log Macros (debug) (#15322) | thiagoftsm
2023-07-11 | bearer improvements (#15342) | Costa Tsaousis
2023-07-10 | Use spinlock in host and chart (#15328) | Stelios Fragkakis
* Switch alarm log lock to spinlock * Switch the alerts lock in the chart structure to spinlock * Proper lock usage
2023-07-09 | alerts_transitions outputs hostnames and items statistics (#15329) | Costa Tsaousis
* alerts_transitions outputs hostnames and items statistics * return details about the items in the database * added comments to items list and made the whole of statsd available under debug
2023-07-06 | Rename generic `error` function (#15296) | thiagoftsm
2023-07-06 | avoid memory allocations for alert transitions facets processing (#15318) | Costa Tsaousis
2023-07-06 | add add summary linking to alert instances (ati) when options=summary,values is requested (#15317) | Costa Tsaousis
2023-07-06 | fix alerts transitions sorting (#15315) | Costa Tsaousis
2023-07-06 | stale vitual hosts (#15313) | Costa Tsaousis
wrong parenthesis fixed
2023-07-06 | Code reorg and cleanup - enrichment of /api/v2 (#15294) | Costa Tsaousis
* claim script now accepts the same params as the kickstart * rewrote buildinfo to unify all methods * added cloud unavailable in cloud status * added all exporters * renamed httpd to h2o * rename ENABLE_COMPRESSION to ENABLE_LZ4 * rename global variable * rename ENABLE_HTTPS to ENABLE_OPENSSL * fix coverity-scan for openssl * add lz4 to coverity-scan * added all plugins and most of the features * added all plugins and most of the features * generalize bitmap code so that we can have any size of bitmaps * cleanup * fix compilation without protobuf * fix compilation with others allocators * fix bitmap * comprehensive bitmaps unit test * bitmap as macros * added developer mode * added system info to build info * cloud available/unavailable * added /api/v2/info * added units and ni to transitions * when showing instances and transitions, show only the instances that have transitions * cleanup * add missing quotes * add anchor to transitions * added more to build info * calculate retention per tier and expose it to /api/v2/info * added currently collected metrics * do not show space and retention when no numbers are available * fix impossible overflow * Add function for transitions and execute callback * In case of error, reset and try next dictionary entry * Fix error message * simpler logic to maintain retention per tier * /api/v2/alert_transitions * Handle case of recipient null Convert after and before to usec * Add classification, type and component * working /api/v2/alert_transitions * Fix query to properly handle context and alert name * cleanup * Add search with transition * accept transition in /api/v2/alert_transitions * totaly dynamic facets * fixed debug info * restructured facets * cleanup; removal of options=transitions * updated alert entries flags * method to exec * Return also exec run timestamp Temp table cleanup only when we don't execute with a transition * cleanup obsolete anchor parameter * Add sql_get_alert_configuration function * added options=config to alert_transitions * added /api/v2/alert_config * preliminary work for /api/v2/claim * initialize variables; do not expose expected retention if no disk space info is available; do not report aclk as initializing when not claimed * fix claim session key filename * put a newline into the session key file * more progress on claiming * final /api/v2/claim endpoint * after claiming, refresh our state at the output * Fix query to fetch config * Remove debug log * add configuration objects * add configuration objects - fixed * respect the NETDATA_DISABLE_CLOUD env variable * NETDATA_DISABLE_CLOUD env variable sets the default, but the config sets the final value * use a new claimed_id on every claiming * regenerate random key on claiming and wait for online status * ignore write() return value when writing a newline * dont show cloud status disabled when claimed_id is missing * added ctx to alert instances * cleanup config and transitions from /api/v2/alerts * fix unused variable * in /api/v2/alert_config show 1 config without an array * show alert values conditionally, by appending options=values * When storing host info if the key value is empty, store unknown * added options=summary to control when the alerts summary is shown * increased http_api_v2 to version 5 * claming random key file is now not world readable * added local-listeners binary that detects all the listening ports, their IPs and their command lines --------- Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
2023-07-04 | Change query to store host system info values (#15300) | Emmanuel Vasilakis
* change query to store host info * change define name * change rc check
2023-07-04 | Check for source field when requesting /api/v1/alarm_log (#15306) | Emmanuel Vasilakis
check for source field
2023-07-03 | Change info to netdata_log_info in sqlite_db_migration.c (#15303) | Emmanuel Vasilakis
change info to netdata_log_info
2023-07-03 | Send alert chart labels config key to cloud (#15283) | Emmanuel Vasilakis
* add chart_labels to alert_hash * store chart_labels in alert_hash * transmit to cloud
2023-07-01 | Optimizations part 3 (#15293) | Costa Tsaousis
* use madvise to speed up indexing * collect all rrddim members into a collector structure * use tier 0 virtual point for storing last stored value * reorganize key fields in rrddim * remove fgets from pluginsd and replace it with read() * properly uncork the web server sockets * Revert "reorganize key fields in rrddim" This reverts commit 2d45fa3959087e05462d387ff115a260f3a04b60. * Revert "use tier 0 virtual point for storing last stored value" This reverts commit a576cdd377ad4778a3b8608cabbb7ea7bb19a3a8. * fix cork names * fix compilation warnings
2023-06-30 | Replace `info` macro with a less generic name (#15266) | Carlo Cabrera
2023-06-29 | use stat() instead of lstat() (#15287) | Costa Tsaousis
2023-06-29 | Misc alert fixes (#15274) | Emmanuel Vasilakis
* rebase * proper pointer
2023-06-29 | Optimizations part 2 (#15280) | Costa Tsaousis
* make all pluginsd functions inline, instead of function pointers * dynamic MRG partitions based on the number of CPUs * report the right size of the MRG * prevent invalid read on pluginsd exit * faster service_running() check; fix compiler warnings; shutdown replication after streaming to prevent crash on shutdown * sender is now using a spinlock * rrdcontext uses spinlock * replace select() with poll() * signed calculation of threads * disable read-ahead on jnfv2 files during scan
2023-06-29 | Revert "Optimizations Part 2" (#15279) | Costa Tsaousis
Revert "Optimizations Part 2 (#15267)" This reverts commit b52a989497f68cddeeb0282f5fd650c4e373e477.
2023-06-28 | Optimizations Part 2 (#15267) | Costa Tsaousis
* make all pluginsd functions inline, instead of function pointers * dynamic MRG partitions based on the number of CPUs * report the right size of the MRG
2023-06-28 | rewrite /api/v2/alerts (#15257) | Costa Tsaousis
* rewrite /api/v2/alerts * implement searching for transition * Find transition id and issue callback * Fix parameters * call and transition filter * Search with transition as well * renames and cleanup * render flags * what if scenario for moving transitions at the top level * If transition is given, limit the query appropriately * Add alert transitions * Optimize find transition to use prepared query Drop temp table properly * enabled alert instances again * Order by when key * Order by global_id * Return last X transitions * updated field names * add ati to configurations and show all keys in debug mode * Code cleanup and optimizations * Drop temp table in case of error * Finalize temp table population statement to prevent memory leak * final changes --------- Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
2023-06-26 | use gperf for the pluginsd/streaming parser hashtable (#15251) | Costa Tsaousis
* use gperf for the pluginsd parser * simplify pluginsd_parser by removing void pointers to user * pluginsd_split_words() with inlined pluginsd_space() * quoted_string_splitter() now uses a map instead of a function for determining spaces * add stress test for pluginsd parser * optimized BITMAP256 * optimized rrdpush receiver reception * optimized rrdpush sender compression * renames and cleanup * remove wrong negation * unify handshake and disconnection reasons * use parser_find_keyword * register job names only for the current repertoire
2023-06-26 | Relax jnfv2 caching (#15224) | Costa Tsaousis
* readers should be able to recursively acquire the lock, even when there is a writer waiting * dont madvise dontneed and random * dont validate extents and metrics on jnfv2 * dont validate crc * Delay journal metric check * added MRG stress test --------- Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
2023-06-23 | Fix coverity 393183 & 393182 (#15234) | Emmanuel Vasilakis
fix coverity 393183 393182
2023-06-22 | New alerts endpoint (#15232) | Stelios Fragkakis
* alerts / alerts_log v2 * Add global_id to ae Populate entries with global id * Remove transition id from template Change history to instances * Link ae to rc in all cases Code cleanup
2023-06-22 | Create index for health log migration (#15233) | Stelios Fragkakis
Create health_log_id index
2023-06-21 | /api/v2 improvements (#15227) | Costa Tsaousis
* readers should be able to recursively acquire the lock, even when there is a writer waiting * added health section into nodes * uniformity of nodes * nodes instances should not return node info; http_api_v2 capability should be version 4 everywhere * added /api/v2/versions * added /api/v2/functions * /api/v2/version should be neat
2023-06-21 | Use a single health log table (#15157) | Emmanuel Vasilakis
* move old health log tables to one * change table in sqlite_health * remove check for off period of agent * changes in aclk_alert * fixes * add new field insert_mark_timestamp * cleanup * remove hostname, create the health log table during sqlite init * create the health_log during migration * move source from health_log to alert_hash. Remove class, component and type field from health_log * Register now_usec sqlite function * use global_id instead of insert_mark_timestamp. Use function now_usec to populate it * create functions earlier to have them during migration * small unit test fix * create additional health_log_detail table. Do the insert of an alert event on both * do the update on health_log_detail * change more queries * more indexes, fix inject removed * change last executed and select health log queries * random uuid for sqlite * do migration from old tables * queries to send alerts to cloud * cleanup queries * get an alarm id from db if not found in memory * small fix on query * add info when migration completes * dont pick health_log_detail during migration * check proper old health_log table * safer migration * proper log sent alerts. small fix in claimed cleanup * cleanups * extra check for cleanup * also get an alarm_event_id from sql * check for empty source * remove cleanup of main health log table --------- Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
2023-06-20 | Fix /api/v2/contexts,nodes,nodes_instances,q before match (#15223) | Costa Tsaousis
* readers should be able to recursively acquire the lock, even when there is a writer waiting * in /api/v2/contexts/nodes/nodes_instances/q calls, when the context is collected, before should be matched against now, not the latest cached retention
2023-06-19 | Obvious memory reductions (#15204) | Costa Tsaousis
* remove rd->update_every * reduce amount of memory for RRDDIM * reorgnize rrddim->db entries * optimize rrdset and statsd * optimize dictionaries * RW_SPINLOCK for dictionaries * fix codeql warning * rw_spinlock improvements * remove obsolete assertion * fix crash on health_alarm_log_process() * use RW_SPINLOCK for AVL trees * add RW_SPINLOCK read/write trylock * pgc and mrg now use rw_spinlocks; cache line optimizations for mrg * thread tag of dbegnine init * append created datafile, lockless * make DOUBLE_LINKED_LIST_APPEND_ITEM_UNSAFE friendly for lockless use * thread cancelability in spinlocks; optimize thread cancelability management * introduce a JudyL to index datafiles and use it during queries to quickly find the relevant files * use the last timestamp of each journal file for indexing * when the previous cannot be found, start from the beginning * add more stats to PDC to trace routing easier * rename spinlock functions * fix for spinlock renames * revert statsd socket statistics to size_t * turn fatal into internal_fatal() * show candidates always * show connected status and connection attempts
2023-06-19 | /api/v2/nodes and streaming function (#15168) | Costa Tsaousis
* dummy streaming function * expose global functions upstream * separate function for pushing global functions * add missing conditions * allow streaming function to run async * started internal API for functions * cache host retention and expose it to /api/v2/nodes * internal API for function table fields; more progress on streaming status * abstracted and unified rrdhost status * port old coverity warning fix - although it is not needed * add ML information to rrdhost status * add ML capability to streaming to signal the transmission of ML information; added ML information to host status * protect host->receiver * count metrics and instances per host * exposed all inbound and outbound streaming * fix for ML status and dependency of DATA_WITH_ML to INTERPOLATED, not IEEE754 * update ML dummy * added all fields * added streaming group by and cleaned up accepted values by cloud * removed type * Revert "removed type" This reverts commit faae4177e603d4f85b7433f33f92ef3ccd23976e. * added context to db summary * new /api/v2/nodes schema * added ML type * change default function charts * log to trace new capa * add more debug * removed debugging code * retry on receive interrupted read; respect sender reconnect delay in all cases * set disconnected host flag and manipulate localhost child count atomically, inside set/clear receiver * fix infinite loop * send_to_plugin() now has a spinlock to ensure that only 1 thread is writing to the plugin/child at the same time * global cloud_status() call * cloud should be a section, since it will contain error information * put cloud capabilities into cloud * aclk status in /api/v2 agents sections * keep aclk_connection_counter * updates on /api/v2/nodes * final /api/v2/nodes and addition of /api/v2/nodes_instances * parametrize all /api/v2/xxx output to control which info is outputed per endpoint * always accept nodes selector * st needs to be per instance, not per node * fix merging of contexts; fix cups plugin priorities * add after and before parameters to /api/v2/contexts/nodes/nodes_instances/q * give each libuv worker a unique id * aclk http_api_v2 version 4
2023-06-19 | Add two functions that allow someone to start/stop ML. (#15185) | vkalintiris
* Add two functions that allow someone to start/stop ML. * Shutdown ML after stopping collector services * Remove unnecessary mutex from ml charts. There's already a spinlock that protects the chart when a someone calls rrdset_done(). * Use a lightweight spinlock instead of a mutext for ML dimensions.
2023-06-15 | sqlite_health.c: remove `uuid.h` include (#15195) | Nanda H Krishna
2023-06-08 | api v2 nodes for streaming statuses (#15162) | Costa Tsaousis
* api v2 nodes for streaming statuses * remove test * move parts of the output * in api/v2/data return 5 values per point when aggregation=percentage and raw option is given; return final values when aggregation=percentage is not the final grouping
2023-06-07 | Re-write of SSL support in Netdata; restoration of SIGCHLD; detection of stale plugins; streaming improvements (#15113) | Costa Tsaousis
* add information about streaming connections to /api/v2/nodes; reset defer time when sender or receivers connect or disconnect * make each streaming destination respect its SSL settings * to not send SSL traffic over non-SSL connection * keep track of outgoing streaming connection attempts * retry SSL reads when SSL_read() returns SSL_ERROR_WANT_READ * Revert "retry SSL reads when SSL_read() returns SSL_ERROR_WANT_READ" This reverts commit 14c858677c6f2d3b08c94f298e2f45ecdb74c801. * cleanup SSL connections properly * initialize SSL in rpt before takeover * sender should free SSL when talking to a non-SSL destination * do not shutdown SSL when receiver exits * restore operation of SIGCHLD when the reaper is not enabled * create an fgets function that checks for data and times out * work on error handling of plugins exiting * remove newlines from logs * global call to waitid(), caching the result for netdata_pclose() to process * receiver tid * parser timeouts in 2 minutes instead of 10 * fix crash when UUID is NULL in SQLite * abstract sqlite3 parsing for uuid and text * write proper ssl errors on read and write * fix for SSL_ERROR_WANT_RETRY_VERIFY * SSL WANT per function * unified SSL error logging * fix compilation warning * additional logging about parser cleanup * streaming parser should call the pluginsd parser cleanup * SSL error handling work * SSL initialization unification * check for pending data when receiving SSL response with timeout * macro to check if an SSL connection has been established * remove SSL_pending() * check for SSL macros * use SSL_peek() to find if there is a response * SSL renames * more SSL renames & cleanup * rrdpush ssl connection function * abstract all SSL functions into security.c * keep track of SSL connections and always attempt to use SSL read/write when on SSL connection * signal openssl to skip certificate validation when configured to do so * better SSL error handling and logging * SSL code cleanup * SSL retry on SSL_connect and SSL_accept * SSL provide default return value for old compilers * SSL read/write functions emulate system read/write functions * fix receive/send timeout and switch from SSL_peek() to SSL_pending() * remove SSL_pending() * removed sender auto-retry and debug info for initial recevier response * ssl skip certificate verification config for web server * ssl errors log ip and port of the peer * keep ssl with web_client for its whole lifetime * thread safe socket peers to text * use error_limit() for common ssl errors * cleanup * more cleanup * coverity fixes * ssl error logs include both local and remote ip/port info * remove obsolete code
2023-06-06 | Check null transition id and config hash (#15147) | Stelios Fragkakis
* fix crash when UUID is NULL in SQLite * abstract sqlite3 parsing for uuid and text --------- Co-authored-by: Costa Tsaousis <costa@netdata.cloud>
2023-06-05 | Generate, store and transmit a unique alert event_hash_id (#15111) | Emmanuel Vasilakis
* generate and store an event_hash_id * transmit to cloud * transmit to the cloud
2023-06-01 | fix: allow square brackets in label value (#15131) | Ilya Mashchenko
2023-05-31 | Percentage of group aggregatable at cloud - fixed for backwards compatibility (#15126) | Costa Tsaousis
* percentage of group is now aggregatable at cloud across multiple nodes * do not break backwards compatibility with percentage-of-instance * calculate the percentage when percentage-of-instance is requested * increase capability version
2023-05-29 | Only queue an alert to the cloud when it's inserted (#15110) | Emmanuel Vasilakis
only queue an alert to cloud when its inserted
2023-05-26 | fix the units when returning percentage of a group (#15105) | Costa Tsaousis
2023-05-24 | Release buffer in case of error -- CID 385075 (#15090) | Stelios Fragkakis
2023-05-23 | Better cleanup of health log table (#15045) | Emmanuel Vasilakis
2023-05-22 | Use chart labels to filter alerts (#14982) | Emmanuel Vasilakis
* use chart labels to filter alerts * add entry to readme * support chart_label=val val2 val3 * docs updates * more docs * use rc not rt
2023-05-19 | Simplify loop in alert checkpoint (#15065) | Emmanuel Vasilakis
simplify loop in alert checkpoint