summaryrefslogtreecommitdiffstats
path: root/claim
AgeCommit message (Collapse)Author
2024-02-01Create a top-level directory to contain source code. (#16896)vkalintiris
* Move ML under src * Move spwan under src * Move cli/ under src/ * move registry/ under src/ * move streaming/ under src/ * Move claim under src. Update docs * Move database/ under src/ * Move libnetdata/ under src/ * Update references to libnetdata * Fix logsmanagement includes * Update generated script path.
2024-01-31add the CLOEXEC flag to all sockets and files (#16881)Costa Tsaousis
* add the CLOEXEC flag to all sockets and files * add network-viewer to apps.plugin; min update frequency 5 seconds
2024-01-08change get kickstart url to https://get.netdata.cloud/kickstart.sh (#16738)Ilya Mashchenko
2023-12-15Fix small coverity issue (#16616)Stelios Fragkakis
Fix CID 405078: Waiting while holding a lock
2023-12-13CMake build system. (#15996)vkalintiris
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Austin S. Hemmelgarn <austin@netdata.cloud> Co-authored-by: Tasos Katsoulas <12612986+tkatsoulas@users.noreply.github.com> Co-authored-by: Emmanuel Vasilakis <mrzammler@mm.st> Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com> Co-authored-by: netdatabot <bot@netdata.cloud> Co-authored-by: Ilya Mashchenko <ilya@netdata.cloud>
2023-12-12code cleanup (#16542)Costa Tsaousis
fixed minor code cleanup warnings
2023-11-22New logging layer (#16357)Costa Tsaousis
* cleanup of logging - wip * first working iteration * add errno annotator * replace old logging functions with netdata_logger() * cleanup * update error_limit * fix remanining error_limit references * work on fatal() * started working on structured logs * full cleanup * default logging to files; fix all plugins initialization * fix formatting of numbers * cleanup and reorg * fix coverity issues * cleanup obsolete code * fix formatting of numbers * fix log rotation * fix for older systems * add detection of systemd journal via stderr * finished on access.log * remove left-over transport * do not add empty fields to the logs * journal get compact uuids; X-Transaction-ID header is added in web responses * allow compiling on systems without memfd sealing * added libnetdata/uuid directory * move datetime formatters to libnetdata * add missing files * link the makefiles in libnetdata * added uuid_parse_flexi() to parse UUIDs with and without hyphens; the web server now read X-Transaction-ID and uses it for functions and web responses * added stream receiver, sender, proc plugin and pluginsd log stack * iso8601 advanced usage; line_splitter module in libnetdata; code cleanup * add message ids to streaming inbound and outbound connections * cleanup line_splitter between lines to avoid logging garbage; when killing children, kill them with SIGABRT if internal checks is enabled * send SIGABRT to external plugins only if we are not shutting down * fix cross cleanup in pluginsd parser * fatal when there is a stack error in logs * compile netdata with -fexceptions * do not kill external plugins with SIGABRT * metasync info logs to debug level * added severity to logs * added json output; added options per log output; added documentation; fixed issues mentioned * allow memfd only on linux * moved journal low level functions to journal.c/h * move health logs to daemon.log with proper priorities * fixed a couple of bugs; health log in journal * updated docs * systemd-cat-native command to push structured logs to journal from the command line * fix makefiles * restored NETDATA_LOG_SEVERITY_LEVEL * fix makefiles * systemd-cat-native can also work as the logger of Netdata scripts * do not require a socket to systemd-journal to log-as-netdata * alarm notify logs in native format * properly compare log ids * fatals log alerts; alarm-notify.sh working * fix overflow warning * alarm-notify.sh now logs the request (command line) * anotate external plugins logs with the function cmd they run * added context, component and type to alarm-notify.sh; shell sanitization removes control character and characters that may be expanded by bash * reformatted alarm-notify logs * unify cgroup-network-helper.sh * added quotes around params * charts.d.plugin switched logging to journal native * quotes for logfmt * unify the status codes of streaming receivers and senders * alarm-notify: dont log anything, if there is nothing to do * all external plugins log to stderr when running outside netdata; alarm-notify now shows an error when notifications menthod are needed but are not available * migrate cgroup-name.sh to new logging * systemd-cat-native now supports messages with newlines * socket.c logs use priority * cleanup log field types * inherit the systemd set INVOCATION_ID if found * allow systemd-cat-native to send messages to a systemd-journal-remote URL * log2journal command that can convert structured logs to journal export format * various fixes and documentation of log2journal * updated log2journal docs * updated log2journal docs * updated documentation of fields * allow compiling without libcurl * do not use socket as format string * added version information to newly added tools * updated documentation and help messages * fix the namespace socket path * print errno with error * do not timeout * updated docs * updated docs * updated docs * log2journal updated docs and params * when talking to a remote journal, systemd-cat-native batches the messages * enable lz4 compression for systemd-cat-native when sending messages to a systemd-journal-remote * Revert "enable lz4 compression for systemd-cat-native when sending messages to a systemd-journal-remote" This reverts commit b079d53c11f6687cd64d804fdd7b24c0492bf245. * note about uncompressed traffic * log2journal: code reorg and cleanup to make modular * finished rewriting log2journal * more comments * rewriting rules support * increased limits * updated docs * updated docs * fix old log call * use journal only when stderr is connected to journal * update netdata.spec for libcurl, libpcre2 and log2journal * pcre2-devel * do not require pcre2 in centos < 8, amazonlinux < 2023, open suse * log2journal only on systems pcre2 is available * ignore log2journal in .gitignore * avoid log2journal on centos 7, amazonlinux 2 and opensuse * add pcre2-8 to static build * undo last commit * Bundle to static Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud> * Add build deps for deb packages Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud> * Add dependencies; build from source Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud> * Test build for amazon linux and centos expect to fail for suse Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud> * fix minor oversight Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud> * Reorg code * Add the install from source (deps) as a TODO * Not enable the build on suse ecosystem Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud> --------- Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud> Co-authored-by: Tasos Katsoulas <tasos@netdata.cloud>
2023-10-30claim.sh: use echo instead of /bin/echo (#16300)Ilya Mashchenko
2023-08-31Fix build with --disable-https (#15395)Emmanuel Vasilakis
rebased
2023-08-03systemd-journal plugin (#15363)Costa Tsaousis
2023-08-03Revert "Refactor RRD code. (#15423)" (#15723)vkalintiris
This reverts commit 440bd51e08fdfa2a4daa191fb68643456028a753. dbengine was still being used for non-zero tiers even on non-dbengine modes.
2023-07-26detect the path the netdata-claim.sh script is in (#15556)Costa Tsaousis
2023-07-26Refactor RRD code. (#15423)vkalintiris
* Storage engine. * Host indexes to rrdb * Move globals to rrdb * Move storage_tiers_backfill to rrdb * default_rrd_update_every to rrdb * default_rrd_history_entries to rrdb * gap_when_lost_iterations_above to rrdb * rrdset_free_obsolete_time_s to rrdb * libuv_worker_threads to rrdb * ieee754_doubles to rrdb * rrdhost_free_orphan_time_s to rrdb * rrd_rwlock to rrdb * localhost to rrdb * rm extern from func decls * mv rrd macro under rrd.h * default_rrdeng_page_cache_mb to rrdb * default_rrdeng_extent_cache_mb to rrdb * db_engine_journal_check to rrdb * default_rrdeng_disk_quota_mb to rrdb * default_multidb_disk_quota_mb to rrdb * multidb_ctx to rrdb * page_type_size to rrdb * tier_page_size to rrdb * No storage_engine_id in rrdim functions * storage_engine_id is provided by st * Update to fix merge conflict. * Update field name * Remove unnecessary macros from rrd.h * Rm unused type decls * Rm duplicate func decls * make internal function static * Make the rest of public dbengine funcs accept a storage_instance. * No more rrdengine_instance :) * rm rrdset_debug from rrd.h * Use rrdb to access globals in ML and ACLK Missed due to not having the submodules in the worktree. * rm total_number * rm RRDVAR_TYPE_TOTAL * rm unused inline * Rm names from typedef'd enums * rm unused header include * Move include * Rm unused header include * s/rrdhost_find_or_create/rrdhost_get_or_create/g * s/find_host_by_node_id/rrdhost_find_by_node_id/ Also, remove duplicate definition in rrdcontext.c * rm macro used only once * rm macro used only once * Reduce rrd.h api by moving funcs into a collector specific utils header * Remove unused func * Move parser specific function out of rrd.h * return storage_number instead of void pointer * move code related to rrd initialization out of rrdhost.c * Remove tier_grouping from rrdim_tier Saves 8 * storage_tiers bytes per dimension. * Fix rebase * s/rrd_update_every/update_every/ * Mark functions as static and constify args * Add license notes and file to build systems. * Remove remaining non-log/config mentions of memory mode * Move rrdlabels api to separate file. Also, move localhost functions that loads labels outside of database/ and into daemon/ * Remove function decl in rrd.h * merge rrdhost_cache_dir_for_rrdset_alloc into rrdset_cache_dir * Do not expose internal function from rrd.h * Rm NETDATA_RRD_INTERNALS Only one function decl is covered. We have more database internal functions that we currently expose for no good reason. These will be placed in a separate internal header in follow up PRs. * Add license note * Include libnetdata.h instead of aral.h * Use rrdb to access localhost * Fix builds without dbengine * Add header to build system files * Add rrdlabels.h to build systems * Move func def from rrd.h to rrdhost.c * Fix macos build * Rm non-existing function * Rebase master * Define buffer length macro in ad_charts. * Fix FreeBSD builds. * Mark functions static * Rm func decls without definitions * Rebase master * Rebase master * Properly initialize value of storage tiers. * Fix build after rebase.
2023-07-26change api.netdata.cloud to app.netdata.cloud (#15538)Ilya Mashchenko
2023-07-25wait for node_id while claiming (#15526)Costa Tsaousis
2023-07-12Fix coverity issues (#15375)Stelios Fragkakis
* parser cannot be null -- CID 394449 Dereference before null check (REVERSE_INULL) * Fix warning - ignoring return value of ‘write’ declared with attribute ‘warn_unused_result’ [-Wunused-result]
2023-07-11bearer improvements (#15342)Costa Tsaousis
2023-07-07Fix coverity issue 394862 - Argument cannot be negative (#15324)Stelios Fragkakis
* Fix coverity 394862 - Argument cannot be negative * If GUID write fails, no point in trying to add newline
2023-07-06Rename generic `error` function (#15296)thiagoftsm
2023-07-06Code reorg and cleanup - enrichment of /api/v2 (#15294)Costa Tsaousis
* claim script now accepts the same params as the kickstart * rewrote buildinfo to unify all methods * added cloud unavailable in cloud status * added all exporters * renamed httpd to h2o * rename ENABLE_COMPRESSION to ENABLE_LZ4 * rename global variable * rename ENABLE_HTTPS to ENABLE_OPENSSL * fix coverity-scan for openssl * add lz4 to coverity-scan * added all plugins and most of the features * added all plugins and most of the features * generalize bitmap code so that we can have any size of bitmaps * cleanup * fix compilation without protobuf * fix compilation with others allocators * fix bitmap * comprehensive bitmaps unit test * bitmap as macros * added developer mode * added system info to build info * cloud available/unavailable * added /api/v2/info * added units and ni to transitions * when showing instances and transitions, show only the instances that have transitions * cleanup * add missing quotes * add anchor to transitions * added more to build info * calculate retention per tier and expose it to /api/v2/info * added currently collected metrics * do not show space and retention when no numbers are available * fix impossible overflow * Add function for transitions and execute callback * In case of error, reset and try next dictionary entry * Fix error message * simpler logic to maintain retention per tier * /api/v2/alert_transitions * Handle case of recipient null Convert after and before to usec * Add classification, type and component * working /api/v2/alert_transitions * Fix query to properly handle context and alert name * cleanup * Add search with transition * accept transition in /api/v2/alert_transitions * totaly dynamic facets * fixed debug info * restructured facets * cleanup; removal of options=transitions * updated alert entries flags * method to exec * Return also exec run timestamp Temp table cleanup only when we don't execute with a transition * cleanup obsolete anchor parameter * Add sql_get_alert_configuration function * added options=config to alert_transitions * added /api/v2/alert_config * preliminary work for /api/v2/claim * initialize variables; do not expose expected retention if no disk space info is available; do not report aclk as initializing when not claimed * fix claim session key filename * put a newline into the session key file * more progress on claiming * final /api/v2/claim endpoint * after claiming, refresh our state at the output * Fix query to fetch config * Remove debug log * add configuration objects * add configuration objects - fixed * respect the NETDATA_DISABLE_CLOUD env variable * NETDATA_DISABLE_CLOUD env variable sets the default, but the config sets the final value * use a new claimed_id on every claiming * regenerate random key on claiming and wait for online status * ignore write() return value when writing a newline * dont show cloud status disabled when claimed_id is missing * added ctx to alert instances * cleanup config and transitions from /api/v2/alerts * fix unused variable * in /api/v2/alert_config show 1 config without an array * show alert values conditionally, by appending options=values * When storing host info if the key value is empty, store unknown * added options=summary to control when the alerts summary is shown * increased http_api_v2 to version 5 * claming random key file is now not world readable * added local-listeners binary that detects all the listening ports, their IPs and their command lines --------- Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
2023-06-30Replace `info` macro with a less generic name (#15266)Carlo Cabrera
2023-06-30merged spaces, war rooms and invite your team to one place (#15289)Hugo Valente
* merged spaces, war rooms and invite your team to one place * Update docs/cloud/manage/organize-your-infrastrucutre-invite-your-team.md Co-authored-by: Fotis Voutsas <fotis@netdata.cloud> * Update docs/cloud/manage/organize-your-infrastrucutre-invite-your-team.md Co-authored-by: Fotis Voutsas <fotis@netdata.cloud> * fix from review comments * delete invite your team file * Apply suggestions from code review --------- Co-authored-by: Fotis Voutsas <fotis@netdata.cloud> Co-authored-by: Ilya Mashchenko <ilya@netdata.cloud>
2023-06-19/api/v2/nodes and streaming function (#15168)Costa Tsaousis
* dummy streaming function * expose global functions upstream * separate function for pushing global functions * add missing conditions * allow streaming function to run async * started internal API for functions * cache host retention and expose it to /api/v2/nodes * internal API for function table fields; more progress on streaming status * abstracted and unified rrdhost status * port old coverity warning fix - although it is not needed * add ML information to rrdhost status * add ML capability to streaming to signal the transmission of ML information; added ML information to host status * protect host->receiver * count metrics and instances per host * exposed all inbound and outbound streaming * fix for ML status and dependency of DATA_WITH_ML to INTERPOLATED, not IEEE754 * update ML dummy * added all fields * added streaming group by and cleaned up accepted values by cloud * removed type * Revert "removed type" This reverts commit faae4177e603d4f85b7433f33f92ef3ccd23976e. * added context to db summary * new /api/v2/nodes schema * added ML type * change default function charts * log to trace new capa * add more debug * removed debugging code * retry on receive interrupted read; respect sender reconnect delay in all cases * set disconnected host flag and manipulate localhost child count atomically, inside set/clear receiver * fix infinite loop * send_to_plugin() now has a spinlock to ensure that only 1 thread is writing to the plugin/child at the same time * global cloud_status() call * cloud should be a section, since it will contain error information * put cloud capabilities into cloud * aclk status in /api/v2 agents sections * keep aclk_connection_counter * updates on /api/v2/nodes * final /api/v2/nodes and addition of /api/v2/nodes_instances * parametrize all /api/v2/xxx output to control which info is outputed per endpoint * always accept nodes selector * st needs to be per instance, not per node * fix merging of contexts; fix cups plugin priorities * add after and before parameters to /api/v2/contexts/nodes/nodes_instances/q * give each libuv worker a unique id * aclk http_api_v2 version 4
2023-04-20Replace "XYZ view" with "XYZ tab" in documentation files (#14930)Fotis Voutsas
* replace 'XYZ view' with 'XYZ tab' in documentation files * Update README.md
2023-03-29First batch of adding descriptions to the documentation (#14825)Fotis Voutsas
* first batch of adding descriptions to the documentation, and fixing inconsistencies inside the files * Apply suggestions from code review Co-authored-by: Chris Akritidis <43294513+cakrit@users.noreply.github.com> * Update docs/cloud/visualize/overview.md * Replace view with tab in overview.md --------- Co-authored-by: Chris Akritidis <43294513+cakrit@users.noreply.github.com>
2023-03-27Address Learn feedback from users (#14802)Fotis Voutsas
* Update collectors/charts.d.plugin/sensors/README.md * lint the file, and correct some mistypes * add suggestions and fix wording * cover feedback by users * fix broken link * Update invite-your-team.md
2023-03-13Add instructions for reconnecting a Docker node to another Space (#14714)Fotis Voutsas
* add reconnecting instructions for Docker * add yml notice * Update claim/README.md Co-authored-by: Ilya Mashchenko <ilya@netdata.cloud> --------- Co-authored-by: Ilya Mashchenko <ilya@netdata.cloud>
2023-02-17Reorg learn 021723 (#14556)Chris Akritidis
* Change titles of agent alert notifications * Reintroduce netdata for iot * Eliminate guides category, merge health config docs * Rename setup to configuration * Codacy fixes and move health config reference
2023-02-14Fix broken links in markdown files (#14513)Fotis Voutsas
* fix broken links in claim/README.md * delete broken link in docs/guidelines.md * fix broken links * fix broken link * fix broken links * fix broken links * fix broken links * fix broken links * fix broken links * remove broken link * fix broken link * fix broken links * fix broken links * fix broken links * fix broken link * fix linking phrasing * fix broken links batch * fix broken links second batch * fix broken links * fix broken links * fix broken links * Update COLLECTORS.md * fix broken links * fix broken links
2023-02-02Covert our documentation links to GH absolute links (#14344)Tasos Katsoulas
Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud>
2023-01-25Introduce the new Structure of the documentation (#13915)Fotis Voutsas
* Moving the cloud docs under /docs/cloud (previous location: netdata/learn/*) * Added metadata on almost every document of the old learn site for the new ingest process of learn. * Map old learn document to their best fit as topic related docs. Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud> Co-authored-by: DShreve2 <david@netdata.cloud> Co-authored-by: hugovalente-pm <hugo@netdata.cloud>
2023-01-17Store host and claim info in sqlite as soon as possible (#14263)Emmanuel Vasilakis
* store host and claim info as soon as possible * no need to set the flag * check for metasync_worker.loop
2023-01-10DBENGINE v2 (#14125)Costa Tsaousis
* count open cache pages refering to datafile * eliminate waste flush attempts * remove eliminated variable * journal v2 scanning split functions * avoid locking open cache for a long time while migrating to journal v2 * dont acquire datafile for the loop; disable thread cancelability while a query is running * work on datafile acquiring * work on datafile deletion * work on datafile deletion again * logs of dbengine should start with DBENGINE * thread specific key for queries to check if a query finishes without a finalize * page_uuid is not used anymore * Cleanup judy traversal when building new v2 Remove not needed calls to metric registry * metric is 8 bytes smaller; timestamps are protected with a spinlock; timestamps in metric are now always coherent * disable checks for invalid time-ranges * Remove type from page details * report scanning time * remove infinite loop from datafile acquire for deletion * remove infinite loop from datafile acquire for deletion again * trace query handles * properly allocate array of dimensions in replication * metrics cleanup * metrics registry uses arrayalloc * arrayalloc free should be protected by lock * use array alloc in page cache * journal v2 scanning fix * datafile reference leaking hunding * do not load metrics of future timestamps * initialize reasons * fix datafile reference leak * do not load pages that are entirely overlapped by others * expand metric retention atomically * split replication logic in initialization and execution * replication prepare ahead queries * replication prepare ahead queries fixed * fix replication workers accounting * add router active queries chart * restore accounting of pages metadata sources; cleanup replication * dont count skipped pages as unroutable * notes on services shutdown * do not migrate to journal v2 too early, while it has pending dirty pages in the main cache for the specific journal file * do not add pages we dont need to pdc * time in range re-work to provide info about past and future matches * finner control on the pages selected for processing; accounting of page related issues * fix invalid reference to handle->page * eliminate data collection handle of pg_lookup_next * accounting for queries with gaps * query preprocessing the same way the processing is done; cache now supports all operations on Judy * dynamic libuv workers based on number of processors; minimum libuv workers 8; replication query init ahead uses libuv workers - reserved ones (3) * get into pdc all matching pages from main cache and open cache; do not do v2 scan if main cache and open cache can satisfy the query * finner gaps calculation; accounting of overlapping pages in queries * fix gaps accounting * move datafile deletion to worker thread * tune libuv workers and thread stack size * stop netdata threads gradually * run indexing together with cache flush/evict * more work on clean shutdown * limit the number of pages to evict per run * do not lock the clean queue for accesses if it is not possible at that time - the page will be moved to the back of the list during eviction * economies on flags for smaller page footprint; cleanup and renames * eviction moves referenced pages to the end of the queue * use murmur hash for indexing partition * murmur should be static * use more indexing partitions * revert number of partitions to number of cpus * cancel threads first, then stop services * revert default thread stack size * dont execute replication requests of disconnected senders * wait more time for services that are exiting gradually * fixed last commit * finer control on page selection algorithm * default stacksize of 1MB * fix formatting * fix worker utilization going crazy when the number is rotating * avoid buffer full due to replication preprocessing of requests * support query priorities * add count of spins in spinlock when compiled with netdata internal checks * remove prioritization from dbengine queries; cache now uses mutexes for the queues * hot pages are now in sections judy arrays, like dirty * align replication queries to optimal page size * during flushing add to clean and evict in batches * Revert "during flushing add to clean and evict in batches" This reverts commit 8fb2b69d068499eacea6de8291c336e5e9f197c7. * dont lock clean while evicting pages during flushing * Revert "dont lock clean while evicting pages during flushing" This reverts commit d6c82b5f40aeba86fc7aead062fab1b819ba58b3. * Revert "Revert "during flushing add to clean and evict in batches"" This reverts commit ca7a187537fb8f743992700427e13042561211ec. * dont cross locks during flushing, for the fastest flushes possible * low-priority queries load pages synchronously * Revert "low-priority queries load pages synchronously" This reverts commit 1ef2662ddcd20fe5842b856c716df134c42d1dc7. * cache uses spinlock again * during flushing, dont lock the clean queue at all; each item is added atomically * do smaller eviction runs * evict one page at a time to minimize lock contention on the clean queue * fix eviction statistics * fix last commit * plain should be main cache * event loop cleanup; evictions and flushes can now happen concurrently * run flush and evictions from tier0 only * remove not needed variables * flushing open cache is not needed; flushing protection is irrelevant since flushing is global for all tiers; added protection to datafiles so that only one flusher can run per datafile at any given time * added worker jobs in timer to find the slow part of it * support fast eviction of pages when all_of_them is set * revert default thread stack size * bypass event loop for dispatching read extent commands to workers - send them directly * Revert "bypass event loop for dispatching read extent commands to workers - send them directly" This reverts commit 2c08bc5bab12881ae33bc73ce5dea03dfc4e1fce. * cache work requests * minimize memory operations during flushing; caching of extent_io_descriptors and page_descriptors * publish flushed pages to open cache in the thread pool * prevent eventloop requests from getting stacked in the event loop * single threaded dbengine controller; support priorities for all queries; major cleanup and restructuring of rrdengine.c * more rrdengine.c cleanup * enable db rotation * do not log when there is a filter * do not run multiple migration to journal v2 * load all extents async * fix wrong paste * report opcodes waiting, works dispatched, works executing * cleanup event loop memory every 10 minutes * dont dispatch more work requests than the number of threads available * use the dispatched counter instead of the executing counter to check if the worker thread pool is full * remove UV_RUN_NOWAIT * replication to fill the queues * caching of extent buffers; code cleanup * caching of pdc and pd; rework on journal v2 indexing, datafile creation, database rotation * single transaction wal * synchronous flushing * first cancel the threads, then signal them to exit * caching of rrdeng query handles; added priority to query target; health is now low prio * add priority to the missing points; do not allow critical priority in queries * offload query preparation and routing to libuv thread pool * updated timing charts for the offloaded query preparation * caching of WALs * accounting for struct caches (buffers); do not load extents with invalid sizes * protection against memory booming during replication due to the optimal alignment of pages; sender thread buffer is now also reset when the circular buffer is reset * also check if the expanded before is not the chart later updated time * also check if the expanded before is not after the wall clock time of when the query started * Remove unused variable * replication to queue less queries; cleanup of internal fatals * Mark dimension to be updated async * caching of extent_page_details_list (epdl) and datafile_extent_offset_list (deol) * disable pgc stress test, under an ifdef * disable mrg stress test under an ifdef * Mark chart and host labels, host info for async check and store in the database * dictionary items use arrayalloc * cache section pages structure is allocated with arrayalloc * Add function to wakeup the aclk query threads and check for exit Register function to be called during shutdown after signaling the service to exit * parallel preparation of all dimensions of queries * be more sensitive to enable streaming after replication * atomically finish chart replication * fix last commit * fix last commit again * fix last commit again again * fix last commit again again again * unify the normalization of retention calculation for collected charts; do not enable streaming if more than 60 points are to be transferred; eliminate an allocation during replication * do not cancel start streaming; use high priority queries when we have locked chart data collection * prevent starvation on opcodes execution, by allowing 2% of the requests to be re-ordered * opcode now uses 2 spinlocks one for the caching of allocations and one for the waiting queue * Remove check locks and NETDATA_VERIFY_LOCKS as it is not needed anymore * Fix bad memory allocation / cleanup * Cleanup ACLK sync initialization (part 1) * Don't update metric registry during shutdown (part 1) * Prevent crash when dashboard is refreshed and host goes away * Mark ctx that is shutting down. Test not adding flushed pages to open cache as hot if we are shutting down * make ML work * Fix compile without NETDATA_INTERNAL_CHECKS * shutdown each ctx independently * fix completion of quiesce * do not update shared ML charts * Create ML charts on child hosts. When a parent runs a ML for a child, the relevant-ML charts should be created on the child host. These charts should use the parent's hostname to differentiate multiple parents that might run ML for a child. The only exception to this rule is the training/prediction resource usage charts. These are created on the localhost of the parent host, because they provide information specific to said host. * check new ml code * first save the database, then free all memory * dbengine prep exit before freeing all memory; fixed deadlock in cache hot to dirty; added missing check to query engine about metrics without any data in the db * Cleanup metadata thread (part 2) * increase refcount before dispatching prep command * Do not try to stop anomaly detection threads twice. A separate function call has been added to stop anomaly detection threads. This commit removes the left over function calls that were made internally when a host was being created/destroyed. * Remove allocations when smoothing samples buffer The number of dims per sample is always 1, ie. we are training and predicting only individual dimensions. * set the orphan flag when loading archived hosts * track worker dispatch callbacks and threadpool worker init * make ML threads joinable; mark ctx having flushing in progress as early as possible * fix allocation counter * Cleanup metadata thread (part 3) * Cleanup metadata thread (part 4) * Skip metadata host scan when running unittest * unittest support during init * dont use all the libuv threads for queries * break an infinite loop when sleep_usec() is interrupted * ml prediction is a collector for several charts * sleep_usec() now makes sure it will never loop if it passes the time expected; sleep_usec() now uses nanosleep() because clock_nanosleep() misses signals on netdata exit * worker_unregister() in netdata threads cleanup * moved pdc/epdl/deol/extent_buffer related code to pdc.c and pdc.h * fixed ML issues * removed engine2 directory * added dbengine2 files in CMakeLists.txt * move query plan data to query target, so that they can be exposed by in jsonwrap * uniform definition of query plan according to the other query target members * event_loop should be in daemon, not libnetdata * metric_retention_by_uuid() is now part of the storage engine abstraction * unify time_t variables to have the suffix _s (meaning: seconds) * old dbengine statistics become "dbengine io" * do not enable ML resource usage charts by default * unify ml chart families, plugins and modules * cleanup query plans from query target * cleanup all extent buffers * added debug info for rrddim slot to time * rrddim now does proper gap management * full rewrite of the mem modes * use library functions for madvise * use CHECKSUM_SZ for the checksum size * fix coverity warning about the impossible case of returning a page that is entirely in the past of the query * fix dbengine shutdown * keep the old datafile lock until a new datafile has been created, to avoid creating multiple datafiles concurrently * fine tune cache evictions * dont initialize health if the health service is not running - prevent crash on shutdown while children get connected * rename AS threads to ACLK[hostname] * prevent re-use of uninitialized memory in queries * use JulyL instead of JudyL for PDC operations - to test it first * add also JulyL files * fix July memory accounting * disable July for PDC (use Judy) * use the function to remove datafiles from linked list * fix july and event_loop * add july to libnetdata subdirs * rename time_t variables that end in _t to end in _s * replicate when there is a gap at the beginning of the replication period * reset postponing of sender connections when a receiver is connected * Adjust update every properly * fix replication infinite loop due to last change * packed enums in rrd.h and cleanup of obsolete rrd structure members * prevent deadlock in replication: replication_recalculate_buffer_used_ratio_unsafe() deadlocking with replication_sender_delete_pending_requests() * void unused variable * void unused variables * fix indentation * entries_by_time calculation in VD was wrong; restored internal checks for checking future timestamps * macros to caclulate page entries by time and size * prevent statsd cleanup crash on exit * cleanup health thread related variables Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com> Co-authored-by: vkalintiris <vasilis@netdata.cloud>
2023-01-04Finish renaming the `--install` option to `--install-prefix`. (#13881)Austin S. Hemmelgarn
* Finish renaming the `--install` option to `--install-prefix`. * Fix remaining references to `--install` option.
2022-10-24app to api netdata cloud (#13856)Timotej S
2022-10-16Add a thread to asynchronously process metadata updates (#13783)Stelios Fragkakis
* Remove old metalog text fle processing * Add metadata event loop * Move functions from sqlite_functions.c to sqlite_metadata.c Queue updates to the metadata event loop Migration to remove unused tables Cleanup unused functions * Queue chart labels to metadata * Store chart labels to metadata * During shutdown, run full speed * Add shutdown prepare Handle SHUTDOWN in the cmd queue function Add worker thread to handle host/chart/dimension metadata doing dictionary traversals * Remove unused RRDIM_FLAG_ACLK Add flags to trigger host/chart/dimension metadata processing * Incremental processing of chart metadata writes * Store host labels * Remove redundant return statements * Change unit tests / cleanup * Fix rescheduling * Schedule chart labels update by setting the RRDSET_FLAG_METADATA_UPDATE flag * Queue commands to update metadata for dimension and host labels * Make sure we do a final scan to store metadata during shutdown (if needed) * Remove unused structures Adjust queue size since we do batch processing of updates without queueing individual messages Remove pragma mmap for now Fix memory leak during sqlite unittest (minor) * Dont update if we are in archive mode * Cleanup * Build entire message payload and store * Initialize worker completion properly * Properly skip host check for pending metadata updates * Report bind param failures Add worker request inside the data payload Initialize variables to silence warnings Rebase on master * Report the chart id (not the dimension) and the dimension id when storing a dimension * Compilation warnings in 32bit * Add DEFINE for the queries * Remove commented out code * * Remove items parameter from unitest * Remove commented out code * sqlite_metadata.h contains only public items * Use sleep_usec instead of usleep * Rename metadata_database_init_cmd_queue to metadata_init_cmd_queue * Rename metadata_database_enq_cmd_noblock to metadata_enq_cmd_noblock
2022-10-05Allow netdata plugins to expose functions for querying more information ↵Costa Tsaousis
about specific charts (#13720) * function renames and code cleanup in popen.c; no actual code changes * netdata popen() now opens both child process stdin and stdout and returns FILE * for both * pass both input and output to parser structures * updated rrdset to call custom functions * RRDSET FUNCTION leading calls for both sync and async operation * put RRDSET functions to a separate file * added format and timeout at function definition * support for synchronous (internal plugins) and asynchronous (external plugins and children) functions * /api/v1/function endpoint * functions are now attached to the host and there is a dictionary view per chart * functions implemented at plugins.d * remove the defer until keyword hook from plugins.d when it is done * stream sender implementation of functions * sanitization of all functions so that certain characters are only allowed * strictier sanitization * common max size * 1st working plugins.d example * always init inflight dictionary * properly destroy dictionaries to avoid parallel insertion of items * add more debugging on disconnection reasons * add more debugging on disconnection reasons again * streaming receiver respects newlines * dont use the same fp for both streaming receive and send * dont free dbengine memory with internal checks * make sender proceed in the buffer * added timing info and garbage collection at plugins.d * added info about routing nodes * added info about routing nodes with delay * added more info about delays * added more info about delays again * signal sending thread to wake up * streaming version labeling and commented code to support capabilities * added functions to /api/v1/data, /api/v1/charts, /api/v1/chart, /api/v1/info * redirect top output to stdout * address coverity findings * fix resource leaks of popen * log attempts to connect to individual destinations * better messages * properly parse destinations * try to find a function from the most matching to the least matching * log added streaming destinations * rotate destinations bypassing a node in the middle that does not accept our connection * break the loops properly * use typedef to define callbacks * capabilities negotiation during streaming * functions exposed upstream based on capabilities; compression disabled per node persisting reconnects; always try to connect with all capabilities * restore functionality to lookup functions * better logging of capabilities * remove old versions from capabilities when a newer version is there * fix formatting * optimization for plugins.d rrdlabels to avoid creating and destructing dictionaries all the time * delayed health initialization for rrddim and rrdset * cleanup health initialization * fix for popen() not returning the right value * add health worker jobs for initializing rrdset and rrddim * added content type support for functions; apps.plugin permanent function to display all the processes * fixes for functions parameters parsing in apps.plugin * fix for process matching in apps.plugiin * first working function for apps.plugin * Dashboard ACL is disabled for functions; Function errors are all in JSON format * apps.plugin function processes returns json table * use json_escape_string() to escape message * fix formatting * apps.plugin exposes all its metrics to function processes * fix json formatting when filtering out some rows * reopen the internal pipe of rrdpush in case of errors * misplaced statement * do not use buffer->len * support for GLOBAL functions (functions that are not linked to a chart * added /api/v1/functions endpoint; removed format from the FUNCTIONS api; * swagger documentation about the new api end points * added plugins.d documentation about functions * never re-close a file * remove uncessesary ifdef * fixed issues identified by codacy * fix for null label value * make edit-config copy-and-paste friendly * Revert "make edit-config copy-and-paste friendly" This reverts commit 54500c0e0a97f65a0c66c4d34e966f6a9056698e. * reworked sender handshake to fix coverity findings * timeout is zero, for both send_timeout() and recv_timeout() * properly detect that parent closed the socket * support caching of function responses; limit function response to 10MB; added protection from malformed function responses * disabled excessive logging * added units to apps.plugin function processes and normalized all values to be human readable * shorter field names * fixed issues reported * fixed apps.plugin error response; tested that pluginsd can properly handle faulty responses * use double linked list macros for double linked list management * faster apps.plugin function printing by minimizing file operations * added memory percentage * fix compatibility issues with older compilers and FreeBSD * rrdpush sender code cleanup; rrhost structure cleanup from sender flags and variables; * fix letftover variable in ifdef * apps.plugin: do not call detach from the thread; exit immediately when input is broken * exclude AR charts from health * flush cleaner; prefer sender output * clarity * do not fill the cbuffer if not connected * fix * dont enabled host->sender if streaming is not enabled; send host label updates to parent; * functions are only available through ACLK * Prepared statement reports only in dev mode * fix AR chart detection * fix for streaming not being enabling itself * more cleanup of sender and receiver structures * moved read-only flags and configuration options to rrdhost->options * fixed merge with master * fix for incomplete rename * prevent service thread from working on charts that are being collected Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
2022-08-24Remove aclk_api.[ch] (#13540)Timotej S
* get rid of aclk_starter middleman * get rid of aclk_api.[ch]
2022-07-24Rrdcontext (#13335)Costa Tsaousis
* type checking on dictionary return values * first STRING implementation, used by DICTIONARY and RRDLABEL * enable AVL compilation of STRING * Initial functions to store context info * Call simple test functions * Add host_id when getting charts * Allow host to be null and in this case it will process the localhost * Simplify init Do not use strdupz - link directly to sqlite result set * Init the database during startup * make it compile - no functionality yet * intermediate commit * intermidiate * first interface to sql * loading instances * check if we need to update cloud * comparison of rrdcontext on conflict * merge context titles * rrdcontext public interface; statistics on STRING; scratchpad on DICTIONARY * dictionaries maintain version numbers; rrdcontext api * cascading changes * first operational cleanup * string unittest * proper cleanup of referenced dictionaries * added rrdmetrics * rrdmetric starting retention * Add fields to context Adjuct context creation and delete * Memory cleanup * Fix get context list Fix memory double free in tests Store context with two hosts * calculated retention * rrdcontext retention with collection * Persist database and shutdown * loading all from sql * Get chart list and dimension list changes * fully working attempt 1 * fully working attempt 2 * missing archived flag from log * fixed archived / collected * operational * proper cleanup * cleanup - implemented all interface functions - dictionary react callback triggers after the dictionary is unlocked * track all reasons for changes * proper tracking of reasons of changes * fully working thread * better versioning of contexts * fix string indexing with AVL * running version per context vs hub version; ifdef dbengine * added option to disable rrdmetrics * release old context when a chart changes context * cleanup properly * renamed config * cleanup contexts; general cleanup; * deletion inline with dequeue; lots of cleanup; child connected/disconnected * ml should start after rrdcontext * added missing NULL to ri->rrdset; rrdcontext flags are now only changed under a mutex lock * fix buggy STRING under AVL * Rework database initialization Add migration logic to the context database * fix data race conditions during context deletion * added version hash algorithm * fix string over AVL * update aclk-schemas * compile new ctx related protos * add ctx stream message utils * add context messages * add dummy rx message handlers * add the new topics * add ctx capability * add helper functions to send the new messages * update cmake build to not fail * update topic names * handle rrdcontext_enabled * add more functions * fatal on OOM cases instead of return NULL * silence unknown query type error * fully working attempt 1 * fully working attempt 2 * allow compiling without ACLK * added family to the context * removed excess character in UUID * smarter merging of titles and families * Database migration code to add