Age | Commit message (Collapse) | Author |
|
optimization (#14493)
* first work on standardizing json formatting
* renamed old grouping to time_grouping and added group_by
* add dummy functions to enable compilation
* buffer json api work
* jsonwrap opening with buffer_json_X() functions
* cleanup
* storage for quotes
* optimize buffer printing for both numbers and strings
* removed ; from define
* contexts json generation using the new json functions
* fix buffer overflow at unit test
* weights endpoint using new json api
* fixes to weights endpoint
* check buffer overflow on all buffer functions
* do synchronous queries for weights
* buffer_flush() now resets json state too
* content type typedef
* print double values that are above the max 64-bit value
* str2ndd() can now parse values above UINT64_MAX
* faster number parsing by avoiding double calculations as much as possible
* faster number parsing
* faster hex parsing
* accurate printing and parsing of double values, even for very large numbers that cannot fit in 64bit integers
* full printing and parsing without using library functions - and related unit tests
* added IEEE754 streaming capability to enable streaming of double values in hex
* streaming and replication to transfer all values in hex
* use our own str2ndd for set2
* remove subnormal check from ieee
* base64 encoding for numbers, instead of hex
* when increasing double precision, also make sure the fractional number printed is aligned to the wanted precision
* str2ndd_encoded() parses all encoding formats, including integers
* prevent uninitialized use
* /api/v1/info using the new json API
* Fix error when compiling with --disable-ml
* Remove redundant 'buffer_unittest' declaration
* Fix formatting
* Fix formatting
* Fix formatting
* fix buffer unit test
* apps.plugin using the new JSON API
* make sure the metrics registry does not accept negative timestamps
* do not allow pages with negative timestamps to be loaded from db files; do not accept pages with negative timestamps in the cache
* Fix more formatting
---------
Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
|
|
* Move collectors under Integrations/Monitoring
* Correct Monitoring to Monitor
|
|
* support multiple hosts at pluginsd structures
* cleanup obsolete code
* use a lookup hashtable to quickly find the keyword to execute, without traversing the whole linked list of keywords
* more cleanup
* move new hash function to inlined.h
* minimize comparisons, eliminate a pre-parsing of the first keyword for each line
* cleanup parser from old code
* move parser into libnetdata
* unique entries in parser keywords hashtable
* move all hashing functions to inlined.h, name their sources, simple_hash() now defaults to FNV1a, it was FNV1
* small_hash() for parser
* plugins.d now can switch hosts, and also create/update them
* update hash function and hashtable size
* updated message
* unittest all hashing functions
* reset the chart when setting a new host
* remove host tags
* enable archived hosts when a collector pushes host info
* do not need localhost to swtich to localhost
* disable ARAL and OWA with -DFSANITIZE_ADDRESS=1
|
|
* add help line to functions response
* re-arranged columns to make it more usefull and added the new fields for controlling values
* renames and re-arrangements
|
|
Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud>
|
|
* acquiring / releasing interface for metrics
* metrics registry statistics
* cleanup metrics registry by deleting metrics when they dont have retention anymore; do not double copy the data of pages to be flushed
* print the tier in retention summary
* Open files with buffered instead of direct I/O (test)
* added more metrics stats and fixed unittest
* rename writer functions to avoid confusion with refcounting
* do not release a metric that is not acquired
* Revert to use direct I/O on write -- use direct I/O on read as well
* keep track of ARAL overhead and add it to the memory chart
* aral full check via api
* Cleanup
* give names to ARALs and PGCs
* aral improvements
* restore query expansion to the future
* prefer higher resolution tier when switching plans
* added extent read statistics
* smoother joining of tiers at query engine
* fine tune aral max allocation size
* aral restructuring to hide its internals from the rest of netdata
* aral restructuring; addtion of defrag option to aral to keep the linked list sorted - enabled by default to test it
* fully async aral
* some statistics and cleanup
* fix infinite loop while calculating retention
* aral docs and defragmenting disabled by default
* fix bug and add optimization when defragmenter is not enabled
* aral stress test
* aral speed report and documentation
* added internal checks that all pages are full
* improve internal log about metrics deletion
* metrics registry uses one aral per partition
* metrics registry aral max size to 512 elements per page
* remove data_structures/README.md dependency
---------
Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
|
|
|
|
* Moving the cloud docs under /docs/cloud (previous location: netdata/learn/*)
* Added metadata on almost every document of the old learn site for the new ingest process of learn.
* Map old learn document to their best fit as topic related docs.
Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud>
Co-authored-by: DShreve2 <david@netdata.cloud>
Co-authored-by: hugovalente-pm <hugo@netdata.cloud>
|
|
* track memory footprint of Netdata
* track db modes alloc/ram/save/map
* track system info; track sender and receiver
* fixes
* more fixes
* track workers memory, onewayalloc memory; unify judyhs size estimation
* track replication structures and buffers
* Properly clear host RRDHOST_FLAG_METADATA_UPDATE flag
* flush the replication buffer every 1000 times the circular buffer is found empty
* dont take timestamp too frequently in sender loop
* sender buffers are not used by the same thread as the sender, so they were never recreated - fixed it
* free sender thread buffer on replication threads when replication is idle
* use the last sender flag as a timestamp of the last buffer recreation
* free cbuffer before reconnecting
* recreate cbuffer on every flush
* timings for journal v2 loading
* inlining of metric and cache functions
* aral likely/unlikely
* free left-over thread buffers
* fix NULL pointer dereference in replication
* free sender thread buffer on sender thread too
* mark ctx as used before flushing
* better logging on ctx datafiles closing
Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
|
|
* cleanup journal v2 mounts periodically
* fix for last commit
* re-enable loading page from disk when the arrangement of pages requires it
* Remove unused statistics
* Estimate diskspace when the current datafile is full and queue a rotate command (Currently it will not attempt to estimate end size for journals)
Queue a command to check quota on startup per tier
* apps.plugin now exposes RSS chart
* shorter thread names to make debugging easier, since thread names can only be 15 characters
* more thread names fixes
* allow an apps_groups.conf target to be pid 0 or 1
Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
|
|
* rename "Pid" to "PID" in functions
I think it would be better for the case of "Pid" if we stuck to generally expected convention of "PID"
* `PPid` to `PPID`
* `Pid` to `PID` for pointer in PPID
* ignore "Gid" and "Uid" from this PR
|
|
|
|
Fixes https://github.com/netdata/netdata/issues/14154
|
|
|
|
|
|
* keys should not have spaces
* sticky columns should be very few
|
|
* Revert "Use llvm's ar and ranlib when compiling with clang (#13854)"
This reverts commit a9135f47bbb36e9cb437b18a7109607569580db7.
* Profile plugin
* Fix macos static thread
* Add support for replication
- Add a new capability for replication, when not supported the agent
should behave as previously.
- When replication is supported, the text protocol supports the
following new commands:
- CHART_DEFINITION_END: send the first/last entry of the child
- REPLAY_RRDSET_BEGIN: sends the name of the chart we are
replicating
- REPLAY_RRDSET_HEADER: sends a line describing the columns of the
following command (ie. start-time, end-time, dim1-name, ...)
- REPLAY_RRDSET_DONE: sends values to push for a specific start/end
time
- REPLAY_RRDSET_END: send the (a) update every of the chart, (b)
first/last entries in DB, (c) whether the child's been told to
start streaming, (d) original after/before period to replicate.
- REPLAY_CHART: Sent from a parent to a child, specifying (a)
the chart name we want data for, (b) whether the child should
start streaming once it has fullfilled the request with the
aforementioned commands, (c) after/before of the data the parent
wants
- As a consequence of the new protocol, streaming is disabled for all
charts on a new connection. It's enabled once replication is finished.
- The configuration parameters are specified from within stream.conf:
- "enable replication = yes|no"
- "seconds to replicate = 3600"
- "replication step = 600" (ie. how many seconds to fill per
roundtrip request.
* Minor fixes
- quote set and dim ids
- start streaming after writing replicated data to the buffer
- write replicated data only when buffer is less than 50% full.
- use reentrant iteration for charts
* Do not send chart definitions on connection.
* Track replication status through rrdset flags.
* Add debug flag for noisy log messages.
* Add license notice.
* Iterate charts with reentrant loop
* Set replication finished flag when streaming is disabled.
* Revert "Profile plugin"
This reverts commit 468fc9386e5283e0865fae56e9989b8ec83de14d.
Used only for testing purposes.
* Revert "Revert "Use llvm's ar and ranlib when compiling with clang (#13854)""
This reverts commit 27c955c58d95aed6c44d42e8b675f0cf3ca45c6d.
Reapply commit that I had to revert in order to be able to build the
agent on MacOS.
* Build replication source files with CMake.
* Pass number of words in plugind functions.
* Use get_word instead of indexing words.
* Use size_t instead of int.
* Pay only what we use when splitting words.
* no need to redefine PLUGINSD_MAX_WORDS
* fix formatting warning
* all usages of pluginsd_split_words() should use the return value to ensure non-cached results reuse; no need to lock the host to find a chart
* keep a sender dictionary with all the replication commands received and remove replication commands from charts
* do not replicate future data
* use last_updated to find the end of the db
* uniformity of replication logs
* rewrite of the query logic
* replication.c in C; debug info in human readable dates
* update the chart on every replication row
* update all chart members so that rrdset_done() can continue
* update the protocol to push one dimension per line and transfer data collection state to parent
* fix formatting
* remove replication object from pluginsd
* shorter communication
* fix typo
* support for replication proxies
* proper use of flags
* set receiver replication finished flag on charts created after the sender has been connected
* clear RRDSET_FLAG_SYNC_CLOCK on replicated charts
* log storing of nulls
* log first store
* log update every switches
* test ignoring timestamps but sending a point just after replication end
* replication should work on end_time
* use replicated timestamps
* at the final replication step, replicate all the remaining points
* cleanup code from tests
* print timestamps as unsigned long long
* more formating changes; fix conflicting type of replicate_chart_response()
* updated stream.conf
* always respond to replication requests
* in non-dbengine db modes, do not replicate more than the database size
* advance the db pointer of legacy db modes
* should be multiplied by update_every
* fix buggy label parsing - identified by codacy
* dont log error on history mismatches for db mode dbengine
* allow SSL requests to streaming children
* dont use ssl variable
Co-authored-by: Costa Tsaousis <costa@netdata.cloud>
|
|
* apps.plugin function add max value on all value columns
* prevent warning of uninitialized variables
|
|
* use print macros
* cast instead
|
|
|
|
extended processes function info
|
|
* allow disabling netdata monitoring section of the dashboard
* disable-netdata-stats: Modify eBPF.plugin to disable statistic charts according user selection, by default it is enabled
* Don't send internal statistics for exporting engine if it's disabled.
* Fix global statistics flag initialization
* Don't send internal statistics for checks plugin if it's disabled.
Co-authored-by: Thiago Marques <thiagoftsm@gmail.com>
Co-authored-by: Vladimir Kobal <vlad@prokk.net>
|
|
(#13810)
* overload libc memory allocators with custom ones to trace all allocations
* grab libc pointers for external c plugins
* use -ldl when necessary; fallback to work without dlsym when it is not available
* initialize global variable
* add optional dl libs
* dynamically link every library function when needed for the first time
* prevent crashes on musl libc
* another attempt
* dont dereference function
* attempt no 3
* attempt no 4
* cleanup - all attempts failed
* dont enable tracing of allocations
* missing parenthesis
|
|
about specific charts (#13720)
* function renames and code cleanup in popen.c; no actual code changes
* netdata popen() now opens both child process stdin and stdout and returns FILE * for both
* pass both input and output to parser structures
* updated rrdset to call custom functions
* RRDSET FUNCTION leading calls for both sync and async operation
* put RRDSET functions to a separate file
* added format and timeout at function definition
* support for synchronous (internal plugins) and asynchronous (external plugins and children) functions
* /api/v1/function endpoint
* functions are now attached to the host and there is a dictionary view per chart
* functions implemented at plugins.d
* remove the defer until keyword hook from plugins.d when it is done
* stream sender implementation of functions
* sanitization of all functions so that certain characters are only allowed
* strictier sanitization
* common max size
* 1st working plugins.d example
* always init inflight dictionary
* properly destroy dictionaries to avoid parallel insertion of items
* add more debugging on disconnection reasons
* add more debugging on disconnection reasons again
* streaming receiver respects newlines
* dont use the same fp for both streaming receive and send
* dont free dbengine memory with internal checks
* make sender proceed in the buffer
* added timing info and garbage collection at plugins.d
* added info about routing nodes
* added info about routing nodes with delay
* added more info about delays
* added more info about delays again
* signal sending thread to wake up
* streaming version labeling and commented code to support capabilities
* added functions to /api/v1/data, /api/v1/charts, /api/v1/chart, /api/v1/info
* redirect top output to stdout
* address coverity findings
* fix resource leaks of popen
* log attempts to connect to individual destinations
* better messages
* properly parse destinations
* try to find a function from the most matching to the least matching
* log added streaming destinations
* rotate destinations bypassing a node in the middle that does not accept our connection
* break the loops properly
* use typedef to define callbacks
* capabilities negotiation during streaming
* functions exposed upstream based on capabilities; compression disabled per node persisting reconnects; always try to connect with all capabilities
* restore functionality to lookup functions
* better logging of capabilities
* remove old versions from capabilities when a newer version is there
* fix formatting
* optimization for plugins.d rrdlabels to avoid creating and destructing dictionaries all the time
* delayed health initialization for rrddim and rrdset
* cleanup health initialization
* fix for popen() not returning the right value
* add health worker jobs for initializing rrdset and rrddim
* added content type support for functions; apps.plugin permanent function to display all the processes
* fixes for functions parameters parsing in apps.plugin
* fix for process matching in apps.plugiin
* first working function for apps.plugin
* Dashboard ACL is disabled for functions; Function errors are all in JSON format
* apps.plugin function processes returns json table
* use json_escape_string() to escape message
* fix formatting
* apps.plugin exposes all its metrics to function processes
* fix json formatting when filtering out some rows
* reopen the internal pipe of rrdpush in case of errors
* misplaced statement
* do not use buffer->len
* support for GLOBAL functions (functions that are not linked to a chart
* added /api/v1/functions endpoint; removed format from the FUNCTIONS api;
* swagger documentation about the new api end points
* added plugins.d documentation about functions
* never re-close a file
* remove uncessesary ifdef
* fixed issues identified by codacy
* fix for null label value
* make edit-config copy-and-paste friendly
* Revert "make edit-config copy-and-paste friendly"
This reverts commit 54500c0e0a97f65a0c66c4d34e966f6a9056698e.
* reworked sender handshake to fix coverity findings
* timeout is zero, for both send_timeout() and recv_timeout()
* properly detect that parent closed the socket
* support caching of function responses; limit function response to 10MB; added protection from malformed function responses
* disabled excessive logging
* added units to apps.plugin function processes and normalized all values to be human readable
* shorter field names
* fixed issues reported
* fixed apps.plugin error response; tested that pluginsd can properly handle faulty responses
* use double linked list macros for double linked list management
* faster apps.plugin function printing by minimizing file operations
* added memory percentage
* fix compatibility issues with older compilers and FreeBSD
* rrdpush sender code cleanup; rrhost structure cleanup from sender flags and variables;
* fix letftover variable in ifdef
* apps.plugin: do not call detach from the thread; exit immediately when input is broken
* exclude AR charts from health
* flush cleaner; prefer sender output
* clarity
* do not fill the cbuffer if not connected
* fix
* dont enabled host->sender if streaming is not enabled; send host label updates to parent;
* functions are only available through ACLK
* Prepared statement reports only in dev mode
* fix AR chart detection
* fix for streaming not being enabling itself
* more cleanup of sender and receiver structures
* moved read-only flags and configuration options to rrdhost->options
* fixed merge with master
* fix for incomplete rename
* prevent service thread from working on charts that are being collected
Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
|
|
|
|
* Deaggregate the `gui` and `email` app groupx and improve GUI coverage.
This splits the `gui` and `email` groups in the apps plugin config into a
number of much smaller and more descriptive groups, as well as expanding
what we cover for a number of the resultant groups. In particular:
- The `email` group has been explicitly split into two separate groups,
`mta` for mail transfer agents (SMTP servers), and `mda` for mail
delivery agents (IMAP/POP3 servers). The new `mda` group prrovides
slightly better coverage of MDA software than the old `email` group did.
- A new `mua` group has been added for email clients (formally Mail User
Agents), covering Evolution, Thunderbird, and a selection of popular
console email clients.
- A new `ntfs` group has been created, matching on NTFS-3G. This
particular software is in relatively widespread usage on Linux desktop
systems, and does not generally get automatically matched by the `gui`
group.
- A re-added `X` group covers _only_ X11, XDM, Xwayland, and xsettingsd.
- A new `wayland` group covers a handful of tools commonly used with
Wayland independent of the choice of compositor.
- A new `kde` group covers all of the standard KDE 5 and Plasma tooling,
including some things we were previously not matching correctly that
may not be in the process tree of the KDE session itself.
- New groups also cover the GNOME, MATE, Cinnamon, XFCE4, LXDE, LXQt,
and Enlightenment desktop environments.
- New groups have been added to cover i3, DWM, and AwesomeWM, three of
the most popular X11 window managers on Linux.
- New groups have been added to cover Sway, Wayfire, Cage, and Westom,
four commonly used Wayland compositors.
- The `gui` group is retained, matching on GUI specific tooling that is
not DE/Compositor specific. It has also been expanded to include a
couple of additional components we were not matching on previously,
such as the XDG portal tooling.
- A new `webbrowser` group has been added, covering the most common web
browsers on Linux (including some widely used text-mode browsers).
- RTKit and gpg-agent have both been added to the system group. Both are
relatively commonly used on Linux desktop systems, but are not GUI
specific.
* Add text mode browsers.
Somehow got missed in the original commit.
* Add USBGuard to system group.
* Add links to the webbrowser group.
|
|
|
|
|
|
|
|
|
|
add more monitoring tools to `apps_groups.conf`
|
|
|
|
(#13314)
|
|
* netdata doubles
* fix cmocka test
* fix cmocka test again
* fix left-overs of long double to NETDATA_DOUBLE
* RRDDIM detached from disk representation; db settings in [db] section of netdata.conf
* update the memory before saving
* rrdset is now detached from file structures too
* on memory mode map, update the memory mapped structures on every iteration
* allow RRD_ID_LENGTH_MAX to be changed
* granularity secs, back to update every
* fix formatting
* more formatting
|
|
|
|
|
|
|
|
|
|
|
|
|
|
* initial version of worker utilization
* working example
* without mutexes
* monitoring DBENGINE, ACLKSYNC, WEB workers
* added charts to monitor worker usage
* fixed charts units
* updated contexts
* updated priorities
* added documentation
* converted threads to stacked chart
* One query per query thread
* Revert "One query per query thread"
This reverts commit 6aeb391f5987c3c6ba2864b559fd7f0cd64b14d3.
* fixed priority for web charts
* read worker cpu utilization from proc
* read workers cpu utilization via /proc/self/task/PID/stat, so that we have cpu utilization even when the jobs are too long to finish within our update_every frequency
* disabled web server cpu utilization monitoring - it is now monitored by worker utilization
* tight integration of worker utilization to web server
* monitoring statsd worker threads
* code cleanup and renaming of variables
* contrained worker and statistics conflict to just one variable
* support for rendering jobs per type
* better priorities and removed the total jobs chart
* added busy time in ms per job type
* added proc.plugin monitoring, switch clock to MONOTONIC_RAW if available, global statistics now cleans up old worker threads
* isolated worker thread families
* added cgroups.plugin workers
* remove unneeded dimensions when then expected worker is just one
* plugins.d and streaming monitoring
* rebased; support worker_is_busy() to be called one after another
* added diskspace plugin monitoring
* added tc.plugin monitoring
* added ML threads monitoring
* dont create dimensions and charts that are not needed
* fix crash when job types are added on the fly
* added timex and idlejitter plugins; collected heartbeat statistics; reworked heartbeat according to the POSIX
* the right name is heartbeat for this chart
* monitor streaming senders
* added streaming senders to global stats
* prevent division by zero
* added clock_init() to external C plugins
* added freebsd and macos plugins
* added freebsd and macos to global statistics
* dont use new as a variable; address compiler warnings on FreeBSD and MacOS
* refactored contexts to be unique; added health threads monitoring
Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
|
|
* Remove node.d.plugin and relevant files
* fix build packages
* remove node.d related words/phrases from docs and tests
|
|
|
|
|
|
|
|
|
|
* 12139: introduce new chart for process states metrics
This commit introduces new chart for total number of processes
in different states i.e running, sleeping, sleeping_d, zombie
and stopped.
* fix recursive chart generation issue
* fix recursive chart addition
* fixing comments
* Update web/gui/dashboard_info.js
Co-authored-by: Ilya Mashchenko <ilya@netdata.cloud>
* Update collectors/apps.plugin/apps_plugin.c
Co-authored-by: Ilya Mashchenko <ilya@netdata.cloud>
* fixing commenets
* Apply suggestions from code review
* Update collectors/apps.plugin/apps_plugin.c
* Update collectors/apps.plugin/apps_plugin.c
Co-authored-by: Timotej S. <6674623+underhood@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Tina Luedtke <kickoke@users.noreply.github.com>
Co-authored-by: Ilya Mashchenko <ilya@netdata.cloud>
Co-authored-by: Timotej S. <6674623+underhood@users.noreply.github.com>
Co-authored-by: Tina Luedtke <kickoke@users.noreply.github.com>
|
|
|
|
|
|
|