path: root/daemon
Age | Commit message | Author
2020-10-30 | allows use of system libwebsockets instead of bundled one (#9984) | Timotej S
* allows usage of system libwebsockets
* fixes problems that were preventing ACLK from working with LWS `4.1.`
* add LWS info to buildinfo
Co-authored-by: Austin S. Hemmelgarn <austin@netdata.cloud>
2020-09-16 | adds ACLK DISABLE_CLOUD to -W buildinfo (#9936) | Timotej S
2020-09-16 | Added a way to get build configuration info from the agent. (#9913) | Austin S. Hemmelgarn
* Add a way to get build configuration info from the agent.
  This adds a new option to the `-W` switch called 'buildinfo'. When invoked with this argument, Netdata will print its version, the configure options, and a list of optional features and whether they are enabled or not. This is intended to serve three purposes:
  * It allows developers to more quickly get an idea of how Netdata was built when triaging bug reports.
  * It provides an easier way to validate changes to the build system that affect optional features during the development cycle.
  * It provides an easier way to build CI workflows that validate that building under a given set of constraints results in a feature being enabled or not.
  The actual implementation is a bit large but overall exceedingly simple, consisting of a set of preprocessor directives to extract optional feature state information from config.h and then a series of printf() calls to actually report this info (which should end up optimized by smart compilers due to all the arguments being compile-time constants).
* Added zlib to optional libraries.
* Added remaining optional plugins to buildinfo output.
* Changed formatting to be more human friendly.
* Add remaining optional libraries.
* Fix up formatting to be even more human friendly.
* Fix typo in buildinfo output.
* Remove unused variable.
* Fixed spelling of config.h option name.
* Update daemon/buildinfo.c
  Co-authored-by: Markos Fountoulakis <44345837+mfundul@users.noreply.github.com>
* Fix option name mismatch for libcrypto.
* Update daemon/buildinfo.c
  Co-authored-by: Markos Fountoulakis <44345837+mfundul@users.noreply.github.com>
Co-authored-by: Markos Fountoulakis <44345837+mfundul@users.noreply.github.com>
2020-09-15 | Added context parameter to the data endpoint (#9931) | Stelios Fragkakis
Added functionality to support composite charts
2020-09-04 | Fixed lack of macOS RAM info in system-info.sh (#9882) | jim
* fix lack of macOS RAM info in system-info.sh
* aligned spaces
2020-08-26 | Adds claimed_id streaming (#9804) | Timotej S
* streams claimed_id of child nodes to parents
* adds this information into /api/v1/info
2020-08-20 | Added code to release memory used by the global GUID map (#9729) | Stelios Fragkakis
Fixed memory leak issues associated with the global GUID map during agent shutdown
2020-08-19 | Add missing value to list simple pattern | thiagoftsm
Fix access using Unix sockets when Netdata is installed with kickstart-static64
2020-08-11 | Fix the default value of the home directory (#9711) | Chris Akritidis
Change the default home to be VARLIB_DIR instead of CACHE_DIR so that it is consistent with the installation. Override the default with the HOME var if set in the environment.
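A minimal sketch of the lookup order described here, assuming `VARLIB_DIR` is a compile-time string macro. The fallback value `/var/lib/netdata` and the function name `default_home_dir()` are placeholders, and treating an empty `HOME` as unset is a choice made for this example rather than something the commit states.

```c
#include <stdlib.h>

/* Placeholder for the build-time VARLIB_DIR path. */
#ifndef VARLIB_DIR
#define VARLIB_DIR "/var/lib/netdata"
#endif

/* Return $HOME when it is set and non-empty, otherwise the
 * build-time VARLIB_DIR default (no longer CACHE_DIR). */
const char *default_home_dir(void) {
    const char *home = getenv("HOME");
    if (home != NULL && home[0] != '\0')
        return home;
    return VARLIB_DIR;
}
```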
2020-08-11 | Fix crash when receiving malformed labels via streaming. (#9715) | Markos Fountoulakis
* Fix crash when receiving malformed labels via streaming.
* Disallow empty _virtualization values in system_info.
2020-08-07 | Stop multi-host DB statistics from being counted multiple times. (#9685) | Markos Fountoulakis
2020-07-28 | Implemented multihost database (#9556) | Stelios Fragkakis
* Hard code a node for non-legacy multidb test
  Skip dbengine initialization for new incoming children
  Add code to switch to multidb ctx when accessing the dbengine
* When a non-legacy streaming connection is detected, use the multidb metadata log context
* Clear the superblock memory to avoid random data written in the metadata log
* Activate the host detection during compaction
  Activate the host detection during metadata log chart updates
  Keep the host in the user object during replay of the HOST command
* Add defaults for health / rrdpush on HOST metadata replay
  Check for legacy status on host creation by checking is_archived and if not conclusive, call is_legacy_child()
  Use defaults from the stream.conf
* Count hosts only if not archived
  When host switches from archived to active update rrd_hosts_available
  Remove archived hosts from charts and info
* Change parameter from "multidb disk space" to "dbengine multihost disk space"
  Remove unused variables
  Fix compilation error when dbengine is disabled
  Fix condition for machine_guid directory creation under cache_dir
* Enable multidb disk space file creation.
* Stop deleting dimensions when rotating archived metrics if the dimension is active in a different database engine.
* Fix old bug in the code that confused obsolete hosts with orphan hosts.
* Do not delete multi-host DB host files.
* Discard dbengine state when a legacy memory mode instantiates to avoid inconsistencies.
* Identify metadata that collide with non-dbengine memory mode hosts and ignore them.
* Handle non-dbengine localhost with dbengine archived charts in localhost and streaming.
* Ignore archived hosts in streaming.
* Add documentation before merging to master.
Co-authored-by: Markos Fountoulakis <markos.fountoulakis.senior@gmail.com>
2020-07-23 | Added lock dir (#9584) | Vladimir Kobal
* Add lock dir
* Clean directory on startup
* Update environment variable name
* Fix file removal
* Add error message
* collectors/python.d: change lock file name
* collectors/python.d: add `nolock` cmd option
Co-authored-by: ilyam8 <ilya@netdata.cloud>
2020-07-21 | Network Viewer options (#9495) | thiagoftsm
Bring new options to ebpf.plugin.
2020-07-16 | Implemented default disk space size calculation for multihost db (#9504) | Stelios Fragkakis
2020-06-29 | Corrected virtualization detection in system-info.sh. (#9425) | Austin S. Hemmelgarn
This causes failure to detect virtualization to be reported as no virtualization instead of unknown virtualization.
2020-06-29 | fix_unit_test: Fix the unittest execution (#9445) | thiagoftsm
Fix unit test execution (`Cannot create unique machine id file`).
2020-06-29 | Fix internal registry (#9434) | thiagoftsm
Move the registry so that it integrates with the spawn server.
2020-06-16 | Replace assert calls (#9349) | Markos Fountoulakis
* Replace all assert() calls with the new fatal_assert() for proper logging.
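One plausible shape for a logging assertion macro like `fatal_assert()` is sketched below. In this sketch, unlike `assert()`, the check does not compile away under `NDEBUG`, and it reports the failing expression and location before aborting. The helper `log_fatal()` is hypothetical; the agent would presumably route the message through its own error log rather than `stderr`.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical fatal-logging helper: report, flush, then abort. */
static void log_fatal(const char *expr, const char *file, int line, const char *func) {
    fprintf(stderr, "FATAL: assertion `%s' failed at %s:%d (%s)\n", expr, file, line, func);
    fflush(stderr);
    abort();
}

/* Evaluates expr in every build (no NDEBUG check) and logs the
 * stringified expression plus its location when it is false. */
#define fatal_assert(expr) \
    ((expr) ? (void)0 : log_fatal(#expr, __FILE__, __LINE__, __func__))
```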
2020-06-16 | Get netdata execution path early to avoid user permission issues (#9339) | Markos Fountoulakis
2020-06-12 | Change streaming terminology to parent/child in docs (#9312) | Joel Hans
* Initial pass through docs
* Dash instead of slash
* To parent/child
* Child nodes
* Change diagrams
* Allowlist
* Fixes for Andrew
* Remove from build_external
* Change in proc
2020-06-12 | Change streaming terminology to parent-child in the code (#9323) | Andrew Moss
2020-06-12 | Add support for persistent metadata (#9324) | Stelios Fragkakis
* Implemented collector metadata logging
* Added persistent GUIDs for charts and dimensions
* Added metadata log replay and automatic compaction
* Added detection of charts with no active collector (archived)
* Added new endpoint to report archived charts via `/api/v1/archivedcharts`
* Added support for collector metadata update
Co-authored-by: Markos Fountoulakis <44345837+mfundul@users.noreply.github.com>
2020-06-04 | Fix Coverity defects 359164, 359165 and 358989. (#9268) | Andrew Moss
Removed uses of the host lock that could deadlock senders and replaced with the new fine-grained mutex.
2020-05-28 | Added health check functionality to our Docker images. (#9172) | Austin S. Hemmelgarn
* Add a `ping` command to netdatacli to check if agent is alive.
  This provides a way to trivially check if the agent itself appears to be running (namely, the command parser for netdatacli in the agent itself is working and responding), allowing users to check this without having to rely on us continuing to have `help` be a command sent to the agent instead of executing locally.
* Add a basic health check to our Docker images.
  This adds a relatively basic health checker script to our Docker images. By default it verifies that the `/api/v1/info` endpoint returns a 200 status code. It also supports checking different endpoints or using `netdatacli ping` to check that Netdata is running, all controlled by a new Docker environment variable: `NETDATA_HEALTH_CHECK`.
* Avoid unnecessary `chmod` in Dockerfile.
  Suggested by @prologic.
* Fix typo in docs.
* Update environment variable name to be more clear.
  Also add `-L` to `curl` command in health check to follow redirects.
2020-05-21 | Improve system-info.sh to better handle certain cases when gathering info on the system's disk capacity. (#7902) | Austin S. Hemmelgarn
* Remove trailing whitespace in system-info.sh.
* Fix handling of APFS on macOS.
  APFS can have multiple volumes in a single partition, which means that the same functional 'volume' can appear multiple times in the output of `df`. Duplicate lines for such volumes will show the same total size and available space along with a common prefix for the device name. This updates the parsing logic for `df` on macOS to account for this by deduplicating lines in the `df` output that have the same total size, available space, and same normalized device name. This has the potential to incorrectly under-account space in some cases, but the likelihood of that happening is much less than the certainty of over-accounting space on standard APFS configurations.
* Properly handle VirtIO block devices when using /sys.
  The VirtIO Block device driver uses a dynamically allocated device major number, meaning that we can't trivially match on it. This updates the handling to properly look it up in `/proc/devices` instead of just using the whole dynamic device number range.
* Add handling for NVMe block devices in sysfs code.
  They use dynamic major numbers just like VirtIO Block devices do.
* Switch to device major discovery in /proc/devices for all device types.
  This converts the code to use `/proc/devices` to look up correct device major numbers for block devices that we treat as disks, just like we are already doing for those that have dynamically assigned numbers. This makes the code both more robust and easier to understand and modify. This also excludes some particularly old hardware that we were originally looking for. If needed, we can add in the required device names, but for now it's better to keep the list concise.
* Correct handling of device major discovery.
  We need to strip leading whitespace before calling cut, not after.
* Only use /sys/block if we can read /proc/devices.
  We use `/proc/devices` to do device number lookups that we then use to filter devices under `/sys/block`. As a result, if we can't read `/proc/devices`, then we won't actually parse anything out of `/sys/block` either, so we need to just fall back to parsing `df` output.
* Deduplicate `df` output by device name on Linux.
  This ensures that we properly handle BTRFS subvolumes, counting each actual volume only once.
* Use POSIX math expansion instead of awk to sum disk sizes.
  This avoids the rather annoying habit of AWK of printing integers in scientific notation instead of as exact values.
* Correct `sed` options for POSIX compliance.
* Fix disk info fetching for macOS.
  POSIX tools, as found on macOS, lack a number of rather useful filtering and sorting features, so we need to get rather creative with the handling on macOS to make the disk space computation work correctly. This unfortunately makes the calculation a bit less reliable than it would have been had the existing calculations worked correctly, but it's the best I can come up with without making things exponentially more complicated.
* Properly handle sector size when using sysfs.
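system-info.sh is a shell script, but the `/proc/devices` major-number lookup described in this commit is easy to express in C for illustration. The sketch scans only the `Block devices:` section for a driver name and returns its major number. `block_major_for()` is a hypothetical name, and any concrete major numbers in sample input are illustrative, since dynamically assigned majors (such as `virtblk`) differ between systems.

```c
#include <stdio.h>
#include <string.h>

/* Return the major number registered for block driver `name` in text
 * formatted like /proc/devices, or -1 if it is not listed. Entries in
 * the "Character devices:" section are ignored. */
int block_major_for(const char *devices_text, const char *name) {
    int in_block = 0;
    const char *line = devices_text;

    while (line != NULL && *line != '\0') {
        const char *nl = strchr(line, '\n');
        size_t len = nl ? (size_t)(nl - line) : strlen(line);
        char buf[256];
        if (len >= sizeof buf)
            len = sizeof buf - 1;
        memcpy(buf, line, len);
        buf[len] = '\0';

        if (strncmp(buf, "Block devices:", 14) == 0) {
            in_block = 1;
        } else if (strncmp(buf, "Character devices:", 18) == 0) {
            in_block = 0;
        } else if (in_block) {
            int major;
            char driver[128];
            /* %d skips the leading whitespace that pads short majors. */
            if (sscanf(buf, "%d %127s", &major, driver) == 2 && strcmp(driver, name) == 0)
                return major;
        }
        line = nl ? nl + 1 : NULL;
    }
    return -1;
}
```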
2020-05-20 | Restore SIGCHLD signal handler after being replaced by libuv (#9107) | Markos Fountoulakis
2020-05-14 | Docs: Remove old Cloud/dashboard and replace with new Cloud/dashboard (#8874) | Joel Hans
* Add options to daemon, clean up claiming
* Caught one more old conf
* Remove cloud docs
* I did a lot of things
* Rewrite tutorial step 3
* Remove my nodes menu, sync what-is-netdata
* Restore ACLK/claim/daemon docs to be handled by docs-go-live
* Fix up what-is-netdata
* More cleanup of README/what-is-netdata
* Restore daemon/config/README.md
* Fix frontmatter
* Change title, fix broken link
* Copyediting fixes
* Remove symbols
* Add a few more GIFs
* Fix hash
* Fix other hash
* Fix wording in web gui
* Address Andrew's and Jacek's comments
2020-05-14 | Improve the impact of health code on netdata scalability (#8407) | Markos Fountoulakis
* Add support for spawning processes without pipes.
* Port health_alarm_execute() from mypopen() to netdata_spawn()
* Make alarm notifications asynchronous within a single health thread iteration
* Initial version of spawn server.
* preliminary integration of spawn client with health
2020-05-13 | Update daemon output with new URLs and dates (#8965) | Joel Hans
2020-05-12 | Restore old semantics of "netdata -W set" command (#8987) | Markos Fountoulakis
2020-05-11 | Remove UNUSED word from flood protection configuration options documentation. (#8964) | Markos Fountoulakis
2020-05-11 | Fix shutdown via netdatacli with musl C library (#8931) | Markos Fountoulakis
2020-05-12 | Docs: Update with go-live claiming and ACLK information (#8859) (#8960) | James Mills
* Restore docs from naughty PR
* Address Andrew's comments
* Ini to conf
* Changes based on meeting with Andrew
* Tweak text around claiming
* Some grammar/typo fixes
* Add /var/lib/netdata to Docker instructions on README
* Added a few more ACLK links per Chris
Co-authored-by: Joel Hans <joel@netdata.cloud>
2020-05-11 | Enable support for Netdata Cloud. | Andrew Moss
This PR merges the feature-branch to make the cloud live. It contains the following work:
Co-authored-by: Andrew Moss <1043609+amoss@users.noreply.github.com>
Co-authored-by: Jacek Kolasa <jacek.kolasa@gmail.com>
Co-authored-by: Austin S. Hemmelgarn <austin@netdata.cloud>
Co-authored-by: James Mills <prologic@shortcircuit.net.au>
Co-authored-by: Markos Fountoulakis <44345837+mfundul@users.noreply.github.com>
Co-authored-by: Timotej S <6674623+underhood@users.noreply.github.com>
Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
* dashboard with new navbars, v1.0-alpha.9: PR #8478
* dashboard v1.0.11: netdata/dashboard#76
  Co-authored-by: Jacek Kolasa <jacek.kolasa@gmail.com>
* Added installer code to bundle JSON-c if it's not present. PR #8836
  Co-authored-by: James Mills <prologic@shortcircuit.net.au>
* Fix claiming config PR #8843
* Adds JSON-c as hard dep. for ACLK PR #8838
* Fix SSL renegotiation errors in old versions of openssl. PR #8840. Also - we have a transient problem with opensuse CI so this PR disables them with a commit from @prologic.
  Co-authored-by: James Mills <prologic@shortcircuit.net.au>
* Fix claiming error handling PR #8850
* Added CI to verify JSON-C bundling code in installer PR #8853
* Make cloud-enabled flag in web/api/v1/info be independent of ACLK build success PR #8866
* Reduce ACLK_STABLE_TIMEOUT from 10 to 3 seconds PR #8871
* remove old-cloud related UI from old dashboard (accessible now via /old suffix) PR #8858
* dashboard v1.0.13 PR #8870
* dashboard v1.0.14 PR #8904
* Provide feedback on proxy setting changes PR #8895
* Change the name of the connect message to update during an ongoing session PR #8927
* Fetch active alarms from alarm_log PR #8944
2020-05-08 | Updates main copyright and links for the year 2020 (#8937) | Zack Shoylev
2020-04-22 | Fixed issue in `system-info.sh` regarding the parsing of `lscpu` output. (#8754) | Austin S. Hemmelgarn
* Fix parsing issue in system-info.sh.
  Depending on the exact hardware it's run on, `lscpu` may or may not report a maximum and/or minimum CPU frequency. We want to preferentially match on the maximum if it's there because the regular CPU frequency entry from `lscpu` shows the _current_ frequency most of the time, and we want to report the 'intended' frequency for the CPU.
* Fix the check to see if we found a CPU frequency value.
* Actually fix parsing.
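The preference order this commit describes (use `CPU max MHz` when `lscpu` reports it, otherwise fall back to `CPU MHz`) can be sketched in C over captured `lscpu` output. The real change lives in the shell script; `cpu_freq_mhz()` and `lscpu_field()` are hypothetical names.

```c
#include <stdio.h>
#include <string.h>

/* Find a "Key:   value" line in lscpu-style text and copy the value
 * into out (outlen must be > 0). Returns 1 if found, else 0. */
static int lscpu_field(const char *text, const char *key, char *out, size_t outlen) {
    size_t klen = strlen(key);
    const char *p = text;
    while (p != NULL && *p != '\0') {
        if (strncmp(p, key, klen) == 0 && p[klen] == ':') {
            const char *v = p + klen + 1;
            while (*v == ' ' || *v == '\t')
                v++;
            size_t n = strcspn(v, "\n");
            if (n >= outlen)
                n = outlen - 1;
            memcpy(out, v, n);
            out[n] = '\0';
            return 1;
        }
        p = strchr(p, '\n');   /* advance to the next line */
        if (p != NULL)
            p++;
    }
    return 0;
}

/* Prefer the maximum (intended) frequency; "CPU MHz" usually shows
 * the current frequency, so it is only a fallback. */
int cpu_freq_mhz(const char *lscpu_text, char *out, size_t outlen) {
    if (lscpu_field(lscpu_text, "CPU max MHz", out, outlen))
        return 1;
    return lscpu_field(lscpu_text, "CPU MHz", out, outlen);
}
```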
2020-04-14 | Docs: Standardize links between documentation (#8638) | Joel Hans
* Trying out some absolute-ish links
* Try one out on installer
* Testing logic
* Trying out some more links
* Fixing links
* Fix links in python collectors
* Changed a bunch more links
* Fix build errors
* Another push of links
* Fix build error and add more links
* Complete first pass
* Fix final broken links
* Fix links to files
* Fix for Netlify
* Two more fixes
2020-04-13 | Revert "Revert changes since v1.21 in preparation for hotfix release." | Austin S. Hemmelgarn
This reverts commit e2874320fc027f7ab51ab3e115d5b1889b8fd747.
2020-04-13 | Revert changes since v1.21 in preparation for hotfix release. | Austin S. Hemmelgarn
2020-04-06 | Docs: Change MacOS to macOS (#8562) | Joel Hans
* Change MacOS to macOS
* Change Mac as noun to macOS system
2020-04-06 | Prometheus web api connector (#8540) | Vladimir Kobal
* Fix the Prometheus web API code in the exporting engine
* Rename connector types
* Remove the conditional compilation of the exporting engine
* Use labels instead of tags
* Fix the exporter configuration
* Document functions
* Add unit tests
2020-03-31 | Extend TLS Support (#8505) | thiagoftsm
* tls13: This commit brings TLS 1.3 to Netdata
* tls13: Update variables on slave side
* tls13: Fix compilation error for old libraries
* tls13: Fix compilation error for old libraries 2
* tls13 remove ciphers
* tls13: TLS versions
  This commit brings the missing TLS versions accepted for Netdata and it also brings documentation updates related to these versions
* tls13: Remove duplication
  This commit removes wrong duplication of code
* tls13: Documentation
  This commit brings a fix for the documentation
* tls13: Remove magic number
  This commit removes the magic number to allow the code to be readable
* tls13: TLS version
  Small adjustment to the TLS version
* tls13: Security Init
  This commit removes the array from the function and overwrites the magic number with a string
* tls13: Remove new variable name from stream
* tls13: OpenSSL versions and old key name
  This commit removes the new key names and also updates the names used to define the OpenSSL version
2020-03-31 | Switching over to soft feature flag (#8545) | Andrew Moss
Preparing for the cloud release. This changes how we handle the feature flag so that it no longer requires installer switches and can be set from the config file. This still requires internal access to use and is not ready for public access yet.
2020-03-31 | Improve the behavior of claiming (#8516) | Andrew Moss
The default cloud url has been updated to app.netdata.cloud ready for the release. The claiming process now checks the current user executing claiming and refuses to perform the claim for the wrong user. If the current UID is 0 then claiming proceeds but the file ownership is adjusted to be the correct netdata user. The default expected user is `netdata` unless the script can identify the user from the current configuration. After the claiming script is executed the CLI is used to reload the claiming state.
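The user check described here reduces to a small decision table. The sketch below is a C rendering that assumes the expected netdata UID has already been resolved (for example with `getpwnam()`); the enum and function names are hypothetical, and the actual logic lives in the claiming script.

```c
#include <sys/types.h>

/* Outcomes of the pre-claim user check (hypothetical names). */
enum claim_user_check {
    CLAIM_PROCEED,           /* running as the expected netdata user */
    CLAIM_PROCEED_AND_CHOWN, /* running as root: claim, then fix file ownership */
    CLAIM_REFUSE             /* any other user: refuse to claim */
};

/* Decide what to do from the current UID and the expected user's UID. */
enum claim_user_check check_claim_user(uid_t current, uid_t expected) {
    if (current == expected)
        return CLAIM_PROCEED;
    if (current == 0)
        return CLAIM_PROCEED_AND_CHOWN;
    return CLAIM_REFUSE;
}
```

Keeping the decision separate from the side effects (running the claim, calling `chown()`) makes the policy easy to test on its own.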
2020-03-26 | Report Why ACLK build failed (#8429) | Timo
Reports ACLK build failures to GA (if the user didn't opt-out)
2020-03-26 | Improved ACLK (#8498) | Stelios Fragkakis
Improved the stability of the ACLK
2020-03-23 | Fix flushing error threshold (#8425) | Markos Fountoulakis
* Fix flushing error threshold to account for inactive producers of dbengine instance
* Disable page invalidation during shutdown of dbengine
* Fix crash during netdata shutdown if command server has failed to initialize
* Add fallback for uv_listen to retry with backlog = 1 on failure.
* Adhere to the API change of libuv v1.35
2020-03-16 | Fix streaming scaling (#8375) | Markos Fountoulakis
* Disallow multiple streaming connections to the same master agent
* Reject multiple streaming connections quickly without blocking
* Increase timeout for systemd service shutdown to give time to flush the db.
* Optimize page correlation ID to use atomic counter instead of locks
* Reduce contention in global configuration mutex
* Optimize complexity of inserting configuration sections from O(N) to O(1)
* Reduce overhead of clock_gettime() by utilizing CLOCK_MONOTONIC_COARSE when applicable.
* Fix unit test compile errors
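The "atomic counter instead of locks" item can be sketched with C11 `<stdatomic.h>`: every caller gets a unique ID from a single `atomic_fetch_add`, with no mutex. The counter and function names are hypothetical, and relaxed ordering is assumed to be sufficient here because only the uniqueness of the IDs matters, not their ordering relative to other memory operations.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Global correlation-ID counter (hypothetical name). */
static _Atomic uint64_t page_correlation_id = 0;

/* Return a unique, increasing ID without taking a lock.
 * atomic_fetch_add returns the previous value, so add 1 for the new one. */
uint64_t next_page_correlation_id(void) {
    return atomic_fetch_add_explicit(&page_correlation_id, 1, memory_order_relaxed) + 1;
}
```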
2020-03-10 | Bulk add frontmatter to all documentation (#8354) | Joel Hans
* Bulk add frontmatter
* A few extra edge cases