Age | Commit message | Author |
|
* allows usage of system libwebsockets
* fixes problems that were preventing ACLK to work with LWS `4.1.`
* add LWS info to buildinfo
Co-authored-by: Austin S. Hemmelgarn <austin@netdata.cloud>
|
|
|
|
* Add a way to get build configuration info from the agent.
This adds a new option to the `-W` switch called 'buildinfo'. When
invoked with this argument, Netdata will print its version, the
configure options, and a list of optional features and whether they are
enabled or not.
This is intended to serve three purposes:
* It allows developers to more quickly get an idea of how Netdata was
built when triaging bug reports.
* It provides an easier way to validate changes to the build system that
affect optional features during the development cycle.
* It provides an easier way to build CI workflows that validate that
building under a given set of constraints results in a feature being
enabled or not.
The actual implementation is a bit large but overall exceedingly simple,
consisting of a set of preprocessor directives to extract optional
feature state information from config.h and then a series of printf()
calls to actually report this info (which should end up optimized by
smart compilers due to all the arguments being compile-time constants).
* Added zlib to optional libraries.
* Added remaining optional plugins to buildinfo output.
* Changed formatting to be more human friendly.
* Add remaining optional libraries.
* Fix up formatting to be even more human friendly.
* Fix typo in buildinfo output.
* Remove unused variable.
* Fixed spelling of config.h option name.
* Update daemon/buildinfo.c
Co-authored-by: Markos Fountoulakis <44345837+mfundul@users.noreply.github.com>
* Fix option name mismatch for libcrypto.
* Update daemon/buildinfo.c
Co-authored-by: Markos Fountoulakis <44345837+mfundul@users.noreply.github.com>
Co-authored-by: Markos Fountoulakis <44345837+mfundul@users.noreply.github.com>
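The buildinfo approach described above (preprocessor directives that turn optional feature state into constants, followed by plain formatted output) can be sketched roughly as follows. This is a minimal illustration, not the contents of `daemon/buildinfo.c`: the macro names `ENABLE_EXAMPLE_ZLIB`, `ENABLE_EXAMPLE_JSONC`, `FEAT_ZLIB`, `FEAT_JSONC` and the function `example_buildinfo()` are hypothetical, and `snprintf()` is used instead of `printf()` only so the result is easy to inspect.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical config.h-style switches; a real build would get these
 * from the configure step rather than defining them here. */
#define ENABLE_EXAMPLE_ZLIB 1
/* ENABLE_EXAMPLE_JSONC is deliberately left undefined. */

/* Translate "is the macro defined?" into a compile-time string constant. */
#ifdef ENABLE_EXAMPLE_ZLIB
#define FEAT_ZLIB "YES"
#else
#define FEAT_ZLIB "NO"
#endif

#ifdef ENABLE_EXAMPLE_JSONC
#define FEAT_JSONC "YES"
#else
#define FEAT_JSONC "NO"
#endif

/* Format the report into buf. Every argument is a compile-time constant.
 * Returns what snprintf() returns. */
int example_buildinfo(char *buf, size_t len) {
    return snprintf(buf, len,
                    "Libraries:\n"
                    "    zlib:   %s\n"
                    "    JSON-C: %s\n",
                    FEAT_ZLIB, FEAT_JSONC);
}
```

Because the feature strings are fixed at compile time, a CI job could compare this output against the expected state for a given set of configure options.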
|
|
Added functionality to support composite charts
|
|
* fix lack of macOS RAM info in system-info.sh
* aligned spaces
|
|
* streams claimed_id of child nodes to parents
* adds this information into /api/v1/info
|
|
Fixed memory leak issues associated with the global GUID map during agent shutdown
|
|
Fix access using Unix sockets when Netdata is installed with kickstart-static64
|
|
Change the default home to be VARLIB_DIR instead of CACHE_DIR so that it is consistent with the installation. Override the default with the HOME var if set in the environment.
|
|
* Fix crash when receiving malformed labels via streaming.
* Disallow empty _virtualization values in system_info.
|
|
|
|
* Hard code a node for non-legacy multidb test
Skip dbengine initialization for new incoming children
Add code to switch to multidb ctx when accessing the dbengine
* When a non-legacy streaming connection is detected, use the multidb metadata log context
* Clear the superblock memory to avoid random data written in the metadata log
* Activate the host detection during compaction
Activate the host detection during metadata log chart updates
Keep the host in the user object during replay of the HOST command
* Add defaults for health / rrdpush on HOST metadata replay
Check for legacy status on host creation by checking is_archived and if not conclusive, call is_legacy_child()
Use defaults from the stream.conf
* Count hosts only if not archived
When host switches from archived to active update rrd_hosts_available
Remove archived hosts from charts and info
* Change parameter from "multidb disk space" to "dbengine multihost disk space"
Remove unused variables
Fix compilation error when dbengine is disabled
Fix condition for machine_guid directory creation under cache_dir
* Enable multidb disk space file creation.
* Stop deleting dimensions when rotating archived metrics if the dimension is active in a different database engine.
* Fix old bug in the code that confused obsolete hosts with orphan hosts.
* Do not delete multi-host DB host files.
* Discard dbengine state when a legacy memory mode instantiates to avoid inconsistencies.
* Identify metadata that collide with non-dbengine memory mode hosts and ignore them.
* Handle non-dbengine localhost with dbengine archived charts in localhost and streaming.
* Ignore archived hosts in streaming.
* Add documentation before merging to master.
Co-authored-by: Markos Fountoulakis <markos.fountoulakis.senior@gmail.com>
|
|
* Add lock dir
* Clean directory on startup
* Update environment variable name
* Fix file removal
* Add error message
* collectors/python.d: change lock file name
* collectors/python.d: add `nolock` cmd option
Co-authored-by: ilyam8 <ilya@netdata.cloud>
|
|
Bring new options to ebpf.plugin.
|
|
Implemented default disk space size calculation for multihost db (#9504)
|
|
This causes failure to detect virtualization to be reported as no
virtualization instead of unknown virtualization.
|
|
Fix unit test execution (`Cannot create unique machine id file`).
|
|
Move registry to do integration with spawn server.
|
|
* Replace all assert() calls with the new fatal_assert() for proper logging.
|
|
* Get netdata execution path early to avoid user permission issues
|
|
* Initial pass through docs
* Dash instead of slash
* To parent/child
* Child nodes
* Change diagrams
* Allowlist
* Fixes for Andrew
* Remove from build_external
* Change in proc
|
|
|
|
* Implemented collector metadata logging
* Added persistent GUIDs for charts and dimensions
* Added metadata log replay and automatic compaction
* Added detection of charts with no active collector (archived)
* Added new endpoint to report archived charts via `/api/v1/archivedcharts`
* Added support for collector metadata update
Co-authored-by: Markos Fountoulakis <44345837+mfundul@users.noreply.github.com>
|
|
Removed uses of the host lock that could deadlock senders and replaced with the new fine-grained mutex.
|
|
* Add a `ping` command to netdatacli to check if agent is alive.
This provides a way to trivially check if the agent itself appears to be
running (namely, the command parser for netdatacli in the agent itself
is working and responding), allowing users to check this without having
to rely on us continuing to have `help` be a command sent to the agent
instead of executing locally.
* Add a basic health check to our Docker images.
This adds a relatively basic health checker script to our Docker images.
By default it verifies that the `/api/v1/info` endpoint returns a 200
status code.
It also supports checking different endpoints or using `netdatacli ping`
to check that Netdata is running, all controlled by a new Docker
environment variable: `NETDATA_HEALTH_CHECK`.
* Avoid unnecessary `chmod` in Dockerfile.
Suggested by @prologic.
* Fix typo in docs.
* Update environment variable name to be more clear.
Also add `-L` to `curl` command in health check to follow redirects.
|
|
the system's disk capacity. (#7902)
* Remove trailing whitespace in system-info.sh.
* Fix handling of APFS on macOS.
APFS can have multiple volumes in a single partition, which means that
the same functional 'volume' can appear multiple times in the output of
`df`. Duplicate lines for such volumes will show the same total size and
available space along with a common prefix for the device name.
This updates the parsing logic for `df` on macOS to account for this by
deduplicating lines in the `df` output that have the same total size,
available space, and same normalized device name.
This has the potential to incorrectly under-account space in some cases,
but the likelihood of that happening is much less than the certainty of
overaccounting space on standard APFS configurations.
* Properly handle VirtIO block devices when using /sys.
The VirtIO Block device driver uses a dynamically allocated device major
number, meaning that we can't trivially match on it.
This updates the handling to properly look it up in `/proc/devices`
instead of just using the whole dynamic device number range.
* Add handling for NVMe block devices in sysfs code.
They use dynamic major numbers just like VirtIO Block devices do.
* Switch to device major discovery in /proc/devices for all device types.
This converts the code to use `/proc/devices` to look up correct device
major numbers for block devices that we treat as disks just like we are
already doing for those that have dynamically assigned numbers. This
makes the code both more robust and easier to understand and modify.
This also excludes some particularly old hardware that we were
originally looking for. If needed, we can add in the required device
names, but for now it's better to keep the list concise.
* Correct handling of device major discovery.
We need to strip leading whitespace before calling cut, not after.
* Only use /sys/block if we can read /proc/devices.
We use `/proc/devices` to do device number lookups that we then use to
filter devices under `/sys/block`. As a result, if we can't read
`/proc/devices`, then we won't actually parse anything out of
`/sys/block` either, so we need to just fall back to parsing `df`
output.
* Deduplicate `df` output by device name on Linux.
This ensures that we properly handle BTRFS subvolumes, counting each
actual volume only once.
* Use POSIX math expansion instead of awk to sum disk sizes.
This avoids the rather annoying habit of AWK of printing integers in
scientific notation instead of as exact values.
* Correct `sed` options for POSIX compliance.
* Fix disk info fetching for macOS.
POSIX tools, as found on macOS, lack a number of rather useful filtering
and sorting features, so we need to get rather creative with the
handling on macOS to make the disk space computation work correctly.
This unfortunately makes the calculation a bit less reliable than it
would have been had the existing calculations worked correctly, but it's
the best I can come up with without making things exponentially more
complicated.
* Properly handle sector size when using sysfs.
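The `/proc/devices` lookup described in this commit (finding the dynamically assigned major number for a block driver instead of matching a fixed range) can be sketched in C as below. The actual change lives in the `system-info.sh` shell script, so this is only an illustration of the idea: the function name `block_major_for()` is hypothetical, and the sample major numbers in the usage note are made up.

```c
#include <stdio.h>
#include <string.h>

/* Return the major number registered for `driver` in the "Block devices:"
 * section of /proc/devices-style text, or -1 if it is not listed there. */
int block_major_for(const char *text, const char *driver) {
    int in_block = 0;
    const char *p = text;

    while (*p) {
        /* Copy the current line into a bounded, NUL-terminated buffer. */
        const char *eol = strchr(p, '\n');
        size_t n = eol ? (size_t)(eol - p) : strlen(p);
        char line[128];
        if (n >= sizeof line) n = sizeof line - 1;
        memcpy(line, p, n);
        line[n] = '\0';

        if (strncmp(line, "Block devices:", 14) == 0) {
            in_block = 1;
        } else if (strncmp(line, "Character devices:", 18) == 0) {
            in_block = 0;
        } else if (in_block) {
            int major;
            char name[64];
            /* The leading space in the format skips the padding that
             * precedes short major numbers. */
            if (sscanf(line, " %d %63s", &major, name) == 2 &&
                strcmp(name, driver) == 0)
                return major;
        }
        p = eol ? eol + 1 : p + n;
    }
    return -1;
}
```

With input containing `253 virtblk` under `Block devices:`, `block_major_for(text, "virtblk")` yields 253. A caller that cannot read `/proc/devices` at all would fall back to parsing `df` output, as the commit describes.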
|
|
|
|
* Add options to daemon, clean up claiming
* Caught one more old conf
* Remove cloud docs
* I did a lot of things
* Rewrite tutorial step 3
* Remove my nodes menu, sync what-is-netdata
* Restore ACLK/claim/daemon docs to be handled by docs-go-live
* Fix up what-is-netdata
* More cleanup of README/what-is-netdata
* Restore daemon/config/README.md
* Fix frontmatter
* Change title, fix broken link
* Copyediting fixes
* Remove symbols
* Add a few more GIFs
* Fix hash
* Fix other hash
* Fix wording in web gui
* Address Andrew's and Jacek's comments
|
|
* Add support for spawning processes without pipes.
* Port health_alarm_execute() from mypopen() to netdata_spawn()
* Make alarm notifications asynchronous within a single health thread iteration
* Initial version of spawn server.
* preliminary integration of spawn client with health
|
|
|
|
|
|
documentation. (#8964)
|
|
|
|
* Restore docs from naughty PR
* Address Andrew's comments
* Ini to conf
* Changes based on meeting with Andrew
* Tweak text around claiming
* Some grammar/typo fixes
* Add /var/lib/netdata to Docker instructions on README
* Added a few more ACLK links per Chris
Co-authored-by: Joel Hans <joel@netdata.cloud>
|
|
This PR merges the feature-branch to make the cloud live. It contains the following work:
Co-authored-by: Andrew Moss <1043609+amoss@users.noreply.github.com>
Co-authored-by: Jacek Kolasa <jacek.kolasa@gmail.com>
Co-authored-by: Austin S. Hemmelgarn <austin@netdata.cloud>
Co-authored-by: James Mills <prologic@shortcircuit.net.au>
Co-authored-by: Markos Fountoulakis <44345837+mfundul@users.noreply.github.com>
Co-authored-by: Timotej S <6674623+underhood@users.noreply.github.com>
Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
* dashboard with new navbars, v1.0-alpha.9: PR #8478
* dashboard v1.0.11: netdata/dashboard#76
Co-authored-by: Jacek Kolasa <jacek.kolasa@gmail.com>
* Added installer code to bundle JSON-c if it's not present. PR #8836
Co-authored-by: James Mills <prologic@shortcircuit.net.au>
* Fix claiming config PR #8843
* Adds JSON-c as hard dep. for ACLK PR #8838
* Fix SSL renegotiation errors in old versions of openssl. PR #8840. Also - we have a transient problem with opensuse CI so this PR disables them with a commit from @prologic.
Co-authored-by: James Mills <prologic@shortcircuit.net.au>
* Fix claiming error handling PR #8850
* Added CI to verify JSON-C bundling code in installer PR #8853
* Make cloud-enabled flag in web/api/v1/info be independent of ACLK build success PR #8866
* Reduce ACLK_STABLE_TIMEOUT from 10 to 3 seconds PR #8871
* remove old-cloud related UI from old dashboard (accessible now via /old suffix) PR #8858
* dashboard v1.0.13 PR #8870
* dashboard v1.0.14 PR #8904
* Provide feedback on proxy setting changes PR #8895
* Change the name of the connect message to update during an ongoing session PR #8927
* Fetch active alarms from alarm_log PR #8944
|
|
|
|
* Fix parsing issue in system-info.sh.
Depending on the exact hardware it's run on, `lscpu` may or may not
report a maximum and/or minimum CPU frequency. We want to preferentially
match on the maximum if it's there because the regular CPU frequency
entry from `lscpu` shows the _current_ frequency most of the time, and
we want to report the 'intended' frequency for the CPU.
* Fix the check to see if we found a CPU frequency value.
* Actually fix parsing.
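The preference described in this commit (use the maximum CPU frequency from `lscpu` when it is reported, otherwise fall back to the regular frequency entry) can be sketched in C as below. The real fix is in the `system-info.sh` shell script, so this is only an illustration: the function names `field_value()` and `preferred_cpu_mhz()` are hypothetical, and the sketch assumes the `CPU max MHz:` and `CPU MHz:` labels appear at the start of a line.

```c
#include <stdlib.h>
#include <string.h>

/* Find a line that starts with `label` and parse the number that follows.
 * Returns 1 and stores the value on success, 0 otherwise. */
static int field_value(const char *text, const char *label, double *out) {
    size_t len = strlen(label);
    const char *p = text;

    while (p && *p) {
        if (strncmp(p, label, len) == 0) {
            const char *v = p + len;
            while (*v == ' ' || *v == '\t') v++;   /* skip column padding */
            if (*v != '\n' && *v != '\0') {
                char *end;
                double parsed = strtod(v, &end);
                if (end != v) {                    /* a number was present */
                    *out = parsed;
                    return 1;
                }
            }
        }
        p = strchr(p, '\n');                       /* advance to next line */
        if (p) p++;
    }
    return 0;
}

/* Prefer the maximum frequency; fall back to the current one.
 * Returns -1.0 when neither field has a usable value. */
double preferred_cpu_mhz(const char *lscpu_text) {
    double mhz;
    if (field_value(lscpu_text, "CPU max MHz:", &mhz)) return mhz;
    if (field_value(lscpu_text, "CPU MHz:", &mhz)) return mhz;
    return -1.0;
}
```

The explicit "did we find a value" return code mirrors the second bullet of the commit: a missing field is distinguished from a parsed one rather than inferred from the value.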
|
|
* Trying out some absolute-ish links
* Try one out on installer
* Testing logic
* Trying out some more links
* Fixing links
* Fix links in python collectors
* Changed a bunch more links
* Fix build errors
* Another push of links
* Fix build error and add more links
* Complete first pass
* Fix final broken links
* Fix links to files
* Fix for Netlify
* Two more fixes
|
|
This reverts commit e2874320fc027f7ab51ab3e115d5b1889b8fd747.
|
|
|
|
* Change MacOS to macOS
* Change Mac as noun to macOS system
|
|
* Fix the Prometheus web API code in the exporting engine
* Rename connector types
* Remove the conditional compilation of the exporting engine
* Use labels instead of tags
* Fix the exporter configuration
* Document functions
* Add unit tests
|
|
* tls13: This commit brings TLS 1.3 to Netdata
* tls13: Update variables on slave side
* tls13: Fix compilation error for old libraries
* tls13: Fix compilation error for old libraries 2
* tls13 remove ciphers
* tls13: TLS versions
This commit brings the missing TLS versions accepted for Netdata
and it also brings documentation update related to these versions
* tls13: Remove duplication
This commit removes wrong duplication of code
* tls13: Documentation
This commit brings fix for the documentation
* tls13: Remove magic number
This commit removes the magic number to allow the code to be readable
* tls13: TLS version
Small adjustment to the TLS version
* tls13: Security Init
This commit removes the array from the function and overwrites the magic number
with a string
* tls13: Remove new variable name from stream
* tls13: OpenSSL versions and old key name
This commit removes the new key names and also update the names
used to define openssl version
|
|
Preparing for the cloud release. This changes how we handle the feature flag so that it no longer requires installer switches and can be set from the config file. This still requires internal access to use and is not ready for public access yet.
|
|
The default cloud url has been updated to app.netdata.cloud ready for the release. The claiming process now checks the current user executing claiming and refuses to perform the claim for the wrong user. If the current UID is 0 then claiming proceeds but the file ownership is adjusted to be the correct netdata user. The default expected user is `netdata` unless the script can identify the user from the current configuration. After the claiming script is executed the CLI is used to reload the claiming state.
|
|
Reports ACLK build failures to GA (if the user didn't opt-out)
|
|
Improved the stability of the ACLK
|
|
* Fix flushing error threshold to account for inactive producers of dbengine instance
* Disable page invalidation during shutdown of dbengine
* Fix crash during netdata shutdown if command server has failed to initialize
* Add fallback for uv_listen to retry with backlog = 1 on failure.
* Adhere to the API change of libuv v1.35
|
|
* Disallow multiple streaming connections to the same master agent
* Reject multiple streaming connections quickly without blocking
* Increase timeout for systemd service shutdown to give time to flush the db.
* Optimize page correlation ID to use atomic counter instead of locks
* Reduce contention in global configuration mutex
* Optimize complexity of inserting configuration sections from O(N) to O(1)
* Reduce overhead of clock_gettime() by utilizing CLOCK_MONOTONIC_COARSE when applicable.
* Fix unit test compile errors
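The "atomic counter instead of locks" change in the list above can be illustrated with C11 atomics. This is a minimal sketch of the general technique, not the code from this commit: the names `example_correlation_counter` and `next_correlation_id()` are hypothetical, and the commit may have used compiler builtins rather than `<stdatomic.h>`.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Shared counter with static storage, so it starts at zero. */
static atomic_uint_fast64_t example_correlation_counter;

/* Hand out unique, increasing IDs without taking a mutex. Relaxed ordering
 * is enough here because the ID is not used to publish other data. */
uint64_t next_correlation_id(void) {
    return (uint64_t)atomic_fetch_add_explicit(&example_correlation_counter,
                                               1, memory_order_relaxed) + 1;
}
```

Each caller gets a distinct value even under concurrency, because the fetch-and-add is a single atomic read-modify-write operation.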
|
|
* Bulk add frontmatter
* A few extra edge cases
|