netdata - Mirror of https://github.com/netdata/netdata

Age	Commit message	Author
2022-08-13	Remove extra U from log message (#13514)	Nicholas Chambers

2022-08-11	fix(packaging): add CAP_NET_ADMIN for go.d.plugin (#13507)	Ilya Mashchenko

2022-07-22	include Judy into our source tree (#13362)	Timotej S

2022-06-30	Remove obsolete --use-system-lws option from netdata-installer.sh help (#13272)	Dim-P
2022-06-23	chore(netdata-installer): remove a call to 'cleanup_old_netdata_updater()' because it no longer exists (#13189)	Ilya Mashchenko

2022-05-03	Rename --install option for kickstart.sh (#12798)	maneamarius
	* replace --install flag with --install-prefix and update documentation * fix * minor fix
2022-05-03	Remove node.d.plugin and relevant files (#12769)	Suraj Neupane
	* Remove node.d.plugin and relevant files * fix build packages * remove node.d related words/phrases from docs and tests
2022-05-02	Add `-pipe` to CFLAGS in most cases for builds. (#12709)	Austin S. Hemmelgarn
	* Add `-pipe` to CFLAGS in most cases for builds. This trades marginally higher memory usage at build time (on the order of a few hundred kB in the worst-case scenario) for improved build times by avoiding temporary files when passing data from the compiler to the commands it invokes. * Suppress bogus shellcheck warnings. * Fix handling of CFLAGS in netdata-installer.sh.
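A minimal POSIX sh sketch of the idea, assuming a hypothetical helper `add_pipe_flag` (not the installer's actual code) that appends `-pipe` to `CFLAGS` only when the user has not already set it:

```shell
#!/bin/sh
# Hypothetical helper: append -pipe to CFLAGS unless it is already present.
# -pipe tells the compiler to use pipes rather than temporary files between
# compilation stages, at the cost of slightly higher memory use.
add_pipe_flag() {
  case " ${CFLAGS} " in
    *" -pipe "*) ;;                               # already set: leave CFLAGS alone
    *) CFLAGS="${CFLAGS:+${CFLAGS} }-pipe" ;;     # append without a leading space
  esac
  export CFLAGS
}
```

Checking for the flag first keeps user-supplied CFLAGS authoritative and avoids duplicating the option on repeated runs.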
2022-05-02	just a simple fix to avoid recompiling protobuf all the time (#12790)	Costa Tsaousis
	* just a simple fix to avoid recompiling protobuf all the time on our development environments * added quotes * remove bashism
2022-05-02	use '%b' argument when printing deferred errors (#12786)	Ilya Mashchenko

2022-04-28	Update netdata-installer.sh (#12755)	Pete Cooper
	installer help list formatting uniformity
2022-04-27	Correctly propagate errors and warnings up to the kickstart script from scripts it calls. (#12686)	Austin S. Hemmelgarn
	* Overhaul deferred error handling in netdata-installer.sh * Propagate errors from netdata-installer.sh properly to the kickstart script. * Propagate errors from netdata-updater.sh properly to the kickstart script. * Overhaul logging in uninstaller and integrate with error propagation in kickstart.sh. * Fix name of variable used for propagating warnings. * Fix handling of run_ok and run_failed with no arguments. * Fix environment file validation in updater. * Add debugging info to CI. * Properly accept empty NETDATA_PREFIX in updater. * Convert remaining unguarded unsuccessful exits in updater to use fatal. * Fix usage of `env` in updater.
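The deferred-error pattern referred to here (and in the '%b' entry above) can be sketched as follows. The helper and variable names `defer_error`, `print_deferred_errors` and `DEFERRED_ERRORS` are hypothetical, not the installer's own: non-fatal errors are collected during the run and reported once at the end, with `printf '%b'` expanding the `\n` separators.

```shell
#!/bin/sh
# Hypothetical sketch of deferred error reporting; names are illustrative.
DEFERRED_ERRORS=""

# Record a non-fatal error to be reported at the end of the run.
# Inside double quotes "\n" stays a literal backslash-n; printf expands it later.
defer_error() {
  DEFERRED_ERRORS="${DEFERRED_ERRORS}\n* ${1}"
}

# Print all deferred errors to stderr and return non-zero if there were any.
# '%b' expands the "\n" separators; passing the text as an argument rather than
# as the format string means a '%' inside a message cannot confuse printf.
print_deferred_errors() {
  if [ -n "${DEFERRED_ERRORS}" ]; then
    printf '%b\n' "The following non-fatal errors were encountered:${DEFERRED_ERRORS}" >&2
    return 1
  fi
  return 0
}
```

A non-zero return from the final report is one way a calling script such as kickstart.sh could notice that something went wrong without aborting the install midway.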
2022-04-18	Disable automake dependency tracking in our various one-time builds. (#12701)	Austin S. Hemmelgarn
	* Disable automake dependency tracking in our various one-time builds. * Also disable dependency tracking code in package builds.
2022-03-31	Fix Build on MacOS (#12554)	Timotej S

2022-03-22	Switch to using netdata-updater.sh to toggle auto updates on and off when installing. (#12296)	Austin S. Hemmelgarn
	* Switch to using netdata-updater.sh to toggle auto updates on and off on install. * Apply suggestions from code review Co-authored-by: Tina Luedtke <kickoke@users.noreply.github.com> * Make heading name less ambiguous. * Fix fallback case for unsupported updater script. * Fix invalid function name. Co-authored-by: Tina Luedtke <kickoke@users.noreply.github.com>
2022-03-15	Remove backends subsystem (#12146)	Vladimir Kobal

2022-03-14	Remove owner check from webserver (#12339)	thiagoftsm

2022-03-09	Final migration of release code off of Travis CI. (#12239)	Austin S. Hemmelgarn
	* Initial version of release workflow. * Remove release code from Travis config. Final removal of the Travis CI code will be handled separately. * Do not push changes if not running in GHA. This simplifies testing the core logic locally. * Remove remote branch existence checking. It is not strictly necessary for the expected execution context, and it makes it harder to test locally safely. * Fixed some minor oversights. * Make git config repo local to make testing easier.
2022-03-08	CO-RE and syscalls (#12318)	thiagoftsm

2022-02-25	eBPF installation fixes (#12242)	thiagoftsm

2022-02-23	Add `--no-same-owner` to `tar -xf` in installer (#11940)	Christian Mäder
	This prevents errors like the following: `tar: protobuf-3.17.3: Cannot change ownership to uid 576694, gid 89939: Invalid argument`
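A minimal sketch of extracting a bundled dependency this way, assuming a hypothetical helper `extract_tarball` (not the installer's own function):

```shell
#!/bin/sh
# Hypothetical helper: extract a dependency tarball into a destination directory.
# When run as root, tar by default tries to restore the uid/gid recorded in the
# archive; where that uid/gid cannot be applied, extraction fails with
# "Cannot change ownership ... Invalid argument".
# --no-same-owner makes the extracted files belong to the invoking user instead.
extract_tarball() {
  archive="$1"
  dest="$2"
  mkdir -p "${dest}" || return 1
  tar --no-same-owner -xf "${archive}" -C "${dest}"
}
```

Usage: `extract_tarball protobuf-3.17.3.tar.gz build/` (paths illustrative). Since the installer only needs readable sources to build from, discarding the archive's original ownership loses nothing.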
2022-02-22	remove deprecated node.d modules (#12047)	Ilya Mashchenko

2022-02-18	Revert "Overhaul handling of auto-updates in the installer code. (#12076)" (#12182)	Austin S. Hemmelgarn
	This reverts commit da7f215ad6c98cbf54ab93dbc1d2457ac01dbb08.
2022-02-18	Overhaul handling of auto-updates in the installer code. (#12076)	Austin S. Hemmelgarn
	* Bundle updater script in native packages. * Move code for enabling/disabling auto-updates to netdata-updater.sh. This lets us handle the logic sanely from the kickstart script regardless of the install method, and allows users to more reliably toggle auto-updates themselves without having to understand what is being done. * Add proper case-agnosticism to auto-update type selection. * Move auto-updater handling code to kickstart script. * Properly handle running against an older source tree. * First part of updater documentation updates. * Fixed handling of updater in DEB packages. * Further documentation updates. * Minor typo fixes.
2022-02-17	tidy up the installer script usage message (#12171)	Pete Cooper

2022-02-15	rename DO_NOT_TRACK to DISABLE_TELEMETRY (#12126)	Ilya Mashchenko

2022-02-08	update scripts for POSIX compatibility (#11961)	maneamarius

2022-02-02	Fix compilation errors for OpenSSL on macOS (#12048)	Vladimir Kobal
	Co-authored-by: Ilya Mashchenko <ilya@netdata.cloud>
2022-01-31	fix typo, tidy up sentence (#12062)	Pete Cooper

2022-01-10	Update dependencies for the pubsub exporting connector (#11872)	Vladimir Kobal

2022-01-04	Removes ACLK Legacy (#11841)	Timotej S
	* remove legacy from makefiles * remove ACLK Legacy from installer * remove ACLK Legacy from configure.ac * remove legacy from cmake * aclk api cleanup * remove legacy files from packaging * changes for CI from Austin
2021-11-04	Sanely handle installing on systems with limited RAM. (#11658)	Austin S. Hemmelgarn
	* Add check for available RAM prior to installing. * Properly mark required shell for kickstart.sh It needs bash, and we only ever run it with bash, so make the shebang line indicate bash. This resolves a bunch of shellcheck complaints. * Add installation failure reporting statistics. The new event is called INSTALL_FAILED, and mostly mirrors a subset of the existing properties on the various agent events. * Fix error messages. * Fix checksums. * Apply suggestions from code review Co-authored-by: David Shreve, Jr. <david@netdata.cloud> * Fix kickstart checksums. * Fix memory calculations. They were off by a factor of 1024. * Fix kernel name handling. * Fix checksums. * Fix core count accounting. * Add a CLI option to skip checking for available RAM. * Addressed review feedback. * Update netdata-installer.sh Co-authored-by: Ilya Mashchenko <ilya@netdata.cloud> * Add RAM checking for ML builds. * Update packaging/installer/kickstart.sh Co-authored-by: Ilya Mashchenko <ilya@netdata.cloud> * Update packaging/installer/kickstart.sh Co-authored-by: Ilya Mashchenko <ilya@netdata.cloud> * Fix kickstart checksum. * Re-adjust memory limiting based on discussion from meeting. * Switch from using SI units to IEC units. * Fix typo. Co-authored-by: David Shreve, Jr. <david@netdata.cloud> Co-authored-by: Ilya Mashchenko <ilya@netdata.cloud>
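A hedged sketch of a pre-install RAM check using IEC units. The function names are hypothetical, the threshold is left to the caller, and reading `MemTotal` is an assumption (the entry does not say which `/proc/meminfo` field the installer uses). The values there are labelled "kB" but are KiB, so a single division by 1024 yields MiB; dividing by the wrong power of 1024 is off by exactly a factor of 1024.

```shell
#!/bin/sh
# Hypothetical sketch of a RAM check before building (Linux only).

# Convert a KiB value to MiB (integer, rounded down).
kib_to_mib() {
  echo $(( $1 / 1024 ))
}

# Print total RAM in MiB, or fail if /proc/meminfo is unavailable.
total_ram_mib() {
  [ -r /proc/meminfo ] || return 1
  kib="$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)"
  [ -n "${kib}" ] || return 1
  kib_to_mib "${kib}"
}

# Succeed if at least $1 MiB of RAM is present.
# Assumption: if RAM cannot be determined, do not block the install.
have_enough_ram() {
  need_mib="$1"
  have_mib="$(total_ram_mib)" || return 0
  [ "${have_mib}" -ge "${need_mib}" ]
}
```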
2021-11-04	Add basic telemetry to the new kickstart script. (#11718)	Austin S. Hemmelgarn
	* Add basic telemetry to the new kickstart script. * Properly handle install type info for telemetry events. * Actually remove exit trap at end of script. * Update packaging/installer/kickstart-ng.sh Co-authored-by: Ilya Mashchenko <ilya@netdata.cloud> * Fix handling of memory info on macOS. * Addressed feedback from @ilyam8. * Fix issues pointed out in code review. * Update packaging/installer/kickstart-ng.sh Co-authored-by: Ilya Mashchenko <ilya@netdata.cloud> * Add install prefix to search path when looking for existing installs. * Make variable names more sensible. * Fix install prefix handling in new kickstart script. * Fix kernel name handling in telemetry function. * More generically strip final `/usr` from path when looking for existing install. * Update packaging/installer/kickstart-ng.sh Co-authored-by: Ilya Mashchenko <ilya@netdata.cloud> Co-authored-by: Ilya Mashchenko <ilya@netdata.cloud>
2021-10-28	Add log grouping in installer and static build code when running under GitHub Actions. (#11720)	Austin S. Hemmelgarn
	* Add log grouping in installer code when running under GitHub Actions. This will make our CI logs much easier to understand. * Add log grouping to static build process. * Use oneliner style group commands in netdata-installer.sh
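A minimal sketch of such log grouping, assuming hypothetical helpers `group_start` and `group_end`. It relies on two documented GitHub Actions behaviours: runners set `GITHUB_ACTIONS=true`, and output between `::group::<title>` and `::endgroup::` is folded into a collapsible section.

```shell
#!/bin/sh
# Hypothetical helpers for collapsible log groups under GitHub Actions.
# Outside of Actions they print nothing, so local build logs are unchanged.
group_start() {
  [ "${GITHUB_ACTIONS}" = "true" ] && echo "::group::${1}"
  return 0
}

group_end() {
  [ "${GITHUB_ACTIONS}" = "true" ] && echo "::endgroup::"
  return 0
}
```

Usage: `group_start "Building bundled libwebsockets"; make; group_end`. The explicit `return 0` keeps the helpers from failing a `set -e` script when not running in CI.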
2021-10-27	Anomaly Detection MVP (#11548)	vkalintiris
	* Add support for feature extraction and K-Means clustering. This patch adds support for performing feature extraction and running the K-Means clustering algorithm on the extracted features. We use the open-source dlib library to compute the K-Means clustering centers, which has been added as a new git submodule. The build system has been updated to recognize two new options: 1) --enable-ml: build an agent with ml functionality, and 2) --enable-ml-tests: support running tests with the `-W mltest` option in netdata. The second flag is meant only for internal use. To build tests successfully, you need to install the GoogleTest framework on your machine. * Boilerplate code to track hosts/dims and init ML config options. A new opaque pointer field is added to the database's host and dimension data structures. The fields point to C++ wrapper classes that will be used to store ML-related information in follow-up patches. The ML functionality needs to iterate all tracked dimensions twice per second. To avoid locking the entire DB multiple times, we use a separate dictionary to add/remove dimensions as they are created/deleted by the database. A global configuration object is initialized during the startup of the agent. It will allow our users to specify ML-related configuration options, eg. hosts/charts to skip from training, etc. * Add support for training and prediction of dimensions. Every new host spawns a training thread which is used to train the model of each dimension. Training of dimensions is done in a non-batching mode in order to avoid impacting the generated ML model by the CPU, RAM and disk utilization of the training code itself. For performance reasons, prediction is done at the time a new value is pushed in the database. The alternative option, ie. maintaining a separate thread for prediction, would be ~3-4x times slower and would increase locking contention considerably. 
For similar reasons, we use a custom function to unpack storage_numbers into doubles, instead of long doubles. * Add data structures required by the anomaly detector. This patch adds two data structures that will be used by the anomaly detector in follow-up patches. The first data structure is a circular bit buffer which is used to count the number of set bits over time. The second data structure represents an expandable, rolling window that tracks set/unset bits. It is explicitly modeled as a finite-state machine in order to make the anomaly detector's behaviour easier to test and reason about. * Add anomaly detection thread. This patch creates a new anomaly detection thread per host. Each thread maintains a BitRateWindow which is updated every second based on the anomaly status of the corresponding host. Based on the updated status of the anomaly window, we can identify the existence/absence of an anomaly event, its start/end time and the dimensions that participate in it. * Create/insert/query anomaly events from Sqlite DB. * Create anomaly event endpoints. This patch adds two endpoints to expose information about anomaly events. The first endpoint returns the list of anomalous events within a specified time range. The second endpoint provides detailed information about a single anomaly event, ie. the list of anomalous dimensions in that event along with their anomaly rate. The `anomaly-bit` option has been added to the `/data` endpoint in order to allow users to get the anomaly status of individual dimensions per second. * Fix build failures on Ubuntu 16.04 & CentOS 7. These distros do not have toolchains with C++11 enabled by default. Replacing nullptr with NULL should fix the build problems on these platforms when the ML feature is not enabled. * Fix `make dist` to include ML makefiles and dlib sources. Currently, we add ml/kmeans/dlib to EXTRA_DIST.
We might want to generate an explicit list of source files in the future, in order to bring down the generated archive's file size. * Small changes to make the LGTM & Codacy bots happy. - Cast unused result of function calls to void. - Pass a const-ref string to Database's constructor. - Reduce the scope of a local variable in the anomaly detector. * Add user configuration option to enable/disable anomaly detection. * Do not log dimension-specific operations. Training and prediction operations happen every second for each dimension. In prep for making this PR easier to run anomaly detection for many charts & dimensions, I've removed logs that would cause log flooding. * Reset dimensions' bit counter when not above anomaly rate threshold. * Update the default config options with real values. With this patch the default configuration options will match the ones we want our users to use by default. * Update conditions for creating new ML dimensions. 1. Skip dimensions with update_every != 1, 2. Skip dimensions that come from the ML charts. With this filtering in place, any configuration value for the relevant simple_pattern expressions will work correctly. * Teach buildinfo{,json} about the ML feature. * Set --enable-ml by default in the configuration options. This patch is only meant for testing the building of the ML functionality on Github. It will be reverted once tests pass successfully. * Minor build system fixes. - Add path to json header - Enable C++ linker when ML functionality is enabled - Rename ml/ml-dummy.cc to ml/ml-dummy.c * Revert "Set --enable-ml by default in the configuration options." This reverts commit 28206952a59a577675c86194f2590ec63b60506c. We pass all Github checks when building the ML functionality, except for those that run on CentOS 7 due to not having a C++11 toolchain. * Check for missing dlib and nlohmann files. We simply check the single-source files upon which our build system depends. 
If they are missing, an error message notifies the user about missing git submodules which are required for the ML functionality. * Allow users to specify the maximum number of KMeans iterations. * Use dlib v19.10. v19.22 broke compatibility with CentOS 7's g++. Development of the anomaly detection used v19.10, which is the version used by most Debian and Ubuntu distribution versions that are not past EOL. No observable performance improvements/regressions specific to the K-Means algorithm occur between the two versions. * Detect and use the -std=c++11 flag when building anomaly detection. This patch automatically adds the -std=c++11 flag when building netdata with the ML functionality, if it's supported by the user's toolchain. With this change we are able to build the agent correctly on CentOS 7. * Restructure configuration options. - update default values, - clamp values to min/max defaults, - validate and identify conflicting values. * Add update_every configuration option. Considering that the MVP does not support per-host configuration options, the update_every option will be used to filter hosts to train. With this change anomaly detection will be supported on: - Single nodes with update_every != 1, and - Children nodes with a common update_every value that might differ from the value of the parent node. * Reorganize anomaly detection charts. This follows Andrew's suggestion to have four charts to show the number of anomalous/normal dimensions, the anomaly rate, the detector's window length, and the events that occur in the prediction step. Context and family values, along with the necessary information in the dashboard_info.js file, will be updated in a follow-up commit. * Do not dump anomaly event info in logs. * Automatically handle low "train every secs" configuration values. If a user specifies a very low value for the "train every secs", then it is possible that the time it takes to train a dimension is higher than its allotted time.
In that case, we want the training thread to: - Reduce its CPU usage per second, and - Allow the prediction thread to proceed. We achieve this by limiting the training time of a single dimension to be equal to half the time allotted to it. This means that the training thread will never consume more than 50% of a single core. * Automatically detect if ML functionality should be enabled. With these changes, we enable ML if: - The user has not explicitly specified --disable-ml, and - Git submodules have been checked out properly, and - The toolchain supports C++11. If the user has explicitly specified --enable-ml, the build fails if git submodules are missing, or the toolchain does not support C++11. * Disable anomaly detection by default. * Do not update charts in locked region. * Clean up code reading configuration options. * Enable C++ linker when building ML. * Disable ML functionality for CMake builds. * Skip LGTM for dlib and nlohmann libraries. * Do not build ML if libuuid is missing. * Fix dlib path in LGTM's yaml config file. * Add chart to track duration of prediction step. * Add chart to track duration of training step. * Limit the number of dimensions in an anomaly event. This will ensure our JSON results won't grow without any limit. The default ML configuration options train approximately 1700 dimensions in a newly-installed Netdata agent. The hard limit is set to 2000 dimensions which: - Is well above the default number of dimensions we train, - If it is ever reached it means that the user accidentally set a very low anomaly rate threshold, and - Considering that we sort the result by anomaly score, the cutoff dimensions will be the least anomalous, ie. the least important to investigate. * Add information about the ML charts. * Update family value in ML charts. This fix will allow us to show the individual charts in the RHS Anomaly Detection submenu. * Rename chart type s/anomalydetection/anomaly_detection/g * Expose ML feat in /info endpoint.
* Export ML config through /info endpoint. * Fix CentOS 7 build. * Reduce the critical region of a host's lock. Before this change, each host had a single, dedicated lock to protect its map of dimensions from adding/deleting new dimensions while training and detecting anomalies. This was problematic because training of a single dimension can take several seconds in nodes that are under heavy load. After this change, the host's lock protects only the insertion/deletion of new dimensions, and the prediction step. For the training of dimensions we use a dedicated lock per dimension, which is responsible for protecting the dimension from deletion while training. Prediction is fast enough, even on slow machines or under heavy load, which allows us to use the host's main lock and avoid increasing the complexity of our implementation in the anomaly detector. * Improve the way we are tracking the anomaly detector's performance. This change allows us to: - track the total training time per update_every period, - track the maximum training time of a single dimension per update_every period, and - export the current number of total, anomalous, normal dimensions to the /info endpoint. Also, now that we use dedicated locks per dimension, we can train under heavy load continuously without having to sleep in order to yield the training thread and allow the prediction thread to progress. * Use samples instead of seconds in ML configuration. This commit changes the way we are handling input ML configuration options from the user. Instead of treating values as seconds, we interpret all inputs as a number of update_every periods. This allows us to enable anomaly detection on hosts that have update_every != 1 second, and still produce a model for training/prediction & detection that behaves in an expected way. Tested by running anomaly detection on an agent with update_every = [1, 2, 4] seconds. * Remove unnecessary log message in detection thread * Move ML configuration to global section.
* Update web/gui/dashboard_info.js Co-authored-by: Andrew Maguire <andrewm4894@gmail.com> * Fix typo Co-authored-by: Andrew Maguire <andrewm4894@gmail.com> * Rebase. * Use negative logic for anomaly bit. * Add info for prediction_stats and training_stats charts. * Disable ML on PPC64EL. The CI test fails with -std=c++11 and requires -std=gnu++11 instead. However, it's not easy to quickly append the required flag to CXXFLAGS. For the time being, simply disable ML on PPC64EL and if any users require this functionality we can fix it in the future. * Add comment on why we disable ML on PPC64EL. Co-authored-by: Andrew Maguire <andrewm4894@gmail.com>
2021-10-20	New eBPF and libbpf releases (#11680)	thiagoftsm

2021-10-12	Added static builds for ARMv7l and ARMv8a (#11490)	Austin S. Hemmelgarn
	* Generic fixes for cross-arch static image builds. * Fixed handling of ARM static builds. * Add ARMv7l and ARMv8a static builds. * Fix static build deps. * Fix static build checks. * Bump OpenSSL version and optimize OpenSSL build. * Optimize bash build. * Bump cURL version and optimize cURL build. * Fix static build deps. * Fix bash build. * Further build fixes. * Fix cURL build. * Fix emulation handling.
2021-09-23	Add option for netdata-installer to allow compilation only on X86_64 (#11566)	thiagoftsm

2021-09-21	fix installer flag --use-system-protobuf (#11539)	Timotej S

2021-09-20	Update libbpf (#11480)	thiagoftsm

2021-09-15	Remove unused script (#11516)	thiagoftsm

2021-09-14	Allow arbitrary options to be passed to make from netdata-installer.sh. (#11479)	Austin S. Hemmelgarn
	* Allow arbitrary options to be passed to make from netdata-installer.sh. This is mostly intended to allow users to override the number of parallel jobs used during the build process. It’s been made generic as a handful of other options (most notably `--load-average`) are also potentially useful and because the generic behavior is consistent with how most source-based distros handle this type of thing. * Fix pattern checking of MAKEOPTS.
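A hedged sketch of honouring `MAKEOPTS`. The helper `build_make_opts` and the rule "add a default `-j` only if the user did not pick a job count" are assumptions for illustration, not the installer's documented behaviour:

```shell
#!/bin/sh
# Hypothetical sketch: pass user-supplied MAKEOPTS through to make, adding a
# default parallelism flag only when the user has not chosen a job count.
build_make_opts() {
  jobs="$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)"
  case " ${MAKEOPTS} " in
    *" -j"*|*" --jobs"*) echo "${MAKEOPTS}" ;;        # user picked a job count
    *) echo "${MAKEOPTS:+${MAKEOPTS} }-j${jobs}" ;;    # append a default
  esac
}
```

Usage: `MAKEOPTS="-j2 --load-average=4" ./netdata-installer.sh`, with the build step calling `make $(build_make_opts)` unquoted so that the options split into separate arguments. Padding the value with spaces before matching avoids false hits on options that merely contain `-j` mid-word.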
2021-08-25	Don’t bail early if we fail to build cloud deps with required cloud. (#11446)	Austin S. Hemmelgarn
	* Don’t bail early if we fail to build cloud deps with required cloud. Due to having two ACLK implementations, failing to build any arbitrary set of cloud dependencies is not guaranteed to prevent us from building a working cloud implementation, so we should fall back to relying on the configure script for determining if the cloud can be built. * add check in case both ACLKs fail Co-authored-by: Timotej Šiškovič <timotej@netdata.cloud>
2021-08-24	Check for failed protobuf configure or make (#11450)	Emmanuel Vasilakis
	* check for failed protobuf configure or make * revert error instead of warn
2021-08-16	Added support for bundling protobuf as part of the install. (#11374)	Austin S. Hemmelgarn
	* Add support for bundling protobuf as part of the install. * Fix typo. * Fix `make dist`. * Fix handling of protobuf usage. * Add explicit check for ACLK-NG in builds. * only protos in dist from aclk-schemas Co-authored-by: Timotej Šiškovič <timotej@netdata.cloud>
2021-08-12	Default to not using LTO for builds. (#11432)	Austin S. Hemmelgarn
	* Default to not using LTO for builds. This significantly speeds up the build process and avoids strange linking errors. * Update installer help text.
2021-07-29	Update handling of builds of bundled dependencies. (#11375)	Austin S. Hemmelgarn
	This adds parallelization of the builds of the bundled dependencies (this provides a few percent build time reduction on slower systems), and switches them all to use the same version of `make` that Netdata will be built with (this changes nothing on most systems, but means that we will now uniformly use gmake on BSD systems).
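A minimal sketch of choosing one `make` implementation and reusing it for every build, assuming a hypothetical helper `pick_make`; how the installer actually detects the make command is not shown in this entry:

```shell
#!/bin/sh
# Hypothetical sketch: select a single make implementation for all builds.
# On BSD systems the base "make" is BSD make, while Makefiles relying on GNU
# extensions need GNU make, usually installed as "gmake". Preferring gmake
# whenever it exists keeps bundled dependencies and Netdata on the same make.
pick_make() {
  if command -v gmake >/dev/null 2>&1; then
    echo gmake
  else
    echo make
  fi
}
```

Usage: `MAKE="$(pick_make)"`, then invoke `"${MAKE}"` for each bundled dependency and for Netdata itself.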
2021-06-14	Allows ACLK NG and Legacy to coexist (#11225)	Timotej S

2021-06-01	Compile/Link with absolute paths for bundled/vendored deps. (#11129)	vkalintiris
	* Do not accept a path when using --with-bundled-lws. The bundled library is always placed under externaldeps/libwebsockets, when using the netdata-installer.sh script. When this option is missing, we look for the system-wide installed version. * Do not accept a path when using --with-bundled-libJudy. The bundled library is always placed under externaldeps/libJudy. When the option is not given, we look for the system-wide installed version. * Use absolute header paths for repo-internal deps. * Use absolute library paths for repo-internal deps.
2021-05-24	Store info about the installation type for later retrieval. (#11157)	Austin S. Hemmelgarn
	* Store info about the installation type for later retrieval. * Properly handle install type on updates. * Restructure install type values for easier parsing. * Fix checksums. * Fix .gitignore check.