Age | Commit message | Author |
|
* Moving the cloud docs under /docs/cloud (previous location: netdata/learn/*)
* Added metadata on almost every document of the old learn site for the new ingest process of learn.
* Map old learn document to their best fit as topic related docs.
Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud>
Co-authored-by: DShreve2 <david@netdata.cloud>
Co-authored-by: hugovalente-pm <hugo@netdata.cloud>
|
|
* set web client to poll when ssl error want read or write
* turn to function
|
|
|
|
* fix(proc.plugin): add cpu label to per core util% charts
* fix codeql warning
|
|
|
|
|
|
* cache 100 pages for each size our tiers need
* smarter page caching
* account the caching structures
* dynamic max number of cached pages
* make variables const to ensure they are not changed
* make sure replication timestamps do not go to the future
* replication now sends chart and dimension states atomically; replication receivers ignore chart and dimension states when rbegin is also ignored
* make sure all pages are flushed on shutdown
* take into account empty points too
* when recalculating retention update first_time_s on metrics only when they are bigger
* Report the datafile number we use to recalculate retention
* Report the datafile number we use to recalculate retention
* rotate db at startup
* make query plans overlap
* Properly calculate first_time_s
* updated event labels
* negative page caching fix
* Attempt to create missing tables on query failure
* Attempt to create missing tables on query failure (part 2)
* negative page caching for all gaps, to eliminate jv2 scans
* Fix unittest
Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
|
|
|
|
* run cleanup in workers
* when there is a discrepancy between update every, fix it
* fix the other occurrences of metric update every mismatch
* allow resetting the same timestamp
* validate flushed pages before committing them to disk
* initialize collection with the latest time in mrg
* these should be static functions
* acquire metrics for writing to detect multiple data collections of the same metric
* print the uuid of the metric that is collected twice
* log the discrepancies of completed pages
* 1 second tolerance
* unify validation of pages and related logging across dbengine
* make do_flush_pages() thread safe
* flush pages runs on libuv workers
* added uv events to tp workers
* don't cross datafile spinlock and rwlock
* should be unlock
* prevent the creation of multiple datafiles
* break an infinite replication loop
* do not log the expansion of the replication window due to start streaming
* log all invalid pages with internal checks
* do not shutdown event loop threads
* add information about collected page events, to find the root cause of invalid collected pages
* rewrite of the gap filling to fix the invalid collected pages problem
* handle multiple collections of the same metric gracefully
* added log about main cache page conflicts; fix gap filling once again...
* keep track of the first metric writer
* it should be an internal fatal - it does not harm users
* do not check for future timestamps on collected pages, since we inherit the clock of the children; do not check collected pages validity without internal checks
* prevent negative replication completion percentage
* internal error for the discrepancy of mrg
* better logging of dbengine new metrics collection
* without internal checks it is unused
* prevent pluginsd crash on exit due to calling pthread_cancel() on an exited thread
* renames and atomics everywhere
* if a datafile cannot be acquired for deletion during shutdown, continue - this can happen when there are hot pages in open cache referencing it
* Debug for context load
* rrdcontext uuid debug
* rrddim uuid debug
* rrdeng uuid debug
* Revert "rrdeng uuid debug"
This reverts commit 393da190826a582e7e6cc90771bf91b175826d8b.
* Revert "rrddim uuid debug"
This reverts commit 72150b30408294f141b19afcfb35abd7c34777d8.
* Revert "rrdcontext uuid debug"
This reverts commit 2c3b940dc23f460226e9b2a6861c214e840044d0.
* Revert "Debug for context load"
This reverts commit 0d880fc1589f128524e0b47abd9ff0714283ce3b.
* do not use legacy uuids on multihost dbs
* thread safety for journalfile size
* handle other cases of inconsistent collected pages
* make health thread check if it should be running in key loops
* do not log uuids
Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
|
|
* remove MQTT-C (MQTT 3 implementation) from buildsystem
|
|
* Update kickstart script to use new DEB infrastructure.
* Fix package filename suffix handling for DEB packages.
* Fix the DEB package availability check to use new repo URL.
|
|
|
|
* query preparation runs before extent reads
* populate mrg in parallel
* fix formatting warning
* first search for a metric then add it if it does not exist
* Revert "first search for a metric then add it if it does not exist"
This reverts commit 4afa6461fcce859d03f1c9cf56dd3b5933ee5ebc.
* Revert "fix formatting warning"
This reverts commit 49473493f7f1c3399b5635a573d3c6ed2b6e46f3.
* Revert "populate mrg in parallel"
This reverts commit a40166708d4222f6329904f109114c47c44ca666.
* merge journalfiles metrics before committing them to MRG
* Revert "merge journalfiles metrics before committing them to MRG"
This reverts commit 50c8934e23a0a09ea4da80e3f88290e46496ad92.
* Revert "Revert "populate mrg in parallel""
This reverts commit f4c149d2ab7a8c9af24a10f95438a0d662a5cf8a.
* Revert "Revert "fix formatting warning""
This reverts commit 78298ff9efc49806ded029f5f1e868cc42e8f6eb.
* Revert "Revert "first search for a metric then add it if it does not exist""
This reverts commit 997b9c813b290882ba18a8c44bf73f9ee5480adf.
* preload first and last journal files v2
* fix formatting warning
* parallel loading of tiers; cleanup of ctx structures
* use half the cores
* add partitions to metrics registry
* revert accidental change
* parallel processing according to MRG partitions; don't recalculate retention on exit
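The "use half the cores" rule for parallel loading can be written in a few lines of C. This is a minimal sketch. It assumes the core count comes from `sysconf(_SC_NPROCESSORS_ONLN)`. The helper name `half_the_cores()` is hypothetical and is not taken from the Netdata sources.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical helper: number of loader threads = half the online cores,
   never fewer than one. */
static long half_the_cores(void) {
    long cores = sysconf(_SC_NPROCESSORS_ONLN);
    if (cores < 1)
        cores = 1;              /* sysconf() returns -1 if the value is unknown */

    long threads = cores / 2;
    if (threads < 1)
        threads = 1;            /* a single-core machine still gets one loader */

    assert(threads >= 1 && threads <= cores);
    return threads;
}
```

Leaving half the cores free keeps collection and queries responsive while tiers are loaded at startup.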
|
|
|
|
* Bump the soft limit on open FDs to the max.
On systems with a low soft-limit for open file descriptors, the agent
would fail to initialize all dbengine tiers.
* Iterate the right number of dbengine tiers.
For whatever reason, this was causing a crash on MacOS but it was
running "correctly" on Linux systems.
|
|
|
|
* add consul license alarm
* minor
|
|
Switch from OCI to Docker images for transferring containers between CI steps.
|
|
* Update DEB repository configuration to new infrastructure.
* Fix typo.
|
|
Update to sqlite3 version 3.40.1
|
|
|
|
* track memory footprint of Netdata
* track db modes alloc/ram/save/map
* track system info; track sender and receiver
* fixes
* more fixes
* track workers memory, onewayalloc memory; unify judyhs size estimation
* track replication structures and buffers
* Properly clear host RRDHOST_FLAG_METADATA_UPDATE flag
* flush the replication buffer every 1000 times the circular buffer is found empty
* don't take timestamps too frequently in sender loop
* sender buffers are not used by the same thread as the sender, so they were never recreated - fixed it
* free sender thread buffer on replication threads when replication is idle
* use the last sender flag as a timestamp of the last buffer recreation
* free cbuffer before reconnecting
* recreate cbuffer on every flush
* timings for journal v2 loading
* inlining of metric and cache functions
* aral likely/unlikely
* free left-over thread buffers
* fix NULL pointer dereference in replication
* free sender thread buffer on sender thread too
* mark ctx as used before flushing
* better logging on ctx datafiles closing
Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
|
|
|
|
Remove undocumented archivedcharts endpoint. Use context endpoint instead
Remove unused functions to lookup chart and dimension UUIDs
Drop/Add new index for dimension and chart tables
|
|
* Add for_each_open_fd() and fix second instance of _SC_OPEN_MAX
* Add argument to allow exclusion of file descriptors from closing
* Fix clang error
* Address review comments
* Use close_range() if possible and replace macros with enums
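Closing inherited descriptors with `close_range()` when available, while excluding one descriptor, might look like the sketch below. Several parts are assumptions:

* It calls the raw syscall through `syscall(SYS_close_range, ...)` (Linux 5.9+). This avoids depending on the glibc 2.34 wrapper.
* It falls back to a `close()` loop bounded by `_SC_OPEN_MAX`.
* `close_fd_range()` and `close_fds_except()` are hypothetical names. They are not the `for_each_open_fd()` API mentioned above.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Close every descriptor in [first, last]. */
static void close_fd_range(unsigned int first, unsigned int last) {
    if (first > last)
        return;                         /* empty range: nothing to do */
#ifdef SYS_close_range
    if (syscall(SYS_close_range, (unsigned long)first, (unsigned long)last, 0UL) == 0)
        return;                         /* kernel closed the whole range */
#endif
    /* Fallback: one close() per descriptor up to the process limit. */
    long max = sysconf(_SC_OPEN_MAX);
    unsigned long limit = (max > 0) ? (unsigned long)max : 1024UL;
    for (unsigned long fd = first; fd <= last && fd < limit; fd++)
        (void)close((int)fd);
}

/* Hypothetical helper: close every fd >= low_fd except keep_fd
   (pass -1 as keep_fd to close them all). */
static void close_fds_except(int low_fd, int keep_fd) {
    assert(low_fd >= 0);
    if (keep_fd < low_fd) {
        close_fd_range((unsigned int)low_fd, ~0U);
        return;
    }
    if (keep_fd > low_fd)
        close_fd_range((unsigned int)low_fd, (unsigned int)keep_fd - 1u);
    close_fd_range((unsigned int)keep_fd + 1u, ~0U);
}
```

Splitting the work into two ranges around the excluded descriptor lets a single `close_range()` call handle each side. The loop fallback can be slow once the soft FD limit has been raised to a large value.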
|
|
* cleanup journal v2 mounts periodically
* fix for last commit
* re-enable loading page from disk when the arrangement of pages requires it
* Remove unused statistics
* Estimate disk space when the current datafile is full and queue a rotate command (currently it does not attempt to estimate the final size of journals)
Queue a command to check quota on startup per tier
* apps.plugin now exposes RSS chart
* shorter thread names to make debugging easier, since thread names can only be 15 characters
* more thread names fixes
* allow an apps_groups.conf target to be pid 0 or 1
Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
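On Linux, a thread name holds at most 16 bytes including the terminating NUL. `pthread_setname_np()` fails with `ERANGE` on longer names instead of truncating them. The sketch below truncates to 15 characters before setting the name. The helper `set_current_thread_name()` and the example name are hypothetical. The macOS branch reflects that platform's single-argument `pthread_setname_np()`.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define THREAD_NAME_MAX_CHARS 15        /* Linux: 16 bytes including the NUL */

/* Hypothetical helper: set the calling thread's name, truncating it so the
   call cannot fail with ERANGE. Returns 0 on success or an error number. */
static int set_current_thread_name(const char *name) {
    assert(name != NULL);
    char buf[THREAD_NAME_MAX_CHARS + 1];
    snprintf(buf, sizeof(buf), "%s", name);     /* keeps the first 15 chars */
#if defined(__APPLE__)
    return pthread_setname_np(buf);             /* macOS: current thread only */
#else
    return pthread_setname_np(pthread_self(), buf);
#endif
}
```

With this helper, a name such as `example-long-thread-name` becomes `example-long-th`. That is why short, distinctive prefixes make names easier to tell apart in debuggers and `top -H`.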
|
|
|
|
|
|
* revert health to single thread
* remove getting now
* use a health struct
* remove commented code
* cleanup health log from metadata
* don't check for METADATA_UPDATE
|
|
|
|
do not lock the entire datafile list while a datafile is being deleted
|
|
return true when the file is already unmounted
|
|
* reduce journal v2 shared memory using madvise() - not integrated yet
* working attempt to minimize dbengine shared memory
* never call willneed - let the kernel decide which parts of each file are really needed
* journal files get MADV_RANDOM
* don't call MADV_DONTNEED too frequently
* madvise() is always called with the journal unlocked but referenced
* call madvise() even less frequently
* added chart for monitoring database events
* turn batch mode on under critical conditions
* max size to evict is 1/4 of the max
* fix max size to evict calculation
* use dbengine_page/extent_alloc/free for page and extent allocations, also tracking the size of these allocations at free time
* fix calculation for batch evictions
* allow main and open cache to have as many evictors as needed
* control inline evictors for each cache; report different levels of cache pressure on every cache evaluation
* more inline evictors for extent cache
* bypass max inline evictors above critical level
* current cache usage has to be taken
* re-arrange items in journalfile
* updated docs - work in progress
* more docs work
* more docs work
* Map / unmap journal file
* draw.io diagram for dbengine operations
* updated dbengine diagram
* updated docs
* journal files v2 now get mapped and unmapped as needed
* unmap journal v2 immediately when getting retention
* mmap and munmap do not block queries evaluating journal files v2
* have only one unmap function
Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
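The `madvise()` approach described above can be sketched with a read-only shared mapping. The mapping is hinted with `MADV_RANDOM`, which suppresses aggressive read-ahead. Its resident pages are later dropped with `MADV_DONTNEED`, and it is finally unmapped. This is a minimal sketch, not dbengine's journal v2 code. The struct and the `journal_*` helper names are hypothetical. The `MADV_DONTNEED` behaviour described in the comments is the Linux behaviour for file-backed shared mappings.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>

/* Hypothetical descriptor for a mapped journal file. */
typedef struct {
    void *addr;
    size_t size;
} mapped_journal;

/* Map the whole file read-only and hint random access.
   Returns 0 on success, -1 on failure. */
static int journal_map(int fd, mapped_journal *m) {
    assert(m != NULL);
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0)
        return -1;

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return -1;
    }
    if (madvise(p, (size_t)st.st_size, MADV_RANDOM) != 0)
        perror("madvise(MADV_RANDOM)");         /* only a hint; keep the mapping */

    m->addr = p;
    m->size = (size_t)st.st_size;
    return 0;
}

/* Drop resident pages but keep the mapping; on Linux the next access
   to a shared file mapping re-reads the data from the file. */
static void journal_release_pages(const mapped_journal *m) {
    if (madvise(m->addr, m->size, MADV_DONTNEED) != 0)
        perror("madvise(MADV_DONTNEED)");
}

static void journal_unmap(mapped_journal *m) {
    if (m->addr != NULL)
        munmap(m->addr, m->size);
    m->addr = NULL;
    m->size = 0;
}
```

Because `MADV_DONTNEED` only discards pages that can be re-read, calling it rarely, as the commits above do, trades a little extra I/O for a smaller resident footprint.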
|
|
* store host and claim info as soon as possible
* no need to set the flag
* check for metasync_worker.loop
|
|
make sure vars are sent after SENDER_CONNECTED flag is set
|
|
* Clarify the cloud option in the Readme
* Add Netdata Cloud image
* fixed some typos and made small tweaks
* small typo
* Update README.md
Co-authored-by: Chris Akritidis <43294513+cakrit@users.noreply.github.com>
* Update README.md
Co-authored-by: Chris Akritidis <43294513+cakrit@users.noreply.github.com>
* Update README.md
* typo
* grammar
* small add
* clean up
Co-authored-by: Alex Malkov <alex.a.malkov@gmail.com>
Co-authored-by: hugovalente-pm <hugo@netdata.cloud>
Co-authored-by: Chris Akritidis <43294513+cakrit@users.noreply.github.com>
|
|
check session variable before resuming it
|
|
|
|
add kaitaistruct for journal v2 files
|
|
|
|
This will bring us back to running only native packaging jobs on most
PRs instead of running all packaging jobs on all PRs.
|
|
If it takes more than an hour to run the updater, something has gone
horribly wrong, so just kill it instead of letting it keep running.
|
|
Replace individual collector images/links with one
Link to www.netdata.cloud/integrations instead
|
|
|
|
* Switch nightlies to GitHub releases.
Instead of using GCS.
* Fix CI handling.
* Fix handling of download URLs for nightly builds.
* Fix handling of redirects for consolidated artifact checks.
* Avoid redirect issues with the test environment.
* Add more info to logs for updater checks.
* Ignore redirect issues for updater checks.
* Fix base URL handling in updater.
* Dump post-update info in CI before checking if the update worked.
* Special-case a version of `latest` in updater.
This is to allow CI to work correctly.
* Update nightly release badge in README.md.
* Fix updater check variable name.
* Add a comment documenting the magic number in parse_version.
|
|
systemd service (#14255)
Fixes https://github.com/netdata/netdata/issues/14238
|