author | Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com> | 2022-11-15 23:00:53 +0200 |
---|---|---|
committer | GitHub <noreply@github.com> | 2022-11-15 23:00:53 +0200 |
commit | 224b051a2b2bab39a4b536e531ab9ca590bf31bb (patch) | |
tree | adb3ca35d6d6a4d4f1b7aad50542619c3efb38c0 /Makefile.am | |
parent | b4a0298bd48f217c4a6f2eaf729e0684966ea7a3 (diff) |
New journal disk based indexing for agent memory reduction (#13885)
* Add read-only option to netdata_mmap so files are accessed using PROT_READ
* Initial functions to write the new journal file and switch to the new indexing
* Cleanup code, add parameters to pg_cache_punch_hole to avoid updating page latest / oldest times
pg_cache_insert takes a parameter to indicate whether the page index needs to be locked
Page eviction functions will try to deallocate the descriptor as well (pg_cache_punch_hole without page_index time updates)
Cleanup messages during startup
* Cleanup messages during startup
* Disable extent caching for now, add placeholder for journal indexing and activation while the agent is running
* Add main function to populate descriptors by checking the new journal indexing
* prevent crash
* fix for binary search crash
* Avoid Time-of-check time-of-use filesystem race condition
* always add a page
* populate fixes - it is still incomplete
* pg_cache_insert returns the descriptor that ends up in the page_index
* Add populate next (Fix 1)
* Fix compilation warnings, reactivate extent caching
* Add populate next (Fix 2)
* Add populate next (Fix 3) switch to the next entry or journal file when asking to populate descriptor with next
* Fix resource leak and wrong sizeof
* Rework page population (part 1)
* Additional checksums added / journal validation
* Cleanup (part 1)
* Locking added and Cleanup (part 2)
* Close journal file after new journal index activation
* Skip warning when compiling without NETDATA_INTERNAL_CHECKS
* Ignore empty index file (header and trailer and no metrics)
* Try to remove all evicted descriptors (may prevent slight memory increase)
* Evict pages also when we successfully do try_reserve
* Precache pages and cleanup
* Add a separate cleanup thread to release unused descriptors
* Check existence of key correctly
* Fix total file size calculation
* Statistics for journal descriptors
* Track and release journal v2 descriptors
* Do not try to allocate pages for locality if under pressure
* Do not track v2 descriptors when populating the page_index
* Track page descriptors as they are inserted in the page index (per journal file)
Scan journal files for pending items to cleanup
Cleanup v2 descriptors only if they are not populated
Check before adding to page cache to avoid memory allocation / free
* Close journal file that has been processed and migrated to the new index
Check for valid file before trying to truncate / close. This file has been closed during startup
* Better calculation for the number of prefetched data pages based on the query end time
Code cleanup and comments
Add v2 populated descriptor expiration based on journal access time
* Code cleanup
* Faster indexing
Better journal validation (more sanity checks)
Detect new datafile / journal creation and trigger index generation
Switch to the new index / mark descriptors in memory as needed
Update journal access time when a descriptor is returned
Code cleanup (part 1)
* Reactivate descriptor cleanup
Code cleanup
* Allow locality precaching
* Allow locality precaching for the same page alignment
* Descriptor cleanup interval changed
* Disable locality precaching
* Precache only if not under pressure / interval cleanup at 60 seconds
* Remove unused functions
* Migrate on startup always
Make sure the metric uuid is valid (we have a page_index)
Prevent crash if no datafile is available when logging an error
Remove unused functions
* New warn limit for precaching
Stress test v2 descriptor cleanup
- Every 1s cleanup if it doesn't exist in cache
- 60s cache eviction
* Arrayalloc internal checks on free activated with NETDATA_ARRAYALLOC_INTERNAL_CHECKS
Ability to add DESCRIPTOR_EXPIRATION_TIME and DESCRIPTOR_INTERVAL_CLEANUP during compile
Defaults DESCRIPTOR_INTERVAL_CLEANUP = 60 and DESCRIPTOR_EXPIRATION_TIME = 600
* Lookup page index correctly
* Calculate index time once
* Detect a duplicate page when doing cache insert and during flushing of pages
* Better logging
* Descriptor validation (extent vs page index) when building an index file while the agent is running
* Mark invalid entries in the journal v2 file
* Schedule an index rebuild if a descriptor is found without an extent in the timerange we are processing
Release descriptor lock to prevent random shutdown locks
* Proper unlock
* Skip descriptor cleanup when journal file v2 migration is running
* Fix page cache statistics
Remove multiple entries of the page_index from the page cache
Cleanup
* Adjust preload pages on pg_cache_next. Handle invalid descriptor properly
Unlock properly
* Better handling of invalid pages
Journal indexing during runtime will scan all files to find potential ones to index
* Reactivate migration on startup
Evict descriptors to cause migration
Don't count the entries in page index (calculate when processing the extent list)
Check for valid extent since we may set the extent to NULL on startup if it is invalid
Better structure init
Address valgrind issues
* Add don't fork/dump option
* Add separate lock to protect accessing a datafile's extent list
Comment out some unused code (for now)
Abort descriptor cleanup if we are force flushing pages (page cache under pressure)
* Check for index and schedule when data flush completes
Configure max datafile size during compilation
Keep a separate JudyL array for descriptors
Skip quota test if we are deleting descriptors or explicitly flushing pages under pressure
* Fix
* set function when waiters are woken up
* add the line number to trace the deadlock
* add thread id
* add wait list
* init to zero
* disable thread cancelability inside dbengine rrdeng_load_page_next()
* make sure the owner is the thread
* disable thread cancelability for replication as a whole
* Check and queue indexing after first page flush
* Queue indexing after a small delay to allow some time for page flushing
* tracing of waiters only when compiled with internal checks
* Mark descr with extent_entry
* Return page timeout
* Check if a journalfile is ready to be indexed
Migrate the descriptors or evict if possible
Compilation warning fix
* Use page index if indexing during startup
Mark if journalfile should be checked depending on whether we can migrate or delete a page during indexing
* require 3x max message size as sender buffer
* fix for the msg of the adaptive buffer size
* fix for the msg of the duplicate replication commands
* Disable descriptor deletion during migration
* Detect descriptor with same start page time
* sender sorts replication requests before fulfilling them; receiver does not send duplicate replication requests
* dbengine never allows past timestamps to be collected
* do not accept values same as last data point stored in dbengine
* replicate non-overlapping ranges
* a better replication logic to avoid sending overlapping data to parents
* Do not start journal migration in parallel
* Always update page index times
* Fix page index first / last times on load
* internal log when replication responses do not match the requests or when replication commands are sent while there are others inflight
* do not log out of bounds RBEGIN if it is the last replication command we sent
* better checking of past data collection points
* better checking of past data collection points - optimized
* fix corruption during decompression of streaming
* Add config to disable journal indexing
Add config parameter for detailed journal integrity check (Metric chain validation check during startup)
pg cache insert drop check for existing page
Fix crc calculation for metric headers
* children disable compression globally, only when the compression gives an error
* turn boolean member into RRDHOST OPTION
* Compilation warnings
* Remove unused code
* replication sender statistics
* replication sender statistics set to 100% when no replication requests are pending
* Fix casting warning
Co-authored-by: Costa Tsaousis <costa@netdata.cloud>
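Two items in the message above, accessing files with PROT_READ and avoiding a time-of-check time-of-use filesystem race, can be illustrated together. The sketch below is not the code from this commit: `map_file_readonly` is a hypothetical helper, not the signature of `netdata_mmap`, and it assumes a POSIX system. It opens the file first and then queries the size with `fstat()` on the open descriptor, so the size and the mapping refer to the same file even if the path is replaced in between.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (NOT netdata's netdata_mmap): map a whole file read-only.
 * On success returns the mapping and stores its length in *size_out.
 * Returns NULL on any failure, including an empty file. */
static void *map_file_readonly(const char *path, size_t *size_out)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return NULL;

    /* Use fstat() on the descriptor instead of stat() on the path:
     * checking the path and opening it later would leave a window
     * in which the file could be swapped (time-of-check time-of-use). */
    struct stat st;
    if (fstat(fd, &st) == -1 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }

    /* PROT_READ: a write through this mapping faults instead of
     * silently modifying the file contents. */
    void *addr = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);

    /* The mapping remains valid after the descriptor is closed. */
    close(fd);

    if (addr == MAP_FAILED)
        return NULL;

    *size_out = (size_t)st.st_size;
    return addr;
}
```

A caller would release the mapping with `munmap(addr, size)` once it no longer needs the file contents.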
Diffstat (limited to 'Makefile.am')
0 files changed, 0 insertions, 0 deletions