Age | Commit message (Collapse) | Author |
|
* Bulk add frontmatter
* A few extra edge cases
|
|
* ACLK connection and protocol improvements (#8139)
* Adding ACLK retry on connection failure (#8147)
* Fixed reconnect issues on the ACLK. (#8163)
* Cleaning up ACLK - part 1 (#8167)
Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
|
|
* Fixes to DOCS home and README
* Edit conf-guide and getting-started
* Add dbengine settings to map
* Fix tutorial and step-by-step
* Fix artifacts of old memory mode types
* A few tweaks
* Push a little harder on README
* Fix for Markos
|
|
|
|
* Fixes for database/readme.md
* Fixes for registry/readme.md
* Fixes for daemon/readme.md
* Fixes for database/engine/readme.md
* Fixes for registry/readme.md
* Fix for cli/readme.md
* Fixes on docs/a-github-star-is-important.md
* A few more documents
|
|
* update_info: New variables
This commit creates the variables inside the script and reads them into Netdata
* update_info: API
This commit changes the web api response
* update_info: Disk space
This commit brings the disk space to info and renames the environment variables inside Netdata
* update_info: Rename variable
This commit renames the environment variable
* update_info: Rename response variable
This commit renames a response variable
* update_info: Labels
This commit creates the missing labels
* update_info: test before free
* update_info: Doc function
This commit adds documentation to the functions to guide developers
* update_info: Fix info message
This commit removes some info messages from the error.log
* update_info: Remove unnecessary ifs, considering free manual
|
|
* Introduce dirty page pressure handling in the dbengine page cache that invalidates pages when the disk cannot keep up with the flushing speed.
|
|
* - Add initial mqtt support
* [WIP] Agent cloud link
- Setup main mqtt thread to connect to a broker using V5 of the MQTT protocol (TBD)
- Send alarms to "netdata/alarm"
- Add error checks to handle connection failures
- Add params for
Broker, port
Maximum concurrent sent / received messages
- Dummy function to check claiming status
- Generic mqtt_send command to publish message to a base topic, sub topic
It will end up in the form base_topic/sub_topic
- Add host/port in the connection failure error message
* Test libmosquitto libs
* connect to broker locally (assume localhost:1883)
* subscribe to channel netdata/command
* Test try a reload command to trigger health reload
* publish alerts to netdata/alarm
* - Fix compile issues
* - Use sleep_usec instead of usleep
* - Delay reconnection on failure due to misconfiguration (high cpu usage)
* - Remove the TLS connection config
* - Fix NETDATA_MQTT_INITIALIZATION_SLEEP_WAIT to use seconds
* - Gather ACLK related code under aclk folder
- Add aclk_ functions for abstract layer
- Moved low level libs integration into mqtt.c
* - Add README.md file with initial comment
* - Clean MQTT v5
* - Code cleanup
* - Remove alarm log for now
- Remove the heart beat
* - Remove message properties for V5
* - Remove message properties for V5 (header)
* Fixed the netdata target to use a local static version of libmosquitto.
The installer does not yet have steps to pull and build the local library.
cd project_root
git clone ssh://git@github.com/netdata/mosquitto mosquitto/
(cd mosquitto/lib && make) # Ignore the cpp error
This will leave mosquitto/lib/libmosquitto.a for the build process to use.
* - Fix compile issues with older < 1.6 libmosquitto lib
* - Enable alarm events to check it works
- Rearrange includes
- Rework topic to be agent/guid/. Actual id will be
returned by the is_agent_claimed
* - Add initial metadata info
- Added helper function in web_api
- Added a debug command (info)
* Update the claiming state to retrieve the claimed id.
* - Use define for constants like command and metadata topics
- Function to wait for initialization of the ACLK link
- New aclk_subscribe command with QOS parameter for the mqtt subscription
- Use the is_agent_claimed function to get the real claim id and use it to build the topics
that will be used for the cloud communication
- Change in netdata-claim.sh.in to write the claim id without a trailing \n
* - Remove the alarm log for now
- Add code (but disabled) to send charts
* - Use dummy anon, anon as username and password for testing purposes
* - Use client id anon as well
* Testing without TLS
* Switching TLS back on to fix docker environment.
* - Added query processing
An incoming URL now calls web_client_api_request_v1_data to handle a request and push the results
back to the "data" topic
- Move the above processing from the message callback to the query handle loop
- Added helper "pause" , "resume" commands to stop and resume query processing to stress test loading the queue
with queries before executing them
- Changed the endpoint topics to "meta", and "cmd" (previously metadata and command)
* make info message follow protocol
* move metadata msg generation into new func
* - Add metadata to the responses
- Add hook to queue chart changes on creation and dimensions
- Changed the queue mechanism to include delay for X seconds
- Add delayed submission of charts to the cloud so that all DIMs are defined to avoid resubmission
* - Add additional data info for aclk_queue command
* - Use web_client_api_request_v1 to handle the incoming request
This will handle all requests coming from the cloud
* - Cleanup and aclk_query structure
- Add msg_id parameter
- Enable the incoming JSON request
- Enable the outgoing JSON response
* - Added new thread to handle query processing
- Add lock and cond wait to wakeup thread when queries are submitted
- Cleanup on the main init function
* - Add wait time on agent init, to allow for chart, alarms and other definitions to be completed.
- During the wait time, no queries will be queued
* - Send metadata on query thread init
- New generic create header function for the JSON response
- Pack info and charts into one message
- Modified chart to remove entries (test)
- Modified charts mod to remove entries e.g alarms and volatile info
- Change input to aclk_update_chart (RRDHOST / instead of hostname)
* - When a request fails, add to the payload
- We may need to handle in a different key
- Error check in json parsing
* - Add dummy aclk_update_alarm command
* - Move incoming request JSON parsing code away from mqtt.c
- Added #ifdef ACLK_ENABLE so that we can have code merged but disabled by default
- Added version in incoming and outgoing JSON dict
* - Disable code if ACLK_ENABLE is not defined
- Remove references to the mqtt (mosquitto) lib
- Add dummy stubs in mqtt.c for completeness if ACLK_ENABLE is not defined
* - Disable challenge sample code for now
* - Remove libmosquitto from makefile
* - Fix spaces in Makefile.am
- Remove ifdef to avoid warning from LGTM
* - Remove for now the code that builds an alarm log test message to send to the cloud
* - Add check for ACLK_ENABLE definition and avoid calling the chart update functions
* - Remove commented code
* - Move source files to the correct place (ACLK_PLUGIN_FILES)
* - Remove include file that's not needed
* - Remove include file that's not needed
- Add improved checks for load_claiming_state()
* - Fix error message. Used error() that also logs errno and message
* - Fix some codacy issues
* - Fix more codacy issues, code cleanup
* - Revert code to address codacy warnings
* - Revert spaces added in a previous commit by mistake
* clean up if/else nest
* print error if fopen fails
* minor - error already logs errno
* - Fix version formatting
* - Cleanup all ACLK related compiler warnings
- Re-arrange include files
- Removed unused defines
* - More compilation warnings fixed
- Bug with thread creation fixed
* - Add condition to skip compilation of the ACLK code entirely. Add env variable ACLK="yes" to enable
* - Add condition to skip the libmosquitto
* - Change feature flag from ACLK_ENABLE to ENABLE_ACLK in accordance with the rest of ENABLE_xx flags
- Typo in info message fix
Co-authored-by: Andrew Moss <1043609+amoss@users.noreply.github.com>
Co-authored-by: Timo <6674623+underhood@users.noreply.github.com>
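The "lock and cond wait to wakeup thread when queries are submitted" item above describes a common producer/consumer pattern. A minimal Python sketch of that pattern follows; the agent itself is written in C, and `QueryQueue`, `run_worker` and `process_all` are hypothetical names used only for illustration, not the ACLK's actual structures or functions.

```python
import threading
from collections import deque


class QueryQueue:
    """Hypothetical stand-in for a query queue guarded by a lock and a condition."""

    def __init__(self):
        self._items = deque()
        self._cond = threading.Condition()
        self._closed = False

    def submit(self, query):
        # Producer side: enqueue under the lock, then wake one waiting worker.
        with self._cond:
            self._items.append(query)
            self._cond.notify()

    def close(self):
        # Mark the queue as finished and wake every waiter so it can exit.
        with self._cond:
            self._closed = True
            self._cond.notify_all()

    def next_query(self):
        # Consumer side: sleep until there is work or the queue is closed.
        with self._cond:
            while not self._items and not self._closed:
                self._cond.wait()
            if self._items:
                return self._items.popleft()
            return None  # closed and fully drained


def run_worker(queue, handle):
    # Query thread loop: process queries until the queue reports it is done.
    while True:
        query = queue.next_query()
        if query is None:
            break
        handle(query)


def process_all(queries):
    # Start one worker, submit every query, then shut down and collect results.
    queue = QueryQueue()
    results = []
    worker = threading.Thread(target=run_worker, args=(queue, results.append))
    worker.start()
    for query in queries:
        queue.submit(query)
    queue.close()
    worker.join()
    return results
```

The worker checks its condition in a `while` loop rather than a single `if`, so a spurious wakeup or a wakeup after another consumer took the item does not cause it to pop from an empty queue.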
|
|
* stream_forward: Fix protocol
This commit brings the necessary fixes to the protocol
* stream_forward: Fix old slave support
This commit fixes the communication with old versions of Netdata
* stream_forward: Remove declaration
There was a wrong declaration inside a block, so I am removing it
* stream_forward: Use version
This commit brings the use of a version instead of flags to the stream
* stream_forward: Remove variable
This commit removes a useless variable from the handshake
* stream_forward: Change message
Change the message setting the protocol version on it
* stream_forward: Fix version number
* stream_forward: readable definition
The definition and the variables used the same data type but with different declarations;
this commit fixes this.
* stream_forward: Set master version inside message
This commit updates the message reporting a successful connection with the master
* stream_forward: Fix wrong version
This commit fixes the stream version being set multiple times
* stream_forward: Reorganize code
This commit reorganizes code to speed up the processing
* stream_forward: Adjust code
This commit removes an unnecessary else
* stream_forward: Brings old structure
This commit restores a previous structure that the code needs
* stream_forward: fix error report
This commit fixes the error report that was happening when the stream version does not match
* stream_forward: Fixes msg and remove unnecessary call
|
|
* Fix memory leaks
* Check for configuration options
* Parse simple tags
* Parse JSON tags
* Remove an unnecessary check
* Parse a JSON object
* Parse a JSON array
* Update the documentation
* Fix host locks
|
|
Improve the metadata detection for containers. The system_info structure has been updated to hold separate copies of OS_NAME, OS_ID, OS_ID_LIKE, OS_VERSION, OS_VERSION_ID and OS_DETECTION for both the container environment and the host. This new information is communicated through the /api/v1/info endpoint. For the streaming interface a partial copy of the info is carried until the stream protocol is upgraded. The anonymous_statistics script has been updated to carry the new data to Google Analytics. Some minor improvements have been made to OS-X / FreeBSD detection, and the detection of virtualization. The docs have been updated to explain how to pass the host environment to the docker container running Netdata.
|
|
* alarm_log_with_labels: Alarm Log
Rebase of alarm log to commit against master
* alarm_log_with_labels: Remove lock
This commit removes unnecessary locks from health_log
* alarm_log_with_labels: Restore and Rebase
Remove previous changes and rebase the PR
* alarm_log_with_labels: Unique line
This commit brings a unique line to alarm log
* alarm_log_with_labels: Correct separator
This log file uses tabs instead of commas
* alarm_log_with_labels: Fix memory leak
There was a missing call for buffer_free
|
|
* quotes_labels: Restrict quotes
This commit brings the restriction for the values that will not be allowed to have quotes
* quotes_labels: Documentation
This commit brings update to the documentation
* quotes_labels: Missing comma
This commit brings a missing comma for the documentation
* quotes_labels: Rename variable
The variable was renamed to make the code more readable
* quotes_labels: call function
There was a missing call in our utf-8 function, this commit fixes this
* quotes_labels: Remove segmentation fault
The previous code could result in a segmentation fault depending on the label size,
this commit removes this possibility
* quotes_labels: remove unnecessary UTF-8
Considering that I am testing all addresses, I am removing the UTF-8 call
* quotes_labels: Rename variable
This commit renames variable according to documentation
* quotes_labels: Comparison to function
This commit moves the comparison used to test labels into a single function
* quotes_labels: Restore name
The new name was breaking compatibility with the structure value
* quotes_labels: Rename function
Rename the function to keep a consistent pattern
* quotes_labels: Restore previous utf-8 library
* quotes_labels: Remove missing file
* quotes_labels: Fix grammar
Fix grammar documentation
* quotes_labels: Missing comparison
This commit brings the two missing characters that must be rejected from value
* quotes_labels: Fix grammar again
Fix grammar documentation
|
|
This commit enables streaming host labels
|
|
* error exit when rrdhost localhost init fails #7504
|
|
* Add labels to the JSON exporting connector
* Add labels to the Graphite exporting connector
* Add labels to the OpenTSDB telnet exporting connector
* Add labels to the OpenTSDB HTTP exporting connector
* Replace control characters in JSON strings
* Add unit tests
|
|
* Remove host labels from the Swagger specification
* Remove host labels from the api responses
|
|
* [libnetdata/threads] Add uv_thread_set_name
This is inspired from thread_set_name() but for libuv threads.
Both are based on pthread, but for uv we need to call it with the
uv_thread_t pointer, instead of being the thread that calls the
function for itself.
* [exporting] Set libuv threadname to "EXPORTING-index"
* [database/engine] Set libuv thread name to "DBENGINE"
* [daemon/command] Set libuv thread name to "DAEMON-COMMAND"
* [collectors/proc] Set pthread name to "PLUGIN[cpuidle]"
* Use new 'thread_set_name_np' name
|
|
* fix_db_race_condition: unit test
Adjust unit test for dbengine
* fix_db_race_condition: page cache
Fix database
* fix_db_race_condition: Missing function call
This commit brings the correct function call inside rrdengine.c
|
|
We are removing this fix for further internal testing, it will be returning after we iron out
some bugs.
This reverts commit 53ab093d84919c743450199a31bca9a13412e451.
|
|
|
|
Initial work on host labels from the dedicated branch. Includes work for issues #7096, #7400, #7411, #7369, #7410, #7458, #7459, #7412 and #7408 by @vlvkobal, @thiagoftsm, @cakrit and @amoss.
|
|
* Add top level tests
* Add a skeleton for preparing buffers
* Initialize graphite instance
* Prepare buffers for all instances
* Add Graphite collected value formatter
* Add support for exporting.conf read and parsing
* - Use new exporting_config instead of netdata_config
* Implement Graphite worker
* Disable exporting engine compilation if libuv is not available
* Add mutex locks
- Configure connectors as connector_<type> in sections of exporting.conf
- Change exporting_select_type to check for connector_ fields
* - Override exporting_config structure if there is no exporting.conf so that
lookups don't fail and we maintain backwards compatibility
* Separate fixtures in unit tests
* Test exporting_discard_response
* Test response receiving
* Test buffer sending
* Test simple connector worker
- Instance section has the format connector:instance_name
e.g graphite:my_graphite_instance
- Connectors with : in their name e.g graphite:plaintext are reserved
So graphite:plaintext is not accepted because it would activate an
instance with name "plaintext"
It should be graphite:plaintext:instance_name
* - Enable the add_connector_instance to cleanup the internal structure
by passing NULL,not NULL arguments
* Implement configurable update interval
- Add additional check to verify instance uniqueness across connectors
* Add host and chart filters
* Add the value calculation over a database series
* Add the calculated over stored data graphite connector
* Add tests for graphite connector
* Add JSON connector
* Add tests for JSON formatting functions
* Add OpenTSDB connector
* Add tests for the OpenTSDB connector
* Add temporary notes to the documentation
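The instance section naming rule described above (`connector:instance_name`, with names such as `graphite:plaintext` reserved as connector names) can be illustrated with a small parser. This is a sketch in Python rather than the exporting engine's C code; `KNOWN_CONNECTORS` is an assumed, illustrative set and `parse_instance_section` is a hypothetical helper, not a function from the repository.

```python
# Assumed, illustrative connector names; the real list lives in the exporting engine.
KNOWN_CONNECTORS = {"graphite", "graphite:plaintext", "json", "opentsdb"}


def parse_instance_section(section):
    """Return (connector, instance_name), or None if the section is not an instance."""
    # A bare connector name, including one that contains ':', is reserved:
    # "graphite:plaintext" must not activate an instance called "plaintext".
    if section in KNOWN_CONNECTORS:
        return None
    # Split on the last ':' so connector names containing ':' stay intact.
    connector, separator, instance = section.rpartition(":")
    if not separator or not instance or connector not in KNOWN_CONNECTORS:
        return None
    return connector, instance
```

Splitting on the last colon means an instance name cannot itself contain a colon in this sketch; whether the real parser allows that is not stated in the commit message.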
|
|
|
|
|
|
* Checkpoint commit (POC)
* Implemented command server in the daemon
* Add netdatacli implementation
* Added prints in command server setup functions
* Make libuv version 1 a hard dependency for the agent
* Additional documentation
* Improved accuracy of names and documentation
* Fixed documentation
* Fixed buffer overflow
* Added support for exit status in cli. Added prefixes for exit code, stdout and stderr. Fixed parsers.
* Fix compilation errors
* Fix compile errors
* Fix compile errors
* Fix compile error
* Fix linker error for muslc
|
|
When a slave had SSL activated for both streaming and local access, the addresses were overwritten;
this PR fixes this problem, which prevented the stream from working reliably
|
|
* Use 4 spaces for indentation of non-recipe lines in Makefile.am files
* Be more consistent in the use of space before = in Makefile.am files
|
|
* Removed support for 16-bit and 8-bit counter overflow
* Improve behaviour of counter overflow detection versus counter resets.
* Added support for signed 32-bit and 64-bit limits for counter overflows.
* Fixed signed incremental counter issues and added unit tests.
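To make the distinction between a counter overflow and a counter reset concrete, here is a minimal Python sketch. It is not the daemon's algorithm: it covers only unsigned 32-bit and 64-bit limits, and the `max_plausible_delta` threshold and the `incremental_delta` helper are assumptions introduced for illustration.

```python
UNSIGNED_32_MAX = 2**32 - 1
UNSIGNED_64_MAX = 2**64 - 1


def incremental_delta(previous, current, max_plausible_delta):
    """Return (delta, was_reset) for two samples of an ever-increasing counter."""
    if current >= previous:
        return current - previous, False
    # The counter went backwards. Assume it wrapped at the smallest limit
    # that can still hold the previous value.
    limit = UNSIGNED_32_MAX if previous <= UNSIGNED_32_MAX else UNSIGNED_64_MAX
    wrapped = (limit - previous) + current + 1
    if wrapped <= max_plausible_delta:
        return wrapped, False  # small jump across the limit: treat as overflow
    return 0, True  # implausibly large jump: treat as a counter reset
```

For example, `incremental_delta(UNSIGNED_32_MAX - 4, 5, 1000)` is read as an overflow of 10, while `incremental_delta(1_000_000, 3, 1000)` is read as a reset.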
|
|
* Adjust dbengine flushing speed more dynamically
* Added error tracking statistics for failure to flush events
* Added alarm for dbengine flushing errors
* Improved dbengine accounting for pages committed to be written
|
|
Netdata was not able to create charts when id and name were not the same;
this could happen when we were using templates. This commit fixes
this specific problem, but it does not fix the problems that we have with
dash and underscore
|
|
|
|
interval rounds to 0 (#7008)
|
|
* Remove page cache error detection and deadlock resolution
* Change page cache logic to disallow deadlocks due to too many API users
* Updated documentation
* Changed default and minimum page cache size values to 32 and 8 MiB respectively
|
|
* Increase database engine default page cache size to support up to 32K metrics out of the box
* Reduce mass flood effect of dbengine page cache alarm
* changed repeating notification to every hour
|
|
* health_nan: fix result
The expression evaluator was keeping the value zero when there was a wrong variable,
but according to our documentation the correct result would be NAN; this commit fixes this
* coverity_348642: remove unnecessary check
This commit removes an unnecessary check for variable in the alarms
* coverity_348642: remove fix
I am removing a change from other commit here
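The health_nan behaviour described above, where an unknown variable yields NAN rather than zero, can be sketched as follows. This Python fragment is illustrative only; `lookup` and `evaluate_sum` are hypothetical helpers and do not mirror the health expression evaluator's actual code.

```python
import math


def lookup(variables, name):
    # An unknown variable evaluates to NaN instead of silently becoming 0.
    return variables.get(name, math.nan)


def evaluate_sum(variables, names):
    # NaN propagates through arithmetic, so one bad name poisons the result.
    total = 0.0
    for name in names:
        total += lookup(variables, name)
    return total
```

Returning NaN makes a misspelled variable visible in the result, whereas a default of zero would produce a plausible but wrong number.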
|
|
* Basic functionality for dbengine stress test.
* Fix coverity defects
* Refactored dbengine stress test to be configurable
* Added benchmark results and evaluation in dbengine documentation
* Make dbengine the default memory mode
|
|
reader querying its metrics (#6979)
|
|
* dim_template_fix: Fix lock
We had a double lock before; this commit fixes this
* dim_template_fix: Fix order
This commit fixes the order of the process
* dim_template_fix: Return
I am returning to the first solution, because the others are generating:
* dim_template_fix: Try to lock
This solution tries to lock the host before moving forward
* dim_template_fix: Move chart lock
To avoid the chart being deleted while we are linking the alarm
I am moving back the chart lock
* dim_template_fix: Fix grammar
This commit fixes the grammar of an error message
* dim_template_fix: bring pattern
Bring the defined pattern to the code and use netdata_rwlock_trywrlock
* dim_template_fix: Fix format
This commit fixes a missing format
|
|
* fix remark lint Database engine
* fix remark lint of database README
* rewrap dbengine readme for consistency
* rewrap database README
* make character limit to 120 not 80
|
|
* Reduce CPU overhead when flushing dirty pages to disk
|
|
* health_connection: Comments inside Health Config
To try to understand better what is necessary to change and where it is necessary
to change anything inside the health, I commented the functions inside this file
* health_connection: Comments about Health in other files
This commit brings the rest of the comments that were missed for health
* health_connection: Comments on health_log
I had to append more comments on health_log
* health_connection: Create a new variable
New variable is created to work with foreach
* health_connection: Fix new option and doc
The first implementation of the 'foreach' had a problem, this fixes the error.
This commit also brings the updates for the documentation
* health_connection: Understanding health
This commit is to save the place that I am working, it has the map to understand all the alarm process
* health_connection: Update map
I changed the position of the error message to identify the correct place to add new alarms
* health_connection: End of simple alarm
This commit finishes what is necessary to bring the same lookup for different dimensions in one unique line
* health_connection: Documentation and template steps
This commit brings the documentation missed for template and comments to help in the next
step of apply a template to create an alarm.
* health_connection: Restoring
After some tests, it was detected that the alarms were not working as expected
* health_connection: Fix bug and bring dimension to template
This commit brings a fix for an old Netdata bug, before this the Netdata always tried to create
a new entry in an index with the same id raising an error.
It also brings the possibility to use 'foreach' in template
* health_connection: Fix cmake compilation
There was a problem with cmake compilation fixed by this commit
* health_connection: shell script
Finalize the shell script to test the PR
* health_connection: Remove debug message
During the development, I used some messages to understand the code
this commit removes the last message
* health_connection: Fix bugs
This commit fixes bugs reported by tests
* health_connection: Alarm working
This commit brings the necessary change for the alarms to work, but it is missing the unlink from the newest list
* health_connection: Template code written
This commit finishes the creation of alarm from template, but it was not tested yet.
* health_connection: Remove comments
I am removing the comments from this PR to bring them back later
* health_connection: Remove lines
Another commit to restore the files to how they were before being commented
* health_connection: New alarm and remove messages
I am bringing a new alarm to test template with SP and removing comments used during the development
* health_connection: Functional test review
After reviewing the functional test script, small adjustments were necessary to
test all the features available with the new version
* health_connection: Free structure
I am moving the free list to the correct place, the previous place was not safe
* health_connection: ShellCheck
This commit fixes the problems with shellcheck
* health_connection: Fix hash
This commit fixes the hash calculation that was using the wrong input
* health_connection: Fix message error
The system was showing a wrong message, because when we have foreach
the alarm created with a template is added in a second stage to the index
* health_connection: Fix documentation
In this commit I am fixing the grammar of the previous doc and bringing
two examples
* health_connection: Fix examples
This commit fixes the last two examples that were brought in this PR
* health_connection: Fix example doc
When I brought the correct grammar in the last commit, I lost a mark
* health_connection: Grammar fix
Fixing grammar of the documentation
* health_connection: Memory leak
This commit fixes the memory leak that was present in the PR
* health_connection: Reload
This commit fixes the problem where the alarms were not linked after
receiving a SIGUSR2
* health_connection: False Positive from codacy
Codacy was giving a false positive, I changed the function to avoid it.
* health_connection: dead code
Remove dead code from the code.
* health_connection: Memory Leak
Remove memory leak when cleaning up a simple pattern
* health_connection: Script format
With this commit I am formatting the last message to return
to the default color on the terminal
* health_connection: Script format 2
With this commit I am formatting the last message to return
to the default color on the terminal
* health_connection: Script format 3
With this commit I am formatting the error message to return
to the default color on the terminal
|
|
* Detect deadlock in dbengine page cache when there are too many metrics and print error message
* Resolve dbengine deadlock by dropping metrics when page cache is too small and define relevant alarms
* Changed printing deadlock errors to only happen once per dbengine instance
|
|
|
|
* Fix memory corruption during deallocation of page cache
* Refactored dataset generator in order to support the upcoming self-validating stress test and multithreading.
* Fix starvation in database engine loop when the command queues are continuously populated
* Fixing disk quota limits for dbengine dataset generator
|
|
* alarm_clear: Mapping
In this PR I mapped all the necessary steps to discover the solution for the ISSUE 6581
* alarm_clear: Documentation and fixes
This commit fixes the problems that were present in Netdata and it also updates
the documentation of the functions and Netdata.
* alarm_clear: shell script
The original implementation did not have a shell script, here I begin to fix this
* alarm_clear: shell script
It is necessary to verify why make is not producing the same binary as cmake and finish the changes in the script
* alarm_clear: adjust in health.c
I rewrote the health.c to be more readable, but I discovered the problems I had in the last few hours
were due to a kernel update
* alarm_clear: script changes
In this commit I am bringing the final version of the script that
tests the alarm repetition
* alarm_clear: script fix and remove comments
In this commit I am fixing the shellcheck errors and removing some debug messages
that were present in the code while I was developing
* alarm_clear: Format
The health.c had wrong tabulation, this PR brings back the pattern of space as tab for this file
* alarm_clear: Script
The script was using killall, which is no longer present in all Linux distributions;
this commit removes this and brings a new way to stop Netdata
* alarm_clear: return to previous tabulation
I am bringing back the old tabulation here and I will create a new PR
exclusively for this
* alarm_clear: Remove comments
I am removing comments from this PR to keep the focus in the major problem
* alarm_clear: Remove comments 2
I forgot one comment
* alarm_clear: New variable
I am appending a new variable in the check before the rebase, because the health.c changed in other file
has a direct relationship with what I did here until now
* alarm_clear: Fix clear repetition
With this last commit, I am bringing a new way to raise the clear alarm so that it no longer repeats:
with this fix, it is displayed one time when the alarm is cleared and it will be displayed again if, and only if,
the alarm was raised again.
|
|
engine. (#6731)
|
|
* Variable Granularity support for data collection in the dbengine.
* Variable Granularity support for data collection in the daemon.
* Added tests to validate the data being queried after having been collected by changing data collection interval
* Fix memory corruption
* Updated database engine documentation about data collection frequency behaviour
|
|
##### Summary
Implements feature #6054
Now requests like
http://localhost:19999/api/v1/chart?chart=example.random
http://localhost:19999/api/v1/data?chart=example.random&options=jsonwrap&options=showcustomvars
- return chart variables in their responses. Chart variables include only those with options set to RRDVAR_OPTION_CUSTOM_CHART_VAR
- for /api/v1/data requests chart variables are returned when parameter options=jsonwrap and options=showcustomvars
##### Component Name
[/database](https://github.com/netdata/netdata/tree/master/database/)
[/web/api/formatters](https://github.com/netdata/netdata/tree/master/web/api/formatters)
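As a small illustration of the `/api/v1/data` request above, the following Python sketch builds the query string with the `options` parameter repeated. `data_url` is a hypothetical helper, and the default host is simply the one used in the example URLs; the sketch only constructs the URL and does not contact an agent.

```python
from urllib.parse import urlencode, urlunsplit


def data_url(chart, options, host="localhost:19999"):
    # Repeat the "options" key once per option, as in the example request.
    query = urlencode([("chart", chart)] + [("options", opt) for opt in options])
    return urlunsplit(("http", host, "/api/v1/data", query, ""))
```

Calling `data_url("example.random", ["jsonwrap", "showcustomvars"])` reproduces the second example URL, which is the form that returns chart variables for data requests.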
|
|
* make remark access all directories
* detailed fix after autofix by remark lint
* cross check autofix for this set of files
* crosscheck more files
* crosschecking and small fixes
* crosscheck autofixed md files
|