author thiagoftsm <thiagoftsm@gmail.com> 2024-02-06 09:42:08 +0000
committer GitHub <noreply@github.com> 2024-02-06 11:42:08 +0200
commit 4d4459e87876e03b27cf9550ea892050afa9bc3c
tree d99df49a39445f109bdcaebe9a26db6180c38bb3 /src
parent fc1396bccdc4f4826692527319cdccea0009eb58
Update documentation (Replication DB) (#16816)
Co-authored-by: ilyam8 <ilya@netdata.cloud>
Diffstat (limited to 'src')
-rw-r--r-- src/streaming/README.md 103
1 files changed, 66 insertions, 37 deletions
diff --git a/src/streaming/README.md b/src/streaming/README.md
index a0a82cc1e5..fe2d7c77f9 100644
--- a/src/streaming/README.md
+++ b/src/streaming/README.md
@@ -30,42 +30,42 @@ node**. This file is automatically generated by Netdata the first time it is sta
#### `[stream]` section
-| Setting | Default | Description |
-|-------------------------------------------------|---------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| `enabled` | `no` | Whether this node streams metrics to any parent. Change to `yes` to enable streaming. |
-| [`destination`](#destination) | | A space-separated list of parent nodes to attempt to stream to, with the first available parent receiving metrics, using the following format: `[PROTOCOL:]HOST[%INTERFACE][:PORT][:SSL]`. [Read more &rarr;](#destination) |
-| `ssl skip certificate verification` | `yes` | If you want to accept self-signed or expired certificates, set to `yes` and uncomment. |
-| `CApath` | `/etc/ssl/certs/` | The directory where known certificates are found. Defaults to OpenSSL's default path. |
-| `CAfile` | `/etc/ssl/certs/cert.pem` | Add a parent node certificate to the list of known certificates in `CAPath`. |
-| `api key` | | The `API_KEY` to use as the child node. |
-| `timeout seconds` | `60` | The timeout to connect and send metrics to a parent. |
-| `default port` | `19999` | The port to use if `destination` does not specify one. |
-| [`send charts matching`](#send-charts-matching) | `*` | A space-separated list of [Netdata simple patterns](https://github.com/netdata/netdata/blob/master/src/libnetdata/simple_pattern/README.md) to filter which charts are streamed. [Read more &rarr;](#send-charts-matching) |
-| `buffer size bytes` | `10485760` | The size of the buffer to use when sending metrics. The default `10485760` equals a buffer of 10MB, which is good for 60 seconds of data. Increase this if you expect latencies higher than that. The buffer is flushed on reconnect. |
-| `reconnect delay seconds` | `5` | How long to wait until retrying to connect to the parent node. |
-| `initial clock resync iterations` | `60` | Sync the clock of charts for how many seconds when starting. |
-| `parent using h2o` | `no` | Set to yes if you are connecting to parent trough it's h2o webserver/port. Currently there is no reason to set this to `yes` unless you are testing the new h2o based netdata webserver. When production ready this will be set to `yes` as default. |
+| Setting | Default | Description |
+|-------------------------------------------------|---------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| `enabled` | `no` | Whether this node streams metrics to any parent. Change to `yes` to enable streaming. |
+| [`destination`](#destination) | | A space-separated list of parent nodes to attempt to stream to, with the first available parent receiving metrics, using the following format: `[PROTOCOL:]HOST[%INTERFACE][:PORT][:SSL]`. [Read more &rarr;](#destination) |
+| `ssl skip certificate verification` | `yes` | If you want to accept self-signed or expired certificates, set to `yes` and uncomment. |
+| `CApath` | `/etc/ssl/certs/` | The directory where known certificates are found. Defaults to OpenSSL's default path. |
+| `CAfile` | `/etc/ssl/certs/cert.pem` | Add a parent node certificate to the list of known certificates in `CAPath`. |
+| `api key` | | The `API_KEY` to use as the child node. |
+| `timeout seconds` | `60` | The timeout to connect and send metrics to a parent. |
+| `default port` | `19999` | The port to use if `destination` does not specify one. |
+| [`send charts matching`](#send-charts-matching) | `*` | A space-separated list of [Netdata simple patterns](https://github.com/netdata/netdata/blob/master/src/libnetdata/simple_pattern/README.md) to filter which charts are streamed. [Read more &rarr;](#send-charts-matching) |
+| `buffer size bytes` | `10485760` | The size of the buffer to use when sending metrics. The default `10485760` equals a buffer of 10MB, which is good for 60 seconds of data. Increase this if you expect latencies higher than that. The buffer is flushed on reconnect. |
+| `reconnect delay seconds` | `5` | How long to wait until retrying to connect to the parent node. |
+| `initial clock resync iterations` | `60` | Sync the clock of charts for how many seconds when starting. |
+| `parent using h2o` | `no` | Set to yes if you are connecting to parent trough it's h2o webserver/port. Currently there is no reason to set this to `yes` unless you are testing the new h2o based netdata webserver. When production ready this will be set to `yes` as default. |
### `[API_KEY]` and `[MACHINE_GUID]` sections
-| Setting | Default | Description |
-|-----------------------------------------------|----------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| `enabled` | `no` | Whether this API KEY enabled or disabled. |
+| Setting | Default | Description |
+|-----------------------------------------------|----------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| `enabled` | `no` | Whether this API KEY enabled or disabled. |
| [`allow from`](#allow-from) | `*` | A space-separated list of [Netdata simple patterns](https://github.com/netdata/netdata/blob/master/src/libnetdata/simple_pattern/README.md) matching the IPs of nodes that will stream metrics using this API key. [Read more &rarr;](#allow-from) |
-| `default history` | `3600` | The default amount of child metrics history to retain when using the `ram` memory mode. |
+| `default history` | `3600` | The default amount of child metrics history to retain when using the `ram` memory mode. |
| [`default memory mode`](#default-memory-mode) | `ram` | The [database](https://github.com/netdata/netdata/blob/master/src/database/README.md) to use for all nodes using this `API_KEY`. Valid settings are `dbengine`, `ram`, or `none`. [Read more &rarr;](#default-memory-mode) |
-| `health enabled by default` | `auto` | Whether alerts and notifications should be enabled for nodes using this `API_KEY`. `auto` enables alerts when the child is connected. `yes` enables alerts always, and `no` disables alerts. |
-| `default postpone alarms on connect seconds` | `60` | Postpone alerts and notifications for a period of time after the child connects. |
-| `default health log history` | `432000` | History of health log events (in seconds) kept in the database. |
-| `default proxy enabled` | | Route metrics through a proxy. |
-| `default proxy destination` | | Space-separated list of `IP:PORT` for proxies. |
-| `default proxy api key` | | The `API_KEY` of the proxy. |
-| `default send charts matching` | `*` | See [`send charts matching`](#send-charts-matching). |
-| `enable compression` | `yes` | Enable/disable stream compression. |
-| `enable replication` | `yes` | Enable/disable replication. |
-| `seconds to replicate` | `86400` | How many seconds of data to replicate from each child at a time |
-| `seconds per replication step` | `600` | The duration we want to replicate per each replication step. |
-| `is ephemeral node` | `no` | Indicate whether this child is an ephemeral node. An ephemeral node will become unavailable after the specified duration of "cleanup ephemeral hosts after secs" from the time of the node's last connection. |
+| `health enabled by default` | `auto` | Whether alerts and notifications should be enabled for nodes using this `API_KEY`. `auto` enables alerts when the child is connected. `yes` enables alerts always, and `no` disables alerts. |
+| `default postpone alarms on connect seconds` | `60` | Postpone alerts and notifications for a period of time after the child connects. |
+| `default health log history` | `432000` | History of health log events (in seconds) kept in the database. |
+| `default proxy enabled` | | Route metrics through a proxy. |
+| `default proxy destination` | | Space-separated list of `IP:PORT` for proxies. |
+| `default proxy api key` | | The `API_KEY` of the proxy. |
+| `default send charts matching` | `*` | See [`send charts matching`](#send-charts-matching). |
+| `enable compression` | `yes` | Enable/disable stream compression. |
+| `enable replication` | `yes` | Enable/disable replication. |
+| `seconds to replicate` | `86400` | How many seconds of data to replicate from each child at a time |
+| `seconds per replication step` | `600` | The duration we want to replicate per each replication step. |
+| `is ephemeral node` | `no` | Indicate whether this child is an ephemeral node. An ephemeral node will become unavailable after the specified duration of "cleanup ephemeral hosts after secs" from the time of the node's last connection. |
#### `destination`
@@ -148,13 +148,13 @@ cache size` and `dbengine multihost disk space` settings in the `[global]` secti
### `netdata.conf`
-| Setting | Default | Description |
-|--------------------------------------------|-------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| `[global]` section | | |
+| Setting | Default | Description |
+|--------------------------------------------|-------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| `[global]` section | | |
| `memory mode` | `dbengine` | Determines the [database type](https://github.com/netdata/netdata/blob/master/src/database/README.md) to be used on that node. Other options settings include `none`, and `ram`. `none` disables the database at this host. This also disables alerts and notifications, as those can't run without a database. |
-| `[web]` section | | |
-| `mode` | `static-threaded` | Determines the [web server](https://github.com/netdata/netdata/blob/master/web/server/README.md) type. The other option is `none`, which disables the dashboard, API, and registry. |
-| `accept a streaming request every seconds` | `0` | Set a limit on how often a parent node accepts streaming requests from child nodes. `0` equals no limit. If this is set, you may see `... too busy to accept new streaming request. Will be allowed in X secs` in Netdata's `error.log`. |
+| `[web]` section | | |
+| `mode` | `static-threaded` | Determines the [web server](https://github.com/netdata/netdata/blob/master/web/server/README.md) type. The other option is `none`, which disables the dashboard, API, and registry. |
+| `accept a streaming request every seconds` | `0` | Set a limit on how often a parent node accepts streaming requests from child nodes. `0` equals no limit. If this is set, you may see `... too busy to accept new streaming request. Will be allowed in X secs` in Netdata's `error.log`. |
### Basic use cases
@@ -488,6 +488,35 @@ You can monitor the replication process in two ways:
1. **Netdata Monitoring**: access the Netdata Monitoring section and look for the Replication charts.
2. **Streaming Function**: use the Streaming function (Top) to see the replication status of children nodes. This function provides real-time insights into the replication status of each child node.
+### Replication history
+
+Replication history in [dbengine](https://github.com/netdata/netdata/blob/master/src/database/README.md) mode is limited
+by [Tier 0 retention](https://learn.netdata.cloud/docs/configuring/optimizing-metrics-database/change-how-long-netdata-stores-metrics#effect-of-storage-tiers-and-disk-space-on-retention):
+
+- Child instances replicate only Tier 0 data.
+- Parent instance calculates higher-level tiers using Tier 0 as the basis.
+
+Extend replication history by increasing Tier 0 retention.
+
+Checking Tier 0 retention:
+
+- Using a web browser:
+ - Navigate to `http://{CHILD_IP}:19999/api/v2/node_instances`.
+ - Locate the `expected_retention` value for Tier 0 of your Agent.
+ - Convert the value from seconds to days for a more meaningful representation.
+- Using `curl` and `jq`:
+ - Execute the following command:
+ ```bash
+ $ curl -s "http://{CHILD_IP}:19999/api/v2/node_instances" | jq '.agents[] | {nm, retention: (.db_size[0].retention / 86400 | .*100 | round/100) }'
+ ```
+ - Example output:
+ ```json
+ {
+ "nm": "myhost",
+ "retention": 12.73
+ }
+ ```
+
## Troubleshooting
Both parent and child nodes log information at `/var/log/netdata/error.log`.
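
Note on the added "Replication history" section: the `jq` filter in that hunk reads `.db_size[0].retention` for each entry in `agents` and divides by 86400 to get days, rounded to two decimals. The same arithmetic can be sketched in Python as below. This is only an illustration: the function name `tier0_retention_days` and the sample payload are hypothetical, and the payload is trimmed to the fields the `jq` filter touches rather than taken from a real Agent response.

```python
import json

SECONDS_PER_DAY = 86400


def tier0_retention_days(response: dict) -> list:
    """Return each agent's name and Tier 0 retention in days (two decimals).

    Assumes the response has the shape the jq filter in the diff relies on:
    an "agents" list whose items carry "nm" and a "db_size" list, with
    index 0 holding the Tier 0 entry and its "retention" in seconds.
    """
    results = []
    for agent in response["agents"]:
        seconds = agent["db_size"][0]["retention"]
        results.append(
            {"nm": agent["nm"], "retention": round(seconds / SECONDS_PER_DAY, 2)}
        )
    return results


# Hypothetical payload for illustration only, not captured from a real node.
sample = json.loads(
    '{"agents": [{"nm": "myhost", "db_size": [{"retention": 1100000}]}]}'
)
print(tier0_retention_days(sample))
```

In practice the input would come from the `/api/v2/node_instances` endpoint named in the diff; the sketch leaves the HTTP call out so that it runs without a reachable Agent.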