author | Chris Akritidis <43294513+cakrit@users.noreply.github.com> | 2023-04-17 06:40:13 -0700 |
---|---|---|
committer | GitHub <noreply@github.com> | 2023-04-17 06:40:13 -0700 |
commit | 227988441cb0b275b179617990f8bfe82cf322e3 (patch) | |
tree | 397530ce5038180920a224375d917100fe325c19 | |
parent | a4f8dc6cde66f69760506e57958df1e03f4ea7b0 (diff) |
Add section for scaling parent nodes (#14915)
Update change-metrics-storage.md
-rw-r--r-- | docs/store/change-metrics-storage.md | 12 |
1 files changed, 12 insertions, 0 deletions

    diff --git a/docs/store/change-metrics-storage.md b/docs/store/change-metrics-storage.md
    index a7d7e77456..5e14fe2472 100644
    --- a/docs/store/change-metrics-storage.md
    +++ b/docs/store/change-metrics-storage.md
    @@ -127,6 +127,8 @@ The Netdata parent in our production infrastructure at the time of writing:
     - 3 tiers are used for retention
     - The `dbengine page cache size MB` in `netdata.conf` is configured to be 4GB
     
    +Netdata parents can end up collecting millions of metrics per second. See also [scaling dedicated parent nodes](#scaling-dedicated-parent-nodes).
    +
     The rule of thumb calculation for this set up gives us
     ```
     DBENGINE memory = 206,000 x 16 / 1024 MiB = 3,217 MiB = about 3 GiB
    @@ -193,3 +195,13 @@ All new child nodes are automatically transferred to the multihost dbengine inst
     space. If you want to migrate a child node from its legacy dbengine instance to the multihost dbengine instance, you
     must delete the instance's directory, which is located in `/var/cache/netdata/MACHINE_GUID/dbengine`, after stopping the
     Agent.
    +
    +## Scaling dedicated parent nodes
    +
    +When you use streaming in medium to large infrastructures, you can have potentially millions of metrics per second reaching each parent node.
    +In the lab we have reliably collected 1 million metrics/sec with 16cores and 32GB RAM.
    +
    +Our suggestion for scaling parents is to have them running on dedicated VMs, using a maximum of 50% of cpu, and ensuring you have enough RAM
    +for the desired retention. When your infrastructure can lead a parent to exceed these characteristics, split the load to multiple parents that
    +do not communicate with each other. With each child sending data to only one of the parents, you can still have replication, high availability,
    +and infrastructure level observability via the Netdata Cloud UI.
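
The added section recommends splitting the load across independent parents, with each child streaming to only one of them. A minimal Python sketch of that planning step follows. It is not part of the commit and not a Netdata API: `assign_children` and `PARENT_CAPACITY` are hypothetical names, and the capacity is an assumption taken from the lab figure quoted in the diff (1 million metrics/sec on 16 cores and 32GB RAM). The 50% CPU ceiling and the RAM needed for retention are not modeled here and would have to be checked separately.

```python
# Assumption: per-parent ceiling taken from the lab figure in the commit
# (1 million metrics/sec); adjust it for your own hardware.
PARENT_CAPACITY = 1_000_000


def assign_children(child_rates, capacity=PARENT_CAPACITY):
    """Place every child on exactly one parent without exceeding capacity.

    child_rates maps a (hypothetical) child name to its metrics/sec.
    Returns a list of parents, each a dict with its load and children.
    """
    parents = []
    # First-fit decreasing: handle the largest children first, ties by name.
    ordered = sorted(child_rates.items(), key=lambda kv: (-kv[1], kv[0]))
    for name, rate in ordered:
        if rate > capacity:
            raise ValueError(f"{name} alone exceeds one parent's capacity")
        for parent in parents:
            if parent["load"] + rate <= capacity:
                break  # first existing parent with room
        else:
            # No existing parent has room, so start a new one.
            parent = {"load": 0, "children": []}
            parents.append(parent)
        parent["load"] += rate
        parent["children"].append(name)
    return parents


if __name__ == "__main__":
    plan = assign_children(
        {"a": 600_000, "b": 500_000, "c": 400_000, "d": 300_000}
    )
    for index, parent in enumerate(plan):
        print(index, parent["load"], parent["children"])
```

First-fit decreasing is only one reasonable heuristic; it keeps the number of parents low but does not account for failover headroom, which a real deployment would want to reserve.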