authorTasos Katsoulas <tasos@netdata.cloud>2022-07-14 17:16:12 +0300
committerGitHub <noreply@github.com>2022-07-14 17:16:12 +0300
commitbc5ba4f8912a5436a458d5015f11c06bd046c01a (patch)
tree266748494d2acb81827bf73a151ec6e56006e1a1
parentbd5f778838ff6b4206509a19b967f7b6eb6b16c7 (diff)
Update docs on metric storage (#13327)
This PR:
- Explains the new tiering mechanism.
- Housekeeping docs about the Agent's database options.
- Updates all the configuration options for the `dbengine`.
- Provides a new way for users to calculate the space they need for their metric storage needs (via a spreadsheet).

Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud>
Co-authored-by: DShreve2 <david@netdata.cloud>
-rw-r--r-- daemon/config/README.md                      | 42
-rw-r--r-- database/README.md                           | 213
-rw-r--r-- database/engine/README.md                    | 206
-rw-r--r-- docs/store/change-metrics-storage.md         | 95
-rw-r--r-- docs/store/distributed-data-architecture.md  | 67
5 files changed, 329 insertions, 294 deletions
diff --git a/daemon/config/README.md b/daemon/config/README.md
index 32d64405b5..7b4d27ecff 100644
--- a/daemon/config/README.md
+++ b/daemon/config/README.md
@@ -82,21 +82,33 @@ Please note that your data history will be lost if you have modified `history` p
### [db] section options
-| setting | default | info |
-|:----------------------------------:|:----------:|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| mode | `dbengine` | `dbengine`: The default for long-term metrics storage with efficient RAM and disk usage. Can be extended with `dbengine page cache size MB` and `dbengine disk space MB`. <br />`save`: Netdata will save its round robin database on exit and load it on startup. <br />`map`: Cache files will be updated in real-time. Not ideal for systems with high load or slow disks (check `man mmap`). <br />`ram`: The round-robin database will be temporary and it will be lost when Netdata exits. <br />`none`: Disables the database at this host, and disables health monitoring entirely, as that requires a database of metrics. |
-| retention | `3600` | Used with `mode = save/map/ram/alloc`, not the default `mode = dbengine`. This number reflects the number of entries the `netdata` daemon will by default keep in memory for each chart dimension. Check [Memory Requirements](/database/README.md) for more information. |
-| update every | `1` | The frequency in seconds, for data collection. For more information see the [performance guide](/docs/guides/configure/performance.md). |
-| dbengine page cache size MB | 32 | Determines the amount of RAM in MiB that is dedicated to caching Netdata metric values. |
-| dbengine disk space MB | 256 | Determines the amount of disk space in MiB that is dedicated to storing Netdata metric values and all related metadata describing them. |
-| dbengine multihost disk space MB | 256 | Same functionality as `dbengine disk space MB`, but includes support for storing metrics streamed to a parent node by its children. Can be used in single-node environments as well. |
-| memory deduplication (ksm) | `yes` | When set to `yes`, Netdata will offer its in-memory round robin database and the dbengine page cache to kernel same page merging (KSM) for deduplication. For more information check [Memory Deduplication - Kernel Same Page Merging - KSM](/database/README.md#ksm) |
-| cleanup obsolete charts after secs | `3600` | See [monitoring ephemeral containers](/collectors/cgroups.plugin/README.md#monitoring-ephemeral-containers), also sets the timeout for cleaning up obsolete dimensions |
-| gap when lost iterations above | `1` | |
-| cleanup orphan hosts after secs | `3600` | How long to wait until automatically removing from the DB a remote Netdata host (child) that is no longer sending data. |
-| delete obsolete charts files | `yes` | See [monitoring ephemeral containers](/collectors/cgroups.plugin/README.md#monitoring-ephemeral-containers), also affects the deletion of files for obsolete dimensions |
-| delete orphan hosts files | `yes` | Set to `no` to disable non-responsive host removal. |
-| enable zero metrics | `no` | Set to `yes` to show charts when all their metrics are zero. |
+| setting | default | info |
+|:---------------------------------------------:|:----------:|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| mode | `dbengine` | `dbengine`: The default for long-term metrics storage with efficient RAM and disk usage. Can be extended with `dbengine page cache size MB` and `dbengine disk space MB`. <br />`save`: Netdata will save its round robin database on exit and load it on startup. <br />`map`: Cache files will be updated in real-time. Not ideal for systems with high load or slow disks (check `man mmap`). <br />`ram`: The round-robin database will be temporary and it will be lost when Netdata exits. <br />`none`: Disables the database at this host, and disables health monitoring entirely, as that requires a database of metrics. |
+| retention | `3600` | Used with `mode = save/map/ram/alloc`, not the default `mode = dbengine`. This number reflects the number of entries the `netdata` daemon will by default keep in memory for each chart dimension. Check [Memory Requirements](/database/README.md) for more information. |
+| storage tiers | `1` | The number of storage tiers you want to have in your dbengine. Check the tiering mechanism in the [dbengine's reference](/database/engine/README.md#tiering). You can have up to 5 tiers of data (including _Tier 0_), so this number ranges from 1 to 5. |
+| dbengine page cache size MB | `32` | Determines the amount of RAM in MiB that is dedicated to caching _Tier 0_ Netdata metric values. |
+| dbengine tier **`N`** page cache size MB | `32` | Determines the amount of RAM in MiB that is dedicated to caching Netdata metric values of tier **`N`**. <br /> `N belongs to [1..4]` |
+| dbengine disk space MB | `256` | Determines the amount of disk space in MiB that is dedicated to storing _Tier 0_ Netdata metric values and all related metadata describing them. This option is available **only for legacy configuration** (`Agent v1.23.2 and prior`). |
+| dbengine multihost disk space MB | `256` | Same functionality as `dbengine disk space MB`, but includes support for storing metrics streamed to a parent node by its children. Can be used in single-node environments as well. This setting is only for _Tier 0_ metrics. |
+| dbengine tier **`N`** multihost disk space MB | `256` | Same functionality as `dbengine multihost disk space MB`, but stores metrics of the **`N`** tier (both parent node and its children). Can be used in single-node environments as well. <br /> `N belongs to [1..4]` |
+| update every | `1` | The frequency in seconds, for data collection. For more information see the [performance guide](/docs/guides/configure/performance.md). These metrics are stored as _Tier 0_ data. Explore the tiering mechanism in the [dbengine's reference](/database/engine/README.md#tiering). |
+| dbengine tier **`N`** update every iterations | `60` | The downsampling factor of each tier relative to the tier below it. Each point of tier **`N`** aggregates this many (`60` by default) points of tier **`N-1`**. This setting can take values from `2` up to `255`. <br /> `N belongs to [1..4]` |
+| dbengine tier **`N`** back fill | `New` | Specifies the strategy of recreating missing data on each Tier from the exact lower Tier. <br /> `New`: Checks the latest point on each Tier and saves new points to it only if the exact lower Tier has available points for its observation window (`dbengine tier N update every iterations` window). <br /> `none`: No back filling is applied. <br /> `N belongs to [1..4]` |
+| memory deduplication (ksm) | `yes` | When set to `yes`, Netdata will offer its in-memory round robin database and the dbengine page cache to kernel same page merging (KSM) for deduplication. For more information check [Memory Deduplication - Kernel Same Page Merging - KSM](/database/README.md#ksm) |
+| cleanup obsolete charts after secs | `3600` | See [monitoring ephemeral containers](/collectors/cgroups.plugin/README.md#monitoring-ephemeral-containers), also sets the timeout for cleaning up obsolete dimensions |
+| gap when lost iterations above | `1` | |
+| cleanup orphan hosts after secs | `3600` | How long to wait until automatically removing from the DB a remote Netdata host (child) that is no longer sending data. |
+| delete obsolete charts files | `yes` | See [monitoring ephemeral containers](/collectors/cgroups.plugin/README.md#monitoring-ephemeral-containers), also affects the deletion of files for obsolete dimensions |
+| delete orphan hosts files | `yes` | Set to `no` to disable non-responsive host removal. |
+| enable zero metrics | `no` | Set to `yes` to show charts when all their metrics are zero. |
+
+:::info
+
+The product of the `dbengine tier N update every iterations` values across all **enabled** tiers must be less than `65535`.
+
+:::
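+
+For example, with `storage tiers = 3` and the default `dbengine tier N update every iterations = 60` on both higher tiers, the product is 60 x 60 = 3600, well below the `65535` limit.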
+
### [directories] section options
diff --git a/database/README.md b/database/README.md
index 4873fefd80..119e21336b 100644
--- a/database/README.md
+++ b/database/README.md
@@ -7,199 +7,138 @@ custom_edit_url: https://github.com/netdata/netdata/edit/master/database/README.
# Database
Netdata is fully capable of long-term metrics storage, at per-second granularity, via its default database engine
-(`dbengine`). But to remain as flexible as possible, Netdata supports a number of types of metrics storage:
+(`dbengine`). But to remain as flexible as possible, Netdata supports several storage options:
1. `dbengine`, (the default) data are in database files. The [Database Engine](/database/engine/README.md) works like a
- traditional database. There is some amount of RAM dedicated to data caching and indexing and the rest of the data
- reside compressed on disk. The number of history entries is not fixed in this case, but depends on the configured
- disk space and the effective compression ratio of the data stored. This is the **only mode** that supports changing
- the data collection update frequency (`update every`) **without losing** the previously stored metrics. For more
- details see [here](/database/engine/README.md).
+ traditional database. There is some amount of RAM dedicated to data caching and indexing and the rest of the data
+ reside compressed on disk. The number of history entries is not fixed in this case, but depends on the configured
+ disk space and the effective compression ratio of the data stored. This is the **only mode** that supports changing
+ the data collection update frequency (`update every`) **without losing** the previously stored metrics. For more
+ details see [here](/database/engine/README.md).
-2. `ram`, data are purely in memory. Data are never saved on disk. This mode uses `mmap()` and supports [KSM](#ksm).
+2. `ram`, data are purely in memory. Data are never saved on disk. This mode uses `mmap()` and supports [KSM](#ksm).
-3. `save`, data are only in RAM while Netdata runs and are saved to / loaded from disk on Netdata
- restart. It also uses `mmap()` and supports [KSM](#ksm).
+3. `save`, data are only in RAM while Netdata runs and are saved to / loaded from disk on Netdata restart. It also
+ uses `mmap()` and supports [KSM](#ksm).
-4. `map`, data are in memory mapped files. This works like the swap. Keep in mind though, this will have a constant
- write on your disk. When Netdata writes data on its memory, the Linux kernel marks the related memory pages as dirty
- and automatically starts updating them on disk. Unfortunately we cannot control how frequently this works. The Linux
- kernel uses exactly the same algorithm it uses for its swap memory. Check below for additional information on
- running a dedicated central Netdata server. This mode uses `mmap()` but does not support [KSM](#ksm).
+4. `map`, data are in memory mapped files. This works like swap. When Netdata writes data to its memory, the Linux
+   kernel marks the related memory pages as dirty and automatically starts updating them on disk. Unfortunately we
+   cannot control how frequently this happens. The Linux kernel uses exactly the same algorithm it uses for its swap
+   memory. This mode uses `mmap()` but does not support [KSM](#ksm). _Keep in mind though, this option will write to
+   your disk constantly._
-5. `none`, without a database (collected metrics can only be streamed to another Netdata).
+5. `alloc`, like `ram` but it uses `calloc()` and does not support [KSM](#ksm). This mode is the fallback for all others
+ except `none`.
-6. `alloc`, like `ram` but it uses `calloc()` and does not support [KSM](#ksm). This mode is the fallback for all
- others except `none`.
+6. `none`, without a database (collected metrics can only be streamed to another Netdata).
-You can select the database mode by editing `netdata.conf` and setting:
-
-```conf
-[db]
- # dbengine (default), ram, save (the default if dbengine not available), map (swap like), none, alloc
- mode = dbengine
-```
-
-## Running Netdata in embedded devices
-
-Embedded devices usually have very limited RAM resources available.
-
-There are 2 settings for you to tweak:
-
-1. `[db].update every`, which controls the data collection frequency
-2. `[db].retention`, which controls the size of the database in memory (except for `[db].mode = dbengine`)
-
-By default `[db].update every = 1` and `[db].retention = 3600`. This gives you an hour of data with per second updates.
-
-If you set `[db].update every = 2` and `[db].retention = 1800`, you will still have an hour of data, but collected once every 2
-seconds. This will **cut in half** both CPU and RAM resources consumed by Netdata. Of course experiment a bit. On very
-weak devices you might have to use `[db].update every = 5` and `[db].retention = 720` (still 1 hour of data, but 1/5 of the CPU and
-RAM resources).
-
-You can also disable [data collection plugins](/collectors/README.md) you don't need. Disabling such plugins will also free both
-CPU and RAM resources.
-
-## Running a dedicated parent Netdata server
-
-Netdata allows streaming data between Netdata nodes in real-time. This allows having one or more parent Netdata servers that will maintain
-the entire database for all the nodes that connect to them (their children), and will also run health checks/alarms for all these nodes.
-
-### map
+## Which database mode to use
-In this mode, the database of Netdata is stored in memory mapped files. Netdata continues to read and write the database
-in memory, but the kernel automatically loads and saves memory pages from/to disk.
+The default mode `[db].mode = dbengine` has been designed to scale for longer retentions and is the only mode suitable
+for parent Agents in _Parent - Child_ setups.
-**We suggest _not_ to use this mode on nodes that run other applications.** There will always be dirty memory to be
-synced and this syncing process may influence the way other applications work. This mode however is useful when we need
-a parent Netdata server that would normally need huge amounts of memory.
+The other available database modes are designed to minimize resource utilization and should only be considered on
+the children side of [Parent - Child](/docs/metrics-storage-management/how-streaming-works) setups, and only when
+resource constraints are very strict.
-There are a few kernel options that provide finer control on the way this syncing works. But before explaining them, a
-brief introduction of how Netdata database works is needed.
+So,
-For each chart, Netdata maps the following files:
+- On a single node setup, use `[db].mode = dbengine`.
+- On a [Parent - Child](/docs/metrics-storage-management/how-streaming-works) setup, use `[db].mode = dbengine` on the
+  parent to increase retention, and a more resource efficient mode for the children to minimize resource utilization:
+  `dbengine` with light retention settings, or the `save`, `ram` or `none` modes, as sketched below.
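+
+A minimal sketch of a child's `netdata.conf` under this recommendation (the retention value is illustrative):
+
+```conf
+[db]
+    # lightweight non-dbengine mode for a child that streams to a parent
+    mode = ram
+    # 3600 entries x 1 second updates = 1 hour of data in memory
+    retention = 3600
+```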
-1. `chart/main.db`, this is the file that maintains chart information. Every time data are collected for a chart, this
- is updated.
-2. `chart/dimension_name.db`, this is the file for each dimension. At its beginning there is a header, followed by the
- round robin database where metrics are stored.
+## Choose your database mode
-So, every time Netdata collects data, the following pages will become dirty:
-
-1. the chart file
-2. the header part of all dimension files
-3. if the collected metrics are stored far enough in the dimension file, another page will become dirty, for each
- dimension
-
-Each page in Linux is 4KB. So, with 200 charts and 1000 dimensions, there will be 1200 to 2200 4KB pages dirty pages
-every second. Of course 1200 of them will always be dirty (the chart header and the dimensions headers) and 1000 will be
-dirty for about 1000 seconds (4 bytes per metric, 4KB per page, so 1000 seconds, or 16 minutes per page).
-
-Hopefully, the Linux kernel does not sync all these data every second. The frequency they are synced is controlled by
-`/proc/sys/vm/dirty_expire_centisecs` or the `sysctl` `vm.dirty_expire_centisecs`. The default on most systems is 3000
-(30 seconds).
-
-On a busy server centralizing metrics from 20+ servers you will experience this:
-
-![image](https://cloud.githubusercontent.com/assets/2662304/23834750/429ab0dc-0764-11e7-821a-d7908bc881ac.png)
-
-As you can see, there is quite some stress (this is `iowait`) every 30 seconds.
-
-A simple solution is to increase this time to 10 minutes (60000). This is the same system with this setting in 10
-minutes:
-
-![image](https://cloud.githubusercontent.com/assets/2662304/23834784/d2304f72-0764-11e7-8389-fb830ffd973a.png)
-
-Of course, setting this to 10 minutes means that data on disk might be up to 10 minutes old if you get an abnormal
-shutdown.
+You can select the database mode by editing `netdata.conf` and setting:
-There are 2 more options to tweak:
+```conf
+[db]
+ # dbengine (default), ram, save (the default if dbengine not available), map (swap like), none, alloc
+ mode = dbengine
+```
-1. `dirty_background_ratio`, by default `10`.
-2. `dirty_ratio`, by default `20`.
+## Netdata Longer Metrics Retention
-These control the amount of memory that should be dirty for disk syncing to be triggered. On dedicated Netdata servers,
-you can use: `80` and `90` respectively, so that all RAM is given to Netdata.
+Metrics retention is controlled only by the disk space allocated to storing metrics. But longer retention also affects
+the memory and CPU required by the Agent to query longer timeframes.
-With these settings, you can expect a little `iowait` spike once every 10 minutes and in case of system crash, data on
-disk will be up to 10 minutes old.
+Since Netdata Agents usually run at the edge, on production systems, Netdata Agent **parents** should be considered.
+When having a [**parent - child**](/docs/metrics-storage-management/how-streaming-works.md) setup, the child (the
+Netdata Agent running on a production system) delegates all of its functions, including longer metrics retention and
+querying, to the parent node, which can dedicate more resources to this task. A single Netdata Agent parent can
+centralize multiple child Netdata Agents (dozens, hundreds, or even thousands depending on its available resources).
-![image](https://cloud.githubusercontent.com/assets/2662304/23835030/ba4bf506-0768-11e7-9bc6-3b23e080c69f.png)
+## Running Netdata on embedded devices
-To have these settings automatically applied on boot, create the file `/etc/sysctl.d/netdata-memory.conf` with these
-contents:
+Embedded devices typically have very limited RAM resources available.
-```conf
-vm.dirty_expire_centisecs = 60000
-vm.dirty_background_ratio = 80
-vm.dirty_ratio = 90
-vm.dirty_writeback_centisecs = 0
-```
+There are two settings for you to configure:
-There is another mode to help overcome the memory size problem. What is **most interesting for this setup** is
-`[db].mode = dbengine`.
+1. `[db].update every`, which controls the data collection frequency
+2. `[db].retention`, which controls the size of the database in memory (except for `[db].mode = dbengine`)
-### dbengine
+By default `[db].update every = 1` and `[db].retention = 3600`. This gives you an hour of data with per second updates.
-In this mode, the database of Netdata is stored in database files. The [Database Engine](/database/engine/README.md)
-works like a traditional database. There is some amount of RAM dedicated to data caching and indexing and the rest of
-the data reside compressed on disk. The number of history entries is not fixed in this case, but depends on the
-configured disk space and the effective compression ratio of the data stored.
+If you set `[db].update every = 2` and `[db].retention = 1800`, you will still have an hour of data, but collected once
+every 2 seconds. This will **cut in half** both the CPU and RAM resources consumed by Netdata. Of course, experiment a
+bit to find the right setting. On very weak devices you might have to use `[db].update every = 5` and
+`[db].retention = 720` (still 1 hour of data, but 1/5 of the CPU and RAM resources).
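+
+As a sketch, the weak-device settings above would look like this in `netdata.conf`:
+
+```conf
+[db]
+    # collect once every 5 seconds
+    update every = 5
+    # 720 entries x 5 seconds = 1 hour of data
+    retention = 720
+```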
-We suggest to use **this** mode on nodes that also run other applications. The Database Engine uses direct I/O to avoid
-polluting the OS filesystem caches and does not generate excessive I/O traffic so as to create the minimum possible
-interference with other applications. Using mode `dbengine` we can overcome most memory restrictions. For more
-details see [here](/database/engine/README.md).
+You can also disable [data collection plugins](/collectors/README.md) that you don't need. Disabling such plugins will also
+free both CPU and RAM resources.
-## KSM
+## Memory optimizations
-Netdata offers all its in-memory database to kernel for deduplication.
+### KSM
-In the past KSM has been criticized for consuming a lot of CPU resources. Although this is true when KSM is used for
-deduplicating certain applications, it is not true with netdata, since the Netdata memory is written very infrequently
-(if you have 24 hours of metrics in netdata, each byte at the in-memory database will be updated just once per day).
+KSM performs memory deduplication by scanning through main memory for physical pages that have identical content, and
+identifies the virtual pages that are mapped to those physical pages. It leaves one page unchanged, and re-maps each
+duplicate page to point to the same physical page. Netdata offers all of its in-memory database to the kernel for
+deduplication.
-KSM is a solution that will provide 60+% memory savings to Netdata.
+In the past, KSM has been criticized for consuming a lot of CPU resources. This is true when KSM is used for
+deduplicating certain applications, but it is not true for Netdata. The Agent's memory is written very infrequently
+(if you have 24 hours of metrics in Netdata, each byte of the in-memory database will be updated just once per day). KSM
+is a solution that can provide 60%+ memory savings to Netdata.
### Enable KSM in kernel
-You need to run a kernel compiled with:
+To enable KSM in the kernel, you need to run a kernel compiled with the following:
```sh
CONFIG_KSM=y
```
-When KSM is enabled at the kernel is just available for the user to enable it.
+When KSM is enabled in the kernel, it is merely made available; the user must still turn it on.
-So, if you build a kernel with `CONFIG_KSM=y` you will just get a few files in `/sys/kernel/mm/ksm`. Nothing else
-happens. There is no performance penalty (apart I guess from the memory this code occupies into the kernel).
+If you build a kernel with `CONFIG_KSM=y`, you will just get a few files in `/sys/kernel/mm/ksm`. Nothing else
+happens. There is no performance penalty (apart from the memory this code occupies in the kernel).
The files that `CONFIG_KSM=y` offers include:
-- `/sys/kernel/mm/ksm/run` by default `0`. You have to set this to `1` for the
- kernel to spawn `ksmd`.
-- `/sys/kernel/mm/ksm/sleep_millisecs`, by default `20`. The frequency ksmd
- should evaluate memory for deduplication.
-- `/sys/kernel/mm/ksm/pages_to_scan`, by default `100`. The amount of pages
- ksmd will evaluate on each run.
+- `/sys/kernel/mm/ksm/run`, by default `0`. You have to set this to `1` for the kernel to spawn `ksmd`.
+- `/sys/kernel/mm/ksm/sleep_millisecs`, by default `20`. How frequently `ksmd` should evaluate memory for deduplication.
+- `/sys/kernel/mm/ksm/pages_to_scan`, by default `100`. The number of pages `ksmd` will evaluate on each run.
So, by default `ksmd` is just disabled. It will not harm performance and the user/admin can control the CPU resources
-he/she is willing `ksmd` to use.
+they are willing to let `ksmd` use.
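+
+To inspect the current KSM state on a running system, you can read these files directly; a simple shell sketch:
+
+```sh
+# print each KSM control/status file along with its value
+grep -H '' /sys/kernel/mm/ksm/*
+```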
### Run `ksmd` kernel daemon
-To activate / run `ksmd` you need to run:
+To activate / run `ksmd`, you need to run the following:
```sh
echo 1 >/sys/kernel/mm/ksm/run
echo 1000 >/sys/kernel/mm/ksm/sleep_millisecs
```
-With these settings ksmd does not even appear in the running process list (it will run once per second and evaluate 100
+With these settings, ksmd does not even appear in the running process list (it will run once per second and evaluate 100
pages for de-duplication).
Put the above lines in your boot sequence (`/etc/rc.local` or equivalent) to have `ksmd` run at boot.
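+
+On systemd-based systems without an `/etc/rc.local`, one equivalent is a small oneshot unit; this is a sketch and the
+unit name and path are arbitrary:
+
+```conf
+# /etc/systemd/system/ksm-enable.service (hypothetical path)
+[Unit]
+Description=Enable kernel same-page merging (KSM)
+
+[Service]
+Type=oneshot
+ExecStart=/bin/sh -c 'echo 1 >/sys/kernel/mm/ksm/run; echo 1000 >/sys/kernel/mm/ksm/sleep_millisecs'
+
+[Install]
+WantedBy=multi-user.target
+```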
-## Monitoring Kernel Memory de-duplication performance
+### Monitoring Kernel Memory de-duplication performance
Netdata will create charts for kernel memory de-duplication performance, like this:
diff --git a/database/engine/README.md b/database/engine/README.md
index e48bee59b4..c67e400f4c 100644
--- a/database/engine/README.md
+++ b/database/engine/README.md
@@ -6,74 +6,114 @@ custom_edit_url: https://github.com/netdata/netdata/edit/master/database/engine/
# Database engine
-The Database Engine works like a traditional database. It dedicates a certain amount of RAM to data caching and
-indexing, while the rest of the data resides compressed on disk. Unlike other [database modes](/database/README.md), the
-amount of historical metrics stored is based on the amount of disk space you allocate and the effective compression
+The Database Engine works like a traditional time series database. Unlike other [database modes](/database/README.md),
+the amount of historical metrics stored is based on the amount of disk space you allocate and the effective compression
ratio, not a fixed number of metrics collected.
-By using both RAM and disk space, the database engine allows for long-term storage of per-second metrics inside of the
-Agent itself.
+## Tiering
-In addition, the dbengine is the only mode that supports changing the data collection update frequency
-(`update every`) without losing the metrics your Agent already gathered and stored.
+Tiering is a mechanism of providing multiple tiers of data with
+different [granularity on metrics](/docs/store/distributed-data-architecture.md#granularity-of-metrics).
-## Configuration
+For Netdata Agents with version `netdata-1.35.0.138.nightly` and greater, `dbengine` supports Tiering, allowing almost
+unlimited retention of data.
-To use the database engine, open `netdata.conf` and set `[db].mode` to `dbengine`.
-```conf
+### Metric size
+
+Every tier downsamples the exact lower tier (lower tiers have greater resolution). You can have up to 5
+tiers **[0..4]** of data (including Tier 0, which has the highest resolution).
+
+Tier 0 is the default that was always available in `dbengine` mode. Tier 1 is the first level of aggregation, Tier 2 is
+the second, and so on.
+
+Metrics on all tiers except _Tier 0_ also store the following five additional values for every point, for accurate
+representation:
+
+1. The `sum` of the points aggregated
+2. The `min` of the points aggregated
+3. The `max` of the points aggregated
+4. The `count` of the points aggregated (could be constant, but it may not be due to gaps in data collection)
+5. The `anomaly_count` of the points aggregated (how many of the aggregated points were found anomalous)
+
+Among `min`, `max` and `sum`, the correct value is chosen based on the user query. `average` is calculated on the fly at
+query time.
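+
+As a hypothetical illustration: a _Tier 1_ point that aggregates 60 per-second _Tier 0_ points carries their `sum`,
+`min`, `max`, `count` and `anomaly_count`; a query asking for the average over that minute simply computes
+`sum / count` at query time.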
+
+### Tiering in a nutshell
+
+The `dbengine` is capable of retaining metrics for years. To further understand the `dbengine` tiering mechanism, let's
+explore the following configuration.
+
+```conf
[db]
mode = dbengine
+
+ # per second data collection
+ update every = 1
+
+ # enables Tier 1 and Tier 2, Tier 0 is always enabled in dbengine mode
+ storage tiers = 3
+
+ # Tier 0, per second data for a week
+ dbengine multihost disk space MB = 1100
+
+ # Tier 1, per minute data for a month
+ dbengine tier 1 multihost disk space MB = 330
+
+ # Tier 2, per hour data for a year
+ dbengine tier 2 multihost disk space MB = 67
```
-To configure the database engine, look for the `dbengine page cache size MB` and `dbengine multihost disk space MB` settings in the
-`[db]` section of your `netdata.conf`. The Agent ignores the `[db].retention` setting when using the dbengine.
+For 2000 metrics, collected every second and retained for a week, Tier 0 needs: 1 byte x 2000 metrics x 3600 secs per
+hour x 24 hours per day x 7 days per week ≈ 1100MB.
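+
+Applying the same arithmetic to the higher tiers of this example (assuming, as the example's sizes imply, roughly 4
+bytes per point on the higher tiers, since each point also carries the extra aggregate values):
+
+- Tier 1, per minute data for a month: 4 bytes x 2000 metrics x 60 mins per hour x 24 hours per day x 30 days per month ≈ 330MB.
+- Tier 2, per hour data for a year: 4 bytes x 2000 metrics x 24 hours per day x 365 days per year ≈ 67MB.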
-```conf
-[db]
- dbengine page cache size MB = 32
- dbengine multihost disk space MB = 256
-```
+By setting `dbengine multihost disk space MB` to `1100`, this node will start maintaining about a week of data. But pay
+attention to the number of metrics. If you have more than 2000 metrics on a node, or you need more than a week of high