author | Markos Fountoulakis <44345837+mfundul@users.noreply.github.com> | 2019-10-03 17:04:51 +0300
committer | GitHub <noreply@github.com> | 2019-10-03 17:04:51 +0300
commit | 95119afff48735607643bfe3824ed3727b6edbb0 (patch)
tree | 7d588b0f7131743d58386c100cb9fb2b0d97c187 /database
parent | 06cdca8fdfb5f8af43a368e9afe0e996fb1ea8fd (diff)
Make dbengine the default memory mode (#6977)
* Basic functionality for dbengine stress test.
* Fix coverity defects
* Refactored dbengine stress test to be configurable
* Added benchmark results and evaluation in dbengine documentation
* Make dbengine the default memory mode
Diffstat (limited to 'database')
-rw-r--r-- | database/README.md | 15
-rw-r--r-- | database/engine/README.md | 51
-rw-r--r-- | database/engine/rrdengine.c | 43
-rw-r--r-- | database/engine/rrdenginelib.c | 4
-rw-r--r-- | database/rrd.c | 4
5 files changed, 65 insertions, 52 deletions
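For context on the README change below: after this commit, new installations default to `memory mode = dbengine`, and the `history` setting no longer bounds the database size in that mode. Users who want the previous behavior, or who want to tune the engine, set it explicitly in `netdata.conf`. A sketch of the relevant stanza (option names as documented for the dbengine at the time; exact defaults depend on build options):

```ini
[global]
    # pick the metrics database backend; "dbengine" is the new default
    memory mode = dbengine
    # dbengine-only knobs (values in MiB)
    page cache size = 32
    dbengine disk space = 256
    # to restore the old default instead:
    # memory mode = save
```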
diff --git a/database/README.md b/database/README.md
index 1efdd9a94b..143615a0e8 100644
--- a/database/README.md
+++ b/database/README.md
@@ -25,7 +25,7 @@ Currently Netdata supports 6 memory modes:
 
 1. `ram`, data are purely in memory. Data are never saved on disk. This mode uses `mmap()` and supports [KSM](#ksm).
 
-2. `save`, (the default) data are only in RAM while Netdata runs and are saved to / loaded from disk on Netdata
+2. `save`, data are only in RAM while Netdata runs and are saved to / loaded from disk on Netdata
    restart. It also uses `mmap()` and supports [KSM](#ksm).
 
 3. `map`, data are in memory mapped files. This works like the swap. Keep in mind though, this will have a constant
@@ -39,11 +39,12 @@ Currently Netdata supports 6 memory modes:
 5. `alloc`, like `ram` but it uses `calloc()` and does not support [KSM](#ksm). This mode is the fallback for all
    others except `none`.
 
-6. `dbengine`, data are in database files. The [Database Engine](engine/) works like a traditional database. There is
-   some amount of RAM dedicated to data caching and indexing and the rest of the data reside compressed on disk. The
-   number of history entries is not fixed in this case, but depends on the configured disk space and the effective
-   compression ratio of the data stored. This is the **only mode** that supports changing the data collection update
-   frequency (`update_every`) **without losing** the previously stored metrics. For more details see [here](engine/).
+6. `dbengine`, (the default) data are in database files. The [Database Engine](engine/) works like a traditional
+   database. There is some amount of RAM dedicated to data caching and indexing and the rest of the data reside
+   compressed on disk. The number of history entries is not fixed in this case, but depends on the configured disk
+   space and the effective compression ratio of the data stored.
+   This is the **only mode** that supports changing the
+   data collection update frequency (`update_every`) **without losing** the previously stored metrics. For more details
+   see [here](engine/).
 
 You can select the memory mode by editing `netdata.conf` and setting:
 
@@ -63,7 +64,7 @@ Embedded devices usually have very limited RAM resources available.
 
 There are 2 settings for you to tweak:
 
 1. `update every`, which controls the data collection frequency
-2. `history`, which controls the size of the database in RAM
+2. `history`, which controls the size of the database in RAM (except for `memory mode = dbengine`)
 
 By default `update every = 1` and `history = 3600`. This gives you an hour of data with per second updates.
diff --git a/database/engine/README.md b/database/engine/README.md
index 12c22a92c2..e824aa3a27 100644
--- a/database/engine/README.md
+++ b/database/engine/README.md
@@ -141,4 +141,55 @@ kern.maxfiles=65536
 
 You can apply the settings by running `sysctl -p` or by rebooting.
 
+## Evaluation
+
+We have evaluated the performance of the `dbengine` API that the netdata daemon uses internally. This is **not** the
+web API of netdata. Our benchmarks ran on a **single** `dbengine` instance, multiple of which can be running in a
+netdata master server. We used a server with an AMD Ryzen Threadripper 2950X 16-Core Processor and 2 disk drives, a
+Seagate Constellation ES.3 2TB magnetic HDD and a SAMSUNG MZQLB960HAJR-00007 960GB NAND Flash SSD.
+
+For our workload, we defined 32 charts with 128 metrics each, giving us a total of 4096 metrics. We defined 1 worker
+thread per chart (32 threads) that generates new data points with a data generation interval of 1 second. The time axis
+of the time-series is emulated and accelerated so that the worker threads can generate as many data points as possible
+without delays.
+
+We also defined 32 worker threads that perform queries on random metrics with semi-random time ranges.
+The starting time of the query is randomly selected between the beginning of the time-series and the time of the
+latest data point. The ending time is randomly selected between 1 second and 1 hour after the starting time. The
+pseudo-random numbers are generated with a uniform distribution.
+
+The data are written to the database at the same time as they are read from it. This is a concurrent read/write mixed
+workload with a duration of 60 seconds. The faster `dbengine` runs, the bigger the dataset size becomes since more
+data points will be generated. We set a page cache size of 64 MiB for the two disk-bound scenarios. This way, the
+dataset size of the metric data is much bigger than the RAM that is being used for caching so as to trigger I/O
+requests most of the time. In our final scenario, we set the page cache size to 16 GiB. That way, the dataset fits in
+the page cache so as to avoid all disk bottlenecks.
+
+The reported numbers are the following:
+
+| device | page cache | dataset | reads/sec | writes/sec |
+| :---: | :---: | ---: | ---: | ---: |
+| HDD | 64 MiB | 4.1 GiB | 813K | 18.0M |
+| SSD | 64 MiB | 9.8 GiB | 1.7M | 43.0M |
+| N/A | 16 GiB | 6.8 GiB | 118.2M | 30.2M |
+
+where "reads/sec" is the number of metric data points being read from the database via its API per second and
+"writes/sec" is the number of metric data points being written to the database per second.
+
+Notice that the HDD numbers are pretty high and not much slower than the SSD numbers. This is thanks to the database
+engine design being optimized for rotating media. In the database engine disk I/O requests are:
+
+- asynchronous to mask the high I/O latency of HDDs.
+- mostly large to reduce the amount of HDD seeking time.
+- mostly sequential to reduce the amount of HDD seeking time.
+- compressed to reduce the amount of required throughput.
+
+As a result, the HDD is not thousands of times slower than the SSD, as is typical for other workloads.
+
+An interesting observation to make is that the CPU-bound run (16 GiB page cache) generates fewer data than the SSD run
+(6.8 GiB vs 9.8 GiB). The reason is that the 32 reader threads in the SSD scenario are more frequently blocked by I/O,
+and generate a read load of 1.7M/sec, whereas in the CPU-bound scenario the read load is 70 times higher at 118.2M/sec.
+Consequently, there is a significant degree of interference by the reader threads that slows down the writer threads.
+This is possible because the interference effects outweigh the impact of the SSD itself on data generation throughput.
+
 [![analytics](https://www.google-analytics.com/collect?v=1&aip=1&t=pageview&_s=1&ds=github&dr=https%3A%2F%2Fgithub.com%2Fnetdata%2Fnetdata&dl=https%3A%2F%2Fmy-netdata.io%2Fgithub%2Fdatabase%2Fengine%2FREADME&_u=MAC~&cid=5792dfd7-8dc4-476b-af31-da2fdb9f93d2&tid=UA-64295674-3)](<>)
diff --git a/database/engine/rrdengine.c b/database/engine/rrdengine.c
index 7b57a4194a..896d71f169 100644
--- a/database/engine/rrdengine.c
+++ b/database/engine/rrdengine.c
@@ -815,47 +815,6 @@ error_after_loop_init:
     complete(&ctx->rrdengine_completion);
 }
-
-#define NR_PAGES (256)
-static void basic_functional_test(struct rrdengine_instance *ctx)
-{
-    int i, j, failed_validations;
-    uuid_t uuid[NR_PAGES];
-    void *buf;
-    struct rrdeng_page_descr *handle[NR_PAGES];
-    char uuid_str[UUID_STR_LEN];
-    char backup[NR_PAGES][UUID_STR_LEN * 100]; /* backup storage for page data verification */
-
-    for (i = 0 ; i < NR_PAGES ; ++i) {
-        uuid_generate(uuid[i]);
-        uuid_unparse_lower(uuid[i], uuid_str);
-//      fprintf(stderr, "Generated uuid[%d]=%s\n", i, uuid_str);
-        buf = rrdeng_create_page(ctx, &uuid[i], &handle[i]);
-        /* Each page contains 10 times its own UUID stringified */
-        for (j = 0 ; j < 100 ; ++j) {
-            strcpy(buf + UUID_STR_LEN * j, uuid_str);
-            strcpy(backup[i] + UUID_STR_LEN * j, uuid_str);
-        }
-        rrdeng_commit_page(ctx, handle[i], (Word_t)i);
-    }
-    fprintf(stderr, "\n********** CREATED %d METRIC PAGES ***********\n\n", NR_PAGES);
-    failed_validations = 0;
-    for (i = 0 ; i < NR_PAGES ; ++i) {
-        buf = rrdeng_get_latest_page(ctx, &uuid[i], (void **)&handle[i]);
-        if (NULL == buf) {
-            ++failed_validations;
-            fprintf(stderr, "Page %d was LOST.\n", i);
-        }
-        if (memcmp(backup[i], buf, UUID_STR_LEN * 100)) {
-            ++failed_validations;
-            fprintf(stderr, "Page %d data comparison with backup FAILED validation.\n", i);
-        }
-        rrdeng_put_page(ctx, handle[i]);
-    }
-    fprintf(stderr, "\n********** CORRECTLY VALIDATED %d/%d METRIC PAGES ***********\n\n",
-            NR_PAGES - failed_validations, NR_PAGES);
-
-}
 
 /* C entry point for development purposes
  * make "LDFLAGS=-errdengine_main"
  */
@@ -868,8 +827,6 @@ void rrdengine_main(void)
     if (ret) {
         exit(ret);
     }
-    basic_functional_test(ctx);
-
     rrdeng_exit(ctx);
     fprintf(stderr, "Hello world!");
     exit(0);
diff --git a/database/engine/rrdenginelib.c b/database/engine/rrdenginelib.c
index 96504b275f..1a04dc2a47 100644
--- a/database/engine/rrdenginelib.c
+++ b/database/engine/rrdenginelib.c
@@ -8,7 +8,7 @@ void print_page_cache_descr(struct rrdeng_page_descr *descr)
 {
     struct page_cache_descr *pg_cache_descr = descr->pg_cache_descr;
     char uuid_str[UUID_STR_LEN];
-    char str[BUFSIZE];
+    char str[BUFSIZE + 1];
     int pos = 0;
 
     uuid_unparse_lower(*descr->id, uuid_str);
@@ -31,7 +31,7 @@ void print_page_cache_descr(struct rrdeng_page_descr *descr)
 void print_page_descr(struct rrdeng_page_descr *descr)
 {
     char uuid_str[UUID_STR_LEN];
-    char str[BUFSIZE];
+    char str[BUFSIZE + 1];
     int pos = 0;
 
     uuid_unparse_lower(*descr->id, uuid_str);
diff --git a/database/rrd.c b/database/rrd.c
index 31ad3f07e1..dcab65189e 100644
--- a/database/rrd.c
+++ b/database/rrd.c
@@ -15,7 +15,11 @@
 int rrd_delete_unupdated_dimensions = 0;
 
 int default_rrd_update_every = UPDATE_EVERY;
 int default_rrd_history_entries = RRD_DEFAULT_HISTORY_ENTRIES;
+#ifdef ENABLE_DBENGINE
+RRD_MEMORY_MODE default_rrd_memory_mode = RRD_MEMORY_MODE_DBENGINE;
+#else
 RRD_MEMORY_MODE default_rrd_memory_mode = RRD_MEMORY_MODE_SAVE;
+#endif
 
 int gap_when_lost_iterations_above = 1;