authorAndrew Maguire <andrewm4894@gmail.com>2022-06-20 13:50:46 +0100
committerGitHub <noreply@github.com>2022-06-20 13:50:46 +0100
commita9f41c9a8baed8345ba029094c0cc916fa6fcdf8 (patch)
tree22ce82376d981080c44f5073b1e109cfa8293545 /ml
parente8ed52c20f76e3b1d07ecbdc00a6845a5f1862be (diff)
set default for `minimum num samples to train` to `900` (#13174)
This will enable the first set of initial models to be trained more quickly, and makes sense now that ml is enabled by default.
Diffstat (limited to 'ml')
-rw-r--r-- ml/Config.cc | 2
-rw-r--r-- ml/README.md | 2
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/ml/Config.cc b/ml/Config.cc
index 4afa7c033b..65b05a34d6 100644
--- a/ml/Config.cc
+++ b/ml/Config.cc
@@ -29,7 +29,7 @@ void Config::readMLConfig(void) {
*/
unsigned MaxTrainSamples = config_get_number(ConfigSectionML, "maximum num samples to train", 4 * 3600);
- unsigned MinTrainSamples = config_get_number(ConfigSectionML, "minimum num samples to train", 1 * 3600);
+ unsigned MinTrainSamples = config_get_number(ConfigSectionML, "minimum num samples to train", 1 * 900);
unsigned TrainEvery = config_get_number(ConfigSectionML, "train every", 1 * 3600);
unsigned DBEngineAnomalyRateEvery = config_get_number(ConfigSectionML, "dbengine anomaly rate every", 30);
diff --git a/ml/README.md b/ml/README.md
index 428071417c..c752fc8521 100644
--- a/ml/README.md
+++ b/ml/README.md
@@ -228,7 +228,7 @@ This example assumes 3 child nodes [streaming](https://learn.netdata.cloud/docs/
- `enabled`: `yes` to enable, `no` to disable.
- `maximum num samples to train`: (`3600`/`86400`) This is the maximum amount of time you would like to train each model on. For example, the default of `14400` trains on the preceding 4 hours of data, assuming an `update every` of 1 second.
-- `minimum num samples to train`: (`900`/`21600`) This is the minimum amount of data required to be able to train a model. For example, the default of `3600` implies that once at least 1 hour of data is available for training, a model is trained, otherwise it is skipped and checked again at the next training run.
+- `minimum num samples to train`: (`900`/`21600`) This is the minimum amount of data required to be able to train a model. For example, the default of `900` implies that once at least 15 minutes of data is available for training, a model is trained, otherwise it is skipped and checked again at the next training run.
- `train every`: (`1800`/`21600`) This is how often each model will be retrained. For example, the default of `3600` means that each model is retrained every hour. Note: The training of all models is spread out across the `train every` period for efficiency, so in reality, it means that each model will be trained in a staggered manner within each `train every` period.
- `dbengine anomaly rate every`: (`30`/`900`) This is how often netdata will aggregate all the anomaly bits into a single chart (`anomaly_detection.anomaly_rates`). The aggregation into a single chart allows enabling anomaly rate ranking over _all_ metrics with one API call as opposed to a call per chart.
- `num samples to diff`: (`0`/`1`) This is a `0` or `1` to determine if you want the model to operate on differences of the raw data or just the raw data. For example, the default of `1` means that we take differences of the raw values. Using differences is more general and works on dimensions that might naturally tend to have some trends or cycles in them that is normal behavior to which we don't want to be too sensitive.
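For reference, the options touched by this change live in the `[ml]` section of `netdata.conf`. A minimal sketch of that section with the defaults as of this commit (assuming an `update every` of 1 second, so samples correspond to seconds):

```ini
[ml]
    # anomaly detection is enabled by default
    enabled = yes
    # train on at most the preceding 4 hours of data (14400 samples at 1s resolution)
    maximum num samples to train = 14400
    # new default from this commit: train once at least 15 minutes of data exist
    minimum num samples to train = 900
    # retrain each model roughly every hour (staggered across the period)
    train every = 3600
    # aggregate anomaly bits into anomaly_detection.anomaly_rates every 30s
    dbengine anomaly rate every = 30
    # operate on first differences of the raw data rather than raw values
    num samples to diff = 1
```

Lowering `minimum num samples to train` from `3600` to `900` means the first models are trained after 15 minutes of collected data instead of an hour; subsequent training runs still honor `maximum num samples to train` and `train every`.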