author: Costa Tsaousis <costa@tsaousis.gr> 2018-10-25 03:34:21 +0300
committer: GitHub <noreply@github.com> 2018-10-25 03:34:21 +0300
commit: 93467af78c91bf0b0898e3b28b3a7647e7442f7c
tree: 585157c5eb0bb355febba46b932cfb9265163414 /web/api/queries
parent: 6c5fc794f077c9affe6892d24d8e1d5a78956fc6
query engine documentation and stats (#4483)
Diffstat (limited to 'web/api/queries')
-rw-r--r-- web/api/queries/README.md | 122
-rw-r--r-- web/api/queries/query.c | 6
-rw-r--r-- web/api/queries/rrdr.h | 3
3 files changed, 130 insertions, 1 deletions
diff --git a/web/api/queries/README.md b/web/api/queries/README.md
index e0a5707581..22d2234ed8 100644
--- a/web/api/queries/README.md
+++ b/web/api/queries/README.md
@@ -1,4 +1,124 @@
# Database Queries
-TBD
+The Netdata database can be queried with the `/api/v1/data` and `/api/v1/badge.svg` API methods.
+
+Every data query accepts the following parameters:
+
+name|description
+:----:|:----:
+`chart`|The chart to be queried.
+`points`|The number of points to be returned. Netdata can reduce the number of points by applying query grouping methods.
+`before`|The absolute timestamp or the relative (to now) time the query should finish evaluating data.
+`after`|The absolute timestamp or the relative (to `before`) time the query should start evaluating data.
+`group`|The grouping method to use when reducing the points the database has.
+`gtime`|A resampling period to change the units of the metrics (e.g. setting this to `60` will convert `per second` metrics to `per minute`).
+`options`|A bitmap of options that can affect the operation of the query. Only 2 options are used by the query engine: `unaligned` and `percentage`. All the other options are used by the output formatters.
+`dimensions`|A simple pattern to filter the dimensions to be queried.
+
+## Operation
+
+The query engine works as follows (in this order):
+
+1. **Identify the exact time-frame required, in absolute timestamps.**
+
+ `after` and `before` define a time-frame:
+
+ - in **absolute timestamps** (unix timestamps, i.e. seconds since epoch).
+
+ - in **relative timestamps**:
+
+ `before` is relative to now and `after` is relative to `before`.
+
+ So, `before=-60&after=-60` evaluates to the time-frame from -120 up to -60 seconds in
+ the past, relative to now.
+
+ At the end of this operation, `after` and `before` are absolute timestamps.
+ The engine verifies that the time-frame is available at the database. If it is not,
+ it will adjust `after` and `before` accordingly so that usable data can be returned,
+ or no data at all if the time-frame is entirely outside the current range of the
+ database.
+
+2. **Identify the grouping of database points required.**
+
+ Grouping database points is used when the caller requests a longer time-frame to be
+ expressed with fewer points, compared to what is available at the database.
+
+ There are 2 uses of this (that can be combined):
+
+ - The caller requests a specific number of `points` to be returned.
+
+ For example, for a time-frame of 10 minutes, the database has 600 points (1/sec),
+ while the caller requested these 10 minutes to be expressed in 200 points.
+
+ This feature is used by netdata dashboards when you zoom-out the charts.
+ The dashboard is requesting the number of points the user's screen has, and netdata
+ returns that many points to perfectly match the screen. This saves bandwidth
+ and makes drawing the charts a lot faster.
+
+ - The caller requests a **re-sampling** of the database, by setting `gtime` to any value
+ above `1`. For example, the database maintains the metrics in the form of `X/sec`
+ but the caller set `gtime=60` to get `X/min`.
+
+ Using the above information, the query engine tries to find the best-fitting ratio of
+ database points to result points (we call this `group points`). It always tries to keep
+ `group points` an integer. Keep in mind that the query engine may slightly alter `after`
+ if required: it may shift the starting point of the time-frame to keep the query optimal.
+
+3. **Align the time-frame.**
+
+ Alignment is a very important aspect of netdata queries. Without it, the animated
+ charts on the dashboards would constantly change shape during incremental updates.
+ To provide consistent grouping of all points, the query engine (by default) aligns
+ `after` and `before` to be a multiple of `group points`.
+
+ For example, if `group points` is 60 and alignment is enabled, the engine will return
+ each point with durations XX:XX:00 - XX:XX:59 matching minutes. Of course, depending
+ on the database granularity for the specific chart and the requested points to be
+ returned, the engine may use any integer number for `group points`.
+
+ To disable alignment, pass `&options=unaligned` to the query.
+
+4. **Execute the query**
+
+ To execute the query, the engine evaluates all dimensions of the chart, one after another.
+ The engine will not evaluate dimensions that do not match the simple pattern given in
+ the `dimensions` parameter, except when `options=percentage` is given (this option requires
+ all the dimensions to be evaluated to find the percentage of each dimension versus the chart
+ total).
+
+ For each dimension, it starts evaluating values from `after` towards `before`.
+ For each value it calls the **grouping method** specified (the default is `average`).
+
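Steps 1 to 3 above can be sketched in C roughly as follows. This is a minimal illustration under stated assumptions, not the code in `query.c`: the `query_window` type and `resolve_window()` function are hypothetical names, values less than or equal to zero are treated as relative offsets, `update_every` stands for the database granularity in seconds, and `gtime` resampling is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type for this sketch; not Netdata's RRDR structure. */
typedef struct {
    long long after;   /* absolute start of the time-frame, seconds since epoch */
    long long before;  /* absolute end of the time-frame, seconds since epoch */
    long long group;   /* database points folded into each result point */
} query_window;

/*
 * Returns 1 and fills *w when usable data exists, 0 otherwise.
 * Assumptions: values <= 0 are relative offsets, update_every is the
 * database granularity in seconds, and gtime is ignored.
 */
int resolve_window(long long after_req, long long before_req, long long points,
                   long long now, long long db_first, long long db_last,
                   long long update_every, int aligned, query_window *w)
{
    assert(points > 0 && update_every > 0 && w != NULL);

    /* Step 1: before is relative to now, after is relative to before. */
    long long before = (before_req <= 0) ? now + before_req : before_req;
    long long after  = (after_req  <= 0) ? before + after_req : after_req;

    /* Clamp to what the database holds; give up if nothing overlaps. */
    if (before > db_last)  before = db_last;
    if (after  < db_first) after  = db_first;
    if (after >= before) return 0;

    /* Step 2: integer number of database points per result point. */
    long long group = ((before - after) / update_every) / points;
    if (group < 1) group = 1;

    /* Step 3: align both ends to a multiple of the group duration. */
    if (aligned) {
        long long step = group * update_every;
        before -= before % step;
        after  -= after % step;
        if (after >= before) return 0;
    }

    w->after = after;
    w->before = before;
    w->group = group;
    return 1;
}
```

With `before=-60&after=-60` and `now` at 1000000, the unaligned window runs from 999880 to 999940, matching the "from -120 up to -60 seconds" example in step 1.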
+## Grouping methods
+
+The following grouping methods are supported. These are given all the values in the time-frame
+and they group the values every `group points`.
+
+name|identifier(s)|description
+:---:|:---:|:---:
+Min|`min`|finds the minimum value
+Max|`max`|finds the maximum value
+Average|`average` `mean`|finds the average value
+Sum|`sum`|adds all the values and returns the sum
+Median|`median`|sorts the values and returns the value in the middle of the list
+Standard Deviation|`stddev`|finds the standard deviation of the values
+Coefficient of Variation|`cv` `rsd`|finds the relative standard deviation of the values
+Single Exponential Smoothing|`ses` `ema` `ewma`|finds the exponential weighted moving average of the values
+Double Exponential Smoothing|`des`|applies Holt-Winters double exponential smoothing
+Incremental Sum|`incremental_sum` `incremental-sum`|finds the difference between the last and the first values
+
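To make "group the values every `group points`" concrete, here is a small C sketch covering a few of the methods in the table. The `group_method` enum and `group_values()` function are hypothetical names for illustration only, and dropping a trailing partial group is an assumption of this sketch rather than documented engine behaviour.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical identifiers for this sketch; not Netdata's own enum. */
typedef enum {
    GROUP_MIN,
    GROUP_MAX,
    GROUP_AVERAGE,
    GROUP_SUM,
    GROUP_INCREMENTAL_SUM
} group_method;

/*
 * Folds every group_points consecutive input values into one output value.
 * Returns the number of output values written to out.
 * Assumption: a trailing partial group is dropped.
 */
size_t group_values(const double *in, size_t n, size_t group_points,
                    group_method method, double *out)
{
    assert(group_points > 0);

    size_t produced = 0;
    for (size_t start = 0; start + group_points <= n; start += group_points) {
        double first = in[start];
        double last = in[start];
        double acc = in[start];

        for (size_t i = 1; i < group_points; i++) {
            double v = in[start + i];
            last = v;
            switch (method) {
                case GROUP_MIN: if (v < acc) acc = v; break;
                case GROUP_MAX: if (v > acc) acc = v; break;
                case GROUP_AVERAGE: /* accumulate now, divide after the loop */
                case GROUP_SUM: acc += v; break;
                case GROUP_INCREMENTAL_SUM: break; /* only first/last matter */
            }
        }

        if (method == GROUP_AVERAGE) acc /= (double)group_points;
        if (method == GROUP_INCREMENTAL_SUM) acc = last - first;

        out[produced++] = acc;
    }
    return produced;
}
```

For the six values 1 to 6 and `group points` of 3, `average` yields 2 and 5, while `incremental_sum` yields 2 and 2.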
+## Further processing
+
+The result of the query engine is always a structure that has dimensions and values
+for each dimension.
+
+Formatting modules are then used to convert this result into many different formats and return it
+to the caller.
+
+## Performance
+
+The query engine is highly optimized for speed. Most of its modules implement "online"
+versions of the algorithms, requiring just one pass on the database values to produce
+the result.
+
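As an illustration of what an "online" algorithm means here, the sketch below computes a running mean and variance in a single pass using Welford's method. It is not taken from the Netdata sources: the `online_variance` type and its functions are hypothetical names, and returning the sample variance (dividing by `count - 1`) is a choice made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical state for this sketch: running count, mean, and squared deviations. */
typedef struct {
    size_t count;
    double mean;
    double m2;    /* sum of squared differences from the current mean */
} online_variance;

void online_variance_reset(online_variance *s)
{
    assert(s != NULL);
    s->count = 0;
    s->mean = 0.0;
    s->m2 = 0.0;
}

/* Welford's update: each value is seen once and then forgotten. */
void online_variance_add(online_variance *s, double value)
{
    assert(s != NULL);
    s->count++;
    double delta = value - s->mean;
    s->mean += delta / (double)s->count;
    s->m2 += delta * (value - s->mean);
}

/* Sample variance of the values added so far; 0 with fewer than two values. */
double online_variance_flush(const online_variance *s)
{
    assert(s != NULL);
    if (s->count < 2) return 0.0;
    return s->m2 / (double)(s->count - 1);
}
```

The standard deviation is the square root of the flushed variance. The state is three numbers regardless of how many values are added, which is why no second pass over the database values is needed.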
diff --git a/web/api/queries/query.c b/web/api/queries/query.c
index 764375932a..1806b6d4e9 100644
--- a/web/api/queries/query.c
+++ b/web/api/queries/query.c
@@ -408,6 +408,7 @@ static inline void do_dimension(
RRDR_VALUE_FLAGS
group_value_flags = RRDR_VALUE_NOTHING;
+ size_t db_points_read = 0;
for( ; points_added < points_wanted ; now += dt, slot++ ) {
if(unlikely(slot >= entries)) slot = 0;
@@ -442,6 +443,7 @@ static inline void do_dimension(
// add this value for grouping
r->internal.grouping_add(r, value);
values_in_group++;
+ db_points_read++;
if(unlikely(values_in_group == group_size)) {
rrdr_line = rrdr_line_init(r, now, rrdr_line);
@@ -469,6 +471,9 @@ static inline void do_dimension(
}
}
+ r->internal.db_points_read += db_points_read;
+ r->internal.result_points_generated += points_added;
+
r->before = max_date;
r->after = min_date;
rrdr_done(r, rrdr_line);
@@ -943,5 +948,6 @@ RRDR *rrd2rrdr(
}
}
+ rrdr_query_completed(r->internal.db_points_read, r->internal.result_points_generated);
return r;
}
diff --git a/web/api/queries/rrdr.h b/web/api/queries/rrdr.h
index 44e14fe19f..a0295db463 100644
--- a/web/api/queries/rrdr.h
+++ b/web/api/queries/rrdr.h
@@ -85,6 +85,9 @@ typedef struct rrdresult {
#ifdef NETDATA_INTERNAL_CHECKS
const char *log;
#endif
+
+ size_t db_points_read;
+ size_t result_points_generated;
} internal;
} RRDR;