-rw-r--r-- backends/prometheus/README.md | 25
-rw-r--r-- database/README.md | 16
-rw-r--r-- health/README.md | 48
-rw-r--r-- registry/README.md | 4
-rw-r--r-- streaming/README.md | 11
5 files changed, 49 insertions, 55 deletions
diff --git a/backends/prometheus/README.md b/backends/prometheus/README.md
index 826cf051bd..57b64590c4 100644
--- a/backends/prometheus/README.md
+++ b/backends/prometheus/README.md
@@ -1,18 +1,20 @@
-> IMPORTANT: the format netdata sends metrics to prometheus has changed since netdata v1.7. The new prometheus backend for netdata supports a lot more features and is aligned to the development of the rest of the netdata backends.
-
# Using netdata with Prometheus
+> IMPORTANT: the format netdata sends metrics to prometheus has changed since netdata v1.7. The new prometheus backend for netdata supports a lot more features and is aligned to the development of the rest of the netdata backends.
+
Prometheus is a distributed monitoring system which offers a very simple setup along with a robust data model. Recently netdata added support for Prometheus. I'm going to quickly show you how to install both netdata and prometheus on the same server. We can then use grafana pointed at Prometheus to obtain long term metrics netdata offers. I'm assuming we are starting at a fresh ubuntu shell (whether you'd like to follow along in a VM or a cloud instance is up to you).
## Installing netdata and prometheus
### Installing netdata
+
There are a number of ways to install netdata, as described in [Installation](https://github.com/netdata/netdata/wiki/Installation).
The suggested way is to install the latest netdata and keep it upgraded automatically, using the one-line installation:
```
bash <(curl -Ss https://my-netdata.io/kickstart.sh)
```
+
At this point we should have netdata listening on port 19999. Point your browser here:
```
@@ -22,15 +24,16 @@ http://your.netdata.ip:19999
*(replace `your.netdata.ip` with the IP or hostname of the server running netdata)*
### Installing Prometheus
+
In order to install prometheus we are going to introduce our own systemd startup script along with an example of prometheus.yaml configuration. Prometheus needs to be pointed to your server at a specific target url for it to scrape netdata's api. Prometheus is always a pull model meaning netdata is the passive client within this architecture. Prometheus always initiates the connection with netdata.
-##### Download Prometheus
+#### Download Prometheus
```sh
wget -O /tmp/prometheus-2.3.2.linux-amd64.tar.gz https://github.com/prometheus/prometheus/releases/download/v2.3.2/prometheus-2.3.2.linux-amd64.tar.gz
```
-##### Create prometheus system user
+#### Create prometheus system user
```sh
sudo useradd -r prometheus
@@ -104,6 +107,7 @@ scrape_configs:
static_configs:
- targets: ['{your.netdata.ip}:19999']
```
+
#### Install nodes.yml
The following is completely optional; it enables Prometheus to generate alerts from some netdata sources. Tweak the values to your own needs. We will use the `nodes.yml` file below. Save it at `/opt/prometheus/nodes.yml`, and add a *- "nodes.yml"* entry under the *rule_files:* section in the example prometheus.yml file above.
@@ -166,7 +170,6 @@ ExecStop=/bin/kill -SIGINT $MAINPID
[Install]
WantedBy=multi-user.target
```
-
##### Start Prometheus
```
@@ -180,7 +183,7 @@ If everything is working correctly when you fetch `http://your.prometheus.ip:909
---
-## netdata support for prometheus
+## Netdata support for prometheus
> IMPORTANT: the format netdata sends metrics to prometheus has changed since netdata v1.6. The new format allows easier queries for metrics and supports both `as collected` and normalized metrics.
@@ -208,7 +211,7 @@ Then each netdata chart contains metrics called `dimensions`. All the dimensions
### netdata data source
-netdata can send metrics to prometheus from 3 data sources:
+Netdata can send metrics to prometheus from 3 data sources:
- `as collected` or `raw` - this data source sends the metrics to prometheus as they are collected. No conversion is done by netdata. The latest value for each metric is just given to prometheus. This is the method prometheus prefers, but it is also the hardest to work with. To work with this data source, you will need to understand how to get meaningful values out of it.
@@ -231,7 +234,6 @@ netdata can send metrics to prometheus from 3 data sources:
Keep in mind that early versions of netdata were sending the metrics as: `CHART_DIMENSION{}`.
-
### Querying Metrics
Fetch with your web browser this URL:
@@ -298,6 +300,7 @@ netdata_system_cpu_total{chart="system.cpu",family="cpu",dimension="iowait"} 233
# COMMENT netdata_system_cpu_total: chart "system.cpu", context "system.cpu", family "cpu", dimension "idle", value * 1 / 1 delta gives percentage (counter)
netdata_system_cpu_total{chart="system.cpu",family="cpu",dimension="idle"} 918470 1500066716438
```
+
*(netdata response for `system.cpu` with source=`as-collected`)*
For more information check prometheus documentation.
@@ -315,11 +318,11 @@ The `format=prometheus` parameter only exports the host's netdata metrics. If y
This will report all upstream host data, and `honor_labels` will make Prometheus take note of the instance names provided.
-### timestamps
+### Timestamps
To pass the metrics through prometheus pushgateway, netdata supports the option `&timestamps=no` to send the metrics without timestamps.
-## netdata host variables
+## Netdata host variables
netdata collects various system configuration metrics, like the max number of TCP sockets supported, the max number of files allowed system-wide, various IPC sizes, etc. These metrics are not exposed to prometheus by default.
@@ -369,7 +372,7 @@ netdata sends all metrics prefixed with `netdata_`. You can change this in `netd
It can also be changed from the URL, by appending `&prefix=netdata`.
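The URL options mentioned above (`format=prometheus`, `&timestamps=no`, `&prefix=netdata`) can be combined in one query string. The following is a minimal Python sketch of how such a URL might be assembled. The helper name `prometheus_url` is hypothetical, and the `/api/v1/allmetrics` path is an assumption that does not appear in this excerpt, so check it against your netdata version.

```python
from urllib.parse import urlencode


def prometheus_url(host, port=19999, timestamps=True, prefix=None):
    """Build a scrape URL for a netdata node (hypothetical helper).

    The /api/v1/allmetrics path is an assumption, not taken from the text above.
    """
    params = {"format": "prometheus"}
    if not timestamps:
        # Needed when passing metrics through the prometheus pushgateway.
        params["timestamps"] = "no"
    if prefix is not None:
        # Overrides the metric name prefix from the URL.
        params["prefix"] = prefix
    return f"http://{host}:{port}/api/v1/allmetrics?{urlencode(params)}"


print(prometheus_url("your.netdata.ip", timestamps=False, prefix="netdata"))
```

Leaving `timestamps` and `prefix` at their defaults produces a URL with only `format=prometheus`, which matches the plain scrape case.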
-### accuracy of `average` and `sum` data sources
+### Accuracy of `average` and `sum` data sources
When the data source is set to `average` or `sum`, netdata remembers the last access of each client accessing prometheus metrics and uses this last access time to respond with the `average` or `sum` of all the entries in the database since then. This means that prometheus servers are not losing data when they access netdata with data source = `average` or `sum`.
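The per-client bookkeeping described above can be illustrated with a toy model. This is a sketch of the idea only, not netdata's implementation: the class `AverageSource`, its method names, and the choice to average over all stored samples on a client's first access are assumptions made for illustration.

```python
class AverageSource:
    """Toy model of one metric served with data source = average."""

    def __init__(self):
        self.samples = []      # (timestamp, value) pairs, in collection order
        self.last_access = {}  # client id -> timestamp of its previous scrape

    def collect(self, timestamp, value):
        self.samples.append((timestamp, value))

    def scrape(self, client, now):
        # Assumption: a client seen for the first time gets every stored sample.
        since = self.last_access.get(client, float("-inf"))
        window = [v for t, v in self.samples if since < t <= now]
        self.last_access[client] = now
        return sum(window) / len(window) if window else None


source = AverageSource()
for second, value in enumerate([10, 20, 30, 40, 50, 60], start=1):
    source.collect(second, value)

first = source.scrape("prometheus-a", now=3)   # averages samples 1..3
second = source.scrape("prometheus-a", now=6)  # averages samples 4..6 only
other = source.scrape("prometheus-b", now=6)   # independent client, samples 1..6
```

Because each client has its own last access time, two prometheus servers scraping at different intervals each see every sample exactly once in their averages.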
diff --git a/database/README.md b/database/README.md
index 8f5e3a6df1..68156f8a44 100644
--- a/database/README.md
+++ b/database/README.md
@@ -1,4 +1,4 @@
-# netdata database
+# Netdata database
Although `netdata` does all its calculations using `long double`, it stores all values using
a [custom-made 32-bit number](../libnetdata/storage_number/).
@@ -26,17 +26,17 @@ Currently netdata supports 5 memory modes:
1. `ram`, data are purely in memory. Data are never saved on disk. This mode uses `mmap()` and
supports [KSM](#ksm).
-
+
2. `save`, (the default) data are only in RAM while netdata runs and are saved to / loaded from
disk on netdata restart. It also uses `mmap()` and supports [KSM](#ksm).
-
+
3. `map`, data are in memory mapped files. This works like the swap. Keep in mind though, this
will have a constant write on your disk. When netdata writes data on its memory, the Linux kernel
marks the related memory pages as dirty and automatically starts updating them on disk.
Unfortunately we cannot control how frequently this works. The Linux kernel uses exactly the
same algorithm it uses for its swap memory. Check below for additional information on running a
dedicated central netdata server. This mode uses `mmap()` but does not support [KSM](#ksm).
-
+
4. `none`, without a database (collected metrics can only be streamed to another netdata).
5. `alloc`, like `ram` but it uses `calloc()` and does not support [KSM](#ksm). This mode is the
@@ -73,9 +73,9 @@ by netdata. Of course experiment a bit. On very weak devices you might have to u
You can also disable [data collection plugins](../collectors) you don't need.
Disabling such plugins will also free both CPU and RAM resources.
-## running a dedicated central netdata server
+## Running a dedicated central netdata server
-netdata allows streaming data between netdata nodes. This allows us to have a central netdata
+Netdata allows streaming data between netdata nodes. This allows us to have a central netdata
server that will maintain the entire database for all nodes, and will also run health checks/alarms
for all nodes.
@@ -166,7 +166,7 @@ netdata, each byte at the in-memory database will be updated just once per day).
KSM is a solution that will provide 60+% memory savings to netdata.
-#### Enable KSM in kernel
+### Enable KSM in kernel
You need to run a kernel compiled with:
@@ -186,7 +186,7 @@ The files that `CONFIG_KSM=y` offers include:
So, by default `ksmd` is just disabled. It will not harm performance and the user/admin can control the CPU resources he/she is willing to let `ksmd` use.
-#### Run `ksmd` kernel daemon
+### Run `ksmd` kernel daemon
To activate / run `ksmd` you need to run:
diff --git a/health/README.md b/health/README.md
index 597bd3c324..114b13181d 100644
--- a/health/README.md
+++ b/health/README.md
@@ -1,4 +1,3 @@
-
# Health monitoring
Each netdata node runs an independent thread evaluating health monitoring checks.
@@ -40,16 +39,16 @@ killall -USR2 netdata
There are 2 entities:
-1. **alarms**, which are attached to specific charts, and
+1. **alarms**, which are attached to specific charts, and
-2. **templates**, which define rules that should be applied to all charts having a
+1. **templates**, which define rules that should be applied to all charts having a
specific `context`. You can use this feature to apply **alarms** to all disks,
all network interfaces, all mysql databases, all nginx web servers, etc.
Both of these entities have exactly the same format and feature set.
The only difference is the label `alarm` or `template`.
-netdata supports overriding **templates** with **alarms**.
+Netdata supports overriding **templates** with **alarms**.
For example, when a template is defined for a set of charts, an alarm with exactly the
same name attached to the same chart the template matches, will have higher precedence
(i.e. netdata will use the alarm on this chart and prevent the template from being applied
@@ -59,7 +58,7 @@ to it).
The following lines are parsed.
-#### alarm line `alarm` or `template`
+#### Alarm line `alarm` or `template`
This line starts an alarm or alarm template.
@@ -78,7 +77,7 @@ This line has to be first on each alarm or template.
---
-#### alarm line `on`
+#### Alarm line `on`
This line defines the data the alarm should be attached to.
@@ -112,7 +111,7 @@ So, `plugin = proc`, `module = /proc/net/dev` and `context = net.net`.
---
-#### alarm line `os`
+#### Alarm line `os`
This alarm or template will be used only if the O/S of the host loading it, matches this
pattern list. The value is a space separated list of simple patterns (use `*` as wildcard,
@@ -124,7 +123,7 @@ os: linux freebsd macos
---
-#### alarm line `hosts`
+#### Alarm line `hosts`
This alarm or template will be used only if the hostname of the host loading it, matches
this pattern list. The value is a space separated list of simple patterns (use `*` as wildcard,
@@ -141,7 +140,7 @@ This is useful when you centralize metrics from multiple hosts, to one netdata.
---
-#### alarm line `families`
+#### Alarm line `families`
This line is only used in alarm templates. It filters the charts. So, if you need to create
an alarm template for a few of a kind of chart (a few of your disks, or a few of your network
@@ -165,7 +164,7 @@ The family of a chart is usually the submenu of the netdata dashboard it appears
---
-#### alarm line `lookup`
+#### Alarm line `lookup`
This line makes a database lookup to find a value. The result of this lookup is available as `$this`.
@@ -205,7 +204,7 @@ The timestamps of the timeframe evaluated by the database lookup is available as
---
-#### alarm line `calc`
+#### Alarm line `calc`
This expression is evaluated just after the `lookup` (if any). Its purpose is to apply some
calculation before using the value looked up from the db.
@@ -225,7 +224,7 @@ Check [Expressions](#expressions) for more information.
---
-#### alarm line `every`
+#### Alarm line `every`
Sets the update frequency of this alarm. This is the same as the `every DURATION` given
in the `lookup` lines.
@@ -240,7 +239,7 @@ every: DURATION
---
-#### alarm lines `green` and `red`
+#### Alarm lines `green` and `red`
Set the green and red thresholds of a chart. Both are available as `$green` and `$red` in
expressions. If multiple alarms define different thresholds, the ones defined by the first
@@ -257,7 +256,7 @@ red: NUMBER
---
-#### alarm lines `warn` and `crit`
+#### Alarm lines `warn` and `crit`
These expressions should evaluate to true or false (alternatively non-zero or zero).
They trigger the alarm. Both are optional.
@@ -272,7 +271,7 @@ Check [Expressions](#expressions) for more information.
---
-#### alarm line `to`
+#### Alarm line `to`
This will be the first parameter of the script to be executed when the alarm switches status.
Its meaning is left up to the `exec` script.
@@ -288,7 +287,7 @@ to: ROLE1 ROLE2 ROLE3 ...
---
-#### alarm line `exec`
+#### Alarm line `exec`
The script that will be executed when the alarm changes status.
@@ -303,7 +302,7 @@ methods netdata supports, including custom hooks.
---
-#### alarm line `delay`
+#### Alarm line `delay`
This is used to provide optional hysteresis settings for the notifications, to defend
against notification floods. These settings do not affect the actual alarm - only the time
@@ -374,13 +373,9 @@ Expressions can have variables. Variables start with `$`. Check below for more i
There are two special values you can use:
- - `nan`, for example `$this != nan` will check if the variable `this` is available.
- A variable can be `nan` if the database lookup failed. All calculations (i.e. addition,
- multiplication, etc) with a `nan` result in a `nan`.
+- `nan`, for example `$this != nan` will check if the variable `this` is available. A variable can be `nan` if the database lookup failed. All calculations (i.e. addition, multiplication, etc) with a `nan` result in a `nan`.
- - `inf`, for example `$this != inf` will check if `this` is not infinite. A value or
- variable can be infinite if divided by zero. All calculations (i.e. addition,
- multiplication, etc) with a `inf` result in a `inf`.
+- `inf`, for example `$this != inf` will check if `this` is not infinite. A value or variable can be infinite if divided by zero. All calculations (i.e. addition, multiplication, etc) with a `inf` result in a `inf`.
---
@@ -412,10 +407,10 @@ Which in turn, results in the following behavior:
* While the value is falling, it will return to a warning state when it goes below 85,
and a normal state when it goes below 75.
-
+
* If the value is constantly varying between 80 and 90, then it will trigger a warning the
first time it goes above 85, but will remain a warning until it goes below 75 (or goes above 95).
-
+
* If the value is constantly varying between 90 and 100, then it will trigger a critical alert
the first time it goes above 95, but will remain a critical alert until it goes below 85 (at which
point it will return to being a warning).
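The hysteresis behavior listed above can be sketched as a small state machine. This Python sketch is an illustration only: the function name `next_status` and the status strings are hypothetical, and the thresholds (raise warning above 85, clear it below 75; raise critical above 95, drop to warning below 85) are read off the bullets above.

```python
def next_status(status, value):
    """Return the new status given the current one and the latest value."""
    # Once raised, an alarm uses a lower threshold, so it clears later than it triggers.
    crit_threshold = 85 if status == "critical" else 95
    warn_threshold = 75 if status in ("warning", "critical") else 85
    if value > crit_threshold:
        return "critical"
    if value > warn_threshold:
        return "warning"
    return "clear"


def run(values, status="clear"):
    """Feed a series of values through the state machine and record each status."""
    history = []
    for value in values:
        status = next_status(status, value)
        history.append(status)
    return history


oscillating = run([80, 86, 80, 90, 74])
falling = run([96, 90, 84, 76, 74])
```

In the first series the value oscillates between 80 and 90, so the warning raised at 86 persists until the value drops to 74. In the second series the critical state raised at 96 survives 90, falls back to warning at 84, and clears at 74.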
@@ -653,5 +648,4 @@ You can find the context of charts by looking up the chart in either
You can find how netdata interpreted the expressions by examining the alarm at
`http://your.netdata:19999/api/v1/alarms?all`. For each expression, netdata will return the
expression as given in its config file, and the same expression with additional parentheses
-added to indicate the evaluation flow of the expression.
-
+added to indicate the evaluation flow of the expression.
diff --git a/registry/README.md b/registry/README.md
index ac1265ec13..d37bf17d9a 100644
--- a/registry/README.md
+++ b/registry/README.md
@@ -36,11 +36,11 @@ The registry keeps track of 3 entities:
For each netdata installation (each `machine_guid`) the registry keeps track of the different URLs through which it is accessed.
-2. **persons**: i.e. the web browsers accessing the netdata installations (a random GUID generated by the registry the first time it sees a new web browser; we call this **person_guid**)
+1. **persons**: i.e. the web browsers accessing the netdata installations (a random GUID generated by the registry the first time it sees a new web browser; we call this **person_guid**)
For each person, the registry keeps track of the netdata installations it has accessed and their URLs.
-3. **URLs** of netdata installations (as seen by the web browsers)
+1. **URLs** of netdata installations (as seen by the web browsers)
For each URL, the registry keeps the URL and nothing more. Each URL is linked to *persons* and *machines*. The only way to find a URL is to know its **machine_guid** or have a **person_guid** that is linked to it.
diff --git a/streaming/README.md b/streaming/README.md
index 1f3bd73912..7cf8bf53ee 100644
--- a/streaming/README.md
+++ b/streaming/README.md
@@ -13,7 +13,7 @@ a netdata performs:
The following configurations are supported:
-#### netdata without a database or web API (headless collector)
+#### Netdata without a database or web API (headless collector)
Local netdata (`slave`), **without any database or alarms**, collects metrics and sends them to
another netdata (`master`).
@@ -28,7 +28,7 @@ of maintaining a local database and accepting dashboard requests, it streams all
The same `master` can collect data for any number of `slaves`.
-#### database replication
+#### Database replication
Local netdata (`slave`), **with a local database (and possibly alarms)**, collects metrics and
sends them to another netdata (`master`).
@@ -306,10 +306,10 @@ On each of the slaves, edit `/etc/netdata/stream.conf` (to edit it on your syste
[stream]
# stream metrics to another netdata
enabled = yes
-
+
# the IP and PORT of the master
destination = 10.11.12.13:19999
-
+
# the API key to use
api key = 11111111-2222-3333-4444-555555555555
```
@@ -340,7 +340,6 @@ The file `/var/lib/netdata/registry/netdata.public.unique.id` contains a random
Both the sender and the receiver of metrics log information at `/var/log/netdata/error.log`.
-
On both master and slave do this:
```
@@ -394,7 +393,6 @@ This means a setup like the following is also possible:
<img src="https://cloud.githubusercontent.com/assets/2662304/23629551/bb1fd9c2-02c0-11e7-90f5-cab5a3ed4c53.png"/>
</p>
-
## proxies
A proxy is a netdata that receives metrics from one netdata and streams them to another netdata.
@@ -410,4 +408,3 @@ The sending side of a netdata proxy, connects and disconnects to the final desti
metrics, following the same pattern of the receiving side.
For a practical example see [Monitoring ephemeral nodes](#monitoring-ephemeral-nodes).
-