author    Promise Akpan <akpanpromise@hotmail.com>  2019-10-03 18:21:37 +0100
committer Joel Hans <joel@netdata.cloud>            2019-10-03 10:21:37 -0700
commit    5d7b0043d5a0196e2eeb79bd68a65f9ff7a54630 (patch)
tree      290392ed503405dba707804a14e0361a442c3e49 /backends
parent    2701913e93424d58fb4ec3d07af37b5d39fe66f2 (diff)
Fix Remark Lint Warnings for Backends (#6917)
* fix remark warnings for AWS Kinesis doc
* fix remark warnings for MongoDB
* fix remark warnings on OpenTSDB doc
* fix lint warnings on prometheus docs
* format prometheus page
* fix main backend readme
* fix remark warnings except local links
* remove slash in prometheus doc
* remove link to header to fix lint error
* make character limit to 120 not 80
Diffstat (limited to 'backends')
-rw-r--r--  backends/README.md                          | 187
-rw-r--r--  backends/WALKTHROUGH.md                     | 293
-rw-r--r--  backends/aws_kinesis/README.md              |  21
-rw-r--r--  backends/mongodb/README.md                  |  14
-rw-r--r--  backends/opentsdb/README.md                 |  18
-rw-r--r--  backends/prometheus/README.md               | 141
-rw-r--r--  backends/prometheus/remote_write/README.md  |  16
7 files changed, 358 insertions, 332 deletions
diff --git a/backends/README.md b/backends/README.md
index 8cd2311eba..ac0847dcab 100644
--- a/backends/README.md
+++ b/backends/README.md
@@ -1,26 +1,25 @@
# Metrics long term archiving
-Netdata supports backends for archiving the metrics, or providing long term dashboards,
-using Grafana or other tools, like this:
+Netdata supports backends for archiving the metrics or providing long-term dashboards, using Grafana or other tools,
+like this:
![image](https://cloud.githubusercontent.com/assets/2662304/20649711/29f182ba-b4ce-11e6-97c8-ab2c0ab59833.png)
-Since Netdata collects thousands of metrics per server per second, which would easily congest any backend
-server when several Netdata servers are sending data to it, Netdata allows sending metrics at a lower
-frequency, by resampling them.
+Since Netdata collects thousands of metrics per server per second, which would easily congest any backend server when
+several Netdata servers are sending data to it, Netdata allows sending metrics at a lower frequency by resampling them.
-So, although Netdata collects metrics every second, it can send to the backend servers averages or sums every
-X seconds (though, it can send them per second if you need it to).
+So, although Netdata collects metrics every second, it can send to the backend servers averages or sums every X seconds
+(though, it can send them per second if you need it to).
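For example, a minimal sketch of the two options involved (the values here are illustrative, not defaults; the full `[backend]` section is documented under the configuration section below):

```conf
[backend]
    # send one value per metric every 10 seconds,
    # each one the average of the last 10 collected values
    data source = average
    update every = 10
```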
## features
1. Supported backends
- - **graphite** (`plaintext interface`, used by **Graphite**, **InfluxDB**, **KairosDB**,
- **Blueflood**, **ElasticSearch** via logstash tcp input and the graphite codec, etc)
+ - **graphite** (`plaintext interface`, used by **Graphite**, **InfluxDB**, **KairosDB**, **Blueflood**,
+ **ElasticSearch** via logstash tcp input and the graphite codec, etc)
- metrics are sent to the backend server as `prefix.hostname.chart.dimension`. `prefix` is
- configured below, `hostname` is the hostname of the machine (can also be configured).
+ metrics are sent to the backend server as `prefix.hostname.chart.dimension`. `prefix` is configured below,
+ `hostname` is the hostname of the machine (can also be configured).
- **opentsdb** (`telnet or HTTP interfaces`, used by **OpenTSDB**, **InfluxDB**, **KairosDB**, etc)
@@ -33,12 +32,12 @@ X seconds (though, it can send them per second if you need it to).
- **prometheus** is described at the [prometheus page](prometheus/) since it pulls data from Netdata.
- **prometheus remote write** (a binary snappy-compressed protocol buffer encoding over HTTP used by
- **Elasticsearch**, **Gnocchi**, **Graphite**, **InfluxDB**, **Kafka**, **OpenTSDB**,
- **PostgreSQL/TimescaleDB**, **Splunk**, **VictoriaMetrics**,
- and a lot of other [storage providers](https://prometheus.io/docs/operating/integrations/#remote-endpoints-and-storage))
+ **Elasticsearch**, **Gnocchi**, **Graphite**, **InfluxDB**, **Kafka**, **OpenTSDB**, **PostgreSQL/TimescaleDB**,
+ **Splunk**, **VictoriaMetrics**, and a lot of other [storage
+ providers](https://prometheus.io/docs/operating/integrations/#remote-endpoints-and-storage))
- metrics are labeled in the format, which is used by Netdata for the [plaintext prometheus protocol](prometheus/).
- Notes on using the remote write backend are [here](prometheus/remote_write/).
+ metrics are labeled in the format which is used by Netdata for the [plaintext prometheus
+ protocol](prometheus/). Notes on using the remote write backend are [here](prometheus/remote_write/).
- **AWS Kinesis Data Streams**
@@ -54,32 +53,37 @@ X seconds (though, it can send them per second if you need it to).
4. Netdata supports three modes of operation for all backends:
- - `as-collected` sends to backends the metrics as they are collected, in the units they are collected.
- So, counters are sent as counters and gauges are sent as gauges, much like all data collectors do.
- For example, to calculate CPU utilization in this format, you need to know how to convert kernel ticks to percentage.
+ - `as-collected` sends to backends the metrics as they are collected, in the units they are collected. So,
+ counters are sent as counters and gauges are sent as gauges, much like all data collectors do. For example, to
+ calculate CPU utilization in this format, you need to know how to convert kernel ticks to percentage.
- - `average` sends to backends normalized metrics from the Netdata database.
- In this mode, all metrics are sent as gauges, in the units Netdata uses. This abstracts data collection
- and simplifies visualization, but you will not be able to copy and paste queries from other sources to convert units.
- For example, CPU utilization percentage is calculated by Netdata, so Netdata will convert ticks to percentage and
- send the average percentage to the backend.
+ - `average` sends to backends normalized metrics from the Netdata database. In this mode, all metrics are sent as
+ gauges, in the units Netdata uses. This abstracts data collection and simplifies visualization, but you will not
+ be able to copy and paste queries from other sources to convert units. For example, CPU utilization percentage
+ is calculated by Netdata, so Netdata will convert ticks to percentage and send the average percentage to the
+ backend.
- - `sum` or `volume`: the sum of the interpolated values shown on the Netdata graphs is sent to the backend.
- So, if Netdata is configured to send data to the backend every 10 seconds, the sum of the 10 values shown on the
+ - `sum` or `volume`: the sum of the interpolated values shown on the Netdata graphs is sent to the backend. So, if
+ Netdata is configured to send data to the backend every 10 seconds, the sum of the 10 values shown on the
Netdata charts will be used.
-Time-series databases suggest to collect the raw values (`as-collected`). If you plan to invest on building your monitoring around a time-series database and you already know (or you will invest in learning) how to convert units and normalize the metrics in Grafana or other visualization tools, we suggest to use `as-collected`.
+ Time-series databases suggest collecting the raw values (`as-collected`). If you plan to invest in building your
+ monitoring around a time-series database and you already know (or you will invest in learning) how to convert units
+ and normalize the metrics in Grafana or other visualization tools, we suggest using `as-collected`.
-If, on the other hand, you just need long term archiving of Netdata metrics and you plan to mainly work with Netdata, we suggest to use `average`. It decouples visualization from data collection, so it will generally be a lot simpler. Furthermore, if you use `average`, the charts shown in the back-end will match exactly what you see in Netdata, which is not necessarily true for the other modes of operation.
+ If, on the other hand, you just need long-term archiving of Netdata metrics and you plan to mainly work with
+ Netdata, we suggest using `average`. It decouples visualization from data collection, so it will generally be a lot
+ simpler. Furthermore, if you use `average`, the charts shown in the back-end will match exactly what you see in
+ Netdata, which is not necessarily true for the other modes of operation. A small numeric sketch of the three modes
+ follows this list.
5. This code is smart enough not to slow down Netdata, regardless of the speed of the backend server.
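To make the three modes concrete, here is a small numeric sketch (my own illustration, assuming per-second collection and `update every = 10`):

```conf
# values collected during one 10-second window: 2 4 6 8 10 2 4 6 8 10
# data source = as collected -> sends the raw values, in the units they were collected
# data source = average      -> sends 6  (the mean of the window, in Netdata's units)
# data source = sum          -> sends 60 (the volume of the window)
```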
## configuration
-In `/etc/netdata/netdata.conf` you should have something like this (if not download the latest version
-of `netdata.conf` from your Netdata):
+In `/etc/netdata/netdata.conf` you should have something like this (if not, download the latest version of
+`netdata.conf` from your Netdata):
-```
+```conf
[backend]
enabled = yes | no
type = graphite | opentsdb:telnet | opentsdb:http | opentsdb:https | prometheus_remote_write | json | kinesis | mongodb
@@ -98,92 +102,87 @@ of `netdata.conf` from your Netdata):
- `enabled = yes | no`, enables or disables sending data to a backend
-- `type = graphite | opentsdb:telnet | opentsdb:http | opentsdb:https | json | kinesis | mongodb`, selects the backend type
+- `type = graphite | opentsdb:telnet | opentsdb:http | opentsdb:https | json | kinesis | mongodb`, selects the backend
+ type
-- `destination = host1 host2 host3 ...`, accepts **a space separated list** of hostnames,
- IPs (IPv4 and IPv6) and ports to connect to.
- Netdata will use the **first available** to send the metrics.
+- `destination = host1 host2 host3 ...`, accepts **a space separated list** of hostnames, IPs (IPv4 and IPv6) and
+ ports to connect to. Netdata will use the **first available** to send the metrics.
The format of each item in this list is: `[PROTOCOL:]IP[:PORT]`.
`PROTOCOL` can be `udp` or `tcp`. `tcp` is the default and the only one supported by the current backends.
- `IP` can be `XX.XX.XX.XX` (IPv4), or `[XX:XX...XX:XX]` (IPv6).
- For IPv6 you can to enclose the IP in `[]` to separate it from the port.
+ `IP` can be `XX.XX.XX.XX` (IPv4), or `[XX:XX...XX:XX]` (IPv6). For IPv6 you need to enclose the IP in `[]` to
+ separate it from the port.
`PORT` can be a number or a service name. If omitted, the default port for the backend will be used
(graphite = 2003, opentsdb = 4242).
Example IPv4:
-```
+```conf
destination = 10.11.14.2:4242 10.11.14.3:4242 10.11.14.4:4242
```
Example IPv6 and IPv4 together:
-```
+```conf
destination = [ffff:...:0001]:2003 10.11.12.1:2003
```
- When multiple servers are defined, Netdata will try the next one when the first one fails. This allows
- you to load-balance different servers: give your backend servers in different order on each Netdata.
+ When multiple servers are defined, Netdata will try the next one when the first one fails. This allows you to
+ load-balance different servers: give your backend servers in a different order on each Netdata.
- Netdata also ships [`nc-backend.sh`](nc-backend.sh),
- a script that can be used as a fallback backend to save the metrics to disk and push them to the
- time-series database when it becomes available again. It can also be used to monitor / trace / debug
- the metrics Netdata generates.
+ Netdata also ships [`nc-backend.sh`](nc-backend.sh), a script that can be used as a fallback backend to save the
+ metrics to disk and push them to the time-series database when it becomes available again. It can also be used to
+ monitor / trace / debug the metrics Netdata generates.
For the kinesis backend, `destination` should be set to an AWS region (for example, `us-east-1`).
The MongoDB backend doesn't use the `destination` option for its configuration. It uses the `mongodb.conf`
[configuration file](../backends/mongodb/) instead.
-- `data source = as collected`, or `data source = average`, or `data source = sum`, selects the kind of
- data that will be sent to the backend.
+- `data source = as collected`, or `data source = average`, or `data source = sum`, selects the kind of data that will
+ be sent to the backend.
-- `hostname = my-name`, is the hostname to be used for sending data to the backend server. By default
- this is `[global].hostname`.
+- `hostname = my-name`, is the hostname to be used for sending data to the backend server. By default this is
+ `[global].hostname`.
- `prefix = Netdata`, is the prefix to add to all metrics.
-- `update every = 10`, is the number of seconds between sending data to the backend. Netdata will add
- some randomness to this number, to prevent stressing the backend server when many Netdata servers send
- data to the same backend. This randomness does not affect the quality of the data, only the time they
- are sent.
-
-- `buffer on failures = 10`, is the number of iterations (each iteration is `[backend].update every` seconds)
- to buffer data, when the backend is not available. If the backend fails to receive the data after that
- many failures, data loss on the backend is expected (Netdata will also log it).
-
-- `timeout ms = 20000`, is the timeout in milliseconds to wait for the backend server to process the data.
- By default this is `2 * update_every * 1000`.
-
-- `send hosts matching = localhost *` includes one or more space separated patterns, using `*` as wildcard
- (any number of times within each pattern). The patterns are checked against the hostname (the localhost
- is always checked as `localhost`), allowing us to filter which hosts will be sent to the backend when
- this Netdata is a central Netdata aggregating multiple hosts. A pattern starting with `!` gives a
- negative match. So to match all hosts named `*db*` except hosts containing `*slave*`, use
- `!*slave* *db*` (so, the order is important: the first pattern matching the hostname will be used - positive
- or negative).
-
-- `send charts matching = *` includes one or more space separated patterns, using `*` as wildcard (any
- number of times within each pattern). The patterns are checked against both chart id and chart name.
- A pattern starting with `!` gives a negative match. So to match all charts named `apps.*`
- except charts ending in `*reads`, use `!*reads apps.*` (so, the order is important: the first pattern
- matching the chart id or the chart name will be used - positive or negative).
-
-- `send names instead of ids = yes | no` controls the metric names Netdata should send to backend.
- Netdata supports names and IDs for charts and dimensions. Usually IDs are unique identifiers as read
- by the system and names are human friendly labels (also unique). Most charts and metrics have the same
- ID and name, but in several cases they are different: disks with device-mapper, interrupts, QoS classes,
- statsd synthetic charts, etc.
-
-- `host tags = list of TAG=VALUE` defines tags that should be appended on all metrics for the given host.
- These are currently only sent to opentsdb and prometheus. Please use the appropriate format for each
- time-series db. For example opentsdb likes them like `TAG1=VALUE1 TAG2=VALUE2`, but prometheus like
- `tag1="value1",tag2="value2"`. Host tags are mirrored with database replication (streaming of metrics
- between Netdata servers).
+- `update every = 10`, is the number of seconds between sending data to the backend. Netdata will add some randomness
+ to this number, to prevent stressing the backend server when many Netdata servers send data to the same backend.
+ This randomness does not affect the quality of the data, only the time they are sent.
+
+- `buffer on failures = 10`, is the number of iterations (each iteration is `[backend].update every` seconds) to
+ buffer data, when the backend is not available. If the backend fails to receive the data after that many failures,
+ data loss on the backend is expected (Netdata will also log it).
+
+- `timeout ms = 20000`, is the timeout in milliseconds to wait for the backend server to process the data. By default
+ this is `2 * update_every * 1000`.
+
+- `send hosts matching = localhost *` includes one or more space separated patterns, using `*` as wildcard (any number
+ of times within each pattern). The patterns are checked against the hostname (the localhost is always checked as
+ `localhost`), allowing us to filter which hosts will be sent to the backend when this Netdata is a central Netdata
+ aggregating multiple hosts. A pattern starting with `!` gives a negative match. So to match all hosts named `*db*`
+ except hosts containing `*slave*`, use `!*slave* *db*` (so, the order is important: the first pattern matching the
+ hostname will be used - positive or negative).
+
+- `send charts matching = *` includes one or more space separated patterns, using `*` as wildcard (any number of times
+ within each pattern). The patterns are checked against both chart id and chart name. A pattern starting with `!`
+ gives a negative match. So to match all charts named `apps.*` except charts ending in `*reads`, use `!*reads
+ apps.*` (so, the order is important: the first pattern matching the chart id or the chart name will be used -
+ positive or negative).
+
+- `send names instead of ids = yes | no` controls the metric names Netdata should send to the backend. Netdata supports
+ names and IDs for charts and dimensions. Usually IDs are unique identifiers as read by the system and names are
+ human friendly labels (also unique). Most charts and metrics have the same ID and name, but in several cases they
+ are different: disks with device-mapper, interrupts, QoS classes, statsd synthetic charts, etc.
+
+- `host tags = list of TAG=VALUE` defines tags that should be appended on all metrics for the given host. These are
+ currently only sent to opentsdb and prometheus. Please use the appropriate format for each time-series db. For
+ example, opentsdb likes them like `TAG1=VALUE1 TAG2=VALUE2`, but prometheus likes `tag1="value1",tag2="value2"`. Host
+ tags are mirrored with database replication (streaming of metrics between Netdata servers). A combined sketch of
+ these filtering and tagging options follows.
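Putting the filtering and tagging options together, a hypothetical configuration might look like this (the hostnames, patterns, and tags are invented for illustration):

```conf
[backend]
    # send all hosts named *db* except those containing *slave*
    send hosts matching = !*slave* *db*
    # send all apps.* charts except those ending in *reads
    send charts matching = !*reads apps.*
    # appended to every metric; this is the opentsdb-style format
    host tags = TAG1=VALUE1 TAG2=VALUE2
```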
## monitoring operation
@@ -194,16 +193,15 @@ Netdata provides 5 charts:
2. **Buffered data size**, the amount of data (in KB) Netdata added to the buffer.
-3. ~~**Backend latency**, the time the backend server needed to process the data Netdata sent.
- If there was a re-connection involved, this includes the connection time.~~
- (this chart has been removed, because it only measures the time Netdata needs to give the data
- to the O/S - since the backend servers do not ack the reception, Netdata does not have any means
- to measure this properly).
+3. ~~**Backend latency**, the time the backend server needed to process the data Netdata sent. If there was a
+ re-connection involved, this includes the connection time.~~ (this chart has been removed, because it only measures
+ the time Netdata needs to give the data to the O/S - since the backend servers do not ack the reception, Netdata
+ does not have any means to measure this properly).
4. **Backend operations**, the number of operations performed by Netdata.
-5. **Backend thread CPU usage**, the CPU resources consumed by the Netdata thread, that is responsible
- for sending the metrics to the backend server.
+5. **Backend thread CPU usage**, the CPU resources consumed by the Netdata thread that is responsible for sending the
+ metrics to the backend server.
![image](https://cloud.githubusercontent.com/assets/2662304/20463536/eb196084-af3d-11e6-8ee5-ddbd3b4d8449.png)
@@ -216,7 +214,8 @@ Netdata adds 4 alarms:
1. `backend_last_buffering`, number of seconds since the last successful buffering of backend data
2. `backend_metrics_sent`, percentage of metrics sent to the backend server
3. `backend_metrics_lost`, number of metrics lost due to repeating failures to contact the backend server
-4. ~~`backend_slow`, the percentage of time between iterations needed by the backend time to process the data sent by Netdata~~ (this was misleading and has been removed).
+4. ~~`backend_slow`, the percentage of time between iterations needed by the backend time to process the data sent by
+ Netdata~~ (this was misleading and has been removed).
![image](https://cloud.githubusercontent.com/assets/2662304/20463779/a46ed1c2-af43-11e6-91a5-07ca4533cac3.png)
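To see how these alarms look on a running agent, you can query Netdata's alarms API; a quick sketch (assuming `curl` is available and the agent listens on the default port):

```sh
# list all alarms (including those in the normal state) and filter for the backend ones
curl -s 'http://localhost:19999/api/v1/alarms?all' | grep 'backend_'
```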
diff --git a/backends/WALKTHROUGH.md b/backends/WALKTHROUGH.md
index 19f4ac0e17..c6461db469 100644
--- a/backends/WALKTHROUGH.md
+++ b/backends/WALKTHROUGH.md
@@ -2,123 +2,102 @@
## Intro
-In this article I will walk you through the basics of getting Netdata,
-Prometheus and Grafana all working together and monitoring your application
-servers. This article will be using docker on your local workstation. We will be
-working with docker in an ad-hoc way, launching containers that run ‘/bin/bash’
-and attaching a TTY to them. I use docker here in a purely academic fashion and
-do not condone running Netdata in a container. I pick this method so individuals
-without cloud accounts or access to VMs can try this out and for it’s speed of
-deployment.
+In this article I will walk you through the basics of getting Netdata, Prometheus and Grafana all working together and
+monitoring your application servers. This article will be using docker on your local workstation. We will be working
+with docker in an ad-hoc way, launching containers that run ‘/bin/bash’ and attaching a TTY to them. I use docker here
+in a purely academic fashion and do not condone running Netdata in a container. I pick this method so individuals
+without cloud accounts or access to VMs can try this out, and for its speed of deployment.
## Why Netdata, Prometheus, and Grafana
-Some time ago I was introduced to Netdata by a coworker. We were attempting to
-troubleshoot python code which seemed to be bottlenecked. I was instantly
-impressed by the amount of metrics Netdata exposes to you. I quickly added
-Netdata to my set of go-to tools when troubleshooting systems performance.
-
-Some time ago, even later, I was introduced to Prometheus. Prometheus is a
-monitoring application which flips the normal architecture around and polls
-rest endpoints for its metrics. This architectural change greatly simplifies
-and decreases the time necessary to begin monitoring your applications.
-Compared to current monitoring solutions the time spent on designing the
-infrastructure is greatly reduced. Running a single Prometheus server per
-application becomes feasible with the help of Grafana.
-
-Grafana has been the go to graphing tool for… some time now. It’s awesome,
-anyone that has used it knows it’s awesome. We can point Grafana at Prometheus
-and use Prometheus as a data source. This allows a pretty simple overall
-monitoring architecture: Install Netdata on your application servers, point
-Prometheus at Netdata, and then point Grafana at Prometheus.
-
-I’m omitting an important ingredient in this stack in order to keep this tutorial
-simple and that is service discovery. My personal preference is to use Consul.
-Prometheus can plug into consul and automatically begin to scrape new hosts that
-register a Netdata client with Consul.
-
-At the end of this tutorial you will understand how each technology fits
-together to create a modern monitoring stack. This stack will offer you
-visibility into your application and systems performance.
+Some time ago I was introduced to Netdata by a coworker. We were attempting to troubleshoot python code which seemed to
+be bottlenecked. I was instantly impressed by the number of metrics Netdata exposes to you. I quickly added Netdata to
+my set of go-to tools when troubleshooting systems performance.
+
+Some time ago, even later, I was introduced to Prometheus. Prometheus is a monitoring application which flips the normal
+architecture around and polls REST endpoints for its metrics. This architectural change greatly simplifies and decreases
+the time necessary to begin monitoring your applications. Compared to current monitoring solutions, the time spent on
+designing the infrastructure is greatly reduced. Running a single Prometheus server per application becomes feasible
+with the help of Grafana.
+
+Grafana has been the go-to graphing tool for… some time now. It’s awesome, anyone that has used it knows it’s awesome.
+We can point Grafana at Prometheus and use Prometheus as a data source. This allows a pretty simple overall monitoring
+architecture: Install Netdata on your application servers, point Prometheus at Netdata, and then point Grafana at
+Prometheus.
+
+I’m omitting an important ingredient in this stack in order to keep this tutorial simple, and that is service discovery.
+My personal preference is to use Consul. Prometheus can plug into Consul and automatically begin to scrape new hosts
+that register a Netdata client with Consul.
+
+At the end of this tutorial you will understand how each technology fits together to create a modern monitoring stack.
+This stack will offer you visibility into your application and systems performance.
## Getting Started - Netdata
-To begin let’s create our container which we will install Netdata on. We need
-to run a container, forward the necessary port that Netdata listens on, and
-attach a tty so we can interact with the bash shell on the container. But
-before we do this we want name resolution between the two containers to work.
-In order to accomplish this we will create a user-defined network and attach
-both containers to this network. The first command we should run is:
+To begin let’s create our container which we will install Netdata on. We need to run a container, forward the necessary
+port that Netdata listens on, and attach a tty so we can interact with the bash shell on the container. But before we do
+this we want name resolution between the two containers to work. In order to accomplish this we will create a
+user-defined network and attach both containers to this network. The first command we should run is:
```sh
docker network create --driver bridge netdata-tutorial
```
-With this user-defined network created we can now launch our container we will
-install Netdata on and point it to this network.
+With this user-defined network created, we can now launch the container we will install Netdata on and point it to this
+network.
```sh
docker run -it --name netdata --hostname netdata --network=netdata-tutorial -p 19999:19999 centos:latest '/bin/bash'
```
-This command creates an interactive tty session (-it), gives the container both
-a name in relation to the docker daemon and a hostname (this is so you know what
-container is which when working in the shells and docker maps hostname
-resolution to this container), forwards the local port 19999 to the container’s
-port 19999 (-p 19999:19999), sets the command to run (/bin/bash) and then
-chooses the base container images (centos:latest). After running this you should
-be sitting inside the shell of the container.
+This command creates an interactive tty session (-it), gives the container both a name in relation to the docker daemon
+and a hostname (this is so you know what container is which when working in the shells and docker maps hostname
+resolution to this container), forwards the local port 19999 to the container’s port 19999 (-p 19999:19999), sets the
+command to run (/bin/bash) and then chooses the base container image (centos:latest). After running this you should be
+sitting inside the shell of the container.
-After we have entered the shell we can install Netdata. This process could not
-be easier. If you take a look at [this link](../packaging/installer/#installation), the Netdata devs give us
-several one-liners to install Netdata. I have not had any issues with these one
-liners and their bootstrapping scripts so far (If you guys run into anything do
-share). Run the following command in your container.
+After we have entered the shell we can install Netdata. This process could not be easier. If you take a look at [this
+link](../packaging/installer/#installation), the Netdata devs give us several one-liners to install Netdata. I have not
+had any issues with these one-liners and their bootstrapping scripts so far (if you guys run into anything, do share).
+Run the following command in your container.
```sh
bash <(curl -Ss https://my-netdata.io/kickstart.sh) --dont-wait
```
-After the install completes you should be able to hit the Netdata dashboard at
-<http://localhost:19999/> (replace localhost if you’re doing this on a VM or have
-the docker container hosted on a machine not on your local system). If this is
-your first time using Netdata I suggest you take a look around. The amount of
-time I’ve spent digging through /proc and calculating my own metrics has been
-greatly reduced by this tool. Take it all in.
+After the install completes you should be able to hit the Netdata dashboard at <http://localhost:19999/> (replace
+localhost if you’re doing this on a VM or have the docker container hosted on a machine not on your local system). If
+this is your first time using Netdata I suggest you take a look around. The amount of time I’ve spent digging through
+/proc and calculating my own metrics has been greatly reduced by this tool. Take it all in.
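If the dashboard does not load, a quick sanity check from inside the container is to hit the API directly (a sketch assuming `curl` is installed and a reasonably recent Netdata):

```sh
# any JSON response here means the agent is up and listening
curl -s http://localhost:19999/api/v1/info
```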
Next I want to draw your attention to a particular endpoint. Navigate to
-<http://localhost:19999/api/v1/allmetrics?format=prometheus&help=yes> In your
-browser. This is the endpoint which publishes all the metrics in a format which
-Prometheus understands. Let’s take a look at one of these metrics.
-`netdata_system_cpu_percentage_average{chart="system.cpu",family="cpu",dimension="system"}
-0.0831255 1501271696000` This metric is representing several things which I will
-go in more details in the section on prometheus. For now understand that this
-metric: `netdata_system_cpu_percentage_average` has several labels: [chart,
-family, dimension]. This corresponds with the first cpu chart you see on the
-Netdata dashboard.
+<http://localhost:19999/api/v1/allmetrics?format=prometheus&help=yes> in your browser. This is the endpoint which
+publishes all the metrics in a format which Prometheus understands. Let’s take a look at one of these metrics.
+`netdata_system_cpu_percentage_average{chart="system.cpu",family="cpu",dimension="system"} 0.0831255 1501271696000` This
+metric represents several things which I will go into in more detail in the section on Prometheus. For now understand
+that this metric: `netdata_system_cpu_percentage_average` has several labels: (chart, family, dimension). This
+corresponds with the first cpu chart you see on the Netdata dashboard.
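For reference, here is the anatomy of that sample line, annotated (the annotation is mine, not part of the endpoint’s output):

```conf
# <metric name>{<label>="<value>", ...}                                            <value>   <timestamp in ms>
netdata_system_cpu_percentage_average{chart="system.cpu",family="cpu",dimension="system"} 0.0831255 1501271696000
```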
![](https://github.com/ldelossa/NetdataTutorial/raw/master/Screen%20Shot%202017-07-28%20at%204.00.45%20PM.png)
-This CHART is called ‘system.cpu’, The FAMILY is cpu, and the DIMENSION we are
-observing is “system”. You can begin to draw links between the charts in Netdata
-to the prometheus metrics format in this manner.
+This CHART is called ‘system.cpu’, the FAMILY is ‘cpu’, and the DIMENSION we are observing is “system”. You can begin to
+draw links between the charts in Netdata and the prometheus metrics format in this manner.
## Prometheus
-We will be installing prometheus in a container for purpose of demonstration.
-While prometheus does have an official container I would like to walk through
-the install process and setup on a fresh container. This will allow anyone
+We will be installing prometheus in a container for the purpose of demonstration. While prometheus does have an official
+container, I would like to walk through the install process and setup on a fresh container. This will allow anyone
reading to migrate this tutorial to a VM or Server of any sort.
-Let’s start another container in the same fashion as we did the Netdata
-container.
+Let’s start another container in the same fashion as we did the Netdata container.
```sh
docker run -it --name prometheus --hostname prometheus \
  --network=netdata-tutorial -p 9090:9090 centos:latest '/bin/bash'
```
-This should drop you into a shell once again. Once there quickly install your favorite editor as we will be editing files later in this tutorial.
+This should drop you into a shell once again. Once there, quickly install your favorite editor, as we will be editing
+files later in this tutorial.
```sh
yum install vim -y
@@ -139,39 +118,33 @@ mkdir /opt/prometheus
sudo tar -xvf /tmp/prometheus-*linux-amd64.tar.gz -C /opt/prometheus --strip=1
```
-This should get prometheus installed into the container. Let’s test that we can run prometheus and connect to it’s web interface.
+This should get prometheus installed into the container. Let’s test that we can run prometheus and connect to its web
+interface.
```sh
/opt/prometheus/prometheus
```
-Now attempt to go to <http://localhost:9090/>. You should be presented with the
-prometheus homepage. This is a good point to talk about Prometheus’s data model
-which can be viewed here: <https://prometheus.io/docs/concepts/data_model/> As
-explained we have two key elements in Prometheus metrics. We have the ‘metric’
-and its ‘labels’. Labels allow for granularity between metrics. Let’s use our
-previous example to further explain.
+Now attempt to go to <http://localhost:9090/>. You should be presented with the prometheus homepage. This is a good
+point to talk about Prometheus’s data model, which can be viewed here: <https://prometheus.io/docs/concepts/data_model/>.
+As explained, we have two key elements in Prometheus metrics: the ‘metric’ and its ‘labels’. Labels allow for
+granularity between metrics. Let’s use our previous example to further explain.
-```
+```conf
netdata_system_cpu_percentage_average{chart="system.cpu",family="cpu",dimension="system"} 0.0831255 1501271696000
```
-Here our metric is
-‘netdata_system_cpu_percentage_average’ and our labels are ‘chart’, ‘family’,
-and ‘dimension. The last two values constitute the actual metric value for the
-metric type (gauge, counter, etc…). We can begin graphing system metrics with
-this information, but first we need to hook up Prometheus to poll Netdata stats.
-
-Let’s move our attention to Prometheus’s configuration. Prometheus gets it
-config from the file located (in our example) at
-`/opt/prometheus/prometheus.yml`. I won’t spend an extensive amount of time
-going over the configuration values documented here:
-<https://prometheus.io/docs/operating/configuration/>. We will be adding a new
-“job” under the “scrape_configs”. Let’s make the “scrape_configs” section look
-like this (we can use the dns name Netdata due to the custom user-defined
-network we created in docker beforehand).
-
-```yml
+Here our metric is ‘netdata_system_cpu_percentage_average’ and our labels are ‘chart’, ‘family’, and ‘dimension’. The
+last two values constitute the actual metric value for the metric type (gauge, counter, etc…). We can begin graphing
+system metrics with this information, but first we need to hook up Prometheus to poll Netdata stats.
+
+Let’s move our attention to Prometheus’s configuration. Prometheus gets its config from the file located (in our example)
+at `/opt/prometheus/prometheus.yml`. I won’t spend an extensive amount of time going over the configuration values
+documented here: <https://prometheus.io/docs/operating/configuration/>. We will be adding a new “job” under the
+“scrape_configs”. Let’s make the “scrape_configs” section look like this (we can use the DNS name `netdata` due to the
+custom user-defined network we created in docker beforehand).
+
+```yaml
scrape_configs:
# The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
- job_name: 'prometheus'
@@ -192,84 +165,66 @@ scrape_configs:
- targets: ['netdata:19999']
```
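Before restarting Prometheus, it can be worth confirming that the scrape target resolves and responds from this container; a quick sketch (assuming `curl` is available in the prometheus container):

```sh
# fetch a few lines of the endpoint Prometheus will scrape, using the DNS name
# provided by the user-defined docker network
curl -s 'http://netdata:19999/api/v1/allmetrics?format=prometheus' | head -n 20
```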
-Let’s start prometheus once again by running `/opt/prometheus/prometheus`. If we
-
-now navigate to prometheus at ‘<http://localhost:9090/targets’> we should see our
-
-target being successfully scraped. If we now go back to the Prometheus’s
-homepage and begin to type ‘netdata\_’ Prometheus should auto complete metrics
-it is now scraping.
+Let’s start prometheus once again by running `/opt/prometheus/prometheus`. If we now navigate to prometheus at
+<http://localhost:9090/targets> we should see our target being successfully scraped. If we now go back to the
+Prometheus homepage and begin to type ‘netdata\_’ Prometheus should autocomplete metrics it is now scraping.
![](https://github.com/ldelossa/NetdataTutorial/raw/master/Screen%20Shot%202017-07-28%20at%205.13.43%20PM.png)
-Let’s now start exploring how we can graph some metrics. Back in our NetData
-container lets get the CPU spinning with a pointless busy loop. On the shell do
-the following:
+Let’s now start exploring how we can graph some metrics. Back in our NetData container let’s get the CPU spinning with a
+pointless busy loop. On the shell do the following:
-```
+```sh
[root@netdata /]# while true; do echo "HOT HOT HOT CPU"; done
```
-Our NetData cpu graph should be showing some activity. Let’s represent this in
-Prometheus. In order to do this let’s keep our metrics page open for reference:
-<http://localhost:19999/api/v1/allmetrics?format=prometheus&help=yes> We are
-setting out to graph the data in the CPU chart so let’s search for “system.cpu”
-in the metrics page above. We come across a section of metrics with the first
-comments `# COMMENT homogeneous chart "system.cpu", context "system.cpu", family
-"cpu", units "percentage"` Followed by the metrics. This is a good start now let
-us drill down to the specific metric we would like to graph.
+Our NetData cpu graph should be showing some activity. Let’s represent this in Prometheus. In order to do this let’s
+keep our metrics page open for reference: <http://localhost:19999/api/v1/allmetrics?format=prometheus&help=yes> We are
+setting out to graph the data in the CPU chart so let’s search for “system.cpu” in the metrics page above. We come across
+a section of metrics with the first comments `# COMMENT homogeneous chart "system.cpu", context "system.cpu", family
+"cpu", units "percentage"` followed by the metrics. This is a good start; now let us drill down to the specific metric we
+would like to graph.
-```
+```conf
# COMMENT
netdata_system_cpu_percentage_average: dimension "system", value is percentage, gauge, dt 1501275951 to 1501275951 inclusive
netdata_system_cpu_percentage_average{chart="system.cpu",family="cpu",dimension="system"} 0.0000000 1501275951000
```
-Here we learn that the metric name we care about is
-‘netdata_system_cpu_percentage_average’ so throw this into Prometheus and see
-what we get. We should see something similar to this (I shut off my busy loop)
+Here we learn that the metric name we care about is ‘netdata_system_cpu_percentage_average’, so throw this into
+Prometheus and see what we get. We should see something similar to this (I shut off my busy loop):
![](https://github.com/ldelossa/NetdataTutorial/raw/master/Screen%20Shot%202017-07-28%20at%205.47.53%20PM.png)
-This is a good step toward what we want. Also make note that Prometheus will tag
-on an ‘instance’ label for us which corresponds to our statically defined job in
-the configuration file. This allows us to tailor our queries to specific
-instances. Now we need to isolate the dimension we want in our query. To do this
-let us refine the query slightly. Let’s query the dimension also. Place this
-into our query text box.
-`netdata_system_cpu_percentage_average{dimension="system"}` We now wind up with
-the following graph.
+This is a good step toward what we want. Also make note that Prometheus will tag on an ‘instance’ label for us which
+corresponds to our statically defined job in the configuration file. This allows us to tailor our queries to specific
+instances. Now we need to isolate the dimension we want in our query. To do this let us refine the query slightly. Let’s