author Austin S. Hemmelgarn <austin@netdata.cloud> 2020-05-28 04:29:56 -0400
committer GitHub <noreply@github.com> 2020-05-28 18:29:56 +1000
commit ed8a6bea174bc4c52d4af4f4b138b5fe98003b49 (patch)
tree 3b641745d9494fd62480a181b205dd93a4f3559f /packaging/docker
parent a68436be6f2553284e4cf0471e39cbb215e12b34 (diff)
Added health check functionality to our Docker images. (#9172)
* Add a `ping` command to netdatacli to check if agent is alive.

  This provides a way to trivially check if the agent itself appears to be running (namely, the command parser for netdatacli in the agent itself is working and responding), allowing users to check this without having to rely on us continuing to have `help` be a command sent to the agent instead of executing locally.

* Add a basic health check to our Docker images.

  This adds a relatively basic health checker script to our Docker images. By default it verifies that the `/api/v1/info` endpoint returns a 200 status code. It also supports checking different endpoints or using `netdatacli ping` to check that Netdata is running, all controlled by a new Docker environment variable: `NETDATA_HEALTH_CHECK`.

* Avoid unnecessary `chmod` in Dockerfile.

  Suggested by @prologic.

* Fix typo in docs.

* Update environment variable name to be more clear.

  Also add `-L` to `curl` command in health check to follow redirects.
Diffstat (limited to 'packaging/docker')
-rw-r--r-- packaging/docker/Dockerfile | 3
-rw-r--r-- packaging/docker/README.md | 23
-rwxr-xr-x packaging/docker/health.sh | 17
3 files changed, 43 insertions, 0 deletions
diff --git a/packaging/docker/Dockerfile b/packaging/docker/Dockerfile
index 0940cf7fc3..c36554c8d2 100644
--- a/packaging/docker/Dockerfile
+++ b/packaging/docker/Dockerfile
@@ -60,6 +60,7 @@ FROM netdata/base:${ARCH}
# Copy files over
RUN mkdir -p /opt/src
COPY --from=builder /app /
+COPY packaging/docker/health.sh /health.sh
# Configure system
ARG NETDATA_UID=201
@@ -106,3 +107,5 @@ ENV NETDATA_PORT 19999
EXPOSE $NETDATA_PORT
ENTRYPOINT ["/usr/sbin/run.sh"]
+
+HEALTHCHECK --interval=60s --timeout=10s --retries=3 CMD /health.sh
diff --git a/packaging/docker/README.md b/packaging/docker/README.md
index 523dbf7731..b1d95fd653 100644
--- a/packaging/docker/README.md
+++ b/packaging/docker/README.md
@@ -98,6 +98,29 @@ volumes:
Run `docker-compose up -d` in the same directory as the `docker-compose.yml` file to start the container.
+## Health Checks
+
+Our Docker image provides integrated support for health checks through the standard Docker interfaces.
+
+You can control how the health checks run by using the environment variable `NETDATA_HEALTHCHECK_TARGET` as follows:
+
+- If left unset, the health check will attempt to access the
+  `/api/v1/info` endpoint of the agent.
+- If set to the exact value `cli`, the health check
+  script will use `netdatacli ping` to determine if the agent is running
+  correctly or not. This is sufficient to ensure that Netdata did not
+  hang during startup, but does not provide a rigorous verification
+  that the daemon is collecting data or is otherwise usable.
+- If set to anything else, the health check will treat the value as a
+  URL to check for a 200 status code on. In most cases, this should
+  start with `http://localhost:19999/` to check the agent running in
+  the container.
+
+In most cases, the default behavior of checking the `/api/v1/info`
+endpoint will be sufficient. If you are using a configuration which
+disables the web server or restricts access to certain APIs, you will
+need to use a non-default configuration for health checks to work.
+
## Configure Agent containers
You may need to configure the above `docker run...` and `docker-compose` commands based on your needs. You should
diff --git a/packaging/docker/health.sh b/packaging/docker/health.sh
new file mode 100755
index 0000000000..088a6c0d71
--- /dev/null
+++ b/packaging/docker/health.sh
@@ -0,0 +1,17 @@
+#!/bin/sh
+#
+# This is the script that gets run for our Docker image health checks.
+
+if [ -z "${NETDATA_HEALTHCHECK_TARGET}" ] ; then
+    # If users didn't request something else, query `/api/v1/info`.
+    NETDATA_HEALTHCHECK_TARGET="http://localhost:19999/api/v1/info"
+fi
+
+case "${NETDATA_HEALTHCHECK_TARGET}" in
+    cli)
+        netdatacli ping || exit 1
+        ;;
+    *)
+        curl -sSL "${NETDATA_HEALTHCHECK_TARGET}" || exit 1
+        ;;
+esac
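
Review note: the dispatch in `health.sh` can be sketched as small functions, which makes each step checkable without a running agent. This is a minimal sketch and not part of the commit. The names `DEFAULT_TARGET`, `resolve_target`, `check_mode`, and `run_check` are hypothetical. It also assumes one behavioral change: adding `-f` to `curl`. As committed, `curl -sSL` exits 0 whenever the transfer itself completes, so an HTTP error response would not fail the check. With `-f`, `curl` exits non-zero on HTTP status codes of 400 and above, which is closer to (though not exactly) the "200 status code" check the README describes.

```shell
#!/bin/sh
# Sketch only: mirrors the dispatch in health.sh, split into functions.
# All function and variable names below are hypothetical.

# Default URL, taken from health.sh in this commit.
DEFAULT_TARGET="http://localhost:19999/api/v1/info"

# Print the effective target: the variable's value, or the default
# when the variable is unset or empty (same condition as `[ -z ... ]`).
resolve_target() {
    printf '%s\n' "${NETDATA_HEALTHCHECK_TARGET:-$DEFAULT_TARGET}"
}

# Print which kind of check a target selects: "cli" or "http".
check_mode() {
    case "$1" in
        cli) echo "cli" ;;
        *)   echo "http" ;;
    esac
}

# Run the selected check and return its exit status.
# Assumption: -f is added so curl fails on HTTP errors (>= 400);
# -o /dev/null discards the response body. The commit uses -sSL only.
run_check() {
    target="$(resolve_target)"
    case "$(check_mode "$target")" in
        cli)  netdatacli ping ;;
        http) curl -sSLf -o /dev/null "$target" ;;
    esac
}
```

A script built this way would end with `run_check || exit 1`. To select the CLI check in a container, the variable can be passed with Docker's `-e` option, for example `-e NETDATA_HEALTHCHECK_TARGET=cli`.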