author | Austin S. Hemmelgarn <austin@netdata.cloud> | 2020-05-28 04:29:56 -0400 |
---|---|---|
committer | GitHub <noreply@github.com> | 2020-05-28 18:29:56 +1000 |
commit | ed8a6bea174bc4c52d4af4f4b138b5fe98003b49 (patch) | |
tree | 3b641745d9494fd62480a181b205dd93a4f3559f /packaging/docker | |
parent | a68436be6f2553284e4cf0471e39cbb215e12b34 (diff) |
Added health check functionality to our Docker images. (#9172)
* Add a `ping` command to netdatacli to check if agent is alive.
This provides a way to trivially check if the agent itself appears to be
running (namely, the command parser for netdatacli in the agent itself
is working and responding), allowing users to check this without having
to rely on us continuing to have `help` be a command sent to the agent
instead of executing locally.
* Add a basic health check to our Docker images.
This adds a relatively basic health checker script to our Docker images.
By default it verifies that the `/api/v1/info` endpoint returns a 200
status code.
It also supports checking different endpoints or using `netdatacli ping`
to check that Netdata is running, all controlled by a new Docker
environment variable: `NETDATA_HEALTH_CHECK`.
* Avoid unnecessary `chmod` in Dockerfile.
Suggested by @prologic.
* Fix typo in docs.
* Update environment variable name to be more clear.
Also add `-L` to `curl` command in health check to follow redirects.
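
As a rough usage sketch of the behavior described above: the helpers below start a container with a chosen health check target and read back the state Docker reports. The function names `start_netdata` and `health_status`, the container name `netdata`, and the use of the `netdata/netdata` image are assumptions for illustration and are not part of this commit. The variable name used is the final one from the diff, `NETDATA_HEALTHCHECK_TARGET`.

```sh
#!/bin/sh
# Illustrative helpers only; names and the container name "netdata" are assumptions.

# Start a container whose health check uses the given target:
# "cli", a URL, or an empty string to keep the default endpoint.
start_netdata() {
    target="$1"
    if [ -n "${target}" ]; then
        docker run -d --name netdata -e "NETDATA_HEALTHCHECK_TARGET=${target}" netdata/netdata
    else
        docker run -d --name netdata netdata/netdata
    fi
}

# Print the health state Docker tracks for the container
# (one of: starting, healthy, unhealthy).
health_status() {
    docker inspect --format '{{.State.Health.Status}}' netdata
}
```

For example, `start_netdata cli` followed a few minutes later by `health_status` would show whether `netdatacli ping` is succeeding inside the container. With the 60 second interval and 3 retries set in the Dockerfile, a failing container takes several minutes to be marked unhealthy.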
Diffstat (limited to 'packaging/docker')
-rw-r--r-- | packaging/docker/Dockerfile | 3 |
-rw-r--r-- | packaging/docker/README.md | 23 |
-rwxr-xr-x | packaging/docker/health.sh | 17 |
3 files changed, 43 insertions, 0 deletions
diff --git a/packaging/docker/Dockerfile b/packaging/docker/Dockerfile
index 0940cf7fc3..c36554c8d2 100644
--- a/packaging/docker/Dockerfile
+++ b/packaging/docker/Dockerfile
@@ -60,6 +60,7 @@ FROM netdata/base:${ARCH}
 # Copy files over
 RUN mkdir -p /opt/src
 COPY --from=builder /app /
+COPY packaging/docker/health.sh /health.sh

 # Configure system
 ARG NETDATA_UID=201
@@ -106,3 +107,5 @@ ENV NETDATA_PORT 19999
 EXPOSE $NETDATA_PORT

 ENTRYPOINT ["/usr/sbin/run.sh"]
+
+HEALTHCHECK --interval=60s --timeout=10s --retries=3 CMD /health.sh
diff --git a/packaging/docker/README.md b/packaging/docker/README.md
index 523dbf7731..b1d95fd653 100644
--- a/packaging/docker/README.md
+++ b/packaging/docker/README.md
@@ -98,6 +98,29 @@ volumes:

 Run `docker-compose up -d` in the same directory as the `docker-compose.yml` file to start the container.

+## Health Checks
+
+Our Docker image provides integrated support for health checks through the standard Docker interfaces.
+
+You can control how the health checks run by using the environment variable `NETDATA_HEALTHCHECK_TARGET` as follows:
+
+- If left unset, the health check will attempt to access the
+  `/api/v1/info` endpoint of the agent.
+- If set to the exact value 'cli', the health check
+  script will use `netdatacli ping` to determine if the agent is running
+  correctly or not. This is sufficient to ensure that Netdata did not
+  hang during startup, but does not provide a rigorous verification
+  that the daemon is collecting data or is otherwise usable.
+- If set to anything else, the health check will treat the vaule as a
+  URL to check for a 200 status code on. In most cases, this should
+  start with `http://localhost:19999/` to check the agent running in
+  the container.
+
+In most cases, the default behavior of checking the `/api/v1/info`
+endpoint will be sufficient. If you are using a configuration which
+disables the web server or restricts access to certain API's, you will
+need to use a non-default configuration for health checks to work.
+
 ## Configure Agent containers

 You may need to configure the above `docker run...` and `docker-compose` commands based on your needs. You should
diff --git a/packaging/docker/health.sh b/packaging/docker/health.sh
new file mode 100755
index 0000000000..088a6c0d71
--- /dev/null
+++ b/packaging/docker/health.sh
@@ -0,0 +1,17 @@
+#!/bin/sh
+#
+# This is the script that gets run for our Docker image health checks.
+
+if [ -z "${NETDATA_HEALTHCHECK_TARGET}" ] ; then
+    # If users didn't request something else, query `/api/v1/info`.
+    NETDATA_HEALTHCHECK_TARGET="http://localhost:19999/api/v1/info"
+fi
+
+case "${NETDATA_HEALTHCHECK_TARGET}" in
+    cli)
+        netdatacli ping || exit 1
+        ;;
+    *)
+        curl -sSL "${NETDATA_HEALTHCHECK_TARGET}" || exit 1
+        ;;
+esac
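
One detail worth noting about the script in this diff: `curl -sSL` without `--fail` exits with status 0 even when the server answers with an HTTP error such as 404, so the check as committed verifies that the endpoint responds rather than that it returns 200. The following is a minimal sketch of a stricter variant, assuming the intent is the 200 check described in the README. The function names `resolve_target`, `check_url`, and `run_health_check` are hypothetical and do not appear in the commit.

```sh
#!/bin/sh
# Sketch of a stricter health check; this is NOT the script shipped in the image.

# Fall back to the default endpoint when no target is given.
resolve_target() {
    if [ -z "$1" ]; then
        echo "http://localhost:19999/api/v1/info"
    else
        echo "$1"
    fi
}

# Succeed only if the final response (after following redirects) has status 200.
# -o /dev/null discards the body; -w '%{http_code}' prints just the status code.
check_url() {
    status="$(curl -sSL -o /dev/null -w '%{http_code}' "$1")" || return 1
    [ "${status}" = "200" ]
}

# Dispatch on the target the same way the committed script does.
run_health_check() {
    target="$(resolve_target "$1")"
    case "${target}" in
        cli) netdatacli ping ;;
        *)   check_url "${target}" ;;
    esac
}
```

A script built on these helpers would end with `run_health_check "${NETDATA_HEALTHCHECK_TARGET}" || exit 1`. A shorter alternative is adding `-f` to the existing `curl` call, which fails on any HTTP status of 400 or above but still accepts non-200 success codes.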