author: D34DC3N73R <D34DC3N73R@users.noreply.github.com> 2023-04-20 09:05:06 -0700
committer: GitHub <noreply@github.com> 2023-04-20 09:05:06 -0700
commit: 5b676d5f912fc27a126ff4ff6ba5b35da9cf930c
tree: c9366410df94fa2a5fd695e1d10ac32004d8349e /collectors
parent: fadd2d81a9948016777fb4fcc0c5cd2ee16ac46e
Add Docker instructions to enable Nvidia GPUs (#14924)

* Add Docker instructions

  `gcompat` adds a glibc library for binaries linked against glibc, like nvidia drivers, which don't natively work with musl distributions like alpine.

* Edit duplicate heading

Diffstat (limited to 'collectors')
-rw-r--r--  collectors/python.d.plugin/nvidia_smi/README.md  74
1 files changed, 71 insertions, 3 deletions
diff --git a/collectors/python.d.plugin/nvidia_smi/README.md b/collectors/python.d.plugin/nvidia_smi/README.md
index cb40d87d75..7d45289a4b 100644
--- a/collectors/python.d.plugin/nvidia_smi/README.md
+++ b/collectors/python.d.plugin/nvidia_smi/README.md
@@ -11,9 +11,6 @@ learn_rel_path: "Integrations/Monitor/Devices"
Monitors performance metrics (memory usage, fan speed, pcie bandwidth utilization, temperature, etc.) using `nvidia-smi` cli tool.
-> **Warning**: this collector does not work when the Netdata Agent is [running in a container](https://github.com/netdata/netdata/blob/master/packaging/docker/README.md).
-
-
## Requirements and Notes
- You must have the `nvidia-smi` tool installed and your NVIDIA GPU(s) must support the tool. Mostly the newer high end models used for AI / ML and Crypto or Pro range, read more about [nvidia_smi](https://developer.nvidia.com/nvidia-system-management-interface).
@@ -87,3 +84,74 @@ Now you can manually run the `nvidia_smi` module in debug mode:
./python.d.plugin nvidia_smi debug trace
```
+## Docker
+
+GPU monitoring in a docker container is possible with [nvidia-container-toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html) installed on the host system, and `gcompat` added to the `NETDATA_EXTRA_APK_PACKAGES` environment variable.
+
+Sample `docker-compose.yml`
+```yaml
+version: '3'
+services:
+  netdata:
+    image: netdata/netdata
+    container_name: netdata
+    hostname: example.com # set to fqdn of host
+    ports:
+      - 19999:19999
+    restart: unless-stopped
+    cap_add:
+      - SYS_PTRACE
+    security_opt:
+      - apparmor:unconfined
+    environment:
+      - NETDATA_EXTRA_APK_PACKAGES=gcompat
+    volumes:
+      - netdataconfig:/etc/netdata
+      - netdatalib:/var/lib/netdata
+      - netdatacache:/var/cache/netdata
+      - /etc/passwd:/host/etc/passwd:ro
+      - /etc/group:/host/etc/group:ro
+      - /proc:/host/proc:ro
+      - /sys:/host/sys:ro
+      - /etc/os-release:/host/etc/os-release:ro
+    deploy:
+      resources:
+        reservations:
+          devices:
+            - driver: nvidia
+              count: all
+              capabilities: [gpu]
+
+volumes:
+  netdataconfig:
+  netdatalib:
+  netdatacache:
+```
+
+Sample `docker run`
+```bash
+docker run -d --name=netdata \
+ -p 19999:19999 \
+ -e NETDATA_EXTRA_APK_PACKAGES=gcompat \
+ -v netdataconfig:/etc/netdata \
+ -v netdatalib:/var/lib/netdata \
+ -v netdatacache:/var/cache/netdata \
+ -v /etc/passwd:/host/etc/passwd:ro \
+ -v /etc/group:/host/etc/group:ro \
+ -v /proc:/host/proc:ro \
+ -v /sys:/host/sys:ro \
+ -v /etc/os-release:/host/etc/os-release:ro \
+ --restart unless-stopped \
+ --cap-add SYS_PTRACE \
+ --security-opt apparmor=unconfined \
+ --gpus all \
+ netdata/netdata
+```
+
+### Docker Troubleshooting
+To troubleshoot `nvidia-smi` in a docker container, first confirm that `nvidia-smi` works on the host system. If it does, run `docker exec -it netdata nvidia-smi` to confirm that it also works inside the container. If `nvidia-smi` is functioning both inside and outside of the container, confirm that `nvidia_smi: yes` is uncommented in `python.d.conf`.
+```bash
+docker exec -it netdata bash
+cd /etc/netdata
+./edit-config python.d.conf
+```
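
Note (not part of the patch): the three troubleshooting checks in the added section could be scripted roughly as below. This is a minimal sketch, not something the commit ships. The function names `nvidia_smi_enabled` and `check_gpu_collector` are hypothetical, the container name `netdata` is taken from the samples in the diff, and the script assumes `/etc/netdata/python.d.conf` already exists in the container (which may only be true after `./edit-config python.d.conf` has been run once). The config key is written `nvidia_smi`, matching the module name used in the README's debug command.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeeds only if an uncommented `nvidia_smi: yes`
# line is present. Reads the file given as $1, or stdin when no file is given.
nvidia_smi_enabled() {
  grep -Eq '^[[:space:]]*nvidia_smi:[[:space:]]*yes([[:space:]]|#|$)' "${1:--}"
}

# Hypothetical helper: runs the checks in the order the README describes.
# $1 is the container name (assumed default: netdata).
check_gpu_collector() {
  local container="${1:-netdata}"

  # 1. nvidia-smi must work on the host.
  nvidia-smi >/dev/null 2>&1 \
    || { echo "nvidia-smi fails on the host"; return 1; }

  # 2. nvidia-smi must work inside the container.
  docker exec "$container" nvidia-smi >/dev/null 2>&1 \
    || { echo "nvidia-smi fails inside container '$container'"; return 2; }

  # 3. The module must be enabled in python.d.conf (path is an assumption).
  docker exec "$container" cat /etc/netdata/python.d.conf 2>/dev/null \
    | nvidia_smi_enabled \
    || { echo "nvidia_smi is not enabled in python.d.conf"; return 3; }

  echo "all checks passed"
}
```

After sourcing the file, `check_gpu_collector netdata` prints the first failing step and returns a non-zero status identifying it, so the checks can be repeated quickly after each configuration change.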