Diffstat (limited to 'collectors/ebpf.plugin/README.md')
-rw-r--r-- collectors/ebpf.plugin/README.md | 219
1 files changed, 219 insertions, 0 deletions
diff --git a/collectors/ebpf.plugin/README.md b/collectors/ebpf.plugin/README.md
new file mode 100644
index 0000000000..bf00cac167
--- /dev/null
+++ b/collectors/ebpf.plugin/README.md
@@ -0,0 +1,219 @@
+<!--
+---
+title: "eBPF monitoring with Netdata"
+custom_edit_url: https://github.com/netdata/netdata/edit/master/collectors/ebpf.plugin/README.md
+---
+-->
+
+# eBPF monitoring with Netdata
+
+This collector plugin uses eBPF (Extended Berkeley Packet Filter) to monitor system calls inside your operating system's
+kernel. For now, the main goal of this plugin is to monitor IO and process management on the host where it is running.
+
+<figure>
+ <img src="https://user-images.githubusercontent.com/1153921/74746434-ad6a1e00-5222-11ea-858a-a7882617ae02.png" alt="An example of VFS charts, made possible by the eBPF collector plugin" />
+ <figcaption>An example of VFS charts, made possible by the eBPF collector plugin</figcaption>
+</figure>
+
+With this eBPF collector, you can monitor sophisticated system-level metrics about your complex applications while
+maintaining Netdata's [high standards for performance](#performance).
+
+## Enable the collector on Linux
+
+eBPF is only available on Linux systems, which means this collector only works on Linux.
+
+The collector is currently in an _alpha_ stage, as we are still working on improving compatibility with more Linux
+distributions and versions, and on ensuring the collector works as expected.
+
+Follow the next few steps to ensure compatibility, prepare your system, install Netdata with eBPF compiled, and enable
+the collector.
+
+### Ensure kernel compatibility
+
+To enable this plugin and its collector, you must be on a Linux system with a kernel that is more recent than `4.11.0`
+and compiled with the option `CONFIG_KPROBES=y`. You can verify whether your kernel has this option enabled by running
+the following commands:
+
+```bash
+grep CONFIG_KPROBES=y /boot/config-$(uname -r)
+zgrep CONFIG_KPROBES=y /proc/config.gz
+```
+
+If `Kprobes` is enabled, you will see `CONFIG_KPROBES=y` as the command's output, and can skip ahead to the next step: [mount `debugfs` and `tracefs`](#mount-debugfs-and-tracefs).
+
+If neither command prints `CONFIG_KPROBES=y`, you will have to recompile your kernel to enable it.
+
+The process of recompiling Linux kernels varies based on your distribution and version. Read the documentation for your
+system's distribution to learn more about the specific workflow for recompiling the kernel, ensuring that you set the
+`CONFIG_KPROBES` setting to `y` in the process.
+
+- [Ubuntu](https://wiki.ubuntu.com/Kernel/BuildYourOwnKernel)
+- [Debian](https://kernel-team.pages.debian.net/kernel-handbook/ch-common-tasks.html#s-common-official)
+- [Fedora](https://fedoraproject.org/wiki/Building_a_custom_kernel)
+- [CentOS](https://wiki.centos.org/HowTos/Custom_Kernel)
+- [Arch Linux](https://wiki.archlinux.org/index.php/Kernel/Traditional_compilation)
+- [Slackware](https://docs.slackware.com/howtos:slackware_admin:kernelbuilding)
+
+### Mount `debugfs` and `tracefs`
+
+The eBPF collector also requires both the `tracefs` and `debugfs` filesystems. Try mounting the `tracefs` and `debugfs`
+filesystems using the commands below:
+
+```bash
+sudo mount -t debugfs nodev /sys/kernel/debug
+sudo mount -t tracefs nodev /sys/kernel/tracing
+```
+
+If they are already mounted, you will see an error. If they are not mounted, they should be after running those two
+commands. You can also add entries to your system's `/etc/fstab` to mount these filesystems automatically at boot.
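+
+As a rough sketch of the `/etc/fstab` approach, the two entries below mirror the `mount` commands above. The device
+names in the first column (`debugfs` and `tracefs`) are arbitrary placeholders, and the `defaults 0 0` options are an
+assumption rather than something this plugin prescribes, so adapt them to your distribution's conventions.
+
+```conf
+# <device>  <mount point>         <type>   <options>  <dump>  <pass>
+debugfs     /sys/kernel/debug     debugfs  defaults   0       0
+tracefs     /sys/kernel/tracing   tracefs  defaults   0       0
+```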
+
+### Install Netdata with the `--enable-ebpf` option
+
+eBPF collection is only enabled if you install Netdata with the `--enable-ebpf` option.
+
+If you installed via the [one-line installation script](/packaging/installer/README.md), [64-bit
+binary](/packaging/installer/methods/kickstart-64.md), or [manually](/packaging/installer/methods/manual.md), you can
+append the `--enable-ebpf` option when you reinstall.
+
+For example, if you used the one-line installation script, you can reinstall Netdata with the following:
+
+```bash
+bash <(curl -Ss https://my-netdata.io/kickstart.sh) --enable-ebpf
+```
+
+This process will not overwrite any changes you made to configuration files.
+
+### Edit `netdata.conf` to enable the collector
+
+After installing Netdata with the `--enable-ebpf` option, you still need to enable the plugin explicitly. To do so, use
+`edit-config` to open `netdata.conf` and set `ebpf = yes` in the `[plugins]` section.
+
+```bash
+cd /etc/netdata/ # Replace with your Netdata configuration directory, if not /etc/netdata/
+./edit-config netdata.conf
+```
+
+Scroll down to the `[plugins]` section and uncomment the `ebpf` line after changing its setting to `yes`.
+
+```conf
+[plugins]
+ ebpf = yes
+```
+
+Restart Netdata with `service netdata restart`, or the appropriate method for your system, and reload your browser to
+see eBPF charts.
+
+## Charts
+
+The first version of `ebpf.plugin` gives a general view of the processes running on your system. The charts related
+to this plugin are inside the **eBPF** section of the dashboard menu and are divided into three groups: `file`, `vfs`,
+and `process`.
+
+All the collector charts show values per second. The collector retains the total value, but charts only show the
+difference between the previous and current metrics collections.
+
+### File
+
+This group has two charts to demonstrate how software interacts with the Linux kernel to open and close file
+descriptors.
+
+#### File descriptor
+
+This chart contains two dimensions that show the number of calls to the functions `do_sys_open` and `__close_fd`. These
+functions are not commonly called from software, but they are behind the system calls `open(2)`, `openat(2)`, and
+`close(2)`.
+
+#### File error
+
+This chart shows the number of times software tried and failed to open or close a file descriptor.
+
+### VFS
+
+A [virtual file system](https://en.wikipedia.org/wiki/Virtual_file_system) (VFS) is a layer on top of regular
+filesystems. The functions present inside this API are used for all filesystems, so it's possible the charts in this
+group won't show _all_ the actions that occurred on your system.
+
+#### Deleted objects
+
+This chart monitors calls to `vfs_unlink`, the function responsible for removing objects from the file system.
+
+#### IO
+
+This chart shows the number of calls to the functions `vfs_read` and `vfs_write`.
+
+#### IO bytes
+
+This chart also monitors `vfs_read` and `vfs_write`, but instead shows the total number of bytes read and written with these
+functions.
+
+Netdata displays the number of bytes written as negative, because they are moving down to disk.
+
+#### IO errors
+
+Netdata counts and shows the number of instances where a running program experiences a read or write error.
+
+### Process
+
+For this group, the eBPF collector monitors process/thread creation and process end, and then displays any errors in the
+following charts.
+
+#### Process thread
+
+Internally, the Linux kernel treats both processes and threads as `tasks`. To create a thread, the kernel offers a few
+system calls: `fork(2)`, `vfork(2)`, and `clone(2)`. Each of these system calls in turn uses the function `_do_fork`. To
+generate this chart, Netdata monitors `_do_fork` to populate the `process` dimension, and monitors `sys_clone` to
+identify threads.
+
+#### Exit
+
+Ending a task is actually two steps. The first is a call to the internal function `do_exit`, which notifies the
+operating system that the task is finishing its work. The second step is the release of kernel information, which is
+done with the internal function `release_task`. The difference between the two dimensions can help you discover [zombie
+processes](https://en.wikipedia.org/wiki/Zombie_process).
+
+#### Task error
+
+The functions responsible for ending tasks do not return values, so this chart only contains information about failures
+in process and thread creation.
+
+## Configuration
+
+This plugin has different configuration modes, all of which can be adjusted with its configuration file at
+`ebpf.conf`. By default, the plugin uses the less expensive `entry` mode. You can learn more about how the
+plugin works using `entry` by reading this configuration file.
+
+You can always edit this file with `edit-config`:
+
+```bash
+cd /etc/netdata/ # Replace with your Netdata configuration directory, if not /etc/netdata/
+./edit-config ebpf.conf
+```
+
+### `[global]`
+
+In this section we define variables applied to the whole collector and the other subsections.
+
+#### load
+
+The collector has two different eBPF programs. These programs monitor the same functions inside the kernel, but they
+monitor, process, and display different kinds of information.
+
+By default, this plugin uses the `entry` mode. Changing this mode can create significant overhead on your operating
+system, but also offers important information if you are developing or debugging software. The `load` option accepts the
+following values:
+
+- `entry`: This is the default mode. In this mode, Netdata monitors only calls for the functions described in the
+ sections above. When this mode is selected, Netdata does not show charts related to errors.
+- `return`: In this mode, Netdata also monitors the return of each function. In the `entry` mode, Netdata only traces
+ calls to kernel functions, but with `return`, Netdata traces both the call and the return. This mode creates more
+ charts, but also adds an overhead of roughly 110 nanoseconds for each function call.
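+
+As an illustration of the `load` option described above, the fragment below switches the collector to `return` mode.
+This is a minimal sketch based only on the section and option names given in this document. The layout of your own
+`ebpf.conf` may differ, so check the comments in the file opened by `edit-config` before copying it.
+
+```conf
+[global]
+    # Assumed layout: `entry` is the default; `return` adds the error charts at a higher cost.
+    load = return
+```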
+
+## Performance
+
+Because eBPF monitoring is complex, we are evaluating the performance of this new collector in various real-world
+conditions, across various system loads, and when monitoring complex applications.
+
+Our [initial testing](https://github.com/netdata/netdata/issues/8195) shows the performance of the eBPF collector is
+nearly identical to our [apps.plugin collector](/collectors/apps.plugin/README.md), despite collecting and displaying
+much more sophisticated metrics. You can now use the eBPF collector to gather deeper insights without affecting the
+performance of your complex applications at any load.