author Andrew Maguire <andrewm4894@gmail.com> 2022-03-24 20:17:18 +0000
committer GitHub <noreply@github.com> 2022-03-24 16:17:18 -0400
commit 2fbd9cfc3725689538843ed02fdd28d7330e8aeb (patch)
tree 3fb273dab84c6a65ed0f235753bb92da0b6ad287
parent f791ba196511ebb86366c672bb1945531efbe84d (diff)
Add ml notebooks (#12313)
* initial setting up of notebook
* add open in colab button
* draft work
* first version of notebook
* fix open in colab button
* Update ml/notebooks/README.md
  Co-authored-by: Tina Luedtke <kickoke@users.noreply.github.com>
* use underscores in filename
* add one final visualization approach using scatter plots
* get a better random sample for plots
* small text update
* fix link
* Update ml/notebooks/netdata_anomaly_detection_deepdive.ipynb
  Co-authored-by: Tina Luedtke <kickoke@users.noreply.github.com>
* Update ml/notebooks/netdata_anomaly_detection_deepdive.ipynb
  Co-authored-by: Tina Luedtke <kickoke@users.noreply.github.com>
* Update ml/notebooks/netdata_anomaly_detection_deepdive.ipynb
  Co-authored-by: Tina Luedtke <kickoke@users.noreply.github.com>
* Update ml/notebooks/netdata_anomaly_detection_deepdive.ipynb
  Co-authored-by: Tina Luedtke <kickoke@users.noreply.github.com>
* Update ml/notebooks/netdata_anomaly_detection_deepdive.ipynb
  Co-authored-by: Tina Luedtke <kickoke@users.noreply.github.com>
* Update ml/notebooks/netdata_anomaly_detection_deepdive.ipynb
  Co-authored-by: Tina Luedtke <kickoke@users.noreply.github.com>
* address review comments
* add ipynb files to dockerignore

Co-authored-by: Tina Luedtke <kickoke@users.noreply.github.com>
l--------- .dockerignore 7
-rw-r--r-- .gitignore 3
-rw-r--r-- ml/notebooks/README.md 5
-rw-r--r-- ml/notebooks/netdata_anomaly_detection_deepdive.ipynb 1712
4 files changed, 1726 insertions, 1 deletions
diff --git a/.dockerignore b/.dockerignore
index 3e4e48b0b5..60ea3668fc 120000
--- a/.dockerignore
+++ b/.dockerignore
@@ -1 +1,6 @@
-.gitignore
\ No newline at end of file
+.gitignore
+
+# Jupyter notebook checkpoints
+.ipynb_checkpoints
+# Jupyter notebooks
+.ipynb
\ No newline at end of file
diff --git a/.gitignore b/.gitignore
index 6b5423cff1..ab19032c92 100644
--- a/.gitignore
+++ b/.gitignore
@@ -227,3 +227,6 @@ Session.*.vim
# Special exceptions
!packaging/repoconfig/Makefile
+
+# Jupyter notebook checkpoints
+.ipynb_checkpoints
diff --git a/ml/notebooks/README.md b/ml/notebooks/README.md
new file mode 100644
index 0000000000..5e9db6dee8
--- /dev/null
+++ b/ml/notebooks/README.md
@@ -0,0 +1,5 @@
+## Machine Learning Notebooks
+
+This folder is a home for any documentation supporting machine learning related notebooks.
+
+- [Netdata anomaly detection deepdive](netdata_anomaly_detection_deepdive.ipynb): This is a starter notebook to help users understand how anomaly detection works in the Netdata agent and go a little deeper if they want.
\ No newline at end of file
diff --git a/ml/notebooks/netdata_anomaly_detection_deepdive.ipynb b/ml/notebooks/netdata_anomaly_detection_deepdive.ipynb
new file mode 100644
index 0000000000..8d0c0c7e59
--- /dev/null
+++ b/ml/notebooks/netdata_anomaly_detection_deepdive.ipynb
@@ -0,0 +1,1712 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "pycharm": {
+ "name": "#%% md\n"
+ }
+ },
+ "source": [
+ "## Netdata Anomaly Detection Deepdive"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "pycharm": {
+ "name": "#%% md\n"
+ }
+ },
+ "source": [
+ "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/netdata/netdata/blob/master/ml/notebooks/netdata_anomaly_detection_deepdive.ipynb)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "pycharm": {
+ "name": "#%% md\n"
+ }
+ },
+ "source": [
+ "This notebook walks through a simplified Python-based implementation of the C & C++ code in [`netdata/netdata/ml/`](https://github.com/netdata/netdata/tree/master/ml) used to power the [anomaly detection capabilities](https://github.com/netdata/netdata/blob/master/ml/README.md) of the Netdata agent.\n",
+ "\n",
+ "The main goal here is to help interested users learn more about how the machine learning works under the hood. If you just want to get started by enabling ML on your agent, you can check out these [simple configuration steps](https://learn.netdata.cloud/docs/agent/ml#configuration). \n",
+ "\n",
+ "🚧 **Note**: This functionality is still under active development and considered experimental. Changes might cause the feature to break. We dogfood it internally and among early adopters within the Netdata community to build the feature. If you would like to get involved and help us with some feedback, email us at analytics-ml-team@netdata.cloud or come join us in the [🤖-ml-powered-monitoring](https://discord.gg/4eRSEUpJnc) channel of the Netdata discord. Alternatively, if GitHub is more your thing, feel free to create a [GitHub discussion](https://github.com/netdata/netdata/discussions?discussions_q=label%3Aarea%2Fml).\n",
+ "\n",
+ "In this notebook we will:\n",
+ "\n",
+ "1. [**Get raw data**](#get-raw-data): Pull some recent data from one of our demo agents.\n",
+ "2. [**Add some anomalous data**](#add-some-anomalous-data): Be evil and mess up the tail end of the data to make it obviously \"anomalous\".\n",
+ "3. [**Lets do some ML!**](#lets-do-some-ml): Implement an unsupervised clustering based approach to anomaly detection.\n",
+ "4. [**Lets visualize all this!**](#lets-visualize-all-this): Plot and explore all this visually.\n",
+ "5. [**So, how does it _actually_ work?**](#so-how-does-it-actually-work): Dig a little deeper on what's going on under the hood."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "pycharm": {
+ "name": "#%% md\n"
+ }
+ },
+ "source": [
+ "### Imports & Helper Functions"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "pycharm": {
+ "name": "#%% md\n"
+ }
+ },
+ "source": [
+ "Uncomment and run the next cell to install [netdata-pandas](https://github.com/netdata/netdata-pandas) which we will use to easily pull data from the [Netdata agent REST API](https://learn.netdata.cloud/docs/agent/web/api) into a nice clean [Pandas](https://pandas.pydata.org/) [DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html) where it will be easier to work with. \n",
+ "\n",
+ "Once you have [netdata-pandas](https://github.com/netdata/netdata-pandas) installed you can comment it back out and rerun the cell to clear the output."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {
+ "id": "aL4gm-jUffEx",
+ "pycharm": {
+ "is_executing": true
+ }
+ },
+ "outputs": [],
+ "source": [
+ "# uncomment the line below (when running in google colab) to install the netdata-pandas library, comment it again when done.\n",
+ "#!pip install netdata-pandas"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {
+ "id": "EMZBHjG4mOQh",
+ "pycharm": {
+ "is_executing": true
+ }
+ },
+ "outputs": [],
+ "source": [
+ "from datetime import datetime, timedelta\n",
+ "import itertools\n",
+ "import random\n",
+ "import pandas as pd\n",
+ "import numpy as np\n",
+ "import matplotlib.pyplot as plt\n",
+ "import seaborn as sns\n",
+ "import matplotlib.patches as mpatches\n",
+ "from sklearn.cluster import KMeans\n",
+ "from scipy.spatial.distance import cdist\n",
+ "from netdata_pandas.data import get_data\n",
+ "\n",
+ "# helper functions\n",
+ "\n",
+ "\n",
+ "def preprocess_df(df, lags_n, diffs_n, smooth_n):\n",
+ " \"\"\"Given a pandas dataframe preprocess it to take differences, add smoothing, lags and abs values. \n",
+ " \"\"\"\n",
+ " if diffs_n >= 1:\n",
+ " # take differences\n",
+ " df = df.diff(diffs_n).dropna()\n",
+ " if smooth_n >= 2:\n",
+ " # apply a rolling average to smooth out the data a bit\n",
+ " df = df.rolling(smooth_n).mean().dropna()\n",
+ " if lags_n >= 1:\n",
+ " # for each dimension add a new column for each of the lags_n lags of the differenced and smoothed values for that dimension\n",
+ " df_columns_new = [f'{col}_lag{n}' for n in range(lags_n+1) for col in df.columns]\n",
+ " df = pd.concat([df.shift(n) for n in range(lags_n + 1)], axis=1).dropna()\n",
+ " df.columns = df_columns_new\n",
+ " # sort columns to have lagged values next to each other for clarity when looking at the feature vectors\n",
+ " df = df.reindex(sorted(df.columns), axis=1)\n",
+ " \n",
+ " # take absolute values as last step\n",
+ " df = abs(df)\n",
+ " \n",
+ " return df\n",
+ "\n",
+ "\n",
+ "def add_shading_to_plot(ax, a, b, t, c='y', alpha=0.2):\n",
+ " \"\"\"Helper function to add shading to plot and add legend item.\n",
+ " \"\"\"\n",
+ " plt.axvspan(a, b, color=c, alpha=alpha, lw=0)\n",
+ " handles, labels = ax.get_legend_handles_labels()\n",
+ " patch = mpatches.Patch(color=c, label=t, alpha=alpha)\n",
+ " handles.append(patch) \n",
+ " plt.legend(handles=handles)\n",
+ "\n"
+ ]
+ },
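To make the preprocessing order concrete, here is a minimal, library-free sketch of the same steps that `preprocess_df()` applies (difference, smooth, lag, absolute value), run on a plain list of floats. The name `preprocess_values` and the sample numbers are hypothetical and only for illustration; the notebook itself works on pandas DataFrames.

```python
def preprocess_values(values, lags_n, diffs_n, smooth_n):
    """Sketch of diff -> smooth -> lag -> abs on a plain list of floats."""
    # 1. differences: x[t] - x[t - diffs_n]
    if diffs_n >= 1:
        values = [values[i] - values[i - diffs_n] for i in range(diffs_n, len(values))]
    # 2. trailing moving average over the last smooth_n points
    if smooth_n >= 2:
        values = [
            sum(values[i - smooth_n + 1 : i + 1]) / smooth_n
            for i in range(smooth_n - 1, len(values))
        ]
    # 3. one feature vector per time step: [lag0, lag1, ..., lag_n], as absolute values
    return [
        [abs(values[i - lag]) for lag in range(lags_n + 1)]
        for i in range(lags_n, len(values))
    ]


raw = [1.0, 3.0, 2.0, 6.0, 5.0, 9.0, 4.0]  # made-up sample values
features = preprocess_values(raw, lags_n=1, diffs_n=1, smooth_n=2)
for row in features:
    print(row)
```

Each step shortens the series (here by `diffs_n`, then `smooth_n - 1`, then `lags_n` points), so 7 raw values yield 4 feature vectors of length `lags_n + 1`.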
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Inputs & Parameters"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "A full list of all the anomaly detection configuration parameters, and descriptions of each, can be found in the [configuration](https://github.com/netdata/netdata/blob/master/ml/README.md#configuration) section of the [ml readme](https://github.com/netdata/netdata/blob/master/ml/README.md).\n",
+ "\n",
+ "Below we will focus on some basic params to decide what data to pull and the main ml params of importance in understanding how it all works.\n",
+ "\n",
+ "#### training size/scheduling parameters:\n",
+ "- `train_every`: How often to train or retrain each model.\n",
+ "- `num_samples_to_train`: How much of the recent data to train on, for example 3600 would mean training on the last 1 hour of raw data. The default in the netdata agent currently is 14400, so last 4 hours.\n",
+ "\n",
+ "#### feature preprocessing related parameters:\n",
+ "- `num_samples_to_diff`: This is really just a 1 or 0 flag to turn on or off differencing in the feature preprocessing. It defaults to 1 (to take differences) and generally should be left alone.\n",
+ "- `num_samples_to_smooth`: The extent of smoothing (averaging) applied as part of feature preprocessing.\n",
+ "- `num_samples_to_lag`: The number of previous values to also include in our feature vector.\n",
+ "\n",
+ "#### anomaly score related parameters:\n",
+ "- `dimension_anomaly_score_threshold`: The threshold on the anomaly score, above which the data is considered anomalous and the [anomaly bit](https://github.com/netdata/netdata/blob/master/ml/README.md#anomaly-bit) is set to 1 (it is actually set to 100 in reality, but this is just to make it behave more like a rate when aggregated in the Netdata agent API). By default this is `0.99`, which means anything with an anomaly score above 99% is considered anomalous. Decreasing this threshold makes the model more sensitive and will lead to more anomaly bits; increasing it does the opposite.\n",
+ "\n",
+ "#### model parameters:\n",
+ "- `n_clusters_per_dimension`: This is the number of clusters to fit for each model, by default it is set to 2 such that 2 cluster [centroids](https://en.wikipedia.org/wiki/Centroid) will be fit for each model.\n",
+ "- `max_iterations`: The maximum number of iterations the fitting of the clusters is allowed to take. In reality the clustering will converge a lot sooner than this.\n",
+ "\n",
+ "**Note**: There is much more detailed discussion of all these configuration parameters in the [\"Configuration\"](https://github.com/netdata/netdata/blob/master/ml/README.md#configuration) section of the ml readme."
+ ]
+ },
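As a rough illustration of the `dimension_anomaly_score_threshold` description above, the sketch below turns a list of anomaly scores into anomaly bits and shows the effect of lowering the threshold. The helper name `anomaly_bits` and the score values are made up for this example, and the scores are assumed to already lie between 0 and 1.

```python
def anomaly_bits(scores, threshold=0.99):
    """Return 1 where the score is above the threshold, otherwise 0."""
    return [1 if score > threshold else 0 for score in scores]


scores = [0.12, 0.995, 0.5, 0.991, 0.97]  # hypothetical anomaly scores

bits_default = anomaly_bits(scores)            # default threshold of 0.99
bits_sensitive = anomaly_bits(scores, 0.95)    # lower threshold flags more points

# averaging the bits and scaling by 100 gives a percentage style anomaly rate
anomaly_rate = 100 * sum(bits_default) / len(bits_default)

print(bits_default)
print(bits_sensitive)
print(anomaly_rate)
```

Scaling by 100 here mirrors the note that the agent stores the bit as 100 so that aggregating it behaves like a rate.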
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {
+ "id": "tBUVUpR3fohX"
+ },
+ "outputs": [],
+ "source": [
+ "# data params\n",
+ "hosts = ['london.my-netdata.io']\n",
+ "charts = ['system.cpu']\n",
+ "# if we want to focus on a subset of dims we can list them here, in this case let's just pick one for simplicity\n",
+ "dims = ['system.cpu|user'] \n",
+ "last_n_hours = 2\n",
+ "# based on last_n_hours define the relevant 'before' and 'after' params for the netdata rest api on the agent\n",
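+ "# note: use datetime.now() here, calling .timestamp() on the naive datetime.utcnow() would treat it as local time and shift the window on non-UTC machines\n",
+ "before = int(datetime.now().timestamp())\n",
+ "after = int((datetime.now() - timedelta(hours=last_n_hours)).timestamp())\n",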
+ "\n",
+ "# ml params\n",
+ "train_every = 3600\n",
+ "num_samples_to_train = 3600\n",
+ "num_samples_to_diff = 1\n",
+ "num_samples_to_smooth = 3\n",
+ "num_samples_to_lag = 5\n",
+ "dimension_anomaly_score_threshold = 0.99\n",
+ "n_clusters_per_dimension = 2\n",
+ "max_iterations = 1000"
+ ]
+ },
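The `before` and `after` parameters are plain Unix epoch seconds. A minimal standard-library sketch of computing such a window is shown below; the helper name `epoch_window` is hypothetical, and it uses a timezone-aware datetime so the result does not depend on the local timezone of the machine running the notebook.

```python
from datetime import datetime, timedelta, timezone


def epoch_window(last_n_hours, now=None):
    """Return (after, before) as integer Unix timestamps covering the last N hours."""
    # default to the current time, expressed as a timezone-aware UTC datetime
    now = now or datetime.now(timezone.utc)
    before = int(now.timestamp())
    after = int((now - timedelta(hours=last_n_hours)).timestamp())
    return after, before


after, before = epoch_window(2)
print(after, before, before - after)
```

Passing an explicit `now` makes the window reproducible, which can help when re-running an analysis over the same period.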
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### 1. Get raw data<a id=\"get-raw-data\"></a>"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Next we will use the `get_data()` function from the [netdata-pandas](https://github.com/netdata/netdata-pandas) library to just pull down our raw data from the agent into a Pandas dataframe."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 660
+ },
+ "id": "Ypudrfu-fpje",
+ "outputId": "b25c7322-03b4-4475-c416-37c3abbe78a4"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "(7200, 1)\n",
+ "1647978087 1647985286\n"
+ ]
+ },
+ {
+ "data": {
+ "text/html": [
+ "<div>\n",
+ "<style scoped>\n",
+ " .dataframe tbody tr th:only-of-type {\n",
+ " vertical-align: middle;\n",
+ " }\n",
+ "\n",
+ " .dataframe tbody tr th {\n",
+ " vertical-align: top;\n",
+ " }\n",
+ "\n",
+ " .dataframe thead th {\n",
+ " text-align: right;\n",
+ " }\n",
+ "</style>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ " <thead>\n",
+ " <tr style=\"text-align: right;\">\n",
+ " <th></th>\n",
+ " <th>system.cpu|user</th>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>time_idx</th>\n",
+ " <th></th>\n",
+ " </tr>\n",
+ " </thead>\n",
+ " <tbody>\n",
+ " <tr>\n",
+ " <th>1647978087</th>\n",
+ " <td>1.503759</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>1647978088</th>\n",
+ " <td>0.252525</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>1647978089</th>\n",
+ " <td>0.755668</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>1647978090</th>\n",
+ " <td>0.503778</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>1647978091</th>\n",
+ " <td>0.501253</td>\n",
+ " </tr>\n",
+ " </tbody>\n",
+ "</table>\n",
+ "</div>"
+ ],
+ "text/plain": [
+ " system.cpu|user\n",
+ "time_idx \n",
+ "1647978087 1.503759\n",
+ "1647978088 0.252525\n",
+ "1647978089 0.755668\n",
+ "1647978090 0.503778\n",
+ "1647978091 0.501253"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "image/png": "