summaryrefslogtreecommitdiffstats
path: root/ml
AgeCommit message (Collapse)Author
2022-07-06Multi-Tier database backend for long term metrics storage (#13263)Stelios Fragkakis
* Tier part 1 * Tier part 2 * Tier part 3 * Tier part 4 * Tier part 5 * Fix some ML compilation errors * fix more conflicts * pass proper tier * move metric_uuid from state to RRDDIM * move aclk_live_status from state to RRDDIM * move ml_dimension from state to RRDDIM * abstracted the data collection interface * support flushing for mem db too * abstracted the query api * abstracted latest/oldest time per metric * cleanup * store_metric for tier1 * fix for store_metric * allow multiple tiers, more than 2 * state to tier * Change storage type in db. Query param to request min, max, sum or average * Store tier data correctly * Fix skipping tier page type * Add tier grouping in the tier * Fix to handle archived charts (part 1) * Temp fix for query granularity when requesting tier1 data * Fix parameters in the correct order and calculate the anomaly based on the anomaly count * Proper tiering grouping * Anomaly calculation based on anomaly count * force type checking on storage handles * update cmocka tests * fully dynamic number of storage tiers * fix static allocation * configure grouping for all tiers; disable tiers for unittest; disable statsd configuration for private charts mode * use default page dt using the tiering info * automatic selection of tier * fix for automatic selection of tier * working prototype of dynamic tier selection * automatic selection of tier done right (I hope) * ask for the proper tier value, based on the grouping function * fixes for unittests and load_metric_next() * fixes for lgtm findings * minor renames * add dbengine to page cache size setting * add dbengine to page cache with malloc * query engine optimized to loop as little are required based on the view_update_every * query engine grouping methods now do not assume a constant number of points per group and they allocate memory with OWA * report db points per tier in jsonwrap * query planer that switches database tiers on the fly to satisfy the query for the entire timeframe * dbegnine statistics and documentation (in progress) * calculate average point duration in db * handle single point pages the best we can * handle single point pages even better * Keep page type in the rrdeng_page_descr * updated doc * handle future backwards compatibility - improved statistics * support &tier=X in queries * enfore increasing iterations on tiers * tier 1 is always 1 iteration * backfilling higher tiers on first data collection * reversed anomaly bit * set up to 5 tiers * natural points should only be offered on tier 0, except a specific tier is selected * do not allow more than 65535 points of tier0 to be aggregated on any tier * Work only on actually activated tiers * fix query interpolation * fix query interpolation again * fix lgtm finding * Activate one tier for now * backfilling of higher tiers using raw metrics from lower tiers * fix for crash on start when storage tiers is increased from the default * more statistics on exit * fix bug that prevented higher tiers to get any values; added backfilling options * fixed the statistics log line * removed limit of 255 iterations per tier; moved the code of freezing rd->tiers[x]->db_metric_handle * fixed division by zero on zero points_wanted * removed dead code * Decide on the descr->type for the type of metric * dont store metrics on unknown page types * free db_metric_handle on sql based context queries * Disable STORAGE_POINT value check in the exporting engine unit tests * fix for db modes other than dbengine * fix for aclk archived chart queries destroying db_metric_handles of valid rrddims * fix left-over freez() instead of OWA freez on median queries Co-authored-by: Costa Tsaousis <costa@netdata.cloud> Co-authored-by: Vladimir Kobal <vlad@prokk.net>
2022-06-28netdata doubles (#13217)Costa Tsaousis
* netdata doubles * fix cmocka test * fix cmocka test again * fix left-overs of long double to NETDATA_DOUBLE * RRDDIM detached from disk representation; db settings in [db] section of netdata.conf * update the memory before saving * rrdset is now detached from file structures too * on memory mode map, update the memory mapped structures on every iteration * allow RRD_ID_LENGTH_MAX to be changed * granularity secs, back to update every * fix formatting * more formatting
2022-06-25make configuration example clearer (#13182)Andrew Maguire
make ml configuration example cleare
2022-06-22Query Engine multi-granularity support (and MC improvements) (#13155)Costa Tsaousis
* set grouping functions * storage engine should check the validity of timestamps, not the query engine * calculate and store in RRDR anomaly rates for every query * anomaly rate used by volume metric correlations * mc volume should use absolute data, to avoid cancelling effect * return anomaly-rates in jasonwrap with jw-anomaly-rates option to data queries * dont return null on anomaly rates * allow passing group query options from the URL * added countif to the query engine and used it in metric correlations * fix configure * fix countif and anomaly rate percentages * added group_options to metric correlations; updated swagger * added newline at the end of yaml file * always check the time the highlighted window was above/below the highlighted window * properly track time in memory queries * error for internal checks only * moved pack_storage_number() into the storage engines * moved unpack_storage_number() inside the storage engines * remove old comment * pass unit tests * properly detect zero or subnormal values in pack_storage_number() * fill nulls before the value, not after * make sure math.h is included * workaround for isfinite() * fix for isfinite() * faster isfinite() alternative * fix for faster isfinite() alternative * next_metric() now returns end_time too * variable step implemented in a generic way * remove left-over variables * ensure we always complete the wanted number of points * fixes * ensure no infinite loop * mc-volume-improvements: Add information about invalid condition * points should have a duration in the past * removed unneeded info() line * Fix unit tests for exporting engine * new_point should only be checked when it is fetched from the db; better comment about the premature breaking of the main query loop Co-authored-by: Thiago Marques <thiagoftsm@gmail.com> Co-authored-by: Vladimir Kobal <vlad@prokk.net>
2022-06-20set default for `minimum num samples to train` to `900` (#13174)Andrew Maguire
This will enable first set of initial models to be trained quicker and makes sense now that ml is enabled by default.
2022-06-17enable ml by default (#13158)Andrew Maguire
* enabled ml by default
2022-06-16revert to default of `host anomaly rate threshold=0.01` (#13150)Andrew Maguire
2022-06-06Update default value for "host anomaly rate threshold" (#13075)Shyam Sreevalsan
2022-05-31Treat dimensions as normal when we don't have enough/valid data. (#13005)vkalintiris
Ideally, we'd log such cases but this is not currently feasible because we have to process thousands of dimensions per second.
2022-05-12Fix compilation warnings (#12886)Vladimir Kobal
2022-05-09Dim rates (#12833)vkalintiris
2022-05-09Workers utilization charts (#12807)Costa Tsaousis
* initial version of worker utilization * working example * without mutexes * monitoring DBENGINE, ACLKSYNC, WEB workers * added charts to monitor worker usage * fixed charts units * updated contexts * updated priorities * added documentation * converted threads to stacked chart * One query per query thread * Revert "One query per query thread" This reverts commit 6aeb391f5987c3c6ba2864b559fd7f0cd64b14d3. * fixed priority for web charts * read worker cpu utilization from proc * read workers cpu utilization via /proc/self/task/PID/stat, so that we have cpu utilization even when the jobs are too long to finish within our update_every frequency * disabled web server cpu utilization monitoring - it is now monitored by worker utilization * tight integration of worker utilization to web server * monitoring statsd worker threads * code cleanup and renaming of variables * contrained worker and statistics conflict to just one variable * support for rendering jobs per type * better priorities and removed the total jobs chart * added busy time in ms per job type * added proc.plugin monitoring, switch clock to MONOTONIC_RAW if available, global statistics now cleans up old worker threads * isolated worker thread families * added cgroups.plugin workers * remove unneeded dimensions when then expected worker is just one * plugins.d and streaming monitoring * rebased; support worker_is_busy() to be called one after another * added diskspace plugin monitoring * added tc.plugin monitoring * added ML threads monitoring * dont create dimensions and charts that are not needed * fix crash when job types are added on the fly * added timex and idlejitter plugins; collected heartbeat statistics; reworked heartbeat according to the POSIX * the right name is heartbeat for this chart * monitor streaming senders * added streaming senders to global stats * prevent division by zero * added clock_init() to external C plugins * added freebsd and macos plugins * added freebsd and macos to global statistics * dont use new as a variable; address compiler warnings on FreeBSD and MacOS * refactored contexts to be unique; added health threads monitoring Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
2022-05-03update ml defaults in docs (#12782)Andrew Maguire
2022-05-03Configurable storage engine for Netdata agents: step 1 (#12776)Adrien BĂ©raud
* rrd: move API structures out of rrddim_volatile In C, unlike C++, it's not possible to reference a nested structure from outside this structure. Since we later want to use rrddim_query_ops and rrddim_collect_ops separately from rrddim_volatile, move these nested structures out. * rrd: use opaque handle types for different memory modes
2022-04-29some config updates for ml (#12771)Andrew Maguire
- set `dbengine anomaly rate every` to 30 by default to provide better sorting functionality for anomaly advisor. - increase upper clamp on `MaxTrainSamples` to 24 hours for those who might like a larger training window on a parent. - decrease lower clamp on `MinTrainSamples` to 15 minutes to enable faster training of initial models.
2022-04-27expand on the various parent-child config options (#12734)Andrew Maguire
expand on the various parent-child config options
2022-04-13Cancel anomaly detection threads before joining. (#12681)vkalintiris
Originally, the main training/detection thread loops where meant to be run only for the localhost host. They would stop when `netdata_exit` was set to true during the shutdown process. By enabling training/detection for children, we have to explicitly cancel ML threads because the service thread can free a child host at any point in time without setting `netdata_exit` to true. To support this: - We send a cancellation request to the training and the detection threads when we call rrdhost_free. - We disable/enable cancelation for the actual training/detection step on every iteration (in order to protect locks and shared data structures).
2022-04-12Delete ML-related data of a host in the proper order. (#12672)vkalintiris
Initialization of ML-related structures and threads should happen when the underlying RRD objects have been fully initialized. Destruction should happen in the opposite way, ie. before deleting an RRD host/dimension.
2022-04-11fix: remove instance-specific information from chart titles (#12644)Ilya Mashchenko
Co-authored-by: Vasilis Kalintiris <vasilis@netdata.cloud>
2022-04-06add some new ml params to README (#12575)Andrew Maguire
Co-authored-by: Tina Luedtke <kickoke@users.noreply.github.com>
2022-04-05Fix training/prediction stats charts context. (#12610)vkalintiris
2022-04-05Enable streaming of anomaly_detection.* charts (#12606)vkalintiris
2022-04-04Fix coverity issues (#12598)vkalintiris
* Clamp LagN to non-zero values. * Free static threads even on test failure. * Initialize rusage. * s/free/freez/
2022-04-04Update ML-related charts (#12574)vkalintiris
* Move CPU usage stats under netdata charts Use the hostname in each chart's name, and the machine GUID in each chart's id. * Move anomaly_detection.* charts to child host instance. * Add option to enable/disable streaming of ML-related charts. * Update priority of prediction/training charts.
2022-03-30ML-related changes to address issue/discussion comments. (#12494)vkalintiris
* Increase training thread's max sleep time. With this change we will only cap the allotted time when it is more than ten seconds. The previous limit was one second, which had the effect of scheduling dimensions near the beggining of each training window. This was not desirable because it would cause high CPU usage on parents with many children. * Only exclude netdata.* charts from training. * Use heartbeat in detection thread. * Track rusage of prediction thread. * Track rusage of training thread. * Add support for random sampling of extracted features. * Rebase * Skip RNG when ML is disabled and fix undef behaviour
2022-03-28reduce min `dbengine anomaly rate every` 60s->30s (#12543)Andrew Maguire
2022-03-24Add ml notebooks (#12313)Andrew Maguire
* initial setting up of notebook * add open in colab button * draft work * first version of notebook * fix open in colab button * Update ml/notebooks/README.md Co-authored-by: Tina Luedtke <kickoke@users.noreply.github.com> * use underscores in filename * add one final visualization approach using scatter plots * get a better random sample for plots * small text update * fix link * Update ml/notebooks/netdata_anomaly_detection_deepdive.ipynb Co-authored-by: Tina Luedtke <kickoke@users.noreply.github.com> * Update ml/notebooks/netdata_anomaly_detection_deepdive.ipynb Co-authored-by: Tina Luedtke <kickoke@users.noreply.github.com> * Update ml/notebooks/netdata_anomaly_detection_deepdive.ipynb Co-authored-by: Tina Luedtke <kickoke@users.noreply.github.com> * Update ml/notebooks/netdata_anomaly_detection_deepdive.ipynb Co-authored-by: Tina Luedtke <kickoke@users.noreply.github.com> * Update ml/notebooks/netdata_anomaly_detection_deepdive.ipynb Co-authored-by: Tina Luedtke <kickoke@users.noreply.github.com> * Update ml/notebooks/netdata_anomaly_detection_deepdive.ipynb Co-authored-by: Tina Luedtke <kickoke@users.noreply.github.com> * address review comments * add ipynb files to dockerignore Co-authored-by: Tina Luedtke <kickoke@users.noreply.github.com>
2022-03-09Docs fix: Add missing frontmatter (#12353)Tina Luedtke
2022-03-09Prepend context in anomaly rate dimension id. (#12342)vkalintiris
FE needs this information when they issue a `/data` request on the anomaly rates chart. However, this information is only available at the creation time of the anomaly rate dimension.
2022-02-24Fix builds where HAVE_C___ATOMIC is not defined. (#12240)vkalintiris
2022-02-24Track anomaly rates with DBEngine. (#12083)vkalintiris
* Track anomaly rates with DBEngine. This commit adds support for tracking anomaly rates with DBEngine. We do so by creating a single chart with id "anomaly_detection.anomaly_rates" for each trainable/predictable host, which is responsible for tracking the anomaly rate of each dimension that we train/predict for that host. The rrdset->state->is_ar_chart boolean flag is set to true only for anomaly rates charts. We use this flag to: - Disable exposing the anomaly rates charts through the functionality in backends/, exporting/ and streaming/. - Skip generation of configuration options for the name, algorithm, multiplier, divisor of each dimension in an anomaly rates chart. - Skip the creation of health variables for anomaly rates dimensions. - Skip the chart/dim queue of ACLK. - Post-process the RRDR result of an anomaly rates chart, so that we can return a sorted, trimmed number of anomalous dimensions. In a child/parent configuration where both the child and the parent run ML for the child, we want to be able to stream the rest of the ML-related charts to the parent. To be able to do this without any chart name collisions, the charts are now created on localhost and their IDs and titles have the node's machine_guid and hostname as a suffix, respectively. * Fix exporting_engine tests. * Restore default ML configuration. The reverted changes where meant for local testing only. This commit restores the default values that we want to have when someone runs anomaly detection on their node. * Set context for anomaly_detection.* charts. * Check for anomaly rates chart only with a valid pointer. * Remove duplicate code. * Use a more descriptive name for id/title pair variable
2022-02-23Skip training of constant metrics. (#12212)vkalintiris
Detect dimensions whose values do not change, and skip them from training. This allows us to reduce the number of training operations by ~40-50%. Notice that we don't skip the very 1st training iteration, because a dimension's value might change at any point in time, and we need to have a trained model in order to compute its anomaly score.
2022-01-13Use absolute features when doing training/prediction. (#11876)vkalintiris
Use absolute features when doing training/prediction.
2022-01-10Fix compilation warnings (#11846)Vladimir Kobal
2021-12-22Provide runtime ml info from a new endpoint. (#11886)vkalintiris
* Provide runtime ml info from a new endpoint. * Add hosts & charts to skip from ML in the /info endpoint. This information belongs in /info, and not in /ml-info, because the value of these variables can not change at the agent's runtime. * Use strdupz instead of strdup.
2021-12-06Create ML README.md (#11848)Andrew Maguire
Create README.md for `netdata/ml`
2021-11-22Use two digits after the decimal point for the anomaly rate. (#11804)vkalintiris
2021-11-17Use the proper format specifier when logging configuration options. (#11795)vkalintiris
2021-10-27Anomaly Detection MVP (#11548)vkalintiris
* Add support for feature extraction and K-Means clustering. This patch adds support for performing feature extraction and running the K-Means clustering algorithm on the extracted features. We use the open-source dlib library to compute the K-Means clustering centers, which has been added as a new git submodule. The build system has been updated to recognize two new options: 1) --enable-ml: build an agent with ml functionality, and 2) --enable-ml-tests: support running tests with the `-W mltest` option in netdata. The second flag is meant only for internal use. To build tests successfully, you need to install the GoogleTest framework on your machine. * Boilerplate code to track hosts/dims and init ML config options. A new opaque pointer field is added to the database's host and dimension data structures. The fields point to C++ wrapper classes that will be used to store ML-related information in follow-up patches. The ML functionality needs to iterate all tracked dimensions twice per second. To avoid locking the entire DB multiple times, we use a separate dictionary to add/remove dimensions as they are created/deleted by the database. A global configuration object is initialized during the startup of the agent. It will allow our users to specify ML-related configuration options, eg. hosts/charts to skip from training, etc. * Add support for training and prediction of dimensions. Every new host spawns a training thread which is used to train the model of each dimension. Training of dimensions is done in a non-batching mode in order to avoid impacting the generated ML model by the CPU, RAM and disk utilization of the training code itself. For performance reasons, prediction is done at the time a new value is pushed in the database. The alternative option, ie. maintaining a separate thread for prediction, would be ~3-4x times slower and would increase locking contention considerably. For similar reasons, we use a custom function to unpack storage_numbers into doubles, instead of long doubles. * Add data structures required by the anomaly detector. This patch adds two data structures that will be used by the anomaly detector in follow-up patches. The first data structure is a circular bit buffer which is being used to count the number of set bits over time. The second data structure represents an expandable, rolling window that tracks set/unset bits. It is explicitly modeled as a finite-state machine in order to make the anomaly detector's behaviour easier to test and reason about. * Add anomaly detection thread. This patch creates a new anomaly detection thread per host. Each thread maintains a BitRateWindow which is updated every second based on the anomaly status of the correspondent host. Based on the updated status of the anomaly window, we can identify the existence/absence of an anomaly event, it's start/end time and the dimensions that participate in it. * Create/insert/query anomaly events from Sqlite DB. * Create anomaly event endpoints. This patch adds two endpoints to expose information about anomaly events. The first endpoint returns the list of anomalous events within a specified time range. The second endpoint provides detailed information about a single anomaly event, ie. the list of anomalous dimensions in that event along with their anomaly rate. The `anomaly-bit` option has been added to the `/data` endpoint in order to allow users to get the anomaly status of individual dimensions per second. * Fix build failures on Ubuntu 16.04 & CentOS 7. These distros do not have toolchains with C++11 enabled by default. Replacing nullptr with NULL should be fix the build problems on these platforms when the ML feature is not enabled. * Fix `make dist` to include ML makefiles and dlib sources. Currently, we add ml/kmeans/dlib to EXTRA_DIST. We might want to generate an explicit list of source files in the future, in order to bring down the generated archive's file size. * Small changes to make the LGTM & Codacy bots happy. - Cast unused result of function calls to void. - Pass a const-ref string to Database's constructor. - Reduce the scope of a local variable in the anomaly detector. * Add user configuration option to enable/disable anomaly detection. * Do not log dimension-specific operations. Training and prediction operations happen every second for each dimension. In prep for making this PR easier to run anomaly detection for many charts & dimensions, I've removed logs that would cause log flooding. * Reset dimensions' bit counter when not above anomaly rate threshold. * Update the default config options with real values. With this patch the default configuration options will match the ones we want our users to use by default. * Update conditions for creating new ML dimensions. 1. Skip dimensions with update_every != 1, 2. Skip dimensions that come from the ML charts. With this filtering in place, any configuration value for the relevant simple_pattern expressions will work correctly. * Teach buildinfo{,json} about the ML feature. * Set --enable-ml by default in the configuration options. This patch is only meant for testing the building of the ML functionality on Github. It will be reverted once tests pass successfully. * Minor build system fixes. - Add path to json header - Enable C++ linker when ML functionality is enabled - Rename ml/ml-dummy.cc to ml/ml-dummy.c * Revert "Set --enable-ml by default in the configuration options." This reverts commit 28206952a59a577675c86194f2590ec63b60506c. We pass all Github checks when building the ML functionality, except for those that run on CentOS 7 due to not having a C++11 toolchain. * Check for missing dlib and nlohmann files. We simply check the single-source files upon which our build system depends. If they are missing, an error message notifies the user about missing git submodules which are required for the ML functionality. * Allow users to specify the maximum number of KMeans iterations. * Use dlib v19.10 v19.22 broke compatibility with CentOS 7's g++. Development of the anomaly detection used v19.10, which is the version used by most Debian and Ubuntu distribution versions that are not past EOL. No observable performance improvements/regressions specific to the K-Means algorithm occur between the two versions. * Detect and use the -std=c++11 flag when building anomaly detection. This patch automatically adds the -std=c++11 when building netdata with the ML functionality, if it's supported by the user's toolchain. With this change we are able to build the agent correctly on CentOS 7. * Restructure configuration options. - update default values, - clamp values to min/max defaults, - validate and identify conflicting values. * Add update_every configuration option. Considerring that the MVP does not support per host configuration options, the update_every option will be used to filter hosts to train. With this change anomaly detection will be supported on: - Single nodes with update_every != 1, and - Children nodes with a common update_every value that might differ from the value of the parent node. * Reorganize anomaly detection charts. This follows Andrew's suggestion to have four charts to show the number of anomalous/normal dimensions, the anomaly rate, the detector's window length, and the events that occur in the prediction step. Context and family values, along with the necessary information in the dashboard_info.js file, will be updated in a follow-up commit. * Do not dump anomaly event info in logs. * Automatically handle low "train every secs" configuration values. If a user specifies a very low value for the "train every secs", then it is possible that the time it takes to train a dimension is higher than the its allotted time. In that case, we want the training thread to: - Reduce it's CPU usage per second, and - Allow the prediction thread to proceed. We achieve this by limiting the training time of a single dimension to be equal to half the time allotted to it. This means, that the training thread will never consume more than 50% of a single core. * Automatically detect if ML functionality should be enabled. With these changes, we enable ML if: - The user has not explicitly specified --disable-ml, and - Git submodules have been checked out properly, and - The toolchain supports C++11. If the user has explicitly specified --enable-ml, the build fails if git submodules are missing, or the toolchain does not support C++11. * Disable anomaly detection by default. * Do not update charts in locked region. * Cleanup code reading configuration options. * Enable C++ linker when building ML. * Disable ML functionality for CMake builds. * Skip LGTM for dlib and nlohmann libraries. * Do not build ML if libuuid is missing. * Fix dlib path in LGTM's yaml config file. * Add chart to track duration of prediction step. * Add chart to track duration of training step. * Limit the number dimensions in an anomaly event. This will ensure our JSON results won't grow without any limit. The default ML configuration options, train approximately ~1700 dimensions in a newly-installed Netdata agent. The hard-limit is set to 2000 dimensions which: - Is well above the default number of dimensions we train, - If it is ever reached it means that the user had accidentaly a very low anomaly rate threshold, and - Considering that we sort the result by anomaly score, the cutoff dimensions will be the less anomalous, ie. the least important to investigate. * Add information about the ML charts. * Update family value in ML charts. This fix will allow us to show the individual charts in the RHS Anomaly Detection submenu. * Rename chart type s/anomalydetection/anomaly_detection/g * Expose ML feat in /info endpoint. * Export ML config through /info endpoint. * Fix CentOS 7 build. * Reduce the critical region of a host's lock. Before this change, each host had a single, dedicated lock to protect its map of dimensions from adding/deleting new dimensions while training and detecting anomalies. This was problematic because training of a single dimension can take several seconds in nodes that are under heavy load. After this change, the host's lock protects only the insertion/deletion of new dimensions, and the prediction step. For the training of dimensions we use a dedicated lock per dimension, which is responsible for protecting the dimension from deletion while training. Prediction is fast enough, even on slow machines or under heavy load, which allows us to use the host's main lock and avoid increasing the complexity of our implementation in the anomaly detector. * Improve the way we are tracking anomaly detector's performance. This change allows us to: - track the total training time per update_every period, - track the maximum training time of a single dimension per update_every period, and - export the current number of total, anomalous, normal dimensions to the /info endpoint. Also, now that we use dedicated locks per dimensions, we can train under heavy load continuously without having to sleep in order to yield the training thread and allow the prediction thread to progress. * Use samples instead of seconds in ML configuration. This commit changes the way we are handling input ML configuration options from the user. Instead of treating values as seconds, we interpret all inputs as number of update_every periods. This allows us to enable anomaly detection on hosts that have update_every != 1 second, and still produce a model for training/prediction & detection that behaves in an expected way. Tested by running anomaly detection on an agent with update_every = [1, 2, 4] seconds. * Remove unecessary log message in detection thread * Move ML configuration to global section. * Update web/gui/dashboard_info.js Co-authored-by: Andrew Maguire <andrewm4894@gmail.com> * Fix typo Co-authored-by: Andrew Maguire <andrewm4894@gmail.com> * Rebase. * Use negative logic for anomaly bit. * Add info for prediction_stats and training_stats charts. * Disable ML on PPC64EL. The CI test fails with -std=c++11 and requires -std=gnu++11 instead. However, it's not easy to quickly append the required flag to CXXFLAGS. For the time being, simply disable ML on PPC64EL and if any users require this functionality we can fix it in the future. * Add comment on why we disable ML on PPC64EL. Co-authored-by: Andrew Maguire <andrewm4894@gmail.com>