summaryrefslogtreecommitdiffstats
path: root/Documentation/admin-guide
diff options
context:
space:
mode:
authorLinus Torvalds <torvalds@linux-foundation.org>2018-12-29 11:21:49 -0800
committerLinus Torvalds <torvalds@linux-foundation.org>2018-12-29 11:21:49 -0800
commit3868772b99e3146d02cf47e739d79022eba1d77c (patch)
treed32c0283496e6955937b618981766b5f0878724f /Documentation/admin-guide
parent6f9d71c9c759b1e7d31189a4de228983192c7dc7 (diff)
parent942104a21ce4951420ddf6c6b3179a0627301f7e (diff)
Merge tag 'docs-5.0' of git://git.lwn.net/linux
Pull documentation update from Jonathan Corbet: "A fairly normal cycle for documentation stuff. We have a new document on perf security, more Italian translations, more improvements to the memory-management docs, improvements to the pathname lookup documentation, and the usual array of smaller fixes. As is often the case, there are a few reaches outside of Documentation/ to adjust kerneldoc comments" * tag 'docs-5.0' of git://git.lwn.net/linux: (38 commits) docs: improve pathname-lookup document structure configfs: fix wrong name of struct in documentation docs/mm-api: link slab_common.c to "The Slab Cache" section slab: make kmem_cache_create{_usercopy} description proper kernel-doc doc:process: add links where missing docs/core-api: make mm-api.rst more structured x86, boot: documentation whitespace fixup Documentation: devres: note checking needs when converting doc:it: add some process/* translations doc:it: fixes in process/1.Intro Documentation: convert path-lookup from markdown to resturctured text Documentation/admin-guide: update admin-guide index.rst Documentation/admin-guide: introduce perf-security.rst file scripts/kernel-doc: Fix struct and struct field attribute processing Documentation: dev-tools: Fix typos in index.rst Correct gen_init_cpio tool's documentation Document /proc/pid PID reuse behavior Documentation: update path-lookup.md for parallel lookups Documentation: Use "while" instead of "whilst" dmaengine: Add mailing list address to the documentation ...
Diffstat (limited to 'Documentation/admin-guide')
-rw-r--r--Documentation/admin-guide/devices.rst1
-rw-r--r--Documentation/admin-guide/dynamic-debug-howto.rst8
-rw-r--r--Documentation/admin-guide/index.rst1
-rw-r--r--Documentation/admin-guide/kernel-parameters.txt2
-rw-r--r--Documentation/admin-guide/mm/concepts.rst51
-rw-r--r--Documentation/admin-guide/perf-security.rst97
-rw-r--r--Documentation/admin-guide/ras.rst2
-rw-r--r--Documentation/admin-guide/security-bugs.rst2
8 files changed, 132 insertions, 32 deletions
diff --git a/Documentation/admin-guide/devices.rst b/Documentation/admin-guide/devices.rst
index 7fadc05330dd..d41671aeaef0 100644
--- a/Documentation/admin-guide/devices.rst
+++ b/Documentation/admin-guide/devices.rst
@@ -1,3 +1,4 @@
+.. _admin_devices:
Linux allocated devices (4.x+ version)
======================================
diff --git a/Documentation/admin-guide/dynamic-debug-howto.rst b/Documentation/admin-guide/dynamic-debug-howto.rst
index fdf72429f801..252e5ef324e5 100644
--- a/Documentation/admin-guide/dynamic-debug-howto.rst
+++ b/Documentation/admin-guide/dynamic-debug-howto.rst
@@ -110,8 +110,8 @@ If your query set is big, you can batch them too::
~# cat query-batch-file > <debugfs>/dynamic_debug/control
-A another way is to use wildcard. The match rule support ``*`` (matches
-zero or more characters) and ``?`` (matches exactly one character).For
+Another way is to use wildcards. The match rule supports ``*`` (matches
+zero or more characters) and ``?`` (matches exactly one character). For
example, you can match all usb drivers::
~# echo "file drivers/usb/* +p" > <debugfs>/dynamic_debug/control
@@ -258,7 +258,7 @@ this boot parameter for debugging purposes.
If ``foo`` module is not built-in, ``foo.dyndbg`` will still be processed at
boot time, without effect, but will be reprocessed when module is
-loaded later. ``dyndbg_query=`` and bare ``dyndbg=`` are only processed at
+loaded later. ``ddebug_query=`` and bare ``dyndbg=`` are only processed at
boot.
@@ -301,7 +301,7 @@ The ``dyndbg`` option is a "fake" module parameter, which means:
For ``CONFIG_DYNAMIC_DEBUG`` kernels, any settings given at boot-time (or
enabled by ``-DDEBUG`` flag during compilation) can be disabled later via
-the sysfs interface if the debug messages are no longer needed::
+the debugfs interface if the debug messages are no longer needed::
echo "module module_name -p" > <debugfs>/dynamic_debug/control
diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst
index 965745d5fb9a..0a491676685e 100644
--- a/Documentation/admin-guide/index.rst
+++ b/Documentation/admin-guide/index.rst
@@ -76,6 +76,7 @@ configure specific aspects of kernel behavior to your liking.
thunderbolt
LSM/index
mm/index
+ perf-security
.. only:: subproject and html
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index b7c9040f547e..37e235be1d35 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -331,7 +331,7 @@
APC and your system crashes randomly.
apic= [APIC,X86] Advanced Programmable Interrupt Controller
- Change the output verbosity whilst booting
+ Change the output verbosity while booting
Format: { quiet (default) | verbose | debug }
Change the amount of debugging information output
when initialising the APIC and IO-APIC components.
diff --git a/Documentation/admin-guide/mm/concepts.rst b/Documentation/admin-guide/mm/concepts.rst
index 291699c810d4..c2531b14bf46 100644
--- a/Documentation/admin-guide/mm/concepts.rst
+++ b/Documentation/admin-guide/mm/concepts.rst
@@ -4,13 +4,13 @@
Concepts overview
=================
-The memory management in Linux is complex system that evolved over the
-years and included more and more functionality to support variety of
+The memory management in Linux is a complex system that evolved over the
+years and included more and more functionality to support a variety of
systems from MMU-less microcontrollers to supercomputers. The memory
-management for systems without MMU is called ``nommu`` and it
+management for systems without an MMU is called ``nommu`` and it
definitely deserves a dedicated document, which hopefully will be
eventually written. Yet, although some of the concepts are the same,
-here we assume that MMU is available and CPU can translate a virtual
+here we assume that an MMU is available and a CPU can translate a virtual
address to a physical address.
.. contents:: :local:
@@ -21,10 +21,10 @@ Virtual Memory Primer
The physical memory in a computer system is a limited resource and
even for systems that support memory hotplug there is a hard limit on
the amount of memory that can be installed. The physical memory is not
-necessary contiguous, it might be accessible as a set of distinct
+necessarily contiguous; it might be accessible as a set of distinct
address ranges. Besides, different CPU architectures, and even
-different implementations of the same architecture have different view
-how these address ranges defined.
+different implementations of the same architecture have different views
+of how these address ranges are defined.
All this makes dealing directly with physical memory quite complex and
to avoid this complexity a concept of virtual memory was developed.
@@ -48,8 +48,8 @@ appropriate kernel configuration option.
Each physical memory page can be mapped as one or more virtual
pages. These mappings are described by page tables that allow
-translation from virtual address used by programs to real address in
-the physical memory. The page tables organized hierarchically.
+translation from a virtual address used by programs to the physical
+memory address. The page tables are organized hierarchically.
The tables at the lowest level of the hierarchy contain physical
addresses of actual pages used by the software. The tables at higher
@@ -121,8 +121,8 @@ Nodes
Many multi-processor machines are NUMA - Non-Uniform Memory Access -
systems. In such systems the memory is arranged into banks that have
different access latency depending on the "distance" from the
-processor. Each bank is referred as `node` and for each node Linux
-constructs an independent memory management subsystem. A node has it's
+processor. Each bank is referred to as a `node` and for each node Linux
+constructs an independent memory management subsystem. A node has its
own set of zones, lists of free and used pages and various statistics
counters. You can find more details about NUMA in
:ref:`Documentation/vm/numa.rst <numa>` and in
@@ -149,9 +149,9 @@ for program's stack and heap or by explicit calls to mmap(2) system
call. Usually, the anonymous mappings only define virtual memory areas
that the program is allowed to access. The read accesses will result
in creation of a page table entry that references a special physical
-page filled with zeroes. When the program performs a write, regular
+page filled with zeroes. When the program performs a write, a regular
physical page will be allocated to hold the written data. The page
-will be marked dirty and if the kernel will decide to repurpose it,
+will be marked dirty and if the kernel decides to repurpose it,
the dirty page will be swapped out.
Reclaim
@@ -181,8 +181,8 @@ pressure.
The process of freeing the reclaimable physical memory pages and
repurposing them is called (surprise!) `reclaim`. Linux can reclaim
pages either asynchronously or synchronously, depending on the state
-of the system. When system is not loaded, most of the memory is free
-and allocation request will be satisfied immediately from the free
+of the system. When the system is not loaded, most of the memory is free
+and allocation requests will be satisfied immediately from the free
pages supply. As the load increases, the amount of the free pages goes
down and when it reaches a certain threshold (high watermark), an
allocation request will awaken the ``kswapd`` daemon. It will
@@ -190,7 +190,7 @@ asynchronously scan memory pages and either just free them if the data
they contain is available elsewhere, or evict to the backing storage
device (remember those dirty pages?). As memory usage increases even
more and reaches another threshold - min watermark - an allocation
-will trigger the `direct reclaim`. In this case allocation is stalled
+will trigger `direct reclaim`. In this case allocation is stalled
until enough memory pages are reclaimed to satisfy the request.
Compaction
@@ -200,7 +200,7 @@ As the system runs, tasks allocate and free the memory and it becomes
fragmented. Although with virtual memory it is possible to present
scattered physical pages as virtually contiguous range, sometimes it is
necessary to allocate large physically contiguous memory areas. Such
-need may arise, for instance, when a device driver requires large
+need may arise, for instance, when a device driver requires a large
buffer for DMA, or when THP allocates a huge page. Memory `compaction`
addresses the fragmentation issue. This mechanism moves occupied pages
from the lower part of a memory zone to free pages in the upper part
@@ -208,15 +208,16 @@ of the zone. When a compaction scan is finished free pages are grouped
together at the beginning of the zone and allocations of large
physically contiguous areas become possible.
-Like reclaim, the compaction may happen asynchronously in ``kcompactd``
-daemon or synchronously as a result of memory allocation request.
+Like reclaim, the compaction may happen asynchronously in the ``kcompactd``
+daemon or synchronously as a result of a memory allocation request.
OOM killer
==========
-It may happen, that on a loaded machine memory will be exhausted. When
-the kernel detects that the system runs out of memory (OOM) it invokes
-`OOM killer`. Its mission is simple: all it has to do is to select a
-task to sacrifice for the sake of the overall system health. The
-selected task is killed in a hope that after it exits enough memory
-will be freed to continue normal operation.
+It is possible that on a loaded machine memory will be exhausted and the
+kernel will be unable to reclaim enough memory to continue to operate. In
+order to save the rest of the system, it invokes the `OOM killer`.
+
+The `OOM killer` selects a task to sacrifice for the sake of the overall
+system health. The selected task is killed in a hope that after it exits
+enough memory will be freed to continue normal operation.
diff --git a/Documentation/admin-guide/perf-security.rst b/Documentation/admin-guide/perf-security.rst
new file mode 100644
index 000000000000..f73ebfe9bfe2
--- /dev/null
+++ b/Documentation/admin-guide/perf-security.rst
@@ -0,0 +1,97 @@
+.. _perf_security:
+
+Perf Events and tool security
+=============================
+
+Overview
+--------
+
+Usage of Performance Counters for Linux (perf_events) [1]_ , [2]_ , [3]_ can
+impose a considerable risk of leaking sensitive data accessed by monitored
+processes. The data leakage is possible both in scenarios of direct usage of
+perf_events system call API [2]_ and over data files generated by Perf tool user
+mode utility (Perf) [3]_ , [4]_ . The risk depends on the nature of data that
+perf_events performance monitoring units (PMU) [2]_ collect and expose for
+performance analysis. Having that said perf_events/Perf performance monitoring
+is the subject for security access control management [5]_ .
+
+perf_events/Perf access control
+-------------------------------
+
+To perform security checks, the Linux implementation splits processes into two
+categories [6]_ : a) privileged processes (whose effective user ID is 0, referred
+to as superuser or root), and b) unprivileged processes (whose effective UID is
+nonzero). Privileged processes bypass all kernel security permission checks so
+perf_events performance monitoring is fully available to privileged processes
+without access, scope and resource restrictions.
+
+Unprivileged processes are subject to a full security permission check based on
+the process's credentials [5]_ (usually: effective UID, effective GID, and
+supplementary group list).
+
+Linux divides the privileges traditionally associated with superuser into
+distinct units, known as capabilities [6]_ , which can be independently enabled
+and disabled on per-thread basis for processes and files of unprivileged users.
+
+Unprivileged processes with enabled CAP_SYS_ADMIN capability are treated as
+privileged processes with respect to perf_events performance monitoring and
+bypass *scope* permissions checks in the kernel.
+
+Unprivileged processes using perf_events system call API is also subject for
+PTRACE_MODE_READ_REALCREDS ptrace access mode check [7]_ , whose outcome
+determines whether monitoring is permitted. So unprivileged processes provided
+with CAP_SYS_PTRACE capability are effectively permitted to pass the check.
+
+Other capabilities being granted to unprivileged processes can effectively
+enable capturing of additional data required for later performance analysis of
+monitored processes or a system. For example, CAP_SYSLOG capability permits
+reading kernel space memory addresses from /proc/kallsyms file.
+
+perf_events/Perf unprivileged users
+-----------------------------------
+
+perf_events/Perf *scope* and *access* control for unprivileged processes is
+governed by perf_event_paranoid [2]_ setting:
+
+-1:
+ Impose no *scope* and *access* restrictions on using perf_events performance
+ monitoring. Per-user per-cpu perf_event_mlock_kb [2]_ locking limit is
+ ignored when allocating memory buffers for storing performance data.
+ This is the least secure mode since allowed monitored *scope* is
+ maximized and no perf_events specific limits are imposed on *resources*
+ allocated for performance monitoring.
+
+>=0:
+ *scope* includes per-process and system wide performance monitoring
+ but excludes raw tracepoints and ftrace function tracepoints monitoring.
+ CPU and system events happened when executing either in user or
+ in kernel space can be monitored and captured for later analysis.
+ Per-user per-cpu perf_event_mlock_kb locking limit is imposed but
+ ignored for unprivileged processes with CAP_IPC_LOCK [6]_ capability.
+
+>=1:
+ *scope* includes per-process performance monitoring only and excludes
+ system wide performance monitoring. CPU and system events happened when
+ executing either in user or in kernel space can be monitored and
+ captured for later analysis. Per-user per-cpu perf_event_mlock_kb
+ locking limit is imposed but ignored for unprivileged processes with
+ CAP_IPC_LOCK capability.
+
+>=2:
+ *scope* includes per-process performance monitoring only. CPU and system
+ events happened when executing in user space only can be monitored and
+ captured for later analysis. Per-user per-cpu perf_event_mlock_kb
+ locking limit is imposed but ignored for unprivileged processes with
+ CAP_IPC_LOCK capability.
+
+Bibliography
+------------
+
+.. [1] `<https://lwn.net/Articles/337493/>`_
+.. [2] `<http://man7.org/linux/man-pages/man2/perf_event_open.2.html>`_
+.. [3] `<http://web.eece.maine.edu/~vweaver/projects/perf_events/>`_
+.. [4] `<https://perf.wiki.kernel.org/index.php/Main_Page>`_
+.. [5] `<https://www.kernel.org/doc/html/latest/security/credentials.html>`_
+.. [6] `<http://man7.org/linux/man-pages/man7/capabilities.7.html>`_
+.. [7] `<http://man7.org/linux/man-pages/man2/ptrace.2.html>`_
+
diff --git a/Documentation/admin-guide/ras.rst b/Documentation/admin-guide/ras.rst
index 197896718f81..c7495e42e6f4 100644
--- a/Documentation/admin-guide/ras.rst
+++ b/Documentation/admin-guide/ras.rst
@@ -54,7 +54,7 @@ those errors are correctable.
Types of errors
---------------
-Most mechanisms used on modern systems use use technologies like Hamming
+Most mechanisms used on modern systems use technologies like Hamming
Codes that allow error correction when the number of errors on a bit packet
is below a threshold. If the number of errors is above, those mechanisms
can indicate with a high degree of confidence that an error happened, but
diff --git a/Documentation/admin-guide/security-bugs.rst b/Documentation/admin-guide/security-bugs.rst
index 30187d49dc2c..dcd6c93c7aac 100644
--- a/Documentation/admin-guide/security-bugs.rst
+++ b/Documentation/admin-guide/security-bugs.rst
@@ -44,7 +44,7 @@ only valid reason for deferring the publication of a fix is to accommodate
the logistics of QA and large scale rollouts which require release
coordination.
-Whilst embargoed information may be shared with trusted individuals in
+While embargoed information may be shared with trusted individuals in
order to develop a fix, such information will not be published alongside
the fix or on any other disclosure channel without the permission of the
reporter. This includes but is not limited to the original bug report