diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2019-03-09 09:56:17 -0800 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2019-03-09 09:56:17 -0800 |
commit | 1a29e857507046e413ca7a4a7c9cd32fed9ea255 (patch) | |
tree | 5a46d9c4dcab39fc588a9ac2c9f5e4c866d41254 /Documentation | |
parent | c4703acd6d4a58dc4b31ad2a8f8b14becb898d25 (diff) | |
parent | 4064174becc09a5a2385a27c8a6fd40888b0e13c (diff) |
Merge tag 'docs-5.1' of git://git.lwn.net/linux
Pull documentation updates from Jonathan Corbet:
"A fairly routine cycle for docs - lots of typo fixes, some new
documents, and more translations. There's also some LICENSES
adjustments from Thomas"
* tag 'docs-5.1' of git://git.lwn.net/linux: (74 commits)
docs: Bring some order to filesystem documentation
Documentation/locking/lockdep: Drop last two chars of sample states
doc: rcu: Suspicious RCU usage is a warning
docs: driver-api: iio: fix errors in documentation
Documentation/process/howto: Update for 4.x -> 5.x versioning
docs: Explicitly state that the 'Fixes:' tag shouldn't split lines
doc: security: Add kern-doc for lsm_hooks.h
doc: sctp: Merge and clean up rst files
Docs: Correct /proc/stat path
scripts/spdxcheck.py: fix C++ comment style detection
doc: fix typos in license-rules.rst
Documentation: fix admin-guide/README.rst minimum gcc version requirement
doc: process: complete removal of info about -git patches
doc: translations: sync translations 'remove info about -git patches'
perf-security: wrap paragraphs on 72 columns
perf-security: elaborate on perf_events/Perf privileged users
perf-security: document collected perf_events/Perf data categories
perf-security: document perf_events/Perf resource control
sysfs.txt: add note on available attribute macros
docs: kernel-doc: typo "if ... if" -> "if ... is"
...
Diffstat (limited to 'Documentation')
69 files changed, 3378 insertions, 1109 deletions
diff --git a/Documentation/DMA-API.txt b/Documentation/DMA-API.txt index e133ccd60228..607c1e75e5aa 100644 --- a/Documentation/DMA-API.txt +++ b/Documentation/DMA-API.txt @@ -530,8 +530,8 @@ that simply cannot make consistent memory. dma_free_attrs(struct device *dev, size_t size, void *cpu_addr, dma_addr_t dma_handle, unsigned long attrs) -Free memory allocated by the dma_alloc_attrs(). All parameters common -parameters must identical to those otherwise passed to dma_fre_coherent, +Free memory allocated by the dma_alloc_attrs(). All common +parameters must be identical to those otherwise passed to dma_free_coherent, and the attrs argument must be identical to the attrs passed to dma_alloc_attrs(). @@ -717,7 +717,7 @@ dma-api/num_free_entries The current number of free dma_debug_entries dma-api/nr_total_entries The total number of dma_debug_entries in the allocator, both free and used. -dma-api/driver-filter You can write a name of a driver into this file +dma-api/driver_filter You can write a name of a driver into this file to limit the debug output to requests from that particular driver. Write an empty string to that file to disable the filter and see diff --git a/Documentation/DMA-ISA-LPC.txt b/Documentation/DMA-ISA-LPC.txt index 8c2b8be6e45b..b1ec7b16c21f 100644 --- a/Documentation/DMA-ISA-LPC.txt +++ b/Documentation/DMA-ISA-LPC.txt @@ -52,8 +52,8 @@ Address translation ------------------- To translate the virtual address to a bus address, use the normal DMA -API. Do _not_ use isa_virt_to_phys() even though it does the same -thing. The reason for this is that the function isa_virt_to_phys() +API. Do _not_ use isa_virt_to_bus() even though it does the same +thing. The reason for this is that the function isa_virt_to_bus() will require a Kconfig dependency to ISA, not just ISA_DMA_API which is really all you need. Remember that even though the DMA controller has its origins in ISA it is used elsewhere. diff --git a/Documentation/RCU/lockdep-splat.txt b/Documentation/RCU/lockdep-splat.txt index 238e9f61352f..9c015976b174 100644 --- a/Documentation/RCU/lockdep-splat.txt +++ b/Documentation/RCU/lockdep-splat.txt @@ -14,9 +14,9 @@ being the real world and all that. So let's look at an example RCU lockdep splat from 3.0-rc5, one that has long since been fixed: -=============================== -[ INFO: suspicious RCU usage. ] -------------------------------- +============================= +WARNING: suspicious RCU usage +----------------------------- block/cfq-iosched.c:2776 suspicious rcu_dereference_protected() usage! other info that might help us debug this: @@ -24,11 +24,11 @@ other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 0 3 locks held by scsi_scan_6/1552: - #0: (&shost->scan_mutex){+.+.+.}, at: [<ffffffff8145efca>] + #0: (&shost->scan_mutex){+.+.}, at: [<ffffffff8145efca>] scsi_scan_host_selected+0x5a/0x150 - #1: (&eq->sysfs_lock){+.+...}, at: [<ffffffff812a5032>] + #1: (&eq->sysfs_lock){+.+.}, at: [<ffffffff812a5032>] elevator_exit+0x22/0x60 - #2: (&(&q->__queue_lock)->rlock){-.-...}, at: [<ffffffff812b6233>] + #2: (&(&q->__queue_lock)->rlock){-.-.}, at: [<ffffffff812b6233>] cfq_exit_queue+0x43/0x190 stack backtrace: diff --git a/Documentation/admin-guide/README.rst b/Documentation/admin-guide/README.rst index 47e577264198..a582c780c3bd 100644 --- a/Documentation/admin-guide/README.rst +++ b/Documentation/admin-guide/README.rst @@ -251,7 +251,7 @@ Configuring the kernel Compiling the kernel -------------------- - - Make sure you have at least gcc 3.2 available. + - Make sure you have at least gcc 4.6 available. For more information, refer to :ref:`Documentation/process/changes.rst <changes>`. Please note that you can still run a.out user programs with this kernel. diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 42379633801f..49f2acc5eece 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -1197,9 +1197,10 @@ arch/x86/kernel/cpu/cpufreq/elanfreq.c. elevator= [IOSCHED] - Format: {"cfq" | "deadline" | "noop"} - See Documentation/block/cfq-iosched.txt and - Documentation/block/deadline-iosched.txt for details. + Format: { "mq-deadline" | "kyber" | "bfq" } + See Documentation/block/deadline-iosched.txt, + Documentation/block/kyber-iosched.txt and + Documentation/block/bfq-iosched.txt for details. elfcorehdr=[size[KMG]@]offset[KMG] [IA64,PPC,SH,X86,S390] Specifies physical address of start of kernel core @@ -1996,6 +1997,12 @@ Built with CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF=y, the default is off. + kpti= [ARM64] Control page table isolation of user + and kernel address spaces. + Default: enabled on cores which need mitigation. + 0: force disabled + 1: force enabled + kvm.ignore_msrs=[KVM] Ignore guest accesses to unhandled MSRs. Default is 0 (don't ignore, but inject #GP) diff --git a/Documentation/admin-guide/perf-security.rst b/Documentation/admin-guide/perf-security.rst index f73ebfe9bfe2..72effa7c23b9 100644 --- a/Documentation/admin-guide/perf-security.rst +++ b/Documentation/admin-guide/perf-security.rst @@ -6,83 +6,211 @@ Perf Events and tool security Overview -------- -Usage of Performance Counters for Linux (perf_events) [1]_ , [2]_ , [3]_ can -impose a considerable risk of leaking sensitive data accessed by monitored -processes. The data leakage is possible both in scenarios of direct usage of -perf_events system call API [2]_ and over data files generated by Perf tool user -mode utility (Perf) [3]_ , [4]_ . The risk depends on the nature of data that -perf_events performance monitoring units (PMU) [2]_ collect and expose for -performance analysis. Having that said perf_events/Perf performance monitoring -is the subject for security access control management [5]_ . +Usage of Performance Counters for Linux (perf_events) [1]_ , [2]_ , [3]_ +can impose a considerable risk of leaking sensitive data accessed by +monitored processes. The data leakage is possible both in scenarios of +direct usage of perf_events system call API [2]_ and over data files +generated by Perf tool user mode utility (Perf) [3]_ , [4]_ . The risk +depends on the nature of data that perf_events performance monitoring +units (PMU) [2]_ and Perf collect and expose for performance analysis. +Collected system and performance data may be split into several +categories: + +1. System hardware and software configuration data, for example: a CPU + model and its cache configuration, an amount of available memory and + its topology, used kernel and Perf versions, performance monitoring + setup including experiment time, events configuration, Perf command + line parameters, etc. + +2. User and kernel module paths and their load addresses with sizes, + process and thread names with their PIDs and TIDs, timestamps for + captured hardware and software events. + +3. Content of kernel software counters (e.g., for context switches, page + faults, CPU migrations), architectural hardware performance counters + (PMC) [8]_ and machine specific registers (MSR) [9]_ that provide + execution metrics for various monitored parts of the system (e.g., + memory controller (IMC), interconnect (QPI/UPI) or peripheral (PCIe) + uncore counters) without direct attribution to any execution context + state. + +4. Content of architectural execution context registers (e.g., RIP, RSP, + RBP on x86_64), process user and kernel space memory addresses and + data, content of various architectural MSRs that capture data from + this category. + +Data that belong to the fourth category can potentially contain +sensitive process data. If PMUs in some monitoring modes capture values +of execution context registers or data from process memory then access +to such monitoring capabilities requires to be ordered and secured +properly. So, perf_events/Perf performance monitoring is the subject for +security access control management [5]_ . perf_events/Perf access control ------------------------------- -To perform security checks, the Linux implementation splits processes into two -categories [6]_ : a) privileged processes (whose effective user ID is 0, referred -to as superuser or root), and b) unprivileged processes (whose effective UID is -nonzero). Privileged processes bypass all kernel security permission checks so -perf_events performance monitoring is fully available to privileged processes -without access, scope and resource restrictions. - -Unprivileged processes are subject to a full security permission check based on -the process's credentials [5]_ (usually: effective UID, effective GID, and -supplementary group list). - -Linux divides the privileges traditionally associated with superuser into -distinct units, known as capabilities [6]_ , which can be independently enabled -and disabled on per-thread basis for processes and files of unprivileged users. - -Unprivileged processes with enabled CAP_SYS_ADMIN capability are treated as -privileged processes with respect to perf_events performance monitoring and -bypass *scope* permissions checks in the kernel. - -Unprivileged processes using perf_events system call API is also subject for -PTRACE_MODE_READ_REALCREDS ptrace access mode check [7]_ , whose outcome -determines whether monitoring is permitted. So unprivileged processes provided -with CAP_SYS_PTRACE capability are effectively permitted to pass the check. - -Other capabilities being granted to unprivileged processes can effectively -enable capturing of additional data required for later performance analysis of -monitored processes or a system. For example, CAP_SYSLOG capability permits -reading kernel space memory addresses from /proc/kallsyms file. +To perform security checks, the Linux implementation splits processes +into two categories [6]_ : a) privileged processes (whose effective user +ID is 0, referred to as superuser or root), and b) unprivileged +processes (whose effective UID is nonzero). Privileged processes bypass +all kernel security permission checks so perf_events performance +monitoring is fully available to privileged processes without access, +scope and resource restrictions. + +Unprivileged processes are subject to a full security permission check +based on the process's credentials [5]_ (usually: effective UID, +effective GID, and supplementary group list). + +Linux divides the privileges traditionally associated with superuser +into distinct units, known as capabilities [6]_ , which can be +independently enabled and disabled on per-thread basis for processes and +files of unprivileged users. + +Unprivileged processes with enabled CAP_SYS_ADMIN capability are treated +as privileged processes with respect to perf_events performance +monitoring and bypass *scope* permissions checks in the kernel. + +Unprivileged processes using perf_events system call API is also subject +for PTRACE_MODE_READ_REALCREDS ptrace access mode check [7]_ , whose +outcome determines whether monitoring is permitted. So unprivileged +processes provided with CAP_SYS_PTRACE capability are effectively +permitted to pass the check. + +Other capabilities being granted to unprivileged processes can +effectively enable capturing of additional data required for later +performance analysis of monitored processes or a system. For example, +CAP_SYSLOG capability permits reading kernel space memory addresses from +/proc/kallsyms file. + +perf_events/Perf privileged users +--------------------------------- + +Mechanisms of capabilities, privileged capability-dumb files [6]_ and +file system ACLs [10]_ can be used to create a dedicated group of +perf_events/Perf privileged users who are permitted to execute +performance monitoring without scope limits. The following steps can be +taken to create such a group of privileged Perf users. + +1. Create perf_users group of privileged Perf users, assign perf_users + group to Perf tool executable and limit access to the executable for + other users in the system who are not in the perf_users group: + +:: + + # groupadd perf_users + # ls -alhF + -rwxr-xr-x 2 root root 11M Oct 19 15:12 perf + # chgrp perf_users perf + # ls -alhF + -rwxr-xr-x 2 root perf_users 11M Oct 19 15:12 perf + # chmod o-rwx perf + # ls -alhF + -rwxr-x--- 2 root perf_users 11M Oct 19 15:12 perf + +2. Assign the required capabilities to the Perf tool executable file and + enable members of perf_users group with performance monitoring + privileges [6]_ : + +:: + + # setcap "cap_sys_admin,cap_sys_ptrace,cap_syslog=ep" perf + # setcap -v "cap_sys_admin,cap_sys_ptrace,cap_syslog=ep" perf + perf: OK + # getcap perf + perf = cap_sys_ptrace,cap_sys_admin,cap_syslog+ep + +As a result, members of perf_users group are capable of conducting +performance monitoring by using functionality of the configured Perf +tool executable that, when executes, passes perf_events subsystem scope +checks. + +This specific access control management is only available to superuser +or root running processes with CAP_SETPCAP, CAP_SETFCAP [6]_ +capabilities. perf_events/Perf unprivileged users ----------------------------------- -perf_events/Perf *scope* and *access* control for unprivileged processes is -governed by perf_event_paranoid [2]_ setting: +perf_events/Perf *scope* and *access* control for unprivileged processes +is governed by perf_event_paranoid [2]_ setting: -1: - Impose no *scope* and *access* restrictions on using perf_events performance - monitoring. Per-user per-cpu perf_event_mlock_kb [2]_ locking limit is - ignored when allocating memory buffers for storing performance data. - This is the least secure mode since allowed monitored *scope* is - maximized and no perf_events specific limits are imposed on *resources* - allocated for performance monitoring. + Impose no *scope* and *access* restrictions on using perf_events + performance monitoring. Per-user per-cpu perf_event_mlock_kb [2]_ + locking limit is ignored when allocating memory buffers for storing + performance data. This is the least secure mode since allowed + monitored *scope* is maximized and no perf_events specific limits + are imposed on *resources* allocated for performance monitoring. >=0: *scope* includes per-process and system wide performance monitoring - but excludes raw tracepoints and ftrace function tracepoints monitoring. - CPU and system events happened when executing either in user or - in kernel space can be monitored and captured for later analysis. - Per-user per-cpu perf_event_mlock_kb locking limit is imposed but - ignored for unprivileged processes with CAP_IPC_LOCK [6]_ capability. + but excludes raw tracepoints and ftrace function tracepoints + monitoring. CPU and system events happened when executing either in + user or in kernel space can be monitored and captured for later + analysis. Per-user per-cpu perf_event_mlock_kb locking limit is + imposed but ignored for unprivileged processes with CAP_IPC_LOCK + [6]_ capability. >=1: - *scope* includes per-process performance monitoring only and excludes - system wide performance monitoring. CPU and system events happened when - executing either in user or in kernel space can be monitored and - captured for later analysis. Per-user per-cpu perf_event_mlock_kb - locking limit is imposed but ignored for unprivileged processes with - CAP_IPC_LOCK capability. + *scope* includes per-process performance monitoring only and + excludes system wide performance monitoring. CPU and system events + happened when executing either in user or in kernel space can be + monitored and captured for later analysis. Per-user per-cpu + perf_event_mlock_kb locking limit is imposed but ignored for + unprivileged processes with CAP_IPC_LOCK capability. >=2: - *scope* includes per-process performance monitoring only. CPU and system - events happened when executing in user space only can be monitored and - captured for later analysis. Per-user per-cpu perf_event_mlock_kb - locking limit is imposed but ignored for unprivileged processes with - CAP_IPC_LOCK capability. + *scope* includes per-process performance monitoring only. CPU and + system events happened when executing in user space only can be + monitored and captured for later analysis. Per-user per-cpu + perf_event_mlock_kb locking limit is imposed but ignored for + unprivileged processes with CAP_IPC_LOCK capability. + +perf_events/Perf resource control +--------------------------------- + +Open file descriptors ++++++++++++++++++++++ + +The perf_events system call API [2]_ allocates file descriptors for +every configured PMU event. Open file descriptors are a per-process +accountable resource governed by the RLIMIT_NOFILE [11]_ limit +(ulimit -n), which is usually derived from the login shell process. When +configuring Perf collection for a long list of events on a large server +system, this limit can be easily hit preventing required monitoring +configuration. RLIMIT_NOFILE limit can be increased on per-user basis +modifying content of the limits.conf file [12]_ . Ordinarily, a Perf +sampling session (perf record) requires an amount of open perf_event +file descriptors that is not less than the number of monitored events +multiplied by the number of monitored CPUs. + +Memory allocation ++++++++++++++++++ + +The amount of memory available to user processes for capturing +performance monitoring data is governed by the perf_event_mlock_kb [2]_ +setting. This perf_event specific resource setting defines overall +per-cpu limits of memory allowed for mapping by the user processes to +execute performance monitoring. The setting essentially extends the +RLIMIT_MEMLOCK [11]_ limit, but only for memory regions mapped +specifically for capturing monitored performance events and related data. + +For example, if a machine has eight |