summaryrefslogtreecommitdiffstats
path: root/include
AgeCommit message (Collapse)Author
2020-12-15string.h: add FORTIFY coverage for strscpy()Francis Laniel
The fortified version of strscpy ensures the following before vanilla strscpy is called: 1. There is no read overflow because we either size is smaller than src length or we shrink size to src length by calling fortified strnlen. 2. There is no write overflow because we either failed during compilation or at runtime by checking that size is smaller than dest size. Link: https://lkml.kernel.org/r/20201122162451.27551-4-laniel_francis@privacyrequired.com Signed-off-by: Francis Laniel <laniel_francis@privacyrequired.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Daniel Axtens <dja@axtens.net> Cc: Daniel Micay <danielmicay@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15lib: string.h: detect intra-object overflow in fortified string functionsDaniel Axtens
Patch series "Fortify strscpy()", v7. This patch implements a fortified version of strscpy() enabled by setting CONFIG_FORTIFY_SOURCE=y. The new version ensures the following before calling vanilla strscpy(): 1. There is no read overflow because either size is smaller than src length or we shrink size to src length by calling fortified strnlen(). 2. There is no write overflow because we either failed during compilation or at runtime by checking that size is smaller than dest size. Note that, if src and dst size cannot be got, the patch defaults to call vanilla strscpy(). The patches adds the following: 1. Implement the fortified version of strscpy(). 2. Add a new LKDTM test to ensures the fortified version still returns the same value as the vanilla one while panic'ing when there is a write overflow. 3. Correct some typos in LKDTM related file. I based my modifications on top of two patches from Daniel Axtens which modify calls to __builtin_object_size, in fortified string functions, to ensure the true size of char * are returned and not the surrounding structure size. About performance, I measured the slow down of fortified strscpy(), using the vanilla one as baseline. The hardware I used is an Intel i3 2130 CPU clocked at 3.4 GHz. I ran "Linux 5.10.0-rc4+ SMP PREEMPT" inside qemu 3.10 with 4 CPU cores. The following code, called through LKDTM, was used as a benchmark: #define TIMES 10000 char *src; char dst[7]; int i; ktime_t begin; src = kstrdup("foobar", GFP_KERNEL); if (src == NULL) return; begin = ktime_get(); for (i = 0; i < TIMES; i++) strscpy(dst, src, strlen(src)); pr_info("%d fortified strscpy() tooks %lld", TIMES, ktime_get() - begin); begin = ktime_get(); for (i = 0; i < TIMES; i++) __real_strscpy(dst, src, strlen(src)); pr_info("%d vanilla strscpy() tooks %lld", TIMES, ktime_get() - begin); kfree(src); I called the above code 30 times to compute stats for each version (in ns, round to int): | version | mean | std | median | 95th | | --------- | ------- | ------ | ------- | ------- | | fortified | 245_069 | 54_657 | 216_230 | 331_122 | | vanilla | 172_501 | 70_281 | 143_539 | 219_553 | On average, fortified strscpy() is approximately 1.42 times slower than vanilla strscpy(). For the 95th percentile, the fortified version is about 1.50 times slower. So, clearly the stats are not in favor of fortified strscpy(). But, the fortified version loops the string twice (one in strnlen() and another in vanilla strscpy()) while the vanilla one only loops once. This can explain why fortified strscpy() is slower than the vanilla one. This patch (of 5): When the fortify feature was first introduced in commit 6974f0c4555e ("include/linux/string.h: add the option of fortified string.h functions"), Daniel Micay observed: * It should be possible to optionally use __builtin_object_size(x, 1) for some functions (C strings) to detect intra-object overflows (like glibc's _FORTIFY_SOURCE=2), but for now this takes the conservative approach to avoid likely compatibility issues. This is a case that often cannot be caught by KASAN. Consider: struct foo { char a[10]; char b[10]; } void test() { char *msg; struct foo foo; msg = kmalloc(16, GFP_KERNEL); strcpy(msg, "Hello world!!"); // this copy overwrites foo.b strcpy(foo.a, msg); } The questionable copy overflows foo.a and writes to foo.b as well. It cannot be detected by KASAN. Currently it is also not detected by fortify, because strcpy considers __builtin_object_size(x, 0), which considers the size of the surrounding object (here, struct foo). However, if we switch the string functions over to use __builtin_object_size(x, 1), the compiler will measure the size of the closest surrounding subobject (here, foo.a), rather than the size of the surrounding object as a whole. See https://gcc.gnu.org/onlinedocs/gcc/Object-Size-Checking.html for more info. Only do this for string functions: we cannot use it on things like memcpy, memmove, memcmp and memchr_inv due to code like this which purposefully operates on multiple structure members: (arch/x86/kernel/traps.c) /* * regs->sp points to the failing IRET frame on the * ESPFIX64 stack. Copy it to the entry stack. This fills * in gpregs->ss through gpregs->ip. * */ memmove(&gpregs->ip, (void *)regs->sp, 5*8); This change passes an allyesconfig on powerpc and x86, and an x86 kernel built with it survives running with syz-stress from syzkaller, so it seems safe so far. Link: https://lkml.kernel.org/r/20201122162451.27551-1-laniel_francis@privacyrequired.com Link: https://lkml.kernel.org/r/20201122162451.27551-2-laniel_francis@privacyrequired.com Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Francis Laniel <laniel_francis@privacyrequired.com> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Daniel Micay <danielmicay@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15ilog2: improve ilog2 for constant argumentsJakub Jelinek
As discussed in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445 the const_ilog2 macro generates a lot of code which interferes badly with GCC inlining heuristics, until it can be proven that the ilog2 argument can or can't be simplified into a constant. It can be expressed using __builtin_clzll builtin which is supported by GCC 3.4 and later and when used only in the __builtin_constant_p guarded code it ought to always fold back to a constant. Other compilers support the same builtin for many years too. Other option would be to change the const_ilog2 macro, though as the description says it is meant to be used also in C constant expressions, and while GCC will fold it to constant with constant argument even in those, perhaps it is better to avoid using extensions in that case. [akpm@linux-foundation.org: coding style fixes] Link: https://lkml.kernel.org/r/20201120125154.GB3040@hirez.programming.kicks-ass.net Link: https://lkml.kernel.org/r/20201021132718.GB2176@tucnak Signed-off-by: Jakub Jelinek <jakub@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15bitmap: remove unused function declarationMa, Jianpeng
Link: https://lkml.kernel.org/r/BN7PR11MB26097166B6B46387D8A1ABA4FDE30@BN7PR11MB2609.namprd11.prod.outlook.com Fixes: 2afe27c718b6 ("lib/bitmap.c: bitmap_[empty,full]: remove code duplication") Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com> Acked-by: Yury Norov <yury.norov@gmail.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15include/linux/bitmap.h: convert bitmap_empty() / bitmap_full() to return booleanAndy Shevchenko
There is no need to return int type out of boolean expression. Link: https://lkml.kernel.org/r/20201027180936.20806-1-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Yury Norov <yury.norov@gmail.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15kernel.h: split out mathematical helpersAndy Shevchenko
kernel.h is being used as a dump for all kinds of stuff for a long time. Here is the attempt to start cleaning it up by splitting out mathematical helpers. At the same time convert users in header and lib folder to use new header. Though for time being include new header back to kernel.h to avoid twisted indirected includes for existing users. [sfr@canb.auug.org.au: fix powerpc build] Link: https://lkml.kernel.org/r/20201029150809.13059608@canb.auug.org.au Link: https://lkml.kernel.org/r/20201028173212.41768-1-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Jeff Layton <jlayton@kernel.org> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15asm-generic: force inlining of get_order() to work around gcc10 poor decisionChristophe Leroy
When building mpc885_ads_defconfig with gcc 10.1, the function get_order() appears 50 times in vmlinux: [linux]# ppc-linux-objdump -x vmlinux | grep get_order | wc -l 50 [linux]# size vmlinux text data bss dec hex filename 3842620 675624 135160 4653404 47015c vmlinux In the old days, marking a function 'static inline' was forcing GCC to inline, but since commit ac7c3e4ff401 ("compiler: enable CONFIG_OPTIMIZE_INLINING forcibly") GCC may decide to not inline a function. It looks like GCC 10 is taking poor decisions on this. get_order() compiles into the following tiny function, occupying 20 bytes of text. 0000007c <get_order>: 7c: 38 63 ff ff addi r3,r3,-1 80: 54 63 a3 3e rlwinm r3,r3,20,12,31 84: 7c 63 00 34 cntlzw r3,r3 88: 20 63 00 20 subfic r3,r3,32 8c: 4e 80 00 20 blr By forcing get_order() to be __always_inline, the size of text is reduced by 1940 bytes, that is almost twice the space occupied by 50 times get_order() [linux-powerpc]# size vmlinux text data bss dec hex filename 3840680 675588 135176 4651444 46f9b4 vmlinux Link: https://lkml.kernel.org/r/96c6172d619c51acc5c1c4884b80785c59af4102.1602949927.git.christophe.leroy@csgroup.eu Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Joel Stanley <joel@jms.id.au> Cc: Segher Boessenkool <segher@kernel.crashing.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15proc: fix lookup in /proc/net subdirectories after setns(2)Alexey Dobriyan
Commit 1fde6f21d90f ("proc: fix /proc/net/* after setns(2)") only forced revalidation of regular files under /proc/net/ However, /proc/net/ is unusual in the sense of /proc/net/foo handlers take netns pointer from parent directory which is old netns. Steps to reproduce: (void)open("/proc/net/sctp/snmp", O_RDONLY); unshare(CLONE_NEWNET); int fd = open("/proc/net/sctp/snmp", O_RDONLY); read(fd, &c, 1); Read will read wrong data from original netns. Patch forces lookup on every directory under /proc/net . Link: https://lkml.kernel.org/r/20201205160916.GA109739@localhost.localdomain Fixes: 1da4d377f943 ("proc: revalidate misc dentries") Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reported-by: "Rantala, Tommi T. (Nokia - FI/Espoo)" <tommi.t.rantala@nokia.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15Merge tag 'pci-v5.11-changes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: "Enumeration: - Decode PCIe 64 GT/s link speed (Gustavo Pimentel) - Remove unused HAVE_PCI_SET_MWI (Heiner Kallweit) - Reduce pci_set_cacheline_size() message to debug level (Heiner Kallweit) - Fix pci_slot_release() NULL pointer dereference (Jubin Zhong) - Unify ECAM constants in native PCI Express drivers (Krzysztof Wilczyński) - Return u8 from pci_find_capability() and similar (Puranjay Mohan) - Return u16 from pci_find_ext_capability() and similar (Bjorn Helgaas) - Fix ACPI companion lookup for device 0 on the root bus (Rafael J. Wysocki) Resource management: - Keep both device and resource name for config space remaps (Alexander Lobakin) - Bounds-check command-line resource alignment requests (Bjorn Helgaas) - Fix overflow in command-line resource alignment requests (Colin Ian King) Driver binding: - Avoid duplicate IDs in driver dynamic IDs list (Zhenzhong Duan) Power management: - Save/restore Precision Time Measurement Capability for suspend/resume (David E. Box) - Disable PTM during suspend to save power (David E. Box) - Add sysfs attribute for device power state (Maximilian Luz) - Rename pci_wakeup_bus() to pci_resume_bus() (Mika Westerberg) - Do not generate wakeup event when runtime resuming device (Mika Westerberg) - Save/restore ASPM L1SS Capability for suspend/resume (Vidya Sagar) Virtualization: - Mark AMD Raven iGPU ATS as broken in some platforms (Alex Deucher) - Add function 1 DMA alias quirk for Marvell 9215 SATA controller (Bjorn Helgaas) MSI: - Disable MSI for Pericom PCIe-USB adapter (Andy Shevchenko) - Improve warnings for 32-bit-limited MSI support (Vidya Sagar) Error handling: - Cache RCEC EA Capability offset in pci_init_capabilities() (Sean V Kelley) - Rename reset_link() to reset_subordinates() (Sean V Kelley) - Write AER Capability only when we control it (Sean V Kelley) - Clear AER status only when we control AER (Sean V Kelley) - Bind RCEC devices to the Root Port driver (Qiuxu Zhuo) - Recover from RCiEP AER errors (Qiuxu Zhuo) - Recover from RCEC AER errors (Sean V Kelley) - Add pcie_link_rcec() to associate RCiEPs (Sean V Kelley) - Add pcie_walk_rcec() to RCEC AER handling (Sean V Kelley) - Add pcie_walk_rcec() to RCEC PME handling (Sean V Kelley) - Add RCEC AER error injection support (Qiuxu Zhuo) Broadcom iProc PCIe controller driver: - Fix out-of-bound array accesses (Bharat Gooty) - Invalidate correct PAXB inbound windows (Roman Bacik) - Enhance PCIe Link information display (Srinath Mannam) Cadence PCIe controller driver: - Make "cdns,max-outbound-regions" property optional (Kishon Vijay Abraham I) Intel VMD host bridge driver: - Offset client MSI-X vectors (Jon Derrick) - Update type of __iomem pointers (Krzysztof Wilczyński) NVIDIA Tegra PCIe controller driver: - Move "dbi" accesses to post common DWC initialization (Vidya Sagar) - Read "dbi" base address to program in application logic (Vidya Sagar) - Fix ASPM-L1SS advertisement disable code (Vidya Sagar) - Set DesignWare IP version (Vidya Sagar) - Continue unconfig sequence even if parts fail (Vidya Sagar) - Check return value of tegra_pcie_init_controller() (Vidya Sagar) - Disable LTSSM during L2 entry (Vidya Sagar) Qualcomm PCIe controller driver: - Document PCIe bindings for SM8250 SoC (Manivannan Sadhasivam) - Add SM8250 SoC support (Manivannan Sadhasivam) - Add support for configuring BDF to SID mapping for SM8250 (Manivannan Sadhasivam) Renesas R-Car PCIe controller driver: - rcar: Drop unused members from struct rcar_pcie_host (Lad Prabhakar) - PCI: rcar-pci-host: Document r8a774e1 bindings (Lad Prabhakar) - PCI: rcar-pci-host: Convert bindings to json-schema (Yoshihiro Shimoda) - PCI: rcar-pci-host: Document r8a77965 bindings (Yoshihiro Shimoda) Samsung Exynos PCIe controller driver: - Rework driver to support Exynos5433 PCIe PHY (Jaehoon Chung) - Rework driver to support Exynos5433 variant (Jaehoon Chung) - Drop samsung,exynos5440-pcie binding (Marek Szyprowski) - Add the samsung,exynos-pcie binding (Marek Szyprowski) - Add the samsung,exynos-pcie-phy binding (Marek Szyprowski) Synopsys DesignWare PCIe controller driver: - Support multiple ATU memory regions (Rob Herring) - Move intel-gw ATU offset out of driver match data (Rob Herring) - Move "dbi", "dbi2", and "addr_space" resource setup into common code (Rob Herring) - Remove intel-gw unneeded function wrappers (Rob Herring) - Ensure all outbound ATU windows are reset (Rob Herring) - Use the common MSI irq_chip in dra7xx (Rob Herring) - Drop the .set_num_vectors() host op (Rob Herring) - Move MSI interrupt setup into DWC common code (Rob Herring) - Rework MSI initialization (Rob Herring) - Move link handling into common code (Rob Herring) - Move dw_pcie_msi_init() into core (Rob Herring) - Move dw_pcie_setup_rc() to DWC common code (Rob Herring) - Remove unnecessary wrappers around dw_pcie_host_init() (Rob Herring) - Drop keystone duplicated 'num-viewport'" (Rob Herring) - Move inbound and outbound windows to common struct (Rob Herring) - Detect number of iATU windows (Rob Herring) - Warn if non-prefetchable memory aperture size is > 32-bit (Vidya Sagar) - Add support to program ATU for >4GB memory (Vidya Sagar) - Set 32-bit DMA mask for MSI target address allocation (Vidya Sagar) TI J721E PCIe driver: - Fix "ti,syscon-pcie-ctrl" to take argument (Kishon Vijay Abraham I) - Add host mode dt-bindings for TI's J7200 SoC (Kishon Vijay Abraham I) - Add EP mode dt-bindings for TI's J7200 SoC (Kishon Vijay Abraham I) - Get offset within "syscon" from "ti,syscon-pcie-ctrl" phandle arg (Kishon Vijay Abraham I) TI Keystone PCIe controller driver: - Enable compile-testing on !ARM (Alex Dewar)" * tag 'pci-v5.11-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (100 commits) PCI: Add function 1 DMA alias quirk for Marvell 9215 SATA controller PCI/ACPI: Fix companion lookup for device 0 on the root bus PCI: Keep both device and resource name for config space remaps PCI: xgene: Removed unused ".bus_shift" initialisers from pci-xgene.c PCI: vmd: Update type of the __iomem pointers PCI: iproc: Convert to use the new ECAM constants PCI: thunder-pem: Add constant for custom ".bus_shift" initialiser PCI: Unify ECAM constants in native PCI Express drivers PCI: Disable PTM during suspend to save power PCI/PTM: Save/restore Precision Time Measurement Capability for suspend/resume PCI: Mark AMD Raven iGPU ATS as broken in some platforms PCI: j721e: Get offset within "syscon" from "ti,syscon-pcie-ctrl" phandle arg dt-bindings: PCI: Add EP mode dt-bindings for TI's J7200 SoC dt-bindings: PCI: Add host mode dt-bindings for TI's J7200 SoC dt-bindings: pci: ti,j721e: Fix "ti,syscon-pcie-ctrl" to take argument PCI: dwc: Set 32-bit DMA mask for MSI target address allocation PCI: qcom: Add support for configuring BDF to SID mapping for SM8250 PCI: Reduce pci_set_cacheline_size() message to debug level PCI: Remove unused HAVE_PCI_SET_MWI PCI: qcom: Add SM8250 SoC support ...
2020-12-15Merge tag 'acpi-5.11-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI updates from Rafael Wysocki: "These update the ACPICA code in the kernel to upstream revision 20201113, fix and clean up some resources manipulation code, extend the enumeration and gpio-line-names property documentation, clean up the handling of _DEP during device enumeration, add a new backlight DMI quirk, clean up transaction handling in the EC driver and make some assorted janitorial changes. Specifics: - Update ACPICA code in the kernel to upstream revision 20201113 with changes as follows: * Add 5 new UUIDs to the known UUID table (Bob Moore) * Remove extreaneous "the" in comments (Colin Ian King) * Add function trace macros to improve debugging (Erik Kaneda) * Fix interpreter memory leak (Erik Kaneda) * Handle "orphan" _REG for GPIO OpRegions (Hans de Goede) - Introduce resource_union() and resource_intersection() helpers and clean up some resource-manipulation code with the help of them (Andy Shevchenko) - Revert problematic commit related to the handling of resources in the ACPI core (Daniel Scally) - Extend the ACPI device enumeration documentation and the gpio-line-names _DSD property documentation, clean up the latter (Flavio Suligoi) - Clean up _DEP handling during device enumeration, modify the list of _DEP exceptions and the handling of it and fix up terminology related to _DEP (Hans de Goede, Rafael Wysocki) - Eliminate in_interrupt() usage from the ACPI EC driver (Sebastian Andrzej Siewior) - Clean up the advance_transaction() routine and related code in the ACPI EC driver (Rafael Wysocki) - Add new backlight quirk for GIGABYTE GB-BXBT-2807 (Jasper St Pierre) - Make assorted janitorial changes in several ACPI-related pieces of code (Hanjun Guo, Jason Yan, Punit Agrawal)" * tag 'acpi-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (40 commits) ACPI: scan: Fix up _DEP-related terminology with supplier/consumer ACPI: scan: Drop INT3396 from acpi_ignore_dep_ids[] ACPI: video: Add DMI quirk for GIGABYTE GB-BXBT-2807 Revert "ACPI / resources: Use AE_CTRL_TERMINATE to terminate resources walks" ACPI: scan: Add PNP0D80 to the _DEP exceptions list ACPI: scan: Call acpi_get_object_info() from acpi_add_single_object() ACPI: scan: Add acpi_info_matches_hids() helper ACPICA: Update version to 20201113 ACPICA: Interpreter: fix memory leak by using existing buffer ACPICA: Add function trace macros to improve debugging ACPICA: Also handle "orphan" _REG methods for GPIO OpRegions ACPICA: Remove extreaneous "the" in comments ACPICA: Add 5 new UUIDs to the known UUID table resource: provide meaningful MODULE_LICENSE() in test suite ASoC: Intel: catpt: Replace open coded variant of resource_intersection() ACPI: processor: Drop duplicate setting of shared_cpu_map ACPI: EC: Clean up status flags checks in advance_transaction() ACPI: EC: Untangle error handling in advance_transaction() ACPI: EC: Simplify error handling in advance_transaction() ACPI: EC: Rename acpi_ec_is_gpe_raised() ...
2020-12-15Merge tag 'pm-5.11-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "These update cpufreq (core and drivers), cpuidle (polling state implementation and the PSCI driver), the OPP (operating performance points) framework, devfreq (core and drivers), the power capping RAPL (Running Average Power Limit) driver, the Energy Model support, the generic power domains (genpd) framework, the ACPI device power management, the core system-wide suspend code and power management utilities. Specifics: - Use local_clock() instead of jiffies in the cpufreq statistics to improve accuracy (Viresh Kumar). - Fix up OPP usage in the cpufreq-dt and qcom-cpufreq-nvmem cpufreq drivers (Viresh Kumar). - Clean up the cpufreq core, the intel_pstate driver and the schedutil cpufreq governor (Rafael Wysocki). - Fix up error code paths in the sti-cpufreq and mediatek cpufreq drivers (Yangtao Li, Qinglang Miao). - Fix cpufreq_online() to return error codes instead of success (0) in all cases when it fails (Wang ShaoBo). - Add mt8167 support to the mediatek cpufreq driver and blacklist mt8516 in the cpufreq-dt-platdev driver (Fabien Parent). - Modify the tegra194 cpufreq driver to always return values from the frequency table as the current frequency and clean up that driver (Sumit Gupta, Jon Hunter). - Modify the arm_scmi cpufreq driver to allow it to discover the power scale present in the performance protocol and provide this information to the Energy Model (Lukasz Luba). - Add missing MODULE_DEVICE_TABLE to several cpufreq drivers (Pali Rohár). - Clean up the CPPC cpufreq driver (Ionela Voinescu). - Fix NVMEM_IMX_OCOTP dependency in the imx cpufreq driver (Arnd Bergmann). - Rework the poling interval selection for the polling state in cpuidle (Mel Gorman). - Enable suspend-to-idle for PSCI OSI mode in the PSCI cpuidle driver (Ulf Hansson). - Modify the OPP framework to support empty (node-less) OPP tables in DT for passing dependency information (Nicola Mazzucato). - Fix potential lockdep issue in the OPP core and clean up the OPP core (Viresh Kumar). - Modify dev_pm_opp_put_regulators() to accept a NULL argument and update its users accordingly (Viresh Kumar). - Add frequency changes tracepoint to devfreq (Matthias Kaehlcke). - Add support for governor feature flags to devfreq, make devfreq sysfs file permissions depend on the governor and clean up the devfreq core (Chanwoo Choi). - Clean up the tegra20 devfreq driver and deprecate it to allow another driver based on EMC_STAT to be used instead of it (Dmitry Osipenko). - Add interconnect support to the tegra30 devfreq driver, allow it to take the interconnect and OPP information from DT and clean it up (Dmitry Osipenko). - Add interconnect support to the exynos-bus devfreq driver along with interconnect properties documentation (Sylwester Nawrocki). - Add suport for AMD Fam17h and Fam19h processors to the RAPL power capping driver (Victor Ding, Kim Phillips). - Fix handling of overly long constraint names in the powercap framework (Lukasz Luba). - Fix the wakeup configuration handling for bridges in the ACPI device power management core (Rafael Wysocki). - Add support for using an abstract scale for power units in the Energy Model (EM) and document it (Lukasz Luba). - Add em_cpu_energy() micro-optimization to the EM (Pavankumar Kondeti). - Modify the generic power domains (genpd) framwework to support suspend-to-idle (Ulf Hansson). - Fix creation of debugfs nodes in genpd (Thierry Strudel). - Clean up genpd (Lina Iyer). - Clean up the core system-wide suspend code and make it print driver flags for devices with debug enabled (Alex Shi, Patrice Chotard, Chen Yu). - Modify the ACPI system reboot code to make it prepare for system power off to avoid confusing the platform firmware (Kai-Heng Feng). - Update the pm-graph (multiple changes, mostly usability-related) and cpupower (online and offline CPU information support) PM utilities (Todd Brandt, Brahadambal Srinivasan)" * tag 'pm-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (86 commits) cpufreq: Fix cpufreq_online() return value on errors cpufreq: Fix up several kerneldoc comments cpufreq: stats: Use local_clock() instead of jiffies cpufreq: schedutil: Simplify sugov_update_next_freq() cpufreq: intel_pstate: Simplify intel_cpufreq_update_pstate() PM: domains: create debugfs nodes when adding power domains opp: of: Allow empty opp-table with opp-shared dt-bindings: opp: Allow empty OPP tables media: venus: dev_pm_opp_put_*() accepts NULL argument drm/panfrost: dev_pm_opp_put_*() accepts NULL argument drm/lima: dev_pm_opp_put_*() accepts NULL argument PM / devfreq: exynos: dev_pm_opp_put_*() accepts NULL argument cpufreq: qcom-cpufreq-nvmem: dev_pm_opp_put_*() accepts NULL argument cpufreq: dt: dev_pm_opp_put_regulators() accepts NULL argument opp: Allow dev_pm_opp_put_*() APIs to accept NULL opp_table opp: Don't create an OPP table from dev_pm_opp_get_opp_table() cpufreq: dt: Don't (ab)use dev_pm_opp_get_opp_table() to create OPP table opp: Reduce the size of critical section in _opp_kref_release() PM / EM: Micro optimization in em_cpu_energy cpufreq: arm_scmi: Discover the power scale in performance protocol ...
2020-12-15Merge tag 'thermal-v5.11-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux Pull thermal updates from Daniel Lezcano: - Add upper and lower limits clamps for the cooling device state in the power allocator governor (Michael Kao) - Add upper and lower limits support for the power allocator governor (Lukasz Luba) - Optimize conditions testing for the trip points (Bernard Zhao) - Replace spin_lock_irqsave by spin_lock in hard IRQ on the rcar driver (Tian Tao) - Add MT8516 dt-bindings and device reset optional support (Fabien Parent) - Add a quiescent period to cool down the PCH when entering S0iX (Sumeet Pawnikar) - Use bitmap API instead of re-inventing the wheel on sun8i (Yangtao Li) - Remove useless NULL check in the hwmon driver (Bernard Zhao) - Update the current state in the cpufreq cooling device only if the frequency change is effective (Zhuguangqing) - Improve the schema validation for the rcar DT bindings (Geert Uytterhoeven) - Fix the user time unit in the documentation (Viresh Kumar) - Add PCI ids for Lewisburg PCH (Andres Freund) - Add hwmon support on amlogic (Martin Blumenstingl) - Fix build failure for PCH entering on in S0iX (Randy Dunlap) - Improve the k_* coefficient for the power allocator governor (Lukasz Luba) - Fix missing const on a sysfs attribute (Rikard Falkeborn) - Remove broken interrupt support on rcar to be replaced by a new one (Niklas Söderlund) - Improve the error code handling at init time on imx8mm (Fabio Estevam) - Compute interval validity once instead at each temperature reading iteration on acerhdf (Daniel Lezcano) - Add r8a779a0 support (Niklas Söderlund) - Add PCI ids for AlderLake PCH and mmio refactoring (Srinivas Pandruvada) - Add RFIM and mailbox support on int340x (Srinivas Pandruvada) - Use macro for temperature calculation on PCH (Sumeet Pawnikar) - Simplify return conditions at probe time on Broadcom (Zheng Yongjun) - Fix workload name on PCH (Srinivas Pandruvada) - Migrate the devfreq cooling device code to the energy model API (Lukasz Luba) - Emit a warning if the thermal_zone_device_update is called without the .get_temp() ops (Daniel Lezcano) - Add critical and hot ops for the thermal zone (Daniel Lezcano) - Remove notification usage when critical is reached on rcar (Daniel Lezcano) - Fix devfreq build when ENERGY_MODEL is not set (Lukasz Luba) * tag 'thermal-v5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux: (45 commits) thermal/drivers/devfreq_cooling: Fix the build when !ENERGY_MODEL thermal/drivers/rcar: Remove notification usage thermal/core: Add critical and hot ops thermal/core: Emit a warning if the thermal zone is updated without ops drm/panfrost: Register devfreq cooling and attempt to add Energy Model thermal: devfreq_cooling: remove old power model and use EM thermal: devfreq_cooling: add new registration functions with Energy Model thermal: devfreq_cooling: use a copy of device status thermal: devfreq_cooling: change tracing function and arguments thermal: int340x: processor_thermal: Correct workload type name thermal: broadcom: simplify the return expression of bcm2711_thermal_probe() thermal: intel: pch: use macro for temperature calculation thermal: int340x: processor_thermal: Add mailbox driver thermal: int340x: processor_thermal: Add RFIM driver thermal: int340x: processor_thermal: Add AlderLake PCI device id thermal: int340x: processor_thermal: Refactor MMIO interface thermal: rcar_gen3_thermal: Add r8a779a0 support dt-bindings: thermal: rcar-gen3-thermal: Add r8a779a0 support platform/x86/drivers/acerhdf: Check the interval value when it is set platform/x86/drivers/acerhdf: Use module_param_cb to set/get polling interval ...
2020-12-15Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input updates from Dmitry Torokhov: - support for inhibiting input devices at request from userspace. If a device implements open/close methods, it can also put device into low power state. This is needed, for example, to disable keyboard and touchpad on convertibles when they are transitioned into tablet mode - now that ordinary input devices can be configured for polling mode, dedicated input polling device implementation has been removed - GTCO tablet driver has been removed, as it used problematic custom HID parser, devices are EOL, and there is no interest from the manufacturer - a new driver for Dialog DA7280 haptic chips has been introduced - a new driver for power button on Dell Wyse 3020 - support for eKTF2132 in ektf2127 driver - support for SC2721 and SC2730 in sc27xx-vibra driver - enhancements for Atmel touchscreens, AD7846 touchscreens, Elan touchpads, ADP5589, ST1232 touchscreen, TM2 touchkey drivers - fixes and cleanups to allow clean builds with W=1 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (86 commits) Input: da7280 - fix spelling mistake "sequemce" -> "sequence" Input: cyapa_gen6 - fix out-of-bounds stack access Input: sc27xx - add support for sc2730 and sc2721 dt-bindings: input: Add compatible string for SC2721 and SC2730 dt-bindings: input: Convert sc27xx-vibra.txt to json-schema Input: stmpe - add axis inversion and swapping capability Input: adp5589-keys - do not explicitly control IRQ for wakeup Input: adp5589-keys - do not unconditionally configure as wakeup source Input: ipx4xx-beeper - convert comma to semicolon Input: parkbd - convert comma to semicolon Input: new da7280 haptic driver dt-bindings: input: Add document bindings for DA7280 MAINTAINERS: da7280 updates to the Dialog Semiconductor search terms Input: elantech - fix protocol errors for some trackpoints in SMBus mode Input: elan_i2c - add new trackpoint report type 0x5F Input: elants - document some registers and values Input: atmel_mxt_ts - simplify the return expression of mxt_send_bootloader_cmd() Input: imx_keypad - add COMPILE_TEST support Input: applespi - use new structure for SPI transfer delays Input: synaptics-rmi4 - use new structure for SPI transfer delays ...
2020-12-15Merge tag 'platform-drivers-x86-v5.11-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform driver updates from Hans de Goede: "Highlights: - New driver for changing BIOS settings from within Linux on Dell devices. This introduces a new generic sysfs API for this. Lenovo is working on also supporting this API on their devices - New Intel PMT telemetry and crashlog drivers - Support for SW_TABLET_MODE reporting for the acer-wmi and intel-hid drivers - Preparation work for improving support for Microsoft Surface hardware - Various fixes / improvements / quirks for the panasonic-laptop and others" * tag 'platform-drivers-x86-v5.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (81 commits) platform/x86: ISST: Mark mmio_range_devid_0 and mmio_range_devid_1 with static keyword platform/x86: intel-hid: add Rocket Lake ACPI device ID x86/platform: classmate-laptop: add WiFi media button platform/x86: mlx-platform: Fix item counter assignment for MSN2700/ComEx system platform/x86: mlx-platform: Fix item counter assignment for MSN2700, MSN24xx systems tools/power/x86/intel-speed-select: Update version for v5.11 tools/power/x86/intel-speed-select: Account for missing sysfs for die_id tools/power/x86/intel-speed-select: Read TRL from mailbox platform/x86: intel-hid: Do not create SW_TABLET_MODE input-dev when a KIOX010A ACPI dev is present platform/x86: intel-hid: Add alternative method to enable switches platform/x86: intel-hid: Add support for SW_TABLET_MODE platform/x86: intel-vbtn: Fix SW_TABLET_MODE always reporting 1 on some HP x360 models platform/x86: ISST: Change PCI device macros platform/x86: ISST: Allow configurable offset range platform/x86: ISST: Check for unaligned mmio address acer-wireless: send an EV_SYN/SYN_REPORT between state changes platform/x86: dell-wmi-sysman: work around for BIOS bug platform/x86: mlx-platform: remove an unused variable platform/x86: thinkpad_acpi: remove trailing semicolon in macro definition platform/x86: dell-smbios-base: Fix error return code in dell_smbios_init ...
2020-12-15Merge tag 'mmc-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmcLinus Torvalds
Pull MMC updates from Ulf Hansson: "MMC core: - Initial support for SD express card/host MMC host: - mxc: Convert the driver to DT-only - mtk-sd: Add HS400 enhanced strobe support - mtk-sd: Add support for the MT8192 SoC variant - sdhci-acpi: Allow changing HS200/HS400 driver strength for AMDI0040 - sdhci-esdhc-imx: Convert the driver to DT-only - sdhci-pci-gli: Improve performance for HS400 mode for GL9763E - sdhci-pci-gli: Reduce power consumption for GL9755 - sdhci-xenon: Introduce ACPI support - tmio: Fix command error processing - tmio: Inform the core about the max_busy_timeout - tmio/renesas_sdhi: Support custom calculation of busy-wait time - renesas_sdhi: Reset SCC only when available - rtsx_pci: Add SD Express mode support for RTS5261 - rtsx_pci: Various fixes and improvements for RTS5261 MEMSTICK: - Minor fixes/improvements" * tag 'mmc-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: (72 commits) dt-bindings: mmc: eliminate yamllint warnings mmc: sdhci-xenon: introduce ACPI support mmc: sdhci-xenon: use clk only with DT mmc: sdhci-xenon: switch to device_* API mmc: sdhci-xenon: use match data for controllers variants dt-bindings: mmc: Fix xlnx,mio-bank property values for arasan driver mmc: renesas_sdhi: populate hook for longer busy_wait mmc: tmio: add hook for custom busy_wait calculation mmc: tmio: set max_busy_timeout dt-bindings: mmc: imx: fix the wrongly dropped imx8qm compatible string mmc: sdhci-pci-gli: Disable slow mode in HS400 mode for GL9763E mmc: sdhci: Use more concise device_property_read_u64 memstick: r592: Fix error return in r592_probe() mmc: mxc: Convert the driver to DT-only mmc: mxs: Remove the unused .id_table mmc: sdhci-of-arasan: Fix fall-through warnings for Clang mmc: sdhci-pci-gli: Reduce power consumption for GL9755 mmc: mediatek: depend on COMMON_CLK to fix compile tests mmc: pxamci: Fix error return code in pxamci_probe mmc: sdhci: Update firmware interface API ...
2020-12-15Merge tag 'spi-v5.11' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi updates from Mark Brown: "The big change this release has been some excellent work from Lukas Wunner which closes a bunch of holes in the cleanup paths for drivers, mainly introduced as a result of devm conversions causing bad interactions with the support SPI has for allocating the bus and driver data together. Together with some of the other work done it feels like we've turned the corner on several long standing pain points with the API. Summary: - Many cleanups around probe/remove and error handling from Lukas Wunner and Uwe Kleine-König, and further fixes around PM from Zhang Qilong. - Provide a mask for which bits of the mode can safely be configured by drivers and use that to fix an issue with the ADS7846 driver. - Documentation of the expected interactions between SPI and GPIO level chip select polarity configuration from H. Nikolaus Schaller, hopefully we're pretty much at the end of sorting out the interactions there. Thanks to Nikolaus, Sven Van Asbroeck and Linus Walleij for this. - DMA support for Allwinner sun6i controllers. - Support for Canaan K210 Designware implementations and Intel Adler Lake" * tag 'spi-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: (69 commits) spi: dt-bindings: clarify CS behavior for spi-cs-high and gpio descriptors spi: Limit the spi device max speed to controller's max speed spi: spi-geni-qcom: Use the new method of gpio CS control platform/chrome: cros_ec_spi: Drop bits_per_word assignment platform/chrome: cros_ec_spi: Don't overwrite spi::mode spi: dw: Add support for the Canaan K210 SoC SPI spi: dw: Add support for 32-bits max xfer size dt-bindings: spi: dw-apb-ssi: Add Canaan K210 SPI controller spi: Update DT binding docs to support SiFive FU740 SoC spi: atmel-quadspi: Fix use-after-free on unbind spi: npcm-fiu: Disable clock in probe error path spi: ar934x: Don't leak SPI master in probe error path spi: mt7621: Don't leak SPI master in probe error path spi: mt7621: Disable clock in probe error path media: netup_unidvb: Don't leak SPI master in probe error path spi: sc18is602: Don't leak SPI master in probe error path spi: rb4xx: Don't leak SPI master in probe error path spi: gpio: Don't leak SPI master in probe error path spi: spi-mtk-nor: Don't leak SPI master in probe error path spi: mxic: Don't leak SPI master in probe error path ...
2020-12-15Merge tag 'regulator-v5.11' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator updates from Mark Brown: "This has been a quiet release for the regulator API, a few new drivers and the usual fixes and cleanup traffic but not much else going on: - Optimisations for the handling of voltage enumeration, especially with sparse selector sets, from Claudiu Beznea. - Support for several ARM SCMI regulators, Dialog DA9121, NXP PF8x00, Qualcomm PMX55, PM8350 and PM8350c The addition of the SCMI regulator driver (which controls regulators via system firmware) means that we've pulled in the support for the underlying firmware operations from the firmware tree" * tag 'regulator-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: (53 commits) regulator: mc13892-regulator: convert comma to semicolon regulator: pfuze100: Convert the driver to DT-only regulator: max14577: Add proper module aliases strings regulator: da9121: Potential Oops in da9121_assign_chip_model() regulator: da9121: Fix index used for DT property regulator: da9121: Remove uninitialised string variable regulator: axp20x: Fix DLDO2 voltage control register mask for AXP22x regulator: qcom-rpmh: Add support for PM8350/PM8350c regulator: dt-bindings: Add PM8350x compatibles regulator: da9121: include linux/gpio/consumer.h regulator: da9121: Mark some symbols with static keyword regulator: da9121: Request IRQ directly and free in release function to avoid masking race regulator: da9121: add interrupt support regulator: da9121: add mode support regulator: da9121: add current support regulator: da9121: Update registration to support multiple buck variants regulator: da9121: Add support for device variants via devicetree regulator: da9121: Add device variant descriptors regulator: da9121: Add device variant regmaps regulator: da9121: Add device variants ...
2020-12-15Merge tag 'regmap-v5.11' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap Pull regmap updates from Mark Brown: "This is quite a busy release for regmap with two substantial features being added: - Support for register maps Soundwire 1.2 multi-byte operations, allowing atomic support for registers larger than a single byte. - Support for relaxed I/O without barriers in MMIO regmaps, allowing them to be used efficiently on systems where default MMIO operations include barriers. There was also an addition and revert of use of the new Soundwire support for RT715 due to build issues with the driver built in, my tests only covered building it as a module, the patch wasn't just dropped as it had already been merged elsewhere" * tag 'regmap-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap: ASoC: rt715: Fix build regmap: sdw: add required header files regmap: Remove duplicate `type` field from regmap `regcache_sync` trace event regmap: Fix order of regmap write log regmap: mmio: add config option to allow relaxed MMIO accesses
2020-12-15Merge tag 'irq-core-2020-12-15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: "Generic interrupt and irqchips subsystem updates. Unusually, there is not a single completely new irq chip driver, just new DT bindings and extensions of existing drivers to accomodate new variants! Core: - Consolidation and robustness changes for irq time accounting - Cleanup and consolidation of irq stats - Remove the fasteoi IPI flow which has been proved useless - Provide an interface for converting legacy interrupt mechanism into irqdomains Drivers: - Preliminary support for managed interrupts on platform devices - Correctly identify allocation of MSIs proxyied by another device - Generalise the Ocelot support to new SoCs - Improve GICv4.1 vcpu entry, matching the corresponding KVM optimisation - Work around spurious interrupts on Qualcomm PDC - Random fixes and cleanups" * tag 'irq-core-2020-12-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits) irqchip/qcom-pdc: Fix phantom irq when changing between rising/falling driver core: platform: Add devm_platform_get_irqs_affinity() ACPI: Drop acpi_dev_irqresource_disabled() resource: Add irqresource_disabled() genirq/affinity: Add irq_update_affinity_desc() irqchip/gic-v3-its: Flag device allocation as proxied if behind a PCI bridge irqchip/gic-v3-its: Tag ITS device as shared if allocating for a proxy device platform-msi: Track shared domain allocation irqchip/ti-sci-intr: Fix freeing of irqs irqchip/ti-sci-inta: Fix printing of inta id on probe success drivers/irqchip: Remove EZChip NPS interrupt controller Revert "genirq: Add fasteoi IPI flow" irqchip/hip04: Make IPIs use handle_percpu_devid_irq() irqchip/bcm2836: Make IPIs use handle_percpu_devid_irq() irqchip/armada-370-xp: Make IPIs use handle_percpu_devid_irq() irqchip/gic, gic-v3: Make SGIs use handle_percpu_devid_irq() irqchip/ocelot: Add support for Jaguar2 platforms irqchip/ocelot: Add support for Serval platforms irqchip/ocelot: Add support for Luton platforms irqchip/ocelot: prepare to support more SoC ...
2020-12-15Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge more updates from Andrew Morton: "More MM work: a memcg scalability improvememt" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm/lru: revise the comments of lru_lock mm/lru: introduce relock_page_lruvec() mm/lru: replace pgdat lru_lock with lruvec lock mm/swap.c: serialize memcg changes in pagevec_lru_move_fn mm/compaction: do page isolation first in compaction mm/lru: introduce TestClearPageLRU() mm/mlock: remove __munlock_isolate_lru_page() mm/mlock: remove lru_lock on TestClearPageMlocked mm/vmscan: remove lruvec reget in move_pages_to_lru mm/lru: move lock into lru_note_cost mm/swap.c: fold vm event PGROTATED into pagevec_move_tail_fn mm/memcg: add debug checking in lock_page_memcg mm: page_idle_get_page() does not need lru_lock mm/rmap: stop store reordering issue on page->mapping mm/vmscan: remove unnecessary lruvec adding mm/thp: narrow lru locking mm/thp: simplify lru_add_page_tail() mm/thp: use head for head page in lru_add_page_tail() mm/thp: move lru_add_page_tail() to huge_memory.c
2020-12-15mm/lru: revise the comments of lru_lockHugh Dickins
Since we changed the pgdat->lru_lock to lruvec->lru_lock, it's time to fix the incorrect comments in code. Also fixed some zone->lru_lock comment error from ancient time. etc. I struggled to understand the comment above move_pages_to_lru() (surely it never calls page_referenced()), and eventually realized that most of it had got separated from shrink_active_list(): move that comment back. Link: https://lkml.kernel.org/r/1604566549-62481-20-git-send-email-alex.shi@linux.alibaba.com Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Tejun Heo <tj@kernel.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Jann Horn <jannh@google.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Matthew Wilcox <willy@infradead.org> Cc: Alexander Duyck <alexander.duyck@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: "Chen, Rong A" <rong.a.chen@intel.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mika Penttilä <mika.penttila@nextfour.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15mm/lru: introduce relock_page_lruvec()Alexander Duyck
Add relock_page_lruvec() to replace repeated same code, no functional change. When testing for relock we can avoid the need for RCU locking if we simply compare the page pgdat and memcg pointers versus those that the lruvec is holding. By doing this we can avoid the extra pointer walks and accesses of the memory cgroup. In addition we can avoid the checks entirely if lruvec is currently NULL. [alex.shi@linux.alibaba.com: use page_memcg()] Link: https://lkml.kernel.org/r/66d8e79d-7ec6-bfbc-1c82-bf32db3ae5b7@linux.alibaba.com Link: https://lkml.kernel.org/r/1604566549-62481-19-git-send-email-alex.shi@linux.alibaba.com Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com> Acked-by: Hugh Dickins <hughd@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Tejun Heo <tj@kernel.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: "Chen, Rong A" <rong.a.chen@intel.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Jann Horn <jannh@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mika Penttilä <mika.penttila@nextfour.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15mm/lru: replace pgdat lru_lock with lruvec lockAlex Shi
This patch moves per node lru_lock into lruvec, thus bring a lru_lock for each of memcg per node. So on a large machine, each of memcg don't have to suffer from per node pgdat->lru_lock competition. They could go fast with their self lru_lock. After move memcg charge before lru inserting, page isolation could serialize page's memcg, then per memcg lruvec lock is stable and could replace per node lru lock. In isolate_migratepages_block(), compact_unlock_should_abort and lock_page_lruvec_irqsave are open coded to work with compact_control. Also add a debug func in locking which may give some clues if there are sth out of hands. Daniel Jordan's testing show 62% improvement on modified readtwice case on his 2P * 10 core * 2 HT broadwell box. https://lore.kernel.org/lkml/20200915165807.kpp7uhiw7l3loofu@ca-dmjordan1.us.oracle.com/ Hugh Dickins helped on the patch polish, thanks! [alex.shi@linux.alibaba.com: fix comment typo] Link: https://lkml.kernel.org/r/5b085715-292a-4b43-50b3-d73dc90d1de5@linux.alibaba.com [alex.shi@linux.alibaba.com: use page_memcg()] Link: https://lkml.kernel.org/r/5a4c2b72-7ee8-2478-fc0e-85eb83aafec4@linux.alibaba.com Link: https://lkml.kernel.org/r/1604566549-62481-18-git-send-email-alex.shi@linux.alibaba.com Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com> Acked-by: Hugh Dickins <hughd@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Rong Chen <rong.a.chen@intel.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Alexander Duyck <alexander.duyck@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Jann Horn <jannh@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Mika Penttilä <mika.penttila@nextfour.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>