summaryrefslogtreecommitdiffstats
path: root/kernel
AgeCommit message (Collapse)Author
2017-09-08mm/ZONE_DEVICE: new type of ZONE_DEVICE for unaddressable memoryJérôme Glisse
HMM (heterogeneous memory management) need struct page to support migration from system main memory to device memory. Reasons for HMM and migration to device memory is explained with HMM core patch. This patch deals with device memory that is un-addressable memory (ie CPU can not access it). Hence we do not want those struct page to be manage like regular memory. That is why we extend ZONE_DEVICE to support different types of memory. A persistent memory type is define for existing user of ZONE_DEVICE and a new device un-addressable type is added for the un-addressable memory type. There is a clear separation between what is expected from each memory type and existing user of ZONE_DEVICE are un-affected by new requirement and new use of the un-addressable type. All specific code path are protect with test against the memory type. Because memory is un-addressable we use a new special swap type for when a page is migrated to device memory (this reduces the number of maximum swap file). The main two additions beside memory type to ZONE_DEVICE is two callbacks. First one, page_free() is call whenever page refcount reach 1 (which means the page is free as ZONE_DEVICE page never reach a refcount of 0). This allow device driver to manage its memory and associated struct page. The second callback page_fault() happens when there is a CPU access to an address that is back by a device page (which are un-addressable by the CPU). This callback is responsible to migrate the page back to system main memory. Device driver can not block migration back to system memory, HMM make sure that such page can not be pin into device memory. If device is in some error condition and can not migrate memory back then a CPU page fault to device memory should end with SIGBUS. [arnd@arndb.de: fix warning] Link: http://lkml.kernel.org/r/20170823133213.712917-1-arnd@arndb.de Link: http://lkml.kernel.org/r/20170817000548.32038-8-jglisse@redhat.com Signed-off-by: Jérôme Glisse <jglisse@redhat.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Dan Williams <dan.j.williams@intel.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Nellans <dnellans@nvidia.com> Cc: Evgeny Baskakov <ebaskakov@nvidia.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Mark Hairgrove <mhairgrove@nvidia.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Sherry Cheung <SCheung@nvidia.com> Cc: Subhash Gutti <sgutti@nvidia.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Bob Liu <liubo95@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-08mm/hmm: heterogeneous memory management (HMM for short)Jérôme Glisse
HMM provides 3 separate types of functionality: - Mirroring: synchronize CPU page table and device page table - Device memory: allocating struct page for device memory - Migration: migrating regular memory to device memory This patch introduces some common helpers and definitions to all of those 3 functionality. Link: http://lkml.kernel.org/r/20170817000548.32038-3-jglisse@redhat.com Signed-off-by: Jérôme Glisse <jglisse@redhat.com> Signed-off-by: Evgeny Baskakov <ebaskakov@nvidia.com> Signed-off-by: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Mark Hairgrove <mhairgrove@nvidia.com> Signed-off-by: Sherry Cheung <SCheung@nvidia.com> Signed-off-by: Subhash Gutti <sgutti@nvidia.com> Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Nellans <dnellans@nvidia.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Bob Liu <liubo95@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-07Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk Pull printk updates from Petr Mladek: - Do not allow use of freed init data and code even when boot consoles are forced to stay. Also check for the init memory more precisely. - Some code clean up by starting contributors. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk: printk: Clean up do_syslog() error handling printk/console: Enhance the check for consoles using init memory printk/console: Always disable boot consoles that use init memory before it is freed printk: Modify operators of printed_len and text_len
2017-09-07Merge tag 'audit-pr-20170907' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit Pull audit updates from Paul Moore: "A small pull request for audit this time, only four patches and only two with any real code changes. Those two changes are the removal of a pointless SELinux AVC initialization audit event and a fix to improve the audit timestamp overhead. The other two patches are comment cleanup and administrative updates, nothing very exciting. Everything passes our tests" * tag 'audit-pr-20170907' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit: audit: update the function comments selinux: remove AVC init audit log message audit: update the audit info in MAINTAINERS audit: Reduce overhead using a coarse clock
2017-09-07Merge tag 'pstore-v4.14-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull pstore update from Kees Cook: "Make pstore permissions more versatile by removing CAP_SYSLOG requirement and defining more restrictive root directory DAC permissions default (0750, which can be adjust after boot unlike the CAP_SYSLOG check). Suggested by Nick Kralevich" * tag 'pstore-v4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: Revert "pstore: Honor dmesg_restrict sysctl on dmesg dumps" pstore: Make default pstorefs root dir perms 0750
2017-09-07Merge branch 'for-4.14/block' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block layer updates from Jens Axboe: "This is the first pull request for 4.14, containing most of the code changes. It's a quiet series this round, which I think we needed after the churn of the last few series. This contains: - Fix for a registration race in loop, from Anton Volkov. - Overflow complaint fix from Arnd for DAC960. - Series of drbd changes from the usual suspects. - Conversion of the stec/skd driver to blk-mq. From Bart. - A few BFQ improvements/fixes from Paolo. - CFQ improvement from Ritesh, allowing idling for group idle. - A few fixes found by Dan's smatch, courtesy of Dan. - A warning fixup for a race between changing the IO scheduler and device remova. From David Jeffery. - A few nbd fixes from Josef. - Support for cgroup info in blktrace, from Shaohua. - Also from Shaohua, new features in the null_blk driver to allow it to actually hold data, among other things. - Various corner cases and error handling fixes from Weiping Zhang. - Improvements to the IO stats tracking for blk-mq from me. Can drastically improve performance for fast devices and/or big machines. - Series from Christoph removing bi_bdev as being needed for IO submission, in preparation for nvme multipathing code. - Series from Bart, including various cleanups and fixes for switch fall through case complaints" * 'for-4.14/block' of git://git.kernel.dk/linux-block: (162 commits) kernfs: checking for IS_ERR() instead of NULL drbd: remove BIOSET_NEED_RESCUER flag from drbd_{md_,}io_bio_set drbd: Fix allyesconfig build, fix recent commit drbd: switch from kmalloc() to kmalloc_array() drbd: abort drbd_start_resync if there is no connection drbd: move global variables to drbd namespace and make some static drbd: rename "usermode_helper" to "drbd_usermode_helper" drbd: fix race between handshake and admin disconnect/down drbd: fix potential deadlock when trying to detach during handshake drbd: A single dot should be put into a sequence. drbd: fix rmmod cleanup, remove _all_ debugfs entries drbd: Use setup_timer() instead of init_timer() to simplify the code. drbd: fix potential get_ldev/put_ldev refcount imbalance during attach drbd: new disk-option disable-write-same drbd: Fix resource role for newly created resources in events2 drbd: mark symbols static where possible drbd: Send P_NEG_ACK upon write error in protocol != C drbd: add explicit plugging when submitting batches drbd: change list_for_each_safe to while(list_first_entry_or_null) drbd: introduce drbd_recv_header_maybe_unplug ...
2017-09-06Merge branch 'for-4.14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: "Several notable changes this cycle: - Thread mode was merged. This will be used for cgroup2 support for CPU and possibly other controllers. Unfortunately, CPU controller cgroup2 support didn't make this pull request but most contentions have been resolved and the support is likely to be merged before the next merge window. - cgroup.stat now shows the number of descendant cgroups. - cpuset now can enable the easier-to-configure v2 behavior on v1 hierarchy" * 'for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (21 commits) cpuset: Allow v2 behavior in v1 cgroup cgroup: Add mount flag to enable cpuset to use v2 behavior in v1 cgroup cgroup: remove unneeded checks cgroup: misc changes cgroup: short-circuit cset_cgroup_from_root() on the default hierarchy cgroup: re-use the parent pointer in cgroup_destroy_locked() cgroup: add cgroup.stat interface with basic hierarchy stats cgroup: implement hierarchy limits cgroup: keep track of number of descent cgroups cgroup: add comment to cgroup_enable_threaded() cgroup: remove unnecessary empty check when enabling threaded mode cgroup: update debug controller to print out thread mode information cgroup: implement cgroup v2 thread support cgroup: implement CSS_TASK_ITER_THREADED cgroup: introduce cgroup->dom_cgrp and threaded css_set handling cgroup: add @flags to css_task_iter_start() and implement CSS_TASK_ITER_PROCS cgroup: reorganize cgroup.procs / task write path cgroup: replace css_set walking populated test with testing cgrp->nr_populated_csets cgroup: distinguish local and children populated states cgroup: remove now unused list_head @pending in cgroup_apply_cftypes() ...
2017-09-06Merge branch 'for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wqLinus Torvalds
Pull workqueue updates from Tejun Heo: "Nothing major. I introduced a flag collsion bug during v4.13 cycle which is fixed in this pull request. Fortunately, the flag is for debugging / verification and the bug isn't critical" * 'for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: Fix flag collision workqueue: Use TASK_IDLE workqueue: fix path to documentation workqueue: doc change for ST behavior on NUMA systems
2017-09-06Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge updates from Andrew Morton: - various misc bits - DAX updates - OCFS2 - most of MM * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (119 commits) mm,fork: introduce MADV_WIPEONFORK x86,mpx: make mpx depend on x86-64 to free up VMA flag mm: add /proc/pid/smaps_rollup mm: hugetlb: clear target sub-page last when clearing huge page mm: oom: let oom_reap_task and exit_mmap run concurrently swap: choose swap device according to numa node mm: replace TIF_MEMDIE checks by tsk_is_oom_victim mm, oom: do not rely on TIF_MEMDIE for memory reserves access z3fold: use per-cpu unbuddied lists mm, swap: don't use VMA based swap readahead if HDD is used as swap mm, swap: add sysfs interface for VMA based swap readahead mm, swap: VMA based swap readahead mm, swap: fix swap readahead marking mm, swap: add swap readahead hit statistics mm/vmalloc.c: don't reinvent the wheel but use existing llist API mm/vmstat.c: fix wrong comment selftests/memfd: add memfd_create hugetlbfs selftest mm/shmem: add hugetlbfs support to memfd_create() mm, devm_memremap_pages: use multi-order radix for ZONE_DEVICE lookups mm/vmalloc.c: halve the number of comparisons performed in pcpu_get_vm_areas() ...
2017-09-06mm,fork: introduce MADV_WIPEONFORKRik van Riel
Introduce MADV_WIPEONFORK semantics, which result in a VMA being empty in the child process after fork. This differs from MADV_DONTFORK in one important way. If a child process accesses memory that was MADV_WIPEONFORK, it will get zeroes. The address ranges are still valid, they are just empty. If a child process accesses memory that was MADV_DONTFORK, it will get a segmentation fault, since those address ranges are no longer valid in the child after fork. Since MADV_DONTFORK also seems to be used to allow very large programs to fork in systems with strict memory overcommit restrictions, changing the semantics of MADV_DONTFORK might break existing programs. MADV_WIPEONFORK only works on private, anonymous VMAs. The use case is libraries that store or cache information, and want to know that they need to regenerate it in the child process after fork. Examples of this would be: - systemd/pulseaudio API checks (fail after fork) (replacing a getpid check, which is too slow without a PID cache) - PKCS#11 API reinitialization check (mandated by specification) - glibc's upcoming PRNG (reseed after fork) - OpenSSL PRNG (reseed after fork) The security benefits of a forking server having a re-inialized PRNG in every child process are pretty obvious. However, due to libraries having all kinds of internal state, and programs getting compiled with many different versions of each library, it is unreasonable to expect calling programs to re-initialize everything manually after fork. A further complication is the proliferation of clone flags, programs bypassing glibc's functions to call clone directly, and programs calling unshare, causing the glibc pthread_atfork hook to not get called. It would be better to have the kernel take care of this automatically. The patch also adds MADV_KEEPONFORK, to undo the effects of a prior MADV_WIPEONFORK. This is similar to the OpenBSD minherit syscall with MAP_INHERIT_ZERO: https://man.openbsd.org/minherit.2 [akpm@linux-foundation.org: numerically order arch/parisc/include/uapi/asm/mman.h #defines] Link: http://lkml.kernel.org/r/20170811212829.29186-3-riel@redhat.com Signed-off-by: Rik van Riel <riel@redhat.com> Reported-by: Florian Weimer <fweimer@redhat.com> Reported-by: Colm MacCártaigh <colm@allcosts.net> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Helge Deller <deller@gmx.de> Cc: Kees Cook <keescook@chromium.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Drewry <wad@chromium.org> Cc: <linux-api@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06mm: oom: let oom_reap_task and exit_mmap run concurrentlyAndrea Arcangeli
This is purely required because exit_aio() may block and exit_mmap() may never start, if the oom_reap_task cannot start running on a mm with mm_users == 0. At the same time if the OOM reaper doesn't wait at all for the memory of the current OOM candidate to be freed by exit_mmap->unmap_vmas, it would generate a spurious OOM kill. If it wasn't because of the exit_aio or similar blocking functions in the last mmput, it would be enough to change the oom_reap_task() in the case it finds mm_users == 0, to wait for a timeout or to wait for __mmput to set MMF_OOM_SKIP itself, but it's not just exit_mmap the problem here so the concurrency of exit_mmap and oom_reap_task is apparently warranted. It's a non standard runtime, exit_mmap() runs without mmap_sem, and oom_reap_task runs with the mmap_sem for reading as usual (kind of MADV_DONTNEED). The race between the two is solved with a combination of tsk_is_oom_victim() (serialized by task_lock) and MMF_OOM_SKIP (serialized by a dummy down_write/up_write cycle on the same lines of the ksm_exit method). If the oom_reap_task() may be running concurrently during exit_mmap, exit_mmap will wait it to finish in down_write (before taking down mm structures that would make the oom_reap_task fail with use after free). If exit_mmap comes first, oom_reap_task() will skip the mm if MMF_OOM_SKIP is already set and in turn all memory is already freed and furthermore the mm data structures may already have been taken down by free_pgtables. [aarcange@redhat.com: incremental one liner] Link: http://lkml.kernel.org/r/20170726164319.GC29716@redhat.com [rientjes@google.com: remove unused mmput_async] Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1708141733130.50317@chino.kir.corp.google.com [aarcange@redhat.com: microoptimization] Link: http://lkml.kernel.org/r/20170817171240.GB5066@redhat.com Link: http://lkml.kernel.org/r/20170726162912.GA29716@redhat.com Fixes: 26db62f179d1 ("oom: keep mm of the killed task available") Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: David Rientjes <rientjes@google.com> Reported-by: David Rientjes <rientjes@google.com> Tested-by: David Rientjes <rientjes@google.com> Reviewed-by: Michal Hocko <mhocko@suse.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06mm: replace TIF_MEMDIE checks by tsk_is_oom_victimMichal Hocko
TIF_MEMDIE is set only to the tasks whick were either directly selected by the OOM killer or passed through mark_oom_victim from the allocator path. tsk_is_oom_victim is more generic and allows to identify all tasks (threads) which share the mm with the oom victim. Please note that the freezer still needs to check TIF_MEMDIE because we cannot thaw tasks which do not participage in oom_victims counting otherwise a !TIF_MEMDIE task could interfere after oom_disbale returns. Link: http://lkml.kernel.org/r/20170810075019.28998-3-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: David Rientjes <rientjes@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Roman Gushchin <guro@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06mm, devm_memremap_pages: use multi-order radix for ZONE_DEVICE lookupsDan Williams
devm_memremap_pages() records mapped ranges in pgmap_radix with an entry per section's worth of memory (128MB). The key for each of those entries is a section number. This leads to false positives when devm_memremap_pages() is passed a section-unaligned range as lookups in the misalignment fail to return NULL. We can close this hole by using the pfn as the key for entries in the tree. The number of entries required to describe a remapped range is reduced by leveraging multi-order entries. In practice this approach usually yields just one entry in the tree if the size and starting address are of the same power-of-2 alignment. Previously we always needed nr_entries = mapping_size / 128MB. Link: https://lists.01.org/pipermail/linux-nvdimm/2016-August/006666.html Link: http://lkml.kernel.org/r/150215410565.39310.13767886055248249438.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reported-by: Toshi Kani <toshi.kani@hpe.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06cgroup: revert fa06235b8eb0 ("cgroup: reset css on destruction")Roman Gushchin
Commit fa06235b8eb0 ("cgroup: reset css on destruction") caused css_reset callback to be called from the offlining path. Although it solves the problem mentioned in the commit description ("For instance, memory cgroup needs to reset memory.low, otherwise pages charged to a dead cgroup might never get reclaimed."), generally speaking, it's not correct. An offline cgroup can still be a resource domain, and we shouldn't grant it more resources than it had before deletion. For instance, if an offline memory cgroup has dirty pages, we should still imply i/o limits during writeback. The css_reset callback is designed to return the cgroup state into the original state, that means reset all limits and counters. It's spomething different from the offlining, and we shouldn't use it from the offlining path. Instead, we should adjust necessary settings from the per-controller css_offline callbacks (e.g. reset memory.low). Link: http://lkml.kernel.org/r/20170727130428.28856-2-guro@fb.com Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Miller: 1) Support ipv6 checksum offload in sunvnet driver, from Shannon Nelson. 2) Move to RB-tree instead of custom AVL code in inetpeer, from Eric Dumazet. 3) Allow generic XDP to work on virtual devices, from John Fastabend. 4) Add bpf device maps and XDP_REDIRECT, which can be used to build arbitrary switching frameworks using XDP. From John Fastabend. 5) Remove UFO offloads from the tree, gave us little other than bugs. 6) Remove the IPSEC flow cache, from Florian Westphal. 7) Support ipv6 route offload in mlxsw driver. 8) Support VF representors in bnxt_en, from Sathya Perla. 9) Add support for forward error correction modes to ethtool, from Vidya Sagar Ravipati. 10) Add time filter for packet scheduler action dumping, from Jamal Hadi Salim. 11) Extend the zerocopy sendmsg() used by virtio and tap to regular sockets via MSG_ZEROCOPY. From Willem de Bruijn. 12) Significantly rework value tracking in the BPF verifier, from Edward Cree. 13) Add new jump instructions to eBPF, from Daniel Borkmann. 14) Rework rtnetlink plumbing so that operations can be run without taking the RTNL semaphore. From Florian Westphal. 15) Support XDP in tap driver, from Jason Wang. 16) Add 32-bit eBPF JIT for ARM, from Shubham Bansal. 17) Add Huawei hinic ethernet driver. 18) Allow to report MD5 keys in TCP inet_diag dumps, from Ivan Delalande. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1780 commits) i40e: point wb_desc at the nvm_wb_desc during i40e_read_nvm_aq i40e: avoid NVM acquire deadlock during NVM update drivers: net: xgene: Remove return statement from void function drivers: net: xgene: Configure tx/rx delay for ACPI drivers: net: xgene: Read tx/rx delay for ACPI rocker: fix kcalloc parameter order rds: Fix non-atomic operation on shared flag variable net: sched: don't use GFP_KERNEL under spin lock vhost_net: correctly check tx avail during rx busy polling net: mdio-mux: add mdio_mux parameter to mdio_mux_init() rxrpc: Make service connection lookup always check for retry net: stmmac: Delete dead code for MDIO registration gianfar: Fix Tx flow control deactivation cxgb4: Ignore MPS_TX_INT_CAUSE[Bubble] for T6 cxgb4: Fix pause frame count in t4_get_port_stats cxgb4: fix memory leak tun: rename generic_xdp to skb_xdp tun: reserve extra headroom only when XDP is set net: dsa: bcm_sf2: Configure IMP port TC2QOS mapping net: dsa: bcm_sf2: Advertise number of egress queues ...
2017-09-05Merge tag 'devprop-4.14-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull device properties framework updates from Rafael Wysocki: "These introduce fwnode operations for all of the separate types of 'firmware nodes' that can be handled by the device properties framework, make the framework use const fwnode arguments all over, add a helper for the consolidated handling of node references and switch over the framework to the new UUID API. Specifics: - Introduce fwnode operations for all of the separate types of 'firmware nodes' that can be handled by the device properties framework and drop the type field from struct fwnode_handle (Sakari Ailus, Arnd Bergmann). - Make the device properties framework use const fwnode arguments where possible (Sakari Ailus). - Add a helper for the consolidated handling of node references to the device properties framework (Sakari Ailus). - Switch over the ACPI part of the device properties framework to the new UUID API (Andy Shevchenko)" * tag 'devprop-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: device property: Switch to use new generic UUID API device property: export irqchip_fwnode_ops device property: Introduce fwnode_property_get_reference_args device property: Constify fwnode property API device property: Constify argument to pset fwnode backend ACPI: Constify internal fwnode arguments ACPI: Constify acpi_bus helper functions, switch to macros ACPI: Prepare for constifying acpi_get_next_subnode() fwnode argument device property: Get rid of struct fwnode_handle type field ACPI: Use IS_ERR_OR_NULL() instead of non-NULL check in is_acpi_data_node()
2017-09-05Merge tag 'pm-4.14-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "This time (again) cpufreq gets the majority of changes which mostly are driver updates (including a major consolidation of intel_pstate), some schedutil governor modifications and core cleanups. There also are some changes in the system suspend area, mostly related to diagnostics and debug messages plus some renames of things related to suspend-to-idle. One major change here is that suspend-to-idle is now going to be preferred over S3 on systems where the ACPI tables indicate to do so and provide requsite support (the Low Power Idle S0 _DSM in particular). The system sleep documentation and the tools related to it are updated too. The rest is a few cpuidle changes (nothing major), devfreq updates, generic power domains (genpd) framework updates and a few assorted modifications elsewhere. Specifics: - Drop the P-state selection algorithm based on a PID controller from intel_pstate and make it use the same P-state selection method (based on the CPU load) for all types of systems in the active mode (Rafael Wysocki, Srinivas Pandruvada). - Rework the cpufreq core and governors to make it possible to take cross-CPU utilization updates into account and modify the schedutil governor to actually do so (Viresh Kumar). - Clean up the handling of transition latency information in the cpufreq core and untangle it from the information on which drivers cannot do dynamic frequency switching (Viresh Kumar). - Add support for new SoCs (MT2701/MT7623 and MT7622) to the mediatek cpufreq driver and update its DT bindings (Sean Wang). - Modify the cpufreq dt-platdev driver to autimatically create cpufreq devices for the new (v2) Operating Performance Points (OPP) DT bindings and update its whitelist of supported systems (Viresh Kumar, Shubhrajyoti Datta, Marc Gonzalez, Khiem Nguyen, Finley Xiao). - Add support for Ux500 to the cpufreq-dt driver and drop the obsolete dbx500 cpufreq driver (Linus Walleij, Arnd Bergmann). - Add new SoC (R8A7795) support to the cpufreq rcar driver (Khiem Nguyen). - Fix and clean up assorted issues in the cpufreq drivers and core (Arvind Yadav, Christophe Jaillet, Colin Ian King, Gustavo Silva, Julia Lawall, Leonard Crestez, Rob Herring, Sudeep Holla). - Update the IO-wait boost handling in the schedutil governor to make it less aggressive (Joel Fernandes). - Rework system suspend diagnostics to make it print fewer messages to the kernel log by default, add a sysfs knob to allow more suspend-related messages to be printed and add Low Power S0 Idle constraints checks to the ACPI suspend-to-idle code (Rafael Wysocki, Srinivas Pandruvada). - Prefer suspend-to-idle over S3 on ACPI-based systems with the ACPI_FADT_LOW_POWER_S0 flag set and the Low Power Idle S0 _DSM interface present in the ACPI tables (Rafael Wysocki). - Update documentation related to system sleep and rename a number of items in the code to make it cleare that they are related to suspend-to-idle (Rafael Wysocki). - Export a variable allowing device drivers to check the target system sleep state from the core system suspend code (Florian Fainelli). - Clean up the cpuidle subsystem to handle the polling state on x86 in a more straightforward way and to use %pOF instead of full_name (Rafael Wysocki, Rob Herring). - Update the devfreq framework to fix and clean up a few minor issues (Chanwoo Choi, Rob Herring). - Extend diagnostics in the generic power domains (genpd) framework and clean it up slightly (Thara Gopinath, Rob Herring). - Fix and clean up a couple of issues in the operating performance points (OPP) framework (Viresh Kumar, Waldemar Rymarkiewicz). - Add support for RV1108 to the rockchip-io Adaptive Voltage Scaling (AVS) driver (David Wu). - Fix the usage of notifiers in CPU power management on some platforms (Alex Shi). - Update the pm-graph system suspend/hibernation and boot profiling utility (Todd Brandt). - Make it possible to run the cpupower utility without CPU0 (Prarit Bhargava)" * tag 'pm-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (87 commits) cpuidle: Make drivers initialize polling state cpuidle: Move polling state initialization code to separate file cpuidle: Eliminate the CPUIDLE_DRIVER_STATE_START symbol cpufreq: imx6q: Fix imx6sx low frequency support cpufreq: speedstep-lib: make several arrays static, makes code smaller PM: docs: Delete the obsolete states.txt document PM: docs: Describe high-level PM strategies and sleep states PM / devfreq: Fix memory leak when fail to register device PM / devfreq: Add dependency on PM_OPP PM / devfreq: Move private devfreq_update_stats() into devfreq PM / devfreq: Convert to using %pOF instead of full_name PM / AVS: rockchip-io: add io selectors and supplies for RV1108 cpufreq: ti: Fix 'of_node_put' being called twice in error handling path cpufreq: dt-platdev: Drop few entries from whitelist cpufreq: dt-platdev: Automatically create cpufreq device with OPP v2 ARM: ux500: don't select CPUFREQ_DT cpuidle: Convert to using %pOF instead of full_name cpufreq: Convert to using %pOF instead of full_name PM / Domains: Convert to using %pOF instead of full_name cpufreq: Cap the default transition delay value to 10 ms ...
2017-09-05Merge tag 'char-misc-4.14-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver updates from Greg KH: "Here is the big char/misc driver update for 4.14-rc1. Lots of different stuff in here, it's been an active development cycle for some reason. Highlights are: - updated binder driver, this brings binder up to date with what shipped in the Android O release, plus some more changes that happened since then that are in the Android development trees. - coresight updates and fixes - mux driver file renames to be a bit "nicer" - intel_th driver updates - normal set of hyper-v updates and changes - small fpga subsystem and driver updates - lots of const code changes all over the driver trees - extcon driver updates - fmc driver subsystem upadates - w1 subsystem minor reworks and new features and drivers added - spmi driver updates Plus a smattering of other minor driver updates and fixes. All of these have been in linux-next with no reported issues for a while" * tag 'char-misc-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (244 commits) ANDROID: binder: don't queue async transactions to thread. ANDROID: binder: don't enqueue death notifications to thread todo. ANDROID: binder: Don't BUG_ON(!spin_is_locked()). ANDROID: binder: Add BINDER_GET_NODE_DEBUG_INFO ioctl ANDROID: binder: push new transactions to waiting threads. ANDROID: binder: remove proc waitqueue android: binder: Add page usage in binder stats android: binder: fixup crash introduced by moving buffer hdr drivers: w1: add hwmon temp support for w1_therm drivers: w1: refactor w1_slave_show to make the temp reading functionality separate drivers: w1: add hwmon support structures eeprom: idt_89hpesx: Support both ACPI and OF probing mcb: Fix an error handling path in 'chameleon_parse_cells()' MCB: add support for SC31 to mcb-lpc mux: make device_type const char: virtio: constify attribute_group structures. Documentation/ABI: document the nvmem sysfs files lkdtm: fix spelling mistake: "incremeted" -> "incremented" perf: cs-etm: Fix ETMv4 CONFIGR entry in perf.data file nvmem: include linux/err.h from header ...
2017-09-05Merge tag 'arm64-upstream' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: - VMAP_STACK support, allowing the kernel stacks to be allocated in the vmalloc space with a guard page for trapping stack overflows. One of the patches introduces THREAD_ALIGN and changes the generic alloc_thread_stack_node() to use this instead of THREAD_SIZE (no functional change for other architectures) - Contiguous PTE hugetlb support re-enabled (after being reverted a couple of times). We now have the semantics agreed in the generic mm layer together with API improvements so that the architecture code can detect between contiguous and non-contiguous huge PTEs - Initial support for persistent memory on ARM: DC CVAP instruction exposed to user space (HWCAP) and the in-kernel pmem API implemented - raid6 improvements for arm64: faster algorithm for the delta syndrome and implementation of the recovery routines using Neon - FP/SIMD refactoring and removal of support for Neon in interrupt context. This is in preparation for full SVE support - PTE accessors converted from inline asm to cmpxchg so that we can use LSE atomics if available (ARMv8.1) - Perf support for Cortex-A35 and A73 - Non-urgent fixes and cleanups * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (75 commits) arm64: cleanup {COMPAT_,}SET_PERSONALITY() macro arm64: introduce separated bits for mm_context_t flags arm64: hugetlb: Cleanup setup_hugepagesz arm64: Re-enable support for contiguous hugepages arm64: hugetlb: Override set_huge_swap_pte_at() to support contiguous hugepages arm64: hugetlb: Override huge_pte_clear() to support contiguous hugepages arm64: hugetlb: Handle swap entries in huge_pte_offset() for contiguous hugepages arm64: hugetlb: Add break-before-make logic for contiguous entries arm64: hugetlb: Spring clean huge pte accessors arm64: hugetlb: Introduce pte_pgprot helper arm64: hugetlb: set_huge_pte_at Add WARN_ON on !pte_present arm64: kexec: have own crash_smp_send_stop() for crash dump for nonpanic cores arm64: dma-mapping: Mark atomic_pool as __ro_after_init arm64: dma-mapping: Do not pass data to gen_pool_set_algo() arm64: Remove the !CONFIG_ARM64_HW_AFDBM alternative code paths arm64: Ignore hardware dirty bit updates in ptep_set_wrprotect() arm64: Move PTE_RDONLY bit handling out of set_pte_at() kvm: arm64: Convert kvm_set_s2pte_readonly() from inline asm to cmpxchg() arm64: Convert pte handling from inline asm to using (cmp)xchg arm64: neon/efi: Make EFI fpsimd save/restore variables static ...
2017-09-05bpf: fix numa_node validationEric Dumazet
syzkaller reported crashes in bpf map creation or map update [1] Problem is that nr_node_ids is a signed integer, NUMA_NO_NODE is also an integer, so it is very tempting to declare numa_node as a signed integer. This means the typical test to validate a user provided value : if (numa_node != NUMA_NO_NODE && (numa_node >= nr_node_ids || !node_online(numa_node))) must be written : if (numa_node != NUMA_NO_NODE && ((unsigned int)numa_node >= nr_node_ids || !node_online(numa_node))) [1] kernel BUG at mm/slab.c:3256! invalid opcode: 0000 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 0 PID: 2946 Comm: syzkaller916108 Not tainted 4.13.0-rc7+ #35 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 task: ffff8801d2bc60c0 task.stack: ffff8801c0c90000 RIP: 0010:____cache_alloc_node+0x1d4/0x1e0 mm/slab.c:3292 RSP: 0018:ffff8801c0c97638 EFLAGS: 00010096 RAX: ffffffffffff8b7b RBX: 0000000001080220 RCX: 0000000000000000 RDX: 00000000ffff8b7b RSI: 0000000001080220 RDI: ffff8801dac00040 RBP: ffff8801c0c976c0 R08: 0000000000000000 R09: 0000000000000000 R10: ffff8801c0c97620 R11: 0000000000000001 R12: ffff8801dac00040 R13: ffff8801dac00040 R14: 0000000000000000 R15: 00000000ffff8b7b FS: 0000000002119940(0000) GS:ffff8801db200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020001fec CR3: 00000001d2980000 CR4: 00000000001406f0 Call Trace: __do_kmalloc_node mm/slab.c:3688 [inline] __kmalloc_node+0x33/0x70 mm/slab.c:3696 kmalloc_node include/linux/slab.h:535 [inline] alloc_htab_elem+0x2a8/0x480 kernel/bpf/hashtab.c:740 htab_map_update_elem+0x740/0xb80 kernel/bpf/hashtab.c:820 map_update_elem kernel/bpf/syscall.c:587 [inline] SYSC_bpf kernel/bpf/syscall.c:1468 [inline] SyS_bpf+0x20c5/0x4c40 kernel/bpf/syscall.c:1443 entry_SYSCALL_64_fastpath+0x1f/0xbe RIP: 0033:0x440409 RSP: 002b:00007ffd1f1792b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000141 RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 0000000000440409 RDX: 0000000000000020 RSI: 0000000020006000 RDI: 0000000000000002 RBP: 0000000000000086 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000401d70 R13: 0000000000401e00 R14: 0000000000000000 R15: 0000000000000000 Code: 83 c2 01 89 50 18 4c 03 70 08 e8 38 f4 ff ff 4d 85 f6 0f 85 3e ff ff ff 44 89 fe 4c 89 ef e8 94 fb ff ff 49 89 c6 e9 2b ff ff ff <0f> 0b 0f 0b 0f 0b 66 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 RIP: ____cache_alloc_node+0x1d4/0x1e0 mm/slab.c:3292 RSP: ffff8801c0c97638 ---[ end trace d745f355da2e33ce ]--- Kernel panic - not syncing: Fatal exception Fixes: 96eabe7a40aa ("bpf: Allow selecting numa node during map creation") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Alexei Starovoitov <ast@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-05audit: update the function commentsGeliang Tang
Update the function comments to match the code. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-09-05audit: Reduce overhead using a coarse clockMel Gorman
Commit 2115bb250f26 ("audit: Use timespec64 to represent audit timestamps") noted that audit timestamps were not y2038 safe and used a 64-bit timestamp. In itself, this makes sense but the conversion was from CURRENT_TIME to ktime_get_real_ts64() which is a heavier call to record an accurate timestamp which is required in some, but not all, cases. The impact is that when auditd is running without any rules that all syscalls have higher overhead. This is visible in the sysbench-thread benchmark as a 11.5% performance hit. That benchmark is dumb as rocks but it's also visible in redis as an 8-10% hit on all operations which is of greater concern. It is somewhat stupid of audit to track syscalls without any rules related to syscalls but that is how it behaves. The overhead can be directly measured with perf comparing 4.9 with 4.12 4.9 7.76% sysbench [kernel.vmlinux] [k] __schedule 7.62% sysbench [kernel.vmlinux] [k] _raw_spin_lock 7.37% sysbench libpthread-2.22.so [.] __lll_lock_elision 7.29% sysbench [kernel.vmlinux] [.] syscall_return_via_sysret 6.59% sysbench [kernel.vmlinux] [k] native_sched_clock 5.21% sysbench libc-2.22.so [.] __sched_yield 4.38% sysbench [kernel.vmlinux] [k] entry_SYSCALL_64 4.28% sysbench [kernel.vmlinux] [k] do_syscall_64 3.49% sysbench libpthread-2.22.so [.] __lll_unlock_elision 3.13% sysbench [kernel.vmlinux] [k] __audit_syscall_exit 2.87% sysbench [kernel.vmlinux] [k] update_curr 2.73% sysbench [kernel.vmlinux] [k] pick_next_task_fair 2.31% sysbench [kernel.vmlinux] [k] syscall_trace_enter 2.20% sysbench [kernel.vmlinux] [k] __audit_syscall_entry ..... 0.00% swapper [kernel.vmlinux] [k] read_tsc 4.12 7.84% sysbench [kernel.vmlinux] [k] __schedule 7.05% sysbench [kernel.vmlinux] [k] _raw_spin_lock 6.57% sysbench libpthread-2.22.so [.] __lll_lock_elision 6.50% sysbench [kernel.vmlinux] [.] syscall_return_via_sysret 5.95% sysbench [kernel.vmlinux] [k] read_tsc 5.71% sysbench [kernel.vmlinux] [k] native_sched_clock 4.78% sysbench libc-2.22.so [.] __sched_yield 4.30% sysbench [kernel.vmlinux] [k] entry_SYSCALL_64 3.94% sysbench [kernel.vmlinux] [k] do_syscall_64 3.37% sysbench libpthread-2.22.so [.] __lll_unlock_elision 3.32% sysbench [kernel.vmlinux] [k] __audit_syscall_exit 2.91% sysbench [kernel.vmlinux] [k] __getnstimeofday64 Note the additional overhead from read_tsc which goes from 0% to 5.95%. This is on a single-socket E3-1230 but similar overheads have been measured on an older machine which the patch also eliminates. The patch in question has no explanation as to why a fully-accurate timestamp is required and is likely an oversight. Using a coarser, but monotically increasing, timestamp the overhead can be eliminated. While it can be worked around by configuring or disabling audit, it's tricky enough to detect that a kernel fix is justified. With this patch, we see the following; sysbenchthread 4.9.0 4.12.0 4.12.0 vanilla vanilla coarse-v1r1 Amean 1 1.49 ( 0.00%) 1.66 ( -11.42%) 1.51 ( -1.34%) Amean 3 1.48 ( 0.00%) 1.65 ( -11.45%) 1.50 ( -0.96%) Amean 5 1.49 ( 0.00%) 1.67 ( -12.31%) 1.51 ( -1.83%) Amean 7 1.49 ( 0.00%) 1.66 ( -11.72%) 1.50 ( -0.67%) Amean 12 1.48 ( 0.00%) 1.65 ( -11.57%) 1.52 ( -2.89%) Amean 16 1.49 ( 0.00%) 1.65 ( -11.13%) 1.51 ( -1.73%) The benchmark is reporting the time required for different thread counts to lock/unlock a private mutex which, while dense, demonstrates the syscall overhead. This is showing that 4.12 took a 11-12% hit but the overhead is almost eliminated by the patch. While the variance is not reported here, it's well within the noise with the patch applied. Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Deepa Dinamani <deepa.kernel@gmail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-09-05Merge branch 'for-4.13-fixes' into for-4.14Tejun Heo
2017-09-04Merge branch 'x86-cache-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cache quality monitoring update from Thomas Gleixner: "This update provides a complete rewrite of the Cache Quality Monitoring (CQM) facility. The existing CQM support was duct taped into perf with a lot of issues and the attempts to fix those turned out to be incomplete and horrible. After lengthy discussions it was decided to integrate the CQM support into the Resource Director Technology (RDT) facility, which is the obvious choise as in hardware CQM is part of RDT. This allowed to add Memory Bandwidth Monitoring support on top. As a result the mechanisms for allocating cache/memory bandwidth and the corresponding monitoring mechanisms are integrated into a single management facility with a consistent user interface" * 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits) x86/intel_rdt: Turn off most RDT features on Skylake x86/intel_rdt: Add command line options for resource director technology x86/intel_rdt: Move special case code for Haswell to a quirk function x86/intel_rdt: Remove redundant ternary operator on return x86/intel_rdt/cqm: Improve limbo list processing x86/intel_rdt/mbm: Fix MBM overflow handler during CPU hotplug x86/intel_rdt: Modify the intel_pqr_state for better performance x86/intel_rdt/cqm: Clear the default RMID during hotcpu x86/intel_rdt: Show bitmask of shareable resource with other executing units x86/intel_rdt/mbm: Handle counter overflow x86/intel_rdt/mbm: Add mbm counter initialization x86/intel_rdt/mbm: Basic counting of MBM events (total and local) x86/intel_rdt/cqm: Add CPU hotplug support x86/intel_rdt/cqm: Add sched_in support x86/intel_rdt: Introduce rdt_enable_key for scheduling x86/intel_rdt/cqm: Add mount,umount support x86/intel_rdt/cqm: Add rmdir support x86/intel_rdt: Separate the ctrl bits from rmdir x86/intel_rdt/cqm: Add mon_data x86/intel_rdt: Prepare for RDT monitor data support ...
2017-09-04Merge branch 'smp-hotplug-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull CPU hotplug fix from Thomas Gleixner: "A single fix to handle the removal of the first dynamic CPU hotplug state correctly" * 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: smp/hotplug: Handle removal correctly in cpuhp_store_callbacks()
2017-09-04Merge branch 'irq-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: "The interrupt subsystem delivers this time: - Refactoring of the GIC-V3 driver to prepare for the GIC-V4 support - Initial GIC-V4 support - Consolidation of the FSL MSI support - Utilize the effective affinity interface in various ARM irqchip drivers - Yet another interrupt chip driver (UniPhier AIDET) - Bulk conversion of the irq chip driver to use %pOF - The usual small fixes and improvements all over the place" * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (77 commits) irqchip/ls-scfg-msi: Add MSI affinity support irqchip/ls-scfg-msi: Add LS1043a v1.1 MSI support irqchip/ls-scfg-msi: Add LS1046a MSI support arm64: dts: ls1046a: Add MSI dts node arm64: dts: ls1043a: Share all MSIs arm: dts: ls1021a: Share all MSIs arm64: dts: ls1043a: Fix typo of MSI compatible string arm: dts: ls1021a: Fix typo of MSI compatible string irqchip/ls-scfg-msi: Fix typo of MSI compatible strings irqchip/irq-bcm7120-l2: Use correct I/O accessors for irq_fwd_mask irqchip/mmp: Make mmp_intc_conf const irqchip/gic: Make irq_chip const irqchip/gic-v3: Advertise GICv4 support to KVM irqchip/gic-v4: Enable low-level GICv4 operations irqchip/gic-v4: Add some basic documentation irqchip/gic-v4: Add VLPI configuration interface irqchip/gic-v4: Add VPE command interface irqchip/gic-v4: Add per-VM VPE domain creation irqchip/gic-v3-its: Set implementation defined bit to enable VLPIs irqchip/gic-v3-its: Allow doorbell interrupts to be injected/cleared ...
2017-09-04Merge branch 'timers-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Thomas Gleixner: "A rather small update for the time(r) subsystem: - A new clocksource driver IMX-TPM - Minor fixes to the alarmtimer facility - Device tree cleanups for Renesas drivers - A new kselftest and fixes for the timer related tests - Conversion of the clocksource drivers to use %pOF - Use the proper helpers to access rlimits in the posix-cpu-timer code" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: alarmtimer: Ensure RTC module is not unloaded clocksource: Convert to using %pOF instead of full_name clocksource/drivers/bcm2835: Remove message for a memory allocation failure devicetree: bindings: Remove deprecated properties devicetree: bindings: Remove unused 32-bit CMT bindings devicetree: bindings: Deprecate property, update example devicetree: bindings: r8a73a4 and R-Car Gen2 CMT bindings devicetree: bindings: R-Car Gen2 CMT0 and CMT1 bindings devicetree: bindings: Remove sh7372 CMT binding clocksource/drivers/imx-tpm: Add imx tpm timer support dt-bindings: timer: Add nxp tpm timer binding doc posix-cpu-timers: Use dedicated helper to access rlimit values alarmtimer: Fix unavailable wake-up source in sysfs timekeeping: Use proper timekeeper for debug code kselftests: timers: set-timer-lat: Add one-shot timer test cases kselftests: timers: set-timer-lat: Tweak reporting when timer fires early kselftests: timers: freq-step: Fix build warning kselftests: timers: freq-step: Define ADJ_SETOFFSET if device has older kernel headers
2017-09-04Merge branch 'x86-mm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm changes from Ingo Molnar: "PCID support, 5-level paging support, Secure Memory Encryption support The main changes in this cycle are support for three new, complex hardware features of x86 CPUs: - Add 5-level paging support, which is a new hardware feature on upcoming Intel CPUs allowing up to 128 PB of virtual address space and 4 PB of physical RAM space - a 512-fold increase over the old limits. (Supercomputers of the future forecasting hurricanes on an ever warming planet can certainly make good use of more RAM.) Many of the necessary changes went upstream in previous cycles, v4.14 is the first kernel that can enable 5-level paging. This feature is activated via CONFIG_X86_5LEVEL=y - disabled by default. (By Kirill A. Shutemov) - Add 'encrypted memory' support, which is a new hardware feature on upcoming AMD CPUs ('Secure Memory Encryption', SME) allowing system RAM to be encrypted and decrypted (mostly) transparently by the CPU, with a little help from the kernel to transition to/from encrypted RAM. Such RAM should be more secure against various attacks like RAM access via the memory bus and should make the radio signature of memory bus traffic harder to intercept (and decrypt) as well. This feature is activated via CONFIG_AMD_MEM_ENCRYPT=y - disabled by default. (By Tom Lendacky) - Enable PCID optimized TLB flushing on newer Intel CPUs: PCID is a hardware feature that attaches an address space tag to TLB entries and thus allows to skip TLB flushing in many cases, even if we switch mm's. (By Andy Lutomirski) All three of these features were in the works for a long time, and it's coincidence of the three independent development paths that they are all enabled in v4.14 at once" * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (65 commits) x86/mm: Enable RCU based page table freeing (CONFIG_HAVE_RCU_TABLE_FREE=y) x86/mm: Use pr_cont() in dump_pagetable() x86/mm: Fix SME encryption stack ptr handling kvm/x86: Avoid clearing the C-bit in rsvd_bits() x86/CPU: Align CR3 defines x86/mm, mm/hwpoison: Clear PRESENT bit for kernel 1:1 mappings of poison pages acpi, x86/mm: Remove encryption mask from ACPI page protectio