summaryrefslogtreecommitdiffstats
path: root/mm
AgeCommit message (Collapse)Author
2019-03-29mm/migrate.c: add missing flush_dcache_page for non-mapped page migrateLars Persson
Our MIPS 1004Kc SoCs were seeing random userspace crashes with SIGILL and SIGSEGV that could not be traced back to a userspace code bug. They had all the magic signs of an I/D cache coherency issue. Now recently we noticed that the /proc/sys/vm/compact_memory interface was quite efficient at provoking this class of userspace crashes. Studying the code in mm/migrate.c there is a distinction made between migrating a page that is mapped at the instant of migration and one that is not mapped. Our problem turned out to be the non-mapped pages. For the non-mapped page the code performs a copy of the page content and all relevant meta-data of the page without doing the required D-cache maintenance. This leaves dirty data in the D-cache of the CPU and on the 1004K cores this data is not visible to the I-cache. A subsequent page-fault that triggers a mapping of the page will happily serve the process with potentially stale code. What about ARM then, this bug should have seen greater exposure? Well ARM became immune to this flaw back in 2010, see commit c01778001a4f ("ARM: 6379/1: Assume new page cache pages have dirty D-cache"). My proposed fix moves the D-cache maintenance inside move_to_new_page to make it common for both cases. Link: http://lkml.kernel.org/r/20190315083502.11849-1-larper@axis.com Fixes: 97ee0524614 ("flush cache before installing new page at migraton") Signed-off-by: Lars Persson <larper@axis.com> Reviewed-by: Paul Burton <paul.burton@mips.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-29mm/page_isolation.c: fix a wrong flag in set_migratetype_isolate()Qian Cai
Due to has_unmovable_pages() taking an incorrect irqsave flag instead of the isolation flag in set_migratetype_isolate(), there are issues with HWPOSION and error reporting where dump_page() is not called when there is an unmovable page. Link: http://lkml.kernel.org/r/20190320204941.53731-1-cai@lca.pw Fixes: d381c54760dc ("mm: only report isolation failures when offlining memory") Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Signed-off-by: Qian Cai <cai@lca.pw> Cc: <stable@vger.kernel.org> [5.0.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-29mm/memory_hotplug.c: fix notification in offline error pathQian Cai
When start_isolate_page_range() returned -EBUSY in __offline_pages(), it calls memory_notify(MEM_CANCEL_OFFLINE, &arg) with an uninitialized "arg". As the result, it triggers warnings below. Also, it is only necessary to notify MEM_CANCEL_OFFLINE after MEM_GOING_OFFLINE. page:ffffea0001200000 count:1 mapcount:0 mapping:0000000000000000 index:0x0 flags: 0x3fffe000001000(reserved) raw: 003fffe000001000 ffffea0001200008 ffffea0001200008 0000000000000000 raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000 page dumped because: unmovable page WARNING: CPU: 25 PID: 1665 at mm/kasan/common.c:665 kasan_mem_notifier+0x34/0x23b CPU: 25 PID: 1665 Comm: bash Tainted: G W 5.0.0+ #94 Hardware name: HP ProLiant DL180 Gen9/ProLiant DL180 Gen9, BIOS U20 10/25/2017 RIP: 0010:kasan_mem_notifier+0x34/0x23b RSP: 0018:ffff8883ec737890 EFLAGS: 00010206 RAX: 0000000000000246 RBX: ff10f0f4435f1000 RCX: f887a7a21af88000 RDX: dffffc0000000000 RSI: 0000000000000020 RDI: ffff8881f221af88 RBP: ffff8883ec737898 R08: ffff888000000000 R09: ffffffffb0bddcd0 R10: ffffed103e857088 R11: ffff8881f42b8443 R12: dffffc0000000000 R13: 00000000fffffff9 R14: dffffc0000000000 R15: 0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000560fbd31d730 CR3: 00000004049c6003 CR4: 00000000001606a0 Call Trace: notifier_call_chain+0xbf/0x130 __blocking_notifier_call_chain+0x76/0xc0 blocking_notifier_call_chain+0x16/0x20 memory_notify+0x1b/0x20 __offline_pages+0x3e2/0x1210 offline_pages+0x11/0x20 memory_block_action+0x144/0x300 memory_subsys_offline+0xe5/0x170 device_offline+0x13f/0x1e0 state_store+0xeb/0x110 dev_attr_store+0x3f/0x70 sysfs_kf_write+0x104/0x150 kernfs_fop_write+0x25c/0x410 __vfs_write+0x66/0x120 vfs_write+0x15a/0x4f0 ksys_write+0xd2/0x1b0 __x64_sys_write+0x73/0xb0 do_syscall_64+0xeb/0xb78 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f14f75cc3b8 RSP: 002b:00007ffe84d01d68 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007f14f75cc3b8 RDX: 0000000000000008 RSI: 0000563f8e433d70 RDI: 0000000000000001 RBP: 0000563f8e433d70 R08: 000000000000000a R09: 00007ffe84d018f0 R10: 000000000000000a R11: 0000000000000246 R12: 00007f14f789e780 R13: 0000000000000008 R14: 00007f14f7899740 R15: 0000000000000008 Link: http://lkml.kernel.org/r/20190320204255.53571-1-cai@lca.pw Fixes: 7960509329c2 ("mm, memory_hotplug: print reason for the offlining failure") Reviewed-by: Oscar Salvador <osalvador@suse.de> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Qian Cai <cai@lca.pw> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: <stable@vger.kernel.org> [5.0.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-29mm/debug.c: fix __dump_page when mapping->host is not setOscar Salvador
While debugging something, I added a dump_page() into do_swap_page(), and I got the splat from below. The issue happens when dereferencing mapping->host in __dump_page(): ... else if (mapping) { pr_warn("%ps ", mapping->a_ops); if (mapping->host->i_dentry.first) { struct dentry *dentry; dentry = container_of(mapping->host->i_dentry.first, struct dentry, d_u.d_alias); pr_warn("name:\"%pd\" ", dentry); } } ... Swap address space does not contain an inode information, and so mapping->host equals NULL. Although the dump_page() call was added artificially into do_swap_page(), I am not sure if we can hit this from any other path, so it looks worth fixing it. We can easily do that by checking mapping->host first. Link: http://lkml.kernel.org/r/20190318072931.29094-1-osalvador@suse.de Fixes: 1c6fb1d89e73c ("mm: print more information about mapping in __dump_page") Signed-off-by: Oscar Salvador <osalvador@suse.de> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Hugh Dickins <hughd@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-29mm: mempolicy: make mbind() return -EIO when MPOL_MF_STRICT is specifiedYang Shi
When MPOL_MF_STRICT was specified and an existing page was already on a node that does not follow the policy, mbind() should return -EIO. But commit 6f4576e3687b ("mempolicy: apply page table walker on queue_pages_range()") broke the rule. And commit c8633798497c ("mm: mempolicy: mbind and migrate_pages support thp migration") didn't return the correct value for THP mbind() too. If MPOL_MF_STRICT is set, ignore vma_migratable() to make sure it reaches queue_pages_to_pte_range() or queue_pages_pmd() to check if an existing page was already on a node that does not follow the policy. And, non-migratable vma may be used, return -EIO too if MPOL_MF_MOVE or MPOL_MF_MOVE_ALL was specified. Tested with https://github.com/metan-ucw/ltp/blob/master/testcases/kernel/syscalls/mbind/mbind02.c [akpm@linux-foundation.org: tweak code comment] Link: http://lkml.kernel.org/r/1553020556-38583-1-git-send-email-yang.shi@linux.alibaba.com Fixes: 6f4576e3687b ("mempolicy: apply page table walker on queue_pages_range()") Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Oscar Salvador <osalvador@suse.de> Reported-by: Cyril Hrubis <chrubis@suse.cz> Suggested-by: Kirill A. Shutemov <kirill@shutemov.name> Acked-by: Rafael Aquini <aquini@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Acked-by: David Rientjes <rientjes@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-29mm: add support for kmem caches in DMA32 zoneNicolas Boichat
Patch series "iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables", v6. This is a followup to the discussion in [1], [2]. IOMMUs using ARMv7 short-descriptor format require page tables (level 1 and 2) to be allocated within the first 4GB of RAM, even on 64-bit systems. For L1 tables that are bigger than a page, we can just use __get_free_pages with GFP_DMA32 (on arm64 systems only, arm would still use GFP_DMA). For L2 tables that only take 1KB, it would be a waste to allocate a full page, so we considered 3 approaches: 1. This series, adding support for GFP_DMA32 slab caches. 2. genalloc, which requires pre-allocating the maximum number of L2 page tables (4096, so 4MB of memory). 3. page_frag, which is not very memory-efficient as it is unable to reuse freed fragments until the whole page is freed. [3] This series is the most memory-efficient approach. stable@ note: We confirmed that this is a regression, and IOMMU errors happen on 4.19 and linux-next/master on MT8173 (elm, Acer Chromebook R13). The issue most likely starts from commit ad67f5a6545f ("arm64: replace ZONE_DMA with ZONE_DMA32"), i.e. 4.15, and presumably breaks a number of Mediatek platforms (and maybe others?). [1] https://lists.linuxfoundation.org/pipermail/iommu/2018-November/030876.html [2] https://lists.linuxfoundation.org/pipermail/iommu/2018-December/031696.html [3] https://patchwork.codeaurora.org/patch/671639/ This patch (of 3): IOMMUs using ARMv7 short-descriptor format require page tables to be allocated within the first 4GB of RAM, even on 64-bit systems. On arm64, this is done by passing GFP_DMA32 flag to memory allocation functions. For IOMMU L2 tables that only take 1KB, it would be a waste to allocate a full page using get_free_pages, so we considered 3 approaches: 1. This patch, adding support for GFP_DMA32 slab caches. 2. genalloc, which requires pre-allocating the maximum number of L2 page tables (4096, so 4MB of memory). 3. page_frag, which is not very memory-efficient as it is unable to reuse freed fragments until the whole page is freed. This change makes it possible to create a custom cache in DMA32 zone using kmem_cache_create, then allocate memory using kmem_cache_alloc. We do not create a DMA32 kmalloc cache array, as there are currently no users of kmalloc(..., GFP_DMA32). These calls will continue to trigger a warning, as we keep GFP_DMA32 in GFP_SLAB_BUG_MASK. This implies that calls to kmem_cache_*alloc on a SLAB_CACHE_DMA32 kmem_cache must _not_ use GFP_DMA32 (it is anyway redundant and unnecessary). Link: http://lkml.kernel.org/r/20181210011504.122604-2-drinkcat@chromium.org Signed-off-by: Nicolas Boichat <drinkcat@chromium.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Will Deacon <will.deacon@arm.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Sasha Levin <Alexander.Levin@microsoft.com> Cc: Huaisheng Ye <yehs1@lenovo.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Yong Wu <yong.wu@mediatek.com> Cc: Matthias Brugger <matthias.bgg@gmail.com> Cc: Tomasz Figa <tfiga@google.com> Cc: Yingjoe Chen <yingjoe.chen@mediatek.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Hsin-Yi Wang <hsinyi@chromium.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-29mm/hotplug: fix offline undo_isolate_page_range()Qian Cai
Commit f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") introduced move_pfn_range_to_zone() which calls memmap_init_zone() during onlining a memory block. memmap_init_zone() will reset pagetype flags and makes migrate type to be MOVABLE. However, in __offline_pages(), it also call undo_isolate_page_range() after offline_isolated_pages() to do the same thing. Due to commit 2ce13640b3f4 ("mm: __first_valid_page skip over offline pages") changed __first_valid_page() to skip offline pages, undo_isolate_page_range() here just waste CPU cycles looping around the offlining PFN range while doing nothing, because __first_valid_page() will return NULL as offline_isolated_pages() has already marked all memory sections within the pfn range as offline via offline_mem_sections(). Also, after calling the "useless" undo_isolate_page_range() here, it reaches the point of no returning by notifying MEM_OFFLINE. Those pages will be marked as MIGRATE_MOVABLE again once onlining. The only thing left to do is to decrease the number of isolated pageblocks zone counter which would make some paths of the page allocation slower that the above commit introduced. Even if alloc_contig_range() can be used to isolate 16GB-hugetlb pages on ppc64, an "int" should still be enough to represent the number of pageblocks there. Fix an incorrect comment along the way. [cai@lca.pw: v4] Link: http://lkml.kernel.org/r/20190314150641.59358-1-cai@lca.pw Link: http://lkml.kernel.org/r/20190313143133.46200-1-cai@lca.pw Fixes: 2ce13640b3f4 ("mm: __first_valid_page skip over offline pages") Signed-off-by: Qian Cai <cai@lca.pw> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> [4.13+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-29mm/debug.c: add a cast to u64 for atomic64_read()Qian Cai
atomic64_read() on ppc64le returns "long int", so fix the same way as commit d549f545e690 ("drm/virtio: use %llu format string form atomic64_t") by adding a cast to u64, which makes it work on all arches. In file included from ./include/linux/printk.h:7, from ./include/linux/kernel.h:15, from mm/debug.c:9: mm/debug.c: In function 'dump_mm': ./include/linux/kern_levels.h:5:18: warning: format '%llx' expects argument of type 'long long unsigned int', but argument 19 has type 'long int' [-Wformat=] #define KERN_SOH "A" /* ASCII Start Of Header */ ^~~~~~ ./include/linux/kern_levels.h:8:20: note: in expansion of macro 'KERN_SOH' #define KERN_EMERG KERN_SOH "0" /* system is unusable */ ^~~~~~~~ ./include/linux/printk.h:297:9: note: in expansion of macro 'KERN_EMERG' printk(KERN_EMERG pr_fmt(fmt), ##__VA_ARGS__) ^~~~~~~~~~ mm/debug.c:133:2: note: in expansion of macro 'pr_emerg' pr_emerg("mm %px mmap %px seqnum %llu task_size %lu" ^~~~~~~~ mm/debug.c:140:17: note: format string is defined here "pinned_vm %llx data_vm %lx exec_vm %lx stack_vm %lx" ~~~^ %lx Link: http://lkml.kernel.org/r/20190310183051.87303-1-cai@lca.pw Fixes: 70f8a3ca68d3 ("mm: make mm->pinned_vm an atomic64 counter") Signed-off-by: Qian Cai <cai@lca.pw> Acked-by: Davidlohr Bueso <dbueso@suse.de> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-29mm/memory.c: fix modifying of page protection by insert_pfn()Jan Kara
Aneesh has reported that PPC triggers the following warning when excercising DAX code: IP set_pte_at+0x3c/0x190 LR insert_pfn+0x208/0x280 Call Trace: insert_pfn+0x68/0x280 dax_iomap_pte_fault.isra.7+0x734/0xa40 __xfs_filemap_fault+0x280/0x2d0 do_wp_page+0x48c/0xa40 __handle_mm_fault+0x8d0/0x1fd0 handle_mm_fault+0x140/0x250 __do_page_fault+0x300/0xd60 handle_page_fault+0x18 Now that is WARN_ON in set_pte_at which is VM_WARN_ON(pte_hw_valid(*ptep) && !pte_protnone(*ptep)); The problem is that on some architectures set_pte_at() cannot cope with a situation where there is already some (different) valid entry present. Use ptep_set_access_flags() instead to modify the pfn which is built to deal with modifying existing PTE. Link: http://lkml.kernel.org/r/20190311084537.16029-1-jack@suse.cz Fixes: b2770da64254 "mm: add vm_insert_mixed_mkwrite()" Signed-off-by: Jan Kara <jack@suse.cz> Reported-by: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Acked-by: Dan Williams <dan.j.williams@intel.com> Cc: Chandan Rajendra <chandan@linux.ibm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-29kasan: fix variable 'tag' set but not used warningQian Cai
set_tag() compiles away when CONFIG_KASAN_SW_TAGS=n, so make arch_kasan_set_tag() a static inline function to fix warnings below. mm/kasan/common.c: In function '__kasan_kmalloc': mm/kasan/common.c:475:5: warning: variable 'tag' set but not used [-Wunused-but-set-variable] u8 tag; ^~~ Link: http://lkml.kernel.org/r/20190307185244.54648-1-cai@lca.pw Signed-off-by: Qian Cai <cai@lca.pw> Reviewed-by: Andrey Konovalov <andreyknvl@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-16Merge tag 'devdax-for-5.1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull device-dax updates from Dan Williams: "New device-dax infrastructure to allow persistent memory and other "reserved" / performance differentiated memories, to be assigned to the core-mm as "System RAM". Some users want to use persistent memory as additional volatile memory. They are willing to cope with potential performance differences, for example between DRAM and 3D Xpoint, and want to use typical Linux memory management apis rather than a userspace memory allocator layered over an mmap() of a dax file. The administration model is to decide how much Persistent Memory (pmem) to use as System RAM, create a device-dax-mode namespace of that size, and then assign it to the core-mm. The rationale for device-dax is that it is a generic memory-mapping driver that can be layered over any "special purpose" memory, not just pmem. On subsequent boots udev rules can be used to restore the memory assignment. One implication of using pmem as RAM is that mlock() no longer keeps data off persistent media. For this reason it is recommended to enable NVDIMM Security (previously merged for 5.0) to encrypt pmem contents at rest. We considered making this recommendation an actively enforced requirement, but in the end decided to leave it as a distribution / administrator policy to allow for emulation and test environments that lack security capable NVDIMMs. Summary: - Replace the /sys/class/dax device model with /sys/bus/dax, and include a compat driver so distributions can opt-in to the new ABI. - Allow for an alternative driver for the device-dax address-range - Introduce the 'kmem' driver to hotplug / assign a device-dax address-range to the core-mm. - Arrange for the device-dax target-node to be onlined so that the newly added memory range can be uniquely referenced by numa apis" NOTE! I'm not entirely happy with the whole "PMEM as RAM" model because we currently have special - and very annoying rules in the kernel about accessing PMEM only with the "MC safe" accessors, because machine checks inside the regular repeat string copy functions can be fatal in some (not described) circumstances. And apparently the PMEM modules can cause that a lot more than regular RAM. The argument is that this happens because PMEM doesn't necessarily get scrubbed at boot like RAM does, but that is planned to be added for the user space tooling. Quoting Dan from another email: "The exposure can be reduced in the volatile-RAM case by scanning for and clearing errors before it is onlined as RAM. The userspace tooling for that can be in place before v5.1-final. There's also runtime notifications of errors via acpi_nfit_uc_error_notify() from background scrubbers on the DIMM devices. With that mechanism the kernel could proactively clear newly discovered poison in the volatile case, but that would be additional development more suitable for v5.2. I understand the concern, and the need to highlight this issue by tapping the brakes on feature development, but I don't see PMEM as RAM making the situation worse when the exposure is also there via DAX in the PMEM case. Volatile-RAM is arguably a safer use case since it's possible to repair pages where the persistent case needs active application coordination" * tag 'devdax-for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: device-dax: "Hotplug" persistent memory for use like normal RAM mm/resource: Let walk_system_ram_range() search child resources mm/memory-hotplug: Allow memory resources to be children mm/resource: Move HMM pr_debug() deeper into resource code mm/resource: Return real error codes from walk failures device-dax: Add a 'modalias' attribute to DAX 'bus' devices device-dax: Add a 'target_node' attribute device-dax: Auto-bind device after successful new_id acpi/nfit, device-dax: Identify differentiated memory with a unique numa-node device-dax: Add /sys/class/dax backwards compatibility device-dax: Add support for a dax override driver device-dax: Move resource pinning+mapping into the common driver device-dax: Introduce bus + driver model device-dax: Start defining a dax bus model device-dax: Remove multi-resource infrastructure device-dax: Kill dax_region base device-dax: Kill dax_region ida
2019-03-15filemap: add a comment about FAULT_FLAG_RETRY_NOWAIT behaviorLinus Torvalds
I thought Josef Bacik's patch to drop the mmap_sem was buggy, because when looking at the error cases, there was one case where we returned VM_FAULT_RETRY without actually dropping the mmap_sem. Josef had to explain to me (using small words) that yes, that's actually what we're supposed to do, and his patch was correct. Which not only convinced me he knew what he was doing and I should stop arguing with him, but also that I should add a comment to the case I was confused about. Patiently-pointed-out-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-15filemap: drop the mmap_sem for all blocking operationsJosef Bacik
Currently we only drop the mmap_sem if there is contention on the page lock. The idea is that we issue readahead and then go to lock the page while it is under IO and we want to not hold the mmap_sem during the IO. The problem with this is the assumption that the readahead does anything. In the case that the box is under extreme memory or IO pressure we may end up not reading anything at all for readahead, which means we will end up reading in the page under the mmap_sem. Even if the readahead does something, it could get throttled because of io pressure on the system and the process is in a lower priority cgroup. Holding the mmap_sem while doing IO is problematic because it can cause system-wide priority inversions. Consider some large company that does a lot of web traffic. This large company has load balancing logic in it's core web server, cause some engineer thought this was a brilliant plan. This load balancing logic gets statistics from /proc about the system, which trip over processes mmap_sem for various reasons. Now the web server application is in a protected cgroup, but these other processes may not be, and if they are being throttled while their mmap_sem is held we'll stall, and cause this nice death spiral. Instead rework filemap fault path to drop the mmap sem at any point that we may do IO or block for an extended period of time. This includes while issuing readahead, locking the page, or needing to call ->readpage because readahead did not occur. Then once we have a fully uptodate page we can return with VM_FAULT_RETRY and come back again to find our nicely in-cache page that was gotten outside of the mmap_sem. This patch also adds a new helper for locking the page with the mmap_sem dropped. This doesn't make sense currently as generally speaking if the page is already locked it'll have been read in (unless there was an error) before it was unlocked. However a forthcoming patchset will change this with the ability to abort read-ahead bio's if necessary, making it more likely that we could contend for a page lock and still have a not uptodate page. This allows us to deal with this case by grabbing the lock and issuing the IO without the mmap_sem held, and then returning VM_FAULT_RETRY to come back around. [josef@toxicpanda.com: v6] Link: http://lkml.kernel.org/r/20181212152757.10017-1-josef@toxicpanda.com [kirill@shutemov.name: fix race in filemap_fault()] Link: http://lkml.kernel.org/r/20181228235106.okk3oastsnpxusxs@kshutemo-mobl1 [akpm@linux-foundation.org: coding style fixes] Link: http://lkml.kernel.org/r/20181211173801.29535-4-josef@toxicpanda.com Signed-off-by: Josef Bacik <josef@toxicpanda.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Jan Kara <jack@suse.cz> Tested-by: syzbot+b437b5a429d680cf2217@syzkaller.appspotmail.com Cc: Dave Chinner <david@fromorbit.com> Cc: Rik van Riel <riel@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-15filemap: kill page_cache_read usage in filemap_faultJosef Bacik
Patch series "drop the mmap_sem when doing IO in the fault path", v6. Now that we have proper isolation in place with cgroups2 we have started going through and fixing the various priority inversions. Most are all gone now, but this one is sort of weird since it's not necessarily a priority inversion that happens within the kernel, but rather because of something userspace does. We have giant applications that we want to protect, and parts of these giant applications do things like watch the system state to determine how healthy the box is for load balancing and such. This involves running 'ps' or other such utilities. These utilities will often walk /proc/<pid>/whatever, and these files can sometimes need to down_read(&task->mmap_sem). Not usually a big deal, but we noticed when we are stress testing that sometimes our protected application has latency spikes trying to get the mmap_sem for tasks that are in lower priority cgroups. This is because any down_write() on a semaphore essentially turns it into a mutex, so even if we currently have it held for reading, any new readers will not be allowed on to keep from starving the writer. This is fine, except a lower priority task could be stuck doing IO because it has been throttled to the point that its IO is taking much longer than normal. But because a higher priority group depends on this completing it is now stuck behind lower priority work. In order to avoid this particular priority inversion we want to use the existing retry mechanism to stop from holding the mmap_sem at all if we are going to do IO. This already exists in the read case sort of, but needed to be extended for more than just grabbing the page lock. With io.latency we throttle at submit_bio() time, so the readahead stuff can block and even page_cache_read can block, so all these paths need to have the mmap_sem dropped. The other big thing is ->page_mkwrite. btrfs is particularly shitty here because we have to reserve space for the dirty page, which can be a very expensive operation. We use the same retry method as the read path, and simply cache the page and verify the page is still setup properly the next pass through ->page_mkwrite(). I've tested these patches with xfstests and there are no regressions. This patch (of 3): If we do not have a page at filemap_fault time we'll do this weird forced page_cache_read thing to populate the page, and then drop it again and loop around and find it. This makes for 2 ways we can read a page in filemap_fault, and it's not really needed. Instead add a FGP_FOR_MMAP flag so that pagecache_get_page() will return a unlocked page that's in pagecache. Then use the normal page locking and readpage logic already in filemap_fault. This simplifies the no page in page cache case significantly. [akpm@linux-foundation.org: fix comment text] [josef@toxicpanda.com: don't unlock null page in FGP_FOR_MMAP case] Link: http://lkml.kernel.org/r/20190312201742.22935-1-josef@toxicpanda.com Link: http://lkml.kernel.org/r/20181211173801.29535-2-josef@toxicpanda.com Signed-off-by: Josef Bacik <josef@toxicpanda.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Tejun Heo <tj@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Rik van Riel <riel@redhat.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-14filemap: pass vm_fault to the mmap ra helpersJosef Bacik
All of the arguments to these functions come from the vmf. Cut down on the amount of arguments passed by simply passing in the vmf to these two helpers. Link: http://lkml.kernel.org/r/20181211173801.29535-3-josef@toxicpanda.com Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Dave Chinner <david@fromorbit.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Rik van Riel <riel@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-12Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc updates from Andrew Morton: - a few misc things - the rest of MM - remove flex_arrays, replace with new simple radix-tree implementation * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (38 commits) Drop flex_arrays sctp: convert to genradix proc: commit to genradix generic radix trees selinux: convert to kvmalloc md: convert to kvmalloc openvswitch: convert to kvmalloc of: fix kmemleak crash caused by imbalance in early memory reservation mm: memblock: update comments and kernel-doc memblock: split checks whether a region should be skipped to a helper function memblock: remove memblock_{set,clear}_region_flags memblock: drop memblock_alloc_*_nopanic() variants memblock: memblock_alloc_try_nid: don't panic treewide: add checks for the return value of memblock_alloc*() swiotlb: add checks for the return value of memblock_alloc*() init/main: add checks for the return value of memblock_alloc*() mm/percpu: add checks for the return value of memblock_alloc*() sparc: add checks for the return value of memblock_alloc*() ia64: add checks for the return value of memblock_alloc*() arch: don't memset(0) memory returned by memblock_alloc() ...
2019-03-12mm: memblock: update comments and kernel-docMike Rapoport
* Remove comments mentioning bootmem * Extend "DOC: memblock overview" * Add kernel-doc comments for several more functions [akpm@linux-foundation.org: fix copy-n-paste error] Link: http://lkml.kernel.org/r/1549626347-25461-1-git-send-email-rppt@linux.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-12memblock: split checks whether a region should be skipped to a helper functionMike Rapoport
__next_mem_range() and __next_mem_range_rev() duplicate the code that checks whether a region should be skipped because of node or flags incompatibility. Split this code into a helper function. Link: http://lkml.kernel.org/r/1549455025-17706-3-git-send-email-rppt@linux.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-12memblock: remove memblock_{set,clear}_region_flagsMike Rapoport
The memblock API provides dedicated helpers to set or clear a flag on a memory region, e.g. memblock_{mark,clear}_hotplug(). The memblock_{set,clear}_region_flags() functions are used only by the memblock internal function that adjusts the region flags. Drop these functions and use open-coded implementation instead. Link: http://lkml.kernel.org/r/1549455025-17706-2-git-send-email-rppt@linux.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-12memblock: drop memblock_alloc_*_nopanic() variantsMike Rapoport
As all the memblock allocation functions return NULL in case of error rather than panic(), the duplicates with _nopanic suffix can be removed. Link: http://lkml.kernel.org/r/1548057848-15136-22-git-send-email-rppt@linux.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Petr Mladek <pmladek@suse.com> [printk] Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Christoph Hellwig <hch@lst.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Dennis Zhou <dennis@kernel.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Guo Ren <guoren@kernel.org> Cc: Guo Ren <ren_guo@c-sky.com> [c-sky] Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Juergen Gross <jgross@suse.com> [Xen] Cc: Mark Salter <msalter@redhat.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Paul Burton <paul.burton@mips.com> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Rob Herring <robh+dt@kernel.org> Cc: Rob Herring <robh@kernel.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-12memblock: memblock_alloc_try_nid: don't panicMike Rapoport
As all the memblock_alloc*() users are now checking the return value and panic() in case of error, the panic() call can be removed from the core memblock allocator, namely memblock_alloc_try_nid(). Link: http://lkml.kernel.org/r/1548057848-15136-21-git-send-email-rppt@linux.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Christoph Hellwig <hch@lst.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Dennis Zhou <dennis@kernel.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Guo Ren <guoren@kernel.org> Cc: Guo Ren <ren_guo@c-sky.com> [c-sky] Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Juergen Gross <jgross@suse.com> [Xen] Cc: Mark Salter <msalter@redhat.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Paul Burton <paul.burton@mips.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Rob Herring <robh+dt@kernel.org> Cc: Rob Herring <robh@kernel.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-12treewide: add checks for the return value of memblock_alloc*()Mike Rapoport
Add check for the return value of memblock_alloc*() functions and call panic() in case of error. The panic message repeats the one used by panicing memblock allocators with adjustment of parameters to include only relevant ones. The replacement was mostly automated with semantic patches like the one below with manual massaging of format strings. @@ expression ptr, size, align; @@ ptr = memblock_alloc(size, align); + if (!ptr) + panic("%s: Failed to allocate %lu bytes align=0x%lx\n", __func__, size, align); [anders.roxell@linaro.org: use '%pa' with 'phys_addr_t' type] Link: http://lkml.kernel.org/r/20190131161046.21886-1-anders.roxell@linaro.org [rppt@linux.ibm.com: fix format strings for panics after memblock_alloc] Link: http://lkml.kernel.org/r/1548950940-15145-1-git-send-email-rppt@linux.ibm.com [rppt@linux.ibm.com: don't panic if the allocation in sparse_buffer_init fails] Link: http://lkml.kernel.org/r/20190131074018.GD28876@rapoport-lnx [akpm@linux-foundation.org: fix xtensa printk warning] Link: http://lkml.kernel.org/r/1548057848-15136-20-git-send-email-rppt@linux.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Anders Roxell <anders.roxell@linaro.org> Reviewed-by: Guo Ren <ren_guo@c-sky.com> [c-sky] Acked-by: Paul Burton <paul.burton@mips.com> [MIPS] Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> [s390] Reviewed-by: Juergen Gross <jgross@suse.com> [Xen] Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k] Acked-by: Max Filippov <jcmvbkbc@gmail.com> [xtensa] Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Christoph Hellwig <hch@lst.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Dennis Zhou <dennis@kernel.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Guo Ren <guoren@kernel.org> Cc: Mark Salter <msalter@redhat.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Petr Mladek <pmladek@suse.com> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Rob Herring <robh+dt@kernel.org> Cc: Rob Herring <robh@kernel.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-12mm/percpu: add checks for the return value of memblock_alloc*()Mike Rapoport
Add panic() calls if memblock_alloc() returns NULL. The panic() format duplicates the one used by memblock itself and in order to avoid explosion with long parameters list replace open coded allocation size calculations with a local variable. Link: http://lkml.kernel.org/r/1548057848-15136-17-git-send-email-rppt@linux.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Christoph Hellwig <hch@lst.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Dennis Zhou <dennis@kernel.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Guo Ren <guoren@kernel.org> Cc: Guo Ren <ren_guo@c-sky.com> [c-sky] Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Juergen Gross <jgross@suse.com> [Xen] Cc: Mark Salter <msalter@redhat.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Paul Burton <paul.burton@mips.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Rob Herring <robh+dt@kernel.org> Cc: Rob Herring <robh@kernel.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-12memblock: make memblock_find_in_range_node() and choose_memblock_flags() staticMike Rapoport
These functions are not used outside memblock. Make them static. Link: http://lkml.kernel.org/r/1548057848-15136-12-git-send-email-rppt@linux.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Christoph Hellwig <hch@lst.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Dennis Zhou <dennis@kernel.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Guo Ren <guoren@kernel.org> Cc: Guo Ren <ren_guo@c-sky.com> [c-sky] Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Juergen Gross <jgross@suse.com> [Xen] Cc: Mark Salter <msalter@redhat.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Paul Burton <paul.burton@mips.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Rob Herring <robh+dt@kernel.org> Cc: Rob Herring <robh@kernel.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-12memblock: refactor internal allocation functionsMike Rapoport
Currently, memblock has several internal functions with overlapping functionality. They all call memblock_find_in_range_node() to find free memory and then reserve the allocated range and mark it with kmemleak. However, there is difference in the allocation constraints and in fallback strategies. The allocations returning physical address first attempt to find free memory on the specified node within mirrored memory regions, then retry on the same node without the requirement for memory mirroring and finally fall back to all available memory. The allocations returning virtual address start with clamping the allowed range to memblock.current_limit, attempt to allocate from the specified node from regions with mirroring and with user defined minimal address. If such allocation fails, next attempt is done with node restriction lifted. Next, the allocation is retried with minimal address reset to zero and at last without the requirement for mirrored regions. Let's consolidate various fallbacks handling and make them more consistent for physical and virtual variants. Most of the fallback handling is moved to memblock_alloc_range_nid() and it now handles node and mirror fallbacks. The memblock_alloc_internal() uses memblock_alloc_range_nid() to get a physical address of the allocated range and converts it to virtual address. The fallback for allocation below the specified minimal address remains in memblock_alloc_internal() because memblock_alloc_range_nid() is used by CMA with exact requirement for lower bounds. The memblock_phys_alloc_nid() function is completely dropped as it is not used anywhere outside memblock and its only usage can be replaced by a call to memblock_alloc_range_nid(). [rppt@linux.ibm.com: fix parameter order in memblock_phys_alloc_try_nid()] Link: http://lkml.kernel.org/r/20190203113915.GC8620@rapoport-lnx Link: http://lkml.kernel.org/r/1548057848-15136-11-git-send-email-rppt@linux.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Tested-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Christoph Hellwig <hch@lst.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Dennis Zhou <dennis@kernel.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Guo Ren <guoren@kernel.org> Cc: Guo Ren <ren_guo@c-sky.com> [c-sky] Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Juergen Gross <jgross@suse.com> [Xen] Cc: Mark Salter <msalter@redhat.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michal Simek <monstr@monstr.eu> Cc: Paul Burton <paul.burton@mips.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Rob Herring <robh+dt@kernel.org> Cc: Rob Herring <robh@kernel.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-12memblock: drop memblock_alloc_base()Mike Rapoport
The memblock_alloc_base() function tries to allocate a memory up to the limit specified by its max_addr parameter and panics if the allocation fails. Replace its usage with memblock_phys_alloc_range() and make the callers check the return value and panic in case of error. Link: http://lkml.kernel.org/r/1548057848-15136-10-git-send-email-rppt@linux.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc] Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Christoph Hellwig <hch@lst.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Dennis Zhou <dennis@kernel.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Guo Ren <guoren@kernel.org> Cc: Guo Ren <ren_guo@c-sky.com> [c-sky] Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Juergen Gross <jgross@suse.com> [Xen] Cc: Mark Salter <msalter@redhat.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Paul Burton <paul.burton@mips.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Rob Herring <robh+dt@kernel.org> Cc: Rob Herring <robh@kernel.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-12memblock: drop __memblock_alloc_base()Mike Rapoport
The __memblock_alloc_base() function tries to allocate a memory up to the limit specified by its max_addr parameter. Depending on the value of this parameter, the __memblock_alloc_base() can is replaced with the appropriate memblock_phys_alloc*() variant. Link: http://lkml.kernel.org/r/1548057848-15136-9-git-send-email-rppt@linux.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Rob Herring <robh@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Christoph Hellwig <hch@lst.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Dennis Zhou <dennis@kernel.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Guo Ren <guoren@kernel.org> Cc: Guo Ren <ren_guo@c-sky.com> [c-sky] Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Juergen Gross <jgross@suse.com> [Xen] Cc: Mark Salter <msalter@redhat.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Paul Burton <paul.burton@mips.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Rob Herring <robh+dt@kernel.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-12memblock: memblock_phys_alloc(): don't panicMike Rapoport
Make the memblock_phys_alloc() function an inline wrapper for memblock_phys_alloc_range() and update the memblock_phys_alloc() callers to check the returned value and panic in case of error. Link: http://lkml.kernel.org/r/1548057848-15136-8-git-send-email-rppt@linux.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Christoph Hellwig <hch@lst.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Dennis Zhou <dennis@kernel.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Guo Ren <guoren@kernel.org> Cc: Guo Ren <ren_guo@c-sky.com> [c-sky] Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Juergen Gross <jgross@suse.com> [Xen] Cc: Mark Salter <msalter@redhat.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Paul Burton <paul.burton@mips.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Rob Herring <robh+dt@kernel.org> Cc: Rob Herring <robh@kernel.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-