summaryrefslogtreecommitdiffstats
path: root/mm/hugetlb.c
AgeCommit message (Collapse)Author
2017-06-02mm/hugetlb: report -EHWPOISON not -EFAULT when FOLL_HWPOISON is specifiedJames Morse
KVM uses get_user_pages() to resolve its stage2 faults. KVM sets the FOLL_HWPOISON flag causing faultin_page() to return -EHWPOISON when it finds a VM_FAULT_HWPOISON. KVM handles these hwpoison pages as a special case. (check_user_page_hwpoison()) When huge pages are involved, this doesn't work so well. get_user_pages() calls follow_hugetlb_page(), which stops early if it receives VM_FAULT_HWPOISON from hugetlb_fault(), eventually returning -EFAULT to the caller. The step to map this to -EHWPOISON based on the FOLL_ flags is missing. The hwpoison special case is skipped, and -EFAULT is returned to user-space, causing Qemu or kvmtool to exit. Instead, move this VM_FAULT_ to errno mapping code into a header file and use it from faultin_page() and follow_hugetlb_page(). With this, KVM works as expected. This isn't a problem for arm64 today as we haven't enabled MEMORY_FAILURE, but I can't see any reason this doesn't happen on x86 too, so I think this should be a fix. This doesn't apply earlier than stable's v4.11.1 due to all sorts of cleanup. [james.morse@arm.com: add vm_fault_to_errno() call to faultin_page()] suggested. Link: http://lkml.kernel.org/r/20170525171035.16359-1-james.morse@arm.com [akpm@linux-foundation.org: coding-style fixes] Link: http://lkml.kernel.org/r/20170524160900.28786-1-james.morse@arm.com Signed-off-by: James Morse <james.morse@arm.com> Acked-by: Punit Agrawal <punit.agrawal@arm.com> Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Cc: <stable@vger.kernel.org> [4.11.1+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-31mm/hugetlb.c: don't call region_abort if region_chg failsMike Kravetz
Changes to hugetlbfs reservation maps is a two step process. The first step is a call to region_chg to determine what needs to be changed, and prepare that change. This should be followed by a call to call to region_add to commit the change, or region_abort to abort the change. The error path in hugetlb_reserve_pages called region_abort after a failed call to region_chg. As a result, the adds_in_progress counter in the reservation map is off by 1. This is caught by a VM_BUG_ON in resv_map_release when the reservation map is freed. syzkaller fuzzer (when using an injected kmalloc failure) found this bug, that resulted in the following: kernel BUG at mm/hugetlb.c:742! Call Trace: hugetlbfs_evict_inode+0x7b/0xa0 fs/hugetlbfs/inode.c:493 evict+0x481/0x920 fs/inode.c:553 iput_final fs/inode.c:1515 [inline] iput+0x62b/0xa20 fs/inode.c:1542 hugetlb_file_setup+0x593/0x9f0 fs/hugetlbfs/inode.c:1306 newseg+0x422/0xd30 ipc/shm.c:575 ipcget_new ipc/util.c:285 [inline] ipcget+0x21e/0x580 ipc/util.c:639 SYSC_shmget ipc/shm.c:673 [inline] SyS_shmget+0x158/0x230 ipc/shm.c:657 entry_SYSCALL_64_fastpath+0x1f/0xc2 RIP: resv_map_release+0x265/0x330 mm/hugetlb.c:742 Link: http://lkml.kernel.org/r/1490821682-23228-1-git-send-email-mike.kravetz@oracle.com Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-31mm, hugetlb: use pte_present() instead of pmd_present() in follow_huge_pmd()Naoya Horiguchi
I found the race condition which triggers the following bug when move_pages() and soft offline are called on a single hugetlb page concurrently. Soft offlining page 0x119400 at 0x700000000000 BUG: unable to handle kernel paging request at ffffea0011943820 IP: follow_huge_pmd+0x143/0x190 PGD 7ffd2067 PUD 7ffd1067 PMD 0 [61163.582052] Oops: 0000 [#1] SMP Modules linked in: binfmt_misc ppdev virtio_balloon parport_pc pcspkr i2c_piix4 parport i2c_core acpi_cpufreq ip_tables xfs libcrc32c ata_generic pata_acpi virtio_blk 8139too crc32c_intel ata_piix serio_raw libata virtio_pci 8139cp virtio_ring virtio mii floppy dm_mirror dm_region_hash dm_log dm_mod [last unloaded: cap_check] CPU: 0 PID: 22573 Comm: iterate_numa_mo Tainted: P OE 4.11.0-rc2-mm1+ #2 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:follow_huge_pmd+0x143/0x190 RSP: 0018:ffffc90004bdbcd0 EFLAGS: 00010202 RAX: 0000000465003e80 RBX: ffffea0004e34d30 RCX: 00003ffffffff000 RDX: 0000000011943800 RSI: 0000000000080001 RDI: 0000000465003e80 RBP: ffffc90004bdbd18 R08: 0000000000000000 R09: ffff880138d34000 R10: ffffea0004650000 R11: 0000000000c363b0 R12: ffffea0011943800 R13: ffff8801b8d34000 R14: ffffea0000000000 R15: 000077ff80000000 FS: 00007fc977710740(0000) GS:ffff88007dc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffea0011943820 CR3: 000000007a746000 CR4: 00000000001406f0 Call Trace: follow_page_mask+0x270/0x550 SYSC_move_pages+0x4ea/0x8f0 SyS_move_pages+0xe/0x10 do_syscall_64+0x67/0x180 entry_SYSCALL64_slow_path+0x25/0x25 RIP: 0033:0x7fc976e03949 RSP: 002b:00007ffe72221d88 EFLAGS: 00000246 ORIG_RAX: 0000000000000117 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc976e03949 RDX: 0000000000c22390 RSI: 0000000000001400 RDI: 0000000000005827 RBP: 00007ffe72221e00 R08: 0000000000c2c3a0 R09: 0000000000000004 R10: 0000000000c363b0 R11: 0000000000000246 R12: 0000000000400650 R13: 00007ffe72221ee0 R14: 0000000000000000 R15: 0000000000000000 Code: 81 e4 ff ff 1f 00 48 21 c2 49 c1 ec 0c 48 c1 ea 0c 4c 01 e2 49 bc 00 00 00 00 00 ea ff ff 48 c1 e2 06 49 01 d4 f6 45 bc 04 74 90 <49> 8b 7c 24 20 40 f6 c7 01 75 2b 4c 89 e7 8b 47 1c 85 c0 7e 2a RIP: follow_huge_pmd+0x143/0x190 RSP: ffffc90004bdbcd0 CR2: ffffea0011943820 ---[ end trace e4f81353a2d23232 ]--- Kernel panic - not syncing: Fatal exception Kernel Offset: disabled This bug is triggered when pmd_present() returns true for non-present hugetlb, so fixing the present check in follow_huge_pmd() prevents it. Using pmd_present() to determine present/non-present for hugetlb is not correct, because pmd_present() checks multiple bits (not only _PAGE_PRESENT) for historical reason and it can misjudge hugetlb state. Fixes: e66f17ff7177 ("mm/hugetlb: take page table lock in follow_huge_pmd()") Link: http://lkml.kernel.org/r/1490149898-20231-1-git-send-email-n-horiguchi@ah.jp.nec.com Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Hugh Dickins <hughd@google.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com> Cc: <stable@vger.kernel.org> [4.0+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-09mm: convert generic code to 5-level pagingKirill A. Shutemov
Convert all non-architecture-specific code to 5-level paging. It's mostly mechanical adding handling one more page table level in places where we deal with pud_t. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-02sched/headers: Prepare to move signal wakeup & sigpending methods from ↵Ingo Molnar
<linux/sched.h> into <linux/sched/signal.h> Fix up affected files that include this signal functionality via sched.h. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-24mm: alloc_contig_range: allow to specify GFP maskLucas Stach
Currently alloc_contig_range assumes that the compaction should be done with the default GFP_KERNEL flags. This is probably right for all current uses of this interface, but may change as CMA is used in more use-cases (including being the default DMA memory allocator on some platforms). Change the function prototype, to allow for passing through the GFP mask set by upper layers. Also respect global restrictions by applying memalloc_noio_flags to the passed in flags. Link: http://lkml.kernel.org/r/20170127172328.18574-1-l.stach@pengutronix.de Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Radim Krcmar <rkrcmar@redhat.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Chris Zankel <chris@zankel.net> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Alexander Graf <agraf@suse.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-24mm, fs: reduce fault, page_mkwrite, and pfn_mkwrite to take only vmfDave Jiang
->fault(), ->page_mkwrite(), and ->pfn_mkwrite() calls do not need to take a vma and vmf parameter when the vma already resides in vmf. Remove the vma parameter to simplify things. [arnd@arndb.de: fix ARM build] Link: http://lkml.kernel.org/r/20170125223558.1451224-1-arnd@arndb.de Link: http://lkml.kernel.org/r/148521301778.19116.10840599906674778980.stgit@djiang5-desk3.ch.intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@suse.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-22userfaultfd: hugetlbfs: add UFFDIO_COPY support for shared mappingsMike Kravetz
When userfaultfd hugetlbfs support was originally added, it followed the pattern of anon mappings and did not support any vmas marked VM_SHARED. As such, support was only added for private mappings. Remove this limitation and support shared mappings. The primary functional change required is adding pages to the page cache. More subtle changes are required for huge page reservation handling in error paths. A lengthy comment in the code describes the reservation handling. [mike.kravetz@oracle.com: update] Link: http://lkml.kernel.org/r/c9c8cafe-baa7-05b4-34ea-1dfa5523a85f@oracle.com Link: http://lkml.kernel.org/r/1487195210-12839-1-git-send-email-mike.kravetz@oracle.com Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-22userfaultfd: hugetlbfs: gup: support VM_FAULT_RETRYAndrea Arcangeli
Add support for VM_FAULT_RETRY to follow_hugetlb_page() so that get_user_pages_unlocked/locked and "nonblocking/FOLL_NOWAIT" features will work on hugetlbfs. This is required for fully functional userfaultfd non-present support on hugetlbfs. Link: http://lkml.kernel.org/r/20161216144821.5183-25-aarcange@redhat.com Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Michael Rapoport <RAPOPORT@il.ibm.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-22userfaultfd: hugetlbfs: add userfaultfd hugetlb hookMike Kravetz
When processing a hugetlb fault for no page present, check the vma to determine if faults are to be handled via userfaultfd. If so, drop the hugetlb_fault_mutex and call handle_userfault(). Link: http://lkml.kernel.org/r/20161216144821.5183-21-aarcange@redhat.com Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Cc: Michael Rapoport <RAPOPORT@il.ibm.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-22userfaultfd: hugetlbfs: fix __mcopy_atomic_hugetlb retry/error processingMike Kravetz
The new routine copy_huge_page_from_user() uses kmap_atomic() to map PAGE_SIZE pages. However, this prevents page faults in the subsequent call to copy_from_user(). This is OK in the case where the routine is copied with mmap_sema held. However, in another case we want to allow page faults. So, add a new argument allow_pagefault to indicate if the routine should allow page faults. [dan.carpenter@oracle.com: unmap the correct pointer] Link: http://lkml.kernel.org/r/20170113082608.GA3548@mwanda [akpm@linux-foundation.org: kunmap() takes a page*, per Hugh] Link: http://lkml.kernel.org/r/20161216144821.5183-20-aarcange@redhat.com Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Michael Rapoport <RAPOPORT@il.ibm.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Hugh Dickins <hughd@google.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-22userfaultfd: hugetlbfs: add hugetlb_mcopy_atomic_pte for userfaultfd supportMike Kravetz
hugetlb_mcopy_atomic_pte is the low level routine that implements the userfaultfd UFFDIO_COPY command. It is based on the existing mcopy_atomic_pte routine with modifications for huge pages. Link: http://lkml.kernel.org/r/20161216144821.5183-18-aarcange@redhat.com Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Michael Rapoport <RAPOPORT@il.ibm.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-10mm/hugetlb.c: fix reservation race when freeing surplus pagesMike Kravetz
return_unused_surplus_pages() decrements the global reservation count, and frees any unused surplus pages that were backing the reservation. Commit 7848a4bf51b3 ("mm/hugetlb.c: add cond_resched_lock() in return_unused_surplus_pages()") added a call to cond_resched_lock in the loop freeing the pages. As a result, the hugetlb_lock could be dropped, and someone else could use the pages that will be freed in subsequent iterations of the loop. This could result in inconsistent global hugetlb page state, application api failures (such as mmap) failures or application crashes. When dropping the lock in return_unused_surplus_pages, make sure that the global reservation count (resv_huge_pages) remains sufficiently large to prevent someone else from claiming pages about to be freed. Analyzed by Paul Cassella. Fixes: 7848a4bf51b3 ("mm/hugetlb.c: add cond_resched_lock() in return_unused_surplus_pages()") Link: http://lkml.kernel.org/r/1483991767-6879-1-git-send-email-mike.kravetz@oracle.com Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Reported-by: Paul Cassella <cassella@cray.com> Suggested-by: Michal Hocko <mhocko@kernel.org> Cc: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: <stable@vger.kernel.org> [3.15+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-12mm: add tlb_remove_check_page_size_change to track page size changeAneesh Kumar K.V
With commit e77b0852b551 ("mm/mmu_gather: track page size with mmu gather and force flush if page size change") we added the ability to force a tlb flush when the page size change in a mmu_gather loop. We did that by checking for a page size change every time we added a page to mmu_gather for lazy flush/remove. We can improve that by moving the page size change check early and not doing it every time we add a page. This also helps us to do tlb flush when invalidating a range covering dax mapping. Wrt dax mapping we don't have a backing struct page and hence we don't call tlb_remove_page, which earlier forced the tlb flush on page size change. Moving the page size change check earlier means we will do the same even for dax mapping. We also avoid doing this check on architecture other than powerpc. In a later patch we will remove page size check from tlb_remove_page(). Link: http://lkml.kernel.org/r/20161026084839.27299-5-aneesh.kumar@linux.vnet.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-12mm/hugetlb: add tlb_remove_hugetlb_entry for handling hugetlb pagesAneesh Kumar K.V
This add tlb_remove_hugetlb_entry similar to tlb_remove_pmd_tlb_entry. Link: http://lkml.kernel.org/r/20161026084839.27299-4-aneesh.kumar@linux.vnet.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-12mm/hugetlb.c: use huge_pte_lock instead of opencoding the lockAneesh Kumar K.V
No functional change by this patch. Link: http://lkml.kernel.org/r/20161018090234.22574-1-aneesh.kumar@linux.vnet.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-12mm/hugetlb.c: use the right pte val for compare in hugetlb_cowAneesh Kumar K.V
We cannot use the pte value used in set_pte_at for pte_same comparison, because archs like ppc64, filter/add new pte flag in set_pte_at. Instead fetch the pte value inside hugetlb_cow. We are comparing pte value to make sure the pte didn't change since we dropped the page table lock. hugetlb_cow get called with page table lock held, and we can take a copy of the pte value before we drop the page table lock. With hugetlbfs, we optimize the MAP_PRIVATE write fault path with no previous mapping (huge_pte_none entries), by forcing a cow in the fault path. This avoid take an addition fault to covert a read-only mapping to read/write. Here we were comparing a recently instantiated pte (via set_pte_at) to the pte values from linux page table. As explained above on ppc64 such pte_same check returned wrong result, resulting in us taking an additional fault on ppc64. Fixes: 6a119eae942c ("powerpc/mm: Add a _PAGE_PTE bit") Link: http://lkml.kernel.org/r/20161018154245.18023-1-aneesh.kumar@linux.vnet.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Reported-by: Jan Stancek <jstancek@redhat.com> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Scott Wood <scottwood@freescale.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-11-11mm/hugetlb: fix huge page reservation leak in private mapping error pathsMike Kravetz
Error paths in hugetlb_cow() and hugetlb_no_page() may free a newly allocated huge page. If a reservation was associated with the huge page, alloc_huge_page() consumed the reservation while allocating. When the newly allocated page is freed in free_huge_page(), it will increment the global reservation count. However, the reservation entry in the reserve map will remain. This is not an issue for shared mappings as the entry in the reserve map indicates a reservation exists. But, an entry in a private mapping reserve map indicates the reservation was consumed and no longer exists. This results in an inconsistency between the reserve map and the global reservation count. This 'leaks' a reserved huge page. Create a new routine restore_reserve_on_error() to restore the reserve entry in these specific error paths. This routine makes use of a new function vma_add_reservation() which will add a reserve entry for a specific address/page. In general, these error paths were rarely (if ever) taken on most architectures. However, powerpc contained arch specific code that that resulted in an extra fault and execution of these error paths on all private mappings. Fixes: 67961f9db8c4 ("mm/hugetlb: fix huge page reserve accounting for private mappings) Link: http://lkml.kernel.org/r/1476933077-23091-2-git-send-email-mike.kravetz@oracle.com Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Reported-by: Jan Stancek <jstancek@redhat.com> Tested-by: Jan Stancek <jstancek@redhat.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Kirill A . Shutemov <kirill.shutemov@linux.intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-07mm: remove unnecessary condition in remove_inode_hugepageszhong jiang
When the huge page is added to the page cahce (huge_add_to_page_cache), the page private flag will be cleared. since this code (remove_inode_hugepages) will only be called for pages in the page cahce, PagePrivate(page) will always be false. The patch remove the code without any functional change. Link: http://lkml.kernel.org/r/1475113323-29368-1-git-send-email-zhongjiang@huawei.com Signed-off-by: zhong jiang <zhongjiang@huawei.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Tested-by: Mike Kravetz <mike.kravetz@oracle.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-07mm/hugetlb: introduce ARCH_HAS_GIGANTIC_PAGEYisheng Xie
Avoid making ifdef get pretty unwieldy if many ARCHs support gigantic page. No functional change with this patch. Link: http://lkml.kernel.org/r/1475227569-63446-2-git-send-email-xieyisheng1@huawei.com Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com> Suggested-by: Michal Hocko <mhocko@suse.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Sudeep Holla <sudeep.holla@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Rob Herring <robh+dt@kernel.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-07mm/hugetlb: improve locking in dissolve_free_huge_pages()Gerald Schaefer
For every pfn aligned to minimum_order, dissolve_free_huge_pages() will call dissolve_free_huge_page() which takes the hugetlb spinlock, even if the page is not huge at all or a hugepage that is in-use. Improve this by doing the PageHuge() and page_count() checks already in dissolve_free_huge_pages() before calling dissolve_free_huge_page(). In dissolve_free_huge_page(), when holding the spinlock, those checks need to be revalidated. Link: http://lkml.kernel.org/r/20160926172811.94033-4-gerald.schaefer@de.ibm.com Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Rui Teng <rui.teng@linux.vnet.ibm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-07mm/hugetlb: check for reserved hugepages during memory offlineGerald Schaefer
In dissolve_free_huge_pages(), free hugepages will be dissolved without making sure that there are enough of them left to satisfy hugepage reservations. Fix this by adding a return value to dissolve_free_huge_pages() and checking h->free_huge_pages vs. h->resv_huge_pages. Note that this may lead to the situation where dissolve_free_huge_page() returns an error and all free hugepages that were dissolved before that error are lost, while the memory block still cannot be set offline. Fixes: c8721bbb ("mm: memory-hotplug: enable memory hotplug to handle hugepage") Link: http://lkml.kernel.org/r/20160926172811.94033-3-gerald.schaefer@de.ibm.com Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Rui Teng <rui.teng@linux.vnet.ibm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-07mm/hugetlb: fix memory offline with hugepage size > memory block sizeGerald Schaefer
Patch series "mm/hugetlb: memory offline issues with hugepages", v4. This addresses several issues with hugepages and memory offline. While the first patch fixes a panic, and is therefore rather important, the last patch is just a performance optimization. The second patch fixes a theoretical issue with reserved hugepages, while still leaving some ugly usability issue, see description. This patch (of 3): dissolve_free_huge_pages() will either run into the VM_BUG_ON() or a list corruption and addressing exception when trying to set a memory block offline that is part (but not the first part) of a "gigantic" hugetlb page with a size > memory block size. When no other smaller hugetlb page sizes are present, the VM_BUG_ON() will trigger directly. In the other case we will run into an addressing exception later, because dissolve_free_huge_page() will not work on the head page of the compound hugetlb page which will result in a NULL hstate from page_hstate(). To fix this, first remove the VM_BUG_ON() because it is wrong, and then use the compound head page in dissolve_free_huge_page(). This means that an unused pre-allocated gigantic page that has any part of itself inside the memory block that is going offline will be dissolved completely. Losing an unused gigantic hugepage is preferable to failing the memory offline, for example in the situation where a (possibly faulty) memory DIMM needs to go offline. Fixes: c8721bbb ("mm: memory-hotplug: enable memory hotplug to handle hugepage") Link: http://lkml.kernel.org/r/20160926172811.94033-2-gerald.schaefer@de.ibm.com Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Rui Teng <rui.teng@linux.vnet.ibm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-11mm/hugetlb: fix incorrect hugepages count during mem hotplugzhong jiang
When memory hotplug operates, free hugepages will be freed if the movable node is offline. Therefore, /proc/sys/vm/nr_hugepages will be incorrect. Fix it by reducing max_huge_pages when the node is offlined. n-horiguchi@ah.jp.nec.com said: : dissolve_free_huge_page intends to break a hugepage into buddy, and the : destination hugepage is supposed to be allocated from the pool of the : destination node, so the system-wide pool size is reduced. So adding : h->max_huge_pages-- makes sense to me. Link: http://lkml.kernel.org/r/1470624546-902-1-git-send-email-zhongjiang@huawei.com Signed-off-by: zhong jiang <zhongjiang@huawei.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-05Merge tag 'powerpc-4.8-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull more powerpc updates from Michael Ellerman: "These were delayed for various reasons, so I let them sit in next a bit longer, rather than including them in my first pull request. Fixes: - Fix early access to cpu_spec relocation from Benjamin Herrenschmidt - Fix incorrect event codes in power9-event-list from Madhavan Srinivasan - Move register_process_table() out of ppc_md from Michael Ellerman Use jump_label use for [cpu|mmu]_has_feature(): - Add mmu_early_init_devtree() from Michael Ellerman - Move disable_radix handling into mmu_early_init_devtree() from Michael Ellerman - Do hash device tree scanning earlier from Michael Ellerman - Do radix device tree scanning earlier from Michael Ellerman - Do feature patching before MMU init from Michael Ellerman - Check features don't change after patching from Michael Ellerman - Make MMU_FTR_RADIX a MMU family feature from Aneesh Kumar K.V - Convert mmu_has_feature() to returning bool from Michael Ellerman - Convert cpu_has_feature() to returning bool from Michael Ellerman - Define radix_enabled() in one place & use static inline from Michael Ellerman - Add early_[cpu|mmu]_has_feature() from Michael Ellerman - Convert early cpu/mmu feature check to use the new helpers from Aneesh Kumar K.V - jump_label: Make it possible for arches to invoke jump_label_init() earlier from Kevin Hao - Call jump_label_init() in apply_feature_fixups() from Aneesh Kumar K.V - Remove mfvtb() from Kevin Hao - Move cpu_has_feature() to a separate file from Kevin Hao - Add kconfig option to use jump labels for cpu/mmu_has_feature() from Michael Ellerman - Add option to use jump label for cpu_has_feature() from Kevin Hao - Add option to use jump label for mmu_has_feature() from Kevin Hao - Catch usage of cpu/mmu_has_feature() before jump label init from Aneesh Kumar K.V - Annotate jump label assembly from Michael Ellerman TLB flush enhancements from Aneesh Kumar K.V: - radix: Implement tlb mmu gather flush efficiently - Add helper for finding SLBE LLP encoding - Use hugetlb flush functions - Drop multiple definition of mm_is_core_local - radix: Add tlb flush of THP ptes - radix: Rename function and drop unused arg - radix/hugetlb: Add helper for finding page size - hugetlb: Add flush_hugetlb_tlb_range - remove flush_tlb_page_nohash Add new ptrace regsets from Anshuman Khandual and Simon Guo: - elf: Add powerpc specific core note sections - Add the function flush_tmregs_to_thread - Enable in transaction NT_PRFPREG ptrace requests - Enable in transaction NT_PPC_VMX ptrace requests - Enable in transaction NT_PPC_VSX ptrace requests - Adapt gpr32_get, gpr32_set functions for transaction - Enable support for NT_PPC_CGPR - Enable support for NT_PPC_CFPR - Enable support for NT_PPC_CVMX - Enable support for NT_PPC_CVSX - Enable support for TM SPR state - Enable NT_PPC_TM_CTAR, NT_PPC_TM_CPPR, NT_PPC_TM_CDSCR - Enable support for NT_PPPC_TAR, NT_PPC_PPR, NT_PPC_DSCR - Enable support for EBB registers - Enable support for Performance Monitor registers" * tag 'powerpc-4.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (48 commits) powerpc/mm: Move register_process_table() out of ppc_md powerpc/perf: Fix incorrect event codes in power9-event-list powerpc/32: Fix early access to cpu_spec relocation powerpc/ptrace: Enable support for Performance Monitor registers powerpc/ptrace: Enable support for EBB registers powerpc/ptrace: Enable support for NT_PPPC_TAR, NT_PPC_PPR, NT_PPC_DSCR powerpc/ptrace: Enable NT_PPC_TM_CTAR, NT_PPC_TM_CPPR, NT_PPC_TM_CDSCR powerpc/ptrace: Enable support for TM SPR state powerpc/ptrace: Enable support for NT_PPC_CVSX powerpc/ptrace: Enable support for NT_PPC_CVMX powerpc/ptrace: Enable support for NT_PPC_CFPR powerpc/ptrace: Enable support for NT_PPC_CGPR powerpc/ptrace: Adapt gpr32_get, gpr32_set functions for transaction powerpc/ptrace: Enable in transaction NT_PPC_VSX ptrace requests powerpc/ptrace: Enable in transaction NT_PPC_VMX ptrace requests powerpc/ptrace: Enable in transaction NT_PRFPREG ptrace requests powerpc/process: Add the function flush_tmregs_to_thread elf: Add powerpc specific core note sections powerpc/mm: remove flush_tlb_page_nohash powerpc/mm/hugetlb: Add flush_hugetlb_tlb_range ...
2016-08-02mm, hugetlb: fix huge_pte_alloc BUG_ONMichal Hocko
Zhong Jiang has reported a BUG_ON from huge_pte_alloc hitting when he runs his database load with memory online and offline running in parallel. The reason is that huge_pmd_share might detect a shared pmd which is currently migrated and so it has migration pte which is !pte_huge. There doesn't seem to be any easy way to prevent from the race and in fact seeing the migration swap entry is not harmful. Both callers of huge_pte_alloc are prepared to handle them. copy_hugetlb_page_range will copy the swap entry and make it COW if needed. hugetlb_fault will back off and so the page fault is retries if the page is still under migration and waits for its completion in hugetlb_fault. That means that the BUG_ON is wrong and we should update it. Let's simply check that all present ptes are pte_huge instead. Link: http://lkml.kernel.org/r/20160721074340.GA26398@dhcp22.suse.cz Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: zhongjiang <zhongjiang@huawei.com> Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02mm/hugetlb: avoid soft lockup in set_max_huge_pages()Jia He
In powerpc servers with large memory(32TB), we watched several soft lockups for hugepage under stress tests. The call traces are as follows: 1. get_page_from_freelist+0x2d8/0xd50 __alloc_pages_nodemask+0x180/0xc20 alloc_fresh_huge_page+0xb0/0x190 set_max_huge_pages+0x164/0x3b0 2. prep_new_huge_page+0x5c/0x100 alloc_fresh_huge_page+0xc8/0x190 set_max_huge_pages+0x164/0x3b0 This patch fixes such soft lockups. It is safe to call cond_resched() there because it is out of spin_lock/unlock section. Link: http://lkml.kernel.org/r/1469674442-14848-1-git-send-email-hejianet@gmail.com Signed-off-by: Jia He <hejianet@gmail.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-01powerpc/mm/hugetlb: Add flush_hugetlb_tlb_rangeAneesh Kumar K.V
Some archs like ppc64 need to do special things when flushing tlb for hugepage. Add a new helper to flush hugetlb tlb range. This helps us to avoid flushing the entire tlb mapping for the pid. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-28Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge more updates from Andrew Morton: "The rest of MM" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (101 commits) mm, compaction: simplify contended compaction handling mm, compaction: introduce direct compaction priority mm, thp: remove __GFP_NORETRY from khugepaged and madvised allocations mm, page_alloc: make THP-specific decisions more generic mm, page_alloc: restructure direct compaction handling in slowpath mm, page_alloc: don't retry initial attempt in slowpath mm, page_alloc: set alloc_flags only once in slowpath lib/stackdepot.c: use __GFP_NOWARN for stack allocations mm, kasan: switch SLUB to stackdepot, enable memory quarantine for SLUB mm, kasan: account for object redzone in SLUB's nearest_obj() mm: fix use-after-free if memory allocation failed in vma_adjust() zsmalloc: Delete an unnecessary check before the function call "iput" mm/memblock.c: fix index adjustment error in __next_mem_range_rev() mem-hotplug: alloc new page from a nearest neighbor node when mem-offline mm: optimize copy_page_to/from_iter_iovec mm: add cond_resched() to generic_swapfile_activate() Revert "mm, mempool: only set __GFP_NOMEMALLOC if there are free elements" mm, compaction: don't isolate PageWriteback pages in MIGRATE_SYNC_LIGHT mode mm: hwpoison: remove incorrect comments make __section_nr() more efficient ...
2016-07-28mm: hwpoison: remove incorrect commentsNaoya Horiguchi
dequeue_hwpoisoned_huge_page() can be called without page lock hold, so let's remove incorrect comment. The reason why the page lock is not really needed is that dequeue_hwpoisoned_huge_page() checks page_huge_active() inside hugetlb_lock, which allows us to avoid trying to dequeue a hugepage that are just allocated but not linked to active list yet, even without taking page lock. Link: http://lkml.kernel.org/r/20160720092901.GA15995@www9186uo.sakura.ne.jp Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reported-by: Zhan Chen <zhanc1@andrew.cmu.edu> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-28Merge branch 'work.misc' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs updates from Al Viro: "Assorted cleanups and fixes. Probably the most interesting part long-term is ->d_init() - that will have a bunch of followups in (at least) ceph and lustre, but we'll need to sort the barrier-related rules before it can get used for really non-trivial stuff. Another fun thing is the merge of ->d_iput() callers (dentry_iput() and dentry_unlink_inode()) and a bunch of ->d_compare() ones (all except the one in __d_lookup_lru())" * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (26 commits) fs/dcache.c: avoid soft-lockup in dput() vfs: new d_init method vfs: Update lookup_dcache() comment bdev: get rid of ->bd_inodes Remove last traces of ->sync_page new helper: d_same_name() dentry_cmp(): use lockless_dereference() instead of smp_read_barrier_depends() vfs: clean up documentation vfs: document ->d_real() vfs: merge .d_select_inode() into .d_real() unify dentry_iput() and dentry_unlink_inode() binfmt_misc: ->s_root is not going anywhere drop redundant ->owner initializations ufs: get rid of redundant checks orangefs: constify inode_operations missed comment updates from ->direct_IO() prototype change file_inode(f)->i_mapping is f->f_mapping trim fsnotify hooks a bit 9p: new helper - v9fs_parent_fid() debugfs: ->d_parent is never NULL or negative ...
2016-07-26Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge updates from Andrew Morton: - a few misc bits - ocfs2 - most(?) of MM * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (125 commits) thp: fix comments of __pmd_trans_huge_lock() cgroup: remove unnecessary 0 check from css_from_id() cgroup: fix idr leak for the first cgroup root mm: memcontrol: fix documentation for compound parameter mm: memcontrol: remove BUG_ON in uncharge_list mm: fix build warnings in <linux/compaction.h> mm, thp: convert from optimistic swapin collapsing to conservative mm, thp: fix comment inconsistency for swapin readahead functions thp: update Documentation/{vm/transhuge,filesystems/proc}.txt shmem: split huge pages beyond i_size under memory pressure thp: introduce CONFIG_TRANSPARENT_HUGE_PAGECACHE khugepaged: add support of collapse for tmpfs/shmem pages shmem: make shmem_inode_info::lock irq-safe khugepaged: move up_read(mmap_sem) out of khugepaged_alloc_page() thp: extract khugepaged from mm/huge_memory.c shmem, thp: respect MADV_{NO,}HUGEPAGE for file mappings shmem: add huge pages support shmem: get_unmapped_area align huge page shmem: prepare huge= mount option and sysfs knob mm, rmap: account shmem thp pages ...
2016-07-26mm/mmu_gather: track page size with mmu gather and force flush if page size ↵Aneesh Kumar K.V
change This allows an arch which needs to do special handing with respect to different page size when flushing tlb to implement the same in mmu gather. Link: http://lkml.kernel.org/r/1465049193-22197-3-git-send-email-aneesh.kumar@linux.vnet.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Hugh Dickins <hughd@google.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Mel Gorman <mgorman@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Minchan Kim <minchan.kim@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-26mm/hugetlb: simplify hugetlb unmapAneesh Kumar K.V
For hugetlb like THP (and unlike regular page), we do tlb flush after dropping ptl. Because of the above, we don't need to track force_flush like we do now. Instead we can simply call tlb_remove_page() which will do the flush if needed. No functionality change in this patch. Link: http://lkml.kernel.org/r/1465049193-22197-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-26Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Martin Schwidefsky: "There are a couple of new things for s390 with this merge request: - a new scheduling domain "drawer" is added to reflect the unusual topology found on z13 machines. Performance tests showed up to 8 percent gain with the additional domain. - the new crc-32 checksum crypto module uses the vector-galois-field multiply and sum SIMD instruction to speed up crc-32 and crc-32c. - proper __ro_after_init support, this requires RO_AFTER_INIT_DATA in the generic vmlinux.lds linker script definitions. - kcov instrumentation support. A prerequisite for that is the inline assembly basic block cleanup, which is the reason for the net/iucv/iucv.c change. - support for 2GB pages is added to the hugetlbfs backend. Then there are two removals: - the oprofile hardware sampling support is dead code and is removed. The oprofile user space uses the perf interface nowadays. - the ETR clock synchronization is removed, this has been superseeded be the STP clock synchronization. And it always has been "interesting" code.. And the usual bug fixes and cleanups" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (82 commits) s390/pci: Delete an unnecessary check before the function call "pci_dev_put" s390/smp: clean up a condition s390/cio/chp : Remove deprecated create_singlethread_workqueue s390/chsc: improve channel path descriptor determination s390/chsc: sanitize fmt check for chp_desc determination s390/cio: make fmt1 channel path descriptor optional s390/chsc: fix ioctl CHSC_INFO_CU command s390/cio/device_ops: fix kernel doc s390/cio: allow to reset channel measurement block s390/console: Make preferred console handling more consistent s390/mm: fix gmap tlb flush issues s390/mm: add support for 2GB hugepages s390: have unique symbol for __switch_to address s390/cpuinfo: show maximum thread id s390/ptrace: clarify bits in the per_struct s390: stack address vs thread_info s390: remove pointless load within __switch_to s390: enable kcov support s390/cpumf: use basic block for ecctr inline assembly s390/hypfs: use basic block for diag inline assembly ...
2016-07-15mm: thp: refix false positive BUG in page_move_anon_rmap()Hugh Dickins
The VM_BUG_ON_PAGE in page_move_anon_rmap() is more trouble than it's worth: the syzkaller fuzzer hit it again. It's still wrong for some THP cases, because linear_page_index() was never intended to apply to addresses before the start of a vma. That's easily fixed with a signed long cast inside linear_page_index(); and Dmitry has tested such a patch, to verify the false positive. But why extend linear_page_index() just for this case? when the avoidance in page_move_anon_rmap() has already grown ugly, and there's no reason for the check at all (nothing else there is using address or index). Remove address arg from page_move_anon_rmap(), remove VM_BUG_ON_PAGE, remove CONFIG_DEBUG_VM PageTransHuge adjustment. And one more thing: should the compound_head(page) be done inside or outside page_move_anon_rmap()? It's usually pushed down to the lowest level nowadays (and mm/memory.c shows no other explicit use of it), so I think it's better done in page_move_anon_rmap() than by caller. Fixes: 0798d3c022dc ("mm: thp: avoid false positive VM_BUG_ON_PAGE in page_move_anon_rmap()") Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1607120444540.12528@eggly.anvils Signed-off-by: Hugh Dickins <hughd@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: <stable@vger.kernel.org> [4.5+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-06s390/mm: add support for 2GB hugepagesGerald Schaefer
This adds support for 2GB hugetlbfs pages on s390. Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-30Merge branch 'd_real' of ↵Al Viro
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs into work.misc
2016-06-24mm/hugetlb: clear compound_mapcount when freeing gigantic pagesGerald Schaefer
While working on s390 support for gigantic hugepages I ra