summaryrefslogtreecommitdiffstats
path: root/mm/mmap.c
AgeCommit message (Collapse)Author
2020-06-09mmap locking API: convert mmap_sem commentsMichel Lespinasse
Convert comments that reference mmap_sem to reference mmap_lock instead. [akpm@linux-foundation.org: fix up linux-next leftovers] [akpm@linux-foundation.org: s/lockaphore/lock/, per Vlastimil] [akpm@linux-foundation.org: more linux-next fixups, per Michel] Signed-off-by: Michel Lespinasse <walken@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ying Han <yinghan@google.com> Link: http://lkml.kernel.org/r/20200520052908.204642-13-walken@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09mmap locking API: convert mmap_sem API commentsMichel Lespinasse
Convert comments that reference old mmap_sem APIs to reference corresponding new mmap locking APIs instead. Signed-off-by: Michel Lespinasse <walken@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Davidlohr Bueso <dbueso@suse.de> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ying Han <yinghan@google.com> Link: http://lkml.kernel.org/r/20200520052908.204642-12-walken@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09mmap locking API: rename mmap_sem to mmap_lockMichel Lespinasse
Rename the mmap_sem field to mmap_lock. Any new uses of this lock should now go through the new mmap locking api. The mmap_lock is still implemented as a rwsem, though this could change in the future. [akpm@linux-foundation.org: fix it for mm-gup-might_lock_readmmap_sem-in-get_user_pages_fast.patch] Signed-off-by: Michel Lespinasse <walken@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Davidlohr Bueso <dbueso@suse.de> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ying Han <yinghan@google.com> Link: http://lkml.kernel.org/r/20200520052908.204642-11-walken@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09mmap locking API: use coccinelle to convert mmap_sem rwsem call sitesMichel Lespinasse
This change converts the existing mmap_sem rwsem calls to use the new mmap locking API instead. The change is generated using coccinelle with the following rule: // spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir . @@ expression mm; @@ ( -init_rwsem +mmap_init_lock | -down_write +mmap_write_lock | -down_write_killable +mmap_write_lock_killable | -down_write_trylock +mmap_write_trylock | -up_write +mmap_write_unlock | -downgrade_write +mmap_write_downgrade | -down_read +mmap_read_lock | -down_read_killable +mmap_read_lock_killable | -down_read_trylock +mmap_read_trylock | -up_read +mmap_read_unlock ) -(&mm->mmap_sem) +(mm) Signed-off-by: Michel Lespinasse <walken@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ying Han <yinghan@google.com> Link: http://lkml.kernel.org/r/20200520052908.204642-5-walken@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04mm: mmap: fix a typo in comment "compatbility"->"compatibility"Ethon Paul
There is a typo in comment, fix it. Signed-off-by: Ethon Paul <ethp@qq.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Ralph Campbell <rcampbell@nvidia.com> Link: http://lkml.kernel.org/r/20200410163206.14016-1-ethp@qq.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-10mm/vma: introduce VM_ACCESS_FLAGSAnshuman Khandual
There are many places where all basic VMA access flags (read, write, exec) are initialized or checked against as a group. One such example is during page fault. Existing vma_is_accessible() wrapper already creates the notion of VMA accessibility as a group access permissions. Hence lets just create VM_ACCESS_FLAGS (VM_READ|VM_WRITE|VM_EXEC) which will not only reduce code duplication but also extend the VMA accessibility concept in general. Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Russell King <linux@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Mark Salter <msalter@redhat.com> Cc: Nick Hu <nickhu@andestech.com> Cc: Ley Foon Tan <ley.foon.tan@intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Rob Springer <rspringer@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Link: http://lkml.kernel.org/r/1583391014-8170-3-git-send-email-anshuman.khandual@arm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-10mm/mmap.c: initialize align_offset explicitly for vm_unmapped_areaJaewon Kim
On passing requirement to vm_unmapped_area, arch_get_unmapped_area and arch_get_unmapped_area_topdown did not set align_offset. Internally on both unmapped_area and unmapped_area_topdown, if info->align_mask is 0, then info->align_offset was meaningless. But commit df529cabb7a2 ("mm: mmap: add trace point of vm_unmapped_area") always prints info->align_offset even though it is uninitialized. Fix this uninitialized value issue by setting it to 0 explicitly. Before: vm_unmapped_area: addr=0x755b155000 err=0 total_vm=0x15aaf0 flags=0x1 len=0x109000 lo=0x8000 hi=0x75eed48000 mask=0x0 ofs=0x4022 After: vm_unmapped_area: addr=0x74a4ca1000 err=0 total_vm=0x168ab1 flags=0x1 len=0x9000 lo=0x8000 hi=0x753d94b000 mask=0x0 ofs=0x0 Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michel Lespinasse <walken@google.com> Cc: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/20200409094035.19457-1-jaewon31.kim@samsung.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07mm: use fallthrough;Joe Perches
Convert the various /* fallthrough */ comments to the pseudo-keyword fallthrough; Done via script: https://lore.kernel.org/lkml/b56602fcf79f849e733e7b521bb0e17895d390fa.1582230379.git.joe@perches.com/ Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Link: http://lkml.kernel.org/r/f62fea5d10eb0ccfc05d87c242a620c261219b66.camel@perches.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07mm/vma: make vma_is_accessible() available for general useAnshuman Khandual
Lets move vma_is_accessible() helper to include/linux/mm.h which makes it available for general use. While here, this replaces all remaining open encodings for VMA access check with vma_is_accessible(). Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Guo Ren <guoren@kernel.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Guo Ren <guoren@kernel.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Burton <paulburton@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Nick Piggin <npiggin@gmail.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Will Deacon <will@kernel.org> Link: http://lkml.kernel.org/r/1582520593-30704-3-git-send-email-anshuman.khandual@arm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-02mm: mmap: add trace point of vm_unmapped_areaJaewon Kim
Even on 64 bit kernel, the mmap failure can happen for a 32 bit task. Virtual memory space shortage of a task on mmap is reported to userspace as -ENOMEM. It can be confused as physical memory shortage of overall system. The vm_unmapped_area can be called to by some drivers or other kernel core system like filesystem. In my platform, GPU driver calls to vm_unmapped_area and the driver returns -ENOMEM even in GPU side shortage. It can be hard to distinguish which code layer returns the -ENOMEM. Create mmap trace file and add trace point of vm_unmapped_area. i.e.) 277.156599: vm_unmapped_area: addr=77e0d03000 err=0 total_vm=0x17014b flags=0x1 len=0x400000 lo=0x8000 hi=0x7878c27000 mask=0x0 ofs=0x1 342.838740: vm_unmapped_area: addr=0 err=-12 total_vm=0xffb08 flags=0x0 len=0x100000 lo=0x40000000 hi=0xfffff000 mask=0x0 ofs=0x22 [akpm@linux-foundation.org: prefix address printk with 0x, per Matthew] Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@suse.de> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michel Lespinasse <walken@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Link: http://lkml.kernel.org/r/20200320055823.27089-3-jaewon31.kim@samsung.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-02mmap: remove inline of vm_unmapped_areaJaewon Kim
Patch series "mm: mmap: add mmap trace point", v3. Create mmap trace file and add trace point of vm_unmapped_area(). This patch (of 2): In preparation for next patch remove inline of vm_unmapped_area and move code to mmap.c. There is no logical change. Also remove unmapped_area[_topdown] out of mm.h, there is no code calling to them. Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michel Lespinasse <walken@google.com> Cc: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/20200320055823.27089-2-jaewon31.kim@samsung.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-20mm: Avoid creating virtual address aliases in brk()/mmap()/mremap()Catalin Marinas
Currently the arm64 kernel ignores the top address byte passed to brk(), mmap() and mremap(). When the user is not aware of the 56-bit address limit or relies on the kernel to return an error, untagging such pointers has the potential to create address aliases in user-space. Passing a tagged address to munmap(), madvise() is permitted since the tagged pointer is expected to be inside an existing mapping. The current behaviour breaks the existing glibc malloc() implementation which relies on brk() with an address beyond 56-bit to be rejected by the kernel. Remove untagging in the above functions by partially reverting commit ce18d171cb73 ("mm: untag user pointers in mmap/munmap/mremap/brk"). In addition, update the arm64 tagged-address-abi.rst document accordingly. Link: https://bugzilla.redhat.com/1797052 Fixes: ce18d171cb73 ("mm: untag user pointers in mmap/munmap/mremap/brk") Cc: <stable@vger.kernel.org> # 5.4.x- Cc: Florian Weimer <fweimer@redhat.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Reported-by: Victor Stinner <vstinner@redhat.com> Acked-by: Will Deacon <will@kernel.org> Acked-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
2020-01-31mm/mmap.c: get rid of odd jump labels in find_mergeable_anon_vma()Miaohe Lin
The jump labels try_prev and none are not really needed in find_mergeable_anon_vma(), eliminate them to improve readability. Link: http://lkml.kernel.org/r/1574079844-17493-1-git-send-email-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Wei Yang <richardw.yang@linux.intel.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-27Merge tag 'timers-core-2020-01-27' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "The timekeeping and timers departement provides: - Time namespace support: If a container migrates from one host to another then it expects that clocks based on MONOTONIC and BOOTTIME are not subject to disruption. Due to different boot time and non-suspended runtime these clocks can differ significantly on two hosts, in the worst case time goes backwards which is a violation of the POSIX requirements. The time namespace addresses this problem. It allows to set offsets for clock MONOTONIC and BOOTTIME once after creation and before tasks are associated with the namespace. These offsets are taken into account by timers and timekeeping including the VDSO. Offsets for wall clock based clocks (REALTIME/TAI) are not provided by this mechanism. While in theory possible, the overhead and code complexity would be immense and not justified by the esoteric potential use cases which were discussed at Plumbers '18. The overhead for tasks in the root namespace (ie where host time offsets = 0) is in the noise and great effort was made to ensure that especially in the VDSO. If time namespace is disabled in the kernel configuration the code is compiled out. Kudos to Andrei Vagin and Dmitry Sofanov who implemented this feature and kept on for more than a year addressing review comments, finding better solutions. A pleasant experience. - Overhaul of the alarmtimer device dependency handling to ensure that the init/suspend/resume ordering is correct. - A new clocksource/event driver for Microchip PIT64 - Suspend/resume support for the Hyper-V clocksource - The usual pile of fixes, updates and improvements mostly in the driver code" * tag 'timers-core-2020-01-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (71 commits) alarmtimer: Make alarmtimer_get_rtcdev() a stub when CONFIG_RTC_CLASS=n alarmtimer: Use wakeup source from alarmtimer platform device alarmtimer: Make alarmtimer platform device child of RTC device alarmtimer: Update alarmtimer_get_rtcdev() docs to reflect reality hrtimer: Add missing sparse annotation for __run_timer() lib/vdso: Only read hrtimer_res when needed in __cvdso_clock_getres() MIPS: vdso: Define BUILD_VDSO32 when building a 32bit kernel clocksource/drivers/hyper-v: Set TSC clocksource as default w/ InvariantTSC clocksource/drivers/hyper-v: Untangle stimers and timesync from clocksources clocksource/drivers/timer-microchip-pit64b: Fix sparse warning clocksource/drivers/exynos_mct: Rename Exynos to lowercase clocksource/drivers/timer-ti-dm: Fix uninitialized pointer access clocksource/drivers/timer-ti-dm: Switch to platform_get_irq clocksource/drivers/timer-ti-dm: Convert to devm_platform_ioremap_resource clocksource/drivers/em_sti: Fix variable declaration in em_sti_probe clocksource/drivers/em_sti: Convert to devm_platform_ioremap_resource clocksource/drivers/bcm2835_timer: Fix memory leak of timer clocksource/drivers/cadence-ttc: Use ttc driver as platform driver clocksource/drivers/timer-microchip-pit64b: Add Microchip PIT64B support clocksource/drivers/hyper-v: Reserve PAGE_SIZE space for tsc page ...
2020-01-14x86/vdso: Handle faults on timens pageDmitry Safonov
If a task belongs to a time namespace then the VVAR page which contains the system wide VDSO data is replaced with a namespace specific page which has the same layout as the VVAR page. Co-developed-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-25-dima@arista.com
2020-01-06arm64: Revert support for execute-only user mappingsCatalin Marinas
The ARMv8 64-bit architecture supports execute-only user permissions by clearing the PTE_USER and PTE_UXN bits, practically making it a mostly privileged mapping but from which user running at EL0 can still execute. The downside, however, is that the kernel at EL1 inadvertently reading such mapping would not trip over the PAN (privileged access never) protection. Revert the relevant bits from commit cab15ce604e5 ("arm64: Introduce execute-only page access permissions") so that PROT_EXEC implies PROT_READ (and therefore PTE_USER) until the architecture gains proper support for execute-only user mappings. Fixes: cab15ce604e5 ("arm64: Introduce execute-only page access permissions") Cc: <stable@vger.kernel.org> # 4.9.x- Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-12-01mm/mmap.c: make vma_merge() comment more easy to understandWei Yang
Case 1/6, 2/7 and 3/8 have the same pattern and we handle them in the same logic. Rearrange the comment to make it a little easy for audience to understand. Link: http://lkml.kernel.org/r/20191030012445.16944-1-richardw.yang@linux.intel.com Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Steve Capper <steve.capper@arm.com> Cc: Michel Lespinasse <walken@google.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Yangtao Li <tiny.windzz@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-12-01mm/mmap.c: use IS_ERR_VALUE to check return value of get_unmapped_areaGaowei Pu
get_unmapped_area() returns an address or -errno on failure. Historically we have checked for the failure by offset_in_page() which is correct but quite hard to read. Newer code started using IS_ERR_VALUE which is much easier to read. Convert remaining users of offset_in_page as well. [mhocko@suse.com: rewrite changelog] [mhocko@kernel.org: fix mremap.c and uprobes.c sites also] Link: http://lkml.kernel.org/r/20191012102512.28051-1-pugaowei@gmail.com Signed-off-by: Gaowei Pu <pugaowei@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Yang <richardw.yang@linux.intel.com> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Rik van Riel <riel@surriel.com> Cc: Qian Cai <cai@lca.pw> Cc: Shakeel Butt <shakeelb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-12-01mm/mmap.c: rb_parent is not necessary in __vma_link_list()Wei Yang
Now we use rb_parent to get next, while this is not necessary. When prev is NULL, this means vma should be the first element in the list. Then next should be current first one (mm->mmap), no matter whether we have parent or not. After removing it, the code shows the beauty of symmetry. Link: http://lkml.kernel.org/r/20190813032656.16625-1-richardw.yang@linux.intel.com Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Acked-by: Andrew Morton <akpm@linux-foundation.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-12-01mm/mmap.c: extract __vma_unlink_list() as counterpart for __vma_link_list()Wei Yang
Just make the code a little easier to read. Link: http://lkml.kernel.org/r/20191006012636.31521-3-richardw.yang@linux.intel.com Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Oscar Salvador <osalvador@suse.de> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-12-01mm/mmap.c: __vma_unlink_prev() is not necessary nowWei Yang
The third parameter of __vma_unlink_common() could differentiate these two types. __vma_unlink_prev() is not necessary now. Link: http://lkml.kernel.org/r/20191006012636.31521-2-richardw.yang@linux.intel.com Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Oscar Salvador <osalvador@suse.de> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-12-01mm/mmap.c: prev could be retrieved from vma->vm_prevWei Yang
Currently __vma_unlink_common handles two cases: * has_prev * or not When has_prev is false, it is obvious prev is calculated from vma->vm_prev in __vma_unlink_common. When has_prev is true, the prev is passed through from __vma_unlink_prev in __vma_adjust for non-case 8. And at the beginning next is calculated from vma->vm_next, which implies vma is next->vm_prev. The above statement sounds a little complicated, while to think in another point of view, no matter whether vma and next is swapped, the mmap link list still preserves its property. It is proper to access vma->vm_prev. Link: http://lkml.kernel.org/r/20191006012636.31521-1-richardw.yang@linux.intel.com Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Oscar Salvador <osalvador@suse.de> Cc: Christoph Hellwig <hch@infradead.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-12-01mm/mmap.c: remove a never-triggered warning in __vma_adjust()Wei Yang
The upper level of "if" makes sure (end >= next->vm_end), which means there are only two possibilities: 1) end == next->vm_end 2) end > next->vm_end remove_next is assigned to be (1 + end > next->vm_end). This means if remove_next is 1, end must equal to next->vm_end. The VM_WARN_ON will never trigger. Link: http://lkml.kernel.org/r/20190912063126.13250-1-richardw.yang@linux.intel.com Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-25mm: untag user pointers in mmap/munmap/mremap/brkCatalin Marinas
There isn't a good reason to differentiate between the user address space layout modification syscalls and the other memory permission/attributes ones (e.g. mprotect, madvise) w.r.t. the tagged address ABI. Untag the user addresses on entry to these functions. Link: http://lkml.kernel.org/r/20190821164730.47450-2-catalin.marinas@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Will Deacon <will@kernel.org> Acked-by: Andrey Konovalov <andreyknvl@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Szabolcs Nagy <szabolcs.nagy@arm.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Dave P Martin <Dave.Martin@arm.com> Cc: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-25augmented rbtree: add new RB_DECLARE_CALLBACKS_MAX macroMichel Lespinasse
Add RB_DECLARE_CALLBACKS_MAX, which generates augmented rbtree callbacks for the case where the augmented value is a scalar whose definition follows a max(f(node)) pattern. This actually covers all present uses of RB_DECLARE_CALLBACKS, and saves some (source) code duplication in the various RBCOMPUTE function definitions. [walken@google.com: fix mm/vmalloc.c] Link: http://lkml.kernel.org/r/CANN689FXgK13wDYNh1zKxdipeTuALG4eKvKpsdZqKFJ-rvtGiQ@mail.gmail.com [walken@google.com: re-add check to check_augmented()] Link: http://lkml.kernel.org/r/20190727022027.GA86863@google.com Link: http://lkml.kernel.org/r/20190703040156.56953-3-walken@google.com Signed-off-by: Michel Lespinasse <walken@google.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: David Howells <dhowells@redhat.com> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: Uladzislau Rezki <urezki@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-24mm: mmap: increase sockets maximum memory size pgoff for 32bitsIvan Khoronzhuk
The AF_XDP sockets umem mapping interface uses XDP_UMEM_PGOFF_FILL_RING and XDP_UMEM_PGOFF_COMPLETION_RING offsets. These offsets are established already and are part of the configuration interface. But for 32-bit systems, using AF_XDP socket configuration, these values are too large to pass the maximum allowed file size verification. The offsets can be tuned off, but instead of changing the existing interface, let's extend the max allowed file size for sockets. No one has been using this until this patch with 32 bits as without this fix af_xdp sockets can't be used at all, so it unblocks af_xdp socket usage for 32bit systems. All list of mmap cbs for sockets was verified for side effects and all of them contain dummy cb - sock_no_mmap() at this moment, except the following: xsk_mmap() - it's what this fix is needed for. tcp_mmap() - doesn't have obvious issues with pgoff - no any references on it. packet_mmap() - return -EINVAL if it's even set. Link: http://lkml.kernel.org/r/20190812124326.32146-1-ivan.khoronzhuk@linaro.org Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Björn Töpel <bjorn.topel@intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Magnus Karlsson <magnus.karlsson@intel.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-24mm/mmap.c: refine find_vma_prev() with rb_last()Wei Yang
When addr is out of range of the whole rb_tree, pprev will point to the right-most node. rb_tree facility already provides a helper function, rb_last(), to do this task. We can leverage this instead of reimplementing it. This patch refines find_vma_prev() with rb_last() to make it a little nicer to read. [akpm@linux-foundation.org: little cleanup, per Vlastimil] Link: http://lkml.kernel.org/r/20190809001928.4950-1-richardw.yang@linux.intel.com Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Michal Hocko <mhocko@suse.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-20vfs: don't allow writes to swap filesDarrick J. Wong
Don't let userspace write to an active swap file because the kernel effectively has a long term lease on the storage and things could get seriously corrupted if we let this happen. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2019-05-21treewide: Add SPDX license identifier for missed filesThomas Gleixner
Add SPDX license identifiers to all files which: - Have no license information of any form - Have EXPORT_.*_SYMBOL_GPL inside which was used in the initial scan/conversion to ignore the file These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-09x86/mpx, mm/core: Fix recursive munmap() corruptionDave Hansen
This is a bit of a mess, to put it mildly. But, it's a bug that only seems to have showed up in 4.20 but wasn't noticed until now, because nobody uses MPX. MPX has the arch_unmap() hook inside of munmap() because MPX uses bounds tables that protect other areas of memory. When memory is unmapped, there is also a need to unmap the MPX bounds tables. Barring this, unused bounds tables can eat 80% of the address space. But, the recursive do_munmap() that gets called vi arch_unmap() wreaks havoc with __do_munmap()'s state. It can result in freeing populated page tables, accessing bogus VMA state, double-freed VMAs and more. See the "long story" further below for the gory details. To fix this, call arch_unmap() before __do_unmap() has a chance to do anything meaningful. Also, remove the 'vma' argument and force the MPX code to do its own, independent VMA lookup. == UML / unicore32 impact == Remove unused 'vma' argument to arch_unmap(). No functional change. I compile tested this on UML but not unicore32. == powerpc impact == powerpc uses arch_unmap() well to watch for munmap() on the VDSO and zeroes out 'current->mm->context.vdso_base'. Moving arch_unmap() makes this happen earlier in __do_munmap(). But, 'vdso_base' seems to only be used in perf and in the signal delivery that happens near the return to userspace. I can not find any likely impact to powerpc, other than the zeroing happening a little earlier. powerpc does not use the 'vma' argument and is unaffected by its removal. I compile-tested a 64-bit powerpc defconfig. == x86 impact == For the common success case this is functionally identical to what was there before. For the munmap() failure case, it's possible that some MPX tables will be zapped for memory that continues to be in use. But, this is an extraordinarily unlikely scenario and the harm would be that MPX provides no protection since the bounds table got reset (zeroed). I can't imagine anyone doing this: ptr = mmap(); // use ptr ret = munmap(ptr); if (ret) // oh, there was an error, I'll // keep using ptr. Because if you're doing munmap(), you are *done* with the memory. There's probably no good data in there _anyway_. This passes the original reproducer from Richard Biener as well as the existing mpx selftests/. The long story: munmap() has a couple of pieces: 1. Find the affected VMA(s) 2. Split the start/end one(s) if neceesary 3. Pull the VMAs out of the rbtree 4. Actually zap the memory via unmap_region(), including freeing page tables (or queueing them to be freed). 5. Fix up some of the accounting (like fput()) and actually free the VMA itself. This specific ordering was actually introduced by: dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap") during the 4.20 merge window. The previous __do_munmap() code was actually safe because the only thing after arch_unmap() was remove_vma_list(). arch_unmap() could not see 'vma' in the rbtree because it was detached, so it is not even capable of doing operations unsafe for remove_vma_list()'s use of 'vma'. Richard Biener reported a test that shows this in dmesg: [1216548.787498] BUG: Bad rss-counter state mm:0000000017ce560b idx:1 val:551 [1216548.787500] BUG: non-zero pgtables_bytes on freeing mm: 24576 What triggered this was the recursive do_munmap() called via arch_unmap(). It was freeing page tables that has not been properly zapped. But, the problem was bigger than this. For one, arch_unmap() can free VMAs. But, the calling __do_munmap() has variables that *point* to VMAs and obviously can't handle them just getting freed while the pointer is still in use. I tried a couple of things here. First, I tried to fix the page table freeing problem in isolation, but I then found the VMA issue. I also tried having the MPX code return a flag if it modified the rbtree which would force __do_munmap() to re-walk to restart. That spiralled out of control in complexity pretty fast. Just moving arch_unmap() and accepting that the bonkers failure case might eat some bounds tables seems like the simplest viable fix. This was also reported in the following kernel bugzilla entry: https://bugzilla.kernel.org/show_bug.cgi?id=203123 There are some reports that this commit triggered this bug: dd2283f2605 ("mm: mmap: zap pages with read mmap_sem in munmap") While that commit certainly made the issues easier to hit, I believe the fundamental issue has been with us as long as MPX itself, thus the Fixes: tag below is for one of the original MPX commits. [ mingo: Minor edits to the changelog and the patch. ] Reported-by: Richard Biener <rguenther@suse.de> Reported-by: H.J. Lu <hjl.tools@gmail.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Richard Weinberger <richard@nod.at> Cc: Rik van Riel <riel@surriel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Cc: linux-um@lists.infradead.org Cc: linuxppc-dev@lists.ozlabs.org Cc: stable@vger.kernel.org Fixes: dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap") Link: http://lkml.kernel.org/r/20190419194747.5E1AD6DC@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-19coredump: fix race condition between mmget_not_zero()/get_task_mm() and core ↵Andrea Arcangeli
dumping The core dumping code has always run without holding the mmap_sem for writing, despite that is the only way to ensure that the entire vma layout will not change from under it. Only using some signal serialization on the processes belonging to the mm is not nearly enough. This was pointed out earlier. For example in Hugh's post from Jul 2017: https://lkml.kernel.org/r/alpine.LSU.2.11.1707191716030.2055@eggly.anvils "Not strictly relevant here, but a related note: I was very surprised to discover, only quite recently, how handle_mm_fault() may be called without down_read(mmap_sem) - when core dumping. That seems a misguided optimization to me, which would also be nice to correct" In particular because the growsdown and growsup can move the vm_start/vm_end the various loops the core dump does around the vma will not be consistent if page faults can happen concurrently. Pretty much all users calling mmget_not_zero()/get_task_mm() and then taking the mmap_sem had the potential to introduce unexpected side effects in the core dumping code. Adding mmap_sem for writing around the ->core_dump invocation is a viable long term fix, but it requires removing all copy user and page faults and to replace them with get_dump_page() for all binary formats which is not suitable as a short term fix. For the time being this solution manually covers the places that can confuse the core dump either by altering the vma layout or the vma flags while it runs. Once ->core_dump runs under mmap_sem for writing the function mmget_still_valid() can be dropped. Allowing mmap_sem protected sections to run in parallel with the coredump provides some minor parallelism advantage to the swapoff code (which seems to be safe enough by never mangling any vma field and can keep doing swapins in parallel to the core dumping) and to some other corner case. In order to facilitate the backporting I added "Fixes: 86039bd3b4e6" however the side effect of this same race condition in /proc/pid/mem should be reproducible since before 2.6.12-rc2 so I couldn't add any other "Fixes:" because there's no hash beyond the git genesis commit. Because find_extend_vma() is the only location outside of the process context that could modify the "mm" structures under mmap_sem for reading, by adding the mmget_still_valid() check to it, all other cases that take the mmap_sem for reading don't need the new check after mmget_not_zero()/get_task_mm(). The expand_stack() in page fault context also doesn't need the new check, because all tasks under core dumping are frozen. Link: http://lkml.kernel.org/r/20190325224949.11068-1-aarcange@redhat.com Fixes: 86039bd3b4e6 ("userfaultfd: add new syscall to provide memory externalization") Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reported-by: Jann Horn <jannh@google.com> Suggested-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Peter Xu <peterx@redhat.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Jann Horn <jannh@google.com> Acked-by: Jason Gunthorpe <jgg@mellanox.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05mm: fix some typos in mm directoryWei Yang
No functional change. Link: http://lkml.kernel.org/r/20190118235123.27843-1-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Pekka Enberg <penberg@kernel.org> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05mm/mmap.c: remove some redundancy in arch_get_unmapped_area_topdown()Yang Fan
The variable 'addr' is redundant in arch_get_unmapped_area_topdown(), just use parameter 'addr0' directly. Then remove the const qualifier of the parameter, and change its name to 'addr'. And in according with other functions, remove the const qualifier of all other no-pointer parameters in function arch_get_unmapped_area_topdown(). Link: http://lkml.kernel.org/r/20190127041112.25599-1-nullptr.cpp@gmail.com Signed-off-by: Yang Fan <nullptr.cpp@gmail.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Cc: William Kucharski <william.kucharski@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-02-27mm: enforce min addr even if capable() in expand_downwards()Jann Horn
security_mmap_addr() does a capability check with current_cred(), but we can reach this code from contexts like a VFS write handler where current_cred() must not be used. This can be abused on systems without SMAP to make NULL pointer dereferences exploitable again. Fixes: 8869477a49c3 ("security: protect from stack expansion into low vm addresses") Cc: stable@kernel.org Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-28mm/mmap.c: remove verify_mm_writelocked()Yangtao Li
We should get rid of this function. It no longer serves its purpose. This is a historical artifact from 2005 where do_brk was called outside of the core mm. We do have a proper abstraction in vm_brk_flags and that one does the locking properly so there is no need to use this function. Link: http://lkml.kernel.org/r/20181108174856.10811-1-tiny.windzz@gmail.com Signed-off-by: Yangtao Li <tiny.windzz@gmail.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-10mm: mmap: Allow for "high" userspace addressesSteve Capper
This patch adds support for "high" userspace addresses that are optionally supported on the system and have to be requested via a hint mechanism ("high" addr parameter to mmap). Architectures such as powerpc and x86 achieve this by making changes to their architectural versions of arch_get_unmapped_* functions. However, on arm64 we use the generic versions of these functions. Rather than duplicate the generic arch_get_unmapped_* implementations for arm64, this patch instead introduces two architectural helper macros and applies them to arch_get_unmapped_*: arch_get_mmap_end(addr) - get mmap upper limit depending on addr hint arch_get_mmap_base(addr, base) - get mmap_base depending on addr hint If these macros are not defined in architectural code then they default to (TASK_SIZE) and (base) so should not introduce any behavioural changes to architectures that do not define them. Signed-off-by: Steve Capper <steve.capper@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-10-26mm: brk: downgrade mmap_sem to read when shrinkingYang Shi
brk might be used to shrink memory mapping too other than munmap(). So, it may hold write mmap_sem for long time when shrinking large mapping, as what commit ("mm: mmap: zap pages with read mmap_sem in munmap") described. The brk() will not manipulate vmas anymore after __do_munmap() call for the mapping shrink use case. But, it may set mm->brk after __do_munmap(), which needs hold write mmap_sem. However, a simple trick can workaround this by setting mm->brk before __do_munmap(). Then restore the original value if __do_munmap() fails. With this trick, it is safe to downgrade to read mmap_sem. So, the same optimization, which downgrades mmap_sem to read for zapping pages, is also feasible and reasonable to this case. The period of holding exclusive mmap_sem for shrinking large mapping would be reduced significantly with this optimization. [akpm@linux-foundation.org: tweak comment] [yang.shi@linux.alibaba.com: fix unsigned compare against 0 issue] Link: http://lkml.kernel.org/r/1538687672-17795-1-git-send-email-yang.shi@linux.alibaba.com Link: http://lkml.kernel.org/r/1538067582-60038-2-git-send-email-yang.shi@linux.alibaba.com Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com> Cc: Colin Ian King <colin.king@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-26mm: mremap: downgrade mmap_sem to read when shrinkingYang Shi
Other than munmap, mremap might be used to shrink memory mapping too. So, it may hold write mmap_sem for long time when shrinking large mapping, as what commit ("mm: mmap: zap pages with read mmap_sem in munmap") described. The mremap() will not manipulate vmas anymore after __do_munmap() call for the mapping shrink use case, so it is safe to downgrade to read mmap_sem. So, the same optimization, which downgrades mmap_sem to read for zapping pages, is also feasible and reasonable to this case. The period of holding exclusive mmap_sem for shrinking large mapping would be reduced significantly with this optimization. MREMAP_FIXED and MREMAP_MAYMOVE are more complicated to adopt this optimization since they need manipulate vmas after do_munmap(), downgrading mmap_sem may create race window. Simple mapping shrink is the low hanging fruit, and it may cover the most cases of unmap with munmap together. [akpm@linux-foundation.org: tweak comment] [yang.shi@linux.alibaba.com: fix unsigned compare against 0 issue] Link: http://lkml.kernel.org/r/1538687672-17795-2-git-send-email-yang.shi@linux.alibaba.com Link: http://lkml.kernel.org/r/1538067582-60038-1-git-send-email-yang.shi@linux.alibaba.com Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com> Cc: Colin Ian King <colin.king@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-26mm: unmap VM_PFNMAP mappings with optimized pathYang Shi
When unmapping VM_PFNMAP mappings, vm flags need to be updated. Since the vmas have been detached, so it sounds safe to update vm flags with read mmap_sem. Link: http://lkml.kernel.org/r/1537376621-51150-4-git-send-email-yang.shi@linux.alibaba.com Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com> Reviewed-by: Matthew Wilcox <willy@infradead.org> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-26mm: unmap VM_HUGETLB mappings with optimized pathYang Shi
When unmapping VM_HUGETLB mappings, vm flags need to be updated. Since the vmas have been detached, so it sounds safe to update vm flags with read mmap_sem. Link: http://lkml.kernel.org/r/1537376621-51150-3-git-send-email-yang.shi@linux.alibaba.com Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com> Reviewed-by: Matthew Wilcox <willy@infradead.org> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-26