summaryrefslogtreecommitdiffstats
path: root/mm
AgeCommit message (Collapse)Author
2018-06-15mm: fix oom_kill event handlingRoman Gushchin
Commit e27be240df53 ("mm: memcg: make sure memory.events is uptodate when waking pollers") converted most of memcg event counters to per-memcg atomics, which made them less confusing for a user. The "oom_kill" counter remained untouched, so now it behaves differently than other counters (including "oom"). This adds nothing but confusion. Let's fix this by adding the MEMCG_OOM_KILL event, and follow the MEMCG_OOM approach. This also removes a hack from count_memcg_event_mm(), introduced earlier specially for the OOM_KILL counter. [akpm@linux-foundation.org: fix for droppage of memcg-replace-mm-owner-with-mm-memcg.patch] Link: http://lkml.kernel.org/r/20180508124637.29984-1-guro@fb.com Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-15mm: use octal not symbolic permissionsJoe Perches
mm/*.c files use symbolic and octal styles for permissions. Using octal and not symbolic permissions is preferred by many as more readable. https://lkml.org/lkml/2016/8/2/1945 Prefer the direct use of octal for permissions. Done using $ scripts/checkpatch.pl -f --types=SYMBOLIC_PERMS --fix-inplace mm/*.c and some typing. Before: $ git grep -P -w "0[0-7]{3,3}" mm | wc -l 44 After: $ git grep -P -w "0[0-7]{3,3}" mm | wc -l 86 Miscellanea: o Whitespace neatening around these conversions. Link: http://lkml.kernel.org/r/2e032ef111eebcd4c5952bae86763b541d373469.1522102887.git.joe@perches.com Signed-off-by: Joe Perches <joe@perches.com> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-15mremap: remove LATENCY_LIMIT from mremap to reduce the number of TLB shootdownsMel Gorman
Commit 5d1904204c99 ("mremap: fix race between mremap() and page cleanning") fixed races between mremap and other operations for both file-backed and anonymous mappings. The file-backed was the most critical as it allowed the possibility that data could be changed on a physical page after page_mkclean returned which could trigger data loss or data integrity issues. A customer reported that the cost of the TLBs for anonymous regressions was excessive and resulting in a 30-50% drop in performance overall since this commit on a microbenchmark. Unfortunately I neither have access to the test-case nor can I describe what it does other than saying that mremap operations dominate heavily. This patch removes the LATENCY_LIMIT to handle TLB flushes on a PMD boundary instead of every 64 pages to reduce the number of TLB shootdowns by a factor of 8 in the ideal case. LATENCY_LIMIT was almost certainly used originally to limit the PTL hold times but the latency savings are likely offset by the cost of IPIs in many cases. This patch is not reported to completely restore performance but gets it within an acceptable percentage. The given metric here is simply described as "higher is better". Baseline that was known good 002: Metric: 91.05 004: Metric: 109.45 008: Metric: 73.08 016: Metric: 58.14 032: Metric: 61.09 064: Metric: 57.76 128: Metric: 55.43 Current 001: Metric: 54.98 002: Metric: 56.56 004: Metric: 41.22 008: Metric: 35.96 016: Metric: 36.45 032: Metric: 35.71 064: Metric: 35.73 128: Metric: 34.96 With patch 001: Metric: 61.43 002: Metric: 81.64 004: Metric: 67.92 008: Metric: 51.67 016: Metric: 50.47 032: Metric: 52.29 064: Metric: 50.01 128: Metric: 49.04 So for low threads, it's not restored but for larger number of threads, it's closer to the "known good" baseline. Using a different mremap-intensive workload that is not representative of the real workload there is little difference observed outside of noise in the headline metrics However, the TLB shootdowns are reduced by 11% on average and at the peak, TLB shootdowns were reduced by 21%. Interrupts were sampled every second while the workload ran to get those figures. It's known that the figures will vary as the non-representative load is non-deterministic. An alternative patch was posted that should have significantly reduced the TLB flushes but unfortunately it does not perform as well as this version on the customer test case. If revisited, the two patches can stack on top of each other. Link: http://lkml.kernel.org/r/20180606183803.k7qaw2xnbvzshv34@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Aaron Lu <aaron.lu@intel.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-15mm/memblock: add missing include <linux/bootmem.h>Mathieu Malaterre
Commit 26f09e9b3a06 ("mm/memblock: add memblock memory allocation apis") introduced two new function definitions: memblock_virt_alloc_try_nid_nopanic() memblock_virt_alloc_try_nid() Commit ea1f5f3712af ("mm: define memblock_virt_alloc_try_nid_raw") introduced the following function definition: memblock_virt_alloc_try_nid_raw() This commit adds an includeof header file <linux/bootmem.h> to provide the missing function prototypes. Silence the following gcc warning (W=1): mm/memblock.c:1334:15: warning: no previous prototype for `memblock_virt_alloc_try_nid_raw' [-Wmissing-prototypes] mm/memblock.c:1371:15: warning: no previous prototype for `memblock_virt_alloc_try_nid_nopanic' [-Wmissing-prototypes] mm/memblock.c:1407:15: warning: no previous prototype for `memblock_virt_alloc_try_nid' [-Wmissing-prototypes] Link: http://lkml.kernel.org/r/20180606194144.16990-1-malat@debian.org Signed-off-by: Mathieu Malaterre <malat@debian.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-15mm: fix race between kmem_cache destroy, create and deactivateShakeel Butt
The memcg kmem cache creation and deactivation (SLUB only) is asynchronous. If a root kmem cache is destroyed whose memcg cache is in the process of creation or deactivation, the kernel may crash. Example of one such crash: general protection fault: 0000 [#1] SMP PTI CPU: 1 PID: 1721 Comm: kworker/14:1 Not tainted 4.17.0-smp ... Workqueue: memcg_kmem_cache kmemcg_deactivate_workfn RIP: 0010:has_cpu_slab ... Call Trace: ? on_each_cpu_cond __kmem_cache_shrink kmemcg_cache_deact_after_rcu kmemcg_deactivate_workfn process_one_work worker_thread kthread ret_from_fork+0x35/0x40 To fix this race, on root kmem cache destruction, mark the cache as dying and flush the workqueue used for memcg kmem cache creation and deactivation. SLUB's memcg kmem cache deactivation also includes RCU callback and thus make sure all previous registered RCU callbacks have completed as well. [shakeelb@google.com: handle the RCU callbacks for SLUB deactivation] Link: http://lkml.kernel.org/r/20180611192951.195727-1-shakeelb@google.com [shakeelb@google.com: add more documentation, rename fields for readability] Link: http://lkml.kernel.org/r/20180522201336.196994-1-shakeelb@google.com [akpm@linux-foundation.org: fix build, per Shakeel] [shakeelb@google.com: v3. Instead of refcount, flush the workqueue] Link: http://lkml.kernel.org/r/20180530001204.183758-1-shakeelb@google.com Link: http://lkml.kernel.org/r/20180521174116.171846-1-shakeelb@google.com Signed-off-by: Shakeel Butt <shakeelb@google.com> Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Greg Thelen <gthelen@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-15mm/swapfile.c: fix swap_count comment about nonexistent SWAP_HAS_CONTDaniel Jordan
Commit 570a335b8e22 ("swap_info: swap count continuations") introduces COUNT_CONTINUED but refers to it incorrectly as SWAP_HAS_CONT in a comment in swap_count. Fix it. Link: http://lkml.kernel.org/r/20180612175919.30413-1-daniel.m.jordan@oracle.com Fixes: 570a335b8e22 ("swap_info: swap count continuations") Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-15mm: fix null pointer dereference in mem_cgroup_protectedRoman Gushchin
Shakeel reported a crash in mem_cgroup_protected(), which can be triggered by memcg reclaim if the legacy cgroup v1 use_hierarchy=0 mode is used: BUG: unable to handle kernel NULL pointer dereference at 0000000000000120 PGD 8000001ff55da067 P4D 8000001ff55da067 PUD 1fdc7df067 PMD 0 Oops: 0000 [#4] SMP PTI CPU: 0 PID: 15581 Comm: bash Tainted: G D 4.17.0-smp-clean #5 Hardware name: ... RIP: 0010:mem_cgroup_protected+0x54/0x130 Code: 4c 8b 8e 00 01 00 00 4c 8b 86 08 01 00 00 48 8d 8a 08 ff ff ff 48 85 d2 ba 00 00 00 00 48 0f 44 ca 48 39 c8 0f 84 cf 00 00 00 <48> 8b 81 20 01 00 00 4d 89 ca 4c 39 c8 4c 0f 46 d0 4d 85 d2 74 05 RSP: 0000:ffffabe64dfafa58 EFLAGS: 00010286 RAX: ffff9fb6ff03d000 RBX: ffff9fb6f5b1b000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff9fb6f5b1b000 RDI: ffff9fb6f5b1b000 RBP: ffffabe64dfafb08 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 000000000000c800 R12: ffffabe64dfafb88 R13: ffff9fb6f5b1b000 R14: ffffabe64dfafb88 R15: ffff9fb77fffe000 FS: 00007fed1f8ac700(0000) GS:ffff9fb6ff400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000120 CR3: 0000001fdcf86003 CR4: 00000000001606f0 Call Trace: ? shrink_node+0x194/0x510 do_try_to_free_pages+0xfd/0x390 try_to_free_mem_cgroup_pages+0x123/0x210 try_charge+0x19e/0x700 mem_cgroup_try_charge+0x10b/0x1a0 wp_page_copy+0x134/0x5b0 do_wp_page+0x90/0x460 __handle_mm_fault+0x8e3/0xf30 handle_mm_fault+0xfe/0x220 __do_page_fault+0x262/0x500 do_page_fault+0x28/0xd0 ? page_fault+0x8/0x30 page_fault+0x1e/0x30 RIP: 0033:0x485b72 The problem happens because parent_mem_cgroup() returns a NULL pointer, which is dereferenced later without a check. As cgroup v1 has no memory guarantee support, let's make mem_cgroup_protected() immediately return MEMCG_PROT_NONE, if the given cgroup has no parent (non-hierarchical mode is used). Link: http://lkml.kernel.org/r/20180611175418.7007-2-guro@fb.com Fixes: bf8d5d52ffe8 ("memcg: introduce memory.min") Signed-off-by: Roman Gushchin <guro@fb.com> Reported-by: Shakeel Butt <shakeelb@google.com> Tested-by: Shakeel Butt <shakeelb@google.com> Tested-by: John Stultz <john.stultz@linaro.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-15mm/ksm.c: ignore STABLE_FLAG of rmap_item->address in rmap_walk_ksm()Jia He
In our armv8a server(QDF2400), I noticed lots of WARN_ON caused by PAGE_SIZE unaligned for rmap_item->address under memory pressure tests(start 20 guests and run memhog in the host). WARNING: CPU: 4 PID: 4641 at virt/kvm/arm/mmu.c:1826 kvm_age_hva_handler+0xc0/0xc8 CPU: 4 PID: 4641 Comm: memhog Tainted: G W 4.17.0-rc3+ #8 Call trace: kvm_age_hva_handler+0xc0/0xc8 handle_hva_to_gpa+0xa8/0xe0 kvm_age_hva+0x4c/0xe8 kvm_mmu_notifier_clear_flush_young+0x54/0x98 __mmu_notifier_clear_flush_young+0x6c/0xa0 page_referenced_one+0x154/0x1d8 rmap_walk_ksm+0x12c/0x1d0 rmap_walk+0x94/0xa0 page_referenced+0x194/0x1b0 shrink_page_list+0x674/0xc28 shrink_inactive_list+0x26c/0x5b8 shrink_node_memcg+0x35c/0x620 shrink_node+0x100/0x430 do_try_to_free_pages+0xe0/0x3a8 try_to_free_pages+0xe4/0x230 __alloc_pages_nodemask+0x564/0xdc0 alloc_pages_vma+0x90/0x228 do_anonymous_page+0xc8/0x4d0 __handle_mm_fault+0x4a0/0x508 handle_mm_fault+0xf8/0x1b0 do_page_fault+0x218/0x4b8 do_translation_fault+0x90/0xa0 do_mem_abort+0x68/0xf0 el0_da+0x24/0x28 In rmap_walk_ksm, the rmap_item->address might still have the STABLE_FLAG, then the start and end in handle_hva_to_gpa might not be PAGE_SIZE aligned. Thus it will cause exceptions in handle_hva_to_gpa on arm64. This patch fixes it by ignoring (not removing) the low bits of address when doing rmap_walk_ksm. IMO, it should be backported to stable tree. the storm of WARN_ONs is very easy for me to reproduce. More than that, I watched a panic (not reproducible) as follows: page:ffff7fe003742d80 count:-4871 mapcount:-2126053375 mapping: (null) index:0x0 flags: 0x1fffc00000000000() raw: 1fffc00000000000 0000000000000000 0000000000000000 ffffecf981470000 raw: dead000000000100 dead000000000200 ffff8017c001c000 0000000000000000 page dumped because: nonzero _refcount CPU: 29 PID: 18323 Comm: qemu-kvm Tainted: G W 4.14.15-5.hxt.aarch64 #1 Hardware name: <snip for confidential issues> Call trace: dump_backtrace+0x0/0x22c show_stack+0x24/0x2c dump_stack+0x8c/0xb0 bad_page+0xf4/0x154 free_pages_check_bad+0x90/0x9c free_pcppages_bulk+0x464/0x518 free_hot_cold_page+0x22c/0x300 __put_page+0x54/0x60 unmap_stage2_range+0x170/0x2b4 kvm_unmap_hva_handler+0x30/0x40 handle_hva_to_gpa+0xb0/0xec kvm_unmap_hva_range+0x5c/0xd0 I even injected a fault on purpose in kvm_unmap_hva_range by seting size=size-0x200, the call trace is similar as above. So I thought the panic is similarly caused by the root cause of WARN_ON. Andrea said: : It looks a straightforward safe fix, on x86 hva_to_gfn_memslot would : zap those bits and hide the misalignment caused by the low metadata : bits being erroneously left set in the address, but the arm code : notices when that's the last page in the memslot and the hva_end is : getting aligned and the size is below one page. : : I think the problem triggers in the addr += PAGE_SIZE of : unmap_stage2_ptes that never matches end because end is aligned but : addr is not. : : } while (pte++, addr += PAGE_SIZE, addr != end); : : x86 again only works on hva_start/hva_end after converting it to : gfn_start/end and that being in pfn units the bits are zapped before : they risk to cause trouble. Jia He said: : I've tested by myself in arm64 server (QDF2400,46 cpus,96G mem) Without : this patch, the WARN_ON is very easy for reproducing. After this patch, I : have run the same benchmarch for a whole day without any WARN_ONs Link: http://lkml.kernel.org/r/1525403506-6750-1-git-send-email-hejianet@gmail.com Signed-off-by: Jia He <jia.he@hxt-semitech.com> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Tested-by: Jia He <hejianet@gmail.com> Cc: Suzuki K Poulose <Suzuki.Poulose@arm.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com> Cc: Arvind Yadav <arvind.yadav.cs@gmail.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-12treewide: Use array_size() in vmalloc()Kees Cook
The vmalloc() function has no 2-factor argument form, so multiplication factors need to be wrapped in array_size(). This patch replaces cases of: vmalloc(a * b) with: vmalloc(array_size(a, b)) as well as handling cases of: vmalloc(a * b * c) with: vmalloc(array3_size(a, b, c)) This does, however, attempt to ignore constant size factors like: vmalloc(4 * 1024) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( vmalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | vmalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( vmalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | vmalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | vmalloc( - sizeof(char) * (COUNT) + COUNT , ...) | vmalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | vmalloc( - sizeof(u8) * COUNT + COUNT , ...) | vmalloc( - sizeof(__u8) * COUNT + COUNT , ...) | vmalloc( - sizeof(char) * COUNT + COUNT , ...) | vmalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( vmalloc( - sizeof(TYPE) * (COUNT_ID) + array_size(COUNT_ID, sizeof(TYPE)) , ...) | vmalloc( - sizeof(TYPE) * COUNT_ID + array_size(COUNT_ID, sizeof(TYPE)) , ...) | vmalloc( - sizeof(TYPE) * (COUNT_CONST) + array_size(COUNT_CONST, sizeof(TYPE)) , ...) | vmalloc( - sizeof(TYPE) * COUNT_CONST + array_size(COUNT_CONST, sizeof(TYPE)) , ...) | vmalloc( - sizeof(THING) * (COUNT_ID) + array_size(COUNT_ID, sizeof(THING)) , ...) | vmalloc( - sizeof(THING) * COUNT_ID + array_size(COUNT_ID, sizeof(THING)) , ...) | vmalloc( - sizeof(THING) * (COUNT_CONST) + array_size(COUNT_CONST, sizeof(THING)) , ...) | vmalloc( - sizeof(THING) * COUNT_CONST + array_size(COUNT_CONST, sizeof(THING)) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ vmalloc( - SIZE * COUNT + array_size(COUNT, SIZE) , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( vmalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | vmalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | vmalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | vmalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | vmalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | vmalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | vmalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | vmalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( vmalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | vmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | vmalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | vmalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | vmalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | vmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( vmalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | vmalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | vmalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | vmalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | vmalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | vmalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | vmalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | vmalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( vmalloc(C1 * C2 * C3, ...) | vmalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants. @@ expression E1, E2; constant C1, C2; @@ ( vmalloc(C1 * C2, ...) | vmalloc( - E1 * E2 + array_size(E1, E2) , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-12treewide: kvzalloc() -> kvcalloc()Kees Cook
The kvzalloc() function has a 2-factor argument form, kvcalloc(). This patch replaces cases of: kvzalloc(a * b, gfp) with: kvcalloc(a * b, gfp) as well as handling cases of: kvzalloc(a * b * c, gfp) with: kvzalloc(array3_size(a, b, c), gfp) as it's slightly less ugly than: kvcalloc(array_size(a, b), c, gfp) This does, however, attempt to ignore constant size factors like: kvzalloc(4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kvzalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | kvzalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kvzalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | kvzalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | kvzalloc( - sizeof(char) * (COUNT) + COUNT , ...) | kvzalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | kvzalloc( - sizeof(u8) * COUNT + COUNT , ...) | kvzalloc( - sizeof(__u8) * COUNT + COUNT , ...) | kvzalloc( - sizeof(char) * COUNT + COUNT , ...) | kvzalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kvzalloc + kvcalloc ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) | - kvzalloc + kvcalloc ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) | - kvzalloc + kvcalloc ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) | - kvzalloc + kvcalloc ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) | - kvzalloc + kvcalloc ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) | - kvzalloc + kvcalloc ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) | - kvzalloc + kvcalloc ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) | - kvzalloc + kvcalloc ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kvzalloc + kvcalloc ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kvzalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kvzalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kvzalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kvzalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kvzalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kvzalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kvzalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kvzalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kvzalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kvzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kvzalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kvzalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kvzalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | kvzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kvzalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kvzalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kvzalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kvzalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kvzalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kvzalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kvzalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kvzalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kvzalloc(C1 * C2 * C3, ...) | kvzalloc( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) | kvzalloc( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) | kvzalloc( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) | kvzalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kvzalloc(sizeof(THING) * C2, ...) | kvzalloc(sizeof(TYPE) * C2, ...) | kvzalloc(C1 * C2 * C3, ...) | kvzalloc(C1 * C2, ...) | - kvzalloc + kvcalloc ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) | - kvzalloc + kvcalloc ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) | - kvzalloc + kvcalloc ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) | - kvzalloc + kvcalloc ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) | - kvzalloc + kvcalloc ( - (E1) * E2 + E1, E2 , ...) | - kvzalloc + kvcalloc ( - (E1) * (E2) + E1, E2 , ...) | - kvzalloc + kvcalloc ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-12treewide: kzalloc() -> kcalloc()Kees Cook
The kzalloc() function has a 2-factor argument form, kcalloc(). This patch replaces cases of: kzalloc(a * b, gfp) with: kcalloc(a * b, gfp) as well as handling cases of: kzalloc(a * b * c, gfp) with: kzalloc(array3_size(a, b, c), gfp) as it's slightly less ugly than: kzalloc_array(array_size(a, b), c, gfp) This does, however, attempt to ignore constant size factors like: kzalloc(4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kzalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | kzalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kzalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(char) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(u8) * COUNT + COUNT , ...) | kzalloc( - sizeof(__u8) * COUNT + COUNT , ...) | kzalloc( - sizeof(char) * COUNT + COUNT , ...) | kzalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kzalloc + kcalloc ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kzalloc + kcalloc ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kzalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kzalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kzalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kzalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kzalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kzalloc(C1 * C2 * C3, ...) | kzalloc( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) | kzalloc( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) | kzalloc( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) | kzalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kzalloc(sizeof(THING) * C2, ...) | kzalloc(sizeof(TYPE) * C2, ...) | kzalloc(C1 * C2 * C3, ...) | kzalloc(C1 * C2, ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) | - kzalloc + kcalloc ( - (E1) * E2 + E1, E2 , ...) | - kzalloc + kcalloc ( - (E1) * (E2) + E1, E2 , ...) | - kzalloc + kcalloc ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-12treewide: kmalloc() -> kmalloc_array()Kees Cook
The kmalloc() function has a 2-factor argument form, kmalloc_array(). This patch replaces cases of: kmalloc(a * b, gfp) with: kmalloc_array(a * b, gfp) as well as handling cases of: kmalloc(a * b * c, gfp) with: kmalloc(array3_size(a, b, c), gfp) as it's slightly less ugly than: kmalloc_array(array_size(a, b), c, gfp) This does, however, attempt to ignore constant size factors like: kmalloc(4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The tools/ directory was manually excluded, since it has its own implementation of kmalloc(). The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kmalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | kmalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kmalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(char) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(u8) * COUNT + COUNT , ...) | kmalloc( - sizeof(__u8) * COUNT + COUNT , ...) | kmalloc( - sizeof(char) * COUNT + COUNT , ...) | kmalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kmalloc + kmalloc_array ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kmalloc + kmalloc_array ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kmalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kmalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kmalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kmalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kmalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kmalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kmalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kmalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | kmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kmalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kmalloc(C1 * C2 * C3, ...) | kmalloc( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) | kmalloc( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) | kmalloc( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) | kmalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kmalloc(sizeof(THING) * C2, ...) | kmalloc(sizeof(TYPE) * C2, ...) | kmalloc(C1 * C2 * C3, ...) | kmalloc(C1 * C2, ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - (E1) * E2 + E1, E2 , ...) | - kmalloc + kmalloc_array ( - (E1) * (E2) + E1, E2 , ...) | - kmalloc + kmalloc_array ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-08Merge tag 'libnvdimm-for-4.18' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Dan Williams: "This adds a user for the new 'bytes-remaining' updates to memcpy_mcsafe() that you already received through Ingo via the x86-dax- for-linus pull. Not included here, but still targeting this cycle, is support for handling memory media errors (poison) consumed via userspace dax mappings. Summary: - DAX broke a fundamental assumption of truncate of file mapped pages. The truncate path assumed that it is safe to disconnect a pinned page from a file and let the filesystem reclaim the physical block. With DAX the page is equivalent to the filesystem block. Introduce dax_layout_busy_page() to enable filesystems to wait for pinned DAX pages to be released. Without this wait a filesystem could allocate blocks under active device-DMA to a new file. - DAX arranges for the block layer to be bypassed and uses dax_direct_access() + copy_to_iter() to satisfy read(2) calls. However, the memcpy_mcsafe() facility is available through the pmem block driver. In order to safely handle media errors, via the DAX block-layer bypass, introduce copy_to_iter_mcsafe(). - Fix cache management policy relative to the ACPI NFIT Platform Capabilities Structure to properly elide cache flushes when they are not necessary. The table indicates whether CPU caches are power-fail protected. Clarify that a deep flush is always performed on REQ_{FUA,PREFLUSH} requests" * tag 'libnvdimm-for-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (21 commits) dax: Use dax_write_cache* helpers libnvdimm, pmem: Do not flush power-fail protected CPU caches libnvdimm, pmem: Unconditionally deep flush on *sync libnvdimm, pmem: Complete REQ_FLUSH => REQ_PREFLUSH acpi, nfit: Remove ecc_unit_size dax: dax_insert_mapping_entry always succeeds libnvdimm, e820: Register all pmem resources libnvdimm: Debug probe times linvdimm, pmem: Preserve read-only setting for pmem devices x86, nfit_test: Add unit test for memcpy_mcsafe() pmem: Switch to copy_to_iter_mcsafe() dax: Report bytes remaining in dax_iomap_actor() dax: Introduce a ->copy_to_iter dax operation uio, lib: Fix CONFIG_ARCH_HAS_UACCESS_MCSAFE compilation xfs, dax: introduce xfs_break_dax_layouts() xfs: prepare xfs_break_layouts() for another layout type xfs: prepare xfs_break_layouts() to be called with XFS_MMAPLOCK_EXCL mm, fs, dax: handle layout changes to pinned dax mappings mm: fix __gup_device_huge vs unmap mm: introduce MEMORY_DEVICE_FS_DAX and CONFIG_DEV_PAGEMAP_OPS ...
2018-06-08Merge branch 'for-4.18/dax' into libnvdimm-for-nextDan Williams
2018-06-07Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge updates from Andrew Morton: - a few misc things - ocfs2 updates - v9fs updates - MM - procfs updates - lib/ updates - autofs updates * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (118 commits) autofs: small cleanup in autofs_getpath() autofs: clean up includes autofs: comment on selinux changes needed for module autoload autofs: update MAINTAINERS entry for autofs autofs: use autofs instead of autofs4 in documentation autofs: rename autofs documentation files autofs: create autofs Kconfig and Makefile autofs: delete fs/autofs4 source files autofs: update fs/autofs4/Makefile autofs: update fs/autofs4/Kconfig autofs: copy autofs4 to autofs autofs4: use autofs instead of autofs4 everywhere autofs4: merge auto_fs.h and auto_fs4.h fs/binfmt_misc.c: do not allow offset overflow checkpatch: improve patch recognition lib/ucs2_string.c: add MODULE_LICENSE() lib/mpi: headers cleanup lib/percpu_ida.c: use _irqsave() instead of local_irq_save() + spin_lock lib/idr.c: remove simple_ida_lock lib/bitmap.c: micro-optimization for __bitmap_complement() ...
2018-06-07mm: kvmalloc does not fallback to vmalloc for incompatible gfp flagsMichal Hocko
kvmalloc warned about incompatible gfp_mask to catch abusers (mostly GFP_NOFS) with an intention that this will motivate authors of the code to fix those. Linus argues that this just motivates people to do even more hacks like if (gfp == GFP_KERNEL) kvmalloc else kmalloc I haven't seen this happening much (Linus pointed to bucket_lock special cases an atomic allocation but my git foo hasn't found much more) but it is true that we can grow those in future. Therefore Linus suggested to simply not fallback to vmalloc for incompatible gfp flags and rather stick with the kmalloc path. Link: http://lkml.kernel.org/r/20180601115329.27807-1-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Tom Herbert <tom@quantonium.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-07mm/shmem.c: zero out unused vma fields in shmem_pseudo_vma_init()Kirill A. Shutemov
shmem/tmpfs uses pseudo vma to allocate page with correct NUMA policy. The pseudo vma doesn't have vm_page_prot set. We are going to encode encryption KeyID in vm_page_prot. Having garbage there causes problems. Zero out all unused fields in the pseudo vma. Link: http://lkml.kernel.org/r/20180531135602.20321-1-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-07mm, page_alloc: do not break __GFP_THISNODE by zonelist resetVlastimil Babka
In __alloc_pages_slowpath() we reset zonelist and preferred_zoneref for allocations that can ignore memory policies. The zonelist is obtained from current CPU's node. This is a problem for __GFP_THISNODE allocations that want to allocate on a different node, e.g. because the allocating thread has been migrated to a different CPU. This has been observed to break SLAB in our 4.4-based kernel, because there it relies on __GFP_THISNODE working as intended. If a slab page is put on wrong node's list, then further list manipulations may corrupt the list because page_to_nid() is used to determine which node's list_lock should be locked and thus we may take a wrong lock and race. Current SLAB implementation seems to be immune by luck thanks to commit 511e3a058812 ("mm/slab: make cache_grow() handle the page allocated on arbitrary node") but there may be others assuming that __GFP_THISNODE works as promised. We can fix it by simply removing the zonelist reset completely. There is actually no reason to reset it, because memory policies and cpusets don't affect the zonelist choice in the first place. This was different when commit 183f6371aac2 ("mm: ignore mempolicies when using ALLOC_NO_WATERMARK") introduced the code, as mempolicies provided their own restricted zonelists. We might consider this for 4.17 although I don't know if there's anything currently broken. SLAB is currently not affected, but in kernels older than 4.7 that don't yet have 511e3a058812 ("mm/slab: make cache_grow() handle the page allocated on arbitrary node") it is. That's at least 4.4 LTS. Older ones I'll have to check. So stable backports should be more important, but will have to be reviewed carefully, as the code went through many changes. BTW I think that also the ac->preferred_zoneref reset is currently useless if we don't also reset ac->nodemask from a mempolicy to NULL first (which we probably should for the OOM victims etc?), but I would leave that for a separate patch. Link: http://lkml.kernel.org/r/20180525130853.13915-1-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Fixes: 183f6371aac2 ("mm: ignore mempolicies when using ALLOC_NO_WATERMARK") Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-07userfaultfd: prevent non-cooperative events vs mcopy_atomic racesMike Rapoport
If a process monitored with userfaultfd changes it's memory mappings or forks() at the same time as uffd monitor fills the process memory with UFFDIO_COPY, the actual creation of page table entries and copying of the data in mcopy_atomic may happen either before of after the memory mapping modifications and there is no way for the uffd monitor to maintain consistent view of the process memory layout. For instance, let's consider fork() running in parallel with userfaultfd_copy(): process | uffd monitor ---------------------------------+------------------------------ fork() | userfaultfd_copy() ... | ... dup_mmap() | down_read(mmap_sem) down_write(mmap_sem) | /* create PTEs, copy data */ dup_uffd() | up_read(mmap_sem) copy_page_range() | up_write(mmap_sem) | dup_uffd_complete() | /* notify monitor */ | If the userfaultfd_copy() takes the mmap_sem first, the new page(s) will be present by the time copy_page_range() is called and they will appear in the child's memory mappings. However, if the fork() is the first to take the mmap_sem, the new pages won't be mapped in the child's address space. If the pages are not present and child tries to access them, the monitor will get page fault notification and everything is fine. However, if the pages *are present*, the child can access them without uffd noticing. And if we copy them into child it'll see the wrong data. Since we are talking about background copy, we'd need to decide whether the pages should be copied or not regardless #PF notifications. Since userfaultfd monitor has no way to determine what was the order, let's disallow userfaultfd_copy in parallel with the non-cooperative events. In such case we return -EAGAIN and the uffd monitor can understand that userfaultfd_copy() clashed with a non-cooperative event and take an appropriate action. Link: http://lkml.kernel.org/r/1527061324-19949-1-git-send-email-rppt@linux.vnet.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Acked-by: Pavel Emelyanov <xemul@virtuozzo.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Andrei Vagin <avagin@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-07mm: memcg: allow lowering memory.swap.max below the current usageTejun Heo
Currently an attempt to set swap.max into a value lower than the actual swap usage fails, which causes configuration problems as there's no way of lowering the configuration below the current usage short of turning off swap entirely. This makes swap.max difficult to use and allows delegatees to lock the delegator out of reducing swap allocation. This patch updates swap_max_write() so that the limit can be lowered below the current usage. It doesn't implement active reclaiming of swap entries for the following reasons. * mem_cgroup_swap_full() already tells the swap machinary to aggressively reclaim swap entries if the usage is above 50% of limit, so simply lowering the limit automatically triggers gradual reclaim. * Forcing back swapped out pages is likely to heavily impact the workload and mess up the working set. Given that swap usually is a lot less valuable and less scarce, letting the existing usage dissipate over time through the above gradual reclaim and as they're falted back in is likely the better behavior. Link: http://lkml.kernel.org/r/20180523185041.GR1718769@devbig577.frc2.facebook.com Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Roman Gushchin <guro@fb.com> Acked-by: Rik van Riel <riel@surriel.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Shaohua Li <shli@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-07mm/shmem.c: use new return type vm_fault_tSouptick Joarder
Use new return type vm_fault_t for fault handler. For now, this is just documenting that the function returns a VM_FAULT value rather than an errno. Once all instances are converted, vm_fault_t will become a distinct type. See commit 1c8f422059ae ("mm: change return type to vm_fault_t") vmf_error() is the newly introduce inline function in 4.17-rc6. Link: http://lkml.kernel.org/r/20180521202410.GA17912@jordon-HP-15-Notebook-PC Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com> Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-07slub: remove 'reserved' file from sysfsMatthew Wilcox
Christoph doubts anyone was using the 'reserved' file in sysfs, so remove it. Link: http://lkml.kernel.org/r/20180518194519.3820-17-willy@infradead.org Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-07slub: remove kmem_cache->reservedMatthew Wilcox