summaryrefslogtreecommitdiffstats
path: root/mm/slab.h
AgeCommit message (Collapse)Author
2020-12-15Merge tag 'net-next-5.11' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core: - support "prefer busy polling" NAPI operation mode, where we defer softirq for some time expecting applications to periodically busy poll - AF_XDP: improve efficiency by more batching and hindering the adjacency cache prefetcher - af_packet: make packet_fanout.arr size configurable up to 64K - tcp: optimize TCP zero copy receive in presence of partial or unaligned reads making zero copy a performance win for much smaller messages - XDP: add bulk APIs for returning / freeing frames - sched: support fragmenting IP packets as they come out of conntrack - net: allow virtual netdevs to forward UDP L4 and fraglist GSO skbs BPF: - BPF switch from crude rlimit-based to memcg-based memory accounting - BPF type format information for kernel modules and related tracing enhancements - BPF implement task local storage for BPF LSM - allow the FENTRY/FEXIT/RAW_TP tracing programs to use bpf_sk_storage Protocols: - mptcp: improve multiple xmit streams support, memory accounting and many smaller improvements - TLS: support CHACHA20-POLY1305 cipher - seg6: add support for SRv6 End.DT4/DT6 behavior - sctp: Implement RFC 6951: UDP Encapsulation of SCTP - ppp_generic: add ability to bridge channels directly - bridge: Connectivity Fault Management (CFM) support as is defined in IEEE 802.1Q section 12.14. Drivers: - mlx5: make use of the new auxiliary bus to organize the driver internals - mlx5: more accurate port TX timestamping support - mlxsw: - improve the efficiency of offloaded next hop updates by using the new nexthop object API - support blackhole nexthops - support IEEE 802.1ad (Q-in-Q) bridging - rtw88: major bluetooth co-existance improvements - iwlwifi: support new 6 GHz frequency band - ath11k: Fast Initial Link Setup (FILS) - mt7915: dual band concurrent (DBDC) support - net: ipa: add basic support for IPA v4.5 Refactor: - a few pieces of in_interrupt() cleanup work from Sebastian Andrzej Siewior - phy: add support for shared interrupts; get rid of multiple driver APIs and have the drivers write a full IRQ handler, slight growth of driver code should be compensated by the simpler API which also allows shared IRQs - add common code for handling netdev per-cpu counters - move TX packet re-allocation from Ethernet switch tag drivers to a central place - improve efficiency and rename nla_strlcpy - number of W=1 warning cleanups as we now catch those in a patchwork build bot Old code removal: - wan: delete the DLCI / SDLA drivers - wimax: move to staging - wifi: remove old WDS wifi bridging support" * tag 'net-next-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1922 commits) net: hns3: fix expression that is currently always true net: fix proc_fs init handling in af_packet and tls nfc: pn533: convert comma to semicolon af_vsock: Assign the vsock transport considering the vsock address flags af_vsock: Set VMADDR_FLAG_TO_HOST flag on the receive path vsock_addr: Check for supported flag values vm_sockets: Add VMADDR_FLAG_TO_HOST vsock flag vm_sockets: Add flags field in the vsock address data structure net: Disable NETIF_F_HW_TLS_TX when HW_CSUM is disabled tcp: Add logic to check for SYN w/ data in tcp_simple_retransmit net: mscc: ocelot: install MAC addresses in .ndo_set_rx_mode from process context nfc: s3fwrn5: Release the nfc firmware net: vxget: clean up sparse warnings mlxsw: spectrum_router: Use eXtended mezzanine to offload IPv4 router mlxsw: spectrum: Set KVH XLT cache mode for Spectrum2/3 mlxsw: spectrum_router_xm: Introduce basic XM cache flushing mlxsw: reg: Add Router LPM Cache Enable Register mlxsw: reg: Add Router LPM Cache ML Delete Register mlxsw: spectrum_router_xm: Implement L-value tracking for M-index mlxsw: reg: Add XM Router M Table Register ...
2020-12-15mm: extract might_alloc() debug checkDaniel Vetter
Extracted from slab.h, which seems to have the most complete version including the correct might_sleep() check. Roll it out to slob.c. Motivated by a discussion with Paul about possibly changing call_rcu behaviour to allocate memory, but only roughly every 500th call. There are a lot fewer places in the kernel that care about whether allocating memory is allowed or not (due to deadlocks with reclaim code) than places that care whether sleeping is allowed. But debugging these also tends to be a lot harder, so nice descriptive checks could come in handy. I might have some use eventually for annotations in drivers/gpu. Note that unlike fs_reclaim_acquire/release gfpflags_allow_blocking does not consult the PF_MEMALLOC flags. But there is no flag equivalent for GFP_NOWAIT, hence this check can't go wrong due to memalloc_no*_save/restore contexts. Willy is working on a patch series which might change this: https://lore.kernel.org/linux-mm/20200625113122.7540-7-willy@infradead.org/ I think best would be if that updates gfpflags_allow_blocking(), since there's a ton of callers all over the place for that already. Link: https://lkml.kernel.org/r/20201125162532.1299794-3-daniel.vetter@ffwll.ch Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Michel Lespinasse <walken@google.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Waiman Long <longman@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Qian Cai <cai@lca.pw> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Christian König <christian.koenig@amd.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Thomas Hellström (Intel) <thomas_os@shipmail.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15mm: memcontrol: remove unused mod_memcg_obj_state()Muchun Song
Since commit 991e7673859e ("mm: memcontrol: account kernel stack per node") there is no user of the mod_memcg_obj_state(). So just remove it. Also rework type of the idx parameter of the mod_objcg_state() from int to enum node_stat_item. Link: https://lkml.kernel.org/r/20201013153504.92602-1-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Roman Gushchin <guro@fb.com> Acked-by: David Rientjes <rientjes@google.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Christopher Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yafang Shao <laoar.shao@gmail.com> Cc: Chris Down <chris@chrisdown.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
xdp_return_frame_bulk() needs to pass a xdp_buff to __xdp_return(). strlcpy got converted to strscpy but here it makes no functional difference, so just keep the right code. Conflicts: net/netfilter/nf_tables_api.c Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-06mm: memcg/slab: fix obj_cgroup_charge() return value handlingRoman Gushchin
Commit 10befea91b61 ("mm: memcg/slab: use a single set of kmem_caches for all allocations") introduced a regression into the handling of the obj_cgroup_charge() return value. If a non-zero value is returned (indicating of exceeding one of memory.max limits), the allocation should fail, instead of falling back to non-accounted mode. To make the code more readable, move memcg_slab_pre_alloc_hook() and memcg_slab_post_alloc_hook() calling conditions into bodies of these hooks. Fixes: 10befea91b61 ("mm: memcg/slab: use a single set of kmem_caches for all allocations") Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20201127161828.GD840171@carbon.dhcp.thefacebook.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-02mm: memcontrol/slab: Use helpers to access slab page's memcg_dataRoman Gushchin
To gather all direct accesses to struct page's memcg_data field in one place, let's introduce 3 new helpers to use in the slab accounting code: struct obj_cgroup **page_objcgs(struct page *page); struct obj_cgroup **page_objcgs_check(struct page *page); bool set_page_objcgs(struct page *page, struct obj_cgroup **objcgs); They are similar to the corresponding API for generic pages, except that the setter can return false, indicating that the value has been already set from a different thread. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Link: https://lkml.kernel.org/r/20201027001657.3398190-3-guro@fb.com Link: https://lore.kernel.org/bpf/20201201215900.3569844-3-guro@fb.com
2020-12-02mm: memcontrol: Use helpers to read page's memcg dataRoman Gushchin
Patch series "mm: allow mapping accounted kernel pages to userspace", v6. Currently a non-slab kernel page which has been charged to a memory cgroup can't be mapped to userspace. The underlying reason is simple: PageKmemcg flag is defined as a page type (like buddy, offline, etc), so it takes a bit from a page->mapped counter. Pages with a type set can't be mapped to userspace. But in general the kmemcg flag has nothing to do with mapping to userspace. It only means that the page has been accounted by the page allocator, so it has to be properly uncharged on release. Some bpf maps are mapping the vmalloc-based memory to userspace, and their memory can't be accounted because of this implementation detail. This patchset removes this limitation by moving the PageKmemcg flag into one of the free bits of the page->mem_cgroup pointer. Also it formalizes accesses to the page->mem_cgroup and page->obj_cgroups using new helpers, adds several checks and removes a couple of obsolete functions. As the result the code became more robust with fewer open-coded bit tricks. This patch (of 4): Currently there are many open-coded reads of the page->mem_cgroup pointer, as well as a couple of read helpers, which are barely used. It creates an obstacle on a way to reuse some bits of the pointer for storing additional bits of information. In fact, we already do this for slab pages, where the last bit indicates that a pointer has an attached vector of objcg pointers instead of a regular memcg pointer. This commits uses 2 existing helpers and introduces a new helper to converts all read sides to calls of these helpers: struct mem_cgroup *page_memcg(struct page *page); struct mem_cgroup *page_memcg_rcu(struct page *page); struct mem_cgroup *page_memcg_check(struct page *page); page_memcg_check() is intended to be used in cases when the page can be a slab page and have a memcg pointer pointing at objcg vector. It does check the lowest bit, and if set, returns NULL. page_memcg() contains a VM_BUG_ON_PAGE() check for the page not being a slab page. To make sure nobody uses a direct access, struct page's mem_cgroup/obj_cgroups is converted to unsigned long memcg_data. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Link: https://lkml.kernel.org/r/20201027001657.3398190-1-guro@fb.com Link: https://lkml.kernel.org/r/20201027001657.3398190-2-guro@fb.com Link: https://lore.kernel.org/bpf/20201201215900.3569844-2-guro@fb.com
2020-10-18mm: kmem: move memcg_kmem_bypass() calls to get_mem/obj_cgroup_from_current()Roman Gushchin
Patch series "mm: kmem: kernel memory accounting in an interrupt context". This patchset implements memcg-based memory accounting of allocations made from an interrupt context. Historically, such allocations were passed unaccounted mostly because charging the memory cgroup of the current process wasn't an option. Also performance reasons were likely a reason too. The remote charging API allows to temporarily overwrite the currently active memory cgroup, so that all memory allocations are accounted towards some specified memory cgroup instead of the memory cgroup of the current process. This patchset extends the remote charging API so that it can be used from an interrupt context. Then it removes the fence that prevented the accounting of allocations made from an interrupt context. It also contains a couple of optimizations/code refactorings. This patchset doesn't directly enable accounting for any specific allocations, but prepares the code base for it. The bpf memory accounting will likely be the first user of it: a typical example is a bpf program parsing an incoming network packet, which allocates an entry in hashmap map to store some information. This patch (of 4): Currently memcg_kmem_bypass() is called before obtaining the current memory/obj cgroup using get_mem/obj_cgroup_from_current(). Moving memcg_kmem_bypass() into get_mem/obj_cgroup_from_current() reduces the number of call sites and allows further code simplifications. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Link: http://lkml.kernel.org/r/20200827225843.1270629-1-guro@fb.com Link: http://lkml.kernel.org/r/20200827225843.1270629-2-guro@fb.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16mm/slab.h: remove duplicate includeYueHaibing
Remove duplicate header which is included twice. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Pekka Enberg <penberg@kernel.org> Link: http://lkml.kernel.org/r/20200818114323.58156-1-yuehaibing@huawei.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-13mm: memcg/slab: uncharge during kmem_cache_free_bulk()Bharata B Rao
Object cgroup charging is done for all the objects during allocation, but during freeing, uncharging ends up happening for only one object in the case of bulk allocation/freeing. Fix this by having a separate call to uncharge all the objects from kmem_cache_free_bulk() and by modifying memcg_slab_free_hook() to take care of bulk uncharging. Fixes: 964d4bd370d5 ("mm: memcg/slab: save obj_cgroup for non-root slab objects" Signed-off-by: Bharata B Rao <bharata@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Roman Gushchin <guro@fb.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Shakeel Butt <shakeelb@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20201009060423.390479-1-bharata@linux.ibm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07mm: slab: rename (un)charge_slab_page() to (un)account_slab_page()Roman Gushchin
charge_slab_page() and uncharge_slab_page() are not related anymore to memcg charging and uncharging. In order to make their names less confusing, let's rename them to account_slab_page() and unaccount_slab_page() respectively. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Link: http://lkml.kernel.org/r/20200707173612.124425-2-guro@fb.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07mm: memcg/slab: remove unused argument by charge_slab_page()Roman Gushchin
charge_slab_page() is not using the gfp argument anymore, remove it. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Link: http://lkml.kernel.org/r/20200707173612.124425-1-guro@fb.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07mm: memcg/slab: use a single set of kmem_caches for all allocationsRoman Gushchin
Instead of having two sets of kmem_caches: one for system-wide and non-accounted allocations and the second one shared by all accounted allocations, we can use just one. The idea is simple: space for obj_cgroup metadata can be allocated on demand and filled only for accounted allocations. It allows to remove a bunch of code which is required to handle kmem_cache clones for accounted allocations. There is no more need to create them, accumulate statistics, propagate attributes, etc. It's a quite significant simplification. Also, because the total number of slab_caches is reduced almost twice (not all kmem_caches have a memcg clone), some additional memory savings are expected. On my devvm it additionally saves about 3.5% of slab memory. [guro@fb.com: fix build on MIPS] Link: http://lkml.kernel.org/r/20200717214810.3733082-1-guro@fb.com Suggested-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Naresh Kamboju <naresh.kamboju@linaro.org> Link: http://lkml.kernel.org/r/20200623174037.3951353-18-guro@fb.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07mm: memcg/slab: deprecate slab_root_cachesRoman Gushchin
Currently there are two lists of kmem_caches: 1) slab_caches, which contains all kmem_caches, 2) slab_root_caches, which contains only root kmem_caches. And there is some preprocessor magic to have a single list if CONFIG_MEMCG_KMEM isn't enabled. It was required earlier because the number of non-root kmem_caches was proportional to the number of memory cgroups and could reach really big values. Now, when it cannot exceed the number of root kmem_caches, there is really no reason to maintain two lists. We never iterate over the slab_root_caches list on any hot paths, so it's perfectly fine to iterate over slab_caches and filter out non-root kmem_caches. It allows to remove a lot of config-dependent code and two pointers from the kmem_cache structure. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/20200623174037.3951353-16-guro@fb.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07mm: memcg/slab: remove memcg_kmem_get_cache()Roman Gushchin
The memcg_kmem_get_cache() function became really trivial, so let's just inline it into the single call point: memcg_slab_pre_alloc_hook(). It will make the code less bulky and can also help the compiler to generate a better code. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/20200623174037.3951353-15-guro@fb.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07mm: memcg/slab: simplify memcg cache creationRoman Gushchin
Because the number of non-root kmem_caches doesn't depend on the number of memory cgroups anymore and is generally not very big, there is no more need for a dedicated workqueue. Also, as there is no more need to pass any arguments to the memcg_create_kmem_cache() except the root kmem_cache, it's possible to just embed the work structure into the kmem_cache and avoid the dynamic allocation of the work structure. This will also simplify the synchronization: for each root kmem_cache there is only one work. So there will be no more concurrent attempts to create a non-root kmem_cache for a root kmem_cache: the second and all following attempts to queue the work will fail. On the kmem_cache destruction path there is no more need to call the expensive flush_workqueue() and wait for all pending works to be finished. Instead, cancel_work_sync() can be used to cancel/wait for only one work. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/20200623174037.3951353-14-guro@fb.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07mm: memcg/slab: use a single set of kmem_caches for all accounted allocationsRoman Gushchin
This is fairly big but mostly red patch, which makes all accounted slab allocations use a single set of kmem_caches instead of creating a separate set for each memory cgroup. Because the number of non-root kmem_caches is now capped by the number of root kmem_caches, there is no need to shrink or destroy them prematurely. They can be perfectly destroyed together with their root counterparts. This allows to dramatically simplify the management of non-root kmem_caches and delete a ton of code. This patch performs the following changes: 1) introduces memcg_params.memcg_cache pointer to represent the kmem_cache which will be used for all non-root allocations 2) reuses the existing memcg kmem_cache creation mechanism to create memcg kmem_cache on the first allocation attempt 3) memcg kmem_caches are named <kmemcache_name>-memcg, e.g. dentry-memcg 4) simplifies memcg_kmem_get_cache() to just return memcg kmem_cache or schedule it's creation and return the root cache 5) removes almost all non-root kmem_cache management code (separate refcounter, reparenting, shrinking, etc) 6) makes slab debugfs to display root_mem_cgroup css id and never show :dead and :deact flags in the memcg_slabinfo attribute. Following patches in the series will simplify the kmem_cache creation. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/20200623174037.3951353-13-guro@fb.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07mm: memcg/slab: charge individual slab objects instead of pagesRoman Gushchin
Switch to per-object accounting of non-root slab objects. Charging is performed using obj_cgroup API in the pre_alloc hook. Obj_cgroup is charged with the size of the object and the size of metadata: as now it's the size of an obj_cgroup pointer. If the amount of memory has been charged successfully, the actual allocation code is executed. Otherwise, -ENOMEM is returned. In the post_alloc hook if the actual allocation succeeded, corresponding vmstats are bumped and the obj_cgroup pointer is saved. Otherwise, the charge is canceled. On the free path obj_cgroup pointer is obtained and used to uncharge the size of the releasing object. Memcg and lruvec counters are now representing only memory used by active slab objects and do not include the free space. The free space is shared and doesn't belong to any specific cgroup. Global per-node slab vmstats are still modified from (un)charge_slab_page() functions. The idea is to keep all slab pages accounted as slab pages on system level. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/20200623174037.3951353-10-guro@fb.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07mm: memcg/slab: save obj_cgroup for non-root slab objectsRoman Gushchin
Store the obj_cgroup pointer in the corresponding place of page->obj_cgroups for each allocated non-root slab object. Make sure that each allocated object holds a reference to obj_cgroup. Objcg pointer is obtained from the memcg->objcg dereferencing in memcg_kmem_get_cache() and passed from pre_alloc_hook to post_alloc_hook. Then in case of successful allocation(s) it's getting stored in the page->obj_cgroups vector. The objcg obtaining part look a bit bulky now, but it will be simplified by next commits in the series. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/20200623174037.3951353-9-guro@fb.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07mm: memcg/slab: allocate obj_cgroups for non-root slab pagesRoman Gushchin
Allocate and release memory to store obj_cgroup pointers for each non-root slab page. Reuse page->mem_cgroup pointer to store a pointer to the allocated space. This commit temporarily increases the memory footprint of the kernel memory accounting. To store obj_cgroup pointers we'll need a place for an objcg_pointer for each allocated object. However, the following patches in the series will enable sharing of slab pages between memory cgroups, which will dramatically increase the total slab utilization. And the final memory footprint will be significantly smaller than before. To distinguish between obj_cgroups and memcg pointers in case when it's not obvious which one is used (as in page_cgroup_ino()), let's always set the lowest bit in the obj_cgroup case. The original obj_cgroups pointer is marked to be ignored by kmemleak, which otherwise would report a memory leak for each allocated vector. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/20200623174037.3951353-8-guro@fb.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07mm: memcontrol: decouple reference counting from page accountingJohannes Weiner
The reference counting of a memcg is currently coupled directly to how many 4k pages are charged to it. This doesn't work well with Roman's new slab controller, which maintains pools of objects and doesn't want to keep an extra balance sheet for the pages backing those objects. This unusual refcounting design (reference counts usually track pointers to an object) is only for historical reasons: memcg used to not take any css references and simply stalled offlining until all charges had been reparented and the page counters had dropped to zero. When we got rid of the reparenting requirement, the simple mechanical translation was to take a reference for every charge. More historical context can be found in commit e8ea14cc6ead ("mm: memcontrol: take a css reference for each charged page"), commit 64f219938941 ("mm: memcontrol: remove obsolete kmemcg pinning tricks") and commit b2052564e66d ("mm: memcontrol: continue cache reclaim from offlined groups"). The new slab controller exposes the limitations in this scheme, so let's switch it to a more idiomatic reference counting model based on actual kernel pointers to the memcg: - The per-cpu stock holds a reference to the memcg its caching - User pages hold a reference for their page->mem_cgroup. Transparent huge pages will no longer acquire tail references in advance, we'll get them if needed during the split. - Kernel pages hold a reference for their page->mem_cgroup - Pages allocated in the root cgroup will acquire and release css references for simplicity. css_get() and css_put() optimize that. - The current memcg_charge_slab() already hacked around the per-charge references; this change gets rid of that as well. - tcp accounting will handle reference in mem_cgroup_sk_{alloc,free} Roman: 1) Rebased on top of the current mm tree: added css_get() in mem_cgroup_charge(), dropped mem_cgroup_try_charge() part 2) I've reformatted commit references in the commit log to make checkpatch.pl happy. [hughd@google.com: remove css_put_many() from __mem_cgroup_clear_mc()] Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2007302011450.2347@eggly.anvils Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Roman Gushchin <guro@fb.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Christoph Lameter <cl@linux.com> Cc: Tejun Heo <tj@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Link: http://lkml.kernel.org/r/20200623174037.3951353-6-guro@fb.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07mm: memcg: convert vmstat slab counters to bytesRoman Gushchin
In order to prepare for per-object slab memory accounting, convert NR_SLAB_RECLAIMABLE and NR_SLAB_UNRECLAIMABLE vmstat items to bytes. To make it obvious, rename them to NR_SLAB_RECLAIMABLE_B and NR_SLAB_UNRECLAIMABLE_B (similar to NR_KERNEL_STACK_KB). Internally global and per-node counters are stored in pages, however memcg and lruvec counters are stored in bytes. This scheme may look weird, but only for now. As soon as slab pages will be shared between multiple cgroups, global and node counters will reflect the total number of slab pages. However memcg and lruvec counters will be used for per-memcg slab memory tracking, which will take separate kernel objects in the account. Keeping global and node counters in pages helps to avoid additional overhead. The size of slab memory shouldn't exceed 4Gb on 32-bit machines, so it will fit into atomic_long_t we use for vmstats. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Christoph Lameter <cl@linux.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/20200623174037.3951353-4-guro@fb.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07mm, slab/slub: improve error reporting and overhead of cache_from_obj()Vlastimil Babka
cache_from_obj() was added by commit b9ce5ef49f00 ("sl[au]b: always get the cache from its page in kmem_cache_free()") to support kmemcg, where per-memcg cache can be different from the root one, so we can't use the kmem_cache pointer given to kmem_cache_free(). Prior to that commit, SLUB already had debugging check+warning that could be enabled to compare the given kmem_cache pointer to one referenced by the slab page where the object-to-be-freed resides. This check was moved to cache_from_obj(). Later the check was also enabled for SLAB_FREELIST_HARDENED configs by commit 598a0717a816 ("mm/slab: validate cache membership under freelist hardening"). These checks and warnings can be useful especially for the debugging, which can be improved. Commit 598a0717a816 changed the pr_err() with WARN_ON_ONCE() to WARN_ONCE() so only the first hit is now reported, others are silent. This patch changes it to WARN() so that all errors are reported. It's also useful to print SLUB allocation/free tracking info for the offending object, if tracking is enabled. Thus, export the SLUB print_tracking() function and provide an empty one for SLAB. For SLUB we can also benefit from the static key check in kmem_cache_debug_flags(), but we need to move this function to slab.h and declare the static key there. [1] https://lore.kernel.org/r/20200608230654.828134-18-guro@fb.com [vbabka@suse.cz: avoid bogus WARN()] Link: https://lore.kernel.org/r/20200623090213.GW5535@shao2-debian Link: http://lkml.kernel.org/r/b33e0fa7-cd28-4788-9e54-5927846329ef@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Roman Gushchin <guro@fb.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Matthew Garrett <mjg59@google.com> Cc: Jann Horn <jannh@google.com> Cc: Vijayanand Jitta <vjitta@codeaurora.org> Cc: Vinayak Menon <vinmenon@codeaurora.org> Link: http://lkml.kernel.org/r/afeda7ac-748b-33d8-a905-56b708148ad5@suse.cz Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07mm, slab/slub: move and improve cache_from_obj()Vlastimil Babka
The function cache_from_obj() was added by commit b9ce5ef49f00 ("sl[au]b: always get the cache from its page in kmem_cache_free()") to support kmemcg, where per-memcg cache can be different from the root one, so we can't use the kmem_cache pointer given to kmem_cache_free(). Prior to that commit, SLUB already had debugging check+warning that could be enabled to compare the given kmem_cache pointer to one referenced by the slab page where the object-to-be-freed resides. This check was moved to cache_from_obj(). Later the check was also enabled for SLAB_FREELIST_HARDENED configs by commit 598a0717a816 ("mm/slab: validate cache membership under freelist hardening"). These checks and warnings can be useful especially for the debugging, which can be improved. Commit 598a0717a816 changed the pr_err() with WARN_ON_ONCE() to WARN_ONCE() so only the first hit is now reported, others are silent. This patch changes it to WARN() so that all errors are reported. It's also useful to print SLUB allocation/free tracking info for the offending object, if tracking is enabled. We could export the SLUB print_tracking() function and provide an empty one for SLAB, or realize that both the debugging and hardening cases in cache_from_obj() are only supported by SLUB anyway. So this patch moves cache_from_obj() from slab.h to separate instances in slab.c and slub.c, where the SLAB version only does the kmemcg lookup and even could be completely removed once the kmemcg rework [1] is merged. The SLUB version can thus easily use the print_tracking() function. It can also use the kmem_cache_debug_flags() static key check for improved performance in kernels without the hardening and with debugging not enabled on boot. [1] https://lore.kernel.org/r/20200608230654.828134-18-guro@fb.com Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Christoph Lameter <cl@linux.com> Cc: Jann Horn <jannh@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Vijayanand Jitta <vjitta@codeaurora.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Pekka Enberg <penberg@kernel.org> Link: http://lkml.kernel.org/r/20200610163135.17364-10-vbabka@suse.cz Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07mm, slab: check GFP_SLAB_BUG_MASK before alloc_pages in kmalloc_orderLong Li
kmalloc cannot allocate memory from HIGHMEM. Allocating large amounts of memory currently bypasses the check and will simply leak the memory when page_address() returns NULL. To fix this, factor the GFP_SLAB_BUG_MASK check out of slab & slub, and call it from kmalloc_order() as well. In order to make the code clear, the warning message is put in one place. Signed-off-by: Long Li <lonuxli.64@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Pekka Enberg <penberg@kernel.org> Acked-by: David Rientjes <rientjes@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Link: http://lkml.kernel.org/r/20200704035027.GA62481@lilong Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-26mm, slab: fix sign conversion problem in memcg_uncharge_slab()Waiman Long
It was found that running the LTP test on a PowerPC system could produce erroneous values in /proc/meminfo, like: MemTotal: 531915072 kB MemFree: 507962176 kB MemAvailable: 1100020596352 kB Using bisection, the problem is tracked down to commit 9c315e4d7d8c ("mm: memcg/slab: cache page number in memcg_(un)charge_slab()"). In memcg_uncharge_slab() with a "int order" argument: unsigned int nr_pages = 1 << order; : mod_lruvec_state(lruvec, cache_vmstat_idx(s), -nr_pages); The mod_lruvec_state() function will eventually call the __mod_zone_page_state() which accepts a long argument. Depending on the compiler and how inlining is done, "-nr_pages" may be treated as a negative number or a very large positive number. Apparently, it was treated as a large positive number in that PowerPC system leading to incorrect stat counts. This problem hasn't been seen in x86-64 yet, perhaps the gcc compiler there has some slight difference in behavior. It is fixed by making nr_pages a signed value. For consistency, a similar change is applied to memcg_charge_slab() as well. Link: http://lkml.kernel.org/r/20200620184719.10994-1-longman@redhat.com Fixes: 9c315e4d7d8c ("mm: memcg/slab: cache page number in memcg_(un)charge_slab()"). Signed-off-by: Waiman Long <longman@redhat.com> Acked-by: Roman Gushchin <guro@fb.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-02mm: kmem: rename (__)memcg_kmem_(un)charge_memcg() to __memcg_kmem_(un)charge()Roman Gushchin
Drop the _memcg suffix from (__)memcg_kmem_(un)charge functions. It's shorter and more obvious. These are the most basic functions which are just (un)charging the given cgroup with the given amount of pages. Also fix up the corresponding comments. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Link: http://lkml.kernel.org/r/20200109202659.752357-7-guro@fb.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-02mm: memcg/slab: cache page number in memcg_(un)charge_slab()Roman Gushchin
There are many places in memcg_charge_slab() and memcg_uncharge_slab() which are calculating the number of pages to charge, css references to grab etc depending on the order of the slab page. Let's simplify the code by calculating it once and caching in the local variable. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Link: http://lkml.kernel.org/r/20200109202659.752357-6-guro@fb.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-02mm: kmem: switch to nr_pages in (__)memcg_kmem_charge_memcg()Roman Gushchin
These functions are charging the given number of kernel pages to the given memory cgroup. The number doesn't have to be a power of two. Let's make them to take the unsigned int nr_pages as an argument instead of the page order. It makes them look consistent with the corresponding uncharge functions and functions like: mem_cgroup_charge_skmem(memcg, nr_pages). Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Link: http://lkml.kernel.org/r/20200109202659.752357-5-guro@fb.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-02mm: kmem: cleanup memcg_kmem_uncharge_memcg() argumentsRoman Gushchin
Drop the unused page argument and put the memcg pointer at the first place. This make the function consistent with its peers: __memcg_kmem_uncharge_memcg(), memcg_kmem_charge_memcg(), etc. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Link: http://lkml.kernel.org/r/20200109202659.752357-3-guro@fb.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-02mm: kmem: cleanup (__)memcg_kmem_charge_memcg() argumentsRoman Gushchin
Patch series "mm: memcg: kmem API cleanup", v2. This patchset aims to clean up the kernel memory charging API. It doesn't bring any functional changes, just removes unused arguments, renames some functions and fixes some comments. Currently it's not obvious which functions are most basic (memcg_kmem_(un)charge_memcg()) and which are based on them (memcg_kmem_(un)charge()). The patchset renames these functions and removes unused arguments: TL;DR: was: memcg_kmem_charge_memcg(page, gfp, order, memcg) memcg_kmem_uncharge_memcg(memcg, nr_pages) memcg_kmem_charge(page, gfp, order) memcg_kmem_uncharge(page, order) now: memcg_kmem_charge(memcg, gfp, nr_pages) memcg_kmem_uncharge(memcg, nr_pages) memcg_kmem_charge_page(page, gfp, order) memcg_kmem_uncharge_page(page, order) This patch (of 6): The first argument of memcg_kmem_charge_memcg() and __memcg_kmem_charge_memcg() is the page pointer and it's not used. Let's drop it. Memcg pointer is passed as the last argument. Move it to the first place for consistency with other memcg functions, e.g. __memcg_kmem_uncharge_memcg() or try_charge(). Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Link: http://lkml.kernel.org/r/20200109202659.752357-2-guro@fb.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-12-01mm: clean up and clarify lruvec lookup procedureJohannes Weiner
There is a per-memcg lruvec and a NUMA node lruvec. Which one is being used is somewhat confusing right now, and it's easy to make mistakes - especially when it comes to global reclaim. How it works: when memory cgroups are enabled, we always use the root_mem_cgroup's per-node lruvecs. When memory cgroups are not compiled in or disabled at runtime, we use pgdat->lruvec. Document that in a comment. Due to the way the reclaim code is generalized, all lookups use the mem_cgroup_lruvec() helper function, and nobody should have to find the right lruvec manually right now. But to avoid future mistakes, rename the pgdat->lruvec member to pgdat->__lruvec and delete the convenience wrapper that suggests it's a commonly accessed member. While in this area, swap the mem_cgroup_lruvec() argument order. The name suggests a memcg operation, yet it takes a pgdat first and a memcg second. I have to double take every time I call this. Fix that. Link: http://lkml.kernel.org/r/20191022144803.302233-3-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Roman Gushchin <guro@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-12-01mm, slab: make kmalloc_info[] contain all types of namesPengfei Li
Patch series "mm, slab: Make kmalloc_info[] contain all types of names", v6. There are three types of kmalloc, KMALLOC_NORMAL, KMALLOC_RECLAIM and KMALLOC_DMA. The name of KMALLOC_NORMAL is contained in kmalloc_info[].name, but the names of KMALLOC_RECLAIM and KMALLOC_DMA are dynamically generated by kmalloc_cache_name(). Patch1 predefines the names of all types of kmalloc to save the time spent dynamically generating names. These changes make sense, and the time spent by new_kmalloc_cache() has been reduced by approximately 36.3%. Time spent by new_kmalloc_cache() (CPU cycles) 5.3-rc7 66264 5.3-rc7+patch 42188 This patch (of 3): There are three types of kmalloc, KMALLOC_NORMAL, KMALLOC_RECLAIM and KMALLOC_DMA. The name of KMALLOC_NORMAL is contained in kmalloc_info[].name, but the names of KMALLOC_RECLAIM and KMALLOC_DMA are dynamically generated by kmalloc_cache_name(). This patch predefines the names of all types of kmalloc to save the time spent dynamically generating names. Besides, remove the kmalloc_cache_name() that is no longer used. Link: http://lkml.kernel.org/r/1569241648-26908-2-git-send-email-lpf.vector@gmail.com Signed-off-by: Pengfei Li <lpf.vector@gmail.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Roman Gushchin <guro@fb.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-11-06mm: slab: make page_cgroup_ino() to recognize non-compound slab pages properlyRoman Gushchin
page_cgroup_ino() doesn't return a valid memcg pointer for non-compound slab pages, because it depends on PgHead AND PgSlab flags to be set to determine the memory cgroup from the kmem_cache. It's correct for compound pages, but not for generic small pages. Those don't have PgHead set, so it ends up returning zero. Fix this by replacing the condition to PageSlab() && !PageTail(). Before this patch: [root@localhost ~]# ./page-types -c /sys/fs/cgroup/user.slice/user-0.slice/user@0.service/ | grep slab 0x0000000000000080 38 0 _______S___________________________________ slab After this patch: [root@localhost ~]# ./page-types -c /sys/fs/cgroup/user.slice/user-0.slice/user@0.service/ | grep slab 0x0000000000000080 147 0 _______S___________________________________ slab Also, hwpoison_filter_task() uses output of page_cgroup_ino() in order to filter error injection events based on memcg. So if page_cgroup_ino() fails to return memcg pointer, we just fail to inject memory error. Considering that hwpoison filter is for testing, affected users are limited and the impact should be marginal. [n-horiguchi@ah.jp.nec.com: changelog additions] Link: http://lkml.kernel.org/r/20191031012151.2722280-1-guro@fb.com Fixes: 4d96ba353075 ("mm: memcg/slab: stop setting page->mem_cgroup pointer for slab pages") Signed-off-by: Roman Gushchin <guro@fb.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-24