summaryrefslogtreecommitdiffstats
path: root/Documentation
diff options
context:
space:
mode:
authorHugh Dickins <hughd@google.com>2020-12-15 14:21:31 -0800
committerLinus Torvalds <torvalds@linux-foundation.org>2020-12-15 14:48:04 -0800
commit15b447361794271f4d03c04d82276a841fe06328 (patch)
tree1b43dc5bf2be76c4dba7ab3ab60c70d6e3d5a774 /Documentation
parent2a5e4e340b0fe0f8d402196a466887db6a270b9b (diff)
mm/lru: revise the comments of lru_lock
Since we changed the pgdat->lru_lock to lruvec->lru_lock, it's time to fix the incorrect comments in code. Also fixed some zone->lru_lock comment error from ancient time. etc. I struggled to understand the comment above move_pages_to_lru() (surely it never calls page_referenced()), and eventually realized that most of it had got separated from shrink_active_list(): move that comment back. Link: https://lkml.kernel.org/r/1604566549-62481-20-git-send-email-alex.shi@linux.alibaba.com Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Tejun Heo <tj@kernel.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Jann Horn <jannh@google.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Matthew Wilcox <willy@infradead.org> Cc: Alexander Duyck <alexander.duyck@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: "Chen, Rong A" <rong.a.chen@intel.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mika Penttilä <mika.penttila@nextfour.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/admin-guide/cgroup-v1/memcg_test.rst15
-rw-r--r--Documentation/admin-guide/cgroup-v1/memory.rst21
-rw-r--r--Documentation/trace/events-kmem.rst2
-rw-r--r--Documentation/vm/unevictable-lru.rst22
4 files changed, 21 insertions, 39 deletions
diff --git a/Documentation/admin-guide/cgroup-v1/memcg_test.rst b/Documentation/admin-guide/cgroup-v1/memcg_test.rst
index 4f83de2dab6e..45b94f7b3beb 100644
--- a/Documentation/admin-guide/cgroup-v1/memcg_test.rst
+++ b/Documentation/admin-guide/cgroup-v1/memcg_test.rst
@@ -133,18 +133,9 @@ Under below explanation, we assume CONFIG_MEM_RES_CTRL_SWAP=y.
8. LRU
======
- Each memcg has its own private LRU. Now, its handling is under global
- VM's control (means that it's handled under global pgdat->lru_lock).
- Almost all routines around memcg's LRU is called by global LRU's
- list management functions under pgdat->lru_lock.
-
- A special function is mem_cgroup_isolate_pages(). This scans
- memcg's private LRU and call __isolate_lru_page() to extract a page
- from LRU.
-
- (By __isolate_lru_page(), the page is removed from both of global and
- private LRU.)
-
+ Each memcg has its own vector of LRUs (inactive anon, active anon,
+ inactive file, active file, unevictable) of pages from each node,
+ each LRU handled under a single lru_lock for that memcg and node.
9. Typical Tests.
=================
diff --git a/Documentation/admin-guide/cgroup-v1/memory.rst b/Documentation/admin-guide/cgroup-v1/memory.rst
index a44cd467d218..52688ae34461 100644
--- a/Documentation/admin-guide/cgroup-v1/memory.rst
+++ b/Documentation/admin-guide/cgroup-v1/memory.rst
@@ -287,20 +287,17 @@ When oom event notifier is registered, event will be delivered.
2.6 Locking
-----------
- lock_page_cgroup()/unlock_page_cgroup() should not be called under
- the i_pages lock.
+Lock order is as follows:
- Other lock order is following:
+ Page lock (PG_locked bit of page->flags)
+ mm->page_table_lock or split pte_lock
+ lock_page_memcg (memcg->move_lock)
+ mapping->i_pages lock
+ lruvec->lru_lock.
- PG_locked.
- mm->page_table_lock
- pgdat->lru_lock
- lock_page_cgroup.
-
- In many cases, just lock_page_cgroup() is called.
-
- per-zone-per-cgroup LRU (cgroup's private LRU) is just guarded by
- pgdat->lru_lock, it has no lock of its own.
+Per-node-per-memcgroup LRU (cgroup's private LRU) is guarded by
+lruvec->lru_lock; PG_lru bit of page->flags is cleared before
+isolating a page from its LRU under lruvec->lru_lock.
2.7 Kernel Memory Extension (CONFIG_MEMCG_KMEM)
-----------------------------------------------
diff --git a/Documentation/trace/events-kmem.rst b/Documentation/trace/events-kmem.rst
index 555484110e36..68fa75247488 100644
--- a/Documentation/trace/events-kmem.rst
+++ b/Documentation/trace/events-kmem.rst
@@ -69,7 +69,7 @@ When pages are freed in batch, the also mm_page_free_batched is triggered.
Broadly speaking, pages are taken off the LRU lock in bulk and
freed in batch with a page list. Significant amounts of activity here could
indicate that the system is under memory pressure and can also indicate
-contention on the zone->lru_lock.
+contention on the lruvec->lru_lock.
4. Per-CPU Allocator Activity
=============================
diff --git a/Documentation/vm/unevictable-lru.rst b/Documentation/vm/unevictable-lru.rst
index 17d0861b0f1d..0e1490524f53 100644
--- a/Documentation/vm/unevictable-lru.rst
+++ b/Documentation/vm/unevictable-lru.rst
@@ -33,7 +33,7 @@ reclaim in Linux. The problems have been observed at customer sites on large
memory x86_64 systems.
To illustrate this with an example, a non-NUMA x86_64 platform with 128GB of
-main memory will have over 32 million 4k pages in a single zone. When a large
+main memory will have over 32 million 4k pages in a single node. When a large
fraction of these pages are not evictable for any reason [see below], vmscan
will spend a lot of time scanning the LRU lists looking for the small fraction
of pages that are evictable. This can result in a situation where all CPUs are
@@ -55,7 +55,7 @@ unevictable, either by definition or by circumstance, in the future.
The Unevictable Page List
-------------------------
-The Unevictable LRU infrastructure consists of an additional, per-zone, LRU list
+The Unevictable LRU infrastructure consists of an additional, per-node, LRU list
called the "unevictable" list and an associated page flag, PG_unevictable, to
indicate that the page is being managed on the unevictable list.
@@ -84,15 +84,9 @@ The unevictable list does not differentiate between file-backed and anonymous,
swap-backed pages. This differentiation is only important while the pages are,
in fact, evictable.
-The unevictable list benefits from the "arrayification" of the per-zone LRU
+The unevictable list benefits from the "arrayification" of the per-node LRU
lists and statistics originally proposed and posted by Christoph Lameter.
-The unevictable list does not use the LRU pagevec mechanism. Rather,
-unevictable pages are placed directly on the page's zone's unevictable list
-under the zone lru_lock. This allows us to prevent the stranding of pages on
-the unevictable list when one task has the page isolated from the LRU and other
-tasks are changing the "evictability" state of the page.
-
Memory Control Group Interaction
--------------------------------
@@ -101,8 +95,8 @@ The unevictable LRU facility interacts with the memory control group [aka
memory controller; see Documentation/admin-guide/cgroup-v1/memory.rst] by extending the
lru_list enum.
-The memory controller data structure automatically gets a per-zone unevictable
-list as a result of the "arrayification" of the per-zone LRU lists (one per
+The memory controller data structure automatically gets a per-node unevictable
+list as a result of the "arrayification" of the per-node LRU lists (one per
lru_list enum element). The memory controller tracks the movement of pages to
and from the unevictable list.
@@ -196,7 +190,7 @@ for the sake of expediency, to leave a unevictable page on one of the regular
active/inactive LRU lists for vmscan to deal with. vmscan checks for such
pages in all of the shrink_{active|inactive|page}_list() functions and will
"cull" such pages that it encounters: that is, it diverts those pages to the
-unevictable list for the zone being scanned.
+unevictable list for the node being scanned.
There may be situations where a page is mapped into a VM_LOCKED VMA, but the
page is not marked as PG_mlocked. Such pages will make it all the way to
@@ -328,7 +322,7 @@ If the page was NOT already mlocked, mlock_vma_page() attempts to isolate the
page from the LRU, as it is likely on the appropriate active or inactive list
at that time. If the isolate_lru_page() succeeds, mlock_vma_page() will put
back the page - by calling putback_lru_page() - which will notice that the page
-is now mlocked and divert the page to the zone's unevictable list. If
+is now mlocked and divert the page to the node's unevictable list. If
mlock_vma_page() is unable to isolate the page from the LRU, vmscan will handle
it later if and when it attempts to reclaim the page.
@@ -603,7 +597,7 @@ Some examples of these unevictable pages on the LRU lists are:
unevictable list in mlock_vma_page().
shrink_inactive_list() also diverts any unevictable pages that it finds on the
-inactive lists to the appropriate zone's unevictable list.
+inactive lists to the appropriate node's unevictable list.
shrink_inactive_list() should only see SHM_LOCK'd pages that became SHM_LOCK'd
after shrink_active_list() had moved them to the inactive list, or pages mapped