From 99cb252f5e68d72afa3245a4e73d216d295cd335 Mon Sep 17 00:00:00 2001 From: Jason Gunthorpe Date: Tue, 12 Nov 2019 16:22:19 -0400 Subject: mm/mmu_notifier: add an interval tree notifier MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Of the 13 users of mmu_notifiers, 8 of them use only invalidate_range_start/end() and immediately intersect the mmu_notifier_range with some kind of internal list of VAs. 4 use an interval tree (i915_gem, radeon_mn, umem_odp, hfi1). 4 use a linked list of some kind (scif_dma, vhost, gntdev, hmm) And the remaining 5 either don't use invalidate_range_start() or do some special thing with it. It turns out that building a correct scheme with an interval tree is pretty complicated, particularly if the use case is synchronizing against another thread doing get_user_pages(). Many of these implementations have various subtle and difficult to fix races. This approach puts the interval tree as common code at the top of the mmu notifier call tree and implements a shareable locking scheme. It includes: - An interval tree tracking VA ranges, with per-range callbacks - A read/write locking scheme for the interval tree that avoids sleeping in the notifier path (for OOM killer) - A sequence counter based collision-retry locking scheme to tell device page fault that a VA range is being concurrently invalidated. This is based on various ideas: - hmm accumulates invalidated VA ranges and releases them when all invalidates are done, via active_invalidate_ranges count. This approach avoids having to intersect the interval tree twice (as umem_odp does) at the potential cost of a longer device page fault. - kvm/umem_odp use a sequence counter to drive the collision retry, via invalidate_seq - a deferred work todo list on unlock scheme like RTNL, via deferred_list. This makes adding/removing interval tree members more deterministic - seqlock, except this version makes the seqlock idea multi-holder on the write side by protecting it with active_invalidate_ranges and a spinlock To minimize MM overhead when only the interval tree is being used, the entire SRCU and hlist overheads are dropped using some simple branches. Similarly the interval tree overhead is dropped when in hlist mode. The overhead from the mandatory spinlock is broadly the same as most of existing users which already had a lock (or two) of some sort on the invalidation path. Link: https://lore.kernel.org/r/20191112202231.3856-3-jgg@ziepe.ca Acked-by: Christian König Tested-by: Philip Yang Tested-by: Ralph Campbell Reviewed-by: John Hubbard Reviewed-by: Christoph Hellwig Signed-off-by: Jason Gunthorpe --- mm/Kconfig | 1 + 1 file changed, 1 insertion(+) (limited to 'mm/Kconfig') diff --git a/mm/Kconfig b/mm/Kconfig index a5dae9a7eb51..d0b5046d9aef 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -284,6 +284,7 @@ config VIRT_TO_BUS config MMU_NOTIFIER bool select SRCU + select INTERVAL_TREE config KSM bool "Enable KSM for page merging" -- cgit v1.2.3 From a22dd506400d0f4784ad596f073b9eb5ed7c6a2a Mon Sep 17 00:00:00 2001 From: Jason Gunthorpe Date: Tue, 12 Nov 2019 16:22:30 -0400 Subject: mm/hmm: remove hmm_mirror and related MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The only two users of this are now converted to use mmu_interval_notifier, delete all the code and update hmm.rst. Link: https://lore.kernel.org/r/20191112202231.3856-14-jgg@ziepe.ca Reviewed-by: Jérôme Glisse Tested-by: Ralph Campbell Reviewed-by: Christoph Hellwig Signed-off-by: Jason Gunthorpe --- mm/Kconfig | 1 - 1 file changed, 1 deletion(-) (limited to 'mm/Kconfig') diff --git a/mm/Kconfig b/mm/Kconfig index d0b5046d9aef..e38ff1d5968d 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -675,7 +675,6 @@ config DEV_PAGEMAP_OPS config HMM_MIRROR bool depends on MMU - depends on MMU_NOTIFIER config DEVICE_PRIVATE bool "Unaddressable device memory (GPU memory, ...)" -- cgit v1.2.3