summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)Author
2013-09-10shrinker: add node awarenessDave Chinner
Pass the node of the current zone being reclaimed to shrink_slab(), allowing the shrinker control nodemask to be set appropriately for node aware shrinkers. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Glauber Costa <glommer@openvz.org> Acked-by: Mel Gorman <mgorman@suse.de> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Cc: Arve Hjønnevåg <arve@android.com> Cc: Carlos Maiolino <cmaiolino@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Rientjes <rientjes@google.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: J. Bruce Fields <bfields@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Stultz <john.stultz@linaro.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Kent Overstreet <koverstreet@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Thomas Hellstrom <thellstrom@vmware.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-10list_lru: remove special case function list_lru_dispose_all.Glauber Costa
The list_lru implementation has one function, list_lru_dispose_all, with only one user (the dentry code). At first, such function appears to make sense because we are really not interested in the result of isolating each dentry separately - all of them are going away anyway. However, it's implementation is buggy in the following way: When we call list_lru_dispose_all in fs/dcache.c, we scan all dentries marking them with DCACHE_SHRINK_LIST. However, this is done without the nlru->lock taken. The imediate result of that is that someone else may add or remove the dentry from the LRU at the same time. When list_lru_del happens in that scenario we will see an element that is not yet marked with DCACHE_SHRINK_LIST (even though it will be in the future) and obviously remove it from an lru where the element no longer is. Since list_lru_dispose_all will in effect count down nlru's nr_items and list_lru_del will do the same, this will lead to an imbalance. The solution for this would not be so simple: we can obviously just keep the lru_lock taken, but then we have no guarantees that we will be able to acquire the dentry lock (dentry->d_lock). To properly solve this, we need a communication mechanism between the lru and dentry code, so they can coordinate this with each other. Such mechanism already exists in the form of the list_lru_walk_cb callback. So it is possible to construct a dcache-side prune function that does the right thing only by calling list_lru_walk in a loop until no more dentries are available. With only one user, plus the fact that a sane solution for the problem would involve boucing between dcache and list_lru anyway, I see little justification to keep the special case list_lru_dispose_all in tree. Signed-off-by: Glauber Costa <glommer@openvz.org> Cc: Michal Hocko <mhocko@suse.cz> Acked-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-10list_lru: per-node APIGlauber Costa
This patch adapts the list_lru API to accept an optional node argument, to be used by NUMA aware shrinking functions. Code that does not care about the NUMA placement of objects can still call into the very same functions as before. They will simply iterate over all nodes. Signed-off-by: Glauber Costa <glommer@openvz.org> Cc: Dave Chinner <dchinner@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Cc: Arve Hjønnevåg <arve@android.com> Cc: Carlos Maiolino <cmaiolino@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Rientjes <rientjes@google.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: J. Bruce Fields <bfields@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Stultz <john.stultz@linaro.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Kent Overstreet <koverstreet@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Thomas Hellstrom <thellstrom@vmware.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-10list_lru: fix broken LRU_RETRY behaviourDave Chinner
The LRU_RETRY code assumes that the list traversal status after we have dropped and regained the list lock. Unfortunately, this is not a valid assumption, and that can lead to racing traversals isolating objects that the other traversal expects to be the next item on the list. This is causing problems with the inode cache shrinker isolation, with races resulting in an inode on a dispose list being "isolated" because a racing traversal still thinks it is on the LRU. The inode is then never reclaimed and that causes hangs if a subsequent lookup on that inode occurs. Fix it by always restarting the list walk on a LRU_RETRY return from the isolate callback. Avoid the possibility of livelocks the current code was trying to avoid by always decrementing the nr_to_walk counter on retries so that even if we keep hitting the same item on the list we'll eventually stop trying to walk and exit out of the situation causing the problem. Reported-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Dave Chinner <dchinner@redhat.com> Cc: Glauber Costa <glommer@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-10list_lru: per-node list infrastructureDave Chinner
Now that we have an LRU list API, we can start to enhance the implementation. This splits the single LRU list into per-node lists and locks to enhance scalability. Items are placed on lists according to the node the memory belongs to. To make scanning the lists efficient, also track whether the per-node lists have entries in them in a active nodemask. Note: We use a fixed-size array for the node LRU, this struct can be very big if MAX_NUMNODES is big. If this becomes a problem this is fixable by turning this into a pointer and dynamically allocating this to nr_node_ids. This quantity is firwmare-provided, and still would provide room for all nodes at the cost of a pointer lookup and an extra allocation. Because that allocation will most likely come from a may very well fail. [glommer@openvz.org: fix warnings, added note about node lru] Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Glauber Costa <glommer@openvz.org> Reviewed-by: Greg Thelen <gthelen@google.com> Acked-by: Mel Gorman <mgorman@suse.de> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Cc: Arve Hjønnevåg <arve@android.com> Cc: Carlos Maiolino <cmaiolino@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Rientjes <rientjes@google.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: J. Bruce Fields <bfields@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Stultz <john.stultz@linaro.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Kent Overstreet <koverstreet@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Thomas Hellstrom <thellstrom@vmware.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-10dcache: convert to use new lru list infrastructureDave Chinner
[glommer@openvz.org: don't reintroduce double decrement of nr_unused_dentries, adapted for new LRU return codes] Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Glauber Costa <glommer@openvz.org> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Cc: Arve Hjønnevåg <arve@android.com> Cc: Carlos Maiolino <cmaiolino@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Rientjes <rientjes@google.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: J. Bruce Fields <bfields@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Stultz <john.stultz@linaro.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Kent Overstreet <koverstreet@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Thomas Hellstrom <thellstrom@vmware.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-10inode: move inode to a different list inside lockGlauber Costa
When removing an element from the lru, this will be done today after the lock is released. This is a clear mistake, although we are not sure if the bugs we are seeing are related to this. All list manipulations are done inside the lock, and so should this one. Signed-off-by: Glauber Costa <glommer@openvz.org> Tested-by: Michal Hocko <mhocko@suse.cz> Cc: Dave Chinner <dchinner@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-10inode: convert inode lru list to generic lru list code.Dave Chinner
[glommer@openvz.org: adapted for new LRU return codes] Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Glauber Costa <glommer@openvz.org> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Cc: Arve Hjønnevåg <arve@android.com> Cc: Carlos Maiolino <cmaiolino@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Rientjes <rientjes@google.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: J. Bruce Fields <bfields@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Stultz <john.stultz@linaro.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Kent Overstreet <koverstreet@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Thomas Hellstrom <thellstrom@vmware.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-10list: add a new LRU list typeDave Chinner
Several subsystems use the same construct for LRU lists - a list head, a spin lock and and item count. They also use exactly the same code for adding and removing items from the LRU. Create a generic type for these LRU lists. This is the beginning of generic, node aware LRUs for shrinkers to work with. [glommer@openvz.org: enum defined constants for lru. Suggested by gthelen, don't relock over retry] Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Glauber Costa <glommer@openvz.org> Reviewed-by: Greg Thelen <gthelen@google.com> Acked-by: Mel Gorman <mgorman@suse.de> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Cc: Arve Hjønnevåg <arve@android.com> Cc: Carlos Maiolino <cmaiolino@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Rientjes <rientjes@google.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: J. Bruce Fields <bfields@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Stultz <john.stultz@linaro.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Kent Overstreet <koverstreet@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Thomas Hellstrom <thellstrom@vmware.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-10shrinker: convert superblock shrinkers to new APIDave Chinner
Convert superblock shrinker to use the new count/scan API, and propagate the API changes through to the filesystem callouts. The filesystem callouts already use a count/scan API, so it's just changing counters to longs to match the VM API. This requires the dentry and inode shrinker callouts to be converted to the count/scan API. This is mainly a mechanical change. [glommer@openvz.org: use mult_frac for fractional proportions, build fixes] Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Glauber Costa <glommer@openvz.org> Acked-by: Mel Gorman <mgorman@suse.de> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Cc: Arve Hjønnevåg <arve@android.com> Cc: Carlos Maiolino <cmaiolino@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Rientjes <rientjes@google.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: J. Bruce Fields <bfields@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Stultz <john.stultz@linaro.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Kent Overstreet <koverstreet@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Thomas Hellstrom <thellstrom@vmware.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-10mm: new shrinker APIDave Chinner
The current shrinker callout API uses an a single shrinker call for multiple functions. To determine the function, a special magical value is passed in a parameter to change the behaviour. This complicates the implementation and return value specification for the different behaviours. Separate the two different behaviours into separate operations, one to return a count of freeable objects in the cache, and another to scan a certain number of objects in the cache for freeing. In defining these new operations, ensure the return values and resultant behaviours are clearly defined and documented. Modify shrink_slab() to use the new API and implement the callouts for all the existing shrinkers. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Glauber Costa <glommer@parallels.com> Acked-by: Mel Gorman <mgorman@suse.de> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Cc: Arve Hjønnevåg <arve@android.com> Cc: Carlos Maiolino <cmaiolino@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Rientjes <rientjes@google.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: J. Bruce Fields <bfields@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Stultz <john.stultz@linaro.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Kent Overstreet <koverstreet@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Thomas Hellstrom <thellstrom@vmware.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-10dcache: remove dentries from LRU before putting on dispose listDave Chinner
One of the big problems with modifying the way the dcache shrinker and LRU implementation works is that the LRU is abused in several ways. One of these is shrink_dentry_list(). Basically, we can move a dentry off the LRU onto a different list without doing any accounting changes, and then use dentry_lru_prune() to remove it from what-ever list it is now on to do the LRU accounting at that point. This makes it -really hard- to change the LRU implementation. The use of the per-sb LRU lock serialises movement of the dentries between the different lists and the removal of them, and this is the only reason that it works. If we want to break up the dentry LRU lock and lists into, say, per-node lists, we remove the only serialisation that allows this lru list/dispose list abuse to work. To make this work effectively, the dispose list has to be isolated from the LRU list - dentries have to be removed from the LRU *before* being placed on the dispose list. This means that the LRU accounting and isolation is completed before disposal is started, and that means we can change the LRU implementation freely in future. This means that dentries *must* be marked with DCACHE_SHRINK_LIST when they are placed on the dispose list so that we don't think that parent dentries found in try_prune_one_dentry() are on the LRU when the are actually on the dispose list. This would result in accounting the dentry to the LRU a second time. Hence dentry_lru_del() has to handle the DCACHE_SHRINK_LIST case Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Glauber Costa <glommer@openvz.org> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Cc: Arve Hjønnevåg <arve@android.com> Cc: Carlos Maiolino <cmaiolino@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Rientjes <rientjes@google.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: J. Bruce Fields <bfields@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Stultz <john.stultz@linaro.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Kent Overstreet <koverstreet@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Thomas Hellstrom <thellstrom@vmware.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-10dentry: move to per-sb LRU locksDave Chinner
With the dentry LRUs being per-sb structures, there is no real need for a global dentry_lru_lock. The locking can be made more fine-grained by moving to a per-sb LRU lock, isolating the LRU operations of different filesytsems completely from each other. The need for this is independent of any performance consideration that may arise: in the interest of abstracting the lru operations away, it is mandatory that each lru works around its own lock instead of a global lock for all of them. [glommer@openvz.org: updated changelog ] Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Glauber Costa <glommer@openvz.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Mel Gorman <mgorman@suse.de> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Cc: Arve Hjønnevåg <arve@android.com> Cc: Carlos Maiolino <cmaiolino@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Rientjes <rientjes@google.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: J. Bruce Fields <bfields@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Stultz <john.stultz@linaro.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Kent Overstreet <koverstreet@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Thomas Hellstrom <thellstrom@vmware.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-10dcache: convert dentry_stat.nr_unused to per-cpu countersDave Chinner
Before we split up the dcache_lru_lock, the unused dentry counter needs to be made independent of the global dcache_lru_lock. Convert it to per-cpu counters to do this. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Glauber Costa <glommer@openvz.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Mel Gorman <mgorman@suse.de> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Cc: Arve Hjønnevåg <arve@android.com> Cc: Carlos Maiolino <cmaiolino@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Rientjes <rientjes@google.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: J. Bruce Fields <bfields@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Stultz <john.stultz@linaro.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Kent Overstreet <koverstreet@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Thomas Hellstrom <thellstrom@vmware.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-10super: fix calculation of shrinkable objects for small numbersGlauber Costa
The sysctl knob sysctl_vfs_cache_pressure is used to determine which percentage of the shrinkable objects in our cache we should actively try to shrink. It works great in situations in which we have many objects (at least more than 100), because the aproximation errors will be negligible. But if this is not the case, specially when total_objects < 100, we may end up concluding that we have no objects at all (total / 100 = 0, if total < 100). This is certainly not the biggest killer in the world, but may matter in very low kernel memory situations. Signed-off-by: Glauber Costa <glommer@openvz.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Mel Gorman <mgorman@suse.de> Cc: Dave Chinner <david@fromorbit.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Cc: Arve Hjønnevåg <arve@android.com> Cc: Carlos Maiolino <cmaiolino@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Rientjes <rientjes@google.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: J. Bruce Fields <bfields@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Stultz <john.stultz@linaro.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Kent Overstreet <koverstreet@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Thomas Hellstrom <thellstrom@vmware.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-10fs: bump inode and dentry counters to longGlauber Costa
This series reworks our current object cache shrinking infrastructure in two main ways: * Noticing that a lot of users copy and paste their own version of LRU lists for objects, we put some effort in providing a generic version. It is modeled after the filesystem users: dentries, inodes, and xfs (for various tasks), but we expect that other users could benefit in the near future with little or no modification. Let us know if you have any issues. * The underlying list_lru being proposed automatically and transparently keeps the elements in per-node lists, and is able to manipulate the node lists individually. Given this infrastructure, we are able to modify the up-to-now hammer called shrink_slab to proceed with node-reclaim instead of always searching memory from all over like it has been doing. Per-node lru lists are also expected to lead to less contention in the lru locks on multi-node scans, since we are now no longer fighting for a global lock. The locks usually disappear from the profilers with this change. Although we have no official benchmarks for this version - be our guest to independently evaluate this - earlier versions of this series were performance tested (details at http://permalink.gmane.org/gmane.linux.kernel.mm/100537) yielding no visible performance regressions while yielding a better qualitative behavior in NUMA machines. With this infrastructure in place, we can use the list_lru entry point to provide memcg isolation and per-memcg targeted reclaim. Historically, those two pieces of work have been posted together. This version presents only the infrastructure work, deferring the memcg work for a later time, so we can focus on getting this part tested. You can see more about the history of such work at http://lwn.net/Articles/552769/ Dave Chinner (18): dcache: convert dentry_stat.nr_unused to per-cpu counters dentry: move to per-sb LRU locks dcache: remove dentries from LRU before putting on dispose list mm: new shrinker API shrinker: convert superblock shrinkers to new API list: add a new LRU list type inode: convert inode lru list to generic lru list code. dcache: convert to use new lru list infrastructure list_lru: per-node list infrastructure shrinker: add node awareness fs: convert inode and dentry shrinking to be node aware xfs: convert buftarg LRU to generic code xfs: rework buffer dispose list tracking xfs: convert dquot cache lru to list_lru fs: convert fs shrinkers to new scan/count API drivers: convert shrinkers to new count/scan API shrinker: convert remaining shrinkers to count/scan API shrinker: Kill old ->shrink API. Glauber Costa (7): fs: bump inode and dentry counters to long super: fix calculation of shrinkable objects for small numbers list_lru: per-node API vmscan: per-node deferred work i915: bail out earlier when shrinker cannot acquire mutex hugepage: convert huge zero page shrinker to new shrinker API list_lru: dynamically adjust node arrays This patch: There are situations in very large machines in which we can have a large quantity of dirty inodes, unused dentries, etc. This is particularly true when umounting a filesystem, where eventually since every live object will eventually be discarded. Dave Chinner reported a problem with this while experimenting with the shrinker revamp patchset. So we believe it is time for a change. This patch just moves int to longs. Machines where it matters should have a big long anyway. Signed-off-by: Glauber Costa <glommer@openvz.org> Cc: Dave Chinner <dchinner@redhat.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Cc: Arve Hjønnevåg <arve@android.com> Cc: Carlos Maiolino <cmaiolino@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Dave Chinner <dchinner@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: J. Bruce Fields <bfields@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Stultz <john.stultz@linaro.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Kent Overstreet <koverstreet@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Thomas Hellstrom <thellstrom@vmware.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-10Add missing unlocks to error paths of mountpoint_last.Dave Jones
Signed-off-by: Dave Jones <davej@fedoraproject.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-10... and fold the renamed __vfs_follow_link() into its only callerAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-10fs: remove vfs_follow_linkChristoph Hellwig
For a long time no filesystem has been using vfs_follow_link, and as seen by recent filesystem submissions any new use is accidental as well. Remove vfs_follow_link, document the replacement in Documentation/filesystems/porting and also rename __vfs_follow_link to match its only caller better. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-10Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs pile 3 (of many) from Al Viro: "Waiman's conversion of d_path() and bits related to it, kern_path_mountpoint(), several cleanups and fixes (exportfs one is -stable fodder, IMO). There definitely will be more... ;-/" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: split read_seqretry_or_unlock(), convert d_walk() to resulting primitives dcache: Translating dentry into pathname without taking rename_lock autofs4 - fix device ioctl mount lookup introduce kern_path_mountpoint() rename user_path_umountat() to user_path_mountpoint_at() take unlazy_walk() into umount_lookup_last() Kill indirect include of file.h from eventfd.h, use fdget() in cgroup.c prune_super(): sb->s_op is never NULL exportfs: don't assume that ->iterate() won't feed us too long entries afs: get rid of redundant ->d_name.len checks
2013-09-10vfs: make sure we don't have a stale root path if unlazy_walk() failsLinus Torvalds
When I moved the RCU walk termination into unlazy_walk(), I didn't copy quite all of it: for the successful RCU termination we properly add the necessary reference counts to our temporary copy of the root path, but for the failure case we need to make sure that any temporary root path information is cleared out (since it does _not_ have the proper reference counts from the RCU lookup). We could clean up this mess by just always dropping the temporary root information, but Al points out that that would mean that a single lookup through symlinks could see multiple different root entries if it races with another thread doing chroot. Not that I think we should really care (we had that before too, back before we had a copy of the root path in the nameidata). Al says he has a cunning plan. In the meantime, this is the minimal fix for the problem, even if it's not all that pretty. Reported-by: Mace Moneta <moneta.mace@gmail.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-09Merge tag 'dmaengine-3.12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/djbw/dmaengine Pull dmaengine update from Dan Williams: "Collection of random updates to the core and some end-driver fixups for ioatdma and mv_xor: - NUMA aware channel allocation - Cleanup dmatest debugfs interface - ioat: make raid-support Atom only - mv_xor: big endian Aside from the top three commits these have all had some soak time in -next. The top commit fixes a recent build breakage. It has been a long while since my last pull request, hopefully it does not show. Thanks to Vinod for keeping an eye on drivers/dma/ this past year" * tag 'dmaengine-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/dmaengine: dmaengine: dma_sync_wait and dma_find_channel undefined MAINTAINERS: update email for Dan Williams dma: mv_xor: Fix incorrect error path ioatdma: silence GCC warnings dmaengine: make dma_channel_rebalance() NUMA aware dmaengine: make dma_submit_error() return an error code ioatdma: disable RAID on non-Atom platforms and reenable unaligned copies mv_xor: support big endian systems using descriptor swap feature mv_xor: use {readl, writel}_relaxed instead of __raw_{readl, writel} dmatest: print message on debug level in case of no error dmatest: remove IS_ERR_OR_NULL checks of debugfs calls dmatest: make module parameters writable
2013-09-09dmaengine: dma_sync_wait and dma_find_channel undefinedJon Mason
dma_sync_wait and dma_find_channel are declared regardless of whether CONFIG_DMA_ENGINE is enabled, but calling the function without CONFIG_DMA_ENGINE enabled results "undefined reference" errors. To get around this, declare dma_sync_wait and dma_find_channel as inline functions if CONFIG_DMA_ENGINE is undefined. Signed-off-by: Jon Mason <jon.mason@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2013-09-09Merge tag 'late-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC late changes from Kevin Hilman: "These are changes that arrived a little late before the merge window, or had dependencies on previous branches. Highlights: - ux500: misc. cleanup, fixup I2C devices - exynos: DT updates for RTC; PM updates - at91: DT updates for NAND; new platforms added to generic defconfig - sunxi: DT updates: cubieboard2, pinctrl driver, gated clocks - highbank: LPAE fixes, select necessary ARM errata - omap: PM fixes and improvements; OMAP5 mailbox support - omap: basic support for new DRA7xx SoCs" * tag 'late-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (60 commits) ARM: dts: vexpress: Add CCI node to TC2 device-tree ARM: EXYNOS: Skip C1 cpuidle state for exynos5440 ARM: EXYNOS: always enable PM domains support for EXYNOS4X12 ARM: highbank: clean-up some unused includes ARM: sun7i: Enable the A20 clocks in the DTSI ARM: sun6i: Enable clock support in the DTSI ARM: sun5i: dt: Use the A10s gates in the DTSI ARM: at91: at91_dt_defconfig: enable rm9200 support ARM: dts: add ADC device tree node for exynos5420/5250 ARM: dts: Add RTC DT node to Exynos5420 SoC ARM: dts: Update the "status" property of RTC DT node for Exynos5250 SoC ARM: dts: Fix the RTC DT node name for Exynos5250 irqchip: mmp: avoid to include irqs head file ARM: mmp: avoid to include head file in mach-mmp irqchip: mmp: support irqchip irqchip: move mmp irq driver ARM: OMAP: AM33xx: clock: Add RNG clock data ARM: OMAP: TI81XX: add always-on powerdomain for TI81XX ARM: OMAP4: clock: Lock PLLs in the right sequence ARM: OMAP: AM33XX: hwmod: Add hwmod data for debugSS ...
2013-09-09Merge tag 'renesas-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM Renesas SoC cleanup, refactoring and more SMP support from Kevin Hilman: "Lots of cleanup and refactoring and some SMP additions for Renesas platforms. Due to some inter-dependencies with other arm-soc branches, this Renesas stuff was separated out for sending after the other branches were merged. Highlights: - remove unused board support and cleanup of unused headers - refactoring of init and device registration - simplify IRQ initialization" * tag 'renesas-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (68 commits) ARM: shmobile: Per-CPU SMP boot / sleep code for SCU SoCs ARM: shmobile: Introduce per-CPU SMP boot / sleep code ARM: shmobile: Use shared SCU CPU Hotplug code on r8a7779 ARM: shmobile: Use shared SCU CPU Hotplug code on sh73a0 ARM: shmobile: Add shared SCU CPU Hotplug code ARM: shmobile: Use shared SCU SMP boot code on emev2 ARM: shmobile: Use shared SCU SMP boot code on r8a7779 ARM: shmobile: Use shared SCU SMP boot code on sh73a0 ARM: shmobile: Introduce shared SCU SMP boot code ARM: shmobile: sh73a0: Remove global GPIO_NR definition ARM: shmobile: kzm9d: remove nfsroot settings from bootargs ARM: shmobile: armadillo800eva: remove nfsroot settings from bootargs ARM: shmobile: r8a7779: move r8a7779_init_irq_xxx() to setup ARM: shmobile: r8a7740: move r8a7740_init_irq_of() to setup ARM: shmobile: bockw: add missing __initdata ARM: shmobile: r8a7790: add missing __initdata ARM: shmobile: r8a7779: add missing __initdata ARM: shmobile: Remove unused shmobile_init_time() ARM: shmobile: Use clocksource_of_init() on r8a7790 ARM: shmobile: Use default ->init_time() on KZM9G DT ref ...
2013-09-09Merge tag 'drivers-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC driver update from Kevin Hilman: "This contains the ARM SoC related driver updates for v3.12. The only thing this cycle are core PM updates and CPUidle support for ARM's TC2 big.LITTLE development platform" * tag 'drivers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: cpuidle: big.LITTLE: vexpress-TC2 CPU idle driver ARM: vexpress: tc2: disable GIC CPU IF in tc2_pm_suspend drivers: irq-chip: irq-gic: introduce gic_cpu_if_down()
2013-09-09Merge tag 'clk-for-linus-3.12' of git://git.linaro.org/people/mturquette/linuxLinus Torvalds
Pull clock framework changes from Michael Turquette: "The common clk framework changes for 3.12 are dominated by clock driver patches, both new drivers and fixes to existing. A high percentage of these are for Samsung platforms like Exynos. Core framework fixes and some new features like automagical clock re-parenting round out the patches" * tag 'clk-for-linus-3.12' of git://git.linaro.org/people/mturquette/linux: (102 commits) clk: only call get_parent if there is one clk: samsung: exynos5250: Simplify registration of PLL rate tables clk: samsung: exynos4: Register PLL rate tables for Exynos4x12 clk: samsung: exynos4: Register PLL rate tables for Exynos4210 clk: samsung: exynos4: Reorder registration of mout_vpllsrc clk: samsung: pll: Add support for rate configuration of PLL46xx clk: samsung: pll: Use new registration method for PLL46xx clk: samsung: pll: Add support for rate configuration of PLL45xx clk: samsung: pll: Use new registration method for PLL45xx clk: samsung: exynos4: Rename exynos4_plls to exynos4x12_plls clk: samsung: exynos4: Remove checks for DT node clk: samsung: exynos4: Remove unused static clkdev aliases clk: samsung: Modify _get_rate() helper to use __clk_lookup() clk: samsung: exynos4: Use separate aliases for cpufreq related clocks clocksource: samsung_pwm_timer: Get clock from device tree ARM: dts: exynos4: Specify PWM clocks in PWM node pwm: samsung: Update DT bindings documentation to cover clocks clk: Move symbol export to proper location clk: fix new_parent dereference before null check clk: wm831x: Initialise wm831x pointer on init ...
2013-09-09Merge tag 'trace-3.12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "Not much changes for the 3.12 merge window. The major tracing changes are still in flux, and will have to wait for 3.13. The changes for 3.12 are mostly clean ups and minor fixes. H Peter Anvin added a check to x86_32 static function tracing that helps a small segment of the kernel community. Oleg Nesterov had a few changes from 3.11, but were mostly clean ups and not worth pushing in the -rc time frame. Li Zefan had small clean up with annotating a raw_init with __init. I fixed a slight race in updating function callbacks, but the race is so small and the bug that happens when it occurs is so minor it's not even worth pushing to stable. The only real enhancement is from Alexander Z Lam that made the tracing_cpumask work for trace buffer instances, instead of them all sharing a global cpumask" * tag 'trace-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: ftrace/rcu: Do not trace debug_lockdep_rcu_enabled() x86-32, ftrace: Fix static ftrace when early microcode is enabled ftrace: Fix a slight race in modifying what function callback gets traced tracing: Make tracing_cpumask available for all instances tracing: Kill the !CONFIG_MODULES code in trace_events.c tracing: Don't pass file_operations array to event_create_dir() tracing: Kill trace_create_file_ops() and friends tracing/syscalls: Annotate raw_init function with __init
2013-09-09clk: only call get_parent if there is oneAlex Elder
In __clk_init(), after a clock is mostly initialized, a scan is done of the orphan clocks to see if the clock being registered is the parent of any of them. This code assumes that any clock that provides a get_parent method actually has at least one parent, and that's not a valid assumption. As a result, an orphan clock with no parent can return *something* as the parent index, and that value is blindly used to dereference the orphan's parent_names[] array (which will be ZERO_SIZE_PTR or NULL). Fix this by ensuring get_parent is only called for orphans with at least one parent. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Mike Turquette <mturquette@linaro.org>
2013-09-09split read_seqretry_or_unlock(), convert d_walk() to resulting primitivesAl Viro
Separate "check if we need to retry" from "unlock if we are done and had seq_writelock"; that allows to use these guys in d_walk(), where we need to recheck every time we ascend back to parent, but do *not* want to unlock until the very end. Lift rcu_read_lock/rcu_read_unlock out into callers. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-09Merge tag 'xfs-for-linus-v3.12-rc1' of git://oss.sgi.com/xfs/xfsLinus Torvalds
Pull xfs updates from Ben Myers: "For 3.12-rc1 there are a number of bugfixes in addition to work to ease usage of shared code between libxfs and the kernel, the rest of the work to enable project and group quotas to be used simultaneously, performance optimisations in the log and the CIL, directory entry file type support, fixes for log space reservations, some spelling/grammar cleanups, and the addition of user namespace support. - introduce readahead to log recovery - add directory entry file type support - fix a number of spelling errors in comments - introduce new Q_XGETQSTATV quotactl for project quotas - add USER_NS support - log space reservation rework - CIL optimisations - kernel/userspace libxfs rework" * tag 'xfs-for-linus-v3.12-rc1' of git://oss.sgi.com/xfs/xfs: (112 commits) xfs: XFS_MOUNT_QUOTA_ALL needed by userspace xfs: dtype changed xfs_dir2_sfe_put_ino to xfs_dir3_sfe_put_ino Fix wrong flag ASSERT in xfs_attr_shortform_getvalue xfs: finish removing IOP_* macros. xfs: inode log reservations are too small xfs: check correct status variable for xfs_inobt_get_rec() call xfs: inode buffers may not be valid during recovery readahead xfs: check LSN ordering for v5 superblocks during recovery xfs: btree block LSN escaping to disk uninitialised XFS: Assertion failed: first <= last && last < BBTOB(bp->b_length), file: fs/xfs/xfs_trans_buf.c, line: 568 xfs: fix bad dquot buffer size in log recovery readahead xfs: don't account buffer cancellation during log recovery readahead xfs: check for underflow in xfs_iformat_fork() xfs: xfs_dir3_sfe_put_ino can be static xfs: introduce object readahead to log recovery xfs: Simplify xfs_ail_min() with list_first_entry_or_null() xfs: Register hotcpu notifier after initialization xfs: add xfs sb v4 support for dirent filetype field xfs: Add write support for dirent filetype field xfs: Add read-only support for dirent filetype field ...
2013-09-09direct-io: Use return from cmpxchg to decide of assignment happenedOlof Johansson
Not using the return value can in the generic case be racy, so it's in general good practice to check the return value instead. This also resolved the warning caused on ARM and other architectures: fs/direct-io.c: In function 'sb_init_dio_done_wq': fs/direct-io.c:557:2: warning: value computed is not used [-Wunused-value] Signed-off-by: Olof Johansson <olof@lixom.net> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@infradead.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: H Peter Anvin <hpa@zytor.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-09dcache: Translating dentry into pathname without taking rename_lockWaiman Long
When running the AIM7's short workload, Linus' lockref patch eliminated most of the spinlock contention. However, there were still some left: 8.46% reaim [kernel.kallsyms] [k] _raw_spin_lock |--42.21%-- d_path | proc_pid_readlink | SyS_readlinkat | SyS_readlink | system_call | __GI___readlink | |--40.97%-- sys_getcwd | system_call | __getcwd The big one here is the rename_lock (seqlock) contention in d_path() and the getcwd system call. This patch will eliminate the need to take the rename_lock while translating dentries into the full pathnames. The need to take the rename_lock is to make sure that no rename operation can be ongoing while the translation is in progress. However, only one thread can take the rename_lock thus blocking all the other threads that need it even though the translation process won't make any change to the dentries. This patch will replace the writer's write_seqlock/write_sequnlock sequence of the rename_lock of the callers of the prepend_path() and __dentry_path() functions with the reader's read_seqbegin/read_seqretry sequence within these 2 functions. As a result, the code will have to retry if one or more rename operations had been performed. In addition, RCU read lock will be taken during the translation process to make sure that no dentries will go away. To prevent live-lock from happening, the code will switch back to take the rename_lock if read_seqretry() fails for three times. To further reduce spinlock contention, this patch does not take the dentry's d_lock when copying the filename from the dentries. Instead, it treats the name pointer and length as unreliable and just copy the string byte-by-byte over until it hits a null byte or the end of string as specified by the length. This should avoid stepping into invalid memory address. The error cases are left to be handled by the sequence number check. The following code re-factoring are also made: 1. Move prepend('/') into prepend_name() to remove one conditional check. 2. Move the global root check in prepend_path() back to the top of the while loop. With this patch, the _raw_spin_lock will now account for only 1.2% of the total CPU cycles for the short workload. This patch also has the effect of reducing the effect of running perf on its profile since the perf command itself can be a heavy user of the d_path() function depending on the complexity of the workload. When taking the perf profile of the high-systime workload, the amount of spinlock contention contributed by running perf without this patch was about 16%. With this patch, the spinlock contention caused by the running of perf will go away and we will have a more accurate perf profile. Signed-off-by: Waiman Long <Waiman.Long@hp.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-09Merge tag 'for-linus-20130909' of git://git.infradead.org/linux-mtdLinus Torvalds
Pull mtd updates from David Woodhouse: - factor out common code from MTD tests - nand-gpio cleanup and portability to non-ARM - m25p80 support for 4-byte addressing chips, other new chips - pxa3xx cleanup and support for new platforms - remove obsolete alauda, octagon-5066 drivers - erase/write support for bcm47xxsflash - improve detection of ECC requirements for NAND, controller setup - NFC acceleration support for atmel-nand, read/write via SRAM - etc * tag 'for-linus-20130909' of git://git.infradead.org/linux-mtd: (184 commits) mtd: chips: Add support for PMC SPI Flash chips