path: root/fs
Age | Commit message | Author
2015-02-02 | nfsd: implement pNFS operations | Christoph Hellwig
Add support for the GETDEVICEINFO, LAYOUTGET, LAYOUTCOMMIT and LAYOUTRETURN NFSv4.1 operations, as well as backing code to manage outstanding layouts and devices.

Layout management is very straightforward, with a nfs4_layout_stateid structure that extends nfs4_stid to manage layout stateids as the top-level structure. It is linked into the nfs4_file and nfs4_client structures like the other stateids, and contains a linked list of layouts that hang off the stateid. The actual layout operations are implemented in layout drivers that are not part of this commit, but will be added later.

The worst part of this commit is the management of the pNFS device IDs, which suffers from a specification that is not sanely implementable: the device IDs are global and not bound to an export, are too small to store the fsid portion of a file handle, and must never be reused. As we still need to perform all export authentication and validation checks on a device ID passed to GETDEVICEINFO, we are caught between a rock and a hard place. To work around this issue we add a new hash that maps from a 64-bit integer to a fsid so that we can look up the export to authenticate against it, a 32-bit integer as a generation that we can bump when changing the device, and a currently unused 32-bit integer that could be used in the future to handle more than a single device per export. Entries in this hash table are never deleted as we can't reuse the ids anyway, and would have a severe lifetime problem anyway as Linux export structures are temporary structures that can go away under load.

Parts of the XDR data, structures and marshaling/unmarshaling code, as well as many concepts are derived from the old pNFS server implementation from Andy Adamson, Benny Halevy, Dean Hildebrand, Marc Eshel, Fred Isaman, Mike Sager, Ricardo Labiaga and many others.
Signed-off-by: Christoph Hellwig <hch@lst.de>
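The device-ID workaround described in this message can be pictured with a small user-space sketch. Everything here is hypothetical and only illustrates the idea: the names `struct devid`, `devid_map_get` and `devid_map_find`, the bucket count, and the fsid size limit are assumptions, not the nfsd implementation. The sketch shows a 16-byte ID made of a 64-bit map index, a 32-bit generation and an unused 32-bit field, plus a map whose entries are never removed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEVID_HASH_SIZE 64 /* illustrative bucket count */
#define FSID_MAX_LEN    32 /* illustrative upper bound on fsid bytes */

/* Hypothetical layout of the ID handed to clients. */
struct devid {
    uint64_t fsid_idx;   /* key into the map below */
    uint32_t generation; /* bumped when the device changes */
    uint32_t pad;        /* currently unused */
};
static_assert(sizeof(struct devid) == 16, "device ID must stay 16 bytes");

struct devid_map_entry {
    struct devid_map_entry *next;
    uint64_t idx;
    size_t fsid_len;
    unsigned char fsid[FSID_MAX_LEN];
};

static struct devid_map_entry *devid_hash[DEVID_HASH_SIZE];
static uint64_t devid_next_idx = 1; /* 0 is reserved to signal failure */

/* Return the index for an fsid, creating an entry on first use.
 * Entries are never freed, so an index is never reused. */
uint64_t devid_map_get(const void *fsid, size_t len)
{
    struct devid_map_entry *e;
    size_t b;

    if (len > FSID_MAX_LEN)
        return 0;
    /* Linear scan over all buckets: fine for a sketch, slow in practice. */
    for (b = 0; b < DEVID_HASH_SIZE; b++)
        for (e = devid_hash[b]; e; e = e->next)
            if (e->fsid_len == len && memcmp(e->fsid, fsid, len) == 0)
                return e->idx;

    e = calloc(1, sizeof(*e));
    if (!e)
        return 0;
    e->idx = devid_next_idx++;
    e->fsid_len = len;
    memcpy(e->fsid, fsid, len);
    b = e->idx % DEVID_HASH_SIZE;
    e->next = devid_hash[b];
    devid_hash[b] = e;
    return e->idx;
}

/* Map an index from a client-supplied device ID back to its fsid entry. */
const struct devid_map_entry *devid_map_find(uint64_t idx)
{
    const struct devid_map_entry *e;

    for (e = devid_hash[idx % DEVID_HASH_SIZE]; e; e = e->next)
        if (e->idx == idx)
            return e;
    return NULL;
}
```

In this sketch a GETDEVICEINFO-style handler would call `devid_map_find()` on the index from the client's device ID and run its export checks against the returned fsid; an unknown index simply yields `NULL`.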
2015-02-02 | nfsd: make find_any_file available outside nfs4state.c | Christoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de>
2015-02-02 | nfsd: make find/get/put file available outside nfs4state.c | Christoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de>
2015-02-02 | nfsd: make lookup/alloc/unhash_stid available outside nfs4state.c | Christoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de>
2015-02-02 | nfsd: add fh_fsid_match helper | Christoph Hellwig
Add a helper to check that the fsid parts of two file handles match. Signed-off-by: Christoph Hellwig <hch@lst.de>
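As a rough illustration of what comparing "the fsid parts of two file handles" means, here is a user-space sketch. The `struct toy_fh` layout and the name `toy_fh_fsid_match` are invented for this example and do not reflect the real nfsd file handle format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_FH_MAX_FSID 16

/* Hypothetical file handle: an fsid part followed by a per-file part. */
struct toy_fh {
    uint8_t fsid_type;
    uint8_t fsid_len;
    uint8_t fsid[TOY_FH_MAX_FSID];
    uint8_t fileid[8];
};

/* True when both handles name the same filesystem, whatever file they name. */
bool toy_fh_fsid_match(const struct toy_fh *a, const struct toy_fh *b)
{
    assert(a->fsid_len <= TOY_FH_MAX_FSID && b->fsid_len <= TOY_FH_MAX_FSID);
    if (a->fsid_type != b->fsid_type)
        return false;
    if (a->fsid_len != b->fsid_len)
        return false;
    return memcmp(a->fsid, b->fsid, a->fsid_len) == 0;
}
```

Two handles for different files on the same export match under this check, while handles whose fsid bytes differ do not.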
2015-02-02 | nfsd: move nfsd_fh_match to nfsfh.h | Christoph Hellwig
The pnfs code will need it too. Also remove the nfsd_ prefix to match the other filehandle helpers in that file. Signed-off-by: Christoph Hellwig <hch@lst.de>
2015-02-02 | fs: add FL_LAYOUT lease type | Christoph Hellwig
This (ab-)uses the file locking code to allow filesystems to recall outstanding pNFS layouts on a file. This new lease type is similar but not quite the same as FL_DELEG. A FL_LAYOUT lease can always be granted, and a per-filesystem lock (XFS iolock for the initial implementation) ensures no FL_LAYOUT leases are granted when we would need to recall them.

Also included are changes that allow multiple outstanding read leases of different types on the same file as long as they have a different owner. This wasn't a problem until now as nfsd never set FL_LEASE leases, and no one else used FL_DELEG leases, but given that nfsd will also issue FL_LAYOUT leases we will have to handle it now.
Signed-off-by: Christoph Hellwig <hch@lst.de>
2015-02-02 | fs: track fl_owner for leases | Christoph Hellwig
Just like for other lock types we should allow different owners to have a read lease on a file. Currently this can't happen, but with the addition of pNFS layout leases we'll need this feature. Signed-off-by: Christoph Hellwig <hch@lst.de>
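A toy model of owner-aware lease conflicts may help picture the change. This is a sketch under assumptions: `struct toy_lease` and `toy_lease_can_add` are hypothetical names, and the rule shown (read leases from different owners coexist, a write lease excludes other owners) is a simplification rather than the kernel's actual lease logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_lease_type { TOY_RDLCK, TOY_WRLCK };

struct toy_lease {
    const void *owner; /* opaque owner cookie, in the spirit of fl_owner */
    enum toy_lease_type type;
};

/* Can `req` be added next to the leases already held on a file? */
bool toy_lease_can_add(const struct toy_lease *held, size_t n,
                       const struct toy_lease *req)
{
    size_t i;

    assert(held != NULL || n == 0);
    for (i = 0; i < n; i++) {
        /* An owner never conflicts with its own lease. */
        if (held[i].owner == req->owner)
            continue;
        /* Different owners conflict as soon as either side wants to write. */
        if (held[i].type == TOY_WRLCK || req->type == TOY_WRLCK)
            return false;
    }
    return true;
}
```

Without an owner field the loop could not tell "same holder updating its lease" apart from "second holder asking for a lease", which is the distinction the message says pNFS layout leases will need.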
2015-02-02 | Merge branch 'locks-3.20' of git://git.samba.org/jlayton/linux into for-3.20 | J. Bruce Fields
Christoph's block pnfs patches have some minor dependencies on these lock patches.
2015-01-23 | nfsd: factor out a helper to decode nfstime4 values | Christoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-01-23 | sunrpc/lockd: fix references to the BKL | Jeff Layton
The BKL is completely out of the picture in the lockd and sunrpc code these days. Update the antiquated comments that refer to it. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-01-23 | nfsd: fix year-2038 nfs4 state problem | J. Bruce Fields
Someone with a weird time_t happened to notice this; it shouldn't really manifest till 2038. It may not be our only year-2038 problem. Reported-by: Aaron Pace <Aaron.Pace@alcatel-lucent.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-01-21 | locks: update comments that refer to inode->i_flock | Jeff Layton
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
2015-01-16 | locks: consolidate NULL i_flctx checks in locks_remove_file | Jeff Layton
We have each of the locks_remove_* variants doing this individually. Have the caller do it instead, and have locks_remove_flock and locks_remove_lease just assume that it's a valid pointer. Signed-off-by: Jeff Layton <jlayton@primarydata.com>
2015-01-16 | locks: keep a count of locks on the flctx lists | Jeff Layton
This makes things a bit more efficient in the cifs and ceph lock pushing code. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Acked-by: Christoph Hellwig <hch@lst.de>
2015-01-16 | locks: clean up the lm_change prototype | Jeff Layton
Now that we use standard list_heads for tracking leases, we can have lm_change take a pointer to the lease to be modified instead of a double pointer. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Acked-by: Christoph Hellwig <hch@lst.de>
2015-01-16 | locks: add a dedicated spinlock to protect i_flctx lists | Jeff Layton
We can now add a dedicated spinlock without expanding struct inode. Change to using that to protect the various i_flctx lists. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Acked-by: Christoph Hellwig <hch@lst.de>
2015-01-16 | locks: convert lease handling to file_lock_context | Jeff Layton
Signed-off-by: Jeff Layton <jlayton@primarydata.com> Acked-by: Christoph Hellwig <hch@lst.de>
2015-01-16 | locks: convert posix locks to file_lock_context | Jeff Layton
Signed-off-by: Jeff Layton <jlayton@primarydata.com> Acked-by: Christoph Hellwig <hch@lst.de>
2015-01-16 | locks: move flock locks to file_lock_context | Jeff Layton
Signed-off-by: Jeff Layton <jlayton@primarydata.com> Acked-by: Christoph Hellwig <hch@lst.de>
2015-01-16 | ceph: move spinlocking into ceph_encode_locks_to_buffer and ceph_count_locks | Jeff Layton
There is only a single call site for each of these functions, and the caller takes the i_lock prior to calling them and drops it just afterward. Move the spinlocking into the functions instead. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Acked-by: Christoph Hellwig <hch@lst.de>
2015-01-16 | locks: add a new struct file_locking_context pointer to struct inode | Jeff Layton
The current scheme of using the i_flock list is really difficult to manage. There is also a legitimate desire for a per-inode spinlock to manage these lists that isn't the i_lock. Start conversion to a new scheme to eventually replace the old i_flock list with a new "file_lock_context" object. We start by adding a new i_flctx to struct inode. For now, it lives in parallel with i_flock list, but will eventually replace it. The idea is to allocate a structure to sit in that pointer and act as a locus for all things file locking. We allocate a file_lock_context for an inode when the first lock is added to it, and it's only freed when the inode is freed. We use the i_lock to protect the assignment, but afterward it should mostly be accessed locklessly. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Acked-by: Christoph Hellwig <hch@lst.de>
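The "allocate on first lock, free only with the inode, read locklessly afterwards" pattern can be sketched in user space. Note the substitution: the message says the kernel uses the i_lock to protect the assignment, whereas this sketch uses a C11 compare-and-exchange to publish the pointer. All names (`struct toy_flctx`, `toy_get_flctx`, and so on) are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-inode locking context. */
struct toy_flctx {
    int flock_cnt;
    int posix_cnt;
    int lease_cnt;
};

struct toy_inode {
    _Atomic(struct toy_flctx *) i_flctx; /* NULL until the first lock is set */
};

void toy_inode_init(struct toy_inode *inode)
{
    atomic_init(&inode->i_flctx, NULL);
}

/* Return the inode's context, allocating it on first use. */
struct toy_flctx *toy_get_flctx(struct toy_inode *inode)
{
    struct toy_flctx *ctx, *expected = NULL, *new_ctx;

    /* Fast path: lockless read once the pointer has been published. */
    ctx = atomic_load_explicit(&inode->i_flctx, memory_order_acquire);
    if (ctx)
        return ctx;

    new_ctx = calloc(1, sizeof(*new_ctx));
    if (!new_ctx)
        return NULL;

    /* Publish only if nobody else has; the loser frees its copy. */
    if (atomic_compare_exchange_strong_explicit(&inode->i_flctx, &expected,
                                                new_ctx, memory_order_acq_rel,
                                                memory_order_acquire))
        return new_ctx;

    free(new_ctx);
    assert(expected != NULL);
    return expected;
}

/* The context lives as long as the inode and is freed together with it. */
void toy_inode_destroy(struct toy_inode *inode)
{
    free(atomic_load_explicit(&inode->i_flctx, memory_order_relaxed));
    atomic_store_explicit(&inode->i_flctx, NULL, memory_order_relaxed);
}
```

Because the pointer goes from NULL to a valid context exactly once and never changes again until teardown, readers need no lock to dereference it, which is the property the message relies on.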
2015-01-16 | locks: have locks_release_file use flock_lock_file to release generic flock locks | Jeff Layton
...instead of open-coding it and removing flock locks directly. This helps consolidate the flock lock removal logic into a single spot. Signed-off-by: Jeff Layton <jlayton@primarydata.com>
2015-01-16 | locks: add new struct list_head to struct file_lock | Jeff Layton
...that we can use to queue file_locks to per-ctx list_heads. Go ahead and convert locks_delete_lock and locks_dispose_list to use it instead of the fl_block list. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Acked-by: Christoph Hellwig <hch@lst.de>
2015-01-16 | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse | Linus Torvalds
Pull fuse fixes from Miklos Szeredi:
 "This fixes a regression in the latest fuse update plus a fix for a rather theoretical memory ordering issue"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: add memory barrier to INIT
  fuse: fix LOOKUP vs INIT compat handling
2015-01-15 | nfsd: nfs4state: Remove unused function | Rickard Strandqvist
Remove the function renew_client() that is not used anywhere. This was partially found by using a static code analysis program called cppcheck. Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-01-15 | lockd: xdr: Remove unused function | Rickard Strandqvist
Remove the function nlm_encode_fh() that is not used anywhere. This was partially found by using a static code analysis program called cppcheck. Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-01-13 | locks: fix NULL-deref in generic_delete_lease | NeilBrown
commit 0efaa7e82f02fe69c05ad28e905f31fc86e6f08e ("locks: generic_delete_lease doesn't need a file_lock at all") moves the call to fl->fl_lmops->lm_change() to a place in the code where fl might be a non-lease lock. When that happens, fl_lmops is NULL and an Oops ensues. So add an extra test to restore correct functioning.
Reported-by: Linda Walsh <suse@tlinx.org> Link: https://bugzilla.suse.com/show_bug.cgi?id=912569 Cc: stable@vger.kernel.org (v3.18) Fixes: 0efaa7e82f02fe69c05ad28e905f31fc86e6f08e Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Jeff Layton <jlayton@primarydata.com>
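The failure mode is easy to reproduce in miniature: a list mixes lease and non-lease locks, only leases carry an ops table, and the callback is invoked through that table. The sketch below is a user-space model with hypothetical names (`struct toy_lock`, `toy_delete_lease`, `TOY_UNLCK`); the exact form of the extra test in the kernel fix is not shown in the message, so the `is_lease` check here is an assumption about its spirit.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_UNLCK 2

struct toy_lock;

struct toy_lock_ops {
    int (*lm_change)(struct toy_lock *fl, int new_type);
};

struct toy_lock {
    int is_lease;                        /* nonzero only for leases */
    int type;
    const void *file;                    /* which open file owns the lock */
    const struct toy_lock_ops *fl_lmops; /* NULL for non-lease locks */
};

static int toy_lease_change(struct toy_lock *fl, int new_type)
{
    fl->type = new_type;
    return 0;
}

const struct toy_lock_ops toy_lease_ops = { .lm_change = toy_lease_change };

/* Remove the lease held through `file`; returns -1 if there is none. */
int toy_delete_lease(struct toy_lock *locks, size_t n, const void *file)
{
    size_t i;

    assert(locks != NULL || n == 0);
    for (i = 0; i < n; i++) {
        struct toy_lock *fl = &locks[i];

        /* The extra test: matching the file alone is not enough, because a
         * non-lease lock on the same file has fl_lmops == NULL. */
        if (fl->file != file || !fl->is_lease)
            continue;
        return fl->fl_lmops->lm_change(fl, TOY_UNLCK);
    }
    return -1;
}
```

Dropping the `!fl->is_lease` condition makes the loop stop at the first lock on the file, and if that lock is not a lease the call goes through a NULL `fl_lmops`.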
2015-01-11 | Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds
Pull scheduler fixes from Ingo Molnar:
 "Misc fixes: group scheduling corner case fix, two deadline scheduler fixes, effective_load() overflow fix, nested sleep fix, 6144 CPUs system fix"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/fair: Fix RCU stall upon -ENOMEM in sched_create_group()
  sched/deadline: Avoid double-accounting in case of missed deadlines
  sched/deadline: Fix migration of SCHED_DEADLINE tasks
  sched: Fix odd values in effective_load() calculations
  sched, fanotify: Deal with nested sleeps
  sched: Fix KMALLOC_MAX_SIZE overflow during cpumask allocation
2015-01-09 | Merge branch 'for-3.19' of git://linux-nfs.org/~bfields/linux | Linus Torvalds
Pull two nfsd bugfixes from Bruce Fields.

* 'for-3.19' of git://linux-nfs.org/~bfields/linux:
  rpc: fix xdr_truncate_encode to handle buffer ending on page boundary
  nfsd: fix fi_delegees leak when fi_had_conflict returns true
2015-01-09 | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client | Linus Torvalds
Pull two Ceph fixes from Sage Weil:
 "These are both pretty trivial: a sparse warning fix and size_t printk thing"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  libceph: fix sparse endianness warnings
  ceph: use %zu for len in ceph_fill_inline_data()
2015-01-09 | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs | Linus Torvalds
Pull btrfs fixes from Chris Mason:
 "None of these are huge, but my commit does fix a regression from 3.18 that could cause lost files during log replay. This also adds Dave Sterba to the list of Btrfs maintainers. It doesn't mean we're doing things differently, but Dave has really been helping with the maintainer workload for years"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: don't delay inode ref updates during log replay
  Btrfs: correctly get tree level in tree_backref_for_extent
  Btrfs: call inode_dec_link_count() on mkdir error path
  Btrfs: abort transaction if we don't find the block group
  Btrfs, scrub: uninitialized variable in scrub_extent_for_parity()
  Btrfs: add more maintainers
2015-01-09 | sched, fanotify: Deal with nested sleeps | Peter Zijlstra
As per e23738a7300a ("sched, inotify: Deal with nested sleeps"). fanotify_read is a wait loop with sleeps in. Wait loops rely on task_struct::state and sleeps do too, since that's the only means of actually sleeping. Therefore the nested sleeps destroy the wait loop state and the wait loop breaks the sleep functions that assume TASK_RUNNING (mutex_lock). Fix this by using the new woken_wake_function and wait_woken() stuff, which registers wakeups in wait and thereby allows shrinking the task_state::state changes to the actual sleep part. Reported-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Reported-by: Sedat Dilek <sedat.dilek@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Takashi Iwai <tiwai@suse.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Eric Paris <eparis@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Eric Paris <eparis@redhat.com> Link: http://lkml.kernel.org/r/20141216152838.GZ3337@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-08 | vfs: renumber FMODE_NONOTIFY and add to uniqueness check | David Drysdale
Fix clashing values for O_PATH and FMODE_NONOTIFY on sparc. The clashing O_PATH value was added in commit 5229645bdc35 ("vfs: add nonconflicting values for O_PATH") but this can't be changed as it is user-visible. FMODE_NONOTIFY is only used internally in the kernel, but it is in the same numbering space as the other O_* flags, as indicated by the comment at the top of include/uapi/asm-generic/fcntl.h (and its use in fs/notify/fanotify/fanotify_user.c). So renumber it to avoid the clash. All of this has happened before (commit 12ed2e36c98a: "fanotify: FMODE_NONOTIFY and __O_SYNC in sparc conflict"), and all of this will happen again -- so update the uniqueness check in fcntl_init() to include __FMODE_NONOTIFY. Signed-off-by: David Drysdale <drysdale@google.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Jan Kara <jack@suse.cz> Cc: Heinrich Schuchardt <xypron.glpk@gmx.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Eric Paris <eparis@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
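A uniqueness check over flag bits can be expressed by comparing bit counts: if no two flags share a bit, the number of bits set in the OR of all flags equals the sum of the bits set in each flag. The sketch below is a run-time, user-space analogue of that idea; `flags_are_unique` and `popcount32` are hypothetical helpers and the values used with them are made up, not the real O_* or FMODE_* constants.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Count set bits by clearing the lowest one until none remain. */
static unsigned popcount32(uint32_t v)
{
    unsigned count = 0;

    while (v) {
        v &= v - 1;
        count++;
    }
    return count;
}

/* True when no bit is claimed by more than one of the given flags. */
bool flags_are_unique(const uint32_t *flags, size_t n)
{
    uint32_t combined = 0;
    unsigned expected_bits = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        assert(flags[i] != 0); /* a zero-valued flag cannot be checked this way */
        combined |= flags[i];
        expected_bits += popcount32(flags[i]);
    }
    /* A clash makes the OR lose bits relative to the per-flag total. */
    return popcount32(combined) == expected_bits;
}
```

A check of this kind only protects the flags that are actually listed in it, which is why the message adds __FMODE_NONOTIFY to the checked set rather than relying on review alone.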
2015-01-08 | ocfs2: fix the wrong directory passed to ocfs2_lookup_ino_from_name() when link file | Xue jiufei
In ocfs2_link(), the parent directory inode passed to function ocfs2_lookup_ino_from_name() is wrong. Parameter dir is the parent of new_dentry, not old_dentry. We should get old_dir from old_dentry and look up old_dentry in old_dir in case another node removes the old dentry. With this change, hard linking works again when paths are relative with at least one subdirectory. This is how the problem was reproducible:

  # mkdir a
  # mkdir b
  # touch a/test
  # ln a/test b/test
  ln: failed to create hard link `b/test' => `a/test': No such file or directory

However when creating links in the same dir, it worked well. Now the link gets created.
Fixes: 0e048316ff57 ("ocfs2: check existence of old dentry in ocfs2_link()") Signed-off-by: joyce.xue <xuejiufei@huawei.com> Reported-by: Szabo Aron - UBIT <aron@ubit.hu> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Tested-by: Aron Szabo <aron@ubit.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-01-08 | ocfs2: remove bogus check in dlm_process_recovery_data | Joseph Qi
In dlm_process_recovery_data, ret is set to -ENOMEM only when dlm_new_lock fails, and in that case newlock is definitely NULL. Testing newlock is therefore meaningless, so remove the check. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Reviewed-by: Alex Chen <alex.chen@huawei.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-01-08 | ceph: use %zu for len in ceph_fill_inline_data() | Ilya Dryomov
len is size_t, should be printed with %zu. Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
2015-01-07 | nfsd4: tweak rd_dircount accounting | J. Bruce Fields
RFC 3530 14.2.24 says:

  This value represents the length of the names of the directory entries and the cookie value for these entries. This length represents the XDR encoding of the data (names and cookies)...

The "xdr encoding" of the name should probably include the 4 bytes for the length. But this is all just a hint so not worth e.g. backporting to stable. Also reshuffle some lines to more clearly group together the dircount-related code.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
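One reading of the accounting described here, as a worked sketch: an XDR string is a 4-byte length followed by the bytes padded up to a multiple of 4, and a cookie is 8 bytes. The helper names below are hypothetical and the functions only illustrate the arithmetic, not the nfsd code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* XDR-encoded size of a name: 4-byte length word plus padded name bytes. */
size_t toy_xdr_name_size(size_t namlen)
{
    assert(namlen <= SIZE_MAX - 16); /* keep the arithmetic below in range */
    return 4 + ((namlen + 3) & ~(size_t)3);
}

/* Bytes one directory entry contributes to the dircount hint. */
size_t toy_dircount_entry_size(size_t namlen)
{
    return 8 /* cookie */ + toy_xdr_name_size(namlen);
}
```

For a 5-character name this gives 4 + 8 + 8 = 20 bytes; leaving out the length word, as the message suggests the old accounting effectively did, would give 16.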
2015-01-07 | nfsd: fi_delegees doesn't need to be an atomic_t | Jeff Layton
fi_delegees is always handled under the fi_lock, so there's no need to use an atomic_t for this field. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-01-07 | nfsd: fix fi_delegees leak when fi_had_conflict returns true | Jeff Layton
Currently, nfs4_set_delegation takes a reference to an existing delegation and then checks to see if there is a conflict. If there is one, then it doesn't release that reference. Change the code to take the reference after the check and only if there is no conflict. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
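The ordering fix can be shown with a minimal counter model. This is a sketch with hypothetical names (`struct toy_file`, `toy_set_delegation`); it only demonstrates why taking the reference after the conflict check leaves nothing to undo on the error path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_file {
    bool had_conflict; /* set once a conflicting open has been seen */
    int delegees;      /* number of delegations referencing this file */
};

/* Returns 0 and takes a reference, or -1 and leaves the count untouched. */
int toy_set_delegation(struct toy_file *fp)
{
    assert(fp != NULL);

    /* Check first: bailing out here cannot leak, no reference is held yet. */
    if (fp->had_conflict)
        return -1;

    fp->delegees++;
    return 0;
}
```

With the increment placed before the check, every failed call would leave `delegees` one higher than the number of delegations that actually exist.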
2015-01-06 | Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 | Linus Torvalds
Pull ext4 bugfixes from Ted Ts'o:
 "Revert a potential seek_data/hole regression which shows up when using ext4 to handle ext3 file systems, plus two minor bug fixes"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: remove spurious KERN_INFO from ext4_warning call
  Revert "ext4: fix suboptimal seek_{data,hole} extents traversial"
  ext4: prevent online resize with backup superblock
2015-01-06 | fuse: add memory barrier to INIT | Miklos Szeredi
Theoretically we need to order setting of various fields in fc with fc->initialized. No known bug reports related to this yet. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
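The ordering requirement (fields first, then the "initialized" flag, and the reverse order on the reading side) can be sketched in user space with C11 release/acquire atomics. This swaps in C11 primitives for the kernel's own barrier primitives, and `struct toy_conn` with its helpers is a hypothetical stand-in for the fuse connection object.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_conn {
    unsigned minor;          /* negotiated protocol minor version */
    unsigned max_write;      /* another field set during INIT */
    atomic_bool initialized; /* published last */
};

void toy_conn_init(struct toy_conn *fc)
{
    fc->minor = 0;
    fc->max_write = 0;
    atomic_init(&fc->initialized, false);
}

/* Writer: fill in the fields, then publish with release semantics so the
 * field stores cannot be reordered after the flag store. */
void toy_conn_finish_init(struct toy_conn *fc, unsigned minor, unsigned max_write)
{
    fc->minor = minor;
    fc->max_write = max_write;
    atomic_store_explicit(&fc->initialized, true, memory_order_release);
}

/* Reader: only trust the fields after observing the flag with acquire. */
bool toy_conn_read_minor(struct toy_conn *fc, unsigned *minor)
{
    assert(fc != NULL && minor != NULL);
    if (!atomic_load_explicit(&fc->initialized, memory_order_acquire))
        return false;
    *minor = fc->minor;
    return true;
}
```

A reader that sees `initialized == true` through the acquire load is guaranteed to also see the `minor` and `max_write` values written before the release store.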
2015-01-06 | fuse: fix LOOKUP vs INIT compat handling | Miklos Szeredi
Analysis from Marc:

 "Commit 7078187a795f ("fuse: introduce fuse_simple_request() helper") from the above pull request triggers some EIO errors for me in some tests that rely on fuse.

 Looking at the code changes and a bit of debugging info I think there's a general problem here that fuse_get_req checks and possibly waits for fc->initialized, and this was always called first. But this commit changes the ordering and in many places fc->minor is now possibly used before fuse_get_req, and we can't be sure that fc has been initialized. In my case fuse_lookup_init sets req->out.args[0].size to the wrong size because fc->minor at that point is still 0, leading to the EIO error."

Fix by moving the compat adjustments into fuse_simple_request() to after fuse_get_req(). This is also more readable than the original, since now compatibility is handled in a single function instead of cluttering each operation.
Reported-by: Marc Dionne <marc.c.dionne@gmail.com> Tested-by: Marc Dionne <marc.c.dionne@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Fixes: 7078187a795f ("fuse: introduce fuse_simple_request() helper")
2015-01-02 | ext4: remove spurious KERN_INFO from ext4_warning call | Jakub Wilk
Signed-off-by: Jakub Wilk <jwilk@jwilk.net> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-01-02 | Revert "ext4: fix suboptimal seek_{data,hole} extents traversial" | Theodore Ts'o
This reverts commit 14516bb7bb6ffbd49f35389f9ece3b2045ba5815. This was causing regression test failures with generic/285 with an ext3 filesystem using CONFIG_EXT4_USE_FOR_EXT23. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-01-02 | Btrfs: don't delay inode ref updates during log replay | Chris Mason
Commit 1d52c78afbb (Btrfs: try not to ENOSPC on log replay) added a check to skip delayed inode updates during log replay because it confuses the enospc code. But the delayed processing will end up ignoring delayed refs from log replay because the inode itself wasn't put through the delayed code. This can end up triggering a warning at commit time:

  WARNING: CPU: 2 PID: 778 at fs/btrfs/delayed-inode.c:1410 btrfs_assert_delayed_root_empty+0x32/0x34()

Which is repeated for each commit because we never process the delayed inode ref update. The fix used here is to change btrfs_delayed_delete_inode_ref to return an error if we're currently in log replay. The caller will do the ref deletion immediately and everything will work properly.
Signed-off-by: Chris Mason <clm@fb.com> cc: stable@vger.kernel.org # v3.18 and any stable series that picked 1d52c78afbbf80b58299e076a159617d6b42fe3c
2015-01-02 | Btrfs: correctly get tree level in tree_backref_for_extent | Filipe Manana
If we are using skinny metadata, the block's tree level is in the offset of the key and not in a btrfs_tree_block_info structure following the extent item (it doesn't exist). Therefore fix it. Besides returning the correct level in the tree, this also prevents reading past the leaf's end in the case where the extent item is the last item in the leaf (eb) and it has only 1 inline reference - this is because sizeof(struct btrfs_tree_block_info) is greater than sizeof(struct btrfs_extent_inline_ref). Got it while running a scrub which produced the following warning:

  BTRFS: checksum error at logical 42123264 on dev /dev/sde, sector 15840: metadata node (level 24) in tree 5

Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-01-02 | Btrfs: call inode_dec_link_count() on mkdir error path | Wang Shilong
In btrfs_mkdir(), if it fails to create the directory, we should clean up the existing items and set the inode's link count properly to make sure it can be cleaned up properly. Signed-off-by: Wang Shilong <wangshilong1991@gmail.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-01-02 | Btrfs: abort transaction if we don't find the block group | Josef Bacik
We shouldn't BUG_ON() if there is corruption. I hit this while testing my block group patch and the abort worked properly. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-01-02 | Btrfs, scrub: uninitialized variable in scrub_extent_for_parity() | Dan Carpenter
The only way that "ret" is set is when we call scrub_pages_for_parity() so the skip to "if (ret) " test doesn't make sense and causes a static checker warning. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Chris Mason <clm@fb.com>