summaryrefslogtreecommitdiffstats
path: root/fs/xfs/libxfs/xfs_attr_leaf.c
AgeCommit message (Collapse)Author
2020-11-18xfs: fix forkoff miscalculation related to XFS_LITINO(mp)Gao Xiang
Currently, commit e9e2eae89ddb dropped a (int) decoration from XFS_LITINO(mp), and since sizeof() expression is also involved, the result of XFS_LITINO(mp) is simply as the size_t type (commonly unsigned long). Considering the expression in xfs_attr_shortform_bytesfit(): offset = (XFS_LITINO(mp) - bytes) >> 3; let "bytes" be (int)340, and "XFS_LITINO(mp)" be (unsigned long)336. on 64-bit platform, the expression is offset = ((unsigned long)336 - (int)340) >> 3 = (int)(0xfffffffffffffffcUL >> 3) = -1 but on 32-bit platform, the expression is offset = ((unsigned long)336 - (int)340) >> 3 = (int)(0xfffffffcUL >> 3) = 0x1fffffff instead. so offset becomes a large positive number on 32-bit platform, and cause xfs_attr_shortform_bytesfit() returns maxforkoff rather than 0. Therefore, one result is "ASSERT(new_size <= XFS_IFORK_SIZE(ip, whichfork));" assertion failure in xfs_idata_realloc(), which was also the root cause of the original bugreport from Dennis, see: https://bugzilla.redhat.com/show_bug.cgi?id=1894177 And it can also be manually triggered with the following commands: $ touch a; $ setfattr -n user.0 -v "`seq 0 80`" a; $ setfattr -n user.1 -v "`seq 0 80`" a on 32-bit platform. Fix the case in xfs_attr_shortform_bytesfit() by bailing out "XFS_LITINO(mp) < bytes" in advance suggested by Eric and a misleading comment together with this bugfix suggested by Darrick. It seems the other users of XFS_LITINO(mp) are not impacted. Fixes: e9e2eae89ddb ("xfs: only check the superblock version for dinode size calculation") Cc: <stable@vger.kernel.org> # 5.7+ Reported-and-tested-by: Dennis Gilmore <dgilmore@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Gao Xiang <hsiangkao@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-09-15xfs: Convert xfs_attr_sf macros to inline functionsCarlos Maiolino
xfs_attr_sf_totsize() requires access to xfs_inode structure, so, once xfs_attr_shortform_addname() is its only user, move it to xfs_attr.c instead of playing with more #includes. Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-09-15xfs: Use variable-size array for nameval in xfs_attr_sf_entryCarlos Maiolino
nameval is a variable-size array, so, define it as it, and remove all the -1 magic number subtractions Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-09-15xfs: Remove typedef xfs_attr_shortform_tCarlos Maiolino
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-09-15xfs: remove typedef xfs_attr_sf_entry_tCarlos Maiolino
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-08-27xfs: initialize the shortform attr header padding entryDarrick J. Wong
Don't leak kernel memory contents into the shortform attr fork. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-08-26xfs: fix boundary test in xfs_attr_shortform_verifyEric Sandeen
The boundary test for the fixed-offset parts of xfs_attr_sf_entry in xfs_attr_shortform_verify is off by one, because the variable array at the end is defined as nameval[1] not nameval[]. Hence we need to subtract 1 from the calculation. This can be shown by: # touch file # setfattr -n root.a file and verifications will fail when it's written to disk. This only matters for a last attribute which has a single-byte name and no value, otherwise the combination of namelen & valuelen will push endp further out and this test won't fail. Fixes: 1e1bbd8e7ee06 ("xfs: create structure verifier function for shortform xattrs") Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-07-28xfs: Pull up trans roll in xfs_attr3_leaf_clearflagAllison Collins
New delayed allocation routines cannot be handling transactions so pull them out into the calling functions Signed-off-by: Allison Collins <allison.henderson@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Acked-by: Dave Chinner <dchinner@redhat.com>
2020-07-28xfs: Pull up trans roll from xfs_attr3_leaf_setflagAllison Collins
New delayed allocation routines cannot be handling transactions so pull them up into the calling functions Signed-off-by: Allison Collins <allison.henderson@oracle.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Acked-by: Dave Chinner <dchinner@redhat.com>
2020-07-28xfs: Pull up trans handling in xfs_attr3_leaf_flipflagsAllison Collins
Since delayed operations cannot roll transactions, pull up the transaction handling into the calling function Signed-off-by: Allison Collins <allison.henderson@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Acked-by: Dave Chinner <dchinner@redhat.com>
2020-07-28xfs: Add xfs_has_attr and subroutinesAllison Collins
This patch adds a new functions to check for the existence of an attribute. Subroutines are also added to handle the cases of leaf blocks, nodes or shortform. Common code that appears in existing attr add and remove functions have been factored out to help reduce the appearance of duplicated code. We will need these routines later for delayed attributes since delayed operations cannot return error codes. Signed-off-by: Allison Collins <allison.henderson@oracle.com> Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> [darrick: fix a leak-on-error bug reported by Dan Carpenter] [darrick: fix unused variable warning reported by 0day] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Acked-by: Dave Chinner <dchinner@redhat.com> Reported-by: dan.carpenter@oracle.com Reported-by: kernel test robot <lkp@intel.com>
2020-05-27xfs: more lockdep whackamole with kmem_alloc*Darrick J. Wong
Dave Airlie reported the following lockdep complaint: > ====================================================== > WARNING: possible circular locking dependency detected > 5.7.0-0.rc5.20200515git1ae7efb38854.1.fc33.x86_64 #1 Not tainted > ------------------------------------------------------ > kswapd0/159 is trying to acquire lock: > ffff9b38d01a4470 (&xfs_nondir_ilock_class){++++}-{3:3}, > at: xfs_ilock+0xde/0x2c0 [xfs] > > but task is already holding lock: > ffffffffbbb8bd00 (fs_reclaim){+.+.}-{0:0}, at: > __fs_reclaim_acquire+0x5/0x30 > > which lock already depends on the new lock. > > > the existing dependency chain (in reverse order) is: > > -> #1 (fs_reclaim){+.+.}-{0:0}: > fs_reclaim_acquire+0x34/0x40 > __kmalloc+0x4f/0x270 > kmem_alloc+0x93/0x1d0 [xfs] > kmem_alloc_large+0x4c/0x130 [xfs] > xfs_attr_copy_value+0x74/0xa0 [xfs] > xfs_attr_get+0x9d/0xc0 [xfs] > xfs_get_acl+0xb6/0x200 [xfs] > get_acl+0x81/0x160 > posix_acl_xattr_get+0x3f/0xd0 > vfs_getxattr+0x148/0x170 > getxattr+0xa7/0x240 > path_getxattr+0x52/0x80 > do_syscall_64+0x5c/0xa0 > entry_SYSCALL_64_after_hwframe+0x49/0xb3 > > -> #0 (&xfs_nondir_ilock_class){++++}-{3:3}: > __lock_acquire+0x1257/0x20d0 > lock_acquire+0xb0/0x310 > down_write_nested+0x49/0x120 > xfs_ilock+0xde/0x2c0 [xfs] > xfs_reclaim_inode+0x3f/0x400 [xfs] > xfs_reclaim_inodes_ag+0x20b/0x410 [xfs] > xfs_reclaim_inodes_nr+0x31/0x40 [xfs] > super_cache_scan+0x190/0x1e0 > do_shrink_slab+0x184/0x420 > shrink_slab+0x182/0x290 > shrink_node+0x174/0x680 > balance_pgdat+0x2d0/0x5f0 > kswapd+0x21f/0x510 > kthread+0x131/0x150 > ret_from_fork+0x3a/0x50 > > other info that might help us debug this: > > Possible unsafe locking scenario: > > CPU0 CPU1 > ---- ---- > lock(fs_reclaim); > lock(&xfs_nondir_ilock_class); > lock(fs_reclaim); > lock(&xfs_nondir_ilock_class); > > *** DEADLOCK *** > > 4 locks held by kswapd0/159: > #0: ffffffffbbb8bd00 (fs_reclaim){+.+.}-{0:0}, at: > __fs_reclaim_acquire+0x5/0x30 > #1: ffffffffbbb7cef8 (shrinker_rwsem){++++}-{3:3}, at: > shrink_slab+0x115/0x290 > #2: ffff9b39f07a50e8 > (&type->s_umount_key#56){++++}-{3:3}, at: super_cache_scan+0x38/0x1e0 > #3: ffff9b39f077f258 > (&pag->pag_ici_reclaim_lock){+.+.}-{3:3}, at: > xfs_reclaim_inodes_ag+0x82/0x410 [xfs] This is a known false positive because inodes cannot simultaneously be getting reclaimed and the target of a getxattr operation, but lockdep doesn't know that. We can (selectively) shut up lockdep until either it gets smarter or we change inode reclaim not to require the ILOCK by applying a stupid GFP_NOLOCKDEP bandaid. Reported-by: Dave Airlie <airlied@gmail.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Tested-by: Dave Airlie <airlied@gmail.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2020-05-19xfs: cleanup xfs_idestroy_forkChristoph Hellwig
Move freeing the dynamically allocated attr and COW fork, as well as zeroing the pointers where actually needed into the callers, and just pass the xfs_ifork structure to xfs_idestroy_fork. Also simplify the kmem_free calls by not checking for NULL first. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-19xfs: move the fork format fields into struct xfs_iforkChristoph Hellwig
Both the data and attr fork have a format that is stored in the legacy idinode. Move it into the xfs_ifork structure instead, where it uses up padding. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-19xfs: move the per-fork nextents fields into struct xfs_iforkChristoph Hellwig
There are there are three extents counters per inode, one for each of the forks. Two are in the legacy icdinode and one is directly in struct xfs_inode. Switch to a single counter in the xfs_ifork structure where it uses up padding at the end of the structure. This simplifies various bits of code that just wants the number of extents counter and can now directly dereference it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-19xfs: don't fail verifier on empty attr3 leaf blockBrian Foster
The attr fork can transition from shortform to leaf format while empty if the first xattr doesn't fit in shortform. While this empty leaf block state is intended to be transient, it is technically not due to the transactional implementation of the xattr set operation. We historically have a couple of bandaids to work around this problem. The first is to hold the buffer after the format conversion to prevent premature writeback of the empty leaf buffer and the second is to bypass the xattr count check in the verifier during recovery. The latter assumes that the xattr set is also in the log and will be recovered into the buffer soon after the empty leaf buffer is reconstructed. This is not guaranteed, however. If the filesystem crashes after the format conversion but before the xattr set that induced it, only the format conversion may exist in the log. When recovered, this creates a latent corrupted state on the inode as any subsequent attempts to read the buffer fail due to verifier failure. This includes further attempts to set xattrs on the inode or attempts to destroy the attr fork, which prevents the inode from ever being removed from the unlinked list. To avoid this condition, accept that an empty attr leaf block is a valid state and remove the count check from the verifier. This means that on rare occasions an attr fork might exist in an unexpected state, but is otherwise consistent and functional. Note that we retain the logic to avoid racing with metadata writeback to reduce the window where this can occur. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-03-19xfs: only check the superblock version for dinode size calculationChristoph Hellwig
The size of the dinode structure is only dependent on the file system version, so instead of checking the individual inode version just use the newly added xfs_sb_version_has_large_dinode helper, and simplify various calling conventions. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-12xfs: add a function to deal with corrupt buffers post-verifiersDarrick J. Wong
Add a helper function to get rid of buffers that we have decided are corrupt after the verifiers have run. This function is intended to handle metadata checks that can't happen in the verifiers, such as inter-block relationship checking. Note that we now mark the buffer stale so that it will not end up on any LRU and will be purged on release. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-03-02xfs: remove XFS_DA_OP_INCOMPLETEChristoph Hellwig
Now that we use the on-disk flags field also for the interface to the lower level attr routines we can use the XFS_ATTR_INCOMPLETE definition from the on-disk format directly instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02xfs: clean up the attr flag confusionChristoph Hellwig
The ATTR_* flags have a long IRIX history, where they a userspace interface, the on-disk format and an internal interface. We've split out the on-disk interface to the XFS_ATTR_* values, but despite (or because?) of that the flag have still been a mess. Switch the internal interface to pass the on-disk XFS_ATTR_* flags for the namespace and the Linux XATTR_* flags for the actual flags instead. The ATTR_* values that are actually used are move to xfs_fs.h with a new XFS_IOC_* prefix to not conflict with the userspace version that has the same name and must have the same value. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02xfs: factor out a xfs_attr_match helperChristoph Hellwig
Factor out a helper that compares an on-disk attr vs the name, length and flags specified in struct xfs_da_args. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02xfs: remove ATTR_ALLOC and XFS_DA_OP_ALLOCVALChristoph Hellwig
Use a NULL args->value as the indicator to lazily allocate a buffer instead, and let the caller always free args->value instead of duplicating the cleanup. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02xfs: remove ATTR_KERNOVALChristoph Hellwig
We can just pass down the Linux convention of a zero valuelen to just query for the existance of an attribute to the low-level code instead. The use in the legacy xfs_attr_list code only used by the ioctl interface was already dead code, as the callers check that the flag is not present. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-01-09xfs: fix misuse of the XFS_ATTR_INCOMPLETE flagChristoph Hellwig
XFS_ATTR_INCOMPLETE is a flag in the on-disk attribute format, and thus in a different namespace as the ATTR_* flags in xfs_da_args.flags. Switch to using a XFS_DA_OP_INCOMPLETE flag in op_flags instead. Without this users might be able to inject this flag into operations using the attr by handle ioctl. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-22xfs: remove the mappedbno argument to xfs_da_get_bufChristoph Hellwig
Use the xfs_da_get_buf_daddr function directly for the two callers that pass a mapped disk address, and then remove the mappedbno argument. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-22xfs: remove the mappedbno argument to xfs_da_read_bufChristoph Hellwig
Move the code for reading an already mapped block into xfs_da3_node_read_mapped, which is the only caller ever passing a block number in the mappedbno argument and replace the mappedbno argument with the simple xfs_dabuf_get flags. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-22xfs: remove the mappedbno argument to xfs_attr3_leaf_readChristoph Hellwig
This argument is always hard coded to -1, so remove it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-15xfs: fix attr leaf header freemap.size underflowBrian Foster
The leaf format xattr addition helper xfs_attr3_leaf_add_work() adjusts the block freemap in a couple places. The first update drops the size of the freemap that the caller had already selected to place the xattr name/value data. Before the function returns, it also checks whether the entries array has encroached on a freemap range by virtue of the new entry addition. This is necessary because the entries array grows from the start of the block (but end of the block header) towards the end of the block while the name/value data grows from the end of the block in the opposite direction. If the associated freemap is already empty, however, size is zero and the subtraction underflows the field and causes corruption. This is reproduced rarely by generic/070. The observed behavior is that a smaller sized freemap is aligned to the end of the entries list, several subsequent xattr additions land in larger freemaps and the entries list expands into the smaller freemap until it is fully consumed and then underflows. Note that it is not otherwise a corruption for the entries array to consume an empty freemap because the nameval list (i.e. the firstused pointer in the xattr header) starts beyond the end of the corrupted freemap. Update the freemap size modification to account for the fact that the freemap entry can be empty and thus stale. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-10xfs: add a btree entries pointer to struct xfs_da3_icnode_hdrChristoph Hellwig
All but two callers of the ->node_tree_p dir operation already have a xfs_da3_icnode_hdr from a previous call to xfs_da3_node_hdr_from_disk at hand. Add a pointer to the btree entries to struct xfs_da3_icnode_hdr to clean up this pattern. The two remaining callers now expand the whole header as well, but that isn't very expensive and not in a super hot path anyway. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-10xfs: devirtualize ->node_hdr_to_diskChristoph Hellwig
Replace the ->node_hdr_to_disk dir ops method with a directly called xfs_da_node_hdr_to_disk helper that takes care of the v4 vs v5 difference. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-10xfs: devirtualize ->node_hdr_from_diskChristoph Hellwig
Replace the ->node_hdr_from_disk dir ops method with a directly called xfs_da_node_hdr_from_disk helper that takes care of the v4 vs v5 difference. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-10xfs: Correct comment tyops -> typosJoe Perches
Just fix the typos checkpatch notices... Signed-off-by: Joe Perches <joe@perches.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-04xfs: always log corruption errorsDarrick J. Wong
Make sure we log something to dmesg whenever we return -EFSCORRUPTED up the call stack. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2019-10-29xfs: check attribute leaf block structureDarrick J. Wong
Add missing structure checks in the attribute leaf verifier. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-10-21xfs: fix inode fork extent count overflowDave Chinner
[commit message is verbose for discussion purposes - will trim it down later. Some questions about implementation details at the end.] Zorro Lang recently ran a new test to stress single inode extent counts now that they are no longer limited by memory allocation. The test was simply: # xfs_io -f -c "falloc 0 40t" /mnt/scratch/big-file # ~/src/xfstests-dev/punch-alternating /mnt/scratch/big-file This test uncovered a problem where the hole punching operation appeared to finish with no error, but apparently only created 268M extents instead of the 10 billion it was supposed to. Further, trying to punch out extents that should have been present resulted in success, but no change in the extent count. It looked like a silent failure. While running the test and observing the behaviour in real time, I observed the extent coutn growing at ~2M extents/minute, and saw this after about an hour: # xfs_io -f -c "stat" /mnt/scratch/big-file |grep next ; \ > sleep 60 ; \ > xfs_io -f -c "stat" /mnt/scratch/big-file |grep next fsxattr.nextents = 127657993 fsxattr.nextents = 129683339 # And a few minutes later this: # xfs_io -f -c "stat" /mnt/scratch/big-file |grep next fsxattr.nextents = 4177861124 # Ah, what? Where did that 4 billion extra extents suddenly come from? Stop the workload, unmount, mount: # xfs_io -f -c "stat" /mnt/scratch/big-file |grep next fsxattr.nextents = 166044375 # And it's back at the expected number. i.e. the extent count is correct on disk, but it's screwed up in memory. I loaded up the extent list, and immediately: # xfs_io -f -c "stat" /mnt/scratch/big-file |grep next fsxattr.nextents = 4192576215 # It's bad again. So, where does that number come from? xfs_fill_fsxattr(): if (ip->i_df.if_flags & XFS_IFEXTENTS) fa->fsx_nextents = xfs_iext_count(&ip->i_df); else fa->fsx_nextents = ip->i_d.di_nextents; And that's the behaviour I just saw in a nutshell. The on disk count is correct, but once the tree is loaded into memory, it goes whacky. Clearly there's something wrong with xfs_iext_count(): inline xfs_extnum_t xfs_iext_count(struct xfs_ifork *ifp) { return ifp->if_bytes / sizeof(struct xfs_iext_rec); } Simple enough, but 134M extents is 2**27, and that's right about where things went wrong. A struct xfs_iext_rec is 16 bytes in size, which means 2**27 * 2**4 = 2**31 and we're right on target for an integer overflow. And, sure enough: struct xfs_ifork { int if_bytes; /* bytes in if_u1 */ .... Once we get 2**27 extents in a file, we overflow if_bytes and the in-core extent count goes wrong. And when we reach 2**28 extents, if_bytes wraps back to zero and things really start to go wrong there. This is where the silent failure comes from - only the first 2**28 extents can be looked up directly due to the overflow, all the extents above this index wrap back to somewhere in the first 2**28 extents. Hence with a regular pattern, trying to punch a hole in the range that didn't have holes mapped to a hole in the first 2**28 extents and so "succeeded" without changing anything. Hence "silent failure"... Fix this by converting if_bytes to a int64_t and converting all the index variables and size calculations to use int64_t types to avoid overflows in future. Signed integers are still used to enable easy detection of extent count underflows. This enables scalability of extent counts to the limits of the on-disk format - MAXEXTNUM (2**31) extents. Current testing is at over 500M extents and still going: fsxattr.nextents = 517310478 Reported-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-09xfs: move local to extent inode logging into bmap helperBrian Foster
The callers of xfs_bmap_local_to_extents_empty() log the inode external to the function, yet this function is where the on-disk format value is updated. Push the inode logging down into the function itself to help prevent future mistakes. Note that internal bmap callers track the inode logging flags independently and thus may log the inode core twice due to this change. This is harmless, so leave this code around for consistency with the other attr fork conversion functions. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-09xfs: remove broken error handling on failed attr sf to leaf changeBrian Foster
xfs_attr_shortform_to_leaf() attempts to put the shortform fork back together after a failed attempt to convert from shortform to leaf format. While this code reallocates and copies back the shortform attr fork data, it never resets the inode format field back to local format. Further, now that the inode is properly logged after the initial switch from local format, any error that triggers the recovery code will eventually abort the transaction and shutdown the fs. Therefore, remove the broken and unnecessary error handling code. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-09xfs: log the inode on directory sf to block format changeBrian Foster
When a directory changes from shortform (sf) to block format, the sf format is copied to a temporary buffer, the inode format is modified and the updated format filled with the dentries from the temporary buffer. If the inode format is modified and attempt to grow the inode fails (due to I/O error, for example), it is possible to return an error while leaving the directory in an inconsistent state and with an otherwise clean transaction. This results in corruption of the associated directory and leads to xfs_dabuf_map() errors as subsequent lookups cannot accurately determine the format of the directory. This problem is reproduced occasionally by generic/475. The fundamental problem is that xfs_dir2_sf_to_block() changes the on-disk inode format without logging the inode. The inode is eventually logged by the bmapi layer in the common case, but error checking introduces the possibility of failing the high level request before this happens. Update both of the dir2 and attr callers of xfs_bmap_local_to_extents_empty() to log the inode core as consistent with the bmap local to extent format change codepath. This ensures that any subsequent errors after the format has changed cause the transaction to abort. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-08-30xfs: allocate xattr buffer on demandDave Chinner
When doing file lookups and checking for permissions, we end up in xfs_get_acl() to see if there are any ACLs on the inode. This requires and xattr lookup, and to do that we have to supply a buffer large enough to hold an maximum sized xattr. On workloads were we are accessing a wide range of cache cold files under memory pressure (e.g. NFS fileservers) we end up spending a lot of time allocating the buffer. The buffer is 64k in length, so is a contiguous multi-page allocation, and if that then fails we fall back to vmalloc(). Hence the allocation here is /expensive/ when we are looking up hundreds of thousands of files a second. Initial numbers from a bpf trace show average time in xfs_get_acl() is ~32us, with ~19us of that in the memory allocation. Note these are average times, so there are going to be affected by the worst case allocations more than the common fast case... To avoid this, we could just do a "null" lookup to see if the ACL xattr exists and then only do the allocation if it exists. This, however, optimises the path for the "no ACL present" case at the expense of the "acl present" case. i.e. we can halve the time in xfs_get_acl() for the no acl case (i.e down to ~10-15us), but that then increases the ACL case by 30% (i.e. up to 40-45us). To solve this and speed up both cases, drive the xattr buffer allocation into the attribute code once we know what the actual xattr length is. For the no-xattr case, we avoid the allocation completely, speeding up that case. For the common ACL case, we'll end up with a fast heap allocation (because it'll be smaller than a page), and only for the rarer "we have a remote xattr" will we have a multi-page allocation occur. Hence the common ACL case will be much faster, too. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-08-30xfs: consolidate attribute value copyingDave Chinner
The same code is used to copy do the attribute copying in three different places. Consolidate them into a single function in preparation from on-demand buffer allocation. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-08-30xfs: move remote attr retrieval into xfs_attr3_leaf_getvalueDave Chinner
Because we repeat exactly the same code to get the remote attribute value after both calls to xfs_attr3_leaf_getvalue() if it's a remote attr. Just do it in xfs_attr3_leaf_getvalue() so the callers don't have to care about it. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-08-30xfs: remove unnecessary indenting from xfs_attr3_leaf_getvalueDave Chinner
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-08-30xfs: make attr lookup returns consistentDave Chinner
Shortform, leaf and remote value attr value retrieval return different values for success. This makes it more complex to handle actual errors xfs_attr_get() as some errors mean success and some mean failure. Make the return values consistent for success and failure consistent for all attribute formats. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-08-26fs: xfs: Remove KM_NOSLEEP and KM_SLEEP.Tetsuo Handa
Since no caller is using KM_NOSLEEP and no callee branches on KM_SLEEP, we can remove KM_NOSLEEP and replace KM_SLEEP with 0. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28xfs: remove unused header filesEric Sandeen
There are many, many xfs header files which are included but unneeded (or included twice) in the xfs code, so remove them. nb: xfs_linux.h includes about 9 headers for everyone, so those explicit includes get removed by this. I'm not sure what the preference is, but if we wanted explicit includes everywhere, a followup patch could remove those xfs_*.h includes from xfs_linux.h and move them into the files that need them. Or it could be left as-is. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28xfs: add struct xfs_mount pointer to struct xfs_bufChristoph Hellwig
We need to derive the mount pointer from a buffer in a lot of place. Add a direct pointer to short cut the pointer chasing. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-02-18xfs: fix xfs_buf magic number endian checksDarrick J. Wong
Create a separate magic16 check function so that we don't run afoul of static checkers. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-02-11xfs: factor xfs_da3_blkinfo verification into common helperBrian Foster
With the verifier magic value helper in place, we've left a bit more duplicate code across the verifiers that involve struct xfs_da3_blkinfo. This includes the da node, xattr leaf and dir leaf verifiers, all of which perform similar checks for v4 and v5 filesystems. Create a common helper to verify an xfs_da3_blkinfo structure, taking care to only access v5 fields where appropriate, and refactor the aforementioned verifiers to use the helper. No functional changes. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-02-11xfs: miscellaneous verifier magic value fixupsBrian Foster
Most buffer verifiers have hardcoded magic value checks conditionalized on the version of the filesystem. The magic value field of the verifier structure facilitates abstraction of some of this code. Populate the ->magic field of various verifiers to take advantage of this abstraction. No functional changes. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-02-11xfs: always check magic values in on-disk byte orderBrian Foster
Most verifiers that check on-disk magic values convert the CPU endian magic value constant to disk endian to facilitate compile time optimization of the byte swap and reduce the need for runtime byte swaps in buffer verifiers. Several buffer verifiers do not follow this pattern. Update those verifiers for consistency. Also fix up a random typo in the inode readahead verifier name. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>