summaryrefslogtreecommitdiffstats
path: root/fs/btrfs/qgroup.c
AgeCommit message (Collapse)Author
2020-12-08btrfs: pass root owner to read_tree_blockJosef Bacik
In order to properly set the lockdep class of a newly allocated block we need to know the owner of the block. For non-refcounted trees this is straightforward, we always know in advance what tree we're reading from. For refcounted trees we don't necessarily know, however all refcounted trees share the same lockdep class name, tree-<level>. Fix all the callers of read_tree_block() to pass in the root objectid we're using. In places like relocation and backref we could probably unconditionally use 0, but just in case use the root when we have it, otherwise use 0 in the cases we don't have the root as it's going to be a refcounted tree anyway. This is a preparation patch for further changes. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-12-08btrfs: use btrfs_read_node_slot in btrfs_qgroup_trace_subtreeJosef Bacik
We're open-coding btrfs_read_node_slot() here, replace with the helper. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-12-08btrfs: use btrfs_read_node_slot in qgroup_trace_new_subtree_blocksJosef Bacik
We're open-coding btrfs_read_node_slot() here, replace with the helper. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-12-08btrfs: use btrfs_read_node_slot in qgroup_trace_extent_swapJosef Bacik
We're open-coding btrfs_read_node_slot() here, replace with the helper. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-12-08btrfs: locking: rip out path->leave_spinningJosef Bacik
We no longer distinguish between blocking and spinning, so rip out all this code. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-12-08btrfs: locking: remove all the blocking helpersJosef Bacik
Now that we're using a rw_semaphore we no longer need to indicate if a lock is blocking or not, nor do we need to flip the entire path from blocking to spinning. Remove these helpers and all the places they are called. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-11-23btrfs: fix lockdep splat when enabling and disabling qgroupsFilipe Manana
When running test case btrfs/017 from fstests, lockdep reported the following splat: [ 1297.067385] ====================================================== [ 1297.067708] WARNING: possible circular locking dependency detected [ 1297.068022] 5.10.0-rc4-btrfs-next-73 #1 Not tainted [ 1297.068322] ------------------------------------------------------ [ 1297.068629] btrfs/189080 is trying to acquire lock: [ 1297.068929] ffff9f2725731690 (sb_internal#2){.+.+}-{0:0}, at: btrfs_quota_enable+0xaf/0xa70 [btrfs] [ 1297.069274] but task is already holding lock: [ 1297.069868] ffff9f2702b61a08 (&fs_info->qgroup_ioctl_lock){+.+.}-{3:3}, at: btrfs_quota_enable+0x3b/0xa70 [btrfs] [ 1297.070219] which lock already depends on the new lock. [ 1297.071131] the existing dependency chain (in reverse order) is: [ 1297.071721] -> #1 (&fs_info->qgroup_ioctl_lock){+.+.}-{3:3}: [ 1297.072375] lock_acquire+0xd8/0x490 [ 1297.072710] __mutex_lock+0xa3/0xb30 [ 1297.073061] btrfs_qgroup_inherit+0x59/0x6a0 [btrfs] [ 1297.073421] create_subvol+0x194/0x990 [btrfs] [ 1297.073780] btrfs_mksubvol+0x3fb/0x4a0 [btrfs] [ 1297.074133] __btrfs_ioctl_snap_create+0x119/0x1a0 [btrfs] [ 1297.074498] btrfs_ioctl_snap_create+0x58/0x80 [btrfs] [ 1297.074872] btrfs_ioctl+0x1a90/0x36f0 [btrfs] [ 1297.075245] __x64_sys_ioctl+0x83/0xb0 [ 1297.075617] do_syscall_64+0x33/0x80 [ 1297.075993] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 1297.076380] -> #0 (sb_internal#2){.+.+}-{0:0}: [ 1297.077166] check_prev_add+0x91/0xc60 [ 1297.077572] __lock_acquire+0x1740/0x3110 [ 1297.077984] lock_acquire+0xd8/0x490 [ 1297.078411] start_transaction+0x3c5/0x760 [btrfs] [ 1297.078853] btrfs_quota_enable+0xaf/0xa70 [btrfs] [ 1297.079323] btrfs_ioctl+0x2c60/0x36f0 [btrfs] [ 1297.079789] __x64_sys_ioctl+0x83/0xb0 [ 1297.080232] do_syscall_64+0x33/0x80 [ 1297.080680] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 1297.081139] other info that might help us debug this: [ 1297.082536] Possible unsafe locking scenario: [ 1297.083510] CPU0 CPU1 [ 1297.084005] ---- ---- [ 1297.084500] lock(&fs_info->qgroup_ioctl_lock); [ 1297.084994] lock(sb_internal#2); [ 1297.085485] lock(&fs_info->qgroup_ioctl_lock); [ 1297.085974] lock(sb_internal#2); [ 1297.086454] *** DEADLOCK *** [ 1297.087880] 3 locks held by btrfs/189080: [ 1297.088324] #0: ffff9f2725731470 (sb_writers#14){.+.+}-{0:0}, at: btrfs_ioctl+0xa73/0x36f0 [btrfs] [ 1297.088799] #1: ffff9f2702b60cc0 (&fs_info->subvol_sem){++++}-{3:3}, at: btrfs_ioctl+0x1f4d/0x36f0 [btrfs] [ 1297.089284] #2: ffff9f2702b61a08 (&fs_info->qgroup_ioctl_lock){+.+.}-{3:3}, at: btrfs_quota_enable+0x3b/0xa70 [btrfs] [ 1297.089771] stack backtrace: [ 1297.090662] CPU: 5 PID: 189080 Comm: btrfs Not tainted 5.10.0-rc4-btrfs-next-73 #1 [ 1297.091132] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 [ 1297.092123] Call Trace: [ 1297.092629] dump_stack+0x8d/0xb5 [ 1297.093115] check_noncircular+0xff/0x110 [ 1297.093596] check_prev_add+0x91/0xc60 [ 1297.094076] ? kvm_clock_read+0x14/0x30 [ 1297.094553] ? kvm_sched_clock_read+0x5/0x10 [ 1297.095029] __lock_acquire+0x1740/0x3110 [ 1297.095510] lock_acquire+0xd8/0x490 [ 1297.095993] ? btrfs_quota_enable+0xaf/0xa70 [btrfs] [ 1297.096476] start_transaction+0x3c5/0x760 [btrfs] [ 1297.096962] ? btrfs_quota_enable+0xaf/0xa70 [btrfs] [ 1297.097451] btrfs_quota_enable+0xaf/0xa70 [btrfs] [ 1297.097941] ? btrfs_ioctl+0x1f4d/0x36f0 [btrfs] [ 1297.098429] btrfs_ioctl+0x2c60/0x36f0 [btrfs] [ 1297.098904] ? do_user_addr_fault+0x20c/0x430 [ 1297.099382] ? kvm_clock_read+0x14/0x30 [ 1297.099854] ? kvm_sched_clock_read+0x5/0x10 [ 1297.100328] ? sched_clock+0x5/0x10 [ 1297.100801] ? sched_clock_cpu+0x12/0x180 [ 1297.101272] ? __x64_sys_ioctl+0x83/0xb0 [ 1297.101739] __x64_sys_ioctl+0x83/0xb0 [ 1297.102207] do_syscall_64+0x33/0x80 [ 1297.102673] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 1297.103148] RIP: 0033:0x7f773ff65d87 This is because during the quota enable ioctl we lock first the mutex qgroup_ioctl_lock and then start a transaction, and starting a transaction acquires a fs freeze semaphore (at the VFS level). However, every other code path, except for the quota disable ioctl path, we do the opposite: we start a transaction and then lock the mutex. So fix this by making the quota enable and disable paths to start the transaction without having the mutex locked, and then, after starting the transaction, lock the mutex and check if some other task already enabled or disabled the quotas, bailing with success if that was the case. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-11-23btrfs: do nofs allocations when adding and removing qgroup relationsFilipe Manana
When adding or removing a qgroup relation we are doing a GFP_KERNEL allocation which is not safe because we are holding a transaction handle open and that can make us deadlock if the allocator needs to recurse into the filesystem. So just surround those calls with a nofs context. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-11-23btrfs: fix lockdep splat when reading qgroup config on mountFilipe Manana
Lockdep reported the following splat when running test btrfs/190 from fstests: [ 9482.126098] ====================================================== [ 9482.126184] WARNING: possible circular locking dependency detected [ 9482.126281] 5.10.0-rc4-btrfs-next-73 #1 Not tainted [ 9482.126365] ------------------------------------------------------ [ 9482.126456] mount/24187 is trying to acquire lock: [ 9482.126534] ffffa0c869a7dac0 (&fs_info->qgroup_rescan_lock){+.+.}-{3:3}, at: qgroup_rescan_init+0x43/0xf0 [btrfs] [ 9482.126647] but task is already holding lock: [ 9482.126777] ffffa0c892ebd3a0 (btrfs-quota-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x27/0x120 [btrfs] [ 9482.126886] which lock already depends on the new lock. [ 9482.127078] the existing dependency chain (in reverse order) is: [ 9482.127213] -> #1 (btrfs-quota-00){++++}-{3:3}: [ 9482.127366] lock_acquire+0xd8/0x490 [ 9482.127436] down_read_nested+0x45/0x220 [ 9482.127528] __btrfs_tree_read_lock+0x27/0x120 [btrfs] [ 9482.127613] btrfs_read_lock_root_node+0x41/0x130 [btrfs] [ 9482.127702] btrfs_search_slot+0x514/0xc30 [btrfs] [ 9482.127788] update_qgroup_status_item+0x72/0x140 [btrfs] [ 9482.127877] btrfs_qgroup_rescan_worker+0xde/0x680 [btrfs] [ 9482.127964] btrfs_work_helper+0xf1/0x600 [btrfs] [ 9482.128039] process_one_work+0x24e/0x5e0 [ 9482.128110] worker_thread+0x50/0x3b0 [ 9482.128181] kthread+0x153/0x170 [ 9482.128256] ret_from_fork+0x22/0x30 [ 9482.128327] -> #0 (&fs_info->qgroup_rescan_lock){+.+.}-{3:3}: [ 9482.128464] check_prev_add+0x91/0xc60 [ 9482.128551] __lock_acquire+0x1740/0x3110 [ 9482.128623] lock_acquire+0xd8/0x490 [ 9482.130029] __mutex_lock+0xa3/0xb30 [ 9482.130590] qgroup_rescan_init+0x43/0xf0 [btrfs] [ 9482.131577] btrfs_read_qgroup_config+0x43a/0x550 [btrfs] [ 9482.132175] open_ctree+0x1228/0x18a0 [btrfs] [ 9482.132756] btrfs_mount_root.cold+0x13/0xed [btrfs] [ 9482.133325] legacy_get_tree+0x30/0x60 [ 9482.133866] vfs_get_tree+0x28/0xe0 [ 9482.134392] fc_mount+0xe/0x40 [ 9482.134908] vfs_kern_mount.part.0+0x71/0x90 [ 9482.135428] btrfs_mount+0x13b/0x3e0 [btrfs] [ 9482.135942] legacy_get_tree+0x30/0x60 [ 9482.136444] vfs_get_tree+0x28/0xe0 [ 9482.136949] path_mount+0x2d7/0xa70 [ 9482.137438] do_mount+0x75/0x90 [ 9482.137923] __x64_sys_mount+0x8e/0xd0 [ 9482.138400] do_syscall_64+0x33/0x80 [ 9482.138873] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 9482.139346] other info that might help us debug this: [ 9482.140735] Possible unsafe locking scenario: [ 9482.141594] CPU0 CPU1 [ 9482.142011] ---- ---- [ 9482.142411] lock(btrfs-quota-00); [ 9482.142806] lock(&fs_info->qgroup_rescan_lock); [ 9482.143216] lock(btrfs-quota-00); [ 9482.143629] lock(&fs_info->qgroup_rescan_lock); [ 9482.144056] *** DEADLOCK *** [ 9482.145242] 2 locks held by mount/24187: [ 9482.145637] #0: ffffa0c8411c40e8 (&type->s_umount_key#44/1){+.+.}-{3:3}, at: alloc_super+0xb9/0x400 [ 9482.146061] #1: ffffa0c892ebd3a0 (btrfs-quota-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x27/0x120 [btrfs] [ 9482.146509] stack backtrace: [ 9482.147350] CPU: 1 PID: 24187 Comm: mount Not tainted 5.10.0-rc4-btrfs-next-73 #1 [ 9482.147788] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 [ 9482.148709] Call Trace: [ 9482.149169] dump_stack+0x8d/0xb5 [ 9482.149628] check_noncircular+0xff/0x110 [ 9482.150090] check_prev_add+0x91/0xc60 [ 9482.150561] ? kvm_clock_read+0x14/0x30 [ 9482.151017] ? kvm_sched_clock_read+0x5/0x10 [ 9482.151470] __lock_acquire+0x1740/0x3110 [ 9482.151941] ? __btrfs_tree_read_lock+0x27/0x120 [btrfs] [ 9482.152402] lock_acquire+0xd8/0x490 [ 9482.152887] ? qgroup_rescan_init+0x43/0xf0 [btrfs] [ 9482.153354] __mutex_lock+0xa3/0xb30 [ 9482.153826] ? qgroup_rescan_init+0x43/0xf0 [btrfs] [ 9482.154301] ? qgroup_rescan_init+0x43/0xf0 [btrfs] [ 9482.154768] ? qgroup_rescan_init+0x43/0xf0 [btrfs] [ 9482.155226] qgroup_rescan_init+0x43/0xf0 [btrfs] [ 9482.155690] btrfs_read_qgroup_config+0x43a/0x550 [btrfs] [ 9482.156160] open_ctree+0x1228/0x18a0 [btrfs] [ 9482.156643] btrfs_mount_root.cold+0x13/0xed [btrfs] [ 9482.157108] ? rcu_read_lock_sched_held+0x5d/0x90 [ 9482.157567] ? kfree+0x31f/0x3e0 [ 9482.158030] legacy_get_tree+0x30/0x60 [ 9482.158489] vfs_get_tree+0x28/0xe0 [ 9482.158947] fc_mount+0xe/0x40 [ 9482.159403] vfs_kern_mount.part.0+0x71/0x90 [ 9482.159875] btrfs_mount+0x13b/0x3e0 [btrfs] [ 9482.160335] ? rcu_read_lock_sched_held+0x5d/0x90 [ 9482.160805] ? kfree+0x31f/0x3e0 [ 9482.161260] ? legacy_get_tree+0x30/0x60 [ 9482.161714] legacy_get_tree+0x30/0x60 [ 9482.162166] vfs_get_tree+0x28/0xe0 [ 9482.162616] path_mount+0x2d7/0xa70 [ 9482.163070] do_mount+0x75/0x90 [ 9482.163525] __x64_sys_mount+0x8e/0xd0 [ 9482.163986] do_syscall_64+0x33/0x80 [ 9482.164437] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 9482.164902] RIP: 0033:0x7f51e907caaa This happens because at btrfs_read_qgroup_config() we can call qgroup_rescan_init() while holding a read lock on a quota btree leaf, acquired by the previous call to btrfs_search_slot_for_read(), and qgroup_rescan_init() acquires the mutex qgroup_rescan_lock. A qgroup rescan worker does the opposite: it acquires the mutex qgroup_rescan_lock, at btrfs_qgroup_rescan_worker(), and then tries to update the qgroup status item in the quota btree through the call to update_qgroup_status_item(). This inversion of locking order between the qgroup_rescan_lock mutex and quota btree locks causes the splat. Fix this simply by releasing and freeing the path before calling qgroup_rescan_init() at btrfs_read_qgroup_config(). CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-11-13btrfs: qgroup: don't commit transaction when we already hold the handleQu Wenruo
[BUG] When running the following script, btrfs will trigger an ASSERT(): #/bin/bash mkfs.btrfs -f $dev mount $dev $mnt xfs_io -f -c "pwrite 0 1G" $mnt/file sync btrfs quota enable $mnt btrfs quota rescan -w $mnt # Manually set the limit below current usage btrfs qgroup limit 512M $mnt $mnt # Crash happens touch $mnt/file The dmesg looks like this: assertion failed: refcount_read(&trans->use_count) == 1, in fs/btrfs/transaction.c:2022 ------------[ cut here ]------------ kernel BUG at fs/btrfs/ctree.h:3230! invalid opcode: 0000 [#1] SMP PTI RIP: 0010:assertfail.constprop.0+0x18/0x1a [btrfs] btrfs_commit_transaction.cold+0x11/0x5d [btrfs] try_flush_qgroup+0x67/0x100 [btrfs] __btrfs_qgroup_reserve_meta+0x3a/0x60 [btrfs] btrfs_delayed_update_inode+0xaa/0x350 [btrfs] btrfs_update_inode+0x9d/0x110 [btrfs] btrfs_dirty_inode+0x5d/0xd0 [btrfs] touch_atime+0xb5/0x100 iterate_dir+0xf1/0x1b0 __x64_sys_getdents64+0x78/0x110 do_syscall_64+0x33/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7fb5afe588db [CAUSE] In try_flush_qgroup(), we assume we don't hold a transaction handle at all. This is true for data reservation and mostly true for metadata. Since data space reservation always happens before we start a transaction, and for most metadata operation we reserve space in start_transaction(). But there is an exception, btrfs_delayed_inode_reserve_metadata(). It holds a transaction handle, while still trying to reserve extra metadata space. When we hit EDQUOT inside btrfs_delayed_inode_reserve_metadata(), we will join current transaction and commit, while we still have transaction handle from qgroup code. [FIX] Let's check current->journal before we join the transaction. If current->journal is unset or BTRFS_SEND_TRANS_STUB, it means we are not holding a transaction, thus are able to join and then commit transaction. If current->journal is a valid transaction handle, we avoid committing transaction and just end it This is less effective than committing current transaction, as it won't free metadata reserved space, but we may still free some data space before new data writes. Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1178634 Fixes: c53e9653605d ("btrfs: qgroup: try to flush qgroup space when we get -EDQUOT") Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-11-05btrfs: clean up NULL checks in qgroup_unreserve_range()Dan Carpenter
Smatch complains that this code dereferences "entry" before checking whether it's NULL on the next line. Fortunately, rb_entry() will never return NULL so it doesn't cause a problem. We can clean up the NULL checking a bit to silence the warning and make the code more clear. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-26btrfs: drop the path before adding qgroup items when enabling qgroupsJosef Bacik
When enabling qgroups we walk the tree_root and then add a qgroup item for every root that we have. This creates a lock dependency on the tree_root and qgroup_root, which results in the following lockdep splat (with tree locks using rwsem), eg. in tests btrfs/017 or btrfs/022: ====================================================== WARNING: possible circular locking dependency detected 5.9.0-default+ #1299 Not tainted ------------------------------------------------------ btrfs/24552 is trying to acquire lock: ffff9142dfc5f630 (btrfs-quota-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x35/0x1c0 [btrfs] but task is already holding lock: ffff9142dfc5d0b0 (btrfs-root-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x35/0x1c0 [btrfs] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (btrfs-root-00){++++}-{3:3}: __lock_acquire+0x3fb/0x730 lock_acquire.part.0+0x6a/0x130 down_read_nested+0x46/0x130 __btrfs_tree_read_lock+0x35/0x1c0 [btrfs] __btrfs_read_lock_root_node+0x3a/0x50 [btrfs] btrfs_search_slot_get_root+0x11d/0x290 [btrfs] btrfs_search_slot+0xc3/0x9f0 [btrfs] btrfs_insert_item+0x6e/0x140 [btrfs] btrfs_create_tree+0x1cb/0x240 [btrfs] btrfs_quota_enable+0xcd/0x790 [btrfs] btrfs_ioctl_quota_ctl+0xc9/0xe0 [btrfs] __x64_sys_ioctl+0x83/0xa0 do_syscall_64+0x2d/0x70 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #0 (btrfs-quota-00){++++}-{3:3}: check_prev_add+0x91/0xc30 validate_chain+0x491/0x750 __lock_acquire+0x3fb/0x730 lock_acquire.part.0+0x6a/0x130 down_read_nested+0x46/0x130 __btrfs_tree_read_lock+0x35/0x1c0 [btrfs] __btrfs_read_lock_root_node+0x3a/0x50 [btrfs] btrfs_search_slot_get_root+0x11d/0x290 [btrfs] btrfs_search_slot+0xc3/0x9f0 [btrfs] btrfs_insert_empty_items+0x58/0xa0 [btrfs] add_qgroup_item.part.0+0x72/0x210 [btrfs] btrfs_quota_enable+0x3bb/0x790 [btrfs] btrfs_ioctl_quota_ctl+0xc9/0xe0 [btrfs] __x64_sys_ioctl+0x83/0xa0 do_syscall_64+0x2d/0x70 entry_SYSCALL_64_after_hwframe+0x44/0xa9 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(btrfs-root-00); lock(btrfs-quota-00); lock(btrfs-root-00); lock(btrfs-quota-00); *** DEADLOCK *** 5 locks held by btrfs/24552: #0: ffff9142df431478 (sb_writers#10){.+.+}-{0:0}, at: mnt_want_write_file+0x22/0xa0 #1: ffff9142f9b10cc0 (&fs_info->subvol_sem){++++}-{3:3}, at: btrfs_ioctl_quota_ctl+0x7b/0xe0 [btrfs] #2: ffff9142f9b11a08 (&fs_info->qgroup_ioctl_lock){+.+.}-{3:3}, at: btrfs_quota_enable+0x3b/0x790 [btrfs] #3: ffff9142df431698 (sb_internal#2){.+.+}-{0:0}, at: start_transaction+0x406/0x510 [btrfs] #4: ffff9142dfc5d0b0 (btrfs-root-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x35/0x1c0 [btrfs] stack backtrace: CPU: 1 PID: 24552 Comm: btrfs Not tainted 5.9.0-default+ #1299 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014 Call Trace: dump_stack+0x77/0x97 check_noncircular+0xf3/0x110 check_prev_add+0x91/0xc30 validate_chain+0x491/0x750 __lock_acquire+0x3fb/0x730 lock_acquire.part.0+0x6a/0x130 ? __btrfs_tree_read_lock+0x35/0x1c0 [btrfs] ? lock_acquire+0xc4/0x140 ? __btrfs_tree_read_lock+0x35/0x1c0 [btrfs] down_read_nested+0x46/0x130 ? __btrfs_tree_read_lock+0x35/0x1c0 [btrfs] __btrfs_tree_read_lock+0x35/0x1c0 [btrfs] ? btrfs_root_node+0xd9/0x200 [btrfs] __btrfs_read_lock_root_node+0x3a/0x50 [btrfs] btrfs_search_slot_get_root+0x11d/0x290 [btrfs] btrfs_search_slot+0xc3/0x9f0 [btrfs] btrfs_insert_empty_items+0x58/0xa0 [btrfs] add_qgroup_item.part.0+0x72/0x210 [btrfs] btrfs_quota_enable+0x3bb/0x790 [btrfs] btrfs_ioctl_quota_ctl+0xc9/0xe0 [btrfs] __x64_sys_ioctl+0x83/0xa0 do_syscall_64+0x2d/0x70 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fix this by dropping the path whenever we find a root item, add the qgroup item, and then re-lookup the root item we found and continue processing roots. Reported-by: David Sterba <dsterba@suse.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-07btrfs: delete duplicated words + other fixes in commentsRandy Dunlap
Delete repeated words in fs/btrfs/. {to, the, a, and old} and change "into 2 part" to "into 2 parts". Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-07-27btrfs: qgroup: remove ASYNC_COMMIT mechanism in favor of reserve ↵Qu Wenruo
retry-after-EDQUOT commit a514d63882c3 ("btrfs: qgroup: Commit transaction in advance to reduce early EDQUOT") tries to reduce the early EDQUOT problems by checking the qgroup free against threshold and tries to wake up commit kthread to free some space. The problem of that mechanism is, it can only free qgroup per-trans metadata space, can't do anything to data, nor prealloc qgroup space. Now since we have the ability to flush qgroup space, and implemented retry-after-EDQUOT behavior, such mechanism can be completely replaced. So this patch will cleanup such mechanism in favor of retry-after-EDQUOT. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-07-27btrfs: qgroup: try to flush qgroup space when we get -EDQUOTQu Wenruo
[PROBLEM] There are known problem related to how btrfs handles qgroup reserved space. One of the most obvious case is the the test case btrfs/153, which do fallocate, then write into the preallocated range. btrfs/153 1s ... - output mismatch (see xfstests-dev/results//btrfs/153.out.bad) --- tests/btrfs/153.out 2019-10-22 15:18:14.068965341 +0800 +++ xfstests-dev/results//btrfs/153.out.bad 2020-07-01 20:24:40.730000089 +0800 @@ -1,2 +1,5 @@ QA output created by 153 +pwrite: Disk quota exceeded +/mnt/scratch/testfile2: Disk quota exceeded +/mnt/scratch/testfile2: Disk quota exceeded Silence is golden ... (Run 'diff -u xfstests-dev/tests/btrfs/153.out xfstests-dev/results//btrfs/153.out.bad' to see the entire diff) [CAUSE] Since commit c6887cd11149 ("Btrfs: don't do nocow check unless we have to"), we always reserve space no matter if it's COW or not. Such behavior change is mostly for performance, and reverting it is not a good idea anyway. For preallcoated extent, we reserve qgroup data space for it already, and since we also reserve data space for qgroup at buffered write time, it needs twice the space for us to write into preallocated space. This leads to the -EDQUOT in buffered write routine. And we can't follow the same solution, unlike data/meta space check, qgroup reserved space is shared between data/metadata. The EDQUOT can happen at the metadata reservation, so doing NODATACOW check after qgroup reservation failure is not a solution. [FIX] To solve the problem, we don't return -EDQUOT directly, but every time we got a -EDQUOT, we try to flush qgroup space: - Flush all inodes of the root NODATACOW writes will free the qgroup reserved at run_dealloc_range(). However we don't have the infrastructure to only flush NODATACOW inodes, here we flush all inodes anyway. - Wait for ordered extents This would convert the preallocated metadata space into per-trans metadata, which can be freed in later transaction commit. - Commit transaction This will free all per-trans metadata space. Also we don't want to trigger flush multiple times, so here we introduce a per-root wait list and a new root status, to ensure only one thread starts the flushing. Fixes: c6887cd11149 ("Btrfs: don't do nocow check unless we have to") Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-07-27btrfs: qgroup: allow to unreserve range without releasing other rangesQu Wenruo
[PROBLEM] Before this patch, when btrfs_qgroup_reserve_data() fails, we free all reserved space of the changeset. For example: ret = btrfs_qgroup_reserve_data(inode, changeset, 0, SZ_1M); ret = btrfs_qgroup_reserve_data(inode, changeset, SZ_1M, SZ_1M); ret = btrfs_qgroup_reserve_data(inode, changeset, SZ_2M, SZ_1M); If the last btrfs_qgroup_reserve_data() failed, it will release the entire [0, 3M) range. This behavior is kind of OK for now, as when we hit -EDQUOT, we normally go error handling and need to release all reserved ranges anyway. But this also means the following call is not possible: ret = btrfs_qgroup_reserve_data(); if (ret == -EDQUOT) { /* Do something to free some qgroup space */ ret = btrfs_qgroup_reserve_data(); } As if the first btrfs_qgroup_reserve_data() fails, it will free all reserved qgroup space. [CAUSE] This is because we release all reserved ranges when btrfs_qgroup_reserve_data() fails. [FIX] This patch will implement a new function, qgroup_unreserve_range(), to iterate through the ulist nodes, to find any nodes in the failure range, and remove the EXTENT_QGROUP_RESERVED bits from the io_tree, and decrease the extent_changeset::bytes_changed, so that we can revert to previous state. This allows later patches to retry btrfs_qgroup_reserve_data() if EDQUOT happens. Suggested-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-07-27btrfs: qgroup: export qgroups in sysfsQu Wenruo
This patch will add the following sysfs interface: /sys/fs/btrfs/<UUID>/qgroups/<qgroup_id>/referenced /sys/fs/btrfs/<UUID>/qgroups/<qgroup_id>/exclusive /sys/fs/btrfs/<UUID>/qgroups/<qgroup_id>/max_referenced /sys/fs/btrfs/<UUID>/qgroups/<qgroup_id>/max_exclusive /sys/fs/btrfs/<UUID>/qgroups/<qgroup_id>/limit_flags Which is also available in output of "btrfs qgroup show". /sys/fs/btrfs/<UUID>/qgroups/<qgroup_id>/rsv_data /sys/fs/btrfs/<UUID>/qgroups/<qgroup_id>/rsv_meta_pertrans /sys/fs/btrfs/<UUID>/qgroups/<qgroup_id>/rsv_meta_prealloc The last 3 rsv related members are not visible to users, but can be very useful to debug qgroup limit related bugs. Also, to avoid '/' used in <qgroup_id>, the separator between qgroup level and qgroup id is changed to '_'. The interface is not hidden behind 'debug' as we want this interface to be included into production build and to provide another way to read the qgroup information besides the ioctls. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-07-27btrfs: use __u16 for the return value of btrfs_qgroup_level()Qu Wenruo
The qgroup level is limited to u16, so no need to use u64 for it. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-07-27btrfs: make btrfs_qgroup_check_reserved_leak take btrfs_inodeNikolay Borisov
vfs_inode is used only for the inode number everything else requires btrfs_inode. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> [ use btrfs_ino ] Signed-off-by: David Sterba <dsterba@suse.com>
2020-07-27btrfs: make btrfs_qgroup_reserve_data take btrfs_inodeNikolay Borisov
There's only a single use of vfs_inode in a tracepoint so let's take btrfs_inode directly. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-07-27btrfs: make btrfs_qgroup_release_data take btrfs_inodeNikolay Borisov
It just forwards its argument to __btrfs_qgroup_release_data. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-07-27btrfs: make btrfs_qgroup_free_data take btrfs_inodeNikolay Borisov
It passes btrfs_inode to its callee so change the interface. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-07-27btrfs: make __btrfs_qgroup_release_data take btrfs_inodeNikolay Borisov
It uses vfs_inode only for a tracepoint so convert its interface to take btrfs_inode directly. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-07-27btrfs: make qgroup_free_reserved_data take btrfs_inodeNikolay Borisov
It only uses btrfs_inode so can just as easily take it as an argument. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-07-27btrfs: qgroup: catch reserved space leaks at unmount timeQu Wenruo
Before this patch, qgroup completely relies on per-inode extent io tree to detect reserved data space leak. However previous bug has already shown how release page before btrfs_finish_ordered_io() could lead to leak, and since it's QGROUP_RESERVED bit cleared without triggering qgroup rsv, it can't be detected by per-inode extent io tree. So this patch adds another (and hopefully the final) safety net to catch qgroup data reserved space leak. At least the new safety net catches all the leaks during development, so it should be pretty useful in the real world. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-05-25btrfs: qgroup: mark qgroup inconsistent if we're inherting snapshot to a new ↵Qu Wenruo
qgroup [BUG] For the following operation, qgroup is guaranteed to be screwed up due to snapshot adding to a new qgroup: # mkfs.btrfs -f $dev # mount $dev $mnt # btrfs qgroup en $mnt # btrfs subv create $mnt/src # xfs_io -f -c "pwrite 0 1m" $mnt/src/file # sync # btrfs qgroup create 1/0 $mnt/src # btrfs subv snapshot -i 1/0 $mnt/src $mnt/snapshot # btrfs qgroup show -prce $mnt/src qgroupid rfer excl max_rfer max_excl parent child -------- ---- ---- -------- -------- ------ ----- 0/5 16.00KiB 16.00KiB none none --- --- 0/257 1.02MiB 16.00KiB none none --- --- 0/258 1.02MiB 16.00KiB none none 1/0 --- 1/0 0.00B 0.00B none none --- 0/258 ^^^^^^^^^^^^^^^^^^^^ [CAUSE] The problem is in btrfs_qgroup_inherit(), we don't have good enough check to determine if the new relation would break the existing accounting. Unlike btrfs_add_qgroup_relation(), which has proper check to determine if we can do quick update without a rescan, in btrfs_qgroup_inherit() we can even assign a snapshot to multiple qgroups. [FIX] Fix it by manually marking qgroup inconsistent for snapshot inheritance. For subvolume creation, since all its extents are exclusively owned, we don't need to rescan. In theory, we should call relation check like quick_update_accounting() when doing qgroup inheritance and inform user about qgroup accounting inconsistency. But we don't have good mechanism to relay that back to the user in the snapshot creation context, thus we can only silently mark the qgroup inconsistent. Anyway, user shouldn't use qgroup inheritance during snapshot creation, and should add qgroup relationship after snapshot creation by 'btrfs qgroup assign', which has a much better UI to inform user about qgroup inconsistent and kick in rescan automatically. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-03-23btrfs: move the root freeing stuff into btrfs_put_rootJosef Bacik
There are a few different ways to free roots, either you allocated them yourself and you just do free_extent_buffer(root->node); free_extent_buffer(root->commit_node); btrfs_put_root(root); Which is the pattern for log roots. Or for snapshots/subvolumes that are being dropped you simply call btrfs_free_fs_root() which does all the cleanup for you. Unify this all into btrfs_put_root(), so that we don't free up things associated with the root until the last reference is dropped. This makes the root freeing code much more significant. The only caveat is at close_ctree() time we have to free the extent buffers for all of our main roots (extent_root, chunk_root, etc) because we have to drop the btree_inode and we'll run into issues if we hold onto those nodes until ->kill_sb() time. This will be addressed in the future when we kill the btree_inode. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-03-23btrfs: qgroup: Remove the unnecesaary spin lock for qgroup_rescan_runningQu Wenruo
After the previous patch, qgroup_rescan_running is protected by btrfs_fs_info::qgroup_rescan_lock, thus no need for the extra spinlock. Suggested-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-03-23btrfs: qgroup: ensure qgroup_rescan_running is only set when the worker is ↵Qu Wenruo
at least queued [BUG] There are some reports about btrfs wait forever to unmount itself, with the following call trace: INFO: task umount:4631 blocked for more than 491 seconds. Tainted: G X 5.3.8-2-default #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. umount D 0 4631 3337 0x00000000 Call Trace: ([<00000000174adf7a>] __schedule+0x342/0x748) [<00000000174ae3ca>] schedule+0x4a/0xd8 [<00000000174b1f08>] schedule_timeout+0x218/0x420 [<00000000174af10c>] wait_for_common+0x104/0x1d8 [<000003ff804d6994>] btrfs_qgroup_wait_for_completion+0x84/0xb0 [btrfs] [<000003ff8044a616>] close_ctree+0x4e/0x380 [btrfs] [<0000000016fa3136>] generic_shutdown_super+0x8e/0x158 [<0000000016fa34d6>] kill_anon_super+0x26/0x40 [<000003ff8041ba88>] btrfs_kill_super+0x28/0xc8 [btrfs] [<0000000016fa39f8>] deactivate_locked_super+0x68/0x98 [<0000000016fcb198>] cleanup_mnt+0xc0/0x140 [<0000000016d6a846>] task_work_run+0xc6/0x110 [<0000000016d04f76>] do_notify_resume+0xae/0xb8 [<00000000174b30ae>] system_call+0xe2/0x2c8 [CAUSE] The problem happens when we have called qgroup_rescan_init(), but not queued the worker. It can be caused mostly by error handling. Qgroup ioctl thread | Unmount thread ----------------------------------------+----------------------------------- | btrfs_qgroup_rescan() | |- qgroup_rescan_init() | | |- qgroup_rescan_running = true; | | | |- trans = btrfs_join_transaction() | | Some error happened | | | |- btrfs_qgroup_rescan() returns error | But qgroup_rescan_running == true; | | close_ctree() | |- btrfs_qgroup_wait_for_completion() | |- running == true; | |- wait_for_completion(); btrfs_qgroup_rescan_worker is never queued, thus no one is going to wake up close_ctree() and we get a deadlock. All involved qgroup_rescan_init() callers are: - btrfs_qgroup_rescan() The example above. It's possible to trigger the deadlock when error happened. - btrfs_quota_enable() Not possible. Just after qgroup_rescan_init() we queue the work. - btrfs_read_qgroup_config() It's possible to trigger the deadlock. It only init the work, the work queueing happens in btrfs_qgroup_rescan_resume(). Thus if error happened in between, deadlock is possible. We shouldn't set fs_info->qgroup_rescan_running just in qgroup_rescan_init(), as at that stage we haven't yet queued qgroup rescan worker to run. [FIX] Set qgroup_rescan_running before queueing the work, so that we ensure the rescan work is queued when we wait for it. Fixes: 8d9eddad1946 ("Btrfs: fix qgroup rescan worker initialization") Signed-off-by: Jeff Mahoney <jeffm@suse.com> [ Change subject and cause analyse, use a smaller fix ] Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-03-23btrfs: rename btrfs_put_fs_root and btrfs_grab_fs_rootJosef Bacik
We are now using these for all roots, rename them to btrfs_put_root() and btrfs_grab_root(); Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-03-23btrfs: use btrfs_put_fs_root to free roots alwaysJosef Bacik
If we are going to track leaked roots we need to free them all the same way, so don't kfree() roots directly, use btrfs_put_fs_root. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-02-19btrfs: destroy qgroup extent records on transaction abortJeff Mahoney
We clean up the delayed references when we abort a transaction but we leave the pending qgroup extent records behind, leaking memory. This patch destroys the extent records when we destroy the delayed refs and makes sure ensure they're gone before releasing the transaction. Fixes: 3368d001ba5d ("btrfs: qgroup: Record possible quota-related extent for qgroup.") CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jeff Mahoney <jeffm@suse.com> [ Rebased to latest upstream, remove to_qgroup() helper, use rbtree_postorder_for_each_entry_safe() wrapper ] Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-01-20btrfs: qgroup: return ENOTCONN instead of EINVAL when quotas are not enabledMarcos Paulo de Souza
[PROBLEM] qgroup create/remove code is currently returning EINVAL when the user tries to create a qgroup on a subvolume without quota enabled. EINVAL is already being used for too many error scenarios so that is hard to depict what is the problem. [FIX] Currently scrub and balance code return -ENOTCONN when the user tries to cancel/pause and no scrub or balance is currently running for the desired subvolume. Do the same here by returning -ENOTCONN when a user tries to create/delete/assing/list a qgroup on a subvolume without quota enabled. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-01-20btrfs: qgroup: remove one-time use variables for quota_root checksMarcos Paulo de Souza
Remove some variables that are set only to be checked later, and never used. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-01-08btrfs: fix memory leak in qgroup accountingJohannes Thumshirn
When running xfstests on the current btrfs I get the following splat from kmemleak: unreferenced object 0xffff88821b2404e0 (size 32): comm "kworker/u4:7", pid 26663, jiffies 4295283698 (age 8.776s) hex dump (first 32 bytes): 01 00 00 00 00 00 00 00 10 ff fd 26 82 88 ff ff ...........&.... 10 ff fd 26 82 88 ff ff 20 ff fd 26 82 88 ff ff ...&.... ..&.... backtrace: [<00000000f94fd43f>] ulist_alloc+0x25/0x60 [btrfs] [<00000000fd023d99>] btrfs_find_all_roots_safe+0x41/0x100 [btrfs] [<000000008f17bd32>] btrfs_find_all_roots+0x52/0x70 [btrfs] [<00000000b7660afb>] btrfs_qgroup_rescan_worker+0x343/0x680 [btrfs] [<0000000058e66778>] btrfs_work_helper+0xac/0x1e0 [btrfs] [<00000000f0188930>] process_one_work+0x1cf/0x350 [<00000000af5f2f8e>] worker_thread+0x28/0x3c0 [<00000000b55a1add>] kthread+0x109/0x120 [<00000000f88cbd17>] ret_from_fork+0x35/0x40 This corresponds to: (gdb) l *(btrfs_find_all_roots_safe+0x41) 0x8d7e1 is in btrfs_find_all_roots_safe (fs/btrfs/backref.c:1413). 1408 1409 tmp = ulist_alloc(GFP_NOFS); 1410 if (!tmp) 1411 return -ENOMEM; 1412 *roots = ulist_alloc(GFP_NOFS); 1413 if (!*roots) { 1414 ulist_free(tmp); 1415 return -ENOMEM; 1416 } 1417 Following the lifetime of the allocated 'roots' ulist, it gets freed again in btrfs_qgroup_account_extent(). But this does not happen if the function is called with the 'BTRFS_FS_QUOTA_ENABLED' flag cleared, then btrfs_qgroup_account_extent() does a short leave and directly returns. Instead of directly returning we should jump to the 'out_free' in order to free all resources as expected. CC: stable@vger.kernel.org # 4.14+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> [ add comment ] Signed-off-by: David Sterba <dsterba@suse.com>
2019-12-13btrfs: Fix error messages in qgroup_rescan_initNikolay Borisov
The branch of qgroup_rescan_init which is executed from the mount path prints wrong errors messages. The textual print out in case BTRFS_QGROUP_STATUS_FLAG_RESCAN/BTRFS_QGROUP_STATUS_FLAG_ON are not set are transposed. Fix it by exchanging their place. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18btrfs: rename btrfs_block_group_cacheDavid Sterba
The type name is misleading, a single entry is named 'cache' while this normally means a collection of objects. Rename that everywhere. Also the identifier was quite long, making function prototypes harder to format. Suggested-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18btrfs: opencode extent_buffer_getDavid Sterba
The helper is trivial and we can understand what the atomic_inc on something named refs does. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18btrfs: get rid of unique workqueue helper functionsOmar Sandoval
Commit 9e0af2376434 ("Btrfs: fix task hang under heavy compressed write") worked around the issue that a recycled work item could get a false dependency on the original work item due to how the workqueue code guarantees non-reentrancy. It did so by giving different work functions to different types of work. However, the fixes in the previous few patches are more complete, as they prevent a work item from being recycled at all (except for a tiny window that the kernel workqueue code handles for us). This obsoletes the previous fix, so we don't need the unique helpers for correctness. The only other reason to keep them would be so they show up in stack traces, but they always seem to be optimized to a tail call, so they don't show up anyways. So, let's just get rid of the extra indirection. While we're here, rename normal_work_helper() to the more informative btrfs_work_helper(). Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18btrfs: get rid of unnecessary memset() of work itemOmar Sandoval
Commit fc97fab0ea59 ("btrfs: Replace fs_info->qgroup_rescan_worker workqueue with btrfs_workqueue.") converted qgroup_rescan_work to be initialized with btrfs_init_work(), but it left behind an unnecessary memset(). Get rid of the memset(). Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-10-17btrfs: tracepoints: Fix wrong parameter order for qgroup eventsQu Wenruo
[BUG] For btrfs:qgroup_meta_reserve event, the trace event can output garbage: qgroup_meta_reserve: 9c7f6acc-b342-4037-bc47-7f6e4d2232d7: refroot=5(FS_TREE) type=DATA diff=2 The diff should always be alinged to sector size (4k), so there is definitely something wrong. [CAUSE] For the wrong @diff, it's caused by wrong parameter order. The correct parameters are: struct btrfs_root, s64 diff, int type. However the parameters used are: struct btrfs_root, int type, s64 diff. Fixes: 4ee0d8832c2e ("btrfs: qgroup: Update trace events for metadata reservation") CC: stable@vger.kernel.org # 4.19+ Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-27btrfs: qgroup: Fix reserved data space leak if we have multiple reserve callsQu Wenruo
[BUG] The following script can cause btrfs qgroup data space leak: mkfs.btrfs -f $dev mount $dev -o nospace_cache $mnt btrfs subv create $mnt/subv btrfs quota en $mnt btrfs quota rescan -w $mnt btrfs qgroup limit 128m $mnt/subv for (( i = 0; i < 3; i++)); do # Create 3 64M holes for latter fallocate to fail truncate -s 192m $mnt/subv/file xfs_io -c "pwrite 64m 4k" $mnt/subv/file > /dev/null xfs_io -c "pwrite 128m 4k" $mnt/subv/file > /dev/null sync # it's supposed to fail, and each failure will leak at least 64M # data space xfs_io -f -c "falloc 0 192m" $mnt/subv/file &> /dev/null rm $mnt/subv/file sync done # Shouldn't fail after we removed the file xfs_io -f -c "falloc 0 64m" $mnt/subv/file [CAUSE] Btrfs qgroup data reserve code allow multiple reservations to happen on a single extent_changeset: E.g: btrfs_qgroup_reserve_data(inode, &data_reserved, 0, SZ_1M); btrfs_qgroup_reserve_data(inode, &data_reserved, SZ_1M, SZ_2M); btrfs_qgroup_reserve_data(inode, &am