summaryrefslogtreecommitdiffstats
path: root/fs/btrfs
AgeCommit message (Collapse)Author
2010-10-29Btrfs: fix clone ioctl where range is adjacent to extentSage Weil
We had an edge case issue where the requested range was just following an existing extent. Instead of skipping to the next extent, we used the previous one which lead to having zero sized extents. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-10-29Btrfs: fix delalloc checks in clone ioctlSage Weil
The lookup_first_ordered_extent() was done on the wrong inode, and the ->delalloc_bytes test was wrong, as the following btrfs_wait_ordered_range() would only invoke a range write and wouldn't write the entire file data range. Also, a bad parameter was passed to btrfs_wait_ordered_range(). Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-10-29Btrfs: drop unused variable in block_alloc_rsvChris Mason
The alloc_target variable is not really used. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-10-29Btrfs: cleanup warnings from gcc 4.6 (nonbugs)Andi Kleen
These are all the cases where a variable is set, but not read which are not bugs as far as I can see, but simply leftovers. Still needs more review. Found by gcc 4.6's new warnings Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Chris Mason <chris.mason@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-10-29Btrfs: Fix variables set but not read (bugs found by gcc 4.6)Andi Kleen
These are all the cases where a variable is set, but not read which are really bugs. - Couple of incorrect error handling fixed. - One incorrect use of a allocation policy - Some other things Still needs more review. Found by gcc 4.6's new warnings. [akpm@linux-foundation.org: fix build. Might have been bitrot] Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Chris Mason <chris.mason@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-10-29Btrfs: Use ERR_CAST helpersJulia Lawall
Use ERR_CAST(x) rather than ERR_PTR(PTR_ERR(x)). The former makes more clear what is the purpose of the operation, which otherwise looks like a no-op. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ type T; T x; identifier f; @@ T f (...) { <+... - ERR_PTR(PTR_ERR(x)) + x ...+> } @@ expression x; @@ - ERR_PTR(PTR_ERR(x)) + ERR_CAST(x) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Cc: Chris Mason <chris.mason@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-10-29Btrfs: use memdup_user helpersJulia Lawall
Use memdup_user when user data is immediately copied into the allocated region. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression from,to,size,flag; position p; identifier l1,l2; @@ - to = \(kmalloc@p\|kzalloc@p\)(size,flag); + to = memdup_user(from,size); if ( - to==NULL + IS_ERR(to) || ...) { <+... when != goto l1; - -ENOMEM + PTR_ERR(to) ...+> } - if (copy_from_user(to, from, size) != 0) { - <+... when != goto l2; - -EFAULT - ...+> - } // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Cc: Chris Mason <chris.mason@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-10-29Btrfs: fix raid code for removing missing drivesChris Mason
When btrfs is mounted in degraded mode, it has some internal structures to track the missing devices. This missing device is setup as readonly, but the mapping code can get upset when we try to write to it. This changes the mapping code to return -EIO instead of oops when we try to write to the readonly device. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-10-29Btrfs: Switch the extent buffer rbtree into a radix treeMiao Xie
This patch reduces the CPU time spent in the extent buffer search by using the radix tree instead of the rbtree and using the rcu lock instead of the spin lock. I did a quick test by the benchmark tool[1] and found the patch improve the file creation/deletion performance problem that I have reported[2]. Before applying this patch: Create files: Total files: 50000 Total time: 0.971531 Average time: 0.000019 Delete files: Total files: 50000 Total time: 1.366761 Average time: 0.000027 After applying this patch: Create files: Total files: 50000 Total time: 0.927455 Average time: 0.000019 Delete files: Total files: 50000 Total time: 1.292280 Average time: 0.000026 [1] http://marc.info/?l=linux-btrfs&m=128212635122920&q=p3 [2] http://marc.info/?l=linux-btrfs&m=128212635122920&w=2 Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-10-29Btrfs: restructure try_release_extent_buffer()Miao Xie
restructure try_release_extent_buffer() and write a function to release the extent buffer. It will be used later. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-10-29Btrfs: use the flusher threads for delalloc throttlingChris Mason
We have a fairly complex set of loops around walking our list of delalloc inodes when we find metadata delalloc space running low. It doesn't work very well, can use large amounts of CPU and doesn't do very efficient writeback. This switches us to kick the bdi flusher threads instead. All dirty data in btrfs is accounted as delalloc data, so this is very similar in terms of what it writes, but we're able to just kick off the IO and wait for progress. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-10-29Btrfs: tune the chunk allocation to 5% of the FS as metadataChris Mason
An earlier commit tried to keep us from allocating too many empty metadata chunks. It was somewhat too restrictive and could lead to ENOSPC errors on empty filesystems. This increases the limits to about 5% of the FS size, allowing more metadata chunks to be preallocated. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-10-29Btrfs: don't loop forever on bad btree blocksChris Mason
When btrfs discovers the generation number in a btree block is incorrect, it can loop forever without forcing the RAID code to try a valid mirror, and without returning EIO. This changes things to properly kick out the EIO. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-10-29Merge branch 'bug-fixes' of ↵Chris Mason
git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-work Conflicts: fs/btrfs/extent-tree.c Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-10-29Btrfs: let the user know space caching is enabledJosef Bacik
If you mount -o space_cache, the option will be persistent across mounts, but to make sure the user knows that they did this, emit a message telling them if they didn't mount with -o space_cache but the feature is still used. Signed-off-by: Josef Bacik <josef@redhat.com>
2010-10-29Btrfs: Add a clear_cache mount optionJosef Bacik
If something goes wrong with the free space cache we need a way to make sure it's not loaded on mount and that it's cleared for everybody. When you pass the clear_cache option it will make it so all block groups are setup to be cleared, which keeps them from being loaded and then they will be truncated when the transaction is committed. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2010-10-29Btrfs: add support for mixed data+metadata block groupsJosef Bacik
There are just a few things that need to be fixed in the kernel to support mixed data+metadata block groups. Mostly we just need to make sure that if we are using mixed block groups that we continue to allocate mixed block groups as we need them. Also we need to make sure __find_space_info will find our space info if we search for DATA or METADATA only. Tested this with xfstests and it works nicely. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2010-10-29Btrfs: check cache->caching_ctl before returning if caching has startedJosef Bacik
With the free space disk caching we can mark the block group as started with the caching, but we don't have a caching ctl. This can race with anybody else who tries to get the caching ctl before we cache (this is very hard to do btw). So instead check to see if cache->caching_ctl is set, and if not return NULL. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2010-10-29Btrfs: load free space cache if it existsJosef Bacik
This patch actually loads the free space cache if it exists. The only thing that really changes here is that we need to cache the block group if we're going to remove an extent from it. Previously we did not do this since the caching kthread would pick it up. With the on disk cache we don't have this luxury so we need to make sure we read the on disk cache in first, and then remove the extent, that way when the extent is unpinned the free space is added to the block group. This has been tested with all sorts of things. Signed-off-by: Josef Bacik <josef@redhat.com>
2010-10-29Btrfs: write out free space cacheJosef Bacik
This is a simple bit, just dump the free space cache out to our preallocated inode when we're writing out dirty block groups. There are a bunch of changes in inode.c in order to account for special cases. Mostly when we're doing the writeout we're holding trans_mutex, so we need to use the nolock transacation functions. Also we can't do asynchronous completions since the async thread could be blocked on already completed IO waiting for the transaction lock. This has been tested with xfstests and btrfs filesystem balance, as well as my ENOSPC tests. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2010-10-29convert btrfsAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-28Btrfs: create special free space cache inodeJosef Bacik
In order to save free space cache, we need an inode to hold the data, and we need a special item to point at the right inode for the right block group. So first, create a special item that will point to the right inode, and the number of extent entries we will have and the number of bitmaps we will have. We truncate and pre-allocate space everytime to make sure it's uptodate. This feature will be turned on as soon as you mount with -o space_cache, however it is safe to boot into old kernels, they will just generate the cache the old fashion way. When you boot back into a newer kernel we will notice that we modified and not the cache and automatically discard the cache. Signed-off-by: Josef Bacik <josef@redhat.com>
2010-10-26Btrfs: remove warn_on from use_block_rsvJosef Bacik
Because btrfs_dirty_inode does a btrfs_join_transaction, it doesn't actually reserve space. It does this so we can try and dirty the inode quickly without having to deal with the ENOSPC problems. But if it does get back ENOSPC it handles it properly. The problem is use_block_rsv does a WARN_ON whenever this case happens, even tho btrfs_dirty_inode takes it into account and actually expects to get -ENOSPC if things are particularly tight. So instead just remove the warning. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2010-10-26Btrfs: set trans to null in reserve_metadata_bytes if we commit the transactionJosef Bacik
btrfs_commit_transaction will free our trans, but because we pass trans to shrink_delalloc we could possibly have a use after free situation. So instead if we commit the transaction, set trans to null and set committed to true so we don't keep trying to commit a transaction. This fixes a panic I could reproduce at will. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2010-10-25new helper: ihold()Al Viro
Clones an existing reference to inode; caller must already hold one. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25new helper: inode_unhashed()Al Viro
note: for race-free uses you inode_lock held Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-22Merge branch 'for-2.6.37/barrier' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds
* 'for-2.6.37/barrier' of git://git.kernel.dk/linux-2.6-block: (46 commits) xen-blkfront: disable barrier/flush write support Added blk-lib.c and blk-barrier.c was renamed to blk-flush.c block: remove BLKDEV_IFL_WAIT aic7xxx_old: removed unused 'req' variable block: remove the BH_Eopnotsupp flag block: remove the BLKDEV_IFL_BARRIER flag block: remove the WRITE_BARRIER flag swap: do not send discards as barriers fat: do not send discards as barriers ext4: do not send discards as barriers jbd2: replace barriers with explicit flush / FUA usage jbd2: Modify ASYNC_COMMIT code to not rely on queue draining on barrier jbd: replace barriers with explicit flush / FUA usage nilfs2: replace barriers with explicit flush / FUA usage reiserfs: replace barriers with explicit flush / FUA usage gfs2: replace barriers with explicit flush / FUA usage btrfs: replace barriers with explicit flush / FUA usage xfs: replace barriers with explicit flush / FUA usage block: pass gfp_mask and flags to sb_issue_discard dm: convey that all flushes are processed as empty ...
2010-10-22Btrfs: fix error handling in btrfs_get_sbJosef Bacik
If we failed to find the root subvol id, or the subvol=<name>, we would deactivate the locked super and close the devices. The problem is at this point we have gotten the SB all setup, which includes setting super_operations, so when we'd deactiveate the super, we'd do a close_ctree() which closes the devices, so we'd end up closing the devices twice. So if you do something like this mount /dev/sda1 /mnt/test1 mount /dev/sda1 /mnt/test2 -o subvol=xxx umount /mnt/test1 it would blow up (if subvol xxx doesn't exist). This patch fixes that problem. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2010-10-22Btrfs: rework how we reserve metadata bytesJosef Bacik
With multi-threaded writes we were getting ENOSPC early because somebody would come in, start flushing delalloc because they couldn't make their reservation, and in the meantime other threads would come in and use the space that was getting freed up, so when the original thread went to check to see if they had space they didn't and they'd return ENOSPC. So instead if we have some free space but not enough for our reservation, take the reservation and then start doing the flushing. The only time we don't take reservations is when we've already overcommitted our space, that way we don't have people who come late to the party way overcommitting ourselves. This also moves all of the retrying and flushing code into reserve_metdata_bytes so it's all uniform. This keeps my fs_mark test from returning -ENOSPC as soon as it starts and actually lets me fill up the disk. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2010-10-22Btrfs: don't allocate chunks as aggressivelyJosef Bacik
Because the ENOSPC code over reserves super aggressively we end up allocating chunks way more often than we should. For example with my fs_mark tests on a 2gb fs I can end up reserved 1gb just for metadata, when only 34mb of that is being used. So instead check to see if the amount of space actually used is less than 30% of the total space, and if so don't allocate a chunk, but only if we have at least 256mb of free space to make sure we don't put too much pressure on free space. Signed-off-by: Josef Bacik <josef@redhat.com>
2010-10-22Btrfs: re-work delalloc flushingJosef Bacik
Currently we try and flush delalloc, but we only do that in a sort of weak way, which works fine in most cases but if we're under heavy pressure we need to be able to wait for flushing to happen. Also instead of checking the bytes reserved in the block_rsv, check the space info since it is more accurate. The sync option will be used in a future patch. Signed-off-by: Josef Bacik <josef@redhat.com>
2010-10-22Btrfs: fix reservation code for mixed block groupsJosef Bacik
The global reservation stuff tries to add together DATA and METADATA used in order to figure out how much to reserve for everything, but this doesn't work right for mixed block groups. Instead if we have mixed block groups just set data used to 0. Also with mixed block groups we will use bytes_may_use for keeping track of delalloc bytes, so we need to take that into account in our reservation calculations. Signed-off-by: Josef Bacik <josef@redhat.com>
2010-10-22Btrfs: fix df regressionJosef Bacik
The new ENOSPC stuff breaks out the raid types which breaks the way we were reporting df to the system. This fixes it back so that Available is the total space available to data and used is the actual bytes used by the filesystem. This means that Available is Total - data used - all of the metadata space. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2010-10-22Btrfs: fix the df ioctl to report raid typesJosef Bacik
The new ENOSPC stuff broke the df ioctl since we no longer create seperate space info's for each RAID type. So instead, loop through each space info's raid lists so we can get the right RAID information which will allow the df ioctl to tell us RAID types again. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2010-10-22Btrfs: stop trying to shrink delalloc if there are no inodes to reclaimJosef Bacik
In very severe ENOSPC cases we can run out of inodes to do delalloc on, which means we'll just keep looping trying to shrink delalloc. Instead, if we fail to shrink delalloc 3 times in a row break out since we're not likely to make any progress. Tested this with a 100mb fs an xfstests test 13. Before the patch it would hang the box, with the patch we get -ENOSPC like we should. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2010-10-15llseek: automatically add .llseek fopArnd Bergmann
All file_operations should get a .llseek operation so we can make nonseekable_open the default for future file operations without a .llseek pointer. The three cases that we can automatically detect are no_llseek, seq_lseek and default_llseek. For cases where we can we can automatically prove that the file offset is always ignored, we use noop_llseek, which maintains the current behavior of not returning an error from a seek. New drivers should normally not use noop_llseek but instead use no_llseek and call nonseekable_open at open time. Existing drivers can be converted to do the same when the maintainer knows for certain that no user code relies on calling seek on the device file. The generated code is often incorrectly indented and right now contains comments that clarify for each added line why a specific variant was chosen. In the version that gets submitted upstream, the comments will be gone and I will manually fix the indentation, because there does not seem to be a way to do that using coccinelle. Some amount of new code is currently sitting in linux-next that should get the same modifications, which I will do at the end of the merge window. Many thanks to Julia Lawall for helping me learn to write a semantic patch that does all this. ===== begin semantic patch ===== // This adds an llseek= method to all file operations, // as a preparation for making no_llseek the default. // // The rules are // - use no_llseek explicitly if we do nonseekable_open // - use seq_lseek for sequential files // - use default_llseek if we know we access f_pos // - use noop_llseek if we know we don't access f_pos, // but we still want to allow users to call lseek // @ open1 exists @ identifier nested_open; @@ nested_open(...) { <+... nonseekable_open(...) ...+> } @ open exists@ identifier open_f; identifier i, f; identifier open1.nested_open; @@ int open_f(struct inode *i, struct file *f) { <+... ( nonseekable_open(...) | nested_open(...) ) ...+> } @ read disable optional_qualifier exists @ identifier read_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; expression E; identifier func; @@ ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off) { <+... ( *off = E | *off += E | func(..., off, ...) | E = *off ) ...+> } @ read_no_fpos disable optional_qualifier exists @ identifier read_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; @@ ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off) { ... when != off } @ write @ identifier write_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; expression E; identifier func; @@ ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off) { <+... ( *off = E | *off += E | func(..., off, ...) | E = *off ) ...+> } @ write_no_fpos @ identifier write_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; @@ ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off) { ... when != off } @ fops0 @ identifier fops; @@ struct file_operations fops = { ... }; @ has_llseek depends on fops0 @ identifier fops0.fops; identifier llseek_f; @@ struct file_operations fops = { ... .llseek = llseek_f, ... }; @ has_read depends on fops0 @ identifier fops0.fops; identifier read_f; @@ struct file_operations fops = { ... .read = read_f, ... }; @ has_write depends on fops0 @ identifier fops0.fops; identifier write_f; @@ struct file_operations fops = { ... .write = write_f, ... }; @ has_open depends on fops0 @ identifier fops0.fops; identifier open_f; @@ struct file_operations fops = { ... .open = open_f, ... }; // use no_llseek if we call nonseekable_open //////////////////////////////////////////// @ nonseekable1 depends on !has_llseek && has_open @ identifier fops0.fops; identifier nso ~= "nonseekable_open"; @@ struct file_operations fops = { ... .open = nso, ... +.llseek = no_llseek, /* nonseekable */ }; @ nonseekable2 depends on !has_llseek @ identifier fops0.fops; identifier open.open_f; @@ struct file_operations fops = { ... .open = open_f, ... +.llseek = no_llseek, /* open uses nonseekable */ }; // use seq_lseek for sequential files ///////////////////////////////////// @ seq depends on !has_llseek @ identifier fops0.fops; identifier sr ~= "seq_read"; @@ struct file_operations fops = { ... .read = sr, ... +.llseek = seq_lseek, /* we have seq_read */ }; // use default_llseek if there is a readdir /////////////////////////////////////////// @ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier readdir_e; @@ // any other fop is used that changes pos struct file_operations fops = { ... .readdir = readdir_e, ... +.llseek = default_llseek, /* readdir is present */ }; // use default_llseek if at least one of read/write touches f_pos ///////////////////////////////////////////////////////////////// @ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read.read_f; @@ // read fops use offset struct file_operations fops = { ... .read = read_f, ... +.llseek = default_llseek, /* read accesses f_pos */ }; @ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier write.write_f; @@ // write fops use offset struct file_operations fops = { ... .write = write_f, ... + .llseek = default_llseek, /* write accesses f_pos */ }; // Use noop_llseek if neither read nor write accesses f_pos /////////////////////////////////////////////////////////// @ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read_no_fpos.read_f; identifier write_no_fpos.write_f; @@ // write fops use offset struct file_operations fops = { ... .write = write_f, .read = read_f, ... +.llseek = noop_llseek, /* read and write both use no f_pos */ }; @ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier write_no_fpos.write_f; @@ struct file_operations fops = { ... .write = write_f, ... +.llseek = noop_llseek, /* write uses no f_pos */ }; @ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read_no_fpos.read_f; @@ struct file_operations fops = { ... .read = read_f, ... +.llseek = noop_llseek, /* read uses no f_pos */ }; @ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; @@ struct file_operations fops = { ... +.llseek = noop_llseek, /* no read or write fn */ }; ===== End semantic patch ===== Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Julia Lawall <julia@diku.dk> Cc: Christoph Hellwig <hch@infradead.org>
2010-09-16block: remove BLKDEV_IFL_WAITChristoph Hellwig
All the blkdev_issue_* helpers can only sanely be used for synchronous caller. To issue cache flushes or barriers asynchronously the caller needs to set up a bio by itself with a completion callback to move the asynchronous state machine ahead. So drop the BLKDEV_IFL_WAIT flag that is always specified when calling blkdev_issue_* and also remove the now unused flags argument to blkdev_issue_flush and blkdev_issue_zeroout. For blkdev_issue_discard we need to keep it for the secure discard flag, which gains a more descriptive name and loses the bitops vs flag confusion. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-09-10btrfs: replace barriers with explicit flush / FUA usageChristoph Hellwig
Switch to the WRITE_FLUSH_FUA flag for log writes, remove the EOPNOTSUPP detection for barriers and stop setting the barrier flag for discards. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Chris Mason <chris.mason@oracle.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-10Merge branch 'for-2.6.36' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds
* 'for-2.6.36' of git://git.kernel.dk/linux-2.6-block: (149 commits) block: make sure that REQ_* types are seen even with CONFIG_BLOCK=n xen-blkfront: fix missing out label blkdev: fix blkdev_issue_zeroout return value block: update request stacking methods to support discards block: fix missing export of blk_types.h writeback: fix bad _bh spinlock nesting drbd: revert "delay probes", feature is being re-implemented differently drbd: Initialize all members of sync_conf to their defaults [Bugz 315] drbd: Disable delay probes for the upcomming release writeback: cleanup bdi_register writeback: add new tracepoints writeback: remove unnecessary init_timer call writeback: optimize periodic bdi thread wakeups writeback: prevent unnecessary bdi threads wakeups writeback: move bdi threads exiting logic to the forker thread writeback: restructure bdi forker loop a little writeback: move last_active to bdi writeback: do not remove bdi from bdi_list writeback: simplify bdi code a little writeback: do not lose wake-ups in bdi threads ... Fixed up pretty trivial conflicts in drivers/block/virtio_blk.c and drivers/scsi/scsi_error.c as per Jens.
2010-08-09btrfs: remove junk sb_dirt changeArtem Bityutskiy
BTRFS does not define a '->write_super()' method, so it should not mark its superblock as dirty. This looks like some left-over. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Acked-by: Chris Mason <chris.mason@oracle.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09Make ->drop_inode() just return whether inode needs to be droppedAl Viro
... and let iput_final() do the actual eviction or retention Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09convert btrfs to ->evict_inode()Al Viro
NB: do we want btrfs_wait_ordered_range() on eviction of inodes with positive i_nlink on subvolume with zero root_refs? If not, btrfs_evict_inode() can be simplified by unconditionally bailing out in case of i_nlink > 0 in the very beginning... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09simplify checks for I_CLEAR/I_FREEINGAl Viro
add I_CLEAR instead of replacing I_FREEING with it. I_CLEAR is equivalent to I_FREEING for almost all code looking at either; it's there to keep track of having called clear_inode() exactly once per inode lifetime, at some point after having set I_FREEING. I_CLEAR and I_FREEING never get set at the same time with the current code, so we can switch to setting i_flags to I_FREEING | I_CLEAR instead of I_CLEAR without loss of information. As the result of such change, checks become simpler and the amount of code that needs to know about I_CLEAR shrinks a lot. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09remove inode_setattrChristoph Hellwig
Replace inode_setattr with opencoded variants of it in all callers. This moves the remaining call to vmtruncate into the filesystem methods where it can be replaced with the proper truncate sequence. In a few cases it was obvious that we would never end up calling vmtruncate so it was left out in the opencoded variant: spufs: explicitly checks for ATTR_SIZE earlier btrfs,hugetlbfs,logfs,dlmfs: explicitly clears ATTR_SIZE earlier ufs: contains an opencoded simple_seattr + truncate that sets the filesize just above In addition to that ncpfs called inode_setattr with handcrafted iattrs, which allowed to trim down the opencoded variant. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-07block: unify flags for struct bio and struct requestChristoph Hellwig
Remove the current bio flags and reuse the request flags for the bio, too. This allows to more easily trace the type of I/O from the filesystem down to the block driver. There were two flags in the bio that were missing in the requests: BIO_RW_UNPLUG and BIO_RW_AHEAD. Also I've renamed two request flags that had a superflous RW in them. Note that the flags are in bio.h despite having the REQ_ name - as blkdev.h includes bio.h that is the only way to go for now. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-07-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstableLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: fix checks in BTRFS_IOC_CLONE_RANGE Btrfs: fix CLONE ioctl destination file size expansion to block boundary Btrfs: fix split_leaf double split corner case
2010-07-19Btrfs: fix checks in BTRFS_IOC_CLONE_RANGEDan Rosenberg
1. The BTRFS_IOC_CLONE and BTRFS_IOC_CLONE_RANGE ioctls should check whether the donor file is append-only before writing to it. 2. The BTRFS_IOC_CLONE_RANGE ioctl appears to have an integer overflow that allows a user to specify an out-of-bounds range to copy from the source file (if off + len wraps around). I haven't been able to successfully exploit this, but I'd imagine that a clever attacker could use this to read things he shouldn't. Even if it's not exploitable, it couldn't hurt to be safe. Signed-off-by: Dan Rosenberg <dan.j.rosenberg@gmail.com> cc: stable@kernel.org Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-07-19Btrfs: fix CLONE ioctl destination file size expansion to block boundarySage Weil
The CLONE and CLONE_RANGE ioctls round up the range of extents being cloned to the block size when the range to clone extends to the end of file (this is always the case with CLONE). It was then using that offset when extending the destination file's i_size. Fix this by not setting i_size beyond the originally requested ending offset. This bug was introduced by a22285a6 (2.6.35-rc1). Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-07-19Btrfs: fix split_leaf double split corner caseChris Mason
split_leaf was not properly balancing leaves when it was forced to split a leaf twice. This commit adds an extra push left and right before forcing the double split in hopes of getting the slot where we want to insert at either the start or end of the leaf. If the extra pushes do work, then we are able to avoid splitting twice and we keep the tree properly balanced. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-07-06writeback: remove writeback_inodes_wbcChristoph Hellwig
This was just an odd wrapper around writeback_inodes_wb. Removing this also allows to get rid of the bdi member of struct writeback_control which was rather out of place there. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>