summaryrefslogtreecommitdiffstats
path: root/fs
AgeCommit message (Collapse)Author
2020-08-22Merge branch 'work.epoll' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull epoll fixes from Al Viro: "Fix reference counting and clean up exit paths" * 'work.epoll' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: do_epoll_ctl(): clean the failure exits up a bit epoll: Keep a reference on files added to the check list
2020-08-22do_epoll_ctl(): clean the failure exits up a bitAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-08-22epoll: Keep a reference on files added to the check listMarc Zyngier
When adding a new fd to an epoll, and that this new fd is an epoll fd itself, we recursively scan the fds attached to it to detect cycles, and add non-epool files to a "check list" that gets subsequently parsed. However, this check list isn't completely safe when deletions can happen concurrently. To sidestep the issue, make sure that a struct file placed on the check list sees its f_count increased, ensuring that a concurrent deletion won't result in the file disapearing from under our feet. Cc: stable@vger.kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-08-21Merge tag 'io_uring-5.9-2020-08-21' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull io_uring fixes from Jens Axboe: - Make sure the head link cancelation includes async work - Get rid of kiocb_wait_page_queue_init(), makes no sense to have it as a separate function since you moved it into io_uring itself - io_import_iovec cleanups (Pavel, me) - Use system_unbound_wq for ring exit work, to avoid spawning tons of these if we have tons of rings exiting at the same time - Fix req->flags overflow flag manipulation (Pavel) * tag 'io_uring-5.9-2020-08-21' of git://git.kernel.dk/linux-block: io_uring: kill extra iovec=NULL in import_iovec() io_uring: comment on kfree(iovec) checks io_uring: fix racy req->flags modification io_uring: use system_unbound_wq for ring exit work io_uring: cleanup io_import_iovec() of pre-mapped request io_uring: get rid of kiocb_wait_page_queue_init() io_uring: find and cancel head link async work on files exit
2020-08-21Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc fixes from Andrew Morton: "11 patches. Subsystems affected by this: misc, mm/hugetlb, mm/vmalloc, mm/misc, romfs, relay, uprobes, squashfs, mm/cma, mm/pagealloc" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm, page_alloc: fix core hung in free_pcppages_bulk() mm: include CMA pages in lowmem_reserve at boot squashfs: avoid bio_alloc() failure with 1Mbyte blocks uprobes: __replace_page() avoid BUG in munlock_vma_page() kernel/relay.c: fix memleak on destroy relay channel romfs: fix uninitialized memory leak in romfs_dev_read() mm/rodata_test.c: fix missing function declaration mm/vunmap: add cond_resched() in vunmap_pmd_range khugepaged: adjust VM_BUG_ON_MM() in __khugepaged_enter() hugetlb_cgroup: convert comma to semicolon mailmap: add Andi Kleen
2020-08-21Merge tag 'ext4_for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "Improvements to ext4's block allocator performance for very large file systems, especially when the file system or files which are highly fragmented. There is a new mount option, prefetch_block_bitmaps which will pull in the block bitmaps and set up the in-memory buddy bitmaps when the file system is initially mounted. Beyond that, a lot of bug fixes and cleanups. In particular, a number of changes to make ext4 more robust in the face of write errors or file system corruptions" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (46 commits) ext4: limit the length of per-inode prealloc list ext4: reorganize if statement of ext4_mb_release_context() ext4: add mb_debug logging when there are lost chunks ext4: Fix comment typo "the the". jbd2: clean up checksum verification in do_one_pass() ext4: change to use fallthrough macro ext4: remove unused parameter of ext4_generic_delete_entry function mballoc: replace seq_printf with seq_puts ext4: optimize the implementation of ext4_mb_good_group() ext4: delete invalid comments near ext4_mb_check_limits() ext4: fix typos in ext4_mb_regular_allocator() comment ext4: fix checking of directory entry validity for inline directories fs: prevent BUG_ON in submit_bh_wbc() ext4: correctly restore system zone info when remount fails ext4: handle add_system_zone() failure in ext4_setup_system_zone() ext4: fold ext4_data_block_valid_rcu() into the caller ext4: check journal inode extents more carefully ext4: don't allow overlapping system zones ext4: handle error of ext4_setup_system_zone() on remount ext4: delete the invalid BUGON in ext4_mb_load_buddy_gfp() ...
2020-08-21afs: Fix NULL deref in afs_dynroot_depopulate()David Howells
If an error occurs during the construction of an afs superblock, it's possible that an error occurs after a superblock is created, but before we've created the root dentry. If the superblock has a dynamic root (ie. what's normally mounted on /afs), the afs_kill_super() will call afs_dynroot_depopulate() to unpin any created dentries - but this will oops if the root hasn't been created yet. Fix this by skipping that bit of code if there is no root dentry. This leads to an oops looking like: general protection fault, ... KASAN: null-ptr-deref in range [0x0000000000000068-0x000000000000006f] ... RIP: 0010:afs_dynroot_depopulate+0x25f/0x529 fs/afs/dynroot.c:385 ... Call Trace: afs_kill_super+0x13b/0x180 fs/afs/super.c:535 deactivate_locked_super+0x94/0x160 fs/super.c:335 afs_get_tree+0x1124/0x1460 fs/afs/super.c:598 vfs_get_tree+0x89/0x2f0 fs/super.c:1547 do_new_mount fs/namespace.c:2875 [inline] path_mount+0x1387/0x2070 fs/namespace.c:3192 do_mount fs/namespace.c:3205 [inline] __do_sys_mount fs/namespace.c:3413 [inline] __se_sys_mount fs/namespace.c:3390 [inline] __x64_sys_mount+0x27f/0x300 fs/namespace.c:3390 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 which is oopsing on this line: inode_lock(root->d_inode); presumably because sb->s_root was NULL. Fixes: 0da0b7fd73e4 ("afs: Display manually added cells in dynamic root mount") Reported-by: syzbot+c1eff8205244ae7e11a6@syzkaller.appspotmail.com Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-21squashfs: avoid bio_alloc() failure with 1Mbyte blocksPhillip Lougher
This is a regression introduced by the patch "migrate from ll_rw_block usage to BIO". Bio_alloc() is limited to 256 pages (1 Mbyte). This can cause a failure when reading 1 Mbyte block filesystems. The problem is a datablock can be fully (or almost uncompressed), requiring 256 pages, but, because blocks are not aligned to page boundaries, it may require 257 pages to read. Bio_kmalloc() can handle 1024 pages, and so use this for the edge condition. Fixes: 93e72b3c612a ("squashfs: migrate from ll_rw_block usage to BIO") Reported-by: Nicolas Prochazka <nicolas.prochazka@gmail.com> Reported-by: Tomoatsu Shimada <shimada@walbrix.com> Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Guenter Roeck <groeck@chromium.org> Cc: Philippe Liard <pliard@google.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Adrien Schildknecht <adrien+dev@schischi.me> Cc: Daniel Rosenberg <drosen@google.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200815035637.15319-1-phillip@squashfs.org.uk Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-21romfs: fix uninitialized memory leak in romfs_dev_read()Jann Horn
romfs has a superblock field that limits the size of the filesystem; data beyond that limit is never accessed. romfs_dev_read() fetches a caller-supplied number of bytes from the backing device. It returns 0 on success or an error code on failure; therefore, its API can't represent short reads, it's all-or-nothing. However, when romfs_dev_read() detects that the requested operation would cross the filesystem size limit, it currently silently truncates the requested number of bytes. This e.g. means that when the content of a file with size 0x1000 starts one byte before the filesystem size limit, ->readpage() will only fill a single byte of the supplied page while leaving the rest uninitialized, leaking that uninitialized memory to userspace. Fix it by returning an error code instead of truncating the read when the requested read operation would go beyond the end of the filesystem. Fixes: da4458bda237 ("NOMMU: Make it possible for RomFS to use MTD devices directly") Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: David Howells <dhowells@redhat.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200818013202.2246365-1-jannh@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-20afs: Fix key ref leak in afs_put_operation()David Howells
The afs_put_operation() function needs to put the reference to the key that's authenticating the operation. Fixes: e49c7b2f6de7 ("afs: Build an abstraction around an "operation" concept") Reported-by: Dave Botsch <botsch@cnf.cornell.edu> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-20io_uring: kill extra iovec=NULL in import_iovec()Pavel Begunkov
If io_import_iovec() returns an error, return iovec is undefined and must not be used, so don't set it to NULL when failing. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-20io_uring: comment on kfree(iovec) checksPavel Begunkov
kfree() handles NULL pointers well, but io_{read,write}() checks it because of performance reasons. Leave a comment there for those who are tempted to patch it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-20io_uring: fix racy req->flags modificationPavel Begunkov
Setting and clearing REQ_F_OVERFLOW in io_uring_cancel_files() and io_cqring_overflow_flush() are racy, because they might be called asynchronously. REQ_F_OVERFLOW flag in only needed for files cancellation, so if it can be guaranteed that requests _currently_ marked inflight can't be overflown, the problem will be solved with removing the flag altogether. That's how the patch works, it removes inflight status of a request in io_cqring_fill_event() whenever it should be thrown into CQ-overflow list. That's Ok to do, because no opcode specific handling can be done after io_cqring_fill_event(), the same assumption as with "struct io_completion" patches. And it already have a good place for such cleanups, which is io_clean_op(). A nice side effect of this is removing this inflight check from the hot path. note on synchronisation: now __io_cqring_fill_event() may be taking two spinlocks simultaneously, completion_lock and inflight_lock. It's fine, because we never do that in reverse order, and CQ-overflow of inflight requests shouldn't happen often. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-19io_uring: use system_unbound_wq for ring exit workJens Axboe
We currently use system_wq, which is unbounded in terms of number of workers. This means that if we're exiting tons of rings at the same time, then we'll briefly spawn tons of event kworkers just for a very short blocking time as the rings exit. Use system_unbound_wq instead, which has a sane cap on the concurrency level. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-19ext4: limit the length of per-inode prealloc listbrookxu
In the scenario of writing sparse files, the per-inode prealloc list may be very long, resulting in high overhead for ext4_mb_use_preallocated(). To circumvent this problem, we limit the maximum length of per-inode prealloc list to 512 and allow users to modify it. After patching, we observed that the sys ratio of cpu has dropped, and the system throughput has increased significantly. We created a process to write the sparse file, and the running time of the process on the fixed kernel was significantly reduced, as follows: Running time on unfixed kernel: [root@TENCENT64 ~]# time taskset 0x01 ./sparse /data1/sparce.dat real 0m2.051s user 0m0.008s sys 0m2.026s Running time on fixed kernel: [root@TENCENT64 ~]# time taskset 0x01 ./sparse /data1/sparce.dat real 0m0.471s user 0m0.004s sys 0m0.395s Signed-off-by: Chunguang Xu <brookxu@tencent.com> Link: https://lore.kernel.org/r/d7a98178-056b-6db5-6bce-4ead23f4a257@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-08-19ext4: reorganize if statement of ext4_mb_release_context()brookxu
Reorganize the if statement of ext4_mb_release_context(), make it easier to read. Signed-off-by: Chunguang Xu <brookxu@tencent.com> Link: https://lore.kernel.org/r/5439ac6f-db79-ad68-76c1-a4dda9aa0cc3@gmail.com Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-08-19ext4: add mb_debug logging when there are lost chunksbrookxu
Lost chunks are when some other process raced with the current thread to grab a particular block allocation. Add mb_debug log for developers who wants to see how often this is happening for a particular workload. Signed-off-by: Chunguang Xu <brookxu@tencent.com> Link: https://lore.kernel.org/r/0a165ac0-1912-aebd-8a0d-b42e7cd1aea1@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-08-19ext4: Fix comment typo "the the".kyoungho koo
I have found double typed comments "the the". So i modified it to one "the" Signed-off-by: kyoungho koo <rnrudgh@gmail.com> Link: https://lore.kernel.org/r/20200424171620.GA11943@koo-Z370-HD3 Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-08-19jbd2: clean up checksum verification in do_one_pass()Shijie Luo
Remove the unnecessary chksum_err and checksum_seen variables as well as some redundant code to make the function easier to understand. [ With changes suggested by jack@ and tytso@ ] Signed-off-by: Shijie Luo <luoshijie1@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20200819122955.33526-1-luoshijie1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-08-18io_uring: cleanup io_import_iovec() of pre-mapped requestJens Axboe
io_rw_prep_async() goes through a dance of clearing req->io, calling the iovec import, then re-setting req->io. Provide an internal helper that does the right thing without needing state tweaked to get there. This enables further cleanups in io_read, io_write, and io_resubmit_prep(), but that's left for another time. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-18ext4: change to use fallthrough macroShijie Luo
Change to use fallthrough macro in switch case. Signed-off-by: Shijie Luo <luoshijie1@huawei.com> Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/20200810114435.24182-1-luoshijie1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-08-18ext4: remove unused parameter of ext4_generic_delete_entry functionKyoungho Koo
The ext4_generic_delete_entry function does not use the parameter handle, so it can be removed. Signed-off-by: Kyoungho Koo <rnrudgh@gmail.com> Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/20200810080701.GA14160@koo-Z370-HD3 Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-08-18mballoc: replace seq_printf with seq_putsXu Wang
seq_puts is a lot cheaper than seq_printf, so use that to print literal strings. Signed-off-by: Xu Wang <vulab@iscas.ac.cn> Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/20200810022158.9167-1-vulab@iscas.ac.cn Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-08-18ext4: optimize the implementation of ext4_mb_good_group()brookxu
It might be better to adjust the code in two places: 1. Determine whether grp is currupt or not should be placed first. 2. (cr<=2 && free <ac->ac_g_ex.fe_len)should may belong to the crx strategy, and it may be more appropriate to put it in the subsequent switch statement block. For cr1, cr2, the conditions in switch potentially realize the above judgment. For cr0, we should add (free <ac->ac_g_ex.fe_len) judgment, and then delete (free / fragments) >= ac->ac_g_ex.fe_len), because cr0 returns true by default. Signed-off-by: Chunguang Xu <brookxu@tencent.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/e20b2d8f-1154-adb7-3831-a9e11ba842e9@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-08-18ext4: delete invalid comments near ext4_mb_check_limits()brookxu
These comments do not seem to be related to ext4_mb_check_limits(), it may be invalid. Signed-off-by: Chunguang Xu <brookxu@tencent.com> Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/c49faf0c-d5d5-9c51-6911-9e0ff57c6bfa@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-08-18ext4: fix typos in ext4_mb_regular_allocator() commentbrookxu
Fix typos in ext4_mb_regular_allocator() comment Signed-off-by: Chunguang Xu <brookxu@tencent.com> Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/d6514145-73b3-808b-ec5a-a8be27c51f9c@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-08-16io_uring: get rid of kiocb_wait_page_queue_init()Jens Axboe
The 5.9 merge moved this function io_uring, which means that we don't need to retain the generic nature of it. Clean up this part by removing redundant checks, and just inlining the small remainder in io_rw_should_retry(). No functional changes in this patch. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-16io_uring: find and cancel head link async work on files exitJens Axboe
Commit f254ac04c874 ("io_uring: enable lookup of links holding inflight files") only handled 2 out of the three head link cases we have, we also need to lookup and cancel work that is blocked in io-wq if that work has a link that's holding a reference to the files structure. Put the "cancel head links that hold this request pending" logic into io_attempt_cancel(), which will to through the motions of finding and canceling head links that hold the current inflight files stable request pending. Cc: stable@vger.kernel.org Reported-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-16Merge tag 'io_uring-5.9-2020-08-15' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull io_uring fixes from Jens Axboe: "A few differerent things in here. Seems like syzbot got some more io_uring bits wired up, and we got a handful of reports and the associated fixes are in here. General fixes too, and a lot of them marked for stable. Lastly, a bit of fallout from the async buffered reads, where we now more easily trigger short reads. Some applications don't really like that, so the io_read() code now handles short reads internally, and got a cleanup along the way so that it's now easier to read (and documented). We're now passing tests that failed before" * tag 'io_uring-5.9-2020-08-15' of git://git.kernel.dk/linux-block: io_uring: short circuit -EAGAIN for blocking read attempt io_uring: sanitize double poll handling io_uring: internally retry short reads io_uring: retain iov_iter state over io_read/io_write calls task_work: only grab task signal lock when needed io_uring: enable lookup of links holding inflight files io_uring: fail poll arm on queue proc failure io_uring: hold 'ctx' reference around task_work queue + execute fs: RWF_NOWAIT should imply IOCB_NOIO io_uring: defer file table grabbing request cleanup for locked requests io_uring: add missing REQ_F_COMP_LOCKED for nested requests io_uring: fix recursive completion locking on oveflow flush io_uring: use TWA_SIGNAL for task_work uncondtionally io_uring: account locked memory before potential error case io_uring: set ctx sq/cq entry count earlier io_uring: Fix NULL pointer dereference in loop_rw_iter() io_uring: add comments on how the async buffered read retry works io_uring: io_async_buf_func() need not test page bit
2020-08-15io_uring: short circuit -EAGAIN for blocking read attemptJens Axboe
One case was missed in the short IO retry handling, and that's hitting -EAGAIN on a blocking attempt read (eg from io-wq context). This is a problem on sockets that are marked as non-blocking when created, they don't carry any REQ_F_NOWAIT information to help us terminate them instead of perpetually retrying. Fixes: 227c0c9673d8 ("io_uring: internally retry short reads") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-15io_uring: sanitize double poll handlingJens Axboe
There's a bit of confusion on the matching pairs of poll vs double poll, depending on if the request is a pure poll (IORING_OP_POLL_ADD) or poll driven retry. Add io_poll_get_double() that returns the double poll waitqueue, if any, and io_poll_get_single() that returns the original poll waitqueue. With that, remove the argument to io_poll_remove_double(). Finally ensure that wait->private is cleared once the double poll handler has run, so that remove knows it's already been seen. Cc: stable@vger.kernel.org # v5.8 Reported-by: syzbot+7f617d4a9369028b8a2c@syzkaller.appspotmail.com Fixes: 18bceab101ad ("io_uring: allow POLL_ADD with double poll_wait() users") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-15Merge tag '9p-for-5.9-rc1' of git://github.com/martinetd/linuxLinus Torvalds
Pull 9p updates from Dominique Martinet: - some code cleanup - a couple of static analysis fixes - setattr: try to pick a fid associated with the file rather than the dentry, which might sometimes matter * tag '9p-for-5.9-rc1' of git://github.com/martinetd/linux: 9p: Remove unneeded cast from memory allocation 9p: remove unused code in 9p net/9p: Fix sparse endian warning in trans_fd.c 9p: Fix memory leak in v9fs_mount 9p: retrieve fid from file when file instance exist.
2020-08-15Merge tag '5.9-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull cifs fixes from Steve French: "Three small cifs/smb3 fixes, one for stable fixing mkdir path with the 'idsfromsid' mount option" * tag '5.9-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6: SMB3: Fix mkdir when idsfromsid configured on mount cifs: Convert to use the fallthrough macro cifs: Fix an error pointer dereference in cifs_mount()
2020-08-15Merge tag 'nfs-for-5.9-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client updates from Trond Myklebust: "Stable fixes: - pNFS: Don't return layout segments that are being used for I/O - pNFS: Don't move layout segments off the active list when being used for I/O Features: - NFS: Add support for user xattrs through the NFSv4.2 protocol - NFS: Allow applications to speed up readdir+statx() using AT_STATX_DONT_SYNC - NFSv4.0 allow nconnect for v4.0 Bugfixes and cleanups: - nfs: ensure correct writeback errors are returned on close() - nfs: nfs_file_write() should check for writeback errors - nfs: Fix getxattr kernel panic and memory overflow - NFS: Fix the pNFS/flexfiles mirrored read failover code - SUNRPC: dont update timeout value on connection reset - freezer: Add unsafe versions of freezable_schedule_timeout_interruptible for NFS - sunrpc: destroy rpc_inode_cachep after unregister_filesystem" * tag 'nfs-for-5.9-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (32 commits) NFS: Fix flexfiles read failover fs: nfs: delete repeated words in comments rpc_pipefs: convert comma to semicolon nfs: Fix getxattr kernel panic and memory overflow NFS: Don't return layout segments that are in use NFS: Don't move layouts to plh_return_segs list while in use NFS: Add layout segment info to pnfs read/write/commit tracepoints NFS: Add tracepoints for layouterror and layoutstats. NFS: Report the stateid + status in trace_nfs4_layoutreturn_on_close() SUNRPC dont update timeout value on connection reset nfs: nfs_file_write() should check for writeback errors nfs: ensure correct writeback errors are returned on close() NFSv4.2: xattr cache: get rid of cache discard work queue NFS: remove redundant initialization of variable result NFSv4.0 allow nconnect for v4.0 freezer: Add unsafe versions of freezable_schedule_timeout_interruptible for NFS sunrpc: destroy rpc_inode_cachep after unregister_filesystem NFSv4.2: add client side xattr caching. NFSv4.2: hook in the user extended attribute handlers NFSv4.2: add the extended attribute proc functions. ...
2020-08-14fs: autofs: delete repeated words in commentsRandy Dunlap
Drop duplicated words {the, at} in comments. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Ian Kent <raven@themaw.net> Link: http://lkml.kernel.org/r/20200811021817.24982-1-rdunlap@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-14exec: restore EACCES of S_ISDIR execve()Kees Cook
Patch series "Fix S_ISDIR execve() errno". Fix an errno change for execve() of directories, noticed by Marc Zyngier. Along with the fix, include a regression test to avoid seeing this return in the future. This patch (of 2): The return code for attempting to execute a directory has always been EACCES. Adjust the S_ISDIR exec test to reflect the old errno instead of the general EISDIR for other kinds of "open" attempts on directories. Fixes: 633fb6ac3980 ("exec: move S_ISREG() check earlier") Reported-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Greg Kroah-Hartman <gregkh@android.com> Reviewed-by: Greg Kroah-Hartman <gregkh@google.com> Link: http://lkml.kernel.org/r/20200813231723.2725102-2-keescook@chromium.org Link: https://lore.kernel.org/lkml/20200813151305.6191993b@why Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-13SMB3: Fix mkdir when idsfromsid configured on mountSteve French
mkdir uses a compounded create operation which was not setting the security descriptor on create of a directory. Fix so mkdir now sets the mode and owner info properly when idsfromsid and modefromsid are configured on the mount. Signed-off-by: Steve French <stfrench@microsoft.com> CC: Stable <stable@vger.kernel.org> # v5.8 Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2020-08-13io_uring: internally retry short readsJens Axboe
We've had a few application cases of not handling short reads properly, and it is understandable as short reads aren't really expected if the application isn't doing non-blocking IO. Now that we retain the iov_iter over retries, we can implement internal retry pretty trivially. This ensures that we don't return a short read, even for buffered reads on page cache conflicts. Cleanup the deep nesting and hard to read nature of io_read() as well, it's much more straight forward now to read and understand. Added a few comments explaining the logic as well. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-13io_uring: retain iov_iter state over io_read/io_write callsJens Axboe
Instead of maintaining (and setting/remembering) iov_iter size and segment counts, just put the iov_iter in the async part of the IO structure. This is mostly a preparation patch for doing appropriate internal retries for short reads, but it also cleans up the state handling nicely and simplifies it quite a bit. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-13Merge tag 'for-5.9-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull more btrfs updates from David Sterba: "One minor update, the rest are fixes that have arrived a bit late for the first batch. There are also some recent fixes for bugs that were discovered during the merge window and pop up during testing. User visible change: - show correct subvolume path in /proc/mounts for bind mounts Fixes: - fix compression messages when remounting with different level or compression algorithm - tree-log: fix some memory leaks on error handling paths - restore I_VERSION on remount - fix return values and error code mixups - fix umount crash with quotas enabled when removing sysfs files - fix trim range on a shrunk device" * tag 'for-5.9-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: trim: fix underflow in trim length to prevent access beyond device boundary btrfs: fix return value mixup in btrfs_get_extent btrfs: sysfs: fix NULL pointer dereference at btrfs_sysfs_del_qgroups() btrfs: check correct variable after allocation in btrfs_backref_iter_alloc btrfs: make sure SB_I_VERSION doesn't get unset by remount btrfs: fix memory leaks after failure to lookup checksums during inode logging btrfs: don't show full path of bind mounts in subvol= btrfs: fix messages after changing compression level by remount btrfs: only search for left_info if there is no right_info in try_merge_free_space btrfs: inode: fix NULL pointer dereference if inode doesn't need compression
2020-08-13Merge tag 'xfs-5.9-merge-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull xfs fixes from Darrick Wong: "Two small fixes that have come in during the past week: - Fix duplicated words in comments - Fix an ubsan complaint about null pointer arithmetic" * tag 'xfs-5.9-merge-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: Fix UBSAN null-ptr-deref in xfs_sysfs_init xfs: delete duplicated words + other fixes
2020-08-13Merge tag 'exfat-for-5.9-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat Pull exfat updates from Namjae Jeon: - don't clear MediaFailure and VolumeDirty bit in volume flags if these were already set before mounting - write multiple dirty buffers at once in sync mode - remove unneeded EXFAT_SB_DIRTY bit set * tag 'exfat-for-5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat: exfat: retain 'VolumeFlags' properly exfat: optimize exfat_zeroed_cluster() exfat: add error check when updating dir-entries exfat: write multiple sectors at once exfat: remove EXFAT_SB_DIRTY flag
2020-08-12io_uring: enable lookup of links holding inflight filesJens Axboe
When a process exits, we cancel whatever requests it has pending that are referencing the file table. However, if a link is holding a reference, then we cannot find it by simply looking at the inflight list. Enable checking of the poll and timeout list to find the link, and cancel it appropriately. Cc: stable@vger.kernel.org Reported-by: Josef <josef.grieb@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-12Merge tag 'ceph-for-5.9-rc1' of git://github.com/ceph/ceph-clientLinus Torvalds
Pull ceph updates from Ilya Dryomov: "Xiubo has completed his work on filesystem client metrics, they are sent to all available MDSes once per second now. Other than that, we have a lot of fixes and cleanups all around the filesystem, including a tweak to cut down on MDS request resends in multi-MDS setups from Yanhu and fixups for SELinux symlink labeling and MClientSession message decoding from Jeff" * tag 'ceph-for-5.9-rc1' of git://github.com/ceph/ceph-client: (22 commits) ceph: handle zero-length feature mask in session messages ceph: use frag's MDS in either mode ceph: move sb->wb_pagevec_pool to be a global mempool ceph: set sec_context xattr on symlink creation ceph: remove redundant initialization of variable mds ceph: fix use-after-free for fsc->mdsc ceph: remove unused variables in ceph_mdsmap_decode() ceph: delete repeated words in fs/ceph/ ceph: send client provided metric flags in client metadata ceph: periodically send perf metrics to MDSes ceph: check the sesion state and return false in case it is closed libceph: replace HTTP links with HTTPS ones ceph: remove unnecessary cast in kfree() libceph: just have osd_req_op_init() return a pointer ceph: do not access the kiocb after aio requests ceph: clean up and optimize ceph_check_delayed_caps() ceph: fix potential mdsc use-after-free crash ceph: switch to WARN_ON_ONCE in encode_supported_features() ceph: add global total_caps to count the mdsc's total caps number ceph: add check_session_state() helper and make it global ...
2020-08-12Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge more updates from Andrew Morton: - most of the rest of MM (memcg, hugetlb, vmscan, proc, compaction, mempolicy, oom-kill, hugetlbfs, migration, thp, cma, util, memory-hotplug, cleanups, uaccess, migration, gup, pagemap), - various other subsystems (alpha, misc, sparse, bitmap, lib, bitops, checkpatch, autofs, minix, nilfs, ufs, fat, signals, kmod, coredump, exec, kdump, rapidio, panic, kcov, kgdb, ipc). * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (164 commits) mm/gup: remove task_struct pointer for all gup code mm: clean up the last pieces of page fault accountings mm/xtensa: use general page fault accounting mm/x86: use general page fault accounting mm/sparc64: use general page fault accounting mm/sparc32: use general page fault accounting mm/sh: use general page fault accounting mm/s390: use general page fault accounting mm/riscv: use general page fault accounting mm/powerpc: use general page fault accounting mm/parisc: use general page fault accounting mm/openrisc: use general page fault accounting mm/nios2: use general page fault accounting mm/nds32: use general page fault accounting mm/mips: use general page fault accounting mm/microblaze: use general page fault accounting mm/m68k: use general page fault accounting mm/ia64: use general page fault accounting mm/hexagon: use general page fault accounting mm/csky: use general page fault accounting ...
2020-08-12mm/gup: remove task_struct pointer for all gup codePeter Xu
After the cleanup of page fault accounting, gup does not need to pass task_struct around any more. Remove that parameter in the whole gup stack. Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Link: http://lkml.kernel.org/r/20200707225021.200906-26-peterx@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-12exec: move path_noexec() check earlierKees Cook
The path_noexec() check, like the regular file check, was happening too late, letting LSMs see impossible execve()s. Check it earlier as well in may_open() and collect the redundant fs/exec.c path_noexec() test under the same robustness comment as the S_ISREG() check. My notes on the call path, and related arguments, checks, etc: do_open_execat() struct open_flags open_exec_flags = { .open_flag = O_LARGEFILE | O_RDONLY | __FMODE_EXEC, .acc_mode = MAY_EXEC, ... do_filp_open(dfd, filename, open_flags) path_openat(nameidata, open_flags, flags) file = alloc_empty_file(open_flags, current_cred()); do_open(nameidata, file, open_flags) may_open(path, acc_mode, open_flag) /* new location of MAY_EXEC vs path_noexec() test */ inode_permission(inode, MAY_OPEN | acc_mode) security_inode_permission(inode, acc_mode) vfs_open(path, file) do_dentry_open(file, path->dentry->d_inode, open) security_file_open(f) open() /* old location of path_noexec() test */ Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Aleksa Sarai <cyphar@cyphar.com> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Link: http://lkml.kernel.org/r/20200605160013.3954297-4-keescook@chromium.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-12exec: move S_ISREG() check earlierKees Cook
The execve(2)/uselib(2) syscalls have always rejected non-regular files. Recently, it was noticed that a deadlock was introduced when trying to execute pipes, as the S_ISREG() test was happening too late. This was fixed in commit 73601ea5b7b1 ("fs/open.c: allow opening only regular files during execve()"), but it was added after inode_permission() had already run, which meant LSMs could see bogus attempts to execute non-regular files. Move the test into the other inode type checks (which already look for other pathological conditions[1]). Since there is no need to use FMODE_EXEC while we still have access to "acc_mode", also switch the test to MAY_EXEC. Also include a comment with the redundant S_ISREG() checks at the end of execve(2)/uselib(2) to note that they are present to avoid any mistakes. My notes on the call path, and related arguments, checks, etc: do_open_execat() struct open_flags open_exec_flags = { .open_flag = O_LARGEFILE | O_RDONLY | __FMODE_EXEC, .acc_mode = MAY_EXEC, ... do_filp_open(dfd, filename, open_flags) path_openat(nameidata, open_flags, flags) file = alloc_empty_file(open_flags, current_cred()); do_open(nameidata, file, open_flags) may_open(path, acc_mode, open_flag) /* new location of MAY_EXEC vs S_ISREG() test */ inode_permission(inode, MAY_OPEN | acc_mode) security_inode_permission(inode, acc_mode) vfs_open(path, file) do_dentry_open(file, path->dentry->d_inode, open) /* old location of FMODE_EXEC vs S_ISREG() test */ security_file_open(f) open() [1] https://lore.kernel.org/lkml/202006041910.9EF0C602@keescook/ Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Aleksa Sarai <cyphar@cyphar.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Link: http://lkml.kernel.org/r/20200605160013.3954297-3-keescook@chromium.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-12exec: change uselib(2) IS_SREG() failure to EACCESKees Cook
Patch series "Relocate execve() sanity checks", v2. While looking at the code paths for the proposed O_MAYEXEC flag, I saw some things that looked like they should be fixed up. exec: Change uselib(2) IS_SREG() failure to EACCES This just regularizes the return code on uselib(2). exec: Move S_ISREG() check earlier This moves the S_ISREG() check even earlier than it was already. exec: Move path_noexec() check earlier This adds the path_noexec() check to the same place as the S_ISREG() check. This patch (of 3): Change uselib(2)' S_ISREG() error return to EACCES instead of EINVAL so the behavior matches execve(2), and the seemingly documented value. The "not a regular file" failure mode of execve(2) is explicitly documented[1], but it is not mentioned in uselib(2)[2] which does, however, say that open(2) and mmap(2) errors may apply. The documentation for open(2) does not include a "not a regular file" error[3], but mmap(2) does[4], and it is EACCES. [1] http://man7.org/linux/man-pages/man2/execve.2.html#ERRORS [2] http://man7.org/linux/man-pages/man2/uselib.2.html#ERRORS [3] http://man7.org/linux/man-pages/man2/open.2.html#ERRORS [4] http://man7.org/linux/man-pages/man2/mmap.2.html#ERRORS Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Cc: Aleksa Sarai <cyphar@cyphar.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Link: http://lkml.kernel.org/r/20200605160013.3954297-1-keescook@chromium.org Link: http://lkml.kernel.org/r/20200605160013.3954297-2-keescook@chromium.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-12coredump: add %f for executable filenameLepton Wu
The document reads "%e" should be "executable filename" while actually it could be changed by things like pr_ctl PR_SET_NAME. People who uses "%e" in core_pattern get surprised when they find out they get thread name instead of executable filename. This is either a bug of document or a bug of code. Since the behavior of "%e" is there for long time, it could bring another surprise for users if we "fix" the code. So we just "fix" the document. And more, for users who really need the "executable filename" in core_pattern, we introduce a new "%f" for the real executable filename. We already have "%E" for executable path in kernel, so just reuse most of its code for the new added "%f" format. Signed-off-by: Lepton Wu <ytht.net@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20200701031432.2978761-1-ytht.net@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>