summaryrefslogtreecommitdiffstats
path: root/fs/fuse/file.c
AgeCommit message (Collapse)Author
2020-06-09Merge tag 'fuse-update-5.8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse updates from Miklos Szeredi: - Fix a rare deadlock in virtiofs - Fix st_blocks in writeback cache mode - Fix wrong checks in splice move causing spurious warnings - Fix a race between a GETATTR request and a FUSE_NOTIFY_INVAL_INODE notification - Use rb-tree instead of linear search for pages currently under writeout by userspace - Fix copy_file_range() inconsistencies * tag 'fuse-update-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: copy_file_range should truncate cache fuse: fix copy_file_range cache issues fuse: optimize writepages search fuse: update attr_version counter on fuse_notify_inval_inode() fuse: don't check refcount after stealing page fuse: fix weird page warning fuse: use dump_page virtiofs: do not use fuse_fill_super_common() for device installation fuse: always allow query of st_dev fuse: always flush dirty data on close(2) fuse: invalidate inode attr in writeback cache mode fuse: Update stale comment in queue_interrupt() fuse: BUG_ON correction in fuse_dev_splice_write() virtiofs: Add mount option and atime behavior to the doc virtiofs: schedule blocking async replies in separate worker
2020-06-02fuse: convert from readpages to readaheadMatthew Wilcox (Oracle)
Implement the new readahead operation in fuse by using __readahead_batch() to fill the array of pages in fuse_args_pages directly. This lets us inline fuse_readpages_fill() into fuse_readahead(). [willy@infradead.org: build fix] Link: http://lkml.kernel.org/r/20200415025938.GB5820@bombadil.infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Acked-by: Miklos Szeredi <mszeredi@redhat.com> Cc: Chao Yu <yuchao0@huawei.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Eric Biggers <ebiggers@google.com> Cc: Gao Xiang <gaoxiang25@huawei.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: http://lkml.kernel.org/r/20200414150233.24495-25-willy@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-20fuse: copy_file_range should truncate cacheMiklos Szeredi
After the copy operation completes the cache is not up-to-date. Truncate all pages in the interval that has successfully been copied. Truncating completely copied dirty pages is okay, since the data has been overwritten anyway. Truncating partially copied dirty pages is not okay; add a comment for now. Fixes: 88bc7d5097a1 ("fuse: add support for copy_file_range()") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-05-20fuse: fix copy_file_range cache issuesMiklos Szeredi
a) Dirty cache needs to be written back not just in the writeback_cache case, since the dirty pages may come from memory maps. b) The fuse_writeback_range() helper takes an inclusive interval, so the end position needs to be pos+len-1 instead of pos+len. Fixes: 88bc7d5097a1 ("fuse: add support for copy_file_range()") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-05-19fuse: optimize writepages searchMaxim Patlasov
Re-work fi->writepages, replacing list with rb-tree. This improves performance because kernel fuse iterates through fi->writepages for each writeback page and typical number of entries is about 800 (for 100MB of fuse writeback). Before patch: 10240+0 records in 10240+0 records out 10737418240 bytes (11 GB) copied, 41.3473 s, 260 MB/s 2 1 0 57445400 40416 6323676 0 0 33 374743 8633 19210 1 8 88 3 0 29.86% [kernel] [k] _raw_spin_lock 26.62% [fuse] [k] fuse_page_is_writeback After patch: 10240+0 records in 10240+0 records out 10737418240 bytes (11 GB) copied, 21.4954 s, 500 MB/s 2 9 0 53676040 31744 10265984 0 0 64 854790 10956 48387 1 6 88 6 0 23.55% [kernel] [k] copy_user_enhanced_fast_string 9.87% [kernel] [k] __memcpy 3.10% [kernel] [k] _raw_spin_lock Signed-off-by: Maxim Patlasov <mpatlasov@virtuozzo.com> Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-05-19fuse: always flush dirty data on close(2)Miklos Szeredi
We want cached data to synced with the userspace filesystem on close(), for example to allow getting correct st_blocks value. Do this regardless of whether the userspace filesystem implements a FLUSH method or not. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-05-19fuse: invalidate inode attr in writeback cache modeEryu Guan
Under writeback mode, inode->i_blocks is not updated, making utils du read st.blocks as 0. For example, when using virtiofs (cache=always & nondax mode) with writeback_cache enabled, writing a new file and check its disk usage with du, du reports 0 usage. # uname -r 5.6.0-rc6+ # mount -t virtiofs virtiofs /mnt/virtiofs # rm -f /mnt/virtiofs/testfile # create new file and do extend write # xfs_io -fc "pwrite 0 4k" /mnt/virtiofs/testfile wrote 4096/4096 bytes at offset 0 4 KiB, 1 ops; 0.0001 sec (28.103 MiB/sec and 7194.2446 ops/sec) # du -k /mnt/virtiofs/testfile 0 <==== disk usage is 0 # stat -c %s,%b /mnt/virtiofs/testfile 4096,0 <==== i_size is correct, but st_blocks is 0 Fix it by invalidating attr in fuse_flush(), so we get up-to-date attr from server on next getattr. Signed-off-by: Eryu Guan <eguan@linux.alibaba.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-04-20virtiofs: schedule blocking async replies in separate workerVivek Goyal
In virtiofs (unlike in regular fuse) processing of async replies is serialized. This can result in a deadlock in rare corner cases when there's a circular dependency between the completion of two or more async replies. Such a deadlock can be reproduced with xfstests:generic/503 if TEST_DIR == SCRATCH_MNT (which is a misconfiguration): - Process A is waiting for page lock in worker thread context and blocked (virtio_fs_requests_done_work()). - Process B is holding page lock and waiting for pending writes to finish (fuse_wait_on_page_writeback()). - Write requests are waiting in virtqueue and can't complete because worker thread is blocked on page lock (process A). Fix this by creating a unique work_struct for each async reply that can block (O_DIRECT read). Fixes: a62a8ef9d97d ("virtio-fs: add virtiofs filesystem") Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-02-06fuse: use true,false for bool variablezhengbin
Fixes coccicheck warning: fs/fuse/readdir.c:335:1-19: WARNING: Assignment of 0/1 to bool variable fs/fuse/file.c:1398:2-19: WARNING: Assignment of 0/1 to bool variable fs/fuse/file.c:1400:2-20: WARNING: Assignment of 0/1 to bool variable fs/fuse/cuse.c:454:1-20: WARNING: Assignment of 0/1 to bool variable fs/fuse/cuse.c:455:1-19: WARNING: Assignment of 0/1 to bool variable fs/fuse/inode.c:497:2-17: WARNING: Assignment of 0/1 to bool variable fs/fuse/inode.c:504:2-23: WARNING: Assignment of 0/1 to bool variable fs/fuse/inode.c:511:2-22: WARNING: Assignment of 0/1 to bool variable fs/fuse/inode.c:518:2-23: WARNING: Assignment of 0/1 to bool variable fs/fuse/inode.c:522:2-26: WARNING: Assignment of 0/1 to bool variable fs/fuse/inode.c:526:2-18: WARNING: Assignment of 0/1 to bool variable fs/fuse/inode.c:1000:1-20: WARNING: Assignment of 0/1 to bool variable Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: zhengbin <zhengbin13@huawei.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-02-06fuse: don't overflow LLONG_MAX with end offsetMiklos Szeredi
Handle the special case of fuse_readpages() wanting to read the last page of a hugest file possible and overflowing the end offset in the process. This is basically to unbreak xfstests:generic/525 and prevent filesystems from doing bad things with an overflowing offset. Reported-by: Xiao Yang <ice_yangxiao@163.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-02-06fix up iter on short count in fuse_direct_io()Miklos Szeredi
fuse_direct_io() can end up advancing the iterator by more than the amount of data read or written. This case is handled by the generic code if going through ->direct_IO(), but not in the FOPEN_DIRECT_IO case. Fix by reverting the extra bytes from the iterator in case of error or a short count. To test: install lxcfs, then the following testcase int fd = open("/var/lib/lxcfs/proc/uptime", O_RDONLY); sendfile(1, fd, NULL, 16777216); sendfile(1, fd, NULL, 16777216); will spew WARN_ON() in iov_iter_pipe(). Reported-by: Peter Geis <pgwipeout@gmail.com> Reported-by: Al Viro <viro@zeniv.linux.org.uk> Fixes: 3c3db095b68c ("fuse: use iov_iter based generic splice helpers") Cc: <stable@vger.kernel.org> # v5.1 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-01-16fuse: fix fuse_send_readpages() in the syncronous read caseMiklos Szeredi
Buffered read in fuse normally goes via: -> generic_file_buffered_read() -> fuse_readpages() -> fuse_send_readpages() ->fuse_simple_request() [called since v5.4] In the case of a read request, fuse_simple_request() will return a non-negative bytecount on success or a negative error value. A positive bytecount was taken to be an error and the PG_error flag set on the page. This resulted in generic_file_buffered_read() falling back to ->readpage(), which would repeat the read request and succeed. Because of the repeated read succeeding the bug was not detected with regression tests or other use cases. The FTP module in GVFS however fails the second read due to the non-seekable nature of FTP downloads. Fix by checking and ignoring positive return value from fuse_simple_request(). Reported-by: Ondrej Holy <oholy@redhat.com> Link: https://gitlab.gnome.org/GNOME/gvfs/issues/441 Fixes: 134831e36bbd ("fuse: convert readpages to simple api") Cc: <stable@vger.kernel.org> # v5.4 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-11-27fuse: fix leak of fuse_io_privMiklos Szeredi
exit_aio() is sometimes stuck in wait_for_completion() after aio is issued with direct IO and the task receives a signal. The reason is failure to call ->ki_complete() due to a leaked reference to fuse_io_priv. This happens in fuse_async_req_send() if fuse_simple_background() returns an error (e.g. -EINTR). In this case the error value is propagated via io->err, so return success to not confuse callers. This issue is tracked as a virtio-fs issue: https://gitlab.com/virtio-fs/qemu/issues/14 Reported-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> Fixes: 45ac96ed7c36 ("fuse: convert direct_io to simple api") Cc: <stable@vger.kernel.org> # v5.4 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-11-12fuse: verify write returnMiklos Szeredi
Make sure filesystem is not returning a bogus number of bytes written. Fixes: ea9b9907b82a ("fuse: implement perform_write") Cc: <stable@vger.kernel.org> # v2.6.26 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-10-23fuse: redundant get_fuse_inode() calls in fuse_writepages_fill()Vasily Averin
Currently fuse_writepages_fill() calls get_fuse_inode() few times with the same argument. Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-10-23fuse: truncate pending writes on O_TRUNCMiklos Szeredi
Make sure cached writes are not reordered around open(..., O_TRUNC), with the obvious wrong results. Fixes: 4d99ff8f12eb ("fuse: Turn writeback cache on") Cc: <stable@vger.kernel.org> # v3.15+ Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-24fuse: kmemcg account fs dataKhazhismel Kumykov
account per-file, dentry, and inode data blockdev/superblock and temporary per-request data was left alone, as this usually isn't accounted Reviewed-by: Shakeel Butt <shakeelb@google.com> Signed-off-by: Khazhismel Kumykov <khazhy@google.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-24fuse: fix missing unlock_page in fuse_writepage()Vasily Averin
unlock_page() was missing in case of an already in-flight write against the same page. Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Fixes: ff17be086477 ("fuse: writepage: skip already in flight") Cc: <stable@vger.kernel.org> # v3.13 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-10fuse: simplify request allocationMiklos Szeredi
Page arrays are not allocated together with the request anymore. Get rid of the dead code Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-10fuse: convert release to simple apiMiklos Szeredi
Since we cannot reserve the request structure up-front, make sure that the request allocation doesn't fail using __GFP_NOFAIL. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-10fuse: convert writepages to simple apiMiklos Szeredi
Derive fuse_writepage_args from fuse_io_args. Sending the request is tricky since it was done with fi->lock held, hence we must either use atomic allocation or release the lock. Both are possible so try atomic first and if it fails, release the lock and do the regular allocation with GFP_NOFS and __GFP_NOFAIL. Both flags are necessary for correct operation. Move the page realloc function from dev.c to file.c and convert to using fuse_writepage_args. The last caller of fuse_write_fill() is gone, so get rid of it. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-10fuse: convert readdir to simple apiMiklos Szeredi
The old fuse_read_fill() helper can be deleted, now that the last user is gone. The fuse_io_args struct is moved to fuse_i.h so it can be shared between readdir/read code. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-10fuse: convert readpages to simple apiMiklos Szeredi
Need to extend fuse_io_args with 'attr_ver' and 'ff' members, that take the functionality of the same named members in fuse_req. fuse_short_read() can now take struct fuse_args_pages. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-10fuse: convert direct_io to simple apiMiklos Szeredi
Change of semantics in fuse_async_req_send/fuse_send_(read|write): these can now return error, in which case the 'end' callback isn't called, so the fuse_io_args object needs to be freed. Added verification that the return value is sane (less than or equal to the requested read/write size). Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-10fuse: convert sync write to simple apiMiklos Szeredi
Extract a fuse_write_flags() helper that converts ki_flags relevant write to open flags. The other parts of fuse_send_write() aren't used in the fuse_perform_write() case. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-10fuse: covert readpage to simple apiMiklos Szeredi
Derive fuse_io_args from struct fuse_args_pages. This will be used for both synchronous and asynchronous read/write requests. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-10fuse: fuse_short_read(): don't take fuse_req as argumentMiklos Szeredi
This will allow the use of this function when converting to the simple api (which doesn't use fuse_req). Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-10fuse: convert ioctl to simple apiMiklos Szeredi
fuse_simple_request() is converted to return length of last (instead of single) out arg, since FUSE_IOCTL_OUT has two out args, the second of which is variable length. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-10fuse: move page allocMiklos Szeredi
fuse_req_pages_alloc() is moved to file.c, since its internal use by the device code will eventually be removed. Rename to fuse_pages_alloc() to signify that it's not only usable for fuse_req page array. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-10fuse: convert destroy to simple apiMiklos Szeredi
We can use the "force" flag to make sure the DESTROY request is always sent to userspace. So no need to keep it allocated during the lifetime of the filesystem. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-10fuse: convert flush to simple apiMiklos Szeredi
Add 'force' to fuse_args and use fuse_get_req_nofail_nopages() to allocate the request in that case. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-10fuse: simplify 'nofail' requestMiklos Szeredi
Instead of complex games with a reserved request, just use __GFP_NOFAIL. Both calers (flush, readdir) guarantee that connection was already initialized, so no need to wait for fc->initialized. Also remove unneeded clearing of FR_BACKGROUND flag. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-10fuse: flatten 'struct fuse_args'Miklos Szeredi
...to make future expansion simpler. The hiearachical structure is a historical thing that does not serve any practical purpose. The generated code is excatly the same before and after the patch. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-02fuse: cleanup fuse_wait_on_page_writebackMaxim Patlasov
fuse_wait_on_page_writeback() always returns zero and nobody cares. Let's make it void. Signed-off-by: Maxim Patlasov <mpatlasov@virtuozzo.com> Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-06-09fuse: copy_file_range needs to strip setuid bits and update timestampsAmir Goldstein
Like ->write_iter(), we update mtime and strip setuid of dst file before copy and like ->read_iter(), we update atime of src file after copy. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Acked-by: Miklos Szeredi <miklos@szeredi.hu> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-09vfs: allow copy_file_range to copy across devicesAmir Goldstein
We want to enable cross-filesystem copy_file_range functionality where possible, so push the "same superblock only" checks down to the individual filesystem callouts so they can make their own decisions about cross-superblock copy offload and fallack to generic_copy_file_range() for cross-superblock copy. [Amir] We do not call ->remap_file_range() in case the files are not on the same sb and do not call ->copy_file_range() in case the files do not belong to the same filesystem driver. This changes behavior of the copy_file_range(2) syscall, which will now allow cross filesystem in-kernel copy. CIFS already supports cross-superblock copy, between two shares to the same server. This functionality will now be available via the copy_file_range(2) syscall. Cc: Steve French <stfrench@microsoft.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-09vfs: no fallback for ->copy_file_rangeDave Chinner
Now that we have generic_copy_file_range(), remove it as a fallback case when offloads fail. This puts the responsibility for executing fallbacks on the filesystems that implement ->copy_file_range and allows us to add operational validity checks to generic_copy_file_range(). Rework vfs_copy_file_range() to call a new do_copy_file_range() helper to execute the copying callout, and move calls to generic_file_copy_range() into filesystem methods where they currently return failures. [Amir] overlayfs is not responsible of executing the fallback. It is the responsibility of the underlying filesystem. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-05-28fuse: extract helper for range writebackMiklos Szeredi
The fuse_writeback_range() helper flushes dirty data to the userspace filesystem. When the function returns, the WRITE requests for the data in the given range have all been completed. This is not equivalent to fsync() on the given range, since the userspace filesystem may not yet have the data on stable storage. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-05-28fuse: fix copy_file_range() in the writeback caseMiklos Szeredi
Prior to sending COPY_FILE_RANGE to userspace filesystem, we must flush all dirty pages in both the source and destination files. This patch adds the missing flush of the source file. Tested on libfuse-3.5.0 with: libfuse/example/passthrough_ll /mnt/fuse/ -o writeback libfuse/test/test_syscalls /mnt/fuse/tmp/test Fixes: 88bc7d5097a1 ("fuse: add support for copy_file_range()") Cc: <stable@vger.kernel.org> # v4.20 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-05-27fuse: add FUSE_WRITE_KILL_PRIVMiklos Szeredi
In the FOPEN_DIRECT_IO case the write path doesn't call file_remove_privs() and that means setuid bit is not cleared if unpriviliged user writes to a file with setuid bit set. pjdfstest chmod test 12.t tests this and fails. Fix this by adding a flag to the FUSE_WRITE message that requests clearing privileges on the given file. This needs This better than just calling fuse_remove_privs(), because the attributes may not be up to date, so in that case a write may miss clearing the privileges. Test case: $ passthrough_ll /mnt/pasthrough-mnt -o default_permissions,allow_other,cache=never $ mkdir /mnt/pasthrough-mnt/testdir $ cd /mnt/pasthrough-mnt/testdir $ prove -rv pjdfstests/tests/chmod/12.t Reported-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Tested-by: Vivek Goyal <vgoyal@redhat.com>
2019-05-27fuse: fallocate: fix return with locked inodeMiklos Szeredi
Do the proper cleanup in case the size check fails. Tested with xfstests:generic/228 Reported-by: kbuild test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: 0cbade024ba5 ("fuse: honor RLIMIT_FSIZE in fuse_file_fallocate") Cc: Liu Bo <bo.liu@linux.alibaba.com> Cc: <stable@vger.kernel.org> # v3.5 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-04-24fuse: Add ioctl flag for x32 compat ioctlIan Abbott
Currently, a CUSE server running on a 64-bit kernel can tell when an ioctl request comes from a process running a 32-bit ABI, but cannot tell whether the requesting process is using legacy IA32 emulation or x32 ABI. In particular, the server does not know the size of the client process's `time_t` type. For 64-bit kernels, the `FUSE_IOCTL_COMPAT` and `FUSE_IOCTL_32BIT` flags are currently set in the ioctl input request (`struct fuse_ioctl_in` member `flags`) for a 32-bit requesting process. This patch defines a new flag `FUSE_IOCTL_COMPAT_X32` and sets it if the 32-bit requesting process is using the x32 ABI. This allows the server process to distinguish between requests coming from client processes using IA32 emulation or the x32 ABI and so infer the size of the client process's `time_t` type and any other IA32/x32 differences. Signed-off-by: Ian Abbott <abbotti@mev.co.uk> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-04-24fuse: document fuse_fsync_in.fsync_flagsAlan Somers
The FUSE_FSYNC_DATASYNC flag was introduced by commit b6aeadeda22a ("[PATCH] FUSE - file operations") as a magic number. No new values have been added to fsync_flags since. Signed-off-by: Alan Somers <asomers@FreeBSD.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-04-24fuse: Add FOPEN_STREAM to use stream_open()Kirill Smelkov
Starting from commit 9c225f2655e3 ("vfs: atomic f_pos accesses as per POSIX") files opened even via nonseekable_open gate read and write via lock and do not allow them to be run simultaneously. This can create read vs write deadlock if a filesystem is trying to implement a socket-like file which is intended to be simultaneously used for both read and write from filesystem client. See commit 10dce8af3422 ("fs: stream_open - opener for stream-like files so that read and write can run simultaneously without deadlock") for details and e.g. commit 581d21a2d02a ("xenbus: fix deadlock on writes to /proc/xen/xenbus") for a similar deadlock example on /proc/xen/xenbus. To avoid such deadlock it was tempting to adjust fuse_finish_open to use stream_open instead of nonseekable_open on just FOPEN_NONSEEKABLE flags, but grepping through Debian codesearch shows users of FOPEN_NONSEEKABLE, and in particular GVFS which actually uses offset in its read and write handlers https://codesearch.debian.net/search?q=-%3Enonseekable+%3D https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1080 https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1247-1346 https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1399-1481 so if we would do such a change it will break a real user. Add another flag (FOPEN_STREAM) for filesystem servers to indicate that the opened handler is having stream-like semantics; does not use file position and thus the kernel is free to issue simultaneous read and write request on opened file handle. This patch together with stream_open() should be added to stable kernels starting from v3.14+. This will allow to patch OSSPD and other FUSE filesystems that provide stream-like files to return FOPEN_STREAM | FOPEN_NONSEEKABLE in open handler and this way avoid the deadlock on all kernel versions. This should work because fuse_finish_open ignores unknown open flags returned from a filesystem and so passing FOPEN_STREAM to a kernel that is not aware of this flag cannot hurt. In turn the kernel that is not aware of FOPEN_STREAM will be < v3.14 where just FOPEN_NONSEEKABLE is sufficient to implement streams without read vs write deadlock. Cc: stable@vger.kernel.org # v3.14+ Signed-off-by: Kirill Smelkov <kirr@nexedi.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-04-24fuse: honor RLIMIT_FSIZE in fuse_file_fallocateLiu Bo
fstests generic/228 reported this failure that fuse fallocate does not honor what 'ulimit -f' has set. This adds the necessary inode_newsize_ok() check. Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com> Fixes: 05ba1f082300 ("fuse: add FALLOCATE operation") Cc: <stable@vger.kernel.org> # v3.5 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-04-24fuse: fix writepages on 32bitMiklos Szeredi
Writepage requests were cropped to i_size & 0xffffffff, which meant that mmaped writes to any file larger than 4G might be silently discarded. Fix by storing the file size in a properly sized variable (loff_t instead of size_t). Reported-by: Antonio SJ Musumeci <trapexit@spawn.link> Fixes: 6eaf4782eb09 ("fuse: writepages: crop secondary requests") Cc: <stable@vger.kernel.org> # v3.13 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-02-13fuse: cache readdir calls if filesystem opts out of opendirChad Austin
If a filesystem returns ENOSYS from opendir and thus opts out of opendir and releasedir requests, it almost certainly would also like readdir results cached. Default open_flags to FOPEN_KEEP_CACHE and FOPEN_CACHE_DIR in that case. With this patch, I've measured recursive directory enumeration across large FUSE mounts to be faster than native mounts. Signed-off-by: Chad Austin <chadaustin@fb.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-02-13fuse: support clients that don't implement 'opendir'Chad Austin
Allow filesystems to return ENOSYS from opendir, preventing the kernel from sending opendir and releasedir messages in the future. This avoids userspace transitions when filesystems don't need to keep track of state per directory handle. A new capability flag, FUSE_NO_OPENDIR_SUPPORT, parallels FUSE_NO_OPEN_SUPPORT, indicating the new semantics for returning ENOSYS from opendir. Signed-off-by: Chad Austin <chadaustin@fb.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-02-13fuse: lift bad inode checks into callersMiklos Szeredi
Bad inode checks were done done in various places, and move them into fuse_file_{read|write}_iter(). Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-02-13fuse: multiplex cached/direct_io file operationsMiklos Szeredi
This is cleanup, as well as allowing switching between I/O modes while the file is open in the future. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>