summaryrefslogtreecommitdiffstats
path: root/fs/overlayfs/super.c
AgeCommit message (Collapse)Author
2020-09-02ovl: pass ovl_fs down to functions accessing private xattrsMiklos Szeredi
This paves the way for optionally using the "user.overlay." xattr namespace. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-09-02ovl: drop flags argument from ovl_do_setxattr()Miklos Szeredi
All callers pass zero flags to ovl_do_setxattr(). So drop this argument. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-09-02ovl: adhere to the vfs_ vs. ovl_do_ conventions for xattrsMiklos Szeredi
Call ovl_do_*xattr() when accessing an overlay private xattr, vfs_*xattr() otherwise. This has an effect on debug output, which is made more consistent by this patch. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-09-02ovl: provide a mount option "volatile"Vivek Goyal
Container folks are complaining that dnf/yum issues too many sync while installing packages and this slows down the image build. Build requirement is such that they don't care if a node goes down while build was still going on. In that case, they will simply throw away unfinished layer and start new build. So they don't care about syncing intermediate state to the disk and hence don't want to pay the price associated with sync. So they are asking for mount options where they can disable sync on overlay mount point. They primarily seem to have two use cases. - For building images, they will mount overlay with nosync and then sync upper layer after unmounting overlay and reuse upper as lower for next layer. - For running containers, they don't seem to care about syncing upper layer because if node goes down, they will simply throw away upper layer and create a fresh one. So this patch provides a mount option "volatile" which disables all forms of sync. Now it is caller's responsibility to throw away upper if system crashes or shuts down and start fresh. With "volatile", I am seeing roughly 20% speed up in my VM where I am just installing emacs in an image. Installation time drops from 31 seconds to 25 seconds when nosync option is used. This is for the case of building on top of an image where all packages are already cached. That way I take out the network operations latency out of the measurement. Giuseppe is also looking to cut down on number of iops done on the disk. He is complaining that often in cloud their VMs are throttled if they cross the limit. This option can help them where they reduce number of iops (by cutting down on frequent sync and writebacks). Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-09-02ovl: check for incompatible features in work dirAmir Goldstein
An incompatible feature is marked by a non-empty directory nested 2 levels deep under "work" dir, e.g.: workdir/work/incompat/volatile. This commit checks for marked incompat features, warns about them and fails to mount the overlay, for example: overlayfs: overlay with incompat feature 'volatile' cannot be mounted Very old kernels (i.e. v3.18) will fail to remove a non-empty "work" dir and fail the mount. Newer kernels will fail to remove a "work" dir with entries nested 3 levels and fall back to read-only mount. User mounting with old kernel will see a warning like these in dmesg: overlayfs: cleanup of 'incompat/...' failed (-39) overlayfs: cleanup of 'work/incompat' failed (-39) overlayfs: cleanup of 'ovl-work/work' failed (-39) overlayfs: failed to create directory /vdf/ovl-work/work (errno: 17); mounting read-only These warnings should give the hint to the user that: 1. mount failure is caused by backward incompatible features 2. mount failure can be resolved by manually removing the "work" directory There is nothing preventing users on old kernels from manually removing workdir entirely or mounting overlay with a new workdir, so this is in no way a full proof backward compatibility enforcement, but only a best effort. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-07-16ovl: fix mount option checks for nfs_export with no upperdirAmir Goldstein
Without upperdir mount option, there is no index dir and the dependency checks nfs_export => index for mount options parsing are incorrect. Allow the combination nfs_export=on,index=off with no upperdir and move the check for dependency redirect_dir=nofollow for non-upper mount case to mount options parsing. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-07-16ovl: force read-only sb on failure to create index dirAmir Goldstein
With index feature enabled, on failure to create index dir, overlay is being mounted read-only. However, we do not forbid user to remount overlay read-write. Fix that by setting ofs->workdir to NULL, which prevents remount read-write. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-07-16ovl: fix regression with re-formatted lower squashfsAmir Goldstein
Commit 9df085f3c9a2 ("ovl: relax requirement for non null uuid of lower fs") relaxed the requirement for non null uuid with single lower layer to allow enabling index and nfs_export features with single lower squashfs. Fabian reported a regression in a setup when overlay re-uses an existing upper layer and re-formats the lower squashfs image. Because squashfs has no uuid, the origin xattr in upper layer are decoded from the new lower layer where they may resolve to a wrong origin file and user may get an ESTALE or EIO error on lookup. To avoid the reported regression while still allowing the new features with single lower squashfs, do not allow decoding origin with lower null uuid unless user opted-in to one of the new features that require following the lower inode of non-dir upper (index, xino, metacopy). Reported-by: Fabian <godi.beat@gmx.net> Link: https://lore.kernel.org/linux-unionfs/32532923.JtPX5UtSzP@fgdesktop/ Fixes: 9df085f3c9a2 ("ovl: relax requirement for non null uuid of lower fs") Cc: stable@vger.kernel.org # v4.20+ Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-07-16ovl: fix oops in ovl_indexdir_cleanup() with nfs_export=onAmir Goldstein
Mounting with nfs_export=on, xfstests overlay/031 triggers a kernel panic since v5.8-rc1 overlayfs updates. overlayfs: orphan index entry (index/00fb1..., ftype=4000, nlink=2) BUG: kernel NULL pointer dereference, address: 0000000000000030 RIP: 0010:ovl_cleanup_and_whiteout+0x28/0x220 [overlay] Bisect point at commit c21c839b8448 ("ovl: whiteout inode sharing") Minimal reproducer: -------------------------------------------------- rm -rf l u w m mkdir -p l u w m mkdir -p l/testdir touch l/testdir/testfile mount -t overlay -o lowerdir=l,upperdir=u,workdir=w,nfs_export=on overlay m echo 1 > m/testdir/testfile umount m rm -rf u/testdir mount -t overlay -o lowerdir=l,upperdir=u,workdir=w,nfs_export=on overlay m umount m -------------------------------------------------- When mount with nfs_export=on, and fail to verify an orphan index, we're cleaning this index from indexdir by calling ovl_cleanup_and_whiteout(). This dereferences ofs->workdir, that was earlier set to NULL. The design was that ovl->workdir will point at ovl->indexdir, but we are assigning ofs->indexdir to ofs->workdir only after ovl_indexdir_cleanup(). There is no reason not to do it sooner, because once we get success from ofs->indexdir = ovl_workdir_create(... there is no turning back. Reported-and-tested-by: Murphy Zhou <jencce.kernel@gmail.com> Fixes: c21c839b8448 ("ovl: whiteout inode sharing") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-07-16ovl: inode reference leak in ovl_is_inuse true case.youngjun
When "ovl_is_inuse" true case, trap inode reference not put. plus adding the comment explaining sequence of ovl_is_inuse after ovl_setup_trap. Fixes: 0be0bfd2de9d ("ovl: fix regression caused by overlapping layers detection") Cc: <stable@vger.kernel.org> # v4.19+ Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: youngjun <her0gyugyu@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-06-08ovl: remove unnecessary lock checkyoungjun
Directory is always locked until "out_unlock" label. So lock check is not needed. Signed-off-by: youngjun <her0gyugyu@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-06-04ovl: make private mounts longtermMiklos Szeredi
Overlayfs is using clone_private_mount() to create internal mounts for underlying layers. These are used for operations requiring a path, such as dentry_open(). Since these private mounts are not in any namespace they are treated as short term, "detached" mounts and mntput() involves taking the global mount_lock, which can result in serious cacheline pingpong. Make these private mounts longterm instead, which trade the penalty on mntput() for a slightly longer shutdown time due to an added RCU grace period when putting these mounts. Introduce a new helper kern_unmount_many() that can take care of multiple longterm mounts with a single RCU grace period. Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-06-04ovl: get rid of redundant members in struct ovl_fsMiklos Szeredi
ofs->upper_mnt is copied to ->layers[0].mnt and ->layers[0].trap could be used instead of a separate ->upperdir_trap. Split the lowerdir option early to get the number of layers, then allocate the ->layers array, and finally fill the upper and lower layers, as before. Get rid of path_put_init() in ovl_lower_dir(), since the only caller will take care of that. [Colin Ian King] Fix null pointer dereference on null stack pointer on error return found by Coverity. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-06-04ovl: add accessor for ofs->upper_mntMiklos Szeredi
Next patch will remove ofs->upper_mnt, so add an accessor function for this field. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-05-13ovl: sync dirty data when remounting to ro modeChengguang Xu
sync_filesystem() does not sync dirty data for readonly filesystem during umount, so before changing to readonly filesystem we should sync dirty data for data integrity. Signed-off-by: Chengguang Xu <cgxu519@mykernel.net> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-05-13ovl: whiteout inode sharingChengguang Xu
Share inode with different whiteout files for saving inode and speeding up delete operation. If EMLINK is encountered when linking a shared whiteout, create a new one. In case of any other error, disable sharing for this super block. Note: ofs->whiteout is protected by inode lock on workdir. Signed-off-by: Chengguang Xu <cgxu519@mykernel.net> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-05-13ovl: inherit SB_NOSEC flag from upperdirJeffle Xu
Since the stacking of regular file operations [1], the overlayfs edition of write_iter() is called when writing regular files. Since then, xattr lookup is needed on every write since file_remove_privs() is called from ovl_write_iter(), which would become the performance bottleneck when writing small chunks of data. In my test case, file_remove_privs() would consume ~15% CPU when running fstime of unixbench (the workload is repeadly writing 1 KB to the same file) [2]. Inherit the SB_NOSEC flag from upperdir. Since then xattr lookup would be done only once on the first write. Unixbench fstime gets a ~20% performance gain with this patch. [1] https://lore.kernel.org/lkml/20180606150905.GC9426@magnolia/T/ [2] https://www.spinics.net/lists/linux-unionfs/msg07153.html Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-05-13ovl: skip overlayfs superblocks at global syncKonstantin Khlebnikov
Stacked filesystems like overlayfs has no own writeback, but they have to forward syncfs() requests to backend for keeping data integrity. During global sync() each overlayfs instance calls method ->sync_fs() for backend although it itself is in global list of superblocks too. As a result one syscall sync() could write one superblock several times and send multiple disk barriers. This patch adds flag SB_I_SKIP_SYNC into sb->sb_iflags to avoid that. Reported-by: Dmitry Monakhov <dmtrmonakhov@yandex-team.ru> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-05-13ovl: index dir act as work dirAmir Goldstein
With index=on, let index dir act as the work dir for copy up and cleanups. This will help implementing whiteout inode sharing. We still create the "work" dir on mount regardless of index=on and it is used to test the features supported by upper fs. One reason is that before the feature tests, we do not know if index could be enabled or not. The reason we do not use "index" directory also as workdir with index=off is because the existence of the "index" directory acts as a simple persistent signal that index was enabled on this filesystem and tools may want to use that signal. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-05-13ovl: resolve more conflicting mount optionsAmir Goldstein
Similar to the way that a conflict between metacopy=on,redirect_dir=off is resolved, also resolve conflicts between nfs_export=on,index=off and nfs_export=on,metacopy=on. An explicit mount option wins over a default config value. Both explicit mount options result in an error. Without this change the xfstests group overlay/exportfs are skipped if metacopy is enabled by default. Reported-by: Chengguang Xu <cgxu519@mykernel.net> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-03-27ovl: enable xino automatically in more casesAmir Goldstein
So far, with xino=auto, we only enable xino if we know that all underlying filesystem use 32bit inode numbers. When users configure overlay with xino=auto, they already declare that they are ready to handle 64bit inode number from overlay. It is a very common case, that underlying filesystem uses 64bit ino, but rarely or never uses the high inode number bits (e.g. tmpfs, xfs). Leaving it for the users to declare high ino bits are unused with xino=on is not a recipe for many users to enjoy the benefits of xino. There appears to be very little reason not to enable xino when users declare xino=auto even if we do not know how many bits underlying filesystem uses for inode numbers. In the worst case of xino bits overflow by real inode number, we already fall back to the non-xino behavior - real inode number with unique pseudo dev or to non persistent inode number and overlay st_dev (for directories). The only annoyance from auto enabling xino is that xino bits overflow emits a warning to kmsg. Suppress those warnings unless users explicitly asked for xino=on, suggesting that they expected high ino bits to be unused by underlying filesystem. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-03-27ovl: avoid possible inode number collisions with xino=onAmir Goldstein
When xino feature is enabled and a real directory inode number overflows the lower xino bits, we cannot map this directory inode number to a unique and persistent inode number and we fall back to the real inode st_ino and overlay st_dev. The real inode st_ino with high bits may collide with a lower inode number on overlay st_dev that was mapped using xino. To avoid possible collision with legitimate xino values, map a non persistent inode number to a dedicated range in the xino address space. The dedicated range is created by adding one more bit to the number of reserved high xino bits. We could have added just one more fsid, but that would have had the undesired effect of changing persistent overlay inode numbers on kernel or require more complex xino mapping code. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-03-27ovl: use a private non-persistent ino poolAmir Goldstein
There is no reason to deplete the system's global get_next_ino() pool for overlay non-persistent inode numbers and there is no reason at all to allocate non-persistent inode numbers for non-directories. For non-directories, it is much better to leave i_ino the same as real i_ino, to be consistent with st_ino/d_ino. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-03-17ovl: strict upper fs requirements for remote upper fsAmir Goldstein
Overlayfs works sub-optimally with upper fs that has no xattr/d_type/ RENAME_WHITEOUT support. We should basically deprecate support for those filesystems, but so far, we only issue a warning and don't fail the mount for the sake of backward compat. Some features are already being disabled with no xattr support. For newly supported remote upper fs, we do not need to worry about backward compatibility, so we can fail the mount if upper fs is a sub-optimal filesystem. This reduces the in-tree remote filesystems supported as upper to just FUSE, for which the remote upper fs support was added. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-03-17ovl: check if upper fs supports RENAME_WHITEOUTAmir Goldstein
As with other required upper fs features, we only warn if support is missing to avoid breaking existing sub-optimal setups. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-03-17ovl: allow remote upperMiklos Szeredi
No reason to prevent upper layer being a remote filesystem. Do the revalidation in that case, just as we already do for lower layers. This lets virtiofs be used as upper layer, which appears to be a real use case. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-03-17ovl: decide if revalidate needed on a per-dentry basisMiklos Szeredi
Allow completely skipping ->revalidate() on a per-dentry basis, in case the underlying layers used for a dentry do not themselves have ->revalidate(). E.g. negative overlay dentry has no underlying layers, hence revalidate is unnecessary. Or if lower layer is remote but overlay dentry is pure-upper, then can skip revalidate. The following places need to update whether the dentry needs revalidate or not: - fill-super (root dentry) - lookup - create - fh_to_dentry Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-03-17ovl: separate detection of remote upper layer from stacked overlayMiklos Szeredi
Following patch will allow remote as upper layer, but not overlay stacked on upper layer. Separate the two concepts. This patch is doesn't change behavior. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-03-17ovl: restructure dentry revalidationMiklos Szeredi
Use a common loop for plain and weak revalidation. This will aid doing revalidation on upper layer. This patch doesn't change behavior. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-03-17ovl: simplify i_ino initializationAmir Goldstein
Move i_ino initialization to ovl_inode_init() to avoid the dance of setting i_ino in ovl_fill_inode() sometimes on the first call and sometimes on the seconds call. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-03-17ovl: factor out helper ovl_get_root()Amir Goldstein
Allocates and initializes the root dentry and inode. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-03-13ovl: fix some xino configurationsAmir Goldstein
Fix up two bugs in the coversion to xino_mode: 1. xino=off does not always end up in disabled mode 2. xino=auto on 32bit arch should end up in disabled mode Take a proactive approach to disabling xino on 32bit kernel: 1. Disable XINO_AUTO config during build time 2. Disable xino with a warning on mount time As a by product, xino=on on 32bit arch also ends up in disabled mode. We never intended to enable xino on 32bit arch and this will make the rest of the logic simpler. Fixes: 0f831ec85eda ("ovl: simplify ovl_same_sb() helper") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-01-24ovl: implement async IO routinesJiufei Xue
A performance regression was observed since linux v4.19 with aio test using fio with iodepth 128 on overlayfs. The queue depth of the device was always 1 which is unexpected. After investigation, it was found that commit 16914e6fc7e1 ("ovl: add ovl_read_iter()") and commit 2a92e07edc5e ("ovl: add ovl_write_iter()") resulted in vfs_iter_{read,write} being called on underlying filesystem, which always results in syncronous IO. Implement async IO for stacked reading and writing. This resolves the performance regresion. This is implemented by allocating a new kiocb for submitting the AIO request on the underlying filesystem. When the request is completed, the new kiocb is freed and the completion callback is called on the original iocb. Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-01-24ovl: layer is constMiklos Szeredi
The ovl_layer struct is never modified except at initialization. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-01-24ovl: fix corner case of non-constant st_dev;st_inoAmir Goldstein
On non-samefs overlay without xino, non pure upper inodes should use a pseudo_dev assigned to each unique lower fs, but if lower layer is on the same fs and upper layer, it has no pseudo_dev assigned. In this overlay layers setup: - two filesystems, A and B - upper layer is on A - lower layer 1 is also on A - lower layer 2 is on B Non pure upper overlay inode, whose origin is in layer 1 will have the st_dev;st_ino values of the real lower inode before copy up and the st_dev;st_ino values of the real upper inode after copy up. Fix this inconsitency by assigning a unique pseudo_dev also for upper fs, that will be used as st_dev value along with the lower inode st_dev for overlay inodes in the case above. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-01-24ovl: fix corner case of conflicting lower layer uuidAmir Goldstein
This fixes ovl_lower_uuid_ok() to correctly detect the corner case: - two filesystems, A and B, both have null uuid - upper layer is on A - lower layer 1 is also on A - lower layer 2 is on B In this case, bad_uuid would not have been set for B, because the check only involved the list of lower fs. Hence we'll try to decode a layer 2 origin on layer 1 and fail. We check for conflicting (and null) uuid among all lower layers, including those layers that are on the same fs as the upper layer. Reported-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-01-24ovl: generalize the lower_fs[] arrayAmir Goldstein
Rename lower_fs[] array to fs[], extend its size by one and use index fsid (instead of fsid-1) to access the fs[] array. Initialize fs[0] with upper fs values. fsid 0 is reserved even with lower only overlay, so fs[0] remains null in this case. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-01-24ovl: simplify ovl_same_sb() helperAmir Goldstein
No code uses the sb returned from this helper, so make it retrun a boolean and rename it to ovl_same_fs(). The xino mode is irrelevant when all layers are on same fs, so instead of describing samefs with mode OVL_XINO_OFF, use a new xino_mode state, which is 0 in the case of samefs, -1 in the case of xino=off and > 0 with xino enabled. Create a new helper ovl_same_dev(), to use instead of the common check for (ovl_same_fs() || xinobits). Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-01-22ovl: generalize the lower_layers[] arrayAmir Goldstein
Rename lower_layers[] array to layers[], extend its size by one and initialize layers[0] with upper layer values. Lower layers are now addressed with index 1..numlower. layers[0] is reserved even with lower only overlay. [SzM: replace ofs->numlower with ofs->numlayer, the latter's value is incremented by one] Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-01-22ovl: use pr_fmt auto generate prefixlijiazi
Use pr_fmt auto generate "overlayfs: " prefix. Signed-off-by: lijiazi <lijiazi@xiaomi.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-12-10ovl: fix lookup failure on multi lower squashfsAmir Goldstein
In the past, overlayfs required that lower fs have non null uuid in order to support nfs export and decode copy up origin file handles. Commit 9df085f3c9a2 ("ovl: relax requirement for non null uuid of lower fs") relaxed this requirement for nfs export support, as long as uuid (even if null) is unique among all lower fs. However, said commit unintentionally also relaxed the non null uuid requirement for decoding copy up origin file handles, regardless of the unique uuid requirement. Amend this mistake by disabling decoding of copy up origin file handle from lower fs with a conflicting uuid. We still encode copy up origin file handles from those fs, because file handles like those already exist in the wild and because they might provide useful information in the future. There is an unhandled corner case described by Miklos this way: - two filesystems, A and B, both have null uuid - upper layer is on A - lower layer 1 is also on A - lower layer 2 is on B In this case bad_uuid won't be set for B, because the check only involves the list of lower fs. Hence we'll try to decode a layer 2 origin on layer 1 and fail. We will deal with this corner case later. Reported-by: Colin Ian King <colin.king@canonical.com> Tested-by: Colin Ian King <colin.king@canonical.com> Link: https://lore.kernel.org/lkml/20191106234301.283006-1-colin.king@canonical.com/ Fixes: 9df085f3c9a2 ("ovl: relax requirement for non null uuid ...") Cc: stable@vger.kernel.org # v4.20+ Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-07-16ovl: fix regression caused by overlapping layers detectionAmir Goldstein
Once upon a time, commit 2cac0c00a6cd ("ovl: get exclusive ownership on upper/work dirs") in v4.13 added some sanity checks on overlayfs layers. This change caused a docker regression. The root cause was mount leaks by docker, which as far as I know, still exist. To mitigate the regression, commit 85fdee1eef1a ("ovl: fix regression caused by exclusive upper/work dir protection") in v4.14 turned the mount errors into warnings for the default index=off configuration. Recently, commit 146d62e5a586 ("ovl: detect overlapping layers") in v5.2, re-introduced exclusive upper/work dir checks regardless of index=off configuration. This changes the status quo and mount leak related bug reports have started to re-surface. Restore the status quo to fix the regressions. To clarify, index=off does NOT relax overlapping layers check for this ovelayfs mount. index=off only relaxes exclusive upper/work dir checks with another overlayfs mount. To cover the part of overlapping layers detection that used the exclusive upper/work dir checks to detect overlap with self upper/work dir, add a trap also on the work base dir. Link: https://github.com/moby/moby/issues/34672 Link: https://lore.kernel.org/linux-fsdevel/20171006121405.GA32700@veci.piliscsaba.szeredi.hu/ Link: https://github.com/containers/libpod/issues/3540 Fixes: 146d62e5a586 ("ovl: detect overlapping layers") Cc: <stable@vger.kernel.org> # v4.19+ Signed-off-by: Amir Goldstein <amir73il@gmail.com> Tested-by: Colin Walters <walters@verbum.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-06-21Merge tag 'spdx-5.2-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx Pull still more SPDX updates from Greg KH: "Another round of SPDX updates for 5.2-rc6 Here is what I am guessing is going to be the last "big" SPDX update for 5.2. It contains all of the remaining GPLv2 and GPLv2+ updates that were "easy" to determine by pattern matching. The ones after this are going to be a bit more difficult and the people on the spdx list will be discussing them on a case-by-case basis now. Another 5000+ files are fixed up, so our overall totals are: Files checked: 64545 Files with SPDX: 45529 Compared to the 5.1 kernel which was: Files checked: 63848 Files with SPDX: 22576 This is a huge improvement. Also, we deleted another 20000 lines of boilerplate license crud, always nice to see in a diffstat" * tag 'spdx-5.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx: (65 commits) treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 507 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 506 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 505 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 504 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 503 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 502 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 501 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 499 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 498 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 497 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 496 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 495 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 491 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 490 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 489 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 488 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 487 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 486 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 485 ...
2019-06-19treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500Thomas Gleixner
Based on 2 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation # extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 4122 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Enrico Weigelt <info@metux.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-18ovl: fix typo in MODULE_PARM_DESCNicolas Schier
Change first argument to MODULE_PARM_DESC() calls, that each of them matched the actual module parameter name. The matching results in changing (the 'parm' section from) the output of `modinfo overlay` from: parm: ovl_check_copy_up:Obsolete; does nothing parm: redirect_max:ushort parm: ovl_redirect_max:Maximum length of absolute redirect xattr value parm: redirect_dir:bool parm: ovl_redirect_dir_def:Default to on or off for the redirect_dir feature parm: redirect_always_follow:bool parm: ovl_redirect_always_follow:Follow redirects even if redirect_dir feature is turned off parm: index:bool parm: ovl_index_def:Default to on or off for the inodes index feature parm: nfs_export:bool parm: ovl_nfs_export_def:Default to on or off for the NFS export feature parm: xino_auto:bool parm: ovl_xino_auto_def:Auto enable xino feature parm: metacopy:bool parm: ovl_metacopy_def:Default to on or off for the metadata only copy up feature into: parm: check_copy_up:Obsolete; does nothing parm: redirect_max:Maximum length of absolute redirect xattr value (ushort) parm: redirect_dir:Default to on or off for the redirect_dir feature (bool) parm: redirect_always_follow:Follow redirects even if redirect_dir feature is turned off (bool) parm: index:Default to on or off for the inodes index feature (bool) parm: nfs_export:Default to on or off for the NFS export feature (bool) parm: xino_auto:Auto enable xino feature (bool) parm: metacopy:Default to on or off for the metadata only copy up feature (bool) Signed-off-by: Nicolas Schier <n.schier@avm.de> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-06-18ovl: fix bogus -Wmaybe-unitialized warningArnd Bergmann
gcc gets a bit confused by the logic in ovl_setup_trap() and can't figure out whether the local 'trap' variable in the caller was initialized or not: fs/overlayfs/super.c: In function 'ovl_fill_super': fs/overlayfs/super.c:1333:4: error: 'trap' may be used uninitialized in this function [-Werror=maybe-uninitialized] iput(trap); ^~~~~~~~~~ fs/overlayfs/super.c:1312:17: note: 'trap' was declared here Reword slightly to make it easier for the compiler to understand. Fixes: 146d62e5a586 ("ovl: detect overlapping layers") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-06-18ovl: don't fail with disconnected lower NFSMiklos Szeredi
NFS mounts can be disconnected from fs root. Don't fail the overlapping layer check because of this. The check is not authoritative anyway, since topology can change during or after the check. Reported-by: Antti Antinoja <antti@fennosys.fi> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 146d62e5a586 ("ovl: detect overlapping layers")
2019-05-29ovl: detect overlapping layersAmir Goldstein
Overlapping overlay layers are not supported and can cause unexpected behavior, but overlayfs does not currently check or warn about these configurations. User is not supposed to specify the same directory for upper and lower dirs or for different lower layers and user is not supposed to specify directories that are descendants of each other for overlay layers, but that is exactly what this zysbot repro did: https://syzkaller.appspot.com/x/repro.syz?x=12c7a94f400000 Moving layer root directories into other layers while overlayfs is mounted could also result in unexpected behavior. This commit places "traps" in the overlay inode hash table. Those traps are dummy overlay inodes that are hashed by the layers root inodes. On mount, the hash table trap entries are used to verify that overlay layers are not overlapping. While at it, we also verify that overlay layers are not overlapping with directories "in-use" by other overlay instances as upperdir/workdir. On lookup, the trap entries are used to verify that overlay layers root inodes have not been moved into other layers after mount. Some examples: $ ./run --ov --samefs -s ... ( mkdir -p base/upper/0/u base/upper/0/w base/lower lower upper mnt mount -o bind base/lower lower mount -o bind base/upper upper mount -t overlay none mnt ... -o lowerdir=lower,upperdir=upper/0/u,workdir=upper/0/w) $ umount mnt $ mount -t overlay none mnt ... -o lowerdir=base,upperdir=upper/0/u,workdir=upper/0/w [ 94.434900] overlayfs: overlapping upperdir path mount: mount overlay on mnt failed: Too many levels of symbolic links $ mount -t overlay none mnt ... -o lowerdir=upper/0/u,upperdir=upper/0/u,workdir=upper/0/w [ 151.350132] overlayfs: conflicting lowerdir path mount: none is already mounted or mnt busy $ mount -t overlay none mnt ... -o lowerdir=lower:lower/a,upperdir=upper/0/u,workdir=upper/0/w [ 201.205045] overlayfs: overlapping lowerdir path mount: mount overlay on mnt failed: Too many levels of symbolic links $ mount -t overlay none mnt ... -o lowerdir=lower,upperdir=upper/0/u,workdir=upper/0/w $ mv base/upper/0/ base/lower/ $ find mnt/0 mnt/0 mnt/0/w find: 'mnt/0/w/work': Too many levels of symbolic links find: 'mnt/0/u': Too many levels of symbolic links Reported-by: syzbot+9c69c282adc4edd2b540@syzkaller.appspotmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-05-01overlayfs: make use of ->free_inode()Al Viro
synchronous parts are left in ->destroy_inode() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-11-01ovl: automatically enable redirect_dir on metacopy=onMiklos Szeredi
Current behavior is to automatically disable metacopy if redirect_dir is not enabled and proceed with the mount. If "metacopy=on" mount option was given, then this behavior can confuse the user: no mount failure, yet metacopy is disabled. This patch makes metacopy=on imply redirect_dir=on. The converse is also true: turning off full redirect with redirect_dir= {off|follow|nofollow} will disable metacopy. If both metacopy=on and redirect_dir={off|follow|nofollow} is specified, then mount will fail, since there's no way to correctly resolve the conflict. Reported-by: Daniel Walsh <dwalsh@redhat.com> Fixes: d5791044d2e5 ("ovl: Provide a mount option metacopy=on/off...") Cc: <stable@vger.kernel.org> # v4.19 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>