summaryrefslogtreecommitdiffstats
path: root/fs/overlayfs/super.c
AgeCommit message (Collapse)Author
2019-12-10ovl: fix lookup failure on multi lower squashfsAmir Goldstein
In the past, overlayfs required that lower fs have non null uuid in order to support nfs export and decode copy up origin file handles. Commit 9df085f3c9a2 ("ovl: relax requirement for non null uuid of lower fs") relaxed this requirement for nfs export support, as long as uuid (even if null) is unique among all lower fs. However, said commit unintentionally also relaxed the non null uuid requirement for decoding copy up origin file handles, regardless of the unique uuid requirement. Amend this mistake by disabling decoding of copy up origin file handle from lower fs with a conflicting uuid. We still encode copy up origin file handles from those fs, because file handles like those already exist in the wild and because they might provide useful information in the future. There is an unhandled corner case described by Miklos this way: - two filesystems, A and B, both have null uuid - upper layer is on A - lower layer 1 is also on A - lower layer 2 is on B In this case bad_uuid won't be set for B, because the check only involves the list of lower fs. Hence we'll try to decode a layer 2 origin on layer 1 and fail. We will deal with this corner case later. Reported-by: Colin Ian King <colin.king@canonical.com> Tested-by: Colin Ian King <colin.king@canonical.com> Link: https://lore.kernel.org/lkml/20191106234301.283006-1-colin.king@canonical.com/ Fixes: 9df085f3c9a2 ("ovl: relax requirement for non null uuid ...") Cc: stable@vger.kernel.org # v4.20+ Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-07-16ovl: fix regression caused by overlapping layers detectionAmir Goldstein
Once upon a time, commit 2cac0c00a6cd ("ovl: get exclusive ownership on upper/work dirs") in v4.13 added some sanity checks on overlayfs layers. This change caused a docker regression. The root cause was mount leaks by docker, which as far as I know, still exist. To mitigate the regression, commit 85fdee1eef1a ("ovl: fix regression caused by exclusive upper/work dir protection") in v4.14 turned the mount errors into warnings for the default index=off configuration. Recently, commit 146d62e5a586 ("ovl: detect overlapping layers") in v5.2, re-introduced exclusive upper/work dir checks regardless of index=off configuration. This changes the status quo and mount leak related bug reports have started to re-surface. Restore the status quo to fix the regressions. To clarify, index=off does NOT relax overlapping layers check for this ovelayfs mount. index=off only relaxes exclusive upper/work dir checks with another overlayfs mount. To cover the part of overlapping layers detection that used the exclusive upper/work dir checks to detect overlap with self upper/work dir, add a trap also on the work base dir. Link: https://github.com/moby/moby/issues/34672 Link: https://lore.kernel.org/linux-fsdevel/20171006121405.GA32700@veci.piliscsaba.szeredi.hu/ Link: https://github.com/containers/libpod/issues/3540 Fixes: 146d62e5a586 ("ovl: detect overlapping layers") Cc: <stable@vger.kernel.org> # v4.19+ Signed-off-by: Amir Goldstein <amir73il@gmail.com> Tested-by: Colin Walters <walters@verbum.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-06-21Merge tag 'spdx-5.2-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx Pull still more SPDX updates from Greg KH: "Another round of SPDX updates for 5.2-rc6 Here is what I am guessing is going to be the last "big" SPDX update for 5.2. It contains all of the remaining GPLv2 and GPLv2+ updates that were "easy" to determine by pattern matching. The ones after this are going to be a bit more difficult and the people on the spdx list will be discussing them on a case-by-case basis now. Another 5000+ files are fixed up, so our overall totals are: Files checked: 64545 Files with SPDX: 45529 Compared to the 5.1 kernel which was: Files checked: 63848 Files with SPDX: 22576 This is a huge improvement. Also, we deleted another 20000 lines of boilerplate license crud, always nice to see in a diffstat" * tag 'spdx-5.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx: (65 commits) treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 507 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 506 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 505 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 504 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 503 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 502 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 501 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 499 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 498 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 497 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 496 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 495 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 491 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 490 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 489 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 488 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 487 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 486 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 485 ...
2019-06-19treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500Thomas Gleixner
Based on 2 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation # extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 4122 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Enrico Weigelt <info@metux.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-18ovl: fix typo in MODULE_PARM_DESCNicolas Schier
Change first argument to MODULE_PARM_DESC() calls, that each of them matched the actual module parameter name. The matching results in changing (the 'parm' section from) the output of `modinfo overlay` from: parm: ovl_check_copy_up:Obsolete; does nothing parm: redirect_max:ushort parm: ovl_redirect_max:Maximum length of absolute redirect xattr value parm: redirect_dir:bool parm: ovl_redirect_dir_def:Default to on or off for the redirect_dir feature parm: redirect_always_follow:bool parm: ovl_redirect_always_follow:Follow redirects even if redirect_dir feature is turned off parm: index:bool parm: ovl_index_def:Default to on or off for the inodes index feature parm: nfs_export:bool parm: ovl_nfs_export_def:Default to on or off for the NFS export feature parm: xino_auto:bool parm: ovl_xino_auto_def:Auto enable xino feature parm: metacopy:bool parm: ovl_metacopy_def:Default to on or off for the metadata only copy up feature into: parm: check_copy_up:Obsolete; does nothing parm: redirect_max:Maximum length of absolute redirect xattr value (ushort) parm: redirect_dir:Default to on or off for the redirect_dir feature (bool) parm: redirect_always_follow:Follow redirects even if redirect_dir feature is turned off (bool) parm: index:Default to on or off for the inodes index feature (bool) parm: nfs_export:Default to on or off for the NFS export feature (bool) parm: xino_auto:Auto enable xino feature (bool) parm: metacopy:Default to on or off for the metadata only copy up feature (bool) Signed-off-by: Nicolas Schier <n.schier@avm.de> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-06-18ovl: fix bogus -Wmaybe-unitialized warningArnd Bergmann
gcc gets a bit confused by the logic in ovl_setup_trap() and can't figure out whether the local 'trap' variable in the caller was initialized or not: fs/overlayfs/super.c: In function 'ovl_fill_super': fs/overlayfs/super.c:1333:4: error: 'trap' may be used uninitialized in this function [-Werror=maybe-uninitialized] iput(trap); ^~~~~~~~~~ fs/overlayfs/super.c:1312:17: note: 'trap' was declared here Reword slightly to make it easier for the compiler to understand. Fixes: 146d62e5a586 ("ovl: detect overlapping layers") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-06-18ovl: don't fail with disconnected lower NFSMiklos Szeredi
NFS mounts can be disconnected from fs root. Don't fail the overlapping layer check because of this. The check is not authoritative anyway, since topology can change during or after the check. Reported-by: Antti Antinoja <antti@fennosys.fi> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 146d62e5a586 ("ovl: detect overlapping layers")
2019-05-29ovl: detect overlapping layersAmir Goldstein
Overlapping overlay layers are not supported and can cause unexpected behavior, but overlayfs does not currently check or warn about these configurations. User is not supposed to specify the same directory for upper and lower dirs or for different lower layers and user is not supposed to specify directories that are descendants of each other for overlay layers, but that is exactly what this zysbot repro did: https://syzkaller.appspot.com/x/repro.syz?x=12c7a94f400000 Moving layer root directories into other layers while overlayfs is mounted could also result in unexpected behavior. This commit places "traps" in the overlay inode hash table. Those traps are dummy overlay inodes that are hashed by the layers root inodes. On mount, the hash table trap entries are used to verify that overlay layers are not overlapping. While at it, we also verify that overlay layers are not overlapping with directories "in-use" by other overlay instances as upperdir/workdir. On lookup, the trap entries are used to verify that overlay layers root inodes have not been moved into other layers after mount. Some examples: $ ./run --ov --samefs -s ... ( mkdir -p base/upper/0/u base/upper/0/w base/lower lower upper mnt mount -o bind base/lower lower mount -o bind base/upper upper mount -t overlay none mnt ... -o lowerdir=lower,upperdir=upper/0/u,workdir=upper/0/w) $ umount mnt $ mount -t overlay none mnt ... -o lowerdir=base,upperdir=upper/0/u,workdir=upper/0/w [ 94.434900] overlayfs: overlapping upperdir path mount: mount overlay on mnt failed: Too many levels of symbolic links $ mount -t overlay none mnt ... -o lowerdir=upper/0/u,upperdir=upper/0/u,workdir=upper/0/w [ 151.350132] overlayfs: conflicting lowerdir path mount: none is already mounted or mnt busy $ mount -t overlay none mnt ... -o lowerdir=lower:lower/a,upperdir=upper/0/u,workdir=upper/0/w [ 201.205045] overlayfs: overlapping lowerdir path mount: mount overlay on mnt failed: Too many levels of symbolic links $ mount -t overlay none mnt ... -o lowerdir=lower,upperdir=upper/0/u,workdir=upper/0/w $ mv base/upper/0/ base/lower/ $ find mnt/0 mnt/0 mnt/0/w find: 'mnt/0/w/work': Too many levels of symbolic links find: 'mnt/0/u': Too many levels of symbolic links Reported-by: syzbot+9c69c282adc4edd2b540@syzkaller.appspotmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-05-01overlayfs: make use of ->free_inode()Al Viro
synchronous parts are left in ->destroy_inode() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-11-01ovl: automatically enable redirect_dir on metacopy=onMiklos Szeredi
Current behavior is to automatically disable metacopy if redirect_dir is not enabled and proceed with the mount. If "metacopy=on" mount option was given, then this behavior can confuse the user: no mount failure, yet metacopy is disabled. This patch makes metacopy=on imply redirect_dir=on. The converse is also true: turning off full redirect with redirect_dir= {off|follow|nofollow} will disable metacopy. If both metacopy=on and redirect_dir={off|follow|nofollow} is specified, then mount will fail, since there's no way to correctly resolve the conflict. Reported-by: Daniel Walsh <dwalsh@redhat.com> Fixes: d5791044d2e5 ("ovl: Provide a mount option metacopy=on/off...") Cc: <stable@vger.kernel.org> # v4.19 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-10-26ovl: relax requirement for non null uuid of lower fsAmir Goldstein
We use uuid to associate an overlay lower file handle with a lower layer, so we can accept lower fs with null uuid as long as all lower layers with null uuid are on the same fs. This change allows enabling index and nfs_export features for the setup of single lower fs of type squashfs - squashfs supports file handles, but has a null uuid. This change also allows enabling index and nfs_export features for nested overlayfs, where the lower overlay has nfs_export enabled. Enabling the index feature with single lower squashfs fixes the unionmount-testsuite test: ./run --ov --squashfs --verify As a by-product, if, like the lower squashfs, upper fs also uses the generic export_encode_fh() implementation to export 32bit inode file handles (e.g. ext4), then the xino_auto config/module/mount option will enable unique overlay inode numbers. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-09-10ovl: fix oopses in ovl_fill_super() failure pathsMiklos Szeredi
ovl_free_fs() dereferences ofs->workbasedir and ofs->upper_mnt in cases when those might not have been initialized yet. Fix the initialization order for these fields. Reported-by: syzbot+c75f181dc8429d2eb887@syzkaller.appspotmail.com Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Cc: <stable@vger.kernel.org> # v4.15 Fixes: 95e6d4177cb7 ("ovl: grab reference to workbasedir early") Fixes: a9075cdb467d ("ovl: factor out ovl_free_fs() helper")
2018-07-20ovl: Do not expose metacopy only dentry from d_real()Vivek Goyal
Metacopy dentry/inode is internal to overlay and is never exposed outside of it. Exception is metacopy upper file used for fsync(). Modify d_real() to look for dentries/inode which have data, but also allow matching upper inode without data for the fsync case. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-20ovl: Store lower data inode in ovl_inodeVivek Goyal
Right now ovl_inode stores inode pointer for lower inode. This helps with quickly getting lower inode given overlay inode (ovl_inode_lower()). Now with metadata only copy-up, we can have metacopy inode in middle layer as well and inode containing data can be different from ->lower. I need to be able to open the real file in ovl_open_realfile() and for that I need to quickly find the lower data inode. Hence store lower data inode also in ovl_inode. Also provide an helper ovl_inode_lowerdata() to access this field. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-20ovl: A new xattr OVL_XATTR_METACOPY for file on upperVivek Goyal
Now we will have the capability to have upper inodes which might be only metadata copy up and data is still on lower inode. So add a new xattr OVL_XATTR_METACOPY to distinguish between two cases. Presence of OVL_XATTR_METACOPY reflects that file has been copied up metadata only and and data will be copied up later from lower origin. So this xattr is set when a metadata copy takes place and cleared when data copy takes place. We also use a bit in ovl_inode->flags to cache OVL_UPPERDATA which reflects whether ovl inode has data or not (as opposed to metadata only copy up). If a file is copied up metadata only and later when same file is opened for WRITE, then data copy up takes place. We copy up data, remove METACOPY xattr and then set the UPPERDATA flag in ovl_inode->flags. While all these operations happen with oi->lock held, read side of oi->flags can be lockless. That is another thread on another cpu can check if UPPERDATA flag is set or not. So this gives us an ordering requirement w.r.t UPPERDATA flag. That is, if another cpu sees UPPERDATA flag set, then it should be guaranteed that effects of data copy up and remove xattr operations are also visible. For example. CPU1 CPU2 ovl_open() acquire(oi->lock) ovl_open_maybe_copy_up() ovl_copy_up_data() open_open_need_copy_up() vfs_removexattr() ovl_already_copied_up() ovl_dentry_needs_data_copy_up() ovl_set_flag(OVL_UPPERDATA) ovl_test_flag(OVL_UPPERDATA) release(oi->lock) Say CPU2 is copying up data and in the end sets UPPERDATA flag. But if CPU1 perceives the effects of setting UPPERDATA flag but not the effects of preceding operations (ex. upper that is not fully copied up), it will be a problem. Hence this patch introduces smp_wmb() on setting UPPERDATA flag operation and smp_rmb() on UPPERDATA flag test operation. May be some other lock or barrier is already covering it. But I am not sure what that is and is it obvious enough that we will not break it in future. So hence trying to be safe here and introducing barriers explicitly for UPPERDATA flag/bit. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-20ovl: Provide a mount option metacopy=on/off for metadata copyupVivek Goyal
By default metadata only copy up is disabled. Provide a mount option so that users can choose one way or other. Also provide a kernel config and module option to enable/disable metacopy feature. metacopy feature requires redirect_dir=on when upper is present. Otherwise, it requires redirect_dir=follow atleast. As of now, metacopy does not work with nfs_export=on. So if both metacopy=on and nfs_export=on then nfs_export is disabled. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-18vfs: remove open_flags from d_real()Miklos Szeredi
Opening regular files on overlayfs is now handled via ovl_open(). Remove the now unused "open_flags" argument from d_op->d_real() and the d_real() helper. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-18Partially revert "locks: fix file locking on overlayfs"Miklos Szeredi
This partially reverts commit c568d68341be7030f5647def68851e469b21ca11. Overlayfs files will now automatically get the correct locks, no need to hack overlay support in VFS. It is a partial revert, because it leaves the locks_inode() calls in place and defines locks_inode() to file_inode(). We could revert those as well, but it would be unnecessary code churn and it makes sense to document that we are getting the inode for locking purposes. Don't revert MS_NOREMOTELOCK yet since that has been part of the userspace API for some time (though not in a useful way). Will try to remove internal flags later when the dust around the new mount API settles. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Acked-by: Jeff Layton <jlayton@kernel.org>
2018-07-18Revert "vfs: add flags to d_real()"Miklos Szeredi
This reverts commit 495e642939114478a5237a7d91661ba93b76f15a. No user of "flags" argument of d_real() remain. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-18Revert "ovl: fix relatime for directories"Miklos Szeredi
This reverts commit cd91304e7190b4c4802f8e413ab2214b233e0260. Overlayfs no longer relies on the vfs correct atime handling. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-18ovl: deal with overlay files in ovl_d_real()Miklos Szeredi
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-05-31ovl: return dentry from ovl_create_real()Miklos Szeredi
Al Viro suggested to simplify callers of ovl_create_real() by returning the created dentry (or ERR_PTR) from ovl_create_real(). Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-05-31ovl: struct cattr cleanupsAmir Goldstein
* Rename to ovl_cattr * Fold ovl_create_real() hardlink argument into struct ovl_cattr * Create macro OVL_CATTR() to initialize struct ovl_cattr from mode Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-05-31ovl: strip debug argument from ovl_do_ helpersAmir Goldstein
It did not prove to be useful. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-04-12ovl: add support for "xino" mount and config optionsAmir Goldstein
With mount option "xino=on", mounter declares that there are enough free high bits in underlying fs to hold the layer fsid. If overlayfs does encounter underlying inodes using the high xino bits reserved for layer fsid, a warning will be emitted and the original inode number will be used. The mount option name "xino" goes after a similar meaning mount option of aufs, but in overlayfs case, the mapping is stateless. An example for a use case of "xino=on" is when upper/lower is on an xfs filesystem. xfs uses 64bit inode numbers, but it currently never uses the upper 8bit for inode numbers exposed via stat(2) and that is not likely to change in the future without user opting-in for a new xfs feature. The actual number of unused upper bit is much larger and determined by the xfs filesystem geometry (64 - agno_log - agblklog - inopblog). That means that for all practical purpose, there are enough unused bits in xfs inode numbers for more than OVL_MAX_STACK unique fsid's. Another use case of "xino=on" is when upper/lower is on tmpfs. tmpfs inode numbers are allocated sequentially since boot, so they will practially never use the high inode number bits. For compatibility with applications that expect 32bit inodes, the feature can be disabled with "xino=off". The option "xino=auto" automatically detects underlying filesystem that use 32bit inodes and enables the feature. The Kconfig option OVERLAY_FS_XINO_AUTO and module parameter of the same name, determine if the default mode for overlayfs mount is "xino=auto" or "xino=off". Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-04-12ovl: constant st_ino for non-samefs with xinoAmir Goldstein
On 64bit systems, when overlay layers are not all on the same fs, but all inode numbers of underlying fs are not using the high bits, use the high bits to partition the overlay st_ino address space. The high bits hold the fsid (upper fsid is 0). This way overlay inode numbers are unique and all inodes use overlay st_dev. Inode numbers are also persistent for a given layer configuration. Currently, our only indication for available high ino bits is from a filesystem that supports file handles and uses the default encode_fh() operation, which encodes a 32bit inode number. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-04-12ovl: allocate anon bdev per unique lower fsAmir Goldstein
Instead of allocating an anonymous bdev per lower layer, allocate one anonymous bdev per every unique lower fs that is different than upper fs. Every unique lower fs is assigned an fsid > 0 and the number of unique lower fs are stored in ofs->numlowerfs. The assigned fsid is stored in the lower layer struct and will be used also for inode number multiplexing. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-02-16ovl: check lower ancestry on encode of lower dir file handleAmir Goldstein
This change relaxes copy up on encode of merge dir with lower layer > 1 and handles the case of encoding a merge dir with lower layer 1, where an ancestor is a non-indexed merge dir. In that case, decode of the lower file handle will not have been possible if the non-indexed ancestor is redirected before or after encode. Before encoding a non-upper directory file handle from real layer N, we need to check if it will be possible to reconnect an overlay dentry from the real lower decoded dentry. This is done by following the overlay ancestry up to a "layer N connected" ancestor and verifying that all parents along the way are "layer N connectable". If an ancestor that is NOT "layer N connectable" is found, we need to copy up an ancestor, which is "layer N connectable", thus making that ancestor "layer N connected". For example: layer 1: /a layer 2: /a/b/c The overlay dentry /a is NOT "layer 2 connectable", because if dir /a is copied up and renamed, upper dir /a will be indexed by lower dir /a from layer 1. The dir /a from layer 2 will never be indexed, so the algorithm in ovl_lookup_real_ancestor() (*) will not be able to lookup a connected overlay dentry from the connected lower dentry /a/b/c. To avoid this problem on decode time, we need to copy up an ancestor of /a/b/c, which is "layer 2 connectable", on encode time. That ancestor is /a/b. After copy up (and index) of /a/b, it will become "layer 2 connected" and when the time comes to decode the file handle from lower dentry /a/b/c, ovl_lookup_real_ancestor() will find the indexed ancestor /a/b and decoding a connected overlay dentry will be accomplished. (*) the algorithm in ovl_lookup_real_ancestor() can be improved to lookup an entry /a in the lower layers above layer N and find the indexed dir /a from layer 1. If that improvement is made, then the check for "layer N connected" will need to verify there are no redirects in lower layers above layer N. In the example above, /a will be "layer 2 connectable". However, if layer 2 dir /a is a target of a layer 1 redirect, then /a will NOT be "layer 2 connectable": layer 1: /A (redirect = /a) layer 2: /a/b/c Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24ovl: wire up NFS export operationsAmir Goldstein
Now that NFS export operations are implemented, enable overlayfs NFS export support if the "nfs_export" feature is enabled. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24ovl: store 'has_upper' and 'opaque' as bit flagsAmir Goldstein
We need to make some room in struct ovl_entry to store information about redirected ancestors for NFS export, so cram two booleans as bit flags. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24ovl: use directory index entries for consistency verificationAmir Goldstein
A directory index is a directory type entry in index dir with a "trusted.overlay.upper" xattr containing an encoded ovl_fh of the merge directory upper dir inode. On lookup of non-dir files, lower file is followed by origin file handle. On lookup of dir entries, lower dir is found by name and then compared to origin file handle. We only trust dir index if we verified that lower dir matches origin file handle, otherwise index may be inconsistent and we ignore it. If we find an indexed non-upper dir or an indexed merged dir, whose index 'upper' xattr points to a different upper dir, that means that the lower directory may be also referenced by another upper dir via redirect, so we fail the lookup on inconsistency error. To be consistent with directory index entries format, the association of index dir to upper root dir, that was stored by older kernels in "trusted.overlay.origin" xattr is now stored in "trusted.overlay.upper" xattr. This also serves as an indication that overlay was mounted with a kernel that support index directory entries. For backward compatibility, if an 'origin' xattr exists on the index dir we also verify it on mount. Directory index entries are going to be used for NFS export. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24ovl: add support for "nfs_export" configurationAmir Goldstein
Introduce the "nfs_export" config, module and mount options. The NFS export feature depends on the "index" feature and enables two implicit overlayfs features: "index_all" and "verify_lower". The "index_all" feature creates an index on copy up of every file and directory. The "verify_lower" feature uses the full index to detect overlay filesystems inconsistencies on lookup, like redirect from multiple upper dirs to the same lower dir. NFS export can be enabled for non-upper mount with no index. However, because lower layer redirects cannot be verified with the index, enabling NFS export support on an overlay with no upper layer requires turning off redirect follow (e.g. "redirect_dir=nofollow"). The full index may incur some overhead on mount time, especially when verifying that lower directory file handles are not stale. NFS export support, full index and consistency verification will be implemented by following patches. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24ovl: generalize ovl_verify_origin() and helpersAmir Goldstein
Remove the "origin" language from the functions that handle set, get and verify of "origin" xattr and pass the xattr name as an argument. The same helpers are going to be used for NFS export to get, get and verify the "upper" xattr for directory index entries. ovl_verify_origin() is now a helper used only to verify non upper file handle stored in "origin" xattr of upper inode. The upper root dir file handle is still stored in "origin" xattr on the index dir for backward compatibility. This is going to be changed by the patch that adds directory index entries support. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24ovl: simplify arguments to ovl_check_origin_fh()Amir Goldstein
Pass the fs instance with lower_layers array instead of the dentry lowerstack array to ovl_check_origin_fh(), because the dentry members of lowerstack play no role in this helper. This change simplifies the argument list of ovl_check_origin(), ovl_cleanup_index() and ovl_verify_index(). Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24ovl: store layer index in ovl_layerAmir Goldstein
Store the fs root layer index inside ovl_layer struct, so we can get the root fs layer index from merge dir lower layer instead of find it with ovl_find_layer() helper. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24ovl: force r/o mount when index dir creation failsAmir Goldstein
When work dir creation fails, a warning is emitted and overlay is mounted r/o. Trying to remount r/w will fail with no work dir. When index dir creation fails, the same warning is emitted and overlay is mounted r/o, but trying to remount r/w will succeed. This may cause unintentional corruption of filesystem consistency. Adjust the behavior of index dir creation failure to that of work dir creation failure and do not allow to remount r/w. User needs to state an explicitly intention to work without an index by mounting with option 'index=off' to allow r/w mount with no index dir. When mounting with option 'index=on' and no 'upperdir', index is implicitly disabled, so do not warn about no file handle support. The issue was introduced with inodes index feature in v4.13, but this patch will not apply cleanly before ovl_fill_super() re-factoring in v4.15. Fixes: 02bcd1577400 ("ovl: introduce the inodes index dir feature") Cc: <stable@vger.kernel.org> #v4.13 Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24ovl: disable index when no xattr supportAmir Goldstein
Overlayfs falls back to index=off if lower/upper fs does not support file handles. Do the same if upper fs does not support xattr. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-19ovl: take mnt_want_write() for work/index dir setupAmir Goldstein
There are several write operations on upper fs not covered by mnt_want_write(): - test set/remove OPAQUE xattr - test create O_TMPFILE - set ORIGIN xattr in ovl_verify_origin() - cleanup of index entries in ovl_indexdir_cleanup() Some of these go way back, but this patch only applies over the v4.14 re-factoring of ovl_fill_super(). Cc: <stable@vger.kernel.org> #v4.14 Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-19ovl: hash directory inodes for fsnotifyAmir Goldstein
fsnotify pins a watched directory inode in cache, but if directory dentry is released, new lookup will allocate a new dentry and a new inode. Directory events will be notified on the new inode, while fsnotify listener is watching the old pinned inode. Hash all directory inodes to reuse the pinned inode on lookup. Pure upper dirs are hashes by real upper inode, merge and lower dirs are hashed by real lower inode. The reference to lower inode was being held by the lower dentry object in the overlay dentry (oe->lowerstack[0]). Releasing the overlay dentry may drop lower inode refcount to zero. Add a refcount on behalf of the overlay inode to prevent that. As a by-product, hashing directory inodes also detects multiple redirected dirs to the same lower dir and uncovered redirected dir target on and returns -ESTALE on lookup. The reported issue dates back to initial version of overlayfs, but this patch depends on ovl_inode code that was introduced in kernel v4.13. Cc: <stable@vger.kernel.org> #v4.13 Reported-by: Niklas Cassel <niklas.cassel@axis.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Tested-by: Niklas Cassel <niklas.cassel@axis.com>
2017-12-11ovl: Sync upper dirty data when syncing overlayfsChengguang Xu
When executing filesystem sync or umount on overlayfs, dirty data does not get synced as expected on upper filesystem. This patch fixes sync filesystem method to keep data consistency for overlayfs. Signed-off-by: Chengguang Xu <cgxu@mykernel.net> Fixes: e593b2bf513d ("ovl: properly implement sync_filesystem()") Cc: <stable@vger.kernel.org> #4.11 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-12-11ovl: don't follow redirects if redirect_dir=offMiklos Szeredi
Overlayfs is following redirects even when redirects are disabled. If this is unintentional (probably the majority of cases) then this can be a problem. E.g. upper layer comes from untrusted USB drive, and attacker crafts a redirect to enable read access to otherwise unreadable directories. If "redirect_dir=off", then turn off following as well as creation of redirects. If "redirect_dir=follow", then turn on following, but turn off creation of redirects (which is what "redirect_dir=off" does now). This is a backward incompatible change, so make it dependent on a config option. Reported-by: David Howells <dhowells@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-11-27Rename superblock flags (MS_xyz -> SB_xyz)Linus Torvalds
This is a pure automated search-and-replace of the internal kernel superblock flags. The s_flags are now called SB_*, with the names and the values for the moment mirroring the MS_* flags that they're equivalent to. Note how the MS_xyz flags are the ones passed to the mount system call, while the SB_xyz flags are what we then use in sb->s_flags. The script to do this was: # places to look in; re security/*: it generally should *not* be # touched (that stuff parses mount(2) arguments directly), but # there are two places where we really deal with superblock flags. FILES="drivers/mtd drivers/staging/lustre fs ipc mm \ include/linux/fs.h include/uapi/linux/bfs_fs.h \ security/apparmor/apparmorfs.c security/apparmor/include/lib.h" # the list of MS_... constants SYMS="RDONLY NOSUID NODEV NOEXEC SYNCHRONOUS REMOUNT MANDLOCK \ DIRSYNC NOATIME NODIRATIME BIND MOVE REC VERBOSE SILENT \ POSIXACL UNBINDABLE PRIVATE SLAVE SHARED RELATIME KERNMOUNT \ I_VERSION STRICTATIME LAZYTIME SUBMOUNT NOREMOTELOCK NOSEC BORN \ ACTIVE NOUSER" SED_PROG= for i in $SYMS; do SED_PROG="$SED_PROG -e s/MS_$i/SB_$i/g"; done # we want files that contain at least one of MS_..., # with fs/namespace.c and fs/pnode.c excluded. L=$(for i in $SYMS; do git grep -w -l MS_$i $FILES; done| sort|uniq|grep -v '^fs/namespace.c'|grep -v '^fs/pnode.c') for f in $L; do sed -i $f $SED_PROG; done Requested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-10ovl: remove unneeded arg from ovl_verify_origin()Amir Goldstein
Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-11-10ovl: rename ufs to ofsMiklos Szeredi
Rename all "struct ovl_fs" pointers to "ofs". The "ufs" name is historical and can only be found in overlayfs/super.c. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-11-10ovl: clean up getting lower layersMiklos Szeredi
Move calling ovl_get_lower_layers() into ovl_get_lowerstack(). ovl_get_lowerstack() now returns the root dentry's filled in ovl_entry. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-11-10ovl: clean up workdir creationMiklos Szeredi
Move calling ovl_get_workdir() into ovl_get_workpath(). Rename ovl_get_workdir() to ovl_make_workdir() and ovl_get_workpath() to ovl_get_workdir(). Workpath is now not needed outside ovl_get_workdir(). Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-11-10ovl: clean up getting upper layerMiklos Szeredi
Merge ovl_get_upper() and ovl_get_upperpath(). The resulting function is named ovl_get_upper(), though it still returns upperpath as well. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-11-10ovl: move ovl_get_workdir() and ovl_get_lower_layers()Miklos Szeredi
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-11-10ovl: reduce the number of arguments for ovl_workdir_create()Miklos Szeredi
Remove "sb" and "dentry" arguments of ovl_workdir_create() and related functions. Move setting MS_RDONLY flag to callers. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-11-10ovl: change order of setup in ovl_fill_super()Miklos Szeredi
Move ovl_get_upper() immediately after ovl_get_upperpath(), ovl_get_workdir() immediately after ovl_get_workdir() and ovl_get_lower_layers() immediately after ovl_get_lowerstack(). Also move prepare_creds() up to where other allocations are happening. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>