summaryrefslogtreecommitdiffstats
path: root/fs/afs
AgeCommit message (Collapse)Author
2020-06-05Merge tag 'afs-next-20200604' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull AFS updates from David Howells: "There's some core VFS changes which affect a couple of filesystems: - Make the inode hash table RCU safe and providing some RCU-safe accessor functions. The search can then be done without taking the inode_hash_lock. Care must be taken because the object may be being deleted and no wait is made. - Allow iunique() to avoid taking the inode_hash_lock. - Allow AFS's callback processing to avoid taking the inode_hash_lock when using the inode table to find an inode to notify. - Improve Ext4's time updating. Konstantin Khlebnikov said "For now, I've plugged this issue with try-lock in ext4 lazy time update. This solution is much better." Then there's a set of changes to make a number of improvements to the AFS driver: - Improve callback (ie. third party change notification) processing by: (a) Relying more on the fact we're doing this under RCU and by using fewer locks. This makes use of the RCU-based inode searching outlined above. (b) Moving to keeping volumes in a tree indexed by volume ID rather than a flat list. (c) Making the server and volume records logically part of the cell. This means that a server record now points directly at the cell and the tree of volumes is there. This removes an N:M mapping table, simplifying things. - Improve keeping NAT or firewall channels open for the server callbacks to reach the client by actively polling the fileserver on a timed basis, instead of only doing it when we have an operation to process. - Improving detection of delayed or lost callbacks by including the parent directory in the list of file IDs to be queried when doing a bulk status fetch from lookup. We can then check to see if our copy of the directory has changed under us without us getting notified. - Determine aliasing of cells (such as a cell that is pointed to be a DNS alias). This allows us to avoid having ambiguity due to apparently different cells using the same volume and file servers. - Improve the fileserver rotation to do more probing when it detects that all of the addresses to a server are listed as non-responsive. It's possible that an address that previously stopped responding has become responsive again. Beyond that, lay some foundations for making some calls asynchronous: - Turn the fileserver cursor struct into a general operation struct and hang the parameters off of that rather than keeping them in local variables and hang results off of that rather than the call struct. - Implement some general operation handling code and simplify the callers of operations that affect a volume or a volume component (such as a file). Most of the operation is now done by core code. - Operations are supplied with a table of operations to issue different variants of RPCs and to manage the completion, where all the required data is held in the operation object, thereby allowing these to be called from a workqueue. - Put the standard "if (begin), while(select), call op, end" sequence into a canned function that just emulates the current behaviour for now. There are also some fixes interspersed: - Don't let the EACCES from ICMP6 mapping reach the user as such, since it's confusing as to whether it's a filesystem error. Convert it to EHOSTUNREACH. - Don't use the epoch value acquired through probing a server. If we have two servers with the same UUID but in different cells, it's hard to draw conclusions from them having different epoch values. - Don't interpret the argument to the CB.ProbeUuid RPC as a fileserver UUID and look up a fileserver from it. - Deal with servers in different cells having the same UUIDs. In the event that a CB.InitCallBackState3 RPC is received, we have to break the callback promises for every server record matching that UUID. - Don't let afs_statfs return values that go below 0. - Don't use running fileserver probe state to make server selection and address selection decisions on. Only make decisions on final state as the running state is cleared at the start of probing" Acked-by: Al Viro <viro@zeniv.linux.org.uk> (fs/inode.c part) * tag 'afs-next-20200604' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: (27 commits) afs: Adjust the fileserver rotation algorithm to reprobe/retry more quickly afs: Show more a bit more server state in /proc/net/afs/servers afs: Don't use probe running state to make decisions outside probe code afs: Fix afs_statfs() to not let the values go below zero afs: Fix the by-UUID server tree to allow servers with the same UUID afs: Reorganise volume and server trees to be rooted on the cell afs: Add a tracepoint to track the lifetime of the afs_volume struct afs: Detect cell aliases 3 - YFS Cells with a canonical cell name op afs: Detect cell aliases 2 - Cells with no root volumes afs: Detect cell aliases 1 - Cells with root volumes afs: Implement client support for the YFSVL.GetCellName RPC op afs: Retain more of the VLDB record for alias detection afs: Fix handling of CB.ProbeUuid cache manager op afs: Don't get epoch from a server because it may be ambiguous afs: Build an abstraction around an "operation" concept afs: Rename struct afs_fs_cursor to afs_operation afs: Remove the error argument from afs_protocol_error() afs: Set error flag rather than return error from file status decode afs: Make callback processing more efficient. afs: Show more information in /proc/net/afs/servers ...
2020-06-04afs: Adjust the fileserver rotation algorithm to reprobe/retry more quicklyDavid Howells
Adjust the fileserver rotation algorithm so that if we've tried all the addresses on a server (cumulatively over multiple operations) until we've run out of untried addresses, immediately reprobe all that server's interfaces and retry the op at least once before we move onto the next server. Signed-off-by: David Howells <dhowells@redhat.com>
2020-06-04afs: Show more a bit more server state in /proc/net/afs/serversDavid Howells
Display more information about the state of a server record, including the flags, rtt and break counter plus the probe state for each server in /proc/net/afs/servers. Rearrange the server flags a bit to make them easier to read at a glance in the proc file. Signed-off-by: David Howells <dhowells@redhat.com>
2020-06-04afs: Don't use probe running state to make decisions outside probe codeDavid Howells
Don't use the running state for fileserver probes to make decisions about which server to use as the state is cleared at the start of a probe and also intermediate values might be misleading. Instead, add a separate 'latest known' rtt in the afs_server struct and a flag to indicate if the server is known to be responding and update these as and when we know what to change them to. Signed-off-by: David Howells <dhowells@redhat.com>
2020-06-04afs: Fix afs_statfs() to not let the values go below zeroDavid Howells
Fix afs_statfs() so that the value for f_bavail and f_bfree don't go "negative" if the number of blocks in use by a volume exceeds the max quota for that volume. Signed-off-by: David Howells <dhowells@redhat.com>
2020-06-04afs: Fix the by-UUID server tree to allow servers with the same UUIDDavid Howells
Whilst it shouldn't happen, it is possible for multiple fileservers to share a UUID, particularly if an entire cell has been duplicated, UUIDs and all. In such a case, it's not necessarily possible to map the effect of the CB.InitCallBackState3 incoming RPC to a specific server unambiguously by UUID and thus to a specific cell. Indeed, there's a problem whereby multiple server records may need to occupy the same spot in the rb_tree rooted in the afs_net struct. Fix this by allowing servers to form a list, with the head of the list in the tree. When the front entry in the list is removed, the second in the list just replaces it. afs_init_callback_state() then just goes down the line, poking each server in the list. This means that some servers will be unnecessarily poked, unfortunately. An alternative would be to route by call parameters. Reported-by: Jeffrey Altman <jaltman@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation")
2020-06-04afs: Reorganise volume and server trees to be rooted on the cellDavid Howells
Reorganise afs_volume objects such that they're in a tree keyed on volume ID, rooted at on an afs_cell object rather than being in multiple trees, each of which is rooted on an afs_server object. afs_server structs become per-cell and acquire a pointer to the cell. The process of breaking a callback then starts with finding the server by its network address, following that to the cell and then looking up each volume ID in the volume tree. This is simpler than the afs_vol_interest/afs_cb_interest N:M mapping web and allows those structs and the code for maintaining them to be simplified or removed. It does make a couple of things a bit more tricky, though: (1) Operations now start with a volume, not a server, so there can be more than one answer as to whether or not the server we'll end up using supports the FS.InlineBulkStatus RPC. (2) CB RPC operations that specify the server UUID. There's still a tree of servers by UUID on the afs_net struct, but the UUIDs in it aren't guaranteed unique. Signed-off-by: David Howells <dhowells@redhat.com>
2020-06-04afs: Add a tracepoint to track the lifetime of the afs_volume structDavid Howells
Add a tracepoint to track the lifetime of the afs_volume struct. Signed-off-by: David Howells <dhowells@redhat.com>
2020-06-04afs: Detect cell aliases 3 - YFS Cells with a canonical cell name opDavid Howells
YFS Volume Location servers have an operation by which the cell name may be queried. Use this to find out what a YFS server thinks the canonical cell name should be. Signed-off-by: David Howells <dhowells@redhat.com>
2020-06-04afs: Detect cell aliases 2 - Cells with no root volumesDavid Howells
Implement the second phase of cell alias detection. This part handles alias detection for cells that don't have root.cell volumes and so we have to find some other volume or fileserver to query. We take the first volume from each such cell and attempt to look it up in the new cell. If found, we compare the records, if they are the same, we judge the cell names to be aliases. Signed-off-by: David Howells <dhowells@redhat.com>
2020-06-04afs: Detect cell aliases 1 - Cells with root volumesDavid Howells
Put in the first phase of cell alias detection. This part handles alias detection for cells that have root.cell volumes (which is expected to be likely). When a cell becomes newly active, it is probed for its root.cell volume, and if it has one, this volume is compared against other root.cell volumes to find out if the list of fileserver UUIDs have any in common - and if that's the case, do the address lists of those fileservers have any addresses in common. If they do, the new cell is adjudged to be an alias of the old cell and the old cell is used instead. Comparing is aided by the server list in struct afs_server_list being sorted in UUID order and the addresses in the fileserver address lists being sorted in address order. The cell then retains the afs_volume object for the root.cell volume, even if it's not mounted for future alias checking. This necessary because: (1) Whilst fileservers have UUIDs that are meant to be globally unique, in practice they are not because cells get cloned without changing the UUIDs - so afs_server records need to be per cell. (2) Sometimes the DNS is used to make cell aliases - but if we don't know they're the same, we may end up with multiple superblocks and multiple afs_server records for the same thing, impairing our ability to deliver callback notifications of third party changes (3) The fileserver RPC API doesn't contain the cell name, so it can't tell us which cell it's notifying and can't see that a change made to to one cell should notify the same client that's also accessed as the other cell. Reported-by: Jeffrey Altman <jaltman@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com>
2020-06-04afs: Implement client support for the YFSVL.GetCellName RPC opDavid Howells
Implement client support for the YFSVL.GetCellName RPC operation by which YFS permits the canonical cell name to be queried from a VL server. Signed-off-by: David Howells <dhowells@redhat.com>
2020-06-04afs: Retain more of the VLDB record for alias detectionDavid Howells
Save more bits from the volume location database record obtained for a server so that we can use this information in cell alias detection. Signed-off-by: David Howells <dhowells@redhat.com>
2020-06-04afs: Fix handling of CB.ProbeUuid cache manager opDavid Howells
The AFS filesystem driver is handling the CB.ProbeUuid request incorrectly. The UUID presented in the request is that of the cache manager, not the fileserver, so afs_deliver_cb_probe_uuid() shouldn't be using that UUID to look up the server. Fix this by looking up the server by address instead. Signed-off-by: David Howells <dhowells@redhat.com>
2020-06-04afs: Don't get epoch from a server because it may be ambiguousDavid Howells
Don't get the epoch from a server, particularly one that we're looking up by UUID, as UUIDs may be ambiguous and may map to more than one server - so we can't draw any conclusions from it. Reported-by: Jeffrey Altman <jaltman@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com>
2020-06-04afs: Build an abstraction around an "operation" conceptDavid Howells
Turn the afs_operation struct into the main way that most fileserver operations are managed. Various things are added to the struct, including the following: (1) All the parameters and results of the relevant operations are moved into it, removing corresponding fields from the afs_call struct. afs_call gets a pointer to the op. (2) The target volume is made the main focus of the operation, rather than the target vnode(s), and a bunch of op->vnode->volume are made op->volume instead. (3) Two vnode records are defined (op->file[]) for the vnode(s) involved in most operations. The vnode record (struct afs_vnode_param) contains: - The vnode pointer. - The fid of the vnode to be included in the parameters or that was returned in the reply (eg. FS.MakeDir). - The status and callback information that may be returned in the reply about the vnode. - Callback break and data version tracking for detecting simultaneous third-parth changes. (4) Pointers to dentries to be updated with new inodes. (5) An operations table pointer. The table includes pointers to functions for issuing AFS and YFS-variant RPCs, handling the success and abort of an operation and handling post-I/O-lock local editing of a directory. To make this work, the following function restructuring is made: (A) The rotation loop that issues calls to fileservers that can be found in each function that wants to issue an RPC (such as afs_mkdir()) is extracted out into common code, in a new file called fs_operation.c. (B) The rotation loops, such as the one in afs_mkdir(), are replaced with a much smaller piece of code that allocates an operation, sets the parameters and then calls out to the common code to do the actual work. (C) The code for handling the success and failure of an operation are moved into operation functions (as (5) above) and these are called from the core code at appropriate times. (D) The pseudo inode getting stuff used by the dynamic root code is moved over into dynroot.c. (E) struct afs_iget_data is absorbed into the operation struct and afs_iget() expects to be given an op pointer and a vnode record. (F) Point (E) doesn't work for the root dir of a volume, but we know the FID in advance (it's always vnode 1, unique 1), so a separate inode getter, afs_root_iget(), is provided to special-case that. (G) The inode status init/update functions now also take an op and a vnode record. (H) The RPC marshalling functions now, for the most part, just take an afs_operation struct as their only argument. All the data they need is held there. The result delivery functions write their answers there as well. (I) The call is attached to the operation and then the operation core does the waiting. And then the new operation code is, for the moment, made to just initialise the operation, get the appropriate vnode I/O locks and do the same rotation loop as before. This lays the foundation for the following changes in the future: (*) Overhauling the rotation (again). (*) Support for asynchronous I/O, where the fileserver rotation must be done asynchronously also. Signed-off-by: David Howells <dhowells@redhat.com>
2020-06-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-nextLinus Torvalds
Pull networking updates from David Miller: 1) Allow setting bluetooth L2CAP modes via socket option, from Luiz Augusto von Dentz. 2) Add GSO partial support to igc, from Sasha Neftin. 3) Several cleanups and improvements to r8169 from Heiner Kallweit. 4) Add IF_OPER_TESTING link state and use it when ethtool triggers a device self-test. From Andrew Lunn. 5) Start moving away from custom driver versions, use the globally defined kernel version instead, from Leon Romanovsky. 6) Support GRO vis gro_cells in DSA layer, from Alexander Lobakin. 7) Allow hard IRQ deferral during NAPI, from Eric Dumazet. 8) Add sriov and vf support to hinic, from Luo bin. 9) Support Media Redundancy Protocol (MRP) in the bridging code, from Horatiu Vultur. 10) Support netmap in the nft_nat code, from Pablo Neira Ayuso. 11) Allow UDPv6 encapsulation of ESP in the ipsec code, from Sabrina Dubroca. Also add ipv6 support for espintcp. 12) Lots of ReST conversions of the networking documentation, from Mauro Carvalho Chehab. 13) Support configuration of ethtool rxnfc flows in bcmgenet driver, from Doug Berger. 14) Allow to dump cgroup id and filter by it in inet_diag code, from Dmitry Yakunin. 15) Add infrastructure to export netlink attribute policies to userspace, from Johannes Berg. 16) Several optimizations to sch_fq scheduler, from Eric Dumazet. 17) Fallback to the default qdisc if qdisc init fails because otherwise a packet scheduler init failure will make a device inoperative. From Jesper Dangaard Brouer. 18) Several RISCV bpf jit optimizations, from Luke Nelson. 19) Correct the return type of the ->ndo_start_xmit() method in several drivers, it's netdev_tx_t but many drivers were using 'int'. From Yunjian Wang. 20) Add an ethtool interface for PHY master/slave config, from Oleksij Rempel. 21) Add BPF iterators, from Yonghang Song. 22) Add cable test infrastructure, including ethool interfaces, from Andrew Lunn. Marvell PHY driver is the first to support this facility. 23) Remove zero-length arrays all over, from Gustavo A. R. Silva. 24) Calculate and maintain an explicit frame size in XDP, from Jesper Dangaard Brouer. 25) Add CAP_BPF, from Alexei Starovoitov. 26) Support terse dumps in the packet scheduler, from Vlad Buslov. 27) Support XDP_TX bulking in dpaa2 driver, from Ioana Ciornei. 28) Add devm_register_netdev(), from Bartosz Golaszewski. 29) Minimize qdisc resets, from Cong Wang. 30) Get rid of kernel_getsockopt and kernel_setsockopt in order to eliminate set_fs/get_fs calls. From Christoph Hellwig. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2517 commits) selftests: net: ip_defrag: ignore EPERM net_failover: fixed rollback in net_failover_open() Revert "tipc: Fix potential tipc_aead refcnt leak in tipc_crypto_rcv" Revert "tipc: Fix potential tipc_node refcnt leak in tipc_rcv" vmxnet3: allow rx flow hash ops only when rss is enabled hinic: add set_channels ethtool_ops support selftests/bpf: Add a default $(CXX) value tools/bpf: Don't use $(COMPILE.c) bpf, selftests: Use bpf_probe_read_kernel s390/bpf: Use bcr 0,%0 as tail call nop filler s390/bpf: Maintain 8-byte stack alignment selftests/bpf: Fix verifier test selftests/bpf: Fix sample_cnt shared between two threads bpf, selftests: Adapt cls_redirect to call csum_level helper bpf: Add csum_level helper for fixing up csum levels bpf: Fix up bpf_skb_adjust_room helper's skb csum setting sfc: add missing annotation for efx_ef10_try_update_nic_stats_vf() crypto/chtls: IPv6 support for inline TLS Crypto/chcr: Fixes a coccinile check error Crypto/chcr: Fixes compilations warnings ...
2020-06-01Merge tag 'docs-5.8' of git://git.lwn.net/linuxLinus Torvalds
Pull documentation updates from Jonathan Corbet: "A fair amount of stuff this time around, dominated by yet another massive set from Mauro toward the completion of the RST conversion. I *really* hope we are getting close to the end of this. Meanwhile, those patches reach pretty far afield to update document references around the tree; there should be no actual code changes there. There will be, alas, more of the usual trivial merge conflicts. Beyond that we have more translations, improvements to the sphinx scripting, a number of additions to the sysctl documentation, and lots of fixes" * tag 'docs-5.8' of git://git.lwn.net/linux: (130 commits) Documentation: fixes to the maintainer-entry-profile template zswap: docs/vm: Fix typo accept_threshold_percent in zswap.rst tracing: Fix events.rst section numbering docs: acpi: fix old http link and improve document format docs: filesystems: add info about efivars content Documentation: LSM: Correct the basic LSM description mailmap: change email for Ricardo Ribalda docs: sysctl/kernel: document unaligned controls Documentation: admin-guide: update bug-hunting.rst docs: sysctl/kernel: document ngroups_max nvdimm: fixes to maintainter-entry-profile Documentation/features: Correct RISC-V kprobes support entry Documentation/features: Refresh the arch support status files Revert "docs: sysctl/kernel: document ngroups_max" docs: move locking-specific documents to locking/ docs: move digsig docs to the security book docs: move the kref doc into the core-api book docs: add IRQ documentation at the core-api book docs: debugging-via-ohci1394.txt: add it to the core-api book docs: fix references for ipmi.rst file ...
2020-05-31afs: Rename struct afs_fs_cursor to afs_operationDavid Howells
As a prelude to implementing asynchronous fileserver operations in the afs filesystem, rename struct afs_fs_cursor to afs_operation. This struct is going to form the core of the operation management and is going to acquire more members in later. Signed-off-by: David Howells <dhowells@redhat.com>
2020-05-31afs: Remove the error argument from afs_protocol_error()David Howells
Remove the error argument from afs_protocol_error() as it's always -EBADMSG. Signed-off-by: David Howells <dhowells@redhat.com>
2020-05-31afs: Set error flag rather than return error from file status decodeDavid Howells
Set a flag in the call struct to indicate an unmarshalling error rather than return and handle an error from the decoding of file statuses. This flag is checked on a successful return from the delivery function. Signed-off-by: David Howells <dhowells@redhat.com>
2020-05-31afs: Make callback processing more efficient.David Howells
afs_vol_interest objects represent the volume IDs currently being accessed from a fileserver. These hold lists of afs_cb_interest objects that repesent the superblocks using that volume ID on that server. When a callback notification from the server telling of a modification by another client arrives, the volume ID specified in the notification is looked up in the server's afs_vol_interest list. Through the afs_cb_interest list, the relevant superblocks can be iterated over and the specific inode looked up and marked in each one. Make the following efficiency improvements: (1) Hold rcu_read_lock() over the entire processing rather than locking it each time. (2) Do all the callbacks for each vid together rather than individually. Each volume then only needs to be looked up once. (3) afs_vol_interest objects are now stored in an rb_tree rather than a flat list to reduce the lookup step count. (4) afs_vol_interest lookup is now done with RCU, but because it's in an rb_tree which may rotate under us, a seqlock is used so that if it changes during the walk, we repeat the walk with a lock held. With this and the preceding patch which adds RCU-based lookups in the inode cache, target volumes/vnodes can be taken without the need to take any locks, except on the target itself. Signed-off-by: David Howells <dhowells@redhat.com>
2020-05-31afs: Show more information in /proc/net/afs/serversDavid Howells
Show more information in /proc/net/afs/servers to make it easier to see what's going on with the server probing. Signed-off-by: David Howells <dhowells@redhat.com>
2020-05-31afs: Actively poll fileservers to maintain NAT or firewall openingsDavid Howells
When an AFS client accesses a file, it receives a limited-duration callback promise that the server will notify it if another client changes a file. This callback duration can be a few hours in length. If a client mounts a volume and then an application prevents it from being unmounted, say by chdir'ing into it, but then does nothing for some time, the rxrpc_peer record will expire and rxrpc-level keepalive will cease. If there is NAT or a firewall between the client and the server, the route back for the server may close after a comparatively short duration, meaning that attempts by the server to notify the client may then bounce. The client, however, may (so far as it knows) still have a valid unexpired promise and will then rely on its cached data and will not see changes made on the server by a third party until it incidentally rechecks the status or the promise needs renewal. To deal with this, the client needs to regularly probe the server. This has two effects: firstly, it keeps a route open back for the server, and secondly, it causes the server to disgorge any notifications that got queued up because they couldn't be sent. Fix this by adding a mechanism to emit regular probes. Two levels of probing are made available: Under normal circumstances the 'slow' queue will be used for a fileserver - this just probes the preferred address once every 5 mins or so; however, if server fails to respond to any probes, the server will shift to the 'fast' queue from which all its interfaces will be probed every 30s. When it finally responds, the record will switch back to the slow queue. Further notes: (1) Probing is now no longer driven from the fileserver rotation algorithm. (2) Probes are dispatched to all interfaces on a fileserver when that an afs_server object is set up to record it. (3) The afs_server object is removed from the probe queues when we start to probe it. afs_is_probing_server() returns true if it's not listed - ie. it's undergoing probing. (4) The afs_server object is added back on to the probe queue when the final outstanding probe completes, but the probed_at time is set when we're about to launch a probe so that it's not dependent on the probe duration. (5) The timer and the work item added for this must be handed a count on net->servers_outstanding, which they hand on or release. This makes sure that network namespace cleanup waits for them. Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation") Reported-by: Dave Botsch <botsch@cnf.cornell.edu> Signed-off-by: David Howells <dhowells@redhat.com>
2020-05-31afs: Split the usage count on struct afs_serverDavid Howells
Split the usage count on the afs_server struct to have an active count that registers who's actually using it separately from the reference count on the object. This allows a future patch to dispatch polling probes without advancing the "unuse" time into the future each time we emit a probe, which would otherwise prevent unused server records from expiring. Included in this: (1) The latter part of afs_destroy_server() in which the RCU destruction of afs_server objects is invoked and the outstanding server count is decremented is split out into __afs_put_server(). (2) afs_put_server() now calls __afs_put_server() rather then setting the management timer. (3) The calls begun by afs_fs_give_up_all_callbacks() and afs_fs_get_capabilities() can now take a ref on the server record, so afs_destroy_server() can just drop its ref and needn't wait for the completion of these calls. They'll put the ref when they're done. (4) Because of (3), afs_fs_probe_done() no longer needs to wake up afs_destroy_server() with server->probe_outstanding. (5) afs_gc_servers can be simplified. It only needs to check if server->active is 0 rather than playing games with the refcount. (6) afs_manage_servers() can propose a server for gc if usage == 0 rather than if ref == 1. The gc is effected by (5). Signed-off-by: David Howells <dhowells@redhat.com>
2020-05-31afs: Use the serverUnique field in the UVLDB record to reduce rpc opsDavid Howells
The U-version VLDB volume record retrieved by the VL.GetEntryByNameU rpc op carries a change counter (the serverUnique field) for each fileserver listed in the record as backing that volume. This is incremented whenever the registration details for a fileserver change (such as its address list). Note that the same value will be seen in all UVLDB records that refer to that fileserver. This should be checked before calling the VL server to re-query the address list for a fileserver. If it's the same, there's no point doing the query. Reported-by: Jeffrey Altman <jaltman@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com>
2020-05-31afs: Always include dir in bulk status fetch from afs_do_lookup()David Howells
When a lookup is done in an AFS directory, the filesystem will speculate and fetch up to 49 other statuses for files in the same directory and fetch those as well, turning them into inodes or updating inodes that already exist. However, occasionally, a callback break might go missing due to NAT timing out, but the afs filesystem doesn't then realise that the directory is not up to date. Alleviate this by using one of the status slots to check the directory in which the lookup is being done. Reported-by: Dave Botsch <botsch@cnf.cornell.edu> Suggested-by: Jeffrey Altman <jaltman@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com>
2020-05-31vfs, afs, ext4: Make the inode hash table RCU searchableDavid Howells
Make the inode hash table RCU searchable so that searches that want to access or modify an inode without taking a ref on that inode can do so without taking the inode hash table lock. The main thing this requires is some RCU annotation on the list manipulation operations. Inodes are already freed by RCU in most cases. Users of this interface must take care as the inode may be still under construction or may be being torn down around them. There are at least three instances where this can be of use: (1) Testing whether the inode number iunique() is going to return is currently unique (the iunique_lock is still held). (2) Ext4 date stamp updating. (3) AFS callback breaking. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> cc: linux-ext4@vger.kernel.org cc: linux-afs@lists.infradead.org
2020-05-28rxrpc: add rxrpc_sock_set_min_security_levelChristoph Hellwig
Add a helper to directly set the RXRPC_MIN_SECURITY_LEVEL sockopt from kernel space without going through a fake uaccess. Thanks to David Howells for the documentation updates. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds
Pull networking fixes from David Miller: 1) Fix RCU warnings in ipv6 multicast router code, from Madhuparna Bhowmik. 2) Nexthop attributes aren't being checked properly because of mis-initialized iterator, from David Ahern. 3) Revert iop_idents_reserve() change as it caused performance regressions and was just working around what is really a UBSAN bug in the compiler. From Yuqi Jin. 4) Read MAC address properly from ROM in bmac driver (double iteration proceeds past end of address array), from Jeremy Kerr. 5) Add Microsoft Surface device IDs to r8152, from Marc Payne. 6) Prevent reference to freed SKB in __netif_receive_skb_core(), from Boris Sukholitko. 7) Fix ACK discard behavior in rxrpc, from David Howells. 8) Preserve flow hash across packet scrubbing in wireguard, from Jason A. Donenfeld. 9) Cap option length properly for SO_BINDTODEVICE in AX25, from Eric Dumazet. 10) Fix encryption error checking in kTLS code, from Vadim Fedorenko. 11) Missing BPF prog ref release in flow dissector, from Jakub Sitnicki. 12) dst_cache must be used with BH disabled in tipc, from Eric Dumazet. 13) Fix use after free in mlxsw driver, from Jiri Pirko. 14) Order kTLS key destruction properly in mlx5 driver, from Tariq Toukan. 15) Check devm_platform_ioremap_resource() return value properly in several drivers, from Tiezhu Yang. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (71 commits) net: smsc911x: Fix runtime PM imbalance on error net/mlx4_core: fix a memory leak bug. net: ethernet: ti: cpsw: fix ASSERT_RTNL() warning during suspend net: phy: mscc: fix initialization of the MACsec protocol mode net: stmmac: don't attach interface until resume finishes net: Fix return value about devm_platform_ioremap_resource() net/mlx5: Fix error flow in case of function_setup failure net/mlx5e: CT: Correctly get flow rule net/mlx5e: Update netdev txq on completions during closure net/mlx5: Annotate mutex destroy for root ns net/mlx5: Don't maintain a case of del_sw_func being null net/mlx5: Fix cleaning unmanaged flow tables net/mlx5: Fix memory leak in mlx5_events_init net/mlx5e: Fix inner tirs handling net/mlx5e: kTLS, Destroy key object after destroying the TIS net/mlx5e: Fix allowed tc redirect merged eswitch offload cases net/mlx5: Avoid processing commands before cmdif is ready net/mlx5: Fix a race when moving command interface to events mode net/mlx5: Add command entry handling completion rxrpc: Fix a memory leak in rxkad_verify_response() ...
2020-05-23rxrpc: Fix a warningDavid Howells
Fix a warning due to an uninitialised variable. le included from ../fs/afs/fs_probe.c:11: ../fs/afs/fs_probe.c: In function 'afs_fileserver_probe_result': ../fs/afs/internal.h:1453:2: warning: 'rtt_us' may be used uninitialized in this function [-Wmaybe-uninitialized] 1453 | printk("[%-6.6s] "FMT"\n", current->comm ,##__VA_ARGS__) | ^~~~~~ ../fs/afs/fs_probe.c:35:15: note: 'rtt_us' was declared here Signed-off-by: David Howells <dhowells@redhat.com>
2020-05-18afs: Don't unlock fetched data pages until the op completes successfullyDavid Howells
Don't call req->page_done() on each page as we finish filling it with the data coming from the network. Whilst this might speed up the application a bit, it's a problem if there's a network failure and the operation has to be reissued. If this happens, an oops occurs because afs_readpages_page_done() clears the pointer to each page it unlocks and when a retry happens, the pointers to the pages it wants to fill are now NULL (and the pages have been unlocked anyway). Instead, wait till the operation completes successfully and only then release all the pages after clearing any terminal gap (the server can give us less data than we requested as we're allowed to ask for more than is available). KASAN produces a bug like the following, and even without KASAN, it can oops and panic. BUG: KASAN: wild-memory-access in _copy_to_iter+0x323/0x5f4 Write of size 1404 at addr 0005088000000000 by task md5sum/5235 CPU: 0 PID: 5235 Comm: md5sum Not tainted 5.7.0-rc3-fscache+ #250 Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014 Call Trace: memcpy+0x39/0x58 _copy_to_iter+0x323/0x5f4 __skb_datagram_iter+0x89/0x2a6 skb_copy_datagram_iter+0x129/0x135 rxrpc_recvmsg_data.isra.0+0x615/0xd42 rxrpc_kernel_recv_data+0x1e9/0x3ae afs_extract_data+0x139/0x33a yfs_deliver_fs_fetch_data64+0x47a/0x91b afs_deliver_to_call+0x304/0x709 afs_wait_for_call_to_complete+0x1cc/0x4ad yfs_fs_fetch_data+0x279/0x288 afs_fetch_data+0x1e1/0x38d afs_readpages+0x593/0x72e read_pages+0xf5/0x21e __do_page_cache_readahead+0x128/0x23f ondemand_readahead+0x36e/0x37f generic_file_buffered_read+0x234/0x680 new_sync_read+0x109/0x17e vfs_read+0xe6/0x138 ksys_read+0xd8/0x14d do_syscall_64+0x6e/0x8a entry_SYSCALL_64_after_hwframe+0x49/0xb3 Fixes: 196ee9cd2d04 ("afs: Make afs_fs_fetch_data() take a list of pages") Fixes: 30062bd13e36 ("afs: Implement YFS support in the fs client") Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-11rxrpc: Fix the excessive initial retransmission timeoutDavid Howells
rxrpc currently uses a fixed 4s retransmission timeout until the RTT is sufficiently sampled. This can cause problems with some fileservers with calls to the cache manager in the afs filesystem being dropped from the fileserver because a packet goes missing and the retransmission timeout is greater than the call expiry timeout. Fix this by: (1) Copying the RTT/RTO calculation code from Linux's TCP implementation and altering it to fit rxrpc. (2) Altering the various users of the RTT to make use of the new SRTT value. (3) Replacing the use of rxrpc_resend_timeout to use the calculated RTO value instead (which is needed in jiffies), along with a backoff. Notes: (1) rxrpc provides RTT samples by matching the serial numbers on outgoing DATA packets that have the RXRPC_REQUEST_ACK set and PING ACK packets against the reference serial number in incoming REQUESTED ACK and PING-RESPONSE ACK packets. (2) Each packet that is transmitted on an rxrpc connection gets a new per-connection serial number, even for retransmissions, so an ACK can be cross-referenced to a specific trigger packet. This allows RTT information to be drawn from retransmitted DATA packets also. (3) rxrpc maintains the RTT/RTO state on the rxrpc_peer record rather than on an rxrpc_call because many RPC calls won't live long enough to generate more than one sample. (4) The calculated SRTT value is in units of 8ths of a microsecond rather than nanoseconds. The (S)RTT and RTO values are displayed in /proc/net/rxrpc/peers. Fixes: 17926a79320a ([AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both"") Signed-off-by: David Howells <dhowells@redhat.com>
2020-04-24afs: Make record checking use TASK_UNINTERRUPTIBLE when appropriateDavid Howells
When an operation is meant to be done uninterruptibly (such as FS.StoreData), we should not be allowing volume and server record checking to be interrupted. Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation") Signed-off-by: David Howells <dhowells@redhat.com>
2020-04-24afs: Fix to actually set AFS_SERVER_FL_HAVE_EPOCHDavid Howells
AFS keeps track of the epoch value from the rxrpc protocol to note (a) when a fileserver appears to have restarted and (b) when different endpoints of a fileserver do not appear to be associated with the same fileserver (ie. all probes back from a fileserver from all of its interfaces should carry the same epoch). However, the AFS_SERVER_FL_HAVE_EPOCH flag that indicates that we've received the server's epoch is never set, though it is used. Fix this to set the flag when we first receive an epoch value from a probe sent to the filesystem client from the fileserver. Fixes: 3bf0fb6f33dd ("afs: Probe multiple fileservers simultaneously") Signed-off-by: David Howells <dhowells@redhat.com>
2020-04-24afs: Remove some unused bitsDavid Howells
Remove three bits: (1) afs_server::no_epoch is neither set nor used. (2) afs_server::have_result is set and a wakeup is applied to it, but nothing looks at it or waits on it. (3) afs_vl_dump_edestaddrreq() prints afs_addr_list::probed, but nothing sets it for VL servers. Signed-off-by: David Howells <dhowells@redhat.com>
2020-04-20docs: filesystems: fix renamed referencesMauro Carvalho Chehab
Some filesystem references got broken by a previous patch series I submitted. Address those. Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Acked-by: David Sterba <dsterba@suse.com> # fs/affs/Kconfig Link: https://lore.kernel.org/r/57318c53008dbda7f6f4a5a9e5787f4d37e8565a.1586881715.git.mchehab+huawei@kernel.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2020-04-13afs: Fix afs_d_validate() to set the right directory versionDavid Howells
If a dentry's version is somewhere between invalid_before and the current directory version, we should be setting it forward to the current version, not backwards to the invalid_before version. Note that we're only doing this at all because dentry::d_fsdata isn't large enough on a 32-bit system. Fix this by using a separate variable for invalid_before so that we don't accidentally clobber the current dir version. Fixes: a4ff7401fbfa ("afs: Keep track of invalid-before version for dentry coherency") Signed-off-by: David Howells <dhowells@redhat.com>
2020-04-13afs: Fix race between post-modification dir edit and readdir/d_revalidateDavid Howells
AFS directories are retained locally as a structured file, with lookup being effected by a local search of the file contents. When a modification (such as mkdir) happens, the dir file content is modified locally rather than redownloading the directory. The directory contents are accessed in a number of ways, with a number of different locks schemes: (1) Download of contents - dvnode->validate_lock/write in afs_read_dir(). (2) Lookup and readdir - dvnode->validate_lock/read in afs_dir_iterate(), downgrading from (1) if necessary. (3) d_revalidate of child dentry - dvnode->validate_lock/read in afs_do_lookup_one() downgrading from (1) if necessary. (4) Edit of dir after modification - page locks on individual dir pages. Unfortunately, because (4) uses different locking scheme to (1) - (3), nothing protects against the page being scanned whilst the edit is underway. Even download is not safe as it doesn't lock the pages - relying instead on the validate_lock to serialise as a whole (the theory being that directory contents are treated as a block and always downloaded as a block). Fix this by write-locking dvnode->validate_lock around the edits. Care must be taken in the rename case as there may be two different dirs - but they need not be locked at the same time. In any case, once the lock is taken, the directory version must be rechecked, and the edit skipped if a later version has been downloaded by revalidation (there can't have been any local changes because the VFS holds the inode lock, but there can have been remote changes). Fixes: 63a4681ff39c ("afs: Locally edit directory data for mkdir/create/unlink/...") Signed-off-by: David Howells <dhowells@redhat.com>
2020-04-13afs: Fix length of dump of bad YFSFetchStatus recordDavid Howells
Fix the length of the dump of a bad YFSFetchStatus record. The function was copied from the AFS version, but the YFS variant contains bigger fields and extra information, so expand the dump to match. Signed-off-by: David Howells <dhowells@redhat.com>
2020-04-13afs: Fix rename operation status deliveryDavid Howells
The afs_deliver_fs_rename() and yfs_deliver_fs_rename() functions both only decode the second file status returned unless the parent directories are different - unfortunately, this means that the xdr pointer isn't advanced and the volsync record will be read incorrectly in such an instance. Fix this by always decoding the second status into the second status/callback block which wasn't being used if the dirs were the same. The afs_update_dentry_version() calls that update the directory data version numbers on the dentries can then unconditionally use the second status record as this will always reflect the state of the destination dir (the two records will be identical if the destination dir is the same as the source dir) Fixes: 260a980317da ("[AFS]: Add "directory write" support.") Fixes: 30062bd13e36 ("afs: Implement YFS support in the fs client") Signed-off-by: David Howells <dhowells@redhat.com>
2020-04-13afs: Fix decoding of inline abort codes from version 1 status recordsDavid Howells
If we're decoding an AFSFetchStatus record and we see that the version is 1 and the abort code is set and we're expecting inline errors, then we store the abort code and ignore the remaining status record (which is correct), but we don't set the flag to say we got a valid abort code. This can affect operation of YFS.RemoveFile2 when removing a file and the operation of {,Y}FS.InlineBulkStatus when prospectively constructing or updating of a set of inodes during a lookup. Fix this to indicate the reception of a valid abort code. Fixes: a38a75581e6e ("afs: Fix unlink to handle YFS.RemoveFile2 better") Signed-off-by: David Howells <dhowells@redhat.com>
2020-04-13afs: Fix missing XDR advance in xdr_decode_{AFS,YFS}FSFetchStatus()David Howells
If we receive a status record that has VNOVNODE set in the abort field, xdr_decode_AFSFetchStatus() and xdr_decode_YFSFetchStatus() don't advance the XDR pointer, thereby corrupting anything subsequent decodes from the same block of data. This has the potential to affect AFS.InlineBulkStatus and YFS.InlineBulkStatus operation, but probably doesn't since the status records are extracted as individual blocks of data and the buffer pointer is reset between blocks. It does affect YFS.RemoveFile2 operation, corrupting the volsync record - though that is not currently used. Other operations abort the entire operation rather than returning an error inline, in which case there is no decoding to be done. Fix this by unconditionally advancing the xdr pointer. Fixes: 684b0f68cf1c ("afs: Fix AFSFetchStatus decoder to provide OpenAFS compatibility") Signed-off-by: David Howells <dhowells@redhat.com>
2020-03-26afs: Fix unpinned address list during probingDavid Howells
When it's probing all of a fileserver's interfaces to find which one is best to use, afs_do_probe_fileserver() takes a lock on the server record and notes the pointer to the address list. It doesn't, however, pin the address list, so as soon as it drops the lock, there's nothing to stop the address list from being freed under us. Fix this by taking a ref on the address list inside the locked section and dropping it at the end of the function. Fixes: 3bf0fb6f33dd ("afs: Probe multiple fileservers simultaneously") Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>