Age | Commit message | Author

2019-03-15 | SUNRPC: Fix the minimal size for reply buffer allocation | Trond Myklebust
We must at minimum allocate enough memory to be able to see any auth errors in the reply from the server.
Fixes: 2c94b8eca1a26 ("SUNRPC: Use au_rslack when computing reply...")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

2019-03-15 | SUNRPC: Fix a client regression when handling oversized replies | Trond Myklebust
If the server sends a reply that is larger than the pre-allocated buffer, then the current code may fail to register how much of the stream it has finished reading. This, in turn, can lead to hangs.
Fixes: e92053a52e68 ("SUNRPC: Handle zero length fragments correctly")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
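The failure mode is easier to see with a model of RPC record marking (RFC 5531): each fragment on a stream transport starts with a 4-byte big-endian marker whose top bit flags the last fragment and whose low 31 bits give the fragment length. The sketch below is a hypothetical userspace reader, not the SUNRPC code; `struct rec_reader` and `rec_feed()` are invented names. It assumes a fixed-size reply buffer and shows the property the fix is about: bytes that do not fit are dropped but still counted as consumed, so the reader's position in the stream stays correct. It also tolerates zero-length fragments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RM_LAST_FRAG 0x80000000u  /* marker bit: this is the last fragment */
#define RM_SIZE_MASK 0x7fffffffu  /* marker bits: fragment length in bytes */

/* Hypothetical record reader state; zero-initialise before use. */
struct rec_reader {
    unsigned char hdr[4];   /* record marker bytes collected so far */
    size_t hdr_have;
    uint32_t frag_left;     /* payload bytes left in the current fragment */
    int last;               /* current fragment is the last one */
    int done;               /* a complete record has been consumed */
    unsigned char *buf;     /* pre-allocated reply buffer */
    size_t buflen;
    size_t copied;          /* bytes stored in buf */
    size_t discarded;       /* bytes consumed but dropped: buf was full */
};

/* Consume up to len bytes of stream data, stopping at the end of a
 * record. Returns the number of bytes consumed, including any that were
 * dropped because the reply did not fit in buf. */
size_t rec_feed(struct rec_reader *r, const unsigned char *p, size_t len)
{
    size_t used = 0;

    assert(r != NULL && (p != NULL || len == 0));
    while (used < len && !r->done) {
        if (r->frag_left == 0 && r->hdr_have < 4) {
            /* Collect the marker, which may arrive split across calls. */
            r->hdr[r->hdr_have++] = p[used++];
            if (r->hdr_have < 4)
                continue;
            uint32_t m = ((uint32_t)r->hdr[0] << 24) | ((uint32_t)r->hdr[1] << 16) |
                         ((uint32_t)r->hdr[2] << 8) | r->hdr[3];
            r->last = (m & RM_LAST_FRAG) != 0;
            r->frag_left = m & RM_SIZE_MASK;
            if (r->frag_left == 0) {      /* zero-length fragment */
                r->hdr_have = 0;
                r->done = r->last;
            }
            continue;
        }
        size_t n = len - used;
        if (n > r->frag_left)
            n = r->frag_left;
        size_t room = r->buflen - r->copied;
        size_t keep = n < room ? n : room;
        memcpy(r->buf + r->copied, p + used, keep);
        r->copied += keep;
        r->discarded += n - keep;         /* dropped, but still counted as read */
        r->frag_left -= (uint32_t)n;
        used += n;
        if (r->frag_left == 0) {          /* fragment finished */
            r->hdr_have = 0;
            r->done = r->last;
        }
    }
    return used;
}
```

Because the partial marker is kept in `hdr` between calls, the stream can be fed in arbitrarily sized chunks.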

2019-03-12 | pNFS: Fix a typo in pnfs_update_layout | Trond Myklebust
We're supposed to wait for the outstanding layout count to go to zero, but that got lost somehow.
Fixes: d03360aaf5cca ("pNFS: Ensure we return the error if someone...")
Reported-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

2019-03-12 | fix null pointer deref in tracepoints in back channel | Olga Kornievskaia
The backchannel does not set the rq_task->tk_clientid pointer, which can lead to the following oops:

localhost login: [ 111.385319] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
[ 111.388073] #PF error: [normal kernel read fault]
[ 111.389452] PGD 80000000290d8067 P4D 80000000290d8067 PUD 75f25067 PMD 0
[ 111.391224] Oops: 0000 [#1] SMP PTI
[ 111.392151] CPU: 0 PID: 3533 Comm: NFSv4 callback Not tainted 5.0.0-rc7+ #1
[ 111.393787] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[ 111.396340] RIP: 0010:trace_event_raw_event_xprt_enq_xmit+0x6f/0xf0 [sunrpc]
[ 111.397974] Code: 00 00 00 48 89 ee 48 89 e7 e8 bd 0a 85 d7 48 85 c0 74 4a 41 0f b7 94 24 e0 00 00 00 48 89 e7 89 50 08 49 8b 94 24 a8 00 00 00 <8b> 52 04 89 50 0c 49 8b 94 24 c0 00 00 00 8b 92 a8 00 00 00 0f ca
[ 111.402215] RSP: 0018:ffffb98743263cf8 EFLAGS: 00010286
[ 111.403406] RAX: ffffa0890fc3bc88 RBX: 0000000000000003 RCX: 0000000000000000
[ 111.405057] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffb98743263cf8
[ 111.406656] RBP: ffffa0896f5368f0 R08: 0000000000000246 R09: 0000000000000000
[ 111.408437] R10: ffffe19b01c01500 R11: 0000000000000000 R12: ffffa08977d28a00
[ 111.410210] R13: 0000000000000004 R14: ffffa089315303f0 R15: ffffa08931530000
[ 111.411856] FS: 0000000000000000(0000) GS:ffffa0897bc00000(0000) knlGS:0000000000000000
[ 111.413699] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 111.415068] CR2: 0000000000000004 CR3: 000000002ac90004 CR4: 00000000001606f0
[ 111.416745] Call Trace:
[ 111.417339] xprt_request_enqueue_transmit+0x2b6/0x4a0 [sunrpc]
[ 111.418709] ? rpc_task_need_encode+0x40/0x40 [sunrpc]
[ 111.419957] call_bc_transmit+0xd5/0x170 [sunrpc]
[ 111.421067] __rpc_execute+0x7e/0x3f0 [sunrpc]
[ 111.422177] rpc_run_bc_task+0x78/0xd0 [sunrpc]
[ 111.423212] bc_svc_process+0x281/0x340 [sunrpc]
[ 111.424325] nfs41_callback_svc+0x130/0x1c0 [nfsv4]
[ 111.425430] ? remove_wait_queue+0x60/0x60
[ 111.426398] kthread+0xf5/0x130
[ 111.427155] ? nfs_callback_authenticate+0x50/0x50 [nfsv4]
[ 111.428388] ? kthread_bind+0x10/0x10
[ 111.429270] ret_from_fork+0x1f/0x30

localhost login: [ 467.462259] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
[ 467.464411] #PF error: [normal kernel read fault]
[ 467.465445] PGD 80000000728c1067 P4D 80000000728c1067 PUD 728c0067 PMD 0
[ 467.466980] Oops: 0000 [#1] SMP PTI
[ 467.467759] CPU: 0 PID: 3517 Comm: NFSv4 callback Not tainted 5.0.0-rc7+ #1
[ 467.469393] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[ 467.471840] RIP: 0010:trace_event_raw_event_xprt_transmit+0x7c/0xf0 [sunrpc]
[ 467.473392] Code: f6 48 85 c0 74 4b 49 8b 94 24 98 00 00 00 48 89 e7 0f b7 92 e0 00 00 00 89 50 08 49 8b 94 24 98 00 00 00 48 8b 92 a8 00 00 00 <8b> 52 04 89 50 0c 41 8b 94 24 a8 00 00 00 0f ca 89 50 10 41 8b 94
[ 467.477605] RSP: 0018:ffffabe7434fbcd0 EFLAGS: 00010282
[ 467.478793] RAX: ffff99720fc3bce0 RBX: 0000000000000003 RCX: 0000000000000000
[ 467.480409] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffabe7434fbcd0
[ 467.482011] RBP: ffff99726f631948 R08: 0000000000000246 R09: 0000000000000000
[ 467.483591] R10: 0000000070000000 R11: 0000000000000000 R12: ffff997277dfcc00
[ 467.485226] R13: 0000000000000000 R14: 0000000000000000 R15: ffff99722fecdca8
[ 467.486830] FS: 0000000000000000(0000) GS:ffff99727bc00000(0000) knlGS:0000000000000000
[ 467.488596] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 467.489931] CR2: 0000000000000004 CR3: 00000000270e6006 CR4: 00000000001606f0
[ 467.491559] Call Trace:
[ 467.492128] xprt_transmit+0x303/0x3f0 [sunrpc]
[ 467.493143] ? rpc_task_need_encode+0x40/0x40 [sunrpc]
[ 467.494328] call_bc_transmit+0x49/0x170 [sunrpc]
[ 467.495379] __rpc_execute+0x7e/0x3f0 [sunrpc]
[ 467.496451] rpc_run_bc_task+0x78/0xd0 [sunrpc]
[ 467.497467] bc_svc_process+0x281/0x340 [sunrpc]
[ 467.498507] nfs41_callback_svc+0x130/0x1c0 [nfsv4]
[ 467.499751] ? remove_wait_queue+0x60/0x60
[ 467.500686] kthread+0xf5/0x130
[ 467.501438] ? nfs_callback_authenticate+0x50/0x50 [nfsv4]
[ 467.502640] ? kthread_bind+0x10/0x10
[ 467.503454] ret_from_fork+0x1f/0x30

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
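In both oopses the faulting instruction (`<8b> 52 04`, i.e. `mov 0x4(%rdx),%edx`) reads 4 bytes past a NULL `%rdx`, which matches the reported address 0000000000000004. The fix amounts to guarding that dereference in the tracepoints. A minimal sketch, using heavily simplified, hypothetical stand-ins for the task and client structures (the field names `tk_client` and `cl_clid` are assumptions for illustration): a backchannel task has no client attached, so the helper records 0 instead of following the NULL pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct rpc_clnt { unsigned int cl_clid; };
struct rpc_task { struct rpc_clnt *tk_client; };

/* Value a tracepoint would record as the client ID. Backchannel tasks
 * have no client, so never dereference tk_client unconditionally. */
unsigned int trace_client_id(const struct rpc_task *task)
{
    assert(task != NULL);
    return task->tk_client ? task->tk_client->cl_clid : 0;
}
```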

2019-03-10 | SUNRPC: Take the transport send lock before binding+connecting | Trond Myklebust
Before trying to bind a port, grab the send lock so that we don't change the port while another task is busy transmitting requests. The connect code already takes the send lock in xprt_connect(), but it is harmless to take it earlier.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

2019-03-10 | SUNRPC: Micro-optimise when the task is known not to be sleeping | Trond Myklebust
In cases where we know the task is not sleeping, try to optimise away the indirect call to task->tk_action() by replacing it with a direct call. Only change tail calls, to allow gcc to perform tail call elimination.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
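The pattern can be sketched with a toy state machine in which each step stores the next step in a function pointer and a scheduler loop invokes it indirectly. Everything here (`struct task`, `step_prepare()`, `step_finish()`, `run_task()`) is hypothetical and only illustrates the idea: the step still records the next action, in case the task had to sleep, but calls it directly as its final statement so the compiler can emit a jump instead of an indirect call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified task state machine. */
struct task;
typedef void (*action_fn)(struct task *);

struct task {
    action_fn tk_action;  /* next step, or NULL when finished */
    int steps;            /* number of steps executed */
};

void step_finish(struct task *t)
{
    t->steps++;
    t->tk_action = NULL;                /* nothing left to do */
}

void step_prepare(struct task *t)
{
    t->steps++;
    /* Keep tk_action accurate so that a sleeping task would resume
     * at the right step... */
    t->tk_action = step_finish;
    /* ...but we know it is not sleeping, so call the next step directly.
     * As a tail call, the compiler may turn this into a plain jump. */
    step_finish(t);
}

void run_task(struct task *t)
{
    assert(t != NULL);
    while (t->tk_action)
        t->tk_action(t);                /* indirect call through the pointer */
}
```

Starting a task at `step_prepare` runs both steps in one pass of the scheduler loop rather than two.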

2019-03-10 | SUNRPC: Check whether the task was transmitted before rebind/reconnect | Trond Myklebust
Before initiating transport actions that require putting the task to sleep, such as rebinding or reconnecting, we should check whether or not the task was already transmitted.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

2019-03-09 | SUNRPC: Remove redundant calls to RPC_IS_QUEUED() | Trond Myklebust
The RPC task wakeup calls all check for RPC_IS_QUEUED() before taking any locks. In addition, rpc_exit() already calls rpc_wake_up_queued_task().
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

2019-03-09 | SUNRPC: Clean up | Trond Myklebust
Replace remaining callers of call_timeout() with rpc_check_timeout().
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

2019-03-07 | SUNRPC: Respect RPC call timeouts when retrying transmission | Trond Myklebust
Fix a regression where soft and softconn requests are not timing out as expected.
Fixes: 89f90fe1ad8b ("SUNRPC: Allow calls to xprt_transmit() to drain...")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

2019-03-07 | SUNRPC: Fix up RPC back channel transmission | Trond Myklebust
Now that transmissions happen through a queue, we require the RPC tasks to handle error conditions that may have been set while they were sleeping. The back channel does not currently do this, but assumes that any error condition happens during its own call to xprt_transmit(). The solution is to ensure that the back channel splits out the error handling just like the forward channel does.
Fixes: 89f90fe1ad8b ("SUNRPC: Allow calls to xprt_transmit() to drain...")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

2019-03-07 | SUNRPC: Prevent thundering herd when the socket is not connected | Trond Myklebust
If the socket is not connected, then we want to initiate a reconnect rather than trying to transmit requests. If there is a large number of requests queued and waiting for the lock in call_transmit(), then it can take a while for one of them to loop back and retake the lock in call_connect.
Fixes: 89f90fe1ad8b ("SUNRPC: Allow calls to xprt_transmit() to drain...")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

2019-03-02 | SUNRPC: Allow dynamic allocation of back channel slots | Trond Myklebust
Now that the reads happen in a process context rather than a softirq, it is safe to allocate back channel slots using a reclaiming allocation.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

2019-03-02 | NFSv4.1: Bump the default callback session slot count to 16 | Trond Myklebust
Users can still control this value explicitly using the max_session_cb_slots module parameter, but let's bump the default up to 16 for now.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

2019-03-02 | SUNRPC: Convert remaining GFP_NOIO, and GFP_NOWAIT sites in sunrpc | Trond Myklebust
Convert the remaining gfp_flags arguments in sunrpc to standard reclaiming allocations, now that we set memalloc_nofs_save() as appropriate.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

2019-03-01 | NFS/flexfiles: Clean up mirror DS initialisation | Trond Myklebust
Get rid of the redundant parameter and rename the function ff_layout_mirror_valid() to ff_layout_init_mirror_ds() for clarity.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

2019-03-01 | NFS/flexfiles: Remove dead code in ff_layout_mirror_valid() | Trond Myklebust
nfs4_ff_alloc_deviceid_node() guarantees that if mirror->mirror_ds is a valid pointer, then so is mirror->mirror_ds->ds.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

2019-03-01 | NFS/flexfile: Simplify nfs4_ff_layout_select_ds_stateid() | Trond Myklebust
Pass in a pointer to the mirror rather than forcing another array access.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

2019-03-01 | NFS/flexfile: Simplify nfs4_ff_layout_ds_version() | Trond Myklebust
Pass in a pointer to the mirror rather than forcing another array access.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

2019-03-01 | NFS/flexfiles: Simplify ff_layout_get_ds_cred() | Trond Myklebust
Pass in a pointer to the mirror rather than forcing another array access.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

2019-03-01 | NFS/flexfiles: Simplify nfs4_ff_find_or_create_ds_client() | Trond Myklebust
Pass in a pointer to the mirror rather than forcing another array access.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

2019-03-01 | NFS/flexfiles: Simplify nfs4_ff_layout_select_ds_fh() | Trond Myklebust
Pass in a pointer to the mirror rather than having to retrieve it from the array and then verify the resulting pointer.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
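The "pass in a pointer to the mirror" patches above share one pattern. A before/after sketch with hypothetical, simplified structures (`struct lseg`, `struct mirror`, `struct ds` and both functions are invented for illustration): once the caller has selected and validated a mirror, handing that pointer down removes a second array lookup and its re-check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified layout structures. */
struct ds { int version; };
struct mirror { struct ds *mirror_ds; };
struct lseg {
    unsigned int mirror_count;
    struct mirror *mirrors[4];
};

/* Before: look the mirror up by index and re-check the result. */
int ds_version_by_idx(const struct lseg *lseg, unsigned int idx)
{
    if (idx >= lseg->mirror_count || lseg->mirrors[idx] == NULL)
        return -1;
    return lseg->mirrors[idx]->mirror_ds->version;
}

/* After: the caller passes the mirror it has already selected and
 * validated, so no second array access or check is needed. */
int ds_version(const struct mirror *mirror)
{
    assert(mirror != NULL && mirror->mirror_ds != NULL);
    return mirror->mirror_ds->version;
}
```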

2019-03-01 | NFS/flexfiles: Speed up read failover when DSes are down | Trond Myklebust
If we notice that a DS may be down, we should attempt to read from the other mirrors first before we go back to retry the dead DS.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

2019-03-01 | NFS/flexfiles: Don't invalidate DS deviceids for being unresponsive | Trond Myklebust
If the DS is unresponsive, we want to just mark it as such, while reporting the errors. If the server later returns the same deviceid in a new layout, then we don't want to have to look it up again.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

2019-03-01 | NFS/flexfiles: Remove bogus checks for invalid deviceids | Trond Myklebust
We already check the deviceids before we start the RPC call.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

2019-03-01 | NFS/flexfiles: Avoid unnecessary layout invalidations | Trond Myklebust
In ff_layout_mirror_valid() we may not want to invalidate the layout segment despite the call to GETDEVICEINFO failing. The reason is that a read may still be able to make progress on another mirror. So instead we let the caller (in this case nfs4_ff_layout_prepare_ds()) decide whether or not it needs to invalidate.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

2019-03-01 | NFS/flexfiles: refactor calls to fs4_ff_layout_prepare_ds() | Trond Myklebust
While we may want to skip attempting to connect to a downed mirror when we're deciding which mirror to select for a read, we do not want to do so once we've committed to attempting the I/O in ff_layout_read/write_pagelist(), or ff_layout_initiate_commit().
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

2019-03-01 | NFSv4: Handle early exit in layoutget by returning an error | Trond Myklebust
If the LAYOUTGET rpc call exits early without an error, convert it to EAGAIN.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

2019-03-01 | NFS/flexfiles: Send LAYOUTERROR when failing over mirrored reads | Trond Myklebust
When a read to the preferred mirror returns an error, the flexfiles driver records the error in the inode list and currently marks the layout for return before failing over the attempted read to the next mirror. What we actually want to do is fire off a LAYOUTERROR to notify the MDS that there is an issue with the preferred mirror, then we fail over. Only once we've failed to read from all mirrors should we return the layout.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
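The new policy can be sketched as a small state update, assuming one read attempt per mirror. `struct read_state` and `on_mirror_read_error()` are hypothetical, and `layouterrors_sent` merely counts the points where a real client would send LAYOUTERROR to the MDS.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-read failover state. */
struct read_state {
    int mirror_count;       /* mirrors in the layout segment */
    int failed;             /* mirrors that have returned an error */
    int layouterrors_sent;  /* stands in for LAYOUTERROR calls to the MDS */
    bool layout_returned;   /* set once the layout has been returned */
};

/* Called when a read to the current mirror fails. Returns true if the
 * read should be retried on another mirror. */
bool on_mirror_read_error(struct read_state *s)
{
    assert(s->failed < s->mirror_count);
    s->layouterrors_sent++;         /* report the bad mirror first */
    s->failed++;
    if (s->failed < s->mirror_count)
        return true;                /* then fail over to the next mirror */
    s->layout_returned = true;      /* every mirror failed: return the layout */
    return false;
}
```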

2019-03-01 | NFSv4.2: Add client support for the generic 'layouterror' RPC call | Trond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

2019-03-01 | NFSv4/flexfiles: Abort I/O early if the layout segment was invalidated | Trond Myklebust
If a layout segment gets invalidated while a pNFS I/O operation is queued for transmission, then we ideally want to abort immediately. This is particularly the case when there is a large number of I/O related RPCs queued in the RPC layer, and the layout segment gets invalidated due to an ENOSPC error, or an EACCES (because the client was fenced). We may end up forced to spam the MDS with a lot of otherwise unnecessary LAYOUTERRORs after that I/O fails.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

2019-03-01 | NFSv4/pnfs: Fix barriers in nfs4_mark_deviceid_unavailable() | Trond Myklebust
Fix the memory barriers in nfs4_mark_deviceid_unavailable() and nfs4_test_deviceid_unavailable().
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

2019-03-01 | NFS/flexfiles: Fix up sparse RCU annotations | Trond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

2019-03-01 | NFSv4/flexfiles: Fix invalid deref in FF_LAYOUT_DEVID_NODE() | Trond Myklebust
If the attempt to instantiate the mirror's layout DS pointer failed, then that pointer may hold a value of type ERR_PTR(), so we need to check that before we dereference it.
Fixes: 65990d1afbd2d ("pNFS/flexfiles: Fix a deadlock on LAYOUTGET")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
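The kernel encodes small negative errno values inside pointers using ERR_PTR(), IS_ERR() and PTR_ERR() from <linux/err.h>. The sketch below is a simplified userspace re-implementation of that idea (the kernel versions differ in detail), plus a hypothetical accessor showing the check that has to precede the dereference. `struct mirror`, `struct ds` and `mirror_ds_id()` are invented for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins for the <linux/err.h> helpers: the top MAX_ERRNO
 * values of the address space are reserved for encoded error codes. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline bool IS_ERR(const void *ptr)
{
    return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}
static inline bool IS_ERR_OR_NULL(const void *ptr)
{
    return ptr == NULL || IS_ERR(ptr);
}

/* Hypothetical mirror whose DS pointer may be NULL, valid, or ERR_PTR(). */
struct ds { int id; };
struct mirror { struct ds *mirror_ds; };

/* Only dereference mirror_ds once it is known to be a real pointer. */
int mirror_ds_id(const struct mirror *m)
{
    assert(m != NULL);
    if (IS_ERR_OR_NULL(m->mirror_ds))
        return -1;
    return m->mirror_ds->id;
}
```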

2019-03-01 | NFS: Add missing encode / decode sequence_maxsz to v4.2 operations | Anna Schumaker
These really should have been there from the beginning, but we never noticed because there was enough slack in the RPC request for the extra bytes. Chuck's recent patch to use au_cslack and au_rslack to compute buffer size shrank the buffer enough that this became a problem for SEEK operations on my test client.
Fixes: f4ac1674f5da4 ("nfs: Add ALLOCATE support")
Fixes: 2e72448b07dc3 ("NFS: Add COPY nfs operation")
Fixes: cb95deea0b4aa ("NFS OFFLOAD_CANCEL xdr")
Fixes: 624bd5b7b683c ("nfs: Add DEALLOCATE support")
Fixes: 1c6dcbe5ceff8 ("NFS: Implement SEEK")
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
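Buffer sizes are computed by summing per-operation maxima in 4-byte XDR words, so a missing term goes unnoticed as long as other slack covers it. A hypothetical sketch of that bookkeeping: the macro names are illustrative, and the SEQUENCE reply size is derived from the SEQUENCE4resok result in RFC 5661 (opcode and status, a 16-byte session ID, then five 32-bit fields). Leaving the SEQUENCE term out under-counts every such reply by 11 words (44 bytes).

```c
#include <assert.h>
#include <stddef.h>

/* Round a byte count up to whole 4-byte XDR words. */
#define XDR_QUADLEN(len) (((len) + 3) / 4)
#define SESSIONID_BYTES 16

/* Hypothetical reply-side sizes, in XDR words. */
#define OP_DECODE_HDR_WORDS 2   /* result opcode + status */
#define DECODE_SEQUENCE_WORDS \
    (OP_DECODE_HDR_WORDS + XDR_QUADLEN(SESSIONID_BYTES) + 5)

/* Total reply size: compound header plus each operation's maximum. */
size_t reply_words(size_t compound_hdr, const size_t *ops, size_t nops)
{
    size_t total = compound_hdr;
    for (size_t i = 0; i < nops; i++)
        total += ops[i];
    return total;
}
```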

2019-03-01 | NFSv4.1: Don't process the sequence op more than once. | Trond Myklebust
Ensure that if we call nfs41_sequence_process() a second time for the same rpc_task, then we only process the results once.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

2019-03-01 | NFSv4.1: Reinitialise sequence results before retransmitting a request | Trond Myklebust
If we have to retransmit a request, we should ensure that we reinitialise the sequence results structure, since in the event of a signal we need to treat the request as if it had not been sent.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: stable@vger.kernel.org

2019-02-26 | SUNRPC: Fix an Oops in udp_poll() | Trond Myklebust
udp_poll() checks the struct file for the O_NONBLOCK flag, so we must not call it with a NULL file pointer.
Fixes: 0ffe86f48026 ("SUNRPC: Use poll() to fix up the socket requeue races")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

2019-02-25 | Merge tag 'nfs-rdma-for-5.1-1' of git://git.linux-nfs.org/projects/anna/linux-nfs | Trond Myklebust
NFSoRDMA client updates for 5.1

New features:
- Convert rpc auth layer to use xdr_streams
- Config option to disable insecure enctypes
- Reduce size of RPC receive buffers

Bugfixes and cleanups:
- Fix sparse warnings
- Check inline size before providing a write chunk
- Reduce the receive doorbell rate
- Various tracepoint improvements

[Trond: Fix up merge conflicts]
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

2019-02-23 | NFS/pnfs: Bulk destroy of layouts needs to be safe w.r.t. umount | Trond Myklebust
If a bulk layout recall or a metadata server reboot coincides with a umount, then holding a reference to an inode is unsafe unless we also hold a reference to the super block.
Fixes: fd9a8d7160937 ("NFSv4.1: Fix bulk recall and destroy of layouts")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

2019-02-21 | NFS: Fix a soft lockup in the delegation recovery code | Trond Myklebust
Fix a soft lockup when NFS client delegation recovery is attempted but the inode is in the process of being freed. When the igrab(inode) call fails, and we have to restart the recovery process, we need to ensure that we won't attempt to recover the same delegation again.
Fixes: 45870d6909d5a ("NFSv4.1: Test delegation stateids when server...")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
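Why the restart can spin is easy to see in a model of the scan. In this hypothetical sketch, `struct delegation` and `recover_delegations()` are invented, and `inode_alive == false` stands in for igrab() failing on an inode that is being freed. The scan restarts from the top after such a failure (as a list walk that had to drop its lock would), so the pending flag must be cleared first; otherwise every restart finds the same delegation again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical delegation state. */
struct delegation {
    bool test_pending;  /* still needs recovery */
    bool inode_alive;   /* false models igrab() failing */
};

/* Recover every pending delegation; returns how many were recovered. */
int recover_delegations(struct delegation *d, int n)
{
    int recovered = 0;

    assert(d != NULL || n == 0);
restart:
    for (int i = 0; i < n; i++) {
        if (!d[i].test_pending)
            continue;
        if (!d[i].inode_alive) {
            /* Clear the flag before restarting; without this the next
             * pass picks the same delegation and loops forever. */
            d[i].test_pending = false;
            goto restart;
        }
        d[i].test_pending = false;
        recovered++;
    }
    return recovered;
}
```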

2019-02-21 | NFSv4.1: Avoid false retries when RPC calls are interrupted | Trond Myklebust
A 'false retry' in NFSv4.1 occurs when the client attempts to transmit a new RPC call using a slot+sequence number combination that references an already cached one. Currently, the Linux NFS client will do this if a user process interrupts an RPC call that is in progress. The problem with doing so is that we defeat the main mechanism used by the server to differentiate between a new call and a replayed one. Even if the server is able to perfectly cache the arguments of the old call, it cannot know if the client intended to replay or send a new call.

The obvious fix is to bump the sequence number pre-emptively if an RPC call is interrupted, but in order to deal with the corner cases where the interrupted call is not actually received and processed by the server, we need to interpret the error NFS4ERR_SEQ_MISORDERED as a sign that we need to either wait or locate a correct sequence number that lies between the value we sent and the last value that was acked by a SEQUENCE call on that slot.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Tested-by: Jason Tibbitts <tibbs@math.uh.edu>

2019-02-20 | SUNRPC: Remove the redundant 'zerocopy' argument to xs_sendpages() | Trond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

2019-02-20 | SUNRPC: Further cleanups of xs_sendpages() | Trond Myklebust
Now that we send the pages using a struct msghdr, instead of using sendpage(), we no longer need to 'prime the socket' with an address for unconnected UDP messages.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

2019-02-20 | SUNRPC: Convert socket page send code to use iov_iter() | Trond Myklebust
Simplify the page send code using iov_iter and bvecs.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

2019-02-20 | SUNRPC: Convert xs_send_kvec() to use iov_iter_kvec() | Trond Myklebust
Prepare the socket transmission code to use iov_iter.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
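The kernel's iov_iter and bio_vec types have no direct userspace counterpart, but the idea behind these patches (describe every piece of a message up front and hand it to the socket in one struct msghdr) can be sketched with POSIX sendmsg() and struct iovec. `send_gathered()` is a hypothetical helper, not part of SUNRPC.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send a header and a payload held in separate buffers with a single
 * sendmsg() call, instead of one send per buffer. */
ssize_t send_gathered(int fd, const void *hdr, size_t hlen,
                      const void *data, size_t dlen)
{
    struct iovec iov[2] = {
        { .iov_base = (void *)hdr,  .iov_len = hlen },
        { .iov_base = (void *)data, .iov_len = dlen },
    };
    struct msghdr msg = { .msg_iov = iov, .msg_iovlen = 2 };

    assert(fd >= 0);
    return sendmsg(fd, &msg, 0);
}
```

On a connected socket no destination address is needed in `msg_name`; for unconnected UDP it would be filled in on the same msghdr.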

2019-02-20 | SUNRPC: Initiate a connection close on an ESHUTDOWN error in stream receive | Trond Myklebust
If the client stream receive code receives an ESHUTDOWN error either because the server closed the connection, or because it sent a callback which cannot be processed, then we should shut down the connection.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

2019-02-20 | SUNRPC: Don't suppress socket errors when a message read completes | Trond Myklebust
If the message read completes, but the socket returned an error condition, we should ensure that we propagate that error.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

2019-02-20 | SUNRPC: Handle zero length fragments correctly | Trond Myklebust
A zero length fragment is really a bug, but let's ensure we don't go nuts when one turns up.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

2019-02-20 | SUNRPC: Don't reset the stream record info when the receive worker is running | Trond Myklebust
To ensure that the receive worker has exclusive access to the stream record info, we must not reset the contents other than when holding the transport->recv_mutex.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>