summaryrefslogtreecommitdiffstats
path: root/fs/afs
AgeCommit message (Collapse)Author
2017-11-13afs: Keep and pass sockaddr_rxrpc addresses rather than in_addrDavid Howells
Keep and pass sockaddr_rxrpc addresses around rather than keeping and passing in_addr addresses to allow for the use of IPv6 and non-standard port numbers in future. This also allows the port and service_id fields to be removed from the afs_call struct. Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-13afs: Update the cache index structureDavid Howells
Update the cache index structure in the following ways: (1) Don't use the volume name followed by the volume type as levels in the cache index. Volumes can be renamed. Use the volume ID instead. (2) Don't store the VLDB data for a volume in the tree. If the volume database should be cached locally, then it should be done in a separate tree. (3) Expand the volume ID stored in the cache to 64 bits. (4) Expand the file/vnode ID stored in the cache to 96 bits. (5) Increment the cache structure version number to 1. Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-13afs: Add some protocol defsDavid Howells
Add some protocol definitions, including max field lengths, flag defs, an XDR-encoded UUID def, more VL operation IDs and more fileserver abort codes. Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-13afs: Push the net ns pointer to more placesDavid Howells
Push the network namespace pointer to more places in AFS, including the afs_server structure (which doesn't hold a ref on the netns). In particular, afs_put_cell() now takes requires a net ns parameter so that it can safely alter the netns after decrementing the cell usage count - the cell will be deallocated by a background thread after being cached for a period, which means that it's not safe to access it after reducing its usage count. Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-13afs: Note the cell in the superblock info alsoDavid Howells
Keep a reference to the cell in the superblock info structure in addition to the volume and net pointers. This will make it easier to clean up in a future patch in which afs_put_volume() will need the cell pointer. Whilst we're at it, make the cell and volume getting functions return a pointer to the object got to make the call sites look neater. Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-13afs: Fix server reapingDavid Howells
Fix server reaping and make sure it's all done before we start trying to purge cells, given that servers currently pin cells. Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-13afs: Close the rxrpc socket only after purging the serversDavid Howells
Close the rxrpc socket only after we've purged the server records (and also cell and volume records which might refer to servers) so that we can give up the callbacks on each server. Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-13afs: Lay the groundwork for supporting network namespacesDavid Howells
Lay the groundwork for supporting network namespaces (netns) to the AFS filesystem by moving various global features to a network-namespace struct (afs_net) and providing an instance of this as a temporary global variable that everything uses via accessor functions for the moment. The following changes have been made: (1) Store the netns in the superblock info. This will be obtained from the mounter's nsproxy on a manual mount and inherited from the parent superblock on an automount. (2) The cell list is made per-netns. It can be viewed through /proc/net/afs/cells and also be modified by writing commands to that file. (3) The local workstation cell is set per-ns in /proc/net/afs/rootcell. This is unset by default. (4) The 'rootcell' module parameter, which sets a cell and VL server list modifies the init net namespace, thereby allowing an AFS root fs to be theoretically used. (5) The volume location lists and the file lock manager are made per-netns. (6) The AF_RXRPC socket and associated I/O bits are made per-ns. The various workqueues remain global for the moment. Changes still to be made: (1) /proc/fs/afs/ should be moved to /proc/net/afs/ and a symlink emplaced from the old name. (2) A per-netns subsys needs to be registered for AFS into which it can store its per-netns data. (3) Rather than the AF_RXRPC socket being opened on module init, it needs to be opened on the creation of a superblock in that netns. (4) The socket needs to be closed when the last superblock using it is destroyed and all outstanding client calls on it have been completed. This prevents a reference loop on the namespace. (5) It is possible that several namespaces will want to use AFS, in which case each one will need its own UDP port. These can either be set through /proc/net/afs/cm_port or the kernel can pick one at random. The init_ns gets 7001 by default. Other issues that need resolving: (1) The DNS keyring needs net-namespacing. (2) Where do upcalls go (eg. DNS request-key upcall)? (3) Need something like open_socket_in_file_ns() syscall so that AFS command line tools attempting to operate on an AFS file/volume have their RPC calls go to the right place. Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-13Pass mode to wait_on_atomic_t() action funcs and provide default actionsDavid Howells
Make wait_on_atomic_t() pass the TASK_* mode onto its action function as an extra argument and make it 'unsigned int throughout. Also, consolidate a bunch of identical action functions into a default function that can do the appropriate thing for the mode. Also, change the argument name in the bit_wait*() function declarations to reflect the fact that it's the mode and not the bit number. [Peter Z gives this a grudging ACK, but thinks that the whole atomic_t wait should be done differently, though he's not immediately sure as to how] Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Peter Zijlstra <peterz@infradead.org> cc: Ingo Molnar <mingo@kernel.org>
2017-11-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Files removed in 'net-next' had their license header updated in 'net'. We take the remove from 'net-next'. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-02License cleanup: add SPDX GPL-2.0 license identifier to files with no licenseGreg Kroah-Hartman
Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-18rxrpc: Use MSG_WAITALL to tell sendmsg() to temporarily ignore signalsDavid Howells
Make AF_RXRPC accept MSG_WAITALL as a flag to sendmsg() to tell it to ignore signals whilst loading up the message queue, provided progress is being made in emptying the queue at the other side. Progress is defined as the base of the transmit window having being advanced within 2 RTT periods. If the period is exceeded with no progress, sendmsg() will return anyway, indicating how much data has been copied, if any. Once the supplied buffer is entirely decanted, the sendmsg() will return. Signed-off-by: David Howells <dhowells@redhat.com>
2017-10-18rxrpc: Support service upgrade from a kernel serviceDavid Howells
Provide support for a kernel service to make use of the service upgrade facility. This involves: (1) Pass an upgrade request flag to rxrpc_kernel_begin_call(). (2) Make rxrpc_kernel_recv_data() return the call's current service ID so that the caller can detect service upgrade and see what the service was upgraded to. Signed-off-by: David Howells <dhowells@redhat.com>
2017-09-06Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge updates from Andrew Morton: - various misc bits - DAX updates - OCFS2 - most of MM * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (119 commits) mm,fork: introduce MADV_WIPEONFORK x86,mpx: make mpx depend on x86-64 to free up VMA flag mm: add /proc/pid/smaps_rollup mm: hugetlb: clear target sub-page last when clearing huge page mm: oom: let oom_reap_task and exit_mmap run concurrently swap: choose swap device according to numa node mm: replace TIF_MEMDIE checks by tsk_is_oom_victim mm, oom: do not rely on TIF_MEMDIE for memory reserves access z3fold: use per-cpu unbuddied lists mm, swap: don't use VMA based swap readahead if HDD is used as swap mm, swap: add sysfs interface for VMA based swap readahead mm, swap: VMA based swap readahead mm, swap: fix swap readahead marking mm, swap: add swap readahead hit statistics mm/vmalloc.c: don't reinvent the wheel but use existing llist API mm/vmstat.c: fix wrong comment selftests/memfd: add memfd_create hugetlbfs selftest mm/shmem: add hugetlbfs support to memfd_create() mm, devm_memremap_pages: use multi-order radix for ZONE_DEVICE lookups mm/vmalloc.c: halve the number of comparisons performed in pcpu_get_vm_areas() ...
2017-09-06fscache: remove unused ->now_uncached callbackJan Kara
Patch series "Ranged pagevec lookup", v2. In this series I make pagevec_lookup() update the index (to be consistent with pagevec_lookup_tag() and also as a preparation for ranged lookups), provide ranged variant of pagevec_lookup() and use it in places where it makes sense. This not only removes some common code but is also a measurable performance win for some use cases (see patch 4/10) where radix tree is sparse and searching & grabing of a page after the end of the range has measurable overhead. This patch (of 10): The callback doesn't ever get called. Remove it. Link: http://lkml.kernel.org/r/20170726114704.7626-2-jack@suse.cz Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Miller: 1) Support ipv6 checksum offload in sunvnet driver, from Shannon Nelson. 2) Move to RB-tree instead of custom AVL code in inetpeer, from Eric Dumazet. 3) Allow generic XDP to work on virtual devices, from John Fastabend. 4) Add bpf device maps and XDP_REDIRECT, which can be used to build arbitrary switching frameworks using XDP. From John Fastabend. 5) Remove UFO offloads from the tree, gave us little other than bugs. 6) Remove the IPSEC flow cache, from Florian Westphal. 7) Support ipv6 route offload in mlxsw driver. 8) Support VF representors in bnxt_en, from Sathya Perla. 9) Add support for forward error correction modes to ethtool, from Vidya Sagar Ravipati. 10) Add time filter for packet scheduler action dumping, from Jamal Hadi Salim. 11) Extend the zerocopy sendmsg() used by virtio and tap to regular sockets via MSG_ZEROCOPY. From Willem de Bruijn. 12) Significantly rework value tracking in the BPF verifier, from Edward Cree. 13) Add new jump instructions to eBPF, from Daniel Borkmann. 14) Rework rtnetlink plumbing so that operations can be run without taking the RTNL semaphore. From Florian Westphal. 15) Support XDP in tap driver, from Jason Wang. 16) Add 32-bit eBPF JIT for ARM, from Shubham Bansal. 17) Add Huawei hinic ethernet driver. 18) Allow to report MD5 keys in TCP inet_diag dumps, from Ivan Delalande. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1780 commits) i40e: point wb_desc at the nvm_wb_desc during i40e_read_nvm_aq i40e: avoid NVM acquire deadlock during NVM update drivers: net: xgene: Remove return statement from void function drivers: net: xgene: Configure tx/rx delay for ACPI drivers: net: xgene: Read tx/rx delay for ACPI rocker: fix kcalloc parameter order rds: Fix non-atomic operation on shared flag variable net: sched: don't use GFP_KERNEL under spin lock vhost_net: correctly check tx avail during rx busy polling net: mdio-mux: add mdio_mux parameter to mdio_mux_init() rxrpc: Make service connection lookup always check for retry net: stmmac: Delete dead code for MDIO registration gianfar: Fix Tx flow control deactivation cxgb4: Ignore MPS_TX_INT_CAUSE[Bubble] for T6 cxgb4: Fix pause frame count in t4_get_port_stats cxgb4: fix memory leak tun: rename generic_xdp to skb_xdp tun: reserve extra headroom only when XDP is set net: dsa: bcm_sf2: Configure IMP port TC2QOS mapping net: dsa: bcm_sf2: Advertise number of egress queues ...
2017-08-29rxrpc: Add notification of end-of-Tx phaseDavid Howells
Add a callback to rxrpc_kernel_send_data() so that a kernel service can get a notification that the AF_RXRPC call has transitioned out the Tx phase and is now waiting for a reply or a final ACK. This is called from AF_RXRPC with the call state lock held so the notification is guaranteed to come before any reply is passed back. Further, modify the AFS filesystem to make use of this so that we don't have to change the afs_call state before sending the last bit of data. Signed-off-by: David Howells <dhowells@redhat.com>
2017-08-01fs: convert a pile of fsync routines to errseq_t based reportingJeff Layton
This patch converts most of the in-kernel filesystems that do writeback out of the pagecache to report errors using the errseq_t-based infrastructure that was recently added. This allows them to report errors once for each open file description. Most filesystems have a fairly straightforward fsync operation. They call filemap_write_and_wait_range to write back all of the data and wait on it, and then (sometimes) sync out the metadata. For those filesystems this is a straightforward conversion from calling filemap_write_and_wait_range in their fsync operation to calling file_write_and_wait_range. Acked-by: Jan Kara <jack@suse.cz> Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com> Signed-off-by: Jeff Layton <jlayton@redhat.com>
2017-07-21rxrpc: Move the packet.h include file into net/rxrpc/David Howells
Move the protocol description header file into net/rxrpc/ and rename it to protocol.h. It's no longer necessary to expose it as packets are no longer exposed to kernel services (such as AFS) that use the facility. The abort codes are transferred to the UAPI header instead as we pass these back to userspace and also to kernel services. Signed-off-by: David Howells <dhowells@redhat.com>
2017-07-15Merge branch 'work.mount' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull ->s_options removal from Al Viro: "Preparations for fsmount/fsopen stuff (coming next cycle). Everything gets moved to explicit ->show_options(), killing ->s_options off + some cosmetic bits around fs/namespace.c and friends. Basically, the stuff needed to work with fsmount series with minimum of conflicts with other work. It's not strictly required for this merge window, but it would reduce the PITA during the coming cycle, so it would be nice to have those bits and pieces out of the way" * 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: isofs: Fix isofs_show_options() VFS: Kill off s_options and helpers orangefs: Implement show_options 9p: Implement show_options isofs: Implement show_options afs: Implement show_options affs: Implement show_options befs: Implement show_options spufs: Implement show_options bpf: Implement show_options ramfs: Implement show_options pstore: Implement show_options omfs: Implement show_options hugetlbfs: Implement show_options VFS: Don't use save/replace_mount_options if not using generic_show_options VFS: Provide empty name qstr VFS: Make get_filesystem() return the affected filesystem VFS: Clean up whitespace in fs/namespace.c and fs/super.c Provide a function to create a NUL-terminated string from unterminated data
2017-07-11afs: Implement show_optionsDavid Howells
Implement the show_options superblock op for afs as part of a bid to get rid of s_options and generic_show_options() to make it easier to implement a context-based mount where the mount options can be passed individually over a file descriptor. Also implement the show_devname op to display the correct device name and thus avoid the need to display the cell= and volume= options. Signed-off-by: David Howells <dhowells@redhat.com> cc: linux-afs@lists.infradead.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-07-09afs: Add metadata xattrsDavid Howells
Add xattrs to allow the user to get/set metadata in lieu of having pioctl() available. The following xattrs are now available: - "afs.cell" The name of the cell in which the vnode's volume resides. - "afs.fid" The volume ID, vnode ID and vnode uniquifier of the file as three hex numbers separated by colons. - "afs.volume" The name of the volume in which the vnode resides. For example: # getfattr -d -m ".*" /mnt/scratch getfattr: Removing leading '/' from absolute path names # file: mnt/scratch afs.cell="mycell.myorg.org" afs.fid="10000b:1:1" afs.volume="scratch" Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-09afs: Ignore AFS_ACE_READ and AFS_ACE_WRITE for directoriesMarc Dionne
The AFS_ACE_READ and AFS_ACE_WRITE permission bits should not be used to make access decisions for the directory itself. They are meant to control access for the objects contained in that directory. Reading a directory is allowed if the AFS_ACE_LOOKUP bit is set. This would cause an incorrect access denied error for a directory with AFS_ACE_LOOKUP but not AFS_ACE_READ. The AFS_ACE_WRITE bit does not allow operations that modify the directory. For a directory with AFS_ACE_WRITE but neither AFS_ACE_INSERT nor AFS_ACE_DELETE, this would result in trying operations that would ultimately be denied by the server. Signed-off-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Miller: "Reasonably busy this cycle, but perhaps not as busy as in the 4.12 merge window: 1) Several optimizations for UDP processing under high load from Paolo Abeni. 2) Support pacing internally in TCP when using the sch_fq packet scheduler for this is not practical. From Eric Dumazet. 3) Support mutliple filter chains per qdisc, from Jiri Pirko. 4) Move to 1ms TCP timestamp clock, from Eric Dumazet. 5) Add batch dequeueing to vhost_net, from Jason Wang. 6) Flesh out more completely SCTP checksum offload support, from Davide Caratti. 7) More plumbing of extended netlink ACKs, from David Ahern, Pablo Neira Ayuso, and Matthias Schiffer. 8) Add devlink support to nfp driver, from Simon Horman. 9) Add RTM_F_FIB_MATCH flag to RTM_GETROUTE queries, from Roopa Prabhu. 10) Add stack depth tracking to BPF verifier and use this information in the various eBPF JITs. From Alexei Starovoitov. 11) Support XDP on qed device VFs, from Yuval Mintz. 12) Introduce BPF PROG ID for better introspection of installed BPF programs. From Martin KaFai Lau. 13) Add bpf_set_hash helper for TC bpf programs, from Daniel Borkmann. 14) For loads, allow narrower accesses in bpf verifier checking, from Yonghong Song. 15) Support MIPS in the BPF selftests and samples infrastructure, the MIPS eBPF JIT will be merged in via the MIPS GIT tree. From David Daney. 16) Support kernel based TLS, from Dave Watson and others. 17) Remove completely DST garbage collection, from Wei Wang. 18) Allow installing TCP MD5 rules using prefixes, from Ivan Delalande. 19) Add XDP support to Intel i40e driver, from Björn Töpel 20) Add support for TC flower offload in nfp driver, from Simon Horman, Pieter Jansen van Vuuren, Benjamin LaHaise, Jakub Kicinski, and Bert van Leeuwen. 21) IPSEC offloading support in mlx5, from Ilan Tayari. 22) Add HW PTP support to macb driver, from Rafal Ozieblo. 23) Networking refcount_t conversions, From Elena Reshetova. 24) Add sock_ops support to BPF, from Lawrence Brako. This is useful for tuning the TCP sockopt settings of a group of applications, currently via CGROUPs" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1899 commits) net: phy: dp83867: add workaround for incorrect RX_CTRL pin strap dt-bindings: phy: dp83867: provide a workaround for incorrect RX_CTRL pin strap cxgb4: Support for get_ts_info ethtool method cxgb4: Add PTP Hardware Clock (PHC) support cxgb4: time stamping interface for PTP nfp: default to chained metadata prepend format nfp: remove legacy MAC address lookup nfp: improve order of interfaces in breakout mode net: macb: remove extraneous return when MACB_EXT_DESC is defined bpf: add missing break in for the TCP_BPF_SNDCWND_CLAMP case bpf: fix return in load_bpf_file mpls: fix rtm policy in mpls_getroute net, ax25: convert ax25_cb.refcount from atomic_t to refcount_t net, ax25: convert ax25_route.refcount from atomic_t to refcount_t net, ax25: convert ax25_uid_assoc.refcount from atomic_t to refcount_t net, sctp: convert sctp_ep_common.refcnt from atomic_t to refcount_t net, sctp: convert sctp_transport.refcnt from atomic_t to refcount_t net, sctp: convert sctp_chunk.refcnt from atomic_t to refcount_t net, sctp: convert sctp_datamsg.refcnt from atomic_t to refcount_t net, sctp: convert sctp_auth_bytes.refcnt from atomic_t to refcount_t ...
2017-06-07rxrpc: Provide a cmsg to specify the amount of Tx data for a callDavid Howells
Provide a control message that can be specified on the first sendmsg() of a client call or the first sendmsg() of a service response to indicate the total length of the data to be transmitted for that call. Currently, because the length of the payload of an encrypted DATA packet is encrypted in front of the data, the packet cannot be encrypted until we know how much data it will hold. By specifying the length at the beginning of the transmit phase, each DATA packet length can be set before we start loading data from userspace (where several sendmsg() calls may contribute to a particular packet). An error will be returned if too little or too much data is presented in the Tx phase. Signed-off-by: David Howells <dhowells@redhat.com>
2017-06-05uuid,afs: move struct uuid_v1 back into afsChristoph Hellwig
This essentially is a partial revert of commit ff548773 ("afs: Move UUID struct to linux/uuid.h") and moves struct uuid_v1 back into fs/afs as struct afs_uuid. It however keeps it as big endian structure so that we can use the normal uuid generation helpers when casting to/from struct afs_uuid. The V1 uuid intrepretation in struct form isn't really useful to the rest of the kernel, and not really compatible to it either, so move it back to AFS instead of polluting the global uuid.h. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: David Howells <dhowells@redhat.com>
2017-05-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Millar: "Here are some highlights from the 2065 networking commits that happened this development cycle: 1) XDP support for IXGBE (John Fastabend) and thunderx (Sunil Kowuri) 2) Add a generic XDP driver, so that anyone can test XDP even if they lack a networking device whose driver has explicit XDP support (me). 3) Sparc64 now has an eBPF JIT too (me) 4) Add a BPF program testing framework via BPF_PROG_TEST_RUN (Alexei Starovoitov) 5) Make netfitler network namespace teardown less expensive (Florian Westphal) 6) Add symmetric hashing support to nft_hash (Laura Garcia Liebana) 7) Implement NAPI and GRO in netvsc driver (Stephen Hemminger) 8) Support TC flower offload statistics in mlxsw (Arkadi Sharshevsky) 9) Multiqueue support in stmmac driver (Joao Pinto) 10) Remove TCP timewait recycling, it never really could possibly work well in the real world and timestamp randomization really zaps any hint of usability this feature had (Soheil Hassas Yeganeh) 11) Support level3 vs level4 ECMP route hashing in ipv4 (Nikolay Aleksandrov) 12) Add socket busy poll support to epoll (Sridhar Samudrala) 13) Netlink extended ACK support (Johannes Berg, Pablo Neira Ayuso, and several others) 14) IPSEC hw offload infrastructure (Steffen Klassert)" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2065 commits) tipc: refactor function tipc_sk_recv_stream() tipc: refactor function tipc_sk_recvmsg() net: thunderx: Optimize page recycling for XDP net: thunderx: Support for XDP header adjustment net: thunderx: Add support for XDP_TX net: thunderx: Add support for XDP_DROP net: thunderx: Add basic XDP support net: thunderx: Cleanup receive buffer allocation net: thunderx: Optimize CQE_TX handling net: thunderx: Optimize RBDR descriptor handling net: thunderx: Support for page recycling ipx: call ipxitf_put() in ioctl error path net: sched: add helpers to handle extended actions qed*: Fix issues in the ptp filter config implementation. qede: Fix concurrency issue in PTP Tx path processing. stmmac: Add support for SIMATIC IOT2000 platform net: hns: fix ethtool_get_strings overflow in hns driver tcp: fix wraparound issue in tcp_lp bpf, arm64: fix jit branch offset related to ldimm64 bpf, arm64: implement jiting of BPF_XADD ...
2017-04-20afs: Convert to separately allocated bdiJan Kara
Allocate struct backing_dev_info separately instead of embedding it inside the superblock. This unifies handling of bdi among users. CC: David Howells <dhowells@redhat.com> CC: linux-afs@lists.infradead.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-06rxrpc: Use negative error codes in rxrpc_call structDavid Howells
Use negative error codes in struct rxrpc_call::error because that's what the kernel normally deals with and to make the code consistent. We only turn them positive when transcribing into a cmsg for userspace recvmsg. Signed-off-by: David Howells <dhowells@redhat.com>
2017-03-16afs: Don't wait for page writeback with the page lock heldDavid Howells
Drop the page lock before waiting for page writeback. Signed-off-by: David Howells <dhowells@redhat.com>
2017-03-16afs: ->writepage() shouldn't call clear_page_dirty_for_io()David Howells
The ->writepage() op shouldn't call clear_page_dirty_for_io() as that has already been called by the caller. Fix afs_writepage() by moving the call out of afs_write_back_from_locked_page() to afs_writepages_region() where it is needed. Signed-off-by: David Howells <dhowells@redhat.com>
2017-03-16afs: Fix abort on signal while waiting for call completionDavid Howells
Fix the way in which a call that's in progress and being waited for is aborted in the case that EINTR is detected. We should be sending RX_USER_ABORT rather than RX_CALL_DEAD as the abort code. Note that since the only two ways out of the loop are if the call completes or if a signal happens, the kill-the-call clause after the loop has finished can only happen in the case of EINTR. This means that we only have one abort case to deal with, not two, and the "KWC" case can never happen and so can be deleted. Note further that simply aborting the call isn't necessarily the best thing here since at this point: the request has been entirely sent and it's likely the server will do the operation anyway - whether we abort it or not. In future, we should punt the handling of the remainder of the call off to a background thread. Reported-by: Marc Dionne <marc.c.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com>
2017-03-16afs: Fix an off-by-one error in afs_send_pages()David Howells
afs_send_pages() should only put the call into the AFS_CALL_AWAIT_REPLY state if it has sent all the pages - but the check it makes is incorrect and sometimes it will finish the loop early. Signed-off-by: David Howells <dhowells@redhat.com>
2017-03-16afs: Fix afs_kill_pages()David Howells
Fix afs_kill_pages() in two ways: (1) If a writeback has been partially flushed, then if we try and kill the pages it contains, some of them may no longer be undergoing writeback and end_page_writeback() will assert. Fix this by checking to see whether the page in question is actually undergoing writeback before ending that writeback. (2) The loop that scans for pages to kill doesn't increase the first page index, and so the loop may not terminate, but it will try to process the same pages over and over again. Fix this by increasing the first page index to one after the last page we processed. Signed-off-by: David Howells <dhowells@redhat.com>
2017-03-16afs: Fix page leak in afs_write_begin()David Howells
afs_write_begin() leaks a ref and a lock on a page if afs_fill_page() fails. Fix the leak by unlocking and releasing the page in the error path. Signed-off-by: David Howells <dhowells@redhat.com>
2017-03-16afs: Don't set PG_error on local EINTR or ENOMEM when filling a pageDavid Howells
Don't set PG_error on a page if we get local EINTR or ENOMEM when filling a page for writing. Signed-off-by: David Howells <dhowells@redhat.com>
2017-03-16afs: Populate and use client modification timeMarc Dionne
The inode timestamps should be set from the client time in the status received from the server, rather than the server time which is meant for internal server use. Set AFS_SET_MTIME and populate the mtime for operations that take an input status, such as file/dir creation and StoreData. If an input time is not provided the server will set the vnode times based on the current server time. In a situation where the server has some skew with the client, this could lead to the client seeing a timestamp in the future for a file that it just created or wrote. Signed-off-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com>
2017-03-16afs: Better abort and net error handlingDavid Howells
If we receive a network error, a remote abort or a protocol error whilst we're still transmitting data, make sure we return an appropriate error to the caller rather than ESHUTDOWN or ECONNABORTED. Signed-off-by: David Howells <dhowells@redhat.com>
2017-03-16afs: Invalid op ID should abort with RXGEN_OPCODEDavid Howells
When we are given an invalid operation ID, we should abort that with RXGEN_OPCODE rather than RX_INVALID_OPERATION. Also map RXGEN_OPCODE to -ENOTSUPP. Signed-off-by: David Howells <dhowells@redhat.com>
2017-03-16afs: Fix the maths in afs_fs_store_data()David Howells
afs_fs_store_data() works out of the size of the write it's going to make, but it uses 32-bit unsigned subtraction in one place that gets automatically cast to loff_t. However, if to < offset, then the number goes negative, but as the result isn't signed, this doesn't get sign-extended to 64-bits when placed in a loff_t. Fix by casting the operands to loff_t. Signed-off-by: David Howells <dhowells@redhat.com>
2017-03-16afs: Use a bvec rather than a kvec in afs_send_pages()David Howells
Use a bvec rather than a kvec in afs_send_pages() as we don't then have to call kmap() in advance. This allows us to pass the array of contiguous pages that we extracted through to rxrpc in one go rather than passing a single page at a time. Signed-off-by: David Howells <dhowells@redhat.com>
2017-03-16afs: Make struct afs_read::remain 64-bitDavid Howells
Make struct afs_read::remain 64-bit so that it can handle huge transfers if we ever request them or the server decides to give us a bit extra data (the other fields there are already 64-bit). Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Marc Dionne <marc.dionne@auristor.com>
2017-03-16afs: Fix AFS read bugDavid Howells
Fix a bug in AFS read whereby the request page afs_read::index isn't incremented after calling ->page_done() if ->remain reaches 0, indicating that the data read is complete. Without this a NULL pointer exception happens when ->page_done() is called twice for the last page because the page clearing loop will call it also and afs_readpages_page_done() clears the current entry in the page list. BUG: unable to handle kernel NULL pointer dereference at (null) IP: afs_readpages_page_done+0x21/0xa4 [kafs] PGD 0 Oops: 0002 [#1] SMP Modules linked in: kafs(E) CPU: 2 PID: 3002 Comm: md5sum Tainted: G E 4.10.0-fscache #485 Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014 task: ffff8804017d86c0 task.stack: ffff8803fc1d8000 RIP: 0010:afs_readpages_page_done+0x21/0xa4 [kafs] RSP: 0018:ffff8803fc1db978 EFLAGS: 00010282 RAX: ffff880405d39af8 RBX: 0000000000000000 RCX: ffff880407d83ed4 RDX: 0000000000000000 RSI: ffff880405d39a00 RDI: ffff880405c6f400 RBP: ffff8803fc1db988 R08: 0000000000000000 R09: 0000000000000001 R10: ffff8803fc1db820 R11: ffff88040cf56000 R12: ffff8804088f1780 R13: ffff8804017d86c0 R14: ffff8804088f1780 R15: 0000000000003840 FS: 00007f8154469700(0000) GS:ffff88041fb00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000004016ec000 CR4: 00000000001406e0 Call Trace: afs_deliver_fs_fetch_data+0x5b9/0x60e [kafs] ? afs_make_call+0x316/0x4e8 [kafs] ? afs_make_call+0x359/0x4e8 [kafs] afs_deliver_to_call+0x173/0x2e8 [kafs] ? afs_make_call+0x316/0x4e8 [kafs] afs_make_call+0x37a/0x4e8 [kafs] ? wake_up_q+0x4f/0x4f ? __init_waitqueue_head+0x36/0x49 afs_fs_fetch_data+0x21c/0x227 [kafs] ? afs_fs_fetch_data+0x21c/0x227 [kafs] afs_vnode_fetch_data+0xf3/0x1d2 [kafs] afs_readpages+0x314/0x3fd [kafs] __do_page_cache_readahead+0x208/0x2c5 ondemand_readahead+0x3a2/0x3b7 ? ondemand_readahead+0x3a2/0x3b7 page_cache_async_readahead+0x5e/0x67 generic_file_read_iter+0x23b/0x70c ? __inode_security_revalidate+0x2f/0x62 __vfs_read+0xc4/0xe8 vfs_read+0xd1/0x15a SyS_read+0x4c/0x89 do_syscall_64+0x80/0x191 entry_SYSCALL64_slow_path+0x25/0x25 Reported-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Marc Dionne <marc.dionne@auristor.com>
2017-03-16afs: Prevent callback expiry timer overflowTina Ruchandani
get_seconds() returns real wall-clock seconds. On 32-bit systems this value will overflow in year 2038 and beyond. This patch changes afs_vnode record to use ktime_get_real_seconds() instead, for the fields cb_expires and cb_expires_at. Signed-off-by: Tina Ruchandani <ruchandani.tina@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com>
2017-03-16afs: Migrate vlocation fields to 64-bitTina Ruchandani
get_seconds() returns real wall-clock seconds. On 32-bit systems this value will overflow in year 2038 and beyond. This patch changes afs's vlocation record to use ktime_get_real_seconds() instead, for the fields time_of_death and update_at. Signed-off-by: Tina Ruchandani <ruchandani.tina@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com>
2017-03-16afs: security: Replace rcu_assign_pointer() with RCU_INIT_POINTER()Andreea-Cristina Bernat
The use of "rcu_assign_pointer()" is NULLing out the pointer. According to RCU_INIT_POINTER()'s block comment: "1. This use of RCU_INIT_POINTER() is NULLing out the pointer" it is better to use it instead of rcu_assign_pointer() because it has a smaller overhead. The following Coccinelle semantic patch was used: @@ @@ - rcu_assign_pointer + RCU_INIT_POINTER (..., NULL) Signed-off-by: Andreea-Cristina Bernat <bernat.ada@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com>
2017-03-16afs: inode: Replace rcu_assign_pointer() with RCU_INIT_POINTER()Andreea-Cristina Bernat
The use of "rcu_assign_pointer()" is NULLing out the pointer. According to RCU_INIT_POINTER()'s block comment: "1. This use of RCU_INIT_POINTER() is NULLing out the pointer" it is better to use it instead of rcu_assign_pointer() because it has a smaller overhead. The following Coccinelle semantic patch was used: @@ @@ - rcu_assign_pointer + RCU_INIT_POINTER (..., NULL) Signed-off-by: Andreea-Cristina Bernat <bernat.ada@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com>
2017-03-16afs: Distinguish mountpoints from symlinks by file mode aloneDavid Howells
In AFS, mountpoints appear as symlinks with mode 0644 and normal symlinks have mode 0777, so use this to distinguish them rather than reading the content and parsing it. In the case of a mountpoint, the symlink body is a formatted string indicating the location of the target volume. Note that with this, kAFS no longer 'pre-fetches' the contents of symlinks, so afs_readpage() may fail with an access-denial because when the VFS calls d_automount(), it wraps the call in an credentials override that sets the initial creds - thereby preventing access to the caller's keyrings and the authentication keys held therein. To this end, a patch reverting that change to the VFS is required also. Reported-by: Jeffrey Altman <jaltman@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com>
2017-03-16afs: Flush outstanding writes when an fd is closedDavid Howells
Flush outstanding writes in afs when an fd is closed. This is what NFS and CIFS do. Reported-by: Marc Dionne <marc.c.dionne@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com>
2017-03-16afs: Handle a short write to an AFS pageDavid Howells
Handle the situation where afs_write_begin() is told to expect that a full-page write will be made, but this doesn't happen (EFAULT, CTRL-C, etc.), and so afs_write_end() sees a partial write took place. Currently, no attempt is to deal with the discrepency. Fix this by loading the gap from the server. Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: David Howells <dhowells@redhat.com>