summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)Author
2020-02-14nvme/pci: move cqe check after device shutdownKeith Busch
Many users have reported nvme triggered irq_startup() warnings during shutdown. The driver uses the nvme queue's irq to synchronize scanning for completions, and enabling an interrupt affined to only offline CPUs triggers the alarming warning. Move the final CQE check to after disabling the device and all registered interrupts have been torn down so that we do not have any IRQ to synchronize. Link: https://bugzilla.kernel.org/show_bug.cgi?id=206509 Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-02-14nvme: prevent warning triggered by nvme_stop_keep_aliveNigel Kirkland
Delayed keep alive work is queued on system workqueue and may be cancelled via nvme_stop_keep_alive from nvme_reset_wq, nvme_fc_wq or nvme_wq. Check_flush_dependency detects mismatched attributes between the work-queue context used to cancel the keep alive work and system-wq. Specifically system-wq does not have the WQ_MEM_RECLAIM flag, whereas the contexts used to cancel keep alive work have WQ_MEM_RECLAIM flag. Example warning: workqueue: WQ_MEM_RECLAIM nvme-reset-wq:nvme_fc_reset_ctrl_work [nvme_fc] is flushing !WQ_MEM_RECLAIM events:nvme_keep_alive_work [nvme_core] To avoid the flags mismatch, delayed keep alive work is queued on nvme_wq. However this creates a secondary concern where work and a request to cancel that work may be in the same work queue - namely err_work in the rdma and tcp transports, which will want to flush/cancel the keep alive work which will now be on nvme_wq. After reviewing the transports, it looks like err_work can be moved to nvme_reset_wq. In fact that aligns them better with transition into RESETTING and performing related reset work in nvme_reset_wq. Change nvme-rdma and nvme-tcp to perform err_work in nvme_reset_wq. Signed-off-by: Nigel Kirkland <nigel.kirkland@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-02-14nvme/tcp: fix bug on double requeue when send failsAnton Eidelman
When nvme_tcp_io_work() fails to send to socket due to connection close/reset, error_recovery work is triggered from nvme_tcp_state_change() socket callback. This cancels all the active requests in the tagset, which requeues them. The failed request, however, was ended and thus requeued individually as well unless send returned -EPIPE. Another return code to be treated the same way is -ECONNRESET. Double requeue caused BUG_ON(blk_queued_rq(rq)) in blk_mq_requeue_request() from either the individual requeue of the failed request or the bulk requeue from blk_mq_tagset_busy_iter(, nvme_cancel_request, ); Signed-off-by: Anton Eidelman <anton@lightbitslabs.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-02-13bcache: remove macro nr_to_fifo_front()Coly Li
Macro nr_to_fifo_front() is only used once in btree_flush_write(), it is unncessary indeed. This patch removes this macro and does calculation directly in place. Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-02-13bcache: Revert "bcache: shrink btree node cache after bch_btree_check()"Coly Li
This reverts commit 1df3877ff6a4810054237c3259d900ded4468969. In my testing, sometimes even all the cached btree nodes are freed, creating gc and allocator kernel threads may still fail. Finally it turns out that kthread_run() may fail if there is pending signal for current task. And the pending signal is sent from OOM killer which is triggered by memory consuption in bch_btree_check(). Therefore explicitly shrinking bcache btree node here does not help, and after the shrinker callback is improved, as well as pending signals are ignored before creating kernel threads, now such operation is unncessary anymore. This patch reverts the commit 1df3877ff6a4 ("bcache: shrink btree node cache after bch_btree_check()") because we have better improvement now. Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-02-13bcache: ignore pending signals when creating gc and allocator threadColy Li
When run a cache set, all the bcache btree node of this cache set will be checked by bch_btree_check(). If the bcache btree is very large, iterating all the btree nodes will occupy too much system memory and the bcache registering process might be selected and killed by system OOM killer. kthread_run() will fail if current process has pending signal, therefore the kthread creating in run_cache_set() for gc and allocator kernel threads are very probably failed for a very large bcache btree. Indeed such OOM is safe and the registering process will exit after the registration done. Therefore this patch flushes pending signals during the cache set start up, specificly in bch_cache_allocator_start() and bch_gc_thread_start(), to make sure run_cache_set() won't fail for large cahced data set. Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-02-04Merge branch 'nvme-5.6' of git://git.infradead.org/nvme into block-5.6Jens Axboe
Pull NVMe fixes from Keith. * 'nvme-5.6' of git://git.infradead.org/nvme: nvmet: update AEN list and array at one place nvmet: Fix controller use after free nvmet: Fix error print message at nvmet_install_queue function nvme-pci: remove nvmeq->tags nvmet: fix dsm failure when payload does not match sgl descriptor nvmet: Pass lockdep expression to RCU lists
2020-02-05nvmet: update AEN list and array at one placeDaniel Wagner
All async events are enqueued via nvmet_add_async_event() which updates the ctrl->async_event_cmds[] array and additionally an struct nvmet_async_event is added to the ctrl->async_events list. Under normal operations the nvmet_async_event_work() updates again the ctrl->async_event_cmds and removes the corresponding struct nvmet_async_event from the list again. Though nvmet_sq_destroy() could be called which calls nvmet_async_events_free() which only updates the ctrl->async_event_cmds[] array. Add new functions nvmet_async_events_process() and nvmet_async_events_free() to process async events, update an array and the list. When we destroy submission queue after clearing the aen present on the ctrl->async list we also loop over ctrl->async_event_cmds[] for any requests posted by the host for which we don't have the AEN in the ctrl->async_events list by calling nvmet_async_event_process() and nvmet_async_events_free(). Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Daniel Wagner <dwagner@suse.de> [chaitanya.kulkarni@wdc.com * Loop over and clear out outstanding requests * Update changelog ] Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2020-02-05nvmet: Fix controller use after freeIsrael Rukshin
After nvmet_install_queue() sets sq->ctrl calling to nvmet_sq_destroy() reduces the controller refcount. In case nvmet_install_queue() fails, calling to nvmet_ctrl_put() is done twice (at nvmet_sq_destroy and nvmet_execute_io_connect/nvmet_execute_admin_connect) instead of once for the queue which leads to use after free of the controller. Fix this by set NULL at sq->ctrl in case of a failure at nvmet_install_queue(). The bug leads to the following Call Trace: [65857.994862] refcount_t: underflow; use-after-free. [65858.108304] Workqueue: events nvmet_rdma_release_queue_work [nvmet_rdma] [65858.115557] RIP: 0010:refcount_warn_saturate+0xe5/0xf0 [65858.208141] Call Trace: [65858.211203] nvmet_sq_destroy+0xe1/0xf0 [nvmet] [65858.216383] nvmet_rdma_release_queue_work+0x37/0xf0 [nvmet_rdma] [65858.223117] process_one_work+0x167/0x370 [65858.227776] worker_thread+0x49/0x3e0 [65858.232089] kthread+0xf5/0x130 [65858.235895] ? max_active_store+0x80/0x80 [65858.240504] ? kthread_bind+0x10/0x10 [65858.244832] ret_from_fork+0x1f/0x30 [65858.249074] ---[ end trace f82d59250b54beb7 ]--- Fixes: bb1cc74790eb ("nvmet: implement valid sqhd values in completions") Fixes: 1672ddb8d691 ("nvmet: Add install_queue callout") Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2020-02-05nvmet: Fix error print message at nvmet_install_queue functionIsrael Rukshin
Place the arguments in the correct order. Fixes: 1672ddb8d691 ("nvmet: Add install_queue callout") Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2020-02-04brd: check and limit max_part parZhiqiang Liu
In brd_init func, rd_nr num of brd_device are firstly allocated and add in brd_devices, then brd_devices are traversed to add each brd_device by calling add_disk func. When allocating brd_device, the disk->first_minor is set to i * max_part, if rd_nr * max_part is larger than MINORMASK, two different brd_device may have the same devt, then only one of them can be successfully added. when rmmod brd.ko, it will cause oops when calling brd_exit. Follow those steps: # modprobe brd rd_nr=3 rd_size=102400 max_part=1048576 # rmmod brd then, the oops will appear. Oops log: [ 726.613722] Call trace: [ 726.614175] kernfs_find_ns+0x24/0x130 [ 726.614852] kernfs_find_and_get_ns+0x44/0x68 [ 726.615749] sysfs_remove_group+0x38/0xb0 [ 726.616520] blk_trace_remove_sysfs+0x1c/0x28 [ 726.617320] blk_unregister_queue+0x98/0x100 [ 726.618105] del_gendisk+0x144/0x2b8 [ 726.618759] brd_exit+0x68/0x560 [brd] [ 726.619501] __arm64_sys_delete_module+0x19c/0x2a0 [ 726.620384] el0_svc_common+0x78/0x130 [ 726.621057] el0_svc_handler+0x38/0x78 [ 726.621738] el0_svc+0x8/0xc [ 726.622259] Code: aa0203f6 aa0103f7 aa1e03e0 d503201f (7940e260) Here, we add brd_check_and_reset_par func to check and limit max_part par. -- V5->V6: - remove useless code V4->V5:(suggested by Ming Lei) - make sure max_part is not larger than DISK_MAX_PARTS V3->V4:(suggested by Ming Lei) - remove useless change - add one limit of max_part V2->V3: (suggested by Ming Lei) - clear .minors when running out of consecutive minor space in brd_alloc - remove limit of rd_nr V1->V2: - add more checks in brd_check_par_valid as suggested by Ming Lei. Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com> Reviewed-by: Bob Liu <bob.liu@oracle.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-02-04nvme-pci: remove nvmeq->tagsChristoph Hellwig
There is no real need to have a pointer to the tagset in struct nvme_queue, as we only need it in a single place, and that place can derive the used tagset from the device and qid trivially. This fixes a problem with stale pointer exposure when tagsets are reset, and also shrinks the nvme_queue structure. It also matches what most other transports have done since day 1. Reported-by: Edmund Nadolski <edmund.nadolski@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2020-02-04nvmet: fix dsm failure when payload does not match sgl descriptorSagi Grimberg
The host is allowed to pass the controller an sgl describing a buffer that is larger than the dsm payload itself, allow it when executing dsm. Reported-by: Dakshaja Uppalapati <dakshaja@chelsio.com> Reviewed-by: Christoph Hellwig <hch@lst.de>, Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org>
2020-02-04nvmet: Pass lockdep expression to RCU listsAmol Grover
ctrl->subsys->namespaces and subsys->namespaces are traversed with list_for_each_entry_rcu outside an RCU read-side critical section but under the protection of ctrl->subsys->lock and subsys->lock respectively. Hence, add the corresponding lockdep expression to the list traversal primitive to silence false-positive lockdep warnings, and harden RCU lists. Reported-by: kbuild test robot <lkp@intel.com> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Amol Grover <frextrite@gmail.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2020-02-03block, bfq: clarify the goal of bfq_split_bfqq()Paolo Valente
The exact, general goal of the function bfq_split_bfqq() is not that apparent. Add a comment to make it clear. Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-02-03block, bfq: get a ref to a group when adding it to a service treePaolo Valente
BFQ schedules generic entities, which may represent either bfq_queues or groups of bfq_queues. When an entity is inserted into a service tree, a reference must be taken, to make sure that the entity does not disappear while still referred in the tree. Unfortunately, such a reference is mistakenly taken only if the entity represents a bfq_queue. This commit takes a reference also in case the entity represents a group. Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Tested-by: Chris Evich <cevich@redhat.com> Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-02-03block, bfq: remove ifdefs from around gets/puts of bfq groupsPaolo Valente
ifdefs around gets and puts of bfq groups reduce readability, remove them. Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Reported-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-02-03block, bfq: extend incomplete name of field on_stPaolo Valente
The flag on_st in the bfq_entity data structure is true if the entity is on a service tree or is in service. Yet the name of the field, confusingly, does not mention the second, very important case. Extend the name to mention the second case too. Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-02-03block, bfq: get extra ref to prevent a queue from being freed during a group ↵Paolo Valente
move In bfq_bfqq_move(), the bfq_queue, say Q, to be moved to a new group may happen to be deactivated in the scheduling data structures of the source group (and then activated in the destination group). If Q is referred only by the data structures in the source group when the deactivation happens, then Q is freed upon the deactivation. This commit addresses this issue by getting an extra reference before the possible deactivation, and releasing this extra reference after Q has been moved. Tested-by: Chris Evich <cevich@redhat.com> Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-02-03block, bfq: do not insert oom queue into position treePaolo Valente
BFQ maintains an ordered list, implemented with an RB tree, of head-request positions of non-empty bfq_queues. This position tree, inherited from CFQ, is used to find bfq_queues that contain I/O close to each other. BFQ merges these bfq_queues into a single shared queue, if this boosts throughput on the device at hand. There is however a special-purpose bfq_queue that does not participate in queue merging, the oom bfq_queue. Yet, also this bfq_queue could be wrongly added to the position tree. So bfqq_find_close() could return the oom bfq_queue, which is a source of further troubles in an out-of-memory situation. This commit prevents the oom bfq_queue from being inserted into the position tree. Tested-by: Patrick Dung <patdung100@gmail.com> Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-02-03block, bfq: do not plug I/O for bfq_queues with no proc refsPaolo Valente
Commit 478de3380c1c ("block, bfq: deschedule empty bfq_queues not referred by any process") fixed commit 3726112ec731 ("block, bfq: re-schedule empty queues if they deserve I/O plugging") by descheduling an empty bfq_queue when it remains with not process reference. Yet, this still left a case uncovered: an empty bfq_queue with not process reference that remains in service. This happens for an in-service sync bfq_queue that is deemed to deserve I/O-dispatch plugging when it remains empty. Yet no new requests will arrive for such a bfq_queue if no process sends requests to it any longer. Even worse, the bfq_queue may happen to be prematurely freed while still in service (because there may remain no reference to it any longer). This commit solves this problem by preventing I/O dispatch from being plugged for the in-service bfq_queue, if the latter has no process reference (the bfq_queue is then prevented from remaining in service). Fixes: 3726112ec731 ("block, bfq: re-schedule empty queues if they deserve I/O plugging") Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Reported-by: Patrick Dung <patdung100@gmail.com> Tested-by: Patrick Dung <patdung100@gmail.com> Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-02-01bcache: check return value of prio_read()Coly Li
Now if prio_read() failed during starting a cache set, we can print out error message in run_cache_set() and handle the failure properly. Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-02-01bcache: fix incorrect data type usage in btree_flush_write()Coly Li
Dan Carpenter points out that from commit 2aa8c529387c ("bcache: avoid unnecessary btree nodes flushing in btree_flush_write()"), there is a incorrect data type usage which leads to the following static checker warning: drivers/md/bcache/journal.c:444 btree_flush_write() warn: 'ref_nr' unsigned <= 0 drivers/md/bcache/journal.c 422 static void btree_flush_write(struct cache_set *c) 423 { 424 struct btree *b, *t, *btree_nodes[BTREE_FLUSH_NR]; 425 unsigned int i, nr, ref_nr; ^^^^^^ 426 atomic_t *fifo_front_p, *now_fifo_front_p; 427 size_t mask; 428 429 if (c->journal.btree_flushing) 430 return; 431 432 spin_lock(&c->journal.flush_write_lock); 433 if (c->journal.btree_flushing) { 434 spin_unlock(&c->journal.flush_write_lock); 435 return; 436 } 437 c->journal.btree_flushing = true; 438 spin_unlock(&c->journal.flush_write_lock); 439 440 /* get the oldest journal entry and check its refcount */ 441 spin_lock(&c->journal.lock); 442 fifo_front_p = &fifo_front(&c->journal.pin); 443 ref_nr = atomic_read(fifo_front_p); 444 if (ref_nr <= 0) { ^^^^^^^^^^^ Unsigned can't be less than zero. 445 /* 446 * do nothing if no btree node references 447 * the oldest journal entry 448 */ 449 spin_unlock(&c->journal.lock); 450 goto out; 451 } 452 spin_unlock(&c->journal.lock); As the warning information indicates, local varaible ref_nr in unsigned int type is wrong, which does not matche atomic_read() and the "<= 0" checking. This patch fixes the above error by defining local variable ref_nr as int type. Fixes: 2aa8c529387c ("bcache: avoid unnecessary btree nodes flushing in btree_flush_write()") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-02-01bcache: add readahead cache policy options via sysfs interfaceColy Li
In year 2007 high performance SSD was still expensive, in order to save more space for real workload or meta data, the readahead I/Os for non-meta data was bypassed and not cached on SSD. In now days, SSD price drops a lot and people can find larger size SSD with more comfortable price. It is unncessary to alway bypass normal readahead I/Os to save SSD space for now. This patch adds options for readahead data cache policies via sysfs file /sys/block/bcache<N>/readahead_cache_policy, the options are, - "all": cache all readahead data I/Os. - "meta-only": only cache meta data, and bypass other regular I/Os. If users want to make bcache continue to only cache readahead request for metadata and bypass regular data readahead, please set "meta-only" to this sysfs file. By default, bcache will back to cache all read- ahead requests now. Cc: stable@vger.kernel.org Signed-off-by: Coly Li <colyli@suse.de> Acked-by: Eric Wheeler <bcache@linux.ewheeler.net> Cc: Michael Lyle <mlyle@lyle.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-02-01bcache: explicity type cast in bset_bkey_last()Coly Li
In bset.h, macro bset_bkey_last() is defined as, bkey_idx((struct bkey *) (i)->d, (i)->keys) Parameter i can be variable type of data structure, the macro always works once the type of struct i has member 'd' and 'keys'. bset_bkey_last() is also used in macro csum_set() to calculate the checksum of a on-disk data structure. When csum_set() is used to calculate checksum of on-disk bcache super block, the parameter 'i' data type is struct cache_sb_disk. Inside struct cache_sb_disk (also in struct cache_sb) the member keys is __u16 type. But bkey_idx() expects unsigned int (a 32bit width), so there is problem when sending parameters via stack to call bkey_idx(). Sparse tool from Intel 0day kbuild system reports this incompatible problem. bkey_idx() is part of user space API, so the simplest fix is to cast the (i)->keys to unsigned int type in macro bset_bkey_last(). Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-02-01bcache: fix memory corruption in bch_cache_accounting_clear()Coly Li
Commit 83ff9318c44ba ("bcache: not use hard coded memset size in bch_cache_accounting_clear()") tries to make the code more easy to understand by removing the hard coded number with following change, void bch_cache_accounting_clear(...) { memset(&acc->total.cache_hits, 0, - sizeof(unsigned long) * 7); + sizeof(struct cache_stats)); } Unfortunately the change was wrong (it also tells us the original code was not easy to correctly understand). The hard coded number 7 is used because in struct cache_stats, 15 struct cache_stats { 16 struct kobject kobj; 17 18 unsigned long cache_hits; 19 unsigned long cache_misses; 20 unsigned long cache_bypass_hits; 21 unsigned long cache_bypass_misses; 22 unsigned long cache_readaheads; 23 unsigned long cache_miss_collisions; 24 unsigned long sectors_bypassed; 25 26 unsigned int rescale; 27 }; only members in LINE 18-24 want to be set to 0. It is wrong to use 'sizeof(struct cache_stats)' to replace 'sizeof(unsigned long) * 7), the memory objects behind acc->total is staled by this change. Сорокин Артем Сергеевич reports that by the following steps, kernel panic will be triggered, 1. Create new set: make-bcache -B /dev/nvme1n1 -C /dev/sda --wipe-bcache 2. Run in /sys/fs/bcache/<uuid>: echo 1 > clear_stats && cat stats_five_minute/cache_bypass_hits I can reproduce the panic and get following dmesg with KASAN enabled, [22613.172742] ================================================================== [22613.172862] BUG: KASAN: null-ptr-deref in sysfs_kf_seq_show+0x117/0x230 [22613.172864] Read of size 8 at addr 0000000000000000 by task cat/6753 [22613.172870] CPU: 1 PID: 6753 Comm: cat Not tainted 5.5.0-rc7-lp151.28.16-default+ #11 [22613.172872] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/29/2019 [22613.172873] Call Trace: [22613.172964] dump_stack+0x8b/0xbb [22613.172968] ? sysfs_kf_seq_show+0x117/0x230 [22613.172970] ? sysfs_kf_seq_show+0x117/0x230 [22613.173031] __kasan_report+0x176/0x192 [22613.173064] ? pr_cont_kernfs_name+0x40/0x60 [22613.173067] ? sysfs_kf_seq_show+0x117/0x230 [22613.173070] kasan_report+0xe/0x20 [22613.173072] sysfs_kf_seq_show+0x117/0x230 [22613.173105] seq_read+0x199/0x6d0 [22613.173110] vfs_read+0xa5/0x1a0 [22613.173113] ksys_read+0x110/0x160 [22613.173115] ? kernel_write+0xb0/0xb0 [22613.173177] do_syscall_64+0x77/0x290 [22613.173238] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [22613.173241] RIP: 0033:0x7fc2c886ac61 [22613.173244] Code: fe ff ff 48 8d 3d c7 a0 09 00 48 83 ec 08 e8 46 03 02 00 66 0f 1f 44 00 00 8b 05 ca fb 2c 00 48 63 ff 85 c0 75 13 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 57 f3 c3 0f 1f 44 00 00 55 53 48 89 d5 48 89 [22613.173245] RSP: 002b:00007ffebe776d68 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [22613.173248] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007fc2c886ac61 [22613.173249] RDX: 0000000000020000 RSI: 00007fc2c8cca000 RDI: 0000000000000003 [22613.173250] RBP: 0000000000020000 R08: ffffffffffffffff R09: 0000000000000000 [22613.173251] R10: 000000000000038c R11: 0000000000000246 R12: 00007fc2c8cca000 [22613.173253] R13: 0000000000000003 R14: 00007fc2c8cca00f R15: 0000000000020000 [22613.173255] ================================================================== [22613.173256] Disabling lock debugging due to kernel taint [22613.173350] BUG: kernel NULL pointer dereference, address: 0000000000000000 [22613.178380] #PF: supervisor read access in kernel mode [22613.180959] #PF: error_code(0x0000) - not-present page [22613.183444] PGD 0 P4D 0 [22613.184867] Oops: 0000 [#1] SMP KASAN PTI [22613.186797] CPU: 1 PID: 6753 Comm: cat Tainted: G B 5.5.0-rc7-lp151.28.16-default+ #11 [22613.191253] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/29/2019 [22613.196706] RIP: 0010:sysfs_kf_seq_show+0x117/0x230 [22613.199097] Code: ff 48 8b 0b 48 8b 44 24 08 48 01 e9 eb a6 31 f6 48 89 cf ba 00 10 00 00 48 89 4c 24 10 e8 b1 e6 e9 ff 4c 89 ff e8 19 07 ea ff <49> 8b 07 48 85 c0 48 89 44 24 08 0f 84 91 00 00 00 49 8b 6d 00 48 [22613.208016] RSP: 0018:ffff8881d4f8fd78 EFLAGS: 00010246 [22613.210448] RAX: 0000000000000000 RBX: ffff8881eb99b180 RCX: ffffffff810d9ef6 [22613.213691] RDX: 0000000000000001 RSI: 0000000000000246 RDI: 0000000000000246 [22613.216893] RBP: 0000000000001000 R08: fffffbfff072ddcd R09: fffffbfff072ddcd [22613.220075] R10: 0000000000000001 R11: fffffbfff072ddcc R12: ffff8881de5c0200 [22613.223256] R13: ffff8881ed175500 R14: ffff8881eb99b198 R15: 0000000000000000 [22613.226290] FS: 00007fc2c8d3d500(0000) GS:ffff8881f2a80000(0000) knlGS:0000000000000000 [22613.229637] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [22613.231993] CR2: 0000000000000000 CR3: 00000001ec89a004 CR4: 00000000003606e0 [22613.234909] Call Trace: [22613.235931] seq_read+0x199/0x6d0 [22613.237259] vfs_read+0xa5/0x1a0 [22613.239229] ksys_read+0x110/0x160 [22613.240590] ? kernel_write+0xb0/0xb0 [22613.242040] do_syscall_64+0x77/0x290 [22613.243625] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [22613.245450] RIP: 0033:0x7fc2c886ac61 [22613.246706] Code: fe ff ff 48 8d 3d c7 a0 09 00 48 83 ec 08 e8 46 03 02 00 66 0f 1f 44 00 00 8b 05 ca fb 2c 00 48 63 ff 85 c0 75 13 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 57 f3 c3 0f 1f 44 00 00 55 53 48 89 d5 48 89 [22613.253296] RSP: 002b:00007ffebe776d68 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [22613.255835] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007fc2c886ac61 [22613.258472] RDX: 0000000000020000 RSI: 00007fc2c8cca000 RDI: 0000000000000003 [22613.260807] RBP: 0000000000020000 R08: ffffffffffffffff R09: 0000000000000000 [22613.263188] R10: 000000000000038c R11: 0000000000000246 R12: 00007fc2c8cca000 [22613.265598] R13: 0000000000000003 R14: 00007fc2c8cca00f R15: 0000000000020000 [22613.268729] Modules linked in: scsi_transport_iscsi af_packet iscsi_ibft iscsi_boot_sysfs vmw_vsock_vmci_transport vsock fuse bnep kvm_intel kvm irqbypass crc32_pclmul crc32c_intel ghash_clmulni_intel snd_ens1371 snd_ac97_codec ac97_bus bcache snd_pcm btusb btrtl btbcm btintel crc64 aesni_intel glue_helper crypto_simd vmw_balloon cryptd bluetooth snd_timer snd_rawmidi snd joydev pcspkr e1000 rfkill vmw_vmci soundcore ecdh_generic ecc gameport i2c_piix4 mptctl ac button hid_generic usbhid sr_mod cdrom ata_generic ehci_pci vmwgfx uhci_hcd drm_kms_helper syscopyarea serio_raw sysfillrect sysimgblt fb_sys_fops ttm ehci_hcd mptspi scsi_transport_spi mptscsih ata_piix mptbase ahci usbcore libahci drm sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua [22613.292429] CR2: 0000000000000000 [22613.293563] ---[ end trace a074b26a8508f378 ]--- [22613.295138] RIP: 0010:sysfs_kf_seq_show+0x117/0x230 [22613.296769] Code: ff 48 8b 0b 48 8b 44 24 08 48 01 e9 eb a6 31 f6 48 89 cf ba 00 10 00 00 48 89 4c 24 10 e8 b1 e6 e9 ff 4c 89 ff e8 19 07 ea ff <49> 8b 07 48 85 c0 48 89 44 24 08 0f 84 91 00 00 00 49 8b 6d 00 48 [22613.303553] RSP: 0018:ffff8881d4f8fd78 EFLAGS: 00010246 [22613.305280] RAX: 0000000000000000 RBX: ffff8881eb99b180 RCX: ffffffff810d9ef6 [22613.307924] RDX: 0000000000000001 RSI: 0000000000000246 RDI: 0000000000000246 [22613.310272] RBP: 0000000000001000 R08: fffffbfff072ddcd R09: fffffbfff072ddcd [22613.312685] R10: 0000000000000001 R11: fffffbfff072ddcc R12: ffff8881de5c0200 [22613.315076] R13: ffff8881ed175500 R14: ffff8881eb99b198 R15: 0000000000000000 [22613.318116] FS: 00007fc2c8d3d500(0000) GS:ffff8881f2a80000(0000) knlGS:0000000000000000 [22613.320743] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [22613.322628] CR2: 0000000000000000 CR3: 00000001ec89a004 CR4: 00000000003606e0 Here this patch fixes the following problem by explicity set all the 7 members to 0 in bch_cache_accounting_clear(). Reported-by: Сорокин Артем Сергеевич <a.sorokin@bank-hlynov.ru> Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-29xen/blkfront: limit allocated memory size to actual use caseJuergen Gross
Today the Xen blkfront driver allocates memory for one struct blkfront_ring_info for each communication ring. This structure is statically sized for the maximum supported configuration resulting in a size of more than 90 kB. As the main size contributor is one array inside the struct, the memory allocation can easily be limited by moving this array to be the last structure element and to allocate only the memory for the actually needed array size. Acked-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-29nbd: add a flush_workqueue in nbd_start_deviceSun Ke
When kzalloc fail, may cause trying to destroy the workqueue from inside the workqueue. If num_connections is m (2 < m), and NO.1 ~ NO.n (1 < n < m) kzalloc are successful. The NO.(n + 1) failed. Then, nbd_start_device will return ENOMEM to nbd_start_device_ioctl, and nbd_start_device_ioctl will return immediately without running flush_workqueue. However, we still have n recv threads. If nbd_release run first, recv threads may have to drop the last config_refs and try to destroy the workqueue from inside the workqueue. To fix it, add a flush_workqueue in nbd_start_device. Fixes: e9e006f5fcf2 ("nbd: fix max number of supported devs") Signed-off-by: Sun Ke <sunke32@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-29drbd: fifo_alloc() should use struct_sizeStephen Kitt
Switching to struct_size for the allocation in fifo_alloc avoids hard-coding the type of fifo_buffer.values in fifo_alloc. It also provides overflow protection; to avoid pessimistic code being generated by the compiler as a result, this patch also switches fifo_size to unsigned, propagating the change as appropriate. Reviewed-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Stephen Kitt <steve@sk2.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-29MAINTAINERS: Add Revanth Rajashekar as a SED-Opal maintainerJon Derrick
Scott hasn't worked for Intel for some time and has already given us his blessing. CC: Scott Bauer <sbauer@plzdonthack.me> Signed-off-by: Revanth Rajashekar <revanth.rajashekar@intel.com> Signed-off-by: Jon Derrick <jonathan.derrick@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-29Merge tag 'docs-5.6' of git://git.lwn.net/linuxLinus Torvalds
Pull documentation updates from Jonathan Corbet: "It has been a relatively quiet cycle for documentation, but there's still a couple of things of note: - Conversion of the NFS documentation to RST - A new document on how to help with documentation (and a maintainer profile entry too) Plus the usual collection of typo fixes, etc" * tag 'docs-5.6' of git://git.lwn.net/linux: (40 commits) docs: filesystems: add overlayfs to index.rst docs: usb: remove some broken references scripts/find-unused-docs: Fix massive false positives docs: nvdimm: use ReST notation for subsection zram: correct documentation about sysfs node of huge page writeback Documentation: zram: various fixes in zram.rst Add a maintainer entry profile for documentation Add a document on how to contribute to the documentation docs: Keep up with the location of NoUri Documentation: Call out example SYM_FUNC_* usage as x86-specific Documentation: nfs: fault_injection: convert to ReST Documentation: nfs: pnfs-scsi-server: convert to ReST Documentation: nfs: convert pnfs-block-server to ReST Documentation: nfs: idmapper: convert to ReST Documentation: convert nfsd-admin-interfaces to ReST Documentation: nfs-rdma: convert to ReST Documentation: nfsroot.rst: COSMETIC: refill a paragraph Documentation: nfsroot.txt: convert to ReST Documentation: convert nfs.txt to ReST Documentation: filesystems: convert vfat.txt to RST ...
2020-01-29Merge tag 'linux-kselftest-5.6-rc1-kunit' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull Kselftest kunit updates from Shuah Khan: "This kunit update consists of: - Support for building kunit as a module from Alan Maguire - AppArmor KUnit tests for policy unpack from Mike Salvatore" * tag 'linux-kselftest-5.6-rc1-kunit' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: kunit: building kunit as a module breaks allmodconfig kunit: update documentation to describe module-based build kunit: allow kunit to be loaded as a module kunit: remove timeout dependence on sysctl_hung_task_timeout_seconds kunit: allow kunit tests to be loaded as a module kunit: hide unexported try-catch interface in try-catch-impl.h kunit: move string-stream.h to lib/kunit apparmor: add AppArmor KUnit tests for policy unpack
2020-01-29Merge tag 'linux-kselftest-5.6-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull Kselftest update from Shuah Khan: "This Kselftest update consists of several fixes to framework and individual tests. In addition, it enables LKDTM tests adding lkdtm target to kselftest Makefile" * tag 'linux-kselftest-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: selftests/ftrace: fix glob selftest selftests: settings: tests can be in subsubdirs kselftest: Minimise dependency of get_size on C library interfaces selftests/livepatch: Remove unused local variable in set_ftrace_enabled() selftests/livepatch: Replace set_dynamic_debug() with setup_config() in README selftests/lkdtm: Add tests for LKDTM targets selftests: Uninitialized variable in test_cgcore_proc_migration() selftests: fix build behaviour on targets' failures
2020-01-29Merge tag 'y2038-drivers-for-v5.6-signed' of ↵Linus Torvalds
git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground Pull y2038 updates from Arnd Bergmann: "Core, driver and file system changes These are updates to device drivers and file systems that for some reason or another were not included in the kernel in the previous y2038 series. I've gone through all users of time_t again to make sure the kernel is in a long-term maintainable state, replacing all remaining references to time_t with safe alternatives. Some related parts of the series were picked up into the nfsd, xfs, alsa and v4l2 trees. A final set of patches in linux-mm removes the now unused time_t/timeval/timespec types and helper functions after all five branches are merged for linux-5.6, ensuring that no new users get merged. As a result, linux-5.6, or my backport of the patches to 5.4 [1], should be the first release that can serve as a base for a 32-bit system designed to run beyond year 2038, with a few remaining caveats: - All user space must be compiled with a 64-bit time_t, which will be supported in the coming musl-1.2 and glibc-2.32 releases, along with installed kernel headers from linux-5.6 or higher. - Applications that use the system call interfaces directly need to be ported to use the time64 syscalls added in linux-5.1 in place of the existing system calls. This impacts most users of futex() and seccomp() as well as programming languages that have their own runtime environment not based on libc. - Applications that use a private copy of kernel uapi header files or their contents may need to update to the linux-5.6 version, in particular for sound/asound.h, xfs/xfs_fs.h, linux/input.h, linux/elfcore.h, linux/sockios.h, linux/timex.h and linux/can/bcm.h. - A few remaining interfaces cannot be changed to pass a 64-bit time_t in a compatible way, so they must be configured to use CLOCK_MONOTONIC times or (with a y2106 problem) unsigned 32-bit timestamps. Most importantly this impacts all users of 'struct input_event'. - All y2038 problems that are present on 64-bit machines also apply to 32-bit machines. In particular this affects file systems with on-disk timestamps using signed 32-bit seconds: ext4 with ext3-style small inodes, ext2, xfs (to be fixed soon) and ufs" [1] https://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground.git/log/?h=y2038-endgame * tag 'y2038-drivers-for-v5.6-signed' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: (21 commits) Revert "drm/etnaviv: reject timeouts with tv_nsec >= NSEC_PER_SEC" y2038: sh: remove timeval/timespec usage from headers y2038: sparc: remove use of struct timex y2038: rename itimerval to __kernel_old_itimerval y2038: remove obsolete jiffies conversion functions nfs: fscache: use timespec64 in inode auxdata nfs: fix timstamp debug prints nfs: use time64_t internally sunrpc: convert to time64_t for expiry drm/etnaviv: avoid deprecated timespec drm/etnaviv: reject timeouts with tv_nsec >= NSEC_PER_SEC drm/msm: avoid using 'timespec' hfs/hfsplus: use 64-bit inode timestamps hostfs: pass 64-bit timestamps to/from user space packet: clarify timestamp overflow tsacct: add 64-bit btime field acct: stop using get_seconds() um: ubd: use 64-bit time_t where possible xtensa: ISS: avoid struct timeval dlm: use SO_SNDTIMEO_NEW instead of SO_SNDTIMEO_OLD ...
2020-01-29Merge tag 'printk-for-5.6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk Pull printk update from Petr Mladek: "Prevent replaying log on all consoles" * tag 'printk-for-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk: printk: fix exclusive_console replaying
2020-01-29Merge tag 'erofs-for-5.6-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs updates from Gao Xiang: "A regression fix, several cleanups and (maybe) plus an upcoming new mount api convert patch as a part of vfs update are considered available for this cycle. All commits have been in linux-next and tested with no smoke out. Summary: - fix an out-of-bound read access introduced in v5.3, which could rarely cause data corruption - various cleanup patches" * tag 'erofs-for-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: clean up z_erofs_submit_queue() erofs: fold in postsubmit_is_all_bypassed() erofs: fix out-of-bound read for shifted uncompressed block erofs: remove void tagging/untagging of workgroup pointers erofs: remove unused tag argument while registering a workgroup erofs: remove unused tag argument while finding a workgroup erofs: correct indentation of an assigned structure inside a function
2020-01-29Merge branch 'work.adfs' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull adfs updates from Al Viro: "adfs stuff for this cycle" * 'work.adfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (42 commits) fs/adfs: bigdir: Fix an error code in adfs_fplus_read() Documentation: update adfs filesystem documentation fs/adfs: mostly divorse inode number from indirect disc address fs/adfs: super: add support for E and E+ floppy image formats fs/adfs: super: extract filesystem block probe fs/adfs: dir: remove debug in adfs_dir_update() fs/adfs: super: fix inode dropping fs/adfs: bigdir: implement directory update support fs/adfs: bigdir: calculate and validate directory checkbyte fs/adfs: bigdir: directory validation strengthening fs/adfs: bigdir: extract directory validation fs/adfs: bigdir: factor out directory entry offset calculation fs/adfs: newdir: split out directory commit from update fs/adfs: newdir: clean up adfs_f_update() fs/adfs: newdir: merge adfs_dir_read() into adfs_f_read() fs/adfs: newdir: improve directory validation fs/adfs: newdir: factor out directory format validation fs/adfs: dir: use pointers to access directory head/tails fs/adfs: dir: add more efficient iterate() per-format method fs/adfs: dir: switch to iterate_shared method ...
2020-01-29Merge branch 'work.openat2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull openat2 support from Al Viro: "This is the openat2() series from Aleksa Sarai. I'm afraid that the rest of namei stuff will have to wait - it got zero review the last time I'd posted #work.namei, and there had been a leak in the posted series I'd caught only last weekend. I was going to repost it on Monday, but the window opened and the odds of getting any review during that... Oh, well. Anyway, openat2 part should be ready; that _did_ get sane amount of review and public testing, so here it comes" From Aleksa's description of the series: "For a very long time, extending openat(2) with new features has been incredibly frustrating. This stems from the fact that openat(2) is possibly the most famous counter-example to the mantra "don't silently accept garbage from userspace" -- it doesn't check whether unknown flags are present[1]. This means that (generally) the addition of new flags to openat(2) has been fraught with backwards-compatibility issues (O_TMPFILE has to be defined as __O_TMPFILE|O_DIRECTORY|[O_RDWR or O_WRONLY] to ensure old kernels gave errors, since it's insecure to silently ignore the flag[2]). All new security-related flags therefore have a tough road to being added to openat(2). Furthermore, the need for some sort of control over VFS's path resolution (to avoid malicious paths resulting in inadvertent breakouts) has been a very long-standing desire of many userspace applications. This patchset is a revival of Al Viro's old AT_NO_JUMPS[3] patchset (which was a variant of David Drysdale's O_BENEATH patchset[4] which was a spin-off of the Capsicum project[5]) with a few additions and changes made based on the previous discussion within [6] as well as others I felt were useful. In line with the conclusions of the original discussion of AT_NO_JUMPS, the flag has been split up into separate flags. However, instead of being an openat(2) flag it is provided through a new syscall openat2(2) which provides several other improvements to the openat(2) interface (see the patch description for more details). The following new LOOKUP_* flags are added: LOOKUP_NO_XDEV: Blocks all mountpoint crossings (upwards, downwards, or through absolute links). Absolute pathnames alone in openat(2) do not trigger this. Magic-link traversal which implies a vfsmount jump is also blocked (though magic-link jumps on the same vfsmount are permitted). LOOKUP_NO_MAGICLINKS: Blocks resolution through /proc/$pid/fd-style links. This is done by blocking the usage of nd_jump_link() during resolution in a filesystem. The term "magic-links" is used to match with the only reference to these links in Documentation/, but I'm happy to change the name. It should be noted that this is different to the scope of ~LOOKUP_FOLLOW in that it applies to all path components. However, you can do openat2(NO_FOLLOW|NO_MAGICLINKS) on a magic-link and it will *not* fail (assuming that no parent component was a magic-link), and you will have an fd for the magic-link. In order to correctly detect magic-links, the introduction of a new LOOKUP_MAGICLINK_JUMPED state flag was required. LOOKUP_BENEATH: Disallows escapes to outside the starting dirfd's tree, using techniques such as ".." or absolute links. Absolute paths in openat(2) are also disallowed. Conceptually this flag is to ensure you "stay below" a certain point in the filesystem tree -- but this requires some additional to protect against various races that would allow escape using "..". Currently LOOKUP_BENEATH implies LOOKUP_NO_MAGICLINKS, because it can trivially beam you around the filesystem (breaking the protection). In future, there might be similar safety checks done as in LOOKUP_IN_ROOT, but that requires more discussion. In addition, two new flags are added that expand on the above ideas: LOOKUP_NO_SYMLINKS: Does what it says on the tin. No symlink resolution is allowed at all, including magic-links. Just as with LOOKUP_NO_MAGICLINKS this can still be used with NOFOLLOW to open an fd for the symlink as long as no parent path had a symlink component. LOOKUP_IN_ROOT: This is an extension of LOOKUP_BENEATH that, rather than blocking attempts to move past the root, forces all such movements to be scoped to the starting point. This provides chroot(2)-like protection but without the cost of a chroot(2) for each filesystem operation, as well as being safe against race attacks that chroot(2) is not. If a race is detected (as with LOOKUP_BENEATH) then an error is generated, and similar to LOOKUP_BENEATH it is not permitted to cross magic-links with LOOKUP_IN_ROOT. The primary need for this is from container runtimes, which currently need to do symlink scoping in userspace[7] when opening paths in a potentially malicious container. There is a long list of CVEs that could have bene mitigated by having RESOLVE_THIS_ROOT (such as CVE-2017-1002101, CVE-2017-1002102, CVE-2018-15664, and CVE-2019-5736, just to name a few). In order to make all of the above more usable, I'm working on libpathrs[8] which is a C-friendly library for safe path resolution. It features a userspace-emulated backend if the kernel doesn't support openat2(2). Hopefully we can get userspace to switch to using it, and thus get openat2(2) support for free once it's ready. Future work would include implementing things like RESOLVE_NO_AUTOMOUNT and possibly a RESOLVE_NO_REMOTE (to allow programs to be sure they don't hit DoSes though stale NFS handles)" * 'work.openat2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: Documentation: path-lookup: include new LOOKUP flags selftests: add openat2(2) selftests open: introduce openat2(2) syscall namei: LOOKUP_{IN_ROOT,BENEATH}: permit limited ".." resolution namei: LOOKUP_IN_ROOT: chroot-like scoped resolution namei: LOOKUP_BENEATH: O_BENEATH-like scoped resolution namei: LOOKUP_NO_XDEV: block mountpoint crossing namei: LOOKUP_NO_MAGICLINKS: block magic-link resolution namei: LOOKUP_NO_SYMLINKS: block symlink resolution namei: allow set_root() to produce errors namei: allow nd_jump_link() to produce errors nsfs: clean-up ns_get_path() signature to return int namei: only return -ECHILD from follow_dotdot_rcu()
2020-01-29Merge branch 'urgent-for-mingo' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu Pull RCU warning removal from Paul McKenney: "A single commit that fixes an embarrassing bug discussed here: https://lore.kernel.org/lkml/20200125131425.GB16136@zn.tnic/ which apparently also affects smaller systems" [ This was sent to Ingo, but since I see the issue on the laptop I use for testing during the merge window, I'm doing the pull directly - Linus ] * 'urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: rcu: Forgive slow expedited grace periods at boot time
2020-01-29Merge branch 'core-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core fixes from Ingo Molnar: "Three objtool fixes, plus marking SFI as obsolete" * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool: Skip samples subdirectory objtool: Fix ARCH=x86_64 build error objtool: Silence build output MAINTAINERS: Mark simple firmware interface (SFI) obsolete
2020-01-29Merge tag 'char-misc-5.6-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver updates from Greg KH: "Here is the big char/misc/whatever driver changes for 5.6-rc1 Included in here are loads of things fro