summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)Author
2018-10-18md-cluster/raid10: support add disk under grow modeGuoqing Jiang
For clustered raid10 scenario, we need to let all the nodes know about that a new disk is added to the array, and the reshape caused by add new member just need to be happened in one node, but other nodes should know about the change. Since reshape means read data from somewhere (which is already used by array) and write data to unused region. Obviously, it is awful if one node is reading data from address while another node is writing to the same address. Considering we have implemented suspend writes in the resyncing area, so we can just broadcast the reading address to other nodes to avoid the trouble. For master node, it would call reshape_request then update sb during the reshape period. To avoid above trouble, we call resync_info_update to send RESYNC message in reshape_request. Then from slave node's view, it receives two type messages: 1. RESYNCING message Slave node add the address (where master node reading data from) to suspend list. 2. METADATA_UPDATED message Once slave nodes know the reshaping is started in master node, it is time to update reshape position and call start_reshape to follow master node's step. After reshape is done, only reshape position is need to be updated, so the majority task of reshaping is happened on the master node. Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
2018-10-18md-cluster/raid10: resize all the bitmaps before start reshapeGuoqing Jiang
To support add disk under grow mode, we need to resize all the bitmaps of each node before reshape, so that we can ensure all nodes have the same view of the bitmap of the clustered raid. So after the master node resized the bitmap, it broadcast a message to other slave nodes, and it checks the size of each bitmap are same or not by compare pages. We can only continue the reshaping after all nodes update the bitmap to the same size (by checking the pages), otherwise revert bitmap size to previous value. The resize_bitmaps interface and BITMAP_RESIZE message are introduced in md-cluster.c for the purpose. Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
2018-10-14MD: fix invalid stored role for a disk - try2Shaohua Li
Commit d595567dc4f0 (MD: fix invalid stored role for a disk) broke linear hotadd. Let's only fix the role for disks in raid1/10. Based on Guoqing's original patch. Reported-by: kernel test robot <rong.a.chen@intel.com> Cc: Gioh Kim <gi-oh.kim@profitbricks.com> Cc: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
2018-10-10md/bitmap: use mddev_suspend/resume instead of ->quiesce()Jack Wang
After 9e1cc0a54556 ("md: use mddev_suspend/resume instead of ->quiesce()") We still have similar left in bitmap functions. Replace quiesce() with mddev_suspend/resume. Also move md_bitmap_create out of mddev_suspend. and move mddev_resume after md_bitmap_destroy. as we did in set_bitmap_file. Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com> Reviewed-by: Gioh Kim <gi-oh.kim@profitbricks.com> Signed-off-by: Shaohua Li <shli@fb.com>
2018-10-10md: remove redundant code that is no longer reachableColin Ian King
And earlier commit removed the error label to two statements that are now never reachable. Since this code is now dead code, remove it. Detected by CoverityScan, CID#1462409 ("Structurally dead code") Fixes: d5d885fd514f ("md: introduce new personality funciton start()") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Shaohua Li <shli@fb.com>
2018-10-03md: allow metadata updates while suspending an array - fixNeilBrown
Commit 35bfc52187f6 ("md: allow metadata update while suspending.") added support for allowing md_check_recovery() to still perform metadata updates while the array is entering the 'suspended' state. This is needed to allow the processes of entering the state to complete. Unfortunately, the patch doesn't really work. The test for "mddev->suspended" at the start of md_check_recovery() means that the function doesn't try to do anything at all while entering suspend. This patch moves the code of updating the metadata while suspending to *before* the test on mddev->suspended. Reported-by: Jeff Mahoney <jeffm@suse.com> Fixes: 35bfc52187f6 ("md: allow metadata update while suspending.") Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
2018-10-01MD: fix invalid stored role for a diskShaohua Li
If we change the number of array's device after device is removed from array, then add the device back to array, we can see that device is added as active role instead of spare which we expected. Please see the below link for details: https://marc.info/?l=linux-raid&m=153736982015076&w=2 This is caused by that we prefer to use device's previous role which is recorded by saved_raid_disk, but we should respect the new number of conf->raid_disks since it could be changed after device is removed. Reported-by: Gioh Kim <gi-oh.kim@profitbricks.com> Tested-by: Gioh Kim <gi-oh.kim@profitbricks.com> Acked-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
2018-09-28md/raid10: Fix raid10 replace hang when new added disk faultyAlex Wu
[Symptom] Resync thread hang when new added disk faulty during replacing. [Root Cause] In raid10_sync_request(), we expect to issue a bio with callback end_sync_read(), and a bio with callback end_sync_write(). In normal situation, we will add resyncing sectors into mddev->recovery_active when raid10_sync_request() returned, and sub resynced sectors from mddev->recovery_active when end_sync_write() calls end_sync_request(). If new added disk, which are replacing the old disk, is set faulty, there is a race condition: 1. In the first rcu protected section, resync thread did not detect that mreplace is set faulty and pass the condition. 2. In the second rcu protected section, mreplace is set faulty. 3. But, resync thread will prepare the read object first, and then check the write condition. 4. It will find that mreplace is set faulty and do not have to prepare write object. This cause we add resync sectors but never sub it. [How to Reproduce] This issue can be easily reproduced by the following steps: mdadm -C /dev/md0 --assume-clean -l 10 -n 4 /dev/sd[abcd] mdadm /dev/md0 -a /dev/sde mdadm /dev/md0 --replace /dev/sdd sleep 1 mdadm /dev/md0 -f /dev/sde [How to Fix] This issue can be fixed by using local variables to record the result of test conditions. Once the conditions are satisfied, we can make sure that we need to issue a bio for read and a bio for write. Previous 'commit 24afd80d99f8 ("md/raid10: handle recovery of replacement devices.")' will also check whether bio is NULL, but leave the comment saying that it is a pointless test. So we remove this dummy check. Reported-by: Alex Chen <alexchen@synology.com> Reviewed-by: Allen Peng <allenpeng@synology.com> Reviewed-by: BingJing Chang <bingjingc@synology.com> Signed-off-by: Alex Wu <alexwu@synology.com> Signed-off-by: Shaohua Li <shli@fb.com>
2018-09-28raid5: block failing device if raid will be failedMariusz Tkaczyk
Currently there is an inconsistency for failing the member drives for arrays with different RAID levels. For RAID456 - there is a possibility to fail all of the devices. However - for other RAID levels - kernel blocks removing the member drive, if the operation results in array's FAIL state (EBUSY is returned). For example - removing last drive from RAID1 is not possible. This kind of blocker was never implemented for raid456 and we cannot see the reason why. We had tested following patch and did not observe any regression, so do you have any comments/reasons for current approach, or we can send the proper patch for this? Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@intel.com> Signed-off-by: Shaohua Li <shli@fb.com>
2018-09-28Merge tag 'drm-fixes-2018-09-28' of git://anongit.freedesktop.org/drm/drmGreg Kroah-Hartman
Dave writes: "drm fixes for 4.19-rc6 Looks like a pretty normal week for graphics, core: syncobj fix, panel link regression revert amd: suspend/resume fixes, EDID emulation fix mali-dp: NV12 writeback and vblank reset fixes etnaviv: DMA setup fix" * tag 'drm-fixes-2018-09-28' of git://anongit.freedesktop.org/drm/drm: drm/amd/display: Fix Edid emulation for linux drm/amd/display: Fix Vega10 lightup on S3 resume drm/amdgpu: Fix vce work queue was not cancelled when suspend Revert "drm/panel: Add device_link from panel device to DRM device" drm/syncobj: Don't leak fences when WAIT_FOR_SUBMIT is set drm/malidp: Fix writeback in NV12 drm: mali-dp: Call drm_crtc_vblank_reset on device init drm/etnaviv: add DMA configuration for etnaviv platform device
2018-09-28Merge tag 'riscv-for-linus-4.19-rc6' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux Palmer writes: "A Single RISC-V Update for 4.19-rc6 The Debian guys have been pushing on our port and found some unversioned symbols leaking into modules. This PR contains a single fix for that issue." * tag 'riscv-for-linus-4.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux: RISC-V: include linux/ftrace.h in asm-prototypes.h
2018-09-28Merge tag 'pci-v4.19-fixes-2' of ↵Greg Kroah-Hartman
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Bjorn writes: "PCI fixes: - Fix ACPI hotplug issue that causes black screen crash at boot (Mika Westerberg) - Fix DesignWare "scheduling while atomic" issues (Jisheng Zhang) - Add PPC contacts to MAINTAINERS for PCI core error handling (Bjorn Helgaas) - Sort Mobiveil MAINTAINERS entry (Lorenzo Pieralisi)" * tag 'pci-v4.19-fixes-2' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: ACPI / hotplug / PCI: Don't scan for non-hotplug bridges if slot is not bridge PCI: dwc: Fix scheduling while atomic issues MAINTAINERS: Move mobiveil PCI driver entry where it belongs MAINTAINERS: Update PPC contacts for PCI core error handling
2018-09-28Merge branch 'drm-fixes-4.19' of git://people.freedesktop.org/~agd5f/linux ↵Dave Airlie
into drm-fixes Just a few fixes for 4.19: - Couple of suspend/resume fixes - Fix EDID emulation with DC Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexdeucher@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180927155418.2813-1-alexander.deucher@amd.com
2018-09-28Merge tag 'drm-misc-fixes-2018-09-27-1' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes - Revert adding device-link to panels - Don't leak fences in drm/syncobj Signed-off-by: Dave Airlie <airlied@redhat.com> From: Sean Paul <sean@poorly.run> Link: https://patchwork.freedesktop.org/patch/msgid/20180927152712.GA53076@art_vandelay
2018-09-27Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaGreg Kroah-Hartman
Jason writes: "Second RDMA rc pull request - Fix a long standing race bug when destroying comp_event file descriptors - srp, hfi1, bnxt_re: Various driver crashes from missing validation and other cases - Fixes for regressions in patches merged this window in the gid cache, devx, ucma and uapi." * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: RDMA/core: Set right entry state before releasing reference IB/mlx5: Destroy the DEVX object upon error flow IB/uverbs: Free uapi on destroy RDMA/bnxt_re: Fix system crash during RDMA resource initialization IB/hfi1: Fix destroy_qp hang after a link down IB/hfi1: Fix context recovery when PBC has an UnsupportedVL IB/hfi1: Invalid user input can result in crash IB/hfi1: Fix SL array bounds check RDMA/uverbs: Fix validity check for modify QP IB/srp: Avoid that sg_reset -d ${srp_device} triggers an infinite loop ucma: fix a use-after-free in ucma_resolve_ip() RDMA/uverbs: Atomically flush and mark closed the comp event queue cxgb4: fix abort_req_rss6 struct
2018-09-27Merge tag 'for_v4.19-rc6' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Jan writes: "an ext2 patch fixing fsync(2) for DAX mounts." * tag 'for_v4.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: ext2, dax: set ext2_dax_aops for dax files
2018-09-27drm/amd/display: Fix Edid emulation for linuxBhawanpreet Lakha
[Why] EDID emulation didn't work properly for linux, as we stop programming if nothing is connected physically. [How] We get a flag from DRM when we want to do edid emulation. We check if this flag is true and nothing is connected physically, if so we only program the front end using VIRTUAL_SIGNAL. Signed-off-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com> Reviewed-by: Harry Wentland <Harry.Wentland@amd.com> Acked-by: Leo Li <sunpeng.li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-09-27drm/amd/display: Fix Vega10 lightup on S3 resumeRoman Li
[Why] There have been a few reports of Vega10 display remaining blank after S3 resume. The regression is caused by workaround for mode change on Vega10 - skip set_bandwidth if stream count is 0. As a result we skipped dispclk reset on suspend, thus on resume we may skip the clock update assuming it hasn't been changed. On some systems it causes display blank or 'out of range'. [How] Revert "drm/amd/display: Fix Vega10 black screen after mode change" Verified that it hadn't cause mode change regression. Signed-off-by: Roman Li <Roman.Li@amd.com> Reviewed-by: Sun peng Li <Sunpeng.Li@amd.com> Acked-by: Leo Li <sunpeng.li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-09-27drm/amdgpu: Fix vce work queue was not cancelled when suspendRex Zhu
The vce cancel_delayed_work_sync never be called. driver call the function in error path. This caused the A+A suspend hang when runtime pm enebled. As we will visit the smu in the idle queue. this will cause smu hang because the dgpu has been suspend, and the dgpu also will be waked up. As the smu has been hang, so the dgpu resume will failed. Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Rex Zhu <Rex.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2018-09-27Revert "drm/panel: Add device_link from panel device to DRM device"Linus Walleij
This reverts commit 0c08754b59da5557532d946599854e6df28edc22. commit 0c08754b59da ("drm/panel: Add device_link from panel device to DRM device") creates a circular dependency under these circumstances: 1. The panel depends on dsi-host because it is MIPI-DSI child device. 2. dsi-host depends on the drm parent device (connector->dev->dev) this should be allowed. 3. drm parent dev (connector->dev->dev) depends on the panel after this patch. This makes the dependency circular and while it appears it does not affect any in-tree drivers (they do not seem to have dsi hosts depending on the same parent device) this does not seem right. As noted in a response from Andrzej Hajda, the intent is likely to make the panel dependent on the DRM device (connector->dev) not its parent. But we have no way of doing that since the DRM device doesn't contain any struct device on its own (arguably it should). Revert this until a proper approach is figured out. Cc: Jyri Sarha <jsarha@ti.com> Cc: Eric Anholt <eric@anholt.net> Cc: Andrzej Hajda <a.hajda@samsung.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Sean Paul <seanpaul@chromium.org> Link: https://patchwork.freedesktop.org/patch/msgid/20180927124130.9102-1-linus.walleij@linaro.org
2018-09-27Merge branch 'for-upstream/malidp-fixes' of git://linux-arm.org/linux-ld ↵Dave Airlie
into drm-fixes Fix NV12 writeback and fix vblank reset. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Liviu Dudau <Liviu.Dudau@arm.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180921112354.GR936@e110455-lin.cambridge.arm.com
2018-09-27Merge branch 'etnaviv/fixes' of https://git.pengutronix.de/git/lst/linux ↵Dave Airlie
into drm-fixes one fix to get a proper DMA configuration in place for the etnaviv virtual device. I'm sending this as a fix, as a dma-mapping change at the ARC architecture side during the 4.19 cycle broke etnaviv on this platform, which gets remedied with this patch, but it also enables ARM64. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Lucas Stach <l.stach@pengutronix.de> Link: https://patchwork.freedesktop.org/patch/msgid/ea1f712bf09bf9439c6b092bf2c2bde7bb01cf5e.camel@pengutronix.de
2018-09-26ACPI / hotplug / PCI: Don't scan for non-hotplug bridges if slot is not bridgeMika Westerberg
HP 6730b laptop has an ethernet NIC connected to one of the PCIe root ports. The root ports themselves are native PCIe hotplug capable. Now, during boot after PCI devices are scanned the BIOS triggers ACPI bus check directly to the NIC: ACPI: \_SB_.PCI0.RP06.NIC_: Bus check in hotplug_event() It is not clear why it is sending bus check but regardless the ACPI hotplug notify handler calls enable_slot() directly (instead of going through acpiphp_check_bridge() as there is no bridge), which ends up handling special case for non-hotplug bridges with native PCIe hotplug. This results a crash of some kind but the reporter only sees black screen so it is hard to figure out the exact spot and what actually happens. Based on a few fix proposals it was tracked to crash somewhere inside pci_assign_unassigned_bridge_resources(). In any case we should not really be in that special branch at all because the ACPI notify happened to a slot that is not a PCI bridge (it is just a regular PCI device). Fix this so that we only go to that special branch if we are calling enable_slot() for a bridge (e.g., the ACPI notification was for the bridge). Link: https://bugzilla.kernel.org/show_bug.cgi?id=201127 Fixes: 84c8b58ed3ad ("ACPI / hotplug / PCI: Don't scan bridges managed by native hotplug") Reported-by: Peter Anemone <peter.anemone@gmail.com> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> CC: stable@vger.kernel.org # v4.18+
2018-09-26drm/syncobj: Don't leak fences when WAIT_FOR_SUBMIT is setJason Ekstrand
We attempt to get fences earlier in the hopes that everything will already have fences and no callbacks will be needed. If we do succeed in getting a fence, getting one a second time will result in a duplicate ref with no unref. This is causing memory leaks in Vulkan applications that create a lot of fences; playing for a few hours can, apparently, bring down the system. Cc: stable@vger.kernel.org Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107899 Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Sean Paul <seanpaul@chromium.org> Link: https://patchwork.freedesktop.org/patch/msgid/20180926071703.15257-1-jason.ekstrand@intel.com
2018-09-26Merge tag 'iommu-fixes-v4.19-rc5' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Joerg writes: "IOMMU Fixes for Linux v4.19-rc5 Three fixes queued up: - Warning fix for Rockchip IOMMU where there were IRQ handlers for offlined hardware. - Fix for Intel VT-d because recent changes caused boot failures on some machines because it tried to allocate to much contiguous memory. - Fix for AMD IOMMU to handle eMMC devices correctly that appear as ACPI HID devices." * tag 'iommu-fixes-v4.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: iommu/amd: Return devid as alias for ACPI HID devices iommu/vt-d: Handle memory shortage on pasid table allocation iommu/rockchip: Free irqs in shutdown handler
2018-09-26iommu/amd: Return devid as alias for ACPI HID devicesArindam Nath
ACPI HID devices do not actually have an alias for them in the IVRS. But dev_data->alias is still used for indexing into the IOMMU device table for devices being handled by the IOMMU. So for ACPI HID devices, we simply return the corresponding devid as an alias, as parsed from IVRS table. Signed-off-by: Arindam Nath <arindam.nath@amd.com> Fixes: 2bf9a0a12749 ('iommu/amd: Add iommu support for ACPI HID devices') Signed-off-by: Joerg Roedel <jroedel@suse.de>
2018-09-25RDMA/core: Set right entry state before releasing referenceParav Pandit
Currently add_modify_gid() for IB link layer has followong issue in cache update path. When GID update event occurs, core releases reference to the GID table without updating its state and/or entry pointer. CPU-0 CPU-1 ------ ----- ib_cache_update() IPoIB ULP add_modify_gid() [..] put_gid_entry() refcnt = 0, but state = valid, entry is valid. (work item is not yet executed). ipoib_create_ah() rdma_create_ah() rdma_get_gid_attr() <-- Tries to acquire gid_attr which has refcnt = 0. This is incorrect. GID entry state and entry pointer is provides the accurate GID enty state. Such fields must be updated with rwlock to protect against readers and, such fields must be in sane state before refcount can drop to zero. Otherwise above race condition can happen leading to use-after-free situation. Following backtrace has been observed when cache update for an IB port is triggered while IPoIB ULP is creating an AH. Therefore, when updating GID entry, first mark a valid entry as invalid through state and set the barrier so that no callers can acquired the GID entry, followed by release reference to it. refcount_t: increment on 0; use-after-free. WARNING: CPU: 4 PID: 29106 at lib/refcount.c:153 refcount_inc_checked+0x30/0x50 Workqueue: ib-comp-unb-wq ib_cq_poll_work [ib_core] RIP: 0010:refcount_inc_checked+0x30/0x50 RSP: 0018:ffff8802ad36f600 EFLAGS: 00010082 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000002 RSI: 0000000000000008 RDI: ffffffff86710100 RBP: ffff8802d6e60a30 R08: ffffed005d67bf8b R09: ffffed005d67bf8b R10: 0000000000000001 R11: ffffed005d67bf8a R12: ffff88027620cee8 R13: ffff8802d6e60988 R14: ffff8802d6e60a78 R15: 0000000000000202 FS: 0000000000000000(0000) GS:ffff8802eb200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f3ab35e5c88 CR3: 00000002ce84a000 CR4: 00000000000006e0 IPv6: ADDRCONF(NETDEV_CHANGE): ib1: link becomes ready Call Trace: rdma_get_gid_attr+0x220/0x310 [ib_core] ? lock_acquire+0x145/0x3a0 rdma_fill_sgid_attr+0x32c/0x470 [ib_core] rdma_create_ah+0x89/0x160 [ib_core] ? rdma_fill_sgid_attr+0x470/0x470 [ib_core] ? ipoib_create_ah+0x52/0x260 [ib_ipoib] ipoib_create_ah+0xf5/0x260 [ib_ipoib] ipoib_mcast_join_complete+0xbbe/0x2540 [ib_ipoib] Fixes: b150c3862d21 ("IB/core: Introduce GID entry reference counts") Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-25IB/mlx5: Destroy the DEVX object upon error flowYishai Hadas
Upon DEVX object creation the object must be destroyed upon a follows error flow. Fixes: 7efce3691d33 ("IB/mlx5: Add obj create and destroy functionality") Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-25IB/uverbs: Free uapi on destroyMark Bloch
Make sure we free struct uverbs_api once we clean the radix tree. It was allocated by uverbs_alloc_api(). Fixes: 9ed3e5f44772 ("IB/uverbs: Build the specs into a radix tree at runtime") Reported-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-25erge tag 'libnvdimm-fixes-4.19-rc6' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Dan writes: "libnvdimm/dax for 4.19-rc6 * (2) fixes for the dax error handling updates that were merged for v4.19-rc1. My mails to Al have been bouncing recently, so I do not have his ack but the uaccess change is of the trivial / obviously correct variety. The address_space_operations fixes a regression. * A filesystem-dax fix to correct the zero page lookup to be compatible with non-x86 (mips and s390) architectures." * tag 'libnvdimm-fixes-4.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: device-dax: Add missing address_space_operations uaccess: Fix is_source param for check_copy_size() in copy_to_iter_mcsafe() filesystem-dax: Fix use of zero page
2018-09-25Merge tag 'scsi-fixes' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi James writes: "SCSI fixes on 20180925 Nine obvious bug fixes mostly in individual drivers. The target fix is of particular importance because it's CVE related." * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: sd: don't crash the host on invalid commands scsi: ipr: System hung while dlpar adding primary ipr adapter back scsi: target: iscsi: Use bin2hex instead of a re-implementation scsi: target: iscsi: Use hex2bin instead of a re-implementation scsi: lpfc: Synchronize access to remoteport via rport scsi: ufs: Disable blk-mq for now scsi: sd: Contribute to randomness when running rotational device scsi: ibmvscsis: Ensure partition name is properly NUL terminated scsi: ibmvscsis: Fix a stringop-overflow warning
2018-09-25Merge tag 'usb-4.19-rc6' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb I wrote: "USB fixes for 4.19-rc6 Here are some small USB core and driver fixes for reported issues for 4.19-rc6. The most visible is the oops fix for when the USB core is built into the kernel that is present in 4.18. Turns out not many people actually do that so it went unnoticed for a while. The rest is some tiny typec, musb, and other core fixes. All have been in linux-next with no reported issues." * tag 'usb-4.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: usb: typec: mux: Take care of driver module reference counting usb: core: safely deal with the dynamic quirk lists usb: roles: Take care of driver module reference counting USB: handle NULL config in usb_find_alt_setting() USB: fix error handling in usb_driver_claim_interface() USB: remove LPM management from usb_driver_claim_interface() USB: usbdevfs: restore warning for nonsensical flags USB: usbdevfs: sanitize flags more Revert "usb: cdc-wdm: Fix a sleep-in-atomic-context bug in service_outstanding_interrupt()" usb: musb: dsps: do not disable CPPI41 irq in driver teardown
2018-09-25Merge tag 'tty-4.19-rc6' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty I wrote: "TTY/Serial driver fixes for 4.19-rc6 Here are a number of small tty and serial driver fixes for reported issues for 4.19-rc6. One should hopefully resolve a much-reported issue that syzbot has found in the tty layer. Although there are still more issues there, getting this fixed is nice to see finally happen. All of these have been in linux-next for a while with no reported issues." * tag 'tty-4.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: serial: imx: restore handshaking irq for imx1 tty: vt_ioctl: fix potential Spectre v1 tty: Drop tty->count on tty_reopen() failure serial: cpm_uart: return immediately from console poll tty: serial: lpuart: avoid leaking struct tty_struct serial: mvebu-uart: Fix reporting of effective CSIZE to userspace
2018-09-25Merge tag 'char-misc-4.19-rc6' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Greg (well I), wrote: "Char/Misc driver fixes for 4.19-rc6 Here are some soundwire and intel_th (tracing) driver fixes for some reported issues. All of these have been in linux-next for a week with no reported issues." * tag 'char-misc-4.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: intel_th: pci: Add Ice Lake PCH support intel_th: Fix resource handling for ACPI glue layer intel_th: Fix device removal logic soundwire: Fix acquiring bus lock twice during master release soundwire: Fix incorrect exit after configuring stream soundwire: Fix duplicate stream state assignment
2018-09-25iommu/vt-d: Handle memory shortage on pasid table allocationLu Baolu
Pasid table memory allocation could return failure due to memory shortage. Limit the pasid table size to 1MiB because current 8MiB contiguous physical memory allocation can be hard to come by. W/o a PASID table, the device could continue to work with only shared virtual memory impacted. So, let's go ahead with context mapping even the memory allocation for pasid table failed. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107783 Fixes: cc580e41260d ("iommu/vt-d: Per PCI device pasid table interfaces") Cc: Ashok Raj <ashok.raj@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Mika Westerberg <mika.westerberg@linux.intel.com> Reported-and-tested-by: Pelton Kyle D <kyle.d.pelton@intel.com> Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2018-09-25Revert "uapi/linux/keyctl.h: don't use C++ reserved keyword as a struct ↵Lubomir Rintel
member name" This changes UAPI, breaking iwd and libell: ell/key.c: In function 'kernel_dh_compute': ell/key.c:205:38: error: 'struct keyctl_dh_params' has no member named 'private'; did you mean 'dh_private'? struct keyctl_dh_params params = { .private = private, ^~~~~~~ dh_private This reverts commit 8a2336e549d385bb0b46880435b411df8d8200e8. Fixes: 8a2336e549d3 ("uapi/linux/keyctl.h: don't use C++ reserved keyword as a struct member name") Signed-off-by: Lubomir Rintel <lkundrak@v3.sk> Signed-off-by: David Howells <dhowells@redhat.com> cc: Randy Dunlap <rdunlap@infradead.org> cc: Mat Martineau <mathew.j.martineau@linux.intel.com> cc: Stephan Mueller <smueller@chronox.de> cc: James Morris <jmorris@namei.org> cc: "Serge E. Hallyn" <serge@hallyn.com> cc: Mat Martineau <mathew.j.martineau@linux.intel.com> cc: Andrew Morton <akpm@linux-foundation.org> cc: Linus Torvalds <torvalds@linux-foundation.org> cc: <stable@vger.kernel.org> Signed-off-by: James Morris <james.morris@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-25Merge gitolite.kernel.org:/pub/scm/linux/kernel/git/davem/netGreg Kroah-Hartman
Dave writes: "Networking fixes: 1) Fix multiqueue handling of coalesce timer in stmmac, from Jose Abreu. 2) Fix memory corruption in NFC, from Suren Baghdasaryan. 3) Don't write reserved bits in ravb driver, from Kazuya Mizuguchi. 4) SMC bug fixes from Karsten Graul, YueHaibing, and Ursula Braun. 5) Fix TX done race in mvpp2, from Antoine Tenart. 6) ipv6 metrics leak, from Wei Wang. 7) Adjust firmware version requirements in mlxsw, from Petr Machata. 8) Fix autonegotiation on resume in r8169, from Heiner Kallweit. 9) Fixed missing entries when dumping /proc/net/if_inet6, from Jeff Barnhill. 10) Fix double free in devlink, from Dan Carpenter. 11) Fix ethtool regression from UFO feature removal, from Maciej Żenczykowski. 12) Fix drivers that have a ndo_poll_controller() that captures the cpu entirely on loaded hosts by trying to drain all rx and tx queues, from Eric Dumazet. 13) Fix memory corruption with jumbo frames in aquantia driver, from Friedemann Gerold." * gitolite.kernel.org:/pub/scm/linux/kernel/git/davem/net: (79 commits) net: mvneta: fix the remaining Rx descriptor unmapping issues ip_tunnel: be careful when accessing the inner header mpls: allow routes on ip6gre devices net: aquantia: memory corruption on jumbo frames tun: remove ndo_poll_controller nfp: remove ndo_poll_controller bnxt: remove ndo_poll_controller bnx2x: remove ndo_poll_controller mlx5: remove ndo_poll_controller mlx4: remove ndo_poll_controller i40evf: remove ndo_poll_controller ice: remove ndo_poll_controller igb: remove ndo_poll_controller ixgb: remove ndo_poll_controller fm10k: remove ndo_poll_controller ixgbevf: remove ndo_poll_controller ixgbe: remove ndo_poll_controller bonding: use netpoll_poll_dev() helper netpoll: make ndo_poll_controller() optional rds: Fix build regression. ...
2018-09-25iommu/rockchip: Free irqs in shutdown handlerHeiko Stuebner
In the iommu's shutdown handler we disable runtime-pm which could result in the irq-handler running unclocked and since commit 3fc7c5c0cff3 ("iommu/rockchip: Handle errors returned from PM framework") we warn about that fact. This can cause warnings on shutdown on some Rockchip machines, so free the irqs in the shutdown handler before we disable runtime-pm. Reported-by: Enric Balletbo i Serra <enric.balletbo@collabora.com> Fixes: 3fc7c5c0cff3 ("iommu/rockchip: Handle errors returned from PM framework") Signed-off-by: Heiko Stuebner <heiko@sntech.de> Tested-by: Enric Balletbo i Serra <enric.balletbo@collabora.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2018-09-24RISC-V: include linux/ftrace.h in asm-prototypes.hJames Cowgill
Building a riscv kernel with CONFIG_FUNCTION_TRACER and CONFIG_MODVERSIONS enabled results in these two warnings: MODPOST vmlinux.o WARNING: EXPORT symbol "return_to_handler" [vmlinux] version generation failed, symbol will not be versioned. WARNING: EXPORT symbol "_mcount" [vmlinux] version generation failed, symbol will not be versioned. When exporting symbols from an assembly file, the MODVERSIONS code requires their prototypes to be defined in asm-prototypes.h (see scripts/Makefile.build). Since both of these symbols have prototypes defined in linux/ftrace.h, include this header from RISC-V's asm-prototypes.h. Reported-by: Karsten Merker <merker@debian.org> Signed-off-by: James Cowgill <jcowgill@debian.org> Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
2018-09-24net: mvneta: fix the remaining Rx descriptor unmapping issuesAntoine Tenart
With CONFIG_DMA_API_DEBUG enabled we get DMA unmapping warning in various places of the mvneta driver, for example when putting down an interface while traffic is passing through. The issue is when using s/w buffer management, the Rx buffers are mapped using dma_map_page but unmapped with dma_unmap_single. This patch fixes this by using the right unmapping function. Fixes: 562e2f467e71 ("net: mvneta: Improve the buffer allocation method for SWBM") Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Reviewed-by: Gregory CLEMENT <gregory.clement@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-24ip_tunnel: be careful when accessing the inner headerPaolo Abeni
Cong noted that we need the same checks introduced by commit 76c0ddd8c3a6 ("ip6_tunnel: be careful when accessing the inner header") even for ipv4 tunnels. Fixes: c54419321455 ("GRE: Refactor GRE tunneling code.") Suggested-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-24mpls: allow routes on ip6gre devicesSaif Hasan
Summary: This appears to be necessary and sufficient change to enable `MPLS` on `ip6gre` tunnels (RFC4023). This diff allows IP6GRE devices to be recognized by MPLS kernel module and hence user can configure interface to accept packets with mpls headers as well setup mpls routes on them. Test Plan: Test plan consists of multiple containers connected via GRE-V6 tunnel. Then carrying out testing steps as below. - Carry out necessary sysctl settings on all containers ``` sysctl -w net.mpls.platform_labels=65536 sysctl -w net.mpls.ip_ttl_propagate=1 sysctl -w net.mpls.conf.lo.input=1 ``` - Establish IP6GRE tunnels ``` ip -6 tunnel add name if_1_2_1 mode ip6gre \ local 2401:db00:21:6048:feed:0::1 \ remote 2401:db00:21:6048:feed:0::2 key 1 ip link set dev if_1_2_1 up sysctl -w net.mpls.conf.if_1_2_1.input=1 ip -4 addr add 169.254.0.2/31 dev if_1_2_1 scope link ip -6 tunnel add name if_1_3_1 mode ip6gre \ local 2401:db00:21:6048:feed:0::1 \ remote 2401:db00:21:6048:feed:0::3 key 1 ip link set dev if_1_3_1 up sysctl -w net.mpls.conf.if_1_3_1.input=1 ip -4 addr add 169.254.0.4/31 dev if_1_3_1 scope link ``` - Install MPLS encap rules on node-1 towards node-2 ``` ip route add 192.168.0.11/32 nexthop encap mpls 32/64 \ via inet 169.254.0.3 dev if_1_2_1 ``` - Install MPLS forwarding rules on node-2 and node-3 ``` // node2 ip -f mpls route add 32 via inet 169.254.0.7 dev if_2_4_1 // node3 ip -f mpls route add 64 via inet 169.254.0.12 dev if_4_3_1 ``` - Ping 192.168.0.11 (node4) from 192.168.0.1 (node1) (where routing towards 192.168.0.1 is via IP route directly towards node1 from node4) ``` ping 192.168.0.11 ``` - tcpdump on interface to capture ping packets wrapped within MPLS header which inturn wrapped within IP6GRE header ``` 16:43:41.121073 IP6 2401:db00:21:6048:feed::1 > 2401:db00:21:6048:feed::2: DSTOPT GREv0, key=0x1, length 100: MPLS (label 32, exp 0, ttl 255) (label 64, exp 0, [S], ttl 255) IP 192.168.0.1 > 192.168.0.11: ICMP echo request, id 1208, seq 45, length 64 0x0000: 6000 2cdb 006c 3c3f 2401 db00 0021 6048 `.,..l<?$....!`H 0x0010: feed 0000 0000 0001 2401 db00 0021 6048 ........$....!`H 0x0020: feed 0000 0000 0002 2f00 0401 0401 0100 ......../....... 0x0030: 2000 8847 0000 0001 0002 00ff 0004 01ff ...G............ 0x0040: 4500 0054 3280 4000 ff01 c7cb c0a8 0001 E..T2.@......... 0x0050: c0a8 000b 0800 a8d7 04b8 002d 2d3c a05b ...........--<.[ 0x0060: 0000 0000 bcd8 0100 0000 0000 1011 1213 ................ 0x0070: 1415 1617 1819 1a1b 1c1d 1e1f 2021 2223 .............!"# 0x0080: 2425 2627 2829 2a2b 2c2d 2e2f 3031 3233 $%&'()*+,-./0123 0x0090: 3435 3637 4567 ``` Signed-off-by: Saif Hasan <has@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf 2018-09-24 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Several fixes for BPF sockmap to only allow sockets being attached in ESTABLISHED state, from John. 2) Fix up the license to LGPL/BSD for the libc compat header which contains fallback helpers that libbpf and bpftool is using, from Jakub. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-24RDMA/bnxt_re: Fix system crash during RDMA resource initializationSelvin Xavier
bnxt_re_ib_reg acquires and releases the rtnl lock whenever it accesses the L2 driver. The following sequence can trigger a crash Acquires the rtnl_lock -> Registers roce driver callback with L2 driver -> release the rtnl lock bnxt_re acquires the rtnl_lock -> Request for MSIx vectors -> release the rtnl_lock Issue happens when bnxt_re proceeds with remaining part of initialization and L2 driver invokes bnxt_ulp_irq_stop as a part of bnxt_open_nic. The crash is in bnxt_qplib_nq_stop_irq as the NQ structures are not initialized yet, <snip> [ 3551.726647] BUG: unable to handle kernel NULL pointer dereference at (null) [ 3551.726656] IP: [<ffffffffc0840ee9>] bnxt_qplib_nq_stop_irq+0x59/0xb0 [bnxt_re] [ 3551.726674] PGD 0 [ 3551.726679] Oops: 0002 1 SMP ... [ 3551.726822] Hardware name: Dell Inc. PowerEdge R720/08RW36, BIOS 2.4.3 07/09/2014 [ 3551.726826] task: ffff97e30eec5ee0 ti: ffff97e3173bc000 task.ti: ffff97e3173bc000 [ 3551.726829] RIP: 0010:[<ffffffffc0840ee9>] [<ffffffffc0840ee9>] bnxt_qplib_nq_stop_irq+0x59/0xb0 [bnxt_re] ... [ 3551.726872] Call Trace: [ 3551.726886] [<ffffffffc082cb9e>] bnxt_re_stop_irq+0x4e/0x70 [bnxt_re] [ 3551.726899] [<ffffffffc07d6a53>] bnxt_ulp_irq_stop+0x43/0x70 [bnxt_en] [ 3551.726908] [<ffffffffc07c82f4>] bnxt_reserve_rings+0x174/0x1e0 [bnxt_en] [ 3551.726917] [<ffffffffc07cafd8>] __bnxt_open_nic+0x368/0x9a0 [bnxt_en] [ 3551.726925] [<ffffffffc07cb62b>] bnxt_open_nic+0x1b/0x50 [bnxt_en] [ 3551.726934] [<ffffffffc07cc62f>] bnxt_setup_mq_tc+0x11f/0x260 [bnxt_en] [ 3551.726943] [<ffffffffc07d5f58>] bnxt_dcbnl_ieee_setets+0xb8/0x1f0 [bnxt_en] [ 3551.726954] [<ffffffff890f983a>] dcbnl_ieee_set+0x9a/0x250 [ 3551.726966] [<ffffffff88fd6d21>] ? __alloc_skb+0xa1/0x2d0 [ 3551.726972] [<ffffffff890f72fa>] dcb_doit+0x13a/0x210 [ 3551.726981] [<ffffffff89003ff7>] rtnetlink_rcv_msg+0xa7/0x260 [ 3551.726989] [<ffffffff88ffdb00>] ? rtnl_unicast+0x20/0x30 [ 3551.726996] [<ffffffff88bf9dc8>] ? __kmalloc_node_track_caller+0x58/0x290 [ 3551.727002] [<ffffffff890f7326>] ? dcb_doit+0x166/0x210 [ 3551.727007] [<ffffffff88fd6d0d>] ? __alloc_skb+0x8d/0x2d0 [ 3551.727012] [<ffffffff89003f50>] ? rtnl_newlink+0x880/0x880 ... [ 3551.727104] [<ffffffff8911f7d5>] system_call_fastpath+0x1c/0x21 ... [ 3551.727164] RIP [<ffffffffc0840ee9>] bnxt_qplib_nq_stop_irq+0x59/0xb0 [bnxt_re] [ 3551.727175] RSP <ffff97e3173bf788> [ 3551.727177] CR2: 0000000000000000 Avoid this inconsistent state and system crash by acquiring the rtnl lock for the entire duration of device initialization. Re-factor the code to remove the rtnl lock from the individual function and acquire and release it from the caller. Fixes: 1ac5a4047975 ("RDMA/bnxt_re: Add bnxt_re RoCE driver") Fixes: 6e04b1035689 ("RDMA/bnxt_re: Fix broken RoCE driver due to recent L2 driver changes") Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-24Merge tag 'media/v4.19-2' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Mauro briefly writes: "media fixes for v4.19-rc5 some drivers and Kbuild fixes" * tag 'media/v4.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: media: platform: fix cros-ec-cec build error media: staging/media/mt9t031/Kconfig: remove bogus entry media: i2c: mt9v111: Fix v4l2-ctrl error handling media: camss: add missing includes media: camss: Use managed memory allocations media: camss: mark PM functions as __maybe_unused media: af9035: prevent buffer overflow on write media: video_function_calls.rst: drop obsolete video-set-attributes reference
2018-09-23net: aquantia: memory corruption on jumbo framesFriedemann Gerold
This patch fixes skb_shared area, which will be corrupted upon reception of 4K jumbo packets. Originally build_skb usage purpose was to reuse page for skb to eliminate needs of extra fragments. But that logic does not take into account that skb_shared_info should be reserved at the end of skb data area. In case packet data consumes all the page (4K), skb_shinfo location overflows the page. As a consequence, __build_skb zeroed shinfo data above the allocated page, corrupting next page. The issue is rarely seen in real life because jumbo are normally larger than 4K and that causes another code path to trigger. But it 100% reproducible with simple scapy packet, like: sendp(IP(dst="192.168.100.3") / TCP(dport=443) \ / Raw(RandString(size=(4096-40))), iface="enp1s0") Fixes: 018423e90bee ("net: ethernet: aquantia: Add ring support code") Reported-by: Friedemann Gerold <f.gerold@b-c-s.de> Reported-by: Michael Rauch <michael@rauch.be> Signed-off-by: Friedemann Gerold <f.gerold@b-c-s.de> Tested-by: Nikita Danilov <nikita.danilov@aquantia.com> Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-23Merge branch 'netpoll-avoid-capture-effects-for-NAPI-drivers'David S. Miller
Eric Dumazet says: ==================== netpoll: avoid capture effects for NAPI drivers As diagnosed by Song Liu, ndo_poll_controller() can be very dangerous on loaded hosts, since the cpu calling ndo_poll_controller() might steal all NAPI contexts (for all RX/TX queues of the NIC). This capture, showing one ksoftirqd eating all cycles can last for unlimited amount of time, since one cpu is generally not able to drain all the queues under load. It seems that all networking drivers that do use NAPI for their TX completions, should not provide a ndo_poll_controller() : Most NAPI drivers have netpoll support already handled in core networking stack, since netpoll_poll_dev() uses poll_napi(dev) to iterate through registered NAPI contexts for a device. This patch series take care of the first round, we will handle other drivers in future rounds. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-23tun: remove ndo_poll_controllerEric Dumazet
As diagnosed by Song Liu, ndo_poll_controller() can be very dangerous on loaded hosts, since the cpu calling ndo_poll_controller() might steal all NAPI contexts (for all RX/TX queues of the NIC). This capture can last for unlimited amount of time, since one cpu is generally not able to drain all the queues under load. tun uses NAPI for TX completions, so we better let core networking stack call the napi->poll() to avoid the capture. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-23nfp: remove ndo_poll_controllerEric Dumazet
As diagnosed by Song Liu, ndo_poll_controller() can be very dangerous on loaded hosts, since the cpu calling ndo_poll_controller() might steal all NAPI contexts (for all RX/TX queues of the NIC). This capture can last for unlimited amount of time, since one cpu is generally not able to drain all the queues under load. nfp uses NAPI for TX completions, so we better let core networking stack call the napi->poll() to avoid the capture. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Tested-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-23bnxt: remove ndo_poll_controllerEric Dumazet
As diagnosed by Song Liu, ndo_poll_controller() can be very dangerous on loaded hosts, since the cpu calling ndo_poll_controller() might steal all NAPI contexts (for all RX/TX queues of the NIC). This capture can last for unlimited amount of time, since one cpu is generally not able to drain all the queues under load. bnxt uses NAPI for TX completions, so we better let core networking stack call the napi->poll() to avoid the capture. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>