summaryrefslogtreecommitdiffstats
path: root/drivers/scsi/lpfc/lpfc_els.c
AgeCommit message (Collapse)Author
2020-12-01scsi: lpfc: Correct null ndlp reference on routine exitJames Smart
smatch correctly called out a logic error with accessing a pointer after checking it for null: drivers/scsi/lpfc/lpfc_els.c:2043 lpfc_cmpl_els_plogi() error: we previously assumed 'ndlp' could be null (see line 1942) Adjust the exit point to avoid the trace printf ndlp reference. A trace entry was already generated when the ndlp was checked for null. Link: https://lore.kernel.org/r/20201130181226.16675-1-james.smart@broadcom.com Fixes: 4430f7fd09ec ("scsi: lpfc: Rework locations of ndlp reference taking") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-11-19scsi: lpfc: Fix set but not used warnings from Rework remote port lock handlingJames Smart
Remove local variables that are set but not used. Link: https://lore.kernel.org/r/20201119203340.121819-1-james.smart@broadcom.com Fixes: c6adba150191 ("scsi: lpfc: Rework remote port lock handling") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-11-19scsi: lpfc: Fix memory leak on lcb_contextColin Ian King
Currently there is an error return path that neglects to free the allocation for lcb_context. Fix this by adding a new error free exit path that kfree's lcb_context before returning. Use this new kfree exit path in another exit error path that also kfree's the same object, allowing a line of code to be removed. Link: https://lore.kernel.org/r/20201118141314.462471-1-colin.king@canonical.com Fixes: 4430f7fd09ec ("scsi: lpfc: Rework locations of ndlp reference taking") Reviewed-by: James Smart <james.smart@broadcom.com> Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Addresses-Coverity: ("Resource leak")
2020-11-17scsi: lpfc: Convert abort handling to SLI-3 and SLI-4 handlersJames Smart
This patch reworks the abort interfaces such that SLI-3 retains the iocb-based formatting and completions and SLI-4 now uses native WQEs and completion routines. The following changes are made: - The code is refactored from a confusing 2 routine sequence of xx_abort_iotag_issue(), which creates/formats and abort cmd, and xx_issue_abort_tag(), which then issues and handles the completion of the abort cmd - into a single interface of xx_issue_abort_iotag(). The new interface will determine whether SLI-3 or SLI-4 and then call the appropriate handler. A completion handler can now be specified to address the differences in completion handling. Note: original code is all iocb based, with SLI-4 converting to SLI-3 for the SCSI/ELS path, and NVMe natively using wqes. - The SLI-3 side is refactored: The older iocb-base lpfc_sli_issue_abort_iotag() routine is combined with the logic of lpfc_sli_abort_iotag_issue() as well as the iocb-specific code in lpfc_abort_handler() and lpfc_sli_abort_iocb() to create the new single SLI-3 abort routine that formats and issues the iocb. - The SLI-4 side is refactored and added to: The native WQE abort code in NVMe is moved to the new SLI-4 issue_abort_iotag() routine. Items in SCSI that set fields not set by NVMe is migrated into the new routine. Thus the routine supports NVMe and SCSI initiators. The nvmet block (target) formats the abort slightly different (like the old NVMe initiator) thus it has its own prep routine stolen from NVMe initiator and it retains the current code it has for issuing the WQE (does not use the commonized routine the initiators do). SLI-4 completion handlers were also added. - lpfc_abort_handler now becomes a wrapper that determines whether SLI-3 or SLI-4 and calls the proper abort handler. Link: https://lore.kernel.org/r/20201115192646.12977-16-james.smart@broadcom.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-11-17scsi: lpfc: Fix NPIV Fabric Node reference countingJames Smart
While testing initiator-side cable swaps with NPIV, oops occur. The reference counts for the Fabric nodes on the NPIV vports isn't balanced, resulting in premature node removal. The following fixes were made: - Removed the FC_LBIT check in lpfc_linkup_port. This removed the special case for vports that didn't have them clean up just like the physical port. - Removed the unreg_rpi call in lpfc_cleanup_node. In this section, the node is being removed in the context of a reference count release and a mailbox command can't be issued at this point. - Remove special case handling in the default mailbox completion handler that allowed the skipping of a node reference. Now, reference counting always requires the removal of the reference. - Move the location of the DEVICE_RM event is done during LOGO handling as the driver has additional work to do on the ndlp before puts/releases can be performed. Link: https://lore.kernel.org/r/20201115192646.12977-10-james.smart@broadcom.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-11-17scsi: lpfc: Fix NPIV discovery and Fabric Node detectionJames Smart
While testing NPIV and link bounces, the vport would not show a fabric node for the F_Port, would not transition into NPR state during a link fault, or leave the FDMI node untouched during error injection. Cause for this was determined to be an inconsistent manner in which F_Port, Nameserver, and FDMI controller nodes were created and linked. In some cases, the nodes would never be unregistered from the transport, leaving references active. In other cases, the fabric nodes may register with the transport multiple times while still registered. The following changes were made: - Fix the FDISC issue routine, which starts vport (re)creation, to mark the F_Port as a fabric node (NLP_FABRIC) and allow the F_Port node to fully be created and show up in the node list. - When remote ports are cleaned up on vport termination, cleanup the nameserver and FDMI controller nodes on the vport so they unregister from the transport. - On link bounces, don't exclude the NPIV Fabric remote ports from transitioning to the NPR state, allowing them to avoid re-registration if already registered. Link: https://lore.kernel.org/r/20201115192646.12977-9-james.smart@broadcom.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-11-17scsi: lpfc: Unsolicited ELS leaves node in incorrect state while dropping itJames Smart
When a target swap happens, under certain conditions the node sends a LOGO. The unsolicited ELS logic responds with a reject. The logic may allocate a new node to handle this. Afterward, the new nodes are dropped incorrectly leaving them in a mis-matched state and refcounting causes a use-after-free situation leading to a crash. It is also possible that the unsolicited els handling finds a node which is in an UNUSED state. The handling moves these nodes to NPR state with a refcount of 1. Although the end of the discovery logic assumes a final put will free such a node, there are codes paths which could increment the reference count, thus the node is in NPR state and not released. Eventually this mismatch in state and refcount leads to premature release of the node causing a crash. Fix by always using the discovery engine DEVICE RM event to decrement and release the nodes (rather than explicit code that tried to do it before). This will take care of moving the node to the UNUSED state and then removes the final ref count. If there is a trigger to reuse this node, the transition from the UNUSED state clearly indicates that the initial reference is then incremented and use can continue. Link: https://lore.kernel.org/r/20201115192646.12977-8-james.smart@broadcom.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-11-17scsi: lpfc: Remove ndlp when a PLOGI/ADISC/PRLI/REG_RPI ultimately failsJames Smart
When a PLOGI/ADISC/PRLI/REG_RPI fails, the node remains in the nodelist in that state. Although the driver now frees a node when the ref count goes to zero, in this case the ref cnt doesn't reach zero because there isn't a mechanism to release the final reference. Discovery just stops. Fix by calling the node discovery state machine DEVICE_RM event whenever one of these commands fail. This will remove the final reference count and trigger node release. Link: https://lore.kernel.org/r/20201115192646.12977-7-james.smart@broadcom.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-11-17scsi: lpfc: Rework remote port lock handlingJames Smart
Currently the discovery layers within the driver use the SCSI midlayer host_lock to access node-specific structures. This can contend with the I/O path and is too coarse of a lock. Rework the driver so that it uses a lock specific to the remote port node structure when accessing the structure contents. A few of the changes brought out spots were some slightly reorganized routines worked better. Link: https://lore.kernel.org/r/20201115192646.12977-6-james.smart@broadcom.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-11-17scsi: lpfc: Fix refcounting around SCSI and NVMe transport APIsJames Smart
Due to bug history and code review, the node reference counting approach in the driver isn't implemented consistently with how the scsi and nvme transport perform registrations and unregistrations and their callbacks. This resulted in many bad/stale node pointers. Reword the driver so that reference handling is performed as follows: - The initial node reference is taken on structure allocation - Take a reference on any add/register call to the transport - Remove a reference on any delete/unregister call to the transport - After the node has fully removed from both the SCSI and NVMEe transports (dev_loss_callbacks have called back) call the discovery engine DEVICE_RM event which will remove the final reference and release the node structure. - Alter dev_loss handling when a vport or base port is unloading. - Remove the put_node handling - no longer needed. - Rewrite the vport_delete handling on reference counts. Part of this effort was driven from the FDISC not registering with the transport and disrupting the model for node reference counting. - Deleted lpfc_nlp_remove. Pushed it's remaining ops into lpfc_nlp_release. - Several other small code cleanups. Link: https://lore.kernel.org/r/20201115192646.12977-5-james.smart@broadcom.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-11-17scsi: lpfc: Fix removal of SCSI transport device get and put on dev structureJames Smart
The lpfc driver is calling get_device and put_device on scsi_fc_transport device structure. When this code was removed, the driver triggered an oops in "scsi_is_host_dev" when the first SCSI target was unregistered from the transport. The reason the calls were necessary is that the driver is calling scsi_remove_host too early, before the target rports are unregistered and the scsi devices disconnected from the scsi_host. The fc_host was torn down during fc_remove_host. Fix by moving the lpfc_pci_remove_one_s3/s4 calls to scsi_remove_host to after the nodes are cleaned up. Remove the get_device and put_device calls and the supporting code. Link: https://lore.kernel.org/r/20201115192646.12977-4-james.smart@broadcom.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-11-17scsi: lpfc: Rework locations of ndlp reference takingJames Smart
Now that the driver has gone to a normal ref interface (with no odd logic) the discovery logic needs to be updated to reworked so that it properly takes references when it should and give them up when it should. Rework the driver for the following get/put model: - Move gets to just before an I/O is issued. Add gets for places where an I/O was issued without one. - Ensure that failures from lpfc_nlp_get() are handled by the driver. - Check and fix the placement of lpfc_nlp_puts relative to io completions. Note: some of these paths may not release the reference on the exact io completion as the reference is held as the code takes another step in the discovery thread and which may cause another io to be issued. - Rearrange some code for error processing and calling lpfc_nlp_put. - Fix some places of incorrect reference freeing that was causing the premature releasing of the structure. - Nvmet plogi handling performs unreg_rpi's. The reference counts were unbalanced resulting in premature node removal. In some cases this caused loss of node discovery. Corrected the reftaking around nvmet plogis. Nodes that experience devloss now get released from the node list now that there is a proper reference taking. Link: https://lore.kernel.org/r/20201115192646.12977-3-james.smart@broadcom.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-11-17scsi: lpfc: Rework remote port ref counting and node freeingJames Smart
When a remote port is disconnected and disappears, its node structure (ndlp) stays allocated and on a vport node list. While on the list it can be matched, thus requires validation checks on state to be added in numerous code paths. If the node comes back, its possible for there to be multiple node structures for the same device on the vport node list. There is no reason to keep the node structure around after it is no longer in existence, and the current implementation creates problems for itself (multiple nodes) and lots of unnecessary code for state validation. Additionally, the reference taking on the node structure didn't follow the normal model used by the kernel kref api. It included lots of odd logic to match state with reference count. The combination of this odd logic plus the way it was implicitly used in the discovery engine made its reference taking implementation suspect and extremely hard to follow. Change the driver such that the reference taking routines are now normal ref increments/decrements and callout on refcount=0. With this in place, the rework can be done such that the node structure is fully removed and deallocated when the remote port no longer exists and all references are removed. This removal logic, and the basic ref counting are intrically tied, thus in a single patch. Link: https://lore.kernel.org/r/20201115192646.12977-2-james.smart@broadcom.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-10-26scsi: lpfc: Add FDMI Vendor MIB supportJames Smart
Created new attribute lpfc_enable_mi, which by default is enabled. Add command definition bits for SLI-4 parameters that recognize whether the adapter has MIB information support and what revision of MIB data. Using the adapter information, register vendor-specific MIB support with FDMI. The registration will be done every link up. During FDMI registration, encountered a couple of errors when reverting to FDMI rev1. Code needed to exist once reverting. Fixed these. Link: https://lore.kernel.org/r/20201020202719.54726-8-james.smart@broadcom.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-09-08Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Eleven fixes, mostly in drivers or minor fixes in driver related infrastructure libraries (target, libfc and libsas). Most of the bugs fixed only show up under rare circumstances, the exception being the endianness problem in qla2xxx which is used as a device on some sparc systems" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: mpt3sas: Don't call disable_irq from IRQ poll handler scsi: megaraid_sas: Don't call disable_irq from process IRQ poll scsi: target: iscsi: Fix hang in iscsit_access_np() when getting tpg->np_login_sem scsi: libsas: Set data_dir as DMA_NONE if libata marks qc as NODATA scsi: target: iscsi: Fix data digest calculation scsi: lpfc: Update lpfc version to 12.8.0.4 scsi: lpfc: Extend the RDF FPIN Registration descriptor for additional events scsi: lpfc: Fix FLOGI/PLOGI receive race condition in pt2pt discovery scsi: lpfc: Fix setting IRQ affinity with an empty CPU mask scsi: qla2xxx: Fix regression on sparc64 scsi: libfc: Fix for double free() scsi: pm8001: Fix memleak in pm8001_exec_internal_task_abort
2020-08-31scsi: lpfc: Extend the RDF FPIN Registration descriptor for additional eventsJames Smart
Currently the driver registers for Link Integrity events only. This patch adds registration for the following FPIN types: - Delivery Notifications - Congestion Notification - Peer Congestion Notification Link: https://lore.kernel.org/r/20200828175332.130300-4-james.smart@broadcom.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-31scsi: lpfc: Fix FLOGI/PLOGI receive race condition in pt2pt discoveryJames Smart
The driver is unable to successfully login with remote device. During pt2pt login, the driver completes its FLOGI request with the remote device having WWN precedence. The remote device issues its own (delayed) FLOGI after accepting the driver's and, upon transmitting the FLOGI, immediately recognizes it has already processed the driver's FLOGI thus it transitions to sending a PLOGI before waiting for an ACC to its FLOGI. In the driver, the FLOGI is received and an ACC sent, followed by the PLOGI being received and an ACC sent. The issue is that the PLOGI reception occurs before the response from the adapter from the FLOGI ACC is received. Processing of the PLOGI sets state flags to perform the REG_RPI mailbox command and proceed with the rest of discovery on the port. The same completion routine used by both FLOGI and PLOGI is generic in nature. One of the things it does is clear flags, and those flags happen to drive the rest of discovery. So what happened was the PLOGI processing set the flags, the FLOGI ACC completion cleared them, thus when the PLOGI ACC completes it doesn't see the flags and stops. Fix by modifying the generic completion routine to not clear the rest of discovery flag (NLP_ACC_REGLOGIN) unless the completion is also associated with performing a mailbox command as part of its handling. For things such as FLOGI ACC, there isn't a subsequent action to perform with the adapter, thus there is no mailbox cmd ptr. PLOGI ACC though will perform REG_RPI upon completion, thus there is a mailbox cmd ptr. Link: https://lore.kernel.org/r/20200828175332.130300-3-james.smart@broadcom.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-23treewide: Use fallthrough pseudo-keywordGustavo A. R. Silva
Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-08-04scsi: lpfc: Fix retry of PRLI when status indicates its unsupportedDick Kennedy
With port bounce/address swaps and timing between initiator GID queries vs remote port FC4 support registrations, the driver may be in a situation where it sends PRLIs for both FCP and NVME even though the target may not support one of the protocols. In this case, the remote port will reject the PRLI and usually indicate it does not support the request. However, the driver currently ignores the status of the failure and immediately retries the PRLI, which is pointless. In the case of this one remote port, the reception of the PRLI retry caused it to decide to send a LOGO. The LOGO restarted the process and the same results happened. It made the remote port undiscoverable to either protocol. Add logic to detect the non-support status and not attempt the retry of the PRLI. Link: https://lore.kernel.org/r/20200803210229.23063-6-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-07-24scsi: lpfc: Fix some function parameter descriptionsLee Jones
Fixes the following W=1 kernel build warning(s): drivers/scsi/lpfc/lpfc_els.c:3619: warning: Function parameter or member 't' not described in 'lpfc_els_retry_delay' drivers/scsi/lpfc/lpfc_els.c:3619: warning: Excess function parameter 'ptr' description in 'lpfc_els_retry_delay' drivers/scsi/lpfc/lpfc_els.c:4877: warning: Function parameter or member 'rejectError' not described in 'lpfc_els_rsp_reject' drivers/scsi/lpfc/lpfc_els.c:7900: warning: Function parameter or member 't' not described in 'lpfc_els_timeout' drivers/scsi/lpfc/lpfc_els.c:7900: warning: Excess function parameter 'ptr' description in 'lpfc_els_timeout' drivers/scsi/lpfc/lpfc_els.c:8272: warning: Function parameter or member 'payload' not described in 'lpfc_send_els_event' drivers/scsi/lpfc/lpfc_els.c:8272: warning: Excess function parameter 'cmd' description in 'lpfc_send_els_event' drivers/scsi/lpfc/lpfc_els.c:8355: warning: Function parameter or member 'tlv' not described in 'lpfc_els_rcv_fpin_li' drivers/scsi/lpfc/lpfc_els.c:8355: warning: Excess function parameter 'lnk_not' description in 'lpfc_els_rcv_fpin_li' drivers/scsi/lpfc/lpfc_els.c:9688: warning: Function parameter or member 't' not described in 'lpfc_fabric_block_timeout' drivers/scsi/lpfc/lpfc_els.c:9688: warning: Excess function parameter 'ptr' description in 'lpfc_fabric_block_timeout' Link: https://lore.kernel.org/r/20200723122446.1329773-2-lee.jones@linaro.org Cc: James Smart <james.smart@broadcom.com> Cc: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-07-02scsi: lpfc: Add an internal trace log bufferDick Kennedy
The current logging methods typically end up requesting a reproduction with a different logging level set to figure out what happened. This was mainly by design to not clutter the kernel log messages with things that were typically not interesting and the messages themselves could cause other issues. When looking to make a better system, it was seen that in many cases when more data was wanted was when another message, usually at KERN_ERR level, was logged. And in most cases, what the additional logging that was then enabled was typically. Most of these areas fell into the discovery machine. Based on this summary, the following design has been put in place: The driver will maintain an internal log (256 elements of 256 bytes). The "additional logging" messages that are usually enabled in a reproduction will be changed to now log all the time to the internal log. A new logging level is defined - LOG_TRACE_EVENT. When this level is set (it is not by default) and a message marked as KERN_ERR is logged, all the messages in the internal log will be dumped to the kernel log before the KERN_ERR message is logged. There is a timestamp on each message added to the internal log. However, this timestamp is not converted to wall time when logged. The value of the timestamp is solely to give a crude time reference for the messages. Link: https://lore.kernel.org/r/20200630215001.70793-14-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-05-26scsi: lpfc: Fix lpfc_nodelist leak when processing unsolicited eventXiyu Yang
In order to create or activate a new node, lpfc_els_unsol_buffer() invokes lpfc_nlp_init() or lpfc_enable_node() or lpfc_nlp_get(), all of them will return a reference of the specified lpfc_nodelist object to "ndlp" with increased refcnt. When lpfc_els_unsol_buffer() returns, local variable "ndlp" becomes invalid, so the refcount should be decreased to keep refcount balanced. The reference counting issue happens in one exception handling path of lpfc_els_unsol_buffer(). When "ndlp" in DEV_LOSS, the function forgets to decrease the refcnt increased by lpfc_nlp_init() or lpfc_enable_node() or lpfc_nlp_get(), causing a refcnt leak. Fix this issue by calling lpfc_nlp_put() when "ndlp" in DEV_LOSS. Link: https://lore.kernel.org/r/1590416184-52592-1-git-send-email-xiyuyang19@fudan.edu.cn Reviewed-by: Daniel Wagner <dwagner@suse.de> Reviewed-by: James Smart <james.smart@broadcom.com> Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn> Signed-off-by: Xin Tan <tanxin.ctf@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-04-22scsi: lpfc: remove duplicate unloading checksJames Smart
During code reviews several instances of duplicate module unloading checks were found. Remove the duplicate checks. Link: https://lore.kernel.org/r/20200421203354.49420-1-jsmart2021@gmail.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-02-24scsi: lpfc: fix spelling mistake "Notication" -> "Notification"Colin Ian King
There is a spelling mistake in a lpfc_printf_vlog info message. Fix it. [mkp: fix spelling mistake in commit description] Link: https://lore.kernel.org/linux-scsi/20200221154841.77791-1-colin.king@canonical.com Reviewed-by: James Smart <james.smart@broadcom.com> Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-02-18scsi: lpfc: add RDF registration and Link Integrity FPIN loggingJames Smart
This patch modifies lpfc to register for Link Integrity events via the use of an RDF ELS and to perform Link Integrity FPIN logging. Specifically, the driver was modified to: - Format and issue the RDF ELS immediately following SCR registration. This registers the ability of the driver to receive FPIN ELS. - Adds decoding of the FPIN els into the received descriptors, with logging of the Link Integrity event information. After decoding, the ELS is delivered to the scsi fc transport to be delivered to any user-space applications. - To aid in logging, simple helpers were added to create enum to name string lookup functions that utilize the initialization helpers from the fc_els.h header. - Note: base header definitions for the ELS's don't populate the descriptor payloads. As such, lpfc creates it's own version of the structures, using the base definitions (mostly headers) and additionally declaring the descriptors that will complete the population of the ELS. Link: https://lore.kernel.org/r/20200210173155.547-3-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-02-10scsi: lpfc: Copyright updates for 12.6.0.4 patchesJames Smart
Update copyrights to 2020 for files modified in the 12.6.0.4 patch set. Link: https://lore.kernel.org/r/20200128002312.16346-13-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-02-10scsi: lpfc: Remove handler for obsolete ELS - Read Port Status (RPS)James Smart
There was report of an odd "Fix me..." log message, which was tracked down to the lpfc_els_rcv_rps() routine. This was in handling of a very old and obsolete ELS - Read Port Status. The RPS ELS was defined in FC-LS-1, but deprecated in FC-LS-2, and removed from all later FC-LS revisions. It was replaced by the Read Diagnostic Parameters (RDP) ELS and the Link Error Status Block descriptor. There should be no support for the RSP ELS. Remove support from driver. Link: https://lore.kernel.org/r/20200128002312.16346-9-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-11-12scsi: lpfc: fix: Coverity: lpfc_cmpl_els_rsp(): Null pointer dereferencesJames Smart
Coverity reported the following: *** CID 101747: Null pointer dereferences (FORWARD_NULL) /drivers/scsi/lpfc/lpfc_els.c: 4439 in lpfc_cmpl_els_rsp() 4433 kfree(mp); 4434 } 4435 mempool_free(mbox, phba->mbox_mem_pool); 4436 } 4437 out: 4438 if (ndlp && NLP_CHK_NODE_ACT(ndlp)) { vvv CID 101747: Null pointer dereferences (FORWARD_NULL) vvv Dereferencing null pointer "shost". 4439 spin_lock_irq(shost->host_lock); 4440 ndlp->nlp_flag &= ~(NLP_ACC_REGLOGIN | NLP_RM_DFLT_RPI); 4441 spin_unlock_irq(shost->host_lock); 4442 4443 /* If the node is not being used by another discovery thread, 4444 * and we are sending a reject, we are done with it. Fix by adding a check for non-null shost in line 4438. The scenario when shost is set to null is when ndlp is null. As such, the ndlp check present was sufficient. But better safe than sorry so add the shost check. Reported-by: coverity-bot <keescook+coverity-bot@chromium.org> Addresses-Coverity-ID: 101747 ("Null pointer dereferences") Fixes: 2e0fef85e098 ("[SCSI] lpfc: NPIV: split ports") CC: James Bottomley <James.Bottomley@SteelEye.com> CC: "Gustavo A. R. Silva" <gustavo@embeddedor.com> CC: linux-next@vger.kernel.org Link: https://lore.kernel.org/r/20191111230401.12958-3-jsmart2021@gmail.com Reviewed-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-11-06scsi: lpfc: Fix unexpected error messages during RSCN handlingJames Smart
During heavy RCN activity and log_verbose = 0 we see these messages: 2754 PRLI failure DID:521245 Status:x9/xb2c00, data: x0 0231 RSCN timeout Data: x0 x3 0230 Unexpected timeout, hba link state x5 This is due to delayed RSCN activity. Correct by avoiding the timeout thus the messages by restarting the discovery timeout whenever an rscn is received. Filter PRLI responses such that severity depends on whether expected for the configuration or not. For example, PRLI errors on a fabric will be informational (they are expected), but Point-to-Point errors are not necessarily expected so they are raised to an error level. Link: https://lore.kernel.org/r/20191105005708.7399-5-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-10-24scsi: lpfc: Add additional discovery log messagesJames Smart
When debugging a recent discovery customer problem it was very hard to tell what was happening with the existing discovery log messages. To fully debug the issue additional log messages were necessary. Add or extend log messages so that sufficient information is present for debugging. Link: https://lore.kernel.org/r/20191018211832.7917-16-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-10-24scsi: lpfc: fix coverity error of dereference after null checkJames Smart
Log message conditional upon vport being NULL dereferences vport to determine log verbose setting. Changed to use lpfc_print_log which uses phba to determine the active log verbose setting. Fixes: 43bfea1bffb6 ("scsi: lpfc: Fix coverity errors on NULL pointer checks") Link: https://lore.kernel.org/r/20191018211832.7917-8-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-30scsi: lpfc: Fix spinlock_irq issues in lpfc_els_flush_cmd()James Smart
While reviewing the CT behavior, issues with spinlock_irq were seen. The driver should be using spinlock_irqsave/irqrestore in the els flush routine. Changed to spinlock_irqsave/irqrestore. Link: https://lore.kernel.org/r/20190922035906.10977-15-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-30scsi: lpfc: Fix list corruption in lpfc_sli_get_iocbqJames Smart
After study, it was determined there was a double free of a CT iocb during execution of lpfc_offline_prep and lpfc_offline. The prep routine issued an abort for some CT iocbs, but the aborts did not complete fast enough for a subsequent routine that waits for completion. Thus the driver proceeded to lpfc_offline, which releases any pending iocbs. Unfortunately, the completions for the aborts were then received which re-released the ct iocbs. Turns out the issue for why the aborts didn't complete fast enough was not their time on the wire/in the adapter. It was the lpfc_work_done routine, which requires the adapter state to be UP before it calls lpfc_sli_handle_slow_ring_event() to process the completions. The issue is the prep routine takes the link down as part of it's processing. To fix, the following was performed: - Prevent the offline routine from releasing iocbs that have had aborts issued on them. Defer to the abort completions. Also means the driver fully waits for the completions. Given this change, the recognition of "driver-generated" status which then releases the iocb is no longer valid. As such, the change made in the commit 296012285c90 is reverted. As recognition of "driver-generated" status is no longer valid, this patch reverts the changes made in commit 296012285c90 ("scsi: lpfc: Fix leak of ELS completions on adapter reset") - Modify lpfc_work_done to allow slow path completions so that the abort completions aren't ignored. - Updated the fdmi path to recognize a CT request that fails due to the port being unusable. This stops FDMI retries. FDMI will be restarted on next link up. Link: https://lore.kernel.org/r/20190922035906.10977-14-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-30scsi: lpfc: Fix coverity errors on NULL pointer checksJames Smart
Coverity flagged several scenarios where checking of null pointer values wasn't consistent. Fix the code to that be consistent on checking. Link: https://lore.kernel.org/r/20190922035906.10977-12-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-08-19scsi: lpfc: Add NVMe sequence level error recovery supportJames Smart
FC-NVMe-2 added support for sequence level error recovery in the FC-NVME protocol. This allows for the detection of errors and lost frames and immediate retransmission of data to avoid exchange termination, which escalates into NVMeoFC connection and association failures. A significant RAS improvement. The driver is modified to indicate support for SLER in the NVMe PRLI is issues and to check for support in the PRLI response. When both sides support it, the driver will set a bit in the WQE to enable the recovery behavior on the exchange. The adapter will take care of all detection and retransmission. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-08-19scsi: lpfc: Add MDS driver loopback diagnostics supportJames Smart
Added code to support driver loopback with MDS Diagnostics. This style of diagnostics passes frames from the fabric to the driver who then echo them back out the link. SEND_FRAME WQEs are used to transmit the frames. Added the SOF and EOF field location definitions for use by SEND_FRAME. Also ensure that enable_mds_diags is a RW parameter. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-08-19scsi: lpfc: Migrate to %px and %pf in kernel print callsJames Smart
In order to see real addresses, convert %p with %px for kernel addresses and replace %p with %pf for functions. While converting, standardize on "x%px" throughout (not %px or 0x%px). Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-08-19scsi: lpfc: Fix coverity warningsJames Smart
Running on Coverity produced the following errors: - coding style (indentation) - memset size mismatch errors note: comment cases where it is purposely a mismatch Fix the errors. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-08-19scsi: lpfc: Fix driver nvme rescan loggingJames Smart
In situations where zoning is not being used, thus NVMe initiators see other NVMe initiators as well as NVMe targets, a link bounce on an initiator will cause the NVMe initiators to spew "6169" State Error messages. The driver is not qualifying whether the remote port is a NVMe targer or not before calling the lpfc_nvme_rescan_port(), which validates the role and prints the message if its only an NVMe initiator. Fix by the following: - Before calling lpfc_nvme_rescan_port() ensure that the node is a NVMe storage target or a NVMe discovery controller. - Clean up implementation of lpfc_nvme_rescan_port. remoteport pointer will always be NULL if a NVMe initiator only. But, grabbing of remoteport pointer should be done under lock to coincide with the registering of the remote port with the fc transport. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-08-19scsi: lpfc: Fix FLOGI handling across multiple link up/down conditionsJames Smart
It's possible for the driver to initiate an FLOGI and before it completes, another link down/up transition occurs requiring a new FLOGI. Currently, nothing is done to abort/noop the older FLOGI request to the adapter, so if this transition occurs and the FLOGI completion is received after the link down/up transition, the driver may erroneously act on the older FLOGI. In most cases, the adapter properly terminates/fails the FLOGI, but there is a timing condition where the FLOGI may complete on the wire prior to the transition, but the response may not be seen/processed by the driver before the driver sees the link transition. Fix by having the link down handler in the driver run through any outstanding ELS's and change the completion handler of the ELS so that it will be no-op'd and released. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-07-11Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds
Pull SCSI updates from James Bottomley: "This is mostly update of the usual drivers: qla2xxx, hpsa, lpfc, ufs, mpt3sas, ibmvscsi, megaraid_sas, bnx2fc and hisi_sas as well as the removal of the osst driver (I heard from Willem privately that he would like the driver removed because all his test hardware has failed). Plus number of minor changes, spelling fixes and other trivia. The big merge conflict this time around is the SPDX licence tags. Following discussion on linux-next, we believe our version to be more accurate than the one in the tree, so the resolution is to take our version for all the SPDX conflicts" Note on the SPDX license tag conversion conflicts: the SCSI tree had done its own SPDX conversion, which in some cases conflicted with the treewide ones done by Thomas & co. In almost all cases, the conflicts were purely syntactic: the SCSI tree used the old-style SPDX tags ("GPL-2.0" and "GPL-2.0+") while the treewide conversion had used the new-style ones ("GPL-2.0-only" and "GPL-2.0-or-later"). In these cases I picked the new-style one. In a few cases, the SPDX conversion was actually different, though. As explained by James above, and in more detail in a pre-pull-request thread: "The other problem is actually substantive: In the libsas code Luben Tuikov originally specified gpl 2.0 only by dint of stating: * This file is licensed under GPLv2. In all the libsas files, but then muddied the water by quoting GPLv2 verbatim (which includes the or later than language). So for these files Christoph did the conversion to v2 only SPDX tags and Thomas converted to v2 or later tags" So in those cases, where the spdx tag substantially mattered, I took the SCSI tree conversion of it, but then also took the opportunity to turn the old-style "GPL-2.0" into a new-style "GPL-2.0-only" tag. Similarly, when there were whitespace differences or other differences to the comments around the copyright notices, I took the version from the SCSI tree as being the more specific conversion. Finally, in the spdx conversions that had no conflicts (because the treewide ones hadn't been done for those files), I just took the SCSI tree version as-is, even if it was old-style. The old-style conversions are perfectly valid, even if the "-only" and "-or-later" versions are perhaps more descriptive. * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (185 commits) scsi: qla2xxx: move IO flush to the front of NVME rport unregistration scsi: qla2xxx: Fix NVME cmd and LS cmd timeout race condition scsi: qla2xxx: on session delete, return nvme cmd scsi: qla2xxx: Fix kernel crash after disconnecting NVMe devices scsi: megaraid_sas: Update driver version to 07.710.06.00-rc1 scsi: megaraid_sas: Introduce various Aero performance modes scsi: megaraid_sas: Use high IOPS queues based on IO workload scsi: megaraid_sas: Set affinity for high IOPS reply queues scsi: megaraid_sas: Enable coalescing for high IOPS queues scsi: megaraid_sas: Add support for High IOPS queues scsi: megaraid_sas: Add support for MPI toolbox commands scsi: megaraid_sas: Offload Aero RAID5/6 division calculations to driver scsi: megaraid_sas: RAID1 PCI bandwidth limit algorithm is applicable for only Ventura scsi: megaraid_sas: megaraid_sas: Add check for count returned by HOST_DEVICE_LIST DCMD scsi: megaraid_sas: Handle sequence JBOD map failure at driver level scsi: megaraid_sas: Don't send FPIO to RL Bypass queue scsi: megaraid_sas: In probe context, retry IOC INIT once if firmware is in fault scsi: megaraid_sas: Release Mutex lock before OCR in case of DCMD timeout scsi: megaraid_sas: Call disable_irq from process IRQ poll scsi: megaraid_sas: Remove few debug counters from IO path ...
2019-06-21lpfc: add support for translating an RSCN rcv into a discovery rescanJames Smart
This patch updates RSCN receive processing to check for the remote port being an NVME port, and if so, invoke the nvme_fc callback to rescan the remote port. The rescan will generate a discovery udev event. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Arun Easi <aeasi@marvell.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-06-21lpfc: add support to generate RSCN events for nportJames Smart
This patch adds general RSCN support: - The ability to transmit an RSCN to the port on the other end of the link (regular port if pt2pt, or fabric controller if fabric). - And general recognition of an RSCN ELS when an ELS is received. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Arun Easi <aeasi@marvell.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-06-18scsi: lpfc: Fix PT2PT PLOGI collison stopping discoveryJames Smart
Under heavy load the target stops responding, the drivers aborts timeout and we start recovery by logging out of the target, but the target is never logged into again. In a point-to-point scenario, there were battling PLOGI's. When we received a PLOGI request after having sent one, the driver cancels the processing of the original plogi. However, the completion path of the remaining plogi was coded to skip the reg_rpi that should be happening on the 2nd plogi. Correct by adding a simple pt2pt check such that the 2nd plogi isn't skipped and the reg_login occurs. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-05-13scsi: lpfc: add check for loss of ndlp when sending RRQJames Smart
There was a missing qualification of a valid ndlp structure when calling to send an RRQ for an abort. Add the check. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Tested-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Ma