summaryrefslogtreecommitdiffstats
path: root/drivers/gpu/drm/amd/amdkfd
AgeCommit message (Collapse)Author
2020-07-01drm/amdkfd: sienna_cichlid virtual function supportshaoyunl
amdkfd add support for sienna_cichlid virtual function Signed-off-by: shaoyunl <shaoyun.liu@amd.com> Reviewed-by: Yong Zhao <Yong.Zhao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-07-01drm/amdkfd: Support debugger in Navi1x trap handlerJay Cornwall
- Preserve scalar GPRs ttmp[4:11] and ttmp13 - Add single step exception during context save workaround - Remove incorrect PC adjustment during context save Signed-off-by: Jay Cornwall <jay.cornwall@amd.com> Reviewed-by: Yong Zhao <Yong.Zhao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-07-01drm/amdkfd: Support newer assemblers in gfx10 trap handlerJay Cornwall
The contents of macros are parsed by the assembler before conditions have been tested. This causes assembly errors when using IP-specific instructions in the IP-unified trap handler. Add a preprocessing step to filter IP-specific code. Also guard a Navi1x-specific instruction (no effect on Sienna_Cichlid). Signed-off-by: Jay Cornwall <jay.cornwall@amd.com> Reviewed-by: Yong Zhao <Yong.Zhao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-07-01drm/amdkfd: Add Sienna_Cichlid trap handler supportJay Cornwall
- Replace SQC stores with TCP stores - Synchronize with MSG_SAVEWAVE via lgkmcnt - HW_REG_IB_STS is now read-only Signed-off-by: Jay Cornwall <jay.cornwall@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-07-01drm/amdkfd: Support Sienna_Cichlid KFD v4Yong Zhao
v4: drop get_tile_config, comment out other callbacks Signed-off-by: Yong Zhao <Yong.Zhao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-29drm/amdkfd: fix a dereference of pdd before it is null checkedColin Ian King
Currently pointer pdd is being dereferenced when assigning pointer dpm and then pdd is being null checked. Fix this by checking if pdd is null before the dereference of pdd occurs. Addresses-Coverity: ("Dereference before null check") Fixes: 32cb59f31362 ("drm/amdkfd: Track SDMA utilization per process") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-28drm/amdkfd: Fix GCC 10 compiler warningFelix Kuehling
GCC 10 was complaining about how we append data to a buffer using snprintf: drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_topology.c: In function ‘perf_show’: drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_topology.c:214:3: warning: ‘snprintf’ argument 4 overlaps destination object ‘buf’ [-Wrestrict] 214 | snprintf(buffer, PAGE_SIZE, "%s"fmt, buffer, __VA_ARGS__) | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This patch fixes the warnings and makes the sysfs code more efficient by remembering the offset in the buffer between append operations. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Aaron Liu <aaron.liu@amd.com> Tested-by: Aaron Liu <aaron.liu@amd.com> Reviewed-by: Amber Lin <Amber.Lin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-28drm/amdkfd: Track SDMA utilization per processMukul Joshi
Track SDMA usage on a per process basis and report it through sysfs. The value in the sysfs file indicates the amount of time SDMA has been in-use by this process since the creation of the process. This value is in microsecond granularity. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-21drm/amdkfd: report the real PCI bus numberEvan Quan
Since the PCI bus number retrieved by PCI_BUS_NUM(pdev->devfn) is wrong. Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-21drm/amdkfd: Fix boolreturn.cocci warningsAishwarya Ramakrishnan
Return statements in functions returning bool should use true/false instead of 1/0. drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c:40:9-10: WARNING: return of 0/1 in function 'event_interrupt_isr_v9' with return type bool Generated by: scripts/coccinelle/misc/boolreturn.cocci Signed-off-by: Aishwarya Ramakrishnan <aishwaryarj100@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-01drm/amdkfd: Use a systematic method to calculate queue mask bitYong Zhao
The queue mask used for set_resources always assumes the queue number per pipe is 8, so KFD needs to align with that by using function amdgpu_queue_mask_bit_to_set_resource_bit(). Signed-off-by: Yong Zhao <Yong.Zhao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-01drm/amdkfd: Fix comment formattingFelix Kuehling
Corrected two function names. Added a missing space. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Kent Russell <kent.russell@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-01drm/amdkfd: Report domain with topologyOri Messinger
PCI domain has moved to 32-bits to accommodate virtualization, so a 32-bit integer is exposed for domain to reflect this change. Domain can be found in here: /sys/class/kfd/kfd/topology/nodes/X/properties Where X is the card number Signed-off-by: Ori Messinger <ori.messinger@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-30drm/amdkfd: Track GPU memory utilization per processMukul Joshi
Track GPU VRAM usage on a per process basis and report it through sysfs. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-28drm/amdkfd: Enable over-subscription with >1 GWS queueJoseph Greathouse
The current GWS usage model will only allows a single GWS-enabled process to be active on the GPU at once. This ensures that a barrier-using kernel gets a known amount of GPU hardware, to prevent deadlock due to inability to go beyond the GWS barrier. The HWS watches how many GWS entries are assigned to each process, and goes into over-subscription mode when two processes need more than the 64 that are available. The current KFD method for working with this is to allocate all 64 GWS entries to each GWS-capable process. When more than one GWS-enabled process is in the runlist, we must make sure the runlist is in over-subscription mode, so that the HWS gets a chained RUN_LIST packet and continues scheduling kernels. Signed-off-by: Joseph Greathouse <Joseph.Greathouse@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-28drm/amdkfd: Enable GWS based on FW SupportJoseph Greathouse
Rather than only enabling GWS support based on the hws_gws_support modparm, also check whether the GPU's HWS firmware supports GWS. Leave the old modparm in place in case users want to test GWS on GPUs not yet in the support list. v2: fix broken syntax from the first patch. Signed-off-by: Joseph Greathouse <Joseph.Greathouse@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-28drm/amdkfd: New IOCTL to allocate queue GWS (v2)Oak Zeng
Add a new kfd ioctl to allocate queue GWS. Queue GWS is released on queue destroy. v2: re-introduce this API with the following fixes squashed in: - drm/amdkfd: fix null pointer dereference on dev - drm/amdkfd: Return proper error code for gws alloc API - drm/amdkfd: Remove GPU ID in GWS queue creation Signed-off-by: Oak Zeng <Oak.Zeng@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-28drm/amdkfd: Put ASIC revision into HSA capabilityJoseph Greathouse
In order to surface the ASIC revision to user level, we want to put it into the HSA topology. This can be because different ASIC revisions may require user-level software to do different things (e.g. patch code for things that are changed in later hardware revisions). The ASIC revision from the hardware is maximum of 4 bits at this time, so put it into 4 of the open bits in the HSA capability. Then user-level software can use this capability information to know -- for each ASIC -- what revision-based things must be done. Signed-off-by: Joseph Greathouse <Joseph.Greathouse@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-22drm/amdkfd: Adjust three kfd dmesg printings during initializationYong Zhao
Delete two printings which are not very useful, and change one from pr_info() to pr_debug(). Signed-off-by: Yong Zhao <Yong.Zhao@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-01drm/amdkfd: kfree the wrong pointerJack Zhang
Originally, it kfrees the wrong pointer for mem_obj. It would cause memory leak under stress test. Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com> Acked-by: Nirmoy Das <nirmoy.das@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-19drm: amd: fix spelling mistake "shoudn't" -> "shouldn't"Colin Ian King
There are spelling mistakes in pr_err messages and a comment. Fix these. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-10drm/amdkfd: Consolidate duplicated bo alloc flagsYong Zhao
ALLOC_MEM_FLAGS_* used are the same as the KFD_IOC_ALLOC_MEM_FLAGS_*, but they are interweavedly used in kernel driver, resulting in bad readability. For example, KFD_IOC_ALLOC_MEM_FLAGS_COHERENT is not referenced in kernel, and it functions implicitly in kernel through ALLOC_MEM_FLAGS_COHERENT, causing unnecessary confusion. Replace all occurrences of ALLOC_MEM_FLAGS_* with KFD_IOC_ALLOC_MEM_FLAGS_* to solve the problem. Signed-off-by: Yong Zhao <Yong.Zhao@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-10drm/amdkfd: Use pr_debug to print the message of reaching event limitYong Zhao
People are inclined to think of the previous pr_warn message as an error, so use pre_debug instead. Signed-off-by: Yong Zhao <Yong.Zhao@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-06drm/amdkfd: Signal eviction fence on process destruction (v2)Felix Kuehling
Otherwise BOs may wait for the fence indefinitely and never be destroyed. v2: Signal the fence right after destroying queues to avoid unnecessary delaye-delete in kfd_process_wq_release Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: xinhui pan <xinhui.pan@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-06drm/amdkfd: Add more comments on GFX9 user CP queue MQD workaroundYong Zhao
Because too many things are involved in this workaround, we need more comments to avoid pitfalls. Signed-off-by: Yong Zhao <Yong.Zhao@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Philip Yang <philip.yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-05drm/amdkfd: fix indentation issueColin Ian King
There is a statement that is indented with spaces instead of a tab. Replace spaces with a tab. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-28drm/amdkfd: change SDMA MQD memory typeEric Huang
SDMA MQD memory type is NC that causes MQD data overwritten accidentally by an old stable cache line. Changing it to UC default for GART will fix the issue. The mqd_gfx9 parameter is meant for control stacks that are allocated together with user mode queue MQDs. Setting mqd_gfx9 to true maps the control stack pages as NC. Here it was accidentally applied to SDMA MQDs, which are allocated together with the HIQ MQD. Setting the mqd_gfx9 to false avoids that. Signed-off-by: Eric Huang <jinhuieric.huang@amd.com> Acked-by: Yong Zhao <Yong.Zhao@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-28drm/amdkfd: Make get_tile_config() genericYong Zhao
Given we can query all the asic specific information from amdgpu_gfx_config, we can make get_tile_config() generic. Signed-off-by: Yong Zhao <Yong.Zhao@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-26drm/amdkfd: Delete unnecessary unmap queue package submissionsYong Zhao
The previous way of using SDMA queue count to infer whether we should unmap SDMA engines has bugs. The reason it did not cause issues is because MEC firmware unmaps all queues (CP + SDMA) when a unmap package for compute engine is received. Becasue of that, only one unmap queue package is needed, instead of one unmap queue package for CP and each SDMA engine, which results in much simpler driver code. Signed-off-by: Yong Zhao <Yong.Zhao@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-26drm/amdkfd: Delete excessive printingsYong Zhao
Those printings are duplicated or useless. Signed-off-by: Yong Zhao <Yong.Zhao@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-26drm/amdkfd: Fix a memory leak in queue creation error handlingYong Zhao
When the queue creation failed, some resources were not freed. Fix it. Signed-off-by: Yong Zhao <Yong.Zhao@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-26drm/amdkfd: Count active CP queues directlyYong Zhao
The previous code of calculating active CP queues is problematic if some SDMA queues are inactive. Fix that by counting CP queues directly. Signed-off-by: Yong Zhao <Yong.Zhao@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-26drm/amdkfd: Avoid ambiguity by indicating it's cp queueYong Zhao
The queues represented in queue_bitmap are only CP queues. Signed-off-by: Yong Zhao <Yong.Zhao@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-26drm/amdkfd: Rename queue_count to active_queue_countYong Zhao
The name is easier to understand the code. Signed-off-by: Yong Zhao <Yong.Zhao@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-26drm/amd: Extend ROCt to surface UUID for devices that have themDivya Shikre
Devices from Arcturus onwards will have their UUID exposed to Thunk. Adding neccessary functions to the kernel to propagate the uuid. Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-12drm/amdkfd: refactor runtime pm for bacoRajneesh Bhardwaj
So far the kfd driver implemented same routines for runtime and system wide suspend and resume (s2idle or mem). During system wide suspend the kfd aquires an atomic lock that prevents any more user processes to create queues and interact with kfd driver and amd gpu. This mechanism created problem when amdgpu device is runtime suspended with BACO enabled. Any application that relies on kfd driver fails to load because the driver reports a locked kfd device since gpu is runtime suspended. However, in an ideal case, when gpu is runtime suspended the kfd driver should be able to: - auto resume amdgpu driver whenever a client requests compute service - prevent runtime suspend for amdgpu while kfd is in use This change refactors the amdgpu and amdkfd drivers to support BACO and runtime power management. Reviewed-by: Oak Zeng <oak.zeng@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-12drm/amdkfd: show warning when kfd is lockedRajneesh Bhardwaj
During system suspend the kfd driver aquires a lock that prohibits further kfd actions unless the gpu is resumed. This adds some info which can be useful while debugging. Reviewed-by: Oak Zeng <oak.zeng@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-06drm/amdkfd: Add queue information to sysfsAmber Lin
Provide compute queues information in sysfs under /sys/class/kfd/kfd/proc. The format is /sys/class/kfd/kfd/proc/<pid>/queues/<queue id>/XX where XX are size, type, and gpuid three files to represent queue size, queue type, and the GPU this queue uses. <queue id> folder and files underneath are generated when a queue is created. They are removed when the queue is destroyed. Signed-off-by: Amber Lin <Amber.Lin@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-04drm/amdkfd: Fix a bug in SDMA RLC queue counting under HWS modeYong Zhao
The sdma_queue_count increment should be done before execute_queues_cpsch(), which calls pm_calc_rlib_size() where sdma_queue_count is used to calculate whether over_subscription is triggered. With the previous code, when a SDMA queue is created, compute_queue_count in pm_calc_rlib_size() is one more than the actual compute queue number, because the queue_count has been incremented while sdma_queue_count has not. This patch fixes that. Signed-off-by: Yong Zhao <Yong.Zhao@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-01-16drm/amdkfd: Add a message when SW scheduler is usedYong Zhao
SW scheduler is previously called non HW scheduler, or non HWS. This message is useful when triaging issues from dmesg. Signed-off-by: Yong Zhao <Yong.Zhao@amd.com> Acked-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-01-16drm/amdkfd: use map_queues for hiq on gfx v10 as wellHuang Rui
To align with gfx v9, we use the map_queues packet to load hiq MQD. Signed-off-by: Huang Rui <ray.huang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-01-16drm/amdkfd: use kiq to load the mqd of hiq queue for gfx v9 (v6)Aaron Liu
There is an issue that CP will check the HIQ queue to be configured and mapped with KIQ ring, otherwise, it will be unable to read back the secure buffer while the gfxoff is enabled even with trusted IP blocks. v1 -> v2: - Fix to remove surplus set_resources packets. - Fill the whole configuration in MQD. - Change the author as Aaron because he addressed the key point of this issue. - Add kiq ring lock. v2 -> v3: - Free the lock while in error return case. - Remove the programming only needed by the queue is unmapped. v3 -> v4: - Remove doorbell programming because it's used for restarting queue. - Remove CP scheduler programming because map_queue packet will handle this. v4 -> v5: - Remove cp_hqd_active because mec ucode will enable it while use map_queues. - Revise goto out_unlock. - Correct the right doorbell offset for HIQ that kfd driver assigned in the packet. v5 -> v6: - Merge Arcturus fix into this patch because it will get oops in Arcturus platform. Reported-by: Lisa Saturday <Lisa.Saturday@amd.com> Signed-off-by: Aaron Liu <aaron.liu@amd.com> Signed-off-by: Huang Rui <ray.huang@amd.com> Reviewed-and-Tested-by: Aaron Liu <aaron.liu@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-01-16drm/amdgpu: GPU TLB flush API moved to amdgpu_amdkfdAlex Sierra
[Why] TLB flush method has been deprecated using kfd2kgd interface. This implementation is now on the amdgpu_amdkfd API. [How] TLB flush functions now implemented in amdgpu_amdkfd. Signed-off-by: Alex Sierra <alex.sierra@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-01-09drm/amdkfd: Improve kfd_process lookup in kfd_ioctlFelix Kuehling
Use filep->private_data to store a pointer to the kfd_process data structure. Take an extra reference for that, which gets released in the kfd_release callback. Check that the process calling kfd_ioctl is the same that opened the file descriptor. Return -EBADF if it's not, so that this error can be distinguished in user mode. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-01-07drm/amdkfd: Avoid hanging hardware in stop_cpschFelix Kuehling
Don't use the HWS if it's known to be hanging. In a reset also don't try to destroy the HIQ because that may hang on SRIOV if the KIQ is unresponsive. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Tested-by: Emily Deng <Emily.Deng@amd.com> Reviewed-by: shaoyunl <shaoyun.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-01-07drm/amdkfd: Improve HWS hang detection and handlingFelix Kuehling
Move HWS hang detection into unmap_queues_cpsch to catch hangs in all cases. If this happens during a reset, don't schedule another reset because the reset already in progress is expected to take care of it. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Tested-by: Emily Deng <Emily.Deng@amd.com> Reviewed-by: shaoyunl <shaoyun.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-01-07drm/amdkfd: Remove unused variableFelix Kuehling
dqm->pipeline_mem wasn't used anywhere. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: shaoyunl <shaoyun.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-01-07drm/amdkfd: Fix permissions of hang_hwsFelix Kuehling
Reading from /sys/kernel/debug/kfd/hang_hws would cause a kernel oops because we didn't implement a read callback. Set the permission to write-only to prevent that. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: shaoyunl <shaoyun.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-12-19drm/amdkfd: expose num_cp_queues data field to topology node (v2)Huang Rui
Thunk driver would like to know the num_cp_queues data, however this data relied on different asic specific. So it's better to get it from kfd driver. v2: don't update name size. Signed-off-by: Huang Rui <ray.huang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-12-19drm/amdkfd: expose num_sdma_queues_per_engine data field to topology node (v2)Huang Rui
Thunk driver would like to know the num_sdma_queues_per_engine data, however this data relied on different asic specific. So it's better to get it from kfd driver. v2: don't update the name size. Signed-off-by: Huang Rui <ray.huang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>