summaryrefslogtreecommitdiffstats
path: root/drivers/gpu/drm/amd/amdgpu/amdgpu.h
AgeCommit message (Collapse)Author
2020-12-16Merge tag 'amd-drm-fixes-5.11-2020-12-16' of ↵Daniel Vetter
git://people.freedesktop.org/~agd5f/linux into drm-next amd-drm-fixes-5.11-2020-12-16: amdgpu: - Fix a eDP regression for DCE asics - SMU fixes for sienna cichlid - Misc W=1 fixes - SDMA 5.2 reset fix - Suspend/resume fix - Misc display fixes - Misc runtime PM fixes and cleanups - Dimgrey Cavefish fixes - printk cleanup - Documentation warning fixes amdkfd: - Error logging fix - Fix pipe offset calculation radeon: - printk cleanup Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> From: Alex Deucher <alexdeucher@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201216192421.18627-1-alexander.deucher@amd.com
2020-12-15drm/amdgpu: add check for ACPI power resourcesAlex Deucher
Check if the device has ACPI power resources so we can enable runtime pm if so. Acked-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-12-15drm/amdgpu: split BOCO and ATPX handlingAlex Deucher
In preparation for systems that support d3cold on dGPUs independent of PX/HG. No functional change intended. Acked-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-12-15drm/amdgpu: add judgement for suspend/resume sequenceLikun Gao
S0ix only makes sense on APUs since they are part of the platform, so only when the ASIC is APU should set amdgpu_acpi_is_s0ix_supported flag to deal with the related situation. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-12-15Merge tag 'drm-misc-next-2020-11-27-1' of ↵Daniel Vetter
git://anongit.freedesktop.org/drm/drm-misc into drm-next drm-misc-next for 5.11: UAPI Changes: Cross-subsystem Changes: * char/agp: Disable frontend without CONFIG_DRM_LEGACY * mm: Fix fput in mmap error path; Introduce vma_set_file() to change vma->vm_file Core Changes: * dma-buf: Use sgtables in system heap; Move heap helpers to CMA-heap code; Skip sync for unmapped buffers; Alloc higher order pages is available; Respect num_fences when initializing shared fence list * doc: Improvements around DRM modes and SCALING_FILTER * Pass full state to connector atomic functions + callee updates * Cleanups * shmem: Map pages with caching by default; Cleanups * ttm: Fix DMA32 for global page pool * fbdev: Cleanups * fb-helper: Update framebuffer after userspace writes; Unmap console buffer during shutdown; Rework damage handling of shadow framebuffer Driver Changes: * amdgpu: Multi-hop fixes, Clenaups * imx: Fix rotation for Vivante tiled formats; Support nearest-neighour skaling; Cleanups * mcde: Fix RGB formats; Support DPI output; Cleanups * meson: HDMI clock fixes * panel: Add driver and bindings for Innolux N125HCE-GN1 * panel/s6e63m0: More backlight levels; Fix init; Cleanups * via: Clenunps * virtio: Use fence ID for handling fences; Cleanups Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20201127083055.GA29139@linux-uq9g
2020-11-18drm/amdgpu: Fix missing prototype warningLuben Tuikov
Fix a missing prototype warning for function amdgpu_info_ioctl(), drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c:482:5: warning: no previous prototype for 'amdgpu_info_ioctl' [-Wmissing-prototypes] Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Souptick Joarder <jrdr.linux@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201110051548.685725-1-luben.tuikov@amd.com
2020-11-13drm/amdgpu: add s0i3 capacity check for s0i3 routine (v2)Prike Liang
add amdgpu_acpi_is_s0ix_supported() to check the platform whether support s0i3. v2: fix empty function parameters warning (void) Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-11-13drm/amdgpu: add amdgpu_smuio structureHawking Zhang
Add amdgpu_smuio structure in amdgpu_device to provide various callback functions to support smuio ip funcitonality Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: John Clements <john.clements@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-11-13gpu: drm: amd: amdgpu: amdgpu: Mark global variables as __maybe_unusedLee Jones
These 3 variables are used in *some* sourcefiles which include amdgpu.h, but not *all*. This leads to a flurry of build warnings. Fixes the following W=1 kernel build warning(s): from drivers/gpu/drm/amd/amdgpu/amdgpu.h:67, drivers/gpu/drm/amd/amdgpu/amdgpu.h:198:19: warning: ‘no_system_mem_limit’ defined but not used [-Wunused-const-variable=] drivers/gpu/drm/amd/amdgpu/amdgpu.h:197:19: warning: ‘debug_evictions’ defined but not used [-Wunused-const-variable=] drivers/gpu/drm/amd/amdgpu/amdgpu.h:196:18: warning: ‘sched_policy’ defined but not used [-Wunused-const-variable=] NB: Repeats ~650 times - snipped for brevity. Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: amd-gfx@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-10-27drm/amd/pm: correct the checks for polaris kickersEvan Quan
By defining new Macros. Signed-off-by: Evan Quan <evan.quan@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-10-15drm/amd/pm: correct gfx and pcie settings on umd pstate switching(V2)Evan Quan
For entering UMD stable Pstate, the operations to enter rlc_safe mode, disable mgcg_perfmon and disable PCIE aspm are needed. And the opposite operations should be performed on UMD stable Pstate exiting. V2: take those ASICs(CI/SI/VI) which may not support this into consideration Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-10-07drm/amdgpu: add per device user friendly xgmi events for vega20Jonathan Kim
Non-outbound data metrics are non useful so mark them as legacy. Bucket new perf counters into device and not device ip. Bind events to chip instead of IP. Report available event counters and not number of hw counter banks. Move DF public macros to private since not needed outside of IP version. v5: cleanup by moving per chip configs into structs v4: After more discussion, replace *_LEGACY references with IP references to indicate concept of pmu-typed versus event-config-typed event registration. v3: attr groups const array is global but attr groups are allocated per device which doesn't work and causes problems on memory allocation and de-allocation for pmu unregister. Switch to building const attr groups per pmu instead to simplify solution. v2: add comments on sysfs structure and formatting. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Harish Kasiviswanathan <harish.kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-10-01drm/amdgpu: support indirect access reg outside of mmio bar (v2)Hawking Zhang
support both direct and indirect accessor in unified helper functions. v2: Retire indirect mmio access via mm_index/data Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Kevin Wang <kevin1.wang@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-10-01drm/amdgpu: add helper function for indirect reg access (v3)Hawking Zhang
Add helper function in order to remove RREG32/WREG32 in current pcie_rreg/wreg function for soc15 and onwards adapters. PCIE_INDEX/DATA pairs are used to access regsiters outside of mmio bar in the helper functions. The new helper functions help remove the recursion of amdgpu_mm_rreg/wreg from pcie_rreg/wreg and provide the oppotunity to centralize direct and indirect access in a single function. v2: Fixed typo and refine the comments v3: Remove unnecessary volatile local variable Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Kevin Wang <kevin1.wang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-09-30drm/amdgpu: use function pointer for gfxhub functionsOak Zeng
gfxhub functions are now called from function pointers, instead of from asic-specific functions. Signed-off-by: Oak Zeng <Oak.Zeng@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-09-15drm/amdgpu: Fix consecutive DPC recovery failures.Andrey Grodzovsky
Cache the PCI state on boot and before each case where we might loose it. v2: Add pci_restore_state while caching the PCI state to avoid breaking PCI core logic for stuff like suspend/resume. v3: Extract pci_restore_state from amdgpu_device_cache_pci_state to avoid superflous restores during GPU resets and suspend/resumes. v4: Style fixes. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-09-15drm/amdgpu: Avoid accessing HW when suspending SW stateAndrey Grodzovsky
At this point the ASIC is already post reset by the HW/PSP so the HW not in proper state to be configured for suspension, some blocks might be even gated and so best is to avoid touching it. v2: Rename in_dpc to more meaningful name Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-09-15drm/amdgpu: Implement DPC recoveryAndrey Grodzovsky
Add PCI Downstream Port Containment (DPC) with basic recovery functionality v2: remove pci_save_state to avoid breaking suspend/resume v3: Fix style comments v4: Improve description. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-26drm/amdkfd: fix set kfd node ras properties valueStanley.Yang
The ctx->features are new RAS implementation which is only available for Vega20 and onwards, it is not available for vega10, vega10 should follow legacy ECC implementation. Changed from V1: wrap function to initialize kfd node properties Changed from V2: remove wrap function and SDMA SRAM ECC check Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-26drm/amdgpu: add an asic callback for pre asic initAlex Deucher
This callback can be used by asics that need to do something special prior to calling atom asic init. Acked-by: Nirmoy Das <nirmoy.das@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-26drm/amdgpu: Embed drm_device into amdgpu_device (v3)Luben Tuikov
a) Embed struct drm_device into struct amdgpu_device. b) Modify the inline-f drm_to_adev() accordingly. c) Modify the inline-f adev_to_drm() accordingly. d) Eliminate the use of drm_device.dev_private, in amdgpu. e) Switch from using drm_dev_alloc() to drm_dev_init(). f) Add a DRM driver release function, which frees the container amdgpu_device after all krefs on the contained drm_device have been released. v2: Split out adding adev_to_drm() into its own patch (previous commit), making this patch more succinct and clear. More detailed commit description. v3: squash in fix to call drmm_add_final_kfree() to avoid a warning. Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-24drm/amdgpu: Get DRM dev from adev by inline-fLuben Tuikov
Add a static inline adev_to_drm() to obtain the DRM device pointer from an amdgpu_device pointer. Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-24drm/amdgpu: drm_device to amdgpu_device by inline-f (v2)Luben Tuikov
Get the amdgpu_device from the DRM device by use of an inline function, drm_to_adev(). The inline function resolves a pointer to struct drm_device to a pointer to struct amdgpu_device. v2: Use a typed visible static inline function instead of an invisible macro. Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-24drm/amdgpu: refine create and release logic of hive infoDennis Li
Change to dynamically create and release hive info object, which help driver support more hives in the future. v2: Change to save hive object pointer in adev, to avoid locking xgmi_mutex every time when calling amdgpu_get_xgmi_hive. v3: 1. Change type of hive object pointer in adev from void* to amdgpu_hive_info*. 2. remove unnecessary variable initialization. Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Dennis Li <Dennis.Li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-24drm/amdgpu: change reset lock from mutex to rw_semaphoreDennis Li
clients don't need reset-lock for synchronization when no GPU recovery. v2: change to return the return value of down_read_killable. v3: if GPU recovery begin, VF ignore FLR notification. Reviewed-by: Monk Liu <monk.liu@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Dennis Li <Dennis.Li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-24drm/amdgpu: refine codes to avoid reentering GPU recoveryDennis Li
if other threads have holden the reset lock, recovery will fail to try_lock. Therefore we introduce atomic hive->in_reset and adev->in_gpu_reset, to avoid reentering GPU recovery. v2: drop "? true : false" in the definition of amdgpu_in_reset Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Dennis Li <Dennis.Li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-18drm/amdgpu: Fix repeatly flr issuejqdeng
Only for no job running test case need to do recover in flr notification. For having job in mirror list, then let guest driver to hit job timeout, and then do recover. Signed-off-by: jqdeng <Emily.Deng@amd.com> Acked-by: Nirmoy Das <nirmoy.das@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-14drm/amdgpu: revert "fix system hang issue during GPU reset"Christian König
The whole approach wasn't thought through till the end. We already had a reset lock like this in the past and it caused the same problems like this one. Completely revert the patch for now and add individual trylock protection to the hardware access functions as necessary. This reverts commit df9c8d1aa278c435c30a69b8f2418b4a52fcb929. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-06drm/amdkfd: option to disable system mem limitPhilip Yang
If multiple process share system memory through /dev/shm, KFD allocate memory should not fail if it reaches the system memory limit because one copy of physical system memory are shared by multiple process. Add module parameter no_system_mem_limit to provide user option to disable system memory limit check at runtime using sysfs or during driver module init using kernel boot argument. By default the system memory limit is on. Print out debug message to warn user if KFD allocate memory failed because system memory reaches limit. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-04drm/amdgpu: move vram usage by vbios to mman (v2)Alex Deucher
It's related to the memory manager so move it there. v2: inline the structure Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-04drm/amdgpu: move IP discovery data to mmanAlex Deucher
It's related to the memory manager so move it there. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-04drm/amdgpu: move stolen vga bo from amdgpu to amdgpu.gmcAlex Deucher
Since that is where we store the other data related to the stolen vga memory. Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-04drm/amdgpu: use a define for the memory size of the vga emulatorAlex Deucher
Rather than open coding it everywhere. Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-04drm/amdgpu: introduce a new parameter to configure how many KCQ we want(v5)Monk Liu
what: the MQD's save and restore of KCQ (kernel compute queue) cost lots of clocks during world switch which impacts a lot to multi-VF performance how: introduce a paramter to control the number of KCQ to avoid performance drop if there is no kernel compute queue needed notes: this paramter only affects gfx 8/9/10 v2: refine namings v3: choose queues for each ring to that try best to cross pipes evenly. v4: fix indentation some cleanupsin the gfx_compute_queue_acquire() v5: further fix on indentations more cleanupsin gfx_compute_queue_acquire() TODO: in the future we will let hypervisor driver to set this paramter automatically thus no need for user to configure it through modprobe in virtual machine Signed-off-by: Monk Liu <Monk.Liu@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-04drm/amdgpu: add bad page count threshold in module parameter(v3)Guchun Chen
bad_page_threshold could be configured to enable/disable the associated bad page retirement feature in RAS. When it's -1, ras will use typical bad page failure value to handle bad page retirement. When it's 0, disable bad page retirement, and no bad page will be recorded and saved. For other valid value, driver will use this manual value as the threshold value of totoal bad pages. v2: correct documentation of this parameter. v3: remove confused statement in documentation. Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-07-27drm/amdgpu: fix system hang issue during GPU resetDennis Li
when GPU hang, driver has multi-paths to enter amdgpu_device_gpu_recover, the atomic adev->in_gpu_reset and hive->in_reset are used to avoid re-entering GPU recovery. During GPU reset and resume, it is unsafe that other threads access GPU, which maybe cause GPU reset failed. Therefore the new rw_semaphore adev->reset_sem is introduced, which protect GPU from being accessed by external threads during recovery. v2: 1. add rwlock for some ioctls, debugfs and file-close function. 2. change to use dqm->is_resetting and dqm_lock for protection in kfd driver. 3. remove try_lock and change adev->in_gpu_reset as atomic, to avoid re-enter GPU recovery for the same GPU hang. v3: 1. change back to use adev->reset_sem to protect kfd callback functions, because dqm_lock couldn't protect all codes, for example: free_mqd must be called outside of dqm_lock; [ 1230.176199] Hardware name: Supermicro SYS-7049GP-TRT/X11DPG-QT, BIOS 3.1 05/23/2019 [ 1230.177221] Call Trace: [ 1230.178249] dump_stack+0x98/0xd5 [ 1230.179443] amdgpu_virt_kiq_reg_write_reg_wait+0x181/0x190 [amdgpu] [ 1230.180673] gmc_v9_0_flush_gpu_tlb+0xcc/0x310 [amdgpu] [ 1230.181882] amdgpu_gart_unbind+0xa9/0xe0 [amdgpu] [ 1230.183098] amdgpu_ttm_backend_unbind+0x46/0x180 [amdgpu] [ 1230.184239] ? ttm_bo_put+0x171/0x5f0 [ttm] [ 1230.185394] ttm_tt_unbind+0x21/0x40 [ttm] [ 1230.186558] ttm_tt_destroy.part.12+0x12/0x60 [ttm] [ 1230.187707] ttm_tt_destroy+0x13/0x20 [ttm] [ 1230.188832] ttm_bo_cleanup_memtype_use+0x36/0x80 [ttm] [ 1230.189979] ttm_bo_put+0x1be/0x5f0 [ttm] [ 1230.191230] amdgpu_bo_unref+0x1e/0x30 [amdgpu] [ 1230.192522] amdgpu_amdkfd_free_gtt_mem+0xaf/0x140 [amdgpu] [ 1230.193833] free_mqd+0x25/0x40 [amdgpu] [ 1230.195143] destroy_queue_cpsch+0x1a7/0x270 [amdgpu] [ 1230.196475] pqm_destroy_queue+0x105/0x260 [amdgpu] [ 1230.197819] kfd_ioctl_destroy_queue+0x37/0x70 [amdgpu] [ 1230.199154] kfd_ioctl+0x277/0x500 [amdgpu] [ 1230.200458] ? kfd_ioctl_get_clock_counters+0x60/0x60 [amdgpu] [ 1230.201656] ? tomoyo_file_ioctl+0x19/0x20 [ 1230.202831] ksys_ioctl+0x98/0xb0 [ 1230.204004] __x64_sys_ioctl+0x1a/0x20 [ 1230.205174] do_syscall_64+0x5f/0x250 [ 1230.206339] entry_SYSCALL_64_after_hwframe+0x49/0xbe 2. remove try_lock and introduce atomic hive->in_reset, to avoid re-enter GPU recovery. v4: 1. remove an unnecessary whitespace change in kfd_chardev.c 2. remove comment codes in amdgpu_device.c 3. add more detailed comment in commit message 4. define a wrap function amdgpu_in_reset v5: 1. Fix some style issues. Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Suggested-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Suggested-by: Felix Kuehling <Felix.Kuehling@amd.com> Suggested-by: Lijo Lazar <Lijo.Lazar@amd.com> Suggested-by: Luben Tukov <luben.tuikov@amd.com> Signed-off-by: Dennis Li <Dennis.Li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-07-15drm/amdgpu: add module parameter choose reset modeWenhui Sheng
Default value is auto, doesn't change original reset method logic. v2: change to use parameter reset_method v3: add warn msg if specified mode isn't supported Signed-off-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Wenhui Sheng <Wenhui.Sheng@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-07-02Revert "drm/amdgpu: support access regs outside of mmio bar"Hawking Zhang
This reverts commit 2eee0229f65e897134566888e5321bcb3af0df7a. Fallback to a stable base until we have a correct new one Signed-off-by:Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-07-02drm/amdgpu: SI support for VCE clock controlAlex Jivin
Port functionality from the Radeon driver to support VCE clock control. Signed-off-by: Alex Jivin <alex.jivin@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-07-01drm/amdkfd: Add eviction debug messagesFelix Kuehling
Use WARN to print messages with backtrace when evictions are triggered. This can help determine the root cause of evictions and help spot driver bugs triggering evictions unintentionally, or help with performance tuning by avoiding conditions that cause evictions in a specific workload. The messages are controlled by a new module parameter that can be changed at runtime: echo Y > /sys/module/amdgpu/parameters/debug_evictions echo N > /sys/module/amdgpu/parameters/debug_evictions Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-07-01drm/amdgpu: Fix a buffer overflow handling the serial numberDan Carpenter
The comments say that the serial number is a 16-digit HEX string so the buffer needs to be at least 17 characters to hold the NUL terminator. The other issue is that "size" returned from sprintf() is the number of characters before the NUL terminator so the memcpy() wasn't copying the terminator. The serial number needs to be NUL terminated so that it doesn't lead to a read overflow in amdgpu_device_get_serial_number(). Also it's just cleaner and faster to sprintf() directly to adev->serial[] instead of using a temporary buffer. Fixes: 81a16241114b ("drm/amdgpu: Add unique_id and serial_number for Arcturus v3") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Evan Quan <evan.quan@amd.com>
2020-07-01drm/amdgpu: remove unnecessary check for mem trainLikun Gao
a.Check whether mem train support when try to reserve related memory. b.Remove ASIC check and atom firmware table version check as the check of firmware capability is enough to achieve that purpose. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-29drm/amdgpu: added a sysfs interface for thermal throttling related V4Evan Quan
User can check and set the enablement of throttling logging and the interval between each logging. V2: simplify the sysfs interface(no string parsing) V3: add proper lock protection on updating throttling_logging_rs.interval V4: documentation cosmetic per Luben's suggestion Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-22drm/amdgpu: add apu flags (v2)Alex Deucher
Add some APU flags to simplify handling of different APU variants. It's easier to understand the special cases if we use names flags rather than checking device ids and silicon revisions. v2: rebase on latest code Acked-by: Evan Quan <evan.quan@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-21drm/amd/display: Add DC Debug mask to disable features for bringupHarry Wentland
[Why] At bringup we want to be able to disable various power features. [How] These features are already exposed as dc_debug_options and exercised on other OSes. Create a new dc_debug_mask module parameter and expose relevant bits, in particular * DC_DISABLE_PIPE_SPLIT * DC_DISABLE_STUTTER * DC_DISABLE_DSC * DC_DISABLE_CLOCK_GATING Signed-off-by: Harry Wentland <harry.wentland@amd.com> Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-18drm/amdgpu: Add autodump debugfs node for gpu reset v8Jiange Zhao
When GPU got timeout, it would notify an interested part of an opportunity to dump info before actual GPU reset. A usermode app would open 'autodump' node under debugfs system and poll() for readable/writable. When a GPU reset is due, amdgpu would notify usermode app through wait_queue_head and give it 10 minutes to dump info. After usermode app has done its work, this 'autodump' node is closed. On node closure, amdgpu gets to know the dump is done through the completion that is triggered in release(). There is no write or read callback because necessary info can be obtained through dmesg and umr. Messages back and forth between usermode app and amdgpu are unnecessary. v2: (1) changed 'registered' to 'app_listening' (2) add a mutex in open() to prevent race condition v3 (chk): grab the reset lock to avoid race in autodump_open, rename debugfs file to amdgpu_autodump, provide autodump_read as well, style and code cleanups v4: add 'bool app_listening' to differentiate situations, so that the node can be reopened; also, there is no need to wait for completion when no app is waiting for a dump. v5: change 'bool app_listening' to 'enum amdgpu_autodump_state' add 'app_state_mutex' for race conditions: (1)Only 1 user can open this file node (2)wait_dump() can only take effect after poll() executed. (3)eliminated the race condition between release() and wait_dump() v6: removed 'enum amdgpu_autodump_state' and 'app_state_mutex' removed state checking in amdgpu_debugfs_wait_dump Improve on top of version 3 so that the node can be reopened. v7: move reinit_completion into open() so that only one user can open it. v8: remove complete_all() from amdgpu_debugfs_wait_dump(). Signed-off-by: Jiange Zhao <Jiange.Zhao@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-08drm/amdgpu: enable hibernate support on Navi1XEvan Quan
BACO is needed to support hibernate on Navi1X. Signed-off-by: Evan Quan <evan.quan@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-01drm/amdgpu: re-structue members for ip discoveryHawking Zhang
This is to prepare for initializing discovery tmr size per ASIC type Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-28drm/amdgpu: cleanup IB pool handling a bitChristian König
Fix the coding style, move and rename the definitions to better match what they are supposed to be doing. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-28drm/amdgpu: implement TMZ accessor (v3)Luben Tuikov
Implement an accessor of adev->tmz.enabled. Let not code around access it as "if (adev->tmz.enabled)" as the organization may change. Instead... Recruit "bool amdgpu_is_tmz(adev)" to return exactly this Boolean value. That is, this function is now an accessor of an already initialized and set adev and adev->tmz. Add "void amdgpu_gmc_tmz_set(adev)" to check and set adev->gmc.tmz_enabled at initialization time. After which one uses "bool amdgpu_is_tmz(adev)" to query whether adev supports TMZ. Also, remove circular header file include. v2: Remove amdgpu_tmz.[ch] as requested. v3: Move TMZ into GMC. Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>