Age | Commit message (Collapse) | Author |
|
The power-management related functions are unused since the
'commit 394216275c7d ("s390: remove broken hibernate / power management
support")'. Remove them from dasd drivers.
Signed-off-by: Vineeth Vijayan <vneethv@linux.ibm.com>
Reviewed-by: Stefan Haberland <sth@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
As part of removing the power management support from
s390 arch, remove PM callbacks from the scsi/zfcp driver.
Signed-off-by: Vineeth Vijayan <vneethv@linux.ibm.com>
Reviewed-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Instead of creating the sysfs attributes for the AP bus by hand,
describe them in .bus_groups and let the driver core handle it.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-my: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
clang W=1 warns about missing prototypes:
>> arch/s390/kernel/vdso64/getcpu.c:8:5: warning: no previous prototype for function '__s390_vdso_getcpu' [-Wmissing-prototypes]
int __s390_vdso_getcpu(unsigned *cpu, unsigned *node, struct getcpu_cache *unused)
^
Add a local header file in order to get rid of this warnings.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
With the addition of more complete SR-IOV support, we are recommending
to raise this limit for distributions to 512, as the previous default of
128 can easily be hit with just the VFs of a single PCI physical
function.
With at least one distribution now shipping with this, supporting
only one fourth as many PCI functions on a default upstream build may
lead to confusion and reduced testing of the higher limit so increase
the default to 512.
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Implement the previously removed getcpu vdso syscall by using the
TOD programmable field to pass the cpu number to user space.
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Verify on exit to user space that always
- the primary ASCE (cr1) is set to kernel ASCE
- the secondary ASCE (cr7) is set to user ASCE
If this is not the case: panic since something went terribly wrong.
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Create a region 3 page table which contains only invalid entries, and
use that via "s390_invalid_asce" instead of the kernel ASCE whenever
there is either
- no user address space available, e.g. during early startup
- as an intermediate ASCE when address spaces are switched
This makes sure that user space accesses in such situations are
guaranteed to fail.
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Remove set_fs support from s390. With doing this rework address space
handling and simplify it. As a result address spaces are now setup
like this:
CPU running in | %cr1 ASCE | %cr7 ASCE | %cr13 ASCE
----------------------------|-----------|-----------|-----------
user space | user | user | kernel
kernel, normal execution | kernel | user | kernel
kernel, kvm guest execution | gmap | user | kernel
To achieve this the getcpu vdso syscall is removed in order to avoid
secondary address mode and a separate vdso address space in for user
space. The getcpu vdso syscall will be implemented differently with a
subsequent patch.
The kernel accesses user space always via secondary address space.
This happens in different ways:
- with mvcos in home space mode and directly read/write to secondary
address space
- with mvcs/mvcp in primary space mode and copy from primary space to
secondary space or vice versa
- with e.g. cs in secondary space mode and access secondary space
Switching translation modes happens with sacf before and after
instructions which access user space, like before.
Lazy handling of control register reloading is removed in the hope to
make everything simpler, but at the cost of making kernel entry and
exit a bit slower. That is: on kernel entry the primary asce is always
changed to contain the kernel asce, and on kernel exit the primary
asce is changed again so it contains the user asce.
In kernel mode there is only one exception to the primary asce: when
kvm guests are executed the primary asce contains the gmap asce (which
describes the guest address space). The primary asce is reset to
kernel asce whenever kvm guest execution is interrupted, so that this
doesn't has to be taken into account for any user space accesses.
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
* fixes:
s390: fix fpu restore in entry.S
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
We need to disable interrupts in load_fpu_regs(). Otherwise an
interrupt might come in after the registers are loaded, but before
CIF_FPU is cleared in load_fpu_regs(). When the interrupt returns,
CIF_FPU will be cleared and the registers will never be restored.
The entry.S code usually saves the interrupt state in __SF_EMPTY on the
stack when disabling/restoring interrupts. sie64a however saves the pointer
to the sie control block in __SF_SIE_CONTROL, which references the same
location. This is non-obvious to the reader. To avoid thrashing the sie
control block pointer in load_fpu_regs(), move the __SIE_* offsets eight
bytes after __SF_EMPTY on the stack.
Cc: <stable@vger.kernel.org> # 5.8
Fixes: 0b0ed657fe00 ("s390: remove critical section cleanup from entry.S")
Reported-by: Pierre Morel <pmorel@linux.ibm.com>
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
While allmodconfig and allyesconfig build for s390 there are also
various bots running compile tests with randconfig, where PCI is
disabled. This reveals that a lot of drivers should actually depend on
HAS_IOMEM.
Adding this to each device driver would be a never ending story,
therefore just disable COMPILE_TEST for s390.
The reasoning is more or less the same as described in
commit bc083a64b6c0 ("init/Kconfig: make COMPILE_TEST depend on !UML").
Reported-by: kernel test robot <lkp@intel.com>
Suggested-by: Arnd Bergmann <arnd@kernel.org>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Rename some variable and functions to better clarify
what they are and what they do.
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Instead of creating the sysfs attributes for the stp root_dev by hand,
pass them to subsys_system_register() as parameter.
This also ensures that the attributes are available when the KOBJ_ADD
event is raised.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Add kernel command line and last breaking event.
The kernel command line is taken from early_command_line and printed
only if kernel is not running as protected virtualization guest and
if it has been already initialized from the "COMMAND_LINE".
Linux version 5.10.0-rc3-22794-gecaa72788df0-dirty (gor@tuxmaker) #28 SMP PREEMPT Mon Nov 9 17:41:20 CET 2020
Kernel command line: audit_enable=0 audit=0 selinux=0 crashkernel=296M root=/dev/dasda1 dasd=ec5b
memblock=debug die
Kernel fault: interruption code 0005 ilc:2
PSW : 0000000180000000 0000000000012f92 (parse_boot_command_line+0x27a/0x46c)
R:0 T:0 IO:0 EX:0 Key:0 M:0 W:0 P:0 AS:0 CC:0 PM:0 RI:0 EA:3
GPRS: 0000000000000000 00ffffffffffffff 0000000000000000 000000000001a65c
000000000000bf60 0000000000000000 00000000000003c0 0000000000000000
0000000000000080 000000000002322d 000000007f29ef20 0000000000efd018
000000000311c000 0000000000010070 0000000000012f82 000000000000bea8
Call Trace:
(sp:000000000000bea8 [<000000000002016e>] 000000000002016e)
sp:000000000000bf18 [<0000000000012408>] startup_kernel+0x88/0x2fc
sp:000000000000bf60 [<00000000000100c4>] startup_normal+0xb0/0xb0
Last Breaking-Event-Address:
[<00000000000135ba>] strcmp+0x22/0x24
Reviewed-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Acked-by: Viktor Mihajlovski <mihajlov@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Decompressor works on a single statically allocated stack. Stacktrace
implementation with -mbackchain just takes few lines of code.
Linux version 5.10.0-rc3-22793-g0f84a355b776-dirty (gor@tuxmaker) #27 SMP PREEMPT Mon Nov 9 17:30:18 CET 2020
Kernel fault: interruption code 0005 ilc:2
PSW : 0000000180000000 0000000000012f92 (parse_boot_command_line+0x27a/0x46c)
R:0 T:0 IO:0 EX:0 Key:0 M:0 W:0 P:0 AS:0 CC:0 PM:0 RI:0 EA:3
GPRS: 0000000000000000 00ffffffffffffff 0000000000000000 000000000001a62c
000000000000bf60 0000000000000000 00000000000003c0 0000000000000000
0000000000000080 000000000002322d 000000007f29ef20 0000000000efd018
000000000311c000 0000000000010070 0000000000012f82 000000000000bea8
Call Trace:
(sp:000000000000bea8 [<000000000002016e>] 000000000002016e)
sp:000000000000bf18 [<0000000000012408>] startup_kernel+0x88/0x2fc
sp:000000000000bf60 [<00000000000100c4>] startup_normal+0xb0/0xb0
Reviewed-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Information printed by print_pgm_check_info() is crucial for
debugging decompressor problems. Printing instruction addresses is
better than nothing, but turns further debugging into tedious job of
figuring out which function those addresses correspond to.
This change adds simplistic symbols resolution support. And adds %pS
format specifier support to decompressor_printk().
Decompressor symbols list is extracted and sorted with
nm -n -S:
...
0000000000010000 0000000000000014 T startup
0000000000010014 00000000000000b0 t startup_normal
0000000000010180 00000000000000b2 t startup_kdump
...
Then functions are filtered and contracted to a form:
"10000 14 startup\0""10014 b0 startup_normal\0""10180 b2 startup_kdump\0"
...
Which makes it trivial to find beginning of an entry and names are 0
terminated, so could be used as is. Symbols are binary-searched.
To get symbols list with final addresses and then get it into the
decompressor's image the same trick as for kallsyms is used.
Decompressor's vmlinux is linked twice.
Symbols are stored in .decompressor.syms section, current size is about
2kb.
Reviewed-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Use SYM_CODE_* annotations for asm functions, so that function lengths
are recognized correctly.
Also currently the most part of startup is marked as startup_kdump. Move
misplaced startup_kdump where it belongs.
$ nm -n -S arch/s390/boot/compressed/vmlinux
Before:
0000000000010000 T startup
0000000000010010 T startup_kdump
After:
0000000000010000 0000000000000014 T startup
0000000000010014 00000000000000b0 t startup_normal
0000000000010180 00000000000000b2 t startup_kdump
Reviewed-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
The decompressor does not have any special debug means. Running the
kernel under qemu with gdb is helpful but tedious exercise if done
repeatedly. It is also not applicable to debugging under LPAR and z/VM.
One special thing which stands out is a working sclp_early_printk,
which could be used once the kernel switches to 64-bit addressing mode.
But sclp_early_printk does not provide any string formating capabilities.
Formatting and printing string without printk-alike function is a
not fun. The lack of printk-alike function means people would save up on
testing and introduce more bugs.
So, finally, provide decompressor_printk function, which fits on one
screen and trades features for simplicity.
It only supports "%s", "%x" and "%lx" specifiers and zero padding for
hex values.
Reviewed-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Currently the kernel minimal compiler requirement is gcc 4.9 or
clang 10.0.1.
* gcc -mhotpatch option is supported since 4.8.
* A combination of -pg -mrecord-mcount -mnop-mcount -mfentry flags is
supported since gcc 9 and since clang 10.
Drop support for old -pg function prologues. Which leaves binary
compatible -mhotpatch / -mnop-mcount -mfentry prologues in a form:
brcl 0,0
Which are also do not require initial nop optimization / conversion and
presence of _mcount symbol.
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Currently we have to consider too many different values which
in the end only affect identity mapping size. These are:
1. max_physmem_end - end of physical memory online or standby.
Always <= end of the last online memory block (get_mem_detect_end()).
2. CONFIG_MAX_PHYSMEM_BITS - the maximum size of physical memory the
kernel is able to support.
3. "mem=" kernel command line option which limits physical memory usage.
4. OLDMEM_BASE which is a kdump memory limit when the kernel is executed as
crash kernel.
5. "hsa" size which is a memory limit when the kernel is executed during
zfcp/nvme dump.
Through out kernel startup and run we juggle all those values at once
but that does not bring any amusement, only confusion and complexity.
Unify all those values to a single one we should really care, that is
our identity mapping size.
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Instead of creating the sysfs attributes for the prng devices by hand,
describe them in .groups and let the misdevice core handle it.
This also ensures that the attributes are available when the KOBJ_ADD
event is raised.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Fix typo in the kernel-doc markups
1. ccw driver -> ccw_driver
2. ccw_device_id_is_equal() -> ccw_dev_id_is_equal
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
[vneethv@linux.ibm.com: slight modification in the changelog]
Reviewed-by: Vineeth Vijayan <vneethv@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
System call and program check handler both use the system call exit
path when returning to previous context. However the program check
handler jumps right to the end of the system call exit path if the
previous context is kernel context.
This lead to the quite odd double disabling of interrupts in the
system call exit path introduced with commit ce9dfafe29be ("s390:
fix system call exit path").
To avoid that have a separate program check handler exit path if the
previous context is kernel context.
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
* fixes:
s390/cpum_sf.c: fix file permission for cpum_sfb_size
s390: update defconfigs
s390: fix system call exit path
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
As the number of cpus increases, the sccb response can exceed 4k for
read cpu and read scp info sclp commands. Hence, all cpu info entries
cant be embedded within a sccb response
Solution:
To overcome this limitation, extended sccb facility is provided by sclp.
1. Check if the extended sccb facility is installed.
2. If extended sccb is installed, perform the read scp and read cpu
command considering a max sccb length of three page size. This max
length is based on factors like max cpus, sccb header.
3. If extended sccb is not installed, perform the read scp and read cpu
sclp command considering a max sccb length of one page size.
Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
For extended sccb support, sccb size could be up to 3 pages. Hence avoid
copy of sclp_info_sccb.
Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
sclp early read cpu info is used to detect the number of configured
cpus, which is utilized by smp_detect_cpus() in early startup.
* For read cpu info, the sccb block should be below 2gb.
* smp_detect_cpus() utilizes read cpu info early, but after memblock
initialization. Thus use memblock_allow_low() instead.
* Avoid copy of sclp_core_info structure.
* sclp_early_init_core_info(), sclp_early_core_info and
sclp_early_core_info_valid initdata are no longer required.
* smp_get_core_info() is called only once during early stage. Hence for
early sclp_get_core_info(), directly call read cpu command. No need to
maintain sclp_early_core_info_valid.
Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
when we're missing the necessary machine facilities zPCI can
not function. Until now it would silently fail to be initialized,
add an informational print.
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
This file is installed by the s390 CPU Measurement sampling
facility device driver to export supported minimum and
maximum sample buffer sizes.
This file is read by lscpumf tool to display the details
of the device driver capabilities. The lscpumf tool might
be invoked by a non-root user. In this case it does not
print anything because the file contents can not be read.
Fix this by allowing read access for all users. Reading
the file contents is ok, changing the file contents is
left to the root user only.
For further reference and details see:
[1] https://github.com/ibm-s390-tools/s390-tools/issues/97
Fixes: 69f239ed335a ("s390/cpum_sf: Dynamically extend the sampling buffer if overflows occur")
Cc: <stable@vger.kernel.org> # 3.14
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
The zcrypt api provides a new function to wait until the zcrypt
api is operational:
int zcrypt_wait_api_operational(void);
The AP bus scan and the binding of ap devices to device drivers is
an asynchronous job. This function waits until these initial jobs
are done and so the zcrypt api should be ready to serve crypto
requests - if there are resources available. The function uses an
internal timeout of 60s. The very first caller will either wait for
ap bus bindings complete or the timeout happens. This state will be
remembered for further callers which will only be blocked until a
decision is made (timeout or bindings complete).
Reviewed-by: Ingo Franzki <ifranzki@linux.ibm.com>
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
This patch adds notifications to userspace for two important
conditions of the ap bus:
I) Initial ap bus scan done. This indicates that the initial
scan of all the ap devices (cards, queues) is complete and
ap devices have been build up for all the hardware found.
This condition is signaled with
1) An ap bus change uevent send to userspace with an environment
key/value pair "INITSCAN=done":
# udevadm monitor -k -p
...
KERNEL[97.830919] change /devices/ap (ap)
ACTION=change
DEVPATH=/devices/ap
SUBSYSTEM=ap
INITSCAN=done
SEQNUM=10421
2) A sysfs attribute /sys/bus/ap/scans which shows the
number of completed ap bus scans done since bus init.
So a value of 1 or greater signals that the initial
ap bus scan is complete.
Note: The initial ap bus scan complete condition is fulfilled
and will be signaled even if there was no ap resource found.
II) APQN driver bindings complete. This indicates that all
APQNs have been bound to an zcrypt or alternate device
driver. Only with the help of an device driver an APQN
can be used for crypto load. So the binding complete
condition is the starting point for user space to be
sure all crypto resources on the ap bus are available
for use.
This condition is signaled with
1) An ap bus change uevent send to userspace with an environment
key/value pair "BINDINGS=complete":
# udevadm monitor -k -p
...
KERNEL[97.830975] change /devices/ap (ap)
ACTION=change
DEVPATH=/devices/ap
SUBSYSTEM=ap
BINDINGS=complete
SEQNUM=10422
2) A sysfs attribute /sys/bus/ap/bindings showing
"<nr of bound apqns>/<total nr of apqns> (complete)"
when all available apqns have been bound to device drivers, or
"<nr of bound apqns>/<total nr of apqns>"
when there are some apqns not bound to an device driver.
Note: The binding complete condition is also fulfilled, when
there are no apqns available to bind any device driver. In
this case the binding complete will be signaled AFTER init
scan is done.
Note: This condition may arise multiple times when after
initial scan modifications on the bindings take place. For
example a manual unbind of an APQN switches the binding
complete condition off. When at a later time the unbound APQNs
are bound with an device driver the binding is (again) complete
resulting in another uevent and marking the bindings sysfs
attribute with '(complete)'.
There is also a new function to be used within the kernel:
int ap_wait_init_apqn_bindings_complete(unsigned long timeout)
Interface to wait for the AP bus to have done one initial ap bus
scan and all detected APQNs have been bound to device drivers.
If these both conditions are not fulfilled, this function blocks
on a condition with wait_for_completion_interruptible_timeout().
If these both conditions are fulfilled (before the timeout hits)
the return value is 0. If the timeout (in jiffies) hits instead
-ETIME is returned. On failures negative return values are
returned to the caller. Please note that further unbind/bind
actions after initial binding complete is through do not cause this
function to block again.
Reviewed-by: Ingo Franzki <ifranzki@linux.ibm.com>
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
The s390-trng does provide 100% entropy. The quality value is supported
to be between 1 and 1024 and not 1..1000. Use 1024 to make this driver
the preferred one. If we ever have a better driver that has the same
quality but is faster we can change this again when merging the new
driver. No need to be conservative.
This makes sure that the hw variant is preferred over things like
virtio-rng, where the hypervisor has a potential to be misconfigured
and thus should have a slightly lower confidence.
Cc: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Way back it was a reasonable assumptions that iomem mappings never
change the pfn range they point at. But this has changed:
- gpu drivers dynamically manage their memory nowadays, invalidating
ptes with unmap_mapping_range when buffers get moved
- contiguous dma allocations have moved from dedicated carvetouts to
cma regions. This means if we miss the unmap the pfn might contain
pagecache or anon memory (well anything allocated with GFP_MOVEABLE)
- even /dev/mem now invalidates mappings when the kernel requests that
iomem region when CONFIG_IO_STRICT_DEVMEM is set, see
commit 3234ac664a87 ("/dev/mem: Revoke mappings when a driver claims the
region")
Accessing pfns obtained from ptes without holding all the locks is
therefore no longer a good idea. Fix this.
Since zpci_memcpy_from|toio seems to not do anything nefarious with
locks we just need to open code get_pfn and follow_pfn and make sure
we drop the locks only after we're done. The write function also needs
the copy_from_user move, since we can't take userspace faults while
holding the mmap sem.
Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: linux-s390@vger.kernel.org
Cc: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
And move it earlier in the decompressor.
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Also correct rounding downs in estimation calculations.
Reviewed-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
It is relying on _REGION1_SHIFT / _REGION2_SHIFT values which come from
asm/pgtable.h, so include it.
Reviewed-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Kasan early code is only working on init_mm, remove unneeded pgd
parameter from kasan_copy_shadow and rename it to
kasan_copy_shadow_mapping.
Reviewed-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Kasan has nothing to do with vmemmap, strip vmemmap from function names
to avoid confusing people.
Reviewed-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Fixes the following warning with CONFIG_KERNEL_UNCOMPRESSED=y
arch/s390/boot/compressed/decompressor.h:6:46: warning: non-void function
does not return a value [-Wreturn-type]
static inline void *decompress_kernel(void) {}
^
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
To make sure that the vmalloc area size is for almost all cases large
enough let it depend on the (potential) physical memory size. There is
still the possibility to override this with the vmalloc kernel command
line parameter.
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
We've seen several occurences in the past where the default vmalloc
size of 128GB is not sufficient. Therefore extend the default size.
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Since commit 29d37e5b82f3 ("s390/protvirt: add ultravisor initialization")
vmax is adjusted to the ultravisor secure storage limit. This limit is
currently applied when 4-level paging is used. Later vmax is also used
to align vmemmap address to the top region table entry border. When vmax
is set to the ultravisor secure storage limit this is no longer the case.
Instead of changing vmax, make only MODULES_END be affected by the
secure storage limit, so that vmax stays intact for further vmemmap
address alignment.
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Compiling the kernel with Kasan disables automatic 3-level vs 4-level
kernel space paging selection, because the shadow memory offset has
to be known at compile time and there is no such offset which would be
acceptable for both 3 and 4-level paging. Instead S390_4_LEVEL_PAGING
option was introduced which allowed to pick how many paging levels to
use under Kasan.
With the introduction of protected virtualization, kernel memory layout
may be affected due to ultravisor secure storage limit. This adds
additional complexity into how memory layout would look like in
combination with Kasan predefined shadow memory offsets. To simplify
this make Kasan 4-level paging default and remove Kasan 3-level paging
support.
Suggested-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
s390_base_ext_handler_fn haven't been used since its introduction in
commit ab14de6c37fa ("[S390] Convert memory detection into C code.").
s390_base_ext_handler itself is currently falsely storing 16 registers
at __LC_SAVE_AREA_ASYNC rewriting several following lowcore values:
cpu_flags, return_psw, return_mcck_psw, sync_enter_timer and
async_enter_timer.
Besides that s390_base_ext_handler itself is only potentially hiding
EXT interrupts which should not have happen in the first place. Any
piece of code which requires EXT interrupts before fully functional
ext_int_handler is enabled has to do it on its own, like this is done
by sclp_early_cmd() which is doing EXT interrupts handling synchronously
in sclp_early_wait_irq().
With s390_base_ext_handler removed unexpected EXT interrupt leads
to disabled wait with the address 0x1b0 (__LC_EXT_NEW_PSW), which is
currently setup in the decompressor.
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Currently udelay relies on working EXT interrupts handler, which is not
the case during early startup. In such cases udelay_simple() has to be
used instead.
To avoid mistakes of calling udelay too early, which could happen from
the common code as well - make udelay work for the early code by
introducing static branch and redirecting all udelay calls to
udelay_simple until EXT interrupts handler is fully initialized and
async stack is allocated.
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Set io/ext handlers to disabled wait in the initial lowcore, so that they
are effective right from the kernel start, when a boot method used does
not rewrite this part of the lowcore for its own needs (i.e. kexec, z/vm
ipl reader boot, qemu direct boot, load from removable media or server).
When the kernel is loaded by zipl, scsi loader or qemu loader, some or
all of the io/ext/pgm handlers addresses might be rewritten. Rewrite them
to initial values again as early as possible.
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|