summaryrefslogtreecommitdiffstats
path: root/Documentation/s390/debugging390.rst
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/s390/debugging390.rst')
-rw-r--r--Documentation/s390/debugging390.rst2613
1 files changed, 2613 insertions, 0 deletions
diff --git a/Documentation/s390/debugging390.rst b/Documentation/s390/debugging390.rst
new file mode 100644
index 000000000000..d49305fd5e1a
--- /dev/null
+++ b/Documentation/s390/debugging390.rst
@@ -0,0 +1,2613 @@
+=============================================
+Debugging on Linux for s/390 & z/Architecture
+=============================================
+
+Denis Joseph Barrow (djbarrow@de.ibm.com,barrow_dj@yahoo.com)
+
+Copyright (C) 2000-2001 IBM Deutschland Entwicklung GmbH, IBM Corporation
+
+.. Best viewed with fixed width fonts
+
+Overview of Document:
+=====================
+This document is intended to give a good overview of how to debug Linux for
+s/390 and z/Architecture. It is not intended as a complete reference and not a
+tutorial on the fundamentals of C & assembly. It doesn't go into
+390 IO in any detail. It is intended to complement the documents in the
+reference section below & any other worthwhile references you get.
+
+It is intended like the Enterprise Systems Architecture/390 Reference Summary
+to be printed out & used as a quick cheat sheet self help style reference when
+problems occur.
+
+.. Contents
+ ========
+ Register Set
+ Address Spaces on Intel Linux
+ Address Spaces on Linux for s/390 & z/Architecture
+ The Linux for s/390 & z/Architecture Kernel Task Structure
+ Register Usage & Stackframes on Linux for s/390 & z/Architecture
+ A sample program with comments
+ Compiling programs for debugging on Linux for s/390 & z/Architecture
+ Debugging under VM
+ s/390 & z/Architecture IO Overview
+ Debugging IO on s/390 & z/Architecture under VM
+ GDB on s/390 & z/Architecture
+ Stack chaining in gdb by hand
+ Examining core dumps
+ ldd
+ Debugging modules
+ The proc file system
+ SysRq
+ References
+ Special Thanks
+
+Register Set
+============
+The current architectures have the following registers.
+
+16 General propose registers, 32 bit on s/390 and 64 bit on z/Architecture,
+r0-r15 (or gpr0-gpr15), used for arithmetic and addressing.
+
+16 Control registers, 32 bit on s/390 and 64 bit on z/Architecture, cr0-cr15,
+kernel usage only, used for memory management, interrupt control, debugging
+control etc.
+
+16 Access registers (ar0-ar15), 32 bit on both s/390 and z/Architecture,
+normally not used by normal programs but potentially could be used as
+temporary storage. These registers have a 1:1 association with general
+purpose registers and are designed to be used in the so-called access
+register mode to select different address spaces.
+Access register 0 (and access register 1 on z/Architecture, which needs a
+64 bit pointer) is currently used by the pthread library as a pointer to
+the current running threads private area.
+
+16 64-bit floating point registers (fp0-fp15 ) IEEE & HFP floating
+point format compliant on G5 upwards & a Floating point control reg (FPC)
+
+4 64-bit registers (fp0,fp2,fp4 & fp6) HFP only on older machines.
+
+Note:
+ Linux (currently) always uses IEEE & emulates G5 IEEE format on older
+ machines, ( provided the kernel is configured for this ).
+
+
+The PSW is the most important register on the machine it
+is 64 bit on s/390 & 128 bit on z/Architecture & serves the roles of
+a program counter (pc), condition code register,memory space designator.
+In IBM standard notation I am counting bit 0 as the MSB.
+It has several advantages over a normal program counter
+in that you can change address translation & program counter
+in a single instruction. To change address translation,
+e.g. switching address translation off requires that you
+have a logical=physical mapping for the address you are
+currently running at.
+
++-------------------------+-------------------------------------------------+
+| Bit | |
++--------+----------------+ Value |
+| s/390 | z/Architecture | |
++========+================+=================================================+
+| 0 | 0 | Reserved (must be 0) otherwise specification |
+| | | exception occurs. |
++--------+----------------+-------------------------------------------------+
+| 1 | 1 | Program Event Recording 1 PER enabled, |
+| | | PER is used to facilitate debugging e.g. |
+| | | single stepping. |
++--------+----------------+-------------------------------------------------+
+| 2-4 | 2-4 | Reserved (must be 0). |
++--------+----------------+-------------------------------------------------+
+| 5 | 5 | Dynamic address translation 1=DAT on. |
++--------+----------------+-------------------------------------------------+
+| 6 | 6 | Input/Output interrupt Mask |
++--------+----------------+-------------------------------------------------+
+| 7 | 7 | External interrupt Mask used primarily for |
+| | | interprocessor signalling and clock interrupts. |
++--------+----------------+-------------------------------------------------+
+| 8-11 | 8-11 | PSW Key used for complex memory protection |
+| | | mechanism (not used under linux) |
++--------+----------------+-------------------------------------------------+
+| 12 | 12 | 1 on s/390 0 on z/Architecture |
++--------+----------------+-------------------------------------------------+
+| 13 | 13 | Machine Check Mask 1=enable machine check |
+| | | interrupts |
++--------+----------------+-------------------------------------------------+
+| 14 | 14 | Wait State. Set this to 1 to stop the processor |
+| | | except for interrupts and give time to other |
+| | | LPARS. Used in CPU idle in the kernel to |
+| | | increase overall usage of processor resources. |
++--------+----------------+-------------------------------------------------+
+| 15 | 15 | Problem state (if set to 1 certain instructions |
+| | | are disabled). All linux user programs run with |
+| | | this bit 1 (useful info for debugging under VM).|
++--------+----------------+-------------------------------------------------+
+| 16-17 | 16-17 | Address Space Control |
+| | | |
+| | | 00 Primary Space Mode: |
+| | | |
+| | | The register CR1 contains the primary |
+| | | address-space control element (PASCE), which |
+| | | points to the primary space region/segment |
+| | | table origin. |
+| | | |
+| | | 01 Access register mode |
+| | | |
+| | | 10 Secondary Space Mode: |
+| | | |
+| | | The register CR7 contains the secondary |
+| | | address-space control element (SASCE), which |
+| | | points to the secondary space region or |
+| | | segment table origin. |
+| | | |
+| | | 11 Home Space Mode: |
+| | | |
+| | | The register CR13 contains the home space |
+| | | address-space control element (HASCE), which |
+| | | points to the home space region/segment |
+| | | table origin. |
+| | | |
+| | | See "Address Spaces on Linux for s/390 & |
+| | | z/Architecture" below for more information |
+| | | about address space usage in Linux. |
++--------+----------------+-------------------------------------------------+
+| 18-19 | 18-19 | Condition codes (CC) |
++--------+----------------+-------------------------------------------------+
+| 20 | 20 | Fixed point overflow mask if 1=FPU exceptions |
+| | | for this event occur (normally 0) |
++--------+----------------+-------------------------------------------------+
+| 21 | 21 | Decimal overflow mask if 1=FPU exceptions for |
+| | | this event occur (normally 0) |
++--------+----------------+-------------------------------------------------+
+| 22 | 22 | Exponent underflow mask if 1=FPU exceptions |
+| | | for this event occur (normally 0) |
++--------+----------------+-------------------------------------------------+
+| 23 | 23 | Significance Mask if 1=FPU exceptions for this |
+| | | event occur (normally 0) |
++--------+----------------+-------------------------------------------------+
+| 24-31 | 24-30 | Reserved Must be 0. |
+| +----------------+-------------------------------------------------+
+| | 31 | Extended Addressing Mode |
+| +----------------+-------------------------------------------------+
+| | 32 | Basic Addressing Mode |
+| | | |
+| | | Used to set addressing mode |
+| | | |
+| | | +---------+----------+----------+ |
+| | | | PSW 31 | PSW 32 | | |
+| | | +---------+----------+----------+ |
+| | | | 0 | 0 | 24 bit | |
+| | | +---------+----------+----------+ |
+| | | | 0 | 1 | 31 bit | |
+| | | +---------+----------+----------+ |
+| | | | 1 | 1 | 64 bit | |
+| | | +---------+----------+----------+ |
++--------+----------------+-------------------------------------------------+
+| 32 | | 1=31 bit addressing mode 0=24 bit addressing |
+| | | mode (for backward compatibility), linux |
+| | | always runs with this bit set to 1 |
++--------+----------------+-------------------------------------------------+
+| 33-64 | | Instruction address. |
+| +----------------+-------------------------------------------------+
+| | 33-63 | Reserved must be 0 |
+| +----------------+-------------------------------------------------+
+| | 64-127 | Address |
+| | | |
+| | | - In 24 bits mode bits 64-103=0 bits 104-127 |
+| | | Address |
+| | | - In 31 bits mode bits 64-96=0 bits 97-127 |
+| | | Address |
+| | | |
+| | | Note: |
+| | | unlike 31 bit mode on s/390 bit 96 must be |
+| | | zero when loading the address with LPSWE |
+| | | otherwise a specification exception occurs, |
+| | | LPSW is fully backward compatible. |
++--------+----------------+-------------------------------------------------+
+
+Prefix Page(s)
+--------------
+This per cpu memory area is too intimately tied to the processor not to mention.
+It exists between the real addresses 0-4096 on s/390 and between 0-8192 on
+z/Architecture and is exchanged with one page on s/390 or two pages on
+z/Architecture in absolute storage by the set prefix instruction during Linux
+startup.
+
+This page is mapped to a different prefix for each processor in an SMP
+configuration (assuming the OS designer is sane of course).
+
+Bytes 0-512 (200 hex) on s/390 and 0-512, 4096-4544, 4604-5119 currently on
+z/Architecture are used by the processor itself for holding such information
+as exception indications and entry points for exceptions.
+
+Bytes after 0xc00 hex are used by linux for per processor globals on s/390 and
+z/Architecture (there is a gap on z/Architecture currently between 0xc00 and
+0x1000, too, which is used by Linux).
+
+The closest thing to this on traditional architectures is the interrupt
+vector table. This is a good thing & does simplify some of the kernel coding
+however it means that we now cannot catch stray NULL pointers in the
+kernel without hard coded checks.
+
+
+
+Address Spaces on Intel Linux
+=============================
+
+The traditional Intel Linux is approximately mapped as follows forgive
+the ascii art::
+
+ 0xFFFFFFFF 4GB Himem *****************
+ * *
+ * Kernel Space *
+ * *
+ ***************** ****************
+ User Space Himem * User Stack * * *
+ (typically 0xC0000000 3GB ) ***************** * *
+ * Shared Libs * * Next Process *
+ ***************** * to *
+ * * <== * Run * <==
+ * User Program * * *
+ * Data BSS * * *
+ * Text * * *
+ * Sections * * *
+ 0x00000000 ***************** ****************
+
+Now it is easy to see that on Intel it is quite easy to recognise a kernel
+address as being one greater than user space himem (in this case 0xC0000000),
+and addresses of less than this are the ones in the current running program on
+this processor (if an smp box).
+
+If using the virtual machine ( VM ) as a debugger it is quite difficult to
+know which user process is running as the address space you are looking at
+could be from any process in the run queue.
+
+The limitation of Intels addressing technique is that the linux
+kernel uses a very simple real address to virtual addressing technique
+of Real Address=Virtual Address-User Space Himem.
+This means that on Intel the kernel linux can typically only address
+Himem=0xFFFFFFFF-0xC0000000=1GB & this is all the RAM these machines
+can typically use.
+
+They can lower User Himem to 2GB or lower & thus be
+able to use 2GB of RAM however this shrinks the maximum size
+of User Space from 3GB to 2GB they have a no win limit of 4GB unless
+they go to 64 Bit.
+
+
+On 390 our limitations & strengths make us slightly different.
+For backward compatibility we are only allowed use 31 bits (2GB)
+of our 32 bit addresses, however, we use entirely separate address
+spaces for the user & kernel.
+
+This means we can support 2GB of non Extended RAM on s/390, & more
+with the Extended memory management swap device &
+currently 4TB of physical memory currently on z/Architecture.
+
+
+Address Spaces on Linux for s/390 & z/Architecture
+==================================================
+
+Our addressing scheme is basically as follows::
+
+ Primary Space Home Space
+ Himem 0x7fffffff 2GB on s/390 ***************** ****************
+ currently 0x3ffffffffff (2^42)-1 * User Stack * * *
+ on z/Architecture. ***************** * *
+ * Shared Libs * * *
+ ***************** * *
+ * * * Kernel *
+ * User Program * * *
+ * Data BSS * * *
+ * Text * * *
+ * Sections * * *
+ 0x00000000 ***************** ****************
+
+This also means that we need to look at the PSW problem state bit and the
+addressing mode to decide whether we are looking at user or kernel space.
+
+User space runs in primary address mode (or access register mode within
+the vdso code).
+
+The kernel usually also runs in home space mode, however when accessing
+user space the kernel switches to primary or secondary address mode if
+the mvcos instruction is not available or if a compare-and-swap (futex)
+instruction on a user space address is performed.
+
+When also looking at the ASCE control registers, this means:
+
+User space:
+
+- runs in primary or access register mode
+- cr1 contains the user asce
+- cr7 contains the user asce
+- cr13 contains the kernel asce
+
+Kernel space:
+
+- runs in home space mode
+- cr1 contains the user or kernel asce
+
+ - the kernel asce is loaded when a uaccess requires primary or
+ secondary address mode
+
+- cr7 contains the user or kernel asce, (changed with set_fs())
+- cr13 contains the kernel asce
+
+In case of uaccess the kernel changes to:
+
+- primary space mode in case of a uaccess (copy_to_user) and uses
+ e.g. the mvcp instruction to access user space. However the kernel
+ will stay in home space mode if the mvcos instruction is available
+- secondary space mode in case of futex atomic operations, so that the
+ instructions come from primary address space and data from secondary
+ space
+
+In case of KVM, the kernel runs in home space mode, but cr1 gets switched
+to contain the gmap asce before the SIE instruction gets executed. When
+the SIE instruction is finished, cr1 will be switched back to contain the
+user asce.
+
+
+Virtual Addresses on s/390 & z/Architecture
+===========================================
+
+A virtual address on s/390 is made up of 3 parts
+The SX (segment index, roughly corresponding to the PGD & PMD in Linux
+terminology) being bits 1-11.
+
+The PX (page index, corresponding to the page table entry (pte) in Linux
+terminology) being bits 12-19.
+
+The remaining bits BX (the byte index are the offset in the page )
+i.e. bits 20 to 31.
+
+On z/Architecture in linux we currently make up an address from 4 parts.
+
+- The region index bits (RX) 0-32 we currently use bits 22-32
+- The segment index (SX) being bits 33-43
+- The page index (PX) being bits 44-51
+- The byte index (BX) being bits 52-63
+
+Notes:
+ 1) s/390 has no PMD so the PMD is really the PGD also.
+ A lot of this stuff is defined in pgtable.h.
+
+ 2) Also seeing as s/390's page indexes are only 1k in size
+ (bits 12-19 x 4 bytes per pte ) we use 1 ( page 4k )
+ to make the best use of memory by updating 4 segment indices
+ entries each time we mess with a PMD & use offsets
+ 0,1024,2048 & 3072 in this page as for our segment indexes.
+ On z/Architecture our page indexes are now 2k in size
+ ( bits 12-19 x 8 bytes per pte ) we do a similar trick
+ but only mess with 2 segment indices each time we mess with
+ a PMD.
+
+ 3) As z/Architecture supports up to a massive 5-level page table lookup we
+ can only use 3 currently on Linux ( as this is all the generic kernel
+ currently supports ) however this may change in future
+ this allows us to access ( according to my sums )
+ 4TB of virtual storage per process i.e.
+ 4096*512(PTES)*1024(PMDS)*2048(PGD) = 4398046511104 bytes,
+ enough for another 2 or 3 of years I think :-).
+ to do this we use a region-third-table designation type in
+ our address space control registers.
+
+
+The Linux for s/390 & z/Architecture Kernel Task Structure
+==========================================================
+Each process/thread under Linux for S390 has its own kernel task_struct
+defined in linux/include/linux/sched.h
+The S390 on initialisation & resuming of a process on a cpu sets
+the __LC_KERNEL_STACK variable in the spare prefix area for this cpu
+(which we use for per-processor globals).
+
+The kernel stack pointer is intimately tied with the task structure for
+each processor as follows::
+
+ s/390
+ ************************
+ * 1 page kernel stack *
+ * ( 4K ) *
+ ************************
+ * 1 page task_struct *
+ * ( 4K ) *
+ 8K aligned ************************
+
+ z/Architecture
+ ************************
+ * 2 page kernel stack *
+ * ( 8K ) *
+ ************************
+ * 2 page task_struct *
+ * ( 8K ) *
+ 16K aligned ************************
+
+What this means is that we don't need to dedicate any register or global
+variable to point to the current running process & can retrieve it with the
+following very simple construct for s/390 & one very similar for
+z/Architecture::
+
+ static inline struct task_struct * get_current(void)
+ {
+ struct task_struct *current;
+ __asm__("lhi %0,-8192\n\t"
+ "nr %0,15"
+ : "=r" (current) );
+ return current;
+ }
+
+i.e. just anding the current kernel stack pointer with the mask -8192.
+Thankfully because Linux doesn't have support for nested IO interrupts
+& our devices have large buffers can survive interrupts being shut for
+short amounts of time we don't need a separate stack for interrupts.
+
+
+
+
+Register Usage & Stackframes on Linux for s/390 & z/Architecture
+=================================================================
+Overview:
+---------
+This is the code that gcc produces at the top & the bottom of
+each function. It usually is fairly consistent & similar from
+function to function & if you know its layout you can probably
+make some headway in finding the ultimate cause of a problem
+after a crash without a source level debugger.
+
+Note: To follow stackframes requires a knowledge of C or Pascal &
+limited knowledge of one assembly language.
+
+It should be noted that there are some differences between the
+s/390 and z/Architecture stack layouts as the z/Architecture stack layout
+didn't have to maintain compatibility with older linkage formats.
+
+Glossary:
+---------
+alloca:
+ This is a built in compiler function for runtime allocation
+ of extra space on the callers stack which is obviously freed
+ up on function exit ( e.g. the caller may choose to allocate nothing
+ of a buffer of 4k if required for temporary purposes ), it generates
+ very efficient code ( a few cycles ) when compared to alternatives
+ like malloc.
+
+automatics:
+ These are local variables on the stack, i.e they aren't in registers &
+ they aren't static.
+
+back-chain:
+ This is a pointer to the stack pointer before entering a
+ framed functions ( see frameless function ) prologue got by
+ dereferencing the address of the current stack pointer,
+ i.e. got by accessing the 32 bit value at the stack pointers
+ current location.
+
+base-pointer:
+ This is a pointer to the back of the literal pool which
+ is an area just behind each procedure used to store constants
+ in each function.
+
+call-clobbered:
+ The caller probably needs to save these registers if there
+ is something of value in them, on the stack or elsewhere before making a
+ call to another procedure so that it can restore it later.
+
+epilogue:
+ The code generated by the compiler to return to the caller.
+
+frameless-function:
+ A frameless function in Linux for s390 & z/Architecture is one which doesn't
+ need more than the register save area (96 bytes on s/390, 160 on z/Architecture)
+ given to it by the caller.
+
+ A frameless function never:
+
+ 1) Sets up a back chain.
+ 2) Calls alloca.
+ 3) Calls other normal functions
+ 4) Has automatics.
+
+GOT-pointer:
+ This is a pointer to the global-offset-table in ELF
+ ( Executable Linkable Format, Linux'es most common executable format ),
+ all globals & shared library objects are found using this pointer.
+
+lazy-binding
+ ELF shared libraries are typically only loaded when routines in the shared
+ library are actually first called at runtime. This is lazy binding.
+
+procedure-linkage-table
+ This is a table found from the GOT which contains pointers to routines
+ in other shared libraries which can't be called to by easier means.
+
+prologue:
+ The code generated by the compiler to set up the stack frame.
+
+outgoing-args:
+ This is extra area allocated on the stack of the calling function if the
+ parameters for the callee's cannot all be put in registers, the same
+ area can be reused by each function the caller calls.
+
+routine-descriptor:
+ A COFF executable format based concept of a procedure reference
+ actually being 8 bytes or more as opposed to a simple pointer to the routine.
+ This is typically defined as follows:
+
+ - Routine Descriptor offset 0=Pointer to Function
+ - Routine Descriptor offset 4=Pointer to Table of Contents
+
+ The table of contents/TOC is roughly equivalent to a GOT pointer.
+ & it means that shared libraries etc. can be shared between several
+ environments each with their own TOC.
+
+static-chain:
+ This is used in nested functions a concept adopted from pascal
+ by gcc not used in ansi C or C++ ( although quite useful ), basically it
+ is a pointer used to reference local variables of enclosing functions.
+ You might come across this stuff once or twice in your lifetime.
+
+ e.g.
+
+ The function below should return 11 though gcc may get upset & toss warnings
+ about unused variables::
+
+ int FunctionA(int a)
+ {
+ int b;
+ FunctionC(int c)
+ {
+ b=c+1;
+ }
+ FunctionC(10);
+ return(b);
+ }
+
+
+s/390 & z/Architecture Register usage
+=====================================
+
+======== ========================================== ===============
+r0 used by syscalls/assembly call-clobbered
+r1 used by syscalls/assembly call-clobbered
+r2 argument 0 / return value 0 call-clobbered
+r3 argument 1 / return value 1 (if long long) call-clobbered
+r4 argument 2 call-clobbered
+r5 argument 3 call-clobbered
+r6 argument 4 saved
+r7 pointer-to arguments 5 to ... saved
+r8 this & that saved
+r9 this & that saved
+r10 static-chain ( if nested function ) saved
+r11 frame-pointer ( if function used alloca ) saved
+r12 got-pointer saved
+r13 base-pointer saved
+r14 return-address saved
+r15 stack-pointer saved
+
+f0 argument 0 / return value ( float/double ) call-clobbered
+f2 argument 1 call-clobbered
+f4 z/Architecture argument 2 saved
+f6 z/Architecture argument 3 saved
+======== ========================================== ===============
+
+The remaining floating points
+f1,f3,f5 f7-f15 are call-clobbered.
+
+Notes:
+------
+1) The only requirement is that registers which are used
+ by the callee are saved, e.g. the compiler is perfectly
+ capable of using r11 for purposes other than a frame a
+ frame pointer if a frame pointer is not needed.
+2) In functions with variable arguments e.g. printf the calling procedure
+ is identical to one without variable arguments & the same number of
+ parameters. However, the prologue of this function is somewhat more
+ hairy owing to it having to move these parameters to the stack to
+ get va_start, va_arg & va_end to work.
+3) Access registers are currently unused by gcc but are used in
+ the kernel. Possibilities exist to use them at the moment for
+ temporary storage but it isn't recommended.
+4) Only 4 of the floating point registers are used for
+ parameter passing as older machines such as G3 only have only 4
+ & it keeps the stack frame compatible with other compilers.
+ However with IEEE floating point emulation under linux on the
+ older machines you are free to use the other 12.
+5) A long long or double parameter cannot be have the
+ first 4 bytes in a register & the second four bytes in the
+ outgoing args area. It must be purely in the outgoing args
+ area if crossing this boundary.
+6) Floating point parameters are mixed with outgoing args
+ on the outgoing args area in the order the are passed in as parameters.
+7) Floating point arguments 2 & 3 are saved in the outgoing args area for
+ z/Architecture
+
+
+Stack Frame Layout
+------------------
+
+========= ============== ======================================================
+s/390 z/Architecture
+========= ============== ======================================================
+0 0 back chain ( a 0 here signifies end of back chain )
+4 8 eos ( end of stack, not used on Linux for S390 used
+ in other linkage formats )
+8 16 glue used in other s/390 linkage formats for saved
+ routine descriptors etc.
+12 24 glue used in other s/390 linkage formats for saved
+ routine descriptors etc.
+16 32 scratch area
+20 40 scratch area
+24 48 saved r6 of caller function
+28 56 saved r7 of caller function
+32 64 saved r8 of caller function
+36 72 saved r9 of caller function
+40 80 saved r10 of caller function
+44 88 saved r11 of caller function
+48 96 saved r12 of caller function
+52 104 saved r13 of caller function
+56 112 saved r14 of caller function
+60 120 saved r15 of caller function
+64 128 saved f4 of caller function
+72 132 saved f6 of caller function
+80 undefined
+96 160 outgoing args passed from caller to callee
+96+x 160+x possible stack alignment ( 8 bytes desirable )
+96+x+y 160+x+y alloca space of caller ( if used )
+96+x+y+z 160+x+y+z automatics of caller ( if used )
+0 back-chain
+========= ============== ======================================================
+
+A sample program with comments.
+===============================
+
+Comments on the function test
+-----------------------------
+1) It didn't need to set up a pointer to the constant pool gpr13 as it is not
+ used ( :-( ).
+2) This is a frameless function & no stack is bought.
+3) The compiler was clever enough to recognise that it could return the
+ value in r2 as well as use it for the passed in parameter ( :-) ).
+4) The basr ( branch relative & save ) trick works as follows the instruction
+ has a special case with r0,r0 with some instruction operands is understood as
+ the literal value 0, some risc architectures also do this ). So now
+ we are branching to the next address & the address new program counter is
+ in r13,so now we subtract the size of the function prologue we have executed
+ the size of the literal pool to get to the top of the literal pool::
+
+
+ 0040037c int test(int b)
+ { # Function prologue below
+ 40037c: 90 de f0 34 stm %r13,%r14,52(%r15) # Save registers r13 & r14
+ 400380: 0d d0 basr %r13,%r0 # Set up pointer to constant pool using
+ 400382: a7 da ff fa ahi %r13,-6 # basr trick
+ return(5+b);
+ # Huge main program
+ 400386: a7 2a 00 05 ahi %r2,5 # add 5 to r2
+
+ # Function epilogue below
+ 40038a: 98 de f0 34 lm %r13,%r14,52(%r15) # restore registers r13 & 14
+ 40038e: 07 fe br %r14 # return
+ }
+
+Comments on the function main
+-----------------------------
+1) The compiler did this function optimally ( 8-) )::
+
+ Literal pool for main.
+ 400390: ff ff ff ec .long 0xffffffec
+ main(int argc,char *argv[])
+ { # Function prologue below
+ 400394: 90 bf f0 2c stm %r11,%r15,44(%r15) # Save necessary registers
+ 400398: 18 0f lr %r0,%r15 # copy stack pointer to r0
+ 40039a: a7 fa ff a0 ahi %r15,-96 # Make area for callee saving
+ 40039e: 0d d0 basr %r13,%r0 # Set up r13 to point to
+ 4003a0: a7 da ff f0 ahi %r13,-16 # literal pool
+ 4003a4: 50 00 f0 00 st %r0,0(%r15) # Save backchain
+
+ return(test(5)); # Main Program Below
+ 4003a8: 58 e0 d0 00 l %r14,0(%r13) # load relative address of test from
+ # literal pool
+ 4003ac: a7 28 00 05 lhi %r2,5 # Set first parameter to 5
+ 4003b0: 4d ee d0 00 bas %r14,0(%r14,%r13) # jump to test setting r14 as return
+ # address using branch & save instruction.
+
+ # Function Epilogue below
+ 4003b4: 98 bf f0 8c lm %r11,%r15,140(%r15)# Restore necessary registers.
+ 4003b8: 07 fe br %r14 # return to do program exit
+ }
+
+
+Compiler updates
+----------------
+
+::
+
+ main(int argc,char *argv[])
+ {
+ 4004fc: 90 7f f0 1c stm %r7,%r15,28(%r15)
+ 400500: a7 d5 00 04 bras %r13,400508 <main+0xc>
+ 400504: 00 40 04 f4 .long 0x004004f4
+ # compiler now puts constant pool in code to so it saves an instruction
+ 400508: 18 0f lr %r0,%r15
+ 40050a: a7 fa ff a0 ahi %r15,-96
+ 40050e: 50 00 f0 00 st %r0,0(%r15)
+ return(test(5));
+ 400512: 58 10 d0 00 l %r1,0(%r13)
+ 400516: a7 28 00 05 lhi %r2,5
+ 40051a: 0d e1 basr %r14,%r1
+ # compiler adds 1 extra instruction to epilogue this is done to
+ # avoid processor pipeline stalls owing to data dependencies on g5 &
+ # above as register 14 in the old code was needed directly after being loaded
+ # by the lm %r11,%r15,140(%r15) for the br %14.
+ 40051c: 58 40 f0 98 l %r4,152(%r15)
+ 400520: 98 7f f0 7c lm %r7,%r15,124(%r15)
+ 400524: 07 f4 br %r4
+ }
+
+
+Hartmut ( our compiler developer ) also has been threatening to take out the
+stack backchain in optimised code as this also causes pipeline stalls, you
+have been warned.
+
+64 bit z/Architecture code disassembly
+--------------------------------------
+
+If you understand the stuff above you'll understand the stuff
+below too so I'll avoid repeating myself & just say that
+some of the instructions have g's on the end of them to indicate
+they are 64 bit & the stack offsets are a bigger,
+the only other difference you'll find between 32 & 64 bit is that
+we now use f4 & f6 for floating point arguments on 64 bit::
+
+ 00000000800005b0 <test>:
+ int test(int b)
+ {
+ return(5+b);
+ 800005b0: a7 2a 00 05 ahi %r2,5
+ 800005b4: b9 14 00 22 lgfr %r2,%r2 # downcast to integer
+ 800005b8: 07 fe br %r14
+ 800005ba: 07 07 bcr 0,%r7
+
+
+ }
+
+ 00000000800005bc <main>:
+ main(int argc,char *argv[])
+ {
+ 800005bc: eb bf f0 58 00 24 stmg %r11,%r15,88(%r15)
+ 800005c2: b9 04 00 1f lgr %r1,%r15
+ 800005c6: a7 fb ff 60 aghi %r15,-160
+ 800005ca: e3 10 f0 00 00 24 stg %r1,0(%r15)
+ return(test(5));
+ 800005d0: a7 29 00 05 lghi %r2,5
+ # brasl allows jumps > 64k & is overkill here bras would do fune
+ 800005d4: c0 e5 ff ff ff ee brasl %r14,800005b0 <test>
+ 800005da: e3 40 f1 10 00 04 lg %r4,272(%r15)
+ 800005e0: eb bf f0 f8 00 04 lmg %r11,%r15,248(%r15)
+ 800005e6: 07 f4 br %r4
+ }
+
+
+
+Compiling programs for debugging on Linux for s/390 & z/Architecture
+====================================================================
+-gdwarf-2 now works it should be considered the default debugging
+format for s/390 & z/Architecture as it is more reliable for debugging
+shared libraries, normal -g debugging works much better now
+Thanks to the IBM java compiler developers bug reports.
+
+This is typically done adding/appending the flags -g or -gdwarf-2 to the
+CFLAGS & LDFLAGS variables Makefile of the program concerned.
+
+If using gdb & you would like accurate displays of registers &
+stack traces compile without optimisation i.e make sure
+that there is no -O2 or similar on the CFLAGS line of the Makefile &
+the emitted gcc commands, obviously this will produce worse code
+( not advisable for shipment ) but it is an aid to the debugging process.
+
+This aids debugging because the compiler will copy parameters passed in
+in registers onto the stack so backtracing & looking at passed in
+parameters will work, however some larger programs which use inline functions
+will not compile without optimisation.
+
+Debugging with optimisation has since much improved after fixing
+some bugs, please make sure you are using gdb-5.0 or later developed
+after Nov'2000.
+
+
+
+Debugging under VM
+==================
+
+Notes
+-----
+Addresses & values in the VM debugger are always hex never decimal
+Address ranges are of the format <HexValue1>-<HexValue2> or
+<HexValue1>.<HexValue2>
+For example, the address range 0x2000 to 0x3000 can be described as 2000-3000
+or 2000.1000
+
+The VM Debugger is case insensitive.
+
+VM's strengths are usually other debuggers weaknesses you can get at any
+resource no matter how sensitive e.g. memory management resources, change
+address translation in the PSW. For kernel hacking you will reap dividends if
+you get good at it.
+
+The VM Debugger displays operators but not operands, and also the debugger
+displays useful information on the same line as the author of the code probably
+felt that it was a good idea not to go over the 80 columns on the screen.
+This isn't as unintuitive as it may seem as the s/390 instructions are easy to
+decode mentally and you can make a good guess at a lot of them as all the
+operands are nibble (half byte aligned).
+So if you have an objdump listing by hand, it is quite easy to follow, and if
+you don't have an objdump listing keep a copy of the s/390 Reference Summary