summaryrefslogtreecommitdiffstats
path: root/kernel/resource.c
AgeCommit message (Collapse)Author
2013-04-29mem hotunplug: fix kfree() of bootmem memoryYasuaki Ishimatsu
When hot removing memory presented at boot time, following messages are shown: kernel BUG at mm/slub.c:3409! invalid opcode: 0000 [#1] SMP Modules linked in: ebtable_nat ebtables xt_CHECKSUM iptable_mangle bridge stp llc ipmi_devintf ipmi_msghandler sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables binfmt_misc vfat fat dm_mirror dm_region_hash dm_log dm_mod vhost_net macvtap macvlan tun uinput iTCO_wdt iTCO_vendor_support coretemp kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode pcspkr sg i2c_i801 lpc_ich mfd_core igb i2c_algo_bit i2c_core e1000e ptp pps_core tpm_infineon ioatdma dca sr_mod cdrom sd_mod crc_t10dif usb_storage megaraid_sas lpfc scsi_transport_fc scsi_tgt scsi_mod CPU 0 Pid: 5091, comm: kworker/0:2 Tainted: G W 3.9.0-rc6+ #15 RIP: kfree+0x232/0x240 Process kworker/0:2 (pid: 5091, threadinfo ffff88084678c000, task ffff88083928ca80) Call Trace: __release_region+0xd4/0xe0 __remove_pages+0x52/0x110 arch_remove_memory+0x89/0xd0 remove_memory+0xc4/0x100 acpi_memory_device_remove+0x6d/0xb1 acpi_device_remove+0x89/0xab __device_release_driver+0x7c/0xf0 device_release_driver+0x2f/0x50 acpi_bus_device_detach+0x6c/0x70 acpi_ns_walk_namespace+0x11a/0x250 acpi_walk_namespace+0xee/0x137 acpi_bus_trim+0x33/0x7a acpi_bus_hot_remove_device+0xc4/0x1a1 acpi_os_execute_deferred+0x27/0x34 process_one_work+0x1f7/0x590 worker_thread+0x11a/0x370 kthread+0xee/0x100 ret_from_fork+0x7c/0xb0 RIP [<ffffffff811c41d2>] kfree+0x232/0x240 RSP <ffff88084678d968> The reason why the messages are shown is to release a resource structure, allocated by bootmem, by kfree(). So when we release a resource structure, we should check whether it is allocated by bootmem or not. But even if we know a resource structure is allocated by bootmem, we cannot release it since SLxB cannot treat it. So for reusing a resource structure, this patch remembers it by using bootmem_resource as follows: When releasing a resource structure by free_resource(), free_resource() checks whether the resource structure is allocated by bootmem or not. If it is allocated by bootmem, free_resource() adds it to bootmem_resource. If it is not allocated by bootmem, free_resource() release it by kfree(). And when getting a new resource structure by get_resource(), get_resource() checks whether bootmem_resource has released resource structures or not. If there is a released resource structure, get_resource() returns it. If there is not a releaed resource structure, get_resource() returns new resource structure allocated by kzalloc(). [akpm@linux-foundation.org: s/get_resource/alloc_resource/] Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Reviewed-by: Toshi Kani <toshi.kani@hp.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Ram Pai <linuxram@us.ibm.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29resource: add release_mem_region_adjustable()Toshi Kani
Add release_mem_region_adjustable(), which releases a requested region from a currently busy memory resource. This interface adjusts the matched memory resource accordingly even if the requested region does not match exactly but still fits into. This new interface is intended for memory hot-delete. During bootup, memory resources are inserted from the boot descriptor table, such as EFI Memory Table and e820. Each memory resource entry usually covers the whole contigous memory range. Memory hot-delete request, on the other hand, may target to a particular range of memory resource, and its size can be much smaller than the whole contiguous memory. Since the existing release interfaces like __release_region() require a requested region to be exactly matched to a resource entry, they do not allow a partial resource to be released. This new interface is restrictive (i.e. release under certain conditions), which is consistent with other release interfaces, __release_region() and __release_resource(). Additional release conditions, such as an overlapping region to a resource entry, can be supported after they are confirmed as valid cases. There is no change to the existing interfaces since their restriction is valid for I/O resources. [akpm@linux-foundation.org: use GFP_ATOMIC under write_lock()] [akpm@linux-foundation.org: switch back to GFP_KERNEL, less buggily] [akpm@linux-foundation.org: remove unneeded and wrong kfree(), per Toshi] Signed-off-by: Toshi Kani <toshi.kani@hp.com> Reviewed-by : Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Reviewed-by: Ram Pai <linuxram@us.ibm.com> Cc: T Makphaibulchoke <tmac@hp.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Jiang Liu <jiang.liu@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29resource: add __adjust_resource() for internal useToshi Kani
Add __adjust_resource(), which is called by adjust_resource() internally after the resource_lock is held. There is no interface change to adjust_resource(). This change allows other functions to call __adjust_resource() internally while the resource_lock is held. Signed-off-by: Toshi Kani <toshi.kani@hp.com> Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Ram Pai <linuxram@us.ibm.com> Cc: T Makphaibulchoke <tmac@hp.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Jiang Liu <jiang.liu@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06kernel/resource.c: fix stack overflow in __reserve_region_with_split()T Makphaibulchoke
Using a recursive call add a non-conflicting region in __reserve_region_with_split() could result in a stack overflow in the case that the recursive calls are too deep. Convert the recursive calls to an iterative loop to avoid the problem. Tested on a machine containing 135 regions. The kernel no longer panicked with stack overflow. Also tested with code arbitrarily adding regions with no conflict, embedding two consecutive conflicts and embedding two non-consecutive conflicts. Signed-off-by: T Makphaibulchoke <tmac@hp.com> Reviewed-by: Ram Pai <linuxram@us.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@gmail.com> Cc: Wei Yang <weiyang@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-30resource: make sure requested range is included in the root rangeOctavian Purdila
When the requested range is outside of the root range the logic in __reserve_region_with_split will cause an infinite recursion which will overflow the stack as seen in the warning bellow. This particular stack overflow was caused by requesting the (100000000-107ffffff) range while the root range was (0-ffffffff). In this case __request_resource would return the whole root range as conflict range (i.e. 0-ffffffff). Then, the logic in __reserve_region_with_split would continue the recursion requesting the new range as (conflict->end+1, end) which incidentally in this case equals the originally requested range. This patch aborts looking for an usable range when the request does not intersect with the root range. When the request partially overlaps with the root range, it ajust the request to fall in the root range and then continues with the new request. When the request is modified or aborted errors and a stack trace are logged to allow catching the errors in the upper layers. [ 5.968374] WARNING: at kernel/sched.c:4129 sub_preempt_count+0x63/0x89() [ 5.975150] Modules linked in: [ 5.978184] Pid: 1, comm: swapper Not tainted 3.0.22-mid27-00004-gb72c817 #46 [ 5.985324] Call Trace: [ 5.987759] [<c1039dfc>] ? console_unlock+0x17b/0x18d [ 5.992891] [<c1039620>] warn_slowpath_common+0x48/0x5d [ 5.998194] [<c1031758>] ? sub_preempt_count+0x63/0x89 [ 6.003412] [<c1039644>] warn_slowpath_null+0xf/0x13 [ 6.008453] [<c1031758>] sub_preempt_count+0x63/0x89 [ 6.013499] [<c14d60c4>] _raw_spin_unlock+0x27/0x3f [ 6.018453] [<c10c6349>] add_partial+0x36/0x3b [ 6.022973] [<c10c7c0a>] deactivate_slab+0x96/0xb4 [ 6.027842] [<c14cf9d9>] __slab_alloc.isra.54.constprop.63+0x204/0x241 [ 6.034456] [<c103f78f>] ? kzalloc.constprop.5+0x29/0x38 [ 6.039842] [<c103f78f>] ? kzalloc.constprop.5+0x29/0x38 [ 6.045232] [<c10c7dc9>] kmem_cache_alloc_trace+0x51/0xb0 [ 6.050710] [<c103f78f>] ? kzalloc.constprop.5+0x29/0x38 [ 6.056100] [<c103f78f>] kzalloc.constprop.5+0x29/0x38 [ 6.061320] [<c17b45e9>] __reserve_region_with_split+0x1c/0xd1 [ 6.067230] [<c17b4693>] __reserve_region_with_split+0xc6/0xd1 ... [ 7.179057] [<c17b4693>] __reserve_region_with_split+0xc6/0xd1 [ 7.184970] [<c17b4779>] reserve_region_with_split+0x30/0x42 [ 7.190709] [<c17a8ebf>] e820_reserve_resources_late+0xd1/0xe9 [ 7.196623] [<c17c9526>] pcibios_resource_survey+0x23/0x2a [ 7.202184] [<c17cad8a>] pcibios_init+0x23/0x35 [ 7.206789] [<c17ca574>] pci_subsys_init+0x3f/0x44 [ 7.211659] [<c1002088>] do_one_initcall+0x72/0x122 [ 7.216615] [<c17ca535>] ? pci_legacy_init+0x3d/0x3d [ 7.221659] [<c17a27ff>] kernel_init+0xa6/0x118 [ 7.226265] [<c17a2759>] ? start_kernel+0x334/0x334 [ 7.231223] [<c14d7482>] kernel_thread_helper+0x6/0x10 Signed-off-by: Octavian Purdila <octavian.purdila@intel.com> Signed-off-by: Ram Pai <linuxram@us.ibm.com> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-06-13resources: allow adjust_resource() for resources with no parentYinghai Lu
If a resource has no parent, allow its start/end to be set arbitrarily as long as any children are still contained within the new range. [bhelgaas: changelog] Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2012-05-31kernel/resource.c: correct the comment of allocate_resource()Wei Yang
In the comment of allocate_resource(), the explanation of parameter max and min is not correct. Actually, these two parameters are used to specify the range of the resource that will be allocated, not the min/max size that will be allocated. Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-02-03kernel/resource.c: move EXPORT_SYMBOL right after definitionCong Wang
EXPORT_SYMBOL(adjust_resource) should be right after adjust_resource(). Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-10-31kernel: Map most files to use export.h instead of module.hPaul Gortmaker
The changed files were only including linux/module.h for the EXPORT_SYMBOL infrastructure, and nothing else. Revector them onto the isolated export header for faster compile times. Nothing to see here but a whole lot of instances of: -#include <linux/module.h> +#include <linux/export.h> This commit is only changing the kernel dir; next targets will probably be mm, fs, the arch dirs, etc. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-09-29Resource: fix wrong resource window calculationRam Pai
__find_resource() incorrectly returns a resource window which overlaps an existing allocated window. This happens when the parent's resource-window spans 0x00000000 to 0xffffffff and is entirely allocated to all its children resource-windows. __find_resource() looks for gaps in resource allocation among the children resource windows. When it encounters the last child window it blindly tries the range next to one allocated to the last child. Since the last child's window ends at 0xffffffff the calculation overflows, leading the algorithm to believe that any window in the range 0x0000000 to 0xfffffff is available for allocation. This leads to a conflicting window allocation. Michal Ludvig reported this issue seen on his platform. The following patch fixes the problem and has been verified by Michal. I believe this bug has been there for ages. It got exposed by git commit 2bbc6942273b ("PCI : ability to relocate assigned pci-resources") Signed-off-by: Ram Pai <linuxram@us.ibm.com> Tested-by: Michal Ludvig <mludvig@logix.net.nz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-30resources: Add lookup_resource()Geert Uytterhoeven
Add a function to find an existing resource by a resource start address. This allows to implement simple allocators (with a malloc/free-alike API) on top of the resource system. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
2011-07-06resource: ability to resize an allocated resourceRam Pai
Provides the ability to resize a resource that is already allocated. This functionality is put in place to support reallocation needs of pci resources. Signed-off-by: Ram Pai <linuxram@us.ibm.com> Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-12-17resources: add arch hook for preventing allocation in reserved areasBjorn Helgaas
This adds arch_remove_reservations(), which an arch can implement if it needs to protect part of the address space from allocation. Sometimes that can be done by just putting a region in the resource tree, but there are cases where that doesn't work well. For example, x86 BIOS E820 reservations are not related to devices, so they may overlap part of, all of, or more than a device resource, so they may not end up at the correct spot in the resource tree. Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-12-17Revert "resources: support allocating space within a region from the top down"Bjorn Helgaas
This reverts commit e7f8567db9a7f6b3151b0b275e245c1cef0d9c70. Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-10-28Merge branch 'linux-next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 * 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (27 commits) x86: allocate space within a region top-down x86: update iomem_resource end based on CPU physical address capabilities x86/PCI: allocate space from the end of a region, not the beginning PCI: allocate bus resources from the top down resources: support allocating space within a region from the top down resources: handle overflow when aligning start of available area resources: ensure callback doesn't allocate outside available space resources: factor out resource_clip() to simplify find_resource() resources: add a default alignf to simplify find_resource() x86/PCI: MMCONFIG: fix region end calculation PCI: Add support for polling PME state on suspended legacy PCI devices PCI: Export some PCI PM functionality PCI: fix message typo PCI: log vendor/device ID always PCI: update Intel chipset names and defines PCI: use new ccflags variable in Makefile PCI: add PCI_MSIX_TABLE/PBA defines PCI: add PCI vendor id for STmicroelectronics x86/PCI: irq and pci_ids patch for Intel Patsburg DeviceIDs PCI: OLPC: Only enable PCI configuration type override on XO-1 ...
2010-10-27kernel/resource.c: handle reinsertion of an already-inserted resourceHuang Shijie
If the same resource is inserted to the resource tree (maybe not on purpose), a dead loop will be created. In this situation, The kernel does not report any warning or error :( The command below will show a endless print. #cat /proc/iomem [akpm@linux-foundation.org: add WARN_ON()] Signed-off-by: Huang Shijie <shijie8@gmail.com> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Cc: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26resources: support allocating space within a region from the top downBjorn Helgaas
Allocate space from the top of a region first, then work downward, if an architecture desires this. When we allocate space from a resource, we look for gaps between children of the resource. Previously, we always looked at gaps from the bottom up. For example, given this: [mem 0xbff00000-0xf7ffffff] PCI Bus 0000:00 [mem 0xbff00000-0xbfffffff] gap -- available [mem 0xc0000000-0xdfffffff] PCI Bus 0000:02 [mem 0xe0000000-0xf7ffffff] gap -- available we attempted to allocate from the [mem 0xbff00000-0xbfffffff] gap first, then the [mem 0xe0000000-0xf7ffffff] gap. With this patch an architecture can choose to allocate from the top gap [mem 0xe0000000-0xf7ffffff] first. We can't do this across the board because iomem_resource.end is initialized to 0xffffffff_ffffffff on 64-bit architectures, and most machines can't address the entire 64-bit physical address space. Therefore, we only allocate top-down if the arch requests it by clearing "resource_alloc_from_bottom". Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-10-26resources: handle overflow when aligning start of available areaBjorn Helgaas
If tmp.start is near ~0, ALIGN(tmp.start) may overflow, which would make us think there's more available space than there really is. We would likely return something that conflicts with a previous resource, which would cause a failure when allocate_resource() requests the newly- allocated region. Reference: https://bugzilla.redhat.com/show_bug.cgi?id=646027 Reported-by: Fabrice Bellet <fabrice@bellet.info> Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-10-26resources: ensure callback doesn't allocate outside available spaceBjorn Helgaas
The alignment callback returns a proposed location, which may have been adjusted to avoid ISA aliases or for other architecture-specific reasons. We already had a check ("tmp.start < tmp.end") to make sure the callback doesn't return an area that extends past the available area. This patch reworks the check to make sure it doesn't return an area that extends either below or above the available area. Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-10-26resources: factor out resource_clip() to simplify find_resource()Bjorn Helgaas
This factors out the min/max clipping to simplify find_resource(). No functional change. Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-10-26resources: add a default alignf to simplify find_resource()Bjorn Helgaas
This removes a test from find_resource(), which is getting cluttered. No functional change. Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-05-11resource: shared I/O region supportAlan Cox
SuperIO devices share regions and use lock/unlock operations to chip select. We therefore need to be able to request a resource and wait for it to be freed by whichever other SuperIO device currently hogs it. Right now you have to poll which is horrible. Add a MUXED field to IO port resources. If the MUXED field is set on the resource and on the request (via request_muxed_region) then we block until the previous owner of the muxed resource releases their region. This allows us to implement proper resource sharing and locking for superio chips using code of the form enable_my_superio_dev() { request_muxed_region(0x44, 0x02, "superio:watchdog"); outb() ..sequence to enable chip } disable_my_superio_dev() { outb() .. sequence of disable chip release_region(0x44, 0x02); } Signed-off-by: Giel van Schijndel <me@mortis.eu> Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-03-23resources: add interfaces that return conflict informationBjorn Helgaas
request_resource() and insert_resource() only return success or failure, which no information about what existing resource conflicted with the proposed new reservation. This patch adds request_resource_conflict() and insert_resource_conflict(), which return the conflicting resource. Callers may use this for better error messages or to adjust the new resource and retry the request. Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-03-03Merge branch 'x86-mm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: resource: Fix broken indentation resource: Fix generic page_is_ram() for partial RAM pages x86, paravirt: Remove kmap_atomic_pte paravirt op. x86, vmi: Disable highmem PTE allocation even when CONFIG_HIGHPTE=y x86, xen: Disable highmem PTE allocation even when CONFIG_HIGHPTE=y
2010-03-02resource: Fix broken indentationH. Peter Anvin
Fix broken indentation in patch 37b99dd5372cff42f83210c280f314f10f99138e. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Wu Fengguang <fengguang.wu@intel.com> LKML-Reference: <20100301135551.GA9998@localhost> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-03-01resource: Fix generic page_is_ram() for partial RAM pagesWu Fengguang
The System RAM walk shall skip partial RAM pages and avoid calling func() on them. So that page_is_ram() return 0 for a partial RAM page. In particular, it shall not call func() with len=0. This fixes a boot time bug reported by Sachin and root caused by Thomas: > >>> WARNING: at arch/x86/mm/ioremap.c:111 __ioremap_caller+0x169/0x2f1() > >>> Hardware name: BladeCenter LS21 -[79716AA]- > >>> Modules linked in: > >>> Pid: 0, comm: swapper Not tainted 2.6.33-git6-autotest #1 > >>> Call Trace: > >>> [<ffffffff81047cff>] ? __ioremap_caller+0x169/0x2f1 > >>> [<ffffffff81063b7d>] warn_slowpath_common+0x77/0xa4 > >>> [<ffffffff81063bb9>] warn_slowpath_null+0xf/0x11 > >>> [<ffffffff81047cff>] __ioremap_caller+0x169/0x2f1 > >>> [<ffffffff813747a3>] ? acpi_os_map_memory+0x12/0x1b > >>> [<ffffffff81047f10>] ioremap_nocache+0x12/0x14 > >>> [<ffffffff813747a3>] acpi_os_map_memory+0x12/0x1b > >>> [<ffffffff81282fa0>] acpi_tb_verify_table+0x29/0x5b > >>> [<ffffffff812827f0>] acpi_load_tables+0x39/0x15a > >>> [<ffffffff8191c8f8>] acpi_early_init+0x60/0xf5 > >>> [<ffffffff818f2cad>] start_kernel+0x397/0x3a7 > >>> [<ffffffff818f2295>] x86_64_start_reservations+0xa5/0xa9 > >>> [<ffffffff818f237a>] x86_64_start_kernel+0xe1/0xe8 > >>> ---[ end trace 4eaa2a86a8e2da22 ]--- > >>> ioremap reserve_memtype failed -22 The return code is -EINVAL, so it failed in the is_ram check, which is not too surprising > BIOS-provided physical RAM map: > BIOS-e820: 0000000000000000 - 000000000009c000 (usable) > BIOS-e820: 000000000009c000 - 00000000000a0000 (reserved) > BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved) > BIOS-e820: 0000000000100000 - 00000000cffa3900 (usable) > BIOS-e820: 00000000cffa3900 - 00000000cffa7400 (ACPI data) The ACPI data is not starting on a page boundary and neither does the usable RAM area end on a page boundary. Very useful ! > ACPI: DSDT 00000000cffa3900 036CE (v01 IBM SERLEWIS 00001000 INTL 20060912) ACPI is trying to map DSDT at cffa3900, which results in a check vs. cffa3000 which is the relevant page boundary. The generic is_ram check correctly identifies that as RAM because it's in the usable resource area. The old e820 based is_ram check does not take overlapping resource areas into account. That's why it works. CC: Sachin Sant <sachinp@in.ibm.com> CC: Thomas Gleixner <tglx@linutronix.de> CC: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> LKML-Reference: <20100301135551.GA9998@localhost> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-28Merge branch 'x86-mm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, mm: Unify kernel_physical_mapping_init() API x86, mm: Allow highmem user page tables to be disabled at boot time x86: Do not reserve brk for DMI if it's not going to be used x86: Convert tlbstate_lock to raw_spinlock x86: Use the generic page_is_ram() x86: Remove BIOS data range from e820 Move page_is_ram() declaration to mm.h Generic page_is_ram: use __weak resources: introduce generic page_is_ram()
2010-02-22resource: add release_child_resourcesYinghai Lu
Useful for freeing a portion of the resource tree, e.g. when trying to reallocate resources more efficiently. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-02-22resource/PCI: mark struct resource as constDominik Brodowski
Now that we return the new resource start position, there is no need to update "struct resource" inside the align function. Therefore, mark the struct resource as const. Cc: Bjorn Helgaas <bjorn.helgaas@hp.com> Cc: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-02-22resource/PCI: align functions now return start of resourceDominik Brodowski
As suggested by Linus, align functions should return the start of a resource, not void. An update of "res->start" is no longer necessary. Cc: Bjorn Helgaas <bjorn.helgaas@hp.com> Cc: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-02-17Merge branch 'linus' into x86/mmThomas Gleixner
x86/mm is on 32-rc4 and missing the spinlock namespace changes which are needed for further commits into this topic. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-02-01Generic page_is_ram: use __weakAndrew Morton
Use __weak instead of __attribute__((weak)). Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-01resources: introduce generic page_is_ram()Wu Fengguang
It's based on walk_system_ram_range(), for archs that don't have their own page_is_ram(). The static verions in MIPS and SCORE are also made global. v4: prefer plain 1 instead of PAGE_IS_RAM (H. Peter Anvin) v3: add comment (KAMEZAWA Hiroyuki) "AFAIK, this "System RAM" information has been used for kdump to grab valid memory area and seems good for the kernel itself." v2: add PAGE_IS_RAM macro (Américo Wang) Cc: Chen Liqin <liqin.chen@sunplusct.com> Cc: Lennox Wu <lennox.wu@gmail.com> Cc: Américo Wang <xiyou.wangcong@gmail.com> Cc: linux-mips@linux-mips.org Cc: Yinghai Lu <yinghai@kernel.org> Acked-by: Ralf Baechle <ralf@linux-mips.org> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> LKML-Reference: <20100122081619.GA6431@localhost> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-12-21resources: fix call to alignf() in allocate_resource()Dominik Brodowski
The second parameter to alignf() in allocate_resource() must reflect what new resource is attempted to be allocated, else functions like pcibios_align_resource() (at least on x86) or pcmcia_align() can't work correctly. Commit 1e5ad9679016275d422e36b12a98b0927d76f556 broke this by setting the "new" resource until we're about to return success. To keep the resource untouched when allocate_resource() fails, a "tmp" resource is introduced. Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Acked-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Cc: Yinghai Lu <yhlu.kernel@gmail.com> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-11-04resources: when allocate_resource() fails, leave resource untouchedBjorn Helgaas
When "allocate_resource(root, new, size, ...)" fails, we currently clobber "new". This is inconvenient for the caller, who might care about the original contents of the resource. For example, when pci_bus_alloc_resource() fails, the "can't allocate mem resource %pR" message from pci_assign_resources() currently contains junk for the resource start/end. This patch delays the "new" update until we're about to return success. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-09-23walk system ram rangeKAMEZAWA Hiroyuki
Originally, walk_memory_resource() was introduced to traverse all memory of "System RAM" for detecting memory hotplug/unplug range. For doing so, flags of IORESOUCE_MEM|IORESOURCE_BUSY was used and this was enough for memory hotplug. But for using other purpose, /proc/kcore, this may includes some firmware area marked as IORESOURCE_BUSY | IORESOUCE_MEM. This patch makes the check strict to find out busy "System RAM". Note: PPC64 keeps their own walk_memory_resouce(), which walk through ppc64's lmb informaton. Because old kclist_add() is called per lmb, this patch makes no difference in behavior, finally. And this patch removes CONFIG_MEMORY_HOTPLUG check from this function. Because pfn_valid() just show "there is memmap or not* and cannot be used for "there is physical memory or not", this function is useful in generic to scan physical memory range. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: WANG Cong <xiyou.wangcong@gmail.com> Cc: Américo Wang <xiyou.wangcong@gmail.com> Cc: David Rientjes <rientjes@google.com> Cc: Roland Dreier <rolandd@cisco.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-30kernel/resource.c: fix sign extension in reserve_setup()Zhang Rui
When the 32-bit signed quantities get assigned to the u64 resource_size_t, they are incorrectly sign-extended. Addresses http://bugzilla.kernel.org/show_bug.cgi?id=13253 Addresses http://bugzilla.kernel.org/show_bug.cgi?id=9905 Signed-off-by: Zhang Rui <rui.zhang@intel.com> Reported-by: Leann Ogasawara <leann@ubuntu.com> Cc: Pierre Ossman <drzeus@drzeus.cx> Reported-by: <pablomme@googlemail.com> Tested-by: <pablomme@googlemail.com> Cc: <stable@kernel.org> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-18Remove 'recurse into child resources' logic from 'reserve_region_with_split()'Linus Torvalds
This function is not actually used right now, since the original use case for it was done with insert_resource_expand_to_fit() instead. However, we now have another usage case that wants to basically do a "reserve IO resource, splitting around existing resources", however that one doesn't actually want the "recurse into the conflicting resource" logic at all. And since recursing into the conflicting resource was the most complex part, and isn't wanted, just remove it. Maybe we'll some day want both versions, but we can just resurrect the logic then. Tested-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-15resources: fix parameter name and kernel-docRandy Dunlap
Fix __request_region() parameter kernel-doc notation and parameter name: Warning(linux-2.6.28-git10//kernel/resource.c:627): No description found for parameter 'flags' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-07resource: allow MMIO exclusivity for device driversArjan van de Ven
Device drivers that use pci_request_regions() (and similar APIs) have a reasonable expectation that they are the only ones accessing their device. As part of the e1000e hunt, we were afraid that some userland (X or some bootsplash stuff) was mapping the MMIO region that the driver thought it had exclusively via /dev/mem or via various sysfs resource mappings. This patch adds the option for device drivers to cause their reserved regions to the "banned from /dev/mem use" list, so now both kernel memory and device-exclusive MMIO regions are banned. NOTE: This is only active when CONFIG_STRICT_DEVMEM is set. In addition to the config option, a kernel parameter iomem=relaxed is provided for the cases where developers want to diagnose, in the field, drivers issues from userspace. Reviewed-by: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2008-12-16resources: skip sanity check of busy resourcesArjan van de Ven
Impact: reduce false positives in iomem_map_sanity_check() Some drivers (vesafb) only map/reserve a portion of a resource. If then some other driver comes in and maps the whole resource, the current code WARN_ON's. This is not the intent of the checks in iomem_map_sanity_check(); rather these checks want to warn when crossing *hardware* resources only. This patch skips BUSY resources as suggested by Linus. Note: having two drivers talk to the same hardware at the same time is obviously not optimal behavior, but that's a separate story. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-01reserve_region_with_split: Fix GFP_KERNEL usage under spinlockLinus Torvalds
This one apparently doesn't generate any warnings, because the function is only used during system bootup, when the warnings are disabled. But it's still very wrong. The __reserve_region_with_split() function is called with the resource_lock held for writing, so it must only ever do GFP_ATOMIC allocations. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-28resources: fix x86info results ioremap.c:226 __ioremap_caller+0xf2/0x2d6() ↵Suresh Siddha
WARNINGs Impact: avoid false-positive WARN_ON() Andi Kleen reported: > When running x86info on a 2.6.27-git8 system I get > > resource map sanity check conflict: 0x9e000 0x9efff 0x10000 0x9e7ff System RAM > ------------[ cut here ]------------ > WARNING: at /home/lsrc/linux/arch/x86/mm/ioremap.c:226 __ioremap_caller+0xf2/0x2d6() > ... Some of the pages below the 1MB ISA addresses will be shared typically by both BIOS and system usable RAM. For example: BIOS-e820: 0000000000000000 - 000000000009f800 (usable) BIOS-e820: 000000000009f800 - 00000000000a0000 (reserved) x86info reads the low physical address using /dev/mem, which internally uses ioremap() for accessing non RAM pages. ioremap() of such low pages conflicts with multiple resource entities leading to the above warning. Change the iomem_map_sanity_check() to allow mapping a page spanning multiple resource entities (minimum granularity that one can map is a page anyhow). Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-23kernel/resource: fix reserve_region_with_split() section mismatchPaul Mundt
Impact: cleanup, small kernel text size reduction, no functionality changed reserve_region_with_split() calls in to __reserve_region_with_split(), which is an __init function. The only caller of reserve_region_with_split() is an __init function, so make it __init too. Signed-off-by: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16Merge branch 'core-v28-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-v28-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: do_generic_file_read: s/EINTR/EIO/ if lock_page_killable() fails softirq, warning fix: correct a format to avoid a warning softirqs, debug: preemption check x86, pci-hotplug, calgary / rio: fix EBDA ioremap() IO resources, x86: ioremap sanity check to catch mapping requests exceeding, fix IO resources, x86: ioremap sanity check to catch mapping requests exceeding the BAR sizes softlockup: Documentation/sysctl/kernel.txt: fix softlockup_thresh description dmi scan: warn about too early calls to dmi_check_system() generic: redefine resource_size_t as phys_addr_t generic: make PFN_PHYS explicitly return phys_addr_t generic: add phys_addr_t for holding physical addresses softirq: allocate less vectors IO resources: fix/remove printk printk: robustify printk, update comment printk: robustify printk, fix #2 printk: robustify printk, fix printk: robustify printk Fixed up conflicts in: arch/powerpc/include/asm/types.h arch/powerpc/platforms/Kconfig.cputype manually.
2008-10-16resources: tidy __request_region()Bjorn Helgaas
No functional change. Just return NULL for kzalloc failure immediately, rather than wrapping the whole function body in the body of an "if". Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-09-26IO resources, x86: ioremap sanity check to catch mapping requests exceeding, fixIngo Molnar
fix this build error: kernel/resource.c: In function 'iomem_map_sanity_check': kernel/resource.c:842: error: implicit declaration of function 'r_next' kernel/resource.c:842: warning: assignment makes pointer from integer without a cast r_next() was only available if CONFIG_PROCFS was enabled. and fix this build warning: kernel/resource.c:855: warning: format '%llx' expects type 'long long unsigned int', but argument 2 has type 'resource_size_t' kernel/resource.c:855: warning: format '%llx' expects type 'long long unsigned int', but argument 3 has type 'long unsigned int' kernel/resource.c:855: warning: format '%llx' expects type 'long long unsigned int', but argument 4 has type 'resource_size_t' kernel/resource.c:855: warning: format '%llx' expects type 'long long unsigned int', but argument 5 has type 'resource_size_t' resource_t can be 32 bits. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-09-26IO resources, x86: ioremap sanity check to catch mapping requests exceeding ↵Suresh Siddha
the BAR sizes Go through the iomem resource tree to check if any of the ioremap() requests span more than any slot in the iomem resource tree and do a WARN_ON() if we hit this check. This will raise a red-flag, if some driver is mapping more than what is needed. And hopefully identify possible corruptions much earlier. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-09-04IO resources: fix/remove printkIngo Molnar
Andrew Morton noticed that the printk in kernel/resource.c was buggy: | start and end have type resource_size_t. Such types CANNOT be printed | unless cast to a known type. | | Because there is a %s following an incorrect %lld, the above code will | crash the machine. ... and it's probably quite unneeded as well, so remove it. Reported-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-09-04IO resources: add reserve_region_with_split()Yinghai Lu
add reserve_region_with_split() to not lose e820 reserved entries if they overlap with existing IO regions: with test case by extend 0xe0000000 - 0xeffffff to 0xdd800000 - we get: e0000000-efffffff : PCI MMCONFIG 0 e0000000-efffffff : reserved and in /proc/iomem we get: found conflict for reserved [dd800000, efffffff], try to reserve with split __reserve_region_with_split: (PCI Bus #80) [dd000000, ddffffff], res: (reserved) [dd800000, efffffff] __reserve_region_with_split: (PCI Bus #00) [de000000, dfffffff], res: (reserved) [de000000, efffffff] initcall pci_subsys_init+0x0/0x121 returned 0 after 381 msecs in dmesg various fixes and improvements suggested by Linus. Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>