Diffstat (limited to 'hwloc-1.2.1/README')
-rw-r--r-- | hwloc-1.2.1/README | 688 |
1 files changed, 688 insertions, 0 deletions
diff --git a/hwloc-1.2.1/README b/hwloc-1.2.1/README new file mode 100644 index 00000000..7eba416f --- /dev/null +++ b/hwloc-1.2.1/README @@ -0,0 +1,688 @@ +Introduction + +hwloc provides command line tools and a C API to obtain the hierarchical map of +key computing elements, such as: NUMA memory nodes, shared caches, processor +sockets, processor cores, and processing units (logical processors or +"threads"). hwloc also gathers various attributes such as cache and memory +information, and is portable across a variety of different operating systems +and platforms. + +hwloc primarily aims at helping high-performance computing (HPC) applications, +but is also applicable to any project seeking to exploit code and/or data +locality on modern computing platforms. + +Note that the hwloc project represents the merger of the libtopology project +from INRIA and the Portable Linux Processor Affinity (PLPA) sub-project from +Open MPI. Both of these prior projects are now deprecated. The first hwloc +release was essentially a "re-branding" of the libtopology code base, but with +both a few genuinely new features and a few PLPA-like features added in. Prior +releases of hwloc included documentation about switching from PLPA to hwloc; +this documentation has been dropped on the assumption that everyone who was +using PLPA has already switched to hwloc. + +hwloc supports the following operating systems: + + * Linux (including old kernels not having sysfs topology information, with + knowledge of cpusets, offline CPUs, ScaleMP vSMP, and Kerrighed support) + * Solaris + * AIX + * Darwin / OS X + * FreeBSD and its variants, such as kFreeBSD/GNU + * OSF/1 (a.k.a., Tru64) + * HP-UX + * Microsoft Windows + +hwloc only reports the number of processors on unsupported operating systems; +no topology information is available. 
+ +For development and debugging purposes, hwloc also offers the ability to work +on "fake" topologies: + + * Symmetrical tree of resources generated from a list of level arities + * Remote machine simulation through the gathering of Linux sysfs topology + files + +hwloc can display the topology in a human-readable format, either in graphical +mode (X11), or by exporting in one of several different formats, including: +plain text, PDF, PNG, and FIG (see CLI Examples below). Note that some of the +export formats require additional support libraries. + +hwloc offers a programming interface for manipulating topologies and objects. +It also brings a powerful CPU bitmap API that is used to describe the location +of topology objects on physical/logical processors. See the Programming +Interface below. It may also be used to bind applications onto certain cores +or memory nodes. Several utility programs are also provided to ease +command-line manipulation of topology objects, binding of processes, and so on. + +Installation + +hwloc (http://www.open-mpi.org/projects/hwloc/) is available under the BSD +license. It is hosted as a sub-project of the overall Open MPI project (http:// +www.open-mpi.org/). Note that hwloc does not require any functionality from +Open MPI -- it is a wholly separate (and much smaller!) project and code base. +It just happens to be hosted as part of the overall Open MPI project. + +Nightly development snapshots are available on the web site. Additionally, the +code can be directly checked out of Subversion: + +shell$ svn checkout http://svn.open-mpi.org/svn/hwloc/trunk hwloc-trunk +shell$ cd hwloc-trunk +shell$ ./autogen.sh + +Note that GNU Autoconf >=2.63, Automake >=1.10 and Libtool >=2.2.6 are required +when building from a Subversion checkout. + +Installation by itself is the fairly common GNU-based process: + +shell$ ./configure --prefix=... 
+shell$ make +shell$ make install + +The hwloc command-line tool "lstopo" produces human-readable topology maps, as +mentioned above. It can also export maps to the "fig" file format. Support for +PDF, PostScript, and PNG exporting is provided if the "Cairo" development +package can be found when hwloc is configured and built. Similarly, lstopo's +XML support requires the libxml2 development package. + +CLI Examples + +On a 4-socket 2-core machine with hyperthreading, the lstopo tool may show the +following graphical output: + +dudley.png + +Here's the equivalent output in textual form: + +Machine (16GB) + Socket L#0 + L3 L#0 (4096KB) + L2 L#0 (1024KB) + L1 L#0 (16KB) + Core L#0 + PU L#0 (P#0) + PU L#1 (P#8) + L2 L#1 (1024KB) + L1 L#1 (16KB) + Core L#1 + PU L#2 (P#4) + PU L#3 (P#12) + Socket L#1 + L3 L#1 (4096KB) + L2 L#2 (1024KB) + L1 L#2 (16KB) + Core L#2 + PU L#4 (P#1) + PU L#5 (P#9) + L2 L#3 (1024KB) + L1 L#3 (16KB) + Core L#3 + PU L#6 (P#5) + PU L#7 (P#13) + Socket L#2 + L3 L#2 (4096KB) + L2 L#4 (1024KB) + L1 L#4 (16KB) + Core L#4 + PU L#8 (P#2) + PU L#9 (P#10) + L2 L#5 (1024KB) + L1 L#5 (16KB) + Core L#5 + PU L#10 (P#6) + PU L#11 (P#14) + Socket L#3 + L3 L#3 (4096KB) + L2 L#6 (1024KB) + L1 L#6 (16KB) + Core L#6 + PU L#12 (P#3) + PU L#13 (P#11) + L2 L#7 (1024KB) + L1 L#7 (16KB) + Core L#7 + PU L#14 (P#7) + PU L#15 (P#15) + +Finally, here's the equivalent output in XML. 
Long lines were artificially +broken for document clarity (in the real output, each XML tag is on a single +line), and only socket #0 is shown for brevity: + +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE topology SYSTEM "hwloc.dtd"> +<topology> + <object type="Machine" os_level="-1" os_index="0" cpuset="0x0000ffff" + complete_cpuset="0x0000ffff" online_cpuset="0x0000ffff" + allowed_cpuset="0x0000ffff" + dmi_board_vendor="Dell Computer Corporation" dmi_board_name="0RD318" + local_memory="16648183808"> + <page_type size="4096" count="4064498"/> + <page_type size="2097152" count="0"/> + <object type="Socket" os_level="-1" os_index="0" cpuset="0x00001111" + complete_cpuset="0x00001111" online_cpuset="0x00001111" + allowed_cpuset="0x00001111"> + <object type="Cache" os_level="-1" cpuset="0x00001111" + complete_cpuset="0x00001111" online_cpuset="0x00001111" + allowed_cpuset="0x00001111" cache_size="4194304" depth="3" + cache_linesize="64"> + <object type="Cache" os_level="-1" cpuset="0x00000101" + complete_cpuset="0x00000101" online_cpuset="0x00000101" + allowed_cpuset="0x00000101" cache_size="1048576" depth="2" + cache_linesize="64"> + <object type="Cache" os_level="-1" cpuset="0x00000101" + complete_cpuset="0x00000101" online_cpuset="0x00000101" + allowed_cpuset="0x00000101" cache_size="16384" depth="1" + cache_linesize="64"> + <object type="Core" os_level="-1" os_index="0" cpuset="0x00000101" + complete_cpuset="0x00000101" online_cpuset="0x00000101" + allowed_cpuset="0x00000101"> + <object type="PU" os_level="-1" os_index="0" cpuset="0x00000001" + complete_cpuset="0x00000001" online_cpuset="0x00000001" + allowed_cpuset="0x00000001"/> + <object type="PU" os_level="-1" os_index="8" cpuset="0x00000100" + complete_cpuset="0x00000100" online_cpuset="0x00000100" + allowed_cpuset="0x00000100"/> + </object> + </object> + </object> + <object type="Cache" os_level="-1" cpuset="0x00001010" + complete_cpuset="0x00001010" online_cpuset="0x00001010" + 
allowed_cpuset="0x00001010" cache_size="1048576" depth="2" + cache_linesize="64"> + <object type="Cache" os_level="-1" cpuset="0x00001010" + complete_cpuset="0x00001010" online_cpuset="0x00001010" + allowed_cpuset="0x00001010" cache_size="16384" depth="1" + cache_linesize="64"> + <object type="Core" os_level="-1" os_index="1" cpuset="0x00001010" + complete_cpuset="0x00001010" online_cpuset="0x00001010" + allowed_cpuset="0x00001010"> + <object type="PU" os_level="-1" os_index="4" cpuset="0x00000010" + complete_cpuset="0x00000010" online_cpuset="0x00000010" + allowed_cpuset="0x00000010"/> + <object type="PU" os_level="-1" os_index="12" cpuset="0x00001000" + complete_cpuset="0x00001000" online_cpuset="0x00001000" + allowed_cpuset="0x00001000"/> + </object> + </object> + </object> + </object> + </object> + <!-- ...other sockets listed here ... --> + </object> +</topology> + +On a 4-socket 2-core Opteron NUMA machine, the lstopo tool may show the +following graphical output: + +hagrid.png + +Here's the equivalent output in textual form: + +Machine (32GB) + NUMANode L#0 (P#0 8190MB) + Socket L#0 + L2 L#0 (1024KB) + L1 L#0 (64KB) + Core L#0 + PU L#0 (P#0) + L2 L#1 (1024KB) + L1 L#1 (64KB) + Core L#1 + PU L#1 (P#1) + NUMANode L#1 (P#1 8192MB) + Socket L#1 + L2 L#2 (1024KB) + L1 L#2 (64KB) + Core L#2 + PU L#2 (P#2) + L2 L#3 (1024KB) + L1 L#3 (64KB) + Core L#3 + PU L#3 (P#3) + NUMANode L#2 (P#2 8192MB) + Socket L#2 + L2 L#4 (1024KB) + L1 L#4 (64KB) + Core L#4 + PU L#4 (P#4) + L2 L#5 (1024KB) + L1 L#5 (64KB) + Core L#5 + PU L#5 (P#5) + NUMANode L#3 (P#3 8192MB) + Socket L#3 + L2 L#6 (1024KB) + L1 L#6 (64KB) + Core L#6 + PU L#6 (P#6) + L2 L#7 (1024KB) + L1 L#7 (64KB) + Core L#7 + PU L#7 (P#7) + +And here's the equivalent output in XML. 
Similar to above, line breaks were +added and only PU #0 is shown for brevity: + +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE topology SYSTEM "hwloc.dtd"> +<topology> + <object type="Machine" os_level="-1" os_index="0" cpuset="0x000000ff" + complete_cpuset="0x000000ff" online_cpuset="0x000000ff" + allowed_cpuset="0x000000ff" nodeset="0x000000ff" + complete_nodeset="0x000000ff" allowed_nodeset="0x000000ff" + dmi_board_vendor="TYAN Computer Corp" dmi_board_name="S4881 "> + <page_type size="4096" count="0"/> + <page_type size="2097152" count="0"/> + <object type="NUMANode" os_level="-1" os_index="0" cpuset="0x00000003" + complete_cpuset="0x00000003" online_cpuset="0x00000003" + allowed_cpuset="0x00000003" nodeset="0x00000001" + complete_nodeset="0x00000001" allowed_nodeset="0x00000001" + local_memory="7514177536"> + <page_type size="4096" count="1834516"/> + <page_type size="2097152" count="0"/> + <object type="Socket" os_level="-1" os_index="0" cpuset="0x00000003" + complete_cpuset="0x00000003" online_cpuset="0x00000003" + allowed_cpuset="0x00000003" nodeset="0x00000001" + complete_nodeset="0x00000001" allowed_nodeset="0x00000001"> + <object type="Cache" os_level="-1" cpuset="0x00000001" + complete_cpuset="0x00000001" online_cpuset="0x00000001" + allowed_cpuset="0x00000001" nodeset="0x00000001" + complete_nodeset="0x00000001" allowed_nodeset="0x00000001" + cache_size="1048576" depth="2" cache_linesize="64"> + <object type="Cache" os_level="-1" cpuset="0x00000001" + complete_cpuset="0x00000001" online_cpuset="0x00000001" + allowed_cpuset="0x00000001" nodeset="0x00000001" + complete_nodeset="0x00000001" allowed_nodeset="0x00000001" + cache_size="65536" depth="1" cache_linesize="64"> + <object type="Core" os_level="-1" os_index="0" + cpuset="0x00000001" complete_cpuset="0x00000001" + online_cpuset="0x00000001" allowed_cpuset="0x00000001" + nodeset="0x00000001" complete_nodeset="0x00000001" + allowed_nodeset="0x00000001"> + <object type="PU" os_level="-1" 
os_index="0" cpuset="0x00000001" + complete_cpuset="0x00000001" online_cpuset="0x00000001" + allowed_cpuset="0x00000001" nodeset="0x00000001" + complete_nodeset="0x00000001" allowed_nodeset="0x00000001"/> + </object> + </object> + </object> + <!-- ...more objects listed here ... --> +</topology> + +On a 2-socket quad-core Xeon (pre-Nehalem, with 2 dual-core dies into each +socket): + +emmett.png + +Here's the same output in textual form: + +Machine (16GB) + Socket L#0 + L2 L#0 (4096KB) + L1 L#0 (32KB) + Core L#0 + PU L#0 (P#0) + L1 L#1 (32KB) + Core L#1 + PU L#1 (P#4) + L2 L#1 (4096KB) + L1 L#2 (32KB) + Core L#2 + PU L#2 (P#2) + L1 L#3 (32KB) + Core L#3 + PU L#3 (P#6) + Socket L#1 + L2 L#2 (4096KB) + L1 L#4 (32KB) + Core L#4 + PU L#4 (P#1) + L1 L#5 (32KB) + Core L#5 + PU L#5 (P#5) + L2 L#3 (4096KB) + L1 L#6 (32KB) + Core L#6 + PU L#6 (P#3) + L1 L#7 (32KB) + Core L#7 + PU L#7 (P#7) + +And the same output in XML (line breaks added, only PU #0 shown): + +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE topology SYSTEM "hwloc.dtd"> +<topology> + <object type="Machine" os_level="-1" os_index="0" cpuset="0x000000ff" + complete_cpuset="0x000000ff" online_cpuset="0x000000ff" + allowed_cpuset="0x000000ff" dmi_board_vendor="Dell Inc." 
+ dmi_board_name="0NR282" local_memory="16865292288"> + <page_type size="4096" count="4117503"/> + <page_type size="2097152" count="0"/> + <object type="Socket" os_level="-1" os_index="0" cpuset="0x00000055" + complete_cpuset="0x00000055" online_cpuset="0x00000055" + allowed_cpuset="0x00000055"> + <object type="Cache" os_level="-1" cpuset="0x00000011" + complete_cpuset="0x00000011" online_cpuset="0x00000011" + allowed_cpuset="0x00000011" cache_size="4194304" depth="2" + cache_linesize="64"> + <object type="Cache" os_level="-1" cpuset="0x00000001" + complete_cpuset="0x00000001" online_cpuset="0x00000001" + allowed_cpuset="0x00000001" cache_size="32768" depth="1" + cache_linesize="64"> + <object type="Core" os_level="-1" os_index="0" cpuset="0x00000001" + complete_cpuset="0x00000001" online_cpuset="0x00000001" + allowed_cpuset="0x00000001"> + <object type="PU" os_level="-1" os_index="0" cpuset="0x00000001" + complete_cpuset="0x00000001" online_cpuset="0x00000001" + allowed_cpuset="0x00000001"/> + </object> + </object> + <object type="Cache" os_level="-1" cpuset="0x00000010" + complete_cpuset="0x00000010" online_cpuset="0x00000010" + allowed_cpuset="0x00000010" cache_size="32768" depth="1" + cache_linesize="64"> + <object type="Core" os_level="-1" os_index="1" cpuset="0x00000010" + complete_cpuset="0x00000010" online_cpuset="0x00000010" + allowed_cpuset="0x00000010"> + <object type="PU" os_level="-1" os_index="4" cpuset="0x00000010" + complete_cpuset="0x00000010" online_cpuset="0x00000010" + allowed_cpuset="0x00000010"/> + </object> + </object> + </object> + <!-- ...more objects listed here ... --> +</topology> + +Programming Interface + +The basic interface is available in hwloc.h. It essentially offers low-level +routines for advanced programmers that want to manually manipulate objects and +follow links between them. Documentation for everything in hwloc.h is provided +later in this document. 
Developers should also look at hwloc/helper.h (and also +in this document), which provides good higher-level topology traversal examples. + +To precisely define the vocabulary used by hwloc, a Terms and Definitions +section is available and should probably be read first. + +Each hwloc object contains a cpuset describing the list of processing units +that it contains. These bitmaps may be used for CPU binding and Memory binding. +hwloc offers an extensive bitmap manipulation interface in hwloc/bitmap.h. + +Moreover, hwloc also comes with additional helpers for interoperability with +several commonly used environments. See the Interoperability With Other +Software section for details. + +The complete API documentation is available in a full set of HTML pages, man +pages, and self-contained PDF files (formatted for both US letter and A4 +formats) in the source tarball in doc/doxygen-doc/. + +NOTE: If you are building the documentation from a Subversion checkout, you +will need to have Doxygen and pdflatex installed -- the documentation will be +built during the normal "make" process. The documentation is installed during +"make install" to $prefix/share/doc/hwloc/ and your system's default man page +tree (under $prefix, of course). + +Portability + +As shown in CLI Examples, hwloc can obtain information on a wide variety of +hardware topologies. However, some platforms and/or operating system versions +will only report a subset of this information. For example, on a PPC64-based +system with 32 cores (each with 2 hardware threads) running a default +2.6.18-based kernel from RHEL 5.4, hwloc is only able to glean information +about NUMA nodes and processor units (PUs). No information about caches, +sockets, or cores is available. + +Similarly, Operating Systems have varying support for CPU and memory binding, +e.g. 
while some Operating Systems provide interfaces for all kinds of CPU and +memory bindings, some others provide only interfaces for a limited number of +kinds of CPU and memory binding, and some do not provide any binding interface +at all. hwloc's binding functions would then simply return the ENOSYS error +(Function not implemented), meaning that the underlying Operating System does +not provide any interface for them. CPU binding and Memory binding provide more +information on which hwloc binding functions should be preferred because +interfaces for them are usually available on the supported Operating Systems. + +Here's the graphical output from lstopo on this platform when Simultaneous +Multi-Threading (SMT) is enabled: + +ppc64-with-smt.png + +And here's the graphical output from lstopo on this platform when SMT is +disabled: + +ppc64-without-smt.png + +Notice that hwloc only sees half the PUs when SMT is disabled. PU #15, for +example, seems to change location from NUMA node #0 to #1. In reality, no PUs +"moved" -- they were simply re-numbered when hwloc only saw half as many. +Hence, PU #15 in the SMT-disabled picture probably corresponds to PU #30 in the +SMT-enabled picture. + +This same "PUs have disappeared" effect can be seen on other platforms -- even +platforms / OSs that provide much more information than the above PPC64 system. +This is an unfortunate side-effect of how operating systems report information +to hwloc. + +Note that after upgrading the Linux kernel on the same PPC64 system mentioned +above to 2.6.34, hwloc is able to discover all the topology information. The +following picture shows the entire topology layout when SMT is enabled: + +ppc64-full-with-smt.png + +Developers using the hwloc API or XML output for portable applications should +therefore be extremely careful to not make any assumptions about the structure +of data that is returned. 
For example, per the above reported PPC topology, it +is not safe to assume that PUs will always be descendants of cores. + +Additionally, future hardware may insert new topology elements that are not +available in this version of hwloc. Long-lived applications that are meant to +span multiple different hardware platforms should also be careful about making +structure assumptions. For example, there may someday be an element "lower" +than a PU, or perhaps a new element may exist between a core and a PU. + +API Example + +The following small C example (named ``hwloc-hello.c'') prints the topology of +the machine and binds the process to a single logical processor of the last +core of the machine. + +/* Example hwloc API program. + * + * Copyright © 2009-2010 INRIA. All rights reserved. + * Copyright © 2009-2011 Université Bordeaux 1 + * Copyright © 2009-2010 Cisco Systems, Inc. All rights reserved. + * See COPYING in top-level directory. + * + * hwloc-hello.c + */ + +#include <hwloc.h> +#include <errno.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> + +static void print_children(hwloc_topology_t topology, hwloc_obj_t obj, + int depth) +{ + char string[128]; + unsigned i; + + hwloc_obj_snprintf(string, sizeof(string), topology, obj, "#", 0); + printf("%*s%s\n", 2*depth, "", string); + for (i = 0; i < obj->arity; i++) { + print_children(topology, obj->children[i], depth + 1); + } +} + +int main(void) +{ + int depth; + unsigned i, n; + unsigned long size; + int levels; + char string[128]; + int topodepth; + hwloc_topology_t topology; + hwloc_cpuset_t cpuset; + hwloc_obj_t obj; + + /* Allocate and initialize topology object. */ + hwloc_topology_init(&topology); + + /* ... Optionally, put detection configuration here to ignore + some object types, define a synthetic topology, etc.... + + The default is to detect all the objects of the machine that + the caller is allowed to access. See Configure Topology + Detection. */ + + /* Perform the topology detection. 
*/ + hwloc_topology_load(topology); + + /* Optionally, get some additional topology information + in case we need the topology depth later. */ + topodepth = hwloc_topology_get_depth(topology); + + /***************************************************************** + * First example: + * Walk the topology with an array style, from level 0 (always + * the system level) to the lowest level (always the proc level). + *****************************************************************/ + for (depth = 0; depth < topodepth; depth++) { + printf("*** Objects at level %d\n", depth); + for (i = 0; i < hwloc_get_nbobjs_by_depth(topology, depth); + i++) { + hwloc_obj_snprintf(string, sizeof(string), topology, + hwloc_get_obj_by_depth(topology, depth, i), + "#", 0); + printf("Index %u: %s\n", i, string); + } + } + + /***************************************************************** + * Second example: + * Walk the topology with a tree style. + *****************************************************************/ + printf("*** Printing overall tree\n"); + print_children(topology, hwloc_get_root_obj(topology), 0); + + /***************************************************************** + * Third example: + * Print the number of sockets. + *****************************************************************/ + depth = hwloc_get_type_depth(topology, HWLOC_OBJ_SOCKET); + if (depth == HWLOC_TYPE_DEPTH_UNKNOWN) { + printf("*** The number of sockets is unknown\n"); + } else { + printf("*** %u socket(s)\n", + hwloc_get_nbobjs_by_depth(topology, depth)); + } + + /***************************************************************** + * Fourth example: + * Compute the amount of cache that the first logical processor + * has above it. 
+ *****************************************************************/ + levels = 0; + size = 0; + for (obj = hwloc_get_obj_by_type(topology, HWLOC_OBJ_PU, 0); + obj; + obj = obj->parent) + if (obj->type == HWLOC_OBJ_CACHE) { + levels++; + size += obj->attr->cache.size; + } + printf("*** Logical processor 0 has %d caches totaling %luKB\n", + levels, size / 1024); + + /***************************************************************** + * Fifth example: + * Bind to only one thread of the last core of the machine. + * + * First find out where cores are, or else smaller sets of CPUs if + * the OS doesn't have the notion of a "core". + *****************************************************************/ + depth = hwloc_get_type_or_below_depth(topology, HWLOC_OBJ_CORE); + + /* Get last core. */ + obj = hwloc_get_obj_by_depth(topology, depth, + hwloc_get_nbobjs_by_depth(topology, depth) - 1); + if (obj) { + /* Get a copy of its cpuset that we may modify. */ + cpuset = hwloc_bitmap_dup(obj->cpuset); + + /* Get only one logical processor (in case the core is + SMT/hyperthreaded). */ + hwloc_bitmap_singlify(cpuset); + + /* And try to bind ourself there. */ + if (hwloc_set_cpubind(topology, cpuset, 0)) { + char *str; + int error = errno; + /* Report the singlified cpuset that the bind actually used. */ + hwloc_bitmap_asprintf(&str, cpuset); + printf("Couldn't bind to cpuset %s: %s\n", str, strerror(error)); + free(str); + } + + /* Free our cpuset copy */ + hwloc_bitmap_free(cpuset); + } + + /***************************************************************** + * Sixth example: + * Allocate some memory on the last NUMA node, bind some existing + * memory to the last NUMA node. + *****************************************************************/ + /* Get last node. 
*/ + n = hwloc_get_nbobjs_by_type(topology, HWLOC_OBJ_NODE); + if (n) { + void *m; + size = 1024*1024; + + obj = hwloc_get_obj_by_type(topology, HWLOC_OBJ_NODE, n - 1); + m = hwloc_alloc_membind_nodeset(topology, size, obj->nodeset, + HWLOC_MEMBIND_DEFAULT, 0); + hwloc_free(topology, m, size); + + m = malloc(size); + hwloc_set_area_membind_nodeset(topology, m, size, obj->nodeset, + HWLOC_MEMBIND_DEFAULT, 0); + free(m); + } + + /* Destroy topology object. */ + hwloc_topology_destroy(topology); + + return 0; +} + +hwloc provides pkg-config support to obtain the relevant compiler and linker +flags. For example, it can be used as follows to compile applications that use +the hwloc library (assuming GNU Make): + +CFLAGS += $(shell pkg-config --cflags hwloc) +LDLIBS += $(shell pkg-config --libs hwloc) +cc hwloc-hello.c $(CFLAGS) -o hwloc-hello $(LDLIBS) + +On a machine with 4GB of RAM and 2 processor sockets -- each socket of which +has two processing cores -- the output from running hwloc-hello could be +something like the following: + +shell$ ./hwloc-hello +*** Objects at level 0 +Index 0: Machine(3938MB) +*** Objects at level 1 +Index 0: Socket#0 +Index 1: Socket#1 +*** Objects at level 2 +Index 0: Core#0 +Index 1: Core#1 +Index 2: Core#3 +Index 3: Core#2 +*** Objects at level 3 +Index 0: PU#0 +Index 1: PU#1 +Index 2: PU#2 +Index 3: PU#3 +*** Printing overall tree +Machine(3938MB) + Socket#0 + Core#0 + PU#0 + Core#1 + PU#1 + Socket#1 + Core#3 + PU#2 + Core#2 + PU#3 +*** 2 socket(s) +shell$ + +Questions and Bugs + +Questions should be sent to the devel mailing list (http://www.open-mpi.org/ +community/lists/hwloc.php). Bug reports should be filed in the tracker ( +https://svn.open-mpi.org/trac/hwloc/). + +If hwloc discovers an incorrect topology for your machine, the very first thing +you should check is to ensure that you have the most recent updates installed +for your operating system. 
Indeed, most of hwloc's topology discovery relies on +hardware information retrieved through the operating system (e.g., via the /sys +virtual filesystem of the Linux kernel). If upgrading your OS or Linux kernel +does not solve your problem, you may also want to ensure that you are running +the most recent version of the BIOS for your machine. + +If those things fail, contact us on the mailing list for additional help. +Please attach the output of lstopo after having passed the --enable-debug option +to ./configure and rebuilt completely, to get debugging output. + +History / Credits + +hwloc is the evolution and merger of the libtopology (http:// +runtime.bordeaux.inria.fr/libtopology/) project and the Portable Linux +Processor Affinity (PLPA) (http://www.open-mpi.org/projects/plpa/) project. +Because of functional and ideological overlap, these two code bases and ideas +were merged and released under the name "hwloc" as an Open MPI sub-project. + +libtopology was initially developed by the INRIA Runtime Team-Project (http:// +runtime.bordeaux.inria.fr/) (headed by Raymond Namyst (http:// +dept-info.labri.fr/~namyst/)). PLPA was initially developed by the Open MPI +development team as a sub-project. Both are now deprecated in favor of hwloc, +which is distributed as an Open MPI sub-project. + +Further Reading + +The documentation chapters include + + * Terms and Definitions + * Command-Line Tools + * Environment Variables + * CPU and Memory Binding Overview + * Interoperability With Other Software + * Thread Safety + * Embedding hwloc in Other Software + * Frequently Asked Questions + +Make sure to have had a look at those too! + +------------------------------------------------------------------------------- + +Generated on Tue Aug 16 2011 19:37:04 for Hardware Locality (hwloc) by doxygen +1.7.4
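Reading note on the XML listings in CLI Examples: each cpuset attribute is a hexadecimal bitmask in which bit N stands for the PU whose OS index is N. The following is a minimal sketch of that decoding in plain C, with no hwloc calls. The function name cpuset_mask_to_indexes is hypothetical and not part of hwloc, and the sketch assumes the mask fits in 64 bits; real programs should use the bitmap interface in hwloc/bitmap.h, which has no such limit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper, NOT part of hwloc: decode a cpuset string such as
 * "0x00001111" into the OS indexes of the PUs it contains.
 *
 * At most max_out indexes are written to out, in ascending order.
 * The return value is the total number of bits set in the mask.
 * Assumption: the mask fits in an unsigned long long (64 bits). */
size_t cpuset_mask_to_indexes(const char *mask, unsigned *out, size_t max_out)
{
    unsigned long long bits;
    size_t count = 0;
    unsigned idx;

    assert(mask != NULL);
    assert(out != NULL || max_out == 0);

    /* Base 16 makes strtoull accept an optional "0x" prefix. */
    bits = strtoull(mask, NULL, 16);

    /* Walk the mask from bit 0 upwards; stop once no set bits remain. */
    for (idx = 0; bits != 0; idx++, bits >>= 1) {
        if (bits & 1ULL) {
            if (count < max_out)
                out[count] = idx;
            count++;
        }
    }
    return count;
}
```

For the first example machine, the Socket #0 value "0x00001111" decodes to OS indexes 0, 4, 8 and 12, which matches PUs P#0, P#4, P#8 and P#12 under Socket L#0 in the textual output.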