Diffstat (limited to 'hwloc-1.2.1/README')
-rw-r--r-- hwloc-1.2.1/README | 688
1 files changed, 688 insertions, 0 deletions
diff --git a/hwloc-1.2.1/README b/hwloc-1.2.1/README
new file mode 100644
index 00000000..7eba416f
--- /dev/null
+++ b/hwloc-1.2.1/README
@@ -0,0 +1,688 @@
+Introduction
+
+hwloc provides command line tools and a C API to obtain the hierarchical map of
+key computing elements, such as: NUMA memory nodes, shared caches, processor
+sockets, processor cores, and processing units (logical processors or
+"threads"). hwloc also gathers various attributes such as cache and memory
+information, and is portable across a variety of different operating systems
+and platforms.
+
+hwloc primarily aims at helping high-performance computing (HPC) applications,
+but is also applicable to any project seeking to exploit code and/or data
+locality on modern computing platforms.
+
+Note that the hwloc project represents the merger of the libtopology project
+from INRIA and the Portable Linux Processor Affinity (PLPA) sub-project from
+Open MPI. Both of these prior projects are now deprecated. The first hwloc
+release was essentially a "re-branding" of the libtopology code base, but with
+both a few genuinely new features and a few PLPA-like features added in. Prior
+releases of hwloc included documentation about switching from PLPA to hwloc;
+this documentation has been dropped on the assumption that everyone who was
+using PLPA has already switched to hwloc.
+
+hwloc supports the following operating systems:
+
+ * Linux (including old kernels not having sysfs topology information, with
+ knowledge of cpusets, offline CPUs, ScaleMP vSMP, and Kerrighed support)
+ * Solaris
+ * AIX
+ * Darwin / OS X
+ * FreeBSD and its variants, such as kFreeBSD/GNU
+ * OSF/1 (a.k.a., Tru64)
+ * HP-UX
+ * Microsoft Windows
+
+hwloc only reports the number of processors on unsupported operating systems;
+no topology information is available.
+
+For development and debugging purposes, hwloc also offers the ability to work
+on "fake" topologies:
+
+ * Symmetrical tree of resources generated from a list of level arities
+ * Remote machine simulation through the gathering of Linux sysfs topology
+ files
+
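+As a rough illustration of the first kind of fake topology, the number of
+objects at each depth of a symmetric tree follows directly from the list of
+level arities. The sketch below is plain C and does not use the hwloc API;
+the function name level_counts is hypothetical.
+
```c
/* Fill counts[0..n] with the number of objects found at each depth of a
 * symmetric tree whose level arities are arity[0..n-1].
 * counts[0] is the single root object, so counts must hold n + 1 entries. */
static void level_counts(const unsigned *arity, unsigned n,
                         unsigned long *counts)
{
    unsigned i;

    counts[0] = 1;
    for (i = 0; i < n; i++) {
        /* Every object at depth i has arity[i] children. */
        counts[i + 1] = counts[i] * arity[i];
    }
}
```
+
+For instance, arities {4, 2, 2} give 1, 4, 8, and 16 objects per depth, which
+matches the socket, core, and PU counts of the 4-socket, 2-core,
+hyperthreaded machine shown in CLI Examples.
+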
+hwloc can display the topology in a human-readable format, either in graphical
+mode (X11), or by exporting in one of several different formats, including:
+plain text, PDF, PNG, and FIG (see CLI Examples below). Note that some of the
+export formats require additional support libraries.
+
+hwloc offers a programming interface for manipulating topologies and objects.
+It also brings a powerful CPU bitmap API that is used to describe the location
+of topology objects on physical/logical processors (see Programming Interface
+below). The same API may be used to bind applications onto certain cores or
+memory nodes. Several utility programs are also provided to ease command-line
+manipulation of topology objects, binding of processes, and so on.
+
+Installation
+
+hwloc (http://www.open-mpi.org/projects/hwloc/) is available under the BSD
+license. It is hosted as a sub-project of the overall Open MPI project
+(http://www.open-mpi.org/). Note that hwloc does not require any functionality
+from Open MPI -- it is a wholly separate (and much smaller!) project and code
+base. It just happens to be hosted as part of the overall Open MPI project.
+
+Nightly development snapshots are available on the web site. Additionally, the
+code can be directly checked out of Subversion:
+
+shell$ svn checkout http://svn.open-mpi.org/svn/hwloc/trunk hwloc-trunk
+shell$ cd hwloc-trunk
+shell$ ./autogen.sh
+
+Note that GNU Autoconf >=2.63, Automake >=1.10 and Libtool >=2.2.6 are required
+when building from a Subversion checkout.
+
+Installation by itself is the fairly common GNU-based process:
+
+shell$ ./configure --prefix=...
+shell$ make
+shell$ make install
+
+The hwloc command-line tool "lstopo" produces human-readable topology maps, as
+mentioned above. It can also export maps to the "fig" file format. Support for
+PDF, PostScript, and PNG exporting is provided if the "Cairo" development
+package can be found when hwloc is configured and built. Similarly, lstopo's
+XML support requires the libxml2 development package.
+
+CLI Examples
+
+On a 4-socket 2-core machine with hyperthreading, the lstopo tool may show the
+following graphical output:
+
+dudley.png
+
+Here's the equivalent output in textual form:
+
+Machine (16GB)
+ Socket L#0 + L3 L#0 (4096KB)
+ L2 L#0 (1024KB) + L1 L#0 (16KB) + Core L#0
+ PU L#0 (P#0)
+ PU L#1 (P#8)
+ L2 L#1 (1024KB) + L1 L#1 (16KB) + Core L#1
+ PU L#2 (P#4)
+ PU L#3 (P#12)
+ Socket L#1 + L3 L#1 (4096KB)
+ L2 L#2 (1024KB) + L1 L#2 (16KB) + Core L#2
+ PU L#4 (P#1)
+ PU L#5 (P#9)
+ L2 L#3 (1024KB) + L1 L#3 (16KB) + Core L#3
+ PU L#6 (P#5)
+ PU L#7 (P#13)
+ Socket L#2 + L3 L#2 (4096KB)
+ L2 L#4 (1024KB) + L1 L#4 (16KB) + Core L#4
+ PU L#8 (P#2)
+ PU L#9 (P#10)
+ L2 L#5 (1024KB) + L1 L#5 (16KB) + Core L#5
+ PU L#10 (P#6)
+ PU L#11 (P#14)
+ Socket L#3 + L3 L#3 (4096KB)
+ L2 L#6 (1024KB) + L1 L#6 (16KB) + Core L#6
+ PU L#12 (P#3)
+ PU L#13 (P#11)
+ L2 L#7 (1024KB) + L1 L#7 (16KB) + Core L#7
+ PU L#14 (P#7)
+ PU L#15 (P#15)
+
+Finally, here's the equivalent output in XML. Long lines were artificially
+broken for document clarity (in the real output, each XML tag is on a single
+line), and only socket #0 is shown for brevity:
+
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE topology SYSTEM "hwloc.dtd">
+<topology>
+ <object type="Machine" os_level="-1" os_index="0" cpuset="0x0000ffff"
+ complete_cpuset="0x0000ffff" online_cpuset="0x0000ffff"
+ allowed_cpuset="0x0000ffff"
+ dmi_board_vendor="Dell Computer Corporation" dmi_board_name="0RD318"
+ local_memory="16648183808">
+ <page_type size="4096" count="4064498"/>
+ <page_type size="2097152" count="0"/>
+ <object type="Socket" os_level="-1" os_index="0" cpuset="0x00001111"
+ complete_cpuset="0x00001111" online_cpuset="0x00001111"
+ allowed_cpuset="0x00001111">
+ <object type="Cache" os_level="-1" cpuset="0x00001111"
+ complete_cpuset="0x00001111" online_cpuset="0x00001111"
+ allowed_cpuset="0x00001111" cache_size="4194304" depth="3"
+ cache_linesize="64">
+ <object type="Cache" os_level="-1" cpuset="0x00000101"
+ complete_cpuset="0x00000101" online_cpuset="0x00000101"
+ allowed_cpuset="0x00000101" cache_size="1048576" depth="2"
+ cache_linesize="64">
+ <object type="Cache" os_level="-1" cpuset="0x00000101"
+ complete_cpuset="0x00000101" online_cpuset="0x00000101"
+ allowed_cpuset="0x00000101" cache_size="16384" depth="1"
+ cache_linesize="64">
+ <object type="Core" os_level="-1" os_index="0" cpuset="0x00000101"
+ complete_cpuset="0x00000101" online_cpuset="0x00000101"
+ allowed_cpuset="0x00000101">
+ <object type="PU" os_level="-1" os_index="0" cpuset="0x00000001"
+ complete_cpuset="0x00000001" online_cpuset="0x00000001"
+ allowed_cpuset="0x00000001"/>
+ <object type="PU" os_level="-1" os_index="8" cpuset="0x00000100"
+ complete_cpuset="0x00000100" online_cpuset="0x00000100"
+ allowed_cpuset="0x00000100"/>
+ </object>
+ </object>
+ </object>
+ <object type="Cache" os_level="-1" cpuset="0x00001010"
+ complete_cpuset="0x00001010" online_cpuset="0x00001010"
+ allowed_cpuset="0x00001010" cache_size="1048576" depth="2"
+ cache_linesize="64">
+ <object type="Cache" os_level="-1" cpuset="0x00001010"
+ complete_cpuset="0x00001010" online_cpuset="0x00001010"
+ allowed_cpuset="0x00001010" cache_size="16384" depth="1"
+ cache_linesize="64">
+ <object type="Core" os_level="-1" os_index="1" cpuset="0x00001010"
+ complete_cpuset="0x00001010" online_cpuset="0x00001010"
+ allowed_cpuset="0x00001010">
+ <object type="PU" os_level="-1" os_index="4" cpuset="0x00000010"
+ complete_cpuset="0x00000010" online_cpuset="0x00000010"
+ allowed_cpuset="0x00000010"/>
+ <object type="PU" os_level="-1" os_index="12" cpuset="0x00001000"
+ complete_cpuset="0x00001000" online_cpuset="0x00001000"
+ allowed_cpuset="0x00001000"/>
+ </object>
+ </object>
+ </object>
+ </object>
+ </object>
+ <!-- ...other sockets listed here ... -->
+ </object>
+</topology>
+
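+The cpuset attributes above are hexadecimal bitmasks in which bit N stands for
+the PU with OS index N (for example, the PU with os_index="8" has cpuset
+0x00000100). A minimal sketch in plain C that lists the indexes set in such a
+mask follows. It is not the hwloc bitmap API: mask_to_indexes is a
+hypothetical helper, and the mask is assumed to fit in an unsigned long.
+
```c
#include <stddef.h>

/* Store the indexes of the bits set in mask into out[], in ascending order,
 * writing at most max entries. Return the total number of set bits, which
 * may exceed max if out[] was too small. */
static size_t mask_to_indexes(unsigned long mask, unsigned *out, size_t max)
{
    size_t n = 0;
    unsigned bit;

    for (bit = 0; mask != 0; bit++, mask >>= 1) {
        if (mask & 1UL) {
            if (n < max)
                out[n] = bit;
            n++;
        }
    }
    return n;
}
```
+
+With the values above, 0x00001111 (socket #0) yields 0, 4, 8, and 12, which
+are the P# numbers of the four PUs listed under Socket L#0 in the textual
+output.
+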
+On a 4-socket 2-core Opteron NUMA machine, the lstopo tool may show the
+following graphical output:
+
+hagrid.png
+
+Here's the equivalent output in textual form:
+
+Machine (32GB)
+ NUMANode L#0 (P#0 8190MB) + Socket L#0
+ L2 L#0 (1024KB) + L1 L#0 (64KB) + Core L#0 + PU L#0 (P#0)
+ L2 L#1 (1024KB) + L1 L#1 (64KB) + Core L#1 + PU L#1 (P#1)
+ NUMANode L#1 (P#1 8192MB) + Socket L#1
+ L2 L#2 (1024KB) + L1 L#2 (64KB) + Core L#2 + PU L#2 (P#2)
+ L2 L#3 (1024KB) + L1 L#3 (64KB) + Core L#3 + PU L#3 (P#3)
+ NUMANode L#2 (P#2 8192MB) + Socket L#2
+ L2 L#4 (1024KB) + L1 L#4 (64KB) + Core L#4 + PU L#4 (P#4)
+ L2 L#5 (1024KB) + L1 L#5 (64KB) + Core L#5 + PU L#5 (P#5)
+ NUMANode L#3 (P#3 8192MB) + Socket L#3
+ L2 L#6 (1024KB) + L1 L#6 (64KB) + Core L#6 + PU L#6 (P#6)
+ L2 L#7 (1024KB) + L1 L#7 (64KB) + Core L#7 + PU L#7 (P#7)
+
+And here's the equivalent output in XML. Similar to above, line breaks were
+added and only PU #0 is shown for brevity:
+
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE topology SYSTEM "hwloc.dtd">
+<topology>
+ <object type="Machine" os_level="-1" os_index="0" cpuset="0x000000ff"
+ complete_cpuset="0x000000ff" online_cpuset="0x000000ff"
+ allowed_cpuset="0x000000ff" nodeset="0x000000ff"
+ complete_nodeset="0x000000ff" allowed_nodeset="0x000000ff"
+ dmi_board_vendor="TYAN Computer Corp" dmi_board_name="S4881 ">
+ <page_type size="4096" count="0"/>
+ <page_type size="2097152" count="0"/>
+ <object type="NUMANode" os_level="-1" os_index="0" cpuset="0x00000003"
+ complete_cpuset="0x00000003" online_cpuset="0x00000003"
+ allowed_cpuset="0x00000003" nodeset="0x00000001"
+ complete_nodeset="0x00000001" allowed_nodeset="0x00000001"
+ local_memory="7514177536">
+ <page_type size="4096" count="1834516"/>
+ <page_type size="2097152" count="0"/>
+ <object type="Socket" os_level="-1" os_index="0" cpuset="0x00000003"
+ complete_cpuset="0x00000003" online_cpuset="0x00000003"
+ allowed_cpuset="0x00000003" nodeset="0x00000001"
+ complete_nodeset="0x00000001" allowed_nodeset="0x00000001">
+ <object type="Cache" os_level="-1" cpuset="0x00000001"
+ complete_cpuset="0x00000001" online_cpuset="0x00000001"
+ allowed_cpuset="0x00000001" nodeset="0x00000001"
+ complete_nodeset="0x00000001" allowed_nodeset="0x00000001"
+ cache_size="1048576" depth="2" cache_linesize="64">
+ <object type="Cache" os_level="-1" cpuset="0x00000001"
+ complete_cpuset="0x00000001" online_cpuset="0x00000001"
+ allowed_cpuset="0x00000001" nodeset="0x00000001"
+ complete_nodeset="0x00000001" allowed_nodeset="0x00000001"
+ cache_size="65536" depth="1" cache_linesize="64">
+ <object type="Core" os_level="-1" os_index="0"
+ cpuset="0x00000001" complete_cpuset="0x00000001"
+ online_cpuset="0x00000001" allowed_cpuset="0x00000001"
+ nodeset="0x00000001" complete_nodeset="0x00000001"
+ allowed_nodeset="0x00000001">
+ <object type="PU" os_level="-1" os_index="0" cpuset="0x00000001"
+ complete_cpuset="0x00000001" online_cpuset="0x00000001"
+ allowed_cpuset="0x00000001" nodeset="0x00000001"
+ complete_nodeset="0x00000001" allowed_nodeset="0x00000001"/>
+ </object>
+ </object>
+ </object>
+ <!-- ...more objects listed here ... -->
+</topology>
+
+On a 2-socket quad-core Xeon (pre-Nehalem, with 2 dual-core dies in each
+socket):
+
+emmett.png
+
+Here's the same output in textual form:
+
+Machine (16GB)
+ Socket L#0
+ L2 L#0 (4096KB)
+ L1 L#0 (32KB) + Core L#0 + PU L#0 (P#0)
+ L1 L#1 (32KB) + Core L#1 + PU L#1 (P#4)
+ L2 L#1 (4096KB)
+ L1 L#2 (32KB) + Core L#2 + PU L#2 (P#2)
+ L1 L#3 (32KB) + Core L#3 + PU L#3 (P#6)
+ Socket L#1
+ L2 L#2 (4096KB)
+ L1 L#4 (32KB) + Core L#4 + PU L#4 (P#1)
+ L1 L#5 (32KB) + Core L#5 + PU L#5 (P#5)
+ L2 L#3 (4096KB)
+ L1 L#6 (32KB) + Core L#6 + PU L#6 (P#3)
+ L1 L#7 (32KB) + Core L#7 + PU L#7 (P#7)
+
+And the same output in XML (line breaks added, only PU #0 shown):
+
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE topology SYSTEM "hwloc.dtd">
+<topology>
+ <object type="Machine" os_level="-1" os_index="0" cpuset="0x000000ff"
+ complete_cpuset="0x000000ff" online_cpuset="0x000000ff"
+ allowed_cpuset="0x000000ff" dmi_board_vendor="Dell Inc."
+ dmi_board_name="0NR282" local_memory="16865292288">
+ <page_type size="4096" count="4117503"/>
+ <page_type size="2097152" count="0"/>
+ <object type="Socket" os_level="-1" os_index="0" cpuset="0x00000055"
+ complete_cpuset="0x00000055" online_cpuset="0x00000055"
+ allowed_cpuset="0x00000055">
+ <object type="Cache" os_level="-1" cpuset="0x00000011"
+ complete_cpuset="0x00000011" online_cpuset="0x00000011"
+ allowed_cpuset="0x00000011" cache_size="4194304" depth="2"
+ cache_linesize="64">
+ <object type="Cache" os_level="-1" cpuset="0x00000001"
+ complete_cpuset="0x00000001" online_cpuset="0x00000001"
+ allowed_cpuset="0x00000001" cache_size="32768" depth="1"
+ cache_linesize="64">
+ <object type="Core" os_level="-1" os_index="0" cpuset="0x00000001"
+ complete_cpuset="0x00000001" online_cpuset="0x00000001"
+ allowed_cpuset="0x00000001">
+ <object type="PU" os_level="-1" os_index="0" cpuset="0x00000001"
+ complete_cpuset="0x00000001" online_cpuset="0x00000001"
+ allowed_cpuset="0x00000001"/>
+ </object>
+ </object>
+ <object type="Cache" os_level="-1" cpuset="0x00000010"
+ complete_cpuset="0x00000010" online_cpuset="0x00000010"
+ allowed_cpuset="0x00000010" cache_size="32768" depth="1"
+ cache_linesize="64">
+ <object type="Core" os_level="-1" os_index="1" cpuset="0x00000010"
+ complete_cpuset="0x00000010" online_cpuset="0x00000010"
+ allowed_cpuset="0x00000010">
+ <object type="PU" os_level="-1" os_index="4" cpuset="0x00000010"
+ complete_cpuset="0x00000010" online_cpuset="0x00000010"
+ allowed_cpuset="0x00000010"/>
+ </object>
+ </object>
+ </object>
+ <!-- ...more objects listed here ... -->
+</topology>
+
+Programming Interface
+
+The basic interface is available in hwloc.h. It essentially offers low-level
+routines for advanced programmers that want to manually manipulate objects and
+follow links between them. Documentation for everything in hwloc.h is provided
+later in this document. Developers should also look at hwloc/helper.h (also
+covered in this document), which provides good higher-level topology traversal
+examples.
+
+To precisely define the vocabulary used by hwloc, a Terms and Definitions
+section is available and should probably be read first.
+
+Each hwloc object contains a cpuset describing the list of processing units
+that it contains. These bitmaps may be used for CPU binding and Memory binding.
+hwloc offers an extensive bitmap manipulation interface in hwloc/bitmap.h.
+
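+Because an object's cpuset describes every processing unit it contains, the
+cpuset of a child object is expected to be a subset of its parent's cpuset.
+hwloc/bitmap.h has its own routines for such tests; the sketch below only
+illustrates the idea on plain unsigned long masks, and mask_is_included is a
+hypothetical name.
+
```c
/* Return 1 if every bit set in sub is also set in super, 0 otherwise. */
static int mask_is_included(unsigned long sub, unsigned long super)
{
    /* Any bit left after clearing super's bits lies outside super. */
    return (sub & ~super) == 0UL;
}
```
+
+Using the first XML example in CLI Examples, the L2 cache cpuset 0x00000101 is
+included in the socket cpuset 0x00001111, while 0x00001010 is not included in
+0x00000101.
+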
+Moreover, hwloc also comes with additional helpers for interoperability with
+several commonly used environments. See the Interoperability With Other
+Software section for details.
+
+The complete API documentation is available in a full set of HTML pages, man
+pages, and self-contained PDF files (formatted for both US letter and A4
+formats) in the source tarball in doc/doxygen-doc/.
+
+NOTE: If you are building the documentation from a Subversion checkout, you
+will need to have Doxygen and pdflatex installed -- the documentation will be
+built during the normal "make" process. The documentation is installed during
+"make install" to $prefix/share/doc/hwloc/ and your system's default man page
+tree (under $prefix, of course).
+
+Portability
+
+As shown in CLI Examples, hwloc can obtain information on a wide variety of
+hardware topologies. However, some platforms and/or operating system versions
+will only report a subset of this information. For example, on a PPC64-based
+system with 32 cores (each with 2 hardware threads) running a default
+2.6.18-based kernel from RHEL 5.4, hwloc is only able to glean information
+about NUMA nodes and processor units (PUs). No information about caches,
+sockets, or cores is available.
+
+Similarly, operating systems have varying support for CPU and memory binding.
+Some provide interfaces for all kinds of CPU and memory binding, others
+provide interfaces for only a limited number of kinds, and some do not provide
+any binding interface at all. In the unsupported cases, hwloc's binding
+functions simply return the ENOSYS error (Function not implemented), meaning
+that the underlying operating system does not provide any interface for them.
+The CPU binding and Memory binding sections provide more information on which
+hwloc binding functions should be preferred because interfaces for them are
+usually available on the supported operating systems.
+
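+A portable caller will usually want to treat ENOSYS differently from other
+binding failures, since it only means that the operating system offers no such
+interface. The sketch below shows one way to do this in plain C. It assumes
+the common convention of returning -1 with errno set on failure; the callback
+type, the enum, and the two stand-in functions are hypothetical and do not
+come from hwloc.
+
```c
#include <errno.h>

/* A binding attempt: returns 0 on success, -1 with errno set on failure
 * (convention assumed for this sketch). */
typedef int (*bind_attempt_fn)(void);

enum bind_result { BIND_OK, BIND_UNSUPPORTED, BIND_FAILED };

/* Run one binding attempt and classify its outcome. */
static enum bind_result classify_bind(bind_attempt_fn attempt)
{
    errno = 0;
    if (attempt() == 0)
        return BIND_OK;
    /* ENOSYS: the OS has no interface for this kind of binding. */
    return (errno == ENOSYS) ? BIND_UNSUPPORTED : BIND_FAILED;
}

/* Hypothetical stand-ins for a real binding call, used for illustration. */
static int bind_unsupported(void) { errno = ENOSYS; return -1; }
static int bind_succeeds(void) { return 0; }
```
+
+An application would typically continue unbound on BIND_UNSUPPORTED and only
+report BIND_FAILED as an error.
+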
+Here's the graphical output from lstopo on the PPC64 platform described above
+when Simultaneous Multi-Threading (SMT) is enabled:
+
+ppc64-with-smt.png
+
+And here's the graphical output from lstopo on this platform when SMT is
+disabled:
+
+ppc64-without-smt.png
+
+Notice that hwloc only sees half the PUs when SMT is disabled. PU #15, for
+example, seems to change location from NUMA node #0 to #1. In reality, no PUs
+"moved" -- they were simply re-numbered when hwloc only saw half as many.
+Hence, PU #15 in the SMT-disabled picture probably corresponds to PU #30 in the
+SMT-enabled picture.
+
+This same "PUs have disappeared" effect can be seen on other platforms -- even
+platforms / OSs that provide much more information than the above PPC64 system.
+This is an unfortunate side-effect of how operating systems report information
+to hwloc.
+
+Note that after upgrading the Linux kernel on the same PPC64 system mentioned
+above to 2.6.34, hwloc is able to discover all the topology information. The
+following picture shows the entire topology layout when SMT is enabled:
+
+ppc64-full-with-smt.png
+
+Developers using the hwloc API or XML output for portable applications should
+therefore be extremely careful to not make any assumptions about the structure
+of data that is returned. For example, per the above reported PPC topology, it
+is not safe to assume that PUs will always be descendants of cores.
+
+Additionally, future hardware may insert new topology elements that are not
+available in this version of hwloc. Long-lived applications that are meant to
+span multiple different hardware platforms should also be careful about making
+structure assumptions. For example, there may someday be an element "lower"
+than a PU, or perhaps a new element may exist between a core and a PU.
+
+API Example
+
+The following small C example (named ``hwloc-hello.c'') prints the topology of
+the machine and binds the process to one logical processor of the last core of
+the machine.
+
+/* Example hwloc API program.
+ *
+ * Copyright © 2009-2010 INRIA. All rights reserved.
+ * Copyright © 2009-2011 Université Bordeaux 1
+ * Copyright © 2009-2010 Cisco Systems, Inc. All rights reserved.
+ * See COPYING in top-level directory.
+ *
+ * hwloc-hello.c
+ */
+
+#include <hwloc.h>
+#include <errno.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+static void print_children(hwloc_topology_t topology, hwloc_obj_t obj,
+ int depth)
+{
+ char string[128];
+ unsigned i;
+
+ hwloc_obj_snprintf(string, sizeof(string), topology, obj, "#", 0);
+ printf("%*s%s\n", 2*depth, "", string);
+ for (i = 0; i < obj->arity; i++) {
+ print_children(topology, obj->children[i], depth + 1);
+ }
+}
+
+int main(void)
+{
+ int depth;
+ unsigned i, n;
+ unsigned long size;
+ int levels;
+ char string[128];
+ int topodepth;
+ hwloc_topology_t topology;
+ hwloc_cpuset_t cpuset;
+ hwloc_obj_t obj;
+
+ /* Allocate and initialize topology object. */
+ hwloc_topology_init(&topology);
+
+ /* ... Optionally, put detection configuration here to ignore
+ some objects types, define a synthetic topology, etc....
+
+ The default is to detect all the objects of the machine that
+ the caller is allowed to access. See Configure Topology
+ Detection. */
+
+ /* Perform the topology detection. */
+ hwloc_topology_load(topology);
+
+ /* Optionally, get some additional topology information
+ in case we need the topology depth later. */
+ topodepth = hwloc_topology_get_depth(topology);
+
+ /*****************************************************************
+ * First example:
+ * Walk the topology with an array style, from level 0 (always
+ * the system level) to the lowest level (always the proc level).
+ *****************************************************************/
+ for (depth = 0; depth < topodepth; depth++) {
+ printf("*** Objects at level %d\n", depth);
+ for (i = 0; i < hwloc_get_nbobjs_by_depth(topology, depth);
+ i++) {
+ hwloc_obj_snprintf(string, sizeof(string), topology,
+ hwloc_get_obj_by_depth(topology, depth, i),
+ "#", 0);
+ printf("Index %u: %s\n", i, string);
+ }
+ }
+
+ /*****************************************************************
+ * Second example:
+ * Walk the topology with a tree style.
+ *****************************************************************/
+ printf("*** Printing overall tree\n");
+ print_children(topology, hwloc_get_root_obj(topology), 0);
+
+ /*****************************************************************
+ * Third example:
+ * Print the number of sockets.
+ *****************************************************************/
+ depth = hwloc_get_type_depth(topology, HWLOC_OBJ_SOCKET);
+ if (depth == HWLOC_TYPE_DEPTH_UNKNOWN) {
+ printf("*** The number of sockets is unknown\n");
+ } else {
+ printf("*** %u socket(s)\n",
+ hwloc_get_nbobjs_by_depth(topology, depth));
+ }
+
+ /*****************************************************************
+ * Fourth example:
+ * Compute the amount of cache that the first logical processor
+ * has above it.
+ *****************************************************************/
+ levels = 0;
+ size = 0;
+ for (obj = hwloc_get_obj_by_type(topology, HWLOC_OBJ_PU, 0);
+ obj;
+ obj = obj->parent)
+ if (obj->type == HWLOC_OBJ_CACHE) {
+ levels++;
+ size += obj->attr->cache.size;
+ }
+ printf("*** Logical processor 0 has %d caches totaling %luKB\n",
+ levels, size / 1024);
+
+ /*****************************************************************
+ * Fifth example:
+ * Bind to only one thread of the last core of the machine.
+ *
+ * First find out where cores are, or else smaller sets of CPUs if
+ * the OS doesn't have the notion of a "core".
+ *****************************************************************/
+ depth = hwloc_get_type_or_below_depth(topology, HWLOC_OBJ_CORE);
+
+ /* Get last core. */
+ obj = hwloc_get_obj_by_depth(topology, depth,
+ hwloc_get_nbobjs_by_depth(topology, depth) - 1);
+ if (obj) {
+ /* Get a copy of its cpuset that we may modify. */
+ cpuset = hwloc_bitmap_dup(obj->cpuset);
+
+ /* Get only one logical processor (in case the core is
+ SMT/hyperthreaded). */
+ hwloc_bitmap_singlify(cpuset);
+
+ /* And try to bind ourself there. */
+ if (hwloc_set_cpubind(topology, cpuset, 0)) {
+ char *str;
+ int error = errno;
+ hwloc_bitmap_asprintf(&str, cpuset);
+ printf("Couldn't bind to cpuset %s: %s\n", str, strerror(error));
+ free(str);
+ }
+
+ /* Free our cpuset copy */
+ hwloc_bitmap_free(cpuset);
+ }
+
+ /*****************************************************************
+ * Sixth example:
+ * Allocate some memory on the last NUMA node, bind some existing
+ * memory to the last NUMA node.
+ *****************************************************************/
+ /* Get last node. */
+ n = hwloc_get_nbobjs_by_type(topology, HWLOC_OBJ_NODE);
+ if (n) {
+ void *m;
+ size = 1024*1024;
+
+ obj = hwloc_get_obj_by_type(topology, HWLOC_OBJ_NODE, n - 1);
+ m = hwloc_alloc_membind_nodeset(topology, size, obj->nodeset,
+ HWLOC_MEMBIND_DEFAULT, 0);
+ hwloc_free(topology, m, size);
+
+ m = malloc(size);
+ hwloc_set_area_membind_nodeset(topology, m, size, obj->nodeset,
+ HWLOC_MEMBIND_DEFAULT, 0);
+ free(m);
+ }
+
+ /* Destroy topology object. */
+ hwloc_topology_destroy(topology);
+
+ return 0;
+}
+
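+The fifth example above reduces the core's cpuset to a single logical
+processor before binding. On a plain unsigned long mask, the same idea can be
+sketched by keeping only one set bit. Keeping the lowest bit is a choice made
+for this sketch, not a statement about which bit hwloc_bitmap_singlify
+keeps, and mask_singlify is a hypothetical name.
+
```c
/* Return a mask with only the lowest set bit of the input preserved.
 * An empty mask stays empty. */
static unsigned long mask_singlify(unsigned long mask)
{
    /* (~mask + 1) is the two's complement negation; ANDing it with mask
     * isolates the least significant set bit. */
    return mask & (~mask + 1UL);
}
```
+
+For a core whose cpuset is 0x00000101 (PUs 0 and 8), this keeps 0x00000001,
+that is, PU 0 only.
+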
+hwloc provides a pkg-config file to obtain relevant compiler and linker flags.
+For example, it can be used as follows to compile applications that utilize
+the hwloc library (assuming GNU Make):
+
+CFLAGS += $(shell pkg-config --cflags hwloc)
+LDLIBS += $(shell pkg-config --libs hwloc)
+cc hwloc-hello.c $(CFLAGS) -o hwloc-hello $(LDLIBS)
+
+On a machine with 4GB of RAM and 2 processor sockets -- each socket of which
+has two processing cores -- the output from running hwloc-hello could be
+something like the following:
+
+shell$ ./hwloc-hello
+*** Objects at level 0
+Index 0: Machine(3938MB)
+*** Objects at level 1
+Index 0: Socket#0
+Index 1: Socket#1
+*** Objects at level 2
+Index 0: Core#0
+Index 1: Core#1
+Index 2: Core#3
+Index 3: Core#2
+*** Objects at level 3
+Index 0: PU#0
+Index 1: PU#1
+Index 2: PU#2
+Index 3: PU#3
+*** Printing overall tree
+Machine(3938MB)
+ Socket#0
+ Core#0
+ PU#0
+ Core#1
+ PU#1
+ Socket#1
+ Core#3
+ PU#2
+ Core#2
+ PU#3
+*** 2 socket(s)
+shell$
+
+Questions and Bugs
+
+Questions should be sent to the devel mailing list
+(http://www.open-mpi.org/community/lists/hwloc.php). Bug reports should be
+filed in the tracker (https://svn.open-mpi.org/trac/hwloc/).
+
+If hwloc discovers an incorrect topology for your machine, the very first thing
+you should do is ensure that you have the most recent updates installed for
+your operating system. Indeed, most of hwloc topology discovery relies on
+hardware information retrieved through the operating system (e.g., via the
+/sys virtual filesystem of the Linux kernel). If upgrading your OS or Linux
+kernel does not solve your problem, you may also want to ensure that you are
+running the most recent version of the BIOS for your machine.
+
+If those things fail, contact us on the mailing list for additional help.
+Please attach the output of lstopo from a build configured with the
+--enable-debug option (pass it to ./configure and rebuild completely) so that
+debugging output is included.
+
+History / Credits
+
+hwloc is the evolution and merger of the libtopology
+(http://runtime.bordeaux.inria.fr/libtopology/) project and the Portable Linux
+Processor Affinity (PLPA) (http://www.open-mpi.org/projects/plpa/) project.
+Because of functional and ideological overlap, these two code bases and ideas
+were merged and released under the name "hwloc" as an Open MPI sub-project.
+
+libtopology was initially developed by the INRIA Runtime Team-Project
+(http://runtime.bordeaux.inria.fr/), headed by Raymond Namyst
+(http://dept-info.labri.fr/~namyst/). PLPA was initially developed by the Open
+MPI development team as a sub-project. Both are now deprecated in favor of
+hwloc, which is distributed as an Open MPI sub-project.
+
+Further Reading
+
+The documentation chapters include:
+
+ * Terms and Definitions
+ * Command-Line Tools
+ * Environment Variables
+ * CPU and Memory Binding Overview
+ * Interoperability With Other Software
+ * Thread Safety
+ * Embedding hwloc in Other Software
+ * Frequently Asked Questions
+
+Make sure to have a look at those too!
+
+-------------------------------------------------------------------------------
+
+Generated on Tue Aug 16 2011 19:37:04 for Hardware Locality (hwloc) by doxygen
+1.7.4