author | Linus Torvalds <torvalds@linux-foundation.org> | 2020-03-30 12:45:23 -0700 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2020-03-30 12:45:23 -0700 |
commit | 481ed297d900af0ce395f6ca8975903b76a5a59e (patch) | |
tree | e3862e9993cd8e2245c5a6d632f45dd3f77d1d62 /Documentation/core-api | |
parent | e59cd88028dbd41472453e5883f78330aa73c56e (diff) | |
parent | abcb1e021ae5a36374c635eeaba5cec733169b78 (diff) |
Merge tag 'docs-5.7' of git://git.lwn.net/linux
Pull documentation updates from Jonathan Corbet:
"This has been a busy cycle for documentation work.
Highlights include:
- Lots of RST conversion work by Mauro, Daniel Almeida, and others.
Maybe someday we'll get to the end of this stuff...maybe...
- Some organizational work to bring some order to the core-api
manual.
- Various new docs and additions to the existing documentation.
- Typo fixes, warning fixes, ..."
* tag 'docs-5.7' of git://git.lwn.net/linux: (123 commits)
Documentation: x86: exception-tables: document CONFIG_BUILDTIME_TABLE_SORT
MAINTAINERS: adjust to filesystem doc ReST conversion
docs: deprecated.rst: Add BUG()-family
doc: zh_CN: add translation for virtiofs
doc: zh_CN: index files in filesystems subdirectory
docs: locking: Drop :c:func: throughout
docs: locking: Add 'need' to hardirq section
docs: conf.py: avoid thousands of duplicate label warning on Sphinx
docs: prevent warnings due to autosectionlabel
docs: fix reference to core-api/namespaces.rst
docs: fix pointers to io-mapping.rst and io_ordering.rst files
Documentation: Better document the softlockup_panic sysctl
docs: hw-vuln: tsx_async_abort.rst: get rid of an unused ref
docs: perf: imx-ddr.rst: get rid of a warning
docs: filesystems: fuse.rst: supress a Sphinx warning
docs: translations: it: avoid duplicate refs at programming-language.rst
docs: driver.rst: supress two ReSt warnings
docs: trace: events.rst: convert some new stuff to ReST format
Documentation: Add io_ordering.rst to driver-api manual
Documentation: Add io-mapping.rst to driver-api manual
...
Diffstat (limited to 'Documentation/core-api')
-rw-r--r-- | Documentation/core-api/gcc-plugins.rst | 93 | ||||
-rw-r--r-- | Documentation/core-api/index.rst | 94 | ||||
-rw-r--r-- | Documentation/core-api/ioctl.rst | 253 | ||||
-rw-r--r-- | Documentation/core-api/kobject.rst | 434 |
4 files changed, 506 insertions, 368 deletions
diff --git a/Documentation/core-api/gcc-plugins.rst b/Documentation/core-api/gcc-plugins.rst deleted file mode 100644 index 8502f24396fb..000000000000 --- a/Documentation/core-api/gcc-plugins.rst +++ /dev/null @@ -1,93 +0,0 @@ -========================= -GCC plugin infrastructure -========================= - - -Introduction -============ - -GCC plugins are loadable modules that provide extra features to the -compiler [1]_. They are useful for runtime instrumentation and static analysis. -We can analyse, change and add further code during compilation via -callbacks [2]_, GIMPLE [3]_, IPA [4]_ and RTL passes [5]_. - -The GCC plugin infrastructure of the kernel supports all gcc versions from -4.5 to 6.0, building out-of-tree modules, cross-compilation and building in a -separate directory. -Plugin source files have to be compilable by both a C and a C++ compiler as well -because gcc versions 4.5 and 4.6 are compiled by a C compiler, -gcc-4.7 can be compiled by a C or a C++ compiler, -and versions 4.8+ can only be compiled by a C++ compiler. - -Currently the GCC plugin infrastructure supports only the x86, arm, arm64 and -powerpc architectures. - -This infrastructure was ported from grsecurity [6]_ and PaX [7]_. - --- - -.. [1] https://gcc.gnu.org/onlinedocs/gccint/Plugins.html -.. [2] https://gcc.gnu.org/onlinedocs/gccint/Plugin-API.html#Plugin-API -.. [3] https://gcc.gnu.org/onlinedocs/gccint/GIMPLE.html -.. [4] https://gcc.gnu.org/onlinedocs/gccint/IPA.html -.. [5] https://gcc.gnu.org/onlinedocs/gccint/RTL.html -.. [6] https://grsecurity.net/ -.. [7] https://pax.grsecurity.net/ - - -Files -===== - -**$(src)/scripts/gcc-plugins** - - This is the directory of the GCC plugins. - -**$(src)/scripts/gcc-plugins/gcc-common.h** - - This is a compatibility header for GCC plugins. - It should be always included instead of individual gcc headers. 
- -**$(src)/scripts/gcc-plugin.sh** - - This script checks the availability of the included headers in - gcc-common.h and chooses the proper host compiler to build the plugins - (gcc-4.7 can be built by either gcc or g++). - -**$(src)/scripts/gcc-plugins/gcc-generate-gimple-pass.h, -$(src)/scripts/gcc-plugins/gcc-generate-ipa-pass.h, -$(src)/scripts/gcc-plugins/gcc-generate-simple_ipa-pass.h, -$(src)/scripts/gcc-plugins/gcc-generate-rtl-pass.h** - - These headers automatically generate the registration structures for - GIMPLE, SIMPLE_IPA, IPA and RTL passes. They support all gcc versions - from 4.5 to 6.0. - They should be preferred to creating the structures by hand. - - -Usage -===== - -You must install the gcc plugin headers for your gcc version, -e.g., on Ubuntu for gcc-4.9:: - - apt-get install gcc-4.9-plugin-dev - -Enable a GCC plugin based feature in the kernel config:: - - CONFIG_GCC_PLUGIN_CYC_COMPLEXITY = y - -To compile only the plugin(s):: - - make gcc-plugins - -or just run the kernel make and compile the whole kernel with -the cyclomatic complexity GCC plugin. - - -4. How to add a new GCC plugin -============================== - -The GCC plugins are in $(src)/scripts/gcc-plugins/. You can use a file or a directory -here. It must be added to $(src)/scripts/gcc-plugins/Makefile, -$(src)/scripts/Makefile.gcc-plugins and $(src)/arch/Kconfig. -See the cyc_complexity_plugin.c (CONFIG_GCC_PLUGIN_CYC_COMPLEXITY) GCC plugin. diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst index a501dc1c90d0..0897ad12c119 100644 --- a/Documentation/core-api/index.rst +++ b/Documentation/core-api/index.rst @@ -8,41 +8,81 @@ This is the beginning of a manual for core kernel APIs. The conversion Core utilities ============== +This section has general and "core core" documentation. The first is a +massive grab-bag of kerneldoc info left over from the docbook days; it +should really be broken up someday when somebody finds the energy to do +it. + .. 
toctree:: :maxdepth: 1 kernel-api + workqueue + printk-formats + symbol-namespaces + +Data structures and low-level utilities +======================================= + +Library functionality that is used throughout the kernel. + +.. toctree:: + :maxdepth: 1 + + kobject assoc_array + xarray + idr + circular-buffers + generic-radix-tree + packing + timekeeping + errseq + +Concurrency primitives +====================== + +How Linux keeps everything from happening at the same time. See +:doc:`/locking/index` for more related documentation. + +.. toctree:: + :maxdepth: 1 + atomic_ops - cachetlb refcount-vs-atomic - cpu_hotplug - idr local_ops - workqueue + padata + ../RCU/index + +Low-level hardware management +============================= + +Cache management, managing CPU hotplug, etc. + +.. toctree:: + :maxdepth: 1 + + cachetlb + cpu_hotplug + memory-hotplug genericirq - xarray - librs - genalloc - errseq - packing - printk-formats - circular-buffers - generic-radix-tree + protection-keys + +Memory management +================= + +How to allocate and use memory in the kernel. Note that there is a lot +more memory-management documentation in :doc:`/vm/index`. + +.. toctree:: + :maxdepth: 1 + memory-allocation mm-api + genalloc pin_user_pages - gfp_mask-from-fs-io - timekeeping boot-time-mm - memory-hotplug - protection-keys - ../RCU/index - gcc-plugins - symbol-namespaces - padata - ioctl - + gfp_mask-from-fs-io Interfaces for kernel debugging =============================== @@ -53,6 +93,16 @@ Interfaces for kernel debugging debug-objects tracepoint +Everything else +=============== + +Documents that don't fit elsewhere or which have yet to be categorized. + +.. toctree:: + :maxdepth: 1 + + librs + .. 
only:: subproject and html Indices diff --git a/Documentation/core-api/ioctl.rst b/Documentation/core-api/ioctl.rst deleted file mode 100644 index c455db0e1627..000000000000 --- a/Documentation/core-api/ioctl.rst +++ /dev/null @@ -1,253 +0,0 @@ -====================== -ioctl based interfaces -====================== - -ioctl() is the most common way for applications to interface -with device drivers. It is flexible and easily extended by adding new -commands and can be passed through character devices, block devices as -well as sockets and other special file descriptors. - -However, it is also very easy to get ioctl command definitions wrong, -and hard to fix them later without breaking existing applications, -so this documentation tries to help developers get it right. - -Command number definitions -========================== - -The command number, or request number, is the second argument passed to -the ioctl system call. While this can be any 32-bit number that uniquely -identifies an action for a particular driver, there are a number of -conventions around defining them. - -``include/uapi/asm-generic/ioctl.h`` provides four macros for defining -ioctl commands that follow modern conventions: ``_IO``, ``_IOR``, -``_IOW``, and ``_IOWR``. These should be used for all new commands, -with the correct parameters: - -_IO/_IOR/_IOW/_IOWR - The macro name specifies how the argument will be used. It may be a - pointer to data to be passed into the kernel (_IOW), out of the kernel - (_IOR), or both (_IOWR). _IO can indicate either commands with no - argument or those passing an integer value instead of a pointer. - It is recommended to only use _IO for commands without arguments, - and use pointers for passing data. 
- -type - An 8-bit number, often a character literal, specific to a subsystem - or driver, and listed in :doc:`../userspace-api/ioctl/ioctl-number` - -nr - An 8-bit number identifying the specific command, unique for a give - value of 'type' - -data_type - The name of the data type pointed to by the argument, the command number - encodes the ``sizeof(data_type)`` value in a 13-bit or 14-bit integer, - leading to a limit of 8191 bytes for the maximum size of the argument. - Note: do not pass sizeof(data_type) type into _IOR/_IOW/IOWR, as that - will lead to encoding sizeof(sizeof(data_type)), i.e. sizeof(size_t). - _IO does not have a data_type parameter. - - -Interface versions -================== - -Some subsystems use version numbers in data structures to overload -commands with different interpretations of the argument. - -This is generally a bad idea, since changes to existing commands tend -to break existing applications. - -A better approach is to add a new ioctl command with a new number. The -old command still needs to be implemented in the kernel for compatibility, -but this can be a wrapper around the new implementation. - -Return code -=========== - -ioctl commands can return negative error codes as documented in errno(3); -these get turned into errno values in user space. On success, the return -code should be zero. It is also possible but not recommended to return -a positive 'long' value. - -When the ioctl callback is called with an unknown command number, the -handler returns either -ENOTTY or -ENOIOCTLCMD, which also results in --ENOTTY being returned from the system call. Some subsystems return --ENOSYS or -EINVAL here for historic reasons, but this is wrong. - -Prior to Linux 5.5, compat_ioctl handlers were required to return --ENOIOCTLCMD in order to use the fallback conversion into native -commands. 
As all subsystems are now responsible for handling compat -mode themselves, this is no longer needed, but it may be important to -consider when backporting bug fixes to older kernels. - -Timestamps -========== - -Traditionally, timestamps and timeout values are passed as ``struct -timespec`` or ``struct timeval``, but these are problematic because of -incompatible definitions of these structures in user space after the -move to 64-bit time_t. - -The ``struct __kernel_timespec`` type can be used instead to be embedded -in other data structures when separate second/nanosecond values are -desired, or passed to user space directly. This is still not ideal though, -as the structure matches neither the kernel's timespec64 nor the user -space timespec exactly. The get_timespec64() and put_timespec64() helper -functions can be used to ensure that the layout remains compatible with -user space and the padding is treated correctly. - -As it is cheap to convert seconds to nanoseconds, but the opposite -requires an expensive 64-bit division, a simple __u64 nanosecond value -can be simpler and more efficient. - -Timeout values and timestamps should ideally use CLOCK_MONOTONIC time, -as returned by ktime_get_ns() or ktime_get_ts64(). Unlike -CLOCK_REALTIME, this makes the timestamps immune from jumping backwards -or forwards due to leap second adjustments and clock_settime() calls. - -ktime_get_real_ns() can be used for CLOCK_REALTIME timestamps that -need to be persistent across a reboot or between multiple machines. - -32-bit compat mode -================== - -In order to support 32-bit user space running on a 64-bit machine, each -subsystem or driver that implements an ioctl callback handler must also -implement the corresponding compat_ioctl handler. - -As long as all the rules for data structures are followed, this is as -easy as setting the .compat_ioctl pointer to a helper function such as -compat_ptr_ioctl() or blkdev_compat_ptr_ioctl(). 
- -compat_ptr() ------------- - -On the s390 architecture, 31-bit user space has ambiguous representations -for data pointers, with the upper bit being ignored. When running such -a process in compat mode, the compat_ptr() helper must be used to -clear the upper bit of a compat_uptr_t and turn it into a valid 64-bit -pointer. On other architectures, this macro only performs a cast to a -``void __user *`` pointer. - -In an compat_ioctl() callback, the last argument is an unsigned long, -which can be interpreted as either a pointer or a scalar depending on -the command. If it is a scalar, then compat_ptr() must not be used, to -ensure that the 64-bit kernel behaves the same way as a 32-bit kernel -for arguments with the upper bit set. - -The compat_ptr_ioctl() helper can be used in place of a custom -compat_ioctl file operation for drivers that only take arguments that -are pointers to compatible data structures. - -Structure layout ----------------- - -Compatible data structures have the same layout on all architectures, -avoiding all problematic members: - -* ``long`` and ``unsigned long`` are the size of a register, so - they can be either 32-bit or 64-bit wide and cannot be used in portable - data structures. Fixed-length replacements are ``__s32``, ``__u32``, - ``__s64`` and ``__u64``. - -* Pointers have the same problem, in addition to requiring the - use of compat_ptr(). The best workaround is to use ``__u64`` - in place of pointers, which requires a cast to ``uintptr_t`` in user - space, and the use of u64_to_user_ptr() in the kernel to convert - it back into a user pointer. - -* On the x86-32 (i386) architecture, the alignment of 64-bit variables - is only 32-bit, but they are naturally aligned on most other - architectures including x86-64. 
This means a structure like:: - - struct foo { - __u32 a; - __u64 b; - __u32 c; - }; - - has four bytes of padding between a and b on x86-64, plus another four - bytes of padding at the end, but no padding on i386, and it needs a - compat_ioctl conversion handler to translate between the two formats. - - To avoid this problem, all structures should have their members - naturally aligned, or explicit reserved fields added in place of the - implicit padding. The ``pahole`` tool can be used for checking the - alignment. - -* On ARM OABI user space, structures are padded to multiples of 32-bit, - making some structs incompatible with modern EABI kernels if they - do not end on a 32-bit boundary. - -* On the m68k architecture, struct members are not guaranteed to have an - alignment greater than 16-bit, which is a problem when relying on - implicit padding. - -* Bitfields and enums generally work as one would expect them to, - but some properties of them are implementation-defined, so it is better - to avoid them completely in ioctl interfaces. - -* ``char`` members can be either signed or unsigned, depending on - the architecture, so the __u8 and __s8 types should be used for 8-bit - integer values, though char arrays are clearer for fixed-length strings. - -Information leaks -================= - -Uninitialized data must not be copied back to user space, as this can -cause an information leak, which can be used to defeat kernel address -space layout randomization (KASLR), helping in an attack. - -For this reason (and for compat support) it is best to avoid any -implicit padding in data structures. Where there is implicit padding -in an existing structure, kernel drivers must be careful to fully -initialize an instance of the structure before copying it to user -space. This is usually done by calling memset() before assigning to -individual members. 
- -Subsystem abstractions -====================== - -While some device drivers implement their own ioctl function, most -subsystems implement the same command for multiple drivers. Ideally the -subsystem has an .ioctl() handler that copies the arguments from and -to user space, passing them into subsystem specific callback functions -through normal kernel pointers. - -This helps in various ways: - -* Applications written for one driver are more likely to work for - another one in the same subsystem if there are no subtle differences - in the user space ABI. - -* The complexity of user space access and data structure layout is done - in one place, reducing the potential for implementation bugs. - -* It is more likely to be reviewed by experienced developers - that can spot problems in the interface when the ioctl is shared - between multiple drivers than when it is only used in a single driver. - -Alternatives to ioctl -===================== - -There are many cases in which ioctl is not the best solution for a -problem. Alternatives include: - -* System calls are a better choice for a system-wide feature that - is not tied to a physical device or constrained by the file system - permissions of a character device node - -* netlink is the preferred way of configuring any network related - objects through sockets. - -* debugfs is used for ad-hoc interfaces for debugging functionality - that does not need to be exposed as a stable interface to applications. - -* sysfs is a good way to expose the state of an in-kernel object - that is not tied to a file descriptor. - -* configfs can be used for more complex configuration than sysfs - -* A custom file system can provide extra flexibility with a simple - user interface but adds a lot of complexity to the implementation. 
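The structure-layout and information-leak advice in the removed ioctl.rst text above can be illustrated with a small user-space sketch. It is not kernel code: `struct foo_fixed` and `foo_fill()` are hypothetical names, and `uint32_t`/`uint64_t` stand in for the kernel's `__u32`/`__u64`. The sketch assumes the usual approach of replacing implicit padding with explicit reserved fields and zeroing the whole structure before filling it in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical ioctl argument structure. Compared with the "struct foo"
 * from the text, the implicit padding is replaced by reserved fields.
 */
struct foo_fixed {
	uint32_t a;
	uint32_t reserved1;	/* takes the place of the padding before b */
	uint64_t b;
	uint32_t c;
	uint32_t reserved2;	/* takes the place of the tail padding */
};

/* With every member naturally aligned, the layout is the same on all ABIs. */
static_assert(offsetof(struct foo_fixed, b) == 8, "b expected at offset 8");
static_assert(offsetof(struct foo_fixed, c) == 16, "c expected at offset 16");
static_assert(sizeof(struct foo_fixed) == 24, "no implicit padding expected");

/*
 * Zero the whole structure before assigning members, so that reserved
 * fields never carry stale data when the structure is copied out.
 */
void foo_fill(struct foo_fixed *out, uint32_t a, uint64_t b, uint32_t c)
{
	memset(out, 0, sizeof(*out));
	out->a = a;
	out->b = b;
	out->c = c;
}
```

In a real driver the equivalent of `foo_fill()` would run before `copy_to_user()`; the compile-time checks play a role similar to inspecting the layout with `pahole`.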
diff --git a/Documentation/core-api/kobject.rst b/Documentation/core-api/kobject.rst new file mode 100644 index 000000000000..1f62d4d7d966 --- /dev/null +++ b/Documentation/core-api/kobject.rst @@ -0,0 +1,434 @@ +===================================================================== +Everything you never wanted to know about kobjects, ksets, and ktypes +===================================================================== + +:Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org> +:Last updated: December 19, 2007 + +Based on an original article by Jon Corbet for lwn.net written October 1, +2003 and located at http://lwn.net/Articles/51437/ + +Part of the difficulty in understanding the driver model - and the kobject +abstraction upon which it is built - is that there is no obvious starting +place. Dealing with kobjects requires understanding a few different types, +all of which make reference to each other. In an attempt to make things +easier, we'll take a multi-pass approach, starting with vague terms and +adding detail as we go. To that end, here are some quick definitions of +some terms we will be working with. + + - A kobject is an object of type struct kobject. Kobjects have a name + and a reference count. A kobject also has a parent pointer (allowing + objects to be arranged into hierarchies), a specific type, and, + usually, a representation in the sysfs virtual filesystem. + + Kobjects are generally not interesting on their own; instead, they are + usually embedded within some other structure which contains the stuff + the code is really interested in. + + No structure should **EVER** have more than one kobject embedded within it. + If it does, the reference counting for the object is sure to be messed + up and incorrect, and your code will be buggy. So do not do this. + + - A ktype is the type of object that embeds a kobject. Every structure + that embeds a kobject needs a corresponding ktype. 
The ktype controls + what happens to the kobject when it is created and destroyed. + + - A kset is a group of kobjects. These kobjects can be of the same ktype + or belong to different ktypes. The kset is the basic container type for + collections of kobjects. Ksets contain their own kobjects, but you can + safely ignore that implementation detail as the kset core code handles + this kobject automatically. + + When you see a sysfs directory full of other directories, generally each + of those directories corresponds to a kobject in the same kset. + +We'll look at how to create and manipulate all of these types. A bottom-up +approach will be taken, so we'll go back to kobjects. + + +Embedding kobjects +================== + +It is rare for kernel code to create a standalone kobject, with one major +exception explained below. Instead, kobjects are used to control access to +a larger, domain-specific object. To this end, kobjects will be found +embedded in other structures. If you are used to thinking of things in +object-oriented terms, kobjects can be seen as a top-level, abstract class +from which other classes are derived. A kobject implements a set of +capabilities which are not particularly useful by themselves, but are +nice to have in other objects. The C language does not allow for the +direct expression of inheritance, so other techniques - such as structure +embedding - must be used. + +(As an aside, for those familiar with the kernel linked list implementation, +this is analogous as to how "list_head" structs are rarely useful on +their own, but are invariably found embedded in the larger objects of +interest.) + +So, for example, the UIO code in ``drivers/uio/uio.c`` has a structure that +defines the memory region associated with a uio device:: + + struct uio_map { + struct kobject kobj; + struct uio_mem *mem; + }; + +If you have a struct uio_map structure, finding its embedded kobject is +just a matter of using the kobj member. 
Code that works with kobjects will +often have the opposite problem, however: given a struct kobject pointer, +what is the pointer to the containing structure? You must avoid tricks +(such as assuming that the kobject is at the beginning of the structure) +and, instead, use the container_of() macro, found in ``<linux/kernel.h>``:: + + container_of(pointer, type, member) + +where: + + * ``pointer`` is the pointer to the embedded kobject, + * ``type`` is the type of the containing structure, and + * ``member`` is the name of the structure field to which ``pointer`` points. + +The return value from container_of() is a pointer to the corresponding +container type. So, for example, a pointer ``kp`` to a struct kobject +embedded **within** a struct uio_map could be converted to a pointer to the +**containing** uio_map structure with:: + + struct uio_map *u_map = container_of(kp, struct uio_map, kobj); + +For convenience, programmers often define a simple macro for **back-casting** +kobject pointers to the containing type. Exactly this happens in the +earlier ``drivers/uio/uio.c``, as you can see here:: + + struct uio_map { + struct kobject kobj; + struct uio_mem *mem; + }; + + #define to_map(map) container_of(map, struct uio_map, kobj) + +where the macro argument "map" is a pointer to the struct kobject in +question. That macro is subsequently invoked with:: + + struct uio_map *map = to_map(kobj); + + +Initialization of kobjects +========================== + +Code which creates a kobject must, of course, initialize that object. Some +of the internal fields are setup with a (mandatory) call to kobject_init():: + + void kobject_init(struct kobject *kobj, struct kobj_type *ktype); + +The ktype is required for a kobject to be created properly, as every kobject +must have an associated kobj_type. 
After calling kobject_init(), to +register the kobject with sysfs, the function kobject_add() must be called:: + + int kobject_add(struct kobject *kobj, struct kobject *parent, + const char *fmt, ...); + +This sets up the parent of the kobject and the name for the kobject +properly. If the kobject is to be associated with a specific kset, +kobj->kset must be assigned before calling kobject_add(). If a kset is +associated with a kobject, then the parent for the kobject can be set to +NULL in the call to kobject_add() and then the kobject's parent will be the +kset itself. + +As the name of the kobject is set when it is added to the kernel, the name +of the kobject should never be manipulated directly. If you must change +the name of the kobject, call kobject_rename():: + + int kobject_rename(struct kobject *kobj, const char *new_name); + +kobject_rename does not perform any locking or have a solid notion of +what names are valid so the caller must provide their own sanity checking +and serialization. + +There is a function called kobject_set_name() but that is legacy cruft and +is being removed. If your code needs to call this function, it is +incorrect and needs to be fixed. + +To properly access the name of the kobject, use the function +kobject_name():: + + const char *kobject_name(const struct kobject * kobj); + +There is a helper function to both initialize and add the kobject to the +kernel at the same time, called surprisingly enough kobject_init_and_add():: + + int kobject_init_and_add(struct kobject *kobj, struct kobj_type *ktype, + struct kobject *parent, const char *fmt, ...); + +The arguments are the same as the individual kobject_init() and +kobject_add() functions described above. + + +Uevents +======= + +After a kobject has been registered with the kobject core, you need to +announce to the world that it has been created. 
This can be done with a +call to kobject_uevent():: + + int kobject_uevent(struct kobject *kobj, enum kobject_action action); + +Use the **KOBJ_ADD** action for when the kobject is first added to the kernel. +This should be done only after any attributes or children of the kobject +have been initialized properly, as userspace will instantly start to look +for them when this call happens. + +When the kobject is removed from the kernel (details on how to do that are +below), the uevent for **KOBJ_REMOVE** will be automatically created by the +kobject core, so the caller does not have to worry about doing that by +hand. + + +Reference counts +================ + +One of the key functions of a kobject is to serve as a reference counter +for the object in which it is embedded. As long as references to the object +exist, the object (and the code which supports it) must continue to exist. +The low-level functions for manipulating a kobject's reference counts are:: + + struct kobject *kobject_get(struct kobject *kobj); + void kobject_put(struct kobject *kobj); + +A successful call to kobject_get() will increment the kobject's reference +counter and return the pointer to the kobject. + +When a reference is released, the call to kobject_put() will decrement the +reference count and, possibly, free the object. Note that kobject_init() +sets the reference count to one, so the code which sets up the kobject will +need to do a kobject_put() eventually to release that reference. + +Because kobjects are dynamic, they must not be declared statically or on +the stack, but instead, always allocated dynamically. Future versions of +the kernel will contain a run-time check for kobjects that are created +statically and will warn the developer of this improper usage. + +If all that you want to use a kobject for is to provide a reference counter +for your structure, please use the struct kref instead; a kobject would be +overkill. 
For more information on how to use struct kref, please see the +file Documentation/kref.txt in the Linux kernel source tree. + + +Creating "simple" kobjects +========================== + +Sometimes all that a developer wants is a way to create a simple directory +in the sysfs hierarchy, and not have to mess with the whole complication of +ksets, show and store functions, and other details. This is the one +exception where a single kobject should be created. To create such an +entry, use the function:: + + struct kobject *kobject_create_and_add(char *name, struct kobject *parent); + +This function will create a kobject and place it in sysfs in the location +underneath the specified parent kobject. To create simple attributes +associated with this kobject, use:: + + int sysfs_create_file(struct kobject *kobj, struct attribute *attr); + +or:: + + int sysfs_create_group(struct kobject *kobj, struct attribute_group *grp); + +Both types of attributes used here, with a kobject that has been created +with the kobject_create_and_add(), can be of type kobj_attribute, so no +special custom attribute is needed to be created. + +See the example module, ``samples/kobject/kobject-example.c`` for an +implementation of a simple kobject and attributes. + + + +ktypes and release methods +========================== + +One important thing still missing from the discussion is what happens to a +kobject when its reference count reaches zero. The code which created the +kobject generally does not know when that will happen; if it did, there +would be little point in using a kobject in the first place. Even +predictable object lifecycles become more complicated when sysfs is brought +in as other portions of the kernel can get a reference on any kobject that +is registered in the system. + +The end result is that a structure protected by a kobject cannot be freed +before its reference count goes to zero. 
The reference count is not under +the direct control of the code which created the kobject. So that code must +be notified asynchronously whenever the last reference to one of its +kobjects goes away. + +Once you registered your kobject via kobject_add(), you must never use +kfree() to free it directly. The only safe way is to use kobject_put(). It +is good practice to always use kobject_put() after kobject_init() to avoid +errors creeping in. + +This notification is done through a kobject's release() method. Usually +such a method has a form like:: + + void my_object_release(struct kobject *kobj) + { + struct my_object *mine = container_of(kobj, struct my_object, kobj); + + /* Perform any additional cleanup on this object, then... */ + kfree(mine); + } + +One important point cannot be overstated: every kobject must have a +release() method, and the kobject must persist (in a consistent state) +until that method is called. If these constraints are not met, the code is +flawed. Note that the kernel will warn you if you forget to provide a +release() method. Do not try to get rid of this warning by providing an +"empty" release function. + +If all your cleanup function needs to do is call kfree(), then you must +create a wrapper function which uses container_of() to upcast to the correct +type (as shown in the example above) and then calls kfree() on the overall +structure. + +Note, the name of the kobject is available in the release function, but it +must NOT be changed within this callback. Otherwise there will be a memory +leak in the kobject core, which makes people unhappy. + +Interestingly, the release() method is not stored in the kobject itself; +instead, it is associated with the ktype. 
+So let us introduce struct kobj_type::
+
+    struct kobj_type {
+            void (*release)(struct kobject *kobj);
+            const struct sysfs_ops *sysfs_ops;
+            struct attribute **default_attrs;
+            const struct kobj_ns_type_operations *(*child_ns_type)(struct kobject *kobj);
+            const void *(*namespace)(struct kobject *kobj);
+    };
+
+This structure is used to describe a particular type of kobject (or, more
+correctly, of containing object). Every kobject needs to have an associated
+kobj_type structure; a pointer to that structure must be specified when you
+call kobject_init() or kobject_init_and_add().
+
+The release field in struct kobj_type is, of course, a pointer to the
+release() method for this type of kobject. The sysfs_ops and default_attrs
+fields control how objects of this type are represented in sysfs; they are
+beyond the scope of this document.
+
+The default_attrs pointer is a list of default attributes that will be
+automatically created for any kobject that is registered with this ktype.
+
+
+ksets
+=====
+
+A kset is merely a collection of kobjects that want to be associated with
+each other. There is no restriction that they be of the same ktype, but be
+very careful if they are not.
+
+A kset serves these functions:
+
+ - It serves as a bag containing a group of objects. A kset can be used by
+   the kernel to track "all block devices" or "all PCI device drivers."
+
+ - A kset is also a subdirectory in sysfs, where the kobjects associated
+   with the kset can show up. Every kset contains a kobject which can be
+   set up to be the parent of other kobjects; the top-level directories of
+   the sysfs hierarchy are constructed in this way.
+
+ - Ksets can support the "hotplugging" of kobjects and influence how
+   uevents are reported to user space.
+
+In object-oriented terms, "kset" is the top-level container class; ksets
+contain their own kobject, but that kobject is managed by the kset code and
+should not be manipulated by any other user.
+
+A kset keeps its children in a standard kernel linked list. Kobjects point
+back to their containing kset via their kset field. In almost all cases,
+the kobjects belonging to a kset have that kset (or, strictly, its embedded
+kobject) as their parent.
+
+As a kset contains a kobject within it, it should always be dynamically
+created and never declared statically or on the stack. To create a new
+kset use::
+
+    struct kset *kset_create_and_add(const char *name,
+                                     struct kset_uevent_ops *u,
+                                     struct kobject *parent);
+
+When you are finished with the kset, call::
+
+    void kset_unregister(struct kset *kset);
+
+to destroy it. This removes the kset from sysfs and decrements its reference
+count. When the reference count goes to zero, the kset will be released.
+Because other references to the kset may still exist, the release may happen
+after kset_unregister() returns.
+
+An example of using a kset can be seen in the
+``samples/kobject/kset-example.c`` file in the kernel tree.
+
+If a kset wishes to control the uevent operations of the kobjects
+associated with it, it can use struct kset_uevent_ops to do so::
+
+    struct kset_uevent_ops {
+            int (*filter)(struct kset *kset, struct kobject *kobj);
+            const char *(*name)(struct kset *kset, struct kobject *kobj);
+            int (*uevent)(struct kset *kset, struct kobject *kobj,
+                          struct kobj_uevent_env *env);
+    };
+
+
+The filter function allows a kset to prevent a uevent from being emitted to
+userspace for a specific kobject. If the function returns 0, the uevent
+will not be emitted.
+
+The name function will be called to override the default name of the kset
+that the uevent sends to userspace. By default, the name will be the same
+as that of the kset itself, but this function, if present, can override
+that name.
+
+The uevent function will be called when the uevent is about to be sent to
+userspace to allow more environment variables to be added to the uevent.
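The interplay of the filter and name callbacks described above can be sketched in plain userspace C. This is only an illustration under stated assumptions: the struct definitions are simplified stand-ins rather than the kernel's own, the uevent callback is left out because it needs struct kobj_uevent_env, and ``demo_filter()``, ``demo_name()``, ``demo_uevent_ops`` and ``mock_uevent_subsystem()`` are hypothetical names that do not exist in the kernel.

```c
#include <stddef.h>
#include <string.h>

/* Simplified userspace stand-ins; NOT the kernel's real definitions. */
struct kset;

struct kobject {
	const char *name;
	struct kset *kset;	/* the kset this kobject belongs to */
};

struct kset_uevent_ops {
	int (*filter)(struct kset *kset, struct kobject *kobj);
	const char *(*name)(struct kset *kset, struct kobject *kobj);
};

struct kset {
	struct kobject kobj;	/* the kset's own embedded kobject */
	const struct kset_uevent_ops *uevent_ops;
};

/*
 * Hypothetical policy: returning 0 suppresses the uevent, so kobjects
 * whose name starts with "hidden" never reach userspace.
 */
static int demo_filter(struct kset *kset, struct kobject *kobj)
{
	(void)kset;
	return strncmp(kobj->name, "hidden", 6) != 0;
}

/* Hypothetical override of the name reported for the kset. */
static const char *demo_name(struct kset *kset, struct kobject *kobj)
{
	(void)kset;
	(void)kobj;
	return "demo_subsystem";
}

static const struct kset_uevent_ops demo_uevent_ops = {
	.filter = demo_filter,
	.name = demo_name,
};

/*
 * Mock of the decision described in the text: returns the name that
 * would be reported to userspace, or NULL if no uevent is emitted.
 */
static const char *mock_uevent_subsystem(struct kobject *kobj)
{
	struct kset *kset = kobj->kset;
	const struct kset_uevent_ops *ops;

	if (!kset)
		return NULL;	/* simplification: no kset, no uevent */

	ops = kset->uevent_ops;
	if (ops && ops->filter && !ops->filter(kset, kobj))
		return NULL;	/* filter returned 0: uevent suppressed */
	if (ops && ops->name)
		return ops->name(kset, kobj);	/* overridden name */
	return kset->kobj.name;	/* default: the kset's own name */
}
```

In this sketch a kobject named "hidden0" would produce no uevent, while any other member of the kset would be reported under "demo_subsystem". A kset without uevent_ops falls back to its own name.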
+
+One might ask how, exactly, a kobject is added to a kset, given that no
+functions which do so have been presented. The answer is
+that this task is handled by kobject_add(). When a kobject is passed to
+kobject_add(), its kset member should point to the kset to which the
+kobject will belong. kobject_add() will handle the rest.
+
+If the kobject belonging to a kset has no parent kobject set, it will be
+added to the kset's directory. Not all members of a kset necessarily
+live in the kset directory. If an explicit parent kobject is assigned
+before the kobject is added, the kobject is registered with the kset, but
+added below the parent kobject.
+
+
+Kobject removal
+===============
+
+After a kobject has been registered with the kobject core successfully, it
+must be cleaned up when the code is finished with it. To do that, call
+kobject_put(). The kobject core will then automatically clean up
+all of the memory allocated by this kobject. If a ``KOBJ_ADD`` uevent has been
+sent for the object, a corresponding ``KOBJ_REMOVE`` uevent will be sent, and
+any other sysfs housekeeping will be handled for the caller properly.
+
+If y |