Merge tag 'docs-5.7' of git://git.lwn.net/linux

Pull documentation updates from Jonathan Corbet:
 "This has been a busy cycle for documentation work. Highlights include:

   - Lots of RST conversion work by Mauro, Daniel Almeida, and others.
     Maybe someday we'll get to the end of this stuff...maybe...

   - Some organizational work to bring some order to the core-api
     manual.

   - Various new docs and additions to the existing documentation.

   - Typo fixes, warning fixes, ..."

* tag 'docs-5.7' of git://git.lwn.net/linux: (123 commits)
  Documentation: x86: exception-tables: document CONFIG_BUILDTIME_TABLE_SORT
  MAINTAINERS: adjust to filesystem doc ReST conversion
  docs: deprecated.rst: Add BUG()-family
  doc: zh_CN: add translation for virtiofs
  doc: zh_CN: index files in filesystems subdirectory
  docs: locking: Drop :c:func: throughout
  docs: locking: Add 'need' to hardirq section
  docs: conf.py: avoid thousands of duplicate label warning on Sphinx
  docs: prevent warnings due to autosectionlabel
  docs: fix reference to core-api/namespaces.rst
  docs: fix pointers to io-mapping.rst and io_ordering.rst files
  Documentation: Better document the softlockup_panic sysctl
  docs: hw-vuln: tsx_async_abort.rst: get rid of an unused ref
  docs: perf: imx-ddr.rst: get rid of a warning
  docs: filesystems: fuse.rst: supress a Sphinx warning
  docs: translations: it: avoid duplicate refs at programming-language.rst
  docs: driver.rst: supress two ReSt warnings
  docs: trace: events.rst: convert some new stuff to ReST format
  Documentation: Add io_ordering.rst to driver-api manual
  Documentation: Add io-mapping.rst to driver-api manual
  ...
author: Linus Torvalds <torvalds@linux-foundation.org> 2020-03-30 12:45:23 -0700
committer: Linus Torvalds <torvalds@linux-foundation.org> 2020-03-30 12:45:23 -0700
commit: 481ed297d900af0ce395f6ca8975903b76a5a59e (patch)
tree: e3862e9993cd8e2245c5a6d632f45dd3f77d1d62 /Documentation/core-api
parent: e59cd88028dbd41472453e5883f78330aa73c56e (diff)
parent: abcb1e021ae5a36374c635eeaba5cec733169b78 (diff)
4 files changed, 506 insertions, 368 deletions
diff --git a/Documentation/core-api/gcc-plugins.rst b/Documentation/core-api/gcc-plugins.rst
deleted file mode 100644
index 8502f24396fb..000000000000
--- a/Documentation/core-api/gcc-plugins.rst
+++ /dev/null
@@ -1,93 +0,0 @@
-=========================
-GCC plugin infrastructure
-=========================
-
-
-Introduction
-============
-
-GCC plugins are loadable modules that provide extra features to the
-compiler [1]_. They are useful for runtime instrumentation and static analysis.
-We can analyse, change and add further code during compilation via
-callbacks [2]_, GIMPLE [3]_, IPA [4]_ and RTL passes [5]_.
-
-The GCC plugin infrastructure of the kernel supports all gcc versions from
-4.5 to 6.0, building out-of-tree modules, cross-compilation and building in a
-separate directory.
-Plugin source files have to be compilable by both a C and a C++ compiler
-because gcc versions 4.5 and 4.6 are compiled by a C compiler,
-gcc-4.7 can be compiled by a C or a C++ compiler,
-and versions 4.8+ can only be compiled by a C++ compiler.
-
-Currently the GCC plugin infrastructure supports only the x86, arm, arm64 and
-powerpc architectures.
-
-This infrastructure was ported from grsecurity [6]_ and PaX [7]_.
-
---
-
-.. [1] https://gcc.gnu.org/onlinedocs/gccint/Plugins.html
-.. [2] https://gcc.gnu.org/onlinedocs/gccint/Plugin-API.html#Plugin-API
-.. [3] https://gcc.gnu.org/onlinedocs/gccint/GIMPLE.html
-.. [4] https://gcc.gnu.org/onlinedocs/gccint/IPA.html
-.. [5] https://gcc.gnu.org/onlinedocs/gccint/RTL.html
-.. [6] https://grsecurity.net/
-.. [7] https://pax.grsecurity.net/
-
-
-Files
-=====
-
-**$(src)/scripts/gcc-plugins**
-
-	This is the directory of the GCC plugins.
-
-**$(src)/scripts/gcc-plugins/gcc-common.h**
-
-	This is a compatibility header for GCC plugins.
-	It should be always included instead of individual gcc headers.
-
-**$(src)/scripts/gcc-plugin.sh**
-
-	This script checks the availability of the included headers in
-	gcc-common.h and chooses the proper host compiler to build the plugins
-	(gcc-4.7 can be built by either gcc or g++).
-
-**$(src)/scripts/gcc-plugins/gcc-generate-gimple-pass.h,
-$(src)/scripts/gcc-plugins/gcc-generate-ipa-pass.h,
-$(src)/scripts/gcc-plugins/gcc-generate-simple_ipa-pass.h,
-$(src)/scripts/gcc-plugins/gcc-generate-rtl-pass.h**
-
-	These headers automatically generate the registration structures for
-	GIMPLE, SIMPLE_IPA, IPA and RTL passes. They support all gcc versions
-	from 4.5 to 6.0.
-	They should be preferred to creating the structures by hand.
-
-
-Usage
-=====
-
-You must install the gcc plugin headers for your gcc version,
-e.g., on Ubuntu for gcc-4.9::
-
-	apt-get install gcc-4.9-plugin-dev
-
-Enable a GCC plugin based feature in the kernel config::
-
-	CONFIG_GCC_PLUGIN_CYC_COMPLEXITY = y
-
-To compile only the plugin(s)::
-
-	make gcc-plugins
-
-or just run the kernel make and compile the whole kernel with
-the cyclomatic complexity GCC plugin.
-
-
-How to add a new GCC plugin
-===========================
-
-The GCC plugins are in $(src)/scripts/gcc-plugins/. You can use a file or a directory
-here. It must be added to $(src)/scripts/gcc-plugins/Makefile,
-$(src)/scripts/Makefile.gcc-plugins and $(src)/arch/Kconfig.
-See the cyc_complexity_plugin.c (CONFIG_GCC_PLUGIN_CYC_COMPLEXITY) GCC plugin.
diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst
index a501dc1c90d0..0897ad12c119 100644
--- a/Documentation/core-api/index.rst
+++ b/Documentation/core-api/index.rst
@@ -8,41 +8,81 @@ This is the beginning of a manual for core kernel APIs.  The conversion
 Core utilities
 ==============
 
+This section has general and "core core" documentation.  The first is a
+massive grab-bag of kerneldoc info left over from the docbook days; it
+should really be broken up someday when somebody finds the energy to do
+it.
+
 .. toctree::
    :maxdepth: 1
 
    kernel-api
+   workqueue
+   printk-formats
+   symbol-namespaces
+
+Data structures and low-level utilities
+=======================================
+
+Library functionality that is used throughout the kernel.
+
+.. toctree::
+   :maxdepth: 1
+
+   kobject
    assoc_array
+   xarray
+   idr
+   circular-buffers
+   generic-radix-tree
+   packing
+   timekeeping
+   errseq
+
+Concurrency primitives
+======================
+
+How Linux keeps everything from happening at the same time.  See
+:doc:`/locking/index` for more related documentation.
+
+.. toctree::
+   :maxdepth: 1
+
    atomic_ops
-   cachetlb
    refcount-vs-atomic
-   cpu_hotplug
-   idr
    local_ops
-   workqueue
+   padata
+   ../RCU/index
+
+Low-level hardware management
+=============================
+
+Cache management, managing CPU hotplug, etc.
+
+.. toctree::
+   :maxdepth: 1
+
+   cachetlb
+   cpu_hotplug
+   memory-hotplug
    genericirq
-   xarray
-   librs
-   genalloc
-   errseq
-   packing
-   printk-formats
-   circular-buffers
-   generic-radix-tree
+   protection-keys
+
+Memory management
+=================
+
+How to allocate and use memory in the kernel.  Note that there is a lot
+more memory-management documentation in :doc:`/vm/index`.
+
+.. toctree::
+   :maxdepth: 1
+
    memory-allocation
    mm-api
+   genalloc
    pin_user_pages
-   gfp_mask-from-fs-io
-   timekeeping
    boot-time-mm
-   memory-hotplug
-   protection-keys
-   ../RCU/index
-   gcc-plugins
-   symbol-namespaces
-   padata
-   ioctl
-
+   gfp_mask-from-fs-io
 
 Interfaces for kernel debugging
 ===============================
@@ -53,6 +93,16 @@ Interfaces for kernel debugging
    debug-objects
    tracepoint
 
+Everything else
+===============
+
+Documents that don't fit elsewhere or which have yet to be categorized.
+
+.. toctree::
+   :maxdepth: 1
+
+   librs
+
 .. only:: subproject and html
 
    Indices
diff --git a/Documentation/core-api/ioctl.rst b/Documentation/core-api/ioctl.rst
deleted file mode 100644
index c455db0e1627..000000000000
--- a/Documentation/core-api/ioctl.rst
+++ /dev/null
@@ -1,253 +0,0 @@
-======================
-ioctl based interfaces
-======================
-
-ioctl() is the most common way for applications to interface
-with device drivers. It is flexible and easily extended by adding new
-commands and can be passed through character devices, block devices as
-well as sockets and other special file descriptors.
-
-However, it is also very easy to get ioctl command definitions wrong,
-and hard to fix them later without breaking existing applications,
-so this documentation tries to help developers get it right.
-
-Command number definitions
-==========================
-
-The command number, or request number, is the second argument passed to
-the ioctl system call. While this can be any 32-bit number that uniquely
-identifies an action for a particular driver, there are a number of
-conventions around defining them.
-
-``include/uapi/asm-generic/ioctl.h`` provides four macros for defining
-ioctl commands that follow modern conventions: ``_IO``, ``_IOR``,
-``_IOW``, and ``_IOWR``. These should be used for all new commands,
-with the correct parameters:
-
-_IO/_IOR/_IOW/_IOWR
-   The macro name specifies how the argument will be used.  It may be a
-   pointer to data to be passed into the kernel (_IOW), out of the kernel
-   (_IOR), or both (_IOWR).  _IO can indicate either commands with no
-   argument or those passing an integer value instead of a pointer.
-   It is recommended to only use _IO for commands without arguments,
-   and use pointers for passing data.
-
-type
-   An 8-bit number, often a character literal, specific to a subsystem
-   or driver, and listed in :doc:`../userspace-api/ioctl/ioctl-number`.
-
-nr
-  An 8-bit number identifying the specific command, unique for a given
-  value of 'type'.
-
-data_type
-  The name of the data type pointed to by the argument. The command number
-  encodes the ``sizeof(data_type)`` value in a 13-bit or 14-bit integer,
-  leading to a limit of 8191 bytes for the maximum size of the argument.
-  Note: do not pass ``sizeof(data_type)`` into _IOR/_IOW/_IOWR, as that
-  will lead to encoding sizeof(sizeof(data_type)), i.e. sizeof(size_t).
-  _IO does not have a data_type parameter.
-
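As a rough illustration of the macro parameters described above, the following userspace sketch defines four commands for a hypothetical driver and decodes them again. The magic value ``'x'``, ``struct demo_config`` and every ``DEMO_*``/``demo_*`` name are assumptions made up for this example; it builds against the exported ``<linux/ioctl.h>`` header, not kernel-internal headers.

```c
#include <assert.h>        /* static_assert (C11) */
#include <linux/ioctl.h>   /* _IO, _IOR, _IOW, _IOWR and the _IOC_* decoders */
#include <stdint.h>

/* Hypothetical argument structure: fixed-width, naturally aligned members. */
struct demo_config {
	uint32_t mode;
	uint32_t flags;
	uint64_t buffer_size;
};

/* Assumed 'type' value for this sketch, not a registered ioctl number. */
#define DEMO_IOC_MAGIC 'x'

/* The data type itself is passed, never sizeof(data_type). */
#define DEMO_IOC_RESET       _IO(DEMO_IOC_MAGIC, 0)
#define DEMO_IOC_GET_CONFIG  _IOR(DEMO_IOC_MAGIC, 1, struct demo_config)
#define DEMO_IOC_SET_CONFIG  _IOW(DEMO_IOC_MAGIC, 2, struct demo_config)
#define DEMO_IOC_XCHG_CONFIG _IOWR(DEMO_IOC_MAGIC, 3, struct demo_config)

/* The argument must fit into the size field of the command number. */
static_assert(sizeof(struct demo_config) < (1u << _IOC_SIZEBITS),
	      "argument too large for the ioctl size field");

/* Small decoders showing what each command number encodes. */
static inline unsigned int demo_cmd_type(unsigned int cmd) { return _IOC_TYPE(cmd); }
static inline unsigned int demo_cmd_nr(unsigned int cmd)   { return _IOC_NR(cmd); }
static inline unsigned int demo_cmd_size(unsigned int cmd) { return _IOC_SIZE(cmd); }
static inline unsigned int demo_cmd_dir(unsigned int cmd)  { return _IOC_DIR(cmd); }
```

Decoding ``DEMO_IOC_GET_CONFIG`` with these helpers should give back the magic value, the number 1, the read direction and ``sizeof(struct demo_config)``, while ``DEMO_IOC_RESET`` carries a size of zero.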
-
-Interface versions
-==================
-
-Some subsystems use version numbers in data structures to overload
-commands with different interpretations of the argument.
-
-This is generally a bad idea, since changes to existing commands tend
-to break existing applications.
-
-A better approach is to add a new ioctl command with a new number. The
-old command still needs to be implemented in the kernel for compatibility,
-but this can be a wrapper around the new implementation.
-
-Return code
-===========
-
-ioctl commands can return negative error codes as documented in errno(3);
-these get turned into errno values in user space. On success, the return
-code should be zero. It is also possible but not recommended to return
-a positive 'long' value.
-
-When the ioctl callback is called with an unknown command number, the
-handler returns either -ENOTTY or -ENOIOCTLCMD, which also results in
--ENOTTY being returned from the system call. Some subsystems return
--ENOSYS or -EINVAL here for historic reasons, but this is wrong.
-
-Prior to Linux 5.5, compat_ioctl handlers were required to return
--ENOIOCTLCMD in order to use the fallback conversion into native
-commands. As all subsystems are now responsible for handling compat
-mode themselves, this is no longer needed, but it may be important to
-consider when backporting bug fixes to older kernels.
-
-Timestamps
-==========
-
-Traditionally, timestamps and timeout values are passed as ``struct
-timespec`` or ``struct timeval``, but these are problematic because of
-incompatible definitions of these structures in user space after the
-move to 64-bit time_t.
-
-The ``struct __kernel_timespec`` type can be used instead to be embedded
-in other data structures when separate second/nanosecond values are
-desired, or passed to user space directly. This is still not ideal though,
-as the structure matches neither the kernel's timespec64 nor the user
-space timespec exactly. The get_timespec64() and put_timespec64() helper
-functions can be used to ensure that the layout remains compatible with
-user space and the padding is treated correctly.
-
-As it is cheap to convert seconds to nanoseconds, but the opposite
-requires an expensive 64-bit division, a simple __u64 nanosecond value
-can be simpler and more efficient.
-
-Timeout values and timestamps should ideally use CLOCK_MONOTONIC time,
-as returned by ktime_get_ns() or ktime_get_ts64().  Unlike
-CLOCK_REALTIME, this makes the timestamps immune from jumping backwards
-or forwards due to leap second adjustments and clock_settime() calls.
-
-ktime_get_real_ns() can be used for CLOCK_REALTIME timestamps that
-need to be persistent across a reboot or between multiple machines.
-
-32-bit compat mode
-==================
-
-In order to support 32-bit user space running on a 64-bit machine, each
-subsystem or driver that implements an ioctl callback handler must also
-implement the corresponding compat_ioctl handler.
-
-As long as all the rules for data structures are followed, this is as
-easy as setting the .compat_ioctl pointer to a helper function such as
-compat_ptr_ioctl() or blkdev_compat_ptr_ioctl().
-
-compat_ptr()
-------------
-
-On the s390 architecture, 31-bit user space has ambiguous representations
-for data pointers, with the upper bit being ignored. When running such
-a process in compat mode, the compat_ptr() helper must be used to
-clear the upper bit of a compat_uptr_t and turn it into a valid 64-bit
-pointer.  On other architectures, this macro only performs a cast to a
-``void __user *`` pointer.
-
-In a compat_ioctl() callback, the last argument is an unsigned long,
-which can be interpreted as either a pointer or a scalar depending on
-the command. If it is a scalar, then compat_ptr() must not be used, to
-ensure that the 64-bit kernel behaves the same way as a 32-bit kernel
-for arguments with the upper bit set.
-
-The compat_ptr_ioctl() helper can be used in place of a custom
-compat_ioctl file operation for drivers that only take arguments that
-are pointers to compatible data structures.
-
-Structure layout
-----------------
-
-Compatible data structures have the same layout on all architectures,
-avoiding all problematic members:
-
-* ``long`` and ``unsigned long`` are the size of a register, so
-  they can be either 32-bit or 64-bit wide and cannot be used in portable
-  data structures. Fixed-length replacements are ``__s32``, ``__u32``,
-  ``__s64`` and ``__u64``.
-
-* Pointers have the same problem, in addition to requiring the
-  use of compat_ptr(). The best workaround is to use ``__u64``
-  in place of pointers, which requires a cast to ``uintptr_t`` in user
-  space, and the use of u64_to_user_ptr() in the kernel to convert
-  it back into a user pointer.
-
-* On the x86-32 (i386) architecture, the alignment of 64-bit variables
-  is only 32-bit, but they are naturally aligned on most other
-  architectures including x86-64. This means a structure like::
-
-    struct foo {
-        __u32 a;
-        __u64 b;
-        __u32 c;
-    };
-
-  has four bytes of padding between a and b on x86-64, plus another four
-  bytes of padding at the end, but no padding on i386, and it needs a
-  compat_ioctl conversion handler to translate between the two formats.
-
-  To avoid this problem, all structures should have their members
-  naturally aligned, or explicit reserved fields added in place of the
-  implicit padding. The ``pahole`` tool can be used for checking the
-  alignment.
-
-* On ARM OABI user space, structures are padded to multiples of 32-bit,
-  making some structs incompatible with modern EABI kernels if they
-  do not end on a 32-bit boundary.
-
-* On the m68k architecture, struct members are not guaranteed to have an
-  alignment greater than 16-bit, which is a problem when relying on
-  implicit padding.
-
-* Bitfields and enums generally work as one would expect them to,
-  but some properties of them are implementation-defined, so it is better
-  to avoid them completely in ioctl interfaces.
-
-* ``char`` members can be either signed or unsigned, depending on
-  the architecture, so the __u8 and __s8 types should be used for 8-bit
-  integer values, though char arrays are clearer for fixed-length strings.
-
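The padding rule from the list above can be checked at compile time. The sketch below is only an illustration: it uses the ``<stdint.h>`` types as userspace stand-ins for ``__u32``/``__u64``, and ``struct foo_fixed`` with its ``reserved1``/``reserved2`` members is a hypothetical rewrite of ``struct foo``, not something taken from the kernel.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof, size_t */
#include <stdint.h>

/* Layout from the text: implicit padding differs between i386 and x86-64. */
struct foo {
	uint32_t a;
	uint64_t b;
	uint32_t c;
};

/* Same data with the padding made explicit, so every ABI agrees. */
struct foo_fixed {
	uint32_t a;
	uint32_t reserved1;   /* replaces the implicit padding before b */
	uint64_t b;
	uint32_t c;
	uint32_t reserved2;   /* replaces the implicit tail padding */
};

/* Fail the build if the compiler would still insert padding. */
static_assert(offsetof(struct foo_fixed, b) == 8,
	      "b must be naturally aligned");
static_assert(sizeof(struct foo_fixed) == 24,
	      "no implicit padding expected");

/* Sum of the member sizes; equals sizeof() only when there is no padding. */
static inline size_t foo_fixed_member_bytes(void)
{
	return 4 * sizeof(uint32_t) + sizeof(uint64_t);
}
```

Running ``pahole`` on the compiled object remains the more thorough check; the assertions only cover the offsets that are spelled out.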
-Information leaks
-=================
-
-Uninitialized data must not be copied back to user space, as this can
-cause an information leak, which can be used to defeat kernel address
-space layout randomization (KASLR), helping in an attack.
-
-For this reason (and for compat support) it is best to avoid any
-implicit padding in data structures.  Where there is implicit padding
-in an existing structure, kernel drivers must be careful to fully
-initialize an instance of the structure before copying it to user
-space.  This is usually done by calling memset() before assigning to
-individual members.
-
-Subsystem abstractions
-======================
-
-While some device drivers implement their own ioctl function, most
-subsystems implement the same command for multiple drivers.  Ideally the
-subsystem has an .ioctl() handler that copies the arguments from and
-to user space, passing them into subsystem specific callback functions
-through normal kernel pointers.
-
-This helps in various ways:
-
-* Applications written for one driver are more likely to work for
-  another one in the same subsystem if there are no subtle differences
-  in the user space ABI.
-
-* The complexity of user space access and data structure layout is done
-  in one place, reducing the potential for implementation bugs.
-
-* It is more likely to be reviewed by experienced developers
-  that can spot problems in the interface when the ioctl is shared
-  between multiple drivers than when it is only used in a single driver.
-
-Alternatives to ioctl
-=====================
-
-There are many cases in which ioctl is not the best solution for a
-problem. Alternatives include:
-
-* System calls are a better choice for a system-wide feature that
-  is not tied to a physical device or constrained by the file system
-  permissions of a character device node.
-
-* netlink is the preferred way of configuring any network related
-  objects through sockets.
-
-* debugfs is used for ad-hoc interfaces for debugging functionality
-  that does not need to be exposed as a stable interface to applications.
-
-* sysfs is a good way to expose the state of an in-kernel object
-  that is not tied to a file descriptor.
-
-* configfs can be used for more complex configuration than sysfs.
-
-* A custom file system can provide extra flexibility with a simple
-  user interface but adds a lot of complexity to the implementation.
diff --git a/Documentation/core-api/kobject.rst b/Documentation/core-api/kobject.rst
new file mode 100644
index 000000000000..1f62d4d7d966
--- /dev/null
+++ b/Documentation/core-api/kobject.rst
@@ -0,0 +1,434 @@
+=====================================================================
+Everything you never wanted to know about kobjects, ksets, and ktypes
+=====================================================================
+
+:Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+:Last updated: December 19, 2007
+
+Based on an original article by Jon Corbet for lwn.net written October 1,
+2003 and located at http://lwn.net/Articles/51437/
+
+Part of the difficulty in understanding the driver model - and the kobject
+abstraction upon which it is built - is that there is no obvious starting
+place. Dealing with kobjects requires understanding a few different types,
+all of which make reference to each other. In an attempt to make things
+easier, we'll take a multi-pass approach, starting with vague terms and
+adding detail as we go. To that end, here are some quick definitions of
+some terms we will be working with.
+
+ - A kobject is an object of type struct kobject.  Kobjects have a name
+   and a reference count.  A kobject also has a parent pointer (allowing
+   objects to be arranged into hierarchies), a specific type, and,
+   usually, a representation in the sysfs virtual filesystem.
+
+   Kobjects are generally not interesting on their own; instead, they are
+   usually embedded within some other structure which contains the stuff
+   the code is really interested in.
+
+   No structure should **EVER** have more than one kobject embedded within it.
+   If it does, the reference counting for the object is sure to be messed
+   up and incorrect, and your code will be buggy.  So do not do this.
+
+ - A ktype is the type of object that embeds a kobject.  Every structure
+   that embeds a kobject needs a corresponding ktype.  The ktype controls
+   what happens to the kobject when it is created and destroyed.
+
+ - A kset is a group of kobjects.  These kobjects can be of the same ktype
+   or belong to different ktypes.  The kset is the basic container type for
+   collections of kobjects. Ksets contain their own kobjects, but you can
+   safely ignore that implementation detail as the kset core code handles
+   this kobject automatically.
+
+   When you see a sysfs directory full of other directories, generally each
+   of those directories corresponds to a kobject in the same kset.
+
+We'll look at how to create and manipulate all of these types. A bottom-up
+approach will be taken, so we'll go back to kobjects.
+
+
+Embedding kobjects
+==================
+
+It is rare for kernel code to create a standalone kobject, with one major
+exception explained below.  Instead, kobjects are used to control access to
+a larger, domain-specific object.  To this end, kobjects will be found
+embedded in other structures.  If you are used to thinking of things in
+object-oriented terms, kobjects can be seen as a top-level, abstract class
+from which other classes are derived.  A kobject implements a set of
+capabilities which are not particularly useful by themselves, but are
+nice to have in other objects.  The C language does not allow for the
+direct expression of inheritance, so other techniques - such as structure
+embedding - must be used.
+
+(As an aside, for those familiar with the kernel linked list implementation,
+this is analogous to how "list_head" structs are rarely useful on
+their own, but are invariably found embedded in the larger objects of
+interest.)
+
+So, for example, the UIO code in ``drivers/uio/uio.c`` has a structure that
+defines the memory region associated with a uio device::
+
+    struct uio_map {
+            struct kobject kobj;
+            struct uio_mem *mem;
+    };
+
+If you have a struct uio_map structure, finding its embedded kobject is
+just a matter of using the kobj member.  Code that works with kobjects will
+often have the opposite problem, however: given a struct kobject pointer,
+what is the pointer to the containing structure?  You must avoid tricks
+(such as assuming that the kobject is at the beginning of the structure)
+and, instead, use the container_of() macro, found in ``<linux/kernel.h>``::
+
+    container_of(pointer, type, member)
+
+where:
+
+  * ``pointer`` is the pointer to the embedded kobject,
+  * ``type`` is the type of the containing structure, and
+  * ``member`` is the name of the structure field to which ``pointer`` points.
+
+The return value from container_of() is a pointer to the corresponding
+container type. So, for example, a pointer ``kp`` to a struct kobject
+embedded **within** a struct uio_map could be converted to a pointer to the
+**containing** uio_map structure with::
+
+    struct uio_map *u_map = container_of(kp, struct uio_map, kobj);
+
+For convenience, programmers often define a simple macro for **back-casting**
+kobject pointers to the containing type.  Exactly this happens in the
+earlier ``drivers/uio/uio.c``, as you can see here::
+
+    struct uio_map {
+            struct kobject kobj;
+            struct uio_mem *mem;
+    };
+
+    #define to_map(map) container_of(map, struct uio_map, kobj)
+
+where the macro argument "map" is a pointer to the struct kobject in
+question.  That macro is subsequently invoked with::
+
+    struct uio_map *map = to_map(kobj);
+
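The back-casting step can be tried outside the kernel. The following is a minimal userspace sketch, assuming a simplified ``container_of()`` built on ``offsetof()`` (without the kernel's type checking) and stand-in structures; ``struct uio_map_demo`` and ``map_from_kobj()`` are hypothetical names. The kobject is deliberately not the first member, which is the case where assuming "the kobject is at the beginning" would go wrong.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof */

/* Userspace stand-in for the kernel macro; no type checking is done. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Minimal stand-ins; the real kernel structures have many more members. */
struct kobject { const char *name; };
struct uio_mem { unsigned long size; };

/* Hypothetical variant of uio_map with the kobject NOT at offset zero. */
struct uio_map_demo {
	struct uio_mem *mem;
	struct kobject kobj;
};

#define to_map(map) container_of(map, struct uio_map_demo, kobj)

/* Recover the containing structure from a pointer to its embedded kobject. */
static inline struct uio_map_demo *map_from_kobj(struct kobject *kobj)
{
	assert(kobj != NULL);
	return to_map(kobj);
}
```

Given ``struct uio_map_demo m``, calling ``map_from_kobj(&m.kobj)`` should return ``&m`` even though ``&m.kobj`` and ``&m`` are different addresses.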
+
+Initialization of kobjects
+==========================
+
+Code which creates a kobject must, of course, initialize that object. Some
+of the internal fields are set up with a (mandatory) call to kobject_init()::
+
+    void kobject_init(struct kobject *kobj, struct kobj_type *ktype);
+
+The ktype is required for a kobject to be created properly, as every kobject
+must have an associated kobj_type.  After calling kobject_init(), to
+register the kobject with sysfs, the function kobject_add() must be called::
+
+    int kobject_add(struct kobject *kobj, struct kobject *parent,
+                    const char *fmt, ...);
+
+This sets up the parent of the kobject and the name for the kobject
+properly.  If the kobject is to be associated with a specific kset,
+kobj->kset must be assigned before calling kobject_add().  If a kset is
+associated with a kobject, then the parent for the kobject can be set to
+NULL in the call to kobject_add() and then the kobject's parent will be the
+kset itself.
+
+As the name of the kobject is set when it is added to the kernel, the name
+of the kobject should never be manipulated directly.  If you must change
+the name of the kobject, call kobject_rename()::
+
+    int kobject_rename(struct kobject *kobj, const char *new_name);
+
+kobject_rename() does not perform any locking or have a solid notion of
+what names are valid, so the caller must provide their own sanity checking
+and serialization.
+
+There is a function called kobject_set_name() but that is legacy cruft and
+is being removed.  If your code needs to call this function, it is
+incorrect and needs to be fixed.
+
+To properly access the name of the kobject, use the function
+kobject_name()::
+
+    const char *kobject_name(const struct kobject * kobj);
+
+There is a helper function to both initialize and add the kobject to the
+kernel at the same time, called, surprisingly enough, kobject_init_and_add()::
+
+    int kobject_init_and_add(struct kobject *kobj, struct kobj_type *ktype,
+                             struct kobject *parent, const char *fmt, ...);
+
+The arguments are the same as the individual kobject_init() and
+kobject_add() functions described above.
+
+
+Uevents
+=======
+
+After a kobject has been registered with the kobject core, you need to
+announce to the world that it has been created.  This can be done with a
+call to kobject_uevent()::
+
+    int kobject_uevent(struct kobject *kobj, enum kobject_action action);
+
+Use the **KOBJ_ADD** action for when the kobject is first added to the kernel.
+This should be done only after any attributes or children of the kobject
+have been initialized properly, as userspace will instantly start to look
+for them when this call happens.
+
+When the kobject is removed from the kernel (details on how to do that are
+below), the uevent for **KOBJ_REMOVE** will be automatically created by the
+kobject core, so the caller does not have to worry about doing that by
+hand.
+
+
+Reference counts
+================
+
+One of the key functions of a kobject is to serve as a reference counter
+for the object in which it is embedded. As long as references to the object
+exist, the object (and the code which supports it) must continue to exist.
+The low-level functions for manipulating a kobject's reference counts are::
+
+    struct kobject *kobject_get(struct kobject *kobj);
+    void kobject_put(struct kobject *kobj);
+
+A successful call to kobject_get() will increment the kobject's reference
+counter and return the pointer to the kobject.
+
+When a reference is released, the call to kobject_put() will decrement the
+reference count and, possibly, free the object. Note that kobject_init()
+sets the reference count to one, so the code which sets up the kobject will
+need to do a kobject_put() eventually to release that reference.
+
+Because kobjects are dynamic, they must not be declared statically or on
+the stack, but instead, always allocated dynamically.  Future versions of
+the kernel will contain a run-time check for kobjects that are created
+statically and will warn the developer of this improper usage.
+
+If all that you want to use a kobject for is to provide a reference counter
+for your structure, please use the struct kref instead; a kobject would be
+overkill.  For more information on how to use struct kref, please see the
+file Documentation/kref.txt in the Linux kernel source tree.
+
+
+Creating "simple" kobjects
+==========================
+
+Sometimes all that a developer wants is a way to create a simple directory
+in the sysfs hierarchy, and not have to mess with the whole complication of
+ksets, show and store functions, and other details.  This is the one
+exception where a single kobject should be created.  To create such an
+entry, use the function::
+
+    struct kobject *kobject_create_and_add(char *name, struct kobject *parent);
+
+This function will create a kobject and place it in sysfs in the location
+underneath the specified parent kobject.  To create simple attributes
+associated with this kobject, use::
+
+    int sysfs_create_file(struct kobject *kobj, struct attribute *attr);
+
+or::
+
+    int sysfs_create_group(struct kobject *kobj, struct attribute_group *grp);
+
+Both types of attributes used here, with a kobject that has been created
+with kobject_create_and_add(), can be of type kobj_attribute, so no
+special custom attribute needs to be created.
+
+See the example module, ``samples/kobject/kobject-example.c`` for an
+implementation of a simple kobject and attributes.
+
+
+
+ktypes and release methods
+==========================
+
+One important thing still missing from the discussion is what happens to a
+kobject when its reference count reaches zero. The code which created the
+kobject generally does not know when that will happen; if it did, there
+would be little point in using a kobject in the first place. Even
+predictable object lifecycles become more complicated when sysfs is brought
+in as other portions of the kernel can get a reference on any kobject that
+is registered in the system.
+
+The end result is that a structure protected by a kobject cannot be freed
+before its reference count goes to zero. The reference count is not under
+the direct control of the code which created the kobject. So that code must
+be notified asynchronously whenever the last reference to one of its
+kobjects goes away.
+
+Once you have registered your kobject via kobject_add(), you must never use
+kfree() to free it directly. The only safe way is to use kobject_put(). It
+is good practice to always use kobject_put() after kobject_init() to avoid
+errors creeping in.
+
+This notification is done through a kobject's release() method. Usually
+such a method has a form like::
+
+    void my_object_release(struct kobject *kobj)
+    {
+            struct my_object *mine = container_of(kobj, struct my_object, kobj);
+
+            /* Perform any additional cleanup on this object, then... */
+            kfree(mine);
+    }
+
+One important point cannot be overstated: every kobject must have a
+release() method, and the kobject must persist (in a consistent state)
+until that method is called. If these constraints are not met, the code is
+flawed. Note that the kernel will warn you if you forget to provide a
+release() method.  Do not try to get rid of this warning by providing an
+"empty" release function.
+
+If all your cleanup function needs to do is call kfree(), then you must
+create a wrapper function which uses container_of() to upcast to the correct
+type (as shown in the example above) and then calls kfree() on the overall
+structure.
+
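The release-on-last-put pattern can be tried outside the kernel. The following is only a userspace sketch under stated assumptions: ``toy_kobject``, ``toy_ktype``, ``toy_get()``, ``toy_put()`` and ``my_object_create()`` are hypothetical stand-ins invented for illustration, not the kernel API, and the plain ``int`` counter ignores the atomicity that a real reference count needs.

```c
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-in for the kernel macro of the same name. */
#define container_of(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

struct toy_kobject;

/* Hypothetical: the release() method lives in the type, not the object. */
struct toy_ktype {
        void (*release)(struct toy_kobject *kobj);
};

struct toy_kobject {
        int refcount;                   /* not atomic; illustration only */
        const struct toy_ktype *ktype;
};

struct my_object {
        int value;
        struct toy_kobject kobj;        /* embedded, as in the text above */
};

static int released;                    /* counts release() calls */

static void my_object_release(struct toy_kobject *kobj)
{
        struct my_object *mine = container_of(kobj, struct my_object, kobj);

        released++;
        free(mine);                     /* frees the containing structure */
}

static const struct toy_ktype my_ktype = {
        .release = my_object_release,
};

static void toy_get(struct toy_kobject *kobj)
{
        kobj->refcount++;
}

static void toy_put(struct toy_kobject *kobj)
{
        /* The last put, whoever issues it, triggers release(). */
        if (--kobj->refcount == 0)
                kobj->ktype->release(kobj);
}

static struct my_object *my_object_create(void)
{
        struct my_object *mine = calloc(1, sizeof(*mine));

        if (!mine)
                return NULL;
        mine->kobj.refcount = 1;        /* the creator holds the first reference */
        mine->kobj.ktype = &my_ktype;
        return mine;
}
```

In this sketch the creator never calls ``free()`` itself: it drops its reference with ``toy_put()``, and the structure goes away only when the final holder does the same.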
+Note, the name of the kobject is available in the release function, but it
+must NOT be changed within this callback.  Otherwise there will be a memory
+leak in the kobject core, which makes people unhappy.
+
+Interestingly, the release() method is not stored in the kobject itself;
+instead, it is associated with the ktype. So let us introduce struct
+kobj_type::
+
+    struct kobj_type {
+            void (*release)(struct kobject *kobj);
+            const struct sysfs_ops *sysfs_ops;
+            struct attribute **default_attrs;
+            const struct kobj_ns_type_operations *(*child_ns_type)(struct kobject *kobj);
+            const void *(*namespace)(struct kobject *kobj);
+    };
+
+This structure is used to describe a particular type of kobject (or, more
+correctly, of containing object). Every kobject needs to have an associated
+kobj_type structure; a pointer to that structure must be specified when you
+call kobject_init() or kobject_init_and_add().
+
+The release field in struct kobj_type is, of course, a pointer to the
+release() method for this type of kobject. Two of the other fields (sysfs_ops
+and default_attrs) control how objects of this type are represented in
+sysfs; they are beyond the scope of this document.
+
+The default_attrs pointer is a list of default attributes that will be
+automatically created for any kobject that is registered with this ktype.
+
+
+ksets
+=====
+
+A kset is merely a collection of kobjects that want to be associated with
+each other.  There is no restriction that they be of the same ktype, but be
+very careful if they are not.
+
+A kset serves these functions:
+
+ - It serves as a bag containing a group of objects. A kset can be used by
+   the kernel to track "all block devices" or "all PCI device drivers."
+
+ - A kset is also a subdirectory in sysfs, where the kobjects associated
+   with the kset can show up.  Every kset contains a kobject which can be
+   set up to be the parent of other kobjects; the top-level directories of
+   the sysfs hierarchy are constructed in this way.
+
+ - Ksets can support the "hotplugging" of kobjects and influence how
+   uevent events are reported to user space.
+
+In object-oriented terms, "kset" is the top-level container class; ksets
+contain their own kobject, but that kobject is managed by the kset code and
+should not be manipulated by any other user.
+
+A kset keeps its children in a standard kernel linked list.  Kobjects point
+back to their containing kset via their kset field. In almost all cases,
+the kobjects belonging to a kset have that kset (or, strictly, its embedded
+kobject) in their parent.
+
+As a kset contains a kobject within it, it should always be dynamically
+created and never declared statically or on the stack.  To create a new
+kset use::
+
+  struct kset *kset_create_and_add(const char *name,
+                                   struct kset_uevent_ops *u,
+                                   struct kobject *parent);
+
+When you are finished with the kset, call::
+
+  void kset_unregister(struct kset *kset);
+
+to destroy it.  This removes the kset from sysfs and decrements its reference
+count.  When the reference count goes to zero, the kset will be released.
+Because other references to the kset may still exist, the release may happen
+after kset_unregister() returns.
+
+An example of using a kset can be seen in the
+``samples/kobject/kset-example.c`` file in the kernel tree.
+
+If a kset wishes to control the uevent operations of the kobjects
+associated with it, it can use the struct kset_uevent_ops to handle it::
+
+  struct kset_uevent_ops {
+          int (*filter)(struct kset *kset, struct kobject *kobj);
+          const char *(*name)(struct kset *kset, struct kobject *kobj);
+          int (*uevent)(struct kset *kset, struct kobject *kobj,
+                        struct kobj_uevent_env *env);
+  };
+
+
+The filter function allows a kset to prevent a uevent from being emitted to
+userspace for a specific kobject.  If the function returns 0, the uevent
+will not be emitted.
+
+The name function will be called to override the name of the kset that the
+uevent sends to userspace.  By default, that name is the name of the kset
+itself, but this function, if present, can override it.
+
+The uevent function will be called when the uevent is about to be sent to
+userspace to allow more environment variables to be added to the uevent.
+
+One might ask how, exactly, a kobject is added to a kset, given that no
+functions which perform that task have been presented.  The answer is
+that this is handled by kobject_add().  When a kobject is passed to
+kobject_add(), its kset member should point to the kset to which the
+kobject will belong.  kobject_add() will handle the rest.
+
+If the kobject belonging to a kset has no parent kobject set, it will be
+added to the kset's directory.  Not all members of a kset necessarily
+live in the kset directory.  If an explicit parent kobject is assigned
+before the kobject is added, the kobject is registered with the kset, but
+added below the parent kobject.
+
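The parent rule described above can be sketched in userspace as follows. This is an illustration under assumptions, not the kernel implementation: ``toy_kobject``, ``toy_kset`` and ``toy_add()`` are hypothetical names, and a simple member counter stands in for the kset's linked list.

```c
#include <stddef.h>

struct toy_kset;

struct toy_kobject {
        const char *name;
        struct toy_kobject *parent;     /* decides where the object appears */
        struct toy_kset *kset;          /* decides which set it belongs to */
};

struct toy_kset {
        struct toy_kobject kobj;        /* embedded kobject: the kset's directory */
        int members;                    /* stand-in for the kset's list */
};

/* Hypothetical analogue of the membership step described in the text. */
static void toy_add(struct toy_kobject *kobj)
{
        if (!kobj->kset)
                return;

        /* No explicit parent: fall back to the kset's own kobject. */
        if (!kobj->parent)
                kobj->parent = &kobj->kset->kobj;

        /* Membership is recorded regardless of where the parent points. */
        kobj->kset->members++;
}
```

An object added with only its ``kset`` field set ends up under the kset's embedded kobject, while one with an explicit ``parent`` keeps that parent and is still counted as a member of the set.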
+
+Kobject removal
+===============
+
+After a kobject has been registered with the kobject core successfully, it
+must be cleaned up when the code is finished with it.  To do that, call
+kobject_put().  By doing this, the kobject core will automatically clean up
+all of the memory allocated by this kobject.  If a ``KOBJ_ADD`` uevent has been
+sent for the object, a corresponding ``KOBJ_REMOVE`` uevent will be sent, and
+any other sysfs housekeeping will be handled for the caller properly.
+
+If y