diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2018-08-13 22:34:47 -0700 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2018-08-13 22:34:47 -0700 |
commit | 10f3e23f07cb0c20f9bcb77a5b5a7eb2a1b2a2fe (patch) | |
tree | 1fcb34309b3542512c6f3345f092f7adb8c3312c /Documentation | |
parent | 3bb37da509e576c80180fa0e4d1cfcaddf0cb82e (diff) | |
parent | 863c37fcb14f8b66ea831b45fb35a53ac4a8d69e (diff) |
Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
- Convert content from the ext4 wiki to Documentation rst files so it
is more likely to be updated as we add new features to ext4.
- Add 64-bit timestamp support to ext4's superblock fields.
- ... and the usual bug fixes and cleanups, including a Spectre gadget
fixup and some hardening against maliciously corrupted file systems.
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (34 commits)
ext4: remove unneeded variable "err" in ext4_mb_release_inode_pa()
ext4: improve code readability in ext4_iget()
ext4: fix spectre gadget in ext4_mb_regular_allocator()
ext4: check for NUL characters in extended attribute's name
ext4: use ext4_warning() for sb_getblk failure
ext4: fix race when setting the bitmap corrupted flag
ext4: reset error code in ext4_find_entry in fallback
ext4: handle layout changes to pinned DAX mappings
dax: dax_layout_busy_page() warn on !exceptional
docs: fix up the obviously obsolete bits in the new ext4 documentation
docs: add new ext4 superblock time extension fields
docs: create filesystem internal section
ext4: use swap macro in mext_page_double_lock
ext4: check allocation failure when duplicating "data" in ext4_remount()
ext4: fix warning message in ext4_enable_quotas()
ext4: super: extend timestamps to 40 bits
jbd2: replace current_kernel_time64 with ktime equivalent
ext4: use timespec64 for all inode times
ext4: use ktime_get_real_seconds for i_dtime
ext4: use 64-bit timestamps for mmp_time
...
Diffstat (limited to 'Documentation')
27 files changed, 3840 insertions, 79 deletions
diff --git a/Documentation/conf.py b/Documentation/conf.py index 62ac5a9f3a9f..b691af4831fa 100644 --- a/Documentation/conf.py +++ b/Documentation/conf.py @@ -34,7 +34,7 @@ needs_sphinx = '1.3' # Add any Sphinx extension module names here, as strings. They can be # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom # ones. -extensions = ['kerneldoc', 'rstFlatTable', 'kernel_include', 'cdomain', 'kfigure'] +extensions = ['kerneldoc', 'rstFlatTable', 'kernel_include', 'cdomain', 'kfigure', 'sphinx.ext.ifconfig'] # The name of the math extension changed on Sphinx 1.4 if major == 1 and minor > 3: diff --git a/Documentation/filesystems/ext4.txt b/Documentation/filesystems/ext4/ext4.rst index 7f628b9f7c4b..9d4368d591fa 100644 --- a/Documentation/filesystems/ext4.txt +++ b/Documentation/filesystems/ext4/ext4.rst @@ -1,6 +1,8 @@ +.. SPDX-License-Identifier: GPL-2.0 -Ext4 Filesystem -=============== +======================== +General Information +======================== Ext4 is an advanced level of the ext3 filesystem which incorporates scalability and reliability enhancements for supporting large filesystems @@ -11,37 +13,30 @@ Mailing list: linux-ext4@vger.kernel.org Web site: http://ext4.wiki.kernel.org -1. Quick usage instructions: -=========================== +Quick usage instructions +======================== Note: More extensive information for getting started with ext4 can be - found at the ext4 wiki site at the URL: - http://ext4.wiki.kernel.org/index.php/Ext4_Howto +found at the ext4 wiki site at the URL: +http://ext4.wiki.kernel.org/index.php/Ext4_Howto - - Compile and install the latest version of e2fsprogs (as of this - writing version 1.41.3) from: + - The latest version of e2fsprogs can be found at: + + https://www.kernel.org/pub/linux/kernel/people/tytso/e2fsprogs/ - http://sourceforge.net/project/showfiles.php?group_id=2406 - or - https://www.kernel.org/pub/linux/kernel/people/tytso/e2fsprogs/ + http://sourceforge.net/project/showfiles.php?group_id=2406 or grab the latest git repository from: - git://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git - - - Note that it is highly important to install the mke2fs.conf file - that comes with the e2fsprogs 1.41.x sources in /etc/mke2fs.conf. If - you have edited the /etc/mke2fs.conf file installed on your system, - you will need to merge your changes with the version from e2fsprogs - 1.41.x. + https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git - Create a new filesystem using the ext4 filesystem type: - # mke2fs -t ext4 /dev/hda1 + # mke2fs -t ext4 /dev/hda1 - Or to configure an existing ext3 filesystem to support extents: + Or to configure an existing ext3 filesystem to support extents: # tune2fs -O extents /dev/hda1 @@ -50,10 +45,6 @@ Note: More extensive information for getting started with ext4 can be # tune2fs -I 256 /dev/hda1 - (Note: we currently do not have tools to convert an ext4 - filesystem back to ext3; so please do not do try this on production - filesystems.) - - Mounting: # mount -t ext4 /dev/hda1 /wherever @@ -75,10 +66,11 @@ Note: More extensive information for getting started with ext4 can be the filesystem with a large journal can also be helpful for metadata-intensive workloads. -2. Features -=========== +Features +======== -2.1 Currently available +Currently Available +------------------- * ability to use filesystems > 16TB (e2fsprogs support not available yet) * extent format reduces metadata overhead (RAM, IO for access, transactions) @@ -103,31 +95,15 @@ Note: More extensive information for getting started with ext4 can be [1] Filesystems with a block size of 1k may see a limit imposed by the directory hash tree having a maximum depth of two. -2.2 Candidate features for future inclusion - -* online defrag (patches available but not well tested) -* reduced mke2fs time via lazy itable initialization in conjunction with - the uninit_bg feature (capability to do this is available in e2fsprogs - but a kernel thread to do lazy zeroing of unused inode table blocks - after filesystem is first mounted is required for safety) - -There are several others under discussion, whether they all make it in is -partly a function of how much time everyone has to work on them. Features like -metadata checksumming have been discussed and planned for a bit but no patches -exist yet so I'm not sure they're in the near-term roadmap. - -The big performance win will come with mballoc, delalloc and flex_bg -grouping of bitmaps and inode tables. Some test results available here: - - - http://www.bullopensource.org/ext4/20080818-ffsb/ffsb-write-2.6.27-rc1.html - - http://www.bullopensource.org/ext4/20080818-ffsb/ffsb-readwrite-2.6.27-rc1.html - -3. Options -========== +Options +======= When mounting an ext4 filesystem, the following option are accepted: (*) == default +======================= ======================================================= +Mount Option Description +======================= ======================================================= ro Mount filesystem read only. Note that ext4 will replay the journal (and thus write to the partition) even when mounted "read only". The @@ -387,33 +363,38 @@ i_version Enable 64-bit inode version support. This option is dax Use direct access (no page cache). See Documentation/filesystems/dax.txt. Note that this option is incompatible with data=journal. +======================= ======================================================= Data Mode ========= There are 3 different data modes: * writeback mode -In data=writeback mode, ext4 does not journal data at all. This mode provides -a similar level of journaling as that of XFS, JFS, and ReiserFS in its default -mode - metadata journaling. A crash+recovery can cause incorrect data to -appear in files which were written shortly before the crash. This mode will -typically provide the best ext4 performance. + + In data=writeback mode, ext4 does not journal data at all. This mode provides + a similar level of journaling as that of XFS, JFS, and ReiserFS in its default + mode - metadata journaling. A crash+recovery can cause incorrect data to + appear in files which were written shortly before the crash. This mode will + typically provide the best ext4 performance. * ordered mode -In data=ordered mode, ext4 only officially journals metadata, but it logically -groups metadata information related to data changes with the data blocks into a -single unit called a transaction. When it's time to write the new metadata -out to disk, the associated data blocks are written first. In general, -this mode performs slightly slower than writeback but significantly faster than journal mode. + + In data=ordered mode, ext4 only officially journals metadata, but it logically + groups metadata information related to data changes with the data blocks into + a single unit called a transaction. When it's time to write the new metadata + out to disk, the associated data blocks are written first. In general, this + mode performs slightly slower than writeback but significantly faster than + journal mode. * journal mode -data=journal mode provides full data and metadata journaling. All new data is -written to the journal first, and then to its final location. -In the event of a crash, the journal can be replayed, bringing both data and -metadata into a consistent state. This mode is the slowest except when data -needs to be read from and written to disk at the same time where it -outperforms all others modes. Enabling this mode will disable delayed -allocation and O_DIRECT support. + + data=journal mode provides full data and metadata journaling. All new data is + written to the journal first, and then to its final location. In the event of + a crash, the journal can be replayed, bringing both data and metadata into a + consistent state. This mode is the slowest except when data needs to be read + from and written to disk at the same time where it outperforms all others + modes. Enabling this mode will disable delayed allocation and O_DIRECT + support. /proc entries ============= @@ -425,10 +406,12 @@ Information about mounted ext4 file systems can be found in in table below. Files in /proc/fs/ext4/<devname> -.............................................................................. + +================ ======= File Content +================ ======= mb_groups details of multiblock allocator buddy cache of free blocks -.............................................................................. +================ ======= /sys entries ============ @@ -439,28 +422,30 @@ Information about mounted ext4 file systems can be found in /sys/fs/ext4/dm-0). The files in each per-device directory are shown in table below. -Files in /sys/fs/ext4/<devname> +Files in /sys/fs/ext4/<devname>: + (see also Documentation/ABI/testing/sysfs-fs-ext4) -.............................................................................. - File Content +============================= ================================================= +File Content +============================= ================================================= delayed_allocation_blocks This file is read-only and shows the number of blocks that are dirty in the page cache, but which do not have their location in the filesystem allocated yet. - inode_goal Tuning parameter which (if non-zero) controls +inode_goal Tuning parameter which (if non-zero) controls the goal inode used by the inode allocator in preference to all other allocation heuristics. This is intended for debugging use only, and should be 0 on production systems. - inode_readahead_blks Tuning parameter which controls the maximum +inode_readahead_blks Tuning parameter which controls the maximum number of inode table blocks that ext4's inode table readahead algorithm will pre-read into the buffer cache - lifetime_write_kbytes This file is read-only and shows the number of +lifetime_write_kbytes This file is read-only and shows the number of kilobytes of data that have been written to this filesystem since it was created. @@ -508,7 +493,7 @@ Files in /sys/fs/ext4/<devname> in the file system. If there is not enough space for the reserved space when mounting the file mount will _not_ fail. -.............................................................................. +============================= ================================================= Ioctls ====== @@ -518,8 +503,10 @@ through the system call interfaces. The list of all Ext4 specific ioctls are shown in the table below. Table of Ext4 specific ioctls -.............................................................................. - Ioctl Description + +============================= ================================================= +Ioctl Description +============================= ================================================= EXT4_IOC_GETFLAGS Get additional attributes associated with inode. The ioctl argument is an integer bitfield, with bit values described in ext4.h. This ioctl is an @@ -610,8 +597,7 @@ Table of Ext4 specific ioctls normal user by accident. The data blocks of the previous boot loader will be associated with the given inode. - -.............................................................................. +============================= ================================================= References ========== diff --git a/Documentation/filesystems/ext4/index.rst b/Documentation/filesystems/ext4/index.rst new file mode 100644 index 000000000000..71121605558c --- /dev/null +++ b/Documentation/filesystems/ext4/index.rst @@ -0,0 +1,17 @@ +.. SPDX-License-Identifier: GPL-2.0 + +=============== +ext4 Filesystem +=============== + +General usage and on-disk artifacts writen by ext4. More documentation may +be ported from the wiki as time permits. This should be considered the +canonical source of information as the details here have been reviewed by +the ext4 community. + +.. toctree:: + :maxdepth: 5 + :numbered: + + ext4 + ondisk/index diff --git a/Documentation/filesystems/ext4/ondisk/about.rst b/Documentation/filesystems/ext4/ondisk/about.rst new file mode 100644 index 000000000000..0aadba052264 --- /dev/null +++ b/Documentation/filesystems/ext4/ondisk/about.rst @@ -0,0 +1,44 @@ +.. SPDX-License-Identifier: GPL-2.0 + +About this Book +=============== + +This document attempts to describe the on-disk format for ext4 +filesystems. The same general ideas should apply to ext2/3 filesystems +as well, though they do not support all the features that ext4 supports, +and the fields will be shorter. + +**NOTE**: This is a work in progress, based on notes that the author +(djwong) made while picking apart a filesystem by hand. The data +structure definitions should be current as of Linux 4.18 and +e2fsprogs-1.44. All comments and corrections are welcome, since there is +undoubtedly plenty of lore that might not be reflected in freshly +created demonstration filesystems. + +License +------- +This book is licensed under the terms of the GNU Public License, v2. + +Terminology +----------- + +ext4 divides a storage device into an array of logical blocks both to +reduce bookkeeping overhead and to increase throughput by forcing larger +transfer sizes. Generally, the block size will be 4KiB (the same size as +pages on x86 and the block layer's default block size), though the +actual size is calculated as 2 ^ (10 + ``sb.s_log_block_size``) bytes. +Throughout this document, disk locations are given in terms of these +logical blocks, not raw LBAs, and not 1024-byte blocks. For the sake of +convenience, the logical block size will be referred to as +``$block_size`` throughout the rest of the document. + +When referenced in ``preformatted text`` blocks, ``sb`` refers to fields +in the super block, and ``inode`` refers to fields in an inode table +entry. + +Other References +---------------- + +Also see http://www.nongnu.org/ext2-doc/ for quite a collection of +information about ext2/3. Here's another old reference: +http://wiki.osdev.org/Ext2 diff --git a/Documentation/filesystems/ext4/ondisk/allocators.rst b/Documentation/filesystems/ext4/ondisk/allocators.rst new file mode 100644 index 000000000000..7aa85152ace3 --- /dev/null +++ b/Documentation/filesystems/ext4/ondisk/allocators.rst @@ -0,0 +1,56 @@ +.. SPDX-License-Identifier: GPL-2.0 + +Block and Inode Allocation Policy +--------------------------------- + +ext4 recognizes (better than ext3, anyway) that data locality is +generally a desirably quality of a filesystem. On a spinning disk, +keeping related blocks near each other reduces the amount of movement +that the head actuator and disk must perform to access a data block, +thus speeding up disk IO. On an SSD there of course are no moving parts, +but locality can increase the size of each transfer request while +reducing the total number of requests. This locality may also have the +effect of concentrating writes on a single erase block, which can speed +up file rewrites significantly. Therefore, it is useful to reduce +fragmentation whenever possible. + +The first tool that ext4 uses to combat fragmentation is the multi-block +allocator. When a file is first created, the block allocator +speculatively allocates 8KiB of disk space to the file on the assumption +that the space will get written soon. When the file is closed, the +unused speculative allocations are of course freed, but if the +speculation is correct (typically the case for full writes of small +files) then the file data gets written out in a single multi-block +extent. A second related trick that ext4 uses is delayed allocation. +Under this scheme, when a file needs more blocks to absorb file writes, +the filesystem defers deciding the exact placement on the disk until all +the dirty buffers are being written out to disk. By not committing to a +particular placement until it's absolutely necessary (the commit timeout +is hit, or sync() is called, or the kernel runs out of memory), the hope +is that the filesystem can make better location decisions. + +The third trick that ext4 (and ext3) uses is that it tries to keep a +file's data blocks in the same block group as its inode. This cuts down +on the seek penalty when the filesystem first has to read a file's inode +to learn where the file's data blocks live and then seek over to the +file's data blocks to begin I/O operations. + +The fourth trick is that all the inodes in a directory are placed in the +same block group as the directory, when feasible. The working assumption +here is that all the files in a directory might be related, therefore it +is useful to try to keep them all together. + +The fifth trick is that the disk volume is cut up into 128MB block +groups; these mini-containers are used as outlined above to try to +maintain data locality. However, there is a deliberate quirk -- when a +directory is created in the root directory, the inode allocator scans +the block groups and puts that directory into the least heavily loaded +block group that it can find. This encourages directories to spread out +over a disk; as the top-level directory/file blobs fill up one block +group, the allocators simply move on to the next block group. Allegedly +this scheme evens out the loading on the block groups, though the author +suspects that the directories which are so unlucky as to land towards +the end of a spinning drive get a raw deal performance-wise. + +Of course if all of these mechanisms fail, one can always use e4defrag +to defragment files. diff --git a/Documentation/filesystems/ext4/ondisk/attributes.rst b/Documentation/filesystems/ext4/ondisk/attributes.rst new file mode 100644 index 000000000000..0b01b67b81fe --- /dev/null +++ b/Documentation/filesystems/ext4/ondisk/attributes.rst @@ -0,0 +1,191 @@ +.. SPDX-License-Identifier: GPL-2.0 + +Extended Attributes +------------------- + +Extended attributes (xattrs) are typically stored in a separate data +block on the disk and referenced from inodes via ``inode.i_file_acl*``. +The first use of extended attributes seems to have been for storing file +ACLs and other security data (selinux). With the ``user_xattr`` mount +option it is possible for users to store extended attributes so long as +all attribute names begin with “user”; this restriction seems to have +disappeared as of Linux 3.0. + +There are two places where extended attributes can be found. The first +place is between the end of each inode entry and the beginning of the +next inode entry. For example, if inode.i\_extra\_isize = 28 and +sb.inode\_size = 256, then there are 256 - (128 + 28) = 100 bytes +available for in-inode extended attribute storage. The second place +where extended attributes can be found is in the block pointed to by +``inode.i_file_acl``. As of Linux 3.11, it is not possible for this +block to contain a pointer to a second extended attribute block (or even +the remaining blocks of a cluster). In theory it is possible for each +attribute's value to be stored in a separate data block, though as of +Linux 3.11 the code does not permit this. + +Keys are generally assumed to be ASCIIZ strings, whereas values can be +strings or binary data. + +Extended attributes, when stored after the inode, have a header +``ext4_xattr_ibody_header`` that is 4 bytes long: + +.. list-table:: + :widths: 1 1 1 77 + :header-rows: 1 + + * - Offset + - Type + - Name + - Description + * - 0x0 + - \_\_le32 + - h\_magic + - Magic number for identification, 0xEA020000. This value is set by the + Linux driver, though e2fsprogs doesn't seem to check it(?) + +The beginning of an extended attribute block is in +``struct ext4_xattr_header``, which is 32 bytes long: + +.. list-table:: + :widths: 1 1 1 77 + :header-rows: 1 + + * - Offset + - Type + - Name + - Description + * - 0x0 + - \_\_le32 + - h\_magic + - Magic number for identification, 0xEA020000. + * - 0x4 + - \_\_le32 + - h\_refcount + - Reference count. + * - 0x8 + - \_\_le32 + - h\_blocks + - Number of disk blocks used. + * - 0xC + - \_\_le32 + - h\_hash + - Hash value of all attributes. + * - 0x10 + - \_\_le32 + - h\_checksum + - Checksum of the extended attribute block. + * - 0x14 + - \_\_u32 + - h\_reserved[2] + - Zero. + +The checksum is calculated against the FS UUID, the 64-bit block number +of the extended attribute block, and the entire block (header + +entries). + +Following the ``struct ext4_xattr_header`` or +``struct ext4_xattr_ibody_header`` is an array of +``struct ext4_xattr_entry``; each of these entries is at least 16 bytes +long. When stored in an external block, the ``struct ext4_xattr_entry`` +entries must be stored in sorted order. The sort order is +``e_name_index``, then ``e_name_len``, and finally ``e_name``. +Attributes stored inside an inode do not need be stored in sorted order. + +.. list-table:: + :widths: 1 1 1 77 + :header-rows: 1 + + * - Offset + - Type + - Name + - Description + * - 0x0 + - \_\_u8 + - e\_name\_len + - Length of name. + * - 0x1 + - \_\_u8 + - e\_name\_index + - Attribute name index. There is a discussion of this below. + * - 0x2 + - \_\_le16 + - e\_value\_offs + - Location of this attribute's value on the disk block where it is stored. + Multiple attributes can share the same value. For an inode attribute + this value is relative to the start of the first entry; for a block this + value is relative to the start of the block (i.e. the header). + * - 0x4 + - \_\_le32 + - e\_value\_inum + - The inode where the value is stored. Zero indicates the value is in the + same block as this entry. This field is only used if the + INCOMPAT\_EA\_INODE feature is enabled. + * - 0x8 + - \_\_le32 + - e\_value\_size + - Length of attribute value. + * - 0xC + - \_\_le32 + - e\_hash + - Hash value of attribute name and attribute value. The kernel doesn't + update the hash for in-inode attributes, so for that case this value + must be zero, because e2fsck validates any non-zero hash regardless of + where the xattr lives. + * - 0x10 + - char + - e\_name[e\_name\_len] + - Attribute name. Does not include trailing NULL. + +Attribute values can follow the end of the entry table. There appears to +be a requirement that they be aligned to 4-byte boundaries. The values +are stored starting at the end of the block and grow towards the +xattr\_header/xattr\_entry table. When the two collide, the overflow is +put into a separate disk block. If the disk block fills up, the +filesystem returns -ENOSPC. + +The first four fields of the ``ext4_xattr_entry`` are set to zero to +mark the end of the key list. + +Attribute Name Indices +~~~~~~~~~~~~~~~~~~~~~~ + +Logically speaking, extended attributes are a series of key=value pairs. +The keys are assumed to be NULL-terminated strings. To reduce the amount +of on-disk space that the keys consume, the beginning of the key string +is matched against the attribute name index. If a match is found, the +attribute name index field is set, and matching string is removed from +the key name. Here is a map of name index values to key prefixes: + +.. list-table:: + :widths: 1 79 + :header-rows: 1 + + * - Name Index + - Key Prefix + * - 0 + - (no prefix) + * - 1 + - “user.” + * - 2 + - “system.posix\_acl\_access” + * - 3 + - “system.posix\_acl\_default” + * - 4 + - “trusted.” + * - 6 + - “security.” + * - 7 + - “system.” (inline\_data only?) + * - 8 + - “system.richacl” (SuSE kernels only?) + +For example, if the attribute key is “user.fubar”, the attribute name +index is set to 1 and the “fubar” name is recorded on disk. + +POSIX ACLs +~~~~~~~~~~ + +POSIX ACLs are stored in a reduced version of the Linux kernel (and +libacl's) internal ACL format. The key difference is that the version +number is different (1) and the ``e_id`` field is only stored for named +user and group ACLs. diff --git a/Documentation/filesystems/ext4/ondisk/bigalloc.rst b/Documentation/filesystems/ext4/ondisk/bigalloc.rst new file mode 100644 index 000000000000..c6d88557553c --- /dev/null +++ b/Documentation/filesystems/ext4/ondisk/bigalloc.rst @@ -0,0 +1,22 @@ +.. SPDX-License-Identifier: GPL-2.0 + +Bigalloc +-------- + +At the moment, the default size of a block is 4KiB, which is a commonly +supported page size on most MMU-capable hardware. This is fortunate, as +ext4 code is not prepared to handle the case where the block size +exceeds the page size. However, for a filesystem of mostly huge files, +it is desirable to be able to allocate disk blocks in units of multiple +blocks to reduce both fragmentation and metadata overhead. The +`bigalloc <Bigalloc>`__ feature provides exactly this ability. The +administrator can set a block cluster size at mkfs time (which is stored +in the s\_log\_cluster\_size field in the superblock); from then on, the +block bitmaps track clusters, not individual blocks. This means that +block groups can be several gigabytes in size (instead of just 128MiB); +however, the minimum allocation unit becomes a cluster, not a block, +even for directories. TaoBao had a patchset to extend the “use units of +clusters instead of blocks” to the extent tree, though it is not clear +where those patches went-- they eventually morphed into “extent tree v2” +but that code has not landed as of May 2015. + diff --git a/Documentation/filesystems/ext4/ondisk/bitmaps.rst b/Documentation/filesystems/ext4/ondisk/bitmaps.rst new file mode 100644 index 000000000000..c7546dbc197a --- /dev/null +++ b/Documentation/filesystems/ext4/ondisk/bitmaps.rst @@ -0,0 +1,28 @@ +.. SPDX-License-Identifier: GPL-2.0 + +Block and inode Bitmaps +----------------------- + +The data block bitmap tracks the usage of data blocks within the block +group. + +The inode bitmap records which entries in the inode table are in use. + +As with most bitmaps, one bit represents the usage status of one data +block or inode table entry. This implies a block group size of 8 \* +number\_of\_bytes\_in\_a\_logical\_block. + +NOTE: If ``BLOCK_UNINIT`` is set for a given block group, various parts +of the kernel and e2fsprogs code pretends that the block bitmap contains +zeros (i.e. all blocks in the group are free). However, it is not +necessarily the case that no blocks are in use -- if ``meta_bg`` is set, +the bitmaps and group descriptor live inside the group. Unfortunately, +ext2fs\_test\_block\_bitmap2() will return '0' for those locations, +which produces confusing debugfs output. + +Inode Table +----------- +Inode tables are statically allocated at mkfs time. Each block group +descriptor points to the start of the table, and the superblock records +the number of inodes per group. See the section on inodes for more +information. diff --git a/Documentation/filesystems/ext4/ondisk/blockgroup.rst b/Documentation/filesystems/ext4/ondisk/blockgroup.rst new file mode 100644 index 000000000000..baf888e4c06a --- /dev/null +++ b/Documentation/filesystems/ext4/ondisk/blockgroup.rst @@ -0,0 +1,135 @@ +.. SPDX-License-Identifier: GPL-2.0 + +Layout +------ + +The layout of a standard block group is a |