From e66d8631ddb3306bd9f463324c2d9a5d9dc559f7 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:12:01 +0100
Subject: docs: filesystems: convert erofs.txt to ReST

- Add a SPDX header;
- Add a document title;
- Some whitespace fixes and new line breaks;
- Mark literal blocks as such;
- Add table markups;
- Add lists markups;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/402d1d2f7252b8a683f7a9c6867bc5428da64026.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/erofs.rst | 240 ++++++++++++++++++++++++++++++++++++
 1 file changed, 240 insertions(+)
 create mode 100644 Documentation/filesystems/erofs.rst

(limited to 'Documentation/filesystems/erofs.rst')

diff --git a/Documentation/filesystems/erofs.rst b/Documentation/filesystems/erofs.rst
new file mode 100644
index 000000000000..bf145171c2bf
--- /dev/null
+++ b/Documentation/filesystems/erofs.rst
@@ -0,0 +1,240 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+======================================
+Enhanced Read-Only File System - EROFS
+======================================
+
+Overview
+========
+
+EROFS file-system stands for Enhanced Read-Only File System. Different
+from other read-only file systems, it aims to be designed for flexibility,
+scalability, but be kept simple and high performance.
+
+It is designed as a better filesystem solution for the following scenarios:
+
+ - read-only storage media or
+
+ - part of a fully trusted read-only solution, which means it needs to be
+   immutable and bit-for-bit identical to the official golden image for
+   their releases due to security and other considerations and
+
+ - hope to save some extra storage space with guaranteed end-to-end performance
+   by using reduced metadata and transparent file compression, especially
+   for those embedded devices with limited memory (ex, smartphone);
+
+Here is the main features of EROFS:
+
+ - Little endian on-disk design;
+
+ - Currently 4KB block size (nobh) and therefore maximum 16TB address space;
+
+ - Metadata & data could be mixed by design;
+
+ - 2 inode versions for different requirements:
+
+   =====================  ============  =====================================
+                          compact (v1)  extended (v2)
+   =====================  ============  =====================================
+   Inode metadata size    32 bytes      64 bytes
+   Max file size          4 GB          16 EB (also limited by max. vol size)
+   Max uids/gids          65536         4294967296
+   File change time       no            yes (64 + 32-bit timestamp)
+   Max hardlinks          65536         4294967296
+   Metadata reserved      4 bytes       14 bytes
+   =====================  ============  =====================================
+
+ - Support extended attributes (xattrs) as an option;
+
+ - Support xattr inline and tail-end data inline for all files;
+
+ - Support POSIX.1e ACLs by using xattrs;
+
+ - Support transparent file compression as an option:
+   LZ4 algorithm with 4 KB fixed-sized output compression for high performance.
+
+The following git tree provides the file system user-space tools under
+development (ex, formatting tool mkfs.erofs):
+
+- git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs-utils.git
+
+Bugs and patches are welcome, please kindly help us and send to the following
+linux-erofs mailing list:
+
+- linux-erofs mailing list   <linux-erofs@lists.ozlabs.org>
+
+Mount options
+=============
+
+===================    =========================================================
+(no)user_xattr         Setup Extended User Attributes. Note: xattr is enabled
+                       by default if CONFIG_EROFS_FS_XATTR is selected.
+(no)acl                Setup POSIX Access Control List. Note: acl is enabled
+                       by default if CONFIG_EROFS_FS_POSIX_ACL is selected.
+cache_strategy=%s      Select a strategy for cached decompression from now on:
+
+		       ==========  =============================================
+                         disabled  In-place I/O decompression only;
+                        readahead  Cache the last incomplete compressed physical
+                                   cluster for further reading. It still does
+                                   in-place I/O decompression for the rest
+                                   compressed physical clusters;
+                       readaround  Cache the both ends of incomplete compressed
+                                   physical clusters for further reading.
+                                   It still does in-place I/O decompression
+                                   for the rest compressed physical clusters.
+		       ==========  =============================================
+===================    =========================================================
+
+On-disk details
+===============
+
+Summary
+-------
+Different from other read-only file systems, an EROFS volume is designed
+to be as simple as possible::
+
+                                |-> aligned with the block size
+   ____________________________________________________________
+  | |SB| | ... | Metadata | ... | Data | Metadata | ... | Data |
+  |_|__|_|_____|__________|_____|______|__________|_____|______|
+  0 +1K
+
+All data areas should be aligned with the block size, but metadata areas
+may not. All metadatas can be now observed in two different spaces (views):
+
+ 1. Inode metadata space
+
+    Each valid inode should be aligned with an inode slot, which is a fixed
+    value (32 bytes) and designed to be kept in line with compact inode size.
+
+    Each inode can be directly found with the following formula:
+         inode offset = meta_blkaddr * block_size + 32 * nid
+
+    ::
+
+				    |-> aligned with 8B
+					    |-> followed closely
+	+ meta_blkaddr blocks                                      |-> another slot
+	_____________________________________________________________________
+	|  ...   | inode |  xattrs  | extents  | data inline | ... | inode ...
+	|________|_______|(optional)|(optional)|__(optional)_|_____|__________
+		|-> aligned with the inode slot size
+		    .                   .
+		    .                         .
+		.                              .
+		.                                    .
+	    .                                         .
+	    .                                              .
+	.____________________________________________________|-> aligned with 4B
+	| xattr_ibody_header | shared xattrs | inline xattrs |
+	|____________________|_______________|_______________|
+	|->    12 bytes    <-|->x * 4 bytes<-|               .
+			    .                .                 .
+			.                      .                   .
+		.                           .                     .
+	    ._______________________________.______________________.
+	    | id | id | id | id |  ... | id | ent | ... | ent| ... |
+	    |____|____|____|____|______|____|_____|_____|____|_____|
+					    |-> aligned with 4B
+							|-> aligned with 4B
+
+    Inode could be 32 or 64 bytes, which can be distinguished from a common
+    field which all inode versions have -- i_format::
+
+        __________________               __________________
+       |     i_format     |             |     i_format     |
+       |__________________|             |__________________|
+       |        ...       |             |        ...       |
+       |                  |             |                  |
+       |__________________| 32 bytes    |                  |
+                                        |                  |
+                                        |__________________| 64 bytes
+
+    Xattrs, extents, data inline are followed by the corresponding inode with
+    proper alignment, and they could be optional for different data mappings.
+    _currently_ total 4 valid data mappings are supported:
+
+    ==  ====================================================================
+     0  flat file data without data inline (no extent);
+     1  fixed-sized output data compression (with non-compacted indexes);
+     2  flat file data with tail packing data inline (no extent);
+     3  fixed-sized output data compression (with compacted indexes, v5.3+).
+    ==  ====================================================================
+
+    The size of the optional xattrs is indicated by i_xattr_count in inode
+    header. Large xattrs or xattrs shared by many different files can be
+    stored in shared xattrs metadata rather than inlined right after inode.
+
+ 2. Shared xattrs metadata space
+
+    Shared xattrs space is similar to the above inode space, started with
+    a specific block indicated by xattr_blkaddr, organized one by one with
+    proper align.
+
+    Each share xattr can also be directly found by the following formula:
+         xattr offset = xattr_blkaddr * block_size + 4 * xattr_id
+
+    ::
+
+			    |-> aligned by  4 bytes
+	+ xattr_blkaddr blocks                     |-> aligned with 4 bytes
+	_________________________________________________________________________
+	|  ...   | xattr_entry |  xattr data | ... |  xattr_entry | xattr data  ...
+	|________|_____________|_____________|_____|______________|_______________
+
+Directories
+-----------
+All directories are now organized in a compact on-disk format. Note that
+each directory block is divided into index and name areas in order to support
+random file lookup, and all directory entries are _strictly_ recorded in
+alphabetical order in order to support improved prefix binary search
+algorithm (could refer to the related source code).
+
+::
+
+		    ___________________________
+		    /                           |
+		/              ______________|________________
+		/              /              | nameoff1       | nameoffN-1
+    ____________.______________._______________v________________v__________
+    | dirent | dirent | ... | dirent | filename | filename | ... | filename |
+    |___.0___|____1___|_____|___N-1__|____0_____|____1_____|_____|___N-1____|
+	\                           ^
+	\                          |                           * could have
+	\                         |                             trailing '\0'
+	    \________________________| nameoff0
+
+				Directory block
+
+Note that apart from the offset of the first filename, nameoff0 also indicates
+the total number of directory entries in this block since it is no need to
+introduce another on-disk field at all.
+
+Compression
+-----------
+Currently, EROFS supports 4KB fixed-sized output transparent file compression,
+as illustrated below::
+
+	    |---- Variant-Length Extent ----|-------- VLE --------|----- VLE -----
+	    clusterofs                      clusterofs            clusterofs
+	    |                               |                     |   logical data
+    _________v_______________________________v_____________________v_______________
+    ... |    .        |             |        .    |             |  .          | ...
+    ____|____.________|_____________|________.____|_____________|__.__________|____
+	|-> cluster <-|-> cluster <-|-> cluster <-|-> cluster <-|-> cluster <-|
+	    size          size          size          size          size
+	    .                             .                .                   .
+	    .                       .               .                  .
+		.                  .              .                .
+	_______._____________._____________._____________._____________________
+	    ... |             |             |             | ... physical data
+	_______|_____________|_____________|_____________|_____________________
+		|-> cluster <-|-> cluster <-|-> cluster <-|
+		    size          size          size
+
+Currently each on-disk physical cluster can contain 4KB (un)compressed data
+at most. For each logical cluster, there is a corresponding on-disk index to
+describe its cluster type, physical cluster address, etc.
+
+See "struct z_erofs_vle_decompressed_index" in erofs_fs.h for more details.
-- 
cgit v1.2.3