docs: Improve error handling docs (#1919)

* docs: Improve error handling docs and add more TL;DRs and new sections about handling specific errors, and logging errors. * contributing: Add more coding tips
author: har7an <99636919+har7an@users.noreply.github.com> 2022-11-09 18:01:36 +0000
committer: GitHub <noreply@github.com> 2022-11-09 18:01:36 +0000
commit: 49b831c032aa4d7f770847ba0c03050a09c80fe1 (patch)
tree: 9aa6efdc4bd85a47eb08914cd45afc82ad325a24 /docs
parent: d4beabfeb294f7abad06891b2f6c9cf07cce60c6 (diff)
1 files changed, 185 insertions, 1 deletions
diff --git a/docs/ERROR_HANDLING.md b/docs/ERROR_HANDLING.md
index 6304e9661..da12d57b7 100644
--- a/docs/ERROR_HANDLING.md
+++ b/docs/ERROR_HANDLING.md
@@ -92,12 +92,16 @@ to handle it. Instead of `panic`ing the application, the error is written to
 the application log and execution continues. Please use this sparingly, as an
 error usually calls for actions to be taken rather than ignoring it.
 
+
+
 # Examples of applied error handling
 
 You can have a look at the commit that introduced error handling to the
 `zellij_server::screen` module [right here][1] (look at the changes in
 `zellij-server/src/screen.rs`). We'll use this to demonstrate a few things in
-the following text.
+the following text. You can find countless other examples in the [tracking
+issue for error handling][3]
+
 
 ## Converting a function to return a `Result` type
 
@@ -184,6 +188,7 @@ time to do something about the error.
 In general, any function calling `unwrap` or `expect` is a good candidate to be
 rewritten to return a `Result` type instead.
 
+
 ## Attaching context
 
 [Anyhow][anyhow]s [`Context`][context] trait gives us two methods to attach
@@ -243,8 +248,13 @@ Feel free to move context string/closure to a variable to avoid copy-pasting:
 When there is only a single `Result` to be returned from your function, use
 `context` as shown above for `resize_to_screen`.
 
+
 ## Choosing helpful context messages
 
+> **TL;DR**
+> - Don't repeat what the error message in the `Result` says
+> - Describe what you were doing, ideally include the current functions name
+
 When attaching context to an error, usually you want to express what you were
 doing when the error occurred, i.e. in what context the error occurred. In the
 `render` method, we could have done something like this instead:
@@ -276,8 +286,13 @@ give us a [`NotFound`][2] error), so we don't have to repeat that.
 In case of doubt, look at the name of the function you're currently working in
 and write a context message somehow mentioning this.
 
+
 ## Terminating execution
 
+> **TL;DR**
+> - Terminate execution on errors by adding `.fatal()` to it
+> - First try to pass the error as far up as you can or deem reasonable
+
 We want to propagate errors as far up as we can. This way, every function along
 the way can at least attach a context message giving us an idea what chain of
 events lead to the error. Where do we terminate execution in `Screen`? If you
@@ -330,6 +345,10 @@ resulting output!
 
 ## Error handling for `Option` types
 
+> **TL;DR**
+> - Attach a `.context` with a message saying why a `None` here is an error
+> - Attach a regular context message like you would for a `Result` type, too!
+
 Beyond what's described in "Choosing helpful context messages" above, `Option`
 types benefit from extra handling. That is because a `Option` containing a
 `None` where a value is expected doesn't carry an error message: It doesn't
@@ -365,13 +384,178 @@ attach a description manually. The second context:
 then describes what the surrounding function was trying to achieve (See
 descriptions above).
 
+
+## Logging errors
+
+> **TL;DR**  
+> - When there's a `Result` type around, use `.non_fatal()` on that instead of `log::error!`
+> - When there's a `Err` type around, use `Err::<(), _>(err).non_fatal()`
+> - Also attach context before logging!
+> - For further examples, refer to [PR #1881][pr1881]
+
+You may encounter situations where you have an error and decide it's safe to
+ignore. Depending on the circumstances, this is a perfectly fine thing to do.
+However, oftentimes it proves to be useful to at least log the error, so in
+case things do go wrong we at least see the logged error message. Also, the
+logged message may hint towards an underlying problem which may require further
+action.
+
+An obvious thing to do is something like the following:
+
+```rust
+log::error!("failed to find tab with index {tab_index}");
+```
+
+While an ad-hoc log message is better than silently ignoring the error, we can
+usually do better than that. That is because in large parts of the codebase we
+have a `Result` available in one form or another.
+
+If the `Result` has been treated as suggested above and context messages have
+been attached to it, it already contains a lot of valuable information. This is
+lost when instead we log a custom error. Hence, the better solution is to log
+the `Result` type including all the context information.
+
+This is easily achieved like so:
+
+```rust
+fs::create_dir_all(&plugin_global_data_dir)
+    .context("failed to create plugin asset directory")
+    .non_fatal();
+```
+
+Here, we try to create a directory, and if we fail to do this we simply log the
+error (with `non_fatal()`) and continue. It's important to note that the
+`non_fatal()` function always returns `()`: It cannot be used on any `Result`
+type whose `Ok` value is different from `()`. This is on purpose: If your
+`Result` carries a type, it is probably required for further
+calculations/actions. Hence, you mustn't ignore it.
+
+Also note that, even though we log the error and continue, we still attach
+context information to it. Just like with fatal errors, the context information
+allows us to tell what we tried to do before we got the error and logged it.
+
+This is a simple example, and oftentimes the `Result` you're trying to handle
+doesn't carry a `()`. Your code may look more like this:
+
+```rust
+if let Ok(active_tab) = self.get_active_tab(client_id) {
+    let active_tab_pos = active_tab.position;
+    let new_tab_pos = (active_tab_pos + 1) % self.tabs.len();
+    return self.switch_active_tab(new_tab_pos, client_id);
+} else {
+    log::error!("Active tab not found for client_id: {:?}", client_id);
+}
+```
+
+When we get an `Ok`, we do something with it. When we get an `Err`, we log a
+message. Here you'll notice that the usage of an `if let` statement hides the
+error from us: In the `else` branch, we have no means to access the error we
+got from `get_active_tab()` above. In such a situation, it is helpful to
+rewrite the `if let` to a `match` statement instead:
+
+```rust
+match self.get_active_tab(client_id) {
+    Ok(active_tab) => {
+        let active_tab_pos = active_tab.position;
+        let new_tab_pos = (active_tab_pos + 1) % self.tabs.len();
+        return self.switch_active_tab(new_tab_pos, client_id);
+    },
+    Err(err) => Err::<(), _>(err).with_context(err_context).non_fatal(),
+}
+```
+
+The interesting part here is what happens after the `Err(err)`. Let's break it down:
+
+1. `Err(err) =>`: We're making the `Err` value that was inside the `Result`
+   accessible in the variable `err`
+2. `Err::<(), _>(err)`: We're wrapping the `err` back into a `Result` with the
+   `Err()` function
+3. `.with_context(err_context)`: We're attaching error context (the actual
+   context isn't shown here)
+4. `.non_fatal()`: We're logging the error with `non_fatal`
+
+Notice that in step 2 we must qualify the final `Result` type, which is the job
+of the `::<(), _>` attached to `Err`. This is necessary because at this point
+Rust cannot determine on its own what the `Ok` value of the `Result` we create
+may be. This isn't relevant for our case, because we *know* the `Result` will
+never be an `Ok` variant, but Rust still requires this.
+
+If you're getting errors about "type annotations needed" or "cannot infer type
+for type parameter `T`" in one of your calls to `Err`, this is most likely
+fixed by adding `::<(), _>`.
+
+
+## Adding Concrete Errors, Handling Specific Errors
+
+> **TL;DR**
+> - Add a new variant to `zellij_utils::errors::ZellijError`
+> - Use `anyhow::Error::downcast_ref::<ZellijError>()` to recover underlying errors
+
+Sometimes you'll find yourself in a situation where you want to react to very
+specific errors. For example, the "command panes" feature in zellij has a
+special handling for "command not found" errors. If all the `anyhow::Error`s
+are the same, how can we distinguish between underlying error types?
+
+This is possible because while `anyhow` can unify a vast amount of errors into
+the `anyhow::Error` type, it also gives us the possibility to recover
+underlying error types. To do this, however, we must first know what error type
+to expect.
+
+External libraries, such as other crates or even `std` will likely define their
+own error types. These error types have distinct error variants that one can
+distinguish and react upon. But what happens, for example, if we create the
+error we want to react to ourselves?
+
+For this purpose, there is the `ZellijError`, which is contained in
+`zellij_utils::errors`. It is built with the [`thiserror`][thiserror] crate and
+hence easily extensible. If you need a specific error type to act upon, just
+define a new variant in `ZellijError`. It is automatically available in any
+source file that has the `use zellij_utils::errors::prelude::*;` statement in
+it.
+
+Once you have created your error instance, as soon as you wrap a `context`
+around it, it is turned into an `anyhow::Error`. This makes it compatible with
+all the other functions in the code that return `anyhow::Result`.
+
+Recovering the error can look, for example, like this:
+
+```rust
+match pty
+    .spawn_terminal(terminal_action, client_or_tab_index)
+    .with_context(err_context)  // <-- Note how we attach a context, but can
+                                //     still recover the error below!
+{
+    Ok(_) => {
+        // ... Whatever
+    },
+    Err(err) => match err.downcast_ref::<ZellijError>() {
+        Some(ZellijError::CommandNotFound { terminal_id, .. }) => {
+            // Do something now that this error occured.
+            // We can even access the values stored inside it, "terminal_id" in
+            // this case
+        },
+        // You can check for other error variants here
+        _ => {
+            // Some other error, which we haven't checked for, occured here.
+            // Now we can, for example, log it!
+            Err::<(), _>(err).non_fatal(),
+        },
+    },
+}
+```
+
+
+
 [tracking_issue]: https://github.com/zellij-org/zellij/issues/1753
 [handle_panic]: https://docs.rs/zellij-utils/latest/zellij_utils/errors/fn.handle_panic.html
 [miette]: https://crates.io/crates/miette
 [anyhow]: https://crates.io/crates/anyhow
+[thiserror]: https://crates.io/crates/thiserror
 [context]: https://docs.rs/anyhow/latest/anyhow/trait.Context.html
 [result]: https://doc.rust-lang.org/stable/std/result/enum.Result.html
 [fatal]: https://docs.rs/zellij-utils/latest/zellij_utils/errors/trait.FatalError.html#tymethod.fatal
 [non_fatal]: https://docs.rs/zellij-utils/latest/zellij_utils/errors/trait.FatalError.html#tymethod.non_fatal
 [1]: https://github.com/zellij-org/zellij/commit/99e2bef8c68bd166cf89e90c8ffe8c02272ab4d3
 [2]: https://doc.rust-lang.org/stable/std/io/enum.ErrorKind.html#variant.NotFound
+[3]: https://github.com/zellij-org/zellij/issues/1753
+[pr1881]: https://github.com/zellij-org/zellij/pull/1881
author	har7an <99636919+har7an@users.noreply.github.com>	2022-11-09 18:01:36 +0000
committer	GitHub <noreply@github.com>	2022-11-09 18:01:36 +0000
commit	49b831c032aa4d7f770847ba0c03050a09c80fe1 (patch)
tree	9aa6efdc4bd85a47eb08914cd45afc82ad325a24 /docs
parent	d4beabfeb294f7abad06891b2f6c9cf07cce60c6 (diff)