author | Tomas Mraz <tomas@openssl.org> | 2023-04-28 19:28:53 +0200 |
---|---|---|
committer | Tomas Mraz <tomas@openssl.org> | 2023-05-18 13:24:05 +0200 |
commit | 95d3c148ca3818a8773f293e9a886a3ec4185353 (patch) | |
tree | 205be394c5c2cc9afd725c086bdaf85253ea17de | |
parent | 831ef5347253a9381c2ab6bd3ca74cbe10995939 (diff) |
Initial design for error handling in QUIC
Reviewed-by: Hugo Landau <hlandau@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/20857)
-rw-r--r-- | doc/designs/quic-design/error-handling.md | 101 |
1 files changed, 101 insertions, 0 deletions
diff --git a/doc/designs/quic-design/error-handling.md b/doc/designs/quic-design/error-handling.md
new file mode 100644
index 0000000000..070304bec4
--- /dev/null
+++ b/doc/designs/quic-design/error-handling.md
@@ -0,0 +1,101 @@
+Error handling in QUIC code
+===========================
+
+Current situation with TLS
+--------------------------
+
+The errors are put on the error stack (rather a queue but error stack is
+used throughout the code base) during the libssl API calls. In most
+(if not all) cases they should appear there only if the API call returns an
+error return value. The `SSL_get_error()` call depends on the stack being
+clean before the API call to be properly able to determine if the API
+call caused a library or system (I/O) error.
+
+The error stacks are thread-local. Libssl API calls from separate threads
+push errors to these separate error stacks. It is unusual to invoke libssl
+APIs with the same SSL object from different threads, but even if it happens,
+it is not a problem as applications are supposed to check for errors
+immediately after the API call on the same thread. There is no such thing as
+Thread-assisted mode of operation.
+
+Constraints
+-----------
+
+We need to keep using the existing ERR API as doing otherwise would
+complicate the existing applications and break our API compatibility promise.
+Even the ERR_STATE structure is public, although deprecated, and thus its
+structure and semantics cannot be changed.
+
+The error stack access is not under a lock (because it is thread-local).
+This complicates _moving errors between threads_.
+
+Error stack entries contain allocated data, copying entries between threads
+implies duplicating it or losing it.
+
+Assumptions
+-----------
+
+This document assumes the actual error state of the QUIC connection (or stream
+for stream level errors) is handled separately from the auxiliary error reason
+entries on the error stack.
+
+We can assume the internal assistance thread is well-behaving in regards
+to the error stack.
+
+We assume there are two types of errors that can be raised in the QUIC
+library calls and in the subordinate libcrypto (and provider) calls. First
+type is an intermittent error that does not really affect the state of the
+QUIC connection - for example EAGAIN returned on a syscall, or unavailability
+of some algorithm where there are other algorithms to try. Second type
+is a permanent error that affects the error state of the QUIC connection.
+Operations on QUIC streams (SSL_write(), SSL_read()) can also trigger errors,
+depending on their effect they are either permanent if they cause the
+QUIC connection to enter an error state, or if they just affect the stream
+they are left on the error stack of the thread that called SSL_write()
+or SSL_read() on the stream.
+
+Design
+------
+
+Return value of SSL_get_error() on QUIC connections or streams does not
+depend on the error stack contents.
+
+Intermittent errors are handled within the library and cleared from the
+error stack before returning to the user.
+
+Permanent errors happening within the assist thread, within SSL_tick()
+processing, or when calling SSL_read()/SSL_write() on a stream need to be
+replicated for SSL_read()/SSL_write() calls on other streams.
+
+Implementation
+--------------
+
+There is an error stack in QUIC_CHANNEL which serves as temporary storage
+for errors happening in the internal assistance thread. When a permanent error
+is detected the error stack entries are moved to this error stack in
+QUIC_CHANNEL.
+
+When returning to an application from a SSL_read()/SSL_write() call with
+a permanent connection error, entries from the QUIC_CHANNEL error stack
+are copied to the thread local error stack. They are always kept on
+the QUIC_CHANNEL error stack as well for possible further calls from
+an application. An additional error reason
+SSL_R_QUIC_CONNECTION_TERMINATED is added to the stack.
+
+SSL_tick() return value
+-----------------------
+
+The return value of SSL_tick() does not depend on whether there is
+a permanent error on the connection. The only case when SSL_tick() may
+return an error is when there was some fatal error processing it
+such as a memory allocation error where no further SSL_tick() calls
+make any sense.
+
+Multi-stream-multi-thread mode
+------------------------------
+
+There is nothing particular that needs to be handled specially for
+multi-stream-multi-thread mode as the error stack entries are always
+copied from the QUIC_CHANNEL after the failure. So if multiple threads
+are calling SSL_read()/SSL_write() simultaneously they all get
+the same error stack entries to report to the user.
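
The move-then-copy flow described in the Implementation and Multi-stream-multi-thread sections of the added document can be illustrated with a small, self-contained C model. This is only a sketch under stated assumptions: `toy_err_stack`, `toy_channel`, the `toy_*` functions and `TOY_REASON_CONNECTION_TERMINATED` are hypothetical names invented here, and the code uses neither the OpenSSL ERR API nor the real QUIC_CHANNEL structure. Error entries are reduced to plain integers, so the allocated-data concern raised under Constraints is not modelled.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_ERRS 8
/* Hypothetical stand-in for SSL_R_QUIC_CONNECTION_TERMINATED. */
#define TOY_REASON_CONNECTION_TERMINATED 9999

/* Toy error stack: a bounded list of integer reason codes. */
typedef struct {
    int reasons[TOY_MAX_ERRS];
    int count;
} toy_err_stack;

/* Toy channel: holds the connection-wide error stack (hypothetical). */
typedef struct {
    toy_err_stack errs;
    int permanent_error;
} toy_channel;

/* Push one reason; returns 0 if the stack is full, 1 otherwise. */
static int toy_push(toy_err_stack *s, int reason)
{
    assert(s != NULL);
    if (s->count >= TOY_MAX_ERRS)
        return 0;
    s->reasons[s->count++] = reason;
    return 1;
}

/*
 * Permanent error detected: MOVE the entries from the stack of the
 * thread that hit the error onto the channel, leaving the source empty.
 */
static void toy_channel_raise_permanent(toy_channel *ch, toy_err_stack *src)
{
    int i;

    for (i = 0; i < src->count; i++)
        toy_push(&ch->errs, src->reasons[i]);
    src->count = 0;
    ch->permanent_error = 1;
}

/*
 * Returning to the application: COPY the channel entries onto the
 * caller's stack and append the extra "connection terminated" reason.
 * The channel keeps its entries so later callers see the same errors.
 * Returns 1 if a permanent error was reported, 0 otherwise.
 */
static int toy_channel_report(const toy_channel *ch, toy_err_stack *dst)
{
    int i;

    if (!ch->permanent_error)
        return 0;
    for (i = 0; i < ch->errs.count; i++)
        toy_push(dst, ch->errs.reasons[i]);
    toy_push(dst, TOY_REASON_CONNECTION_TERMINATED);
    return 1;
}
```

In this model, two application threads that each call `toy_channel_report()` after a failure end up with identical entries on their own stacks, which mirrors the claim that multi-stream-multi-thread mode needs no special handling. A real implementation would additionally have to duplicate or drop any allocated data attached to the entries and protect the channel stack against concurrent access.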