diff options
Diffstat (limited to 'template/notes/notes.md')
-rw-r--r-- | template/notes/notes.md | 331 |
1 files changed, 331 insertions, 0 deletions
diff --git a/template/notes/notes.md b/template/notes/notes.md new file mode 100644 index 0000000..a6d68ea --- /dev/null +++ b/template/notes/notes.md @@ -0,0 +1,331 @@ + +# Outer Most interface + +something like a Mailer which might implement tokio_servie::Service (if +so multiple parameters are wrapped into a tupple) + +mailer contains information like `from` + +`mailer.send_mails( recipients_data, mail_gen )` + +where recipients_data is a iterable mapping from address to recipient specific data, +e.g. `Vec<(Address, Data)>` + +and mail_gen is something like `trait MailGen { fn gen_mail( from, to, data, bits8support ) -> MailBody; }` + +`MailBody` is not `tokio_smtp::MailBody` but has to implement nessesray contraints, +(e.g. implemnting `toki_smtp::IntoMailBody` not that for the beginning this will be +hard encoded but later one a generic variation allowing `smtp` to be switched out +by something else is also possible`) + +MailGen implementations are not done by hand but implemented ontop of something +like a template spec e.g. `struct TemplateSpec { id_template: TemplateId, additional_appendixes: Vec<Appendix> }` + +Where `TemplateId` can is e.g. `reset_link` leading to the creation of a `html` with alternate `plain` +mail iff there is a `reset_link.html` and a `reset_link.plain` template. A `reset_link.html.data` +folder could be used to define inline (mime related) appendixes like embedded images, +but we might want to have a way to define such embeddigns through the data ( +E.g. by mapping `Data => TemplateEnginData` and replacing `EmbeddedFile` variations +by a new related id and adding the `EmbeddedFile(data)` data to the list of embeddings) + + + +# List of parts possible non-ascii and not ascii encodable + +- local-part (address/addr-spec/local-part) + +# Limitations + +Line length limit: + +SHOULD be no more than 78 chars (excluding CRLF!) +MUST NOT be more than 998 chars (excluding CRLF) + +# Orphan `\n`,`\r` + +MUST NOT occur in header (except for folding) +MUST NOT occur in body (except for newline) + +## Header specific limitations + +- encoded word max length of 75 chars +- spaces around encoed words are ignored?? + + +# Email Address part (a@b.e) + +- there is a `domain-literal` version which does use somthing like `[some_thing]`, + we can use puny code for converting domains into ascii but probably can't use + this with `domain-literal`'s + +- `local-part` is `dot-atom` which has leading and trailing `[CFWS]` so comments are alowed + +- MessageId uses a email address like syntax but without supporting spaces/comments + + +# MIME + +fields containing mime types can have parameters with a `<type>; key=value` style +this is mainly used for `multipart/mixed; boundary=blablabla` and similar. + +You have to make sure the boundary does not appear in any of the "sub-bodies", +this is kinda easy for bodies with e.g. content transfer encoding Base64, +but can be tricky in combination with some other content as normal text +can totally contain the boundary. To prevent this: + +- use long boundary strings +- encode the body with base64 even if it's "just" ascii + - OR check the content and encode parts of it if necessary + +you can have multipart in multipart creating a tree, +make sure you don't mix up the boundaries + + +A body part does not have to have any headers, assume default values if +there is no header, bodies which have no header _have to start with a +blank line_ separating 0 headers from the body. + +Header fields of bodies which do not start with `Content-` _are ignored_! + +Contend types: + +- `mixed`, list of sub-bodies with mixed mime types, might be displayed inline or as appendix + - use >>`Content-Disposition` (RFC 2183)<< to controll this, even through it's not standarized yet (or is it by now?) + - default body mime type is `text/plain` +- `digest` for combining muliple messages of content type `message/rfc822` + - e.g. `(multipar/mixed ("table of content") (multipart/digest "message1", "message2"))` + - `message` (mainly `message/rfc822`) contains _another_ email, e.g. for digest + - wait is there a `multipart/message`?? proably not! +- `alternative` multiple alternative versions of the "same" information + - e.g. `(multipart/alternative (text/plain ...) (text/html ...))` + - _place preferred form last!_ (i.e. increasing order of preference) + - interesting usage with `application/X-FixedRecord`+`application/octet-stream` +- `related` (RFC 2387) all bodies are part of one howl, making no (less) sense if placed alone + - the first part is normally the entry point, but this can be chaged through parameters + - (only relevant for parsing AND interpreting it, but not for generating as we can always use the default) + - Content-ID is used to specify a id on each body respectivly which can be used to refer to it (e.g. in HTML) + - in html use e.g. `<img src="cid:the_content_id@goes.here>....</img>` + - example is `(multipart/relat (text/html ...) (image/jpeg (Content-ID <bala@bal.bla>) ...))` for embedding a image INTO a HTML mail +- `report` +- `signed` (body part + signature part) +- `encrypted` (encryption information part + encrypted data (`application/octet-stream`)) +- `form-data` +- `x-mixed-replace` (for server push, don't use by now there are better ways) +- `byteranges` + + +Example mail structure: + +``` +(multipart/mixed + (multipart/alternative + (text/plain ... ) + (multipart/related + (text/hmtl ... '<img src="cid:ContentId@1aim.com"></img>' ... ) + (image/png (Content-ID <ContentId@1aim.com>) ... ) + ... )) + (image/png (Content-Disposition attachment) ...) + (image/png (Content-Disposition attachment) ...)) +``` + +Possible alternate structure: + +``` +(multipart/mixed + (multipart/related + + (multipart/alternative + (text/plain ... '[cid:ContentId@1aim.com]' ... ) + (text/html ... '<img src="cid:ContentId@1aim.com"></img>' ... ) ) + + (image/png (Content-ID <ContentId@1aim.com>) ... ) ) + + (image/png (Content-Disposition attachment) ...) + (image/png (Content-Disposition attachment) ...)) +``` + +but I have not seen the `[cid:...]` for text/plain in any standard, through it might be there. +Also if se we might still have a related specific for the html (for html only stuff) so: +- place Embedding in Data in the outer `multipart/related` +- place Embedding returned by the template in inner `multipart/related` + +# Attatchment + +proposed filenames for attachments can be given through parameters of the disposition header + +it does not allow non ascii character there! + +see rfc2231 for more information, it extends some part wrt.: + +- splitting long parameters (e.g. long file names) +- specifying language and character set +- specifying language for encoded words + +# Encoded Words + +extended by rfc2231 + +additional limits in header fields + +header containing encoded words are limited to 76 bytes + +a "big" text chunk can be split in multiple encoded words seperated by b'\r\n ' + +non encoded words and encoded words can apear in the same header field, but +must be seperate by "linear-white-space" (space) which is NOT removed when +decoding encoded words + +encoded words can appear in: + +- `text` sections where `text` is based on RFC 822! (e.g. Content-Description ) + - in context of RFC 5322 this means `unstructured` count's as text +- `comments` (as alternative to `ctext`,`quoted-pair`,`comment` +- `word`'s within a `phrase` + +**Therefor it MUST NOT appear in any structured header field except withing a `comment` or `phrase`!** + +**You have to encode text which looks like an encoded word** + + + +limitations: + +- in comment's no ')',')' and '"' +- in headers no ' ' + + +# Other + +there is no `[CFWS]` after the `:` in Header fields, +but most (all?) of the parts following them are allowed +to start with a `[CFWS]`. (exception is unstructured where +a `CFWS` can be allowed but also MIGHT be part of the +string) + +CFWS -> (un-) foldable whitespace allowing comments +FWS -> (un-) foldable whitespace without comments + + +# Relevant RFCs +5321, 5322, 6854, 3492, 2045, 2046, 2047, 4288, 4289, 2049, 6531, 5890 + +make sure to not use the outdated versions + + +# Parsing Notes + +be strict when parsing (e.g. only ws and printable in subject line) + +if "some other" strings should still be supported do not do zero +copy, but instead add the data to a new buff _replacing invalid +chars with replacement symbol or just stripping them_ + + +# Non-utf8 Non-Ascci bytes in Mail body + +The mail body can contain non-utf8, non-ascii data (e.g. +utf16 data, images etc.) WITHOUT base64 encoding if +8BITMIME is supported (note there is also BINARY and CHUNKING) + +smtp still considers _the bytes_ corresponding to CR LF and DOT special. + +- there is a line length limit, lines terminate with b'CRLF' +- b'.CRLF' does sill end the body (if preceeded by CRLF, or body starts with it) + - so dot-staching is still done on protocol level + + + +## Hot to handle `obs-` parsings + +we have to be able to parse mails with obsolete syntax (theoretically) +but should never genrate such mails, the encder excepts its underlying +data to be correct, but it might not be if we directly place `obs-` +parsed data there. For many parts this is no problem as the +`obs-` syntax is a lot about having FWS at other positions, +_between components_ (so we won't have a problem with the encoder). +Or some additional obsolete infromations (which we often/allways can just +"skip" over). So we have to check if there are any braking cases and if +we have to not zero copy them when parsing but instead transform them +into a valide representation, in worst case we could add a `not_encodable` +field to some structs. + +# TODO +check if some parts are empty and error if encode is called on them +e.g. empty domain + +make sure trace and resend fields are: + +1. encoded in order (MUST) +2. encoded as blocks (MUST?) +3. encoded before other fields (SHOULD) + +as people may come up with their own trace like fileds, +rule 1 and 2 should appy to all fields + + +make sure trace,resent-* are multi fields + +add a RawUnstructured not doing any encoding, but only validity checking + +# Postponded + +`component::Disposition` should have a `Other` variant, using `Token` (which +means a general extension token type is needed) + +other features like signature, encryption etc. + +check what happens if I "execute" a async/mio/>tokio< +based future in a CPU pool? Does it just do live +polling in the thread? Or does it act more intelligent? +or does it simply fail? + +just before encoding singlepart bodies, resource is resolved, +therefore: + +1. we now have the MediaType + File meta + TransferEncoding +2. add* ContentType header to headers +3. add* ContentTransferEncoding header to headers +4. add* file meta infor to ContentDisposition header if it exists +5. note that >add*< is not modifying Mail, but adds it to the list of headers to encode + + +warn when encoding a Disposition of kind Attachment which's +file_meta has no name set + + +// From RFC 2183: +// NOTE ON PARAMETER VALUE LENGHTS: A short (length <= 78 characters) +// parameter value containing only non-`tspecials' characters SHOULD be +// represented as a single `token'. A short parameter value containing +// only ASCII characters, but including `tspecials' characters, SHOULD +// be represented as `quoted-string'. Parameter values longer than 78 +// characters, or which contain non-ASCII characters, MUST be encoded as +// specified in [RFC 2184]. +provide a gnneral way for encoding header parameter which follow the scheme: +`<mainvalue> *(";" <key>"="<value> )` this are ContentType and ContentDisposition + for now + + +IF Item::Encoded only appears as encoded word, make it Item::Encoded word, +possible checking for "more" validity then noew + + +email::quote => do not escape WSP, and use FWS when encoding +also make quote, generally available for library useers a +create_quoted_string( .. ) + +# Dependencies + +quoted_printable and base64 have some problems: +1. it's speaking of a 76 character limit where it is 78 + it seems they treated the RFC as 78 character including + CRLF where the RFC speaks of 78 characters EXCLUDING + CRLF +2. it's only suited for content transfer encoding the body + as there is a limit of the length of encoded words (75) + which can't be handled by both + +also quoted_printable has another problem: +3. in headers the number of character which can be displayed without + encoding is more limited (e.g. no ' ' ) quoted_printable does not + respect this? (TODO CHECK THIS) |