summaryrefslogtreecommitdiffstats
path: root/template/notes/notes.md
blob: a6d68eaa03b5ebd06b12f36969df601fb0b3aaab (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
# Outer Most interface

something like a Mailer which might implement tokio_servie::Service (if
so multiple parameters are wrapped into a tupple)

mailer contains information like `from`

`mailer.send_mails( recipients_data, mail_gen )`

where recipients_data is a iterable mapping from address to recipient specific data,
e.g. `Vec<(Address, Data)>`

and mail_gen is something like `trait MailGen { fn gen_mail( from, to, data, bits8support ) ->  MailBody; }`

`MailBody` is not `tokio_smtp::MailBody` but has to implement nessesray contraints,
(e.g. implemnting `toki_smtp::IntoMailBody` not that for the beginning this will be
hard encoded but later one a generic variation allowing `smtp` to be switched out
by something else is also possible`)

MailGen implementations are not done by hand but implemented ontop of something
like a template spec e.g. `struct TemplateSpec { id_template: TemplateId, additional_appendixes: Vec<Appendix> }`

Where `TemplateId` can is e.g. `reset_link` leading to the creation of a `html` with alternate `plain`
mail iff there is a `reset_link.html` and a `reset_link.plain` template. A `reset_link.html.data`
folder could be used to define inline (mime related) appendixes like embedded images,
but we might want to have a way to define such embeddigns through the data (
E.g. by mapping `Data => TemplateEnginData` and replacing `EmbeddedFile` variations
by a new related id and adding the `EmbeddedFile(data)` data to the list of embeddings)



# List of parts possible non-ascii and not ascii encodable

- local-part (address/addr-spec/local-part)

# Limitations

Line length limit:

SHOULD be no more than 78 chars (excluding CRLF!)
MUST NOT be more than 998 chars (excluding CRLF)

# Orphan `\n`,`\r`

MUST NOT occur in header (except for folding)
MUST NOT occur in body (except for newline)

## Header specific limitations

- encoded word max length of 75 chars
- spaces around encoed words are ignored??


# Email Address part (a@b.e)

- there is a `domain-literal` version which does use somthing like `[some_thing]`,
  we can use puny code for converting domains into ascii but probably can't use
  this with `domain-literal`'s

- `local-part` is `dot-atom` which has leading and trailing `[CFWS]` so comments are alowed

- MessageId uses a email address like syntax but without supporting spaces/comments


# MIME

fields containing mime types can have parameters with a `<type>; key=value` style
this is mainly used for `multipart/mixed; boundary=blablabla` and similar.

You have to make sure the boundary does not appear in any of the "sub-bodies",
this is kinda easy for bodies with e.g. content transfer encoding Base64,
but can be tricky in combination with some other content as normal text
can totally contain the boundary. To prevent this:

- use long boundary strings
- encode the body with base64 even if it's "just" ascii
    - OR check the content and encode parts of it if necessary

you can have multipart in multipart creating a tree,
make sure you don't mix up the boundaries


A body part does not have to have any headers, assume default values if
there is no header, bodies which have no header _have to start with a
blank line_ separating 0 headers from the body.

Header fields of bodies which do not start with `Content-` _are ignored_!

Contend types:

- `mixed`, list of sub-bodies with mixed mime types, might be displayed inline or as appendix
    - use >>`Content-Disposition` (RFC 2183)<< to controll this, even through it's not standarized yet (or is it by now?)
    - default body mime type is `text/plain`
- `digest` for combining muliple messages of content type `message/rfc822`
    - e.g. `(multipar/mixed ("table of content") (multipart/digest "message1", "message2"))`
    - `message` (mainly `message/rfc822`) contains _another_ email, e.g. for digest
        - wait is there a `multipart/message`?? proably not!
- `alternative` multiple alternative versions of the "same" information
    - e.g. `(multipart/alternative (text/plain ...) (text/html ...))`
    - _place preferred form last!_ (i.e. increasing order of preference)
    - interesting usage with `application/X-FixedRecord`+`application/octet-stream`
- `related` (RFC 2387) all bodies are part of one howl, making no (less) sense if placed alone
    - the first part is normally the entry point, but this can be chaged through parameters
        - (only relevant for parsing AND interpreting it, but not for generating as we can always use the default)
    - Content-ID is used to specify a id on each body respectivly which can be used to refer to it (e.g. in HTML)
        - in html use e.g. `<img src="cid:the_content_id@goes.here>....</img>`
    - example is `(multipart/relat (text/html ...) (image/jpeg (Content-ID <bala@bal.bla>) ...))` for embedding a image INTO a HTML mail
- `report`
- `signed` (body part + signature part)
- `encrypted` (encryption information part + encrypted data (`application/octet-stream`))
- `form-data`
- `x-mixed-replace` (for server push, don't use by now there are better ways)
- `byteranges`


Example mail structure:

```
(multipart/mixed
    (multipart/alternative
        (text/plain ... )
        (multipart/related
            (text/hmtl ... '<img src="cid:ContentId@1aim.com"></img>' ... )
            (image/png (Content-ID <ContentId@1aim.com>) ... )
            ... ))
    (image/png (Content-Disposition attachment) ...)
    (image/png (Content-Disposition attachment) ...))
```

Possible alternate structure:

```
(multipart/mixed
    (multipart/related

        (multipart/alternative
            (text/plain ...  '[cid:ContentId@1aim.com]' ... )
            (text/html ... '<img src="cid:ContentId@1aim.com"></img>' ... ) )

        (image/png (Content-ID <ContentId@1aim.com>) ... ) )

    (image/png (Content-Disposition attachment) ...)
    (image/png (Content-Disposition attachment) ...))
```

but I have not seen the `[cid:...]` for text/plain in any standard, through it might be there.
Also if se we might still have a related specific for the html (for html only stuff) so:
- place Embedding in Data in the outer `multipart/related`
- place Embedding returned by the template in inner `multipart/related`

# Attatchment

proposed filenames for attachments can be given through parameters of the disposition header

it does not allow non ascii character there!

see rfc2231 for more information, it extends some part wrt.:

- splitting long parameters (e.g. long file names)
- specifying language and character set
- specifying language for encoded words

# Encoded Words

extended by rfc2231

additional limits in header fields

header containing encoded words are limited to 76 bytes

a "big" text chunk can be split in multiple encoded words seperated by b'\r\n '

non encoded words and encoded words can apear in the same header field, but
must be seperate by "linear-white-space" (space) which is NOT removed when
decoding encoded words

encoded words can appear in:

- `text` sections where `text` is based on RFC 822! (e.g. Content-Description )
    - in context of RFC 5322 this means `unstructured` count's as text
- `comments` (as alternative to `ctext`,`quoted-pair`,`comment`
- `word`'s within a `phrase`

**Therefor it MUST NOT appear in any structured header field except withing a `comment` or `phrase`!**

**You have to encode text which looks like an encoded word**



limitations:

- in comment's no ')',')' and '"'
- in headers no ' '


# Other

there is no `[CFWS]` after the `:` in Header fields,
but most (all?) of the parts following them are allowed
to start with a `[CFWS]`. (exception is unstructured where
a `CFWS` can be allowed but also MIGHT be part of the
string)

CFWS -> (un-) foldable whitespace allowing comments
FWS -> (un-) foldable whitespace without comments


# Relevant RFCs
5321, 5322, 6854, 3492, 2045, 2046, 2047, 4288,  4289, 2049, 6531, 5890

make sure to not use the outdated versions


# Parsing Notes

be strict when parsing (e.g. only ws and printable in subject line)

if "some other" strings should still be supported do not do zero
copy, but instead add the data to a new buff _replacing invalid
chars with replacement symbol or just stripping them_


# Non-utf8 Non-Ascci bytes in Mail body

The mail body can contain non-utf8, non-ascii data (e.g.
utf16 data, images etc.) WITHOUT base64 encoding if
8BITMIME is supported (note there is also BINARY and CHUNKING)

smtp still considers _the bytes_ corresponding to CR LF and DOT special.

- there is a line length limit, lines terminate with b'CRLF'
- b'.CRLF' does sill end the body (if preceeded by CRLF, or body starts with it)
    - so dot-staching is still done on protocol level



## Hot to handle `obs-` parsings

we have to be able to parse mails with obsolete syntax (theoretically)
but should never genrate such mails, the encder excepts its underlying
data to be correct, but it might not be if we directly place `obs-`
parsed data there. For many parts this is no problem as the
`obs-` syntax is a lot about having FWS at other positions,
_between components_ (so we won't have a problem with the encoder).
Or some additional obsolete infromations (which we often/allways can just
"skip" over). So we have to check if there are any braking cases and if
we have to not zero copy them when parsing but instead transform them
into a valide representation, in worst case we could add a `not_encodable`
field to some structs.

# TODO
check if some parts are empty and error if encode is called on them
e.g. empty domain

make sure trace and resend fields are:

1. encoded in order (MUST)
2. encoded as blocks (MUST?)
3. encoded before other fields (SHOULD)

as people may come up with their own trace like fileds,
rule 1 and 2 should appy to all fields


make sure trace,resent-* are multi fields

add a RawUnstructured not doing any encoding, but only validity checking

# Postponded

`component::Disposition` should have a `Other` variant, using `Token` (which
means a general extension token type is needed)

other features like signature, encryption etc.

check what happens if I "execute" a async/mio/>tokio<
based future in a CPU pool? Does it just do live
polling in the thread? Or does it act more intelligent?
or does it simply fail?

just before encoding singlepart bodies, resource is resolved,
therefore:

1. we now have the MediaType + File meta + TransferEncoding
2. add* ContentType header to headers
3. add* ContentTransferEncoding header to headers
4. add* file meta infor to ContentDisposition header if it exists
5. note that >add*< is not modifying Mail, but adds it to the list of headers to encode


warn when encoding a Disposition of kind Attachment which's
file_meta has no name set


// From RFC 2183:
// NOTE ON PARAMETER VALUE LENGHTS: A short (length <= 78 characters)
// parameter value containing only non-`tspecials' characters SHOULD be
// represented as a single `token'.  A short parameter value containing
// only ASCII characters, but including `tspecials' characters, SHOULD
// be represented as `quoted-string'.  Parameter values longer than 78
// characters, or which contain non-ASCII characters, MUST be encoded as
// specified in [RFC 2184].
provide a gnneral way for encoding header parameter which follow the scheme:
`<mainvalue> *(";" <key>"="<value> )` this are ContentType and ContentDisposition
 for now


IF Item::Encoded only appears as encoded word, make it Item::Encoded word,
possible checking for "more" validity then noew


email::quote => do not escape WSP, and use FWS when encoding
also make quote, generally available for library useers a
create_quoted_string( .. )

# Dependencies

quoted_printable and base64 have some problems:
1. it's speaking of a 76 character limit where it is 78
   it seems they treated the RFC as 78 character including
   CRLF where the RFC speaks of 78 characters EXCLUDING
   CRLF
2. it's only suited for content transfer encoding the body
   as there is a limit of the length of encoded words (75)
   which can't be handled by both

also quoted_printable has another problem:
3. in headers the number of character which can be displayed without
   encoding is more limited (e.g. no ' ' ) quoted_printable does not
   respect this? (TODO CHECK THIS)