author | Christian Brabandt <cb@256bit.org> | 2023-04-24 21:09:54 +0100 |
---|---|---|
committer | Bram Moolenaar <Bram@vim.org> | 2023-04-24 21:09:54 +0100 |
commit | 67672ef097dd708244ff042a8364994da2b91e75 (patch) | |
tree | 728d6713562555f7917bf5297c6ff27b2bb0c792 /runtime/doc/eval.txt | |
parent | e1b4822137b74d45fde1d47d2e32b3ae89966092 (diff) |
patch 9.0.1485: no functions for converting from/to UTF-16 index (v9.0.1485)
Problem: no functions for converting from/to UTF-16 index.
Solution: Add UTF-16 flag to existing functions and add strutf16len() and
utf16idx(). (Yegappan Lakshmanan, closes #12216)
Diffstat (limited to 'runtime/doc/eval.txt')
-rw-r--r-- | runtime/doc/eval.txt | 27 |
1 files changed, 27 insertions, 0 deletions
diff --git a/runtime/doc/eval.txt b/runtime/doc/eval.txt
index 5c77c796f0..b863f42e68 100644
--- a/runtime/doc/eval.txt
+++ b/runtime/doc/eval.txt
@@ -1580,6 +1580,33 @@ Examples: >
 	echo $"The square root of {{9}} is {sqrt(9)}"
 <	The square root of {9} is 3.0 ~
 
+						*string-offset-encoding*
+A string consists of multiple characters. How the characters are stored
+depends on 'encoding'. Most common is UTF-8, which uses one byte for ASCII
+characters, two bytes for other latin characters and more bytes for other
+characters.
+
+A string offset can count characters or bytes. Other programs may use
+UTF-16 encoding (16-bit words) and an offset of UTF-16 words. Some functions
+use byte offsets, usually for UTF-8 encoding. Other functions use character
+offsets, in which case the encoding doesn't matter.
+
+The different offsets for the string "a©😊" are below:
+
+  UTF-8 offsets:
+      [0]: 61, [1]: C2, [2]: A9, [3]: F0, [4]: 9F, [5]: 98, [6]: 8A
+  UTF-16 offsets:
+      [0]: 0061, [1]: 00A9, [2]: D83D, [3]: DE0A
+  UTF-32 (character) offsets:
+      [0]: 00000061, [1]: 000000A9, [2]: 0001F60A
+
+You can use the "g8" and "ga" commands on a character to see the
+decimal/hex/octal values.
+
+The functions |byteidx()|, |utf16idx()| and |charidx()| can be used to convert
+between these indices. The functions |strlen()|, |strutf16len()| and
+|strcharlen()| return the number of bytes, UTF-16 code units and characters in
+a string respectively.
 
 option						*expr-option* *E112* *E113*
 ------
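
The offset table added by this patch can be checked outside of Vim. The following is a minimal sketch in Python, not part of the patch: it encodes the same string "a©😊" three ways and lists the code units at each offset. The helper `utf16_index_of_byte` is a hypothetical name introduced here for illustration; it is not a reimplementation of Vim's `utf16idx()` and makes no claim to match its handling of edge cases.

```python
# The string from the patch: "a", COPYRIGHT SIGN, SMILING FACE WITH SMILING EYES.
s = "a\u00a9\U0001F60A"

utf8 = s.encode("utf-8")
utf16 = s.encode("utf-16-be")  # big-endian so each pair of bytes reads as one code unit

# One entry per offset, formatted like the table in the help text.
utf8_bytes = [f"{b:02X}" for b in utf8]
utf16_units = [utf16[i:i + 2].hex().upper() for i in range(0, len(utf16), 2)]
utf32_units = [f"{ord(ch):08X}" for ch in s]


def utf16_index_of_byte(text, byte_idx):
    """Hypothetical helper: map a UTF-8 byte offset to a UTF-16 code unit offset.

    Returns the UTF-16 offset of the character that contains byte_idx,
    or -1 if byte_idx lies past the last byte (an assumption of this sketch).
    """
    pos = 0    # UTF-8 byte offset of the current character
    units = 0  # UTF-16 code units seen before the current character
    for ch in text:
        width = len(ch.encode("utf-8"))
        if byte_idx < pos + width:
            return units
        pos += width
        # Characters above U+FFFF need a surrogate pair in UTF-16.
        units += 2 if ord(ch) > 0xFFFF else 1
    return -1


print("UTF-8 :", utf8_bytes)
print("UTF-16:", utf16_units)
print("UTF-32:", utf32_units)
print("byte 3 is UTF-16 index", utf16_index_of_byte(s, 3))
```

The three list lengths (7, 4 and 3) are the byte, UTF-16 code unit and character counts of the string, which is what the help text says `strlen()`, `strutf16len()` and `strcharlen()` return.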