*mbyte.txt* For Vim version 7.0aa. Last change: 2006 Mar 05
VIM REFERENCE MANUAL by Bram Moolenaar et al.
Multi-byte support *multibyte* *multi-byte*
*Chinese* *Japanese* *Korean*
This is about editing text in languages which have many characters that cannot
be represented using one byte (one octet). Examples are Chinese, Japanese and
Korean. Unicode is also covered here.
For an introduction to the most common features, see |usr_45.txt| in the user
manual.
For changing the language of messages and menus see |mlang.txt|.
{not available when compiled without the +multi_byte feature}
1. Getting started |mbyte-first|
2. Locale |mbyte-locale|
3. Encoding |mbyte-encoding|
4. Using a terminal |mbyte-terminal|
5. Fonts on X11 |mbyte-fonts-X11|
6. Fonts on MS-Windows |mbyte-fonts-MSwin|
7. Input on X11 |mbyte-XIM|
8. Input on MS-Windows |mbyte-IME|
9. Input with a keymap |mbyte-keymap|
10. Using UTF-8 |mbyte-utf8|
11. Overview of options |mbyte-options|
NOTE: This file contains UTF-8 characters. These may show up as strange
characters or boxes when using another encoding.
==============================================================================
1. Getting started *mbyte-first*
This is a summary of the multibyte features in Vim. If you are lucky it works
as described and you can start using Vim without much trouble. If something
doesn't work you will have to read the rest. Don't be surprised if it takes
quite a bit of work and experimenting to make Vim use all the multi-byte
features. Unfortunately, every system has its own way to deal with multibyte
languages and it is quite complicated.
COMPILING
If you already have a compiled Vim program, check if the |+multi_byte| feature
is included. The |:version| command can be used for this.
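From a script the same check can be made with has(). A minimal sketch (the
messages are only illustrative): >
	:if has("multi_byte")
	:  echo "+multi_byte is available"
	:else
	:  echo "Vim was compiled without +multi_byte"
	:endif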
If +multi_byte is not included, you should compile Vim with "big" features.
You can further tune what features are included. See the INSTALL files in the
source directory.
LOCALE
First of all, you must make sure your current locale is set correctly. If
your system has been installed to use the language, it probably works right
away. If not, you can often make it work by setting the $LANG environment
variable in your shell (csh syntax; for sh use "export LANG=ja_JP.EUC"): >
	setenv LANG ja_JP.EUC
Unfortunately, the name of the locale depends on your system. Japanese might
also be called "ja_JP.EUCjp" or just "ja". To see what is currently used: >
	:language
To change the locale inside Vim use: >
	:language ja_JP.EUC
Vim will give an error message if this doesn't work. This is a good way to
experiment and find the locale name you want to use. But it's always better
to set the locale in the shell, so that it is used right from the start.
See |mbyte-locale| for details.
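When experimenting from a script, a failed attempt can be caught instead of
stopping the script. A sketch, assuming the "ja_JP.EUC" name from above and
that the failure is reported as E197 (check the message on your system): >
	:try
	:  language ja_JP.EUC
	:catch /E197/
	:  echo "locale ja_JP.EUC is not available here"
	:endtry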
ENCODING
If your locale works properly, Vim will try to set the 'encoding' option
accordingly. If this doesn't work you can overrule its value: >
	:set encoding=utf-8
See |encoding-values| for a list of acceptable values.
The result is that all the text that is used inside Vim will be in this
encoding. Not only the text in the buffers, but also in registers, variables,
etc. This also means that changing the value of 'encoding' makes the existing
text invalid! The text doesn't change, but it will be displayed incorrectly.
You can edit files in an encoding other than the one 'encoding' is set to. Vim
will convert the file when you read it and convert it back when you write it.
See 'fileencoding', 'fileencodings' and |++enc|.
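As an illustration, editing a file stored as euc-jp while Vim works in UTF-8
could look like this. The file names are hypothetical, 'encoding' is best set
in your vimrc before any text is loaded, and the conversion only works when
Vim is able to convert between the two encodings: >
	:set encoding=utf-8
	:set fileencodings=ucs-bom,utf-8,euc-jp,latin1
	:edit ++enc=euc-jp notes.txt
	:set fileencoding?
	:write ++enc=utf-8 notes-utf8.txt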
DISPLAY AND FONTS
If you are working in a terminal (emulator) you must make sure it accepts the
same encoding that Vim is working with. If this is not the case, you can use
the 'termencoding' option to make Vim convert text automatically.
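A common pattern, sketched here for a vimrc, is to remember the terminal's
encoding before switching Vim itself to UTF-8. It assumes that 'encoding' was
initially derived from the locale and matches what the terminal uses: >
	:if &termencoding == ""
	:  let &termencoding = &encoding
	:endif
	:set encoding=utf-8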
For the GUI you must select fonts that work with the current 'encoding'. This
is the difficult part. It depends on the system you are using, the locale and
a few other things. See the chapters on fonts: |mbyte-fonts-X11| for
X-Windows and |mbyte-fonts-MSwin| for MS-Windows.
For GTK+ 2, you can skip most of this section. The 'guifontset' option no
longer exists. You only need to set 'guifont' and everything should "just
work". If your system comes with Xft2 and fontconfig and the current font
does not contain a certain glyph, a different font will be used automatically
if available. The 'guifontwide' option is still supported but usually you do
not need to set it. It is only necessary if the automatic font selection does
not suit your needs.
For X11 you can set the 'guifontset' option to a list of fonts that together
cover the characters that are used. Example for Korean: >
	:set guifontset=k12,r12
Alternatively, you can set 'guifont' and 'guifontwide'. 'guifont' is used for
the single-width characters, 'guifontwide' for the double-width characters.
Thus the 'guifontwide' font must be exactly twice as wide as 'guifont'.
Example for UTF-8: >
	:set guifont=-misc-fixed-medium-r-normal-*-18-120-100-100-c-90-iso10646-1
	:set guifontwide=-misc-fixed-medium-r-normal-*-18-120-100-100-c-180-iso10646-1
You can also set 'guifont' alone; Vim will then try to find a matching
'guifontwide' for you.
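The choices above could be combined in a gvimrc along these lines. This is
only a sketch: the "Monospace 12" font name is an assumption, and 'guifontset'
is only available when Vim was compiled with the |+xfontset| feature: >
	:if has("gui_gtk2")
	:  set guifont=Monospace\ 12
	:elseif has("xfontset")
	:  set guifontset=k12,r12
	:endif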
INPUT
There are several ways to enter multi-byte characters:
- For X11 XIM can be used. See |XIM|.
- For MS-Windows IME can be used. See |IME|.
- For all systems keymaps can be used. See |mbyte-keymap|.
The options 'iminsert', 'imsearch' and 'imcmdline' can be used to choose
the different input methods or disable them temporarily.
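For example, to start with keymaps and input methods switched off in Insert
mode and to let searching follow that setting (the values are explained at the
options themselves): >
	:set iminsert=0
	:set imsearch=-1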
==============================================================================
2. Locale *mbyte-locale*
The easiest setup is when your whole system uses the locale you want to work
in. But it's also possible to set the locale for one shell you are working
in, or just use a certain locale inside Vim.
WHAT IS A LOCALE? *locale*
There are many languages in the world, and at least as many different cultures
and environments. A linguistic environment corresponding to an area is called
a "locale". This includes information about the language used, the charset,
the collating order for sorting, the date format, the currency format and so
on. For Vim only the language and charset really matter.
You can only use a locale if your system has support for it. Some systems
have only a few locales, especially in the USA. The language which you want
to use may not be on your system. In that case you might be able to install
it as an extra package. Check your system documentation for how to do that.
The location in which the locales are installed varies from system to system,
for example "/usr/share/locale" or "/usr/lib/locale". See your system's
setlocale() man page.
Looking in these directories will show you the exact name of each locale.
Upper/lowercase usually matters, thus "ja_JP.EUC" and "ja_jp.euc" are
different. Some systems have a locale.alias file, which allows translation
from a short name like "nl" to the full name "nl_NL.ISO_8859-1".
Note that X-windows has its own locale handling and unfortunately uses locale
names different from what is used elsewhere. This is confusing! For Vim it
matters what the setlocale() function uses, which is generally NOT the
X-windows names. You might have to experiment to find out what really works.
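One way to experiment is to look at what Vim currently uses. A small sketch
using the |:language| command and the |v:lang| and |v:ctype| variables: >
	:language messages
	:language ctype
	:echo v:lang v:ctype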
*locale-name*
The (simplified) format of a |locale| name is:
	language
or	language_territory
or	language_territory.codeset
Territory means the country (or part of it), codeset means the |charset|. For
example, the locale name "