how to test the new glibc-2.2's UTF-8 locales
The new multibyte locales in the upcoming glibc-2.2 are now pretty much
working. For the adventurous among you who would like to discover how
neat it is, I'm appending a recipe for installing a new glibc snapshot
without shooting yourself in the foot.
What works: (I hope I missed nothing, Ulrich!)
- Locales with multibyte encodings can be created.
- iconv is now much more reliable.
- The wc* and mb* family of functions, including fwprintf and fwscanf.
- FILE streams: fopen("filename","r/w,ccs=ENCODING"), fpos_t now includes
an mbstate_t.
- strcoll, wcscoll have been completely rewritten.
- nl_langinfo(CODESET) works.
- gettext automatically converts the translations to the current locale's
encoding.
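As a quick sanity check of the iconv item above, one might convert a single
ISO-8859-1 byte to UTF-8 and look at the resulting bytes. This is only a
sketch: the helper name latin1_to_utf8_hex is made up for illustration, and
it assumes the iconv and od programs are on your PATH (after the installation
described below, the new iconv would be /glibc22/bin/iconv).

```shell
# Hypothetical helper: convert ISO-8859-1 input on stdin to UTF-8 and
# print the resulting bytes as one contiguous hex string.
latin1_to_utf8_hex() {
    iconv -f ISO-8859-1 -t UTF-8 | od -An -tx1 | tr -d ' \n'
}

# 0xE4 (octal 344) is a-umlaut in ISO-8859-1; in UTF-8 it is encoded
# as the two bytes C3 A4.
printf '\344' | latin1_to_utf8_hex
echo
```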
What is still missing:
- Transliteration of accented and special punctuation characters during
a conversion UTF-8 -> ISO-8859-* or UTF-8 -> ASCII.
- The wide character properties (wcwidth, iswupper, etc.) are very
different from the tables of the Unicode consortium.
- regexp is multibyte aware, but still only handles ISO-8859-1 characters
correctly.
- There is no UTF-7 support in iconv.
Bruno
Installation instructions
=========================
What you need
-------------
- Sources:
- glibc CVS sources, instructions are at http://sourceware.cygnus.com/glibc/
(remember to use "cvs -z 9" to save network bandwidth)
- Linux kernel sources,
- sources of gcc-2.95.2 or newer, and binutils (I used 2.9.1.0.25),
- sources of all the packages you wish to recompile.
- Programs: reasonably recent versions of gcc, make, sed.
- Disk space: ca. 150 MB permanent, ca. 700 MB temporary space.
General approach
----------------
If you were to install a glibc snapshot in /lib:/usr/lib, chances are high
that your system would be damaged in some way: maybe most C++ apps would
dump core, maybe emacs would stop working, maybe you would be unable to
recompile a new gcc. If you are unlucky, your system would not even boot
any more.
For this reason, I recommend building in a completely separate tree, let's
say /glibc22 instead of /usr. (You can choose any other pathname instead,
or make /glibc22 a symbolic link. 150 MB of space should be available here.)
Other people recommend copying all of /etc, /lib, /bin, /sbin, /usr, /dev,
/tmp, /var and the glibc sources to a new partition, say /testing, then doing
"chroot /testing", and building and installing the new glibc in this chroot
environment. But I don't like this, because I don't like chroot, and because
with this approach you are likely to need to start from scratch if anything
goes wrong.
Note that all binaries you create in /glibc22 will contain a hardwired
pathname /glibc22/lib/ld-linux.so.2. You shouldn't distribute them: They
will not run on any Linux system except yours.
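To see which dynamic linker a given binary asks for, one can look at the
"Requesting program interpreter" line that "readelf -l" prints. The following
is a minimal sketch: the helper name interp_of and the binary ./hello are
made up for illustration, and readelf from binutils is assumed to be
installed.

```shell
# Hypothetical helper: read "readelf -l" output on stdin and print the
# path of the requested program interpreter (the dynamic linker).
interp_of() {
    sed -n 's/.*\[Requesting program interpreter: \(.*\)].*/\1/p'
}

# Usage, assuming ./hello was built with the toolchain in /glibc22:
#   readelf -l ./hello | interp_of
```

For a binary built in /glibc22 this should print /glibc22/lib/ld-linux.so.2
rather than /lib/ld-linux.so.2.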
Step 0: Preparation
-------------------
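Work as a non-root user, because that's the least likely to damage anything.
Creating the user and the directory needs root, so do that first (the #
prompt marks commands run as root), then switch to the new user.
# useradd buildguy
# mkdir /glibc22
# chown buildguy /glibc22
# su - buildguy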
Unset the LD_PRELOAD and LD_LIBRARY_PATH environment variables. They would
only cause trouble later.
$ unset LD_PRELOAD
$ unset LD_LIBRARY_PATH
Prepare the kernel sources. You must have them unpacked and configured.
/usr/src/linux-2.x.y/include/linux/autoconf.h must exist. Building the
kernel is not needed.
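A small check along these lines can confirm that the kernel tree is ready
before you start the glibc build. It is only a sketch: the function name
kernel_tree_configured is made up, and the path in the example call is the
same placeholder as above.

```shell
# Hypothetical helper: succeed only if the kernel tree given as $1 has
# been configured, i.e. include/linux/autoconf.h exists.
kernel_tree_configured() {
    [ -f "$1/include/linux/autoconf.h" ]
}

# Example call; replace the path with your actual kernel source tree.
if kernel_tree_configured /usr/src/linux-2.x.y; then
    echo "kernel headers are ready"
else
    echo "configure the kernel first (e.g. make menuconfig)"
fi
```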
Step 1: Build glibc
-------------------
Unpack a fresh copy of the glibc snapshot sources. (Building glibc needs
write access to the sources.) Also, remove the CVS traces therein.
$ tar xvfz /somewhere/glibc-2000-05-xx.tar.gz
$ find glibc-2000-05-xx -name CVS -type d -prune -exec rm -r '{}' ';'
Modify elf/ldconfig.c as follows:
Add prefix /glibc22 to LD_SO_CACHE and LD_SO_CONF definitions.
Remove the two lines mentioning /lib and /usr/lib.
Modify sysdeps/generic/dl-cache.c as follows:
Add prefix /glibc22 to LD_SO_CACHE definition.
Create a build directory and build there:
$ mkdir glibc-build
$ cd glibc-build
$ ../glibc-2000-05-xx/configure --prefix=/glibc22 --with-headers=/usr/src/linux-2.x.y/include --enable-add-ons
The --prefix=/glibc22 option here is very important; this is what prevents
your existing libc from being overwritten.
$ make
$ make check [This may fail.]
Make the documentation:
$ make info
$ make dvi [This creates manual/libc.dvi.]
$ make pdf [This creates manual/libc.pdf.]
Then install it:
$ make install
$ make localedata/install-locales
$ mkdir -p /glibc22/doc/libc
$ cp ../glibc-2000-05-xx/manual/libc.{dvi,pdf} /glibc22/doc/libc
Make symlinks for /glibc22/include/linux and /glibc22/include/asm, pointing
into the same kernel tree that was given to --with-headers:
$ ln -s /usr/src/linux-2.x.y/include/asm /glibc22/include/asm
$ ln -s /usr/src/linux-2.x.y/include/linux /glibc22/include/linux
Step 2: Customization
---------------------
Add /glibc22/lib to /glibc22/etc/ld.so.conf and run /glibc22/sbin/ldconfig.
Now binaries linked against the new glibc should run. As a first test,
try to create a UTF-8 locale for your work.
$ mkdir /glibc22/lib/locale
$ /glibc22/bin/localedef -c -f UTF8 -i de_DE de_DE.UTF-8
Also set your timezone:
$ cd /glibc22/etc
$ ln -sf ../share/zoneinfo/Europe/Berlin localtime
Step 3: Build ld
----------------
At this point, you still cannot create C programs which link against the
new glibc, because the linker will search for libc.so in /lib and /usr/lib.
(Well, you can, but it's painful: First, you have to give -I/glibc22/include,
and when it comes to linking, you have to link statically and modify by hand
the linker command line that gcc passes to collect2.)
Unpack and configure binutils. (Note that you cannot compile binutils
in its source directory if your $PATH contains "." in front of "/usr/bin".)
$ tar xvfz /somewhere/binutils-2.9.1.0.x.tar.gz
$ cd binutils-2.9.1.0.x
$ mkdir build
$ cd build
$ ../configure --prefix=/glibc22
Modify ld/Makefile as follows:
- Set a value for LIB_PATH.
LIB_PATH = /glibc22/lib:/usr/local/lib
- Add LIB_PATH=$(LIB_PATH) to the front of $(GENSCRIPTS) so that the
linker scripts search /glibc22/lib instead of /usr/lib.
GENSCRIPTS = LIB_PATH=$(LIB_PATH) $(SHELL) $(srcdir)/genscripts.sh ${srcdir} ${libdir} i586-pc-linux-gnu i586-pc-linux-gnu i586-pc-linux-gnu ${EMUL} ""
- Add Makefile to $(GEN_DEPENDS) so that the scripts get rebuilt.
GEN_DEPENDS = $(srcdir)/genscripts.sh $(srcdir)/emultempl/stringify.sed Makefile
Then you can build.
$ make
The only program you need to install is ld.
$ cd ld
$ mkdir -p /glibc22/i586-pc-linux-gnu/lib/ldscripts
$ make install
Step 4: Build gcc
-----------------
At this point, you still cannot create C programs which link against the
new glibc, because gcc does not know the location of the newly created linker,
and because it passes the option "--dynamic-linker /lib/ld-linux.so.2" to
the linker. So you have to build a new gcc.
It is important that the ld created by the last step gets installed in
/glibc22/i586-pc-linux-gnu/bin/
because this is the directory gcc will look in. If it was installed in a
different directory, make a symlink.
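Making that symlink could look roughly like this. It is a sketch under
assumptions: the helper name ensure_ld_link is made up, and the usage line
assumes that ld ended up in /glibc22/bin, which may not match your
installation.

```shell
# Hypothetical helper: make sure $2/ld exists, creating a symlink to the
# real ld given as $1 if nothing is there yet.
ensure_ld_link() {
    real_ld=$1
    target_dir=$2
    mkdir -p "$target_dir" || return 1
    if [ ! -e "$target_dir/ld" ]; then
        ln -s "$real_ld" "$target_dir/ld"
    fi
}

# Usage, assuming ld was installed as /glibc22/bin/ld:
#   ensure_ld_link /glibc22/bin/ld /glibc22/i586-pc-linux-gnu/bin
```

An existing ld in the target directory is left untouched, so the call is
safe to repeat.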
Unpack gcc.
$ tar xvfI /somewhere/gcc-2.95.2.tar.bz2
$ cd gcc-2.95.2
Modify gcc/config/i386/linux.h as follows:
Change /lib/ld-linux.so.2 to /glibc22/lib/ld-linux.so.2
Configure gcc without C++ support. (You can't compile gcc-2.95.2 with C++
support now; it would choke while compiling libio/indstream.cc because
of a problem with 'struct streampos'. I don't know whether this is fixed
in current gcc snapshots.)
$ cp gcc/cp/lang-options.h gcc/lang-options.h
$ cd ..
$ mkdir gcc-build
$ cd gcc-build
$ ../gcc-2.95.2/configure --prefix=/glibc22 --enable-shared --enable-version-specific-runtime-libs --enable-languages=
Modify gcc/Makefile as follows:
To the cccp.o rule add the line
-DSTANDARD_INCLUDE_DIR=\"/glibc22/include\" \
Then build as usual:
$ make bootstrap
$ make install
Now finally you have a compiler, /glibc22/bin/gcc, which can create binaries
linked against the new glibc.
Step 5: Build binutils
----------------------
This step is optional. The ld which is used by /glibc22/bin/gcc still uses
the old libc. To make things really self-hosting, you should rebuild the
binutils.
Throw away the old build directory:
$ rm -r binutils-2.9.1.0.x
Unpack and configure:
$ tar xvfz /somewhere/binutils-2.9.1.0.x.tar.gz
$ cd binutils-2.9.1.0.x
$ mkdir build
$ cd build
$ CC=/glibc22/bin/gcc ../configure --prefix=/glibc22 --enable-shared
Modify ld/Makefile exactly the same way as before:
- Set a value for LIB_PATH.
LIB_PATH = /glibc22/lib:/usr/local/lib
- Add LIB_PATH=$(LIB_PATH) to the front of $(GENSCRIPTS) so that the
linker scripts search /glibc22/lib instead of /usr/lib.
GENSCRIPTS = LIB_PATH=$(LIB_PATH) $(SHELL) $(srcdir)/genscripts.sh ${srcdir} ${libdir} i586-pc-linux-gnu i586-pc-linux-gnu i586-pc-linux-gnu ${EMUL} ""
- Add Makefile to $(GEN_DEPENDS) so that the scripts get rebuilt.
GEN_DEPENDS = $(srcdir)/genscripts.sh $(srcdir)/emultempl/stringify.sed Makefile
Then you can build.
$ make
Before installing, edit the main Makefile to comment out the three lines
defining REALLY_SET_LIB_PATH. Without this, LD_LIBRARY_PATH would be set,
and you would get an obscure error during the installation of libiberty.a:
sh: /lib/ld-linux.so.2: version `GLIBC_2.2' not found (required by libc.so.6)
Now install:
$ make install
Step 6: Use the new environment
-------------------------------
Any user can now use the new environment. All that is needed is
$ unset LD_PRELOAD
$ unset LD_LIBRARY_PATH
$ export PATH=/glibc22/bin:$PATH
You can now install any number of additional packages in /glibc22.
Typically you would configure them with
$ .../configure --prefix=/glibc22 --enable-shared
And to put yourself into a UTF-8 locale:
$ unset LC_ALL
$ unset LC_CTYPE
$ export LANG=de_DE.UTF-8 [the name of the locale you created earlier]
And for gettext:
$ export LANGUAGE=de:en
(With earlier gettext versions, you would have to set
$ export LANGUAGE=de.UTF-8:en.UTF-8
and convert the message catalogs to UTF-8, but now gettext does the character
encoding conversion itself.)
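If you switch into the new environment often, the settings above can be
collected in a shell function, for example in your ~/.profile. This is just
a sketch: the function name use_glibc_tree and its two optional arguments
are made up for illustration, and the defaults are the paths and locale
name used in this recipe.

```shell
# Hypothetical helper: prepare the current shell for the tree in $1
# (default /glibc22) and the locale in $2 (default de_DE.UTF-8).
use_glibc_tree() {
    prefix=${1:-/glibc22}
    loc=${2:-de_DE.UTF-8}
    # These variables would override or interfere with the new setup.
    unset LD_PRELOAD LD_LIBRARY_PATH LC_ALL LC_CTYPE
    PATH=$prefix/bin:$PATH
    LANG=$loc
    export PATH LANG
}

# Usage:
#   use_glibc_tree
#   export LANGUAGE=de:en
```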
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/