Amarendra Godbole wrote:
Hi,
The output of a command that prints tabular output (with a tab separator) is susceptible to misalignment across different languages. Mostly the headers get misaligned with the
Not just the output of commands that print with tab separators, but also commands like cal which just put spaces between the days of the month. In cal, the abbreviated day name headers get misaligned very easily with locales using complex text layout scripts like Hindi, Thai, etc:
~>LANG=hi_IN.UTF-8 cal
अकटबर 2005
रव सो म ब ग श शन
                   1
 2  3  4  5  6  7  8
 9 10 11 12 13 14 15
16 17 18 19 20 21 22
23 24 25 26 27 28 29
30 31
The presence of such scripts in tab-separated output should also prove to be problematic.
Support for even displaying such scripts is very uneven across different terminal emulator implementations, if there is support at all.
So, even if one pursues localized solutions for locales like Japanese, Chinese, Korean, etc., which is quite doable, there still won't be an adequate answer for Hindi, Bengali, Tibetan, Arabic, and so on and so on ...
Simos' suggestion to run the command in a POSIX or en_US.UTF-8 locale is a very reasonable solution in light of the limitations of current terminals and terminal emulators.
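For a script that needs to parse such output, the suggestion above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the helper name `run_in_locale` is hypothetical, and it assumes the command honours the standard `LC_ALL` variable.

```python
import os
import subprocess

def run_in_locale(argv, locale="C"):
    """Run argv with its locale forced to `locale`; return stdout as text.

    Hypothetical helper for illustration, not part of any tool in this thread.
    """
    env = dict(os.environ)
    # LC_ALL overrides LANG and every individual LC_* category.
    env["LC_ALL"] = locale
    # LANGUAGE is a GNU gettext extension that can still select message
    # translations for non-C locales, so drop it from the child environment.
    env.pop("LANGUAGE", None)
    result = subprocess.run(argv, env=env, capture_output=True,
                            text=True, check=True)
    return result.stdout
```

Calling `run_in_locale(["cal"])` should then yield the untranslated layout regardless of the caller's own `LANG`, at the cost of showing the user English day names.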
Would it be an option for you to default to, let's say, the POSIX or en_US.UTF-8 locales?
column data in a multi-byte language like Japanese. I have been thinking about this issue for a while, and here are the possible solutions to it:
1. Space the columns based on the length of the header. For example, if the column data is "helloworld", then the output would be:
    head1 header2 headline3
    -----------------------------
    hello hellowo helloworl
    world rld     d
Each column wraps. But this approach might break existing line-by-line parsing scripts.
2. Space the columns based on the longest entry in each column. This needs two passes: one to find the longest entry in each column, and another to align and print the table.
3. Space the columns based on a pre-computed estimate of how string lengths differ between English and the Japanese equivalent. For example, if the Japanese strings occupy approximately 40% more columns, then space the columns accordingly.
4. Leave the issue as-is. :) I have found this approach taken on HP-UX, where the output of the df command gets misaligned in the Japanese locale.
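As an illustration of option 2, here is a minimal Python sketch of the two-pass approach. It measures each cell in approximate terminal columns rather than in characters or bytes, which is the part that matters for Japanese. The names `display_width` and `format_table` are hypothetical, and the width rule (two cells for East Asian wide or fullwidth characters, zero for non-spacing marks) is only a rough stand-in for what a terminal actually does. It does not solve the complex-script case (Hindi, Thai, and so on) discussed elsewhere in this thread.

```python
import unicodedata

def display_width(text):
    """Approximate the number of terminal cells that `text` occupies."""
    width = 0
    for ch in text:
        # Non-spacing marks, enclosing marks and format characters are
        # assumed to take no cell of their own.
        if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            continue
        # East Asian Wide and Fullwidth characters are assumed to take two.
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width

def format_table(rows, gap=2):
    """Align `rows` (a list of equal-length lists of strings) in two passes."""
    # Pass 1: find the widest cell in each column, measured in cells.
    widths = [max(display_width(cell) for cell in col) for col in zip(*rows)]
    # Pass 2: pad by hand, because str.ljust counts code points, not cells.
    lines = []
    for row in rows:
        padded = [cell + " " * (w - display_width(cell))
                  for cell, w in zip(row, widths)]
        lines.append((" " * gap).join(padded).rstrip())
    return "\n".join(lines)
```

For instance, `format_table([["Filesystem", "Size"], ["日本語", "10"]])` pads the Japanese cell with four extra spaces instead of seven, because its three characters are counted as six cells.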
Can senior folks on this list help me with this? Is there a better approach that is more suitable for internationalized software? Thanks a lot in advance.
Before running the mentioned commands, you can reset the LANG/LANGUAGE variables on demand to values of your choice.
It looks like a hell of a problem to parse output that is affected by l10n.
Simos
-- Linux-UTF8: i18n of Linux on all levels Archive: http://mail.nl.linux.org/linux-utf8/