Hi Flako000,
as I’ve had a similar (but much easier) case last Friday, here’s a quick summary of what to look out for (this is all about command line stuff, not native X11 applications):
- you have a shell (i.e. “bash”) with some $LANG setting, telling applications which code page is to be used when displaying things
- you have a program that will read the shell’s output stream, interpret it according to the code page settings of this program and display it on your display (“program” may i.e. be “konsole” inside an X11 KDE session, but as well “putty” under MS Windows or anything else)
- you have (in your case) a file which contains text that is encoded in a specific codepage, that differs from your general system setup
- there’s an application to display the contents of the file (in your case “vim”)
More than one of the above can and will alter the character stream and code page!
Let’s say you’re using bash with some UTF-8 (according to $LANG) setting, an ISO 8859-1 file that contains “ä” (an umlaut character not in ASCII, used i.e. in the German language, HTML: ä). You are working within a KDE session, have opened “konsole” set to UTF-8, where your “bash” is running and you’re calling “vim” to open that file.
You will probably notice that vim reports a “converted” file, but will display the character correctly. Internally (inside vim’s code), the file is treated as ISO 8859-1, but vim “prints out” UTF-8 sequences, which are then taken by “konsole” and treated correctly.
It you, instead of using “vim”, will use “cat” to output the contents, then you’ll see a “funny character”, as the code position for the umlaut character doesn’t match the UTF-8 code point - but that’s how “konsole” will treat the output of “cat” (which does no conversion, unlike “vim”).
“konsole” is capable of changing its code page handling - via “view” → “set encoding”, you could change to ISO 8859-1. Then, after another invocation of “cat”, you’ll see the file content correctly (the old output will not be “re-parsed”, but remains displayed as is). Since “konsole” now is using ISO 8859-1, invoking “vim” will show you supposedly two characters content of the same unchanged file - which is nothing else then the two bytes that are used to UTF-8 encode the umlaut (remember: “vim” will still believe your terminal to run on UTF-8, since $TERM wasn’t changed, and thus internally converts the file content to UTF-8 for display).
Were you to compile some program from that ISO 8859-1 file, the strings in that “program” would most probably be output as ISO 8859-1. So, if you’d run that program on some “konsole” that is set to UTF-8, the output will look garbled. If the konsole is on ISO 8859-1 though, everything will look fine. Again, this is because the program does no conversion of the bytes output for your strings, and will be interpreted accordingly by “konsole”.
So if using “vim” or some other auto-display-converting tools for string handling, you need to be especially careful for those “converted” messages when opening files. New files will be created by “vim” in the default character page of your session. So it’s really easy to end up with files in different character sets, within one and the same program. Not what you really want.
What I don’t understand is why you start playing with totally different character pages though:
With gnome-terminal setting ‘Hebrew (IBM862).’ half working.
Either that file is in IBM862, or it’s CP437. Make sure your editor tools do support that target code page and convert the display of the file’s content to the character page your session runs in (i.e. UTF-8). I understand that this is not easy, and I have no solution for working with CP437. But mixing CPs wildly will only make things worse, not better.
Regards,
Jens