How to properly view ASCII files?

I’m migrating COBOL source files from SCO Unix to SLES 11. These files contain double-line box characters (codes 200 to 206), and the source files seem to be encoded with ISO-8859.

I need to insert these codes/symbols and have them display correctly.
I tried vim with the options ‘setglobal fenc’ and ‘set fileencodings’, and exporting TERM=ansi, but I can’t manage to get it set up properly.

With ‘:e ++enc=CP437’ vim looks fine, but I cannot insert the characters with Ctrl+V; it seems the character map is still non-ASCII, as if it were UTF-8 (I can see this from ‘:digraphs’).

What should I set: vim, or the TERM variable?


Hi flako000,

code pages can be a ***** :slight_smile:

These files have double line boxes (ascii code 200 to 206, as

Formally these are not ASCII characters; ASCII covers only 0 to 127. Your description sounds more like CP437.
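A quick way to see the difference, assuming Python 3 is at hand, is to decode those byte values under both code pages:

```python
# Byte 200 (0xC8) is the double-line box corner "╚" in CP437,
# but the letter "È" in ISO-8859-1 -- same bytes, very different
# characters depending on the code page used to interpret them.
for code in range(200, 207):               # the range mentioned above
    b = bytes([code])
    print(code,
          repr(b.decode('cp437')),         # box-drawing characters
          repr(b.decode('iso-8859-1')))    # accented Latin letters
```

So if the files really show double-line boxes at 200-206, CP437 (or a close relative) is the likely encoding, not ISO-8859.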


Which ISO 8859? There are quite a number of them.

That’s what I should set? vim or the vaiable TERM?

While you could change your system from UTF-8 to e.g. some ISO 8859 code page, I wouldn’t suggest doing that. I’d rather find a proper tool that can handle the non-system encoding and convert it properly for display.

When I run “vim” on my UTF-8 system to edit an ISO-8859-15-encoded file, it opens the file with all characters displayed properly and gives a “[converted]” message after opening. I can then edit the file as I like, and all characters are stored ISO-8859-15-encoded. But of course, there are no “box characters” in ISO 8859-1(5), and especially not at code points 200 to 206.

These files have double line boxes (ascii code 200 to 206, as, the source files seem to be encoded with ISO-8859

If they contain the “box characters”, then they cannot be ISO-8859-*-encoded; they are most probably CP437-encoded. Unfortunately, Linux seems not to be prepared to handle that code page easily. I found some instructions, but didn’t actually try to follow them, YMMV…
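If all you need for now is to *view* the files on a UTF-8 system, one workaround is to transcode a copy from CP437 to UTF-8 before displaying it. A sketch, assuming Python 3 (the file names here are made up, substitute your own COBOL source):

```python
# Create a small CP437-encoded sample file (this stands in for the
# real COBOL source, so the example is self-contained).
sample = bytes(range(200, 207))            # ╚ ╔ ╩ ╦ ╠ ═ ╬ in CP437
with open('source.cob', 'wb') as f:
    f.write(sample + b'\r\n')

# Transcode a copy to UTF-8 so it displays correctly in a UTF-8 terminal.
with open('source.cob', encoding='cp437') as f:
    text = f.read()
with open('source.utf8.cob', 'w', encoding='utf-8') as f:
    f.write(text)

print(text)    # the box characters, now viewable with plain 'cat'
```

On the command line, `iconv -f CP437 -t UTF-8 source.cob` should achieve the same, provided the installed iconv knows the CP437 character set.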


Hello jmozdzen,
Based on what you indicated I managed to move forward a little, but not all the way; for now it’s ‘YMMV’ :slight_smile:
Here is what I did:

  With vim (which I now understand better) the problem is editing. With ':e ++enc=CP437' the file is displayed correctly, and characters can be inserted with Ctrl+K 'xx' (not only the 200-206 range).

  With gnome-terminal set to 'Hebrew (IBM862)' it half works: 'cat source.cob' displays correctly, but running the program does not look right.

  I tried all the charsets of luit, but it doesn't seem to work (some are not available in SLES).

  I share your opinion about not reconfiguring the entire Linux system.
  So I'm looking for which variables, and with which values, should be modified to make this work properly.

   If you have any other suggestions I'd appreciate them; meanwhile I'll keep reading...
   Thanks again,

I’m not sure about this problem; I just hope you’ll overcome it soon.

Hi Flako000,

since I had a similar (but much simpler) case last Friday, here’s a quick summary of what to look out for (this is all about command-line stuff, not native X11 applications):

  • you have a shell (e.g. “bash”) with some $LANG setting, telling applications which code page is to be used when displaying things
  • you have a program that reads the shell’s output stream, interprets it according to that program’s code page settings and displays it on your screen (“program” may e.g. be “konsole” inside an X11 KDE session, but just as well “putty” under MS Windows, or anything else)
  • you have (in your case) a file containing text encoded in a specific code page that differs from your general system setup
  • there’s an application to display the contents of the file (in your case “vim”)

More than one of the above can and will alter the character stream and code page!

Let’s say you’re using bash with some UTF-8 setting (according to $LANG), and an ISO 8859-1 file that contains “ä” (an umlaut character not in ASCII, used e.g. in the German language; HTML: &auml;). You are working within a KDE session and have opened “konsole” set to UTF-8, where your “bash” is running, and you call “vim” to open that file.

You will probably notice that vim reports a “converted” file, but displays the character correctly. Internally (inside vim’s code) the file is treated as ISO 8859-1, but vim “prints out” UTF-8 sequences, which are then taken by “konsole” and rendered correctly.

If you, instead of using “vim”, use “cat” to output the contents, you’ll see a “funny character”, as the single ISO 8859-1 byte for the umlaut is not a valid UTF-8 sequence - but that’s how “konsole” treats the output of “cat” (which, unlike “vim”, does no conversion).

“konsole” is capable of changing its code page handling: via “view” → “set encoding” you can change to ISO 8859-1. Then, after another invocation of “cat”, you’ll see the file content correctly (the old output will not be “re-parsed”, but remains displayed as is). Since “konsole” is now using ISO 8859-1, invoking “vim” will show you supposedly two characters’ worth of content in the same unchanged file - which is nothing else than the two bytes used to UTF-8-encode the umlaut (remember: “vim” still believes your terminal runs UTF-8, since $LANG wasn’t changed, and thus internally converts the file content to UTF-8 for display).
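Both mismatches described above can be reproduced in a few lines; a sketch, assuming Python 3:

```python
s = 'ä'                              # the umlaut from the example
latin1 = s.encode('iso-8859-1')      # one byte:  0xE4
utf8   = s.encode('utf-8')           # two bytes: 0xC3 0xA4

# A UTF-8 terminal shown the raw ISO-8859-1 byte: invalid sequence,
# i.e. the "funny character" from the 'cat' example.
print(latin1.decode('utf-8', errors='replace'))   # replacement char

# An ISO-8859-1 terminal shown the UTF-8 bytes: two characters,
# which is what vim shows once konsole is switched to ISO 8859-1.
print(utf8.decode('iso-8859-1'))                  # 'Ã¤'
```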

Were you to compile a program from that ISO 8859-1 file, the strings in that program would most probably be output as ISO 8859-1. So if you ran that program in some “konsole” that is set to UTF-8, the output would look garbled; if the konsole is on ISO 8859-1, everything looks fine. Again, this is because the program does no conversion of the bytes it outputs for your strings, and those bytes are interpreted accordingly by “konsole”.

So when using “vim” or other auto-converting display tools for string handling, you need to be especially careful about those “converted” messages when opening files. New files will be created by “vim” in the default code page of your session. So it’s really easy to end up with files in different character sets within one and the same program. Not what you really want. :wink:

What I don’t understand is why you started playing with totally different code pages, though:

With gnome-terminal setting ‘Hebrew (IBM862).’ half working.

Either that file is in IBM862, or it’s CP437. Make sure your editor tools support the target code page and convert the file’s content for display into the code page your session runs in (e.g. UTF-8). I understand that this is not easy, and I have no ready solution for working with CP437. But mixing code pages wildly will only make things worse, not better.
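Incidentally, the “half working” observation makes sense: IBM862 shares CP437’s box-drawing block but replaces other positions (e.g. the accented Latin letters) with Hebrew letters. A quick check, assuming Python 3:

```python
# The double-line box characters (200-206) are identical in both code
# pages, which is why 'cat' looked right under the IBM862 setting...
for code in range(200, 207):
    assert bytes([code]).decode('cp437') == bytes([code]).decode('cp862')

# ...but other high bytes differ, which is why everything else broke.
b = bytes([0x82])
print(b.decode('cp437'), b.decode('cp862'))   # 'é' vs. a Hebrew letter
```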