Add optional UTF-8 Display/File character support. #73

taviso · 2022-06-03T22:33:12Z

Lotus 1-2-3 predates UTF-8, and uses LMBCS internally, which is sort of a precursor to Unicode.

I see no reason we couldn't add a UTF-8 option for file/display charset, for better i18n support. It supports character set translation, we just have to teach it how and figure out the CBD (character bundle) format. I already know the BDLREC format, from my lotusdrv project - it's basically a TLV (tag, length, value) encoding system.
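For readers unfamiliar with TLV encodings, here is a minimal sketch of walking a buffer of tag/length/value records. The field widths (a 2-byte little-endian tag followed by a 2-byte little-endian length) and the names `tlv_record` and `tlv_next` are assumptions made for illustration; the thread does not describe the actual BDLREC or CBD layout, so treat this as the general technique rather than the real format.

```c
#include <stddef.h>
#include <stdint.h>

/* One decoded record. Field widths are assumed for illustration only. */
struct tlv_record {
    uint16_t tag;
    uint16_t length;
    const uint8_t *value;   /* points into the caller's buffer, not copied */
};

/*
 * Decode the record starting at buf[*offset] and advance *offset past it.
 * Returns 1 on success, 0 if the buffer is exhausted or a record is truncated.
 */
int tlv_next(const uint8_t *buf, size_t size, size_t *offset,
             struct tlv_record *out)
{
    size_t pos = *offset;

    /* Need at least a 4-byte header (tag + length). */
    if (pos > size || size - pos < 4)
        return 0;

    out->tag    = (uint16_t)(buf[pos]     | (buf[pos + 1] << 8));
    out->length = (uint16_t)(buf[pos + 2] | (buf[pos + 3] << 8));
    pos += 4;

    /* Refuse records whose declared length runs past the buffer. */
    if (size - pos < out->length)
        return 0;

    out->value = buf + pos;
    *offset = pos + out->length;
    return 1;
}
```

A caller would start with `offset = 0` and call `tlv_next()` in a loop until it returns 0, dispatching on `tag` for each record.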

sjuswede · 2022-06-15T20:18:36Z

I sometimes work with Chinese and Japanese text, and if this could get working, I'd be extremely happy.

Right now not even Swedish characters like åäö work for me in rxvt-unicode or XTerm when I try to enter them. When I import a CSV containing them (in UTF-8), they predictably get stripped out.

taviso · 2022-06-16T00:36:19Z

Yeah, not great right now, I can't even use £ lol. I looked at the code a bit today, I think I can make a few improvements easily, some might be harder though!

Internally, Lotus uses LMBCS, which is actually pretty impressive foresight considering Unicode hadn't been invented yet and everyone else was using codepages. This is good, because internally it can tell the difference between åäa.

You can see it knows about å, and calls it a ring:

https://archive.org/details/lotus-1-2-3-release-3.1-reference/Lotus%201-2-3%20Release%203.1%20-%20Reference/page/n637/mode/2up

It stores these characters correctly but doesn't know how to display them, so right now it uses a "fallback" ASCII character translation table (å => a and £ => L, and so on). That actually seems pretty easy to solve, I'll just add an LMBCS => UTF-8 table, then pass it to waddch() instead.
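A rough sketch of what such a display table could look like is below. The byte values assume the single-byte LMBCS characters follow the code page 850 layout, which is an assumption here and not something confirmed in this thread; `lmbcs_map`, `display_table`, and `lmbcs_display_utf8` are hypothetical names. Note also that `waddch()` takes a single `chtype`, so the multi-byte strings this returns would probably have to go through `waddstr()` or the wide-character ncursesw calls instead.

```c
#include <stddef.h>

struct lmbcs_map {
    unsigned char code;   /* assumed single-byte LMBCS value (code page 850 layout) */
    const char *utf8;     /* UTF-8 bytes to emit for that character */
};

/* Illustrative subset only; a real table would cover every supported character. */
static const struct lmbcs_map display_table[] = {
    { 0x84, "\xC3\xA4" },   /* ä U+00E4 */
    { 0x86, "\xC3\xA5" },   /* å U+00E5 */
    { 0x94, "\xC3\xB6" },   /* ö U+00F6 */
    { 0x9C, "\xC2\xA3" },   /* £ U+00A3 */
};

/*
 * Return the UTF-8 string for a character, or NULL when the table has no
 * entry, so the caller can keep using the existing ASCII fallback.
 */
const char *lmbcs_display_utf8(unsigned char code)
{
    for (size_t i = 0; i < sizeof display_table / sizeof display_table[0]; i++) {
        if (display_table[i].code == code)
            return display_table[i].utf8;
    }
    return NULL;
}
```

Returning NULL for unknown characters keeps the current transliteration (å => a, £ => L) as a safety net while the table is still incomplete.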

I'll give it a shot this weekend.

taviso · 2022-06-16T01:18:16Z

I think display and keyboard input might be easy, but the question is what to do with /File Import, always assume UTF-8? I guess we could have an environment variable like $LOTUS_IMPORT_CHARSET or whatever.

sjuswede · 2022-06-17T05:51:46Z

An environment variable would of course be great for legacy files. I would default to UTF-8, since that is standard in Linux today. It's a lot of work to set a normal distro to use anything else. But there are a lot of legacy files out there, and many systems which still spit out very strange formats. Don't ask me how I know.
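The lookup discussed above could be sketched as follows. `LOTUS_IMPORT_CHARSET` is the name floated earlier in the thread ("or whatever"), not a setting that is known to exist; `charset_or_default` and `import_charset` are hypothetical helpers. The sketch only picks a charset name and does not perform any conversion.

```c
#include <stdlib.h>

/* Fall back to UTF-8 when the variable is unset or empty. */
const char *charset_or_default(const char *env_value)
{
    if (env_value == NULL || env_value[0] == '\0')
        return "UTF-8";
    return env_value;
}

/* Charset to assume for /File Import, per the proposal in this thread. */
const char *import_charset(void)
{
    return charset_or_default(getenv("LOTUS_IMPORT_CHARSET"));
}
```

Under this sketch, a user with legacy files would export `LOTUS_IMPORT_CHARSET` before starting the program, and everyone else would get UTF-8 without configuring anything.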

taviso · 2022-06-20T23:14:08Z

Okay, I think I've got a plan. I have an easy temporary improvement, and a plan for a harder complete solution.

I can change the keymap code to translate UTF-8 on input to all the supported LMBCS characters. There are no collisions (I checked), so this will be super easy. I can do this in a day or two.

This is easy but not a complete solution -- there's no CJK for a start... but it is better than nothing - most of the Latin extended characters are covered (so I'll get £, you'll get all the Swedish characters, things like éßçñ are all there). There is no €, but it has ¤, it seems pretty safe to just steal that for € for now? I don't know.
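The input side of this plan might look roughly like the sketch below: decode one UTF-8 sequence to a code point, then look up the matching single-byte character. The byte values again assume a code page 850 layout, which is not confirmed in this thread, and `utf8_decode` and `lmbcs_from_codepoint` are hypothetical names. The decoder is deliberately minimal and only handles 1 to 3 byte sequences, which is enough for the Latin characters and € mentioned above.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Decode one UTF-8 sequence of up to 3 bytes from s into *cp.
 * Returns the number of bytes consumed, or 0 for malformed, overlong,
 * or unsupported (4-byte) input.
 */
size_t utf8_decode(const unsigned char *s, size_t len, uint32_t *cp)
{
    if (len >= 1 && s[0] < 0x80) {
        *cp = s[0];
        return 1;
    }
    if (len >= 2 && (s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80) {
        *cp = ((uint32_t)(s[0] & 0x1F) << 6) | (uint32_t)(s[1] & 0x3F);
        return *cp >= 0x80 ? 2 : 0;      /* reject overlong encodings */
    }
    if (len >= 3 && (s[0] & 0xF0) == 0xE0 &&
        (s[1] & 0xC0) == 0x80 && (s[2] & 0xC0) == 0x80) {
        *cp = ((uint32_t)(s[0] & 0x0F) << 12) |
              ((uint32_t)(s[1] & 0x3F) << 6) |
              (uint32_t)(s[2] & 0x3F);
        return *cp >= 0x800 ? 3 : 0;     /* reject overlong encodings */
    }
    return 0;
}

/*
 * Map a code point to an assumed single-byte LMBCS value, or -1 if there
 * is no mapping. Illustrative subset only.
 */
int lmbcs_from_codepoint(uint32_t cp)
{
    switch (cp) {
    case 0x00E4: return 0x84;   /* ä */
    case 0x00E5: return 0x86;   /* å */
    case 0x00F6: return 0x94;   /* ö */
    case 0x00A3: return 0x9C;   /* £ */
    case 0x00A4: return 0xCF;   /* ¤ */
    case 0x20AC: return 0xCF;   /* € borrows the ¤ slot, as floated above */
    default:     return cp < 0x80 ? (int)cp : -1;
    }
}
```

One consequence of borrowing ¤ for € is that the two become indistinguishable once stored, so a later UTF-8 display table would have to pick one of them when rendering that character.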

The complete solution will be adding lmbcs<->UTF-8 charset support, but this is a much bigger job.

This is the first step in improving i18n support. If any UTF-8 sequences have LMBCS encodings, translate them on input. These characters are stored as LMBCS internally, and you can differentiate them with @code, but they are not displayed correctly (they are transliterated to ASCII, see the 1-2-3R3.1 manual, Appendix 2). The next part of this change will be displaying them as UTF-8.

krackout · 2023-01-03T10:32:59Z

> The complete solution will be adding lmbcs<->UTF-8 charset support, but this is a much bigger job.

@taviso If any help can be given, I'm willing, especially regarding Greek. It may be a waste of time to get me involved in the programming, but I suppose I could help more easily with conversion tables.

taviso · 2023-01-04T19:07:12Z

Thank you! I'm slowly working on this, it will work eventually! 😆

taviso added this to the 1.0.0 milestone Jun 18, 2022
