[FR] Review of encoding and representation of Characters in the language files and the font files #1963

MarkusThur · 2021-06-01T13:52:02Z

Is your feature request related to a problem? Please describe.
This request is related to #1952.

There is definitely an encoding issue, at least with the BTT 70", in the language headers and .ini files in combination with the provided fonts.

Language header files and .ini files are, or should be, encoded in UTF-8.

According to the readme.md in the fonts folder, the font files contain character representations encoded in ASCII or UTF-16:

The "byte_ascii.fon" is the bitmap fonts of ASCII, size: 1224
The "word_unicode.fon" is the bitmap fonts of UTF-16, size: 2424
Scan direction: from UP to DOWN, from LEFT to RIGHT

Sadly, I can't check the .fon files, as I don't know the proper editor for them. From testing, those files do contain a proper representation of U+2103 ℃.

For the ASCII range (the first 128 code points, U+0000 to U+007F) this is fine, because UTF-8 encodes those characters as the same single bytes as ASCII; for higher characters it must, and does, fail in some way.

A correctly UTF-8-encoded two-character representation °C works, and the single character ℃ also works when it is encoded as correct UTF-8 (0xE2 0x84 0x83).
The "weak" encoding, which mixes the UTF-16 code unit `0x2103` into the UTF-8 language files to represent ℃ (as found in the German .ini and header files), interestingly works in some environments but fails in others.
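The following Python sketch (Python only because the build script auto_gen_language_pack.py is Python) compares the bytes behind the variants above: plain ASCII, the two-character °C, the single character ℃ in UTF-8, and the "weak" form where the UTF-16 code unit 0x2103 is written as the two bytes 21 03. It illustrates the encodings only; it does not model how the firmware looks characters up in the .fon files.

```python
# Compare the bytes behind the different ways of writing the degree sign.

def utf8_hex(text: str) -> str:
    """Return the UTF-8 encoding of `text` as space-separated hex bytes."""
    return text.encode("utf-8").hex(" ")

# ASCII range (U+0000..U+007F): UTF-8 bytes are identical to ASCII.
print("C      ->", utf8_hex("C"))             # 43
# Above U+007F, UTF-8 needs multi-byte sequences.
print("deg C  ->", utf8_hex("\u00b0C"))       # c2 b0 43
print("U+2103 ->", utf8_hex("\u2103"))        # e2 84 83

# The "weak" variant: the UTF-16 code unit 0x2103 stored as two raw bytes.
weak = "\u2103".encode("utf-16-be")
print("weak   ->", weak.hex(" "))             # 21 03
# Decoded as UTF-8, those bytes are '!' plus a control character, not U+2103.
print(repr(weak.decode("utf-8")))            # '!\x03'
```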

Describe the solution you'd like

Clearly indicate the required UTF-8 encoding of the language_xx.ini and language_xx.h files by adding a README.md to the respective folders that mentions it.

Describe alternatives you've considered

Implementing an automatic check in buildroot/scripts/auto_gen_language_pack.py to ensure the correct encoding of critical characters like U+2103 would be great, but in my eyes it is not a priority.
Processing \uxxxx escape sequences in buildroot/scripts/auto_gen_language_pack.py, instead of deleting them, would be great.
Providing a proper editor for the fonts, or explaining the .fon file format better so that a proper editor can be identified (in progress; see "Is there a recommended editor for the fonts?" #1957).
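A minimal sketch of the kind of automatic check suggested in the first alternative. It is not part of auto_gen_language_pack.py; the names check_language_bytes and main, and the rules applied, are hypothetical. It assumes two symptoms are worth failing on: bytes that are not valid UTF-8, and stray C0 control characters such as U+0003, which is what remains when ℃ is stored as the raw bytes 21 03.

```python
# Hypothetical pre-build check for language_xx.ini / language_xx.h files.
# Not part of auto_gen_language_pack.py; the names and rules are illustrative.
from __future__ import annotations

import sys
from pathlib import Path

# Control characters that legitimately appear inside a line of a text file.
ALLOWED_CONTROLS = {"\t", "\r"}


def check_language_bytes(data: bytes) -> list[str]:
    """Return a list of problems found in the raw bytes of a language file."""
    try:
        text = data.decode("utf-8")  # strict decoding: invalid bytes raise
    except UnicodeDecodeError as err:
        return [f"not valid UTF-8 at byte offset {err.start}"]
    problems = []
    for lineno, line in enumerate(text.split("\n"), start=1):
        for ch in line:
            # A stray C0 control character such as U+0003 is what remains when
            # U+2103 is stored as the raw UTF-16 code unit (bytes 21 03).
            if ord(ch) < 0x20 and ch not in ALLOWED_CONTROLS:
                problems.append(f"line {lineno}: control character U+{ord(ch):04X}")
    return problems


def main(paths: list[str]) -> int:
    """Check each file and return a process exit code (0 = all clean)."""
    failed = False
    for name in paths:
        for problem in check_language_bytes(Path(name).read_bytes()):
            print(f"{name}: {problem}")
            failed = True
    return 1 if failed else 0


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

Called with the language files as arguments, a nonzero exit code could be used to stop the build. Strict UTF-8 decoding also tends to catch files that were saved in a legacy code page, since a lone byte such as 0xB0 (° in Windows-1252) is not valid UTF-8 on its own.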

Additional context
Thank you for reading and considering this.


guruathwal · 2021-06-03T12:03:34Z

@MarkusThur The character ℃ is properly encoded in both the language_xx.ini and language_xx.h files for all languages.
UTF-8 encoding is already implemented in buildroot/scripts/auto_gen_language_pack.py.
I inspected all the files and found that all Unicode characters are encoded correctly (see image below). If there were an issue with the firmware, it would affect all TFT variants, because the API is the same for all of them; the only difference is the screen resolution.
It is not clear what problem you are having with just the ℃ character, and why. Did you modify any part of the firmware? Please share the file with the improper encoding and a photo of the display showing the issue.

MarkusThur · 2021-06-03T14:05:36Z

At some point, around the Vx.x.27 tag, some files contained the UTF-16 / Windows-1252 encoding of it, 21 03. This happens because the files carry no encoding indicator, so the editor has to guess the encoding and guesses wrong.
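One common way such a wrong guess shows up can be sketched in Python, assuming the editor opens a UTF-8 file as Windows-1252 (cp1252): the three bytes of ℃ turn into three unrelated characters, and saving the file again as UTF-8 changes its bytes. This illustrates the general mechanism; it is not a reconstruction of the exact edit that produced 21 03.

```python
# An editor that guesses Windows-1252 for a UTF-8 file.
raw = "\u2103".encode("utf-8")          # correct UTF-8 for U+2103: e2 84 83
misread = raw.decode("cp1252")          # the wrong guess: 'â„ƒ'
print([f"U+{ord(c):04X}" for c in misread])   # ['U+00E2', 'U+201E', 'U+0192']

# Saving the misread text back as UTF-8 silently rewrites the bytes.
resaved = misread.encode("utf-8")
print(resaved.hex(" "))                 # c3 a2 e2 80 9e c6 92
```

Because each step is deterministic, this particular damage can be undone (decode as UTF-8, then encode as cp1252) as long as nothing else has changed the file in between.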

Let's add a markdown file at the relevant locations as a reminder of the right encoding, and everything is fine.
It would be really great if auto_gen_language_pack.py took care of it automatically.

It already takes some care, in that it makes sure the header file is encoded as 'UTF-8', but it does not check for the typical mistake that tends to occur when UTF-8 files are edited on Windows machines.
It also handles the Unicode escape sequences \uxxxx in some way, but if I read it right, it just deletes them instead of processing them.
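If the script should process `\uxxxx` escapes instead of deleting them, one possible approach is to replace each escape with the character it names before the text is written out as UTF-8. The function expand_unicode_escapes and the key LABEL_EXAMPLE are hypothetical and not taken from auto_gen_language_pack.py.

```python
import re

# Matches a literal backslash-u followed by exactly four hex digits, e.g. \u2103.
UNICODE_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")


def expand_unicode_escapes(text: str) -> str:
    """Replace each \\uXXXX escape with the character it denotes."""
    return UNICODE_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


line = r"LABEL_EXAMPLE=Temperatur \u2103"
expanded = expand_unicode_escapes(line)
# The expanded line now ends in the correct UTF-8 bytes for U+2103.
print(expanded.encode("utf-8")[-3:].hex(" "))   # e2 84 83
```

This sketch only covers the Basic Multilingual Plane (which matches the UTF-16 font files); it does not combine surrogate pairs or treat an escaped backslash (`\\u2103`) as literal text.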

I can't reproduce it, as I can no longer find those wrong combinations; the contributions fixed it with correctly encoded files, which are then also displayed correctly.

The request is about preventing this from happening again, as files with "weak" encoding could show up at any time.

stale · 2021-08-07T15:11:01Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions · 2024-03-29T01:24:28Z

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

MarkusThur added the enhancement New feature or request label Jun 1, 2021

MarkusThur changed the title ~~[FR] (feature request title)~~ [FR] Review of encoding and representation of Characters in the language files and the font files Jun 1, 2021

MarkusThur mentioned this issue Jun 1, 2021

language_de.h with UTF-8 encoded U-2103 #1964

Merged

stale bot added the Abandoned label Aug 7, 2021

stale bot closed this as completed Aug 14, 2021

github-actions bot locked and limited conversation to collaborators Mar 29, 2024
