[FR] Review of encoding and representation of Characters in the language files and the font files #1963

Closed
MarkusThur opened this issue Jun 1, 2021 · 4 comments
Labels
Abandoned enhancement New feature or request

Comments

@MarkusThur
Contributor

MarkusThur commented Jun 1, 2021

Is your feature request related to a problem? Please describe.
This request is related to #1952.

There is definitely an encoding issue, at least with the BTT 70", involving the language headers and .ini files in combination with the provided fonts.

The language header files and .ini files are, or should be, encoded in UTF-8.

According to the readme.md for the fonts, the font files contain character representations encoded in ASCII or UTF-16:

The "byte_ascii.fon" is the bitmap fonts of ASCII, size: 12*24
The "word_unicode.fon" is the bitmap fonts of UTF-16, size: 24*24
Scan direction: from UP to DOWN, from LEFT to RIGHT

Sadly I can't inspect the .fon files, as I don't know the proper editor for them. Judging from testing, those files do contain a proper representation of U+2103 ℃.

For the ASCII range (the first 128 Unicode code points) this is fine, because UTF-8 and ASCII are identical there. For the higher characters it must fail in some way, and it does.

Sure, the correctly UTF-8 encoded two-character representation °C works, and the one-character representation ℃ also works if it is encoded in correct UTF-8 as 0xe2 0x84 0x83.
The "weak" encoding, which mixes the UTF-16 encoding `0x2103` into the UTF-8 language files to represent ℃, as found in the German .ini and header file, interestingly works in some environments while it fails in others.
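
To make the byte values above concrete, here is a small Python sketch that prints the encodings involved. It only uses the standard library; the variable names are illustrative and nothing here is taken from the firmware sources.

```python
# U+2103 DEGREE CELSIUS as one code point, and the two-character form (degree sign + "C").
single = "\u2103"
double = "\u00b0C"

utf8_single = single.encode("utf-8")         # three bytes: e2 84 83
utf16be_single = single.encode("utf-16-be")  # two bytes:   21 03
utf8_double = double.encode("utf-8")         # three bytes: c2 b0 43

print("UTF-8,  one character :", utf8_single.hex(" "))
print("UTF-16, one character :", utf16be_single.hex(" "))
print("UTF-8,  two characters:", utf8_double.hex(" "))

# If the raw UTF-16 bytes 21 03 end up inside a UTF-8 file, a UTF-8 decoder
# does not see U+2103 at all: it sees "!" followed by the control character U+0003.
stray = utf16be_single.decode("utf-8")
print(repr(stray))
```

This is why the mixed form cannot be relied on: whether it appears to work depends on what the consumer does with those two stray bytes.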

Describe the solution you'd like

  • Clearly stating the required UTF-8 encoding of the language_xx.ini and language_xx.h files by adding a README.md to the respective folders.

Describe alternatives you've considered

  • Implementing an automatic check in buildroot/scripts/auto_gen_language_pack.py to ensure the correct encoding of critical characters like U+2103 would be great, but in my eyes it is not a priority.
  • Processing \uxxxx escape sequences in buildroot/scripts/auto_gen_language_pack.py, instead of deleting them, would be great.
  • Providing a proper editor for the fonts, or explaining the .fon file format better so that a proper editor can be identified (in progress, see the answer to "Is there a recommended editor for the fonts?" #1957).
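
The automatic check mentioned in the first alternative could look roughly like the sketch below. It is an assumption-laden illustration, not code from auto_gen_language_pack.py: the function name `find_encoding_problems` is hypothetical, and the heuristic (reject invalid UTF-8, flag control characters such as the U+0003 left behind by a stray `21 03`) is just one possible rule.

```python
def find_encoding_problems(data: bytes) -> list:
    """Return human-readable problems found in the raw bytes of a language file.

    Hypothetical helper: the name and the rules are illustrative only.
    """
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # The file is not UTF-8 at all (for example saved as Windows-1252).
        return [f"not valid UTF-8 at byte offset {exc.start}"]

    problems = []
    for lineno, line in enumerate(text.split("\n"), start=1):
        for ch in line.rstrip("\r"):
            # Control characters other than TAB have no business in a
            # translation string; a stray 21 03 shows up here as U+0003.
            if ord(ch) < 0x20 and ch != "\t":
                problems.append(f"line {lineno}: control character U+{ord(ch):04X}")
    return problems


print(find_encoding_problems("Bed: 60\u2103".encode("utf-8")))  # correctly encoded
print(find_encoding_problems(b"Bed: 60\x21\x03"))               # stray UTF-16 bytes
print(find_encoding_problems(b"Bed: 60\xb0C"))                  # single-byte degree sign
```

In a build script, the bytes would come from opening each language file in binary mode, and a non-empty result could abort the build or just print a warning.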

Additional context
Thank you for reading and considering this

@MarkusThur MarkusThur added the enhancement New feature or request label Jun 1, 2021
@MarkusThur MarkusThur changed the title [FR] (feature request title) [FR] Review of encoding and representation of Characters in the language files and the font files Jun 1, 2021
@guruathwal
Contributor

@MarkusThur The character is being properly encoded in both language_xx.ini and language_xx.h file for all languages.
The UTF-8 encoding is already implemented in buildroot/scripts/auto_gen_language_pack.py.
I inspected all the files and found that all Unicode characters are encoded correctly (see image below). If there were an issue with the firmware, it would affect all the TFT variants, because the API is the same for all of them. The only difference is the screen resolution.
It is not clear what kind of problem you are having with just this character, or why. Did you modify any part of the firmware? Please share the file that has the improper encoding and a photo of the display showing the issue.

image

@MarkusThur
Contributor Author

MarkusThur commented Jun 3, 2021

Some files contained the UTF-16 / Windows-1252 encoding of it (21 03) at some point, somewhere around the Vx.x.27 tag. This happens because the files do not contain an encoding indicator, so the editor in use "guesses" the encoding and gets it wrong.
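
As an illustration of how a wrong guess damages a file, the following sketch decodes the correct UTF-8 bytes of ℃ as Windows-1252 and saves the result again as UTF-8. This shows one common failure mode only; it is not a reconstruction of what actually happened to the files in the repository, and the variable names are made up for the example.

```python
# Correct UTF-8 bytes for U+2103 DEGREE CELSIUS.
utf8_bytes = "\u2103".encode("utf-8")      # e2 84 83

# An editor that guesses Windows-1252 turns the three bytes into three
# unrelated characters instead of one degree-Celsius sign.
misread = utf8_bytes.decode("cp1252")

# Saving that text as UTF-8 again bakes the damage into the file.
resaved = misread.encode("utf-8")

print(utf8_bytes.hex(" "))
print([f"U+{ord(ch):04X}" for ch in misread])
print(resaved.hex(" "))

# As long as nothing else was changed, reversing the wrong guess restores the text.
restored = misread.encode("cp1252").decode("utf-8")
print(restored == "\u2103")
```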

Let's put a markdown file in the relevant folders as a reminder of the right encoding, and everything is fine.
If auto_gen_language_pack.py took care of it automatically, that would be really great.

It already takes some care that the header file is read as UTF-8, but it does not check for the typical mistake that tends to occur when UTF-8 files are edited on Windows machines.
It also handles the Unicode escape sequences \uxxxx in some way. But if I read it right, it just deletes them instead of processing them.
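
Processing the escapes instead of deleting them could be done along these lines. This is a minimal sketch under stated assumptions: `expand_unicode_escapes` is a hypothetical helper, not a function from auto_gen_language_pack.py, and it only handles four-digit \uxxxx escapes (no surrogate pairs, no special treatment of an escaped backslash in front of the `u`).

```python
import re

# Matches a literal backslash, "u", and exactly four hex digits.
_UNICODE_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def expand_unicode_escapes(text: str) -> str:
    """Replace every \\uXXXX escape in text with the character it names."""
    return _UNICODE_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


# The raw string keeps the backslash, as it would appear in a header file.
source = r"Bed: 60\u2103"
expanded = expand_unicode_escapes(source)
print(expanded.encode("utf-8").hex(" "))
```

Text without any escape sequence passes through unchanged, so the helper could be applied to every string before it is written to the language pack.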

I can't reproduce it, as I can no longer find those wrong combinations. With the recent contributions it has been fixed by correctly encoded files, which are then also displayed correctly.

The request is about preventing that from happening again, since files with "weak" encoding could appear at any time.

@stale

stale bot commented Aug 7, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Abandoned label Aug 7, 2021
@stale stale bot closed this as completed Aug 14, 2021

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked and limited conversation to collaborators Mar 29, 2024