Support for Indic scripts #1533

Closed
shreevatsa opened this issue Jun 22, 2024 · 7 comments · Fixed by #1536
Labels
enhancement New feature or request fonts font rasterization and text shaping API and platform implementations platform: macOS topics that directly address macOS platform regression

Comments

@shreevatsa

Abstract

Support for non-Latin scripts, such as Indic scripts like Devanagari and Kannada, seems to be missing in Contour, at least on macOS. I just tried a bunch of terminals on macOS, and Contour is the worst of them (zero font support), which is surprising given the mention of Unicode Grapheme cluster support etc. I'm not sure whether I've done anything incorrectly (whether this is a bug report or feature request). This is what it looks like:

image

where sample.txt is:

ತನ್ನ ಒಂದು ಸತ್ಯಸಂಕಲ್ಪದಂತೆ ಸೃಷ್ಟಿಯಲ್ಲಿ ವ್ಯವಸ್ಥೆಯಿಲ್ಲದೆ ತನ್ನ

Motivation

Output from programs like diff, or for that matter ls, should work even when file contents or filenames contain Indic-script characters.

Specification

There is no specification; I believe no terminal renders these scripts correctly (though mlterm comes closest). Still, I'd hope that Contour could be at least as good as other terminals.

@shreevatsa shreevatsa added enhancement New feature or request feature-request User requested features labels Jun 22, 2024
@Yaraslaut
Member

Hi @shreevatsa,
here is what I see for your text on my system:
image

You need to set up fonts accordingly in the Contour config file. Also, contour debug font.textshaping might give you some additional info.

@christianparpart
Member

I think you might have strict_spacing set to true. Try setting it to false :)

@shreevatsa
Author

Thank you @Yaraslaut and @christianparpart. It is heartening to know that some amount of support exists in principle.

For what it's worth, I was not able to get it to work on macOS:

➜  ~ contour debug font.textshaping
Warning: Could not find the Qt platform plugin "cocoa" in "" ((null):0, (null))
Fatal: This application failed to start because no Qt platform plugin could be initialized. Reinstalling the application may fix this problem.
 ((null):0, (null))
[1]    73822 abort      contour debug font.textshaping

and

➜  ~ contour font-locator
[error] The configured text shaping engine CoreText does not yet support font feature settings. Ignoring.
Matching fonts using  : CoreText
Font description      : (family=monospace weight=Regular slant=Roman spacing=Proportional, strict_spacing=no)
Number of fonts found : 1
  path /System/Library/Fonts/Menlo.ttc

@Yaraslaut Yaraslaut removed the feature-request User requested features label Jun 22, 2024
@christianparpart christianparpart added fonts font rasterization and text shaping API and platform implementations platform: macOS topics that directly address macOS platform regression labels Jun 22, 2024
@christianparpart
Member

@shreevatsa I just checked. I'm getting the same output as you on macOS, but it works flawlessly on non-macOS systems (same software version).

So the only difference is how we discover fonts on macOS. This is something I can investigate tonight (I'm out with family during the day). I'll keep you posted.

I'd like to note one very important thing though (having quickly scanned through your blog article): Unicode is not specified for terminals. Not at all. This is all undefined, implementation-dependent behaviour. We (as in some terminal developers) are indeed trying to bring terminals into the current century when it comes to Unicode. Every TE has its own priorities there. We, for example, focus on complex grapheme cluster support, especially related to emoji, but also on ligature support. Both are very well supported in Contour.
Languages like Hebrew, RTL, etc.

I don't like macOS falling behind in font fallback (this is the issue here). I'll look into it later.

@christianparpart
Member

christianparpart commented Jun 22, 2024

@shreevatsa the given PR above actually implements proper font fallback on MacOS. Thanks for reporting this. :)

[ ... ] and Contour is the worst of them (zero font support), [ ... ]

I wanted to clarify something about the wording here.

"Zero font support" is impossible: some font is always displayed, and font changes (bold, italic, bold/italic) all work, also on the latest stable release for macOS. What you maybe meant is font fallback support, which is what I addressed in #1536 (for macOS). Apparently, when we switched away from fontconfig to the native CoreText API on macOS, we implemented only basic font matching and no font fallback. Note, however, that #1536 requires macOS 13.1 or higher.

which is surprising given the mention of Unicode Grapheme cluster support etc

Grapheme segmentation is something entirely different. It is part of UAX #29 and is implemented in libunicode. Grapheme cluster segmentation determines how many (UTF-32) Unicode code points form a single user-perceived character. This can range from 1 to many (e.g. 7), with zero width joiners or even variation selectors included to alter the display. This is something most terminals don't get right. You can try a little test script which I once wrote to check our own terminal (not sure why I created a separate repo for that, I was probably a little bit over-motivated :D).

For reference, I've put a small screenshot of the script's output here (this test script focuses solely on Unicode grapheme segmentation, shown by printing various emoji characters):

image
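
The code point counts mentioned above can be illustrated with a minimal sketch using only Python's standard library. It does not implement UAX #29 segmentation (the standard library has no such function) and it is not the test script referred to in this comment. It only shows that one user-perceived character, here a family emoji picked as an assumed example, consists of seven code points, three of which are zero width joiners.

```python
import unicodedata

# Assumed example: a family emoji built from four emoji code points
# joined by three ZERO WIDTH JOINER (U+200D) code points.
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467\u200D\U0001F466"

# One entry per code point, e.g. "U+200D ZERO WIDTH JOINER".
code_points = [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in family]
utf8_length = len(family.encode("utf-8"))

print(len(family))   # number of code points in the string
print(utf8_length)   # number of bytes in the UTF-8 encoding
for entry in code_points:
    print(entry)
```

A terminal that segments this string correctly treats all seven code points as a single grapheme cluster, while one that counts code points or bytes would move the cursor too far.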

@shreevatsa
Author

Thank you so much!

To clarify:

  • Yes I understand terminal behaviour is not specified with Unicode, in particular (in this case) for complex scripts and trying to force them into the grid-of-cells assumption of terminals. I appreciate that Contour here intends to support Unicode better than other terminals! In fact, while searching around I found some very sensible comments by you on another repo (Support For Emoji Modifiers alacritty/alacritty#3975) which is what emboldened me to report this here. :-)
  • Sorry about "zero font support" — what I meant was that we speakers/readers of languages with complex scripts have, over the years, come to recognize several "levels" of support for our scripts, in various environments' text stacks: (0) no font loaded at all, glyphs are shown as question marks/boxes/tofu (LastResort/.notdef), (1) some font is used, but the glyphs are just laid out next to each other with no CTL reordering, (2) text rendered mostly correctly and readably, with some corner cases that can be read with guesswork, (3) text rendered properly. I imagine readers of Arabic recognize further levels with RTL support, and with terminals I now realize there are further complications with rendering to the grid.
  • Thanks for clarifying that Unicode Grapheme cluster support refers to grapheme cluster segmentation — I must say we are all thankful for emoji, the "trojan horse" of Unicode support (in a good way: many programs that would otherwise not care about Unicode end up implementing varying levels of support for the sake of emojis, which ends up benefiting all the world's languages).

Back to this issue: building from source after #1536 (following the steps from this comment: #1510 (comment)), I can confirm that after the recent PR, there is some positive change as font fallback seems to be working:

image

(The rendering isn't great, with characters overlapping etc, but most other terminals have similar issues, and I understand that implementing something better, when there isn't even any specification yet, may not be within scope. In the meantime on a personal note, I was able to get my work done using eshell, which is not a terminal emulator (thankfully), and doesn't try to force text to a grid.)

@shreevatsa
Author

shreevatsa commented Jun 23, 2024

Just for completeness, some concrete numbers for an example (from another repo wez/wezterm#1333 (comment)): in the example there, the text "বাংলা ভাষা" has, at a font size where the space character (and thus one "cell") is 8 pixels wide:

  • বাং 7+4+6=17 pixels (=17/8=2.125 cells)
  • লা 10+4=14 pixels (=14/8=1.75 cells)
  • (space) 8 pixels (=1 cell)
  • ভা 10+4=14 pixels (=14/8=1.75 cells)
  • ষা 8+4=12 pixels (=12/8=1.5 cells)

So ideally this would be 65 pixels = 8.125 cells wide, but if that's not possible, what I as a reader would prefer would be for cell-alignment to happen at word boundaries (so বাংলা = 17 + 14 pixels = 3.875 cells would be rounded up to 4 cells, then a space, then ভাষা = 26 pixels = 3.25 cells would be either squeezed to 3 or rounded up to 4 cells).
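
As a rough sketch, the arithmetic above can be reproduced in a few lines of Python. The pixel widths are the ones quoted in this comment; the helper names (`cells`, `split_words`) are hypothetical, and the sketch always rounds a word up to whole cells, which is only one of the two options mentioned for the second word.

```python
import math

CELL_PX = 8  # width of one cell (the space character) in the example

# (cluster, advance width in pixels), as listed above
clusters = [("বাং", 17), ("লা", 14), (" ", 8), ("ভা", 14), ("ষা", 12)]

def cells(pixels):
    """Convert a pixel width into a (possibly fractional) number of cells."""
    return pixels / CELL_PX

def split_words(items):
    """Group consecutive non-space clusters into words; a space is its own group."""
    groups, current = [], []
    for text, px in items:
        if text == " ":
            if current:
                groups.append(current)
                current = []
            groups.append([(text, px)])
        else:
            current.append((text, px))
    if current:
        groups.append(current)
    return groups

# Ideal width: lay the clusters out at their natural advance widths.
ideal_px = sum(px for _, px in clusters)

# Word-boundary alignment: sum each word, then round it up to whole cells.
word_px = [sum(px for _, px in group) for group in split_words(clusters)]
word_cells_rounded_up = [math.ceil(cells(px)) for px in word_px]

print(ideal_px, cells(ideal_px))
print(word_px)
print(word_cells_rounded_up, sum(word_cells_rounded_up))
```

With these inputs, rounding at word boundaries costs 9 cells in total, compared with the ideal 8.125.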

I think Contour tries to render the whole thing 5 or 6 cells wide, as there are 5 graphemes and 6 glyphs (copy-pasted input was echo বাংলা ভাষা | wc — note there's a space before the |):

image

while Terminal and iTerm2 use 10 cells, rounding up each grapheme or maybe even each glyph (বাং 2.125 -> 3 cells, লা 1.75 -> 2 cells, space 1 cell, ভা 1.75 -> 2 cells, ষা 1.5 -> 2 cells):

image image

(This renders better, or at least more readably, but cursor movement goes haywire.)

I understand there is no specification here and it's a research problem how best to render these.


Edit: I understand that the equation of "one grapheme cluster = one terminal cell" can make sense for cursor movement (I wonder what's happening with wide emoji or wide East Asian characters?), but if that needs to be retained, I think one simple hack (that would make the text both readable and usable, at cost of some ugliness) would be to scale glyphs so that they don't exceed one cell's width. For the grapheme clusters in the example above:

  • বাং: want to render it in one cell, but the font says it's 2.125 cells wide, so scale this whole run by 1/2.125.
  • লা: want to render it in one cell, but the font says it's 1.75 cells wide, so scale this whole run by 1/1.75.
  • (space): want to render it in one cell, and the font says it's 1 cell wide, so no scaling is needed (scale=1).

etc. Kitty and wezterm seem to be attempting something like this, but half-heartedly (only for some glyphs).
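
A minimal sketch of the scaling hack described above, assuming the same 8-pixel cell and the widths from the earlier example. The function name `scale_to_one_cell` is hypothetical, and this is not how Kitty, wezterm, or Contour actually implement it. Runs that already fit in one cell are left unscaled.

```python
CELL_PX = 8  # width of one cell in pixels, as in the example above

def scale_to_one_cell(width_px, cell_px=CELL_PX):
    """Horizontal scale factor that makes a run no wider than one cell."""
    return min(1.0, cell_px / width_px)

# Natural widths in pixels of the first three clusters from the example.
widths = {"বাং": 17, "লা": 14, " ": 8}

scales = {text: scale_to_one_cell(px) for text, px in widths.items()}
for text, factor in scales.items():
    print(repr(text), round(factor, 3))
```

The factor 8/17 is the same as 1/2.125, and 8/14 is the same as 1/1.75, matching the list above.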
