How are word boundaries detected in Tibetan? #15

Open
r12a opened this issue Mar 27, 2024 · 1 comment
Labels
i:segmentation Grapheme/word segmentation & selection l:bo Lhasa Tibetan l:dz Dzongkha question s:tibt Tibetan script

Comments

@r12a
Contributor

r12a commented Mar 27, 2024

Tibetan has no delimiter for word boundaries, only for tsheg-bar (syllable) boundaries, but I'm led to believe that the preference is to wrap lines at word boundaries where possible.

Is this something that applications currently do? If so, how do they detect the word boundaries? Do they use dictionary lookups like Chinese, Thai, etc?
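For illustration only, here is a minimal sketch of the kind of dictionary lookup meant here. It splits text into tsheg-bars at U+0F0B TIBETAN MARK INTERSYLLABIC TSHEG, then greedily takes the longest run of syllables found in a word list. The word list, the `max_len` limit, and the function names are hypothetical. This is not a description of what any existing application does; a real segmenter would need a large lexicon and better handling of unknown words and of punctuation such as the shad (U+0F0D).

```python
# Hypothetical sketch: greedy longest-match over tsheg-delimited syllables.
# The dictionary below is a toy example, not a real lexicon.

TSHEG = "\u0F0B"  # TIBETAN MARK INTERSYLLABIC TSHEG


def syllables(text: str) -> list[str]:
    """Split text into tsheg-bars, dropping empty pieces (e.g. a trailing tsheg)."""
    return [s for s in text.split(TSHEG) if s]


def segment(sylls: list[str], dictionary: set[tuple[str, ...]],
            max_len: int = 4) -> list[list[str]]:
    """At each position, take the longest run of syllables (up to max_len)
    that appears in the dictionary; fall back to a single syllable."""
    words: list[list[str]] = []
    i = 0
    while i < len(sylls):
        for n in range(min(max_len, len(sylls) - i), 0, -1):
            candidate = tuple(sylls[i:i + n])
            # A single syllable is always accepted, so the loop always advances.
            if n == 1 or candidate in dictionary:
                words.append(list(candidate))
                i += n
                break
    return words


if __name__ == "__main__":
    toy_dictionary = {("བཀྲ", "ཤིས"), ("བདེ", "ལེགས")}
    print(segment(syllables("བཀྲ་ཤིས་བདེ་ལེགས"), toy_dictionary))
    # prints [['བཀྲ', 'ཤིས'], ['བདེ', 'ལེགས']]
```

Greedy longest-match is the simplest dictionary approach; it can choose the wrong split when a long match hides a better overall segmentation, which is why some segmenters score alternative segmentations instead.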

@r12a r12a added question i:segmentation Grapheme/word segmentation & selection labels Mar 27, 2024
@nickscottprior

> I'm led to believe that the preference is to wrap lines at word boundaries where possible

The preference for Tibetan? I don't think this is common. Every genre and format of text I've read, from Old Tibetan to the modern day, flagrantly breaks words across lines all the time.

@r12a r12a added s:tibt Tibetan script l:bo Lhasa Tibetan l:dz Dzongkha labels Jul 5, 2024