Tamil visual normalization rules added for flipped two-part vowel signs. Example: SIGN EE + SIGN AA -> SIGN OO

PiperOrigin-RevId: 641977604
cibu authored and copybara-github committed Jun 10, 2024
1 parent 9300462 commit e37af90
Showing 2 changed files with 19 additions and 0 deletions.
15 changes: 15 additions & 0 deletions nisaba/scripts/brahmic/data/Taml/visual_rewrite.textproto
@@ -106,3 +106,18 @@ item {
uname: ["SIGN AU", "SIGN I"] raw: "ௌி"
to_uname: ["SIGN E", "LLA", "SIGN I"] to_raw: "ெளி"
}

# Flipped two-part vowel signs.
# The non-flipped sequence is covered by NFC.
item {
uname: ["SIGN AA", "SIGN E"] raw: "ாெ"
to_uname: ["SIGN O"] to_raw: "ொ"
}
item {
uname: ["SIGN AA", "SIGN EE"] raw: "ாே"
to_uname: ["SIGN OO"] to_raw: "ோ"
}
item {
uname: ["AU LENGTH MARK", "SIGN E"] raw: "ௗெ"
to_uname: ["SIGN AU"] to_raw: "ௌ"
}
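The comment "The non-flipped sequence is covered by NFC" can be checked with Python's standard `unicodedata` module. The sketch below builds each two-part vowel sign in canonical order (left part first) and in flipped order, and reports whether NFC turns it into the precomposed sign. The code points are standard Unicode; the names `TWO_PART_SIGNS` and `nfc_coverage` are illustrative only and are not part of nisaba.

```python
import unicodedata

# Component marks of the Tamil two-part vowel signs.
SIGN_E = "\u0BC6"          # TAMIL VOWEL SIGN E
SIGN_EE = "\u0BC7"         # TAMIL VOWEL SIGN EE
SIGN_AA = "\u0BBE"         # TAMIL VOWEL SIGN AA
AU_LENGTH_MARK = "\u0BD7"  # TAMIL AU LENGTH MARK

# (left part, right part, precomposed sign), listed in canonical order.
TWO_PART_SIGNS = [
    (SIGN_E, SIGN_AA, "\u0BCA"),         # TAMIL VOWEL SIGN O
    (SIGN_EE, SIGN_AA, "\u0BCB"),        # TAMIL VOWEL SIGN OO
    (SIGN_E, AU_LENGTH_MARK, "\u0BCC"),  # TAMIL VOWEL SIGN AU
]


def nfc_coverage():
    """Map each precomposed sign's name to a pair of booleans:
    (NFC composes the canonical order, NFC composes the flipped order)."""
    report = {}
    for left, right, composed in TWO_PART_SIGNS:
        name = unicodedata.name(composed)
        canonical = unicodedata.normalize("NFC", left + right) == composed
        flipped = unicodedata.normalize("NFC", right + left) == composed
        report[name] = (canonical, flipped)
    return report


if __name__ == "__main__":
    for name, (canonical, flipped) in nfc_coverage().items():
        print(f"{name}: canonical composed={canonical}, flipped composed={flipped}")
```

Every sign should report `True` for the canonical order and `False` for the flipped order. NFC never reorders these marks (they are all starters), so a flipped sequence stays as two separate signs, which is the gap the explicit rewrite rules fill.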
4 changes: 4 additions & 0 deletions nisaba/scripts/brahmic/testdata/visual_norm.textproto
@@ -44,6 +44,10 @@ rewrite { rule: "SINH" input: "අපේ‍්‍රල්" output: "අප්
# rewrite { rule: "TAML" input: "தமி​ழர்‌கள்‍" output: "தமிழர்கள்" }
rewrite { rule: "TAML" input: "ஆக்‌ஷன்" output: "ஆக்‌ஷன்" }

rewrite { rule: "TAML" input: "காெள்" output: "கொள்" }
rewrite { rule: "TAML" input: "ப்ராேஷன்" output: "ப்ரோஷன்" }
rewrite { rule: "TAML" input: "சௗெந்தர்யம்" output: "சௌந்தர்யம்" }

rewrite { rule: "DEVA" input: "श्रीमान्‌को" output: "श्रीमान्‌को" }
rewrite { rule: "DEVA" input: "गोल्‍डबर्ग" output: "गोल्डबर्ग" }

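The three new `TAML` test cases can be reproduced with a rough Python model of the new rules. This is a sketch under stated assumptions: `FLIPPED_VOWEL_RULES` and `fix_flipped_vowel_signs` are hypothetical names; the rules are modeled as plain substring replacements followed by NFC, whereas the actual nisaba implementation is not shown in this commit and may behave differently; and the expected outputs are assumed to use the precomposed signs U+0BCA, U+0BCB and U+0BCC. Inputs are spelled as escapes so that the order of the combining marks is unambiguous.

```python
import unicodedata

# Hypothetical model of the three new rules in visual_rewrite.textproto.
# Keys are flipped (visually typed) sequences; values are the precomposed
# vowel signs they should become.
FLIPPED_VOWEL_RULES = {
    "\u0BBE\u0BC6": "\u0BCA",  # SIGN AA + SIGN E        -> SIGN O
    "\u0BBE\u0BC7": "\u0BCB",  # SIGN AA + SIGN EE       -> SIGN OO
    "\u0BD7\u0BC6": "\u0BCC",  # AU LENGTH MARK + SIGN E -> SIGN AU
}


def fix_flipped_vowel_signs(text: str) -> str:
    """Replace flipped two-part vowel signs, then NFC-normalize.

    NFC handles the non-flipped order (e.g. SIGN E + SIGN AA -> SIGN O).
    """
    for flipped, composed in FLIPPED_VOWEL_RULES.items():
        text = text.replace(flipped, composed)
    return unicodedata.normalize("NFC", text)


# The three TAML cases from visual_norm.textproto, as (input, expected).
TEST_CASES = [
    # KA + AA + E + LLA + VIRAMA -> KA + O + LLA + VIRAMA
    ("\u0B95\u0BBE\u0BC6\u0BB3\u0BCD", "\u0B95\u0BCA\u0BB3\u0BCD"),
    # ...RA + AA + EE + SSA... -> ...RA + OO + SSA...
    ("\u0BAA\u0BCD\u0BB0\u0BBE\u0BC7\u0BB7\u0BA9\u0BCD",
     "\u0BAA\u0BCD\u0BB0\u0BCB\u0BB7\u0BA9\u0BCD"),
    # CA + AU LENGTH MARK + E + ... -> CA + AU + ...
    ("\u0B9A\u0BD7\u0BC6\u0BA8\u0BCD\u0BA4\u0BB0\u0BCD\u0BAF\u0BAE\u0BCD",
     "\u0B9A\u0BCC\u0BA8\u0BCD\u0BA4\u0BB0\u0BCD\u0BAF\u0BAE\u0BCD"),
]

if __name__ == "__main__":
    for source, expected in TEST_CASES:
        assert fix_flipped_vowel_signs(source) == expected
    print("all TAML flipped-vowel cases pass")
```

Running NFC after the replacements means the canonical order of each sign also ends up precomposed, so both typing orders converge on the same string.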