Merge pull request #3 from kargnas/enhance/pluralization
Even more advanced pluralization
kargnas committed Jul 24, 2024
2 parents 5f71e59 + fd36550 commit e65a717
Showing 3 changed files with 387 additions and 43 deletions.
35 changes: 10 additions & 25 deletions src/AI/prompt-system.txt
@@ -8,39 +8,24 @@ Follow these important rules first:
- Keep the pluralization code the same (e.g. {0} There are none|[1,19] There are some|[20,*] There are many).
- Keep the letter case of each word as in the source text. The only exception is when {targetLanguage} has different capitalization rules than {sourceLanguage}; for example, some languages capitalize nouns.
- For phrases or titles without a period, translate them directly without adding extra words or changing the structure.
- Examples:
- 'Read in other languages' should be translated as a phrase or title, without adding extra words.
- 'Read in other languages.' should be translated as a complete sentence, potentially with polite expressions as appropriate in the target language.
- 'Submit form' on a button should be translated using a short, common action word equivalent to "Confirm" or "OK" in the target language.

Pluralization rules:
- Always expand all plural forms into multiple specific numbered forms, regardless of the source format or word type.
- For languages with 3 forms (e.g., Polish, Russian, Czech):
- Always use: {1} singular|[2,4] few|[5,*] many
- Apply this to ALL nouns, regular or irregular
- For languages with 4 forms (e.g., Arabic, Slovenian):
- Always use: {1} singular|{2} dual|[3,10] few|[11,*] many
- Apply this to ALL nouns, regardless of their original plural formation
- For languages with simpler pluralization (e.g., English), still force the 3-form format:
- Use: {1} singular|[2,4] plural|[5,*] plural
- Always apply this expansion, even when it means repeating the same form multiple times.
- Research and apply the correct plural forms for each specific noun in the target language and preserve case of letters for each.
- Consider language-specific features like gender, case, and measure words when applicable.
- If unsure about a specific plural form, use a placeholder and flag it for human review.

Follow these additional rules:
- Keep the meaning the same, but make the wording more modern, user-friendly, and appropriate for digital interfaces.
- Use contemporary IT and web-related terminology that's commonly found in popular apps and websites.
- Maintain the sentence structure of the original text. If the original is a complete sentence, translate it as a complete sentence. If it's a phrase or title, keep it as a phrase or title in the translation.
- Prefer shorter, more intuitive terms for UI elements. For example, use equivalents of "OK" or "Confirm" instead of "Submit" for button labels.
- When translating error messages or system notifications, use a friendly, reassuring tone rather than a technical or severe one.
- Keep the length almost the same.
- Keep the word forms the same. Don't change the tense or form of the words.
- Don't translate code (`code`), variables, commands (/command), placeholders, or HTML tags.
- For time expressions:
- Translate time-related phrases like "Updated at :time" by adjusting the variable position to fit the target language's natural word order while preserving the meaning.
- For count expressions:
- Translate count-related phrases like ":count messages" by adjusting the variable position as needed for natural expression in the target language.
- Preserve the meaning of complex variable combinations (e.g., "Welcome, :name! You have :count new messages."). The semantic roles of variables should remain the same in the translation, even if their positions change.
- For placeholder text that users will replace (often in ALL CAPS or surrounded by brackets), keep these in their original language but adjust the position if necessary for natural expression in the target language.
{additionalRules}
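
The pipe-delimited format that the first rule asks the model to keep (for example `{0} There are none|[1,19] There are some|[20,*] There are many`) can be illustrated with a small selector. This is a minimal sketch, not Laravel's own implementation: `choosePluralForm` is a hypothetical helper, and it only handles segments that carry an explicit `{n}` or `[min,max]` condition.

```php
<?php

/**
 * Hypothetical helper (not part of this commit): return the text of the
 * first segment whose {n} or [min,max] condition matches $count.
 */
function choosePluralForm(string $line, int $count): ?string
{
    foreach (explode('|', $line) as $segment) {
        $segment = trim($segment);

        // Exact match, e.g. "{0} There are none"
        if (preg_match('/^\{(\d+)\}\s*(.*)$/s', $segment, $m)) {
            if ((int) $m[1] === $count) {
                return $m[2];
            }
            continue;
        }

        // Range match, e.g. "[1,19] There are some" or "[20,*] There are many"
        if (preg_match('/^\[(\d+|\*),\s*(\d+|\*)\]\s*(.*)$/s', $segment, $m)) {
            $min = $m[1] === '*' ? PHP_INT_MIN : (int) $m[1];
            $max = $m[2] === '*' ? PHP_INT_MAX : (int) $m[2];
            if ($count >= $min && $count <= $max) {
                return $m[3];
            }
        }
    }

    // No explicit condition matched; segments without a condition are ignored here.
    return null;
}
```

Calling `choosePluralForm('{0} There are none|[1,19] There are some|[20,*] There are many', 7)` walks the segments in order and returns the text of the `[1,19]` segment. A translation that drops or rewrites these conditions would change which text is selected, which is presumably why the prompt asks the model to leave them intact.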
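
The rules about `:time`, `:count`, and `:name` allow variables to move within the sentence but not to disappear or be translated. One way to check a model's output against that rule is sketched below. `placeholdersPreserved` is a hypothetical function, not something this commit adds, and it assumes placeholders are ASCII names prefixed with a colon.

```php
<?php

/**
 * Hypothetical check (not part of this commit): does the translation contain
 * exactly the same :placeholders as the source, in any order?
 */
function placeholdersPreserved(string $source, string $translation): bool
{
    $extract = static function (string $text): array {
        // ASCII-only character classes so that text glued to a placeholder
        // (e.g. ":count개") is not swallowed into the placeholder name.
        preg_match_all('/(?<![A-Za-z0-9_:]):[A-Za-z_][A-Za-z0-9_]*/', $text, $matches);
        $names = $matches[0];
        sort($names);

        return $names;
    };

    return $extract($source) === $extract($translation);
}
```

The comparison sorts both lists, so a translation that reorders `:name` and `:count` still passes, while one that renames or drops a variable fails.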
61 changes: 43 additions & 18 deletions src/Console/TranslateStrings.php
@@ -6,31 +6,16 @@
use Illuminate\Console\Command;
use Kargnas\LaravelAiTranslator\AI\AIProvider;
use Kargnas\LaravelAiTranslator\Transformers\PHPLangTransformer;
use Kargnas\LaravelAiTranslator\Utility;

class TranslateStrings extends Command
{
// Keyed by locale code, e.g. en_us (all lowercase, underscore-separated)
protected static $additionalRules = [
'pl' => [
"- Polish pluralization: Always use 3 forms: {1} singular, [2,4] plural for few, [5,*] plural for many. Example: \"One book|:count books\" becomes \"{1} jedna książka|[2,4] :count książki|[5,*] :count książek\".",
"- Polish pluralization example: For 'apple': {1} jedno jabłko|[2,4] :count jabłka|[5,*] :count jabłek. Consider gender (męski, żeński, nijaki) and case (mianownik, dopełniacz, etc.) when forming plurals.",
],
'zh' => [
"- CRITICAL: For ALL Chinese translations, ALWAYS use exactly THREE parts: {1} 一 + measure word + noun|{2} 两 + measure word + noun|[3,*] :count + measure word + noun. This is MANDATORY, even if the original only has two parts. NO SPACES in Chinese text except right after numbers in curly braces and square brackets.",
"- CRITICAL: For ALL Chinese translations, ALWAYS use exactly THREE parts: 一 + measure word + noun|两 + measure word + noun|:count + measure word + noun. This is MANDATORY, even if the original only has two parts. NO SPACES in Chinese text except right after numbers in curly braces and square brackets.",
"- Example structure (DO NOT COPY WORDS, only structure): {1} 一X词Y|{2} 两X词Y|[3,*] :countX词Y. Replace X with correct measure word, Y with noun. Ensure NO SPACE between :count and the measure word. If any incorrect spaces are found, remove them and flag for review.",
],
'ar' => [
"- CRITICAL: For ALL Arabic translations, ALWAYS use exactly FOUR parts: {1} singular|{2} dual|[3,10] plural for few|[11,*] plural for many. This is MANDATORY, even if the original has fewer forms.",
"- Example structure (DO NOT COPY WORDS, only structure): {1} كتاب واحد|{2} كتابان|[3,10] :count كتب|[11,*] :count كتابًا. Adjust endings based on grammatical case. Consider gender and definiteness. If unsure about a form, use a placeholder and flag for human review.",
],
'ru' => [
"- CRITICAL: For ALL Russian translations, ALWAYS use exactly THREE parts: {1} singular|[2,4] plural for few|[5,*] plural for many. This is MANDATORY, even if the original has fewer forms.",
"- Example structure (DO NOT COPY WORDS, only structure): {1} книга|[2,4] :count книги|[5,*] :count книг. Consider gender (masculine, feminine, neuter) and case (nominative, genitive, etc.) when forming plurals. If unsure about a form, use a placeholder and flag for human review.",
],
'ga' => [
"- CRITICAL: For ALL Irish (Gaeilge) translations, ALWAYS use exactly FOUR parts: {1} singular|{2} dual|[3,6] plural for few|[7,*] plural for many. This is MANDATORY, even if the original has fewer forms.",
"- Example structure (DO NOT COPY WORDS, only structure): {1} leabhar amháin|{2} dhá leabhar|[3,6] :count leabhair|[7,*] :count leabhar. Consider initial mutations (séimhiú, urú) and irregular plurals. For nouns that don't have all forms, repeat the closest appropriate form. If unsure, flag for human review.",
],
'ko' => [
// Prevent counters such as 1개, 2개 from being written with a space ('1 개', '2 개')
"- Don't add a space between the number and the measure word with variables. Example: {1} 한 개|{2} 두 개|[3,*] :count개",
@@ -362,8 +347,48 @@ private static function getAdditionalRulesDefault($locale): array {
}
}

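private static function getAdditionalRulesPlural($locale): array {
// Look up how many plural forms the locale uses.
$plural = Utility::getPluralForms($locale);
// No plural-form count available for this locale: add no plural rules.
if (!$plural) return [];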

return match ($plural) {
1 => [
"- Pluralization Rules",
" - For plurals, always use the format: {1} singular|[2,*] plural. This is MANDATORY, even if the original only has one part.",
" - Example structure (DO NOT COPY WORDS, only structure): {1} singular|[2,*] plural",
" - Consider language-specific features like gender, case, and measure words when applicable.",
],
2 => [
"- Pluralization Rules",
" - Research and apply the correct plural forms for each specific noun in target language and preserve case of letters for each.",
],
3 => [
"- Pluralization Rules",
" - Always expand all plural forms into multiple forms, regardless of the source format or word type. Don't specify a range.",
" - Always use: singular|few|many",
" - Apply this to ALL nouns, regular or irregular",
" - Research and apply the correct plural forms for each specific noun in target language and preserve case of letters for each.",
],
4 => [
"- Pluralization Rules",
" - Always expand all plural forms into multiple forms, regardless of the source format or word type. Don't specify a range.",
" - Always use: singular|dual|few|many",
" - Apply this to ALL nouns, regardless of their original plural formation",
" - Research and apply the correct plural forms for each specific noun in target language and preserve case of letters for each.",
],
6 => [
"- Pluralization Rules",
" - Always expand all plural forms into multiple forms, regardless of the source format or word type. Don't specify a range.",
" - Always use: zero|one|two|few|many|other",
" - Apply this to ALL nouns, regardless of their original plural formation",
" - Research and apply the correct plural forms for each specific noun in target language and preserve case of letters for each.",
],
default => [],
};
}

protected static function getAdditionalRules($locale): array {
// Before this commit, only the config and default rules were merged:
// return array_merge(static::getAdditionalRulesFromConfig($locale), static::getAdditionalRulesDefault($locale));
return array_merge(static::getAdditionalRulesFromConfig($locale), static::getAdditionalRulesDefault($locale), static::getAdditionalRulesPlural($locale));
}

public function translate() {
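
The new `getAdditionalRulesPlural` replaces the hard-coded per-language rules with a lookup keyed on the number of plural forms. The sketch below shows that idea in isolation. `Utility::getPluralForms` is not shown in this diff, so `pluralFormCount` is a hypothetical stand-in, and its small table is an assumption based on the common gettext `nplurals` values. `pluralTemplate` is also hypothetical and only mirrors the templates quoted in the `match` arms above.

```php
<?php

// Hypothetical stand-in for Utility::getPluralForms(); the table is
// illustrative and follows common gettext nplurals values.
function pluralFormCount(string $locale): ?int
{
    $table = ['ko' => 1, 'zh' => 1, 'en' => 2, 'pl' => 3, 'ru' => 3, 'sl' => 4, 'ar' => 6];

    // Reduce "ar_SA" or "ar-SA" to the bare language code "ar".
    $language = strtolower(explode('_', str_replace('-', '_', $locale))[0]);

    return $table[$language] ?? null;
}

// Hypothetical helper: the plural template the prompt rules ask for,
// or null when the diff defines no fixed template for that count.
function pluralTemplate(string $locale): ?string
{
    return match (pluralFormCount($locale)) {
        1 => '{1} singular|[2,*] plural',
        3 => 'singular|few|many',
        4 => 'singular|dual|few|many',
        6 => 'zero|one|two|few|many|other',
        default => null, // 2 forms, 5 forms, or an unknown locale
    };
}
```

Keying on the form count means a language such as Czech or Slovenian gets the appropriate rule without needing its own entry in `$additionalRules`. Note that the `match` in the diff has no arm for 5 forms, so a locale reporting 5 would fall through to `default` and receive no plural rules.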

0 comments on commit e65a717
