Missing information for Sardinian (srd_latn)
#6
Comments
When the variety isn't specified, you should use Standardized Sardinian, called LSC ("Limba Sarda Comuna"), which was created exactly for this purpose. It is therefore particularly suitable for FLORES+: cross-translation, standardization of a written form, and preservation of an endangered language. LSC offers many advantages here: it is the official version that the local government, "Regione Autonoma Della Sardegna," uses for laws and official communications. Additionally, many Wikipedia pages and books are already in LSC, and numerous grammar guidelines are available for this variety. Cultural associations that use and teach Sardinian also use written LSC. Lexically, LSC already encompasses all varieties, treating different words from Logudoresu, Campidanesu, Nuoresu, etc. that indicate the same thing as synonyms. We can begin with LSC Standardized Sardinian and simultaneously request that at least the two main varieties (Logudoresu and Campidanesu) be added to the future FLORES+ list, making translation from LSC easier than from Italian. In my opinion, if we start discussing and rejecting Standard Sardinian, we risk ending up like the Duolingo request: there was widespread disagreement, and after a decade, the request to add the language to the course list was declined. The language is already endangered enough; we should not risk it being improperly included in FLORES+ and therefore missing the opportunity to be part of projects like No Language Left Behind (NLLB).
Hi @srfro! Thank you, this is very valuable feedback. We are organising a shared task at WMT24 with the purpose of improving/extending this data. Would you be interested in participating?
Hi @srfro, are you still interested in adding Limba Sarda Comuna, Logudoresu and Campidanesu? As I mentioned in my previous message, we are running a shared task at WMT24 with the specific purpose of extending the datasets and improving existing data. We are asking people to indicate interest by 20th May. If this is still of interest to you, feel free to get in touch at info@oldi.org; we would be happy to assist.
Sardinian consists of different varieties and has multiple orthographies. Both FLORES+ and Seed are missing information on which ones exactly are used in the data.