Create large, high-quality custom parallel datasets for low-resource language translation tasks using this UiPath automated solution.

PrudhvirajuChekuri/LangSync

LangSync (a solution for generating parallel datasets for low-resource languages)

Instructions:

  1. Download UiPath Studio and its browser extension, then enable the extension in your browser.

  2. Download the LangSync.zip file.

  3. Unzip it and move the extracted folder to the UiPath directory (the path chosen during installation).

  4. Collect the English sentences you want to use for training and add them to a text file. Separate the sentences with '\n' (one sentence per line) and insert a '$' character after every 5000 characters to stay within the input length limit.

  5. After the preprocessing in step 4, name the text file input.txt and use it to replace the existing input.txt file in the LangSync directory.

  6. Open UiPath Studio and find the process named ParallelCorpus. Click on it, then click "Open Main Workflow".

  7. Run the automation by clicking the "Debug File" button at the top left and then selecting "Run File".

  8. The default source and target languages are English and Hindi. To change them, follow these steps:

    -> Scroll down until you see the "Browser URL" element.

    -> Then double-click on the URL and change the 'sl' and 'tl' values to your desired language codes.

    -> Find your language codes here.

  9. The output parallel dataset for your language pair is written to the output.txt file in the LangSync directory. Execution time depends on the size of the input file. You can track progress through the count.txt file.

  10. Below are sample input and output files for your reference.

Sample Input

image
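
An input file of the kind described in step 4 could be generated with a short script. The following is a minimal Python sketch, not part of LangSync itself. It assumes that the '$' marker goes on its own line, that a sentence should never be split across two chunks, and that each '$'-delimited chunk (newlines included) should stay within 5000 characters. The function names `build_input_text` and `write_input_file` are hypothetical; compare the result with the sample input before relying on it.

```python
from pathlib import Path

CHUNK_LIMIT = 5000  # character budget per chunk, from step 4
MARKER = "$"        # chunk separator, from step 4


def build_input_text(sentences, limit=CHUNK_LIMIT, marker=MARKER):
    """Join sentences with '\n' and emit `marker` on its own line
    before a chunk would grow past `limit` characters.

    Assumption: the marker sits on its own line and sentences are
    never split across chunks.
    """
    parts = []
    used = 0  # characters in the current chunk, newlines included
    for raw in sentences:
        sentence = " ".join(raw.split())  # collapse internal whitespace
        if not sentence:
            continue  # skip empty lines
        if marker in sentence:
            raise ValueError(f"sentence contains the marker {marker!r}")
        line = sentence + "\n"
        if len(line) + 1 > limit:
            raise ValueError("a single sentence exceeds the chunk limit")
        if used + len(line) > limit:
            parts.append(marker + "\n")
            used = 1  # the newline after the marker belongs to the new chunk
        parts.append(line)
        used += len(line)
    return "".join(parts)


def write_input_file(sentences, path="input.txt"):
    """Write the prepared text to `path` as UTF-8."""
    Path(path).write_text(build_input_text(sentences), encoding="utf-8")
```

Calling `write_input_file(my_sentences, "input.txt")` and copying the result into the LangSync directory would then cover step 5.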

Sample Output

image
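
Step 8 changes the 'sl' (source language) and 'tl' (target language) values in the workflow's browser URL. As a rough illustration of that edit, the Python sketch below rewrites the two query parameters of a URL. The URL shown is a placeholder, not the one used by the workflow, and `set_languages` is a hypothetical helper; the actual change is still made by hand in the "Browser URL" element.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def set_languages(url, source, target):
    """Return `url` with its 'sl' and 'tl' query parameters replaced."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query["sl"] = source  # source language code
    query["tl"] = target  # target language code
    return urlunsplit(parts._replace(query=urlencode(query)))


# Placeholder URL for illustration only; copy the real one from the
# "Browser URL" element in the workflow.
example = "https://translator.example/?sl=en&tl=hi&op=translate"
print(set_languages(example, "en", "te"))
# prints: https://translator.example/?sl=en&tl=te&op=translate
```

Pasting the resulting URL back into the "Browser URL" element has the same effect as editing the two values manually.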
