This guide lists the steps required to add support for a new language in SSID. To illustrate the steps, we will assume that the language we want to add support for is called "lang".
- Go to /lib/antlr/grammars/. You will see that this directory is organized into two subdirectories:
  - bin - contains a separate directory for each grammar recognized by PlagiarismDetection.jar. Each grammar directory contains the _tokens file, which lists the keywords in the language, and the _token_mappings file, which maps the entries in the _tokens file to the tokens understood by PlagiarismDetection.jar.
  - src - contains the ANTLR4 grammar files (file extension .g4). Follow the README file to generate the tokens file in the bin directory.
- Now, go to the src folder and create a folder with the language name, which in our case will be "lang".
- For the language that you intend to support, look for the grammar file (.g4) understood by ANTLR4, publicly available at https://github.com/antlr/grammars-v4
- If the grammar file is available, retrieve the file into the created folder by using the wget command. For instance, for the lang language, the command will look something like this:
wget https://github.com/raw/antlr/grammars-v4/master/lang/lang.g4
- If the grammar file is unavailable, then you will need to write your own grammar file based on the parsers for the respective language.
- Extract the lexer and parser from the respective .g4 file using the ANTLR tool in antlr-4.8-complete.jar. Run the command from the folder that contains lang.g4. For the lang language, the command will look like this:
java -jar ../../../../java/jar/antlr-4.8-complete.jar -no-listener -no-visitor -package Antlr4Grammars.lang lang.g4
- Once the extraction is done, there will be 6 files. They are:
  - (i) lang.tokens
  - (ii) langLexer.java
  - (iii) langParser.java
  - (iv) langLexer.tokens
  - (v) langLexer.interp
  - (vi) lang.interp

  Files (iv) - (vi) are not needed and can be deleted.
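Before deleting or moving anything, it can help to confirm that the ANTLR tool produced all six files listed above. The following is a minimal sketch; `check_generated` is a hypothetical helper name and is not part of SSID.

```shell
#!/bin/sh
# Hypothetical helper (not part of SSID): report which of the six files
# listed above are missing from a directory after running the ANTLR tool.
# Usage: check_generated <directory> <language-name>
check_generated() {
    dir="$1"
    lang="$2"
    missing=0
    for f in "$lang.tokens" "${lang}Lexer.java" "${lang}Parser.java" \
             "${lang}Lexer.tokens" "${lang}Lexer.interp" "$lang.interp"; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $f"
            missing=1
        fi
    done
    # Exit status 0 means all six files are present.
    return "$missing"
}
```

For example, `check_generated . lang` run inside the folder that holds lang.g4 prints nothing when every file is present.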
- Rename the lang.tokens file to lang_tokens.
- Now go to the /lib/antlr/grammars/bin folder and create a folder with the language name, which in our case will be "lang". This folder will be used to store the lang_tokens file and the lang_token_mappings file (which we will create later). As such, move the lang_tokens file from the /src/ folder to this newly created folder.
- Now go to /lib/java/PlagiarismDetection/src/Antlr4Grammars/ and create a folder with the language name, which in our case will be "lang". This folder will be used to store the langLexer.java and langParser.java files. As such, move these two files from the /lib/antlr/grammars/src folder to this newly created folder.
Each grammar has its associated LangLexer.java and LangParser.java in the source code of PlagiarismDetection (SSID/lib/java/PlagiarismDetection/src/Antlr4Grammars), which are required for PlagiarismDetection.jar to understand the grammar.
In theory, only LangLexer.java is needed by PlagiarismDetection.jar, but some lexers make use of methods from the corresponding LangParser.java. LangParser.java is therefore included to prevent compilation issues when building PlagiarismDetection.jar.
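The delete, rename, and move steps above can be sketched as one shell function. This is only an illustration: `install_generated` is a hypothetical name, and it assumes the ANTLR output sits in lib/antlr/grammars/src/<lang> under the SSID root, with file names as listed earlier.

```shell
#!/bin/sh
# Hypothetical helper (not part of SSID). Assumes the ANTLR output is in
# <root>/lib/antlr/grammars/src/<lang>.
# Usage: install_generated <ssid-root> <language-name>
install_generated() {
    root="$1"
    lang="$2"
    src="$root/lib/antlr/grammars/src/$lang"
    bin="$root/lib/antlr/grammars/bin/$lang"
    pkg="$root/lib/java/PlagiarismDetection/src/Antlr4Grammars/$lang"

    # Delete the three generated files that are not needed.
    rm -f "$src/${lang}Lexer.tokens" "$src/${lang}Lexer.interp" "$src/$lang.interp"

    # Create the two destination folders.
    mkdir -p "$bin" "$pkg" || return 1

    # Rename lang.tokens to lang_tokens while moving it to bin/<lang>.
    mv "$src/$lang.tokens" "$bin/${lang}_tokens" || return 1

    # Move the lexer and parser sources into the Antlr4Grammars package folder.
    mv "$src/${lang}Lexer.java" "$src/${lang}Parser.java" "$pkg/" || return 1
}
```

Calling `install_generated /path/to/SSID lang` leaves lang.g4 in place in the src folder and touches nothing else.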
- Navigate back to /lib/antlr/grammars/bin/lang and create a file named lang_token_mappings. Manually assign the tokens defined in lang_tokens to tokens understood by PlagiarismDetection.jar. Read below to find out more.
lang_tokens is the tokens file produced by ANTLR4. It provides a mapping from the token type number provided by the ANTLR4 runtime library to the token names defined in the .g4 ANTLR grammar file.
lang_token_mappings is a CSV file that needs to be written manually to map the tokens defined in lang_tokens to the tokens understood by PlagiarismDetection.jar. Each token in lang_tokens must be mapped to either Symbol, Constant, Variable, or Keyword. To ignore a particular token in lang_tokens, assign it the label "Ignore" in the lang_token_mappings file.
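Writing the mappings file by hand is easier from a skeleton. The sketch below rests on two assumptions that should be checked against an existing grammar directory under bin: that lang_tokens keeps ANTLR's usual `NAME=number` lines, and that lang_token_mappings uses a two-column `NAME,Category` layout. Both helper names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers (not part of SSID). The two-column "NAME,Category"
# layout is an ASSUMPTION; copy the layout of an existing *_token_mappings
# file if it differs.

# Print one "NAME,TODO" row per named token in an ANTLR tokens file.
# Lines for quoted literals (for example '+'=12) are skipped, since the
# same token type normally also appears under a name.
mapping_skeleton() {
    grep -v "^'" "$1" | sed -E 's/^([A-Za-z_][A-Za-z0-9_]*)=[0-9]+$/\1,TODO/'
}

# Print every row whose second column is not one of the five allowed labels.
# Exit status 0 means all rows are valid.
check_mappings() {
    bad=$(awk -F, '$2 !~ /^(Symbol|Constant|Variable|Keyword|Ignore)$/' "$1")
    [ -z "$bad" ] && return 0
    echo "$bad"
    return 1
}
```

A possible workflow is `mapping_skeleton lang_tokens > lang_token_mappings`, then replacing each TODO by hand, then running `check_mappings lang_token_mappings` to catch rows that still carry TODO or a misspelled label.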
- Now, navigate to /lib/java/PlagiarismDetection/src/Tokenizer/ANTLRDynamicTokenizer.java. Allow ANTLRDynamicTokenizer to access LangLexer.java inside the package Antlr4Grammars.lang by adding the following import:
import Antlr4Grammars.lang.*;
- In the same file, extend the function getLexer(String[] args) at line 196 so that it returns the correct LangLexer:
else if (language.equals("lang")) {
    LangLexer lexer = new LangLexer(CharStreams.fromFileName(fileName));
    return lexer;
}
- Run "ant" to rebuild.
- Now navigate to /app/models/assignment.rb and add the respective language name under LANGUAGES and PRETTIFY_LANGUAGES.
- Next, navigate to /app/assets/javascripts/assignment_file_upload.js and add the respective language extension and language name under languageHashMap.
- Finally, navigate to lib/submissions_handler.rb and modify accepted_formats at line 78 to include the file extension for the language.
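As a final sanity check, the three application files edited in the last steps can be searched for the new language name and extension. This is a rough sketch with a hypothetical helper name. It is a plain-text search, so a match only shows that the string occurs somewhere in the file, not that it was added in the right place.

```shell
#!/bin/sh
# Hypothetical helper (not part of SSID): plain-text search only.
# Usage: check_registered <ssid-root> <language-name> <file-extension>
check_registered() {
    root="$1"
    lang="$2"
    ext="$3"
    status=0
    grep -qF -- "$lang" "$root/app/models/assignment.rb" \
        || { echo "not found in assignment.rb: $lang"; status=1; }
    grep -qF -- "$lang" "$root/app/assets/javascripts/assignment_file_upload.js" \
        || { echo "not found in assignment_file_upload.js: $lang"; status=1; }
    grep -qF -- "$ext" "$root/lib/submissions_handler.rb" \
        || { echo "not found in submissions_handler.rb: $ext"; status=1; }
    # Exit status 0 means all three searches matched.
    return "$status"
}
```

For example, `check_registered /path/to/SSID lang .lang` prints one line per file in which the string was not found.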