dipta007/SemEval24-Task8

HU at SemEval-2024 Task 8A: Can Contrastive Learning Learn Embeddings to Detect Machine-Generated Text?

This is the official implementation of our final submission to SemEval-2024 Task 8. The paper is available on arXiv.

Run Locally

Clone

  git clone https://github.com/dipta007/SemEval24-task8

Go to the project directory

  cd SemEval24-task8

Install dependencies

  conda env create -f environment.yml 
  conda activate sem24_task8

Download Data

  gdown https://drive.google.com/drive/folders/1FrhMQ5QvMgaeSgcBmZbk7l_GbU-ga99P -O ./data --folder

Run trainer

  python src/train.py --exp_name=EXP_NAME

Final Model Hyperparameters

 'accumulate_grad_batches': 16,
 'batch_size': 2,
 'cls_dropout': 0.6,
 'encoder_type': 'sen',
 'loss_weight_con': 0.7,
 'loss_weight_gen_text': 0.1,
 'loss_weight_text': 0.8,
 'lr': 1e-05,
 'max_doc_len': 64,
 'max_epochs': -1,
 'max_sen_len': 4096,
 'model_name': 'jpwahle/longformer-base-plagiarism-detection',
 'seed': 42,
 'validate_every': 0.04,
 'weight_decay': 0.0
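
As a reading aid, the sketch below copies a few of these values into a Python dict and derives two quantities they suggest: the per-device effective batch size under gradient accumulation, and a weighted combination of three loss terms. The weighted-sum form and the helper names `effective_batch_size` and `combine_losses` are assumptions made for illustration. The repository's training code defines the actual behaviour and may differ.

```python
# A subset of the hyperparameters listed above, copied verbatim.
HPARAMS = {
    "accumulate_grad_batches": 16,
    "batch_size": 2,
    "loss_weight_con": 0.7,
    "loss_weight_gen_text": 0.1,
    "loss_weight_text": 0.8,
}


def effective_batch_size(hparams: dict) -> int:
    """Samples contributing to one optimizer step on a single device."""
    return hparams["batch_size"] * hparams["accumulate_grad_batches"]


def combine_losses(loss_con: float, loss_gen_text: float, loss_text: float,
                   hparams: dict) -> float:
    """Hypothetical weighted sum of the three loss terms (an assumption,
    not confirmed by the README)."""
    return (
        hparams["loss_weight_con"] * loss_con
        + hparams["loss_weight_gen_text"] * loss_gen_text
        + hparams["loss_weight_text"] * loss_text
    )


if __name__ == "__main__":
    # 2 samples per step, accumulated over 16 steps.
    print("effective batch size:", effective_batch_size(HPARAMS))
    # Placeholder loss values, chosen only to exercise the function.
    print("combined loss:", combine_losses(1.0, 1.0, 1.0, HPARAMS))
```

With `batch_size` 2 and `accumulate_grad_batches` 16, each optimizer step sees 32 samples per device, which is a common way to fit long 4096-token inputs into limited GPU memory.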