HU at SemEval-2024 Task 8A: Can Contrastive Learning Learn Embeddings to Detect Machine-Generated Text?

This is the official implementation of our final submission to SemEval-2024 Task 8. The paper is available on arXiv.

Run Locally

Clone

  git clone https://github.com/dipta007/SemEval24-task8

Go to the project directory

  cd SemEval24-task8

Install dependencies

  conda env create -f environment.yml 
  conda activate sem24_task8

Download data

  gdown https://drive.google.com/drive/folders/1FrhMQ5QvMgaeSgcBmZbk7l_GbU-ga99P -O ./data --folder

Run the trainer

  python src/train.py --exp_name=EXP_NAME

Final Model Hyperparameters

 'accumulate_grad_batches': 16,
 'batch_size': 2,
 'cls_dropout': 0.6,
 'encoder_type': 'sen',
 'loss_weight_con': 0.7,
 'loss_weight_gen_text': 0.1,
 'loss_weight_text': 0.8,
 'lr': 1e-05,
 'max_doc_len': 64,
 'max_epochs': -1,
 'max_sen_len': 4096,
 'model_name': 'jpwahle/longformer-base-plagiarism-detection',
 'seed': 42,
 'validate_every': 0.04,
 'weight_decay': 0.0
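
To make two of these settings concrete, here is a minimal sketch, not taken from the repository. It shows the effective batch size implied by `batch_size` and `accumulate_grad_batches` on a single device, and one plausible way the three `loss_weight_*` values could be combined. The weighted sum and the argument names `loss_con`, `loss_gen_text`, and `loss_text` are assumptions made for illustration; the actual training code may combine the terms differently.

```python
# Values copied from the hyperparameter list above.
hparams = {
    "accumulate_grad_batches": 16,
    "batch_size": 2,
    "loss_weight_con": 0.7,
    "loss_weight_gen_text": 0.1,
    "loss_weight_text": 0.8,
}


def effective_batch_size(hp):
    """Samples contributing to one optimizer step (single device assumed)."""
    return hp["batch_size"] * hp["accumulate_grad_batches"]


def combine_losses(hp, loss_con, loss_gen_text, loss_text):
    """ASSUMPTION: the total loss is a weighted sum of three terms.

    The argument names are hypothetical placeholders for a contrastive
    loss and two classification losses; they are not names from the repo.
    """
    return (
        hp["loss_weight_con"] * loss_con
        + hp["loss_weight_gen_text"] * loss_gen_text
        + hp["loss_weight_text"] * loss_text
    )


print(effective_batch_size(hparams))  # 2 * 16 = 32
```

With a per-step batch of 2 and 16 accumulation steps, each optimizer update sees 32 samples, which is likely why such a small `batch_size` is workable with a long-context encoder.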