Kirill-Kravtsov/kaggle-tweet-sentiment-extraction

Introduction

This is a PyTorch training pipeline for a text span selection task, built on the Catalyst deep learning framework.
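In the usual formulation of span selection, a transformer produces per-token start and end logits, and the predicted span is the highest-scoring valid (start ≤ end) pair. A minimal decoding sketch (the function and variable names here are illustrative, not this repo's actual code):

```python
def decode_span(start_logits, end_logits):
    """Return (start, end) maximizing start_logits[s] + end_logits[e] with s <= e."""
    best_score, best_pair = float("-inf"), (0, 0)
    for s, s_logit in enumerate(start_logits):
        for e in range(s, len(end_logits)):  # only consider valid spans
            score = s_logit + end_logits[e]
            if score > best_score:
                best_score, best_pair = score, (s, e)
    return best_pair

tokens = ["this", "movie", "is", "really", "great", "!"]
start, end = decode_span([0.1, 0.2, 0.1, 2.0, 0.3, 0.0],
                         [0.0, 0.1, 0.2, 0.1, 2.5, 0.4])
print(" ".join(tokens[start:end + 1]))  # prints "really great"
```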

Installation

  1. Make sure Anaconda is installed
  2. Clone the repo:
git clone https://github.com/Kirill-Kravtsov/kaggle-tweet-sentiment-extraction
  3. Create and activate the provided Anaconda environment:
conda env create -f tweet_env.yml
conda activate tweet_env
  4. Download the competition data and put it in the data directory in the project root
  5. Create cross-validation folds by running:
python create_folds.py
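The fold-creation step typically assigns each row a fold index while keeping label proportions roughly equal across folds. A simple stratified sketch of the idea (this is an assumption about what create_folds.py does, not its actual implementation; the sentiment labels are illustrative):

```python
from collections import defaultdict

def assign_folds(labels, n_folds=5):
    """Assign a fold index per row, distributing each label value
    round-robin so every fold gets a similar class balance."""
    counters = defaultdict(int)  # per-label running count
    folds = []
    for label in labels:
        folds.append(counters[label] % n_folds)
        counters[label] += 1
    return folds

sentiments = ["positive", "negative", "neutral"] * 10
folds = assign_folds(sentiments, n_folds=5)
```

Each of the 5 folds ends up with 6 rows, 2 per sentiment class.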

Project structure:

├── configs
│   ├── best_bertweet.yml
│   ├── best_roberta.yml
│   ├── experiments
│   └── optimization
├── create_folds.py
├── data
├── logs
├── scripts
├── src
│   ├── callbacks.py
│   ├── collators.py
│   ├── datasets.py
│   ├── data_utils.py
│   ├── hooks.py
│   ├── losses.py
│   ├── optimize_experiment.py
│   ├── tokenization.py
│   ├── train.py
│   ├── transformer_models.py
│   └── utils.py
└── tweet_env.yml

Running pipeline

To train the basic RoBERTa and BERTweet models, run:

python train.py --cv --config ../configs/best_roberta.yml
python train.py --cv --config ../configs/best_bertweet.yml

Note: the code is designed to run on a single GPU, so on a multi-GPU system remember to set the CUDA_VISIBLE_DEVICES variable, e.g.:

CUDA_VISIBLE_DEVICES=0 python train.py --cv --config ../configs/best_roberta.yml
