Dataset available at https://www.kaggle.com/competitions/linking-writing-processes-to-writing-quality/data
This can be run to split the large csv file into individual files, one per essay; it only needs to be run once. This is useful for:

- Testing out model architectures on a subset of the essays, by using the `cutoff` parameter to preprocess only that subset and use it as the model input.
- Training models without reading the big csv into memory all at once (which helps a lot with speed on my computer).
Preprocessing splits the dataset into training and test sets based on the `train_pct` parameter.
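
A minimal sketch of what this preprocessing might look like. The `train_logs.csv` filename, the `id` column, and the function signature are assumptions based on the Kaggle data layout; only `cutoff` and `train_pct` come from the description above, and the actual script may differ:

```python
import os

import pandas as pd


def preprocess(logs_csv="train_logs.csv", out_dir="data",
               cutoff=None, train_pct=0.8):
    """Split the big log csv into one csv per essay, partitioned train/test."""
    logs = pd.read_csv(logs_csv)
    essay_ids = logs["id"].unique()
    if cutoff is not None:
        essay_ids = essay_ids[:cutoff]  # keep only the first `cutoff` essays
    n_train = int(len(essay_ids) * train_pct)
    splits = {"train": essay_ids[:n_train], "test": essay_ids[n_train:]}
    for split, ids in splits.items():
        split_dir = os.path.join(out_dir, split)
        os.makedirs(split_dir, exist_ok=True)
        # Write each essay's keystroke log to its own file, named by id.
        subset = logs[logs["id"].isin(ids)]
        for essay_id, group in subset.groupby("id"):
            group.to_csv(os.path.join(split_dir, f"{essay_id}.csv"), index=False)
```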
Preprocessing doesn't automatically handle the `train_scores.csv` file, since it isn't that big. Instead, the corresponding `SplitDataset` class reads it in and uses the essay id to match each score to an essay. I recommend copying it (`cp train_scores.csv test_scores.csv`) or just pointing both stages at the same scores file, since extra scores will be ignored by the `SplitDataset` class anyway.
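
For reference, a sketch of how a `SplitDataset`-style class could do that id matching. It assumes a PyTorch `Dataset`, per-essay csv files named `<id>.csv`, and a scores file with `id` and `score` columns; none of those details are confirmed by the repo itself:

```python
import os

import pandas as pd
from torch.utils.data import Dataset


class SplitDataset(Dataset):
    """Pairs each per-essay csv with its score, matched by essay id."""

    def __init__(self, split_dir, scores_csv):
        scores = pd.read_csv(scores_csv)
        score_by_id = dict(zip(scores["id"], scores["score"]))
        # Only essays present in split_dir are kept, so extra rows in the
        # scores file are simply ignored.
        self.items = [
            (os.path.join(split_dir, name),
             score_by_id[os.path.splitext(name)[0]])
            for name in sorted(os.listdir(split_dir))
            if name.endswith(".csv")
        ]

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        path, score = self.items[idx]
        logs = pd.read_csv(path)  # per-essay keystroke log
        return logs, score
```

Because membership is driven by the files in `split_dir`, passing the same scores file to both the train and test datasets is harmless: each split only looks up the ids it actually contains.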