Logistic regression with TF-IDF features

This is an open-source implementation of our solution to the PAN@CLEF 2020 Author Profiling competition.

Our approach uses a logistic regression model with TF-IDF features computed over character and word n-grams.
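
As a rough illustration of this kind of model, the sketch below combines character and word n-gram TF-IDF features and feeds them to a logistic regression classifier using scikit-learn. It is not the code in train.py: the n-gram ranges, the max_iter value, the toy texts, and the binary labels are all assumptions chosen for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, Pipeline


def build_model():
    """Build a TF-IDF + logistic regression pipeline (illustrative settings)."""
    features = FeatureUnion([
        # Character n-grams; the (2, 4) range is an assumed value.
        ("char", TfidfVectorizer(analyzer="char", ngram_range=(2, 4))),
        # Word n-grams; the (1, 2) range is an assumed value.
        ("word", TfidfVectorizer(analyzer="word", ngram_range=(1, 2))),
    ])
    return Pipeline([
        ("tfidf", features),
        ("clf", LogisticRegression(max_iter=1000)),
    ])


# Toy documents and hypothetical binary labels, used only to exercise the pipeline.
texts = [
    "you will not believe this shocking story",
    "shocking story shared again and again",
    "had a quiet coffee with friends this morning",
    "morning walk and coffee with old friends",
]
labels = [1, 1, 0, 0]

model = build_model()
model.fit(texts, labels)
predictions = model.predict(["another shocking story"])
print(predictions)
```

FeatureUnion concatenates the two sparse TF-IDF matrices column-wise, so the classifier sees character-level and word-level evidence in a single feature vector.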

Dependencies

Python 3.7
The following packages are required and can be installed with pip:

pip install hyperopt
pip install joblib
pip install scikit-learn
pip install nltk
pip install tweet-preprocessor

Results

Our approach achieved the third-best result on the private test set. The results are shown in the table below:

LANG	ACC
ES	0.78
EN	0.73

Our team name is deborjavalero20. You can check the full ranking at this link.

Usage

The commands below show how to replicate the experiments.

The train.py script trains the Spanish and English models using the corpus located at DATA_DIR and stores the trained models in RESOURCES_DIR.

python3 train.py DATA_DIR RESOURCES_DIR
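
For readers curious how trained models might be persisted in a directory such as RESOURCES_DIR, here is a minimal sketch using joblib, which is listed among the dependencies. The function names, the `<lang>.joblib` file naming, and the placeholder model objects are assumptions made for illustration; train.py may organise its output differently.

```python
import os
import tempfile

import joblib


def save_models(models, resources_dir):
    """Persist one object per language code under resources_dir."""
    os.makedirs(resources_dir, exist_ok=True)
    paths = {}
    for lang, model in models.items():
        # Hypothetical naming scheme: <lang>.joblib
        path = os.path.join(resources_dir, f"{lang}.joblib")
        joblib.dump(model, path)
        paths[lang] = path
    return paths


def load_models(resources_dir, langs=("es", "en")):
    """Load the objects previously written by save_models."""
    return {
        lang: joblib.load(os.path.join(resources_dir, f"{lang}.joblib"))
        for lang in langs
    }


# Placeholder dictionaries stand in for trained estimators.
with tempfile.TemporaryDirectory() as resources_dir:
    save_models({"es": {"name": "spanish"}, "en": {"name": "english"}}, resources_dir)
    restored = load_models(resources_dir)

print(sorted(restored))
```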

The test.py script generates the Spanish and English hypotheses. The argument DATA_DIR is the folder containing the input data, and the argument HYPOTHESIS_DIR is the directory where the generated hypotheses are stored.

python3 test.py -c DATA_DIR -o HYPOTHESIS_DIR
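
The -c and -o flags shown above could be parsed along the following lines. This is only a sketch built on the standard argparse module; the destination names data_dir and hypothesis_dir and the help strings are assumptions, and the actual test.py may handle its arguments differently.

```python
import argparse


def parse_args(argv=None):
    """Parse the input and output directories from the command line."""
    parser = argparse.ArgumentParser(
        description="Generate hypotheses (illustrative sketch)."
    )
    # -c: folder with the input data (DATA_DIR in the command above).
    parser.add_argument("-c", dest="data_dir", required=True,
                        help="folder containing the input data")
    # -o: folder for the generated hypotheses (HYPOTHESIS_DIR in the command above).
    parser.add_argument("-o", dest="hypothesis_dir", required=True,
                        help="folder where the hypotheses are written")
    return parser.parse_args(argv)


# Passing an explicit list mirrors: python3 test.py -c DATA_DIR -o HYPOTHESIS_DIR
args = parse_args(["-c", "DATA_DIR", "-o", "HYPOTHESIS_DIR"])
print(args.data_dir, args.hypothesis_dir)
```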

License

The MIT License (MIT)
