Code for the Yandex ML Track 2018 "General Conversation Challenge" contest (29th place)

gasparian/Yandex_ML_track_2018

Result: 29th place, mean NDCG x 100000 = 85352

Problem:

Rank candidate replies by their relevance to the dialogue context. Each context usually consists of 3 utterances (replicas) and comes with several candidate replies. Every reply is labeled with a relevance and a confidence score; their product is used as the target variable.
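As a minimal illustration of the target construction (the column names and file path below are hypothetical; the actual preprocessing is done in prep.py):

```python
import pandas as pd

# Hypothetical column names and path; the real dataset layout is handled in prep.py.
train = pd.read_csv("train.tsv", sep="\t")
# Regression target for each candidate reply: relevance weighted by label confidence.
train["target"] = train["relevance"] * train["confidence"]
```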

Solution:

Use a pretrained fastText model to represent every example as the concatenation of its utterance embeddings and pair it with the target. Then run 10-fold cross-validation: train a separate, simple LightGBM regressor on each fold and average the predictions of all models to make the result more stable (see the sketch below).
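A minimal sketch of the fold-averaged regressor, assuming the feature matrix X, targets y, and test features X_test are already built as numpy arrays; the actual training code lives in fasttext_lgbm.py and may use different hyperparameters:

```python
import numpy as np
import lightgbm as lgb
from sklearn.model_selection import KFold

def train_fold_ensemble(X, y, X_test, n_folds=10, seed=42):
    """Train one LGBMRegressor per fold and average their test predictions."""
    kf = KFold(n_splits=n_folds, shuffle=True, random_state=seed)
    test_preds = np.zeros(len(X_test))
    for train_idx, _ in kf.split(X):
        model = lgb.LGBMRegressor()          # default params here; the real script may tune these
        model.fit(X[train_idx], y[train_idx])
        test_preds += model.predict(X_test) / n_folds   # running mean over folds
    return test_preds
```

Averaging the per-fold models is what the README calls making the result more "stable": each regressor sees a slightly different training split, and the mean prediction smooths out fold-specific noise.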

Pipeline:

  • Install the requirements: pip3 install -r requirements.txt
  • Set the paths in config.py, then run python3 prep.py
  • Download fastText and build it using make.
  • Download the fastText model trained on Wikipedia and Common Crawl.
  • Fill get_vectors.txt with the needed paths and copy it into the fastText folder.
  • Run bash get_vectors.txt
  • Build a numpy feature array from the processed dataset: python3 prep_fasttext_data.py (a sketch of the feature layout follows this list)
  • Train the model: python3 fasttext_lgbm.py
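For reference, a hedged sketch of what the concatenated-embedding feature matrix could look like; prep_fasttext_data.py may organize the arrays differently, and the function name here is illustrative:

```python
import numpy as np

def build_features(context_vecs, reply_vecs):
    """context_vecs: (n, 3, dim) fastText embeddings of the 3 context replicas,
    reply_vecs: (n, dim) embeddings of the candidate replies."""
    n, k, dim = context_vecs.shape
    # Each row = flattened context embeddings followed by the reply embedding.
    return np.hstack([context_vecs.reshape(n, k * dim), reply_vecs])

# X = build_features(context_vecs, reply_vecs)
# np.save("features.npy", X)   # consumed later by fasttext_lgbm.py
```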
