Kaggle_Quora

Target

Developed deep learning models in Keras to address the Quora Question Pairs problem: deciding whether two questions ask the same thing.
https://data.quora.com/First-Quora-Dataset-Release-Question-Pairs
https://www.kaggle.com/c/quora-question-pairs

Model structure - Siamese Network

The first model is based on the Siamese network structure for classifying text similarity described in the paper "Siamese Recurrent Architectures for Learning Sentence Similarity".

The structure of the model from the paper is shown in siamese_img.jpeg.

The exponential transformation at the top of the original architecture was replaced with a multi-layer neural network. Each question was first embedded with the GloVe pretrained word embeddings. Both embedded questions were then fed into the same (shared) LSTM layer. Next, the two output vectors from the LSTM were concatenated into one vector, combined with a vector of handcrafted features, and fed into fully connected layers to produce the final classification result. The model structure is visualized in model_1_img.png.
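The data flow described above could be sketched in Keras roughly as follows. This is a minimal illustration, not the code from RNN_Keras.py: the function name `build_siamese_model`, the layer sizes, and the toy dimensions are all assumptions, and the embedding layer is left randomly initialized here where the real model would load GloVe weights.

```python
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense, concatenate
from tensorflow.keras.models import Model


def build_siamese_model(vocab_size, embed_dim, max_len, n_features, lstm_units=64):
    """Hypothetical sketch: shared LSTM encoder + handcrafted features + dense head."""
    # Two padded token-id sequences (one per question) and one feature vector.
    q1 = Input(shape=(max_len,), dtype="int32", name="q1")
    q2 = Input(shape=(max_len,), dtype="int32", name="q2")
    feats = Input(shape=(n_features,), name="handcrafted")

    # The same embedding and LSTM objects are applied to both questions,
    # so the two branches share their weights (the "Siamese" part).
    embed = Embedding(vocab_size, embed_dim, trainable=False, name="embedding")
    shared_lstm = LSTM(lstm_units, name="shared_lstm")
    v1 = shared_lstm(embed(q1))
    v2 = shared_lstm(embed(q2))

    # Concatenate both sentence vectors with the handcrafted features,
    # then classify with fully connected layers (sizes are assumptions).
    merged = concatenate([v1, v2, feats])
    hidden = Dense(128, activation="relu")(merged)
    hidden = Dense(64, activation="relu")(hidden)
    out = Dense(1, activation="sigmoid")(hidden)

    model = Model(inputs=[q1, q2, feats], outputs=out)
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model


# Toy dimensions chosen only to keep the example small.
model = build_siamese_model(vocab_size=1000, embed_dim=50, max_len=30, n_features=12)
```

Reusing one `LSTM` instance for both inputs is what ties the weights; creating two separate `LSTM` layers would give two independent encoders instead.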

Model evaluation

Data: The training data is split 90%/10% into a train set and a dev set. The model is trained on the train set and tuned on the dev set.
Framework: Keras (TensorFlow backend), run on paperspace.com
Pretrained word embedding: GloVe Common Crawl (840B tokens, 300 dimensions), https://nlp.stanford.edu/projects/glove/
Best model performance (Kaggle private leaderboard):
Loss: 0.17978
Rank: 660/3307 (top 20%)
Accuracy on dev set: 88%
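To illustrate how the pretrained GloVe vectors listed above are typically turned into an embedding matrix, here is a small standard-library sketch. The helper names `load_glove` and `build_embedding_matrix`, the toy 3-dimensional vectors, and the `word_index` mapping are assumptions made for the example; the repository's own loading code may differ.

```python
import io


def load_glove(lines, dim):
    """Parse GloVe text lines of the form 'token v1 v2 ... v_dim' into a dict."""
    vectors = {}
    for line in lines:
        parts = line.rstrip("\n").split(" ")
        if len(parts) <= dim:
            continue  # skip blank or malformed lines
        # Take the last `dim` fields as the vector, so tokens that
        # themselves contain spaces are still parsed correctly.
        word = " ".join(parts[:-dim])
        vectors[word] = [float(x) for x in parts[-dim:]]
    return vectors


def build_embedding_matrix(word_index, vectors, dim):
    """Row i holds the vector of the word with index i; unknown words stay zero."""
    # Row 0 is reserved for padding, as is conventional with Keras tokenizers.
    matrix = [[0.0] * dim for _ in range(len(word_index) + 1)]
    for word, idx in word_index.items():
        vec = vectors.get(word)
        if vec is not None:
            matrix[idx] = vec
    return matrix


# Toy stand-in for the real 300-dimensional GloVe file.
toy_file = io.StringIO("the 0.1 0.2 0.3\ncat 0.4 0.5 0.6\n")
toy_vectors = load_glove(toy_file, dim=3)
toy_matrix = build_embedding_matrix({"the": 1, "dog": 2}, toy_vectors, dim=3)
```

With the real file, `lines` would be an open file handle and `dim` would be 300; the resulting matrix can then be passed as the initial weights of a frozen embedding layer.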

Model structure - 1D CNN with global average pooling

Reference: http://www.wildml.com/2015/12/implementing-a-cnn-for-text-classification-in-tensorflow/
The model structure is shown in CNN_img.png.
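As a rough illustration of what a 1D CNN sentence encoder with global average pooling looks like in Keras, consider the sketch below. It covers only the encoder for a single question; the function name `build_cnn_encoder`, the filter count, and the kernel size are assumptions and are not taken from CNN_Keras.py.

```python
from tensorflow.keras.layers import Input, Embedding, Conv1D, GlobalAveragePooling1D
from tensorflow.keras.models import Model


def build_cnn_encoder(vocab_size, embed_dim, max_len, filters=128, kernel_size=3):
    """Hypothetical sketch: embed tokens, convolve over time, average over positions."""
    tokens = Input(shape=(max_len,), dtype="int32", name="tokens")
    x = Embedding(vocab_size, embed_dim, name="embedding")(tokens)
    # Each filter slides over windows of `kernel_size` consecutive words.
    x = Conv1D(filters, kernel_size, padding="same", activation="relu")(x)
    # Averaging over the time axis yields one fixed-size vector per sentence.
    x = GlobalAveragePooling1D()(x)
    return Model(tokens, x)


# Toy dimensions chosen only to keep the example small.
encoder = build_cnn_encoder(vocab_size=1000, embed_dim=50, max_len=30)
```

Global average pooling collapses the sequence dimension, so the encoder output has one value per filter regardless of question length.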

Model evaluation

The embedding and data are the same as for the model above.
Best model performance (Kaggle private leaderboard; the model was submitted after the competition ended, so the result is not recorded on the leaderboard):
Loss: 0.17028
Rank: 552/3307 (top 16%)
Accuracy on dev set: 89%
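For reference, the two reported metrics can be computed as in the following sketch, assuming the leaderboard loss is the binary log loss and that dev-set accuracy uses a 0.5 decision threshold. Both assumptions, and the function names, are illustrative rather than taken from the repository.

```python
import math


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        # Clip probabilities so that log(0) never occurs.
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def accuracy(y_true, y_prob, threshold=0.5):
    """Share of pairs whose thresholded prediction matches the label."""
    correct = sum(1 for y, p in zip(y_true, y_prob) if (p >= threshold) == bool(y))
    return correct / len(y_true)


# Toy labels and predictions, not competition data.
toy_labels = [1, 0, 1, 0]
toy_probs = [0.9, 0.2, 0.4, 0.1]
toy_loss = log_loss(toy_labels, toy_probs)
toy_acc = accuracy(toy_labels, toy_probs)
```

Log loss penalizes confident wrong predictions heavily, which is why a model can have good accuracy and still a mediocre loss if its probabilities are poorly calibrated.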

Handcrafted features

Text common ratio (how many words overlap)
Text bi-gram common ratio (how many two-word phrases overlap)
Jaccard distance
Normalized Levenshtein distance (nlevenshtein)
Sorensen distance
Sentence word length
Sentence character length
Sentence word length difference
Sentence character length difference
TF-IDF (sum, mean, cosine distance)
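Most of the features listed above can be sketched with the standard library alone. The definitions below are assumptions chosen for illustration (whitespace tokenization, a match-share formula for the common ratios, set-based Jaccard and Sorensen distances, and edit distance divided by the longer string's length); the notebooks in the repository may define them differently. The TF-IDF features are omitted because they depend on corpus-wide statistics.

```python
def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def bigrams(tokens):
    return list(zip(tokens, tokens[1:]))


def common_ratio(items1, items2):
    """Share of items in either sequence that also occur in the other one."""
    if not items1 or not items2:
        return 0.0
    s1, s2 = set(items1), set(items2)
    shared = sum(1 for x in items1 if x in s2) + sum(1 for x in items2 if x in s1)
    return shared / (len(items1) + len(items2))


def jaccard_distance(items1, items2):
    s1, s2 = set(items1), set(items2)
    union = s1 | s2
    return 1 - len(s1 & s2) / len(union) if union else 0.0


def sorensen_distance(items1, items2):
    s1, s2 = set(items1), set(items2)
    total = len(s1) + len(s2)
    return 1 - 2 * len(s1 & s2) / total if total else 0.0


def levenshtein(a, b):
    """Classic dynamic-programming edit distance, keeping one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def normalized_levenshtein(a, b):
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def pair_features(q1, q2):
    """Build a feature dict for one question pair (hypothetical feature names)."""
    t1, t2 = tokenize(q1), tokenize(q2)
    return {
        "word_common_ratio": common_ratio(t1, t2),
        "bigram_common_ratio": common_ratio(bigrams(t1), bigrams(t2)),
        "jaccard_distance": jaccard_distance(t1, t2),
        "sorensen_distance": sorensen_distance(t1, t2),
        "nlevenshtein_distance": normalized_levenshtein(q1, q2),
        "q1_word_len": len(t1),
        "q2_word_len": len(t2),
        "q1_char_len": len(q1),
        "q2_char_len": len(q2),
        "word_len_diff": abs(len(t1) - len(t2)),
        "char_len_diff": abs(len(q1) - len(q2)),
    }


features = pair_features("how do i learn python", "how do i learn java")
```

Each pair yields one fixed-length feature vector, which is the kind of input that can be concatenated with the sentence encodings before the fully connected layers.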

Reference

Siamese Recurrent Architectures for Learning Sentence Similarity (Mueller and Thyagarajan, AAAI 2016)
https://engineering.quora.com/Semantic-Question-Matching-with-Deep-Learning
http://www.wildml.com/2015/12/implementing-a-cnn-for-text-classification-in-tensorflow/
https://www.kaggle.com/rethfro/1d-cnn-single-model-score-0-14-0-16-or-0-23
