
Similarity through extracted phrases using Self-Cross Attention Bert

Term Project for CSE 472: Machine Learning Sessional, offered by the CSE Department of BUET. This project modifies the existing Bert architecture to introduce self-cross attention and measures its performance on a semantic similarity task.

Technology Stack:

  • Frameworks: PyTorch, scikit-learn, HuggingFace (models repository + modification of the Bert source code)
  • MLOps: Weights & Biases (wandb)

Project Walkthrough

Model Modification

In the vanilla Bert encoder, the output of the previous layer is added to the current layer's output through a residual connection. This directly combines potentially discrepant features. Instead, we can cross-attend the previous layer's output with the current encoder layer's output to obtain more harmonious features. To mitigate potential adverse effects, a trainable weighted summation of the original residual connection and the cross-attended output is used. The model is trained on the U.S. Patent Phrase-to-Phrase Matching dataset to detect the similarity between two phrases.

(Figure: original vs. proposed architecture)
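
A minimal PyTorch sketch of this idea is shown below. The module name, the sigmoid-gated scalar weight, and the default sizes are illustrative assumptions; the repository's actual change patches the HuggingFace Bert source code.

```python
import torch
import torch.nn as nn

class SelfCrossResidual(nn.Module):
    """Illustrative module: replaces Bert's plain residual connection with a
    trainable blend of that residual and a cross-attention between the
    previous layer's output and the current encoder output."""

    def __init__(self, hidden_size: int = 768, num_heads: int = 12):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(hidden_size, num_heads,
                                                batch_first=True)
        # trainable mixing weight; a sigmoid keeps the blend inside (0, 1)
        self.alpha = nn.Parameter(torch.zeros(1))
        self.norm = nn.LayerNorm(hidden_size)

    def forward(self, prev_out, curr_out):
        # queries come from the current layer; keys/values from the previous one
        attended, _ = self.cross_attn(curr_out, prev_out, prev_out)
        w = torch.sigmoid(self.alpha)
        # weighted sum of the vanilla residual and the cross-attended features
        return self.norm(w * (prev_out + curr_out) + (1 - w) * attended)
```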

Pipeline

Given a pair of sentences, KeyBERT is first applied to each sentence to extract its top 5 keyphrases. The similarity of each keyphrase from the first sentence with each keyphrase from the second is then computed by the trained modified Bert model. This yields 25 scores, which are passed through an MLP to produce the final similarity score.
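
A rough sketch of this pipeline follows. KeyBERT's extract_keywords is the library's real API; phrase_similarity is a hypothetical stand-in for the trained modified Bert model from the previous section, and the MLP sizes are assumptions.

```python
from itertools import product
import torch
import torch.nn as nn
from keybert import KeyBERT

kw_model = KeyBERT()

def top_keyphrases(sentence, k=5):
    # KeyBERT returns (phrase, relevance) pairs; keep only the phrases
    return [kw for kw, _ in kw_model.extract_keywords(sentence, top_n=k)]

def pair_scores(sent1, sent2, phrase_similarity):
    # phrase_similarity(a, b) -> float stands in for the trained modified
    # Bert model; it is assumed here, not implemented
    kws1, kws2 = top_keyphrases(sent1), top_keyphrases(sent2)
    # 5 x 5 = 25 scores (fewer if a sentence yields fewer keyphrases)
    return torch.tensor([phrase_similarity(a, b)
                         for a, b in product(kws1, kws2)])

# small MLP mapping the 25 pairwise scores to one final similarity score
mlp = nn.Sequential(nn.Linear(25, 64), nn.ReLU(),
                    nn.Linear(64, 1), nn.Sigmoid())
```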

To evaluate the pipeline, the Quora Question Pairs dataset is used, with a pretrained SentenceTransformer generating the labels. The overall method and the results can be found in the presentation file.
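
Label generation with sentence-transformers might look like the following sketch; the checkpoint name is an assumption, as the README does not specify which model was used.

```python
from sentence_transformers import SentenceTransformer, util

# checkpoint name is an assumption; any SentenceTransformer model works here
labeler = SentenceTransformer("all-MiniLM-L6-v2")

def pseudo_label(q1, q2):
    # cosine similarity of the two sentence embeddings serves as the label
    e1, e2 = labeler.encode(q1), labeler.encode(q2)
    return float(util.cos_sim(e1, e2))
```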

Project Developers:

  • Najibul Haque Sarker (1705044)
  • Tahmeed Tarek (1705039)

Project Supervisor:

  • Dr. Mohammed Eunus Ali
