
DupPredictor is a framework to detect duplicate questions on Stack Overflow using machine learning techniques. The algorithm uses LDA (Latent Dirichlet Allocation) for topic modelling to classify the text into topics.


Abhi7410/StackOverFlow_Duplicate_Question_Detection


StackOverFlow_Duplicate_Question_Detection

DupPredictor is a framework to detect duplicate questions on Stack Overflow using machine learning techniques. The framework consists of two phases: model building and prediction. During the model building phase, a model is trained on a dataset of past duplicate questions. The data is obtained by running a query on StackExchange and consists of 50,000 rows, each pairing an original question with a duplicate question. Preprocessing removes HTML tags, punctuation, and stop words, and stems the text in the titles and bodies. Title similarity scores are computed from the words that the two titles have in common. The model also uses Latent Dirichlet Allocation (LDA) for topic modeling to classify the text into topics. The model is trained using a set of 300 duplicate questions, and its parameters are estimated using either a sample-based greedy method or a gradient-based optimization method.
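
The preprocessing and title-similarity steps described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the function names, the tiny stop-word list, and the crude suffix-stripping stemmer are all assumptions made for the example, and cosine similarity over word counts is only one plausible way to score titles by their common words.

```python
import html
import math
import re
from collections import Counter

# Assumption: a tiny illustrative stop-word list; a real pipeline would use a fuller one.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "in", "of", "on", "for",
             "and", "with", "how", "what", "do", "i"}


def naive_stem(word):
    """Crude suffix stripping, a hypothetical stand-in for a real stemmer such as Porter."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Strip HTML tags and punctuation, drop stop words, and stem the remaining tokens."""
    no_tags = re.sub(r"<[^>]+>", " ", text)       # remove HTML tags
    decoded = html.unescape(no_tags)              # decode entities such as &amp;
    tokens = re.findall(r"[a-z0-9]+", decoded.lower())  # keeps only alphanumeric runs
    return [naive_stem(t) for t in tokens if t not in STOPWORDS]


def title_similarity(title_a, title_b):
    """Cosine similarity between the word-count vectors of two titles."""
    a = Counter(preprocess(title_a))
    b = Counter(preprocess(title_b))
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())  # only shared words contribute
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    print(preprocess("<p>How do I parse JSON strings?</p>"))
    print(title_similarity("Parsing JSON in Python", "How to parse JSON with Python"))
```

Titles with no words in common score 0, and identical titles score 1. The topic-based component would additionally require an LDA implementation from a library and is not shown here.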
