A search engine over a Stack Overflow corpus, plus a web crawler that scans the Stack Overflow website to find the most popular technologies

MiniStack

Task-1

A search engine over a Stack Overflow corpus of more than 160k documents.

The code is split across three notebook files:

  • DataExtraction contains the code for collecting the data
  • DataPreprocessing contains the code for processing the data
  • Retrieval contains the code for retrieving the top 10 most similar documents
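The README does not say which similarity measure the Retrieval notebook uses. As a rough illustration of "top 10 similar docs", the sketch below assumes TF-IDF weighting with cosine similarity and is written in plain Python so it runs without extra packages. The function names (`tokenize`, `build_index`, `to_vector`, `search`) and the sample documents are hypothetical and are not taken from the notebooks.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def to_vector(tokens, idf):
    """Build an L2-normalised TF-IDF vector (as a dict) from a token list."""
    counts = Counter(tokens)
    # Terms that never occurred in the corpus have no IDF weight and are dropped.
    vec = {t: c * idf[t] for t, c in counts.items() if t in idf}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def build_index(docs):
    """Return (idf, vectors) for a list of document strings."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF, so a term that appears in every document keeps a positive weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    vectors = [to_vector(tokens, idf) for tokens in tokenized]
    return idf, vectors


def search(query, idf, vectors, top_k=10):
    """Return up to top_k (doc_id, score) pairs, best match first."""
    q = to_vector(tokenize(query), idf)
    scores = []
    for doc_id, vec in enumerate(vectors):
        # Both vectors are unit length, so the dot product is the cosine similarity.
        score = sum(w * vec.get(t, 0.0) for t, w in q.items())
        if score > 0:
            scores.append((doc_id, score))
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


# Made-up documents, for illustration only.
docs = [
    "How to merge two pandas dataframes",
    "Sort a list in python",
    "Merge sort implementation in java",
]
idf, vectors = build_index(docs)
results = search("pandas merge dataframes", idf, vectors)
```

On a corpus of 160k+ documents, a sparse-matrix implementation such as the one in sklearn (which is in the required libraries) would likely be a better fit than this dictionary-based version.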

Required libraries and packages:

pandas, numpy, sklearn, nltk, re, os, sys, csv, xml

Dataset

For this project I collected the data from the Stack Exchange Data Dump website.
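As a minimal sketch of how questions might be pulled out of a data dump file, the code below streams a `Posts.xml`-style document with the standard `xml` package. It assumes the dump layout of one `<row>` element per post, with `PostTypeId="1"` marking questions and tags stored in a `Tags` attribute. The helper name `iter_questions` and the sample rows are hypothetical; the DataExtraction notebook may work differently.

```python
import io
import re
import xml.etree.ElementTree as ET


def iter_questions(source):
    """Yield (id, title, body, tags) for each question row in a Posts.xml-style stream.

    `source` may be a file path or a binary file-like object.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        # Assumption: PostTypeId "1" marks a question in the dump schema.
        if elem.tag == "row" and elem.get("PostTypeId") == "1":
            # Tags look like "<python><pandas>" or "|python|pandas|",
            # depending on the dump version; this pattern handles both.
            tags = re.findall(r"[^<>|]+", elem.get("Tags", ""))
            yield elem.get("Id"), elem.get("Title", ""), elem.get("Body", ""), tags
        # Drop the element's contents once it has been processed.
        elem.clear()


# Made-up two-row sample: one question and one answer.
sample = io.BytesIO(
    b'<posts>'
    b'<row Id="1" PostTypeId="1" Title="Merge dataframes" '
    b'Body="&lt;p&gt;How?&lt;/p&gt;" Tags="&lt;python&gt;&lt;pandas&gt;" />'
    b'<row Id="2" PostTypeId="2" Body="&lt;p&gt;Use merge.&lt;/p&gt;" />'
    b'</posts>'
)
questions = list(iter_questions(sample))
```

Streaming with `iterparse` avoids building the whole tree in memory before the first row is read, which matters because the real dump files are large.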

Task-2

A web crawler that crawls the Stack Overflow website and finds the technologies that are most popular at the current point in time, based on the tags of the newest questions asked on the website.

webcrawler contains the code for this task.

Required libraries are

urllib3, requests, bs4, zlib, operator, os, sys
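The counting step of the crawler can be sketched as follows. This is an illustration under stated assumptions, not the code in webcrawler: it assumes question tags appear as `<a>` elements whose class list contains `post-tag`, it uses the standard-library `html.parser` instead of bs4 so it runs without extra packages, and it works on a made-up HTML snippet rather than fetching a live page. `TagCollector` and `most_popular_tags` are hypothetical names.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collect the text of every <a> element whose class list contains `post-tag`."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self._in_tag_link = False

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        # Assumption: tag links carry the CSS class `post-tag`.
        if tag == "a" and "post-tag" in classes:
            self._in_tag_link = True

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_tag_link = False

    def handle_data(self, data):
        if self._in_tag_link and data.strip():
            self.tags.append(data.strip())


def most_popular_tags(html, top_n=10):
    """Return the top_n (tag, count) pairs found in an HTML page, most frequent first."""
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return Counter(collector.tags).most_common(top_n)


# Made-up markup standing in for a page of newest questions.
sample_html = """
<div><a class="post-tag" href="/questions/tagged/python">python</a>
<a class="post-tag" href="/questions/tagged/pandas">pandas</a></div>
<div><a class="s-tag post-tag" href="/questions/tagged/python">python</a></div>
<div><a class="other" href="/help">help</a></div>
"""
popular = most_popular_tags(sample_html, top_n=2)
```

In a real run the HTML would come from an HTTP response (for example via requests) for each page of newest questions, and the counts would be accumulated across pages.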

How to run

Download the files and make sure all the files and folders are in the same directory

UI Demo

To run the code on a server:

  • Go to the UI-demo folder
  • Create a virtual environment (command on Windows: virtualenv env)
  • Activate the virtual environment (command: env/Scripts/activate)
  • Install the requirements (command: pip install -r requirements.txt)
  • Run python app.py in the terminal

GitHub repository link

https://github.com/Saideepthi123/MiniStack
