📑 Similar Content Service - Heartathon 2019

This service applies Flair word and document embeddings to a small subset of the given mission literature dataset, then computes the cosine similarity between the embedding vectors. The top 'k' entries of the resulting similarity vector are mapped to their content IDs and returned as 'Similar Content' through a REST API.
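The ranking step described above can be sketched in plain Python. This is a minimal illustration, not the project's actual code: the content IDs (`c1` to `c4`), the three-dimensional toy vectors, and the helper names `cosine_similarity` and `top_k_similar` are all assumptions made for the example. In the real service the vectors would come from Flair document embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k_similar(query_id, embeddings, k=2):
    """Return the k (content_id, score) pairs most similar to query_id."""
    query = embeddings[query_id]
    scores = [
        (content_id, cosine_similarity(query, vector))
        for content_id, vector in embeddings.items()
        if content_id != query_id  # do not recommend the item itself
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Hypothetical toy embeddings standing in for Flair document vectors.
embeddings = {
    "c1": [1.0, 0.0, 0.0],
    "c2": [0.9, 0.1, 0.0],
    "c3": [0.0, 1.0, 0.0],
    "c4": [0.5, 0.5, 0.0],
}

similar = top_k_similar("c1", embeddings, k=2)
print(similar)
```

With these toy vectors, `c2` ranks first and `c4` second for the query `c1`, because `c3` is orthogonal to it.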

The tech stack is Python (PyTorch, Flair, pandas) plus Azure Machine Learning Service, which is used for training in the cloud and for deploying the model as a web service (training on the full dataset is in progress).
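Azure Machine Learning web services are driven by an entry script that defines `init()` (called once when the service starts) and `run()` (called once per request). The sketch below shows that shape only. The request fields `content_id` and `k`, the response layout, and the hard-coded `INDEX` table are hypothetical stand-ins, and the deployed service's real payload may differ.

```python
import json

# Hypothetical precomputed neighbours: content ID -> IDs ordered by similarity.
INDEX = None


def init():
    """Called once at service start; a real script would load the model here."""
    global INDEX
    INDEX = {
        "c1": ["c2", "c4", "c3"],
        "c2": ["c1", "c4", "c3"],
    }


def run(raw_data):
    """Called per request with the raw JSON body; returns a JSON-serialisable dict."""
    try:
        request = json.loads(raw_data)
        content_id = request["content_id"]
        k = int(request.get("k", 2))
        neighbours = INDEX.get(content_id, [])[:k]
        return {"content_id": content_id, "similar_content": neighbours}
    except (KeyError, ValueError, TypeError) as exc:
        # Malformed JSON or a missing field: report it instead of crashing.
        return {"error": str(exc)}
```

Calling `init()` and then `run('{"content_id": "c1", "k": 1}')` locally is a quick way to exercise the script before deploying it.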

Complete API Documentation

https://documenter.getpostman.com/view/5756089/SVfGzCVu?version=latest

References

  1. Flair: State-of-the-Art Natural Language Processing Library (NLP)
  2. Contextual String Embeddings for Sequence Labeling
  3. Text Similarities: Estimate the degree of similarity between two texts
  4. Quick review on Text Clustering and Text Similarity Approaches