Skip to content

Latest commit

 

History

History
15 lines (13 loc) · 2.59 KB

ROADMAP.md

File metadata and controls

15 lines (13 loc) · 2.59 KB

Roadmap 🧭

With winkNLP's production ready release in late 2020, the core is already in place. Apart from sustainment, our goal is to continuously improve it by adding new features and capabilities. We have listed some of the features that should be added to winkNLP:

S. No. Feature Complexity Status
01. Extractive Summarization:
Add its.sentenceWiseImprotance helper to extract sentence wise impotance from a document. This may be used for extractive summarization apart from other usage. While it should be language agnostic, but it should leverage loaded language model's capability to improve summarization.
Simple Completed
02. Text Pre-processor:
Add a text preprocessing utility that provides options to (a) filter specific tokens based on their properties such as pos, isStopWordFlag, and type; (b) map entity type with a definable keyword; (c) add bigrams & trigrams and (d) inject sentiment. The API should follow winkNLP style and standards.
Medium YTS
03. Word Vectors Integration:
Add integration with various word vectors starting with GloVe. This should include compression/decompression for fast loading, helpers for token, sentence and document vector computation.
High Completed
04. Sub-word Tokenizer:
Add sub-word tokenization feature using techniques like Byte Pair Encoding (BPE) and/or WordPiece. The processing pipeline should allow choice of tokenizer.
Very High YTS
05. Compose Corpus:
Add a utility to produce training corpus using patterns and cartesian product.
Simple YTS
06. Keywords Extraction:
Add its.keywords helper to extract keywords/keyphrases from the text via doc.out( its.keywords ). While it should be language agnostic, but it should leverage loaded language model's capability to improve extraction.
Simple YTS
07. BM25 Vectorizer:
Add a utility to train and also vectorize text based on an already trained BM25 model. It will follow wink-nlp styled API.
Medium Completed
08. Constituency/Dependency Parser:
Add a constituency and/or dependency parser — details have to be worked out.
Very High YTS

The above is intended to serve as a guideline for users and contributors for information, feedback and possible participation & discussion.