Khodnevis-Research-Lab/khoshnevis

Khoshnevis (خوشنويس)

Python package for normalizing Persian text.

  • Text Cleaning
  • URL Remover
  • Emoji Remover
  • Text Tokenization
  • Punctuation Space Correction
  • Half Space Correction (using Parsivar)
  • Standardize Alphabet
  • NLTK compatible
  • Python 3 support
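
To make two of the features above concrete, here is a minimal standard-library sketch of alphabet standardization and punctuation space correction. It is an illustration only, not Khoshnevis's actual implementation: the names `standardize_alphabet` and `fix_punctuation_spacing` are hypothetical, and the character mapping and punctuation set are deliberately small assumptions.

```python
import re

# Arabic code points that often appear in Persian text, mapped to Persian forms.
ARABIC_TO_PERSIAN = str.maketrans({
    "\u064a": "\u06cc",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0643": "\u06a9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
})

# Persian comma, semicolon, and question mark (assumed set, not exhaustive).
PUNCT = "\u060c\u061b\u061f"


def standardize_alphabet(text: str) -> str:
    """Replace Arabic yeh and kaf with their Persian counterparts."""
    return text.translate(ARABIC_TO_PERSIAN)


def fix_punctuation_spacing(text: str) -> str:
    """Remove whitespace before punctuation and add one space after it."""
    # Drop any whitespace that precedes a punctuation mark.
    text = re.sub(rf"\s+([{PUNCT}])", r"\1", text)
    # Insert a space when punctuation is directly followed by a
    # character that is neither whitespace nor more punctuation.
    text = re.sub(rf"([{PUNCT}])(?=[^\s{PUNCT}])", r"\1 ", text)
    return text
```

Both helpers leave the zero-width non-joiner untouched, so half-spaces already present in the input survive these two steps.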

Usage

>>> from khoshnevis import Normalizer

>>> normalizer = Normalizer()

>>> normalizer.normalize(text="استفاده از نیم‌فاصله متن را زیبا مي كند", zwnj="\u200c",
...                      clean_url=False, remove_emoji=False)

Parameters of normalize:

  • text (str): the input text.
  • zwnj (str, optional): the zero-width non-joiner character. Defaults to "\u200c".
  • clean_url (bool, optional): remove all URLs from the text. Defaults to True.
  • remove_emoji (bool, optional): remove all emojis from the text. Defaults to True.
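
The effect of the `clean_url` and `remove_emoji` flags can be sketched with the standard library alone. The function `normalize_sketch` below is hypothetical and is not the library's code: the URL pattern and the emoji ranges are rough assumptions that will miss some cases (for example, flag emojis), and the `zwnj` parameter is left out.

```python
import re

# Rough URL pattern: anything starting with http://, https://, or www.
URL_RE = re.compile(r"(?:https?://|www\.)\S+")

# Approximate emoji ranges (assumed, not exhaustive).
EMOJI_RE = re.compile(
    "["
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, transport, extended symbols
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "\uFE0F"                 # variation selector-16
    "]+"
)


def normalize_sketch(text: str, clean_url: bool = True,
                     remove_emoji: bool = True) -> str:
    """Optionally strip URLs and emojis, then collapse whitespace."""
    if clean_url:
        text = URL_RE.sub(" ", text)
    if remove_emoji:
        text = EMOJI_RE.sub(" ", text)
    # str.split() does not treat U+200C as whitespace, so half-spaces are kept.
    return " ".join(text.split())
```

With both flags set to False, as in the call above, the sketch only collapses repeated whitespace and otherwise returns the text unchanged.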

Installation

The latest stable version of Khoshnevis can be installed with pip:

pip install khoshnevis

Citation info

@misc{khoshnevis,
  author = {HamidReza Attar and Milad Lotfi and Saied Alimoradi},
  title = {Khoshnevis, a Python library for Persian text preprocessing},
  year = {2022},
  url = {https://www.khodnevisai.com/},
}
