We want to prepare cleanup and normalisation methods for any kind of dirty text, for example text produced by OCR.
Things to clean up include: excessive newlines; sentences split across several lines; stray encoding symbols; words broken into chunks.
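A minimal sketch of how some of these cleanup steps could look in Python, using only the standard library. The function name `clean_ocr_text` is hypothetical, not an existing API in this project, and the rules are assumptions about what "dirty" text looks like. It does not try to repair words split by spaces (e.g. `w o r d`), and the hyphen rule will also merge genuine hyphenated compounds that happen to break at a line end.

```python
import re
import unicodedata


def clean_ocr_text(text: str) -> str:
    """Hypothetical helper: normalise whitespace and encoding debris in OCR output."""
    # NFKC folds compatibility characters, e.g. the ligature "ﬁ" becomes "fi".
    text = unicodedata.normalize("NFKC", text)
    # Drop control/format characters and U+FFFD, but keep newlines and tabs.
    text = "".join(
        ch
        for ch in text
        if ch in "\n\t" or (unicodedata.category(ch)[0] != "C" and ch != "\ufffd")
    )
    # Rejoin words hyphenated across a line break: "zda-\nnie" -> "zdanie".
    text = re.sub(r"(\w)-[ \t]*\n[ \t]*(\w)", r"\1\2", text)
    # Collapse any run of blank lines into a single paragraph break.
    text = re.sub(r"\n[ \t]*\n\s*", "\n\n", text)
    # A single newline inside a paragraph is treated as a soft wrap.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse repeated spaces and tabs.
    text = re.sub(r"[ \t]+", " ", text)
    return text.strip()
```

For example, `clean_ocr_text("To jest zda-\nnie po OCR,\nktóre zostało\n\n\n\nrozbite.")` returns two paragraphs, with the first sentence joined back onto one line.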
Consider checking whether the content is written in one language or several, and optionally remove content that is not in the main language. By default, everything that is not in Polish should be removed.
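One way the default "keep only Polish" behaviour could be sketched is shown below. This is a crude heuristic under stated assumptions, not a real language detector: the names `POLISH_HINTS`, `polish_share`, and `keep_polish`, the word list, and the `0.2` threshold are all hypothetical, and a dedicated language identification library would likely be more reliable on short or mixed paragraphs.

```python
import re

# Assumption: a tiny, hand-picked set of common Polish function words.
POLISH_HINTS = {"w", "z", "na", "nie", "jest", "się", "że", "jak", "po", "dla", "oraz"}
# Letters with Polish diacritics are treated as a strong hint.
POLISH_CHARS = set("ąćęłńóśźż")


def polish_share(paragraph: str) -> float:
    """Fraction of word tokens that look Polish (hint word or Polish diacritic)."""
    tokens = re.findall(r"[^\W\d_]+", paragraph.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in POLISH_HINTS or POLISH_CHARS & set(t))
    return hits / len(tokens)


def keep_polish(text: str, threshold: float = 0.2) -> str:
    """Keep only paragraphs whose Polish-looking share reaches the threshold."""
    paragraphs = re.split(r"\n\s*\n", text)
    kept = [p for p in paragraphs if polish_share(p) >= threshold]
    return "\n\n".join(kept)
```

Filtering per paragraph rather than per document is a design choice made here so that a mostly Polish text keeps its Polish parts even when a foreign-language block is mixed in.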
Please sync with @laugustyniak or @pedrito87 if you are working on cleaning OCR output.