This project is a collection of standalone scripts and patches for converting various pieces of data into SQLite database format. Right now it concentrates on dictionaries that exist as ad hoc text files or are purely web-based (which severely limits the ability to query them).
When run from the command line, the script creates the file urban-dict.db in the current directory. The process is safe to interrupt, either by pressing Ctrl-C or programmatically (this is necessary because it takes a very long time to complete), and on the next run it continues from the point where it was stopped.
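
The interrupt-and-resume behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the actual script: the function `process_items`, the `fetch` callback, and the `entries` table are hypothetical names chosen for the example. It assumes that each item is committed as soon as it is fetched and that items already present in the database are skipped on the next run.

```python
import sqlite3


def process_items(db_path, items, fetch):
    """Store fetch(key) for every key in items, skipping keys already stored.

    Hypothetical helper: returns the number of items added during this run.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entries (key TEXT PRIMARY KEY, value TEXT)"
    )
    added = 0
    try:
        for key in items:
            # Items committed by an earlier (possibly interrupted) run are skipped.
            found = conn.execute(
                "SELECT 1 FROM entries WHERE key = ?", (key,)
            ).fetchone()
            if found:
                continue
            value = fetch(key)
            # The connection context manager commits on success
            # and rolls back if an exception is raised inside the block.
            with conn:
                conn.execute(
                    "INSERT INTO entries (key, value) VALUES (?, ?)", (key, value)
                )
            added += 1
    except KeyboardInterrupt:
        # Ctrl-C: everything committed so far is already safely in the file.
        pass
    finally:
        conn.close()
    return added
```

Because every item is committed separately, an interruption loses at most the item being processed at that moment, and rerunning the same call picks up where the previous run stopped.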
Command-line utility. Usage:

`python hagen-full.py "path/to/Полная парадигма. Морфология.txt" path/to/sqlite.db`

The first argument is the Russian morphology text file, which can be extracted from here (RAR archive). The second argument is the resulting database; it will contain the table `parsed_morpho` with the following structure:

Column | Possible values |
--- | --- |
new_group | True if this is the first row of a group of words |
main_word | True if this word is the default form (such as the infinitive for verbs) |
optional | True if this form is optional |
word | The word itself |
part_of_speech | 'сущ':1,'прл':2,'гл':3,'мест':4,'союз':5,'предик':6,'част':7,'межд':8,'предл':9, 'числ':10, 'прч':11, 'дееп':12, 'нар':13,'ввод':14 |
gender | 'муж':1, 'жен':2, 'ср':3,'общ':4 |
number | 'ед':1,'мн':2 |
plural | 'им':1,'род':2,'дат':3,'вин':4,'тв':5,'пр':6,'зват':7,'счет':8,'мест':8,'парт':10 |
tense | 'буд':3,'наст':2, 'прош':1 |
declension | '1-е':1,'2-е':2,'3-е':3 |
transitive | 'перех':1,'пер/не':2,'непер':3 |
spirit | 'одуш':1,'неод':2 |
adverb_type | 'вопр':1,'обст':2,'опред':3,'сравн':4 |
circumstance_type | 'врем':1,'места':2,'напр':3,'причин':4,'цель':5 |
definition_type | 'степ':1,'кач':2, 'спос':3 |
perfect_type | 'сов':1,'несов':2,'2вид':3 |
number_type | 'кол':1,'поряд':2,'собир':3,'неопр':4 |
pronoun_type | 'прил':1,'сущ':2,'нар':3 |
infinitive | 1 if true |
pledge | 1 if 'страд' |
impersonal | 1 if 'безл' |
shortened | 1 if 'крат' |
immutable | 1 if 'неизм' |
reflexive | 1 if 'воз' |
superlative | 1 if 'прев' |
imperative | 1 if 'пов' |
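
As an illustration of how the numeric codes in the table can be used, the sketch below looks up the default forms for a given part of speech. It is only an example under stated assumptions: the helper `main_forms` is hypothetical, the demo table is built in memory with a small subset of the columns, the sample rows are invented for the demo, and it assumes that "True" is stored as the integer 1.

```python
import sqlite3

# Codes copied from the part_of_speech row of the table above.
PART_OF_SPEECH = {
    'сущ': 1, 'прл': 2, 'гл': 3, 'мест': 4, 'союз': 5, 'предик': 6, 'част': 7,
    'межд': 8, 'предл': 9, 'числ': 10, 'прч': 11, 'дееп': 12, 'нар': 13, 'ввод': 14,
}
# Reverse mapping, for turning stored codes back into labels.
POS_BY_CODE = {code: name for name, code in PART_OF_SPEECH.items()}


def main_forms(conn, part_of_speech):
    """Return the default forms (main_word = 1) for one part of speech.

    Hypothetical helper; assumes boolean columns hold 1 for True.
    """
    code = PART_OF_SPEECH[part_of_speech]
    rows = conn.execute(
        "SELECT word FROM parsed_morpho "
        "WHERE main_word = 1 AND part_of_speech = ? ORDER BY word",
        (code,),
    )
    return [row[0] for row in rows]


# Demo data only: an in-memory table with a subset of columns and assumed types.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE parsed_morpho "
    "(new_group INTEGER, main_word INTEGER, word TEXT, part_of_speech INTEGER)"
)
conn.executemany(
    "INSERT INTO parsed_morpho VALUES (?, ?, ?, ?)",
    [(1, 1, 'читать', 3), (0, 0, 'читаю', 3), (1, 1, 'дом', 1)],
)
print(main_forms(conn, 'гл'))
```

Against a real database produced by hagen-full.py, the same query would be run on a connection opened with `sqlite3.connect("path/to/sqlite.db")` instead of the in-memory demo table.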