-
Notifications
You must be signed in to change notification settings - Fork 0
/
CHANGELOG
148 lines (117 loc) · 8.92 KB
/
CHANGELOG
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
0.9.18alpha (20150612)
- Many bugfixes and speedups
- Corrects handling of some rare unicode composing diacritics
- Python bindings
- Adds _closeu()-builtin function which closes sigma
- Adds _sublabel(L1,symbol,L2)-builtin function: substitute all instance of symbol in L1 with L2
- Adds separate pairs, pairs > file, random-pairs commands in interface
- Adds possibility to automatically align lexc-entries
0.9.17alpha (20121117)
- Many bugfixes in foma, flookup (apply code)
- Adds possibility to redirect "print words" output to file by "print words > file"
0.9.16alpha (20111213)
- New faster apply code, as well as optional indexing of arcs in flookup
- adds rewrite rule formalism "transducers with backreferences" (e.g. T -> || L _ R, where T is a transducer)
- flookup now has option to run as UDP server (-S)
- Adds low-level functions _marktail(), _addfinalloop(), _addnonfinalloop(), _addsink(), leftrewr(), _flatten().
- Minor bugfixes in apply code
- Some rare memory leak fixes throughout
- Adds functions so that apply_med() can be used through the API
- Adds functions to API fsm_get_next_state(), fsm_get_next_state_arc()
0.9.15alpha (20111110)
- adds cgflookup utility to facilitate piping with Constraint Grammar parsers
- flookup now applies in chain (a virtual compose) if there are several transducers in a file, except if the [-a] flag is given, in which case flookup looks for an output in each transducer until it finds one (simulating priority union .P.)
- changes and bug fixes in application code/flag diacritic handling
- lexc now accepts forced alignments with zeroes and info strings
- \r (carriage return) awarness added
- adds support for the construct regex @re"regexfile.txt" (assumes file has one regular expression on one line)
- fixes argument order of interface commands compose net and concatenate net
- adds functions in API to read in multiple nets from one file through an iterator fsm_read_binary_file_multiple_init(infilename)/fsm_read_binary_file_multiple(rh))
- improved error messages
- adds "test sequential" ("tseq" for short) and assert stack commands
- fixes bugs with loading/saving nets that have newlines in transitions
0.9.14alpha (20110203)
- Changed tokenization in apply_up() and apply_down() to always choose the leftmost longest tokenization in case of multicharacter symbols sharing the same prefix. Previously, all possible tokenizations were given. For example, earlier regex [a a|aa] would match the input string "aa" in two ways, yielding two outputs, whereas only one tokenization is given now, the leftmost-longest one. This also speeds up the apply functions.
- Fix minor bug in fsm_construct_set_final() and fsm_construct_set_initial(), affecting fsm_reverse()
- Minimize automata read with "read text" and "read spaced-text"
- Minor change in .dot output style
- Fix bug in fsm_trie
0.9.13alpha (20101025)
- Various memory-management improvements throughout
- Adds the commands "read text" and "read spaced-text" for building automata/transducers from word lists. The functionality can be embedded in regexes with @txt"filename" and @stxt"filename". From the API, the functions fsm_read_text_file() and fsm_read_spaced_text_file(), reads files and builds trie FSMs from the lists
- Adds the external utility flookup, which reads words from stdin and applies them to a transducer given in a file, and prints output to stdout
- Adds the command "substitute defined", and the corresponding fsm_substitute_label() in the API.
- Improved prolog file reading and writing
- Adds a function to convert a multicharacter machine to a letter machine (where each transition is exactly one unicode symbol long). It is called fsm_letter_machine() in the API, and _lm() in the built-in functions
- Print random-lower/random-upper/random-words now provide a "more" random distribution. Duplicated are not printed, but prefixed by a count
- Thread-safety changes in apply.c API
- Bug fixes in order of stack operations, printing, reverse operation (.r), function definitions, lower-side priority union (.p.)
- New API calls fsm_construct_* and fsm_get_* for constructing and reading automata/transducers
0.9.12alpha (20091025)
- OSX version "view" command launches native Graphviz (get it from http://www.pixelglow.com/graphviz/)
- export cmatrix writes AT&T format weighted transducer
- More API functions
- Added support for reading/writing FSM files in AT&T format
- Added "apply down/up < infile (> outfile)"
- Separated the API functions into libfoma.h. Separate library builds.
- Apply functions in API are now iterators
- Minor bugfixes in regex parsing / lexc file parsing
- Added the built-in _eq()-operator
- Added extraction of attested symbol pairs and extraction of sigma as an FSM through "label net" and "sigma net"
- Minor bugfix in fsm_cross_product()
- Bugfix in "print dot"
0.9.11alpha (20090722)
- Symbol handling efficiency modifications in fsm_minimize() and fsm_determinize()
- Added the function fsm_epsilon_remove() (not used outside the API, however)
- Minor bugfixes in fsm_rewrite()
- A bug that was reintroduced in fsm_compose() in 0.9.10alpha is fixed
- fsm_lower() fsm_upper() fsm_kleene_star() fsm_concatenate() have been rewritten
- Added the functionality "ambiguous upper","extract ambiguous","extract unambiguous" and the corresponding built-in regex functions: _ambdom(), _ambpart(), _unambpart(). The first function extracts the input words that have multiple paths through a transducer. The other two split up a given transducer into an ambiguous or unambiguous one.
0.9.10alpha (20090717)
- Added support for loading and saving networks in a binary file format and the commands "save stack" "load stack" "save defined" "load defined" and the regular expression construct 'regex @"filename"' which loads the network in "filename" (uses libz)
- foma is no longer compiled with libgc as default
- Some major memory management changes in fsm_minimize() and fsm_determinize(). Efficiency/memory tweaks in fsm_determinize().
- Minor bugfixes in fsm_compose()
0.9.9alpha (20090621)
- Added support for saving networks in prolog file format (write prolog > filename or wpl > filename).
- Bugfixes in "read prolog"/"rpl"
- Minor bugfix in "test unambiguous"/_isunambiguous()
0.9.8alpha (20090604)
- Added option to load confusion matrices (load cmatrix filename) that specify costs for minimum edit distance matching and attaches to a network. The command "print cmatrix" prints the matrix associated with the network on top of the stack. These are used whenever "apply med" is called. If no matrix is specified, the default distance is Levenshtein.
- Added the global variables med-cutoff and med-limit to control the med search
- Minor bugfixes in fsm_compose()
- Changes in dot file output (print dot, print dot > filename) and "view net"
- Added tests for transducer ambiguity (test unambiguous/tunam), and the equivalent built-in function _isunambiguous() which returns a boolean automaton.
0.9.7alpha (20090519)
- Fixed bug in print shortest-string "alias: pss"
- Added tests for transducer functionality (test functional/tfu) and transducer identity (test identity/tid).
- Added support for built-in functions. They use the same notation as user-defined ones, except all begin with _. Functions that are network property tests such as _isfunctional() and _isidentity() return boolean automata (the empty set for FALSE and the empty string for TRUE).
0.9.6alpha (20090506)
- Fixed bugs in apply up
- Fixed bugs in left and right quotient (\\\ and ///)
- Changed fsm_minus(), i.e. A - B, so that it subtracts paths instead of being A & ~B. This means transducer paths can be subtracted.
- Added algorithms for finding the minimum edit distance between a word and a fsm. "apply med" is the same as "apply down" except it finds the cheapest approximate matches. Still experimental.
0.9.5alpha (20090325)
- Fixed bug in lexcread that affected lexicons with very long words
- Fixed bug in context restrict with 0 as both contexts, e.g. a => 0 _ 0
0.9.4alpha (20090116)
- Tweaked memory and efficiency in minimization and determinization algorithms. For large automata, minimization uses much less temporary memory.
- Added 'flag-is-epsilon' variable behavior and composition algorithm now obeys it
--
0.9.3alpha (20090113)
- Recursive script-files now work
- Changed lexc format reading so Lexicon is accepted as well as LEXICON as a keyword
- Minor bugfix for comment behavior in script files
- Minor bugfix in "define" format
--
0.9.2alpha (20090111)
- Determinization now uses much less temporary memory without sacrificing efficiency.
The memory use caused severe problems for compiling very large lexicons or determinizing
large automata with limited memory.
- Added a "-r" command line option for starting foma without readline (mainly for Win32 version)
- Fixed a serious bug in the ternary `[A,B,C] substitution operator and added the interface command
"substitute symbol B for A"
- Added a slight optimization in directed replacements
--
0.9.1alpha (20081231)
- First public release