Changelog

This document contains descriptions of all the significant changes made to ERRANT since its release.

16-11-18

The compare_m2.py evaluation script was refactored to make it easier to use.
We tweaked the alignment code and merging rules to not only make ERRANT ~700% faster, but also slightly more accurate.

Specifically, we simplified the lemma cost so that it no longer calls the lemmatiser repeatedly for different parts-of-speech, and we now compute the character cost with Python's native difflib.SequenceMatcher instead of a character-based Damerau-Levenshtein alignment.
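
As a rough illustration of the character cost change, the sketch below derives a cost between 0 and 1 from difflib.SequenceMatcher. The function name char_cost and the "1 minus ratio" formulation are assumptions made for this example; see scripts/align_text.py for the actual implementation.

```python
from difflib import SequenceMatcher

def char_cost(a: str, b: str) -> float:
    """Hypothetical helper: 0.0 for identical strings, 1.0 for no shared characters."""
    # ratio() returns a similarity in [0, 1]; subtract it from 1 to get a cost.
    return 1.0 - SequenceMatcher(None, a, b).ratio()

if __name__ == "__main__":
    for pair in [("color", "colour"), ("color", "table"), ("cat", "cat")]:
        print(pair, round(char_cost(*pair), 3))
```

A similarity-based cost like this needs only the standard library, which is one plausible reason it is cheaper than a hand-written character alignment.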

This significantly increased the speed, but also slightly decreased performance (~0.5 F1 worse), so we additionally revisited the merging rules. The new implementation now processes the largest combinations of adjacent non-matches first, instead of processing one alignment at a time, and now also features some new or slightly modified rules (see scripts/align_text.py for more information).
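
The "largest combinations first" idea can be sketched as follows. This is only an illustration under assumed conventions: the single-letter operation codes ("M" for match, "S", "I", "D" for substitution, insertion, deletion) and both function names are hypothetical, and the actual merging rules live in scripts/align_text.py.

```python
from itertools import groupby

def non_match_runs(ops):
    """Yield (start, end) index spans of consecutive non-match operations."""
    i = 0
    for is_match, group in groupby(ops, key=lambda op: op == "M"):
        n = len(list(group))
        if not is_match:
            yield (i, i + n)
        i += n

def spans_largest_first(start, end):
    """Enumerate contiguous sub-spans of [start, end), longest first."""
    for size in range(end - start, 0, -1):
        for s in range(start, end - size + 1):
            yield (s, s + size)

if __name__ == "__main__":
    ops = ["M", "S", "I", "M", "D"]
    for run in non_match_runs(ops):
        # A merging rule would be tried on each candidate span in this order.
        print(run, list(spans_largest_first(*run)))
```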

The differences between the old and new version are summarised in the following table.

Dataset	Sents	Setting	P	R	F1	Time (secs)
FCE Dev	2371	Old	82.77	85.22	83.98	260
FCE Dev	2371	New	84.00	85.52	84.75	40
FCE Test	2805	Old	83.88	85.84	84.85	300
FCE Test	2805	New	85.17	85.93	85.55	45
FCE Train	30200	Old	82.69	85.12	83.89	2965
FCE Train	30200	New	84.06	85.38	84.72	340
CoNLL-2013	1381	Old	82.64	82.45	82.54	315
CoNLL-2013	1381	New	83.27	82.24	82.75	45
CoNLL-2014.0	1312	Old	78.48	80.38	79.42	350
CoNLL-2014.0	1312	New	79.02	80.18	79.59	45
CoNLL-2014.1	1312	Old	82.50	82.73	82.61	385
CoNLL-2014.1	1312	New	84.04	82.85	83.44	50
NUCLE	57151	Old	70.14	80.27	74.86	7565
NUCLE	57151	New	73.20	81.16	76.97	725
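
For reference, the F1 column is the harmonic mean of P and R, and the speed-up is the ratio of the two timings. The short sketch below recomputes both for the FCE Dev row; the function name f1 is just a label chosen for this example.

```python
def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    return 2 * p * r / (p + r) if (p + r) else 0.0

if __name__ == "__main__":
    # FCE Dev figures taken from the table above.
    print("Old F1:", round(f1(82.77, 85.22), 2))
    print("New F1:", round(f1(84.00, 85.52), 2))
    print("Speed-up:", 260 / 40)
```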

23-08-18

Fixed the arbitrary reordering of edits that share the same start and end span, e.g.
S I am happy .
A 2 2|||M:ADV|||really|||REQUIRED|||-NONE-|||0
A 2 2|||M:ADV|||very|||REQUIRED|||-NONE-|||0

VS.

S I am happy .
A 2 2|||M:ADV|||very|||REQUIRED|||-NONE-|||0
A 2 2|||M:ADV|||really|||REQUIRED|||-NONE-|||0

10-08-18

Added support for multiple annotators in parallel_to_m2.py.
Before: python3 parallel_to_m2.py -orig <orig_file> -cor <cor_file> -out <out_file>
After: python3 parallel_to_m2.py -orig <orig_file> -cor <cor_file1> [<cor_file2> ...] -out <out_file>
This is helpful if you have multiple annotations for the same orig file.
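
In the M2 format, the last field of each edit line is the annotator ID, which is how several annotations of one sentence can share a single block. The sketch below only shows that output shape: the function name to_m2_block is hypothetical, the edits are supplied by hand rather than extracted from corrected files, and numbering annotators from 0 in the order the corrected files are given is an assumption.

```python
def to_m2_block(orig, edits_per_annotator):
    """Format one sentence and its per-annotator edits as an M2 block."""
    lines = ["S " + orig]
    for annotator_id, edits in enumerate(edits_per_annotator):
        for start, end, err_type, correction in edits:
            lines.append(
                f"A {start} {end}|||{err_type}|||{correction}"
                f"|||REQUIRED|||-NONE-|||{annotator_id}"
            )
    return "\n".join(lines)

if __name__ == "__main__":
    print(to_m2_block(
        "I am happy .",
        [[(2, 2, "M:ADV", "really")], [(2, 2, "M:ADV", "very")]],
    ))
```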

17-12-17

In November, spaCy changed significantly with the release of version 2.0.0. Although we have not tested ERRANT with this new version, the main change seemed to be a slight increase in performance (POS tagging, parsing, etc.) at a significant cost to speed. Consequently, we still recommend spaCy 1.9.0 for use with ERRANT.

22-11-17

ERRANT would sometimes run into memory problems if sentences were long and very different. We hence changed the default alignment from breadth-first to depth-first. This bypassed the memory problems, made ERRANT faster and barely affected results.

10-05-17

ERRANT v1.0 released.
