Release for spaCy v3

@BramVanroy released this 12 Jul 10:17

This release makes spacy_conll compatible with spaCy v3. In addition, several changes make the project easier to maintain.

  • [general] Breaking change: spaCy v3 required (closes #8)
  • [init_parser] Breaking change: in all cases, is_tokenized now disables sentence segmentation
  • [init_parser] Breaking change: no more default values for parser or model anywhere. Note that spaCy no longer
    accepts shorthand codes such as en. You have to provide the full model name, e.g.
    en_core_web_sm
  • [init_parser] Improvement: models are automatically downloaded for Stanza and UDPipe
  • [cli] Moved the CLI script to a new position in the directory structure and reworked its arguments. Run
    parse-as-conll -h for more information.
  • [conllparser] Made the ConllParser class available as a utility to easily create a wrapper for a spaCy-like
    parser which can return the parsed CoNLL output of a given file or text
  • [conllparser,cli] Improvements to the usability of n_process. The parser now tries to determine whether
    multiprocessing is available on your platform and, if it is not, tells you so. These up-front error messages
    can be disabled with ignore_pipe_errors, both on the command line and in ConllParser's parse methods
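
To illustrate the init_parser changes listed above, here is a minimal sketch of calling it with an explicit model and parser. The helper names is_shorthand_code and parse_to_conll are hypothetical and not part of spacy_conll, and the underscore check is only a heuristic assumed for this example. The sketch also assumes that the parsed document exposes its CoNLL output through the conll_str extension.

```python
def is_shorthand_code(model_name: str) -> bool:
    """Heuristic assumed for this sketch: full spaCy package names such as
    "en_core_web_sm" contain underscores, bare codes such as "en" do not."""
    return "_" not in model_name


def parse_to_conll(text: str, model_name: str, is_tokenized: bool = False) -> str:
    """Parse `text` with a spaCy model and return its CoNLL string.

    Hypothetical convenience wrapper, not part of spacy_conll itself.
    """
    if is_shorthand_code(model_name):
        raise ValueError(
            f"{model_name!r} looks like a shorthand code; "
            "pass a full model name such as 'en_core_web_sm'."
        )

    # Imported lazily so the name check above works without spaCy installed.
    from spacy_conll import init_parser

    # Model and parser are both passed explicitly: there are no defaults.
    # With is_tokenized=True the text is treated as pre-tokenized and
    # sentence segmentation is disabled.
    nlp = init_parser(model_name, "spacy", is_tokenized=is_tokenized)
    doc = nlp(text)

    # Assumed here: the CoNLL output is available as a custom extension.
    return doc._.conll_str
```

Calling parse_to_conll("I like cookies.", "en_core_web_sm") should return the CoNLL representation, provided the model has been installed first (for instance with python -m spacy download en_core_web_sm), since automatic downloads are only mentioned for Stanza and UDPipe.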