
Releases: gandersen101/spaczz

v0.6.1 Regex[Searcher/Matcher] Bugfix

12 Mar 01:16

What’s Changed

🪲 Fixes

🚨 Testing

👷 Continuous Integration

📚 Documentation

v0.6.0 Returning Patterns, Consistency and Support Updates

01 May 13:05
  • Returns the matching pattern for all matchers. This is a breaking change: matches are now tuples of length 5 instead of 4.
  • Regex and token matches now return match ratios.
  • Support for Python >=3.7,<=3.11, along with rapidfuzz>=1.0.0.
  • Dropped support for spaCy v2. Sorry to do this without a deprecation cycle, but I stepped away from this project for a long time.
  • Removed support for the "spaczz_"-prepended optional SpaczzRuler init arguments. Again, sorry to do this without a deprecation cycle.
  • Matcher.pipe methods, which were deprecated, are now removed.
  • spaczz_span custom attribute, which was deprecated, is now removed.
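Because the extra tuple element breaks code that unpacks exactly four values, here is a minimal sketch of a hypothetical compatibility helper that accepts either shape. It assumes the v0.6.0 field order is (match_id, start, end, ratio, pattern) and that older matches were the same minus the trailing pattern; `SpaczzMatch` and `normalize_match` are illustrative names, not part of spaczz, so check the README for your installed version.

```python
from typing import Any, NamedTuple, Optional, Sequence


class SpaczzMatch(NamedTuple):
    """Hypothetical container for one match (not a spaczz class)."""

    match_id: str
    start: int
    end: int
    ratio: int
    pattern: Optional[str]


def normalize_match(match: Sequence[Any]) -> SpaczzMatch:
    """Accept an assumed pre-0.6.0 4-tuple or a 0.6.0 5-tuple."""
    if len(match) == 5:
        return SpaczzMatch(*match)
    if len(match) == 4:
        # Assumption: older releases returned no matching pattern.
        return SpaczzMatch(*match, pattern=None)
    raise ValueError(f"unexpected match length: {len(match)}")


old_style = ("GPE", 3, 5, 91)
new_style = ("GPE", 3, 5, 91, "Chicago")
print(normalize_match(old_style).pattern)  # None
print(normalize_match(new_style).pattern)  # Chicago
```

Code written directly against v0.6.0 can simply unpack all five values in its loop instead.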

v0.5.4 RegexSearcher Bugfix

23 Dec 19:29

What’s Changed

📚 Documentation

v0.5.3 Bugfix: TokenMatcher Match Order

22 May 21:39
c2895f6
  • Fixed a "bug" in the TokenMatcher. Spaczz expects token matches returned in order of ascending match start, then descending match length. However, spaCy's Matcher does not return matches in this order by default. Added a sort in the TokenMatcher to ensure this.
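The ordering described above (ascending start, then descending length) can be expressed as a single sort key. The sketch below is illustrative only: it uses a simplified, hypothetical (match_id, start, end) tuple rather than spaczz's actual match type or implementation.

```python
from typing import List, Tuple

# Hypothetical simplified match shape: (match_id, start, end).
Match = Tuple[str, int, int]


def sort_matches(matches: List[Match]) -> List[Match]:
    """Order by ascending start; ties broken by longer span first."""
    return sorted(matches, key=lambda m: (m[1], -(m[2] - m[1])))


matches = [("B", 2, 3), ("A", 0, 2), ("C", 0, 4), ("D", 2, 5)]
print(sort_matches(matches))
# [('C', 0, 4), ('A', 0, 2), ('D', 2, 5), ('B', 2, 3)]
```

Negating the length inside the key keeps this a single stable `sorted` call instead of two chained sorts.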

v0.5.2 CI/Dev Updates

04 May 15:34
  • Minor updates to pre-commits and noxfile.

v0.5.1 Dependency and Typing Updates

25 Apr 15:52
d2161b5
  • Minor updates to allowed dependency versions and CI.
  • Switched back to using typing-module types (e.g. typing.List) instead of built-in generic types (e.g. list[str]) because spaCy v3 uses Pydantic, and Pydantic does not support built-in generic types in Python < 3.9. I don't know whether this would actually cause any issues, but I am playing it safe. More changes may follow so that spaczz plays nicely with Pydantic.
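A small illustration of the difference, independent of spaczz: annotations written with `typing.List`/`typing.Dict` can be evaluated at runtime on Python 3.7+, which matters to libraries such as Pydantic that inspect annotations, whereas subscripting the built-in `list` only works from Python 3.9 (PEP 585). The function `count_tokens` is a hypothetical example.

```python
import sys
from typing import Dict, List, get_type_hints


def count_tokens(texts: List[str]) -> Dict[str, int]:
    """Annotated with typing.List/Dict, which evaluate on Python 3.7+."""
    return {text: len(text.split()) for text in texts}


# Runtime evaluation of the annotations succeeds on every supported version.
print(get_type_hints(count_tokens)["texts"])  # typing.List[str]

if sys.version_info < (3, 9):
    # Before PEP 585, subscripting the builtin raises at runtime.
    try:
        list[str]
    except TypeError as err:
        print("list[str] unsupported:", err)
```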

v0.5.0 spaCy v3 Support

01 Mar 19:22
de34205

What’s Changed

🚀 Features

  • Enhancement spacy3 support (#52) @gandersen101
    • Support for spaCy v3.
    • If using spaCy v3, the SpaczzRuler optional arguments no longer need to be prepended with "spaczz_". The prepended versions will still work in most cases, offering some backwards compatibility. However, optional arguments prepended with "spaczz_" will not work with spaCy v3's new config-driven spacy.load and nlp.add_pipe APIs, so users on spaCy v3 are encouraged to move away from them. Note that the prepended arguments are still necessary when using spaczz with spaCy v2.
    • Matcher.pipe methods are now deprecated in accordance with spaCy v3.
    • spaczz_span custom attribute is deprecated in favor of spaczz_ent. They both have the same functionality, but the spaczz_ent name makes more sense.

v0.4.2 SpaczzRuler Bug Fixes

25 Feb 03:44
  • Fixed a bug where TokenMatcher callbacks did nothing.
  • Fixed a bug where spaczz_token_defaults in the SpaczzRuler did nothing.
  • Fixed a bug where defaults would not be added to their respective matchers when loading from bytes/disk in the SpaczzRuler.
  • Fixed some inconsistencies in the SpaczzRuler which will be particularly noticeable with ent_ids. See the "Known Issues" section below for more details.
  • Small tweaks to spaczz custom attributes.
  • The available fuzzy matching functions have changed in RapidFuzz, and spaczz has changed accordingly.
  • Preparing for spaCy v3 updates.

v0.4.1 Phrasesearch Performance Improvements

31 Jan 00:04
  • Spaczz's phrase searching algorithm has been further optimized so both the FuzzyMatcher and SimilarityMatcher should run considerably faster.
  • The FuzzyMatcher and SimilarityMatcher now include a thresh parameter that defaults to 100. When matching, if flex > 0 and the match ratio is >= thresh during the initial scan of the document, no optimization will be attempted. By default, this means perfect matches don't need to be run through match optimization.
  • flex now defaults to len(pattern) // 2. This creates a more meaningful difference between "default" and "max" with longer patterns.
  • PEP585 code updates.
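The new defaults can be read as two small rules. This is a hedged sketch, not spaczz's implementation: `default_flex` and `needs_optimization` are hypothetical names, and measuring the pattern length in tokens is an assumption.

```python
def default_flex(pattern_len: int) -> int:
    """Default flex from v0.4.1: half the pattern length, rounded down."""
    return pattern_len // 2


def needs_optimization(ratio: int, flex: int, thresh: int = 100) -> bool:
    """Optimize only if flex allows boundary changes and ratio is below thresh."""
    return flex > 0 and ratio < thresh


print([default_flex(n) for n in (1, 4, 7)])  # [0, 2, 3]
print(needs_optimization(100, flex=2))  # False: perfect match, skip
print(needs_optimization(85, flex=2))   # True: imperfect match, optimize
print(needs_optimization(85, flex=0))   # False: no flex to use
```

With the default thresh of 100, only imperfect initial matches pay the cost of boundary optimization.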

v0.4.0 TokenMatcher

20 Jan 19:24

Adds the TokenMatcher to spaczz and integrates it with the SpaczzRuler. Also overhauls spaczz's custom attributes and includes some quality of life improvements and bug fixes.