fix(linker): add lots of dashes to tokenizer
nsantacruz committed Feb 11, 2024
1 parent df270cb commit b84aeeb
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion sefaria/spacy_function_registry.py
@@ -5,7 +5,7 @@
 def inner_punct_tokenizer_factory():
     def inner_punct_tokenizer(nlp):
         # infix_re = spacy.util.compile_infix_regex(nlp.Defaults.infixes)
-        infix_re = re.compile(r'''[.,?!:;…‘’`“”"'~–\-/()<>]''')
+        infix_re = re.compile(r'''[.,?!:;…‘’`“”"'~–—\-‐‑‒־―⸺⸻/()<>]''')
         prefix_re = spacy.util.compile_prefix_regex(nlp.Defaults.prefixes)
         suffix_re = spacy.util.compile_suffix_regex(nlp.Defaults.suffixes)
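
To illustrate what the one-line change does, here is a minimal sketch that compares the two character classes using only the standard `re` module. The two patterns are copied from the diff; the names `OLD_CLASS`, `NEW_CLASS`, `added_chars`, and `infix_spans` are hypothetical helpers introduced for this example and are not part of `sefaria/spacy_function_registry.py`. It does not run spaCy, so it only shows which characters the infix pattern would match, not the final tokenization.

```python
import re

# Character classes copied from the removed (-) and added (+) lines of the diff.
OLD_CLASS = r'''[.,?!:;…‘’`“”"'~–\-/()<>]'''
NEW_CLASS = r'''[.,?!:;…‘’`“”"'~–—\-‐‑‒־―⸺⸻/()<>]'''

old_infix_re = re.compile(OLD_CLASS)
new_infix_re = re.compile(NEW_CLASS)

# Characters present in the new class but absent from the old one,
# i.e. the dash variants this commit adds (hypothetical helper).
added_chars = [ch for ch in NEW_CLASS if ch not in OLD_CLASS]


def infix_spans(pattern, text):
    """Return (start, end) offsets of every infix match in text."""
    return [m.span() for m in pattern.finditer(text)]


if __name__ == "__main__":
    sample = "alpha\u2014beta"  # two words joined by an em dash (U+2014)
    print("added code points:", [f"U+{ord(ch):04X}" for ch in added_chars])
    print("old pattern matches:", infix_spans(old_infix_re, sample))
    print("new pattern matches:", infix_spans(new_infix_re, sample))
```

With the old class, a string such as `alpha—beta` has no infix match, so a tokenizer built on it would presumably leave the string as a single token; with the new class the dash is matched and can be split off.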
