feat(linker): update tokenizer to include more punctuation.
nsantacruz committed Oct 26, 2023
1 parent 9d8dc2a commit 873c496
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion sefaria/spacy_function_registry.py
@@ -5,7 +5,7 @@
 def inner_punct_tokenizer_factory():
     def inner_punct_tokenizer(nlp):
         # infix_re = spacy.util.compile_infix_regex(nlp.Defaults.infixes)
-        infix_re = re.compile(r'''[\.\,\?\:\;…\‘\’\`\“\”\"\'~\–\-/\(\)]''')
+        infix_re = re.compile(r'''[.,?!:;…‘’`“”"'~–\-/()<>]''')
         prefix_re = spacy.util.compile_prefix_regex(nlp.Defaults.prefixes)
         suffix_re = spacy.util.compile_suffix_regex(nlp.Defaults.suffixes)
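
To see what the change does in practice, the two character classes can be compared directly with Python's `re` module. This is a minimal sketch, not code from the repository: the names `OLD_INFIX`, `NEW_INFIX`, `infix_chars`, and the sample string are made up for illustration, and it exercises only the regex, not the spaCy tokenizer that is built from it.

```python
import re

# Character class before the commit (escapes kept exactly as in the diff).
OLD_INFIX = re.compile(r'''[\.\,\?\:\;…\‘\’\`\“\”\"\'~\–\-/\(\)]''')
# Character class after the commit: redundant escapes dropped, "!", "<", ">" added.
NEW_INFIX = re.compile(r'''[.,?!:;…‘’`“”"'~–\-/()<>]''')


def infix_chars(pattern, text):
    """Return every character in `text` that `pattern` treats as an infix."""
    return [m.group() for m in pattern.finditer(text)]


# Hypothetical input with an exclamation mark and HTML-like tags.
sample = "wow!<b>Rashi</b>"

old_hits = infix_chars(OLD_INFIX, sample)
new_hits = infix_chars(NEW_INFIX, sample)

print(old_hits)  # ['/']
print(new_hits)  # ['!', '<', '>', '<', '/', '>']
```

Inside a character class, only `\-` still needs its escape (to avoid being read as a range), which is why the new pattern can drop the other backslashes without changing which of the original characters match.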
