PreprocessingMetadata enhancement #2

hlibbabii · 2020-02-27T13:43:00Z

Rename PreprocessingMetadata -> PreppedTokenMetadata
Represent the word_boundaries field as a list of the number of subtokens in each token, e.g.
[1, 3, 1, 2] instead of the cumulative boundaries [0, 1, 4, 5, 7]
Remove the non-processible tokens field. Return non-processible tokens as a separate object
Provide a method for returning the metadata for the last n tokens:

>>> metadata.for_last_tokens(n: int)
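
The proposed representation and the truncation method could be sketched roughly as follows. This is only an illustration under stated assumptions, not the codeprep implementation: the field name `n_subtokens_per_token`, the helper functions `boundaries_to_counts` and `counts_to_boundaries`, and the reading of `n` as a number of full tokens are all hypothetical.

```python
from dataclasses import dataclass
from itertools import accumulate
from typing import List


def boundaries_to_counts(word_boundaries: List[int]) -> List[int]:
    """Turn cumulative boundaries into per-token subtoken counts (hypothetical helper)."""
    # Each count is the distance between two neighbouring boundaries.
    return [end - start for start, end in zip(word_boundaries, word_boundaries[1:])]


def counts_to_boundaries(n_subtokens_per_token: List[int]) -> List[int]:
    """Inverse conversion: a running sum of the counts, starting at 0 (hypothetical helper)."""
    return [0] + list(accumulate(n_subtokens_per_token))


@dataclass
class PreppedTokenMetadata:
    # Hypothetical field name; the issue only says the field should hold
    # the number of subtokens in each token.
    n_subtokens_per_token: List[int]

    def for_last_tokens(self, n: int) -> "PreppedTokenMetadata":
        """Return metadata for the last n tokens (assumes n counts full tokens)."""
        if n < 0:
            raise ValueError("n must be non-negative")
        if n == 0:
            # A plain [-0:] slice would return the whole list, so handle 0 explicitly.
            return PreppedTokenMetadata([])
        return PreppedTokenMetadata(self.n_subtokens_per_token[-n:])


metadata = PreppedTokenMetadata(boundaries_to_counts([0, 1, 4, 5, 7]))
last_two = metadata.for_last_tokens(2)
```

With counts instead of cumulative boundaries, truncating to the last tokens is a plain slice: no offsets need to be recomputed, whereas cumulative boundaries would have to be shifted so that they start at 0 again.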

hlibbabii · 2020-02-27T13:44:42Z

This enhancement makes it easier to implement the calculation of context statistics in giganticode-langmodels.

hlibbabii added this to the Codeprep v2 milestone Feb 27, 2020

hlibbabii added a commit that referenced this issue Feb 27, 2020

#2: Remove non-processible tokens filed. Return non-processible tokens as a separate object

eb1ef76

hlibbabii added a commit that referenced this issue Feb 27, 2020

#2: rename PreprocessingMetadata -> PreppedTokenMetadata

8c9a061

hlibbabii added a commit that referenced this issue Feb 27, 2020

#2: Represent word_boundaries field as a list of the number of subtoken in each token, e.g [1, 3, 1, 2] instead of [0, 1, 4, 5, 7]

8ed5685

hlibbabii added a commit that referenced this issue Feb 27, 2020

#2: make PreppedTokenMetadata a dataclass

62e0bc0

hlibbabii added a commit that referenced this issue Feb 27, 2020

#2: Provide a method which truncates old metadata

fa4460a

hlibbabii added a commit that referenced this issue Feb 28, 2020

#2: make changes to iterators

5fa14e9

hlibbabii added the enhancement label Mar 18, 2020
