Limit input string to 512 characters to avoid CUDA crash #58

ulf1 · 2022-08-30T11:52:23Z

Problem

# If the input is longer than 512 characters, e.g.
assert len(sentence) > 512
# then annotating it as a single sentence
annotated = model_trankit(sentence, is_sent=True)
# results in a CUDA error, e.g.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [19635,0,0], thread: [112,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.

Cause
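XLM-RoBERTa, the encoder trankit builds on, can only process sequences of up to 512 subword tokens (including the special tokens <s> and </s>), not 512 characters. Trimming the input to 512 characters is therefore a heuristic that keeps most inputs under the limit, but it does not strictly guarantee it, because the number of subword tokens can differ from the number of characters.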
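Because the limit counts subword tokens, a more precise variant cuts the text by token count instead of by characters. The sketch below is a hypothetical helper, `truncate_by_tokens` (not part of trankit), that takes any function returning the character span of each subword token. It assumes the text is tokenized as one sequence and reserves two positions for `<s>` and `</s>`; trankit tokenizes its input internally and may split it differently, so treat the result as an approximation.

```python
from typing import Callable, List, Tuple

# Maps a string to the (start, end) character spans of its subword tokens,
# excluding special tokens such as <s> and </s>.
OffsetsFn = Callable[[str], List[Tuple[int, int]]]


def truncate_by_tokens(text: str, offsets_fn: OffsetsFn,
                       max_tokens: int = 512, n_special: int = 2) -> str:
    """Return the prefix of `text` covered by the first
    (max_tokens - n_special) subword tokens."""
    budget = max_tokens - n_special  # leave room for <s> and </s>
    offsets = offsets_fn(text)
    if len(offsets) <= budget:
        return text  # already fits, nothing to cut
    # Cut right after the last token that still fits into the budget.
    end = offsets[budget - 1][1]
    return text[:end]


# Hypothetical usage with a Hugging Face fast tokenizer (needs `transformers`
# and assumes the `xlm-roberta-base` checkpoint):
# from transformers import AutoTokenizer
# tok = AutoTokenizer.from_pretrained("xlm-roberta-base")
# offsets = lambda t: tok(t, add_special_tokens=False,
#                         return_offsets_mapping=True)["offset_mapping"]
# annotated = model_trankit(truncate_by_tokens(sentence, offsets), is_sent=True)
```

Passing the offsets function in as a parameter keeps the helper independent of any particular tokenizer library, which also makes it easy to test with a simple whitespace splitter.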

Possible fix

In trankit/trankit/pipeline.py, line 1066 (commit 1c19b9b):

ori_text = deepcopy(input)

Change

...

                ori_text = deepcopy(input)
                tagged_sent = self._posdep_sent(input)
...

to

...

                input = input[:512]   # <<< TRIM INPUT TO MAX 512 CHARACTERS (trimming only ori_text would not change what _posdep_sent receives)
                ori_text = deepcopy(input)
                tagged_sent = self._posdep_sent(input)
...


ulf1 · 2022-08-30T11:53:36Z

A quick workaround for other trankit users is to truncate the input before calling the pipeline:

annotated = model_trankit(sentence[:512], is_sent=True)
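Truncation discards everything after the cut. If the rest of the text matters, one alternative is to split it into pieces of at most 512 characters, cutting at a space where possible, and annotate each piece separately. `split_into_chunks` below is a hypothetical helper, not a trankit API; the 512-character budget is the same heuristic as above, and a cut may fall in the middle of a sentence, which can change the annotation near the boundary.

```python
from typing import List


def split_into_chunks(text: str, max_chars: int = 512) -> List[str]:
    """Split `text` into stripped pieces of at most `max_chars` characters,
    cutting at the last space inside the window when there is one."""
    chunks: List[str] = []
    start, n = 0, len(text)
    while start < n:
        end = min(start + max_chars, n)
        if end < n:
            # Search [start, end] so a space right after the window also counts.
            cut = text.rfind(" ", start, end + 1)
            if cut > start:
                end = cut  # the space starts the next piece and gets stripped
            # otherwise: no usable space, so hard-cut at max_chars
        chunk = text[start:end].strip()
        if chunk:
            chunks.append(chunk)
        start = end  # always advances, so the loop terminates
    return chunks
```

For example, `annotated = [model_trankit(chunk, is_sent=True) for chunk in split_into_chunks(sentence)]` annotates every piece instead of dropping the tail.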
