ValueError When Loading TACRED #3018

dobbersc · 2022-12-11T13:04:09Z

Describe the bug
Loading TACRED raises the following ValueError:

Traceback (most recent call last):
  File "/glusterfs/dfs-gfs-dist/dobbersc/.local/bin/miniconda3/envs/flair/lib/python3.9/code.py", line 90, in runcode
    exec(code, self.locals)
  File "<input>", line 1, in <module>
  File "/glusterfs/dfs-gfs-dist/dobbersc/PyCharmProjects/flair/flair/datasets/relation_extraction.py", line 260, in __init__
    super(RE_ENGLISH_TACRED, self).__init__(
  File "/glusterfs/dfs-gfs-dist/dobbersc/PyCharmProjects/flair/flair/datasets/sequence_labeling.py", line 403, in __init__
    super(ColumnCorpus, self).__init__(
  File "/glusterfs/dfs-gfs-dist/dobbersc/PyCharmProjects/flair/flair/datasets/sequence_labeling.py", line 296, in __init__
    [
  File "/glusterfs/dfs-gfs-dist/dobbersc/PyCharmProjects/flair/flair/datasets/sequence_labeling.py", line 297, in <listcomp>
    ColumnDataset(
  File "/glusterfs/dfs-gfs-dist/dobbersc/PyCharmProjects/flair/flair/datasets/sequence_labeling.py", line 501, in __init__
    sentence = self._convert_lines_to_sentence(
  File "/glusterfs/dfs-gfs-dist/dobbersc/PyCharmProjects/flair/flair/datasets/sequence_labeling.py", line 696, in _convert_lines_to_sentence
    key, value = comment_row.split("=", 2)
ValueError: too many values to unpack (expected 2)
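
The unpacking failure can be reproduced in isolation. This is a minimal sketch, assuming a comment row that contains more than one `=`. The row below is made up for illustration and is not taken from TACRED. `str.split("=", 2)` performs at most two splits, so it can return three parts, which cannot be unpacked into `key, value`:

```python
# Hypothetical comment row with two "=" characters (not from the real dataset).
comment_row = "text = a = b"

# maxsplit=2 allows up to three parts.
parts = comment_row.split("=", 2)
print(len(parts))

try:
    # Same unpacking pattern as in the traceback.
    key, value = comment_row.split("=", 2)
except ValueError as err:
    error_message = str(err)
    print(error_message)
```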

To Reproduce

from flair.datasets import RE_ENGLISH_TACRED
RE_ENGLISH_TACRED()

Environment (please complete the following information):

OS: openSUSE Leap 15.3
Version: flair on master

Additional context
This error is caused by the changes in #3006 to how comments are handled in the ColumnDataset.

flair/flair/datasets/sequence_labeling.py

Line 694 in 5a13598

for comment_row in comment.split("\t"):

The .conllu files generated from the original TACRED corpus contain comments that the currently implemented comment parser does not support. Since TACRED is a private dataset, here is an example:

# text = `` Market conditions became more challenging through November and December , '' said Sir Stuart Rose , the company 's chief executive .
# sentence_id = e7798fb926d91a16cd93
# relations = 15;16;22;22;per:title
1       ``      O
2       Market  O
3       conditions      O
4       became  O
5       more    O
6       challenging     O
7       through O
8       November        B-DATE
9       and     O
10      December        B-DATE
11      ,       O
12      ''      O
13      said    O
14      Sir     O
15      Stuart  B-PERSON
16      Rose    I-PERSON
17      ,       O
18      the     O
19      company O
20      's      O
21      chief   O
22      executive       B-TITLE
23      .       O

helpmefindaname · 2022-12-12T09:49:17Z

Hi @dobbersc
I am sorry for introducing this bug. As I don't have access to the dataset, can you verify that #3020 is working?

dobbersc added the bug label Dec 11, 2022

helpmefindaname mentioned this issue Dec 12, 2022

fix comment parsing for conllu datasets #3020

Merged

helpmefindaname closed this as completed Feb 20, 2023
