
ValueError When Loading TACRED #3018

Closed
dobbersc opened this issue Dec 11, 2022 · 1 comment
Labels
bug Something isn't working


@dobbersc
Collaborator

Describe the bug
When loading TACRED, the following ValueError is raised:

Traceback (most recent call last):
  File "/glusterfs/dfs-gfs-dist/dobbersc/.local/bin/miniconda3/envs/flair/lib/python3.9/code.py", line 90, in runcode
    exec(code, self.locals)
  File "<input>", line 1, in <module>
  File "/glusterfs/dfs-gfs-dist/dobbersc/PyCharmProjects/flair/flair/datasets/relation_extraction.py", line 260, in __init__
    super(RE_ENGLISH_TACRED, self).__init__(
  File "/glusterfs/dfs-gfs-dist/dobbersc/PyCharmProjects/flair/flair/datasets/sequence_labeling.py", line 403, in __init__
    super(ColumnCorpus, self).__init__(
  File "/glusterfs/dfs-gfs-dist/dobbersc/PyCharmProjects/flair/flair/datasets/sequence_labeling.py", line 296, in __init__
    [
  File "/glusterfs/dfs-gfs-dist/dobbersc/PyCharmProjects/flair/flair/datasets/sequence_labeling.py", line 297, in <listcomp>
    ColumnDataset(
  File "/glusterfs/dfs-gfs-dist/dobbersc/PyCharmProjects/flair/flair/datasets/sequence_labeling.py", line 501, in __init__
    sentence = self._convert_lines_to_sentence(
  File "/glusterfs/dfs-gfs-dist/dobbersc/PyCharmProjects/flair/flair/datasets/sequence_labeling.py", line 696, in _convert_lines_to_sentence
    key, value = comment_row.split("=", 2)
ValueError: too many values to unpack (expected 2)
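The message means the split on the failing line returned more than two parts. Because `str.split(sep, maxsplit)` returns at most `maxsplit + 1` items, `split("=", 2)` can return three, which happens whenever a single comment row contains two or more `=` characters. The plain-Python sketch below reproduces this outside of flair. The function name `split_comment_row` and the sample rows are hypothetical illustrations, not taken from flair or from TACRED, and this sketch does not establish which TACRED sentence actually triggers the error.

```python
from typing import Tuple


def split_comment_row(comment_row: str, maxsplit: int) -> Tuple[str, str]:
    # Same unpacking pattern as the failing line in the traceback:
    # two names on the left, but str.split may return up to maxsplit + 1 parts.
    key, value = comment_row.split("=", maxsplit)
    return key, value


# A row with a single "=" unpacks fine.
ok_row = split_comment_row("sentence_id = e7798fb926d91a16cd93", 2)
print(ok_row)

# Hypothetical row whose value itself contains "=": with maxsplit=2 the split
# returns three parts, so the two-name unpacking raises ValueError.
error_message = ""
try:
    split_comment_row("text = a = b", 2)
except ValueError as err:
    error_message = str(err)
print("maxsplit=2:", error_message)

# With maxsplit=1 the split stops at the first "=", so at most two parts come
# back and any further "=" stays inside the value.
fixed_row = split_comment_row("text = a = b", 1)
print(fixed_row)
```

Note that a row with no `=` at all would fail differently, with "not enough values to unpack".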

To Reproduce

from flair.datasets import RE_ENGLISH_TACRED
RE_ENGLISH_TACRED()

Environment:

  • OS: openSUSE Leap 15.3
  • Version: flair on master

Additional context
This error was introduced by the changes in #3006 to how comments are handled in ColumnDataset.

for comment_row in comment.split("\t"):

The .conllu files generated from the original TACRED corpus contain comments that the current comment parser does not support. Since TACRED is a private dataset, here is an example of such a sentence:

# text = `` Market conditions became more challenging through November and December , '' said Sir Stuart Rose , the company 's chief executive .
# sentence_id = e7798fb926d91a16cd93
# relations = 15;16;22;22;per:title
1       ``      O
2       Market  O
3       conditions      O
4       became  O
5       more    O
6       challenging     O
7       through O
8       November        B-DATE
9       and     O
10      December        B-DATE
11      ,       O
12      ''      O
13      said    O
14      Sir     O
15      Stuart  B-PERSON
16      Rose    I-PERSON
17      ,       O
18      the     O
19      company O
20      's      O
21      chief   O
22      executive       B-TITLE
23      .       O
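For comparison, here is a sketch of a more tolerant parser for comment lines like the three above. It is hypothetical and is not the fix in flair or in #3020: `parse_comment_line` and `parse_relation` are illustrative names. It splits each row only at the first `=` and strips surrounding whitespace, so `key = value` comments and values containing `=` are both accepted. It also assumes that `relations` lists head start, head end, tail start, tail end and label, with 1-based inclusive token indices. That reading matches this example (Stuart Rose, executive, per:title), but it is an assumption.

```python
from typing import Dict, List, Tuple


def parse_comment_line(line: str, comment_symbol: str = "#") -> Dict[str, str]:
    # Hypothetical helper, not flair's implementation.
    # Tab-separated chunks are treated as separate rows, mirroring the
    # comment.split("\t") loop from #3006.
    body = line[len(comment_symbol):]
    metadata: Dict[str, str] = {}
    for row in body.split("\t"):
        # partition splits on the FIRST "=" only, so "=" inside the value is kept
        key, sep, value = row.partition("=")
        if not sep:
            continue  # no "=" in this row: skip it instead of raising
        metadata[key.strip()] = value.strip()
    return metadata


def parse_relation(spec: str) -> Tuple[int, int, int, int, str]:
    # ASSUMPTION: "15;16;22;22;per:title" is read as
    # head start; head end; tail start; tail end; label (1-based, inclusive).
    head_start, head_end, tail_start, tail_end, label = spec.split(";", 4)
    return int(head_start), int(head_end), int(tail_start), int(tail_end), label


comment_lines = [
    "# text = `` Market conditions became more challenging through November and"
    " December , '' said Sir Stuart Rose , the company 's chief executive .",
    "# sentence_id = e7798fb926d91a16cd93",
    "# relations = 15;16;22;22;per:title",
]

metadata: Dict[str, str] = {}
for line in comment_lines:
    metadata.update(parse_comment_line(line))

# In this example the text is space-tokenized, matching the token column above.
tokens: List[str] = metadata["text"].split(" ")
h0, h1, t0, t1, label = parse_relation(metadata["relations"])
head = " ".join(tokens[h0 - 1:h1])
tail = " ".join(tokens[t0 - 1:t1])
print(metadata["sentence_id"], head, "->", tail, label)
```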

dobbersc added the bug label on Dec 11, 2022
@helpmefindaname
Collaborator

Hi @dobbersc,
I am sorry for introducing this bug. As I don't have access to the dataset, can you verify that #3020 works?
