[Bug] error in typing_metric.py #24

Open
6 tasks done
averieso opened this issue Jun 2, 2023 · 10 comments
Labels
bug Something isn't working

Comments


averieso commented Jun 2, 2023

Checklist before your report.

  • I have verified that the issue exists against the master branch of AdaSeq.
  • I have read the relevant section in the contribution guide on reporting bugs.
  • I have checked the issues list for similar or identical bug reports.
  • I have checked the pull requests list for existing proposed fixes.
  • I have checked the commit log to find out if the bug was already fixed in the master branch.

What happened?

An error occurred during the evaluation phase of the training script for entity typing.

Python traceback

Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/averie/name-entity-recognition/experiments/adaseq/scripts/train.py", line 38, in <module>
    train_model_from_args(args)
  File "/home/averie/name-entity-recognition/experiments/adaseq/adaseq/commands/train.py", line 84, in train_model_from_args
    train_model(
  File "/home/averie/name-entity-recognition/experiments/adaseq/adaseq/commands/train.py", line 164, in train_model
    trainer.train(checkpoint_path)
  File "/home/averie/name-entity-recognition/experiments/adaseq/adaseq/training/default_trainer.py", line 146, in train
    return super().train(checkpoint_path=checkpoint_path, *args, **kwargs)
  File "/home/averie/name-entity-recognition/experiments/adaseq/env/lib/python3.10/site-packages/modelscope/trainers/trainer.py", line 689, in train
    self.train_loop(self.train_dataloader)
  File "/home/averie/name-entity-recognition/experiments/adaseq/env/lib/python3.10/site-packages/modelscope/trainers/trainer.py", line 1220, in train_loop
    self.invoke_hook(TrainerStages.after_train_epoch)
  File "/home/averie/name-entity-recognition/experiments/adaseq/env/lib/python3.10/site-packages/modelscope/trainers/trainer.py", line 1372, in invoke_hook
    getattr(hook, fn_name)(self)
  File "/home/averie/name-entity-recognition/experiments/adaseq/env/lib/python3.10/site-packages/modelscope/trainers/hooks/evaluation_hook.py", line 54, in after_train_epoch
    self.do_evaluate(trainer)
  File "/home/averie/name-entity-recognition/experiments/adaseq/env/lib/python3.10/site-packages/modelscope/trainers/hooks/evaluation_hook.py", line 67, in do_evaluate
    eval_res = trainer.evaluate()
  File "/home/averie/name-entity-recognition/experiments/adaseq/env/lib/python3.10/site-packages/modelscope/trainers/trainer.py", line 778, in evaluate
    metric_values = self.evaluation_loop(self.eval_dataloader,
  File "/home/averie/name-entity-recognition/experiments/adaseq/env/lib/python3.10/site-packages/modelscope/trainers/trainer.py", line 1272, in evaluation_loop
    metric_values = single_gpu_test(
  File "/home/averie/name-entity-recognition/experiments/adaseq/env/lib/python3.10/site-packages/modelscope/trainers/utils/inference.py", line 56, in single_gpu_test
    evaluate_batch(trainer, data, metric_classes, vis_closure)
  File "/home/averie/name-entity-recognition/experiments/adaseq/env/lib/python3.10/site-packages/modelscope/trainers/utils/inference.py", line 183, in evaluate_batch
    metric_cls.add(batch_result, data)
  File "/home/averie/name-entity-recognition/experiments/adaseq/adaseq/metrics/typing_metric.py", line 128, in add
    pred_results.append(one_hot_to_list(predicts[i][j]))
  File "/home/averie/name-entity-recognition/experiments/adaseq/adaseq/metrics/typing_metric.py", line 123, in one_hot_to_list
    id_list = set((np.where(in_tensor.detach().cpu() == 1)[0]))
AttributeError: 'set' object has no attribute 'detach'

Operating system

Ubuntu 22.04.2 LTS

Python version

3.10.6

Output of pip freeze

addict==2.4.0
aiohttp==3.8.4
aiosignal==1.3.1
aliyun-python-sdk-core==2.13.36
aliyun-python-sdk-kms==2.16.1
async-timeout==4.0.2
attrs==23.1.0
certifi==2023.5.7
cffi==1.15.1
charset-normalizer==3.1.0
cmake==3.26.3
crcmod==1.7
cryptography==41.0.1
datasets==2.8.0
dill==0.3.6
einops==0.6.1
filelock==3.12.0
frozenlist==1.3.3
fsspec==2023.5.0
gast==0.5.4
huggingface-hub==0.15.1
idna==3.4
Jinja2==3.1.2
jmespath==0.10.0
joblib==1.2.0
lit==16.0.5
MarkupSafe==2.1.2
modelscope==1.6.0
mpmath==1.3.0
multidict==6.0.4
multiprocess==0.70.14
networkx==3.1
numpy==1.22.0
nvidia-cublas-cu11==11.10.3.66
nvidia-cuda-cupti-cu11==11.7.101
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cudnn-cu11==8.5.0.96
nvidia-cufft-cu11==10.9.0.58
nvidia-curand-cu11==10.2.10.91
nvidia-cusolver-cu11==11.4.0.1
nvidia-cusparse-cu11==11.7.4.91
nvidia-nccl-cu11==2.14.3
nvidia-nvtx-cu11==11.7.91
oss2==2.18.0
packaging==23.1
pandas==1.5.3
Pillow==9.5.0
pyarrow==12.0.0
pycparser==2.21
pycryptodome==3.18.0
python-dateutil==2.8.2
pytz==2023.3
PyYAML==6.0
regex==2023.5.5
requests==2.31.0
responses==0.18.0
scikit-learn==1.2.2
scipy==1.10.1
seqeval==1.2.2
simplejson==3.19.1
six==1.16.0
sortedcontainers==2.4.0
sympy==1.12
threadpoolctl==3.1.0
tokenizers==0.13.3
tomli==2.0.1
torch==1.13.1
torchvision==0.14.1
tqdm==4.65.0
transformers==4.29.2
triton==2.0.0
typing_extensions==4.6.3
urllib3==2.0.2
xxhash==3.2.0
yapf==0.33.0
yarl==1.9.2

How to reproduce

python3 -m scripts.train -c examples/NPCRF/configs/ufet_concat_npcrf.yaml

Code of Conduct

  • I agree to follow this project's Code of Conduct

averieso added the bug label Jun 2, 2023

Collaborator

jeffchy commented Jun 3, 2023

Hi, could you please provide more details about this issue? e.g., the config file you run, the environments, and screenshots. These will help us find the problem, thanks.

Author

averieso commented Jun 5, 2023

Please see the config file below (only the source and target embedding paths were changed, following the instructions). Apart from the pip freeze output above, I'm not sure what information about the environment, or which screenshots, I should provide. Thanks.

config file:
experiment:
  exp_dir: experiments/
  exp_name: ufet
  seed: 17

task: entity-typing

dataset:
  data_file:
    train: 'https://www.modelscope.cn/api/v1/datasets/izhx404/ufet/repo/files?Revision=master&FilePath=train.json'
    valid: 'https://www.modelscope.cn/api/v1/datasets/izhx404/ufet/repo/files?Revision=master&FilePath=dev.json'
    test: 'https://www.modelscope.cn/api/v1/datasets/izhx404/ufet/repo/files?Revision=master&FilePath=test.json'
  tokenizer: blank
  lower: true
  labels: 'https://www.modelscope.cn/api/v1/datasets/izhx404/ufet/repo/files?Revision=master&FilePath=labels.txt'

preprocessor:
  type: multilabel-concat-typing-preprocessor
  model_dir: roberta-large
  max_length: 150

data_collator: MultiLabelConcatTypingDataCollatorWithPadding

model:
  type: multilabel-concat-typing-model
  embedder:
    model_name_or_path: roberta-large
    drop_special_tokens: false
  dropout: 0
  decoder:
    type: pairwise-crf
    label_emb_type: glove
    label_emb_dim: 300
    source_emb_file_path: None
    target_emb_dir: /home/averie/name-entity-recognition/experiments/adaseq/glove_embeds # TODO
    target_emb_name: glove.300.emb
    pairwise_factor: 70
    mfvi_iteration: 4
    two_potential: false
    sign_trick: true
  loss_function: WBCE
  pos_weight: 4

train:
  max_epochs: 30
  dataloader:
    batch_size_per_gpu: 4
  optimizer:
    type: AdamW
    lr: 2.0e-5
  lr_scheduler:
    type: cosine
    warmup_rate: 0.1 # when choose concat typing model, default to use cosine_linear_with_warmup
    options:
      by_epoch: false
  hooks:
    - type: "CheckpointHook"
      interval: 100
    - type: "BestCkptSaverHook"
      save_file_name: "best_model.pt"

evaluation:
  dataloader:
    batch_size_per_gpu: 32
  metrics: typing-metric

Collaborator

jeffchy commented Jun 5, 2023

Could you successfully run the default NPCRF example?

Author

averieso commented Jun 5, 2023

How do I run the default example? It requires PATH_TO_DIR to be replaced, which is what I did.

Collaborator

jeffchy commented Jun 5, 2023

decoder:
  type: pairwise-crf
  label_emb_type: glove
  label_emb_dim: 300
  source_emb_file_path: ${PATH_TO_DIR}/glove.6B.300d.txt
  target_emb_dir: ${PATH_TO_DIR}  # TODO
  target_emb_name: glove.300.emb
  pairwise_factor: 70
  mfvi_iteration: 4
  two_potential: false
  sign_trick: true

Your configuration appears to be incorrect; the block above shows the default configuration. The GloVe embeddings can be downloaded from the official Stanford website: https://nlp.stanford.edu/data/glove.6B.zip.
source_emb_file_path should be the absolute path to the source embedding file (for example, glove.6B.300d.txt), and target_emb_dir is the directory where you want to store the label embedding matrix, which is saved under the name given by target_emb_name.
In other words, the label embedding is preprocessed from ${YOUR_SRC_EMB_DIR}/glove.6B.300d.txt and saved to ${YOUR_TGT_EMB_SAVE_DIR}/glove.300.emb.
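
The path rules described above can be checked before launching a run. The following is a minimal sketch, not part of AdaSeq: the function name check_label_embedding_paths is hypothetical, and it assumes (as the NPCRF README suggests) that an already preprocessed file at target_emb_dir/target_emb_name is enough on its own. It also treats the literal string "None" as missing, because a bare None in YAML is parsed as a string rather than a null value.

```python
from pathlib import Path


def check_label_embedding_paths(decoder: dict) -> list[str]:
    """Return a list of problems with the decoder's embedding paths.

    Hypothetical helper for illustration only; an empty list means the
    paths look usable under the assumptions stated above.
    """
    problems = []
    source = decoder.get("source_emb_file_path")
    target_dir = decoder.get("target_emb_dir")
    target_name = decoder.get("target_emb_name")

    # YAML reads a bare `None` as the string "None", not as null.
    if isinstance(source, str) and source.strip().lower() in {"", "none"}:
        source = None

    if not target_dir or not target_name:
        problems.append("target_emb_dir and target_emb_name must both be set")
        return problems

    target = Path(target_dir) / target_name
    if target.is_file():
        # Assumption: a preprocessed label embedding file can be reused as-is.
        return problems

    if source is None:
        problems.append(
            f"{target} does not exist and no source_emb_file_path is given"
        )
    elif not Path(source).is_file():
        problems.append(f"source_emb_file_path is not a file: {source}")
    return problems


if __name__ == "__main__":
    decoder_cfg = {
        "source_emb_file_path": "None",
        "target_emb_dir": "/path/to/your/emb_dir",  # placeholder path
        "target_emb_name": "glove.300.emb",
    }
    for problem in check_label_embedding_paths(decoder_cfg):
        print("config problem:", problem)
```

Running it on the decoder section of a config prints one line per problem and nothing when either the preprocessed file or a valid source file is in place.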

Author

averieso commented Jun 5, 2023

Thank you for your answer. According to the README in the NPCRF directory:

"NPCRF requires static label embeddings, the preprocessed label embeddings (from GloVe for EN, Tencent for ZH) can be downloaded here: UFET, CFET, and you can place them in your folder and run the following config: (you need to reset your target_emb_dir in the config). Or you can provide the path of the glove embedding file (e.g., /path/to/your/glove.6B.300d.txt) and the code will generate label embedding for you."

So I cannot use the glove.300.emb file offered in this description?

Collaborator

jeffchy commented Jun 5, 2023

Could you please post a screenshot of the error message?
Also, can you run the model successfully when you create the embedding from the GloVe source?

Author

averieso commented Jun 5, 2023

Screenshot 2023-06-05 at 16 11 34

I just tried the GloVe source; it resulted in the same error (see screenshot).

Collaborator

jeffchy commented Jun 5, 2023

Oops, it seems the bug was introduced by the latest update to the typing metric in AdaSeq. Training and loading the label embeddings work without problems. As a quick fix, you can downgrade adaseq to 0.6.2 and modelscope to 1.4.2.
We will fix the bug later.
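
The traceback shows one_hot_to_list calling .detach() on a set, which suggests the metric is now handed sets of label ids where it previously received one-hot tensors. The sketch below is an assumption about what a tolerant helper could look like, not the actual AdaSeq fix: it accepts either a set of ids or a one-hot vector (a tensor-like object or a plain sequence) and returns a sorted list of ids, whereas the original helper returns a set.

```python
def one_hot_to_list(item):
    """Return the sorted label ids encoded by `item`.

    Illustrative stand-in only. `item` may be:
      * a set/frozenset of label ids (returned sorted), or
      * a one-hot vector: a tensor-like object with .detach()/.cpu()/.tolist(),
        or any plain sequence of 0/1 values.
    """
    if isinstance(item, (set, frozenset)):
        # Already a collection of ids; nothing to decode.
        return sorted(int(i) for i in item)
    if hasattr(item, "detach"):
        # torch.Tensor-like input: move to CPU and convert to a Python list.
        item = item.detach().cpu().tolist()
    # Positions holding a 1 are the predicted label ids.
    return [index for index, value in enumerate(item) if value == 1]


if __name__ == "__main__":
    print(one_hot_to_list({3, 1}))        # set of ids
    print(one_hot_to_list([0, 1, 0, 1]))  # one-hot sequence
```

Branching on the input type keeps callers that still pass one-hot tensors working while no longer failing on sets.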

Author

averieso commented Jun 6, 2023

Thanks for the reply. I ran pip install adaseq==0.6.2 and pip install modelscope==1.4.2 but am still getting the same error. Am I missing something?
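
One thing that may be worth checking: in the traceback, typing_metric.py is loaded from the local checkout (.../experiments/adaseq/adaseq/metrics/) rather than from site-packages. When a script is started with python3 -m from the repository root, the current directory comes first on the import path, so a pip-installed adaseq may not be the copy that actually runs. The sketch below uses only the standard library to report, for each package, the version pip has installed and the file Python would import; the helper name describe_package is hypothetical.

```python
from importlib import metadata, util


def describe_package(name: str) -> dict:
    """Report the installed distribution version and the import location.

    `version` is None when no distribution with that name is installed;
    `origin` is None when the module cannot be found on the import path.
    """
    try:
        version = metadata.version(name)
    except metadata.PackageNotFoundError:
        version = None
    spec = util.find_spec(name)
    origin = spec.origin if spec is not None else None
    return {"name": name, "version": version, "origin": origin}


if __name__ == "__main__":
    # Run from the same directory, and with the same interpreter,
    # that is used to launch training.
    for package in ("adaseq", "modelscope"):
        info = describe_package(package)
        print(f"{info['name']}: version={info['version']} origin={info['origin']}")
```

If the origin printed for adaseq points into the cloned repository instead of site-packages, the downgrade installed by pip is not what the training script imports; checking out the matching release in the clone, or running from outside it, would be the things to try.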
