Assertion Error cfg.mdl.mdl_name == "sf_base" when running main_dist.py #12

Open
yrf1 opened this issue Jul 2, 2021 · 7 comments

@yrf1

yrf1 commented Jul 2, 2021

I ran python main_dist.py "experiment_name" without specifying the --arg1 and --arg2 flags, and got:

  quant_noise_pq: 0
  quant_noise_pq_block_size: 8
  quant_noise_scalar: 0
  share_all_embeddings: False
  share_decoder_input_output_embed: False
  tie_adaptive_weights: False
uid: experiment_name
val_dl_name: valid
Traceback (most recent call last):
  File "main_dist.py", line 172, in <module>
    fire.Fire(main_dist)
  File "/shared/nas/data/users//data_preproc/VidSitu/vsitu_pyt/lib/python3.7/site-packages/fire/core.py", line 138, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/shared/nas/data/users/data_preproc/VidSitu/vsitu_pyt/lib/python3.7/site-packages/fire/core.py", line 468, in _Fire
    target=component.__name__)
  File "/shared/nas/data/users/data_preproc/VidSitu/vsitu_pyt/lib/python3.7/site-packages/fire/core.py", line 672, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "main_dist.py", line 160, in main_dist
    launch_job(cfg, init_method="tcp://localhost:9997", func=main_fn)
  File "/shared/nas/data/users/data_preproc/VidSitu/utils/trn_dist_utils.py", line 42, in launch_job
    func(cfg=cfg)
  File "main_dist.py", line 96, in main_fn
    learn = learner_init(uid, cfg)
  File "main_dist.py", line 34, in learner_init
    mdl_loss_eval = get_mdl_loss_eval(cfg)
  File "/shared/nas/data/users/data_preproc/VidSitu/vidsitu_code/mdl_selector.py", line 29, in get_mdl_loss_eval
    assert cfg.mdl.mdl_name == "sf_base"
AssertionError

So should we be using sf_base instead of the GPT model in the config?

Also, it would be helpful to have instructions showing how VidSitu can be applied to external data (for example, the data preprocessing steps and the changes to the commands for inference).

@yrf1 yrf1 closed this as completed Jul 2, 2021
@yrf1 yrf1 reopened this Jul 2, 2021
@yrf1 yrf1 changed the title from 'How to Specify "split_file_path", "vinfo_file_path", and "vsitu_ann_file_path" Path for Inference' to 'Assertion Error cfg.mdl.mdl_name == "sf_base" when running main_dist.py' on Jul 2, 2021
@TheShadow29
Owner

@yrf1 to run the gpt2 model you need the following command:

python main_dist.py <exp_name> --train.bs=... --train.bsv=... --task_type=vb_arg --mdl.mdl_name=new_gpt2_only 

In general, you can find the command used for a run inside its log by searching (Ctrl+F) for cmd.
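For logs too long to search by hand, the same lookup can be scripted. This is a minimal sketch, assuming the log is a plain-text file and that the launch command sits on a line containing the string cmd. The function name and the sample log line in the usage note are hypothetical and are not taken from the repo.

```python
from pathlib import Path
from typing import List


def find_cmd_lines(log_path: str, needle: str = "cmd") -> List[str]:
    """Return every line of the log file that contains `needle`."""
    # errors="replace" keeps the search going even if the log has odd bytes.
    text = Path(log_path).read_text(encoding="utf-8", errors="replace")
    return [line.strip() for line in text.splitlines() if needle in line]
```

Calling find_cmd_lines on a log file returns the matching lines in file order, so the first hit is usually the one to copy.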

Let me know if you face any other issues.

@yrf1
Author

yrf1 commented Jul 6, 2021

Thanks, I have some follow-up questions:

  1. How do the configs for calling a pretrained verb prediction model map to the checkpoints released in this repo? The model checkpoints in EXPTs.md are pytorch checkpoints, but when I point the configs in configs/vsitu_mdl_cfgs/Kinetics_c2_SLOWFAST_8x8_R50.yaml (which is referenced from configs/vsitu_cfg.yml) at those pytorch checkpoints, I run into checkpoint loading errors:
VidSitu/SlowFast/slowfast/utils/checkpoint.py", line 270, in load_checkpoint
    checkpoint["model_state"], model_state_dict_3d
KeyError: 'model_state'

I used the slow_fast_nl_r50_8x8 mdl_ep_10.pth.

I could not work this out from the log file corresponding to slow_fast_nl_r50_8x8 mdl_ep_10.pth: the "val" part of the log shows an empty checkpoint path, while the "train" part uses the caffe checkpoint, which I think you already converted to pytorch before release.
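A KeyError like the one above means the loaded checkpoint has no top-level "model_state" entry, so a quick diagnostic is to list the keys the file actually contains. The sketch below assumes the checkpoint loads as a plain dictionary. Only "model_state" comes from the traceback; the other candidate key names and both helper functions are hypothetical.

```python
from typing import Any, Mapping, Optional, Sequence

# "model_state" is the key from the KeyError above; the other names are
# guesses at common alternatives, not keys confirmed for these checkpoints.
CANDIDATE_KEYS = ("model_state", "model_state_dict", "state_dict")


def find_state_key(
    ckpt: Mapping[str, Any], candidates: Sequence[str] = CANDIDATE_KEYS
) -> Optional[str]:
    """Return the first candidate key present at the top level, else None."""
    for key in candidates:
        if key in ckpt:
            return key
    return None


def summarize(ckpt: Mapping[str, Any]) -> str:
    """One-line summary of the top-level keys and the likely weights key."""
    keys = sorted(str(k) for k in ckpt)
    return f"top-level keys: {keys}; weights under: {find_state_key(ckpt)}"


# Usage with a real file (requires PyTorch):
# import torch
# ckpt = torch.load("mdl_ep_10.pth", map_location="cpu")
# print(summarize(ckpt))
```

If the summary reports no weights key, the file was probably saved in a different layout from the one the SlowFast loader expects.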

@TheShadow29
Owner

@yrf1

So the config in configs/vsitu_mdl_cfgs/Kinetics_c2_SLOWFAST_8x8_R50.yaml is for the pre-trained SlowFast model trained on Kinetics.

But if you want to use one of our checkpoints, you should pass --train.resume='....' and --train.resume_path='/path/to/model'

Does that answer your question?
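The flags in this thread (--mdl.mdl_name, --train.resume_path) use dotted names, which suggests each one addresses a nested key in the config. The sketch below illustrates that idea on a plain dictionary. It is not the repo's parsing code, which is not shown in this thread, and apply_overrides is a hypothetical helper.

```python
from typing import Any, Dict, List


def apply_overrides(cfg: Dict[str, Any], overrides: List[str]) -> Dict[str, Any]:
    """Apply '--a.b=value' style overrides to a nested dict, in place."""
    for item in overrides:
        flag, _, value = item.partition("=")
        keys = flag.lstrip("-").split(".")
        node = cfg
        # Walk down to the parent of the final key, creating levels as needed.
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        # Values are kept as strings here; a real parser would convert types.
        node[keys[-1]] = value
    return cfg
```

For example, apply_overrides({"mdl": {}, "train": {}}, ["--mdl.mdl_name=new_gpt2_only", "--train.resume_path=/path/to/model"]) sets cfg["mdl"]["mdl_name"] and cfg["train"]["resume_path"], which is the same shape as the cfg.mdl.mdl_name lookup in the assertion.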

@yrf1
Author

yrf1 commented Jul 7, 2021

Hi @TheShadow29, thanks for the follow-up. I'm still trying to apply pretrained VidSitu to my own dataset. I tried running a command like python main_dist.py "experiment_name" --train.resume_path='weights/vbarg_sfastpret_txe_txd_18Feb21.pth' for the semantic role labeling task using pretrained models, but the code appears to read from a data/vsitu_vid_feats directory (line 569 in vidsitu_code/dat_loader.py).

Is this expected? If so, how should the video features be computed for running VidSitu on external data?

@TheShadow29
Owner

@yrf1 I see that I forgot to put up the feature extraction code. I will upload it by the end of the day. If you are in a hurry: it initializes the SFBase model, takes the head_out after the permute (https://github.com/TheShadow29/VidSitu/blob/main/vidsitu_code/mdl_sf_base.py#L195), and saves it to an npy file.
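The saving step described above can be sketched with NumPy. This is only an illustration, not the repo's feature extraction code: the function name, the one-file-per-video naming, and the float32 cast are assumptions, and the array passed in stands in for whatever head_out contains.

```python
from pathlib import Path

import numpy as np


def save_features(out_dir: str, vid_id: str, feats) -> Path:
    """Write one video's feature array to <out_dir>/<vid_id>.npy."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    path = out / f"{vid_id}.npy"
    # Assumption: features are stored as float32; the real dtype may differ.
    np.save(path, np.asarray(feats, dtype=np.float32))
    return path
```

A saved file can be read back with np.load(path), which returns an array with the same shape and values.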

@TheShadow29
Owner

@yrf1 The feature extraction code is up now: vidsitu_code/feat_extractor.py

Instructions are provided in DATA_PREP.md inside data/

Let me know if you face any issues.

@yrf1
Author

yrf1 commented Jul 7, 2021

@TheShadow29 thank you! I'll try it out right now
