
[Bug] Error when trying to replicate the tutorial for pre-training with custom dataset. #1919

Open
siddhi-wiai opened this issue Jul 9, 2024 · 1 comment

Comments

@siddhi-wiai

Branch

main branch (mmpretrain version)

Describe the bug

I followed the exact steps in the tutorial for pre-training MAE on a custom dataset, but I get the following error:

File "/home/XXX/code_siddhi/mmpretrain/mmpretrain/models/utils/data_preprocessor.py", line 261, in
_input[:, [2, 1, 0], ...] for _input in batch_inputs
TypeError: string indices must be integers
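
The message itself gives a hint about what reached the preprocessor. As a minimal sketch (not mmpretrain code), the snippet below applies the same indexing expression from the traceback to a plain string and gets the same TypeError. This assumes that at least one element of batch_inputs was a str (for instance a file path or a dict key) instead of an image tensor; the function name and the file name are hypothetical.

```python
def swap_channels(batch_inputs):
    # Same expression as in the traceback: reorder the channel axis
    # (e.g. BGR <-> RGB). This only works on tensor-like objects.
    return [_input[:, [2, 1, 0], ...] for _input in batch_inputs]


error_message = None
try:
    # Hypothetical input: a string where an image tensor was expected.
    swap_channels(['img_001.jpg'])
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

If this reading is right, the images were never loaded and packed into tensors before reaching the data preprocessor, which would point at the dataset pipeline and not at cast_data() itself.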

Environment

{'sys.platform': 'linux',
'Python': '3.8.19 (default, Mar 20 2024, 19:58:24) [GCC 11.2.0]',
'CUDA available': True,
'MUSA available': False,
'numpy_random_seed': 2147483648,
'GPU 0': 'NVIDIA L4',
'CUDA_HOME': '/usr/local/cuda',
'NVCC': 'Cuda compilation tools, release 12.3, V12.3.107',
'GCC': 'gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0',
'PyTorch': '1.10.1',
'TorchVision': '0.11.2',
'OpenCV': '4.10.0',
'MMEngine': '0.10.4',
'MMCV': '2.0.1',
'MMPreTrain': '1.2.0+17a886c'}

Other information

  • I did not make any extra changes on my side.
  • This looks like an issue with cast_data() from mmengine.model.BaseDataPreprocessor.
@keiohta

keiohta commented Aug 6, 2024

Hi @siddhi-wiai, I ran into the same issue and found the following solution:

  1. Specify train_pipeline explicitly (I just copied it from the base dataset config).
  2. Add _delete_=True to the dataset dict so that the inherited `split` keyword is removed, which avoids an error.
# >>>>>>>>>>>>>>> Override dataset settings here >>>>>>>>>>>>>>>>>>>
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='RandomResizedCrop',
        scale=224,
        crop_ratio_range=(0.2, 1.0),
        backend='pillow',
        interpolation='bicubic'),
    dict(type='RandomFlip', prob=0.5),
    dict(type='PackInputs')
]
train_dataloader = dict(
    batch_size=128,
    dataset=dict(
        type='CustomDataset',
        data_root='data/custom_dataset/',
        ann_file='',  # We assume you are using the sub-folder format without ann_file
        data_prefix='',  # The `data_root` is the data_prefix directly.
        with_label=False,
        _delete_=True,  # Need to remove `split` keyword
        pipeline=train_pipeline  # Need to specify pipeline
    )
)
# <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
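
Why both steps are needed together can be illustrated with a small, self-contained sketch of how a `_delete_` flag behaves during config inheritance. This is a simplified stand-in written for illustration, not MMEngine's actual merge code; `merge_config` and the base values below are hypothetical. The idea is that a child dict marked with `_delete_=True` replaces the inherited dict entirely, so keys such as `split` disappear, but so does the inherited `pipeline`, which then has to be given again.

```python
from copy import deepcopy

DELETE_KEY = '_delete_'


def merge_config(child, base):
    """Merge `child` into a copy of `base` (illustrative, not MMEngine's code)."""
    merged = deepcopy(base)
    for key, value in child.items():
        if isinstance(value, dict):
            value = dict(value)  # shallow copy so the caller's dict is untouched
            drop_base = value.pop(DELETE_KEY, False)
            if not drop_base and isinstance(merged.get(key), dict):
                # Normal inheritance: keep base keys, override with child keys.
                merged[key] = merge_config(value, merged[key])
            else:
                # `_delete_=True`: ignore everything inherited for this key.
                merged[key] = merge_config(value, {})
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical base config with a `split` keyword and a pipeline.
base = dict(
    train_dataloader=dict(
        batch_size=512,
        dataset=dict(type='ImageNet', split='train', pipeline=['load', 'pack'])))

override = dict(type='CustomDataset', with_label=False)

without_delete = merge_config(
    dict(train_dataloader=dict(dataset=dict(override))), base)
with_delete = merge_config(
    dict(train_dataloader=dict(dataset=dict(override, _delete_=True))), base)

print(sorted(without_delete['train_dataloader']['dataset']))
print(sorted(with_delete['train_dataloader']['dataset']))
```

In this sketch the merge without `_delete_` still carries `split` into the custom dataset, while the merge with `_delete_=True` drops both `split` and `pipeline`, which matches the two comments in the config above.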
