Config for irrigation_scenes and custom SpatioTemporalDataset loader #13

Draft: wants to merge 7 commits into base: main

weiji14 (Member) commented Aug 3, 2023

An mmsegmentation configuration file for the irrigation_scenes dataset hosted at https://huggingface.co/datasets/ibm-nasa-geospatial/hls_irrigation_scenes.

As this is a time-series dataset with data from four months stored in four different folders, a custom SpatioTemporalDataset class (subclassed from GeospatialDataset) and a LoadSpatioTemporalImagesFromFile class (subclassed from LoadGeospatialImageFromFile) were created to perform the data loading. Training uses only the first 3 months (June, July, August) for now. Also updated fine-tuning-examples/README.md to mention how to run the irrigation_scenes setup.
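
To illustrate the one-folder-per-month layout described above, here is a minimal sketch of collecting the per-month file paths for a single scene in temporal order. This is not the code in this PR: the function name `monthly_image_paths`, the folder names, and the file name are all hypothetical and chosen only for illustration.

```python
from pathlib import Path
from typing import List, Sequence


def monthly_image_paths(
    data_root: str, month_dirs: Sequence[str], filename: str
) -> List[Path]:
    """Return one path per month folder for the same scene, in the order given."""
    return [Path(data_root) / month / filename for month in month_dirs]


# Hypothetical folder and file names; the real dataset layout may differ.
paths = monthly_image_paths(
    data_root="data/irrigation_scenes",
    month_dirs=["june", "july", "august"],  # first 3 of the 4 months
    filename="scene_0001.tif",
)
```

A loader could then read each path and stack the resulting arrays along a new time axis.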

Xref original work at https://github.com/NASA-IMPACT/hls-foundation/pull/30 and https://github.com/NASA-IMPACT/hls-foundation/pull/35

P.S. This is the same branch as #4, but that one got closed somehow during the private->public conversion of the repo.

Initial mmsegmentation configuration file for the irrigation_scenes dataset hosted at https://huggingface.co/datasets/ibm-nasa-geospatial/hls_irrigation_scenes. As this is a time-series dataset with data from four months stored in four different folders, a custom SpatioTemporalDataset class (subclassed from GeospatialDataset) and a LoadSpatioTemporalImagesFromFile class (subclassed from LoadGeospatialImageFromFile) were created to perform the data loading. Training uses only the first 3 months (June, July, August) for now. Also updated fine-tuning-examples/README.md to mention how to run the irrigation_scenes setup.
Config folder has moved from the fine-tuning-examples folder up to the root directory in 464e9f2/NASA-IMPACT#8, so no need to do `../` anymore.
The old open_tiff function used rasterio.open, which stacks the bands/channels in the first position (CHW), but moving to tifffile.imread in 86e9ba9 changed the stacking to the last position (HWC). The data needs to stay channel last (NHWC) for the RandomFlip function, since it is somewhat hardcoded to flip on axis 1, and TorchPermute is then used to change to channel first (NCHW) so that TorchNormalize (using torchvision, which expects BCHW) works.
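
A small NumPy sketch of why the axis order matters for the flip, assuming (as the message above describes) that the flip acts on axis 1 of a single-sample array. The toy array and variable names are illustrative only and do not come from the pipeline code.

```python
import numpy as np

# Toy image: height 2, width 3, 4 bands, stored channel-last (HWC).
hwc = np.arange(2 * 3 * 4).reshape(2, 3, 4)

# On an HWC array, axis 1 is the width, so this is a horizontal flip.
flipped_hwc = np.flip(hwc, axis=1)

# On a CHW array, axis 1 is the height, so the same call flips vertically.
chw = hwc.transpose(2, 0, 1)
flipped_chw_axis1 = np.flip(chw, axis=1)

# Flip while channel-last, then permute to channel-first for normalization.
flipped_then_chw = flipped_hwc.transpose(2, 0, 1)
```

Flipping first and permuting afterwards gives the same result as a width flip on the channel-first array, which is what the pipeline ordering relies on.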
Hacky way to avoid `KeyError: 'ann_info'` by setting `results["ann_info"]["seg_map"]` to `results["img_info"]["ann"]["seg_map"]`. Also edited docstring of the LoadGeospatialAnnotations class slightly. Cherry-picked from NASA-IMPACT/hls-foundation@e5fb7ab.
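
For reference, a minimal sketch of the kind of workaround described above, written against a plain dictionary rather than the actual LoadGeospatialAnnotations class. The helper name `ensure_ann_info` and the sample file name are hypothetical.

```python
def ensure_ann_info(results: dict) -> dict:
    """Copy seg_map to results["ann_info"] so code expecting that key finds it."""
    ann_info = results.setdefault("ann_info", {})
    # Only fill in seg_map if it is not already present.
    ann_info.setdefault("seg_map", results["img_info"]["ann"]["seg_map"])
    return results


# Hypothetical results dict shaped like the keys named in the commit message.
results = {"img_info": {"ann": {"seg_map": "scene_0001_mask.tif"}}}
ensure_ann_info(results)
```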
Making sure that the test_pipeline is consistent with the training and validation pipeline.

weiji14 (Member, Author) commented Aug 3, 2023

Getting a `TypeError: imgs must be a list, but got <class 'torch.Tensor'>` at the validation stage, raised in the forward_test function:

2023-08-03 17:03:11,629 - mmseg - INFO - workflow: [('train', 1)], max: 5000 iters
2023-08-03 17:03:11,629 - mmseg - INFO - Checkpoints will be saved to finetune_weights/irrigation_scenes/test_1/test_1 by HardDiskBackend.
2023-08-03 17:03:22,052 - mmcv - INFO - Reducer buckets have been rebuilt in this iteration.
2023-08-03 17:03:58,935 - mmseg - INFO - Iter [20/5000]	lr: 1.893e-07, eta: 3:15:17, time: 2.353, data_time: 0.046, memory: 6031, decode.loss_ce: 3.3195, decode.acc_seg: 6.3938, aux.loss_ce: 3.4039, aux.acc_seg: 1.7806, loss: 6.7234
[                                                  ] 0/281, elapsed: 0s, ETA:Traceback (most recent call last):
  File "/home/username/mambaforge/envs/hls/lib/python3.9/site-packages/mmseg/.mim/tools/train.py", line 242, in <module>
    main()
  File "/home/username/mambaforge/envs/hls/lib/python3.9/site-packages/mmseg/.mim/tools/train.py", line 231, in main
    train_segmentor(
  File "/home/username/mambaforge/envs/hls/lib/python3.9/site-packages/mmseg/apis/train.py", line 194, in train_segmentor
    runner.run(data_loaders, cfg.workflow)
  File "/home/username/mambaforge/envs/hls/lib/python3.9/site-packages/mmcv/runner/iter_based_runner.py", line 134, in run
    iter_runner(iter_loaders[i], **kwargs)
  File "/home/username/mambaforge/envs/hls/lib/python3.9/site-packages/mmcv/runner/iter_based_runner.py", line 67, in train
    self.call_hook('after_train_iter')
  File "/home/username/mambaforge/envs/hls/lib/python3.9/site-packages/mmcv/runner/base_runner.py", line 309, in call_hook
    getattr(hook, fn_name)(self)
  File "/home/username/mambaforge/envs/hls/lib/python3.9/site-packages/mmcv/runner/hooks/evaluation.py", line 262, in after_train_iter
    self._do_evaluate(runner)
  File "/home/username/mambaforge/envs/hls/lib/python3.9/site-packages/mmseg/core/evaluation/eval_hooks.py", line 117, in _do_evaluate
    results = multi_gpu_test(
  File "/home/username/mambaforge/envs/hls/lib/python3.9/site-packages/mmseg/apis/test.py", line 208, in multi_gpu_test
    result = model(return_loss=False, rescale=True, **data)
  File "/home/username/mambaforge/envs/hls/lib/python3.9/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/username/mambaforge/envs/hls/lib/python3.9/site-packages/torch/nn/parallel/distributed.py", line 619, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/home/username/mambaforge/envs/hls/lib/python3.9/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/username/mambaforge/envs/hls/lib/python3.9/site-packages/mmcv/runner/fp16_utils.py", line 110, in new_func
    return old_func(*args, **kwargs)
  File "/home/username/mambaforge/envs/hls/lib/python3.9/site-packages/mmseg/models/segmentors/base.py", line 110, in forward
    return self.forward_test(img, img_metas, **kwargs)
  File "/home/username/mambaforge/envs/hls/lib/python3.9/site-packages/mmseg/models/segmentors/base.py", line 74, in forward_test
    raise TypeError(f'{name} must be a list, but got '
TypeError: imgs must be a list, but got <class 'torch.Tensor'>

This is the same error reported before at https://github.com/NASA-IMPACT/hls-foundation/pull/30#issuecomment-1603652525, which was fixed there with some hacky workarounds to modify the default collate function in mmsegmentation's code here:

https://github.com/NASA-IMPACT/hls-foundation/blob/35edfb54057b18d2840b0e674277248797208b6f/mmsegmentation/mmseg/models/segmentors/base.py#L72-L75

It doesn't look possible to apply the same old workaround here anymore, so a different solution is needed. Xref upstream issue at open-mmlab/mmsegmentation#2410
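
To make the failure mode concrete, here is a stand-alone sketch of the list check that produces the error message in the traceback, plus one possible direction (wrapping a single batch in a list before it reaches the check). This is not mmsegmentation's code and not a confirmed fix for this PR; `check_test_inputs`, `as_list`, and the sample metadata are hypothetical.

```python
def check_test_inputs(imgs, img_metas):
    """Stand-in for the list check that raises in forward_test."""
    for var, name in [(imgs, "imgs"), (img_metas, "img_metas")]:
        if not isinstance(var, list):
            raise TypeError(f"{name} must be a list, but got {type(var)}")


def as_list(value):
    """Wrap a single batch in a list, leaving existing lists untouched."""
    return value if isinstance(value, list) else [value]


batch = object()  # placeholder for a torch.Tensor batch
metas = [{"filename": "scene_0001.tif"}]  # hypothetical metadata

# Passing the bare batch would raise TypeError; wrapping it first does not.
check_test_inputs(as_list(batch), as_list(metas))
```

Whether the wrapping belongs in the test pipeline, the collate step, or the model wrapper is an open question here.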
