When I trained 11G S3DIS, there was an error #139

Avril-Dragon · 2024-07-16T12:52:45Z

Traceback (most recent call last):
  File "/media/wcj/A4D4C4CFD4C4A4C01/zl/superpoint_transformer-master/src/utils/utils.py", line 45, in wrap
    metric_dict, object_dict = task_func(cfg=cfg)
  File "src/train.py", line 115, in train
    trainer.fit(model=model, datamodule=datamodule, ckpt_path=cfg.get("ckpt_path"))
  File "/home/wcj/anaconda3/envs/spt7/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 543, in fit
    call._call_and_handle_interrupt(
  File "/home/wcj/anaconda3/envs/spt7/lib/python3.8/site-packages/pytorch_lightning/trainer/call.py", line 44, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/home/wcj/anaconda3/envs/spt7/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 579, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/home/wcj/anaconda3/envs/spt7/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 986, in _run
    results = self._run_stage()
  File "/home/wcj/anaconda3/envs/spt7/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1028, in _run_stage
    self._run_sanity_check()
  File "/home/wcj/anaconda3/envs/spt7/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1057, in _run_sanity_check
    val_loop.run()
  File "/home/wcj/anaconda3/envs/spt7/lib/python3.8/site-packages/pytorch_lightning/loops/utilities.py", line 182, in _decorator
    return loop_run(self, *args, **kwargs)
  File "/home/wcj/anaconda3/envs/spt7/lib/python3.8/site-packages/pytorch_lightning/loops/evaluation_loop.py", line 135, in run
    self._evaluation_step(batch, batch_idx, dataloader_idx, dataloader_iter)
  File "/home/wcj/anaconda3/envs/spt7/lib/python3.8/site-packages/pytorch_lightning/loops/evaluation_loop.py", line 370, in _evaluation_step
    batch = call._call_strategy_hook(trainer, "batch_to_device", batch, dataloader_idx=dataloader_idx)
  File "/home/wcj/anaconda3/envs/spt7/lib/python3.8/site-packages/pytorch_lightning/trainer/call.py", line 311, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/home/wcj/anaconda3/envs/spt7/lib/python3.8/site-packages/pytorch_lightning/strategies/strategy.py", line 277, in batch_to_device
    return model._apply_batch_transfer_handler(batch, device=device, dataloader_idx=dataloader_idx)
  File "/home/wcj/anaconda3/envs/spt7/lib/python3.8/site-packages/pytorch_lightning/core/module.py", line 359, in _apply_batch_transfer_handler
    batch = self._call_batch_hook("on_after_batch_transfer", batch, dataloader_idx)
  File "/home/wcj/anaconda3/envs/spt7/lib/python3.8/site-packages/pytorch_lightning/core/module.py", line 347, in _call_batch_hook
    return trainer_method(trainer, hook_name, *args)
  File "/home/wcj/anaconda3/envs/spt7/lib/python3.8/site-packages/pytorch_lightning/trainer/call.py", line 181, in _call_lightning_datamodule_hook
    return fn(*args, **kwargs)
  File "/home/wcj/anaconda3/envs/spt7/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/media/wcj/A4D4C4CFD4C4A4C01/zl/superpoint_transformer-master/src/datamodules/base.py", line 333, in on_after_batch_transfer
    return on_device_transform(nag)
  File "/home/wcj/anaconda3/envs/spt7/lib/python3.8/site-packages/torch_geometric/transforms/compose.py", line 24, in __call__
    data = transform(data)
  File "/media/wcj/A4D4C4CFD4C4A4C01/zl/superpoint_transformer-master/src/transforms/transforms.py", line 23, in __call__
    return self._process(x)
  File "/media/wcj/A4D4C4CFD4C4A4C01/zl/superpoint_transformer-master/src/transforms/graph.py", line 1359, in _process
    nag[i_level].node_size = nag.get_sub_size(i_level, low=self.low)
  File "/media/wcj/A4D4C4CFD4C4A4C01/zl/superpoint_transformer-master/src/data/nag.py", line 58, in get_sub_size
    sub_sizes = self[low + 1].sub.sizes
AttributeError: 'list' object has no attribute 'sizes'

I printed the related values in nag.py like this:

print(self[low+1])
print(self[low + 1].sub)
print(type(self[low + 1].sub))

log_size=[19731, 1], log_surface=[19731, 1], log_volume=[19731, 1], normal=[19731, 3], super_index=[19731],
sub=[1], batch=[19731], ptr=[2])
[Cluster(num_clusters=19731, num_points=660524, device=cuda:0)]
<class 'list'>

Thanks for any help you can provide.


drprojects · 2024-07-16T22:12:14Z

It seems self[low + 1].sub is a List(Cluster) instead of simply being a Cluster. This is the first time I have seen this issue, and I am not sure yet how it appeared. Have you made any modification to the code, even a minor one? Can you please share the exact bash command you are running?

If you ❤️ or use this project, don't forget to give it a ⭐, it means a lot to us !

Avril-Dragon · 2024-07-17T14:58:06Z

To make sure I had not made any modifications, I re-extracted the ZIP and only copied the dataset in. When running, I often hit a PIPE LINE ERROR because of insufficient memory, but after re-running I successfully generated the S3DIS data, and then I encountered the error above.
By the way, I found a WARN during processing.

Is there any problem with my approach?

va-kiet · 2024-07-17T15:18:38Z

I have the same problem when training on the DALES dataset without any modification to the code. The only difference is that I ran in a Python venv instead of a conda environment (but I don't think it really matters). Here are the logs I got:

output.log

and when I print self[low + 1].sub, the output is: [Cluster(num_clusters=42880, num_points=1324840, device=cuda:0), Cluster(num_clusters=35140, num_points=1080737, device=cuda:0), Cluster(num_clusters=30092, num_points=959261, device=cuda:0), Cluster(num_clusters=36576, num_points=1147082, device=cuda:0)]

I guess the issue lies in how data is packed into batches. In my case, batch_size was 4, which produced a batch of 4 Clusters held in a plain List. The whole List was then passed to the transform instead of a single Cluster, which may be the cause of this problem.

va-kiet · 2024-07-17T17:57:37Z

I solved this issue by editing line 933 of src/data/data.py: deleting "and isinstance(batch.sub, Cluster)" makes it work. After checking the previous version of this file, I found that isinstance(batch.sub, Cluster) always returns False in this condition, so the batch stays stuck as a List instead of being converted to a ClusterBatch.
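
For anyone hitting the same error, here is a minimal sketch of the failure mode described above. It is not the project's code: Cluster and ClusterBatch are toy stand-ins that only carry a sizes list, and collate_buggy / collate_fixed are hypothetical names. The assumption is that collation leaves sub as a plain Python list, so a guard that tests the list against Cluster can never pass.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cluster:
    """Toy stand-in for the project's Cluster: only stores per-cluster sizes."""
    sizes: List[int]


class ClusterBatch(Cluster):
    """Toy stand-in for a batched Cluster built from several Cluster objects."""

    @classmethod
    def from_list(cls, clusters):
        # Concatenate the sizes of every Cluster in the list
        return cls(sizes=[s for c in clusters for s in c.sizes])


def collate_buggy(subs):
    # Assumed behavior: collation keeps `sub` as a plain Python list
    sub = list(subs)
    # The guard tests the list itself, so it is never True and `sub` stays a list
    if isinstance(sub, Cluster):
        sub = ClusterBatch.from_list(sub)
    return sub


def collate_fixed(subs):
    sub = list(subs)
    # Without the guard, the list is always merged into one ClusterBatch
    return ClusterBatch.from_list(sub)


subs = [Cluster(sizes=[3, 2]), Cluster(sizes=[4])]
buggy = collate_buggy(subs)
fixed = collate_fixed(subs)
print(type(buggy).__name__, hasattr(buggy, "sizes"))
print(type(fixed).__name__, fixed.sizes)
```

In the buggy path the result is still a list, so any later access to .sizes raises the AttributeError from the traceback. In the fixed path the list is merged into a single object that exposes .sizes.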

Avril-Dragon · 2024-07-18T10:23:50Z

It works! Thanks!

drprojects · 2024-07-19T12:42:23Z

Good catch @va-kiet ! There was indeed an error there, since the PyG behavior of Batch.from_data_list() would return a List(Cluster) by default. Your fix was the correct one, I integrated this in the latest commit.

Avril-Dragon closed this as completed Jul 18, 2024

gitKincses mentioned this issue Jul 19, 2024

AttributeError: 'list' object has no attribute 'sizes' #141

Closed

drprojects added a commit that referenced this issue Jul 19, 2024

fix for issues #139 #141

1bafb71

drprojects added a commit to vschelbi/superpoint_transformer_vschelbi that referenced this issue Aug 7, 2024

fix for issues drprojects#139 drprojects#141

1aed619
