
Added support for HDF5 dataset and an HDF5 creation tool #1468

Open
wants to merge 4 commits into main
Conversation

@madisi98 commented Oct 4, 2020

Added a retinanet-build-hdf5 entry point, which allows creating datasets in the HDF5 format, and a new 'hdf5' option for retinanet-train. This allows the whole dataset to be kept in main memory, which drastically reduces training time.
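The idea can be sketched roughly like this (a minimal h5py illustration, not the PR's actual code; the group layout, dataset names, and placeholder data are assumptions):

```python
import h5py
import numpy as np

# Assumed layout: one HDF5 group per image, holding the decoded image
# plus its annotations, so no image decoding is needed at training time.
images = [np.zeros((200, 300, 3), dtype=np.uint8)]           # placeholder data
bboxes = [np.array([[10, 20, 110, 120]], dtype=np.float64)]  # x1, y1, x2, y2
labels = [np.array([0], dtype=np.int64)]

with h5py.File('dataset.h5', 'w') as f:
    for i, (img, box, lab) in enumerate(zip(images, bboxes, labels)):
        grp = f.create_group(str(i))
        grp.create_dataset('image', data=img, compression='gzip')
        grp.create_dataset('bboxes', data=box)
        grp.create_dataset('labels', data=lab)

# At training time everything can be read back once and kept in memory:
with h5py.File('dataset.h5', 'r') as f:
    image = f['0/image'][()]
    annotations = {'bboxes': f['0/bboxes'][()], 'labels': f['0/labels'][()]}
```

The win comes from paying the decode cost once at file-creation time instead of once per epoch.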


import h5py
import numpy as np
from tqdm import tqdm
Contributor

There are other scripts using progressbar2 instead of tqdm. I think we should choose one of them.

Author

Agreed.
I've already changed and pushed it :)

Contributor

@hgaiser left a comment

Very nice, I've been meaning to try something like this for a while.

Do you have measurements in general for how much time is gained by using the HDF5 format?

return {'labels': self.labels[image_index],
        'bboxes': self.bboxes[image_index]}

def compute_input_output(self, group):
Contributor

You override this to remove the filtering, right? Does it have a large computational impact? I'd expect it to be minimal, in which case it would be cleaner to not override this function. Do you have a measurement for this?

Author

Yes, I removed the filtering because it already happens when creating the HDF5 file. That process relies on the CSVGenerator class, which filters the annotations, so filtering again here seemed redundant.
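The point being made is that invalid boxes can be dropped once at file-creation time, so the generator never has to re-filter on every epoch. A minimal sketch of that idea in plain NumPy (the degenerate-box condition shown is only an illustrative assumption; keras-retinanet's actual filtering also checks image bounds):

```python
import numpy as np

def filter_invalid_boxes(bboxes, labels):
    """Drop degenerate boxes (x2 <= x1 or y2 <= y1) before writing to HDF5."""
    keep = (bboxes[:, 2] > bboxes[:, 0]) & (bboxes[:, 3] > bboxes[:, 1])
    return bboxes[keep], labels[keep]

bboxes = np.array([[10, 20, 110, 120],   # valid box
                   [50, 50, 50, 90]],    # zero width -> dropped
                  dtype=np.float64)
labels = np.array([0, 1], dtype=np.int64)

clean_boxes, clean_labels = filter_invalid_boxes(bboxes, labels)
# The HDF5 file would store only clean_boxes/clean_labels, so the
# generator can return annotations directly without re-filtering.
```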

@madisi98 (Author) commented Oct 6, 2020

I haven't done much testing yet, since I haven't had the time for it, but I was getting roughly 6x faster epochs. I should note that my dataset consists of large images (around 3000x2000); with smaller images the speedup will be less significant.

@hgaiser (Contributor) commented Oct 6, 2020

I would be interested to see the differences on a more "normal" dataset like COCO. I expect the difference will be much smaller there because, AFAIK, most of the time there is spent on anchor target generation, not data loading.

@hsahin hsahin changed the base branch from master to main June 17, 2021 13:43