
Dtype and scale conversion in V2 #7756

Closed · NicolasHug opened this issue Jul 24, 2023 · 3 comments · Fixed by #7759

NicolasHug (Member) commented Jul 24, 2023

Our UX for converting dtypes and scales in V2 is bad and error-prone.

In #7743 we have a sample with an Image and a Mask. We need to:

  • convert the image from uint8 to float and convert its scale from 0-255 to 0-1
  • convert the mask from uint8 to int64, without converting its scale

The only way I found to do that right now is:

    dtype = {
        (datapoints.Image if backend == "datapoint" else torch.Tensor): None,  # pass through
        datapoints.Mask: torch.int64,
    }
    transforms += [T.ToDtype(dtype=dtype)]              # converts the mask only
    transforms += [T.ConvertDtype(dtype=torch.float)]   # converts *and* scales the image

We should at the very least:

  • Clearly distinguish between ConvertDtype and ToDtype. It should be absolutely clear that ConvertDtype also scales the values. ConvertDtypeAndScale() is a decent candidate. Any new name is probably an improvement on the status quo.
  • Not error when the (datapoints.Image if backend == "datapoint" else torch.Tensor): None entry isn't passed. If I'm not converting a type, I shouldn't have to pass it as input.

Ideally I think we can get rid of ConvertDtype and just add a scale parameter to ToDtype() (a sketch follows the list below):

  • scale=False means no scaling happens
  • scale=True means all transformed inputs are scaled into the range specified by their dtype
  • scale=Image means only Image instances are scaled
  • scale=(Image, Mask) means only Image and Mask instances are scaled.
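
For illustration, a minimal sketch of what the unified transform could look like under this proposal; the scale parameter and its semantics are the suggestion above, not an implemented API:

    import torch
    from torchvision import datapoints
    from torchvision.transforms import v2 as T  # assuming the v2 namespace

    # One transform handles both the cast and the optional rescale:
    transform = T.ToDtype(
        dtype={datapoints.Image: torch.float32, datapoints.Mask: torch.int64},
        scale=True,  # proposed: rescale values, e.g. uint8 0-255 -> float 0-1
    )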
pmeier (Contributor) commented Jul 25, 2023

The only way I found to do that right now is:

The way we envisioned it, it works similarly to fill:

    PadIfSmaller(size=480, fill=defaultdict(lambda: 0, {datapoints.Mask: 255}))

For example:

    transforms.ToDtype(dtype=defaultdict(lambda: None, {datapoints.Mask: torch.int64}))
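
For readers less familiar with the idiom: the defaultdict supplies the fallback for every type that isn't listed explicitly. A minimal illustration of the lookup behavior:

    from collections import defaultdict

    import torch
    from torchvision import datapoints

    dtype = defaultdict(lambda: None, {datapoints.Mask: torch.int64})
    dtype[datapoints.Mask]   # torch.int64 -> the mask gets converted
    dtype[datapoints.Image]  # None        -> the image is passed through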
  • Clearly distinguish between ConvertDtype and ToDtype. It should be absolutely clear that ConvertDtype also scales the values. ConvertDtypeAndScale() is a decent candidate. Any new name is probably an improvement on the status quo.

👍

  • Not error when the (datapoints.Image if backend == "datapoint" else torch.Tensor): None entry isn't passed. If I'm not converting a type, I shouldn't have to pass it as input.

IIRC, we did that to avoid silently failing to convert something. Like I stated above, our vision was for users to either specify everything or use a defaultdict to indicate that only a few explicit types should be transformed. Fully agree that this is not properly documented right now.

Ideally I think we can get rid of ConvertDtype and just add a scale parameter to ToDtype():

  • scale=False means no scaling happens
  • scale=True means all transformed inputs are scaled into the range specified by their dtype
  • scale=Image means only Image instances are scaled
  • scale=(Image, Mask) means only Image and Mask instances are scaled.

In general, I'm on board with this change. However, I'm not sure we need the extra options besides the boolean flag. We only have logic to scale images and videos, and I can't think of a way to scale the other datapoints. Like, what would a scaled mask even be? Maybe that is applicable to bool-ish detection masks, but certainly not to segmentation masks.

Thus, I would just use scale: bool = False as the default and, if True, always apply scaling to all images and videos that we find.

pmeier (Contributor) commented Jul 25, 2023

Does that mean we also create a new dispatcher called to_dtype with a scale parameter that "replaces" the old convert_dtype? Right now the former doesn't exist and we simply call .to on the input:

    def _transform(self, inpt: Any, params: Dict[str, Any]) -> Any:
        dtype = self.dtype[type(inpt)]
        if dtype is None:
            return inpt
        return inpt.to(dtype=dtype)
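
A minimal sketch of what such a to_dtype dispatcher could look like, assuming the usual uint8/float scaling conventions; the name, signature, and scaling rules here are illustrative, not the final API:

    import torch

    def to_dtype(inpt: torch.Tensor, dtype: torch.dtype, scale: bool = False) -> torch.Tensor:
        # Without scaling (or with a no-op cast), defer to plain .to().
        if not scale or inpt.dtype == dtype:
            return inpt.to(dtype=dtype)
        if inpt.dtype == torch.uint8 and dtype.is_floating_point:
            # uint8 0-255 -> float 0-1
            return inpt.to(dtype=dtype) / 255.0
        if inpt.dtype.is_floating_point and dtype == torch.uint8:
            # float 0-1 -> uint8 0-255
            return (inpt * 255.0).round().to(dtype=dtype)
        # Other dtype pairs would need their own scaling rules.
        return inpt.to(dtype=dtype)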

NicolasHug (Member, Author) commented:

Thus, I would just use scale: bool = False as default and if True always apply to all images and videos that we find.

Sounds fine, OK

    fill=defaultdict(lambda: 0, {datapoints.Mask: 255})

FWIW I find this to be bad UX, and it's really difficult for a non-expert user to understand what's going on. There's just too much to unpack. We should have a simpler way to let users specify "for all the rest, use 0". Even just defaultdict(lambda: None) is reasonably advanced Python, and we shouldn't expect our users to reach for that kind of idiom just to do the most common tasks of every pre-processing pipeline. Just imagine yourself in front of a class of 50 students, teaching how to train a model and having to unpack this line; I don't want to be in those shoes.

I think we should either pass through non-specified inputs OR have a much, much easier way of specifying what happens to "all the rest". Happy to hear suggestions for the latter. I feel like allowing dtype={datapoints.Mask: torch.int64, "others": None} could already be an improvement.
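
The two spellings side by side (note that the "others" key is only a suggestion here, not an implemented feature):

    from collections import defaultdict

    import torch
    from torchvision import datapoints

    # Current idiom: the defaultdict supplies the "everything else" rule.
    dtype = defaultdict(lambda: None, {datapoints.Mask: torch.int64})

    # Suggested alternative (hypothetical): a plain dict with a reserved
    # "others" key meaning "pass everything else through unchanged".
    dtype = {datapoints.Mask: torch.int64, "others": None}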
