Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Normalisation for RGB imshow #1819

Merged
merged 2 commits into from
Jan 19, 2018
Merged

Normalisation for RGB imshow #1819

merged 2 commits into from
Jan 19, 2018

Conversation

Zac-HD
Copy link
Contributor

@Zac-HD Zac-HD commented Jan 11, 2018

Follow-up to #1796, where normalisation and clipping of RGB[A] values were deferred so that we could match any upstream API. matplotlib/matplotlib#10220 implements clipping to the valid range, but a strong consensus against RGB normalisation in matplotlib has emerged.

This pull therefore implements normalisation, and clips values only where our normalisation has pushed them out of range.

@Zac-HD
Copy link
Contributor Author

Zac-HD commented Jan 11, 2018

CC @shoyer - hopefully I'm in time for Xarray 0.10.1 😉

@jhamman jhamman mentioned this pull request Jan 11, 2018
5 tasks
q=[ROBUST_PERCENTILE, 1 - ROBUST_PERCENTILE])
vmin = vmin if vmin is not None else vmin_tmp
vmax = vmax if vmax is not None else vmax_tmp
del vmin_tmp, vmax__tmp
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo. vmax_tmp

if vmin is not None or vmax is not None:
vmin = vmin if vmin is not None else darray.min()
vmax = vmax if vmax is not None else darray.max()
darray = (darray.astype(np.float32) - vmin) / (vmax - vmin)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We ran into dynamic range issues w/ uint32 being converted to float32, particularly as some people use an extreme high value as a bad-data flag. matplotlib/matplotlib#10133

@@ -625,6 +641,10 @@ def imshow(x, y, z, ax, **kwargs):
dimension can be interpreted as RGB or RGBA color channels and
allows this dimension to be specified via the kwarg ``rgb=``.

Unlike matplotlib, Xarray can apply ``vmin`` and ``vmax`` to RGB or RGBA
data, by applying a single scaling factor and offset to all bands.
Passing ``robust=True`` infers ``vmin`` and ``vmax`` in the usual way.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... infers vmin and vmax from the 96-th percentile of the data distribution. (Or reference the “usual way” somehow).

Copy link
Member

@jhamman jhamman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks for this. Can you add an additional note to the whats new page in the docs?

vmin = vmin if vmin is not None else vmin_tmp
vmax = vmax if vmax is not None else vmax_tmp
del vmin_tmp, vmax__tmp
robust=False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 nits: 1) let garbage collection handle deleting your temp variables, 2) space in robust = False

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we use locals() somewhere below, which is why this is necessary (but yes it's pretty ugly)

# Test for crash when normalising RGB data. (further tests require a
# reference image, as the args used to be silently ignored instead)
da = easy_array((5, 5, 3), start=-0.6, stop=1.4)
da.plot.imshow(**kwds)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can test that the range of the values in this plot are correct.

vmin_tmp, vmax_tmp = darray.quantile(
q=[ROBUST_PERCENTILE, 1 - ROBUST_PERCENTILE])
vmin = vmin if vmin is not None else vmin_tmp
vmax = vmax if vmax is not None else vmax_tmp
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps we should skip the quantile calculation when both vmin/vmax are set. Then again, if robust is True, the behavior is poorly defined and maybe an error would be better.

@Zac-HD Zac-HD force-pushed the rgb-maps branch 5 times, most recently from 007ebff to 4d3a1b7 Compare January 12, 2018 13:51
vmin, vmax = None, None
# There's a cyclic dependency via DataArray, so we can't
# import xarray.ufuncs in global or outer scope.
import xarray.ufuncs as xu
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should be able to fix this.

if vmin is not None or vmax is not None:
vmin = vmin if vmin is not None else darray.min()
vmax = vmax if vmax is not None else darray.max()
darray = darray.astype('f4' if vmax - vmin < 2 ** 32 else 'f8')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can use np.finfo('f4').max here instead of 2**32.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it was uint32 that gave us trouble on Matplotlib. i.e. way less than f4.max (I think?). See matplotlib/matplotlib#10133. I was too lazy to figure out the actual limit (I guess its 2^N, where N is how many bits a float has dynamic range over. I think its 32-1-8 = 2**23?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you can use np.iinfo instead:

Out[12]: iinfo(min=0, max=4294967295, dtype=uint32)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, but thats not the problem. The problem is that if you convert (a-vmin)/(1e9) then you can't tell the difference between a=1 and a=2 anymore if you only use float32.

This came up in Matplotlib because the conversion to float happened with an automatic vmin and vmax, and the user had a large value as a bad-data flag. Then the user subsequently set vmin - vmax = 10 or so, but the convert to float32 had erased all the dynamic between 0 and 10.

Maybe xarray doesn't have this problem because you trust the user to set vmin and vmax during the imshow call. But if someone grabs the image handle and tries to set the limits after the call they could have this problem.

Copy link
Contributor Author

@Zac-HD Zac-HD Jan 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The more deeply I analyse matplotlib/matplotlib#10133, the more suspicious I get - there are actually two places you could get loss of precision: subtracting vmin (if <<0, loss of precision for high values), and dividing by (vmin - vmax) (loss if vmax is large).

I think for Xarray, the best option is actually just to use np.float64 in all cases. We already have a habit of upcasting all the time anyway*, and if that's lossy you're in trouble anyway.

* (including uint16 to float64 when decoding variables from a file, which is silly - issue incoming for that)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

vmin and A are still ints when vmin is removed from A.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In which case you are vulnerable to underflow, and hence wraparound of values - that's why I cast to float64 here first!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I follow.

The case we were running into was

A = np.array( [[1, 2, 1e9], [3, 4, 5]], dtype=np.uint32)
im = plt.imshow(A)
im.set_clim([0., 5.])

When the data in the image is first input, vmin = 1 and vmax = 1e9, and the underflow occurred when we cast to np.float32, which we did to help accommodate very large images. Hence the check on vmax - vmin to see if it is larger than 1e8 and to use np.float64 for the case instead.

Casting to float64 first is great but apparently people sometimes have huge images, hence the extra rigamarole w/ using float32 if possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

vmin and A are still ints when vmin is removed from A.

Going and reading the code again, it's not - a_min is subtracted from A_scaled, which has a floating dtype. In which case you're also safe from overflow/underflow, and have just added handling for loss of precision 😄

I remain happy with this solution - for Xarray, including passing non-RGB images directly to matplotlib - and suggest we take any further discussion to the upstream pull.

@Zac-HD
Copy link
Contributor Author

Zac-HD commented Jan 14, 2018

@jhamman - all done 😄

# There's a cyclic dependency via DataArray, so we can't
# import xarray.ufuncs in global or outer scope.
import xarray.ufuncs as xu
darray = xu.minimum(xu.maximum(darray.astype('f4'), 0), 1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to type cast this back to a 4byte float?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shoyer - do you have a solution off had for the cyclic import problem here?

Copy link
Contributor Author

@Zac-HD Zac-HD Jan 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to downcast, in that it will work even if we don't, but downcasting ~halves memory use further inside matplotlib which can be a 5-10x multiple of the input data size. Happy to remove if you don't think the performance hit is worth it.

The best alternative solution I can see is refactoring xarray.ufuncs to do all the imports inside a function or class, and pay the performance cost when creating ufuncs and slightly increased overhead in dispatch. Personally, I prefer a slight slowdown when creating scaled RGB images - much less common, and it's probably negligible given the array operations involved anyway.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with downcasting, let's just add a comment to explain why.

I don't think we can avoid the cycle import issue. This is a downside of our method heavy design. However, it would be nice if we moved the import statement to the top of the method.

@Zac-HD
Copy link
Contributor Author

Zac-HD commented Jan 17, 2018

Ping @shoyer, @jhamman - I think we're out of blockers and ready to merge 😄

# Scale interval [vmin .. vmax] to [0 .. 1] and clip to bounds
if vmin is not None or vmax is not None:
vmin = vmin if vmin is not None else darray.min()
vmax = vmax if vmax is not None else darray.max()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the default here for vmin/vmax be 0/1 rather than the min/max values of the array? That would seem a little less surprising to me. Otherwise setting one of vmin/vmax would cause the other to be automatically set which seems strange.

# There's a cyclic dependency via DataArray, so we can't
# import xarray.ufuncs in global or outer scope.
import xarray.ufuncs as xu
darray = xu.minimum(xu.maximum(darray.astype('f4'), 0), 1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with downcasting, let's just add a comment to explain why.

I don't think we can avoid the cycle import issue. This is a downside of our method heavy design. However, it would be nice if we moved the import statement to the top of the method.

@@ -449,10 +449,28 @@ def newplotfunc(darray, x=None, y=None, figsize=None, size=None,
if imshow_rgb:
# Don't add a colorbar when showing an image with explicit colors
add_colorbar = False
# Calculate vmin and vmax automatically for `robust=True`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move this logic to a separate helper function, e.g., _rescale_imshow_rgb(darray, vmin, vmax, robust)? This method is already getting pretty long.

@Zac-HD
Copy link
Contributor Author

Zac-HD commented Jan 18, 2018

Method extracted, default for missing bounds fixed, comment added, import moved. Also added a check that you haven't accidentally reversed the bounds while supplying only one.

elif vmax is None:
vmax = 255 if np.issubdtype(darray.dtype, np.integer) else 1
if vmax < vmin:
raise ValueError(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These error checks look great, thanks! Can you add a test that covers them?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 😄

Copy link
Member

@shoyer shoyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. @jhamman any further concerns?

@Zac-HD
Copy link
Contributor Author

Zac-HD commented Jan 19, 2018

Ping @jhamman?

@jhamman jhamman merged commit 6aa225f into pydata:master Jan 19, 2018
@jhamman
Copy link
Member

jhamman commented Jan 19, 2018

In it goes. Thanks @Zac-HD.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants