Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support RGB[A] arrays in plot.imshow() #1796

Merged
merged 4 commits into from
Jan 11, 2018
Merged

Conversation

Zac-HD
Copy link
Contributor

@Zac-HD Zac-HD commented Dec 20, 2017

  • Tests added (for all bug fixes or enhancements)
  • Tests passed (for all non-documentation changes)
  • Passes git diff upstream/master **/*py | flake8 --diff
  • Fully documented, including whats-new.rst for all changes

This patch brings xarray.plot.imshow up to parity with matplotlib.pyplot.imshow:

  • As well as 2D images (greyscale / luminance, using a colormap), it now supports a third dimension for RGB or RGBA channels. For consistency with 2D arrays, missing data is plotted as transparent pixels
  • Being Xarray, users need not care about the order of their dimensions - we infer the right one for color, and warn if it's ambiguous.
  • Using robust=True for easy saturation is really nice. Having it adjust each channel and facet in the same way is essential for this to work, which it does.
  • Matplotlib wraps out-of-range colors, leading to crazy maps and serious interpretation problems if it's only a small region. Xarray clips (ie saturates) to the valid range instead.

I'm going to implement clip-to-range and color normalization upstream in matplotlib, then open a second PR here so that Xarray can use the same interface.

And that's the commit log! It's not really a big feature, but each of the parts can be fiddly so I've broken
the commits up logically 😄

Finally, a motivating example: visible-light Landsat data before, during (top-right), and after a fire at Sampson's Flat, Australia:

arr = ds['red green blue'.split()].to_array(dim='band') / (2 ** 12)
arr.plot.imshow(col='time', col_wrap=5, robust=True)

image

DataArray(
easy_array((10, 15, 3), start=0),
dims=['y', 'x', 'band'],
).plot.imshow()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to check that the colors have actually rendered correctly by introspecting deeper into the figures / axes that are generated by imshow?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it's almost certainly possible - but I have absolutely no idea where to start!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm okay relying on matplotlib to plot colors properly. One sane option would be to mock plt.imshow to ensure it gets called properly, but we don't bother with that for our current tests. But that shouldn't get in the way of integration tests that verify that we can actually call matplotlib without any errors being raised.

@shoyer
Copy link
Member

shoyer commented Dec 20, 2017

I like this. My main concern is that I'd like a fully explicit way to opt into this behavior. Perhaps rgb='band' and/or rgba='band'?

@Zac-HD
Copy link
Contributor Author

Zac-HD commented Dec 21, 2017

I like this. My main concern is that I'd like a fully explicit way to opt into this behavior. Perhaps rgb='band' and/or rgba='band'?

I have a pretty strong preference for RGB images 'just working' - requiring a new argument would mean imshow has a different signature to all other plot methods without adding any information. If it's an optional argument, it barely does anything - in the rare case where the heuristic fails, you can supply x and y dims instead.

In many ways showing an RGB image is a special case, but this is the least-special-case I could make work - and it's consistent with the obvious meaning and upstream behavior of imshow.

@shoyer
Copy link
Member

shoyer commented Dec 21, 2017

I should clarify: I don't think we should require the explicit name for the RGB dimension, but I think we should have an argument around so that's an option if desired. This is nice because it gives better error messages when things go wrong and makes it easier to write self-documented code. This would be consistent with the rest of our visualization functions, which provide an option for being fully explicit but don't require it.

@Zac-HD
Copy link
Contributor Author

Zac-HD commented Dec 21, 2017

Optional argument coming up then 😄

Anything else I need to do?

@Zac-HD Zac-HD force-pushed the rgb-maps branch 4 times, most recently from b378a48 to 540300d Compare December 21, 2017 12:45
@Zac-HD
Copy link
Contributor Author

Zac-HD commented Dec 21, 2017

Done - now with optional argument and nicer commit history, Anything else?

robust = False
del flat
# Clip range to [0, 1] to avoid visual artefacts
darray.values[:] = np.clip(darray.values, 0, 1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you verify that this doesn't modify the DataArray object on which the plot method is being called?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I've tossed in a .copy(deep=True) to guarantee that there's no mutation.

# All data will be masked, so skip percentile calculation
vmin, vmax = 0, 1
if vmin is None:
vmin = np.percentile(flat, ROBUST_PERCENTILE)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not quite sure that this handling of the robust argument makes sense. Does this resemble what vmin/vmax do by default on imshow?

Parts of this that seem strange to me:

  • robust=True does more than merely clipping the data to the robust range, unlike the behavior with colormaps. The data gets extra normalization, even if it all already fell within the central 95%.
  • all channels are treated together, rather than independently, including the alpha channel!

I'm sure this behavior is useful sometimes, but maybe it's better to leave it out for now. We could make robust=True an error for now, and potentially save color normalization for another keyword argument.

Copy link
Contributor Author

@Zac-HD Zac-HD Dec 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic makes robust=True work exactly the same way on single-band and RGB greyscale images. The handling is only distinct because the 'normalise to [0, 1]' step is implicit for single bands but must be explicit for RGB. Scaling each band independently would completely change the colour of the image, which is almost never what you'd want to do.

It does make sense to normalise RGB without considering or changing the alpha channel though; I'll play with this a little more. I can't think of a way to describe this that isn't "here's a weird edge case...", and the implementation would be even worse.

As to usefulness, it's basically essential to sensibly visualise surface reflectance data (ie high-quality satellite images) - they're represented for analysis as albedo, ie fraction of light reflected, and if you display that directly it's a very dark and muddy image (example below; same scene as above). So I'd strongly prefer to keep this in!

download

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still feels strange to me. With colormaps, all robust=True does is explicitly set vmin and vmax based on the data. But vmin and vmax are entirely ignored here.

What if we interpreted vmin/vmax with rgb plots as setting the bounds for normalization? The defaults would be (0, 1) for floats and (smallest int, largest int) for int dtypes. Then, robust=True could again simply set vmin/vmax based on the data.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually agree that the current behavior is strange and that your suggestion is an improvement (though I'd use [0, 255] for all ints) - but it's consistent with matplotlib! My proposed plan has some extra steps though:

  1. Merge this pull. We will need the stretching machinery regardless, unless Xarray drops support for matplotlib<2.2
  2. Patch imshow doesn't normalize the color range in RGB images  matplotlib/matplotlib#9391 upstream. (I'm already working on a patch for this)
  3. Implement the new matplotlib interface in Xarray. I could do that in this pull, but then if the matplotlib implementation doesn't match we'd have to break API compatibility - I'd really prefer to have a smaller API now and get it right the first time, later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could do that in this pull, but then if the matplotlib implementation doesn't match we'd have to break API compatibility - I'd really prefer to have a smaller API now and get it right the first time, later.

I agree. But I'm also concerned about the behavior for robust=True potentially needing to change to match upstream.

My preference would be to leave out support for robust with RGB plots until the matplotlib behavior lands. You could write a small helper function for RGB normalization (maybe even it include it in the docs as an example), e.g.,

import xarray.ufuncs as xu

def robust_stretch(data):
  vmin, vmax = data.quantile(q=[0.02, 0.98])
  bounded = xu.minimum(xu.maximum(data, vmin), vmax)
  returned (bounded - vmin) / (vmax - vmin)

Then you could plot things like arr.pipe(robust_stretch).plot.imshow(col='time', col_wrap=5).

It's a few more characters but not a terrible work around.

@@ -446,12 +468,13 @@ def newplotfunc(darray, x=None, y=None, figsize=None, size=None,
"Use colors keyword instead.",
DeprecationWarning, stacklevel=3)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add rgb as an actual argument inline, after x and y?

Copy link
Contributor Author

@Zac-HD Zac-HD Dec 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could, but this would be a breaking change for anyone using positional arguments. I'd therefore prefer not to, as it's just as easy to do the breaking bit later.

Please confirm if you'd like the breaking change; otherwise I'll leave it as-is.

"""
if imshow and darray.ndim == 3:
return _infer_xy_labels_3d(darray, x, y, rgb)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make sure this raises an informative error message if RGB is provided and the input has the wrong number of dimensions or isn't being plotted with imshow.

"""
Determine x and y labels for showing RGB images.

Attempts to infer which dimension is RGB/RGBA by size and order of dims.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than all this complex logic with warnings in ambiguous cases, why not always treat the last and/or remaining (after explicit x/y labels) dimension as RGB? I think that solves the convenience use cases, without hard to understand/predict inference logic.

Copy link
Contributor Author

@Zac-HD Zac-HD Dec 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

# but doesn't warn if dimensions specified
arr.plot.imshow(rgb='band')
arr.plot.imshow(x='x', y='y')

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add tests for errors related to rgb.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

DataArray(
easy_array((10, 15, 3), start=0),
dims=['y', 'x', 'band'],
).plot.imshow()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm okay relying on matplotlib to plot colors properly. One sane option would be to mock plt.imshow to ensure it gets called properly, but we don't bother with that for our current tests. But that shouldn't get in the way of integration tests that verify that we can actually call matplotlib without any errors being raised.

In this case, ``robust=True`` will saturate the image in the
usual way, consistenly between all bands and facets.

This method will clip oversaturated pixels to the valid range,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this actually strictly better? If so, why didn't it get changed during matplotlib's recent style overhaup? Was it simply overlooked or a more careful decision?

Copy link
Contributor Author

@Zac-HD Zac-HD Dec 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO it is strictly better, because it avoids the ambiguity a colour could actually represent any value at all modulo the range.

I believe it was overlooked because relatively few people start with crudely-scaled RGB images, though I wasn't involved closely enough to know for sure (or fix it then!). Unfortunately such "scaling" is pretty common to go from albedo to useful dynamic range in imagery, as I mentioned about robust above. Reference: matplotlib/matplotlib#9391 (and links from that issue).

@Zac-HD
Copy link
Contributor Author

Zac-HD commented Dec 24, 2017

Thanks for the review @shoyer! I'll do what I can in the next few days, but that might not be much at all before I get back from a no-internet camping trip around Jan 8th. Items:

  • add validation and tests for rgb argument
  • don't mutate the input data! (I think it doesn't, but obviously this needs to be checked)
  • simplify order heuristic, to "use last dimension of size 3 or 4 as rgb; warn if there are multiple possible"
  • update rasterio example in gallery

@fmaussion
Copy link
Member

I'll add another todo (can be done in a separate PR if you don't have time ;-): modify the gallery example to plot the RGB image instead of the BW with our rasterio example here.

@Zac-HD
Copy link
Contributor Author

Zac-HD commented Jan 6, 2018

OK, @shoyer & @fmaussion - I think I'm done again!

The only thing I haven't done is change the handling of robust=True, as anything beyond "scale and clip 2nd-to-98th percentiles to [0, 1]" seems fraught with complications. If this is not what the user intended, it's at least the obvious default behaviour and should be simple to understand.

Hopefully this covers the substantial changes; but let me know if there's anything else as I'd really like this merged before the 0.10.1 release - and ideally released in time for demos at my summer school in January 😄

@@ -445,10 +445,38 @@ def newplotfunc(darray, x=None, y=None, figsize=None, size=None,
# Decide on a default for the colorbar before facetgrids
if add_colorbar is None:
add_colorbar = plotfunc.__name__ != 'contour'
imshow_rgb = plotfunc.__name__ == 'imshow' and \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: per PEP8, please use parentheses to group this expression instead of an explicit line continuation.

assert darray.ndim == 3
not_none = [a for a in (x, y, rgb) if a is not None]
if len(set(not_none)) < len(not_none):
raise ValueError('Dimensions passed as x, y, and rgb must be unique.')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you include the values for these dimensions in the error message?

# there isn't one, and set it to transparent where data is masked.
if z.shape[-1] == 3:
z = np.ma.concatenate((z, np.ma.ones(z.shape[:2] + (1,))), 2)
z[np.sum(z.mask, axis=-1, dtype=bool),-1] = 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're sure this doesn't mutate the inputs, right?

style note: add a space after the comma.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would also be clearer as with np.any(z.mask, axis=-1) rather than np.sum.

# All data will be masked, so skip percentile calculation
vmin, vmax = 0, 1
if vmin is None:
vmin = np.percentile(flat, ROBUST_PERCENTILE)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still feels strange to me. With colormaps, all robust=True does is explicitly set vmin and vmax based on the data. But vmin and vmax are entirely ignored here.

What if we interpreted vmin/vmax with rgb plots as setting the bounds for normalization? The defaults would be (0, 1) for floats and (smallest int, largest int) for int dtypes. Then, robust=True could again simply set vmin/vmax based on the data.

@Zac-HD
Copy link
Contributor Author

Zac-HD commented Jan 9, 2018

More review fixes 😄

I'm pretty sure the build failure is just Travis being flaky - it passes on my machine. Could someone restart it?

@fmaussion
Copy link
Member

Just restarted them, let's see how it goes

@Zac-HD
Copy link
Contributor Author

Zac-HD commented Jan 9, 2018

Okay... I've now seen three Travis runs. In every one there's been a pass, a fail, and three errors... but different jobs each time. At this point I'm ready to give up and wait for the Travis team to fix it 😕

The code is ready to go though! 🎉

@Zac-HD
Copy link
Contributor Author

Zac-HD commented Jan 10, 2018

@shoyer makes some good points about things that should be fixed upstream in Matplotlib - namely normalization of RGB images, but I'm also going to move my color fixes (clipping instead of modulo) upstream instead. Timeline:

  1. I edit this PR, it gets merged, Xarray 10.1 (ETA soonish?) can display RGB images just like Matplotlib
  2. I finish my upstream PR, Matplotlib 2.2 (ETA later this month (!)) supports normalization of RGB images and fixes the modulo/clip bug
  3. Once the matplotlib API is merged, I make a new PR against Xarray to support the same interface (including on older matplotlib versions)

@shoyer
Copy link
Member

shoyer commented Jan 10, 2018

@Zac-HD Sounds good. At this point I don't think there are any blockers left for the 0.10.1 release (I wanted to get the pydap fix in, which we recently merged).

@Zac-HD
Copy link
Contributor Author

Zac-HD commented Jan 10, 2018

Looks like TestDatetimePlot.test_datetime_line_plot is failing, but I'm pretty sure that's nothing to do with me (and it passed twice on Travis!).

@shoyer
Copy link
Member

shoyer commented Jan 10, 2018

Should be fixed by #1814 which I just merged

Copy link
Member

@shoyer shoyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me now. If there are no other comments I will merge tomorrow.

@shoyer
Copy link
Member

shoyer commented Jan 11, 2018

Thanks @Zac-HD !

@Zac-HD Zac-HD deleted the rgb-maps branch January 11, 2018 03:20
@Zac-HD Zac-HD mentioned this pull request Feb 25, 2018
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants