Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve error prioritisation and messages #3009

Merged
merged 41 commits into from
Apr 25, 2023

Conversation

binste
Copy link
Contributor

@binste binste commented Apr 2, 2023

Background

Error messages in Altair are mostly generated based on the validation of the chart specification against the Vega-Lite schema. This validation is performed by the jsonschema package. It's not straight forward to generate helpful error messages out of these specs and Altair 5 rc1 already contains various improvements over Altair 4 in this regard. One improvement was the replacement of the previous 'deep validation' approach with the error hierarchy returned by jsonschema, see #2842. The main benefits over 'deep validation' were that it solved some error messages pointing to parts of the specifications which are actually correct, and the second one was that we can use the error hierarchy to craft more helpful error messages as we now have structured information on all the errors in the chart specification. This last benefit was already somewhat used in #2957.

This PR aims to further use the error hierarchy to improve error messages. It resolves #2915, resolves #2873 and partially addresses #2913.

Suggested approach

See #2842 for a description of how the validation works in general. This PR keeps the separation between validate_jsonschema and SchemaValidationError the same so that:

  • validate_jsonschema: Validates the chart spec against a JSON schema and prioritizes and prepares the relevant error messages
  • SchemaValidationError: Creates a more user friendly error message out of the errors from validate_jsonschema

I tried to document the changes in the code and added more examples to test_schemapi so you can see how the errors look. However, let me know if I should write up a more detailed walkthrough. Below, I just show some examples:

import altair as alt
from vega_datasets import data

Aggregate multiple error messages which are about the same error in the chart specification (#2873)

alt.Chart(data.barley()).mark_bar().encode(
    x=alt.X("variety"),
    y=alt.Y("sum(yield)", stack="asdf"),
)
'asdf' is an invalid value for `stack`. Valid values are:

- one of ['zero', 'center', 'normalize']
- of type 'null' or 'boolean'

If it would be just one one of line or only types then it is shown on the same line and not as a list item:

alt.Chart(data.cars()).mark_text(align="right").encode(
    alt.Text("Horsepower:N", bandPosition="4")
)
'4' is an invalid value for `bandPosition`. Valid values are of type 'number'.

Prefer errors from "deeper levels" of the schema (#2915)

Fixes both examples from #2915 for layered and faceted charts. See that issue for more details. This part happens in schemapi.py:_subset_to_most_specific_json_paths. For the layer example in #2915 it keeps the error for the tooltip as the other error has a json path '$.layer[0]' and the tooltip error is at '$.layer[0].encoding.tooltip'.

Show all errors in the chart specification

In case a chart spec has multiple errors, so far we only showed one and as soon as that one was resolved the other showed up. As a user, I'd prefer to see all at once, else I sometimes think the change I did to resolve one error caused another error although the two are unrelated:

alt.layer(
    alt.Chart().mark_point().encode(tooltip=[{"wrong"}]),
    alt.Chart().mark_line().encode(invalidChannel="unknown"),
)
SchemaValidationError: Error #1: '{'wrong'}' is an invalid value for `field`. Valid values are of type 'string' or 'object'.

Error #2: `Encoding` has no parameter named 'invalidChannel'

Existing parameter names are:
angle         key          order     strokeDash      tooltip   xOffset   
color         latitude     radius    strokeOpacity   url       y         
description   latitude2    radius2   strokeWidth     x         y2        
detail        longitude    shape     text            x2        yError    
fill          longitude2   size      theta           xError    yError2   
fillOpacity   opacity      stroke    theta2          xError2   yOffset   
href                                                                     

See the help for `Encoding` to read the full description of these parameters

Open questions/todos

  • Messages for multiple errors are currently only separated by new lines and by the "Error #..." prefix. Any ideas on how we can improve the readability? I already tried separating the messages with a line ("------"...) but it's unclear what a good length for that line would be as it depends on the frontend where the message is displayed. If it's too long, it's wrapped onto a new line. An alternative might be to indent everything after the first line of a message which is similar to tracebacks and don't start on the first line, for example:
SchemaValidationError:
Error #1: `X` has no parameter named 'unknown'

    Existing parameter names are:
    shorthand      bin      scale   timeUnit   
    aggregate      field    sort    title      
    axis           impute   stack   type       
    bandPosition                               

    See the help for `X` to read the full description of these parameters

Error #2: 'asdf' is an invalid value for `stack`. Valid values are:

    - one of ['zero', 'center', 'normalize']
    - of type 'null' or 'boolean'
  • I'll do another code review myself but wanted to already put this up for review to get inputs, especially as we might still get it into version 5 which I think would be great!

What I think could be tackled in separate PRs:

binste added 29 commits April 1, 2023 08:51
…fication. Relevant as sometimes only one is shown in the resulting error message
…lement before passing to SchemaValidationError to allow for better error messages to be created
…pect.cleandoc. This makes it easier to work on the error messages
…it should be as this depends on the frontend
…ere as we upgrade to 5.7 anyway before a release
@binste binste marked this pull request as ready for review April 15, 2023 08:28
@binste
Copy link
Contributor Author

binste commented Apr 15, 2023

PR is ready for review :) I'll do another review myself this weekend but wanted to already put this up to get your inputs in case you find the time, especially as we might still get it into version 5 which I think would be great!

…hanges here as we upgrade to 5.7 anyway before a release"

This reverts commit 09805a3.
@binste
Copy link
Contributor Author

binste commented Apr 15, 2023

Tests will fail until vega/schema#8 is resolved.

@joelostblom
Copy link
Contributor

joelostblom commented Apr 15, 2023

Wohoo! Exciting to see this ready for review 🎉 Thanks for all your hard work on the PR, it looks really awesome at first glance =)

I'll do another review myself this weekend but wanted to already put this up to get your inputs in case you find the time, especially as we might still get it into version 5 which I think would be great!

I agree, and I think even getting this into 5 RC2 would be a good goal. It seems like we might want to wait for vega/schema#8 before releasing RC 2 anyways?

I will try to review as soon as possible, but a deeper look will probably have to wait until end/middle of next week for me unfortunately. But I can say already that I like all the approaches taken here, and it is interesting that we know show more than just one Error. I agree with you that the formatting is important for this not to be confusing. To me the indented example you showed above is notably clearer than the other one. Maybe even tweaked to explicitly mention on the first line that there are multiple errors?

SchemaValidationError: Multiple errors were found.

Error 1: `X` has no parameter named 'unknown'

    Existing parameter names are:
    shorthand      bin      scale   timeUnit   
    aggregate      field    sort    title      
    axis           impute   stack   type       
    bandPosition                               

    See the help for `X` to read the full description of these parameters

Error 2: 'asdf' is an invalid value for `stack`. Valid values are:

    - one of ['zero', 'center', 'normalize']
    - of type 'null' or 'boolean'

Without indendation, maybe three --- could work? Standardizing on three could be justified as this is what is enough in HTML for a horizontal rule and it adds some additional whitespace separation too:

SchemaValidationError: Multiple errors were found.

Error 1: `X` has no parameter named 'unknown'

Existing parameter names are:
shorthand      bin      scale   timeUnit   
aggregate      field    sort    title      
axis           impute   stack   type       
bandPosition                               

See the help for `X` to read the full description of these parameters

---

Error 2: 'asdf' is an invalid value for `stack`. Valid values are:

- one of ['zero', 'center', 'normalize']
- of type 'null' or 'boolean'

I am OK with either of those two approaches; I think the indentation is easier to quickly scan for the two errors, so I favor that a bit (unless there are any width issues width only indenting for multiple errors, but I can't think of why that would be the case). I also slightly favor having the # number prefixed removed, but that is very minor and I am really fine either way so you pick.


One more thought, do you think it makes sense to have a max on the number of errors that can be displayed? I am thinking something like 3-5, because above that it might be hard to parse/scroll, and also to guard against some possible odd scenario when 20+ errors are shown (not sure if this possible and it definitely doesn't seem likely but maybe it is still a good idea to have an upper limit in place as a safeguard). What do you think?

@binste
Copy link
Contributor Author

binste commented Apr 15, 2023

Happy to hear! :) And thank you for the first feedback. I fully agree with all your points and implemented them. I went for the indentation as I also think it's more readable then some solution with dividers. At most 3 errors are now shown.

…a. Changes are not relevant as we upgrade to 5.7 anyway. They happened as part of resolving vega/schema vega#8
@binste
Copy link
Contributor Author

binste commented Apr 17, 2023

Tests are failing because altair_viewer no longer supports VL 5.6 so we'll first need to merge #3022 once it's ready.

@mattijn
Copy link
Contributor

mattijn commented Apr 21, 2023

(did a quick check if this PR pass the bump to VL5.7.1)

Copy link
Contributor

@joelostblom joelostblom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks amazing and will be a huge help for all Altair users, thank you so much for tackling this @binste ! And sorry for taking some time to get to it. I have a few minor comments, but I also think this is good to go as is, so feel free to update/resolve as you see fir and we can merge it after. Although all the changes here seems to make sense to me and the tests are encompassing, I think it is important that we start running this ourselves and get it into the next RC, so that we get the chance to stumble upon error cases that are difficult to anticipate ahead of time.

altair/utils/schemapi.py Show resolved Hide resolved
altair/utils/schemapi.py Show resolved Hide resolved
altair/utils/schemapi.py Show resolved Hide resolved
altair/utils/schemapi.py Show resolved Hide resolved
altair/utils/schemapi.py Show resolved Hide resolved
@binste
Copy link
Contributor Author

binste commented Apr 25, 2023

No worries at all and thank you for the review and the feedback! I addressed your comments and it's ready to be merged from my side. Looking forward to RC2!

@joelostblom
Copy link
Contributor

Great, thank you again! Merging this!

@joelostblom joelostblom merged commit c273c0a into vega:master Apr 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Some errors in layered or faceted specs raise the wrong error message Improve messages for multiple errors
3 participants