Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Make the schema validation error for non-existing params more informative #2568

Merged
merged 27 commits into from
Feb 21, 2023

Conversation

joelostblom
Copy link
Contributor

@jakevdp What do you think of something like this as a solution to #2565?

I am still working on organizing the printed parameters in neatly formatted columns so that they are easier to read, but I thought I would check in with you whether you think this solution is robust enough. It has been working fine in my limited testing and I kept the old error message as a fallback to be safe.

close #2565

@joelostblom joelostblom force-pushed the schema-validation-error branch 4 times, most recently from 1a84e1d to 3e99a92 Compare March 22, 2022 20:34
@joelostblom
Copy link
Contributor Author

joelostblom commented Mar 22, 2022

I rebased on master so that the checks are passing (the many force pushes are for black and flake8, sorry about the noise). I also fixed the formatting of the parameters in the error message. It required more code than expected to make an even table with names sorted alphabetically in columns, but I think that the output is much easier to read now.

Instead of the unformated list of parameters:

SchemaValidationError: Invalid specification

altair.vegalite.v4.schema.core.Axis, validating 'additionalProperties'

altair.Axis has no parameter named 'here'

Existing parameter names are:
'aria', 'bandPosition', 'description', 'domain', 'domainCap', 'domainColor', 'domainDash',
'domainDashOffset', 'domainOpacity', 'domainWidth', 'format',
'formatType', 'grid', 'gridCap', 'gridColor', 'gridDash',
'gridDashOffset', 'gridOpacity', 'gridWidth', 'labelAlign', 'labelAngle',
'labelBaseline', 'labelBound', 'labelColor', 'labelExpr', 'labelFlush',
'labelFlushOffset', 'labelFont', 'labelFontSize', 'labelFontStyle',
'labelFontWeight', 'labelLimit', 'labelLineHeight', 'labelOffset',
'labelOpacity', 'labelOverlap', 'labelPadding', 'labelSeparation',
'labels', 'maxExtent', 'minExtent', 'offset', 'orient', 'position',
'style', 'tickBand', 'tickCap', 'tickColor', 'tickCount', 'tickDash',
'tickDashOffset', 'tickExtra', 'tickMinStep', 'tickOffset', 'tickOpacity',
'tickRound', 'tickSize', 'tickWidth', 'ticks', 'title', 'titleAlign',
'titleAnchor', 'titleAngle', 'titleBaseline', 'titleColor', 'titleFont',
'titleFontSize', 'titleFontStyle', 'titleFontWeight', 'titleLimit',
'titleLineHeight', 'titleOpacity', 'titlePadding', 'titleX', 'titleY',
'translate', 'values', 'zindex'

See the help for altair.Axis to read the full description of these parameters

The output is now a table of parameters:

altair.vegalite.v4.schema.core.Axis, validating 'additionalProperties'

altair.Axis has no parameter named 'here'

Existing parameter names are:
aria               gridDashOffset     labelLineHeight   tickCount        titleBaseline     
bandPosition       gridOpacity        labelOffset       tickDash         titleColor        
description        gridWidth          labelOpacity      tickDashOffset   titleFont         
domain             labelAlign         labelOverlap      tickExtra        titleFontSize     
domainCap          labelAngle         labelPadding      tickMinStep      titleFontStyle    
domainColor        labelBaseline      labelSeparation   tickOffset       titleFontWeight   
domainDash         labelBound         labels            tickOpacity      titleLimit        
domainDashOffset   labelColor         maxExtent         tickRound        titleLineHeight   
domainOpacity      labelExpr          minExtent         tickSize         titleOpacity      
domainWidth        labelFlush         offset            tickWidth        titlePadding      
format             labelFlushOffset   orient            ticks            titleX            
formatType         labelFont          position          title            titleY            
grid               labelFontSize      style             titleAlign       translate         
gridCap            labelFontStyle     tickBand          titleAnchor      values            
gridColor          labelFontWeight    tickCap           titleAngle       zindex            
gridDash           labelLimit         tickColor                                            

See the help for altair.Axis to read the full description of these parameters

I did some basic manual testing with all the encoding channels listed in https://altair-viz.github.io/user_guide/encoding.html#encoding-channels and it seems to work fine. Although I am not sure if there are edge cases, I hope that the fallback to the old method could guard again those, but let me know if you think of anything specific and I can try to add tests for that.

@mattijn
Copy link
Contributor

mattijn commented Dec 27, 2022

Hi @joelostblom can you synchronise your branch, https://github.com/joelostblom/altair/tree/schema-validation-error, with the main repo and maybe add a code-snippet how to this can be tested easily using eg. colab. This will help the review process. Hopefully @jakevdp is still willing to do this.

@joelostblom joelostblom force-pushed the schema-validation-error branch 2 times, most recently from f1a0932 to fdbb50c Compare January 6, 2023 12:20
@joelostblom
Copy link
Contributor Author

joelostblom commented Jan 6, 2023

@mattijn I rebased this on the latest main branch and also structured the code so that it is easier to review. Do you think you would be able to have a look at this? I think it could be quite useful to get feedback on in the RC. I have included the rationale and an example below:

When a non-existing parameter name is used, I think it would be helpful to include the existing parameter names in the error message. Currently, when misspelling a parameter name like in the example below, it is not clear whether I made a typo or the parameter doesn't exist at all, so the immediate feedback of seeing the correct parameter names would be helpful.

import altair as alt
import pandas as pd

source = pd.DataFrame({
    'a': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'],
    'b': [28, 55, 43, 91, 81, 53, 19, 87, 52]
})

alt.Chart(source).mark_bar().encode(
    x=alt.X("a", scale=alt.Scale(padingOuter=0.5)),
    y=alt.Y("b")
)
SchemaValidationError: Invalid specification

        altair.vegalite.v4.schema.core.Scale, validating 'additionalProperties'

        Additional properties are not allowed ('padingOuter' was unexpected)

I also belive the error message text could be clarified a little. Especially for novices, I don't think it is immediately clear what it means that "Additional properties are not allowed". This PR updates the error message to instead read:

SchemaValidationError: Invalid specification

altair.vegalite.v5.schema.core.Scale, validating 'additionalProperties'

altair.Scale has no parameter named 'padingOuter'

Existing parameter names are:
align      domain      interpolate    range      round    
base       domainMax   nice           rangeMax   scheme   
bins       domainMid   padding        rangeMin   type     
clamp      domainMin   paddingInner   reverse    zero     
constant   exponent    paddingOuter                       

See the help for altair.Scale to read the full description of these parameters

It is possible that we could even remove the two first lines and start with altair.Scale has no... as I don't think the two first lines add any info that helps the user solve the error, what do you think?

This PR also includes a fallback to the old method of displaying error messages, so that in case the assumption in the if-statement does not hold true, things will not break and the user will just see a less helpful error message. Here is an example of when that is happening:

import altair as alt
from vega_datasets import data

source = data.cars()

alt.Chart(source).mark_boxplot(
    ticks={'thickness': 2},
    median={'stroke': 'black', 'strokeWidth': 2},
    size=50
).encode(
    x=alt.X('Miles_per_Gallon:Q', scale=alt.Scale(zero=False)),
    y=alt.Y('Origin', scale=alt.Scale(zero=False)),
    color=alt.Color('Origin'),
)

@binste
Copy link
Contributor

binste commented Jan 7, 2023

This looks super helpful! Haven't reviewed the code but just a quick reminder that the changes need to be moved to tools/schemapi/schemapi.py.

My original approach sometimes returned the name of an existing
parameter. This commit uses the same approach as the fallback but
extracts just the parameter name from the message string.
@joelostblom
Copy link
Contributor Author

Thanks @binste, updated!

@binste
Copy link
Contributor

binste commented Jan 20, 2023

Just a note that it might make sense to merge this PR after #2842 and test it again. I'd expect that your changes work in even more cases as the correct additionalProperties error is raised more often, but just to be sure.

""".format(
schema_path, self.validator, message
)
if hasattr(vegalite, schema_path.split(".")[-1]):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@binste I noticed an issue here after #2842 was merged. Previously, error messages would include a reference to the class where the error was raised, as in this case with the Y channel:

altair.vegalite.v5.schema.channels.Y, validating 'additionalProperties'

After #2842 the first part of that error message changes to no longer point to the particular class where the error was encountered, but just the chart in general:

altair.vegalite.v5.api.Chart->0, validating 'additionalProperties'

In addition to being less informative in general, this becomes an issue here because this PR was relying on that first part of the error message to find all the correct parameters names that the Y channel accept. There might be ways to work around that in this PR, but I wanted to ask you first if you think it is possible to restore the more precise error message that points to the class where the error was encountered or if that information is lost with the new approach to raising errors.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To add some more info, previously with deep validation, what I think was happening was that the iterative finding of the error meant that the class of the SchemaBase (I think) would change in each iteration to reflect which part of the spec was checked for errors. For example if I print out type(self) of the self that is passed to the SchemaValidationError it would look like this with deep validation (for a spec with a typo in a param in the Y channel):

<class 'altair.vegalite.v5.api.Chart'>
<class 'altair.vegalite.v5.schema.channels.Y'>

Without deep validation, all the errors are found without iteration and therefor the self param remains as the chart itself without changing to the Y channel where the error is occurring:

<class 'altair.vegalite.v5.api.Chart'>

I tried to look briefly if there was a way to not only return all the error messages, but also all the classes so that we could still find out that in this case it was the Y class that the error pertained to.

I looked briefly and it seems like some of this information is stored in the absolute_path attribute of the self object inside the SchemaValidationError class, and it might be possible to reconstruct it from there, but I couldn't find the full class path anywhere and I wanted to hear your thoughts before going down that path.

Copy link
Contributor

@binste binste Feb 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detailed write-up! Indeed, I agree that it would be more informative if that original behaviour is restored and of course helpful for your proposed changes here, which look very helpful.

I think absolute_path is the correct place to take this from. From that name, we can try to recreate the correct lass name. We can maybe use the usual conversion to a Python object name that is used in the propertysetter https://github.com/altair-viz/altair/blob/master/altair/utils/schemapi.py#L680 and also in the code generation https://github.com/altair-viz/altair/blob/master/tools/generate_schema_wrapper.py#L454:

classname = prop[0].upper() + prop[1:]

Places to modify would be in SchemaValidationError.__str__ instead of cls = self.obj.__class__ we could do something like

class_name_lower = self.absolute_path[-1]
# TODO: Check if this always works. Would work for `xOffset` -> `XOffset`
class_name = f"{class_name_lower}"[0].upper() + f"{class_name_lower}"[1:]
cls = getattr(vegalite, class_name)
cls

and before that in SchemaValidationError.__init__:

self.absolute_path = err.absolute_path

I haven't tested this yet but most classes should be importable from the top-level like this right? For others we could still fall back on self.obj (the self that is passed into SchemaValidationError).

If I find more time in the next days I can make a PR with this to restore the original behaviour of showing the class in which the error occured. I could then also review this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #2883

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! I merged in your PR and this seems to be working as intended now. There are a few failing tests which needs the error message updated; I think I can get to those on Friday or Sunday this week. There seems to be some larger error also where my solution here still needs some change to work with the new multiple returned error messages.

Copy link
Contributor Author

@joelostblom joelostblom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@binste I have updated this PR so that it works with the new error handling and also only shows the names of the existing parameters when an unknown parameter was specified and not for other types of errors. The tests you added previously was very helpful so thanks for that!

Redundant info removed

I have also removed some redundant info from the error message that I don't think is helpful for the user. As an example, take the following spec:

import altair as alt
import pandas as pd

source = pd.DataFrame({
    'a': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'],
    'b': [28, 55, 43, 91, 81, 53, 19, 87, 52]
})

alt.Chart(source).mark_bar().encode(
    x=alt.X("a", scale=alt.Scale(padingOuter=0.5)),
    y=alt.Y("b")
)

It now outputs an error that get straight to the point of what is incorrect:

SchemaValidationError: `Scale` has no parameter named 'padingOuter'

Existing parameter names are:
align      domain      interpolate    range      round    
base       domainMax   nice           rangeMax   scheme   
bins       domainMid   padding        rangeMin   type     
clamp      domainMin   paddingInner   reverse    zero     
constant   exponent    paddingOuter                       

See the help for `Scale` to read the full description of these parameters

Previously it included "invalid specification" and a mention of attributes in the json schema, neither of which I think adds any info that makes it easier to troubleshoot this error, just more text to read:

SchemaValidationError: Invalid specification

altair.vegalite.v5.schema.core.Scale, validating 'additionalProperties'

altair.Scale has no parameter named 'padingOuter'

Existing parameter names are:
align      domain      interpolate    range      round    
base       domainMax   nice           rangeMax   scheme   
bins       domainMid   padding        rangeMin   type     
clamp      domainMin   paddingInner   reverse    zero     
constant   exponent    paddingOuter                       

See the help for altair.Scale to read the full description of these parameters

Summary line added

In addition to removing some redundant info, I have added a summary line when an invalid value is passed to a correct parameter name as can be seen in the following spec:

source = data.cars()
(
    alt.Chart(source)
    .mark_text(align="right")
    .encode(alt.Text("Horsepower:N", title=dict(text="Horsepower", align="right")))
)

which now outputs this error:

SchemaValidationError: '{'text': 'Horsepower', 'align': 'right'}' is an invalid value for `title`:

{'text': 'Horsepower', 'align': 'right'} is not of type 'string'
{'text': 'Horsepower', 'align': 'right'} is not of type 'array'

instead of this more verbose, but less informative error:

SchemaValidationError: Invalid specification

        altair.vegalite.v5.schema.core.TitleParams, validating 'type'

        {'text': 'Horsepower', 'align': 'right'} is not of type 'string'
        {'text': 'Horsepower', 'align': 'right'} is not of type 'array'

I find that this is also helpful when there are multiple errors spit out as in the example above and this example (I know you were also thinking about another solution here @binste so let me know if you prefer that instead):

alt.Chart(data.barley()).mark_bar().encode(
    x=alt.X('variety'),
    y=alt.Y('sum(yield)', stack='asdf'),
)

which now shows the error:

SchemaValidationError: 'asdf' is an invalid value for `stack`:

'asdf' is not one of ['zero', 'center', 'normalize']
'asdf' is not of type 'null'
'asdf' is not of type 'boolean'

instead of:

SchemaValidationError: Invalid specification

        altair.vegalite.v5.api.Chart->0, validating 'enum'

        'asdf' is not one of ['zero', 'center', 'normalize']
        'asdf' is not of type 'null'
        'asdf' is not of type 'boolean'

Invalid encoding channels

For specs that include an invalid encoding channel, there is no summary line but the error message is still shortened to remove redundant info:

selection = alt.selection_point()
(
    alt.Chart(data.barley())
    .mark_circle()
    .add_params(selection)
    .encode(
        color=alt.condition(selection, alt.value("red"), alt.value("green")),
        invalidChannel=None,
    )
)

which now shows:

SchemaValidationError: Additional properties are not allowed ('invalidChannel' was unexpected)

instead of:

SchemaValidationError: Invalid specification

        altair.vegalite.v5.schema.core.Encoding->encoding, validating 'additionalProperties'

        Additional properties are not allowed ('invalidChannel' was unexpected)

altair/utils/schemapi.py Show resolved Hide resolved
altair/utils/schemapi.py Outdated Show resolved Hide resolved
altair/utils/schemapi.py Outdated Show resolved Hide resolved
tests/utils/tests/test_schemapi.py Show resolved Hide resolved
Copy link
Contributor

@binste binste left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like the proposed changes to the error messages! They are much more succinct and informative and the tables are awesome 🚀 Thank you also for the detailed comment, this made the review much easier for me. I only have some small suggestions.

altair/utils/schemapi.py Outdated Show resolved Hide resolved
altair/utils/schemapi.py Outdated Show resolved Hide resolved
altair/utils/schemapi.py Outdated Show resolved Hide resolved
altair/utils/schemapi.py Outdated Show resolved Hide resolved
altair/utils/schemapi.py Outdated Show resolved Hide resolved
altair/utils/schemapi.py Outdated Show resolved Hide resolved
altair/utils/schemapi.py Outdated Show resolved Hide resolved
altair/utils/schemapi.py Outdated Show resolved Hide resolved
tests/utils/tests/test_schemapi.py Outdated Show resolved Hide resolved
joelostblom and others added 2 commits February 19, 2023 14:34
…ams to trigger the parameter table

Co-authored-by: Stefan Binder <binder_stefan@outlook.com>
Copy link
Contributor

@binste binste left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes in test_schemapi.py need to be moved into the respective tools script. This happens to me way too often! I opened #2906 with an idea on how we can test this.

@joelostblom
Copy link
Contributor Author

Good catch, thank you! I have updated to added the changes to the tools script as well

@mattijn
Copy link
Contributor

mattijn commented Feb 21, 2023

I think it is a really nice PR. Thanks @joelostblom. Really helpful, nice that it gives suggestions. It makes me want to make mistakes on purpose to read the suggestions..

I found one spec that gives an incorrect hint:

import altair as alt
from vega_datasets import data
cars = data.cars.url

alt.Chart(cars).mark_point().encode(
    x='Acceleration:Q',
    y='Horsepower:Q',
    color=alt.value(1)  # should be eg. alt.value('red')
)
SchemaValidationError: `Color` has no parameter named 'value'

Existing parameter names are:
shorthand      bin         legend   timeUnit   
aggregate      condition   scale    title      
bandPosition   field       sort     type       

See the help for `Color` to read the full description of these parameters

It is allowed to use alt.value() in this context, but in this case it should be a string and not a number.

Also the context description is based on the jsonschema. So I'm still linking this comment #2842 (comment) here as well.

alt.Chart(data.barley()).mark_bar().encode(
    x=alt.X('variety'),
    y=alt.Y('sum(yield)', stack='null'),  # should be eg. stack=None
)
SchemaValidationError: 'null' is an invalid value for `stack`:

'null' is not one of ['zero', 'center', 'normalize']
'null' is not of type 'null'
'null' is not of type 'boolean'

I tried to follow the suggestion, but it was not directly clear that type 'null' actually is None.

No further comments. Congrats again on this PR!

@joelostblom
Copy link
Contributor Author

Thank you @mattijn 😄 I will go ahead and merge this since you approved and the change that @binste suggested has also been implemented.

We can follow up on the remaining two issues separately. I noticed what you mentioned regarding alt.datum/value and it was introduced before this PR so I opened #2913 to track it. I agree that it would be helpful to show the Python types as well and opened #2914 for that.

@joelostblom joelostblom merged commit 7bf976e into vega:master Feb 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make SchemaValidationError more helpful by printing expected parameters
3 participants