Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft 2019+ tests incorrectly depend on implementations supporting $schema-less schemas but they are not required to process them #311

Closed
gregsdennis opened this issue Nov 27, 2019 · 50 comments · Fixed by #586
Assignees
Labels
bug A test is wrong, or tooling is broken or buggy.

Comments

@gregsdennis
Copy link
Member

For example, the first test in the const file is

{
    "description": "const validation",
    "schema": {"const": 2},
    "tests": [
        {
            "description": "same value is valid",
            "data": 2,
            "valid": true
        },
        {
            "description": "another value is invalid",
            "data": 5,
            "valid": false
        },
        {
            "description": "another type is invalid",
            "data": "a",
            "valid": false
        }
    ]
}

There is no $schema declaration. That means that this is a valid draft-6, -7, and -2019-09 schema.

My implementation assumes latest, unless it can be determined that another one should be used (via keywords used or $schema declaration), but that may not be the case for others.

This means that while we intend for this to be evaluated as draft 2019-09, it may not be, depending on how the implementation reads non-specific schemas.

@Julian
Copy link
Member

Julian commented Nov 27, 2019

There's no real reason other than that it'd be noise (a property needed in all schemas) -- it's also valid to leave it out even though it's highly recommended, so even if we did add it everywhere you'd still have a few (one at least) that tests the behavior without it.

Your implementation doesn't allow explicitly choosing which version to apply?

E.g. in mine, when I run the suite, for each folder, I explicitly use the appropriate draft, even though mine works as you say defaulting to the latest if you don't specify.

@handrews
Copy link
Contributor

@Julian given that we're strongly encouraging folks to start using $schema and the $vocabulary keyword in the referenced meta-schema to control how to process schemas, we should really include this.

The idea that $schema is a needless formality has significantly degraded the usefulness of the keyword to the point that many implementations outright ignore it and just default to doing whatever they feel like regardless of what the schema declares. That's a problem.

@gregsdennis
Copy link
Member Author

Your implementation doesn't allow explicitly choosing which version to apply?

Quite the opposite. If you specify with $schema, it applies at that draft. But if you don't, it makes a best guess given the keywords that are used.

@gregsdennis

This comment was marked as off-topic.

@Julian
Copy link
Member

Julian commented Nov 27, 2019

Quite the opposite. If you specify with $schema, it applies at that draft. But if you don't, it makes a best guess given the keywords that are used.

Right that's what I mean -- in mine, there are 3 choices -- if you specify with $schema you get that one. If you specify nothing you get latest. But if you want to, you can explicitly choose which draft to use. Seems pretty important to me to support the latter if you're going to support multiple drafts, and that was definitely how the suite was written (that you explicitly apply the validator for each corresponding draft).

That being said though I'm definitely not against adding $schema to all the schemas for what @handrews you said -- I do think we should have at least 1 test for each draft that doesn't use it and asserts you still get the right draft-specific behavior so long as the drafts don't say it is literally required, otherwise there's a valid schema we won't cover.

(My usual neuroses about not using large sweeping automated formatting changes there though still may mean we have to do this slowly enough to review the patches individually)

Exactly. Moreover, we should be the example of how schemas should be authored.

I don't think the test suite should be a guide there -- it contains in many cases intentionally pathological examples, but yeah fair enough I guess I shouldn't push back if I'm not disagreeing :D

@handrews
Copy link
Contributor

But if you want to, you can explicitly choose which draft to use. Seems pretty important to me to support the latter if you're going to support multiple drafts

I'd say it's far more important to support reading and respecting $schema and $vocabulary. Sure, its fine to put in an option to explicitly set it when $schema is absent, but I would find the ability to outright contradict it to be of dubious value.

Exactly. Moreover, we should be the example of how schemas should be authored.

Yup. This is particularly critical with $vocabulary now. In the past, there wasn't really all that much you could do with $schema. For most implementations, it understood only the http://json-schema.org/{version}/schema URI for each version anyway. Custom meta-schema URIs were essentially useless. With $vocabulary that is no longer the case.

I do think we should have at least 1 test for each draft that doesn't use it and asserts you still get the right draft-specific behavior so long as the drafts don't say it is literally required, otherwise there's a valid schema we won't cover.

Sure, we can have a test for this, but we only need one, and it should only test behavior specified by the specification. So I'm not sure what it should tests. The most recent specification does state a default meta-schema, but I believe it is the first to explicitly do so.

My usual neuroses about not using large sweeping automated formatting changes there though still may mean we have to do this slowly enough to review the patches individually

@Julian are you really saying that you're not OK with mechanically inserting the same $schema everywhere in each version tree? It's so easy to just do that and review them. If there's a problem with file formatting not being stable enough, then we should absolutely fix that first.

@Julian

This comment was marked as off-topic.

@handrews

This comment was marked as off-topic.

@Julian

This comment was marked as off-topic.

@Julian Julian added the enhancement An enhancement to the tooling or structure of the suite (as opposed to a new test). label Nov 27, 2019
@handrews

This comment was marked as off-topic.

@karenetheridge

This comment was marked as off-topic.

@Julian

This comment was marked as off-topic.

@Julian

This comment was marked as off-topic.

@Julian
Copy link
Member

Julian commented Jun 17, 2022

From a ticket closed as a dupe, sharing this again below:

According to the specification, JSON Schema definitions are recommended to have a $schema keyword.

To me, I don't feel the recommendation really applies strongly here, for reasons I intended to describe in json-schema-org/json-schema-spec#324 -- namely, what I think we should optimize for in the test suite is readability, not perfection -- and while that recommendation is perfect for schemas in isolation which otherwise wouldn't know what $schema they belong to, for the test suite, it seems more to be boilerplate that makes it harder to focus on the key point of each test case. So I've always been -0 on adding it.

On the other hand I do like:

Instead of relying on implementors to guess which one should be used for every folder, would it make sense to manually set $schema in all the tests

since it seems nice to be able to rely on something someone already needs to support, but I don't think, again in my opinion, that this will work really. Because the test suite will always need to also include a test where $schema is not present (as we always do for SHOULDs in the spec which may be ignored), and once we need to do so, the problem comes back again.

But yeah happy to either elaborate myself, or hear again from others. I know Greg and Henry may still be in favor of this.

@awwright
Copy link
Member

I don't think there's been a specific reason we don't include $schema except that it's not required, and adding "$schema" would mean that this is no longer tested.

Instead of relying on implementors to guess which one should be used for every folder

Guessing what, exactly? JSON Schema should be suitably well defined to operate without a meta-schema. What's written in the spec is authoritative, what appears in the meta-schema is merely informative.

@handrews
Copy link
Contributor

@Julian, the role of $schema has changed over time. Originally, it didn't exist at all. In drafts 3 and 4, it was presented as an inessential nice-to-have, which made your approach reasonable. Drafts 6 and 7 didn't really change the level of requirement around $schema, so that continued to be fine.

However, as of 2019-09, we made reliable extensibility a major goal and developed $vocabulary, which works via $schema, to accomplish that goal. As part of this, the absence of $schema in the document root was explicitly stated to result in undefined behavior.

Basic QA principles say that you can't write test cases that combine implementation-defined behavior with positive functionality tests (for those who don't know, my career bounced back and forth between systems QA and Development before settling into Technical Director roles spanning both departments). Doing so means that every test outcome is impacted by implementation-specific choices rather than standards conformance.

So on those grounds, if someone came to me with a test plan where nearly every test case involved an implementation-defined condition, I would reject the plan.


The larger topic here is whether the JSON Schema org supports the goal of reliable extensibility. If we do, then we really need to stop treating $schema as an afterthought. Implementation behavior around $schema is extremely variable, including implementations that ignore it entirely.

The reason for this is that the test suite makes it very clear that $schema doesn't matter. There are no test cases for it at all. Once you get to 2019-09, there's one $vocabulary test (that doesn't really do much AFAICT), and in 2020-12 it's used to enable format assertion behavior, but that's optional anyway.

So, as currently designed, the test suite is undercutting an (apparent?) major project goal. It's very frustrating to see how little attention many implementations pay to $schema, and it's not a mystery as to why that has happened.

If the JSON Schema org does not have a consensus that this is a major project goal, then we should open a discussion on that in the discussions repo, because there's no point in debating this detail without clear context on why we care about $schema. The fact that the media type registration work hinges on $schema for dialect identification suggests that the keyword is important.


If we do have consensus on this, then the topic of how to best test $schema and $vocabulary is another fairly large discussion. We would need cases where $schema is absent, but testing implementation-defined conditions is a bit of a tricky thing (the spec could be better regarding testable requirements here, I'm a little disappointed in myself TBH- for example, I should have put some testable guardrails around "implementation-defined" like "MUST NOT crash" or "MAY refuse to process the schema" or "MUST try to process the schema" etc).

But the minimum first step would be to get the implementation-defined condition out of all of the other test cases. The test suite should rely on well-specified functionality ($schema+$vocabulary determines the processing rules for 2019-09 and later) rather than on behavior that is outside of the specification entirely (relying on an external selection of processing rules).

I would prefer to extend this to all drafts, as it would significantly help normalize $schema support and encourage implementations to take it seriously. However, I would be reluctantly OK with leaving draft-07 and earlier as-is because the spec was written differently for those drafts. The behavior without $schema is not clearly specified, but it's strongly implied to be a common occurrence and therefore handled reasonably. However, that's intentionally no longer the case in 2019-09 and later.

@Julian
Copy link
Member

Julian commented Jun 18, 2022

I'm responding quickly not to say I've already understood what you wrote (which yeah thanks! I'll read it carefully) but just because:

as of 2019-09 [...] the absence of $schema in the document root was explicitly stated to result in undefined behavior.

If this is true, you already convinced me. Can you or someone point me at where this is?

The reason for this is that the test suite makes it very clear that $schema doesn't matter. There are no test cases for it at all.

I'll just state on the record that I do not intend the test suite to say this, and if we don't have it, it's purely an "accident" of lack of effort until now on my part, something I clearly want to fix.

@handrews
Copy link
Contributor

Thanks, @Julian

From §8.1.1 The "$schema" keyword:

The "$schema" keyword SHOULD be used in the document root schema object, and MAY be used in the root schema objects of embedded schema resources. It MUST NOT appear in non-resource root schema objects. If absent from the document root schema, the resulting behavior is implementation-defined_.

There's slightly more clear guidance around unknown $vocabulary values, although of course that requires following $schema to look at $vocabulary:

§8.1.2.1 Default vocabularies:

If "$vocabulary" is absent, an implementation MAY determine behavior based on the meta-schema if it is recognized from the URI value of the referring schema's "$schema" keyword. This is how behavior (such as Hyper-Schema usage) has been recognized prior to the existence of vocabularies.

If the meta-schema, as referenced by the schema, is not recognized, or is missing, then the behavior is implementation-defined. If the implementation proceeds with processing the schema, it MUST assume the use of the core vocabulary. If the implementation is built for a specific purpose, then it SHOULD assume the use of all of the most relevant vocabularies for that purpose.

For example, an implementation that is a validator SHOULD assume the use of all vocabularies in this specification and the companion Validation specification

Even if we take that last paragraph as guidance, it doesn't actually say what draft of the vocabularies in question should be used.

The 2nd paragraph even notes that an implementation might refuse to process the schema. So there's an implicit allowance that a conforming implementation could outright refuse to the run the test suite. That might not be something that we want to imply, but that's what it says. IIRC, I did not want to require implementations to process an unknown thing as that could be a security risk.


Regarding:

'll just state on the record that I do not intend the test suite to say this, and if we don't have it, it's purely an "accident" of lack of effort until now on my part, something I clearly want to fix.

Oh, I did not mean to imply that you did! I apologize, I should have worded that better. I'm trying to work on my tendency to sound accusatory on this sort of thing.

I also think it was not unreasonable given the history. When I went to reply to this issue, at first I was confused over why we hadn't been using $schema until I went back and read draft-04 and draft-07 and realized that what was in 2019-09 and later was actually a change. Prior to that, the spec was pretty casual about $schema as well.

@Julian
Copy link
Member

Julian commented Jun 18, 2022

My reading skills (of the spec) apparently leave what to be desired, since I figured you were referring to that paragraph, but I have up until this moment probably never internalized the last half of that paragraph, i.e.:

If absent from the document root schema, the resulting behavior is implementation-defined_.

and was always focused on "yeah OK SHOULD, but doesn't need to".

That's quite clear, so I'm now very much +1 on the change, and indeed as you say now even when we do now have tests without $schema, they need to go in optional even.

Oh, I did not mean to imply that you did! I apologize, I should have worded that better. I'm trying to work on my tendency to sound accusatory on this sort of thing.

Don't worry about it :) I'll very much assume good intent (at least until I look silly doing so :D).

@Julian
Copy link
Member

Julian commented Jun 18, 2022

For my own complete clarity, and thanks again for bearing with me until now, I want to also cite something from json-schema-org/json-schema-spec#439 (which I intend to get back to and merge in the next few days), which covers why that section means an immediate +1, which is:

additional tests MUST NOT ever fail for an implementation that is correct under the specification

whereas it's now clear that a correct implementation, under

If absent from the document root schema, the resulting behavior is implementation-defined_.

may perfectly well have decided that their "implementation-defined behavior" is "blow up and error" (EDIT: see the hidden comments below, there may be disagreement about this precise "implementation-defined behavior, but the point stands for some other behavior).

So yes, anyone is welcome to do this (on all drafts, or ones with that language, I have no strong opinion since there's going to be "boilerplate" anyhow in most places now). If no one gets to it sometime soon, I'll get to it myself, probably after json-schema-org/json-schema-spec#439 and a few other issues are closed first.

@awwright

This comment was marked as off-topic.

@Julian

This comment was marked as off-topic.

@notEthan

This comment was marked as off-topic.

@awwright

This comment was marked as off-topic.

@Julian
Copy link
Member

Julian commented Jun 19, 2022

Let's move any discussion of what the set of valid implementation-defined behaviors are, and whether they include "error out", elsewhere (like the ticket linked) so that this ticket can just be about what the spec says, not what it should say --

@awwright are you disagreeing there (about what it seems to say)? Or are you making a point about what you'd like it to say?

(To me it seems very clear that what the spec says is that an implementation may choose to do something other than what the test suite assumes it does today). But speak up if you're disagreeing.

@awwright
Copy link
Member

awwright commented Jun 19, 2022

The effect on testing is: if you're changing the meta-schema being used for running the tests, based on the draft directory, then you're not testing for compatibility with schemas in the wild (where this information is unavailable).

  • In controlled environments where you are the author of the schema, like within an application to validate incoming data, this may be sufficient. But if you're reading 3rd party schemas, then compatibility isn't being tested.

@karenetheridge
Copy link
Member

Instead of relying on implementors to guess which one should be used for every folder

I thought it was implicit that the directory name for the tests would indicate which specification version to use for running those tests. If that's not clear enough, we should say so explicitly in a README. That is -- it's not okay for an implementation to just use whatever version it likes (i.e. its preferred default version) for all tests, as obviously the tests won't all pass if the versions don't match up, so what is the point? The different directories exist for the express purpose of demonstrating the different expected behaviours for each specification version.

@karenetheridge
Copy link
Member

The reason for this is that the test suite makes it very clear that $schema doesn't matter. There are no test cases for it at all. ...
So, as currently designed, the test suite is undercutting an (apparent?) major project goal. It's very frustrating to see how little attention many implementations pay to $schema, and it's not a mystery as to why that has happened.

You are confusing intent with happenstance. It's not intentional that there are few to no tests involving $schema -- it's just that no one has written them yet.

I'd been intending to write more, using a new metaschema that used the format-validation vocabulary, but there was some confusion and disagreement about the existing format tests and I ran out of spoons. We could certainly add others that omitted certain vocabularies to ensure implementations did the right thing, but someone has to do it, and we're (almost) all volunteers. Maybe that will get better with the increase in paid staff. But please don't infer the lack of certain things is signalling some sort of intent. Output formats matter too, and we have no tests for those either, because it's complicated and someone has to do the work in thinking about how it's all going to work (and then make the case to everyone else to get it merged).

@awwright
Copy link
Member

awwright commented Jun 20, 2022

I thought it was implicit that the directory name for the tests would indicate which specification version to use for running those tests.

The directory name indicates the draft that the schema was written to. It doesn't imply there's a way to pick the specification version you use for running a test. This is implied because if I see a JSON Schema in the wild, I would have no way of knowing which specification version it was for, it makes sense tests should work this way too.

Testing multiple versions of a validator would be like saying "Should I test this JSON parser with RFC 4627, RFC 7159, or RFC 8259?" That is: while there's different versions of the JSON specification, there's only one version of JSON, and implementations are expected to be compatible with documents written for any of the RFCs, even when we don't know which one.

If I wrote a validator by comparing the input to the known tests, and returning the expected result, it would pass all the tests, but it would quickly fail against instances in the wild. Likewise, if you're changing the behavior of the validator depending on the directory it was found in, that's a little bit cheating the tests.

Adding $schema to the tests would be a slight variation on this, since $schema is a pretty good heuristic of specification version (this was intentional), but not necessarily:

{ "$schema": "https://json-schema.org/draft-03/schema"
, "$ref": "https://json-schema.org/draft-07/schema#"
, "type": "object"
}

... This is a schema that uses the draft-03 vocabulary to match all valid object-form draft-07 schemas (no boolean schemas). (Yes, according to the JSON Schema text, I would expect it to work this way!)

@Julian
Copy link
Member

Julian commented Jun 20, 2022

What we have today is not a heuristic and not cheating :), it's explicitly the test suite telling anyone using it "stuff in this folder is written to be run against the draft whose folder it's in". There's no ambiguity, that's the current public interface of this suite itself. There are indeed suites which test different RFC implementations of e.g. URIs, even though from staring at one you don't know what version it's written for. Such a thing doesn't sound like an alien concept to me for JSON either, and I wouldn't be surprised if they existed, heck they'd be useful to us even if we ever had a "format" : "json"` and explicitly said which RFC was intended (as we do for regexes or datetimes or whatever).

But it seems we're already all agreed (well, at least unless @karenetheridge says they disagree) to add $schema simply because the spec doesn't require that implementations respect or support being told what version to use when validating a schema with no $schema nor does it constrain what they do in that case at all. Unless someone disagrees with the next step here (add $schema to tests, and then perhaps add additional tests as per json-schema-org/json-schema-spec#210), I'd recommend we probably leave this here. Pathological examples like the one you just mentioned may indeed be interesting at some point but seem low priority to me. But file them as extra test tickets, why not.

@awwright
Copy link
Member

it's explicitly the test suite telling anyone using it "stuff in this folder is written to be run against the draft whose folder it's in".

Even if this was the intention, there's no such thing as "running against a draft" (any more than I can parse JSON "against RFC 4627"). Sure, we can talk about how JSON was defined as of a specific publication. But this does not imply that a JSON parser (or a JSON Schema validator) should toggle between RFCs/publications/versions.

Could you speak more specifically to this argument (there's normally no way to distinguish drafts and the tests shouldn't invent this parameter)?

Unless someone disagrees with the next step here (add $schema to tests

What do you mean exactly? We can add a file that tests the $schema keyword in various combinations. But the bulk of the tests should omit it.

@handrews
Copy link
Contributor

@karenetheridge I was not ascribing intent to the authors of the test suite, and please note my apology to @Julian yesterday for not making that 100% clear, and allow me to extend it to you and everyone else who has worked on the test suite. It was an observation of the effect, regardless of the intent.

The different directories exist for the express purpose of demonstrating the different expected behaviours for each specification version.

Up to draft-07, the requirements around $schema were fairly weak, so it was necessary to have an external signifier. But that's no longer really the case. The separate directories still make sense purely on organizational grounds, but $schema should be what really does the work.

@gregsdennis
Copy link
Member Author

there's no such thing as "running against a draft" (any more than I can parse JSON "against RFC 4627"). Sure, we can talk about how JSON was defined as of a specific publication. - @awwright

This isn't true. Each version/draft of JSON Schema adds new functionality. As you pointed out with your draft-3/draft-7 example, boolean schemas aren't a thing in draft 3, but they are in draft 7. Similarly, drafts 6+ no longer recognize id as a keyword. Therefore it doesn't make sense to have a generic "JSON Schema" validator because the idea of JSON Schema is ever-evolving. A validator must know against what stage of that evolution it's supposed to perform its validation.

Regardless, I think that discussion, while it may feed into this one, should be handled elsewhere.

All of that said, while my validator does allow the user to define what draft a non-draft-specific schema should be processed in, making this test suite possible, I'm on the side of including $schema in the tests because:

  • some validators may not have this configurability
  • behavior on $schema-less schemas is undefined, meaning implementations can do whatever
  • $schema-less schemas may not be indicative of the real world
  • we definitely should test that $schema should be respected

@handrews
Copy link
Contributor

Even if this was the intention, there's no such thing as "running against a draft" (any more than I can parse JSON "against RFC 4627"). Sure, we can talk about how JSON was defined as of a specific publication. But this does not imply that a JSON parser (or a JSON Schema validator) should toggle between RFCs/publications/versions.

@awwright you have a different view on how this does or should work than most of the other people in the project, which is a point of discussion at ietf-wg-httpapi/mediatypes#20 among other places. That discussion should be had and settled elsewhere before we make decisions one way or another based on any of our views.

@Julian , it occurred to me to check the validation spec, and I see that it says:

The current URI for the default JSON Schema dialect meta-schema is https://json-schema.org/draft/2020-12/schema. For schema author convenience, this meta-schema describes a dialect consisting of all vocabularies defined in this specification and the JSON Schema Core specification, as well as two former keywords which are reserved for a transitional period. Individual vocabulary and vocabulary meta-schema URIs are given for each section below. Certain vocabularies are optional to support, which is explained in detail in the relevant sections.

It's possible to read that as "if your implementation is a validator, in the absence of $schema it SHOULD assume this meta-schema and associated vocabularies and behaviors." But it doesn't actually state a clear requirement like that. I double-checked the core spec, and it does not mention default meta-schemas.

I vaguely recall that I wanted to draw a sharper line between a generalized extensible JSON Schema implementation (where it would not make sense to default $schema) vs specialized ones, which would deem most "JSON Schema" implementations that currently exist to be specialized for validation.

It's pretty wobbly reasoning, and definitely not covered by proper normative language, but I could see interpreting this to indicate that the test suite, as a test suite for validators, implies a default $schema.

I would prefer to take a more general view, in hopes that the test suite will broaden its scope beyond validation, e.g. to annotation collection, although of course annotation collection is not mutually exclusive with validation and usually relies on it, so... ¯\(ツ)

Anyway, I have $schema added locally for 2020-12 although I need to write a little script to sanity-check it. But I can post a PR if we do decide we want to do that.

@karenetheridge
Copy link
Member

I don't see any practical effect from adding $schema everywhere, unless you also intend to collapse all the tests together into a single directory where it is impossible to tell which version of the spec it's written for without examining the $schema keyword. Everyone uses the appropriately-named directory/ies when testing their implementation already. Absolutely we should have more tests of the $schema keyword, but adding the keyword to all test cases isn't going to add any value.

Instead of adding the keyword everywhere, I think the README should be strengthened to clearly indicate that it is intended that each directory's tests are to be run against an implementation that is configured to use that specification version, and skipped if that version is not supported (there is no sense in me running the draft4 tests against my implementation, as I know there will be failures and I don't claim to support that version).

@Julian
Copy link
Member

Julian commented Jun 20, 2022

Even if this was the intention, there's no such thing as "running against a draft" (any more than I can parse JSON "against RFC 4627")

I disagree, and it seems there's context here where some of this discussion has occurred, but let's focus on the things which are relevant to this ticket, which concerns simply whether we add $schema to all tests, as it seems there's other discussions that Henry has linked for this part of the discussion.

It does seem that now both you and @karenetheridge are indeed disagreeing with the intended next steps here. Can one or both of you explain how you read the specification then? The section @handrews linked seems clear, so it seems the burden should be on you to explain it some other way.

Specifically, I hypothetically have written an implementation which can accept schemas written either for draft 7 semantics or draft2020 semantics. This hypothetical implementation will return "true", i.e. valid, for all instances, when presented with a schema which does not have $schema declared in it. Can you point to where in the specification says I may not do so?

Henry has pointed to:

If absent from the document root schema, the resulting behavior is implementation-defined_.

which to me seems to say very clearly that I may indeed decide to do so. And, if I may do so, I will have difficulty running the test suite under my implementation, because around half of the tests in the draft2020 folder will fail when run against my draft2020 validator!

So in short to me, the spec seems clear, and effectively means we MUST add $schema to all schemas, otherwise an implementation such as the above would not be able to use it. Yes, we then may add optional tests which say one can have an implementation which indeed operates identically without $schema, but the question doesn't seem to be about clarity, the spec literally seems to allow compliant implementations which cannot today run the suite.

What have I missed?

(n.b. we definitely should not collapse the suite into one giant file or directory, so that's not one of the options anyone's proposing I believe.)

@awwright
Copy link
Member

Each version/draft of JSON Schema adds new functionality

I was using JSON for simplicity; but my point holds for standards that add features over time. Consider ECMAScript: There's no standardized way to specify which ECMA publication is used for running code... You get whatever your particular version of V8 implemented, which this is usually sufficient regardless of the version of ECMA you wrote your code to.

As you pointed out with your draft-3/draft-7 example, boolean schemas aren't a thing in draft 3, but they are in draft 7.

I was trying to call attention to how the "type" keyword would have been ignored according to the draft-03 I-D, but this is no longer the case, so even though I'm specifying a draft-3 $schema, the newer behavior applies.

Therefore it doesn't make sense to have a generic "JSON Schema" validator because the idea of JSON Schema is ever-evolving.

Can you detail how this follows? For example, ECMAScript, or CSS have no version identifiers, but they add functionality with each new release.

@karenetheridge
Copy link
Member

I'll respond with another question: where in the specification does it say that "$schema": "https://json-schema.org/draft/2020-12/schema" should be used to infer draft2020-12 semantics, "$schema": "https://json-schema.org/draft/2019-09/schema" for draft2019-09 semantics, and so on? [I was surprised too.]

My conclusion is: we have no way of indicating whatsoever what version of the specification we intend the implementation to use when running tests. Adding a $schema keyword won't make a whit of difference there. If we want a particular version of the spec to be used, we have to say so out of band.

We should fix the spec to make this more clear: as I've described a few times before, I've assumed the algorithm that $schema keywords are followed sequentially until one gets to one where the $id matches the $schema, and then it's either a URI the implementation recognizes, in which case the spec version is derived from that, or it's not, and evaluation fails/dies. But the spec doesn't actually say any of this. The spec assumes that its version is the only one in existence at any time, and its publication has superceded all previous publications. This is in conflict with the test suite that very clearly acknowledges that multiple versions of the spec exist, and that implementations can support any number of them. Therefore: the test suite needs to be explicit about the intent for the tests in each directory.

@Julian

This comment was marked as off-topic.

@handrews
Copy link
Contributor

handrews commented Jun 20, 2022

@karenetheridge

I don't see any practical effect from adding $schema everywhere, unless you also intend to collapse all the tests together into a single directory where it is impossible to tell which version of the spec it's written for without examining the $schema keyword.

I'm not sure what you mean by "practical effect." The important thing here is testing the specification. How the tests are organized and configured is a secondary concern. It doesn't matter whether it's possible to figure the processing semantics out from outside of the specification (e.g. the directory structure) or not. If it is both possible to figure out the right processing semantics from within the spec, and a requirement that implementations do so correctly, then that is what we should test.


Perhaps the most important argument is that it's possible to write a conforming implementation that only uses $schema to determine the processing rules. I don't know of any that effectively require it (refusing to process without it) but it would be valid to do that. Regardless, a validator should only need to be explicitly configured with a draft if it can't figure things out from $schema.

Our test suite should properly tests validators that actually rely on $schema without forcing them to pre-configure the draft beforehand. Pre-configuring is a convenience workaround outside of the specification's requirements, and it's the specification we should test.


There's also, in my view, no real reason not to do it. Adding $schema does not interfere. It is at most a mild inconvenience while writing tests, as one usually isn't writing huge numbers of schema objects at once.

I just added it to all of 2020-12's test cases, by hand, in vi, no scripting, no IDE, no nothing. It took a few slightly tedious minutes, but I needed to stop staring at another problem so that was convenient. I'm happy to do the others. I understand @Julian's concerns over mass changes, and am also happy to do whatever verification work is needed to make everyone comfortable with it (I'm writing a little script that goes through and checks for missing or mismatched $schema URIs and will add the output to any PRs).


Finally, there is clearly a larger discussion to be had regarding how we expect implementations to process schemas. In my experience, people use $schema when they are sending schemas out to be consumed by others, because you never want to rely on people reading documentation to get the settings right. Keeping the configuration inside the system is far more reliable.

I believe that we should orient the test suite towards real-world usage rather than convenience for test authors. Particularly when the inconvenience is not that substantial.

@handrews
Copy link
Contributor

@karenetheridge

where in the specification does it say that "$schema": "https://json-schema.org/draft/2020-12/schema" should be used to infer draft 2020-12 semantics

From §8.1.1 The "$schema" keyword:

The "$schema" keyword is both used as a JSON Schema dialect identifier

I think this pretty clear. It identifies the dialect, the dialect identifies the semantics / processing rules. I will grant you that it is not detailed, and could use some "MUST" etc., but there's a lot of other material in the spec that reinforces this.

From §3 Overview:

A dialect is defined as a set of vocabularies and their required support identified in a meta-schema.

Plus the entirety of $vocabulary and the rules explained there. It's normatively-described functionality depends on and supports $schema being respected to determine the processing rules.

From §9.3.2 Differing and Default Dialects (under §9.3 Compound documents):

When multiple schema resources are present in a single document, schema resources which do not define with which dialect they should be processed MUST be processed with the same dialect as the enclosing resource.

Since any schema that can be referenced can also be embedded, embedded schema resources MAY specify different processing dialects using the "$schema" values from their enclosing resource.

This is pretty clear that $schema signals the processing rules per resource, as that's the only mechanism given to detect the change for an embedded resource. The "MAY" here is about not requiring $schema to redeclare the same processing rules all the time, it's not about it being optional to respect $schema in embedded resources.

All of this is about resources setting their own processing rules, not someone externally choosing them.

@handrews handrews self-assigned this Jun 20, 2022
@Relequestual
Copy link
Member

Adding $schema to every test schema where appropriate feels totally reasonable to me.
Mild inconvenicne, huge gain.
We can verify that such a sweeping change doesn't screw anything up.

(It would be interesting to see if this impacts any of the major implementations before merging any such PR, just out of cauction.)

@gregsdennis
Copy link
Member Author

I'm happy to download the change and run it on mine. Adding $schema will override the configuration I currently have to set, so it should give the desired effect.

@handrews
Copy link
Contributor

I'll post a PR in the next day or two. Got a bit distracted by some other stuff.

@gregsdennis
Copy link
Member Author

The original question for this issue was, "Do we need $schema in the test suite cases?" not "Should $schema be required?"

After (lengthy) discussions in Slack and some time to think, I've concluded that no, the test cases don't need $schema*.

Warning: reasoning ahead

Consider the schema {"type": "string"} included in the "draft 7" folder. My original point was that there is nothing stopping an implementation from running this as a 2019-09 schema.

But that doesn't matter as long as the result is correct. My question assumes that implementations will have "Draft X Compability" modes, but that's not the case, and it's not required.

Basically, if my validator is given that schema and ["array"] as an instance, and it returns invalid, then it's compliant for draft 7 for this scenario. It's also compliant (for this scenario) with drafts 3 through 2020-12.

If it's compliant for all test cases within a single draft folder, then it's said to be compliant with that draft.

None of this is dependent upon $schema being present.

If, in some future version, we change the semantics of type so that the above scenario is valid, then my implementation would not be compliant with that version. But again $schema doesn't need to be present in order for this to be determined. The fact that the test case expects a "valid" result and doesn't get one determines that the implementation is non-compliant for this future version.

The test suite isn't asking, "Can the implementation process a schema with meta-schema X?" (Maybe that's a question it should ask, but if so it should do it as distinct test cases.) It's asking, "Can the implementation give me the desired result as indicated by the specification?" That I have to configure my implementation to properly handle some test cases is outside the scope of the test suite. Secondarily, it's my responsibility to document that such a configuration needs to be performed before processing some cases.

* Caveat

Because the 2019-09 and 2020-12 spec allow for implementations to refuse to process schemas without the $schema keyword, it's conceivable that, in the test suite's current state, an implementation could refuse to process the entirety of these draft folders and still claim compliance with these drafts.

This needs to be fixed by adding $schema to the schemas in these folders. The other drafts don't need it for the reasoning above.

We will continue discussion on "expected behavior when $schema is missing for future versions" elsewhere.

@handrews
Copy link
Contributor

@gregsdennis I was only planning to do 2019-09 and 2020-12 in the PR, so I think we're good here. I had noted earlier that there is no language in draft-07 or earlier that says anything at all about what happens without $schema, and therefore I did not think it was strictly necessary to add it. While I would do that if it were just me, I'm fine with leaving it as-is for those drafts.

@Relequestual
Copy link
Member

In the wider discussion, we're conflating a few more issues than we realised (I think).

We have "compliance" and we have "support".

You can be "compliant" with a version of the spec while not "supporting" said version of the spec.

Currently, for 2019-09 and 2020-12, based on the discussion, we need to add $schema to those folders in order to test for support (As far as I can tell, we agree on this). Compliance, strictly, isn't being fully tested, because, as we've disccussed, if $schema is absent, all bets are off and anything goes (aka, implementation defined).

I'm going to strongly suggest that we have and ADR included in any PR that looks to resolve this issue, mostly to avoid us hashing this out again.
I appreciate that requires more work, and additional sign off, but I feel that's worthwhile if we can avoid revisiting the discussions of this week!

@awwright
Copy link
Member

I concur with #311 (comment)

However, if it's true that this effect was not intended, I think we should put this on hold and fix that first, since the resolution will affect this issue. json-schema-org/community#189

@Relequestual Can you offer a definition for these terms? It would seem to me that schemas are "compliant" or not, whereas validators have a range of features that they "support" (or don't support).

@handrews
Copy link
Contributor

@awwright

However, if it's true that this effect was not intended, I think we should put this on hold and fix that first, since the resolution will affect this issue.

Again, it was 100% intended that conforming implementations can refuse to process schemas that do not have $schema. That was the intent, three years or more ago when I made that change, got it reviewed, and incorporated into the spec, where it has been published in three consecutive drafts.

The only part that was unintentional was that I forgot to forbid completely arbitrary behavior. But that doesn't impact the reasoning here. Fixing that doesn't change anything about the need for $schema in 2019-09 and 2020-12. And I continue to believe that allowing refusal to process is correct.

Please stop treating the intended meaning of this clause as a bug. There are bugs related to it, but fundamentally it says what it was meant to say.

@Julian
Copy link
Member

Julian commented Jun 23, 2022

Please stop discussing. The change is accepted, it's the correct one, we can summarize the back and forth above in an ADR if helpful for documentation (I can do that). Henry send a PR whenever you can.

@Julian Julian changed the title Is there a reason we don't specify $schema in the test schemas? Draft 2019+ tests incorrectly depend on implementations supporting $schema-less schemas but they are not required to process them Jul 1, 2022
@Julian Julian added bug A test is wrong, or tooling is broken or buggy. and removed enhancement An enhancement to the tooling or structure of the suite (as opposed to a new test). labels Jul 2, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug A test is wrong, or tooling is broken or buggy.
Projects
None yet
7 participants