$vocabulary tests are incorrectly required #574

Closed
Julian opened this issue Jul 6, 2022 · 23 comments
Labels
bug A test is wrong, or tooling is broken or buggy.

Comments

@Julian
Member

Julian commented Jul 6, 2022

The vocabulary tests here are not strictly correct according to the spec, unless I'm missing something. They appear to assert that, if the validation vocabulary isn't present, an implementation must mark instances as valid even if the validation vocabulary says they are not -- but that's not the specified (or intended) behavior of $vocabulary as far as I know. Quoting §6.5:

Additional schema keywords and schema vocabularies MAY be defined by any entity. Save for explicit agreement, schema authors SHALL NOT expect these additional keywords and vocabularies to be supported by implementations that do not explicitly document such support.

I.e. a schema author may not depend on support for a keyword or vocabulary they use but do not place in $vocabulary, but an implementation may indeed offer support for it and enable it, either by always enabling the vocabulary or because it has chosen to add a keyword called "minimum" whose behavior is precisely the same as the validation vocabulary's, and then enable it by default regardless of what's in $vocabulary.

When the $vocabulary keyword does have mandatory effect is in the converse -- where an implementation lacks support for a vocabulary and a schema author requires its use, the implementation may not ignore those keywords:

The values of the object properties MUST be booleans. If the value is true, then implementations that do not recognize the vocabulary MUST refuse to process any schemas that declare this meta-schema with "$schema". If the value is false, implementations that do not recognize the vocabulary SHOULD proceed with processing such schemas. The value has no impact if the implementation understands the vocabulary.

from §8.1.2.
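
To make the quoted §8.1.2 rule concrete, here is a minimal Python sketch of the "refuse or proceed" decision. The function name, the exception class, and the `SUPPORTED` set are hypothetical, invented for illustration; only the vocabulary URIs come from the 2020-12 drafts. The sketch deliberately says nothing about vocabularies that are not listed at all, since that is the question this issue is about.

```python
CORE = "https://json-schema.org/draft/2020-12/vocab/core"
APPLICATOR = "https://json-schema.org/draft/2020-12/vocab/applicator"

# Assumption: the vocabularies some imaginary implementation understands.
SUPPORTED = {CORE, APPLICATOR}


class UnsupportedVocabulary(Exception):
    """Raised when a required vocabulary is not recognized."""


def check_vocabularies(metaschema, supported=SUPPORTED):
    """Return the declared vocabularies that this implementation recognizes."""
    recognized = set()
    for uri, required in metaschema.get("$vocabulary", {}).items():
        if not isinstance(required, bool):
            raise TypeError(f"$vocabulary value for {uri!r} must be a boolean")
        if uri in supported:
            # The true/false value has no impact when the vocabulary is understood.
            recognized.add(uri)
        elif required:
            # true + unrecognized: MUST refuse to process the schema.
            raise UnsupportedVocabulary(uri)
        # false + unrecognized: SHOULD proceed, so nothing to do here.
    return recognized
```

A meta-schema that lists an unknown vocabulary with `false` is processed; flipping that value to `true` makes the same call raise.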

TL;DR, an implementation given this schema:

{
            "$id": "https://schema/using/no/validation",
            "$schema": "http://localhost:1234/draft2020-12/metaschema-no-validation.json",
            "properties": {
                "badProperty": false,
                "numberProperty": {
                    "minimum": 10
                }
            }
}

(with metaschema here) is indeed free to still apply the validation vocabulary, or to similarly define some behavior for the minimum keyword which makes instances like 20 be invalid.
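
The disagreement can be illustrated with a deliberately tiny evaluator, sketched below in Python. It is an assumption-laden toy: `evaluate` and its `minimum_enabled` flag are hypothetical names, and it handles only boolean schemas, `properties`, and `minimum`. With the flag off it behaves the way the tests in question expect; with the flag on it behaves like an implementation that always applies `minimum`.

```python
SCHEMA = {
    "$id": "https://schema/using/no/validation",
    "$schema": "http://localhost:1234/draft2020-12/metaschema-no-validation.json",
    "properties": {
        "badProperty": False,
        "numberProperty": {"minimum": 10},
    },
}


def evaluate(schema, instance, minimum_enabled):
    """Toy evaluator: boolean schemas, `properties`, and optionally `minimum`."""
    if isinstance(schema, bool):
        return schema
    is_number = isinstance(instance, (int, float)) and not isinstance(instance, bool)
    if minimum_enabled and "minimum" in schema and is_number:
        if instance < schema["minimum"]:
            return False
    if isinstance(instance, dict):
        for name, subschema in schema.get("properties", {}).items():
            if name in instance and not evaluate(subschema, instance[name], minimum_enabled):
                return False
    return True
```

The two settings agree on `{"numberProperty": 20}` and on `{"badProperty": "x"}`; they differ only on an instance such as `{"numberProperty": 1}`, which is where the required tests and an always-on `minimum` part ways.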

In "today's test layout", the above means that these tests belong in optional, though given we have #561 on hold pending restructuring the optional/ directory, perhaps we instead should remove them and do the same with these?

CC @handrews (since I believe you confirmed the above interpretation previously, but just making sure) and @karenetheridge (since you added these looks like, in case you disagree).

@Julian Julian added the bug A test is wrong, or tooling is broken or buggy. label Jul 6, 2022
@jdesrosiers
Member

That's how I understand it as well. I agree that the test should be moved to "optional".

@handrews
Contributor

handrews commented Jul 6, 2022

Strong disagree. I will work up a more thorough response.

@karenetheridge
Member

Regardless of the outcome here, we should clean up the wording in the spec so it's more clear what the Right Thing is.

@Julian
Member Author

Julian commented Jul 6, 2022

Strong disagree. I will work up a more thorough response.

Ha. Quoting someone named @handrews from here :D

[When not part of a vocabulary declared], extra keywords are still allowed and ignored, and the part of the spec he quoted indicates that. $vocabulary says what is being used, but not that only those vocabularies are being used. This was intentional to allow casually adding keywords in informal settings, without having to construct a vocabulary, assign a URI, etc.

But will wait for your more thorough response to clarify :)

@handrews
Contributor

handrews commented Jul 7, 2022

@Julian thanks - it might be a few days before I have time to sort it out and explain (and make sure I'm actually right first).

@Julian
Member Author

Julian commented Aug 7, 2022

@handrews I know you're still doing what you can, but just a heads up, I'd like to clear a bunch more things out of the way (this and $schema-related changes) in the next few weeks so they don't sit here and languish. Will wait a bit longer in case you're able to get to them and otherwise likely move forward. Just in case, again, I hope you're not misinterpreting the issue title as "no $vocabulary support is mandatory", rather than what it really means which is "the tests we have are not for the mandatory portion of it".

@handrews
Contributor

handrews commented Aug 12, 2022

Determining where these test cases go involves determining two things:

  1. Does the specification require that vocabularies not present in $vocabulary not be used?
  2. Does the specification allow an always-on non-vocabulary extension keyword to take the place of a keyword from a vocabulary that is not in use?

What is $vocabulary actually specified to do?

This is stated most clearly in two places:

8.1. Meta-Schemas and Vocabularies

Two concepts, meta-schemas and vocabularies, are used to inform an implementation how to interpret a schema. Every schema has a meta-schema, which can be declared using the "$schema" keyword.

The meta-schema serves two purposes:

Declaring the vocabularies in use
The "$vocabulary" keyword, when it appears in a meta-schema, declares which vocabularies are available to be used in schemas that refer to that meta-schema. Vocabularies define keyword semantics, as well as their general syntax.

and then:

8.1.2. The "$vocabulary" Keyword

The "$vocabulary" keyword is used in meta-schemas to identify the vocabularies available for use in schemas described by that meta-schema. It is also used to indicate whether each vocabulary is required or optional, in the sense that an implementation MUST understand the required vocabularies in order to successfully process the schema. Together, this information forms a dialect. Any vocabulary that is understood by the implementation MUST be processed in a manner consistent with the semantic definitions contained within the vocabulary.

The section on default vocabularies provides some additional clues about expected behavior:

8.1.2.1. Default vocabularies

If "$vocabulary" is absent, an implementation MAY determine behavior based on the meta-schema if it is recognized from the URI value of the referring schema's "$schema" keyword. This is how behavior (such as Hyper-Schema usage) has been recognized prior to the existence of vocabularies.

If the meta-schema, as referenced by the schema, is not recognized, or is missing, then the behavior is implementation-defined. If the implementation proceeds with processing the schema, it MUST assume the use of the core vocabulary. If the implementation is built for a specific purpose, then it SHOULD assume the use of all of the most relevant vocabularies for that purpose.

For example, an implementation that is a validator SHOULD assume the use of all vocabularies in this specification and the companion Validation specification.

What's important here is that while there are conditions under which an implementation can (or even SHOULD) assume vocabularies, they are only relevant if:

  1. $vocabulary is absent,
  2. $schema is absent or not recognized (meaning that the implementation does not know what vocabularies the meta-schema was intended to imply), and
  3. the implementation decided to process the schema, which it is not required to do (the SHOULD is about which vocabularies to assume, not about whether to process or not — a purpose-built validator is just as free to decline to process as any other implementation)

There is nothing about assuming vocabularies when $vocabulary is present.
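
The three conditions above can be sketched as a small decision function. This is only an illustration of the reading argued in this comment, not spec text: the function name, its parameters, and the `known_dialects` mapping are hypothetical, and `VALIDATOR_DEFAULTS` is an assumed list of the 2020-12 vocabularies a validator might fall back to.

```python
CORE = "https://json-schema.org/draft/2020-12/vocab/core"

# Assumption: what a purpose-built validator falls back to (the SHOULD in 8.1.2.1).
VALIDATOR_DEFAULTS = frozenset({
    CORE,
    "https://json-schema.org/draft/2020-12/vocab/applicator",
    "https://json-schema.org/draft/2020-12/vocab/unevaluated",
    "https://json-schema.org/draft/2020-12/vocab/validation",
    "https://json-schema.org/draft/2020-12/vocab/format-annotation",
    "https://json-schema.org/draft/2020-12/vocab/content",
    "https://json-schema.org/draft/2020-12/vocab/meta-data",
})


def vocabularies_in_use(schema_uri, metaschema, known_dialects):
    """Pick the vocabulary set for a schema.

    schema_uri     -- value of the schema's "$schema" keyword, or None
    metaschema     -- the resolved meta-schema as a dict, or None if unavailable
    known_dialects -- hypothetical map of recognized meta-schema URIs to vocabulary lists
    """
    if metaschema is not None and "$vocabulary" in metaschema:
        # $vocabulary is present: use what is declared, assume nothing.
        return set(metaschema["$vocabulary"])
    if schema_uri in known_dialects:
        # $vocabulary absent but the meta-schema URI is recognized (MAY).
        return set(known_dialects[schema_uri])
    # Unrecognized or missing meta-schema, and we chose to proceed:
    # core MUST be assumed; a validator SHOULD assume the rest as well.
    return set(VALIDATOR_DEFAULTS)
```

Only the last branch assumes anything, and it is reached only when `$vocabulary` is absent and the meta-schema is missing or unrecognized.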

Another important set of requirements comes from the beginning of §8:

8. The JSON Schema Core Vocabulary

The Core vocabulary MUST be considered mandatory at all times, in order to bootstrap the processing of further vocabularies. Meta-schemas that use the "$vocabulary" (Section 8.1) keyword to declare the vocabularies in use MUST explicitly list the Core vocabulary, which MUST have a value of true indicating that it is required.

The behavior of a false value for this vocabulary (and only this vocabulary) is undefined, as is the behavior when "$vocabulary" is present but the Core vocabulary is not included. However, it is RECOMMENDED that implementations detect these cases and raise an error when they occur. It is not meaningful to declare that a meta-schema optionally uses Core.

It doesn't make sense to mandate the presence of the core vocabulary in $vocabulary unless leaving it out means that it would not be available. If it is set to false or left out, the behavior is undefined, which implies that for any other vocabulary, the behavior is defined. That is explicitly true for the false value, and this suggests that it was assumed to be true for omitted vocabularies as well.

The RECOMMENDED approach of raising an error is because the meta-schema is otherwise saying that you can't use the core vocabulary, which makes no sense. (Sadly, the individual vocabulary meta-schemas omit the core vocabulary, which... wtf was I thinking? It ends up being OK because they're not really intended to be used on their own.) So there's no reason, given this, to assume that an implementation can use the validation vocabulary if it's omitted.
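
A minimal sketch of the RECOMMENDED check, assuming a meta-schema that has already been loaded into a Python dict. The function name and the choice of `ValueError` are illustrative, not taken from any real implementation.

```python
CORE = "https://json-schema.org/draft/2020-12/vocab/core"


def check_core_declared(metaschema):
    """Raise if $vocabulary is present but does not list Core with value true."""
    vocabulary = metaschema.get("$vocabulary")
    if vocabulary is None:
        return  # No $vocabulary at all: the default-vocabulary rules apply instead.
    if vocabulary.get(CORE) is not True:
        # Covers both undefined cases: Core omitted, or Core set to false.
        raise ValueError("$vocabulary must list the Core vocabulary with value true")
```

Both undefined cases named in §8 (Core omitted, Core set to `false`) end up in the same error branch.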


Looking over the above text, it's unquestionable that the spec does not offer clear normative text requiring vocabularies omitted from $vocabulary to not be used. However, the phrase "identify the vocabularies available for use" hints at that by implying that other vocabularies are not available (otherwise why would we need to list the set at all?). I accept that it does not say so explicitly, but I argue that between that phrasing and the explicit statements about a few cases where vocabularies can be used without appearing in $vocabulary, there is at least some ambiguity here, so how do we resolve that?

Do old issue and PR comments support my reading of $vocabulary?

Test suite issue #439 "Document the test inclusion guidance/criteria" includes the following guidance:

additional tests MUST NOT attempt to clarify the specification itself independently for behavior that was not considered or proscribed by the specification. In the case of ambiguous text in the specification, the specification team SHOULD be consulted to confirm what behavior was intended. If the relevant scenario was clearly and specifically considered but the wording was unclear, tests MAY be added. Otherwise, the test MUST be deferred (i.e. not added with any expected result) until a specification with explicit decision on its behavior is published.

Part of "consulting the specification team" includes what I personally intended for $vocabulary, but what is more important is whether other members of the team involved in adding the feature understood and agreed with that intent.

I think it can be solidly established that the set of people most involved in adding and reviewing this feature (myself, Greg, Ben, and jgonzalesdr, with others chiming in here and there) all understood $vocabulary to be the complete, rather than minimum, set of usable vocabularies in the context of a given meta-schema, with the ability to turn off the standard validation feature seen as a valuable use case:

  • Move "applicability" keywords to core json-schema-spec#513 has an extensive (and exhausting) debate regarding using JSON Schema without the standard validation vocabulary. $vocabulary had not been specified yet, but we had the concept of modular vocabularies as a subject of discussion.

    • Evgeny's counter-argument was that the standard validation vocabulary should always be present. Several of us spent 50+ comments arguing against this, which at least establishes the idea that turning the vocabulary off somehow is seen as valuable. However, this issue avoids asserting that all implementations must be able to turn it off.
    • Relequestual, dlax, erayd, Anthropic, and I all presented or agreed with use cases for omitting the standard validation vocabulary. Some used alternate assertion vocabularies, and others avoided assertions entirely. One comment mentions substituting a different kind of type check, although it does not say that that would be done by redefining type (the relevance of this will be clear later).
  • Issue Core vocabulary? json-schema-spec#567 is all about whether you can or have to put the core vocabulary in $vocabulary. Most of the discussion is elsewhere (see the $vocabulary PR below), but we decided against letting implementations just use it without it being present. This decision only makes sense if omitting a vocabulary from $vocabulary means it wouldn't be used.

  • In PR 671 comment, Greg observes "It sounds like $vocabulary is kind of an equivalent of the JS imports statement for schema keywords. Unless a keyword is defined within one of the "imported" meta-schemas, a validator will ignore it." There's also some discussion of special handling if the core vocabulary is omitted.

  • Here, in another PR 671 discussion with Greg I discuss what happens if the core vocabulary is omitted, saying "You can't not use the core vocabulary- you need it to bootstrap processing (assuming you don't just hardcode it, which you can do because it's mandatory and I expect most people will). Basically, I expect implementations to ignore whether core is present in $vocabulary but some people will like having it there." which hopefully establishes that I believed that omitting a vocabulary from $vocabulary would mean actually not using it, which is why the core vocab required special discussion of this.

  • In yet another PR 671 comment Ben states "Assuming that validation is its own vocabulary, this means that any schema document that is for validation, MUST define it is using the validation vocabulary, right?"

    • This is in the context of when $vocabulary is absent, but it reflects the assumption that failing to declare a vocabulary makes it unavailable.
    • In a follow-up comment, Greg notes "A vocabulary is a just set of keywords. If I want to create a validation vocabulary with a completely disjoint set of keywords, I shouldn't be required to also include the standard set. Perhaps my vocabulary isn't disjoint. Maybe it overlaps and redefines a number of the keywords. In that case, it would be explicitly wrong of me to include the standard set due to conflicts." This statement makes it clear that an implementation that enabled the standard validation even if it is omitted would create a conflict with extension vocabularies. If implementations were allowed to just enable whatever they want, then the standard vocabularies would become a reserved namespace, and the spec makes clear that that is only true for core.
    • Two follow-ups below that, I note that "[vocabulary] assumptions are only allowed when $vocabulary is completely absent", referring to assumptions that a validator can make in the absence of $vocabulary, in what is now §8.1.2.1 "Default vocabularies". The section as published establishes what assumptions are allowed under what circumstances, while this comment clarifies that other assumptions were intended to be forbidden.
  • There were various PR 671 discussions, including this one about the wording that, by the end of the PR, became the "identify the vocabularies available for use" language. This particular thread was started by jgonzalesdr several days after one of the above threads, in which he participated, so folks had the context of omitted vocabularies being turned off, and none of us spotted the problem with the wording despite agreement on that aspect.

Non-schema extension keywords replicating the standard vocabularies

I think it should be pretty clear at this point that despite the lack of normative text, the active contributors on this feature all agreed that $vocabulary defines the complete set of available vocabularies, not the minimum.

So the question now is whether an implementation can define minimum (the keyword used in these test cases) as a non-vocabulary extension, with syntax and semantics identical to that of the validation vocabulary, and always have that keyword enabled regardless of what $vocabulary contains.

@Julian asserts that implementations can have always-on non-vocabulary extension keywords. This sort of extension long predates my involvement with the project, and Julian has a much deeper background with it, so I will defer to him on this point.

The remaining question is whether, in 2019-09 and 2020-12, it is sufficiently acceptable to do that with minimum (or any non-core standard vocabulary keyword). Let's walk through this.

  1. Any implementation running this test suite supports the validation vocabulary, by definition. It doesn't matter if it supports it by hardwiring it or by treating it as a plug-in just like all other non-core vocabularies (this is §6.5's distinction between "directly supporting" or not "directly supporting" a vocabulary).
  2. Omitting the validation vocabulary from $vocabulary means that the implementation cannot use it.
  3. It is not possible to detect the difference between using the vocabulary keyword and using an extension keyword that is identical in appearance and behavior to the vocabulary keyword. Since there is no requirement that vocabularies be implemented as plugins, there is no universal distinction between "this is the implementation of a vocabulary keyword" and "this is the implementation of a non-vocabulary extension." The only difference is that vocabularies have a mechanism for controlling whether they are available to a schema or not.

§6.5 says "Additional schema keywords and schema vocabularies MAY be defined by any entity." The key word here is "additional." There are two plausible readings of "additional":

  • additional beyond what the specification defines
  • additional beyond the keywords from the vocabularies that $vocabulary lists

While the core specification does not depend on the validation specification, it references its vocabularies when discussing default vocabularies in §8.1.2.1, and we've already established that any implementation running the test suite supports the validation vocabulary anyway.

I do not see a plausible argument that an implementation that supports the standard validation vocabulary can claim that a minimum keyword that effectively duplicates the same keyword from that vocabulary is an "additional" keyword in such a way that it would invalidate these test cases.

  • If it is truly identical, then it is the vocabulary keyword, and not "additional" at all, as the internal code structure is irrelevant. It's the vocabulary keyword that's being turned on despite $vocabulary indicating that it is turned off.
  • If it is not identical, but is always on, then it could be "additional" but it has redefined the keyword in a non-compliant way and won't pass the test suite anyway.
  • If it is not always on, but is automatically enabled if and only if the validation vocabulary is omitted from $vocabulary, are we really saying that that is a reasonable test configuration that should pass a required test?

There is no plausible reason that someone would do that other than to confound this test case. I don't think that invalidates the test case. There are other things in the required test suite that a truly dedicated person could find a way to break and still claim compliance. I don't think those cases should be moved to optional, and I don't think these should either.

@Julian
Member Author

Julian commented Aug 15, 2022

It seems like the new information in your comment, to summarize, is that spec authors intended (and agreed) that $vocabulary indeed restricts what vocabularies are available, and my reading of the spec is incorrect. Of course that's totally fine, and doesn't even bear me double checking those posts to see others intended this, I'm happy to take your word for it.

The second bit seems a bit shakier. There are a few points in there that no one (well, I at least) was making; just to avoid doubt I'll list them, so we can focus on the other bit:

If it is not identical, but is always on, then it could be "additional" but it has redefined the keyword in a non-compliant way and won't pass the test suite anyway.

If it is not always on, but is automatically enabled if and only if the validation vocabulary is omitted from $vocabulary, are we really saying that that is a reasonable test configuration that should pass a required test?

There is no plausible reason that someone would do that other than to confound this test case.

None of this was intended certainly.

Though neither is precisely how you wrote the first option:

If it is truly identical, then it is the vocabulary keyword, and not "additional" at all, as the internal code structure is irrelevant. It's the vocabulary keyword that's being turned on despite $vocabulary indicating that it is turned off.

It is to me additional in the sense that the rest of the vocabulary is not enabled. It thereby differs in behavior from minimum-as-part-of-the-validation-vocabulary, but not in a way that'd fail the tests. This is precisely the way to differentiate such a keyword "practically" from the vocabulary, because other keywords from the vocabulary in this hypothetical don't work.

@gregsdennis / @Relequestual on the minimum bit, was that your understanding as well? That the introduction of $vocabulary means essentially that keywords defined by core vocabularies are now "special" in some sense? Specifically, surely no one thinks that if I invent a foobar keyword outside of a vocabulary and turn it on by default in an implementation, that I am forbidden to start doing so once some other person adds a foobar keyword to a vocabulary they create? But here we apparently have minimum, a keyword usable independently, which one cannot turn on by default for whatever reason they come up with, because that keyword is part of one of the spec's vocabularies.

Keep in mind the suite's role isn't to decide just how pathological something is, nor whether such a thing is what we want to be true in future specs, just current ones -- in fact to me it was easier to think about this via type rather than minimum, even though I think if the answer is "it's allowed" for type then that should apply to minimum, they're no different from each other, and it may just require more imagination for minimum. Specifically -- can some implementer say "type is so useful I want it always available, my users always use it"?

If the answer is that that was the intention (that $vocabulary changed things in the above way) I'm indeed happy to close this.

@gregsdennis
Member

There's a lot going on in this issue, and I've completely lost track of who's arguing for what.

That said, if someone created a meta-schema that didn't include the validation vocab and then included type in a schema that uses that meta-schema, my implementation would ignore type.

If that meta-schema also required (value of true) a custom vocabulary that defined type, my implementation would need a type keyword implementation that is defined for that custom vocabulary.

In short, $vocabulary defines the keywords that are usable by the schema. If a vocab is absent, then those keywords aren't usable. The caveat is the core vocab, which (per 8.1.2.1) is implied when absent (unless that entire section is subject to the intro "if $vocabulary is absent...").
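
The behavior described in this comment can be sketched as a keyword registry keyed by vocabulary URI. Everything here is hypothetical and simplified: the custom vocabulary URI, its `type` semantics (`"text"` meaning a string), and the function names are invented for illustration, and handling of required-but-unknown vocabularies is left out.

```python
VALIDATION = "https://json-schema.org/draft/2020-12/vocab/validation"
CUSTOM = "https://example.com/vocab/custom"  # hypothetical vocabulary that redefines `type`


def standard_type(expected, instance):
    """Heavily simplified version of the standard `type` keyword."""
    checks = {
        "string": lambda v: isinstance(v, str),
        "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    }
    return checks[expected](instance)


def custom_type(expected, instance):
    """Made-up semantics for the hypothetical custom vocabulary's `type`."""
    return expected == "text" and isinstance(instance, str)


# Each vocabulary contributes its own keyword implementations.
REGISTRY = {
    VALIDATION: {"type": standard_type},
    CUSTOM: {"type": custom_type},
}


def active_keywords(metaschema):
    """Only keywords from vocabularies listed in $vocabulary are usable."""
    keywords = {}
    for uri in metaschema.get("$vocabulary", {}):
        keywords.update(REGISTRY.get(uri, {}))
    return keywords


def validate(schema, instance, metaschema):
    keywords = active_keywords(metaschema)
    # Keywords without an active implementation are simply skipped.
    return all(
        check(schema[name], instance)
        for name, check in keywords.items()
        if name in schema
    )
```

With a meta-schema that lists neither vocabulary, `{"type": "string"}` accepts `12` because `type` is ignored; listing the validation vocabulary makes the same schema reject it, and listing only the custom vocabulary switches `type` to the custom meaning.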


I think the takeaway is that the language could be better. Personally I like the idea that vocabs declare keywords that are available for use, which implies keywords defined by missing vocabs are unavailable.

@Julian
Member Author

Julian commented Aug 15, 2022

There indeed is a lot of back and forth -- to summarize, the final question isn't the scenario you mentioned about a concrete implementation, it's about implementers and ones with some specific need or desire to add extra keywords, not your specific implementation's behavior (or mine or whatever) -- specifically:

Is an implementer of JSON Schema allowed by the spec, in their own new implementation they're writing, to make their implementation always understand the type keyword, even when given meta schemas without the validation vocabulary?

Adding extra keywords beyond those given in a metaschema, and beyond $vocabulary which didn't exist obviously, used to be allowed in previous drafts (and some people did it if only to show off their own proposals for new keywords), and Henry is saying, if I'm not paraphrasing too heavily, that specifically for keywords present in our vocabularies, you and Ben and he intended to disallow this once vocabularies were introduced for those specific keywords.

I think the takeaway is that the language could be better. Personally I like the idea that vocabs declare keywords that are available for use, which implies keywords defined by missing vocabs are unavailable.

(I assume your line break was acknowledging this was tangential, but yeah, I believe there's no dispute that non-vocabulary keywords are allowed today, it's explicitly in the spec, so whether or not I agree with your last line for future specs -- n.b. I do, if $vocabulary behaves in the way Henry says it does it's weird to me that non-vocabulary keywords are allowed at all -- I think it probably doesn't have bearing on the current behavior.)

@gregsdennis
Member

Is an implementer of JSON Schema allowed by the spec, in their own new implementation they're writing, to make their implementation always understand the type keyword, even when given meta schemas without the validation vocabulary?

No, an implementation may not assume a keyword that is not defined in the vocabularies listed in the meta-schema. Section 8.1.2 explicitly forbids it:

Per 6.5, unrecognized keywords SHOULD be treated as annotations. This remains the case for keywords defined by unrecognized vocabularies. It is not currently possible to distinguish between unrecognized keywords that are defined in vocabularies from those that are not part of any vocabulary.

A keyword in a vocabulary not listed in $vocabulary must be considered unrecognized.

@Julian
Member Author

Julian commented Aug 15, 2022

I don't think even Henry (who doesn't go as far as I do) agrees with that, but it could be I'm just really off. But keywords outside of vocabularies are explicitly allowed by the section both he and I cited previously, from §6.5:

Additional schema keywords and schema vocabularies MAY be defined by any entity. Save for explicit agreement, schema authors SHALL NOT expect these additional keywords and vocabularies to be supported by implementations that do not explicitly document such support. Implementations SHOULD treat keywords they do not support as annotations, where the value of the keyword is the value of the annotation.

To me the word "unrecognized" in your passage clearly does not apply to keywords the implementation indeed recognizes. What you're citing is that if I present an implementation with awfeoije and it doesn't know what that means, the implementer cannot make their implementation blow up, it has to ignore the keyword it doesn't recognize. It doesn't say it can't indeed say "yup I know what awfeoije is". So I personally see no contradiction there, but I don't think anyone has previously argued that only vocabularies are allowed to add keywords.
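
The §6.5 rule both sides are citing can be sketched as follows, assuming a hypothetical `split_keywords` helper. The sketch shows only the uncontested mechanics: keywords the implementation does not support become annotations. Which keywords belong in `recognized` when their vocabulary is missing from $vocabulary is exactly the point in dispute, so the function takes that set as a parameter rather than deciding it.

```python
def split_keywords(schema, recognized):
    """Separate keywords the implementation applies from those it only annotates.

    `recognized` is the set of keyword names the implementation supports;
    who gets to put what in that set is the open question in this thread.
    """
    applied = {name: value for name, value in schema.items() if name in recognized}
    # Unsupported keywords are not errors: the keyword's value is the annotation value.
    annotations = {name: value for name, value in schema.items() if name not in recognized}
    return applied, annotations
```

Calling it on `{"minimum": 10, "awfeoije": 1}` with `recognized={"minimum"}` applies `minimum` and annotates `awfeoije`; with an empty `recognized` set both keywords are treated as annotations.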

@handrews
Contributor

handrews commented Aug 15, 2022

@Julian what I don't understand is why you are so dead-set on insisting that this particular configuration: an implementation that deliberately circumvents the intent of $vocabulary as its default setting through which it expects to pass the test suite is something the test suite should accommodate.

Is the test suite designed to test reasonable configurations or pathological ones? I'm defining a "pathological" implementation as one that exploits non-interoperable behavior to circumvent more clearly-defined behavior. The disabling of the validation vocabulary by omitting it from $vocabulary is clearly-defined behavior (the text is not as clear as it should have been, but as demonstrated, everyone who worked on getting that PR in had the same understanding of what that text means, and it is well within the common-language meaning of that text).

That is a choice that an implementation makes to go beyond the spec, into areas that are not interoperable. §6.5 makes it quite clear that this is outside of what is "normal" JSON Schema behavior: "Save for explicit agreement...", etc.

The test suite is correctly happy to rely on this elsewhere. The (again, correctly) required test that ensures that $id is not recognized in a location that is not known to be a schema relies on an unimplemented unknown keyword. But by the logic you're using here, the test suite MUST assume that any unknown keyword might be implemented and permanently enabled, and done so in such a way to make involving one in a test impossible. I hesitate to bring this up because I will be just as vehemently against moving tests/draft2020-12/unknownKeyword.json out of the required suite as I am about these $vocabulary tests, but in terms of reading what the spec allows around additional keywords in the most aggressive possible way, it's the exact same problem.

There are quite a few configurations or implementation choices that are not explicitly forbidden by the spec but that the test suite does not accommodate. The spec's wording technically allows ignoring certain uses of $ref because the language around noticing an $id and "automatically" resolving a $ref to it is at most a SHOULD. Are you going to exile all of those tests to "optional"? I would certainly hope not.

Because if we go that far, then (as you and I have discussed), the core spec starts falling apart entirely. The normative language just isn't clear enough. Tons of things in the core spec lack normative language entirely. There is no normative language around what it means to "identify" a schema with a URI.

None of this has to do with whether only vocabularies are allowed to add keywords. It has to do with how far implementations can go beyond what the spec states into murky, explicitly non-interoperable areas, and still expect to pass the test suite.

You keep saying that you don't want the test suite to be making subjective decisions, but there is unquestionably a subjective decision going on here, where you are willing to allow the test suite to assume a reasonable configuration and set of implementation choices for some tests in the required suite, but seem dead-set on forbidding such an assumption for $vocabulary. And I cannot understand why $vocabulary is being singled out for this treatment.

@handrews
Contributor

handrews commented Aug 15, 2022

An implementation that notices that the validation vocabulary is disabled and then automatically enables a non-vocabulary keyword that takes its place is not interoperable. It is also clearly built to circumvent the normal behavior of $vocabulary, because we're not even talking about an extension keyword that is always on. We're talking about one that is conditionally on when $vocabulary has excluded a standard vocabulary. And there is nothing in the spec addressing, much less encouraging, such a thing.

Yeah, I forgot to explicitly exclude it because it never occurred to me that someone would try to do that. Previous versions of the spec, of course, did not talk about conditionally automatically enabling extension keywords to circumvent other required behavior, because there wasn't behavior to circumvent.

So why is "conditionally automatically enable an extension keyword to circumvent the disabling of the validation vocabulary" the reading of the spec that we have to accommodate? Instead of "Yes it's possible to define minimum as a non-vocabulary extension keyword, but since it overlaps with a vocabulary keyword it clearly is not an extension keyword by default, therefore it is off by default, because conditional-automatic-only-if-validation-is-not-present invents a new category of extension behavior that never existed before"?

@Julian
Member Author

Julian commented Aug 15, 2022

An implementation that notices that the validation vocabulary is disabled and then automatically enables a non-vocabulary keyword that takes its place is not interoperable. It is also clearly built to circumvent the normal behavior of $vocabulary, because we're not even talking about an extension keyword that is always on.

Such a thing is again ridiculous, and not what I've said -- I again am focused strictly on an implementation where the keyword is always enabled, no matter what.

@Julian what I don't understand is why you are so dead-set on insisting that this particular configuration: an implementation that deliberately circumvents the intent of $vocabulary as its default setting through which it expects to pass the test suite is something the test suite should accommodate.

The absolute only thing I am focused on is what the spec means and intends, which does include some room to paper over stuff missing from the "letter" of the spec, but which does not mean the test suite is a way to enact new change -- the minute it's clear that's indeed what the spec authors (in this case you cited them) really meant and agree on, that's it. Here (and often) I represent some stubborn implementer who says it's not clear that the spec and its authors agree this isn't allowed. That is literally my most important role in maintaining this repo: to ensure the test suite isn't valueless or devalued because someone thinks it diverges from the specification, and I try to do it uniformly for any test.

I hesitate to bring this up because I will be just as vehemently against moving tests/draft2020-12/unknownKeyword.json

Guarding against an implementation adding a keyword called array_of_schemas isn't something I'd ever worry about -- though if someone came along and said "hey actually I have one of these", we'd probably change those tests to use a UUID or something, and that'd be enough. There's no need to say someone can continue to add random crazy validators which no one would add, and I think it's trivial to differentiate that from this case, where someone may indeed think they're doing good by enabling some basic keywords in all cases.

You keep saying that you don't want the test suite to be making subjective decisions, but there is unquestionably a subjective decision going on here, where you are willing to allow the test suite to assume a reasonable configuration

I haven't made any decision. I pointed out an area of lack of agreement first in the spec text based on my reading, and now between spec folks, starting with Jason, and then now Greg, who you'll forgive me for saying it so explicitly, is saying something very straightforwardly incorrect according to both the spec and the intention behind the spec. I then simply asked that if you all indeed agree this (== banning keywords present in the core vocabularies from behavior otherwise allowed) is the intention of the spec, and if that's indeed the case, that these can stay.

@handrews
Contributor

does not mean the test suite is a way to enact new change

This is not in any way a new change.

I again am focused strictly on an implementation where the keyword is always enabled, no matter what.

But that is not a thing that is possible with a standard keyword. It is either a standard keyword, and therefore under the control of the vocabulary system that was specifically designed to control such things, or it is not.

  • If it's a standard keyword, then $vocabulary omitting the vocabulary that defines it turns it off.
  • If it's not a standard keyword then it's violating the fundamental assumption of the test suite which is that we're testing something that implements the standard.
  • If it is not a standard keyword but behaves like a standard keyword, then it is pathological: It's not really the thing the test suite was designed to test (which is the standard set of vocabularies), but it is masquerading as one purely to confound the test suite. And if we allow that, then we can allow any standard keyword to be substituted by an always-on extension keyword, and what is the point of anything if that is the case? The idea of a standard vocabulary becomes meaningless.

I pointed out an area of lack of agreement first in the spec text based on my reading, and now between spec folks, starting with Jason,

Jason wasn't involved in the PR adding $vocabulary, or in any of the other issues I cited, so his interpretation is an after-the-fact thing.

and then now Greg, who you'll forgive me for saying it so explicitly, is saying something very straightforwardly incorrect according to both the spec and the intention behind the spec. I then simply asked that if you all indeed agree this (== banning keywords present in the core vocabularies from behavior otherwise allowed) is the intention of the spec, and if that's indeed the case, that these can stay.

I'm tempted to just say yes to get the outcome that I want, but in terms of Greg's exact wording that would be dishonest. I expected people to be allowed to implement additional keywords outside of the vocabulary system. But the standard validation vocabulary keyword minimum is not outside the vocabulary system. Or rather, it cannot be both outside and inside at the same time, which is the key point.

Therefore if we are truly testing standard implementations, then the only relevant minimum is the standard one, and these tests turn it off. It can't magically morph into something else that allows it to stay on. Yes, someone could configure a validator to not use the validation vocabulary, but to use instead some extension keywords that include some with the same names as those in the validation vocabulary, but that's not standard JSON Schema at all at that point.

On the other hand, if an implementation had an always-on non-vocabulary extension xml keyword, and we had a test that enabled the OAS 3.1 vocabulary, which also has an xml keyword, then turning the OAS vocabulary off would not remove the always-on non-vocabulary xml. Turning the OAS vocabulary on might not work if the two xmls have a different interpretation, in which case per §8.1.2 the keyword's semantics MUST match that of the vocabulary.

But minimum cannot be both additional and standard at the same time, and if it's not standard then the test suite isn't even relevant.

@Julian
Member Author

Julian commented Aug 15, 2022

You are asserting things that I've given possible alternate explanations for. I have no way of knowing which explanation is correct, as I myself didn't participate, and the spec doesn't say it. Your opinion is valuable, and it indeed weighs things. It immediately weighed things enough to have me believe $vocabulary works the way you're saying everyone agreed it should work at the time (re: restricting available vocabularies), regardless of what I read in words in the spec, precisely because you're saying everyone agreed to that bit. Your opinion on bits you're not asserting that for isn't absolute though, so you repeating what you claim doesn't help much there. Others participated clearly in deciding what the feature does. I want to hear others say they:

  • understand the nuance (i.e. which behaviors we're all agreed on, and which one we're talking about.)
  • agree with your assertion that standard keywords like minimum behave the way you say they do outside of vocabularies (i.e. can never be considered not part of a vocabulary) and that therefore:

If it's a standard keyword, then $vocabulary omitting the vocabulary that defines it turns it off.

despite not applying to xml, does apply to minimum. Again, I'll ask that you please try to avoid straw men (which I've pointed out 3 times now), since the other alternatives you give are things I've never claimed, are somewhat trivially ridiculous, and make it harder to distinguish what the point of clarity is for others to weigh in on.

If they are on board, great, then regardless of the language of the spec, this behavior was what was intended, and the tests stay. If they misunderstood or disagree with the outcome, then it's not the test suite's position to make a call there, and it's something that should be resolved in the spec before it's here. This isn't very complicated, all I want to know is others understand and agree.

@handrews
Contributor

agree with your assertion that standard keywords like minimum behave the way you say they do outside of vocabularies (i.e. can never be considered not part of a vocabulary)

That is not what I am saying. There is a difference between a straw man and us not understanding each other's arguments, and it is the latter that is happening here. I am trying to explain why the way you are looking at this is not the way that was intended, and that if we follow the way you are looking at it then we end up in a place that does not work.

I don't think your point ("can never be considered not part of a vocabulary") is an intentional straw man, but it's not my position.

minimum or type or any other keyword can be a non-vocabulary keyword.

They just cannot be both a vocabulary and non-vocabulary keyword at the same time, and only the vocabulary keyword is covered by the standard.

Do you see how that is different from "can never be considered not part of a vocabulary"?

@handrews
Contributor

handrews commented Aug 15, 2022

Is an implementer of JSON Schema allowed by the spec, in their own new implementation they're writing, to make their implementation always understand the type keyword, even when given meta schemas without the validation vocabulary?

So to answer your question directly, if an implementer does this, then they are not implementing the standard validation vocabulary. They are implementing something else that just so happens to look like it.

It is to me additional in the sense that the rest of the vocabulary is not enabled. It thereby differs in behavior from minimum-as-part-of-the-validation-vocabulary, but not in a way that'd fail the tests. This is precisely the way to differentiate such a keyword "practically" from the vocabulary, because other keywords from the vocabulary in this hypothetical don't work.

That differentiation proves that if the keyword still functions after the validation vocabulary is removed from $vocabulary, that it was never the standard validation vocabulary keyword in the first place, and is therefore outside of the spec and not what the test suite is testing.

This, I'm guessing, is why you read my arguments as straw men, because we fundamentally disagree on something here. The test suite tests implementations of the standard. There are things that are gray areas because they are not forbidden, but swapping out a standard keyword for a non-standard keyword violates the standard even if the mechanism used to do so is, in general, allowed, and even if the replacement happens to look enough like the standard to slip by the tests.

@handrews
Contributor

handrews commented Aug 15, 2022

if $vocabulary behaves in the way Henry says it does it's weird to me that non-vocabulary keywords are allowed at all

This is because I (and I assume @gregsdennis and @Relequestual) never contemplated the possibility of someone substituting non-vocabulary keywords for standard keywords in a way that can't be turned off. In my case, I had no idea that always-on extension keywords were considered valid when it came to assessing conformance in the first place (Greg, Ben, were either of you aware of this?).

Don't get me wrong: it's excellent QA thinking! From the perspective of my QA career I respect the thoroughness and reasoning. And I would have loved to have heard it 3 years and 8 months ago, when we could have easily fixed the wording to put it beyond doubt that this was not allowed, whether by exempting the standard vocabularies, or by forbidding always-on extension keywords, or whatever else.

But it's difficult to prove now, because how do you prove that you intended to forbid something that you didn't think was possible in the first place? (By which I mean "I did not (and do not) think it is possible to have such a minimum or type always on and considered a viable candidate for test suite conformance", not "it's not possible to redefine such keywords outside of vocabularies at all")

@gregsdennis
Member

Okay... I think I see what's going on here.

We have a (currently required) test:

// schema
{
  "$id": "https://schema/using/no/validation",
  "$schema": "http://localhost:1234/draft2020-12/metaschema-no-validation.json",
  "properties": {
    "badProperty": false,
    "numberProperty": {
      "minimum": 10
    }
  }
}

// instance
{
  "numberProperty": 1
}

The expected outcome from this test is that since the validation vocab is explicitly excluded, minimum is rendered an unknown keyword and thus ignored, meaning that this instance passes (perhaps counterintuitively).
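To make that expected outcome concrete, here is a minimal sketch of a toy evaluator that gates each keyword on whether its vocabulary is active. It is not how any particular implementation works: `is_valid`, `KEYWORD_VOCAB`, and the two-keyword subset are hypothetical, and it assumes the no-validation meta-schema's $vocabulary lists the applicator vocabulary but not the validation one.

```python
# Vocabulary URIs from the 2020-12 drafts; the keyword table is a toy subset.
APPLICATOR = "https://json-schema.org/draft/2020-12/vocab/applicator"
VALIDATION = "https://json-schema.org/draft/2020-12/vocab/validation"
KEYWORD_VOCAB = {"properties": APPLICATOR, "minimum": VALIDATION}


def is_valid(schema, instance, active_vocabs):
    """Hypothetical evaluator: a keyword applies only if its vocabulary is active."""
    if isinstance(schema, bool):
        return schema
    for keyword, value in schema.items():
        if KEYWORD_VOCAB.get(keyword) not in active_vocabs:
            continue  # treated as an unknown keyword and ignored
        if keyword == "minimum":
            is_number = isinstance(instance, (int, float)) and not isinstance(instance, bool)
            if is_number and instance < value:
                return False
        elif keyword == "properties" and isinstance(instance, dict):
            for name, subschema in value.items():
                if name in instance and not is_valid(subschema, instance[name], active_vocabs):
                    return False
    return True


schema = {
    "properties": {
        "badProperty": False,
        "numberProperty": {"minimum": 10},
    }
}
instance = {"numberProperty": 1}

print(is_valid(schema, instance, {APPLICATOR}))              # validation vocab excluded
print(is_valid(schema, instance, {APPLICATOR, VALIDATION}))  # validation vocab included
```

In this sketch the first call accepts the instance because minimum is skipped as unknown, while the second rejects it because minimum is evaluated.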


@Julian is saying that the specification allows (via not explicitly disallowing) that a JSON Schema implementation may have an internal, always-on implementation of minimum that functions identically to what the omitted vocab defines, and that such an implementation would fail this instance, thus failing the test, labelling it as non-conformant. However because there is no language that explicitly forbids an implementation from doing this, the inclusion of an always-on minimum should be permitted, meaning that this test should be optional (if not removed entirely), and we should allow such implementations to declare conformance.


I would say that writing a meta-schema that explicitly excludes a vocab is a very intentional act, and an implementation that doesn't allow an author to do this (as its default behavior) should be considered non-conformant.

The test suite should cover only what is required by the spec. So to show that this is in fact a requirement...

From the opening statement for section 8.1.2,

The "$vocabulary" keyword is used in meta-schemas to identify the vocabularies available for use in schemas described by that meta-schema.

it seems clear there exists an implication that keywords defined in a vocab that is not listed in $vocabulary are not considered "available for use." This MUST render keywords in excluded vocabs as unknown or unrecognized. This, to me, is sufficient to mean that omitting a vocabulary implies that its keywords are to be ignored. This is an implicit requirement of the specification.

Consider my data or unique-keys vocabs. If I use the 2020-12 meta-schema, which doesn't list either of these vocabs, it is expected that an implementation ignore these keywords, even if it understands them.

Similarly, if a meta-schema excludes the validation vocab, it must be expected that an implementation ignore those keywords.

The explicit exclusion of a vocabulary implies that its keywords MUST be ignored.

This is not to say that an implementation can't be configured so that a keyword from an excluded vocab is "always-on." But it does mean that this cannot be the default behavior.
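The distinction between default behavior and an explicit opt-in can be sketched roughly as follows. This is only an illustration: `recognized_keywords` and its `always_on` parameter are hypothetical names, not any implementation's actual API, and the keyword set is a toy subset.

```python
# Toy subset of the validation vocabulary's keywords.
VALIDATION_KEYWORDS = frozenset({"minimum", "type"})


def recognized_keywords(validation_declared, always_on=frozenset()):
    """Hypothetical helper: which validation keywords an implementation honors.

    `always_on` models an opt-in, non-default configuration.
    """
    from_vocabulary = VALIDATION_KEYWORDS if validation_declared else frozenset()
    return from_vocabulary | frozenset(always_on)


# Default configuration with the validation vocab excluded: nothing is honored.
print(sorted(recognized_keywords(validation_declared=False)))
# Explicit opt-in: the user has asked for minimum to stay on.
print(sorted(recognized_keywords(validation_declared=False, always_on={"minimum"})))
```

Under this reading, only the first call reflects what the required tests exercise; the second is a user-selected configuration outside their scope.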

@Julian
Member Author

Julian commented Aug 17, 2022

Fair enough we can leave this here. Thanks for the input folks.

@karenetheridge
Member

I do not see a plausible argument that an implementation that supports the standard validation vocabulary can claim that a minimum keyword that effectively duplicates the same keyword from that vocabulary is an "additional" keyword in such a way that it would invalidate these test cases.

Can/should we state in the specification that redefining any keyword that exists in the specification is strictly prohibited? E.g. in the example in the GitHub thread, if someone wants to define new semantics for "minimum" in their implementation, and exclude the validation vocabulary from a metaschema, they need to pick a new name for that keyword, because "minimum" is taken.

For that matter, I don't really like the idea of "always on" keywords that aren't defined by a vocabulary. We [that is, great minds not including me] invented the concept of vocabularies for a reason; having a keyword outside of that framework reduces interoperability to zero... but that might be a more controversial opinion?
