Proposal: don't allow the introduction of new (non-hidden) definitions in a closed struct #543

cueckoo · 2021-07-03T10:29:33Z

Originally opened by @mpvl in cuelang/cue#543

Background

Currently, for regular fields, an author of a definition can make backwards-compatible modifications without fear of breaking its user, as long as users do not use this definition in an embedding. This is a great property to have for large-scale engineering.

A backwards compatible change in general is when an API is modified to be more less specific, with the exception for definitions that existing fields may not be removed, unless paired with a catch-all [string]: T or .... This is because, by default, a user may not add a field to a closed struct, so removing a field from the API will break a usage of this definition that specifies this field.

Note that a user may still add fields to an existing definitions by embedding it. In this case, changes to original definition may break its usage. There is a clear distinction, however, between using something as an embedding or not, and being able to give guarantees on the basis or whether or not a definition is used as an embedding is a clear rule. Also vet rules could warn about the use of definitions from outside the current module as an embedding.

The Problem

The problem is that this desirable property does not hold for nested definitions. Consider this definition:

#D: {
    a: int
}

A user of this template could write

foo: #D & {
    #bar: int
}

introducing a new definition in the definition #D. This is allowed because the current closedness rules do not apply to definitions.

Suppose the original definition is now changed to

#D: {
    a: int
   #bar: string
}

This now results in a breakage of foo downstream. This may be fine if both #D and foo are maintained by a single owner, but it is problematic if they are maintained by different organizations. The problem is two-fold: on the one hand the user of #D has no guarantee that the maintainer of #D will not break their usage, while the maintainer of #D cannot introduce a definition in #D without knowing it may break a user.

Note that the same problem does not occur for this snippet

import "acme.com/pkg"

foo: pkg.#D & {
    _#bar: int
}

because hidden definitions are not visible across package boundaries and live in their own namespace (NOTE: not implemented/enforced yet).

Proposal

We propose disallowing adding definitions to closed structs. In the example above, this means that

#D: a: int

foo: #D
foo: #bar: int // proposal makes this illegal

would be illegal and result in a #bar not allowed error.

The user could still write

#D: a: int

foo: {
    #D
    #bar: int
}

resulting in the same issue. However, this makes the behavior consistent with regular field and results in a single guideline upon which non-breakage guarantees can rely.

Note that users can still also write

#D: {
    a: int
}

foo: #D
foo: _#bar: int

to work around the issue as well.

Other less tangible benefits of this proposal is that conceptually there are less scenarios in how fields behave, as it makes the behavior of definitions more like that of regular fields. This may result in simpler model for the user and results in simpler code.

We propose that the ... operator does not apply to definitions.

Impact and transition

This is a breaking change. An automated rewrite is somewhat complex and would likely need to operate on the semantic level. Detecting breakage cases should be fairly straightforward. An automated rewrite will not likely be as straightforward, but seems in the realm of possibilities. It may be that it requires user interaction to decide if an offending inclusion of a definition should be rewritten as a an embedding or a hidden definition. Always rewriting it as an embedding may be the closest to the original semantics, though.

To minimize changes,

foo: #D
foo: #bar: int

could be rewritten as

foo: { #D, #bar: _ }
foo: #bar: int

This would not be too hard if position information of conflicting values is accurate.

It would be good to first implement the scoping rules of hidden fields before rolling this change out.

Discussion

Default constraints

The spec currently allows ...T to be used in structs as well (not yet implemented). It seems to not make sense to apply T to definitions and regular fields. The most likely reason why one would want to add a definition in a certain scope is because it either is a common type for multiple fields (see #limit below) or a type that is part of an expression at various points in the struct's fields. In the first case, ...T would just be applied multiple time. There seem to be very few favorable outcomes for the latter.

As it doesn't make sense to allow ...T to apply to definitions, it seems consistent to not have ... apply for definitions.

Users can use embedding to work around the constraint of adding definitions.

Possible future extensions

Note that what is discussed in this section is NOT a proposal, but rather a discussion on how the remaining limitations could be addressed in the future in a backwards compatible way.

The introduction of definitions may be useful in some cases. For instance, an author of a definition may want to base its definition on another definition, but also introduce a definition with some common patterns for its users. For instance

// amce.com/foo
package foo

#A: {
  a: int
  b: 
}

// example.com/pkg
package pkg

import "acme.com/foo"

#E: foo.#A & {
   #limit: <10 // illegal under the new proposal
  a: #limit
  b: #limit
}

Users of #E could write

import "example.com/pkg"

e: pkg.#E & { #limit: <5 }

to collectively tighten the limit of all fields.

To work around the limitations of the new proposal, the author could either use embeddings or a hidden definition. In this case, however, neither are satisfactory. It cannot be hidden, because users of the package are supposed to modify #limit and this it cannot be hidden. Using embedding brings us back to square one, as the author of #A could introduce #limit later on and break #E.

To resolve this issue, we could introduce qualified definitions that live in the namespace of the package in which they are defined. Syntactically, this could look like:

// example.com/pkg
package pkg

import "acme.com/foo"

#E: foo.#A & {
  pkg#limit: <10
  a: pkg#limit
  b: pkg#limit
}

Here pkg#limit is a broadening of the identifier syntax, where a # may be preceded not just by _, but any valid regular CUE identifier. The identifier here either refers to the handle of the current package or an imported package, fully qualifying the namespace of the definition. So in the example, pkg#limit is qualified by the tuple ("example.com/pkg", #limit).

A usage of #E would look like:

import "example.com/pkg"

e: pkg.#E & { pkg#limit: <5 }

This mechanism would allow authors to introduce definitions in a backwards compatible way without worrying about breaking things downstream or being broken by an upstream change.

This mechanism would only apply to definitions and not to regular fields. Regular fields are considered what ends up ultimately in a generated configuration and inherently represent a flat space. Definitions don't have this restriction and can benefit from the scoping results as introduced in this paragraph.

Note the close relation of scoped definitions with hidden definitions: 1) both are allowed to introduce a new definition within a closed struct (must be in the current package in both cases), 2) both have an additional qualifier before the # in a definition name, 3) in both cases the meaning of this qualifier is some fully qualified package path (which may be the current package), 4) the implementation of both relies on qualifying a field based on package names available in the current file during compile time. Note that this is mechanism is almost identical to how exported and unexported fields work in Go.

We reiterate, that the discussion of scoped definitions is missing a lot of details and is intended to be a proposal, but serves as an example to allow scoped definitions in a world where closedness rules apply to non-scoped definitions.

The text was updated successfully, but these errors were encountered:

cueckoo · 2021-07-03T10:51:55Z

Original reply by @seh in cuelang/cue#543 (comment)

So in the example, pkg#limit is qualified by the tuple ("acme.com/foo", #limit).

Should that be ("example.com/pkg", #limit) instead? Isn't #limit being defined in example.com/pkg?

We reiterate, that the discussion of scoped definitions is missing a lot of details and is intended to be a proposal,

Earlier you said that this is not a proposal, but here you say it is a proposal. It sounds like a proposal.

cueckoo · 2021-07-03T10:51:55Z

Original reply by @mpvl in cuelang/cue#543 (comment)

@seh: yes, good catch (I've changed it in the original).

@seh: the scoped/ qualified definitions (pkg#limit) are not part of the proposal. The actual proposal only involves the disallowing new definitions in closed structs discussed in the "Proposal" section.

cueckoo · 2021-07-03T10:51:56Z

Original reply by @mpvl in cuelang/cue#543 (comment)

@seh, I grouped the non-proposal part in discussions under "possible future extensions"

cueckoo · 2021-07-03T10:51:58Z

Original reply by @eonpatapon in cuelang/cue#543 (comment)

Quite agree with this proposal as if the def is closed (ie not including ... and friends) it shouldn't allow the addition of regular and def fields.

But it's not clear to me why ... would allow only regular fields. With ... in the def you can have same the problem you described with regular fields.

As for the discussion part and the described used case:

#E: foo.#A & {
  #limit: <10 // illegal under the new proposal
  a: #limit
  b: #limit
}

What's the advantage of declaring #limit in #E ? It could be declared at the root and used in #E without breaking any rule.

cueckoo · 2021-07-03T10:51:58Z

Original reply by @myitcv in cuelang/cue#543 (comment)

What's the advantage of declaring #limit in #E ? It could be declared at the root and used in #E without breaking any rule.

I think the context @mpvl is looking to establish here is that the author of #E is not the same author/owner as foo.#A, hence you don't have that control/luxury. But beyond that, the concept of #limit is only something that makes sense in the context of #E as presented (else it would already have been defined in foo.#A).

cueckoo · 2021-07-03T10:51:59Z

Original reply by @myitcv in cuelang/cue#543 (comment)

But it's not clear to me why ... would allow only regular fields. With ... in the def you can have same the problem you described with regular fields.

My take on the proposal is that it does not feel/appear like there is a good rationale/use case that suggests it should apply to definitions, and that therefore this issue has been raised to try and solicit others' intuition/examples to test that hypothesis/position.

Asked another (less convoluted!) way, do you have an example/intuition as to why it should apply to definitions?

How does the package qualified definitions approach work to your mind?

cueckoo · 2021-07-03T10:52:00Z

Original reply by @eonpatapon in cuelang/cue#543 (comment)

What's the advantage of declaring #limit in #E ? It could be declared at the root and used in #E without breaking any rule.

I think the context @mpvl is looking to establish here is that the author of #E is not the same author/owner as foo.#A, hence you don't have that control/luxury. But beyond that, the concept of #limit is only something that makes sense in the context of #E as presented (else it would already have been defined in foo.#A).

Yes I understood that, so the author of #E could write:

#limit: <10
#E: foo.#A & {
  a: #limit
  b: #limit
}

He doesn't need any control on foo.

cueckoo · 2021-07-03T10:52:01Z

Original reply by @myitcv in cuelang/cue#543 (comment)

@eonpatapon apologies, I misinterpreted what you wrote:

It could be declared at the root

as meaning "could be declared on the root type, i.e. foo.#A". On second reading I can see that's not what you meant.

the author of #E could write:
#limit: <10
#E: foo.#A & {
  a: #limit
  b: #limit
}

Correct, or use hidden definitions (because sometimes the definition being declared within the scope of #E is significant):

#E: foo.#A & {
    _#limit: <10
    a: _#limit
    b: _#limit
}

cueckoo · 2021-07-03T10:52:01Z

Original reply by @eonpatapon in cuelang/cue#543 (comment)

But it's not clear to me why ... would allow only regular fields. With ... in the def you can have same the problem you described with regular fields.

Asked another (less convoluted!) way, do you have an example/intuition as to why it should apply to definitions?

No not really

How does the package qualified definitions approach work to your mind?

I'm just thinking that the meaning of ... would not be trivial to grasp as it would imply: any regular field, no definition fields but fully qualified definitions fields (if implemented).

My point is if an upstream definition includes ... then any changes to it (adding a regular field or definition field) could potentially brake downstream definitions. So I don't understand why ... would exclude definitions.

If an author writes the definition

#A: {
  a: int
  b: string
}

He is sure that he won't break any downstream usage when modifying #A as it's not possible for user to add any field or definition to #A (as proposed - which I'm ok with)

But if he writes:

#A: {
  a: int
  b: string
  ...
}

He clearly knows there could be some breakage. And excluding definitions in #A downstream doesn't help on that (at least partially).

cueckoo · 2021-07-03T10:52:04Z

Original reply by @mpvl in cuelang/cue#543 (comment)

@eonpatapon

What's the advantage of declaring #limit in #E ?

Sometimes one has to. Maybe a more concrete example would be the following list type:

#ListType: {
  #T: _
  #List: {
    value: #T
    next: #List
  }
}

Users could instantiate ListType by setting #T to create a list type with values of a certain type.
This is a parametrized linked list. Firstly, it needs to be pointed out that #T needs to be defined within ListType: if #T had been defined outside of #ListType, it would not have been possible to instantiate a new #ListType with a more specific type like that.

In general there are all kinds of analogous reasons why a definition has to be scoped within a certain struct.

Now, suppose a users wants to have a list of two values. One could augment the ListType as such:

#ListType2:  {
  #ListType
  #T2
  #List: value2: #T2
}

Still a rather hypothetical example, but you can see perhaps why it is necessary that #T2 be included in #ListType2 and not outside it.

cueckoo · 2021-07-03T10:52:04Z

Original reply by @mpvl in cuelang/cue#543 (comment)

@eonpatapon

"I'm just thinking that the meaning of ... would not be trivial to grasp as it would imply: any regular field, no definition fields but fully qualified definitions fields (if implemented)."

The rationale is based on that it has weird consequences.

Firstly, assume that ...T is implemented. This is already mentioned in the spec. It provides for more consistency between lists and structs and is necessary to model the nuances of JSON Schema in CUE.

Now suppose we would allow ... to open up a struct for definitions. Then I would find it very surprising if ...T did not apply to definitions too. After all, ... is defined as a shorthand of ..._.

If one stops and think about what ...T does, I would find it quite odd if it also applied to definitions. For instance, in the list example one introduces such type most likely to constrain fields within that struct. It does not make sense to have ...T apply to both the definitions and the values in that struct.

Now, even if ...T were to apply to definitions too, given the interplay between [string]: T and ...T, I then would also expect [string]: T to apply to definitions. But that's not possible, because it inherently matches string values, which live in a different namespace. IOW, I would expect [=~"^#"]: T to match regular fields, not definitions. But if ...T can match definitions, I would expect there to be some way to have [string]: T to apply to definitions too.

Anyway, a bit convoluted train of thought, perhaps, but what I'm trying to get at is that allowing ... to apply to definitions results in more warts and questions than not doing so.

All of these complications can be avoided by just not letting ... apply to definitions. The question is whether people see this will create a gap in functionality.

cueckoo · 2021-07-03T10:55:28Z

Original reply by @myitcv in cuelang/cue#543 (comment)

Re-opening because there was a partial revert in https://cue-review.googlesource.com/c/cue/+/8201

Open structs or structs with ellipsis should always allow fields, not just regular ones. The question is if this is really the case when we disallow definitions in closed struct by default, but it seems fair enough to allow this until this is the case as it is consistent with CUE's behavior. Fixes #2320 Issue #543 Signed-off-by: Marcel van Lohuizen <mpvl@gmail.com> Change-Id: I25a42948f569e81ddacc74c7467c98a287861bea Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/552331 Unity-Result: CUEcueckoo <cueckoo@cuelang.org> TryBot-Result: CUEcueckoo <cueckoo@cuelang.org> Reviewed-by: Roger Peppe <rogpeppe@gmail.com>

This old evaluator allowed new definitions in closed structs. This is not according to the spec. However, mimic this behavior in the new evaluator for now. Issue #2884 Issue #543 Signed-off-by: Marcel van Lohuizen <mpvl@gmail.com> Change-Id: Ia0c37fb7c915426f13ed091f948749f7c9ac9fbb

This old evaluator allowed new definitions in closed structs. This is not according to the spec. However, mimic this behavior in the new evaluator for now. Issue #2884 Issue #543 Signed-off-by: Marcel van Lohuizen <mpvl@gmail.com> Change-Id: Ia0c37fb7c915426f13ed091f948749f7c9ac9fbb Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/1191565 TryBot-Result: CUEcueckoo <cueckoo@cuelang.org> Unity-Result: CUE porcuepine <cue.porcuepine@gmail.com> Reviewed-by: Daniel Martí <mvdan@mvdan.cc>

The new evaluator computes closedness quite differently from the old one. This CL adapts the API to use the new data. Note that the data the new evaluator generates is considerably more convenient. That is why the logic is quite a bit simpler. The logic could probably be simplified more. Also note that HasEllipsis seems to be not computed correclty entirely. Althoug it seems that many tests improve if we add the check to isClosed or IsClosedStruct, there are some cases where this causes errors. This seems to be the case with one of the tests in TestAllows. As this only manifests itself when structure sharing is disabled, we just mark this test as todo and leave the investigation for later. Issue #3060 Issue #543 Issue #2884 Signed-off-by: Marcel van Lohuizen <mpvl@gmail.com> Change-Id: I98e8c25e0680674e2aaa9f743a49b39920d7b266

The new evaluator computes closedness quite differently from the old one. This CL adapts the API to use the new data. Note that the data the new evaluator generates is considerably more convenient. That is why the logic is quite a bit simpler. The logic could probably be simplified more. Also note that HasEllipsis seems to be not computed correclty entirely. Although it seems that many tests improve if we add the check to isClosed or IsClosedStruct, there are some cases where this causes errors. This seems to be the case with one of the tests in TestAllows. As this only manifests itself when structure sharing is disabled, we just mark this test as todo and leave the investigation for later. Issue #3060 Issue #543 Issue #2884 Signed-off-by: Marcel van Lohuizen <mpvl@gmail.com> Change-Id: I98e8c25e0680674e2aaa9f743a49b39920d7b266

The new evaluator computes closedness quite differently from the old one. This CL adapts the API to use the new data. Note that the data the new evaluator generates is considerably more convenient. That is why the logic is quite a bit simpler. The logic could probably be simplified more. Also note that HasEllipsis seems to be not computed correclty entirely. Although it seems that many tests improve if we add the check to isClosed or IsClosedStruct, there are some cases where this causes errors. This seems to be the case with one of the tests in TestAllows. As this only manifests itself when structure sharing is disabled, we just mark this test as todo and leave the investigation for later. Issue #3060 Issue #543 Issue #2884 Signed-off-by: Marcel van Lohuizen <mpvl@gmail.com> Change-Id: I98e8c25e0680674e2aaa9f743a49b39920d7b266 Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/1194080 Reviewed-by: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: CUEcueckoo <cueckoo@cuelang.org> Unity-Result: CUE porcuepine <cue.porcuepine@gmail.com>

cueckoo added Proposal roadmap/language-changes Specific tag for roadmap issue #339 labels Jul 3, 2021

cueckoo added this to the v0.5.0 language changes milestone Jul 3, 2021

cueckoo mentioned this issue Jul 3, 2021

"field not allowed" with 0.3.0-alpha3, not in 0.2.2 #539

Closed

cueckoo mentioned this issue Jul 3, 2021

Proposal: don't allow the introduction of new (non-hidden) definitions in a closed struct cuelang/cue#543

Closed

myitcv mentioned this issue Jan 13, 2022

Upcoming Cue language change might break Dagger configurations dagger/dagger#473

Closed

myitcv added the Discuss Requires maintainer discussion label Jan 13, 2022

myitcv added zzz Discuss and removed Discuss Requires maintainer discussion labels Jan 29, 2022

myitcv added Discuss Requires maintainer discussion and removed zzz Discuss labels Feb 9, 2022

jlongtine mentioned this issue Apr 7, 2022

dagger do should check for a valid plan dagger/dagger#1711

Closed

myitcv added the x/user/dagger.io Experimental CUE-user-based labels label Sep 14, 2022

myitcv mentioned this issue Oct 4, 2022

cue: misleading error report for definition misspelling with embedded scalar #1965

Open

rogpeppe mentioned this issue Mar 28, 2023

cue: unification of closed struct with definition fields fails #2320

Closed

mpvl modified the milestones: v0.6.0 required fields, v0.8.0: semantic adjustments Mar 29, 2023

myitcv added the zGarden label Jun 13, 2023

mvdan removed the zGarden label Feb 8, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Proposal: don't allow the introduction of new (non-hidden) definitions in a closed struct #543

Proposal: don't allow the introduction of new (non-hidden) definitions in a closed struct #543

cueckoo commented Jul 3, 2021

cueckoo commented Jul 3, 2021

cueckoo commented Jul 3, 2021

cueckoo commented Jul 3, 2021

cueckoo commented Jul 3, 2021

cueckoo commented Jul 3, 2021

cueckoo commented Jul 3, 2021

cueckoo commented Jul 3, 2021

cueckoo commented Jul 3, 2021

cueckoo commented Jul 3, 2021

cueckoo commented Jul 3, 2021

cueckoo commented Jul 3, 2021

cueckoo commented Jul 3, 2021

Proposal: don't allow the introduction of new (non-hidden) definitions in a closed struct #543

Proposal: don't allow the introduction of new (non-hidden) definitions in a closed struct #543

Comments

cueckoo commented Jul 3, 2021

Background

The Problem

Proposal

Impact and transition

Discussion

Default constraints

Possible future extensions

cueckoo commented Jul 3, 2021

cueckoo commented Jul 3, 2021

cueckoo commented Jul 3, 2021

cueckoo commented Jul 3, 2021

cueckoo commented Jul 3, 2021

cueckoo commented Jul 3, 2021

cueckoo commented Jul 3, 2021

cueckoo commented Jul 3, 2021

cueckoo commented Jul 3, 2021

cueckoo commented Jul 3, 2021

cueckoo commented Jul 3, 2021

cueckoo commented Jul 3, 2021