This repository has been archived by the owner on Nov 18, 2021. It is now read-only.

Proposal: required fields and related issues #822

Closed
mpvl opened this issue Mar 10, 2021 · 46 comments

mpvl commented Mar 10, 2021

With extensive inputs from @myitcv

We propose a couple of small additions to the language, as well as a sharpening of the language specification, to solve a myriad of shortcomings related to specifying required and optional fields. We believe the result allows for a much more natural representation of required fields in CUE. This proposal also allows us to define tooling in a more consistent manner and lays the foundation for finalizing the query proposal.

The changes are mostly backwards compatible, but:

  • may allow configurations that before would fail (where the failure was likely unintended)
  • may allow configurations that worked in v0.2 and early v0.3, but fail as of v0.3.0-beta.2 and later.
  • may require the use of the -c flag for cue eval and cue export to get the same results with the old representation. This also makes the behavior of these commands consistent with that of cue vet.

Automated tooling can be provided to rewrite CUE files.

Background

We now cover some core CUE concepts needed to understand this proposal. In some cases, we also point out general issues with these topics. People familiar with these concepts can skip this section.

Closedness

CUE closedness is intended to solve two problems:

  • be able to catch typos in field names
  • be able to represent mutual exclusivity of fields, like in Protos

A specific non-goal of defining closed structs is to define APIs that promise to never add an additional field. CUE does not offer a structured way to specify this, as it is believed that APIs should be open by definition.

A good analogy for the role of closedness in CUE is Protobuf. By definition, protobufs are extensible: fields with unknown tags in incoming messages should typically be ignored. However, when a proto definition is compiled to, say, Go, the set of fields defined on a proto message is closed, and an attempt to access an unknown field of the generated struct results in a compile error. In CUE, the fact that a definition closes a struct should be interpreted in the sense of a compiled proto message, not as an indication that an API is closed for extension.

In that sense, using closedness to enforce the mutual exclusivity of protobuf oneOf fields is a bit of an abuse of this mechanism. Although it has served the representation of protos quite well, it may help to consider alternatives.

Error types

To understand the proposal below, it is important to realize that CUE distinguishes between several error modes:

  1. compile errors: any error that can never be resolved and can be spotted before evaluation. This may occur in otherwise syntactically valid CUE, such as string + "foo".
  2. permanent errors: an error that occurs during evaluation and cannot be solved by adding more values to the configuration.
  3. incomplete errors: an error that could be solved by adding more values, that is, by making a CUE value more specific.

In the end these are all errors, but CUE evaluation and manifestation will fail at different points when encountering these.

Evaluation modes

CUE distinguishes the following evaluation modes:

Unification

Unification combines two CUE values, preserving the entire space of possible values that are the intersection of the two input values. Unification is commutative, idempotent, and associative.

For instance, cue def presents the result of unification without any simplifications like picking defaults or resolving references if this would change the meaning of the CUE program.

There can be multiple ways to represent the same schema. For instance, bool is equivalent to true | false. The goal of unification is not to find a normalized representation of the result. Finding an optimal representation, by some definition, is an option, but not a necessity.

Default selection

Default selection is used when the full specification of a value must be preserved, but an intermediate concrete value is needed to proceed. Essentially, this happens for references of the form a.b and a["b"], where a needs to be a concrete list or struct to be able to select the appropriate value.

The steps are as follows:

  1. eliminate values with incomplete errors from disjunctions
  2. select default values

The differentiating property of default selection is that all optional fields, pattern constraints, and closedness information need to be preserved. This limits the amount of disambiguation that can be done ahead of time. It is the user's responsibility to disambiguate structs if applicable.
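The two steps above can be sketched as a toy Python model. This is not how CUE implements default selection. The `Disjunct` class and its `default` and `incomplete` flags are hypothetical stand-ins: one for a disjunct marked with `*`, the other for a disjunct whose evaluation produced an incomplete error.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class Disjunct:
    """Hypothetical toy representation of one disjunct of a disjunction."""
    value: Any
    default: bool = False     # the disjunct was marked with `*`
    incomplete: bool = False  # evaluating it produced an incomplete error


def select_default(disjuncts: List[Disjunct]) -> List[Disjunct]:
    """Apply the two default-selection steps to a disjunction."""
    # Step 1: eliminate disjuncts with incomplete errors.
    remaining = [d for d in disjuncts if not d.incomplete]
    # Step 2: select the default values, if any survived step 1.
    defaults = [d for d in remaining if d.default]
    return defaults or remaining


# *1 | 2 selects 1; 1 | 2 stays ambiguous and is left for the user to resolve.
print([d.value for d in select_default([Disjunct(1, default=True), Disjunct(2)])])
print([d.value for d in select_default([Disjunct(1), Disjunct(2)])])
```

If more than one disjunct survives, the value remains ambiguous. Per the paragraph above, resolving that ambiguity is left to the user rather than done here.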

Manifestation

Manifestation projects a result from unification to a concrete value. It is used in cue export, but also, for instance, for:

  • expressions: x and y in x + y
  • elements projected by a query: x.* (see the query proposal, Proposal: CUE Querying extension #165)
  • arguments for builtins that must be concrete
  • when outputting concrete values, like converting to JSON

We will write manifest(x) to denote the manifested value of x. This is not a CUE builtin.

Right now this encompasses:

  1. default selection
  2. eliminate duplicate values from disjunctions.

Note that after this elimination, structs that differ only in their optional fields are still retained as separate disjuncts.

The Issues

We present some issues for the various use cases of CUE that motivated this proposal.

Encourage good API design

When evolving and augmenting an API, it is generally strongly recommended not to introduce backwards incompatible changes. This means that any fields added to an API should typically be optional.

In CUE, optional fields are indicated by a ?. This means that when augmenting an API, one should always remember to add the ?. Especially for larger APIs, required fields should be relatively rare. It would be better if only the required fields had to be marked, while all other fields are optional. This would avoid inadvertently making fields required, and would make it more likely that making a field required is a deliberate choice.

Specifying a minimum and maximum number of required fields

The protobuf oneOf representation

OneOf fields in protobufs are currently represented as

#Foo: {
    {} | { a?: int } | { b?: int }
}

This relies on all three disjuncts ultimately collapsing into the same value, assuming that optional fields can be ignored, and on closedness to enforce mutual exclusivity.

The following could be used as an alternative.

*{} | { a: int } | { b: int }

This approach is a bit of an abuse of the closedness mechanism, whose predominant goal is to catch typos. It mostly works, although it does have a few issues. For instance, embedding #Foo would allow users to redefine a oneOf field, allowing it to be used in configurations that were not allowed by #Foo. To some extent this is okay, considering what closedness is for and the fact that embedding disables this check. But it may be nice to address if possible.

In the end, there is currently no explicit “official” way of specifying oneOf fields in CUE that captures the most likely intent. CUE currently employs a hacky workaround for this, but that is not a permanent solution.

Specifying a certain number of required fields

The above mechanism does not generalize to specifying that, for instance, at least one of a set of fields should be given by a user.

Policy definitions and querying

One of the core ideas of CUE is that, with the right model, constraints can function as templates, resulting in a powerful combination. The way optional and required fields are marked now works fairly well in this context. These same mechanisms don’t work as smoothly in the context of querying and policy definitions, that is, using CUE to specify filters or additional constraints.

There are currently some gaps in the query proposal that can only be addressed with some refinement in the optionality model in CUE.

Requiring fields

Requiring the user to specify a field is currently achieved by declaring the field with a non-concrete value and then requiring all values to be concrete when exporting. Note that to CUE, non-concrete values are just values. So, in effect, this mechanism is also an abuse of the current semantics. It would be better to mark fields as required explicitly.

Requiring concrete values

A related problem is the ability to require the user to specify a concrete value. This is currently not possible. For instance

#Def: intList: [...int]

says that intList must be a list of integers. But since [...int] defaults to [], CUE will happily fill in this value if a user doesn't specify it. Similarly, it is not possible to require a user to specify a particular concrete value: unifying a user-supplied value with

#X: {
    kind: "X"
    ...
}

would just happily fill in the missing kind.

Error messages

A side-effect of having a mechanism to explicitly specify a field as required is that it will also lead to better error messages. Currently, when CUE discovers a non-concrete value, it just reports that: it has no way of knowing the author's intent.

Having an error message explicitly state that the error is a missing field would be considerably clearer.

Evaluation

Disambiguating disjuncts

CUE currently only disambiguates identical disjuncts. This means that disjuncts with different optional fields will not be disambiguated. The following is an amended example from #353:

e: #Example

e: a: "hello"

#Example: {
	a: string
	{
		value?: string
	} | {
		externalValue?: string
	}
}

As the two values have different sets of optional fields, they are not considered equal.
This can result in confusing error messages, like incomplete value {a: 1} | {a: 1}, for instance when running a command that does not show optional fields, like cue eval or cue export.
CUE does not disambiguate such values because disambiguating optional fields is NP-complete in the general case. It is nonetheless confusing for the user.
A similar issue occurs with regular (non-optional) fields.
This can be solved by giving users better tools to disambiguate as well as having more permissive disambiguation.

cue commands

Avoid confusion for users

CUE currently allows non-concrete values in the output. Unifying

a: int
b?: int

and

b: 2

succeeds with vet, because for CUE int is a valid value. The -c option verifies that values are concrete. For cue export, however, values must be concrete to succeed.

Overall, there are some discrepancies in how optionality and non-concreteness are handled across commands. A more consistent handling across commands would be useful.

What is cue eval for? How does it differ from cue export? More clarity for CUE commands

cue eval prints a form of CUE that is somewhat in between schema and concrete data. It may not even be correct CUE. The purpose of this was to get good feedback for debugging. But as CUE evolved this output has become increasingly confusing. A clearer model of output modes is in order.

A more consistent output with more explicit debugging options shared between commands seems less confusing.

Proposal

This section gives a brief overview of the proposal. Details are given in the next section.

Required fields

We introduce the concept of a required field, written as foo!: bar, which requires that the field be unified with a namesake regular field (not required and not optional) that has a concrete value. A field that violates this constraint results in an “incomplete” error (the failure could be resolved by adding a concrete value for the field).

Consider this simple example:

#person: {
    name!: string
    age: int
}

jack: #person & {
    name: string // incomplete error; string is not concrete
    age:  int    // ok
}

Further examples can be seen below.

A required field can be referenced as if it were a regular field:

a!: string
b: a   // string

c!: "foo"
d: c   // "foo"

A required field may have a concrete value. The same limitation holds for such fields: they must be unified with a regular field that has exactly that concrete value.

numexist builtin

We introduce a numexist builtin with the signature

numexist(<num>, <expr>+)

which takes a numeric constraint and a variable list of expressions and returns:

  • _ if the number of expressions evaluating to a concrete value unifies with num;
  • _|_ otherwise, where the error is an incomplete error if the number could still be satisfied by having more concrete values.

When evaluating the expressions, incomplete errors are ignored and counted as 0.

For instance, a protobuf oneOf with fields named a and b could be written as:

#P1: {
    numexist(<=1, a, b)
    a: int
    b: int
}

There is nothing special about the arguments a and b here; they are just resolved as usual. In this case, numexist passes because both a and b evaluate to int, and thus the number of concrete values is 0, which matches <=1. It would fail with a fatal error if a and b were both, say, 1.

Requiring that at least one of a set of fields be concrete can be written as:

#P2: {
    numexist(>=1, a, b)
    a: int
    b: int
}

In this case, numexist fails with an “incomplete” error, as the condition can still be resolved by setting either a, b, or both to a concrete value.

Ignoring incomplete errors is needed to allow the same construct for fields that have struct values:

#P3: {
    numexist(<=1, a, b)
    a?: #Struct
    b?: #Struct
}
#Struct: { c: int }

Without making a and b optional fields, they would both always evaluate to a concrete value, as structs are always concrete. For consistency, it is probably good style to always mark fields referred to by this builtin as optional.

When optional fields are used, as in #P3 above, we could consider another builtin numexists that checks for the existence of a reference, instead of concreteness.
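A toy Python model of the counting rule may make the three outcomes easier to see. It is a sketch under stated assumptions, not an implementation of CUE:

  • `Kind` is a hypothetical stand-in for a non-concrete value such as `int`;
  • each expression is modelled as a zero-argument function;
  • the hypothetical `Incomplete` exception plays the role of an incomplete error, such as an optional field that was never set;
  • the numeric constraint is passed as a predicate.

The result is "incomplete" when some larger count, reachable by making the configuration more specific, would still satisfy the constraint.

```python
from typing import Callable


class Kind:
    """Toy stand-in for a non-concrete value such as `int`."""
    def __init__(self, name: str):
        self.name = name


class Incomplete(Exception):
    """Toy stand-in for an incomplete error, e.g. an unset optional field."""


def is_concrete(v: object) -> bool:
    # Anything that is not a Kind counts as concrete, including structs (dicts).
    return not isinstance(v, Kind)


def numexist(num: Callable[[int], bool], *exprs: Callable[[], object]) -> str:
    """Toy model of the proposed numexist(<num>, <expr>+).

    Returns "ok" (meaning `_`), "incomplete", or "fatal".
    """
    count = 0
    for expr in exprs:
        try:
            if is_concrete(expr()):
                count += 1
        except Incomplete:
            pass  # incomplete errors are ignored and counted as 0
    if num(count):
        return "ok"
    # A more specific configuration can only increase the count.
    if any(num(n) for n in range(count + 1, len(exprs) + 1)):
        return "incomplete"
    return "fatal"


INT = Kind("int")


def unset() -> object:
    raise Incomplete("optional field not set")


print(numexist(lambda n: n <= 1, lambda: INT, lambda: INT))  # #P1: ok
print(numexist(lambda n: n <= 1, lambda: 1, lambda: 1))      # #P1 with a: 1, b: 1: fatal
print(numexist(lambda n: n >= 1, lambda: INT, lambda: INT))  # #P2: incomplete
print(numexist(lambda n: n <= 1, unset, unset))              # #P3, nothing set: ok
```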

More specific manifestation

We propose adjusting manifestation as follows (steps 2 and 3 are new):

  1. default selection
  2. eliminate optional fields, pattern constraints
  3. [optional]: drop definitions, hidden fields and/or fields with non-concrete values.
  4. eliminate duplicate values from disjunctions.

Step 3 is optional and only affects the outcome insofar as it may lead to different disambiguation of structs. Which variant to choose may depend on the output mode requested by the user.

Unlike with unification, there is logically only one correct representation (disregarding field order for now).
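The effect of step 2 on the example from #353 can be sketched with a toy Python model. It is hypothetical, not CUE's implementation, and uses these conventions:

  • a struct is a dict, and optional fields carry a trailing `?` in their key;
  • `Kind` marks a non-concrete value;
  • default selection (step 1) is assumed to have already happened;
  • step 3 is reduced to dropping non-concrete fields.

```python
from typing import Dict, List


class Kind:
    """Toy stand-in for a non-concrete value such as `string`."""
    def __init__(self, name: str):
        self.name = name

    def __eq__(self, other):
        return isinstance(other, Kind) and other.name == self.name

    def __hash__(self):
        return hash(self.name)


def dedupe(disjuncts: List[Dict]) -> List[Dict]:
    """Eliminate duplicate (equal) disjuncts, keeping the first occurrence."""
    result: List[Dict] = []
    for d in disjuncts:
        if d not in result:
            result.append(d)
    return result


def manifest_proposed(disjuncts: List[Dict], drop_non_concrete: bool = True) -> List[Dict]:
    # Step 2: eliminate optional fields (keys ending in "?").
    stripped = [{k: v for k, v in d.items() if not k.endswith("?")} for d in disjuncts]
    # Step 3 (optional): drop fields with non-concrete values.
    if drop_non_concrete:
        stripped = [{k: v for k, v in d.items() if not isinstance(v, Kind)} for d in stripped]
    # Step 4: eliminate duplicate values.
    return dedupe(stripped)


STRING = Kind("string")
e = [
    {"a": "hello", "value?": STRING},
    {"a": "hello", "externalValue?": STRING},
]
print(len(dedupe(e)))             # only identical disjuncts merge: 2 remain
print(manifest_proposed(e))       # proposed steps: [{'a': 'hello'}]
```

Deduplicating only identical disjuncts keeps both alternatives, because their optional fields differ. After step 2, both reduce to {a: "hello"} and collapse into one.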

Tooling alignment

The introduction of required fields and a more specific manifestation can be used to simplify the commands of the tooling layer.

The biggest change is that we will generally not interpret non-concrete fields as errors, making them behave somewhat like optional fields do today. The old behavior would still be available through the -c option. Some commands, like cue vet, have always taken this interpretation. So this change will bring more consistency between commands, while addressing many of the issues above.

Details of the command alignment are discussed below.

Detailed design and examples

Required fields

Semantics

A required field, denoted foo!: bar, requires that the field be unified with one or more regular fields (not required and not optional) giving a concrete result. A field that violates this constraint results in an “incomplete” error.

For instance, consider this example:

x!: {
    a!: string
    b: int
}

This would require that the user specify field x which must have a field a with a concrete value. Dropping the ! from x would mean that users don’t have to specify a field x but when they do, they also need to specify a concrete value for a.

The required constraint is a property of fields. Here are some examples of how required fields unify with regular fields and other required fields:

{foo!: int} & {foo: int}   	→  {foo!: int}
{foo!: int} & {foo: <=3}   	→  {foo!: <=3}
{foo!: int} & {foo: 3}  	→  {foo: 3}

{foo!: 3} & {foo: int}   	→  {foo!: 3}
{foo!: 3} & {foo: <=3}   	→  {foo!: 3} & {foo: <=3}
{foo!: 3} & {foo: 3}  	→  {foo: 3}

A total ordering of all types of fields can be expressed as

foo?: int > foo: int > foo!: int > foo!: 1 > foo: 1

Note that {foo!: 3} & {foo: <=3} cannot be simplified further. The definition requires that the values of the regular fields together be concrete. Logically, {foo: <=3} could later be unified with {foo: >=3} to become {foo: 3} (which denotes the same set of numbers), satisfying the requirement. Rewriting {foo!: 3} & {foo: <=3} as either {foo!: 3} or {foo: 3} would therefore result in loss of information. Retaining this distinction is important to keep associativity and a valid lattice structure.

Limitations

The definition imposes some limitations on the use of the ! flag. For instance, if a definition were to have both a field foo!: 3 and foo: 3, the latter would remove the requirement for the user to specify a field, because foo: 3 satisfies the requirement of foo!: 3 to provide a concrete value. It doesn't matter that this comes from the same definition. In general, when marking a field as required, it is advisable that all other constraints on the same field in a definition (or at least those with concrete values) also specify the !. These cases could be caught with a cue vet rule, requiring that in a single definition all fields must be either required or not. This would also help readability.

Implementation

We now present a possible implementation. Understanding this section is not key to the proposal.

The implementation of required fields could be done with an “internal builtin” __isconcrete defined as follows:

__isconcrete(x), when unified with a value y, returns:

  • a fatal error if unification fails;
  • _ if it succeeds and y is concrete;
  • an incomplete error otherwise.

This would be implemented along the lines of current validators (like <10) and the proposed must builtin.

Example rewrites:

foo!: int   	→   foo: __isconcrete(int)
foo!: 3   	→   foo: __isconcrete(3)
foo!: {name: int} 	→   foo: __isconcrete({}) & {name: int}
foo!: [...int]	→   foo: __isconcrete([...]) & [...int]

Also, __isconcrete(int) & __isconcrete(3) → __isconcrete(3).
After this conversion, unification proceeds along the usual lines.
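The three-way result of `__isconcrete` can be modelled in a few lines of Python. This is a deliberately tiny, hypothetical sketch:

  • `Kind(int)` stands for a basic type such as `int`;
  • `unify` is a toy stand-in for CUE unification that only handles basic kinds and scalars;
  • `y` stands for the combined value of the regular fields.

As in the definition above, the check looks at whether `y` is concrete, not at the unified result. This is why `foo!: 3` unified with `foo: int` still yields an incomplete error.

```python
class Kind:
    """Toy stand-in for a non-concrete basic type such as `int`."""
    def __init__(self, py_type: type):
        self.py_type = py_type


class Bottom(Exception):
    """Toy stand-in for a failed unification (_|_)."""


def unify(x, y):
    """Unify two toy values; only basic kinds and scalars are supported."""
    if isinstance(x, Kind) and isinstance(y, Kind):
        if x.py_type is not y.py_type:
            raise Bottom("conflicting kinds")
        return x
    if isinstance(x, Kind) or isinstance(y, Kind):
        kind, val = (x, y) if isinstance(x, Kind) else (y, x)
        if type(val) is not kind.py_type:
            raise Bottom("value does not match kind")
        return val
    if x != y:
        raise Bottom("conflicting values")
    return x


def isconcrete(x):
    """Return a validator modelling the proposed internal __isconcrete(x)."""
    def check(y) -> str:
        try:
            unify(x, y)
        except Bottom:
            return "fatal"       # unification failed
        if isinstance(y, Kind):
            return "incomplete"  # y is not concrete (yet)
        return "ok"              # succeeds and y is concrete: `_`
    return check


INT = Kind(int)
print(isconcrete(INT)(3), isconcrete(INT)(INT), isconcrete(INT)("x"))  # foo!: int -> ok incomplete fatal
print(isconcrete(3)(INT), isconcrete(3)(3), isconcrete(3)(4))          # foo!: 3   -> incomplete ok fatal
```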

We opted for a syntactic addition to the language, rather than having users use this builtin directly, as its usage is rather confusing. For nested structures we only care about concreteness at the top level. So using __isconcrete({name: int}) would work, but it would do unnecessary work and also give the impression that the field name itself should be concrete.

Another reason is that people will usually think of this functionality as specifying a required field. Technically speaking, though, as CUE treats non-concrete values as “present”, it should be called “concrete” and not “required”. The use of ! avoids this issue.

Implications for ?

Note that with the addition of !, the use of ? would be eliminated in most cases. As we saw for the proposal for numconcrete, however, there are still some use cases for it. For this reason, as well as for backwards compatibility, we expect that ? will stay.

Example: querying

Regardless of whether subsumption or unification is used for querying (see #165), it is currently not possible to specify that a field should be concrete and restricted to a certain set of values.

However, with the ! constraint, we can write

a.[_:{name!: =~"^[a-z]"}]

to query all values of a that have a name field starting with a lowercase letter.

Note that the ! would not be necessary when using subsumption and querying with concrete values like name: "foo". It would also not be necessary if all values in a can be assumed to be concrete or if including non-concrete values is desirable.
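The intended result of the query a.[_:{name!: =~"^[a-z]"}] can be illustrated with a plain Python filter. Concrete values are ordinary Python data, and a hypothetical `Kind` marker represents non-concrete ones. Only elements whose `name` is present, concrete, and matching are returned. This illustrates the result only; it does not show how the query proposal would be implemented.

```python
import re
from typing import Dict, List


class Kind:
    """Toy stand-in for a non-concrete value such as `string`."""
    def __init__(self, name: str):
        self.name = name


def query_lowercase_names(a: Dict[str, dict]) -> List[str]:
    """Return the keys of `a` whose `name` field is concrete and starts lowercase."""
    matches = []
    for key, elem in a.items():
        name = elem.get("name")
        # `name!` requires a concrete value: a missing field or a Kind is rejected.
        if isinstance(name, str) and re.match(r"^[a-z]", name):
            matches.append(key)
    return matches


a = {
    "x": {"name": "foo"},
    "y": {"name": "Bar"},
    "z": {"name": Kind("string")},  # non-concrete: excluded because of `!`
    "w": {},                        # no name field at all
}
print(query_lowercase_names(a))  # ['x']
```

Without the `!`, a unification-based query would presumably also return `z`, since `string` unifies with `=~"^[a-z]"`.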

Example: policy specification

It is currently awkward to express that a value should satisfy one of several alternatives. For instance:

{a: >10} | {b: <10}

would match both variants if a user didn’t specify any of the fields (assuming a and b are valid fields).

Using the exclamation mark ({a!: >10} | {b!: <10}) would explicitly require that one of these fields be present.

Example: require user to specify values

This proposal allows for requiring a configuration to specify concrete values:

  • a!: [...]
  • a!: {...}
  • a!: 1

This can be useful, for instance, to require the user to explicitly specify a discriminator field.

Example: discriminator fields

The use of ! can signal to the evaluator the presence of discriminator fields. This could form the basis of a substantial performance boost for many configurations. Tooling could later help to annotate configurations with such fields (detecting discriminator fields is akin to anti-unification).

Consider:

#Object: {
    kind!: string
}
#Service: #Object & {
    kind!: "Service" // Require the user to explicitly specify this.
}

In this case, the user is required to explicitly specify the discriminator field. But even if the ! were dropped from #Service.kind as a convenience to the user, the ! from #Object would still signal that this is likely a discriminator field. The same holds if an instantiation like #MyService: #Service & {kind: "Service"} makes the field concrete.

Example: at least one of

The required field can be used to specify that at least one of a collection of fields should be concrete:

#T: {
    {a!: _} | {b!: _}
    a: int
    b: int
}

If both are concrete, the resulting values will be identical and duplicate elimination will take care of the disambiguation.

numconcrete builtin

The numconcrete builtin allows mutual exclusivity of values where this would be hard or expensive to express using the required field construct introduced above. It has the additional benefit that it more directly conveys the intent of the user.

In many cases it is possible to express the required field annotation in terms of this builtin. For instance, a!: int could be written as

numconcrete(1, a)
a: int

However, it is not possible to express requiring the user to specify a specific concrete value, such as a!: 1. For this reason, as well as for convenience in specifying required fields in policy definitions, it makes sense to have both constructs.

Things are a bit trickier when requiring fields of structs or lists to be concrete. As these are always concrete, all values would pass.

Need for ?

In the current proposal, we rely on ? to allow specifying a non-concrete struct for a field. Alternatively, this could be done by a builtin, like must:

#P: {
    numconcrete(<=1, a, b)
    a: must(a&#Struct1)
    b: must(b&#Struct2)
}

or some dedicated builtin. The must builtin (as proposed in #575) would essentially remain incomplete as long as it is not unified with a concrete value.

The use of ?, though, seems more elegant.

numvalid

Considering the need for ?, it may be useful to also have a builtin numexist which counts the number of valid references (values that are not an error). This has the advantage that it will enforce the consistent use of ? for those fields that are part of a oneOf, making them stand out a bit more. In the case of Protobuffers, this works well, and may more accurately reflect intent.

We considered using numvalid instead of numexist, but it does not cover all cases correctly. A separate builtin proposal should lay out all the builtins and their relevant consistency.

Naming

numexist seems to accurately describe what the builtin does. It may not be the most intuitive naming, though. numrequired seems on the surface to be a likely candidate, but it would be a misnomer, as this is specifically not what that means. (In fact, the term required field is also not quite correct, even though it conveys its predominant use case.)

Another possible name, albeit less descriptive, could be numof. This may be confusing, however, as it alludes to the JSON Schema constructs anyOf and oneOf, which are different in nature. In that sense, numof(count, ...expr) would seem to indicate a disjunction (like the or builtin), where the user can indicate the number of disjuncts that should unify.

Implementation

numconcrete is a so-called non-monotonic construct: a failing condition may be negated by making a configuration more specific. CUE handles these constructs in a separate per-node post-evaluation validation phase. It must be careful to not return a permanent error if a resolution of the constraint is still possible. In this regard, it will be similar to the implementation of struct.MinFields and struct.MaxFields.

Other than that, the implementation is straightforward: evaluate the expressions, count the number of concrete values (non-recursively), counting incomplete errors as 0. The latter is necessary to allow non-required fields of structs and lists to fill this pattern.

vet rules

It may be surprising to the user that

numexist(>=1, a, b)
a: {}
b: {}

always passes ({} is always considered concrete). A vet rule should detect use cases where an argument is always concrete, and suggest the use of ? or possibly numexists.
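Such a vet check could be sketched in Python over a hypothetical, already-parsed list of field declarations. Each declaration records whether the field is optional and the kind of its declared value. A `numexist` argument is flagged when it refers to a regular (non-optional) field whose value is a struct or list, since such an argument always counts as concrete. The `FieldDecl` type and `always_concrete_args` function are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FieldDecl:
    """Hypothetical parsed field declaration."""
    optional: bool
    value_kind: str  # e.g. "struct", "list", "int", "string"


def always_concrete_args(args: List[str], fields: Dict[str, FieldDecl]) -> List[str]:
    """Return numexist arguments that can never be non-concrete."""
    flagged = []
    for name in args:
        decl = fields.get(name)
        if decl is None:
            continue  # not a local field; leave it to other checks
        # Struct and list values are always concrete unless the field is optional.
        if decl.value_kind in ("struct", "list") and not decl.optional:
            flagged.append(name)
    return flagged


# numexist(>=1, a, b) with a: {} and b?: {}
fields = {"a": FieldDecl(False, "struct"), "b": FieldDecl(True, "struct")}
print(always_concrete_args(["a", "b"], fields))  # ['a']
```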

Example: protobuf

To annotate protobuf oneOfs, under this proposal one could write

#X: {
    *{} | {a!: int} | {b!: int}
}

Note that this doesn't change all that much compared to the current model, and the current model would still work. The main difference is that it gives the CUE evaluator a stronger hint for early elimination of alternatives.

This notation also doesn’t address the embedding issue: it is still possible to add a field that before was mutually exclusive, even overriding its original type. For instance:

#Y: {
    #X
    a: string
}

would evaluate to

#Y: *{a: string} | {a: string, b!: int}

Arguably, this comes with the territory of using embedding, which gives the power of extension, but disables checking: it is already the responsibility of the user to embed with care. Also, one can imagine having a vet check to guard against such likely mistaken use.

That said, the numexist builtin would allow writing the above proto definition as:

#X: {
    numexist(<=1, a, b)
    a?: int
    b?: int
}

The use of optional fields is unnecessary here, but they are needed to cover the general case, for instance if a and b were fields with a struct value. One advantage is that this sets these fields visually apart from fields that are not subject to concreteness counts.

Note how this closely resembles the “structural form” as used for OpenAPI for CRDs.

The resulting type cannot be extended to redefine a. Also, the definition gets rid of disjunctions, and defines the intent of the constraint more clearly. This, in turn, can help the CUE evaluator be more performant.

Evaluation modes

We now briefly discuss the phases of CUE evaluation, to show how they will interplay with optional and non-concrete fields.

Manifestation

We proposed that in the manifestation phase we disambiguate disjuncts based on concrete values only. It will be important to not leak this final step into earlier phases of evaluation, such as default selection. Doing so may, for instance, cause a disjunct with arbitrary optional fields to be used for closedness checks.

Consider this example:

a: { b: {foo?: >=1} } | { b: { foo?: <1 } } 

Purely on the basis of concrete values, these two are identical. However, simply picking the first or second when resolving a.b would give different results for a.b & {foo: 1}.

Doing disambiguation early, however, has quite considerable performance benefits. An implementation can work around this by clearly marking a result as ambiguous. For instance, deduped elements can have counts of the number of values that were collapsed into it.
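One way to read the suggestion about counts is sketched below in Python, using the same toy convention that optional fields are dict keys ending in `?`. Disjuncts are grouped by their concrete projection. Each group keeps the originals it absorbed, so a later selection such as a.b can still detect that the result is ambiguous. The functions are hypothetical and only illustrate the bookkeeping.

```python
from typing import List, Tuple


def concrete_projection(v):
    """Drop optional fields (keys ending in '?'), recursively."""
    if isinstance(v, dict):
        return {k: concrete_projection(x) for k, x in v.items() if not k.endswith("?")}
    return v


def dedupe_with_counts(disjuncts: List[dict]) -> List[Tuple[dict, List[dict]]]:
    """Group disjuncts by concrete projection, keeping the originals of each group."""
    groups: List[Tuple[dict, List[dict]]] = []
    for d in disjuncts:
        proj = concrete_projection(d)
        for key, members in groups:
            if key == proj:
                members.append(d)  # collapsed into an existing element
                break
        else:
            groups.append((proj, [d]))
    return groups


# a: { b: {foo?: >=1} } | { b: { foo?: <1 } }   (constraints kept as strings)
a = [{"b": {"foo?": ">=1"}}, {"b": {"foo?": "<1"}}]
for proj, members in dedupe_with_counts(a):
    print(proj, "count:", len(members), "ambiguous:", len(members) > 1)
```

The single group prints with a count of 2, so resolving a.b would know not to pick either original arbitrarily.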

Example: Disambiguating disjuncts.

With the new semantics of manifestation, the current example from issue #353 resolves as expected: consider

e: #Example

e: a: "hello"

#Example: {
	a: string
	{ value?: string } | { externalValue?: string }
}

In this case both disjuncts in the resulting disjunction will have the same concrete values. This will even be the case for non-optional fields if we allow non-concrete fields to be ignored.

Example: Schema simplification

A typical Kubernetes definition now looks like

#Object: {
    kind: string
    foo?: int
    bar?: int
} 

littered with question marks.

Under this proposal, the above can be written as

#Object: {
    kind!: string
    foo: int
    bar: int
} 

This eliminates the majority of uses for ? and marks required fields more explicitly.

Printing modes

  • cue
  • cue export
  • cue def

cue printing

Most uses of cue seem to involve cue export or cue eval in similar ways.

We propose a “default command” cue <selection> which uses manifestation with the following specification:

  • Non-concrete values are dropped by default. Disjunctions are disambiguated based on their value after removal.
  • Lists and structs that were only defined in definitions and that have no elements or fields with concrete values, recursively, are dropped from the output.

Omitting non-concrete values from the output avoids flooding the printing from non-concrete fields merged in from definitions. For similar reasons CUE omits optional fields (those with a ?) from the output today. So the choice to omit non-concrete values is a logical consequence of allowing users to omit the use of ? in the majority of cases in schema.
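The two pruning rules can be sketched as a recursive Python function. In this hypothetical model, `Kind` marks a non-concrete value. `DefStruct` and `DefList` mark structs and lists that were only defined in definitions; tracking that origin is assumed here, not shown.

```python
class Kind:
    """Toy stand-in for a non-concrete value such as `int`."""
    def __init__(self, name: str):
        self.name = name


class DefStruct(dict):
    """A struct that was only defined in definitions (hypothetical marker)."""


class DefList(list):
    """A list that was only defined in definitions (hypothetical marker)."""


_DROP = object()  # sentinel: omit this value from the output


def prune(v):
    """Apply the proposed default printing rules to a toy value."""
    if isinstance(v, Kind):
        return _DROP  # non-concrete values are dropped
    if isinstance(v, dict):
        out = {}
        for k, x in v.items():
            p = prune(x)
            if p is not _DROP:
                out[k] = p
        # Drop definition-only structs left without concrete fields.
        if not out and isinstance(v, DefStruct):
            return _DROP
        return out
    if isinstance(v, list):
        out_list = [p for p in (prune(x) for x in v) if p is not _DROP]
        if not out_list and isinstance(v, DefList):
            return _DROP
        return out_list
    return v


config = {
    "kind": "Service",
    "replicas": Kind("int"),
    "spec": DefStruct({"ports": DefList(), "selector": DefStruct({"app": Kind("string")})}),
    "labels": {},  # written by the user, so kept even though empty
}
print(prune(config))  # {'kind': 'Service', 'labels': {}}
```

The sketch omits the disambiguation of disjunctions after removal, which would reuse the duplicate elimination of the manifestation steps.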

Ignoring non-concrete values when exporting is a departure from how cue export works today and from how constraints are generally enforced in CUE. To provide backwards compatibility, users could use the -c flag to

  • fail when fields are not concrete for export
  • print non-concrete values unconditionally in CUE mode.

Printing modes:

  • -D: also print definitions. (TODO: evaluated in schema mode or manifestation mode)
  • -H: show hidden fields of the current package
    • TODO also show hidden fields of imported packages? Probably not. We could do so if we supported _foo#bar-style package classification. This may be a debug option to reflect it is not valid CUE as of this moment.
  • -A: show attributes
  • -O: show optional fields (will be rare because we anticipate that ? might become obsolete)
  • -C: print comments
  • -i: ignore errors and show in context of an evaluation.
  • -a: show non-concrete values (essentially showing what other values can be set).
  • --debug==: useful debugging information in the form of comments below each field and value
    • original expressions/ conjuncts
    • line information
    • default values
    • dependencies
    • hidden fields from other packages
    • etc.

The default printing mode is CUE. Output formats can be chosen with the flag --out.

cue export

Much of the current functionality of cue export is reproduced in command cue. We propose to repurpose the export command adding the functionality described below.

Backwards compatibility

cue export would differ in one major way: non-concrete fields would be omitted from export; only those explicitly marked as required would result in an error when not concrete.

The old behavior can be obtained by using the -c flag, just as one would have to use it today for cue eval and cue vet, making behavior between all of these more consistent.
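The intended difference between the new default and -c can be summarized with a small Python model. The `Field` type and the `export` function are hypothetical. A field is modelled by its value (a `Kind` instance when non-concrete) and by whether it was marked with `!`.

```python
from dataclasses import dataclass
from typing import Any, Dict


class Kind:
    """Toy stand-in for a non-concrete value such as `int`."""
    def __init__(self, name: str):
        self.name = name


@dataclass
class Field:
    value: Any
    required: bool = False  # declared with `!`


class ExportError(Exception):
    pass


def export(fields: Dict[str, Field], concrete: bool = False) -> Dict[str, Any]:
    """Sketch of the proposed `cue export`; `concrete=True` models the -c flag."""
    out = {}
    for name, f in fields.items():
        if not isinstance(f.value, Kind):
            out[name] = f.value
        elif f.required or concrete:
            raise ExportError(f"{name}: value is not concrete")
        # otherwise: a non-concrete, non-required field is silently omitted
    return out


fields = {"name": Field("jack"), "age": Field(Kind("int"))}
print(export(fields))  # {'name': 'jack'}
```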

File output

cue export would otherwise be repurposed to be the inverse of cue import. It would be like command-less cue, but would interpret @export attributes to generate specific files.

More specifically, any struct may contain one or more export attributes, each evaluating to a CUE “filetypes” specifier that directs export to write a file of a certain type.

Consider the following CUE file.

a: 2 + 3
baz: {
    @export("baz.json")
    b: a
}
bar: {
    @export("jsonschema:/foo/bar/bar.json")
    string
}

This would instruct cue export to generate two files.

By default the files will be exported in the txtar format as follows:

// File
import baz ":/foo/bar/baz.json"
import bar "jsonschema:/foo/bar/bar.json"

a: 5
"baz": { baz, @export("baz.json") }
"bar": { bar, @export("jsonschema:bar.yaml") }
-- baz.json --
{"b": 5}
-- bar.yaml --
type: string

Note that bar.yaml represents a JSON schema here.

The comment section of the txtar output (above all files) would describe how to reconstruct the original file from the generated files. Note that this example utilizes several currently unsupported features, like JSON imports and JSON Schema output.
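For readers unfamiliar with txtar: an archive is a free-form comment section followed by files, each introduced by a "-- name --" line. The golang.org/x/tools/txtar package provides a Format function for this; the stdlib-only sketch below (FormatTxtar and File are hypothetical names) only illustrates the layout and is not how cue export would be implemented.

```go
package main

import (
	"fmt"
	"strings"
)

// File is one member of a txtar archive.
type File struct {
	Name string
	Data string
}

// FormatTxtar writes the comment section, then each file as a
// "-- name --" header line followed by its contents. Every non-empty
// section is terminated by a newline.
func FormatTxtar(comment string, files []File) string {
	var b strings.Builder
	b.WriteString(ensureNL(comment))
	for _, f := range files {
		fmt.Fprintf(&b, "-- %s --\n", f.Name)
		b.WriteString(ensureNL(f.Data))
	}
	return b.String()
}

// ensureNL appends a trailing newline to non-empty text that lacks one.
func ensureNL(s string) string {
	if s != "" && !strings.HasSuffix(s, "\n") {
		return s + "\n"
	}
	return s
}

func main() {
	fmt.Print(FormatTxtar("a: 5", []File{
		{Name: "baz.json", Data: `{"b": 5}`},
		{Name: "bar.yaml", Data: "type: string"},
	}))
}
```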

Options:

  • -z: write the output to a ZIP archive
  • -u/--update: actually generate the files

cue def

The define command, cue def, will remain the main way to simplify CUE schemas without resolving them to concrete values.

cue def may simplify disjunctions (retaining semantics), but only on a best-effort basis. Default values are retained.

Special features to be addressed in a different design doc could include:

  • Choose between self-contained unification and retaining imports (-S).
  • Do potentially expensive simplifications.

Query extension

One big motivation for this proposal is to narrow down some of the gaps needed for the query proposal (see #165). One such gap was how to define queries and whether such queries should only return concrete matches or everything.

It is not the goal of this proposal to fully close these remaining gaps, but to show in detail how it interacts with the query proposal and how it solves some remaining puzzles.

Selectors

CUE is different from other query languages in that a user can query concrete and non-concrete values. At the same time, we would like to avoid confusion between these two modes of operation. In particular, we need to define what to expect for selecting values in projections.

CUE currently has one form of selection: a.foo. To understand the various modes of selection in projections, we also propose a variant of this: a.foo?. For regular selection they are defined as follows:

  1. a.foo: as it is today:
    • the value for foo if it exists in a;
    • an incomplete error if it does not exist, but is allowed and could be added later;
    • a fatal error if foo is never allowed in a.
  2. a.foo?: like a.foo, but instead of an incomplete error it would return the constraints for foo if foo were defined as _.

So a.foo?, where a is a struct, is equivalent to (a & {foo: _}).foo, and b.1?, where b is a list (allowing integer indexes for lists in selection), is equivalent to (b & [_, _])[1]. The foo? variant works around a common issue reported by users.

Now pulling in the query proposal (#165), let’s consider the equivalent for projections, that is, how these selectors behave for a “stream” of values that is the result of a query. Firstly, we expect that typical query usage will either center on querying concrete data, or on API definitions that have not been manifested.

To facilitate this view and translating the semantics of selectors to that of projections, we assume that for the purpose of projections a non-concrete value (like <10) is treated as an “incomplete” error.

Given this definition, we can then define:

  1. a.*.foo: value is dropped for incomplete errors and if the value is non-concrete (does not apply recursively)
  2. a.*.foo?: also allow non-concrete or optional (match pattern constraints)

Forms 1 and 2 will fail if an illegal value is selected (fatal error).

The semantics of treating non-concrete values as an incomplete error when querying was partly chosen to be consistent with the proposed default mode for cue commands to silently ignore non-concrete values (unless the -c option is used), making behavior consistent and predictable across the spectrum.

Note that there is precedent within CUE for expecting values to be concrete. For instance, the operands of most binary expressions (except & and |) result in an incomplete error when not concrete. The proposed semantics for queries are identical.

Querying with subsumption (instance-of relation)

The ! notation solves an issue with allowing a value to be used as subsumption in query filters. Consider the following:

a: {
    foo: { name: string }
    bar: { name: "bar" }
}

Using a subsumption filter {name: string} would also match foo, as it is, strictly speaking, subsumed. Using !, we can work around this:

query: a.[:{name!: string}]

will select only bar.

We could require that any field specified in a query be concrete. That is a bit of an odd rule, though. The ! notation seems a natural solution.
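The difference between the two filters can be modeled in a few lines of Go. The Val and FilterField types and the Matches function are hypothetical; the model only handles flat structs whose filter fields are all of type string, which is enough to reproduce the foo/bar example above.

```go
package main

import "fmt"

// Val is a hypothetical leaf value: a concrete string or the type string.
type Val struct {
	Concrete bool
	S        string // printed form, e.g. `"bar"` or "string"
}

// FilterField is one field of a query filter such as {name: string}
// or {name!: string}.
type FilterField struct {
	Name     string
	Required bool // written with ! in the proposal
}

// Matches reports whether a flat struct is subsumed by a filter whose fields
// are all of type string. Without !, a non-concrete string also matches,
// because string is, strictly speaking, an instance of string.
func Matches(filter []FilterField, v map[string]Val) bool {
	for _, f := range filter {
		x, ok := v[f.Name]
		if !ok {
			return false // a missing field is not subsumed
		}
		if f.Required && !x.Concrete {
			return false
		}
	}
	return true
}

func main() {
	a := map[string]map[string]Val{
		"foo": {"name": {Concrete: false, S: "string"}},
		"bar": {"name": {Concrete: true, S: `"bar"`}},
	}
	plain := []FilterField{{Name: "name"}}
	req := []FilterField{{Name: "name", Required: true}}
	for _, k := range []string{"foo", "bar"} {
		fmt.Println(k, Matches(plain, a[k]), Matches(req, a[k]))
	}
	// foo true false
	// bar true true
}
```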

Subsumption variants

Subsumption in the general case is an expensive operation. Using the definition of the different evaluation modes, however, we can distinguish two different kinds of subsumption:

  1. subsumption patterns without pattern constraints (such as in the above example)
  2. subsumption patterns with pattern constraints.

Note that closed structs are defined in terms of pattern constraints, so any closed struct classifies as 2.

Patterns of type 1 would be executed as actual subsumption.

For patterns of type 2, however, we would require that the queried values must have been explicitly unified with this value. For instance, the query

query: a.[:v1.#Service]

would search for any value in a that was unified with v1.#Service. A value in a that has all the same fields as a #Service would still not match unless it was unified with this explicit definition. In effect, this introduces a notion of explicit typing, rather than relying solely on isomorphic equivalence.

Such selection is easy to implement efficiently, and may be a good compromise.
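A sketch of type-2 matching by provenance rather than by shape, in Go. The Value type, its UnifiedWith bookkeeping and QueryByDefinition are assumptions made for illustration; they do not describe how the CUE evaluator actually records unification.

```go
package main

import (
	"fmt"
	"sort"
)

// Value is a toy evaluated value that remembers which definitions it was
// explicitly unified with (hypothetical bookkeeping).
type Value struct {
	Fields      map[string]string
	UnifiedWith map[string]bool // e.g. {"v1.#Service": true}
}

// QueryByDefinition models a.[:def] for a definition def (a type-2 pattern):
// a value matches only if it was explicitly unified with def; having the
// same fields is not enough. Keys are returned in sorted order.
func QueryByDefinition(a map[string]Value, def string) []string {
	var keys []string
	for k, v := range a {
		if v.UnifiedWith[def] {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	return keys
}

func main() {
	fields := map[string]string{"kind": `"Service"`}
	a := map[string]Value{
		"typed":     {Fields: fields, UnifiedWith: map[string]bool{"v1.#Service": true}},
		"lookalike": {Fields: fields}, // same shape, never unified with the definition
	}
	fmt.Println(QueryByDefinition(a, "v1.#Service")) // [typed]
}
```

The lookup is a set-membership test per value, which is why this form of selection is cheap compared to general subsumption.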

Transition

Although this is a big change to the language, we foresee a relatively smooth transition. The meaning of ? would largely remain unaltered.

Phase 0

Introduce the new disambiguation semantics. This should be done before v0.3. Although somewhat different, v0.2 has similar semantics, and introducing this before a final release will allow for a smoother transition.

Phase 1

In phase one we would introduce the ! annotation and the numconcrete builtin to work as proposed.

Phase 2

Add an experiment environment variable, like CUE030EXPREQUIRED to enable the new semantics eliding non-concrete fields. In this mode, the -c flag would mimic old behavior.

A cue fix flag allows users to rewrite their CUE files to the new semantics.

If the flag does not allow for a fine-grained enough transition, we could consider defining a transitional field attribute to define the interpretation of such a field on a per-field level.

Phase 3

Decide whether to proceed with Phase 4.

Phase 4

The biggest change will be moving to relaxing the rules for non-concrete fields and moving away from excessive usage of ?. This would be done as a minor pre-1.0.0 change (e.g. v0.4.0 or v0.5.0).

The biggest issue is for installations that rely on a field without ? being required. It would be good to ask users whether cue fix and/or -c is sufficient, or whether an API feature should be provided as well.

Adding a feature to cue fix to “promote” fields on a one-off basis would be straightforward. Generated configurations could just be regenerated.

Note that default values and other concrete values specified in definitions would still be printed. It is only the non-concrete values that are omitted.

The removal of ? may also have performance implications, as CUE processes them differently internally. The implementation can be adjusted however to overcome performance issues. Experience with v0.2, however, which processed optional fields similarly to regular fields, showed that the performance impact of this is relatively small. Structure sharing can further mitigate this issue, and we should probably ensure this is implemented before the transition.

Alternatives considered

Alternative disjunction simplifications

We also considered eliminating non-concrete values from disjunctions. For instance, at manifestation time (only!):

a: int | 1

could in such a case be simplified to a: 1. This would obviate the need for the default marker in this case.

The overall intuition here is that this would be weird, though.

By extension, we also chose not to simplify

{ a: 1, foo: int } | { a: 1, bar: int }

To achieve this effect without using defaults, users would have to write

{ a: 1, foo?: int } | { a: 1, bar?: int }

A variant that would allow such simplification is open for consideration, though, especially if it can help fully deprecating the use of ?.

Other definitions of foo!: int

We have considered the following meanings of foo!: int

foo!: int as an optional field

foo!: bar is an optional field that must unify with any concrete field to be valid.

This would still sort of work if people diligently used ? for fields in definitions, but that would defeat one of the main purposes of introducing ! in the first place.

But logically the foo: int > foo!: int relation makes sense, as the ! constrains the regular field. This alone gave too much of a contradiction to work well.

require to be unified with a non-definition field with concrete value

The main advantage of this approach is that it would allow adding the required constraint to a schema that already defines that field as a concrete value.

The main drawback of this approach is that it is not possible to create a derivative definition of a schema that defines a required field that fills in the required field for the user.

For instance, suppose we have a definition

#Service: { kind!: string }

And we create the derivative:

#MyService: #Service & {
    kind: "Service" // still not set
}

then kind would still not be set.

It seems that this should be possible, though. Specifying a concrete field in this case is akin to setting a default value. In general it is considered good CUE style to define defaults separately from an API, so this would be consistent with that definition.

Also, although the CUE evaluator can track the origin of fields quite easily, there is no representation for a “field that has already been unified with a concrete field”.

require to be unified with a non-definition field and a concrete value

The distinction from the former is that the concrete value may originate from a definition. This definition, however, is not associative.

Required indication on the right-hand side

We considered writing foo: int! or foo: required(int) instead of foo!: int making the requiredness a property of the value instead of the field.

This didn't sit well with the requirement of needing to unify with a concrete value from an optional field: {foo?: 1} & {foo: int!} would be hard to represent correctly in CUE. A goal is to allow representing evaluation results in CUE itself, but we did not see a way to accomplish that here.

List type

One motivation for this proposal was the ability to define a non-concrete list. For this we considered introducing Go-style

a: []T
b: [<10]T

types.
However, indicating the size can already be done by syntax introduced in the query notation:

a: [<10]: int

which would just be a generalization of CUE.

Also, this would still not solve the same problem for non-concrete structs or scalar values.

So overall, this would be a heavy-weight solution for little benefit.

Querying using unification

The ! operator would also be useful for querying values using unification. Normally, a query like a.[: {name: 2}] would produce quite unexpected results: it would unify with any element that either has name set to 2 or for which name is undefined, possibly setting that value to 2.

To avoid this, users could write a.[: {name!: 2}].

We considered this to be too cumbersome and surprising to be a viable solution, though.

Alternate definitions for querying by subsumption

Many variants are possible; we mention a few. All of these could be considered.

Definitions as types

  1. If the subsumption is a definition, the subsumed instances must have unified with this value.
  2. For non-definitions, we do a full subsumption, but put restrictions on what values are allowed.

This would allow queries like

a.[:{a: [string]: name!: <"Q"}]

but perhaps not more funky queries using pattern constraints.

Definitions as types, subsume concrete values only

  1. If the subsumption is a definition, the subsumed instances must have unified with this value.
  2. For non-definitions, we only match the manifestation (concrete values) of the value.

This would allow full pattern matching.

Always subsume manifested values only

This would allow unrestricted unification. This seems limited, though, as people may want to query APIs with certain properties.

The syntax a.*.[:{}]? could be used to query the non-manifested value. Similar restrictions may still have to be applied to subsumption in this mode, though they would typically be irrelevant to the casual user.

Alternative semantics for projection selectors

We’ve considered a more direct correspondence between selectors for projection and regular selection. This results in the following definitions

  1. a.*.foo: fatal error if not allowed, dropped for incomplete errors, otherwise value returned (including non-concrete).
  2. a.*.foo?: also allow non-concrete or optional (match pattern constraints)
  3. a.*.foo!: like a.*.foo, but additionally filters for concreteness.

For the common use case of requiring concrete data, this would mean that users would have to almost always use the third form. This seems undesirable and will likely result in too many gotchas. In the end, we were able to get the desired behavior for selectors in projections by only considering a non-concrete value to be an “incomplete” error. This seems to be a reasonable solution. Consider also that interpreting a non-concrete value as incomplete already happens at various points in CUE evaluation.

@eonpatapon
Contributor

One motivation for this breaking change is "good API design", ok but what about configuration?

Most of my configuration inputs are required values. Am I the only one?

@mpvl
Contributor Author

mpvl commented Mar 11, 2021

One motivation for this breaking change is "good API design", ok but what about configuration?

Most of my configuration inputs are required values. Am I the only one?

Can you give a concrete example (ideally real-life) where you would expect things to be concrete?

Also note that the plan is to not initially change it and give people plenty of opportunity to play around with it. But it would be great to see good arguments in favor of the current semantics.

@mpvl
Contributor Author

mpvl commented Mar 11, 2021

@eonpatapon: part of the reasoning: either configurations are small, in which case it won't hurt to specify a few extra !'s, or they are not, in which case the likelihood that all fields are required is tiny, while specifying ? everywhere is both tedious and error prone.

@verdverm
Contributor

Anecdotal experience from my DevOps work...

When I look at Terraform, Ansible, Kubernetes, Helm, cloud provider APIs, and a host of other programs, I find very few required fields and many more optional fields that typically have defaults.

When creating configuration for tools I develop in this role, I try to follow this pattern as well.

@niemeyer

niemeyer commented May 4, 2021

Thanks for the extensive context given in the proposal.

I'm a bit concerned with the direction of CUE after reading it, though, as much of the justification for the suggested changes seems to be either about performance or about cases in which the most obvious approach doesn't work as it implies.

I'll exemplify the concern with a few quotes:

A required field, denoted foo!: bar, requires that the field be unified with one or more regular fields (not required and not optional) giving a concrete result.

The intuitive notion we all have is that things are either required or optional, with nothing in between. I realize that this is really concrete vs. non-concrete and present vs. non-present, but if we have to explain optional fields -- such a simple idea -- to the reader and point to a spec, we've lost a shot in terms of design. This is such a visible issue that even the proposal itself confuses the point by sometimes calling it required, and the chosen syntax reinforces the problem by making them look like opposites (! vs. ?).

Intuitively, it seems fair to say that when one expresses something as simple as this:

{ foo: string, bar?: int }

What is meant here is that the outcome should be foo being present and a string, while bar is optional but if present must be an int. If the goal is generating CUE source (via queries, or any other mechanism), then it's okay for foo to remain defined as string. If the goal is generating json, yaml, or validating one of these, then they should resolve to a concrete value because the statement wasn't declared as optional. Again, it's clearly subjective, but it seems fair to say that this is the intuition.

Now, speaking specifically about whether optional is more frequent than required or not, we could easily preserve the above semantics and have ! but no ?, but for similar reasons to the stated above, this sounds like a mistake because the inherent notion that people have about fields is that when they enter a name, that name is there and is required.

Demonstrating these points with another quote from the proposal:

It is currently awkward to write that a value should either have value of one kind or another. For instance:

{a: >10} | {b: <10}

would match both variants if a user didn’t specify any of the fields (assuming a and b are valid fields).

This is a great example because it seems hard to argue about what the expression should mean. Instead of adding syntax to enable what people want to do to be possible, it would be better to make sure that this does what people think it would do.

Another one:

Using a subsumption filter {name: string} would also match foo, as it is, strictly speaking subsumed. Using !, we can work around this:

query: a.[:{name!: string}]

Again, similar idea: there's little doubt when reading that query that the user did in fact expect name to be present in the document as a concrete value if the query was done against a body of source to extract a known value. It is only the lack of context that generates the possibility of string matching string being an acceptable answer. Having to type ! on every field every time one wants to write such a query would be unfortunate.

One more:

CUE currently allows non-concrete values in the output. Unifying

a: int
b?: int

and

b: 2

succeeds with vet, because for CUE int is a valid value.

Same idea: seeing both a: and b?: makes it look like indeed this should fail on vet. This is inherent to the way we've experienced such ideas before, so it would be better to make vet indeed match people's expectations, and address other use cases differently.

It's worth saying I'm fresh on CUE, which means I may still be misunderstanding concepts and be missing important design decisions, but it also means that for now I still speak as most people that spend less time on it would see it. Those groups are an important audience for projects I oversee more closely, at least.

As a simplistic suggestion, given the issues I've learned about in this proposal, I would look into transforming the behavior of the bare foo to be the one from the suggested foo! when the context suggests that the user really wants a concrete value and not a CUE expression out. At least for such simple use cases it seems worth making the language match people's expectations, instead of teaching them to workaround via additional knowledge.

Again, thanks for the clearly stated proposal, and I apologize for speaking without much background at this stage. Hopefully I didn't miss the key ideas too much.

@seh

seh commented May 4, 2021

Like @niemeyer, my first reading of this proposal had me feeling unworthy. So much of it seemed backwards and confusing to me, but I left it assuming that I don't understand CUE well enough yet to see why it should be this way.

@seh

seh commented May 5, 2021

Another thought, again echoing @niemeyer's statements: I read "required" and "optional" as antonyms, with no third choice. Including both a required designator (!) and an optional designator (?) for anything but a brief transition period is rather confusing, unless every field requires one or the other.

I find it most intuitive to treat an unadorned field as required, with a special adornment to indicate that a field is optional.

@mpvl
Contributor Author

mpvl commented May 5, 2021

@seh and @niemeyer, thanks for the feedback. I'm not entirely happy, of course, about both flags being present, but so far I have not been able to find a satisfactory solution with only one of those. The current semantics does not function well for expressing policy as well as a slew of constraints. The introduction of ! solves this, but it then leaves a few cases where something like ? is needed (and, in fact, there are only few cases where a ! would be needed as well).

Part of the issue is exactly having data and types in a single view. It simplifies a lot, but it does sometimes require some extra expressiveness in clarifying intent.

Perhaps the best way forward is to collect a few key examples that expose the entire set of problems and requirements to see if there is a solution where all cases can be satisfied by using either ! or ?.

Of course I would be very open to a simpler solution that satisfies all these cases.

@seh

seh commented May 5, 2021

All the big thoughts come to me after I write something here, so take whatever I'm writing this time with a grain of salt.

As you noted, I can see how CUE's various roles for fields—or for the values in fields—makes this more complicated than something like Protocol Buffer messages. CUE fields can include default values, which makes them "optional" in the sense that no one has to provide a value, but the field might still be required to be present in an exported view. That's different from an optional field that has no default value, but could be omitted from an exported view if it lacks a concrete value.

A few questions:

  • What would it mean for an optional CUE field to have a default value?
    If no one provides a different concrete value for it, would the default value make it equivalent to a regular field?
  • What would it mean to define an optional CUE field with a concrete value?
    Doesn't that just make it a regular field?

@mpvl
Contributor Author

mpvl commented May 5, 2021

@seh, note that this proposal does not change the semantics of optional fields (using ?):

What would it mean for an optional CUE field to have a default value?
If no one provides a different concrete value for it, would the default value make it equivalent to a regular field?

Nope, it would remain optional. A default value of an optional field is only relevant if unified with a non-optional field with a more general value, that leaves the default and non-default value.

The same would hold for required fields with optional values.

I haven't really seen a use case for this behavior in practice. It's just a consequence of keeping these constructs orthogonal.

@mpvl
Contributor Author

mpvl commented May 5, 2021

What would it mean to define an optional CUE field with a concrete value?
Doesn't that just make it a regular field?

Same applies here. For both ! and ? specifying a concrete value does not make it a regular field. So foo?: 1 means the field is optional, but if it is specified, it must be 1. There have been some use cases for this but it is rare.

Similarly, foo!: 1 means: foo must be specified and it must be 1. This is actually quite commonly needed (e.g. #Service: kind!: "Service") and it is not possible to specify this in CUE today. This is one of the cases where ! wins over ?.

@seh

seh commented May 5, 2021

Similarly, foo!: 1 means: foo must be specified and it must be 1. This is actually quite commonly needed (e.g. #Service: kind!: "Service") and it is not possible to specify this in CUE today.

When you say "must be specified," is it necessary for someone to restate that, say, the "kind" field's value is "Service," to confirm the required value, or is kind!: "Service" sufficient to declare the required value that will be exported unconditionally?

If you instead wrote kind: "Service" (without the exclamation point), doesn't that lock the value in just the same? Any other value wouldn't unify with "Service," right?

@niemeyer

niemeyer commented May 5, 2021

Perhaps the best way forward is to collect a few key examples that expose the entire set of problems and requirements to see if there is a solution where all cases can be satisfied by using either ! or ?.

Indeed that would make it easier to explore options. It would also be useful to collect some context about the use case, as given the proposal above it seems that part of the issue is contextual and perhaps we can do without some of the syntax by taking that into consideration.

Same applies here. For both ! and ? specifying a concrete value does not make it a regular field. So foo?: 1 means the field is optional, but if it is specified, it must be 1. There have been some use cases for this but it is rare.

Similarly, foo!: 1 means: foo must be specified and it must be 1. This is actually quite commonly needed (e.g. #Service: kind!: "Service") and it is not possible to specify this in CUE today. This is one of the cases where ! wins over ?.

It's easy to see the motivation for both of these use cases. What is not clear is what else sits in between them. That is, I'd guess most people would think that specifying kind: "Service" should be enough to express the intent of requirement, and the alternative is the field being optional, and that's it.

Going back to the proposal to illustrate both this and the note above on contextual behavior, we see that example:

#person: {
    name!: string
    age: int
}

jack: #person & {
    name: string // incomplete error; string is not concrete
    age:  int    // ok
}

Isn't the following a more natural way to express the intent above:

#person: {
    name: string
    age?: int
}

jack: #person & {
    name: string // incomplete error; string is not concrete
    // age:  int // ok to be missing
}

Per earlier comment, the error is contextual since the latter expression remains valid while manipulating the state. It is the action on that value -- exporting, vetting, etc -- that turns the lack of concrete values into an error. Can't we take that context into account, instead of adding syntax to express the same thing in a slightly different way?

Depending on use cases, it could be useful to expose some of that contextual difference to the language by having a built-in that does the same operation to a specific value. Using the example above, it might be possible to offer something along the lines of complete(#person) to constrain jack into the exported shape early on, or valid(#person), to enforce completeness without modifying the lattice.

Again I need to add that disclaimer here pointing out that I don't have knowledge of the engine implementation at this time, and I'm trying to understand both the problem and the space of feasible solutions by extrapolating the proposal above.

@niemeyer

niemeyer commented May 5, 2021

A correction to the above example:

#person: {
    name: string
    age?: int
}

jack: #person & {
    name: string // incomplete error; string is not concrete
}

The presence of age: int in the second struct would turn the optional field of #person into a requirement for jack, and would also generate an incomplete error. For it to not be an error on a completeness test, much like the name it either needs to not be present, or present as optional, or present and concrete.

@mpvl
Contributor Author

mpvl commented May 6, 2021

@seh: with "must be specified," I mean indeed that it must be specified and cannot be omitted. So it will not be the same as just saying kind: "Service". This is a common case in validation (see e.g. #740, but this has come up more often), and we see this ability to be a critical feature when CUE is used as a policy or validation language in general.

@niemeyer

niemeyer commented May 6, 2021

@mpvl We both understand that. What we're pointing out is that the intuition is a bare string being in fact required, and reading that issue (#740) I can see @narqo was saying precisely the same there:

    kafka_topic: "topic1" // field must exists and the value must be exactly "topic1"

(...)
Note, in the example above, I could set the constraint for kafka_topic field via a RegExp =~"^topic1$". The issue is that the current behaviour of cue vet feels unintuitive.

Emphasis on feels unintuitive. He's not asking for a different feature. He's saying the current language surprises him, which is what I've been exploring above.

We already have three different constructs that surround this behavior:

  • Regular fields
  • Optional fields
  • Default values

We shouldn't need a fourth "actually required" syntax, but rather just fine tune the existing one to match intuition a bit more closely.

@rogpeppe
Contributor

rogpeppe commented May 6, 2021

In case it isn't clear, the problem is that non-optional fields are often used to hold calculated values that aren't necessary to provide explicitly.

#Foo: {
    x: int
    y: x * 2
}
f: #Foo
f: x: 1

This is currently fine: #Foo.y isn't an optional field but we don't need to specify it (if we did specify it, we'd need to specify it as exactly 2).

If #Foo.y was specified as a required field, the above would be invalid AIUI.

If we removed the existing behaviour, it would be a radical change to CUE. For example it would mean that cue trim couldn't work AFAICS, because that relies on the existing semantics.

@seh

seh commented May 6, 2021

I'm still missing something about Roger's point. In the example above, #Foo.y is not an optional field. I take that to mean that it must be present in exported data, and it must therefore be concrete. In this case, left to its stated value of x * 2, that means that field x must be concrete at export time.

@seh

seh commented May 6, 2021

Also, at risk of putting words into @narqo's mouth from #740, I do now see that he intended that each #Env value had to restate the "kafka_topic" field again, due to his comment about an expected validation failure for env.dev1:

env.dev1 — missing required kafka_topic field,

My current interpretation of the meaning of his #Env definition is that the Kafka topic is nonnegotiable, established as "topic1." Given that, I wouldn't expect any #Env values to restate the field. If they did, of course it would have to match "topic1," but leaving it out should be fine.

I don't understand the motivation to require each #Env value to restate that field with the fixed value.

@mpvl
Contributor Author

mpvl commented May 6, 2021

@niemeyer:

We shouldn't need a fourth "actually required" syntax, but rather just fine tune the existing one to match intuition a bit more closely.

Can you give an example of how you would express the issue in #740 with the constructs that are currently there?

What kind of changes do you see that making possible?

@rogpeppe
Contributor

rogpeppe commented May 6, 2021

I'm still missing something about Roger's point. In the example above, #Foo.y is not an optional field. I take that to mean that it must be present in exported data, and it must therefore be concrete. In this case, left to its stated value of x * 2, that means that field x must be concrete at export time.

Here's a concrete example (you can run it with the testscript command):

exec cue export foo.json schema.cue
cmp stdout exported.json
-- foo.json --
{
	"f": {
		"x": 1
	}
}
-- schema.cue --
#Foo: {
    x: int
    y: x * 2
}
f: #Foo
-- exported.json --
{
    "f": {
        "x": 1,
        "y": 2
    }
}

In foo.json, there is no y field present despite it not being an optional field, and that's fine because it's implied by the schema.

In other words, it's concrete in the exported data but not required in the input data. AIUI, the ! qualifier would require it to be explicitly present in the input data too.
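Below is a sketch of the same schema written with the proposed ! marker. This syntax is hypothetical under this proposal and is not accepted by the CUE releases current at the time of this discussion. Marking x as required means foo.json must still supply x. y stays a regular field whose value is computed from x, so it need not appear in the input. Per the point above, writing y!: x * 2 instead would also require foo.json to contain y explicitly.

```cue
// Hypothetical: uses the `!` (required) marker proposed in this issue.
#Foo: {
	x!: int   // required: must be present in the input data
	y:  x * 2 // regular field: computed from x, may be omitted from input
}
f: #Foo
```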

@niemeyer

niemeyer commented May 6, 2021

Can you give an example of how you would express the issue in #740 with the constructs that are currently there?

@mpvl I'm still trying to understand use cases and details of CUE itself, which means my proposal will likely miss important points, but with the conversation so far and the examples from the proposal, what if:

1. Regular fields become required

In other words, the behavior for the proposed ! becomes standard for definitions, so in the case of #740, this would indeed yield a vet error:

#person: {
    name: "Jack"
    age: int
}

jack: #person & {
    // Vet error, no concrete name
    age: 42
}

So this sorts out #740, but opens up problems in current usage. So let's fix those.

2. Default values are extended

We already have the idea of default values today, which is something people understand well from past experiences. So, this already works fine today and we'd keep it working as-is:

#person: {
    name: *"Jack" | "Joe"
    age: int
}

jack: #person & {
    // No error, name is exported as "Jack"
    age: 42
}

In addition, we'd extend default values so that providing a single default is supported:

#person: {
    name: *"Jack"
    age: int
}

jack: #person & {
    // No error, name is vetted/exported as "Jack"
    age: 42
}

This matches what we have today as regular fields in definitions. The field is required per point 1, but has a default.

3. Optional fields are untouched

Doesn't look like we have any issues with those, so they'd stay as-is.

...

As I said, I'm probably missing important details. What issues would that create?

@mpvl
Contributor Author

mpvl commented May 6, 2021

@seh

My current interpretation of the meaning of his #Env definition is that the Kafka topic is nonnegotiable, established as "topic1." Given that, I wouldn't expect any #Env values to restate the field. If they did, of course it would have to match "topic1," but leaving it out should be fine.

This interpretation works well for configuration generation, but it is unsatisfactory for validation or policy in general. A very simple but important use case is to be able to ask the question: "Is this yaml file valid for consumption by some service as is?". This cannot be specified currently.

What is currently possible, and what concurs with your interpretation, is to answer "is this a valid YAML file for consumption after unifying it with a CUE schema and converting it to YAML again?". But this is not the use case being addressed here. This extra unification step is often not an option. As far as I can tell, this is unsolvable with the current CUE.

The use case in #740 is a pure validation use case. When analyzing CUE as a policy language, and also for querying, this limitation becomes more pervasive and more intertwined. Interpreting meaning based on which CUE command is being run will not help. I would say that this is the most important issue addressed by the required proposal.
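The sketch below shows the #740-style validation problem under current (v0.3-era) semantics. The field names are hypothetical. Unification merges the schema into the data, so a missing concrete field is silently filled in instead of being reported.

```cue
#Service: {
	kind: "Service"
	name: string
}

// Data as it might be read from a YAML file; `kind` is missing.
input: {name: "frontend"}

// Unification adds kind: "Service", so this evaluates without error even
// though `input` on its own would not be valid for the consuming service.
checked: #Service & input
```

As the proposal describes it, writing kind!: "Service" in #Service would turn the missing field in input into an error instead.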

It would be possible to solve this without a language change, for instance kind: required("Service"), or something. But this is not necessarily better and has its own issues, as mentioned in the proposal.

Overall, the suggested semantics of ! reflect intent more tightly and clearly by themselves, without relying on context such as which command is being run. This has several implementation benefits, including better errors. This argument may be too implementation-centric, but in my experience, what makes an implementation more precise and clearer often also helps the user.

@niemeyer @seh @rogpeppe

In a nutshell, so far I can see a transition path to introduce ! and deprecate ?, keeping only one of them. But I don't really see a path where we rely only on ? without some additional notion of "required concreteness". That is not to say such a path doesn't exist, of course; I could be wrong.

I know I should compile more examples to expose the issues. In the meantime, though, seeing how #740 could be solved as a first step without the use of ! or a RHS solution (e.g. foo: "x"! or foo: required(x)) would be useful and may provide answers in another direction.

Not saying it isn't possible. In fact, some recent thoughts on CUE patterns and discriminator fields, as well as more reliance on numconcrete (which has some quite neat properties), may negate some of the benefits of using !. But none of that solves #740 and derivatives, unfortunately.

@seh

seh commented May 6, 2021

Thank you for the "concrete" example. I see that specifying the #Foo.y field is not required here. What I (and, I think, @niemeyer) don't understand is why you want to express that need.

We are missing the ability to do something. I don't know why someone would want to do that thing, so it's hard to think through a good design for a feature that I don't think should exist. If this feature did exist, I expect that I'd find it annoying. When it's time to write file foo.json, why do I need to restate a value that's fixed by other declarations?

@seh

seh commented May 6, 2021

I wrote the comment above before reading @mpvl's #822 (comment), but got interrupted before I could post it.

You are correct that I've been ignoring the use of CUE to validate input data. My only use of CUE so far has been to help generate output data more concisely and correctly. One language has to carry two burdens: checking input data without augmenting it into compliance, and augmenting data for output. That makes for a difficult design, as you're struggling with here. Well, it's not so much that you're struggling to design it; the rest of us are struggling to figure out when each concern and feature matters.

@niemeyer

niemeyer commented May 6, 2021

@seh To be clear, I actually see the issue, and there are apparently at least two of them:

  1. No way to validate a missing field against a schema that includes a concrete value, because definitions today inject the value that was supposed to be a validator into the value being analyzed, turning it into a valid object.

  2. Abstract values (string, int) are currently considered valid for vetting purposes.

Hopefully we can address both and still eat cake at the end of the day.

@mpvl

In a nutshell, so far I can see a transition path to introduce ! and deprecate ?

I'm curious about that option. If we can have only two distinct behaviors, wouldn't having "foo" and "foo!" be equivalent to having "foo?" and "foo", respectively? Wouldn't it be just syntax?

Overall, the suggested semantics of ! more tightly and clearly reflects intent by itself,

Sometimes that feels true, but in the above examples that's often not the case. For example, I hope we can type such expressions in the CLI eventually, and having to tell people to say foo!: X in a query feels like a design wart leaking out into the user. The case of {a: 1} | {b: 1} described above is also similar. It doesn't feel reasonable to have to type ! to do what seems like the obvious intention.

Or do I misunderstand the examples?

(Please note I've replied above simultaneously. Just want to make sure the reply doesn't get lost due to UI.)

@mpvl
Contributor Author

mpvl commented May 6, 2021

@niemeyer

As I said, I'm probably missing important details. What issues would that create?

There are some interesting suggestions there I hadn't considered. As usual with CUE, one has to consider what the implications are for the value lattice.

But some quick observations:

  • This conflates the idea of requiring concreteness and defaults. This doesn't always have to be bad, it is just an observation.
  • This essentially introduces a "required concreteness" mechanism, just with different syntax. In the analysis of required fields, having a RHS mechanism to specify required concreteness (which is done here) resulted in some issues. I don't recall the details. I don't believe they were critical, though.
  • I am less clear about what a transition path would look like here, as there seems to be more syntactic overlap.
  • This would break the interpretation for standard JSON. Basically, we do not want to break the semantics of taking two JSON files and unifying them. (Note that CUE is a superset of JSON).
  • This could be fixed by swapping the semantics of the "naked star", so making name: *"Jack" the required field, which seems weird.

@seh

seh commented May 6, 2021

We've been writing back and forth concurrently, so I'll start by saying that I'm writing in response to #822 (comment).

When validating input data that we don't intend to augment, the CUE field values express constraints the input data must meet. If the constraint is a concrete value, the input data must have that same concrete value.

When generating output data that we can augment, the CUE field values express recipes to generate output data, safely combined with any starting values.

Is it correct that we use unification during both validation and export? I take it that the problem is that with a regular field today that has a concrete value, unifying it with input data that lacks that field just causes the field to come into being, as opposed to catching that the field was absent in the input data.

@niemeyer

niemeyer commented May 6, 2021

@mpvl

This would break the interpretation for standard JSON.

The merging of two values is a power feature which is great indeed, and not worth breaking. But definitions are already more schema-like than value-like, so flipping their regular fields to become a strict concrete requirement there unless adorned by the default marker would not break that aspect, or would it?

This could be fixed by swapping the semantics of the "naked star", so making name: *"Jack" the required field, which seems weird.

Agreed. If we do need to adorn the key or value, the original ! feels like a better option. I'm just concerned about having to use it in cases which feel like it should be the default behavior for a regular field, and to avoid the confusion of having optional, semi-required, and actually-required fields, but given how nice CUE already is I'm sure you'll figure something good out.

@niemeyer

niemeyer commented May 7, 2021

@mpvl Here is another perspective that occurred to me overnight, which I believe addresses #740 and is probably the least disruptive change to the language thus far.

We've been thinking about that issue as "required concreteness", but as we discussed a few times above the idea of a required field already exists in the language, so that creates some confusion. What's curious is that the stated problem happens precisely because the field is concrete, which means it gets unified concretely instead of remaining a requirement constraint.

So what if we think about it in terms of "required abstractness"? That's what we have normally and why despite unifications the idea of requirement isn't lost:

#person: {
    name: =~"Jack"
    age: int
}

jack: #person & {
    // Error vetting/exporting, name doesn't match.
    age: 42
}

So wouldn't a simple solution be to transform the concrete value into an abstract one, using analogous language:

#person: {
    name: =="Jack"
    age: int
}

jack: #person & {
    // Error vetting/exporting, name not set as expected.
    age: 42
}

For the spec this just means turning == from a binary_op into a rel_op.

That would prevent the unification of a concrete value, align with existing language, and make exporting and vetting that field able to catch the issue as it already does for required/regular fields which weren't properly defined.

@mpvl
Contributor Author

mpvl commented May 7, 2021

@seh

Is it correct that we use unification during both validation and export? I take it that the problem is that with a regular field today that has a concrete value, unifying it with input data that lacks that field just causes the field to come into being, as opposed to catching that the field was absent in the input data.

Yes, that is correct. There are a few reasons for that:

  1. computational:
    in order to reinterpret CUE as is for validation, the only proper way to do that would be to use subsumption. This is not that simple though, as a) subsumption can take many forms depending on the exact purpose, and b) subsumption is NP complete for some of these forms. There is only one form for unification, and its computation is efficient (not the current implementation, but at least it is possible).

  2. explicit intent:
    This may be a matter of taste, but in general I think it is proper form to have intent specified as explicitly as possible. I would much rather see from the CUE itself what the intent of the code is in any context than have to second-guess the intent of some CUE based on which tools it was written for. Consequences of not being explicit:
    a) bad error messages
    b) one may have to rewrite CUE when used for a new purpose, as the
    c) forces user to look in context of how files are used to understand intent
    d) missed opportunities for optimizations
    e) missed opportunities for tooling automation

  3. Rules out mixed modes
    So what if a command comprises both validation and unification? How would one interpret the CUE?

@mpvl
Contributor Author

mpvl commented May 7, 2021

@niemeyer

But definitions are already more schema-like than value-like, so flipping their regular fields to become a strict concrete requirement there unless adorned by the default marker would not break that aspect, or would it?

I believe you are suggesting here we could interpret definitions differently. Even if not, the following may provide useful context.

There are in fact many things we can do if we go that road. To some extent we already do so, as definitions conflate the notions of "not being output during export" and "type checking on closed fields is enabled". The decision to conflate these was made with great trepidation and after thorough consideration. In general, this does not seem to be a good idea.

That said, I have considered it in this context as well and definitely don't want to rule it out. It is actually a whole new universe of neat solutions in that direction. Nonetheless, when working this out I ran into some issues that seemed to spell danger, so I backtracked and focussed on a solution that does not overload the meaning of definitions further. Again, that doesn't mean this cannot be a direction to take.

@mpvl
Contributor Author

mpvl commented May 7, 2021

@niemeyer

name: =="Jack"

That is a really neat idea! I always like to use existing syntax when possible.

I see one potential snag. The ! approach also solves a problem to require that a field must be specified with a list or struct of a certain type. Since [...int] and {...} have an implicit default of [] and {}, respectively, there is no way to express that either. It is actually the same problem as with kind!: "Service".

So using ==, this would look like:

a: ==[...int]

and

#Foo: {
   info: =={
       version:      string
       description?: string
   }
   optionalStruct1: {}
   optionalStruct2: {}
}

Although for structs this may not be necessary. Note that the way structs work now is one of the warts of the current optional semantics. If a piece of JSON really doesn't need to specify a struct, it should be written as

#Def: a?: {...}

But that is not necessary. That probably was the wrong choice, as it is rather inconsistent. If we were to fix the optional semantics, things would look more like:

#Foo: {
   info: {
       version:      string
       description?: string
   }
   optionalStruct1?: {}
   optionalStruct2?: {}
   requiredList: [...int]
   optionalList?: [...int]
}

This looks quite ugly to me, and provides quite a bad UX imo (it is easy to forget the ?, even though that is probably what is intended in most cases). But that is personal taste to some extent.

It would, on the other hand, make solving the == problem for lists and structs redundant, I believe. I would still need to explore other use cases, though, and recall what the "moving the required concrete indicator to the RHS" issue entailed.
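The sketch below illustrates the implicit defaults for lists and structs discussed in this comment, under current semantics. The field names are hypothetical. When a definition declares a list or struct field, every instance gets a value for that field even if the data supplies none. The schema alone therefore cannot demand that the field be present.

```cue
#Foo: {
	tags: [...int] // open list; per the implicit default, becomes [] if unset
	info: {...}    // open struct; already concrete as {}
}

// No fields supplied, yet exporting foo still yields tags and info.
foo: #Foo & {}
```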

@niemeyer

niemeyer commented May 7, 2021

This looks quite ugly to me, and provides quite a bad UX imo

I don't find that ugly, or beautiful either. I find it consistent and readable, which seems much more valuable in the context of CUE. I bet that if we share that snippet with someone that understands JSON but has never seen CUE before, they'll be able to guess exactly what sort of operation would be applied. The same is true for the proposed == as well.

It also looks like dropping that default-to-empty behavior would fix a couple of the issues mentioned in the original proposal above, so again it feels like a move in the right direction in itself.

So I guess I do find it beautiful after all. Not in terms of syntax, but in terms of solution. Reminds me a bit of conversations in the early days of Go itself.

@mpvl
Contributor Author

mpvl commented May 7, 2021

Ah, I see @rogpeppe's point now. So here are some other things to consider:

  1. Regular fields become required

Roger actually points out a very serious problem with the alternative approach. As Rog mentioned, it would still not be possible to default all non-concrete fields to required. A common use case is to have fields that have some expression that evaluates to a concrete value. For instance, the desired semantics for a: x + 2 is to set the field a to the result of the expression. It is not correct to write a?: x + 2 as this means the result will never be inserted. So we need the current semantics to remain as is, in this case.

In other words, one can perhaps treat a field like a: string as required, but not field a in a: b + "foo", b: string. Also, what is the cutoff? Is a field only required when it doesn't evaluate to a concrete value? Considering that string is just a predeclared identifier, it is not as simple as saying that just fields with expressions should be treated as required. Also, consider:

a: b // should this be required?
b: string

versus

a: b // should this be required?
b: 2

Adopting the "sometimes required" semantics for regular fields, one would not be able to tell from the context whether something is required. Personally, I think it is quite a loss to have to chase references to figure out the semantics of a field.
Consider also:

a: string  // this is concrete, and thus not required.
...
string: 1

I believe this was one of the reasons why we considered the LHS required specification to be superior over the RHS alternative.

Note that aside from these issues, as @rogpeppe mentions, and I alluded to with "I can't see a transition path", a change in these semantics is a massive change to CUE. In contrast, adding ! is a fairly uneventful change in and of itself. The hard part with ! is whether we can manage to get rid of ?.

This issue actually gets to the core of why this proposal is the way it is. Let's consider it in terms of types of fields we need:

  1. Require a user to specify a field with arbitrary values or constraints.
  2. Have computed values that get filled into fields automatically (templating)
  3. Require a field to be of arbitrary values or constraints, if specified. The resulting field is defined, but not concrete. It is an error if its value evaluates to an error.
  4. Constrain a field to a certain type, if specified. The resulting field is not actually defined. It is not an error if its value evaluates to an error, it just means this field may not be defined.

Type 1. is less common, but important when it occurs and a must have. It will be more prevalent as CUE is used more for policy checks and querying. This is the semantics proposed with !.
Type 2. and 3. are common. These are currently equivalent to regular fields.
Type 4. is currently also common, and reflects the optional fields. It is dominant in many API specifications, as an evolved API will mostly have optional fields. However, my hunch is that by increasing the consistency of tooling (having no distinction in the way export and vet interpret CUE), its utility will completely, or almost completely, vanish.
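Here is one way to picture the four kinds side by side. The sketch mixes current syntax with the proposed ! marker, so it is not valid CUE at the time of this discussion. The field names are illustrative only.

```cue
#Example: {
	// Type 1 (proposed): the user must specify this field.
	id!: string

	// Type 3: regular field; defined, but need not be concrete.
	count: int

	// Type 2: computed value, filled in automatically (templating).
	double: count * 2

	// Type 4: optional field; constrains the value only if specified.
	note?: string
}
```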

In the current implementation there are only 2, 3, and 4. If we don't introduce the notion of a required field, it is almost unavoidable to tinker with 2 and 3, as you've noticed, meaning we have to change the meaning of existing constructs in CUE, with all its consequences.

Note also that the usage of ? is dominant only because there is no other way to express optionality for cue export. My hunch is that people are not relying much on the property, for instance, that {foo?: 1} & {foo?: 2} does not result in an error.

Also note that the use of ? is already redundant in the language, as foo?: int can be written as ["foo"]: int. This mitigates the risk of removing it from the language if its usage becomes rare, as users who need it will still have this alternative.
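The two points above can be illustrated in current CUE. The optional field and the pattern constraint express the same constraint on foo. Conflicting optional values do not by themselves produce an error.

```cue
// Two spellings of the same optional constraint on `foo`.
viaOptional: {foo?: int}
viaPattern: {["foo"]: int}

// No error: foo stays optional, but its value is now bottom (_|_), so any
// attempt to actually set foo in `conflict` would fail.
conflict: {foo?: 1} & {foo?: 2}
```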

In fact, getting rid of ? has the nice property that it eliminates a currently somewhat confusing concept, which some users run into: that one can refer to a field that does not exist. We did not mention this in the design doc, though, as it is somewhat misleading to argue this is a benefit; it is more likely to just shift the burden to dealing with non-concrete values in comprehensions. The query syntax and semantics are designed to provide a better UX in the face of that shift, btw. But in getting rid of ? there is an opportunity to eliminate some of the confusing concepts of the language (I don't fully share the view that optional fields are a fully intuitive concept).

With the current proposal, it is possible to add !, keeping CUE semantics as is, and then chip away at phasing out ?. The worst part is users may have to use a flag to flip between old or new behavior when running cue export. Granted, there are still many details to work out. In essence this is an incremental path, introducing the unfortunate duality of ? and !, with the intention of eliminating ?, but with the risk this won't work out. I recognize the unfortunate nature of having both ! and ? in case that wouldn't work out.

To take the route of only keeping ?, the only possibility is to change the behavior of fields not using ? in one way or the other, at the language level. This would be much more an all-or-nothing change for which I can't see a clear transition path.

Basically, the fact that ! conveys an opposite meaning to ? actually helps smoothing a possible transition.

To be fair, note that both paths actually change the meaning of regular fields. The big difference, though, is that the required field approach only changes this interpretation at the tooling level, becoming more permissive during export (allowing non-concrete values), while the ? approach changes meaning at the language level. The latter is far more complex to deal with, though.

Finally, looking further ahead, one also has to consider symmetry with the query language. At the query level, it seems very natural to think in terms of concrete versus non-concrete. Almost all JSON query languages will silently skip selections in a query projection if a field does not exist (there, concreteness translates to present or not). So the most natural behavior seems to be to drop an element in a.[].b if b is not concrete. Now, if we think we also need a query mode where we require a field to be present, then a.b! seems like quite an intuitive notation, and its semantics would closely match the proposed LHS counterpart. Note that one could likewise use a.b? to access non-concrete values, of course, but that seems a far less common use case.

In general, one observation was that the way one would think in terms of optional, required, concrete etc. in the context of the proposed query extension, quite closely matches the somewhat different way of thinking of optional vs required in this proposal. There is a reason that the query proposal hasn't yet been implemented and that the required proposal heavily refers to it. :) Anyway, again, the solution space is huge, so there can always be alternatives.

Anyway. Just providing a few more data points. :) There are certainly remaining issues with ! as well, but this sheds some more light on why going the ! route seemed superior to the seemingly innocuous path of only using ?.

@seh

seh commented May 7, 2021

I had many thoughts overnight, and found all of the intervening discussion this morning. I read all of it.

Still, I am struggling with getting past this idea: A regular field pushes or creates data during export, and pulls and requires data during validation. When validating, it expresses a requirement of the input data. If the input data is absent, it can't satisfy the requirement. When exporting, it ensures that the exported data would satisfy the same validation rule coming back in later.

If I write:

kind: "Service"

and I'm validating input, I'm demanding that the "kind" field is present and has the value "Service." If I write that and I'm exporting, I'm ensuring that the exported data has a "kind" field with the value "Service."

If I didn't care whether or not the input data had a "kind" field, I could write kind?: "Service" to express that if it's present, it must have the value "Service," but if it's absent, that's fine. Similarly, when exporting a field written as kind?: "Service", I'm saying that we don't need to write this field out, because "Service" is already the default value. On subsequent import, we'd tolerate the absence of a "kind" field, or the presence of a "kind" field with the value "Service," but we'd reject a present "kind" field with the value "Ingress."

As an author of CUE code, that directional treatment makes sense to me. What am I missing that we can't express that way?
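For reference, here is a sketch of how current CUE treats the two spellings described above. The definition names are hypothetical. The export side already behaves as described. The point of contention in this thread is the validation side, where unification fills in kind: "Service" rather than demanding it from the input.

```cue
#Regular: kind: "Service"
#Optional: kind?: "Service"

a: #Regular & {}  // exported with kind: "Service" filled in
b: #Optional & {} // exported without a kind field

// Uncommenting this fails: "Service" and "Ingress" conflict.
// c: #Optional & {kind: "Ingress"}
```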

@mpvl
Contributor Author

mpvl commented May 10, 2021

@seh. There are many moving parts and it is indeed easy to hold a view where things work different. I understand where you are coming from. I'll try to explain from that angle.

The problem discussed in #740 can indeed be solved in various ways. The direction that this proposal takes is to have a single interpretation of CUE for all commands.

A different approach, indeed, could be defining different behavior for different operations. In fact, vet and export already behave differently today, although the behavior evolved and wasn't deliberately planned. Both use unification, but interpret concrete fields somewhat differently. This doesn't solve #740. But indeed, #740 could be solved by using a special kind of subsumption (instance-of relation) instead of unification (merge two values). Intuitively this is an appealing solution, as it seems quite simple and (seemingly) doesn't require a language change.

There are many problems with the subsumption approach, though. With unification, a single operation satisfies all use cases. But subsumption needs to behave subtly different depending on the use case. The variables are: include optional field in parent/child?, pick defaults for parent/child?, project onto closed equivalent for parent/child? That's already 64 different forms of subsumption (not all make sense, granted), and there are probably more.
So in the case you describe (vetting data against a schema), one could treat the data as fully evaluated, disregarding optional fields. This can be done efficiently, and would be easy to comprehend. However, this would not work for vetting a schema against another schema. Thinking a bit more about it, it may be possible to define a subsumption operation that works in both cases (translate the checked CUE to a form where non-existing fields are presumed to be bottom), but that still leaves some room for interpretation (should one pick defaults?).

None of this is relevant when solving the #740 and related issues with unification instead. The trick here is that the ambiguity is resolved by making configurations more explicit in intent. Note that the required/regular field approach is not just a syntactic change over the regular/optional field approach. It is this slight change in semantics that makes this possible.

Another big problem with subsumption is that it is NP-complete in its general form. Not only that, its correct implementation is rather hard. There is currently no accurate implementation of subsumption for CUE: in all applications where subsumption is used, it suffices to allow false negatives. This is why the cue def section notes that simplification of schema is done on a best-effort basis only.
But arguably, such approximations are not good enough for vetting. It is hard to foresee what users will want, but there is a notable danger of painting oneself into a corner that is hard or impossible to get out of. I would prefer to keep vet simple and predictable.

Note that, somewhat surprisingly, although unification is expressed in terms of subsumption, it does not suffer from the same problems, with one exception: needing to determine whether regular expressions intersect to the empty set. This issue only manifests itself with cue def, though, where it falls under the best-effort guarantee as well. By focusing on solving data only, one could reduce subsumption in a similar way. Still, the general problem space is far clearer and more manageable with unification than with subsumption. I know these details don't matter much to users, but they are important factors in design decisions as well.

Then a final problem with using subsumption is a matter of UX. Personally, I use cue vet if I just want to see the errors and not have the full evaluation printed. I would be very surprised if cue export and cue vet results in a contradictory set of errors.
Of course this could be solved by introducing different commands, for instance, adding cue compare [mode], where mode covers a common set of useful subsumption operations (proper instance, backwards compatible, an exact mode for data, etc.).

As we probably need to introduce such a command anyway, if only to allow for backwards compatibility checks for APIs, it may not be a bad idea to add it regardless. Either way, it would be interesting to consider what this would look like in combination with this proposal.

On top of these fundamental issues, there are a few practical issues going the subsumption route.

Using subsumption still doesn't solve expressing an equivalent for a!: [...]. This could be fixed by using a?: [...] instead of a: [...]. My hunch is this will be a more impactful change than introducing !. It could only be introduced smoothly by making some very subtle backwards incompatible changes. (e.g. generated schema would now need to add ? consistently, which means one can no longer refer to these fields unless unified with its regular counterpart, an artifact of using ?). Introducing ! instead is rather uneventful at the language level. Using an alternative syntax like == may be a solution, but that doesn't seem like a natural solution for the subsumption approach.

Using ! seems to work better with the query extension.

Finally, the benefit of using unification requires CUE schema to be more precisely defined (or more accurately: the use of !+regular fields conveys more practical information to CUE than the use of ?+regular fields, as the first approach indicates the desire for certain values to be concrete, whereas the latter doesn't). This clearer intent, in turn, allows generating better error messages.

@niemeyer

@mpvl I'm commenting mainly to give a solid +1 to that entire last message. Thanks for taking the time to explain, and really to think about all these corners of that foundational problem.

I still wonder whether certain apparently simple cases such as querying will be confusing because the implied meaning for a field name there is for it to be present. But even if that turns out to be a problem in practice, it should be easier to solve it once we are in a position of consistent proper behavior, as suggested in that last comment, than from where we are today.

@mpvl
Contributor Author

mpvl commented May 10, 2021

@niemeyer

I still wonder whether certain apparently simple cases such as querying will be confusing because the implied meaning for a field name there is for it to be present

We wondered the same. Our reasoning for why this is probably okay is that

  1. this is pretty much how all JSON querying languages work, so not defining it this way would probably be confusing, and
  2. there is precedent for it in CUE: in a + b, both a and b must resolve to concrete values.
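The precedent in point 2 can be illustrated with a minimal sketch (the field names are hypothetical): the sum is only evaluated once both operands are concrete.

```cue
// x.a is only a type, so x.c = x.a + x.b cannot be computed yet.
// cue export fails on x because x.a and x.c are incomplete.
x: {
	a: int
	b: 2
	c: a + b
}

// Unifying with a concrete a completes the expression: y.c evaluates to 3.
y: x & {a: 1}
```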

It makes sense to enable an initial implementation of querying (or of any of this, really) as an experiment, or in some other way that makes clear that the exact meaning may change.

@seh

seh commented Jun 3, 2021

It's been almost a month since I last wrote here, and I finally ran into a situation that I think motivates required fields. I have a JSON file generated by a separate tool, and I want to consume that JSON file in a CUE instance. In that instance, I wrote a definition capturing the approximate schema to which I need that JSON file to conform.

I'm using a pattern constraint for field names when defining the JSON schema. Later, I refine that general schema for the file format with a set of field names that I expect to be present in the input file. "TF" here means Terraform.

```cue
#TFOutput: [string]: {
	sensitive: bool
	// This is loose, but captures enough of the possibilities.
	type: string | [(string | [string, ...]), ...]
	value: _
}

#AWSTFOutput: #TFOutput & {
	let objectType = {
		type: ["object", ...]
	}

	output_name_1: objectType
	output_name_2: objectType
}
```

Here, #AWSTFOutput defines what I expect to find in the JSON file that terraform output -json emits for a particular AWS-related Terraform configuration, where there should be at least two Terraform output values, named (redacted here) "output_name_1" and "output_name_2". I don't mind if there are more output values, but at minimum I want these two to be present, and I want them to be of type "object" within Terraform's type system.

This works to catch either of those output values being present but of the wrong type, but it doesn't catch either of those output values being absent. I suspect that's due to an interaction with the pattern constraint in the #TFOutput definition. Is there a way to express this constraint in CUE as it is today, or is this the sort of gap we're considering closing with required fields?

@eonpatapon
Contributor

How do you vet the JSON file against this schema? For me, this should work if you check for concreteness. It should fail, for example, if output_name_2.value is not defined (hence not concrete).
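A minimal sketch of the concreteness check described above, written as a single CUE file so it can be evaluated on its own. It repeats seh's two definitions verbatim and, as an assumption for illustration, inlines a hypothetical stand-in for the terraform output -json data with output_name_2 left out; in practice that data would come from the JSON file instead.

```cue
#TFOutput: [string]: {
	sensitive: bool
	// This is loose, but captures enough of the possibilities.
	type: string | [(string | [string, ...]), ...]
	value: _
}

#AWSTFOutput: #TFOutput & {
	let objectType = {
		type: ["object", ...]
	}

	output_name_1: objectType
	output_name_2: objectType
}

// Hypothetical stand-in for the output of terraform output -json,
// with output_name_2 deliberately missing.
result: #AWSTFOutput & {
	output_name_1: {
		sensitive: false
		type: ["object"]
		value: {}
	}
}

// Unification succeeds: result.output_name_2 is added by the schema, but its
// sensitive (bool) and value (_) fields are not concrete. A check that
// requires concreteness, such as cue vet -c or cue export, reports them as
// incomplete; cue eval without -c prints them without an error.
```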

@myitcv
Contributor

myitcv commented Jun 4, 2021

@seh you could indeed use ! here as follows:

```cue
#AWSTFOutput: #TFOutput & {
	let objectType = {
		type: ["object", ...]
	}

	output_name_1!: objectType
	output_name_2!: objectType
}
```

(Noting #1024 in passing.) If any fields of objectType are also required, they could be marked accordingly too.

@seh

seh commented Jun 4, 2021

That is, if you're using a version of CUE that accepts '!', right? I tried it with version 0.4.0, and it complained like this:

expected label or ':', found '!':

@myitcv
Contributor

myitcv commented Jun 4, 2021

That is, if you're using a version of CUE that accepts '!', right?

Indeed. There is no version of CUE that implements this proposal as yet.

@seh

seh commented Jun 4, 2021

How do you vet the JSON file against this schema? For me, this should work if you check for concreteness. It should fail, for example, if output_name_2.value is not defined (hence not concrete).

I found that it doesn't fail when I try to access output_name_2.value to populate a field that is itself optional. The value probably remains incomplete.

@cueckoo

cueckoo commented Jul 3, 2021

This issue has been migrated to cue-lang/cue#822.

For more details about CUE's migration to a new home, please see cue-lang/cue#1078.

@cueckoo cueckoo closed this as completed Jul 3, 2021