PROPOSAL: alternative syntax and separate name space for Definitions #307

Closed
cueckoo opened this issue Jul 3, 2021 · 10 comments
Labels
FeedbackWanted Further information is requested Proposal roadmap/language-changes Specific tag for roadmap issue #339

Comments


cueckoo commented Jul 3, 2021

Originally opened by @mpvl in cuelang/cue#307

Proposal: Alternative Syntax for Definitions

Definitions are currently marked using the :: syntax. This document proposes an alternative syntax. The purpose is to separate the name space of definitions from that of regular fields.

Background

CUE currently uses :: to mark a field as a definition. The field name is a regular identifier, meaning that definitions and regular fields occupy the same name space. This can pose a problem in mapping certain languages to CUE, where similar constructs live in a separate namespace.

The most notable case causing issues is JSON Schema. JSON Schema allows a definitions (or $defs) section to introduce definitions. Logically, such definitions would have to be mapped at the same level as the top-level fields in CUE. This does not pose a problem as long as a convention is followed where fields are camel case and definitions start with an uppercase letter. Unfortunately, this is often not the case.
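To make the clash concrete, here is a minimal sketch assuming a hypothetical JSON Schema that declares both a property named address and a $defs entry named address (the schema and all field names are illustrative, not taken from a real schema). With ::-style definitions both would have to be called address at the same level; with the # prefix proposed below they can coexist:

```cue
// Hypothetical mapping of a JSON Schema with
//   properties.address  -> regular field
//   $defs.address       -> definition
// Both names are assumptions made up for this illustration.

// The regular field keeps the schema's property name.
address: #address

// The definition lives in its own name space thanks to the # prefix,
// so it does not collide with the regular field above.
#address: {
	street: string
	city:   string
}
```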

A similar issue exists, theoretically, for OpenAPI, Protocol Buffers, and Go. It is just that conventions are more strictly followed for these languages and that this issue is not a problem in practice.

Several secondary issues, like simplification of export rules, are addressed by this proposal.

Overview

In this document we propose lexically distinguishing identifiers for definitions from regular fields by requiring a # prefix that is part of the identifier of such definition, thereby implicitly creating a separate namespace for definitions. This would replace the :: notation.

Before:

Definition :: {
    foo: string
    bar: Map

    Map: [string]: int
}

a: Definition
b: a.Map
c: Definition.Map

After:

#Definition: {
    foo: string
    bar: #Map

    #Map: [string]: int
}

a: #Definition
b: a.#Map
c: #Definition.#Map

Regular fields can still have a name starting with a # when the name is
enclosed in quotes "#foo": value. The proposed mechanism is thus analogous to
the now legacy, but still supported construct of hidden fields (_foo).
In fact, with this proposal the reintroduction of hidden fields becomes an option.

Details

Use of # in regular fields

Regular fields that need to start with a # can use aliases:

// These are NOT definitions, but regular fields. The X= is an alias so that it can be referred to.
X="#Regular": {
    foo: string
    bar: Y

    Y="#Map": [string]: int
}

a: X
b: X.foo
c: X["#Map"]
c: X."#Map" // alternative as per Query Proposal
// Note X.Y is not possible, as Y is not visible in this scope.

Export rules

The spec currently defines exporting rules for fields. These are not implemented. Part of the difficulty is exactly that definitions and regular fields occupy the same namespace: the spec has different rules for the two cases, but since they share the same namespace, it is not always obvious which rules should apply where.

With the two distinct namespaces, the following exporting rules would be feasible:

  • All identifiers of regular fields (those not starting with a #) are exported.
  • A definition identifier is exported if the first character following # is a Unicode uppercase letter (Unicode class "Lu").
  • Any other definition is not visible outside the package and resides in a namespace separate from that of namesake identifiers in other packages.

package mypackage

foo: string  // visible outside mypackage

#Foo: {      // visible outside mypackage
    a: 1     // visible outside mypackage
    B: 2     // visible outside mypackage

    #C: {    // visible outside mypackage
        d: 4 // visible outside mypackage
    }
    #e: foo  // not visible outside mypackage
}

This still relies on casing, which may not work in general when CUE is automatically translated from other languages. Given the conventions of existing languages, however, it generally seems to give a desirable outcome. A more aggressive exporting policy, for instance exporting all identifiers starting with non-lowercase letters or even exporting any definition not starting with #_, may be in order.

Interaction with hidden fields

The notation of the proposed change is analogous to that of hidden fields (no longer part of the spec, but still implemented in search of a good guideline for alternatives). The implementation of hidden fields is somewhat complicated. It is also a syntactically very different construct for something that is almost identical to definitions. It was therefore decided that we need to phase out hidden fields.

With the current proposal, hidden fields become merely a slight variant of definitions: with #Foo: { … } the struct will be closed whereas for _Foo: {…} it won’t. In both cases the field will not be part of data output.
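The difference in closedness can be sketched as follows. This is only an illustration under the semantics described in this proposal (the final behavior of hidden fields is still marked TBD here), and the names #Point, _Point, a, b and c are made up for the example:

```cue
// Definition: the struct is implicitly closed.
#Point: {
	x: int
	y: int
}

// Hidden field: same shape, but the struct stays open.
_Point: {
	x: int
	y: int
}

// Allowed: _Point is open, so the extra field z unifies.
a: _Point & {x: 1, y: 2, z: 3}

// Would be an error: #Point is closed, so z is not an allowed field.
// b: #Point & {x: 1, y: 2, z: 3}

// Allowed: only fields declared in #Point are used.
c: #Point & {x: 1, y: 2}
```

Neither #Point nor _Point itself would show up when the configuration is converted to data; only the regular fields a and c would.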

API

The current CUE API already distinguishes between regular fields and definitions. With proposed changes, bringing hidden fields in line with definitions, the same API could now be used to look up hidden fields. Other than that, the lookup API would not have to change.

The AST API could be simplified by removing the token type used for :: (token.ISA). There would be a long transition period, however, to support the old representation.

Discoverability

With the proposed change, it becomes easier to explain all different field types within a single table:

  • foo: x: regular field
  • $foo: x: also a regular field, often interpreted as some meta field by the user. The $ has no meaning to CUE itself.
  • #foo: x: Definition: not part of the output when converted to data. Structs are implicitly closed. Can be used to define a complete definition of a type.
  • _foo: x: Hidden field: like definitions, but structs are not implicitly closed. Can be used to define partial values that are not complete types. (TBD)
  • "foo": x: using double quotes any valid JSON string can be a field name for a regular field, including "#foo" and "_foo".

All of these are just identifiers. Today, a._foo and a.$foo are valid references. The advantage of this syntax is that the _ and $ signal to the users what kind of value is referenced. With #-style identifiers this benefit is extended to definitions as well. For orthogonality, a."foo" should be allowed as a valid reference (see The Query Proposal).
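The field kinds listed above can be put side by side in one small file. This is a sketch under the proposed semantics, with the hidden-field row still marked TBD in the list; all names and values are illustrative:

```cue
foo:  "regular"      // regular field, part of the data output
$foo: "also regular" // regular field; the $ has no meaning to CUE

// Definition: not part of the data output, struct is implicitly closed.
#foo: {
	name: string
}

// Hidden field: not part of the data output, struct is not closed.
_foo: {
	name: string
}

// Quoted names are always regular fields, even with a leading # or _.
"#foo": "quoted regular"
"_foo": "quoted regular"

// References read the same way as the declarations.
a: #foo & {name: "x"}
b: _foo & {name: "y", extra: true} // extra is allowed because _foo is open
```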

Transition

Firstly, the proposal justifies the reintroduction of hidden fields. So any pain caused by the introduction of #-style definitions could be offset by no longer needing to transition off of hidden fields.

A transition phase could allow both the ::-style and #-style identifiers to coexist and mean the same thing. A transition period could consist of the following steps:

  1. Compiler automatically rewrites ::-style definitions to # form, including all references.
  2. cue fmt rewrites old-style definitions to the new form (where possible).
  3. Parser stops supporting old style in regular mode.
  4. Parser removes support for ::.
  5. AST removes support for token.ISA.

In order to move definitions to their own namespace early on, it is important for parsed CUE to be rewritten to #-style identifiers before each compile. This means that representations of references will exhibit the #-style identifiers even for code that uses the then legacy double colons.

In the strictest implementation, definitions may only have #-style identifiers and cannot have free-form strings like regular fields can. This introduces the following incompatibilities:

  • Definitions of the form "\(expr)" :: value or [expr] :: value are no longer possible.
  • The -l command line flag would no longer accept fields of the form "\(foo)"::.

A workaround for the first limitation is to move the generated definitions inside a map within a static definition (everything can be solved with another level of indirection). For instance:

foo: "y"
x: {
    "\(foo)" :: value
    "\(bar)" :: value
}
a: x.y

can be rewritten as

foo: "y"
x: {
    #m: "\(foo)": value
    #m: "\(bar)": value
}
a: x.#m.y

The flag issue can be solved by allowing some annotation to indicate a field is a definition specific to this flag (perhaps even supporting the then legacy ::). See also extensions.

Extensions

The transition section discusses several limitations imposed by the proposal. If need be, the language could be extended to allow for "dynamic" definitions, for instance of the form #(expr): value, where expr needs to evaluate to a valid identifier. See The Query Proposal for more details.

Discussion

Comparison

Precedent

The use of #foo for definitions has some analogy in JSON Schema, where the same notation is allowed for anchors to refer to a schema in the $ref field.

Use of double colon

The use of double colon was derived from Haskell. It also has parallels with Jsonnet, where it is used to mean almost the same thing. The "almost" can also lead to confusion here. Note that Jsonnet also has a :::. This proposal will remove any pressure for CUE to follow suit.

Alternatives

An alternative way to deal with separate namespaces, keeping ::, is to have another selector operator specifically for definitions. For instance,

Foo :: {
    Bar :: {}
}

a: Foo.:Bar // or, for instance, Foo->Bar

As CUE is lexically scoped, there is no ambiguity as to which of the two namespaces is meant, and there is no need for any special marker on the first reference, Foo. The special operator is only needed for the selector, Bar.

In other words, introducing separate namespaces without distinguishing identifiers lexically introduces a similar amount of clutter, but leads to less clarity. Users will have to learn a new symbol (.:) and will be confronted more with the difference between a reference and a selector. Distinguishing fields from definitions lexically results in more symmetry between the two (#Foo.#Bar), which in turn seems to lead to a more intuitive reading.

Feedback wanted

  • Does this look like a reasonable change? Like/ not like?
  • Anybody relying on "dynamic" definitions (e.g. "\(foo)" :: bar or [foo] :: bar)?
  • Has anybody else run into issues from having regular fields and definitions be in the same namespace?

cueckoo added the FeedbackWanted (Further information is requested), Proposal, and roadmap/language-changes (Specific tag for roadmap issue #339) labels on Jul 3, 2021
cueckoo closed this as completed on Jul 3, 2021


cueckoo commented Jul 3, 2021

Original reply by @DanielMorsing in cuelang/cue#307 (comment)

In general I approve of this. I've been working with data where I wanted composition on hidden fields and the closedness of definitions was a hassle to deal with, so I am happy to see them stay in the language.

As for (regular) hidden fields, it might be a good idea to choose another sigil. Many C-family languages allow _ to start identifiers, and I can see it being confusing to someone just picking up the language that the fields they added are not being emitted.

# is used as a comment marker in a lot of languages (notably YAML). It might be a good idea to explicitly detect when people use it that way, for friendlier errors.


cueckoo commented Jul 3, 2021

Original reply by @verdverm in cuelang/cue#307 (comment)

I generally like this idea as well.

One thing I like about Golang is the ability to see visibility of constructs by just looking at the first character. This would be a good measure (the ease with determining scope/visibility, open/closed) for the choice of denotation or syntax. I don't have any strong opinions on the exact format, just the ease with which I can comprehend code.


cueckoo commented Jul 3, 2021

Original reply by @seh in cuelang/cue#307 (comment)

I like the '#' character for referring to definitions. It looks odd, though, using the same character for establishing (defining?) these definitions. Have you considered using a separate token such as def for the latter case? That would look nice for top-level definitions, but it doesn't look as nice when they're nested within other structs (as in your Transitions example's "m" definition).


cueckoo commented Jul 3, 2021

Original reply by @mpvl in cuelang/cue#307 (comment)

@verdverm Yes the proposed approach was, perhaps not surprisingly, inspired by exactly this property of Go. It is harder to use the casing approach of CUE, just because it is not always feasible to have the strict casing guidelines as in Go. Hence the choice for something like #.

@seh: yes I definitely considered using a separate token. However, this will add some significant complexity to the language. Syntactically, one would need a separate production for the def ident construct, as well as a new production and token for the specialized selector (.#). With the current proposal, these are just identifiers, requiring no addition and actually allowing for a removal of a token and production. This doesn't sound like much, but it actually is quite impactful.

Whatever the choice is, though, given the nested nature of CUE, the rule would have to be the same for top-level as nested fields. Personally I don't think having nested def foo: x fields is all that bad.

Anyway, you hit on exactly the biggest issue of this proposal: the ugliness of #foo: value. So, cons (compared to def foo):

  • #foo: bar is ugly (at least to you and me it seems).

pros (compared to def foo):

  • #foo: bar and its counterpart a: #foo are intuitively clear when it comes to resolution, at least, when reading (analogous to $foo: bar and a: $foo and _foo: bar and a: _foo). One does not separately have to learn a new construct like .#.
  • the language change is significantly simpler than introducing a pair of constructs for def foo: bar and a: #foo.

So the question is whether people think these benefits outweigh the ugliness.

I think a big mitigating factor is that #foo is used for anchors in other languages. For instance, in JSON Schema, when using anchors, a definition ID may be #address which is then referred to as #address (# is required for anchors).


cueckoo commented Jul 3, 2021

Original reply by @rudolph9 in cuelang/cue#307 (comment)

Feedback wanted

* Does this look like a reasonable change? Like/ not like?

This sounds reasonable 👍

* Anybody relying on "dynamic" definitions (e.g. `"\(foo)" :: bar` or `[foo] :: bar`)?

No (I didn't realize this was supported)

* Has anybody else run into issues from having regular fields and definitions be in the same namespace?

Yes. It's usually something I can work around though.

The biggest benefit I can think of is that the #MyDefinition syntax would make it immediately obvious what something is. Right now, unless you're looking at the declaration of the definition, it isn't immediately clear whether something is a definition or a field, so for that reason alone I really like this proposal!


cueckoo commented Jul 3, 2021

Original reply by @rudolph9 in cuelang/cue#307 (comment)

One thought on how to make the syntax a little nicer and unified:

How about # as the syntax for decoration as well as retrieval? e.g.

Foo: {
  #MyDef: string
  MyField: number
}

Foo#MyDef // => string
Foo.MyField // => number


cueckoo commented Jul 3, 2021

Original reply by @mpvl in cuelang/cue#307 (comment)

@rudolph9: interesting thought. This would require extra syntax in the language, unfortunately. The Foo.#Bar notation is more orthogonal and requires no additional syntax rules. In fact, this proposal as is simplifies the spec, albeit a tiny bit!

But it is definitely a neat thought. Comments from others welcome in this regard.


cueckoo commented Jul 3, 2021

Original reply by @hi-wayne in cuelang/cue#307 (comment)

I like this proposal, but # is the comment marker in YAML, so something like
c: #Definition.#Map
is difficult to understand.


cueckoo commented Jul 3, 2021

Original reply by @verdverm in cuelang/cue#307 (comment)

I like the proposal as is, especially if it is simpler in both code and effort for the maintainers.

Definitions definitely need to stand out more, the : vs. :: has been a source of pain. It's also nice that the variations are all very close syntactically. I like keeping hidden fields, something that enables module and library authors to hide implementation will be important.

I am OK with # as a define because of #define in C/C++, and because it will stand out. Nesting of def: seems more complicated, and I worry less about the comment confusion because there are many different comment styles across languages.


cueckoo commented Jul 3, 2021

Original reply by @mpvl in cuelang/cue#307 (comment)

Done
