Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow all fields to be optional #397

Closed
sgielen opened this issue Nov 20, 2021 · 9 comments
Closed

Allow all fields to be optional #397

sgielen opened this issue Nov 20, 2021 · 9 comments

Comments

@sgielen
Copy link
Contributor

sgielen commented Nov 20, 2021

Background

We have a single repository with a Go server and a Typescript client. We use protobufs to communicate between the two. We try to maintain forwards and backwards compatibility between the two, i.e. the client can send an extra field which the server ignores, or the server accepts an extra field that the client does not send yet. Protobuf, in principle, is excellent for this.

The issue

Currently, for proto files shared between Typescript and Go, there is a semantics difference which makes it difficult to add new fields. Suppose we have a message Foo with an int64 id = 1; and we add the additional field string name = 2; to the proto definition. Now, if we don't set a value for this field explicitly in the Go code, it is simply interpreted as having the default value for that field (e.g. "" for strings). For Typescript code, an error is generated for all instantiations of the message indicating a type mismatch between {id: number} and Foo, you're missing {name: string}.

Current options

The options I'm aware of:

  • The useOptionals=true ts-proto parameter allows message fields to be optional, but not scalars or repeated fields.
  • Type wrappers could be used, e.g. StringValue instead of string, making the scalar field a message field, allowing it to be optional; in combination with the option above, it would allow message fields and scalars to be optional, but not repeated fields.
  • In protoc 3.15+, scalar fields can be marked optional, but not repeated fields. Also, proto3's use case is different: it indicates "field presence" semantics, allowing to see the difference between a default value and an unset value; proto3 already allows all fields to be missing from all protobufs.
  • Issue Repeated fields can be undefined #225 addresses this for repeated fields, with a PR in feat: implemented forceOptionalRepeated=true flag #234, but no action since May and it doesn't change scalars.
  • If one uses the solution from feat: implemented forceOptionalRepeated=true flag #234 and enable useOptionals=true and use type wrappers, the issue would arguably be resolved for scalars, message fields and repeated fields? Is this the preferred way to go?
  • Every time you update the proto, you can update all instantiations of the message. If the type is used a lot, this is tedious.
  • One can use Foo.fromPartial(), but that means using fromPartial everywhere, which is also tedious.

It seems to me none of these options truly resolve the issue, they are all workarounds.

The proposal

I would like to add an additional parameter that enables ts-proto to follow Go semantics and make all fields optional. With the option enabled, I would expect the following proto:

message Foo {
  int64 id1 = 1;
  optional int64 id2 = 2;
  repeated int64 id3 = 3;

  Bar bar1 = 4;
  optional Bar bar2 = 5;
  repeated Bar bar3 = 6;
}

message Bar {}

to generate the following Typescript (or something similar):

export interface Foo {
  id1?: number; /* after conversion from empty protobuf to JS, id1 is 0 */
  id2?: number; /* after conversion from empty protobuf to JS, id2 is undefined */
  id3?: number[]; /* after conversion from empty protobuf to JS, id3 is [] */

  bar1?: Bar;
  bar2?: Bar;
  bar3?: Bar[];
}

export interface Bar {}

I am aware of the downside that this allows typo's, i.e. I can use {idd1: 5} as a Foo without errors since all fields are optional and Typescript allows setting additional fields (until Exact Types are implemented). Of course, in Go this risk does not exist since you are not allowed to set any fields that don't exist, so you would get an error here. I accept this risk and would like to cover it with better testing.

Do you think this is possible, and do you think it is useful to add?

@stephenh
Copy link
Owner

Ah, exact types, where art thou? :-)

Your analysis/overview is exactly right, fromPartial is currently the most kosher way of handling this...

Maybe as a pedantic issue / also curious how go handles this, in ts-proto we don't really have a way of distinguishing the "creating a message" type vs. "reading a message" type (except for fromPartial), and so the single FooMessage type currently caters to readers. I.e. my historical "that seems odd" take on a int64 id1 showing up as Foo.Messageid1? is that, for readers, it's forcing them to handle a case that "should never exist".

That said, for writers, you're right, it's tedious to add id1: ... in a bunch of call sites ... but it's also a required field, so shouldn't you verify each of those call sites is passing in a correct id1 instead of getting the default value? :-D (I get this is personal codebase / contextual preference, just thinking out loud.)

Anyway, you're right we've already got useOptionals=true, and honestly I really don't remember why it only does messages and not primitives or arrays.

And this request comes up often enough, that if you want to just change useOptionals=true to apply to all fields, primitives and arrays as well, that'd be great. I think I'd rather have that then yet-another-option, of which ts-proto already has too many imo. :-)

Thanks for the well-written out thoughts/request. Looking forward to a PR. :-)

@sgielen
Copy link
Contributor Author

sgielen commented Nov 21, 2021

Thank you @stephenh for the response!

I've already started work on a PR. It currently adds a forceOptionals=true flag. I'll submit that as-is momentarily, so you could start reviewing the approach. It'll be easy to refactor it so it uses useOptionals to apply to all fields. :-)

Maybe as a pedantic issue / also curious how go handles this, in ts-proto we don't really have a way of distinguishing the "creating a message" type vs. "reading a message" type (except for fromPartial), and so the single FooMessage type currently caters to readers. I.e. my historical "that seems odd" take on a int64 id1 showing up as Foo.Messageid1? is that, for readers, it's forcing them to handle a case that "should never exist".

Right -- what's important to know is that in Go, there is no concept of a variable being undefined or uninitialized. When a variable is declared but not initialized, it has the default value for that variable: 0 for numbers, "" for strings, nil for pointers, or [] for arrays. This is also true for members of structs. Since every proto Message becomes a struct with the same members, all members are also "initialized by default". So, without any protobuf magic on the Go side, when you declare a foo := &pb.Foo{} (referring to the protobuf Foo message), you can already read foo.Id1 as 0, foo.Id2 as 0 (but foo.HasId2() is false), foo.id3 as [], foo.bar1 as nil (of type *pb.Bar), foo.bar2 as nil (again with foo.HasBar2()) and foo.bar3 as []. This is also the behavior Go exhibits when you parse an empty protobuf. So, there's no difference between creating a message or reading it.

AFAIK, there's no way in Typescript to say "in the interface Foo, id1 should always exist; if you don't initialize it it should read back as 0". I guess we could go all the way and make a class with intercepted getters and setters, but I'm not sure if that's better. Barring that, we're stuck with unset fields being undefined when creating a message. When parsing it, we can use the same defaults Go would.

It's not necessarily too bad having to handle "field doesn't exist", though, as Typescript allows writing foo.id1! (one additional character) if the author is sure the field is always set when reading.

That said, for writers, you're right, it's tedious to add id1: ... in a bunch of call sites ... but it's also a required field, so shouldn't you verify each of those call sites is passing in a correct id1 instead of getting the default value? :-D (I get this is personal codebase / contextual preference, just thinking out loud.)

If it is really a required field, then yes, but proto3 removed this concept as fields tend to be required until they're not, at which point required fields are difficult to remove. See also this explanation of the discussion which is much better than mine. I think the take-away is that the application code requires a field at some point in time, but the proto definition does not. When reading, you can assume that a field is set (e.g. id!) and when writing, it's a bug if you don't send the value, regardless of where you check it. The only downside we have in TS compared with Go, is that typo's aren't as easily found, so the risk of that bug is higher.

@sgielen
Copy link
Contributor Author

sgielen commented Nov 21, 2021

PR is created: #402

@webmaster128
Copy link
Collaborator

I'm not sure this is the right way to go. A protobuf message instance has no semantic difference between a field that is not set and a field that has the default value (and it's a pain protobuf.js has this difference). Using field with foo?: number in TS suddenly gives two cases to handle on read: undefined and 0 – but undefined can never happen in a properly initialized object. This is why we have Foo as the type to read and DeepPartial<Foo> as the type to write. I think using fromPartial for creating messages is what you want for the use case you describe.

The way Go allows you to create structs without being explicit of the content of the field is … well … a matter of taste. I never saw this in any other language and don't consider it to be somthing worth exporting.

@stephenh
Copy link
Owner

stephenh commented Nov 24, 2021

@webmaster128 yeah, I agree with your analysis, and it's basically what (after looking through the ts-proto readme again), I articulated one of the last times this came up: #120 (comment)

That said, the useOptionals config flag has already snuck into the code base, which in retrospect does look half-implemented given it's only non-primitives; and users have repeatedly requested "optionals everywhere" over the years, so I'm generally at the point of "well, if you opt-in, and understand the trade-offs, then 🤷 ". :-)

@stephenh
Copy link
Owner

Also, per the SO post, it does have some good points i.e. messages that are persisted, and pushing breaking changes across a wide-variety of systems...

Just talking out loud/for my own mental model...I was going to assert that features like google.protobuf.StringValue (which afaiu exists solely to represent "string present" vs. "string not present") and proto3 adding back optional fields (technically field presence) were admissions that, at least for some users of protobuf, that required-vs-optional fields were important, implying that the proto3 "no more required fields" was an over-correction...

That said, there is some nuance where "you can specify optional" (presence) !== "you can specify required" (required). I.e. I think I was treating them as opposites: if I can specify something as optional, that's implying that everything else is "not optional" i.e. "required".

But that's not right. I.e. required implies messages would blow up if required fields are not available, and that (afaiu) is specifically what proto3 wanted to avoid. I.e. for older messages, or for routers/proxies that are just piping protos around and don't want to have deserialization completely blow up b/c of a missing required field.

@stephenh
Copy link
Owner

Also, shoot, @sgielen you had directly linked to it right there in the PR description, but I'm just how catching up that we tried this before and reverted:

#225 (comment)

The introduced change [repeated fields now being Foo[] | undefined] both creates an unexpected behavior
and misrepresents the type returned from the fromJson methods.

...I was going to naively question "...well, sure, but doesn't useOptional=true already add | undefineds everywhere, and you're fine with that?"

...except, right, it doesn't "add them everywhere", it leaves primitives and repeateds (which do have proto3-spec'd defaults) alone, and merely changes a child: Child | undefined to child?: Child.

Sorry guys, it's been awhile since I've paid attention to the deeper nuances here...

So, per @webmaster128 's point, the current useOptional=true does not drastically change the complexity of readers, b/c for message fields, you're already checking undefined anyway. It's just making message creation slightly easier.

But, right, this goes much further, and dramatically changes how readers much interact with every field.

At the minimum, we need to keep the current useOptionals=true behavior, to avoid another revert, but maybe instead of an entirely new flag, we could use useOptionals=all.

@webmaster128
Copy link
Collaborator

webmaster128 commented Nov 24, 2021

Thanks for the follow up. Fine with me. I'd just encourage to avoid the overhead in the generated code for users that do not use this new option to be minimal.

At the minimum, we need to keep the current useOptionals=true behavior, to avoid another revert, but maybe instead of an entirely new flag, we could use useOptionals=all.

To me the biggest question is: will forceOptionals replace useOptionals long term or is there a justification to keep both? We can create a new one new, deprecate the old one and remove the old in the next major. In this case I'd stick with boolean flags. Otherwise multiple values to useOptionals are good because it gives you 3 cases instead of 4 and probably is easier to teach to users.

@stephenh
Copy link
Owner

Closing this out as #402 got merged, with useOptionals=all.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants