Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistency in handling of oneof fields in encode and fromObject and toObject #710

Closed
murgatroid99 opened this issue Mar 20, 2017 · 14 comments

Comments

@murgatroid99
Copy link
Contributor

protobuf.js version: 6.6.5

When encoding a plain object representing a message with a oneof field, the field containing the value will be ignored will be ignored unless there is an additional object member with which is the name of the oneof field mapping to the string name of the field chosen. But that additional object member is not required when using fromObject, or emitted by toObject.

In addition, in ProtoBuf.js 5, that additional object member was always emitted by the toObject equivalent, and that was very convenient for use in switch statements. My preferred solution here would be to always emit that field member in toObject.

// oneof.proto
syntax = "proto3";

message OneOfMessage {
  oneof choice {
    int32 first = 1;
    string second = 2;
  }
}
var protobuf = require('protobufjs');

protobuf.load('oneof.proto', function(err, root) {
  if (err) {
    throw err;
  }

  var OneOfMessage = root.lookup('OneOfMessage');

  var value1 = {choice: 'first', first: 5};
  console.log('value1:', value1);
  var buffer1 = OneOfMessage.encode(value1).finish();
  var value1_decoded = OneOfMessage.decode(buffer1).toObject();
  // {first: 5}
  console.log('value1_decoded:', value1_decoded);

  var value2 = {first: 5};
  console.log('value2:', value2);
  var buffer2 = OneOfMessage.encode(value2).finish();
  var value2_decoded = OneOfMessage.decode(buffer2).toObject();
  // {}
  console.log('value2_decoded:', value2_decoded);

  var value3 = {first: 5};
  console.log('value3:', value3);
  var message3 = OneOfMessage.fromObject(value3);
  var value3_decoded = message3.toObject();
  // {first: 5}
  console.log('value3_decoded:', value3_decoded);
});

I think the most important thing to note is that encoding and decoding value1 twice gives a different result than doing so just once.

@dcodeIO
Copy link
Member

dcodeIO commented Mar 20, 2017

The safe way here is to always use .fromObject where input data is a plain object. This creates a runtime message instance, which has the necessary getters for virtual oneof fields.

However, I am also not yet satisfied with the API - there are three possible scenarios:

  1. Rely on the virtual property to be present (plain objects can mimic this) when encoding (no overhead, that's how it currently is)
  2. Just encode whatever is present, no matter if part of an oneof or not (usually no overhead, no virtual properties necessary, but possibly redundant data on the wire)
  3. Add special logic to encoders (has overhead)

Currently, I'd favor option 2 as a compromise. This however assumes that plain objects (usually) don't have multiple fields of the same oneof set at once. Virtual getters could still remain.

dcodeIO added a commit that referenced this issue Mar 20, 2017
@dcodeIO
Copy link
Member

dcodeIO commented Mar 20, 2017

With this in place, your specific use case should be working - but the following isn't covered anymore:

OneOfMessage.encode({
  first: 1,
  second: "a"
});

This will now encode both values on the wire. The receiving part, however, will omit all but the last. When using runtime messages instead, virtual setters will still clear all but the last.

@jeffgrossmann
Copy link

I'm not an expert on oneof encoding/decoding but it seems like protobuf should never include more than one oneof field/value on the wire. The Java API for example will automatically clear the 'first' field/value if the 'second' field/value is set. Protobuf could match this behavior in the OneOfMessage.encode method by clearing the previous set oneof field when a new oneof is found. Not sure if the order of encode is deterministic or not so encode results may not be consistent. My other comment is the JSON is invalid input based on the .proto definition for a OneOfMessage. encode could report an error when it encounters more than one 'oneof' field. My first suggest is just to send the 'last' oneof field during the encode method. Just my two cents....

@murgatroid99
Copy link
Contributor Author

Why isn't MessageClass.encode(plain_object) just equivalent to MessageClass.encode(MessageClass.fromObject(plain_object))? That seems like the simplest way to avoid this problem.

And, in any case, it would still be useful for that virtual property to be present in the output of toObject, in order to do something like this

switch(value.choice) {
  case 'first': // use value.first
  case 'second': //use value.second
}

And, because ProtoBuf.js 5 does output that virtual property, and gRPC uses toObject, the absence of that property is a breaking change for gRPC, which is very inconvenient.

Even if it was just another option in ConversionOptions, that would still be very useful.

@dcodeIO
Copy link
Member

dcodeIO commented Mar 20, 2017

Why isn't MessageClass.encode(plain_object) just equivalent to MessageClass.encode(MessageClass.fromObject(plain_object))? That seems like the simplest way to avoid this problem.

The sole reason why all those methods exist in the first place is performance. Each method does one thing, depending on what the user requires. For example, you could call .encode with a plain object that is already known to be perfectly valid because it has been double checked and hard coded into the application by the developer. In this case, neither .verify nor .fromObject would be necessary, leading to less redundant assertions, improving encoding speed. The opposite would be dealing with user-provided inputs through a web form, which probably requires complete verification.

.encode and .decode especially are called internally to handle sub-messages. At the end of the day, each unnecessary assertion subtracts from performance there so I am trying to split this up.

Even if it was just another option in ConversionOptions, that would still be very useful.

I am fine with that, going to implement.

Not sure if the order of encode is deterministic or not so encode results may not be consistent

Since 6.7.0 it is deterministic, encoding fields in ascending field id order (as of the spec).

My other comment is the JSON is invalid input based on the .proto definition for a OneOfMessage. encode could report an error when it encounters more than one 'oneof' field

This is actually missing in .verify at the moment, going to implement.

@murgatroid99
Copy link
Contributor Author

OK, if encode is not intended to do any validation, I'll switch to using fromObject and validate explicitly.

@murgatroid99
Copy link
Contributor Author

I have noticed something else closely related: if a message has a oneof field, and you call message.toObject({defaults: true}), the resulting object will have fields for all choices, with the default values set on all fields that were not chosen. Without the virtual oneof property to indicate which one was chosen, that makes the fields effectively meaningless.

dcodeIO added a commit that referenced this issue Mar 21, 2017
@dcodeIO
Copy link
Member

dcodeIO commented Mar 21, 2017

This should fix your latest issue. I also added more information regarding the intended use of the various methods to the README.

Let me know if there is anything else left I can help with. Otherwise, I am also ready to push a release now.

@murgatroid99
Copy link
Contributor Author

I don't know of any other issues. Thank you for the quick turnaround on these changes.

@dcodeIO
Copy link
Member

dcodeIO commented Mar 22, 2017

Should be fixed in master. Feel free to reopen if there are any remaining issues!

@murgatroid99
Copy link
Contributor Author

Can you publish these changes soon?

@dcodeIO
Copy link
Member

dcodeIO commented Apr 2, 2017

It's now on npm as 6.7.0.

@murgatroid99
Copy link
Contributor Author

The verifier seems to return the error about multiple fields being set unconditionally. The following code illustrates this

// oneof.proto
syntax = "proto3";
message OneOfValues {
  oneof oneof_choice {
    int32 int_choice = 1;
    string string_choice = 2;
  }
}

In the following code, each console.log line outputs "oneofChoice: multiple values". All of them should be valid.

protobuf.load("oneof.proto", function(err, root) {
  var OneOfValues = root.lookupType('OneOfValues');
  console.log(OneOfValues.verify(OneOfValues.fromObject({})));
  console.log(OneOfValues.verify(OneOfValues.fromObject({int_choice: 5})));
  console.log(OneOfValues.verify(OneOfValues.fromObject({oneof_choice: 'int_choice', int_choice: 5})));
  console.log(OneOfValues.verify(OneOfValues.fromObject({string_choice: 'abc'})));
  console.log(OneOfValues.verify(OneOfValues.fromObject({oneof_choice: 'string_choice', string_choice: 'abc'})));
});

@dcodeIO
Copy link
Member

dcodeIO commented Apr 4, 2017

Currently, .verify is meant to be used with plain objects and doesn't behave well with runtime messages because these have default values on the prototype, leading to the error above.

OneOfValues.verify(OneOfValues.fromObject({}))
// should be:
OneOfValues.verify({})

But now that you mentioned it I see that there are a couple of issues arising from this. Going to look into it. The underlying issue probably is that encoders check for .hasOwnProperty while verifiers do not.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants