Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move default values to prototypes and ditch nulls #380

Closed
wants to merge 1 commit into from
Closed

Move default values to prototypes and ditch nulls #380

wants to merge 1 commit into from

Conversation

seishun
Copy link
Contributor

@seishun seishun commented Dec 31, 2015

This fixes #200 by implementing the API described in #200 (comment).

  • Fields without an explicit default are treated as if the default were a type-specific value, as per the spec and the official implementations. For example, optional bool foo = 0; is now equivalent to optional bool foo = 0 [default = false];.
  • For every non-repeated non-map field, there is an equivalently named property on the message prototype. Its value is the field's default value.
  • When constructing a message object, its properties are no longer set for non-repeated non-map values. Consequently, attempting to read any of its fields results in the default value.
  • When encoding, only own properties are taken into account, unless it's a required field (previously, required fields with explicit defaults were set on construction, so one could encode without explicitly setting them).
  • When decoding, fields are no longer set to their defaults if they were missing on the wire, since the default values are already in the prototype. If a required field is missing on the wire, decoding throws even if the field has an explicit default.
  • The ProtoBuf.populateDefaults option is removed.

The changes are breaking for some (unlikely) use cases, but I tried to be as conservative as possible. Also note that all the above only applies to proto2. The behavior of proto3 should be exactly the same as before.

Also fixes #319, #373.

@englercj
Copy link

I'm not sure I like this change because it will break Hidden Class optimizations since the object will change itself at runtime. This will also destroy the Inline Caches for methods that take in these objects as parameters. The object should not change the properties it has at runtime, at construction it should have all the properties it would ever have.

@seishun
Copy link
Contributor Author

seishun commented Jan 1, 2016

@englercj Perhaps you can suggest a test script that we could use to measure the performance difference? Unless it's significant, I would argue that compliance to the spec and API simplicity is more important.

@englercj
Copy link

englercj commented Jan 1, 2016

@seishun The performance difference is heavily different because one way will generate optimized (lithuim generated) code, the other will be full compiled (unoptimized code). Depending on your application this can be unimportant or application breaking.

Measuring the performance difference in raw numbers of some test script doesn't mean much except in the context of your application, but breaking well known engine optimizations and forcing these objects to be hidden class polymorphic (and therefore break all functions that use them as well) is not a good idea.

I would move to a different library for our applications if the parsing of messages caused it to return a different hidden class every call. We process too many messages at volume to suffer a deoptimization of that nature.

If you are interested in the difference between calling a monomorphic, polymorphic, or megamorphic functions (these changes would make using functions become megamorphic eventually) then I suggest this article and this microbenchmark.


All of that said, I'm not suggesting we don't follow the spec. I'm suggesting that changing the properties of the object by adding or removing them at runtime is not the correct solution.

@seishun
Copy link
Contributor Author

seishun commented Jan 7, 2016

Measuring the performance difference in raw numbers of some test script doesn't mean much except in the context of your application

It means more than speculation.

I would move to a different library for our applications if the parsing of messages caused it to return a different hidden class every call.

Surely you wouldn't move without seeing the numbers first? In that case, it would be beneficial for everyone involved if you could test this solution now.

I'm suggesting that changing the properties of the object by adding or removing them at runtime is not the correct solution.

I'll be glad to see other suggestions if I see that this solution is unacceptable performance-wise.


Also, if you insist on following optimizations guidelines, Google seems to disagree with your statement that "at construction [the object] should have all the properties it would ever have": https://developers.google.com/speed/articles/optimizing-javascript#initializing-instance-variables

...Unless you didn't mean own properties, in which case I don't see how it applies here.

@dcodeIO
Copy link
Member

dcodeIO commented Jan 7, 2016

I actually like the hasOwnProperty idea a lot, great replacement for has_ API-wise. Regarding hidden class optimizations: The issue here is delete, right?

@englercj
Copy link

englercj commented Jan 7, 2016

It means more than speculation.

I'm not speculating, I'm trying to explain to you how V8 works. I even included a benchmark and an article that explains the compilation process I am describing.

Surely you wouldn't move without seeing the numbers first? In that case, it would be beneficial for everyone involved if you could test this solution now.

Do the numbers I linked not count? Having benchmarked our application and knowing I don't want it to run at half speed not good enough? We work pretty hard to hit optimized paths in the compilers due to the scale of the application. Suddenly removing those optimizations is not acceptable.

Google seems to disagree with your statement that "at construction [the object] should have all the properties it would ever have":

Huh? By placing value properties on the prototype, you essentially let V8 do the initialization of those properties for you on new rather than having them explicitly in the constructor. This results in a faster construction time because sets the prototype to __proto__ instead of having to run a long constructor each creation. That article in no way disagrees with what I am saying. That article is suggesting that instead of running js to initialize the vars, let v8 do it for you; which is fine if construction time is your bottleneck. But not only do we have to create the properties in the constructor here due to the nature of Message being all messages, but it still has nothing to do with Hidden Classes like I've been talking about.

Hidden classes are per-object (which includes __proto__). Where they are doesn't matter, not adding/removing them is what matters. I never said Hidden Classes require it to be in the constructor, I said that the properties have to exist at construction time.


The issue here is delete, right?

Pretty much, the properties that the message will hold need to be added at construction time then not deleted. Each delete causes a hidden class transition which is expensive, and having the multitude of hidden classes (depending on the diversity of how optional a messages arguments are) can cause megamorphic use of it. This can be unacceptable for many use cases, including ours.

@seishun
Copy link
Contributor Author

seishun commented Jan 7, 2016

Where they are doesn't matter, not changing them is what matters.

In that case, I don't understand your gripe with this solution. There is always going to be a property on the object for each field, either own or inherited.

@englercj
Copy link

englercj commented Jan 7, 2016

There is always going to be a property on the object for each field, either own or inherited.

https://github.com/dcodeIO/protobuf.js/pull/380/files#diff-3046de64cab49373b52464f636b51688R132

No, there isn't. You explicitly have delete, which removes the property. That transitions the object to a new Hidden Class.

The object, and the __proto__ property are two separate objects. They head have their own Hidden Classes. Adding/removing the properties to either had the effect of transitioning those hidden classes.

Being on the prototype is fine, being on the object is fine, changing either after construction is a no-no.

@seishun
Copy link
Contributor Author

seishun commented Jan 7, 2016

No, there isn't. You explicitly have delete, which removes the property.

Removing a property doesn't remove it from the prototype.

> var o = Object.create({foo: 5})
undefined
> o.foo = 'bar'
'bar'
> delete o.foo
true
> o.foo
5

@englercj
Copy link

englercj commented Jan 7, 2016

Removing a property doesn't remove it from the prototype.

Never said it did, or if I did I mispoke.

I said removing the property from the object changes the Hidden Class. I said that the object o and the prototype o.__proto__ have separate Hidden Classes. Doing delete o.foo changes the Hidden Class used to represent o in the V8 engine. os Hidden Class is used to quickly access properties on the o object. o.__proto__s Hidden Class is used to quickly access properties on the prototype. Those hidden classes are also used in the calling ICs to determine which optimized method to call (call asm instruction, no type guards). When the ICs fill up, it deoptimizes the function to use polymorphic calling (code that chooses the right function to run with type guards). When there are more types than can fit in the ICs it deoptimizes the fuction to use megamorphic calling (large list of typing guards and multiple internal calls, also the final function full generated not lithium generated).

I recommend you read the articles I posted, because I don't think you understand what I am saying.

This is another good one as well: https://www.smashingmagazine.com/2012/11/writing-fast-memory-efficient-javascript/

And of course I've only be speaking in terms of hidden classes, I haven't even mentioned the can of worms that can come by the object moving into Hash Table mode instead of being Class-backed. That would suck even worse.

@seishun
Copy link
Contributor Author

seishun commented Jan 7, 2016

I said removing the property from the object changes the Hidden Class. I said that the object o and the prototype o.proto have separate Hidden Classes. Doing delete o changes the Hidden Class used to represent o in the V8 engine.

Now you're contradicting yourself. So does it or does it not matter where the property is?

Assuming your current stance is that it does:

By placing value properties on the prototype, you essentially let V8 do the initialization of those properties for you on new rather than having them explicitly in the constructor. That article in no way disagrees with what I am saying. That article is suggesting that instead of running js to initialize the vars, let v8 do it for you.

Then yes, it does disagree with you. If a property is on the prototype, then it doesn't magically become the object's own property after construction – it remains inherited. The two snippets are semantically different.

@dcodeIO
Copy link
Member

dcodeIO commented Jan 7, 2016

Well, you have (at least) two (more with inheritance taken into account) hidden classes here: One for the single prototype that contains the default values as its "own" properties (the one-time generated ProtoBuf.Builder.Message) and (exactly, at least that is what we want to achieve) one for the multiple instances of it, the actual runtime messages. Runtime instances will fall back to "slow mode" in the worst case as soon as you call "delete" on any of their properties, hence disabling hidden class optimization for the exact instance it was used with. As far as I understand this is also the case when adding other new properties to them. Currently, we set everything to null to retain the layout.

The prototype (generated class) is not the issue here, the issue are the instances of them (those where delete is used on properties and which share a common hidden class ideally). It's not about changing the prototype, it's about changing objects that share their exact property layout.

As I said, I really like the hasOwnProperty approach API-wise, but depending on the scenario this comes at a significant cost.

@seishun
Copy link
Contributor Author

seishun commented Jan 7, 2016

@englercj Not very classy to edit your comments to make them look more convincing after they've been responded to.

Being on the prototype is fine, being on the object is fine, changing either after construction is a no-no.

Are you implying that the snippet from Google assumes the properties prop1_, prop2_ and prop4_ will never be modified after construction?

I recommend you read the articles I posted, because I don't think you understand what I am saying.

Have you read them? Because it seems you have skipped over or are intentionally ignoring the second-to-last paragraph in the first article you linked:

Indeed worrying about polymorphism is usually futile. Instead benchmark your code on a realistic dataset, profile it for hotspots and if any of them are JS related - check out IR that optimizing compiler produces.

@dcodeIO
Copy link
Member

dcodeIO commented Jan 7, 2016

Forgot that this is a dick-measuring contest, my bad. Ignore my comment.

@Braingines
Copy link

I feel sorry but I should join your 'dick-measuring contest' because that was my initiative to push this feature implementation

Dear all ProtoBuf.js maintainers, let's read this short article from ProtoBuf (and Cap'n'Proto) dev team:

https://developers.google.com/protocol-buffers/docs/overview#a-bit-of-history

<<
Protocol buffers were designed to solve many of these problems:

New fields could be easily introduced, and intermediate servers that didn't need to inspect the data could simply parse it and pass through the data without needing to know about all the fields.>>

And what you just did?

You implemented something like tricky V8 - based binary stream reader and writer that work excluding data scheme.

THAT'S IT.

The whole feature set mentioned above were completely IGNORED the whole your implementation story, so please do not even try to tell that this is ProtoBuf implementation.

This is ProtoBuf API - compatible V8 - based reader and writer implementation. That's all.

You have no possibility in code to get list of fields that were changed!

What's the point to use your implementation?

Your current version just save the traffic bytes, ok, well done guys, well done!

I'm just wondering and can't imagine your .js code, how do you work with rich protocols (many message with many fields)?

Looks like you do tens of if / else or switch / case statements in your code. In our case this is not tens, but hundreds of this useless code jumps because of your 'implementation'.

Now, If you have some little doubts about how your super - duper - faster - hidden V8 - based code works, just ask the ProtoBuf initial developer what does he think about implementation with empty reflection feature set and your '.js null - features'.

@englercj
Copy link

englercj commented Jan 7, 2016

@seishun I edit my posts when I have a typo, realize I said something wrong, or feel that I need to clarify. I reread my post and edit it until I'm satisfied with it. I'm not trying to "win" anything because this isn't an argument, I'm trying to educate you.

I think the disconnect is that you are interpreting "Hidden Class" as the prototype object, but those aren't the same thing. Hidden Class !== Prototype.

Are you implying that the snippet from Google assumes the properties prop1_, prop2_ and prop4_ will never be modified after construction?

Changing the value isn't the issue (unless you are changing the type), adding and removing properties that change the structure of the object is the issue. I've mentioned multiple times now that I have both profiled, and read the IR of our application. And I'm here to tell you we cannot suffer an deoptimization of this scale. For most applications or websites this probably doesn't matter. For high-performance node.js application processing millions of messages per second (like ours) it matters, a lot.

Then yes, it does disagree with you. If a property is on the prototype, then it doesn't magically become the object's own property after construction – it remains inherited. The two snippets are semantically different.

They are different, but neither of them change the structure of the object after construction. The optimization mentioned there is a different one than what we have been discussing. __proto__ is a reference to the prototype object which was constructed on new. The object's own properties are separate and set in the constructor function. Adding/removing properties to either object after they are constructed will result in a Hidden Class transition for that object that changed.

@dcodeIO I'm done now, I don't understand why this is spiraling out of control. I'm trying to be reasonable and explain how this works, but I just keep getting met with "nuh-uh" and ad hominem attack. I honestly didn't think this would go like this, but you're right that the discussion is lost here and has degraded into a dick measuring contest. Sorry about that.

Do what you feel is best, you have been honoring rules like these while building this lib for a bit now so I trust you. In fact, its good to see that you actually do understand engine mechanics:

Well, you have (at least) two (more with inheritance taken into account) hidden classes here: One for the single prototype that contains the default values as its "own" properties (the one-time generated ProtoBuf.Builder.Message) and (exactly, at least that is what we want to achieve) one for the multiple instances of it, the actual runtime messages. Runtime instances will fall back to "slow mode" in the worst case as soon as you call "delete" on any of their properties, hence disabling hidden class optimization for the exact instance it was used with. As far as I understand this is also the case when adding other new properties to them. Currently, we set everything to null to retain the layout.

Thank you for understanding, and we look forward to working on future library optimizations.

@Braingines
Copy link

Does anybody of repository users can provide us the complete example of how they use optional / repeated fields on .js - side?

I mean, what's the point to discuss the performance of wrong code implementation?

I completely do not understand the reasons why people use this.

Just for compatibility with non - .js server side data exchange components?

To reduce traffic only?

If so, you can write your own strict binary reader / writer that exclude the data scheme in traffic, do not marshalling each time while you parse your message and that's it.

But it will be not the ProtoBuf implementation, sorry, guys.

The ProtoBuf technology means the complete feature set and here we have a lot of features missed, especially in reflection infrastructure.

The initial proposal and implementation mentioned in the top of this discussion is just the one attempt to implement this feature set in order to provide complete ProtoBuf technology implementation.

This attempt faced with a lot of 'it works perfectly now for us, why we should add your code' (sick).

One more question, does anybody of repository maintainers used other ProtoBuf implementations? C++ / Java - based? If someone can positively answer this question, I see no more pros for push this code further, in production...

@seishun
Copy link
Contributor Author

seishun commented Jan 7, 2016

I'm trying to educate you

I understand how hidden classes work. You haven't told me anything new. You're missing my point that speculating about performance based on one known optimization is unreasonable, as the very article you linked clearly points out.

Changing the value isn't the issue (unless you are changing the type), adding and removing properties that change the structure of the object is the issue.

You didn't answer my question clearly, so I assume the answer is "No". As you hopefully know, assigning to either prop1_, prop2_ or prop4_ of a newly-constructed instance of Bar will add a new own property. Are you still going to argue that Google's advice doesn't disagree with your statement that "at construction [the object] should have all the [own] properties it would ever have"?

(unless you are changing the type)

Protobuf.js does change the type currently. Between null and whatever the field type is.

@mraleph
Copy link

mraleph commented Jan 7, 2016

@seishun the page you are linking to is really-really old, I think it even predates V8 itself

Please don't use it to justify any sort of optimizations.

Advice given in the section "Initializing instance variables" might have been a very clear cut back in the day, but these days it comes with a very complicated trade-off - and usually it's impossible to predict whether it will improve the performance and memory consumption of your code. The opposite is also true - it's impossible to predict whether it will regress performance either. Requires benchmarking and detailed analysis.

It is true however that if the field is almost never present then you might at least save some memory (potentially sacrificing some performance due to polymorphism).

[Note: I did not look at the code in question, I am just commenting on the advice from "Initializing instance variables" in general]

@seishun
Copy link
Contributor Author

seishun commented Jan 8, 2016

Requires benchmarking and detailed analysis.

That's exactly what I've been saying the whole time.

Please don't use it to justify any sort of optimizations.

I'm only using it to show that blindly following optimization advice can lead to contradiction.

@seishun
Copy link
Contributor Author

seishun commented Feb 7, 2016

@dcodeIO It's been a month. Since the only person who was against this change due to performance concerns still hasn't provided benchmarks supporting their claims, can we merge this or at least move on to discussing other aspects besides performance?

@michelgb
Copy link

Hello. I really need this code. I am using proto3 and I am having several issues with the defaults. Could someone please review it and merge?

@robin-anil
Copy link
Contributor

+1 to what @michelgb said

@bootstraponline
Copy link

I don't think performance matters if we're not spec compliant. I'd like this PR to be merged.

Alternatively update the readme so people that need proto3 support are aware of the issue.

@paralin
Copy link

paralin commented Oct 31, 2016

@bootstraponline agreed, any progress on this?

also @seishun hey from node-steam

@bootstraponline
Copy link

@bootstraponline agreed, any progress on this?

I think we're just waiting for someone with write access to press the merge button.

@dcodeIO dcodeIO closed this Nov 6, 2016
@seishun
Copy link
Contributor Author

seishun commented Nov 6, 2016

@dcodeIO
Could you provide some explanation for why this was closed?

@bootstraponline
Copy link

#485

@dcodeIO
Copy link
Member

dcodeIO commented Nov 6, 2016

Oh, this was obviously closed automatically while moving branches.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

proto2 default optional fields construction behaviour Checking if a field with a default value has been set
9 participants