Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v0.4.13 serialization break #156

Open
ethe opened this issue Jan 21, 2021 · 6 comments
Open

v0.4.13 serialization break #156

ethe opened this issue Jan 21, 2021 · 6 comments
Labels
bug Something isn't working

Comments

@ethe
Copy link
Member

ethe commented Jan 21, 2021

relay to #139, a quick verification:

from thriftpy2.transport.memory import TCyMemoryBuffer
from thriftpy2.protocol import cybin as proto
from thriftpy2.thrift import TType

b1 = TCyMemoryBuffer()
proto.write_val(b1, TType.MAP, {"hello": "1"},
                spec=(TType.STRING, TType.STRING))
b1.flush()

b2 = TCyMemoryBuffer()
proto.write_val(b2, TType.MAP, {"hello": b"1"},
                spec=(TType.STRING, TType.BINARY))
b2.flush()

b1.getvalue() != b2.getvalue()

TType was be specified in Apache Thrift standard, and Binary type should always have the same behavior with String in kinds of protocols, hence we missed some check to modify the value of Binary: 18 to String: 11 in complex type serialization (map, list, etc.).

@ethe ethe added the bug Something isn't working label Jan 21, 2021
@ethe
Copy link
Member Author

ethe commented Jan 21, 2021

@JonnoFTW @aisk @dmulter

@ethe
Copy link
Member Author

ethe commented Jan 21, 2021

I think maybe I have to revert this merge #139 and have a discussion again about the binary type support

@ethe
Copy link
Member Author

ethe commented Jan 21, 2021

Hold this issue before Apache JSON protocol re-merge.

@ethe ethe pinned this issue Jan 21, 2021
@JonnoFTW
Copy link
Contributor

I'll have a look at this when I get back to work on Monday

@ethe
Copy link
Member Author

ethe commented Jan 21, 2021

@JonnoFTW Thanks for your work! I have several suggestions:

  • could you split support the Binary type feature and Apache JSON protocol to two pull requests?
  • we should ensure that the Binary type has the same behavior with String outside of JSON-like protocols
  • the Binary type maybe should allow the user to pass Python str in with an implicit type converting?

@JonnoFTW
Copy link
Contributor

JonnoFTW commented Feb 9, 2021

@ethe

  • could you split support the Binary type feature and Apache JSON protocol to two pull requests?

How do you want Thrifpy2's Apache JSON to handle the case where it doesn't support binary? Apache's thrift will send binary data as base64 encoded string. If we don't support binary in Apache JSON, then the result will be that thriftpy2 gets a base64 encoded string. From the test, without binary support, apache json will make the following field:

tbinary=b"\x01\x0fabc123\x00\x02"

Into this:

"19":{'str': 'AQ9hYmMxMjMAAg=='}}

Without binary support, my apache thrift protocol has no way of knowing whether or not to decode the above base64 encoded string. If we try to decode any string field using base64.b64decode we will not always get the right result since something like "notbas64":

base64.b64decode(b"notbas64")

Is a valid string and a valid base64 encoded string.

  • we should ensure that the Binary type has the same behavior with String outside of JSON-like protocols

What do you mean? Please provide an example.

  • the Binary type maybe should allow the user to pass Python str in with an implicit type converting?

Currently, if a thrift field is specified as binary and you pass a string, the string is serialised to an appropriate intermediate binary representation as appropriate. When deserialising, it is converted back to a python bytes object. Is this what you want?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants