-
Notifications
You must be signed in to change notification settings - Fork 564
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(expr): add to_jsonb #13161
feat(expr): add to_jsonb #13161
Conversation
Additional notes:
This requires further investigation.
|
This seems to require a custom formatter in the jsonbb library.
This also requires arbitrary precision number support in the jsonbb library.
This can also be stored as an arbitrary precision number. Converting to string will result in extra I'll take on them. |
#6412 Formatting of float is more complicated than we thought. Even for
#7175 (comment) Similar problem is not solved for
I would suggest we just report an out-of-range error for such case, if doable. RFC 8259 acknowledges the fact that only IEEE 754 double has good interoperability. PostgreSQL jsonb uses its variable length decimal (aka numeric), which is much larger than our existing decimal. It does not have to be a blocker for the common use cases where numbers are within the range, before we support the larger range. Switching from an error to a number is not considered a breaking change, but switching from a string is. |
let mut builder = jsonbb::Builder::default(); | ||
builder.begin_object(); | ||
|
||
let names: Vec<&str> = data_type.as_struct().names().collect(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can also zip
the names to avoid collection to vec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sometimes the names
vec is empty, for instance:
select to_jsonb(ARRAY[3,4,5,6]);
Any suggestion for this situation regarding to zip
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that for unnamed fields, pg always names them as f1, f2, ...
. Then I think we should code this behavior in StructType
and make names()
always return non-empty.
postgres=# select (row(1, 2)).*;
f1 | f2
----+----
1 | 2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems code is also using names()
.
Let me double check and fix it in the jsonb_agg
enhancement PR.
Still have some issues:
I'll take a look tomorrow to fix 2 and 3. |
Codecov Report
@@ Coverage Diff @@
## main #13161 +/- ##
==========================================
- Coverage 68.06% 68.03% -0.04%
==========================================
Files 1515 1516 +1
Lines 257111 257257 +146
==========================================
+ Hits 175013 175028 +15
- Misses 82098 82229 +131
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 3 files with indirect coverage changes 📣 Codecov offers a browser extension for seamless coverage viewing on GitHub. Try it in Chrome or Firefox today! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks
let mut builder = jsonbb::Builder::default(); | ||
builder.begin_object(); | ||
|
||
let names: Vec<&str> = data_type.as_struct().names().collect(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that for unnamed fields, pg always names them as f1, f2, ...
. Then I think we should code this behavior in StructType
and make names()
always return non-empty.
postgres=# select (row(1, 2)).*;
f1 | f2
----+----
1 | 2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
To summarize:
Some suggestions:
|
Maybe it's time to introduce the
Right. We should unify these traits. The other remaining task is to support any type as the argument of |
Agree. Let's do it in another PR.
Yes. I'm now trying to take use of the |
3183f3d
to
a0d1b89
Compare
Update:
|
a323671
to
116944e
Compare
116944e
to
6f67496
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly lgtm
impl ToJsonb for i64 { | ||
fn add_to(self, builder: &mut Builder) -> Result<()> { | ||
let res: F64 = self | ||
.try_into() | ||
.map_err(|_| ExprError::CastOutOfRange("IEEE 754 double"))?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just sharing more on this, given the e2e test failure in CI:
- RFC 8259 allows this:
type Number = f64
. - PostgreSQL does this:
type Number = Decimal
.- PostgreSQL decimal can even hold
f64::MAX
- PostgreSQL decimal can even hold
- serde_json/simd-json/jsonbb does this:
enum Number {I64, U64, F64}
- serde_json has a feature
arbitrary_precision
totype Number = String
- simd-json has a feature
128bit
toenum Number {I64, U64, F64, I128, U128}
- jsonbb may be extended to
enum Number {I64, U64, F64, Decimal}
- This requires a variable length decimal.
rust_decimal
is not enough.
- This requires a variable length decimal.
- serde_json has a feature
Back to to_jsonb
: we do have the capability to avoid precision loss on i64 today, but using the lossy f64 feels more consistent with decimal / int256 which we cannot encode losslessly yet. Both sounds acceptable to me. And the .0
issue would cause less problems after the jsonbb serializer fix.
Tips: We don't have anyway to report out-of-range error during streaming. Instead, the whole JSON field will be filled with I would recommend to just (implicitly) convert int64 / decimal to float64, regardless of the precision loss, which looks intuitive to me. For |
|
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
As title.
#12834
Checklist
./risedev check
(or alias,./risedev c
)Documentation
Release note