Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JSON_VALUE extract as weird values for JSON_OBJECT #14374

Closed
Angryshark128 opened this issue Jun 6, 2023 · 2 comments · Fixed by #14461
Closed

JSON_VALUE extract as weird values for JSON_OBJECT #14374

Angryshark128 opened this issue Jun 6, 2023 · 2 comments · Fixed by #14461
Labels

Comments

@Angryshark128
Copy link

Angryshark128 commented Jun 6, 2023

Affected Version

26.0.0

Description

image

with nested_data as ( SELECT JSON_OBJECT('k1':ARRAY['v1','v2'],'k2':'v3','k3':4) json_arr ) SELECT JSON_VALUE(json_arr,'$.k1') k1, JSON_VALUE(json_arr,'$.k1' RETURNING VARCHAR ARRAY) k1_string_arr, JSON_VALUE(json_arr,'$.k1' RETURNING INTEGER ARRAY) k1_int_arr, JSON_VALUE(json_arr,'$.k2') k2, JSON_VALUE(json_arr,'$.k2' RETURNING VARCHAR ARRAY) k2_string_arr, JSON_VALUE(json_arr,'$.k2' RETURNING INTEGER ARRAY) k2_int_arr, JSON_VALUE(json_arr,'$.k3') k3, JSON_VALUE(json_arr,'$.k3' RETURNING VARCHAR ARRAY) k3_string_arr, JSON_VALUE(json_arr,'$.k3' RETURNING INTEGER ARRAY) k3_int_arr from nested_data

When i want to extract values from a json object ( SQL as above ), it will get some unexpected and weird values.

In order to understand more easily, the object can be equivalent to json as below:
{ "k1": [ "v1", "v2" ], "k2": "v3", "k3": 4 }
image

What i expect is as below:

#1 - JSON_VALUE(json_arr,'$.k1') = ['v1','v2'], actual is empty
#2 - JSON_VALUE(json_arr,'$.k1' RETURNING VARCHAR ARRAY) = ['v1','v2'], actual is null
#3 - JSON_VALUE(json_arr,'$.k2' RETURNING VARCHAR ARRAY) = null, actual is ['v3']
#4 - JSON_VALUE(json_arr,'$.k2' RETURNING INTEGER ARRAY) = null, actual is [null]
#5 - JSON_VALUE(json_arr,'$.k3' RETURNING VARCHAR ARRAY) = null, actual is ['4']
#6 - JSON_VALUE(json_arr,'$.k3' RETURNING INTEGER ARRAY) = null, actual is [4]

I can understand results of #3#4#5#6 might be converted for type implicitly, but what's the cause for #1 and #2?

@clintropolis
Copy link
Member

thanks for the report, I'll have a look. I admit I haven't added a lot of testing yet with JSON_VALUE on 'inline' results created with JSON_OBJECT, so seems plausible that there are some bugs in this area.

@clintropolis
Copy link
Member

clintropolis commented Jun 21, 2023

confirmed a bug with expression based usage of json_value to extract arrays (so ingest time transforms and queries such as in this ticket), it will always return null. #14461 mostly fixes the issue, though the query doesn't quite work correctly as written due to the first expression, but if rewritten to use the undocumented JSON_VALUE_ANY, it works - though still calls the output a VARCHAR instead of VARCHAR ARRAY, which is probably ok since we can handle that due to multi-value strings.

SELECT
JSON_VALUE_ANY(json_arr,'$.k1') k1,
JSON_VALUE(json_arr,'$.k1' RETURNING VARCHAR ARRAY) k1_string_arr,
JSON_VALUE(json_arr,'$.k1' RETURNING INTEGER ARRAY) k1_int_arr,
JSON_VALUE(json_arr,'$.k2') k2,
JSON_VALUE(json_arr,'$.k2' RETURNING VARCHAR ARRAY) k2_string_arr,
JSON_VALUE(json_arr,'$.k2' RETURNING INTEGER ARRAY) k2_int_arr,
JSON_VALUE(json_arr,'$.k3') k3,
JSON_VALUE(json_arr,'$.k3' RETURNING VARCHAR ARRAY) k3_string_arr,
JSON_VALUE(json_arr,'$.k3' RETURNING INTEGER ARRAY) k3_int_arr
from nested_data```

<img width="1060" alt="Screenshot 2023-06-21 at 2 18 26 PM" src="https://github.com/apache/druid/assets/1577461/37456daf-8a6c-4cbf-a836-89f04bfb4f8c">

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants