[Fix](multi-catalog) Fix some undefined behaviors. #37845

Merged
merged 1 commit into from
Jul 16, 2024

kaka11chen
Contributor

Proposed changes

  • Null pointer of type 'doris::StringRef' in the ORC reader. The root cause is that _decode_string_non_dict_encoded_column calls operator[] on an empty std::vector<doris::StringRef> when num_values == 0.
/var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_vector.h:1046:9: runtime error: reference binding to null pointer of type 'doris::StringRef'
    #0 0x562516fa9770 in std::vector<doris::StringRef, std::allocator<doris::StringRef> >::operator[](unsigned long) /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_vector.h:1046:2
    #1 0x562516fa9770 in doris::Status doris::vectorized::OrcReader::_decode_string_non_dict_encoded_column<false>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, COW<doris::vectorized::IColumn>::mutable_ptr<doris::vectorized::IColumn> const&, orc::TypeKind const&, orc::EncodedStringVectorBatch*, unsigned long) /root/doris/be/src/vec/exec/format/orc/vorc_reader.cpp:1164:39
    #2 0x562516f9c08b in doris::Status doris::vectorized::OrcReader::_decode_string_column<false>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, COW<doris::vectorized::IColumn>::mutable_ptr<doris::vectorized::IColumn> const&, orc::TypeKind const&, orc::ColumnVectorBatch*, unsigned long) /root/doris/be/src/vec/exec/format/orc/vorc_reader.cpp:1116:16
    #3 0x562516f91d73 in doris::Status doris::vectorized::OrcReader::_fill_doris_data_column<false>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, COW<doris::vectorized::IColumn>::mutable_ptr<doris::vectorized::IColumn>&, std::shared_ptr<doris::vectorized::IDataType const> const&, orc::Type const*, orc::ColumnVectorBatch*, unsigned long) /root/doris/be/src/vec/exec/format/orc/vorc_reader.cpp:1357:16
    #4 0x562516c79a0c in doris::Status doris::vectorized::OrcReader::_orc_column_to_doris_column<false>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, COW<doris::vectorized::IColumn>::immutable_ptr<doris::vectorized::IColumn>&, std::shared_ptr<doris::vectorized::IDataType const> const&, orc::Type const*, orc::ColumnVectorBatch*, unsigned long) /root/doris/be/src/vec/exec/format/orc/vorc_reader.cpp:1524:5
    #5 0x562516f9339a in doris::Status doris::vectorized::OrcReader::_fill_doris_data_column<false>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, COW<doris::vectorized::IColumn>::mutable_ptr<doris::vectorized::IColumn>&, std::shared_ptr<doris::vectorized::IDataType const> const&, orc::Type const*, orc::ColumnVectorBatch*, unsigned long) /root/doris/be/src/vec/exec/format/orc/vorc_reader.cpp:1402:9
    #6 0x562516c79a0c in doris::Status doris::vectorized::OrcReader::_orc_column_to_doris_column<false>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, COW<doris::vectorized::IColumn>::immutable_ptr<doris::vectorized::IColumn>&, std::shared_ptr<doris::vectorized::IDataType const> const&, orc::Type const*, orc::ColumnVectorBatch*, unsigned long) /root/doris/be/src/vec/exec/format/orc/vorc_reader.cpp:1524:5
...
  • Shift exponent 128 is too large for 128-bit type 'ValueCopyType' (aka '__int128') in the Parquet reader. The root cause is that the error is raised in StringToDecimal::physical_convert when len == 0.
/root/doris/be/src/vec/exec/format/parquet/parquet_column_convert.h:413:27: runtime error: shift exponent 128 is too large for 128-bit type 'ValueCopyType' (aka '__int128')
    #0 0x56251760fbc7 in doris::vectorized::parquet::StringToDecimal<doris::vectorized::Decimal128V3, (doris::vectorized::DecimalScaleParams::ScaleType)1>::physical_convert(COW<doris::vectorized::IColumn>::immutable_ptr<doris::vectorized::IColumn>&, COW<doris::vectorized::IColumn>::immutable_ptr<doris::vectorized::IColumn>&) /root/doris/be/src/vec/exec/format/parquet/parquet_column_convert.h:413:27
    #1 0x562517290dc4 in doris::vectorized::parquet::PhysicalToLogicalConverter::convert(COW<doris::vectorized::IColumn>::immutable_ptr<doris::vectorized::IColumn>&, doris::TypeDescriptor, std::shared_ptr<doris::vectorized::IDataType const> const&, COW<doris::vectorized::IColumn>::immutable_ptr<doris::vectorized::IColumn>&, bool) /root/doris/be/src/vec/exec/format/parquet/parquet_column_convert.h:209:9
    #2 0x562517284a6d in doris::vectorized::ScalarColumnReader::read_column_data(COW<doris::vectorized::IColumn>::immutable_ptr<doris::vectorized::IColumn>&, std::shared_ptr<doris::vectorized::IDataType const>&, doris::vectorized::ColumnSelectVector&, unsigned long, unsigned long*, bool*, bool) /root/doris/be/src/vec/exec/format/parquet/vparquet_column_reader.cpp:569:24
    #3 0x56251725ae7e in doris::vectorized::RowGroupReader::_read_column_data(doris::vectorized::Block*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, unsigned long, unsigned long*, bool*, doris::vectorized::ColumnSelectVector&) /root/doris/be/src/vec/exec/format/parquet/vparquet_group_reader.cpp:421:13
    #4 0x56251724d6d2 in doris::vectorized::RowGroupReader::next_batch(doris::vectorized::Block*, unsigned long, unsigned long*, bool*) /root/doris/be/src/vec/exec/format/parquet/vparquet_group_reader.cpp:321:9
    #5 0x56251708eb97 in doris::vectorized::ParquetReader::get_next_block(doris::vectorized::Block*, unsigned long*, bool*) /root/doris/be/src/vec/exec/format/parquet/vparquet_reader.cpp:530:36
    #6 0x56253036772d in doris::vectorized::VFileScanner::_get_block_wrapped(doris::RuntimeState*, doris::vectorized::Block*, bool*) /root/doris/be/src/vec/exec/scan/vfile_scanner.cpp:311:13
    #7 0x562530366549 in doris::vectorized::VFileScanner::_get_block_impl(doris::RuntimeState*, doris::vectorized::Block*, bool*) /root/doris/be/src/vec/exec/scan/vfile_scanner.cpp:253:17
    #8 0x5625176e79c8 in doris::vectorized::VScanner::get_block(doris::RuntimeState*, doris::vectorized::Block*, bool*) /root/doris/be/src/vec/exec/scan/vscanner.cpp:117:17
    #9 0x5625176e6fc1 in doris::vectorized::VScanner::get_block_after_projects(doris::RuntimeState*, doris::vectorized::Block*, bool*) /root/doris/be/src/vec/exec/scan/vscanner.cpp:84:12
    #10 0x562517698047 in doris::vectorized::ScannerScheduler::_scanner_scan(std::shared_ptr<doris::vectorized::ScannerContext>, std::shared_ptr<doris::vectorized::ScanTask>) /root/doris/be/src/vec/exec/scan/scanner_scheduler.cpp:250:5
    #11 0x56251769bc1f in doris::vectorized::ScannerScheduler::submit(std::shared_ptr<doris::vectorized::ScannerContext>, std::shared_ptr<doris::vectorized::ScanTask>)::$_1::operator()() const::'lambda'()::operator()() const::'lambda'()::operator()() const /root/doris/be/src/vec/exec/scan/scanner_scheduler.cpp:172:25
    #12 0x56251769bc1f in doris::vectorized::ScannerScheduler::submit(std::shared_ptr<doris::vectorized::ScannerContext>, std::shared_ptr<doris::vectorized::ScanTask>)::$_1::operator()() const::'lambda'()::operator()() const /root/doris/be/src/vec/exec/scan/scanner_scheduler.cpp:171:35
...
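
Both reports come from an empty-input edge case, so the kind of guard involved can be sketched as follows. This is a minimal illustration under stated assumptions, not the code merged in this PR: `StringSink`, `append_all`, and `decode_big_endian` are hypothetical names, and the decode is narrowed to `int64_t` instead of `__int128` so that it builds without compiler extensions (the analogous hazard there is a shift by 64 bits).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <vector>

// Hypothetical stand-in for a string column: records how much it received.
struct StringSink {
    std::size_t rows = 0;
    std::size_t bytes = 0;

    void insert_many(const std::string_view* values, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            bytes += values[i].size();
        }
        rows += n;
    }
};

// Appends every value to the sink. The early return avoids evaluating
// &values[0] on an empty vector, which binds a reference to a null element.
inline void append_all(StringSink& sink, const std::vector<std::string_view>& values) {
    if (values.empty()) {
        return;
    }
    sink.insert_many(&values[0], values.size());
}

// Decodes `len` big-endian two's-complement bytes (0 <= len <= 8).
// Without the len == 0 guard the shift count below would be 64, which is
// undefined for a 64-bit type.
inline std::int64_t decode_big_endian(const unsigned char* buf, std::size_t len) {
    assert(len <= sizeof(std::uint64_t));
    if (len == 0) {
        return 0;
    }
    std::uint64_t bits = 0;
    for (std::size_t i = 0; i < len; ++i) {
        bits = (bits << 8) | buf[i];
    }
    // Move the value to the top bytes, then shift back to sign-extend.
    // Right-shifting a negative value is arithmetic on mainstream compilers
    // and is guaranteed to be arithmetic from C++20 onward.
    const unsigned shift = static_cast<unsigned>((sizeof(bits) - len) * 8);
    return static_cast<std::int64_t>(bits << shift) >> shift;
}
```

With these guards, an empty batch leaves the sink untouched and a zero-length value decodes to 0. Passing `values.data()` instead of `&values[0]` would also avoid the first problem, since `data()` is valid to call on an empty vector.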

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@kaka11chen
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 39749 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 64c45ec5518333db9a91025619eeb5be46e08b9e, data reload: false

------ Round 1 ----------------------------------
q1	17624	4548	4280	4280
q2	2012	191	200	191
q3	10456	1227	1086	1086
q4	10201	777	750	750
q5	7635	2677	2679	2677
q6	223	140	136	136
q7	957	589	586	586
q8	9225	2065	2090	2065
q9	8863	6556	6540	6540
q10	8726	3812	3781	3781
q11	466	226	249	226
q12	464	233	228	228
q13	17774	2982	2991	2982
q14	282	233	241	233
q15	520	484	490	484
q16	498	394	377	377
q17	968	707	726	707
q18	7999	7545	7386	7386
q19	7858	1317	1292	1292
q20	691	316	322	316
q21	4936	3149	3316	3149
q22	350	277	281	277
Total cold run time: 118728 ms
Total hot run time: 39749 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4405	4280	4263	4263
q2	378	260	251	251
q3	3019	2908	2908	2908
q4	1963	1670	1716	1670
q5	5605	5521	5436	5436
q6	228	131	135	131
q7	2236	1862	1858	1858
q8	3260	3429	3413	3413
q9	8781	8829	8858	8829
q10	4139	3852	3752	3752
q11	595	488	489	488
q12	790	636	634	634
q13	16340	3161	3173	3161
q14	310	298	283	283
q15	547	493	489	489
q16	488	439	440	439
q17	1823	1549	1517	1517
q18	8102	7955	7832	7832
q19	1734	1629	1576	1576
q20	2080	1862	1875	1862
q21	5139	4725	4651	4651
q22	647	510	500	500
Total cold run time: 72609 ms
Total hot run time: 55943 ms

@doris-robot

TPC-DS: Total hot run time: 173292 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 64c45ec5518333db9a91025619eeb5be46e08b9e, data reload: false

query1	911	365	358	358
query2	6446	1907	1830	1830
query3	6646	207	215	207
query4	28421	17590	17388	17388
query5	3686	474	478	474
query6	249	174	158	158
query7	4583	299	298	298
query8	236	197	200	197
query9	8539	2442	2418	2418
query10	443	286	271	271
query11	11713	10006	9934	9934
query12	120	86	85	85
query13	1639	380	370	370
query14	9986	7717	6989	6989
query15	223	166	169	166
query16	7762	337	318	318
query17	1778	575	533	533
query18	1972	280	282	280
query19	198	156	151	151
query20	89	85	82	82
query21	208	130	127	127
query22	4389	4128	4086	4086
query23	34070	34121	33627	33627
query24	11131	2953	2917	2917
query25	621	449	444	444
query26	1116	158	155	155
query27	2256	286	279	279
query28	7105	2091	2084	2084
query29	920	679	654	654
query30	257	158	155	155
query31	971	766	763	763
query32	94	59	58	58
query33	788	340	319	319
query34	935	494	524	494
query35	700	621	609	609
query36	1153	996	976	976
query37	143	85	88	85
query38	2938	2829	2871	2829
query39	918	814	854	814
query40	215	123	119	119
query41	49	51	48	48
query42	120	105	105	105
query43	516	471	477	471
query44	1224	736	738	736
query45	296	158	158	158
query46	1076	751	735	735
query47	1873	1781	1749	1749
query48	381	308	301	301
query49	821	397	412	397
query50	766	396	414	396
query51	6920	6837	6819	6819
query52	102	95	93	93
query53	355	298	290	290
query54	871	451	450	450
query55	76	74	74	74
query56	294	264	265	264
query57	1092	1040	1007	1007
query58	245	250	259	250
query59	2796	2734	2643	2643
query60	297	275	282	275
query61	99	92	94	92
query62	779	640	637	637
query63	315	288	287	287
query64	9440	2239	1666	1666
query65	3155	3145	3104	3104
query66	676	326	321	321
query67	15774	14925	15027	14925
query68	8644	563	559	559
query69	740	496	416	416
query70	1360	1137	1063	1063
query71	517	279	274	274
query72	9146	5734	5317	5317
query73	2206	330	330	330
query74	6287	5640	5702	5640
query75	5309	2670	2725	2670
query76	5394	973	942	942
query77	774	302	302	302
query78	11005	10763	9220	9220
query79	11820	542	522	522
query80	1470	556	471	471
query81	583	223	213	213
query82	517	138	127	127
query83	354	165	171	165
query84	272	85	87	85
query85	1018	310	301	301
query86	413	313	312	312
query87	3352	3099	3093	3093
query88	5235	2391	2391	2391
query89	517	397	387	387
query90	2304	197	199	197
query91	131	101	97	97
query92	65	49	49	49
query93	5183	533	508	508
query94	1450	212	210	210
query95	410	324	331	324
query96	609	280	269	269
query97	3172	3012	3058	3012
query98	223	195	189	189
query99	1517	1243	1285	1243
Total cold run time: 308973 ms
Total hot run time: 173292 ms

@doris-robot

ClickBench: Total hot run time: 31.26 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 64c45ec5518333db9a91025619eeb5be46e08b9e, data reload: false

query1	0.04	0.03	0.03
query2	0.08	0.03	0.04
query3	0.21	0.04	0.05
query4	1.69	0.06	0.06
query5	0.51	0.48	0.48
query6	1.14	0.72	0.73
query7	0.02	0.01	0.01
query8	0.05	0.04	0.04
query9	0.55	0.50	0.50
query10	0.53	0.54	0.53
query11	0.15	0.11	0.12
query12	0.15	0.12	0.12
query13	0.58	0.59	0.59
query14	0.76	0.76	0.78
query15	0.86	0.81	0.83
query16	0.37	0.37	0.37
query17	1.00	0.97	0.96
query18	0.22	0.21	0.21
query19	1.77	1.69	1.73
query20	0.01	0.01	0.01
query21	15.39	0.75	0.66
query22	4.20	6.92	2.57
query23	18.31	1.53	1.27
query24	2.06	0.24	0.23
query25	0.15	0.09	0.09
query26	0.29	0.22	0.21
query27	0.46	0.23	0.23
query28	13.26	1.01	1.00
query29	12.69	3.39	3.36
query30	0.25	0.07	0.05
query31	2.85	0.39	0.39
query32	3.28	0.47	0.47
query33	2.94	2.91	2.90
query34	17.03	4.33	4.34
query35	4.45	4.39	4.42
query36	0.66	0.47	0.47
query37	0.18	0.15	0.16
query38	0.17	0.15	0.15
query39	0.04	0.04	0.03
query40	0.15	0.13	0.12
query41	0.09	0.05	0.04
query42	0.06	0.05	0.04
query43	0.04	0.04	0.05
Total cold run time: 109.69 s
Total hot run time: 31.26 s

Contributor

@zclllyybb left a comment

LGTM

Contributor

PR approved by anyone and no changes requested.

Contributor

@morningman left a comment

LGTM

@github-actions bot added the "approved" label (Indicates a PR has been approved by one committer.) on Jul 16, 2024
Contributor

PR approved by at least one committer and no changes requested.

@morningman morningman merged commit eb17b2e into apache:master Jul 16, 2024
27 of 30 checks passed
seawinde pushed a commit to seawinde/doris that referenced this pull request Jul 17, 2024
dataroaring pushed a commit that referenced this pull request Jul 17, 2024
kaka11chen added a commit to kaka11chen/doris that referenced this pull request Jul 24, 2024
## Proposed changes

- Null pointer of type 'doris::StringRef' in the ORC reader. The root cause
is that `_decode_string_non_dict_encoded_column` calls `operator[]` on an
empty `std::vector<doris::StringRef>` when `num_values == 0`.
```
/var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_vector.h:1046:9: runtime error: reference binding to null pointer of type 'doris::StringRef'
    #0 0x562516fa9770 in std::vector<doris::StringRef, std::allocator<doris::StringRef> >::operator[](unsigned long) /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_vector.h:1046:2
    #1 0x562516fa9770 in doris::Status doris::vectorized::OrcReader::_decode_string_non_dict_encoded_column<false>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, COW<doris::vectorized::IColumn>::mutable_ptr<doris::vectorized::IColumn> const&, orc::TypeKind const&, orc::EncodedStringVectorBatch*, unsigned long) /root/doris/be/src/vec/exec/format/orc/vorc_reader.cpp:1164:39
    #2 0x562516f9c08b in doris::Status doris::vectorized::OrcReader::_decode_string_column<false>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, COW<doris::vectorized::IColumn>::mutable_ptr<doris::vectorized::IColumn> const&, orc::TypeKind const&, orc::ColumnVectorBatch*, unsigned long) /root/doris/be/src/vec/exec/format/orc/vorc_reader.cpp:1116:16
    #3 0x562516f91d73 in doris::Status doris::vectorized::OrcReader::_fill_doris_data_column<false>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, COW<doris::vectorized::IColumn>::mutable_ptr<doris::vectorized::IColumn>&, std::shared_ptr<doris::vectorized::IDataType const> const&, orc::Type const*, orc::ColumnVectorBatch*, unsigned long) /root/doris/be/src/vec/exec/format/orc/vorc_reader.cpp:1357:16
    #4 0x562516c79a0c in doris::Status doris::vectorized::OrcReader::_orc_column_to_doris_column<false>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, COW<doris::vectorized::IColumn>::immutable_ptr<doris::vectorized::IColumn>&, std::shared_ptr<doris::vectorized::IDataType const> const&, orc::Type const*, orc::ColumnVectorBatch*, unsigned long) /root/doris/be/src/vec/exec/format/orc/vorc_reader.cpp:1524:5
    #5 0x562516f9339a in doris::Status doris::vectorized::OrcReader::_fill_doris_data_column<false>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, COW<doris::vectorized::IColumn>::mutable_ptr<doris::vectorized::IColumn>&, std::shared_ptr<doris::vectorized::IDataType const> const&, orc::Type const*, orc::ColumnVectorBatch*, unsigned long) /root/doris/be/src/vec/exec/format/orc/vorc_reader.cpp:1402:9
    #6 0x562516c79a0c in doris::Status doris::vectorized::OrcReader::_orc_column_to_doris_column<false>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, COW<doris::vectorized::IColumn>::immutable_ptr<doris::vectorized::IColumn>&, std::shared_ptr<doris::vectorized::IDataType const> const&, orc::Type const*, orc::ColumnVectorBatch*, unsigned long) /root/doris/be/src/vec/exec/format/orc/vorc_reader.cpp:1524:5
...
```
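
The trace above ends in `std::vector::operator[]` being called while the vector is empty, which is what happens when `num_values == 0`. A minimal sketch of that pattern and one possible guard follows. The `StringRef` struct is a simplified stand-in, and `total_bytes` and `decode_strings` are hypothetical names used only for illustration; none of this is the actual Doris code.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Simplified stand-in for doris::StringRef (assumption, not the real type).
struct StringRef {
    const char* data;
    std::size_t size;
};

// Hypothetical sink: sums the sizes of `n` refs starting at `refs`.
inline std::size_t total_bytes(const StringRef* refs, std::size_t n) {
    std::size_t total = 0;
    for (std::size_t i = 0; i < n; ++i) total += refs[i].size;
    return total;
}

// Hypothetical decode step: build one StringRef per value, then pass the
// array on as a raw pointer plus a count.
inline std::size_t decode_strings(const std::vector<std::string>& values) {
    std::vector<StringRef> refs;
    refs.reserve(values.size());
    for (const auto& v : values) refs.push_back(StringRef{v.data(), v.size()});

    // Guard: `&refs[0]` on an empty vector binds a reference to a
    // non-existent element, which UBSan reports. Return early instead
    // (refs.data() would also be valid for an empty vector).
    if (refs.empty()) return 0;
    return total_bytes(&refs[0], refs.size());
}
```

Calling `decode_strings` with an empty input takes the early return and never evaluates `refs[0]`. Using `refs.data()` instead of `&refs[0]` would be an alternative that needs no branch.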

- Shift exponent 128 is too large for 128-bit type 'ValueCopyType' (aka
'__int128') in the Parquet reader. The root cause is that the error is
triggered when `len == 0`.
```
/root/doris/be/src/vec/exec/format/parquet/parquet_column_convert.h:413:27: runtime error: shift exponent 128 is too large for 128-bit type 'ValueCopyType' (aka '__int128')
    #0 0x56251760fbc7 in doris::vectorized::parquet::StringToDecimal<doris::vectorized::Decimal128V3, (doris::vectorized::DecimalScaleParams::ScaleType)1>::physical_convert(COW<doris::vectorized::IColumn>::immutable_ptr<doris::vectorized::IColumn>&, COW<doris::vectorized::IColumn>::immutable_ptr<doris::vectorized::IColumn>&) /root/doris/be/src/vec/exec/format/parquet/parquet_column_convert.h:413:27
    #1 0x562517290dc4 in doris::vectorized::parquet::PhysicalToLogicalConverter::convert(COW<doris::vectorized::IColumn>::immutable_ptr<doris::vectorized::IColumn>&, doris::TypeDescriptor, std::shared_ptr<doris::vectorized::IDataType const> const&, COW<doris::vectorized::IColumn>::immutable_ptr<doris::vectorized::IColumn>&, bool) /root/doris/be/src/vec/exec/format/parquet/parquet_column_convert.h:209:9
    #2 0x562517284a6d in doris::vectorized::ScalarColumnReader::read_column_data(COW<doris::vectorized::IColumn>::immutable_ptr<doris::vectorized::IColumn>&, std::shared_ptr<doris::vectorized::IDataType const>&, doris::vectorized::ColumnSelectVector&, unsigned long, unsigned long*, bool*, bool) /root/doris/be/src/vec/exec/format/parquet/vparquet_column_reader.cpp:569:24
    #3 0x56251725ae7e in doris::vectorized::RowGroupReader::_read_column_data(doris::vectorized::Block*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, unsigned long, unsigned long*, bool*, doris::vectorized::ColumnSelectVector&) /root/doris/be/src/vec/exec/format/parquet/vparquet_group_reader.cpp:421:13
    #4 0x56251724d6d2 in doris::vectorized::RowGroupReader::next_batch(doris::vectorized::Block*, unsigned long, unsigned long*, bool*) /root/doris/be/src/vec/exec/format/parquet/vparquet_group_reader.cpp:321:9
    #5 0x56251708eb97 in doris::vectorized::ParquetReader::get_next_block(doris::vectorized::Block*, unsigned long*, bool*) /root/doris/be/src/vec/exec/format/parquet/vparquet_reader.cpp:530:36
    #6 0x56253036772d in doris::vectorized::VFileScanner::_get_block_wrapped(doris::RuntimeState*, doris::vectorized::Block*, bool*) /root/doris/be/src/vec/exec/scan/vfile_scanner.cpp:311:13
    #7 0x562530366549 in doris::vectorized::VFileScanner::_get_block_impl(doris::RuntimeState*, doris::vectorized::Block*, bool*) /root/doris/be/src/vec/exec/scan/vfile_scanner.cpp:253:17
    #8 0x5625176e79c8 in doris::vectorized::VScanner::get_block(doris::RuntimeState*, doris::vectorized::Block*, bool*) /root/doris/be/src/vec/exec/scan/vscanner.cpp:117:17
    #9 0x5625176e6fc1 in doris::vectorized::VScanner::get_block_after_projects(doris::RuntimeState*, doris::vectorized::Block*, bool*) /root/doris/be/src/vec/exec/scan/vscanner.cpp:84:12
    #10 0x562517698047 in doris::vectorized::ScannerScheduler::_scanner_scan(std::shared_ptr<doris::vectorized::ScannerContext>, std::shared_ptr<doris::vectorized::ScanTask>) /root/doris/be/src/vec/exec/scan/scanner_scheduler.cpp:250:5
    #11 0x56251769bc1f in doris::vectorized::ScannerScheduler::submit(std::shared_ptr<doris::vectorized::ScannerContext>, std::shared_ptr<doris::vectorized::ScanTask>)::$_1::operator()() const::'lambda'()::operator()() const::'lambda'()::operator()() const /root/doris/be/src/vec/exec/scan/scanner_scheduler.cpp:172:25
    #12 0x56251769bc1f in doris::vectorized::ScannerScheduler::submit(std::shared_ptr<doris::vectorized::ScannerContext>, std::shared_ptr<doris::vectorized::ScanTask>)::$_1::operator()() const::'lambda'()::operator()() const /root/doris/be/src/vec/exec/scan/scanner_scheduler.cpp:171:35
...
```
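
A shift by the full bit width of a type is undefined behavior in C++, which matches the "shift exponent 128" report for a 128-bit value when `len == 0`. The sketch below illustrates the same class of bug and a guard. It assumes the shift amount is derived from `sizeof(value) - len`, which is inferred from the error message rather than taken from the Doris source, and it uses a 64-bit value for portability. `decode_big_endian` is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>

// Decode a big-endian two's-complement integer stored in `len` bytes.
// Precondition: len <= 8. Hypothetical helper, not the Doris implementation.
inline std::int64_t decode_big_endian(const unsigned char* buf, std::size_t len) {
    // Guard: with len == 0 the shift below would be by 64 bits, the full
    // width of the type, which is undefined behavior.
    if (len == 0) return 0;

    // Accumulate the bytes, most significant first.
    std::uint64_t raw = 0;
    for (std::size_t i = 0; i < len; ++i) raw = (raw << 8) | buf[i];

    // Move the value to the top of the word, then shift back as a signed
    // value to sign-extend. The shift amount is now in the range 0..56.
    // (Before C++20 the signed conversion and right shift are
    // implementation-defined; mainstream compilers sign-extend.)
    const unsigned shift = static_cast<unsigned>((sizeof(raw) - len) * 8);
    return static_cast<std::int64_t>(raw << shift) >> shift;
}
```

The early return treats an empty byte string as zero. Whether zero, a null value, or an error is the right result for an empty decimal string depends on the reader's semantics, so that choice is also an assumption here.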
morningman pushed a commit that referenced this pull request Jul 24, 2024
morningman pushed a commit that referenced this pull request Sep 18, 2024
## Proposed changes

### Issue
```
/var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_vector.h:1046:9: runtime error: reference binding to null pointer of type 'doris::StringRef'
    #0 0x55ee63eb0418 in std::vector<doris::StringRef, std::allocator<doris::StringRef>>::operator[](unsigned long) /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_vector.h:1046:2
    #1 0x55ee63eb0418 in doris::Status doris::vectorized::OrcReader::_decode_string_non_dict_encoded_column<false>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const&, COW<doris::vectorized::IColumn>::mutable_ptr<doris::vectorized::IColumn> const&, orc::TypeKind const&, orc::EncodedStringVectorBatch*, unsigned long) /home/zcp/repo_center/doris_master/doris/be/src/vec/exec/format/orc/vorc_reader.cpp:1172:39
    #2 0x55ee63ea2685 in doris::Status doris::vectorized::OrcReader::_decode_string_column<false>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const&, COW<doris::vectorized::IColumn>::mutable_ptr<doris::vectorized::IColumn> const&, orc::TypeKind const&, orc::ColumnVectorBatch*, unsigned long) /home/zcp/repo_center/doris_master/doris/be/src/vec/exec/format/orc/vorc_reader.cpp:1124:16
    #3 0x55ee63e97e7a in doris::Status doris::vectorized::OrcReader::_fill_doris_data_column<false>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const&, COW<doris::vectorized::IColumn>::mutable_ptr<doris::vectorized::IColumn>&, std::shared_ptr<doris::vectorized::IDataType const> const&, orc::Type const*, orc::ColumnVectorBatch*, unsigned long) /home/zcp/repo_center/doris_master/doris/be/src/vec/exec/format/orc/vorc_reader.cpp:1365:16
    #4 0x55ee63b0e450 in doris::Status doris::vectorized::OrcReader::_orc_column_to_doris_column<false>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const&, COW<doris::vectorized::IColumn>::immutable_ptr<doris::vectorized::IColumn>&, std::shared_ptr<doris::vectorized::IDataType const> const&, orc::Type const*, orc::ColumnVectorBatch*, unsigned long) /home/zcp/repo_center/doris_master/doris/be/src/vec/exec/format/orc/vorc_reader.cpp:1532:5
    #5 0x55ee63e99622 in doris::Status doris::vectorized::OrcReader::_fill_doris_data_column<false>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const&, COW<doris::vectorized::IColumn>::mutable_ptr<doris::vectorized::IColumn>&, std::shared_ptr<doris::vectorized::IDataType const> const&, orc::Type const*, orc::ColumnVectorBatch*, unsigned long) /home/zcp/repo_center/doris_master/doris/be/src/vec/exec/format/orc/vorc_reader.cpp:1410:9
    #6 0x55ee63b0e450 in doris::Status doris::vectorized::OrcReader::_orc_column_to_doris_column<false>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const&, COW<doris::vectorized::IColumn>::immutable_ptr<doris::vectorized::IColumn>&, std::shared_ptr<doris::vectorized::IDataType const> const&, orc::Type const*, orc::ColumnVectorBatch*, unsigned long) /home/zcp/repo_center/doris_master/doris/be/src/vec/exec/format/orc/vorc_reader.cpp:1532:5
    #7 0x55ee63ad4f86 in doris::vectorized::OrcReader::get_next_block_impl(doris::vectorized::Block*, unsigned long*, bool*) /home/zcp/repo_center/doris_master/doris/be/src/vec/exec/format/orc/vorc_reader.cpp:1714:13
    #8 0x55ee63ad093b in doris::vectorized::OrcReader::get_next_block(doris::vectorized::Block*, unsigned long*, bool*) /home/zcp/repo_center/doris_master/doris/be/src/vec/exec/format/orc/vorc_reader.cpp:1547:5
```
### Solution
[Fix] (orc-reader) Fix StringRef nullptr data in orc-reader. When a string
is empty in an ORC row batch, its data pointer can point to anything,
possibly nullptr, and StringRef has undefined behavior when data is nullptr.

Related to #37845.
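
One way to avoid a `StringRef` with null data is to normalize empty strings before constructing it. The following is a minimal sketch of that idea only. `StringRef` is a simplified stand-in and `make_string_ref` is a hypothetical helper; the actual change in the ORC reader may look different.

```cpp
#include <cstddef>

// Simplified stand-in for doris::StringRef (assumption, not the real type).
struct StringRef {
    const char* data;
    std::size_t size;
};

// Hypothetical helper: build a StringRef from a (pointer, length) pair read
// from a row batch. For empty strings the incoming pointer is ignored,
// because it may be nullptr or otherwise meaningless.
inline StringRef make_string_ref(const char* data, std::size_t length) {
    static const char kEmpty[] = "";  // valid, non-null address for empty strings
    if (length == 0) return StringRef{kEmpty, 0};
    return StringRef{data, length};
}
```

With this normalization, code that later passes `data` to routines such as `memcpy` or `memcmp` never receives a null pointer, even for zero-length strings.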
kaka11chen added a commit to kaka11chen/doris that referenced this pull request Sep 20, 2024
kaka11chen added a commit to kaka11chen/doris that referenced this pull request Sep 25, 2024
Labels
approved Indicates a PR has been approved by one committer. dev/2.1.6-merged dev/3.0.1-merged reviewed