
[fix](move-memtable) do not execute close if create rowset failed when loading MOW table #40105

Merged
merged 1 commit into from
Sep 2, 2024


sollhui
Contributor

@sollhui sollhui commented Aug 29, 2024

A core dump occurred when loading data into a MOW (merge-on-write) table:

```
Check failure stack trace: ***
@ 0x55fae437d246 google::LogMessage::SendToLog()
@ 0x55fae4379c90 google::LogMessage::Flush()
@ 0x55fae437da89 google::LogMessageFatal::~LogMessageFatal()
@ 0x55faacf26bbf doris::BaseTablet::check_delete_bitmap_correctness()
@ 0x55fab05049ef doris::RowsetBuilder::commit_txn()
@ 0x55fab09026e8 doris::LoadStreamWriter::close()
@ 0x55fab089eff7 std::_Function_handler<>::_M_invoke()
@ 0x55fab0d14d7c doris::WorkThreadPool<>::work_thread()
@ 0x55fae76ae6f0 execute_native_thread_routine
@ 0x7fa32ea45ac3 (unknown)
@ 0x7fa32ead7850 (unknown)
@ (nil) (unknown)
Query id: a21981d5c8ef4113-84df9a5a8680e004 ***
is nereids: 0 ***
tablet id: 0 ***
Aborted at 1724668499 (unix time) try "date -d @1724668499" if you are using GNU date ***
Current BE git commitID: 2f848737c1 ***
SIGABRT unknown detail explain (@0x20db) received by PID 8411 (TID 9837 OR 0x7f9e42cfe640) from PID 8411; stack trace: ***
0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /home/zcp/repo_center/doris_master/doris/be/src/common/signal_handler.h:421
1# 0x00007FA32E9F3520 in /lib/x86_64-linux-gnu/libc.so.6
2# pthread_kill at ./nptl/pthread_kill.c:89
3# raise at ../sysdeps/posix/raise.c:27
4# abort at ./stdlib/abort.c:81
5# 0x000055FAE4387B1D in /mnt/hdd01/ci/master-deploy/be/lib/doris_be
6# 0x000055FAE437A15A in /mnt/hdd01/ci/master-deploy/be/lib/doris_be
7# google::LogMessage::SendToLog() in /mnt/hdd01/ci/master-deploy/be/lib/doris_be
8# google::LogMessage::Flush() in /mnt/hdd01/ci/master-deploy/be/lib/doris_be
9# google::LogMessageFatal::~LogMessageFatal() in /mnt/hdd01/ci/master-deploy/be/lib/doris_be
10# doris::BaseTablet::check_delete_bitmap_correctness(std::shared_ptr, long, long, std::unordered_set, std::equal_to, std::allocator > const&, std::vector, std::allocator > >*) at /home/zcp/repo_center/doris_master/doris/be/src/olap/base_tablet.cpp:1152
11# doris::RowsetBuilder::commit_txn() at /home/zcp/repo_center/doris_master/doris/be/src/olap/rowset_builder.cpp:316
12# doris::LoadStreamWriter::close() at /home/zcp/repo_center/doris_master/doris/be/src/runtime/load_stream_writer.cpp:311
13# std::_Function_handler::_M_invoke(std::_Any_data const&) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:291
14# doris::WorkThreadPool::work_thread(int) at /home/zcp/repo_center/doris_master/doris/be/src/util/work_thread_pool.hpp:159
15# execute_native_thread_routine at ../../../../../libstdc++-v3/src/c++11/thread.cc:84
16# start_thread at ./nptl/pthread_create.c:442
17# 0x00007FA32EAD7850 at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:83
```

Even when creating the rowset failed, `calc_delete_bitmap_task` could still be executed:

```
add segment failed load_id=5649413b98976f0d-a105b42749f561b0, txn_id=2, tablet_id=10088, status=[INTERNAL_ERROR]create rowset failed
...
submit calc delete bitmap task to executor, tablet_id: 10088, txn_id: 2
```

To solve this, the PR skips `close` when creating the rowset failed while loading a MOW table. As a result, `submit_calc_delete_bitmap_task` is never reached.
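The following is a minimal sketch of the guard. It assumes a heavily simplified writer. `MowTabletWriter`, `run_load` and this `Status` class are hypothetical and are not Doris' actual classes. Only the name `submit_calc_delete_bitmap_task` comes from the log above. The writer records the rowset creation failure, and `close()` returns that error early, so the delete bitmap task is never submitted.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical, simplified stand-in for Doris' Status type.
class Status {
public:
    static Status OK() { return Status(true, ""); }
    static Status InternalError(std::string msg) { return Status(false, std::move(msg)); }
    bool ok() const { return _ok; }
    const std::string& msg() const { return _msg; }

private:
    Status(bool ok, std::string msg) : _ok(ok), _msg(std::move(msg)) {}
    bool _ok;
    std::string _msg;
};

// Hypothetical writer for one tablet of a MOW table. It only models the
// ordering problem: close() used to run even after rowset creation failed.
class MowTabletWriter {
public:
    explicit MowTabletWriter(bool rowset_creation_fails)
        : _rowset_creation_fails(rowset_creation_fails) {}

    Status add_segment() {
        if (_rowset_creation_fails) {
            // Remember the failure so that close() can see it later.
            _status = Status::InternalError("create rowset failed");
        }
        return _status;
    }

    Status close() {
        assert(!_closed && "close() must be called at most once");
        _closed = true;
        // Guard: skip the rest of close() (submitting the delete bitmap task
        // and committing the txn) if an earlier step already failed.
        if (!_status.ok()) {
            return _status;
        }
        submit_calc_delete_bitmap_task();
        return Status::OK();
    }

    int submitted_tasks() const { return _submitted_tasks; }

private:
    // Stand-in for the real submission; here it only counts calls.
    void submit_calc_delete_bitmap_task() { ++_submitted_tasks; }

    bool _rowset_creation_fails;
    bool _closed = false;
    Status _status = Status::OK();
    int _submitted_tasks = 0;
};

// Drives one load and reports how many delete bitmap tasks were submitted.
int run_load(bool rowset_creation_fails) {
    MowTabletWriter writer(rowset_creation_fails);
    writer.add_segment();  // the error is also recorded inside the writer
    writer.close();
    return writer.submitted_tasks();
}
```

With the guard, `run_load(true)` submits no task. `run_load(false)` submits exactly one. Without the early return in `close()`, both calls would submit a task, which mirrors the log sequence shown above.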

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@sollhui
Contributor Author

sollhui commented Aug 29, 2024

run buildall

Contributor

@kaijchen kaijchen left a comment

LGTM

Contributor

PR approved by anyone and no changes requested.

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TPC-H: Total hot run time: 38543 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 9a0a2e61100ee6e9e7c511dd17de48e5108c77cb, data reload: false

------ Round 1 ----------------------------------
q1	18024	4488	4371	4371
q2	3049	180	181	180
q3	10989	1144	1064	1064
q4	10726	761	787	761
q5	8332	2887	2872	2872
q6	229	142	149	142
q7	975	627	614	614
q8	9343	2070	2103	2070
q9	7170	6543	6596	6543
q10	7004	2224	2177	2177
q11	449	249	250	249
q12	399	237	234	234
q13	17774	3036	3053	3036
q14	298	232	243	232
q15	527	482	483	482
q16	583	501	524	501
q17	1007	787	725	725
q18	7445	6794	6944	6794
q19	1388	1002	1018	1002
q20	704	349	340	340
q21	4001	3154	3128	3128
q22	1120	1039	1026	1026
Total cold run time: 111536 ms
Total hot run time: 38543 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4368	4272	4253	4253
q2	387	276	260	260
q3	2883	2686	2687	2686
q4	1973	1658	1669	1658
q5	5449	5391	5428	5391
q6	215	135	135	135
q7	2124	1760	1785	1760
q8	3208	3367	3365	3365
q9	8480	8442	8504	8442
q10	3440	3187	3146	3146
q11	608	503	506	503
q12	804	649	632	632
q13	10499	3078	3062	3062
q14	321	275	276	275
q15	542	502	484	484
q16	655	567	580	567
q17	1784	1478	1494	1478
q18	7837	7392	7348	7348
q19	1669	1637	1605	1605
q20	2050	1829	1820	1820
q21	5577	5262	5259	5259
q22	1130	1030	1039	1030
Total cold run time: 66003 ms
Total hot run time: 55159 ms
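The bot's columns are unlabeled. Judging from the numbers, the first column appears to be the cold run and the next two the hot runs. The last column appears to be the faster of the two hot runs. Under that reading, the Round 1 rows of the first TPC-H report sum exactly to its reported totals (111536 ms cold, 38543 ms hot). The small check below tests that inference. `Row`, `Totals`, `compute_totals` and `kRound1` are hypothetical names, and the column meanings are an assumption, not documented by the bot.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One row of the report (ms): cold run, hot run 1, hot run 2, best hot run.
// The column meanings are inferred from the totals, not documented by the bot.
struct Row {
    long cold, hot1, hot2, best;
};

struct Totals {
    long cold = 0;
    long hot = 0;
};

// Sums the first column for the cold total and min(hot1, hot2) for the hot total.
Totals compute_totals(const std::vector<Row>& rows) {
    Totals t;
    for (const Row& r : rows) {
        long best_hot = std::min(r.hot1, r.hot2);
        assert(best_hot == r.best);  // the last column is the faster hot run
        t.cold += r.cold;
        t.hot += best_hot;
    }
    return t;
}

// Round 1 rows (q1..q22) copied from the first TPC-H report.
const std::vector<Row> kRound1 = {
    {18024, 4488, 4371, 4371}, {3049, 180, 181, 180},
    {10989, 1144, 1064, 1064}, {10726, 761, 787, 761},
    {8332, 2887, 2872, 2872},  {229, 142, 149, 142},
    {975, 627, 614, 614},      {9343, 2070, 2103, 2070},
    {7170, 6543, 6596, 6543},  {7004, 2224, 2177, 2177},
    {449, 249, 250, 249},      {399, 237, 234, 234},
    {17774, 3036, 3053, 3036}, {298, 232, 243, 232},
    {527, 482, 483, 482},      {583, 501, 524, 501},
    {1007, 787, 725, 725},     {7445, 6794, 6944, 6794},
    {1388, 1002, 1018, 1002},  {704, 349, 340, 340},
    {4001, 3154, 3128, 3128},  {1120, 1039, 1026, 1026},
};
```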

@doris-robot

TPC-DS: Total hot run time: 188357 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 9a0a2e61100ee6e9e7c511dd17de48e5108c77cb, data reload: false

query1	909	370	358	358
query2	6446	1936	1886	1886
query3	6656	216	216	216
query4	34051	23055	23283	23055
query5	4169	512	482	482
query6	251	166	176	166
query7	4572	303	291	291
query8	250	210	204	204
query9	8711	2487	2492	2487
query10	429	262	280	262
query11	16732	15080	15192	15080
query12	148	101	96	96
query13	1645	386	373	373
query14	8939	7375	6483	6483
query15	223	179	182	179
query16	7981	489	456	456
query17	1598	582	586	582
query18	2027	300	291	291
query19	197	151	150	150
query20	122	114	113	113
query21	220	105	108	105
query22	4591	4232	4208	4208
query23	34315	33606	33290	33290
query24	11178	2887	2872	2872
query25	651	415	404	404
query26	1227	163	165	163
query27	2527	287	283	283
query28	7427	2147	2128	2128
query29	848	423	424	423
query30	312	165	155	155
query31	1008	775	794	775
query32	101	58	58	58
query33	759	306	304	304
query34	969	496	495	495
query35	850	751	743	743
query36	1110	935	952	935
query37	166	99	101	99
query38	4040	3827	3893	3827
query39	1449	1410	1393	1393
query40	204	124	123	123
query41	49	49	49	49
query42	122	96	97	96
query43	516	466	466	466
query44	1243	770	751	751
query45	201	179	168	168
query46	1109	746	781	746
query47	1902	1816	1800	1800
query48	370	296	300	296
query49	1111	463	443	443
query50	816	424	414	414
query51	7211	7141	7101	7101
query52	101	87	89	87
query53	264	192	192	192
query54	1055	467	463	463
query55	84	78	79	78
query56	294	265	288	265
query57	1211	1072	1082	1072
query58	261	235	237	235
query59	2976	2705	2759	2705
query60	316	281	280	280
query61	124	120	120	120
query62	841	654	692	654
query63	214	192	194	192
query64	5429	773	750	750
query65	3238	3130	3129	3129
query66	1423	353	345	345
query67	15649	15593	15458	15458
query68	4597	571	559	559
query69	421	279	272	272
query70	1119	1117	1044	1044
query71	399	277	270	270
query72	6862	4058	4013	4013
query73	767	334	337	334
query74	9265	8994	8908	8908
query75	3551	2702	2685	2685
query76	2549	1030	997	997
query77	549	325	320	320
query78	9734	9167	9996	9167
query79	2730	531	555	531
query80	993	494	500	494
query81	590	241	239	239
query82	615	146	149	146
query83	242	150	154	150
query84	234	86	75	75
query85	1752	312	286	286
query86	487	295	300	295
query87	4418	4427	4258	4258
query88	3730	2364	2346	2346
query89	383	289	288	288
query90	1857	200	203	200
query91	128	100	115	100
query92	62	54	54	54
query93	1910	562	547	547
query94	948	304	292	292
query95	356	268	269	268
query96	614	276	270	270
query97	3213	3110	3091	3091
query98	226	203	199	199
query99	1610	1313	1311	1311
Total cold run time: 295049 ms
Total hot run time: 188357 ms

@doris-robot

ClickBench: Total hot run time: 31.97 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 9a0a2e61100ee6e9e7c511dd17de48e5108c77cb, data reload: false

query1	0.05	0.05	0.04
query2	0.08	0.04	0.04
query3	0.22	0.05	0.05
query4	1.68	0.08	0.08
query5	0.52	0.49	0.50
query6	1.13	0.73	0.71
query7	0.02	0.01	0.02
query8	0.05	0.04	0.04
query9	0.55	0.48	0.48
query10	0.54	0.54	0.56
query11	0.15	0.12	0.11
query12	0.15	0.12	0.12
query13	0.60	0.58	0.58
query14	2.11	2.04	2.06
query15	0.90	0.82	0.82
query16	0.37	0.38	0.38
query17	0.99	0.99	1.07
query18	0.21	0.20	0.20
query19	1.94	1.86	1.83
query20	0.01	0.01	0.02
query21	15.41	0.67	0.65
query22	4.21	6.88	1.98
query23	18.29	1.38	1.25
query24	2.08	0.21	0.23
query25	0.14	0.08	0.08
query26	0.28	0.18	0.18
query27	0.08	0.08	0.08
query28	13.23	1.02	1.00
query29	12.64	3.35	3.30
query30	0.25	0.06	0.05
query31	2.88	0.40	0.39
query32	3.26	0.48	0.46
query33	2.96	3.02	3.02
query34	16.84	4.39	4.33
query35	4.46	4.39	4.48
query36	0.65	0.46	0.46
query37	0.20	0.15	0.16
query38	0.16	0.15	0.16
query39	0.05	0.03	0.04
query40	0.16	0.13	0.13
query41	0.09	0.04	0.04
query42	0.06	0.05	0.04
query43	0.04	0.04	0.05
Total cold run time: 110.69 s
Total hot run time: 31.97 s

@sollhui
Contributor Author

sollhui commented Aug 29, 2024

run buildall

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TPC-H: Total hot run time: 38073 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit f213a8abfb1674e74497f8c3c60cfa72d37b7088, data reload: false

------ Round 1 ----------------------------------
q1	18127	4638	4392	4392
q2	2178	195	174	174
q3	10471	1119	1100	1100
q4	10130	793	680	680
q5	7733	2837	2814	2814
q6	230	135	137	135
q7	964	614	594	594
q8	9335	2082	2085	2082
q9	7337	6573	6577	6573
q10	7007	2219	2191	2191
q11	471	242	243	242
q12	400	227	227	227
q13	17913	3065	3043	3043
q14	282	242	232	232
q15	534	494	488	488
q16	585	500	490	490
q17	976	658	704	658
q18	7550	6853	6877	6853
q19	1412	1004	1049	1004
q20	724	336	337	336
q21	3881	2908	2780	2780
q22	1097	989	985	985
Total cold run time: 109337 ms
Total hot run time: 38073 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4348	4264	4267	4264
q2	387	274	278	274
q3	2852	2656	2650	2650
q4	1967	1588	1674	1588
q5	5387	5379	5432	5379
q6	223	129	131	129
q7	2106	1761	1701	1701
q8	3188	3363	3358	3358
q9	8426	8383	8469	8383
q10	3447	3186	3207	3186
q11	604	509	490	490
q12	801	624	612	612
q13	13036	2990	3079	2990
q14	301	272	261	261
q15	526	475	477	475
q16	626	583	560	560
q17	1818	1486	1490	1486
q18	7651	7605	7417	7417
q19	1663	1691	1462	1462
q20	2049	1812	1819	1812
q21	5525	5239	5254	5239
q22	1102	1032	1021	1021
Total cold run time: 68033 ms
Total hot run time: 54737 ms

@doris-robot

TPC-DS: Total hot run time: 187573 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit f213a8abfb1674e74497f8c3c60cfa72d37b7088, data reload: false

query1	915	377	359	359
query2	6467	2070	1977	1977
query3	6641	212	222	212
query4	34386	23338	23135	23135
query5	4136	504	511	504
query6	256	165	171	165
query7	4590	301	293	293
query8	261	214	217	214
query9	8726	2515	2508	2508
query10	435	275	276	275
query11	17836	14979	15064	14979
query12	162	108	100	100
query13	1640	386	373	373
query14	9964	6878	7253	6878
query15	258	168	180	168
query16	7993	421	472	421
query17	1578	565	555	555
query18	2086	289	283	283
query19	281	145	141	141
query20	117	111	110	110
query21	216	107	102	102
query22	4319	4156	4024	4024
query23	34152	33308	33178	33178
query24	11178	2931	2907	2907
query25	643	371	384	371
query26	1178	163	161	161
query27	2327	294	290	290
query28	7027	2140	2120	2120
query29	816	418	410	410
query30	309	156	154	154
query31	1014	762	795	762
query32	100	59	60	59
query33	764	292	295	292
query34	975	481	495	481
query35	874	747	731	731
query36	1090	928	950	928
query37	166	99	129	99
query38	4000	3818	3774	3774
query39	1431	1384	1390	1384
query40	213	121	119	119
query41	48	46	45	45
query42	114	95	99	95
query43	533	494	484	484
query44	1270	757	779	757
query45	197	164	170	164
query46	1110	731	742	731
query47	1865	1783	1793	1783
query48	382	303	292	292
query49	1090	436	436	436
query50	800	405	418	405
query51	7341	7071	7117	7071
query52	101	87	93	87
query53	260	190	186	186
query54	975	462	472	462
query55	77	81	81	81
query56	283	268	267	267
query57	1209	1073	1064	1064
query58	246	232	314	232
query59	3093	2841	2926	2841
query60	304	270	272	270
query61	107	96	101	96
query62	841	673	662	662
query63	224	187	185	185
query64	4316	690	687	687
query65	3237	3146	3170	3146
query66	1427	337	346	337
query67	15463	15332	15214	15214
query68	3134	610	588	588
query69	399	285	291	285
query70	1184	1111	1121	1111
query71	351	284	273	273
query72	6331	4115	4017	4017
query73	763	330	344	330
query74	9109	8885	8706	8706
query75	3408	2659	2743	2659
query76	1899	1014	944	944
query77	492	346	331	331
query78	9829	9767	8936	8936
query79	1076	567	558	558
query80	707	503	528	503
query81	556	246	234	234
query82	257	147	145	145
query83	244	154	159	154
query84	231	82	84	82
query85	1089	292	280	280
query86	295	287	287	287
query87	4310	4323	4195	4195
query88	3157	2359	2365	2359
query89	381	292	287	287
query90	1838	196	197	196
query91	126	107	99	99
query92	60	57	53	53
query93	1052	559	551	551
query94	566	289	290	289
query95	365	270	264	264
query96	592	271	272	271
query97	3199	3099	3070	3070
query98	214	204	205	204
query99	1701	1275	1236	1236
Total cold run time: 287348 ms
Total hot run time: 187573 ms

@doris-robot

ClickBench: Total hot run time: 31.78 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit f213a8abfb1674e74497f8c3c60cfa72d37b7088, data reload: false

query1	0.05	0.04	0.04
query2	0.08	0.04	0.03
query3	0.22	0.06	0.06
query4	1.65	0.09	0.09
query5	0.48	0.49	0.49
query6	1.12	0.73	0.73
query7	0.02	0.01	0.01
query8	0.05	0.04	0.05
query9	0.54	0.48	0.47
query10	0.54	0.54	0.54
query11	0.16	0.12	0.12
query12	0.15	0.13	0.13
query13	0.61	0.59	0.59
query14	2.07	2.08	2.07
query15	0.90	0.81	0.84
query16	0.38	0.37	0.38
query17	0.98	1.01	1.02
query18	0.22	0.21	0.21
query19	1.85	1.77	1.84
query20	0.01	0.01	0.01
query21	15.40	0.66	0.65
query22	4.36	7.11	1.54
query23	18.28	1.45	1.36
query24	2.10	0.23	0.22
query25	0.15	0.08	0.08
query26	0.27	0.18	0.19
query27	0.08	0.08	0.08
query28	13.24	1.02	1.00
query29	12.63	3.28	3.31
query30	0.25	0.06	0.06
query31	2.87	0.40	0.38
query32	3.25	0.48	0.47
query33	2.96	2.98	3.03
query34	17.10	4.35	4.39
query35	4.47	4.47	4.48
query36	0.66	0.47	0.47
query37	0.22	0.16	0.16
query38	0.16	0.15	0.15
query39	0.05	0.04	0.04
query40	0.16	0.13	0.13
query41	0.09	0.06	0.05
query42	0.06	0.05	0.06
query43	0.05	0.04	0.05
Total cold run time: 110.94 s
Total hot run time: 31.78 s

Contributor

@dataroaring dataroaring left a comment

LGTM

Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the `approved` label (Indicates a PR has been approved by one committer.) Aug 30, 2024
Contributor

@zhannngchen zhannngchen left a comment

LGTM

@dataroaring dataroaring merged commit 05f1055 into apache:master Sep 2, 2024
26 of 28 checks passed
dataroaring pushed a commit that referenced this pull request Sep 2, 2024
…n loading MOW table (#40105)

sollhui added a commit to sollhui/doris that referenced this pull request Sep 8, 2024
…n loading MOW table (apache#40105)

sollhui added a commit to sollhui/doris that referenced this pull request Sep 23, 2024
…n loading MOW table (apache#40105)

yiguolei pushed a commit that referenced this pull request Sep 24, 2024
…n loading MOW table (#40105) (#41132)

pick (#40105)

Labels
approved (Indicates a PR has been approved by one committer.), dev/2.1.7-merged, dev/3.0.2-merged, reviewed
7 participants