[fix](cloud) Fix migrate tablets between backends back and forth #39792

Merged
merged 3 commits into from
Sep 1, 2024

Conversation

deardeng
Contributor

BUG: The cloud rebalancer migrates tablets back and forth: from A to B, then B to A, then A to B, ...

The cause is that the tabletToInfightTask map, which tracks in-flight tasks, ignored the multi-cluster scenario, and the statRouteInfo function dropped the cluster information, which led to inaccurate tablet statistics.
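
To illustrate the kind of bookkeeping the description refers to, here is a minimal Java sketch of an in-flight task map keyed by tablet and cluster together. It is not the code from this PR: the names `InFlightTracker`, `Key`, `Task`, `tryStart`, `pendingDest`, and `finish` are hypothetical, and it assumes that the same tablet ID can have an independent pending migration in each compute cluster. With a map keyed by tablet ID alone, a task in one cluster could hide or overwrite the task for the same tablet in another cluster.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical sketch; not the actual Doris rebalancer code. */
final class InFlightTracker {
    /** Composite key: one tablet may have separate tasks in separate clusters. */
    static final class Key {
        final long tabletId;
        final String clusterId;

        Key(long tabletId, String clusterId) {
            this.tabletId = tabletId;
            this.clusterId = clusterId;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof Key)) {
                return false;
            }
            Key other = (Key) o;
            return tabletId == other.tabletId && clusterId.equals(other.clusterId);
        }

        @Override
        public int hashCode() {
            return Objects.hash(tabletId, clusterId);
        }
    }

    /** An in-flight migration of one tablet from srcBe to destBe. */
    static final class Task {
        final long srcBe;
        final long destBe;

        Task(long srcBe, long destBe) {
            this.srcBe = srcBe;
            this.destBe = destBe;
        }
    }

    private final Map<Key, Task> inFlight = new HashMap<>();

    /** Registers a task; returns false if the tablet is already migrating in this cluster. */
    boolean tryStart(long tabletId, String clusterId, long srcBe, long destBe) {
        return inFlight.putIfAbsent(new Key(tabletId, clusterId), new Task(srcBe, destBe)) == null;
    }

    /** Returns the pending destination backend, or -1 if no task is in flight. */
    long pendingDest(long tabletId, String clusterId) {
        Task task = inFlight.get(new Key(tabletId, clusterId));
        return task == null ? -1L : task.destBe;
    }

    /** Clears the task once the migration in this cluster has completed. */
    void finish(long tabletId, String clusterId) {
        inFlight.remove(new Key(tabletId, clusterId));
    }
}
```

In this sketch, a second `tryStart` for the same tablet in the same cluster is rejected until `finish` is called, while the same tablet ID in a different cluster is tracked separately. Whether the real fix uses a composite key or a nested per-cluster map is not stated in this conversation.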

Proposed changes

Issue Number: close #xxx

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@dataroaring
Contributor

run buildall

@dataroaring added the dev/3.0.x and usercase (Important user case type label) labels on Aug 23, 2024
@doris-robot

TPC-H: Total hot run time: 38957 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 10bbe2d6a1b59954b297fd617d3734584b6988fe, data reload: false

------ Round 1 ----------------------------------
q1	17721	4877	4381	4381
q2	2023	189	175	175
q3	11802	976	1188	976
q4	10521	701	740	701
q5	8189	2835	2886	2835
q6	226	140	140	140
q7	999	615	599	599
q8	9343	2104	2088	2088
q9	7044	6601	6620	6601
q10	6995	2267	2248	2248
q11	460	238	242	238
q12	393	229	226	226
q13	19311	3079	3037	3037
q14	288	232	235	232
q15	533	485	497	485
q16	494	401	394	394
q17	1002	651	686	651
q18	7499	7381	7390	7381
q19	2801	1168	1054	1054
q20	720	356	337	337
q21	4386	3124	3318	3124
q22	1180	1085	1054	1054
Total cold run time: 113930 ms
Total hot run time: 38957 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4458	4467	4441	4441
q2	373	274	269	269
q3	2963	2731	2705	2705
q4	2078	1781	1827	1781
q5	5704	5820	5681	5681
q6	236	138	135	135
q7	2162	1813	1818	1813
q8	3394	3526	3477	3477
q9	8889	8804	8961	8804
q10	3637	3347	3446	3347
q11	589	494	500	494
q12	801	617	641	617
q13	17300	3238	3195	3195
q14	323	290	279	279
q15	544	492	491	491
q16	490	470	433	433
q17	1850	1572	1540	1540
q18	8212	7878	7996	7878
q19	1758	1691	1646	1646
q20	2148	1879	1905	1879
q21	5885	5493	5559	5493
q22	1121	1083	1072	1072
Total cold run time: 74915 ms
Total hot run time: 57470 ms

@doris-robot

TPC-DS: Total hot run time: 192865 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 10bbe2d6a1b59954b297fd617d3734584b6988fe, data reload: false

query1	1241	881	882	881
query2	6442	1963	1949	1949
query3	10767	4193	4003	4003
query4	59289	26531	23324	23324
query5	5332	500	512	500
query6	429	157	160	157
query7	5754	293	299	293
query8	298	224	215	215
query9	8643	2529	2500	2500
query10	489	293	266	266
query11	18005	15012	15267	15012
query12	166	99	107	99
query13	1503	400	396	396
query14	11179	7464	7371	7371
query15	223	171	174	171
query16	7537	471	491	471
query17	1112	598	584	584
query18	2073	319	305	305
query19	301	159	162	159
query20	122	114	116	114
query21	207	111	103	103
query22	4903	4474	4839	4474
query23	34297	33800	33736	33736
query24	5938	2940	2786	2786
query25	557	394	393	393
query26	692	164	160	160
query27	1776	290	282	282
query28	3703	2091	2067	2067
query29	713	420	424	420
query30	240	156	155	155
query31	928	779	763	763
query32	81	58	57	57
query33	450	304	295	295
query34	857	474	471	471
query35	886	741	723	723
query36	1047	969	935	935
query37	146	87	84	84
query38	3935	3863	3954	3863
query39	1470	1369	1408	1369
query40	203	118	115	115
query41	46	46	44	44
query42	114	107	99	99
query43	509	485	467	467
query44	1093	752	758	752
query45	197	161	166	161
query46	1104	728	737	728
query47	1849	1799	1780	1780
query48	375	293	286	286
query49	766	423	434	423
query50	818	434	421	421
query51	7144	7093	7058	7058
query52	97	88	94	88
query53	257	182	177	177
query54	554	461	442	442
query55	75	76	76	76
query56	284	254	251	251
query57	1185	1080	1082	1080
query58	215	223	230	223
query59	3121	2972	2814	2814
query60	295	266	269	266
query61	105	102	103	102
query62	752	650	661	650
query63	223	186	182	182
query64	3334	1768	1757	1757
query65	3252	3228	3169	3169
query66	679	336	337	336
query67	15495	15269	15492	15269
query68	3020	578	580	578
query69	400	279	277	277
query70	1176	1035	1119	1035
query71	375	283	278	278
query72	6187	2345	2090	2090
query73	766	325	324	324
query74	9146	8851	8902	8851
query75	3319	2711	2722	2711
query76	1553	970	996	970
query77	546	324	323	323
query78	9648	9389	9106	9106
query79	1047	550	531	531
query80	687	502	491	491
query81	461	234	233	233
query82	235	139	131	131
query83	172	150	151	150
query84	258	77	79	77
query85	676	301	284	284
query86	294	282	272	272
query87	4322	4336	4223	4223
query88	3116	2384	2298	2298
query89	387	286	358	286
query90	1926	200	193	193
query91	123	99	98	98
query92	61	50	50	50
query93	1052	536	541	536
query94	708	318	293	293
query95	350	260	262	260
query96	587	267	264	264
query97	3197	3153	3081	3081
query98	213	199	208	199
query99	1523	1250	1259	1250
Total cold run time: 305403 ms
Total hot run time: 192865 ms

@doris-robot

ClickBench: Total hot run time: 31.37 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 10bbe2d6a1b59954b297fd617d3734584b6988fe, data reload: false

query1	0.05	0.04	0.04
query2	0.08	0.05	0.04
query3	0.22	0.05	0.06
query4	1.67	0.11	0.10
query5	0.51	0.49	0.49
query6	1.13	0.74	0.72
query7	0.02	0.02	0.01
query8	0.05	0.04	0.05
query9	0.53	0.48	0.48
query10	0.54	0.52	0.54
query11	0.17	0.12	0.12
query12	0.16	0.12	0.13
query13	0.62	0.59	0.58
query14	0.76	0.83	0.78
query15	0.87	0.85	0.85
query16	0.38	0.36	0.36
query17	1.00	1.03	0.97
query18	0.22	0.20	0.21
query19	1.90	1.78	1.74
query20	0.01	0.00	0.01
query21	15.42	0.67	0.65
query22	4.40	5.93	2.22
query23	18.24	1.42	1.32
query24	2.07	0.23	0.22
query25	0.16	0.08	0.09
query26	0.26	0.18	0.18
query27	0.08	0.08	0.08
query28	13.33	1.06	1.02
query29	12.63	3.33	3.28
query30	0.24	0.06	0.06
query31	2.90	0.41	0.41
query32	3.22	0.50	0.50
query33	3.02	3.05	3.06
query34	17.11	4.49	4.49
query35	4.52	4.51	4.53
query36	0.67	0.49	0.50
query37	0.19	0.16	0.15
query38	0.15	0.16	0.15
query39	0.04	0.04	0.04
query40	0.16	0.14	0.12
query41	0.09	0.06	0.05
query42	0.06	0.05	0.05
query43	0.04	0.04	0.04
Total cold run time: 109.89 s
Total hot run time: 31.37 s

@deardeng
Contributor Author

run buildall

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@deardeng force-pushed the fix-cloud-rebalce branch 2 times, most recently from 20b8e61 to a00aeb4, on August 26, 2024 13:13
@deardeng
Contributor Author

run buildall

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TPC-H: Total hot run time: 37968 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit a00aeb4d6bd56f158e36f783c8205714ac4ca4d3, data reload: false

------ Round 1 ----------------------------------
q1	17611	4439	4280	4280
q2	2020	201	180	180
q3	11698	962	1207	962
q4	10518	800	689	689
q5	7771	2811	2843	2811
q6	224	139	139	139
q7	970	623	608	608
q8	9344	2105	2123	2105
q9	7158	6625	6612	6612
q10	7020	2176	2183	2176
q11	477	248	248	248
q12	397	221	226	221
q13	17869	3050	3016	3016
q14	287	229	230	229
q15	542	498	516	498
q16	519	402	391	391
q17	987	676	616	616
q18	7237	6828	6795	6795
q19	1407	1052	1117	1052
q20	677	334	341	334
q21	3849	3023	3061	3023
q22	1119	996	983	983
Total cold run time: 109701 ms
Total hot run time: 37968 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4340	4297	4292	4292
q2	376	277	278	277
q3	2927	2661	2680	2661
q4	1918	1650	1635	1635
q5	5590	5672	5696	5672
q6	225	144	148	144
q7	2275	1823	1866	1823
q8	3313	3434	3463	3434
q9	8827	8860	8876	8860
q10	3549	3414	3397	3397
q11	594	538	518	518
q12	835	690	657	657
q13	16831	3168	3223	3168
q14	325	297	292	292
q15	524	490	483	483
q16	496	456	463	456
q17	1843	1572	1539	1539
q18	8016	7725	7911	7725
q19	1712	1565	1532	1532
q20	2160	1885	1878	1878
q21	5804	5342	5513	5342
q22	1132	1042	1027	1027
Total cold run time: 73612 ms
Total hot run time: 56812 ms

@doris-robot

TPC-DS: Total hot run time: 192005 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit a00aeb4d6bd56f158e36f783c8205714ac4ca4d3, data reload: false

query1	1252	872	856	856
query2	6306	1930	1904	1904
query3	10587	4028	3969	3969
query4	59809	26062	23270	23270
query5	5505	509	508	508
query6	411	159	165	159
query7	5890	295	292	292
query8	290	214	208	208
query9	8927	2465	2468	2465
query10	488	278	259	259
query11	17860	15119	15402	15119
query12	170	102	104	102
query13	1533	396	382	382
query14	10800	7376	7694	7376
query15	255	183	186	183
query16	7434	446	490	446
query17	1131	567	574	567
query18	1869	306	294	294
query19	304	148	142	142
query20	118	110	110	110
query21	210	111	107	107
query22	4564	4509	4556	4509
query23	34405	33786	33350	33350
query24	5864	2843	2847	2843
query25	537	389	399	389
query26	694	161	156	156
query27	1795	279	282	279
query28	3646	2048	2018	2018
query29	682	423	422	422
query30	242	148	149	148
query31	940	763	755	755
query32	86	53	58	53
query33	490	298	296	296
query34	860	472	471	471
query35	832	724	718	718
query36	1081	943	949	943
query37	144	96	84	84
query38	3989	4004	3913	3913
query39	1437	1402	1407	1402
query40	200	118	119	118
query41	50	48	50	48
query42	127	99	98	98
query43	518	493	492	492
query44	1091	746	748	746
query45	197	170	222	170
query46	1083	748	706	706
query47	1875	1809	1771	1771
query48	356	288	291	288
query49	760	418	426	418
query50	814	417	417	417
query51	7110	7124	6975	6975
query52	98	86	84	84
query53	252	180	180	180
query54	548	477	448	448
query55	74	73	75	73
query56	265	245	261	245
query57	1173	1059	1067	1059
query58	233	226	238	226
query59	3067	2824	2666	2666
query60	316	267	263	263
query61	97	94	101	94
query62	744	637	644	637
query63	214	183	183	183
query64	3262	1767	1727	1727
query65	3207	3176	3186	3176
query66	612	339	327	327
query67	15483	15277	15128	15128
query68	3110	574	555	555
query69	403	276	283	276
query70	1141	1135	1106	1106
query71	337	271	269	269
query72	2732	2170	2070	2070
query73	693	319	310	310
query74	9257	8894	8907	8894
query75	3348	2697	2696	2696
query76	1518	1003	985	985
query77	533	346	308	308
query78	9607	9087	9069	9069
query79	1037	530	532	530
query80	672	503	556	503
query81	462	232	230	230
query82	285	137	139	137
query83	174	152	149	149
query84	248	78	74	74
query85	672	292	279	279
query86	305	305	291	291
query87	4509	4282	4324	4282
query88	2966	2277	2265	2265
query89	379	280	291	280
query90	2074	200	197	197
query91	123	101	103	101
query92	63	52	56	52
query93	1083	525	525	525
query94	712	278	285	278
query95	338	259	266	259
query96	590	267	267	267
query97	3223	3098	3076	3076
query98	212	207	211	207
query99	1572	1255	1316	1255
Total cold run time: 301641 ms
Total hot run time: 192005 ms

@doris-robot

ClickBench: Total hot run time: 31.14 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit a00aeb4d6bd56f158e36f783c8205714ac4ca4d3, data reload: false

query1	0.04	0.04	0.04
query2	0.08	0.04	0.04
query3	0.22	0.05	0.06
query4	1.67	0.08	0.09
query5	0.49	0.49	0.51
query6	1.14	0.73	0.73
query7	0.02	0.01	0.01
query8	0.06	0.05	0.05
query9	0.55	0.50	0.49
query10	0.56	0.56	0.54
query11	0.16	0.12	0.11
query12	0.15	0.13	0.12
query13	0.63	0.59	0.58
query14	0.77	0.79	0.80
query15	0.84	0.83	0.82
query16	0.38	0.39	0.38
query17	1.06	1.02	1.05
query18	0.21	0.20	0.20
query19	1.85	1.75	1.84
query20	0.01	0.01	0.01
query21	15.40	0.66	0.67
query22	4.29	5.87	2.28
query23	18.28	1.37	1.17
query24	2.15	0.25	0.22
query25	0.15	0.09	0.08
query26	0.26	0.18	0.18
query27	0.08	0.09	0.08
query28	13.16	1.03	1.00
query29	12.62	3.40	3.35
query30	0.25	0.06	0.06
query31	2.87	0.41	0.39
query32	3.24	0.48	0.48
query33	3.02	3.01	3.01
query34	17.07	4.37	4.40
query35	4.47	4.44	4.45
query36	0.66	0.47	0.48
query37	0.19	0.17	0.16
query38	0.16	0.15	0.15
query39	0.05	0.04	0.03
query40	0.16	0.13	0.13
query41	0.09	0.05	0.06
query42	0.06	0.04	0.05
query43	0.05	0.04	0.04
Total cold run time: 109.62 s
Total hot run time: 31.14 s

dataroaring
dataroaring previously approved these changes Aug 27, 2024
Contributor

@dataroaring dataroaring left a comment

LGTM

@github-actions bot added the approved label (Indicates a PR has been approved by one committer.) on Aug 27, 2024
Contributor

PR approved by at least one committer and no changes requested.

Contributor

PR approved by anyone and no changes requested.

Collaborator

@yujun777 yujun777 left a comment

LGTM

@deardeng
Contributor Author

run buildall

@github-actions bot removed the approved label (Indicates a PR has been approved by one committer.) on Aug 28, 2024
Collaborator

@yujun777 yujun777 left a comment

LGTM

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TPC-H: Total hot run time: 37887 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 2b9110605effff3e762ddaa3a5cbbb3261cc21af, data reload: false

------ Round 1 ----------------------------------
q1	17772	4357	4258	4258
q2	2023	186	178	178
q3	11757	937	1121	937
q4	10529	698	735	698
q5	7761	2845	2751	2751
q6	233	137	140	137
q7	966	633	611	611
q8	9333	2035	2108	2035
q9	7245	6529	6475	6475
q10	6987	2232	2138	2138
q11	449	242	235	235
q12	392	235	228	228
q13	18734	2995	3054	2995
q14	277	239	234	234
q15	525	480	499	480
q16	568	542	528	528
q17	992	653	656	653
q18	7347	6915	6889	6889
q19	1392	1068	1073	1068
q20	671	330	324	324
q21	4050	3014	3148	3014
q22	1127	1046	1021	1021
Total cold run time: 111130 ms
Total hot run time: 37887 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4363	4604	4277	4277
q2	371	276	279	276
q3	2906	2637	2630	2630
q4	1970	1649	1692	1649
q5	5671	5635	5747	5635
q6	228	137	145	137
q7	2215	1842	1850	1842
q8	3269	3452	3449	3449
q9	8827	8839	8791	8791
q10	3626	3418	3331	3331
q11	607	515	515	515
q12	836	679	689	679
q13	15096	3243	3311	3243
q14	321	290	297	290
q15	522	492	508	492
q16	642	584	588	584
q17	1808	1556	1567	1556
q18	8143	7875	7856	7856
q19	1707	1654	1601	1601
q20	2176	1902	1939	1902
q21	5800	5548	5440	5440
q22	1142	1069	1039	1039
Total cold run time: 72246 ms
Total hot run time: 57214 ms

@doris-robot

TPC-DS: Total hot run time: 192880 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 2b9110605effff3e762ddaa3a5cbbb3261cc21af, data reload: false

query1	1251	893	871	871
query2	6396	1934	1960	1934
query3	10629	3958	4005	3958
query4	59405	24077	23211	23211
query5	5593	530	508	508
query6	438	172	168	168
query7	6084	293	291	291
query8	301	205	211	205
query9	8866	2521	2509	2509
query10	485	271	259	259
query11	17258	15079	15172	15079
query12	161	108	104	104
query13	1581	395	378	378
query14	11644	7053	7477	7053
query15	245	180	184	180
query16	7564	483	463	463
query17	1141	550	559	550
query18	2065	303	305	303
query19	289	153	145	145
query20	118	115	112	112
query21	215	102	106	102
query22	4532	4620	4415	4415
query23	34318	33695	33369	33369
query24	5810	2868	2838	2838
query25	534	381	374	374
query26	683	156	156	156
query27	1783	283	288	283
query28	3693	2173	2150	2150
query29	705	411	416	411
query30	237	157	164	157
query31	965	780	786	780
query32	77	54	57	54
query33	452	291	284	284
query34	883	487	489	487
query35	864	724	724	724
query36	1097	940	952	940
query37	151	94	99	94
query38	3984	3857	3877	3857
query39	1453	1395	1398	1395
query40	201	123	124	123
query41	48	49	46	46
query42	116	101	99	99
query43	536	494	487	487
query44	1088	742	763	742
query45	200	168	169	168
query46	1077	740	735	735
query47	1910	1806	1812	1806
query48	373	303	314	303
query49	775	447	477	447
query50	822	416	416	416
query51	7250	7043	7193	7043
query52	98	88	89	88
query53	251	181	185	181
query54	568	461	468	461
query55	79	79	79	79
query56	294	270	300	270
query57	1199	1062	1049	1049
query58	232	273	260	260
query59	3008	2904	2774	2774
query60	310	282	282	282
query61	126	118	119	118
query62	750	664	659	659
query63	226	194	188	188
query64	2965	766	761	761
query65	3240	3165	3188	3165
query66	696	344	348	344
query67	15426	15591	15086	15086
query68	3396	583	576	576
query69	425	296	285	285
query70	1142	1138	1087	1087
query71	370	280	285	280
query72	6455	4191	3976	3976
query73	761	332	335	332
query74	9183	8909	8766	8766
query75	3358	2718	2744	2718
query76	1396	947	981	947
query77	554	338	326	326
query78	11314	9577	9048	9048
query79	1584	548	545	545
query80	896	522	508	508
query81	570	238	236	236
query82	365	152	152	152
query83	247	146	150	146
query84	288	77	72	72
query85	707	293	278	278
query86	447	300	297	297
query87	4486	4381	4157	4157
query88	3973	2323	2311	2311
query89	384	284	289	284
query90	1819	197	199	197
query91	121	98	100	98
query92	59	53	52	52
query93	1104	522	526	522
query94	840	304	292	292
query95	355	266	262	262
query96	591	266	272	266
query97	3241	3095	3047	3047
query98	216	214	205	205
query99	1541	1311	1281	1281
Total cold run time: 310002 ms
Total hot run time: 192880 ms

@doris-robot

ClickBench: Total hot run time: 32.65 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 2b9110605effff3e762ddaa3a5cbbb3261cc21af, data reload: false

query1	0.05	0.05	0.04
query2	0.08	0.04	0.04
query3	0.22	0.05	0.05
query4	1.68	0.07	0.07
query5	0.52	0.48	0.49
query6	1.12	0.74	0.73
query7	0.02	0.02	0.01
query8	0.05	0.04	0.05
query9	0.56	0.50	0.48
query10	0.54	0.55	0.53
query11	0.17	0.12	0.11
query12	0.15	0.12	0.13
query13	0.62	0.60	0.59
query14	2.10	2.06	2.11
query15	0.85	0.83	0.82
query16	0.38	0.40	0.39
query17	1.07	1.04	1.00
query18	0.20	0.20	0.20
query19	1.94	1.85	1.85
query20	0.02	0.01	0.01
query21	15.39	0.66	0.65
query22	3.81	6.74	2.32
query23	18.33	1.32	1.38
query24	2.07	0.25	0.22
query25	0.15	0.08	0.09
query26	0.26	0.19	0.19
query27	0.08	0.08	0.08
query28	13.26	1.03	1.01
query29	12.64	3.38	3.34
query30	0.25	0.05	0.06
query31	2.88	0.42	0.39
query32	3.23	0.49	0.48
query33	3.02	3.00	3.03
query34	17.09	4.37	4.40
query35	4.45	4.45	4.44
query36	0.66	0.47	0.48
query37	0.20	0.17	0.17
query38	0.17	0.16	0.15
query39	0.04	0.04	0.04
query40	0.16	0.12	0.13
query41	0.09	0.05	0.05
query42	0.06	0.05	0.05
query43	0.05	0.04	0.04
Total cold run time: 110.68 s
Total hot run time: 32.65 s

@gavinchou
Contributor

run buildall

@github-actions bot added the approved label (Indicates a PR has been approved by one committer.) on Aug 28, 2024
Contributor

PR approved by at least one committer and no changes requested.

@doris-robot

TPC-H: Total hot run time: 38557 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 2b9110605effff3e762ddaa3a5cbbb3261cc21af, data reload: false

------ Round 1 ----------------------------------
q1	17613	4460	4348	4348
q2	2022	184	177	177
q3	10477	1167	1193	1167
q4	10135	692	702	692
q5	7734	2924	2864	2864
q6	231	135	139	135
q7	974	614	609	609
q8	9541	2091	2085	2085
q9	8863	6640	6561	6561
q10	7079	2144	2173	2144
q11	460	245	246	245
q12	560	232	234	232
q13	18904	3037	3049	3037
q14	274	244	234	234
q15	530	486	488	486
q16	620	508	517	508
q17	998	653	678	653
q18	7571	6904	6862	6862
q19	1395	1164	1076	1076
q20	686	336	333	333
q21	4116	3194	3104	3104
q22	1135	1027	1005	1005
Total cold run time: 111918 ms
Total hot run time: 38557 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4542	4456	4338	4338
q2	386	300	280	280
q3	2921	2665	2639	2639
q4	1989	1642	1674	1642
q5	5408	5445	5398	5398
q6	222	133	129	129
q7	2147	1734	1713	1713
q8	3203	3363	3351	3351
q9	8467	8457	8448	8448
q10	3449	3206	3258	3206
q11	590	508	518	508
q12	804	613	625	613
q13	9568	3009	3034	3009
q14	310	282	281	281
q15	516	468	487	468
q16	638	557	557	557
q17	1814	1528	1468	1468
q18	7734	7531	7503	7503
q19	1717	1544	1453	1453
q20	2080	1823	1840	1823
q21	5533	5314	5160	5160
q22	1104	1042	1018	1018
Total cold run time: 65142 ms
Total hot run time: 55005 ms

@doris-robot

TPC-DS: Total hot run time: 188649 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 2b9110605effff3e762ddaa3a5cbbb3261cc21af, data reload: false

query1	920	379	372	372
query2	6304	2000	1842	1842
query3	6466	216	225	216
query4	26973	23116	23297	23116
query5	3631	523	488	488
query6	259	175	181	175
query7	4359	302	302	302
query8	256	218	215	215
query9	7484	2530	2488	2488
query10	388	277	265	265
query11	17591	14932	15012	14932
query12	150	109	98	98
query13	1221	390	366	366
query14	9797	7301	7540	7301
query15	289	177	177	177
query16	6886	467	481	467
query17	1405	573	572	572
query18	1739	302	294	294
query19	293	154	153	153
query20	119	115	110	110
query21	210	108	105	105
query22	4266	4210	4089	4089
query23	34179	33320	33327	33320
query24	9735	2903	2847	2847
query25	638	420	408	408
query26	1323	164	165	164
query27	2344	296	285	285
query28	6025	2155	2119	2119
query29	833	443	426	426
query30	260	161	160	160
query31	998	773	784	773
query32	87	60	68	60
query33	680	299	297	297
query34	890	485	505	485
query35	869	721	725	721
query36	1100	897	935	897
query37	168	100	103	100
query38	3949	3873	3872	3872
query39	1462	1596	1376	1376
query40	239	124	122	122
query41	50	49	50	49
query42	120	100	101	100
query43	527	476	469	469
query44	1231	766	763	763
query45	197	169	173	169
query46	1107	750	775	750
query47	1890	1821	1775	1775
query48	385	295	303	295
query49	1142	459	446	446
query50	822	431	423	423
query51	7237	7058	7052	7052
query52	101	92	100	92
query53	260	190	188	188
query54	1065	473	475	473
query55	81	85	78	78
query56	292	265	261	261
query57	1209	1050	1112	1050
query58	252	236	232	232
query59	3051	2898	2806	2806
query60	304	280	283	280
query61	123	126	126	126
query62	854	677	669	669
query63	228	189	189	189
query64	4787	720	638	638
query65	3394	3175	3186	3175
query66	1305	341	358	341
query67	15540	15434	15228	15228
query68	4269	620	580	580
query69	502	279	279	279
query70	1173	1161	1091	1091
query71	372	278	280	278
query72	7056	4011	4027	4011
query73	753	338	337	337
query74	9140	8845	8833	8833
query75	3563	2728	2736	2728
query76	3075	1243	1032	1032
query77	503	316	328	316
query78	10054	9191	9026	9026
query79	1580	544	549	544
query80	876	508	509	508
query81	590	277	238	238
query82	727	152	151	151
query83	252	158	153	153
query84	244	79	79	79
query85	1363	303	286	286
query86	396	289	277	277
query87	4481	4243	4249	4243
query88	3319	2372	2460	2372
query89	396	301	293	293
query90	1926	204	204	204
query91	132	103	103	103
query92	62	52	52	52
query93	1171	565	576	565
query94	954	304	307	304
query95	369	269	267	267
query96	593	271	275	271
query97	3185	3100	3066	3066
query98	215	203	202	202
query99	1515	1268	1277	1268
Total cold run time: 279145 ms
Total hot run time: 188649 ms

@doris-robot

ClickBench: Total hot run time: 32.45 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 2b9110605effff3e762ddaa3a5cbbb3261cc21af, data reload: false

query1	0.04	0.04	0.04
query2	0.08	0.04	0.04
query3	0.22	0.05	0.05
query4	1.68	0.08	0.07
query5	0.51	0.50	0.49
query6	1.12	0.74	0.74
query7	0.02	0.01	0.01
query8	0.05	0.04	0.05
query9	0.55	0.50	0.49
query10	0.55	0.56	0.52
query11	0.16	0.11	0.11
query12	0.15	0.13	0.13
query13	0.63	0.59	0.57
query14	2.02	2.08	2.12
query15	0.85	0.82	0.81
query16	0.40	0.38	0.38
query17	1.08	1.04	1.06
query18	0.21	0.20	0.20
query19	1.91	1.83	1.86
query20	0.02	0.01	0.01
query21	15.41	0.67	0.66
query22	4.20	6.80	2.22
query23	18.27	1.43	1.29
query24	2.14	0.22	0.22
query25	0.18	0.09	0.07
query26	0.26	0.18	0.17
query27	0.08	0.07	0.07
query28	13.21	1.02	0.99
query29	12.60	3.36	3.31
query30	0.23	0.06	0.06
query31	2.90	0.41	0.39
query32	3.23	0.47	0.47
query33	3.02	2.97	3.04
query34	17.23	4.37	4.37
query35	4.40	4.47	4.47
query36	0.65	0.50	0.47
query37	0.19	0.16	0.16
query38	0.16	0.15	0.15
query39	0.05	0.04	0.04
query40	0.16	0.13	0.14
query41	0.10	0.06	0.04
query42	0.06	0.04	0.04
query43	0.05	0.05	0.04
Total cold run time: 111.03 s
Total hot run time: 32.45 s

@dataroaring
Contributor

run buildall

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TPC-H: Total hot run time: 38591 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 6a261c309b3cf6449e73eedcd5501348f606face, data reload: false

------ Round 1 ----------------------------------
q1	18172	4464	4383	4383
q2	2542	190	177	177
q3	11523	1159	1106	1106
q4	10246	677	722	677
q5	7882	2923	2863	2863
q6	235	143	148	143
q7	976	644	630	630
q8	9391	2064	2033	2033
q9	7059	6549	6539	6539
q10	7011	2263	2242	2242
q11	440	246	245	245
q12	403	235	233	233
q13	17766	3033	3042	3033
q14	280	236	239	236
q15	529	496	482	482
q16	574	507	517	507
q17	977	698	701	698
q18	7355	6944	6769	6769
q19	1394	1060	1016	1016
q20	703	358	347	347
q21	4886	3221	3226	3221
q22	1127	1011	1051	1011
Total cold run time: 111471 ms
Total hot run time: 38591 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4314	4242	4287	4242
q2	387	286	272	272
q3	2921	2690	2641	2641
q4	1964	1644	1710	1644
q5	5386	5403	5414	5403
q6	226	134	133	133
q7	2135	1752	1751	1751
q8	3231	3359	3363	3359
q9	8400	8508	8465	8465
q10	3463	3218	3216	3216
q11	597	505	508	505
q12	789	608	628	608
q13	12341	3034	3093	3034
q14	321	289	270	270
q15	533	485	472	472
q16	604	575	561	561
q17	1778	1489	1480	1480
q18	7791	7589	7342	7342
q19	1671	1430	1606	1430
q20	2032	1841	1845	1841
q21	5439	5256	5268	5256
q22	1123	1058	1040	1040
Total cold run time: 67446 ms
Total hot run time: 54965 ms

@doris-robot

TPC-DS: Total hot run time: 188810 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 6a261c309b3cf6449e73eedcd5501348f606face, data reload: false

query1	902	377	368	368
query2	6444	2023	2047	2023
query3	6652	209	217	209
query4	33643	23128	23417	23128
query5	4200	498	500	498
query6	263	167	170	167
query7	4578	313	294	294
query8	247	201	204	201
query9	8760	2466	2464	2464
query10	438	273	281	273
query11	17801	15236	15054	15054
query12	152	103	99	99
query13	1627	394	365	365
query14	9995	7272	6947	6947
query15	279	170	183	170
query16	7460	473	475	473
query17	1588	582	564	564
query18	1771	287	295	287
query19	243	143	144	143
query20	114	113	112	112
query21	211	108	103	103
query22	4408	4193	4115	4115
query23	34303	33587	33303	33303
query24	11181	2999	2916	2916
query25	611	377	380	377
query26	1097	168	164	164
query27	2290	292	285	285
query28	7031	2119	2097	2097
query29	703	445	447	445
query30	306	167	158	158
query31	974	751	773	751
query32	101	60	59	59
query33	753	305	294	294
query34	952	482	491	482
query35	877	716	743	716
query36	1100	911	931	911
query37	154	88	91	88
query38	3999	3837	3923	3837
query39	1450	1394	1384	1384
query40	199	122	119	119
query41	49	47	45	45
query42	120	100	99	99
query43	514	498	489	489
query44	1256	757	749	749
query45	199	170	176	170
query46	1122	777	783	777
query47	1866	1760	1778	1760
query48	374	300	300	300
query49	1108	493	432	432
query50	819	427	419	419
query51	7231	7079	7098	7079
query52	104	91	89	89
query53	267	196	183	183
query54	944	462	459	459
query55	80	77	79	77
query56	285	279	273	273
query57	1199	1075	1078	1075
query58	247	262	233	233
query59	3026	2780	2826	2780
query60	304	271	280	271
query61	103	99	102	99
query62	908	633	679	633
query63	219	188	186	186
query64	4597	680	681	680
query65	3260	3155	3195	3155
query66	1354	338	349	338
query67	15577	15248	15138	15138
query68	3504	608	596	596
query69	403	300	291	291
query70	1118	1122	1115	1115
query71	350	289	285	285
query72	6465	4190	4205	4190
query73	755	337	337	337
query74	9188	8844	8912	8844
query75	3444	2693	2726	2693
query76	1872	1012	1023	1012
query77	492	350	349	349
query78	9682	9304	9015	9015
query79	1074	576	574	574
query80	887	539	539	539
query81	567	238	241	238
query82	1198	161	151	151
query83	242	161	169	161
query84	234	86	82	82
query85	1042	357	345	345
query86	320	297	297	297
query87	4374	4313	4303	4303
query88	2901	2323	2327	2323
query89	388	298	293	293
query90	2026	206	208	206
query91	139	117	115	115
query92	68	58	55	55
query93	1059	555	546	546
query94	911	299	316	299
query95	383	289	286	286
query96	596	273	271	271
query97	3151	3096	3069	3069
query98	224	361	242	242
query99	1524	1257	1276	1257
Total cold run time: 287570 ms
Total hot run time: 188810 ms

@doris-robot

ClickBench: Total hot run time: 32.31 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 6a261c309b3cf6449e73eedcd5501348f606face, data reload: false

query1	0.05	0.04	0.04
query2	0.08	0.04	0.04
query3	0.22	0.06	0.05
query4	1.67	0.09	0.08
query5	0.50	0.49	0.50
query6	1.12	0.73	0.73
query7	0.02	0.01	0.01
query8	0.05	0.04	0.05
query9	0.54	0.49	0.48
query10	0.53	0.55	0.53
query11	0.16	0.11	0.11
query12	0.15	0.13	0.12
query13	0.61	0.59	0.59
query14	2.00	2.04	2.12
query15	0.90	0.81	0.82
query16	0.36	0.37	0.39
query17	0.99	1.00	1.02
query18	0.22	0.20	0.20
query19	1.87	1.88	1.78
query20	0.02	0.01	0.01
query21	15.39	0.67	0.65
query22	4.17	6.54	2.12
query23	18.28	1.38	1.32
query24	2.07	0.22	0.22
query25	0.15	0.08	0.09
query26	0.27	0.18	0.18
query27	0.07	0.08	0.09
query28	13.30	1.01	1.00
query29	12.64	3.43	3.38
query30	0.24	0.07	0.05
query31	2.87	0.41	0.39
query32	3.26	0.49	0.49
query33	2.94	3.00	2.98
query34	16.95	4.36	4.40
query35	4.43	4.41	4.45
query36	0.66	0.47	0.47
query37	0.19	0.16	0.15
query38	0.16	0.15	0.15
query39	0.05	0.04	0.04
query40	0.15	0.13	0.13
query41	0.10	0.04	0.05
query42	0.06	0.05	0.06
query43	0.05	0.05	0.05
Total cold run time: 110.51 s
Total hot run time: 32.31 s

Contributor

@dataroaring dataroaring left a comment

LGTM

@gavinchou gavinchou merged commit 090b19f into apache:master Sep 1, 2024
24 of 26 checks passed
dataroaring pushed a commit that referenced this pull request Sep 3, 2024

BUG: The cloud rebalancer migrates tablets back and forth: from A to B,
then B to A, then A to B, ...

The cause is that the tabletToInfightTask map, which tracks in-flight
tasks, ignored the multi-cluster scenario, and the statRouteInfo function
dropped the cluster information, which led to inaccurate tablet
statistics.
Labels
approved (Indicates a PR has been approved by one committer.), dev/3.0.2-merged, reviewed, usercase (Important user case type label)
5 participants