[fix](cloud) Fix cloud auto start and add a regression case #40027

Merged
merged 2 commits into from
Sep 10, 2024

Conversation

deardeng
Contributor

@deardeng deardeng commented Aug 28, 2024

  1. Fix a bug where a SELECT against a suspended cluster did not wake the cluster up. When all BE nodes in the cluster are inactive, the cluster is skipped, so the cluster that needs to be woken up is never found and the wake-up logic is never reached. Also delete the redundant function getAuthorizedCloudCluster.
  2. Add a check after resuming a cluster: there must be at least one alive BE in the cluster.
  3. Add an auto start regression case.
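
The selection bug and the post-resume check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code changed in this PR: the class `AutoStartSketch` and the names `Cluster`, `Backend`, `pickSkippingInactive`, `pickForQuery`, and `resumeAndCheck` are all hypothetical, and the real FE logic may differ in how it picks and resumes a cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical model for illustration; these are not the Doris FE classes.
final class AutoStartSketch {
    enum ClusterStatus { NORMAL, SUSPENDED }

    static final class Backend {
        final String host;
        boolean alive;

        Backend(String host, boolean alive) {
            this.host = host;
            this.alive = alive;
        }
    }

    static final class Cluster {
        final String name;
        ClusterStatus status;
        final List<Backend> backends = new ArrayList<>();

        Cluster(String name, ClusterStatus status) {
            this.name = name;
            this.status = status;
        }

        boolean hasAliveBackend() {
            return backends.stream().anyMatch(b -> b.alive);
        }
    }

    // Pre-fix behavior as described: clusters without an alive BE are skipped,
    // so a suspended cluster is never returned and never woken up.
    static Optional<Cluster> pickSkippingInactive(List<Cluster> authorized) {
        return authorized.stream().filter(Cluster::hasAliveBackend).findFirst();
    }

    // Post-fix idea (assumed): prefer a cluster with an alive BE, but fall back
    // to the first authorized cluster so the caller can reach the wake-up logic.
    static Optional<Cluster> pickForQuery(List<Cluster> authorized) {
        Optional<Cluster> live = pickSkippingInactive(authorized);
        return live.isPresent() ? live : authorized.stream().findFirst();
    }

    // Resume a suspended cluster via the supplied action, then verify that at
    // least one BE is alive before returning the cluster name.
    static String resumeAndCheck(Cluster cluster, Runnable resumeAction) {
        if (cluster.status == ClusterStatus.SUSPENDED) {
            resumeAction.run();
        }
        if (!cluster.hasAliveBackend()) {
            throw new IllegalStateException(
                    "no alive BE in cluster " + cluster.name + " after resume");
        }
        return cluster.name;
    }
}
```

In this sketch `resumeAndCheck` returns the cluster name rather than nothing, which loosely mirrors the void-to-string change raised in review; whether the real method behaves this way is an assumption.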

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@deardeng
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 38371 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 8e89935b00168c3fe50c2ad890355e16b59edf3a, data reload: false

------ Round 1 ----------------------------------
q1	17949	4450	4306	4306
q2	2026	201	205	201
q3	10711	1227	1188	1188
q4	10310	720	760	720
q5	7771	2858	2880	2858
q6	231	142	141	141
q7	982	633	610	610
q8	9721	2104	2076	2076
q9	7009	6639	6617	6617
q10	7108	2252	2245	2245
q11	454	236	253	236
q12	395	223	226	223
q13	18727	3021	3038	3021
q14	284	237	235	235
q15	525	478	496	478
q16	614	514	496	496
q17	981	682	728	682
q18	7604	6801	6883	6801
q19	1453	1001	1050	1001
q20	688	348	329	329
q21	4136	2913	2982	2913
q22	1118	1005	994	994
Total cold run time: 110797 ms
Total hot run time: 38371 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4355	4310	4298	4298
q2	374	275	278	275
q3	2916	2672	2684	2672
q4	1990	1668	1650	1650
q5	5623	5763	5627	5627
q6	233	135	144	135
q7	2211	1853	1788	1788
q8	3363	3482	3464	3464
q9	8812	8710	8759	8710
q10	3592	3320	3305	3305
q11	603	505	532	505
q12	877	694	643	643
q13	11135	3251	3206	3206
q14	346	291	318	291
q15	553	483	504	483
q16	641	588	594	588
q17	1847	1538	1540	1538
q18	8254	7904	8080	7904
q19	1738	1612	1626	1612
q20	2190	1822	1813	1813
q21	5577	5465	5497	5465
q22	1155	1080	1095	1080
Total cold run time: 68385 ms
Total hot run time: 57052 ms

@doris-robot

TPC-DS: Total hot run time: 193084 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 8e89935b00168c3fe50c2ad890355e16b59edf3a, data reload: false

query1	1236	874	853	853
query2	6172	1984	1931	1931
query3	10646	4137	4038	4038
query4	58886	25113	23216	23216
query5	5245	505	498	498
query6	407	163	163	163
query7	5766	312	305	305
query8	294	213	215	213
query9	8986	2507	2491	2491
query10	482	279	266	266
query11	18091	14997	15194	14997
query12	179	106	115	106
query13	1590	387	396	387
query14	11187	7335	6890	6890
query15	229	178	176	176
query16	7559	487	486	486
query17	1136	604	657	604
query18	2041	314	316	314
query19	313	153	165	153
query20	127	121	116	116
query21	206	111	104	104
query22	4610	4660	4439	4439
query23	34449	33384	33253	33253
query24	5935	2876	2915	2876
query25	555	399	390	390
query26	697	165	163	163
query27	1780	297	287	287
query28	3821	2140	2127	2127
query29	713	435	440	435
query30	241	154	167	154
query31	950	771	791	771
query32	85	54	59	54
query33	497	291	292	291
query34	872	502	497	497
query35	859	733	719	719
query36	1052	939	961	939
query37	159	92	94	92
query38	4113	3828	3804	3804
query39	1451	1391	1388	1388
query40	201	121	116	116
query41	48	48	45	45
query42	118	97	100	97
query43	526	484	472	472
query44	1099	757	765	757
query45	200	169	168	168
query46	1095	735	777	735
query47	1888	1773	1850	1773
query48	370	319	305	305
query49	772	438	457	438
query50	827	439	433	433
query51	7141	7075	7076	7075
query52	97	87	88	87
query53	249	186	189	186
query54	572	463	477	463
query55	83	81	81	81
query56	297	266	271	266
query57	1218	1074	1081	1074
query58	224	237	247	237
query59	3065	2812	2857	2812
query60	295	276	278	276
query61	104	96	100	96
query62	766	655	648	648
query63	208	186	188	186
query64	2795	680	679	679
query65	3200	3118	3146	3118
query66	684	340	340	340
query67	15807	15282	15216	15216
query68	4412	584	568	568
query69	422	275	283	275
query70	1192	1088	1131	1088
query71	372	279	282	279
query72	6630	4056	4069	4056
query73	762	340	341	340
query74	9199	8811	8811	8811
query75	3379	2708	2699	2699
query76	1770	1025	1009	1009
query77	544	308	323	308
query78	10106	9217	9229	9217
query79	1043	560	533	533
query80	1674	547	499	499
query81	582	236	235	235
query82	404	152	144	144
query83	263	151	149	149
query84	256	78	75	75
query85	858	283	289	283
query86	314	287	276	276
query87	4420	4256	4193	4193
query88	3458	2344	2356	2344
query89	393	291	290	290
query90	1979	193	198	193
query91	123	102	103	102
query92	66	56	53	53
query93	1080	552	557	552
query94	788	302	306	302
query95	350	270	268	268
query96	594	273	272	272
query97	3195	3100	3044	3044
query98	214	204	199	199
query99	1879	1320	1263	1263
Total cold run time: 310287 ms
Total hot run time: 193084 ms

@doris-robot

ClickBench: Total hot run time: 32.3 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 8e89935b00168c3fe50c2ad890355e16b59edf3a, data reload: false

query1	0.04	0.04	0.04
query2	0.09	0.04	0.04
query3	0.23	0.06	0.05
query4	1.66	0.08	0.07
query5	0.51	0.49	0.49
query6	1.14	0.72	0.72
query7	0.02	0.01	0.01
query8	0.06	0.04	0.05
query9	0.54	0.49	0.48
query10	0.55	0.55	0.54
query11	0.15	0.12	0.11
query12	0.14	0.12	0.12
query13	0.61	0.59	0.58
query14	2.05	2.05	2.10
query15	0.83	0.81	0.82
query16	0.37	0.39	0.37
query17	1.04	1.06	0.98
query18	0.23	0.21	0.21
query19	1.82	1.79	1.79
query20	0.01	0.02	0.01
query21	15.45	0.67	0.68
query22	4.13	7.53	2.07
query23	18.28	1.39	1.33
query24	2.09	0.22	0.24
query25	0.15	0.09	0.09
query26	0.26	0.17	0.18
query27	0.08	0.08	0.08
query28	13.28	1.02	1.01
query29	12.66	3.33	3.33
query30	0.24	0.06	0.05
query31	2.88	0.41	0.39
query32	3.25	0.47	0.47
query33	2.99	2.93	3.03
query34	17.09	4.41	4.42
query35	4.43	4.48	4.50
query36	0.67	0.48	0.49
query37	0.20	0.17	0.16
query38	0.16	0.15	0.15
query39	0.04	0.04	0.03
query40	0.16	0.13	0.13
query41	0.10	0.05	0.05
query42	0.06	0.05	0.05
query43	0.04	0.04	0.04
Total cold run time: 110.78 s
Total hot run time: 32.3 s

Contributor

@gavinchou gavinchou left a comment

Please describe in this PR what the bug is and how it was fixed. Also, does the removed segment of code, together with changing the return type from void to string, involve a behavior change?

gavinchou
gavinchou previously approved these changes Sep 1, 2024
Contributor

github-actions bot commented Sep 1, 2024

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the approved (Indicates a PR has been approved by one committer.) and reviewed labels Sep 1, 2024
Contributor

github-actions bot commented Sep 1, 2024

PR approved by anyone and no changes requested.

dataroaring
dataroaring previously approved these changes Sep 5, 2024
Contributor

@dataroaring dataroaring left a comment

LGTM

@deardeng deardeng dismissed stale reviews from dataroaring and gavinchou via 73ea8d1 September 6, 2024 07:09
@deardeng
Contributor Author

deardeng commented Sep 6, 2024

run buildall

@github-actions github-actions bot removed the approved (Indicates a PR has been approved by one committer.) label Sep 6, 2024
@doris-robot

TPC-H: Total hot run time: 37905 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 73ea8d1bfc12a111ecd4dd24aa3e84608b88c265, data reload: false

------ Round 1 ----------------------------------
q1	18208	4564	4413	4413
q2	2046	184	191	184
q3	10441	1218	1041	1041
q4	10145	699	724	699
q5	7714	2865	2808	2808
q6	226	136	137	136
q7	949	616	611	611
q8	9315	2060	2057	2057
q9	7081	6579	6479	6479
q10	7004	2197	2183	2183
q11	470	243	253	243
q12	398	231	223	223
q13	18046	3070	3068	3068
q14	277	246	238	238
q15	544	485	488	485
q16	537	426	427	426
q17	981	708	714	708
q18	7488	6826	6826	6826
q19	1399	951	1012	951
q20	704	329	326	326
q21	4263	3105	2788	2788
q22	1098	1012	1014	1012
Total cold run time: 109334 ms
Total hot run time: 37905 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4359	4327	4289	4289
q2	385	276	260	260
q3	2887	2686	2657	2657
q4	1974	1648	1605	1605
q5	5400	5389	5445	5389
q6	218	132	128	128
q7	2105	1727	1717	1717
q8	3211	3308	3337	3308
q9	8406	8353	8428	8353
q10	3430	3209	3222	3209
q11	604	491	522	491
q12	785	590	593	590
q13	13254	3073	3059	3059
q14	303	287	282	282
q15	519	488	495	488
q16	537	486	479	479
q17	1766	1504	1478	1478
q18	7788	7571	7485	7485
q19	1676	1567	1503	1503
q20	2066	1819	1856	1819
q21	5486	5141	5210	5141
q22	1133	1041	1012	1012
Total cold run time: 68292 ms
Total hot run time: 54742 ms

@doris-robot

TPC-DS: Total hot run time: 186730 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 73ea8d1bfc12a111ecd4dd24aa3e84608b88c265, data reload: false

query1	918	380	377	377
query2	6469	2022	1971	1971
query3	6658	205	213	205
query4	33657	23092	23017	23017
query5	4162	509	498	498
query6	261	170	151	151
query7	4591	288	299	288
query8	281	222	211	211
query9	8637	2467	2469	2467
query10	454	267	261	261
query11	17784	14925	15043	14925
query12	168	98	100	98
query13	1630	375	357	357
query14	9232	6779	6757	6757
query15	264	166	171	166
query16	7975	440	445	440
query17	1555	589	552	552
query18	2092	291	275	275
query19	190	136	147	136
query20	123	114	111	111
query21	207	103	104	103
query22	4535	4126	4122	4122
query23	33898	33597	33467	33467
query24	11405	2895	2767	2767
query25	636	388	377	377
query26	1224	156	152	152
query27	2922	271	269	269
query28	7442	2045	2034	2034
query29	848	400	425	400
query30	306	166	155	155
query31	993	709	769	709
query32	98	61	54	54
query33	758	276	282	276
query34	1014	481	479	479
query35	878	727	727	727
query36	1085	913	898	898
query37	166	90	83	83
query38	3909	3913	3877	3877
query39	1455	1384	1397	1384
query40	202	114	114	114
query41	48	49	47	47
query42	116	96	96	96
query43	509	453	453	453
query44	1246	763	733	733
query45	197	166	170	166
query46	1108	712	725	712
query47	1857	1794	1768	1768
query48	363	287	297	287
query49	1115	450	449	449
query50	817	398	439	398
query51	7105	6890	6956	6890
query52	98	85	84	84
query53	249	185	185	185
query54	835	437	444	437
query55	76	75	84	75
query56	265	253	252	252
query57	1215	1085	1054	1054
query58	255	234	241	234
query59	3047	2742	2677	2677
query60	304	268	250	250
query61	99	102	99	99
query62	875	642	669	642
query63	219	185	182	182
query64	4984	662	652	652
query65	3226	3173	3110	3110
query66	1434	339	343	339
query67	15880	15552	15252	15252
query68	4102	579	552	552
query69	400	274	273	273
query70	1094	1029	1150	1029
query71	340	271	309	271
query72	6418	4111	4034	4034
query73	742	320	326	320
query74	9176	8945	8803	8803
query75	3446	2729	2703	2703
query76	2330	1019	922	922
query77	509	309	317	309
query78	10692	9969	9214	9214
query79	1062	547	538	538
query80	696	499	487	487
query81	484	233	230	230
query82	233	135	138	135
query83	173	149	150	149
query84	231	79	73	73
query85	734	290	274	274
query86	305	289	286	286
query87	4405	4287	4189	4189
query88	3167	2382	2261	2261
query89	380	281	292	281
query90	2007	189	192	189
query91	127	98	97	97
query92	64	48	51	48
query93	1041	545	536	536
query94	889	299	288	288
query95	356	258	253	253
query96	595	265	259	259
query97	3193	3031	3046	3031
query98	209	201	200	200
query99	1450	1263	1263	1263
Total cold run time: 289674 ms
Total hot run time: 186730 ms

@doris-robot

ClickBench: Total hot run time: 32.09 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 73ea8d1bfc12a111ecd4dd24aa3e84608b88c265, data reload: false

query1	0.04	0.04	0.04
query2	0.08	0.04	0.04
query3	0.22	0.05	0.06
query4	1.67	0.08	0.08
query5	0.51	0.50	0.50
query6	1.13	0.73	0.73
query7	0.02	0.02	0.01
query8	0.04	0.04	0.04
query9	0.54	0.49	0.49
query10	0.52	0.55	0.53
query11	0.16	0.11	0.11
query12	0.16	0.13	0.13
query13	0.60	0.59	0.59
query14	1.40	1.45	1.45
query15	0.83	0.81	0.81
query16	0.38	0.37	0.38
query17	0.98	1.01	0.96
query18	0.21	0.19	0.21
query19	1.89	1.80	1.78
query20	0.02	0.01	0.01
query21	15.41	0.68	0.67
query22	4.63	6.17	2.55
query23	18.30	1.37	1.36
query24	2.15	0.23	0.22
query25	0.15	0.10	0.07
query26	0.27	0.17	0.18
query27	0.08	0.08	0.08
query28	13.21	1.01	1.00
query29	12.60	3.34	3.30
query30	0.24	0.05	0.06
query31	2.86	0.39	0.40
query32	3.27	0.47	0.47
query33	3.01	3.04	3.01
query34	17.06	4.39	4.41
query35	4.49	4.45	4.39
query36	0.66	0.49	0.48
query37	0.18	0.16	0.15
query38	0.16	0.16	0.14
query39	0.04	0.04	0.04
query40	0.16	0.13	0.13
query41	0.10	0.04	0.05
query42	0.06	0.05	0.04
query43	0.04	0.04	0.04
Total cold run time: 110.53 s
Total hot run time: 32.09 s

Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the approved (Indicates a PR has been approved by one committer.) label Sep 10, 2024
Contributor

@dataroaring dataroaring left a comment

LGTM

@dataroaring dataroaring merged commit eb4673f into apache:master Sep 10, 2024
27 of 30 checks passed
gavinchou pushed a commit that referenced this pull request Sep 11, 2024
1. Fix a bug where a SELECT against a suspended cluster did not wake the
cluster up. When all BE nodes in the cluster are inactive, the cluster
is skipped, so the cluster that needs to be woken up is never found
and the wake-up logic is never reached. Also delete the redundant
function getAuthorizedCloudCluster.
2. Add a check after resuming a cluster: there must be at least one
alive BE in the cluster.
3. Add an auto start regression case.
Labels
approved (Indicates a PR has been approved by one committer.), dev/3.0.2-merged, reviewed
4 participants