[opt](index compaction)Avoid get file size when create index reader and remove unnecessary file exists #41079

Merged
merged 1 commit into from
Sep 22, 2024

qidaye
Contributor

@qidaye qidaye commented Sep 20, 2024

Proposed changes

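Getting a file's size and checking whether a file exists are both very expensive operations on object storage.
Index compaction can involve many small files, so the HEAD requests behind these operations consume a lot of time. This change avoids fetching the file size when creating the index reader and removes unnecessary file-existence checks.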
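
As a rough illustration of the idea (not the actual Doris change), the sketch below uses a hypothetical in-memory `FakeObjectStore` that counts HEAD requests, and a hypothetical `open_index_reader` that accepts a size the caller already knows, for example one recorded when the file was written. Every name here is an assumption made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class FakeObjectStore:
    """Hypothetical in-memory stand-in for an object store that counts HEAD requests."""
    objects: Dict[str, bytes] = field(default_factory=dict)
    head_calls: int = 0

    def head(self, key: str) -> int:
        # Models one metadata round trip; raises KeyError if the object is missing.
        self.head_calls += 1
        return len(self.objects[key])

    def get_range(self, key: str, offset: int, length: int) -> bytes:
        # Models a ranged read; also raises KeyError if the object is missing.
        return self.objects[key][offset:offset + length]


def open_index_reader(store: FakeObjectStore, key: str,
                      known_size: Optional[int] = None) -> Tuple[str, int]:
    """Return (key, size), issuing a HEAD request only when the size is unknown."""
    size = known_size if known_size is not None else store.head(key)
    return key, size


def read_tail(store: FakeObjectStore, reader: Tuple[str, int], n: int) -> bytes:
    """Read the last n bytes of the file using the size held by the reader."""
    key, size = reader
    return store.get_range(key, max(size - n, 0), n)


if __name__ == "__main__":
    files = {f"seg_{i}.idx": b"x" * (10 + i) for i in range(3)}
    sizes = {k: len(v) for k, v in files.items()}  # sizes recorded at write time

    naive = FakeObjectStore(dict(files))
    for k in files:
        open_index_reader(naive, k)  # one HEAD per file

    optimized = FakeObjectStore(dict(files))
    for k in files:
        open_index_reader(optimized, k, known_size=sizes[k])  # no HEAD

    print("HEAD requests:", naive.head_calls, "vs", optimized.head_calls)
```

In this sketch, passing the size in means opening a reader costs no metadata request, and a missing object surfaces as an error on the first read rather than from a separate existence check.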

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the documentation has been moved to doris-website.
See the Doris documentation.

@qidaye
Contributor Author

qidaye commented Sep 20, 2024

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 37.25% (9598/25769)
Line Coverage: 28.62% (79292/277042)
Region Coverage: 28.09% (41046/146119)
Branch Coverage: 24.71% (20905/84602)
Coverage Report: http://coverage.selectdb-in.cc/coverage/3f8bae199191bdc43b8e2611ddb0725d0623ce75_3f8bae199191bdc43b8e2611ddb0725d0623ce75/report/index.html

@doris-robot

TPC-H: Total hot run time: 41249 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 3f8bae199191bdc43b8e2611ddb0725d0623ce75, data reload: false

------ Round 1 ----------------------------------
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	17565	7274	7262	7262
q2	2050	167	156	156
q3	10718	1092	1184	1092
q4	10560	743	649	649
q5	7766	3050	3070	3050
q6	234	146	146	146
q7	1001	620	605	605
q8	9445	2030	2009	2009
q9	6844	6407	6425	6407
q10	7040	2271	2292	2271
q11	446	245	249	245
q12	399	208	211	208
q13	17820	2962	2922	2922
q14	242	221	220	220
q15	567	525	520	520
q16	686	616	616	616
q17	975	828	773	773
q18	7129	6702	6707	6702
q19	1402	1090	948	948
q20	565	284	277	277
q21	4099	3191	3163	3163
q22	1115	1008	1009	1008
Total cold run time: 108668 ms
Total hot run time: 41249 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	7212	7251	7368	7251
q2	326	226	225	225
q3	2980	2918	2981	2918
q4	2017	1858	1869	1858
q5	5556	5574	5556	5556
q6	226	150	148	148
q7	2207	1791	1775	1775
q8	3272	3392	3413	3392
q9	8732	8905	8740	8740
q10	3565	3419	3469	3419
q11	570	476	470	470
q12	809	616	599	599
q13	11675	3132	3109	3109
q14	293	277	278	277
q15	568	516	556	516
q16	732	688	685	685
q17	1790	1582	1560	1560
q18	8110	7729	7807	7729
q19	1715	1579	1695	1579
q20	2104	1877	1875	1875
q21	5693	5248	5375	5248
q22	1151	1062	1018	1018
Total cold run time: 71303 ms
Total hot run time: 59947 ms

@doris-robot

TPC-DS: Total hot run time: 195384 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 3f8bae199191bdc43b8e2611ddb0725d0623ce75, data reload: false

query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
query1	1282	866	898	866
query2	6381	2181	2049	2049
query3	10765	4008	3902	3902
query4	63569	26823	23448	23448
query5	5191	465	480	465
query6	414	169	158	158
query7	5460	314	298	298
query8	307	229	228	228
query9	8930	2639	2600	2600
query10	497	311	280	280
query11	18009	15244	15819	15244
query12	169	98	103	98
query13	1529	429	410	410
query14	10902	7491	7521	7491
query15	221	175	183	175
query16	7000	479	499	479
query17	1167	651	593	593
query18	1724	302	314	302
query19	221	151	150	150
query20	123	114	121	114
query21	211	103	105	103
query22	4938	4494	4921	4494
query23	34804	33818	34198	33818
query24	5982	2848	2877	2848
query25	517	412	409	409
query26	640	156	160	156
query27	1604	284	286	284
query28	3912	2463	2440	2440
query29	705	425	416	416
query30	230	164	149	149
query31	991	780	769	769
query32	71	53	53	53
query33	447	290	291	290
query34	905	488	493	488
query35	845	729	714	714
query36	1061	948	922	922
query37	149	88	90	88
query38	4053	3925	3893	3893
query39	1478	1433	1449	1433
query40	208	96	99	96
query41	50	48	48	48
query42	119	99	96	96
query43	514	492	495	492
query44	1163	807	799	799
query45	200	167	169	167
query46	1133	743	746	743
query47	1921	1818	1863	1818
query48	459	375	355	355
query49	708	424	406	406
query50	836	431	385	385
query51	7024	6889	6944	6889
query52	99	85	83	83
query53	250	172	174	172
query54	578	444	441	441
query55	73	69	75	69
query56	271	258	230	230
query57	1241	1102	1092	1092
query58	226	235	246	235
query59	3462	3009	2952	2952
query60	298	257	264	257
query61	102	106	109	106
query62	770	681	643	643
query63	209	188	179	179
query64	1356	629	630	629
query65	3253	3173	3163	3163
query66	672	298	307	298
query67	16059	15681	15573	15573
query68	4630	571	572	571
query69	540	297	298	297
query70	1152	1079	1112	1079
query71	436	267	274	267
query72	6858	4077	3969	3969
query73	790	321	323	321
query74	10426	9013	9016	9013
query75	3456	2612	2668	2612
query76	2453	977	963	963
query77	702	292	289	289
query78	10087	9189	9221	9189
query79	1611	552	540	540
query80	1169	439	423	423
query81	576	240	243	240
query82	1305	142	140	140
query83	369	130	139	130
query84	294	77	75	75
query85	1045	300	278	278
query86	407	301	246	246
query87	4517	4336	4382	4336
query88	3286	2303	2307	2303
query89	401	281	286	281
query90	1949	186	185	185
query91	188	145	137	137
query92	65	48	51	48
query93	2161	525	518	518
query94	817	298	304	298
query95	352	254	243	243
query96	621	275	273	273
query97	3234	3149	3121	3121
query98	210	200	197	197
query99	1539	1306	1269	1269
Total cold run time: 318304 ms
Total hot run time: 195384 ms

@doris-robot

ClickBench: Total hot run time: 32.31 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 3f8bae199191bdc43b8e2611ddb0725d0623ce75, data reload: false

query	cold run (s)	hot run 1 (s)	hot run 2 (s)
query1	0.04	0.04	0.04
query2	0.06	0.02	0.02
query3	0.22	0.06	0.06
query4	1.66	0.10	0.10
query5	0.53	0.53	0.50
query6	1.14	0.73	0.73
query7	0.02	0.01	0.01
query8	0.04	0.03	0.04
query9	0.56	0.50	0.48
query10	0.58	0.56	0.55
query11	0.14	0.11	0.11
query12	0.13	0.11	0.11
query13	0.60	0.59	0.59
query14	3.03	3.08	2.96
query15	0.89	0.81	0.82
query16	0.40	0.39	0.37
query17	1.05	1.00	1.02
query18	0.20	0.18	0.20
query19	1.98	1.88	1.95
query20	0.01	0.00	0.00
query21	15.36	0.61	0.59
query22	2.68	2.12	1.57
query23	17.47	0.82	0.79
query24	2.42	1.07	1.67
query25	0.22	0.13	0.04
query26	0.48	0.14	0.12
query27	0.04	0.03	0.04
query28	10.36	1.10	1.06
query29	12.51	3.15	3.18
query30	0.25	0.06	0.05
query31	2.89	0.40	0.38
query32	3.29	0.47	0.46
query33	2.97	2.96	3.00
query34	16.78	4.34	4.38
query35	4.47	4.42	4.39
query36	0.68	0.48	0.49
query37	0.08	0.06	0.06
query38	0.04	0.04	0.03
query39	0.03	0.02	0.02
query40	0.17	0.13	0.13
query41	0.08	0.02	0.02
query42	0.03	0.02	0.01
query43	0.03	0.03	0.03
Total cold run time: 106.61 s
Total hot run time: 32.31 s

Contributor

@xiaokang xiaokang left a comment

LGTM

Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added approved Indicates a PR has been approved by one committer. reviewed labels Sep 21, 2024
Contributor

PR approved by anyone and no changes requested.

Contributor

@dataroaring dataroaring left a comment

LGTM

@xiaokang xiaokang merged commit 63f957d into apache:master Sep 22, 2024
23 of 28 checks passed
dataroaring pushed a commit that referenced this pull request Sep 23, 2024
…nd remove unnecessary file exists (#41079)

Getting a file's size and checking whether a file exists are both very expensive operations on object storage.
Index compaction can involve many small files, so the HEAD requests behind
these operations consume a lot of time.
@qidaye qidaye deleted the opt_index_compaction branch September 23, 2024 02:29
Labels
approved Indicates a PR has been approved by one committer. dev/3.0.2-merged reviewed