[fix](mtmv) Fix result wrong when query rewrite by mv if query contains null_unsafe equals expression #39629

Merged

@seawinde seawinde commented Aug 20, 2024

Proposed changes

Fix the wrong result returned when a query is rewritten by a materialized view and the query contains a null_unsafe equals expression whose two sides are both slots.
The data in table orders is as follows:

    (null, 1, 'o', 9.5, '2023-12-08', 'a', 'b', 1, 'yy'),
    (1, null, 'o', 10.5, '2023-12-08', 'a', 'b', 1, 'yy'),
    (2, 1, null, 11.5, '2023-12-09', 'a', 'b', 1, 'yy'),
    (3, 1, 'o', null, '2023-12-10', 'a', 'b', 1, 'yy'),
    (3, 1, 'o', 33.5, null, 'a', 'b', 1, 'yy'),
    (4, 2, 'o', 43.2, '2023-12-11', null,'d',2, 'mm'),
    (5, 2, 'o', 56.2, '2023-12-12', 'c',null, 2, 'mi'),
    (5, 2, 'o', 1.2, '2023-12-12', 'c','d', null, 'mi');  

For example, the materialized view is defined as:

    select count(*), o_orderstatus, o_comment
    from orders
    group by o_orderstatus, o_comment;

The query is as follows:

    select count(*), o_orderstatus, o_comment
    from orders
    where o_orderstatus = o_orderstatus
    group by o_orderstatus, o_comment;

After the query is rewritten by the materialized view, the result is wrong, as shown below. The row containing NULL should not appear, because o_orderstatus = o_orderstatus evaluates to NULL rather than true when o_orderstatus is NULL:

+----------+---------------+-----------+
| count(*) | o_orderstatus | o_comment |
+----------+---------------+-----------+
|        1 | NULL          | yy        |
|        1 | o             | mm        |
|        2 | o             | mi        |
|        4 | o             | yy        |
+----------+---------------+-----------+

This PR fixes the issue.
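
To illustrate why the NULL row must be filtered out, here is a minimal Java sketch of SQL's three-valued comparison. It is not Doris code: the class `NullUnsafeEqualsDemo` and its methods are hypothetical names chosen for illustration, and the list holds only the o_orderstatus column (the third value) of the eight sample rows above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class NullUnsafeEqualsDemo {
    /** SQL-style "=": the result is null (UNKNOWN) if either operand is NULL. */
    static Boolean nullUnsafeEquals(String left, String right) {
        if (left == null || right == null) {
            return null;
        }
        return left.equals(right);
    }

    /** A WHERE clause keeps a row only when the predicate is TRUE, not UNKNOWN. */
    static List<String> filterSelfEquals(List<String> values) {
        return values.stream()
                .filter(v -> Boolean.TRUE.equals(nullUnsafeEquals(v, v)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // o_orderstatus of the eight sample rows, in insertion order.
        List<String> orderStatus = Arrays.asList("o", "o", null, "o", "o", "o", "o", "o");
        // Seven rows survive "where o_orderstatus = o_orderstatus".
        System.out.println(filterSelfEquals(orderStatus).size());
    }
}
```

The seven surviving rows match the three non-NULL groups in the result table (1 + 2 + 4), so the correct answer has no NULL group.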

@seawinde
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 38446 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 9ba2d25d95272b15c0070ae0a3593982756abf20, data reload: false

------ Round 1 ----------------------------------
q1	18194	5096	4318	4318
q2	2073	217	211	211
q3	11720	1164	1167	1164
q4	10573	759	765	759
q5	7802	2934	2860	2860
q6	265	161	164	161
q7	1033	663	657	657
q8	9637	2127	2155	2127
q9	8784	6631	6558	6558
q10	7073	2269	2236	2236
q11	484	293	273	273
q12	425	250	257	250
q13	18648	3005	3016	3005
q14	294	264	266	264
q15	548	516	511	511
q16	515	426	411	411
q17	1008	679	704	679
q18	7471	6685	6782	6685
q19	1423	1029	1107	1029
q20	686	348	371	348
q21	3915	2992	2909	2909
q22	1087	1041	1031	1031
Total cold run time: 113658 ms
Total hot run time: 38446 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4385	4327	4338	4327
q2	405	307	299	299
q3	2920	2642	2692	2642
q4	1921	1686	1752	1686
q5	5664	5684	5761	5684
q6	246	155	157	155
q7	2228	1852	1823	1823
q8	3295	3473	3510	3473
q9	8866	8808	8766	8766
q10	3543	3290	3348	3290
q11	657	530	545	530
q12	848	705	684	684
q13	16921	3181	3186	3181
q14	328	295	297	295
q15	550	527	511	511
q16	523	452	455	452
q17	1891	1656	1553	1553
q18	8163	7925	7731	7731
q19	10871	1560	1672	1560
q20	2174	1919	1880	1880
q21	5612	5379	5338	5338
q22	1153	1045	1049	1045
Total cold run time: 83164 ms
Total hot run time: 56905 ms

Map<Set<SlotReference>, Set<SlotReference>> equivalenceClassSetMap = new HashMap<>();
List<Set<SlotReference>> sourceSets = source.getEquivalenceSetList();
List<Set<SlotReference>> targetSets = target.getEquivalenceSetList();
Map<List<SlotReference>, List<SlotReference>> equivalenceClassSetMap = new HashMap<>();
Contributor

Why does changing the set to a list fix this problem?

Contributor Author

Take the expression o_orderstatus = o_orderstatus as an example: we should compensate o_orderstatus = o_orderstatus on the materialized view.
If we record the slot equality in a set, we only get {o_orderstatus}, so Predicates#compensateEquivalence cannot compensate the filter.
After changing the set to a list, we get [o_orderstatus, o_orderstatus], and Predicates#compensateEquivalence can then compensate the filter.
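
The difference between the two containers can be sketched in plain Java. This is an illustration under stated assumptions, not the actual Doris implementation: `EquivalenceCompensationDemo` and `compensate` are hypothetical names, slots are modeled as plain strings, and the pairing rule (equate the first member with each later one) is a simplification of whatever Predicates#compensateEquivalence really does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class EquivalenceCompensationDemo {
    /**
     * Hypothetical stand-in for the compensation step: for an equivalence
     * class [a, b, c] emit "a = b" and "a = c". A class with fewer than
     * two members yields no predicate at all.
     */
    static List<String> compensate(Iterable<String> equivalenceClass) {
        List<String> members = new ArrayList<>();
        equivalenceClass.forEach(members::add);
        List<String> predicates = new ArrayList<>();
        for (int i = 1; i < members.size(); i++) {
            predicates.add(members.get(0) + " = " + members.get(i));
        }
        return predicates;
    }

    public static void main(String[] args) {
        // Both sides of "o_orderstatus = o_orderstatus" are the same slot.
        List<String> asList = Arrays.asList("o_orderstatus", "o_orderstatus");
        // A set deduplicates the two sides, so only one member remains.
        Set<String> asSet = new LinkedHashSet<>(asList);

        System.out.println(compensate(asSet));  // prints []
        System.out.println(compensate(asList)); // prints [o_orderstatus = o_orderstatus]
    }
}
```

With the set, the self-equality disappears before compensation runs, so the rewritten plan has no filter and the NULL group leaks through. With the list, both sides survive and the filter can be re-applied on top of the materialized view.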

@seawinde seawinde force-pushed the fix_null_unsafe_equals_result_wrong branch from 9ba2d25 to 1a6fb3c Compare August 23, 2024 08:28
@seawinde
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 37305 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 1a6fb3cd23f036f1169f0c38235e42dc4bc88a7f, data reload: false

------ Round 1 ----------------------------------
q1	17608	4387	4252	4252
q2	2032	206	177	177
q3	11635	956	1105	956
q4	10534	761	678	678
q5	7765	2806	2799	2799
q6	219	135	134	134
q7	956	609	613	609
q8	9323	2053	2062	2053
q9	7184	6491	6495	6491
q10	6991	2234	2121	2121
q11	456	237	241	237
q12	390	222	226	222
q13	18141	3018	3015	3015
q14	274	227	240	227
q15	518	480	483	480
q16	507	403	392	392
q17	971	657	691	657
q18	7441	6828	6759	6759
q19	1386	1057	1086	1057
q20	655	331	322	322
q21	3812	3061	2683	2683
q22	1104	984	1001	984
Total cold run time: 109902 ms
Total hot run time: 37305 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4337	4271	4257	4257
q2	377	271	267	267
q3	2846	2659	2716	2659
q4	1906	1652	1676	1652
q5	5500	5694	5663	5663
q6	226	133	137	133
q7	2272	1823	1849	1823
q8	3310	3393	3447	3393
q9	8869	8835	8790	8790
q10	3563	3404	3368	3368
q11	600	490	507	490
q12	838	685	661	661
q13	17169	3173	3221	3173
q14	328	306	292	292
q15	553	481	489	481
q16	504	464	436	436
q17	1834	1521	1519	1519
q18	8081	7731	7983	7731
q19	1722	1509	1551	1509
q20	2176	1900	1863	1863
q21	5787	5458	5540	5458
q22	1146	1046	1032	1032
Total cold run time: 73944 ms
Total hot run time: 56650 ms

@doris-robot

TPC-DS: Total hot run time: 191331 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 1a6fb3cd23f036f1169f0c38235e42dc4bc88a7f, data reload: false

query1	1257	891	868	868
query2	6289	1911	1872	1872
query3	10600	3963	3905	3905
query4	59448	25457	23128	23128
query5	5426	501	506	501
query6	406	170	172	170
query7	5799	295	285	285
query8	308	218	215	215
query9	8833	2486	2483	2483
query10	493	297	273	273
query11	16871	15040	15235	15040
query12	164	112	116	112
query13	1633	393	379	379
query14	10826	7953	7224	7224
query15	258	170	178	170
query16	7206	500	479	479
query17	1120	585	572	572
query18	1275	297	288	288
query19	307	148	145	145
query20	118	115	116	115
query21	207	99	101	99
query22	4537	4286	4391	4286
query23	34327	33543	33416	33416
query24	5952	2869	2897	2869
query25	527	378	375	375
query26	694	151	152	151
query27	1802	268	279	268
query28	3798	2063	2042	2042
query29	649	414	435	414
query30	242	150	180	150
query31	935	776	732	732
query32	84	58	58	58
query33	446	289	285	285
query34	879	498	482	482
query35	820	723	751	723
query36	1106	904	952	904
query37	144	84	83	83
query38	4050	3866	3907	3866
query39	1423	1513	1405	1405
query40	202	118	114	114
query41	46	50	46	46
query42	115	105	100	100
query43	511	464	457	457
query44	1086	735	754	735
query45	194	162	162	162
query46	1084	753	744	744
query47	1863	1788	1769	1769
query48	361	284	293	284
query49	758	431	430	430
query50	813	416	415	415
query51	7135	7058	7001	7001
query52	96	84	88	84
query53	257	185	184	184
query54	561	446	446	446
query55	83	79	81	79
query56	292	271	266	266
query57	1191	1058	1053	1053
query58	221	234	241	234
query59	3029	2768	2652	2652
query60	315	289	286	286
query61	121	117	118	117
query62	728	638	655	638
query63	223	184	195	184
query64	4421	2378	1836	1836
query65	3196	3167	3140	3140
query66	678	350	354	350
query67	15357	15107	15137	15107
query68	3047	589	589	589
query69	402	278	288	278
query70	1220	1095	1110	1095
query71	362	285	288	285
query72	6365	2330	2062	2062
query73	766	321	329	321
query74	9189	8810	8765	8765
query75	3346	2620	2722	2620
query76	1438	1023	973	973
query77	514	325	315	315
query78	9762	9039	9080	9039
query79	1061	541	548	541
query80	772	555	508	508
query81	524	222	223	222
query82	236	140	137	137
query83	172	146	155	146
query84	258	73	74	73
query85	813	281	296	281
query86	314	295	294	294
query87	4394	4319	4236	4236
query88	2968	2300	2283	2283
query89	393	284	292	284
query90	1879	195	194	194
query91	119	97	98	97
query92	59	52	51	51
query93	1084	542	547	542
query94	767	292	293	292
query95	352	259	260	259
query96	586	276	270	270
query97	3175	3038	3094	3038
query98	221	207	205	205
query99	1555	1291	1331	1291
Total cold run time: 304309 ms
Total hot run time: 191331 ms

@doris-robot

ClickBench: Total hot run time: 30.15 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 1a6fb3cd23f036f1169f0c38235e42dc4bc88a7f, data reload: false

query1	0.04	0.05	0.04
query2	0.07	0.04	0.04
query3	0.22	0.05	0.05
query4	1.69	0.07	0.08
query5	0.52	0.50	0.51
query6	1.12	0.73	0.72
query7	0.02	0.01	0.02
query8	0.05	0.04	0.05
query9	0.54	0.48	0.48
query10	0.55	0.54	0.52
query11	0.16	0.11	0.11
query12	0.15	0.12	0.12
query13	0.61	0.59	0.58
query14	0.75	0.80	0.78
query15	0.84	0.81	0.82
query16	0.36	0.36	0.38
query17	0.96	1.02	0.99
query18	0.21	0.20	0.19
query19	1.84	1.73	1.68
query20	0.02	0.01	0.01
query21	15.39	0.66	0.66
query22	4.78	6.67	1.62
query23	18.29	1.32	1.26
query24	2.15	0.22	0.22
query25	0.15	0.09	0.08
query26	0.29	0.18	0.18
query27	0.08	0.08	0.07
query28	13.22	1.02	1.00
query29	12.62	3.26	3.29
query30	0.24	0.06	0.05
query31	2.86	0.42	0.38
query32	3.27	0.48	0.48
query33	3.00	2.99	2.96
query34	16.98	4.36	4.45
query35	4.49	4.41	4.49
query36	0.66	0.46	0.49
query37	0.19	0.16	0.16
query38	0.15	0.15	0.15
query39	0.04	0.04	0.04
query40	0.16	0.13	0.12
query41	0.09	0.05	0.05
query42	0.06	0.05	0.05
query43	0.04	0.04	0.03
Total cold run time: 109.92 s
Total hot run time: 30.15 s

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Aug 26, 2024

PR approved by at least one committer and no changes requested.


PR approved by anyone and no changes requested.

@morrySnow morrySnow merged commit 5d4ad02 into apache:master Aug 28, 2024
28 of 30 checks passed
seawinde added a commit to seawinde/doris that referenced this pull request Aug 28, 2024
…ns null_unsafe equals expression (apache#39629)

yiguolei pushed a commit that referenced this pull request Aug 28, 2024
…ns null_unsafe equals expression (#39629) (#40041)

## Proposed changes

commitId: 5d4ad02
pr: #39629
dataroaring pushed a commit that referenced this pull request Sep 3, 2024
…ns null_unsafe equals expression (#39629)

@yiguolei yiguolei mentioned this pull request Sep 5, 2024
3 tasks
Labels
approved Indicates a PR has been approved by one committer. dev/2.1.6-merged dev/3.0.2-merged p0_b reviewed
6 participants