[rowexec] custom rowexec #8072

max-hoffman · 2024-06-25T21:45:04Z

This PR adds custom Dolt execution operators for lookup joins. When building an execution plan, we try to replace joinIter with a Dolt equivalent that inlines the key building and map get. This is much faster than repeatedly building the secondary iterator and materializing sql.Rows between lookups.
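As a rough illustration of the idea (not the code in this PR), the sketch below builds the lookup key inline and probes a map directly for each source row, instead of constructing a fresh secondary iterator per row. The `kv`, `index`, and `lookupJoin` names are hypothetical stand-ins; the real operators work on val.Tuple pairs and prolly maps.

```go
package main

import "fmt"

// kv is a hypothetical stand-in for a key/value tuple pair.
type kv struct {
	key, val string
}

// index is a hypothetical secondary map from join key to matching entries.
type index map[string][]kv

// lookupJoin walks the source rows, builds each lookup key inline, and
// probes the index directly. No per-row iterator or intermediate row
// is created between lookups.
func lookupJoin(src []kv, idx index, keyOf func(kv) string) [][2]kv {
	var out [][2]kv
	for _, l := range src {
		k := keyOf(l)              // inline key building
		for _, r := range idx[k] { // inline map get
			out = append(out, [2]kv{l, r})
		}
	}
	return out
}

func main() {
	src := []kv{{"1", "a"}, {"2", "b"}, {"3", "c"}}
	idx := index{
		"a": {{"10", "x"}},
		"b": {{"20", "y"}, {"21", "z"}},
	}
	// Join on the value column of the source rows.
	joined := lookupJoin(src, idx, func(r kv) string { return r.val })
	for _, p := range joined {
		fmt.Println(p[0].key, p[1].key)
	}
}
```

Source row "3" has no match in the index, so it produces no output pair.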

The main downside is that this PR hoists filters in join children so that they only run after the lookup join rows are materialized.

This brings index_join from 5.18 ms/query to 2.64 ms/query, which puts it at about 2.0x MySQL's latency.

This PR falls short of some aspirational goals:

We hoist table filters until after the final join row is built because we don't have a way to call scalar expressions on val.Tuple yet. There are edge case queries that might be dramatically slower because of this. To fix this, we would need to convert sql.Expression filters into a format that we could execute on val.Tuple KV pairs.
We do not yet try to optimize consecutive lookup joins. I'm not sure if a materialization block would be better represented iteratively or recursively beyond a simple string of lookups. There are a lot of interfaces and indexing considerations to think about there.
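To make the cost of filter hoisting concrete, here is a minimal sketch under simplified assumptions (an inner join, a filter on the source side only, and hypothetical `row`, `joinThenFilter`, and `filterThenJoin` names). Both plans return the same rows, but the hoisted one performs a lookup for every source row before the filter can discard anything.

```go
package main

import "fmt"

// row is a simplified source row: an id plus the key used for the lookup.
type row struct {
	id, fk int
}

// joinThenFilter mirrors the hoisted plan: every source row is looked up,
// and the child filter only runs once the joined row exists.
func joinThenFilter(src []row, idx map[int][]string, keep func(row) bool) (out []string, lookups int) {
	for _, l := range src {
		lookups++
		for _, r := range idx[l.fk] {
			if keep(l) {
				out = append(out, fmt.Sprintf("%d:%s", l.id, r))
			}
		}
	}
	return out, lookups
}

// filterThenJoin mirrors the plan without hoisting: rejected source rows
// never reach the lookup.
func filterThenJoin(src []row, idx map[int][]string, keep func(row) bool) (out []string, lookups int) {
	for _, l := range src {
		if !keep(l) {
			continue
		}
		lookups++
		for _, r := range idx[l.fk] {
			out = append(out, fmt.Sprintf("%d:%s", l.id, r))
		}
	}
	return out, lookups
}

func main() {
	src := []row{{1, 10}, {2, 20}, {3, 10}, {4, 20}}
	idx := map[int][]string{10: {"x"}, 20: {"y"}}
	keep := func(r row) bool { return r.id > 2 }

	a, n1 := joinThenFilter(src, idx, keep)
	b, n2 := filterThenJoin(src, idx, keep)
	fmt.Println(a, n1) // hoisted: same rows, more lookups
	fmt.Println(b, n2)
}
```

With a very selective filter the gap in lookups grows, which is the kind of edge case the first item above is describing.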

Safety comments:

We fall back to GMS when lookup source/dest keys are not prolly.Encoding compatible.
The source iterators are the same as what we used before, but without projection mapping to sql.Rows. The keyless iterator required a change to return duplicate rows at the KV layer (vs the sql layer).
The secondary iterators are a generalization of what we currently use, but return KV pairs instead of rows.
Projection mapping is the same but generalized to merge an arbitrary list of KV pairs after the join.
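Two of the points above can be sketched in a few lines, with the caveat that every type here is a hypothetical stand-in rather than Dolt's actual API: `encoding` loosely plays the role of a per-field encoding tag, `compatible` is the kind of check that would gate the fallback, and `project` merges an arbitrary list of tuples into one output row after the join.

```go
package main

import "fmt"

// encoding is a hypothetical stand-in for a per-field storage encoding tag.
type encoding int

const (
	encInt64 encoding = iota
	encString
)

// compatible reports whether the source key fields line up field-for-field
// with the destination key fields. When it returns false, a caller would
// fall back to the generic join path.
func compatible(src, dst []encoding) bool {
	if len(src) != len(dst) {
		return false
	}
	for i := range src {
		if src[i] != dst[i] {
			return false
		}
	}
	return true
}

// tuple is a hypothetical stand-in for a key or value tuple.
type tuple []any

// mapping names the tuple and the field within it that feed one output column.
type mapping struct {
	src, field int
}

// project merges an arbitrary list of tuples into a single output row.
func project(tuples []tuple, m []mapping) []any {
	out := make([]any, len(m))
	for i, p := range m {
		out[i] = tuples[p.src][p.field]
	}
	return out
}

func main() {
	fmt.Println(compatible([]encoding{encInt64}, []encoding{encInt64}))
	fmt.Println(compatible([]encoding{encInt64}, []encoding{encString}))

	// Left key, left value, right key, right value.
	tuples := []tuple{{1}, {"a"}, {10}, {"x"}}
	m := []mapping{{0, 0}, {1, 0}, {2, 0}, {3, 0}}
	fmt.Println(project(tuples, m))
}
```

Because `project` takes a slice of tuples rather than a fixed left/right pair, the same routine covers a join of any width.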

There are extra tests here: dolthub/go-mysql-server#2593

max-hoffman · 2024-06-26T15:53:06Z

#benchmark

github-actions · 2024-06-26T15:53:29Z

@max-hoffman workflow run: https://github.com/dolthub/dolt/actions/runs/9682864996

coffeegoddd · 2024-06-26T16:29:34Z

@max-hoffman DOLT

test_name	from_latency_p95	to_latency_p95	is_faster
tpcc-scale-factor-1	73.13	89.16	0

test_name	server_name	server_version	tps	test_name	server_name	server_version	tps	is_faster
tpcc-scale-factor-1	dolt	`f7abadf`	32.99	tpcc-scale-factor-1	dolt	`c531bce`	13.52	1

coffeegoddd · 2024-06-26T17:19:50Z

@max-hoffman DOLT

read_tests	from_latency_median	to_latency_median	is_faster
covering_index_scan	2.76	2.81	0
groupby_scan	17.32	17.32	0
index_join	5.28	2.61	1
index_join_scan	2.57	2.57	0
index_scan	53.85	54.83	0
oltp_point_select	0.46	0.46	0
oltp_read_only	7.56	7.7	0
select_random_points	0.77	0.77	0
select_random_ranges	0.92	0.92	0
table_scan	54.83	55.82	0
types_table_scan	142.39	142.39	0

write_tests	from_latency_median	to_latency_median
oltp_delete_insert	6.09	6.09
oltp_insert	3.02	3.02
oltp_read_write	14.21	14.21
oltp_update_index	3.13	3.07
oltp_update_non_index	3.02	3.02
oltp_write_only	6.43	6.43
types_delete_insert	6.67	6.67

github-actions · 2024-06-26T19:00:20Z

Additional work is required for integration with DoltgreSQL.

max-hoffman · 2024-06-26T23:05:11Z

#benchmark

github-actions · 2024-06-26T23:05:29Z

@max-hoffman workflow run: https://github.com/dolthub/dolt/actions/runs/9687941574

coffeegoddd · 2024-06-26T23:41:15Z

@max-hoffman DOLT

test_name	from_latency_p95	to_latency_p95	is_faster
tpcc-scale-factor-1	74.46	78.6	0

test_name	server_name	server_version	tps	test_name	server_name	server_version	tps	is_faster
tpcc-scale-factor-1	dolt	`0c4b3f9`	32.94	tpcc-scale-factor-1	dolt	`ef8cc8b`	32.89	0

coffeegoddd · 2024-06-27T00:31:29Z

@max-hoffman DOLT

read_tests	from_latency_median	to_latency_median	is_faster
covering_index_scan	2.86	2.86	0
groupby_scan	17.01	17.32	0
index_join	5.28	2.66	1
index_join_scan	2.52	2.57	0
index_scan	53.85	53.85	0
oltp_point_select	0.44	0.46	0
oltp_read_only	7.43	7.56	0
select_random_points	0.73	0.74	0
select_random_ranges	0.87	0.89	0
table_scan	54.83	54.83	0
types_table_scan	139.85	142.39	0

write_tests	from_latency_median	to_latency_median
oltp_delete_insert	6.09	6.09
oltp_insert	2.97	3.02
oltp_read_write	13.7	13.95
oltp_update_index	3.07	3.07
oltp_update_non_index	3.02	3.02
oltp_write_only	6.32	6.43
types_delete_insert	6.55	6.67

coffeegoddd · 2024-07-10T15:46:08Z

@max-hoffman DOLT

comparing_percentages
100.000000 to 100.000000

version	result	total
`c50e5b8`	ok	5937457

version	total_tests
`c50e5b8`	5937457

correctness_percentage
100.0

coffeegoddd · 2024-07-10T15:56:29Z

@coffeegoddd DOLT

comparing_percentages
100.000000 to 100.000000

version	result	total
`921e758`	ok	5937457

version	total_tests
`921e758`	5937457

correctness_percentage
100.0

max-hoffman · 2024-07-10T19:35:10Z

#benchmark

github-actions · 2024-07-10T19:35:35Z

@max-hoffman workflow run: https://github.com/dolthub/dolt/actions/runs/9880357397

coffeegoddd · 2024-07-10T20:03:42Z

@max-hoffman DOLT

comparing_percentages
100.000000 to 100.000000

version	result	total
`ef3a0cf`	ok	5937457

version	total_tests
`ef3a0cf`	5937457

correctness_percentage
100.0

coffeegoddd · 2024-07-12T17:54:11Z

@max-hoffman DOLT

comparing_percentages
100.000000 to 100.000000

version	result	total
`f5ab6be`	ok	5937457

version	total_tests
`f5ab6be`	5937457

correctness_percentage
100.0

coffeegoddd · 2024-07-12T18:02:41Z

@coffeegoddd DOLT

comparing_percentages
100.000000 to 100.000000

version	result	total
`327ce5e`	ok	5937457

version	total_tests
`327ce5e`	5937457

correctness_percentage
100.0

max-hoffman · 2024-07-12T18:03:28Z

#benchmark

github-actions · 2024-07-12T18:03:55Z

@max-hoffman workflow run: https://github.com/dolthub/dolt/actions/runs/9912163940

coffeegoddd · 2024-07-12T18:39:41Z

@max-hoffman DOLT

test_name	from_latency_p95	to_latency_p95	is_faster
tpcc-scale-factor-1	74.46	77.19	0

test_name	server_name	server_version	tps	test_name	server_name	server_version	tps	is_faster
tpcc-scale-factor-1	dolt	`3745baa`	32.69	tpcc-scale-factor-1	dolt	`327ce5e`	32.51	0

coffeegoddd · 2024-07-12T19:29:57Z

@max-hoffman DOLT

read_tests	from_latency_median	to_latency_median
covering_index_scan	3.02	3.02
groupby_scan	17.32	17.32
index_join	5.28	5.28
index_join_scan	2.61	2.61
index_scan	55.82	55.82
oltp_point_select	0.46	0.46
oltp_read_only	7.84	7.84
select_random_points	0.75	0.75
select_random_ranges	0.9	0.9
table_scan	56.84	57.87
types_table_scan	147.61	147.61

write_tests	from_latency_median	to_latency_median
oltp_delete_insert	6.09	6.09
oltp_insert	3.02	3.02
oltp_read_write	14.21	14.21
oltp_update_index	3.13	3.13
oltp_update_non_index	3.02	3.02
oltp_write_only	6.43	6.43
types_delete_insert	6.67	6.67

max-hoffman · 2024-07-12T19:46:28Z

Trying to figure out how to add a unit test that makes sure we apply the optimization when we expect to. It is surprisingly easy to accidentally disable.
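One possible shape for such a check, offered only as a sketch: assert on the concrete type of the iterator the builder returns. Everything below (`rowIter`, `genericJoinIter`, `kvLookupJoinIter`, `buildJoinIter`, `usesKvJoin`) is hypothetical and does not correspond to names in this PR.

```go
package main

import "fmt"

// rowIter is a hypothetical minimal iterator interface.
type rowIter interface {
	Next() (string, bool)
}

// genericJoinIter stands in for the fallback join iterator.
type genericJoinIter struct{}

func (genericJoinIter) Next() (string, bool) { return "", false }

// kvLookupJoinIter stands in for the specialized KV-level iterator.
type kvLookupJoinIter struct{}

func (kvLookupJoinIter) Next() (string, bool) { return "", false }

// buildJoinIter is a hypothetical builder hook: it only returns the
// specialized iterator when the keys are compatible.
func buildJoinIter(keysCompatible bool) rowIter {
	if keysCompatible {
		return kvLookupJoinIter{}
	}
	return genericJoinIter{}
}

// usesKvJoin reports whether the builder picked the specialized iterator,
// which is the property a regression test would assert on.
func usesKvJoin(it rowIter) bool {
	_, ok := it.(kvLookupJoinIter)
	return ok
}

func main() {
	fmt.Println(usesKvJoin(buildJoinIter(true)))
	fmt.Println(usesKvJoin(buildJoinIter(false)))
}
```

A test built this way fails loudly if a refactor silently routes an eligible plan back to the generic iterator.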

coffeegoddd · 2024-07-12T20:09:14Z

@max-hoffman DOLT

comparing_percentages
100.000000 to 100.000000

version	result	total
`ba424ee`	ok	5937457

version	total_tests
`ba424ee`	5937457

correctness_percentage
100.0

max-hoffman · 2024-07-12T20:57:41Z

#benchmark

github-actions · 2024-07-12T20:58:06Z

@max-hoffman workflow run: https://github.com/dolthub/dolt/actions/runs/9914037567

coffeegoddd · 2024-07-12T21:33:50Z

@max-hoffman DOLT

test_name	from_latency_p95	to_latency_p95	is_faster
tpcc-scale-factor-1	75.82	75.82	0

test_name	server_name	server_version	tps	test_name	server_name	server_version	tps	is_faster
tpcc-scale-factor-1	dolt	`3745baa`	32.33	tpcc-scale-factor-1	dolt	`ba424ee`	32.28	0

coffeegoddd · 2024-07-12T22:24:03Z

@max-hoffman DOLT

read_tests	from_latency_median	to_latency_median	is_faster
covering_index_scan	3.02	3.02	0
groupby_scan	17.32	17.32	0
index_join	5.28	2.71	1
index_join_scan	2.61	2.61	0
index_scan	54.83	54.83	0
oltp_point_select	0.46	0.46	0
oltp_read_only	7.7	7.7	0
select_random_points	0.75	0.75	0
select_random_ranges	0.9	0.9	0
table_scan	56.84	56.84	0
types_table_scan	144.97	144.97	0

write_tests	from_latency_median	to_latency_median
oltp_delete_insert	5.99	5.99
oltp_insert	2.97	3.02
oltp_read_write	13.95	13.95
oltp_update_index	3.07	3.07
oltp_update_non_index	2.97	3.02
oltp_write_only	6.32	6.43
types_delete_insert	6.55	6.67

coffeegoddd · 2024-07-15T00:10:00Z

@max-hoffman DOLT

comparing_percentages
100.000000 to 100.000000

version	result	total
`8a59480`	ok	5937457

version	total_tests
`8a59480`	5937457

correctness_percentage
100.0

coffeegoddd · 2024-07-15T00:17:55Z

@coffeegoddd DOLT

comparing_percentages
100.000000 to 100.000000

version	result	total
`bf6fdbb`	ok	5937457

version	total_tests
`bf6fdbb`	5937457

correctness_percentage
100.0

coffeegoddd · 2024-07-15T17:34:30Z

@max-hoffman DOLT

comparing_percentages
100.000000 to 100.000000

version	result	total
`425b421`	ok	5937457

version	total_tests
`425b421`	5937457

correctness_percentage
100.0

coffeegoddd · 2024-07-15T17:41:52Z

@coffeegoddd DOLT

comparing_percentages
100.000000 to 100.000000

version	result	total
`cc67078`	ok	5937457

version	total_tests
`cc67078`	5937457

correctness_percentage
100.0

[rowexec] custom rowexec

c531bce

merge main

f34c015

max-hoffman added 2 commits June 26, 2024 12:06

fix build

9ef0aee

testing progress

ef8cc8b

max-hoffman and others added 6 commits June 27, 2024 15:15

save progress

3e7979c

fix more tests

e762320

skip optimization for system table indexes

9e31daf

merge main

7d320a9

fix headers

c50e5b8

[ga-format-pr] Run go/utils/repofmt/format_repo.sh and go/Godeps/upda…

921e758

…te.sh

coffeegoddd added the correctness_approved label Jul 10, 2024

max-hoffman added 5 commits July 10, 2024 10:27

progressive refactors

e64bcc5

more refactoring

db176c7

del comment

bc2fbd9

Merge branch 'max/dolt-row-exec' of github.com:dolthub/dolt into max/…

c60837a

…dolt-row-exec

merge main

ef3a0cf

max-hoffman and others added 5 commits July 12, 2024 13:04

merge

b6af5be

reformat

9ae4af6

delete unused file

a438e54

undo int->string conversion that isn't always valid

f5ab6be

[ga-format-pr] Run go/utils/repofmt/format_repo.sh and go/Godeps/upda…

327ce5e

…te.sh

max-hoffman mentioned this pull request Jul 12, 2024

custom row exec dolthub/go-mysql-server#2593

Open

max-hoffman added 2 commits July 12, 2024 15:19

formatting

0c301eb

Merge branch 'max/dolt-row-exec' of github.com:dolthub/dolt into max/…

f83b4c1

…dolt-row-exec

sometimes dstSchema nil, which is not used

ba424ee

max-hoffman and others added 2 commits July 14, 2024 18:37

add tests

8a59480

[ga-format-pr] Run go/utils/repofmt/format_repo.sh and go/Godeps/upda…

bf6fdbb

…te.sh

max-hoffman and others added 3 commits July 15, 2024 10:00

more tests and cleanup

85ff8b4

Merge branch 'max/dolt-row-exec' of github.com:dolthub/dolt into max/…

425b421

…dolt-row-exec

[ga-format-pr] Run go/utils/repofmt/format_repo.sh and go/Godeps/upda…

cc67078

…te.sh
