Faster cpu indexing #1450

Merged
awni merged 1 commit into main from cpu_indexing on Oct 3, 2024

awni (Member) commented on Oct 3, 2024

Picking some low-hanging fruit.

This uses the ContiguousIterator in the CPU scatter and gather kernels for a performance gain. It also makes small updates to the iterator to make it easier to use.
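To illustrate why an iterator helps with strided (for example, transposed) inputs, here is a minimal pure-Python sketch. It assumes the iterator's benefit comes from updating a running memory offset incrementally, odometer-style, instead of recomputing each element's offset from scratch with one divide per dimension. `elem_to_loc`, `StridedIterator`, `offsets_naive`, and `offsets_incremental` are hypothetical illustrations; MLX's actual `ContiguousIterator` is C++ and its interface may differ.

```python
# Hypothetical sketch of the idea; NOT MLX's C++ ContiguousIterator.


def elem_to_loc(index, shape, strides):
    """Naive: map a row-major logical index to a memory offset.
    Costs one divmod per dimension for every element."""
    loc = 0
    for dim in reversed(range(len(shape))):
        index, pos = divmod(index, shape[dim])
        loc += pos * strides[dim]
    return loc


class StridedIterator:
    """Hypothetical incremental iterator: keeps a per-dimension position
    like an odometer, so most steps are a single addition."""

    def __init__(self, shape, strides):
        self.shape = shape
        self.strides = strides
        self.pos = [0] * len(shape)
        self.loc = 0

    def step(self):
        # Advance the innermost dimension, carrying outward on wrap-around.
        for dim in reversed(range(len(self.shape))):
            if self.pos[dim] + 1 < self.shape[dim]:
                self.pos[dim] += 1
                self.loc += self.strides[dim]
                return
            # This dimension wraps: remove its contribution and carry.
            self.loc -= self.pos[dim] * self.strides[dim]
            self.pos[dim] = 0


def _size(shape):
    n = 1
    for s in shape:
        n *= s
    return n


def offsets_naive(shape, strides):
    return [elem_to_loc(i, shape, strides) for i in range(_size(shape))]


def offsets_incremental(shape, strides):
    it = StridedIterator(shape, strides)
    out = []
    for _ in range(_size(shape)):
        out.append(it.loc)
        it.step()
    return out


# A 2x3 row-major array (strides (3, 1)) viewed transposed: shape (3, 2),
# strides (1, 3).
print(offsets_incremental((3, 2), (1, 3)))  # [0, 3, 1, 4, 2, 5]
```

Both functions visit the same offsets in the same order. The incremental version replaces the per-element divisions with additions, which is the kind of saving that matters when a gather or scatter touches millions of elements.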

Benchmarks

Timings on an M1 Max. The benchmark source code is below.

| Bench | Pre (ms) | Post (ms) |
|---|---|---|
| Gather 1 | 228.555 | 51.168 |
| Gather 2 | 139.326 | 23.318 |
| Scatter 1 | 373.358 | 128.734 |
| Scatter 2 | 321.287 | 52.613 |
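The table is easier to compare as speedup ratios (Pre / Post). This snippet only divides the reported M1 Max timings; it adds no new measurements.

```python
# Timings in milliseconds, copied from the benchmark table (M1 Max).
timings_ms = {
    "Gather 1": (228.555, 51.168),
    "Gather 2": (139.326, 23.318),
    "Scatter 1": (373.358, 128.734),
    "Scatter 2": (321.287, 52.613),
}

# Speedup = time before the change / time after the change.
speedups = {name: pre / post for name, (pre, post) in timings_ms.items()}

for name, s in speedups.items():
    print(f"{name}: {s:.2f}x faster")
```

This works out to roughly 4.5x (Gather 1), 6.0x (Gather 2), 2.9x (Scatter 1), and 6.1x (Scatter 2).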
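```python
import time

import mlx.core as mx

# Assumption: these benchmarks target the CPU backend, which this PR changes.
mx.set_default_device(mx.cpu)


def timeit(fn, *args, warmup=2, iters=10):
    # Hypothetical timing helper; the PR does not include the one it used.
    # MLX evaluates lazily, so mx.eval forces each call to actually run.
    for _ in range(warmup):
        mx.eval(fn(*args))
    tic = time.perf_counter()
    for _ in range(iters):
        mx.eval(fn(*args))
    toc = time.perf_counter()
    print(f"{1e3 * (toc - tic) / iters:.3f} (ms)")


# Gather 1: index arrays on non-adjacent axes 0, 2 and 4.
D = 32
x = mx.zeros((D, D, D, D, D))
inds = [mx.zeros((D // 2, D // 2, D // 2), dtype=mx.int32).T] * 3

def fun(x, inds):
    return x[inds[0], :, inds[1], :, inds[2]]

timeit(fun, x, inds)

# Gather 2: mx.take along axis 0 of a transposed (non-contiguous) array.
def fun(x, idx):
    return mx.take(x, idx, axis=0)

D = 32
x = mx.zeros((D, D, D, D, D))
x = x.transpose(1, 0, 4, 3, 2)
idx = mx.zeros((D // 2,)).astype(mx.int32)
timeit(fun, x, idx)

# Scatter 1: assign through the same index pattern as Gather 1.
D = 32
x = mx.zeros((D, D, D, D, D))
inds = [mx.zeros((D // 2, D // 2, D // 2), dtype=mx.int32).T] * 3
update = mx.zeros((D // 2, D // 2, D, D, D // 2))
update = update.transpose(1, 0, 4, 3, 2)

def fun(x, inds, update):
    x[inds[0], :, inds[1], :, inds[2]] = update
    return x

timeit(fun, x, inds, update)

# Scatter 2: assign along axis 0 of a transposed array.
D = 32
x = mx.zeros((D, D, D, D, D))
x = x.transpose(1, 0, 4, 3, 2)
idx = mx.zeros((D // 2,)).astype(mx.int32)
update = mx.zeros((D, D // 2, D, D, D))
update = update.transpose(1, 0, 4, 3, 2)

def fun(x, idx, update):
    x[idx] = update
    return x

timeit(fun, x, idx, update)
```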
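In the scatter benchmarks, each `update` is built (via a transpose) so that its shape matches the result of the corresponding gather. The small pure-Python sketch below reproduces that shape arithmetic. It assumes MLX follows NumPy's rule that index arrays separated by a slice put their broadcast dimensions first; the update shapes in the benchmarks are consistent with that rule. `broadcast_shapes`, `gather_shape`, and `transposed_shape` are hypothetical helpers, not MLX functions.

```python
# Hypothetical shape helpers modeling NumPy-style advanced indexing.


def broadcast_shapes(*shapes):
    """Broadcast index-array shapes right to left, NumPy style."""
    ndim = max(len(s) for s in shapes)
    out = []
    for i in range(-ndim, 0):
        dims = {s[i] for s in shapes if len(s) >= -i}
        dims.discard(1)
        if len(dims) > 1:
            raise ValueError(f"shapes {shapes} do not broadcast")
        out.append(dims.pop() if dims else 1)
    return tuple(out)


def gather_shape(src_shape, index_spec):
    """Result shape of x[...]; index_spec has one entry per source axis:
    an index-array shape (tuple), or None for a full slice ':'."""
    idx_axes = [a for a, spec in enumerate(index_spec) if spec is not None]
    idx_shape = broadcast_shapes(*(index_spec[a] for a in idx_axes))
    if idx_axes == list(range(idx_axes[0], idx_axes[-1] + 1)):
        # Adjacent index arrays: the broadcast dims replace them in place.
        before = tuple(src_shape[: idx_axes[0]])
        after = tuple(src_shape[idx_axes[-1] + 1 :])
        return before + idx_shape + after
    # Index arrays separated by a slice: the broadcast dims move to the front.
    sliced = tuple(n for n, spec in zip(src_shape, index_spec) if spec is None)
    return idx_shape + sliced


def transposed_shape(shape, perm):
    """Shape after x.transpose(*perm)."""
    return tuple(shape[p] for p in perm)


D = 32
h = D // 2
cube = (h, h, h)
# Gather 1 / Scatter 1: x[inds[0], :, inds[1], :, inds[2]]
print(gather_shape((D,) * 5, [cube, None, cube, None, cube]))  # (16, 16, 16, 32, 32)
print(transposed_shape((h, h, D, D, h), (1, 0, 4, 3, 2)))  # (16, 16, 16, 32, 32)
# Gather 2 / Scatter 2: x[idx] (or mx.take(x, idx, axis=0)) with a 1-D idx
print(gather_shape((D,) * 5, [(h,), None, None, None, None]))  # (16, 32, 32, 32, 32)
print(transposed_shape((D, h, D, D, D), (1, 0, 4, 3, 2)))  # (16, 32, 32, 32, 32)
```

Note that `.T` on a 16x16x16 index cube leaves its shape unchanged and only reverses its strides, so the benchmarks presumably use it to make the index arrays non-contiguous as well.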

angeloskath (Member) left a comment:

Not much to say, it looks great! ContiguousIterator keeps on giving.

awni merged commit 5523d9c into main on Oct 3, 2024 (4 checks passed).
awni deleted the cpu_indexing branch on Oct 3, 2024 at 20:53.