Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LDS access proposal #29

Closed
wants to merge 2 commits into from
Closed

LDS access proposal #29

wants to merge 2 commits into from

Conversation

Jasper-Bekkers
Copy link
Contributor

No description provided.

@Jasper-Bekkers
Copy link
Contributor Author

Jasper-Bekkers commented Sep 11, 2020

Comment on lines +62 to +67
// sample 3:
let lds = LdsWriter::<u32>::new();

lds.write_thread_idx(0);
lds.write_thread_idx(666); // race?
```

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine. It's just consecutive writes to the same address, the later write wins due to program order.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah - this was mostly to discuss what the potential rules from the Rust point of view are. However, since this shouldn't ever lead to invalid data I think we're fine. We could probably elide the first store entirely in this case.

Comment on lines +86 to +88
let wrt = rdr.barrier();
wrt.write_thread_idx(12234);
```

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part seems fine as long as rdr.barrier() does an OpControlBarrier. The part above with barriers in non-uniform control flow is a problem, but it's orthogonal.

@nhaehnle
Copy link

Non-uniform control flow is a problem, but I think if the implementation of barrier() guarantees that it will only proceed if all threads in the workgroup reach the barrier, then you're fine. This is not a guarantee provided by OpControlBarrier, so it would have to be added somehow.

My understanding is that safe Rust can still deadlock, which is what may happen in the examples where barriers are inside of control flow.

@Jasper-Bekkers Jasper-Bekkers mentioned this pull request Sep 11, 2020
@Jasper-Bekkers
Copy link
Contributor Author

@Tobski proposed this in chat as an alternative:

struct BufferWriter;

impl BufferWriter {
    fn write_thread_idx(&mut self, value: T);
}

struct BufferReader;


impl BufferReader {
    fn read(&self, idx: usize) -> T;
    fn partition_readers(self, ranges: [(usize, usize)]) -> [BufferReader];
    fn partition_writers(self, ranges: [(usize, usize)]) -> [BufferWriter];
}

fn join_writers(a: BufferWriter, b: BufferWriter) -> BufferReader;
fn join_readers(a: BufferReader, b: BufferReader) -> BufferReader;

the basic idea is that each pass of the bitonic sort owns a different subset of the same buffer in each iteration, and you kind of need a way to express that, which is what I'm trying to do above.

something like

get 2 elements
read
compare
write
join with neighbour to get 4 elements
read
compare
write
join with neighbour to get 8 elements
read
compare
write
partition back to 2 elements
read
compare
write
...etc.

@Jasper-Bekkers
Copy link
Contributor Author

Non-uniform control flow is a problem, but I think if the implementation of barrier() guarantees that it will only proceed if all threads in the workgroup reach the barrier, then you're fine. This is not a guarantee provided by OpControlBarrier, so it would have to be added somehow.

Would it be sensible for the compiler to move the OpControlBarrier out of the controlflow? I'm expecting problems with deeply nested control flow but it might be worth exploring.

My understanding is that safe Rust can still deadlock, which is what may happen in the examples where barriers are inside of control flow.

Rust can still deadlock - it's not one of it's safety guarantees. A quick search gives this:

fn main() {
    let (tx1, rx1) = channel();
    let (tx2, rx2) = channel();

    spawn(proc() {
        println!("Waiting for result 1");
        rx1.recv();
    
        println!("Sending result 2");
        tx2.send(());
    });

    println!("Waiting for result 2");
    rx2.recv();

    println!("Sending result 1");
    tx1.send(());
}

@Jasper-Bekkers
Copy link
Contributor Author

#8 earlier discussion

@nhaehnle
Copy link

Non-uniform control flow is a problem, but I think if the implementation of barrier() guarantees that it will only proceed if all threads in the workgroup reach the barrier, then you're fine. This is not a guarantee provided by OpControlBarrier, so it would have to be added somehow.

Would it be sensible for the compiler to move the OpControlBarrier out of the controlflow? I'm expecting problems with deeply nested control flow but it might be worth exploring.

I think you need to define what code means at the Rust level. It's possible that perhaps there's a reasonable definition which, as a consequence, requires moving OpControlBarriers out of control flow. But defining the language via "it behaves in way XYZ because that happens to be the result of the compiler transform" generally doesn't lead to a happy place.

That said, the OpControlBarrier is inside of a function. Here, that function can be inlined, and in the medium term everything is going to be inlined anyway because that's the status quo. But relying on the visibility of the OpControlBarrier seems likely to cause problems in the longer term.

@XAMPPRocky
Copy link
Member

As an non graphics programmer, I think it would nice to include more context in the RFC, as right now there's not a lot. The RFC doesn't explain what "LDS" is for example, and googling "LDS graphics" brings up graphics for the Church of Latter Day Saints, nothing about "Local Data Share".

@khyperia
Copy link
Contributor

hey, the utah teapot is graphics-y and is clearly from utah, so church of LDS is of course related to graphics, right? :D (sorry)

@Jasper-Bekkers
Copy link
Contributor Author

"LDS shader" gives heaps of resources that will explain every detail to you if you want, I don't think we should need to explain these things in these kinds of discussions, and I suggest we don't derail this one further.

However, if you came here looking for an introduction to GPU programming, be sure to check out https://gpuopen.com/learn/optimizing-gpu-occupancy-resource-usage-large-thread-groups/ and https://anteru.net/blog/2018/even-more-compute-shaders/

@hrydgard
Copy link
Contributor

hrydgard commented Oct 20, 2020

What's maybe a little bit confusing even for non-console graphics programmers is that LDS is an AMD GCN-specific abbreviation, it might be better to use a more generic term like groupshared memory.

@khyperia
Copy link
Contributor

I don't think we should need to explain these things in these kinds of discussions

I disagree - we need to keep in mind both the audience to this repository as well as contributors. Not everyone has the same expertise areas, and the real magic happens when cross-expertise discussion happens - and that means giving background and context. It doesn't have to be long, or take much time to write - for example, the second paragraph in this comment would have been immensely helpful for both me and others, and would have saved everyone time. In any case, on a pure value basis rather than culture, explaining context and definitions opens up new collaborators who may not have been able to contribute before, and that's always good.

@Tobski Tobski mentioned this pull request Nov 9, 2020
@khyperia
Copy link
Contributor

khyperia commented Apr 1, 2021

Closing this due to inactivity, if we'd like to start pushing on this again, we can reopen.

@khyperia khyperia closed this Apr 1, 2021
@XAMPPRocky XAMPPRocky deleted the lds-barriers branch April 30, 2021 07:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants