[enhancement] add non-blocking send #1453

Open
bgelb-openai opened this issue Jun 3, 2024 · 15 comments

@bgelb-openai
Contributor

What's hard to do? (limit 100 words)

I'd like to be able to create a proc that unconditionally receives input on a channel and incorporate it into a next state update without blocking on a send() in the same proc.

I.e. I would like to advance to the next state whether or not the send() can take place in the current iteration of next.

send_non_blocking would need to return an indication of whether or not the send took place, which could be incorporated into the next state calculation.

Additionally, if the user wants to retain the data being passed to the non-blocking send (in case the send fails), they would need to explicitly stash it in the next state somewhere (otherwise it would be lost).
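
Roughly what this might look like in use (sketch only: send_non_blocking does not exist today, its (token, bool) return shape is an assumption, and the DSLX syntax is approximate):

```dslx
// Hypothetical: accumulate unconditionally-received values and opportunistically
// report the running sum, folding the success flag into the next-state logic.
proc Accumulator {
    data_in: chan<u32> in;
    sum_out: chan<u32> out;

    init { u32:0 }

    config(data_in: chan<u32> in, sum_out: chan<u32> out) {
        (data_in, sum_out)
    }

    next(state: u32) {
        // Unconditionally consume the input; this must never stall on the output side.
        let (tok, x) = recv(join(), data_in);
        // Hypothetical opportunistic send: `sent` indicates whether the value was
        // actually accepted this activation.
        let (tok, sent) = send_non_blocking(tok, sum_out, state);
        // Incorporate both the received data and the send outcome into the next state:
        // restart the sum once a report has gone out, otherwise keep accumulating.
        if sent { x } else { state + x }
    }
}
```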

Current best alternative workaround (limit 100 words)

I can kind of get the behavior I want by tying off the output valid signal in Verilog and writing the XLS code as if the send always succeeds (perhaps in combination w/ some other channels similarly abused to pass info in the other direction). Not super practical beyond a simple block.

Your view of the "best case XLS enhancement" (limit 100 words)

Find a way to add send_non_blocking that doesn't interfere w/ normal operation.

@bgelb-openai added the enhancement (New feature or request) label Jun 3, 2024
@cdleary
Collaborator

cdleary commented Jun 4, 2024

@meheff I think this isn't as hard as something like peek -- we just opportunistically send the data and if not ready it's on the caller to stash it in state. Do you agree? If so I can try to implement.

(See also #1383 )
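
Concretely, the caller-side pattern could look something like this (sketch only: send_non_blocking and its (token, bool) return are hypothetical, and the DSLX syntax is approximate):

```dslx
// Hypothetical stash-and-retry pattern: hold the value in state until a
// send_non_blocking reports that it was accepted.
proc StashAndRetry {
    data_in: chan<u32> in;
    data_out: chan<u32> out;

    // State: (value waiting to go out, whether anything is waiting).
    init { (u32:0, false) }

    config(data_in: chan<u32> in, data_out: chan<u32> out) {
        (data_in, data_out)
    }

    next(state: (u32, bool)) {
        let (stash, stash_valid) = state;
        // Always drain the input.
        let (tok, x) = recv(join(), data_in);
        // Opportunistically push the stashed value if there is one, else the new value.
        let value_to_send = if stash_valid { stash } else { x };
        let (tok, sent) = send_non_blocking(tok, data_out, value_to_send);
        // Decide the next stash:
        //  - stash was pending and went out  -> the fresh value x becomes the new stash
        //  - stash was pending and stuck     -> keep the stash (x is dropped in this sketch)
        //  - nothing pending and x went out  -> nothing pending
        //  - nothing pending and x stuck     -> x becomes the stash
        if stash_valid {
            if sent { (x, true) } else { (stash, true) }
        } else {
            if sent { (u32:0, false) } else { (x, true) }
        }
    }
}
```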

@proppy
Member

proppy commented Jun 4, 2024

> I'd like to be able to create a proc that unconditionally receives input on a channel and incorporate it into a next state update without blocking on a send() in the same proc.

does the send() have a dependency on the same data? or is it unrelated?

@mtdudek
Contributor

mtdudek commented Jun 4, 2024

Possibly related to #965?

@bgelb-openai
Contributor Author

In my use case send() refers to data that is stored as part of the current state.

The data received from recv is only used to produce the next state, it does not affect the data I want to conditionally send.

I could imagine a use case for a direct bypass path from recv->send (but that would obviously need to be used w/ caution), though it is not needed for my use case.

@cdleary
Collaborator

cdleary commented Jun 4, 2024

> Possibly related to #965?

Ah yes, we didn't find this in a search; maybe it's a dup and we can consolidate.

@hongted
Collaborator

hongted commented Jun 4, 2024

I believe the difficulty in implementing non_blocking_send and peek (see #1383) is, in both cases, the simulators we have: their event model and how it differs from the eventual hardware implementation.

send_non_blocking will need the following modifications to the simulator

  1. Finite-depth queues (Runtimes should model fifo_depth on channels #988) instead of the infinitely sized queues currently modeled (i.e. sends are never blocking right now).
  2. A change in the evaluation order, since implementing send_non_blocking requires the following sequence:
    2a. send_non_blocking enqueues some placeholder onto the queue and suspends evaluation of the proc.
    2b. Other procs continue evaluating until a recv occurs or it is decided that none will. The open question is how long to wait before concluding that "no recv has occurred".
    2c. Depending on the result of 2b, the send resumes and reports either that the data was dropped or that the send succeeded.
  3. Solving the same issue as with peek(): the simulator may produce execution orders that are impossible in hardware, due to the lack of roll-back/atomic sections in the simulator.

@proppy
Member

proppy commented Jun 4, 2024

> The data received from recv is only used to produce the next state, it does not affect the data I want to conditionally send.

I think that particular scenario was clarified as part of a recent documentation update to https://google.github.io/xls/tutorials/what_is_a_proc/#state (as part of the wider cross-activation tokens change #1401).

It's worth noting that activation N+1 is allowed to start before the state from activation N has fully resolved; it can stall if it needs to read from the state, waiting until it can confirm that the previous activation has set the state element that it needs.

I'll let @ericastor correct me, but my understanding is that activation N+1 w/ your send should be able to trigger even before the state is updated from the recv operation.

That's not exactly how you phrased the original problem statement:

> I.e. I would like to advance to the next state whether or not the send() can take place in the current iteration of next.

But I wonder if there are ways to re-frame it to benefit from this behavior: the future activation that depends on the state updated from your recv operation could be "activated" independently of (and concurrently to) your send (assuming you don't have any explicit token dependency between them).

@bgelb-openai
Contributor Author

I don't think that helps. The goal isn't to overlap parts of N and N+1 in time.

The goal is to consume an arbitrary number of data elements on an input channel unconditionally and perform a state update (i.e. complete an iteration of the proc each time such an input recv takes place).

Meanwhile, the proc wants to send some data (derived only from state) on an output channel. But if the send() can block, it will break the unconditionality of the input side.
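
To make the coupling concrete, this is roughly what the proc has to look like today (names and syntax approximate); because send() blocks, the activation cannot complete until the output channel is ready, and the "unconditional" input side backs up behind it:

```dslx
proc AccumulatorBlocking {
    data_in: chan<u32> in;
    sum_out: chan<u32> out;

    init { u32:0 }

    config(data_in: chan<u32> in, sum_out: chan<u32> out) {
        (data_in, sum_out)
    }

    next(state: u32) {
        // Meant to be unconditional, but...
        let (tok, x) = recv(join(), data_in);
        // ...this send backpressures the activation when sum_out is not ready, so the
        // input side stops being drained unconditionally (later recvs stall behind it).
        let tok = send(tok, sum_out, state);
        state + x
    }
}
```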

@meheff
Collaborator

meheff commented Jun 5, 2024

I think this would not be too hard to implement. Most of the work would probably be in codegen.

@hongted Regarding changes to the evaluation infrastructure (jit/interpreter): are these changes necessary? Once you're out of KPN, any expectation of exact matching between hardware behavior and functional evaluation is out the window. I think if you want correspondence you'd just use the block jit/interpreter.

@cdleary were you thinking of implementing this?

@proppy
Member

proppy commented Jun 5, 2024

> The goal is to consume an arbitrary number of data elements on an input channel unconditionally and perform a state update (i.e. complete an iteration of the proc each time such an input recv takes place).
>
> Meanwhile, the proc wants to send some data (derived only from state) on an output channel. But if the send() can block, it will break the unconditionality of the input side.

Understood that you don't want to stall the recv from "activation N" on the completion of the send from "activation N".

> The goal isn't to overlap parts of N and N+1 in time.

But for the recv in "activation N+1", do you care whether the send from "activation N" has completed or not?

Consider the following:

🕒 activation N
📌(1) send(state[N]);
📌(2) state[N+1] = recv();

🕒 activation N+1
📌(3) send(state[N+1]); // blocked on 📌(1) and 📌(2)
📌(4) state[N+2] = recv(); // only blocked on 📌(2)

According to https://google.github.io/xls/tutorials/what_is_a_proc/#state:

> activation N+1 is allowed to start before the state from activation N has fully resolved; it can stall if it needs to read from the state, waiting until it can confirm that the previous activation has set the state element that it needs.

Given that 📌(4) doesn't require 📌(1), "activation N+1" should be allowed to start even if 📌(1) didn't complete, @ericastor right?

@ericastor
Collaborator

@proppy - not quite. Think in terms of storage - we don't actually have storage for an unbounded number of future values of state at the same time. We only generate storage for a finite number of values (currently, 1 + pipeline registers), so we can't possibly suspend every send while letting the recvs trigger infinitely; we would need to block somewhere.

Currently, this happens because we generate linear pipelines. Suppose activation N can't send. The stage K in which that send was scheduled will stall until the send becomes possible. This means no future activation can get through that stage, so activation N+1 will stall at stage K-1, activation N+2 will stall at stage K-2, etc... and since K is finite, this will eventually stall the whole pipeline.

@ericastor
Collaborator

RE: @meheff @hongted - since the interpreter currently uses infinite-depth FIFOs anyway, I'm pretty sure it's a legal evaluation to just interpret all sends as succeeding, so nonblocking sends will never be tested. When we implement FIFO depth bounds, I think it's a legal evaluation to just consider a send as succeeding if there's a spot on the queue & blocking/failing otherwise... I don't think we need to suspend evaluation for any length of time. Yes, this may result in a different execution sequence when simulated at the proc level than in HW, since pre-scheduling there's no notion of simultaneity we can use to allow a send to happen at the same time as a recv, but that's expected.

Don't get me wrong, this is not a cycle-accurate simulation - there are circuits where items will be dropped in simulation but wouldn't have been in HW - but nothing above block-level is cycle-accurate anyway. If you want accurate simulation of non-KPN operations, you have to do it with a cycle-by-cycle simulation after scheduling runs, so on block IR. (Yes, this may be an argument for introducing a "scheduled IR" that lets us interpret cycle-accurately while we still have channels rather than ports!)

Come to think of it, I believe this argument also applies to peek(), but we should talk about that on #1383.

@mikex-oss
Collaborator

Meta: it might be good to decide on canonicalizing the discussion between here and #965 (both threads have a decent amount of useful info at this point).


On the actual issue, I had some offline discussion with @ericastor as I had some questions here, and we came to an agreement that the name is slightly confusing. I jotted down a summary based on my current understanding. Please feel free to correct any errors!

send really means "will succeed and not drop data". In DSLX land, it is non-blocking because of infinite FIFO depth; in RTL land, it has to deal with finite-depth FIFOs, so it uses backpressure to "block" sends until ready.

The proposed send_non_blocking really means the send could fail and we want an indicator of whether it did. It's still non-blocking in DSLX since the infinite-depth channel can always accept new data; in RTL, we no longer block and instead just deal with the data loss somehow (hence the suggested name).

Since we're describing a DSLX operation, perhaps it would be better to name it something like try_send or send_maybe since the non-blocking nature isn't really reflected at this layer of abstraction. These names also carry the invariant intent of the operation between DSLX and RTL (whether the send can fail).
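
For comparison, a sketch of how the proposed op would sit next to the existing non-blocking receive builtin (the try_send name and its (token, bool) return shape are assumptions from this thread, not an implemented API; recv_non_blocking's signature is quoted from memory, and the DSLX syntax is approximate):

```dslx
// Hypothetical: forward data opportunistically and count values that were
// received but could not be delivered.
proc OpportunisticForward {
    in_ch: chan<u32> in;
    out_ch: chan<u32> out;

    // State: number of received values that were dropped on the output side.
    init { u32:0 }

    config(in_ch: chan<u32> in, out_ch: chan<u32> out) {
        (in_ch, out_ch)
    }

    next(dropped: u32) {
        // Existing builtin: never blocks; `valid` reports whether data was present.
        let (tok, value, valid) = recv_non_blocking(join(), in_ch, u32:0);
        // Proposed counterpart: never blocks; `sent` reports whether the value was
        // accepted. (For simplicity this sketch pushes the default value even when
        // `valid` is false.)
        let (tok, sent) = try_send(tok, out_ch, value);
        dropped + (if valid && !sent { u32:1 } else { u32:0 })
    }
}
```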

@bgelb-openai
Contributor Author

bgelb-openai commented Jun 6, 2024 via email

@meheff
Collaborator

meheff commented Jun 7, 2024

I agree with @mikex-oss and @ericastor that the name is a bit confusing and try_send or whatever would be more precise. I suppose we should rename the corresponding receive operation as well for consistency (semantically try_receive also makes sense).

In any case, I think this is an op we want.
