Skip to content

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wait for interrupt in resumeables #432

Closed
henrikssn opened this issue Jul 11, 2020 · 10 comments
Closed

Wait for interrupt in resumeables #432

henrikssn opened this issue Jul 11, 2020 · 10 comments

Comments

@henrikssn
Copy link
Contributor

Currently, modm isn't exactly sleep friendly. I want to improve that.

The resumeables are built around two states; running (wait for condition) and stop. I propose to add a third state "wait for interrupt" which indicates that the resumeable is not going to make progress until an interrupt occurs.

What the main scheduler then can do is to run the resumeables until they all are blocked on wait for interrupt (or stopped), then put CPU to sleep and let the interrupt wake it back up (and repeat). The interrupt could be timer/rtc (for delays), external or anything else the platform supports while idling/sleeping.

Most likely the main challenge is to make sure we don't accidentally put the interrupt source to sleep, however, maybe that could be handled separately by some sleep/power controller.

Happy to hear your thoughts on this before I get my hands dirty and prototype something.

@salkinium
Copy link
Member

I propose to add a third state "wait for interrupt" which indicates that the resumeable is not going to make progress until an interrupt occurs.

Hm, difficult. There is no main schedulers, it's all implicitly scheduled by the control flow of the program, but you could of course access shared memory.

Perhaps something like RF_WAIT_WHILE/UNTIL would increment a shared counter that signifies how many resumable are currently polling for a condition (and decrement afterwards), and perhaps RF_WAIT_FOR_INTERRUPT(condition); would not increment/decrement this counter, so that the main loop simply goes to sleep if the counter is zero.
Note that you cannot poll the NVIC interrupt flag directly (ie. UART_IRQn) because the interrupt has to reset this flag before exiting the interrupts, otherwise the NVIC will trigger the interrupt again (causing a deadlock). So you need to poll a volatile variable that's set in the interrupt (and reset by who?).

In case the counter is not zero, you still have to poll for the interrupt as normal, but you don't need to build up the entire call stack, you could poll for the interrupt at a "higher" call point (however that would be implemented).

But it won't be as useful as you think. For example all modm::Timeouts and modm::PeriodicTimers are polling, especially the µs resolution ones. And the SysTick only fires at 1ms intervals on ARMv6-M and 250ms intervals on ARMv7-M (see this PR as for why), so just having one resumable that waits on even just a millisecond timer will break sleeping.

Oh, and then there's also the possiblity of just manually polling if (timer.execute()) in the main loop, and you're allowed to mix this with resumables/protothreads, so you're not going to have fun prototyping this 😭.

Currently, modm isn't exactly sleep friendly.

That's a very nice way to say it, since it's really more "actively sleep hostile with extreme prejudice"… 😬

One of the most interesting alternatives is the RTFM scheduler, which is only interrupt-based: C++ Implementation, Rust Implementation.

@salkinium
Copy link
Member

But just to make this clear: As soon as C++20 coroutines are available, the whole Protothread/Resumables will be deprecated, and a new, more traditional scheduler will take their place, since coroutines with compiler-generated promises and resume points are even more efficient, flexible and especially user friendly than our Resumable hacks.

So if you want to dive deeper, perhaps check out GCC10 coroutines rather than dealing with our limitations.

@henrikssn
Copy link
Contributor Author

Thanks for the pointers, this is very interesting!

The idea with a volatile counter would work, I was thinking to transitively pass on a flag in the resumeable result state with the meaning "all children are stopped or waiting for interrupt". Then the main loop can probe the functions once and then know when to go back to sleep.

If I am not mistaken, you don't need any actual interrupt handler, take this example:

// Psueducode for RF_CALL_BLOCKING
while(true) {
  auto state = runResumeable();
  if (state == WAIT_FOR_INTERRUPT) {
    WFI_(); // Put CPU to sleep
  } else if (state == STOPPED) {
    break;
  }
}

This will wake up and run once for every interrupt that fires, which turns out to be exactly what we want in this case :) Obviously code that wants to put CPU to sleep can't make use of software timers and need to rely on something that actually makes an hardware interrupt, such as counter match on RTC. It anyway doesn't make much sense to put CPU to sleep for delays in the uS range, as it can take a while to bring the clocks back online (samd21 about 20uS).

C++ coroutines are just protothreads/resumeables with nicer syntax and better language support, and AFAIU they do (will?) have the same problem w.r.t sleeping.

Another similar example of this is FreeRTOS with collaborative scheduler and tickless idle, however that is not in the same league of memory efficiency.

@salkinium
Copy link
Member

salkinium commented Jul 12, 2020

Obviously code that wants to put CPU to sleep can't make use of software timers and need to rely on something that actually makes an hardware interrupt, such as counter match on RTC.

Yes, I think that can work. The resumable state type is an 8-bit type with >127 defined as "keep polling", so you can just add another macro that returns >127 (RF_WAIT_FOR_INTERRUPT(cond)?), and not have to change all the logic.

However I'm still in favor of having an explicit sleep function that is explicitly called by the user in the main while(1) loop, so that when you wake up from an interrupt all resumables get a go at least once. This implies a shared memory implementation with a counter that is incremented/decremented for modm::rf::Running, but not for modm::rf::WaitForInterrupt, so that other resumables can still run to "their" completion (which may require some polling), independent of the one that finishes first.

  1. wake up in main loop sleep function.
  2. run all tasks until all of them are ready to WFI (ie. counter==0)
  3. go to sleep in the main loop sleep function.

I think it is useful to have this sleep counter mechanism as its own API in its own module modm:processing:sleep (and modm:platform:sleep with the device/cpu specific stuff)? That would allow non-resumables to also declare their sleep state? And also make it easier to declare the acceptable sleep level in a future upgrade.
You can let lbuild then transparently include the logic for this into resumables/protothreads if the sleep module is also selected. That would make this truly zero cost if not used.

The user is of course still responsible for making sure it all works, but a separate module would also make it easier to provide a (centralized) log functionality for debugging to understand what task is currently preventing sleep?

I think this approach may however break the "pass a flag up the call chain" optimization, since you now have check multiple call sites since you don't know which interrupt they were waiting for. Not sure though.

AFAIU they do (will?) have the same problem w.r.t sleeping.

Yes, of course, but because the compiler encapsulates the local state into an (copyable, movable) object, you don't have to build up the entire call stack to check the "leaf" condition.

@henrikssn
Copy link
Contributor Author

I think we are aligned on the high level, but there are some details where I need to make myself better understood.

However I'm still in favor of having an explicit sleep function that is explicitly called by the user in the main while(1) loop, so that when you wake up from an interrupt all resumables get a go at least once.

Yes, this is also something I want to achieve. At the same time, I don't want to make the main loop responsible for how long to sleep or when to wake up, because that breaks modularity. For example, the UART peripheral might want to wake up on start bit interrupt, and a sensor driver every 60 sec. Adding all these conditions to the main loop will not scale.

This implies a shared memory implementation with a counter that is incremented/decremented for modm::rf::Running, but not for modm::rf::WaitForInterrupt, so that other resumables can still run to "their" completion (which may require some polling), independent of the one that finishes first.

My alternative to this is something like:

  1. Add the following state to ResultState:
enum
ResultState
{
	// States <= 127 ...
+	WaitForInterrupt = 254,
	Running = 255,
};
  1. Add a RF_WAIT_FOR_INTERRUPT macro with a single cond argument that returns the WaitForInterrupt state. On wakeup, it checks the arg and continues the execution if true or returns WaitForInterrupt if false.
  2. Change the RF_CALL macros to propagate the WaitForInterrupt state (instead of always returning Running).
  3. Write a main loop that looks like this:
while(true) {
	while (runResumeable() == ResultState::Running);
	WFI_(); // Put CPU to sleep
}

What this does is to delegate the responsibility for responding to the interrupt to the leaf, which IMO is where it belongs. It will always run the whole call stack to completion on every interrupt, which is fine because sleep interrupts cannot happen that often anyway (due to CPU wake up time).

Actually, if we make a "resumeable RTC timer" class, then we can fully encapsulate all the delay related logic in there and the caller would just need to do RF_CALL(rtc.wakeupAtMillis(wakeup_time)). Then the resumeable timer can take the decision whether to use WFI or not, depending on the sleep duration and what other wake ups are scheduled. That said, this isn't strictly needed for this to make sense. For example, here is a resumeable that would wake up whenever a button is pressed:

class Button : protected modm::NestedResumeable {
	modm::ResumableResult<void>
	runResumeable() {
		RF_BEGIN();
		while (true) {
			RF_WAIT_FOR_INTERRUPT(button_pressed);
			RF_CALL(blinkLed());  // or whatever
		}
		RF_END();
	}

	void
	initialize() { /* Sets up the ISR. */ }
private:
	volatile bool button_pressed; // Set by ISR.
}

@salkinium
Copy link
Member

Ok, I'm convinced, lets poke the CI with some code!

@henrikssn
Copy link
Contributor Author

So I stumbled upon Contiki's "process" concept which turns out to be a generalization of what I am proposing here: https://github.com/contiki-os/contiki/wiki/Processes

Essentially, you have a linked list of proto threads and a (priority-based) event queue, the scheduler pulls objects from the queue and calls the proto threads. ISRs can add events to the queue, therefore enabling event-based scheduling. In this model, you can let the CPU sleep if the queue is empty.

I like this much more because it separates the event generation from the processing, in my model it is a bit intervened (each resumeable would need some sort of local "event" state to know if the ISR happened or not). It also enables event-driven hardware independent modules, for example buffered UART should really be hardware independent (based on non-buffered UART) and not implemented one time per platform like we have it today.

The process class template would look something like this:

class Process {

  process(Event e);

private:
  Process* next;
  uint_t state;  // Same as PT / RF
};

The event class template like this:

struct Event {
  uint8_t id;
  operator<(const Event& other) {
    return id < other.id;
  }
}

// and the queue:
std::array<Event, EVENT_QUEUE_SIZE> event_queue;
std::make_heap(event_queue.begin(), event_queue.end());

This means that the buffered UART example would need two processes (rx/tx) which is a total of 2 * sizeof(void*) overhead compared to current state. We can statically allocate the processes to still be heap independent.

The event queue can be a statically allocated array that is managed by std::make_heap. We can either assert fail when this goes out of bounds (akin to nested resumeables) or discard the lowest priority event (not sure if that would be a good idea).

A new macro PROCESS_WAIT_EVENT is introduced which tells scheduler to resume execution when a specific event has happened. The PROCESS_WAIT_UNTIL and PROCESS_WAIT_WHILE macros are special cases of RF_WAIT_EVENT, and are implemented by (re-)posting a dummy event to the queue with max priority. PROCESS_CALL_RF is implemented using PROCESS_WAIT_UNTIL.

Unless there is any high-level feedback, then I can give this a shot by using it to implement a shared buffered UART impl for SAM / STM32.

@salkinium
Copy link
Member

salkinium commented Jul 20, 2020

I think this is a fantastic concept and would fit quite well into modm's design!

Note however, that the modm::atomic::Queue is only atomic for single-producer, single-consumer (it's a relic from AVR times) and so you need to use a proper atomic queue for this.

@henrikssn
Copy link
Contributor Author

henrikssn commented Jul 26, 2020

I prototyped this and I am not convinced. The main problem I encountered is the semantics of yield in a multi-thread context, it took me quite a while to get a simple consumer/producer deadlock-free and it was hard to debug why things aren't progressing.

I am considering leaving protothreads as glorified state machines (which is exactly what they are) and instead come up with something based on super lightweight context switching (think fibers). Drawback is obviously the overhead of keeping thread context (32 byte on CM0) and the advantage is that the code looks like C++.

The main design goals would be:

  • All memory (context / stacks) is statically allocated
  • Tick-less cooperative scheduler
  • Low context switch overhead (think <20 cycles)
  • Low memory overhead (target ~40 bytes on ARM)
  • Support for ISRs to pre-empt current running thread for lower latency
  • The usual support code like mutex/semaphore/wait queue/etc

Two very close candidates for inspiration is RIOT-OS and uOS++, I believe you can also tune freertos to look very much like this, which is probably what I would start with, given that there is already a modm module for it.

The sad thing is that this would start having core modm libraries (i.e. Buffered UART) depend on something like freertos, and I am not sure what the appetite for that is. I guess the main pain points are RAM and code size, I'll try to keep an eye on those.

@salkinium
Copy link
Member

salkinium commented Jul 29, 2020

Cooperative, stackful multitasking aka. fibers has been on my mind too, I've starred these projects (just FYI):

I think our UART interface needs to be simplified, and the buffer taken out of the implementation. That would imply some sort of callback or polling interface to abstract Rx/Tx interrupts or a pointer/size DMA implementation.
In either case the notification of completion can happen outside of the UART driver and thus there can be an external adapter for Fibers/Protothreads if possible.

@modm-io modm-io locked and limited conversation to collaborators Sep 29, 2021

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

Development

No branches or pull requests

2 participants