
Add retry mechanics to pallet-scheduler #3060

Merged: 44 commits, Feb 16, 2024

Conversation

georgepisaltu
Contributor

Fixes #3014

This PR adds retry mechanics to pallet-scheduler, as described in the issue above.

Users can now set a retry configuration for a task so that, in case its scheduled run fails, it will be retried after a number of blocks, for a specified number of times or until it succeeds.

If a retried task runs successfully before running out of retries, its remaining retry counter will be reset to the initial value. If a retried task runs out of retries, it will be removed from the schedule.

Tasks which need to be scheduled for a retry are still subject to weight metering and agenda space, same as a regular task. Periodic tasks will have their periodic schedule put on hold while the task is retrying.
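As an illustration of the mechanics described above, here is a minimal standalone sketch in plain Rust (not pallet code); the `RetryConfig` field names mirror the `total_retries`/`remaining`/`period` fields discussed later in this thread, while the function and its signature are invented for the example.

```rust
// Toy model of the retry semantics from the PR description (not the pallet's code).
#[derive(Clone, Copy, Debug)]
struct RetryConfig {
    total_retries: u8, // retries allowed by the configuration
    remaining: u8,     // retries left before the task is dropped
    period: u32,       // blocks to wait before the next retry attempt
}

/// What happens to a task after one of its runs: `Some((block, cfg))` means it stays
/// scheduled (next attempt at `block`), `None` means it is removed from the schedule.
fn after_run(now: u32, succeeded: bool, mut cfg: RetryConfig) -> Option<(u32, RetryConfig)> {
    if succeeded {
        // A successful run resets the remaining-retries counter to its initial value.
        cfg.remaining = cfg.total_retries;
        return Some((now, cfg)); // the task continues on its normal schedule
    }
    if cfg.remaining == 0 {
        // Out of retries: the task is removed from the schedule.
        return None;
    }
    cfg.remaining -= 1;
    // Failed run: retry `period` blocks later.
    Some((now + cfg.period, cfg))
}

fn main() {
    let cfg = RetryConfig { total_retries: 2, remaining: 2, period: 3 };
    let (when, cfg) = after_run(10, false, cfg).unwrap(); // failure at block 10
    assert_eq!((when, cfg.remaining), (13, 1));           // retried at block 13
    let (_, cfg) = after_run(when, true, cfg).unwrap();   // the retry succeeds
    assert_eq!(cfg.remaining, 2);                         // counter reset
}
```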

@georgepisaltu georgepisaltu self-assigned this Jan 25, 2024
@georgepisaltu georgepisaltu requested review from a team January 25, 2024 11:19
@mordamax
Contributor

bot bench-all pallet -v PIPELINE_SCRIPTS_REF=mak-bench-all-pallet --pallet=pallet_scheduler

@command-bot

command-bot bot commented Jan 25, 2024

@mordamax https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/5029575 was started for your command "$PIPELINE_SCRIPTS_DIR/commands/bench-all/bench-all.sh" --pallet=pallet_scheduler. Check out https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/pipelines?page=1&scope=all&username=group_605_bot to know what else is being executed currently.

Comment bot cancel 1-44309cf4-2563-41eb-9bf6-8c037436c2c7 to cancel this command or bot cancel to cancel all commands in this pull request.

@paritytech-review-bot paritytech-review-bot bot requested a review from a team January 25, 2024 15:47
@command-bot

command-bot bot commented Jan 25, 2024

@mordamax Command "$PIPELINE_SCRIPTS_DIR/commands/bench-all/bench-all.sh" --pallet=pallet_scheduler has finished. Result: https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/5029575 has finished. If any artifacts were generated, you can download them from https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/5029575/artifacts/download.

@mordamax
Contributor

bot help

@command-bot

command-bot bot commented Jan 26, 2024

Here's a link to docs

/// put on hold while the task is retrying.
#[pallet::call_index(7)]
#[pallet::weight(<T as Config>::WeightInfo::set_retry_named(T::MaxScheduledPerBlock::get()))]
pub fn set_retry_named(
Contributor

Maybe it's worth mentioning in the doc how it can be removed, and making sure we have a unit test for this removal.

Ok(_) => {},
Ok(new_address) => {
if let Some(RetryConfig { total_retries, remaining, period }) =
Retries::<T>::take((when, agenda_index))
Contributor

If the current iteration of a periodic task failed and a retry was scheduled, isn't this always `None`?

In the description you mention that the next periodic run is placed on hold. What about just letting the user decide? If a user does not want them to overlap, they can control it with the number of retries and the period. A user might not want their periodic tasks to be placed on hold because of a retry; to me, a retry has lower priority.
Also, after all retry attempts, it can be too late to schedule the next periodic iteration; I think in this case we lose the periodic task.
Without the hold, the logic looks more straightforward to me. As I mentioned in the issue, I think users should assess the possibility of overlap between a periodic task and retries on their own.

Contributor Author

If the current iteration of a periodic task failed and a retry was scheduled, isn't this always `None`?

If that happens, the function exits early here. But there is a corner case where there isn't enough weight to do the retry logic, and in that case retries are ignored; see the test here.

Contributor Author

Regarding the rest of your comment, I can see why you'd want the period and retry to work independently. I'm ok to make it do this:

  • if a task fails, try to schedule a retry if a retry config is set
  • if the task is periodic, try to schedule the next iteration

I think if we make this behavior configurable (i.e. let users decide which behaviour they want), it would get way too complicated and it's not worth it. I think it's better if it's simpler.

I think users should assess the possibility of overlapping between a periodic task and retries on their own

I was trying to avoid this because I saw it as a footgun, but I think you're right. If users set the configurations correctly, it shouldn't be an issue.
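For what it's worth, here is a tiny standalone sketch of the behavior agreed on above (toy types, not the actual `service_task` code; the names are made up): after a failed run, retry scheduling and periodic rescheduling happen independently instead of the period being put on hold.

```rust
// Toy model: follow-up scheduling after a failed run of a task at block `now`.
#[derive(Clone, Copy)]
struct Task {
    periodic: Option<u32>,     // period in blocks, if the task is periodic
    retry_period: Option<u32>, // retry delay in blocks, if a retry config is set
}

fn after_failed_run(now: u32, task: Task) -> Vec<u32> {
    let mut scheduled = Vec::new();
    // 1. The task failed: schedule a retry if a retry config is set.
    if let Some(delay) = task.retry_period {
        scheduled.push(now + delay);
    }
    // 2. Independently, schedule the next iteration if the task is periodic.
    if let Some(period) = task.periodic {
        scheduled.push(now + period);
    }
    scheduled
}

fn main() {
    let task = Task { periodic: Some(10), retry_period: Some(3) };
    // A failure at block 20 schedules a retry at block 23 and the next iteration at block 30.
    assert_eq!(after_failed_run(20, task), vec![23, 30]);
}
```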

Contributor Author

Proposed changes in 9da58c6

@@ -481,8 +499,15 @@ pub mod pallet {
 /// will be removed from the schedule.
 ///
 /// Tasks which need to be scheduled for a retry are still subject to weight metering and
-/// agenda space, same as a regular task. Periodic tasks will have their periodic schedule
-/// put on hold while the task is retrying.
+/// agenda space, same as a regular task. If a periodic task fails, it will be scheduled
Contributor

Maybe worth mentioning that we're talking about a possibly recoverable failure.

Contributor Author

My view on this is that we shouldn't mention in the pallet docs what type of calls this applies to, because the pallet itself cannot distinguish between a recoverable and an unrecoverable failure. Only a user can do that, and they should update their retry config accordingly.

@@ -1124,11 +1258,17 @@ impl<T: Config> Pallet<T> {
},
Err(()) => Err((Overweight, Some(task))),
Contributor

In this and the `Err` arm above, you abandon the retry entry if there is one.
I would move the retry handling one level up; for this you just need to add, for the `Ok` variant, information about whether it failed or not.
Or at least take it before `execute_dispatch`.
Please write tests for this.

Contributor Author

In this and the `Err` arm above, you abandon the retry entry if there is one

The Err arms above deal with completely different problems:

  • Err(Unavailable) is a task which may never be able to run and it may never even be attempted again
  • Err(Overweight) is a task which couldn't run because the scheduler ran out of weight, but it will be run the next time the scheduler services agendas; the scheduler runs agendas from previous blocks if they didn't execute completely, in order of block number from earliest up until present

This means that an overweight task is not abandoned and will be run again soon, but also that there is absolutely nothing that we can do about it. There is no fallback for the current block if we run out of weight. This is the scheduler's behavior before this PR and I believe the retry mechanism has nothing to do with this.

I would move the retry handling one level up; for this you just need to add, for the `Ok` variant, information about whether it failed or not.

Most of the logic is common to failed and successful tasks; there would be a lot of duplicated code. But if you just want this refactored, I could move the common logic out of the match.

Or at least take it before `execute_dispatch`.

Tasks are scheduled for retries if they actually run but fail in their execution. By definition, there is no way to handle this before running execute_dispatch.
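To restate that distinction in code, here is a minimal toy model (the enum and helper are invented for illustration, not the pallet's actual types): only a dispatch that actually ran and failed triggers retry scheduling, while `Unavailable` and `Overweight` outcomes leave the retry bookkeeping untouched.

```rust
// Toy model of the three outcomes discussed above.
#[derive(Clone, Copy)]
enum ServiceOutcome {
    Dispatched { failed: bool }, // the call ran; a failure here is what a retry reacts to
    Unavailable,                 // the call could not be fetched; it may never run
    Overweight,                  // not enough weight left; the task stays in the agenda
}

fn should_schedule_retry(outcome: ServiceOutcome, has_retry_config: bool) -> bool {
    matches!(outcome, ServiceOutcome::Dispatched { failed: true }) && has_retry_config
}

fn main() {
    assert!(should_schedule_retry(ServiceOutcome::Dispatched { failed: true }, true));
    // Overweight and unavailable tasks trigger no retry handling at all.
    assert!(!should_schedule_retry(ServiceOutcome::Overweight, true));
    assert!(!should_schedule_retry(ServiceOutcome::Unavailable, true));
}
```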

Contributor

I thought that the task is removed if `Unavailable`, but now I see that we keep it in storage. This is why I proposed to remove the retry as well. And I also see now that `Overweight` does not change its address.

Comment on lines 549 to 550
/// derived from the original task's configuration, but will have a lower value for
/// `remaining` than the original `total_retries`. Tasks scheduled as a result of a retry of
Contributor

Can you explain why you think it should be this way?
The way I think about it: I want some call to be executed 5 times, and when any of those runs fails, I want to retry it 3 times. I want to be sure that every run will be given a chance to be retried 3 times. If the first one exhausts all retries for some unexpected reason, I do not want to leave the remaining runs without retries. It is hard for me to model such a situation.

Contributor Author

The way you described it is the way it works.

A periodic task that fails at any point during its periodic execution will have a clone of itself scheduled as a retry. That clone will have a retry configuration attached, with `remaining` initially set to `total_retries - 1`. The original task still has its own unchanged retry configuration. The retry task will keep being retried until it runs out of retries. The periodic task keeps running periodically, potentially spawning other retry clones if it fails again.

For non-periodic tasks, there is no clone; they become the retry task themselves, because there is no point in keeping clones around for something which will never run again.

In short, the point of the doc comment is to say that a retry configuration with `remaining == total_retries` means it's the original task, while a retry configuration with `remaining < total_retries` means it's a retry clone.

I will try to make it clearer in the docs.
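A worked example of the clone semantics described above, with made-up numbers (the field names follow the `RetryConfig` shown earlier in this thread):

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
struct RetryConfig {
    total_retries: u8,
    remaining: u8,
    period: u32,
}

fn main() {
    // The original periodic task keeps its retry configuration unchanged.
    let original = RetryConfig { total_retries: 3, remaining: 3, period: 5 };
    // One of its runs fails, so a retry clone is scheduled `period` blocks later.
    let clone = RetryConfig { remaining: original.remaining - 1, ..original };
    assert_eq!(clone, RetryConfig { total_retries: 3, remaining: 2, period: 5 });
    // `remaining == total_retries` identifies the original task;
    // `remaining < total_retries` identifies a retry clone.
    assert!(original.remaining == original.total_retries);
    assert!(clone.remaining < clone.total_retries);
}
```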

Contributor Author

Done.

@muharem
Contributor

muharem commented Feb 13, 2024

I do not know your motivation for a PR from your fork, but it makes it harder for me to pull your branch and review it from my local machine.

Comment on lines +1388 to +1397
if weight
.try_consume(T::WeightInfo::schedule_retry(T::MaxScheduledPerBlock::get()))
.is_err()
{
Self::deposit_event(Event::RetryFailed {
task: (when, agenda_index),
id: task.maybe_id,
});
return;
}
Contributor

I meant that it would be better to have this in the `service_task` context. Right now `schedule_retry` is fallible/a no-op, which is not clear from the `service_task` context.

Contributor Author

The downside is that you can't know ahead of time in `service_task` whether the task will fail and whether it has a retry config. You could check that you can consume the weight anyway, but then you might be preventing users from running their tasks when it wasn't necessary.

@NachoPal NachoPal left a comment
Contributor

LGTM. Just wondering if you considered extending the `Scheduled` struct with the `RetryConfig` fields instead of creating a new `Retries` storage item. That way it would natively support retries at `schedule()` time, without the need to do it in two steps (`schedule()` + `set_retry()`).

@georgepisaltu
Contributor Author

LGTM. Just wondering if you considered extending the `Scheduled` struct with the `RetryConfig` fields instead of creating a new `Retries` storage item. That way it would natively support retries at `schedule()` time, without the need to do it in two steps (`schedule()` + `set_retry()`).

I considered it but decided against it. This is a niche mechanic which we don't expect to be widely used right now. Doing it in `Scheduled` would bloat existing `Scheduled` entries. It's not backwards compatible in terms of storage (it would require a migration) or API. It makes more sense for the few users of this mechanic to pay extra than for everyone to pay more for something that is not a core feature. If, in the future, we find that a lot of people are doing retries, we can reconsider in order to optimize costs.
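For context, a toy illustration of that trade-off in plain Rust, with a `BTreeMap` standing in for the separate `Retries` storage item (the key shape and field names are assumptions based on this thread, not the exact pallet types): tasks that never opt into retries carry no extra data.

```rust
use std::collections::BTreeMap;

type TaskAddress = (u32, u32); // (block number, index within that block's agenda)

#[derive(Debug)]
struct RetryConfig {
    total_retries: u8,
    remaining: u8,
    period: u32,
}

fn main() {
    // Analogue of a separate `Retries` map: only tasks that opted in have an entry,
    // so the common case pays nothing extra, at the cost of a second lookup.
    let mut retries: BTreeMap<TaskAddress, RetryConfig> = BTreeMap::new();
    retries.insert((4, 0), RetryConfig { total_retries: 3, remaining: 3, period: 5 });

    // A task with no retry configuration simply has no entry.
    assert!(retries.get(&(4, 1)).is_none());
    println!("{:?}", retries.get(&(4, 0)));
}
```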

@georgepisaltu georgepisaltu added this pull request to the merge queue Feb 16, 2024
Merged via the queue into paritytech:master with commit 9346019 Feb 16, 2024
129 of 130 checks passed
@georgepisaltu georgepisaltu deleted the retry-schedule branch February 16, 2024 12:07
bgallois pushed a commit to duniter/duniter-polkadot-sdk that referenced this pull request Mar 25, 2024
Labels
A0-needs_burnin: Pull request needs to be tested on a live validator node before merge. DevOps is notified via matrix.
I5-enhancement: An additional feature request.
T2-pallets: This PR/Issue is related to a particular pallet.
Projects
Status: Backlog
Development

Successfully merging this pull request may close these issues.

Scheduler: Retry a Task on a Call Failure
5 participants