Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Review and revise exception handling within Action Scheduler #1059

Open
alewolf opened this issue May 14, 2024 · 21 comments
Open

Review and revise exception handling within Action Scheduler #1059

alewolf opened this issue May 14, 2024 · 21 comments
Assignees
Labels
priority: normal The issue/PR is normal priority—not many people are affected or there’s a workaround, etc.

Comments

@alewolf
Copy link

alewolf commented May 14, 2024

There are quite a few occasions when Action Scheduler experiences an unexpected condition, and (mostly for safety) it stops. It communicates about this using exceptions, but many of these exceptions:

  • Are not handled by Action Scheduler.
  • The design does not make it easy for parent plugins to handle them.

In most cases, this is 'fine' in the sense that the queue is processed in requests that are independent of user-facing requests. However, they could impact other scheduled events when running in the context of WP Cron.

Additionally, there is the problem noted below which is that only one version of Action Scheduler can be active at any one time. That is, only one parent plugin 'wins' this particular race, and if things go wrong—irrespective of the underlying reason—the backtraces often lead users back to that plugin's support team.

Let's review and revise when we throw exceptions, and when we should add handling for them.

  • We could consider logging deadlocks etc in Action Scheduler's own logging table (perhaps by referencing an action_id of 0 to indicate they are not related to a specific action). If we do this, we would also want to add:
    • A means of viewing any such global log entries.
    • Possibly, hooks that fire when these conditions are encountered.
  • There are other cases where we bail out for safety, but possibly are being 'too safe'. For instance, when cleaning up expired actions: if the action being removed is concurrently removed by another process, we currently bail out of the clean-up process. This is probably overkill.
  • Decisions like the above could also be made configurable/overrideable.

Original report follows...


I bundle the ActionScheduler version 3.7.4 in the Pixel Manager for WooCommerce.

As of recently we started to receive critical error reports like this:

2024-05-13T08:24:09+00:00 Critical Uncaught InvalidArgumentException: Unidentified action 19061390 in /nas/content/live/nck/wp-content/plugins/pixel-manager-pro-for-woocommerce/vendor/woocommerce/action-scheduler/classes/data-stores/ActionScheduler_DBStore.php:1086  
Additional context
{
    "error": {
        "type": 1,
        "file": "\/nas\/content\/live\/nck\/wp-content\/plugins\/pixel-manager-pro-for-woocommerce\/vendor\/woocommerce\/action-scheduler\/classes\/data-stores\/ActionScheduler_DBStore.php",
        "line": 1086
    },
    "backtrace": [
        "",
        "#0 \/nas\/content\/live\/nck\/wp-content\/plugins\/pixel-manager-pro-for-woocommerce\/vendor\/woocommerce\/action-scheduler\/classes\/abstracts\/ActionScheduler_Abstract_QueueRunner.php(120): ActionScheduler_DBStore->mark_failure(19061390)",
        "#1 \/nas\/content\/live\/nck\/wp-content\/plugins\/pixel-manager-pro-for-woocommerce\/vendor\/woocommerce\/action-scheduler\/classes\/abstracts\/ActionScheduler_Abstract_QueueRunner.php(98): ActionScheduler_Abstract_QueueRunner->handle_action_error(19061390, Object(Exception), 'Async Request', true)",
        "#2 \/nas\/content\/live\/nck\/wp-content\/plugins\/pixel-manager-pro-for-woocommerce\/vendor\/woocommerce\/action-scheduler\/classes\/ActionScheduler_QueueRunner.php(169): ActionScheduler_Abstract_QueueRunner->process_action(19061390, 'Async Request')",
        "#3 \/nas\/content\/live\/nck\/wp-content\/plugins\/pixel-manager-pro-for-woocommerce\/vendor\/woocommerce\/action-scheduler\/classes\/ActionScheduler_QueueRunner.php(139): ActionScheduler_QueueRunner->do_batch(25, 'Async Request')",
        "#4 \/nas\/content\/live\/nck\/wp-includes\/class-wp-hook.php(324): ActionScheduler_QueueRunner->run('Async Request')",
        "#5 \/nas\/content\/live\/nck\/wp-includes\/class-wp-hook.php(348): WP_Hook->apply_filters('', Array)",
        "#6 \/nas\/content\/live\/nck\/wp-includes\/plugin.php(517): WP_Hook->do_action(Array)",
        "#7 \/nas\/content\/live\/nck\/wp-content\/plugins\/pixel-manager-pro-for-woocommerce\/vendor\/woocommerce\/action-scheduler\/classes\/ActionScheduler_AsyncRequest_QueueRunner.php(52): do_action('action_schedule...', 'Async Request')",
        "#8 \/nas\/content\/live\/nck\/wp-content\/plugins\/woocommerce\/includes\/libraries\/wp-async-request.php(147): ActionScheduler_AsyncRequest_QueueRunner->handle()",
        "#9 \/nas\/content\/live\/nck\/wp-includes\/class-wp-hook.php(324): WP_Async_Request->maybe_handle('')",
        "#10 \/nas\/content\/live\/nck\/wp-includes\/class-wp-hook.php(348): WP_Hook->apply_filters('', Array)",
        "#11 \/nas\/content\/live\/nck\/wp-includes\/plugin.php(517): WP_Hook->do_action(Array)",
        "#12 \/nas\/content\/live\/nck\/wp-admin\/admin-ajax.php(207): do_action('wp_ajax_nopriv_...')",
        "#13 {main}",
        "thrown"
    ]
}
  1. In one particular case that I analysed the scheduled action didn't even come from the Pixel Manager. But, WooCommerce and the Pixel Manager both bundle version 3.7.4 of the ActionScheduler. It seems that the version shipped by the Pixel Manager is loaded first (probably because of the alphabetical order). The error path shows the Pixel Manager and we get all support requests.
  2. There is no helpful information in that error. In the case I analysed the action 19061390 exists and has just failed to execute.
  3. What's even more annoying is that one of our customers reached out to WooCommerce support and they reverted him to us, because the error was thrown from the ActionScheduler that's shipped in the Pixel Manager. Even though the action probably was not even scheduled by the Pixel Manager. Since WooCommerce is the maintainer of the ActionScheduler you should also support users who have issues with it.
  4. I will now revert back to a lower version of the ActionScheduler, just to avoid further support requests. But, you should really do something about this problem. All other plugins that ship version 3.7.4 will have the same issue. Unwarranted support requests because the ActionScheduler doesn't handle failed actions gracefully and logs them in the fatal-errors log.
@barryhughes
Copy link
Member

barryhughes commented May 17, 2024

Hi @alewolf,

Are you quite sure the problem is specifically tied to 3.7.4? This was a fairly minimal set of changes. What version did you switch to, and have you found that that has resolved the problems (at least in cases where WooCommerce, or other plugins shipping 3.7.4, are not present)?

There is no helpful information in that error.

We could probably finesse some of these, thank you for flagging. For what it's worth, errors of this type may point to a race condition between two queue runners, or between Action Scheduler's own queue runners and some custom code.

What's even more annoying is that one of our customers reached out to WooCommerce support and they reverted him to us, because the error was thrown from the ActionScheduler that's shipped in the Pixel Manager. Even though the action probably was not even scheduled by the Pixel Manager. Since WooCommerce is the maintainer of the ActionScheduler you should also support users who have issues with it.

I'm sure this can be annoying, but things are not always quite as clear cut as you are suggesting. We do support Action Scheduler, here and to a lesser extent in the WordPress community forums, but Action Scheduler is a conduit for processing work: errors surface through it, but not always because of it.

With your experience, you may understand when and how to make the distinction, but I'm not sure it's realistic to expect large support teams to make the right call 100% of the time. Even in this case, it's not completely clear that the problem has its origin in Action Scheduler.

I will now revert back to a lower version of the ActionScheduler, just to avoid further support requests. But, you should really do something about this problem. All other plugins that ship version 3.7.4 will have the same issue.

These exceptions are not new, and as above I'm not convinced they are tied to 3.7.4 specifically. Definitely let me know if I'm missing something obvious, here.

Unwarranted support requests because the ActionScheduler doesn't handle failed actions gracefully and logs them in the fatal-errors log.

Though I can take some guesses as to what you'd like to see here, I'd much rather get your informed opinion. When an unexpected condition is met, should it: fail silently, log the errors somewhere else, carry on regardless? When should we let site operators know of problems, and when should we suppress them?

@alewolf
Copy link
Author

alewolf commented May 18, 2024

Hi @barryhughes

Are you quite sure the problem is specifically tied to 3.7.4? This was a fairly minimal set of changes. What version did you switch to, and have you found that that has resolved the problems (at least in cases where WooCommerce, or other plugins shipping 3.7.4, are not present)?

I upgraded from 3.7.2 to 3.7.4, then the support requests started.

I can't say if it is bound to 3.7.4 or which version exactly. I only can say when upgrading to the same version as what's bundled in WooCommerce, then my bundled version will take precedence. And whatever code triggers those errors, users will contact our support because the ActionScheduler loads from our bundle and the Pixel Manager is visible in the path.

These exceptions are not new, and as above I'm not convinced they are tied to 3.7.4 specifically. Definitely let me know if I'm missing something obvious, here.

As explained in my earlier post, I've downgraded to one version below WooCommerce. As expected it will now hit another install. Now Rank Math will get those support requests:

Uncaught InvalidArgumentException: Unidentified action 19146968 in /nas/content/live/nck/wp-content/plugins/seo-by-rank-math/vendor/woocommerce/action-scheduler/classes/data-stores/ActionScheduler_DBStore.php:1086

Though I can take some guesses as to what you'd like to see here, I'd much rather get your informed opinion. When an unexpected condition is met, should it: fail silently, log the errors somewhere else, carry on regardless? When should we let site operators know of problems, and when should we suppress them?

  • Add more information to the exception message. I only can say that there are more failed ActionScheduler tasks, but the exception triggers only on a subset. I looked into the database and compared a few. I couldn't deduct a pattern that would explain why the exception triggers on some.

  • If WooCommerce is active you could log it into a separate source, not into the fatal-error log.

  • I dont' think it should fail silently. You need the telemetry to be able to find what's going on.

  • When more than one plugin is loading the same version of the ActionScheduler as the one bundled in WooCommerce, then the one bundled with WooCommerce should take precedence.

@barryhughes
Copy link
Member

barryhughes commented May 23, 2024

  • When more than one plugin is loading the same version of the ActionScheduler as the one bundled in WooCommerce, then the one bundled with WooCommerce should take precedence.

This doesn't feel like the right direction, to me.

I'm curious, though: in the case of your plugin it seems like WooCommerce is a requirement. What prevents you from using the WC Queue, or from using WooCommerce's copy of Action Scheduler directly (vs bundling your own)?

@alewolf
Copy link
Author

alewolf commented May 23, 2024

What prevents you from using the WC Queue, or from using WooCommerce's copy of Action Scheduler directly (vs bundling your own)?

Good question.

Technically the Pixel Manager can also be used on installs where WooCommerce is not running (because some of our users don't want to use more than one tagging solution).

Until now I haven't thought about loading the Action Scheduler conditionally, depending if WooCommerce is active or not.

However, I fear that it might not be possible to implement conditional loading.

The Action Scheduler needs to be required very early in the loading process. If I understand correctly this is to make sure that the Action Scheduler can check if only one, and only the most recent version is loaded, in case more than one plugin tries to load it. However, it is not possible to check if WooCommerce is active so early.

Let me know if you think this is possible. If so, I will change the logic and only load it conditionally and let WooCommerce take over.

@barryhughes
Copy link
Member

Technically the Pixel Manager can also be used on installs where WooCommerce is not running.

Gotcha, I misunderstood in that case.

Let me know if you think this is possible.

It probably is possible, but given the context I suspect the potential risk and complexity may not be worth the gain.

@vedanshujain vedanshujain added the priority: normal The issue/PR is normal priority—not many people are affected or there’s a workaround, etc. label May 27, 2024
@barryhughes
Copy link
Member

Occasionally seeing a failed action with this message:

action failed via WP Cron: Unable to update the status of action ###### to in-progress.

Did some Googling etc but not able to find any real explanation or solution.

This was reported on the community forums. Noting it here because, similar to the work we did in #1060, we can probably add additional context to make things clearer.

@barryhughes barryhughes changed the title Version 3.7.4 regularly throws InvalidArgumentException Review and revise exception handling within Action Scheduler Aug 12, 2024
@barryhughes
Copy link
Member

Updated title and description. We're hoping to look at options here soon.

@crstauf
Copy link
Contributor

crstauf commented Aug 13, 2024

@barryhughes Would this error be one that you would expect an author of a plugin (bundling AS) to handle?

Uncaught InvalidArgumentException: Unidentified action 19146968

That is to say, if a customer came to WooCommerce support about the above error, would your internal process direct to investigate further, or simply send away to whichever plugin bundled Action Scheduler?

Certainly there are errors that are because of Action Scheduler, and not because of the bundling plugin. Perhaps including in the error message direction on who is the best first contact would be helpful. Off the top of my head, this could potentially be determined by checking $wp_filter[ $hook ] for the callbacks on the scheduled action.

I think @alewolf has a very valid point, that plugin support teams can potentially get flooded with support requests that are not related to their plugin, and everyone involved should be do all they can to keep that from happening, especially Action Scheduler, being the common technology. If it's not handled well, then plugin authors will do exactly as @alewolf has done, and use older versions in an attempt to avoid the support burden. Not exactly a winning approach to encourage developers to use the latest version.

@crstauf
Copy link
Contributor

crstauf commented Aug 13, 2024

@alewolf In terms of conditionally loading, the introduction of Plugin Dependencies in WP 6.5 may give you a more viable approach:

/**
 * Plugin name: Pixel Manager for WooCommerce
 * Requires Plugins: action-scheduler
 */

...

if ( version_compare( $wp_version, '6.5' ) < 0) {
    // require bundled AS
}

This does add an additional step for site admins, but is an option to use the latest version while also alleviating support tickets. As a site admin, I always prefer installing AS explicitly anyways; maybe there are more like me.

@barryhughes
Copy link
Member

Certainly there are errors that are because of Action Scheduler

Definitely. Others are not, and yet others occupy a grey area somewhere in-between (and relate to the volume of work being processed, alongside the available resources).

I think @alewolf has a very valid point.

Yep, so do we, hence the latest update.

@crstauf
Copy link
Contributor

crstauf commented Aug 14, 2024

Yep, so do we, hence the latest update.

Oh sorry, I thought this ticket was generalized to changing exception handling and messages, not specific to alleviating false support tickets.

@barryhughes
Copy link
Member

Yes—that's the primary technical focus. The support aspect is one of the motivations for doing this, and something we'll look to alleviate through those changes.

@barryhughes
Copy link
Member

barryhughes commented Aug 23, 2024

ActionScheduler_ActionFactory::repeat()
    classes/ActionScheduler_ActionFactory.php:233

↑ This is already caught, by ActionScheduler_Abstract_QueueRunner::schedule_next_instance(), which is the only location where we call it. Third parties invoking this directly are responsible for their own exception handling 🟢

@barryhughes
Copy link
Member

barryhughes commented Aug 23, 2024

ActionScheduler_Store::validate_args()
    classes/abstracts/ActionScheduler_Store.php:324

↑ This is already caught, by ActionScheduler_DBStore::fetch_action(), which is the only location where we call it. Third parties invoking this directly are responsible for their own exception handling 🟢

(Edit: 2 exceptions are thrown from this method, but both are covered.)

@barryhughes
Copy link
Member

barryhughes commented Aug 23, 2024

ActionScheduler_Store::validate_action()
    classes/abstracts/ActionScheduler_Store.php:359

↑ This is already caught, by ActionScheduler_DBStore::save_action_to_db() Third parties invoking this directly are responsible for their own exception handling 🟢

@barryhughes
Copy link
Member

barryhughes commented Aug 23, 2024

ActionScheduler_Abstract_QueueRunner::process_action()
    classes/abstracts/ActionScheduler_Abstract_QueueRunner.php:64
ActionScheduler_Abstract_QueueRunner::process_action()
    classes/abstracts/ActionScheduler_Abstract_QueueRunner.php:95

↑ Both of the above are part of the current nested try/catch used for error handling. The exceptions are caught in the very same structure, in the same method (but we can probably simplify this now that we don't need to support PHP 5.6) 🟠

Let's review the nested try/catch. Low priority.

@barryhughes
Copy link
Member

ActionScheduler_Abstract_ListTable::get_bulk_actions()
    classes/abstracts/ActionScheduler_Abstract_ListTable.php:162

↑ Unclear who/what we're guarding against here. I'd think if a bulk action lacks a valid callback, we should log that but simply omit it from the returned array of possible actions.

Needs attention, low priority (I don't recall seeing reports about this specific exception) 🟠

@barryhughes
Copy link
Member

ActionScheduler_WPCLI_QueueRunner::__construct()
    classes/WP_CLI/ActionScheduler_WPCLI_QueueRunner.php:33

↑ This code should only run in the context of WP CLI. The exception seems justified, no follow-up needed at present 🟢

@barryhughes
Copy link
Member

ProgressBar::__construct()
    classes/WP_CLI/ProgressBar.php:47

↑ This code should only run in the context of WP CLI. The exception seems justified, no follow-up needed at present 🟢

@barryhughes
Copy link
Member

barryhughes commented Aug 23, 2024

ActionScheduler_DBStore::save_action_to_db()
    classes/data-stores/ActionScheduler_DBStore.php:113
ActionScheduler_DBStore::save_action_to_db()
    classes/data-stores/ActionScheduler_DBStore.php:118
ActionScheduler_DBStore::save_action_to_db()
    classes/data-stores/ActionScheduler_DBStore.php:126

↑ For all of the above, the exception is thrown from within a private method but we trigger this through various paths. In some cases, we catch the exception then log an error. In others, we do not and execution probably halts. Needs review 🔴

@barryhughes
Copy link
Member

ActionScheduler_DBStore::get_query_actions_sql()
    classes/data-stores/ActionScheduler_DBStore.php:397
ActionScheduler_DBStore::get_query_actions_sql()
    classes/data-stores/ActionScheduler_DBStore.php:455
ActionScheduler_DBStore::get_query_actions_sql()
    classes/data-stores/ActionScheduler_DBStore.php:470
ActionScheduler_DBStore::get_query_actions_sql()
    classes/data-stores/ActionScheduler_DBStore.php:493

↑ These are all within ActionScheduler_DBStore::get_query_actions_sql(). It's called from various public API functions like as_get_scheduled_actions() which lack protection against the exceptions. These particular exceptions seem somewhat reasonable, they guard against misuse that should generally be caught during development, but that can't be guaranteed. Needs review 🟠

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority: normal The issue/PR is normal priority—not many people are affected or there’s a workaround, etc.
Projects
None yet
Development

No branches or pull requests

4 participants