
Bulk rerun work orders (from start) #659

Closed
5 tasks
amberrignell opened this issue Feb 22, 2023 · 31 comments · Fixed by #865
amberrignell (Contributor) commented Feb 22, 2023

User story

As a user who had a workflow that was running successfully, and who then realised the logic isn't what I wanted, I would like to be able to bulk reprocess workflows, so that each work order can be rerun with my updated logic.

Details

Allow users to select work orders from the history page
Allow users to reprocess the work orders they have selected

  1. Selecting work orders
    Using the following component, add a checkbox column to the history page.
  • There should be a checkbox in the first column (to the left of 'Workflow').
  • Every work order should have a checkbox, unselected by default
  • When the user selects the top-level checkbox in the column heading, all work orders on the page get selected
  • A user can select several individual work orders using the checkboxes within the work order rows

image

If a user has selected some work orders, then changes the filters, their selection is lost.

  2. Reprocessing work orders
  • When a user selects either the top-level checkbox OR work order level checkboxes, a 'rerun from start' button should appear.
  • When the user clicks 'Rerun from start', the user should see a pop up modal which confirms which work orders have been selected (how many work orders will be reprocessed, and the search filters used to select these work orders)

Use the following modal component

A. Top-level select
Message: You've selected the number_of_workorders_on_page work orders from page page_number of results for work orders matching the following query: search_params. There are a total of number_of_workorders_matching_query work orders matching this query.

Example: You've selected 12 work orders from page 1 of results for work orders matching the following query: received between x and y date, for workflow A, with a status of 'Failed'. There are a total of 32 work orders matching this query.

Buttons: Cancel, Rerun all work orders on this page from start, Rerun all work orders matching the query.

Screenshot 2023-04-12 at 10.28.24.png

B. Individual work orders selected
Message: You've selected number_of_selected_work_orders work orders to rerun from the start (first job). This will create a new attempt for these within the same work order.
Example: You've selected 12 work orders to reprocess from the start (first job). This will create a new attempt for these within the same work order.
Buttons: Cancel, Rerun from start

Screenshot 2023-04-12 at 10.31.21.png

  • When the user clicks 'Rerun from start', the user gets an alert "xx work orders will be bulk rerun" (so that they have feedback that their action was applied), and the work orders all get reprocessed in the same order in which they were received.

Implementation notes

Please read through the platform-app implementation first
Write a function that can take a list of work order IDs and create a new attempt for each one, in the order in which they have been passed to the function.
Write a function that can take a query, and create a new attempt for each work order returned by the query, in the order in which they are returned.

The rerun_run_from_start function's parameters would be:
source: (list of IDs, or parameters)
user: (used to add to the invocation reason)
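The two functions described above could be sketched as follows. This is a minimal illustration, not Lightning's actual implementation; the names `rerun_from_start/3`, the `{:ids, ...}` / `{:query, ...}` source shapes, and the injected `fetch` callback are all assumptions made for the example.

```elixir
defmodule BulkRerunSketch do
  # Work orders must be reprocessed in the order in which they were
  # received, so sort by insertion timestamp before creating attempts.
  def order_for_rerun(work_orders) do
    Enum.sort_by(work_orders, & &1.inserted_at, NaiveDateTime)
  end

  # `source` is either a list of IDs or query params; `user` would be
  # recorded in each new attempt's invocation reason. `fetch` stands in
  # for the real lookup (a Repo query) and may return rows in any order.
  def rerun_from_start(source, user, fetch) do
    case source do
      {:ids, ids} -> fetch.(ids)
      {:query, params} -> fetch.(params)
    end
    |> order_for_rerun()
    |> Enum.map(fn wo -> {:new_attempt, wo.id, user} end)
  end
end
```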

Release notes

  • Bulk rerun work orders from start

User acceptance criteria

  • I can 'select all' work orders on the history page
  • Work orders get reprocessed in the order in which they were received
  • When I click 'Rerun from start', I see a pop-up which confirms how many work orders will be reprocessed and the search filters used to select these work orders (e.g. Reprocess 12 work orders which were received between x and y date, for workflow A, with a status of 'Failed')
  • If a user changes the filters, they lose their selected items.
  • If I click the top-level select, all work orders are selected. If I then unselect one, the top-level select is also unselected. If I unselect from the top level, all of them are unselected.
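The select-all behaviour in the criteria above can be captured as pure state transitions. A minimal sketch, assuming selection is tracked as a MapSet of work order IDs; the module and function names are illustrative, not Lightning's actual code:

```elixir
defmodule SelectionSketch do
  # Top-level checkbox: selects every work order on the page, or clears all.
  def toggle_all(_selected, page_ids, checked?) do
    if checked?, do: MapSet.new(page_ids), else: MapSet.new()
  end

  # Row-level checkbox: flip membership for a single work order.
  def toggle_one(selected, id) do
    if MapSet.member?(selected, id) do
      MapSet.delete(selected, id)
    else
      MapSet.put(selected, id)
    end
  end

  # The header checkbox is checked only while every row on the page is
  # selected, so unselecting one row also unchecks it.
  def all_selected?(selected, page_ids) do
    page_ids != [] and Enum.all?(page_ids, &MapSet.member?(selected, &1))
  end

  # Changing the filters discards the current selection.
  def reset_on_filter_change(_selected), do: MapSet.new()
end
```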
@amberrignell amberrignell self-assigned this Feb 22, 2023
@taylordowns2000 taylordowns2000 added this to the RC1 milestone Apr 1, 2023
taylordowns2000 (Member) commented Apr 6, 2023

@amberrignell , tailwind component for this where employee rows are workorders : https://tailwindui.com/components/application-ui/lists/tables#component-390db22a862687fe88b313692989ffaf

note that this requires some JS

image

@amberrignell amberrignell changed the title Bulk reprocess work orders Bulk rerun work orders Apr 11, 2023
@amberrignell amberrignell changed the title Bulk rerun work orders Bulk rerun work orders (from start) Apr 27, 2023
@taylordowns2000 taylordowns2000 removed their assignment Apr 27, 2023
taylordowns2000 (Member) commented

Useful naming content:
image

taylordowns2000 (Member) commented

Possible approach:

  1. See how "rerun" works currently in the liveview and/or service.
  2. Determine how a "batch" or a "query" of work orders could all be "rerun" using the current backend (maybe: is there a function that takes a list of work order IDs and creates a new attempt for each work order from the first job in the workflow with the initial state?).
  3. Write that function.
  4. Then, consider how to pass work order IDs from the UI (when I've selected 3 work orders) to that function.
  5. For the modal, consider how to use the applied query (or "total" count as opposed to the paginated view) to give the user the option to send just the IDs they selected or ALL IDs regardless of current pagination.
  6. Send them for execution, display a toast notification.

sigu (Collaborator) commented Jun 12, 2023

Jun-11-2023 21-08-32

  1. How would you like the modal to be rendered, using the petal components or using liveview components?
  2. We already have the list of work orders to be rerun.
  3. Maybe get the first attempt and first run, then rerun; use inserted_at to determine the oldest of each.
  4. WorkOrderService looks like a good place to restart the job from. I am currently thinking of calling retry_attempt_run(attempt_run, user), passing it an attempt run determined by 3 above.

sigu (Collaborator) commented Jun 12, 2023

Screenshot 2023-06-11 at 21 11 57

Why do we need the following tables?

  1. Attempt runs - currently assuming it's a join table of attempts and runs
  2. Jobs - what's the big difference between this and a Run?
  3. Dataclips - no idea at the moment; if we need to adjust anything in it I'll need an explanation.

For this assignment, should we change the invocation reason? (I am still not sure what an invocation reason is; I just inferred from the name.)

amberrignell (Contributor, Author) commented

Hey @sigu , thanks for the questions!

To give a quick answer,

  1. Correct, attempt runs are a join table of attempts and runs (this is because two different attempts could have the same runs, if they get copied over for a "rerun from job 2", for example).
  2. A job is the adaptor, credential and job expression that a user has defined. A run is when that job has been run with some data (state).
  3. A dataclip is the data that a job is run with.

We'll see you in standup tomorrow so Stu can give you more details then, hope this helps in the meantime!

midigofrank (Collaborator) commented

Hey @amberrignell @taylordowns2000, should the 'Rerun from start' button get disabled / hidden when you don't have permission? Or should we allow users to click around and get the notification error message?

taylordowns2000 (Member) commented

Hey @midigofrank, who doesn't have permission? (Have you created a policy for this action? Sorry, I have not seen anything related to permissions in the issue.)

midigofrank (Collaborator) commented

Hey @taylordowns2000, I noticed there's a check for can_rerun_job when rerunning a single job. I thought maybe we could use the same check for the bulk rerun:

  def handle_event(
        "rerun",
        %{"attempt_id" => attempt_id, "run_id" => run_id},
        socket
      ) do
    if socket.assigns.can_rerun_job do
      attempt_id
      |> AttemptService.get_for_rerun(run_id)
      |> WorkOrderService.retry_attempt_run(socket.assigns.current_user)

      {:noreply, socket}
    else
      {:noreply,
       socket
       |> put_flash(:error, "You are not authorized to perform this action.")}
    end
  end

taylordowns2000 (Member) commented

Cool. For now please allow the error flash to appear as we don't have a design that explains why a user isn't able to run it. We can consider adding this later.

midigofrank (Collaborator) commented

Alright. Perfect

midigofrank (Collaborator) commented

Hey @taylordowns2000, how would you suggest we handle failures? Let's say a single attempt fails, should we abort all the other attempts?

taylordowns2000 (Member) commented

hi @midigofrank , great q 😄

No, the scope of this feature ends at Oban.insert. In other words, this feature is to "queue up" a bunch of attempts (same thing that clicking "rerun" really fast on a bunch of attempts does) but once they're on the Oban queue they're in a totally different domain.

taylordowns2000 (Member) commented

So... sorry, to be extra clear: if there is an error before you successfully enqueue those attempts with Oban.insert (what might this be???) you could display a flash message explaining that something broke and asking the user to try again, but I don't want to do that, since this would certainly be an unexpected error and I'd rather see it bubble up on Sentry and then be able to debug the code. Does that make sense?

midigofrank (Collaborator) commented Jun 14, 2023

Hey @taylordowns2000. That makes sense. Currently, I've wrapped all the actions (including the Oban.insert) for all the selected Work Orders to be retried in a single transaction. Assuming we have Work Orders A, B, C: should an error occur while still retrying A, then B and C won't get processed at all. Should that be the case?

taylordowns2000 (Member) commented

Sounds like we might need a quick call to nail this down—let's chat tomorrow! In the meantime:

When you say "while still retrying" do you mean "while you're attempting to add them to Oban" or "while they are running"? Assuming the first, I think what you actually want is the Oban.multi.

      reducer = fn {changesets, index}, multi ->
        Oban.insert_all(multi, "run_batch_#{index}", changesets)
      end

      runs
      |> Enum.chunk_every(5_000)
      |> Enum.with_index()
      |> Enum.reduce(Ecto.Multi.new(), reducer)
      |> Repo.transaction()

      conn
      |> put_status(:ok)
      |> text(Enum.count(runs))

What we can chat about tomorrow is that if the user asks to enqueue 43,000 attempts, it would be nice to either have them all get enqueued or have none of them get enqueued. (We don't know and don't care if the new attempts succeed or fail as they are processed over the next several days... our job is DONE and we report back to the end user "All good... 43,000 enqueued!" as soon as they hit the queue.)

taylordowns2000 (Member) commented Jun 15, 2023

@midigofrank , next steps here please:

  • rebase on main
  • get all tests passing with current implementation style (then ping me on github please!)
  • once tests are passing, optimize to use the Oban.insert_all so we're hitting the DB slightly less for a big bulk re-processing request

midigofrank (Collaborator) commented

Hey @taylordowns2000 , might you have an idea why the Mimic library isn't getting loaded while testing? I keep getting the error below when I run mix test:

** (UndefinedFunctionError) function Mimic.copy/1 is undefined (module Mimic is not available)

taylordowns2000 (Member) commented

I haven't seen that problem pop up for anyone yet. Please push your most recent changes to your branch (assuming you've rebased, also check that box above!) and I'll fire it up.

Tests run OK for me as of this commit:

commit 04cdafe004f5aec7b19e7973166235640493f3b5 (HEAD -> 659-bulk-rerun-work-orders, origin/659-bulk-rerun-work-orders)
Author: Frank Midigo <midigofrank@gmail.com>
Date:   Wed Jun 14 12:58:21 2023 +0300

midigofrank (Collaborator) commented

Sure thing. Just pushed.

taylordowns2000 (Member) commented

Eish, sorry @midigofrank, it looks OK on my machine and on CircleCI. I'm not sure I'll be able to help.

What OS are you using? Have you seen this issue on other Elixir projects? Maybe try asdf install again just to make sure? @sigu , any ideas on this one?

image

midigofrank (Collaborator) commented

Hey @taylordowns2000, I'm on an M1 Mac.

midigofrank (Collaborator) commented

Let me try it on a different OS

taylordowns2000 (Member) commented

Hrmmm I'm on an M1 also.

midigofrank (Collaborator) commented

@taylordowns2000 , I've been able to run the tests by setting MIX_ENV=test. I thought mix test does that for us 🤔

taylordowns2000 (Member) commented Jun 16, 2023 via email

midigofrank (Collaborator) commented

Aha, you're right. I made a copy of .env.example and MIX_ENV had been set to dev. Let me get rid of it.

midigofrank (Collaborator) commented Jun 16, 2023

Hey @taylordowns2000 , all tests are passing except for one I've added for the liveview page. This only happens when I have the Oban.insert(Pipeline Job) code as part of the Multi which does everything else.

could not checkout the connection owned by #PID<0.1124.0>

EDIT:
Full stack trace

** (EXIT from #PID<0.1124.0>) an exception was raised:
         ** (RuntimeError) ChildProcess task exited without a value:
     {%DBConnection.ConnectionError{message: "could not checkout the connection owned by #PID<0.1124.0>. When using the sandbox, connections are shared, so this may imply another process is using a connection. Reason: connection not available and request was dropped from queue after 510ms. You can configure how long requests wait in the queue using :queue_target and :queue_interval. See DBConnection.start_link/2 for more information", severity: :error, reason: :queue_timeout}, [{Ecto.Adapters.SQL, :raise_sql_call_error, 1, [file: 'lib/ecto/adapters/sql.ex', line: 913, error_info: %{module: Exception}]}, {Ecto.Repo.Schema, :apply, 4, [file: 'lib/ecto/repo/schema.ex', line: 756]}, {Ecto.Repo.Schema, :"-do_update/4-fun-3-", 15, [file: 'lib/ecto/repo/schema.ex', line: 459]}, {Lightning.Invocation, :update_run, 2, [file: 'lib/lightning/invocation.ex', line: 272]}, {Lightning.Pipeline.Runner.Handler, :"-start/2-fun-1-", 3, [file: 'lib/lightning/runtime/handler.ex', line: 74]}, {Task.Supervised, :invoke_mfa, 2, [file: 'lib/task/supervised.ex', line: 89]}, {Task.Supervised, :reply, 4, [file: 'lib/task/supervised.ex', line: 34]}, {:proc_lib, :init_p_do_apply, 3, [file: 'proc_lib.erl', line: 240]}]}
             (lightning 0.6.3) lib/lightning/runtime/handler.ex:122: Lightning.Pipeline.Runner.Handler.wait/1
             (lightning 0.6.3) lib/lightning/pipeline.ex:29: Lightning.Pipeline.process/1

midigofrank (Collaborator) commented

Hey @taylordowns2000, I found it easier to just move the implementation to "bulk mode" and leave the Oban.insert operation out of the main transaction.
All tests are now passing.

taylordowns2000 (Member) commented

Great! Please convert this to a full PR (it's a "draft" right now) and request a review from me and @stuartc! We'll take a look tomorrow AM. Do you know if @sigu is joining our final call tomorrow AM?

also, in prep for that call, have a look at the failed checks related to code coverage. thanks @midigofrank and speak soon

midigofrank (Collaborator) commented

Got it. Given his current timezone, I'm not so sure. I'll confirm with him and let you know

This issue was closed.