Notebooks executed in Lab then papermill (nbclient) display incorrect durations #60

kevin-bates · 2022-01-13T02:36:42Z

While investigating elyra-ai/elyra#2387, I found that notebooks executed via papermill (which derives from nbclient) don't record shell.execute_reply.started in the cell's execution metadata. I'm assuming this is because shell.execute_reply.started is not in the notebook format spec. This results in incorrect duration results since shell.execute_reply.started does exist albeit relative to the last time the notebook was executed via Lab.

Since an execute_input response is essentially when the cell's execution is started (using ipykernel as an example), I'm hoping we could drop the use of shell.execute_reply.started and replace the cell's start calculation with the fallback code already in place.
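The proposal can be sketched roughly as follows. This is a minimal Python illustration, not the extension's actual code. It assumes the timing entries are stored under the cell's `metadata.execution` as ISO 8601 strings keyed by `iopub.execute_input` and `shell.execute_reply`. The helper names `parse_ts` and `cell_duration` are hypothetical, and the sample timestamps are made up.

```python
from datetime import datetime, timedelta
from typing import Mapping, Optional


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def cell_duration(execution: Mapping[str, str]) -> Optional[timedelta]:
    """Return end - start for one cell, or None if either time is missing.

    The start comes from iopub.execute_input only; a possibly stale
    shell.execute_reply.started entry is deliberately ignored.
    """
    start = execution.get("iopub.execute_input")
    end = execution.get("shell.execute_reply")
    if start is None or end is None:
        return None
    return parse_ts(end) - parse_ts(start)


# Hypothetical metadata: "started" is left over from an earlier Lab run,
# the other two entries were written by a later papermill/nbclient run.
execution = {
    "shell.execute_reply.started": "2022-01-12T10:00:00.000000Z",
    "iopub.execute_input": "2022-01-13T02:36:40.000000Z",
    "shell.execute_reply": "2022-01-13T02:36:42.500000Z",
}

# Duration computed against the stale start (the behavior reported here).
stale = parse_ts(execution["shell.execute_reply"]) - parse_ts(
    execution["shell.execute_reply.started"]
)
print(stale)
# Duration computed against execute_input (the proposed behavior).
print(cell_duration(execution))
```

With the stale entry the computed duration spans both runs, while the `execute_input` based calculation only covers the most recent execution.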

Would that be a change the project would be willing to accept?

Also, since execute_input may not be available (e.g., if silent is enabled, again in ipykernel), should the in clause be discarded if the duration cannot be determined?

nbclient also uses local timestamps for each of the timing entries, but that's a different issue.

mlucool · 2022-01-13T21:57:50Z

Thanks for the detailed issue!

> This results in incorrect duration results since shell.execute_reply.started does exist albeit relative to the last time the notebook was executed via Lab.

Based on above, it feels like something (papermill?) should clear all the execution metadata when rerunning a cell. This ensures that no old metadata is left around which may cause other bugs (e.g. at some point in time there is a new start with a previous end). If there is a valid reason that this is not possible, we'd be happy to accept a PR.
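As a rough illustration of the clearing step, here is a minimal Python sketch that operates on a notebook loaded as a plain dict. It is not the change that was eventually made in nbclient. The function name `clear_cell_timing` is hypothetical, and the sketch assumes the timing entries live under each code cell's `metadata.execution`.

```python
from typing import Any, Dict


def clear_cell_timing(cell: Dict[str, Any]) -> bool:
    """Drop metadata.execution from a code cell before it is re-run.

    Returns True if stale timing data was present and removed.
    """
    if cell.get("cell_type") != "code":
        return False
    metadata = cell.get("metadata", {})
    return metadata.pop("execution", None) is not None


# Hypothetical notebook content with timing data left over from a Lab run.
notebook = {
    "cells": [
        {
            "cell_type": "code",
            "source": "print('hi')",
            "metadata": {
                "tags": ["keep-me"],
                "execution": {
                    "shell.execute_reply.started": "2022-01-12T10:00:00.000000Z",
                    "shell.execute_reply": "2022-01-12T10:00:01.000000Z",
                },
            },
        },
        {"cell_type": "markdown", "source": "# Title", "metadata": {}},
    ]
}

# Clear each cell right before it would be executed again.
cleared = [clear_cell_timing(cell) for cell in notebook["cells"]]
print(cleared)
```

Only the `execution` entry is removed, so other cell metadata such as tags survives the re-run.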

> Also, since execute_input may not be available (e.g., if silent is enabled, again in ipykernel), should the in clause be discarded if the duration cannot be determined?

If there are cases where we can have an end but not start, we'd be happy to accept a PR to fix that wording!
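One way such a wording fix might look, sketched in Python for illustration only. The label text, the function name `execution_label`, and the choice to treat a start that is later than the end as "duration unknown" are all assumptions, not the extension's actual behavior.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def execution_label(end: datetime, start: Optional[datetime] = None) -> str:
    """Build the label, omitting the 'in ...' clause when no duration is known."""
    text = "Last executed at " + end.strftime("%Y-%m-%d %H:%M:%S")
    # No start, or a start that is later than the end (e.g. a new start
    # paired with a previous end): the duration cannot be determined.
    if start is None or start > end:
        return text
    seconds = (end - start).total_seconds()
    return f"{text} in {seconds:.2f}s"


end = datetime(2022, 1, 13, 2, 36, 42, tzinfo=timezone.utc)
start = end - timedelta(seconds=2.5)

print(execution_label(end, start))  # includes the duration clause
print(execution_label(end))         # end time only
```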

kevin-bates · 2022-01-13T23:17:40Z

Hi @mlucool,

> Based on above, it feels like something [nbclient] should clear all the execution metadata when rerunning a cell. This ensures that no old metadata is left around which may cause other bugs (e.g. at some point in time there is a new start with a previous end). If there is a valid reason that this is not possible, we'd be happy to accept a PR.

That's a fair point. Do you know if Lab also clears the timing data?

I only started here because the proposed change alone would fix the user experience, and it will be required anyway if Lab is updated to stop using shell.execute_reply.started.

mlucool · 2022-01-13T23:50:58Z

Yes. Lab clears the timing data when you clear the notebook via clear execution, and also when you execute a cell.

mlucool · 2022-01-19T19:45:06Z

Closing per jupyter/nbclient#195

This was referenced Jan 14, 2022

Clear execution metadata and use datetime from message header kevin-bates/nbclient#1 (Closed)

Clear execution metadata, prefer msg header date when recording times jupyter/nbclient#195 (Merged)

mlucool closed this as completed Jan 19, 2022
