Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Notebooks executed in Lab then papermill (nbclient) display incorrect durations #60

Closed
kevin-bates opened this issue Jan 13, 2022 · 4 comments

Comments

@kevin-bates
Copy link

kevin-bates commented Jan 13, 2022

While investigating elyra-ai/elyra#2387 I've found that notebooks executed via papermill (which derives from nbclient) doesn't record shell.execute_reply.started in the cell's execution metadata. I'm assuming this is because shell.execute_reply.started is not in the notebook format spec. This results in incorrect duration results since shell.execute_reply.started does exist albeit relative to the last time the notebook was executed via Lab.

Since an execute_input response is essentially when the cell's execution is started (using ipykernel as an example), I'm hoping we could drop the use of shell.execute_reply.started and replace the cell's start calculation with the fallback code already in place.

Would that be a change the project would be willing to accept?

Also, since execute_input may not be available (e.g., if silent is enabled, again in ipykernel), should the in clause be discarded if the duration cannot be determined?

nbclient also uses local timestamps for each of the timing entries, but that's a different issue.

@mlucool
Copy link
Member

mlucool commented Jan 13, 2022

Thanks for the detailed issue!

This results in incorrect duration results since shell.execute_reply.started does exist albeit relative to the last time the notebook was executed via Lab.

Based on above, it feels like something (papermill?) should clear all the execution metadata when rerunning a cell. This ensures that no old metadata is left around which may cause other bugs (e.g. at some point in time there is a new start with a previous end). If there is a valid reason that this is not possible, we'd be happy to accept a PR.

Also, since execute_input may not be available (e.g., if silent is enabled, again in ipykernel), should the in clause be discarded if the duration cannot be determined?

If there are cases where we can have an end but not start, we'd be happy to accept a PR to fix that wording!

@kevin-bates
Copy link
Author

Hi @mlucool,

Based on above, it feels like something [nbclient] should clear all the execution metadata when rerunning a cell. This ensures that no old metadata is left around which may cause other bugs (e.g. at some point in time there is a new start with a previous end). If there is a valid reason that this is not possible, we'd be happy to accept a PR.

That's a fair point. Do you know if Lab also clears the timing data?

I only started here because the user experience could be fixed with the proposed change alone which, if Lab is updated to discontinue using shell.execute_reply.started, will be required anyway.

@mlucool
Copy link
Member

mlucool commented Jan 13, 2022

It does when you clear the notebook via clear execution. It also does when you execute a cell.

@mlucool
Copy link
Member

mlucool commented Jan 19, 2022

Closing per jupyter/nbclient#195

@mlucool mlucool closed this as completed Jan 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants