Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set Kedro Viz start method to spawn process while launching it from Jupyter Notebook #1696

Merged
merged 6 commits into from
Jan 12, 2024

Conversation

ravi-kumar-pilla
Copy link
Contributor

@ravi-kumar-pilla ravi-kumar-pilla commented Jan 2, 2024

Description

Resolves #1656

Development notes

  • Add multiprocess start method to spawn
  • Add mypy type ignore for add_route and add_websocket_route due to strict mypy check introduced here in starlette

QA notes

  • All tests should pass
  • Launch Kedro Viz via jupyter notebook
  • Run line magic %run_viz

Checklist

  • Read the contributing guidelines
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added new entries to the RELEASE.md file
  • Added tests to cover my changes

Signed-off-by: ravi-kumar-pilla <ravi_kumar_pilla@mckinsey.com>
Signed-off-by: ravi-kumar-pilla <ravi_kumar_pilla@mckinsey.com>
Signed-off-by: ravi-kumar-pilla <ravi_kumar_pilla@mckinsey.com>
Signed-off-by: ravi-kumar-pilla <ravi_kumar_pilla@mckinsey.com>
@ravi-kumar-pilla ravi-kumar-pilla changed the title Fix fork process start on Kedro Viz Jupyter launcher Set Kedro Viz start method to spawn process while launching it from Jupyter Notebook Jan 4, 2024
Signed-off-by: ravi-kumar-pilla <ravi_kumar_pilla@mckinsey.com>
Signed-off-by: ravi-kumar-pilla <ravi_kumar_pilla@mckinsey.com>
@ravi-kumar-pilla ravi-kumar-pilla marked this pull request as ready for review January 5, 2024 07:35
@ravi-kumar-pilla ravi-kumar-pilla requested review from jitu5 and merelcht and removed request for tynandebold and yetudada January 5, 2024 07:35
Copy link
Contributor

@rashidakanchwala rashidakanchwala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. thanks

Copy link
Contributor

@noklam noklam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine to me, but my question is does it matter to viz to have the Spark context initialised? Is it being used to read data?

@rashidakanchwala
Copy link
Contributor

Looks fine to me, but my question is does it matter to viz to have the Spark context initialised? Is it being used to read data?

Ravi would have more clarity on this but from what it looks like is that if Spark context is not intiatilised; the spark dataset is loaded as a MemoryDataSet on viz.

@ravi-kumar-pilla
Copy link
Contributor Author

Looks fine to me, but my question is does it matter to viz to have the Spark context initialised? Is it being used to read data?

Hi Nok, It should not matter to viz but in Viz 6.7.0 we introduced a dataset exists check which forced the discovery of all the datasets. Since on unix forked process is default, this check caused an issue as viz process is a copy. Starting Viz as a new process would not have this dependency. The issue is explained here in detail.

Also, for your information the dataset exists check was removed in v7.0.0 as the check introduced other time out issues. So the Spark re-initialization issue should be solved in v7.0.0. However, this change will force viz to always start as a new process irrespective of the OS.

Thank you

Copy link
Member

@merelcht merelcht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@ravi-kumar-pilla ravi-kumar-pilla merged commit 92836db into main Jan 12, 2024
18 checks passed
@ravi-kumar-pilla ravi-kumar-pilla deleted the fix/forked-process branch January 12, 2024 10:01
@jitu5 jitu5 mentioned this pull request Jan 22, 2024
5 tasks
@ravi-kumar-pilla ravi-kumar-pilla mentioned this pull request Oct 14, 2024
1 task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SparkContext not reinitialized due to forked process start
4 participants