[Bug]: The submission_environment_dependencies.txt file does not get staged when running with Flink runner on Dataproc #32743

Open
1 of 17 tasks
liferoad opened this issue Oct 10, 2024 · 5 comments · Fixed by #32752
@liferoad
Collaborator

liferoad commented Oct 10, 2024

What happened?

In some cases, "submission_environment_dependencies.txt" might not be staged.

#32752 added a workaround to ignore the error for a missing artifact, but we should root-cause why it didn't get staged.

Issue Priority

Priority: 2 (default / most bugs should be filed as P2)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Infrastructure
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
@tvalentyn
Contributor

Do we crash? It seems like we should just print something:

bufLogger.Printf(ctx, "couldn't fetch the submission environment dependencies: %v", err)

@liferoad
Collaborator Author

Flink on Dataproc returns this:
failed to retrieve staged files: failed to retrieve /tmp/staged in 3 attempts: failed to retrieve chunk for /tmp/staged/submission_environment_dependencies.txt
And the job then failed.
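
To make the two behaviors in this thread concrete, here is a minimal Python sketch (not the actual Beam boot code, which is Go) that contrasts failing the job after three attempts with logging and skipping an optional artifact. The names `OPTIONAL_ARTIFACTS`, `retrieve_with_retries`, `retrieve_all`, and `fake_fetch` are hypothetical and exist only for this illustration; the choice of which artifacts count as optional is an assumption.

```python
import logging

logger = logging.getLogger("artifact_sketch")

# Assumption: artifacts that are safe to skip when they cannot be retrieved.
OPTIONAL_ARTIFACTS = {"submission_environment_dependencies.txt"}


def retrieve_with_retries(name, fetch, attempts=3):
    """Call fetch(name) up to `attempts` times, then raise with the last error."""
    last_error = None
    for _ in range(attempts):
        try:
            return fetch(name)
        except OSError as err:
            last_error = err
    raise RuntimeError(
        f"failed to retrieve {name} in {attempts} attempts") from last_error


def retrieve_all(names, fetch):
    """Retrieve every artifact; log and skip optional ones that fail."""
    retrieved, skipped = {}, []
    for name in names:
        try:
            retrieved[name] = retrieve_with_retries(name, fetch)
        except RuntimeError as err:
            if name not in OPTIONAL_ARTIFACTS:
                raise  # a required artifact is missing: fail the job
            logger.warning("couldn't fetch optional artifact: %s", err)
            skipped.append(name)
    return retrieved, skipped


def fake_fetch(name):
    # Hypothetical stand-in for the artifact service: only one file is staged.
    staged = {"requirements.txt": b"apache-beam\n"}
    if name not in staged:
        raise FileNotFoundError(f"failed to retrieve chunk for {name}")
    return staged[name]


retrieved, skipped = retrieve_all(
    ["requirements.txt", "submission_environment_dependencies.txt"], fake_fetch)
```

With this stand-in, the optional file ends up in `skipped` and the job continues, while a missing required artifact would still raise.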

@tvalentyn
Contributor

OK, then the problem is in the materialization of staged artifacts: we have a file that is added to the manifest but is not available when we try to materialize it.

It should either not be staged (and not included in the manifest), or be available in the staging location.
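
The invariant described here (every manifest entry must exist in the staging location) can be sketched as a simple consistency check. This is an illustration only, assuming the manifest is a plain list of file names and the staging location is a local directory; `missing_from_staging` is a hypothetical helper, not a Beam API.

```python
import os
import tempfile


def missing_from_staging(manifest, staging_dir):
    """Return manifest entries that have no corresponding file in staging_dir."""
    return [
        name for name in manifest
        if not os.path.isfile(os.path.join(staging_dir, name))
    ]


# Hypothetical staging directory: one artifact is staged, the other is only listed.
with tempfile.TemporaryDirectory() as staging_dir:
    with open(os.path.join(staging_dir, "requirements.txt"), "w") as f:
        f.write("apache-beam\n")

    manifest = ["requirements.txt", "submission_environment_dependencies.txt"]
    missing = missing_from_staging(manifest, staging_dir)
    print(missing)  # ['submission_environment_dependencies.txt']
```

A non-empty result corresponds to the situation in this issue: the file is in the manifest but cannot be materialized.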

@tvalentyn
Contributor

Workaround: supply --experiments=disable_logging_submission_environment

'disable_logging_submission_environment')):

@github-actions github-actions bot added this to the 2.61.0 Release milestone Oct 11, 2024
@tvalentyn tvalentyn reopened this Oct 11, 2024
@tvalentyn tvalentyn changed the title from "[Bug]: when submission_environment_dependencies.txt somehow does not exist, the error should be ignored" to "[Bug]: The submission_environment_dependencies.txt file does not get staged when running with Flink runner on Dataproc" Oct 11, 2024
@tvalentyn tvalentyn removed this from the 2.61.0 Release milestone Oct 11, 2024
@liferoad liferoad self-assigned this Oct 11, 2024
@liferoad
Collaborator Author

Thanks, @tvalentyn. Let me investigate this later when I have time.
