vCPU allocation #126

Open
redhog opened this issue Dec 8, 2016 · 2 comments
Comments

@redhog
Contributor

redhog commented Dec 8, 2016

We have a curious case of vCPU under-allocation at a particular stage in our scio pipeline. At this stage, the vCPU allocation drops to 7, even though the pipeline is run with --maxNumWorkers=200.

We have tried running it with

--zone=us-central1-f --experiments=use_mem_shuffle --workerHarnessContainerImage=dataflow.gcr.io/v1beta3/java-batch:1.8.0-mm

but it made no difference.

At this stage, the items in the SCollection have already been grouped, so the SCollection contains far fewer elements than at the start of the pipeline, but there are still plenty (>10k). However, processing each item in this stage is CPU intensive (it runs a DBSCAN clustering of the items inside each group). In fact, this is the most CPU-intensive part of the pipeline.

Code reference: https://github.com/GlobalFishingWatch/vessel-classification-pipeline/blob/84-cluster-anchorages/pipeline/anchorages/src/main/scala/Anchorages.scala#L380
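
To give a feel for why each grouped item is expensive, here is a minimal brute-force DBSCAN sketch in plain Scala. It is not the code at the link above: the object name `DbscanSketch`, the 2D `Point` type, and the Euclidean distance are assumptions made for illustration only. With a naive neighbour lookup the cost is quadratic in the group size, so a handful of large groups can dominate the runtime of the stage.

```scala
object DbscanSketch {
  // Hypothetical point type; the real pipeline works on its own data types.
  type Point = (Double, Double)

  private def dist(a: Point, b: Point): Double =
    math.hypot(a._1 - b._1, a._2 - b._2)

  /** Returns one label per point: -1 for noise, otherwise a cluster id starting at 0. */
  def dbscan(points: IndexedSeq[Point], eps: Double, minPts: Int): Array[Int] = {
    val Unvisited = -2
    val Noise = -1
    val labels = Array.fill(points.length)(Unvisited)

    // Brute-force neighbour lookup: O(n) per call, so O(n^2) per group overall.
    // The result includes the point itself.
    def neighbours(i: Int): IndexedSeq[Int] =
      points.indices.filter(j => dist(points(i), points(j)) <= eps)

    var cluster = 0
    for (i <- points.indices) {
      if (labels(i) == Unvisited) {
        val seeds = neighbours(i)
        if (seeds.length < minPts) {
          labels(i) = Noise // may later be relabelled as a border point
        } else {
          labels(i) = cluster
          val queue = scala.collection.mutable.Queue.empty[Int]
          queue ++= seeds
          while (queue.nonEmpty) {
            val j = queue.dequeue()
            if (labels(j) == Noise) {
              labels(j) = cluster // border point: joins the cluster, not expanded
            } else if (labels(j) == Unvisited) {
              labels(j) = cluster
              val next = neighbours(j)
              if (next.length >= minPts) queue ++= next // core point: keep expanding
            }
          }
          cluster += 1
        }
      }
    }
    labels
  }
}
```

In a pipeline, a function like this would be applied once per group, so the work per element depends heavily on how many items ended up in that group.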

@enriquetuya
Contributor

@seacourtaw did you get any news from your Google contacts about this one?

@enriquetuya enriquetuya assigned redhog and unassigned seacourtaw Jan 19, 2017
@enriquetuya
Contributor

@redhog @seacourtaw based on what we discussed with Amy, we need to try turning off autoscaling and setting the worker count to 200 (there are some known issues with autoscaling today).
If that still doesn't resolve the issue, grab the process ID and other details and pass them along to Amy so she can involve other Google engineers to dig deeper.
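
As a rough sketch of the first step, assuming the standard Dataflow pipeline options `--autoscalingAlgorithm=NONE` and `--numWorkers`: the helper below rewrites the command-line arguments before they are handed to the pipeline. The names `FixedWorkers` and `withFixedWorkers` are hypothetical and not part of scio or this repo.

```scala
object FixedWorkers {
  // Flags that this helper overrides; everything else is passed through untouched.
  private val overridden = Set("--autoscalingAlgorithm", "--numWorkers")

  /** Drops any existing autoscaling/worker-count flags and pins the worker count. */
  def withFixedWorkers(args: Seq[String], workers: Int): Seq[String] = {
    // Compare only the flag name, i.e. the part before '='.
    val kept = args.filterNot(a => overridden.contains(a.takeWhile(_ != '=')))
    kept ++ Seq("--autoscalingAlgorithm=NONE", s"--numWorkers=$workers")
  }
}
```

For example, `FixedWorkers.withFixedWorkers(args, 200)` could be applied to the arguments before they are parsed in `main`; passing the two flags directly on the command line should have the same effect.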
