
support larger cluster #73

Merged
merged 5 commits into from
Oct 22, 2018


@chenqin (Contributor) commented Oct 21, 2018

We ran some experiments with xgboost4j-spark and found that rabit throws an FD assertion failure at around 350 executors. This is largely due to the problem described in #57, and it seems that switching from select to poll is all that is needed to avoid this OS-side limit. In our experiment, we were able to scale to 1.5k executors on a 12-billion-record dataset after some tweaks here and there.
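A minimal C++ sketch of why the switch helps, assuming a POSIX system. `select()` tracks descriptors in a fixed-size `fd_set` bitmap, so it cannot watch any descriptor whose number is `FD_SETSIZE` (commonly 1024) or higher. `poll()` instead takes an array of `struct pollfd` and has no such cap. The names `ReadyToRead` and `PipeBecomesReadable` are hypothetical and for illustration only; this is not rabit's actual code.

```cpp
#include <poll.h>      // poll(), struct pollfd, POLLIN, nfds_t
#include <unistd.h>    // pipe(), write(), close() used by the demo
#include <cassert>     // assert(), handy for checking PipeBecomesReadable()
#include <cerrno>      // errno, EINTR
#include <cstddef>     // std::size_t
#include <system_error>
#include <vector>

// Hypothetical helper: wait up to timeout_ms for any of `fds` to become
// readable, using poll() rather than select(). Because poll() takes an
// array instead of an fd_set bitmap, descriptor numbers >= FD_SETSIZE work.
// Returns the indices (into `fds`) that are ready.
std::vector<std::size_t> ReadyToRead(const std::vector<int>& fds, int timeout_ms) {
  std::vector<pollfd> pfds;
  pfds.reserve(fds.size());
  for (int fd : fds) {
    pollfd p{};
    p.fd = fd;
    p.events = POLLIN;  // interested in "data available to read"
    pfds.push_back(p);
  }
  int n;
  do {
    // Simplification: on EINTR this restarts with the full timeout.
    n = poll(pfds.data(), static_cast<nfds_t>(pfds.size()), timeout_ms);
  } while (n < 0 && errno == EINTR);
  if (n < 0) {
    throw std::system_error(errno, std::generic_category(), "poll");
  }
  std::vector<std::size_t> ready;
  for (std::size_t i = 0; i < pfds.size(); ++i) {
    // POLLHUP/POLLERR also count: a following read() reports EOF or the error.
    if (pfds[i].revents & (POLLIN | POLLHUP | POLLERR)) ready.push_back(i);
  }
  return ready;
}

// Usage demo: an empty pipe is not readable; after one write it is.
bool PipeBecomesReadable() {
  int p[2];
  if (pipe(p) != 0) return false;
  bool empty_before = ReadyToRead({p[0]}, 0).empty();  // nothing written yet
  char byte = 'x';
  bool wrote = write(p[1], &byte, 1) == 1;
  std::vector<std::size_t> after = ReadyToRead({p[0]}, 0);
  close(p[0]);
  close(p[1]);
  return empty_before && wrote && after.size() == 1 && after[0] == 0;
}
```

The same loop works unchanged whether the tracker holds 10 sockets or several thousand, provided the process's open-file limit (`ulimit -n`) is raised accordingly. That limit is a separate OS setting from `FD_SETSIZE`.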

@CodingCat (Member)

Thanks for the contribution; merging this in. Credit also goes to @frenzykryger.
