
support xgboost 2.0 #1219

Merged: 32 commits merged into main from the xgb2 branch on Sep 22, 2023

Conversation

sonichi
Contributor

@sonichi sonichi commented Sep 13, 2023

Why are these changes needed?

Trying to support xgboost 2.0

Related issue number

close #1217

Checks

@thinkall
Collaborator

https://github.com/microsoft/FLAML/actions/runs/6180225370/job/16776350218?pr=1219#step:13:944

FAILED test/spark/test_multiclass.py::TestMultiClass::test_sparse_matrix_classification - AttributeError: 'XGBClassifier' object has no attribute 'use_label_encoder'

@sonichi
Contributor Author

sonichi commented Sep 14, 2023

> https://github.com/microsoft/FLAML/actions/runs/6180225370/job/16776350218?pr=1219#step:13:944
>
> FAILED test/spark/test_multiclass.py::TestMultiClass::test_sparse_matrix_classification - AttributeError: 'XGBClassifier' object has no attribute 'use_label_encoder'

that test doesn't use xgboost 2. It uses xgboost 1.7. Could you check why it fails with the spark test?

@thinkall
Collaborator

> https://github.com/microsoft/FLAML/actions/runs/6180225370/job/16776350218?pr=1219#step:13:944
>
> FAILED test/spark/test_multiclass.py::TestMultiClass::test_sparse_matrix_classification - AttributeError: 'XGBClassifier' object has no attribute 'use_label_encoder'
>
> that test doesn't use xgboost 2. It uses xgboost 1.7. Could you check why it fails with the spark test?

What do you mean by "it uses xgboost 1.7"? This test passes with xgboost 1.7.0. In the spark tests, it only parallelizes the trials with spark.

@sonichi
Contributor Author

sonichi commented Sep 14, 2023

> https://github.com/microsoft/FLAML/actions/runs/6180225370/job/16776350218?pr=1219#step:13:944
>
> FAILED test/spark/test_multiclass.py::TestMultiClass::test_sparse_matrix_classification - AttributeError: 'XGBClassifier' object has no attribute 'use_label_encoder'
>
> that test doesn't use xgboost 2. It uses xgboost 1.7. Could you check why it fails with the spark test?
>
> What do you mean by "it uses xgboost 1.7"? This test passes with xgboost 1.7.0. In the spark tests, it only parallelizes the trials with spark.

xgboost 1.7 is installed for this test: https://github.com/microsoft/FLAML/actions/runs/6180225370/job/16776350218?pr=1219#step:8:89

@levscaut
Collaborator

Hello @sonichi, I fixed a small deprecated import error in the notebook. Hope this helps.

@sonichi
Contributor Author

sonichi commented Sep 19, 2023

> Hello @sonichi, I fixed a small deprecated import error in the notebook. Hope this helps.

Thanks. The test still fails.

@levscaut
Collaborator

> Hello @sonichi, I fixed a small deprecated import error in the notebook. Hope this helps.
>
> Thanks. The test still fails.

I'm still working on this test. Based on my local testing, this issue is most likely caused by an inconsistency in the xgboost version between the pyspark driver and the executor. I'm figuring out why the `install xgb<2` step does not take effect on the executor.
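
(Side note for readers debugging a similar mismatch: a minimal diagnostic sketch like the one below, which assumes a standard pyspark session and is not part of this PR, can compare the xgboost version on the driver with the versions the executors actually import.)

```python
# Hypothetical diagnostic sketch, not FLAML code: compare the xgboost version
# seen by the Spark driver with the versions seen by the executor processes.
import xgboost
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext


def executor_xgb_version(_):
    # Import inside the task so the lookup happens in the executor process.
    import xgboost as xgb

    return xgb.__version__


driver_version = xgboost.__version__
executor_versions = set(sc.parallelize(range(8), 8).map(executor_xgb_version).collect())
print("driver:", driver_version, "executors:", executor_versions)
```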

@thinkall
Collaborator

Very weird, it passed when running only test/spark. https://github.com/microsoft/FLAML/actions/runs/6258815880/job/16993596238?pr=1219#step:13:24

```python
if xgboost_version < "1.7.0":
    params["use_label_encoder"] = params.get("use_label_encoder", False)
else:
    assert "use_label_encoder" not in params, "use_label_encoder is deprecated in xgboost>=1.7.0"
```
Collaborator

No need to assert here; the assert will slow down the training process.

Contributor Author

I was trying to find the bug. This assert doesn't fail, so "use_label_encoder" is indeed not in params with xgboost 2. So why does the error about "use_label_encoder" happen? Does any executor use xgboost < 1.7.0 for some reason?

Contributor Author

@sonichi sonichi Sep 21, 2023

I see that the bug is fixed now, so we can remove this assertion.

Collaborator

> I was trying to find the bug. This assert doesn't fail, so "use_label_encoder" is indeed not in params with xgboost 2. So why does the error about "use_label_encoder" happen? Does any executor use xgboost < 1.7.0 for some reason?

In the macOS test, 1.7.6 is actually the version we want to use, but somehow the version on the spark driver gets upgraded to 2.0.0. The root cause is that test/automl/test_classification.py::test_sparse_matrix_xgboost first downgrades xgboost to 1.3.3 and then upgrades it to 2.0.0. As a result, in the spark test the driver ends up with version 2.0.0 while the executor keeps 1.7.6, since the upgrade in test_sparse_matrix_xgboost only affects the driver.
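
(Side note for readers wondering how a driver/executor version skew turns into this specific AttributeError: scikit-learn-style `get_params()` reads attributes named after the `__init__` parameters of the class definition installed where the estimator is used, not where it was created. The toy sketch below, with hypothetical classes and no xgboost or FLAML code, shows one plausible way the mismatch surfaces.)

```python
# Toy sketch with hypothetical classes: BaseEstimator.get_params() looks up attributes
# named after the __init__ parameters of the currently resolved class.
from sklearn.base import BaseEstimator


class NewClassifier(BaseEstimator):
    """Stands in for a newer release that dropped use_label_encoder."""

    def __init__(self, n_estimators=100):
        self.n_estimators = n_estimators


class OldClassifier(BaseEstimator):
    """Stands in for an older release that still declares use_label_encoder."""

    def __init__(self, n_estimators=100, use_label_encoder=False):
        self.n_estimators = n_estimators
        self.use_label_encoder = use_label_encoder


model = NewClassifier()          # built where the newer class is installed (the driver)
model.__class__ = OldClassifier  # crude stand-in for unpickling where the older class lives (an executor)
model.get_params()               # AttributeError: ... no attribute 'use_label_encoder'
```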

@sonichi sonichi added this pull request to the merge queue Sep 22, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Sep 22, 2023
@thinkall thinkall added this pull request to the merge queue Sep 22, 2023
Merged via the queue into main with commit 868e7dd Sep 22, 2023
13 checks passed
@sonichi sonichi deleted the xgb2 branch September 22, 2023 13:27
Development

Successfully merging this pull request may close these issues.

AttributeError: best_iteration is only defined when early stopping is used. with xgboost 2.0
3 participants