
[R-package] Add 'nrounds' as an alias for 'num_iterations' #4743

Closed
jameslamb opened this issue Oct 29, 2021 · 4 comments

@jameslamb
Collaborator

jameslamb commented Oct 29, 2021

Description

In the R package, lgb.train() and lightgbm() expose a keyword argument nrounds, an integer indicating how many boosting rounds should be performed.

{lightgbm} uses the value of that argument to set num_iterations, unless num_iterations or another alias for it (https://lightgbm.readthedocs.io/en/latest/Parameters.html#num_iterations) is provided in the list passed to the params argument.

params <- lgb.check.wrapper_param(
    main_param_name = "num_iterations"
    , params = params
    , alternative_kwarg_value = nrounds
)

Since nrounds is not a supported alias for num_iterations in LightGBM, it is not possible to override the default value of nrounds in lgb.train() or lightgbm() by passing nrounds as part of the list in params.
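The precedence rule described above can be illustrated in plain R. The sketch below uses a hypothetical helper, resolve_num_iterations(), written only for this illustration; it is not {lightgbm}'s implementation of lgb.check.wrapper_param(). It assumes that a value supplied in params under any recognized alias overrides the nrounds keyword argument, and that the first matching name in the alias vector wins. Running it with and without "nrounds" in the alias vector shows why the reproducible example below ends up with 100 rounds.

```r
# Hypothetical helper (NOT {lightgbm}'s code) sketching the precedence rule:
# a value for num_iterations supplied in `params`, under any recognized alias,
# overrides the `nrounds` keyword argument.
resolve_num_iterations <- function(params, nrounds, aliases) {
  # aliases the caller actually supplied, in the order they appear in `aliases`
  supplied <- intersect(aliases, names(params))
  value <- if (length(supplied) > 0L) {
    params[[supplied[1L]]]  # first recognized alias wins
  } else {
    nrounds                 # nothing recognized in params: use the keyword argument
  }
  # drop every recognized alias, then store the value under the main name
  params <- params[setdiff(names(params), aliases)]
  params[["num_iterations"]] <- value
  params
}

# Current aliases for num_iterations, copied from the list quoted in this issue
current_aliases <- c(
  "num_iterations", "num_iteration", "n_iter", "num_tree", "num_trees",
  "num_round", "num_rounds", "num_boost_round", "n_estimators", "max_iter"
)
user_params <- list(nrounds = 17L, objective = "regression")

# Without the alias, "nrounds" is not recognized and the keyword value is used
before <- resolve_num_iterations(user_params, nrounds = 100L, aliases = current_aliases)
print(before$num_iterations)
# [1] 100

# With "nrounds" added as an alias, the value from params takes precedence
after <- resolve_num_iterations(user_params, nrounds = 100L,
                                aliases = c(current_aliases, "nrounds"))
print(after$num_iterations)
# [1] 17
```

Keeping the main name "num_iterations" first in the alias vector means that, in this sketch, an explicit num_iterations in params still beats every other alias.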

As @mikemahoney218 noted in #4226 (comment), this is confusing behavior. Everywhere else in the R and Python packages, LightGBM treats values in params as higher-precedence than those passed through keyword arguments.

This behavior also adds friction to hyperparameter tuning (and will add even more once the suggestions from #4226 are fully implemented), because nrounds becomes a training parameter that cannot be changed by modifying the list in params.

nrounds should be added as an alias for num_iterations in LightGBM.

Reproducible example

library(lightgbm)

data(agaricus.train, package = "lightgbm")

dtrain <- lightgbm::lgb.Dataset(
    agaricus.train$data
    , label = agaricus.train$label
)

bst <- lightgbm::lgb.train(
    params = list(
        "nrounds" = 17
        , "objective" = "regression"
    )
    , data = dtrain
)

# should be 17, but 100 boosting rounds were performed
bst$current_iter()

# [1] 100

How to fix this

Add nrounds to the list of aliases for num_iterations in the R package's alias definitions:

, "num_iterations" = c(
    "num_iterations"
    , "num_iteration"
    , "n_iter"
    , "num_tree"
    , "num_trees"
    , "num_round"
    , "num_rounds"
    , "num_boost_round"
    , "n_estimators"
    , "max_iter"
)

To ensure that interfaces to LightGBM other than the R package also respect this parameter alias, update the relevant C++ code and documentation. See https://github.com/microsoft/LightGBM/pull/4637/files for a reference on which files should be changed. Do not edit docs/Parameters.rst directly; instead, after updating files in include/ and src/, run python helpers/parameter_generator.py from the root of the repo.

Add a unit test to https://github.com/microsoft/LightGBM/blob/798dc1d4191b93fd34797d62b79c66cd95209406/R-package/tests/testthat/test_basic.R that confirms {lightgbm} respects nrounds passed through params.

Additional Comments

@StrikerRUS @Laurae2 please let me know if you disagree with this idea or have any additional thoughts to add.

@mikemahoney218
Contributor

Thank you for opening this issue! This would be a nice quality-of-life improvement.

@jameslamb
Collaborator Author

No problem, thanks for the report! @mikemahoney218 are you interested in contributing this change? I'd be happy to help answer any questions about the contribution process.

@mikemahoney218
Contributor

@jameslamb Happy to, I'll pull together a PR.

@mikemahoney218
Contributor

PR opened as #4746.
