[Guideline] Training API Enhancement and Refactor: Use Callbacks #892

tqchen · 2016-02-28T17:16:03Z

There has been a series of changes to enhance the training and cross-validation API in Python/R. Examples of these changes include:

Early stopping based on evaluation statistics.
Whether to save the evaluation results from cross validation or training.
Whether to print the evaluation results.
Whether to save and return the best model from cv or training.
Adapting the learning rate during training.

Currently, each of these proposals involves a change to the core training API: one argument needs to be added for each of these requirements. We need a better way to handle these issues, otherwise the training API will become extremely hard to maintain.

Use Callbacks to Handle These Cases

import sys

class StopTraining(Exception):
    """Raised by a callback to ask the training loop to stop."""

def early_stop_maximize(rounds, metric, verbose=True):
    """Example: early stopping to maximize a metric."""
    # Mutable state shared with the closure below (a namedtuple is immutable).
    info = {"best_score": float("-inf"), "best_score_i": 0}

    def callback(iteration, booster, evaluation_results):
        """Callback function to do early stopping.

        iteration: int
            Current iteration number, equal to total number of trees so far.
            If continue from existing model,
        booster: Booster
            Current booster model
        evaluation_results: list of (str, float)
            Evaluation results from watchlist.
        """
        score = dict(evaluation_results)[metric]
        if score > info["best_score"]:
            info["best_score"] = score
            info["best_score_i"] = iteration
        # Stop once the metric has not improved for more than `rounds` iterations.
        if iteration - info["best_score_i"] > rounds:
            booster.best_iteration = info["best_score_i"]
            if verbose:
                sys.stderr.write("Stopping at round %d\n" % iteration)
            raise StopTraining()
    return callback

def train(param, num_boost_round, callbacks):
    ....
    for i in range(num_boost_round):
        bst.update()
        try:
            for callback in callbacks:
                callback(i, bst, evaluation_results)
        except StopTraining:
            break
    return bst

# call training; callbacks is a list, so several callbacks can be combined
bst = train(param, num_boost_round, callbacks=[early_stop_maximize(3, 'test-auc')])

TODO List

Add the callback API to the training and cv API.
- This includes Python/R/Julia/JVM.
Add a callback function module to xgboost.
- We will only accept improvements to callbacks in the future, and be more careful about training API changes.
- Add callbacks to support early_stop, logging, best_model_save.
Use the callbacks to keep backward compatibility.
- For example, when early_stop_rounds is detected, add early_stop_maximize to the callback list at the beginning of the function.
- Mark the newly added arguments as deprecated, and give a deprecation warning asking users to use the callback API.
- We will consider removing some of the less important arguments after two major releases.
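
The backward-compatibility item could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual xgboost code: `resolve_callbacks` and `_early_stop` are hypothetical names, and only the `early_stop_rounds` argument comes from the list above.

```python
import warnings


def _early_stop(rounds, metric):
    """Hypothetical stand-in for a callback factory such as early_stop_maximize."""
    def callback(iteration, booster, evaluation_results):
        pass  # the real stopping logic would live here
    callback.rounds = rounds
    callback.metric = metric
    return callback


def resolve_callbacks(callbacks=None, early_stop_rounds=None, metric="test-auc"):
    """Translate a legacy argument into a callback at the start of training.

    Returns the list of callbacks the training loop should run.
    """
    callbacks = list(callbacks or [])
    if early_stop_rounds is not None:
        # Steer users of the old argument towards the callback API.
        warnings.warn(
            "early_stop_rounds is deprecated; pass an early-stopping "
            "callback through `callbacks` instead",
            DeprecationWarning,
            stacklevel=2,
        )
        callbacks.append(_early_stop(early_stop_rounds, metric))
    return callbacks
```

With this shape, the training loop itself only ever sees a list of callbacks, so removing the legacy argument later would not touch the loop.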


tqchen · 2016-02-28T17:26:15Z

@terrytangyuan @hetong007 @CodingCat please reply to discuss the API you would like, and say whether you would like to add the callbacks and do the refactoring for some of the language bindings (Python/R/JVM).

hetong007 · 2016-02-28T18:06:43Z

Is "Adapt learning rate during training" independent from this change? I think it is more related to bst.update().

CodingCat · 2016-02-28T19:32:46Z

Is there any restriction on the signature of the callbacks? Or shall we distinguish between pre-iteration and post-iteration callbacks, i.e. where should the following lines go:

for callback in callbacks:
    callback(i, bst, evaluation_results)

tqchen · 2016-02-28T21:03:09Z

@hetong007 learning rate changes can be done by setting parameters post-iteration, i.e. by calling bst.set_param in a callback.
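
A minimal sketch of that idea, assuming the callback signature from the proposal above. `FakeBooster` is a hypothetical stand-in that only records parameters so the example runs without xgboost, and the geometric decay schedule is an arbitrary choice for illustration.

```python
class FakeBooster:
    """Hypothetical stand-in for Booster; it only records parameters."""

    def __init__(self):
        self.params = {}

    def set_param(self, name, value):
        self.params[name] = value


def decay_learning_rate(initial_eta, decay):
    """Return a post-iteration callback that shrinks eta geometrically."""
    def callback(iteration, booster, evaluation_results):
        # Runs after update(), so the new value applies to the next round.
        booster.set_param("eta", initial_eta * decay ** (iteration + 1))
    return callback


bst = FakeBooster()
adapt = decay_learning_rate(initial_eta=0.3, decay=0.5)
for i in range(3):
    adapt(i, bst, [])  # evaluation results are unused by this callback
```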

tqchen · 2016-02-28T21:04:02Z

@CodingCat This is a proposal. Most of the applications so far seem to be post-iteration callbacks, so we can use post-iteration for now. But it might be interesting explicit
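
If the distinction were made explicit, one possible shape is an optional flag on the callback. The sketch below is an assumption for illustration only: the `before_iteration` attribute, `train_loop`, and the simplified two-argument callback signature are hypothetical and not part of the proposal.

```python
def split_callbacks(callbacks):
    """Partition callbacks by an optional `before_iteration` attribute."""
    pre = [cb for cb in callbacks if getattr(cb, "before_iteration", False)]
    post = [cb for cb in callbacks if not getattr(cb, "before_iteration", False)]
    return pre, post


def train_loop(num_boost_round, update, evaluate, callbacks):
    pre, post = split_callbacks(callbacks)
    for i in range(num_boost_round):
        for cb in pre:
            cb(i, None)  # no evaluation results exist yet for round i
        update(i)
        results = evaluate(i)
        for cb in post:
            cb(i, results)


events = []


def log_start(iteration, evaluation_results):
    events.append(("pre", iteration))


log_start.before_iteration = True  # opt in to running before update()


def log_end(iteration, evaluation_results):
    events.append(("post", iteration))


train_loop(
    2,
    update=lambda i: events.append(("update", i)),
    evaluate=lambda i: [("test-auc", 0.5)],
    callbacks=[log_end, log_start],
)
```

Callbacks without the flag default to post-iteration, which matches the current usage described above.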

terrytangyuan · 2016-02-29T02:29:00Z

@tqchen Yeah, the API evolved a lot and became complicated. I'll look into this for the Python package when I get a chance. I've been quite busy recently, so if anyone wants to do it, you are very welcome to!

tqchen · 2016-05-20T00:52:44Z

Add callback API to python in #1211

tqchen · 2016-05-20T04:42:17Z

The callback API for Python is checked in in #1211. I am looking for volunteers to contribute the R counterpart. Please reply if you want to do this @hetong007 @khotilov

hetong007 · 2016-05-20T04:49:07Z

I will look into it. @khotilov You are also welcome to do that.

khotilov · 2016-05-20T05:05:17Z

I'll give it some time this weekend.

terrytangyuan · 2016-05-20T16:15:46Z

@tqchen Awesome. Looks great!

khotilov · 2016-05-23T08:21:36Z

I've coded all the R callbacks and xgb.train but haven't finished debugging yet.

tqchen · 2016-05-23T16:23:26Z

@khotilov sounds good. Let us know when it is ready. Note that R can directly pass Environments, so as long as the variable naming is proper, it is even more powerful than Python.

khotilov · 2016-05-23T18:00:58Z

@tqchen: yes, that was the mechanism I was exploiting.

BTW, even though it's possible in R to throw and catch a specific exception ("signal a condition" in R-speak), I am inclined towards simply setting a stop-flag variable for early stopping, so that it doesn't have to be the last callback. I think that might make sense to do in Python as well.

Also, I've added a finalizer type of callback in addition to the pre- and post-iteration ones. E.g., record_evaluation does very simple evaluation history collection at post-iteration, and then its finalizer runs at the end to do some fast in-bulk transformation of the collected data. I assume that a finalizer would always be coupled with either a pre or post callback, so it doesn't need to be a separate callback function, and it could be invoked by, e.g., passing a finalize=TRUE option. It seems like it's usually easy to detect the condition for calling init, but for finalizing it would be more reliable and clean to call it explicitly. Does that make sense?
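
For comparison with the exception-based proposal, the stop-flag and finalizer ideas might translate to Python roughly as below. This is a sketch under assumptions, not the R implementation: the `env` dictionary stands in for an R environment, `train` is fed a hypothetical list of pre-computed scores instead of a real model, and the `finalize` keyword mirrors the finalize=TRUE option mentioned above.

```python
def early_stop(rounds, metric):
    best = {"score": float("-inf"), "iteration": 0}

    def callback(env, finalize=False):
        if finalize:
            return  # nothing to clean up
        score = dict(env["evaluation_results"])[metric]
        if score > best["score"]:
            best["score"], best["iteration"] = score, env["iteration"]
        elif env["iteration"] - best["iteration"] >= rounds:
            env["stop"] = True  # set a flag instead of raising
    return callback


def record_evaluation(history):
    rows = []

    def callback(env, finalize=False):
        if finalize:
            # Bulk transformation at the end: rows -> one list per metric.
            for row in rows:
                for name, value in row:
                    history.setdefault(name, []).append(value)
        else:
            rows.append(env["evaluation_results"])
    return callback


def train(scores, callbacks):
    env = {"stop": False, "iteration": -1}
    for i, score in enumerate(scores):
        env["iteration"] = i
        env["evaluation_results"] = [("test-auc", score)]
        for cb in callbacks:
            cb(env)  # every callback runs, even after the flag is set
        if env["stop"]:
            break
    for cb in callbacks:
        cb(env, finalize=True)
    return env["iteration"]


history = {}
last_round = train(
    [0.6, 0.7, 0.65, 0.64, 0.8],
    [early_stop(2, "test-auc"), record_evaluation(history)],
)
```

Because the flag is only checked after all callbacks have run, the early-stopping callback can sit anywhere in the list and the recorder still sees the final round.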


tqchen · 2016-05-23T21:49:21Z

I think the same logic could apply to an exception, by finishing the calls to the rest of the callbacks first. The reason an exception is raised, as opposed to returning a condition, was to make the code more explicit about what the condition is, and possibly to stay compatible with future conditions.
I think an optional finalizer can be supported. We can support it via an optional attribute in the callback, i.e. the same way as the pre-iteration flag. So if the field exists, the finalizer will be invoked.
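
One way to read both points together is sketched below. It is only an illustration: `run_callbacks`, `finalize_callbacks`, the `finalize` attribute name, and the one-argument callbacks are assumptions, not the API that was merged.

```python
class StopTraining(Exception):
    """Signals that training should end after the current round."""


def run_callbacks(callbacks, *args):
    """Call every callback; defer a stop request until all have run."""
    stop = None
    for cb in callbacks:
        try:
            cb(*args)
        except StopTraining as exc:
            if stop is None:
                stop = exc
    if stop is not None:
        raise stop


def finalize_callbacks(callbacks):
    """Invoke the optional `finalize` attribute where a callback defines one."""
    for cb in callbacks:
        finalizer = getattr(cb, "finalize", None)
        if finalizer is not None:
            finalizer()


calls = []


def stopper(iteration):
    calls.append("stopper")
    if iteration >= 1:
        raise StopTraining()


def logger(iteration):
    calls.append("logger")


logger.finalize = lambda: calls.append("logger-finalize")

callbacks = [stopper, logger]
for i in range(5):
    try:
        run_callbacks(callbacks, i)
    except StopTraining:
        break
finalize_callbacks(callbacks)
```

Here the stopping callback comes first in the list, yet the logger still runs in the final round and its finalizer is invoked once training ends.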

tqchen added R-package type: python labels Feb 28, 2016

This was referenced Feb 28, 2016

Added eta_mod parameter to automatically update learning parameter. #794

Closed

Saving multiple versions of xgb models #734

Closed

tqchen mentioned this issue Feb 28, 2016

Save models from xgb.cv #795

Closed

tqchen mentioned this issue Mar 9, 2016

[R] Added early.stop.tolerance feature #945

Closed

tqchen mentioned this issue Mar 20, 2016

Added learning rates option to fit function in Python Sklearn #1018

Closed

tqchen mentioned this issue May 20, 2016

[PYTHON] Refactor trainnig API to use callback #1211

Merged

khotilov mentioned this issue Jun 4, 2016

xgb.cv which round of prediction=TRUE #1188

Closed

hetong007 added a commit that referenced this issue Jul 3, 2016

Merge pull request #1264 from khotilov/r_callbacks

44ed6d5

[R-package] callbacks per #892

tqchen closed this as completed Jul 29, 2016

tqchen mentioned this issue Jul 29, 2016

Tag version 0.6 #1422

Merged

lock bot locked as resolved and limited conversation to collaborators Oct 26, 2018
