Add migration doc
wbo4958 committed Aug 7, 2024
1 parent 64e2442 commit f07905f
Showing 2 changed files with 141 additions and 0 deletions.
1 change: 1 addition & 0 deletions doc/jvm/index.rst
@@ -38,6 +38,7 @@ Contents
XGBoost4J-Spark-GPU Tutorial <xgboost4j_spark_gpu_tutorial>
Code Examples <https://github.com/dmlc/xgboost/tree/master/jvm-packages/xgboost4j-example>
API docs <api>
How to migrate to XGBoost Spark 3.x <xgboost_spark_migration>

.. note::

140 changes: 140 additions & 0 deletions doc/jvm/xgboost_spark_migration.rst
@@ -0,0 +1,140 @@
######################################################
Migration Guide: How to migrate to XGBoost Spark 3.x
######################################################

XGBoost Spark underwent significant modifications beginning with version 3.0,
which may cause compatibility issues with existing user code.

This guide will walk you through the process of updating your code to ensure
it's compatible with XGBoost Spark 3.0 and later versions.

**********************
XGBoost Spark Packages
**********************

XGBoost Spark 3.0 introduced a single uber package named xgboost-spark_2.12-3.0.0.jar, which bundles
both xgboost4j and xgboost4j-spark. This means you can now simply use ``xgboost-spark`` for your application.

.. code-block:: xml

   <dependency>
       <groupId>ml.dmlc</groupId>
       <artifactId>xgboost-spark_${scala.binary.version}</artifactId>
       <version>3.0.0</version>
   </dependency>

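If your build uses sbt rather than Maven, the same coordinate can be expressed as follows. This is
a sketch derived from the Maven snippet above; it assumes the artifact is published for your Scala
binary version, and you should adjust the version to the release you actually use.

.. code-block:: scala

   // sbt equivalent of the Maven dependency above (coordinates assumed from that snippet)
   libraryDependencies += "ml.dmlc" %% "xgboost-spark" % "3.0.0"
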
When submitting the XGBoost application to the Spark cluster, you only need to specify the single ``xgboost-spark`` package.

.. code-block:: bash

   spark-submit \
     --jars xgboost-spark_2.12-3.0.0.jar \
     ... \

***************
XGBoost Ranking
***************

The ability to handle ranking problems using XGBoostRegressor has been discontinued.
As an alternative, we have introduced XGBoostRanker, which is specifically designed
to support ranking algorithms.

.. code-block:: scala

   // before 3.0
   val regressor = new XGBoostRegressor().setObjective("rank:ndcg")

   // after 3.0
   val ranker = new XGBoostRanker()

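In ranking tasks each row usually belongs to a query group, so a group column typically needs to be
supplied as well. Below is a minimal fitting sketch; the DataFrame name ``trainDf``, the column names
``features``, ``label``, and ``group``, and the ``setGroupCol`` setter are assumptions for illustration.

.. code-block:: scala

   // A minimal sketch: trainDf, the column names, and setGroupCol are assumed here.
   val ranker = new XGBoostRanker()
     .setFeaturesCol("features")
     .setLabelCol("label")
     .setGroupCol("group") // column identifying the query group of each row
   val model = ranker.fit(trainDf)
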
******************************
XGBoost Constructor Parameters
******************************

XGBoost Spark now categorizes parameters into two groups: XGBoost-Spark parameters and XGBoost parameters.
When constructing an XGBoost estimator, only XGBoost-specific parameters are permitted. XGBoost-Spark specific
parameters must be configured using the estimator's setter methods. It's worth noting that
`XGBoost Parameters <https://xgboost.readthedocs.io/en/stable/parameter.html>`_
can be set both during construction and through the estimator's setter methods.

.. code-block:: scala

   // before 3.0
   val xgboost_paras = Map(
     "eta" -> "1",
     "max_depth" -> "6",
     "objective" -> "binary:logistic",
     "num_round" -> 5,
     "num_workers" -> 1,
     "features" -> "feature_column",
     "label" -> "label_column"
   )
   val classifier = new XGBoostClassifier(xgboost_paras)

   // after 3.0
   val xgboost_paras = Map(
     "eta" -> "1",
     "max_depth" -> "6",
     "objective" -> "binary:logistic"
   )
   val classifier = new XGBoostClassifier(xgboost_paras)
     .setNumRound(5)
     .setNumWorkers(1)
     .setFeaturesCol("feature_column")
     .setLabelCol("label_column")

   // Or you can use setters to set all parameters
   val classifier = new XGBoostClassifier()
     .setNumRound(5)
     .setNumWorkers(1)
     .setFeaturesCol("feature_column")
     .setLabelCol("label_column")
     .setEta(1)
     .setMaxDepth(6)
     .setObjective("binary:logistic")

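Once configured, the estimator follows the standard Spark ML ``fit``/``transform`` pattern. A brief
usage sketch, assuming ``trainDf`` and ``testDf`` are DataFrames whose ``feature_column`` holds a
Spark ML vector:

.. code-block:: scala

   // Train on the training DataFrame and score the test DataFrame (both assumed to exist).
   val model = classifier.fit(trainDf)
   val predictions = model.transform(testDf)
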
*****************
Unused Parameters
*****************

Starting from 3.0, the following parameters are no longer used.

- cacheTrainingSet

If you wish to cache the training dataset, you have the option to implement caching
in your code prior to fitting the data to an estimator.

.. code-block:: scala

   val df = input.cache()
   val model = new XGBoostClassifier().fit(df)

- trainTestRatio

The following approach can be used to perform evaluation instead.

.. code-block:: scala

   val Array(train, eval) = trainDf.randomSplit(Array(0.7, 0.3))
   val classifier = new XGBoostClassifier().setEvalDataset(eval)
   val model = classifier.fit(train)

- tracker_conf

The following setters can be used to configure the RabitTracker.

.. code-block:: scala

   val classifier = new XGBoostClassifier()
     .setRabitTrackerTimeout(100)
     .setRabitTrackerHostIp("192.168.0.2")
     .setRabitTrackerPort(19203)

- rabitRingReduceThreshold
- rabitTimeout
- rabitConnectRetry
- singlePrecisionHistogram
- lambdaBias
- interactionConstraints
- objectiveType
