Improvements in Quick-start for Ranking #1014

gabrielspmoreira · 2023-06-14T21:04:09Z

This PR adds some improvements to the Quick-start for ranking scripts and documentation.

In preprocessing.py:

Adds support for target encoding features, configurable through these new CLI arguments: --target_encoding_features, --target_encoding_targets, --target_encoding_kfold, --target_encoding_smoothing.
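For readers unfamiliar with the smoothing argument, here is a minimal pure-Python sketch of smoothed target encoding. It is not the code in preprocessing.py: the function name and the formula (category sum plus smoothing times the global mean, divided by category count plus smoothing) are assumptions for illustration, and k-fold splitting is left out.

```python
from collections import defaultdict


def smoothed_target_encoding(categories, targets, smoothing=10.0):
    """Map each category to a smoothed mean of the target.

    Assumed formula (illustrative only):
        (sum_of_targets + smoothing * global_mean) / (count + smoothing)
    """
    if len(categories) != len(targets):
        raise ValueError("categories and targets must have the same length")
    if not targets:
        return {}

    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, target in zip(categories, targets):
        sums[category] += target
        counts[category] += 1

    # Rare categories are pulled toward the global mean as smoothing grows.
    return {
        category: (sums[category] + smoothing * global_mean)
        / (counts[category] + smoothing)
        for category in sums
    }


encoding = smoothed_target_encoding(["a", "a", "b", "b"], [1, 1, 0, 0], smoothing=2.0)
```

A larger smoothing value moves every category closer to the global mean, which mostly matters for categories with few rows.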

In ranking.py:

Adds support for selecting columns to keep (--keep_columns) or remove (--ignore_columns) during data loading, training, and evaluation.
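The column selection can be pictured with a small sketch. The helper name `select_columns` is hypothetical, and the behavior when both options are given (keep first, then ignore) is an assumption rather than something confirmed by ranking.py.

```python
def select_columns(all_columns, keep_columns=None, ignore_columns=None):
    """Return the columns to use, preserving their original order.

    Assumption: if keep_columns is given, only those columns survive;
    ignore_columns is then applied on top of that.
    """
    keep = set(keep_columns) if keep_columns else None
    ignore = set(ignore_columns or [])
    return [
        column
        for column in all_columns
        if (keep is None or column in keep) and column not in ignore
    ]


columns = ["user_id", "item_id", "price", "click"]
kept = select_columns(columns, keep_columns=["user_id", "item_id", "click"])
without_price = select_columns(columns, ignore_columns=["price"])
```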

This PR also converts those scripts to Python modules, to make it easier to import/extend their classes and to test them.
So now, instead of being run like python preprocessing.py --args ..., they need to be run as a Python module, e.g.

cd /Merlin/examples/
python -m quick_start.scripts.preproc.preprocessing --args ...


github-actions · 2023-06-14T21:06:03Z

Documentation preview

https://nvidia-merlin.github.io/Merlin/review/pr-1014

rnyak · 2023-06-14T21:12:38Z

examples/quick_start/scripts/preproc/README.md

+                        --target_encoding_targets is, all categorical
+                        features will be used.
+  --target_encoding_targets 
+                        Columns (comma-sep) with target columns that will be


Won't giving multiple targets create an issue? You were facing issues with that. Was that fixed? Also, what about the issue where the test set needs the target column?

No, I split the targets and create one TargetEncoding op for each to avoid the issue.
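A rough sketch of that approach, assuming the targets arrive as a comma-separated string. `TargetEncodingSpec` and `build_specs` are hypothetical stand-ins for illustration, not the actual NVTabular op or the script's code.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TargetEncodingSpec:
    """Hypothetical stand-in for the configuration of one target encoding op."""

    features: Tuple[str, ...]
    target: str
    kfold: int
    smoothing: float


def parse_comma_separated(value: str) -> List[str]:
    """Split a comma-separated CLI value, dropping empty entries."""
    return [item.strip() for item in value.split(",") if item.strip()]


def build_specs(features_arg, targets_arg, kfold=5, smoothing=10.0):
    """Create one spec per target instead of a single multi-target spec."""
    features = tuple(parse_comma_separated(features_arg))
    return [
        TargetEncodingSpec(features, target, kfold, smoothing)
        for target in parse_comma_separated(targets_arg)
    ]


specs = build_specs("user_id,item_id", "click,like")
```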

rnyak · 2023-06-14T21:13:13Z

examples/quick_start/scripts/ranking/cufile.log

@@ -0,0 +1,23 @@
+ 13-06-2023 12:45:41:756 [pid=1014 tid=1014] ERROR  cufio-drv:716 nvidia-fs.ko driver not loaded
+ 13-06-2023 12:45:52:861 [pid=1156 tid=1156] ERROR  cufio-drv:716 nvidia-fs.ko driver not loaded
+ 13-06-2023 12:57:13:36 [pid=1737 tid=1737] ERROR  cufio-drv:716 nvidia-fs.ko driver not loaded


I guess you might want to remove the cufile.log file.

rnyak · 2023-06-14T21:23:08Z

examples/quick_start/scripts/preproc/preprocessing.py

+                args.target_encoding_features = args.categorical_features
+            if not args.target_encoding_targets:
+                args.target_encoding_targets = (
+                    args.binary_classif_targets + args.regression_targets


Did you check whether target encoding works properly when a target column is a float (not an int)?

I'm going to add the integration tests in another PR and check for those cases.

rnyak · 2023-06-14T21:25:26Z

examples/quick_start/scripts/preproc/preprocessing.py

@@ -263,11 +301,36 @@ def generate_nvt_workflow_targets(self, client=None):
                [Tags.REGRESSION, Tags.TARGET, Tags.BINARY]


Why is it tagged as Binary as well?

Good catch. Just removed it.

rnyak · 2023-06-14T21:27:31Z

examples/quick_start/scripts/preproc/preprocessing.py

            eval_dataset_preproc.to_parquet(
                output_eval_dataset_path,
                output_files=args.output_num_partitions,
            )

        if args.predict_data_path:
+            # Adding to predict set dummy target columns that are


This does not read well. Maybe rephrase it as "Adding dummy target column(s) to the test set to perform the target encoding op while this issue ..."
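The workaround under discussion can be sketched as follows. This is only an illustration on plain dictionaries: the helper name and the fill value of 0 are assumptions, and the real script works on dataframes rather than lists of rows.

```python
def add_dummy_targets(rows, target_columns, fill_value=0):
    """Return copies of rows with any missing target column set to fill_value."""
    target_columns = list(target_columns)
    padded = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not modified
        for target in target_columns:
            new_row.setdefault(target, fill_value)
        padded.append(new_row)
    return padded


predict_rows = [{"user_id": 1, "item_id": 10}, {"user_id": 2, "item_id": 20}]
padded_rows = add_dummy_targets(predict_rows, ["click", "like"])
```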

rnyak · 2023-06-15T00:29:41Z

@gabrielspmoreira I approved in case you want to merge once you push your final changes.


* Adding target encoding features support to quick-start preprocessing
* Converting the quick-start for ranking to a Python module (so that its classes can be tested). Added target encoding args to preprocessing.py. Added args to keep or filter columns in ranking.py. Documentation was updated.
* Fixed bug when casting the columns (it was shuffling the cols in some cases). Adjusted the command line examples
* Small fix and comment adjustment

gabrielspmoreira added 2 commits June 14, 2023 17:56

Adding target encoding features support to quick-start preprocessing

c7e5d51

Converting the quick-start for ranking to a Python module (so that its classes can be tested). Added target encoding args to preprocessing.py. Added args to keep or filter columns in ranking.py. Documentation was updated.

61adb79

gabrielspmoreira self-assigned this Jun 14, 2023

gabrielspmoreira added the enhancement New feature or request label Jun 14, 2023

gabrielspmoreira added this to the Merlin 23.06 milestone Jun 14, 2023

gabrielspmoreira requested a review from rnyak June 14, 2023 21:15

rnyak reviewed Jun 14, 2023


rnyak approved these changes Jun 15, 2023


gabrielspmoreira added 2 commits June 15, 2023 00:26

Fixed bug when casting the columns (it was shuffling the cols in some cases). Adjusted the command line examples

b1a765b

Small fix and comment adjustment

320bf6e

gabrielspmoreira merged commit e131376 into main Jun 15, 2023
