
[BUG] Unable to export criteo hugectr model for inference with merlin-training:21.09 container #1198

Closed
dishamehra opened this issue Oct 19, 2021 · 0 comments · Fixed by #1207
Labels
bug Something isn't working


@dishamehra

Describe the bug
Running export_hugectr_ensemble in the merlin-training:21.09 container for the scaling-criteo example (04-Triton-Inference-with-HugeCTR.ipynb) fails with the following error:

ValueError                                Traceback (most recent call last)
/tmp/ipykernel_36095/230262359.py in <module>
      7 hugectr_params["embedding_vector_size"] = 128
      8 hugectr_params["n_outputs"] = 1
----> 9 export_hugectr_ensemble(
     10     workflow=workflow,
     11     hugectr_model_path="./criteo_hugectr/1/",

/nvtabular/nvtabular/inference/triton/__init__.py in export_hugectr_ensemble(workflow, hugectr_model_path, hugectr_params, name, output_path, label_columns, version, cats, conts, max_batch_size, nvtabular_backend)
    264         raise ValueError("Either cats or conts has to have a value.")
    265 
--> 266     workflow = _remove_columns(workflow, label_columns)
    267 
    268     # generate the nvtabular triton model

/nvtabular/nvtabular/inference/triton/__init__.py in _remove_columns(workflow, to_remove)
    739                 # TODO: Handle selector sub-groups?
    740                 if column in node.selector.names:
--> 741                     node.selector._names.remove(column)
    742 
    743     return workflow.fit_schema(new_schema)

ValueError: list.remove(x): x not in list
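
The last frame points at a likely mismatch: the membership check reads node.selector.names, while the removal targets the private list node.selector._names. If names also reports columns held in nested sub-groups (the "TODO: Handle selector sub-groups?" comment hints at this), a column can pass the check and still be missing from _names. The sketch below models that situation under this assumption; ToySelector and naive_remove are hypothetical stand-ins, not NVTabular's ColumnSelector or _remove_columns.

```python
class ToySelector:
    """Hypothetical stand-in for a column selector with nested sub-groups."""

    def __init__(self, names, subgroups=None):
        self._names = list(names)               # columns held directly
        self.subgroups = list(subgroups or [])  # nested selectors

    @property
    def names(self):
        # Flattened view: direct names plus names from every sub-group.
        flat = list(self._names)
        for group in self.subgroups:
            flat.extend(group.names)
        return flat


def naive_remove(selector, column):
    # Same pattern as the failing frame: check the flattened view,
    # but remove only from the direct list.
    if column in selector.names:
        selector._names.remove(column)


# The label sits in a sub-group, so it is visible via .names but not in ._names.
selector = ToySelector(["C1", "I1"], subgroups=[ToySelector(["label"])])
caught = None
try:
    naive_remove(selector, "label")
except ValueError as err:
    caught = err
    print(f"ValueError: {err}")
```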

Steps/Code to reproduce bug
Follow the steps in this notebook: https://github.com/NVIDIA-Merlin/NVTabular/blob/main/examples/scaling-criteo/04-Triton-Inference-with-HugeCTR.ipynb

Expected behavior
The export completes and creates the config.pbtxt file.

Environment details:
Docker: merlin-training:21.09
NVIDIA-SMI 450.119.04, Driver Version: 450.119.04, CUDA Version: 11.4

@dishamehra dishamehra added the bug Something isn't working label Oct 19, 2021
@benfred benfred self-assigned this Oct 20, 2021
benfred added a commit to benfred/NVTabular that referenced this issue Oct 21, 2021
The inference notebooks with the Criteo example were broken: they failed
to generate the Triton config, raising an error in _remove_columns like `ValueError: list.remove(x): x not in list`.
This was because the label column was being inserted into a subgroup of the output node,
and wasn't getting removed from there.

Fix this, and add a basic unit test for _remove_columns that would have caught it.

Closes NVIDIA-Merlin#1198
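
The fix described in this commit amounts to removing the column from sub-groups as well as from the selector's own list. The sketch below models that with a hypothetical ToySelector class (not NVTabular's real ColumnSelector), plus a small test in the spirit of the unit test the commit mentions. Dropping sub-groups that the removal leaves empty is an illustrative choice, not necessarily what the actual patch does.

```python
class ToySelector:
    """Hypothetical selector: direct names plus nested sub-groups."""

    def __init__(self, names, subgroups=None):
        self._names = list(names)
        self.subgroups = list(subgroups or [])

    @property
    def names(self):
        # Flattened view across this selector and all nested sub-groups.
        flat = list(self._names)
        for group in self.subgroups:
            flat.extend(group.names)
        return flat


def remove_column(selector, column):
    """Remove every occurrence of `column`, recursing into sub-groups.

    Returns True if at least one occurrence was removed.
    """
    kept = [name for name in selector._names if name != column]
    removed = len(kept) != len(selector._names)
    selector._names = kept
    for group in selector.subgroups:
        # Recurse first so every sub-group is visited even if removed is True.
        removed = remove_column(group, column) or removed
    # Illustrative choice: drop sub-groups that are now empty.
    selector.subgroups = [g for g in selector.subgroups if g.names]
    return removed


def test_remove_column_from_subgroup():
    selector = ToySelector(["C1", "I1"], subgroups=[ToySelector(["label"])])
    assert remove_column(selector, "label")
    assert selector.names == ["C1", "I1"]
    assert selector.subgroups == []
    # Removing a column that is absent is a no-op instead of a ValueError.
    assert not remove_column(selector, "missing")


test_remove_column_from_subgroup()
```

Unlike a bare list.remove call, the filtering approach never raises when the column is absent, so the membership check and the removal can no longer disagree.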
benfred added a commit to benfred/NVTabular that referenced this issue Oct 21, 2021
jperez999 pushed a commit that referenced this issue Oct 21, 2021
mikemckiernan pushed a commit that referenced this issue Nov 24, 2022