[BUG] Getting NotImplementedError: <nvtabular.columns.selector.ColumnSelector object error from TargetEncoding op #1165

Closed
rnyak opened this issue Oct 7, 2021 · 2 comments · Fixed by #1169
Labels: bug (Something isn't working), P0

Comments

Contributor

rnyak commented Oct 7, 2021

Describe the bug
Fitting a workflow in which TargetEncoding receives a transformed workflow node as its target raises this error:

Failed to fit operator <nvtabular.ops.target_encoding.TargetEncoding object at 0x7f83cc8c9940>
Traceback (most recent call last):
  File "/nvtabular/nvtabular/workflow/workflow.py", line 220, in fit
    stats.append(op.fit(workflow_node.input_columns, transformed_ddf))
  File "/nvtabular/nvtabular/ops/target_encoding.py", line 169, in fit
    moments = _custom_moments(ddf[self.target])
  File "/root/.local/lib/python3.8/site-packages/dask-2021.7.1-py3.8.egg/dask/dataframe/core.py", line 4016, in __getitem__
    raise NotImplementedError(key)
NotImplementedError: <nvtabular.columns.selector.ColumnSelector object at 0x7f83cc894250>
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
/tmp/ipykernel_8111/2721983559.py in <module>
     18 
     19 workflow = nvt.Workflow(te_features + ["Author", "Engaging_User"])
---> 20 df_out = workflow.fit_transform(nvt.Dataset(df)).to_ddf().compute(scheduler="synchronous")

/nvtabular/nvtabular/workflow/workflow.py in fit_transform(self, dataset)
    264         Dataset
    265         """
--> 266         self.fit(dataset)
    267         return self.transform(dataset)
    268 

/nvtabular/nvtabular/workflow/workflow.py in fit(self, dataset)
    218                 op = workflow_node.op
    219                 try:
--> 220                     stats.append(op.fit(workflow_node.input_columns, transformed_ddf))
    221                     ops.append(op)
    222                 except Exception:

/nvtabular/nvtabular/ops/target_encoding.py in fit(self, col_selector, ddf)
    167         if self.target_mean is None:
    168             # calcualte the mean if we don't have it already
--> 169             moments = _custom_moments(ddf[self.target])
    170 
    171         col_groups = col_selector.grouped_names

~/.local/lib/python3.8/site-packages/dask-2021.7.1-py3.8.egg/dask/dataframe/core.py in __getitem__(self, key)
   4014             return self.where(key, np.nan)
   4015 
-> 4016         raise NotImplementedError(key)
   4017 
   4018     def __setitem__(self, key, value):

NotImplementedError: <nvtabular.columns.selector.ColumnSelector object at 0x7f83cc894250>
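
The traceback boils down to a dataframe being indexed with an object it does not recognise as a key. Below is a minimal pure-Python sketch of that failure mode. `FakeSelector` and `FakeFrame` are hypothetical stand-ins, not NVTabular or Dask classes; they only mimic the relevant behaviour, assuming a frame whose `__getitem__` accepts a string or a list of strings and raises `NotImplementedError(key)` for anything else.

```python
class FakeSelector:
    """Hypothetical stand-in for a column selector: it only holds column names."""

    def __init__(self, names):
        self.names = list(names)


class FakeFrame:
    """Hypothetical stand-in for a dataframe keyed by column name."""

    def __init__(self, columns):
        self._columns = dict(columns)

    def __getitem__(self, key):
        if isinstance(key, str):
            return self._columns[key]
        if isinstance(key, list):
            return FakeFrame({name: self._columns[name] for name in key})
        # Mirrors the last frame of the traceback: unknown key types are rejected.
        raise NotImplementedError(key)


def try_index(frame, key):
    """Return (True, result) on success, or (False, exception) if the key is rejected."""
    try:
        return True, frame[key]
    except NotImplementedError as exc:
        return False, exc


frame = FakeFrame({"Post": [1, 2, 3, 4, 5], "Cost": [0, 1, 2, 3, 4]})
selector = FakeSelector(["Post"])

# Passing the selector object itself is rejected by the frame.
ok_with_selector, error = try_index(frame, selector)
# Passing the plain list of names it wraps is accepted.
ok_with_names, subset = try_index(frame, selector.names)
```

In this sketch the selector object fails as a key while the list of names it wraps succeeds, which matches the shape of the error message: the exception payload is the selector object itself.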

Please run the following code to repro the issue

Steps/Code to reproduce bug

import cudf
import nvtabular as nvt
from nvtabular import ColumnSelector, ops

# Small toy dataset: two categorical columns plus numeric "Cost" and "Post" columns.
df = cudf.DataFrame({
    "Cost": range(15),
    "Post": [1, 2, 3, 4, 5] * 3,
    "Author": ['A'] * 5 + ['B'] * 5 + ['C'] * 2 + ['D'] * 3,
    "Engaging_User": ['A'] * 5 + ['B'] * 3 + ['E'] * 2 + ['D'] * 3 + ['G'] * 2,
})

cat_groups = ['Author', 'Engaging_User']
# The target is a transformed workflow node (binarized "Post"), not a plain column name.
labels = ColumnSelector(['Post']) >> (lambda col: (col > 3).astype('int8'))
te_features = cat_groups >> ops.TargetEncoding(
    labels,
    out_path='./',
    kfold=1,
    out_dtype="float32",
    drop_folds=False,  # Keep folds to validate
)

workflow = nvt.Workflow(te_features + ["Author", "Engaging_User"])
# Fitting the workflow is the step that raises the NotImplementedError.
df_out = workflow.fit_transform(nvt.Dataset(df)).to_ddf().compute(scheduler="synchronous")

Expected behavior
TargetEncoding should be able to take a transformed workflow node as target.
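
One way to meet this expectation is to reduce whatever is passed as the target to a plain list of column names before it is used to index the dataframe. The sketch below illustrates the idea only and is not the fix that was merged. `target_column_names`, `SelectorLike`, and `NodeLike` are hypothetical names, and the attributes `names` and `output_names` are assumptions made for this example rather than NVTabular API.

```python
class SelectorLike:
    """Hypothetical stand-in for a column selector exposing `names`."""

    def __init__(self, names):
        self.names = list(names)


class NodeLike:
    """Hypothetical stand-in for a workflow node exposing its output column names."""

    def __init__(self, output_names):
        self.output_names = list(output_names)


def target_column_names(target):
    """Normalize a target given as str, list/tuple, selector, or node to a list of names."""
    if isinstance(target, str):
        return [target]
    if isinstance(target, (list, tuple)):
        return list(target)
    if hasattr(target, "output_names"):  # assumed attribute on node-like objects
        return list(target.output_names)
    if hasattr(target, "names"):  # assumed attribute on selector-like objects
        return list(target.names)
    raise TypeError(f"Unsupported target type: {type(target).__name__}")


names_from_str = target_column_names("Post")
names_from_list = target_column_names(["Post", "Cost"])
names_from_selector = target_column_names(SelectorLike(["Post"]))
names_from_node = target_column_names(NodeLike(["Post"]))
```

With a normalization step of this kind, the dataframe is only ever indexed with a list of strings, regardless of how the caller expressed the target.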

Environment details (please complete the following information):

Docker image: NVTabular-Pytorch-Training:21.09

rnyak added the bug label Oct 7, 2021
benfred added the P0 label Oct 7, 2021
Member

benfred commented Oct 7, 2021

This works in v0.6.0 (changing ColumnSelector -> ColumnGroup above), but is broken in v0.7.0

Contributor

albert17 commented Oct 7, 2021

Then it looks like our unit tests are not covering the TargetEncoding op properly.

jperez999 linked a pull request Oct 7, 2021 that will close this issue