
[BUG] Filter + JoinExternal creates key errors #1143

Closed
bschifferer opened this issue Sep 23, 2021 · 2 comments
Labels
bug Something isn't working P0

Comments

@bschifferer (Contributor) commented Sep 23, 2021

Describe the bug
If I create a Filter op followed by a JoinExternal op that uses the columns_ext parameter, the workflow raises a KeyError:

KeyError: 'ext_col3'

Steps/Code to reproduce bug

import pandas as pd
import cudf

import nvtabular as nvt

df = cudf.DataFrame({
    'col1': [0,0,0,0,0,0],
    'col2': [1,2,3,4,5,6],
    'col3': [2,2,2,2,2,2]
})

df_ext = pd.DataFrame({
    'col1': [0,0,0,0,0,0],
    'ext_col2': [1,1,1,1,1,1],
    'ext_col3': [2,2,2,2,2,2]
})

df.to_parquet('df.parquet')

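# Filter keeps rows where col2 is 1 or 2; JoinExternal then joins in only
# the external columns listed in columns_ext
out = (
    ['col1', 'col2', 'col3']
    >> nvt.ops.Filter(lambda df: df['col2'].isin([1, 2]))
    >> nvt.ops.JoinExternal(df_ext=df_ext, on='col1', columns_ext=['col1', 'ext_col2'])
)

dataset = nvt.Dataset('df.parquet')

workflow = nvt.Workflow(out)
# Raises KeyError: 'ext_col3'
workflow.transform(dataset).to_parquet('./test/')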
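For reference, the result the workflow is expected to produce can be approximated in plain pandas. This is a sketch assuming that JoinExternal behaves like a left join on col1 restricted to the columns in columns_ext; it does not exercise NVTabular itself, and it uses pandas instead of cudf so it runs on CPU.

```python
import pandas as pd

# Same input data as the reproduction above (pandas instead of cudf)
df = pd.DataFrame({
    "col1": [0, 0, 0, 0, 0, 0],
    "col2": [1, 2, 3, 4, 5, 6],
    "col3": [2, 2, 2, 2, 2, 2],
})
df_ext = pd.DataFrame({
    "col1": [0, 0, 0, 0, 0, 0],
    "ext_col2": [1, 1, 1, 1, 1, 1],
    "ext_col3": [2, 2, 2, 2, 2, 2],
})

# Step 1 (Filter): keep only the rows where col2 is 1 or 2
filtered = df[df["col2"].isin([1, 2])]

# Step 2 (JoinExternal, assumed left join): restrict the external frame
# to columns_ext, then join on col1
expected = filtered.merge(df_ext[["col1", "ext_col2"]], on="col1", how="left")

print(expected.shape)          # (12, 4): 2 filtered rows x 6 matching external rows
print(list(expected.columns))  # ['col1', 'col2', 'col3', 'ext_col2']
```

Note that ext_col3 does not appear in the expected output at all.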

Additional context
If I remove the Filter op, it works.
If I don't pass the columns_ext parameter, it works too.
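The error names ext_col3, the one external column that columns_ext leaves out. To illustrate why that is surprising, the hypothetical helper join_output_columns (not part of NVTabular) computes the columns a join on col1 should output with and without columns_ext. It assumes the join key appears once in the output and that no other column names overlap between the two frames.

```python
def join_output_columns(left_cols, ext_cols, on, columns_ext=None):
    """Hypothetical helper: output columns of a join of left and external frames on `on`."""
    # Only the columns listed in columns_ext (if given) come from the external frame
    selected = list(ext_cols) if columns_ext is None else list(columns_ext)
    # The join key (and any column already on the left) is not added a second time
    extra = [c for c in selected if c != on and c not in left_cols]
    return list(left_cols) + extra


left_cols = ["col1", "col2", "col3"]
ext_cols = ["col1", "ext_col2", "ext_col3"]

with_subset = join_output_columns(left_cols, ext_cols, "col1", ["col1", "ext_col2"])
without_subset = join_output_columns(left_cols, ext_cols, "col1")

print(with_subset)     # ['col1', 'col2', 'col3', 'ext_col2']
print(without_subset)  # ['col1', 'col2', 'col3', 'ext_col2', 'ext_col3']
```

With columns_ext set, ext_col3 should never be requested, which matches the observation that dropping columns_ext avoids the error.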

Aha! Link: https://nvaiinfra.aha.io/features/MERLIN-507

@bschifferer added the bug label Sep 23, 2021
@karlhigley (Contributor)

This is probably related to recent changes to the Workflow graph, so I'm assigning the two of us who worked on that to triage and figure out what's going on here.

@benfred added the P0 label Oct 6, 2021
@karlhigley (Contributor)

This works on the main branch after merging #1194.

4 participants