Unable to visualize just the real data (or just the synthetic data) in a multi-table setting #2160

Closed
npatki opened this issue Aug 1, 2024 · 0 comments · Fixed by #2169
Labels: bug, data:multi-table, feature:evaluation

npatki commented Aug 1, 2024

Environment Details

  • SDV version: 1.15.0
  • Python version: 3.10
  • Operating System: Linux (Google Colab)

Error Description

The latest update of SDMetrics allows you to plot just the real data (or just the synthetic data) by supplying the other dataset as None. This is useful for exploration and investigation.

In SDV, we have a wrapper around SDMetrics. I expect these visualizations to support the same set of features as SDMetrics. However, passing None doesn't work -- it raises an error instead.

Steps to reproduce

from sdv.datasets.demo import download_demo
from sdv.evaluation.multi_table import get_column_plot

data, metadata = download_demo(
    modality='multi_table',
    dataset_name='fake_hotels'
)

fig = get_column_plot(
    real_data=data,
    synthetic_data=None,  # I should be able to set either this or real_data to None
    metadata=metadata,
    table_name='guests',
    column_name='amenities_fee'
)
    
fig.show()

This produces the following error:

/usr/local/lib/python3.10/dist-packages/sdv/evaluation/multi_table.py in get_column_plot(real_data, synthetic_data, metadata, table_name, column_name, plot_type)
     79     metadata = metadata.tables[table_name]
     80     real_data = real_data[table_name]
---> 81     synthetic_data = synthetic_data[table_name]
     82     return single_table_visualization.get_column_plot(
     83         real_data,

TypeError: 'NoneType' object is not subscriptable
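The traceback shows the wrapper subscripting synthetic_data unconditionally. A minimal sketch of one way to guard against that is below. It is not the actual SDV code or the change made in the fix: the names select_table and get_column_plot_sketch are hypothetical, plot_fn stands in for the single-table plotting function, plain dicts stand in for DataFrames, and rejecting the case where both datasets are None is a choice made here for illustration.

```python
def select_table(data, table_name):
    """Return data[table_name], or None when the whole dataset is None."""
    return None if data is None else data[table_name]


def get_column_plot_sketch(real_data, synthetic_data, table_name, column_name, plot_fn):
    """Hypothetical multi-table wrapper that tolerates one missing dataset.

    plot_fn is a stand-in for the single-table plotting function.
    """
    # Assumption for this sketch: at least one dataset must be provided.
    if real_data is None and synthetic_data is None:
        raise ValueError('Provide real_data, synthetic_data, or both.')

    # Only subscript a dataset that is actually present.
    real_table = select_table(real_data, table_name)
    synthetic_table = select_table(synthetic_data, table_name)
    return plot_fn(real_table, synthetic_table, column_name)


# Stand-in data: a dict of tables, each a dict of columns.
real = {'guests': {'amenities_fee': [10.0, 25.5, 0.0]}}

result = get_column_plot_sketch(
    real_data=real,
    synthetic_data=None,
    table_name='guests',
    column_name='amenities_fee',
    plot_fn=lambda real_t, synth_t, col: (real_t, synth_t, col),
)
```

With this guard, the None dataset is passed through untouched, so the single-table function decides how to handle it.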

Additional Context

Note that this works correctly in the single-table context. Only the multi-table function raises the error.

# this works as expected!
from sdv.datasets.demo import download_demo
from sdv.evaluation.single_table import get_column_plot

data, metadata = download_demo(
    modality='single_table',
    dataset_name='fake_hotel_guests'
)


fig = get_column_plot(
    real_data=data,
    synthetic_data=None,
    metadata=metadata,
    column_name='amenities_fee'
)
    
fig.show()
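Since the single-table function already accepts None, a possible workaround on affected versions is to select the table yourself and call the single-table function directly. This is only a sketch: the helper name plot_one_table is hypothetical, and it assumes that metadata.tables[table_name] (the same lookup shown in the traceback) is accepted as the single-table metadata.

```python
def plot_one_table(data, metadata, table_name, column_name, synthetic_data=None):
    """Hypothetical helper: plot one column of one table via the single-table API."""
    # Imported lazily so that defining the helper does not require SDV.
    from sdv.evaluation.single_table import get_column_plot

    return get_column_plot(
        real_data=data[table_name],
        # Pass None through instead of subscripting it.
        synthetic_data=None if synthetic_data is None else synthetic_data[table_name],
        metadata=metadata.tables[table_name],
        column_name=column_name,
    )
```

With the multi-table fake_hotels demo loaded as in the steps to reproduce, the call would be fig = plot_one_table(data, metadata, 'guests', 'amenities_fee').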

npatki added the bug, data:multi-table, and feature:evaluation labels on Aug 1, 2024
R-Palazzo self-assigned this on Aug 5, 2024
R-Palazzo added this to the 1.15.1 milestone on Aug 5, 2024