Set HyperTransformer config manually, based on Metadata if given #982

Closed
npatki opened this issue Aug 30, 2022 · 0 comments · Fixed by #995

@npatki
Contributor

npatki commented Aug 30, 2022

Environment Details

  • SDV version: 0.17.0.dev1

Error Description

  1. The entire RDT HyperTransformer config is being printed out during modeling. This is unnecessarily verbose because I don't have any way of changing it right now. (Especially the case for HMA1.)
  2. The HyperTransformer config does not properly correspond to the metadata that I provide. For example, if I specify that a column is type categorical, it is being detected and modeled as numerical.

Steps to reproduce

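from sdv.demo import load_tabular_demo
from sdv.tabular import GaussianCopula

# Load the demo table together with its metadata.
metadata, data = load_tabular_demo('student_placements', metadata=True)
model = GaussianCopula(
    table_metadata=metadata
)
model.fit(data)
synthetic_data = model.sample(num_rows=100)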

Observe that the config is printed out. Also observe that the column duration is listed as categorical in the metadata but is read in as numerical. It is also modeled as numerical: the synthetic data includes values that were never seen in the real data.
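One way to confirm the second observation is to compare the distinct values of a column in the real and synthetic tables. The sketch below is only illustrative: the helper name unseen_values and the small tables are made up for this example and are not part of SDV or of the demo dataset.

```python
import pandas as pd


def unseen_values(real, synthetic, column):
    """Return synthetic values of `column` that never occur in the real data."""
    # Hypothetical helper, not an SDV function.
    real_values = set(real[column].dropna().tolist())
    synthetic_values = set(synthetic[column].dropna().tolist())
    return synthetic_values - real_values


# Made-up stand-ins for the real and sampled tables.
real = pd.DataFrame({'duration': [3, 6, 12, 6]})
synthetic = pd.DataFrame({'duration': [3.0, 7.4, 12.0, 9.1]})

extra = unseen_values(real, synthetic, 'duration')
print(sorted(extra))
```

For a column that is truly modeled as categorical, the returned set should be empty.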

Expected Fix

If we use HyperTransformer set_config, we can solve both issues.

  • The config is printed out during detect_initial_config, which we no longer need to call
  • The config we set should be based on the metadata that the user provides (if applicable). If no metadata is provided, it should be based on the pandas dtypes of the passed-in data.
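The fallback rule in the second bullet could be sketched roughly as follows. This is not the actual SDV fix. It assumes the user's metadata has already been reduced to a plain column-to-type mapping (the field_types argument is hypothetical), and the dtype-to-sdtype table is an illustrative guess. The resulting dict is meant to fill the 'sdtypes' part of the config passed to HyperTransformer set_config. The 'transformers' part is left out here.

```python
import pandas as pd

# Illustrative mapping from pandas dtype kinds to sdtype names (an assumption).
KIND_TO_SDTYPE = {
    'i': 'numerical',
    'u': 'numerical',
    'f': 'numerical',
    'b': 'boolean',
    'M': 'datetime',
    'O': 'categorical',
}


def build_sdtypes(data, field_types=None):
    """Pick an sdtype per column: metadata first, pandas dtype as fallback."""
    # `field_types` is a hypothetical {column: sdtype} view of the metadata.
    field_types = field_types or {}
    sdtypes = {}
    for column in data.columns:
        if column in field_types:
            # The user-provided metadata always wins.
            sdtypes[column] = field_types[column]
        else:
            kind = data[column].dtype.kind
            sdtypes[column] = KIND_TO_SDTYPE.get(kind, 'categorical')
    return sdtypes


# Made-up table: duration looks numerical but is declared categorical.
data = pd.DataFrame({
    'duration': [3, 6, 12],
    'salary': [100.0, 200.0, 300.0],
    'placed': [True, False, True],
})
sdtypes = build_sdtypes(data, field_types={'duration': 'categorical'})
print(sdtypes)
```

Because the config is assembled directly, detect_initial_config is never called, so nothing is printed during fitting.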
@npatki npatki added the bug Something isn't working label Aug 30, 2022
@npatki npatki added this to the 0.17.0 milestone Aug 30, 2022