Enable single table synthesizers to use new Metadata #2128

Open
amontanez24 opened this issue Jul 15, 2024 · 0 comments · May be fixed by #2186
Labels: feature:metadata (Related to describing the dataset), feature request (Request for a new feature)

amontanez24 (Contributor) commented Jul 15, 2024

Problem Description

As a user, once I have metadata, I'd like to easily create synthetic data for the tables in my dataset, regardless of modality.

Currently, we have two different types of metadata for the single-table and multi-table use cases. This can be confusing and cumbersome for users. Once metadata is created, they should be able to use it with whichever synthesizer works for their scenario.

Expected behavior

Once #2104 is completed, we need to enable all single table and sequential synthesizers to work with the new metadata class.

  • All single table and sequential synthesizers should run when passed the new metadata object in the init.
  • When a SingleTableSynthesizer is initialized with the new Metadata, check that it contains only one table. Otherwise, raise an error that asks the user to use a MultiTableSynthesizer.
  • Update all references to support both the SingleTableMetadata object and the new Metadata object. Run all integration tests using the new Metadata class and see what failures occur.
    • Update any parts of the code where the attribute might be different to support both the new class and the old SingleTableMetadata.
    • For example, in the DataProcessor, we access the metadata.columns directly. The new object will require you to go into the table first.
  • Update unit tests to continue to have 100% coverage
  • Deprecate SingleTableMetadata
    • Raise a future warning if users pass old metadata
  • The synthesizers should remain backwards compatible (they should still work with the old metadata).
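
The init-time checks listed above could look roughly like the sketch below. It does not use the real SDV classes: `OldSingleTableMetadata`, `NewMetadata`, `InvalidMetadataError`, and `resolve_single_table_metadata` are hypothetical stand-ins, and the `tables` attribute on the new class is an assumption about how the nesting might be exposed.

```python
import warnings
from dataclasses import dataclass, field


@dataclass
class OldSingleTableMetadata:
    """Hypothetical stand-in for SingleTableMetadata: columns sit at the top level."""
    columns: dict = field(default_factory=dict)


@dataclass
class NewMetadata:
    """Hypothetical stand-in for the new Metadata class: columns are nested per table."""
    tables: dict = field(default_factory=dict)  # table name -> OldSingleTableMetadata


class InvalidMetadataError(Exception):
    """Hypothetical error raised when a synthesizer cannot use the metadata."""


def resolve_single_table_metadata(metadata):
    """Return the single-table view of `metadata`, whichever class was passed."""
    if isinstance(metadata, OldSingleTableMetadata):
        # Backwards compatible: old metadata still works, but warns about deprecation.
        warnings.warn(
            'Passing SingleTableMetadata is deprecated; please use the new Metadata class.',
            FutureWarning,
        )
        return metadata

    if len(metadata.tables) != 1:
        raise InvalidMetadataError(
            f'This metadata describes {len(metadata.tables)} tables. '
            'Please use a multi-table synthesizer instead.'
        )

    # Exactly one table: unwrap it so downstream code sees the familiar shape.
    (table_metadata,) = metadata.tables.values()
    return table_metadata
```

A synthesizer's `__init__` could call a helper like this once and keep the result, so the rest of the class does not need to branch on the metadata type.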

Additional context

There will be a lot of references that might change between the new class and the SingleTableMetadata class, since many attributes will now be nested. I propose the following strategy for handling these changes:

  • For constraints and the DataProcessor, use the underlying SingleTableMetadata object. Since the MultiTableMetadata class is made up of a dictionary of SingleTableMetadata objects, you can extract the object for the one table and pass it to the DataProcessor and constraints.
  • In the SingleTableSynthesizer classes themselves (Base, GaussianCopula etc.), update the references to work with both the new and old classes. There aren’t as many references in these files so it should be more straightforward.
  • Note: There is a separate issue for getting the evaluation methods to work
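
As a rough illustration of this strategy, the sketch below leaves a processor that reads `metadata.columns` untouched and has the synthesizer unwrap the one table before handing it over. `SketchDataProcessor` and `SketchSynthesizer` are hypothetical stand-ins rather than SDV classes, and the `tables` dictionary on the new-style object is an assumption.

```python
from types import SimpleNamespace


class SketchDataProcessor:
    """Hypothetical stand-in for DataProcessor: still reads `metadata.columns` directly."""

    def __init__(self, metadata):
        self.column_names = list(metadata.columns)


class SketchSynthesizer:
    """Hypothetical stand-in for a single-table synthesizer."""

    def __init__(self, metadata):
        self.metadata = metadata  # keep whatever object the user passed in

        # Assumption: new-style metadata exposes a `tables` dict, old-style does not.
        tables = getattr(metadata, 'tables', None)
        if tables is None:
            table_metadata = metadata
        else:
            table_metadata = next(iter(tables.values()))

        # The processor only ever sees the single-table object.
        self._data_processor = SketchDataProcessor(table_metadata)


# Illustrative objects shaped like the old and new metadata.
old_style = SimpleNamespace(columns={'age': {'sdtype': 'numerical'}})
new_style = SimpleNamespace(tables={'users': old_style})
```

With this shape, only the synthesizer classes need to know about both metadata types, which matches the observation that they contain fewer references to update.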

amontanez24 added the feature request and feature:metadata labels on Jul 15, 2024
lajohn4747 self-assigned this on Jul 17, 2024
amontanez24 added this to the 1.17.0 milestone on Aug 27, 2024
amontanez24 linked a pull request on Aug 27, 2024 that will close this issue