Warn users to save their metadata file after auto-detecting/updating it #1762

Closed
npatki opened this issue Jan 29, 2024 · 0 comments · Fixed by #1786

npatki commented Jan 29, 2024

Problem Description

As specified in the Metadata docs, the metadata auto-detection logic is not guaranteed to be accurate or complete. Furthermore, the auto-detection logic may change between SDV versions, leading to inconsistent results.

For example, the following script may not produce the same results in every SDV version because the auto-detection logic changes:

from sdv.metadata import MultiTableMetadata
from sdv.multi_table import HMASynthesizer

metadata = MultiTableMetadata()
metadata.detect_from_dataframes(my_data)  # this logic is not guaranteed to be accurate and may change!

synthesizer = HMASynthesizer(metadata)
synthesizer.fit(my_data)

To avoid these issues, the SDV team strongly recommends saving the metadata as a separate JSON file. This recommendation is not communicated to users strongly enough, which leads to confusion.

metadata.save_to_json('my_metadata.json')

Expected behavior

When initializing a synthesizer, we should warn the user if they are providing a metadata object that has been auto-detected/modified but has never been saved.

# any single-table, multi-table or sequential synthesizer
synthesizer = GaussianCopulaSynthesizer(metadata)
Warning: We strongly recommend saving the metadata using 'save_to_json' for replicability in future SDV versions.

Additional context

The warning does not need to show up if:

  • The user has called the save_to_json() function on the metadata (meaning that they are following the recommendation), OR
  • The user has created the metadata using load_from_json() (meaning that they are loading a previously saved version of it), OR
  • The user has retrieved the metadata object from our download_demo() function

For all of the above: The warning should reappear if the user updates the metadata afterwards using the Python API, or if they call auto-detect on it.

One way to accomplish this would be by setting/unsetting a private flag within the metadata object itself.
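
To make the flag idea concrete, here is a minimal, self-contained sketch. It does not use SDV and is not meant as the actual implementation: the classes TrackedMetadata and ToySynthesizer, the methods detect_from_columns and update_column, and the flag name _unsaved_changes are all hypothetical stand-ins chosen for illustration. Only the method names save_to_json and load_from_json and the warning text come from this issue.

```python
import json
import warnings


class TrackedMetadata:
    """Hypothetical stand-in for a metadata class (not SDV's real class)."""

    def __init__(self):
        self.columns = {}
        # Assumed private flag: True while there are changes that were never saved.
        self._unsaved_changes = False

    def detect_from_columns(self, column_names):
        """Stand-in for auto-detection: assigns a placeholder type to each column."""
        self.columns = {name: {'sdtype': 'unknown'} for name in column_names}
        self._unsaved_changes = True  # auto-detected, so the warning should appear

    def update_column(self, column_name, **kwargs):
        """Stand-in for any update made through the Python API."""
        self.columns.setdefault(column_name, {}).update(kwargs)
        self._unsaved_changes = True  # modified, so the warning should reappear

    def save_to_json(self, filepath):
        with open(filepath, 'w') as f:
            json.dump({'columns': self.columns}, f)
        self._unsaved_changes = False  # the user followed the recommendation

    @classmethod
    def load_from_json(cls, filepath):
        with open(filepath) as f:
            contents = json.load(f)
        metadata = cls()
        metadata.columns = contents['columns']
        return metadata  # flag stays False: this came from a saved file


class ToySynthesizer:
    """Hypothetical synthesizer that only performs the check at initialization."""

    def __init__(self, metadata):
        if metadata._unsaved_changes:
            warnings.warn(
                "We strongly recommend saving the metadata using 'save_to_json' "
                'for replicability in future SDV versions.'
            )
        self.metadata = metadata
```

With this sketch, ToySynthesizer(metadata) warns right after detect_from_columns(), stays silent after save_to_json() or load_from_json(), and warns again once update_column() is called. A demo-loading helper could return metadata with the flag left unset in the same way.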

@npatki npatki added feature request Request for a new feature feature:metadata Related to describing the dataset labels Jan 29, 2024
@npatki npatki changed the title Warn users to save their metadata file after auto-detecting and changing it Warn users to save their metadata file after auto-detecting/updating it Jan 29, 2024
@amontanez24 amontanez24 added this to the 1.10.0 milestone Feb 14, 2024