Problem Description
As specified in the Metadata docs, the metadata auto-detection logic is not meant to be accurate or complete. Furthermore, the auto-detection logic may change between SDV versions, leading to inconsistent results.
For example, the following script may not produce the same results in every SDV version, because the auto-detection logic changes:
from sdv.metadata import MultiTableMetadata
from sdv.multi_table import HMASynthesizer

metadata = MultiTableMetadata()
metadata.detect_from_dataframes(my_data)  # this logic is not guaranteed to be accurate and may change!!

synthesizer = HMASynthesizer(metadata)
synthesizer.fit(my_data)
To avoid these issues, the SDV team strongly recommends saving the metadata as a separate JSON file. This recommendation is not communicated to users strongly enough, which leads to confusion.
metadata.save_to_json('my_metadata.json')
Expected behavior
When initializing a synthesizer, we should warn the user if they are providing a metadata object that has been auto-detected/modified but has never been saved.
# any single, multi or sequential synthesizer
synthesizer = GaussianCopulaSynthesizer(metadata)
Warning: We strongly recommend saving the metadata using 'save_to_json' for replicability in future SDV versions.
Additional context
The warning does not need to show up if:
The user has called the save_to_json() function on the metadata (meaning that they are following the recommendation) OR
The user has created the metadata using load_from_json() (meaning that they are loading a previously-saved version of it) OR
The user retrieved the metadata object from our download_demo() function
For all of the above: The warning should reappear if the user updates the metadata afterwards using the Python API, or if they call auto-detect on it.
One way to accomplish this would be by setting/unsetting a private flag within the metadata object itself.
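The flag idea above can be sketched as follows. This is only an illustration under stated assumptions, not SDV's implementation: `SketchMetadata`, `SketchSynthesizer`, the `_saved` attribute, and the `detect`/`update_column` helpers are hypothetical stand-ins, and only the method names `save_to_json` and `load_from_json` mirror the ones mentioned in this issue.

```python
import json
import os
import tempfile
import warnings

SAVE_WARNING = (
    "We strongly recommend saving the metadata using 'save_to_json' "
    "for replicability in future SDV versions."
)


class SketchMetadata:
    """Hypothetical stand-in for a metadata class (not the real SDV API)."""

    def __init__(self):
        self.columns = {}
        # Hypothetical private flag: True only while the in-memory state
        # is known to match a saved JSON file.
        self._saved = False

    def detect(self, column_names):
        # Auto-detection changes the metadata, so the flag is unset.
        self.columns = {name: {"sdtype": "unknown"} for name in column_names}
        self._saved = False

    def update_column(self, name, **kwargs):
        # Any update through the Python API also unsets the flag.
        self.columns.setdefault(name, {}).update(kwargs)
        self._saved = False

    def save_to_json(self, filepath):
        with open(filepath, "w") as f:
            json.dump({"columns": self.columns}, f)
        self._saved = True

    @classmethod
    def load_from_json(cls, filepath):
        with open(filepath) as f:
            contents = json.load(f)
        metadata = cls()
        metadata.columns = contents["columns"]
        # Loaded from a previously saved file, so no warning is needed.
        metadata._saved = True
        return metadata


class SketchSynthesizer:
    """Hypothetical synthesizer that only checks the flag on init."""

    def __init__(self, metadata):
        if not metadata._saved:
            warnings.warn(SAVE_WARNING, UserWarning)
        self.metadata = metadata


def warns_on_init(metadata):
    """Return True if creating a synthesizer emits the save warning."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        SketchSynthesizer(metadata)
    return any(issubclass(w.category, UserWarning) for w in caught)


def demo():
    metadata = SketchMetadata()
    metadata.detect(["user_id", "age"])
    after_detect = warns_on_init(metadata)

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "my_metadata.json")
        metadata.save_to_json(path)
        after_save = warns_on_init(metadata)
        loaded = SketchMetadata.load_from_json(path)
        after_load = warns_on_init(loaded)

    loaded.update_column("age", sdtype="numerical")
    after_update = warns_on_init(loaded)
    return after_detect, after_save, after_load, after_update
```

In this sketch, `demo()` walks through the cases listed above: the warning is expected after auto-detection, is suppressed after saving or loading, and reappears once the loaded metadata is updated. A metadata object returned by a demo-download helper could set the same flag to `True` before returning.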
npatki changed the title from "Warn users to save their metadata file after auto-detecting and changing it" to "Warn users to save their metadata file after auto-detecting/updating it" on Jan 29, 2024