Better error messaging for nullable foreign keys #1780

Closed
npatki opened this issue Feb 6, 2024 · 0 comments · Fixed by #1810
Labels: data:multi-table (Related to multi-table, relational datasets), feature request (Request for a new feature)

npatki commented Feb 6, 2024

Problem Description

Currently, the SDV does not support nullable foreign keys, though a feature request for this has been filed in #1656. We expect support for nullable foreign keys to take some time, as it requires core algorithmic updates to each multi-table algorithm.

If you have a null foreign key, it currently shows up as an error. For example, in the demo dataset TubePricing_v1:

Data does not match metadata for dataset TubePricing_v1
The provided data does not match the metadata:
Relationships:
Error: foreign key column 'connection_type_id_2' contains unknown references: (nan). All the values in this column must reference a primary key.
Error: foreign key column 'connection_type_id_1' contains unknown references: (nan). All the values in this column must reference a primary key.

This message is misleading because the data is actually valid: databases do support nullable foreign keys. We should clear up any confusion users may have about this missing feature.

Expected behavior

  • The metadata.validate_data call should allow nullable foreign keys to exist. That is, it should not raise an error for them.
  • Each multi-table synthesizer should instead perform this check itself and clearly communicate to the user that the feature is unsupported. In the future, we will add support to each synthesizer individually, at which point that particular synthesizer will be updated.
>>> metadata.validate()
>>> metadata.validate_data(data)
>>> synthesizer = HSASynthesizer(metadata)
>>> synthesizer.fit(data)
SynthesizerProcessingError: The data contains null values in foreign key columns. This feature is currently unsupported. Please remove null values to fit the synthesizer.

Affected columns:
Table 'transactions', column 'session_id'
Table 'sessions', column 'user_id'
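
As a rough illustration of the synthesizer-side check described above, here is a minimal sketch in plain Python. It is not the SDV implementation. The helper names (`is_null`, `find_null_foreign_keys`, `check_foreign_keys_not_null`), the local `SynthesizerProcessingError` class, and the shape of `data` (table name to column name to list of values) are all assumptions made for illustration. The relationship dictionaries are modeled on the `child_table_name` / `child_foreign_key` keys used in SDV multi-table metadata.

```python
import math


class SynthesizerProcessingError(Exception):
    """Hypothetical stand-in for the error type named in this issue."""


def is_null(value):
    # Assumption: treat None and float NaN as null foreign key values.
    return value is None or (isinstance(value, float) and math.isnan(value))


def find_null_foreign_keys(data, relationships):
    """Return (child_table, foreign_key_column) pairs that contain nulls."""
    affected = []
    for relationship in relationships:
        table = relationship["child_table_name"]
        column = relationship["child_foreign_key"]
        if any(is_null(value) for value in data[table][column]):
            affected.append((table, column))
    return affected


def check_foreign_keys_not_null(data, relationships):
    """Raise a descriptive error if any foreign key column contains nulls."""
    affected = find_null_foreign_keys(data, relationships)
    if affected:
        columns = "\n".join(
            f"Table '{table}', column '{column}'" for table, column in affected
        )
        raise SynthesizerProcessingError(
            "The data contains null values in foreign key columns. "
            "This feature is currently unsupported. "
            "Please remove null values to fit the synthesizer.\n\n"
            "Affected columns:\n" + columns
        )


# Illustrative data only: each table maps column names to lists of values.
data = {
    "sessions": {"session_id": [1, 2], "user_id": [10, None]},
    "transactions": {"transaction_id": [100, 101], "session_id": [1, float("nan")]},
}
relationships = [
    {
        "parent_table_name": "sessions",
        "child_table_name": "transactions",
        "parent_primary_key": "session_id",
        "child_foreign_key": "session_id",
    },
    {
        "parent_table_name": "users",
        "child_table_name": "sessions",
        "parent_primary_key": "user_id",
        "child_foreign_key": "user_id",
    },
]

try:
    check_foreign_keys_not_null(data, relationships)
except SynthesizerProcessingError as error:
    print(error)
```

Collecting every affected column before raising lets the user fix all of them in one pass, which matches the "Affected columns" list in the proposed message. In a real synthesizer the tables would be pandas DataFrames, so the null test would more likely rely on the DataFrame's own null detection than on a per-value helper.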
@npatki npatki added feature request Request for a new feature data:multi-table Related to multi-table, relational datasets labels Feb 6, 2024
@npatki npatki changed the title (Temporary) Better error messaging for nullable foreign keys Better error messaging for nullable foreign keys Feb 6, 2024
@frances-h frances-h added this to the 1.11.0 milestone Mar 21, 2024