InvalidDataError: The provided data does not match the metadata (although it matches) #1833
Comments
Hi there @deltaproximity, it looks like the data you're using for training doesn't adhere to the constraint you specified (a scalar range from 0.7 to 0.9). I see multiple values outside of this range. Constraints in SDV describe business rules inherent in your real data that you want the trained synthesizer / model to know about. This error is thrown because SDV detected that the underlying training data doesn't match the constraint you specified. Do you mind sharing more about your use case here? What's the motivation for defining a constraint that deviates from your original data?
Hi @srinify, thanks for your reply. I need the synthesizer to use only the data in the range [0.7, 0.9], because the data I want to sample should have values of this column only in this range. From what I understood from the SDV documentation, conditional sampling can only fix values; it cannot specify a range to sample from. Therefore, I wanted to create a synthesizer that learns only from the data in the range specified above.
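Since, per the reply above, the constraint is checked against the training data rather than used to filter it, one way to get a synthesizer that only learns from the [0.7, 0.9] range is to subset the real data before fitting. The following is a minimal sketch, assuming the data is a pandas DataFrame. `restrict_to_range` is a hypothetical helper (not part of SDV), and the toy values are made up apart from the two taken from the error message.

```python
import pandas as pd


def restrict_to_range(data, column, low, high):
    """Split `data` into (kept, dropped) by whether `column` lies in [low, high].

    Hypothetical helper, not part of SDV. Missing values count as out of range.
    """
    in_range = data[column].between(low, high)  # inclusive on both ends
    kept = data[in_range].reset_index(drop=True)
    dropped = data[~in_range].reset_index(drop=True)
    return kept, dropped


# Toy stand-in for the real table.
real_data = pd.DataFrame({"col_with_constraint": [1.0, 0.936195, 0.85, 0.72, 0.70]})

train_data, dropped = restrict_to_range(real_data, "col_with_constraint", 0.7, 0.9)
print(f"kept {len(train_data)} rows, dropped {len(dropped)}")

# train_data would then be passed to the synthesizer's fit() in place of the full table.
```

Whether the endpoints 0.7 and 0.9 themselves are accepted depends on how the constraint is configured, so the inclusive comparison here is an assumption.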
Thanks for the context. A few things:
As a workaround, @deltaproximity, you can sample a batch of rows and filter out the ones outside your range.
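The sample-and-filter workaround might look like the sketch below. It assumes a fitted synthesizer whose `sample(num_rows=...)` method returns a pandas DataFrame. `sample_in_range`, `batch_size`, and `max_batches` are names made up for this illustration and are not part of SDV.

```python
import pandas as pd


def sample_in_range(synthesizer, column, low, high, num_rows,
                    batch_size=1000, max_batches=50):
    """Sample in batches, keeping only rows whose `column` lies in [low, high].

    Hypothetical helper: stops once `num_rows` rows are collected and
    raises if `max_batches` batches were not enough.
    """
    kept = []
    n_kept = 0
    for _ in range(max_batches):
        batch = synthesizer.sample(num_rows=batch_size)
        batch = batch[batch[column].between(low, high)]  # inclusive bounds
        kept.append(batch)
        n_kept += len(batch)
        if n_kept >= num_rows:
            break
    if n_kept < num_rows:
        raise RuntimeError(
            f"only {n_kept} of {num_rows} rows fell in [{low}, {high}] "
            f"after {max_batches} batches"
        )
    # Trim the overshoot from the last batch and renumber the rows.
    return pd.concat(kept, ignore_index=True).head(num_rows)


# Usage, assuming `synthesizer` has already been fitted on the full data:
# rows = sample_in_range(synthesizer, "col_with_constraint", 0.7, 0.9, num_rows=500)
```

If only a small share of the synthetic rows falls inside the range, a larger `batch_size` or `max_batches` would be needed, since rejected rows are simply discarded.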
Hi all, I'm closing this issue out as it has been inactive for a few weeks. I believe we now have other issues that are more suited to the root cause of this (see previous comment). Please feel free to reply if there is anything more to discuss. We can always reopen the issue for more investigation. Thanks.
Environment Details
Please indicate the following details about the environment in which you found the bug:
Error Description
Error when using the 'ScalarRange' constraint on a numeric column to train a synthesizer (see the attached image):
Error message:
File ~.conda\envs\scrf\lib\site-packages\sdv\single_table\base.py:164, in BaseSynthesizer.validate(self, data)
161 errors += self._validate(data) # Validate rules specific to each synthesizer
163 if errors:
--> 164 raise InvalidDataError(errors)
InvalidDataError: The provided data does not match the metadata:
Data is not valid for the 'ScalarRange' constraint:
col_with_constraint
0 1.000000
1 0.936195
2 0.936195
3 0.936195
4 0.936195
+2656 more
This is what the column "col_with_constraint" looks like:
Steps to reproduce
<Replace this text with a description of the steps that anyone can follow to reproduce the error. If the error happens only on a specific dataset, please consider attaching some example data to the issue so that others can use it to reproduce the error.>