Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Question] Can PythonPackageVersionEntity have multiple same value #2729

Open
harshad16 opened this issue Oct 27, 2022 · 7 comments
Open

[Question] Can PythonPackageVersionEntity have multiple same value #2729

harshad16 opened this issue Oct 27, 2022 · 7 comments
Labels
priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. sig/stack-guidance Categorizes an issue or PR as relevant to SIG Stack Guidance. thoth/human-intervention-required This isse/PR requires human intervention! thoth/potential-observation This is a potential observation we need to learn from!

Comments

@harshad16
Copy link
Member

The database shows multiple entry of the same unique tuple
Screenshot from 2022-10-27 15-57-28
even though we have set the unique constriat

UniqueConstraint("package_name", "package_version", "python_package_index_id"),

The python_package_index_id, seems to be the differentiating factor, which somehow hasn't been included.

@harshad16 harshad16 added priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. thoth/potential-observation This is a potential observation we need to learn from! thoth/human-intervention-required This isse/PR requires human intervention! labels Oct 27, 2022
@harshad16
Copy link
Member Author

/sig stack-guidance

@sesheta sesheta added the sig/stack-guidance Categorizes an issue or PR as relevant to SIG Stack Guidance. label Oct 27, 2022
@mayaCostantini
Copy link
Contributor

Which is the postgres image used to run the database? Is the version < 15?
Postgres versions before 15 seems to consider NULL values as distinct, which would impose to use something like postgresql_where=python_package_index_id.isnot(None) in the python_package_version_entity_idx index:

Index(
"python_package_version_entity_idx",
"package_name",
"package_version",
"python_package_index_id",
unique=True,
),

      Index(
          "python_package_version_entity_idx",
          "package_name",
          "package_version",
          "python_package_index_id",
          postgresql_where=python_package_index_id.isnot(None),
          unique=True,
          ),

I have not tested this solution though as I still have issues with podman. I will sort this out and get back to you if that works.

@harshad16
Copy link
Member Author

we use Postgresql 10 in our systems.
I wasn't aware that the older version considered NULL as distinct.

Maybe it is time to upgrade to a newer version of PostgreSQL and clean the database a little.
we should identify what are the change that we would have to make for such upgrade

@mayaCostantini
Copy link
Contributor

Agree, I can assign myself to this issue and start working on it when I solve my testing environment issues. What would you say is the priority of this?

@mayaCostantini
Copy link
Contributor

Also, this might be related to #2189

@harshad16
Copy link
Member Author

#2189 could be related
It would be a priority as I feel this might affect the data ingestion.

@harshad16
Copy link
Member Author

Questions:

  • Isn't there a get_or_create provided by sqlalchemy itself ? Catching integrity error + session.rollback looks like the job of a context manager

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. sig/stack-guidance Categorizes an issue or PR as relevant to SIG Stack Guidance. thoth/human-intervention-required This isse/PR requires human intervention! thoth/potential-observation This is a potential observation we need to learn from!
Projects
Status: 🆕 New
Development

No branches or pull requests

3 participants