Feature: support for async deletion of a package index #2657

Open
VannTen opened this issue Jul 8, 2022 · 2 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. sig/stack-guidance Categorizes an issue or PR as relevant to SIG Stack Guidance.

Comments

@VannTen
Member

VannTen commented Jul 8, 2022

Problem statement

For thoth-station/management-api#790, the management
API needs to trigger the deletion of an index. However, deleting all the related
storage items could take a long time, and the management API needs to
return in a reasonable timeframe (HTTP, so within seconds).

Proposal description

Split the deletion into two parts:

  1. mark the package index as 'deleted' (or 'to_delete'), similarly to the
    'enabled/disabled' state -> this is called by the management-api
  2. delete all storage related to indexes marked 'deleted' (graph + ceph) -> this
    is called by an async workflow created by the management-api

The purpose of the split is to tolerate failures, timeouts, etc. of the
"delete workflow".
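
A minimal sketch of the two-part split, to make the failure tolerance concrete. Everything here is an assumption for illustration: `IndexState`, `IndexRegistry`, `run_delete_workflow` and the `delete_storage` callable are hypothetical names, and the registry is an in-memory stand-in for the package index table, not the actual storage API.

```python
from dataclasses import dataclass, field
from enum import Enum


class IndexState(Enum):
    ENABLED = "enabled"
    DISABLED = "disabled"
    TO_DELETE = "to_delete"  # hypothetical new state proposed above


@dataclass
class IndexRegistry:
    """In-memory stand-in for the package index table (illustrative only)."""

    states: dict = field(default_factory=dict)

    def mark_for_deletion(self, url):
        # Part 1: cheap and synchronous, so the management API can return fast.
        if url not in self.states:
            raise KeyError(url)
        self.states[url] = IndexState.TO_DELETE

    def pending_deletions(self):
        return sorted(
            url for url, state in self.states.items()
            if state is IndexState.TO_DELETE
        )


def run_delete_workflow(registry, delete_storage):
    """Part 2: run by the async workflow.

    `delete_storage` is a hypothetical callable that removes the graph and
    Ceph data of one index. Returns the indexes that were fully deleted.
    """
    deleted = []
    for url in registry.pending_deletions():
        try:
            delete_storage(url)
        except Exception:
            # The index stays marked, so a later run picks it up again.
            continue
        del registry.states[url]
        deleted.append(url)
    return deleted
```

Because the mark survives a failed run, the workflow can simply be re-run until `pending_deletions()` is empty.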

Alternatives

Skip the first step and directly create the "delete workflow".

However, this seems fragile in certain cases:

  • The workflow fails or times out.
  • ...

It does avoid changing the DB schema though.

Additional context

  • I'm making the assumption that all storage items can be related to an index
    easily
  • For the postgres side, the deletion should rely as much as possible on
    cascade delete.
  • I need to investigate for the Ceph side to see how it is organised and related
    to postgres.
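
To illustrate what relying on cascade delete could look like, here is a small sketch. It uses SQLite from the Python standard library as a stand-in for postgres so that it runs anywhere, and the table and column names (`package_index`, `package_version`, `index_id`) are assumptions, not the real schema.

```python
import sqlite3

# Hypothetical two-table schema: each package version row points at its index
# and is removed automatically when that index row is deleted.
SCHEMA = """
CREATE TABLE package_index (
    id  INTEGER PRIMARY KEY,
    url TEXT UNIQUE NOT NULL
);
CREATE TABLE package_version (
    id       INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,
    index_id INTEGER NOT NULL
        REFERENCES package_index(id) ON DELETE CASCADE
);
"""


def open_db():
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when asked to; postgres always does.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def delete_index(conn, url):
    # A single DELETE on the parent row; the database removes dependent rows.
    with conn:
        conn.execute("DELETE FROM package_index WHERE url = ?", (url,))


def count_versions(conn):
    return conn.execute("SELECT COUNT(*) FROM package_version").fetchone()[0]
```

With such a schema, the SQL side of the deletion reduces to one statement per index, provided every dependent table is reachable through `ON DELETE CASCADE` foreign keys.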

Acceptance Criteria

TODO

@VannTen VannTen added the kind/feature Categorizes issue or PR as related to a new feature. label Jul 8, 2022
@goern
Member

goern commented Jul 20, 2022

/sig stack-guidance
/priority important-longterm

@sesheta sesheta added sig/stack-guidance Categorizes an issue or PR as relevant to SIG Stack Guidance. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. labels Jul 20, 2022
@VannTen
Member Author

VannTen commented Sep 13, 2022

So, my last thoughts on this:

First I would need a query that takes a package index and returns the ids of
all documents currently stored in object storage.

Once that is done, the workflow is basically:

  1. Mark package index as deleted.
  2. Query PackageIndex -> all ids -> delete all ids
  3. Delete Package index -> postgres cleanup all related sql items by cascading.
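
As a rough sketch of that ordering for a single index: the function below is hypothetical, and its arguments are plain in-memory stand-ins (a dict for the postgres table, a dict for the "index -> document ids" query, a dict for the object storage), not the real storage classes.

```python
def delete_package_index(index_url, indexes, documents_by_index, object_store):
    """Run the three steps above for one index.

    Hypothetical stand-ins: `indexes` maps index URL -> state (postgres),
    `documents_by_index` maps index URL -> document ids (the missing query),
    `object_store` maps document id -> content (Ceph).
    """
    # Step 1: mark the index, so an interrupted run can be recognised later.
    indexes[index_url] = "deleted"

    # Step 2: resolve the document ids, then delete each document.
    for doc_id in documents_by_index.get(index_url, []):
        # pop() with a default tolerates documents removed by an earlier run.
        object_store.pop(doc_id, None)

    # Step 3: drop the index last; in postgres this is where related rows
    # would be cleaned up by cascading.
    del indexes[index_url]
    documents_by_index.pop(index_url, None)
```

Deleting the index row last matters in this sketch: while the row exists, the document ids can still be looked up, so a run that stops during step 2 can be repeated safely.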

Does that seem realistic? I'm still not completely at ease with the storage
model, so opinions on that strategy would be welcome.

@mayaCostantini
