Refactor PSQL incident store into ORM (SQLAlchemy) #211

Merged: 7 commits, Jul 11, 2024

JonahSussman
Contributor

You will have to purge the data volume, which means first deleting all containers that use it:

podman rm $(podman ps -a --filter volume=data -q)
podman volume rm data

Should do the trick.

Contributor

fabianvf left a comment

A few questions, but this looks way nicer and more maintainable 👍

def __init__(self, args: KaiConfigIncidentStorePostgreSQLArgs):
self.emb_provider = EmbeddingNone()
application_name: Mapped[str] = mapped_column(primary_key=True)
generated_at: Mapped[datetime.datetime] = mapped_column(
Contributor

I think we probably don't want to use this as a primary key, since it will still be unique even if you insert the same report twice, so it won't prevent duplicates. Ideally, if we picked up the same report twice, we would not duplicate it.

Contributor Author

My thought process was that this was just a record of all reports in the order they were processed. What are you thinking as the primary key?

Contributor

I think that is accurate, but we also don't want to duplicate incoming reports. Ideally we can pull an identifier, or a combination of identifiers, off the Report itself. I say this mostly because the Hub importer polls the API, and if it restarts it could pull in all the previous reports again, so we want to make sure each one only goes in once (otherwise a crash loop that still manages to load 50 reports on each pass would make the database balloon in size). We could do that filtering elsewhere if needed, but if there's an identifier on the report that can be used, I think that would be ideal.
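
The duplication concern can be illustrated with a minimal sketch. It uses Python's built-in `sqlite3` rather than SQLAlchemy and PostgreSQL so that it runs without any setup, and the table and column names (`reports_by_time`, `reports_by_id`, `report_id`) are hypothetical, not the PR's schema. It simulates an importer that restarts and re-reads the same report: with a key that includes a timestamp every pass adds a row, while a key taken from the report lets the database skip the repeat.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema A: the key includes the time the row was written.
conn.execute(
    "CREATE TABLE reports_by_time ("
    " application_name TEXT, generated_at TEXT,"
    " PRIMARY KEY (application_name, generated_at))"
)
# Hypothetical schema B: the key is an identifier carried by the report.
conn.execute(
    "CREATE TABLE reports_by_id ("
    " report_id TEXT PRIMARY KEY, application_name TEXT)"
)


def import_report(report_id: str, application_name: str, now: str) -> None:
    """Store one report in both tables, as a polling importer might."""
    conn.execute(
        "INSERT INTO reports_by_time VALUES (?, ?)", (application_name, now)
    )
    # OR IGNORE skips the row when the key already exists
    # (PostgreSQL spells this ON CONFLICT DO NOTHING).
    conn.execute(
        "INSERT OR IGNORE INTO reports_by_id VALUES (?, ?)",
        (report_id, application_name),
    )


# The importer restarts twice and picks up the same report each time.
for now in ("2024-07-01T10:00:00", "2024-07-01T10:05:00", "2024-07-01T10:10:00"):
    import_report("report-1", "my-app", now)

rows_by_time = conn.execute("SELECT COUNT(*) FROM reports_by_time").fetchone()[0]
rows_by_id = conn.execute("SELECT COUNT(*) FROM reports_by_id").fetchone()[0]
print(rows_by_time, rows_by_id)
```

In this sketch the timestamp-keyed table grows on every pass, while the identifier-keyed table keeps a single row for the report.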

Contributor Author

Gotcha, that makes sense. How about storing the commit that generated the report instead?

Contributor

Hmm, I think that could be part of an identifier, but we could feasibly get multiple reports against the same commit. I'd be fine with adding an ID field to the report object and requiring callers to pass it in. I think we'll have report IDs on the Konveyor side, so we'd just need to figure out how to populate it in the on-disk scenario.
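
The thread leaves the on-disk case open. One possible approach, offered only as an assumption for illustration and not something the PR does, is to derive the ID from the report's contents. The helper name `report_id_from_bytes` and the sample report bytes below are hypothetical.

```python
import hashlib


def report_id_from_bytes(report_bytes: bytes) -> str:
    """Return a stable hex digest to use as a report identifier."""
    return hashlib.sha256(report_bytes).hexdigest()


# Reading the same file twice yields the same bytes, hence the same ID.
first = report_id_from_bytes(b"ruleset: a\nviolations: 3\n")
again = report_id_from_bytes(b"ruleset: a\nviolations: 3\n")
# A different report (even against the same commit) yields a different ID.
other = report_id_from_bytes(b"ruleset: a\nviolations: 4\n")

print(first == again)
print(first == other)
```

A content hash treats two byte-identical reports as the same report, which may or may not be the desired behavior; an ID supplied by the caller avoids that question.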

Contributor Author

Ok solid, I think I see what you're saying now

webapp["model_provider"] = ModelProvider(config.models)
KAI_LOG.info(f"Selected provider: {config.models.provider}")
KAI_LOG.info(f"Selected model: {webapp['model_provider'].model_id}")

webapp["incident_store"] = IncidentStore.from_config(
config.incident_store, webapp["model_provider"]
Contributor Author

Passing the model provider to the incident store so we can do LLM summaries next
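
A rough sketch of the injection pattern described here, assuming nothing about the real classes: `StubModelProvider`, `StubIncidentStore`, and `summarize_incident` are hypothetical stand-ins, and only the shape of the `from_config(store_config, model_provider)` call mirrors the diff above.

```python
from dataclasses import dataclass


@dataclass
class StubModelProvider:
    """Hypothetical stand-in for the real ModelProvider."""

    model_id: str

    def summarize(self, text: str) -> str:
        # A real provider would call an LLM; this stub just truncates.
        return f"[{self.model_id}] {text[:20]}"


class StubIncidentStore:
    """Hypothetical stand-in for IncidentStore."""

    def __init__(self, store_config: dict, model_provider: StubModelProvider):
        self.store_config = store_config
        self.model_provider = model_provider

    @classmethod
    def from_config(cls, store_config: dict, model_provider: StubModelProvider):
        # The provider is passed in rather than constructed inside the store.
        return cls(store_config, model_provider)

    def summarize_incident(self, incident_snip: str) -> str:
        return self.model_provider.summarize(incident_snip)


provider = StubModelProvider(model_id="example-model")
store = StubIncidentStore.from_config({"provider": "postgresql"}, provider)
summary = store.summarize_incident("import javax.ejb.Stateless; // legacy annotation")
print(summary)
```

Because the store receives the provider instead of building one, the web app and the store share a single provider instance, and a stub can be swapped in for tests.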

Signed-off-by: Jonah Sussman <sussmanjonah@gmail.com>
Signed-off-by: JonahSussman <sussmanjonah@gmail.com>
Contributor

fabianvf left a comment

Looks great!

return f"SQLIncident(violation_name={self.violation_name}, ruleset_name={self.ruleset_name}, application_name={self.application_name}, incident_uri={self.incident_uri}, incident_snip={self.incident_snip:.10}, incident_line={self.incident_line}, incident_variables={self.incident_variables}, solution_id={self.solution_id})"


# def dump(sql, *multiparams, **params):
Contributor

Can drop this I assume

Signed-off-by: JonahSussman <sussmanjonah@gmail.com>
JonahSussman merged commit d0e93bc into konveyor:main on Jul 11, 2024
1 check passed
2 participants