Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add core secondary api and forwarding middleware #6685

Merged
merged 3 commits into from
Aug 7, 2024

Conversation

fontanierh
Copy link
Contributor

@fontanierh fontanierh commented Aug 6, 2024

Description

We want to experiment forwarding tables query related endpoints to a secondary core deployment as we suspect this is where the memory leak is coming from.

This PR adds a new k8s deployment wrapped in a new service (core-secondary-[service,deployment]). This new deployment is running the exact same code as core. Network policy has been updated such that only core can send requests to core-secondary. core-secondary has the same egress rights to other Dust services as core.
Also adds an axum middleware to forward tables query requests to secondary (disabled for now)

Risk

N/A (not enabled yet)

Deploy Plan

apply infra then deploy core

@fontanierh fontanierh force-pushed the feat/core-secondary-deployment branch from 00b2418 to d6cc818 Compare August 6, 2024 14:49
@Fraggle
Copy link
Contributor

Fraggle commented Aug 7, 2024

We want to experiment forwarding tables query related endpoints to a secondary core deployment as we suspect this is where the memory leak is coming from.

I don't understand the relationship between secondary core deployment and memory leak.
Can you elaborate (a tiny bit) on your hypothesis and how you expect to verify it with this PR ?

@@ -0,0 +1,69 @@
apiVersion: apps/v1
Copy link
Contributor

@Fraggle Fraggle Aug 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can (but should we ?) we inherit from core-deployment.yaml and override the variables we need ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wish but sadly I don't think you can do that (need specialized tools that we don't use to "build" yaml -- for now we're just duplicating when needed)

Copy link
Contributor

@Fraggle Fraggle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not know rust or k8 enough to do a deep verification but it looks good to me 👍🏼

@fontanierh
Copy link
Contributor Author

fontanierh commented Aug 7, 2024

@Fraggle

I don't understand the relationship between secondary core deployment and memory leak.
Can you elaborate (a tiny bit) on your hypothesis and how you expect to verify it with this PR ?

So we've had a memory leak on core for a while now. We've tried many things to understand where it is coming from, but so far it has been a dead end -- @flvndvd, @spolu and myself have tried various stuff.
We suspect it's coming from "tables query", a feature that allows storing and querying structured data (eg google sheets, notion DBs) in Dust by creating "just in time" SQLite databases that AI agents can query (we feel like there's a correlation between leaked memory and calls to the tables query endpoints).

In order to confirm this, I am creating a duplicate "core" service (the "secondary") running that same core code, and I'm going to off-load the tables query requests (add data, query it etc..) to that deployment instead. Then we'll see which service(s) have a memory leak (I can then move some endpoints back to the primary core, until we can pinpoint the specific endpoint that has a leak).

Copy link
Contributor

@tdraier tdraier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@Fraggle
Copy link
Contributor

Fraggle commented Aug 7, 2024

@Fraggle

I don't understand the relationship between secondary core deployment and memory leak.
Can you elaborate (a tiny bit) on your hypothesis and how you expect to verify it with this PR ?

So we've had a memory leak on core for a while now. We've tried many things to understand where it is coming from, but so far it has been a dead end -- @flvndvd, @spolu and myself have tried various stuff. We suspect it's coming from "tables query", a feature that allows storing and querying structured data (eg google sheets, notion DBs) in Dust by creating "just in time" SQLite databases that AI agents can query (we feel like there's a correlation between leaked memory and calls to the tables query endpoints).

In order to confirm this, I am creating a duplicate "core" service (the "secondary") running that same core code, and I'm going to off-load the tables query requests (add data, query it etc..) to that deployment instead. Then we'll see which service(s) have a memory leak (I can then move some endpoints back to the primary core, until we can pinpoint the specific endpoint that has a leak).

Thanks @fontanierh very clear and makes a lot of sense.

Copy link
Contributor

@flvndvd flvndvd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.


let client = Client::new();
let (parts, body) = req.into_parts();
let body_bytes: Bytes = match axum::body::to_bytes(body, usize::MAX).await {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I presume that given we never deal with files, this simple redirection should cover our needs , right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah not sure if it'd work with files, but not super important since we don't use them with this endpoint

@fontanierh fontanierh merged commit dc713f0 into main Aug 7, 2024
3 checks passed
@fontanierh fontanierh deleted the feat/core-secondary-deployment branch August 7, 2024 10:50
flvndvd pushed a commit that referenced this pull request Aug 8, 2024
* feat: add core secondary api and forwarding middleware

* proper error handling

* nit

---------

Co-authored-by: Henry Fontanier <henry@dust.tt>
albandum pushed a commit that referenced this pull request Aug 28, 2024
* feat: add core secondary api and forwarding middleware

* proper error handling

* nit

---------

Co-authored-by: Henry Fontanier <henry@dust.tt>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants