[QSB Meta Issue] Association and Accounting of Requests in QSB #11900

Open
kaushalmahi12 opened this issue Jan 16, 2024 · 3 comments
Labels
enhancement Enhancement or improvement to existing feature or request Roadmap:Search Project-wide roadmap label Search:Resiliency

Comments

@kaushalmahi12
Contributor

kaushalmahi12 commented Jan 16, 2024

Is your feature request related to a problem? Please describe

This is a subpart of the original feature QSB.
Main RFC
Proposal Doc

Sandbox and its types

Sandbox

This is the main entity that will help us divide traffic into groups and enforce system resource limits per group. The classification of such groups depends on request attributes, e.g.:

  • User
  • IndexPatterns
  • Index Type (hot/warm)

These attributes are a good starting point for segmenting traffic, but it is still very hard to divide traffic into user-specific groups, because we cannot accurately partition incoming requests into user-specific sandboxes. For example, an incoming request from userA for indexB could resolve into two different sandboxes. We therefore need to think a little differently to handle such cases.
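
To make the ambiguity concrete, here is a minimal sketch in Python. The `Sandbox` class, its `matches` method, and the sandbox names are hypothetical and not part of the proposal; the sketch only assumes that a sandbox may be defined on a subset of the attributes.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import List, Optional


@dataclass(frozen=True)
class Sandbox:
    # Hypothetical model: an attribute left as None matches any value.
    name: str
    user: Optional[str] = None
    index_pattern: Optional[str] = None

    def matches(self, user: str, index: str) -> bool:
        user_ok = self.user is None or self.user == user
        index_ok = self.index_pattern is None or fnmatchcase(index, self.index_pattern)
        return user_ok and index_ok


# Two sandboxes, each defined on a single attribute (assumed example data).
sandboxes = [
    Sandbox("user-a-sandbox", user="userA"),
    Sandbox("index-b-sandbox", index_pattern="indexB*"),
]


def resolve(user: str, index: str) -> List[str]:
    """Return the names of every sandbox the request could belong to."""
    return [s.name for s in sandboxes if s.matches(user, index)]


# A request from userA for indexB matches both sandboxes.
print(resolve("userA", "indexB"))
```

The request resolves to more than one sandbox, which is the conflict the user-specific sandboxes are meant to avoid.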

We will use a special type of sandbox, which is user-specific only, to address the following:

  • Co-ordinator level traffic sandboxing, since a co-ordinator request can span multiple indices
  • Enforcing user level resource limits on the workload

Sandbox Types

  1. Reserved Type - These sandboxes will have multiple attributes and will be responsible for shard level request association and accounting. Each will have fixed low and high limits for each system resource, starting with CPU and JVM allocations. The sum of a system resource across all such sandboxes should not exceed 100. Index type, user and index pattern are all mandatory attributes. A breach of the low limit of a system resource will cause the sandbox to start rejecting requests, while a breach of the high limit will cause it to start cancelling requests in the sandbox. This may not cause cancellation of the parent request because of the flag allow_partial_results.

  2. Constrained Type - These sandboxes will be created to address user level resource consumption enforcement along with co-ordinator level request accounting. Since a co-ordinator request can span multiple indices, it is highly likely that there will be conflicting sandbox resolutions for the co-ordinator level request. The accounting for this type of sandbox is derived from the reserved type sandboxes. This sandbox will have user as the only selection attribute. The sum of a system resource across all such sandboxes can exceed 100. Because this type is an abstraction over reserved type sandboxes (a user level sandbox on a non co-ordinator node will sum up the shard level task resource usages for the user), at any point in time the sum of a resource across all the co-ordinator level sandboxes will not exceed 100. The low and high limits for a resource will be exactly the same as those of the reserved type sandbox. The only distinction of this sandbox is that it tracks co-ordinator and user level traffic.

  3. Default Type - This will be the default sandbox, which will act as a catch-all for requests that could not resolve into any of the other sandboxes. It will have the lowest priority. We will keep one for co-ordinator level tasks and one for shard level tasks.
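
The limit behaviour described for reserved type sandboxes can be sketched as follows. This is only an illustration under stated assumptions: `ResourceLimits`, `decide`, and `reserved_limits_valid` are hypothetical names, a "breach" is taken to mean usage strictly above the limit, and the "should not exceed 100" rule is assumed to apply to the sum of the high limits. None of this is confirmed by the proposal.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ResourceLimits:
    low: float   # percent of the resource; breaching it starts rejections
    high: float  # percent of the resource; breaching it starts cancellations


def decide(usage: float, limits: ResourceLimits) -> str:
    """Map a sandbox's current usage (in percent) to an action."""
    if usage > limits.high:
        return "cancel"   # cancel requests running in the sandbox
    if usage > limits.low:
        return "reject"   # reject new requests for the sandbox
    return "admit"


def reserved_limits_valid(limits_by_sandbox: Dict[str, ResourceLimits]) -> bool:
    """Assumed check: high limits of all reserved sandboxes sum to at most 100."""
    return sum(l.high for l in limits_by_sandbox.values()) <= 100


# Assumed example configuration for one resource (e.g. CPU).
cpu_limits = {
    "sandbox-1": ResourceLimits(low=30, high=40),
    "sandbox-2": ResourceLimits(low=45, high=60),
}

print(reserved_limits_valid(cpu_limits))
print(decide(35, cpu_limits["sandbox-1"]))
```

Under the same assumptions, a constrained type sandbox would reuse `decide` but skip the sum check, since its limits are allowed to add up to more than 100.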

[Figure: Sandbox_Workflow]

Tracking and Cancellation flow diagram

[Figure: tracking_and_cancellation]

Related component

Search:Resiliency

Describe alternatives you've considered

No response

Additional context

No response

@kaushalmahi12 kaushalmahi12 added enhancement Enhancement or improvement to existing feature or request untriaged labels Jan 16, 2024
@jainankitk
Collaborator

@kaushalmahi12 - Thank you for taking a stab at documenting this. While this captures some of the aspects we talked about offline, it is missing a few things, and some things are still unclear:

  • We should talk about the concept of a sandbox group. Sandbox groups are independent of each other and slice the same resource pie (JVM, CPU) differently. For example, one sandbox group could divide the pie into two equal parts for users Dave and Joe, while another sandbox group could divide it 40/60 for hot/warm index types.
  • Right now, we are envisioning a few types of sandbox, which could evolve into other use cases in the future. Hence, it might be better to have reserved / constrained as one of the attributes for a sandbox family.
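
A rough sketch of the sandbox group idea, using the Dave/Joe and hot/warm example above. The `groups` structure and the `validate_groups` and `shares_for` helpers are hypothetical names chosen for illustration; the only assumption carried over from the discussion is that each group splits the same 100% of a resource independently.

```python
from typing import Dict, Optional

# Each group slices the same resource pie on a different attribute.
groups = {
    "by_user": {"attribute": "user", "shares": {"Dave": 50, "Joe": 50}},
    "by_index_type": {"attribute": "index_type", "shares": {"hot": 40, "warm": 60}},
}


def validate_groups(all_groups: dict) -> None:
    """Each group is checked on its own; totals are never added across groups."""
    for name, group in all_groups.items():
        total = sum(group["shares"].values())
        if total > 100:
            raise ValueError(f"group {name!r} allocates {total}% of the resource")


def shares_for(request: Dict[str, str]) -> Dict[str, Optional[int]]:
    """Return the share of the sandbox the request falls into, per group."""
    return {
        name: group["shares"].get(request[group["attribute"]])
        for name, group in groups.items()
    }


validate_groups(groups)
print(shares_for({"user": "Dave", "index_type": "hot"}))
```

The same request is accounted once in every group, so Dave's query against a hot index counts against both the 50% user slice and the 40% hot slice.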

Sum of a system resource for all such sandboxes should not exceed the value 100. It will have all the attributes such as, index type, user and index pattern as mandatory attributes.

Why are all the attributes mandatory for reserved type sandboxes?

This sandbox will have user as the only selection attribute. Sum of a system resource for all such sandboxes can exceed the value 100.

Why do we have these conditions for constrained sandboxes?

@kaushalmahi12
Contributor Author

@jainankitk Thanks for going through this! If we don't make the attributes mandatory for reserved type sandboxes, we may have multiple sandboxes resolving for a single request. These three attributes uniquely identify a request. If we don't mandate the attributes, then we can have 2^n different configurations for n attributes.
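
The 2^n count can be checked by enumerating which attributes a sandbox definition specifies. This is a small sketch; `ATTRIBUTES` and `attribute_subsets` are illustrative names, and the count includes the empty configuration that specifies no attribute at all.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

ATTRIBUTES = ("user", "index_pattern", "index_type")


def attribute_subsets(attributes: Sequence[str]) -> List[Tuple[str, ...]]:
    """Every combination of attributes a sandbox could specify if none were mandatory."""
    return [
        subset
        for size in range(len(attributes) + 1)
        for subset in combinations(attributes, size)
    ]


subsets = attribute_subsets(ATTRIBUTES)
print(len(subsets))  # 2 ** 3 == 8
for subset in subsets:
    print(subset)
```

With all three attributes mandatory, only the last subset (the full tuple) remains as a valid shape for a sandbox definition.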

@peternied
Member

[Triage - attendees 1 2 3]
Thanks for filing

@andrross andrross added the Roadmap:Search Project-wide roadmap label label May 31, 2024
Projects
Status: New
Status: Later (6 months plus)