Add execution context service #102039
Conversation
Pinging @elastic/apm-ui (Team:apm)
Overall this is looking good.
I have to admit, async state holding is still a little magical to me, especially with hapi's event-based mechanism (I'm not even really sure I understand how Node is able to retain the async trace in that situation, tbh).
A few concerns / questions:
- How is AsyncLocalStorage working regarding garbage collection? My fear is that not being able to properly clear the storage may result in memory leaks; is that an actual concern? The PR is cleaning up the storage state during the server's `response` event, but are we sure this covers all response EOL scenarios?
- What about contexts created outside of the scope of a request handler? I'm thinking about task manager, for example. Will the owners of such server-side services have to manually clear the context at the end of an operation?
- Even if we do want to enable that by default, the perf impact makes me wonder if we shouldn't still add an option to disable the feature. OTOH, that would force us to re-implement the possibility to read the x-opaque-id from the ES client, which was removed in this PR, so this would complicate the code a bit. Just want to be sure we're all (the team, Product, and so on) understanding the perf implications of this feature.
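For context on the garbage-collection question, here is a minimal standalone sketch (plain Node.js, illustrative names only, not the PR's actual code) of how `AsyncLocalStorage.run()` scopes a store to a single async chain; once the chain completes, nothing references the store and it becomes eligible for collection, whereas `enterWith()` would require manual cleanup:

```typescript
import { AsyncLocalStorage } from 'async_hooks';

interface Ctx {
  requestId: string;
}

const storage = new AsyncLocalStorage<Ctx>();

// run() binds the store to everything awaited inside the callback.
// When the returned promise settles, nothing references the store
// anymore, so it can be garbage-collected without explicit cleanup.
async function handleRequest(requestId: string): Promise<string | undefined> {
  return storage.run({ requestId }, async () => {
    await Promise.resolve(); // simulate async layers (routing, handlers, ...)
    return storage.getStore()?.requestId; // visible without passing it down
  });
}

async function main() {
  const inside = await handleRequest('req-1');
  const outside = storage.getStore(); // undefined: we are outside run()
  console.log(inside, outside); // prints "req-1 undefined"
}
main();
```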
// the trimmed value in the server logs is better than nothing.
function enforceMaxLength(header: string): string {
  return header.slice(0, MAX_BAGGAGE_LENGTH);
If the header value is a serialized json object, wouldn't truncation cause an invalid object in the end? I see we're try/catching on the server-side when parsing the header, but I wonder if this is good enough?
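To illustrate the concern, here is a small sketch (the `MAX_BAGGAGE_LENGTH` value and the `parseContextHeader` helper are hypothetical; only `enforceMaxLength` mirrors the quoted code): truncating a serialized JSON header almost always yields invalid JSON, so the server-side parse has to tolerate it:

```typescript
const MAX_BAGGAGE_LENGTH = 32; // hypothetical limit, for illustration only

function enforceMaxLength(header: string): string {
  return header.slice(0, MAX_BAGGAGE_LENGTH);
}

// Hypothetical server-side parser: returns undefined instead of throwing.
function parseContextHeader(header: string): Record<string, unknown> | undefined {
  try {
    return JSON.parse(header);
  } catch {
    return undefined; // a truncated/invalid payload is dropped, not fatal
  }
}

const full = JSON.stringify({ type: 'visualization', id: 'x'.repeat(100) });
console.log(parseContextHeader(full) !== undefined); // prints "true": valid JSON parses
console.log(parseContextHeader(enforceMaxLength(full)) === undefined); // prints "true": truncation broke the JSON
```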
src/core/server/http/http_server.ts
    requestUuid: uuid.v4(),
  } as KibanaRequestState;
  return responseToolkit.continue;
});
}
private setupContextExecutionCleanup(executionContext?: InternalExecutionContextSetup) {
  if (!executionContext) return;
  this.server!.events.on('response', function () {
Is `response` covering all the request EOL scenarios? E.g., is this handler called in case of an internal handler error?
Co-authored-by: Josh Dover <1813008+joshdover@users.noreply.github.com>
@pgayvallet
To make sure we don't introduce a memory leak, I added a long string to the execution context:

executionContext?.set({
  ...parentContext,
  requested,
  randomString: Math.random().toString().repeat(100_000), // 1.8Mb per a single request!
});

and ran load-testing for 2*6 minutes. Memory consumption on the Monitoring page: (screenshot omitted). But anyway, we should add a flag to disable the feature.
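As a rough local sanity check (a standalone sketch, not the actual Kibana load test above), one can push many simulated requests with a large payload through `AsyncLocalStorage` and confirm the heap doesn't grow without bound:

```typescript
import { AsyncLocalStorage } from 'async_hooks';

const als = new AsyncLocalStorage<{ payload: string }>();

// Each simulated request stores ~100 KB. If stores leaked, 10k requests
// would pin roughly 1 GB of strings and heapUsed would show it.
async function simulateRequest(): Promise<void> {
  await als.run({ payload: 'x'.repeat(100_000) }, async () => {
    await Promise.resolve();
  });
}

async function main() {
  for (let i = 0; i < 10_000; i++) {
    await simulateRequest();
  }
  const heapMb = process.memoryUsage().heapUsed / 1e6;
  console.log(`heap used after 10k requests: ${heapMb.toFixed(1)} MB`);
}
main();
```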
Fair point. I can put the logic for legacy
Yeah, as mentioned in the PR title, with nodejs/node#38577 landing in nodejs v14
@joshdover yes. From mshustov#8:
APM changes look good.
  ...this.config.requestHeadersWhitelist,
]);
scopedHeaders = filterHeaders(
  { ...requestHeaders, ...requestIdHeaders, ...authHeaders },
We still pass the 'x-opaque-id' header if the executionContext service is disabled.
export const config: ServiceConfigDescriptor<ExecutionContextConfigType> = {
  path: 'execution_context',
  schema: configSchema,
Note: I didn't pass the config value to the client. I don't see much benefit in making ExecutionContextContainer methods no-ops, as they don't add a lot of overhead. Any objections?
I think it's fine to have the client always be 'enabled' regardless of the config value.
import { ServiceConfigDescriptor } from '../internal_types';

const configSchema = schema.object({
  enabled: schema.boolean({ defaultValue: true }),
We can disable it by default based on the outcome of #102706
In the long term, the service should be enabled by default.
Don't see anything else, LGTM.
// the trimmed value in the server logs is better than nothing.
function enforceMaxLength(header: string): string {
  return header.slice(0, MAX_BAGGAGE_LENGTH);
Feels quite complex, so I'd say it's fine keeping it as you did for now. Let's use this initial implementation and see from real usage whether the limit is actually reached.
💚 Build Succeeded
* add execution context service on the server-side
* integrate execution context service into http service
* add integration tests for execution context + http server
* update core code
* update integration tests
* update settings docs
* add execution context test plugin
* add a client-side test
* remove requestId from execution context
* add execution context service for the client side
* expose execution context service to plugins
* add execution context service for the server-side
* update http service
* update elasticsearch service
* move integration tests from http to execution_context service
* integrate in es client
* expose to plugins
* refactor functional tests
* remove x-opaque-id from create_cluster tests
* update test plugin package.json
* fix type errors in the test mocks
* fix elasticsearch service tests
* add escaping to support non-ascii symbols in description field
* improve test coverage
* update docs
* remove unnecessary import
* update docs
* Apply suggestions from code review
* address comments
* remove execution context cleanup
* add option to disable execution_context service on the server side
* put x-opaque-id test back
* put tests back
* add header size limitation to the server side as well
* fix integration tests
* address comments

Co-authored-by: Josh Dover <1813008+joshdover@users.noreply.github.com>
Summary
Part of #102626
This PR adds an initial implementation of the ExecutionContext service, which takes care of propagating runtime meta-information along the path Kibana client app --> Kibana server --> Elasticsearch server.

Design
Client-side
Kibana plugins create a `context` object and pass it through their application logic to inject it into `http` service calls. Kibana Core serializes the `context` object and injects it as a custom header.

Server-side
There are two cases:
- An incoming request carries a serialized `context` object. In this case, the context object is parsed and stored in AsyncLocalStorage. Whenever a plugin or Kibana Core calls the Elasticsearch server, some meta-information from the context (type + id) is attached to the `x-opaque-id` header. If a search operation takes longer than expected, the parameters of the incoming request (including `x-opaque-id`) will be logged to the search slowlogs file.
- A server-side plugin calls `executionContext.set(context)` to attach a `context` object to the current async "thread". Unlike the logic on the client, the plugin doesn't need to pass the context object through all the layers of the application; nodejs already provides the API to store context across async operations.

Elasticsearch
Receives the `x-opaque-id` header, which starts with `requestId` for BWC with the logic introduced in #71019. It has the following format:
- `x-opaque-id: 1234-5678-9000` contains `requestId` only if an execution context hasn't been attached.
- `x-opaque-id: 1234-5678-9000;kibana:tsvb:5b2de169-2785-441b-ae8c-186a1936b17d` contains `requestId + kibana:executionContext.type:executionContext.id` if the context has been attached.

Next steps
In the next iteration, I'm going to add support for nested execution contexts. It can be used to compose execution context relationships across different apps.
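As a rough illustration of what composing nested contexts could look like (the `parent` link and the `;`-joined serialization below are purely hypothetical; the actual format is not decided in this PR):

```typescript
// Hypothetical shape: each context may point at the context it was created under.
interface ExecutionContext {
  type: string;
  id: string;
  parent?: ExecutionContext;
}

// Walk the parent chain and flatten it, outermost context first.
function flattenContext(ctx: ExecutionContext): string {
  const chain: string[] = [];
  for (let c: ExecutionContext | undefined = ctx; c !== undefined; c = c.parent) {
    chain.unshift(`${c.type}:${c.id}`);
  }
  return chain.join(';');
}

const dashboard: ExecutionContext = { type: 'dashboard', id: 'dash-42' };
const visualization: ExecutionContext = {
  type: 'visualization',
  id: 'vis-7',
  parent: dashboard,
};

console.log(flattenContext(visualization)); // prints "dashboard:dash-42;visualization:vis-7"
```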
Performance impact
Usage of AsyncLocalStorage and async hooks is not free: keeping track of async context adds some overhead.
I ran the DemoJourney of https://github.com/elastic/kibana-load-testing with 100 concurrent users and saw the overall 95th percentile of response time increase by a few percent. However, response time in a few scenarios increased by 5-30%.
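The overhead can also be eyeballed with a toy micro-benchmark (a standalone sketch in plain Node.js, not the Gatling load test used above; absolute numbers will vary by machine and Node version):

```typescript
import { AsyncLocalStorage } from 'async_hooks';
import { performance } from 'perf_hooks';

const als = new AsyncLocalStorage<{ id: number }>();

async function work(): Promise<void> {
  // A couple of awaits, standing in for a request crossing async layers.
  await Promise.resolve();
  await Promise.resolve();
}

// Time `iterations` simulated requests, with or without a context store.
async function bench(withContext: boolean, iterations: number): Promise<number> {
  const start = performance.now();
  for (let i = 0; i < iterations; i++) {
    if (withContext) {
      await als.run({ id: i }, work);
    } else {
      await work();
    }
  }
  return performance.now() - start;
}

async function main() {
  const iterations = 50_000;
  const baseline = await bench(false, iterations);
  const withAls = await bench(true, iterations);
  console.log(`baseline: ${baseline.toFixed(0)} ms, with AsyncLocalStorage: ${withAls.toFixed(0)} ms`);
}
main();
```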
See detailed report:
Before: before.tar.gz
After: after.tar.gz
Right now the plan is to keep the logic enabled by default for all users. Before the v7.15 release we should measure the performance overhead of the final solution in #102706. Based on the final result, we might make the service opt-in. Also, there is a PR in nodejs v14 that should improve `async_hooks` performance by 3-4 times.

Checklist
For maintainers