-
Notifications
You must be signed in to change notification settings - Fork 24.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add WARN Logging on Slow Transport Message Handling #62444
Changes from 1 commit
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -29,6 +29,7 @@ | |||||
import org.elasticsearch.common.io.stream.NamedWriteableRegistry; | ||||||
import org.elasticsearch.common.io.stream.StreamInput; | ||||||
import org.elasticsearch.common.transport.TransportAddress; | ||||||
import org.elasticsearch.common.unit.TimeValue; | ||||||
import org.elasticsearch.common.util.concurrent.AbstractRunnable; | ||||||
import org.elasticsearch.common.util.concurrent.ThreadContext; | ||||||
import org.elasticsearch.threadpool.ThreadPool; | ||||||
|
@@ -51,6 +52,8 @@ public class InboundHandler { | |||||
|
||||||
private volatile TransportMessageListener messageListener = TransportMessageListener.NOOP_LISTENER; | ||||||
|
||||||
private volatile long slowLogThresholdMs = Long.MAX_VALUE; | ||||||
|
||||||
InboundHandler(ThreadPool threadPool, OutboundHandler outboundHandler, NamedWriteableRegistry namedWriteableRegistry, | ||||||
TransportHandshaker handshaker, TransportKeepAlive keepAlive, Transport.RequestHandlers requestHandlers, | ||||||
Transport.ResponseHandlers responseHandlers) { | ||||||
|
@@ -71,6 +74,10 @@ void setMessageListener(TransportMessageListener listener) { | |||||
} | ||||||
} | ||||||
|
||||||
void setSlowLogThreshold(TimeValue slowLogThreshold) { | ||||||
this.slowLogThresholdMs = slowLogThreshold.getMillis(); | ||||||
} | ||||||
|
||||||
void inboundMessage(TcpChannel channel, InboundMessage message) throws Exception { | ||||||
channel.getChannelStats().markAccessed(threadPool.relativeTimeInMillis()); | ||||||
TransportLogger.logInboundMessage(channel, message); | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could we move the timing up to this level, since the logging on this line may also be a source of slowness? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ++ |
||||||
|
@@ -90,6 +97,7 @@ private void messageReceived(TcpChannel channel, InboundMessage message) throws | |||||
final Header header = message.getHeader(); | ||||||
assert header.needsToReadVariableHeader() == false; | ||||||
|
||||||
final long startTime = threadPool.relativeTimeInMillis(); | ||||||
ThreadContext threadContext = threadPool.getThreadContext(); | ||||||
try (ThreadContext.StoredContext existing = threadContext.stashContext()) { | ||||||
// Place the context with the headers from the message | ||||||
|
@@ -138,6 +146,12 @@ private void messageReceived(TcpChannel channel, InboundMessage message) throws | |||||
} | ||||||
} | ||||||
} | ||||||
} finally { | ||||||
final long took = threadPool.relativeTimeInMillis() - startTime; | ||||||
final long logThreshold = slowLogThresholdMs; | ||||||
if (logThreshold > 0 && took > logThreshold) { | ||||||
logger.warn("Slow handling of transport message [{}] took [{}ms]", message, took); | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It'd be useful to include the phrase
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ++ |
||||||
} | ||||||
} | ||||||
} | ||||||
|
||||||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -132,6 +132,12 @@ public final class TransportSettings { | |
Arrays.asList("internal:coordination/fault_detection/*"), | ||
Function.identity(), Setting.Property.Dynamic, Setting.Property.NodeScope); | ||
|
||
// Time that processing an inbound message on a transport thread may take at the most before a warning is logged | ||
public static final Setting<TimeValue> SLOW_OPERATION_THRESHOLD_SETTING = | ||
Setting.positiveTimeSetting("transport.slow_operation_logging_threshold", TimeValue.timeValueMillis(300), | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm still a bit concerned about log spam -- we aren't aware of anything that blocks the transport threads for so long today but that might just be because we don't surface it yet, and some configs might be hitting this a lot. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think it might be kind of nice to be able to see bursts here (thinking about spotting something like CPU throttling)? (if you have a time period where there's a bunch of logging across multiple threads for all kinds of messages without a clear pattern or so). I guess we could use something like the log4j burst filter here to capture that kind of thing and still rate limit but what this: There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'd be happy with No need for a separate debug logging threshold IMO -- given that it would need user involvement to see more events in the logs, they may as well just reduce There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Fair point -> pushed the change to 5s :) |
||
Setting.Property.Dynamic, Setting.Property.NodeScope); | ||
|
||
|
||
private TransportSettings() { | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Admittedly the chain of setters here is a little hacky but:
InboundHandler
TransportService
anyway.