Pub/Sub StreamingPull receives many duplicates when there is a backlog #3383
Labels:
- `api: pubsub`: Issues related to the Pub/Sub API.
- `priority: p2`: Moderately-important priority. Fix may not be included in next release.
- `status: blocked`: Resolving the issue is dependent on other work.
- `type: question`: Request for information or clarification. Not an issue.
The Pub/Sub `StreamingPull` API gives many duplicates when the messages are small and there is a backlog. This is a difference from `Pull`, which does not exhibit this behavior. After a while, more than 50% of messages can be duplicates, which makes it very hard to process a backlog. @kir-titievsky suggested that I create a new issue when I described this in #2465.

I created two test programs to replicate this issue:
`CloudPubSub.java` uses the high-level (`MessageReceiver`) Java API to receive messages and writes them to a file of JSONs. Optionally, you can also use my custom `google-cloud-java` branch, in which I instrumented `MessageDispatcher.java` to log the message IDs that are acked.

- The publisher publishes an initial backlog (`--initial-publish-messages=10000`) and then publishes a steady stream of messages (the default `--publish-period=400` means 2.5 messages/second).
- The subscriber simulates processing each message (sleeping `--message-processing-time=5000` ms), then throttles acking to `--period=333`, which means 3 messages/second, and then acks the message. Note that it should make progress, since its `--period` is less than the publisher's `--publish-period`, but it doesn't because of the duplicate messages.
- Concurrency is limited with `FlowControlSettings`. By default, `--concurrent-messages=20` means that 20 receivers sleep in parallel. Since 5000 < 333*20 = 6660, there are enough concurrent threads that a 5000 ms sleep does not reduce subscriber throughput below the desired 3 messages per second.

Count duplicates in the log with:

```
jq < /tmp/inv-log-cloudpubsub-pub2.5-sub3.jsons --slurp '[.[] | .messageId] | {unique: sort|unique|length, total: length} | .duplicates = .total - .unique'
```
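The sizing argument above (5000 < 333*20) can be checked numerically: the subscriber's steady-state throughput is the minimum of what the concurrent sleeping receivers can sustain and what the ack throttle allows. A minimal sketch of that calculation; the class and method names here are hypothetical, not part of either test program:

```java
public class ThroughputCheck {
  // Steady-state subscriber throughput (messages/second) is bounded by
  // both the pool of concurrent sleepers and the ack throttle.
  static double subscriberThroughput(int concurrentMessages,
                                     double processingMs,
                                     double ackPeriodMs) {
    double processingLimit = concurrentMessages / (processingMs / 1000.0);
    double ackLimit = 1000.0 / ackPeriodMs;
    return Math.min(processingLimit, ackLimit);
  }

  public static void main(String[] args) {
    // Defaults from the test program: --concurrent-messages=20,
    // --message-processing-time=5000, --period=333.
    double subscriberRate = subscriberThroughput(20, 5000, 333);
    // Publisher default: --publish-period=400 means 2.5 messages/second.
    double publisherRate = 1000.0 / 400;
    System.out.println("subscriber=" + subscriberRate + " msgs/s, publisher="
        + publisherRate + " msgs/s, drains=" + (subscriberRate > publisherRate));
  }
}
```

With these defaults the 20 sleepers allow 4 messages/second and the ack throttle allows about 3, so the subscriber rate exceeds the publisher's 2.5 and, absent duplicates, the backlog should shrink.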
The second test program uses the gRPC `StreamingPull` API directly, the same API that the high-level `MessageReceiver` API uses. Like `CloudPubSub.java`, it logs the message IDs to stdout and to a file of JSONs.

- The publisher publishes an initial backlog (`--initial-publish-messages=10000`) and then publishes a steady stream of messages (the default `--publish-period=400` means 2.5 messages/second).
- `onError` is implemented on the ack `StreamObserver` so that we can detect acks that failed. I have not seen any failures.
- The subscriber calls `request(1)` to get one message at a time, queues the messages up, and processes them with similar timing to `CloudPubSub.java` (i.e., waits 5 seconds per message and then throttles acks to 3 messages per second). It also calls modack to extend deadlines.

Count duplicates in the log with:

```
jq < /tmp/inv-log-grpc-pub2.5-sub3.jsons --slurp '[.[] | .messageId] | {unique: sort|unique|length, total: length} | .duplicates = .total - .unique'
```
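The jq pipeline above just compares the total count of logged `messageId`s against the count of unique ones. The same computation sketched in Java, on a hypothetical in-memory sample standing in for the JSON log:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

public class DuplicateCount {
  // Same computation as the jq pipeline: duplicates = total - unique.
  static int duplicates(List<String> messageIds) {
    return messageIds.size() - new HashSet<>(messageIds).size();
  }

  public static void main(String[] args) {
    // Hypothetical message IDs; a real run would read them from the log file.
    List<String> ids = Arrays.asList("m1", "m2", "m2", "m3", "m3", "m3");
    System.out.println("total=" + ids.size()
        + " duplicates=" + duplicates(ids));
    // → total=6 duplicates=3
  }
}
```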
The third test program uses the high-level (`MessageReceiver`) Java API, but at version 0.21.1-beta, which still used `Pull` instead of `StreamingPull`. It does not give a significant number of duplicates.

I opened a support ticket for this: Case 15877623, 2018-05-21, "Pub/Sub subscriber receives duplicated MessageIds and never catches up". On 2018-05-29, the representative said this is a known issue with `StreamingPull`, that there is no ETA for fixing it, and that I should poll the Pub/Sub release notes for updates.