Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possibility to block the publisher when subscriber queue is full #615

Open
2 of 9 tasks
budrus opened this issue Mar 17, 2021 · 17 comments · Fixed by #652, #663, #674, #681 or #697
Open
2 of 9 tasks

Possibility to block the publisher when subscriber queue is full #615

budrus opened this issue Mar 17, 2021 · 17 comments · Fixed by #652, #663, #674, #681 or #697
Assignees
Labels
enhancement New feature
Milestone

Comments

@budrus
Copy link
Contributor

budrus commented Mar 17, 2021

Brief feature description

Today our default is to use an "overflowing queue" for the subscribers. If the subscriber does not consume fast enough we start loosing samples. An option would be nice to block the publisher in this case for ensuring that no samples are lost.

Detailed information

The overflowing queue starts to drop the oldest sample in case of an overflow, so technically it behaves like a ring buffer. In many use cases this is fine as we want to have a "provide the last X samples" contract. E.g. if a subscriber is only interested in latest greatest data, they can set the queue size to 1 and we don't waste memory chunks with samples that are not interesting for the subscriber. We often also do not want to have an interference from a subscriber back to a publisher. So if the subscriber is not fast enough to consume all samples solutions could be

  1. increase the frequency of the subscribing application if it operates in polling mode
  2. increase the queue size for the subscriber
  3. decrease the runtime for the subscribing application
    But there also might be use cases where it is fine to slow down the publisher to ensure that no data is lost in our system. The solution would be to block the publish() call when we detect a queue overflow until the subscriber popped samples and there is again a free slot in the queue. Sure, this has an influence on the publishing applications ans also other subscribers that are connected to this publisher. This is comparable to the DDS history QoS KeepAll. The normal behavior with our overflowing queue is comparable to the DDS history QoS KeepLastX

ToDo

When implemented implement the following integration tests:

  • Modified icedelivery where subscriber acquires no samples until publisher blocks. When publisher blocks press CTRL-c (for both sides)
  • Unsubscriber subscriber when publisher is in blocking push.
  • Destroy subscriber object when publisher is in blocking push.
  • Subscribe new subscriber when publisher is in blocking push.
  • Unsubscribe different subscriber when publisher is in blocking push with another subscriber.
  • Optimization in ChunkDistributor Iox #615 block publisher when subscriber queue full #663 (comment)
  • Fix TriggerQueue Iox #615 block publisher when subscriber queue full #663 (comment)
  • Ctrl+C on an application with an publisher blocked by a slow subscriber doesn't shut down when a signal handler is installed; this is due to the while (!remainingQueues.empty()) in ChunkDistributor::deliverToAllStoredQueues which is not stopped when SIG_TERM has a custom signal handler
    • a Runtime::unblockShutdown could be implemented
    • this could call stop offer on all the publisher
    • it must be carefully checked what the stop offer call does since only a limited number of functions are allowed to be called in the signal handler (https://man7.org/linux/man-pages/man7/signal-safety.7.html)
  • An application with a blocked publisher slows down the RouDi shutdown due to the 45s processKillDelay in RouDi::shutdown method
    • after m_prcMgr->requestShutdownOfAllProcesses(); RouDi has to make all the publisher stop offering so that the discovery loop can remove the subscriber queues from the ChunkDistributor of the publisher
@budrus budrus self-assigned this Mar 17, 2021
@budrus budrus added the enhancement New feature label Mar 17, 2021
@budrus
Copy link
Contributor Author

budrus commented Mar 17, 2021

Proposed solution

  1. Introduce a publisher option that allows subscribers to block this publisher if they have a queue overflow. Publisher can only be blocked if this is enabled
  2. Introduce a subscriber option that indicates that this subscriber does not want to use any samples. This will result in a rejecting and not an overflowing queue for this subscriber
  3. If a subscriber requests the publisher to block if the queue is full but the publisher does not allow this, we print a warning on connect or call the error handler (tbd)
  4. If the publisher supports blocking, the subscriber requests this and we have an overflow, the publisher has to wait and try again. The simplest solution would be a busy loop that tries to push until it succeeds, the advanced version would be to use our inter-process condition variable to wake up the publisher from a non-busy wait when there is again space in the queue. A good compromise for now could be something in between, like tiny sleeps and retries
  5. We extend iceperf to also measure the throughput that could be reached with the blocking behavior as this should guarantee that no samples are lost.

@budrus budrus mentioned this issue Mar 17, 2021
@elBoberido
Copy link
Member

@budrus iceperf already does it more or less this way. One appliacation only sends data after it received data from the other, so we already have a blocking wait for the publisher.

@budrus
Copy link
Contributor Author

budrus commented Mar 17, 2021

@elBoberido That's somehow right. But only because we have the ping-pong back channel. For a throughput measurement I think other middleware doing a sent as fast as possible and check if everything was received and how much in a fixed time. So I would prefer to also have a setup close to what people do when testing iceoryx. This is send like crazy as fast as possible and see if there is any data lost and what throughput it has.

@elBoberido
Copy link
Member

@budrus okay, with this approach there will almost certainly be a data loss when the publisher is not blocked since the queue sizes are way to small to hold enough samples even if the OS stops the subscriber only for a few milliseconds.

@elfenpiff
Copy link
Contributor

elfenpiff commented Mar 18, 2021

I would suggest the following approach.

  1. adjust the trigger queue
    • add a template parameter to select the queue type (SoFi, FiFo etc.)
    • add a template policy to select the waiting strategy (e.g. semaphore, busy wait, condition variable)
    • implement the easiest policy for the trigger queue (more sophisticated waiting policies can come later)
    • integrate the trigger queue into the variant queue
    • adjust the trigger queue interface (add blockingPush, timedPush, tryPush as pendant to blockingPop, timedPop, tryPop
  2. use the trigger queue flavor in subscriber when option is set to I_WANT_IT_ALL
  3. use blocking push in publisher when option is not DONT_STOP_ME_NOW or is set to THE_SHOW_MUST_GO_ON
  4. Think about the fact that Freddy Mercury seems to know more about lock free programming then we do?!

@budrus budrus added this to the Prio 1 milestone Mar 18, 2021
@ankitkk
Copy link

ankitkk commented Mar 20, 2021

This is great. Anything which doesn’t force or imply a threading model on clients is awesome.

@budrus
Copy link
Contributor Author

budrus commented Mar 25, 2021

Hackathon:

  1. extension of iceperf -> @elBoberido
  2. extension of publisher and subscriber options C++, extension of runtime and RouDi processMessage() -> @FerdinandSpitzschnueffler
  3. extension of publisher and subscriber options C, needs 2. -> @FerdinandSpitzschnueffler
  4. Extension of trigger queue, blocking behavior -> @elfenpiff
  5. Extension of constructSubscriber(), needs 4., 2.
  6. Discovery. Store reliable/best effort in BasePort and extension of PortManager discovery matching -> @mossmaurice
  7. Naming things -> @budrus

FerdinandSpitzschnueffler added a commit to ApexAI/iceoryx that referenced this issue Mar 25, 2021
Signed-off-by: Marika Lehmann <marika.lehmann@apex.ai>
mossmaurice added a commit to ApexAI/iceoryx that referenced this issue Mar 25, 2021
…ueFullPolicy

Signed-off-by: Simon Hoinkis <simon.hoinkis@apex.ai>
mossmaurice added a commit to ApexAI/iceoryx that referenced this issue Mar 25, 2021
…nd let RouDi compare it in the discovery loop
elfenpiff added a commit to ApexAI/iceoryx that referenced this issue Mar 25, 2021
Signed-off-by: Christian Eltzschig <me@elchris.org>
elfenpiff added a commit to ApexAI/iceoryx that referenced this issue Mar 25, 2021
…-queue-full' of github.com:ApexAI/iceoryx into iox-eclipse-iceoryx#615-block-publisher-when-subscriber-queue-full
mossmaurice added a commit to ApexAI/iceoryx that referenced this issue Mar 25, 2021
elfenpiff added a commit to ApexAI/iceoryx that referenced this issue Mar 25, 2021
Signed-off-by: Christian Eltzschig <me@elchris.org>
elfenpiff added a commit to ApexAI/iceoryx that referenced this issue Mar 25, 2021
…-queue-full' of github.com:ApexAI/iceoryx into iox-eclipse-iceoryx#615-block-publisher-when-subscriber-queue-full
elfenpiff added a commit to ApexAI/iceoryx that referenced this issue Mar 25, 2021
Signed-off-by: Christian Eltzschig <me@elchris.org>
elfenpiff added a commit to ApexAI/iceoryx that referenced this issue Mar 25, 2021
Signed-off-by: Christian Eltzschig <me@elchris.org>
@orecham
Copy link
Contributor

orecham commented Mar 25, 2021

@budrus with an impossible task

@budrus
Copy link
Contributor Author

budrus commented Mar 25, 2021

@budrus with an impossible task

@ithier at least you realized that I got the hardest job

mossmaurice added a commit to ApexAI/iceoryx that referenced this issue Mar 25, 2021
Signed-off-by: Simon Hoinkis <simon.hoinkis@apex.ai>
mossmaurice added a commit to ApexAI/iceoryx that referenced this issue Mar 25, 2021
…sher/SubscriberPort

Signed-off-by: Simon Hoinkis <simon.hoinkis@apex.ai>
marthtz pushed a commit to boschglobal/iceoryx that referenced this issue May 12, 2021
Signed-off-by: Christian Eltzschig <me@elchris.org>
marthtz pushed a commit to boschglobal/iceoryx that referenced this issue May 12, 2021
Signed-off-by: Christian Eltzschig <me@elchris.org>
marthtz pushed a commit to boschglobal/iceoryx that referenced this issue May 12, 2021
Signed-off-by: Christian Eltzschig <me@elchris.org>
marthtz pushed a commit to boschglobal/iceoryx that referenced this issue May 12, 2021
Signed-off-by: Christian Eltzschig <me@elchris.org>
marthtz pushed a commit to boschglobal/iceoryx that referenced this issue May 12, 2021
Signed-off-by: Christian Eltzschig <me@elchris.org>
marthtz pushed a commit to boschglobal/iceoryx that referenced this issue May 12, 2021
…o solve sanitizer issue

Signed-off-by: Christian Eltzschig <me@elchris.org>
marthtz pushed a commit to boschglobal/iceoryx that referenced this issue May 12, 2021
…ssue

Signed-off-by: Christian Eltzschig <me@elchris.org>
marthtz pushed a commit to boschglobal/iceoryx that referenced this issue May 12, 2021
… libc++abi.dylib: terminating

Signed-off-by: Christian Eltzschig <me@elchris.org>
marthtz pushed a commit to boschglobal/iceoryx that referenced this issue May 12, 2021
marthtz pushed a commit to boschglobal/iceoryx that referenced this issue May 12, 2021
Signed-off-by: Mathias Kraus <mathias.kraus@apex.ai>
marthtz pushed a commit to boschglobal/iceoryx that referenced this issue May 12, 2021
Signed-off-by: Mathias Kraus <mathias.kraus@apex.ai>
marthtz pushed a commit to boschglobal/iceoryx that referenced this issue May 12, 2021
Signed-off-by: Mathias Kraus <mathias.kraus@apex.ai>
marthtz pushed a commit to boschglobal/iceoryx that referenced this issue May 12, 2021
…en chunks are lost

Signed-off-by: Michael Poehnl <michael.poehnl@apex.ai>
marthtz pushed a commit to boschglobal/iceoryx that referenced this issue May 12, 2021
Signed-off-by: Mathias Kraus <mathias.kraus@apex.ai>
marthtz pushed a commit to boschglobal/iceoryx that referenced this issue May 12, 2021
marthtz pushed a commit to boschglobal/iceoryx that referenced this issue May 12, 2021
…riant queue

Signed-off-by: Mathias Kraus <mathias.kraus@apex.ai>
marthtz pushed a commit to boschglobal/iceoryx that referenced this issue May 12, 2021
Signed-off-by: Michael Poehnl <michael.poehnl@apex.ai>
marthtz pushed a commit to boschglobal/iceoryx that referenced this issue May 12, 2021
marthtz pushed a commit to boschglobal/iceoryx that referenced this issue May 12, 2021
Signed-off-by: Mathias Kraus <mathias.kraus@apex.ai>
marthtz pushed a commit to boschglobal/iceoryx that referenced this issue May 12, 2021
…lisher-when-subscriber-queue-full

Iox eclipse-iceoryx#615 block publisher when subscriber queue full
marthtz pushed a commit to boschglobal/iceoryx that referenced this issue May 12, 2021
…ions example

Signed-off-by: Michael Poehnl <michael.poehnl@apex.ai>
marthtz pushed a commit to boschglobal/iceoryx that referenced this issue May 12, 2021
… by full subscriber queue

Signed-off-by: Mathias Kraus <mathias.kraus@apex.ai>
marthtz pushed a commit to boschglobal/iceoryx that referenced this issue May 12, 2021
…ouDi-shutdown-with-blocked-publisher

iox-eclipse-iceoryx#615 unblock RouDi shutdown with blocked publisher
marthtz pushed a commit to boschglobal/iceoryx that referenced this issue May 12, 2021
…lisher-when-subscriber-queue-full

Iox eclipse-iceoryx#615 block publisher when subscriber queue full
marthtz pushed a commit to boschglobal/iceoryx that referenced this issue May 12, 2021
Signed-off-by: Mathias Kraus <mathias.kraus@apex.ai>
marthtz pushed a commit to boschglobal/iceoryx that referenced this issue May 12, 2021
Signed-off-by: Mathias Kraus <mathias.kraus@apex.ai>
marthtz pushed a commit to boschglobal/iceoryx that referenced this issue May 12, 2021
Signed-off-by: Mathias Kraus <mathias.kraus@apex.ai>
marthtz pushed a commit to boschglobal/iceoryx that referenced this issue May 12, 2021
Signed-off-by: Mathias Kraus <mathias.kraus@apex.ai>
marthtz pushed a commit to boschglobal/iceoryx that referenced this issue May 12, 2021
…ouDi-shutdown-with-blocked-publisher

Iox eclipse-iceoryx#615 unblock roudi shutdown with blocked publisher
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment