WIP: Round Robin PRQ #3982
Conversation
@dgrisham could you rebase this on latest master so that only your changes are here?
Force-pushed 7feb232 to 9b8251b
@whyrusleeping yep, just did.
@dgrisham make sure the go-libp2p-peer package hash that you imported there is the latest one
@whyrusleeping the import is in master as well, I think it came in after I rebased. I won't be able to test this until morning (I'm in Europe), but it may need to be fixed in master
@whyrusleeping I ran
That hash is indeed the most recent for
Am I doing something weird/not updating dependencies as I should be or something? (Also, I accidentally closed the issue, that wasn't intentional.)
@dgrisham the easiest way to fix things is to just run:
9b8251b
to
2f999d4
Compare
@whyrusleeping hm, that didn't work unfortunately. I ended up re-cloning go-ipfs and moving my changes into it; seems to be fine now
Status

After talking with @whyrusleeping, we're going to stick closer to the initial implementation (as opposed to the 'Alternative Implementation Idea' discussed below). Here's the current plan:
Everything below is deprecated, left for archival/further comments.

Update -- Alternative Implementation Idea

As mentioned in the OP, there may be an issue with not punishing poorly-performing peers who degrade performance when there are no 'good' peers for them to be compared to. Since we're using the Bitswap ledgers to measure peers, it seems to make more sense to instead think of it as "consider peer …". An alternative implementation would be to do something like:
Then we have the question of how to allocate requests to peers within a round. Two options would be:
Both of these options allocate the same number of requests to all peers within a single round. This makes it so that, on a large enough time scale, peers are effectively served fairly (if we assume constant weights, at least...I'm pretty sure that's still true if the weights vary, which of course they will). Option 2, though, allocates a different number of requests on a round-by-round basis (based on …).
Force-pushed 7f11665 to 1dfbfda
Force-pushed 1dfbfda to 1e0bce1
A few comments on the code.
I'll need to read it a few more times to really digest all the logic here.
One question I have:
Assuming I'm serving 2 peers (a,b), each with 50/50 allocations, peer A with 2x the bandwidth of B (and me having more than both of them), won't the bandwidth of peer B throttle peer A?
```go
func (rrq *RRQueue) Shift() {
	var peer *RRPeer
	peer, rrq.allocations = rrq.allocations[0], rrq.allocations[1:]
	rrq.allocations = append(rrq.allocations, peer)
}
```
I think you should just swap references here, reslicing this way may lead to some weird memory/gc behaviour.
Good call 👍
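For reference, the swap-in-place rotation the review suggests could look something like this. This is a hedged sketch: `RRPeer` is reduced to a bare ID here, and the real struct lives in the bitswap decision engine.

```go
package main

import "fmt"

// RRPeer is a simplified stand-in for the queue's peer entry
// (illustrative only; the real type has more fields).
type RRPeer struct {
	id string
}

// RRQueue holds the round-robin allocations.
type RRQueue struct {
	allocations []*RRPeer
}

// Shift moves the head peer to the tail in place, reusing the same
// backing array instead of the reslice-and-append pattern, which can
// retain stale memory and churn the GC.
func (rrq *RRQueue) Shift() {
	if len(rrq.allocations) < 2 {
		return
	}
	head := rrq.allocations[0]
	copy(rrq.allocations, rrq.allocations[1:])
	rrq.allocations[len(rrq.allocations)-1] = head
}

func main() {
	q := &RRQueue{allocations: []*RRPeer{{"a"}, {"b"}, {"c"}}}
	q.Shift()
	for _, p := range q.allocations {
		fmt.Print(p.id, " ") // b c a
	}
	fmt.Println()
}
```

Because `copy` shifts elements within the existing slice, the slice header never moves and no garbage is left behind by repeated rotations.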
```go
// delete task if it's trash
if task.trash {
	task = nil
	continue
}
```
Setting the task to nil is not really needed. Inverting the logic would make it a bit simpler:
```go
if !task.trash {
	return task
}
```
Hm, good point. I copied this part from the way it was done in the old PRQ, not sure why it was that way.
```go
return &strategy_prq{
	taskMap:  make(map[string]*peerRequestTask),
	partners: make(map[peer.ID]*activePartner),
	pQueue:   pq.New(partnerCompare),
}
```
`partnerQueue`?
Yeah, I named it `pQueue` to maintain symmetry with the old PRQ. I can do another PR after and change the name in both.
815bdaa
to
56177bd
Compare
@magik6k Hm, maybe. Someone should check me on this, but my understanding is that any issues with a peer having low bandwidth would manifest in packet loss rather than throttling another peer. E.g. if you send more data than peer B can receive within a certain time frame and the line starts to 'clog' around the bottleneck, then the router (probably the one peer B is directly connected to) will start to drop packets. But once the data going to peer B is on the wire, you move on and start sending to peer A.
I'm not sure how it's handled internally in libp2p, but TCP (which we are using) will just slow down data upload if the client can't receive it at full speed. On the app side there are 2 things that can happen:
So sending data to peer B would slow down, and peer A would exhaust its allocations way before B, effectively making peer A's transfer as slow as peer B's. Is this correct or am I missing something?
@magik6k Right, okay, I see what you mean -- haven't had to think about transport protocols in a while, so I appreciate you talking through this. I'm wondering if the decoupling of the PRQ from the message queues/workers has any importance here. The decision engine will … I think one of the crucial steps in all of that is that the peer's allocation is reduced once the block is removed by … I assume there can still be an issue with a message worker taking a long time sending peer B's message, and too much of this might start to throttle other users, but that seems like it might be an issue on the message worker end and not the PRQ end. I could very well be missing the mark here; thoughts on all of that?
I don't think having the PRQ be independent from the message workers is a good thing, as we need some sort of feedback mechanism from the networked part to properly adjust the weights. Let's define a couple of scenarios (with what I think may happen, though that's something that should get proper benchmarks):
What we want to optimize for is maximum possible bandwidth utilization and fair bandwidth sharing. I'm not sure if it's possible to do that or if any OS supports this - it might be easier to tweak some TCP parameters of underlying connections to tune the speed at which we serve the data to peers to what the bitswap strategy says.
Absolutely. One of the longer-term goals of the PRQ is to have the strategy function (which sets peer weights) accept other inputs than just the Bitswap ledger, and values from the message workers would certainly fit into this category. But maybe that's something worth getting into in this PR.
Agreed with all of this. Ultimately we want to be allocating bandwidth and not data as I've done here; this is just a first pass that isn't allocating bandwidth since, as you mentioned, bandwidth is difficult to measure. Maybe @Stebalien has thoughts on your idea of modifying the underlying TCP connections to serve our purposes here? I'll continue to research approaches to this in the meantime. Might be worthwhile to see how this is handled in Bittorrent clients.
(Hopefully this is actually addressing your issue and not just the issue I'm perceiving...)

First, to tackle the specific issue of peer A limiting the bandwidth of peer B. TCP (and all† of our transports) have backpressure. If you send more than the other side (or the network) can handle, writes to the stream will slow down. Therefore, the correct way to deal with situations like this is not to measure bandwidth but to simply rely on the backpressure mechanism.

Unfortunately, I believe the only way to actually fix the issue being discussed is to get rid of workers. If we had a low-level, event-loop-driven IO system, the worker system would definitely be the way to go. In a low-level application, you'd have an event loop that would wake up when a connection is able to receive some number of bytes, write those bytes, and then go to sleep until another connection is ready to receive bytes. However, we can't do that in Go (at least, not with our current stream abstractions). Any system that limits itself to N blocking workers will have the problems we're discussing here (unless I'm missing something, @whyrusleeping?).

Instead, I believe the best solution is to use resource budgets and rely on backpressure. That is:
Note: spawning goroutines can be expensive (likely the reason we have the worker system). However, we can alleviate that by having a large, dynamically sized worker pool (where we can limit the maximum number of workers using a budget):

```go
select {
case workers <- job:
default:
	go worker(workers, job)
}

func worker(workers <-chan job, j job) { /* ... */ }
```

There are a bunch of other optimizations we can make but we can do that down the road. We should also be using bandwidth/CPU budgets but I'll get into those below.

Now for the fun part: what problem are we actually trying to solve? Really, we're trying to:

Optimize: throughput and fairness.

Fairness

We don't want to unduly favor any given peer. How to deal with this has been thoroughly discussed here so I'm not really going to go into depth (I also don't really have many strong opinions on this topic). However, I'd make sure to consider resources other than bandwidth. That is, also consider memory*time and, if possible, CPU usage (so that lots of tiny requests aren't favored over a few large ones).

Resources

We have three constrained resources: network, memory, and CPU. My IPFS node shouldn't eat 500MiB and a CPU core just because my neighbors want some files from me.

Memory

As discussed above, the simplest solution is to attack this problem directly by budgeting memory. Whenever handling a request, deduct …

Network

Ideally, we'd be able to say: never use more than N MiB/s of bandwidth. However, this can be a bit tricky to do efficiently (at the stream level, at least). We can, but we probably don't have to do that now (or ever, really). A simpler tactic would be to simply aim for some average bandwidth by limiting the rate at which we queue up blocks to be sent. This is far from perfect but I believe it'll work well enough for our purposes. Importantly, it'll actually be quite accurate when under constant heavy load (only being wildly inaccurate under bursty conditions where, IMO, bandwidth constraints are less important).

CPU

Eventually, we'll probably want to deal with this by putting backpressure on noisy peers. The simplest way to do this would be to sleep occasionally while reading off the bitswap request (wantlist) stream if the peer gets too chatty.
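The memory-budget idea ("whenever handling a request, deduct; when done, refund") can be sketched as a counting semaphore over a byte budget. Everything below (`budget`, `Acquire`, `Release`, and the sizes) is illustrative, not an existing go-ipfs API:

```go
package main

import (
	"fmt"
	"sync"
)

// budget is a counting semaphore over a byte budget: a sketch of the
// "deduct on request, refund on completion" idea from the discussion.
type budget struct {
	mu    sync.Mutex
	cond  *sync.Cond
	avail int64
}

func newBudget(n int64) *budget {
	b := &budget{avail: n}
	b.cond = sync.NewCond(&b.mu)
	return b
}

// Acquire blocks until n bytes of budget are available, providing
// backpressure on request handling.
func (b *budget) Acquire(n int64) {
	b.mu.Lock()
	defer b.mu.Unlock()
	for b.avail < n {
		b.cond.Wait()
	}
	b.avail -= n
}

// Release refunds budget once the block has been handed off.
func (b *budget) Release(n int64) {
	b.mu.Lock()
	b.avail += n
	b.cond.Signal()
	b.mu.Unlock()
}

func main() {
	const blockSize = 256 * 1024 // pretend every block is 256 KiB
	b := newBudget(1 << 20)      // 1 MiB budget: at most 4 blocks in flight

	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		b.Acquire(blockSize) // deduct before spawning the sender
		go func(i int) {
			defer wg.Done()
			defer b.Release(blockSize) // refund when the send completes
			fmt.Println("sending block", i)
		}(i)
	}
	wg.Wait()
}
```

With this shape, a burst of requests beyond the budget simply blocks in `Acquire` until earlier sends finish, rather than queueing unbounded data in memory.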
Throughput

Finally, we don't want slow peers to keep fast peers from having their requests processed. This is actually a well-known DoS attack called Slowloris and is, unfortunately, impossible to solve without either:
This is because, unfortunately, it's very difficult (if not impossible) to distinguish between a bad peer and a slow peer. IMO, the best we can do is:
And do the best we can. Eventually, we'd like a reputation system that'd allow us to better solve this issue, but we don't have that yet :(.

† Except for the go-multiplex muxer, which will kill a stream if you don't read from it fast enough. However, that will only happen if the receiver is slower than both the sender and the network (unlikely). Also, this is the only transport we have, AFAIK, that suffers from something like this.
Initial round robin PRQ (aka `strategy_prq`) implementation.

The strategy_prq is a peerRequestQueue implementation that accepts a Bitswap strategy function and serves peers in a weighted round-robin fashion, where the weights are calculated by the strategy function. This implementation maintains backwards compatibility with the original PRQ. The strategy PRQ has 4 tests: one copied from the original PRQ implementation and three that test the round-robin functionality specific to the strategy PRQ.

Done:
- Round-Robin Queue (RRQ) implementation with tests.
- Strategy PRQ implementation. This uses the RRQ internally and provides the same interface as the existing PRQ.
- Codebase is compatible with the old PRQ and the Strategy PRQ.
- Write tests for Strategy PRQ.

TODO:
- Add flag to switch between original PRQ and Strategy PRQ.

License: MIT
Signed-off-by: David Grisham <dgrisham@mines.edu>
…pected with the SPRQ.

License: MIT
Signed-off-by: David Grisham <dgrisham@mines.edu>
Force-pushed f3cedc5 to 65653b5
License: MIT Signed-off-by: David Grisham <dgrisham@mines.edu>
License: MIT Signed-off-by: David Grisham <dgrisham@mines.edu>
…-ipfs into impl/bitswap/round-robin-prq
License: MIT Signed-off-by: David Grisham <dgrisham@mines.edu>
This is a preliminary implementation of a round robin peer-request-queue (PRQ) for Bitswap.

Motivation

The current PRQ orders peers based on a specified comparison function, which was originally intended to one day implement Bitswap strategies. However, this comparison function is limited in that it doesn't map very well onto the idea of Bitswap strategies. For example, it is difficult to say something like "Peer A has a 2x preference over Peer B, so satisfy Peer A's requests twice as much as Peer B's". A weighted round robin queue, however, can easily satisfy such a request. For this reason, I'm working to replace the current PRQ implementation with one that uses a weighted round robin queue to send data to users based on their relative weights, where these weights are determined by a pluggable Bitswap strategy.

Implementation

RRQueue

The round robin queue is implemented as the `RRQueue` type:

A single round satisfies `roundBurst` requests. At the start of each round, the peers' `weights` (determined by the `Strategy`, discussed below) determine the number of requests each peer receives (as a proportion of `roundBurst`), which is stored in `allocations`. For example, if we have two peers as such:

then `peer1` will be allocated 3x as many requests as `peer2`. So, if `roundBurst = 1000`, then `peer1` would be allocated 750 requests for this round and `peer2` would be allocated 250. `totalWeight` is simply used to reduce the number of calculations needed to find the sum of all `weights`, and is used to calculate peer allocations (in this example, `totalWeight` would be equal to `3 + 1 = 4`). `allocations` is calculated for each round robin round, while `weights` persists between rounds and a peer's weight is updated on a `prq.Push`.
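The allocation arithmetic described above (weights 3 and 1 with `roundBurst = 1000`, giving 750/250) can be sketched as follows; the function name and map-based layout are illustrative, not the actual `RRQueue` internals:

```go
package main

import "fmt"

// initAllocations sketches how per-round request allocations could be
// derived from persistent weights: each peer gets a share of roundBurst
// proportional to its weight over totalWeight.
func initAllocations(weights map[string]float64, roundBurst int) map[string]int {
	totalWeight := 0.0
	for _, w := range weights {
		totalWeight += w
	}
	allocations := make(map[string]int, len(weights))
	for id, w := range weights {
		// proportional share of the round (truncated toward zero)
		allocations[id] = int(float64(roundBurst) * w / totalWeight)
	}
	return allocations
}

func main() {
	weights := map[string]float64{"peer1": 3, "peer2": 1}
	fmt.Println(initAllocations(weights, 1000)) // map[peer1:750 peer2:250]
}
```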
`peerBurst` determines the number of consecutive requests a peer will be served within the round. For example, if `peerBurst = 2` in the above example, then `peer1` would be served 2 of its 3 requests (then placed at the end of the queue), then `peer2` would be served its only request (and removed from the queue), then `peer1` would be served its final request. `roundBurst` and `peerBurst` are meant to be tunable parameters and are currently set to simple test values.

Strategies
Bitswap strategies are implemented by the following types:

A `StrategyFunc` takes in a Bitswap ledger (as an immutable `Receipt`) and calculates a weight to be used in the `RRQueue`. `freezeWeight` is used in freezing a peer, which is a concept that was part of the previous PRQ implementation. To accommodate freezing, the `RRQueue` reduces a peer's round robin weight based on `freezeWeight` (e.g. `weights['peer1'] /= freezeWeight` when freezing `peer1`).
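A minimal sketch of these types, with hypothetical stand-ins for `Receipt` and the ledger-value test strategy (the floor of 1 for new peers is my assumption, not part of the PR):

```go
package main

import "fmt"

// Receipt is a simplified, immutable view of a Bitswap ledger entry;
// the real Receipt lives in the decision engine, and Value here stands
// in for whatever the ledger reports for the peer.
type Receipt struct {
	Peer  string
	Value float64
}

// StrategyFunc maps a ledger receipt to a round-robin weight.
type StrategyFunc func(r *Receipt) float64

// ledgerValueStrategy mirrors the testing strategy described above:
// weight each peer directly by its ledger Value, with a floor of 1
// (an assumption, so brand-new peers still get served).
func ledgerValueStrategy(r *Receipt) float64 {
	if r.Value < 1 {
		return 1
	}
	return r.Value
}

// freeze reduces a peer's round-robin weight by freezeWeight,
// following the weights['peer1'] /= freezeWeight rule.
func freeze(weights map[string]float64, peer string, freezeWeight float64) {
	weights[peer] /= freezeWeight
}

func main() {
	var s StrategyFunc = ledgerValueStrategy
	weights := map[string]float64{
		"peer1": s(&Receipt{Peer: "peer1", Value: 3}),
		"peer2": s(&Receipt{Peer: "peer2", Value: 0}),
	}
	fmt.Println(weights) // map[peer1:3 peer2:1]

	freeze(weights, "peer1", 2)
	fmt.Println(weights["peer1"]) // 1.5
}
```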
Interface

The current implementation maintains the same outward-facing `prq` interface as the previous one, with a single caveat: a `Strategy` must be provided to the `newPRQ` function so that the PRQ can determine how to weight peers. For testing purposes, the current strategy is simply one that weights peers based on their ledger `Value`s.

Testing

Tests are still being written to verify this implementation. To see how it works, you can run the preliminary tests (which only print out info, they do not actually verify anything yet) with `go test -v` in the `exchange/bitswap/decision` directory. As of now, at least one of the `engine_test.go` tests hangs, so `engine_test.go` has been moved to `failing_tests` for testing convenience until the issue is resolved.
Issues

The way requests are allocated here does not fully reflect the specification in the IPFS whitepaper (draft 3). In particular, since requests are allocated to peers based on their weight relative to the sum of all peers' weights, a node may serve poorly-performing peers in the same way as peers who perform well.

For example, say node `i` has two peers, `j` and `k`. If peers `j` and `k` start out with the same weight, they may both stop serving peer `i`'s requests completely (known as 'freeriding') and still receive the same service from `i` (since their relative weights stay the same, and `i` merely provides `j` and `k` with service based on the sum of their weights). This is not an issue if we can assume that a peer `i` will always have a peer `l` who is a long-standing good peer with a large weight (and thus outweighs bad peers `j` and `k`), but without this assumption peer `i` is susceptible to attack by its peers.

One way to address this would be to include a `baselineWeight` in a `Strategy`, which is a value that could be interpreted as the line between a 'good' and a 'bad' peer. Then, the `RRQueue` could somehow use this value to scale the round robin allocations. For example, if `baselineWeight = 2`, `weights['j'] = 1`, and `weights['k'] = 1`, we might include peers `j` and `k` in only half of the round robin rounds. However, this would require 'stalling' the round robin round somehow to ensure that peers `j` and `k` have to wait before their requests are fulfilled. We could also simply not serve peers `j` and `k` if their weight goes below some threshold, but this would simply provide a lower bound on the value that determines how much peers `j` and `k` can degrade performance without being penalized.

Discuss!

All input and questions are welcome, let me know your thoughts on this!