This repository has been archived by the owner on Feb 1, 2023. It is now read-only.

Feat: A more robust provider finder for sessions (for now) and soon for all bitswap #60

Merged (9 commits into master, Feb 5, 2019)

Conversation

hannahhoward
Contributor

Goals

Build a robust provider finder for sessions so that we can deduplicate provider logic (#52) and soon merge GetBlock with Session.GetBlocks (#49)

Implementation

A manager for find provider requests that:

  • dedups requests
  • rate limits requests
  • manages timeouts
  • handles connecting to peers once they are found
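As a rough sketch of the dedup piece in Go (all names here, such as queryManager, inflight, and the query callback, are illustrative, not this PR's actual API):

```go
package main

import (
	"fmt"
	"sync"
)

// inflight tracks one running find-provider query; duplicate callers block on
// done, then read providers (safe after close via the happens-before edge).
type inflight struct {
	done      chan struct{}
	providers []string
}

// queryManager dedups find-provider requests by key: only the first caller
// for a key runs the query, later callers wait for its result.
type queryManager struct {
	mu      sync.Mutex
	queries map[string]*inflight
}

func newQueryManager() *queryManager {
	return &queryManager{queries: make(map[string]*inflight)}
}

func (qm *queryManager) FindProviders(key string, query func(string) []string) []string {
	qm.mu.Lock()
	if q, ok := qm.queries[key]; ok {
		qm.mu.Unlock()
		<-q.done // duplicate request: wait for the in-flight query
		return q.providers
	}
	q := &inflight{done: make(chan struct{})}
	qm.queries[key] = q
	qm.mu.Unlock()

	q.providers = query(key) // only the first caller hits the network

	qm.mu.Lock()
	delete(qm.queries, key)
	qm.mu.Unlock()
	close(q.done) // wake any duplicate callers
	return q.providers
}

func main() {
	qm := newQueryManager()
	providers := qm.FindProviders("QmExampleKey", func(k string) []string {
		return []string{"peerA", "peerB"} // stand-in for a DHT lookup
	})
	fmt.Println(providers)
}
```

Rate limiting would typically sit on top of this as a fixed pool of worker goroutines pulling requests off a channel; timeouts and connecting to found peers are handled per query, as the review threads below discuss.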

For discussion

Will write up the general concurrency architecture tomorrow

@ghost ghost assigned hannahhoward Jan 23, 2019
@ghost ghost added the status/in-progress In progress label Jan 23, 2019
@hannahhoward
Contributor Author

Some tests are mildly unreliable; I need to work on them a bit more.

@hannahhoward
Contributor Author

hannahhoward commented Jan 23, 2019

Tests for this package should be more reliable now (I still need to work on the other bitswap ones)

@hannahhoward hannahhoward changed the base branch from master to feat/stabilize-tests January 26, 2019 02:20
Add a manager for querying providers on blocks, in charge of managing requests, deduping, and rate limiting
Add functionality to timeout find provider requests so they don't run forever
Integrate the ProviderQueryManager into the SessionPeerManager and bitswap in general

re #52, re #49
Add debug logging for the provider query manager and make tests more reliable
Removed a minor condition check that could fail in some cases purely due to timing rather than a code issue
@hannahhoward hannahhoward changed the base branch from feat/stabilize-tests to master January 30, 2019 21:24
}

type receivedProviderMessage struct {
k cid.Cid
Contributor

nitpick/nonblocking: should we get more explicit with these names? key instead of k and session instead of ses?

Contributor Author

Well, ses is gone.

Honestly, I struggle with what to call k, because k is actually a CID, not a key. But key or k is widely used, and cid is usually the name of the package, unfortunately. Maybe contentID. What do you use?

Contributor

Yeah, I've noticed the same; I've been using key and cid mixed in different places. Maybe we can start using just key consistently? I'll update on my end too.

Contributor Author

Yeah, I'd be down for that, but I want us all to agree on what we use, because it's a problem throughout the bitswap codebase.

@Stebalien
Member

What's the motivation for breaking up the "Network" interface that way?

wg.Add(1)
go func(p peer.ID) {
defer wg.Done()
err := pqm.network.ConnectTo(pqm.ctx, p)
Member

Have you considered #51 (comment)? That plus switching to sessions everywhere seems like a more robust solution.

Contributor Author

Yes, and I'm inclined not to address that until we get sessions everywhere merged first. This PR is necessary to do that, hence the chicken-and-egg. Arguably it could all be one big PR, but I'm generally disinclined to do things that way. Let me know if it's ok to proceed in this manner.

findProviderCtx, cancel := context.WithTimeout(pqm.ctx, pqm.findProviderTimeout)
pqm.timeoutMutex.RUnlock()
defer cancel()
providers := pqm.network.FindProvidersAsync(findProviderCtx, k, maxProviders)
Member

This is going to keep working even if we no longer need to find any providers.

Contributor Author

I believe I have fixed this. The context is no longer derived from the overall query manager's context, but from a context for this key's request, which itself is cancelled if all sessions requesting the key cancel their requests.
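A sketch of that refcounting (tracker, request, and cancelRequest are hypothetical names; the real code routes these through the manager's message loop rather than a mutex):

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// keyQuery is a refcounted per-key query context: it is cancelled only when
// the last session interested in the key cancels.
type keyQuery struct {
	ctx        context.Context
	cancel     context.CancelFunc
	requestors int
}

type tracker struct {
	mu      sync.Mutex
	queries map[string]*keyQuery
}

func newTracker() *tracker {
	return &tracker{queries: make(map[string]*keyQuery)}
}

// request registers one session's interest in key and returns the shared
// query context that the provider lookup would run under.
func (t *tracker) request(key string) context.Context {
	t.mu.Lock()
	defer t.mu.Unlock()
	q, ok := t.queries[key]
	if !ok {
		ctx, cancel := context.WithCancel(context.Background())
		q = &keyQuery{ctx: ctx, cancel: cancel}
		t.queries[key] = q
	}
	q.requestors++
	return q.ctx
}

// cancelRequest drops one session's interest; the query context is cancelled
// only when no interested sessions remain.
func (t *tracker) cancelRequest(key string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	q, ok := t.queries[key]
	if !ok {
		return
	}
	if q.requestors--; q.requestors == 0 {
		q.cancel()
		delete(t.queries, key)
	}
}

func main() {
	t := newTracker()
	ctx := t.request("QmKey") // session 1
	t.request("QmKey")        // session 2 shares the same query
	t.cancelRequest("QmKey")
	fmt.Println("after one cancel:", ctx.Err()) // <nil>: query still running
	t.cancelRequest("QmKey")
	fmt.Println("after both cancel:", ctx.Err()) // context canceled
}
```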

providerquerymanager/providerquerymanager.go (outdated comment, resolved)
k: k,
}
// clear out any remaining providers
for range incomingProviders {
Contributor

Out of curiosity, why do this drain?

Contributor Author

I don't want to block in case there are outstanding sends.

Contributor Author

But maybe there is a better way to do this.

Contributor Author

Thought about it a bit more, and yeah, this is necessary. The reason is this: messages get processed serially in the run loop. One of those messages might be an incoming provider-found message that gets broadcast to everyone who is waiting. We don't want that to block, so we need to make sure the channel is read, emptied, and closed (which happens once the cancel is processed) before we proceed.
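That drain can be sketched in isolation (consumeOne and the channel names are illustrative, not this PR's API): the requestor takes what it needs, then keeps reading until the run loop closes the channel, so an outstanding send from the run loop can never block.

```go
package main

import "fmt"

// consumeOne receives the first provider, then drains the rest; it returns
// only once the sender (the run loop) has closed the channel, so no send on
// the channel is ever left blocked.
func consumeOne(incoming <-chan string) (first string, drained int) {
	first = <-incoming
	// clear out any remaining providers; the loop exits when the run loop
	// processes our cancel and closes the channel
	for range incoming {
		drained++
	}
	return first, drained
}

func main() {
	incoming := make(chan string) // unbuffered, like a per-request channel
	go func() {
		incoming <- "peerA"
		incoming <- "peerB" // an outstanding send at cancel time
		close(incoming)
	}()
	first, drained := consumeOne(incoming)
	fmt.Println(first, drained) // peerA 1
}
```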

Removed session ID use completely from providerquerymanager
Make sure that if all requestors cancel their request to find providers for a key, the overall query gets cancelled
@hannahhoward
Contributor Author

What's the motivation for breaking up the "Network" interface that way?

@Stebalien Well, the session peer manager now only uses the connection manager (other network calls go through the Provider Query Manager), so I figured: why not simplify even further? It makes tests simpler, among other things.

@Stebalien (Member) left a comment

Almost good to go.

case <-pqm.ctx.Done():
return
case <-sessionCtx.Done():
pqm.providerQueryMessages <- &cancelRequestMessage{
Member

Probably needs to select on pqm.ctx.Done().

Member

Also, I think this needs to read from incomingProviders at the same time. Otherwise, we're relying on the buffer in providerQueryMessages being large enough to fit this message.

Contributor Author

Right. I hate Go. :P Will fix.

Contributor Author

Fixed, I think.

pqm.timeoutMutex.RLock()
findProviderCtx, cancel := context.WithTimeout(fpr.ctx, pqm.findProviderTimeout)
pqm.timeoutMutex.RUnlock()
defer cancel()
Member

This will defer till the function returns.

Contributor Author

Is that bad? This cancel now only serves to make sure the context gets cleaned up; the cancel triggered by all the requests cancelling is the parent of this context.

Member

Deferred functions get pushed onto a stack that keeps growing until the worker returns.

Member

Basically, it'll leak a bunch of memory.

Contributor Author

Got it. I get a linter error if I don't name the cancel function and call it. I moved the call to right after the for range over the channel loop, which seems like the first safe point to cancel.

Keep channels unblocked in cancelling request (refactored to a function). Also cancel the find-provider context as soon as it can be.
@Stebalien
Member

We should probably test this on the gateway, if possible. This now blocks the provider workers on actually connecting to the providers, so that could have unintended consequences.

@hannahhoward hannahhoward merged commit bb89789 into master Feb 5, 2019
@ghost ghost removed the status/in-progress In progress label Feb 5, 2019
Jorropo pushed a commit to Jorropo/go-libipfs that referenced this pull request Jan 26, 2023
…provider

Feat: A more robust provider finder for sessions (for now) and soon for all bitswap

This commit was moved from ipfs/go-bitswap@bb89789