Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Eth call by hash #1905

Merged
merged 6 commits into from
Sep 21, 2020
Merged

Eth call by hash #1905

merged 6 commits into from
Sep 21, 2020

Conversation

leoyvens
Copy link
Collaborator

@leoyvens leoyvens commented Sep 18, 2020

This makes calls by block hash instead of block number. The tricky part is that contract calls may no longer be retried indefinitely, so if the eth node returns an error after 10 retries, we consider that a possible reorg. Getting that information returned to the instance manager requires some plumbing. The instance manager will then reset the block stream so we may handle the reorg.

There is one edge case which is when re-processing a block for new data sources. At this point we have a dirty indexing context due to the new data sources, and reverting that isn't a straightforward task, so if we fail to make a contract call that is treated as an ordinary error and not retried. Ideally we'd be able to retry this case as well, but that's left as future work.

I could not test this handling an actual reorg because that's not easy to reproduce. But I did test that when forcing an invalid hash on the call, the block stream would be reset.

It is not a realiable behaviour.
This used to be necessary when Geth
returned empty responses for temporary errors
but that is no longer the case.
@leoyvens leoyvens requested a review from a team September 18, 2020 21:50
Err(_) => false,
}
// Ganache does not support calls by block hash.
// See https://github.com/trufflesuite/ganache-cli/issues/745
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect. Thanks for including a link to the issue. I've verified eth_clientVersion returns ...TestRPC... so the distinction we're making here looks correct.

.limit(16)
.no_logging()
.no_timeout()
.limit(10)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What impact does this have in situations where the Ethereum node is temporarily down? Do we still keep retrying indefinitely on e.g. network errors, or do subgraphs fail eventually, after 10 attempts?

Comment on lines +667 to +682
Err(MappingError::PossibleReorg(e)) => {
info!(ctx.state.logger,
"Possible reorg detected, retrying";
"error" => format!("{:?}", e.to_string()),
"id" => ctx.inputs.deployment_id.to_string(),
);

// In case of a possible reorg, we want this function to do nothing and restart the
// block stream so it has a chance to detect the reorg.
//
// The `ctx` is unchanged at this point, except for having cleared the entity cache.
// Losing the cache is a bit annoying but not an issue for correctness.
//
// See also b21fa73b-6453-4340-99fb-1a78ec62efb1.
return Ok((ctx, true));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, this answers my previous question. We're bubbling the problem up and retrying the stream as a whole because we may not want to stick to the hash that keeps failing. Makes perfect sense.

Comment on lines +759 to +762
// This treats a `PossibleReorg` as an ordinary error which will fail the subgraph.
// This can cause an unecessary subgraph failure, to fix it we need to figure out a
// way to revert the effect of `create_dynamic_data_sources` so we may return a
// clean context as in b21fa73b-6453-4340-99fb-1a78ec62efb1.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like a tradeoff worth making to get the feature in. It also seems like something we should prioritize immediately after merging. I'm pretty sure the problem will happen, and it'll surface in a similar way to the current issues caused by calling by block.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering how these triggers are much less common, the chance of us hitting the bug here is proportionally lower. And this bug already doesn't come up everyday, so I wouldn't count on us seeing this happen. But I think the fix for this and for the duplicate trigger processing will be similar, so we can try to fix them together.

Err(e) => Err(anyhow::anyhow!(

// Any error reported by the Ethereum node could be due to the block no longer being on
// the main chain. This is very inespecific but we don't want to risk failing a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: inespecific -> unspecific?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants