BeaconNode APIs #25
The three general areas that I've found useful are chain info, node info and validator info. To provide some more information on each of these:

Chain info: These are properties of the chain and node, and should be independent of the node's connection to the network. Fields include:
Node info: These are reflections of the node's current state, and as such can depend on the node's data and connection to the network. Fields include:
Validator info: These are a simple way of obtaining data about a defined subset of validators. In Prysm this is a streaming gRPC call, which means that an interested party can set up the list of validators they care about and receive real-time updates, i.e. after an epoch transition:
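As an illustration of that pattern, here is a minimal sketch of a client for such a server-streaming call. The module, stub and field names below are hypothetical placeholders standing in for protoc-generated code, not Prysm's actual generated API:

```python
# Sketch of subscribing to a server-streaming validator-info call.
# NOTE: the *_pb2 imports and message/field names are hypothetical
# placeholders for protoc-generated modules, not Prysm's real API.
import grpc

from validator_pb2 import ValidatorInfoRequest        # hypothetical
from validator_pb2_grpc import ValidatorServiceStub   # hypothetical

def watch_validators(pubkeys):
    channel = grpc.insecure_channel("localhost:4000")
    stub = ValidatorServiceStub(channel)
    # Register the set of validators of interest once; the node then
    # pushes an update for each, e.g. after every epoch transition.
    request = ValidatorInfoRequest(public_keys=pubkeys)
    for update in stub.StreamValidatorInfo(request):
        print(update.public_key.hex(), update.status, update.balance)
```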
Deleted post that I accidentally submitted
I'd suggest a more segregated approach than @mcdee's. My goals are to:
This would be my suggestion to cover @mcdee's endpoints:
A rough consumer lifecycle might look like:
This would be useful info, but we have to be very clear that these are estimates. For a lot of the validator transitions the exact times are unknowable.
I agree that this would be useful. However, it raises something else. I think that it would be useful to allow queries that specify an epoch to also specify a block or state root. This allows users to query about things that aren't in the canonical chain. I imagine this would be useful for block explorers and slashers.
@paulhauner sorry, to clarify: I was listing data I am using rather than suggesting an API structure. Regarding your proposed API structure: it looks fine, although in general I'm more of a fan of APIs that target use cases; for example, if I wanted to create a
That's quite a lot of work, and likely to be repeated for any non-trivial request. I'd rather look at this from a user's point of view and provide APIs that meet their requirements, rather than ones constrained by the internal data structures of the beacon chain. Totally agree with the view of avoiding mixing constants with variables where possible. [Transition timestamp] [Epoch to which data pertains] The vast majority of requirements from client to node will be for real-time data. Explorers and the like will take the real-time data and put it into a format useful for historical reporting. General users care about the state of the chain and their validators now, rather than last week. Adding an "at epoch x" style parameter to endpoints requires a lot of work, especially around edge cases. Is this something that is really necessary? (Note that I'm not against a node implementing this if they really want to, but making it a requirement for all nodes seems like overkill.) Instead, streaming APIs do tend to work well with this type of data. If we take a streaming
Could I please ask here that we don't define an API that requires polling? Polling is always ugly, involves trade-offs that hurt either the server or the user, and usually ends up with some form of limiter in place that confuses everybody. Streaming APIs for data that is fluid are easier to work with, as real-time as the node can manage, and don't result in wasted cycles (which may not matter individually but certainly will at scale).
@paulhauner I would add/propose a couple of changes:
I would propose this to return a Server-Sent Events stream with just the current epoch, slot and unix time, and to have additional APIs:
This could also have
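To make the SSE stream proposed above concrete, here is a minimal consumer sketch, assuming a hypothetical /v1/stream/slot endpoint that emits one JSON event per slot with the current epoch, slot and unix time:

```python
# Sketch of consuming the proposed Server-Sent Events stream.
# The endpoint path and event fields are assumptions for illustration.
import json
import requests

def follow_slots(base_url="http://localhost:5052"):
    with requests.get(f"{base_url}/v1/stream/slot", stream=True) as resp:
        resp.raise_for_status()
        for raw in resp.iter_lines(decode_unicode=True):
            # SSE wire format prefixes each payload line with "data:".
            if raw and raw.startswith("data:"):
                event = json.loads(raw[len("data:"):].strip())
                print(event["epoch"], event["slot"], event["unix_time"])

if __name__ == "__main__":
    follow_slots()
```

One attraction of SSE here is that the wire format is plain HTTP, so even clients without an SSE library can parse it with a streaming GET, as above.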
Sorry, I realized afterwards that this might be the case.
This seems like it approximates an opinionated vs unopinionated decision. I'd say that if we're going to choose opinionated then we need quite a firm list of use cases and their priorities. I'm a little hesitant to go down the opinionated path since I don't feel I have a firm grasp on the use cases of this application beyond staking. I expect to see a lot of emergent use cases and feel that it's a little early to become opinionated yet.
Yes, but then there's the counter-argument that if I wanted to do some simple task then I would have downloaded a bunch of unnecessary information, or that perhaps there are so many endpoints that it becomes unclear which one to use. I think this type of discussion really needs to start from a list of prioritized use cases as opposed to ad-hoc scenarios.
Personally I've done quite a lot of work in extracting old eth1 data for analysis. The presence of libraries like ethereum-etl indicates to me that other people find this interesting/necessary too. I'm keen to build an API that considers those doing research and analysis as first-class users. If you're giving historical information based on the state at your head, it doesn't seem particularly challenging to parameterize that over any state. We already do this and I believe Prysm does in some places too.
I haven't worked with SSE on the server side before, so I need to do some research here. A quick poke around for Rust and Nim libraries gives the impression that support is thin. Do you have some suggestions on how you wish to do streaming in HTTP @mcdee?
It's not so much opinionated, more that there are obvious clusters of info, for example:
I'm not suggesting we have lots of endpoints of the form
We already have that in your proposed API; the entire

There's no easy way to consider if 1 request with 2KB of data is "better" than 2 requests each with 1KB of data. I'd be inclined to group data where appropriate and lightweight, and break out anything that requires significant data access or computation into its own endpoint, but that's just my view.
Sure, but I was talking generally. How many people do an ETL of Ethereum data, compared to how many want to know their current balance? And again, I have no problem with the historical functionality being standardised, but I'm not so much of a fan of it being a requirement.
There are a few streaming options out there (web sockets, SSE, HTTP/2 server push) but I'm not thinking in terms of a particular protocol at present; I'd rather we sorted the API requirements and then picked a protocol from those that can meet them.
I did include seconds per slot in the
Agreed
Good question, I don't think we really know what the main uses for phase 0 APIs are apart from staking and block explorers. However, I would say that knowing your balance change across some period would be interesting to many stakers and arguably a basic requirement. E.g., how much have I made in the past month? I don't mind making them optional, but I think past balance and participation metrics will be very useful for validator UX. Specialised endpoints are always an option, but I am partial to a generic solution.
Sure. It seems to me your requirements stated so far can be fulfilled by:
Generally, I'm finding it hard to figure out what closes this issue so we can unblock #24 (#24 (comment)) and continue making progress. I assume static endpoints are a given, so perhaps we need to wait to see what people think about having streaming endpoints? Then, if we do all agree we need streaming, we can look at the requirements of those endpoints and choose a technology? Contrary to my previous comment (#25 (comment)), perhaps a detailed listing of endpoints is something that we want to avoid here, and instead do once we've got a protocol to express the endpoints in.
I think if there were concrete use cases, that'd be helpful in getting things moving fairly quickly. That way we could build out per use case, without necessarily all use cases being finalized, plus it gives a 'guide' to people that wish to consume things. It can also inform the decisions on what should be included / excluded in response objects. To my mind there are also differing consumers, from validators that want to get the work done, to monitoring tools that want to get an overview. If we define these well, we can tailor endpoints to their needs, and not build out generalised endpoints until the specific use cases we have are met (or maybe never?). The validator use cases seem to be relatively well encapsulated, as they
Monitoring / block explorers are presumably relatively well defined in their use cases as well. Unfortunately I'm not really across the entire scope of everything (is anyone?), but maybe we need resources from each area to define a minimum set of use cases so that we could build out an opinionated spec... The downside, I guess, is that we don't necessarily have everything anyone could ever need, but then maybe that also helps limit what gets built, so that everything that is actually built is required? The upside is we don't need to be experts in all areas to understand what is needed and why, and similarly people writing software to consume the endpoints will have the use cases available for a high-level view of how the endpoints are put to use. As an aside, TEKU is starting to publish the API here, although at the moment it's not automatically generated on pushing to master, so it will be slightly behind.
Sorry, I am late to this party. How can @prysmaticlabs be helpful here? We have written out many use cases in this product requirements document. We found these requirements satisfactory in meeting most or all of the needs of the eth2 block explorers which are already in (testnet) production. Is anyone looking, or has anyone looked, at the differences between Prysm's API and eth2.0-APIs? The Prylabs team has many opinions on how this API should be designed, which are reflected in our implementation of ethereumapis, but we're not sure how to best contribute to this conversation.
What if we had a fine-grained (up to a certain extent) API allowing for various use cases to be built atop it? If that were the case then tooling satisfying real user needs could be built with zero involvement of client developers and be re-used with any client software supporting that API. This is aligned with the following rule of thumb: keep your API fine-grained and your services coarse-grained.

Ethereum JSON-RPC showed that implementing use cases (like filters) as part of a client is cumbersome and puts a huge burden on client developers, especially in those parts that have statefulness as a requirement. If we go down the use-cases road then we will likely face the same problem. Some of the pain points have been previously shared in this thread. One of the two main responsibilities of the client is to keep the beacon chain's head and provide access to the data it already has, with no additional computations required. We probably want to be in line with the single-responsibility principle and avoid any unnecessary complication of already sophisticated beacon chain client software.

Of course, there is a drawback. Without additional tooling this approach makes a beacon chain client almost useless for the end user. It could also delay the delivery date, as this tooling will have to be created by some party. But this drawback seems to be a short-term issue and shouldn't have any impact in the long term.

Currently, I am not much into the details of the API; this is merely a high-level point of view on the problem of API design in our context. Feel free to skip these thoughts if you find them irrelevant.
The point about dumping internal data structures as API endpoints vs. doing some work to present useful information to end users is an important one. The former puts a significant burden on every entity that wants to use the data, whereas the latter places that burden on the beacon node client. In general I prefer the latter, from both the producer and consumer side.

Producers benefit because the information provided is what the user wants, and they aren't continually bombarded with questions about how to translate from the internal information to the external information, or why their guess at the translation is not consistent with some other implementation. Consumers benefit because they can obtain the information they want without needing to understand the details of the underlying implementation. And the separation means that internal data structures can be altered if necessary without impacting their consumers.

Obviously there are caveats to this approach. The translation work needs to be bounded and lightweight enough not to have an impact on the beacon node's primary responsibilities. A good example of what I'm trying to say here is the validator info. Clients have some approximation of the validator info internally that maps to the
But this information is not a lot of use for answering some of the questions most users want to know about their validator, like "I just submitted a deposit; when will my validator activate?" or "what is the balance of my validator?", so just providing the data structure and expecting the consumer to do the work is not, in my opinion, feasible.
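To illustrate why even this "simple" question needs node-side work, and why any answer is an estimate (per the earlier caveat about validator transitions), here is a back-of-the-envelope sketch using the phase 0 churn parameters; the queue position and active-validator count are assumed inputs that a real node would derive from its state:

```python
# Rough estimate of "when will my validator activate?" using phase 0
# constants. queue_position/active_validators are assumed inputs.
SECONDS_PER_SLOT = 12
SLOTS_PER_EPOCH = 32
MIN_PER_EPOCH_CHURN_LIMIT = 4
CHURN_LIMIT_QUOTIENT = 2**16

def estimated_activation_delay_seconds(queue_position, active_validators):
    # At most `churn` validators can be activated per epoch.
    churn = max(MIN_PER_EPOCH_CHURN_LIMIT,
                active_validators // CHURN_LIMIT_QUOTIENT)
    epochs_to_wait = queue_position // churn + 1
    return epochs_to_wait * SLOTS_PER_EPOCH * SECONDS_PER_SLOT

# e.g. 1000th in the queue with 100k active validators: ~27 hours.
print(estimated_activation_delay_seconds(1000, 100_000) / 3600)
```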
@mcdee If the consumers of that API were end users with no other option then I would agree with you. Updating data structures impacts client developers: they will have to update and test the API; otherwise, if there is no test suite, it will likely annoy consumers with software bugs impacting them as well. My point is more about sharing responsibilities between clients and tools for end users that could handle complicated use cases. Of course, developers of those tools will have to track changes in the client API and maintain its test suite as well, and so forth. But that would be their responsibility. If we don't have the resources, or for other reasons don't follow this approach, then we have to keep this tooling inside the beacon client. Probably a compromise would be to start with the basic use cases, as you've mentioned, and keep a lower-level API to allow for implementation of the others. I don't think that "What would be my revenue for this year if deposit pace and software failures stay at the same level?" is something that would fit the beacon client. User requests will highly likely go far beyond the responsibility of beacon clients.
@mkalinin On the complicated use cases, I agree that we don't want the API to provide that type of information (for all sorts of reasons). I would suggest that the example you gave wouldn't fall under my (admittedly hazy) idea of "lightweight". But then I just said that consumers would like to know when their just-sent deposit is likely to be activated, which will also take a chunk of effort to work out. Which makes me think we really need to decide on the use cases that we do want to support (in addition to the low-level data, as you say) to be able to proceed here.
Considering that the beacon node is critical infrastructure, it would make sense to not burden it any more than is absolutely necessary, for several reasons: security surface area, performance constraints, and the complexity of maintaining cross-client compatibility, as @mkalinin points out. Beacon nodes are not databases or general-purpose computation engines - that task is better left to after-market products that can cater to specific needs. To use an example, "what was my balance a month ago" is not something that the beacon node must calculate as part of its normal operation of selecting a head and maintaining a state, and therefore should really not be in the API; rather, the API should be able to provide data such that a third party may build a tool that can use the (much smaller) API supported by all client implementations. We want multiple client implementations of the critical consensus parts, not to reimplement SQL in every client.

Every API that diverges from being a task that the node has to perform to maintain its "normal" operation should therefore pass a much higher "utility" bar before being admitted into the standard, because inevitably this is where implementation differences will start to show - the data in the API, not being critical for normal operation, will eventually end up divergent and unreliable across implementations. Thus, for the first release of the API, I'd go with minimal and complete - any computed and opinionated APIs also increase the risk of the core API not being complete, because they serve as crutches that the initial tooling will come to expect.

The other thing we could be looking at would be an extensibility mechanism where clients advertise what they support: a minimal core API, as well as extensions to that, where different clients can freely experiment with risky APIs that do additional calculations. Think libp2p, but for RPC: clients advertise certain capabilities and consumers can smartly choose which parts they use - this also provides an upgrade path and a documented way to extend the API in a straightforward way, beyond the minimal and complete core.

Regarding polling, the risks are race conditions - you can guess the next slot from

Regarding streaming, one thing that stands out as difficult is handling forks/reorgs correctly - when this happens, past data becomes invalid and we should design a data feed that takes this into account, i.e. clearly signals that a rewind has happened so that third-party tooling at least can be implemented correctly - polling APIs will generally have this problem as well.
Generally, for node operation we need little beyond the latest finalized state. This ties in to there not yet being a way to synchronize state between clients, but synthesizing any state in a client may be an expensive operation - this is also why state sync is difficult: clients need to agree on which states to sync, and thus keep "stored" - 1-2 "blessed" states per weak subjectivity period have been suggested (i.e. the first state every 6 months could be a "blessed" starting point that acts as a stand-in genesis and thus is transferable) - as an example, a minimal and complete API here is to be able to transfer each block instead, from genesis. Splitting the protocol into several features like
+1
Specifically, it puts the burden on beacon node client developers, who then have to negotiate the features between multiple implementations for every iteration, or see a fragmented community. A minimal and complete API does not have this issue because it already exposes all available data in a natural way.
Hm, I have no experience with these either, though I agree that on the surface they look attractive in that they enforce a "clean" and simple one-way stream - worth exploring at least, as it looks like changing from SSE to websockets would be trivial because of the more constrained capabilities of SSE.
Proto and I have surveyed the existing client APIs and attempted to organize them into a minimal, representative set.

Design goals (copied from doc):
Open questions:
NOTE: The validator API proposal is on a separate page in the spreadsheet. There is a solid start in this repo, and probably some points to debate. Prioritizing user and debug APIs in the initial discussion.
If we want to fit REST well then we likely want to follow its design best practices; a good description is given here. "Namespace" would be a widely used term for category or group. Usually namespace names are singular; they can be nested, but in my experience a hierarchy with more than 2 layers becomes cumbersome and can be a sign of bad design. If we have a
This should also follow REST ideology. Params in the path are usually resource identifiers, for instance,
Thanks for the feedback @mkalinin. I did a pass on the spreadsheet, fitting things more into the standards you mentioned where possible. Most of the items that very cleanly map to the REST standards are outlined in black boxes. The rest of the items are more "helpers", or don't map cleanly to types/records contained in the spec and are more derivative, or are just sub-sections of a type.

A problem with mapping many of our datatypes to

For blocks (and some other types)

Curious to hear your thoughts on all of the above.
You could just name the path param "blockId" and write in the description that it can be either a root, a slot, or "head". It's easy to distinguish which kind of id it is by the type of the param.
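A sketch of that dispatch on the server side; the three lookup functions are hypothetical stand-ins for a node's internals:

```python
# Dispatch on a polymorphic "blockId" path parameter: a 0x-prefixed
# root, a slot number, or the literal "head". Lookups are hypothetical.
def resolve_block_id(block_id: str):
    if block_id == "head":
        return get_head_block()                                # hypothetical
    if block_id.startswith("0x"):
        return get_block_by_root(bytes.fromhex(block_id[2:]))  # hypothetical
    if block_id.isdigit():
        return get_block_by_slot(int(block_id))                # hypothetical
    raise ValueError(f"unrecognised block id: {block_id!r}")
```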
I don't really like '/full' as a concept. I'd prefer we have pagination by default on lists likely to be large (/validators) and have a sensible default, but allow users to pass in large page sizes if required. Generally if the query parameters to search the list are adequate, people rarely actually want a 'full' large list like that.
For me, if you have an endpoint with two possible query parameters, where only one of the two can be specified and one must be specified (like /beacon/block in LH and TEKU), then you're actually looking at a path abstraction, not a use of query parameters. This enforces the usage and makes the interface clearer to use. The validators endpoint query parameters, with their different uses for active, pagination etc., show a good use case for query parameters - where you want the first 100 validators, but only active ones to be returned. The parameters give you the flexibility to make that query happen and get only the response you require.
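A sketch of that filtered-pagination pattern from the consumer side, assuming a hypothetical /beacon/validators endpoint; the active, page_size and page_token parameter names are illustrative, not agreed:

```python
# Page through active validators; parameter and field names are
# assumptions for illustration, not an agreed endpoint shape.
import requests

def fetch_active_validators(base_url="http://localhost:5052", page_size=100):
    validators, page_token = [], None
    while True:
        params = {"active": "true", "page_size": page_size}
        if page_token:
            params["page_token"] = page_token
        resp = requests.get(f"{base_url}/beacon/validators", params=params)
        resp.raise_for_status()
        body = resp.json()
        validators.extend(body["data"])
        page_token = body.get("next_page_token")
        if not page_token:
            return validators
```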
The current proposal covers Prysm's most important user stories. Here are a few points of feedback from me:
Overall, I tentatively support this proposal for

Talk to you all tomorrow!
Duck-typing works until you have two different types with the same encoding. For example, how do we distinguish between

For example, if I call
I think this is an important point. We're different from many applications in that we're not backed by a "traditional database" (e.g., SQL) where there's a clearly defined primary key for each of our endpoints. For example, what is the primary key for a block in

If we assume there's no natural primary key for

"For filtering the dataset, we can pass various options through query params."

If we did take this approach, we'd have to consider what

Additionally, I think it would be prudent to disallow
I agree that pagination will work on endpoints like |
This is a good concrete use case, I like it. More to the point, the '123' could reference a committee index. As a general rule, this fails because we don't have the information required for the call to work in the first place; we need another number.
👍
👍
Absolutely, I think that's a good example of requiring further route context - just specifying /beacon/blocks would be bad. My preference would be a natural route first, but query parameters if that's the most logical choice. I would see using either the block hash or the slot id as a way to address this with a route.
Attestations might not be so bad? They could potentially be accessed via epoch, or maybe even slot? I guess this comes in where it's not a natural id, so it should be more of a query parameter rather than a route... Pagination can be OK with query parameters, but it does mean re-running the query constantly as a general rule, so the server-side value of using query parameters is diminished...
That happens only if you design the API poorly. In this case committees belong to the epoch resource, so the endpoint would be:
It seems to me that your solution is to prefix the value with a type, which makes sense. But following from your example, what is

Thinking a little more about the

The user wants to request:
Solution with query params:
Constraints:
Without query params:
Now consider that in the future someone makes a strong case for including

Given that our data model isn't designed like a typical "customers/products/purchases" database, query params seem to be less opinionated, less verbose and more extensible.
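To put the two addressing styles side by side for the committees example (all URLs are illustrative; the extra slot dimension is one of the committee identifiers mentioned later in this thread):

```python
# Query-param style: one endpoint, orthogonal filters; adding a new
# dimension later is non-breaking.
"/beacon/committees?epoch=1&index=123"
"/beacon/committees?epoch=1&index=123&slot=32"

# Path style: identifiers baked into the hierarchy; a new dimension
# forces a new nested route or a breaking change.
"/beacon/epochs/1/committees/123"
"/beacon/epochs/1/slots/32/committees/123"
```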
+1, along with a request to discover what API and extensions a node supports
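A minimal sketch of what that discovery could look like, assuming a hypothetical /node/capabilities endpoint; the response fields are invented for illustration:

```python
# Gate consumer features on what the node advertises. Endpoint path
# and response shape are assumptions for illustration.
import requests

def supports(base_url, extension):
    caps = requests.get(f"{base_url}/node/capabilities").json()
    # e.g. {"core": "1.0", "extensions": ["streaming", "historical-state"]}
    return extension in caps.get("extensions", [])

if supports("http://localhost:5052", "streaming"):
    print("node supports streaming endpoints")
```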
This can be handled far better in a streaming API than a polling one. A streaming API can just send an update, whereas a polling one requires the user to repeatedly look back at data (which, realistically, won't change the vast majority of the time). If a full-blown streaming API is not something that people want to sign up for, then perhaps a single endpoint that sends details of reorgs (and of normal operations that act as triggers for subsequent client operations, such as "epoch transition computed") would be a compromise that would work for more people.
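A sketch of how a consumer might handle such a feed; the event shapes and the `tracker` object (whatever state the consumer maintains) are hypothetical:

```python
# Handle a stream that interleaves normal trigger events with explicit
# reorg signals, so consumers can rewind instead of serving stale data.
def handle_event(event, tracker):
    if event["type"] == "reorg":
        # Everything after the common ancestor is now invalid.
        tracker.rewind_to(event["common_ancestor_slot"])
    elif event["type"] == "epoch_transition":
        tracker.on_epoch(event["epoch"])
    elif event["type"] == "new_head":
        tracker.on_head(event["slot"], event["block_root"])
```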
That was not my idea. We should definitely document what kinds of id we support on each endpoint. That said, the committee endpoint is a bit tricky as it doesn't have a single identifier but:
I don't think this discussion is productive; we should probably discuss this on a per-case basis in a PR/issue with user requests/implementation/description.
Totally agree! I'd argue that we do have identifiers for at least all the chain objects that we're dealing with. In most of the cases they are the roots of those objects. For instance, to identify a validator in a REST model we would use

We should probably re-consider sticking with REST in our case and look at e.g. GraphQL (however, it has its own drawbacks, like using a single endpoint and only POST requests). But let's try to give REST a chance anyway. We won't fully utilize REST, as most of our cases are covered by GET and POST (to query and to create), rarely DELETE, and we would hardly ever use PUT/PATCH.

REST advises representing every object as a resource rather than as a set of unstructured data obtained with multiple requests. The resource must be uniquely identified by its URL. If that's not the case then we have to use a search query string that narrows down our result set. IMO, REST is good for accessing chain data in most of the cases we need, because chain data has a well-defined, solid structure where each object can be uniquely identified by a URL if the URL is built well. But if a URL uniquely identifying the object can't be built, or we want to get a list of objects as a result, a query string must be used instead of attempting to abuse the URL. A good example of this is committees, which are identified by the state (seed), the slot number and the committee index. Slots and states are orthogonal and can't be sanely put into the same URL.

I agree with @mpetrunic that for simplicity we could deviate from strict REST and use different identifiers where appropriate. For instance,
I'm not sure if this is the right place for this comment, but on the eth2-api spreadsheet, it seems that the

Also, most responses wrap everything with data, but there are a couple of validator endpoints that don't, and we should probably be consistent in that regard.
@rolfyone Probably better in #37 or in the spreadsheet itself. As for your question, there is:
I think we concluded that array responses should be wrapped inside a data key, as we will probably need some metadata such as pagination, while it's better for single-resource endpoints to return the resource without the data wrapper, so it's easier to add an "ssz" response of the resource.
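That convention, sketched as Python literals with hypothetical field names:

```python
# List endpoints: wrap in "data" so metadata (e.g. pagination) can sit
# alongside. Field names are illustrative.
list_response = {
    "data": [{"index": 0, "pubkey": "0x..."}, {"index": 1, "pubkey": "0x..."}],
    "next_page_token": "CAE=",
}

# Single-resource endpoints: return the bare object, which also keeps
# an alternative SSZ-encoded response of the same resource simple.
single_response = {"index": 0, "pubkey": "0x...", "balance": 32_000_000_000}
```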
Thus far, this repo only consists of "validator" APIs. To aid application developers, we aim to converge on a set of user-level "beacon node" APIs.
Currently, Prysmatic has a number of user-level APIs defined in prysmaticlabs/ethereumapis with some decent traction with block explorers and other applications, and Lighthouse has a number of APIs that have also begun to be used by various applications (link?).
I propose that we make a list of the obvious items first (and especially any that have overlap between prysmatic and lighthouse). We can list them out, get a feel for the general categories, structure of arguments, etc and move from there. Explicit notes on how/why users are using them today will be helpful in better understanding the API in general.
After that, we can debate anything that seemed "non-obvious" or maybe client specific