Validated EVM Contracts #2348

Closed
wants to merge 12 commits into from
273 changes: 273 additions & 0 deletions EIPS/eip-2348.md
@@ -0,0 +1,273 @@
---
eip: 2348
title: Validated EVM Contracts
author: Danno Ferrin (@shemnon)
discussions-to: https://ethereum-magicians.org/t/eip-2348-validated-evm-contracts/3756
status: Draft
type: Standards Track
category: Core
created: 2019-11-01
requires: 1702, 2327
---

## Simple Summary

Make minor changes to EVM contract layout and add validation rules to a subset of those contracts.

## Abstract

A set of contract markers and validation rules relating to those markers is proposed. These
validation rules enable forwards compatible evolution of EVM contracts and provide some assurances
to Ethereum clients allowing them to disable some runtime verification steps by moving these
validations to the deployment phase.

## Motivation

There are two major motivations: first, the need to make the EVM easier to evolve, and second, the
need to provide validations that allow clients to optimize their EVM execution.

> Review comment (Contributor): Two major and one minor?

First there is the issue of an evolvable EVM. In the current state of EVM contracts, literally any
sequence of bytes can be deployed to the blockchain. Some tools take advantage of this and append
metadata to the end of their contract deployment. The real impact is that this precludes the
addition of new multi-byte instructions (such as the `PUSHn` series), because the immediate bytes
of a new instruction could hide a previously valid `JUMPDEST` when the code is evaluated under the
new opcode set. To prevent this, account versioning will be used so that contracts can be deployed
in a way that is demonstrably validated.

Second there is the issue of improving runtime execution. One example is `JUMPDEST` evaluation.
Because each jump must "land" on a `JUMPDEST`, each client needs to validate that the destination
is a valid opcode location. Clients either need to do the analysis and store the results or
re-evaluate the contract on each execution. Stronger deployment validation will allow clients to
presume that jumps are valid in certain circumstances.

A tertiary motivation is to prepare the way for easily JITable contracts. While the current EVM can
be JIT compiled, certain analyses must be performed to detect and then either reject or accommodate
pathological or uncompilable cases. With stricter rules these cases can be detected at deploy time
and rejected, allowing EVM clients to make stronger assumptions about the contract being compiled.

## Specification

There are three interlocking portions specified in this EIP and two portions from other active EIPs
included in this validation. [EIP-1702] (Generalized Account Versioning Scheme) and [EIP-2327]
(`BEGINDATA` opcode) are specified in their published locations. The portions specified in this EIP
are a versioning header (similar to what was in [EIP-1707]), invalid opcode validation (similar to
[EIP-1712]), and static jump analysis.

### EVM Account Versioning

Starting at `BLOCKNUM` (TBD) `EIP-1702` will be activated, `LATEST_VERSION` will be set to `1`, and
all new and updated accounts will have the account version `1`. The validation phase will apply the
rules described in the Version Header, `BEGINDATA`, Invalid Opcode Validation, and Static Jump
Validations sections.

These EIP sections apply to contracts stored, or in the process of being stored, in accounts with
version `1`. This EIP never applies to contracts stored or in the process of being stored in
accounts at version `0`. For initcode being executed for `CREATE` and `CREATE2` operations, this
EIP applies if the contract invoking the opcode is version `1`. If the calling contract was stored
in an account with version `0`, this EIP does not apply.

> Review comment (Contributor): in in accounts => in accounts

Future EIPs may increase the set of contract versions this EIP applies to.

### Version Header

For contracts whose first byte is not `0xef`, or whose total length is less than 4 bytes, the
contract is treated exactly as though it had been deployed to an account with version `0`. For
these contracts none of the other subsections in this EIP apply.

When deploying a contract, if the contract starts with `0xef` and has a length of 4 bytes or
greater, the first four bytes form a version header. If a version header is not recognized by the
EVM, the contract deployment transaction fails with out-of-gas.

> Review comment (Contributor): later => larger, greater

When executing a contract with a header the execution should start at `PC=4`, corresponding to the
first byte of the contract that is not part of the header.

EVM implementations could model this as a 4 byte no-op no-gas operation that can only occur at the
zeroth index of a contract. However they would need to take care that the byte `0xef` would be
invalid if it occurred in the code segment at any location other than the zeroth byte.

For this EIP the header byte sequence [`0xef`, `0x65`, `0x76`, `0x6d`] (corresponding to the
ISO/IEC 8859 part 1 string `'ïevm'`) is defined. This header indicates that the validations in the
following sections are applied to the content of the contract, keeping all other semantics of the
current "version 0" EVM contracts, including the same gas schedule.

Future EIPs may expand on the valid set of headers. No other header sequences are defined in this
EIP.
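
The header rules above can be sketched as a small classification function. This is an illustrative
sketch only, not client code: the names `start_pc`, `KNOWN_HEADERS`, and `UnrecognizedHeader` are
hypothetical, and the exception merely stands in for the out-of-gas failure described in the text.

```python
EVM_HEADER = bytes([0xEF, 0x65, 0x76, 0x6D])  # 'ïevm' in ISO/IEC 8859-1
KNOWN_HEADERS = {EVM_HEADER}  # the only header sequence defined by this EIP


class UnrecognizedHeader(Exception):
    """Stand-in for the deployment failure (out-of-gas) described in the text."""


def start_pc(contract: bytes) -> int:
    """Return the PC at which execution of `contract` would begin.

    0 means the contract is treated like a version 0 contract (no header);
    4 means a recognized four-byte header is present and is skipped.
    """
    if len(contract) < 4 or contract[0] != 0xEF:
        return 0  # no header: none of the validation sections apply
    if contract[:4] not in KNOWN_HEADERS:
        raise UnrecognizedHeader(contract[:4].hex())
    return 4
```

For example, `start_pc(bytes.fromhex("ef65766d00"))` returns `4`, while a three-byte contract that
happens to start with `0xef` is too short to carry a header and returns `0`.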

### `BEGINDATA` operation

As described in [EIP-2327] a new opcode `BEGINDATA` (`0xb6`) is added that indicates the remainder
of the contract should not be considered executable code.

If the EVM attempts to execute the `BEGINDATA` operation it should be treated as attempting to
execute an invalid operation. Similarly jumping into any location after the `BEGINDATA` operation is
an invalid operation, even if the byte jumped to corresponds to the `JUMPDEST` opcode.

### Code Segment Size Limit

With the introduction of the `BEGINDATA` opcode the contract can now be conceptually split into a
code segment and a data segment. The code segment corresponds to all the bytes prior to and
including the `BEGINDATA` opcode, or the entire contract if no `BEGINDATA` opcode is present. All
other data after the code segment is referred to as the data segment. If there is no `BEGINDATA`
operation there are no bytes in the data segment.

In [EIP 170](https://eips.ethereum.org/EIPS/eip-170) a contract code size limit was introduced. The
size of the code segment, including the header bytes and the `BEGINDATA` operation (if present),
must be less than or equal to the chain's specified contract code size limit, which is currently
24 KiB for mainnet.

For contract creation transactions and the return values of `CREATE` and `CREATE2` operations, this
limit is already enforced for the entire size of the contract, including code and data segments.
For the initialization code of a `CREATE` or `CREATE2` operation there is no specified limit, so
the code segment length will need to be enforced separately in those instances. The combined code
and data segment size for init code in `CREATE` and `CREATE2` operations is out of scope for this
EIP.
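
As a rough illustration of the segment split and the size rule, the sketch below separates a
contract at its first `BEGINDATA` and checks only the code segment against the limit. The function
names are hypothetical. Two assumptions are made that the text does not spell out here: a `0xb6`
byte inside `PUSHn` immediate data is not treated as a marker, and scanning starts after the
four-byte header when the first byte is `0xef`.

```python
from typing import Tuple

BEGINDATA = 0xB6
CODE_SIZE_LIMIT = 24 * 1024  # EIP-170 limit on mainnet, 24 KiB


def split_segments(contract: bytes) -> Tuple[bytes, bytes]:
    """Split `contract` into (code_segment, data_segment).

    The code segment runs up to and including the first BEGINDATA opcode,
    or covers the whole contract when no BEGINDATA is present.
    """
    # Assumption: skip a four-byte header if the contract appears to have one.
    pc = 4 if len(contract) >= 4 and contract[0] == 0xEF else 0
    while pc < len(contract):
        op = contract[pc]
        if op == BEGINDATA:
            return contract[: pc + 1], contract[pc + 1:]
        # PUSH1..PUSH32 (0x60..0x7f) carry 1..32 bytes of immediate data.
        pc += 1 + (op - 0x5F if 0x60 <= op <= 0x7F else 0)
    return contract, b""


def code_segment_within_limit(contract: bytes, limit: int = CODE_SIZE_LIMIT) -> bool:
    """Check only the code segment (header and BEGINDATA included) against `limit`."""
    code, _data = split_segments(contract)
    return len(code) <= limit
```

Under these assumptions a contract with a short code segment and a large data segment passes the
code segment check even when its total size exceeds the limit, which is the init code situation
described above.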

### Invalid Opcode Validation

All data between the Version Header and either the `BEGINDATA` marker or the end of the contract if
`BEGINDATA` is not present must represent a valid EVM program at all points of the data. Invalid
opcode validation consists of the following process:

- Iterate over the code bytes starting after the header bytes one by one.
- If the code byte is a multi-byte operation, skip the appropriate number of bytes and continue.
- If the code byte is a valid opcode or the designated invalid instruction (`0xfe`), continue.
- If the code byte is the `BEGINDATA` operation (`0xb6`) stop iterating and consider the contract
valid.
- If more bytes than the contract code size limit would be validated the contract is invalid and
the operation fails.
- Otherwise, the contract is invalid and the operation fails.

As of the Istanbul upgrade all of the multi-byte operations are the `PUSHn` series of operations
from `0x60` to `0x7f`. Future upgrades may add more multi-byte operations.
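
The iteration described above might look like the following sketch. It is not a reference
implementation: `validate_opcodes` and `demo_is_assigned` are hypothetical names, the opcode table
is supplied by the caller, and the demo table is deliberately tiny rather than a real opcode set.
The sketch assumes the contract already carries the four-byte header, and that a `PUSHn` whose
immediate data runs past the end of the contract is tolerated, which the text does not address.

```python
from typing import Callable

BEGINDATA = 0xB6
INVALID = 0xFE  # designated invalid instruction, still permitted


def validate_opcodes(
    contract: bytes,
    is_assigned: Callable[[int], bool],
    code_size_limit: int = 24 * 1024,
) -> bool:
    """Return True if the code segment contains no unassigned opcodes."""
    pc = 4  # assumption: bytes 0..3 are the version header
    while pc < len(contract):
        if pc >= code_size_limit:
            return False  # more bytes than the limit would be validated
        op = contract[pc]
        if 0x60 <= op <= 0x7F:
            pc += 1 + (op - 0x5F)  # skip PUSHn and its n immediate bytes
        elif op == BEGINDATA:
            return True  # everything after this marker is data
        elif op == INVALID or is_assigned(op):
            pc += 1
        else:
            return False
    # No BEGINDATA: the whole contract is the code segment.
    return len(contract) <= code_size_limit


def demo_is_assigned(op: int) -> bool:
    """Illustrative table only: STOP, ADD, JUMP, JUMPI, JUMPDEST."""
    return op in {0x00, 0x01, 0x56, 0x57, 0x5B}
```

With the demo table, a header followed by `PUSH1 0x0c` and `STOP` validates because `0x0c` is push
data, while a header followed by a bare `0x0c` byte is rejected.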

As of the Istanbul upgrade, which assigned `0x46` (`CHAINID`) and `0x47` (`SELFBALANCE`), the
invalid opcodes are `0x0c` to `0x0f`, `0x1e`, `0x1f`, `0x21` to `0x2f`, `0x48` to `0x4f`, `0x5c` to
`0x5f`, `0xa5` to `0xef`, `0xf6` to `0xf9`, `0xfb`, `0xfc`, and `0xfe`. Future upgrades may remove
items from this list. Note that `0xb6` is referenced in this spec as the `BEGINDATA` marker, but is
not part of any deployed upgrade. Also note that `0xfe` would remain as a reserved 'invalid
instruction' that will still be permitted.

### Static Jump Validations

For every jump operation preceded by a `PUSHn` instruction, the value pushed onto the stack by the
`PUSHn` operation must point to a valid `JUMPDEST` operation. If this validation fails then the
contract creation fails with out-of-gas.

> Review comment (Contributor): What about jump operations not preceded by a PUSHn instruction?
>
> Reply: In that case we have to use data flow analysis to determine if the argument to JUMP is a
> constant specified by a PUSHn. I assume validation is a one time thing (before deploying the
> contract?) so building a data flow graph does not seem to be too expensive.
>
> BTW, compilers only generate PUSH2 in this case because ... there is a size limit to the smart
> contract.
>
> Reply (Contributor): I would say we should not use this relatively weak heuristic as part of new
> validation rules. It is better to implement subroutines (which eliminates the most common source
> of dynamic jumps, the return from a subroutine), and then the actual static jumps, and disable
> the dynamic jumps altogether. Then we can remove JUMPDEST.
>
> Reply: Using subroutines is way better for validating the contract, but it is not infeasible to
> validate a contract without static jumps. Symbolic execution is still able to figure out which
> jump is dynamic and hence report it.

As of the Istanbul upgrade the jump operations are `JUMP` (`0x56`) and `JUMPI` (`0x57`). Future
upgrades may add more jump operations.

As a client optimization this check may be performed during invalid opcode validation, or it may be
performed separately at contract deployment time.
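
A minimal sketch of the static jump check follows, covering only the `PUSHn`-then-jump pattern the
section describes (jumps whose target comes from anywhere else are left unchecked). The name
`static_jumps_valid` is hypothetical, and the sketch assumes that jump targets are absolute offsets
counted from byte zero of the contract, header included, which matches execution starting at `PC=4`
but is not stated explicitly in the text.

```python
BEGINDATA, JUMP, JUMPI, JUMPDEST = 0xB6, 0x56, 0x57, 0x5B


def static_jumps_valid(contract: bytes, start: int = 4) -> bool:
    """Check every `PUSHn; JUMP` / `PUSHn; JUMPI` pair in the code segment."""
    jumpdests = set()
    pushes = []  # (pc of the instruction following the push, pushed value)
    pc = start
    while pc < len(contract):
        op = contract[pc]
        if op == BEGINDATA:
            break  # bytes after the marker are data, never jump targets
        if op == JUMPDEST:
            jumpdests.add(pc)
        if 0x60 <= op <= 0x7F:
            n = op - 0x5F  # number of immediate bytes
            value = int.from_bytes(contract[pc + 1: pc + 1 + n], "big")
            pushes.append((pc + 1 + n, value))
            pc += 1 + n
        else:
            pc += 1
    code_end = min(pc, len(contract))  # index of BEGINDATA, or end of contract
    for next_pc, target in pushes:
        if next_pc < code_end and contract[next_pc] in (JUMP, JUMPI):
            if target not in jumpdests:
                return False
    return True
```

Because `jumpdests` is collected in the same pass that skips push data and stops at `BEGINDATA`, a
`0x5b` byte inside an immediate or inside the data segment is never accepted as a target.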

## Rationale

The choice of `0xef` as the first byte of the header was first recommended in
[issue 154](https://github.com/ethereum/EIPs/issues/154) of the EIP repository. It also maps to an
unused opcode in the version 0 spec and packs next to the `0xf0` series of call instructions, and
the `evm` part mirrors what WASM has done. `0x00` was not chosen as the first byte because the
header could then be confused with a nonsensical, but correct, contract that starts with `STOP`
followed by `PUSH6` (`0x65`) if a lowercase `e` was selected, or `STOP` `GASLIMIT` `JUMP`
`<invalid 0x4d>` if capital letters were used. A header that was always invalid in the prior EVM
specs was seen as desirable.

The first major validation is the invalid opcode removal. Consider a contract that contains an
invalid opcode followed by a `JUMPDEST` marker. If that opcode later becomes a multi-byte opcode,
the contract would become invalid after the upgrade because the destination marker would become
part of the new multi-byte instruction, as described in the [EIP-663 discussion]. If no invalid
opcodes can be deployed then the possibility of the `JUMPDEST` being absorbed by new multi-byte
instructions is eliminated.
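
To make the absorption problem concrete, the sketch below runs a jump destination scan twice over
the same three bytes. The opcode `0x0c` is unassigned, and giving it one immediate byte is purely a
hypothetical future upgrade invented for this illustration; the function name and the
`extra_immediates` parameter are likewise assumptions.

```python
from typing import Dict, Set

JUMPDEST = 0x5B


def jumpdests(code: bytes, extra_immediates: Dict[int, int]) -> Set[int]:
    """Collect valid JUMPDEST offsets.

    `extra_immediates` maps hypothetical new multi-byte opcodes to the
    number of immediate bytes that follow them.
    """
    found = set()
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op == JUMPDEST:
            found.add(pc)
        if 0x60 <= op <= 0x7F:
            pc += 1 + (op - 0x5F)  # PUSH1..PUSH32
        else:
            pc += 1 + extra_immediates.get(op, 0)
    return found


legacy = bytes([0x0C, 0x5B, 0x00])  # unassigned byte, JUMPDEST, STOP
before_upgrade = jumpdests(legacy, {})        # {1}
after_upgrade = jumpdests(legacy, {0x0C: 1})  # set(): the JUMPDEST is now immediate data
```

The same deployed bytes lose their only jump destination once `0x0c` starts consuming the byte
after it, which is the situation the invalid opcode validation is meant to rule out.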

One complication is that current versions of Solidity append the swarm hash of the source code of
the contract, in some instances, to the end of the generated EVM bytecode. That is what motivated
the addition of the `BEGINDATA` opcode. Solidity can add a fairly simple wrapper function to its
existing EVM generation. This option was chosen for its simplicity over other options such as
encoding the data in uncalled `PUSHn` instructions.

`JUMPDEST` validation is present to eliminate repeated validation calls for contracts and to reduce
the data storage required for cached validation. For example, if a client notices that a contract
contains only static jumps it could store a cached validation flag indicating that no jump analysis
needs to be performed. Alternatively, it could defer the analysis until the first dynamic jump is
encountered.

## Backwards Compatibility

Almost all existing contract deployments will be able to be deployed with no client changes. The one
exception is contract deployments that start with `0x00`. This should have no impact on existing
contract execution because any contract with a `0x00` in the first position would immediately halt,
since `0x00` maps to the `STOP` instruction; the utility and value of those contracts is minimal
at best. If this is not desirable, a different header signaling byte that does not map to an
existing opcode (such as `0xef`) can be used.

Except for the validation rules and versioning header all other semantics of the EVM are the same.
Gas schedules and opcode tables would be the same between account versions and whether or not the
contract was deployed with headers. Future EIPs may add opcodes that are only valid with a contract
that is deployed with a version header. Because of the version header validation rules multi-byte
contracts can be deployed.

Existing compilers (such as Solidity) can provide support for headers by prepending `0xef`, `0x65`,
`0x76`, `0x6d` to their output stream and inserting `0xb6` before any non-code data added as part
of the contract.
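
The wrapping step could be as small as the following sketch. The function name is hypothetical, and
the sketch does not relocate jump targets: if program counter values count the four header bytes, a
compiler would also have to shift every jump destination by four, which is outside what this
fragment shows.

```python
HEADER = bytes([0xEF, 0x65, 0x76, 0x6D])
BEGINDATA = 0xB6


def wrap_with_header(code: bytes, metadata: bytes = b"") -> bytes:
    """Prepend the version header and fence off trailing non-code data."""
    wrapped = HEADER + code
    if metadata:
        # BEGINDATA marks everything after it as non-executable data.
        wrapped += bytes([BEGINDATA]) + metadata
    return wrapped
```

For instance, wrapping a single `STOP` byte with two bytes of metadata yields the header, `0x00`,
`0xb6`, and then the metadata unchanged.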

## Forwards Compatibility

This spec provides forward compatibility in at least two ways.

First, the sets of multi-byte operations and validated jump operations can be extended in future
upgrades. Contracts that would be valid under new rules would be rejected under old rules, and all
older contracts would still be valid under the new rules. Any newly deployed opcodes would be
disabled unless the code is appropriately validated.

Second, the versioning header can be extended to allow for stricter validations in future upgrades
while keeping the EVM evaluation semantics the same. Such possible stricter validations could
include prohibiting dynamic jumps.

## Test Cases

This is an incomplete list, but provides insight as to the scope of the required testing. Each test
would need to be written 3 times, once for normal contract deployment, once for `CREATE`, and once
again for `CREATE2`.

- Positive
  - no header and invalid opcodes
    - including the case where a `JUMPDEST` gets consumed by a proposed multi-byte operation
  - no header and all valid opcodes
    - includes static jump to invalid destination
  - header and all valid opcodes
    - includes static jump to valid destination
  - header, all valid opcodes, and `BEGINDATA`
  - header, all valid opcodes, `BEGINDATA`, and invalid opcodes in data
  - three byte program, starts with zero
  - four byte program, header only
  - header and begin data only
  - validated code in `CREATE` and `CREATE2` init code with proper code segment size and total size
    greater than the code segment limit
- Negative
  - contract with otherwise valid program that starts with zero, 5 bytes or more
  - contract with header and invalid opcodes
  - contract with header, begin data, and invalid opcodes in the middle
  - contract with header, and static jump to bad place
  - contract with unrecognized header
  - contract with a static jump into code in `BEGINDATA`
  - contract with a static jump outside of all data
  - header, and contract code+header too large by less than 4 bytes
  - header, and contract code+header too large by more than 4 bytes
  - header, contract code, begin data, data, and the whole thing is too large
  - one test for each invalid opcode: no header, with header, and with header and `BEGINDATA`
  - code segment size violations
    - In a contract creation transaction
    - In `CREATE` and `CREATE2` init code
    - In `CREATE` and `CREATE2` created contracts

## Implementation

No implementation yet.

## Copyright

Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).

[eip-615]: https://eips.ethereum.org/EIPS/eip-615
[eip-1702]: https://eips.ethereum.org/EIPS/eip-1702
[eip-1707]: https://github.com/ethereum/EIPs/pull/1707
[eip-1712]: https://github.com/ethereum/EIPs/pull/1712
[eip-2327]: https://github.com/ethereum/EIPs/pull/2327
[eip-663 discussion]:
https://ethereum-magicians.org/t/eip-663-unlimited-swap-and-dup-instructions/3346/11?u=shemnon