Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement EOF #972

Open
wants to merge 32 commits into
base: forks/prague
Choose a base branch
from
Open

Implement EOF #972

wants to merge 32 commits into from

Conversation

gurukamath
Copy link
Collaborator

What was wrong?

The current specs do not support EOF

Related to Issue #971

How was it fixed?

Implement the 11 EOF EIPs

Cute Animal Picture

Cute Animals - 1 of 1

@gurukamath gurukamath changed the base branch from forks/prague to master July 6, 2024 10:46
Copy link
Collaborator

@SamWilsn SamWilsn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partial review.

src/ethereum/prague/vm/__init__.py Outdated Show resolved Hide resolved
src/ethereum/prague/vm/__init__.py Outdated Show resolved Hide resolved
@@ -95,6 +111,7 @@ class Evm:
error: Optional[Exception]
accessed_addresses: Set[Address]
accessed_storage_keys: Set[Tuple[Address, Bytes32]]
eof: EOF
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a descriptive field name. How about:

Suggested change
eof: EOF
container_format: Eof

src/ethereum/prague/vm/__init__.py Outdated Show resolved Hide resolved
src/ethereum/prague/vm/__init__.py Outdated Show resolved Hide resolved
header_end_index: Uint


def get_opcode(opcode: int, eof: EOF) -> Ops:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should name this more descriptively. Some (bad) ideas:

  • integer_to_op
  • parse_op
  • validate_op

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated as map_int_to_op

src/ethereum/prague/vm/eof.py Outdated Show resolved Hide resolved
pc: Uint
stack: List[U256]
memory: bytearray
code: Bytes
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we create NewType for code vs. container vs. whatever other types of sections there are?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am interpreting this as the code being implemented. If it is legacy, then it's just the code. If EOF, it is the code in the section being currently implemented.

)


def validate_header(container: bytes) -> EOFMetadata:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this separate from meta_from_valid_eof1_container?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Combined into one function with a flag to validate

raise InvalidEOF("Invalid input/output for first section")


def get_valid_instructions(code: bytes) -> Set[int]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should have a better name for this. From just the signature, it looks like it returns a uniqued list of all the opcodes used in code, but I think it returns offsets instead?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Has now been integrated into the validate_code_section function.

@gurukamath gurukamath marked this pull request as ready for review August 13, 2024 09:07
@gurukamath gurukamath changed the base branch from master to forks/prague August 15, 2024 14:13
@gurukamath
Copy link
Collaborator Author

Rebased on latest forks/prague

Copy link
Collaborator

@SamWilsn SamWilsn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partial review (5/22)

Comment on lines 929 to 966
except InvalidEof:
output = MessageCallOutput(
gas, U256(0), tuple(), set(), set(), InvalidEof(), b""
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should preserve the original exception.

Suggested change
except InvalidEof:
output = MessageCallOutput(
gas, U256(0), tuple(), set(), set(), InvalidEof(), b""
)
except InvalidEof as error:
output = MessageCallOutput(
gas, U256(0), tuple(), set(), set(), error, b""
)

Comment on lines +64 to +66
if data > MAX_ADDRESS_U256:
raise ValueError("Address is too large")
return Address(data.to_be_bytes32()[-20:])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this be more clear if we used to_be_bytes, left_pad_zero_bytes, and Address.LENGTH?

@@ -62,7 +90,7 @@ def compute_contract_address(address: Address, nonce: Uint) -> Address:
return Address(padded_address)


def compute_create2_contract_address(
def compute_contract_address_2(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps?

Suggested change
def compute_contract_address_2(
def compute_contract_address_with_salt(

return Address(data.to_be_bytes32()[-20:])


def compute_contract_address_1(address: Address, nonce: Uint) -> Address:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def compute_contract_address_1(address: Address, nonce: Uint) -> Address:
def compute_contract_address_with_nonce(address: Address, nonce: Uint) -> Address:

Perhaps?

elif isinstance(target, Address):
current_target = target
msg_data = data
code = get_account(env.state, target).code
if code_address is None:
code_address = target

if get_eof_version(code) == EofVersion.LEGACY:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is getting really nested. Is there a good way to refactor this?


if get_eof_version(code) == EofVersion.LEGACY:
eof = None
else:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a catch-all here will make this more difficult to catch if we add another EofVersion variant.

Is mypy smart enough to let us do something like:

eof_version = get_eof_version(code)
if eof_version == EofVersion.LEGACY:
    ...
elif eof_version == EofVersion.EOF1:
    ...
else:
    assert_never(eof_version)

IIRC, assert_never is available in typing_extensions.

Copy link
Collaborator

@SamWilsn SamWilsn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More review (14 / 22)

stack_height: Optional[OperandStackHeight]


SectionMetadata = Dict[Uint, InstructionMetadata]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly because I'm just curious, but why Dict[Uint, InstructionMetadata] over List[Optional[InstructionMetadata]]? Is this very hole-y data?

from ..instructions import Ops

eof: Eof
sections: Dict[Uint, SectionMetadata]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
sections: Dict[Uint, SectionMetadata]
sections: SectionMetadata

Maybe?

The current validator instance.
"""
code = validator.current_code
position = Uint(validator.current_pc)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't current_pc already a Uint?

target_type = eof_meta.type_section_contents[target_index]
target_outputs = target_type[1]

if target_outputs == 0x80:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does 0x80 come from? Should it be a constant?

current_outputs = current_section_type[1]
target_outputs = target_section_type[1]

if target_outputs != 0x80 and target_outputs > current_outputs:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this is the same 0x80, and so should be in a constant.

raise InvalidEof("Kind code not specified in header")
kind_code = container[counter]
counter += 1
if validate and kind_code != 2:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to disentangle validation from the other logic? Perhaps in two functions, maybe with a dataclass to store intermediate values?

Just seems like this function is a bit complex.

if op_metadata.stack_height.max > computed_maximum_stack_height:
computed_maximum_stack_height = op_metadata.stack_height.max

if computed_maximum_stack_height > 1023:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we make an EOF max stack height constant?

]
for index in range(len(eof_meta.container_section_contents)):
if (
index in eofcreate_references
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make some intermediate variables to clean up these conditions?

Maybe:

eofcreate_seen = index in eofcreate_references
returncontract_seen = index in returncontract_references

if eofcreate_seen and returncontract_seen:
    ...

Comment on lines +288 to +325
if (
len(container) < EOF_MAGIC_LENGTH
or container[:EOF_MAGIC_LENGTH] != EOF_MAGIC
):
raise InvalidEof("Invalid magic")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, maybe:

Suggested change
if (
len(container) < EOF_MAGIC_LENGTH
or container[:EOF_MAGIC_LENGTH] != EOF_MAGIC
):
raise InvalidEof("Invalid magic")
if len(container) < EOF_MAGIC_LENGTH:
raise InvalidEof("Too short for magic")
if container[:EOF_MAGIC_LENGTH] != EOF_MAGIC:
raise InvalidEof("Invalid magic")

Personally I find separate if blocks easier to read than a single long if condition.


# PROGRAM COUNTER
relative_offset = int.from_bytes(
evm.code[evm.pc + 1 : evm.pc + 3], "big", signed=True
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it ever possible to get an IndexError here, or is this safe because of the EOF validation?

Copy link
Collaborator

@SamWilsn SamWilsn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last bit

Comment on lines +459 to 468
if evm.eof is None and Uint(return_data_start_position) + Uint(size) > len(
evm.return_data
):
raise OutOfBoundsRead
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe?

Suggested change
if evm.eof is None and Uint(return_data_start_position) + Uint(size) > len(
evm.return_data
):
raise OutOfBoundsRead
if evm.eof is None:
if Uint(return_data_start_position) + Uint(size) > len(evm.return_data):
raise OutOfBoundsRead

charge_gas(evm, copy_gas_cost + extend_memory.cost)

# OPERATION
assert evm.eof is not None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume there's a check somewhere else that enforces datacopy is only used in EOF contracts? Should we leave a comment explaining that?

@gurukamath
Copy link
Collaborator Author

Rebased on current forks/prague and updated tests to the latest release

This commit updates to the latest test releases and fixes some minor bugs exposed by the increased coverage.
In particular, the bugs related to faulty return flag in the type section are fixed. Any code section that has
RETF instruction or has a JUMPF to a returning section is considered a returning section. Otherwise, the code section
is considered non-returning. The output in the type section should reflect this behaviour. If not, the container is
invalid
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants