Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Harden against IO failures using DB atomicity #1180

Closed
wants to merge 2 commits into from

Conversation

adaszko
Copy link
Contributor

@adaszko adaszko commented May 20, 2020

Issue Addressed

Partially addresses #692

@paulhauner
Copy link
Member

I'm seeing a test failure on CI that looks like it could be related to changes in this PR:

test finalizes_after_resuming_from_db ... FAILED

failures:

---- finalizes_after_resuming_from_db stdout ----
thread 'finalizes_after_resuming_from_db' panicked at 'should not error during attestation processing: UnknownHeadBlock { beacon_block_root: 0x230dcb3f606e112424eb6c2c04d1f59719dfcde3d2c6fd349008685d2dce2e6e }', /home/runner/work/lighthouse/lighthouse/beacon_node/beacon_chain/src/test_utils.rs:577:21
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

I'll flag this as WIP and hold off on a full review until we've sorted that out :)

@paulhauner paulhauner added the work-in-progress PR is a work-in-progress label May 21, 2020
Comment on lines +165 to +173
StoreOp::PutState(state_hash, state) => {
let untyped_hash: Hash256 = (*state_hash).into();
let key = Self::get_key_for_col(
DBColumn::BeaconState.into(),
untyped_hash.as_bytes(),
);
let value = StorageContainer::new(state).as_ssz_bytes();
leveldb_batch.put(key, &value);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

State storage needs to take into account the HotStateSummary here, as in the non-atomic case:

/// Store a post-finalization state efficiently in the hot database.
///
/// On an epoch boundary, store a full state. On an intermediate slot, store
/// just a backpointer to the nearest epoch boundary.
pub fn store_hot_state(
&self,
state_root: &Hash256,
state: &BeaconState<E>,
) -> Result<(), Error> {
// On the epoch boundary, store the full state.
if state.slot % E::slots_per_epoch() == 0 {
trace!(
self.log,
"Storing full state on epoch boundary";
"slot" => state.slot.as_u64(),
"state_root" => format!("{:?}", state_root)
);
store_full_state(&self.hot_db, state_root, &state)?;
}
// Store a summary of the state.
// We store one even for the epoch boundary states, as we may need their slots
// when doing a look up by state root.
self.put_state_summary(state_root, HotStateSummary::new(state_root, state)?)?;
Ok(())
}

The compositionality of the HotColdDB consisting of two sub-stores (hot_db and cold_db) with the same API is a lie, and is another reason why the Store trait should be refactored IMO. I don't think Store should be implemented for LevelDB, because we never use just one LevelDB database as the store any more, and it creates awkwardness when a Store API has to be implemented on the underlying store with assumptions about that API only getting used for the hot or cold DB (in this case we're assuming hot). I think it makes sense to put all this logic in the parent impl of do_atomically (i.e. HotColdDB), and stub out the impl for LevelDB so that it returns an error.

pub enum StoreOp {
pub enum StoreOp<'a, E: EthSpec> {
PutBlock(SignedBeaconBlockHash, SignedBeaconBlock<E>),
PutState(BeaconStateHash, &'a BeaconState<E>),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm on board with using a reference here, but with state summaries existing it might make sense to mix in the two variants from StateBatch:

// Allow owned states for the storage of intermediate states in block processing (uncommon)
PutState(BeaconStateHash, Cow<'a, BeaconState<E>>),
PutStateSummary(BeaconStateHash, HotStateSummary),

See https://github.com/sigp/lighthouse/blob/master/beacon_node/store/src/state_batch.rs

StateBatch could then be deleted of course.

@adaszko
Copy link
Contributor Author

adaszko commented May 21, 2020

I'm seeing a test failure on CI that looks like it could be related to changes in this PR:

test finalizes_after_resuming_from_db ... FAILED

failures:

---- finalizes_after_resuming_from_db stdout ----
thread 'finalizes_after_resuming_from_db' panicked at 'should not error during attestation processing: UnknownHeadBlock { beacon_block_root: 0x230dcb3f606e112424eb6c2c04d1f59719dfcde3d2c6fd349008685d2dce2e6e }', /home/runner/work/lighthouse/lighthouse/beacon_node/beacon_chain/src/test_utils.rs:577:21
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

I'll flag this as WIP and hold off on a full review until we've sorted that out :)

That's related to my GitHub Actions test jobs not being triggered automatically.

I thought this PR is going to be a quick improvement but I see now it's way deeper and gonna involve more time.

@paulhauner
Copy link
Member

paulhauner commented May 22, 2020

That's related to my GitHub Actions test jobs not being triggered automatically.

Oh so this is an old test result?

EDIT: Oh I remember this now. Hopefully it changes now you're a contributor.

@adaszko adaszko closed this May 22, 2020
@michaelsproul
Copy link
Member

That's related to my GitHub Actions test jobs not being triggered automatically.

I think Github Actions were working fine here, right? CI ran against the latest commit from the atomicity branch (7bf1629) and failed because of the reasons raised in my review.

@adaszko
Copy link
Contributor Author

adaszko commented May 25, 2020

That's related to my GitHub Actions test jobs not being triggered automatically.

I think Github Actions were working fine here, right? CI ran against the latest commit from the atomicity branch (7bf1629) and failed because of the reasons raised in my review.

On the pull request, yes. On my fork, no.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
work-in-progress PR is a work-in-progress
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants