Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: dump state to s3 by multiple nodes #9049

Merged
merged 27 commits into from
May 18, 2023
Merged

Conversation

ppca
Copy link
Contributor

@ppca ppca commented May 11, 2023

To enable dumping with multiple nodes, do the following:

  1. add
"state_sync": {
    "dump": {
      "location": {
        "S3": {
          "bucket":"state-parts-xiangyi",
          "region":"us-west-1"
        },
      },
    }
}

to ~/.near/config.json on your node;
2. run AWS_ACCESS_KEY_ID="xxxx" AWS_SECRET_ACCESS_KEY="xxx" ./target/release/neard run

How the multi-node dump work?
Each node will have one thread per shard, and will check s3 for missing parts per shard for the epoch it's dumping every 60 seconds. After checking missing parts, it will draw one part_id without replacement from the missing parts, generate and upload to s3 repeatedly until time's up. Then the thread repeat the process until all parts of the specific shard and epoch is dumped. Then the thread switch to the latest epoch that their chain head is in.
The nodes that are dumping state do not need to communicate to each other, they use s3 to check what's dumped and what's not.
The frequency of checking s3 for missing parts will be a trade off between repeated dumps and s3 latency: the more often the check, the less repeats happen, but the more s3 check latency hurts the speed of dump. May end up using a frequency different than 60s.

@ppca ppca marked this pull request as ready for review May 11, 2023 21:04
@ppca ppca requested a review from a team as a code owner May 11, 2023 21:04
@ppca ppca requested review from aborg-dev and nikurt and removed request for aborg-dev May 11, 2023 21:04
@walnut-the-cat
Copy link
Contributor

when we roll out this, can we somehow include some baseline template to config file to give users some hint on how to enable the feature?

@walnut-the-cat
Copy link
Contributor

The nodes that are dumping state do not need to communicate to each other, they use s3 to check what's dumped and what's not.

Is it common for multiple nodes to end up trying to upload the same part?

chain/client/src/sync/state.rs Outdated Show resolved Hide resolved
chain/client/src/sync/state.rs Outdated Show resolved Hide resolved
chain/client/src/sync/state.rs Outdated Show resolved Hide resolved
chain/client/src/sync/state.rs Outdated Show resolved Hide resolved
chain/client/src/sync/state.rs Outdated Show resolved Hide resolved
nearcore/src/state_sync.rs Outdated Show resolved Hide resolved
let mut existing_nums = HashSet::new();
for name in file_names {
let splitted: Vec<_> = name.split("_").collect();
let part_id = splitted.get(2).unwrap().to_string().parse::<u64>()?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move this to a function. String manipulation is error-prone, and moving it to a separate function:

  1. abstracts away the complexity
  2. lets us unit-test the complexity

nearcore/src/state_sync.rs Outdated Show resolved Hide resolved
nearcore/src/state_sync.rs Outdated Show resolved Hide resolved
nearcore/src/state_sync.rs Outdated Show resolved Hide resolved
@ppca
Copy link
Contributor Author

ppca commented May 15, 2023

when we roll out this, can we somehow include some baseline template to config file to give users some hint on how to enable the feature?

For now, we don't expect users to dump state yet, we will start expecting that once the decentralized version is ready.

@ppca
Copy link
Contributor Author

ppca commented May 15, 2023

The nodes that are dumping state do not need to communicate to each other, they use s3 to check what's dumped and what's not.

Is it common for multiple nodes to end up trying to upload the same part?

this will happen from time to time, we expect this to be less common at the start of dump of an epoch, and become more frequent as number of parts left to dump for an epoch decreases.

@ppca ppca requested a review from nikurt May 15, 2023 23:05
chain/client/src/sync/state.rs Outdated Show resolved Hide resolved
nearcore/src/state_sync.rs Outdated Show resolved Hide resolved
nearcore/src/state_sync.rs Outdated Show resolved Hide resolved
@@ -446,31 +447,43 @@ trait StatePartReader {
fn get_state_part_reader(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file can be refactored to remove StatePartReader and StatePartWriter, but it's fine to do it in a followup PR.

async fn state_sync_dump(
shard_id: ShardId,
chain: Chain,
epoch_manager: Arc<dyn EpochManagerAdapter>,
shard_tracker: ShardTracker,
runtime: Arc<dyn RuntimeAdapter>,
chain_id: String,
restart_dump_for_shards: Vec<ShardId>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please consider removing restart_dump_for_shards from ClientConfig and from Config (i.e. NearConfig::config)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on second thought, restart is still handy to have. In situations where latest epoch is marked as alldumped by error, unless we wipe out data from rocksdb, the node will never dump for the epoch again, restart_dump_for_shards solves this issue.

nearcore/src/state_sync.rs Outdated Show resolved Hide resolved
parts_to_dump.len().try_into().unwrap();
let mut cnt_parts_dumped = num_parts - cnt_parts_to_dump;
let timer = Instant::now();
while timer.elapsed().as_secs() <= 60 && !parts_to_dump.is_empty() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please define a constant instead of magic value 60.

nearcore/src/state_sync.rs Outdated Show resolved Hide resolved
nearcore/src/state_sync.rs Show resolved Hide resolved
nearcore/src/state_sync.rs Show resolved Hide resolved
@ppca ppca requested a review from nikurt May 16, 2023 21:08
Copy link
Contributor

@nikurt nikurt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thank you!

@ppca ppca force-pushed the xiangyi/multi_node_state_dump branch from 56f5a85 to 5051015 Compare May 18, 2023 20:50
@ppca ppca merged commit d331000 into master May 18, 2023
@ppca ppca deleted the xiangyi/multi_node_state_dump branch May 18, 2023 21:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants