
Replace multipart download with parallel file download #10519

Merged

Conversation

andrross
Member

There are a few open issues with the multi-stream download approach:

  • Recovery stats are not being reported correctly
  • It is incompatible (short of reopening and re-reading the entire file) with the existing Lucene checksum validation logic
  • There are some issues with integrating it with the pending client side encryption work

Given this, I attempted an experiment where I replaced the multi-stream-within-a-single-file approach with simply parallelizing downloads across files (this is how snapshot restore works). I actually got better results with this approach: recovering a ~52GiB shard took about 4.7 minutes with the multi-stream code versus 3.9 minutes with the parallel file approach (r7g.4xlarge EC2 instance, 500MiB/s EBS volume, S3 as remote repository).

I think this is the right approach as it leverages the more battle-tested code path and addresses the three issues listed above. The multi-stream approach still has promise because it will let us download very large files faster (with the per-file approach, a single huge file can become the long pole of the transfer operation). However, given that 5GB segments (made up of multiple files in practice) are the norm, we generally aren't dealing with huge files.
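To make the per-file parallelism idea concrete, here is a minimal sketch of downloading each remote file on its own worker thread. This is not the actual OpenSearch implementation: the RemoteBlobStore interface, the ParallelFileDownloader class, and the fixed pool size of 8 are illustrative assumptions standing in for the real repository abstractions and recovery settings.

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

public class ParallelFileDownloader {

    // Hypothetical abstraction over the remote repository (e.g. S3);
    // stands in for the real blob-store API, which differs.
    public interface RemoteBlobStore {
        InputStream readBlob(String fileName) throws IOException;
    }

    // Illustrative fixed pool size; a real implementation would bound
    // concurrency with a recovery/transfer setting.
    private final ExecutorService executor = Executors.newFixedThreadPool(8);

    public void downloadAll(RemoteBlobStore store, List<String> fileNames, Path localDir) {
        // One task per file: each file is streamed sequentially end-to-end.
        List<CompletableFuture<Void>> futures = fileNames.stream()
            .map(name -> CompletableFuture.runAsync(() -> {
                try (InputStream in = store.readBlob(name)) {
                    Files.copy(in, localDir.resolve(name), StandardCopyOption.REPLACE_EXISTING);
                } catch (IOException e) {
                    throw new RuntimeException("Failed to download " + name, e);
                }
            }, executor))
            .collect(Collectors.toList());

        // Wait for every file; any per-file failure propagates here.
        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
    }
}

Because each file is still read sequentially from start to finish, per-file checksum validation and recovery-stats accounting can run on the stream unchanged, which is the property the single-file multi-stream approach breaks.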

Related Issues

Resolves #10516

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions
Contributor

github-actions bot commented Oct 10, 2023

Compatibility status:

Checks if related components are compatible with change d4cca9a

Incompatible components

Incompatible components: [https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/asynchronous-search.git]

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/reporting.git]

@github-actions
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.search.SearchWeightedRoutingIT.testMultiGetWithNetworkDisruption_FailOpenEnabled

@andrross andrross added backport 2.x Backport to 2.x branch backport 2.11 labels Oct 11, 2023
@andrross andrross merged commit 00ccfc4 into opensearch-project:main Oct 11, 2023
22 checks passed
@opensearch-trigger-bot
Contributor

The backport to 2.11 failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/OpenSearch/backport-2.11 2.11
# Navigate to the new working tree
pushd ../.worktrees/OpenSearch/backport-2.11
# Create a new branch
git switch --create backport/backport-10519-to-2.11
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 00ccfc49f3aa2bf06ddc22ff8f0bc24790b222da
# Push it to GitHub
git push --set-upstream origin backport/backport-10519-to-2.11
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/OpenSearch/backport-2.11

Then, create a pull request where the base branch is 2.11 and the compare/head branch is backport/backport-10519-to-2.11.

opensearch-trigger-bot bot pushed a commit that referenced this pull request Oct 11, 2023
andrross added a commit to andrross/OpenSearch that referenced this pull request Oct 11, 2023
andrross added a commit to andrross/OpenSearch that referenced this pull request Oct 11, 2023
kotwanikunal pushed a commit that referenced this pull request Oct 11, 2023
@andrross andrross deleted the parallel-files-download-v2 branch October 11, 2023 14:59
kotwanikunal pushed a commit that referenced this pull request Oct 11, 2023
deshsidd pushed a commit to deshsidd/OpenSearch that referenced this pull request Oct 19, 2023
austintlee pushed a commit to austintlee/OpenSearch that referenced this pull request Oct 23, 2023
austintlee pushed a commit to austintlee/OpenSearch that referenced this pull request Oct 23, 2023
austintlee pushed a commit to austintlee/OpenSearch that referenced this pull request Dec 13, 2023
austintlee pushed a commit to austintlee/OpenSearch that referenced this pull request Jan 19, 2024
austintlee pushed a commit to austintlee/OpenSearch that referenced this pull request Jan 19, 2024
austintlee pushed a commit to austintlee/OpenSearch that referenced this pull request Feb 6, 2024
austintlee pushed a commit to austintlee/OpenSearch that referenced this pull request Feb 6, 2024
shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this pull request Apr 25, 2024
Labels

backport 2.x, backport 2.11, backport-failed, bug, distributed framework, skip-changelog, v2.11.0

Development

Successfully merging this pull request may close these issues:

[BUG] Recovery stats not reported for remote store clusters