-
-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clarify that chunker sizes are in bytes #5923
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -78,12 +78,16 @@ You can now refer to the added file in a gateway, like so: | |
|
||
The chunker option, '-s', specifies the chunking strategy that dictates | ||
how to break files into blocks. Blocks with same content can | ||
be deduplicated. The default is a fixed block size of | ||
be deduplicated. Different chunking strategies will produce different | ||
hashes for the same file. The default is a fixed block size of | ||
256 * 1024 bytes, 'size-262144'. Alternatively, you can use the | ||
rabin chunker for content defined chunking by specifying | ||
rabin-[min]-[avg]-[max] (where min/avg/max refer to the resulting | ||
chunk sizes). Using other chunking strategies will produce | ||
different hashes for the same file. | ||
Rabin fingerprint chunker for content defined chunking by specifying | ||
rabin-[min]-[avg]-[max] (where min/avg/max refer to the desired | ||
chunk sizes in bytes), e.g. 'rabin-262144-524288-1048576'. | ||
|
||
The following examples use very small byte sizes to demonstrate the | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 1k isn't that small and users definitely shouldn't use There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I’m never suggesting using 2 MiB chunks. rabin-262144-524288-1048576 is in bytes and not kilobytes, so it’s 128 kiB–256 kiB–512 kiB. This was what this patch was intended to help clarify. 1 kiB is ridiculously small in this context. You don’t ever want to go through DHT and peer discovery for a file that is split into a thoiusand 1 kiB chunks. The protocol overhead would be enormous. Keep in mind that you’re also storing each individual chunk in a file that takes up a 256 kiB block on most file systems, so the disk packing wastage would also be enormous. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
You won't have to do that. You'll go through the DHT for the root node (which will likely be smaller than 1KiB as it doesn't contain any actual data).
Most filesystem have 4KiB blocks. Also note that we're moving towards a datastore that doesn't store each chunk in a separate file. However, I do agree that a 1KiB is small.
Ah, yeah, you're right. We might want to just bump those sizes up instead. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Otherwise, I'd be fine merging this as-is. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
@Stebalien Where can I find more information about that? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||
properties of the different chunkers on a small file. You'll likely | ||
want to use a 1024 times larger chunk sizes for most files. | ||
|
||
> ipfs add --chunker=size-2048 ipfs-logo.svg | ||
added QmafrLBfzRLV4XSH1XcaMMeaXEUhDJjmtDfsYU95TrWG87 ipfs-logo.svg | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: We might want to move the pros/cons of chunking to a new paragraph (this is pretty choppy). However, your change is still an improvement so we can mess with that later if you'd like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sentence is one of the most important things to know about this option and chunking. I moved it up so people would read it early before they get distracted by all the details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My complaint is that it's "fact, fact, fact..." with no relationship between the facts. But didn't flow all that well before so we can fix this later.