Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Seeker can't seek when referencing data-containing node #4286

Closed
mib-kd743naq opened this issue Oct 6, 2017 · 12 comments
Closed

Seeker can't seek when referencing data-containing node #4286

mib-kd743naq opened this issue Oct 6, 2017 · 12 comments
Labels
kind/bug A bug in existing code (including security flaws)

Comments

@mib-kd743naq
Copy link
Contributor

Version information:

All versions affected

Type:

Bug

Severity:

High

Description:

I put together a demonstration of the problem at this location: if you try to look at the last entry in that directory, the following is observed:

curl -D - -s ipfs.io/ipfs/QmTTDNxNJUqF5PZicJqjDjsUed1bHaPWwFdEMvb1iY93Ur
HTTP/1.1 500 Internal Server Error
Date: Fri, 06 Oct 2017 11:16:53 GMT
Content-Type: text/plain; charset=utf-8
Content-Length: 18
Connection: keep-alive
Access-Control-Allow-Headers: Content-Range, X-Chunked-Output, X-Stream-Output
Access-Control-Allow-Methods: GET
Access-Control-Allow-Origin: *
Access-Control-Expose-Headers: Content-Range, X-Chunked-Output, X-Stream-Output
Cache-Control: public, max-age=29030400, immutable
Etag: "QmTTDNxNJUqF5PZicJqjDjsUed1bHaPWwFdEMvb1iY93Ur"
Last-Modified: Thu, 01 Jan 1970 00:00:01 GMT
Suborigin: ipfs000bciqex6g372ssrd2i6o5tc35pdbbpxxgyw2mm56wpb3sxchw6rlangzy
X-Content-Type-Options: nosniff
X-Ipfs-Path: /ipfs/QmTTDNxNJUqF5PZicJqjDjsUed1bHaPWwFdEMvb1iY93Ur
Strict-Transport-Security: max-age=15768000; includeSubDomains; preload

seeker can't seek

A complete diagram of all blocks involved and their contents is included at https://ipfs.io/ipfs/QmZL4wivfYBEUutxksKgbpPfFkj1tGNMESYLFF2MWcST8Y/block_dag.svg

@mib-kd743naq
Copy link
Contributor Author

Note that the CLI appears to work fine:

~$ ipfs cat QmTTDNxNJUqF5PZicJqjDjsUed1bHaPWwFdEMvb1iY93Ur | hexdump -C
00000000  66 6f 6f 66 6f 6f                                 |foofoo|
00000006

@Kubuxu Kubuxu added the kind/bug A bug in existing code (including security flaws) label Oct 6, 2017
@magik6k
Copy link
Member

magik6k commented Oct 12, 2017

The error is caused by https://github.com/ipfs/go-ipfs/blob/master/unixfs/io/pbdagreader.go#L67, which is called by https://github.com/ipfs/go-ipfs/blob/master/unixfs/io/pbdagreader.go#L222, which is called by gateway/http logic (That's why CLI works).

It looks like your data has invalid size hints:

$ ipfs dag get QmTTDNxNJUqF5PZicJqjDjsUed1bHaPWwFdEMvb1iY93Ur
{"data":"CAIYBg==","links":[{"Cid":{"/":"QmcJw6x4bQr7oFnVnF6i8SLcJvhXjaxWvj54FYXmZ4Ct6p"},"Name":"","Size":0},{"Cid":{"/":"QmcJw6x4bQr7oFnVnF6i8SLcJvhXjaxWvj54FYXmZ4Ct6p"},"Name":"","Size":0}]}

Can you share how you created those nodes?

@mib-kd743naq
Copy link
Contributor Author

mib-kd743naq commented Oct 12, 2017

@magik6k I am working on a project that uses the IPFS infrastructure for retrieval/display purposes only. I am using a separate block generator written from scratch in another language, that does not have anything in common with the ipfs codebase (it is not in a state where it is easily open-sourceable). I use its output to feed to the API as per #3552 (comment)

Back when I was working on my first large-scale exploratory test #3552 (comment), someone on IRC specifically told me that sizing is optional, and is essentially not enforced anywhere. In fact everything in that matrix does not have a size on the leaves at all.

In all the blocks I generate ( as you can see in the graph) omit many of the fields. Namely this is never specified, neither is this except for directories, and where it is specified the size corresponds to the actual data bytes underneath, to provide actual user-friendly representation within a directory listing

So far pretty much everything works as advertised: the size thing is treated as entirely optional, and is used for nothing but display purposes, except of course the bug raised in this particular ticket.

Now - if go-ipfs did a 180 and declared retroactively that all these sizes are in fact mandatory, I would be sad ( because all these examples I have above will no longer be browseable ), but at least there will be clarity how one can move forward. Though keep in mind that there is no way to prevent "invalid" blocks like mine being placed into the DHT, so the UI needs to be better at erroring out if this is the direction you want to move. Additionally keep in mind that go-ipfs itself will happily create bogus values, e.g. #3440 and #4151

Additionally - the way my project chunks data, my average blocksize is around 4096 bytes ( with a hard-limit on block sizes at 64k ), so forgoing any optional metadata is critical for me in order to reduce the size of my objects ( as the overhead adds up incredibly fast when one deals with hundreds of millions of objects in a DAG ).

@whyrusleeping
Copy link
Member

@mib-kd743naq The sizes are optional, but seeking through a file without the sizes set will throw an error (as youre seeing). The issue here appears to be that go's http.ServeContent uses seeks to determine the size of the object its serving, which then triggers our issue. See: https://golang.org/pkg/net/http/#ServeContent

This is somewhat troublesome, it would be nice to only call seek if a range header is set, instead of calling seek for every call. This actually might be causing performance issues when serving content from the gateway, as it has to call Seek (which requires fetching blocks) before it can serve the first byte. Looking at the code for ServeContent, its a wrapper around an internal serveContent function that passed in a sizeFunc which is where the seeking is done. Perhaps we can do something with our own custom size func?

I'm mostly out this week, but I can take a deeper look into resolving this when i'm back. Aside from fixing your issue, this will likely improve gateway perf quite nicely.

@mib-kd743naq
Copy link
Contributor Author

This is somewhat troublesome, it would be nice to only call seek if a range header is set

So much this!!! I was actually looking into strangely odd performance issues with my HTTP requests but was never able to dig too far.

I will be having a caching proxy in front of go-ipfs, so the inability to seek to a range is not a problem. However, given that you confirmed sizing is optional, perhaps a "dumb seek via skip" mode isn't a bad thing to have for DAGs like mine.

@whyrusleeping
Copy link
Member

whyrusleeping commented Oct 13, 2017 via email

@mib-kd743naq
Copy link
Contributor Author

I guess if there aren't any sizes set

Which is precisely how I construct my File / Chunk ( UnixFS types 2 and 0 ) blocks: they entirely lack both the linksize field and the blocklist fields

@mib-kd743naq
Copy link
Contributor Author

The size: 0 entries seen in @magik6k's #4286 (comment) are filled in by something within go-ipfs, the actual blocks do not have them:

~$ ipfs dag get QmTTDNxNJUqF5PZicJqjDjsUed1bHaPWwFdEMvb1iY93Ur
{"data":"CAIYBg==","links":[{"Cid":"QmcJw6x4bQr7oFnVnF6i8SLcJvhXjaxWvj54FYXmZ4Ct6p","Name":"","Size":0},{"Cid":"QmcJw6x4bQr7oFnVnF6i8SLcJvhXjaxWvj54FYXmZ4Ct6p","Name":"","Size":0}]}

~$ ipfs block get QmTTDNxNJUqF5PZicJqjDjsUed1bHaPWwFdEMvb1iY93Ur | protoc --decode_raw
1 {
  1: 2
  3: 6
}
2 {
  1 {
    2: "\317\222\375\357\315\303L\254\000\234\213\005\353f+\340a\215\271\336U\354\324\'\205\351\354g\022\370\337e"
  }
}
2 {
  1 {
    2: "\317\222\375\357\315\303L\254\000\234\213\005\353f+\340a\215\271\336U\354\324\'\205\351\354g\022\370\337e"
  }
}

@Kubuxu
Copy link
Member

Kubuxu commented Oct 13, 2017

So looking at HTTP spec Content-Length should be sent whenever the size can be determined. It is optional and we should just not set it if it is not known or malformed.

@mib-kd743naq
Copy link
Contributor Author

@whyrusleeping just a gentle poke/reminder that you "promised" to dig into this a bit Monday ;)

@whyrusleeping
Copy link
Member

@mib-kd743naq right you are! Investigating now

@whyrusleeping
Copy link
Member

its looking like we should implement our own variant of the http.ServeContent function that lets us skip certain things. I'll try and get a PR up for this soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug A bug in existing code (including security flaws)
Projects
None yet
Development

No branches or pull requests

4 participants