-
Notifications
You must be signed in to change notification settings - Fork 462
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
implement Actions around client and retrieval-client commands #1687
Conversation
testhelpers/fat/action_client.go
Outdated
|
||
// ClientCat runs the client cat command against the filecoin process. | ||
// TODO the actual command returns a DAGReader, however that's not easily doable here, | ||
// since DAGReader implements the "ReadSeekCloser" a ReadCloser should work for now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
er, i dont follow. when this runs the command i expect it gets text or json output from the command, how does it know anything about a dagreader?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
json and text won't work very well if the file we are cat'ing is a binary file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
im missing something about how iptb works. doesnt it invoke a shell command against the gfc binary? if so why should iptb do anything different than what the client cat command does, which is presumable output the binary to stdout? why would we care what the command does under the hood? @travisperson @frrist
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically we are wanting to avoid writing the output of an IPTB command to a buffer and instead aim to expose a pipe of that data directly to the caller so they may operate on it efficiently and exert appropriate backpressure.
As an example: currently if I was to cat
a file that was 100GB using IPTB it would buffer here until the process crashed. We want to modify IPTB to pass this reader directly to the caller (the method this comment is on). This will allow us to efficiently read the data without buffering using go's io.Reader
interface. I will add a better TODO here. @travisperson has come up with a solution and is composing an issue that's addresses it.
You aren't really missing anything, our IPTB plugins just weren't written very well because they haven't been a priority until FAT got traction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how does it know anything about a dagreader?
It doesn't really. However, that interface is just a grouping of io.Reader
, io.Seeker
, io.Closer
, io.WriterTo
, with two additional methods Size() uint64
, and CtxReadFull(context.Context, []byte) (int, error)
.
There is no reason it couldn't return something that implemented all of those if we could find a way to get the size out. Doing that with just the client cat
command would be difficult, and require some other kind of encoding that it doesn't support at the moment, that I know of.
doesnt it invoke a shell command against the gfc binary?
Yes, this is basically how the plugins are implemented.
if so why should iptb do anything different than what the client cat command does, which is presumable output the binary to stdout?
Dumping the data of the dag is all it does right now. My guess is @frrist is saying maybe it should do more. If we can get the size at the start of the call to client cat
, then we could implement it. However, I don't think client cat
should worry about this. To properly implement the entire DagReader interface we would probably use the dag
sub-commands. As efficiently implementing io.Seeker
would require bi-directional communication and I don't think cmdkit supports this at all.
Basically we are wanting to avoid writing the output of an IPTB command to a buffer and instead aim to expose a pipe of that data directly to the caller so they may operate on it efficiently and exert appropriate backpressure.
I don't understand what the buffering issue has to do with this thread.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think something to keep in mind is that we are implementing an api in FAT for the cli, not the go-filecoin plumbing api.
if err != nil { | ||
return nil, err | ||
} | ||
return out.Stdout(), err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we always be getting json output so we break less when output changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The client cat
just dumps the binary blob of the object referenced by the CID. I would say this is the correct behavior and probably shouldn't change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is causing me to rethink the Output
interface on IPTB.
In the filecoin plugins we read the entire output of a command into a byte array and then return it as a reader through the Output
interface. In hindsight this seems wasteful and redundant. If we were cat'ing a 12TB file this pattern wouldn't work as it doesn't allow the consumer to read returned data on an "as needed" basis.
To avoid this I am thinking we may want to add something like a StreamingOutput
interface to IPTB that would allow us to pass the output pipes all the way through to commands like this. The StreamingOutput
interface would expose a Close()
method that calls exec.Wait()
. This would allow us to wait for the respective I/O loop copying from the process to complete, and would give use the ability to handle large volumes of data when calling IPTB methods from code like this.
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the interface already supports this use case. Both Stderr
and Stdout
are defined to return an io.ReadCloser
. I think we just need a better implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is those pipes will be closed when Wait
gets called, meaning we cannot read from them unless we allow the caller of the method to call Wait
, currently it is handled in RunCmd
-- this is the problem I am trying to get at.
@@ -0,0 +1,19 @@ | |||
package fat |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are you mixing underscores and dashes in filenames?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because the command is called retrieval-client
-- open to suggestions on this though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think converting the dashes to underscores is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall, but we need to use the stdin for files to support the docker plugin.
var out cid.Cid | ||
if err := f.RunCmdJSONWithStdin(ctx, nil, &out, "go-filecoin", "client", "import", data.FullPath()); err != nil { | ||
return cid.Undef, err | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using a file path isn't going to work for the docker plugin, or other iptb plugins that do not share the same filesystem. To support this we will need a way to create a file in the plugins context.
This is actually just a limitation due to the implementation details of the docker plugin, as we run the client side of the request inside of the docker container. If we were not concerned about binary comparability, we could implement the docker plugin to use a local process, and communicate with the filecoin daemon running in docker.
Any command that uses cmds.FileArg
supports reading from stdin
which we can use to allow this to work without the filesystem issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, this is something we need to watchout for.
I think file path could work if we make some assumptions about the environment the docker plugin runs in. For example, if the docker containers have a predictable set of files in something like /data/files/*
then we could use file path to tell the plugins to load those files. Not saying this is the correct approach, just one to consider.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I think what you are getting at is that ClientImportData
could also accept an io.Reader
that it passes to RunCmdJSONWithStdin
that will be used on an as needed basis, I will add that if thats what you mean
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep!
I just now realized that we are using go-ipfs-files
, I thought this was actually golang os.File
.
Something to note, a PR landed about a month ago that changed this interface up quite a bit. Though it appears to still maintain the basic functionality.
ipfs/go-ipfs-files#2
I think we can continue to accept the files.File
, because a user can create one from an io.ReadCloser
, so for a in memory buffer users could use bytes.Buffer
wrapped in bufio.NopCloser
, and then create the files.File
with files.NewReaderFile
. The current files.NewReaderFile
accepts a path, but if the user leaves it blank, we can use this information to instead pass the file in as stdin (as opposed to passing the filename in), as it implements io.Reader
.
The no path idea is also native to the new implementation (the super easy one being NewBytesFile
)
https://github.com/ipfs/go-ipfs-files/blob/86b2cb84da8644dab2bd08797731cf12150fedf7/readerfile.go#L19-L34
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I completely follow. Wanna give my rebase a 👍 if I got it right?
https://github.com/filecoin-project/go-filecoin/pull/1682/files#diff-2793111db86639f3cba9e4d114fb10a2R30
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ya the update addresses my original comment. The followup was how to support both using stdin, and a file by name. We don't need to worry about that right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great. Not sure I understand the "No Path Idea" being described above, can you file an issue for it -- it seems like a good idea to have here at some point.
072a9c6
to
f975aaf
Compare
f975aaf
to
b920282
Compare
closes #1667