Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement Actions around client and retrieval-client commands #1687

Merged
merged 1 commit into from
Jan 25, 2019

Conversation

frrist
Copy link
Member

@frrist frrist commented Jan 23, 2019

closes #1667


// ClientCat runs the client cat command against the filecoin process.
// TODO the actual command returns a DAGReader, however that's not easily doable here,
// since DAGReader implements the "ReadSeekCloser" a ReadCloser should work for now.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

er, i dont follow. when this runs the command i expect it gets text or json output from the command, how does it know anything about a dagreader?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

json and text won't work very well if the file we are cat'ing is a binary file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

im missing something about how iptb works. doesnt it invoke a shell command against the gfc binary? if so why should iptb do anything different than what the client cat command does, which is presumable output the binary to stdout? why would we care what the command does under the hood? @travisperson @frrist

Copy link
Member Author

@frrist frrist Jan 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically we are wanting to avoid writing the output of an IPTB command to a buffer and instead aim to expose a pipe of that data directly to the caller so they may operate on it efficiently and exert appropriate backpressure.

As an example: currently if I was to cat a file that was 100GB using IPTB it would buffer here until the process crashed. We want to modify IPTB to pass this reader directly to the caller (the method this comment is on). This will allow us to efficiently read the data without buffering using go's io.Reader interface. I will add a better TODO here. @travisperson has come up with a solution and is composing an issue that's addresses it.

You aren't really missing anything, our IPTB plugins just weren't written very well because they haven't been a priority until FAT got traction.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how does it know anything about a dagreader?

It doesn't really. However, that interface is just a grouping of io.Reader, io.Seeker, io.Closer, io.WriterTo, with two additional methods Size() uint64, and CtxReadFull(context.Context, []byte) (int, error).

There is no reason it couldn't return something that implemented all of those if we could find a way to get the size out. Doing that with just the client cat command would be difficult, and require some other kind of encoding that it doesn't support at the moment, that I know of.

doesnt it invoke a shell command against the gfc binary?

Yes, this is basically how the plugins are implemented.

if so why should iptb do anything different than what the client cat command does, which is presumable output the binary to stdout?

Dumping the data of the dag is all it does right now. My guess is @frrist is saying maybe it should do more. If we can get the size at the start of the call to client cat, then we could implement it. However, I don't think client cat should worry about this. To properly implement the entire DagReader interface we would probably use the dag sub-commands. As efficiently implementing io.Seeker would require bi-directional communication and I don't think cmdkit supports this at all.

Basically we are wanting to avoid writing the output of an IPTB command to a buffer and instead aim to expose a pipe of that data directly to the caller so they may operate on it efficiently and exert appropriate backpressure.

I don't understand what the buffering issue has to do with this thread.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think something to keep in mind is that we are implementing an api in FAT for the cli, not the go-filecoin plumbing api.

if err != nil {
return nil, err
}
return out.Stdout(), err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we always be getting json output so we break less when output changes?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The client cat just dumps the binary blob of the object referenced by the CID. I would say this is the correct behavior and probably shouldn't change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is causing me to rethink the Output interface on IPTB.

In the filecoin plugins we read the entire output of a command into a byte array and then return it as a reader through the Output interface. In hindsight this seems wasteful and redundant. If we were cat'ing a 12TB file this pattern wouldn't work as it doesn't allow the consumer to read returned data on an "as needed" basis.

To avoid this I am thinking we may want to add something like a StreamingOutput interface to IPTB that would allow us to pass the output pipes all the way through to commands like this. The StreamingOutput interface would expose a Close() method that calls exec.Wait(). This would allow us to wait for the respective I/O loop copying from the process to complete, and would give use the ability to handle large volumes of data when calling IPTB methods from code like this.

Thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the interface already supports this use case. Both Stderr and Stdout are defined to return an io.ReadCloser. I think we just need a better implementation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is those pipes will be closed when Wait gets called, meaning we cannot read from them unless we allow the caller of the method to call Wait, currently it is handled in RunCmd -- this is the problem I am trying to get at.

@@ -0,0 +1,19 @@
package fat
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are you mixing underscores and dashes in filenames?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because the command is called retrieval-client -- open to suggestions on this though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think converting the dashes to underscores is fine.

Copy link
Contributor

@travisperson travisperson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall, but we need to use the stdin for files to support the docker plugin.

var out cid.Cid
if err := f.RunCmdJSONWithStdin(ctx, nil, &out, "go-filecoin", "client", "import", data.FullPath()); err != nil {
return cid.Undef, err
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a file path isn't going to work for the docker plugin, or other iptb plugins that do not share the same filesystem. To support this we will need a way to create a file in the plugins context.

This is actually just a limitation due to the implementation details of the docker plugin, as we run the client side of the request inside of the docker container. If we were not concerned about binary comparability, we could implement the docker plugin to use a local process, and communicate with the filecoin daemon running in docker.

Any command that uses cmds.FileArg supports reading from stdin which we can use to allow this to work without the filesystem issue.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, this is something we need to watchout for.

I think file path could work if we make some assumptions about the environment the docker plugin runs in. For example, if the docker containers have a predictable set of files in something like /data/files/* then we could use file path to tell the plugins to load those files. Not saying this is the correct approach, just one to consider.

Copy link
Member Author

@frrist frrist Jan 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I think what you are getting at is that ClientImportData could also accept an io.Reader that it passes to RunCmdJSONWithStdin that will be used on an as needed basis, I will add that if thats what you mean

Copy link
Contributor

@travisperson travisperson Jan 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep!

I just now realized that we are using go-ipfs-files, I thought this was actually golang os.File.

Something to note, a PR landed about a month ago that changed this interface up quite a bit. Though it appears to still maintain the basic functionality.
ipfs/go-ipfs-files#2

I think we can continue to accept the files.File, because a user can create one from an io.ReadCloser, so for a in memory buffer users could use bytes.Buffer wrapped in bufio.NopCloser, and then create the files.File with files.NewReaderFile. The current files.NewReaderFile accepts a path, but if the user leaves it blank, we can use this information to instead pass the file in as stdin (as opposed to passing the filename in), as it implements io.Reader.

The no path idea is also native to the new implementation (the super easy one being NewBytesFile)
https://github.com/ipfs/go-ipfs-files/blob/86b2cb84da8644dab2bd08797731cf12150fedf7/readerfile.go#L19-L34

Copy link
Member Author

@frrist frrist Jan 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I completely follow. Wanna give my rebase a 👍 if I got it right?
https://github.com/filecoin-project/go-filecoin/pull/1682/files#diff-2793111db86639f3cba9e4d114fb10a2R30

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya the update addresses my original comment. The followup was how to support both using stdin, and a file by name. We don't need to worry about that right now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great. Not sure I understand the "No Path Idea" being described above, can you file an issue for it -- it seems like a good idea to have here at some point.

@frrist frrist merged commit 3d3ca60 into master Jan 25, 2019
@frrist frrist deleted the feat/fat-actions-client branch January 25, 2019 17:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement Actions for client and retrieval-client commands
3 participants