Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for Google Cloud storage as a DataStore #106

Merged
merged 18 commits into from
Sep 16, 2017
Merged

Conversation

tab1293
Copy link
Contributor

@tab1293 tab1293 commented Feb 3, 2017

This pull request allows tusd users to specify Google Cloud storage as their datastore backend. By passing the --gcs-bucket [bucket-name] flag and setting the GCS_SERVICE_ACCOUNT_FILE environment variable to point to their Google service account file, tusd can now upload to GCS. @Acconut please let me know what you think of this feature and if there are any changes you'd like me to make.

cc @vayam

@Acconut
Copy link
Member

Acconut commented Feb 5, 2017

Wow, this is truly amazing and impressive, especially since I didn't expect it 👍 Thank you a lot for this. Unfortunately, I am currently a bit busy due to private circumstances and won't be able to fully review it in the next two weeks but I will do so afterwards.

@Acconut Acconut self-requested a review February 22, 2017 12:48
@Acconut Acconut self-assigned this Feb 22, 2017
Copy link
Member

@Acconut Acconut left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I apologize for the delay on my side but I had a deep dive into your code and it looks very good. Yet, I found a few topic worth talking about. As a last wish, would you be so kind a run go fmt since I found a few style inconsistencies?


import (
"io"
"context"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are still claiming support for Go 1.5 and 1.6 which to not include the context package. Would it be possible to use golang.org/x/net/context instead?

split := strings.Split(name, "_")
idx, err := strconv.Atoi(split[len(split)-1])
if err != nil {
return -1, err
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should return 0 here as that's the default value for int64s and also matches the remaining returned values.

@@ -0,0 +1,154 @@
package gcsstore
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Proper documentation for all exported types and functions in this file would be really appreciated. :)

}

m[i] = objAttrs.Name
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to collect the names directly in a slice using https://godoc.org/google.golang.org/api/iterator#PageInfo?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding of this interface is that it is not possible to convert directly between an iterator and a slice. This is because, depending on the number of objects returned by the query, additional API calls might be made to fetch more objects. That is why Next() and Done are used.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true. However, it might be more efficient to just create a slice (with a start capacity of 10 or so) and let it grow while you append the new object names. Since the capacity of the slice is doubled if it has to be reallocated (see https://play.golang.org/p/10KXfDndw4), we could get rid of the intermediate map.

@@ -0,0 +1,224 @@
package gcsstore
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documentation for the GCSStore struct and New function will be really nice. In addition, information about what access rights are required for this should be included in a similar fashion as it's done for the s3store package: https://github.com/tus/tusd/blob/master/s3store/s3store.go#L3

}

info.Offset = info.Offset + n
err = store.WriteInfo(id, info)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the call to WriteInfo fails (for whatever reason), you can end up in an inconsistent state where you were able to upload the chunk to the bucket but didn't update the offset. In order to avoid this issue, the s3store does not update the offset after writing a chunk but instead calculates the offset for each call to GetInfo (see https://github.com/tus/tusd/blob/master/s3store/s3store.go#L309-L335). It would be nice to have a similar approach here.

}

var objectParams GCSObjectParams
for _, name := range names {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to reuse the same functionality from Terminate here as they both to pretty much the same?

Sources: names,
}

err = store.Service.ComposeObjects(composeParams)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may also be helpful to attach the metadata to the composed object (compare with https://github.com/tus/tusd/blob/master/s3store/s3store.go#L203)

@@ -0,0 +1,337 @@
package gcsstore_test
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be beneficial to also add tests for the GCSApi as that is the part which is directly communicating with the GCS.

Prefix: prefix,
}

names, err := store.Service.FilterObjects(filterParams)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it guaranteed that the object names are returned in the correct order? I believe this could have issues if you have more than 10 chunks uploaded. For example, you have following objects: foo_0, foo_1, ..., foo_9, foo_10. Will the ordering in this case be foo_0, foo_1, ..., foo_9, foo_10 (as wanted) or foo_0, foo_1, foo_10, foo_2, ..., foo_9 (sorted lexicographically)?

Copy link
Contributor Author

@tab1293 tab1293 Mar 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct, good catch. It seems query responses are sorted lexicographically and therefore are not composed correctly. I also just realized that Google limits compositions to a max of 32 objects. I will make changes to fix these two issues.

@tab1293
Copy link
Contributor Author

tab1293 commented Apr 3, 2017

@Acconut I've made the following changes to this PR:

  • Proper documentation
  • Using golang.org/x/net/context
  • FilterObjects returning the correct order of objects
  • Handle Google's max of a 32 object composition operation
  • Compare CRC32 checksum of composed object and its parts. Max 3 retries.
  • Call to WriteInfo calculates the offset instead of incrementing the offset in WriteChunk
  • Metadata is attached to composed object
  • Other minor tweaks you suggested

I have updated the GCSStore tests with these changes but have not had a chance to write tests for the GCSService. Between me and @peixian, we will get these tests written in the near future. In the mean time, how do you feel about merging this to master? It would be a convenience for us to not have to manage this separate branch. I can also squash these commits in to one if you would like :)

@kvz
Copy link
Member

kvz commented Apr 4, 2017

This is looking really good!

Question: should we write a new blog post up on tus.io that we have gcs support once all's said & done? @tab1293 I think it would be great Tom if you or a teammate could take the lead in that as author. If you don't have the time however, we could also figure something out with the Transloadit crew, as I believe it will be good for tus to get the word out on this

@tab1293
Copy link
Contributor Author

tab1293 commented Apr 6, 2017

@kvz yes good idea. I can have a blog post written about this probably in the next week or so. Is this a blocker for merge?

@kvz
Copy link
Member

kvz commented Apr 6, 2017

Is this a blocker for merge?

Definitely not, we can add that later! This was just a slightly related idea I had, I'll leave the actual code review to Marius as Go is not my strong suit

@Acconut
Copy link
Member

Acconut commented Apr 7, 2017

Between me and @peixian, we will get these tests written in the near future. In the mean time, how do you feel about merging this to master? It would be a convenience for us to not have to manage this separate branch. I can also squash these commits in to one if you would like :)

Let's phrase it like following: I could sleep better if this PR already included the necessary tests ;) Besides that, your changes are looking very good! I will have another in-depth look this weekend.

@Acconut
Copy link
Member

Acconut commented Apr 9, 2017

@tab1293 This PR is looking amazing, in particular after your latest round of updates! A few more things (besides test), please:

@peixian
Copy link

peixian commented Apr 17, 2017

Hey @Acconut can you clarify on what tests you'd like to see? @tab1293 and I were thinking that we lacked tests on a real GCS bucket, but we were wondering if there's anything other tests we're didn't think of.

@Acconut
Copy link
Member

Acconut commented Apr 17, 2017

were thinking that we lacked tests on a real GCS bucket, but we were wondering if there's anything other tests we're didn't think of.

That's a good idea but I was also talking about tests for the gcsservice.go file.

@peixian
Copy link

peixian commented May 1, 2017

Hey @Acconut, I've updated this and added the vendoring dependencies so that the travis tests pass for Go 1.6-1.8. However, the underlying grpc package that is used by the google api dropped support for Go1.5 (grpc/grpc-go@0a20758), and uses dialContext, which is also not available for Go1.5. We were curious on how to proceed for that, should the gcsstore tests simply be skipped like the consul tests on Go1.5?

@Acconut
Copy link
Member

Acconut commented May 10, 2017

I've updated this and added the vendoring dependencies so that the travis tests pass for Go 1.6-1.8

Perfect 👍

We were curious on how to proceed for that, should the gcsstore tests simply be skipped like the consul tests on Go1.5?

Yes, that would be nice.

Also, what's the status on the additional tests (see #106 (comment))? I'm sorry that I am so picky about them but I like my code tested :)

@peixian
Copy link

peixian commented May 10, 2017

@Acconut I'll commit a change tomorrow to skip the tests in Go 1.5, and hopefully later this week I'll add in some tests for gcsservice.go.

@tab1293
Copy link
Contributor Author

tab1293 commented May 17, 2017

@Acconut @peixian @kvz Let's merge this thang! :)

@Acconut
Copy link
Member

Acconut commented May 20, 2017

@tab1293 I am still waiting on the tests for gcsservice.go, that you promised :) Or did you change the plans for that?

@peixian
Copy link

peixian commented May 23, 2017

@Acconut I can't seem to think of a way to test gcsservice.go without using a real production service, do you have any suggestions for that?

@Acconut
Copy link
Member

Acconut commented May 24, 2017

Either that or you create your own mock server using httptest.Server (https://golang.org/pkg/net/http/httptest/#Server) and then point the GCSService to it. It allows you to assert the incoming requests and also tests how well the client handles errors.

@Acconut
Copy link
Member

Acconut commented May 24, 2017

Also, thinking about this, would it be possible to not export GCSService and reduce the exposed API making changes in the future easier?

@tab1293
Copy link
Contributor Author

tab1293 commented Jul 5, 2017

@Acconut I still don't see how creating a mock http server will enable us to create tests for the GCSService. If you look at the implemented interface methods such as ReadObject, DeleteObject, WriteObject, ComposeObjects, etc, these are all thin wrappers around the cloud.google.com/go/storage/Client interface. This client does not work against a mockable http server. The only way I could see testing actually be meaningful is if we create end to end tests against an actual Google Cloud storage account like @peixian mentioned (which would cost money). I think the tests written for the GCSStore should be enough to get this PR merged in to master. If you disagree, could you be more specific or provide a code example of how you would want to test the GCSService?

@Acconut
Copy link
Member

Acconut commented Jul 8, 2017

If you look at the implemented interface methods such as ReadObject, DeleteObject, WriteObject, ComposeObjects, etc, these are all thin wrappers around the cloud.google.com/go/storage/Client interface.

I disagree. I would not consider functions, such as compose (https://github.com/tus/tusd/pull/106/files#diff-c920e8ef8f3f85a7b94ee8d23483c11aR197) and recursiveCompose (https://github.com/tus/tusd/pull/106/files#diff-c920e8ef8f3f85a7b94ee8d23483c11aR255), thin wrappers. Instead they contain a lot of logic which needs tests to ensure they keep working in the future. I know that writing tests is not fun but if software hits the production environment, tests are mandatory.

If you disagree, could you be more specific or provide a code example of how you would want to test the GCSService?

As I pointed out some time ago, Go has a built-in HTTP server for testing (https://golang.org/pkg/net/http/httptest/#Server). Using it is relatively easy: You start it, point the GCSService at the server's URL and assert that the proper requests are coming in. Here is also an article which explains this concept a bit more: https://mkaz.tech/geek/testing-clients-to-an-http-api-in-go/. If you want to, I could also write an example test.

peixian and others added 4 commits July 12, 2017 11:44
* Refactor as a struct of clojures for testing

* Refactor to support tests

* Add docs

* Struct members cannot fufill the interface, adding wrapper functions
@peixian
Copy link

peixian commented Jul 12, 2017

Hey @Acconut, I've updated this PR with 2 tests for gcsservice.go, one of which tests all the compose functions.

Unfortunately the Google Storage library cannot be easily ran with a mock HTTP test server since the Google Cloud Storage library assumes you are interacting with their infrastructure, meaning that you cannot pass in the mock server's address. Due to this, I opted to refactor some of the code to support closures as members of the GCSService struct, which allows them to be overwritten during testing.

@Acconut
Copy link
Member

Acconut commented Jul 15, 2017

Unfortunately the Google Storage library cannot be easily ran with a mock HTTP test server since the Google Cloud Storage library assumes you are interacting with their infrastructure, meaning that you cannot pass in the mock server's address.

Are you sure? AFAIK, the SDK has a WithEndpoint (https://godoc.org/google.golang.org/api/option#WithEndpoint) option which can be used in a similar fashion to you do it here: https://github.com/tus/tusd/pull/106/files#diff-c920e8ef8f3f85a7b94ee8d23483c11aR90

Also, thank you for your efforts :)

@peixian
Copy link

peixian commented Jul 17, 2017

Yep, I tried the option.WithEndpoint previously, and it didn't seem to work, returning the generic Caller does not have permissions to storage.... As far as I could tell, the examples for the storage client (https://github.com/GoogleCloudPlatform/golang-samples/blob/master/storage/buckets/main_test.go and https://github.com/GoogleCloudPlatform/golang-samples/blob/master/storage/objects/main_test.go) do not use the withEndpoint function.

The internal tests to Google Storage do utilize the net/httptest, however, they involve setting specific parameters to enable the tests to work, which would not work for the gcsservice unless we retooled it to take parameters that are only used for testing, which I think would decouple the tests from the actual library.

@peixian
Copy link

peixian commented Jul 24, 2017

@Acconut ^

@Acconut
Copy link
Member

Acconut commented Jul 28, 2017

@peixian I apologize for not responding here but I am very busy due to private reasons and cannot take a look at the current situation until next week. I hope you understand :)

@Acconut
Copy link
Member

Acconut commented Aug 29, 2017

@peixian I deeply apologize for my delay, other projects occupied my time and I lost a bit track of your work here. This provided me the time to attempt writing tests on my own. As you correctly said the WithEndpoint function does not really work for our situation. However, this still does not mean we cannot create proper tests ;) Instead, I successfully used the gock (https://github.com/h2non/gock) package to do so. It incepts every HTTP request and allows us to mock the responses (it's README has a short section on how it works: https://github.com/h2non/gock#how-it-mocks).

Below you can find an example test for the GetObjectSize function. Please not that I did not use a credentials file (since OAuth2 would require significant more work to mock) and instead just supply an API key.

package gcsstore_test

import (
	"context"
	"testing"

	"gopkg.in/h2non/gock.v1"

	"cloud.google.com/go/storage"
	. "github.com/tus/tusd/gcsstore"
	"google.golang.org/api/option"
)

func TestGetObjectSize(t *testing.T) {
	defer gock.Off()

	gock.New("https://www.googleapis.com").
		Get("/storage/v1/b/test-bucket/o/test-name").
		MatchParam("key", "foo").
		Reply(200).
		JSON(map[string]string{"size": "54321"})

	ctx := context.Background()
	client, err := storage.NewClient(ctx, option.WithAPIKey("foo"))
	if err != nil {
		t.Fatal(err)
	}

	service := &GCSService{
		Client: client,
		Ctx:    ctx,
	}

	size, err := service.GetObjectSize(GCSObjectParams{
		Bucket: "test-bucket",
		ID:     "test-name",
	})
	if err != nil {
		t.Errorf("Error: %v", err)
	}

	if size != 54321 {
		t.Errorf("Error: Did not match given size")
	}
}

What do you think of this approach?

@peixian
Copy link

peixian commented Sep 7, 2017

@Acconut I think Gock is a fine solution to this problem, I've updated the PR with Gock tests for each function in gcsservice

@Acconut
Copy link
Member

Acconut commented Sep 9, 2017

@peixian Those tests look very good, nice work 👍 Since we found a way to mock the Google Cloud Storage API, I believe these functions properties (6eb4efe#diff-c920e8ef8f3f85a7b94ee8d23483c11aR78) in GCSService can be removed, can't they?

@Acconut Acconut merged commit 4c0f4cc into tus:master Sep 16, 2017
@Acconut
Copy link
Member

Acconut commented Sep 16, 2017

Just merged your PR. I would like to thank you very much for your efforts and especially your patience with my requests. It is greatly appreciated :)

@tab1293
Copy link
Contributor Author

tab1293 commented Sep 16, 2017

🎉 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants