Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use initial schema during bulk load. #3333

Merged
merged 1 commit into from
Apr 29, 2019

Conversation

martinmr
Copy link
Contributor

@martinmr martinmr commented Apr 26, 2019

Use the initial schema during bulk load, otherwise data might not be
loaded correctly for those predicates.

In particular, I observed that when loading the 21million dataset with
dgraph.type triples added to it, queries would not respond and would
eventually timeout. I figured this was because the index for that
predicate was not built during the bulkload. I reloaded the dataset with
my changes included and queries respond immediately.

Fixes #3329


This change is Reviewable

Use the initial schema during bulk load, otherwise data might not be
loaded correctly for those predicates.

In particular, I observed that when loading the 21million dataset with
dgraph.type triples added to it, queries would not respond and would
eventually timeout. I figured this was because the index for that
predicate was not built during the bulkload. I reloaded the dataset with
my changes included and queries respond immediately.
@martinmr martinmr requested review from manishrjain and a team as code owners April 26, 2019 23:39
Copy link
Contributor

@codexnull codexnull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r1.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @manishrjain)

Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @martinmr)


schema/schema.go, line 423 at r1 (raw file):

// them later than miss some of them. An example of such situation is during bulk
// loading.
func CompleteInitialSchema() []*pb.SchemaUpdate {

So, does it mean that bulk loader would always spit out the schemas for acls, even when the user only wants the open source version?

Copy link
Contributor Author

@martinmr martinmr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @manishrjain)


schema/schema.go, line 423 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

So, does it mean that bulk loader would always spit out the schemas for acls, even when the user only wants the open source version?

Initial schema will consider the worker options when deciding if a predicate should be created at start up. Right now there are two type of optional predicates (_predicate_ and ACL predicates) but there might be more in the future so I thought it would be safer to add all of them during the bulk process rather than risking missing some of them and not creating the proper indices during bulk load.

In any case, the ACL predicates are considered reserved even if ACL is off, so users should not be able to create another predicate with the same name.

Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 2 files at r1.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @manishrjain)

@martinmr martinmr merged commit 85c608f into master Apr 29, 2019
@martinmr martinmr deleted the martinmr/bulk-load-initial-schema branch April 29, 2019 22:58
dna2github pushed a commit to dna2fork/dgraph that referenced this pull request Jul 19, 2019
Use the initial schema during bulk load, otherwise data might not be
loaded correctly for those predicates.

In particular, I observed that when loading the 21million dataset with
dgraph.type triples added to it, queries would not respond and would
eventually timeout. I figured this was because the index for that
predicate was not built during the bulkload. I reloaded the dataset with
my changes included and queries respond immediately.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

Bulk loader not using reserved predicates.
3 participants