Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Limit the number of nested documents #27405

Merged

Conversation

mayya-sharipova
Copy link
Contributor

Add an index level setting index.mapping.nested_objects.limit to control
the number of nested json objects that can be in a single document
across all fields. Defaults to 10000.

Throw an error if the number of created nested documents exceed this
limit during the parsing of a document.

Closes #26962

@mayya-sharipova mayya-sharipova added >breaking v7.0.0 :Data Management/Indices APIs APIs to create and manage indices and templates labels Nov 15, 2017
@mayya-sharipova mayya-sharipova force-pushed the limit-nested-docs-number branch 2 times, most recently from bf33198 to a5c68ee Compare November 16, 2017 21:47
@cbuescher cbuescher self-assigned this Nov 17, 2017
@cbuescher cbuescher self-requested a review November 17, 2017 11:25
Copy link
Member

@cbuescher cbuescher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @mayya-sharipova, I did a first round of review and left a couple of comments, mostly minor. The change looks good already, can you please also add test that show that the default values work as expected (e.g. intergration tests) and maybe a rest test for a small setting (like the ones in the unit test you already have)

@@ -305,6 +305,10 @@ public void addDynamicMapper(Mapper update) {

private SeqNoFieldMapper.SequenceIDFields seqID;

private final Long maxAllowedNumNestedDocs;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can probably be a primitive long

@@ -305,6 +305,10 @@ public void addDynamicMapper(Mapper update) {

private SeqNoFieldMapper.SequenceIDFields seqID;

private final Long maxAllowedNumNestedDocs;

private Long numNestedDocs;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can probably be a primitive long

@@ -321,6 +325,8 @@ public InternalParseContext(@Nullable Settings indexSettings, DocumentMapperPars
this.version = null;
this.sourceToParse = source;
this.dynamicMappers = new ArrayList<>();
this.maxAllowedNumNestedDocs = indexSettings.getAsLong(MapperService.INDEX_MAPPING_NESTED_DOCS_LIMIT_SETTING.getKey(), 10000L);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if there is a way to get the default value from the MapperService.INDEX_MAPPING_NESTED_DOCS_LIMIT_SETTING itself, since it seems to be defined there. Would be good to not have to define the constant in two places in case it needs changing in the future. I did some initial digging but need to understand this a bit better myself I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cbuescher thanks Christoph! I agree it is strange to define the constant in two places, but so far did not find the way. I will investigate more the way to define it a single place only.

@@ -366,6 +372,13 @@ public Document doc() {

@Override
protected void addDoc(Document doc) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for my understanding, is this only called for nested documents? I see there is another ParseContext implementation with "addDoc()", I guess that one isn't used for parsing nested documents?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cbuescher Yes, this method only invoked to add nested documents. The Lucene document for the non nested parts of the document being indexed is added to the documents list in the constructor of this class.

@@ -524,4 +523,95 @@ public void testParentObjectMapperAreNested() throws Exception {
assertFalse(objectMapper.parentObjectMapperAreNested(mapperService));
}

public void testLimitNestedDocs() throws Exception{
// setting limit to allow only two nested objects in the whole doc
Long nested_docs_limit = 2L;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: long, otherwise it gets outboxed later anyway

DocumentMapper docMapper = mapperService.documentMapperParser().parse("type", new CompressedXContent(mapping));

// parsing a doc with 2 nested objects succeeds
SourceToParse source1 = SourceToParse.source("test1", "type", "1", XContentFactory.jsonBuilder()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a suggestion so make this kind of xContent setup easier to read and less prone to accidental reformatting. I sometimes like to do something a little bit more verbose like this:

XContentBuilder docBuilder = XContentFactory.jsonBuilder();
docBuilder.startObject();
{
      docBuilder.startArray("nested1");
      {
           docBuilder.startObject().field("field1", "11").field("field2", "21").endObject();
           docBuilder.startObject().field("field1", "12").field("field2", "22").endObject();
      }
      docBuilder.endArray();
}
docBuilder.endObject();
SourceToParse source1 = SourceToParse.source("test1", "type", "1", docBuilder.bytes(), XContentType.JSON);

It allows to add some indentation that won't break so easily when reformatting the source.

);
}

public void testLimitNestedDocsMultipleNestedFields() throws Exception{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is interesting (and a great test by the way), because I just realized that the maximum number of nested docs check is applied across all nested fields in the document. I didn't immediately get that from the desciption in #26962 or the PR so far, maybe worth adding to the migration notest and docs as well.

The maximum number of `nested` json objects within a single document across
all nested fields, defaults to 10000. Indexing one document with an array of
100 objects within a nested field, will actually create 101 documents, as
each nested object will be indexed as a separate hidden document.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add something that this limit applies accros all nested fields in a document. (see above)

@mayya-sharipova
Copy link
Contributor Author

@cbuescher Christoph, thanks for your review. I have tried to address your comments in the 2nd commit, can you please continue/finalize the review when you have available time.
The only thing I am not sure is what you meant by "migration notest" in your comment: "maybe worth adding to the migration notest and docs as well".

Copy link
Member

@cbuescher cbuescher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mayya-sharipova thanks a lot, this looks very close. I left one more comment about testing the default setting and two nits, scratch my previous comment about "migration notest", I didn't see they were already there.

"Indexing a doc with No. nested objects less or equal to index.mapping.nested_objects.limit should succeed":
- skip:
version: " - 6.99.99"
reason: index.mapping.nested_objects setting was introduced from this version
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can you slightly rephrase so the version this feature was introduced is in the "reason" description (e.g. something like "the XYZ feature has been added in 7.0.0")? I think the skip "reason" is printed when running the tests and it might be useful to see the reason including the version at a glance.

"Indexing a doc with No. nested objects more than index.mapping.nested_objects.limit should fail":
- skip:
version: " - 6.99.99"
reason: index.mapping.nested_objects setting was introduced from this version
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

.endObject().endObject().endObject().string();
DocumentMapper docMapper = mapperService.documentMapperParser().parse("type", new CompressedXContent(mapping));

// parsing a doc with 3 nested objects succeeds
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the case where #nested_docs < limit is already covered by the later tests, could you change this to that the the document has more than the default number of nested docs (simple loop) and gets rejected? This way we can test the default value works if no explicit setting is there. I'd also check if that unit test then takes a lot more time (which I doubt, gut feeling check is okay here I think)

Add an index level setting `index.mapping.nested_objects.limit` to control
the number of nested json objects that can be in a single document
across all fields. Defaults to 10000.

Throw an error if the number of created nested documents exceed this
limit during the parsing of a document.

Closes elastic#26962
Add an index level setting `index.mapping.nested_objects.limit` to control
the number of nested json objects that can be in a single document
across all fields. Defaults to 10000.

Throw an error if the number of created nested documents exceed this
limit during the parsing of a document.

Closes elastic#26962
Add an index level setting `index.mapping.nested_objects.limit` to control
the number of nested json objects that can be in a single document
across all fields. Defaults to 10000.

Throw an error if the number of created nested documents exceed this
limit during the parsing of a document.

Closes elastic#26962
Copy link
Member

@cbuescher cbuescher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mayya-sharipova thanks a lot, this looks amost ready, I'm just a bit confused about the new test that checks the default setting and why it works (see line comment). Maybe you can clarify that, other than that I think this PR is ready.

docBuilder.startObject().field("field1", "11").field("field2", "21").endObject();
docBuilder.startObject().field("field1", "12").field("field2", "22").endObject();
docBuilder.startObject().field("field1", "13").field("field2", "23").endObject();
for(int i=0; i<=defaultMaxNoNestedDocs; i++) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: some whitespaces around the operators etc... would be nice

docBuilder.startObject().field("field1", "12").field("field2", "22").endObject();
docBuilder.startObject().field("field1", "13").field("field2", "23").endObject();
for(int i=0; i<=defaultMaxNoNestedDocs; i++) {
docBuilder.startObject().field("f", i).endObject();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm suprised this triggers the no. fields warnings since "f" is not the field thats defined with the "nested" type above? I would have expected this to not trigger the limit but the exception tested below seems to indicate it triggers the setting. What am I missing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cbuescher Please correct me if I misunderstood your previous comment: "could you change this to that the the document has more than the default number of nested docs (simple loop) and gets rejected".
This is what this new test is about.
We have a parent nested field nested1 under which we have many json objects with a field f, and since the number of these json objects is more than the default number, we get an error.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I probably read the test to quickly and missed the startArray() part. You are absolutely right, this is how I expected the test. Sorry for the noise.

}

public void testLimitNestedDocs() throws Exception{
// setting limit to allow only two nested objects in the whole doc
long nested_docs_limit = 2L;
long maxNoNestedDocs = 2L;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just as feedback, nothing to change really, but I liked the previous variable name better ;-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cbuescher thanks for the feedback, I thought in the previous variable names I did not follow Java camel case standards for variable names (was more of python style)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, and you are right, camel case is preferred. I probably misread the "NoNestedDocs" part of the name as "no nested docs" and that confused me for a second, but either way is fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cbuescher thanks Christoph, you are rightNoNestedDocs is confusing, I will avoid using No for "number of" in the future.

Add an index level setting `index.mapping.nested_objects.limit` to control
the number of nested json objects that can be in a single document
across all fields. Defaults to 10000.

Throw an error if the number of created nested documents exceed this
limit during the parsing of a document.

Closes elastic#26962
Copy link
Member

@cbuescher cbuescher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for bearing with me @mayya-sharipova ;-)

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>breaking :Data Management/Indices APIs APIs to create and manage indices and templates v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants