Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validate API fails when validating Terms query #29033

Closed
jweber opened this issue Mar 13, 2018 · 5 comments · Fixed by #29483
Closed

Validate API fails when validating Terms query #29033

jweber opened this issue Mar 13, 2018 · 5 comments · Fixed by #29483
Assignees
Labels
>bug :Search/Search Search-related issues that do not fall into other categories

Comments

@jweber
Copy link

jweber commented Mar 13, 2018

Elasticsearch version: 6.2.2

Plugins installed: -

JVM version: 1.8.0_161

OS version : docker image

Description of the problem including expected versus actual behavior:

The validate API is failing when attempting to validate a valid terms query. The following error message is shown:

nested: IllegalStateException[async actions are left after rewrite];; java.lang.IllegalStateException: async actions are left after rewrite

Steps to reproduce:

PUT twitter
{}
PUT twitter/_mapping/_doc
{
  "properties": {
    "user": { "type": "integer" },
    "followers": { "type": "integer" }
  }
}
POST twitter/_validate/query?explain=true
{
  "query": {
    "terms": {
      "user": {
        "index": "twitter",
        "type": "_doc",
        "id": "2",
        "path": "followers"
      }
    }
  }

Provide logs (if relevant):

Response from the validate API:

{
  "valid": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "explanations": [
    {
      "index": "twitter",
      "valid": false,
      "error": """
[twitter/v460ppW7Q6SMXFcqf1Q5SA] QueryShardException[failed to create query: {
  "terms" : {
    "user" : {
      "index" : "twitter",
      "type" : "_doc",
      "id" : "2",
      "path" : "followers"
    },
    "boost" : 1.0
  }
}]; nested: IllegalStateException[async actions are left after rewrite];; java.lang.IllegalStateException: async actions are left after rewrite
"""
    }
  ]
}
@colings86 colings86 added >bug help wanted adoptme :Search/Search Search-related issues that do not fall into other categories labels Mar 14, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search-aggs

@pcsanwald
Copy link
Contributor

pcsanwald commented Apr 9, 2018

Some notes: I reproduced this issue on master, just to confirm it's still an issue in 7.x. The message in the Exception is thrown in only one place in the codebase, Rewritable.java, reproducing the query and inspecting with a debugger, the error appears to be stemming from (at debug time) 4 async tasks that are still on the context (QueryRewriteContext) while Rewritable.rewrite is being called.

Looking at the TermsQueryBuilder class itself, the doRewrite method does appear to register async actions via queryRewriteContext.registerAsyncAction. So it seems like Rewritable.rewrite is expecting a context that doesn't have any outstanding async actions, yet the TermsQueryBuilder.doRewrite method is returning a context that will most likely have outstanding async actions. Aforementioned async actions look like this:

screen shot 2018-04-09 at 3 26 47 pm

It seems like these async actions would get executed via QueryRewriteContext.executeAsyncActions, but I'm pretty sure that code path isn't getting called for this case (I was unable to catch a breakpoint in this method in a debugger).

Interested if anyone from search-aggs has any idea, based on the above, what the problem might be rooted from? I'm not sure if TermsQueryBuilder.doRewrite is doing the wrong thing here, if the assertion about async items is incorrect in Rewritable, or, if all this is just a red herring and the problem lies elsewhere?

@colings86
Copy link
Contributor

I think the problem here is down to us only calling one of the two rewrite phases for the QueryBuilder in the Validate Query API. There are two rewrites phases that happen on the search API:

  1. The first happens on the coordinating node, this is where we can do things like fetches so we ensure that all the requests going to the shards have the same query (they are not affected by different shards getting different versions of the lookup for example). This is the rewrite phase I think we are missing in the validate query api.
  2. The second rewrite happens on the shard and rewrites the query to a a more efficient form if it can. An example of this is rewriting a part of the query that contains a date range to an exists query if the range of dates in the shard is completely within the range in the query. This is the rewrite phase that I think is throwing the exception.

The first rewrite phase is called when we execute a search request in TransportSearchAction. I think we might need to add similar logic to TransportValidateQueryAction#doExecute() so the request is rewritten on the coordinating node before we send it to the shard.

@cbuescher cbuescher self-assigned this Apr 10, 2018
@cbuescher cbuescher removed the help wanted adoptme label Apr 10, 2018
@cbuescher cbuescher assigned pcsanwald and unassigned cbuescher Apr 10, 2018
@pcsanwald
Copy link
Contributor

pcsanwald commented Apr 10, 2018

So it's seeming like TransportSearchAction is only part of the problem (or, isn't the problem). I've added a commit here that shows what I tried, but, this doesn't fix the issue in that there are still asyncTasks on the context. In tracking this down, I think the problem is actually in TermsQueryBuilder.doRewrite: That method is called by Rewritable in this line: rewrittenBuilder = builder.rewrite(context), and so the context goes from having 0 async tasks to having several inside the execution of Rewritable.rewrite.

So, I think the code that's causing the issue here is TermsQueryBuilder.doRewrite:

queryRewriteContext.registerAsyncAction((client, listener) -> {
                fetch(termsLookup, client, ActionListener.wrap(list -> {
                    supplier.set(list);
                    listener.onResponse(null);
                }, listener::onFailure));
            });

So it seems like what's happening is:

  1. Rewritable.rewrite invokes TermsQueryBuilder.doRewrite
  2. TermsQueryBuilder.doRewrite adds some async items to the context
  3. Rewritable.rewrite, then checks the context for outstanding async tasks, throws an error if they exist.

If the above is accurate, what's the best strategy to fix this?

Seems like we could either Update Rewritable to block on async tasks, update TermsQueryBuilder.doRewrite not to use async tasks, or something similar?

Only a couple of the QueryBuilders add async actions to the context, GeoShape query and Percolate query builders. Interestingly, PercolateQueryBuilder.doRewrite never gets called when I do an explain on a percolate query.

@colings86
Copy link
Contributor

colings86 commented Apr 11, 2018

@pcsanwald I took a look at this and applied your changes locally. The code is actually doing almost everything right in that its correctly registering the async action and then executing it when it runs Rewriteable.rewriteAndFetch() and the resulting query is correct, has the resolved list of terms and no more async actions. The problem is that we are not using the rewritten query after we have done the rewrite and fetch and are just propagating the original query. The reason for this is this line of code. Here we are being passed the newly rewritten query in the lambda (the variable is called source but it is the rewritten query, we should probably rename this variable to rewrittenQuery) but we are calling super.doExecute(task, request, listener) without modifying the request to contain the rewritten query. I think adding request.query(source); above the call to super.doExecute() will solve this problem and the request return the expected response without error.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Search/Search Search-related issues that do not fall into other categories
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants