Standard token filter removal causes exceptions after upgrade #50734

Closed
matriv opened this issue Jan 8, 2020 · 2 comments · Fixed by #50912
Labels: >bug, :Search Relevance/Analysis, Team:Search Relevance

Comments


matriv commented Jan 8, 2020

The removal of the standard token filter, combined with the way the relevant factories are cached, causes exceptions when querying or indexing documents into an index created before 7.0.0.

Reproduction steps:

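  • Create an index with a custom analyzer that uses the standard token filter in Elasticsearch 6.8.6, then map a text field to that analyzer: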
PUT /myindex
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type":      "custom", 
          "tokenizer": "standard",
          "char_filter": [
            "html_strip"
          ],
          "filter": [
            "lowercase",
            "asciifolding",
            "standard"
          ]
        }
      }
    }
  }
}

POST /myindex/_mapping/_doc
{
  "properties": {
    "title": {
      "type":     "text",
      "analyzer": "my_custom_analyzer"
    }
  }
}
  • Upgrade to 7.4.2 and then query the index or index a document:
GET /myindex/_search
{
	"query": {
		"match" : {
			"title" : "Lala la lalala as a developer adf"
		}
	}
}

or

POST /myindex/_doc
{
	"title" : "foo bar"
}

Either request throws an exception:

Caused by: java.lang.IllegalArgumentException: The [standard] token filter has been removed.
	at org.elasticsearch.indices.analysis.AnalysisModule.lambda$setupPreConfiguredTokenFilters$1(AnalysisModule.java:189) ~[elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.index.analysis.PreConfiguredTokenFilter.lambda$singletonWithVersion$2(PreConfiguredTokenFilter.java:66) ~[elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.index.analysis.PreConfiguredTokenFilter$1.create(PreConfiguredTokenFilter.java:132) ~[elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.index.analysis.CustomAnalyzer.createComponents(CustomAnalyzer.java:92) ~[elasticsearch-7.4.2.jar:7.4.2]
	at org.apache.lucene.analysis.AnalyzerWrapper.createComponents(AnalyzerWrapper.java:136) ~[lucene-core-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:05:56]
	at org.apache.lucene.analysis.Analyzer.tokenStream(Analyzer.java:199) ~[lucene-core-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:05:56]
	at org.elasticsearch.index.search.MatchQuery$MatchQueryBuilder.createQuery(MatchQuery.java:497) ~[elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.index.search.MatchQuery$MatchQueryBuilder.createFieldQuery(MatchQuery.java:386) ~[elasticsearch-7.4.2.jar:7.4.2]
	at org.apache.lucene.util.QueryBuilder.createBooleanQuery(QueryBuilder.java:96) ~[lucene-core-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:05:56]
	at org.elasticsearch.index.search.MatchQuery.parseInternal(MatchQuery.java:289) ~[elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.index.search.MatchQuery.parse(MatchQuery.java:281) ~[elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.index.query.MatchQueryBuilder.doToQuery(MatchQueryBuilder.java:426) ~[elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.index.query.AbstractQueryBuilder.toQuery(AbstractQueryBuilder.java:99) ~[elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.index.query.QueryShardContext.lambda$toQuery$1(QueryShardContext.java:305) ~[elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.index.query.QueryShardContext.toQuery(QueryShardContext.java:317) ~[elasticsearch-7.4.2.jar:7.4.2]
	... 17 more

The exception goes away if the node is restarted once more after the upgrade to >= 7.
It is caused by the way AnalysisModule#setupPreConfiguredTokenFilters registers the filter in the cache through PreConfiguredTokenFilter#singletonWithVersion. The caching strategy used is ONE, so there is a single factory rather than one per version. When the node starts for the first time on >= 7, several new internal indices are created:

[2020-01-07T18:43:52,363][INFO ][o.e.c.m.MetaDataIndexTemplateService] [matriv] adding template [.watch-history-10] for index patterns [.watcher-history-10*]
[2020-01-07T18:43:52,364][WARN ][o.e.c.s.MasterService    ] [matriv] took [43.8s], which is over [10s], to compute cluster state update for [create-index-template [.watch-history-10], cause [api]]
[2020-01-07T18:43:55,023][INFO ][o.e.c.m.MetaDataIndexTemplateService] [matriv] adding template [.slm-history] for index patterns [.slm-history-1*]
[2020-01-07T18:43:59,344][INFO ][o.e.x.i.a.TransportPutLifecycleAction] [matriv] adding index lifecycle policy [watch-history-ilm-policy]
[2020-01-07T18:43:59,467][INFO ][o.e.x.i.a.TransportPutLifecycleAction] [matriv] adding index lifecycle policy [slm-history-ilm-policy]
[2020-01-07T18:43:59,734][INFO ][o.e.c.r.a.AllocationService] [matriv] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[myindex][0]]]).

Those indices have an index creation version of 7.x.x, so the TokenFilterFactory is registered once with version 7.x.x. When the data index myindex is processed, it also gets 7.x.x as its version (with the ONE caching strategy there is no separate cached instance for version 6.x.x), and so the code below:

PreConfiguredTokenFilter.singletonWithVersion("standard", true, (reader, version) -> {
    if (version.before(Version.V_7_0_0)) {
        deprecationLogger.deprecatedAndMaybeLog("standard_deprecation",
            "The [standard] token filter is deprecated and will be removed in a future version.");
    } else {
        throw new IllegalArgumentException("The [standard] token filter has been removed.");
    }
    return reader;
});

leads to the exception.
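
The caching behaviour described above can be illustrated with a small toy model. This is only a sketch and not Elasticsearch code: `SingletonCacheSketch`, `SingletonCache`, `standardFilter`, and the integer version constants are hypothetical stand-ins, and the model assumes the factory captures the version of whichever index asks for it first.

```java
import java.util.function.BiFunction;
import java.util.function.Function;

// Toy model of the bug; none of these types belong to Elasticsearch.
final class SingletonCacheSketch {

    // Simplified stand-ins for index creation versions (major number only).
    static final int V6 = 6;
    static final int V7 = 7;

    /** Mimics the ONE strategy: a single factory, built for the first version that asks. */
    static final class SingletonCache {
        private final BiFunction<String, Integer, String> create;
        private Function<String, String> cached;

        SingletonCache(BiFunction<String, Integer, String> create) {
            this.create = create;
        }

        Function<String, String> get(int indexCreatedVersion) {
            if (cached == null) {
                // The version is captured here and never refreshed for later callers.
                cached = tokens -> create.apply(tokens, indexCreatedVersion);
            }
            return cached;
        }
    }

    /** Same version check as the "standard" filter registration quoted in the issue. */
    static String standardFilter(String tokens, int version) {
        if (version >= V7) {
            throw new IllegalArgumentException("The [standard] token filter has been removed.");
        }
        return tokens; // before 7.0 the filter is a no-op
    }

    /** Analyzes text for a 6.x index after the given index versions were processed first. */
    static String analyzeOldIndexAfter(int... earlierVersions) {
        SingletonCache cache = new SingletonCache(SingletonCacheSketch::standardFilter);
        for (int v : earlierVersions) {
            cache.get(v); // processing an index only fetches (and caches) the factory
        }
        try {
            return cache.get(V6).apply("foo bar");
        } catch (IllegalArgumentException e) {
            return "error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // 6.x index processed first: prints "foo bar"
        System.out.println(analyzeOldIndexAfter());
        // 7.x internal index processed first: prints the "has been removed" error
        System.out.println(analyzeOldIndexAfter(V7));
    }
}
```

In this model the outcome depends only on which index is processed first, which matches the observation that the error can disappear after another restart.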
matriv added the >bug and :Search Relevance/Analysis labels Jan 8, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-search (:Search/Analysis)

matriv assigned matriv and unassigned romseygeek Jan 13, 2020
matriv added a commit to matriv/elasticsearch that referenced this issue Jan 13, 2020
The `PreConfiguredTokenFilter#singletonWithVersion` uses the version
internally for the token filter factories but it registers only one
instance in the cache and not one instance per version. This can lead
to exceptions like the one described in elastic#50734 since the singleton
is created and cached using the creation version of the first index
that is processed.

Remove the `singletonWithVersion()` methods and use the
`elasticsearchVersion()` methods instead.

Fixes: elastic#50734
matriv added a commit that referenced this issue Jan 16, 2020
The PreConfiguredTokenFilter#singletonWithVersion uses the version
internally for the token filter factories but it registers only one
instance in the cache and not one instance per version. This can lead
to exceptions like the one described in #50734 since the singleton is
created and cached using the creation version of the first index
that is processed.

Remove the singletonWithVersion() methods and use the
elasticsearchVersion() methods instead.

Fixes: #50734
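
The idea behind this fix can be sketched with a toy cache that keeps one factory per index creation version. This is a hedged illustration, assuming that `elasticsearchVersion()` caches per version as the commit message implies; `PerVersionCacheSketch`, `PerVersionCache`, `standardFilter`, and the integer version constants are hypothetical names and not part of Elasticsearch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;

// Toy model of per-version caching; none of these types belong to Elasticsearch.
final class PerVersionCacheSketch {

    // Simplified stand-ins for index creation versions (major number only).
    static final int V6 = 6;
    static final int V7 = 7;

    /** Keeps one factory per index creation version instead of a single shared one. */
    static final class PerVersionCache {
        private final BiFunction<String, Integer, String> create;
        private final Map<Integer, Function<String, String>> factories = new HashMap<>();

        PerVersionCache(BiFunction<String, Integer, String> create) {
            this.create = create;
        }

        Function<String, String> get(int indexCreatedVersion) {
            // Each version gets its own factory, so a 7.x index cannot affect a 6.x one.
            return factories.computeIfAbsent(indexCreatedVersion,
                v -> tokens -> create.apply(tokens, v));
        }
    }

    /** Same version check as the "standard" filter registration quoted in the issue. */
    static String standardFilter(String tokens, int version) {
        if (version >= V7) {
            throw new IllegalArgumentException("The [standard] token filter has been removed.");
        }
        return tokens; // before 7.0 the filter is a no-op
    }

    /** Analyzes text for targetVersion after the given index versions were processed first. */
    static String analyze(int targetVersion, int... earlierVersions) {
        PerVersionCache cache = new PerVersionCache(PerVersionCacheSketch::standardFilter);
        for (int v : earlierVersions) {
            cache.get(v); // processing an index only fetches (and caches) its factory
        }
        try {
            return cache.get(targetVersion).apply("foo bar");
        } catch (IllegalArgumentException e) {
            return "error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // A 6.x index still works even though a 7.x index was processed first.
        System.out.println(analyze(V6, V7));
        // A 7.x index is still rejected even though a 6.x index was processed first.
        System.out.println(analyze(V7, V6));
    }
}
```

With one factory per version, the result no longer depends on the order in which indices are processed.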
matriv added a commit to matriv/elasticsearch that referenced this issue Jan 16, 2020
(cherry picked from commit 24e1858)
matriv added a commit to matriv/elasticsearch that referenced this issue Jan 16, 2020
(cherry picked from commit 24e1858)
matriv added a commit to matriv/elasticsearch that referenced this issue Jan 16, 2020
(cherry picked from commit 24e1858)
matriv added a commit that referenced this issue Jan 16, 2020
(cherry picked from commit 24e1858)
matriv added a commit that referenced this issue Jan 16, 2020
(cherry picked from commit 24e1858)
matriv added a commit that referenced this issue Jan 16, 2020
(cherry picked from commit 24e1858)

matriv commented Jan 16, 2020

master : 24e1858
7.x : fda25ed
7.6 : b65e293
7.5 : 3055eee

SivagurunathanV pushed a commit to SivagurunathanV/elasticsearch that referenced this issue Jan 23, 2020
javanna added the Team:Search Relevance label Jul 16, 2024