Elasticsearch High Level Client has a transitive dependency to Lucene #29184

Closed
yrodiere opened this issue Mar 21, 2018 · 18 comments

@yrodiere
Contributor

yrodiere commented Mar 21, 2018

Elasticsearch version (bin/elasticsearch --version): 6.2.3

Plugins installed: N/A

JVM version (java -version): any

OS version (uname -a if on a Unix-like system): any

Description of the problem including expected versus actual behavior:

The Elasticsearch High Level Client seems to depend on internal Elasticsearch code, and ends up with a transitive dependency on Lucene itself.
Given that the client is meant to talk to remote servers and performs no indexing itself, one would expect it to have no dependency on Lucene and to only manipulate JSON data.

This dependency is a problem for applications that also use Lucene locally for other purposes. It creates complex dependency management issues when trying to find a version of Lucene that suits both the Elasticsearch High Level Client and the application's own needs.
These applications cannot simply exclude the client's Lucene dependency, because the client seems to use Lucene's Version class at some point.

Would it be possible to trim down the dependency tree of the Elasticsearch High Level Client, excluding the Lucene dependency in particular, and avoiding the use of Lucene's Version class?

For the record, here is the dependency tree for the Elasticsearch High Level Client version 6.2.3:

org.elasticsearch.client:elasticsearch-rest-high-level-client:jar:6.2.3
+- org.elasticsearch:elasticsearch:jar:6.2.3:compile
|  +- org.elasticsearch:elasticsearch-core:jar:6.2.3:compile
|  +- org.apache.lucene:lucene-core:jar:7.2.1:compile
|  +- org.apache.lucene:lucene-analyzers-common:jar:7.2.1:compile
|  +- org.apache.lucene:lucene-backward-codecs:jar:7.2.1:compile
|  +- org.apache.lucene:lucene-grouping:jar:7.2.1:compile
|  +- org.apache.lucene:lucene-highlighter:jar:7.2.1:compile
|  +- org.apache.lucene:lucene-join:jar:7.2.1:compile
|  +- org.apache.lucene:lucene-memory:jar:7.2.1:compile
|  +- org.apache.lucene:lucene-misc:jar:7.2.1:compile
|  +- org.apache.lucene:lucene-queries:jar:7.2.1:compile
|  +- org.apache.lucene:lucene-queryparser:jar:7.2.1:compile
|  +- org.apache.lucene:lucene-sandbox:jar:7.2.1:compile
|  +- org.apache.lucene:lucene-spatial:jar:7.2.1:compile
|  +- org.apache.lucene:lucene-spatial-extras:jar:7.2.1:compile
|  +- org.apache.lucene:lucene-spatial3d:jar:7.2.1:compile
|  +- org.apache.lucene:lucene-suggest:jar:7.2.1:compile
|  +- org.elasticsearch:securesm:jar:1.2:compile
|  +- org.elasticsearch:elasticsearch-cli:jar:6.2.3:compile
|  |  \- net.sf.jopt-simple:jopt-simple:jar:5.0.2:compile
|  +- com.carrotsearch:hppc:jar:0.7.1:compile
|  +- joda-time:joda-time:jar:2.9.9:compile
|  +- org.yaml:snakeyaml:jar:1.17:compile
|  +- com.fasterxml.jackson.core:jackson-core:jar:2.8.10:compile
|  +- com.fasterxml.jackson.dataformat:jackson-dataformat-smile:jar:2.8.10:compile
|  +- com.fasterxml.jackson.dataformat:jackson-dataformat-yaml:jar:2.8.10:compile
|  +- com.fasterxml.jackson.dataformat:jackson-dataformat-cbor:jar:2.8.10:compile
|  +- com.tdunning:t-digest:jar:3.0:compile
|  +- org.hdrhistogram:HdrHistogram:jar:2.1.9:compile
|  +- org.apache.logging.log4j:log4j-api:jar:2.9.1:compile
|  \- org.elasticsearch:jna:jar:4.5.1:compile
+- org.elasticsearch.client:elasticsearch-rest-client:jar:6.2.3:compile
|  +- org.apache.httpcomponents:httpclient:jar:4.5.2:compile
|  +- org.apache.httpcomponents:httpcore:jar:4.4.5:compile
|  +- org.apache.httpcomponents:httpasyncclient:jar:4.1.2:compile
|  +- org.apache.httpcomponents:httpcore-nio:jar:4.4.5:compile
|  +- commons-codec:commons-codec:jar:1.10:compile
|  \- commons-logging:commons-logging:jar:1.1.3:compile
+- org.elasticsearch.plugin:parent-join-client:jar:6.2.3:compile
|  +- org.locationtech.spatial4j:spatial4j:jar:0.6:compile
|  +- com.vividsolutions:jts:jar:1.13:compile
|  \- org.apache.logging.log4j:log4j-core:jar:2.9.1:compile
+- org.elasticsearch.plugin:aggs-matrix-stats-client:jar:6.2.3:compile
\- org.elasticsearch.plugin:rank-eval-client:jar:6.2.3:compile

Steps to reproduce:

  1. Create an application depending on the Elasticsearch High Level Client, version 6.2.3.
  2. Add a dependency on a version of Lucene other than 7.2.1.
  3. Enjoy the JAR Hell.
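
One way to check for this situation at runtime is to ask the class loader for every location that supplies a given class file. The sketch below is a minimal illustration, assuming a plain classpath; `ClasspathDuplicates` and `providersOf` are hypothetical names, not part of Elasticsearch or Lucene. More than one result for `org.apache.lucene.util.Version` suggests that two Lucene JARs are competing, which tends to happen with fat JARs or manually assembled classpaths. When the build tool has already picked a single version, the symptom is more likely a linkage error at call time, which this probe will not detect.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration only.
final class ClasspathDuplicates {

    /** Returns every location visible to the class loader that provides the given class file. */
    static List<URL> providersOf(String className) {
        // Class files are looked up as resources, e.g. org/apache/lucene/util/Version.class
        String resource = className.replace('.', '/') + ".class";
        ClassLoader loader = ClasspathDuplicates.class.getClassLoader();
        try {
            return Collections.list(loader.getResources(resource));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<URL> providers = providersOf("org.apache.lucene.util.Version");
        if (providers.size() > 1) {
            System.out.println("Several JARs provide Lucene's Version class:");
        }
        // Print each location so the competing JARs can be identified.
        providers.forEach(System.out::println);
    }
}
```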

Provide logs (if relevant): N/A

@dadoonet
Member

dadoonet commented Mar 21, 2018

See a related discussion here: #23331 (comment)
Also, #28504 is needed before we can reach that goal.

Not sure we need to keep this issue open, though, as we know we want to go that way anyway.

@gsmet

gsmet commented Mar 21, 2018

@dadoonet I can see how the end goal would be to not depend on Elasticsearch itself, but AFAICS from the issues it does not seem to be something we would get soon.

Trimming down the dependency tree could be a good first step that can be done right away without too much work, as it would just require adding some exclusions.

@dadoonet
Member

IIRC it's because some builders depend on Lucene and some other classes like Version, as you mentioned. So excluding Lucene is just not that easy.

But that's my 2 cents on this. I'd prefer to have @javanna answer :)

@elasticmachine
Collaborator

Pinging @elastic/es-core-infra

@javanna
Member

javanna commented Mar 22, 2018

I am not at all sure that using Version is the only problem; it is certainly the first one to be encountered and the most obvious. I agree that this is odd, and the dependency on Version could be removed, but if we expect the high-level REST client to work without Lucene, we should also test this scenario and make sure it works. The end goal is to not depend on Elasticsearch, but it will take some time, and if there is a way to alleviate this pain earlier we should look into it. Like David mentioned above, #28504 is a big step in this direction.

@javanna
Member

javanna commented Mar 23, 2018

We discussed this today with the team as part of our FixItFriday. As said above, the plan is for the high-level REST client to not depend on Elasticsearch. There is work in progress to allow for this, for instance removing the Lucene dependency from our xcontent code. There is much more to be done, and we agreed that the Lucene Version class is only one of the problems. We want to get there, but at the moment we cannot give guarantees as we don't test this situation, and we think that trimming down the dependency tree is not going to work until we get there properly. It is not only a matter of excluding the dependency: Lucene classes are used in requests and responses, unfortunately.

@nik9000
Member

nik9000 commented Mar 23, 2018

We actually return some Lucene classes on the objects that come out of the high level rest client. SearchHit#explanation and our "funny" Text class at least expose Lucene. I'd like to clean those up as well but I figure they are lower down the list than things like org.apache.logging.log4j:log4j-core:jar:2.9.1:compile.

@yrodiere
Contributor Author

Thanks for the update. I see this is not a trivial issue. We will wait :)

@fbaligand
Contributor

I'm happy to see that the ES team is going in this direction: a lightweight client that is independent from the Elasticsearch server JARs.
Today, the high level client v6.2.4 with all its JAR dependencies weighs 24 MB. That's heavy for a client.

On the other hand, some parts of the Elasticsearch core code are very useful to keep in the high level client: the query builders.
That's why I would love for the query builders to be packaged in an independent module.
That would be great!

@ikaygorodov

A fix for this issue is highly anticipated. Trying to embed the Java High Level Client into an Alfresco system has become a nightmare.

@dforegger

Is there any plan for this on the roadmap? From the earlier conversation, it sounds like this is desirable and there was work towards this a year ago, but I don't see any target release/milestones on the ticket.

At the risk of overstepping...: This has been one of the few areas where Elastic falls well behind in feature comparison with other search applications. Writing queries/indexing (and index management operations) with the low-level client feels like reinventing the wheel, and there are concerns about the ongoing maintenance burden of keeping our JSON builders up to date with the REST APIs. I'd imagine the heavyweight client is also a concern for any cloud-based 'serverless' applications.

tl;dr: Would love to see this as a priority in the 7.x product :)

@ikaygorodov

As a temporary solution, the Maven Shade plugin could be used, as described for the Low Level Client:

https://www.elastic.co/guide/en/elasticsearch/client/java-rest/master/java-rest-low-usage-shading.html

An example that shades all of the dependencies is below.

<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-shade-plugin</artifactId>
            <version>3.1.0</version>
            <executions>
                <execution>
                    <phase>package</phase>
                    <goals><goal>shade</goal></goals>
                    <configuration>
                        <relocations>
                            <relocation>
                                <pattern>org.elasticsearch</pattern>
                                <shadedPattern>hidden.org.elasticsearch</shadedPattern>
                            </relocation>
                            <relocation>
                                <pattern>org.apache.lucene</pattern>
                                <shadedPattern>hidden.org.apache.lucene</shadedPattern>
                            </relocation>
                            <!-- These relocations could probably be removed -->
                            <relocation>
                                <pattern>com.fasterxml</pattern>
                                <shadedPattern>hidden.com.fasterxml</shadedPattern>
                            </relocation>
                            <relocation>
                                <pattern>org.apache.http</pattern>
                                <shadedPattern>hidden.org.apache.http</shadedPattern>
                            </relocation>
                            <relocation>
                                <pattern>org.apache.logging.log4j</pattern>
                                <shadedPattern>hidden.org.apache.logging.log4j</shadedPattern>
                            </relocation>
                            <relocation>
                                <pattern>org.HdrHistogram</pattern>
                                <shadedPattern>hidden.org.HdrHistogram</shadedPattern>
                            </relocation>
                            <relocation>
                                <pattern>org.slf4j</pattern>
                                <shadedPattern>hidden.org.slf4j</shadedPattern>
                            </relocation>
                            <relocation>
                                <pattern>org.yaml</pattern>
                                <shadedPattern>hidden.org.yaml</shadedPattern>
                            </relocation>
                            <relocation>
                                <pattern>joptsimple</pattern>
                                <shadedPattern>hidden.joptsimple</shadedPattern>
                            </relocation>
                            <relocation>
                                <pattern>org.joda.time</pattern>
                                <shadedPattern>hidden.org.joda.time</shadedPattern>
                            </relocation>
                            <relocation>
                                <pattern>org.apache.commons.logging</pattern>
                                <shadedPattern>hidden.org.apache.commons.logging</shadedPattern>
                            </relocation>
                            <relocation>
                                <pattern>org.apache.commons.codec</pattern>
                                <shadedPattern>hidden.org.apache.commons.codec</shadedPattern>
                            </relocation>
                            <relocation>
                                <pattern>com.tdunning</pattern>
                                <shadedPattern>hidden.com.tdunning</shadedPattern>
                            </relocation>
                            <relocation>
                                <pattern>com.github.mustachejava</pattern>
                                <shadedPattern>hidden.com.github.mustachejava</shadedPattern>
                            </relocation>
                            <relocation>
                                <pattern>com.carrotsearch</pattern>
                                <shadedPattern>hidden.com.carrotsearch</shadedPattern>
                            </relocation>
                        </relocations>
                    </configuration>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
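
Relocation patterns are matched against fully qualified class names, so each pattern has to be a Java package prefix. The sketch below is a simplified model of that prefix rewrite, not the plugin's actual implementation; `RelocationSketch` and `relocate` are hypothetical names, and include/exclude filters and resource paths are ignored.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified, hypothetical model of prefix-based class relocation.
final class RelocationSketch {

    /** Rewrites className with the first relocation whose pattern is a prefix of it. */
    static String relocate(String className, Map<String, String> relocations) {
        for (Map.Entry<String, String> relocation : relocations.entrySet()) {
            String pattern = relocation.getKey();
            if (className.startsWith(pattern)) {
                // Swap the matched prefix for the shaded one and keep the rest of the name.
                return relocation.getValue() + className.substring(pattern.length());
            }
        }
        // No pattern matched: the class keeps its original name.
        return className;
    }

    public static void main(String[] args) {
        Map<String, String> relocations = new LinkedHashMap<>();
        relocations.put("org.apache.lucene", "hidden.org.apache.lucene");

        System.out.println(relocate("org.apache.lucene.util.Version", relocations));
        System.out.println(relocate("java.util.List", relocations));
    }
}
```

Because the shaded copy lives under a different package, it no longer clashes with whatever Lucene version the application itself depends on.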

@dforegger

Thanks. We'll keep that in mind if we run into any more dependency issues. Still hoping we'll see a lightweight client someday soon.

@fbaligand
Contributor

If you want a lightweight client right now, there is Jest:
https://github.com/searchbox-io/Jest

@jesinity
Contributor

jesinity commented Aug 5, 2019

This really looks like high-hanging fruit.
In the org.elasticsearch.common package there are plenty of references to Lucene and other libraries.
So it seems to me that the common package should first be moved into one or more modules of its own (also moving part of it to core, which already contains common classes), which both the server and the high level client would then reference.

@loicmathieu

For the record, in Quarkus we add the following exclusions to the Elasticsearch High Level Client dependency.
I used jdeps to find what is really needed by the Elasticsearch High Level Client.
Maybe this can be done upstream.

        <dependency>
            <groupId>org.elasticsearch.client</groupId>
            <artifactId>elasticsearch-rest-high-level-client</artifactId>
            <!-- We exclude all lucene libraries that are not needed -->
            <!-- Jdeps shows that only lucene-core and lucene-queries are needed-->
            <!-- Native image needs also lucene-highlighter and lucene-join -->
            <exclusions>
                <exclusion>
                    <groupId>org.apache.lucene</groupId>
                    <artifactId>lucene-analyzers-common</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.apache.lucene</groupId>
                    <artifactId>lucene-backward-codecs</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.apache.lucene</groupId>
                    <artifactId>lucene-grouping</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.apache.lucene</groupId>
                    <artifactId>lucene-memory</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.apache.lucene</groupId>
                    <artifactId>lucene-misc</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.apache.lucene</groupId>
                    <artifactId>lucene-queryparser</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.apache.lucene</groupId>
                    <artifactId>lucene-sandbox</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.apache.lucene</groupId>
                    <artifactId>lucene-spatial</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.apache.lucene</groupId>
                    <artifactId>lucene-spatial-extras</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.apache.lucene</groupId>
                    <artifactId>lucene-spatial3d</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.apache.lucene</groupId>
                    <artifactId>lucene-suggest</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
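
Since exclusions like these are not tested upstream, a cheap safeguard is to verify at startup that the classes still expected on the classpath can be loaded. This is a minimal sketch under that assumption; `RequiredClassesCheck` and `missing` are hypothetical names, and the two Lucene classes listed are only examples taken from this thread (Version and the Explanation type exposed by SearchHit#explanation). It only covers the names it is given and does not prove that the exclusions are safe for every code path.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical startup check for illustration only.
final class RequiredClassesCheck {

    /** Returns the subset of classNames that cannot be loaded from the classpath. */
    static List<String> missing(List<String> classNames) {
        ClassLoader loader = RequiredClassesCheck.class.getClassLoader();
        List<String> absent = new ArrayList<>();
        for (String name : classNames) {
            try {
                // initialize=false: only check that the class resolves, do not run static initializers
                Class.forName(name, false, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                absent.add(name);
            }
        }
        return absent;
    }

    public static void main(String[] args) {
        // Example class names; adjust to what the application actually uses.
        List<String> required = Arrays.asList(
                "org.apache.lucene.util.Version",
                "org.apache.lucene.search.Explanation");
        List<String> absent = missing(required);
        if (!absent.isEmpty()) {
            throw new IllegalStateException("Classes missing after exclusions: " + absent);
        }
        System.out.println("All required classes are loadable");
    }
}
```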

@fbaligand
Contributor

Thanks for sharing, @loicmathieu!

@dakrone
Member

dakrone commented Mar 8, 2024

Closing this as we've removed the high level rest client in favor of the Java client.

dakrone closed this as completed Mar 8, 2024