Track 3rd party libs used in the dist package #17208

Open
kgyrtkirk opened this issue Oct 1, 2024 · 3 comments

Comments

@kgyrtkirk
Member

Description

It would be great to track the 3rd party dependencies in a way that requires an explicit change in the PR itself whenever a new one gets added. That would draw attention to them and could improve the situation.

Motivation

It seems there are quite a few versions of the same library in the distribution build. These probably landed via transitive dependencies, most likely without anyone reviewing them.

@kgyrtkirk
Member Author

I'll describe one approach - there might be others:

# do a full dist build like
mvn install -DskipTests -Pdist -Pbundle-contrib-exts

From there, we could keep a text file in the project that is supposed to match the list of jars in the dist build.
Sorting it by file name would show when the same jar is present in multiple places, and also when different versions of the same library are present.

tar tzf distribution/target/apache-druid-32.0.0-SNAPSHOT-bin.tar.gz | grep '\.jar$' | sed 's|.*/||' | grep -v '^druid' | sort > distribution/dist_jars.txt
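
To illustrate how such a sorted list can surface both problems, here is a minimal sketch. The function names `duplicate_copies` and `multiple_versions` are made up for this example, and the version parsing is an assumption: it treats everything after the first `-<digit>` as the version, so classifiers and unusually named jars are not handled correctly.

```shell
# Sketch only: both functions read jar file names (one per line) from stdin.

# Print file names that occur more than once, with their copy count.
duplicate_copies() {
  sort | uniq -c | awk '$1 > 1 { print $2 " x" $1 }'
}

# Print libraries that occur with more than one distinct version.
# Assumption: names look like <artifact>-<version>.jar and the version
# starts at the first "-<digit>" in the name.
multiple_versions() {
  sort -u |
    sed -e 's/\.jar$//' -e 's/-\([0-9]\)/ \1/' |
    awk 'NF == 2 { count[$1]++; versions[$1] = versions[$1] " " $2 }
         END { for (lib in count) if (count[lib] > 1) print lib ":" versions[lib] }' |
    sort
}
```

With the functions loaded, `duplicate_copies < distribution/dist_jars.txt` and `multiple_versions < distribution/dist_jars.txt` would give a quick overview of the list.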

If that list changes, the build should fail.
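
A minimal sketch of what such a check could look like, assuming it runs as a script step after the dist build. The function names `list_dist_jars` and `check_dist_jars` are hypothetical, and how the step gets wired into the Maven build is left open.

```shell
# Sketch only: fail when the jars in a dist tarball differ from a tracked list.

# Print the sorted file names of all non-Druid jars in the tarball $1.
# LC_ALL=C keeps the sort order independent of the machine's locale.
list_dist_jars() {
  tar tzf "$1" | grep '\.jar$' | sed 's|.*/||' | grep -v '^druid' | LC_ALL=C sort
}

# Compare the tracked list $1 with the tarball $2.
# Prints a unified diff and returns non-zero on any difference.
check_dist_jars() {
  if ! list_dist_jars "$2" | diff -u "$1" -; then
    echo "3rd party jar list changed; review the diff and update $1" >&2
    return 1
  fi
}
```

The tracked file should be regenerated with `list_dist_jars` itself so that both sides use the same sort order, for example `list_dist_jars distribution/target/apache-druid-32.0.0-SNAPSHOT-bin.tar.gz > distribution/dist_jars.txt`.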

There could also be a check to ensure that libs already in lib are reused via provided scope instead of being bundled again.

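# make a content list of all jar paths, excluding Druid's own jars
# (matching on the file name, so jars inside extensions/druid-*/ directories are kept)
tar tzf distribution/target/apache-druid-32.0.0-SNAPSHOT-bin.tar.gz | grep '\.jar$' | grep -v '/druid[^/]*$' > base.li
# this list should be empty: jars outside lib/ whose file name also exists in lib/
grep -F -f <(grep /lib/ base.li | sed 's|.*/||') base.li | grep -v '/lib/'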

@shigarg1
Contributor

I looked into this and have found two problems so far:

  1. Multiple copies of the same library version are bundled across multiple extensions.
  2. Different versions of the same dependency come in via transitive dependencies.

For the first, I found a way to reduce it to at most 3 copies, which shrank the distribution from 900M to 600M: #17321
I am still looking for a way to get it down to a single copy.

For the second, I found the Maven Enforcer dependencyConvergence rule: https://maven.apache.org/enforcer/enforcer-rules/dependencyConvergence.html
We can add the dependencies for which we know multiple versions are required to its excludes.

@abhishekagarwal87
Contributor

There is some work done in #16973 that might be usable here.


3 participants