Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Detect assemblies with too many entries to fail shell script prepending #3140

Merged
merged 9 commits into from
May 6, 2024

Conversation

lefou
Copy link
Member

@lefou lefou commented May 3, 2024

This is an attempt to fix the issue of invalid assembly files which occurs when the following two conditions are met:

  • The JAR file has more than 65535 ZIP entries
  • The JAR has a prepended shell script

The issue was reported and analyzed in the following tickets:

This issue also hits other build tools, but it seems Mill is the only tools which automatically enables shell script prepending by default.

Since there is no real fix available, we simply try to detect the issue after the fact and fail the assemble task with a actionable error message.

To make the fix binary compatible, I had to deprecated the upstreamAssembly target in favor to the new upstreamAssembly2 target, which returns also the added ZIP entry count. Since this can be a behavioral change when users have overridden the upstreamAssembly target, I also added some warning messages with will detect this at runtime and provide actionable help.

@lefou lefou changed the base branch from repro-528 to main May 3, 2024 09:10
@lefou lefou requested a review from lihaoyi May 3, 2024 09:11
val problematicEntryCount = 65535
if (
prependScript.isDefined &&
(upstream.addedEntries + created.addedEntries) > problematicEntryCount
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it's really worth keeping track of the number of entries incrementally? v.s. creating the assembly first, scanning it, and then failing if it has too many entries. If the performance of scanning the assembly is acceptable (milliseconds to tens of milliseconds?) then that would save us a bunch of book-keeping passing around addedEntries values, and a bunch of churn in replacing upstreamAssembly with upstreamAssembly2

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It probably depends. When we scan the jar right after it's written, it should be in the file system cache of any reasonable OS and scanning should be fast. On the other side, just remembering the added count is quasi for free, esp. when we assume, that the upstream-assembly is the larger portion of the assembly and keeps stable.

I think I want to experiment what the result looks like and how it performs.

Copy link
Member Author

@lefou lefou May 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issues with scanning afterwards:

  • Scanning and counting entries is slow. Scanning the jar of test case noExe.large took almost 3 seconds (75516 entries), although we just wrote it
  • Java's ZipInputStream and JarInputStream aren't able to find any ZipEntry in a prependend jar making scanning after packaging non-trivial
  • Shelling out to some OS-installed zip tools doesn't seem right

Copy link
Member

@lihaoyi lihaoyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@lefou lefou merged commit 43843e9 into main May 6, 2024
38 checks passed
@lefou lefou deleted the fix-assembly branch May 6, 2024 09:28
@lefou lefou added this to the 0.11.8 milestone May 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants