Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ClangOffloadBundler] Add file size to header #88827

Merged
merged 1 commit into from
Apr 19, 2024

Conversation

yxsamliu
Copy link
Collaborator

__hipRegisterFatBinary only accepts one pointer argument. It is expected to get the fat binary size from the header.

This patch adds a file size field to the header of the compressed bundle.

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' labels Apr 16, 2024
@llvmbot
Copy link
Collaborator

llvmbot commented Apr 16, 2024

@llvm/pr-subscribers-clang-driver

@llvm/pr-subscribers-clang

Author: Yaxun (Sam) Liu (yxsamliu)

Changes

__hipRegisterFatBinary only accepts one pointer argument. It is expected to get the fat binary size from the header.

This patch adds a file size field to the header of the compressed bundle.


Full diff: https://github.com/llvm/llvm-project/pull/88827.diff

3 Files Affected:

  • (modified) clang/include/clang/Driver/OffloadBundler.h (+7-5)
  • (modified) clang/lib/Driver/OffloadBundler.cpp (+28-13)
  • (modified) clang/test/Driver/clang-offload-bundler-zstd.c (+9-8)
diff --git a/clang/include/clang/Driver/OffloadBundler.h b/clang/include/clang/Driver/OffloadBundler.h
index 65d33bfbd2825f..a39464472d1219 100644
--- a/clang/include/clang/Driver/OffloadBundler.h
+++ b/clang/include/clang/Driver/OffloadBundler.h
@@ -97,6 +97,7 @@ struct OffloadTargetInfo {
 //
 // The format is as follows:
 // - Magic Number (4 bytes) - A constant "CCOB".
+// - Total file size (4 bytes).
 // - Version (2 bytes)
 // - Compression Method (2 bytes) - Uses the values from
 // llvm::compression::Format.
@@ -107,13 +108,14 @@ struct OffloadTargetInfo {
 class CompressedOffloadBundle {
 private:
   static inline const size_t MagicSize = 4;
+  static inline const size_t FileSizeFieldSize = sizeof(uint32_t);
   static inline const size_t VersionFieldSize = sizeof(uint16_t);
   static inline const size_t MethodFieldSize = sizeof(uint16_t);
-  static inline const size_t SizeFieldSize = sizeof(uint32_t);
-  static inline const size_t HashFieldSize = 8;
-  static inline const size_t HeaderSize = MagicSize + VersionFieldSize +
-                                          MethodFieldSize + SizeFieldSize +
-                                          HashFieldSize;
+  static inline const size_t UncompressedSizeFieldSize = sizeof(uint32_t);
+  static inline const size_t HashFieldSize = sizeof(uint64_t);
+  static inline const size_t HeaderSize =
+      MagicSize + FileSizeFieldSize + VersionFieldSize + MethodFieldSize +
+      UncompressedSizeFieldSize + HashFieldSize;
   static inline const llvm::StringRef MagicNumber = "CCOB";
   static inline const uint16_t Version = 1;
 
diff --git a/clang/lib/Driver/OffloadBundler.cpp b/clang/lib/Driver/OffloadBundler.cpp
index 77c89356bc76bb..23bf0c2322367d 100644
--- a/clang/lib/Driver/OffloadBundler.cpp
+++ b/clang/lib/Driver/OffloadBundler.cpp
@@ -1010,10 +1010,16 @@ CompressedOffloadBundle::compress(llvm::compression::Params P,
 
   uint16_t CompressionMethod = static_cast<uint16_t>(P.format);
   uint32_t UncompressedSize = Input.getBuffer().size();
+  uint32_t TotalFileSize = MagicNumber.size() + sizeof(TotalFileSize) +
+                           sizeof(Version) + sizeof(CompressionMethod) +
+                           sizeof(UncompressedSize) + sizeof(TruncatedHash) +
+                           CompressedBuffer.size();
 
   SmallVector<char, 0> FinalBuffer;
   llvm::raw_svector_ostream OS(FinalBuffer);
   OS << MagicNumber;
+  OS.write(reinterpret_cast<const char *>(&TotalFileSize),
+           sizeof(TotalFileSize));
   OS.write(reinterpret_cast<const char *>(&Version), sizeof(Version));
   OS.write(reinterpret_cast<const char *>(&CompressionMethod),
            sizeof(CompressionMethod));
@@ -1033,7 +1039,9 @@ CompressedOffloadBundle::compress(llvm::compression::Params P,
     double CompressionSpeedMBs =
         (UncompressedSize / (1024.0 * 1024.0)) / CompressionTimeSeconds;
 
-    llvm::errs() << "Compressed bundle format version: " << Version << "\n"
+    llvm::errs() << "Total file size (including headers): "
+                 << formatWithCommas(TotalFileSize) << " bytes\n"
+                 << "Compressed bundle format version: " << Version << "\n"
                  << "Compression method used: " << MethodUsed << "\n"
                  << "Compression level: " << P.level << "\n"
                  << "Binary size before compression: "
@@ -1069,21 +1077,26 @@ CompressedOffloadBundle::decompress(const llvm::MemoryBuffer &Input,
     return llvm::MemoryBuffer::getMemBufferCopy(Blob);
   }
 
+  size_t CurrentOffset = MagicSize;
+  uint32_t TotalFileSize;
+  memcpy(&TotalFileSize, Blob.data() + CurrentOffset, sizeof(uint32_t));
+  CurrentOffset += FileSizeFieldSize;
+
   uint16_t ThisVersion;
+  memcpy(&ThisVersion, Blob.data() + CurrentOffset, sizeof(uint16_t));
+  CurrentOffset += VersionFieldSize;
+
   uint16_t CompressionMethod;
+  memcpy(&CompressionMethod, Blob.data() + CurrentOffset, sizeof(uint16_t));
+  CurrentOffset += MethodFieldSize;
+
   uint32_t UncompressedSize;
+  memcpy(&UncompressedSize, Blob.data() + CurrentOffset, sizeof(uint32_t));
+  CurrentOffset += UncompressedSizeFieldSize;
+
   uint64_t StoredHash;
-  memcpy(&ThisVersion, Input.getBuffer().data() + MagicNumber.size(),
-         sizeof(uint16_t));
-  memcpy(&CompressionMethod, Blob.data() + MagicSize + VersionFieldSize,
-         sizeof(uint16_t));
-  memcpy(&UncompressedSize,
-         Blob.data() + MagicSize + VersionFieldSize + MethodFieldSize,
-         sizeof(uint32_t));
-  memcpy(&StoredHash,
-         Blob.data() + MagicSize + VersionFieldSize + MethodFieldSize +
-             SizeFieldSize,
-         sizeof(uint64_t));
+  memcpy(&StoredHash, Blob.data() + CurrentOffset, sizeof(uint64_t));
+  CurrentOffset += HashFieldSize;
 
   llvm::compression::Format CompressionFormat;
   if (CompressionMethod ==
@@ -1135,7 +1148,9 @@ CompressedOffloadBundle::decompress(const llvm::MemoryBuffer &Input,
     double DecompressionSpeedMBs =
         (UncompressedSize / (1024.0 * 1024.0)) / DecompressionTimeSeconds;
 
-    llvm::errs() << "Compressed bundle format version: " << ThisVersion << "\n"
+    llvm::errs() << "Total file size (from header): "
+                 << formatWithCommas(TotalFileSize) << " bytes\n"
+                 << "Compressed bundle format version: " << ThisVersion << "\n"
                  << "Decompression method: "
                  << (CompressionFormat == llvm::compression::Format::Zlib
                          ? "zlib"
diff --git a/clang/test/Driver/clang-offload-bundler-zstd.c b/clang/test/Driver/clang-offload-bundler-zstd.c
index 4485e57309bbbc..bbdf637f46077c 100644
--- a/clang/test/Driver/clang-offload-bundler-zstd.c
+++ b/clang/test/Driver/clang-offload-bundler-zstd.c
@@ -22,19 +22,20 @@
 // Check compression/decompression of offload bundle.
 //
 // RUN: clang-offload-bundler -type=bc -targets=hip-amdgcn-amd-amdhsa--gfx900,hip-amdgcn-amd-amdhsa--gfx906 \
-// RUN:   -input=%t.tgt1 -input=%t.tgt2 -output=%t.hip.bundle.bc -compress -verbose 2>&1 | \
-// RUN:   FileCheck -check-prefix=COMPRESS %s
+// RUN:   -input=%t.tgt1 -input=%t.tgt2 -output=%t.hip.bundle.bc -compress -verbose >%t.1.txt 2>&1
 // RUN: clang-offload-bundler -type=bc -list -input=%t.hip.bundle.bc | FileCheck -check-prefix=NOHOST %s
 // RUN: clang-offload-bundler -type=bc -targets=hip-amdgcn-amd-amdhsa--gfx900,hip-amdgcn-amd-amdhsa--gfx906 \
-// RUN:   -output=%t.res.tgt1 -output=%t.res.tgt2 -input=%t.hip.bundle.bc -unbundle -verbose 2>&1 | \
-// RUN:   FileCheck -check-prefix=DECOMPRESS %s
+// RUN:   -output=%t.res.tgt1 -output=%t.res.tgt2 -input=%t.hip.bundle.bc -unbundle -verbose >%t.2.txt 2>&1
+// RUN: cat %t.1.txt %t.2.txt | FileCheck %s
 // RUN: diff %t.tgt1 %t.res.tgt1
 // RUN: diff %t.tgt2 %t.res.tgt2
 //
-// COMPRESS: Compression method used: zstd
-// COMPRESS: Compression level: 3 
-// DECOMPRESS: Decompression method: zstd
-// DECOMPRESS: Hashes match: Yes
+// CHECK: Total file size (including headers): [[SIZE:[0-9]*]] bytes
+// CHECK: Compression method used: zstd
+// CHECK: Compression level: 3
+// CHECK: Total file size (from header): [[SIZE]] bytes
+// CHECK: Decompression method: zstd
+// CHECK: Hashes match: Yes
 // NOHOST-NOT: host-
 // NOHOST-DAG: hip-amdgcn-amd-amdhsa--gfx900
 // NOHOST-DAG: hip-amdgcn-amd-amdhsa--gfx906

Copy link
Contributor

@jhuber6 jhuber6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this ABI breaking since we're changing the size of the struct? Shouldn't that necessitate a new version?

Also unrelated, I wonder if there's a future where we can use the ClangOffloadPackager format in the HIP runtime.

@yxsamliu
Copy link
Collaborator Author

Isn't this ABI breaking since we're changing the size of the struct? Shouldn't that necessitate a new version?

Also unrelated, I wonder if there's a future where we can use the ClangOffloadPackager format in the HIP runtime.

I think you are right. Although this feature has not been used extensively, it is better to bump up the version to keep backward compatibility.

Currently we are moving the extraction of code object from fat binary from runtime to comgr. We can treat the offload packager format as another flavor of fat binary. We only need to pass a pointer to sufficient information to __hipRegisterFatBinary, and let runtime pass that pointer to comgr, then comgr do the extraction. comgr just needs to identify it is offload packager format and handle it properly.

@yxsamliu yxsamliu force-pushed the bundle-file-size branch 2 times, most recently from d18a8c9 to fec9509 Compare April 16, 2024 12:44
@jhuber6
Copy link
Contributor

jhuber6 commented Apr 16, 2024

Isn't this ABI breaking since we're changing the size of the struct? Shouldn't that necessitate a new version?
Also unrelated, I wonder if there's a future where we can use the ClangOffloadPackager format in the HIP runtime.

I think you are right. Although this feature has not been used extensively, it is better to bump up the version to keep backward compatibility.

Currently we are moving the extraction of code object from fat binary from runtime to comgr. We can treat the offload packager format as another flavor of fat binary. We only need to pass a pointer to sufficient information to __hipRegisterFatBinary, and let runtime pass that pointer to comgr, then comgr do the extraction. comgr just needs to identify it is offload packager format and handle it properly.

Yeah, it's got magic bytes so it should be fairly trivial so long as someone includes the code to extract it (Which is pretty simple as well).

@jhuber6
Copy link
Contributor

jhuber6 commented Apr 16, 2024

Seems the documentation builder is complaining, maybe something wrong with the .rst file.

@yxsamliu
Copy link
Collaborator Author

Seems the documentation builder is complaining, maybe something wrong with the .rst file.

It is passing now

__hipRegisterFatBinary only accepts one pointer argument. It is
expected to get the fat binary size from the header.

This patch adds a file size field to the header of the compressed
bundle.
@yxsamliu
Copy link
Collaborator Author

reorder the fields to be naturally aligned

@yxsamliu yxsamliu merged commit 78dca4a into llvm:main Apr 19, 2024
4 of 5 checks passed
aniplcc pushed a commit to aniplcc/llvm-project that referenced this pull request Apr 21, 2024
__hipRegisterFatBinary only accepts one pointer argument. It is expected
to get the fat binary size from the header.

This patch adds a file size field to the header of the compressed
bundle.
GZGavinZhao added a commit to GZGavinZhao/rocm-llvm-project that referenced this pull request Apr 21, 2024
A manual backport of
ROCm@78dca4a

Signed-off-by: Gavin Zhao <git@gzgz.dev>
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Apr 22, 2024
__hipRegisterFatBinary only accepts one pointer argument. It is expected
to get the fat binary size from the header.

This patch adds a file size field to the header of the compressed
bundle.

llvm#88827

This patch also cherry-picks the dependent changes:
e9901d8 124d0b7
 e733d7e 78dca4a

Change-Id: Ia49f3b4a6c81a27b7959c5d9b437496e8bce6657
GZGavinZhao pushed a commit to GZGavinZhao/rocm-llvm-project that referenced this pull request Jun 21, 2024
__hipRegisterFatBinary only accepts one pointer argument. It is expected
to get the fat binary size from the header.

This patch adds a file size field to the header of the compressed
bundle.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants