
GPGPU-Sim enabled Turing WMMA API and its benchmark results. Undergraduate study at Yonsei Univ.


DooHyun-Lee/GPGPU_Sim-Enabled-Turing-WMMA-API


Abstract

As of December 2020, GPGPU-Sim supports NVIDIA tensor cores only up to the first generation (Volta). This distribution consists of a GPGPU-Sim build with the Turing WMMA API enabled, together with its benchmark results. Each directory inside the Benchmark directory contains the hardware benchmark results and the corresponding results from the revised GPGPU-Sim.
 This study proposes a microarchitecture for the tensor core in the Turing architecture. Because NVIDIA does not disclose the internals of the tensor core, it has to be profiled through microbenchmarking. Previous studies have also dissected NVIDIA GPUs, but they did not cover the experimental features of the Turing architecture, namely the INT4 (4-bit integer) and B1 (1-bit binary) operation modes. This study analyzes all of these features.
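The two experimental modes differ from fp16 mainly in how operands are packed and combined. For B1, the CUDA C++ Programming Guide describes the default operation as a bitwise XOR of the packed 1-bit operands followed by a population count; for u4, eight 4-bit values share one 32-bit storage word. The sketch below is a CPU reference one might use to sanity-check simulator output against hardware output. It is not code from this repository: the function names, the row/column packing layout, and the low-nibble-first element order are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Portable 32-bit population count (C++20 code could use std::popcount).
inline int popcount32(uint32_t x) {
    int n = 0;
    while (x) { x &= x - 1; ++n; }  // clear the lowest set bit each iteration
    return n;
}

// B1 reference: D = C + sum_w popc(A_w XOR B_w).
// Assumed layout (hypothetical): A is M rows x W words, each row contiguous;
// B is N columns x W words, each column contiguous; 32 one-bit elements per word.
std::vector<int32_t> b1_gemm_ref(const std::vector<uint32_t>& a,
                                 const std::vector<uint32_t>& b,
                                 const std::vector<int32_t>& c,
                                 int M, int N, int W) {
    assert(a.size() == std::size_t(M) * W);
    assert(b.size() == std::size_t(N) * W);
    assert(c.size() == std::size_t(M) * N);
    std::vector<int32_t> d(c);  // start from the accumulator C
    for (int i = 0; i < M; ++i)
        for (int j = 0; j < N; ++j)
            for (int w = 0; w < W; ++w)
                d[i * N + j] += popcount32(a[i * W + w] ^ b[j * W + w]);
    return d;
}

// u4 packing: eight unsigned 4-bit values per 32-bit word.
// Element 0 in the lowest nibble is an assumption for this sketch.
uint32_t pack_u4(const uint8_t v[8]) {
    uint32_t word = 0;
    for (int i = 0; i < 8; ++i) {
        assert(v[i] < 16);  // each value must fit in 4 bits
        word |= uint32_t(v[i]) << (4 * i);
    }
    return word;
}

uint8_t unpack_u4(uint32_t word, int i) {
    return uint8_t((word >> (4 * i)) & 0xFu);
}

// Dot product of two packed u4 vectors of W words (8 * W elements),
// accumulated in a 32-bit integer like the WMMA int accumulator.
int32_t u4_dot_ref(const uint32_t* a, const uint32_t* b, int W) {
    int32_t acc = 0;
    for (int w = 0; w < W; ++w)
        for (int i = 0; i < 8; ++i)
            acc += int32_t(unpack_u4(a[w], i)) * int32_t(unpack_u4(b[w], i));
    return acc;
}
```

For example, packing {1, 2, 3, 4, 5, 6, 7, 15} gives the word 0xF7654321, and XOR-ing the B1 words 0xFFFFFFFF and 0x0000FFFF leaves 16 set bits.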

Repository Structure

  • gpgpu-sim
    • GPGPU-Sim enabled Turing WMMA API
  • Benchmark
    • b1 (1-bit)
    • u4 (unsigned 4-bit)
    • u8 (unsigned 8-bit)
    • fp16 (16-bit floating-point)
    • mixed (mixed precision)
  • Paper
    • Thesis paper
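Each benchmark directory corresponds to a precision that the WMMA API tiles with a fixed fragment shape. According to the CUDA C++ Programming Guide, on sm_75 fp16 and 8-bit integer fragments use m16n16k16 (also m32n8k16 and m8n32k16), experimental u4/s4 uses m8n8k32, and experimental b1 uses m8n8k128. The sketch below counts how many warp-level multiply-accumulate operations a given matrix size implies, which helps when choosing sizes in test.cu. It assumes that "mixed" means fp16 inputs with fp32 accumulation and that the 16x16x16 shape is used for fp16/u8; the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

struct WmmaShape { int m, n, k; };

// Fragment shape per benchmark mode (assumed mapping; fp16/u8 also allow
// m32n8k16 and m8n32k16, which this sketch does not model).
WmmaShape turing_shape(const std::string& mode) {
    if (mode == "fp16" || mode == "mixed" || mode == "u8") return {16, 16, 16};
    if (mode == "u4") return {8, 8, 32};
    if (mode == "b1") return {8, 8, 128};
    assert(false && "unknown mode");
    return {0, 0, 0};
}

// Warp-level MMA operations for an M x N x K GEMM; every dimension is
// assumed to be a multiple of the corresponding tile dimension.
int64_t warp_mma_count(int64_t M, int64_t N, int64_t K, WmmaShape s) {
    assert(M % s.m == 0 && N % s.n == 0 && K % s.k == 0);
    return (M / s.m) * (N / s.n) * (K / s.k);
}
```

For a 1024 x 1024 x 1024 problem this gives 262,144 operations in fp16, 524,288 in u4, and 131,072 in b1, because the sub-byte modes trade a smaller M x N tile for a much deeper K.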

Recommended environment for running the benchmarks

Hardware benchmarking

  1. Go to the benchmark directory you want to run.
  2. Set the matrix size in test.cu inside the hard directory.
  3. $ make
  4. See the results in the log file.
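The exact log format written by test.cu is not described here. If it reports the kernel's elapsed time in milliseconds (for example, as measured with cudaEventElapsedTime), a GEMM's throughput can be derived by counting one multiply and one add per (i, j, k) term, that is 2·M·N·K operations. The helper below is a hypothetical sketch of that arithmetic, not part of the repository.

```cpp
#include <cassert>
#include <cstdint>

// Tera-operations per second for an M x N x K GEMM that ran for elapsed_ms.
// Counts 2 * M * N * K operations (one multiply and one add per term).
double gemm_tops(int64_t M, int64_t N, int64_t K, double elapsed_ms) {
    assert(elapsed_ms > 0.0);
    double ops = 2.0 * double(M) * double(N) * double(K);
    return ops / (elapsed_ms * 1e-3) / 1e12;  // ms -> s, then ops/s -> TOPS
}
```

A 1024 x 1024 x 1024 GEMM finishing in 1 ms corresponds to about 2.15 TOPS. For b1 and u4, whether an "operation" counts per bit or per element is a reporting choice, so it should match whatever convention the comparison uses.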

GPGPU-Sim benchmarking

  1. Build GPGPU-Sim (check that it is built against CUDA 10 or higher).
  2. Set the matrix size in test.cu inside the sim directory.
  3. $ make
  4. Check the results reported by the simulator.
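GPGPU-Sim reports timing in simulated core cycles (for example, the gpu_sim_cycle and gpu_tot_sim_cycle statistics), while the core clock in MHz is the first field of -gpgpu_clock_domains in gpgpusim.config. To place simulated and hardware results side by side, one option is to convert cycles to milliseconds, or to compare operations per cycle, which does not depend on the clock. The helper names below are hypothetical and assume the configured clock matches the target GPU.

```cpp
#include <cassert>
#include <cstdint>

// Simulated cycles -> milliseconds at the configured core clock (MHz).
double sim_cycles_to_ms(int64_t cycles, double core_mhz) {
    assert(core_mhz > 0.0);
    return double(cycles) / (core_mhz * 1e3);  // MHz * 1e3 = cycles per ms
}

// Operations per core cycle for an M x N x K GEMM (2 * M * N * K operations).
double gemm_ops_per_cycle(int64_t M, int64_t N, int64_t K, int64_t cycles) {
    assert(cycles > 0);
    return 2.0 * double(M) * double(N) * double(K) / double(cycles);
}
```

At a 1365 MHz core clock, 1,365,000 simulated cycles correspond to 1 ms.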

Results

  • Proposed 2nd Gen tensor core architecture

  • Benchmark results





Languages

  • C++ 47.7%
  • C 34.8%
  • Cuda 14.1%
  • Python 1.6%
  • Yacc 0.6%
  • Makefile 0.6%
  • Other 0.6%