Skip to content

🔥🔥🔥 A collection of some awesome public CUDA, cuBLAS, TensorRT and High Performance Computing (HPC) projects.

Notifications You must be signed in to change notification settings

coderonion/awesome-cuda-and-hpc

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

12 Commits
 
 

Repository files navigation

Awesome-CUDA-and-HPC

Awesome

🔥🔥🔥 This repository lists some awesome public CUDA, cuBLAS, TensorRT and High Performance Computing (HPC) projects.

Contents

Awesome List

  • Awesome CUDA

    • codingonion/awesome-cuda-and-hpc : A collection of some awesome public CUDA, cuBLAS, TensorRT and High Performance Computing (HPC) projects.

    • Erkaman/Awesome-CUDA : This is a list of useful libraries and resources for CUDA development.

    • jslee02/awesome-gpgpu : 😎 A curated list of awesome GPGPU (CUDA/OpenCL/Vulkan) resources.

    • mikeroyal/CUDA-Guide : A guide covering CUDA including the applications and tools that will make you a better and more efficient CUDA developer.

Learning Resources

Frameworks

  • HPC Frameworks

    • BLAS : BLAS (Basic Linear Algebra Subprograms). The BLAS (Basic Linear Algebra Subprograms) are routines that provide standard building blocks for performing basic vector and matrix operations. The Level 1 BLAS perform scalar, vector and vector-vector operations, the Level 2 BLAS perform matrix-vector operations, and the Level 3 BLAS perform matrix-matrix operations.

    • LAPACK : LAPACK development repository. LAPACK — Linear Algebra PACKage. LAPACK is written in Fortran 90 and provides routines for solving systems of simultaneous linear equations, least-squares solutions of linear systems of equations, eigenvalue problems, and singular value problems. The associated matrix factorizations (LU, Cholesky, QR, SVD, Schur, generalized Schur) are also provided, as are related computations such as reordering of the Schur factorizations and estimating condition numbers. Dense and banded matrices are handled, but not general sparse matrices. In all areas, similar functionality is provided for real and complex matrices, in both single and double precision.

    • OpenBLAS : OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version. www.openblas.net

    • BLIS : BLAS-like Library Instantiation Software Framework.

    • NumPy : The fundamental package for scientific computing with Python. numpy.org

    • SciPy : SciPy library main repository. SciPy (pronounced "Sigh Pie") is an open-source software for mathematics, science, and engineering. It includes modules for statistics, optimization, integration, linear algebra, Fourier transforms, signal and image processing, ODE solvers, and more. scipy.org

    • Gonum : Gonum is a set of numeric libraries for the Go programming language. It contains libraries for matrices, statistics, optimization, and more. www.gonum.org/

    • YichengDWu/matmul.mojo : High Performance Matrix Multiplication in Pure Mojo 🔥. Matmul.🔥 is a high performance muilti-threaded implimentation of the BLIS algorithm in pure Mojo 🔥.

  • CUDA Frameworks

    • GPU Interface

      GPU接口
      • CPP Version
        • CCCL : CUDA C++ Core Libraries. The concept for the CUDA C++ Core Libraries (CCCL) grew organically out of the Thrust, CUB, and libcudacxx projects that were developed independently over the years with a similar goal: to provide high-quality, high-performance, and easy-to-use C++ abstractions for CUDA developers.

        • HIP : HIP: C++ Heterogeneous-Compute Interface for Portability. HIP is a C++ Runtime API and Kernel Language that allows developers to create portable applications for AMD and NVIDIA GPUs from single source code. rocmdocs.amd.com/projects/HIP/

      • Python Version
      • Rust Version
      • Julia Version
    • Performance Benchmark

    • Scientific Computing Framework

      科学计算框架
      • cuBLAS : Basic Linear Algebra on NVIDIA GPUs. NVIDIA cuBLAS is a GPU-accelerated library for accelerating AI and HPC applications. It includes several API extensions for providing drop-in industry standard BLAS APIs and GEMM APIs with support for fusions that are highly optimized for NVIDIA GPUs. The cuBLAS library also contains extensions for batched operations, execution across multiple GPUs, and mixed- and low-precision execution with additional tuning for the best performance.

      • CUTLASS : CUDA Templates for Linear Algebra Subroutines.

      • MatX : MatX - GPU-Accelerated Numerical Computing in Modern C++. An efficient C++17 GPU numerical computing library with Python-like syntax. nvidia.github.io/MatX

      • CuPy : CuPy : NumPy & SciPy for GPU. cupy.dev

      • GenericLinearAlgebra.jl : Generic numerical linear algebra in Julia.

      • custos-math : This crate provides CUDA, OpenCL, CPU (and Stack) based matrix operations using custos.

    • Machine Learning Framework

      • cuDNN : The NVIDIA CUDA® Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks. cuDNN provides highly tuned implementations for standard routines such as forward and backward convolution, attention, matmul, pooling, and normalization.

      • PyTorch : Tensors and Dynamic neural networks in Python with strong GPU acceleration. pytorch.org

      • PaddlePaddle : PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (『飞桨』核心框架,深度学习&机器学习高性能单机、分布式训练和跨平台部署). www.paddlepaddle.org/

      • flashlight/flashlight : A C++ standalone library for machine learning. fl.readthedocs.io/en/latest/

      • NVlabs/tiny-cuda-nn : Lightning fast C++/CUDA neural network framework.

      • yhwang-hub/dl_model_infer : his is a c++ version of the AI reasoning library. Currently, it only supports the reasoning of the tensorrt model. The follow-up plan supports the c++ reasoning of frameworks such as Openvino, NCNN, and MNN. There are two versions for pre- and post-processing, c++ version and cuda version. It is recommended to use the cuda version., This repository provides accelerated deployment cases of deep learning CV popular models, and cuda c supports dynamic-batch image process, infer, decode, NMS.

    • AI Inference Framework

      AI推理框架
    • Multi-GPU Framework

      多GPU框架
    • Robotics Framework

      机器人框架
      • Cupoch : Robotics with GPU computing.
    • ZKP and Web3 Framework

      零知识证明和Web3框架

Applications

Blogs

Videos

Jobs and Interview

Releases

No releases published

Packages

No packages published