Skip to content

CUDA parallelized SAXPY kernel defined by the BLAS (Basic Lineal Algebra Subroutine)

License

Notifications You must be signed in to change notification settings

MarcRamonMoreno/CUDA-SAXPY-KERNEL

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

12 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

CUDA SAXPY Kernel Implementation

Overview

This code provides a CUDA implementation of the SAXPY operation, a common computation in linear algebra. SAXPY stands for "Single-Precision A*X Plus Y", where A, X, and Y are vectors. The code performs this operation in parallel on a GPU using CUDA, NVIDIA's parallel computing platform and programming model. Requirements

NVIDIA GPU with CUDA support
CUDA Toolkit installed on your system

Description

The code is divided into two main parts:

CUDA Kernel (saxpy_parallel): This function runs on the GPU and performs the SAXPY computation. Each thread computes a single element of the result vector.

Host Code (main function): This part runs on the CPU. It allocates memory on the GPU, copies data from the host to the GPU, invokes the CUDA kernel, and then copies the result back to the host.

Key Components

saxpy_parallel: CUDA kernel function for the SAXPY operation.
main: Host function to set up CUDA environment and invoke the kernel.
n: Size of the input vectors.
alpha: Scalar value in the SAXPY operation.
x, y: Input vectors.
d_x, d_y: Device (GPU) copies of x and y.

Usage

Compile the code using the CUDA compiler (nvcc).
Run the executable. It will perform the SAXPY operation and print the results.

Important Considerations

The code assumes a certain size for the vectors (n = 2000), but this can be adjusted as needed.
The number of threads per block is set to 256, but this value can be tuned for different GPUs.
Memory allocation and free operations are crucial to avoid memory leaks.
Error checks for CUDA operations (like cudaMalloc, cudaMemcpy) are not included but are recommended for robust code.

Conclusion

This CUDA SAXPY implementation demonstrates basic GPU programming concepts, including memory management, kernel invocation, and parallel computation. It serves as a starting point for more complex GPU-accelerated applications. Before running the make command, ensure that the makefile has the correct paths and instructions for your system's architecture and the intended CUDA toolkit version.

About

CUDA parallelized SAXPY kernel defined by the BLAS (Basic Lineal Algebra Subroutine)

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published