All-Pair-Shortest-Path GPU Implementation

Author: Yair Schiff
Project: Fall 2018 CSCI-GA 3033-004: GPUs
Instructor: Prof. Zahran

Overview

This project compares an efficient GPU implementation of the Floyd-Warshall (FW) algorithm for solving the all-pair-shortest-path (APSP) problem with that of a CPU implementation and other GPU implementations.

Requirements

CUDA compiler - module load cuda-9.1
OpenACC compiler - module load module load pgi

How to use

Compilation

To compile the CUDA version of the parallel FW algorithm use the Makefile provided in the src directory.

make ARCH=<sm architecture>

This will compile the FW.cu file and produce an executable FW binary for the GPU architecture provided in the argument ARCH. To remove this file you can run

make clean

To compile the OpenACC version of the code use one of the following two commands (once the compiler has been loaded)

(For GPU)

pgcc -o FW_acc -acc -Minfo FW_openacc.c

(For Multi-Core)

pgcc -o FW_multi -acc -ta=multicore -Minfo FW_openacc.c

The GPU binary file, FW_acc, and that from the multi-core compilations, FW_multi, will have the same usage as the CUDA code, as described below.

Usage

To run the algorithm on a CPU, GPU, or Multi-Core processor use the following command line from within the /src directory where the code was compiled (shown for the CUDA binary file):

./FW <path to input file> <N> <machine> <verbose>

The first argument <path to input file> points to the input matrix. You can use ../inputs/20181101.as-rel.txt to use the CAIDA AS Relationship data from the project report. Note, when using other inputs, the code assumes that the data is in the following format: i | j | weight\n, which represents a weighted directed edge from vertex i to vertex j (both of which are integers).

The second argument, indicates how much of the input matrix to use. For example, N=1000 will use the first 1,000 vertices of the graph.

The third argument should be either 0 (sequential implementation on CPU) or 1 (parallel implementation on GPU or multi-core).

The final argument should be 0, but when set to 1, will print an NxN table of the original input matrix.

Once the algorithm has finished running it will output the time taken and will save a file titled: solution_path_<N>.txt to the /outputs directory which contains the path and length (if a path exists) between all (i, j) vertex pairs.

Name		Name	Last commit message	Last commit date
Latest commit History 22 Commits
Images		Images
Results		Results
inputs		inputs
outputs		outputs
src		src
.gitignore		.gitignore
GPU_Project_report.pdf		GPU_Project_report.pdf
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

All-Pair-Shortest-Path GPU Implementation

Overview

Requirements

How to use

Compilation

Usage

About

Releases

Packages

Languages

yair-schiff/GPU_Project

Folders and files

Latest commit

History

Repository files navigation

All-Pair-Shortest-Path GPU Implementation

Overview

Requirements

How to use

Compilation

Usage

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages