RISC‐V

aidget edited this page Dec 20, 2023 · 1 revision

Welcome to the Aidget RISC-V wiki!

0. common functions

0.hello world

A minimal hello world project, useful for checking that the cross-compilation toolchain works.

$ adb shell "./aidget_riscv"
Hello World!

1.memcpy test

A small memcpy experiment. Note that the vsetvli, load, and store instructions differ between RVV v0.7 and v1.0; v0.7 is what is currently used here.

$ adb shell "./aidget_riscv"
memcpy_test: 0 1 2 3 4 5 6 7 8 9
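| Instruction | v0.7 | v1.0 | Notes |
| --- | --- | --- | --- |
| vsetvli | vsetvli t0, a2, e8, m8 | vsetvli t0, a2, e8, m8, ta, ma | Vectors of 8-bit elements |
| load | vlb.v v0, (a1) | vle8.v v0, (a1) | Load bytes |
| store | vsb.v v0, (a3) | vse8.v v0, (a3) | Store bytes |

The ta/tu/ma/mu flags that v1.0 adds to vsetvli mean:
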
ta   # Tail agnostic
tu   # Tail undisturbed
ma   # Mask agnostic
mu   # Mask undisturbed

Before v0.9, when these flags were not specified on vsetvli, they defaulted to mask undisturbed / tail undisturbed.
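To make the tail flags concrete, here is a minimal Python model of a vector write with active length vl. It is only a sketch: `write_vector` and `ALL_ONES` are hypothetical names, and the model always writes all 1s for the agnostic case, whereas real hardware is also allowed to keep the old values.

```python
ALL_ONES = 0xFF  # an 8-bit element with every bit set

def write_vector(dest, result, vl, tail_policy):
    """Model how a vector write with active length vl treats elements past vl."""
    out = list(dest)
    out[:vl] = result[:vl]  # body elements always receive the new result
    if tail_policy == "ta":
        # Tail agnostic: hardware may keep the old value OR write all 1s.
        # This model picks all 1s so the difference is visible.
        for i in range(vl, len(out)):
            out[i] = ALL_ONES
    elif tail_policy != "tu":
        raise ValueError("tail_policy must be 'ta' or 'tu'")
    # Tail undisturbed ("tu"): elements past vl keep their old values.
    return out

old = [9] * 8
new = list(range(8))
print(write_vector(old, new, 5, "tu"))  # [0, 1, 2, 3, 4, 9, 9, 9]
print(write_vector(old, new, 5, "ta"))  # [0, 1, 2, 3, 4, 255, 255, 255]
```

Code that relies on tail elements surviving a write therefore needs tu; code that never reads the tail can use ta.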

vsetvli t0, a2, e8

This example is the first appearance of the vsetvli instruction; a2 holds the length n.

  • 1st pass: a2 = 10 --> t0 = 8 --> a2 = 2
  • 2nd pass: a2 = 2 --> t0 = 2 --> a2 = 0 --> ret
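The two passes above can be reproduced with a short Python model of the strip-mining loop. This is a sketch under assumptions: VLMAX = 8 is taken from the trace (t0 = 8 on the first pass), vsetvli is modeled as min(avl, VLMAX), and the function names are hypothetical. On real hardware VLMAX = LMUL × VLEN / SEW.

```python
VLMAX = 8  # assumption: taken from the trace (t0 = 8 on the first pass)

def vsetvli(avl, vlmax=VLMAX):
    """Model vsetvli: grant at most vlmax elements per pass."""
    return min(avl, vlmax)

def memcpy_stripmine(src):
    """Copy src in chunks of vl elements, recording (a2 before, t0, a2 after)."""
    dst = [0] * len(src)
    remaining = len(src)  # plays the role of a2
    offset = 0            # plays the role of the a1/a3 pointers
    trace = []
    while remaining > 0:
        vl = vsetvli(remaining)  # t0
        dst[offset:offset + vl] = src[offset:offset + vl]  # vector load + store
        trace.append((remaining, vl, remaining - vl))
        remaining -= vl
        offset += vl
    return dst, trace

dst, trace = memcpy_stripmine(list(range(10)))
print("memcpy_test:", *dst)
for a2_before, t0, a2_after in trace:
    print(f"a2 = {a2_before} --> t0 = {t0} --> a2 = {a2_after}")
```

The loop needs no separate tail handling: the last pass simply gets a smaller vl.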

2.saxpy

SAXPY (Single-precision A times X Plus Y, also glossed as Scalar Alpha X Plus Y) is a function in the Basic Linear Algebra Subprograms (BLAS) library and a common computational kernel on vector processors.

y = αx + y, where α is a scalar and x and y are vectors.

$ adb shell "./aidget_riscv"
saxpy_test: 2.1 4.2 6.3 8.4 10.5 12.6 14.7 16.8 18.9 21.0
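As a scalar reference, the output above is consistent with a = 2.0, x = 1..10 and y = 0.1..1.0. Those inputs are an assumption inferred from the printed values, not taken from the test source, and `saxpy` here is a plain Python sketch rather than the project's implementation.

```python
def saxpy(a, x, y):
    """Return a*x + y element by element."""
    return [a * xi + yi for xi, yi in zip(x, y)]

# Assumed inputs, chosen to reproduce the printed values.
x = [float(i) for i in range(1, 11)]
y = [0.1 * i for i in range(1, 11)]

result = saxpy(2.0, x, y)
print("saxpy_test:", " ".join(f"{v:.1f}" for v in result))
# saxpy_test: 2.1 4.2 6.3 8.4 10.5 12.6 14.7 16.8 18.9 21.0
```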

vsetvli a4, a0, e32, m8

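This example uses vsetvli again. The m8 parameter sets LMUL = 8, so each instruction operates on a group of 8 consecutive vector registers; a0 holds the length n.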

n = 10 --> a0 = 10

a = 2.0 --> fa0 = 2.0

vsetvli a4, a0, e32, m8 # a4 = min(10,8) = 8

vlw.v v0, (a1) # v0-v7 = x0-x7 next: v0-v7 = x8-...

sub a0, a0, a4 # a0 = a0 - a4 = 10 - 8 = 2

slli a4, a4, 2 # a4 = a4 << 2 = 8*4 = 32 # a float occupies 4 bytes

add a1, a1, a4 # a1 pointed to x0, now points to x8

vlw.v v8, (a2) # load y0-y7 into v8-v15

vfmacc.vf v8, fa0, v0 # (v8-v15) = fa0 * (v0-v7) + (v8-v15)

vsw.v v8, (a2) # store the result back, starting at y0

add a2, a2, a4 # a2 pointed to y0, now points to y8
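The listing can be mirrored line by line in Python to check the register bookkeeping. This is a sketch with assumptions: VLMAX = 8 comes from the walk-through (a4 = min(10, 8)), pointers are modeled as element indices instead of byte addresses, and the while loop stands in for the loop-back branch, which the listing does not show. On real hardware VLMAX = LMUL × VLEN / SEW, so the chunk size may differ.

```python
VLMAX = 8  # assumption: taken from the walk-through (a4 = min(10, 8))

def saxpy_stripmine(n, a, x, y):
    """Strip-mined y = a*x + y; variable names follow the registers in the listing."""
    y = list(y)
    a0 = n  # elements still to process
    a1 = 0  # index into x (a byte pointer in the assembly)
    a2 = 0  # index into y (a byte pointer in the assembly)
    while a0 > 0:
        a4 = min(a0, VLMAX)        # vsetvli a4, a0, e32, m8
        v0 = x[a1:a1 + a4]         # vlw.v v0, (a1)
        a0 -= a4                   # sub a0, a0, a4
        a1 += a4                   # slli + add (no byte scaling needed here)
        v8 = y[a2:a2 + a4]         # vlw.v v8, (a2)
        v8 = [a * xv + yv for xv, yv in zip(v0, v8)]  # vfmacc.vf v8, fa0, v0
        y[a2:a2 + a4] = v8         # vsw.v v8, (a2)
        a2 += a4                   # add a2, a2, a4
    return y

# Assumed inputs, chosen to match the saxpy_test output shown earlier.
x = [float(i) for i in range(1, 11)]
y = [0.1 * i for i in range(1, 11)]
print(" ".join(f"{v:.1f}" for v in saxpy_stripmine(10, 2.0, x, y)))
```

With n = 10 the loop runs twice, processing 8 elements and then the remaining 2.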

3.memcpy bandwidth test

A small script for measuring memory bandwidth.

$ adb shell "./aidget_riscv"
memory_bandwidth_test:
0: memcpy bandwidth (read and write)
AVG     Method: MEMCPY  Elapsed: 0.09464        MiB: 100.00000  Copy: 1056.639 MiB/s
AVG     Method: DUMB    Elapsed: 0.60153        MiB: 100.00000  Copy: 166.243 MiB/s
AVG     Method: MCBLOCK Elapsed: 0.09546        MiB: 100.00000  Copy: 1047.582 MiB/s
1: flw bandwidth (read)
--> flw Memory read bandwidth is 2.732 GB/s
2: vlw bandwidth (read)
--> vlw[m8] Memory read bandwidth is 1.155 GB/s
--> vlw[m4] Memory read bandwidth is 1.276 GB/s
--> vlw[m2] Memory read bandwidth is 0.347 GB/s
--> vlw[m1] Memory read bandwidth is 0.320 GB/s
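The Copy column in the memcpy results is the amount of data divided by the elapsed time. The Python sketch below recomputes it from the printed figures and shows one simple way to time a copy. The helper names are hypothetical, the timing method is an assumption about what such a test does, and host-side Python timings are not comparable to the on-device numbers.

```python
import time

MIB = 1024 * 1024

def mib_per_s(num_bytes, seconds):
    """Bandwidth in MiB/s for num_bytes moved in the given time."""
    return num_bytes / MIB / seconds

def time_copy(size_bytes, repeats=5):
    """Average wall-clock time of copying a buffer of size_bytes."""
    src = bytearray(size_bytes)
    total = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        dst = bytes(src)  # one full read of src plus one full write of dst
        total += time.perf_counter() - start
    return total / repeats

# Recompute the DUMB row: 100 MiB in 0.60153 s (the log reports 166.243 MiB/s).
print(f"{mib_per_s(100 * MIB, 0.60153):.3f} MiB/s")

# Time a 16 MiB copy on the host running this script.
elapsed = time_copy(16 * MIB)
print(f"host copy: {mib_per_s(16 * MIB, elapsed):.1f} MiB/s")
```

Recomputing the MEMCPY and MCBLOCK rows the same way lands slightly below the logged values, presumably because the log rounds the elapsed time before printing.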