
CSC3050 RISC-V Simulator with RVV

Posted: 2024-11-14 18:30:05
Tags: CSC3050, instruction, RISC, will, vector, RVV, vd

CSC3050 Project 3: RISC-V Simulator with RVV

1 Background

RISC-V, an open standard instruction set architecture (ISA), has rapidly become a pivotal force in academic research and industrial development due to its flexibility and open-source nature. Unlike proprietary ISAs, RISC-V offers developers the freedom to customize and extend the architecture, making it an ideal platform for innovation in research, education, and the design of specialized hardware. One of its most impactful extensions is the RISC-V Vector Extension (RVV), which introduces efficient vector processing capabilities, a cornerstone of modern high-performance computing. This is especially critical for applications like machine learning, cryptography, and scientific simulations, where parallel data processing is essential for improving computational speed and efficiency.

In this project, you are tasked with extending the QtRVSim RISC-V simulator to support vector operations by implementing some of the RVV instructions. After reviewing the number of cycles, you will get a feeling for how much faster this is than conducting element-wise operations.

Start early; this project can be time-consuming if you are not familiar with simulators.

2 QtRVSim

QtRVSim is a RISC-V CPU simulator for education; you can try its online version at this link. In case you want to try different instructions, you can refer to this page: RISC-V Instruction Set Specifications. A helpful video about using QtRVSim can be found on YouTube.

After familiarizing yourself with the QtRVSim manual, you can begin planning how to integrate RVV instructions into the existing implementation. The simulator's source code, written in C++ and including both the core simulation functions and the graphical user interfaces (GUIs), can be found in the repository at this link.

To test your modifications, QtRVSim offers two methods for simulating assembly code: the GUI or command-line prompts.

Note: For this project, you are not required to modify any of the GUI components. Your primary goal is to ensure that the RVV instructions function correctly when using command-line prompts. Another objective in this project is to reduce the number of cycles; the lower your cycle count, the better your score.

2.1 How to run

We give the example of running QtRVSim on Ubuntu from the terminal. You can follow these steps:

  1. We assume you already have the necessary packages for compiling C++. If not, you can easily find tutorials for them on the internet.
  2. Install Qt6 (Qt5 does not work in most cases) with sudo apt install qt6-base-dev. You might need to run sudo apt update first; make sure you are installing Qt6, not Qt5.
  3. Download QtRVSim from the given repository.
  4. Make a new directory for the build files (mkdir build; cd build).
  5. cmake -DCMAKE_BUILD_TYPE=Release /path/to/qtrvsim
  6. make -j X, where X is the number of threads you want to use.
  7. If everything goes correctly, you can use ./target/qtrvsim_cli --asm XXXXX.S to run your .S file.
  8. Via ./target/qtrvsim_cli --help, you can check all helpful arguments.

3 RVV Instructions

In this assignment, you are required to implement the following RVV instructions (assume the maximum vector size is 32):
  1. vsetvl rd, rs1, rs2: sets the vector length register vl to the value of rs1 and writes the same value to rd; also sets the register holding the vector element type to the value of rs2 (8/16/32).
  2. vadd.vv vd, vs2, vs1: adds two vectors vs2 and vs1, and stores the result in vd.
  3. vadd.vx vd, vs2, rs1: adds rs1 to each element of vector vs2, and stores the result in vd.
  4. vadd.vi vd, vs2, imm: adds the scalar value imm to each element of vector vs2, and stores the result in vd.
  5. vmul.vv vd, vs2, vs1: multiplies the two vectors vs2 and vs1 element-wise (the per-element products of a dot product; summing them is a separate reduction step), and stores the result in vd.
  6. vlw.v vd, (rs1): loads elements stored starting at rs1 into vector vd. The length to load depends on the length stored in vl and the unit length specified earlier.
  7. vsw.v vs3, (rs1): stores the elements of vector vs3 into memory starting at rs1. The length to store depends on the length stored in vl and the unit length specified earlier.

Figure 1: Matrix stored as vector

The whole point of this project is that, through the implementation, you will understand why vector operations are much faster than manipulating each element individually. For example, writing 100 elements into memory one element at a time requires 100 store instructions. Details of the instructions can be found in this manual.

Reminder: Do not forget to update vl when switching to operate on vectors with different lengths.

4 Matrix Multiplication

After implementing and testing the aforementioned functionalities, you are required to write a .S file that conducts matrix-to-matrix multiplication:

$C_{i,j} = \sum_{k} A_{i,k} B_{k,j}$

The actual matrix will be stored as a vector in memory, as shown in Figure 1. In order to conduct vector multiplication, the size of the matrix n × m will be given. We require you to generate two random matrices with sizes of 20 × 46 and 46 × 50, where the elements can be of your own choice.

5 Tricks

There are several tricks you can apply to reduce cycle counts.
  1. Reduction (required): This computes the sum of a vector's elements, but more efficiently. The basic requirement is to sum the elements one by one, which leads to excessive cycles. Another approach is a binary split, i.e. repeatedly decompose a vector of size n into 2 vectors of size n//2 and add them with vadd. There are also other tricks for conducting reduction, and you can explore any of them. Possible reductions:
     (a) scalar loop
     (b) vector shift
     (c) reduction instruction
     (d) ...
  2. Chaining (extra credit): When conducting vector operations, it is not necessary to wait for the entire instruction to complete. As shown in Figure 2, it is possible to conduct VADD on the first element right after obtaining the first element of VMUL. A much better illustration can be found in Prof. Hsu's slides at this link.

Figure 2: chaining

6 Instruction on Implementation

The code involved in QtRVSim is quite complicated. Luckily, you only need to focus on a few source files.
  1. src/machine/instruction.cpp: Edit this file to add new instructions. The boxed fields are:
  • instruction name
  • instruction enum type (you can edit this by yourself; no need to follow the example)
  • input types (you can go through instruction.cpp to see what char is for what type)
  • machine code (hexadecimal)
  • mask for effective bits for instruction (hexadecimal)
  • custom flags (you can edit this by yourself; no need to follow the example)
  2. src/machine/core.cpp: Main pipeline of the simulator. You can find fetch, decode, execute, writeback, and memory in it, and edit this code for your convenience.
  3. src/machine/execute/alu.cpp: Specifies what to do for each ALU operation. You can create/edit this code for your own convenience.

Other files might also interest you, but we will not go through all of them here. Feel free to modify any code as long as it works.

Notice1: You need to use state.cycle_count++; in core.cpp when needed.
Notice2: If you want to use v1, v2, ... as the vector registers, you can modify parse_reg_from_string() in instruction.cpp.
Notice3: You might want to check dt.num_rt, dt.num_rd, dt.num_rs for specific register indexing.
Notice4: The largest vector register length is 32. A load instruction has a memory latency of 32. Besides, multiplication takes 4 cycles. (This means that, to load a vector of length 10, the total cycle count will be 1 + 1 + 32 + 10 + 1 + 1 = 46.)

7 Grading Criteria

The maximum score you can get for this lab is 100 points. We will first examine the correctness of your outputs on the test cases. Since hard-coding each operation is fairly easy in C++, we will also check the execution information, such as the number of cycles and the contents of memory and registers. Using ChatGPT to improve writing, generate code, or provide ideas is allowed and highly recommended, as ChatGPT has become one of the best productivity tools. Conducting "higher-level" reduction or finishing the task with fewer cycles will be granted extra credit.

You are also required to compose a report, where you should show the results of your test case executions. Besides, you also need to show the total number of cycles and explain where those cycles come from (a few sentences; no need to be super specific).

The deadline of this project is 23:59, Tuesday, 2024/11/19. For each day after the deadline, 10 points will be deducted from your final score, up to 30 points, after which you will get 0 points.

Besides, if anyone is interested in developing with Qt, you are more than welcome to implement GUI support for the RVV instructions. If done properly, you will earn extra credit, and might contribute to future contents of this class.

Feel free to ask questions if you find anything confusing.

8 Submission

You should make sure your code compiles and runs. Then, it should be compressed into a .zip file and submitted to BlackBoard. Any necessary instructions to compile and run your code should also be documented and included. Finally, you are also required to include a report containing the results of your test case execution.

From: https://www.cnblogs.com/comp9021T2/p/18545751
