首页 > 其他分享 >计算机体系结构第三章习题存档

计算机体系结构第三章习题存档

时间:2023-02-12 11:55:38浏览次数:48  
标签:R4 pipeline R1 R2 存档 branch 习题 cycles 体系结构

3 Pipelining

3.1 A Simple Implementation of DLX

Basic steps: IF→ID→EX→MEM→WB

Read Page 3-3 to 3-5 for details.

3.2 The Basic Pipeline for DLX

During MEM stage, branch instruction does nothing because it has store the res during IF stage.

Speedup for pipelining: 

Example: 

 * Assume:

clock: 10ns;

ALU: 4 cycles, 40%;

branches: 4 cycles, 20%;

memory operations: 5 cycles, 40%.

* Suppose: due to skew and setup, pipelining machine adds to clock overhead=1ns.

* how much speedup we will gain from pipeline ?

Ans: 

unpipelined: 10*(0.4*4+0.2*4+0.4*5)=44ns

pipelined: 10+1=11ns

speedup: 44/11=4

* clock cycle in pipeline must be equal, which means it should be the largest time for any stage.

3.3 The Major Hurdle of Pipelining–Pipeline Hazards

Structural hazards: resource conficts, and some functional unit is not fully pipelined, caused mainly by float calculations.

Data hazards: data used before calculated

Control hazards: related jump

3.4 Data Hazards

Solving data hazards with forwarding, and structural hazards will be sovled together.

RAW, the most common hazard

WAW, which will no happen in DLX

WAR, example in 3-27

But not all hazards can be solved with forwarding. (3-27)

About implementing Control for DLX Pipeline, to check the next 3 lines is enough to detect hazards. (3-33)

3.5 Control Hazards

When it comes to control hazards, the most nature idea is to stall two stages. To avoid stalls, we should both find out whether branch is taken and compute the taken PC.

We compute PC in EX at first. Now we move it into ID, which needs one more adder, and save 1 stall. In the same way, we write PC in IF.

Then we try to treat each branch as taken or not, and as a compromise, the branch delay slots appears. The best choice is to take irrelated lines. Second choice is to copy from the target, or untaken branch.

3.7 Overcoming Data Hazard with Dynamic Scheduling

**The Tomasulo Approach**

Refer 3-66 to recall the process.

Example:

Using the following code fragment:

Loop: LW R1, 0(R2);   load R1 from address 0+R2

ADDI R1, R1, #1;     R1=R1+1

SW 0(R2), R1;       store R1 at address 0+R2

ADDI R2, R2, #4;        R2=R2+4

SUB R4, R3, R2;        R4=R3-R2

BNEZ R4, Loop;      Branch to loop if R4!=0

Assume that the initial value of R3 is R2+396. Throughout this exercise use the classic RISC five-stage integer pipeline and assume all memory access take 1 clock cycle.

A. Show the timing of this instruction sequence for the RISC pipeline without any forwarding or bypassing hardware but assuming a register read and a write in the same clock cycle “forwards” through the register file. Assume that the branch is handled by flushing the pipeline. If all memory references take 1 cycle, how many cycles does this loop take to execute?

Ans: First, the cycle will repeat 396/4=99 times.

Without any optimizing, all the results can only be gained after MEM (the WB stage can write before read). The PC will be written at MEM. So the pipeline will be like the following:

Thus, the answer will be 17×98+18=1684 clock cycles.

B. Show the timing of this instruction sequence for the RISC pipeline with normal forwarding and bypassing hardware. Assume that the branch is handled by predicting it as not taken. If all memory reference take 1 cycle, how many cycles does this loop take to execute?

Ans: Now the forwarding is allowed, so the only stall will occur in the situation where a register is used after load operation.

The answer is 10×98+11=991 clock cycles.

C. Assume the RISC pipeline with a single-cycle delayed branch and normal forwarding and bypassing hardware. Schedule the instructions in the loop including the branch delay slot. You may reorder instructions and modify the individual instruction operands, but do not undertake other loop transformations that change the number or opcode of the instructions in the loop. Show a pipeline timing diagram and compute the number of cycles needed to execute the entire loop.

Ans: The reordered program can be executed without any stalls:

Loop: LW R1,0(R2);   load R1 from address 0+R2

ADDI R2,R2,#4;      R2=R2+4

SUB R4,R3,R2;       R4=R3-R2

ADDI R1,R1,#1;      R1=R1+1

BNEZ R4,Loop;      Branch to loop if R4!=0

SW -4(R2),R1;     store R1 at address 0+R2

So the answer is 6×98+10=598 clock cycles.

标签:R4,pipeline,R1,R2,存档,branch,习题,cycles,体系结构
From: https://www.cnblogs.com/capterlliar/p/17112594.html

相关文章

  • 算法导论 34 章习题
    在学校学whk的空闲~\(\bold{34.1}\)\(\bold{34.1-1}\)若\(\operatorname{LONGEST-PATH-LENGTH}\)可以在多项式时间内解决,那么只需调用它并将其答案与给定的\(k\)......
  • 云计算的体系结构
    云计算的体系结构由5部分组成,分别为应用层,平台层,资源层,用户访问层和管理层,云计算的本质是通过网络提供服务,所以其体系结构以服务为核心。如下图:1,资源层资源池层是指基础架构......
  • 前端复习题记录
    异步操作有哪些?回调函数,事件监听,promise,ajax,async,setTimeout,GeneratorPromise是什么?Promise是异步编程的一种解决方案。从语法上讲,promise是一个对象,通过它可以......
  • 深度学习与神经网络练习题
    以下题来源于博思自测......
  • 计算机体系结构第一章习题存档
    课本:ComputerArchitecture:AQuantitativeApproachAuthor:JohnL.HennessyandDavidA.Patterson1 FundamentalsofComputerArchitecture1.1 LayersofC......
  • 计算机网络体系结构
    计算机网络所划分的层次以及各层协议的集合即为计算机网络体系结构。常见的三种划分方法:1.OSI参考模型国际标准化组织:开放系统互连(OpenSystemInterconnection,OSI)参考模......
  • 开学考前练习题的一点小思路
    大概是因为比较笨,刚看到王老师发的参考练习题时我是没有看懂的。原本本着熟悉一下流程的想法试了试,结果除了实现登陆外就没有什么太大的进展。后来又研究了研究,有了一点......
  • ARM体系结构与汇编
    ARM体系结构在ARM处理器中,除了用户模式是普通模式,剩下的几种工作模式都属于特权模式。应用程序如果要读写磁盘上的音频数据,驱动声卡播放音乐,往屏幕写数据显示歌词,......
  • 动态树习题
    维护连通性P2147[SDOI2008]洞穴勘测[]本篇参考大佬的题单......
  • csci 体系结构设计怎么写
    ComputerSoftwareConfigurationItem 计算机软件配置项  参考文献:计算机软件配置项csci-百度文库(baidu.com)说人话:软件产品的各个组成部分,细分到xx模块 ......