Design Space Exploration

Date: 2024-11-11 18:09:19
Tags: Space, Bench, Exploration, EDP, Design, normalized, il1, design, absolute

Solutions to project assignments are to be developed within your group, without collaboration with other groups. However, as the projects in this class require the use of software tools and frameworks that students may have uneven prior familiarity with, discussion and assistance among students in gaining expertise with these software tools constitutes acceptable behavior. Note that this assistance and discussion cannot include the sharing of access to any code produced in solution to the project assignments. In order to avoid potential ambiguity in what constitutes “code produced in solution to the project assignment,” students wishing to aid their peers with auxiliary supporting scripts, mechanisms, or examples are directed to pass any such artifacts to the course staff for vetting and possible inclusion on project-specific FAQs rather than share them with their peers directly.

In this project, you are going to use SimpleScalar as the evaluation engine to perform a design space exploration, using the provided framework code, over an 18-dimensional processor pipeline and memory hierarchy design space (some of these dimensions are not independent). You will use a 5-benchmark suite as the workload.

1. Project Goal

Your assignment is to explore the design space, with an evaluation count limit of 1000 design points, in order to select the best performing design under each of two different optimization functions:

  1. The “best” performing overall design (in terms of the geometric mean of normalized execution time across all benchmarks)
  2. The most energy-efficient design (as measured by the lowest geometric mean of normalized energy-delay product across all benchmarks; the units of energy-delay product are joule-seconds)
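Both optimization functions reduce to a geometric mean of per-benchmark values normalized against a reference design. The following is a minimal sketch of that computation, assuming that "normalized" means dividing by the same benchmark's value on a baseline design; the function names and the sample numbers are made up for illustration and are not part of the provided framework.

```python
import math


def geomean(values):
    """Geometric mean of positive numbers, computed through logarithms."""
    values = list(values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("geomean needs a non-empty sequence of positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def normalized_geomean(design, baseline):
    """Normalize each benchmark to the baseline, then take the geomean."""
    return geomean(d / b for d, b in zip(design, baseline))


# Hypothetical execution times (seconds) for a 5-benchmark suite.
baseline_times = [2.0, 4.0, 1.0, 8.0, 5.0]
design_times = [1.0, 4.0, 0.5, 8.0, 5.0]

score = normalized_geomean(design_times, baseline_times)
print(f"normalized geomean execution time: {score:.4f}")
```

A score below 1 means the design is faster than the baseline on (geometric) average. The same function applies unchanged to per-benchmark energy-delay products.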
2. Background

2.1. SimpleScalar

SimpleScalar is an architectural simulator which enables a study of how different processor and memory system parameters affect performance and energy efficiency. The simulator accepts a set of system design parameters and an executable (workload) to run on the described system. A wide range of system statistics are recorded by the simulator as the executable runs on the simulated system. Once the framework in this project is set up, interested readers can look at one of the log files in the rawProjectOutputData folder to view SimpleScalar output.

This project heavily uses SimpleScalar, but most of the interface is abstracted away by a simpler framework interface. Nevertheless, you can refer to the SimpleScalar guide for details about the parameters passed to SimpleScalar.

2.2. Design Space Exploration

Given a set of design parameters, Design Space Exploration (DSE) involves probing various design points to find the most suitable design to meet required goals. Follow this quick reading about DSE before moving ahead.

DSE can be performed for different design goals. For example, one DSE may want to find the best performing design whereas another DSE may be aimed at finding the most energy-efficient design. A more complex DSE may look for the best performing design given a fixed energy budget.

An exhaustive DSE simply tries out all possible combinations of parameter values to find the absolute best design. However, as the size of the design space increases, this approach quickly becomes infeasible. Consider a 10-dimensional design space with 5 possible values for each parameter and 2 minutes of simulation time to evaluate a given design point; an exhaustive search will take $5^{10} \times 2\ \text{min} \approx 37\ \text{years}$.

A more intelligent DSE employs heuristics to intelligently prune down the design space and to prioritize evaluation of more reasonable design points first. If the assumptions employed by the heuristics are correct, the DSE will still result in the best design. On the other hand, with a set of reasonably justified assumptions a heuristic can result in a “good enough” design point.
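The cost of the exhaustive search in the example can be checked with a few lines of arithmetic. This is only a sketch of that calculation; the function name is illustrative and a 365-day year is assumed.

```python
MINUTES_PER_YEAR = 60 * 24 * 365  # assumes a 365-day year


def exhaustive_search_years(num_dimensions, values_per_dimension, minutes_per_eval):
    """Time to simulate every combination of parameter values, in years."""
    design_points = values_per_dimension ** num_dimensions
    return design_points * minutes_per_eval / MINUTES_PER_YEAR


years = exhaustive_search_years(10, 5, 2)
print(f"{5 ** 10:,} design points, about {years:.1f} years")
# prints: 9,765,625 design points, about 37.2 years
```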

2.3. Energy-Delay Product

Energy-Delay Product (EDP) is a metric which consolidates both performance and energy efficiency.

EDP = total execution energy * execution time

Design A takes 100pJ to process an image in 100ms, so its EDP is 10000 units. Design B takes 80pJ to process an image in 2000ms, so its EDP is 160000 units. Design B clearly consumes less energy, but it performs poorly as it incurs more execution time. EDP enables a more holistic design comparison.
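The comparison of the two designs can be reproduced directly from the definition. The snippet below is a small illustration using the numbers from the example; the units are left as pJ·ms, as in the text, and the names are chosen for this sketch only.

```python
def edp(energy_pj, time_ms):
    """Energy-delay product in pJ*ms (lower is better)."""
    return energy_pj * time_ms


# (energy in pJ, execution time in ms) for the two example designs.
designs = {"A": (100, 100), "B": (80, 2000)}

edps = {name: edp(energy, time) for name, (energy, time) in designs.items()}
best = min(edps, key=edps.get)

print(edps)                  # {'A': 10000, 'B': 160000}
print("lowest EDP:", best)   # lowest EDP: A
```

Design B wins on energy alone, but Design A has the lower EDP because B's 20x longer execution time outweighs its 20% energy saving.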

3. Our Heuristic

We define OurHeuristic as follows:

  1. Design space dimensions can be labelled as either explored or unexplored.
  2. Initially all dimensions are unexplored.
  3. Choose an unexplored dimension; exit if all dimensions are explored.
     3.1. Evaluate all possible design points by changing the value of this dimension only.
     3.2. Fix the value of this dimension by selecting the best design so far (consider the DSE goal).
     3.3. Mark this dimension as explored.
  4. Go to step 3.

You should choose an unexplored dimension in step 3 based on the PSU ID numbers of the students in the group, as follows. DSE dimensions can be categorized in four major classes as follows:
  1. Branch predictor (BP) configurations (i.e. branchsettings, ras, btb)
  2. Cache configurations (i.e. {l1, ul2}block, {dl1, il1, ul2}sets, {dl1, il1, ul2}assoc)
  3. Core configurations (i.e. width, scheduling)
  4. Floating Point Unit (FPU) configuration (i.e. fpwidth)

Based on your ID numbers, you should calculate (
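The one-dimension-at-a-time search defined in steps 1 to 4 of OurHeuristic can be sketched as follows. This is a minimal illustration, not the framework's implementation: the `evaluate` callback, the dictionary-based configuration, and the toy cost function are all assumptions, and the exploration order is simply passed in rather than derived from ID numbers.

```python
def our_heuristic(dimensions, order, start, evaluate):
    """Explore one dimension at a time, keeping the best value found.

    dimensions: dict mapping dimension name -> list of candidate values
    order:      sequence of dimension names, in the order to explore them
    start:      dict mapping dimension name -> initial value
    evaluate:   hypothetical callable(config) -> cost, lower is better
    """
    best = dict(start)
    best_cost = evaluate(best)
    evaluations = 1
    for dim in order:                      # step 3: next unexplored dimension
        for value in dimensions[dim]:      # step 3.1: vary this dimension only
            if value == best[dim]:
                continue                   # already evaluated
            candidate = {**best, dim: value}
            cost = evaluate(candidate)
            evaluations += 1
            if cost < best_cost:           # step 3.2: keep the best so far
                best, best_cost = candidate, cost
        # step 3.3: the dimension is now explored; the loop moves on (step 4)
    return best, best_cost, evaluations


# Toy example with a made-up cost function standing in for a simulation.
dims = {"width": [1, 2, 4, 8], "fpwidth": [1, 2, 4]}
toy_cost = lambda cfg: abs(cfg["width"] - 4) + abs(cfg["fpwidth"] - 2)

result = our_heuristic(dims, ["width", "fpwidth"], {"width": 1, "fpwidth": 1}, toy_cost)
print(result)
```

The number of evaluations grows with the sum of the dimension sizes rather than their product, which is what keeps the search within a fixed evaluation budget.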

8.2.3. Cache and Memory

The following list comprises tuples of the format: [cache size or memory, access energy (pJ), leakage/refresh power (mW)]

  • 8KB: 20pJ, 0.125mW
  • 16KB: 28pJ, 0.25mW

  • 32KB: 40pJ, 0.5mW

  • 64KB: 56pJ, 1mW
  • 128KB: 80pJ, 2mW
  • 256KB: 112pJ, 4mW
  • 512KB: 160pJ, 8mW
  • 1024KB: 224pJ, 16mW
  • 2048KB: 360pJ, 32mW
  • Main Memory: 2nJ, 512mW
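One way to use the figures above is as a lookup table. The sketch below assumes, as a simplification not stated in the text, that the energy of a structure is its access count times the access energy plus its leakage/refresh power times the execution time; the names are illustrative only.

```python
# Cache size in KB -> (access energy in pJ, leakage/refresh power in mW).
CACHE_ENERGY = {
    8: (20, 0.125), 16: (28, 0.25), 32: (40, 0.5), 64: (56, 1),
    128: (80, 2), 256: (112, 4), 512: (160, 8), 1024: (224, 16),
    2048: (360, 32),
}
MAIN_MEMORY = (2000, 512)  # 2 nJ expressed in pJ, 512 mW


def structure_energy_joules(access_pj, leak_mw, accesses, seconds):
    """Assumed model: dynamic access energy plus leakage over the run time."""
    dynamic = accesses * access_pj * 1e-12   # pJ -> J
    static = leak_mw * 1e-3 * seconds        # mW * s -> J
    return dynamic + static


# Example: a 32KB cache accessed one million times during a 10 ms run.
energy = structure_energy_joules(*CACHE_ENERGY[32], accesses=1_000_000, seconds=0.01)
print(f"{energy:.2e} J")
```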

8.2.4. Energy per Committed Instruction

  • Dynamic, fetch width = 1: 10pJ
  • In-order, fetch width = 1: 8pJ
  • Dynamic, fetch width = 2: 12pJ
  • In-order, fetch width = 2: 10pJ
  • Dynamic, fetch width = 4: 18pJ
  • In-order, fetch width = 4: 14pJ
  • Dynamic, fetch width = 8: 27pJ
  • In-order, fetch width = 8: 20pJ
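The per-instruction figures can be kept in a table keyed by scheduling policy and fetch width. The following is a small sketch under that assumption; the key strings and the function name are made up for illustration.

```python
# (scheduling, fetch width) -> energy per committed instruction in pJ.
ENERGY_PER_INSTRUCTION_PJ = {
    ("dynamic", 1): 10, ("in-order", 1): 8,
    ("dynamic", 2): 12, ("in-order", 2): 10,
    ("dynamic", 4): 18, ("in-order", 4): 14,
    ("dynamic", 8): 27, ("in-order", 8): 20,
}


def core_energy_joules(scheduling, fetch_width, committed_instructions):
    """Core energy in joules for a given number of committed instructions."""
    per_instruction = ENERGY_PER_INSTRUCTION_PJ[(scheduling, fetch_width)]
    return per_instruction * committed_instructions * 1e-12  # pJ -> J


core_energy = core_energy_joules("dynamic", 4, 1_000_000)
print(f"{core_energy:.2e} J")
```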

8.3. Validation Constraints

You must implement these validation constraints in your code. Specifically, validateConfiguration and generateCacheLatencyParams must be implemented properly.

  1. The il1 (L1 instruction cache) block size must be at least the ifq (instruction fetch queue) size (e.g., for the baseline machine the ifqsize is set to 1 word (8B), so the il1 block size should be at least 8B). The dl1 (L1 data cache) should have the same block size as your il1.
  2. The ul2 (unified L2 cache) block size must be at least twice your il1 (and dl1) block size, with a maximum block size of 128B. Your ul2 must be at least twice as large as il1+dl1 in order to be inclusive.
  3. il1 size and dl1 size: Minimum = 2 KB; Maximum = 64 KB
  4. ul2 size: Minimum = 32 KB; Maximum = 1 MB
  5. The il1 sizes and il1 latencies are linked as follows (the same linkages hold for the dl1 size and dl1 latency):

     (a) il1 = 2 KB means il1lat = 1
     (b) il1 = 4 KB means il1lat = 2
     (c) il1 = 8 KB means il1lat = 3
     (d) il1 = 16 KB means il1lat = 4
     (e) il1 = 32 KB means il1lat = 5
     (f) il1 = 64 KB means il1lat = 6
     (g) The above are for direct mapped caches. For 2-way set associative add 1 additional cycle of latency to each of the above; for 4-way add 2 additional cycles; for 8-way add 3 additional cycles.

  6. The ul2 sizes and ul2 latencies are linked as follows:

     (a) ul2 = 32 KB means ul2lat = 5
     (b) ul2 = 64 KB means ul2lat = 6
     (c) ul2 = 128 KB means ul2lat = 7
     (d) ul2 = 256 KB means ul2lat = 8
     (e) ul2 = 512 KB means ul2lat = 9
     (f) ul2 = 1024 KB (1 MB) means ul2lat = 10
     (g) The above are for direct mapped caches. For 2-way set associative add 1 additional cycle of latency to each of the above; for 4-way add 2 additional cycles; for 8-way add 3 additional cycles; for 16-way add 4 additional cycles.
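The constraints above can be expressed as a small set of table lookups and checks. The sketch below is written in Python purely for illustration and is not the framework's validateConfiguration or generateCacheLatencyParams; the dictionary keys (`ifq_bytes`, `il1_sets`, and so on) and function names are hypothetical, and cache size is taken to be sets × block size × associativity.

```python
KB = 1024

# Direct-mapped latencies in cycles, keyed by cache size in KB.
L1_BASE_LATENCY = {2: 1, 4: 2, 8: 3, 16: 4, 32: 5, 64: 6}
UL2_BASE_LATENCY = {32: 5, 64: 6, 128: 7, 256: 8, 512: 9, 1024: 10}
# Extra cycles by associativity (16-way is listed for the ul2 only).
L1_ASSOC_EXTRA = {1: 0, 2: 1, 4: 2, 8: 3}
UL2_ASSOC_EXTRA = {1: 0, 2: 1, 4: 2, 8: 3, 16: 4}


def cache_size_bytes(sets, block_bytes, assoc):
    """Total capacity = number of sets * block size * associativity."""
    return sets * block_bytes * assoc


def l1_latency(size_kb, assoc):
    """Latency for il1 or dl1; raises KeyError for an unlisted size or way count."""
    return L1_BASE_LATENCY[size_kb] + L1_ASSOC_EXTRA[assoc]


def ul2_latency(size_kb, assoc):
    """Latency for ul2; raises KeyError for an unlisted size or way count."""
    return UL2_BASE_LATENCY[size_kb] + UL2_ASSOC_EXTRA[assoc]


def is_valid(cfg):
    """Check the block-size and capacity constraints for one configuration."""
    il1 = cache_size_bytes(cfg["il1_sets"], cfg["il1_block"], cfg["il1_assoc"])
    dl1 = cache_size_bytes(cfg["dl1_sets"], cfg["dl1_block"], cfg["dl1_assoc"])
    ul2 = cache_size_bytes(cfg["ul2_sets"], cfg["ul2_block"], cfg["ul2_assoc"])
    return (
        cfg["il1_block"] >= cfg["ifq_bytes"]                  # il1 block covers the ifq
        and cfg["dl1_block"] == cfg["il1_block"]              # dl1 block matches il1
        and 2 * cfg["il1_block"] <= cfg["ul2_block"] <= 128   # ul2 block bounds
        and ul2 >= 2 * (il1 + dl1)                            # inclusive ul2
        and 2 * KB <= il1 <= 64 * KB                          # il1 size range
        and 2 * KB <= dl1 <= 64 * KB                          # dl1 size range
        and 32 * KB <= ul2 <= 1024 * KB                       # ul2 size range
    )


# Example: 8 KB direct-mapped L1 caches and a 256 KB 4-way ul2.
example = {
    "ifq_bytes": 8,
    "il1_sets": 256, "il1_block": 32, "il1_assoc": 1,
    "dl1_sets": 256, "dl1_block": 32, "dl1_assoc": 1,
    "ul2_sets": 1024, "ul2_block": 64, "ul2_assoc": 4,
}
print(is_valid(example), l1_latency(8, 1), ul2_latency(256, 4))
```

Keeping the latency tables explicit mirrors the lists above; since every listed base latency equals log2 of the size in KB, a formula would also work, but the tables make an out-of-range size fail loudly.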

8.4. Miscellaneous Constraints

These constraints have already been specified in the framework. Have a look at the SimpleScalar invocation command in runprojectsuite.sh for an exhaustive list of specified parameters. Moreover, any parameter not specified in runprojectsuite.sh will default to the SimpleScalar default settings.

A.2. Plots

The report should include the following four plots:

  1. Line plot of normalized geomean execution time (y axis) for each considered design point vs. number of designs considered (x axis)
  2. Line plot of normalized geomean energy-delay product (y axis) vs. number of designs considered (x axis)
  3. Bar chart showing normalized per-benchmark execution time and geomean normalized execution time for the best performing design
  4. Bar chart showing normalized per-benchmark energy-delay product and geomean normalized energy-delay product for the most energy-efficient design found

These four plots must be labelled in your report corresponding exactly to the numbering in the list above. Furthermore, the axes in the plots should be properly labelled.

A.3. Other Guidelines

For clarity in the written report, when listing the best design points, please do not represent

be assigned for following the guidelines and adhering to appropriate levels of clarity and style (and spelling, grammar, etc.) for a technical document.

B. Project FAQs

Q: What are the column headers for the .log file?

A: normalized EDP, normalized Execution time, absolute EDP, absolute Execution time. The writes to both the .best and .log files are generated near the end of main.
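Given the column order in the answer above, the .log file can be read into per-design series for plotting. The sketch below is an assumption-heavy illustration: the delimiter is not stated here, so it accepts commas or whitespace, it skips any line that does not have exactly four numeric fields, and the sample text is made up rather than real framework output.

```python
import re
from typing import NamedTuple


class LogRow(NamedTuple):
    normalized_edp: float
    normalized_time: float
    absolute_edp: float
    absolute_time: float


def parse_log(text):
    """Parse lines of four numeric columns; other lines are skipped."""
    rows = []
    for line in text.splitlines():
        fields = [f for f in re.split(r"[,\s]+", line.strip()) if f]
        if len(fields) != 4:
            continue
        try:
            rows.append(LogRow(*(float(f) for f in fields)))
        except ValueError:
            continue  # non-numeric line, e.g. a header
    return rows


# Made-up sample content, not actual framework output.
sample = "0.95,0.90,1.2e-05,0.004\n0.80 0.85 1.0e-05 0.0038\n"
rows = parse_log(sample)

designs_considered = list(range(1, len(rows) + 1))   # x axis
normalized_times = [r.normalized_time for r in rows]  # y axis
print(designs_considered, normalized_times)
```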

Q: What are the column headers for the .best file?

A: Headers differ by line:

Line 1 headers: bestEDPconfig, normalized EDP of bestEDPconfig, normalized Execution time of bestEDPconfig, absolute EDP of bestEDPconfig, absolute Execution time of bestEDPconfig, absolute EDP of Bench 0 on bestEDPconfig, normalized EDP of Bench 0 on bestEDPconfig, absolute EDP of Bench 1 on bestEDPconfig, normalized EDP of Bench 1 on bestEDPconfig, absolute EDP of Bench 2 on bestEDPconfig, normalized EDP of Bench 2 on bestEDPconfig, absolute EDP of Bench 3 on bestEDPconfig, normalized EDP of Bench 3 on bestEDPconfig, absolute EDP of Bench 4 on bestEDPconfig, normalized EDP of Bench 4 on bestEDPconfig

Line 2 headers: bestTimeconfig, normalized EDP of bestTimeconfig, normalized Execution time of bestTimeconfig, absolute EDP of bestTimeconfig, absolute Execution time of bestTimeconfig, absolute Time of Bench 0 on bestTimeconfig, normalized Time of Bench 0 on bestTimeconfig, absolute Time of Bench 1 on bestTimeconfig, normalized Time of Bench 1 on bestTimeconfig, absolute Time of Bench 2 on bestTimeconfig, normalized Time of Bench 2 on bestTimeconfig, absolute Time of Bench 3 on bestTimeconfig, normalized Time of Bench 3 on bestTimeconfig, absolute Time of Bench 4 on bestTimeconfig, normalized Time of Bench 4 on bestTimeconfig

Q: Why are there only 18 configuration parameters when SimpleScalar (and the project specification) list so many more?

A: There are 18 configuration variables, more settings derived from those 18 configuration variables, and still more settings that are fixed as constants (e.g. MPLAT). Given the block size (set independently), associativity (set independently), and number of sets (set independently), you can determine the total cache size for the L1 data and instruction caches and then validate whether the latency for that cache (set independently) is set correctly.

Q: What’s a quota error, why are half my output files empty, and why can’t I make new files anymore?

A: It means you are out of disk space. Each run of this program produces a large number of intermediate output files for the evaluated design points. These are kept to speed up subsequent evaluations of the same design point in future runs as a means of reducing debugging/heuristic development time. Consider cleaning out your browser caches if you are low on disk quota before performing a project run.

From: https://www.cnblogs.com/comp9321/p/18540209
