
CEG 4136 Computer Architecture

Posted: 2024-09-21 17:24:39

CEG 4136 Computer Architecture III

Fall 2024 To be submitted September 28, 11:59 p.m.

Lab1: Optimizing Forest Fire Simulation with CUDA

  1. Introduction

In this lab, you will work on a forest fire simulation code that uses a 1000×1000 grid. The fire starts at 100 distinct locations in the forest. The provided code is implemented sequentially. It simulates the propagation of fire, the burning of trees, and their eventual extinction. The grid is displayed using the OpenGL library, where each cell represents a tree or an empty space.

The objective of this lab is to parallelize the existing code using CUDA C to leverage the power of graphics processing units (GPUs) to make the simulation faster and more efficient. You will identify parts of the code that are most appropriate for optimization, such as the forest update process, and transform them to run in parallel.

  2. Objective

The primary objective of this lab is to convert the sequential code into an optimized version using CUDA C to accelerate the simulation. You will learn to:

  • Identify code sections that can be parallelized.
  • Use CUDA C to run computations in parallel on a GPU.
  • Measure the performance gains achieved through parallelization.

  3. Development Platform

Development and optimization of the program will be done on machines equipped with CUDA-capable GPUs. The tools to be used include:

  • CUDA Toolkit (12.6 or later) for compiling CUDA programs.
  • Visual Studio 2022 for editing and debugging the code.
  • CUDA Debugger for testing and profiling your CUDA kernels.

You will use OpenGL for rendering the simulation, and work will be carried out on workstations with NVIDIA GPUs that support CUDA.
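Before starting, it can help to confirm that the workstation actually exposes a CUDA-capable GPU and which runtime version is installed. The following is a minimal sketch using the CUDA runtime API; the function name `reportCudaDevices` is hypothetical and not part of the starter code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: prints a summary of each visible CUDA device.
// Returns the device count, or 0 if the runtime reports an error
// (for example, when no driver or no CUDA-capable GPU is present).
int reportCudaDevices() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount failed: %s\n",
                     cudaGetErrorString(err));
        return 0;
    }

    // The runtime encodes its version as 1000 * major + 10 * minor.
    int runtimeVersion = 0;
    if (cudaRuntimeGetVersion(&runtimeVersion) == cudaSuccess) {
        std::printf("CUDA runtime version: %d.%d\n",
                    runtimeVersion / 1000, (runtimeVersion % 1000) / 10);
    }

    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, i) != cudaSuccess) continue;
        std::printf("Device %d: %s, compute capability %d.%d, "
                    "%d SMs, max %d threads per block\n",
                    i, prop.name, prop.major, prop.minor,
                    prop.multiProcessorCount, prop.maxThreadsPerBlock);
    }
    return count;
}
```

Calling `reportCudaDevices()` once at program start is enough; the reported maximum threads per block is a useful upper bound when choosing a block size later.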

  4. Tasks

Step 1: Understand the Starter Code

  • Analyze the provided code. It is a forest fire simulation where each cell in the grid represents either a tree or an empty space. Fire starts at 100 random locations, spreads to neighboring cells, and burning trees eventually extinguish after a set amount of time.

Step 2: Identify Opportunities for Parallelization

  • Grid updating is a significant part of the code that can be parallelized. Each cell in the grid can be updated independently of the others.
  • Analyze the updateForest() function, which is responsible for updating the state of burning trees and propagating fire to neighboring cells. This is the section that needs to be optimized using CUDA.

Step 3: Implement Parallelization with CUDA C

  • CUDA Initialization: Allocate memory for the grid (forest) and burn time (burnTime) on the GPU using cudaMalloc().
  • CUDA Kernel: Implement a kernel that updates the state of each cell in the forest in parallel.
  • Parallel Execution: Ensure that each cell in the grid is updated in parallel using multiple threads on the GPU.
  • Block and Thread Management: Divide the grid into CUDA thread blocks for optimized execution.
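The four points above can be sketched together as follows. This is only an illustration under stated assumptions, not the starter code: the cell states (`EMPTY`, `TREE`, `BURNING`, `BURNT`), `BURN_DURATION`, the deterministic four-neighbor spread rule, and the names `updateForestKernel` and `runSimulation` are all hypothetical. The sketch reads from one pair of buffers and writes to another (double buffering), so each thread writes only its own cell, which is one way to make the cells independently updatable.

```cuda
#include <cuda_runtime.h>

// Hypothetical cell states; the starter code may use different values.
enum Cell : int { EMPTY = 0, TREE = 1, BURNING = 2, BURNT = 3 };

constexpr int N = 1000;           // grid side, from the lab description
constexpr int BURN_DURATION = 5;  // assumed number of steps a tree burns

__device__ bool isBurning(const int* forest, int x, int y) {
    return x >= 0 && x < N && y >= 0 && y < N && forest[y * N + x] == BURNING;
}

// One thread per cell. Reads the previous state and writes only its own
// cell in the output buffers, so no two threads write the same location.
__global__ void updateForestKernel(const int* forestIn, const int* burnIn,
                                   int* forestOut, int* burnOut) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= N || y >= N) return;  // 1000 is not a multiple of 16

    int i = y * N + x;
    int state = forestIn[i];
    int burn = burnIn[i];

    if (state == BURNING) {
        ++burn;
        if (burn >= BURN_DURATION) state = BURNT;
    } else if (state == TREE) {
        // Assumed rule: a tree ignites if any 4-neighbor was burning.
        if (isBurning(forestIn, x - 1, y) || isBurning(forestIn, x + 1, y) ||
            isBurning(forestIn, x, y - 1) || isBurning(forestIn, x, y + 1)) {
            state = BURNING;
            burn = 0;
        }
    }
    forestOut[i] = state;
    burnOut[i] = burn;
}

// Runs `steps` updates on the GPU and copies the result back to the host
// arrays (each N * N ints). Returns the first CUDA error, if any.
cudaError_t runSimulation(int* hostForest, int* hostBurn, int steps) {
    const size_t bytes = static_cast<size_t>(N) * N * sizeof(int);
    int* dForest[2] = {nullptr, nullptr};
    int* dBurn[2] = {nullptr, nullptr};
    cudaError_t err = cudaSuccess;

    for (int b = 0; b < 2 && err == cudaSuccess; ++b) {
        err = cudaMalloc(&dForest[b], bytes);
        if (err == cudaSuccess) err = cudaMalloc(&dBurn[b], bytes);
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(dForest[0], hostForest, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess)
        err = cudaMemcpy(dBurn[0], hostBurn, bytes, cudaMemcpyHostToDevice);

    const dim3 block(16, 16);  // 256 threads per block
    const dim3 grid((N + block.x - 1) / block.x, (N + block.y - 1) / block.y);
    int cur = 0;
    for (int s = 0; s < steps && err == cudaSuccess; ++s) {
        updateForestKernel<<<grid, block>>>(dForest[cur], dBurn[cur],
                                            dForest[1 - cur], dBurn[1 - cur]);
        err = cudaGetLastError();
        cur = 1 - cur;  // the buffers just written become the next input
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(hostForest, dForest[cur], bytes, cudaMemcpyDeviceToHost);
    if (err == cudaSuccess)
        err = cudaMemcpy(hostBurn, dBurn[cur], bytes, cudaMemcpyDeviceToHost);

    for (int b = 0; b < 2; ++b) { cudaFree(dForest[b]); cudaFree(dBurn[b]); }
    return err;
}
```

The 16×16 block size is a common starting point rather than a tuned value; with a 1000×1000 grid it yields a 63×63 grid of blocks, and the bounds check discards the few threads that fall outside the forest. If the starter code spreads fire with a random probability, the rule inside the kernel would need a per-thread random source, which this sketch does not cover.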

Step 4: Measure Performance

Measure the runtime of the sequential program and compare it to the optimized CUDA version. Use CUDA profiling tools to identify performance gains and any further possible optimizations.

  5. Deliverables

Each team must submit a report containing the following:

  • An explanation of the parts of the code that were parallelized.
  • The modified source code with the CUDA implementation.
  • A performance analysis showing the execution times before and after optimization.
  • Screenshots of the running program with visual simulation results.
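For the execution times in the performance analysis, one possible approach is to time the GPU steps with CUDA events and the sequential baseline with `std::chrono`. The sketch below is an assumption-laden illustration: `dummyStep` is a placeholder kernel standing in for the real forest update, and `timeGpuSteps` and `timeCpuMs` are hypothetical helper names.

```cuda
#include <chrono>
#include <cuda_runtime.h>

// Placeholder kernel standing in for the forest update kernel.
__global__ void dummyStep(int* cells, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) cells[i] += 1;
}

// Times `steps` kernel launches with CUDA events.
// Returns elapsed milliseconds, or -1.0f if a CUDA call fails.
float timeGpuSteps(int* dCells, int n, int steps) {
    cudaEvent_t start, stop;
    if (cudaEventCreate(&start) != cudaSuccess) return -1.0f;
    if (cudaEventCreate(&stop) != cudaSuccess) {
        cudaEventDestroy(start);
        return -1.0f;
    }

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    cudaEventRecord(start);
    for (int s = 0; s < steps; ++s) {
        dummyStep<<<blocks, threads>>>(dCells, n);
    }
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);  // wait until all launches have finished

    float ms = 0.0f;
    cudaError_t err = cudaEventElapsedTime(&ms, start, stop);
    if (err == cudaSuccess) err = cudaGetLastError();
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return err == cudaSuccess ? ms : -1.0f;
}

// Times any host-side callable (for example, the sequential update loop).
template <typename F>
double timeCpuMs(F&& work) {
    auto t0 = std::chrono::steady_clock::now();
    work();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

The speedup is then the sequential time divided by the GPU time for the same number of steps. The report should state whether host-to-device and device-to-host copies are included in the GPU figure, since this sketch times kernel execution only.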

  6. Evaluation Criteria

The following criteria will be considered in the evaluation:

  • Correctness: The program must work correctly after optimization. The simulation should behave the same as the sequential version.
  • Effective Parallelization: The code should demonstrate proper and effective use of CUDA, with significant parallelization of the appropriate parts of the program.
  • Performance Improvement: Measurable performance gains should be demonstrated with the CUDA version. The difference in execution times between the sequential and parallel versions must be clearly explained.
  • Code Quality: The code should be well-structured, commented, and follow good programming practices.
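To support the correctness criterion, one option is to run the sequential and CUDA versions for the same number of steps from the same initial grid and compare the results cell by cell. The helper below is a hypothetical sketch (`countMismatches` is not part of the starter code) and assumes the forest is stored as a flat array of `int` on both sides.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Copies `count` cells from device memory and counts how many differ from
// the CPU reference grid. Returns -1 if the device-to-host copy fails.
long countMismatches(const int* dForest, const int* cpuForest, size_t count) {
    std::vector<int> gpuForest(count);
    cudaError_t err = cudaMemcpy(gpuForest.data(), dForest,
                                 count * sizeof(int), cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMemcpy failed: %s\n", cudaGetErrorString(err));
        return -1;
    }

    long mismatches = 0;
    for (size_t i = 0; i < count; ++i) {
        if (gpuForest[i] != cpuForest[i]) {
            if (mismatches < 5) {  // print only the first few differences
                std::fprintf(stderr, "cell %zu: gpu=%d cpu=%d\n",
                             i, gpuForest[i], cpuForest[i]);
            }
            ++mismatches;
        }
    }
    return mismatches;
}
```

An exact match is only a reasonable expectation if both versions apply the same deterministic rule. If the starter code spreads fire using random numbers, the two runs may diverge legitimately, and comparing aggregate figures (for example, the number of burning cells per step) may be a more suitable check.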

Note: This lab serves as an introduction to parallelization using CUDA, so it's important to have a solid understanding of the basics of CUDA before you begin coding.

From: https://www.cnblogs.com/WX-codinghelp/p/18424269
