首页 > 其他分享 >CMPSC311 mdadm Linear Device

CMPSC311 mdadm Linear Device

时间:2023-04-08 13:58:41浏览次数:39  
标签:Linear int cache num Device mdadm your block


Assignment #4 – mdadm Linear Device (Caching) CMPSC311 - Introduction to Systems Programming Fall 2021 - Prof. Aghayev

Due date: April 9, 2023 (11:59 PM) EST

You just completed implementing mdadm and it is working. The software engineers who plan to build secure crypto wallet on top of your storage system have been torturing your storage system by throwing at it all sorts of I/O patterns, and they have been unable to find any inconsistency in your implementation. This is great, because now you have a working system, even though it may not be performant. As professor John Ousterhout of Stanford says, “the best performance improvement is the transition from nonworking state to working state”.

The software engineers are happy that your storage system is working correctly, but now they want you to make it fast as well. To this end, you are going to implement a block cache in mdadm. Caching is one of the oldest tricks in the book for reducing request latency by saving often used data in a faster (and smaller) storage medium than your main storage medium. Since we covered caching extensively in the class, we are skipping its details in this document. You must watch the lecture to understand what caching is, and how the least-recently used (LRU) algorithm that you are going to implement in this assignment works.

Overview

In general, caches store key and value pairs in a fast storage medium. For example, in a CPU cache, the key is the memory address, and the value is the data that lives at that address. When the CPU wants to access data at some memory address, it first checks to see if that address appears as a key in the cache; if it does, the CPU reads the corresponding data from the cache directly, without going to memory because reading data from memory is slow.

In a browser cache, the key is the URL of an image, and the value is the image file. When you visit a web site, the browser fetches the HTML file from the web server, parses the HTML file and finds the URLs for the images appearing on the web page. Before making another trip to retrieve the images from the web server, it first checks its cache to see if the URL appears as a key in the cache, and if it does, the browser reads the image from local disk, which is much faster than reading it over the network from a web server.

In this assignment you will implement a block cache for mdadm. In the case of mdadm, the key will be the tuple consisting of disk number and block number that identifies a specific block in JBOD, and the value will be the contents of the block. When the users of mdadm system issue mdadm_read call, your implementation of mdadm_read will first look if the block corresponding to the address specified by the user is in the cache, and if it is, then the block will be copied from the cache without issuing a slow JBOD_READ_BLOCK call to JBOD. If the block is not in the cache, then you will read it from JBOD and insert it to the cache, so that if a user asks for the block again, you can serve it faster from the cache.

Cache Implementation

Typically, a cache is an integral part of a storage system and it is not accessible to the users of the storage system. However, to make the testing easy, in this assignment we are going to implement cache as a separate module, and then integrate it to mdadm_read and mdadm_write calls.

Please take a look at cache.h file. Each entry in your cache is the following struct.

typedef struct {

bool valid;

int disk_num;

int block_num;

uint8_t block[JBOD_BLOCK_SIZE];

int access_time;

} cache_entry_t;

The valid field indicates whether the cache entry is valid. The disk_num and block_num fields identify the block that this cache entry is holding and the block field holds the data for the corresponding block. The access_time field stores when the cache element was last accessed—either written or read.

The file cache.c contains the following predefined variables.

static cache_entry_t *cache = NULL;

static int cache_size = 0;

static int clock = 0;

static int num_queries = 0;

static int num_hits = 0;

Now let’s go over the functions declared in cache.h that you will implement and describe how the above variables relate to these functions. You must look at cache.h for more information about each function.

1. int cache_create(int num_entries);Dynamicallyallocatespacefornum_entriescache entries and should store the address in the cache global variable. It should also set cache_size to num_entries, since that describes the size of the cache and will also be used by other functions. Calling this function twice without an intervening cache_destroy call (see below) should fail.

2. int cache_destroy(void);Freethedynamicallyallocatedspaceforcache,andshouldsetcache to NULL, and cache_size to zero. It should also set cache_size to num_entries, since that is the size of the cache and it will also be used by other functions. Calling this function twice without an intervening cache_create call should fail. The num_entries argument can be 2 at minimum and 4096 at maximum.

3. int cache_lookup(int disk_num, int block_num, uint8_t *buf);Lookuptheblock identified by disk_num and block_num in the cache. If found, copy the block into buf, which can- not be NULL. This function must increment num_queries global variable every time it performs a lookup. If the lookup is successful, this function should also increment num_hits global variable; it should also increment clock variable and assign it to the access_time field of the corresponding entry, to indicate that the entry was used recently. We are going to use num_queries and num_hits variables to compute your cache’s hit ratio.

4. int cache_insert(int disk_num, int block_num, uint8_t *buf);Inserttheblock identified by disk_num and block_num into the cache and copy buf—which cannot be NULL—to the corresponding cache entry. Insertion should never fail: if the cache is full, then an entry should be overwritten according to the LRU policy using data from this insert operation. This function should also increment and assign clock variable to the access_time of the newly inserted entry.

5. void cache_update(int disk_num, int block_num, const uint8_t *buf);Ifthe entry exists in cache, updates its block content with the new data in buf. Should also update the access_time if successful.

6. bool cache_enabled(void); Returns true if cache is enabled. This will be useful when integrat- ing the cache to your mdadm_read and mdadm_write functions.

Strategy for Implementation

The tester now includes new tests for your cache implementation. You should first aim to implement functions in cache.c and pass all the tester unit tests. Once you pass the tests, you should incorporate your cache into your mdadm_read and mdadm_write functions—you need to implement caching in mdadm_write as well, because we are going to use write-through caching policy, as described in the class. Once you do that, make sure that you still pass all the tests.

Next, try your implementation on the trace files and see if it improves the performance. To evaluate the performance, we have introduced a new cost is a metric into JBOD for measuring the effectiveness of your cache, which is calculated based on the number of operations executed. Each JBOD operation has a different cost, and by effective caching, you reduce the number of read operations, thereby reducing your cost. Now, the tester also takes a cache size when used with a workload file, and prints the cost and hit rate at the end. The cost is computed internally by JBOD, whereas the hit rate is printed by cache_print_hit_rate function in cache.c. The value it prints is based on num_queries and num_hits variables that you should increment.

Here’s how the results look like with the reference implementation. First, we run the tester on random input file:

$ ./tester -w traces/random-input >x

Cost: 18948700

Hit rate: -nan%

The cost is 18948700, and the hit rate is undefined because we have not enabled cache. Next, we rerun the tester and specify a cache size of 1024 entries, using -s option:

$ ./tester -w traces/random-input -s 1024 >x

Cost: 17669400

Hit rate: 24.5%

As you can see, the cache is working, given that we have non-zero hit rate, and as a result, the cost is now reduced. Let’s try it one more time with the maximum cache size:

$ ./tester -w traces/random-input -s 4096 >x

Cost: 13091800

Hit rate: 87.9%

$ diff x traces/random-expected-output

$

Once again, we significantly reduced the cost using a larger cache. We also make sure that introducing caching does not violate correctness by comparing the outputs. If introducing a cache violates correctness of your mdadm implementation, you will get a zero grade for the corresponding trace file.

Grading

8 points of your grade will come from passing the unit tests. You will get an extra point for each of the random and linear trace files, if you demonstrate that your cache has reduced the cost.

WX:codehelp mailto: [email protected]

标签:Linear,int,cache,num,Device,mdadm,your,block
From: https://www.cnblogs.com/qujava/p/17298401.html

相关文章

  • 现代计算机图形学——P1-2. Review of Linear Algebra
    (自用笔记)P1.(虚假的P1)计算机图形学、其研究内容及其相关领域与学科P1.(真正的P1)OverviewofComputerGraphicsP2.ReviewofLinearAlgebraP1计算机图形学——是研究怎样用计算机输入、生成(处理)、存储、显示、输出图形的一门学科。构成图形的要素:1.几何要素——形......
  • DevEco Device Tool 3.1 Release新版本发布,新增资源管理器、SFTP、HDC
     DevEcoDeviceTool是面向智能设备开发者提供的一站式集成开发环境,支持代码编辑、编译、烧录和调试、性能监测等功能,支持C/C++语言,以插件的形式部署在VisualStudioCode(简称VSCode)上,支持Windows1064位或Ubuntu18.04-21.10版本。本次为大家带来的是DevEcoDeviceTool3.1......
  • LinearLayout增加divider分割线
    在android3.0及后面的版本在LinearLayout里增加了个分割线android:divider="@drawable/shape"<!--分割线图片-->android:showDividers="middle|beginning|end"<!--分割线位置-->分割线如果是图片那就直接使用图片就行,如果要使用颜色就必须使用shape来显......
  • 【Linux】raid管理工具-mdadm-raid0管理
    文章目录mdadm介绍mdadm命令的参数实战raid0新增两块硬盘创建raid0把配置信息保存起来对raid0创建分区格式化分区并挂载设置开机自动挂载mdadm介绍管理软raid工具:mdadmmdadm是linux下用于创建和管理软件RAID的命令,是一个模式化命令mdadm命令的参数-C建立一个新阵列-A激活磁......
  • 【linux】mdadm-raid1管理
    文章目录回顾:raid1原理:实验内容:1)创建分区2)创建raid13)将RAID1信息保存到配置文件中4)检查我们的磁盘阵列5)在raid设备上创建文件系统并挂载6)创建测试文件,看如果一块磁盘坏掉,数据是否丢失7)模拟损坏(sdd1盘坏掉了)8)移除坏掉的设备,同时另外加一个备份盘9)增加一块热备盘总结:回顾:rai......
  • MIT Linear Algebra
    MITLinearAlgebra:按照列的方式进行多维向量的线性组合而不是用对应行和列的各元素的乘积和;......
  • Ceres 中的线性求解器类型(linear_solver_type)
    CeresSolver中的线性求解器类型(linear_solver_type)有多个选项,包括:DENSE_QR:使用稠密QR分解方法求解线性方程组。适用于内存足够的小规模问题,求解速度较快。DENSE_SCHUR:......
  • linux rm 命令, Device or resource busy,无法删除
    环境:linuxCentOS遇到的问题:我打断了pytorch下的模型训练,导致tensorboard输出的文件无法删除。想使用rm-r删除文件夹时候遇到错误。rm:cannotremove`你的文......
  • Tensflow & Numpy to implement Linear Regresssion and Logistic Regression
    OptionalLab-NeuronsandLayers¶Inthislabwewillexploretheinnerworkingsofneurons/unitsandlayers.Inparticular,thelabwilldrawparallelstot......
  • Python互联网大数据爬虫的武汉市二手房价格数据采集分析:Linear Regression模型、XGBoo
    全文链接:http://tecdat.cn/?p=31958原文出处:拓端数据部落公众号分析师:YanLiu我国有大量的资金都流入了房地产行业,同时与其他行业有着千丝万缕的联系,可以说房地产行业对......