首页 > 其他分享 >CS-453 Software transactional memory

CS-453 Software transactional memory

时间:2024-12-16 20:11:09浏览次数:7  
标签:transaction transactions implementation transactional memory CS Bob

Project description (CS-453)S´ebastien Rouault Antoine Murat

September 24, 2024Project forum:https://moodle.epfl.ch/mod/forum/view.php?id=108506011 Software transactional memory

our goal is to implement a software transactional memory library.The project repository (see Section 2.1) contains a very simple reference implementation of such a

lbrary, along with the skeleton for another implementation: the one you’ll have to (design and) code.The project repository includes the source code for, and a Makefile to run the program that will evaluate your implementation on the evaluation server (provided by the Distributed Computing Laboratory).

1.1 Gentle introduction

Let me introduce you to Alice. Alice runs a bank. As for any bank today, her system must be able toprocess electronic orders. And it should be fast (Alice’s bank is verysuccessful and has many customers).

At least two decades ago, to increase the system throughput, Alice could have only needed to buynewer hardware, and her program would run faster. But 15–20 years ago, this free lunch ended [1]: Alicenow has to use the computational power of several hardware threads to get the most of modern processors.This is where a transactional memory can come in handy, and where you step in to help Alice.

Alice’s program is currently written to run on a single thread. When Alice runs her code on hersingle-core processor (no concurrency), it runs as expected.For instance, let’s say that two customers, Bob and Charlie, want to transfer money: both Bob andharlie wants to wire each other (and they did not agree beforehand on “who owes who” to make onlyone transfer. . . ). The pseudo-code for the “money transfer”function could be the following:shared accounts as integer[];f Bob’s requestBEven if Bob may not be very happy with this second possible outcome, it is still a consistent executionas far as the specification of the bank is concerned: when the fundsare insufficient, no transfer happens.Now, Alice wants to try her program (by assumption correct on a single-core processor) on brand

ew, multi-core hardware. This way her system will be able to process transfers concurrently, (hopefully)answering her customers faster. So Alice, for testing purposes, runs the same program on multiplethreads, each taking customer requests as they come. To analyze what utcomes are possible when twothreads process the two respective transfers from Bob and Charlie in concurrence, we will rewrite thepseudo code to highlight each sharedmemory access, and in which order they are written in the program:fn transfer(src, dst, amount) {

2We assume that this pseudo-code runs on a pseudo-machine with atomic register1 , and that thispseudo-machine does not reorder memory accesses2 : memory is always accessed in the same order awritten in the program, and there exist a total order following which each atomic memory access happens.Supposing the account of Bob is at index 0 in accounts, and the account of Charlie at index 1, a possible,concurrent execution is then (for clarity, v0, v1, v2 have been replaced by accounts[. . . ]):Such a mechanism would prevent the execution from Figure 1, as only one thread accessing the shared

variable accounts can run at any given time. But this mechanism also prevents multiple threads fromprocessing transfers concurrently. Alice wanted to use multiple cores specifically to process multiple

requests at the same time, answering faster her (many) customers. Always taking a single, coarse lockwhen accessing the shared memory is thus not a satisfying solution.Actually, Alice noticed that some transfers can always run concurrently, without the need for any synchronization. For instance, if Bob wires Charlie and Dan wires Erin at the same time, no orderg of memory accesses can lead to an inconsistency. This is simply because the two sets of sharedmemory segments respectively accessed by these two transfers, i.e.{accounts[0], accounts[1]} and{accounts[2], accounts[3]}, do not overlap3 : they do not conflict.

This is where the transactional memory is useful: it runs in parallel transactions that do not conflict, and synchronizes these which do conflict, (hopefully) allowing to make the most out of multi-coreprocessors with minimal coding efforts. To get a glimpse of the general ideas behind the interface of the

A transaction is enclosed, between calls to tm.begin and tx.end. Unlike with a lock, these functions(especially begin) may not block, allowing for concurrent executions.Reads and writes (and allocs and frees) 代写CS-453 Software transactional memory onto the shared memory are all controlled by the software

transactional memory library: Alice’s code is not allowed to e.g. directly read accounts[dst]. Instead,a call to tx.read must be made (tx.write, tx.alloc, tx.free for otheroperations). Notably, each ofthese operations can abort, requesting the caller to try again the same transaction.

1.2 Specification

The software transactional memory library sees and controls every access to the shared memory. Thelibrary can delay execution, return two different values when reading the same address from two different,concurrent transactions, etc. Basically, the goal of the library is to make concurrent transactions appear

as if they were executed serially, without concurrency. (In the course, this notion is called opacity.) Also,once a transaction successfully committed, its modifications must be visible in subsequent transactions.A software transactional memory library that achieves this goal will be deemed correct.

Think of the serial executions of Bob’s and Charlie’s concurrent transfers (in Section 1.1): either therequest from Bob was processed first, not seeing any modification from Charlie’s request, or the opposite.They executed one after the other. Also, if Bob makes a new transaction (e.g. requesting its accountdetails) after these two transactions committed, Bob must see both ofWe can define three notions which, taken together, would satisfy the (informal) goal described above.These notions are: snapshot isolation, atomicity, and consistency. Namely:

Snapshot isolation ensures that: (1) no write/alloc/free5 in any non-committed transaction can

be read/become available/unavailable in any other transaction, (2) all the reads/writes/allocs/freesmade in a transaction appear to be made from the same atomic snapshot of the whole sharedmemory, excepted for (3) a transaction observes its own modifications (i.e. writes/allocs/frees).Example: if snapshot isolation had been satisfied in Figure 1, as Charlie’s transaction already read

v1 == 12, then the 2nd read from Charlie’s thread could not have read v2 == 20, which either: (1) isthe effect of a write in a non-committed transaction, if Bob’s transaction had not committed, or (2) belongs to a different snapshot, if Bob’s transaction had committed between the two reads.

Atomicity ensures that all the memory writes/allocs/frees of any committed transaction seem tohave all happened at one indivisible point in time.Example: without atomicity, Bob’s transaction could successfully commit, and another transaction (e.g. Charlie’s transaction) could take a snapshot where only one of the two writes had been made. 5

When Alice’s program calls tx.free, the transactional memory library is not required to immediately call libc’s free.

4❼ Consistency ensures, for a committing transaction T, that: (1) none of the transactions thatcommitted since the atomic snapshot of T (a) freed a segment of memory read or written by T or(b) allocated a segment that overlaps with a segment allocated by T, and (2) each read made by T

in its snapshot would read the same value in a snapshot taken after the last6 committed transaction.Example: let’s consider again Figure 1. Both Bob’s and Charlie’s transactions run concurrently with

(in this example) the same snapshot: Bob has 12 CHF and Charlie 10 CHF. It is consistent for Bob’s

transaction to commit first, as there has been no committed transaction since Bob’s transaction

begun and thus the shared memory remained the same since the snapshot of Bob’s transaction.

Regarding Charlie’s committing transaction, which reads memory locations that were modified since its atomic snapshot, it would not be consistent to commit: Charlie’s transaction must be retried7 . Now if we consider the other example of Bob wiring Charlie while Dan wires Erin, it would be consistent for both transactions to commit, since they accessed different words of the shared memory. Side notes:

❼ There are several definitions of consistency in the literature. Here we provide a sensible definitionfor this project, while being arguably more precise/specific/actionablethan some other existingdefinitions (e.g. “Data is in a consistent state when a transaction starts and when it ends.” [3]).

❼ Point (3) of snapshot isolation is usually found in consistency.

1.3 Possible implementations (in English) This section describes the reference implementation and a possible transactional memory for this project.You are free to implement any other software transactional memory (see the rules in Section 2.1.3). Moreideas can be found while studying:

❼ TinySTM: https://github.com/patrickmarlier/tinystm

❼ LibLTX: https://sourceforge.net/projects/libltx

❼ stmmap: https://github.com/skaphan/stmmap

1.3.1 Using a coarse-lock (i.e. the “reference” implementation)

This implementation is very simple to describe: the transactional memory library uses a single mutex to

(trivially) serialize transactions made onto the shared memory.When a transaction begins (i.e. tm.begin is called), the single mutex is taken. When the transactionends, it always commits (i.e. never retries), and the single mutex is released. Read, write, allocation anddeallocation operations are directly mapped to reading/writing the memory at the requested addresses,and essentially calling libc’s malloc/free8 .This approach is obviously correct, as it directly consists in both (1) executing transactions seriallyand (2) ensuring new transactions see all the writes/allocs/frees from the last committed transaction.From the prism of the three notions laid above, snapshot isolation is guaranteed because no (concurrent) transaction can run to observe the memory effects of the pending transaction holding the lock,

until the pending transaction commits, and so, releases the lock. Atomicity is directly guaranteed by thelock, and ditto for consistency, since there cannot be two transactions running concurrently: the pendingtransaction is the only one able to alter the state of the shared memory.

1.3.2 Dual-versioned transactional memory

This transactional memory is only provided as a suggestion. You are free to look for other implementations in the litterature. It is not the fastest possible for the grading workload9 , and not even the simplest you can implement. You thus have two choices: 1) be guided and follow the pseudocode in this document, which will get you an STM that requires a good amount of non-trivial optimizations to get the maximum grade, 6The set of committed transactions must be linearizable, and for any valid linearization, the last committed transactionfor any (but the first) transaction T is simply the transaction that is considered to have happened right before T.7As long as tm end (see Section 2.2) for Bob’s transaction has not returned control, an implementation could as well,without violating linearizability, make Bob’s transaction abort and instead let Charlie’s transaction commit.8

The implementation also needs to keep track of the allocated segments, to prevent any memory leak: see Section 2.2.9Any reasonably optimized implementation will nevertheless be faster than the reference.

52) explore the world of transactional memories by yourself and implement a more efficient algorithm such

as TL2 or LSA, which should make getting the maximum grade easier. If you want to get help from the TAs, and are not that much of an explorer, it is recommended you stick with the algorithm described below. Also, only the high-level concepts and logic will be presented. Low-level details and optimizations,

e.g. actual memory layouts, are omitted on purpose, and for you to find.

Through the use of a very basic multiversionning scheme, this implementation aims at making surread-only transactions can both: run concurrently with read-write transactions, and never fail. Two copiesof every (aligned) word of the size of the requested alignment (see Section 2.2.2) are kept. When a trans

action reads/writes words in shared memory, the transactional library has to decide for each word whichof the two copies must be read/written, or if none can be read/written and the transaction must abort.One key structure that can considerably simplify the design (at the expense of performance though)

is what we will call the Thread Batcher (or more simply, the Batcher ). The goal of the Batcher is toartificially create points in time in which no transaction is running. Thisgoal might sound uncanny, butit is not: a mere mutex, as the one used in the reference implementation, creates such points in time.

Indeed, when the transaction that got the global lock releases it, no transaction is running.You can think of the Batcher as a special kind of mutex: while a mutex lets only one blocked threadenter and leave no matter how many threads were blocked waiting, the Batcher lets each and every blocked thread(s) enter together when the last thread (from the previous batch) leaves. So the interfaceof the Batcher is pretty much the same as the one of a mutex: enter and leave (analogous to lock anunlock), and one special function called get epoch that can only be called by a thread after it entered(and before it leaved) the Batcher. get epoch returns an integer that is unique to each batch of threads

e.g. a counter, incrementing when the last thread of each batch leaves, would do well).Batcher pseudo-code, assuming all functions execute in one indivisible point in ime(they are atomic):Data: counter : integer, remaining: integer, blocked: list of threadsallocate enough space for a segment of size size;if the allocation failed then return the allocated failed;

end

initialize the control structure(s) (one per word) in the segment;

nitialize each copy of each word in the segment to zeroes;

register the segment in the set of allocated segments;

return the transaction can continue;

end

fn free(target)mark target for deregistering (from the set of allocated segments)and freeing once the last transaction of the current epoch leavesthe Batcher, if the calling transaction ends up being committed;return the transaction can continue;

end 8fn commit()foreach written word index do defer the swap, of which copy for word index is the “valid”copy, just after the last transaction from the current epochleaves the Batcher and before the next batch starts running;end return the transaction has committed;

end As for the Batcher pseudo-code, we assume the functions outlined above all execute atomically. Thismeans that the execution of a function should not be affected by the concurrent execution of otherfunctions. Practically, this implies that you should use synchronization primitives to control the access

to shared variables. Numerous implementation details, e.g. how to access the control structure of each

word, how to keep track of the allocated segments, etc, are intentionally left completely unspecified.

For reference: (very optimized) implementations can reach speedups above ×2.5 against the grading

workload, and the implementation of the TA is less than 1000 lines long (all comments and spacesincluded). The proposed algorithms can be extended with speculative execution of read-only transactions(for you to figure out) to get a higher speedup or to compensate for the lack of optimizations. You shouldlook for such optimizations after getting an initial version that works correctly.

2 The project

You are expected to be reasonably fluent in C11 or C++17 for this project. Picking one or the othershouldn’t impact the performance of your implementation (granted you don’t use overly expensive abstractions). Pick the one you’re the most familiar with. If you’re unsure, we advise you to stick to C11.

A (very brief) introduction to the memory model of C11/C++11, along with available atomic operations,are provided in Section 3.

2.1 Practical information

This project is associated with a git repository, hosted on GitHub. It will be available at https://github.com/LPD-EPFL/CS453-2024-project when the semester starts. Its layout is the following:grading/ Directory containing the source code of the evaluation program, to test your solution on

your own machine. The evaluation server (see below) runs the exact same binary, possibly with aseed. Section 2.1.1 further details how to test and later submit your code.include/ C11 and C++17 header files that define the public interface of the STM.template/ Template (in C11) of your implementation, but you are free to replace everything andwrite C++17 instead. See section 2.1.3 for more information.reference/ Reference, single lock-based implementation (see Section 1.3.1).

Note that you are not allowed to write an implementation that is equivalent to this referenceimplementation. See section 2.1.3 for more information.

sync-examples/ Examples of how to use synchronization primitives (locks, condition variables andatomics).

.compare exchange strong(The only difference between the weak and strong version of a compare-and-swap is that the weak versionmay (spuriously) act as if the stored value was not the expected value, even if they were equal.)Each of these atomic operations either have: counterparts which names end in explicit and takeone (or two) more parameter(s) (in C11), or defaulted parameters (since C++11). These parameters

specify ordering constraints when the atomic operation is used to synchronize with concurrent threads.Understanding these notions, and using the ordering parameters, are optional for the project. Memory ordering is introduced in Section 3.2 only for completeness, and for students interested to know more.Real uses of these atomic operations are available in the provided repository, in class Sync in filegrading/grading.cpp, and in the two custom definitions of struct lock t in file reference/tm.c.

标签:transaction,transactions,implementation,transactional,memory,CS,Bob
From: https://www.cnblogs.com/CSE2425/p/18609883

相关文章

  • 2024ciscn 逆向ezCsky和dump详解
    ezCskyExeinfo看了不是exeIDA分析不了,使用鸡爪Ghidra进行分析。这边顺带讲一下Ghidra的基础操作方法下载Ghidra:https://gitcode.com/gh_mirrors/gh/ghidra_installer下载java11(对版本有要求)打开.bat文件第一次用需要先输入jar文件所在的地址,比如我的就是C:\ProgramFile......
  • Elasticsearch、Logstash和Kibana
    ELK概述定义与组成部分:ELK是Elasticsearch、Logstash和Kibana的缩写,这三个工具共同构成了一个强大的日志管理和分析平台。Elasticsearch:是一个分布式的、RESTful风格的搜索和数据分析引擎。它能够存储和索引大量的数据,并提供快速的搜索功能。其核心功能是对数据进行索引,使得......
  • B4X编程语言:B4A, B4i 字符序列生成器CSBuilder
            B4X为我们提供了一个功能强大的字符串操作工具CSBuilder对象(仅用于B4A、B4i)。        CSBuilder类似 StringBuilder。 但与构建字符串不同,CSBuilder是通过操作字符序列来操作字符串,因此它也叫字符序列生成器,它构建了包含样式信息的字符序列。......
  • CSP 2024 游记
    \(\texttt{Day-INF}\)只考S。这gesp怎么只能免一次啊,有点坑。洛谷模拟赛没打到1=线,联考次次挂分,慌得一批。考得比去年低就好玩了。运动会和csp冲了一天。好消息:不用排练开幕式了。坏消息:第二天还得去。\(\texttt{Day-5}\)福建师大附中润德楼50250号。感觉状态有......
  • [SAP ABAP] 上传CSV文件到内表
    CSV文件数据测试数据.csv上传csv文件到内表的开发步骤:①选择屏幕以及上传文件的相关参数设置②获取上传的CSV文件数据行自定义的csv文件,编码格式是utf-8,但是使用GUI_UPLOAD函数读取文件数据,会出现中文乱码,因此需要给形参codepage指定编码格式'8400'③......
  • 洛谷P7911 [CSP-J 2021] 网络连接题解
    普通的模拟题,数据很小,基本排除超时超空间的可能上代码:#include<bits/stdc++.h>#defineLLlonglongusingnamespacestd;vector<pair<int,string>>sv;//用于存储Server,sv[i].first代表Server编号,sv[i].second代表Server地址intturn(stringstr){//string转int if(str.......
  • NLP论文速读(MetaMetrics)|使用人类偏好校准生成任务的度量
    论文速读|METAMETRICS:CALIBRATINGMETRICSFORGENERATIONTASKSUSINGHUMANPREFERENCES论文信息:简介:    本文探讨了在自然语言处理(NLP)和其他生成任务中,如何评估模型输出的质量以确保其与人类偏好一致。传统的评估指标(如BLEU分数)往往不能全面捕捉语言的多样......
  • ElasticSearch 常见故障解析与修复秘籍
    文章目录一、ElasticSearch启动服务提示无法使用root用户二、ElasticSearch启动提示进程可拥有的虚拟内存少三、ElasticSearch提示用户拥有的可创建文件描述符太少四、ElasticSearch集群yellow状态分析五、ElasticSearch节点磁盘使用率过高,read_only状态问题解决六、Elas......
  • 前端工程化_CSS 工具链_学习笔记
    CSS工具链css呢,有以下两个缺点1.语法缺失(循环、判断、拼接)2.功能缺失(颜色函数、数学函数、自定义函数)虽然CSS支持几个函数,比如:url('')用于引入外部资源calc()计算函数,计算尺寸、间距等linear-gradient渐变函数但还是太少了这时候有人就创造了新语言新语言是CSS......
  • Debezium TopicSelector 类设计分析
    DebeziumTopicSelector类设计分析:从需求到实现的演进过程文章目录1.引言2.设计背景3.设计演进过程4.核心设计模式分析5.关键实现细节6.性能优化策略7.最佳实践与应用场景8.总结与思考1.引言在分布式数据复制系统中,主题(Topic)命名的管理是一个看似简......