Project description (CS-453)
Sébastien Rouault, Antoine Murat
September 24, 2024

Project forum: https://moodle.epfl.ch/mod/forum/view.php?id=1085060

1 Software transactional memory
Your goal is to implement a software transactional memory library. The project repository (see Section 2.1) contains a very simple reference implementation of such a library, along with the skeleton for another implementation: the one you'll have to (design and) code. The project repository also includes the source code for, and a Makefile to run, the program that will evaluate your implementation on the evaluation server (provided by the Distributed Computing Laboratory).
1.1 Gentle introduction
Let me introduce you to Alice. Alice runs a bank. As for any bank today, her system must be able to process electronic orders. And it should be fast (Alice's bank is very successful and has many customers).
At least two decades ago, to increase her system's throughput, Alice would only have needed to buy newer hardware, and her program would have run faster. But 15-20 years ago, this free lunch ended [1]: Alice now has to use the computational power of several hardware threads to get the most out of modern processors. This is where a transactional memory can come in handy, and where you step in to help Alice.
Alice's program is currently written to run on a single thread. When Alice runs her code on her single-core processor (no concurrency), it runs as expected. For instance, let's say that two customers, Bob and Charlie, want to transfer money: both Bob and Charlie want to wire each other (and they did not agree beforehand on "who owes who" to make only one transfer...). The pseudo-code for the "money transfer" function could be the following:

shared accounts as integer[];

fn transfer(src, dst, amount) {
    if accounts[src] >= amount {
        accounts[src] -= amount;
        accounts[dst] += amount;
    }
}

// Bob's request
transfer(bob, charlie, ...);
// Charlie's request
transfer(charlie, bob, ...);

Run one after the other on a single thread, these two requests lead to one of two possible outcomes: either both transfers go through, or (depending on the order in which the requests are processed and on the amounts involved) one of them is refused for lack of funds. Even if Bob may not be very happy with this second possible outcome, it is still a consistent execution as far as the specification of the bank is concerned: when the funds are insufficient, no transfer happens.

Now, Alice wants to try her program (by assumption correct on a single-core processor) on brand
new, multi-core hardware. This way her system will be able to process transfers concurrently, (hopefully) answering her customers faster. So Alice, for testing purposes, runs the same program on multiple threads, each taking customer requests as they come. To analyze what outcomes are possible when two threads process the two respective transfers from Bob and Charlie concurrently, we rewrite the pseudo-code to highlight each shared memory access, and the order in which these accesses are written in the program:

fn transfer(src, dst, amount) {
    let v0 = accounts[src];           // shared memory read (check the funds)
    if v0 >= amount {
        let v1 = accounts[dst];       // shared memory read
        accounts[dst] = v1 + amount;  // shared memory write
        let v2 = accounts[src];       // shared memory read
        accounts[src] = v2 - amount;  // shared memory write
    }
}
We assume that this pseudo-code runs on a pseudo-machine with atomic registers¹, and that this pseudo-machine does not reorder memory accesses²: memory is always accessed in the same order as written in the program, and there exists a total order following which each atomic memory access happens. Supposing the account of Bob is at index 0 in accounts, and the account of Charlie at index 1, a possible, concurrent execution is then (for clarity, v0, v1, v2 have been replaced by accounts[...]):

[Figure 1: a possible concurrent execution of Bob's and Charlie's transfers, interleaving their shared memory accesses.]

A simple way to avoid such problematic interleavings is to protect every access to the shared array accounts with a single, global mutex, taken before a transfer's first shared memory access and released after its last. Such a mechanism would prevent the execution from Figure 1, as only one thread accessing the shared
variable accounts can run at any given time. But this mechanism also prevents multiple threads from processing transfers concurrently. Alice wanted to use multiple cores specifically to process multiple
requests at the same time, answering her (many) customers faster. Always taking a single, coarse lock when accessing the shared memory is thus not a satisfying solution.

Actually, Alice noticed that some transfers can always run concurrently, without the need for any synchronization. For instance, if Bob wires Charlie and Dan wires Erin at the same time, no ordering of memory accesses can lead to an inconsistency. This is simply because the two sets of shared memory segments respectively accessed by these two transfers, i.e. {accounts[0], accounts[1]} and {accounts[2], accounts[3]}, do not overlap³: they do not conflict.
This is where the transactional memory is useful: it runs transactions that do not conflict in parallel, and synchronizes those which do conflict, (hopefully) making the most out of multi-core processors with minimal coding effort. To get a glimpse of the general ideas behind the interface of the transactional memory library, let us describe how Alice's code would use it.
A transaction is enclosed between calls to tm.begin and tx.end. Unlike with a lock, these functions (especially begin) may not block, allowing for concurrent executions. Reads and writes (and allocs and frees) onto the shared memory are all controlled by the software
transactional memory library: Alice's code is not allowed to, e.g., directly read accounts[dst]. Instead, a call to tx.read must be made (tx.write, tx.alloc, tx.free for the other operations). Notably, each of these operations can abort, requesting the caller to retry the same transaction.
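As an illustration only, here is a rough sketch of how Alice's transfer could look on top of the C interface of Section 2.2 (the exact prototypes live in include/tm.h; the signatures, the invalid_tx constant and the retry policy used below are assumptions of this sketch, not a specification):

    #include <stdbool.h>
    #include <stddef.h>
    #include <tm.h>   // shared_t, tx_t, tm_begin, tm_read, tm_write, tm_end, ... (assumed)

    // 'shared' is the handle of the shared region; 'src' and 'dst' are byte offsets of two
    // integer accounts inside it (the region alignment is assumed compatible with int).
    bool transfer(shared_t shared, size_t src, size_t dst, int amount) {
        while (true) {                                    // retry until the transaction commits
            tx_t tx = tm_begin(shared, false);            // start a read-write transaction
            if (tx == invalid_tx)
                continue;                                 // could not start: try again
            char* base = (char*) tm_start(shared);        // first address of the shared region
            int from, to;
            if (!tm_read(shared, tx, base + src, sizeof(int), &from)) continue;  // aborted: retry
            if (!tm_read(shared, tx, base + dst, sizeof(int), &to))   continue;  // aborted: retry
            if (from >= amount) {                         // insufficient funds: no transfer
                from -= amount;
                to   += amount;
                if (!tm_write(shared, tx, &from, sizeof(int), base + src)) continue;
                if (!tm_write(shared, tx, &to,   sizeof(int), base + dst)) continue;
            }
            if (tm_end(shared, tx))                       // try to commit
                return true;
            // the commit failed (the transaction aborted): loop and retry
        }
    }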
1.2 Specification
The software transactional memory library sees and controls every access to the shared memory. The library can delay execution, return two different values when reading the same address from two different, concurrent transactions, etc. Basically, the goal of the library is to make concurrent transactions appear
as if they were executed serially, without concurrency. (In the course, this notion is called opacity.) Also, once a transaction has successfully committed, its modifications must be visible in subsequent transactions. A software transactional memory library that achieves this goal will be deemed correct.
Think of the serial executions of Bob's and Charlie's concurrent transfers (in Section 1.1): either the request from Bob was processed first, not seeing any modification from Charlie's request, or the opposite. They executed one after the other. Also, if Bob makes a new transaction (e.g. requesting his account details) after these two transactions committed, Bob must see both of these modifications.

We can define three notions which, taken together, would satisfy the (informal) goal described above. These notions are: snapshot isolation, atomicity, and consistency. Namely:
• Snapshot isolation ensures that: (1) no write/alloc/free⁵ in any non-committed transaction can be read/become available/unavailable in any other transaction, (2) all the reads/writes/allocs/frees made in a transaction appear to be made from the same atomic snapshot of the whole shared memory, except that (3) a transaction observes its own modifications (i.e. writes/allocs/frees).
Example: if snapshot isolation had been satisfied in Figure 1, as Charlie's transaction already read v1 == 12, then the 2nd read from Charlie's thread could not have read v2 == 20, which either: (1) is the effect of a write in a non-committed transaction, if Bob's transaction had not committed, or (2) belongs to a different snapshot, if Bob's transaction had committed between the two reads.
• Atomicity ensures that all the memory writes/allocs/frees of any committed transaction seem to have all happened at one indivisible point in time.
Example: without atomicity, Bob's transaction could successfully commit, and another transaction (e.g. Charlie's transaction) could take a snapshot where only one of the two writes had been made.

⁵ When Alice's program calls tx.free, the transactional memory library is not required to immediately call libc's free.
• Consistency ensures, for a committing transaction T, that: (1) none of the transactions that committed since the atomic snapshot of T (a) freed a segment of memory read or written by T or (b) allocated a segment that overlaps with a segment allocated by T, and (2) each read made by T in its snapshot would read the same value in a snapshot taken after the last⁶ committed transaction.
Example: let's consider again Figure 1. Both Bob's and Charlie's transactions run concurrently with (in this example) the same snapshot: Bob has 12 CHF and Charlie 10 CHF. It is consistent for Bob's transaction to commit first, as there has been no committed transaction since Bob's transaction began, and thus the shared memory remained the same since the snapshot of Bob's transaction. Regarding Charlie's committing transaction, which reads memory locations that were modified since its atomic snapshot, it would not be consistent to commit: Charlie's transaction must be retried⁷. Now if we consider the other example of Bob wiring Charlie while Dan wires Erin, it would be consistent for both transactions to commit, since they accessed different words of the shared memory.

Side notes:
• There are several definitions of consistency in the literature. Here we provide a sensible definition for this project, while being arguably more precise/specific/actionable than some other existing definitions (e.g. "Data is in a consistent state when a transaction starts and when it ends." [3]).
• Point (3) of snapshot isolation is usually found in consistency.
1.3 Possible implementations (in English)

This section describes the reference implementation and a possible transactional memory for this project. You are free to implement any other software transactional memory (see the rules in Section 2.1.3). More ideas can be found while studying:
• TinySTM: https://github.com/patrickmarlier/tinystm
• LibLTX: https://sourceforge.net/projects/libltx
• stmmap: https://github.com/skaphan/stmmap
1.3.1 Using a coarse-lock (i.e. the “reference” implementation)
This implementation is very simple to describe: the transactional memory library uses a single mutex to
(trivially) serialize transactions made onto the shared memory. When a transaction begins (i.e. tm.begin is called), the single mutex is taken. When the transaction ends, it always commits (i.e. never retries), and the single mutex is released. Read, write, allocation and deallocation operations are directly mapped to reading/writing the memory at the requested addresses, and essentially calling libc's malloc/free⁸.

This approach is obviously correct, as it directly consists in both (1) executing transactions serially and (2) ensuring new transactions see all the writes/allocs/frees from the last committed transaction. Through the prism of the three notions laid out above, snapshot isolation is guaranteed because no (concurrent) transaction can run to observe the memory effects of the pending transaction holding the lock, until the pending transaction commits, and so releases the lock. Atomicity is directly guaranteed by the lock, and ditto for consistency, since there cannot be two transactions running concurrently: the pending transaction is the only one able to alter the state of the shared memory.
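For concreteness, here is a condensed sketch of the read/write path of such a coarse-lock library. It assumes the public interface declared in include/tm.h; the prototypes below, the process-wide mutex and the dummy transaction handle are simplifications of this sketch (memory (de)allocation and segment tracking are omitted, see Section 2.2), not a description of the actual reference code.

    #include <pthread.h>
    #include <stdbool.h>
    #include <string.h>
    #include <tm.h>   // shared_t, tx_t and the tm_* prototypes (assumed to be on the include path)

    static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;

    tx_t tm_begin(shared_t shared, bool is_ro) {
        (void) shared; (void) is_ro;
        pthread_mutex_lock(&global_lock);   // serialize every transaction
        return (tx_t) 1;                    // any handle distinct from the "invalid" one will do here
    }

    bool tm_end(shared_t shared, tx_t tx) {
        (void) shared; (void) tx;
        pthread_mutex_unlock(&global_lock);
        return true;                        // a serialized transaction always commits
    }

    bool tm_read(shared_t shared, tx_t tx, void const* source, size_t size, void* target) {
        (void) shared; (void) tx;
        memcpy(target, source, size);       // direct read: the global lock is held
        return true;
    }

    bool tm_write(shared_t shared, tx_t tx, void const* source, size_t size, void* target) {
        (void) shared; (void) tx;
        memcpy(target, source, size);       // direct write: the global lock is held
        return true;
    }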
1.3.2 Dual-versioned transactional memory
This transactional memory is only provided as a suggestion. You are free to look for other implementations in the literature. It is not the fastest possible for the grading workload⁹, and not even the simplest you can implement. You thus have two choices: 1) be guided and follow the pseudo-code in this document, which will get you an STM that requires a good amount of non-trivial optimizations to get the maximum grade, or 2) explore the world of transactional memories by yourself and implement a more efficient algorithm such as TL2 or LSA, which should make getting the maximum grade easier. If you want to get help from the TAs, and are not that much of an explorer, it is recommended that you stick with the algorithm described below. Also, only the high-level concepts and logic will be presented. Low-level details and optimizations, e.g. actual memory layouts, are omitted on purpose, and for you to find.

⁶ The set of committed transactions must be linearizable, and for any valid linearization, the last committed transaction for any (but the first) transaction T is simply the transaction that is considered to have happened right before T.
⁷ As long as tm_end (see Section 2.2) for Bob's transaction has not returned control, an implementation could as well, without violating linearizability, make Bob's transaction abort and instead let Charlie's transaction commit.
⁸ The implementation also needs to keep track of the allocated segments, to prevent any memory leak: see Section 2.2.
⁹ Any reasonably optimized implementation will nevertheless be faster than the reference.
Through the use of a very basic multiversioning scheme, this implementation aims at making sure that read-only transactions can both: run concurrently with read-write transactions, and never fail. Two copies of every (aligned) word of the size of the requested alignment (see Section 2.2.2) are kept. When a transaction reads/writes words in shared memory, the transactional library has to decide for each word which of the two copies must be read/written, or if none can be read/written and the transaction must abort.

One key structure that can considerably simplify the design (at the expense of performance though) is what we will call the Thread Batcher (or more simply, the Batcher). The goal of the Batcher is to artificially create points in time in which no transaction is running. This goal might sound uncanny, but it is not: a mere mutex, as the one used in the reference implementation, creates such points in time.
Indeed, when the transaction that got the global lock releases it, no transaction is running.

You can think of the Batcher as a special kind of mutex: while a mutex lets only one blocked thread enter and leave no matter how many threads were blocked waiting, the Batcher lets each and every blocked thread enter together when the last thread (from the previous batch) leaves. So the interface of the Batcher is pretty much the same as the one of a mutex: enter and leave (analogous to lock and unlock), and one special function called get_epoch that can only be called by a thread after it entered (and before it left) the Batcher. get_epoch returns an integer that is unique to each batch of threads (e.g. a counter, incrementing when the last thread of each batch leaves, would do well).

Batcher pseudo-code, assuming all its functions execute in one indivisible point in time (they are atomic):

Data: counter: integer, remaining: integer, blocked: list of threads

fn get_epoch()
    return counter;
end
fn enter()
    if remaining == 0 then
        remaining = 1;
    else
        block the calling thread (append it to blocked);
    end
end
fn leave()
    remaining = remaining - 1;
    if remaining == 0 then
        counter = counter + 1;
        remaining = the number of threads in blocked;
        unblock every thread in blocked and empty this list;
    end
end
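Since the pseudo-code above assumes each Batcher function runs in one indivisible step, a real implementation needs synchronization around the shared counters. The following is one possible, purely illustrative C11 sketch using a Pthreads mutex and condition variable; the struct and function names are not part of any required interface, and other designs (e.g. atomics-only ones) are possible:

    #include <pthread.h>
    #include <stddef.h>
    #include <stdint.h>

    struct batcher {
        pthread_mutex_t lock;       // protects all the fields below
        pthread_cond_t  cv;         // signaled when a new batch may enter
        uint64_t epoch;             // value returned by get_epoch, unique per batch
        size_t   remaining;         // threads of the current batch still inside
        size_t   blocked;           // threads waiting for the next batch
    };

    void batcher_init(struct batcher* b) {
        pthread_mutex_init(&b->lock, NULL);
        pthread_cond_init(&b->cv, NULL);
        b->epoch = 0;
        b->remaining = 0;
        b->blocked = 0;
    }

    uint64_t batcher_get_epoch(struct batcher* b) {
        return b->epoch;            // only valid between enter() and leave():
                                    // the epoch cannot change while the caller is inside
    }

    void batcher_enter(struct batcher* b) {
        pthread_mutex_lock(&b->lock);
        if (b->remaining == 0) {    // no batch currently running: start a new one
            b->remaining = 1;
        } else {                    // a batch is running: wait for the next one
            uint64_t epoch = b->epoch;
            b->blocked++;
            while (epoch == b->epoch)           // woken up when the epoch changes
                pthread_cond_wait(&b->cv, &b->lock);
        }
        pthread_mutex_unlock(&b->lock);
    }

    void batcher_leave(struct batcher* b) {
        pthread_mutex_lock(&b->lock);
        if (--b->remaining == 0) {              // last thread of the batch leaves
            b->epoch++;                         // the next batch gets a fresh epoch
            b->remaining = b->blocked;          // every blocked thread enters together
            b->blocked = 0;
            pthread_cond_broadcast(&b->cv);
        }
        pthread_mutex_unlock(&b->lock);
    }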
In the same style, here is pseudo-code for allocating a new segment from within a transaction (deallocation and commit follow):

fn alloc(size)
    allocate enough space for a segment of size size;
    if the allocation failed then
        return the allocation failed;
    end
    initialize the control structure(s) (one per word) in the segment;
    initialize each copy of each word in the segment to zeroes;
    register the segment in the set of allocated segments;
    return the transaction can continue;
end
fn free(target)
    mark target for deregistering (from the set of allocated segments) and freeing once the last transaction of the current epoch leaves the Batcher, if the calling transaction ends up being committed;
    return the transaction can continue;
end

fn commit()
    foreach written word index do
        defer the swap, of which copy for word index is the "valid" copy, just after the last transaction from the current epoch leaves the Batcher and before the next batch starts running;
    end
    return the transaction has committed;
end

As for the Batcher pseudo-code, we assume the functions outlined above all execute atomically. This means that the execution of a function should not be affected by the concurrent execution of other functions. Practically, this implies that you should use synchronization primitives to control the access
to shared variables. Numerous implementation details, e.g. how to access the control structure of each
word, how to keep track of the allocated segments, etc., are intentionally left completely unspecified.
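To give an idea of what such structures could look like (one possibility among many, not a requirement), the per-word control structure of the dual-versioned design could record which of the two copies is currently valid, whether the word was written in the current epoch, and which transaction accessed it first, while allocated segments could be chained in a simple list:

    #include <stdatomic.h>
    #include <stddef.h>
    #include <stdint.h>

    // Illustrative per-word control structure; the actual layout is up to you.
    struct word_control {
        atomic_bool        valid_copy;   // false: copy A is the valid one, true: copy B
        atomic_bool        written;      // has the word been written in the current epoch?
        _Atomic(uintptr_t) accessed_by;  // first transaction to access the word this epoch
                                         // (0 when no transaction accessed it yet)
    };

    // Illustrative tracking of allocated segments (e.g. for deferred frees and cleanup).
    struct segment {
        struct segment* next;            // next allocated segment, if any
        size_t          size;            // size of the user-visible part, in bytes
        // ... the two copies of the words and their control structures would follow
    };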
For reference: (very optimized) implementations can reach speedups above ×2.5 against the grading
workload, and the implementation of the TA is less than 1000 lines long (all comments and spaces included). The proposed algorithms can be extended with speculative execution of read-only transactions (for you to figure out) to get a higher speedup or to compensate for the lack of optimizations. You should look for such optimizations after getting an initial version that works correctly.
2 The project
You are expected to be reasonably fluent in C11 or C++17 for this project. Picking one or the other shouldn't impact the performance of your implementation (granted you don't use overly expensive abstractions). Pick the one you're the most familiar with. If you're unsure, we advise you to stick to C11. A (very brief) introduction to the memory model of C11/C++11, along with the available atomic operations, is provided in Section 3.
2.1 Practical information
This project is associated with a git repository, hosted on GitHub. It will be available at https://github.com/LPD-EPFL/CS453-2024-project when the semester starts. Its layout is the following:

grading/       Directory containing the source code of the evaluation program, to test your solution on your own machine. The evaluation server (see below) runs the exact same binary, possibly with a seed. Section 2.1.1 further details how to test and later submit your code.
include/       C11 and C++17 header files that define the public interface of the STM.
template/      Template (in C11) of your implementation, but you are free to replace everything and write C++17 instead. See Section 2.1.3 for more information.
reference/     Reference, single lock-based implementation (see Section 1.3.1). Note that you are not allowed to write an implementation that is equivalent to this reference implementation. See Section 2.1.3 for more information.
sync-examples/ Examples of how to use synchronization primitives (locks, condition variables and atomics).
… .compare_exchange_strong(…)

(The only difference between the weak and strong versions of a compare-and-swap is that the weak version may (spuriously) act as if the stored value was not the expected value, even if they were equal.)

Each of these atomic operations has either counterparts whose names end in _explicit and take one (or two) more parameter(s) (in C11), or defaulted parameters (since C++11). These parameters specify ordering constraints when the atomic operation is used to synchronize with concurrent threads. Understanding these notions, and using the ordering parameters, are optional for the project. Memory ordering is introduced in Section 3.2 only for completeness, and for students interested to know more. Real uses of these atomic operations are available in the provided repository, in class Sync in file grading/grading.cpp, and in the two custom definitions of struct lock_t in file reference/tm.c.
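For illustration, here is a minimal C11 snippet using the strong compare-and-swap and its _explicit counterpart (the variable, the function names and the chosen memory orderings are just an example):

    #include <stdatomic.h>
    #include <stdbool.h>

    static atomic_int counter;   // statically zero-initialized

    bool try_claim(int expected, int desired) {
        // Returns true iff 'counter' still held 'expected' and was replaced by 'desired';
        // on failure, the local 'expected' is updated with the value actually observed.
        return atomic_compare_exchange_strong(&counter, &expected, desired);
    }

    bool try_claim_explicit(int expected, int desired) {
        return atomic_compare_exchange_strong_explicit(&counter, &expected, desired,
            memory_order_acq_rel,    // ordering on success
            memory_order_acquire);   // ordering on failure (the operation is then a pure load)
    }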