CS: APP--Chapter05: optimizing program performance (part 2)
Tags (space-separated): CS:APP
8. loop unrolling
In one sentence:
Loop unrolling reduces the total number of iterations by computing more elements in each iteration.
Measurements show that unrolling brings performance closer to the latency bound, but not below it. The latency bound still applies because the accumulator is still updated n times, one after another, so the chain of dependent operations is as long as before. Getting past this bound takes parallelism.
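As a minimal sketch of the idea, the following compares a plain product loop with a 2x1 unrolled version (two elements per iteration, one accumulator). The function names `product_simple` and `product_unroll2` are made up for illustration and are not taken from the book's code.

```c
#include <assert.h>
#include <stddef.h>

/* Baseline: one element is combined per iteration. */
double product_simple(const double *data, size_t n)
{
    assert(data != NULL || n == 0);
    double acc = 1.0;
    for (size_t i = 0; i < n; i++)
        acc = acc * data[i];
    return acc;
}

/* 2x1 unrolling: two elements per iteration, still a single accumulator. */
double product_unroll2(const double *data, size_t n)
{
    assert(data != NULL || n == 0);
    double acc = 1.0;
    size_t i = 0;
    /* Main loop: i + 1 < n guarantees data[i] and data[i + 1] both exist. */
    for (; i + 1 < n; i += 2)
        acc = (acc * data[i]) * data[i + 1];
    /* Finish the leftover element when n is odd. */
    for (; i < n; i++)
        acc = acc * data[i];
    return acc;
}
```

The unrolled loop runs about n/2 iterations, so there is less loop overhead, but every multiplication still waits for the previous value of `acc`. That single dependency chain is why this version cannot go below the latency bound.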
9. enhancing parallelism
A modern processor can execute multiple operations of the same type simultaneously; the number it can run at once is called the capacity of that operation.
What we are going to do is break the data dependency and achieve performance better than the latency bound.
9.1 multiple accumulators
The Core i7 processor provides two functional units for floating-point multiplication. We can take full advantage of them by executing two floating-point multiplications simultaneously.
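A rough sketch of this approach is 2x2 unrolling: two elements per iteration, each feeding its own accumulator. The name `product_two_acc` is hypothetical, and whether the two multiplications really run in parallel depends on the compiler and the processor.

```c
#include <assert.h>
#include <stddef.h>

/* 2x2 unrolling: even-indexed elements go to acc0, odd-indexed to acc1. */
double product_two_acc(const double *data, size_t n)
{
    assert(data != NULL || n == 0);
    double acc0 = 1.0;
    double acc1 = 1.0;
    size_t i = 0;
    /* The two updates do not depend on each other, so the hardware
       is free to overlap them on separate functional units. */
    for (; i + 1 < n; i += 2) {
        acc0 = acc0 * data[i];
        acc1 = acc1 * data[i + 1];
    }
    /* Finish the leftover element when n is odd. */
    for (; i < n; i++)
        acc0 = acc0 * data[i];
    /* Combine the two partial products at the end. */
    return acc0 * acc1;
}
```

Each accumulator now carries a dependency chain of only about n/2 multiplications, which is what allows the loop to run faster than the latency bound.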
From: https://www.cnblogs.com/UQ-44636346/p/17045324.html
One point emphasized here: floating-point addition and multiplication are commutative but not associative, so regrouping the operations, as multiple accumulators do, can change the result.
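The effect of regrouping can be checked with a small sketch. It assumes IEEE 754 double arithmetic without value-changing compiler options such as `-ffast-math`; the helper names are invented for this example.

```c
#include <assert.h>

/* The same three operands, grouped in two different ways. */
double sum_left(double a, double b, double c)   { return (a + b) + c; }
double sum_right(double a, double b, double c)  { return a + (b + c); }
double prod_left(double a, double b, double c)  { return (a * b) * c; }
double prod_right(double a, double b, double c) { return a * (b * c); }

/* Aborts if either pair of groupings unexpectedly agrees. */
void check_not_associative(void)
{
    /* Left grouping cancels the large values first and keeps 3.14;
       right grouping loses 3.14 when it is added to -1e30. */
    assert(sum_left(1e30, -1e30, 3.14) != sum_right(1e30, -1e30, 3.14));
    /* Left grouping overflows in 1e200 * 1e200;
       right grouping stays finite. */
    assert(prod_left(1e200, 1e200, 1e-200) != prod_right(1e200, 1e200, 1e-200));
}
```

For typical data the difference between groupings is small, but it means an optimization that reorders floating-point operations is not guaranteed to return bit-identical results.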