The basic introduction comes from this link: https://www.techcenturion.com/nvidia-cuda-cores/
Of its explanations, this abstract analogy is the easiest to follow:
Let us consider an example to understand how CUDA cores work.
Think of the processor as a water tank. If you want to empty the tank, you will need to make use of pipes.
If you connect more pipes, you will naturally be able to empty the tank faster. CUDA cores act like these pipes for the processor: more CUDA cores means the processing can be done at a much faster rate.
Basic structure of a CUDA core
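The first architecture covered is Fermi (not NVIDIA's first CUDA architecture; that was Tesla, introduced with the G80 in 2006). Fermi has 512 CUDA cores in 16 SMs, so each SM contains 32 CUDA cores. At this point, a CUDA core consisted of one floating-point unit and one integer unit.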
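In the Maxwell and Pascal architectures, the complex integer multiplication hardware was removed from the integer unit (a full 32-bit integer multiply is instead built from several 16-bit multiply-add instructions).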
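In the Turing architecture, the number of FP32 CUDA cores per SM dropped to 64, down from 128 in consumer Pascal SMs. This was not the first drop: Maxwell had already gone from the 192 cores of a Kepler SMX to 128 per SM. According to the article, the reduction made room for RT cores (ray tracing), which debuted in Turing, and for Tensor Cores, which first appeared in Volta (2017) and reached consumer GPUs with Turing. Turing also kept the split, introduced in Volta, of the integer and floating-point units into separate datapaths.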
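To make the per-SM trend concrete, here is a small sketch that tabulates FP32 CUDA cores per SM for one representative chip per generation and lists where the count went down. The chip choices (GF100, GK110, GM204, GP104, TU102) are my own picks for illustration; other chips of the same generation can differ (for example, the data-center Pascal GP100 has 64 per SM), and `total_cores` / `decreases` are hypothetical helper names.

```python
# FP32 CUDA cores per SM for one representative chip per generation.
# Chip choices are illustrative; other chips of the same generation may differ.
PER_SM = [
    ("Fermi", "GF100", 32),
    ("Kepler", "GK110", 192),
    ("Maxwell", "GM204", 128),
    ("Pascal", "GP104", 128),
    ("Turing", "TU102", 64),
]


def total_cores(sm_count: int, cores_per_sm: int) -> int:
    """Total CUDA cores on a chip = number of SMs x cores per SM."""
    return sm_count * cores_per_sm


def decreases(table):
    """Return (previous, current) architecture pairs where cores per SM dropped."""
    drops = []
    for (prev_arch, _, prev_n), (arch, _, n) in zip(table, table[1:]):
        if n < prev_n:
            drops.append((prev_arch, arch))
    return drops


if __name__ == "__main__":
    print(total_cores(16, 32))  # Fermi: 16 SMs x 32 cores -> 512
    print(decreases(PER_SM))    # [('Kepler', 'Maxwell'), ('Pascal', 'Turing')]
```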
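Then came Ampere. In the consumer Ampere chips (GA10x), the second datapath of each SM partition can execute either FP32 or INT32 instructions, so the number of FP32 units per SM doubled from 64 to 128 (the data-center GA100 keeps separate FP32 and INT32 units, as Volta did). Per cycle, each such pair of units completes either 2×FP32 or 1×FP32 + 1×INT32, not 2×FP32 + 1×INT32. Running FP32 and INT32 concurrently was not new in Ampere, since Volta and Turing already could; it was before Volta that a CUDA core could complete only one integer or one floating-point operation per cycle, not both at once.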
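The Turing vs. Ampere difference can be illustrated with an idealized throughput model of a single SM. The assumptions are mine: Turing is modeled as 64 FP32-only lanes plus 64 INT32-only lanes working concurrently, and GA10x as 64 FP32-only lanes plus 64 lanes that run either FP32 or INT32 in a given cycle. The sketch computes the minimum cycles needed to retire a mix of per-thread FP32 and INT32 operations, ignoring scheduling, dependencies, memory, and issue limits, so it is a lower bound rather than a performance prediction.

```python
import math


def turing_cycles(fp32_ops: int, int32_ops: int, lanes: int = 64) -> int:
    """Idealized Turing SM: `lanes` FP32-only lanes and `lanes` INT32-only
    lanes run concurrently, so the longer of the two streams sets the time."""
    return max(math.ceil(fp32_ops / lanes), math.ceil(int32_ops / lanes))


def ampere_ga10x_cycles(fp32_ops: int, int32_ops: int, lanes: int = 64) -> int:
    """Idealized GA10x SM: `lanes` FP32-only lanes plus `lanes` shared lanes
    that run either FP32 or INT32 in a given cycle.

    INT32 work can only use the shared lanes: cycles >= int32 / lanes.
    All work together fits in 2 * lanes per cycle: cycles >= total / (2 * lanes).
    """
    return max(math.ceil(int32_ops / lanes),
               math.ceil((fp32_ops + int32_ops) / (2 * lanes)))


if __name__ == "__main__":
    # Hypothetical mix of 100 FP32 : 36 INT32 operations.
    print(turing_cycles(6400, 2304), ampere_ga10x_cycles(6400, 2304))  # 100 68
    # Pure FP32 work: this is where the doubled FP32 count pays off fully.
    print(turing_cycles(12800, 0), ampere_ga10x_cycles(12800, 0))      # 200 100
```

With pure FP32 work the model halves the cycle count; with a mixed workload the gain is smaller, because INT32 operations occupy the shared lanes.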
In summary:
- More CUDA cores means more data can be processed in parallel.
- A higher clock speed means a single core can work faster.
- GPUs get better with each new generation and architecture, so a graphics card with more CUDA cores is not necessarily more powerful than one with fewer CUDA cores.
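The first two points combine into the usual back-of-the-envelope estimate of peak FP32 throughput: CUDA cores × clock × 2, where the factor 2 assumes each core retires one fused multiply-add (counted as two floating-point operations) per cycle. The sketch below uses made-up card numbers (hypothetical, not real products) to show that core count alone does not rank cards; even the peak figure ignores per-core efficiency, memory bandwidth, and the FP32/INT32 mix of real workloads.

```python
def peak_fp32_tflops(cuda_cores: int, boost_clock_ghz: float,
                     flops_per_core_per_cycle: int = 2) -> float:
    """Theoretical peak FP32 TFLOPS.

    Assumes every CUDA core retires one fused multiply-add (2 FLOPs) per
    cycle at the boost clock; real workloads reach only a fraction of this.
    cores x GHz x FLOPs/cycle gives GFLOPS, so divide by 1000 for TFLOPS.
    """
    return cuda_cores * boost_clock_ghz * flops_per_core_per_cycle / 1000.0


if __name__ == "__main__":
    card_a = peak_fp32_tflops(3000, 1.5)  # hypothetical: more cores, lower clock
    card_b = peak_fp32_tflops(2500, 2.0)  # hypothetical: fewer cores, higher clock
    print(card_a, card_b)  # 9.0 10.0
```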
The article ends with what I take to be the real point:
As the developers start understanding the newer architectures better, they can better optimize their games and programs to further boost the performance.
Note: every "floating-point unit" mentioned above means an FP32 unit.
From: https://www.cnblogs.com/ijpq/p/16844456.html