#define ROUND_UP(x, align) (((int)(x) + (align - 1)) & ~(align - 1))
#define MIN(a, b) (((a) < (b)) ? (a) : (b))
#define BLOCK_SZ 8

void transpose_submit(int M, int N, int A[N][M], int B[M][N])
{
    int i, j;
    for (i = 0; i < N; i += BLOCK_SZ) {
        for (j = 0; j < M; j += BLOCK_SZ) {
            int R = MIN(N - i, BLOCK_SZ);
            int C = MIN(M - j, BLOCK_SZ);
            int tmp[BLOCK_SZ][BLOCK_SZ];
            int r, c;
            for (r = 0; r < R; r++) {
                for (c = 0; c < C; c++) {
                    tmp[r][c] = A[i + r][j + c];
                }
            }
            for (c = 0; c < C; c++) {
                for (r = 0; r < R; r++) {
                    B[j + c][i + r] = tmp[r][c];
                }
            }
        }
    }
}
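Split matrix A into 8x8 blocks. The key step comes next: copy each block into an 8x8 local array, then traverse that array column by column and write the block into matrix B. Edge blocks may be smaller than 8x8, which is why R and C are clamped with MIN.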
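To check that the blocking handles sizes that are not multiples of 8, here is a minimal sketch that applies the same scheme to flat row-major buffers and compares the result against the definition of a transpose. The names `transpose_blocked` and `check_transpose` are hypothetical helpers introduced for this illustration only; they are not part of the original post or of the lab's driver.

```c
#include <assert.h>
#include <stdlib.h>

#define BLOCK_SZ 8
#define MIN(a, b) (((a) < (b)) ? (a) : (b))

/* Hypothetical helper: same blocking as transpose_submit, but on flat
 * row-major buffers. A is N rows x M columns, B is M rows x N columns. */
static void transpose_blocked(int M, int N, const int *A, int *B)
{
    int tmp[BLOCK_SZ][BLOCK_SZ];

    for (int i = 0; i < N; i += BLOCK_SZ) {
        for (int j = 0; j < M; j += BLOCK_SZ) {
            /* Clamp the block size at the right and bottom edges. */
            int R = MIN(N - i, BLOCK_SZ);
            int C = MIN(M - j, BLOCK_SZ);

            /* Step 1: read the block of A row by row into the local buffer. */
            for (int r = 0; r < R; r++)
                for (int c = 0; c < C; c++)
                    tmp[r][c] = A[(i + r) * M + (j + c)];

            /* Step 2: walk the buffer by column, which yields rows of B. */
            for (int c = 0; c < C; c++)
                for (int r = 0; r < R; r++)
                    B[(j + c) * N + (i + r)] = tmp[r][c];
        }
    }
}

/* Hypothetical checker: returns 1 if B equals the transpose of A,
 * 0 on a mismatch, and -1 if allocation fails. */
static int check_transpose(int M, int N)
{
    assert(M > 0 && N > 0);

    int *A = malloc(sizeof(int) * (size_t)N * (size_t)M);
    int *B = malloc(sizeof(int) * (size_t)M * (size_t)N);
    int ok = 1;

    if (A == NULL || B == NULL) {
        free(A);
        free(B);
        return -1;
    }

    /* Fill A with distinct values so any misplaced element is detected. */
    for (int k = 0; k < N * M; k++)
        A[k] = k;

    transpose_blocked(M, N, A, B);

    for (int i = 0; i < N && ok; i++) {
        for (int j = 0; j < M; j++) {
            if (B[j * N + i] != A[i * M + j]) {
                ok = 0;
                break;
            }
        }
    }

    free(A);
    free(B);
    return ok;
}
```

Calling `check_transpose(61, 67)` exercises the partial blocks on both edges, while `check_transpose(32, 32)` covers the case where every block is a full 8x8. This only checks correctness; it says nothing about cache misses, which need to be measured separately.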
From: https://www.cnblogs.com/james-ling/p/16727916.html