Chapter 9 Optimizing with SIMD Instructions 205
Software Optimization Guide for AMD64 Processors
25112 Rev. 3.06 September 2005
Btr_ptr -= 32;
// Advance Aptr0, Aptr1, Aptr2, and Aptr3 to the next block of
// 4 rows of A to be dotted with this column of B. A block of
// 4 rows of A spans 128 doubles, and the n-loop above has already
// advanced each pointer by 32, so each pointer needs a further
// 96 (128 - 32) to reach its row in the next block of 4 rows.
Aptr0 += 96;
Aptr1 += 96;
Aptr2 += 96;
Aptr3 += 96;
}
// Advance the B-transpose pointer by one row (32 doubles) so that it
// points to the next row of B-transpose, that is, the next column of B
// by which matrix A is to be multiplied.
Btr_ptr += 32;
}
}
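
The pointer bookkeeping in the comments above (rewind the B-transpose pointer by 32, advance each A pointer by 96, then step B-transpose forward by one row) can be checked with a plain scalar sketch. The following is not the SIMD listing itself; it is a minimal illustration that assumes 32 x 32 row-major matrices of doubles. The function names matmul_walk_sketch and matmul_walk_self_check are hypothetical, and the offsets a0..a3 and bt stand in for Aptr0..Aptr3 and Btr_ptr.

```c
#include <assert.h>
#include <stddef.h>

#define N 32 /* assumed: square 32 x 32 matrices of doubles, row-major */

/* Hypothetical scalar sketch: C = A * B, where Btr holds B transposed.
   Offsets a0..a3 and bt play the roles of Aptr0..Aptr3 and Btr_ptr. */
void matmul_walk_sketch(const double *A, const double *Btr, double *C)
{
    size_t bt = 0;

    for (size_t j = 0; j < N; j++) {          /* one row of Btr = one column of B */
        size_t a0 = 0, a1 = N, a2 = 2 * N, a3 = 3 * N;

        for (size_t i = 0; i < N; i += 4) {   /* blocks of 4 rows of A */
            double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;

            for (size_t n = 0; n < N; n++) {  /* the n-loop: each offset moves by 32 */
                double b = Btr[bt++];
                s0 += A[a0++] * b;
                s1 += A[a1++] * b;
                s2 += A[a2++] * b;
                s3 += A[a3++] * b;
            }
            C[(i + 0) * N + j] = s0;
            C[(i + 1) * N + j] = s1;
            C[(i + 2) * N + j] = s2;
            C[(i + 3) * N + j] = s3;

            bt -= N;                          /* rewind to the start of the same Btr row */
            a0 += 3 * N;                      /* 4 rows (128) minus 32 already walked = 96 */
            a1 += 3 * N;
            a2 += 3 * N;
            a3 += 3 * N;
        }
        assert(a0 == (size_t)N * N);          /* 8 blocks of 128 doubles covered all of A */
        bt += N;                              /* next row of Btr */
    }
    assert(bt == (size_t)N * N);              /* every row of Btr was visited once */
}

/* Returns 1 if the sketch matches a naive triple loop on integer-valued data. */
int matmul_walk_self_check(void)
{
    static double A[N * N], B[N * N], Btr[N * N], C[N * N];

    for (size_t r = 0; r < N; r++) {
        for (size_t c = 0; c < N; c++) {
            A[r * N + c] = (double)((r + c) % 5);
            B[r * N + c] = (double)((2 * r + c) % 7);
        }
    }
    for (size_t r = 0; r < N; r++)
        for (size_t c = 0; c < N; c++)
            Btr[c * N + r] = B[r * N + c];    /* transpose B */

    matmul_walk_sketch(A, Btr, C);

    for (size_t i = 0; i < N; i++) {
        for (size_t j = 0; j < N; j++) {
            double ref = 0.0;
            for (size_t k = 0; k < N; k++)
                ref += A[i * N + k] * B[k * N + j];
            if (C[i * N + j] != ref)
                return 0;
        }
    }
    return 1;
}
```

Integer offsets are used here instead of raw pointers so that the final "+= 96" step never forms a pointer beyond the end of the array; the arithmetic is otherwise the same as in the listing.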