Compaq ECQD2KCTE Laptop User Manual


 
Software Considerations A–7
In a frequently executed loop, compilers could allocate the data items accessed from memory
so that, on each loop iteration, all of the memory addresses accessed are either in exactly the
same aligned 64-byte block or differ in bits VA<10:6>. For loops that go through arrays in a
common direction with a common stride, this requires allocating the arrays, checking that the
first-iteration addresses differ, and if not, inserting up to 64 bytes of padding between the
arrays. This rule will avoid thrashing in small direct-mapped data caches with block sizes up to
64 bytes and total sizes of 2K bytes or more.
Example:
REAL*4 A(1000),B(1000)
DO 60 i=1,1000
60 A( i ) = f(B( i ))
Figures A–3, A–4, and A–5 show bad, better, and best allocation in cache, respectively.
BAD allocation (A and B thrash in 8 KB direct-mapped cache):
Figure A–3: Bad Allocation in Cache
BETTER allocation (A and B offset by 64 mod 2 KB, so 16 elements of A and 16 of B can be
in cache simultaneously):
Figure A–4: Better Allocation in Cache
BEST allocation (A and B offset by 64 mod 2 KB, so 16 elements of A and 16 of B can be in
cache simultaneously, and both arrays fit entirely in 8 KB or bigger cache):
Figure A–5: Best Allocation in Cache
In a frequently executed loop, compilers could allocate the data items accessed from memory
so that, on each loop iteration, all of the memory addresses accessed are either in exactly the
same 8 KB page, or differ in bits VA<17:13>. For loops that go through arrays in a common
direction with a common stride, this requires allocating the arrays, checking that the first-itera-
AB
0
4K
8K 16K12K
AB
0 4K 8K+64 16K12K
AB
0 4K-64 8K 16K12K