Forum - STREAM Benchmark Results for different Vector Lengths

[#91]

Hello,

I am currently investigating the relationship between vector length and memory bandwidth on the SX-Aurora. For that purpose I changed the for loops of the STREAM kernels so that the arrays are processed in strips of VLEN elements. Each kernel then looks like this:
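for (j = 0; j < STREAM_ARRAY_SIZE; j += VLEN) {
    /* No unrolling; vectorize the inner loop even for short trip counts. */
    #pragma _NEC nounroll
    #pragma _NEC vector_threshold(1)
    for (int i = 0; i < VLEN; ++i) {
        /* Copy, Scale, Add or Triad kernel here, indexed with j + i,
           e.g. c[j + i] = a[j + i]; for Copy */
    }
}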
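As a minimal, self-contained sketch of this loop structure for the Triad kernel: the function name triad_blocked, the default VLEN of 64 and the handling of a final partial strip are illustrative assumptions, not the exact benchmark code. Compilers other than the NEC compiler ignore the _NEC pragmas (possibly with a warning).

```c
#include <stddef.h>

#ifndef VLEN
#define VLEN 64 /* illustrative default; the measurements sweep 1 to 256 */
#endif

/* Triad a[k] = b[k] + scalar * c[k], processed in strips of VLEN elements. */
void triad_blocked(double *a, const double *b, const double *c,
                   double scalar, size_t n)
{
    for (size_t j = 0; j < n; j += VLEN) {
        /* Assumption: shorten the last strip if n is not a multiple of VLEN. */
        size_t len = (n - j < (size_t)VLEN) ? (n - j) : (size_t)VLEN;

        #pragma _NEC nounroll
        #pragma _NEC vector_threshold(1)
        for (size_t i = 0; i < len; ++i) {
            a[j + i] = b[j + i] + scalar * c[j + i];
        }
    }
}
```

Building with, for example, -DVLEN=128 changes the strip length without touching the code.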
What I initially expected was a linear relationship between bandwidth and vector length: for a length of 256 I would reach the official ~1229 GB/s, or a value close to it, and for a length of 64, for example, about 1/4 of that peak (or of the value measured at 256).
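That expectation can be written down as a small sketch. The function name linear_expected_gbps and its parameters are hypothetical and only serve as illustration; 256 is the maximum vector length used in the measurements.

```c
/* Maximum vector length used in the measurements. */
#define MAX_VLEN 256

/* Bandwidth expected at vector length vlen if bandwidth scaled linearly
   with vector length and reached peak_gbps at MAX_VLEN. */
double linear_expected_gbps(double peak_gbps, int vlen)
{
    return peak_gbps * (double)vlen / (double)MAX_VLEN;
}
```

With a peak of 1228.8 GB/s this gives 4.8 GB/s per vector element, so a length of 64 would be expected to deliver 307.2 GB/s.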
However, the measured results are very different, as the following table shows.

STREAM benchmark results, array size of 2.2 GiB, LLC hit rate < 0.01%
Vector Length	Copy (GiB/s)	Scale (GiB/s)	Add (GiB/s)	Triad (GiB/s)	Best result (GB/s)	Best as % of 1229 GB/s	Best as % of measured value at 256	Linear expectation (GB/s)
1	21.0967	20.1343	24.5327	26.8802	28.8624	2.3484	2.7469	4.8
2	43.9901	45.0213	54.4394	54.7388	58.7753	4.7824	5.5938	9.6
4	94.2324	94.4371	106.0623	107.2386	115.1466	9.3691	10.9588	19.2
8	182.8993	179.9025	182.5055	191.794	205.9372	16.7565	19.5997	38.4
16	309.3943	302.7845	324.9520	316.2976	348.9146	28.3901	33.2072	76.8
32	451.3887	448.9818	485.1928	495.0939	531.603	43.2549	50.5942	153.6
64	754.8973	730.8980	896.0194	896.5940	962.7105	78.3328	91.624	307.2
128	984.1308	983.8855	977.8357	980.4262	1056.7024	85.9867	100.5695	614.4
256	958.9995	959.8635	973.6647	978.5581	1050.718	85.4938	100	1229
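For reference, the derived columns can be recomputed from the measured values with a sketch like the following. The helper names gib_to_gb and percent_of are hypothetical; the only facts used are 1 GiB = 2^30 bytes and 1 GB = 10^9 bytes.

```c
/* Convert a bandwidth from GiB/s (2^30 bytes/s) to GB/s (10^9 bytes/s). */
double gib_to_gb(double gib_per_s)
{
    return gib_per_s * 1073741824.0 / 1.0e9;
}

/* Express value as a percentage of reference. */
double percent_of(double value, double reference)
{
    return 100.0 * value / reference;
}
```

For the row with vector length 64, the best kernel result of 896.5940 GiB/s converts to about 962.71 GB/s, which is roughly 78.3% of 1229 GB/s.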

I have validated these results over multiple runs of the benchmark; they vary only slightly. I am now left with the question of what causes them. The official documentation of the hardware and memory system gives no hint as to why this happens, and for my work I need to find the actual reason for this behaviour. So I have the following questions:

What causes this nonlinear relationship between memory bandwidth and vector length? There are probably hardware reasons for it.
And how can it be modeled realistically?

Thank you for your help

Posted by CPTSulu on 17 October 2022 at 10:17.
Edited by CPTSulu on 17 October 2022 at 10:21.
