Single vs. dual memory channel: no effect?
Raistmer:
I tested a quad-core fully loaded with CPU AstroPulse (AP) tasks, first with a single memory module installed (2 GB), then in a dual-channel configuration (4 GB).
It seems there is almost no difference. AstroPulse should be very L2-cache hungry, so cache misses (and the delays from the resulting memory accesses) should be fairly frequent when all cores are running AstroPulse. Nevertheless, I see no benefit from the dual-channel configuration...
Any thoughts?
Richard Haselgrove:
Try it again with CPU MultiBeam (MB) VHAR tasks; then I think you'll see a difference!
I think all you've proved is that AP does too little memory transfer to saturate even a single-channel bus.
_heinz:
Hi Raistmer,
Keep in mind that you have to fill all two or four slots to get dual- or quad-channel operation.
Remember, on my Xeon I first filled only two slots and got 6000; after filling all four slots I got 12000 throughput.
With only two modules, the quad-channel controller still works as dual-channel.
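As a rough illustration of why populating more channels can double throughput, theoretical peak bandwidth scales linearly with the number of active channels. This is a minimal sketch, assuming 64-bit (8-byte) channels and using DDR2-800 purely as a hypothetical module speed; it gives a ceiling, not a measurement, so a workload that never approaches it will see little difference.

```python
# Theoretical peak DRAM bandwidth = channels x bus width x transfer rate.
# Assumption: 64-bit (8-byte) channels, as on typical DDR2/DDR3 platforms.
BYTES_PER_TRANSFER = 8  # 64-bit channel width


def peak_bandwidth_gbs(data_rate_mt: float, channels: int) -> float:
    """Return theoretical peak bandwidth in GB/s (10**9 bytes per second).

    data_rate_mt is the module's transfer rate in MT/s (e.g. 800 for DDR2-800).
    """
    return data_rate_mt * 1e6 * BYTES_PER_TRANSFER * channels / 1e9


if __name__ == "__main__":
    # Hypothetical example: DDR2-800 modules in 1, 2 and 4 channels.
    for ch in (1, 2, 4):
        print(f"{ch} channel(s): {peak_bandwidth_gbs(800, ch):.1f} GB/s")
```

Going from two to four channels doubles the ceiling, which matches the 6000 to 12000 jump in shape, though not necessarily in units.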
Raistmer:
AFAIK I have a dual-channel motherboard, not a quad-channel one:
Gigabyte GA-Q35M-S2
Jason G:
Another good test would be to see whether relaxing the memory timings completes the picture (i.e., whether it lengthens the elapsed time or not).
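To judge how much "relaxing the timings" actually changes latency, it helps to convert CAS cycles into nanoseconds. A minimal sketch, assuming DDR-type memory (two transfers per I/O clock) and using DDR2-800 at CL5 vs. CL6 purely as hypothetical settings; only CAS latency is modelled here, not tRCD, tRP or controller overhead.

```python
def cas_latency_ns(cas_cycles: int, data_rate_mt: float) -> float:
    """First-word CAS latency in nanoseconds.

    DDR transfers twice per I/O clock, so the clock period in ns is
    2000 / data_rate (MT/s); the latency is CAS cycles times that period.
    """
    return cas_cycles * 2000.0 / data_rate_mt


if __name__ == "__main__":
    # Hypothetical example: DDR2-800 at CL5 vs. a relaxed CL6.
    tight = cas_latency_ns(5, 800)
    relaxed = cas_latency_ns(6, 800)
    print(f"CL5: {tight:.1f} ns, CL6: {relaxed:.1f} ns")
```

If elapsed times barely move when latency grows by a few nanoseconds per access, that would support the idea that the caches and prefetchers are hiding most of the memory latency.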
IMO, what makes our 45 nm chips 'difficult' is their intelligent prefetchers. If the TLB entries for the dataset are still resident in the TLB, then memory fetches with single or dual channel will be bound by the prefetcher speed itself rather than by the memory subsystem directly.
Our 45 nm Core 2s have an obscenely large cache relative to the dataset size, and we are accessing fairly linear, medium-sized datasets, so for a single task we should be 'mostly' L2-bound. Dual channel is presumably interleaved, so it should halve the effective latency of fills into L2. To my mind, this improved caching accounts for the large difference between our chips and earlier Intel implementations. It's the hyperthreaded P4s and other smaller-cache designs where thrashing becomes a problem.
So I think you've proved that the application suits our chips better than older ones (dual channel or not).
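The interleaving assumption can be pictured with a toy model: if consecutive cache lines alternate between channels, a linear scan keeps both channels busy and each one services only half of the line fills. This is a sketch under that assumption only; the 64-byte line size and line-granularity interleave are illustrative, and real memory controllers may interleave at other granularities or hash addresses.

```python
from collections import Counter

# Hypothetical interleaving model: consecutive 64-byte cache lines
# alternate between channels (assumption, not a documented Q35 mapping).
LINE_SIZE = 64  # bytes per cache line (assumption)


def channel_for(addr: int, n_channels: int) -> int:
    """Map a physical address to a channel by its cache-line index."""
    return (addr // LINE_SIZE) % n_channels


def channel_load(start: int, length: int, n_channels: int) -> Counter:
    """Count cache lines sent to each channel for a linear scan of `length` bytes."""
    return Counter(
        channel_for(addr, n_channels)
        for addr in range(start, start + length, LINE_SIZE)
    )


if __name__ == "__main__":
    # A linear 1 MiB scan splits evenly across two channels.
    print(channel_load(0, 1 << 20, 2))
```

Under this model a 1 MiB scan touches 16384 lines, 8192 per channel; with a single channel, all 16384 go through one path.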
A good indicator of latency (as opposed to bandwidth) is the variation in elapsed time, since you can get 'lucky' some of the time.
Dual channel, while offering better prefetchability, unfortunately doesn't add more 'read/write ports' to the CPU cores. For single channel, I make the min-to-max elapsed-time variation 3079 - 3044.5 = 34.5 seconds (a bit over 1%; doesn't sound like much, does it?)
But contrast this with the dual-channel elapsed-time variation: 3052.5 - 3031 = 21.5 seconds.
So more than a third (13 of 34.5 seconds) of the elapsed-time variation has been removed. Still doesn't sound like much, *except* that this can add up to a big difference over a long run.
That, IMO, is where the dual-channel benefit is supported by your numbers: in reducing the 'worst case' scenario, not the 'best case' (or even the average case much).
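The spread arithmetic can be checked with a few lines. This sketch uses only the min and max elapsed times quoted in the thread (the individual run times aren't reproduced here), so the helper names are illustrative rather than part of any tool.

```python
def spread(times: list[float]) -> float:
    """Min-to-max spread of elapsed times, in the same units (seconds)."""
    return max(times) - min(times)


def spread_reduction(before: list[float], after: list[float]) -> float:
    """Fraction of the 'before' spread removed in the 'after' runs."""
    return 1.0 - spread(after) / spread(before)


if __name__ == "__main__":
    # Only the min and max elapsed times quoted in the thread are used.
    single = [3044.5, 3079.0]
    dual = [3031.0, 3052.5]
    print(f"single: {spread(single):.1f} s "
          f"({spread(single) / min(single):.2%} of the fastest run)")
    print(f"dual:   {spread(dual):.1f} s")
    print(f"spread removed: {spread_reduction(single, dual):.1%}")
```

The spread drops from 34.5 s to 21.5 s, a reduction of about 38%, slightly more than one third; the best-case times differ by only 13.5 s, which is why the gain shows up mainly in the worst case.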