Author Topic: Single vs Dual memory channel - no effect ? (Read 12114 times)

Raistmer · « **on:** 14 Nov 2009, 06:34:45 pm »

I tested a quad-core fully loaded with CPU AstroPulse (AP) tasks, once with a single memory module installed (2 GB) and once in a dual-channel configuration (4 GB).
There seems to be almost no difference. AstroPulse should be very L2-cache hungry, so cache misses (and the resulting delays from memory accesses) should be fairly frequent when all cores run AstroPulse. Nevertheless, I see no benefit from the dual-channel config...

Any thoughts?

Richard Haselgrove · « **Reply #1 on:** 14 Nov 2009, 07:37:36 pm »

Try it again with CPU MB VHAR (very high angle range) tasks; then I think you'll see a difference!

I think all you've proved is that AP does too little memory transfer to saturate even a single-channel bus.

_heinz · « **Reply #2 on:** 14 Nov 2009, 08:23:31 pm »

Hi Raistmer,
keep in mind that which mode you get (single or dual channel) depends on filling all two or four slots.
Remember, on my Xeon I first filled only two slots and got 6000; after filling all 4 slots I got 12000 throughput.
With 2 modules the quad-channel controller still works as dual-channel.
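As a back-of-envelope check on why populated channels matter, theoretical peak bandwidth is roughly transfer rate × bus width × number of active channels. This is a minimal sketch, assuming DDR2-800 and the standard 64-bit (8-byte) DIMM channel; the module speed is an assumption, since the thread does not say what memory either machine uses, and the units of the 6000/12000 figures above are not stated.

```python
def peak_bandwidth_mb_s(mega_transfers_per_s: float, channels: int, bus_bytes: int = 8) -> float:
    """Theoretical peak memory bandwidth in MB/s (1 MB = 10**6 bytes).

    Assumes each channel is a standard 64-bit (8-byte) DIMM channel.
    Measured benchmark numbers always land somewhat below this ceiling.
    """
    return mega_transfers_per_s * bus_bytes * channels


# Assumption: DDR2-800 (800 MT/s); the thread does not name the memory speed.
single = peak_bandwidth_mb_s(800, channels=1)
dual = peak_bandwidth_mb_s(800, channels=2)
print(f"single channel: {single:.0f} MB/s, dual channel: {dual:.0f} MB/s")
```

Doubling the active channels doubles the ceiling, which fits the roughly 2× jump heinz reports. Whether an application actually benefits depends on whether it ever comes close to the single-channel ceiling in the first place.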

Raistmer · « **Reply #3 on:** 14 Nov 2009, 08:34:27 pm »

I have a dual-channel motherboard AFAIK, not quad-channel: Gigabyte GA-Q35M-S2.

Jason G · « **Reply #4 on:** 15 Nov 2009, 12:29:00 am »

Another good test would be to see if relaxing the memory timings completes the picture (lengthening the elapsed time or not).

IMO, what makes our 45 nm chips 'difficult' is their intelligent prefetchers. If the page-table entries for the dataset are still resident in the TLB, then memory fetches for single or dual channel will be bound by the prefetcher speed itself rather than by the memory subsystem directly.

For our 45 nm Core 2s we really have an obscenely large cache relative to the dataset size, and we are accessing fairly linear, medium-sized datasets, so we should really be 'mostly' L2-bound for a single task. The dual channel is presumably interleaved, so it should halve the effective latency into L2. To my mind, improved caching accounts for the large difference between our chips and earlier Intel implementations. It's the hyperthreaded P4s and other smaller-cache designs where thrashing becomes a problem.
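To put "large cache relative to the dataset" into numbers, one can compare each task's share of L2 with its working set. This is a minimal sketch, assuming a 45 nm Core 2 Quad built from two dual-core dies, each with its own L2 shared by its two cores. The 6 MiB per-die figure and the working-set sizes are hypothetical placeholders, not measurements of AstroPulse.

```python
MIB = 1024 * 1024


def l2_share_per_task(l2_bytes_per_die: int, cores_per_die: int) -> float:
    """Bytes of shared L2 available to each task when every core runs one task."""
    return l2_bytes_per_die / cores_per_die


def fits_in_l2(working_set_bytes: int, l2_bytes_per_die: int, cores_per_die: int = 2) -> bool:
    """True if one task's working set fits in its share of the die's L2.

    Ignores code footprint, OS activity and associativity conflicts, so this
    is an optimistic first check rather than a guarantee of no misses.
    """
    return working_set_bytes <= l2_share_per_task(l2_bytes_per_die, cores_per_die)


# Hypothetical: 6 MiB L2 per die, 2 cores per die -> 3 MiB per task.
share = l2_share_per_task(6 * MIB, 2)
print(share / MIB)                    # 3.0
print(fits_in_l2(2 * MIB, 6 * MIB))   # True
print(fits_in_l2(4 * MIB, 6 * MIB))   # False
```

If the per-task working set sits comfortably inside its share, most accesses are served from L2 and the channel count barely shows up in elapsed time, which is the situation described above.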

So I think you've proved the application matches our chips better than older ones (dual channel or not).

A good indicator of latency (as opposed to bandwidth) is elapsed-time variation, since you can get 'lucky' some of the time.
Dual channel, while offering better prefetchability, unfortunately doesn't add more 'read/write ports' to the CPU cores. For single channel I make the min-to-max elapsed variation 3079 - 3044.5 = 34.5 seconds (a bit over 1%; doesn't sound like much, does it?).

But contrast this with the dual-channel elapsed variation:
3052.5 - 3031 = 21.5 seconds

So a full third of the elapsed-time variation has been removed (13 of 34.5 seconds, about 38%). Still doesn't sound like much, *except* that this can add up to a big difference over a long run.

That, IMO, is where your numbers support a dual-channel benefit: in reducing the 'worst case' scenario, not the 'best case' (or even the average case much).
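The arithmetic above can be checked in a few lines. This sketch only reuses the min/max elapsed times quoted in the post. Note that it measures spread (max minus min), a crude dispersion measure that a single outlier run can dominate.

```python
def spread(min_s: float, max_s: float) -> float:
    """Min-to-max elapsed-time variation, in seconds."""
    return max_s - min_s


# Min and max elapsed times (seconds) quoted above.
single = spread(3044.5, 3079.0)   # 34.5 s
dual = spread(3031.0, 3052.5)     # 21.5 s

relative_single = single / 3044.5      # spread as a fraction of the fastest run
removed = (single - dual) / single     # fraction of the spread eliminated

print(f"single-channel spread: {single:.1f} s ({relative_single:.2%})")
print(f"dual-channel spread:   {dual:.1f} s")
print(f"spread removed:        {removed:.1%}")
```

With more runs per configuration, a standard deviation or a high percentile would be a sturdier "worst case" indicator than the raw range.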

Raistmer · « **Reply #5 on:** 15 Nov 2009, 03:59:00 am »

Quote

That IMO, is where the Dual channel benefit is supported by your numbers, in reducing the 'Worst case' scenario

Unfortunately, it seems there is another factor that can cause slowdown.
[That is, I think I just haven't seen the worst case for dual channel yet, because I rebooted the OS recently.]
I've seen speed degradation with the hybrid build when the system has been up for a long time. There the effect is pretty big, but it seems to be present for the CPU-only app too.
So I would not trust timings that were received without an OS reboot (or I would count them separately from "just rebooted" ones).
Why exactly I see such speed degradation is another (and pretty important) question...
[ ADDON: Windows memory pool becomes fragmented??? ]
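Counting "just rebooted" timings separately is easy to automate if each run's OS uptime is logged. A minimal sketch, assuming a hypothetical record layout where each run is an `(elapsed_seconds, uptime_hours_at_start)` pair; the 2-hour cut-off is an arbitrary placeholder, not a value from the thread.

```python
from statistics import mean


def split_by_uptime(runs, fresh_hours=2.0):
    """Split (elapsed_s, uptime_h) pairs into 'just rebooted' and 'long uptime' groups.

    `fresh_hours` is a hypothetical cut-off; choose one that suits your own logs.
    """
    fresh = [elapsed for elapsed, uptime in runs if uptime <= fresh_hours]
    stale = [elapsed for elapsed, uptime in runs if uptime > fresh_hours]
    return fresh, stale


def summarize(elapsed):
    """Count, mean and min-to-max spread of a group of elapsed times (seconds)."""
    if not elapsed:
        return None
    return {"n": len(elapsed), "mean": mean(elapsed), "spread": max(elapsed) - min(elapsed)}
```

Usage: `fresh, stale = split_by_uptime(runs)`, then compare `summarize(fresh)` with `summarize(stale)` for each memory configuration, so that an uptime slowdown is not mistaken for a single- vs dual-channel effect.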

And yes, it seems CPU AP is not very sensitive to memory speed directly (on my CPU). I'm trying to study which system components have a big impact on SETI performance and which don't.
For now it seems (at least for AP) that one can save money and buy ordinary memory rather than the best possible, provided one manages to get a CPU with a really big cache.

Today I will complete the experiment Richard proposed and post new data.

Jason G · « **Reply #6 on:** 15 Nov 2009, 05:08:33 am »

Quote from: Raistmer on 15 Nov 2009, 03:59:00 am

...
Why exactly I see such speed degradation - it's another (and pretty important) question...
[ ADDON: windows memory pool becomes fragmented??? ]
...

Hmmm, I don't really see that with the CPU app and long uptime, so I will give it some thought. I think Windows' aggressive memory management might be related (its VMM gets less aggressive with successive Windows versions), so heap management accumulating stale crap could be an issue. That is why I was originally trying to bypass a layer by avoiding the CRT and using the VMM directly instead.
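The "avoid the CRT, use the VMM directly" idea can be illustrated loosely in Python. In C on Windows this would mean calling `VirtualAlloc` instead of `malloc`; the sketch below swaps in the standard `mmap` module's anonymous mapping, which likewise takes whole pages straight from the OS virtual memory manager rather than from a runtime heap. It is an analogy under that assumption, not how the SETI apps allocate memory, and `os_backed_buffer` is a hypothetical helper name.

```python
import mmap

PAGE = mmap.PAGESIZE  # OS page size, e.g. 4096 bytes on x86


def os_backed_buffer(nbytes: int) -> mmap.mmap:
    """Allocate an anonymous, page-aligned buffer directly from the OS VM manager.

    The size is rounded up to whole pages. Calling .close() unmaps the pages
    and returns them to the OS instead of leaving them on a heap free list.
    """
    size = -(-nbytes // PAGE) * PAGE  # ceiling division to a page multiple
    return mmap.mmap(-1, size)        # fileno -1 = anonymous mapping (Windows and Unix)


buf = os_backed_buffer(10_000)
buf[0:4] = b"SETI"
print(len(buf), buf[0:4])
buf.close()
```

The trade-off is granularity: every allocation costs at least one page and a system call, so this only pays off for large, long-lived buffers where heap fragmentation would otherwise accumulate.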

Just a theory: TLB misses for code (rather than data), caused by accumulated entries, could be exacerbating front-end stalls. IMO these would indeed get worse after longer uptime, because later Windows versions page out less aggressively and so accumulate more stale entries for rarely accessed services, drivers, etc. Over Christmas I will be putting in some special performance counters to see whether we're getting certain kinds of decode stalls that are prone to happen on Core 2; if not, I will keep looking for the culprit. If front-end stalls are a problem (OS/uptime-induced or otherwise), then I might need to do a bit more reading on how to rectify the situation. Not sure what part new OS features like SuperFetch might play, but I will have a chance to look in a couple of weeks, when migrating to Win7.

Jason

Raistmer · « **Reply #7 on:** 15 Nov 2009, 06:35:37 am »

In my current config the swap file is enabled (usually I disable it completely if the host has enough memory) and SuperFetch is stopped (set to manual instead of automatic).
