Author Topic: V8 Optimized App (Read 78576 times)

RottenMutt · « **Reply #15 on:** 11 Oct 2007, 03:10:53 am »

I was thinking that dual channel would bond two DIMMs to a single memory address for a 128-bit word, and I thought Sandra would report it that way...
and that the 5000X would yield two 128-bit channels...

Jason G · « **Reply #16 on:** 11 Oct 2007, 05:34:48 am »

If they did that, it would make a single channel 128 bits wide; we want dual 64-bit channels.

If one memory address accessed 128 bits instead of 64, then I think the programs would have to change too [and the sockets would need more pins, I think]. Yes, it is accessing 128 bits at a time.
Dual channel = two separate 64-bit channels in parallel, doubling the bandwidth at the controller and giving an effective width of 128 bits. At a guess, I don't think gluing the 64 bits from one channel to the 64 bits of the other and synchronising the two channels would be any more efficient than one channel + one channel, but then again I'm not a motherboard designer.

Jason G · « **Reply #17 on:** 11 Oct 2007, 05:50:49 am »

Anyway, are your paired modules in matching slots? And is dual channel [or must it be dual interleaved at that bandwidth?] mode set in the BIOS?

[It looks to me like you are running dual channel interleaved correctly (maybe not; check the BIOS). That ~12 GB/s total memory bandwidth looks like a lot more than plain DDR2 dual channel mode... but should you be higher?]
For PC2-5400 DDR2 SDRAM, roughly theoretical:

Quote

DDR 2 single channel ~5 GB/s
DDR 2 dual channel ~10 GB/s
DDR 2 dual channel interleaved ~??GB/s, Dual channel + a bit extra
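As a rough check on the figures above, theoretical peak bandwidth is transfer rate times bus width times channel count. The sketch below assumes the "5400" in PC2-5400 is the per-module peak in MB/s (so 675 MT/s on a 64-bit bus), and it ignores FB-DIMM serial-link overhead and any interleaving gain. The function name is made up for illustration.

```python
def peak_bandwidth_gb_s(mega_transfers_per_s, bus_width_bits=64, channels=1):
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return mega_transfers_per_s * 1e6 * bytes_per_transfer * channels / 1e9


# Assumption: PC2-5400 means 5400 MB/s per module, i.e. 5400 / 8 = 675 MT/s.
rate_mt_s = 5400 / 8

single = peak_bandwidth_gb_s(rate_mt_s)              # one 64-bit channel
dual = peak_bandwidth_gb_s(rate_mt_s, channels=2)    # two 64-bit channels in parallel

print(f"single channel: {single:.1f} GB/s")
print(f"dual channel:   {dual:.1f} GB/s")
```

Under these assumptions the numbers land close to the ~5 GB/s and ~10 GB/s quoted above; measured benchmark results will normally be lower than the theoretical peak.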

[Just looked at the manual for your mobo. It looks like you have them in the right sockets, and it wouldn't work any other way. EEEK, they are all the same colour!]

Quote

Interleaved memory is supported when pairs of DIMM modules are
installed in both Branch 0 and Branch 1.

which you have, so I think it's good to go. [You'll want to check that the "Branch Mode" setting in the BIOS is set to "Interleave", though.] I would test with memtest86+ and look at the speed with different interleave ratio settings (1:1, 1:4, etc.) to find what's fastest.

RottenMutt · « **Reply #18 on:** 18 Oct 2007, 01:13:05 am »

New memory, but still not much better...
Why are the Everest memory reads slow?

[attachment deleted by admin]

RottenMutt · « **Reply #19 on:** 18 Oct 2007, 01:20:59 am »

Sandra memory benchmark

[attachment deleted by admin]

Jason G · « **Reply #20 on:** 18 Oct 2007, 02:15:07 am »

Hi again,
Do the FB-DIMM sticks have 9 chips (single rank) or 18 chips (dual rank) per stick? [This could be important.] Either way, you are beating the listed 5000 chipset reference machine. You might squeeze some more performance out by looking at the timings... [but remember that every benchmark program is different and only synthetic. I think they are designed to defeat the RAM buffers, making FB-DIMM look slower, when in real applications it could be faster for some things]

I remember reading that the Mac Pros like to run fastest [low latency, but less than maximum bandwidth] with 4 dual-rank FB-DIMMs. They said that loading all 8 slots added some latency but gave more bandwidth.

NOTE: the slower Everest benchmark shows you are running single channel! If that isn't an error in Everest, you need to fix it if you haven't already. The Sandra one looks OK, so I can't see a reason for it. (Maybe check the BIOS anyway: dual channel, interleave, and the branch mode too, I think.)

Have you maybe got Boinc running during the benchmark?

What does CPU-Z say ? (cpu pages and Memory Pages ?)

Just remember the Opterons seem to be using consumer DDR2, not server FB-DIMMs [or could they be RAMBUS RIMMs? $$$], and have an on-die memory controller. (That's for playing games, isn't it? Maybe if SETI could use all the memory bandwidth, there would be more Opterons in the top 100 computers list.)

Jason G · « **Reply #21 on:** 18 Oct 2007, 03:28:29 am »

I just realised another possible reason for the slower Everest benchmark. It is possible that driving the memory hard gives you more correctable ECC errors. The correction adds some latency, I think. Maybe if you back off on the RAM a bit and check its cooling, it might actually speed up some more. I think memtest86 [but I don't know for sure] should show whether ECC error correction is happening a lot.

RottenMutt · « **Reply #22 on:** 18 Oct 2007, 09:35:39 pm »

Quote from: j_groothu on 18 Oct 2007, 02:15:07 am

It is possible that driving the memory hard gives you more correctable ECC errors. The correction adds some latency, I think.

The BIOS logs ECC errors, and it hasn't logged any with the new memory. I'm wondering if there is something wrong with the current BIOS?

Quote

Do the FB-DIMM RAM sticks have 9 x chips (single rank), or 18 x chips (dual rank) per stick ?

dual rank i do believe

Quote

NOTE the slower EVEREST benchmark shows you are running single channel!

why do you say that? i suspect it may be true.

Quote

Have you maybe got Boinc running during the benchmark?

no

Quote

What does CPU-Z say ? (cpu pages and Memory Pages ?)

cpuz doesn't say much of anything

[attachment deleted by admin]

RottenMutt · « **Reply #23 on:** 18 Oct 2007, 09:36:40 pm »

memory tab

[attachment deleted by admin]

RottenMutt · « **Reply #24 on:** 18 Oct 2007, 09:40:16 pm »

Everest spd info shows 2 rank

[attachment deleted by admin]

Jason G · « **Reply #25 on:** 19 Oct 2007, 03:32:01 am »

Looks jolly good to me. I think the Everest benchmark you showed before was faulty, or there was a wrong setting: the blue window did say "Single Channel DDR2-760FB SDRAM". So I think it is just wrong; the other things you show are no problem. If you suspect something about the BIOS, I'd check it anyway.

Everything I can find on different RAM configurations tested with fully buffered DIMMs on similar systems (like the Mac Pro) suggests that the way you have it (dual rank, 4 slots) will show less than maximum bandwidth, but much lower (faster) latency. That is good.

Low latency is more important for small random memory accesses (like SETI), and high bandwidth is more important for database servers and such (depending on what they store).

So what you have will be the fastest combination for workstation / crunching use, I reckon.

I think adding more sticks would increase bandwidth but raise latency too, making that more suitable for a high-capacity database server that accesses big blocks of contiguous data (and less suitable for SETI / workstation use).

All my opinions to be taken with a grain of salt, until you've worked out what's best for you

If you can find a way to measure ECC correction events, like the Mac Pro has, then you can tighten the timings (latencies) until it squeals, then back off a bit. Of course, that depends on how much control you have to start with.

Jason

RottenMutt · « **Reply #26 on:** 19 Oct 2007, 10:21:45 am »

Quote from: j_groothu on 19 Oct 2007, 03:32:01 am

Looks jolly good to me. I think the Everest benchmark you showed before was faulty, or there was a wrong setting: the blue window did say "Single Channel DDR2-760FB SDRAM"...

Good catch, I totally missed it. I used to get 7000 MB/s read in Everest; now I don't. I checked the BIOS and everything is set correctly, then I checked the DMI events and there were ECC errors :( I'm RMA'ing the board...

Jason G · « **Reply #27 on:** 19 Oct 2007, 07:02:34 pm »

Quote from: RottenMutt on 19 Oct 2007, 10:21:45 am

I used to get 7000 MB/s read in Everest; now I don't. I checked the BIOS and everything is set correctly, then I checked the DMI events and there were ECC errors :( I'm RMA'ing the board...

LOL, the old "when in doubt, chuck it out" methodology. It will be interesting to compare a benchmark of the new board against the data you already have, with everything else set the same. I have not seen the cost of ECC errors on speed measured anywhere, though the cost to reliability is definite.

Was there any indication whether they were "hard uncorrectable" ECC errors? Or were they "ECC correction events"?

Gecko_R7 · « **Reply #28 on:** 19 Oct 2007, 10:24:48 pm »

Quote from: j_groothu on 19 Oct 2007, 03:32:01 am

Low latency is more important for small random memory accesses (like SETI), and high bandwidth is more important for database servers and such (depending on what they store).

I think adding more sticks would increase bandwidth but raise latency too, making that more suitable for a high-capacity database server that accesses big blocks of contiguous data (and less suitable for SETI / workstation use).

Jason

Noticed your comment regarding latency.

So, on a Q6600 Quad for example, Seti would respond better w/ DDR2-800 @ CL-3 than cranking to higher bandwidth, say DDR2-1200 but having to run CL5?

Is this right?

Jason G · « **Reply #29 on:** 19 Oct 2007, 11:11:15 pm »

Quote from: Gecko_R7 on 19 Oct 2007, 10:24:48 pm

Noticed your comment regarding latency.

So, on a Q6600 Quad for example, Seti would respond better w/ DDR2-800 @ CL-3 than cranking to higher bandwidth, say DDR2-1200 but having to run CL5?

Is this right?

Conceptually, I guess yes. Practically, it would depend on what you were using the machine for. Accessing small amounts of data spread randomly through RAM would benefit more from lower latency (a fast start to each transfer) than from improved bandwidth (with a slower start). That "probably" makes general sense for non-server RAM / mobos / buffered RAM too, but as I often see quoted here, "your mileage may vary".

That's why I have a problem with going by the bandwidth benchmarks as a performance guide. They would tend to use large blocks of data similar to streaming database [video] content or something like that. And these don't seem to make much mention of latency (access startup time) at all.
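One way to picture the trade-off is a simple model where an access costs a fixed start-up latency plus size divided by bandwidth. The sketch below is only that model, not a measurement: both configurations and all four numbers are hypothetical placeholders, and real memory systems (caches, prefetching, interleaving) behave in more complicated ways.

```python
def access_time_ns(size_bytes, latency_ns, bandwidth_gb_s):
    """Time for one access: start-up latency plus transfer time.

    1 GB/s equals 1 byte per nanosecond, so size / bandwidth is already in ns.
    """
    return latency_ns + size_bytes / bandwidth_gb_s


def break_even_bytes(lat_a, bw_a, lat_b, bw_b):
    """Access size at which config A (lower latency, lower bandwidth)
    and config B (higher latency, higher bandwidth) take the same time."""
    return (lat_b - lat_a) / (1 / bw_a - 1 / bw_b)


# Hypothetical numbers, chosen only to illustrate the shape of the trade-off.
LOW_LATENCY = (60.0, 5.0)      # (latency in ns, bandwidth in GB/s)
HIGH_BANDWIDTH = (70.0, 10.0)

crossover = break_even_bytes(*LOW_LATENCY, *HIGH_BANDWIDTH)
print(f"break-even access size: {crossover:.0f} bytes")

for size in (64, 1024 * 1024):
    t_low = access_time_ns(size, *LOW_LATENCY)
    t_high = access_time_ns(size, *HIGH_BANDWIDTH)
    winner = "low latency" if t_low < t_high else "high bandwidth"
    print(f"{size:>8} bytes: {t_low:.1f} ns vs {t_high:.1f} ns -> {winner}")
```

Below the break-even size the low-latency setup finishes first; above it the high-bandwidth setup does. Where that point falls depends entirely on the real latency and bandwidth figures, which is why a bandwidth-only benchmark tells only part of the story.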

The statement I made was really about a specific combination of fully buffered DIMMs on a motherboard similar to the Mac Pro's. As I understand it (and I may be wrong), these use a memory branching structure with serial links to interleave extra slots (giving a total of eight slots). Roughly, when these are all populated, the latency on each pair is increased by one (1), giving a slower start to each access but more bandwidth due to the interleaving structure.

With only four slots populated, in the correct combination, the latency remains at the original (fast) value, but the bandwidth is less.

This may all be completely irrelevant for the Q6600, which uses ordinary dual channel [not FB-DIMMs or branch interleaving], so it should give the same latency whichever slots are filled [provided dual channel slot matching is observed].

The conceptual argument/possibility of lower latency being preferred for seti type applications remains though (as you point out). [And you or I probably wouldn't be the first to suggest that backing off on amount of, and bandwidth of, ram but going for fastest(lowest) latency, might be a good idea for a workstation / cruncher]

Jason

[PS: As a guesstimate, if 3-cycle latency at 400 MHz (DDR2-800) is 7.5 ns, and 5-cycle latency at 600 MHz (DDR2-1200) is 8.33 ns, then a small access will start about 10% faster with the low-latency DDR2-800.

So if I was just doing SETI and checking my emails, I'd go for the low-latency DDR2-800.

If I was editing video, I'd go for the higher-bandwidth DDR2-1200.]
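The arithmetic in the PS can be reproduced in a few lines. This is a minimal sketch that only converts CAS cycles to nanoseconds at the memory clock (half the DDR2 data rate); it ignores the other timings (tRCD, tRP and so on), and the function name is made up for illustration.

```python
def cas_latency_ns(cas_cycles, clock_mhz):
    """CAS latency in nanoseconds: cycles divided by clock frequency."""
    return cas_cycles / clock_mhz * 1000.0


ddr2_800_cl3 = cas_latency_ns(3, 400)    # DDR2-800 runs a 400 MHz clock
ddr2_1200_cl5 = cas_latency_ns(5, 600)   # DDR2-1200 runs a 600 MHz clock

# Fraction of start-up time saved by the low-latency option.
saving = 1 - ddr2_800_cl3 / ddr2_1200_cl5

print(f"DDR2-800 CL3:  {ddr2_800_cl3:.2f} ns")
print(f"DDR2-1200 CL5: {ddr2_1200_cl5:.2f} ns")
print(f"CL3 starts about {saving:.0%} sooner")
```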
