Author Topic: V8 Optimized App  (Read 47558 times)

Offline RottenMutt

  • Knight o' The Realm
  • **
  • Posts: 100
Re: V8 Optimized App
« Reply #15 on: 11 Oct 2007, 03:10:53 am »
I was thinking that dual channel would bond two DIMMs to a single memory address for a 128-bit word, and I thought Sandra would report it that way...
and for the 5000X it would yield two 128-bit channels...

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: V8 Optimized App
« Reply #16 on: 11 Oct 2007, 05:34:48 am »
If they did that, that would make single channel 128 bits :D, we want dual 64 bit channels

If 1 memory address accessed 128 bits instead of 64, then I think the programs would have to change too [and the sockets would need more pins I think]. Yes, it is accessing 128 bits at a time.
Dual Channel = 2 separate 64 bit channels in parallel, doubling the bandwidth in the controller and giving an effective 128 bits per transfer. At a guess, I don't think gluing 64 bits from one channel to the 64 bits of the other channel, and synchronising the two channels, would be any more efficient than one channel + one channel, but then again I'm not a motherboard designer :D
« Last Edit: 11 Oct 2007, 08:00:57 am by j_groothu »

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: V8 Optimized App
« Reply #17 on: 11 Oct 2007, 05:50:49 am »
Anyway, are your paired modules in matching slots? And is dual channel mode [or must it be dual interleaved at that bandwidth?] set in the BIOS?

[It looks like you are running Dual Channel Interleaved correctly to me (if not, maybe check the BIOS). That ~12 GB/s total memory bandwidth looks like a lot more than DDR2 dual channel mode... but should you be higher?]
for PC2-5400 DDR2-SDRAM, roughly theoretical:
           
Quote
DDR2 single channel ~5 GB/s
DDR2 dual channel ~10 GB/s
DDR2 dual channel interleaved ~?? GB/s, dual channel + a bit extra
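
The rough figures above can be reproduced with a little arithmetic. This is only a sketch: it assumes PC2-5400 means DDR2-667 (about 667 million transfers per second) on a 64 bit channel, and it ignores FB-DIMM serial link overhead and anything interleaving adds. The function name and parameters are made up for illustration.

```python
def peak_bandwidth_gb_s(mega_transfers_per_sec, bus_width_bits=64, channels=1):
    """Theoretical peak bandwidth in GB/s (decimal GB), ignoring all overheads."""
    bytes_per_transfer = bus_width_bits // 8  # 64 bit channel -> 8 bytes per transfer
    return mega_transfers_per_sec * 1e6 * bytes_per_transfer * channels / 1e9


# Assumption: PC2-5400 is treated as DDR2-667 (667 MT/s).
single_channel = peak_bandwidth_gb_s(667)
dual_channel = peak_bandwidth_gb_s(667, channels=2)

print(f"single channel: ~{single_channel:.1f} GB/s")
print(f"dual channel:   ~{dual_channel:.1f} GB/s")
```

Two 64 bit channels at the same speed simply double the single channel figure, which is where the "~5" and "~10" come from.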

[Just looked at the manual for your mobo,  looks like you have them in the right sockets, and it wouldn't work any other way,  EEEK, they are all the same colour :o ]

Quote
Interleaved memory is supported when pairs of DIMM modules are
installed in both Branch 0 and Branch 1.
which you have, so I think it's good to go. [You'll want to check that the "Branch Mode" setting in the BIOS is set to "Interleave" though.] I would test with memtest86+ and look at the speed with different interleave ratio settings to find what's fastest (1:1, 1:4 etc.)

« Last Edit: 11 Oct 2007, 09:01:11 am by j_groothu »

Offline RottenMutt

  • Knight o' The Realm
  • **
  • Posts: 100
Re: V8 Optimized App
« Reply #18 on: 18 Oct 2007, 01:13:05 am »
new memory, but still not much better...
why are the everest memory reads slow?


[attachment deleted by admin]
« Last Edit: 18 Oct 2007, 01:19:35 am by RottenMutt »

Offline RottenMutt

  • Knight o' The Realm
  • **
  • Posts: 100
Re: V8 Optimized App
« Reply #19 on: 18 Oct 2007, 01:20:59 am »
Sandra memory benchmark

[attachment deleted by admin]

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: V8 Optimized App
« Reply #20 on: 18 Oct 2007, 02:15:07 am »
Hi again,
   Do the FB-DIMM RAM sticks have 9 chips (single rank) or 18 chips (dual rank) per stick? [This could be important] Either way you are beating the listed 5000 chipset reference machine. You might squeeze some more performance out by looking at the timings... [but remember that every benchmark program is different and only synthetic. I think they design them to defeat the RAM buffers, making FB-DIMM look slower, when in real applications it could be faster for some things]

I remember reading that the Mac Pros like to run fastest [low latency, but less than maximum bandwidth] with 4 x dual rank FB-DIMMs. They said that loading all 8 slots added some latency but gave more bandwidth.

NOTE the slower EVEREST benchmark shows you are running single channel! If that isn't an error in Everest, you need to fix that if you haven't already! The Sandra one looks OK so I can't see a reason for it. (Maybe check the BIOS anyway: dual channel, interleave, and the branch mode too I think.)

Have you maybe got Boinc running during the benchmark?

What does CPU-Z say? (CPU page and Memory page?)

Just remember the Opterons seem to be using consumer DDR2, not server FB-DIMMs [or could they be RAMBUS RIMMs? $$$ :o], and have an on-die memory controller. (That's for playing games isn't it? :P Maybe if seti could use all the memory bandwidth then there would be more Opterons in the top 100 computers list.)
« Last Edit: 18 Oct 2007, 07:17:43 am by j_groothu »

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: V8 Optimized App
« Reply #21 on: 18 Oct 2007, 03:28:29 am »
I just realised another possible reason for the slower Everest benchmark. It is possible that driving the memory hard gets you more correctable ECC errors, and the correction adds some latency I think. Maybe if you back off on the RAM a bit and check its cooling it might actually speed up some more. I think memtest86 [but don't know for sure] should show if ECC error correction is happening a lot.
« Last Edit: 18 Oct 2007, 04:13:32 am by j_groothu »

Offline RottenMutt

  • Knight o' The Realm
  • **
  • Posts: 100
Re: V8 Optimized App
« Reply #22 on: 18 Oct 2007, 09:35:39 pm »

Quote
It is possible driving the memory hard you get more correctable ECC Errors. the correction adds some latency I think.
the bios logs ecc errors and it hasn't logged any with the new memory. i'm wondering if the current bios has something wrong???

Quote
Do the FB-DIMM RAM sticks have 9 x chips (single rank), or 18 x chips (dual rank) per stick ?
dual rank i do believe

Quote
NOTE the slower EVEREST benchmark shows you are running single channel!
why do you say that?  i suspect it may be true.

Quote
Have you maybe got Boinc running during the benchmark?
no

Quote
What does CPU-Z say ?  (cpu pages and Memory Pages ?)
cpuz doesn't say much of anything

[attachment deleted by admin]

Offline RottenMutt

  • Knight o' The Realm
  • **
  • Posts: 100
Re: V8 Optimized App
« Reply #23 on: 18 Oct 2007, 09:36:40 pm »
memory tab

[attachment deleted by admin]

Offline RottenMutt

  • Knight o' The Realm
  • **
  • Posts: 100
Re: V8 Optimized App
« Reply #24 on: 18 Oct 2007, 09:40:16 pm »
Everest spd info shows 2 rank

[attachment deleted by admin]

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: V8 Optimized App
« Reply #25 on: 19 Oct 2007, 03:32:01 am »
Looks jolly good to me. The Everest benchmark you showed before (the one with the blue window) did say "Single Channel DDR2-760FB SDRAM", so I think either Everest was faulty or there was a wrong setting. I think it is just wrong, since the other things you show have no problem... If you suspect something about the BIOS I'd be checking anyway.

Everything I can find on different RAM configurations tested with Fully Buffered DIMMs on similar systems (like the Mac Pro) suggests that the way you have it (dual rank, 4 slots) will show less than maximum bandwidth, but much lower (faster) latency. That is good :D

Low latency is more important for small random memory accesses (like seti), and high bandwidth is more important for database servers and stuff (depending on what they store).

so what you have will be the fastest combination for workstation / crunching use I reckon.

I think adding more sticks would increase bandwidth but raise latency too, making that more suitable for a high capacity database server that accesses big blocks of contiguous data (less suitable for seti / workstation).

All my opinions to be taken with a grain of salt, until you've worked out what's best for you  ;)

If you can find a way to measure ECC correction events, like the Mac Pro has, then you can tighten the timings (latencies) until it squeals, then back off a bit. Of course that depends on how much control you have to start with.

Jason



Offline RottenMutt

  • Knight o' The Realm
  • **
  • Posts: 100
Re: V8 Optimized App
« Reply #26 on: 19 Oct 2007, 10:21:45 am »
Quote
Looks Jolly good to me.  The Everest benchmark you showed before , I think was faulty or there was a wrong setting, with a blue window did say "Single Channel DDR2-760FB SDRAM"...
Good catch, i totally missed it. I used to get 7000MB/s read in Everest, now i don't. I checked the bios and everything is set correctly, then i checked the DMI events and there were ECC errors :( I'm RMA'ing the board...

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: V8 Optimized App
« Reply #27 on: 19 Oct 2007, 07:02:34 pm »
Quote
I use to get 7000MB/s read in Everest, now i don't.  I checked the bios and everything is set correctly, then i checked the DMI events and there were ECC errors:(  I'm RMA'ing the board...
LOL, the old "When in doubt, chuck it out" methodology. It will be interesting to compare a benchmark of the new board against the data you have already, with everything else set the same. I have not seen the cost of ECC errors on speed measured anywhere, though they definitely affect reliability.

Was there any indication whether they were "hard uncorrectable" ECC errors? Or were they "ECC correction events"?

Gecko_R7

  • Guest
Re: V8 Optimized App
« Reply #28 on: 19 Oct 2007, 10:24:48 pm »

Quote
Fast Latency is more important for small random memory accesses (Like seti) , and high bandwidth is more important for database servers and stuff. (depending what they store )

I think adding more sticks would increase bandwidth but raise latency too, making that more suitable for a high capacity database server that accesses big blocks of continuous data. (less suitable for seti / workstation]

Jason

Noticed your comment regarding latency.

So, on a Q6600 Quad for example, Seti would respond better w/ DDR2-800 @ CL-3 than cranking to higher bandwidth, say DDR2-1200 but having to run CL5?

Is this right?

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: V8 Optimized App
« Reply #29 on: 19 Oct 2007, 11:11:15 pm »
Quote
Noticed your comment regarding latency.

So, on a Q6600 Quad for example, Seti would respond better w/ DDR2-800 @ CL-3 than cranking to higher bandwidth, say DDR2-1200 but having to run CL5?

Is this right?

Conceptually I guess yes. Practically it would depend on what you were using the machine for etc... Accessing small amounts of data spread randomly through RAM would benefit more from lower latency (a transfer starts sooner) than from improved bandwidth (with slower starting). That "probably" makes general sense for non-server RAM / mobos / buffered RAM too, but as I see quoted here often, "your mileage may vary".

That's why I have a problem with going by the bandwidth benchmarks as a performance guide. They would tend to use large blocks of data, similar to streaming database [video] content or something like that. And they don't seem to make much mention of latency (access startup time) at all.

The statement I made was really about a specific combination of fully buffered DIMMs on a motherboard similar to the Mac Pro's. As I understand it (which may be wrong), these use a memory branching structure with serial links to interleave extra slots (giving a total of eight slots). Roughly understood, when these are all populated, the latency on each pair is increased by one (1), giving a slower start to each access, but more bandwidth due to the interleaving structure.

With only four slots populated, in the correct combination, the latency remains at the original (fast value), but the bandwidth is less.

This may all be completely irrelevant for the Q6600, which uses ordinary dual channel [not FB-DIMMs or branch interleaving], so should give the same latency whichever slots are filled [provided dual channel slot matching is observed].

The conceptual argument/possibility of lower latency being preferred for seti type applications remains though (as you point out). [And you or I probably wouldn't be the first to suggest that backing off on the amount of, and bandwidth of, RAM but going for the fastest (lowest) latency might be a good idea for a workstation / cruncher]

Jason

[PS: As a guesstimate, if 3 cycle latency at 400 MHz (DDR2-800) is 7.5 ns, and 5 cycle latency at 600 MHz (DDR2-1200) is 8.33 ns, then a small access will start about 10% faster with the low latency DDR2-800.

So if I was just doing seti and checking my emails I'd go the ddr2-800 low latency,

If I was editing video, I'd go for the higher bandwidth ddr2-1200. ]
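
The guesstimate in the PS is just CAS cycles divided by the memory clock. A minimal sketch of that arithmetic, assuming CAS latency is the only delay that matters (real access time also includes other timings such as tRCD and tRP, which are ignored here); the function name is made up for illustration:

```python
def cas_latency_ns(cas_cycles, clock_mhz):
    """Time for the CAS latency alone, in nanoseconds."""
    return cas_cycles / clock_mhz * 1000.0  # cycles / MHz gives microseconds


ddr2_800_cl3 = cas_latency_ns(3, 400)   # DDR2-800 runs a 400 MHz clock
ddr2_1200_cl5 = cas_latency_ns(5, 600)  # DDR2-1200 runs a 600 MHz clock

# Fraction of start-up time saved by the low latency DDR2-800
saving = 1 - ddr2_800_cl3 / ddr2_1200_cl5

print(f"DDR2-800 CL3:  {ddr2_800_cl3:.2f} ns")
print(f"DDR2-1200 CL5: {ddr2_1200_cl5:.2f} ns")
print(f"DDR2-800 starts about {saving:.0%} sooner")
```

This only covers how soon a transfer starts. Once data is flowing, the DDR2-1200 moves it faster, which is why the choice depends on whether accesses are small and scattered or large and sequential.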

« Last Edit: 19 Oct 2007, 11:51:36 pm by j_groothu »
