Seti@Home optimized science apps and information

Optimized Seti@Home apps => Windows => Topic started by: RottenMutt on 26 Sep 2007, 09:42:19 pm

Title: V8 Optimized App
Post by: RottenMutt on 26 Sep 2007, 09:42:19 pm
The dual quad Macs are kicking my a$$. :-[  How about a V8 Optimized app?

The best I'm (http://setiathome.berkeley.edu//hosts_user.php?userid=4167&show_all=0&sort=expavg_credit) able to do is about 13th among the top computers.  Even though I'm running dual 5320s OC'd to 2.625GHz, the extra FSB (375MHz) puts my BOINC benchmarks higher than the 5365 Macs.

Please Help ME

edit:
I have a SuperMicro X7da3+ running two OC'd 5320 at 2.625GHz (375MHz FSB), 4 sticks of Crucial 2GB PC2-5300 FB-DDR2 Memory in every other slot.

this link is my computer, V8 5320 @2.625GHZ, RAC4500: http://setiathome.berkeley.edu/show_host_detail.php?hostid=2921877
this link is adream's V8 5365 @3.0GHz, RAC7000: http://setiathome.berkeley.edu/show_host_detail.php?hostid=3284384

Maybe the Macs are OC'ing; if so they should be 1.3 times faster than me, but still.
I originally started with a SM X7DA8 and mismatched RAM, and replaced it all only to find out it didn't make any difference. :(  Now I have two machines running, one with 5120s and another with 5320s, all with Crucial FB-DDR2.
Title: Re: V8 Optimized App
Post by: RottenMutt on 26 Sep 2007, 09:57:35 pm
I've just loaded Crunch3r's app

http://calbe.dw70.de/seti.html
Title: Re: V8 Optimized App
Post by: michael37 on 26 Sep 2007, 11:04:23 pm
First of all, impressive job with overclocking.  Wow.

Second, you are using the best application for your computer -- the SSSE3 64-bit app.  I couldn't figure out if you are running 2.4 or 2.4V, but the latter is a tad faster.  Upgrade if necessary.

I have no experience with V8 (dual quad-cores), but I have quite a bit of experience with dual dual-core 5120s and 5160s.  I think they are simply slower than their Q6000 counterparts.  I don't know why, but it has been reported on the main boards.  Maybe an inferior chipset and memory controller?  Though I see that your Q6700 is slower.

Lastly, I looked at completion times for your workunits on the E5320 and they are not impressive.  How are you cooling that monster?  It has got to produce an insane amount of heat overclocked.  Make sure it doesn't throttle down your CPUs.  Also, did you take care of the memory speed?



Title: Re: V8 Optimized App
Post by: RottenMutt on 27 Sep 2007, 01:32:41 am
I've water cooled the 5320s; they run around 36-45 degrees C, and the memory about 60-65C.  I seem to be limited by either the graphics card or the PCI-E bus/the NB not running locked, as I can hit 390 but not for long, with BSODs which reference the graphics.
I just downloaded 2.4V, which I believe to be Crunch3r's app.
The Macs are built on the same 5000X chipset so I don't think it's a hardware limitation.  But I do hear them talk about a V8 optimized app; I have no clue how that could or would be accomplished...

I've been running around a 4200rac on the OC q6700's (4x3.35GHz), compared to 4900rac on the 5120's (8x2.6GHz) using 2.4 off the main page.
Title: Re: V8 Optimized App
Post by: michael37 on 27 Sep 2007, 01:48:41 am
Mac application is not "V8" optimized.  It's just SSSE3 + MacOS optimized -- they still run 8 copies of the application, same as Windows or Linux users.

Graphics card has nothing to do with the Seti performance.  Neither does PCI-E bus.  Seti app performs no PCI-E IO.  Boinc app does a little -- writing files and doing network traffic, but that's really minor and doesn't define the performance.  What's your memory speed?  What's your memory rating?  I am beginning to suspect your RAM.



Title: Re: V8 Optimized App
Post by: michael37 on 27 Sep 2007, 01:56:38 am
OK let's compare two workunits, one by my 5150 and one by your E5320.  Something is not OK with your machine dropping your performance by 50%!  And we are using the same app.

Workunits are nearly identical: 0.4AR, 54.1 credit.

Your E5320:
Code: [Select]
CPU time 9918.890625

My 5150:
Code: [Select]
CPU time 6720.125

Those Macs' per-workunit timings are much closer to my computer's than to yours.

Your workunit:  http://setiathome.berkeley.edu//result.php?resultid=614751499

My workunit: http://setiathome.berkeley.edu/result.php?resultid=620964727

Mac workunit: http://setiathome.berkeley.edu/result.php?resultid=588774781
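
As a rough check on the two CPU times quoted above, here is a minimal Python sketch. The variable names are illustrative only, and it assumes the two workunits really are comparable (same AR and credit, as stated). It suggests the E5320 needs roughly 48% more CPU time per workunit, which works out to about a third less per-core throughput rather than a full 50% drop.

```python
# CPU times in seconds, copied from the two results quoted in this post.
e5320_cpu_seconds = 9918.890625
x5150_cpu_seconds = 6720.125

# How many times longer the E5320 took on a near-identical workunit.
slowdown = e5320_cpu_seconds / x5150_cpu_seconds

# Extra CPU time per workunit, in percent.
extra_time_pct = (slowdown - 1.0) * 100.0

# Reduction in workunits completed per core-hour, in percent.
throughput_loss_pct = (1.0 - 1.0 / slowdown) * 100.0

print(f"E5320 takes {slowdown:.2f}x as long ({extra_time_pct:.0f}% more CPU time)")
print(f"Per-core throughput is about {throughput_loss_pct:.0f}% lower")
```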
Title: Re: V8 Optimized App
Post by: Richard Haselgrove on 27 Sep 2007, 11:18:30 am

The Macs are built on the same 5000X chipset so I don't think it's a hardware limitation.  But I do hear them talk about a V8 optimised app; I have no clue how that could or would be accomplished...


There's nothing magical about the Mac V8 app - it's just conventional that the next release after version 7, is version 8.... :o

If you're just starting to get to grips with the E5320 / x5000 architecture, you might like to review this thread from almost a year ago: RAM it (almost certainly) is.

http://setiathome.berkeley.edu/forum_thread.php?id=36040&nowrap=true#486827
Title: Re: V8 Optimized App
Post by: RottenMutt on 27 Sep 2007, 07:26:06 pm
Thanks for the help so far.  I agree my work unit times seem way off and I don't know why.

Motherboard is SuperMicro X7DA8 (http://www.supermicro.com/products/motherboard/Xeon1333/5000X/X7DA8.cfm), Memory is four 1GB Crucial FB 666MHz Dimms populated in quad channel mode.  My memory temps are around 60 to 65 degrees C.

Pictures of my system on 2CPU.com (http://forums.2cpu.com/showpost.php?p=703470&postcount=931)

The discussion on graphics and pci-e bus has to do with system stability while over-clocking and nothing to do with seti.

edit: I've since switched out the mobo to a SM x7da3+ and tried new ram, 4 sticks of crucial pc2-5300 (333MHz) FB-DDR2, to find it had no effect.  I did determine the X7DA8 has the correct revision chipset to run Quad Cores despite SM telling me mine doesn't (they said the serial number of my board doesn't have the correct revision NB).
Title: Re: V8 Optimized App
Post by: RottenMutt on 28 Sep 2007, 06:48:21 pm
Update:

I've started to see parity errors in the BIOS!

When I ordered the MB, Processors (5120s) and Memory (Crucial CT12872AF667) from Newegg (11/1/2006) they sent me one Micron-made Dimm and 3 Samsung re-branded Dimms.  Since I've upgraded to quad cores maybe the mismatch is now noticeable.  Good news: Newegg has agreed to RMA the memory.  :)  Hopefully this is the problem, what do you think?
When I got the memory I thought crap, but then I thought: since the memory is separated from the NB by the AMB chip, why should it matter?  At least they sent me matched Xeons, which isn't always the case.
Title: Re: V8 Optimized App
Post by: Urs Echternacht on 04 Oct 2007, 09:41:06 pm
OK let's compare two workunits, one by my 5150 and one by your E5320.  Something is not OK with your machine dropping your performance by 50%!  And we are using the same app.
Workunits are nearly identical: 0.4AR, 54.1 credit.
Your E5320:
Code: [Select]
CPU time 9918.890625
My 5150:
Code: [Select]
CPU time 6720.125
...
That difference looks heavy. Analysing...
E5320 has only FSB 1066, while E5150 has FSB 1333. Maybe that is the limiting factor here, because the RAM's bandwidth is not fully used (errors are another factor).

Besides, the L5335 would use less power and has also the faster FSB1333. (But it's more expensive.)

Urs
Title: Re: V8 Optimized App
Post by: RottenMutt on 09 Oct 2007, 06:13:29 pm
the 5320 is overclocked to 1520 fsb :o
Title: Re: V8 Optimized App
Post by: michael37 on 09 Oct 2007, 10:41:41 pm
the 5320 is overclocked to 1520 fsb :o

Have you gotten your memory fixed?  If yes, did the times get back to something more reasonable?

If not, would you try dropping the overclocking rates down a bit?  I wonder if there is some major inefficiency with the memory controller due to an imbalance of FSB vs DDR2 memory speeds.  Maybe the issue is not so much the FSB overclock itself as the synchronization between FSB and RAM.
Title: Re: V8 Optimized App
Post by: RottenMutt on 10 Oct 2007, 11:02:26 am
I’ve rma’d the memory back to newegg.com and should get new memory today.  I will report back later.
Title: Re: V8 Optimized App
Post by: RottenMutt on 10 Oct 2007, 11:00:55 pm
Sandra hardware mainboard info is reporting the memory is 64-bit width???  This is the new memory... 4 sticks of 2GB PC2-5300 FB-DDR2 (Crucial).



SiSoftware Sandra

Mainboard
Manufacturer : Supermicro
Multi-Processor (MP) Support : 2 Processor(s)
MPS Version : 1.40
Model : X7DA8
Version : PCB Version
Serial Number : 0123456789

System Memory Controller
Location : Mainboard
Error Correction Capability : None
Number of Memory Slots : 8
Maximum Installable Memory : 64GB
Bank1 - DIMM1A : Micron 18HF25672FD667E1D4 E30447CC DIMM Synchronous DDR2-SDRAM 2GB/72 @ 667Mt/s
Bank1 - DIMM1B : Empty
Bank2 - DIMM2A : Micron 18HF25672FD667E1D4 E30447D3 DIMM Synchronous DDR2-SDRAM 2GB/72 @ 667Mt/s
Bank2 - DIMM2B : Empty
Bank3 - DIMM3A : Micron 18HF25672FD667E1D4 D21C5C03 DIMM Synchronous DDR2-SDRAM 2GB/72 @ 667Mt/s
Bank3 - DIMM3B : Empty
Bank4 - DIMM4A : Micron 18HF25672FD667E1D4 E304481A DIMM Synchronous DDR2-SDRAM 2GB/72 @ 667Mt/s
Bank4 - DIMM4B : Empty

Chipset 1
Model : Super Micro Computer Inc 5000X Chipset Memory Controller Hub
Revision : D2
Bus : Intel AGTL+
Front Side Bus Speed : 4x 373MHz (1492MHz data rate)
Maximum FSB Speed : 4x 266MHz (1064MHz data rate)
Width : 64-bit
SMP - MP Capability : Yes
I/O Queue Depth : 12 request(s)
Maximum Bus Bandwidth : 11936MB/s (estimated)

Logical/Chipset 1 Memory Banks
Bank 0 : 2GB DDR2-SDRAM FB-DIMM 5.0-5-5-15 (tCL-tRCD-tRP-tRAS) CR4
Bank 0 Temperature : 59.0°C / 138.2°F
Bank 1 : 2GB DDR2-SDRAM FB-DIMM 5.0-5-5-15 (tCL-tRCD-tRP-tRAS) CR4
Bank 1 Temperature : 58.5°C / 137.3°F
Bank 2 : 2GB DDR2-SDRAM FB-DIMM 5.0-5-5-15 (tCL-tRCD-tRP-tRAS) CR4
Bank 2 Temperature : 58.0°C / 136.4°F
Bank 3 : 2GB DDR2-SDRAM FB-DIMM 5.0-5-5-15 (tCL-tRCD-tRP-tRAS) CR4
Bank 3 Temperature : 55.0°C / 131.0°F
Supported Memory Types : DDR2-SDRAM FB-DIMM
Channels : 4
Memory Bus Speed : 2x 373MHz (746MHz data rate)
Maximum Memory Speed : 2x 400MHz (800MHz data rate)
Multiplier : 1/1x
Width : 64-bit
Memory Controller in Processor : No
Refresh Rate : 7.80µs
Power Save Mode : No
Fixed Hole Present : No
Maximum Memory Bus Bandwidth : 23872MB/s (estimated)
Title: Re: V8 Optimized App
Post by: Jason G on 11 Oct 2007, 12:07:58 am
Yes,
   DDR, DDR2 & DDR3 all have a 64-bit wide memory data path per channel, AFAIK
Title: Re: V8 Optimized App
Post by: RottenMutt on 11 Oct 2007, 03:10:53 am
I was thinking that dual channel would bond two dimms to a single memory address for a 128bit word, and I thought Sandra would report it that way...
and for 5000X would yield two 128bit channels...
Title: Re: V8 Optimized App
Post by: Jason G on 11 Oct 2007, 05:34:48 am
If they did that, that would make a single channel 128 bits :D, we want dual 64-bit channels.

If one memory address accessed 128 bits instead of 64, then I think the programs would have to change too [and the sockets would need more pins, I think]. Yes, in total it is accessing 128 bits at a time.
Dual channel = 2 separate 64-bit channels in parallel, doubling the bandwidth in the controller and giving an effective width of 128 bits.  At a guess, I don't think gluing 64 bits from one channel to the 64 bits of the other channel, and synchronising the two, would be any more efficient than one channel + one channel, but then again I'm not a motherboard designer :D
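
The Sandra estimates posted earlier in the thread are consistent with separate 64-bit (8-byte) channels. The following is a minimal sketch assuming the usual rule of thumb "peak bandwidth = bytes per transfer x transfers per second x channels"; the function name is made up for illustration, and the result is a theoretical peak, not a measured figure.

```python
def peak_bandwidth_mb_s(width_bits, transfers_mt_s, channels=1):
    """Theoretical peak in MB/s: bytes per transfer * mega-transfers/s * channels."""
    return (width_bits // 8) * transfers_mt_s * channels

# Front side bus as reported by Sandra: 4 x 373MHz = 1492 MT/s, 64 bits wide.
fsb_mb_s = peak_bandwidth_mb_s(64, 1492)

# Memory as reported by Sandra: 2 x 373MHz = 746 MT/s, 64 bits wide, 4 channels.
memory_mb_s = peak_bandwidth_mb_s(64, 746, channels=4)

print(fsb_mb_s)      # compare with Sandra's "Maximum Bus Bandwidth"
print(memory_mb_s)   # compare with Sandra's "Maximum Memory Bus Bandwidth"
```

Both values match the "(estimated)" lines in the Sandra dump, which suggests Sandra derives them the same way: four 64-bit channels added together, not one wider channel.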
Title: Re: V8 Optimized App
Post by: Jason G on 11 Oct 2007, 05:50:49 am
Anyway, are your paired modules in matching slots? And is dual channel mode [or must it be dual interleaved at that bandwidth?] set in the BIOS?

[It looks like you are running dual channel interleaved correctly to me (maybe not; check the BIOS).  That ~12 GB/s total memory bandwidth looks like a lot more than DDR2 dual channel mode... but should you be higher?]
for PC2-5400 DDR2-SDRAM, roughly theoretical:
           
Quote
DDR 2 single channel ~5 GB/s
            DDR 2 dual channel ~10 GB/s
            DDR 2 dual channel interleaved ~??GB/s, Dual channel + a bit extra

[Just looked at the manual for your mobo,  looks like you have them in the right sockets, and it wouldn't work any other way,  EEEK, they are all the same colour :o ]

Quote
Interleaved memory is supported when pairs of DIMM modules are
installed in both Branch 0 and Branch 1.
which you have, so I think it's good to go. [You'll want to check that the "Branch Mode" setting in the BIOS is set to "Interleave" though.] I would test with memtest86+ and look at the speed with different interleave ratio settings to find what's fastest (1:1, 1:4 etc.)

Title: Re: V8 Optimized App
Post by: RottenMutt on 18 Oct 2007, 01:13:05 am
new memory, but still not much better...
why are the Everest memory reads slow?


[attachment deleted by admin]
Title: Re: V8 Optimized App
Post by: RottenMutt on 18 Oct 2007, 01:20:59 am
Sandra memory benchmark

[attachment deleted by admin]
Title: Re: V8 Optimized App
Post by: Jason G on 18 Oct 2007, 02:15:07 am
Hi again,
   Do the FB-DIMM RAM sticks have 9 x chips (single rank), or 18 x chips (dual rank) per stick ? [This could be important]  Either way You are beating the listed 5000 chipset reference machine. You might squeeze some more performance out looking at the timings ...  [but remember that every benchmark software is different and only synthetic, I think they design them to defeat the Ram buffers making FB Dimm look slower, when in real applications they could be faster for some things]

I remember reading the Mac Pros like to run fastest [low latency, but less than maximum bandwidth] with 4 x dual rank fb dimms. they said that loading all 8 slots added some latency but gave more bandwidth.

NOTE the slower EVEREST benchmark shows you are running single channel!  If that isn't an Error in Everest, you need to fix that if you haven't already! the  sandra one looks OK so I can't see a reason for it.(maybe Check BIOS anyway  - dual channel, interleave, and the branch mode I think too)

Have you maybe got Boinc running during the benchmark?

What does CPU-Z say ?  (cpu pages and Memory Pages ?) 

 just remember the opterons seem to be using consumer DDR2 not server FB Dimms [ Or could they be RAMBUS RIMMS ? $$$  :o]  , and have on die memory controller.  (that's for playing games isn't it? , :P , Maybe if seti could use all the memory bandwidth then there would be more opterons in the top 100 computers list )
Title: Re: V8 Optimized App
Post by: Jason G on 18 Oct 2007, 03:28:29 am
I Just realised another possibility why maybe the slower Everest Benchmark.  It is possible driving the memory hard you get more correctable ECC Errors. the correction adds some latency I think.   Maybe if you back off on the RAM a bit and check its cooling it might actually speed up some more. I think memtest86 [But don't know for sure] should show if ECC error correction is happening a lot.
Title: Re: V8 Optimized App
Post by: RottenMutt on 18 Oct 2007, 09:35:39 pm

It is possible driving the memory hard you get more correctable ECC Errors. the correction adds some latency I think.
The BIOS logs ECC errors, and it hasn't logged any with the new memory.  I'm wondering if the current BIOS has something wrong???

Quote
Do the FB-DIMM RAM sticks have 9 x chips (single rank), or 18 x chips (dual rank) per stick ?
dual rank i do believe

Quote
NOTE the slower EVEREST benchmark shows you are running single channel!
why do you say that?  i suspect it may be true.

Quote
Have you maybe got Boinc running during the benchmark?
no

Quote
What does CPU-Z say ?  (cpu pages and Memory Pages ?)
cpuz doesn't say much of anything

[attachment deleted by admin]
Title: Re: V8 Optimized App
Post by: RottenMutt on 18 Oct 2007, 09:36:40 pm
memory tab

[attachment deleted by admin]
Title: Re: V8 Optimized App
Post by: RottenMutt on 18 Oct 2007, 09:40:16 pm
Everest spd info shows 2 rank

[attachment deleted by admin]
Title: Re: V8 Optimized App
Post by: Jason G on 19 Oct 2007, 03:32:01 am
Looks jolly good to me.  The Everest benchmark you showed before (the one with the blue window) did say "Single Channel DDR2-760FB SDRAM", so I think it was faulty or there was a wrong setting; the other things you show look fine.  If you suspect something in the BIOS I'd be checking anyway.

Everything I can find on different RAM configurations tested with Fully Buffered DIMMs on similar systems (like the Mac Pro) suggests that the way you have it (dual rank, 4 slots) will show less than maximum bandwidth, but much lower (faster) latency.  That is good :D

Fast Latency is more important for small random memory accesses (Like seti) , and  high bandwidth is more important for database servers and stuff. (depending what they store )

so what you have will be the fastest combination for workstation / crunching use I reckon.

I think adding more sticks would increase bandwidth but raise latency too, making that more suitable for a high capacity database server that accesses big blocks of continuous data (less suitable for seti / workstation).

All my opinions to be taken with a grain of salt, until you've worked out what's best for you  ;)

If you can find a way to measure ECC correction errors,  Like the Mac Pro Has,  Then you can tighten the timing (latencies)  until it squeals then back off a bit.  Of course that depends on how much control you have to start with.

Jason


Title: Re: V8 Optimized App
Post by: RottenMutt on 19 Oct 2007, 10:21:45 am
Looks Jolly good to me.  The Everest benchmark you showed before , I think was faulty or there was a wrong setting, with a blue window did say "Single Channel DDR2-760FB SDRAM"...
Good catch, I totally missed it.  I used to get 7000MB/s read in Everest, now I don't.  I checked the BIOS and everything is set correctly, then I checked the DMI events and there were ECC errors :(  I'm RMA'ing the board...
Title: Re: V8 Optimized App
Post by: Jason G on 19 Oct 2007, 07:02:34 pm
I used to get 7000MB/s read in Everest, now I don't.  I checked the BIOS and everything is set correctly, then I checked the DMI events and there were ECC errors :(  I'm RMA'ing the board...
LOL, the old "When in doubt, chuck it out" methodology. It will be interesting to compare a benchmark of the new board against the data you have already, with everything else set the same.  I have not seen the cost of ECC errors on speed measured anywhere; on reliability, definitely.

Was there indication whether they were "hard uncorrectable" ECC Errors ?  Or were they "ECC correction events" ?
Title: Re: V8 Optimized App
Post by: Gecko_R7 on 19 Oct 2007, 10:24:48 pm

Fast Latency is more important for small random memory accesses (Like seti) , and high bandwidth is more important for database servers and stuff. (depending what they store )

I think adding more sticks would increase bandwidth but raise latency too, making that more suitable for a high capacity database server that accesses big blocks of continuous data. (less suitable for seti / workstation]

Jason

Noticed your comment regarding latency.

So, on a Q6600 Quad for example, Seti would respond better w/ DDR2-800 @ CL-3 than cranking to higher bandwidth, say DDR2-1200 but having to run CL5?

Is this right?
Title: Re: V8 Optimized App
Post by: Jason G on 19 Oct 2007, 11:11:15 pm
Noticed your comment regarding latency.

So, on a Q6600 Quad for example, Seti would respond better w/ DDR2-800 @ CL-3 than cranking to higher bandwidth, say DDR2-1200 but having to run CL5?

Is this right?

Conceptually I guess yes. Practically It would depend on what things you were using the machine for etc...  Accessing small amounts of data spread randomly through Ram would benefit from lower latency (fast starting for a transfer)  more than improved bandwidth (but slower starting).  That "probably" makes general sense for non-server ram/mobos/buffered ram  too, but as I see quoted here often "your mileage may vary".

That's why I have a problem with going by the bandwidth benchmarks as a performance guide.  They would tend to use large blocks of data similar to streaming database [video] content or something like that.  And these don't seem to make much mention of latency (access startup time)  at all.

The statement I made was really regarding a specific combination of fully buffered Dimms on a Mac Pro similar motherboard.  As I understand it (which may be wrong) These use a memory branching structure  with serial links to interleave extra slots (giving a total of eight slots).  Roughly understood,  When these are all populated,  the latency on each pair is increased by one (1) .. giving a slower start access,  but more bandwidth due to interleaving structure. 

With only four slots populated, in the correct combination, the latency remains at the original (fast value), but the bandwidth is less.

This may all be completely irrelevant for the Q6600, which uses ordinary dual channel [not FB Dimms or branch interleaving], so it should give the same latency whichever slots are filled [provided the dual channel slot match is observed].

The conceptual argument/possibility of lower latency being preferred for seti type applications remains though (as you point out).  [And you or I probably wouldn't be the first to suggest that backing off on amount of, and bandwidth of, ram but going for fastest(lowest) latency, might be a good idea for a workstation / cruncher]

Jason

[PS:  As a guesstimate , if 3 cycle latency and 400Mhz (DDR2-800)  is 7.5ns , and 5 cycle latency at 600MHz (ddr2-1200) is 8.33ns then a small access will start about 10 % faster  with the low latency ddr2 800.

So if I was just doing seti and checking my emails I'd go the ddr2-800 low latency,

If I was editing video, I'd go for the higher bandwidth ddr2-1200. ]
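
The guesstimate above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic (CAS cycles divided by the memory clock), with an illustrative function name; it ignores every other timing parameter and the memory controller itself, so real-world differences may be smaller or larger.

```python
def cas_latency_ns(cas_cycles, clock_mhz):
    """Time for the CAS delay alone, in nanoseconds (cycles / clock frequency)."""
    return 1000.0 * cas_cycles / clock_mhz

ddr2_800_cl3_ns = cas_latency_ns(3, 400)    # DDR2-800 runs a 400MHz clock
ddr2_1200_cl5_ns = cas_latency_ns(5, 600)   # DDR2-1200 runs a 600MHz clock

# How much sooner the low-latency DDR2-800 starts a transfer, in percent.
advantage_pct = (ddr2_1200_cl5_ns - ddr2_800_cl3_ns) / ddr2_1200_cl5_ns * 100.0

print(f"DDR2-800 CL3:  {ddr2_800_cl3_ns:.2f} ns")
print(f"DDR2-1200 CL5: {ddr2_1200_cl5_ns:.2f} ns")
print(f"DDR2-800 CL3 starts about {advantage_pct:.0f}% sooner")
```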

Title: Re: V8 Optimized App
Post by: RottenMutt on 20 Oct 2007, 02:22:10 am
i think i found my problem.  quad cores require G1 stepping NB chips.
Title: Re: V8 Optimized App
Post by: Jason G on 20 Oct 2007, 02:32:19 am
Cool, so the question I guess is will the new replacement board have a G1 stepping Northbridge? [ I haven't yet come across that requirement, but am looking :D]

Title: Re: V8 Optimized App
Post by: Gecko_R7 on 20 Oct 2007, 02:36:25 am
Noticed your comment regarding latency.

So, on a Q6600 Quad for example, Seti would respond better w/ DDR2-800 @ CL-3 than cranking to higher bandwidth, say DDR2-1200 but having to run CL5?

Is this right?

[PS: As a guesstimate , if 3 cycle latency and 400Mhz (DDR2-800) is 7.5ns , and 5 cycle latency at 600MHz (ddr2-1200) is 8.33ns then a small access will start about 10 % faster with the low latency ddr2 800.

So if I was just doing seti and checking my emails I'd go the ddr2-800 low latency,


Thanks Jason.  Makes good sense.  I'm going to give it a whirl and see how it works.
Regards,
Ian
Title: Re: V8 Optimized App
Post by: Jason G on 20 Oct 2007, 02:56:26 am
Thanks Jason.  Makes good sense.  I'm going to give it a whirl and see how it works.
Regards,
Ian

It'll  be interesting to see  if the theories apply in practice on non-server hardware too.  If low latency's the ticket whatever speed you buy I'll have to improve the ram in my old clunkers  ::)

[Later: came across this while looking for old ddr400 in low latency CAS2... part of advertising for Corsair modules]
Quote
Performance computing enthusiasts have recognized for some time that latency settings may, in fact, have a greater impact on overall system performance than the overall memory bus speed. Indeed, the latency settings have become even more critical due to current system architecture. The latest chip sets have demonstrated that performance is greatest when the memory bus runs at an integral multiple of the front side bus of the processor. So, the optimum memory performance is attained when the memory bus is synchronous with the processor, and latency settings are reduced to the lowest values possible.
Title: Re: V8 Optimized App
Post by: Gecko_R7 on 20 Oct 2007, 03:07:20 am
Thanks Jason. Makes good sense. I'm going to give it a whirl and see how it works.
Regards,
Ian

It'll be interesting to see if the theories apply in practice on non-server hardware too. If low latency's the ticket whatever speed you buy I'll have to improve the ram in my old clunkers ::)


I'm actually going to tighten-up a pair of Team Xtreem DDR2-1200 and try to run 1:1 at 400Mhz & CL3-3-3-8 1T w/ FSB 400x8 on P5k Dlx.

Key word is "try".
Title: Re: V8 Optimized App
Post by: Jason G on 20 Oct 2007, 03:17:17 am
 A Huh, that idea would seem to match what the corsair blurb is saying.  [ I was adding it to my earlier post as you posted]  Good luck.
Title: Re: V8 Optimized App
Post by: Jason G on 20 Oct 2007, 04:12:48 am
i think i found my problem.  quad cores require G1 stepping NB chips.


This is for the Supermicro X7DA8 motherboard?
Can't find info to support this about the northbridge stepping [which doesn't mean it isn't so].  What I am finding suggests it may be the motherboard revision needs to be 2.0+ and latest bios ? how does that compare with what you have ?
Title: Re: V8 Optimized App
Post by: Jason G on 20 Oct 2007, 05:17:35 am
You are right I think, Here 'tis  ;) at http://ftp.supermicro.com/support/faqs/faq.cfm?faq=6423
Quote
Question
I have one of your X7DA8 board that I bought in August 2006. I would like to use quad core CPUs on this machine. I see your website indicates that quad core CPUs are supported on this board. Can I just buy quad core CPU and use it on this board?
Answer
Running this CPU on X7 boards requires G1 stepping of Memory controller on the motherboard. This change was implement in November end after Intel released G1 stepping memory controller. So, all the boards manufactured before November would not support quad core CPU.
Title: Re: V8 Optimized App
Post by: RottenMutt on 26 Oct 2007, 01:54:47 pm
well I replaced the board with an X7DA3+ last night.  I do not believe the seti performance has increased.  I did notice Everest is now reporting dual channel, but with no improvement in any memory benchmarks.
I did get four more dimms of memory so I can try some benchmarks with all 8 dimms populated.  I will also try the X7DA8 with two dual cores to see if the memory performance increases.

Title: Re: V8 Optimized App
Post by: Jason G on 26 Oct 2007, 03:39:45 pm
Theory based on the Mac Pros,  With 8 sticks You'll get *maybe much* more bandwidth, but higher latency, which you prefer will ultimately be up to you! good luck and enjoy :D
Title: Re: V8 Optimized App
Post by: Jason G on 26 Oct 2007, 05:02:32 pm
Oh one more thing: some users over on the NC forum are reporting that SSE3 builds seem to be quicker than the SSSE3 builds at the moment on a Core 2 based setup.  Can't verify that myself, or figure out how a Clovertown rig fits into that equation.   Something strange there....
Title: Re: V8 Optimized App
Post by: RottenMutt on 29 Nov 2007, 09:50:31 am
well i replaced the board with an X7DA3+ last night.  i do not believe the seti performance has increased.

I'm starting to form the opinion that nothing is wrong with the computer/mobo, and that the difference between the "Darwin" machines and the Windows machines is just the app: the Mac app performs much better on V8s than the optimised apps on Windows. :(
Title: Re: V8 Optimized App
Post by: Gecko_R7 on 29 Nov 2007, 12:08:53 pm
well i replaced the board with an X7DA3+ last night.  i do not believe the seti performance has increased.

I'm starting to form the opinion that nothing is wrong with the computer/mobo, and that the difference between the "Darwin" machines and the Windows machines is just the app: the Mac app performs much better on V8s than the optimised apps on Windows. :(

There is a very substantial performance difference of about 30% to as much as 50%, depending on the AR, if you compare OSX V8 and 2.4V on similar platforms.  Hardware is largely not responsible for this. The apps are really 2 different animals that have different lineages and developer approaches.

Something else to keep in mind is that for Mac, there are really only three Intel flavors required: Core Duo, Core 2 and Xeon.  Alex still does G4 and G5 PPC ports as well, but this is really legacy support.  For x86, developers have to accommodate MANY different CPU & OS combos and generations of same.  This means some consideration has to be given to which development combos will work on the widest range of the above.  Also, it takes considerable time and effort to alpha and beta build & test for all these different combos: time that could otherwise be re-invested into trying new things for additional incremental improvement.  The x86 apps are also the result of collaborative work and pieces involving MANY people over the past few years.  People come, and people go.  As Jason (j_groothu) can attest, if a new developer has ideas to add/improve the code base, it takes a little while to understand and figure out how the current code is structured & works before one can start down the path of optimizing for it.  It's rather like an architect trying to contribute to a house built in stages by 10 other architects, each with their own specialty and style.  The OSX app has essentially had 1 main architect the past couple of years who, prior to the Mactel conversion, was the only active person building PPC apps.  PPC code development requirements were substantially different from x86 considerations (compilers, libraries, CPU architecture etc.), yet many concepts/elements from this port quite well to current x86 Mactels and in some cases work much better with Intel's development tools.

All these things add-up over time and result in the different lineages we have today  ;)
In any case, the current apps for all platforms are the quickest ever and a testament to the fantastic efforts and dedication of all our developers, past and present.  ;D

Cheers!


Title: Re: V8 Optimized App
Post by: Jason G on 29 Nov 2007, 11:56:38 pm
..
 As Jason (j_groothu) can attest, if a new developer has ideas to add/improve the code base, it takes a little while to understand and figure out how the current code is structured & works before one can start down the path of optimizing for it.  It's rather like an architect trying to contribute to a house built in stages by 10 other architects, each with their own specialty and style. 
..

I certainly can attest to that! Frankenstein's Monster might be another good analogy  ;D , I'm somewhere between figuring out how it's structured and how it works.

For the situation with different Hardware/OS/Compilers I like simple car analogies:
- If you are designing a new turbocharger for a specific vehicle it can be highly 'tweaked' for that car / engine, you have a fixed 'platform' to design for.
- If you want a more generic model then it may still be good, but will have compromises involved in the design (maybe size, shape, bolt patterns, capacity), all additional considerations for the turbocharger designer. You have a more loosely defined or even shifting platform.

Jason
Title: Re: V8 Optimized App
Post by: RottenMutt on 24 Dec 2007, 12:21:28 pm
Here is my latest RAC Graph, the computer crunches 24/7 except when i'm playing Crysis.

[attachment deleted by admin]
Title: Re: V8 Optimized App
Post by: RottenMutt on 19 Jan 2008, 08:38:30 pm
can someone compile a Windows V8 app using Alex's code, which can be downloaded from this thread: http://setiathome.berkeley.edu/forum_thread.php?id=31810&nowrap=true#656269

I would think this would benefit Quads as well
Title: Re: V8 Optimized App
Post by: RottenMutt on 22 Jun 2008, 04:30:39 pm
thank you for the V8 port to windows:)
my biggest gain is on my 6700 dual core cpu, it went from 2400 RAC to 3700 RAC (54% improvement)!!!
my V8 went from 5200 RAC to 8000 RAC (hey that is a 54% improvement too)
my Quads are not doing as well, i haven't figured that out yet, Q6700 and Q9300, both around 3.5GHz.
Even my AMD machines, dual 270's and dual 2216's, have seen an improvement.
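
The two percentages quoted above can be checked with a short Python sketch (the helper name is illustrative). Both gains come out at roughly 54%.

```python
def improvement_pct(before, after):
    """Relative gain of `after` over `before`, in percent."""
    return (after - before) / before * 100.0

dual_core_gain = improvement_pct(2400, 3700)   # 6700 dual core, RAC before/after
v8_gain = improvement_pct(5200, 8000)          # V8 (dual quad-core), RAC before/after

print(f"dual core: {dual_core_gain:.1f}%")
print(f"V8:        {v8_gain:.1f}%")
```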
Title: Re: V8 Optimized App
Post by: sinj on 10 Nov 2008, 12:10:35 am
I noticed that seti has gone to version 603. Can I modify my app_info.xml and run v8 against this on win64?
Title: Re: V8 Optimized App
Post by: Raistmer on 10 Nov 2008, 03:27:04 am
If you already run opt app you can just do nothing and continue crunching as before.
If you use the stock version now, then yes, you should edit app_info.xml. Look at its structure and find the block that contains
    <app_version>
        <app_name>setiathome_enhanced</app_name>
        <version_num>528</version_num>
...............
    </app_version>
Duplicate this block and replace 528 with 603 in the copy.
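
For anyone who prefers not to edit the file by hand, the same duplication can be sketched in Python. This is a minimal sketch under assumptions, not an official tool: the sample XML below is a hypothetical, trimmed-down app_info.xml (a real file contains more elements inside each block, which are copied along unchanged), and the function name is made up for illustration. Back up the real file before trying anything like this.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down app_info.xml used only for this illustration.
SAMPLE = """<app_info>
    <app>
        <name>setiathome_enhanced</name>
    </app>
    <app_version>
        <app_name>setiathome_enhanced</app_name>
        <version_num>528</version_num>
    </app_version>
</app_info>"""

def add_version(xml_text, old="528", new="603"):
    """Duplicate the <app_version> block with version `old`, setting `new` in the copy."""
    root = ET.fromstring(xml_text)

    # Do nothing if a block for the new version is already present.
    existing = [b.findtext("version_num") for b in root.iter("app_version")]
    if new in existing:
        return xml_text

    for index, block in enumerate(list(root)):
        if block.tag == "app_version" and block.findtext("version_num") == old:
            clone = copy.deepcopy(block)           # copies all child elements too
            clone.find("version_num").text = new
            root.insert(index + 1, clone)          # place the copy right after the original
            break
    return ET.tostring(root, encoding="unicode")

updated = add_version(SAMPLE)
print(updated)
```

The original 528 block is kept, as in the manual instructions, so work already assigned under the old version number should still match a block.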
Title: Re: V8 Optimized App
Post by: Slawek on 29 Jan 2009, 12:13:46 pm
Hi,
 I have one question about 64-bit OSes: does a 64-bit OS give increased computing performance?
Title: Re: V8 Optimized App
Post by: sunu on 29 Jan 2009, 12:42:38 pm
At least for SETI, there isn't a big difference. Only a (very?) small increase for 64bit
Title: Re: V8 Optimized App
Post by: Raistmer on 29 Jan 2009, 03:33:20 pm
Hi,
 I have one question about 64-bit OSes: does a 64-bit OS give increased computing performance?
Windows OSes do see a speed increase with the x64 optimized AK_v8 apps. Even the SSE2 version will show some speedup.
Title: Re: V8 Optimized App
Post by: Slawek on 31 Jan 2009, 07:13:57 am
Hi,
 I have one question about 64-bit OSes: does a 64-bit OS give increased computing performance?
Windows OSes do see a speed increase with the x64 optimized AK_v8 apps. Even the SSE2 version will show some speedup.

Hi, I have another question...

If I have a 64-bit OS, which version of the app gives more performance: 64-bit or 32-bit?
Title: Re: V8 Optimized App
Post by: Raistmer on 31 Jan 2009, 08:08:19 am
x64 one.
BTW you could do tests on your own PC and report the results. There was a very nice timings table for the older opt app; it would be nice to collect something similar for the current AK v8.
Title: Re: V8 Optimized App
Post by: Slawek on 04 Feb 2009, 11:59:25 am
Hello

64-bit version of the OS and computing.. running
Title: Re: V8 Optimized App
Post by: elec999 on 14 Feb 2009, 12:29:09 am
I have a Tyan with two quad-core Xeons, so 8 cores altogether, and no GPU. Should I be using this app on it? I am currently using the Windows 32-bit AK v8.0 SSE4.1. Would this give me better results?
Thank you
Title: Re: V8 Optimized App
Post by: Jason G on 14 Feb 2009, 12:38:14 am
You might get ~1% or so better performance out of the SSSE3x build due to a certain optimisation designed exactly for machines like that (x-->xeon).  You'd probably get a much bigger improvement (5-10% maybe?) if you were able to use a 64-bit OS and the 64-bit build.  I would guess on a machine like that, 5-10% would be considered 'significant'.

Jason