Seti@Home optimized science apps and information
Optimized Seti@Home apps => Windows => Topic started by: Sutaru Tsureku on 08 Jun 2010, 04:21:22 am
-
I made a bench test with my new (old) Intel Core2 Duo E7600 @ 3.06 GHz / DDR2 800/5-5-5-18 (stock, not OCed).
I ran two instances of Knabench V1.81r simultaneously, one per CPU core (without CPU affinity).
AK_v8b_win_SSE41.exe
TaskName: PG0009.wu - 377.734 secs Elapsed - 375.609 secs CPU time
TaskName: PG0395.wu - 383.344 secs Elapsed - 381.188 secs CPU time
TaskName: PG0444.wu - 319.391 secs Elapsed - 317.344 secs CPU time
TaskName: PG1327.wu - 249.844 secs Elapsed - 247.672 secs CPU time
TaskName: PG0009.wu - 379.125 secs Elapsed - 376.953 secs CPU time
TaskName: PG0395.wu - 383.609 secs Elapsed - 381.516 secs CPU time
TaskName: PG0444.wu - 319.828 secs Elapsed - 317.750 secs CPU time
TaskName: PG1327.wu - 250.641 secs Elapsed - 248.547 secs CPU time
AK_v8b_win_SSSE3x.exe
TaskName: PG0009.wu - 369.000 secs Elapsed - 366.844 secs CPU time
TaskName: PG0395.wu - 351.047 secs Elapsed - 348.969 secs CPU time
TaskName: PG0444.wu - 290.594 secs Elapsed - 288.516 secs CPU time
TaskName: PG1327.wu - 249.484 secs Elapsed - 247.391 secs CPU time
TaskName: PG0009.wu - 371.375 secs Elapsed - 369.281 secs CPU time
TaskName: PG0395.wu - 351.109 secs Elapsed - 348.891 secs CPU time
TaskName: PG0444.wu - 290.453 secs Elapsed - 288.438 secs CPU time
TaskName: PG1327.wu - 249.594 secs Elapsed - 247.422 secs CPU time
I thought SSE4.1 was faster (only) on Core2 Duo CPUs?
Or only on the E8xxx series?
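For reference, the per-workunit gap between the two builds in the numbers above can be worked out with a short Python sketch. The CPU times are copied from the runs listed; averaging the two simultaneous instances and the helper name `percent_faster` are assumptions made for illustration, not something Knabench reports.

```python
# CPU times (secs) from the E7600 runs above: (instance 1, instance 2).
sse41 = {
    "PG0009": (375.609, 376.953),
    "PG0395": (381.188, 381.516),
    "PG0444": (317.344, 317.750),
    "PG1327": (247.672, 248.547),
}
ssse3x = {
    "PG0009": (366.844, 369.281),
    "PG0395": (348.969, 348.891),
    "PG0444": (288.516, 288.438),
    "PG1327": (247.391, 247.422),
}

def mean(values):
    return sum(values) / len(values)

def percent_faster(slow, fast):
    """How much shorter `fast` is than `slow`, as a percentage of `slow`."""
    return (slow - fast) / slow * 100.0

# Per-workunit gap, using the mean of the two instances (an assumption).
per_wu = {wu: percent_faster(mean(sse41[wu]), mean(ssse3x[wu])) for wu in sse41}

# Overall gap, using the summed CPU time of all eight runs per build.
total_sse41 = sum(sum(t) for t in sse41.values())
total_ssse3x = sum(sum(t) for t in ssse3x.values())
overall = percent_faster(total_sse41, total_ssse3x)

for wu, pct in per_wu.items():
    print(f"{wu}: SSSE3x faster by {pct:.2f}%")
print(f"Total: SSSE3x faster by {overall:.2f}%")
```

On these figures the gap is concentrated in PG0395 and PG0444, while PG1327 is nearly identical between the two builds.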
-
It's actually mostly dependent on memory bus contention, and indirectly on cache size. So, as usual, no choice is absolute: those CPUs did not exist when the comparisons were made, and they have a smaller cache than the E8xxx series.
'Usually', since fast Duos have less contention than Quads (dual-channel RAM shared by 2 cores instead of 4), the SSE4.1 build is equal or faster with fast RAM but slower with slow RAM. However, the smaller cache changes things again.
To test: if you slow that RAM down further, SSSE3x should be better by a greater margin. If it is possible to OC that RAM, and especially to run lower latency, then the SSE4.1 build is likely to overtake somewhere around 960MHz @ CL4.
Your system shows a pretty big difference. That difference shrinks, and the builds swap places, as memory system performance goes up. In some cases that swapping can also include Quads that have extreme RAM & lots of cache.
Jason
-
"SSE4.1" could be faster for hosts with a relatively fast memory bus. Your host uses 800MHz memory, so very likely the memory bus is saturated enough that the cache handling used in the SSSE3x build gives it the advantage. That's a memory/cache issue, not the SSE level of the CPU per se.
IMO the SSE4.1 build is kept mostly for people who can't accept that bigger != better sometimes. SSE4.1 will indeed win on some hosts, but not on many IMO, and surely not on all hosts that support the SSE4.1 instruction set.
-
Duplicating your test on E8400 @3.6GHz w/Dual Channel DDR2@960MHz, Win7x64 (Second core loaded with Boinc):
AK_v8b_win_SSSE3x.exe
TaskName: PG0009.wu - 293.020 secs Elapsed - 290.318 secs CPU time
TaskName: PG0395.wu - 267.220 secs Elapsed - 264.531 secs CPU time
TaskName: PG0444.wu - 210.280 secs Elapsed - 207.793 secs CPU time
TaskName: PG1327.wu - 154.550 secs Elapsed - 152.195 secs CPU time
Total CPU 914.837
AK_v8b_win_SSE41.exe
TaskName: PG0009.wu - 289.011 secs Elapsed - 286.777 secs CPU time
TaskName: PG0395.wu - 263.460 secs Elapsed - 260.880 secs CPU time
TaskName: PG0444.wu - 214.420 secs Elapsed - 211.818 secs CPU time
TaskName: PG1327.wu - 154.311 secs Elapsed - 151.508 secs CPU time
Total CPU 910.983 - speedup = 0.42%
Pretty close! ;) So close it doesn't really matter anymore. I do plan to upgrade my RAM further in the near future, and so it will probably all change again.
Jason
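The "Total CPU" and "speedup" figures can be reproduced as follows. A minimal Python sketch, assuming speedup here means the reduction in total CPU time relative to the SSSE3x total; the variable names are only for illustration.

```python
# CPU times (secs) from the E8400 runs above,
# in the order PG0009, PG0395, PG0444, PG1327.
ssse3x_cpu = [290.318, 264.531, 207.793, 152.195]
sse41_cpu = [286.777, 260.880, 211.818, 151.508]

total_ssse3x = sum(ssse3x_cpu)
total_sse41 = sum(sse41_cpu)

# Assumed definition: time saved by SSE4.1 as a fraction of the SSSE3x total.
speedup_pct = (total_ssse3x - total_sse41) / total_ssse3x * 100.0

print(f"Total CPU SSSE3x: {total_ssse3x:.3f}")
print(f"Total CPU SSE41:  {total_sse41:.3f}")
print(f"speedup = {speedup_pct:.2f}%")
```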
-
With such small differences, numbers measured once without a standard deviation don't really mean anything. The times are the same.
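One way to make that concrete is to compare the gap between two builds with the spread of repeated runs of each build. The E8400 post has only one run per workunit, so the sketch below uses the two PG0009 CPU times per build from the first post instead. That is far too few runs for a real conclusion, and the "gap must exceed the summed standard deviations" rule is a rough assumption, not a proper significance test.

```python
from statistics import mean, stdev

def compare(runs_a, runs_b):
    """Return (gap, spread): difference of means, and sum of sample std devs."""
    gap = abs(mean(runs_a) - mean(runs_b))
    spread = stdev(runs_a) + stdev(runs_b)
    return gap, spread

# PG0009 CPU times (secs) on the E7600, two simultaneous instances per build.
sse41_runs = [375.609, 376.953]
ssse3x_runs = [366.844, 369.281]

gap, spread = compare(sse41_runs, ssse3x_runs)

# Rough rule of thumb (assumption): treat the builds as distinguishable only
# if the gap between them is larger than the combined run-to-run spread.
distinguishable = gap > spread
print(f"gap={gap:.3f}s spread={spread:.3f}s distinguishable={distinguishable}")
```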
-
I agree if the point was to decide which one to use. However, the purpose was to replicate Sutaru's test with the same apps. Quoting myself:
Pretty close! ;) So close it doesn't really matter anymore.
The point was to show the times converging with a better memory subsystem, where there was a big difference on Sutaru's... so it does that successfully ;).
Of course, I don't use either of these apps but the 64-bit versions, where the story is very slightly different again: variation is more like 1-2%, which becomes a bit more repeatable & a tiny bit more worthwhile.
-
FWIW, we've all seen SOOO many of the ssse3 and sse4.1 tests that bounce back and forth in the range of 0.5% to 2-3% on one side or the other.
Earlier tests in 2007/08 were run on XP and Vista hosts etc.
These tests are now Win7.
I think we saw sse4.1 being favored on Jason's big-cache dual core Penryns & Mark's quad core rigs in cases where he had big cache + combo of low latency & high speed memory. Tests with small cache quad core Q8200/8300 etc. showed ssse3x quicker. Made sense since builds w/ entry to mid-level Q-series likely tend towards more budget-friendly MBs & DDR800 sticks in majority of cases. So, a triple whammy of core contention, slower bus and slower memory.
With a variance range of only .5 to 3%, OS differences between XP, Vista and Win7, number of and kind of background services and programs running, and even BOINC disk write interval settings can swing the results on both sides of this range for identical HW rigs.
So, even *if* there *is* a fractional advantage between instruction set builds, the real-world variance from HW and OS set-ups is likely greater than the instruction set difference.
To quote Bugs Bunny, still looks like splitting Hares to me, lol. :P
-
One favourite hair splitting trick I used back in the day, was running the same build against itself. That gives some idea of the run to run variation, but more importantly shows if something on the system is pinching cycles periodically. Wide run to run variation spotted by careful hair splitting can be a good locator of system issues & possible tuning, but, if the sources of variation can't be isolated & rectified, it tends to lead to 'grasping at straws'.
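A bare-bones way to try this is to time the same command several times and look at the spread. This is a Python sketch under stated assumptions: `time_runs` and `variation_pct` are hypothetical helper names, the command shown is a placeholder to be replaced with whatever benchmark command line is actually used, and wall-clock timing via `subprocess` is only a stand-in for the CPU-time figures Knabench reports.

```python
import subprocess
import sys
import time
from statistics import mean, stdev

def time_runs(cmd, repeats=5):
    """Run `cmd` `repeats` times; return the wall-clock seconds of each run."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        times.append(time.perf_counter() - start)
    return times

def variation_pct(times):
    """Run-to-run variation: sample std dev as a percentage of the mean."""
    return stdev(times) / mean(times) * 100.0

if __name__ == "__main__":
    # Placeholder command (assumption): a trivial Python process stands in
    # for the real application under test.
    cmd = [sys.executable, "-c", "sum(range(100000))"]
    times = time_runs(cmd, repeats=5)
    print(f"mean={mean(times):.3f}s variation={variation_pct(times):.1f}%")
```

A wide variation on an otherwise idle machine hints that something is pinching cycles. A tight one gives a baseline against which a 0.5% to 3% difference between builds can be judged.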
-
Another thing to draw out of this conversation is the important distinction between "information" and "advice" - applies everywhere, of course, but let's keep it to computing.
Information is always helpful, useful and welcome - "My hosts run slightly quicker with the SSSE3x version"
Advice - "You should install such-and-such a version on your host" - is much more problematic, and requires much more knowledge about the other user's situation, needs and wants than we readily have access to.
Many times down the years, I've given users information - "you could do SETI work more quickly and reliably if you installed an optimised application" - and they've considered it and decided not to: perhaps lack of confidence, perhaps difficulty re-gaining access to the machine when an update is required, whatever. But a considered and reasonable decision. It would have been quite wrong to "advise" such a user even to install an opti app, let alone to presume to choose the best one on their behalf.
-
Just starting a run with x32h. Managed to trash my cache of work on this machine. It has a GTS250 installed and can be seen at http://setiathome.berkeley.edu/results.php?hostid=5400417
Cheers,
MarkJ
edit: Just re-read the 1st message in this thread and Jason wants us to concentrate on the x32f cuda 3.0 build. So I'll let it finish off what it's got and then change over.