Seti@Home optimized science apps and information
Optimized Seti@Home apps => Windows => Topic started by: Simon on 19 Nov 2006, 09:19:51 pm
-
Hi folks,
an updated version of the KWSN Test & Benchmark Tool has been released.
KWSN Test & Benchmark Tool with Auto-Installer - apps without graphics (1.75 MB compressed - 49.2 MB uncompressed - 7z rocks).
KWSN Test & Benchmark Tool with Auto-Installer - apps with graphics (1.82 MB compressed - 43.3 MB uncompressed).
The apps containing graphics are minimally slower than the ones without - your choice, either eye candy or raw speed.
Double-click the downloaded exe file and unpack its contents. Then run the file with the same name inside the directory it creates.
This version now contains all released Rev 2.0 apps as well as the 1.41 Core 2 one. It also has a slider to select between shorter run time with less accurate results or longer runtime with more accurate ones.
By default, "medium" is selected. On an Athlon64 3500+, "short" took about 4 minutes, "medium" took about 10 minutes and "long" took about 25 minutes.
On hosts that are slower than 1 GHz, "long" is not recommended - even "medium" may take a while. Hint: if the run results are very close, you may want to up it a notch towards accuracy to help you decide. The few extra minutes you spend here will give you days or more in the long run.
Regards,
Simon.
-
Greetings.
Simon, thank you for your work on the optimizer.
Is there a chance you can port this application to Linux? If not, I'll be glad to work with you and develop a shell script which will have the same function on Linux machines.
michael37
-
Hmm, am I doing something wrong?!
When I try to run the executable, I only get an error saying the program could not be initialized correctly (0xc0000135), with "Click OK to end the program".
Note: the message is shown in Swedish, so I'm translating it to English and the wording may differ from the English version.
Kind Regards Vyper
-
Vyper,
looks like you need Microsoft .NET 1.1 (http://www.microsoft.com/downloads/details.aspx?FamilyID=262d25e3-f589-4842-8157-034d1e7cf3a3&displaylang=en) :)
Michael,
no, right now I don't have a Linux version planned, although I'm interested in getting one made - if you can help, I'd be much obliged.
HTH,
Simon.
-
Doh!
Please compile a version that doesn't require .NET .. I really don't like to flood my OS with various add-ons that aren't really necessary :D
And if you don't , then bugger for me . lol .
Kind Regards Vyper
-
I'm confused. ::)
Having read the posts about the 2.0 apps, I came to the conclusion that the generic SSE2 app would be the fastest for my Athlon 64 X2 3800+ @ 2564 MHz.
But using ICCPatch on the Intel-only SSE2_P4 file and running the test in most accurate mode reveals that the Intel "only" app is 8 seconds faster than the generic client. In medium mode SSE2_P4 is 4 seconds faster, and in fastest mode it's 1 second faster.
Now 8 seconds doesn't sound like much, but scale it up to average runtimes and it could be minutes of crunching time saved per WU, and in the long run several more WUs crunched by my system per week. :o
I'm surprised this got through development, and probably lots of tests, without anyone noticing, and without anyone making _sure_ the generic app was the absolutely fastest app on Athlon64 systems.
-
KarVi,
don't assume we don't know that ;)
Thing is, the legal situation on ICCPatch is kind of unclear. I do not want to invalidate my expensive ICC/IPP licenses, so I'm not offering prepatched apps.
Comprende? :)
Simon.
-
Although my Spanish is not all that good, I think I understand. But...
In every single message that I've read about these apps, it has been said that the generic app was the fastest. There was no hint that with a little effort you could get even faster crunching. In my view that's borderline lying (don't take offense, it's not meant harshly, but I lack other words as English is not my mother tongue)...
I fully understand the problems surrounding your license, and of course you have to protect your license from being revoked.
But dropping a hint, and perhaps a link to an explanation and to the application for patching the executable yourself (it's easy), could be done without endangering the license. As long as you don't distribute the patched executables you should be safe.
In my view it says a lot about Intel that they would even use such a threat to hamper the competition's performance, but that's another discussion.
-
In the past,
I've posted about this topic a few times already. Still, right now there is no obvious hint about this - but enterprising people like you find out anyway ;)
Just goes to show you: people are resourceful...
In any case, for everyone:
http://www.swallowtail.org/naughty-intel.html
You can find more information on this subject here.
HTH,
Simon.
-
Simon,
Did you write the tool in "Auto-It"?
-
Nope.
I took the sources I had already, modified as necessary and recompiled (VB .NET).
A test/bench/install platform in auto-it would be the next step, but I haven't had time to get into it yet.
Regards,
Simon.
-
If someone were very enterprising...and curious about such things...
They could observe the 2.0 crunchers in action and note that the code checks for the specific presence of an Intel processor four times in each of the Intel-specific .exe files. The first 4 occurrences of 'Genu' 'ineI' 'ntel' in the source code are solely to limit code to Intel only. Any following occurrences are for my CPUID code when identifying the CPU. Just for curiosity's sake.
You might also notice that Simon neglected to compute an overall checksum over the entire EXE allowing naughty end users to potentially change the code and still have it run. Naughty Simon!
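For anyone wondering where the 'Genu' 'ineI' 'ntel' fragments come from: CPUID leaf 0 returns the 12-character vendor ID split across three 32-bit registers, in the order EBX, EDX, ECX. The following is only a minimal sketch of how those register values map to the familiar strings. The helper name `vendor_string` is made up for illustration, and nothing here reflects how the apps themselves perform their check.

```python
import struct

def vendor_string(ebx: int, edx: int, ecx: int) -> str:
    """Join the three CPUID leaf-0 registers (EBX, EDX, ECX order)
    into the 12-character vendor ID. Each register holds four ASCII
    bytes in little-endian order."""
    return struct.pack("<III", ebx, edx, ecx).decode("ascii")

# Register values as documented for the two vendors discussed in this thread.
INTEL = vendor_string(0x756E6547, 0x49656E69, 0x6C65746E)
AMD = vendor_string(0x68747541, 0x69746E65, 0x444D4163)

print(INTEL)
print(AMD)
```

This is why the vendor string shows up in a binary as three separate 4-byte pieces rather than as one contiguous string.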
-
Naughty Simon!
Damn straight! ;D
-
I've installed the .NET Framework 1.1 on my laptop, but when I run the test tool, it still prompts me that it requires the .NET Framework 2.0.50727.
>:(
-
If someone were very enterprising...and curious about such things...
They could observe the 2.0 crunchers in action and note that the code checks for the specific presence of an Intel processor four times in each of the Intel-specific .exe files. The first 4 occurrences of 'Genu' 'ineI' 'ntel' in the source code are solely to limit code to Intel only. Any following occurrences are for my CPUID code when identifying the CPU. Just for curiosity's sake.
You might also notice that Simon neglected to compute an overall checksum over the entire EXE allowing naughty end users to potentially change the code and still have it run. Naughty Simon!
And if someone should try to run the potentially changed code on an Opteron 265 they would notice that the generic SSE2 client is faster than the potentially changed one. ;)
A.
-
Arnulf:
Are you sure about that?
My processor is only an X2 3800+ and has only 512 kB cache per core, where the Opteron has 1 MB, but according to my test the modified Intel-only core was faster on my AMD64.
And returned results seem to indicate the same.
Some results for comparison (notice the QxN versus QxB):
The Intel "only" core on my A64:
***CUT START***
CPU time 7928.3125
stderr out
<core_client_version>5.3.12.tx36</core_client_version>
<stderr_txt>
Optimized SETI@Home Enhanced application
Optimizers: Ben Herndon, Josef Segur, Alex Kan, Simon Zadra
Version: Windows SSE2 32-bit 'Ni!' based on seti V5.15 'Chicken Good!'
Rev: (R-2.0|QxN|FFT:IPP_SSE2|Ben-Joe)
CPUID: 'AMD K8 Athlon 64 X2 (Toledo)'
cpus: 1 cores: 2 threads: 1 cache: L1=64K L2=512K L3=0K
features: mmx 3Dnow 3Dnow+ sse sse2 sse3
speed: 2564 MHz -- read megs/sec: L1=14068, L2=7206, RAM=3067
Work Unit Info
True angle range: 0.421187
Spikes Pulses Triplets Gaussians Flops
0 2 2 4 16726219691776
</stderr_txt>
Validate state Valid
Claimed credit 64.8528194067711
***CUT END***
The generic client on my A64:
***CUT START***
CPU time 8045.375
stderr out
<core_client_version>5.3.12.tx36</core_client_version>
<stderr_txt>
Optimized Windows SETI@Home Enhanced application
Optimizers: Ben Herndon, Josef Segur, Alex Kan, Simon Zadra
Version: Windows SSE2 32-bit based on seti V5.15 'Chicken Good!'
Rev: (R-2.0|QxB|FFT:IPP_SSE2|Ben-Joe|)
CPUID: 'AMD K8 Athlon 64 X2 (Toledo)'
cpus: 1 cores: 2 threads: 1 cache: L1=64K L2=512K L3=0K
features: mmx 3Dnow 3Dnow+ sse sse2 sse3
speed: 2564 MHz -- read megs/sec: L1=14063, L2=7058, RAM=3159
Work Unit Info
True angle range: 0.426463
Spikes Pulses Triplets Gaussians Flops
2 2 4 2 16092869927990
</stderr_txt>
Validate state Valid
Claimed credit 62.3971229846839
***CUT END***
Of course nothing is proven by displaying two results, but with the results my system has finished since the patch, the Intel client seems about 100 to 200 secs faster on 60+ credit WUs. And a _little_ faster on other WUs, but of course not as much, since they have shorter runtimes.
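Since the two work units above are not the same size (the Flops figures differ), one rough way to compare them is to divide each Flops count by its CPU time. This is only a sketch, and it assumes that the Flops column is a fair measure of the work done in each WU; the function name is made up for illustration and the numbers are copied from the two results above.

```python
def flops_per_second(flops: int, cpu_seconds: float) -> float:
    """Throughput of one finished work unit."""
    return flops / cpu_seconds

# Values taken from the two results posted above.
intel_only = flops_per_second(16726219691776, 7928.3125)  # Rev ...|QxN|...
generic = flops_per_second(16092869927990, 8045.375)      # Rev ...|QxB|...

# Relative throughput advantage of the Intel "only" build, in percent.
speedup_pct = (intel_only / generic - 1.0) * 100.0

print(f"Intel only: {intel_only / 1e9:.3f} GFLOP/s")
print(f"generic:    {generic / 1e9:.3f} GFLOP/s")
print(f"difference: {speedup_pct:.1f} %")
```

Two results with different angle ranges are still far too few to prove anything, so treat the figure as an indication only.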
-
Some food for thought on differences between Intel chip revisions: the A1 was bought about 4 months before the B2 one. Both are on the same MB (Abit AB9 Pro), with the same BIOS version and the same memory modules at the same settings.
Each test was run 3x and then averaged.
C2D E6600 Conroe B2 revision/step
Medium
Testing setiathome-kwsn-ssse3-c2-v141.exe...ran for 124 seconds
Long
Testing setiathome-kwsn-ssse3-c2-v141.exe...ran for 226 seconds
C2D E6600 Conroe A1 revision/step
Medium
Testing setiathome-kwsn-ssse3-c2-v141.exe...ran for 135 seconds
Long
Testing setiathome-kwsn-ssse3-c2-v141.exe...ran for 238 seconds
That's a serious performance increase from one Intel stepping to another; allowing for fluctuations, that's still a gain of nearly 8%. :)
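The percentages can be checked from the four averaged run times quoted above. A small sketch, expressing the gain as run time saved relative to the slower A1 stepping (the function name is made up for illustration):

```python
def time_saved_pct(slow_seconds: float, fast_seconds: float) -> float:
    """Run time saved by the faster chip, as a percentage of the slower run."""
    return (slow_seconds - fast_seconds) / slow_seconds * 100.0

# A1 vs B2 averages from the post above.
medium = time_saved_pct(135, 124)
long_run = time_saved_pct(238, 226)

print(f"Medium: {medium:.1f} % less run time on B2")
print(f"Long:   {long_run:.1f} % less run time on B2")
```

By this measure the gap is larger on the "medium" test than on the "long" one.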
-
Hi KarVi!
This is clipped from one of my results, and one thing that springs to mind is that I'm running the 5.5.0 app while you are running the .tx36 calibrating app - I don't know if that may be the cause?
As you can see below, the app chooses various approaches for the different parts of the analysis.
I patched the Intel versions and then renamed them to replace the generic versions,
then I compared the different runs in the "KWSN - CPU Test & Benchmark Tool V2".
The generic SSE2 was the fastest on my system in all three versions of the test.
But I will re-run the tests just to be sure and report back to you! :P
Update!
Having re-run the tests with potentially changed clients, I have to agree with you: the Intel one is the fastest.
And I'm changing to the fastest one now. .... :D
--------------------------------------------------------------------------------------------------------------------
Starting tests. This will take a few minutes, please be patient!
Testing setiathome-kwsn-ssse3-c2-v141.exe...does not work on your system!
Testing SaH_5.15_KWSN_SSE3_Ben-Joe_2.0_B.exe...ran for 624 seconds
Testing SaH_5.15_KWSN_SSE2-Intel_Ben-Joe_2.0_B.exe...ran for 606 seconds
Testing SaH_5.15_KWSN_SSE2-PM_Ben-Joe_2.0_B.exe...ran for 628 seconds
Testing SaH_5.15_KWSN_SSE2_generic_Ben-Joe_2.0_B.exe...ran for 630 seconds
Skipping other apps - SSE2 is quicker than SSE if supported.
Finished with test run!
--------------------------------------------------------------------------------------------------------------------------
Arnulf
-------------------------------------------------------------------------------------------------------------------------
<core_client_version>5.5.0</core_client_version>
<stderr_txt>
ChirpData--[ak's_sse3_chirp]: 10199721 (chosen)
GetPeak--[hand_opt]: 11601 (chosen)
f_sum--[hand_sse]: 37522 (chosen)
GetChiSq--[hoisted+abs(]: 40813 (chosen)
IPP FFT SSE2(64K)[original]: 4788086 (chosen)
Bench Time: 0.42 seconds
work_len=1048576
Optimized Windows SETI@Home Enhanced application
Optimizers: Ben Herndon, Josef Segur, Alex Kan, Simon Zadra
Version: Windows SSE2 32-bit based on seti V5.15 'Chicken Good!'
Rev: (R-2.0|QxB|FFT:IPP_SSE2|Ben-Joe|)
CPUID: 'AMD K8 Opteron DC 2 (Italy)'
cpus: 2 cores: 2 threads: 1 cache: L1=64K L2=1024K L3=0K
features: mmx 3Dnow 3Dnow+ sse sse2 sse3
speed: 1799 MHz -- read megs/sec: L1=9868, L2=4944, RAM=2325
Work Unit Info
True angle range: 0.624283
Spikes Pulses Triplets Gaussians Flops
1 1 2 0 12471831233146
</stderr_txt>
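For anyone comparing several of these benchmark logs by hand, the "ran for N seconds" lines are easy to pull apart with a few lines of script. A minimal sketch, assuming the log format stays exactly as pasted in the post above; the function and variable names are made up and this is not part of the KWSN tool.

```python
import re

# Log lines copied from the benchmark output above.
LOG = """\
Testing setiathome-kwsn-ssse3-c2-v141.exe...does not work on your system!
Testing SaH_5.15_KWSN_SSE3_Ben-Joe_2.0_B.exe...ran for 624 seconds
Testing SaH_5.15_KWSN_SSE2-Intel_Ben-Joe_2.0_B.exe...ran for 606 seconds
Testing SaH_5.15_KWSN_SSE2-PM_Ben-Joe_2.0_B.exe...ran for 628 seconds
Testing SaH_5.15_KWSN_SSE2_generic_Ben-Joe_2.0_B.exe...ran for 630 seconds
"""

# App name, three literal dots, then the run time in whole seconds.
LINE = re.compile(r"^Testing (\S+?)\.\.\.ran for (\d+) seconds$")

def parse_runs(log: str) -> dict:
    """Map each app that completed to its run time in seconds.
    Apps that did not run ("does not work on your system!") are skipped."""
    runs = {}
    for line in log.splitlines():
        match = LINE.match(line.strip())
        if match:
            runs[match.group(1)] = int(match.group(2))
    return runs

runs = parse_runs(LOG)
fastest = min(runs, key=runs.get)

# List apps from fastest to slowest with the gap to the fastest one.
for app, seconds in sorted(runs.items(), key=lambda item: item[1]):
    print(f"{seconds:4d} s  (+{seconds - runs[fastest]:2d} s)  {app}")
```

With a list like this, close results stand out at a glance, which is when re-running on a longer setting is worth it.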
-
Arnulf:
Although I'm using the tx36 calibrating client, all calibration is turned off. The only feature I use is its ability to set processor affinity, so that each cruncher stays on its own CPU core (normally they run on both to varying degrees). According to what I've read, it's a little faster to set the affinity; the gains should be about 1-2%.
Every little bit counts.
Glad to see you came to the same conclusion, that the Intel "only" rev is the fastest.
Intel claims that the Intel "only" files will use special hidden registers that are only present in Intel chips, but still they run flawlessly on AMD systems. I find it extremely bad behaviour to restrict the competition in such a way, although Intel is probably within their rights to do it (who knows).
It seems that the larger cache in your Opteron benefits even more than my X2 3800+ does. My difference was only 8 secs, yours is 24 secs, and even SSE3 is faster for you, which it is not on my system.
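On the affinity point: pinning a process to a core is normally expressed as a bitmask with one bit per logical CPU, which is the kind of value the Win32 call SetProcessAffinityMask takes. The sketch below only shows how such masks are built for a dual-core machine. How the tx36 client sets affinity internally is not known here, and the helper name is made up for illustration.

```python
def affinity_mask(cores) -> int:
    """Build an affinity bitmask: bit N set means the process may run on core N."""
    mask = 0
    for core in cores:
        if core < 0:
            raise ValueError("core index must be non-negative")
        mask |= 1 << core
    return mask

# One cruncher per core on a dual-core CPU: core 0 and core 1.
per_cruncher = [affinity_mask([core]) for core in range(2)]

# For comparison: a mask allowing both cores, i.e. no pinning on a dual core.
both_cores = affinity_mask([0, 1])

print([bin(mask) for mask in per_cruncher])
print(bin(both_cores))
```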
-
About the ICCpatched exes - it's funny, in some tests they're quicker, in some they're not. My opinion on the issue is still not set in stone.
[...]
That's a serious performance increase from one Intel stepping to another; allowing for fluctuations, that's still a gain of nearly 8%. :)
Nice, didn't know that the different steppings were *that* much apart in performance...
Wonder what my Woodcrest is, will have to check with CPU-Z.
Thanks,
Simon.
-
Simon:
On my machine the results are conclusive, and repeatable.
Every time I've run the test, the Intel "only" version was faster.
Of course things can change with other WUs, different numbers of spikes, gaussians, triplets, the angle range, you name it.
But until anything else is proven, I stick with what my tests tell me.
-
KarVi,
I think it's also pretty model-dependent. For example, on more recent (X2+) AMD CPUs, it seems the Intel-only patched apps are quicker. On my A64 3500+ single core (S939, 512 cache), last time I tested they were slower.
So it's not really such a clear picture for everybody, but like you said: go with what your tests tell you.
I'm currently running some more tests on that A64 system and will put the app on BOINC for a few days to compare run times.
<edit>
Short: same runtime
Medium: SSE2-generic was 6 seconds faster
Long: SSE2-generic was 10 seconds faster
Guess I won't be putting it on BOINC then ;) Like I said, very model-dependent...
</edit>
Regards,
Simon.
-
Simon:
What rev. is your A64?
I seem to recall something about AMD making some small optimizations to the core when they added SSE3 support, and that they targeted SIMD and FPU execution in the optimization.
Perhaps this makes a difference? Of course, if your A64 has SSE3, then that's not the case :-)
But if it is the case, recommendations could be narrowed down: if the chip supports SSE3, then the Intel "only" version is recommended (for anyone who is able to patch it).
-
Patch? --- Patch?...what is this Patch thing you refer to Sir? ::)
-
It's possible to modify the Intel "only" versions of the applications so that they can run on AMD chips, often even faster than the generic SSE2 version.
But this requires that you know specifically what to change, or that you have a little patch program or script which changes the code for you.
That's the patch I'm talking about.
-
Edited away because of stupidity of me.. :-[ Sorry
-
Vyper,
thanks...but, you might have gone to page 1 of this thread, where I posted the URL (http://lunatics.at/index.php/topic,125.msg1760.html#msg1760) already.
Anyway, again (not meant personally, goes for everyone) -
We're aware of the situation, we have the relevant information, we've tested things extensively. We're however also slowly getting a bit exasperated with all the hubbub about patching or not.
My final words: I'm not going to put up patched executables, but in no way am I keeping you from doing what you have to.
Now, please, can we get back to more interesting topics? Also, please can everyone make sure to read the WHOLE thread? ;)
Regards,
Simon.
-
Hi,
I've run the v2 benchmark on a dual Intel PIII 1 GHz (MMX- and SSE-capable) system w/ Win2000, and the unsupported apps (that is, the SSE2 and SSE3 apps) are causing the benchmark application to crash.
As a simple workaround I've replaced the unsupported apps with an empty text file renamed to the .exe filenames (while I was at it, I've also replaced one of the unsupported apps with the standard Berkeley client for comparison ;D ).
As a suggestion: would it be possible to include a button in the next version of your benchmark tool to disable one or more of the apps before starting the benchmark run?
Anyway, thx a lot for the tool. 8)
Regards
Alex
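An alternative to a manual disable button would be to skip apps whose instruction-set requirements the CPU does not meet. The following is a rough sketch of that idea only, not how the benchmark tool works. The mapping from file name to required features is an assumption based on the file names, and `example_sse_app.exe` is a made-up placeholder because the SSE app names are not listed in this thread.

```python
def parse_features(line: str) -> set:
    """Turn a 'features: mmx sse sse2 ...' line from stderr into a set of flags."""
    return set(line.split(":", 1)[1].split())

# ASSUMED requirements, guessed from the file names; not taken from the tool.
REQUIRED = {
    "setiathome-kwsn-ssse3-c2-v141.exe": {"ssse3"},
    "SaH_5.15_KWSN_SSE3_Ben-Joe_2.0_B.exe": {"sse3"},
    "SaH_5.15_KWSN_SSE2_generic_Ben-Joe_2.0_B.exe": {"sse2"},
    "example_sse_app.exe": {"sse"},  # hypothetical placeholder name
}

def runnable(features: set) -> list:
    """Return the apps whose required features are all present on this CPU."""
    return [app for app, needed in REQUIRED.items() if needed <= features]

# The dual PIII described above supports MMX and SSE only.
piii = {"mmx", "sse"}
# Feature line as printed for the AMD systems earlier in the thread.
amd = parse_features("features: mmx 3Dnow 3Dnow+ sse sse2 sse3")

print(runnable(piii))
print(runnable(amd))
```

A check like this would let the benchmark skip the SSE2 and SSE3 apps on a PIII instead of launching them.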
-
I would just like to add that the potentially modified SSE2_Intel app has taken the RAC of my Opteron rig from 1250 to 1550 over the last month.
Arnulf ;D