Seti@Home optimized science apps and information
Optimized Seti@Home apps => Windows => Topic started by: Raistmer on 18 Feb 2009, 05:24:29 am
-
to test performance of AK_v8b SSE3 x64 build.
If you have an SSE3-only Intel CPU (that is, a P4, not Core 2 or Core), an x64 Windows OS, and are willing to run some performance tests, please send me a PM.
-
I can do so!
Machine type:
DELL Precision 690
Dual XEON X5355
16GB FBDIMM
Windows VISTA Business x64
BOINC Manager V6.6.0 x64
Current SETI clients:
ap_5.00r103_SSE3
MB_6.08_mod_VLAR_kill_CUDA
Other Projects:
Einstein@home
Rosetta
Do you need any other information?
[edit]
I see, you need a CPU that is NOT Core or Core 2!
The X5355 is based on Core 2...
My mistake!
-
I attached test case to this post.
Please extract the archive and run the Knabench151.cmd file. If BOINC is installed as a service, it will be stopped automatically and restarted after the test. Otherwise, please don't forget to stop BOINC yourself.
After the run completes, post the txt file from the TestDatas directory (as an attachment).
[attachment deleted by admin]
-
Finished on the Phenom II 940, see attached file. Hope it helps. Going to do my X2 now.
[attachment deleted by admin]
-
Quick timetable
WU : PG0009.wu
AK_v8b_win_SSE3_AMD.exe : 362.125 secs CPU
AK_v8b_win_SSE3_AMD.exe : 361.938 secs CPU
Speedup : 0.05%
Ratio : 1.00 x
AK_v8b_win_x64_SSE3.exe : 346.063 secs CPU
Speedup : 4.44%
Ratio : 1.05 x
WU : PG0395.wu
AK_v8b_win_SSE3_AMD.exe : 367.156 secs CPU
AK_v8b_win_SSE3_AMD.exe : 367.188 secs CPU
Speedup : -0.01%
Ratio : 1.00 x
AK_v8b_win_x64_SSE3.exe : 359.422 secs CPU
Speedup : 2.11%
Ratio : 1.02 x
WU : PG0444.wu
AK_v8b_win_SSE3_AMD.exe : 300.484 secs CPU
AK_v8b_win_SSE3_AMD.exe : 301.219 secs CPU
Speedup : -0.24%
Ratio : 1.00 x
AK_v8b_win_x64_SSE3.exe : 304.750 secs CPU
Speedup : -1.42%
Ratio : 0.99 x
WU : PG1327.wu
AK_v8b_win_SSE3_AMD.exe : 243.000 secs CPU
AK_v8b_win_SSE3_AMD.exe : 241.984 secs CPU
Speedup : 0.42%
Ratio : 1.00 x
AK_v8b_win_x64_SSE3.exe : 273.828 secs CPU
Speedup : -12.69%
Ratio : 0.89 x
The same picture: a big performance drop on VHAR...
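For anyone reading the timetable: the Speedup and Ratio lines appear to be computed relative to the first app listed for each WU. A minimal Python sketch that reproduces the PG1327.wu figures above, assuming that convention (the function names are made up for illustration and are not part of the benchmark script):

```python
def speedup_percent(reference_secs, candidate_secs):
    """Percentage of CPU time saved relative to the reference app.

    Positive means the candidate is faster, negative means slower.
    """
    return (reference_secs - candidate_secs) / reference_secs * 100.0


def ratio(reference_secs, candidate_secs):
    """How many times faster the candidate is than the reference."""
    return reference_secs / candidate_secs


# PG1327.wu on the Phenom II 940, values taken from the timetable above.
reference = 243.000   # AK_v8b_win_SSE3_AMD.exe, first run
candidate = 273.828   # AK_v8b_win_x64_SSE3.exe

print(f"Speedup : {speedup_percent(reference, candidate):.2f}%")
print(f"Ratio : {ratio(reference, candidate):.2f} x")
```

A negative speedup with a ratio below 1.00 means the x64 build needed more CPU time than the reference on that WU.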
-
I'll get my AMD X2 results sent in tonight when I get home from work.
-
Here are my results.
[attachment deleted by admin]
-
LoL, and that one shows VHAR Boost :o
-
And I did not stop my normal work!
Only BOINC was closed.
No idea why VHAR gets a boost on the XEON...
-
That's weird... The same app and >3% difference...
WU : PG0444.wu
AK_v8b_win_SSE3_AMD.exe : 367.211 secs CPU
AK_v8b_win_SSE3_AMD.exe : 378.614 secs CPU
Speedup : -3.11%
Ratio : 0.97 x
-
I saw that ... I suspect those dodgy OS delayed writes again. I may be wrong about that, but haven't thought of a completely foolproof way to eliminate that variation ... perhaps a custom copy command that issues a flush before close of the destination file ?
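A rough sketch of what such a copy step could look like, in Python rather than as a real replacement for the copy used by the bench script. The function name and chunk size are assumptions for illustration only; the idea is just to flush and sync the destination before it is closed, so pending writes are not left to the OS to finish during the next run:

```python
import os
import shutil


def copy_with_flush(src_path, dst_path, chunk_size=1024 * 1024):
    """Copy src_path to dst_path and force the data to disk before closing.

    Hypothetical helper: flush() empties Python's own buffer, and
    os.fsync() asks the OS to write its cached data for this file out.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst, chunk_size)
        dst.flush()
        os.fsync(dst.fileno())
```

Whether this removes the variation is untested here; it only rules out delayed writes of the copied file as one possible cause.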
-
Well, it's CPU time now, not elapsed time...
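To illustrate the distinction with a small Python sketch (this measures the Python process itself, not a child exe as the bench does, and the helper name is made up for the example): time spent waiting, for instance on disk I/O or in a sleep, shows up in elapsed time but hardly at all in CPU time.

```python
import time


def measure(work):
    """Run work() and return (cpu_seconds, elapsed_seconds) for this process."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    work()
    cpu_secs = time.process_time() - cpu_start
    wall_secs = time.perf_counter() - wall_start
    return cpu_secs, wall_secs


# Sleeping stands in for waiting on the disk: it costs elapsed time, not CPU time.
cpu_secs, wall_secs = measure(lambda: time.sleep(0.3))
print(f"CPU {cpu_secs:.3f} s, elapsed {wall_secs:.3f} s")
```

So a pure waiting effect such as delayed writes would be expected to inflate elapsed time far more than the CPU seconds the timetable reports.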
-
Hmmm, not even wisdom generation to consider ... I wonder. I will think about some possible tests to see whether I can reproduce such a discrepancy on my system.
-
And here's my X2 data.
[attachment deleted by admin]
-
Ok, thanks.
All AMD results confirm that this SSE3 x64 build is not for AMD hosts: it is too slow on VHARs.
-
Here are my results :)
Hope it's meaningful info, Raistmer.
From watching the cmd window, it looks like the 64-bit build gave a bit of a boost.
[attachment deleted by admin]
-
Thanks!
WU : PG0009.wu
AK_v8b_win_SSE3_AMD.exe : 677.688 secs CPU
AK_v8b_win_SSE3_AMD.exe : 677.328 secs CPU
Speedup : 0.05%
Ratio : 1.00 x
AK_v8b_win_x64_SSE3.exe : 635.703 secs CPU
Speedup : 6.20%
Ratio : 1.07 x
WU : PG0395.wu
AK_v8b_win_SSE3_AMD.exe : 642.125 secs CPU
AK_v8b_win_SSE3_AMD.exe : 641.719 secs CPU
Speedup : 0.06%
Ratio : 1.00 x
AK_v8b_win_x64_SSE3.exe : 578.188 secs CPU
Speedup : 9.96%
Ratio : 1.11 x
WU : PG0444.wu
AK_v8b_win_SSE3_AMD.exe : 518.734 secs CPU
AK_v8b_win_SSE3_AMD.exe : 519.922 secs CPU
Speedup : -0.23%
Ratio : 1.00 x
AK_v8b_win_x64_SSE3.exe : 471.625 secs CPU
Speedup : 9.08%
Ratio : 1.10 x
WU : PG1327.wu
AK_v8b_win_SSE3_AMD.exe : 406.172 secs CPU
AK_v8b_win_SSE3_AMD.exe : 404.938 secs CPU
Speedup : 0.30%
Ratio : 1.00 x
AK_v8b_win_x64_SSE3.exe : 358.891 secs CPU
Speedup : 11.64%
Ratio : 1.13 x
That test package was mostly for AMD; I will prepare another one with more P4-oriented apps. It seems this SSE3 x64 build can be competitive on P4.
-
@Mike O
Could you also run the bench attached to this post?
It contains all SSE3 apps suitable for an SSE3 P4 with an x64 Windows OS.
[attachment deleted by admin]
-
Sure. Sorry for the delay, lots of hours at work.
Will post the results later today :)
-
here is the intel 6 sses results.
[attachment deleted by admin]
-
Thanx a lot!
Quick timetable
WU : PG0009.wu
AK_v8_win_x64_SSE3.exe : 657.781 secs CPU
AK_v8_win_SSE3.exe : 688.578 secs CPU
Speedup : -4.68%
Ratio : 0.96 x
AK_v8_win_x64_SSE3.exe : 644.281 secs CPU
Speedup : 2.05%
Ratio : 1.02 x
AK_v8b_win_SSE3.exe : 664.688 secs CPU
Speedup : -1.05%
Ratio : 0.99 x
AK_v8b_win_x64_SSE3.exe : 621.875 secs CPU
Speedup : 5.46%
Ratio : 1.06 x
WU : PG0395.wu
AK_v8_win_x64_SSE3.exe : 571.938 secs CPU
AK_v8_win_SSE3.exe : 620.547 secs CPU
Speedup : -8.50%
Ratio : 0.92 x
AK_v8_win_x64_SSE3.exe : 571.938 secs CPU
Speedup : 0.00%
Ratio : 1.00 x
AK_v8b_win_SSE3.exe : 619.797 secs CPU
Speedup : -8.37%
Ratio : 0.92 x
AK_v8b_win_x64_SSE3.exe : 575.672 secs CPU
Speedup : -0.65%
Ratio : 0.99 x
WU : PG0444.wu
AK_v8_win_x64_SSE3.exe : 462.797 secs CPU
AK_v8_win_SSE3.exe : 499.047 secs CPU
Speedup : -7.83%
Ratio : 0.93 x
AK_v8_win_x64_SSE3.exe : 462.953 secs CPU
Speedup : -0.03%
Ratio : 1.00 x
AK_v8b_win_SSE3.exe : 497.063 secs CPU
Speedup : -7.40%
Ratio : 0.93 x
AK_v8b_win_x64_SSE3.exe : 465.938 secs CPU
Speedup : -0.68%
Ratio : 0.99 x
WU : PG1327.wu
AK_v8_win_x64_SSE3.exe : 354.813 secs CPU
AK_v8_win_SSE3.exe : 370.125 secs CPU
Speedup : -4.32%
Ratio : 0.96 x
AK_v8_win_x64_SSE3.exe : 354.688 secs CPU
Speedup : 0.04%
Ratio : 1.00 x
AK_v8b_win_SSE3.exe : 370.188 secs CPU
Speedup : -4.33%
Ratio : 0.96 x
AK_v8b_win_x64_SSE3.exe : 357.859 secs CPU
Speedup : -0.86%
Ratio : 0.99 x
So, the x64 SSE3 build is faster than the x86 one on Intel CPUs, and V8b is a viable build too.
Is it possible to find a host with an Intel SSE3-only CPU and a CUDA-capable GPU card? I.e., can these CPUs be installed in a motherboard with a PCI-Express slot?
-
Yes, but only x86 is *likely*, though x64 is possible. My P4 SSE3 is running the CUDA app now, installed with the installer, with a 9600GSO. Putting x64 on that one would be a waste, because the board will take a Q6600, which would be a better upgrade and would then make it worth it ... but it would no longer fit the SSE3-only profile.
-
Ok, thanks.
There could be obsolete P4 SSE3 hosts with an x64 OS, for example if a corporation has corporate Windows licenses or MSDNAA but no money to upgrade hardware. A very common case for me indeed ;D
-
@Mike O
Could you run one more test case, please?
This one has an Intel-specific build that may be slightly faster (I hope) than the generic Intel/AMD one.
[attachment deleted by admin]
-
OK. One down. The first of the Irwindales.
[attachment deleted by admin]
-
The second of the Irwindales...
[attachment deleted by admin]
-
And lastly the Dempsey...
[attachment deleted by admin]
-
OK. Those three were with the AMD version I found earliest in the thread. Now I've grabbed the later Intel specific version and am launching those runs.
-
Ok, all 3 tests show that Xeons don't like the AMD build. The second Xeon core is called Nocona. What is the difference between Nocona and Irwindale?
-
Not a lot, but it could make some difference. I believe Irwindale has twice the L2 cache of Nocona, plus some attempts to reduce idle power usage.
(I prefer Nescafe Blend 43 over Nocona anyway :P)
-
Here's the Dempsey with all 4 Intel test apps.
[attachment deleted by admin]
-
Here's Irwindale #2 again with all 4 Intel apps.
[attachment deleted by admin]
-
And lastly the other Irwindale.
[attachment deleted by admin]
-
OK. Actually, CPU-Z reports that the Irwindale has double the L2 cache of the Nocona. But to complicate matters, you reminded me that one of the systems actually has one Irwindale and one Nocona CPU in it. It was upgraded to dual processor last fall and HP sent the Nocona as the appropriate upgrade. Good luck identifying which of the processors actually ran the test. :-)
I could pull the Irwindale later today and get a positive test on just the Nocona if it would help.
-
Good luck identifying which of the processors actually ran the test. :-)
If you didn't use an affinity lock, both could have participated. CPU-Z just queried one of them.
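As a sketch of what an affinity lock could look like for a manual run: the Windows `start` command accepts a hexadecimal `/affinity` mask on systems where that switch is supported. The Python helpers below only build the mask and the command line; their names are made up for illustration, and which logical CPU index belongs to which physical socket is an assumption you would have to check on the actual machine.

```python
def affinity_mask(cpu_indices):
    """Return the hex mask (as used by `start /affinity`) for logical CPU indices."""
    mask = 0
    for index in cpu_indices:
        if index < 0:
            raise ValueError("CPU index must be non-negative")
        mask |= 1 << index  # bit N set means logical CPU N is allowed
    return format(mask, "X")


def pinned_command(exe_name, cpu_indices):
    """Build a cmd.exe line that runs exe_name restricted to the given CPUs."""
    return f'start "" /wait /affinity {affinity_mask(cpu_indices)} {exe_name}'


# Restrict one bench app to logical CPU 1 only (the index is an arbitrary example).
print(pinned_command("AK_v8b_win_x64_SSE3.exe", [1]))
```

With the process pinned like this, a run can be attributed to one processor instead of whichever one the scheduler happened to pick.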
-
Ok, thanks.
There is no speedup from the Intel-specific build on these XEONs.
But both x64 AK_v8b builds can be treated as the optimal ones for the moment.
-
You got it :) I will run this test now. Sorry I'm behind here. I work a lot, too much it seems lately.
I am doing my best to let these run with NOTHING else running: all apps shut down, including BOINC and the core clients.
Will post tomorrow.
-
Here ya go :)
I have a P4 640, 3.2 GHz, 2M/800 FSB. It's LGA775, and the mobo has a PCI-E slot with a 9800 GT, 1 GB RAM.
I have a 3.4 GHz Pentium D dual-core on its way to replace this CPU. It too supports EM64T (Intel 64), as the 640 does.
[attachment deleted by admin]
-
Ok, thanks. It seems Prescott doesn't see any difference between the Intel-specific and the generic build.
-
Thanks for your work on these apps and the CUDA CPU/GPU team mod. A very nice job on that. Hope some more of the bugs get fixed, but even now I'm liking the results.
So my Prescott isn't any faster running the 64-bit build? That really doesn't surprise me. This chip has just enough 64-bit support to get by. I'm pretty sure the later CPUs are much better at dealing with 64-bit than this old P4. Maybe the Pentium D will score better.
When it's installed, I will re-run the tests.
-
Why? It is faster with the x64 app. It just didn't see a difference between the 2 x64 apps. But x64 SSE3 is the best for it.
-
My guess is that a better kernel driver is probably part of it (no legacy code or pointer address space conversion to go through either). Win32 drivers have a lot of legacy code in them to handle many diverse setups. 64-bit drivers can generally be leaner, especially WDM ones if available, and can assume the availability of SSE & SSE2, which means fast streaming loads, cacheability instructions and non-temporal stores can be used.
-
So.. is there a team V8 64 bit for only SSES? One that works with CPU/GPU mod you did?
Also.. Is there a 64bit for the SSSES CPUs that will work with the team mod?
-
For normal AK_v8 (CPU), 64-bit is SSE3 or higher only (because of library limitations; no Intel 64-bit SSE2-only CPUs were ever made, AFAIK).
For Team mod (CPU+GPU), that requires the special builds of Raistmer's to function, so you'd have to check Raistmer's thread on his latest packages.
Jason
-
The question is who needs such builds ;)
What host config do you have that needs an SSE-only build for use with a CUDA-enabled GPU?? I thought such configs just don't exist.
There is already V10a for SSE3 x64 and SSSE3 x64; see the corresponding thread.