Topic: Some performance comparison between x86 and x64 Windows-based apps

Raistmer

Hi all  :)
I did some tests with WU-1 from the reference WUs (01mr99ab.14893.2848.703400.3.151) and got the following table:
App                           Build  OS            Time (h:mm:ss)
KWSN_2.4_SSE2-AMD_MB          x86    Win2003 x86   0:06:22
KWSN_2.4_SSE_MB               x86    Win2003 x86   0:07:04
KWSN_2.4_MMX_MB               x86    Win2003 x86   0:08:38
KWSN_2.4_SSE_MB               x86    Win2003 x64   0:07:02
KWSN_2.4_MMX_MB               x86    Win2003 x64   0:08:38
KWSN_2.4_SSE2-AMD_MB          x86    Win2003 x64   0:06:11
KWSN_2.4_SSE2_IPP_Ben-Joe     x64    Win2003 x64   0:08:16
KWSN_2.4_SSE2_IPP_Ben-Joe     x64    Win2003 x64   0:08:14

It seems that the 32-bit apps run slightly faster under Win2003 x64 than under the x86 edition, but the 64-bit SETI app is slower than the best 32-bit one, at least on my AMD Athlon 64 3200+. The best result was achieved with the 32-bit SSE2 version under the Win2003 64-bit edition.
Has anyone made such comparisons on other hardware? Please post your results and comments here.
« Last Edit: 16 Aug 2007, 12:18:43 pm by Raistmer »
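To put numbers on "slightly faster" and "slower", the h:mm:ss times in the table can be converted to seconds and compared. A minimal Python sketch; `to_seconds` is a hypothetical helper, not part of any SETI tool, and the percentages are simply relative time differences between two rows:

```python
def to_seconds(hms: str) -> int:
    """Convert an 'h:mm:ss' elapsed time, as used in the table, to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


# WU-1 times on the Athlon 64 3200+, copied from the table.
sse2_amd_on_x86 = to_seconds("0:06:22")  # 32-bit SSE2-AMD build, Win2003 x86
sse2_amd_on_x64 = to_seconds("0:06:11")  # same 32-bit build, Win2003 x64
ipp_x64_best = to_seconds("0:08:14")     # faster of the two 64-bit IPP runs

# Time saved by the same 32-bit build when moving from the x86 to the x64 OS.
os_gain = (sse2_amd_on_x86 - sse2_amd_on_x64) / sse2_amd_on_x86 * 100
# Extra time the 64-bit IPP build needs compared with the best 32-bit run.
ipp_extra = (ipp_x64_best - sse2_amd_on_x64) / sse2_amd_on_x64 * 100

print(f"Same 32-bit build, x64 OS vs x86 OS: {os_gain:.1f}% less time")        # 2.9%
print(f"64-bit IPP build vs best 32-bit build: {ipp_extra:.1f}% more time")    # 33.2%
```

So the OS effect is a few percent, while the 64-bit IPP build on this Athlon costs about a third more time.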

Crunch3r

IPP for EM64T was never very well optimized for SSE2. If you have an Intel with SSE3, results will look different ;)


I want to share something with you: The three little sentences that will get you through life. Number 1: Cover for me. Number 2: Oh, good idea, Boss! Number 3: It was like that when I got here.

Homer Simpson

Raistmer

Well, I will try to test a Core 2 Duo soon :)
BTW, does that mean the situation will be better on an AMD Athlon 64 with SSE3 support too?
« Last Edit: 16 Aug 2007, 12:19:23 pm by Raistmer »

Raistmer

Additional data
« Reply #3 on: 16 Aug 2007, 07:47:03 am »
Well, here is a table for a Core 2 Duo 6420:
CPUID:
 Intel(R) Core(TM)2 CPU          6420  @ 2.13GHz
     Speed: 2 x 2128 MHz
     Cache: L1=64K L2=4096K
  Features: MMX SSE SSE2 SSE3 x86_64

App                           Build  OS            Time (h:mm:ss)
KWSN_2.4_SSE3-Core2_MB        x86    Win2003 x64   0:03:57
KWSN_2.4_SSE3-Core2_MB        x86    Win2003 x64   0:03:57
KWSN_2.4_SSE3-Intel-P4_MB     x86    Win2003 x64   0:04:03
KWSN_2.4_SSE3-Intel-P4_MB     x86    Win2003 x64   0:04:02
KWSN_2.4_SSE2-Intel-PM_MB     x86    Win2003 x64   0:04:00
KWSN_2.4_SSE2-Intel-P4_MB     x86    Win2003 x64   0:03:58
KWSN_2.4_SSE2-Intel-P4_MB     x86    Win2003 x64   0:04:00
KWSN_2.4_SSE2-AMD_MB          x86    Win2003 x64   0:03:58
KWSN_2.4_SSE_MB               x86    Win2003 x64   0:04:15
KWSN_2.4_MMX_MB               x86    Win2003 x64   0:06:07
KWSN_2.4_SSE2_IPP_Ben-Joe     x64    Win2003 x64   0:03:45
KWSN_2.4_SSE2_IPP_Ben-Joe     x64    Win2003 x64   0:03:45
KWSN_2.4_SSE3_IPP_Ben-Joe     x64    Win2003 x64   0:03:49
KWSN_2.4_SSE3_IPP_Ben-Joe     x64    Win2003 x64   0:03:50
KWSN_2.4_SSSE3_IPP_Ben-Joe    x64    Win2003 x64   0:03:43
KWSN_2.4_SSSE3_IPP_Ben-Joe    x64    Win2003 x64   0:03:44

Only a 64-bit OS here. The 64-bit app really does run better! But the difference between the SSE2 build and the best SSE3 build is within the error range, so there is no promised SSE3 gain ;)
(The same WU was used as in the first post.)
P.S. Two other test WUs show the same: the 64-bit SSE2 and the Conroe-optimized SSSE3 builds are the best ones (results attached).

[attachment deleted by admin]
« Last Edit: 16 Aug 2007, 12:42:24 pm by Raistmer »
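The "within the error range" remark can be made concrete by comparing the gap between two builds with the disagreement between repeated runs of the same build. A rough Python sketch, assuming the spread of the duplicated rows is a fair estimate of measurement noise (only a handful of repeats, so treat it as indicative); the short build labels are my own shorthand for the table rows:

```python
# Core 2 Duo 6420, WU-1, times in seconds converted from the table above.
runs = {
    "SSE3-Core2 (x86)": [237, 237],
    "SSE3-Intel-P4 (x86)": [243, 242],
    "SSE2-Intel-PM (x86)": [240],
    "SSE2-Intel-P4 (x86)": [238, 240],
    "SSE2-AMD (x86)": [238],
    "SSE (x86)": [255],
    "MMX (x86)": [367],
    "SSE2_IPP (x64)": [225, 225],
    "SSE3_IPP (x64)": [229, 230],
    "SSSE3_IPP (x64)": [223, 224],
}

# Best (shortest) time per build, then builds ordered fastest first.
best = {name: min(times) for name, times in runs.items()}
ranking = sorted(best, key=best.get)

# Largest disagreement between repeated runs of one build = rough noise level.
noise = max(max(times) - min(times) for times in runs.values())
gap = best["SSE2_IPP (x64)"] - best["SSSE3_IPP (x64)"]

for name in ranking[:3]:
    print(f"{name:20s} {best[name]} s")
print(f"repeat spread up to {noise} s, SSE2 vs SSSE3 gap {gap} s")
```

Here the best 64-bit SSE2 and SSSE3 runs differ by 2 s, which is no more than the largest spread between repeats of a single build (also 2 s).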

michael37

I am seeing very good performance with the Windows 2.4 SSSE3 64-bit app.

Code:
<stderr_txt>
Optimized SETI@Home Enhanced application
Optimizers: Ben Herndon, Josef Segur, Alex Kan, Simon Zadra
   Version: Windows SSSE3 64-bit based on S@H V5.15  'Noo? No - Ni!'
  Revision: R-2.4|xT|FFT:IPP_SSSE3|Ben-Joe
     CPUID: Intel(R) Xeon(R) CPU            5150  @ 2.66GHz
     Speed: 2 x 2659 MHz
     Cache: L1=64K L2=4096K
  Features: MMX SSE SSE2 SSE3 x86_64
 
Work Unit Info
True angle range:  0.406102

Spikes Pulses Triplets Gaussians Flops
   4      0       0        0     16402148931791

</stderr_txt>

Crunch3r

I've made some small changes to the Windows x64 apps today. The new builds should be a bit faster again.
We need to test them a bit, and after that we can release them.

 ;)
« Last Edit: 17 Aug 2007, 09:01:13 am by Crunch3r »

speedimic

And what about Windows compared to Linux?
Has anyone run any tests in that direction?

mic.

Crunch3r

Not that I'm aware of. (I can't run a reliable test myself because I only have 32/64-bit Linux in VMware.)

The only statement I can make is that the 64-bit Linux app runs faster than the 32-bit one on Intel.
The same goes for Windows x64 compared to Windows 32-bit on Intel (as you can see in Raistmer's post).

If you have the opportunity to dual-boot Windows/Linux (32 or 64 bit) and run a bench test, it would be interesting to see.

Keep in mind, though, that the Windows and Linux code is not identical in some ways.
For instance, Linux uses separate optimized SSE code for FP rounding, chirping, etc.
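For readers unfamiliar with the chirping step: the data are multiplied by a phase ramp that cancels a linear frequency drift, and one common way to keep that accurate is to wrap the accumulated phase to a fractional cycle (the FP rounding) before the sin/cos. A scalar Python sketch, assuming the textbook linear-chirp model (phase = 0.5 * r * t^2 cycles for a drift rate r in Hz/s); the function and parameter names are hypothetical, and this is not the actual Windows or Linux KWSN code, whose SSE paths would process several samples per instruction:

```python
import cmath
import math


def dechirp(samples, chirp_rate, sample_rate):
    """Remove a linear frequency drift (chirp_rate, Hz/s) from complex samples.

    Assumes the drift accumulates 0.5 * chirp_rate * t**2 cycles of phase by
    time t. Whole cycles do not matter, so only the fractional part is kept
    (the rounding step) before building the correcting phasor.
    """
    out = []
    for n, x in enumerate(samples):
        t = n / sample_rate
        cycles = 0.5 * chirp_rate * t * t
        frac = cycles - math.floor(cycles)  # wrap phase into [0, 1) cycle
        out.append(x * cmath.exp(-2j * math.pi * frac))
    return out


# Synthetic check: a pure chirp should collapse to a constant after dechirping.
rate, srate = 2.5, 100.0
chirped = [cmath.exp(2j * math.pi * 0.5 * rate * (n / srate) ** 2) for n in range(1000)]
flat = dechirp(chirped, rate, srate)
print(max(abs(z - 1) for z in flat))  # should be tiny (floating-point error only)
```

Wrapping to a fraction of a cycle keeps the trig argument small, which matters when t^2 grows large over a long work unit.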




Raistmer

Great news! I'll be waiting for the new 64-bit app :)
BTW, here is a "flawless victory" ;) for the 64-bit app on a real long WU. As one can see, both hosts are using an optimized application, and my Core2 Duo @ 2.13 GHz slightly outperforms a faster Core2 Quad CPU @ 2.40GHz:
http://setiathome.berkeley.edu/workunit.php?wuid=148428035
Both are running Windows.
A comparison of the highly optimized app under Windows and Linux would be very interesting. I don't have a modern Linux installation to run such tests right now; maybe someone else can perform that comparison?

To michael37: your listing is probably not for WU-1, because AFAIK WU-1 contains no spikes. Could you post the elapsed time for test WU-1 on your system (with the different 2.4 apps, or at least with the fastest one)?
« Last Edit: 18 Aug 2007, 02:59:29 pm by Raistmer »

michael37

You're right, this is just a random "production" workunit. My 64-bit Windows servers are remotely managed servers where BOINC runs only at night, so the best I can do is drop in a new application, go to sleep, wake up and see how the application did :)

I've got to turn my new Core 2 Duo laptop into the test computer. I have 64-bit Linux and 32-bit Windows Vista currently installed. I need to get a 64-bit Windows OS (anyone got a spare copy?) on it so I can test the full range of applications on the same hardware...

Crunch3r

Now the only thing left is for someone to do the math on how much faster that is... I'm too lazy to do it. Can anyone help me out?

« Last Edit: 19 Aug 2007, 08:30:28 pm by Crunch3r »

Raistmer

Some new data
« Reply #11 on: 26 Aug 2007, 08:05:59 am »
Comparison of the 2.4 versions from this site and from Crunch3r's site (http://calbe.dw70.de/seti.html)
SSE2 AMD version on AMD 64 Winchester, x86 mode (times in seconds):
KWSN_2.4_SSE2-AMD_MB.exe                 Win2003 x86   382
KWSN_2.4_MB_SSE2A (Crunch3r)             Win2003 x86   377
(about 1.3% speedup)

SSSE3 x64 on Core2 Duo, x64 mode (times in seconds):
KWSN_2.4_SSSE3_IPP_Ben-Joe               Win2003 x64   223
KWSN_2.4_SSSE3_IPP_Ben-Joe (Crunch3r)    Win2003 x64   229
(about 2.7% slowdown) :(
BTW, these two apps have identical filenames but different file sizes.

Quick timetable

WU : testWU-1.wu
KWSN_2.4_SSSE3_IPP_Ben-Joe.exe : 234 seconds
KWSN_2.4_MB_SSE2(Crunch3r).exe : 236 seconds
Speedup: -0.85%, Ratio: 0.99 x
KWSN_2.4_MB_SSE2A(Crunch3r).exe : 228 seconds
Speedup: 2.56%, Ratio: 1.03 x
KWSN_2.4_MB_SSE3(Crunch3r).exe : 227 seconds
Speedup: 2.99%, Ratio: 1.03 x
KWSN_2.4_MB_SSSE3(Crunch3r).exe : 232 seconds
Speedup: 0.85%, Ratio: 1.01 x
KWSN_2.4_SSE2_IPP_Ben-Joe(Crunch3r).exe : 237 seconds
Speedup: -1.28%, Ratio: 0.99 x
KWSN_2.4_SSE3_IPP_Ben-Joe(Crunch3r).exe : 246 seconds
Speedup: -5.13%, Ratio: 0.95 x
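The Speedup and Ratio columns in this timetable match speedup = (t_base - t) / t_base * 100 and ratio = t_base / t, with the first app listed as the baseline. That formula is inferred from the numbers (every row here and in the Athlon timetables agrees with it), not taken from the benchmark tool's documentation. A small Python sketch reproducing the table under that assumption:

```python
def speedup(baseline_s, candidate_s):
    """Return (speedup in %, ratio) of a candidate time versus the baseline.

    Positive speedup means the candidate finished faster than the baseline.
    """
    pct = (baseline_s - candidate_s) / baseline_s * 100
    ratio = baseline_s / candidate_s
    return pct, ratio


baseline = 234  # KWSN_2.4_SSSE3_IPP_Ben-Joe.exe, first line of the timetable
rows = [("MB_SSE2", 236), ("MB_SSE2A", 228), ("MB_SSE3", 227),
        ("MB_SSSE3", 232), ("SSE2_IPP", 237), ("SSE3_IPP", 246)]

for name, t in rows:
    pct, ratio = speedup(baseline, t)
    print(f"{name:9s} {t} s  Speedup: {pct:.2f}%, Ratio: {ratio:.2f} x")
```

The same function applied to the 382 s vs 377 s AMD comparison gives roughly a 1.3% speedup.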

Results for the same app with the same WU differ: 223 s and 234 s, each obtained many times, with no intermediate values.
But in a single run, the new x86 SSE3 variant seems to be the best one.
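With a bimodal result like this, one rough rule (my assumption, not something used in the thread) is to trust a difference only when it exceeds the spread observed for a single app. A Python sketch applying that rule to the quick timetable; `exceeds_noise` is a hypothetical helper:

```python
def exceeds_noise(baseline_s, candidate_s, noise_s):
    """True if the time difference is larger than the observed run-to-run spread."""
    return abs(baseline_s - candidate_s) > noise_s


baseline = 234     # KWSN_2.4_SSSE3_IPP_Ben-Joe.exe in the quick timetable
noise = 234 - 223  # the two repeatable outcomes for that same app and WU
times = {"MB_SSE2": 236, "MB_SSE2A": 228, "MB_SSE3": 227,
         "MB_SSSE3": 232, "SSE2_IPP": 237, "SSE3_IPP": 246}

significant = [name for name, t in times.items() if exceeds_noise(baseline, t, noise)]
print(significant)
```

With an 11 s spread, only the SSE3_IPP slowdown clears the threshold; the 7 s lead of the x86 SSE3 build does not, which is why a single run can only suggest that it is the best one.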

[attachment deleted by admin]
« Last Edit: 26 Aug 2007, 06:16:48 pm by Raistmer »

Raistmer

Athlon 64 3200 (Winchester)
Quick timetable

WU : testWU-1.wu
KWSN_2.4_SSE2-AMD_MB.exe : 380 seconds
KWSN_2.4_MB_SSE2A.exe : 377 seconds
Speedup: 0.79%, Ratio: 1.01 x
KWSN_2.4V_SSE2_MB.exe : 377 seconds
Speedup: 0.79%, Ratio: 1.01 x
No noticeable speedup from the "V" version of SSE2 for the AMD variant.

Athlon XP 2400
Quick timetable

WU : testWU-1.wu
KWSN_2.4_SSE_MB.exe : 822 seconds
KWSN_2.4_SSE_MB.exe : 824 seconds
Speedup: -0.24%, Ratio: 1.00 x
KWSN_2.4V_SSE_MB.exe : 820 seconds
Speedup: 0.24%, Ratio: 1.00 x
Again, no noticeable speedup from the new version.
« Last Edit: 03 Sep 2007, 01:06:28 pm by Raistmer »

Raistmer

AMD Athlon(tm) 64 X2 Dual Core Processor 5400+ Windsor (Win x64)
Quick timetable
 
WU : testWU-1.wu
KWSN_2.4_SSE2-AMD_MB.exe : 278 seconds
KWSN_2.4_MB_SSE2A.exe : 274 seconds
Speedup: 1.44%, Ratio: 1.01 x
KWSN_2.4_SSE2_IPP_Ben-Joe.exe : 376 seconds
Speedup: -35.25%, Ratio: 0.74 x
KWSN_2.4_SSE2-AMD_MB.exe : 277 seconds
Speedup: 0.36%, Ratio: 1.00 x
KWSN_2.4V_SSE2_MB.exe : 276 seconds
Speedup: 0.72%, Ratio: 1.01 x
It seems the 64-bit version is slower (as on the other Athlons, with this WU at least) :(


[attachment deleted by admin]

 
