Topic: Core2 Duo E7600, SSSE3 instead of SSE4.1 (Read 13930 times)

Sutaru Tsureku

  • Guest
Core2 Duo E7600, SSSE3 instead of SSE4.1
« on: 08 Jun 2010, 04:21:22 am »
I ran a benchmark on my new (old) Intel Core2 Duo E7600 @ 3.06 GHz with DDR2-800 RAM at 5-5-5-18 timings (stock, not overclocked).

I ran two instances of Knabench V1.81r simultaneously, one per CPU core (without setting CPU affinity).


AK_v8b_win_SSE41.exe

TaskName: PG0009.wu - 377.734 secs Elapsed - 375.609 secs CPU time
TaskName: PG0395.wu - 383.344 secs Elapsed - 381.188 secs CPU time
TaskName: PG0444.wu - 319.391 secs Elapsed - 317.344 secs CPU time
TaskName: PG1327.wu - 249.844 secs Elapsed - 247.672 secs CPU time

TaskName: PG0009.wu - 379.125 secs Elapsed - 376.953 secs CPU time
TaskName: PG0395.wu - 383.609 secs Elapsed - 381.516 secs CPU time
TaskName: PG0444.wu - 319.828 secs Elapsed - 317.750 secs CPU time
TaskName: PG1327.wu - 250.641 secs Elapsed - 248.547 secs CPU time



AK_v8b_win_SSSE3x.exe

TaskName: PG0009.wu - 369.000 secs Elapsed - 366.844 secs CPU time
TaskName: PG0395.wu - 351.047 secs Elapsed - 348.969 secs CPU time
TaskName: PG0444.wu - 290.594 secs Elapsed - 288.516 secs CPU time
TaskName: PG1327.wu - 249.484 secs Elapsed - 247.391 secs CPU time


TaskName: PG0009.wu - 371.375 secs Elapsed - 369.281 secs CPU time
TaskName: PG0395.wu - 351.109 secs Elapsed - 348.891 secs CPU time
TaskName: PG0444.wu - 290.453 secs Elapsed - 288.438 secs CPU time
TaskName: PG1327.wu - 249.594 secs Elapsed - 247.422 secs CPU time
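To compare the builds on a single number, as Jason does later in the thread with his "Total CPU" lines, the per-task CPU times can be summed per run. A minimal Python sketch, assuming the Knabench output format shown above; the pattern `LINE` and the helper `total_cpu` are hypothetical names for illustration, not part of Knabench.

```python
import re

# Matches one Knabench result line, e.g.
# "TaskName: PG0009.wu - 377.734 secs Elapsed - 375.609 secs CPU time"
# (\s* also tolerates the doubled spaces seen in some pasted logs)
LINE = re.compile(
    r"TaskName:\s*(\S+)\s*-\s*([\d.]+)\s*secs Elapsed\s*-\s*([\d.]+)\s*secs CPU time"
)

def total_cpu(log: str) -> float:
    """Sum the CPU-time column of every TaskName line in a Knabench log."""
    return sum(float(m.group(3)) for m in LINE.finditer(log))

# First instance of each build, copied from the results above
sse41_run1 = """\
TaskName: PG0009.wu - 377.734 secs Elapsed - 375.609 secs CPU time
TaskName: PG0395.wu - 383.344 secs Elapsed - 381.188 secs CPU time
TaskName: PG0444.wu - 319.391 secs Elapsed - 317.344 secs CPU time
TaskName: PG1327.wu - 249.844 secs Elapsed - 247.672 secs CPU time
"""
ssse3x_run1 = """\
TaskName: PG0009.wu - 369.000 secs Elapsed - 366.844 secs CPU time
TaskName: PG0395.wu - 351.047 secs Elapsed - 348.969 secs CPU time
TaskName: PG0444.wu - 290.594 secs Elapsed - 288.516 secs CPU time
TaskName: PG1327.wu - 249.484 secs Elapsed - 247.391 secs CPU time
"""

print(f"SSE41:  {total_cpu(sse41_run1):.3f} s")
print(f"SSSE3x: {total_cpu(ssse3x_run1):.3f} s")
```

On these numbers the SSSE3x instance uses about 5% less CPU time in total (1251.720 s vs 1321.813 s), with most of the gap coming from PG0395 and PG0444.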



I thought the SSE4.1 build was faster on Core2 Duo CPUs specifically?

Or is that only true for the E8xxx series?

Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • Posts: 8980
Re: Core2 Duo E7600, SSSE3 instead of SSE4.1
« Reply #1 on: 08 Jun 2010, 04:37:49 am »
Quote
I thought the SSE4.1 build was faster on Core2 Duo CPUs specifically?

Or is that only true for the E8xxx series?

It actually depends mostly on memory bus contention, and indirectly on cache size. So, as usual, no choice is absolute: these CPUs did not exist when those comparisons were made, and they have a smaller cache than the E8xxx series.

'Usually', since fast Duos have less contention than Quads (dual-channel RAM shared by 2 cores instead of 4), the SSE4.1 build is equal or faster with fast RAM but slower with slow RAM. However, the smaller cache changes things again.

To test: if you slow that RAM down further, SSSE3x should win by a greater margin. If it is possible to overclock that RAM, and especially to run it at lower latency, then the SSE4.1 build is likely to overtake somewhere around 960 MHz @ CL4.

Your system shows a pretty big difference. That difference shrinks, and eventually the order swaps, as memory system performance goes up. In some cases that swap can also happen on Quads that have extreme RAM and lots of cache.
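The RAM settings discussed here can be compared with simple arithmetic. A rough Python sketch, assuming "800MHz" and "960MHz" refer to DDR2 effective transfer rates (MT/s), 64-bit (8-byte) channels, and dual-channel operation; CAS latency is counted in cycles of the I/O bus clock, which for DDR2 is half the transfer rate. The helper names are hypothetical, and peak bandwidth is a theoretical ceiling, not what the apps actually achieve.

```python
def cas_latency_ns(transfer_rate_mts: float, cl: int) -> float:
    """CAS latency in nanoseconds.

    DDR2 transfers twice per I/O clock, so the clock runs at rate/2 MHz
    and one clock cycle lasts 2000/rate ns.
    """
    return cl * 2000.0 / transfer_rate_mts

def peak_bandwidth_gbs(transfer_rate_mts: float, channels: int = 2) -> float:
    """Theoretical peak bandwidth in GB/s for 8-byte-wide channels."""
    return transfer_rate_mts * 8 * channels / 1000.0

# Sutaru's stock setting vs the overclock target mentioned above
for name, rate, cl in [("DDR2-800 CL5", 800, 5), ("DDR2-960 CL4", 960, 4)]:
    print(f"{name}: {cas_latency_ns(rate, cl):.2f} ns CAS, "
          f"{peak_bandwidth_gbs(rate):.2f} GB/s peak (dual channel)")
```

Under those assumptions, going from 800 CL5 to 960 CL4 cuts CAS latency from 12.5 ns to about 8.3 ns and raises peak bandwidth from 12.8 to 15.36 GB/s.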

Jason
« Last Edit: 08 Jun 2010, 04:41:36 am by Jason G »

Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • Posts: 14349
Re: Core2 Duo E7600, SSSE3 instead of SSE4.1
« Reply #2 on: 08 Jun 2010, 04:38:42 am »
The "SSE4.1" build could be faster on hosts with a relatively fast memory bus. Your host uses 800 MHz memory, so the memory bus is very likely saturated enough to favor the cache handling used in the SSSE3x build. That's a memory/cache issue, not the SSE level of the CPU per se.
IMO, the SSE4.1 build was kept mostly for people who can't accept that bigger != better sometimes. SSE4.1 will indeed win on some hosts, but not on many IMO, and surely not on all hosts that support the SSE4.1 instruction set.

Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • Posts: 8980
Re: Core2 Duo E7600, SSSE3 instead of SSE4.1
« Reply #3 on: 08 Jun 2010, 05:07:00 am »
Duplicating your test on an E8400 @ 3.6 GHz with dual-channel DDR2 @ 960 MHz, Win7 x64 (second core loaded with BOINC):

AK_v8b_win_SSSE3x.exe
TaskName: PG0009.wu - 293.020 secs Elapsed - 290.318 secs CPU time
TaskName: PG0395.wu - 267.220 secs Elapsed - 264.531 secs CPU time
TaskName: PG0444.wu - 210.280 secs Elapsed - 207.793 secs CPU time
TaskName: PG1327.wu - 154.550 secs Elapsed - 152.195 secs CPU time
Total CPU: 914.837 secs

AK_v8b_win_SSE41.exe
TaskName: PG0009.wu - 289.011 secs Elapsed - 286.777 secs CPU time
TaskName: PG0395.wu - 263.460 secs Elapsed - 260.880 secs CPU time
TaskName: PG0444.wu - 214.420 secs Elapsed - 211.818 secs CPU time
TaskName: PG1327.wu - 154.311 secs Elapsed - 151.508 secs CPU time
Total CPU: 910.983 secs - speedup = 0.42%

Pretty close!  ;) So close it doesn't really matter anymore.  I do plan to upgrade my RAM further in the near future, and so it will probably all change again.
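The speedup figure above is the relative reduction in total CPU time. A minimal sketch of that arithmetic, assuming speedup is defined as (baseline - candidate) / baseline with SSSE3x as the baseline; that definition reproduces the 0.42% quoted, but the helper `speedup_percent` is a hypothetical name.

```python
# Per-task CPU times (s) from the E8400 run above
ssse3x_cpu = [290.318, 264.531, 207.793, 152.195]  # PG0009, PG0395, PG0444, PG1327
sse41_cpu = [286.777, 260.880, 211.818, 151.508]

def speedup_percent(baseline: float, candidate: float) -> float:
    """Percentage of baseline CPU time saved by the candidate build."""
    return (baseline - candidate) / baseline * 100.0

base, cand = sum(ssse3x_cpu), sum(sse41_cpu)
print(f"Total CPU {base:.3f} vs {cand:.3f}: "
      f"speedup = {speedup_percent(base, cand):.2f}%")

# Per-task view: the sign is not the same for every workunit
for b, c in zip(ssse3x_cpu, sse41_cpu):
    print(f"{speedup_percent(b, c):+.2f}%")
```

Per task, the sign flips: PG0444 is about 1.9% slower with SSE4.1, while the other three are roughly 0.5% to 1.4% faster, which is one reason a single total is a weak basis for a verdict.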

Jason
« Last Edit: 08 Jun 2010, 05:50:29 am by Jason G »

Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • Posts: 14349
Re: Core2 Duo E7600, SSSE3 instead of SSE4.1
« Reply #4 on: 08 Jun 2010, 07:02:30 am »
With such small differences, single-run measurements without a standard deviation are actually meaningless. The times are the same.
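This point can be made concrete: repeat each build several times, then compare the difference of the means with the standard error of that difference. A sketch using Python's `statistics` module; the helper `compare` and the rough 2-standard-error threshold are illustrative assumptions rather than a rigorous test, and the sample values are Sutaru's two PG0395 CPU times per build (taken from simultaneous instances, not true sequential repeats).

```python
import math
import statistics

def compare(a: list[float], b: list[float]) -> tuple[float, float]:
    """Return (difference of means, standard error of that difference).

    statistics.variance needs at least two samples per build.
    """
    diff = statistics.mean(a) - statistics.mean(b)
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return diff, se

# Sutaru's PG0395.wu CPU times (s), two instances per build
sse41 = [381.188, 381.516]
ssse3x = [348.969, 348.891]

diff, se = compare(sse41, ssse3x)
verdict = "larger than" if abs(diff) > 2 * se else "within"
print(f"difference {diff:.2f} s, standard error {se:.2f} s: {verdict} the noise")
```

With a single run per build, as in the E8400 comparison, `statistics.variance` raises `StatisticsError`: there is no spread to compare the difference against, which is exactly the objection.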

Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • Posts: 8980
Re: Core2 Duo E7600, SSSE3 instead of SSE4.1
« Reply #5 on: 08 Jun 2010, 07:14:47 am »
I agree, if the point were to decide which one to use. However, the purpose here was to replicate Sutaru's test with the same apps. Quoting myself:
Quote
Pretty close! ;) So close it doesn't really matter anymore.

The point was to show that the times converge with a better memory subsystem, where there was a big difference on Sutaru's host... so it does that successfully ;).

Of course, I don't use either of these apps; I use the 64-bit versions, where the story is very slightly different again: the variation is more like 1-2%, which is a bit more repeatable and a tiny bit more worthwhile.

Gecko_R7

  • Guest
Re: Core2 Duo E7600, SSSE3 instead of SSE4.1
« Reply #6 on: 08 Jun 2010, 11:04:58 am »
Quote
I agree, if the point were to decide which one to use. However, the purpose here was to replicate Sutaru's test with the same apps. Quoting myself:
Quote
Pretty close! ;) So close it doesn't really matter anymore.

The point was to show that the times converge with a better memory subsystem, where there was a big difference on Sutaru's host... so it does that successfully ;).

Of course, I don't use either of these apps; I use the 64-bit versions, where the story is very slightly different again: the variation is more like 1-2%, which is a bit more repeatable and a tiny bit more worthwhile.

FWIW, we've all seen SOOO many SSSE3 vs. SSE4.1 tests that bounce back and forth within a range of 0.5% to 2-3% on one side or the other.
Earlier tests in 2007/08 were run on XP and Vista hosts, etc.
These tests are now Win7.

I think we saw SSE4.1 favored on Jason's big-cache dual-core Penryns and on Mark's quad-core rigs in cases where he had a big cache plus a combination of low-latency, high-speed memory. Tests with small-cache quad-core Q8200/8300 etc. showed SSSE3x quicker. That made sense, since builds with entry- to mid-level Q-series CPUs likely tend toward more budget-friendly motherboards and DDR800 sticks in the majority of cases. So, a triple whammy of core contention, a slower bus, and slower memory.

With a variance range of only 0.5 to 3%, OS differences between XP, Vista and Win7, the number and kind of background services and programs running, and even BOINC disk-write interval settings can swing the results to either side of this range on identical hardware.

So, even *if* there *is* a fractional advantage between instruction-set builds, the real-world performance variance across hardware and OS set-ups is likely greater than the instruction-set difference.

To quote Bugs Bunny, still looks like splitting Hares to me, lol.  :P
« Last Edit: 08 Jun 2010, 11:20:26 am by Gecko »

Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • Posts: 8980
Re: Core2 Duo E7600, SSSE3 instead of SSE4.1
« Reply #7 on: 08 Jun 2010, 12:07:16 pm »
Quote
To quote Bugs Bunny, still looks like splitting Hares to me, lol.  :P

One favourite hair-splitting trick I used back in the day was running the same build against itself. That gives some idea of the run-to-run variation but, more importantly, shows whether something on the system is periodically pinching cycles. Wide run-to-run variation spotted by careful hair-splitting can be a good pointer to system issues and possible tuning; but if the sources of variation can't be isolated and rectified, it tends to lead to 'grasping at straws'.
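A minimal sketch of that same-build check: run one build twice on the same tasks and look at the largest per-task relative difference. Here the two "runs" are Sutaru's two simultaneous SSE41 instances from the first post, which is an assumption, since they competed for the same memory bus rather than running back to back; the helper name `max_relative_spread` is hypothetical.

```python
def max_relative_spread(run_a: dict[str, float], run_b: dict[str, float]) -> float:
    """Largest |a - b| / mean(a, b) over tasks present in both runs, in percent."""
    spreads = [
        abs(run_a[t] - run_b[t]) / ((run_a[t] + run_b[t]) / 2) * 100.0
        for t in run_a.keys() & run_b.keys()
    ]
    return max(spreads)

# CPU times (s) for the SSE41 build, two instances from the first post
run1 = {"PG0009": 375.609, "PG0395": 381.188, "PG0444": 317.344, "PG1327": 247.672}
run2 = {"PG0009": 376.953, "PG0395": 381.516, "PG0444": 317.750, "PG1327": 248.547}

print(f"max run-to-run spread: {max_relative_spread(run1, run2):.2f}%")
```

Here the spread is under 0.4% (largest on PG0009), far smaller than the roughly 5% gap between builds on that host; a 0.42% gap, by contrast, sits in the same range as this spread.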

Richard Haselgrove

  • Messenger Pigeon
  • Knight who says 'Ni!'
  • Posts: 2819
Re: Core2 Duo E7600, SSSE3 instead of SSE4.1
« Reply #8 on: 08 Jun 2010, 12:20:30 pm »
Another thing to draw out of this conversation is the important distinction between "information" and "advice" - applies everywhere, of course, but let's keep it to computing.

Information is always helpful, useful and welcome - "My hosts run slightly quicker with the SSSE3x version"

Advice - "You should install such-and-such a version on your host" - is much more problematic, and requires much more knowledge about the other user's situation, needs and wants than we readily have access to.

Many times over the years, I've given users information - "you could do SETI work more quickly and reliably if you installed an optimised application" - and they've considered it and decided not to: perhaps lack of confidence, perhaps difficulty regaining access to the machine when an update is required, whatever. But it was a considered and reasonable decision. It would have been quite wrong to "advise" such a user even to install an opti app, let alone to presume to choose the best one on their behalf.

MarkJ

  • Knight o' The Realm
  • Posts: 96
Re: Ongoing Multibeam Cuda x32f Testing
« Reply #9 on: 21 Aug 2010, 08:27:17 am »
Just starting a run with x32h. Managed to trash my cache of work on this machine. It has a GTS250 installed and can be seen at  http://setiathome.berkeley.edu/results.php?hostid=5400417

Cheers,
MarkJ

edit: Just re-read the 1st message in this thread, and Jason wants us to concentrate on the x32f CUDA 3.0 build. So I'll let it finish off what it's got and then change over.
« Last Edit: 21 Aug 2010, 08:36:27 am by MarkJ »
