Author Topic: [Split] PowerSpectrum Unit Test  (Read 138479 times)

Offline arkayn

  • Janitor o' the Board
  • Knight who says 'Ni!'
  • *****
  • Posts: 1230
  • Aaaarrrrgggghhhh
    • My Little Place On The Internet
Re: [Split] PowerSpectrum Unit Test
« Reply #45 on: 19 Nov 2010, 07:46:31 pm »
Got through mod 2 just fine, now it crashes on mod 3 at 512 threads.

I even set the clocks to 505/1010/1350 just to check.

Also crashes at 800/1600/1800

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: [Split] PowerSpectrum Unit Test
« Reply #46 on: 20 Nov 2010, 12:49:52 am »
Hmm, don't know why, weird.  I'll look at mod3's differences from mod2 (there aren't many).  Maybe some sort of driver bug?  It runs on XP32 here, but that's only a 260, not a Fermi.
I'd try a clean install of the 263.06 driver & see if that helps.

Can anyone else report crashing out on Mod3?  Looks like Mod1 (256 threads) will be the useful technique on Fermi cards anyway, but if there is some issue with Mod3 it'd be nice to find & fix it for a fair comparison.

[A bit later:] Might have found something, will try adjusting mod3 & update later.    @arkayn:  :o why is your card the only one that tells me when I do something wrong?
« Last Edit: 20 Nov 2010, 01:48:26 am by Jason G »

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: [Split] PowerSpectrum Unit Test
« Reply #47 on: 20 Nov 2010, 01:56:32 am »
Updated first post:
Quote
[Updated] to PowerSpectrum Unit Test #4
Mod1: no changes
Mod2: no changes
Mod3: Tidied up & ironed out a bug that only manifests on Arkayn's card so far :o.  Could be a smidgen faster.

Thanks Arkayn for picking up my bugs.  Still no idea why yours is extra fussy, but it's very handy at the moment.

Offline M_M

  • Squire
  • *
  • Posts: 32
Re: [Split] PowerSpectrum Unit Test
« Reply #48 on: 20 Nov 2010, 03:15:02 am »
Mod3 performance improved in the latest PS build...

Device: GeForce GTX 460, 810 MHz clock, 993 MB memory.
Compute capability 2.1
Compiled with CUDA 3020.
                PowerSpectrum Unit Test #4
Stock GetPowerSpectrum():
     64 threads:       14.7 GFlops    5.9 GB/s   0.0ulps


GetPowerSpectrum() mod 1: (made Fermi & Pre-Fermi match in accuracy.)
     32 threads:        8.2 GFlops    3.3 GB/s 121.7ulps
     64 threads:       14.6 GFlops    5.8 GB/s 121.7ulps
    128 threads:       22.3 GFlops    8.9 GB/s 121.7ulps
    256 threads:       26.2 GFlops   10.5 GB/s 121.7ulps


GetPowerSpectrum() mod 2 (fixed, but slow):
     32 threads:        9.4 GFlops    3.8 GB/s   0.0ulps
     64 threads:       12.2 GFlops    4.9 GB/s   0.0ulps
    128 threads:       14.7 GFlops    5.9 GB/s   0.0ulps
    256 threads:       14.3 GFlops    5.7 GB/s   0.0ulps


GetPowerSpectrum() mod 3: (As with mod1, +threads & split loads)
     32 threads:        8.2 GFlops    3.3 GB/s 121.7ulps
     64 threads:       14.7 GFlops    5.9 GB/s 121.7ulps
    128 threads:       22.3 GFlops    8.9 GB/s 121.7ulps
    256 threads:       26.1 GFlops   10.4 GB/s 121.7ulps
    512 threads:       25.7 GFlops   10.3 GB/s 121.7ulps
   1024 threads:       18.3 GFlops    7.3 GB/s 121.7ulps
« Last Edit: 20 Nov 2010, 03:19:33 am by M_M »

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: [Split] PowerSpectrum Unit Test
« Reply #49 on: 20 Nov 2010, 03:21:06 am »
hehe thanks. 460 with stock code is starting to look a bit anaemic, around all those 20+ figures

Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: [Split] PowerSpectrum Unit Test
« Reply #50 on: 20 Nov 2010, 04:27:52 am »
C:\ap_j>cd g_fft
Stopping Boinc...
starting PowerSpectrum4.exe
.

Device: GeForce GTX 470, 810 MHz clock, 1248 MB memory.
Compute capability 2.0
Compiled with CUDA 3020.
                PowerSpectrum Unit Test #4
Stock GetPowerSpectrum():
     64 threads:       20.6 GFlops    8.2 GB/s   0.0ulps


GetPowerSpectrum() mod 1: (made Fermi & Pre-Fermi match in accuracy.)
     32 threads:       12.5 GFlops    5.0 GB/s 121.7ulps
     64 threads:       20.5 GFlops    8.2 GB/s 121.7ulps
    128 threads:       27.6 GFlops   11.0 GB/s 121.7ulps
    256 threads:       29.9 GFlops   12.0 GB/s 121.7ulps


GetPowerSpectrum() mod 2 (fixed, but slow):
     32 threads:       13.5 GFlops    5.4 GB/s   0.0ulps
     64 threads:       16.7 GFlops    6.7 GB/s   0.0ulps
    128 threads:       17.2 GFlops    6.9 GB/s   0.0ulps
    256 threads:       15.7 GFlops    6.3 GB/s   0.0ulps


GetPowerSpectrum() mod 3: (As with mod1, +threads & split loads)
     32 threads:       12.6 GFlops    5.0 GB/s 121.7ulps
     64 threads:       20.6 GFlops    8.2 GB/s 121.7ulps
    128 threads:       27.5 GFlops   11.0 GB/s 121.7ulps
    256 threads:       30.0 GFlops   12.0 GB/s 121.7ulps
    512 threads:       29.7 GFlops   11.9 GB/s 121.7ulps
   1024 threads:       25.6 GFlops   10.2 GB/s 121.7ulps


.
Done
Restarting Boinc...
Press any key to continue . . .

heinz

Ghost0210

  • Guest
Re: [Split] PowerSpectrum Unit Test
« Reply #51 on: 20 Nov 2010, 04:51:29 am »
Test #4 results on my 465:


Device: GeForce GTX 465, 1215 MHz clock, 994 MB memory.
Compute capability 2.0
Compiled with CUDA 3020.
      PowerSpectrum Unit Test #4
Stock GetPowerSpectrum():
     64 threads:       16.0 GFlops    6.4 GB/s   0.0ulps


GetPowerSpectrum() mod 1: (made Fermi & Pre-Fermi match in accuracy.)
     32 threads:        9.8 GFlops    3.9 GB/s 121.7ulps
     64 threads:       15.9 GFlops    6.3 GB/s 121.7ulps
    128 threads:       21.0 GFlops    8.4 GB/s 121.7ulps
    256 threads:       23.1 GFlops    9.2 GB/s 121.7ulps


GetPowerSpectrum() mod 2 (fixed, but slow):
     32 threads:       10.7 GFlops    4.3 GB/s   0.0ulps
     64 threads:       13.1 GFlops    5.2 GB/s   0.0ulps
    128 threads:       13.3 GFlops    5.3 GB/s   0.0ulps
    256 threads:       12.1 GFlops    4.8 GB/s   0.0ulps


GetPowerSpectrum() mod 3: (As with mod1, +threads & split loads)
     32 threads:        9.8 GFlops    3.9 GB/s 121.7ulps
     64 threads:       15.9 GFlops    6.4 GB/s 121.7ulps
    128 threads:       21.0 GFlops    8.4 GB/s 121.7ulps
    256 threads:       23.1 GFlops    9.2 GB/s 121.7ulps
    512 threads:       22.9 GFlops    9.1 GB/s 121.7ulps
   1024 threads:       19.5 GFlops    7.8 GB/s 121.7ulps

Edit: Corrected figures. The card was running downclocked in the previous test (no tasks); stock 465 speeds are now shown.
« Last Edit: 20 Nov 2010, 05:21:20 pm by Ghost »

Offline M_M

  • Squire
  • *
  • Posts: 32
Re: [Split] PowerSpectrum Unit Test
« Reply #52 on: 20 Nov 2010, 04:54:38 am »
Mod1 & Mod3 at 256 threads seem to suit Fermi the best...
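
For anyone who would rather not eyeball the tables, here is a minimal Python sketch that pulls the fastest row out of a pasted result block. It assumes the row layout shown in the outputs in this thread; the names `ROW` and `best_thread_count` are made up for illustration and are not part of the test tool.

```python
import re

# One result row of the unit test output, e.g.
# "     256 threads:       26.2 GFlops   10.5 GB/s 121.7ulps"
# Rows reported as "N/A" do not match and are skipped.
ROW = re.compile(
    r"^\s*(\d+) threads:\s+([\d.]+) GFlops\s+([\d.]+) GB/s\s+([\d.]+)ulps"
)

def best_thread_count(lines):
    """Return (threads, gflops) for the row with the highest GFlops."""
    rows = []
    for line in lines:
        m = ROW.match(line)
        if m:
            rows.append((int(m.group(1)), float(m.group(2))))
    if not rows:
        raise ValueError("no result rows found")
    return max(rows, key=lambda r: r[1])

# Mod 3 rows from the GTX 460 run in reply #48.
mod3_gtx460 = """
     32 threads:        8.2 GFlops    3.3 GB/s 121.7ulps
     64 threads:       14.7 GFlops    5.9 GB/s 121.7ulps
    128 threads:       22.3 GFlops    8.9 GB/s 121.7ulps
    256 threads:       26.1 GFlops   10.4 GB/s 121.7ulps
    512 threads:       25.7 GFlops   10.3 GB/s 121.7ulps
   1024 threads:       18.3 GFlops    7.3 GB/s 121.7ulps
""".splitlines()

print(best_thread_count(mod3_gtx460))  # (256, 26.1)
```

A single run per card is a small sample, so neighbouring rows that differ by a tenth of a GFlop (256 vs 512 threads on some cards) are best treated as a tie.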

Offline Frizz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 541
Re: [Split] PowerSpectrum Unit Test
« Reply #53 on: 20 Nov 2010, 06:19:24 am »
Windows XP 32 seems to be faster than Windows 7 64.
I also noticed that with AP, on both Nvidia and AMD.



Windows 7 64

Device: GeForce GTX 460, 1451 MHz clock, 1024 MB memory.
Compute capability 2.1
Compiled with CUDA 3020.
      PowerSpectrum Unit Test #4
Stock GetPowerSpectrum():
     64 threads:       12.7 GFlops    5.1 GB/s   0.0ulps


GetPowerSpectrum() mod 1: (made Fermi & Pre-Fermi match in accuracy.)
     32 threads:        7.1 GFlops    2.8 GB/s 121.7ulps
     64 threads:       12.6 GFlops    5.0 GB/s 121.7ulps
    128 threads:       18.7 GFlops    7.5 GB/s 121.7ulps
    256 threads:       22.4 GFlops    9.0 GB/s 121.7ulps


GetPowerSpectrum() mod 2 (fixed, but slow):
     32 threads:        8.0 GFlops    3.2 GB/s   0.0ulps
     64 threads:       10.4 GFlops    4.2 GB/s   0.0ulps
    128 threads:       12.5 GFlops    5.0 GB/s   0.0ulps
    256 threads:       12.3 GFlops    4.9 GB/s   0.0ulps


GetPowerSpectrum() mod 3: (As with mod1, +threads & split loads)
     32 threads:        7.2 GFlops    2.9 GB/s 121.7ulps
     64 threads:       12.7 GFlops    5.1 GB/s 121.7ulps
    128 threads:       18.8 GFlops    7.5 GB/s 121.7ulps
    256 threads:       22.4 GFlops    9.0 GB/s 121.7ulps
    512 threads:       21.9 GFlops    8.8 GB/s 121.7ulps
   1024 threads:       15.6 GFlops    6.2 GB/s 121.7ulps


================================================

Windows XP 32

Device: GeForce GTX 460, 810 MHz clock, 993 MB memory.
Compute capability 2.1
Compiled with CUDA 3020.
      PowerSpectrum Unit Test #4
Stock GetPowerSpectrum():
     64 threads:       13.2 GFlops    5.3 GB/s   0.0ulps


GetPowerSpectrum() mod 1: (made Fermi & Pre-Fermi match in accuracy.)
     32 threads:        7.3 GFlops    2.9 GB/s 121.7ulps
     64 threads:       13.1 GFlops    5.2 GB/s 121.7ulps
    128 threads:       19.8 GFlops    7.9 GB/s 121.7ulps
    256 threads:       23.5 GFlops    9.4 GB/s 121.7ulps


GetPowerSpectrum() mod 2 (fixed, but slow):
     32 threads:        8.4 GFlops    3.3 GB/s   0.0ulps
     64 threads:       10.9 GFlops    4.4 GB/s   0.0ulps
    128 threads:       13.0 GFlops    5.2 GB/s   0.0ulps
    256 threads:       12.7 GFlops    5.1 GB/s   0.0ulps


GetPowerSpectrum() mod 3: (As with mod1, +threads & split loads)
     32 threads:        7.4 GFlops    3.0 GB/s 121.7ulps
     64 threads:       13.2 GFlops    5.3 GB/s 121.7ulps
    128 threads:       19.9 GFlops    8.0 GB/s 121.7ulps
    256 threads:       23.6 GFlops    9.5 GB/s 121.7ulps
    512 threads:       23.2 GFlops    9.3 GB/s 121.7ulps
   1024 threads:       16.2 GFlops    6.5 GB/s 121.7ulps


« Last Edit: 20 Nov 2010, 06:22:38 am by Frizz »
Please stop using these 1366x768 glare displays: http://www.facebook.com/home.php?sk=group_153240404724993

Offline MarkJ

  • Knight o' The Realm
  • **
  • Posts: 96
Re: [Split] PowerSpectrum Unit Test
« Reply #54 on: 20 Nov 2010, 06:38:13 am »
I ran on all the different cards on the farm:

1st up the GT240 (Win7 x64) has 3 cards, the DDR5 variety. Device 0 is slightly slower than 1 and 2, although they are all the same brand/model. Output is from device 0.

Device: GeForce GT 240, 1340 MHz clock, 475 MB memory.
Compute capability 1.2
Compiled with CUDA 3020.
      PowerSpectrum Unit Test #4
Stock GetPowerSpectrum():
     64 threads:        9.9 GFlops    4.0 GB/s 1183.3ulps


GetPowerSpectrum() mod 1: (made Fermi & Pre-Fermi match in accuracy.)
     32 threads:        8.5 GFlops    3.4 GB/s 121.7ulps
     64 threads:       10.1 GFlops    4.0 GB/s 121.7ulps
    128 threads:       10.0 GFlops    4.0 GB/s 121.7ulps
    256 threads:       10.0 GFlops    4.0 GB/s 121.7ulps


GetPowerSpectrum() mod 2 (fixed, but slow):
     32 threads:        2.1 GFlops    0.8 GB/s 1183.3ulps
     64 threads:        2.1 GFlops    0.8 GB/s 1183.3ulps
    128 threads:        2.1 GFlops    0.9 GB/s 1183.3ulps
    256 threads:        2.0 GFlops    0.8 GB/s 1183.3ulps


GetPowerSpectrum() mod 3: (As with mod1, +threads & split loads)
     32 threads:        8.8 GFlops    3.5 GB/s 121.7ulps
     64 threads:       10.1 GFlops    4.0 GB/s 121.7ulps
    128 threads:       10.0 GFlops    4.0 GB/s 121.7ulps
    256 threads:       10.0 GFlops    4.0 GB/s 121.7ulps
    512 threads:       10.0 GFlops    4.0 GB/s 121.7ulps
   1024 threads: N/A


*******************************************

Next we have a GTX275 (win7 x64):

Device: GeForce GTX 275, 1404 MHz clock, 873 MB memory.
Compute capability 1.3
Compiled with CUDA 3020.
      PowerSpectrum Unit Test #4
Stock GetPowerSpectrum():
     64 threads:       27.1 GFlops   10.8 GB/s 1183.3ulps


GetPowerSpectrum() mod 1: (made Fermi & Pre-Fermi match in accuracy.)
     32 threads:       17.1 GFlops    6.8 GB/s 121.7ulps
     64 threads:       27.1 GFlops   10.8 GB/s 121.7ulps
    128 threads:       27.3 GFlops   10.9 GB/s 121.7ulps
    256 threads:       27.3 GFlops   10.9 GB/s 121.7ulps


GetPowerSpectrum() mod 2 (fixed, but slow):
     32 threads:        6.2 GFlops    2.5 GB/s 1183.3ulps
     64 threads:        6.3 GFlops    2.5 GB/s 1183.3ulps
    128 threads:        6.0 GFlops    2.4 GB/s 1183.3ulps
    256 threads:        6.0 GFlops    2.4 GB/s 1183.3ulps


GetPowerSpectrum() mod 3: (As with mod1, +threads & split loads)
     32 threads:       17.1 GFlops    6.9 GB/s 121.7ulps
     64 threads:       27.1 GFlops   10.8 GB/s 121.7ulps
    128 threads:       27.4 GFlops   11.0 GB/s 121.7ulps
    256 threads:       27.2 GFlops   10.9 GB/s 121.7ulps
    512 threads:       27.3 GFlops   10.9 GB/s 121.7ulps
   1024 threads: N/A


*******************************************

Next a GTX295. Yeah, I know various people have run these. Win7 x64 again

Device: GeForce GTX 295, 1242 MHz clock, 874 MB memory.
Compute capability 1.3
Compiled with CUDA 3020.
      PowerSpectrum Unit Test #4
Stock GetPowerSpectrum():
     64 threads:       24.2 GFlops    9.7 GB/s 1183.3ulps


GetPowerSpectrum() mod 1: (made Fermi & Pre-Fermi match in accuracy.)
     32 threads:       15.6 GFlops    6.3 GB/s 121.7ulps
     64 threads:       24.6 GFlops    9.8 GB/s 121.7ulps
    128 threads:       24.8 GFlops    9.9 GB/s 121.7ulps
    256 threads:       24.7 GFlops    9.9 GB/s 121.7ulps


GetPowerSpectrum() mod 2 (fixed, but slow):
     32 threads:        5.6 GFlops    2.2 GB/s 1183.3ulps
     64 threads:        5.7 GFlops    2.3 GB/s 1183.3ulps
    128 threads:        5.5 GFlops    2.2 GB/s 1183.3ulps
    256 threads:        5.4 GFlops    2.2 GB/s 1183.3ulps


GetPowerSpectrum() mod 3: (As with mod1, +threads & split loads)
     32 threads:       15.6 GFlops    6.3 GB/s 121.7ulps
     64 threads:       24.6 GFlops    9.8 GB/s 121.7ulps
    128 threads:       24.8 GFlops    9.9 GB/s 121.7ulps
    256 threads:       24.7 GFlops    9.9 GB/s 121.7ulps
    512 threads:       24.7 GFlops    9.9 GB/s 121.7ulps
   1024 threads: N/A


*******************************************

Then a GTX460 (factory OC'ed version from EVGA), once again under Win7 x64:

Device: GeForce GTX 460, 810 MHz clock, 738 MB memory.
Compute capability 2.1
Compiled with CUDA 3020.
      PowerSpectrum Unit Test #4
Stock GetPowerSpectrum():
     64 threads:       12.0 GFlops    4.8 GB/s   0.0ulps


GetPowerSpectrum() mod 1: (made Fermi & Pre-Fermi match in accuracy.)
     32 threads:        6.9 GFlops    2.8 GB/s 121.7ulps
     64 threads:       12.0 GFlops    4.8 GB/s 121.7ulps
    128 threads:       17.4 GFlops    6.9 GB/s 121.7ulps
    256 threads:       19.1 GFlops    7.6 GB/s 121.7ulps


GetPowerSpectrum() mod 2 (fixed, but slow):
     32 threads:        7.6 GFlops    3.0 GB/s   0.0ulps
     64 threads:       10.0 GFlops    4.0 GB/s   0.0ulps
    128 threads:       11.9 GFlops    4.8 GB/s   0.0ulps
    256 threads:       11.7 GFlops    4.7 GB/s   0.0ulps


GetPowerSpectrum() mod 3: (As with mod1, +threads & split loads)
     32 threads:        7.0 GFlops    2.8 GB/s 121.7ulps
     64 threads:       12.1 GFlops    4.8 GB/s 121.7ulps
    128 threads:       17.4 GFlops    6.9 GB/s 121.7ulps
    256 threads:       19.1 GFlops    7.7 GB/s 121.7ulps
    512 threads:       18.8 GFlops    7.5 GB/s 121.7ulps
   1024 threads:       14.3 GFlops    5.7 GB/s 121.7ulps


*******************************************

And lastly just for comparison the same brand/model factory OC'ed GTX460 but under WinXP

Device: GeForce GTX 460, 1350 MHz clock, 768 MB memory.
Compute capability 2.1
Compiled with CUDA 3020.
      PowerSpectrum Unit Test #4
Stock GetPowerSpectrum():
     64 threads:       12.1 GFlops    4.8 GB/s   0.0ulps


GetPowerSpectrum() mod 1: (made Fermi & Pre-Fermi match in accuracy.)
     32 threads:        6.9 GFlops    2.8 GB/s 121.7ulps
     64 threads:       12.0 GFlops    4.8 GB/s 121.7ulps
    128 threads:       17.4 GFlops    7.0 GB/s 121.7ulps
    256 threads:       19.1 GFlops    7.6 GB/s 121.7ulps


GetPowerSpectrum() mod 2 (fixed, but slow):
     32 threads:        7.6 GFlops    3.0 GB/s   0.0ulps
     64 threads:       10.0 GFlops    4.0 GB/s   0.0ulps
    128 threads:       11.9 GFlops    4.8 GB/s   0.0ulps
    256 threads:       11.7 GFlops    4.7 GB/s   0.0ulps


GetPowerSpectrum() mod 3: (As with mod1, +threads & split loads)
     32 threads:        7.0 GFlops    2.8 GB/s 121.7ulps
     64 threads:       12.1 GFlops    4.8 GB/s 121.7ulps
    128 threads:       17.4 GFlops    7.0 GB/s 121.7ulps
    256 threads:       19.1 GFlops    7.7 GB/s 121.7ulps
    512 threads:       18.9 GFlops    7.5 GB/s 121.7ulps
   1024 threads:       14.3 GFlops    5.7 GB/s 121.7ulps


Cheers,
MarkJ

Offline PatrickV2

  • Knight o' The Round Table
  • ***
  • Posts: 139
Re: [Split] PowerSpectrum Unit Test
« Reply #55 on: 20 Nov 2010, 07:27:55 am »
Busy thread and a lot happening here. My respect. I re-ran the version 4 benchmark on:

Win7-64/8GB/8800GTX/260.99 drivers:

Device: GeForce 8800 GTX, 1350 MHz clock, 731 MB memory.
Compute capability 1.0
Compiled with CUDA 3020.
                PowerSpectrum Unit Test #4
Stock GetPowerSpectrum():
     64 threads:       17.8 GFlops    7.1 GB/s 1183.3ulps


GetPowerSpectrum() mod 1: (made Fermi & Pre-Fermi match in accuracy.)
     32 threads:       14.0 GFlops    5.6 GB/s 121.7ulps
     64 threads:       17.8 GFlops    7.1 GB/s 121.7ulps
    128 threads:       17.8 GFlops    7.1 GB/s 121.7ulps
    256 threads:       17.6 GFlops    7.0 GB/s 121.7ulps


GetPowerSpectrum() mod 2 (fixed, but slow):
     32 threads:        2.9 GFlops    1.1 GB/s 1183.3ulps
     64 threads:        2.9 GFlops    1.2 GB/s 1183.3ulps
    128 threads:        2.9 GFlops    1.1 GB/s 1183.3ulps
    256 threads:        2.9 GFlops    1.1 GB/s 1183.3ulps


GetPowerSpectrum() mod 3: (As with mod1, +threads & split loads)
     32 threads:       14.6 GFlops    5.8 GB/s 121.7ulps
     64 threads:       17.9 GFlops    7.2 GB/s 121.7ulps
    128 threads:       17.7 GFlops    7.1 GB/s 121.7ulps
    256 threads:       17.5 GFlops    7.0 GB/s 121.7ulps
    512 threads:       16.1 GFlops    6.4 GB/s 121.7ulps
   1024 threads: N/A


EDIT: I still have WinXP32 installed on another HD of this machine; are you interested in a run of your tool under that OS?

Regards, Patrick.

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: [Split] PowerSpectrum Unit Test
« Reply #56 on: 20 Nov 2010, 07:52:23 am »
Quote
EDIT: I still have WinXP32 installed on another HD of this machine; are you interested in a run of your tool under that OS?

Yes please.  The difference picked up earlier (thanks Frizz) between XP32 & XP64 was interesting (with stock, around 10% advantage to XP32, reduced to ~5% with Mod3).  I've little doubt XP32 has a similar advantage over Win7x64, due to the simpler driver model, but it'd be nice to confirm whether the mods close that gap a bit too.
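
As a rough check on the Win7x64 side, the relative advantage can be worked out from the figures already posted. This is only a sketch: it uses Frizz's GTX 460 numbers from reply #53 (XP32 vs Win7 x64, not the XP64 comparison mentioned above), one run each, and `advantage_percent` is a name made up here.

```python
def advantage_percent(faster, slower):
    """Percentage by which `faster` exceeds `slower`."""
    return (faster / slower - 1.0) * 100.0

# GFlops from reply #53, GTX 460: XP32 figure first, Win7 x64 second.
stock = advantage_percent(13.2, 12.7)     # stock kernel, 64 threads
mod3_256 = advantage_percent(23.6, 22.4)  # mod 3, 256 threads

print(f"stock: {stock:.1f}%  mod3/256: {mod3_256:.1f}%")
# prints: stock: 3.9%  mod3/256: 5.4%
```

On that one pair of runs the XP32 lead is a few percent either way and does not obviously shrink with Mod3, so more XP32 vs Win7x64 runs on the same card would be needed before drawing a conclusion.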
« Last Edit: 20 Nov 2010, 08:03:14 am by Jason G »

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: [Split] PowerSpectrum Unit Test
« Reply #57 on: 20 Nov 2010, 08:02:51 am »
Quote
I ran on all the different cards on the farm:

1st up the GT240 (Win7 x64) has 3 cards, the DDR5 variety. Device 0 is slightly slower than 1 and 2, although they are all the same brand/model. Output is from device 0.

Device: GeForce GT 240, 1340 MHz clock, 475 MB memory....

Nice to be edging out stock on that stubborn card.  With the rest of your results it's starting to paint a picture that might be easy to handle:

by Compute Capability
  2.0 & 2.1: Mod3 with 256 threads (significant boost)
  1.3: Mod3 with 128 threads (very small boost)
  1.0-1.2: Mod3 with 64 threads (edges out stock by a slim margin sometimes, but seems consistent)

It should be fairly straightforward to follow rules like this for other, more important kernels, so I'll make sure I fully understand this behaviour & build kernels with that in mind.
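
The rules above boil down to a small lookup on the compute capability that the test prints for each card. Here is a minimal Python sketch of that selection logic; `threads_per_block` is a hypothetical helper, not code from the actual application, and treating anything newer than 2.1 like Fermi is an assumption, since nothing newer was tested in this thread.

```python
def threads_per_block(major, minor):
    """Mod3 thread count suggested in this thread for a compute capability."""
    cc = (major, minor)
    if cc >= (2, 0):
        # 2.0 and 2.1 (Fermi) were tested; newer parts are an assumption.
        return 256
    if cc == (1, 3):
        return 128
    if (1, 0) <= cc <= (1, 2):
        return 64
    raise ValueError(f"unsupported compute capability {major}.{minor}")

# Cards that appear in the results above.
for name, cc in [("GTX 480", (2, 0)), ("GTX 460", (2, 1)),
                 ("GTX 275", (1, 3)), ("GT 240", (1, 2)),
                 ("8800 GTX", (1, 0))]:
    print(name, threads_per_block(*cc))
```

Keeping the choice in one function like this would make it easy to adjust the table per kernel if other kernels turn out to prefer different sizes.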


Offline SciManStev

  • Alpha Tester
  • Knight Templar
  • ***
  • Posts: 263
Re: [Split] PowerSpectrum Unit Test
« Reply #58 on: 20 Nov 2010, 11:17:03 am »
Test 4 Win 7 64 260.99

Device: GeForce GTX 480, 810 MHz clock, 1503 MB memory.
Compute capability 2.0
Compiled with CUDA 3020.
                PowerSpectrum Unit Test #4
Stock GetPowerSpectrum<>:
     64 threads:       27.6 GFlops  11.0 GB/s      0.0ulps

GetPowerSpectrum<> mod 1: <made Fermi & Pre-Fermi match in accuracy.>
     32 threads:       17.4 GFlops   7.0 GB/s    121.7ulps
     64 threads:       27.5 GFlops  11.0 GB/s    121.7ulps
    128 threads:       36.4 GFlops  14.5 GB/s    121.7ulps
    256 threads:       39.6 GFlops  15.8 GB/s    121.7ulps

GetPowerSpectrum<> mod 2 <fixed, but slow>:
     32 threads:       18.9 GFlops   7.6 GB/s      0.0ulps
     64 threads:       23.1 GFlops   9.2 GB/s      0.0ulps
    128 threads:       24.1 GFlops   9.6 GB/s      0.0ulps
    256 threads:       22.7 GFlops   9.1 GB/s      0.0ulps

GetPowerSpectrum<> mod 3: <As with mod1, +threads & split loads>
     32 threads:       17.5 GFlops   7.0 GB/s    121.7ulps
     64 threads:       27.6 GFlops  11.0 GB/s    121.7ulps
    128 threads:       36.3 GFlops  14.5 GB/s    121.7ulps
    256 threads:       39.7 GFlops  15.9 GB/s    121.7ulps
    512 threads:       39.2 GFlops  15.7 GB/s    121.7ulps
   1024 threads:       34.7 GFlops  13.9 GB/s    121.7ulps

Steve

Offline perryjay

  • Knight Templar
  • ****
  • Posts: 427
Re: [Split] PowerSpectrum Unit Test
« Reply #59 on: 20 Nov 2010, 11:31:52 am »
Me and my little 9500GT reporting for duty, sir, but it's time for a little hand holding. I downloaded the package from the first post. I got a DLL and the executable. Where do I put the DLL before I open the EXE?

 
