[Split] PowerSpectrum Unit Test

_heinz:
C:\ap_j>cd g_fft
Stopping Boinc...
starting PowerSpectrum4.exe
.

Device: GeForce GTX 470, 810 MHz clock, 1248 MB memory.
Compute capability 2.0
Compiled with CUDA 3020.
                PowerSpectrum Unit Test #4
Stock GetPowerSpectrum():
     64 threads:       20.6 GFlops    8.2 GB/s   0.0ulps


GetPowerSpectrum() mod 1: (made Fermi & Pre-Fermi match in accuracy.)
     32 threads:       12.5 GFlops    5.0 GB/s 121.7ulps
     64 threads:       20.5 GFlops    8.2 GB/s 121.7ulps
    128 threads:       27.6 GFlops   11.0 GB/s 121.7ulps
    256 threads:       29.9 GFlops   12.0 GB/s 121.7ulps


GetPowerSpectrum() mod 2 (fixed, but slow):
     32 threads:       13.5 GFlops    5.4 GB/s   0.0ulps
     64 threads:       16.7 GFlops    6.7 GB/s   0.0ulps
    128 threads:       17.2 GFlops    6.9 GB/s   0.0ulps
    256 threads:       15.7 GFlops    6.3 GB/s   0.0ulps


GetPowerSpectrum() mod 3: (As with mod1, +threads & split loads)
     32 threads:       12.6 GFlops    5.0 GB/s 121.7ulps
     64 threads:       20.6 GFlops    8.2 GB/s 121.7ulps
    128 threads:       27.5 GFlops   11.0 GB/s 121.7ulps
    256 threads:       30.0 GFlops   12.0 GB/s 121.7ulps
    512 threads:       29.7 GFlops   11.9 GB/s 121.7ulps
   1024 threads:       25.6 GFlops   10.2 GB/s 121.7ulps


.
Done
Restarting Boinc...
Press any key to continue . . .

heinz

Ghost0210:
Unit Test #4 results on my 465:


Device: GeForce GTX 465, 1215 MHz clock, 994 MB memory.
Compute capability 2.0
Compiled with CUDA 3020.
      PowerSpectrum Unit Test #4
Stock GetPowerSpectrum():
     64 threads:       16.0 GFlops    6.4 GB/s   0.0ulps


GetPowerSpectrum() mod 1: (made Fermi & Pre-Fermi match in accuracy.)
     32 threads:        9.8 GFlops    3.9 GB/s 121.7ulps
     64 threads:       15.9 GFlops    6.3 GB/s 121.7ulps
    128 threads:       21.0 GFlops    8.4 GB/s 121.7ulps
    256 threads:       23.1 GFlops    9.2 GB/s 121.7ulps


GetPowerSpectrum() mod 2 (fixed, but slow):
     32 threads:       10.7 GFlops    4.3 GB/s   0.0ulps
     64 threads:       13.1 GFlops    5.2 GB/s   0.0ulps
    128 threads:       13.3 GFlops    5.3 GB/s   0.0ulps
    256 threads:       12.1 GFlops    4.8 GB/s   0.0ulps


GetPowerSpectrum() mod 3: (As with mod1, +threads & split loads)
     32 threads:        9.8 GFlops    3.9 GB/s 121.7ulps
     64 threads:       15.9 GFlops    6.4 GB/s 121.7ulps
    128 threads:       21.0 GFlops    8.4 GB/s 121.7ulps
    256 threads:       23.1 GFlops    9.2 GB/s 121.7ulps
    512 threads:       22.9 GFlops    9.1 GB/s 121.7ulps
   1024 threads:       19.5 GFlops    7.8 GB/s 121.7ulps

Edit: Corrected figures. The card was running downclocked in the previous test (no tasks running); stock 465 speeds are now shown.

M_M:
Mod1 & Mod3 at 256 threads seem to suit Fermi best...
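
To check this against the posted numbers, here is a minimal Python sketch that parses result lines in the format the unit test prints and picks the thread count with the highest GFlops. The names `parse_results` and `best_thread_count` are made up for illustration and are not part of the test tool; the sample data is the mod 3 block from the GTX 470 run above.

```python
import re

# Matches lines such as "    256 threads:       30.0 GFlops   12.0 GB/s 121.7ulps".
# Lines like "1024 threads: N/A" do not match and are skipped.
LINE_RE = re.compile(
    r"^\s*(\d+) threads:\s+([\d.]+) GFlops\s+([\d.]+) GB/s\s+([\d.]+)ulps\s*$"
)

def parse_results(text):
    """Return a list of (threads, gflops, gb_per_s, ulps) tuples."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            rows.append((int(m.group(1)), float(m.group(2)),
                         float(m.group(3)), float(m.group(4))))
    return rows

def best_thread_count(rows):
    """Return the thread count of the row with the highest GFlops."""
    return max(rows, key=lambda r: r[1])[0]

# mod 3 block copied from the GTX 470 output above
GTX470_MOD3 = """
     32 threads:       12.6 GFlops    5.0 GB/s 121.7ulps
     64 threads:       20.6 GFlops    8.2 GB/s 121.7ulps
    128 threads:       27.5 GFlops   11.0 GB/s 121.7ulps
    256 threads:       30.0 GFlops   12.0 GB/s 121.7ulps
    512 threads:       29.7 GFlops   11.9 GB/s 121.7ulps
   1024 threads:       25.6 GFlops   10.2 GB/s 121.7ulps
"""

print(best_thread_count(parse_results(GTX470_MOD3)))  # prints 256
```

Pasting any other mod 1 or mod 3 block from this thread into the string gives the best thread count for that card.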

Frizz:
Windows XP 32 seems to be faster than Windows 7 64.
I also noticed that for AP, on both Nvidia and AMD.



Windows 7 64

Device: GeForce GTX 460, 1451 MHz clock, 1024 MB memory.
Compute capability 2.1
Compiled with CUDA 3020.
      PowerSpectrum Unit Test #4
Stock GetPowerSpectrum():
     64 threads:       12.7 GFlops    5.1 GB/s   0.0ulps


GetPowerSpectrum() mod 1: (made Fermi & Pre-Fermi match in accuracy.)
     32 threads:        7.1 GFlops    2.8 GB/s 121.7ulps
     64 threads:       12.6 GFlops    5.0 GB/s 121.7ulps
    128 threads:       18.7 GFlops    7.5 GB/s 121.7ulps
    256 threads:       22.4 GFlops    9.0 GB/s 121.7ulps


GetPowerSpectrum() mod 2 (fixed, but slow):
     32 threads:        8.0 GFlops    3.2 GB/s   0.0ulps
     64 threads:       10.4 GFlops    4.2 GB/s   0.0ulps
    128 threads:       12.5 GFlops    5.0 GB/s   0.0ulps
    256 threads:       12.3 GFlops    4.9 GB/s   0.0ulps


GetPowerSpectrum() mod 3: (As with mod1, +threads & split loads)
     32 threads:        7.2 GFlops    2.9 GB/s 121.7ulps
     64 threads:       12.7 GFlops    5.1 GB/s 121.7ulps
    128 threads:       18.8 GFlops    7.5 GB/s 121.7ulps
    256 threads:       22.4 GFlops    9.0 GB/s 121.7ulps
    512 threads:       21.9 GFlops    8.8 GB/s 121.7ulps
   1024 threads:       15.6 GFlops    6.2 GB/s 121.7ulps


================================================

Windows XP 32

Device: GeForce GTX 460, 810 MHz clock, 993 MB memory.
Compute capability 2.1
Compiled with CUDA 3020.
      PowerSpectrum Unit Test #4
Stock GetPowerSpectrum():
     64 threads:       13.2 GFlops    5.3 GB/s   0.0ulps


GetPowerSpectrum() mod 1: (made Fermi & Pre-Fermi match in accuracy.)
     32 threads:        7.3 GFlops    2.9 GB/s 121.7ulps
     64 threads:       13.1 GFlops    5.2 GB/s 121.7ulps
    128 threads:       19.8 GFlops    7.9 GB/s 121.7ulps
    256 threads:       23.5 GFlops    9.4 GB/s 121.7ulps


GetPowerSpectrum() mod 2 (fixed, but slow):
     32 threads:        8.4 GFlops    3.3 GB/s   0.0ulps
     64 threads:       10.9 GFlops    4.4 GB/s   0.0ulps
    128 threads:       13.0 GFlops    5.2 GB/s   0.0ulps
    256 threads:       12.7 GFlops    5.1 GB/s   0.0ulps


GetPowerSpectrum() mod 3: (As with mod1, +threads & split loads)
     32 threads:        7.4 GFlops    3.0 GB/s 121.7ulps
     64 threads:       13.2 GFlops    5.3 GB/s 121.7ulps
    128 threads:       19.9 GFlops    8.0 GB/s 121.7ulps
    256 threads:       23.6 GFlops    9.5 GB/s 121.7ulps
    512 threads:       23.2 GFlops    9.3 GB/s 121.7ulps
   1024 threads:       16.2 GFlops    6.5 GB/s 121.7ulps
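
The size of the gap can be put in numbers with a short Python sketch. The values are the mod 3 GFlops figures from the two GTX 460 runs above; the helper name `percent_faster` is hypothetical, and a single run per OS is a small sample, so treat the result as indicative only.

```python
# mod 3 GFlops by thread count, copied from the two GTX 460 runs above
WIN7_64 = {32: 7.2, 64: 12.7, 128: 18.8, 256: 22.4, 512: 21.9, 1024: 15.6}
XP_32 = {32: 7.4, 64: 13.2, 128: 19.9, 256: 23.6, 512: 23.2, 1024: 16.2}

def percent_faster(new, old):
    """Relative gain of `new` over `old`, in percent."""
    return (new - old) / old * 100.0

for threads in sorted(WIN7_64):
    gain = percent_faster(XP_32[threads], WIN7_64[threads])
    print(f"{threads:5d} threads: XP 32 is {gain:.1f}% faster")
```

On these figures XP 32 comes out roughly 3% to 6% ahead, depending on the thread count.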


MarkJ:
I ran on all the different cards on the farm:

First up, the GT240 machine (Win7 x64), which has 3 cards of the GDDR5 variety. Device 0 is slightly slower than devices 1 and 2, although they are all the same brand/model. Output is from device 0.

Device: GeForce GT 240, 1340 MHz clock, 475 MB memory.
Compute capability 1.2
Compiled with CUDA 3020.
      PowerSpectrum Unit Test #4
Stock GetPowerSpectrum():
     64 threads:        9.9 GFlops    4.0 GB/s 1183.3ulps


GetPowerSpectrum() mod 1: (made Fermi & Pre-Fermi match in accuracy.)
     32 threads:        8.5 GFlops    3.4 GB/s 121.7ulps
     64 threads:       10.1 GFlops    4.0 GB/s 121.7ulps
    128 threads:       10.0 GFlops    4.0 GB/s 121.7ulps
    256 threads:       10.0 GFlops    4.0 GB/s 121.7ulps


GetPowerSpectrum() mod 2 (fixed, but slow):
     32 threads:        2.1 GFlops    0.8 GB/s 1183.3ulps
     64 threads:        2.1 GFlops    0.8 GB/s 1183.3ulps
    128 threads:        2.1 GFlops    0.9 GB/s 1183.3ulps
    256 threads:        2.0 GFlops    0.8 GB/s 1183.3ulps


GetPowerSpectrum() mod 3: (As with mod1, +threads & split loads)
     32 threads:        8.8 GFlops    3.5 GB/s 121.7ulps
     64 threads:       10.1 GFlops    4.0 GB/s 121.7ulps
    128 threads:       10.0 GFlops    4.0 GB/s 121.7ulps
    256 threads:       10.0 GFlops    4.0 GB/s 121.7ulps
    512 threads:       10.0 GFlops    4.0 GB/s 121.7ulps
   1024 threads: N/A


*******************************************

Next we have a GTX275 (Win7 x64):

Device: GeForce GTX 275, 1404 MHz clock, 873 MB memory.
Compute capability 1.3
Compiled with CUDA 3020.
      PowerSpectrum Unit Test #4
Stock GetPowerSpectrum():
     64 threads:       27.1 GFlops   10.8 GB/s 1183.3ulps


GetPowerSpectrum() mod 1: (made Fermi & Pre-Fermi match in accuracy.)
     32 threads:       17.1 GFlops    6.8 GB/s 121.7ulps
     64 threads:       27.1 GFlops   10.8 GB/s 121.7ulps
    128 threads:       27.3 GFlops   10.9 GB/s 121.7ulps
    256 threads:       27.3 GFlops   10.9 GB/s 121.7ulps


GetPowerSpectrum() mod 2 (fixed, but slow):
     32 threads:        6.2 GFlops    2.5 GB/s 1183.3ulps
     64 threads:        6.3 GFlops    2.5 GB/s 1183.3ulps
    128 threads:        6.0 GFlops    2.4 GB/s 1183.3ulps
    256 threads:        6.0 GFlops    2.4 GB/s 1183.3ulps


GetPowerSpectrum() mod 3: (As with mod1, +threads & split loads)
     32 threads:       17.1 GFlops    6.9 GB/s 121.7ulps
     64 threads:       27.1 GFlops   10.8 GB/s 121.7ulps
    128 threads:       27.4 GFlops   11.0 GB/s 121.7ulps
    256 threads:       27.2 GFlops   10.9 GB/s 121.7ulps
    512 threads:       27.3 GFlops   10.9 GB/s 121.7ulps
   1024 threads: N/A


*******************************************

Next a GTX295. Yeah, I know various people have run these. Win7 x64 again:

Device: GeForce GTX 295, 1242 MHz clock, 874 MB memory.
Compute capability 1.3
Compiled with CUDA 3020.
      PowerSpectrum Unit Test #4
Stock GetPowerSpectrum():
     64 threads:       24.2 GFlops    9.7 GB/s 1183.3ulps


GetPowerSpectrum() mod 1: (made Fermi & Pre-Fermi match in accuracy.)
     32 threads:       15.6 GFlops    6.3 GB/s 121.7ulps
     64 threads:       24.6 GFlops    9.8 GB/s 121.7ulps
    128 threads:       24.8 GFlops    9.9 GB/s 121.7ulps
    256 threads:       24.7 GFlops    9.9 GB/s 121.7ulps


GetPowerSpectrum() mod 2 (fixed, but slow):
     32 threads:        5.6 GFlops    2.2 GB/s 1183.3ulps
     64 threads:        5.7 GFlops    2.3 GB/s 1183.3ulps
    128 threads:        5.5 GFlops    2.2 GB/s 1183.3ulps
    256 threads:        5.4 GFlops    2.2 GB/s 1183.3ulps


GetPowerSpectrum() mod 3: (As with mod1, +threads & split loads)
     32 threads:       15.6 GFlops    6.3 GB/s 121.7ulps
     64 threads:       24.6 GFlops    9.8 GB/s 121.7ulps
    128 threads:       24.8 GFlops    9.9 GB/s 121.7ulps
    256 threads:       24.7 GFlops    9.9 GB/s 121.7ulps
    512 threads:       24.7 GFlops    9.9 GB/s 121.7ulps
   1024 threads: N/A


*******************************************

Then a GTX460 (factory OC'ed version from EVGA), once again under Win7 x64:

Device: GeForce GTX 460, 810 MHz clock, 738 MB memory.
Compute capability 2.1
Compiled with CUDA 3020.
      PowerSpectrum Unit Test #4
Stock GetPowerSpectrum():
     64 threads:       12.0 GFlops    4.8 GB/s   0.0ulps


GetPowerSpectrum() mod 1: (made Fermi & Pre-Fermi match in accuracy.)
     32 threads:        6.9 GFlops    2.8 GB/s 121.7ulps
     64 threads:       12.0 GFlops    4.8 GB/s 121.7ulps
    128 threads:       17.4 GFlops    6.9 GB/s 121.7ulps
    256 threads:       19.1 GFlops    7.6 GB/s 121.7ulps


GetPowerSpectrum() mod 2 (fixed, but slow):
     32 threads:        7.6 GFlops    3.0 GB/s   0.0ulps
     64 threads:       10.0 GFlops    4.0 GB/s   0.0ulps
    128 threads:       11.9 GFlops    4.8 GB/s   0.0ulps
    256 threads:       11.7 GFlops    4.7 GB/s   0.0ulps


GetPowerSpectrum() mod 3: (As with mod1, +threads & split loads)
     32 threads:        7.0 GFlops    2.8 GB/s 121.7ulps
     64 threads:       12.1 GFlops    4.8 GB/s 121.7ulps
    128 threads:       17.4 GFlops    6.9 GB/s 121.7ulps
    256 threads:       19.1 GFlops    7.7 GB/s 121.7ulps
    512 threads:       18.8 GFlops    7.5 GB/s 121.7ulps
   1024 threads:       14.3 GFlops    5.7 GB/s 121.7ulps


*******************************************

And lastly, just for comparison, the same brand/model factory OC'ed GTX460 but under WinXP:

Device: GeForce GTX 460, 1350 MHz clock, 768 MB memory.
Compute capability 2.1
Compiled with CUDA 3020.
      PowerSpectrum Unit Test #4
Stock GetPowerSpectrum():
     64 threads:       12.1 GFlops    4.8 GB/s   0.0ulps


GetPowerSpectrum() mod 1: (made Fermi & Pre-Fermi match in accuracy.)
     32 threads:        6.9 GFlops    2.8 GB/s 121.7ulps
     64 threads:       12.0 GFlops    4.8 GB/s 121.7ulps
    128 threads:       17.4 GFlops    7.0 GB/s 121.7ulps
    256 threads:       19.1 GFlops    7.6 GB/s 121.7ulps


GetPowerSpectrum() mod 2 (fixed, but slow):
     32 threads:        7.6 GFlops    3.0 GB/s   0.0ulps
     64 threads:       10.0 GFlops    4.0 GB/s   0.0ulps
    128 threads:       11.9 GFlops    4.8 GB/s   0.0ulps
    256 threads:       11.7 GFlops    4.7 GB/s   0.0ulps


GetPowerSpectrum() mod 3: (As with mod1, +threads & split loads)
     32 threads:        7.0 GFlops    2.8 GB/s 121.7ulps
     64 threads:       12.1 GFlops    4.8 GB/s 121.7ulps
    128 threads:       17.4 GFlops    7.0 GB/s 121.7ulps
    256 threads:       19.1 GFlops    7.7 GB/s 121.7ulps
    512 threads:       18.9 GFlops    7.5 GB/s 121.7ulps
   1024 threads:       14.3 GFlops    5.7 GB/s 121.7ulps


Cheers,
MarkJ
