[Split] PowerSpectrum Unit Test

perryjay:
I changed over to Win 7 64-bit just before we came back up, so I decided to run test 6 again. Not sure how much of a difference it will make.

Microsoft Windows [Version 6.1.7600]
Copyright (c) 2009 Microsoft Corporation.  All rights reserved.

C:\Users\perry>cd\test

C:\test>powerspectrum4.exe > results.txt
'powerspectrum4.exe' is not recognized as an internal or external command,
operable program or batch file.

C:\test>powerspectrum6.exe
'powerspectrum6.exe' is not recognized as an internal or external command,
operable program or batch file.

C:\test>powerspectrumtest6.exe

Device: GeForce 9500 GT, 1400 MHz clock, 1006 MB memory.
Compute capability 1.1
Compiled with CUDA 3020.
                PowerSpectrum+summax Unit test #6 (pinned mem)
Stock:
 PwrSpec<    64>    2.9 GFlops   11.4 GB/s 1183.3ulps

 SumMax (    64)    0.3 GFlops    1.5 GB/s
Every ifft average & peak OK

 PS+SuMx(    64)    1.0 GFlops    4.1 GB/s


GetPowerSpectrum() choice for Opt1: 64 thrds/block
     64 threads:        2.9 GFlops   11.5 GB/s 121.7ulps


Opt1 (PSmod3+SM): 64 thrds/block
PowerSpectrumSumMax array pinned in host memory.
   64 threads, fftlen 64: (worst case: full summax copy)
         1.6 GFlops    6.6 GB/s 121.7ulps
Every ifft average & peak OK
   64 threads, fftlen 64: (best case, nothing to update)
         1.8 GFlops    7.3 GB/s 121.7ulps



Leave it to me to mess up: EVGA Precision wasn't holding the o/c. I looked all over the place but couldn't find the little button to make it apply at startup until just now. Here's the corrected test...
Microsoft Windows [Version 6.1.7600]
Copyright (c) 2009 Microsoft Corporation.  All rights reserved.

C:\Users\perry>cd\test

C:\test>powerspectrumtest6.exe

Device: GeForce 9500 GT, 1848 MHz clock, 1006 MB memory.
Compute capability 1.1
Compiled with CUDA 3020.
                PowerSpectrum+summax Unit test #6 (pinned mem)
Stock:
 PwrSpec<    64>    2.9 GFlops   11.5 GB/s 1183.3ulps

 SumMax (    64)    0.4 GFlops    1.8 GB/s
Every ifft average & peak OK

 PS+SuMx(    64)    1.2 GFlops    4.7 GB/s


GetPowerSpectrum() choice for Opt1: 64 thrds/block
     64 threads:        2.9 GFlops   11.6 GB/s 121.7ulps


Opt1 (PSmod3+SM): 64 thrds/block
PowerSpectrumSumMax array pinned in host memory.
   64 threads, fftlen 64: (worst case: full summax copy)
         0.7 GFlops    3.0 GB/s 121.7ulps
Every ifft average & peak OK
   64 threads, fftlen 64: (best case, nothing to update)
         2.1 GFlops    8.3 GB/s 121.7ulps



C:\test>

Jason G:
Updated first post:

--- Quote ---Update: PowerSpectrum(+summax reduction) Test #7
 - completed summax reduction sizes 8 through 64
 - refined Opt1 a little, should be a tad faster for size 64 that was in prior test
 - tidied up test result layout
 - enabled pinned memory use for Opt1 on all Cuda Capable cards (including cc1.0)
--- End quote ---

Please test on all CUDA-capable cards.
Example output:
Device: GeForce GTX 480, 810 MHz clock, 1503 MB memory.
Compute capability 2.0
Compiled with CUDA 3020.
                PowerSpectrum+summax Unit test #7 (Faster reductions)
Stock:
 PS+SuMx(     8) [OK]    2.9 GFlops   12.9 GB/s
 PS+SuMx(    16) [OK]    3.9 GFlops   16.2 GB/s
 PS+SuMx(    32) [OK]    3.9 GFlops   15.8 GB/s
 PS+SuMx(    64) [OK]    6.0 GFlops   24.2 GB/s


Opt1: 256 thrds/block
                        worst case              best case
                   GFlps  GB/s ulps         GFlps  GB/s ulps
 PS+SuMx(     8)    4.3   18.6 121.7 [OK]   22.8   99.7 121.7
 PS+SuMx(    16)    6.7   28.1 121.7 [OK]   21.4   89.7 121.7
 PS+SuMx(    32)    9.4   38.6 121.7 [OK]   20.8   85.2 121.7
 PS+SuMx(    64)   11.7   47.4 121.7 [OK]   20.4   82.6 121.7
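
For anyone unsure what the PS+SuMx rows are measuring, here is a minimal Python sketch of the computation as I understand it: a power spectrum (re^2 + im^2 per complex sample) followed by a sum/max reduction over each chunk of fftlen samples. This is an assumption about what the kernels compute, not the actual CUDA code from the test, and the function names `power_spectrum` and `sum_max` are made up for illustration.

```python
def power_spectrum(samples):
    """Power of each complex sample: re^2 + im^2."""
    return [s.real * s.real + s.imag * s.imag for s in samples]


def sum_max(power, fftlen):
    """Reduce each fftlen-sized chunk to (sum, max, index of max in chunk)."""
    results = []
    for start in range(0, len(power), fftlen):
        chunk = power[start:start + fftlen]
        peak = max(chunk)
        results.append((sum(chunk), peak, chunk.index(peak)))
    return results


# Tiny made-up input: 8 complex samples, reduced in chunks of 4.
samples = [complex(1, 0), complex(0, 2), complex(3, 4), complex(0, 0),
           complex(1, 1), complex(2, 0), complex(0, 1), complex(0, 3)]
power = power_spectrum(samples)
reduced = sum_max(power, 4)
print(power)
print(reduced)
```

The sizes 8 through 64 in the tables would correspond to `fftlen` here. The GPU versions do the same reduction in parallel, which is where the threads-per-block choice comes in.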

Claggy:
My 9800GTX+ on Win 7 x64:

Device: GeForce 9800 GTX/9800 GTX+, 1900 MHz clock, 496 MB memory.
Compute capability 1.1
Compiled with CUDA 3020.
                PowerSpectrum+summax Unit test #7 (Faster reductions)
Stock:
 PS+SuMx(     8) [OK]    2.0 GFlops    8.8 GB/s
 PS+SuMx(    16) [OK]    2.6 GFlops   10.7 GB/s
 PS+SuMx(    32) [OK]    2.8 GFlops   11.5 GB/s
 PS+SuMx(    64) [OK]    4.5 GFlops   18.1 GB/s


Opt1: 64 thrds/block
                        worst case              best case
                   GFlps  GB/s ulps         GFlps  GB/s ulps
 PS+SuMx(     8)    2.7   11.8 121.7 [OK]    7.1   31.0 121.7
 PS+SuMx(    16)    4.0   16.5 121.7 [OK]    7.7   32.1 121.7
 PS+SuMx(    32)    4.9   19.9 121.7 [OK]    7.3   29.7 121.7
 PS+SuMx(    64)    6.6   26.7 121.7 [OK]    8.9   35.9 121.7

And on my 128 MB 8400M GS on Vista 32-bit:

Device: GeForce 8400M GS, 800 MHz clock, 114 MB memory.
Compute capability 1.1
Compiled with CUDA 3020.
                PowerSpectrum+summax Unit test #7 (Faster reductions)
Stock:
 PS+SuMx(     8) [OK]    0.3 GFlops    1.3 GB/s
 PS+SuMx(    16) [OK]    0.3 GFlops    1.2 GB/s
 PS+SuMx(    32) [OK]    0.2 GFlops    0.9 GB/s
 PS+SuMx(    64) [OK]    0.4 GFlops    1.5 GB/s


Opt1: 64 thrds/block
                        worst case              best case
                   GFlps  GB/s ulps         GFlps  GB/s ulps
 PS+SuMx(     8)    0.4    1.9 121.7 [OK]    0.5    2.1 121.7
 PS+SuMx(    16)    0.4    1.8 121.7 [OK]    0.5    1.9 121.7
 PS+SuMx(    32)    0.4    1.7 121.7 [OK]    0.4    1.8 121.7
 PS+SuMx(    64)    0.5    2.1 121.7 [OK]    0.5    2.2 121.7
Claggy

Jason G:
LoL, I thought the stock code was already G80-optimised. Guess I was WRONG.
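
For a rough sense of the gap, the speedups can be worked out from Claggy's 9800 GTX+ figures above. A minimal Python sketch: the GFlops numbers are copied from that post, while the variable and function names are only for illustration.

```python
# GFlops from Claggy's 9800 GTX+ post, keyed by FFT length.
stock = {8: 2.0, 16: 2.6, 32: 2.8, 64: 4.5}
opt1_worst = {8: 2.7, 16: 4.0, 32: 4.9, 64: 6.6}
opt1_best = {8: 7.1, 16: 7.7, 32: 7.3, 64: 8.9}


def speedup(new, old):
    """Ratio of new to old throughput for each FFT length."""
    return {n: new[n] / old[n] for n in old}


worst = speedup(opt1_worst, stock)
best = speedup(opt1_best, stock)
for n in stock:
    print(f"fftlen {n:2d}: worst case {worst[n]:.2f}x, best case {best[n]:.2f}x")
```

On these figures Opt1 runs at roughly 1.35x to 1.75x stock in the worst case and roughly 2x to 3.5x in the best case.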

Miep:
Device: Quadro FX 570M, 950 MHz clock, 242 MB memory.
Compute capability 1.1
Compiled with CUDA 3020.
                PowerSpectrum+summax Unit test #7 (Faster reductions)
Stock:
 PS+SuMx(     8) [OK]    0.57 +- 0.048 GFlops    2.49 +- 0.24 GB/s
 PS+SuMx(    16) [OK]    0.57 +- 0.048 GFlops    2.39 +- 0.19 GB/s
 PS+SuMx(    32) [OK]    0.49 +- 0.031 GFlops    2.01 +- 0.11 GB/s
 PS+SuMx(    64) [OK]    0.80 +- 0.105 GFlops    3.20 +- 0.41 GB/s


Opt1: 64 thrds/block
                    worst case                                 best case
                    GFlps           GB/s          ulps         GFlps          GB/s          ulps
 PS+SuMx(     8)    0.87 +- 0.048   3.92 +- 0.20  121.7 [OK]   1.21 +- 0.03   5.49 +- 0.03  121.7
 PS+SuMx(    16)    0.89 +- 0.19    3.70 +- 0.78  121.7 [OK]   1.20 +- 0      5.00 +- 0     121.7
 PS+SuMx(    32)    0.97 +- 0.048   3.92 +- 0.19  121.7 [OK]   1.10 +- 0      4.60 +- 0     121.7
 PS+SuMx(    64)    1.24 +- 0.11    5.02 +- 0.42  121.7 [OK]   1.41 +- 0.03   5.85 +- 0.05  121.7
Average and standard deviation over 10 runs.
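
For anyone who wants to report results the same way, here is a minimal Python sketch of the averaging step. The readings below are hypothetical placeholders, not the raw data behind the table, and using the sample standard deviation (`statistics.stdev`) rather than the population one is an assumption.

```python
import statistics


def summarize(runs):
    """Return (mean, sample standard deviation) of a list of measurements."""
    return statistics.mean(runs), statistics.stdev(runs)


# Hypothetical GFlops readings from 10 runs; NOT the raw data from the post.
hypothetical_runs = [0.80, 0.95, 0.70, 0.78, 0.85, 0.66, 0.92, 0.74, 0.81, 0.79]
mean, sd = summarize(hypothetical_runs)
print(f"{mean:.2f} +- {sd:.3f} GFlops")
```

Swap in `statistics.pstdev` if the population standard deviation is wanted instead.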
