[Split] PowerSpectrum Unit Test
Claggy:
Here are the results from my 9800GTX+ on Win 7 64-bit:
Device: GeForce 9800 GTX/9800 GTX+, 1900 MHz clock, 496 MB memory.
Compute capability 1.1
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #5
Stock:
PwrSpec< 64> 16.0 GFlops 64.2 GB/s 1183.3ulps
SumMax ( 64) 1.4 GFlops 6.0 GB/s
Every ifft average & peak OK
PS+SuMx( 64) 4.5 GFlops 18.3 GB/s
GetPowerSpectrum() choice for Opt1: 64 thrds/block
64 threads: 16.2 GFlops 64.7 GB/s 121.7ulps
Opt1 (PSmod3+SM): 64 thrds/block
64 threads, fftlen 64: (worst case: full summax copy)
6.0 GFlops 24.3 GB/s 121.7ulps
Every ifft average & peak OK
64 threads, fftlen 64: (best case, nothing to update)
7.9 GFlops 32.1 GB/s 121.7ulps
And from my 128 MB 8400M GS:
Device: GeForce 8400M GS, 800 MHz clock, 114 MB memory.
Compute capability 1.1
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #5
Stock:
PwrSpec< 64> 1.2 GFlops 4.8 GB/s 1183.3ulps
SumMax ( 64) 0.1 GFlops 0.5 GB/s
Every ifft average & peak OK
PS+SuMx( 64) 0.4 GFlops 1.5 GB/s
GetPowerSpectrum() choice for Opt1: 64 thrds/block
64 threads: 1.2 GFlops 4.8 GB/s 121.7ulps
Opt1 (PSmod3+SM): 64 thrds/block
64 threads, fftlen 64: (worst case: full summax copy)
0.6 GFlops 2.4 GB/s 121.7ulps
Every ifft average & peak OK
64 threads, fftlen 64: (best case, nothing to update)
0.6 GFlops 2.5 GB/s 121.7ulps
Claggy
Edit: Here are the results from my 9800GTX+ on Windows Vista 64-bit:
Microsoft Windows [Version 6.0.6002]
Copyright (c) 2006 Microsoft Corporation. All rights reserved.
Device: GeForce 9800 GTX/9800 GTX+, 1900 MHz clock, 496 MB memory.
Compute capability 1.1
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #5
Stock:
PwrSpec< 64> 16.0 GFlops 64.1 GB/s 1183.3ulps
SumMax ( 64) 1.4 GFlops 5.7 GB/s
Every ifft average & peak OK
PS+SuMx( 64) 4.3 GFlops 17.6 GB/s
GetPowerSpectrum() choice for Opt1: 64 thrds/block
64 threads: 16.2 GFlops 64.7 GB/s 121.7ulps
Opt1 (PSmod3+SM): 64 thrds/block
64 threads, fftlen 64: (worst case: full summax copy)
5.8 GFlops 23.4 GB/s 121.7ulps
Every ifft average & peak OK
64 threads, fftlen 64: (best case, nothing to update)
7.5 GFlops 30.4 GB/s 121.7ulps
PatrickV2:
Ran it on my rig (Q6600/8GB/8800GTX/Win7-64), results:
Device: GeForce 8800 GTX, 1350 MHz clock, 731 MB memory.
Compute capability 1.0
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #5
Stock:
PwrSpec< 64> 18.1 GFlops 72.4 GB/s 1183.3ulps
SumMax ( 64) 1.2 GFlops 4.9 GB/s
Every ifft average & peak OK
PS+SuMx( 64) 3.9 GFlops 15.6 GB/s
GetPowerSpectrum() choice for Opt1: 64 thrds/block
64 threads: 18.2 GFlops 72.8 GB/s 121.7ulps
Opt1 (PSmod3+SM): 64 thrds/block
64 threads, fftlen 64: (worst case: full summax copy)
5.4 GFlops 22.0 GB/s 121.7ulps
Every ifft average & peak OK
64 threads, fftlen 64: (best case, nothing to update)
6.6 GFlops 26.6 GB/s 121.7ulps
Are you also interested in a run under WinXP?
Regards,
Patrick.
Ghost0210:
Win7 x64 - GTX465:
Device: GeForce GTX 465, 1215 MHz clock, 994 MB memory.
Compute capability 2.0
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #5
Stock:
PwrSpec< 64> 15.9 GFlops 63.8 GB/s 0.0ulps
SumMax ( 64) 1.3 GFlops 5.4 GB/s
Every ifft average & peak OK
PS+SuMx( 64) 4.1 GFlops 16.6 GB/s
GetPowerSpectrum() choice for Opt1: 256 thrds/block
256 threads: 23.1 GFlops 92.5 GB/s 121.7ulps
Opt1 (PSmod3+SM): 256 thrds/block
256 threads, fftlen 64: (worst case: full summax copy)
6.0 GFlops 24.2 GB/s 121.7ulps
Every ifft average & peak OK
256 threads, fftlen 64: (best case, nothing to update)
8.7 GFlops 35.4 GB/s 121.7ulps
Jason G:
--- Quote from: PatrickV2 on 29 Nov 2010, 01:44:54 pm ---....Are you also interested in a run under WinXP? ...
--- End quote ---
Sure! It'll be interesting to see whether I'm closing the gap or making it wider ;).
Analysing your first result....
8800GTX
Average, peak calcs, thread-count heuristic: OK
worst case speedup: ~38%
best case speedup: ~69%
Jason G:
--- Quote from: Ghost on 29 Nov 2010, 01:47:57 pm ---Win7 x64 - GTX465:
--- End quote ---
Thanks, analysing your result too....
GTX 465
Average, peak calcs, thread-count heuristic: OK
worst case speedup: ~46%
best case speedup: ~112%
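
For anyone wanting to check their own run: the speedup percentages quoted for the 8800GTX and GTX 465 appear to match the ratio of the Opt1 (PSmod3+SM) GFlops figure to the stock PS+SuMx GFlops figure in each posted result. The thread does not state the formula, so the following is only a minimal sketch under that assumption; `speedup_pct` and `summarize` are hypothetical names used for illustration, and the input numbers are copied from the posted results.

```python
def speedup_pct(stock_gflops: float, opt_gflops: float) -> float:
    """Percentage throughput gain of the optimised path over the stock path."""
    if stock_gflops <= 0:
        raise ValueError("stock throughput must be positive")
    return (opt_gflops / stock_gflops - 1.0) * 100.0


# (stock PS+SuMx, Opt1 worst case, Opt1 best case), all in GFlops,
# taken from the results posted earlier in the thread.
RESULTS = {
    "8800GTX": (3.9, 5.4, 6.6),
    "GTX 465": (4.1, 6.0, 8.7),
}


def summarize(results):
    """Return {card: (worst-case %, best-case %)} rounded to whole percent."""
    return {
        card: (round(speedup_pct(stock, worst)), round(speedup_pct(stock, best)))
        for card, (stock, worst, best) in results.items()
    }


if __name__ == "__main__":
    for card, (worst, best) in summarize(RESULTS).items():
        print(f"{card}: worst case ~{worst}%, best case ~{best}%")
```

To compare another card, add its three GFlops figures (stock PS+SuMx, Opt1 worst case, Opt1 best case) to `RESULTS`. Because the posted figures are rounded to one decimal place, the percentages are approximate, especially on slow cards where the values are small.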