[Split] PowerSpectrum Unit Test
_heinz:
C:\ap_j>cd g_fft
Stopping Boinc...
starting PowerSpectrum4.exe
.
Device: GeForce GTX 470, 810 MHz clock, 1248 MB memory.
Compute capability 2.0
Compiled with CUDA 3020.
PowerSpectrum Unit Test #4
Stock GetPowerSpectrum():
64 threads: 20.6 GFlops 8.2 GB/s 0.0ulps
GetPowerSpectrum() mod 1: (made Fermi & Pre-Fermi match in accuracy.)
32 threads: 12.5 GFlops 5.0 GB/s 121.7ulps
64 threads: 20.5 GFlops 8.2 GB/s 121.7ulps
128 threads: 27.6 GFlops 11.0 GB/s 121.7ulps
256 threads: 29.9 GFlops 12.0 GB/s 121.7ulps
GetPowerSpectrum() mod 2 (fixed, but slow):
32 threads: 13.5 GFlops 5.4 GB/s 0.0ulps
64 threads: 16.7 GFlops 6.7 GB/s 0.0ulps
128 threads: 17.2 GFlops 6.9 GB/s 0.0ulps
256 threads: 15.7 GFlops 6.3 GB/s 0.0ulps
GetPowerSpectrum() mod 3: (As with mod1, +threads & split loads)
32 threads: 12.6 GFlops 5.0 GB/s 121.7ulps
64 threads: 20.6 GFlops 8.2 GB/s 121.7ulps
128 threads: 27.5 GFlops 11.0 GB/s 121.7ulps
256 threads: 30.0 GFlops 12.0 GB/s 121.7ulps
512 threads: 29.7 GFlops 11.9 GB/s 121.7ulps
1024 threads: 25.6 GFlops 10.2 GB/s 121.7ulps
.
Done
Restarting Boinc...
Press any key to continue . . .
heinz
Ghost0210:
Mod 4 Results on my 465:
Device: GeForce GTX 465, 1215 MHz clock, 994 MB memory.
Compute capability 2.0
Compiled with CUDA 3020.
PowerSpectrum Unit Test #4
Stock GetPowerSpectrum():
64 threads: 16.0 GFlops 6.4 GB/s 0.0ulps
GetPowerSpectrum() mod 1: (made Fermi & Pre-Fermi match in accuracy.)
32 threads: 9.8 GFlops 3.9 GB/s 121.7ulps
64 threads: 15.9 GFlops 6.3 GB/s 121.7ulps
128 threads: 21.0 GFlops 8.4 GB/s 121.7ulps
256 threads: 23.1 GFlops 9.2 GB/s 121.7ulps
GetPowerSpectrum() mod 2 (fixed, but slow):
32 threads: 10.7 GFlops 4.3 GB/s 0.0ulps
64 threads: 13.1 GFlops 5.2 GB/s 0.0ulps
128 threads: 13.3 GFlops 5.3 GB/s 0.0ulps
256 threads: 12.1 GFlops 4.8 GB/s 0.0ulps
GetPowerSpectrum() mod 3: (As with mod1, +threads & split loads)
32 threads: 9.8 GFlops 3.9 GB/s 121.7ulps
64 threads: 15.9 GFlops 6.4 GB/s 121.7ulps
128 threads: 21.0 GFlops 8.4 GB/s 121.7ulps
256 threads: 23.1 GFlops 9.2 GB/s 121.7ulps
512 threads: 22.9 GFlops 9.1 GB/s 121.7ulps
1024 threads: 19.5 GFlops 7.8 GB/s 121.7ulps
Edit: Corrected figures. The card was running downclocked in the previous test (no tasks running); stock 465 speeds are now shown.
M_M:
Mod1 & Mod3 at 256 threads seem to suit Fermi best...
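To make that comparison concrete, here is a small Python sketch that picks the fastest block size from the mod 3 figures heinz posted for the GTX 470 above. The dictionary name and the helper `best_block_size` are made up for illustration; only the numbers come from the log.

```python
# GFlops per thread-block size, copied from the GTX 470 "mod 3" run above.
gtx470_mod3_gflops = {
    32: 12.6,
    64: 20.6,
    128: 27.5,
    256: 30.0,
    512: 29.7,
    1024: 25.6,
}


def best_block_size(gflops_by_threads):
    """Return the (threads, gflops) pair with the highest throughput."""
    threads = max(gflops_by_threads, key=gflops_by_threads.get)
    return threads, gflops_by_threads[threads]


threads, gflops = best_block_size(gtx470_mod3_gflops)
print(f"best: {threads} threads at {gflops} GFlops")
```

On this card 512 threads is within about 1% of 256 threads (29.7 vs 30.0 GFlops), so the two are close to a tie.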
Frizz:
Windows XP 32-bit seems to be faster than Windows 7 64-bit.
I also noticed that for AP, on both Nvidia and AMD.
Windows 7 64
Device: GeForce GTX 460, 1451 MHz clock, 1024 MB memory.
Compute capability 2.1
Compiled with CUDA 3020.
PowerSpectrum Unit Test #4
Stock GetPowerSpectrum():
64 threads: 12.7 GFlops 5.1 GB/s 0.0ulps
GetPowerSpectrum() mod 1: (made Fermi & Pre-Fermi match in accuracy.)
32 threads: 7.1 GFlops 2.8 GB/s 121.7ulps
64 threads: 12.6 GFlops 5.0 GB/s 121.7ulps
128 threads: 18.7 GFlops 7.5 GB/s 121.7ulps
256 threads: 22.4 GFlops 9.0 GB/s 121.7ulps
GetPowerSpectrum() mod 2 (fixed, but slow):
32 threads: 8.0 GFlops 3.2 GB/s 0.0ulps
64 threads: 10.4 GFlops 4.2 GB/s 0.0ulps
128 threads: 12.5 GFlops 5.0 GB/s 0.0ulps
256 threads: 12.3 GFlops 4.9 GB/s 0.0ulps
GetPowerSpectrum() mod 3: (As with mod1, +threads & split loads)
32 threads: 7.2 GFlops 2.9 GB/s 121.7ulps
64 threads: 12.7 GFlops 5.1 GB/s 121.7ulps
128 threads: 18.8 GFlops 7.5 GB/s 121.7ulps
256 threads: 22.4 GFlops 9.0 GB/s 121.7ulps
512 threads: 21.9 GFlops 8.8 GB/s 121.7ulps
1024 threads: 15.6 GFlops 6.2 GB/s 121.7ulps
================================================
Windows XP 32
Device: GeForce GTX 460, 810 MHz clock, 993 MB memory.
Compute capability 2.1
Compiled with CUDA 3020.
PowerSpectrum Unit Test #4
Stock GetPowerSpectrum():
64 threads: 13.2 GFlops 5.3 GB/s 0.0ulps
GetPowerSpectrum() mod 1: (made Fermi & Pre-Fermi match in accuracy.)
32 threads: 7.3 GFlops 2.9 GB/s 121.7ulps
64 threads: 13.1 GFlops 5.2 GB/s 121.7ulps
128 threads: 19.8 GFlops 7.9 GB/s 121.7ulps
256 threads: 23.5 GFlops 9.4 GB/s 121.7ulps
GetPowerSpectrum() mod 2 (fixed, but slow):
32 threads: 8.4 GFlops 3.3 GB/s 0.0ulps
64 threads: 10.9 GFlops 4.4 GB/s 0.0ulps
128 threads: 13.0 GFlops 5.2 GB/s 0.0ulps
256 threads: 12.7 GFlops 5.1 GB/s 0.0ulps
GetPowerSpectrum() mod 3: (As with mod1, +threads & split loads)
32 threads: 7.4 GFlops 3.0 GB/s 121.7ulps
64 threads: 13.2 GFlops 5.3 GB/s 121.7ulps
128 threads: 19.9 GFlops 8.0 GB/s 121.7ulps
256 threads: 23.6 GFlops 9.5 GB/s 121.7ulps
512 threads: 23.2 GFlops 9.3 GB/s 121.7ulps
1024 threads: 16.2 GFlops 6.5 GB/s 121.7ulps
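For a rough sense of the size of the gap, the sketch below computes the relative difference between the two GTX 460 runs above, using the mod 3 GFlops figures. The function name `percent_faster` is hypothetical, and a single run per OS is not enough to rule out noise.

```python
# Mod 3 GFlops by threads per block, copied from the two GTX 460 runs above.
win7_64 = {32: 7.2, 64: 12.7, 128: 18.8, 256: 22.4, 512: 21.9, 1024: 15.6}
winxp_32 = {32: 7.4, 64: 13.2, 128: 19.9, 256: 23.6, 512: 23.2, 1024: 16.2}


def percent_faster(new, old):
    """Relative gain of `new` over `old`, in percent."""
    return (new - old) / old * 100.0


gains = {}
for threads in sorted(win7_64):
    gains[threads] = percent_faster(winxp_32[threads], win7_64[threads])
    print(f"{threads:>5} threads: XP faster by {gains[threads]:.1f}%")
```

On these figures XP comes out ahead at every block size, by somewhere between about 3% and 6%.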
MarkJ:
I ran on all the different cards on the farm:
First up, the GT240 machine (Win7 x64) has 3 cards, the GDDR5 variety. Device 0 is slightly slower than devices 1 and 2, although they are all the same brand/model. Output is from device 0.
Device: GeForce GT 240, 1340 MHz clock, 475 MB memory.
Compute capability 1.2
Compiled with CUDA 3020.
PowerSpectrum Unit Test #4
Stock GetPowerSpectrum():
64 threads: 9.9 GFlops 4.0 GB/s 1183.3ulps
GetPowerSpectrum() mod 1: (made Fermi & Pre-Fermi match in accuracy.)
32 threads: 8.5 GFlops 3.4 GB/s 121.7ulps
64 threads: 10.1 GFlops 4.0 GB/s 121.7ulps
128 threads: 10.0 GFlops 4.0 GB/s 121.7ulps
256 threads: 10.0 GFlops 4.0 GB/s 121.7ulps
GetPowerSpectrum() mod 2 (fixed, but slow):
32 threads: 2.1 GFlops 0.8 GB/s 1183.3ulps
64 threads: 2.1 GFlops 0.8 GB/s 1183.3ulps
128 threads: 2.1 GFlops 0.9 GB/s 1183.3ulps
256 threads: 2.0 GFlops 0.8 GB/s 1183.3ulps
GetPowerSpectrum() mod 3: (As with mod1, +threads & split loads)
32 threads: 8.8 GFlops 3.5 GB/s 121.7ulps
64 threads: 10.1 GFlops 4.0 GB/s 121.7ulps
128 threads: 10.0 GFlops 4.0 GB/s 121.7ulps
256 threads: 10.0 GFlops 4.0 GB/s 121.7ulps
512 threads: 10.0 GFlops 4.0 GB/s 121.7ulps
1024 threads: N/A
*******************************************
Next we have a GTX275 (win7 x64):
Device: GeForce GTX 275, 1404 MHz clock, 873 MB memory.
Compute capability 1.3
Compiled with CUDA 3020.
PowerSpectrum Unit Test #4
Stock GetPowerSpectrum():
64 threads: 27.1 GFlops 10.8 GB/s 1183.3ulps
GetPowerSpectrum() mod 1: (made Fermi & Pre-Fermi match in accuracy.)
32 threads: 17.1 GFlops 6.8 GB/s 121.7ulps
64 threads: 27.1 GFlops 10.8 GB/s 121.7ulps
128 threads: 27.3 GFlops 10.9 GB/s 121.7ulps
256 threads: 27.3 GFlops 10.9 GB/s 121.7ulps
GetPowerSpectrum() mod 2 (fixed, but slow):
32 threads: 6.2 GFlops 2.5 GB/s 1183.3ulps
64 threads: 6.3 GFlops 2.5 GB/s 1183.3ulps
128 threads: 6.0 GFlops 2.4 GB/s 1183.3ulps
256 threads: 6.0 GFlops 2.4 GB/s 1183.3ulps
GetPowerSpectrum() mod 3: (As with mod1, +threads & split loads)
32 threads: 17.1 GFlops 6.9 GB/s 121.7ulps
64 threads: 27.1 GFlops 10.8 GB/s 121.7ulps
128 threads: 27.4 GFlops 11.0 GB/s 121.7ulps
256 threads: 27.2 GFlops 10.9 GB/s 121.7ulps
512 threads: 27.3 GFlops 10.9 GB/s 121.7ulps
1024 threads: N/A
*******************************************
Next a GTX295. Yeah, I know various people have run these. Win7 x64 again
Device: GeForce GTX 295, 1242 MHz clock, 874 MB memory.
Compute capability 1.3
Compiled with CUDA 3020.
PowerSpectrum Unit Test #4
Stock GetPowerSpectrum():
64 threads: 24.2 GFlops 9.7 GB/s 1183.3ulps
GetPowerSpectrum() mod 1: (made Fermi & Pre-Fermi match in accuracy.)
32 threads: 15.6 GFlops 6.3 GB/s 121.7ulps
64 threads: 24.6 GFlops 9.8 GB/s 121.7ulps
128 threads: 24.8 GFlops 9.9 GB/s 121.7ulps
256 threads: 24.7 GFlops 9.9 GB/s 121.7ulps
GetPowerSpectrum() mod 2 (fixed, but slow):
32 threads: 5.6 GFlops 2.2 GB/s 1183.3ulps
64 threads: 5.7 GFlops 2.3 GB/s 1183.3ulps
128 threads: 5.5 GFlops 2.2 GB/s 1183.3ulps
256 threads: 5.4 GFlops 2.2 GB/s 1183.3ulps
GetPowerSpectrum() mod 3: (As with mod1, +threads & split loads)
32 threads: 15.6 GFlops 6.3 GB/s 121.7ulps
64 threads: 24.6 GFlops 9.8 GB/s 121.7ulps
128 threads: 24.8 GFlops 9.9 GB/s 121.7ulps
256 threads: 24.7 GFlops 9.9 GB/s 121.7ulps
512 threads: 24.7 GFlops 9.9 GB/s 121.7ulps
1024 threads: N/A
*******************************************
Then a GTX460 (factory OC'ed version from EVGA), once again under Win7 x64:
Device: GeForce GTX 460, 810 MHz clock, 738 MB memory.
Compute capability 2.1
Compiled with CUDA 3020.
PowerSpectrum Unit Test #4
Stock GetPowerSpectrum():
64 threads: 12.0 GFlops 4.8 GB/s 0.0ulps
GetPowerSpectrum() mod 1: (made Fermi & Pre-Fermi match in accuracy.)
32 threads: 6.9 GFlops 2.8 GB/s 121.7ulps
64 threads: 12.0 GFlops 4.8 GB/s 121.7ulps
128 threads: 17.4 GFlops 6.9 GB/s 121.7ulps
256 threads: 19.1 GFlops 7.6 GB/s 121.7ulps
GetPowerSpectrum() mod 2 (fixed, but slow):
32 threads: 7.6 GFlops 3.0 GB/s 0.0ulps
64 threads: 10.0 GFlops 4.0 GB/s 0.0ulps
128 threads: 11.9 GFlops 4.8 GB/s 0.0ulps
256 threads: 11.7 GFlops 4.7 GB/s 0.0ulps
GetPowerSpectrum() mod 3: (As with mod1, +threads & split loads)
32 threads: 7.0 GFlops 2.8 GB/s 121.7ulps
64 threads: 12.1 GFlops 4.8 GB/s 121.7ulps
128 threads: 17.4 GFlops 6.9 GB/s 121.7ulps
256 threads: 19.1 GFlops 7.7 GB/s 121.7ulps
512 threads: 18.8 GFlops 7.5 GB/s 121.7ulps
1024 threads: 14.3 GFlops 5.7 GB/s 121.7ulps
*******************************************
And lastly, just for comparison, the same brand/model factory OC'ed GTX460 but under WinXP:
Device: GeForce GTX 460, 1350 MHz clock, 768 MB memory.
Compute capability 2.1
Compiled with CUDA 3020.
PowerSpectrum Unit Test #4
Stock GetPowerSpectrum():
64 threads: 12.1 GFlops 4.8 GB/s 0.0ulps
GetPowerSpectrum() mod 1: (made Fermi & Pre-Fermi match in accuracy.)
32 threads: 6.9 GFlops 2.8 GB/s 121.7ulps
64 threads: 12.0 GFlops 4.8 GB/s 121.7ulps
128 threads: 17.4 GFlops 7.0 GB/s 121.7ulps
256 threads: 19.1 GFlops 7.6 GB/s 121.7ulps
GetPowerSpectrum() mod 2 (fixed, but slow):
32 threads: 7.6 GFlops 3.0 GB/s 0.0ulps
64 threads: 10.0 GFlops 4.0 GB/s 0.0ulps
128 threads: 11.9 GFlops 4.8 GB/s 0.0ulps
256 threads: 11.7 GFlops 4.7 GB/s 0.0ulps
GetPowerSpectrum() mod 3: (As with mod1, +threads & split loads)
32 threads: 7.0 GFlops 2.8 GB/s 121.7ulps
64 threads: 12.1 GFlops 4.8 GB/s 121.7ulps
128 threads: 17.4 GFlops 7.0 GB/s 121.7ulps
256 threads: 19.1 GFlops 7.7 GB/s 121.7ulps
512 threads: 18.9 GFlops 7.5 GB/s 121.7ulps
1024 threads: 14.3 GFlops 5.7 GB/s 121.7ulps
Cheers,
MarkJ