Author Topic: [Split] PowerSpectrum Unit Test  (Read 162557 times)

Offline Miep

  • Global Moderator
  • Knight who says 'Ni!'
  • *****
  • Posts: 964
Re: [Split] PowerSpectrum Unit Test
« Reply #270 on: 24 Dec 2010, 04:14:26 pm »
nr9

Device: Quadro FX 570M, 950 MHz clock, 242 MB memory.
Compute capability 1.1
Compiled with CUDA 3020.

 FAILURE in c:/[Projects]/LunaticsUnited/Tools/Tests/PowerSpectrum/main.cpp, line 254

ouch :)

ok stopping boinc helps ::) result tomorrow ok result now

Device: Quadro FX 570M, 950 MHz clock, 242 MB memory.
Compute capability 1.1
Compiled with CUDA 3020.
                PowerSpectrum+summax Unit test #9 (FFT pipeline)
                                Christmas 2010 edition.
Stock:
 FFT+PS+SM(     8)    1.8 GFlops    3.2 GB/s  ulps(fft  1.3,ps 4775.9) [OK]
 FFT+PS+SM(    16)    2.9 GFlops    4.0 GB/s  ulps(fft  1.6,ps 4817.4) [OK]
 FFT+PS+SM(    32)    2.7 GFlops    3.0 GB/s  ulps(fft  1.6,ps 4628.1) [OK]
 FFT+PS+SM(    64)    5.3 GFlops    5.0 GB/s  ulps(fft  1.6,ps 4557.6) [OK]
 FFT+PS+SM(   128)    7.9 GFlops    6.5 GB/s  ulps(fft  2.0,ps 4942.0) [OK]
 FFT+PS+SM(   256)   11.0 GFlops    8.0 GB/s  ulps(fft  2.0,ps 4967.8) [OK]
 FFT+PS+SM(   512)   13.3 GFlops    8.7 GB/s  ulps(fft  2.1,ps 5128.1) [OK]
 FFT+PS+SM(  1024)   13.1 GFlops    7.8 GB/s  ulps(fft  2.5,ps 5552.5) [OK]
 FFT+PS+SM(  2048)   13.2 GFlops    7.2 GB/s  ulps(fft  2.7,ps 5770.3) [OK]
 FFT+PS+SM(  4096)   12.3 GFlops    6.1 GB/s  ulps(fft  2.4,ps 5313.7) [OK]
 FFT+PS+SM(  8192)   11.5 GFlops    5.3 GB/s  ulps(fft  2.8,ps 5881.1) [OK]
 FFT+PS+SM( 16384)   10.7 GFlops    4.6 GB/s  ulps(fft  3.3,ps 6399.1) [OK]
 FFT+PS+SM( 32768)   12.2 GFlops    5.0 GB/s  ulps(fft  3.3,ps 6380.1) [OK]
 FFT+PS+SM( 65536)   12.2 GFlops    4.7 GB/s  ulps(fft  3.4,ps 6534.8) [OK]
 FFT+PS+SM(131072)   12.5 GFlops    4.5 GB/s  ulps(fft  3.6,ps 6694.2) [OK]


Opt1 (worst case): 64 thrds/block
 FFT+PS+SM(     8)    3.7 GFlops    6.6 GB/s  ulps(fft  1.3,ps 4637.5) [OK]
 FFT+PS+SM(    16)    4.7 GFlops    6.4 GB/s  ulps(fft  1.6,ps 4589.2) [OK]
 FFT+PS+SM(    32)    5.6 GFlops    6.3 GB/s  ulps(fft  1.6,ps 4535.6) [OK]
 FFT+PS+SM(    64)    7.9 GFlops    7.5 GB/s  ulps(fft  1.6,ps 4426.7) [OK]
 FFT+PS+SM(   128)    9.4 GFlops    7.7 GB/s  ulps(fft  2.0,ps 4818.1) [OK]
 FFT+PS+SM(   256)   12.5 GFlops    9.1 GB/s  ulps(fft  2.0,ps 4831.0) [OK]
 FFT+PS+SM(   512)   15.3 GFlops   10.0 GB/s  ulps(fft  2.1,ps 4987.2) [OK]
 FFT+PS+SM(  1024)   15.0 GFlops    8.9 GB/s  ulps(fft  2.5,ps 5438.0) [OK]
 FFT+PS+SM(  2048)   14.6 GFlops    7.9 GB/s  ulps(fft  2.7,ps 5674.7) [OK]
 FFT+PS+SM(  4096)   14.1 GFlops    7.0 GB/s  ulps(fft  2.4,ps 5202.4) [OK]
 FFT+PS+SM(  8192)   12.8 GFlops    6.0 GB/s  ulps(fft  2.8,ps 5765.4) [OK]
 FFT+PS+SM( 16384)   11.6 GFlops    5.0 GB/s  ulps(fft  3.3,ps 6291.8) [OK]
 FFT+PS+SM( 32768)   13.1 GFlops    5.3 GB/s  ulps(fft  3.3,ps 6275.5) [OK]
 FFT+PS+SM( 65536)   14.1 GFlops    5.4 GB/s  ulps(fft  3.4,ps 6429.1) [OK]
 FFT+PS+SM(131072)   14.0 GFlops    5.0 GB/s  ulps(fft  3.6,ps 6590.4) [OK]

sorry no time for averages atm
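For anyone who wants the averages without doing them by hand, here is a minimal Python sketch that pulls the GFlops and GB/s columns out of pasted test output and takes the plain arithmetic mean. The names `ROW`, `parse_rows` and `averages` are made up for this illustration and are not part of the test tool; the sample rows are the first three Stock rows from the output above.

```python
import re

# Matches one result row, e.g.
# " FFT+PS+SM(   512)   13.3 GFlops    8.7 GB/s  ulps(fft  2.1,ps 5128.1) [OK]"
ROW = re.compile(
    r"FFT\+PS\+SM\(\s*(\d+)\)\s+([\d.]+) GFlops\s+([\d.]+) GB/s"
)

def parse_rows(text):
    """Return a list of (fft_size, gflops, gb_per_s) tuples found in text."""
    return [(int(n), float(gf), float(gb)) for n, gf, gb in ROW.findall(text)]

def averages(rows):
    """Arithmetic mean of the GFlops and GB/s columns."""
    count = len(rows)
    if count == 0:
        raise ValueError("no result rows found")
    return (sum(r[1] for r in rows) / count, sum(r[2] for r in rows) / count)

# Three rows copied from the Stock block above, just to exercise the parser.
sample = """
 FFT+PS+SM(     8)    1.8 GFlops    3.2 GB/s  ulps(fft  1.3,ps 4775.9) [OK]
 FFT+PS+SM(    16)    2.9 GFlops    4.0 GB/s  ulps(fft  1.6,ps 4817.4) [OK]
 FFT+PS+SM(    32)    2.7 GFlops    3.0 GB/s  ulps(fft  1.6,ps 4628.1) [OK]
"""
rows = parse_rows(sample)
avg_gflops, avg_gbs = averages(rows)
print(f"{len(rows)} rows: {avg_gflops:.2f} GFlops, {avg_gbs:.2f} GB/s on average")
```

Pasting a whole Stock or Opt1 block into `sample` gives the average for that block; the two blocks should be averaged separately.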
« Last Edit: 24 Dec 2010, 04:36:04 pm by Miep »
The road to hell is paved with good intentions

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: [Split] PowerSpectrum Unit Test
« Reply #271 on: 25 Dec 2010, 01:35:50 am »
Thanks Heinz, perryjay & Carola,

   Nice to see the stubborn chips (that Quadro & ION) edging forward a bit now.

@perryjay:  ~3x for 9500GT in some sizes? Don't know why that is completely but I like it  ;D 

Jason

Offline PatrickV2

  • Knight o' The Round Table
  • ***
  • Posts: 139
Re: [Split] PowerSpectrum Unit Test
« Reply #272 on: 25 Dec 2010, 05:21:25 am »
Hi there,

Ran test #9 on my Q6600/8GB/8800GTX, under both WinXP-32 as well as Win7-64.

First, WinXP-32:

Code: [Select]
Device: GeForce 8800 GTX, 1350 MHz clock, 768 MB memory.
Compute capability 1.0
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #9 (FFT pipeline)
Christmas 2010 edition.
Stock:
 FFT+PS+SM(     8)    9.3 GFlops   16.4 GB/s  ulps(fft  1.3,ps 4775.9) [OK]
 FFT+PS+SM(    16)   13.6 GFlops   18.5 GB/s  ulps(fft  1.6,ps 4817.4) [OK]
 FFT+PS+SM(    32)   16.0 GFlops   17.8 GB/s  ulps(fft  1.6,ps 4628.1) [OK]
 FFT+PS+SM(    64)   28.3 GFlops   26.8 GB/s  ulps(fft  1.6,ps 4557.6) [OK]
 FFT+PS+SM(   128)   44.4 GFlops   36.5 GB/s  ulps(fft  2.0,ps 4942.0) [OK]
 FFT+PS+SM(   256)   59.2 GFlops   43.1 GB/s  ulps(fft  2.0,ps 4967.8) [OK]
 FFT+PS+SM(   512)   72.6 GFlops   47.4 GB/s  ulps(fft  2.1,ps 5128.1) [OK]
 FFT+PS+SM(  1024)   71.7 GFlops   42.5 GB/s  ulps(fft  2.5,ps 5552.5) [OK]
 FFT+PS+SM(  2048)   72.1 GFlops   39.1 GB/s  ulps(fft  2.7,ps 5770.3) [OK]
 FFT+PS+SM(  4096)   66.5 GFlops   33.3 GB/s  ulps(fft  2.4,ps 5313.7) [OK]
 FFT+PS+SM(  8192)   63.3 GFlops   29.4 GB/s  ulps(fft  2.8,ps 5881.1) [OK]
 FFT+PS+SM( 16384)   58.6 GFlops   25.3 GB/s  ulps(fft  3.3,ps 6399.1) [OK]
 FFT+PS+SM( 32768)   62.9 GFlops   25.5 GB/s  ulps(fft  3.3,ps 6380.1) [OK]
 FFT+PS+SM( 65536)   67.2 GFlops   25.6 GB/s  ulps(fft  3.4,ps 6534.8) [OK]
 FFT+PS+SM(131072)   66.0 GFlops   23.7 GB/s  ulps(fft  3.6,ps 6694.2) [OK]


Opt1 (worst case): 64 thrds/block
 FFT+PS+SM(     8)   14.3 GFlops   25.2 GB/s  ulps(fft  1.3,ps 4637.5) [OK]
 FFT+PS+SM(    16)   21.2 GFlops   28.9 GB/s  ulps(fft  1.6,ps 4589.2) [OK]
 FFT+PS+SM(    32)   27.5 GFlops   30.7 GB/s  ulps(fft  1.6,ps 4535.6) [OK]
 FFT+PS+SM(    64)   39.1 GFlops   37.0 GB/s  ulps(fft  1.6,ps 4426.7) [OK]
 FFT+PS+SM(   128)   47.4 GFlops   39.0 GB/s  ulps(fft  2.0,ps 4818.1) [OK]
 FFT+PS+SM(   256)   62.5 GFlops   45.5 GB/s  ulps(fft  2.0,ps 4831.0) [OK]
 FFT+PS+SM(   512)   76.0 GFlops   49.7 GB/s  ulps(fft  2.1,ps 4987.2) [OK]
 FFT+PS+SM(  1024)   74.1 GFlops   43.9 GB/s  ulps(fft  2.5,ps 5438.0) [OK]
 FFT+PS+SM(  2048)   74.2 GFlops   40.3 GB/s  ulps(fft  2.7,ps 5674.7) [OK]
 FFT+PS+SM(  4096)   67.3 GFlops   33.7 GB/s  ulps(fft  2.4,ps 5202.4) [OK]
 FFT+PS+SM(  8192)   64.7 GFlops   30.0 GB/s  ulps(fft  2.8,ps 5765.4) [OK]
 FFT+PS+SM( 16384)   59.8 GFlops   25.9 GB/s  ulps(fft  3.3,ps 6291.8) [OK]
 FFT+PS+SM( 32768)   64.3 GFlops   26.0 GB/s  ulps(fft  3.3,ps 6275.5) [OK]
 FFT+PS+SM( 65536)   68.6 GFlops   26.1 GB/s  ulps(fft  3.4,ps 6429.1) [OK]
 FFT+PS+SM(131072)   67.5 GFlops   24.3 GB/s  ulps(fft  3.6,ps 6590.4) [OK]

Second, Win7-64:

Code: [Select]
Device: GeForce 8800 GTX, 1350 MHz clock, 731 MB memory.
Compute capability 1.0
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #9 (FFT pipeline)
Christmas 2010 edition.
Stock:
 FFT+PS+SM(     8)    8.4 GFlops   14.9 GB/s  ulps(fft  1.3,ps 4775.9) [OK]
 FFT+PS+SM(    16)   12.1 GFlops   16.6 GB/s  ulps(fft  1.6,ps 4817.4) [OK]
 FFT+PS+SM(    32)   14.6 GFlops   16.3 GB/s  ulps(fft  1.6,ps 4628.1) [OK]
 FFT+PS+SM(    64)   25.9 GFlops   24.5 GB/s  ulps(fft  1.6,ps 4557.6) [OK]
 FFT+PS+SM(   128)   38.6 GFlops   31.8 GB/s  ulps(fft  2.0,ps 4942.0) [OK]
 FFT+PS+SM(   256)   50.3 GFlops   36.6 GB/s  ulps(fft  2.0,ps 4967.8) [OK]
 FFT+PS+SM(   512)   61.2 GFlops   40.0 GB/s  ulps(fft  2.1,ps 5128.1) [OK]
 FFT+PS+SM(  1024)   61.6 GFlops   36.5 GB/s  ulps(fft  2.5,ps 5552.5) [OK]
 FFT+PS+SM(  2048)   62.3 GFlops   33.8 GB/s  ulps(fft  2.7,ps 5770.3) [OK]
 FFT+PS+SM(  4096)   57.5 GFlops   28.7 GB/s  ulps(fft  2.4,ps 5313.7) [OK]
 FFT+PS+SM(  8192)   56.1 GFlops   26.0 GB/s  ulps(fft  2.8,ps 5881.1) [OK]
 FFT+PS+SM( 16384)   52.4 GFlops   22.7 GB/s  ulps(fft  3.3,ps 6399.1) [OK]
 FFT+PS+SM( 32768)   55.5 GFlops   22.5 GB/s  ulps(fft  3.3,ps 6380.1) [OK]
 FFT+PS+SM( 65536)   59.2 GFlops   22.5 GB/s  ulps(fft  3.4,ps 6534.8) [OK]
 FFT+PS+SM(131072)   58.8 GFlops   21.1 GB/s  ulps(fft  3.6,ps 6694.2) [OK]


Opt1 (worst case): 64 thrds/block
 FFT+PS+SM(     8)   14.2 GFlops   25.0 GB/s  ulps(fft  1.3,ps 4637.5) [OK]
 FFT+PS+SM(    16)   21.0 GFlops   28.6 GB/s  ulps(fft  1.6,ps 4589.2) [OK]
 FFT+PS+SM(    32)   27.5 GFlops   30.7 GB/s  ulps(fft  1.6,ps 4535.6) [OK]
 FFT+PS+SM(    64)   39.2 GFlops   37.1 GB/s  ulps(fft  1.6,ps 4426.7) [OK]
 FFT+PS+SM(   128)   46.8 GFlops   38.5 GB/s  ulps(fft  2.0,ps 4818.1) [OK]
 FFT+PS+SM(   256)   61.1 GFlops   44.5 GB/s  ulps(fft  2.0,ps 4831.0) [OK]
 FFT+PS+SM(   512)   75.2 GFlops   49.2 GB/s  ulps(fft  2.1,ps 4987.2) [OK]
 FFT+PS+SM(  1024)   73.6 GFlops   43.6 GB/s  ulps(fft  2.5,ps 5438.0) [OK]
 FFT+PS+SM(  2048)   73.4 GFlops   39.8 GB/s  ulps(fft  2.7,ps 5674.7) [OK]
 FFT+PS+SM(  4096)   67.7 GFlops   33.9 GB/s  ulps(fft  2.4,ps 5202.4) [OK]
 FFT+PS+SM(  8192)   64.4 GFlops   29.8 GB/s  ulps(fft  2.8,ps 5765.4) [OK]
 FFT+PS+SM( 16384)   59.5 GFlops   25.7 GB/s  ulps(fft  3.3,ps 6291.8) [OK]
 FFT+PS+SM( 32768)   64.0 GFlops   25.9 GB/s  ulps(fft  3.3,ps 6275.5) [OK]
 FFT+PS+SM( 65536)   68.2 GFlops   26.0 GB/s  ulps(fft  3.4,ps 6429.1) [OK]
 FFT+PS+SM(131072)   67.1 GFlops   24.1 GB/s  ulps(fft  3.6,ps 6590.4) [OK]

Regards, Patrick.

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: [Split] PowerSpectrum Unit Test
« Reply #273 on: 25 Dec 2010, 08:53:02 am »
Ran test #9 on my Q6600/8GB/8800GTX, under both WinXP-32 as well as Win7-64.

Excellent, not broken on the 8800.  Last hurdle for that code area cleared & can move on  :D

Offline perryjay

  • Knight Templar
  • ****
  • Posts: 427
Re: [Split] PowerSpectrum Unit Test
« Reply #274 on: 25 Dec 2010, 11:14:03 am »
Carola just mentioned something I haven't been doing. I have been running the test without stopping BOINC. Should I run it with BOINC stopped?

Offline Miep

  • Global Moderator
  • Knight who says 'Ni!'
  • *****
  • Posts: 964
Re: [Split] PowerSpectrum Unit Test
« Reply #275 on: 25 Dec 2010, 12:56:38 pm »
It's not necessary to completely stop BOINC, but at least the GPU should be snoozed.
You can't test GPU computing/memory transfers while you are crunching with it;
otherwise you will see reduced values on the test.
The road to hell is paved with good intentions

Offline perryjay

  • Knight Templar
  • ****
  • Posts: 427
Re: [Split] PowerSpectrum Unit Test
« Reply #276 on: 25 Dec 2010, 01:05:33 pm »
Okay, let's see how much of a difference this makes....
Microsoft Windows [Version 6.1.7600]
Copyright (c) 2009 Microsoft Corporation.  All rights reserved.

C:\Users\perry>cd\test

C:\test>powerspectrumtest9.exe

Device: GeForce 9500 GT, 1848 MHz clock, 1006 MB memory.
Compute capability 1.1
Compiled with CUDA 3020.
                PowerSpectrum+summax Unit test #9 (FFT pipeline)
                                Christmas 2010 edition.
Stock:
 FFT+PS+SM(     8)    3.0 GFlops    5.2 GB/s  ulps(fft  1.3,ps 4775.9) [OK]
 FFT+PS+SM(    16)    4.0 GFlops    5.5 GB/s  ulps(fft  1.6,ps 4817.4) [OK]
 FFT+PS+SM(    32)    4.4 GFlops    5.0 GB/s  ulps(fft  1.6,ps 4628.1) [OK]
 FFT+PS+SM(    64)    7.1 GFlops    6.7 GB/s  ulps(fft  1.6,ps 4557.6) [OK]
 FFT+PS+SM(   128)    9.8 GFlops    8.1 GB/s  ulps(fft  2.0,ps 4942.0) [OK]
 FFT+PS+SM(   256)   11.9 GFlops    8.6 GB/s  ulps(fft  2.0,ps 4967.8) [OK]
 FFT+PS+SM(   512)   15.0 GFlops    9.8 GB/s  ulps(fft  2.1,ps 5128.1) [OK]
 FFT+PS+SM(  1024)   16.2 GFlops    9.6 GB/s  ulps(fft  2.5,ps 5552.5) [OK]
 FFT+PS+SM(  2048)   17.5 GFlops    9.5 GB/s  ulps(fft  2.7,ps 5770.3) [OK]
 FFT+PS+SM(  4096)   13.4 GFlops    6.7 GB/s  ulps(fft  2.4,ps 5313.7) [OK]
 FFT+PS+SM(  8192)   14.2 GFlops    6.6 GB/s  ulps(fft  2.8,ps 5881.1) [OK]
 FFT+PS+SM( 16384)   13.7 GFlops    5.9 GB/s  ulps(fft  3.3,ps 6399.1) [OK]
 FFT+PS+SM( 32768)   12.1 GFlops    4.9 GB/s  ulps(fft  3.3,ps 6380.1) [OK]
 FFT+PS+SM( 65536)   13.0 GFlops    5.0 GB/s  ulps(fft  3.4,ps 6534.8) [OK]
 FFT+PS+SM(131072)   13.9 GFlops    5.0 GB/s  ulps(fft  3.6,ps 6694.2) [OK]


Opt1 (worst case): 64 thrds/block
 FFT+PS+SM(     8)    4.1 GFlops    7.3 GB/s  ulps(fft  1.3,ps 4637.5) [OK]
 FFT+PS+SM(    16)    5.7 GFlops    7.7 GB/s  ulps(fft  1.6,ps 4589.2) [OK]
 FFT+PS+SM(    32)    7.0 GFlops    7.8 GB/s  ulps(fft  1.6,ps 4535.6) [OK]
 FFT+PS+SM(    64)    9.2 GFlops    8.7 GB/s  ulps(fft  1.6,ps 4426.7) [OK]
 FFT+PS+SM(   128)   10.5 GFlops    8.6 GB/s  ulps(fft  2.0,ps 4818.1) [OK]
 FFT+PS+SM(   256)   12.7 GFlops    9.2 GB/s  ulps(fft  2.0,ps 4831.0) [OK]
 FFT+PS+SM(   512)   16.0 GFlops   10.5 GB/s  ulps(fft  2.1,ps 4987.2) [OK]
 FFT+PS+SM(  1024)   17.3 GFlops   10.2 GB/s  ulps(fft  2.5,ps 5438.0) [OK]
 FFT+PS+SM(  2048)   18.5 GFlops   10.0 GB/s  ulps(fft  2.7,ps 5674.7) [OK]
 FFT+PS+SM(  4096)   13.7 GFlops    6.9 GB/s  ulps(fft  2.4,ps 5202.4) [OK]
 FFT+PS+SM(  8192)   14.9 GFlops    6.9 GB/s  ulps(fft  2.8,ps 5765.4) [OK]
 FFT+PS+SM( 16384)   15.4 GFlops    6.6 GB/s  ulps(fft  3.3,ps 6291.8) [OK]
 FFT+PS+SM( 32768)   13.1 GFlops    5.3 GB/s  ulps(fft  3.3,ps 6275.5) [OK]
 FFT+PS+SM( 65536)   13.8 GFlops    5.3 GB/s  ulps(fft  3.4,ps 6429.1) [OK]
 FFT+PS+SM(131072)   14.5 GFlops    5.2 GB/s  ulps(fft  3.6,ps 6590.4) [OK]



C:\test>

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: [Split] PowerSpectrum Unit Test
« Reply #277 on: 25 Dec 2010, 01:12:13 pm »
Ahah! That explains the inflated speedup on the previous run  :) .  In essence, (some of) the optimisations I'm trying out (namely, asynchronous transfers) should be less susceptible to slowdowns under load than the stock code (synchronous transfers)....

I wasn't looking to test/refine that aspect yet, but you managed to prove it already works... Thanks!  ;D

(Overlapped execution/transfers on Pre-Fermi, and concurrent kernels on Fermi next .... )
« Last Edit: 25 Dec 2010, 01:16:35 pm by Jason G »

Offline perryjay

  • Knight Templar
  • ****
  • Posts: 427
Re: [Split] PowerSpectrum Unit Test
« Reply #278 on: 25 Dec 2010, 03:39:09 pm »
Sorry bout that... hope I didn't mess you up too much. Glad it gave you some extra to think about.

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: [Split] PowerSpectrum Unit Test
« Reply #279 on: 25 Dec 2010, 06:02:06 pm »
Sorry bout that... hope I didn't mess you up too much. Glad it gave you some extra to think about.
  Not at all messed up, just had me wondering how 9500GT was managing to get 3x throughput at some sizes, and now we know it was under load ;).  That unexpected benefit does indeed give me some more things to consider for the next stage, and it looks like we might be able to push a bit harder than I thought.
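
As a rough check on the speedup with the GPU idle, the per-size ratio is simply the Opt1 GFlops divided by the Stock GFlops. A minimal Python sketch, assuming three sizes copied from the 9500 GT run in reply #276 (the helper name `speedups` is made up for this illustration):

```python
# GFlops for a few FFT sizes, copied from the 9500 GT run in reply #276.
stock = {8: 3.0, 512: 15.0, 131072: 13.9}
opt1 = {8: 4.1, 512: 16.0, 131072: 14.5}

def speedups(baseline, candidate):
    """Map each size present in both runs to candidate/baseline."""
    return {n: candidate[n] / baseline[n] for n in sorted(baseline) if n in candidate}

ratios = speedups(stock, opt1)
for size, ratio in ratios.items():
    print(f"size {size:>6}: {ratio:.2f}x")
```

Filling the two dictionaries with all fifteen sizes from a run gives the full picture for that card.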

Offline perryjay

  • Knight Templar
  • ****
  • Posts: 427
Re: [Split] PowerSpectrum Unit Test
« Reply #280 on: 25 Dec 2010, 06:12:35 pm »
Hey guys, I done something right for a change!!!  :)    ::)  Looking forward to the next test. This time I'll know to turn it off!

Offline PatrickV2

  • Knight o' The Round Table
  • ***
  • Posts: 139
Re: [Split] PowerSpectrum Unit Test
« Reply #281 on: 25 Dec 2010, 09:58:08 pm »
Ran test #9 on my Q6600/8GB/8800GTX, under both WinXP-32 as well as Win7-64.

Excellent, not broken on the 8800.  Last hurdle for that code area cleared & can move on  :D

Wonderful to hear that. As always, looking forward to the next bit of execution-magic. ;)

Regards, Patrick.

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: [Split] PowerSpectrum Unit Test
« Reply #282 on: 26 Dec 2010, 04:36:06 am »
Will take me some time to cook up the next test, working out this streaming stuff.
  Mixed results with kernel streaming so far, appearing to benefit my smaller highly optimised kernels more than the stock-ish larger sizes (don't know why yet, and dividing further into additional streams seems to slow it down again ... tricky!):

As with test #9 (single stream)
Quote
Opt1 (worst case): 256 thrds/block, 1 x 1048576 element streams
 FFT+PS+SM(     8)   19.2 GFlops   33.8 GB/s  ulps(fft  1.2,ps 4324.2) [OK]
 FFT+PS+SM(    16)   36.8 GFlops   50.3 GB/s  ulps(fft  1.6,ps 4326.2) [OK]
 FFT+PS+SM(    32)   60.7 GFlops   67.8 GB/s  ulps(fft  1.3,ps 4003.6) [OK]
 FFT+PS+SM(    64)   86.2 GFlops   81.6 GB/s  ulps(fft  1.5,ps 4270.2) [OK]
 FFT+PS+SM(   128)   92.5 GFlops   76.1 GB/s  ulps(fft  1.7,ps 4347.9) [OK]
 FFT+PS+SM(   256)  135.0 GFlops   98.3 GB/s  ulps(fft  1.7,ps 4261.8) [OK]
 FFT+PS+SM(   512)  172.0 GFlops  112.4 GB/s  ulps(fft  1.8,ps 4327.4) [OK]
 FFT+PS+SM(  1024)  214.7 GFlops  127.3 GB/s  ulps(fft  2.1,ps 4727.6) [OK]
 FFT+PS+SM(  2048)  225.9 GFlops  122.6 GB/s  ulps(fft  2.2,ps 4921.2) [OK]
 FFT+PS+SM(  4096)  232.3 GFlops  116.2 GB/s  ulps(fft  2.2,ps 4764.3) [OK]
 FFT+PS+SM(  8192)  226.0 GFlops  104.8 GB/s  ulps(fft  2.6,ps 5278.8) [OK]
 FFT+PS+SM( 16384)  221.5 GFlops   95.8 GB/s  ulps(fft  2.6,ps 5357.5) [OK]
 FFT+PS+SM( 32768)  213.1 GFlops   86.3 GB/s  ulps(fft  2.3,ps 4992.8) [OK]
 FFT+PS+SM( 65536)  210.5 GFlops   80.2 GB/s  ulps(fft  2.0,ps 4604.3) [OK]
 FFT+PS+SM(131072)  202.6 GFlops   72.8 GB/s  ulps(fft  2.7,ps 5392.8) [OK]

2x streams:
Quote
Opt1 (worst case): 256 thrds/block, 2 x 524288 element streams
 FFT+PS+SM(     8)   26.7 GFlops   47.2 GB/s  ulps(fft  1.2,ps 4324.2) [OK]
 FFT+PS+SM(    16)   66.9 GFlops   91.3 GB/s  ulps(fft  1.6,ps 4326.2) [OK]
 FFT+PS+SM(    32)   90.9 GFlops  101.5 GB/s  ulps(fft  1.3,ps 4003.6) [OK]
 FFT+PS+SM(    64)  105.0 GFlops   99.4 GB/s  ulps(fft  1.5,ps 4270.2) [OK]
 FFT+PS+SM(   128)   94.0 GFlops   77.3 GB/s  ulps(fft  1.7,ps 4347.9) [OK]
 FFT+PS+SM(   256)  135.9 GFlops   98.9 GB/s  ulps(fft  1.7,ps 4261.8) [OK]
 FFT+PS+SM(   512)  167.9 GFlops  109.7 GB/s  ulps(fft  1.8,ps 4327.4) [OK]
 FFT+PS+SM(  1024)  198.4 GFlops  117.6 GB/s  ulps(fft  2.1,ps 4727.6) [OK]
 FFT+PS+SM(  2048)  209.1 GFlops  113.4 GB/s  ulps(fft  2.2,ps 4921.2) [OK]
 FFT+PS+SM(  4096)  209.9 GFlops  105.0 GB/s  ulps(fft  2.2,ps 4764.3) [OK]
 FFT+PS+SM(  8192)  204.8 GFlops   95.0 GB/s  ulps(fft  2.6,ps 5278.8) [OK]
 FFT+PS+SM( 16384)  205.0 GFlops   88.6 GB/s  ulps(fft  2.6,ps 5357.5) [OK]
 FFT+PS+SM( 32768)  187.5 GFlops   75.9 GB/s  ulps(fft  2.3,ps 4992.8) [OK]
 FFT+PS+SM( 65536)  195.2 GFlops   74.4 GB/s  ulps(fft  2.0,ps 4604.3) [OK]
 FFT+PS+SM(131072)  172.5 GFlops   62.0 GB/s  ulps(fft  2.7,ps 5392.8) [OK]
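
Lining the two quotes up by size shows where the second stream stops paying off. A minimal Python sketch, with the GFlops columns copied from the two tables above (`first_regression` is just a name made up here, not anything in the test code):

```python
sizes = [8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096,
         8192, 16384, 32768, 65536, 131072]
one_stream = [19.2, 36.8, 60.7, 86.2, 92.5, 135.0, 172.0, 214.7,
              225.9, 232.3, 226.0, 221.5, 213.1, 210.5, 202.6]
two_streams = [26.7, 66.9, 90.9, 105.0, 94.0, 135.9, 167.9, 198.4,
               209.1, 209.9, 204.8, 205.0, 187.5, 195.2, 172.5]

def first_regression(sizes, base, candidate):
    """Smallest size at which candidate is slower than base, or None."""
    for n, b, c in zip(sizes, base, candidate):
        if c < b:
            return n
    return None

print(first_regression(sizes, one_stream, two_streams))
```

For these numbers the two-stream run is ahead up to size 256 and falls behind from size 512 onward.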


Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: [Split] PowerSpectrum Unit Test
« Reply #283 on: 26 Dec 2010, 12:13:36 pm »
Updated first Post:
Quote
Update: PowerSpectrum Test #10 (attached)
- summary performance of FFT pipeline improvements against stock, for assessing overall progress
- can vary, so may need a few runs, just to check stability of result
- Please use DLLs provided with Test#9

Offline arkayn

  • Janitor o' the Board
  • Knight who says 'Ni!'
  • *****
  • Posts: 1230
  • Aaaarrrrgggghhhh
    • My Little Place On The Internet
Re: [Split] PowerSpectrum Unit Test
« Reply #284 on: 26 Dec 2010, 12:33:43 pm »
Code: [Select]
Device: GeForce GTX 460, 1600 MHz clock, 768 MB memory.
Compute capability 2.1
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #10 (FFT pipeline throughput)
Stock:
  Processing... Done!
  Compute Thoughput GFlops Avg(   67.27) Peak(  111.28) Min(    9.42) [OK]
   Memory thoughput GB/s   Avg(   36.72) Peak(   55.70) Min(   15.41)


Opt1 (worst case): 256 thrds/block, 2 x 524288 element streams
  revert to single stream from size 512
  Processing... Done!
  Compute thoughput [GFlops] -
      Avg(   84.36, 1.25x) Peak(  131.47, 1.18x) Min(   31.13, 3.30x) [OK]
   Memory thoughput [GB/s]   -
      Avg(   51.22, 1.39x) Peak(   66.16, 1.19x) Min(   34.18, 2.22x)
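
The figure after each comma in the Opt1 lines matches the Opt1 value divided by the corresponding Stock value. A minimal Python sketch of that arithmetic, assuming the summary is a plain average/peak/minimum over the per-size results (how the test program actually computes it is not shown in this thread, and `summarise` and `ratios` are names made up here):

```python
def summarise(values):
    """Average, peak and minimum of a list of per-size throughputs."""
    return {"avg": sum(values) / len(values), "peak": max(values), "min": min(values)}

def ratios(stock, opt):
    """Opt/stock ratio for each summary field, as printed after the comma."""
    return {key: opt[key] / stock[key] for key in stock}

# Compute-throughput summaries copied from the GTX 460 output above.
stock = {"avg": 67.27, "peak": 111.28, "min": 9.42}
opt1 = {"avg": 84.36, "peak": 131.47, "min": 31.13}

for key, value in ratios(stock, opt1).items():
    print(f"{key}: {value:.2f}x")
```

Rounded to two decimals this gives 1.25x, 1.18x and 3.30x, the same factors shown in the compute throughput line.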

 
