
Topic: [Split] PowerSpectrum Unit Test  (Read 162615 times)

Richard Haselgrove
Re: [Split] PowerSpectrum Unit Test
« Reply #210 on: 21 Dec 2010, 02:48:19 pm »
Preparing the usual three:

9800GTX+, Windows 7/32
Device: GeForce 9800 GTX/9800 GTX+, 1890 MHz clock, 498 MB memory.
Compute capability 1.1
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #7 (Faster reductions)
Stock:
 PS+SuMx(     8) [OK]    1.7 GFlops    7.4 GB/s
 PS+SuMx(    16) [OK]    2.3 GFlops    9.6 GB/s
 PS+SuMx(    32) [OK]    2.6 GFlops   10.5 GB/s
 PS+SuMx(    64) [OK]    3.9 GFlops   15.9 GB/s


Opt1: 64 thrds/block
                        worst case              best case
                   GFlps  GB/s ulps         GFlps  GB/s ulps
 PS+SuMx(     8)    3.5   15.4 121.7 [OK]    7.1   31.3 121.7
 PS+SuMx(    16)    4.0   16.5 121.7 [OK]    7.4   31.0 121.7
 PS+SuMx(    32)    4.9   20.0 121.7 [OK]    7.2   29.5 121.7
 PS+SuMx(    64)    6.3   25.4 121.7 [OK]    8.8   35.5 121.7

9800GT, Windows XP/32
Device: GeForce 9800 GT, 1500 MHz clock, 512 MB memory.
Compute capability 1.1
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #7 (Faster reductions)
Stock:
 PS+SuMx(     8) [OK]    1.7 GFlops    7.2 GB/s
 PS+SuMx(    16) [OK]    2.1 GFlops    8.9 GB/s
 PS+SuMx(    32) [OK]    2.2 GFlops    9.0 GB/s
 PS+SuMx(    64) [OK]    3.6 GFlops   14.5 GB/s


Opt1: 64 thrds/block
                        worst case              best case
                   GFlps  GB/s ulps         GFlps  GB/s ulps
 PS+SuMx(     8)    2.5   11.1 121.7 [OK]    5.2   22.9 121.7
 PS+SuMx(    16)    3.5   14.7 121.7 [OK]    5.5   23.0 121.7
 PS+SuMx(    32)    4.1   16.7 121.7 [OK]    5.2   21.2 121.7
 PS+SuMx(    64)    5.4   21.7 121.7 [OK]    6.3   25.7 121.7

GTX 470, Windows XP/32
Device: GeForce GTX 470, 1215 MHz clock, 1280 MB memory.
Compute capability 2.0
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #7 (Faster reductions)
Stock:
 PS+SuMx(     8) [OK]    2.3 GFlops    9.9 GB/s
 PS+SuMx(    16) [OK]    3.0 GFlops   12.6 GB/s
 PS+SuMx(    32) [OK]    3.0 GFlops   12.1 GB/s
 PS+SuMx(    64) [OK]    4.8 GFlops   19.3 GB/s


Opt1: 256 thrds/block
                        worst case              best case
                   GFlps  GB/s ulps         GFlps  GB/s ulps
 PS+SuMx(     8)    3.7   16.0 121.7 [OK]   15.6   68.4 121.7
 PS+SuMx(    16)    5.7   23.9 121.7 [OK]   14.8   61.8 121.7
 PS+SuMx(    32)    7.9   32.5 121.7 [OK]   14.3   58.7 121.7
 PS+SuMx(    64)    9.9   39.9 121.7 [OK]   14.0   56.7 121.7
« Last Edit: 21 Dec 2010, 03:02:29 pm by Richard Haselgrove »

perryjay
Re: [Split] PowerSpectrum Unit Test
« Reply #211 on: 21 Dec 2010, 03:09:47 pm »
Here's mine...


Microsoft Windows [Version 6.1.7600]
Copyright (c) 2009 Microsoft Corporation.  All rights reserved.

C:\Users\perry>cd/test

C:\test> powerspectrumtest7.exe

Device: GeForce 9500 GT, 1848 MHz clock, 1006 MB memory.
Compute capability 1.1
Compiled with CUDA 3020.
                PowerSpectrum+summax Unit test #7 (Faster reductions)
Stock:
 PS+SuMx(     8) [OK]    0.7 GFlops    3.2 GB/s
 PS+SuMx(    16) [OK]    0.8 GFlops    3.5 GB/s
 PS+SuMx(    32) [OK]    0.8 GFlops    3.1 GB/s
 PS+SuMx(    64) [OK]    1.1 GFlops    4.4 GB/s


Opt1: 64 thrds/block
                        worst case              best case
                   GFlps  GB/s ulps         GFlps  GB/s ulps
 PS+SuMx(     8)    1.2    5.4 121.7 [OK]    1.6    6.8 121.7
 PS+SuMx(    16)    0.7    3.0 121.7 [OK]    1.5    6.1 121.7
 PS+SuMx(    32)    1.4    5.6 121.7 [OK]    1.6    6.4 121.7
 PS+SuMx(    64)    1.7    6.7 121.7 [OK]    1.8    7.5 121.7



C:\test>

Josef W. Segur
Re: [Split] PowerSpectrum Unit Test
« Reply #212 on: 21 Dec 2010, 03:44:24 pm »
Quote from: Jason G
Best case requires few memory transfers back to the host CPU (only one best spike and no detections) ;)

[Edit:] Worst case would be a best signal + numdatapoints/fftlen detections, i.e. not really possible, since we're limited to 30 detections, so I wouldn't bother transferring more than the first 30 (... unlike stock ...)

Quote
Now he tells us ::) ;)
So normal data would perform somewhere in between. Any info on the distribution between the two endpoints?

The lower graph on http://setiathome.berkeley.edu/sah_glossary/spike_graphs.php is related, note the log scale on the counts. S@H Enhanced does relatively more short FFT lengths, but there's still a very strong bias toward the long FFT lengths for both reportable and "best" spikes. A quick survey of 44 recent results from my P-M showed 35 best_spikes at fft_len 131072, 6 at fft_len 65536, 2 at fft_len 32768, and 1 at fft_len 16384.

However, the processing order starts at FFT length 8 and works up, so there should be some "worst case" for short FFT lengths during that zero chirp sequence. Subsequent visits to the short FFT lengths are likely to be all "best case". At AR 0.42 FFT length 8 is done 13 times so overall there will be mostly "best case", but at AR 3.0 FFT length 8 is only done once so the probability of "worst case" will be higher.
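To put a number on "somewhere in between", here is a back-of-envelope sketch. It assumes every PowerSpectrum+summax call does the same amount of work, so the blended rate is a call-weighted harmonic mean of the worst-case and best-case rates. The function name and the fractions tried are illustrative assumptions, not measurements.

```python
def blended_gflops(worst: float, best: float, worst_fraction: float) -> float:
    """Effective GFlops when a fraction of equal-sized calls hit the worst case.

    Time per unit of work is 1/rate, so the two rates combine as a weighted
    harmonic mean rather than an arithmetic mean.
    """
    if not 0.0 <= worst_fraction <= 1.0:
        raise ValueError("worst_fraction must be in [0, 1]")
    time_per_gflop = worst_fraction / worst + (1.0 - worst_fraction) / best
    return 1.0 / time_per_gflop


# Opt1 figures for PS+SuMx(8) on the GTX 470 in this thread: worst 3.7, best 15.6 GFlops.
for f in (0.0, 0.1, 0.5, 1.0):
    print(f"worst-case fraction {f:.1f}: {blended_gflops(3.7, 15.6, f):.1f} GFlops")
```

With those figures, a 10% share of worst-case calls already pulls the blend down to roughly 11.8 GFlops. This is why the short-FFT, zero-chirp worst cases Joe describes can weigh more than their call count suggests.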

Note that our test WUs shortened by lowering chirp limits will have a higher proportion of the zero chirp worst cases than full length WUs. In general I think that's good, brief sloppy tests which slightly underestimate improvement from optimization are better than those which cause unwarranted enthusiasm. But it would also be possible to create a set of test WUs shortened by adjusting chirp resolution which would give better quick test timing.

Edit: Jason, result_overflow is triggered by the 31st found signal...
Joe
« Last Edit: 21 Dec 2010, 03:46:55 pm by Josef W. Segur »
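For readers new to the thread, here is a rough CPU sketch of the kind of pass being benchmarked. It computes the power of each complex bin, reduces each FFT row to its sum and maximum, keeps the single best spike, and records detections above a threshold. The function and field names, the peak-to-mean score and the threshold are hypothetical illustrations, not the project's code. The cap follows Joe's note that `result_overflow` is triggered by the 31st found signal, so at most 30 are kept.

```python
from dataclasses import dataclass, field

MAX_SIGNALS = 30  # reporting limit; finding a 31st signal sets result_overflow


@dataclass
class SpikeResult:
    best_score: float = 0.0
    best_bin: int = -1
    best_row: int = -1
    detections: list = field(default_factory=list)  # (row, bin, score) tuples
    result_overflow: bool = False


def powerspectrum_summax(samples, fft_len, threshold):
    """Hypothetical reference: `samples` is a flat list of complex FFT outputs,
    processed as consecutive rows of `fft_len` bins."""
    if len(samples) % fft_len:
        raise ValueError("sample count must be a multiple of fft_len")
    res = SpikeResult()
    for row in range(len(samples) // fft_len):
        chunk = samples[row * fft_len:(row + 1) * fft_len]
        powers = [z.real * z.real + z.imag * z.imag for z in chunk]  # power spectrum
        total = sum(powers)  # "sum" half of summax
        peak = max(powers)   # "max" half of summax
        if total == 0.0:
            continue  # an all-zero row has no meaningful spike
        score = peak / (total / fft_len)  # peak-to-mean ratio (assumed score)
        peak_bin = powers.index(peak)
        if score > res.best_score:  # the single best spike is always tracked
            res.best_score, res.best_bin, res.best_row = score, peak_bin, row
        if score > threshold:
            if len(res.detections) < MAX_SIGNALS:
                res.detections.append((row, peak_bin, score))
            else:
                res.result_overflow = True  # 31st signal: stop processing
                break
    return res
```

In this model only the best spike and at most 30 detections ever need to reach the host. That is the "best case" versus "worst case" transfer difference discussed above, since a naive worst case could otherwise report one detection per row.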

arkayn
Re: [Split] PowerSpectrum Unit Test
« Reply #213 on: 21 Dec 2010, 07:07:52 pm »
And now the GTX 460 (768 MB) card:

Device: GeForce GTX 460, 1600 MHz clock, 768 MB memory.
Compute capability 2.1
Compiled with CUDA 3020.
      PowerSpectrum+summax Unit test #7 (Faster reductions)
Stock:
 PS+SuMx(     8) [OK]    2.2 GFlops    9.7 GB/s
 PS+SuMx(    16) [OK]    2.8 GFlops   11.5 GB/s
 PS+SuMx(    32) [OK]    2.1 GFlops    8.7 GB/s
 PS+SuMx(    64) [OK]    3.4 GFlops   13.6 GB/s


Opt1: 256 thrds/block
                        worst case              best case
                   GFlps  GB/s ulps         GFlps  GB/s ulps
 PS+SuMx(     8)    4.2   18.3 121.7 [OK]   11.1   48.5 121.7
 PS+SuMx(    16)    5.8   24.5 121.7 [OK]   10.5   44.1 121.7
 PS+SuMx(    32)    7.2   29.7 121.7 [OK]   10.2   41.7 121.7
 PS+SuMx(    64)    8.4   33.9 121.7 [OK]   10.2   41.5 121.7

SciManStev
Re: [Split] PowerSpectrum Unit Test
« Reply #214 on: 21 Dec 2010, 07:19:50 pm »

Device: GeForce GTX 480, 810 MHz clock, 1503 MB memory.
Compute capability 2.0
Compiled with CUDA 3020.
      PowerSpectrum+summax Unit test #7 (Faster reductions)
Stock:
 PS+SuMx(     8) [OK]    5.0 GFlops   22.0 GB/s
 PS+SuMx(    16) [OK]    6.0 GFlops   25.3 GB/s
 PS+SuMx(    32) [OK]    4.7 GFlops   19.2 GB/s
 PS+SuMx(    64) [OK]    7.2 GFlops   29.1 GB/s


Opt1: 256 thrds/block
                        worst case              best case
                   GFlps  GB/s ulps         GFlps  GB/s ulps
 PS+SuMx(     8)    9.0   39.2 121.7 [OK]   23.0  100.7 121.7
 PS+SuMx(    16)   11.7   49.0 121.7 [OK]   21.7   90.8 121.7
 PS+SuMx(    32)   13.6   55.8 121.7 [OK]   21.1   86.4 121.7
 PS+SuMx(    64)   15.1   61.2 121.7 [OK]   20.7   83.7 121.7


Steve

Jason G
Re: [Split] PowerSpectrum Unit Test
« Reply #215 on: 22 Dec 2010, 01:11:14 am »
Thanks all for the massive amount of data ;D I'll peruse it to see if anything's amiss, but I think I've found the sweet spot for 'worst case' for now, which is the straightforward implementation. I'm delighted that nothing seems to be broken on any GPU tested so far. There is a lot of work to do to add the remaining sizes into the test (the remaining powers of 2 up to 128k or so, maybe some larger sizes for growing room), then to add FFTs and Findspikes on either side of this pipeline. Once that's done, it looks like I can stripe the processing to fit Fermi's L2 cache right through this pipeline, which should speed things up a lot for those cards.
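The plan to cover the remaining power-of-two sizes and then stripe the pipeline through Fermi's L2 can be framed as a simple sizing exercise. This sketch makes several assumptions that are not the app's actual layout: each bin costs 8 bytes of complex input plus 4 bytes of float power output, only half the cache is budgeted for the working set, and the L2 size is the 768 KB of a full GF100 (GTX 480). The function names are hypothetical.

```python
BYTES_PER_BIN = 8 + 4  # complex64 input + float32 power output (assumed layout)


def fft_lengths(lo=8, hi=128 * 1024):
    """Powers of two from lo to hi inclusive, matching the planned test sizes."""
    n = lo
    while n <= hi:
        yield n
        n *= 2


def rows_per_strip(fft_len, l2_bytes, budget=0.5):
    """Whole FFT rows that fit in the budgeted share of L2 (never fewer than 1)."""
    usable = int(l2_bytes * budget)
    return max(1, usable // (fft_len * BYTES_PER_BIN))


L2_GTX480 = 768 * 1024  # full GF100 L2 size in bytes
for n in fft_lengths():
    print(f"fft_len {n:6d}: {rows_per_strip(n, L2_GTX480):5d} rows per strip")
```

Under these assumptions, short lengths get thousands of rows per strip. At the largest sizes, a single row (1.5 MB at 128k) already exceeds the budget, so striping there would have to split work within a row rather than across rows.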

@Joe: Thanks! I keep forgetting it's 31, not 30 ::) I probably would have found it the hard way (again), but the heads-up helps.

Jason


glennaxl
Re: [Split] PowerSpectrum Unit Test
« Reply #216 on: 22 Dec 2010, 01:11:31 am »
-device 0
Device: GeForce GTX 295, 1476 MHz clock, 874 MB memory.
Compute capability 1.3
Compiled with CUDA 3020.
                PowerSpectrum+summax Unit test #7 (Faster reductions)
Stock:
 PS+SuMx(     8) [OK]    4.5 GFlops   19.6 GB/s
 PS+SuMx(    16) [OK]    5.0 GFlops   20.9 GB/s
 PS+SuMx(    32) [OK]    4.6 GFlops   18.7 GB/s
 PS+SuMx(    64) [OK]    7.0 GFlops   28.4 GB/s


Opt1: 128 thrds/block
                        worst case              best case
                   GFlps  GB/s ulps         GFlps  GB/s ulps
 PS+SuMx(     8)    6.1   26.7 121.7 [OK]   11.7   51.4 121.7
 PS+SuMx(    16)    7.5   31.2 121.7 [OK]   11.5   48.0 121.7
 PS+SuMx(    32)    8.7   35.6 121.7 [OK]   12.0   48.9 121.7
 PS+SuMx(    64)   10.9   44.1 121.7 [OK]   14.5   58.9 121.7

-device 1
Device: GeForce GTX 295, 1476 MHz clock, 873 MB memory.
Compute capability 1.3
Compiled with CUDA 3020.
                PowerSpectrum+summax Unit test #7 (Faster reductions)
Stock:
 PS+SuMx(     8) [OK]    4.4 GFlops   19.3 GB/s
 PS+SuMx(    16) [OK]    4.9 GFlops   20.6 GB/s
 PS+SuMx(    32) [OK]    4.5 GFlops   18.5 GB/s
 PS+SuMx(    64) [OK]    6.9 GFlops   27.9 GB/s


Opt1: 128 thrds/block
                        worst case              best case
                   GFlps  GB/s ulps         GFlps  GB/s ulps
 PS+SuMx(     8)    6.0   26.3 121.7 [OK]   11.6   50.8 121.7
 PS+SuMx(    16)    7.3   30.5 121.7 [OK]   11.4   47.7 121.7
 PS+SuMx(    32)    8.6   35.1 121.7 [OK]   11.7   48.1 121.7
 PS+SuMx(    64)   10.7   43.3 121.7 [OK]   14.4   58.2 121.7

-device 2
Device: GeForce GTX 260, 1487 MHz clock, 874 MB memory.
Compute capability 1.3
Compiled with CUDA 3020.
                PowerSpectrum+summax Unit test #7 (Faster reductions)
Stock:
 PS+SuMx(     8) [OK]    4.3 GFlops   18.7 GB/s
 PS+SuMx(    16) [OK]    4.8 GFlops   19.9 GB/s
 PS+SuMx(    32) [OK]    4.3 GFlops   17.6 GB/s
 PS+SuMx(    64) [OK]    6.6 GFlops   26.8 GB/s


Opt1: 128 thrds/block
                        worst case              best case
                   GFlps  GB/s ulps         GFlps  GB/s ulps
 PS+SuMx(     8)    5.8   25.5 121.7 [OK]   10.9   47.5 121.7
 PS+SuMx(    16)    7.1   29.7 121.7 [OK]   10.6   44.3 121.7
 PS+SuMx(    32)    8.2   33.7 121.7 [OK]   11.0   45.2 121.7
 PS+SuMx(    64)   10.4   42.0 121.7 [OK]   13.5   54.7 121.7

Jason G
Re: [Split] PowerSpectrum Unit Test
« Reply #217 on: 22 Dec 2010, 01:26:47 am »
Aha! We're finding the 2xx series limits at last. 'Best case' is tapering off sooner and is clearly compute-bound, while the worst cases show the limit of GDDR3 against Fermi's GDDR5 memory.

Fermi best cases still appear to be limited by the memory subsystem, so down the road I'll be striping (streaming) this pipeline to fit in those cache levels. That should lift the apparent ~20 GFlops limit on Fermi cards a bit. Unfortunately the 2xx cards don't have those cache levels, so we might be reaching a limit with them in some respects.
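The compute-bound versus bandwidth-bound reading maps onto the standard roofline model. In that model, attainable throughput is the lesser of a compute ceiling and arithmetic intensity times memory bandwidth. In the minimal sketch below, intensity comes straight from one line of the test output. `peak_gflops` and `peak_gb_per_s` are placeholders for whatever compute and bandwidth limits you consider effective for a given card. The function names are hypothetical.

```python
def arithmetic_intensity(gflops, gb_per_s):
    """Flops per byte moved, from one GFlops / GB/s pair in the test output."""
    return gflops / gb_per_s


def roofline(intensity, peak_gflops, peak_gb_per_s):
    """Attainable GFlops under the simple roofline model."""
    return min(peak_gflops, intensity * peak_gb_per_s)


def bound(intensity, peak_gflops, peak_gb_per_s):
    """Which ceiling limits a kernel of the given intensity."""
    ridge = peak_gflops / peak_gb_per_s  # intensity where the two ceilings meet
    return "compute" if intensity >= ridge else "memory"


# Pairs taken from the Opt1 tables in this thread.
for label, (gf, bw) in {
    "GTX 480 best PS+SuMx(8)": (23.0, 100.7),
    "GTX 295 worst PS+SuMx(64)": (10.9, 44.1),
}.items():
    print(f"{label}: {arithmetic_intensity(gf, bw):.3f} flop/byte")
```

Across the tables for sizes 8 through 64, the intensity stays at roughly 0.23 to 0.25 flop/byte (about 4 to 4.4 bytes per flop) on every card.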

@glennaxl: could you confirm that the 200 series cards are reaching near ~100% GPU utilisation during the Opt1 tests (higher than the stock portion) ?  I can lengthen the test sequence if needed.

[A bit later:] Extending the tests from 0.5 to 5 seconds allowed me to see what the 480 is doing as a cross-check. It looks like the Opt1 best cases are reaching ~100% and the Opt1 worst cases are bandwidth-limited, all as expected; no surprises yet.

[Still later:] I've added the extended PowerSpectrumTest7 to the first post. I don't need data from the extended test (the results are more or less the same), but I'm providing it for those who want to see GPU utilisation differences between the test phases on their cards, like the attached image.
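Stretching each phase from 0.5 to 5 seconds mainly gives a monitoring tool such as GPU-Z enough samples to register the load. Here is a minimal host-side sketch of that idea: repeat a workload until a target wall-clock duration has elapsed, then report the mean time per call. `run_for` and the dummy workload are hypothetical. A real GPU benchmark would also have to wait for the device to finish before reading the clock.

```python
import time


def run_for(workload, min_seconds=5.0):
    """Call `workload` repeatedly until at least `min_seconds` have elapsed.

    Returns (iterations, mean seconds per call). For asynchronous GPU work,
    the workload itself must block until the device has finished.
    """
    if min_seconds <= 0:
        raise ValueError("min_seconds must be positive")
    iterations = 0
    start = time.perf_counter()
    while True:
        workload()
        iterations += 1
        elapsed = time.perf_counter() - start
        if elapsed >= min_seconds:
            return iterations, elapsed / iterations


if __name__ == "__main__":
    n, per_call = run_for(lambda: sum(range(10_000)), min_seconds=0.5)
    print(f"{n} calls, {per_call * 1e6:.1f} us per call")
```

Fixing the duration rather than the iteration count keeps every phase visible for the same length of time on fast and slow cards alike.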

Moving on to larger sizes and FFT integration, after some beer ;)
« Last Edit: 22 Dec 2010, 02:36:19 am by Jason G »

glennaxl
Re: [Split] PowerSpectrum Unit Test
« Reply #218 on: 22 Dec 2010, 02:42:25 am »
Quote from: Jason G
@glennaxl: could you confirm that the 200 series cards are reaching near ~100% GPU utilisation during the Opt1 tests (higher than the stock portion) ?  I can lengthen the test sequence if needed.

Yes, Opt1 spikes to 99%.

Jason G
Re: [Split] PowerSpectrum Unit Test
« Reply #219 on: 22 Dec 2010, 02:43:36 am »
Cheers!

_heinz
Re: [Split] PowerSpectrum Unit Test
« Reply #220 on: 22 Dec 2010, 04:45:42 am »
7_extended
~~~~~~~
PowerSpectrumTest7_extended.exe -device 0

Device: GeForce GTX 470, 810 MHz clock, 1248 MB memory.
Compute capability 2.0
Compiled with CUDA 3020.
                PowerSpectrum+summax Unit test #7 (Faster reductions)
Stock:
 PS+SuMx(     8) [OK]    2.7 GFlops   12.0 GB/s
 PS+SuMx(    16) [OK]    3.7 GFlops   15.6 GB/s
 PS+SuMx(    32) [OK]    3.3 GFlops   13.7 GB/s
 PS+SuMx(    64) [OK]    5.1 GFlops   20.7 GB/s


Opt1: 256 thrds/block
                        worst case              best case
                   GFlps  GB/s ulps         GFlps  GB/s ulps
 PS+SuMx(     8)    4.9   21.5 121.7 [OK]   17.6   77.2 121.7
 PS+SuMx(    16)    7.1   29.7 121.7 [OK]   16.7   69.8 121.7
 PS+SuMx(    32)    8.3   34.1 121.7 [OK]   16.2   66.4 121.7
 PS+SuMx(    64)   10.2   41.3 121.7 [OK]   16.0   64.6 121.7


PowerSpectrumTest7_extended.exe -device 1

Device: GeForce GTX 470, 810 MHz clock, 1249 MB memory.
Compute capability 2.0
Compiled with CUDA 3020.
                PowerSpectrum+summax Unit test #7 (Faster reductions)
Stock:
 PS+SuMx(     8) [OK]    2.7 GFlops   12.0 GB/s
 PS+SuMx(    16) [OK]    3.7 GFlops   15.4 GB/s
 PS+SuMx(    32) [OK]    3.4 GFlops   13.9 GB/s
 PS+SuMx(    64) [OK]    5.1 GFlops   20.7 GB/s


Opt1: 256 thrds/block
                        worst case              best case
                   GFlps  GB/s ulps         GFlps  GB/s ulps
 PS+SuMx(     8)    5.0   21.8 121.7 [OK]   17.7   77.4 121.7
 PS+SuMx(    16)    7.1   29.9 121.7 [OK]   16.7   70.0 121.7
 PS+SuMx(    32)    8.9   36.5 121.7 [OK]   16.3   66.6 121.7
 PS+SuMx(    64)   10.5   42.4 121.7 [OK]   16.0   64.7 121.7


.
Done
[Image: gpuload]
I had never seen this memory controller load spike before; by comparison, PrimeGrid shows none.
[Image: gpuload_prime]
« Last Edit: 22 Dec 2010, 09:05:21 am by Jason G »

_heinz
Re: [Split] PowerSpectrum Unit Test
« Reply #221 on: 22 Dec 2010, 06:42:54 am »
7 extended ION
~~~~~~~~~~
PowerSpectrumTest7_extended.exe -device 0

Device: ION, 1100 MHz clock, 242 MB memory.
Compute capability 1.1
Compiled with CUDA 3020.
                PowerSpectrum+summax Unit test #7 (Faster reductions)
Stock:
 PS+SuMx(     8) [OK]    0.4 GFlops    1.5 GB/s
 PS+SuMx(    16) [OK]    0.3 GFlops    1.4 GB/s
 PS+SuMx(    32) [OK]    0.3 GFlops    1.1 GB/s
 PS+SuMx(    64) [OK]    0.4 GFlops    1.7 GB/s


Opt1: 64 thrds/block
                        worst case              best case
                   GFlps  GB/s ulps         GFlps  GB/s ulps
 PS+SuMx(     8)    0.5    2.4 121.7 [OK]    0.6    2.8 121.7
 PS+SuMx(    16)    0.6    2.3 121.7 [OK]    0.6    2.6 121.7
 PS+SuMx(    32)    0.5    2.2 121.7 [OK]    0.6    2.3 121.7
 PS+SuMx(    64)    0.7    2.7 121.7 [OK]    0.7    2.9 121.7


.
Done
Hmm, how to interpret this?
The stock values (1.7 GB/s) are much better with the ION.
I must look up the ION device properties.
CUDA: ION
Field   Value
Device Properties
Device Name   ION
Clock Rate   1100 MHz
Multiprocessors / Cores   2 / 16
Max Threads Per Block   512
Max Registers Per Block   8192
Warp Size   32 threads
Max Block Size   512 x 512 x 64
Max Grid Size   65535 x 65535 x 1
Compute Capability   1.1
CUDA DLL   nvcuda.dll (8.17.12.6061 - nVIDIA ForceWare 260.61)

Memory Properties
Total Memory   241 MB
Total Constant Memory   64 KB
Max Shared Memory Per Block   16 KB
Max Memory Pitch   2147483647 Bytes
Texture Alignment   256 Bytes

Device Features
32-bit Floating-Point Atomic Addition   Not supported
32-bit Integer Atomic Operations   Supported
64-bit Integer Atomic Operations   Not supported
Concurrent Memory Copy & Execute   Not supported
Double-Precision Floating-Point   Not supported
Warp Vote Functions   Not supported
__ballot()   Not supported
__syncthreads_and()   Not supported
__syncthreads_count()   Not supported
__syncthreads_or()   Not supported
__threadfence_system()   Not supported

Device Manufacturer
Company Name   NVIDIA Corporation
Product Information   http://www.nvidia.com/page/products.html
Driver Download   http://www.nvidia.com/content/drivers/drivers.asp
Driver Update   http://www.aida64.com/driver-updates
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
OPEN_CL
~~~~~~~
OpenCL: ION
Field   Value
OpenCL Properties
Platform Name   NVIDIA CUDA
Platform Vendor   NVIDIA Corporation
Platform Version   OpenCL 1.0 CUDA 3.2.1
Platform Profile   Full

Device Properties
Device Name   ION
Device Type   Graphics Processor (GPU)
Device Vendor   NVIDIA Corporation
Device Version   OpenCL 1.0 CUDA
Device Profile   Full
Clock Rate   1100 MHz
Multiprocessors   2
Max 2D Image Size   4096 x 32768
Max 3D Image Size   2048 x 2048 x 2048
Max Samplers   16
Max Work-Item Size   512 x 512 x 64
Max Work-Group Size   512
Max Argument Size   4352 Bytes
Max Constant Buffer Size   64 KB
Max Constant Arguments   9
Profiling Timer Resolution   1000 ns
OpenCL DLL   opencl.dll (1.0.0)

Memory Properties
Global Memory   241 MB
Local Memory   16 KB
Memory Base Address Alignment   2048 bits
Min Data Type Alignment   128 Bytes

Device Features
Command-Queue Out Of Order Execution   Enabled
Command-Queue Profiling   Enabled
Compiler   Supported
Error Correction   Not supported
Images   Supported
Kernel Execution   Supported
Native Kernel Execution   Not supported

Device Extensions
cl_amd_d3d10_interop   Not supported
cl_amd_d3d9_interop   Not supported
cl_amd_device_attribute_query   Not supported
cl_amd_fp64   Not supported
cl_amd_media_ops   Not supported
cl_amd_printf   Not supported
cl_khr_3d_image_writes   Not supported
cl_khr_byte_addressable_store   Supported
cl_khr_d3d10_sharing   Supported
cl_khr_fp16   Not supported
cl_khr_fp64   Not supported
cl_khr_gl_sharing   Supported
cl_khr_global_int32_base_atomics   Supported
cl_khr_global_int32_extended_atomics   Supported
cl_khr_icd   Supported
cl_khr_int64_base_atomics   Not supported
cl_khr_int64_extended_atomics   Not supported
cl_khr_local_int32_base_atomics   Not supported
cl_khr_local_int32_extended_atomics   Not supported
cl_khr_select_fprounding_mode   Not supported
cl_nv_compiler_options   Supported
cl_nv_d3d10_sharing   Supported
cl_nv_d3d11_sharing   Supported
cl_nv_d3d9_sharing   Supported
cl_nv_device_attribute_query   Supported
cl_nv_pragma_unroll   Supported

Device Manufacturer
Company Name   NVIDIA Corporation
Product Information   http://www.nvidia.com/page/products.html
Driver Download   http://www.nvidia.com/content/drivers/drivers.asp
Driver Update   http://www.aida64.com/driver-updates
« Last Edit: 22 Dec 2010, 09:06:04 am by Jason G »

Jason G
Re: [Split] PowerSpectrum Unit Test
« Reply #222 on: 22 Dec 2010, 08:59:03 am »
Quote from: _heinz
hmm. how to interpret
the stock values 1,7GB/s are much better with the ION.
must lookup to the ION device properties

No, your labels are misaligned, Heinz; I'll fix them for you. [Done. 2.7 GB/s is a bit better than 1.7 GB/s.]

[Edit] Fixed it again, and fixed the 470 ones so you can read them properly  ;)
« Last Edit: 22 Dec 2010, 09:07:05 am by Jason G »

_heinz
Re: [Split] PowerSpectrum Unit Test
« Reply #223 on: 22 Dec 2010, 09:24:15 am »
Thanks Jason,
I must clean my glasses ::)

_heinz
Re: [Split] PowerSpectrum Unit Test
« Reply #224 on: 22 Dec 2010, 05:53:11 pm »
7 extended ION
~~~~~~~~~~
Rerun, now lightly overclocked from 450 / 800 / 1100 to 475 / 850 / 1161 MHz (core / memory / shader)

PowerSpectrumTest7_extended.exe -device 0

Device: ION, 1161 MHz clock, 242 MB memory.
Compute capability 1.1
Compiled with CUDA 3020.
                PowerSpectrum+summax Unit test #7 (Faster reductions)
Stock:
 PS+SuMx(     8) [OK]    0.4 GFlops    1.6 GB/s
 PS+SuMx(    16) [OK]    0.3 GFlops    1.4 GB/s
 PS+SuMx(    32) [OK]    0.3 GFlops    1.1 GB/s
 PS+SuMx(    64) [OK]    0.4 GFlops    1.8 GB/s


Opt1: 64 thrds/block
                        worst case              best case
                   GFlps  GB/s ulps         GFlps  GB/s ulps
 PS+SuMx(     8)    0.6    2.5 121.7 [OK]    0.7    2.9 121.7
 PS+SuMx(    16)    0.6    2.4 121.7 [OK]    0.6    2.7 121.7
 PS+SuMx(    32)    0.6    2.3 121.7 [OK]    0.6    2.4 121.7
 PS+SuMx(    64)    0.7    2.8 121.7 [OK]    0.8    3.1 121.7


.
Done
Addendum: the latest GPU-Z (0.4.9) does not show any memory controller load; that looks like an issue.
It also shows 4 ROPs for the ION, but as far as I know it has 2 multiprocessors.
I've emailed TechPowerUp.
« Last Edit: 22 Dec 2010, 06:28:30 pm by Jason G »

 
