Author Topic: [Split] PowerSpectrum Unit Test  (Read 137811 times)

Offline PatrickV2

  • Knight o' The Round Table
  • ***
  • Posts: 139
Re: [Split] PowerSpectrum Unit Test
« Reply #15 on: 18 Nov 2010, 02:36:34 pm »
Not sure if you're looking for this, but below my results on my 8800GTX, 260.99 drivers:

Device: GeForce 8800 GTX, 1350 MHz clock, 731 MB memory.
Compiled with CUDA 3020.
Stock GetPowerSpectrum():
     64 threads:       17.8 GFlops    7.1 GB/s 1183.3ulps


GetPowerSpectrum() mod 1:
     32 threads:       14.2 GFlops    5.7 GB/s 1183.3ulps
     64 threads:       17.8 GFlops    7.1 GB/s 1183.3ulps
    128 threads:       17.8 GFlops    7.1 GB/s 1183.3ulps
    256 threads:       17.6 GFlops    7.0 GB/s 1183.3ulps


GetPowerSpectrum() mod 2:
     32 threads:        6.8 GFlops    2.7 GB/s 1183.3ulps
     64 threads:        6.2 GFlops    2.5 GB/s 1183.3ulps
    128 threads:        9.1 GFlops    3.7 GB/s 1183.3ulps
    256 threads:        8.0 GFlops    3.2 GB/s 1183.3ulps

Regards, Patrick.

Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: [Split] PowerSpectrum Unit Test
« Reply #16 on: 18 Nov 2010, 03:37:36 pm »
starting PowerSpectrum2
.
-device 0
Device: GeForce GTX 470, 810 MHz clock, 1248 MB memory.
Compiled with CUDA 3020.
Stock GetPowerSpectrum():
     64 threads:       20.6 GFlops    8.2 GB/s   0.0ulps


GetPowerSpectrum() mod 1:
     32 threads:       12.5 GFlops    5.0 GB/s   0.0ulps
     64 threads:       20.5 GFlops    8.2 GB/s   0.0ulps
    128 threads:       27.6 GFlops   11.0 GB/s   0.0ulps
    256 threads:       29.9 GFlops   12.0 GB/s   0.0ulps


GetPowerSpectrum() mod 2:
     32 threads:       14.4 GFlops    5.8 GB/s   0.0ulps
     64 threads:       28.3 GFlops   11.3 GB/s   0.0ulps
    128 threads:       42.4 GFlops   16.9 GB/s   0.0ulps
    256 threads:       42.5 GFlops   17.0 GB/s   0.0ulps


-device 1
Device: GeForce GTX 470, 810 MHz clock, 1249 MB memory.
Compiled with CUDA 3020.
Stock GetPowerSpectrum():
     64 threads:       20.6 GFlops    8.3 GB/s   0.0ulps


GetPowerSpectrum() mod 1:
     32 threads:       12.6 GFlops    5.0 GB/s   0.0ulps
     64 threads:       20.5 GFlops    8.2 GB/s   0.0ulps
    128 threads:       27.5 GFlops   11.0 GB/s   0.0ulps
    256 threads:       30.1 GFlops   12.0 GB/s   0.0ulps


GetPowerSpectrum() mod 2:
     32 threads:       14.4 GFlops    5.8 GB/s   0.0ulps
     64 threads:       28.4 GFlops   11.4 GB/s   0.0ulps
    128 threads:       42.2 GFlops   16.9 GB/s   0.0ulps
    256 threads:       41.1 GFlops   16.4 GB/s   0.0ulps


.
Done
modify:
@Jason, wondering why you get 20 GFlops more with 256 threads than my GTX470.
Do you have source for me to compile with the 2011XE compiler?
« Last Edit: 18 Nov 2010, 03:51:06 pm by _heinz »

Offline arkayn

  • Janitor o' the Board
  • Knight who says 'Ni!'
  • *****
  • Posts: 1230
  • Aaaarrrrgggghhhh
    • My Little Place On The Internet
Re: [Split] PowerSpectrum Unit Test
« Reply #17 on: 18 Nov 2010, 05:45:09 pm »
I tried running it on my 460, but the program always crashes at the end of 128/beginning of 256 threads in mod 2.

Never see any results.

Offline Claggy

  • Alpha Tester
  • Knight who says 'Ni!'
  • ***
  • Posts: 3111
    • My computers at Seti Beta
Re: [Split] PowerSpectrum Unit Test
« Reply #18 on: 18 Nov 2010, 05:47:34 pm »
Here's my 9800GTX+ result, like Richard's 9800GTX+ it's a factory overclocked example, but by XFX:

Device: GeForce 9800 GTX/9800 GTX+, 1900 MHz clock, 496 MB memory.
Compiled with CUDA 3020.
Stock GetPowerSpectrum():
     64 threads:       16.1 GFlops    6.5 GB/s 1183.3ulps


GetPowerSpectrum() mod 1:
     32 threads:       15.1 GFlops    6.1 GB/s 1183.3ulps
     64 threads:       16.1 GFlops    6.5 GB/s 1183.3ulps
    128 threads:       16.0 GFlops    6.4 GB/s 1183.3ulps
    256 threads:       15.9 GFlops    6.3 GB/s 1183.3ulps


GetPowerSpectrum() mod 2:
     32 threads:        6.2 GFlops    2.5 GB/s 1183.3ulps
     64 threads:        8.2 GFlops    3.3 GB/s 1183.3ulps
    128 threads:        8.3 GFlops    3.3 GB/s 1183.3ulps
    256 threads:        8.1 GFlops    3.2 GB/s 1183.3ulps

Claggy
« Last Edit: 18 Nov 2010, 07:07:54 pm by Claggy »

Offline Claggy

  • Alpha Tester
  • Knight who says 'Ni!'
  • ***
  • Posts: 3111
    • My computers at Seti Beta
Re: [Split] PowerSpectrum Unit Test
« Reply #19 on: 18 Nov 2010, 06:11:25 pm »
Here's the result from my 128 MB 8400M GS. It hasn't got enough RAM for Seti, but it at least gives you some figures for very slow GPUs:

Device: GeForce 8400M GS, 800 MHz clock, 114 MB memory.
Compiled with CUDA 3020.
Stock GetPowerSpectrum():
     64 threads:        1.2 GFlops    0.5 GB/s 1183.3ulps


GetPowerSpectrum() mod 1:
     32 threads:        1.2 GFlops    0.5 GB/s 1183.3ulps
     64 threads:        1.2 GFlops    0.5 GB/s 1183.3ulps
    128 threads:        1.2 GFlops    0.5 GB/s 1183.3ulps
    256 threads:        1.2 GFlops    0.5 GB/s 1183.3ulps


GetPowerSpectrum() mod 2:
     32 threads:        0.7 GFlops    0.3 GB/s 1183.3ulps
     64 threads:        0.7 GFlops    0.3 GB/s 1183.3ulps
    128 threads:        0.7 GFlops    0.3 GB/s 1183.3ulps
    256 threads:        0.6 GFlops    0.2 GB/s 1183.3ulps

Claggy
« Last Edit: 18 Nov 2010, 07:02:21 pm by Claggy »

Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: [Split] PowerSpectrum Unit Test
« Reply #20 on: 18 Nov 2010, 06:27:05 pm »
Ran it twice on the ION
~~~~~~~~~~~~~~~~~
starting PowerSpectrum2
.

Device: ION, 1100 MHz clock, 242 MB memory.
Compiled with CUDA 3020.
Stock GetPowerSpectrum():
     64 threads:        1.9 GFlops    0.8 GB/s 1183.3ulps


GetPowerSpectrum() mod 1:
     32 threads:        1.3 GFlops    0.5 GB/s 1183.3ulps
     64 threads:        1.9 GFlops    0.7 GB/s 1183.3ulps
    128 threads:        1.9 GFlops    0.8 GB/s 1183.3ulps
    256 threads:        1.9 GFlops    0.8 GB/s 1183.3ulps


GetPowerSpectrum() mod 2:
     32 threads:        1.0 GFlops    0.4 GB/s 1183.3ulps
     64 threads:        1.0 GFlops    0.4 GB/s 1183.3ulps
    128 threads:        0.9 GFlops    0.4 GB/s 1183.3ulps
    256 threads:        0.8 GFlops    0.3 GB/s 1183.3ulps



Device: ION, 1100 MHz clock, 242 MB memory.
Compiled with CUDA 3020.
Stock GetPowerSpectrum():
     64 threads:        1.9 GFlops    0.8 GB/s 1183.3ulps


GetPowerSpectrum() mod 1:
     32 threads:        1.3 GFlops    0.5 GB/s 1183.3ulps
     64 threads:        1.9 GFlops    0.8 GB/s 1183.3ulps
    128 threads:        1.9 GFlops    0.8 GB/s 1183.3ulps
    256 threads:        1.9 GFlops    0.8 GB/s 1183.3ulps


GetPowerSpectrum() mod 2:
     32 threads:        1.0 GFlops    0.4 GB/s 1183.3ulps
     64 threads:        1.0 GFlops    0.4 GB/s 1183.3ulps
    128 threads:        0.9 GFlops    0.4 GB/s 1183.3ulps
    256 threads:        0.8 GFlops    0.3 GB/s 1183.3ulps


.
Done

Offline SciManStev

  • Alpha Tester
  • Knight Templar
  • ***
  • Posts: 263
Re: [Split] PowerSpectrum Unit Test
« Reply #21 on: 18 Nov 2010, 07:01:14 pm »
This is what I got on my 480's with 260.99

Device: GeForce GTX 480, 810 MHz clock, 1503 MB memory
Compiled with CUDA 3020
Stock GetPowerSpectrum<>:
     64 threads:       27.6 GFlops   11.1 GB/s   0.0ulps

GetPowerSpectrum<> mod 1:
     32 threads:       17.5 GFlops   7.0 GB/s    0.0ulps
     64 threads:       27.5 GFlops  11.0 GB/s    0.0ulps
    128 threads:       36.4 GFlops  14.6 GB/s    0.0ulps
    256 threads:       39.6 GFlops  15.8 GB/s    0.0ulps

GetPowerSpectrum<> mod 2:
     32 threads:       20.2 GFlops   8.1 GB/s    0.0ulps
     64 threads:       39.7 GFlops  15.9 GB/s    0.0ulps
    128 threads:       64.1 GFlops  25.6 GB/s    0.0ulps
    256 threads:       64.3 GFlops  25.7 GB/s    0.0ulps

Steve

I edited the data, as I was crunching the first time I ran it.
« Last Edit: 18 Nov 2010, 07:15:18 pm by SciManStev »

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: [Split] PowerSpectrum Unit Test
« Reply #22 on: 18 Nov 2010, 08:15:21 pm »
modify:
@Jason, wondering why you get 20 GFlops more with 256 threads than my GTX470.
Do you have source for me to compile with the 2011XE compiler?

GTX480 has a wider memory bus, IIRC.  Also, they're GPU kernels, Heinz, so the CPU host side won't make any difference here (unless Intel started messing with Cuda binaries  ;) ).  After some work, this will lead to a set of optimisation strategies for other kernels throughout, rather than one specific piece of useful code.

I'm looking at this (almost pure) memory bound computation (powerspectrum), as a way to see what optimisation strategies work on different cards with that type of operation.  This way I can learn to make kernels that choose the best memory access strategy internally by compute capability.
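For anyone unfamiliar with the operation, a power spectrum over complex FFT output is conventionally the squared magnitude of each bin. The following is a minimal CPU-side sketch in C++, not the CUDA kernel under test. The names `Complex` and `power_spectrum` are hypothetical, and the per-bin formula is the textbook definition rather than code taken from this thread. It shows why the work is memory bound: each bin costs two multiplies and one add, against 8 bytes read and 4 bytes written.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for CUDA's float2: one complex frequency bin.
struct Complex {
    float re;
    float im;
};

// Squared magnitude per bin: power[i] = re*re + im*im.
// Assumption: textbook definition, not the kernel source from this thread.
// The caller pre-allocates the output, as the host side of a GPU kernel would.
void power_spectrum(const std::vector<Complex>& bins, std::vector<float>& power) {
    assert(power.size() == bins.size());
    for (std::size_t i = 0; i < bins.size(); ++i) {
        power[i] = bins[i].re * bins[i].re + bins[i].im * bins[i].im;
    }
}
```

With so little arithmetic per byte moved, the achievable rate is set almost entirely by how the reads and writes are laid out, which is what the mod 1 and mod 2 variants are probing.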

So far it looks like Mod2 is winning on Fermi (apart from whatever is causing arkayn's problems).  Prior-gen 200 series cards seem to like Mod1 better, so I suspect there is some memory pattern issue for me to look at in Mod2 with respect to prior-gen cards.  Earlier G80-G92 cards could be even more memory subsystem constrained, or need even more special treatment of access patterns, by the looks of things.
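The idea of choosing a strategy by compute capability could be sketched on the host side roughly as follows. This is only an illustration in C++: `Variant`, `LaunchChoice` and `choose_power_spectrum_kernel` are hypothetical names, and the thresholds simply mirror the results posted so far. They are not tuned values, and they ignore the unexplained GTX 460 crash in mod 2.

```cpp
#include <cassert>

// Hypothetical labels for the three kernel variants benchmarked in this thread.
enum class Variant { Stock, Mod1, Mod2 };

struct LaunchChoice {
    Variant variant;
    int threads_per_block;
};

// Pick a variant from the CUDA compute capability major version.
// Assumption: thresholds only reflect the numbers posted in this thread.
LaunchChoice choose_power_spectrum_kernel(int cc_major) {
    assert(cc_major >= 1);
    if (cc_major >= 2) {
        // Fermi (2.x): Mod 2 was fastest and levelled off at 128-256 threads.
        return {Variant::Mod2, 128};
    }
    // Pre-Fermi (1.x): Mod 1 at 64 threads matched stock and beat Mod 2.
    return {Variant::Mod1, 64};
}
```

On a real device the major version would come from the `major` field of `cudaDeviceProp`, filled in by `cudaGetDeviceProperties`.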

@Arkayn, not sure what would cause that, but on my 480 that's where things start to get 'a bit warm'  ... Is there a possibility of temperature issues ? Try cranking the fan perhaps.  [Edit:]  Probably pushing the 2.1 (GTX 460) architecture limits in Mod2.  I'll look into that for mod3.

Steve's WINNING! (Just  ;) ) -

Plenty of data for me to chew on.  Will be thinking about mod3.

Jason
« Last Edit: 18 Nov 2010, 08:41:07 pm by Jason G »

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: [Split] PowerSpectrum Unit Test
« Reply #23 on: 18 Nov 2010, 08:18:39 pm »
Here's the result from my 128 MB 8400M GS. It hasn't got enough RAM for Seti, but it at least gives you some figures for very slow GPUs:

Nice!  Another stubborn GPU  :D

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: [Split] PowerSpectrum Unit Test
« Reply #24 on: 18 Nov 2010, 08:37:59 pm »
Not sure if you're looking for this, but below my results on my 8800GTX, 260.99 drivers:
Exactly what I'm looking for, thanks.

Offline glennaxl

  • Knight o' The Realm
  • **
  • Posts: 86
Re: [Split] PowerSpectrum Unit Test
« Reply #25 on: 18 Nov 2010, 08:38:12 pm »
Device: GeForce 9800 GT, 1750 MHz clock, 500 MB memory.
Compiled with CUDA 3020.
Stock GetPowerSpectrum():
     64 threads:       13.6 GFlops    5.4 GB/s 1183.3ulps


GetPowerSpectrum() mod 1:
     32 threads:       12.1 GFlops    4.9 GB/s 1183.3ulps
     64 threads:       13.7 GFlops    5.5 GB/s 1183.3ulps
    128 threads:       13.5 GFlops    5.4 GB/s 1183.3ulps
    256 threads:       13.4 GFlops    5.3 GB/s 1183.3ulps


GetPowerSpectrum() mod 2:
     32 threads:        5.3 GFlops    2.1 GB/s 1183.3ulps
     64 threads:        7.0 GFlops    2.8 GB/s 1183.3ulps
    128 threads:        7.1 GFlops    2.8 GB/s 1183.3ulps
    256 threads:        6.8 GFlops    2.7 GB/s 1183.3ulps

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: [Split] PowerSpectrum Unit Test
« Reply #26 on: 18 Nov 2010, 08:48:52 pm »
If anyone's wondering what this figure is:

... 1183.3ulps ...

It's a measure of the precision against a CPU double precision reference power spectrum.

Fermi cards get 0 ulps total deviation (most accurate) because they default to IEEE-754 compliance, whereas earlier generations consistently get 1183.3 because they use a fast single precision implementation by default.

I can either use special intrinsic functions on the older cards to force compliance, at a speed penalty, or allow the Fermi cards to use the faster (less accurate) computation.  Will see.  1183.3 'Units of Least Precision' isn't much total deviation from the double precision reference over the 1048576 point data set used in multibeam.

An ulp is defined here as:
Quote
const float ulp =  1.192092896e-07f;
That is about 0.00000012, and variation of roughly that size from the double precision CPU reference would be scattered throughout the dataset.
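As a rough illustration of how a total deviation figure like this could be produced, here is a minimal C++ sketch. The function name `total_deviation_ulps` is hypothetical, and the scoring rule (sum the absolute differences, then divide by the ulp constant) is an assumption. The actual unit test may normalise or accumulate differently.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// The ulp constant quoted above (single precision machine epsilon, 2^-23).
const float ulp = 1.192092896e-07f;

// Sum of |gpu[i] - reference[i]| across the data set, expressed in ulps.
// Assumption: differences are summed unscaled; the real test may differ.
double total_deviation_ulps(const std::vector<float>& gpu,
                            const std::vector<double>& reference) {
    assert(gpu.size() == reference.size());
    double total = 0.0;
    for (std::size_t i = 0; i < gpu.size(); ++i) {
        total += std::fabs(static_cast<double>(gpu[i]) - reference[i]);
    }
    return total / static_cast<double>(ulp);
}
```

Under this reading, a GPU result that matches the double precision reference after rounding to float scores 0, and each bin that is off by one float step near 1.0 adds 1 to the total.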

Jason
« Last Edit: 18 Nov 2010, 08:53:22 pm by Jason G »

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: [Split] PowerSpectrum Unit Test
« Reply #27 on: 18 Nov 2010, 09:01:58 pm »
@Arkayn:  I looked through some results I have, and I have a GTX460 set that ran to completion @ stock speeds (Using driver 263.06).  Might be pushing the memory OC a bit on yours ?

Offline arkayn

  • Janitor o' the Board
  • Knight who says 'Ni!'
  • *****
  • Posts: 1230
  • Aaaarrrrgggghhhh
    • My Little Place On The Internet
Re: [Split] PowerSpectrum Unit Test
« Reply #28 on: 18 Nov 2010, 10:05:35 pm »
I think it is at 800/1600 right now, runs Collatz just fine at that speed.

I just took it down to stock speed as well as the lowest setting that Afterburner allowed and it still crashed the program.

This is on a XP-64 pro machine though.

Driver is the 263.06, do I need the toolkit installed as well?
« Last Edit: 18 Nov 2010, 10:14:14 pm by arkayn »

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: [Split] PowerSpectrum Unit Test
« Reply #29 on: 18 Nov 2010, 10:41:16 pm »
Driver is the 263.06, do I need the toolkit installed as well?

Nope.  It's definitely something weird.  Bear in mind that those upper kernels are pushing Fermi's memory subsystem harder than any BOINC science app has to date that I know of, so I doubt Collatz or any other existing app would be a fair comparison (except maybe Furmark, which is just a savage thing to do to a graphics card).

If it runs this at stock OK, but not at 800/1600, then it might be Collatz stable, but is unlikely to be future X series stable.  My current feeling is that the memory frequency is the culprit, rather than the core.

(If it doesn't run correctly at stock either, then more guessing to do  ;) )

[Later:]  At this stage I'm assuming some sort of bug in Mod2, so don't go pulling things to bits just yet  ;)
« Last Edit: 19 Nov 2010, 02:14:46 am by Jason G »

 
