Seti@Home optimized science apps and information

Optimized Seti@Home apps => Windows => GPU crunching => Topic started by: RottenMutt on 27 Jan 2011, 10:06:58 pm

Title: how do i run the stock GPU app and Lunatic CPU apps
Post by: RottenMutt on 27 Jan 2011, 10:06:58 pm
i tried installing using lunatic unified installer without checking the gpu app and ended up dumping my cache :P

anyhow, the stock app has been outperforming the lunatic app!!!!

look at this rig http://setiathome.berkeley.edu/show_host_detail.php?hostid=5413841
Title: Re: how do i run the stock GPU app and Lunatic CPU apps
Post by: Jason G on 28 Jan 2011, 02:21:48 am
Hi RottenMutt,
It's well known that higher-end 2XX series cards running on XP with older drivers & apps can beat x32f at mid angle ranges under particular conditions, which happen to be quite common for now (as always, your mileage may vary). You can use the stock app if you wish, either by manually customising your app_info.xml and providing the appropriate files, or by returning to stock outright.

The reason it's a 'your mileage may vary' situation is that x32f was an early 'preview' release meant to address reliability problems in the stock app: reducing the number of '-12 errors' on high throughput hosts, and plugging a difficult emerging situation where Fermi users were installing incompatible applications that trash work.

Current optimisation efforts are directed at performance, so the status quo regarding the preferred operating system and builds for best throughput is likely to change in the next few weeks, as skills and techniques learned in the powerspectrum unit tests are injected into the codebase.

For a pretty good idea of how much current efforts are likely to affect you, please visit the Powerspectrum Unit tests thread at http://lunatics.kwsn.net/12-gpu-crunching/split-powerspectrum-unit-test.msg33244.html#msg33244 and post results for your 295 here. (That thread is now closed, as research in that area was deemed sufficiently complete for implementation to proceed.)

The latest test piece (PowerspectrumTest10) requires Cuda 3.2 drivers & DLLs, and covers ~40-60% of multibeam processing through a total rewrite for performance (still a lot of work to do in other areas, though).

Best regards, Jason
Title: Re: how do i run the stock GPU app and Lunatic CPU apps
Post by: perryjay on 28 Jan 2011, 10:50:12 am
Hi Jason G,
Looking forward to the next round of tests but my system has changed. I've now got a new power supply and a GTS 450 running in place of my little 9500GT. I'm running my 450 at 900/1800/1804 and two work units at a time. (Just like the big boys!   ;D )  If we get much faster we won't need to download any work, it will be done before we can get it at the rate these things are growing. Okay, enough kidding around, I've got my 450 stable and ready to go on the next round of tests whatever they may be.
Title: Re: how do i run the stock GPU app and Lunatic CPU apps
Post by: Jason G on 28 Jan 2011, 10:51:58 am
If you feel like it, you could post PowerspectrumTest10 results for that killer here  ;D
Title: Re: how do i run the stock GPU app and Lunatic CPU apps
Post by: perryjay on 28 Jan 2011, 11:05:48 am
Here it is...

Microsoft Windows [Version 6.1.7600]
Copyright (c) 2009 Microsoft Corporation.  All rights reserved.

C:\Users\perry>cd/test

C:\test>powerspectrumtest10.exe

Device: GeForce GTS 450, 1800 MHz clock, 993 MB memory.
Compute capability 2.1
Compiled with CUDA 3020.
                PowerSpectrum+summax Unit test #10 (FFT pipeline throughput)
Stock:
  Processing... Done!
  Compute Thoughput GFlops Avg(   39.55) Peak(   64.30) Min(    6.03) [OK]
   Memory thoughput GB/s   Avg(   21.70) Peak(   32.15) Min(    9.69)


Opt1 (worst case): 256 thrds/block, 2 x 524288 element streams
  revert to single stream from size 512
  Processing... Done!
  Compute thoughput [GFlops] -
      Avg(   56.33, 1.42x) Peak(   87.73, 1.36x) Min(   19.94, 3.31x) [OK]
   Memory thoughput [GB/s]   -
      Avg(   33.93, 1.56x) Peak(   44.28, 1.38x) Min(   23.74, 2.45x)



C:\test>
Title: Re: how do i run the stock GPU app and Lunatic CPU apps
Post by: Jason G on 28 Jan 2011, 11:24:19 am
Thanks!  It spins me out a little when I see the ~3+ times 'Min' figures.  It means that the slowest kernels liked the attention I gave them.  A 1.42x average processing rate should translate to a decent percentage improvement on whole tasks (probably around 20% or so without further refinement).  The next few weeks of levering those into the pipeline are going to be interesting.
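
As a rough check on that estimate, Amdahl's law gives the whole-task speedup when only part of the run gets faster. This is a minimal sketch, assuming the rewritten pipeline covers somewhere in the ~40-60% range mentioned earlier in the thread and that the 1.42x average applies uniformly to that portion; `whole_task_speedup` is a hypothetical helper name, not anything from the test suite.

```python
def whole_task_speedup(fraction_covered, section_speedup):
    """Amdahl's law: overall speedup when only a fraction of the run is accelerated."""
    return 1.0 / ((1.0 - fraction_covered) + fraction_covered / section_speedup)


# Assumed coverage fractions (the ~40-60% quoted for PowerspectrumTest10)
# combined with the 1.42x average reported for the GTS 450.
for fraction in (0.4, 0.5, 0.6):
    overall = whole_task_speedup(fraction, 1.42)
    print(f"{fraction:.0%} covered -> {overall:.2f}x overall")
```

At 50-60% coverage this works out to roughly 1.17x to 1.22x, which is in the same ballpark as the "around 20%" figure; real tasks may well differ, since the unit test only exercises part of the pipeline.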

Jason
Title: Re: how do i run the stock GPU app and Lunatic CPU apps
Post by: Josef W. Segur on 28 Jan 2011, 01:59:49 pm
i tried installing using lunatic unified installer and not checking the gpu app and ended dumping my cache :P
...

Richard Haselgrove wrote a very good FAQ, Run Seti Enhanced on Fermi class GPUs (4xx) (Advanced users) (http://boincfaq.mundayweb.com/index.php?language=1&view=531&sessionID=78fc8b3f4f7d9d475bc5e7a407fa6415). Although it's specifically for the 6.10 cuda_fermi stock application, the pattern is the same for any stock application.

Because the project has resend_lost_results on, when you get app_info.xml fixed up, the "dumped" tasks may be sent again.
Joe
Title: Re: how do i run the stock GPU app and Lunatic CPU apps
Post by: Claggy on 28 Jan 2011, 02:30:12 pm
Since i've also got a new GPU, a GTX 460 GLH, here are my Test #10 results (Win 7 x64, 266.58):

Code:
Device: GeForce GTX 460, 1600 MHz clock, 993 MB memory.
Compute capability 2.1
Compiled with CUDA 3020.
                PowerSpectrum+summax Unit test #10 (FFT pipeline throughput)
Stock:
  Processing... Done!
  Compute Thoughput GFlops Avg(   69.92) Peak(  114.81) Min(    8.91) [OK]
   Memory thoughput GB/s   Avg(   37.67) Peak(   54.34) Min(   15.05)


Opt1 (worst case): 256 thrds/block, 2 x 524288 element streams
  revert to single stream from size 512
  Processing... Done!
  Compute thoughput [GFlops] -
      Avg(  102.27, 1.46x) Peak(  151.62, 1.32x) Min(   21.28, 2.39x) [OK]
   Memory thoughput [GB/s]   -
      Avg(   60.11, 1.60x) Peak(   77.00, 1.42x) Min(   37.53, 2.49x)

Claggy
Title: Re: how do i run the stock GPU app and Lunatic CPU apps
Post by: RottenMutt on 30 Jan 2011, 06:38:52 pm
all 266.58 drivers and x64 vista or 7
8800gts:  interesting, loaded the video engine a little...

Code:
Device: GeForce 8800 GTS 512, 1674 MHz clock, 492 MB memory.
Compute capability 1.1
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #10 (FFT pipeline throughput)
Stock:
  Processing... Done!
  Compute Thoughput GFlops Avg(   41.89) Peak(   58.05) Min(   11.00) [OK]
   Memory thoughput GB/s   Avg(   25.23) Peak(   36.89) Min(   18.43)


Opt1 (worst case): 64 thrds/block, 2 x 524288 element streams
  revert to single stream from size 128
  Processing... Done!
  Compute thoughput [GFlops] -
      Avg(   48.37, 1.15x) Peak(   68.83, 1.19x) Min(   17.61, 1.60x) [OK]
   Memory thoughput [GB/s]   -
      Avg(   30.32, 1.20x) Peak(   40.81, 1.11x) Min(   20.93, 1.14x)

GTX475 (460 flashed to make it think it is a 475; i think i can, i think i can, I can)

Code:
Device: GeForce GTX 470, 1250 MHz clock, 1248 MB memory.
Compute capability 2.0
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #10 (FFT pipeline throughput)
Stock:
  Processing... Done!
  Compute Thoughput GFlops Avg(   75.32) Peak(  119.15) Min(    8.83) [OK]
   Memory thoughput GB/s   Avg(   41.63) Peak(   60.16) Min(   15.57)


Opt1 (worst case): 256 thrds/block, 2 x 524288 element streams
  revert to single stream from size 512
  Processing... Done!
  Compute thoughput [GFlops] -
      Avg(  115.05, 1.53x) Peak(  164.29, 1.38x) Min(   38.70, 4.38x) [OK]
   Memory thoughput [GB/s]   -
      Avg(   69.71, 1.67x) Peak(   88.85, 1.48x) Min(   49.02, 3.15x)

GTX295:

Code:
Device: GeForce GTX 295, 1369 MHz clock, 896 MB memory.
Compute capability 1.3
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #10 (FFT pipeline throughput)
Stock:
  Processing... Done!
  Compute Thoughput GFlops Avg(   81.66) Peak(  113.97) Min(   16.45) [OK]
   Memory thoughput GB/s   Avg(   47.72) Peak(   67.99) Min(   29.01)


Opt1 (worst case): 128 thrds/block, 2 x 524288 element streams
  revert to single stream from size 256
  Processing... Done!
  Compute thoughput [GFlops] -
      Avg(   78.32, 0.96x) Peak(  114.21, 1.00x) Min(   23.06, 1.40x) [OK]
   Memory thoughput [GB/s]   -
      Avg(   47.65, 1.00x) Peak(   65.35, 0.96x) Min(   21.84, 0.75x)

GTX480:

Code:
Device: GeForce GTX 480, 1440 MHz clock, 1504 MB memory.
Compute capability 2.0
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #10 (FFT pipeline throughput)
Stock:
  Processing... Done!
  Compute Thoughput GFlops Avg(   80.18) Peak(  121.88) Min(    9.81) [OK]
   Memory thoughput GB/s   Avg(   44.31) Peak(   64.58) Min(   17.29)


Opt1 (worst case): 256 thrds/block, 2 x 524288 element streams
  revert to single stream from size 512
  Processing... Done!
  Compute thoughput [GFlops] -
      Avg(  145.60, 1.82x) Peak(  208.34, 1.71x) Min(   34.86, 3.55x) [OK]
   Memory thoughput [GB/s]   -
      Avg(   86.97, 1.96x) Peak(  113.33, 1.75x) Min(   61.46, 3.55x)

another gtx480

Code:
Device: GeForce GTX 480, 1440 MHz clock, 1503 MB memory.
Compute capability 2.0
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #10 (FFT pipeline throughput)
Stock:
  Processing... Done!
  Compute Thoughput GFlops Avg(   84.40) Peak(  130.24) Min(    9.68) [OK]
   Memory thoughput GB/s   Avg(   46.41) Peak(   68.03) Min(   17.06)


Opt1 (worst case): 256 thrds/block, 2 x 524288 element streams
  revert to single stream from size 512
  Processing... Done!
  Compute thoughput [GFlops] -
      Avg(  144.95, 1.72x) Peak(  208.23, 1.60x) Min(   34.33, 3.55x) [OK]
   Memory thoughput [GB/s]   -
      Avg(   86.54, 1.86x) Peak(  113.26, 1.66x) Min(   60.53, 3.55x)
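
To compare the cards side by side, the average compute figures posted in this thread can be put in one table. This is only a sketch: the numbers are copied from the Test #10 outputs above, the `results` list and `speedup` helper are hypothetical names, and a single run per card says nothing about run-to-run variation.

```python
# (device, compute capability, stock avg GFlops, Opt1 avg GFlops),
# copied from the Test #10 outputs posted in this thread.
results = [
    ("GTS 450", "2.1", 39.55, 56.33),
    ("GTX 460", "2.1", 69.92, 102.27),
    ("8800 GTS 512", "1.1", 41.89, 48.37),
    ("GTX 470", "2.0", 75.32, 115.05),
    ("GTX 295", "1.3", 81.66, 78.32),
    ("GTX 480", "2.0", 80.18, 145.60),
    ("GTX 480 (second)", "2.0", 84.40, 144.95),
]


def speedup(stock, opt):
    """Ratio of optimised to stock average throughput."""
    return opt / stock


# Sort so the cards that gain the most from Opt1 come first.
ranked = sorted(results, key=lambda r: speedup(r[2], r[3]), reverse=True)
for name, cc, stock, opt in ranked:
    print(f"{name:<18} CC {cc}  {speedup(stock, opt):.2f}x")
```

On these figures the compute capability 2.x cards gain the most from Opt1, while the two 1.x cards sit near or slightly below break-even.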
Title: Re: how do i run the stock GPU app and Lunatic CPU apps
Post by: perryjay on 30 Jan 2011, 07:06:40 pm
Okay, you guys are just showing off now!!   ;D