Author Topic: GPU crunching question  (Read 132152 times)

Freddy

  • Guest
Re: GPU crunching question
« Reply #45 on: 21 Feb 2007, 02:20:46 am »
Tested with an 8800GTS 640MB version (memory and GPU clock rates left unchanged)

min_n = 4
max_n = 4
RapidMind FFT Benchmark
-----------------------------------------------
Length: 16 = 2^4
Warming up...
Run timings, to and from host (in us):
 10095.2 8976.7 9132.39 8718.98 8906.92
 8904.71 8715.21 8833.48 8783.14 8836.1
 8674.97 8913.12 8764.64 8645.37 8741.8
 8818.75 9024.37 8807.76 8826.81 8911.87
 9002.08 9067.97 8945.69 8910.78 8722.34
 8785.37 8814.4 8836.28 8834.39 8795.27
 8778.69 8968.62 8747 8943.26 9291.43
 8890.32 8932.17 8860.98 8739.06 8734.42
 8871.18 8755.89 8868.9 9068.03 8763.38
 9002.55 8814.57 8864.37 8823.38 8856.53
 8831.87 8614.2 8851.8 8697.95 8952.61
 8711.42 8683.05 8912.46 8763.43 8755.46
 8718.52 9060.99 8932.78 8812.21 8834.16
 8825.66 8653.1 8801.54 8859.38 8665.22
 8906.53 8957.47 8860.75 8777.11 8759.25
 8845.62 9030.77 8915.02 8858.34 8676.31
 8819.07 9009.46 8837.26 8762.6 8834.04
 7046.69 8719.74 8610.55 8890.17 8839.04
 9646.3 8775.46 8739.86 8720.51 9064.7
 8947.07 8705.96 8704.77 8867.14 8880.16
Average execution time: 8842.67us
Normalized execution time (T/N): 552.667us/sample
Normalized by complexity (T/N lg N): 138.167
Mflops (5 N lg N/T): 0.0361882
Average execution time: 8842.67us
Minimum execution time: 7046.69us
Normalized average execution time (T/N): 552.667us/sample
Normalized minimum execution time (T/N): 440.418us/sample
Average time normalized by complexity (T/N lg N): 138.167
Minimum time normalized by complexity (T/N lg N): 110.105
Average Mflops (5 N lg N/T): 0.0361882
Peak Mflops (5 N lg N/T): 0.0454114
---
Warming up...
Run timings, GPU-local (in us):
 8263.18 8381.39 8462.2 8356.22 8373.54
 8503.47 8716.67 8385.77 8394.17 8419.64
 8659.13 8294.88 8407.95 8567.22 8493.25
 8384.13 8477.74 8508.42 8552.66 8398.76
 8761.34 8573.63 8430.25 8437 8615.68
 8464.32 8483.02 8540.84 8564.65 8566.38
 8503.04 8614.77 8437.5 8545.99 8401.69
 8442.15 8832.88 8638.04 8456.14 8492.51
 8693.16 8371.29 8350.92 8427.35 8414.12
 8851.89 8438.03 8443.12 8503.04 8665.21
 8719.99 8375.58 8501.07 8526.01 8325.1
 8614.5 8433.29 8432.5 8532.22 8529.62
 8481.02 8251.49 8543.71 8523.21 8422.35
 8640.62 8603.52 8661.46 8479.36 8548.6
 8649.6 8542.74 8373.39 8379.29 8413.56
 8598.13 8549.43 8460.99 8544.15 8515.79
 8576.4 8485.85 8558.77 8380.95 8520.18
 8764.88 8403.96 8483.77 8752.86 7361.6
 8661.36 8332.67 8480.45 8310.8 8649.39
 8708.75 8560.87 8488.33 8491.4 8473.15
Average execution time: 8495.79us
Minimum execution time: 7361.6us
Normalized average execution time (T/N): 530.987us/sample
Normalized minimum execution time (T/N): 460.1us/sample
Average time normalized by complexity (T/N lg N): 132.747
Minimum time normalized by complexity (T/N lg N): 115.025
BenchFFT average Mflops (5 N lg N/T): 0.0376657
BenchFFT peak Mflops (5 N lg N/T): 0.0434688
Residuals (compare with inverse):
  Average absolute: 1.26059e-008
  Maximum absolute: 5.96046e-008
  Average relative: -1.#IND
  Maximum relative: 1.#INF
-----------------------------------------------


RapidMind 2D FFT Benchmark
===============================================
Size: 256 x 256 = 2^8 x 2^8
Radix: 4 = 2^2
Total number of floating point operations: 5.24288e+006

Run timings, to and from host (in ms):

Average execution time: 13.7757ms
Overall average execution time: 13.7762ms
Minimum execution time: 13.2051ms
Average Mflops: 380.589
Peak Mflops: 397.035

Run timings, GPU-local (in ms):

Average execution time: 12.1273ms
Overall average execution time: 12.1279ms
Minimum execution time: 11.7326ms
Average Mflops: 432.32
Peak Mflops: 446.865


Both tests end with a memory read error.
The OS is Windows XP Pro 32-bit; .NET 2.0 is not installed.

I will search for the errors later, when work is over...

Offline Devaster

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 653
  • I like Duke !!!
Re: GPU crunching question
« Reply #46 on: 21 Feb 2007, 05:36:35 am »
For the G80, a CUDA version is better. I may search my home computer for some apps by Hans Dorn; he built some test apps based on CUDA...

WR-HW95

  • Guest
Re: GPU crunching question
« Reply #47 on: 22 Feb 2007, 09:42:06 am »
With 8800GTX @ 612/975

Code:
C:\Release-vc8>fft.exe
min_n = 4
max_n = 4
RapidMind FFT Benchmark
-----------------------------------------------
Length: 16 = 2^4
Warming up...
Run timings, to and from host (in us):
 11561.3 10482.5 8229.39 12829.6 8740.71
 9539.26 9745.74 10875.1 11149.2 9760.27
 12356 8845.49 11541.2 8558.26 9808.89
 9916.74 9238.06 9773.12 8477.23 7909.47
 11607.7 10333.6 7918.13 11377.5 7920.09
 10473.6 8454.32 9801.9 10972.9 10767
 9267.11 11145.3 9876.5 9839.62 13427.2
 8664.71 10973.7 11119.3 9176.86 9062.31
 9811.68 8923.72 7202.85 9036.6 9994.13
 8747.42 10002.8 10443.1 9761.39 9866.44
 10177.1 10808.3 8371.89 10052 9621.96
 10266 11904.4 9640.12 9375.24 8899.69
 9294.78 10726.2 6828.72 12483.1 9911.99
 12466.6 8385.58 7925.68 10416.3 9766.97
 9917.02 11196.4 9642.64 10324.1 11035.8
 9518.3 8512.15 10829 9727.86 12404.3
 10707.5 10192.5 10868.4 7899.13 9340.32
 8048.62 7750.77 11226.9 8889.35 9273.54
 7777.87 7842.69 7471.92 8830.4 10697.4
 11466.3 8701.59 8419.39 7942.44 9761.11
Average execution time: 9788.45us
Normalized execution time (T/N): 611.778us/sample
Normalized by complexity (T/N lg N): 152.945
Mflops (5 N lg N/T): 0.0326916
Average execution time: 9788.45us
Minimum execution time: 6828.72us
Normalized average execution time (T/N): 611.778us/sample
Normalized minimum execution time (T/N): 426.795us/sample
Average time normalized by complexity (T/N lg N): 152.945
Minimum time normalized by complexity (T/N lg N): 106.699
Average Mflops (5 N lg N/T): 0.0326916
Peak Mflops (5 N lg N/T): 0.0468609
---
Warming up...
Run timings, GPU-local (in us):
 10815.9 11730.4 7816.99 7627.83 9804.42
 9321.6 9801.34 9725.06 7585.92 9003.07
 9982.68 6766.42 10917.9 8505.45 7894.38
 10349.5 8926.79 11731.8 7668.62 8905.56
 11206.2 9771.44 11598.2 8679.8 9933.78
 9116.51 8855.83 9696 9815.87 8695.17
 12109.5 9716.4 8787.65 8662.48 8444.54
 7717.24 8718.36 9792.96 10747.7 9169.6
 11555.5 8955.85 9709.7 6659.12 10377.2
 9286.95 10160.9 11761.7 8587.87 12249.8
 8761.67 10833.5 9495.95 7892.71 9270.47
 9678.68 10709.1 9684.55 7819.5 10225.5
 8822.58 12600.2 8660.8 8996.09 11010.3
 6783.74 10320.5 10069.9 9703.83 10450.1
 7650.74 10810.8 10639.8 9755.24 11815.3
 8054.21 7740.15 10277.5 10128.5 10209.3
 6895.78 7671.42 9653.26 9822.86 12298.4
 10547.4 7820.62 7712.77 6761.39 8859.18
 7419.95 8623.08 7702.71 8842.41 9383.91
 9820.06 7636.21 8563.29 9718.36 8473.6
Average execution time: 9385.19us
Minimum execution time: 6659.12us
Normalized average execution time (T/N): 586.574us/sample
Normalized minimum execution time (T/N): 416.195us/sample
Average time normalized by complexity (T/N lg N): 146.644
Minimum time normalized by complexity (T/N lg N): 104.049
BenchFFT average Mflops (5 N lg N/T): 0.0340963
BenchFFT peak Mflops (5 N lg N/T): 0.0480544
Residuals (compare with inverse):
  Average absolute: 1.26059e-008
  Maximum absolute: 5.96046e-008
  Average relative: -1.#IND
  Maximum relative: 1.#INF
-----------------------------------------------

Code:
C:\Release-vc8>fft2d.exe
RapidMind 2D FFT Benchmark
===============================================
Size: 256 x 256 = 2^8 x 2^8
Radix: 4 = 2^2
Total number of floating point operations: 5.24288e+006

Run timings, to and from host (in ms):

Average execution time: 15.6239ms
Overall average execution time: 15.6285ms
Minimum execution time: 13.4389ms
Average Mflops: 335.568
Peak Mflops: 390.126

Run timings, GPU-local (in ms):

Average execution time: 13.8474ms
Overall average execution time: 13.851ms
Minimum execution time: 10.7656ms
Average Mflops: 378.619
Peak Mflops: 487.004
 

It looks like this depends quite a lot on CPU speed too... The runs above were made with 2x Rosetta running on a 3.05GHz Opteron 175.

I suspended BOINC and ran fft2d again.

Code:
C:\Release-vc8>fft2d.exe
RapidMind 2D FFT Benchmark
===============================================
Size: 256 x 256 = 2^8 x 2^8
Radix: 4 = 2^2
Total number of floating point operations: 5.24288e+006

Run timings, to and from host (in ms):

Average execution time: 14.0743ms
Overall average execution time: 14.0783ms
Minimum execution time: 13.1137ms
Average Mflops: 372.515
Peak Mflops: 399.801

Run timings, GPU-local (in ms):

Average execution time: 12.3266ms
Overall average execution time: 12.3304ms
Minimum execution time: 10.2948ms
Average Mflops: 425.332
Peak Mflops: 509.276
« Last Edit: 22 Feb 2007, 09:47:17 am by WR-HW95 »

pepperammi

  • Guest
Re: GPU crunching question
« Reply #48 on: 22 Feb 2007, 07:57:55 pm »
Quote from: Devaster
for G80 is better a CUDA version , i may search on my home computer some apps by Hans Dorn - he had builded some test apps based on CUDA ...

I hear the 8900 series will have 25% more shaders or something and will still use the G80 chip. Apparently they were there all along. Would that mean anything for all this?
I wonder if it will be possible to unlock them, as I think was possible on some older ATI cards at some point?

Offline Devaster

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 653
  • I like Duke !!!
Re: GPU crunching question
« Reply #49 on: 24 Feb 2007, 06:24:22 pm »
As I have written, for older cards BrookGPU or RapidMind is better...
For new cards, CUDA (nVIDIA) or CTM (ATI) is better.

Offline Devaster

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 653
  • I like Duke !!!
Re: GPU crunching question
« Reply #50 on: 24 Feb 2007, 06:32:34 pm »
As I have seen in the RapidMind FFT source, the algorithm works on two complex numbers in one pass (à la the RGBA texture format). Using this format is extremely efficient in vertex/pixel shaders and for memory transfers (shaders/GPU memory)...
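
A rough illustration of that layout in plain Python (this is not RapidMind code, and `pack_rgba`, `unpack_rgba` and `twiddle_texel` are made-up names): two complex values share one four-component element, so a single four-wide operation touches both of them.

```python
def pack_rgba(values):
    """Pack complex numbers two at a time into (R, G, B, A) tuples.

    R, G hold the real/imaginary parts of the first value and B, A those of
    the second. The input length is assumed to be even.
    """
    if len(values) % 2:
        raise ValueError("need an even number of complex values")
    return [
        (a.real, a.imag, b.real, b.imag)
        for a, b in zip(values[0::2], values[1::2])
    ]


def unpack_rgba(texels):
    """Inverse of pack_rgba: recover the flat list of complex numbers."""
    out = []
    for r, g, b, a in texels:
        out.append(complex(r, g))
        out.append(complex(b, a))
    return out


def twiddle_texel(texel, w):
    """Multiply both complex values held in one texel by the factor w."""
    r, g, b, a = texel
    wr, wi = w.real, w.imag
    return (r * wr - g * wi, r * wi + g * wr,
            b * wr - a * wi, b * wi + a * wr)


signal = [1 + 2j, 3 - 1j, 0.5j, -2 + 0j]
texels = pack_rgba(signal)
assert unpack_rgba(texels) == signal   # round trip loses nothing
print(twiddle_texel(texels[0], 1j))    # both values in the texel rotated by 90 degrees
```

On a GPU the four components would be handled by one vector instruction and one texture fetch; the Python version only shows the data layout, not the speed.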

Offline Devaster

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 653
  • I like Duke !!!
Re: GPU crunching question
« Reply #51 on: 24 Feb 2007, 06:40:42 pm »
off topic : Code Wizard : cool  :)

my name is yellow  :o
« Last Edit: 24 Feb 2007, 07:04:50 pm by Devaster »

Offline Simon

  • Ni!
  • Knight who says 'Ni!'
  • *****
  • Posts: 1045
    • Is it a bird? Is it a plane? No...its-the.net!
Re: GPU crunching question
« Reply #52 on: 24 Feb 2007, 06:47:41 pm »
;D
I thought so, too. Keep up the good work!

Offline Devaster

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 653
  • I like Duke !!!
Re: GPU crunching question
« Reply #53 on: 24 Feb 2007, 07:03:44 pm »
Maybe I have a good idea: modify the BOINC manager to use a GPU as an additional core...
If you have a usable GPU, then you can run another instance of SETI...

There would be a small performance hit... (about 10 percent in my tests)

pepperammi

  • Guest
Re: GPU crunching question
« Reply #54 on: 24 Feb 2007, 09:17:24 pm »
I was reading an article the other day that the G80 is more like an x86 processor than the normally thought of GPU.
http://news.softpedia.com/news/G80-Is-Actually-a-CPU-44724.shtml

Gecko_R7

  • Guest
Re: GPU crunching question
« Reply #55 on: 25 Feb 2007, 12:45:09 am »
Devaster: So, if a person was running S@H on a C2D and had a graphics card, BOINC would recognize the GPU as a 3rd processor and manage the GPU's own client? Well, even if the GPU lost 10% performance, being able to run the CPU clients simultaneously appears to be quite a gain in aggregate vs. GPU-only crunching at 100%.

This sounds pretty darn cool!  ;D
Good luck!

Offline Alex Kan

  • Alpha Tester
  • Squire
  • ***
  • Posts: 29
Re: GPU crunching question
« Reply #56 on: 25 Feb 2007, 05:12:43 pm »
Devaster: Neither of the data points you've picked for fft.exe are representative of SETI's FFT workload--SETI doesn't do two-dimensional FFTs, and spends much more time doing FFTs with lengths between 16K and 128K than it does any other lengths.

Also, if you're using the standard MFLOPS = 5 N log2(N) / (1000 * time in ms) metric for FFT performance, those figures strike me as a bit on the low side. A lot of those speeds seem no faster than (or worse, slower than) doing the same computations on the CPU with tuned libraries. Does RapidMind provide built-in functionality for computing FFTs?
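
As a quick check of that metric against the logs posted earlier in the thread, here is a minimal Python sketch. The function name `fft_mflops` is made up for illustration; the inputs are the average "to and from host" times from the 8800GTS run in reply #45.

```python
import math


def fft_mflops(n, time_us):
    """BenchFFT-style rate: 5 * N * log2(N) nominal flops over time in microseconds.

    Flops per microsecond is numerically the same as Mflops, which is what
    dividing by (1000 * time in ms) amounts to.
    """
    return 5 * n * math.log2(n) / time_us


# 1D case from the fft.exe log: N = 16, average 8842.67 us.
print(fft_mflops(16, 8842.67))

# 2D case from the fft2d.exe log: 256 x 256 points, average 13.7757 ms.
# 5 * 65536 * 16 = 5242880, the flop total the benchmark itself prints.
print(fft_mflops(256 * 256, 13.7757 * 1000))
```

To the printed precision these reproduce the 0.0361882 and 380.589 Mflops lines from the logs. The length-16 rate is tiny because each run takes about 8.8 ms no matter how little work it does, so that number mostly measures per-call overhead rather than FFT throughput.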

Offline Devaster

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 653
  • I like Duke !!!
Re: GPU crunching question
« Reply #57 on: 25 Feb 2007, 05:49:04 pm »
From my side: for me it is not important whether FFT on the GPU is faster or not; what matters is that you are using additional compute power for crunching...

pepperammi

  • Guest
Re: GPU crunching question
« Reply #58 on: 27 Feb 2007, 02:58:39 pm »
Is this article at all useful or interesting? It's a bit over my head, to be honest  ;)
http://arstechnica.com/news.ars/post/20070227-8931.html

EastWind

  • Guest
Re: GPU crunching question
« Reply #59 on: 28 Feb 2007, 04:40:39 pm »
The peak speed shown here so far is less than 1 GFLOPS. I checked my overclocked PD830 (3780MHz); it gets 2.69 GFLOPS on 62.4-credit work units. Does that mean the present GPU program is a bit too slow?

I previously ran GPU FFTW on my Nvidia 6200TC graphics card, and it was faster than FFT on my AMD64 4400+ CPU (at 2600MHz). The 6200 has just two pixel pipelines (e.g. 8 processors). Does that mean the GPU FFT program provided here can be improved further?

I heard that GPU speed at Folding@home is about 59 GFLOPS on average, compared to 0.89 GFLOPS on CPU. See the statistics below.

OS Type             Current TFLOPS*   Active CPUs   Total CPUs
Windows                         148        155670      1607204
Mac OS X/PowerPC                  7          8518        94537
Mac OS X/Intel                    6          2112         5936
Linux                            29         20504       209163
GPU                              39           662         1984
Total                           229        187466      1918824
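
The per-client averages can be worked out from the table by dividing each TFLOPS figure by the active count. A minimal sketch (the function name is illustrative, and the table's TFLOPS values are rounded to whole numbers, so the results are only approximate):

```python
# Rows copied from the table above: (current TFLOPS, active clients).
stats = {
    "Windows": (148, 155670),
    "Mac OS X/PowerPC": (7, 8518),
    "Mac OS X/Intel": (6, 2112),
    "Linux": (29, 20504),
    "GPU": (39, 662),
}


def gflops_per_active(tflops, active):
    """Average GFLOPS per active client (1 TFLOPS = 1000 GFLOPS)."""
    return tflops * 1000 / active


for name, (tflops, active) in stats.items():
    rate = gflops_per_active(tflops, active)
    print(f"{name:18s} {rate:8.2f} GFLOPS per active client")
```

The GPU row comes out at roughly 59 GFLOPS per active client, in line with the figure quoted above; the CPU rows vary by platform, so a single CPU average depends on which rows are combined.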

 
