Topic: GPU crunching question (Read 132129 times)

Devaster

  • Volunteer Developer
  • Knight who says 'Ni!'
  • Posts: 653
  • I like Duke !!!
Re: GPU crunching question
« Reply #90 on: 13 Apr 2007, 03:47:36 pm »
A small problem: I have installed the latest DirectX SDK (04/2007) and, uh-oh, there is some incompatibility between fxc from the SDK and the brcc compiler :o

Haos

  • Guest
Re: GPU crunching question
« Reply #91 on: 13 Apr 2007, 05:36:15 pm »
Just out of curiosity: using a 9500 (softmodded to a 9700) at 326/586 GPU/memory clocks on Win2k3, I was able to achieve:

-FFT bench:

min_n = 4
max_n = 4
RapidMind FFT Benchmark
-----------------------------------------------
Length: 16 = 2^4
Warming up...
Run timings, to and from host (in us):
 5795.75 5615.26 9583.4 4686.23 4654.09
 5703.83 4779.27 5022.07 6880.41 4667.78
 4814.75 4944.96 9120.98 5642.92 4681.48
 4857.22 4657.45 5188.32 6032.41 5560.77
 4826.49 4694.61 5058.68 4724.78 5647.95
 4876.22 4744.62 4652.42 10326.3 9268.23
 4911.71 5770.61 4956.97 23194.5 4759.99
 4882.65 6180.5 5031.01 4836.55 5471.36
 4928.47 4928.47 7661.36 5651.02 5982.4
 4808.33 6698.24 4948.59 5036.04 5189.72
 5267.11 4874.27 5834.03 4966.47 4908.07
 5025.15 5394.24 5988.82 4784.02 4641.8
 5427.77 6573.07 4754.12 6100.31 4694.61
 4805.81 4694.89 6234.14 4818.94 5904.72
 4763.34 4658.56 5026.82 5687.9 6996.09
 4931.55 4993.85 4619.45 5373.01 4758.59
 6509.92 11045.3 5100.31 7362.39 4694.89
 4770.61 4720.03 4724.78 4840.46 5887.12
 5021.79 4970.66 7732.61 4761.39 5846.88
 4848.28 6482.82 8503.49 6538.7 5774.52
Average execution time: 5751.77us
Normalized execution time (T/N): 359.485us/sample
Normalized by complexity (T/N lg N): 89.8713
Mflops (5 N lg N/T): 0.0556351
Average execution time: 5751.77us
Minimum execution time: 4619.45us
Normalized average execution time (T/N): 359.485us/sample
Normalized minimum execution time (T/N): 288.715us/sample
Average time normalized by complexity (T/N lg N): 89.8713
Minimum time normalized by complexity (T/N lg N): 72.1789
Average Mflops (5 N lg N/T): 0.0556351
Peak Mflops (5 N lg N/T): 0.0692724
---
Warming up...
Run timings, GPU-local (in us):
 4287.51 4164.57 6281.36 6275.22 4418.55
 5913.66 4145.01 5304 5119.87 4263.48
 4521.65 5006.15 4357.64 4280.25 4391.73
 5377.48 4325.79 4395.92 4089.41 4129.09
 4823.97 5475.55 4131.6 4458.51 8534.23
 4578.93 4113.44 4511.32 4092.76 4383.63
 4261.25 4618.33 4183.01 6111.48 4119.31
 9139.98 15454.9 4327.19 4232.47 5113.16
 4495.11 17601.6 4422.74 5288.91 4215.42
 4183.29 5226.6 4343.67 4503.77 4434.2
 5019.84 4253.98 5049.18 4101.43 4438.95
 4985.75 4206.48 4177.42 4077.95 5292.26
 4396.48 6117.35 4233.86 4148.09 5918.13
 4221.29 4130.48 4120.98 4343.39 14860.3
 4552.39 4233.31 5142.78 4885.16 5926.24
 4205.92 4913.66 4260.69 4510.2 4202.85
 4182.73 4203.97 7359.32 4228.56 4182.17
 4232.47 5304.55 5454.88 4221.57 5075.16
 4208.44 4438.11 4200.89 5349.54 6816.99
 4436.71 5529.76 4514.95 6238.61 4691.53
Average execution time: 5122.26us
Minimum execution time: 4077.95us
Normalized average execution time (T/N): 320.141us/sample
Normalized minimum execution time (T/N): 254.872us/sample
Average time normalized by complexity (T/N lg N): 80.0354
Minimum time normalized by complexity (T/N lg N): 63.718
BenchFFT average Mflops (5 N lg N/T): 0.0624724
BenchFFT peak Mflops (5 N lg N/T): 0.0784707
Residuals (compare with inverse):
  Average absolute: 4.37377e-006
  Maximum absolute: 2.29192e-005
  Average relative: -1.#IND
  Maximum relative: 1.#INF
-----------------------------------------------

-fft2d:

stops after this line:
Total number of floating point operations: 5.24288e+006

Devaster

  • Volunteer Developer
  • Knight who says 'Ni!'
  • Posts: 653
  • I like Duke !!!
Re: GPU crunching question
« Reply #92 on: 18 Apr 2007, 03:53:01 am »
After discussion with the Brook creators: they are working on a fix for the compatibility issue between fxc and brcc, and on the issue of why kernels can't run on Vista....
Back to the RapidMind backend: another reason why PowerSpectrum is so slow is the number of data uploads/downloads. GPU computation is only effective for large amounts of data with high arithmetic intensity... ???
What to do? That is the question :-\

The bad part is that there is no simple way to get information.... :'(
« Last Edit: 18 Apr 2007, 04:16:32 am by Devaster »

Vyper

  • Alpha Tester
  • Knight Templar
  • Posts: 376
Re: GPU crunching question
« Reply #93 on: 18 Apr 2007, 01:04:09 pm »
You're forgetting that not 100% of the code needs to run on the GPU. I don't know whether PowerSpectrum is the most demanding part of the code, but you could experimentally convert the parts that benefit most from GPU programming first, and then move on to optimizing the other parts of the code.

You need to start somewhere, and be proud of it.

Btw, if you want GPU testing, don't hesitate to contact me: I'm running Vista x64 (Aero off, so as not to disturb the GPU) and a factory-clocked 8800 GTX...
I'm eager to assist you, and to try to persuade you to use the CUDA API as well :)

Kind Regards Vyper

citroja

  • Guest
Re: GPU crunching question
« Reply #94 on: 18 Apr 2007, 06:50:45 pm »
Wow... I have been gone for some time now, and it seems like things are moving... though slowly, and I am not entirely sure in which direction :)

Anyway, I am back for a bit, but I have to rebuild multiple computers over the next few weeks, so I don't know how much I can help.

Also, has anyone seen or heard from Hans Dorn recently? He was working on the same project but has disappeared....

Let me know if you need any help or testing.

-citroja

Josef W. Segur

  • Janitor o' the Board
  • Knight who says 'Ni!'
  • Posts: 3112
Re: GPU crunching question
« Reply #95 on: 18 Apr 2007, 10:06:58 pm »
Quote
....
Back to the RapidMind backend: another reason why PowerSpectrum is so slow is the number of data uploads/downloads. GPU computation is only effective for large amounts of data with high arithmetic intensity... ???
What to do? That is the question :-\

Once past the baseline smoothing, all output from FFTs is converted to PowerSpectrum form before any other processing. If possible, a combined FFT+PowerSpectrum before the data is downloaded would be most efficient.
                                                                                       Joe

Devaster

  • Volunteer Developer
  • Knight who says 'Ni!'
  • Posts: 653
  • I like Duke !!!
Re: GPU crunching question
« Reply #96 on: 19 Apr 2007, 12:58:41 am »
Quote
Once past the baseline smoothing, all output from FFTs is converted to PowerSpectrum form before any other processing. If possible, a combined FFT+PowerSpectrum before the data is downloaded would be most efficient.
                                                                                       Joe

Yeah, I think so too, and I have been going this way for the last two days.

But it must be done in separate steps...

Devaster

  • Volunteer Developer
  • Knight who says 'Ni!'
  • Posts: 653
  • I like Duke !!!
Re: GPU crunching question
« Reply #97 on: 19 Apr 2007, 01:00:35 am »
Quote
I don't know if it's possible, but what about a tool that exposes all GPU cores to Windows?
I know GPU cores don't have things like SSE and are simpler than a normal CPU, but with that kind of tool you wouldn't have to write different code for every piece of software that GPU crunching is used for.


Quote
That's the goal of CUDA. But it's an nVidia tool...


But if you don't have a G80, doesn't CUDA switch to emulation mode and run on the CPU ???

Vyper

  • Alpha Tester
  • Knight Templar
  • Posts: 376
Re: GPU crunching question
« Reply #98 on: 20 Apr 2007, 06:23:08 am »
Yes, if you don't have a G80+ based card it will run in emulation mode, true. Get a cheaper G80+ card to develop on later on! ;-) Nvidia just released cheaper 8-series cards, and I presume they will run G80+ code, albeit slower.

Of course that is not a priority; first of all it's fun to have generic GPU code, and that is what you, Devaster, are going for at the moment. Keep up the good work and post your progress.

Kind Reg. Vyper

Devaster

  • Volunteer Developer
  • Knight who says 'Ni!'
  • Posts: 653
  • I like Duke !!!
Re: GPU crunching question
« Reply #99 on: 24 Apr 2007, 02:36:00 pm »
OK, now Naga's GPUFFTW is fully included...
Speed? ::) I don't know if it is good or bad...

Now another problem:

How can I rewrite this part of the code
Code:
const float* outp = output.read_data();
for (int i = 0; i < fftlen; i++)
{
    PowerSpectrum[CurrentSub + i] = outp[i];
}
into something like this:
Code:
PowerSpectrum[CurrentSub] = output.read_data();
This is wrong....
The left side is a float, the right side is a const float*. How do I make a typecast ???

Pepo

  • Guest
Re: GPU crunching question
« Reply #100 on: 24 Apr 2007, 04:45:33 pm »
Quote
How can I rewrite this part of the code
Code:
const float* outp = output.read_data();
for (int i = 0; i < fftlen; i++)
{
    PowerSpectrum[CurrentSub + i] = outp[i];
}
into something like this:
Code:
PowerSpectrum[CurrentSub] = output.read_data();
The left side is a float, the right side is a const float*. How do I make a typecast ???

It's hard to meaningfully typecast a pointer to a float array into a float value ;)

You would either have to memcpy() the values from output.read_data() to &PowerSpectrum[CurrentSub], or create and later use some pointer in place of PowerSpectrum[CurrentSub] (which I suppose will not be possible, but I have never seen the code, so this is only a wild guess).

Peter

Devaster

  • Volunteer Developer
  • Knight who says 'Ni!'
  • Posts: 653
  • I like Duke !!!
Re: GPU crunching question
« Reply #101 on: 25 Apr 2007, 04:15:41 pm »
Yes, I have used a memcpy...

I remembered memcpy this morning while working on my house.... ;D

Devaster

  • Volunteer Developer
  • Knight who says 'Ni!'
  • Posts: 653
  • I like Duke !!!
Re: GPU crunching question
« Reply #102 on: 27 Apr 2007, 12:12:26 pm »
Yesterday I compiled a SETI client without graphics.... a 120 percent performance boost over the version with graphics. On the GPU are: the FFT (Naga's), PowerSpectrum (RapidMind), and part of BaseSmooth (the first FFT, Naga's).

CPU load is about 60 percent, of which 30 percent is used by a system thread. As I have seen in CodeAnalyst, this 30 percent is nvogl32.dll, which is a GLSL encapsulator.

But I need to create a console for some messages, because I don't know where in the code execution is....
Simon, can I use the DEBUG directive already implemented in the code for this?

I must do a validation check....
That's all for now... :)

Simon

  • Ni!
  • Knight who says 'Ni!'
  • Posts: 1045
    • Is it a bird? Is it a plane? No...its-the.net!
Re: GPU crunching question
« Reply #103 on: 27 Apr 2007, 06:47:12 pm »
Hi Devaster,

I'd just use a simple fprintf to stderr.txt; you could also use #ifdef DEBUG statements and echo to the console from them. When you run the app inside Visual Studio, you'll get the console output. Otherwise, writing to stderr.txt (or a new file) seems simple to implement.

HTH,
Simon.

Devaster

  • Volunteer Developer
  • Knight who says 'Ni!'
  • Posts: 653
  • I like Duke !!!
Re: GPU crunching question
« Reply #104 on: 28 Apr 2007, 12:24:28 am »
I will convert it to a console application ;)
