Some thinking and theoretical discussion about the SETI client on GPU

Devaster:
But my observation is that the 100% CPU usage is only an "empty loop", waiting for the driver's response. At home, when I run some pure GPU code from the CUDA SDK, I haven't seen any significant slowdown ....

abachler:
You are probably better off processing at least part of the WU on the CPU, so that it stays busy while the GPU is processing the rest. As for the FFT taking so long in RM: yes, due to the nature of the FFT algorithm, it is difficult to implement it on a GPU without killing performance, but never fear, there is a workaround :) Then again, since the CPU is idle, you should process a separate WU on the CPU while the GPU is processing the other. I think ultimately the BOINC client will have to take care of recognizing when it should start multiple clients, including one for the GPU, and of using only one client per GPU.
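
To make the "keep the CPU busy while the GPU works" idea concrete, here is a minimal sketch, not anything from the real client. The names `process_chunk`, `process_work_unit` and `gpu_share` are made up for illustration, and the "GPU" is just a background thread standing in for an asynchronous kernel launch (plain Python threads will not actually speed up CPU-bound work).

```python
from concurrent.futures import ThreadPoolExecutor


def process_chunk(samples):
    """Stand-in for real work: sum of squared samples (a crude 'power')."""
    return sum(s * s for s in samples)


def process_work_unit(samples, gpu_share=0.75):
    """Split one work unit so the CPU stays busy while the 'GPU' part runs."""
    split = int(len(samples) * gpu_share)
    gpu_part, cpu_part = samples[:split], samples[split:]
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Hypothetical GPU offload: here it is only a background thread.
        gpu_future = pool.submit(process_chunk, gpu_part)
        # The CPU handles its own share instead of spinning on the driver.
        cpu_result = process_chunk(cpu_part)
        # Block only once, when both halves are needed.
        return gpu_future.result() + cpu_result


work_unit = [float(i % 7) for i in range(1000)]
total = process_work_unit(work_unit)
```

The split ratio is the knob: in a real client it would have to be tuned so that both sides finish at about the same time.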

roisen.dubh:
From what I understand, chirping the data is what takes the most crunching. If getting the FFTs to crunch on the GPU is what is causing the GPU client to go so slowly, why not have the GPUs chirp the data, and then send it to the CPU for the FFTs?

Or I could be completely mistaken
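
For anyone following along, a rough sketch of what "chirp, then FFT" means. This is a toy under stated assumptions, not the SETI@home code: the function names, the 64-sample signal and the drift rate are all invented, the sign convention is one of several in use, and a naive DFT stands in for the FFT. Chirping multiplies each sample by a quadratic-phase term to cancel a linear frequency drift, so the power that was smeared across several bins collapses into one.

```python
import cmath
import math


def chirp_signal(n, sample_rate, f0, chirp_rate):
    """Complex tone starting at f0 Hz whose frequency drifts by chirp_rate Hz/s."""
    out = []
    for i in range(n):
        t = i / sample_rate
        phase = 2 * math.pi * (f0 * t + 0.5 * chirp_rate * t * t)
        out.append(cmath.exp(1j * phase))
    return out


def dechirp(samples, sample_rate, chirp_rate):
    """Cancel a linear drift by multiplying with the opposite quadratic phase."""
    out = []
    for i, s in enumerate(samples):
        t = i / sample_rate
        out.append(s * cmath.exp(-1j * math.pi * chirp_rate * t * t))
    return out


def power_spectrum(samples):
    """Naive O(n^2) DFT followed by |X|^2; a real client would use an FFT."""
    n = len(samples)
    spectrum = []
    for k in range(n):
        acc = sum(samples[i] * cmath.exp(-2j * math.pi * k * i / n)
                  for i in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum


N, RATE = 64, 64.0
drifting = chirp_signal(N, RATE, f0=8.0, chirp_rate=4.0)
raw_power = power_spectrum(drifting)
fixed_power = power_spectrum(dechirp(drifting, RATE, chirp_rate=4.0))
peak_bin = fixed_power.index(max(fixed_power))
```

With these numbers the dechirped tone lands in bin 8, while the raw spectrum has a lower, wider peak. Note that the chirp step is one independent multiply per sample, which is why it looks like a natural fit for a GPU.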

Jason G:

--- Quote from: roisen.dubh on 30 Dec 2007, 01:48:49 am ---From what I understand, chirping the data is what takes the most crunching. If getting the FFTs to crunch on the GPU is what is causing the GPU client to go so slowly, why not have the GPUs chirp the data, and then send it to the CPU for the FFTs?

--- End quote ---
From vague memory of when I did some profiling on my P4s [may or may not be relevant to GPU processing, I don't know], from most intensive to slightly less intensive:
    Pulse folding/finding, sheer moving of data about the place, then chirping, then iFFTs & FFTs, then Gauss fitting.  Each of these varies by angle range and task content.
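
As a reminder of what folding does in general, here is a simplified sketch. It is not the client's actual pulse finder: `fold`, `best_period` and the synthetic series are illustrative assumptions. For each trial period, samples that share the same phase are stacked, so a repeating pulse adds up while the noise floor averages out.

```python
def fold(power, period):
    """Sum samples that share the same phase within a trial period."""
    folded = [0.0] * period
    for i, p in enumerate(power):
        folded[i % period] += p
    return folded


def best_period(power, trial_periods):
    """Return (period, phase) of the trial with the highest average folded peak."""
    best = None
    for period in trial_periods:
        folded = fold(power, period)
        folds = len(power) / period      # how many periods were stacked
        peak = max(folded)
        score = peak / folds             # average height keeps periods comparable
        if best is None or score > best[0]:
            best = (score, period, folded.index(peak))
    return best[1], best[2]


# Synthetic series: flat floor of 1.0 with a pulse of height 5.0 every 5 samples.
series = [5.0 if i % 5 == 2 else 1.0 for i in range(60)]
period, phase = best_period(series, trial_periods=range(3, 9))
```

Even this toy touches every sample once per trial period, which hints at why folding and plain data movement sit near the top of a profile.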

[Baseline Smoothing showed up somewhere too, but I don't remember how expensive that was... lower down on the list I think]

I remember thinking at the time that these processing tasks each seemed to use a more even proportion of the total processing time than I would have expected. [Something like 4 to 11% of total execution time for each major inner function]


Jason

Devaster:
Yes, I know that pulse finding is the most time-consuming operation, but I must begin with something easy: FFT, power spectrum, data chirp ....
When you take a look at the pulse find code, it is more complex than spike finding, for example, and I am not yet good enough to easily convert/rewrite the code for a parallel architecture ...
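
To show why the power spectrum and spike finding are the easy end, here is a small sketch under my own assumptions (`power_from_bins`, `find_spikes` and the threshold of 10 are made up, not taken from the client). The power of each bin is computed independently, and spike finding is one reduction (the mean) followed by one independent comparison per bin. Both shapes map directly onto parallel hardware, unlike the nested, data-dependent loops of pulse finding.

```python
def power_from_bins(bins):
    """Power of each complex frequency bin: re^2 + im^2 (independent per bin)."""
    return [z.real * z.real + z.imag * z.imag for z in bins]


def find_spikes(power, threshold):
    """Return (bin, power / mean) for bins whose power exceeds threshold * mean."""
    mean = sum(power) / len(power)              # one reduction over all bins
    return [(k, p / mean)                       # then an independent test per bin
            for k, p in enumerate(power) if p > threshold * mean]


# Synthetic FFT output: 16 bins of unit power, one strong bin at index 5.
bins = [complex(1, 0)] * 16
bins[5] = complex(6, 8)
power = power_from_bins(bins)
spikes = find_spikes(power, threshold=10.0)
```

Here only bin 5 is reported; every other bin fails the per-bin test without needing to know about its neighbours.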