GPU crunching question


Gecko_R7:

--- Quote from: Devaster on 28 Jan 2007, 09:31:57 pm ---I have modified the sah application to use the GPU for the FFT, power spectrum and chirping, using BrookGPU. I think I may release this next weekend...

--- End quote ---

Very interesting.
What kind of results are you seeing vs. CPU-only performance?

Devaster:
The first benefit is that the GPU and CPU versions can run in parallel with only a small performance hit.

BenHer:
Here are some thoughts...

If you had an SLI machine, could you run one CPU copy, plus one GPU copy for each card?

Is the GPU kept busy enough by the version, or is the GPU idle some percentage of the time?
If somewhat idle, on a multi-core or multi-cpu system can multiple GPU copies be run?

Weee!

Devaster:
SLI mode is handled by the driver. If it is enabled, I think it would run, but there is no way to run one program per GPU separately...

With the GPU version, all the heavy calculation is done on the GPU, so the CPU is freed from that workload. That free time can be used by the CPU version...

Multi-core or multi-CPU has nothing to do with this. You use multiple GPUs as one; all the distribution across GPUs is done by the driver!

Read more at http://gpgpu.org/
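
For anyone unsure what the "heavy calc" steps mentioned above (chirping, FFT, power spectrum) actually compute, here is a minimal CPU-side sketch in plain C++. It is not taken from the sah source or from the GPU port. The function names, the `chirp_rate` and `sample_rate` parameters, and the use of a naive O(N^2) DFT in place of a real FFT are all assumptions made for illustration.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// De-chirp: cancel a linear frequency drift by multiplying sample n by
// exp(-i*pi*chirp_rate*t^2), where t = n / sample_rate.
// chirp_rate (Hz/s) and sample_rate (Hz) are hypothetical parameters.
std::vector<cplx> dechirp(const std::vector<cplx>& in,
                          double chirp_rate, double sample_rate) {
    const double pi = std::acos(-1.0);
    std::vector<cplx> out(in.size());
    for (std::size_t n = 0; n < in.size(); ++n) {
        const double t = static_cast<double>(n) / sample_rate;
        out[n] = in[n] * std::polar(1.0, -pi * chirp_rate * t * t);
    }
    return out;
}

// Naive O(N^2) DFT, standing in for the FFT purely for readability.
std::vector<cplx> dft(const std::vector<cplx>& in) {
    const double pi = std::acos(-1.0);
    const std::size_t N = in.size();
    std::vector<cplx> out(N);
    for (std::size_t k = 0; k < N; ++k) {
        cplx sum(0.0, 0.0);
        for (std::size_t n = 0; n < N; ++n) {
            // Reduce k*n modulo N to keep the phase argument small.
            const double phase =
                -2.0 * pi * static_cast<double>((k * n) % N) / static_cast<double>(N);
            sum += in[n] * std::polar(1.0, phase);
        }
        out[k] = sum;
    }
    return out;
}

// Power spectrum: squared magnitude of every frequency bin.
std::vector<double> power_spectrum(const std::vector<cplx>& spectrum) {
    std::vector<double> out(spectrum.size());
    for (std::size_t k = 0; k < spectrum.size(); ++k) {
        out[k] = std::norm(spectrum[k]);  // re^2 + im^2
    }
    return out;
}
```

Calling `power_spectrum(dft(dechirp(samples, rate, fs)))` chains the three steps. Each output element depends only on the inputs and its own index, which is the kind of data-parallel work that maps well onto a pixel shader.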

Devaster:
OK, some link problems for now...

BTW, want to see part of the ps30 shader code?

--- Code: ---namespace {
using namespace ::brook::desc;
static const gpu_kernel_desc __DFTX_ps30_desc = gpu_kernel_desc()
.technique( gpu_technique_desc()
.output_address_translation()
.input_address_translation()
.pass( gpu_pass_desc(
" ps_3_0\n"
" def c26, 0, 0.5, 1, 2\n"
" def c27, -1, 1, 0, 0\n"
" dcl_texcoord1 v0.xy\n"
" dcl_2d s0\n"
" dcl_2d s1\n"
" dcl_2d s2\n"
" frc r0.xy, v0\n"
" add r0.xy, -r0, v0\n"
" mov r1.xy, c26\n"
" dp2add r0.z, r0, c20, r1.y\n"
" dp2add r0.x, r0, c20, r1.x\n"
" mul r2, r0.z, c22\n"
" frc r3, r2\n"
" add r2, r2, -r3\n"
" mad r0, r2, -c21, r0.x\n"
" add r0, r0, c26.y\n"
" mov r2, c23\n"
" mad r0, r0, r2, -c24\n"
" frc r2, r0\n"
" add r0, r0, -r2\n"
" cmp r2, r0, c26.x, c26.z\n"
" dp4 r1.x, r2, r2\n"
" cmp r1.x, -r1.x, c26.x, c26.z\n"
" mov r2, -r1.x\n"
" texkill r2\n"
" add r2, r0, -c25\n"
" cmp r2, r2, c26.z, c26.x\n"
" dp4 r1.x, r2, r2\n"
" cmp r1.x, -r1.x, c26.x, c26.z\n"
" mov r2, -r1.x\n"
" texkill r2\n"
" mad r2, r0, c0, r1.y\n"
" mul r2, r2, c1\n"
--- End code ---

Nice, isn't it? (The whole code is about 8000 lines long.) ;)
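
For readers who don't read ps_3_0 assembly, the opening instructions of that excerpt look like index arithmetic rather than the DFT math itself. The `frc`/`add` pair computes a floor, `dp2add` flattens a 2D pixel coordinate into a linear index, and the `mul`/`frc`/`add`/`mad` run takes that index modulo some size. Here is a rough C++ sketch of those idioms. The excerpt never shows the values of constants such as c20, c21 and c22, so the stride `(1, width)` and the divisor `d` below are assumptions, and all helper names are made up for illustration.

```cpp
#include <cmath>

// frc: fractional part as the shader instruction defines it, x - floor(x).
float frc(float x) { return x - std::floor(x); }

// "frc r0, v0" followed by "add r0, -r0, v0" gives v0 - frac(v0) == floor(v0).
float floor_via_frc(float v) { return v - frc(v); }

// dp2add: two-component dot product plus a scalar.
float dp2add(float ax, float ay, float bx, float by, float c) {
    return ax * bx + ay * by + c;
}

// Hypothetical: flatten a pixel coordinate into a linear element index,
// assuming the constant register holds the stride vector (1, width).
float linear_index(float px, float py, float width) {
    return dp2add(floor_via_frc(px), floor_via_frc(py), 1.0f, width, 0.0f);
}

// "mul / frc / add / mad": index - floor((index + 0.5) / d) * d,
// which is index mod d for a non-negative, integer-valued index.
// The +0.5 bias guards against rounding just below a whole number.
float mod_via_frc(float index, float d) {
    const float q = floor_via_frc((index + 0.5f) * (1.0f / d));
    return index - q * d;
}
```

With a width of 8, the pixel centred at (2.5, 3.5) maps to element 2 + 3*8 = 26, and 26 mod 8 recovers the column, 2. The `cmp`/`texkill` instructions that follow in the excerpt then appear to discard pixels whose computed index falls outside the valid range.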
