Author Topic: GPU client (Read 194078 times)

Devaster · « **Reply #45 on:** 14 Dec 2007, 05:34:53 pm »

bob: yes, still 100% CPU usage. Not everything runs on the GPU yet, and for now I don't use async access.

For now I am working on the chirp routine.

Macbeth · « **Reply #46 on:** 14 Dec 2007, 06:21:49 pm »

Out of curiosity, what RAC would you expect to get from, say, a GeForce 8800 series card?

Cheers.

Radiohead · « **Reply #47 on:** 15 Dec 2007, 09:18:44 am »

I learned how to run Knabench

...but got some very strange results:

WinXP 32-bit, testWU-1 through testWU-7
C2D E6600 (2.4 GHz) running default-515.exe on one core in Knabench vs. an 8800 GTS 320 MB running the latest sahcuda.exe

1 - all 7 results are DIFFERENT!

2 - the 8800 GTS is slower than one core of the E6600!!!

Is this how it should be?

Quick timetable

WU : testWU-1.wu
default-515.exe : 304 seconds
sahcuda.exe : 499 seconds
Speedup: -64.14%, Ratio: 0.61 x

WU : testWU-2.wu
default-515.exe : 496 seconds
sahcuda.exe : 590 seconds
Speedup: -18.95%, Ratio: 0.84 x

WU : testWU-3.wu
default-515.exe : 541 seconds
sahcuda.exe : 657 seconds
Speedup: -21.44%, Ratio: 0.82 x

WU : testWU-4.wu
default-515.exe : 125 seconds
sahcuda.exe : 123 seconds
Speedup: 1.60%, Ratio: 1.02 x

WU : testWU-5.wu
default-515.exe : 499 seconds
sahcuda.exe : 596 seconds
Speedup: -19.44%, Ratio: 0.84 x

WU : testWU-6.wu
default-515.exe : 823 seconds
sahcuda.exe : 943 seconds
Speedup: -14.58%, Ratio: 0.87 x

WU : testWU-7.wu
default-515.exe : 361 seconds
sahcuda.exe : 376 seconds
Speedup: -4.16%, Ratio: 0.96 x
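
The Speedup and Ratio figures in the timetable are consistent with the arithmetic below. This is a minimal sketch inferred from the numbers alone, not Knabench's actual code; the function name `compare` is hypothetical.

```python
def compare(reference_seconds, test_seconds):
    """Return (speedup_percent, ratio) of a test run against a reference run.

    Assumed formulas, inferred from the timetable figures:
      speedup = (reference - test) / reference * 100
      ratio   = reference / test
    """
    speedup = (reference_seconds - test_seconds) / reference_seconds * 100.0
    ratio = reference_seconds / test_seconds
    return round(speedup, 2), round(ratio, 2)


# Timings copied from the timetable: (default-515.exe, sahcuda.exe) in seconds.
timings = {
    "testWU-1": (304, 499),
    "testWU-4": (125, 123),
    "testWU-7": (361, 376),
}

for wu, (cpu, gpu) in timings.items():
    speedup, ratio = compare(cpu, gpu)
    print(f"{wu}: Speedup: {speedup:.2f}%, Ratio: {ratio:.2f} x")
```

A negative speedup (ratio below 1) means the tested application took longer than the reference one.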

Devaster · « **Reply #48 on:** 15 Dec 2007, 10:19:40 am »

Something is wrong on your computer.

See here: http://setiathome.berkeley.edu/result.php?resultid=681495948 - this is a real work unit crunched with the latest application.

And yes, it's still slower than any CPU version.

Radiohead · « **Reply #49 on:** 15 Dec 2007, 10:36:40 am »

Have to reinstall Windows

Devaster · « **Reply #50 on:** 15 Dec 2007, 10:38:38 am »

lol, nice times from Knabench. For me, my 8500 gives not just a 19% but a 100% slowdown.

Radiohead · « **Reply #51 on:** 16 Dec 2007, 05:31:51 am »

Quote from: Devaster on 15 Dec 2007, 10:19:40 am

And yes, it's still slower than any CPU version.

Strange...

I always thought that the Nvidia 8 Series was faster than an Intel C2D

http://en.wikipedia.org/wiki/FLOPS
"As of 2007, the fastest PC processors perform over 30 GFLOPS.[8] GPUs in PCs are considerably more powerful in terms of pure FLOPS. For example, in the GeForce 8 Series the nVidia 8800 Ultra performs around 576 GFLOPS on 128 Processing elements. This equates to around 4.5 GFLOPS per element, compared with 2.75 per core for the Blue Gene/L. It should be noted that the 8800 series performs only Single precision calculations, and that while GPUs are highly efficient at calculations they are not as flexible as a general purpose CPU."

And Nvidia promises that the new card (GeForce 9800) will be even faster: 1 or even 3 (!!!!) TFLOPS. http://www.nordichardware.com/index.php?news=1&action=more&id=6911

I understand that this performance is not reached on all tasks...
Perhaps the sahcuda algorithm can be optimized?
seti_britta is a mathematician.

Could that help?
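
The per-element figure in the Wikipedia quote can be checked with simple division. The sketch below uses only the numbers from the quote (576 GFLOPS, 128 processing elements, 30 GFLOPS for a fast CPU); these are theoretical peaks, so the resulting ratio is an upper bound on paper and not a prediction for sahcuda. The function name is hypothetical.

```python
def gflops_per_element(total_gflops, elements):
    """Peak GFLOPS divided evenly over the processing elements."""
    return total_gflops / elements


# Figures taken from the quoted Wikipedia passage (peak, single precision).
gpu_peak_gflops = 576.0   # GeForce 8800 Ultra
gpu_elements = 128
cpu_peak_gflops = 30.0    # "fastest PC processors" as of 2007

per_element = gflops_per_element(gpu_peak_gflops, gpu_elements)
peak_ratio = gpu_peak_gflops / cpu_peak_gflops

print(f"GFLOPS per element: {per_element}")
print(f"Peak GPU/CPU ratio: {peak_ratio:.1f}x")
```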

Radiohead · « **Reply #52 on:** 16 Dec 2007, 05:35:53 am »

Quote from: Devaster on 15 Dec 2007, 10:38:38 am

lol, nice times from Knabench. For me, my 8500 gives not just a 19% but a 100% slowdown.

8500 - 16/16 processors
8800 GTS - 96/96 processors

Radiohead · « **Reply #53 on:** 16 Dec 2007, 05:50:05 am »

Quote from: Devaster on 15 Dec 2007, 10:19:40 am

Something is wrong on your computer.

I ran Knabench again.
Now all the results are closely similar

[attachment deleted by admin]

Devaster · « **Reply #54 on:** 16 Dec 2007, 08:53:12 am »

This code is not optimized. For example, there are a lot of memory transfers that can be avoided, and so on. Also, the CPU and GPU code is mixed at about 95:5, and async access to the device is not used.

First it must be validated, then optimized.
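
Why a 95:5 mix limits the gain can be sketched with Amdahl's law. The big assumption here is that "95:5" is read as 95% of the runtime staying on the CPU and 5% being offloaded to the GPU; the post may equally mean a share of the code, so treat the numbers as illustrative only. The function name is hypothetical.

```python
def overall_speedup(offloaded_fraction, kernel_speedup):
    """Amdahl's law: whole-program speedup when only a fraction is accelerated.

    offloaded_fraction: share of the original runtime moved to the GPU (0..1)
    kernel_speedup:     how many times faster that share runs on the GPU
    """
    remaining = 1.0 - offloaded_fraction
    return 1.0 / (remaining + offloaded_fraction / kernel_speedup)


# Assumption: 5% of the runtime is on the GPU, 95% stays on the CPU.
for s in (2.0, 10.0, 100.0):
    print(f"GPU part {s:>5.0f}x faster -> whole app {overall_speedup(0.05, s):.3f}x")

# Upper bound even if the GPU part took no time at all.
print(f"limit: {overall_speedup(0.05, float('inf')):.3f}x")
```

The model ignores host-to-device memory transfers. Those only add time, which fits a GPU build that currently runs slower than the CPU one.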

Gecko_R7 · « **Reply #55 on:** 18 Dec 2007, 12:42:57 pm »

Mimo,

In what order does clock speed impact GPU performance as far as S@H is concerned? CPU clock, memory clock, shader clock?
Also, do I understand correctly that the G92 8800GT has 12 FPU processors in the GPU?
Do the shaders provide any benefit?

Sorry for the questions. Just trying to understand this better.

Devaster · « **Reply #56 on:** 18 Dec 2007, 02:12:23 pm »

Yes. For example, the 8500 GPU has two multiprocessors, and every multiprocessor has 8 unified shaders. Every shader can work with 4 floats in one instruction, so you can imagine that your CPU has 16*4 cores.
Instructions are very efficient, with low clock counts; for example MADD (multiply and add) takes only 4 clocks.
The cache is extremely effective for contiguous reads/writes, which is called coalescing.

Clock speed doesn't have as great an impact on GPU performance as the shader count.
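
Why contiguous reads/writes matter can be illustrated with a simplified model that counts how many aligned memory segments a group of threads touches in one access. The group size of 16 threads and the 64-byte segment are illustrative assumptions, and the real coalescing rules depend on the hardware generation (see the CUDA Programming Guide). The function name is hypothetical.

```python
def segments_touched(num_threads, stride_elems, elem_bytes=4, segment_bytes=64):
    """Count distinct aligned segments hit when thread t reads element t * stride.

    Fewer segments means the accesses can be served by fewer memory
    transactions; one segment is the ideal (fully contiguous) case.
    """
    addresses = [t * stride_elems * elem_bytes for t in range(num_threads)]
    return len({addr // segment_bytes for addr in addresses})


# Assumed group of 16 threads reading 4-byte floats.
for stride in (1, 2, 16):
    print(f"stride {stride:>2}: {segments_touched(16, stride)} segment(s)")
```

With stride 1 all threads land in a single segment; with a large stride every thread lands in its own segment, so the same amount of useful data costs many more transactions.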

Gecko_R7 · « **Reply #57 on:** 18 Dec 2007, 02:47:39 pm »

Quote from: Devaster on 18 Dec 2007, 02:12:23 pm

Yes. For example, the 8500 GPU has two multiprocessors, and every multiprocessor has 8 unified shaders. Every shader can work with 4 floats in one instruction, so you can imagine that your CPU has 16*4 cores.
Instructions are very efficient, with low clock counts; for example MADD (multiply and add) takes only 4 clocks.
The cache is extremely effective for contiguous reads/writes, which is called coalescing.

Clock speed doesn't have as great an impact on GPU performance as the shader count.

Thanks Mimo! Very interesting. So a higher shader count and a faster shader clock will actually have a better impact on crunching speed/potential for our purposes? Take the new G92-based 8800GT: 112 stream processors, each of which can process 4 floats in 1 instruction. Wow! The interest in this becomes very clear. Are the G80/G92 stream processors scalar units, not vector processors?

Devaster · « **Reply #58 on:** 18 Dec 2007, 05:34:50 pm »

More about the theory is at the beginning of this document: http://developer.download.nvidia.com/compute/cuda/1_1/NVIDIA_CUDA_Programming_Guide_1.1.pdf

No shaders are strictly vectorized; the massive parallelism is implemented in hardware.

Gecko_R7 · « **Reply #59 on:** 18 Dec 2007, 05:53:51 pm »

Quote from: Devaster on 18 Dec 2007, 05:34:50 pm

More about the theory is at the beginning of this document: http://developer.download.nvidia.com/compute/cuda/1_1/NVIDIA_CUDA_Programming_Guide_1.1.pdf

No shaders are strictly vectorized; the massive parallelism is implemented in hardware.

Thanks!
