Author Topic: GPU client  (Read 158750 times)

Offline Devaster

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 653
  • I like Duke !!!
Re: GPU client
« Reply #45 on: 14 Dec 2007, 05:34:53 pm »
bob: yes, still 100% CPU usage. Not everything runs on the GPU yet, and for now I don't use async access ....

For now I am working on the chirp routine ....

Macbeth

  • Guest
Re: GPU client
« Reply #46 on: 14 Dec 2007, 06:21:49 pm »
Out of curiosity, what RAC would you expect to get from say a Geforce 8800 series card?

Cheers.  ;)

Radiohead

  • Guest
Re: GPU client
« Reply #47 on: 15 Dec 2007, 09:18:44 am »
I learned to run Knabench  :D

....but got some very, very strange results:

WinXP 32. testWU-1 - testWU-7
C2D E6600 (2.4GHz), default-515.exe, one core in knabench vs 8800GTS 320MB and the latest sahcuda.exe

1 - all 7 results - DIFFERENT!  :(
2 - the 8800GTS is slower than one core of the E6600!!!  :-[

Is this how it should be?


Quick timetable

WU : testWU-1.wu
default-515.exe : 304 seconds
sahcuda.exe : 499 seconds
Speedup: -64.14%, Ratio: 0.61 x

WU : testWU-2.wu
default-515.exe : 496 seconds
sahcuda.exe : 590 seconds
Speedup: -18.95%, Ratio: 0.84 x

WU : testWU-3.wu
default-515.exe : 541 seconds
sahcuda.exe : 657 seconds
Speedup: -21.44%, Ratio: 0.82 x

WU : testWU-4.wu
default-515.exe : 125 seconds
sahcuda.exe : 123 seconds
Speedup: 1.60%, Ratio: 1.02 x

WU : testWU-5.wu
default-515.exe : 499 seconds
sahcuda.exe : 596 seconds
Speedup: -19.44%, Ratio: 0.84 x

WU : testWU-6.wu
default-515.exe : 823 seconds
sahcuda.exe : 943 seconds
Speedup: -14.58%, Ratio: 0.87 x

WU : testWU-7.wu
default-515.exe : 361 seconds
sahcuda.exe : 376 seconds
Speedup: -4.16%, Ratio: 0.96 x
« Last Edit: 15 Dec 2007, 09:25:42 am by Radiohead »
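The "Speedup" and "Ratio" columns in the timetable above appear to follow two simple formulas: speedup = (reference - test) / reference, and ratio = reference / test. A minimal sketch that reproduces the posted figures from the raw timings follows; the function name `compare` is hypothetical and is not part of knabench itself.

```python
def compare(ref_seconds, test_seconds):
    """Return (speedup_percent, ratio) of the test app relative to the reference app.

    Negative speedup means the test app is slower than the reference.
    """
    speedup_percent = (ref_seconds - test_seconds) / ref_seconds * 100.0
    ratio = ref_seconds / test_seconds
    return speedup_percent, ratio


# Timings copied from the post: (default-515.exe, sahcuda.exe) in seconds.
timings = {
    "testWU-1": (304, 499),
    "testWU-2": (496, 590),
    "testWU-3": (541, 657),
    "testWU-4": (125, 123),
    "testWU-5": (499, 596),
    "testWU-6": (823, 943),
    "testWU-7": (361, 376),
}

for name, (cpu, gpu) in timings.items():
    speedup, ratio = compare(cpu, gpu)
    print(f"{name}: Speedup: {speedup:.2f}%, Ratio: {ratio:.2f} x")
```

Only testWU-4 comes out (barely) in favour of the GPU build; every other unit is slower on the 8800GTS, which matches the figures posted.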

Offline Devaster

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 653
  • I like Duke !!!
Re: GPU client
« Reply #48 on: 15 Dec 2007, 10:19:40 am »
something is wrong on your computer ...... :o

see here: http://setiathome.berkeley.edu/result.php?resultid=681495948 - this is one real work unit crunched with the latest application .....

and yes, it's still slower than any CPU version ....

Radiohead

  • Guest
Re: GPU client
« Reply #49 on: 15 Dec 2007, 10:36:40 am »
Have to reinstall Windows  :(

Offline Devaster

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 653
  • I like Duke !!!
Re: GPU client
« Reply #50 on: 15 Dec 2007, 10:38:38 am »
lol, nice times from knabench - for me, my 8500 gives not just a 19% but a 100% slowdown .....

Radiohead

  • Guest
Re: GPU client
« Reply #51 on: 16 Dec 2007, 05:31:51 am »
and yes, it's still slower than any CPU version ....

Strange....

I always thought that the Nvidia 8 Series was faster than an Intel C2D

http://en.wikipedia.org/wiki/FLOPS
"As of 2007, the fastest PC processors perform over 30 GFLOPS.[8] GPUs in PCs are considerably more powerful in terms of pure FLOPS. For example, in the GeForce 8 Series the nVidia 8800 Ultra performs around 576 GFLOPS on 128 Processing elements. This equates to around 4.5 GFLOPS per element, compared with 2.75 per core for the Blue Gene/L. It should be noted that the 8800 series performs only Single precision calculations, and that while GPUs are highly efficient at calculations they are not as flexible as a general purpose CPU."

And Nvidia promises that the new card (GeForce 9800) will be even faster. 1 or 3 (!!!!) Tflops.... http://www.nordichardware.com/index.php?news=1&action=more&id=6911

I understand that this performance does not apply to all tasks...
Perhaps the sahcuda algorithm can be optimized?
seti_britta is a mathematician  :)
Maybe that can help?  ;D
« Last Edit: 16 Dec 2007, 03:41:43 pm by Radiohead »
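The per-element figure in the Wikipedia quote above is plain division, and the same arithmetic gives the on-paper gap between the GPU and a 30 GFLOPS CPU. A small sketch, using only the numbers quoted in the post; these are peak single-precision figures, not measured S@H throughput, and the function name is hypothetical.

```python
def gflops_per_element(total_gflops, elements):
    """Peak GFLOPS divided evenly across processing elements."""
    return total_gflops / elements


# Figures taken from the quoted Wikipedia text.
ULTRA_PEAK_GFLOPS = 576      # GeForce 8800 Ultra, single precision
ULTRA_ELEMENTS = 128         # processing elements
CPU_PEAK_GFLOPS = 30         # "fastest PC processors" as of 2007

per_element = gflops_per_element(ULTRA_PEAK_GFLOPS, ULTRA_ELEMENTS)
paper_ratio = ULTRA_PEAK_GFLOPS / CPU_PEAK_GFLOPS

print(f"GFLOPS per element: {per_element}")
print(f"Peak GPU/CPU ratio on paper: {paper_ratio:.1f}x")
```

The roughly 19x gap is a theoretical ceiling. The benchmark earlier in the thread shows the early sahcuda build running slower than one CPU core, so peak FLOPS alone says little about an unoptimized port.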

Radiohead

  • Guest
Re: GPU client
« Reply #52 on: 16 Dec 2007, 05:35:53 am »
lol, nice times from knabench - for me, my 8500 gives not just a 19% but a 100% slowdown .....
8500 - 16/16 processors
8800 GTS - 96/96 processors

Radiohead

  • Guest
Re: GPU client
« Reply #53 on: 16 Dec 2007, 05:50:05 am »
something is wrong on your computer ...... :o

I ran knabench again.
Now all results are strongly similar  :o

[attachment deleted by admin]

Offline Devaster

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 653
  • I like Duke !!!
Re: GPU client
« Reply #54 on: 16 Dec 2007, 08:53:12 am »
this code is not optimized ... for example, there are a lot of memory transfers that can be avoided, and so on ... next, the CPU and GPU code is mixed about 95:5 .... and async access to the device is not used ....

first it must be validated, then optimized
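One way to see why a 95:5 CPU/GPU split limits the gain is Amdahl's law. The sketch below assumes, purely for illustration, that "95:5" means 5% of the original run time has been moved to the GPU; the post does not say whether the ratio refers to run time or to lines of code, and the function name is hypothetical.

```python
def amdahl_speedup(offloaded_fraction, accelerator_speedup):
    """Overall speedup when only part of the run time is accelerated.

    offloaded_fraction: share of the original run time moved to the GPU (0..1).
    accelerator_speedup: how much faster that share runs on the GPU.
    """
    remaining = 1.0 - offloaded_fraction
    return 1.0 / (remaining + offloaded_fraction / accelerator_speedup)


OFFLOADED = 0.05  # assumption: 5% of run time is on the GPU

for gpu_speedup in (2, 10, 100):
    overall = amdahl_speedup(OFFLOADED, gpu_speedup)
    print(f"GPU part {gpu_speedup}x faster -> overall {overall:.3f}x")

# Upper bound if the offloaded part took no time at all.
print(f"Upper bound: {1.0 / (1.0 - OFFLOADED):.3f}x")
```

Under that assumption the whole application cannot gain much more than 5%, and this model does not even count the cost of host-to-device memory transfers, which can push the result below 1x.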

Gecko_R7

  • Guest
Re: GPU client
« Reply #55 on: 18 Dec 2007, 12:42:57 pm »
Mimo,

In what order does clock speed impact GPU performance as far as S@H is concerned?  CPU clock, memory clock, shader clock?
Also, do I understand correctly that the G92 8800GT has 12 FPU processors in the GPU?
Do the shaders provide any benefit?

Sorry for the questions. Just trying to understand this better.



Offline Devaster

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 653
  • I like Duke !!!
Re: GPU client
« Reply #56 on: 18 Dec 2007, 02:12:23 pm »
yes. for example, the 8500 GPU has two multiprocessors, where every multiprocessor has 8 unified shaders and every shader can work with 4 floats in one instruction - you may imagine that your CPU has 16*4 cores ....
instructions are very efficient, with low clock counts: for example MADD (multiply and add) takes only 4 clocks.
cache is extremely effective for contiguous reads/writes - this is called coalescing

clock speed doesn't have as great an impact on GPU performance as the shader count
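The coalescing point can be illustrated with a toy model: count how many aligned memory segments a group of threads touches when thread i reads element i * stride. The group size of 16 threads and the 64-byte segment are assumptions chosen for illustration; the exact rules depend on the hardware generation and are described in the CUDA Programming Guide, and the function name is hypothetical.

```python
def segments_touched(thread_count, stride_elements, element_bytes=4, segment_bytes=64):
    """Count distinct aligned segments read when thread i loads element i * stride.

    Fewer segments means the reads can be served by fewer memory transactions.
    """
    segments = set()
    for thread in range(thread_count):
        address = thread * stride_elements * element_bytes
        segments.add(address // segment_bytes)
    return len(segments)


THREADS = 16  # assumption: one group of threads issuing reads together

for stride in (1, 2, 16):
    count = segments_touched(THREADS, stride)
    print(f"stride {stride:2d}: {count} segment(s) touched")
```

With contiguous 4-byte reads (stride 1) all 16 threads land in a single segment, while a stride of 16 floats puts every thread in its own segment, which is the access pattern coalescing is meant to avoid.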

Gecko_R7

  • Guest
Re: GPU client
« Reply #57 on: 18 Dec 2007, 02:47:39 pm »
yes. for example, the 8500 GPU has two multiprocessors, where every multiprocessor has 8 unified shaders and every shader can work with 4 floats in one instruction - you may imagine that your CPU has 16*4 cores ....
instructions are very efficient, with low clock counts: for example MADD (multiply and add) takes only 4 clocks.
cache is extremely effective for contiguous reads/writes - this is called coalescing

clock speed doesn't have as great an impact on GPU performance as the shader count


Thanks Mimo! Very interesting.  So, a higher shader count and a faster shader clock will actually have a better impact on crunching speed/potential for our purposes?  In the case of a new G92-based 8800GT, that's 112 stream processors, each of which can process 4 floats in 1 instruction.  Wow!  The interest in this becomes very clear.  G80/G92 stream processors are scalar units, not vector processors?
« Last Edit: 18 Dec 2007, 05:53:19 pm by Gecko_R7 »

Offline Devaster

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 653
  • I like Duke !!!
Re: GPU client
« Reply #58 on: 18 Dec 2007, 05:34:50 pm »
more about the theory is at the beginning of this document: http://developer.download.nvidia.com/compute/cuda/1_1/NVIDIA_CUDA_Programming_Guide_1.1.pdf

no shaders are strictly vectorized, massive parallelism is implemented in hw....

Gecko_R7

  • Guest
Re: GPU client
« Reply #59 on: 18 Dec 2007, 05:53:51 pm »
more about the theory is at the beginning of this document: http://developer.download.nvidia.com/compute/cuda/1_1/NVIDIA_CUDA_Programming_Guide_1.1.pdf

no shaders are strictly vectorized, massive parallelism is implemented in hw....

Thanks!

 
