Author Topic: GPU client  (Read 159781 times)

Offline Devaster

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 653
  • I like Duke !!!
Re: GPU client
« Reply #180 on: 15 Jul 2008, 01:33:11 pm »
You can use the KNAbench system for speed comparison.

TheMule

  • Guest
Re: GPU client
« Reply #181 on: 15 Jul 2008, 02:08:16 pm »
Ok, not what I expected. Using KNAbench and work unit 1, I got:

226 sec - setiathome_6.01_windows_intelx86
203 sec - setiathome_5.27_windows_intelx86

About 23 sec slower.  Is it due to the FFT messages on the screen? Data follows:

setiathome_5.27_windows_intelx86.exe -nographics / testWU-1.wu :
Started at  : 13:53:37
Ended at    : 13:57:00
Elapsed time: 203 seconds
 
[ stderr ]
Can't set up shared mem: -1
Will run in standalone mode.
setiathome_enhanced 5.27 DevC++/MinGW

Work Unit Info:
...............
WU true angle range is :  0.604884
Optimal function choices:
-----------------------------------------------------
name               
-----------------------------------------------------
              v_BaseLineSmooth (no other)
   v_vGetPowerSpectrumUnrolled 0.00006 0.00000
             sse3_ChirpData_ak 0.00899 0.00000
                 v_vTranspose4 0.00143 0.00000
                AK SSE folding 0.00076 0.00000

Flopcounter: 637401180238.359500

Spike count:    0
Pulse count:    0
Triplet count:  0
Gaussian count: 0
[ /stderr ]



setiathome_6.01_windows_intelx86.exe -nographics / testWU-1.wu :
Started at  : 13:44:56
Ended at    : 13:48:42
Elapsed time: 226 seconds
 
[ stderr ]
Device name: GeForce 8800 GTS 512
Device version: 1.1
Total global memory (MB): 512
Number of multiprocessors : 16
Number of cores :128
Shared memory per block (kB): 16
Registers per block: 8192
Warp size: 32
Max threads per block: 512
Shaders clock rate (MHz): 1674
Concurrent copy and execution: No
Can't set up shared mem: -1
Will run in standalone mode.
setiathome_enhanced 6.01 Visual Studio/Microsoft C++
libboinc: 6.3.4

Work Unit Info:
...............
WU true angle range is :  0.604884

Flopcounter: 627299330081.366820

Spike count:    0
Pulse count:    0
Triplet count:  0
Gaussian count: 0
called boinc_finish
[ /stderr ]
------------




Offline Devaster

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 653
  • I like Duke !!!
Re: GPU client
« Reply #182 on: 15 Jul 2008, 03:21:35 pm »
OK, new code: now a 64-bit build.

Otherwise the same as the previous 32-bit build.

Compiled with VS2008+VS2005 under Windows Server 2008 x64.

A small test:
Code:
============
setiathome_6.00S08_windows_intelx86.exe -verb -nog / testWU-4.wu :
Started at  : 20:57:18.970
Ended at    : 21:00:46.190
    207.126 secs Elapsed
    199.109 secs CPU time
 
[ stderr ]
Can't set up shared mem: -1
Will run in standalone mode.
setiathome_enhanced 6.00S08 DevC++/MinGW
libboinc: 6.1.6

DataIn=0x32b00c0, ChirpedData=0x2aa0040

Work Unit Info:
...............
WU true angle range is :  1.279649
Optimal function choices:
-----------------------------------------------------
                          name  timing   error
-----------------------------------------------------
              v_BaseLineSmooth (no other)

            v_GetPowerSpectrum 0.00079 0.00000  test
           v_vGetPowerSpectrum 0.00073 0.00000  test
          v_vGetPowerSpectrum2 0.00075 0.00000  test
   v_vGetPowerSpectrumUnrolled 0.00076 0.00000  test
  v_vGetPowerSpectrumUnrolled2 0.00075 0.00000  test
           v_vGetPowerSpectrum 0.00073 0.00000  choice

                   v_ChirpData 0.03327 0.00000  test
                 fpu_ChirpData 0.04556 0.00000  test
           v_vChirpData_x86_64 0.24693 0.00002  test
             sse1_ChirpData_ak 0.03216 0.00000  test
             sse2_ChirpData_ak 0.03455 0.00000  test
             sse3_ChirpData_ak 0.02924 0.00000  test
             sse3_ChirpData_ak 0.02924 0.00000  choice

                   v_Transpose 0.04322 0.00000  test
                  v_Transpose2 0.02599 0.00000  test
                  v_Transpose4 0.01550 0.00000  test
                  v_Transpose8 0.02781 0.00000  test
                v_pfTranspose2 0.02539 0.00000  test
                v_pfTranspose4 0.01571 0.00000  test
                v_pfTranspose8 0.02681 0.00000  test
                 v_vTranspose4 0.01173 0.00000  test
               v_vTranspose4np 0.01197 0.00000  test
              v_vTranspose4ntw 0.01090 0.00000  test
            v_vTranspose4x8ntw 0.00758 0.00000  test
           v_vTranspose4x16ntw 0.00580 0.00000  test
          v_vpfTranspose8x4ntw 0.01072 0.00000  test
           v_vTranspose4x16ntw 0.00580 0.00000  choice

               FPU opt folding 0.00423 0.00000  test
                AK SSE folding 0.00220 0.00000  test
                BH SSE folding 0.00201 0.00000  test
                BH SSE folding 0.00201 0.00000  choice


Flopcounter: 243285924139.522000

Spike count:    0
Pulse count:    0
Triplet count:  0
Gaussian count: 0
called boinc_finish
[ /stderr ]
------------
setiathome_6.01_windows_intelx64.exe -verb -st / testWU-4.wu :
Started at  : 21:00:46.346
Ended at    : 21:03:02.643
    136.219 secs Elapsed
    128.750 secs CPU time
Speedup     : 35.34%
Ratio       : 1.55 x
 
Result      : Strongly similar,  Q= 99.99%
[ stderr ]
Device name: GeForce 9600 GT
Device version: 1.1
Total global memory (MB): 512
Number of multiprocessors : 8
Number of cores :64
Shared memory per block (kB): 16
Registers per block: 8192
Warp size: 32
Max threads per block: 512
Shaders clock rate (MHz): 1625
Concurrent copy and execution: No
Can't set up shared mem: -1
Will run in standalone mode.
setiathome_enhanced 6.01 Visual Studio/Microsoft C++
libboinc: 6.3.5

Work Unit Info:
...............
WU true angle range is :  1.279649

Flopcounter: 238022320153.522060

Spike count:    0
Pulse count:    0
Triplet count:  0
Gaussian count: 0
called boinc_finish
[ /stderr ]
 
 
------------
 
Quick timetable
 
WU : testWU-4.wu
setiathome_6.00S08_windows_intelx86.exe : 199.109 secs CPU
setiathome_6.01_windows_intelx64.exe : 128.750 secs CPU
Speedup     : 35.34%
Ratio       : 1.55 x
 
------------
CPU:
Number of processors 1
Number of cores 1 (max 1)
Specification AMD Athlon(tm) 64 Processor 3000+
Codename Venice
Core Speed 1005.3 MHz (5.0 x 201.1 MHz)
Core Stepping DH-E6
Technology 90 nm
Stock frequency 1800 MHz
------------
Chipset:
Northbridge NVIDIA nForce4 rev. A3
Southbridge NVIDIA nForce4 MCP rev. A3
------------
RAM:
Memory Type DDR
Memory Size 2048 MBytes
Memory Frequency 201.1 MHz (CPU/5)
Max bandwidth PC3200 (200 MHz)
CAS# 3.0
RAS# to CAS# 3
RAS# Precharge 3
Cycle Time (tRAS) 8
DRAM Idle Timer 16
------------
OS:
Windows Version Microsoft Windows Vista (6.0) Enterprise Edition (Full)  Service Pack 1 (Build 6001)
============

The apps were running at almost 100 percent the whole time. MS has done a very good job on performance with Server 2008.
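For anyone checking the quick timetable above: the Speedup and Ratio figures are consistent with the two CPU times if "speedup" is read as the fraction of CPU time saved and "ratio" as old time divided by new time. A minimal Python sketch, assuming that is how the benchmark script derives them (the function name `speedup_report` is made up for illustration):

```python
def speedup_report(baseline_secs, candidate_secs):
    """Return (percent of time saved, speed ratio) for two CPU times in seconds."""
    # Fraction of the baseline time that the candidate build saves.
    saved_pct = (baseline_secs - candidate_secs) / baseline_secs * 100.0
    # How many times faster the candidate is than the baseline.
    ratio = baseline_secs / candidate_secs
    return saved_pct, ratio


# CPU times for testWU-4.wu taken from the timetable in this post.
saved_pct, ratio = speedup_report(199.109, 128.750)
print(f"Speedup : {saved_pct:.2f}%")
print(f"Ratio   : {ratio:.2f} x")
```

Note that both figures are based on CPU time, not elapsed time, so they say nothing about how busy the GPU was.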

[attachment deleted by admin]

Offline Morten

  • Knight o' The Round Table
  • ***
  • Posts: 165
Re: GPU client
« Reply #183 on: 15 Jul 2008, 11:04:45 pm »
Hi,

Tested x64-version and got this:

==================
Device name: Device Emulation (CPU)
Device version: 9999.9999
Total global memory (MB): 4095
Number of multiprocessors : 16
Number of cores :128
Shared memory per block (kB): 16
Registers per block: 8192
Warp size: 1
Max threads per block: 512
Shaders clock rate (MHz): 1350
Concurrent copy and execution: No
Can't set up shared mem: -1
Will run in standalone mode.
GPU memory allocation error (source buffer) ...

==================

I'm running Cuda display driver NVIDIADisplayWinVista64(177_35)Int.exe on Geforce 8800 GT

Morten

Offline Devaster

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 653
  • I like Duke !!!
Re: GPU client
« Reply #184 on: 16 Jul 2008, 12:42:55 am »
Does anyone else have the same problem?

Try using the latest drivers.

Offline Morten

  • Knight o' The Round Table
  • ***
  • Posts: 165
Re: GPU client
« Reply #185 on: 16 Jul 2008, 01:53:05 pm »
I found the cause of the problem:

I was connected to the machine using RDP/Terminal Services ("mstsc /v:computer /console"). In this session Nvidia is not available.
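A quick way to spot this condition in a log is to check the device banner the app prints to stderr. A minimal Python sketch, assuming the banner keeps the `key: value` layout seen in the logs in this thread; the helper names are hypothetical, and the substring check is only a heuristic based on the "Device Emulation (CPU)" name reported above:

```python
def parse_device_banner(stderr_text):
    """Collect 'key: value' lines from the device banner into a dict."""
    info = {}
    for line in stderr_text.splitlines():
        # Split on the first colon only; lines without a colon are skipped.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def is_emulated(info):
    """Heuristic: treat a device whose name mentions 'Emulation' as not a real GPU."""
    return "Emulation" in info.get("Device name", "")


# Excerpt of the banner from the failing run posted earlier.
sample = """\
Device name: Device Emulation (CPU)
Device version: 9999.9999
Warp size: 1
"""
info = parse_device_banner(sample)
print(is_emulated(info))
```

If the check fires, the app is not seeing the real card, which matches what happens inside an RDP session.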

After testing this I have some questions/comments:

1: When running the executable it's using 100% CPU - shouldn't the CPU utilization be close to zero and GPU be utilized to the max? As it is now it has no practical use as I give away my CPU in order to utilize the GPU.
2: How do I install and run it in combination with BOINC? What is your roadmap/intention on this?
3: With the Terminal Services issue mentioned, it appears the only way to run interactively is being logged on locally/physically.
3a: The best way to run is as a service - do you have any suggestions/plans on how to facilitate a service installation, or just use sc.exe?

Morten

[attachment deleted by admin]

Offline Devaster

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 653
  • I like Duke !!!
Re: GPU client
« Reply #186 on: 16 Jul 2008, 02:08:05 pm »
1. For now it does not use streams, and only 10% of the code has been ported to the GPU.
2. This code is only a technology preview, so I don't know.
3. I don't know of any workaround for Terminal Services, sorry.
4. Running as a service is managed by the BOINC core, not by the computing app.

Offline Morten

  • Knight o' The Round Table
  • ***
  • Posts: 165
Re: GPU client
« Reply #187 on: 16 Jul 2008, 08:28:18 pm »
Hi,

Thanks for clearing that up.

Do you reckon it's realistic to port 100% to the GPU? Do you have an idea how much you will be able to port, and when? I think this is such an excellent idea and am really hoping you'll be able to pull it off!

M

cbuchner1

  • Guest
Re: GPU client
« Reply #188 on: 19 Jul 2008, 04:17:39 pm »
fft and powerspectrum on GPU

Are you making use of CUFFT's batching feature? If so, you can run multiple FFTs with one CUDA call, which can save some API and kernel launch overhead.


Offline Devaster

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 653
  • I like Duke !!!
Re: GPU client
« Reply #189 on: 20 Jul 2008, 12:07:03 am »
Yes, I used CUFFT batch mode.

BerndBrot

  • Guest
Re: GPU client
« Reply #190 on: 28 Jul 2008, 11:22:00 am »
Quote
OK, new code: now a 64-bit build.
Otherwise the same as the previous 32-bit build.
Compiled with VS2008+VS2005 under Windows Server 2008 x64.
A small test:
The apps were running at almost 100 percent the whole time. MS has done a very good job on performance with Server 2008.

How do I install the test app?

Archangel999

  • Guest
Re: GPU client
« Reply #191 on: 24 Aug 2008, 02:16:07 pm »
Everything is working fine.
For WU 1:
with the x64 6.01 app ----- 127 sec
with ak v8 SSSE3.1 -----  69 sec
with ak v8 SSE4.1 ----- 45 sec
Best Regards
D.Draganov
Nvidia GeForce 8800GTX 768Mb
Core Duo E8500 @ 4.17
Windows x64 XP Pro

Just wondering whether the GPU is at 100% load :) myhahahaha
and when it will be recognized as another processor, not a co-processor.



Device name: GeForce 8800 GTX
Device version: 1.0
Total global memory (MB): 767
Number of multiprocessors : 16
Number of cores :128
Shared memory per block (kB): 16
Registers per block: 8192
Warp size: 32
Max threads per block: 512
Shaders clock rate (MHz): 1350
Concurrent copy and execution: No
Can't set up shared mem: -1
Will run in standalone mode.
setiathome_enhanced 6.01 Visual Studio/Microsoft C++
libboinc: 6.3.5
« Last Edit: 24 Aug 2008, 03:19:32 pm by Archangel999 »

Offline Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 14349
Re: GPU client
« Reply #192 on: 24 Aug 2008, 03:10:23 pm »
What do you mean?
What is your GPU timing?

Archangel999

  • Guest
Re: GPU client
« Reply #193 on: 24 Aug 2008, 03:14:56 pm »
Quote
What do you mean?
What is your GPU timing?

All stock:
engine 576
shader 1350
memory 1800
if I understand what you are asking.

Offline Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 14349
Re: GPU client
« Reply #194 on: 24 Aug 2008, 03:45:59 pm »
Ah, no, I was asking how long it takes to run the GPU version of the SETI client on your host.
You wrote
Quote
with the x64 6.01 app ----- 127 sec
with ak v8 SSSE3.1 -----  69 sec
with ak v8 SSE4.1 ----- 45 sec
Are these numbers GPU-app run times?
Which GPU app version did you use?

 
