Author Topic: optimized sources  (Read 615693 times)

Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: optimized sources
« Reply #570 on: 01 Jun 2010, 09:27:50 am »
GeForce GTX470
I got it running...
~~~~~~~~~~~
oclDeviceQuery.exe Starting...

OpenCL SW Info:

 CL_PLATFORM_NAME:      NVIDIA CUDA
 CL_PLATFORM_VERSION:   OpenCL 1.0 CUDA 3.0.1
 OpenCL SDK Revision:   5537818


OpenCL Device Info:

 1 devices found supporting OpenCL:

 ---------------------------------
 Device GeForce GTX 470
 ---------------------------------
  CL_DEVICE_NAME:                       GeForce GTX 470
  CL_DEVICE_VENDOR:                     NVIDIA Corporation
  CL_DRIVER_VERSION:                    197.75
  CL_DEVICE_TYPE:                       CL_DEVICE_TYPE_GPU
  CL_DEVICE_MAX_COMPUTE_UNITS:          14
  CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS:   3
  CL_DEVICE_MAX_WORK_ITEM_SIZES:        1024 / 1024 / 64
  CL_DEVICE_MAX_WORK_GROUP_SIZE:        1024
  CL_DEVICE_MAX_CLOCK_FREQUENCY:        810 MHz
  CL_DEVICE_ADDRESS_BITS:               32
  CL_DEVICE_MAX_MEM_ALLOC_SIZE:         312 MByte
  CL_DEVICE_GLOBAL_MEM_SIZE:            1248 MByte
  CL_DEVICE_ERROR_CORRECTION_SUPPORT:   no
  CL_DEVICE_LOCAL_MEM_TYPE:             local
  CL_DEVICE_LOCAL_MEM_SIZE:             48 KByte
  CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE:   64 KByte
  CL_DEVICE_QUEUE_PROPERTIES:           CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE
  CL_DEVICE_QUEUE_PROPERTIES:           CL_QUEUE_PROFILING_ENABLE
  CL_DEVICE_IMAGE_SUPPORT:              1
  CL_DEVICE_MAX_READ_IMAGE_ARGS:        128
  CL_DEVICE_MAX_WRITE_IMAGE_ARGS:       8
  CL_DEVICE_SINGLE_FP_CONFIG:           INF-quietNaNs round-to-nearest round-to-zero round-to-inf fma

  CL_DEVICE_IMAGE <dim>                 2D_MAX_WIDTH     8192
                                        2D_MAX_HEIGHT    8192
                                        3D_MAX_WIDTH     2048
                                        3D_MAX_HEIGHT    2048
                                        3D_MAX_DEPTH     2048

  CL_DEVICE_EXTENSIONS:                 cl_khr_byte_addressable_store
                                        cl_khr_icd
                                        cl_khr_gl_sharing
                                        cl_nv_d3d9_sharing
                                        cl_nv_d3d10_sharing
                                        cl_nv_d3d11_sharing
                                        cl_nv_compiler_options
                                        cl_nv_device_attribute_query
                                        cl_nv_pragma_unroll
                                        cl_khr_global_int32_base_atomics
                                        cl_khr_global_int32_extended_atomics
                                        cl_khr_local_int32_base_atomics
                                        cl_khr_local_int32_extended_atomics
                                        cl_khr_fp64


  CL_DEVICE_COMPUTE_CAPABILITY_NV:      2.0
  NUMBER OF MULTIPROCESSORS:            14
  NUMBER OF CUDA CORES:                 448
  CL_DEVICE_REGISTERS_PER_BLOCK_NV:     32768
  CL_DEVICE_WARP_SIZE_NV:               32
  CL_DEVICE_GPU_OVERLAP_NV:             CL_TRUE
  CL_DEVICE_KERNEL_EXEC_TIMEOUT_NV:     CL_FALSE
  CL_DEVICE_INTEGRATED_MEMORY_NV:       CL_FALSE
  CL_DEVICE_PREFERRED_VECTOR_WIDTH_<t>  CHAR 1, SHORT 1, INT 1, LONG 1, FLOAT 1, DOUBLE 1


oclDeviceQuery, Platform Name = NVIDIA CUDA, Platform Version = OpenCL 1.0 CUDA 3.0.1, SDK Revision = 5537818, NumDevs = 1, Device = GeForce GTX 470

System Info:

 Local Time/Date = 15:22:35, 6/1/2010
 CPU Arch: 0
 CPU Level: 6
 # of CPU processors: 8
 Windows Build: 6002
 Windows Ver: 6.0


PASSED


Press <Enter> to Quit...
-----------------------------------------------------------
regards  ;)

heinz

Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: optimized sources
« Reply #571 on: 02 Jun 2010, 07:16:43 pm »
02.06.2010 22:27:47      NVIDIA GPU 0: GeForce GTX 470 (driver version 25715, CUDA version 3010, compute capability 2.0, 1248MB, 726 GFLOPS peak)
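
As a rough cross-check, the 726 GFLOPS peak in that log line is consistent with the oclDeviceQuery figures from reply #570: 448 CUDA cores at 810 MHz, counting 2 floating-point operations per core per clock (one fused multiply-add). Whether BOINC derives its estimate exactly this way is an assumption here; the sketch only shows that the numbers agree.

```python
# Figures reported by oclDeviceQuery for the GTX 470 (reply #570).
multiprocessors = 14
cuda_cores = 448
clock_mhz = 810

# Assumption: 2 flops per core per clock (one fused multiply-add).
flops_per_core_per_clock = 2

# Cores per multiprocessor on this compute capability 2.0 device.
cores_per_mp = cuda_cores // multiprocessors

# cores * MHz gives Mflop/s per flop-per-clock; divide by 1000 for GFLOPS.
peak_gflops = cuda_cores * clock_mhz * flops_per_core_per_clock / 1000

print(cores_per_mp)        # 32
print(round(peak_gflops))  # 726
```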

Come to our beta forum to test the new SAH Fermi app.

regards  heinz

Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: optimized sources
« Reply #572 on: 03 Jun 2010, 05:18:17 pm »
If you want to see some Fermi results, have a look at my host:
about 12 and a half minutes per WU, against 3 hours on my Xeon.

 :o

 ;D

Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: optimized sources
« Reply #573 on: 08 Jun 2010, 11:12:44 am »
The Fermi application (v6.10) has become visible on the SETI applications page.
If you have a GTX470/480 you can download and run it now.
No work is available at the moment because the splitters are offline.
We are all waiting now.

heinz  ;)

Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: optimized sources
« Reply #574 on: 28 Aug 2010, 08:49:40 am »
Vacation is over now, thank you all for your patience. ;)

Hi Jason,
the ION WU is up now.
Run time 13,861.30
CPU time 508.61

http://setiathome.berkeley.edu/result.php?resultid=1693745738
<core_client_version>6.10.58</core_client_version>
<![CDATA[
<stderr_txt>
setiathome_CUDA: Found 1 CUDA device(s):
  Device 1: ION, 241 MiB, regsPerBlock 8192
     computeCap 1.1, multiProcs 2
     clockRate = 1100000
setiathome_CUDA: CUDA Device 1 specified, checking...
   Device 1: ION is okay
SETI@home using CUDA accelerated device ION
Priority of process raised successfully
Priority of worker thread raised successfully
size 8 fft, is a freaky powerspectrum
size 16 fft, is a cufft plan
size 32 fft, is a cufft plan
size 64 fft, is a cufft plan
size 128 fft, is a cufft plan
size 256 fft, is a freaky powerspectrum
size 512 fft, is a freaky powerspectrum
size 1024 fft, is a freaky powerspectrum
size 2048 fft, is a cufft plan
size 4096 fft, is a cufft plan
size 8192 fft, is a cufft plan
size 16384 fft, is a cufft plan
size 32768 fft, is a cufft plan
size 65536 fft, is a cufft plan
size 131072 fft, is a cufft plan

 )       _   _  _)_ o  _  _
(__ (_( ) ) (_( (_  ( (_ ( 
 not bad for a human...  _)

Multibeam x32f Preview, Cuda 3.0

Work Unit Info:
...............
WU true angle range is :  0.410268

Flopcounter: 34143005518374.668000

Spike count:    0
Pulse count:    0
Triplet count:  1
Gaussian count: 0
called boinc_finish

</stderr_txt>
]]>
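
If you want to compare several results, the signal counts can be pulled out of the stderr text with a few lines of Python. This is only a sketch: the sample string is copied from the output above, and the function names `parse_counts` and `parse_angle_range` are made up for illustration.

```python
import re

# Excerpt copied from the stderr output of the result above.
STDERR_SAMPLE = """\
WU true angle range is :  0.410268

Flopcounter: 34143005518374.668000

Spike count:    0
Pulse count:    0
Triplet count:  1
Gaussian count: 0
called boinc_finish
"""

def parse_counts(text):
    """Return a dict such as {'Spike': 0, 'Pulse': 0, ...}."""
    pattern = re.compile(r"^(\w+) count:\s+(\d+)$", re.MULTILINE)
    return {name: int(value) for name, value in pattern.findall(text)}

def parse_angle_range(text):
    """Return the true angle range as a float, or None if it is absent."""
    match = re.search(r"WU true angle range is :\s+([\d.]+)", text)
    return float(match.group(1)) if match else None

print(parse_counts(STDERR_SAMPLE))
print(parse_angle_range(STDERR_SAMPLE))
```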
If any of you want to download the new CUDA app, use the Unified Installer v0.37; go to: http://lunatics.kwsn.net/index.php
regards Heinz

Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: optimized sources
« Reply #575 on: 29 Aug 2010, 06:02:03 am »
Hi Jason,
the next ION WU is up now.
Computer ID 5510631
Deadline 15 Oct 2010 19:55:05 UTC
Run time 14,564.88
CPU time 520.29

http://setiathome.berkeley.edu/result.php?resultid=1693924875

<core_client_version>6.10.58</core_client_version>
<![CDATA[
<stderr_txt>
setiathome_CUDA: Found 1 CUDA device(s):
  Device 1: ION, 241 MiB, regsPerBlock 8192
     computeCap 1.1, multiProcs 2
     clockRate = 1100000
setiathome_CUDA: CUDA Device 1 specified, checking...
   Device 1: ION is okay
SETI@home using CUDA accelerated device ION
Priority of process raised successfully
Priority of worker thread raised successfully
size 8 fft, is a freaky powerspectrum
size 16 fft, is a cufft plan
size 32 fft, is a cufft plan
size 64 fft, is a cufft plan
size 128 fft, is a cufft plan
size 256 fft, is a freaky powerspectrum
size 512 fft, is a freaky powerspectrum
size 1024 fft, is a freaky powerspectrum
size 2048 fft, is a cufft plan
size 4096 fft, is a cufft plan
size 8192 fft, is a cufft plan
size 16384 fft, is a cufft plan
size 32768 fft, is a cufft plan
size 65536 fft, is a cufft plan
size 131072 fft, is a cufft plan

 )       _   _  _)_ o  _  _
(__ (_( ) ) (_( (_  ( (_ ( 
 not bad for a human...  _)

Multibeam x32f Preview, Cuda 3.0

Work Unit Info:
...............
WU true angle range is :  0.392020

Flopcounter: 45991923541269.156000

Spike count:    3
Pulse count:    0
Triplet count:  2
Gaussian count: 1
called boinc_finish

</stderr_txt>
]]>
______________________________
4h:02min, not bad for the ION chip,
so we know now that the app works on this chipset too.
regards Heinz
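
For reference, the BOINC run times quoted for the two ION results are in seconds. A small sketch of the conversion (the helper name `hms` is just for illustration), together with the share of the run time that was spent on the CPU:

```python
def hms(seconds):
    """Convert a run time in seconds to (hours, minutes, seconds)."""
    whole = int(seconds)
    hours, rest = divmod(whole, 3600)
    minutes, secs = divmod(rest, 60)
    return hours, minutes, secs

# (run time, CPU time) in seconds for the two ION results posted above.
results = [(13861.30, 508.61), (14564.88, 520.29)]

for run_time, cpu_time in results:
    h, m, s = hms(run_time)
    cpu_share = 100 * cpu_time / run_time
    print(f"{h}h:{m:02d}min:{s:02d}s, CPU share {cpu_share:.1f}%")
```

The second result comes out at 4 h 02 min, matching the figure above, and in both cases the CPU is busy for only a few percent of the run time.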

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: optimized sources
« Reply #576 on: 29 Aug 2010, 06:12:36 am »
...
4h:02min, not bad for the ION chip,
so we know now that the app works on this chipset too.
regards Heinz

That's good news heinz.  There are ways we can speed this up for similar onboard GPUs, but knowing that it works OK for now is a good first step.

Jason

Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: optimized sources
« Reply #577 on: 29 Aug 2010, 06:32:07 am »
Hi Jason,
I changed the OS from Vista32 to W7 32 and had to make a clean install on this machine.
Now I have to reinstall all my compilers and development programs. I will do it in the next few days so we can explore this chipset a little more.
heinz  :)

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: optimized sources
« Reply #578 on: 29 Aug 2010, 08:09:51 am »
Good move heinz. Also due to cheap/good hard drive availability here, I've been gradually migrating my development environment over to a performance raid 10 one, and that influences production workflow far more than I expected.  I'll soon be migrating also, gradually, to VS2008 for primary development, since the nVidia nSight stuff is made to work on that.  Might make life a bit easier if we are on similar platforms/environment.

Offline Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 14349
Re: optimized sources
« Reply #579 on: 29 Aug 2010, 01:11:02 pm »
I'll soon be migrating also, gradually, to VS2008 for primary development, since the nVidia nSight stuff is made to work on that.  Might make life a bit easier if we are on similar platforms/environment.
Yeah, I'm on VS2008 too already :)

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: optimized sources
« Reply #580 on: 29 Aug 2010, 01:14:30 pm »
Yeah, I'm on VS2008 too already :)

Have you played with nSight already? Does ATI have something similar for you to use for OpenCL?

Offline Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 14349
Re: optimized sources
« Reply #581 on: 29 Aug 2010, 01:21:43 pm »
Downloaded it but have not installed it yet (I have no NV GPU in my dev host right now).
ATI is far behind with debugging/profiling/support tools, as usual...
Each new release of KernelAnalyser immediately starts a new thread on the AMD forum with newly discovered bugs.
For example, in the latest one I see only 4xxx GPUs; the 5xxx ones are gone, no idea why... Previously it never restricted me to the actually installed hardware; now only 4xxx GPUs are available (several of them, though, not just the GPU that is really installed).

Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: optimized sources
« Reply #582 on: 30 Aug 2010, 05:06:08 am »
Hi Jason,
ION-results:  http://setiathome.berkeley.edu/results.php?hostid=5510631
<coproc>
            <type>CUDA</type>
            <count>1.0</count>
</coproc>
GTX470-results:  http://setiathome.berkeley.edu/results.php?hostid=4387433
<coproc>
            <type>CUDA</type>
            <count>0.5</count>
</coproc>
__________________________
Modified a little later:
GTX470-results:  http://setiathome.berkeley.edu/results.php?hostid=4387433
<coproc>
            <type>CUDA</type>
            <count>0.25</count>
</coproc>
No problem with 4 WUs in parallel (times vary from 24 to 37 min).
-----------------------------------------
runs great  ;)
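
The `<count>` value is the fraction of a GPU that each task claims, so its reciprocal is the number of tasks that run on one GPU at the same time. A quick sketch of what that means for throughput, using the 24 to 37 minute range reported above (treating the four tasks as fully overlapped is a simplifying assumption, and `tasks_per_gpu` is just an illustrative name):

```python
def tasks_per_gpu(count):
    """Number of tasks that share one GPU for a given <count> value."""
    return int(1 / count)

for count in (1.0, 0.5, 0.25):
    print(count, "->", tasks_per_gpu(count), "task(s) per GPU")

# With <count>0.25</count>, four WUs take 24 to 37 minutes each.
# Assumption: all four overlap fully, so the effective time per WU is
# the elapsed time divided by the number of concurrent tasks.
parallel = tasks_per_gpu(0.25)
best, worst = 24 / parallel, 37 / parallel
print(f"effective {best:.2f} to {worst:.2f} min per WU")
```

Compared with the roughly 12 and a half minutes per WU from reply #572, this suggests that running several tasks per GPU pays off, although run times also depend on the individual WU, so the comparison is only indicative.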
« Last Edit: 30 Aug 2010, 06:51:55 pm by _heinz »

Gecko_R7

  • Guest
Re: optimized sources
« Reply #583 on: 30 Aug 2010, 09:17:25 am »
@ Heinz,

Is it worth revisiting updated Atom CPU builds based on more recent source?

Even 5-10% differences are worthwhile when current Atom AP is @ 80 hours.

Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: optimized sources
« Reply #584 on: 30 Aug 2010, 11:02:45 am »
@ Heinz,

Is it worth revisiting updated Atom CPU builds based on more recent source?

Even 5-10% differences are worthwhile when current Atom AP is @ 80 hours.

Hi Gecko,
I have several compiled apps for Atom under testing. Once I have reconfigured my Atom R3600 and have all development programs installed again, I will compile some new Atom apps based on the latest source updates. As you know, I have changed to W7 now, so a lot of updates are necessary.
heinz

 
