Topic: ATI OpenCL AstroPulse (rev516) released (Read 25229 times)

benool

  • Squire
  • Posts: 45
Re: ATI OpenCL AstroPulse (rev516)
« Reply #15 on: 19 Mar 2011, 11:27:26 am »
Here you go.

3 CSV files, made with the default sprofile settings:

"ATI4550_unroll_5.csv" is from about an hour of runtime.

"ATI4550_unroll_10.csv" and "ATI4550_unroll_10_2ndrun.csv" are from 2 attempts to run with unroll 10. I included both because they seem quite different. The application terminates in both cases (I erase all ap_state, fold.dat, pulse.out etc. between runs).

Fredericx51

  • Knight o' The Round Table
  • Posts: 207
  • Knight Who Says Ni N!
Re: ATI OpenCL AstroPulse (rev516) released
« Reply #16 on: 07 Apr 2011, 05:34:29 pm »
Gonna give it another try with a new host: i7-2600, 2x EAH5870, Win7 64-bit Pro, BOINC 6.10.60 64-bit.
What unroll factor is OK to try on these cards?

skildude

  • Knight o' The Round Table
  • Posts: 168
Re: ATI OpenCL AstroPulse (rev516) released
« Reply #17 on: 07 Apr 2011, 05:53:21 pm »
My cards seem to like unroll at 10, but you'll need to adjust yours to your own liking. 10 is a good starting place.

Fredericx51

  • Knight o' The Round Table
  • Posts: 207
  • Knight Who Says Ni N!
Re: ATI OpenCL AstroPulse (rev516) released
« Reply #18 on: 20 Apr 2011, 05:49:15 am »
Quote from: skildude on 07 Apr 2011, 05:53:21 pm
My cards seem to like unroll at 10, but you'll need to adjust yours to your own liking. 10 is a good starting place.

Hello, I started testing the ATI AP app rev.516 on this rig.

I tried some different unroll values (12, 13 and 14); even at 12 the screen lag becomes too heavy, so I put it back to 10.
But I did double ffa_block and ffa_block_fetch, and I run 2 tasks per 5870 (two cards) plus 8 MB WUs using the SSSE3x flavor. Memory use is quite high
(temps are quite high for 4x 2GByte DDR3 @ 1333MHz), but the computer is stable and can (and will) be used for other things, except playing MPEG-2 or
a game ;)

If the AP WUs validate, I can start using the MB app (rev.177). Trying to learn the ins and outs of OpenCL... :o

Fredericx51

  • Knight o' The Round Table
  • Posts: 207
  • Knight Who Says Ni N!
Re: ATI OpenCL AstroPulse (rev516) released
« Reply #19 on: 20 Apr 2011, 05:48:56 pm »
Couldn't help playing with ffa_block_fetch and unroll while running WUs, but the first 5 AP WUs with rev 516 have validated,
last of 5 AP W,
well, no harm done. ::)

B.t.w., I still had some 100 Collatz Conjecture WUs, with deadlines from 10 minutes to 2 to 3 days, so the host runs a few at night, when it's cooler ;)
But the GPUs are almost trashed by the CC load when I go to sleep: fans at max, temps at max. This host is in my bedroom and is quite noisy under such TREATMENT, which is not good for the average life span and safe use of the
host, because it gets really hot.

I also have some MW work, but is it still active?
Back on topic, though.
« Last Edit: 20 Apr 2011, 07:09:25 pm by Fredericx51 »

Fredericx51

  • Knight o' The Round Table
  • Posts: 207
  • Knight Who Says Ni N!
Re: ATI OpenCL AstroPulse (rev516) released
« Reply #20 on: 21 Apr 2011, 11:08:05 am »
Some more validations with rev.516. GPU use is almost 100%, and CPU use depends heavily on the blanking %. Here are the latest
results: this one.

On this host.

Another one.




Fredericx51

  • Knight o' The Round Table
  • Posts: 207
  • Knight Who Says Ni N!
Re: ATI OpenCL AstroPulse (rev516) released
« Reply #21 on: 22 Apr 2011, 02:03:09 pm »
Last validated AP WU.

<core_client_version>6.10.60</core_client_version>
<![CDATA[
<stderr_txt>
Number of app instances per device setted to:2
DATA_CHUNK_UNROLL setted to:10
FFA thread block override value:8192
FFA thread fetchblock override value:4096
Running on device number: 1
Priority of worker thread raised successfully
Priority of process adjusted successfully, high priority class used
OpenCL platform detected: Advanced Micro Devices, Inc.
BOINC assigns 1 device, slots 2 to 3 (including) will be checked
Used slot is 3;   AstroPulse v. 5.06
Non-graphics   FFTW   USE_CONVERSION_OPT   
Windows x86 rev 516, 5.06 match, by Raistmer with support of Lunatics.kwsn.net team.   SSE2

OpenCL version by Raistmer

oclFFT fix for ATI GPUs by Urs Echternacht
ffa threshold mod, by Joe Segur.
static fftw lib, built by Jason G.
SSE3 dechirping by JDWhale

Build features: Non-graphics   OpenCL   COMBINED_DECHIRP_KERNEL   FFTW   USE_INCREASED_PRECISION   USE_SSE2   x86   
     CPUID:         Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz

     Cache: L1=64K L2=256K

CPU features: FPU TSC PAE CMPXCHG8B APIC SYSENTER MTRR CMOV/CCMP MMX FXSAVE/FXRSTOR SSE SSE2 HT SSE3
Number of OpenCL platforms:             1


 OpenCL Platform Name:                AMD Accelerated Parallel Processing
Number of devices:             2
  Max compute units:             20
  Max work group size:             256
  Max clock frequency:             875Mhz
  Max memory allocation:          134217728
  Cache type:                None
  Cache line size:             0
  Cache size:                0
  Global memory size:             536870912
  Constant buffer size:             65536
  Max number of constant args:          8
  Local memory type:             Scratchpad
  Local memory size:             32768
  Queue properties:            
    Out-of-Order:             No
  Name:                   Cypress
  Vendor:                Advanced Micro Devices, Inc.
  Driver version:             CAL 1.4.1332
  Version:                OpenCL 1.1 AMD-APP-SDK-v2.4 (595.10)
  Extensions:                cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_printf cl_amd_media_ops cl_amd_popcnt cl_khr_d3d10_sharing
  Max compute units:             20
  Max work group size:             256
  Max clock frequency:             875Mhz
  Max memory allocation:          134217728
  Cache type:                None
  Cache line size:             0
  Cache size:                0
  Global memory size:             536870912
  Constant buffer size:             65536
  Max number of constant args:          8
  Local memory type:             Scratchpad
  Local memory size:             32768
  Queue properties:            
    Out-of-Order:             No
  Name:                   Cypress
  Vendor:                Advanced Micro Devices, Inc.
  Driver version:             CAL 1.4.1332
  Version:                OpenCL 1.1 AMD-APP-SDK-v2.4 (595.10)
  Extensions:                cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_printf cl_amd_media_ops cl_amd_popcnt cl_khr_d3d10_sharing


Info : Building Program (clBuildProgram):main kernels: OK code 0


    single pulses: 1
repetitive pulses: 0
  percent blanked: 0.00
class T_remove_radar:   total=3.71e+009,   N=1,   <>=3.71e+009,   min=3.71e+009,   max=3.71e+009
class T_main_loop_L1:   total=3.35e+013,   N=111,   <>=3.02e+011,   min=2.20e+011,   max=3.70e+011
 class T_FFT_forward:   total=8.62e+009,   N=182040,   <>=4.73e+004,   min=1.16e+004,   max=3.20e+008
 class T_remove_radar_randomize:   total=2.20e+009,   N=1817736,   <>=1.21e+003,   min=3.50e+002,   max=1.22e+008
 class T_build_chirp_table:   total=0.00e+000,   N=0,   <>=0.00e+000,   min=1.84e+019,   max=0.00e+000
 class T_DataWrite:   total=0.00e+000,   N=0,   <>=0.00e+000,   min=1.84e+019,   max=0.00e+000
  class T_DataWrite_ns:   total=0,   N=0,   <>=0,   min=0   max=0
 class T_oclReadBuf:   total=6.70e+006,   N=182040,   <>=3.60e+001,   min=1.80e+001,   max=2.11e+003
   class T_ChirpWrite:   total=0.00e+000,   N=0,   <>=0.00e+000,   min=1.84e+019,   max=0.00e+000
    class T_ChirpWrite_ns:   total=0,   N=0,   <>=0,   min=0   max=0
 class T_dechirp:   total=7.42e+009,   N=182040,   <>=4.07e+004,   min=1.60e+004,   max=1.21e+008
  class Dechirp_ns:   total=0,   N=0,   <>=0,   min=0   max=0
  class Half_ns:   total=0,   N=0,   <>=0,   min=0   max=0
 class T_PC_single_pulse_kernel_FFA_update:   total=1.22e+013,   N=182040,   <>=6.70e+007,   min=2.15e+007,   max=6.12e+008
  class PC_ns:   total=0,   N=0,   <>=0,   min=0   max=0
class T_oclReadBuf:   total=6.70e+006,   N=182040,   <>=3.60e+001,   min=1.80e+001,   max=2.11e+003
class T_oclWriteBuf:   total=0.00e+000,   N=0,   <>=0.00e+000,   min=1.84e+019,   max=0.00e+000
  class T_FFT_inverse:   total=3.22e+009,   N=182040,   <>=1.77e+004,   min=9.08e+003,   max=1.21e+008
 class T_ffa:   total=2.13e+013,   N=1998,   <>=1.06e+010,   min=1.15e+009,   max=5.62e+010
class T_GPU_buffer_read_backs:   total=2,   N=2,   <>=1,   min=1   max=1
USE_OPENCL   OPENCL_WRITE   USE_INCREASED_PRECISION   SMALL_CHIRP_TABLE   
rev 516
19:25:24 (3200): called boinc_finish

</stderr_txt>
]]>
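The timing block at the end of this stderr lists, for each class, a total, a count N, a mean (<>), a min and a max. The numbers look consistent with <> being total/N (3.35e+013 / 111 ≈ 3.02e+011 for T_main_loop_L1) and with the indented T_PC_single_pulse_kernel_FFA_update and T_ffa classes being sub-stages of the main loop (1.22e+013 + 2.13e+013 ≈ 3.35e+013). Assuming that reading, the Python sketch below pulls those lines out of a pasted stderr and shows each stage's share of the main loop; for this WU, T_ffa comes out at roughly 64%. parse_timings is a hypothetical helper, not part of the AstroPulse app, and the log does not state the units of the counters. Classes with N=0 report min=1.84e+019, which looks like an unset sentinel (close to 2^64), so the sketch skips them.

```python
import re

# Matches timing lines such as:
#   class T_ffa:   total=2.13e+013,   N=1998,   <>=1.06e+010,   min=1.15e+009,   max=5.62e+010
# Some lines have no comma between min and max, so every comma is optional.
TIMING_RE = re.compile(
    r"class\s+(?P<name>\w+):\s+total=(?P<total>[\d.e+-]+),?\s+N=(?P<n>\d+),?\s+"
    r"<>=(?P<mean>[\d.e+-]+),?\s+min=(?P<min>[\d.e+-]+),?\s+max=(?P<max>[\d.e+-]+)"
)


def parse_timings(stderr_text):
    """Return {class_name: (total, n, mean)} for every timing line with N > 0."""
    timings = {}
    for m in TIMING_RE.finditer(stderr_text):
        n = int(m.group("n"))
        if n == 0:
            continue  # unused counter; its min field holds the 1.84e+019 sentinel
        timings[m.group("name")] = (float(m.group("total")), n, float(m.group("mean")))
    return timings


# A few lines copied from the stderr above.
SAMPLE = """\
class T_main_loop_L1:   total=3.35e+013,   N=111,   <>=3.02e+011,   min=2.20e+011,   max=3.70e+011
 class T_PC_single_pulse_kernel_FFA_update:   total=1.22e+013,   N=182040,   <>=6.70e+007,   min=2.15e+007,   max=6.12e+008
 class T_ffa:   total=2.13e+013,   N=1998,   <>=1.06e+010,   min=1.15e+009,   max=5.62e+010
 class T_DataWrite:   total=0.00e+000,   N=0,   <>=0.00e+000,   min=1.84e+019,   max=0.00e+000
"""

timings = parse_timings(SAMPLE)
main_total = timings["T_main_loop_L1"][0]
for name, (total, n, mean) in sorted(timings.items(), key=lambda kv: kv[1][0], reverse=True):
    # Share of T_main_loop_L1, assuming the indented classes are sub-stages of it.
    print(f"{name:40s} total={total:.2e}  N={n:<7d} {100 * total / main_total:5.1f}% of main loop")
```

Pasting the whole stderr into parse_timings instead of SAMPLE works the same way; comparing the T_ffa and T_PC_single_pulse_kernel_FFA_update shares between runs is one way to see what a change to ffa_block or unroll actually moved.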


Fredericx51

  • Knight o' The Round Table
  • Posts: 207
  • Knight Who Says Ni N!
Re: ATI OpenCL AstroPulse (rev516) released
« Reply #22 on: 24 Apr 2011, 09:37:26 am »
It's quiet in here, but I'm still trying different settings with unroll_data_chunk=16, ffa_block=10240 and ffa_block_fetch=2048 (5:1), which gives
an almost constant 48%-58% GPU load, also running 2 at a time on 2 EAH5870s. It's starting to look like a sweet spot, so I'll let these run, since I still have
AP WUs on this host.

Also almost no screen lag.


« Last Edit: 24 Apr 2011, 09:40:04 am by Fredericx51 »

Fredericx51

  • Knight o' The Round Table
  • Posts: 207
  • Knight Who Says Ni N!
Re: ATI OpenCL AstroPulse (rev516) released
« Reply #23 on: 24 Apr 2011, 06:59:35 pm »
Two validated AP WUs with rev.516, with a few changes to ffa_block and ffa_block_fetch, unroll=16:

Both ATI AP tasks.

ATI and stock app.

<core_client_version>6.10.60</core_client_version>
<![CDATA[
<stderr_txt>
Number of app instances per device setted to:2
DATA_CHUNK_UNROLL setted to:16
FFA thread block override value:4096
FFA thread fetchblock override value:2048
Running on device number: 1
Priority of worker thread raised successfully
Priority of process adjusted successfully, high priority class used
OpenCL platform detected: Advanced Micro Devices, Inc.
BOINC assigns 1 device, slots 2 to 3 (including) will be checked
Used slot is 3;   AstroPulse v. 5.06
Non-graphics   FFTW   USE_CONVERSION_OPT   
Windows x86 rev 516, 5.06 match, by Raistmer with support of Lunatics.kwsn.net team.   SSE2

OpenCL version by Raistmer

oclFFT fix for ATI GPUs by Urs Echternacht
ffa threshold mod, by Joe Segur.
static fftw lib, built by Jason G.
SSE3 dechirping by JDWhale

Build features: Non-graphics   OpenCL   COMBINED_DECHIRP_KERNEL   FFTW   USE_INCREASED_PRECISION   USE_SSE2   x86   
     CPUID:         Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz

     Cache: L1=64K L2=256K

CPU features: FPU TSC PAE CMPXCHG8B APIC SYSENTER MTRR CMOV/CCMP MMX FXSAVE/FXRSTOR SSE SSE2 HT SSE3
Number of OpenCL platforms:             1


 OpenCL Platform Name:                AMD Accelerated Parallel Processing
Number of devices:             2
  Max compute units:             20
  Max work group size:             256
  Max clock frequency:             890Mhz
  Max memory allocation:          134217728
  Cache type:                None
  Cache line size:             0
  Cache size:                0
  Global memory size:             536870912
  Constant buffer size:             65536
  Max number of constant args:          8
  Local memory type:             Scratchpad
  Local memory size:             32768
  Queue properties:            
    Out-of-Order:             No
  Name:                   Cypress
  Vendor:                Advanced Micro Devices, Inc.
  Driver version:             CAL 1.4.1332
  Version:                OpenCL 1.1 AMD-APP-SDK-v2.4 (595.10)
  Extensions:                cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_printf cl_amd_media_ops cl_amd_popcnt cl_khr_d3d10_sharing
  Max compute units:             20
  Max work group size:             256
  Max clock frequency:             890Mhz
  Max memory allocation:          134217728
  Cache type:                None
  Cache line size:             0
  Cache size:                0
  Global memory size:             536870912
  Constant buffer size:             65536
  Max number of constant args:          8
  Local memory type:             Scratchpad
  Local memory size:             32768
  Queue properties:            
    Out-of-Order:             No
  Name:                   Cypress
  Vendor:                Advanced Micro Devices, Inc.
  Driver version:             CAL 1.4.1332
  Version:                OpenCL 1.1 AMD-APP-SDK-v2.4 (595.10)
  Extensions:                cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_printf cl_amd_media_ops cl_amd_popcnt cl_khr_d3d10_sharing


Info : Building Program (clBuildProgram):main kernels: OK code 0

Number of app instances per device setted to:2
DATA_CHUNK_UNROLL setted to:16
FFA thread block override value:6144
FFA thread fetchblock override value:2048
Running on device number: 1
Priority of worker thread raised successfully
Priority of process adjusted successfully, high priority class used
OpenCL platform detected: Advanced Micro Devices, Inc.
BOINC assigns 1 device, slots 2 to 3 (including) will be checked
Used slot is 2;   ### Restart at 78.38 percent.
Info : Building Program (clBuildProgram):main kernels: OK code 0

Number of app instances per device setted to:2
DATA_CHUNK_UNROLL setted to:16
FFA thread block override value:5120
FFA thread fetchblock override value:1024
Running on device number: 1
Priority of worker thread raised successfully
Priority of process adjusted successfully, high priority class used
OpenCL platform detected: Advanced Micro Devices, Inc.
BOINC assigns 1 device, slots 2 to 3 (including) will be checked
Used slot is 3;   ### Restart at 78.38 percent.
Info : Building Program (clBuildProgram):main kernels: OK code 0

Number of app instances per device setted to:2
DATA_CHUNK_UNROLL setted to:16
FFA thread block override value:2048
FFA thread fetchblock override value:1024
Running on device number: 1
Priority of worker thread raised successfully
Priority of process adjusted successfully, high priority class used
OpenCL platform detected: Advanced Micro Devices, Inc.
BOINC assigns 1 device, slots 2 to 3 (including) will be checked
Used slot is 2;   ### Restart at 78.38 percent.
Info : Building Program (clBuildProgram):main kernels: OK code 0

Number of app instances per device setted to:2
DATA_CHUNK_UNROLL setted to:12
FFA thread block override value:5120
FFA thread fetchblock override value:1024
Running on device number: 1
Priority of worker thread raised successfully
Priority of process adjusted successfully, high priority class used
OpenCL platform detected: Advanced Micro Devices, Inc.
BOINC assigns 1 device, slots 2 to 3 (including) will be checked
Used slot is 2;   ### Restart at 78.38 percent.
Info : Building Program (clBuildProgram):main kernels: OK code 0

Number of app instances per device setted to:2
DATA_CHUNK_UNROLL setted to:15
FFA thread block override value:5120
FFA thread fetchblock override value:1024
Running on device number: 1
Priority of worker thread raised successfully
Priority of process adjusted successfully, high priority class used
OpenCL platform detected: Advanced Micro Devices, Inc.
BOINC assigns 1 device, slots 2 to 3 (including) will be checked
Used slot is 2;   ### Restart at 92.79 percent.
Info : Building Program (clBuildProgram):main kernels: OK code 0


    single pulses: 3
repetitive pulses: 1
  percent blanked: 8.89
class T_remove_radar:   total=4.45e+009,   N=1,   <>=4.45e+009,   min=4.45e+009,   max=4.45e+009
class T_main_loop_L1:   total=1.04e+012,   N=7,   <>=1.48e+011,   min=1.36e+011,   max=1.98e+011
 class T_FFT_forward:   total=9.72e+009,   N=7672,   <>=1.27e+006,   min=1.60e+004,   max=9.38e+009
 class T_remove_radar_randomize:   total=1.20e+011,   N=114632,   <>=1.05e+006,   min=3.59e+002,   max=1.31e+008
 class T_build_chirp_table:   total=0.00e+000,   N=0,   <>=0.00e+000,   min=1.84e+019,   max=0.00e+000
 class T_DataWrite:   total=4.60e+007,   N=840,   <>=5.47e+004,   min=1.96e+004,   max=2.54e+005
  class T_DataWrite_ns:   total=0,   N=0,   <>=0,   min=0   max=0
 class T_oclReadBuf:   total=2.91e+005,   N=7672,   <>=3.70e+001,   min=1.80e+001,   max=1.21e+003
   class T_ChirpWrite:   total=0.00e+000,   N=0,   <>=0.00e+000,   min=1.84e+019,   max=0.00e+000
    class T_ChirpWrite_ns:   total=0,   N=0,   <>=0,   min=0   max=0
 class T_dechirp:   total=2.81e+008,   N=7672,   <>=3.66e+004,   min=2.06e+004,   max=1.66e+006
  class Dechirp_ns:   total=0,   N=0,   <>=0,   min=0   max=0
  class Half_ns:   total=0,   N=0,   <>=0,   min=0   max=0
 class T_PC_single_pulse_kernel_FFA_update:   total=4.37e+011,   N=7672,   <>=5.69e+007,   min=3.07e+007,   max=1.30e+010
  class PC_ns:   total=0,   N=0,   <>=0,   min=0   max=0
class T_oclReadBuf:   total=2.91e+005,   N=7672,   <>=3.70e+001,   min=1.80e+001,   max=1.21e+003
class T_oclWriteBuf:   total=4.68e+007,   N=840,   <>=5.57e+004,   min=2.00e+004,   max=2.56e+005
  class T_FFT_inverse:   total=1.18e+008,   N=7672,   <>=1.54e+004,   min=1.05e+004,   max=4.21e+005
 class T_ffa:   total=4.49e+011,   N=126,   <>=3.56e+009,   min=1.34e+009,   max=2.01e+010
class T_GPU_buffer_read_backs:   total=1,   N=1,   <>=1,   min=1   max=1
USE_OPENCL   OPENCL_WRITE   USE_INCREASED_PRECISION   SMALL_CHIRP_TABLE   
rev 516
14:40:55 (3104): called boinc_finish

</stderr_txt>
]]>
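This stderr shows the app being restarted several times with different settings (ffa_block 4096, 6144, 5120 and 2048; unroll 16, 12 and 15), while the final timing block appears to cover only the part after the last restart (N=7 for T_main_loop_L1, against 111 in the complete run posted earlier). To see which settings were in effect for each leg, a short Python sketch can split the log at every "Number of app instances" line, assuming that line always opens a header block as it does here. parse_runs is a hypothetical helper, not part of the app; the block:fetch ratio it prints is the same kind of figure as the 5:1 quoted for ffa_block=10240 and ffa_block_fetch=2048 earlier in the thread.

```python
import re

# Each (re)start of the app prints a header block; these patterns copy the log's own wording.
SETTING_PATTERNS = {
    "instances": re.compile(r"Number of app instances per device setted to:(\d+)"),
    "unroll": re.compile(r"DATA_CHUNK_UNROLL setted to:(\d+)"),
    "ffa_block": re.compile(r"FFA thread block override value:(\d+)"),
    "ffa_block_fetch": re.compile(r"FFA thread fetchblock override value:(\d+)"),
}
RESTART_RE = re.compile(r"Restart at ([\d.]+) percent")


def parse_runs(stderr_text):
    """Split a stderr into one dict per (re)start, holding the settings that leg used."""
    runs = []
    for line in stderr_text.splitlines():
        for key, pattern in SETTING_PATTERNS.items():
            m = pattern.search(line)
            if not m:
                continue
            if key == "instances":
                # Assumption: the instances line is the first line of every header block.
                runs.append({"restart_at": None})
            if runs:
                runs[-1][key] = int(m.group(1))
        m = RESTART_RE.search(line)
        if m and runs:
            runs[-1]["restart_at"] = float(m.group(1))  # resumed from a checkpoint
    return runs


# The first two header blocks of the stderr above.
LOG = """\
Number of app instances per device setted to:2
DATA_CHUNK_UNROLL setted to:16
FFA thread block override value:4096
FFA thread fetchblock override value:2048
Used slot is 3;   AstroPulse v. 5.06
Number of app instances per device setted to:2
DATA_CHUNK_UNROLL setted to:16
FFA thread block override value:6144
FFA thread fetchblock override value:2048
Used slot is 2;   ### Restart at 78.38 percent.
"""

for run in parse_runs(LOG):
    ratio = run["ffa_block"] / run["ffa_block_fetch"]
    start = "no restart line" if run["restart_at"] is None else f"restart at {run['restart_at']}%"
    print(f"unroll={run['unroll']} ffa_block={run['ffa_block']} "
          f"ffa_block_fetch={run['ffa_block_fetch']} ({ratio:g}:1), {start}")
```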

I'll start MB (rev.177), too. Is it possible to run AP and MB on the GPU at the same time?


« Last Edit: 24 Apr 2011, 07:07:33 pm by Fredericx51 »

Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • Posts: 14349
Re: ATI OpenCL AstroPulse (rev516) released
« Reply #24 on: 25 Apr 2011, 03:31:16 am »

Quote from: Fredericx51 on 24 Apr 2011, 06:59:35 pm
I'll start MB (rev.177), too. Is it possible to run AP and MB on the GPU at the same time?

If both are configured appropriately (for a 2-instance run), it should be possible.

Fredericx51

  • Knight o' The Round Table
  • Posts: 207
  • Knight Who Says Ni N!
Re: ATI OpenCL AstroPulse (rev516) released
« Reply #25 on: 10 May 2011, 04:47:52 pm »
I just downloaded from your Russian site, or at least tried to, like the previous time (rev.516), but got rev.521 and installed it.
Since I can't use an AC, over the last few days (Friday, Saturday, Sunday and today, Tuesday) temps were 25C to 31C and I had to shut down
all but 1 rig (X9650 @ 3.51GHz + 1x GTX480), which runs without a case and has no heat problems. (9 out of 10 computer cases
aren't up for this job: 1, 2 or more GPUs produce so much heat that they should have their own separate enclosure, inside or outside the case!)

Got them up and running now. It turned out I had some MW WUs (deadline 1 to 2 days); after those I can try your latest rev.521 for AP
work.

I saw 2 AP WUs running on 1 HD5870, and it looked like they'd crashed!

Is it better to try 1 at a time, with similar cmd line options to those used with rev.516?




« Last Edit: 10 May 2011, 06:20:41 pm by Fredericx51 »

Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • Posts: 14349
Re: ATI OpenCL AstroPulse (rev516) released
« Reply #26 on: 10 May 2011, 11:27:31 pm »

Quote from: Fredericx51 on 10 May 2011, 04:47:52 pm
Is it better to try 1 at a time, with similar cmd line options to those used with rev.516?


Yes, the options should be the same.

 
