Seti@Home optimized science apps and information

Optimized Seti@Home apps => Windows => GPU crunching => Topic started by: Raistmer on 18 Mar 2011, 07:18:25 am

Title: ATI OpenCL AstroPulse (rev516) released
Post by: Raistmer on 18 Mar 2011, 07:18:25 am
This is an updated ATI OpenCL build, intended to replace the rev456 OpenCL build and the rev449 OpenCL+Brook build.
Hardware requirements are the same as for the rev456 OpenCL build (see the old release notes here: http://lunatics.kwsn.net/12-gpu-crunching/astropulse-for-ati-gpus-released.msg31201.html#msg31201 )

New command line switches:
-hp - sets high priority class
-no_cpu_lock - disables affinity setting
-instances_per_device N - allows running N copies on each supported GPU device (don't forget to set the <count> field in app_info to 1/N to instruct BOINC to launch N tasks per GPU).

-unroll N - sets the DATA_CHUNK_UNROLL variable to N. This processes N data chunks per FindSinglePulse kernel call, which improves performance in most cases but increases GPU memory requirements. On low-end GPUs it may be worth using lower values. The default is 10, as in r456 (where it was hardwired to 10).
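The relation between -instances_per_device N and the <count> field can be sketched in a few lines of Python. This is only an illustration: the helper names coproc_count and build_cmdline are made up here, and rounding the fraction down to four decimals is an assumption, chosen so that N copies of the value never add up to more than one GPU.

```python
def coproc_count(instances_per_device: int) -> str:
    """Return a <count> value of roughly 1/N, rounded DOWN to 4 decimals.

    Rounding down is an assumption of this sketch: it keeps N * count <= 1,
    so N tasks still fit on one GPU.
    """
    if instances_per_device < 1:
        raise ValueError("instances_per_device must be >= 1")
    units = 10000 // instances_per_device      # integer floor of 10000 / N
    text = f"{units / 10000:.4f}"              # e.g. "0.5000"
    return text.rstrip("0").rstrip(".")        # e.g. "0.5", or "1" for N = 1


def build_cmdline(instances_per_device: int, unroll: int = 10,
                  high_priority: bool = False) -> str:
    """Assemble a <cmdline> string from the switches described in the post."""
    parts = ["-instances_per_device", str(instances_per_device)]
    if high_priority:
        parts.append("-hp")
    parts += ["-unroll", str(unroll)]
    return " ".join(parts)


if __name__ == "__main__":
    n = 2
    print(f"<cmdline>{build_cmdline(n, unroll=10, high_priority=True)}</cmdline>")
    print(f"<count>{coproc_count(n)}</count>")
```

Both values have to be changed together: the switch tells the app how many copies to expect per device, while <count> is what makes BOINC actually start that many tasks.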


Known issues:
Don't forget to finish the current AP task before upgrading. Otherwise you will need to manually update the CL file not only in the SETI project directory but in the corresponding slot directory too. BOINC doesn't do this; a design flaw, IMHO.

If you have a Mobility GPU, double-check that your configuration (GPU + driver) has OpenCL support. Ask ATI for OpenCL support if not.

app_info section for this app:

Code:
<app>
    <name>astropulse_v505</name>
</app>
<file_info>
    <name>ap_5.06_win_x86_SSE2_OpenCL_ATI_r516.exe</name>
    <executable/>
</file_info>
<file_info>
    <name>AstroPulse_Kernels.cl</name>
    <executable/>
</file_info>
<app_version>
    <app_name>astropulse_v505</app_name>
    <version_num>506</version_num>
    <platform>windows_intelx86</platform>
    <avg_ncpus>0.04</avg_ncpus>
    <max_ncpus>0.20</max_ncpus>
    <plan_class>ati13ati</plan_class>
    <cmdline>-instances_per_device 1 -hp -unroll 10 -ffa_block 4096 -ffa_block_fetch 2048</cmdline>
    <flops>30987654321</flops>
    <file_ref>
        <file_name>ap_5.06_win_x86_SSE2_OpenCL_ATI_r516.exe</file_name>
        <main_program/>
    </file_ref>
    <file_ref>
        <file_name>AstroPulse_Kernels.cl</file_name>
        <copy_file/>
    </file_ref>
    <coproc>
        <type>ATI</type>
        <count>1</count>
    </coproc>
</app_version>
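
Before restarting BOINC with an edited section like the one above, it can help to check that it is well-formed and that every file named in a <file_ref> also has a <file_info> entry. A rough Python sketch follows; the function name missing_file_infos and the shortened SAMPLE fragment are made up for illustration, and wrapping the section in an <app_info> root is only done here so the parser accepts it.

```python
import xml.etree.ElementTree as ET


def missing_file_infos(fragment: str) -> list:
    """Return file names used in <file_ref> that have no matching <file_info>.

    The fragment is wrapped in a single root element so that it parses;
    ET.fromstring raises ParseError if the XML is not well-formed.
    """
    root = ET.fromstring(f"<app_info>{fragment}</app_info>")
    declared = {info.findtext("name") for info in root.iter("file_info")}
    referenced = [ref.findtext("file_name") for ref in root.iter("file_ref")]
    return [name for name in referenced if name not in declared]


# Shortened, illustrative version of the section from the post.
SAMPLE = """
<file_info>
    <name>ap_5.06_win_x86_SSE2_OpenCL_ATI_r516.exe</name>
    <executable/>
</file_info>
<file_info>
    <name>AstroPulse_Kernels.cl</name>
</file_info>
<app_version>
    <app_name>astropulse_v505</app_name>
    <file_ref>
        <file_name>ap_5.06_win_x86_SSE2_OpenCL_ATI_r516.exe</file_name>
        <main_program/>
    </file_ref>
    <file_ref>
        <file_name>AstroPulse_Kernels.cl</file_name>
        <copy_file/>
    </file_ref>
</app_version>
"""

if __name__ == "__main__":
    print("missing <file_info> entries:", missing_file_infos(SAMPLE))
```

An empty list means every referenced file is declared; a typo in either file name would show up in the list.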

Title: Re: ATI OpenCL AstroPulse (rev516)
Post by: benool on 18 Mar 2011, 09:31:55 am
I'll be sure to test and report back.

Waiting on current AP to finish.
Title: Re: ATI OpenCL AstroPulse (rev516)
Post by: Ghost0210 on 18 Mar 2011, 03:16:04 pm
Hi Raistmer, only a quick question - I thought in Beta this was version_num 506 in your app_info? The release version_num in this thread is 505?
Makes no difference, but thought I'd ask ;)
Title: Re: ATI OpenCL AstroPulse (rev516)
Post by: Raistmer on 18 Mar 2011, 03:19:27 pm
It doesn't matter; I'll check what was in the previous release and correct it if it was 5.06 there.
[EDIT: changed to 506]
Title: Re: ATI OpenCL AstroPulse (rev516)
Post by: Ghost0210 on 18 Mar 2011, 03:20:58 pm
Didn't think it mattered, was just wondering if there was a reason?
(And it broke my task monitoring app  :o)
Title: Re: ATI OpenCL AstroPulse (rev516)
Post by: Raistmer on 18 Mar 2011, 03:31:47 pm
No reason besides some inaccuracy :) I took the app_info section straight from my production PC, and there I don't care much about meaningless numbers :)

Also, I think the flops section can/should (?) be omitted now too? Any opinions on flops?
Title: Re: ATI OpenCL AstroPulse (rev516)
Post by: Ghost0210 on 18 Mar 2011, 03:37:22 pm
I've only got them in because I'm currently running 6.12.18 and the v7 app @ Beta, and didn't want that app screwing up the times for my other tasks there.
If it weren't for that I wouldn't bother anymore either.

Edit: another quick question - with the app_info, the plan_class doesn't get used either, does it? (I can't use the standard plan_class, as I need a way to differentiate between Seti & Beta tasks.)
Title: Re: ATI OpenCL AstroPulse (rev516)
Post by: Raistmer on 18 Mar 2011, 04:22:49 pm
What do you want to use instead of ati13ati then?
Title: Re: ATI OpenCL AstroPulse (rev516)
Post by: Ghost0210 on 18 Mar 2011, 04:41:22 pm
I use ati13ati for Seti tasks and ati13ati_beta for beta tasks (and yes, I've also given cpu tasks a plan_class @ Beta as well)

I got fed up with BOINC Manager and wrote a small app that gives me practically the same information with half the overhead of BM.
The only reason I ever open BM now is to start and stop processing and to do network comms.
The problem is that Beta and Seti Main tasks are identical in client_state, so to separate them in the app and get accurate info I had to change either the version_num or the plan_class value. I chose to change the plan_class.

As far as I can tell it's made no difference to work fetch and/or scheduling as I think this is all done from the coproc value
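
For anyone wanting to do something similar, here is a rough Python sketch of splitting tasks by plan_class. It is not the monitoring app mentioned above. The layout of client_state.xml assumed here (<result> elements carrying <name> and <plan_class>) is an assumption of this sketch, and the task names in SAMPLE are invented.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Invented miniature stand-in for client_state.xml; the element layout
# is an assumption of this sketch, not taken from the thread.
SAMPLE = """<client_state>
  <result><name>ap_task_main_1</name><plan_class>ati13ati</plan_class></result>
  <result><name>ap_task_beta_1</name><plan_class>ati13ati_beta</plan_class></result>
  <result><name>ap_task_main_2</name><plan_class>ati13ati</plan_class></result>
</client_state>"""


def group_by_plan_class(xml_text: str) -> dict:
    """Map each plan_class value to the list of task names that use it."""
    root = ET.fromstring(xml_text)
    groups = defaultdict(list)
    for result in root.iter("result"):
        plan = result.findtext("plan_class", default="")
        groups[plan].append(result.findtext("name", default=""))
    return dict(groups)


if __name__ == "__main__":
    for plan, names in group_by_plan_class(SAMPLE).items():
        print(plan or "(no plan_class)", len(names), names)
```

With distinct plan_class values such as ati13ati and ati13ati_beta, the main and Beta tasks end up in separate groups even though everything else about them looks the same.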
Title: Re: ATI OpenCL AstroPulse (rev516)
Post by: benool on 18 Mar 2011, 05:13:03 pm
Seems to work great on my ATI 4550.  :-*

I had to set -unroll 5 instead of 10; otherwise I get computation errors.
Title: Re: ATI OpenCL AstroPulse (rev516)
Post by: Raistmer on 18 Mar 2011, 05:36:57 pm
Link to host?
Title: Re: ATI OpenCL AstroPulse (rev516)
Post by: benool on 19 Mar 2011, 02:32:29 am
http://setiathome.berkeley.edu/results.php?hostid=4876884
Title: Re: ATI OpenCL AstroPulse (rev516)
Post by: Raistmer on 19 Mar 2011, 05:31:34 am
If you want to debug this problem, you could install the APP SDK from AMD and check kernel execution times under the profiler. It's possible that they just take too long on this GPU (only 2 compute devices, 128 threads instead of 256, lower frequency than on the HD4870). The first reported error appears inside the FFT call.
Title: Re: ATI OpenCL AstroPulse (rev516)
Post by: benool on 19 Mar 2011, 08:43:22 am
okay, I got the SDK installed and figured out how to launch the app with the profiler.

Is there any particular option I should add to sprofile or are the defaults what you are looking for?

I'll do a pass with unroll at 10 and another with 5.
Title: Re: ATI OpenCL AstroPulse (rev516)
Post by: Raistmer on 19 Mar 2011, 08:53:08 am
The defaults will be OK.
Title: Re: ATI OpenCL AstroPulse (rev516)
Post by: benool on 19 Mar 2011, 11:27:26 am
here you go.

3 CSV files with defaults from sprofile:

"ATI4550_unroll_5.csv" is from about an hour of runtime.

"ATI4550_unroll_10.csv" and "ATI4550_unroll_10_2ndrun.csv" are from 2 attempts to run with unroll 10. I included both because they seem quite different. The application terminates in both cases (I erase all ap_state, fold.dat, pulse.out etc. between runs).
Title: Re: ATI OpenCL AstroPulse (rev516) released
Post by: Fredericx51 on 07 Apr 2011, 05:34:29 pm
Gonna give it another try: new host, i7-2600, 2x EAH5870, Win7 64-bit Pro, BOINC 6.10.60 64-bit.
What unroll value is OK to try on these cards?
Title: Re: ATI OpenCL AstroPulse (rev516) released
Post by: skildude on 07 Apr 2011, 05:53:21 pm
My cards seem to like unroll at 10, but you'll need to adjust yours to your own liking. 10 is a good starting place.
Title: Re: ATI OpenCL AstroPulse (rev516) released
Post by: Fredericx51 on 20 Apr 2011, 05:49:15 am
Hello, I started testing the ATI AP app rev.516 on this rig. (http://setiathome.berkeley.edu/show_host_detail.php?hostid=5884985)

I tried some different unroll values, like 12, 13 and 14, but even at 12 the screen lag becomes too heavy, so I put it back to 10.
But I did double ffa_block & ffa_block_fetch and now run 2 per 5870 (2 cards), plus 8 MB WUs using the SSSE3x flavor. Memory use is quite high
(and temps are quite high for 4x 2 GB DDR3 @ 1333 MHz), but the computer is stable and can (& will) be used for other things, except playing MPEG-2 or
a game  ;) .

If the AP WUs validate, I can start using the MB app (rev.177). Trying to learn the ins & outs of OpenCL... :o
Title: Re: ATI OpenCL AstroPulse (rev516) released
Post by: Fredericx51 on 20 Apr 2011, 05:48:56 pm
Couldn't help playing with ffa_block_fetch and unroll while running WUs,
but the first 5 AP WUs with rev 516 have validated,
the last of the 5 AP WUs, (http://setiathome.berkeley.edu/workunit.php?wuid=728581935)
well, no harm done. ::)

B.t.w. I still had some 100 Collatz C. WUs, with deadlines from 10 minutes to 2 to 3 days, so it runs a few at night, when it's cooler  ;)
But the GPUs are almost trashed by the C.C. load, fans at max, temps at max. I have this one in my bedroom, and when I go to sleep
it is quite noisy under such TREATMENT. It's not good for the average life span and safe use of the
host, because it gets really hot.

Also have some MW, but is it still active?
Back on topic, though.
Title: Re: ATI OpenCL AstroPulse (rev516) released
Post by: Fredericx51 on 21 Apr 2011, 11:08:05 am
Some more validations with rev.516. GPU use is almost 100%, CPU use depends heavily on blanking %. Here are the latest
results: this one. (http://setiathome.berkeley.edu/workunit.php?wuid=728375448)

On this host. (http://setiathome.berkeley.edu/show_host_detail.php?hostid=5884985)

Another one. (http://setiathome.berkeley.edu/workunit.php?wuid=728356762)



Title: Re: ATI OpenCL AstroPulse (rev516) released
Post by: Fredericx51 on 22 Apr 2011, 02:03:09 pm
Last validated AP WU. (http://setiathome.berkeley.edu/workunit.php?wuid=728421332)

<core_client_version>6.10.60</core_client_version>
<![CDATA[
<stderr_txt>
Number of app instances per device setted to:2
DATA_CHUNK_UNROLL setted to:10
FFA thread block override value:8192
FFA thread fetchblock override value:4096
Running on device number: 1
Priority of worker thread raised successfully
Priority of process adjusted successfully, high priority class used
OpenCL platform detected: Advanced Micro Devices, Inc.
BOINC assigns 1 device, slots 2 to 3 (including) will be checked
Used slot is 3;   AstroPulse v. 5.06
Non-graphics   FFTW   USE_CONVERSION_OPT   
Windows x86 rev 516, 5.06 match, by Raistmer with support of Lunatics.kwsn.net team.   SSE2

OpenCL version by Raistmer

oclFFT fix for ATI GPUs by Urs Echternacht
ffa threshold mod, by Joe Segur.
static fftw lib, built by Jason G.
SSE3 dechirping by JDWhale

Build features: Non-graphics   OpenCL   COMBINED_DECHIRP_KERNEL   FFTW   USE_INCREASED_PRECISION   USE_SSE2   x86   
     CPUID:         Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz

     Cache: L1=64K L2=256K

CPU features: FPU TSC PAE CMPXCHG8B APIC SYSENTER MTRR CMOV/CCMP MMX FXSAVE/FXRSTOR SSE SSE2 HT SSE3
Number of OpenCL platforms:             1


 OpenCL Platform Name:                AMD Accelerated Parallel Processing
Number of devices:             2
  Max compute units:             20
  Max work group size:             256
  Max clock frequency:             875Mhz
  Max memory allocation:          134217728
  Cache type:                None
  Cache line size:             0
  Cache size:                0
  Global memory size:             536870912
  Constant buffer size:             65536
  Max number of constant args:          8
  Local memory type:             Scratchpad
  Local memory size:             32768
  Queue properties:            
    Out-of-Order:             No
  Name:                   Cypress
  Vendor:                Advanced Micro Devices, Inc.
  Driver version:             CAL 1.4.1332
  Version:                OpenCL 1.1 AMD-APP-SDK-v2.4 (595.10)
  Extensions:                cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_printf cl_amd_media_ops cl_amd_popcnt cl_khr_d3d10_sharing
  Max compute units:             20
  Max work group size:             256
  Max clock frequency:             875Mhz
  Max memory allocation:          134217728
  Cache type:                None
  Cache line size:             0
  Cache size:                0
  Global memory size:             536870912
  Constant buffer size:             65536
  Max number of constant args:          8
  Local memory type:             Scratchpad
  Local memory size:             32768
  Queue properties:            
    Out-of-Order:             No
  Name:                   Cypress
  Vendor:                Advanced Micro Devices, Inc.
  Driver version:             CAL 1.4.1332
  Version:                OpenCL 1.1 AMD-APP-SDK-v2.4 (595.10)
  Extensions:                cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_printf cl_amd_media_ops cl_amd_popcnt cl_khr_d3d10_sharing


Info : Building Program (clBuildProgram):main kernels: OK code 0


    single pulses: 1
repetitive pulses: 0
  percent blanked: 0.00
class T_remove_radar:   total=3.71e+009,   N=1,   <>=3.71e+009,   min=3.71e+009,   max=3.71e+009
class T_main_loop_L1:   total=3.35e+013,   N=111,   <>=3.02e+011,   min=2.20e+011,   max=3.70e+011
 class T_FFT_forward:   total=8.62e+009,   N=182040,   <>=4.73e+004,   min=1.16e+004,   max=3.20e+008
 class T_remove_radar_randomize:   total=2.20e+009,   N=1817736,   <>=1.21e+003,   min=3.50e+002,   max=1.22e+008
 class T_build_chirp_table:   total=0.00e+000,   N=0,   <>=0.00e+000,   min=1.84e+019,   max=0.00e+000
 class T_DataWrite:   total=0.00e+000,   N=0,   <>=0.00e+000,   min=1.84e+019,   max=0.00e+000
  class T_DataWrite_ns:   total=0,   N=0,   <>=0,   min=0   max=0
 class T_oclReadBuf:   total=6.70e+006,   N=182040,   <>=3.60e+001,   min=1.80e+001,   max=2.11e+003
   class T_ChirpWrite:   total=0.00e+000,   N=0,   <>=0.00e+000,   min=1.84e+019,   max=0.00e+000
    class T_ChirpWrite_ns:   total=0,   N=0,   <>=0,   min=0   max=0
 class T_dechirp:   total=7.42e+009,   N=182040,   <>=4.07e+004,   min=1.60e+004,   max=1.21e+008
  class Dechirp_ns:   total=0,   N=0,   <>=0,   min=0   max=0
  class Half_ns:   total=0,   N=0,   <>=0,   min=0   max=0
 class T_PC_single_pulse_kernel_FFA_update:   total=1.22e+013,   N=182040,   <>=6.70e+007,   min=2.15e+007,   max=6.12e+008
  class PC_ns:   total=0,   N=0,   <>=0,   min=0   max=0
class T_oclReadBuf:   total=6.70e+006,   N=182040,   <>=3.60e+001,   min=1.80e+001,   max=2.11e+003
class T_oclWriteBuf:   total=0.00e+000,   N=0,   <>=0.00e+000,   min=1.84e+019,   max=0.00e+000
  class T_FFT_inverse:   total=3.22e+009,   N=182040,   <>=1.77e+004,   min=9.08e+003,   max=1.21e+008
 class T_ffa:   total=2.13e+013,   N=1998,   <>=1.06e+010,   min=1.15e+009,   max=5.62e+010
class T_GPU_buffer_read_backs:   total=2,   N=2,   <>=1,   min=1   max=1
USE_OPENCL   OPENCL_WRITE   USE_INCREASED_PRECISION   SMALL_CHIRP_TABLE   
rev 516
19:25:24 (3200): called boinc_finish

</stderr_txt>
]]>
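
The "class T_...:" counter lines in a stderr dump like the one above are easy to pull apart with a short script when comparing runs. The following Python sketch is illustrative only: the TimingCounter type and parse_counters function are made-up names, the regular expression is written against the lines as they appear in this post, and the unit of the counters is not stated in the output, so only ratios are computed. Counters with N=0 report min=1.84e+019, which is close to 2^64 and is presumably just an untouched initial value.

```python
import re
from typing import Dict, NamedTuple


class TimingCounter(NamedTuple):
    total: float
    n: int
    mean: float   # the "<>" field in the log
    min: float
    max: float


# Matches e.g. "class T_ffa:   total=2.13e+013,   N=1998,   <>=1.06e+010, ..."
# The comma after min= is optional because the "_ns" lines omit it.
_LINE = re.compile(
    r"class\s+(\w+):\s+total=([^,\s]+),\s+N=(\d+),\s+<>=([^,\s]+),"
    r"\s+min=([^,\s]+),?\s+max=(\S+)"
)


def parse_counters(stderr_text: str) -> Dict[str, TimingCounter]:
    """Collect the timing counters from a stderr dump (last occurrence wins)."""
    counters = {}
    for match in _LINE.finditer(stderr_text):
        name, total, n, mean, low, high = match.groups()
        counters[name] = TimingCounter(
            float(total), int(n), float(mean), float(low), float(high)
        )
    return counters


# Four lines copied from the stderr output above.
SAMPLE = """\
class T_main_loop_L1:   total=3.35e+013,   N=111,   <>=3.02e+011,   min=2.20e+011,   max=3.70e+011
 class T_PC_single_pulse_kernel_FFA_update:   total=1.22e+013,   N=182040,   <>=6.70e+007,   min=2.15e+007,   max=6.12e+008
  class PC_ns:   total=0,   N=0,   <>=0,   min=0   max=0
 class T_ffa:   total=2.13e+013,   N=1998,   <>=1.06e+010,   min=1.15e+009,   max=5.62e+010
"""

if __name__ == "__main__":
    counters = parse_counters(SAMPLE)
    main_total = counters["T_main_loop_L1"].total
    for name in ("T_PC_single_pulse_kernel_FFA_update", "T_ffa"):
        share = counters[name].total / main_total
        print(f"{name}: {share:.1%} of T_main_loop_L1")
```

Comparing these shares between runs with different -unroll or -ffa_block values gives a quick idea of which part of the work a setting actually affects.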

Title: Re: ATI OpenCL AstroPulse (rev516) released
Post by: Fredericx51 on 24 Apr 2011, 09:37:26 am
It's quiet in here, but I'm still trying different settings: unroll_data_chunk=16, ffa_block=10240 and ffa_block_fetch=2048 (5:1), which gives
an almost constant 48%-58% GPU load, also doing 2 at a time on 2 EAH5870s. It's starting to look like a sweet spot, so I'll let these run, since I still have
AP WUs on this host. (http://setiathome.berkeley.edu/show_host_detail.php?hostid=5884985)

Also almost no screen lag.


Title: Re: ATI OpenCL AstroPulse (rev516) released
Post by: Fredericx51 on 24 Apr 2011, 06:59:35 pm
Two validated AP WU's with rev.516 and a few changes in ffa_block & ffa_block_fetch, unroll=16,

 Both ATI AP tasks. (http://setiathome.berkeley.edu/workunit.php?wuid=730487052)

ATI and stock app.. (http://setiathome.berkeley.edu/workunit.php?wuid=730485662)

<core_client_version>6.10.60</core_client_version>
<![CDATA[
<stderr_txt>
Number of app instances per device setted to:2
DATA_CHUNK_UNROLL setted to:16
FFA thread block override value:4096
FFA thread fetchblock override value:2048
Running on device number: 1
Priority of worker thread raised successfully
Priority of process adjusted successfully, high priority class used
OpenCL platform detected: Advanced Micro Devices, Inc.
BOINC assigns 1 device, slots 2 to 3 (including) will be checked
Used slot is 3;   AstroPulse v. 5.06
Non-graphics   FFTW   USE_CONVERSION_OPT   
Windows x86 rev 516, 5.06 match, by Raistmer with support of Lunatics.kwsn.net team.   SSE2

OpenCL version by Raistmer

oclFFT fix for ATI GPUs by Urs Echternacht
ffa threshold mod, by Joe Segur.
static fftw lib, built by Jason G.
SSE3 dechirping by JDWhale

Build features: Non-graphics   OpenCL   COMBINED_DECHIRP_KERNEL   FFTW   USE_INCREASED_PRECISION   USE_SSE2   x86   
     CPUID:         Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz

     Cache: L1=64K L2=256K

CPU features: FPU TSC PAE CMPXCHG8B APIC SYSENTER MTRR CMOV/CCMP MMX FXSAVE/FXRSTOR SSE SSE2 HT SSE3
Number of OpenCL platforms:             1


 OpenCL Platform Name:                AMD Accelerated Parallel Processing
Number of devices:             2
  Max compute units:             20
  Max work group size:             256
  Max clock frequency:             890Mhz
  Max memory allocation:          134217728
  Cache type:                None
  Cache line size:             0
  Cache size:                0
  Global memory size:             536870912
  Constant buffer size:             65536
  Max number of constant args:          8
  Local memory type:             Scratchpad
  Local memory size:             32768
  Queue properties:            
    Out-of-Order:             No
  Name:                   Cypress
  Vendor:                Advanced Micro Devices, Inc.
  Driver version:             CAL 1.4.1332
  Version:                OpenCL 1.1 AMD-APP-SDK-v2.4 (595.10)
  Extensions:                cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_printf cl_amd_media_ops cl_amd_popcnt cl_khr_d3d10_sharing
  Max compute units:             20
  Max work group size:             256
  Max clock frequency:             890Mhz
  Max memory allocation:          134217728
  Cache type:                None
  Cache line size:             0
  Cache size:                0
  Global memory size:             536870912
  Constant buffer size:             65536
  Max number of constant args:          8
  Local memory type:             Scratchpad
  Local memory size:             32768
  Queue properties:            
    Out-of-Order:             No
  Name:                   Cypress
  Vendor:                Advanced Micro Devices, Inc.
  Driver version:             CAL 1.4.1332
  Version:                OpenCL 1.1 AMD-APP-SDK-v2.4 (595.10)
  Extensions:                cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_printf cl_amd_media_ops cl_amd_popcnt cl_khr_d3d10_sharing


Info : Building Program (clBuildProgram):main kernels: OK code 0

Number of app instances per device setted to:2
DATA_CHUNK_UNROLL setted to:16
FFA thread block override value:6144
FFA thread fetchblock override value:2048
Running on device number: 1
Priority of worker thread raised successfully
Priority of process adjusted successfully, high priority class used
OpenCL platform detected: Advanced Micro Devices, Inc.
BOINC assigns 1 device, slots 2 to 3 (including) will be checked
Used slot is 2;   ### Restart at 78.38 percent.
Info : Building Program (clBuildProgram):main kernels: OK code 0

Number of app instances per device setted to:2
DATA_CHUNK_UNROLL setted to:16
FFA thread block override value:5120
FFA thread fetchblock override value:1024
Running on device number: 1
Priority of worker thread raised successfully
Priority of process adjusted successfully, high priority class used
OpenCL platform detected: Advanced Micro Devices, Inc.
BOINC assigns 1 device, slots 2 to 3 (including) will be checked
Used slot is 3;   ### Restart at 78.38 percent.
Info : Building Program (clBuildProgram):main kernels: OK code 0

Number of app instances per device setted to:2
DATA_CHUNK_UNROLL setted to:16
FFA thread block override value:2048
FFA thread fetchblock override value:1024
Running on device number: 1
Priority of worker thread raised successfully
Priority of process adjusted successfully, high priority class used
OpenCL platform detected: Advanced Micro Devices, Inc.
BOINC assigns 1 device, slots 2 to 3 (including) will be checked
Used slot is 2;   ### Restart at 78.38 percent.
Info : Building Program (clBuildProgram):main kernels: OK code 0

Number of app instances per device setted to:2
DATA_CHUNK_UNROLL setted to:12
FFA thread block override value:5120
FFA thread fetchblock override value:1024
Running on device number: 1
Priority of worker thread raised successfully
Priority of process adjusted successfully, high priority class used
OpenCL platform detected: Advanced Micro Devices, Inc.
BOINC assigns 1 device, slots 2 to 3 (including) will be checked
Used slot is 2;   ### Restart at 78.38 percent.
Info : Building Program (clBuildProgram):main kernels: OK code 0

Number of app instances per device setted to:2
DATA_CHUNK_UNROLL setted to:15
FFA thread block override value:5120
FFA thread fetchblock override value:1024
Running on device number: 1
Priority of worker thread raised successfully
Priority of process adjusted successfully, high priority class used
OpenCL platform detected: Advanced Micro Devices, Inc.
BOINC assigns 1 device, slots 2 to 3 (including) will be checked
Used slot is 2;   ### Restart at 92.79 percent.
Info : Building Program (clBuildProgram):main kernels: OK code 0


    single pulses: 3
repetitive pulses: 1
  percent blanked: 8.89
class T_remove_radar:   total=4.45e+009,   N=1,   <>=4.45e+009,   min=4.45e+009,   max=4.45e+009
class T_main_loop_L1:   total=1.04e+012,   N=7,   <>=1.48e+011,   min=1.36e+011,   max=1.98e+011
 class T_FFT_forward:   total=9.72e+009,   N=7672,   <>=1.27e+006,   min=1.60e+004,   max=9.38e+009
 class T_remove_radar_randomize:   total=1.20e+011,   N=114632,   <>=1.05e+006,   min=3.59e+002,   max=1.31e+008
 class T_build_chirp_table:   total=0.00e+000,   N=0,   <>=0.00e+000,   min=1.84e+019,   max=0.00e+000
 class T_DataWrite:   total=4.60e+007,   N=840,   <>=5.47e+004,   min=1.96e+004,   max=2.54e+005
  class T_DataWrite_ns:   total=0,   N=0,   <>=0,   min=0   max=0
 class T_oclReadBuf:   total=2.91e+005,   N=7672,   <>=3.70e+001,   min=1.80e+001,   max=1.21e+003
   class T_ChirpWrite:   total=0.00e+000,   N=0,   <>=0.00e+000,   min=1.84e+019,   max=0.00e+000
    class T_ChirpWrite_ns:   total=0,   N=0,   <>=0,   min=0   max=0
 class T_dechirp:   total=2.81e+008,   N=7672,   <>=3.66e+004,   min=2.06e+004,   max=1.66e+006
  class Dechirp_ns:   total=0,   N=0,   <>=0,   min=0   max=0
  class Half_ns:   total=0,   N=0,   <>=0,   min=0   max=0
 class T_PC_single_pulse_kernel_FFA_update:   total=4.37e+011,   N=7672,   <>=5.69e+007,   min=3.07e+007,   max=1.30e+010
  class PC_ns:   total=0,   N=0,   <>=0,   min=0   max=0
class T_oclReadBuf:   total=2.91e+005,   N=7672,   <>=3.70e+001,   min=1.80e+001,   max=1.21e+003
class T_oclWriteBuf:   total=4.68e+007,   N=840,   <>=5.57e+004,   min=2.00e+004,   max=2.56e+005
  class T_FFT_inverse:   total=1.18e+008,   N=7672,   <>=1.54e+004,   min=1.05e+004,   max=4.21e+005
 class T_ffa:   total=4.49e+011,   N=126,   <>=3.56e+009,   min=1.34e+009,   max=2.01e+010
class T_GPU_buffer_read_backs:   total=1,   N=1,   <>=1,   min=1   max=1
USE_OPENCL   OPENCL_WRITE   USE_INCREASED_PRECISION   SMALL_CHIRP_TABLE   
rev 516
14:40:55 (3104): called boinc_finish

</stderr_txt>
]]>
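
Because this task was restarted several times with different switches, its stderr contains one settings header per start. A small Python sketch can list that history. It is only an illustration: settings_history is a made-up name, and the pattern relies on the exact header wording seen in this log (including "setted to:").

```python
import re

# One match per application start; the wording is copied from the log above.
_RUN = re.compile(
    r"DATA_CHUNK_UNROLL setted to:(\d+)\s+"
    r"FFA thread block override value:(\d+)\s+"
    r"FFA thread fetchblock override value:(\d+)"
)


def settings_history(stderr_text: str) -> list:
    """Return (unroll, ffa_block, ffa_block_fetch) for each start, in order."""
    return [tuple(int(value) for value in match.groups())
            for match in _RUN.finditer(stderr_text)]


# Two of the headers from the stderr output above.
SAMPLE = """\
Number of app instances per device setted to:2
DATA_CHUNK_UNROLL setted to:16
FFA thread block override value:4096
FFA thread fetchblock override value:2048
Running on device number: 1
Number of app instances per device setted to:2
DATA_CHUNK_UNROLL setted to:12
FFA thread block override value:5120
FFA thread fetchblock override value:1024
Running on device number: 1
"""

if __name__ == "__main__":
    for start, (unroll, block, fetch) in enumerate(settings_history(SAMPLE), 1):
        print(f"start {start}: unroll={unroll} ffa_block={block} "
              f"ffa_block_fetch={fetch}")
```

Note that the timing counters at the end of such a log cover only the final segment after the last restart, so they cannot be attributed to any one of the earlier settings.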

I'll start MB (rev.177) too. Is it possible to run AP & MB on the GPU at the same time?


Title: Re: ATI OpenCL AstroPulse (rev516) released
Post by: Raistmer on 25 Apr 2011, 03:31:16 am

If both are configured appropriately (for running 2 instances), it should be possible.
Title: Re: ATI OpenCL AstroPulse (rev516) released
Post by: Fredericx51 on 10 May 2011, 04:47:52 pm
I just downloaded from your Russian site, or at least tried to, like the previous time (rev.516), but got rev.521 and installed
it.
Since I can't use an AC, the last few days (Friday, Saturday, Sunday and today, Tuesday) temps were 25C to 31C and I had to shut down
all but 1 rig (X9650 @ 3.51 GHz + 1x GTX480), which, without a case, has no heat problems. (Computer cases, 9 out of 10,
aren't up to this job; 1, 2 or more GPUs produce so much heat that they should have their own separate casing, in or out of the case!)

Got them up and running now. They appeared to have some MW WUs (deadline 1 to 2 days); after those I can try your latest rev.521 for AP
work.

I saw 2 AP WUs running on 1 HD5870; it looked like they'd crashed...!

Better to try 1 at a time, with cmd line options similar to those used with rev.516?




Title: Re: ATI OpenCL AstroPulse (rev516) released
Post by: Raistmer on 10 May 2011, 11:27:31 pm

Yes, the options should be the same.