Two validated AP WUs with rev. 516 and a few changes to
ffa_block and ffa_block_fetch, unroll=16.
Both are ATI AP tasks. ATI and stock app.
<core_client_version>6.10.60</core_client_version>
<![CDATA[
<stderr_txt>
Number of app instances per device setted to:2
DATA_CHUNK_UNROLL setted to:16
FFA thread block override value:4096
FFA thread fetchblock override value:2048
Running on device number: 1
Priority of worker thread raised successfully
Priority of process adjusted successfully, high priority class used
OpenCL platform detected: Advanced Micro Devices, Inc.
BOINC assigns 1 device, slots 2 to 3 (including) will be checked
Used slot is 3; AstroPulse v. 5.06
Non-graphics FFTW USE_CONVERSION_OPT
Windows x86 rev 516, 5.06 match, by Raistmer with support of Lunatics.kwsn.net team. SSE2
OpenCL version by Raistmer
oclFFT fix for ATI GPUs by Urs Echternacht
ffa threshold mod, by Joe Segur.
static fftw lib, built by Jason G.
SSE3 dechirping by JDWhale
Build features: Non-graphics OpenCL COMBINED_DECHIRP_KERNEL FFTW USE_INCREASED_PRECISION USE_SSE2 x86
CPUID: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
Cache: L1=64K L2=256K
CPU features: FPU TSC PAE CMPXCHG8B APIC SYSENTER MTRR CMOV/CCMP MMX FXSAVE/FXRSTOR SSE SSE2 HT SSE3
Number of OpenCL platforms: 1
OpenCL Platform Name: AMD Accelerated Parallel Processing
Number of devices: 2
Max compute units: 20
Max work group size: 256
Max clock frequency: 890Mhz
Max memory allocation: 134217728
Cache type: None
Cache line size: 0
Cache size: 0
Global memory size: 536870912
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 32768
Queue properties:
Out-of-Order: No
Name: Cypress
Vendor: Advanced Micro Devices, Inc.
Driver version: CAL 1.4.1332
Version: OpenCL 1.1 AMD-APP-SDK-v2.4 (595.10)
Extensions: cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_printf cl_amd_media_ops cl_amd_popcnt cl_khr_d3d10_sharing
Max compute units: 20
Max work group size: 256
Max clock frequency: 890Mhz
Max memory allocation: 134217728
Cache type: None
Cache line size: 0
Cache size: 0
Global memory size: 536870912
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 32768
Queue properties:
Out-of-Order: No
Name: Cypress
Vendor: Advanced Micro Devices, Inc.
Driver version: CAL 1.4.1332
Version: OpenCL 1.1 AMD-APP-SDK-v2.4 (595.10)
Extensions: cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_printf cl_amd_media_ops cl_amd_popcnt cl_khr_d3d10_sharing
Info : Building Program (clBuildProgram):main kernels: OK code 0
Number of app instances per device setted to:2
DATA_CHUNK_UNROLL setted to:16
FFA thread block override value:6144
FFA thread fetchblock override value:2048
Running on device number: 1
Priority of worker thread raised successfully
Priority of process adjusted successfully, high priority class used
OpenCL platform detected: Advanced Micro Devices, Inc.
BOINC assigns 1 device, slots 2 to 3 (including) will be checked
Used slot is 2; ### Restart at 78.38 percent.
Info : Building Program (clBuildProgram):main kernels: OK code 0
Number of app instances per device setted to:2
DATA_CHUNK_UNROLL setted to:16
FFA thread block override value:5120
FFA thread fetchblock override value:1024
Running on device number: 1
Priority of worker thread raised successfully
Priority of process adjusted successfully, high priority class used
OpenCL platform detected: Advanced Micro Devices, Inc.
BOINC assigns 1 device, slots 2 to 3 (including) will be checked
Used slot is 3; ### Restart at 78.38 percent.
Info : Building Program (clBuildProgram):main kernels: OK code 0
Number of app instances per device setted to:2
DATA_CHUNK_UNROLL setted to:16
FFA thread block override value:2048
FFA thread fetchblock override value:1024
Running on device number: 1
Priority of worker thread raised successfully
Priority of process adjusted successfully, high priority class used
OpenCL platform detected: Advanced Micro Devices, Inc.
BOINC assigns 1 device, slots 2 to 3 (including) will be checked
Used slot is 2; ### Restart at 78.38 percent.
Info : Building Program (clBuildProgram):main kernels: OK code 0
Number of app instances per device setted to:2
DATA_CHUNK_UNROLL setted to:12
FFA thread block override value:5120
FFA thread fetchblock override value:1024
Running on device number: 1
Priority of worker thread raised successfully
Priority of process adjusted successfully, high priority class used
OpenCL platform detected: Advanced Micro Devices, Inc.
BOINC assigns 1 device, slots 2 to 3 (including) will be checked
Used slot is 2; ### Restart at 78.38 percent.
Info : Building Program (clBuildProgram):main kernels: OK code 0
Number of app instances per device setted to:2
DATA_CHUNK_UNROLL setted to:15
FFA thread block override value:5120
FFA thread fetchblock override value:1024
Running on device number: 1
Priority of worker thread raised successfully
Priority of process adjusted successfully, high priority class used
OpenCL platform detected: Advanced Micro Devices, Inc.
BOINC assigns 1 device, slots 2 to 3 (including) will be checked
Used slot is 2; ### Restart at 92.79 percent.
Info : Building Program (clBuildProgram):main kernels: OK code 0
single pulses: 3
repetitive pulses: 1
percent blanked: 8.89
class T_remove_radar: total=4.45e+009, N=1, <>=4.45e+009, min=4.45e+009, max=4.45e+009
class T_main_loop_L1: total=1.04e+012, N=7, <>=1.48e+011, min=1.36e+011, max=1.98e+011
class T_FFT_forward: total=9.72e+009, N=7672, <>=1.27e+006, min=1.60e+004, max=9.38e+009
class T_remove_radar_randomize: total=1.20e+011, N=114632, <>=1.05e+006, min=3.59e+002, max=1.31e+008
class T_build_chirp_table: total=0.00e+000, N=0, <>=0.00e+000, min=1.84e+019, max=0.00e+000
class T_DataWrite: total=4.60e+007, N=840, <>=5.47e+004, min=1.96e+004, max=2.54e+005
class T_DataWrite_ns: total=0, N=0, <>=0, min=0 max=0
class T_oclReadBuf: total=2.91e+005, N=7672, <>=3.70e+001, min=1.80e+001, max=1.21e+003
class T_ChirpWrite: total=0.00e+000, N=0, <>=0.00e+000, min=1.84e+019, max=0.00e+000
class T_ChirpWrite_ns: total=0, N=0, <>=0, min=0 max=0
class T_dechirp: total=2.81e+008, N=7672, <>=3.66e+004, min=2.06e+004, max=1.66e+006
class Dechirp_ns: total=0, N=0, <>=0, min=0 max=0
class Half_ns: total=0, N=0, <>=0, min=0 max=0
class T_PC_single_pulse_kernel_FFA_update: total=4.37e+011, N=7672, <>=5.69e+007, min=3.07e+007, max=1.30e+010
class PC_ns: total=0, N=0, <>=0, min=0 max=0
class T_oclReadBuf: total=2.91e+005, N=7672, <>=3.70e+001, min=1.80e+001, max=1.21e+003
class T_oclWriteBuf: total=4.68e+007, N=840, <>=5.57e+004, min=2.00e+004, max=2.56e+005
class T_FFT_inverse: total=1.18e+008, N=7672, <>=1.54e+004, min=1.05e+004, max=4.21e+005
class T_ffa: total=4.49e+011, N=126, <>=3.56e+009, min=1.34e+009, max=2.01e+010
class T_GPU_buffer_read_backs: total=1, N=1, <>=1, min=1 max=1
USE_OPENCL OPENCL_WRITE USE_INCREASED_PRECISION SMALL_CHIRP_TABLE
rev 516
14:40:55 (3104): called boinc_finish
</stderr_txt>
]]>
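In case it helps when comparing ffa_block settings, here is a rough Python sketch for pulling the per-class totals out of a stderr dump like the one above. The function names (`parse_timers`, `share_of`) are my own and not part of the app, the log does not state the units of the counters, and the classes may overlap, so treat the ratio only as a rough indication.

```python
import re

# Matches timer lines such as:
#   class T_ffa: total=4.49e+011, N=126, <>=3.56e+009, min=1.34e+009, max=2.01e+010
# Only the class name, total and N are captured; the rest of the line is ignored.
LINE_RE = re.compile(
    r"class\s+(?P<name>\w+):\s*total=(?P<total>[0-9.eE+-]+),\s*N=(?P<n>\d+)"
)


def parse_timers(stderr_text):
    """Return {class_name: (total, count)} for every timer line found.

    If a class appears twice (T_oclReadBuf does in the log above),
    the later line simply overwrites the earlier one.
    """
    timers = {}
    for m in LINE_RE.finditer(stderr_text):
        timers[m.group("name")] = (float(m.group("total")), int(m.group("n")))
    return timers


def share_of(timers, name, reference="T_main_loop_L1"):
    """Ratio of one class total to a reference class total.

    Assumption: the two counters use the same (unstated) unit.
    """
    ref_total = timers[reference][0]
    if ref_total == 0:
        raise ValueError("reference total is zero")
    return timers[name][0] / ref_total


if __name__ == "__main__":
    # Three lines copied from the log above.
    sample = (
        "class T_main_loop_L1: total=1.04e+012, N=7, <>=1.48e+011, min=1.36e+011, max=1.98e+011\n"
        "class T_ffa: total=4.49e+011, N=126, <>=3.56e+009, min=1.34e+009, max=2.01e+010\n"
        "class PC_ns: total=0, N=0, <>=0, min=0 max=0\n"
    )
    timers = parse_timers(sample)
    print("T_ffa / T_main_loop_L1 = %.3f" % share_of(timers, "T_ffa"))
```

Running it over two results with different ffa_block values gives a quick way to see whether the T_ffa total moved.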
I'll start MB (rev. 177) too. Is it possible to run AP and MB on the GPU at the same time?