
GPU AP tuning: new set of test tasks for GPU AP


Raistmer:
Feature of this set: the tasks are zero-blanked and contain no signals, which makes them the ideal case for the GPU part.
Preferred usage: tuning GPU AP parameters.
They can also be used during testing to check for false positives.

I personally prefer a reasonably long task (around 300 secs mean elapsed time) for tuning and performance measuring: its total run time is much longer than the startup time, yet still short enough to do a whole bunch of runs with different params.

The longest one is attached here. Execution time on my HD6950 is around 300 secs with good (but not necessarily the best) params:
-unroll 10 -ffa_block 4096 -ffa_block_fetch 4096
on idle CPU with Cat 12.1 driver, Win7 x64 OS.
WU : Clean_20LC.wu
AP6_win_x86_SSE2_OpenCL_ATI_r1363.exe -verbose  :
  Elapsed 304.541 secs
      CPU 91.573 secs

    single pulses: 0
repetitive pulses: 0
  percent blanked: 0.00
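
For doing a whole bunch of runs with different params, the argument strings can be generated rather than typed by hand. This is only a minimal sketch: the flag names are the ones used above, but the candidate values are hypothetical placeholders, and the helper name param_grid is mine. It does not launch the app, since how your bench setup passes the WU to the exe is not covered here.

```python
from itertools import product

# Hypothetical candidate values; only the flag names come from the post.
UNROLL = [4, 8, 10, 12]
FFA_BLOCK = [2048, 4096, 8192]
FFA_BLOCK_FETCH = [1024, 2048, 4096]


def param_grid(unrolls, blocks, fetches):
    """Return one argument string per combination of the three params."""
    return [
        f"-unroll {u} -ffa_block {b} -ffa_block_fetch {f}"
        for u, b, f in product(unrolls, blocks, fetches)
    ]


grid = param_grid(UNROLL, FFA_BLOCK, FFA_BLOCK_FETCH)
for args in grid:
    print(args)
```

Each printed line can then be pasted after the exe name for one timed run on the same task.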

For slower GPUs I will attach smaller tasks. One can create one's own Clean_*LC task by editing the <dm_high>3455</dm_high> field.
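
The edit can be done in any text editor, but here is a rough sketch of doing it in a script. It assumes the <dm_high> field appears once as plain text in the task file and that the rest of the file must stay byte-for-byte unchanged, so it works on raw bytes. The function name set_dm_high is hypothetical, not part of any AP tool.

```python
import re


def set_dm_high(wu_bytes: bytes, new_value: int) -> bytes:
    """Replace the number inside the first <dm_high>...</dm_high> field."""
    pattern = re.compile(rb"(<dm_high>)\s*\d+\s*(</dm_high>)")
    replacement = rb"\g<1>" + str(new_value).encode("ascii") + rb"\g<2>"
    new_bytes, count = pattern.subn(replacement, wu_bytes, count=1)
    if count == 0:
        raise ValueError("no <dm_high> field found")
    return new_bytes


# Small in-memory demo; the trailing bytes stand in for the task data.
sample = b"<x><dm_high>3455</dm_high></x>\x00\xff"
edited = set_dm_high(sample, 2431)
print(edited)
```

For a real task, read the file in binary mode, pass the bytes through set_dm_high, and write the result to a new file name so the original stays intact.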

EDIT: Perl-based extraction script added

{edit} JWS: Removing the Clean_20LC.rar attachment, now included in the Clean_xxLC_WUs.7z download for AP test tools.

Raistmer:
And here is the smallest task, for low-end GPUs.

My C-60 does it in ~300 secs

Running app : AP6_win_x86_SSE2_OpenCL_ATI_r1363.exe -unroll 2
with WU     : Clean_01LC.wu
Started at  : 02:48:48.977
Ended at    : 02:54:12.341
    323.181 secs Elapsed
    233.237 secs CPU time

{edit} JWS: Removing the Clean_01LC.wu.7z attachment, now included in the Clean_xxLC_WUs.7z download for AP test tools.

Claggy:
Here's a bench of the Clean_20LC.wu task on my GTX460 (Win 7 x64, 304.79)

AP6_win_x86_SSE2_OpenCL_NV_r1363.exe  / Clean_20LC.wu :
AppName: AP6_win_x86_SSE2_OpenCL_NV_r1363.exe
AppArgs: 
TaskName: Clean_20LC.wu
Started at  : 01:37:12.156
Ended at    : 01:45:11.080
    478.900 secs Elapsed
    464.197 secs CPU time
Speedup     : 93.34%
Ratio       : 15.02x
 
ref-astropulse_6.01_windows_intelx86.exe-Clean_20LC.wu.res: <ap_signal>10,<pulses>0,<best_pulses>10
result-AP6_win_x86_SSE2_OpenCL_NV_r1363.exe-Clean_20LC.wu.res: <ap_signal>10,<pulses>0,<best_pulses>10
             All Signals: Weakly similar or Different.
                  Pulses: Checked   0,  0 , Strongly Similar
             Best Pulses: Weakly similar or Different.

Claggy

Raistmer:
That took quite a bit more than 300 secs, so if you attempt to tune params on your GPU, the task could be reduced a little. Apart from the fixed startup time, execution time for an AP clean task should scale linearly with the number of large DM chunks involved. Each large DM chunk consists of 128 DMs. The lowest one is 896, so the expression 896+N*128-1 can be used for the high DM field, where N is the number of large DM chunks the task will contain. In your case I would use a 12LC or 13LC task instead of 20LC.
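
The expression above and the linear scaling can be put into a few lines. This is just a sketch: the function names are made up for illustration, and the sizing helper ignores the fixed startup time, so it slightly underestimates how many chunks fit in a target run time.

```python
DM_LOW = 896          # lowest DM, as stated above
DMS_PER_CHUNK = 128   # DMs in one large DM chunk


def dm_high_for_chunks(n: int) -> int:
    """Value for the <dm_high> field of a task with n large DM chunks."""
    if n < 1:
        raise ValueError("need at least one large DM chunk")
    return DM_LOW + n * DMS_PER_CHUNK - 1


def chunks_for_target(measured_secs: float, measured_chunks: int,
                      target_secs: float) -> float:
    """Chunk count that should give target_secs, assuming linear scaling
    and negligible startup time."""
    return measured_chunks * target_secs / measured_secs


# The GTX460 run above: 478.9 secs elapsed on the 20LC task, target 300 secs.
estimate = chunks_for_target(478.9, 20, 300.0)
print(round(estimate, 1))
print(dm_high_for_chunks(12), dm_high_for_chunks(13))
```

For the 20LC task the first function gives 3455, which matches the <dm_high> value in the attached task, and the estimate for the GTX460 lands between 12 and 13 chunks.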

Raistmer:
To get roughly the same execution time on the C-60 as on the HD6950, the task was reduced 20 times.
But the C-60 has 2 CUs while the HD6950 has 22 CUs: an 11 times difference, not 20.
Looks like the remaining slowdown (20/11, a bit under 2 times) came from the memory used. The HD6950 is a discrete GPU with its own dedicated fast memory, while the C-60 is an APU that uses system memory.

It would be interesting to know how much of the slowdown comes from the different architectures of these 2 devices....
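
A quick check of the numbers in this comparison, as a sketch only. It assumes the slowdowns combine as multiplicative factors, so the part not explained by the CU count is a ratio rather than a difference, and it treats the two elapsed times as equal although they were 304.5 and 323.2 secs. The function name is made up for illustration.

```python
def residual_factor(task_ratio: float, cu_ratio: float) -> float:
    """Slowdown left over after dividing out the CU-count ratio."""
    return task_ratio / cu_ratio


task_ratio = 20       # 20LC on the HD6950 vs 01LC on the C-60
cu_ratio = 22 / 2     # 22 CUs vs 2 CUs
residual = residual_factor(task_ratio, cu_ratio)
print(round(residual, 2))
```

Whatever is left over would be shared between memory, clocks and architecture; these two runs alone cannot separate them.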
