GPU AP tuning: new set of test tasks for GPU AP
Raistmer:
Feature of this set: the tasks are zero-blanked and contain no signals. This is the ideal case for the GPU part.
Preferred usage: tuning GPU AP parameters.
The set can also be used in the testing phase to check for false positives.
I personally prefer a reasonably long task (around 300 secs mean elapsed time) for tuning and performance measuring: its total run time is much longer than the startup time, but still short enough to do a whole bunch of runs with different params.
The longest one is attached here. Execution time on my HD6950 is around 300 secs, with good (but not necessarily the best) params:
-unroll 10 -ffa_block 4096 -ffa_block_fetch 4096
on idle CPU with Cat 12.1 driver, Win7 x64 OS.
WU : Clean_20LC.wu
AP6_win_x86_SSE2_OpenCL_ATI_r1363.exe -verbose :
Elapsed 304.541 secs
CPU 91.573 secs
single pulses: 0
repetitive pulses: 0
percent blanked: 0.00
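A whole bunch of runs with different params can be listed up front before timing them one by one. What follows is only a rough sketch: the flag names are the ones used in this post, but the candidate values other than 2, 10 and 4096 are made up for illustration, the helper name build_command_lines is hypothetical, and it only builds the command lines without launching anything. Whether the app accepts every combination should be checked against its own documentation.

```python
from itertools import product

# App name as it appears in this thread.
APP = "AP6_win_x86_SSE2_OpenCL_ATI_r1363.exe"


def build_command_lines(unrolls, ffa_blocks, ffa_block_fetches, app=APP):
    """Return one command line per combination of the three tuning params.

    Hypothetical helper: it only formats strings and does not check
    whether the app accepts a given combination.
    """
    return [
        f"{app} -unroll {u} -ffa_block {b} -ffa_block_fetch {f}"
        for u, b, f in product(unrolls, ffa_blocks, ffa_block_fetches)
    ]


# Candidate values: 2, 10 and 4096 come from this thread, 2048 is an
# illustrative assumption.
commands = build_command_lines(
    unrolls=[2, 10],
    ffa_blocks=[2048, 4096],
    ffa_block_fetches=[2048, 4096],
)

for line in commands:
    print(line)
```

Each line can then be run against the same clean task and the elapsed times compared.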
For slower GPUs I will attach smaller tasks. You can also create your own Clean_*LC task by editing the <dm_high>3455</dm_high> field.
EDIT: Perl-based extraction script added
{edit} JWS: Removing the Clean_20LC.rar attachment, now included in the Clean_xxLC_WUs.7z download for AP test tools.
Raistmer:
And here is the smallest task, for low-end GPUs.
My C-60 does it in ~300 secs
Running app : AP6_win_x86_SSE2_OpenCL_ATI_r1363.exe -unroll 2
with WU : Clean_01LC.wu
Started at : 02:48:48.977
Ended at : 02:54:12.341
323.181 secs Elapsed
233.237 secs CPU time
{edit} JWS: Removing the Clean_01LC.wu.7z attachment, now included in the Clean_xxLC_WUs.7z download for AP test tools.
Claggy:
Here's a bench of the Clean_20LC.wu task on my GTX460 (Win 7 x64, 304.79)
AP6_win_x86_SSE2_OpenCL_NV_r1363.exe / Clean_20LC.wu :
AppName: AP6_win_x86_SSE2_OpenCL_NV_r1363.exe
AppArgs:
TaskName: Clean_20LC.wu
Started at : 01:37:12.156
Ended at : 01:45:11.080
478.900 secs Elapsed
464.197 secs CPU time
Speedup : 93.34%
Ratio : 15.02x
ref-astropulse_6.01_windows_intelx86.exe-Clean_20LC.wu.res: <ap_signal>10,<pulses>0,<best_pulses>10
result-AP6_win_x86_SSE2_OpenCL_NV_r1363.exe-Clean_20LC.wu.res: <ap_signal>10,<pulses>0,<best_pulses>10
All Signals: Weakly similar or Different.
Pulses: Checked 0, 0 , Strongly Similar
Best Pulses: Weakly similar or Different.
Claggy
Raistmer:
That took quite a bit more than 300 secs, so if you want to tune params on your GPU the task could be reduced a little. Apart from the fixed startup time, execution time for an AP clean task should scale linearly with the number of large DM chunks involved. Each large DM chunk consists of 128 DMs. The lowest one is 896, so the expression 896+N*128-1 can be used for the high DM field, where N is the number of large DM chunks the task will contain. In your case I would use a 12LC or 13LC task instead of the 20LC one.
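The expression 896+N*128-1 is easy to check: N=20 gives 3455, which is the dm_high value of the 20LC task. Below is a minimal sketch of it in Python, plus a hypothetical helper that patches the field in a task header. It assumes the <dm_high> tag appears literally in the text header of the .wu file and that nothing else needs changing; the function names are mine, not part of any AP tool.

```python
import re

LOWEST_DM = 896        # lowest value, as given in the post
DMS_PER_CHUNK = 128    # DMs in one large DM chunk


def dm_high_for_chunks(n_chunks: int) -> int:
    """High DM value for a clean task with n_chunks large DM chunks."""
    if n_chunks < 1:
        raise ValueError("need at least one large DM chunk")
    return LOWEST_DM + n_chunks * DMS_PER_CHUNK - 1


def set_dm_high(header: bytes, n_chunks: int) -> bytes:
    """Return header with its <dm_high> field set for n_chunks chunks.

    Assumption: the field appears exactly once, as plain text.
    """
    new_field = b"<dm_high>%d</dm_high>" % dm_high_for_chunks(n_chunks)
    patched, count = re.subn(
        rb"<dm_high>\s*\d+\s*</dm_high>", new_field, header, count=1
    )
    if count != 1:
        raise ValueError("no <dm_high> field found")
    return patched


for n in (1, 12, 13, 20):
    print(f"{n:2d}LC -> dm_high = {dm_high_for_chunks(n)}")

print(set_dm_high(b"<dm_high>3455</dm_high>", 12))
```

Working on bytes rather than text is deliberate, so that any binary payload following the header would pass through untouched.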
Raistmer:
To get roughly the same execution time on the C-60 as on the HD6950, the task was reduced 20 times.
But the C-60 has 2 CUs while the HD6950 has 22 CUs, an 11 times difference, not 20.
Looks like the remaining slowdown (20/11, a bit under 2 times) came from the memory used. The HD6950 is a discrete GPU with its own dedicated fast memory, while the C-60 is an APU that uses system memory.
It would be interesting to know how much of the slowdown comes from the different architectures of these 2 devices.
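For reference, a rough sketch of that arithmetic, treating the slowdown factors as multiplicative. It uses only numbers posted in this thread and assumes work scales linearly with the number of large DM chunks, ignoring startup time and the different -unroll settings of the two runs. The function name is hypothetical.

```python
def residual_slowdown(throughput_ratio: float, cu_ratio: float) -> float:
    """Part of the slowdown that the CU count alone does not explain."""
    return throughput_ratio / cu_ratio


# Numbers taken from the posts above.
chunks_ratio = 20 / 1              # Clean_20LC vs Clean_01LC
cu_ratio = 22 / 2                  # HD6950 CUs vs C-60 CUs
elapsed_ratio = 323.181 / 304.541  # C-60 elapsed vs HD6950 elapsed

# Treating both runs as "about 300 secs":
coarse = residual_slowdown(chunks_ratio, cu_ratio)

# Using the actual elapsed times of the two runs:
fine = residual_slowdown(chunks_ratio * elapsed_ratio, cu_ratio)

print(f"CU ratio: {cu_ratio:.0f}x")
print(f"unexplained by CU count (coarse): {coarse:.2f}x")
print(f"unexplained by CU count (from elapsed times): {fine:.2f}x")
```

Either way the part not explained by CU count comes out a little under 2x.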