Seti@Home optimized science apps and information
Topic started by: Raistmer on 16 Jul 2012, 07:16:12 am
-
New switch added:
-initial_ffa_sleep N M
where N is the number of ms to sleep in the short PC-FFA and M is the number of ms to sleep in the large FFA. This sleep occurs before the event-polling loop used in the -use_sleep case, and it is independent of the -use_sleep switch.
Recommended usage:
1) Do a test run with the -v 2 -use_sleep options.
2) Look in stderr.txt for the usual sleeping times for the short and large FFA (they will differ considerably).
3) Enter those usual values (or those values minus 1 ms) into the new switch's parameter fields.
4) An additional run with this parameter plus -v 2 -use_sleep can be done to check whether the sleep loop times are now much smaller (1-2 ms). If so, -use_sleep can be omitted altogether.
Take care: this switch requires exactly 2 parameters (2 integers separated by a space), not 1.
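The "exactly 2 integers" rule and the optional "minus 1 ms" adjustment from step 3 can be illustrated with a small helper. This is only a sketch: the function name `initial_ffa_sleep_args` and its `margin_ms` parameter are made up for illustration and are not part of the app; the only thing taken from the post is the switch name and its two-integer format.

```python
def initial_ffa_sleep_args(short_ms, large_ms, margin_ms=0):
    """Build the '-initial_ffa_sleep N M' text from two observed sleep times.

    short_ms  -- usual sleep time seen for the short PC-FFA (ms)
    large_ms  -- usual sleep time seen for the large FFA (ms)
    margin_ms -- hypothetical safety margin to subtract (step 3 suggests 1 ms)
    """
    for value in (short_ms, large_ms, margin_ms):
        # The switch takes plain integers, so reject floats, bools and negatives.
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            raise ValueError("expected a non-negative integer, got %r" % (value,))
    n = max(short_ms - margin_ms, 0)
    m = max(large_ms - margin_ms, 0)
    # Always emit both numbers, separated by a single space.
    return "-initial_ffa_sleep %d %d" % (n, m)


if __name__ == "__main__":
    print(initial_ffa_sleep_args(27, 96, margin_ms=1))
```

The returned string can then be pasted into the command line parameters alongside any other switches already in use.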
-
Bench for GTX 260 + Core2Duo 6420 (Conroe), CPU idle, OS Windows Server 2003 x64, driver 263.06 attached; it shows the dependence on the unroll param. More will come later.
-
Is this new switch for all (NVidia & AMD/ATI) devices/GPUs, or NVidia only?
-
It's for all GPU AP builds.
[But how helpful it will be for a particular vendor/device/driver config needs to be tested in each particular case.]
-
Works very well on my 7970. Many non-zeroed WUs are completing in less than an hour.
No errors to report.
-
Raistmer, do you have an AMD/ATI 5000/6000/7000 series GPU? You're the man doing most of the coding and testing (along with, IIRC, Jason Gee, Richard Haselgrove,
and somebody I've forgotten), and you're putting a lot of time into this project, so you should
have the necessary equipment, IMHO.
If not, you'll have to get one, I think, and I'm willing to pay for one, or for part of one. Just PM me ::)
(I also have an HD4850 & HD5770 lying around unused atm, because my Vista rig has
strange failures; they could be PSU related, since it's only 350 W.)
Hope you don't mind my asking this,
Fredericx51.
-
Raistmer, do you have AMD/ATI 5000/6000/7000 series of GPU(s),
Currently I have an HD6950 installed in one host, bought with donations from SETI project members, and a GTX 260, also donated and sent to me by Mike, installed in another host.
I also have my own HD4870, GSO9600, GT9500, and 8600 (or 8500?), but they are not installed. I bought a PCI->PCI-e adaptor on eBay, tested it on an AMD64 host, and will perhaps install it plus some of these cards into another AMD64 host, a Winchester-based one.
But I currently develop on a C-60 based netbook, so most of the debugging and testing happens there (it's AMD's APU: a CPU plus an OpenCL-capable GPU in a single chip). All other architectures are covered by our excellent alpha testers.
-
Hi Raistmer,
What would be the correct setting based on these values?
In FFA -2048 before main loop buffer freeing
Awaited 40 ms for completion
PC_inner_ffa result is: 0
Awaited 27 ms for completion
PC_inner_ffa result is: 0
Awaited 27 ms for completion
PC_inner_ffa result is: 0
Awaited 26 ms for completion
PC_inner_ffa result is: 0
Before FFA buffer release, end of FFA -2048
In FFA 2048 before main loop buffer freeing
Awaited 38 ms for completion
PC_inner_ffa result is: 0
Awaited 28 ms for completion
PC_inner_ffa result is: 0
Awaited 27 ms for completion
PC_inner_ffa result is: 0
Awaited 26 ms for completion
PC_inner_ffa result is: 0
Before FFA buffer release, end of FFA 2048
In FFA -2064 before main loop buffer freeing
Awaited 40 ms for completion
PC_inner_ffa result is: 0
Awaited 28 ms for completion
PC_inner_ffa result is: 0
Awaited 27 ms for completion
PC_inner_ffa result is: 0
Awaited 26 ms for completion
PC_inner_ffa result is: 0
Before FFA buffer release, end of FFA -2064
In FFA 2064 before main loop buffer freeing
-initial_ffa_sleep 26 -2064 ?
or
-initial_ffa_sleep 40 2048 ?
As the crunching of the task progresses, these values increase: the negative and positive values get larger in magnitude, and so does the "Awaited xx ms" value.
At 50% crunched it's like this:
Before FFA buffer release, end of FFA -8448
In FFA 8448 before main loop buffer freeing
Awaited 120 ms for completion
PC_inner_ffa result is: 0
Awaited 108 ms for completion
PC_inner_ffa result is: 0
Awaited 106 ms for completion
PC_inner_ffa result is: 0
Awaited 104 ms for completion
PC_inner_ffa result is: 0
Awaited 103 ms for completion
PC_inner_ffa result is: 0
Awaited 100 ms for completion
PC_inner_ffa result is: 0
Awaited 96 ms for completion
PC_inner_ffa result is: 0
Awaited 95 ms for completion
PC_inner_ffa result is: 0
Awaited 95 ms for completion
PC_inner_ffa result is: 0
Awaited 92 ms for completion
PC_inner_ffa result is: 0
Awaited 89 ms for completion
PC_inner_ffa result is: 0
Awaited 88 ms for completion
PC_inner_ffa result is: 0
Awaited 88 ms for completion
PC_inner_ffa result is: 0
Awaited 87 ms for completion
PC_inner_ffa result is: 0
Awaited 56 ms for completion
PC_inner_ffa result is: 0
Before FFA buffer release, end of FFA 8448
-
You can try -initial_ffa_sleep 26 95 then, and see whether it saves any CPU time and how much it increases elapsed time.
EDIT: the positive/negative number that increases over time is the DM value. It should be ignored for this particular purpose, since it's not a time count.
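For anyone who wants to pull the "Awaited N ms" figures out of stderr.txt instead of reading them by eye, here is a rough sketch. It assumes the log lines look exactly like the ones quoted above; the function names and the choice of the lower median as the "usual" value are my own assumptions, not something the app prescribes. The DM value is kept only as a label for each block. Deciding which blocks correspond to the short and the large FFA is still left to the reader.

```python
import re
from statistics import median_low

# Patterns copied from the log excerpt quoted in this thread.
FFA_START = re.compile(r"In FFA (-?\d+) before main loop")
FFA_END = re.compile(r"Before FFA buffer release")
AWAITED = re.compile(r"Awaited (\d+) ms for completion")


def awaited_by_block(log_text):
    """Return a list of (dm_label, [awaited_ms, ...]) in log order."""
    blocks = []
    current = None
    for line in log_text.splitlines():
        start = FFA_START.search(line)
        if start:
            # A new FFA block begins; the number is the DM label, not a time.
            current = (int(start.group(1)), [])
            blocks.append(current)
            continue
        if FFA_END.search(line):
            current = None
            continue
        waited = AWAITED.search(line)
        if waited and current is not None:
            current[1].append(int(waited.group(1)))
    return blocks


def typical_waits(log_text):
    """Return (dm_label, lower-median awaited ms) for each non-empty block."""
    return [(dm, median_low(times))
            for dm, times in awaited_by_block(log_text) if times]
```

Calling `typical_waits(open("stderr.txt").read())` gives one line per FFA block; values from early and late blocks can then be compared before picking N and M.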
-
Hi Raistmer and everybody else,
I've decided to get back to AP crunching on my GTX260 in the hope of bypassing the current server issues (and limits), so I have a few questions:
- Is r1363 the latest release?
- Do I need the -initial_ffa_sleep N M switch to run this app?
- Can I use my old cmdline params <cmdline>-ffa_block 6144 -ffa_block_fetch 1536 -unroll 10 -instances_per_device 1 -no_cpu_lock</cmdline>?
/edit (I just saw from Raistmer's r1316 opening post that I don't need the -no_cpu_lock switch)
- In the above-mentioned post there is an app_info section for ATI GPUs in which I couldn't find the file_info and file_ref parts for the AstroPulse_Kernels_r1363.cl file. I used to have these when I was running r521, so are they obsolete, or just not needed in an ATI setup?
- Can I stay with the 266.58 drivers? (I don't want high CPU usage because I crunch AP WUs on all four cores.)
Thanks in advance :)
-
It's the latest official release, yes.
And you can still use the old cmdline params.
-
Is this OK?
<app>
<name>astropulse_v6</name>
</app>
<file_info>
<name>AP6_win_x86_SSE2_OpenCL_NV_r1363.exe</name>
<executable/>
</file_info>
<file_info>
<name>AstroPulse_Kernels_r1363.cl</name>
<executable/>
</file_info>
<app_version>
<app_name>astropulse_v6</app_name>
<version_num>604</version_num>
<avg_ncpus>0.04</avg_ncpus>
<max_ncpus>0.20</max_ncpus>
<plan_class>cuda</plan_class>
<flops>475000000000</flops>
<cmdline>-ffa_block 6144 -ffa_block_fetch 1536 -unroll 10 -instances_per_device 1</cmdline>
<file_ref>
<file_name>AP6_win_x86_SSE2_OpenCL_NV_r1363.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>AstroPulse_Kernels_r1363.cl</file_name>
<copy_file/>
</file_ref>
<coproc>
<type>CUDA</type>
<count>1</count>
</coproc>
</app_version>
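One quick way to sanity-check a fragment like the one above is to confirm that every file_ref names a file that is also declared in a file_info. The sketch below does only that. It wraps the pasted fragment in an `<app_info>` root so it parses as one document; the function name `undeclared_file_refs` is hypothetical, and this is not a full validation of what BOINC accepts.

```python
import xml.etree.ElementTree as ET


def undeclared_file_refs(fragment):
    """Return file_ref names that have no matching file_info entry."""
    # Wrap the fragment so that several top-level elements parse as one tree.
    root = ET.fromstring("<app_info>%s</app_info>" % fragment)
    declared = set()
    for info in root.iter("file_info"):
        declared.add(info.findtext("name", "").strip())
    missing = []
    for ref in root.iter("file_ref"):
        name = ref.findtext("file_name", "").strip()
        if name not in declared:
            missing.append(name)
    return missing
```

An empty list means every referenced file is declared; any names returned are referenced in an app_version but missing from the file_info entries.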
-
It should work.
But you don't need to mention the .cl file any longer.
Likewise, -instances_per_device 1 is unnecessary.
Count 1 is enough now.
Mine looks like this.
<app>
<name>astropulse_v6</name>
</app>
<file_info>
<name>AP6_win_x86_SSE2_OpenCL_ATI_r1363.exe</name>
<executable/>
</file_info>
<file_info>
<name>ap_cmdline.txt</name>
</file_info>
<app_version>
<app_name>astropulse_v6</app_name>
<version_num>601</version_num>
<avg_ncpus>0.04</avg_ncpus>
<max_ncpus>0.2</max_ncpus>
<plan_class>ati13ati</plan_class>
<coproc>
<type>ATI</type>
<count>0.5</count>
</coproc>
<file_ref>
<file_name>AP6_win_x86_SSE2_OpenCL_ATI_r1363.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>ap_cmdline.txt</file_name>
</file_ref>
</app_version>
ap_cmdline.txt includes the params.
Mike
-
It's working ! Thanks.
Still, I'd like to know about the -initial_ffa_sleep N M switch and which driver is recommended for my old GTX260 ;)
edit: Here is my first result (http://setiathome.berkeley.edu/result.php?resultid=2719777604) with the new app. Should I be worried about the infos and warnings about opening some binary kernel files? I've noticed that those files were created in my data folder.
-
If they were created - no reason to worry.
-initial_ffa_sleep N M is an experimental switch, provided in case someone finds it useful for their own host.
Recommended driver (for OpenCL NV app) is: 263.06