Seti@Home optimized science apps and information
Optimized Seti@Home apps => Windows => GPU crunching => Topic started by: TouchuvGrey on 08 Dec 2010, 09:48:03 pm
-
My old brain has failed me yet again. I know I have seen the instructions
here, but cannot recall where. I have 2 video cards, a GTS 250 and a GTX 460.
I would like to run 2 work units at the same time per card.
My cc_config.xml currently looks like this:
<cc_config>
<log_flags>
<sched_op_debug>1</sched_op_debug>
<work_fetch_debug>1</work_fetch_debug>
</log_flags>
<options>
<use_all_gpus>1</use_all_gpus>
</options>
</cc_config>
What do I need to change it to?
-
My old brain has failed me yet again. I know I have seen the instructions
here, but cannot recall where. I have 2 video cards, a GTS 250 and a GTX 460.
I would like to run 2 work units at the same time per card.
My cc_config.xml currently looks like this:
<cc_config>
<log_flags>
<sched_op_debug>1</sched_op_debug>
<work_fetch_debug>1</work_fetch_debug>
</log_flags>
<options>
<use_all_gpus>1</use_all_gpus>
</options>
</cc_config>
What do I need to change it to?
You would need to change it in the app_info.xml file: find the <count> line and change its value from 1 to 0.5.
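For anyone hunting for this later: the setting lives in app_info.xml, not cc_config.xml. A minimal sketch of the relevant fragment (the app name and surrounding elements here are placeholders; keep whatever your existing file has and change only the count):

```xml
<app_version>
    <app_name>setiathome_enhanced</app_name>
    <!-- ...your existing version/file_ref elements stay as they are... -->
    <coproc>
        <type>CUDA</type>
        <count>0.5</count>  <!-- 0.5 GPUs per task, i.e. 2 tasks per GPU -->
    </coproc>
</app_version>
```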
-
...
i have 2 video cards a GTS 250 and a GTX 460.
i would like to run 2 work units at the same time per card.
...
The 200 series cards like your GTS 250 are not capable of running more than one WU at a time, and there's no way to tell BOINC to treat two cards in one host differently. You'll have to move a card to a different host or give up the idea.
Joe
-
I must be misunderstanding what I am seeing in
that case (this is not unusual).
12/9/2010 6:18:42 PM SETI@home Restarting task 13ja10aa.7071.21335.12.10.234_0 using setiathome_enhanced version 608
12/9/2010 6:18:42 PM SETI@home Restarting task 13ja10aa.7071.21335.12.10.228_0 using setiathome_enhanced version 608
12/9/2010 6:18:42 PM SETI@home Restarting task 13ja10aa.7071.21335.12.10.225_0 using setiathome_enhanced version 608
12/9/2010 6:18:42 PM SETI@home Restarting task 13ja10aa.7071.21335.12.10.222_0 using setiathome_enhanced version 608
12/9/2010 6:18:42 PM [wfd]: work fetch start
12/9/2010 6:18:42 PM SETI@home chosen: minor shortfall NVIDIA GPU: 0.00 inst, 936855.26 sec
12/9/2010 6:18:42 PM [wfd] ------- start work fetch state -------
12/9/2010 6:18:42 PM [wfd] target work buffer: 0.86 + 864000.00 sec
12/9/2010 6:18:42 PM [wfd] CPU: shortfall 6848715.19 nidle 7.84 saturated 0.00 busy 0.00 RS fetchable 0.00 runnable 0.00
12/9/2010 6:18:42 PM SETI@home [wfd] CPU: fetch share 0.00 LTD 0.00 backoff dt 3887.85 int 86400.00
12/9/2010 6:18:42 PM [wfd] NVIDIA GPU: shortfall 936855.26 nidle 0.00 saturated 394927.13 busy 0.00 RS fetchable 1000.00 runnable 1000.00
12/9/2010 6:18:42 PM SETI@home [wfd] NVIDIA GPU: fetch share 1.00 LTD 0.00 backoff dt 0.00 int 0.00
12/9/2010 6:18:42 PM SETI@home [wfd] overall LTD -1931022.03
It looks to me like I'm running 2 WUs on each card. If that is not the case,
please enlighten me as to what I'm seeing.
-
200 series (and even 8800 series like the GTS 250 ::)) *should* run 2 instances at a time fine, provided you don't run out of memory. They just won't benefit from doing so directly, since context switch hardware wasn't included until Fermi. Since you have a 400 series there, the likely benefit will outweigh any added cost penalty to the older card (your mileage may vary).
Note that the operation you're seeing is AFTER many driver revisions & improvements, so I am surprised that it is working also. Joe's statements were quite correct not so long ago (though I can't pinpoint the exact dates/versions of the corrections. Too many changes too quickly ;))
... Cuda 3.1 was most definitely broken with mixing generations in the same host, which I reported through nVidia's registered developer program. These things are fixed in Cuda 3.2. The Cuda 3.0 build in operation should also be fine, as your host shows; just be certain to keep an eye on things ;)
-
I have done a quite thorough benchmark over at my blog containing performance data of various official/unofficial executables and the benefits of doing full WU runs.
Head over there to check:
http://vyper.kafit.se
Kind regards Vyper
-
Head over there to check..
http://vyper.kafit.se
Is it rape when I click on the link? I mean ... it's in Sweden ;D
-
Head over there to check..
http://vyper.kafit.se
Is it rape when I click on the link? I mean ... it's in Sweden ;D
Lol!! No! :D
Regards Vyper
-
200 series (and even 8800 series like the GTS 250 ::)) *should* run 2 instances at a time fine, provided you don't run out of memory. They just won't benefit from doing so directly, since context switch hardware wasn't included until Fermi. Since you have a 400 series there, the likely benefit will outweigh any added cost penalty to the older card (your mileage may vary).
Note that the operation you're seeing is AFTER many driver revisions & improvements, so I am surprised that it is working also. Joe's statements were quite correct not so long ago (though I can't pinpoint the exact dates/versions of the corrections. Too many changes too quickly ;))
... Cuda 3.1 was most definitely broken with mixing generations in the same host, which I reported through nVidia's registered developer program. These things are fixed in Cuda 3.2. The Cuda 3.0 build in operation should also be fine, as your host shows; just be certain to keep an eye on things ;)
So, err, just so I have it spelled out for me: ;)
My 8800GTX should also be able to run 2 WUs at the same time, but it will not make any difference in the total throughput?
So, bottom line, the current way it crunches (1 WU at a time) is fine and dandy?
Regards,
Patrick.
-
The key is that newer GPUs can handle multiple apps running; older cards cannot.
-
More precisely, an older GPU can benefit from multiple tasks running only if there are big enough periods of idle GPU through app execution. Such an idle period should be long enough to offset the switching cost.
If not, running a few instances per GPU will be counterproductive because of the switching overhead.
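To make that tradeoff concrete, here is a toy throughput model (entirely my own illustrative numbers, not measured from any real card): an extra instance can only fill the GPU's idle gaps, and each extra instance pays a switching cost.

```python
# Illustrative model of running multiple tasks on a pre-Fermi GPU.
# All numbers are made-up assumptions for the sake of the example.

def throughput(instances, idle_fraction, switch_overhead):
    """Relative throughput of `instances` tasks sharing one GPU.

    idle_fraction   -- fraction of a task's run during which the GPU is idle
    switch_overhead -- fractional cost added per extra instance by
                       software context switching
    """
    busy = 1.0 - idle_fraction
    # Extra instances can only fill the idle gaps (total work is capped
    # at a fully busy GPU), and each extra instance pays the overhead.
    usable = min(1.0, instances * busy)
    return usable - (instances - 1) * switch_overhead

# A task that keeps the GPU 95% busy: a second instance barely helps,
# and the switching cost makes it a net loss.
assert throughput(2, 0.05, 0.10) < throughput(1, 0.05, 0.10)

# A task that leaves the GPU 40% idle: the second instance fills the
# gap and wins despite the overhead.
assert throughput(2, 0.40, 0.10) > throughput(1, 0.40, 0.10)
```

The exact numbers are irrelevant; the point is only that the second instance pays for itself when the idle gaps are larger than the switching cost, exactly as described above.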
-
Just to say (confirm) that the GTX 460 can work three WUs at the same time. On average, I got three results after 33 minutes.
-
More precisely, an older GPU can benefit from multiple tasks running only if there are big enough periods of idle GPU through app execution. Such an idle period should be long enough to offset the switching cost.
Astropulse OpenCL version for example benefits a lot when running multiple tasks. Check out my findings here: http://setiathome.berkeley.edu/forum_thread.php?id=62385&nowrap=true#1057180
-
Multibeam WU:
I have a 5670 and a 5770, and neither can run 2 WUs at the same time: BOINC shows normal progress for the first WU and no progress for the second WU, but when the first WU is finished, the second starts. Can you help me?
The 5670 has 512 MB RAM
The 5770 has 1024 MB RAM
so I think they have enough memory to process WUs. And they are so slow compared to CUDA...
-
Multibeam WU:
I have a 5670 and a 5770, and neither can run 2 WUs at the same time: BOINC shows normal progress for the first WU and no progress for the second WU, but when the first WU is finished, the second starts. Can you help me?
The 5670 has 512 MB RAM
The 5770 has 1024 MB RAM
so I think they have enough memory to process WUs. And they are so slow compared to CUDA...
You do have -instances_per_device 2 set in your app_info, don't you?
Claggy
-
Multibeam WU:
I have a 5670 and a 5770, and neither can run 2 WUs at the same time: BOINC shows normal progress for the first WU and no progress for the second WU, but when the first WU is finished, the second starts. Can you help me?
The 5670 has 512 MB RAM
The 5770 has 1024 MB RAM
so I think they have enough memory to process WUs. And they are so slow compared to CUDA...
You do have -instances_per_device 2 set in your app_info, don't you?
Claggy
No, I don't have it.
I tried changing <count>1</count> to <count>0.5</count> (and that is not the same, right)?
-
Multibeam WU:
I have a 5670 and a 5770, and neither can run 2 WUs at the same time: BOINC shows normal progress for the first WU and no progress for the second WU, but when the first WU is finished, the second starts. Can you help me?
The 5670 has 512 MB RAM
The 5770 has 1024 MB RAM
so I think they have enough memory to process WUs. And they are so slow compared to CUDA...
You do have -instances_per_device 2 set in your app_info, don't you?
Claggy
No, I don't have it.
I tried changing <count>1</count> to <count>0.5</count> (and that is not the same, right)?
You need to set both, read the opening post in the Beta thread:
ATI OpenCL MultiBeam beta testing (http://lunatics.kwsn.net/gpu-testing/ati-opencl-multibeam-beta-testing.0.html)
Claggy
-
Same thing again (with the setup you said to do):
the first result has progress,
the second does not (shows 0.000%).
-
Same thing again (with the setup you said to do):
the first result has progress,
the second does not (shows 0.000%).
I've just tried it on my HD5770; it works here with count set to 0.5 (so Boinc will start two instances), and with <cmdline>-period_iterations_num 2 -instances_per_device 2</cmdline> set too (so the app will allow two instances to run).
Claggy
Edit: Anyway, problems with the Beta ATI MB app should be posted in the Beta thread.
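Putting the two settings together: count tells BOINC how many tasks to start per GPU, and -instances_per_device tells the app how many to allow. A sketch of how they sit side by side in app_info.xml (the app name and other elements are placeholders; keep your own):

```xml
<app_version>
    <app_name>setiathome_enhanced</app_name>
    <!-- the app must agree to share the device... -->
    <cmdline>-period_iterations_num 2 -instances_per_device 2</cmdline>
    <coproc>
        <type>ATI</type>
        <!-- ...and BOINC must schedule two tasks onto it -->
        <count>0.5</count>
    </coproc>
</app_version>
```

If either half is missing you get exactly the symptom above: two tasks start, but one sits at 0.000% until the other finishes.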
-
And here are the two completed WUs:
resultid=1748608140 (http://setiathome.berkeley.edu/result.php?resultid=1748608140)
<core_client_version>6.10.58</core_client_version>
<![CDATA[
<stderr_txt>
Number of period iterations for PulseFind setted to:2
Number of app instances per device setted to:2
Running on device number: 0
Priority of worker thread raised successfully
Priority of process adjusted successfully, below normal priority class used
OpenCL platform detected: NVIDIA Corporation
OpenCL platform detected: Advanced Micro Devices, Inc.
BOINC assigns 0 device, slots 0 to 1 (including) will be checked
Used slot is 0; Info : Building Program (clBuildProgram):main kernels: OK code 0
resultid=1748608136 (http://setiathome.berkeley.edu/result.php?resultid=1748608136)
<core_client_version>6.10.58</core_client_version>
<![CDATA[
<stderr_txt>
Number of period iterations for PulseFind setted to:2
Number of app instances per device setted to:2
Running on device number: 0
Priority of worker thread raised successfully
Priority of process adjusted successfully, below normal priority class used
OpenCL platform detected: NVIDIA Corporation
OpenCL platform detected: Advanced Micro Devices, Inc.
BOINC assigns 0 device, slots 0 to 1 (including) will be checked
Used slot is 1; Info : Building Program (clBuildProgram):main kernels: OK code 0
Please check your own results.
Claggy