Seti@Home optimized science apps and information

Optimized Seti@Home apps => Windows => GPU crunching => Topic started by: TouchuvGrey on 08 Dec 2010, 09:48:03 pm

Title: Multiple WU's per GPU
Post by: TouchuvGrey on 08 Dec 2010, 09:48:03 pm
My old brain has failed me yet again. I know I have seen the instructions
here, but cannot recall where. I have 2 video cards, a GTS 250 and a GTX 460.
I would like to run 2 work units at the same time per card.
My cc_config.xml currently looks like this:

<cc_config>
<log_flags>
<sched_op_debug>1</sched_op_debug>
<work_fetch_debug>1</work_fetch_debug>
</log_flags>
<options>
<use_all_gpus>1</use_all_gpus>
</options>
</cc_config>

What do I need to change it to?
Title: Re: Multiple WU's per GPU
Post by: arkayn on 08 Dec 2010, 10:48:45 pm
You would need to change it in the app_info.xml file: find the <count> line and change its value to 0.5.
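A minimal Python sketch of that edit. It assumes the usual anonymous-platform layout, where each <app_version> carries a <coproc> element containing <type> and <count>. The trimmed SAMPLE file and the set_gpu_count helper are hypothetical, not part of any released installer. ElementTree drops XML comments when it rewrites a file, so keep a backup, and restart BOINC afterwards, since the client reads app_info.xml at startup.

```python
import xml.etree.ElementTree as ET


def set_gpu_count(app_info_xml: str, count: float) -> str:
    """Set <count> inside every <coproc> element and return the new XML text.

    A count of 0.5 tells BOINC each task uses half a GPU, so it will
    schedule two tasks per GPU; 1 means one task per GPU.
    """
    root = ET.fromstring(app_info_xml)
    for coproc in root.iter("coproc"):
        count_el = coproc.find("count")
        if count_el is None:  # add the element if the entry lacks one
            count_el = ET.SubElement(coproc, "count")
        count_el.text = f"{count:g}"
    return ET.tostring(root, encoding="unicode")


# Hypothetical, heavily trimmed app_info.xml (a real one has more entries).
SAMPLE = """<app_info>
  <app_version>
    <app_name>setiathome_enhanced</app_name>
    <version_num>608</version_num>
    <coproc>
      <type>CUDA</type>
      <count>1</count>
    </coproc>
  </app_version>
</app_info>"""

if __name__ == "__main__":
    print(set_gpu_count(SAMPLE, 0.5))
```

Because the same <count> applies to every GPU task of that app version, the setting is not per card.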
Title: Re: Multiple WU's per GPU
Post by: Josef W. Segur on 09 Dec 2010, 03:11:11 pm
...
i have 2 video cards a GTS 250 and a GTX 460.
i would like to run 2 work units at the same time  per card.
...

The 200 series cards like your GTS 250 are not capable of running more than one WU at a time, and there's no way to tell BOINC to treat two cards in one host differently. You'll have to move a card to a different host or give up the idea.
Joe
Title: Re: Multiple WU's per GPU
Post by: TouchuvGrey on 09 Dec 2010, 08:00:31 pm
In that case I must be misunderstanding what I am seeing (this is not unusual):

12/9/2010 6:18:42 PM   SETI@home   Restarting task 13ja10aa.7071.21335.12.10.234_0 using setiathome_enhanced version 608
12/9/2010 6:18:42 PM   SETI@home   Restarting task 13ja10aa.7071.21335.12.10.228_0 using setiathome_enhanced version 608
12/9/2010 6:18:42 PM   SETI@home   Restarting task 13ja10aa.7071.21335.12.10.225_0 using setiathome_enhanced version 608
12/9/2010 6:18:42 PM   SETI@home   Restarting task 13ja10aa.7071.21335.12.10.222_0 using setiathome_enhanced version 608

12/9/2010 6:18:42 PM      [wfd]: work fetch start
12/9/2010 6:18:42 PM   SETI@home   chosen: minor shortfall NVIDIA GPU: 0.00 inst, 936855.26 sec
12/9/2010 6:18:42 PM      [wfd] ------- start work fetch state -------
12/9/2010 6:18:42 PM      [wfd] target work buffer: 0.86 + 864000.00 sec
12/9/2010 6:18:42 PM      [wfd] CPU: shortfall 6848715.19 nidle 7.84 saturated 0.00 busy 0.00 RS fetchable 0.00 runnable 0.00
12/9/2010 6:18:42 PM   SETI@home   [wfd] CPU: fetch share 0.00 LTD 0.00 backoff dt 3887.85 int 86400.00
12/9/2010 6:18:42 PM      [wfd] NVIDIA GPU: shortfall 936855.26 nidle 0.00 saturated 394927.13 busy 0.00 RS fetchable 1000.00 runnable 1000.00
12/9/2010 6:18:42 PM   SETI@home   [wfd] NVIDIA GPU: fetch share 1.00 LTD 0.00 backoff dt 0.00 int 0.00
12/9/2010 6:18:42 PM   SETI@home   [wfd] overall LTD -1931022.03

It looks to me like I'm running 2 WUs on each card. If that is not the case,
please enlighten me as to what I'm seeing.
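A rough way to read that log in Python: count the distinct "Restarting task" lines per application version and divide by the number of GPUs. This is only a sketch under assumptions: that the version-608 tasks are the CUDA ones, that they all run at the same time, and that a restart line says nothing about which card a task landed on. The restarted_tasks helper is hypothetical.

```python
import re

# Matches BOINC event-log lines such as
# "... SETI@home   Restarting task <name> using setiathome_enhanced version 608"
RESTART_RE = re.compile(r"Restarting task (\S+) using (\S+) version (\d+)")


def restarted_tasks(log_text: str) -> dict:
    """Map (application, version) -> number of distinct tasks restarted."""
    seen = set()
    counts = {}
    for task, app, version in RESTART_RE.findall(log_text):
        if task in seen:  # ignore lines that were pasted twice
            continue
        seen.add(task)
        key = (app, version)
        counts[key] = counts.get(key, 0) + 1
    return counts


LOG = """\
12/9/2010 6:18:42 PM   SETI@home   Restarting task 13ja10aa.7071.21335.12.10.234_0 using setiathome_enhanced version 608
12/9/2010 6:18:42 PM   SETI@home   Restarting task 13ja10aa.7071.21335.12.10.228_0 using setiathome_enhanced version 608
12/9/2010 6:18:42 PM   SETI@home   Restarting task 13ja10aa.7071.21335.12.10.225_0 using setiathome_enhanced version 608
12/9/2010 6:18:42 PM   SETI@home   Restarting task 13ja10aa.7071.21335.12.10.222_0 using setiathome_enhanced version 608
"""

if __name__ == "__main__":
    gpus = 2  # GTS 250 + GTX 460
    for (app, version), n in restarted_tasks(LOG).items():
        print(f"{app} v{version}: {n} tasks, {n / gpus:g} per GPU")
```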
Title: Re: Multiple WU's per GPU
Post by: Jason G on 09 Dec 2010, 10:05:19 pm
200 series (and even 8800 series, like the GTS 250 ::)) *should* run 2 instances at a time fine, provided you don't run out of memory. They just won't benefit directly from doing so, since context-switch hardware wasn't included until Fermi. Since you have a 400 series card there, the likely benefit will outweigh any added cost penalty on the older card (your mileage may vary).

Note that the behaviour you're seeing comes AFTER many driver revisions and improvements, so I am surprised that it is working, too. Joe's statements were quite correct not so long ago (though I can't pinpoint the exact dates/versions of the corrections; too many changes too quickly ;))

... CUDA 3.1 was most definitely broken when mixing generations in the same host, which I reported through nVidia's registered developer program. These things are fixed in CUDA 3.2. The CUDA 3.0 build you're running should also be fine, as your host shows; just be certain to keep an eye on things ;)
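The "provided you don't run out of memory" condition boils down to simple arithmetic. A toy Python estimate, assuming each task needs a fixed amount of card memory and some memory is held back for the display and driver; the max_instances helper and every figure in the usage note are hypothetical, so measure real per-task usage before relying on it.

```python
def max_instances(card_mb: int, per_task_mb: int, reserve_mb: int = 0) -> int:
    """How many tasks fit in a card's memory (toy estimate).

    card_mb     -- memory on the card
    per_task_mb -- memory one task allocates (measure it; do not guess)
    reserve_mb  -- memory kept back for the display and driver
    """
    if per_task_mb <= 0:
        raise ValueError("per_task_mb must be positive")
    return max(0, (card_mb - reserve_mb) // per_task_mb)


if __name__ == "__main__":
    print(max_instances(512, 200, 64))
```

With the made-up figures of 200 MB per task and 64 MB reserved, a 512 MB card fits two tasks and a 1024 MB card fits four.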
Title: Re: Multiple WU's per GPU
Post by: Vyper on 10 Dec 2010, 05:54:58 am
I have done quite a thorough benchmark over at my blog, with performance data for various official/unofficial executables and the benefits seen when doing full WU runs.

Head over there to check it out:

http://vyper.kafit.se

Kind regards Vyper
Title: Re: Multiple WU's per GPU
Post by: Frizz on 10 Dec 2010, 06:31:01 am
Is it rape when I click on the link? I mean ... it's in Sweden  ;D
Title: Re: Multiple WU's per GPU
Post by: Vyper on 10 Dec 2010, 07:54:36 am
Lol!! No! :D

Regards Vyper
Title: Re: Multiple WU's per GPU
Post by: PatrickV2 on 10 Dec 2010, 12:13:27 pm
So, err, just so I have it spelled out to me: ;)

My 8800GTX should also be able to run 2 WUs at the same time, but it will not make any difference in the total throughput?

So, bottom line, the current way it crunches (1 WU at a time) is fine and dandy?

Regards,

Patrick.
Title: Re: Multiple WU's per GPU
Post by: skildude on 10 Dec 2010, 12:55:22 pm
The key is that newer GPUs can handle multiple apps running at once. Older cards cannot.
Title: Re: Multiple WU's per GPU
Post by: Raistmer on 10 Dec 2010, 01:20:14 pm
More precisely, an older GPU can benefit from running multiple tasks only if there are long enough periods of GPU idle time during app execution. Such idle periods must be long enough to offset the switching cost.
If not, running several instances per GPU will be counterproductive because of the switching overhead.
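Raistmer's trade-off can be put into a toy Python model. This is our own simplification, not how the driver actually schedules work: each task needs `busy` seconds of GPU work, leaves the GPU `idle` for some seconds (CPU-side work, transfers) when running alone, and costs an extra `switch` seconds of GPU time per task once tasks share the card. The throughput and speedup helpers are hypothetical.

```python
def throughput(n: int, busy: float, idle: float, switch: float) -> float:
    """Tasks finished per second with n tasks sharing one GPU (toy model)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if n == 1:
        return 1.0 / (busy + idle)  # a lone task pays no switching cost
    gpu_bound = n * (busy + switch)     # the GPU must do all n tasks' work
    chain_bound = busy + switch + idle  # each task still needs its own time
    return n / max(gpu_bound, chain_bound)


def speedup(n: int, busy: float, idle: float, switch: float) -> float:
    """Throughput with n tasks relative to running one task at a time."""
    return throughput(n, busy, idle, switch) / throughput(1, busy, idle, switch)


if __name__ == "__main__":
    for idle, switch in [(20, 5), (20, 30), (0, 5)]:
        print(idle, switch, round(speedup(2, 100, idle, switch), 3))
```

In this model two instances pay off exactly when `switch < idle`: a small switching cost against a real idle gap gives a gain, while a large switching cost, or no idle time at all, makes the second instance counterproductive.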
Title: Re: Multiple WU's per GPU
Post by: Pepi on 22 Dec 2010, 07:25:43 pm
Just to say (confirm) that a GTX 460 can work three WUs at the same time. On average, I got three results after 33 minutes.
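The arithmetic behind that setup, as a short Python sketch. It assumes BOINC only starts another GPU task while the summed <count> values stay at or below 1, so for three tasks the count is rounded down to 0.33 rather than up to 0.34. The helper names are hypothetical.

```python
import math


def count_for_instances(n: int) -> float:
    """GPU <count> value that lets n tasks share one GPU.

    Rounded DOWN to two decimals so n * count never exceeds 1.0.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return math.floor(100 / n) / 100


def minutes_per_result(batch_minutes: float, results: int) -> float:
    """Average wall-clock minutes per result when results finish together."""
    return batch_minutes / results


if __name__ == "__main__":
    print(count_for_instances(3), minutes_per_result(33, 3))
```

Three results in 33 minutes works out to 11 minutes per result, which is the figure to compare against the card's time for a single WU run on its own.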
Title: Re: Multiple WU's per GPU
Post by: Frizz on 23 Dec 2010, 05:01:19 am
The Astropulse OpenCL version, for example, benefits a lot from running multiple tasks. Check out my findings here: http://setiathome.berkeley.edu/forum_thread.php?id=62385&nowrap=true#1057180
Title: Re: Multiple WU's per GPU
Post by: Pepi on 23 Dec 2010, 06:18:59 am
MultiBeam WUs:
I have an HD 5670 and an HD 5770, and neither can run 2 WUs at the same time: BOINC shows normal progress for the first WU and no progress for the second, and only when the first WU finishes does the second start. Can you help me?
The 5670 has 512 MB RAM.
The 5770 has 1024 MB RAM.
So I think they have enough memory to process a WU. And they are so slow compared to CUDA...
Title: Re: Multiple WU's per GPU
Post by: Claggy on 23 Dec 2010, 06:23:07 am
You do have -instances_per_device 2 set in your app_info, don't you?

Claggy
Title: Re: Multiple WU's per GPU
Post by: Pepi on 23 Dec 2010, 06:28:48 am
No, I don't have it.
I tried changing <count>1</count> to <count>0.5</count> (and that is not the same, right?)
Title: Re: Multiple WU's per GPU
Post by: Claggy on 23 Dec 2010, 06:35:08 am
You need to set both; read the opening post in the Beta thread:

ATI OpenCL MultiBeam beta testing (http://lunatics.kwsn.net/gpu-testing/ati-opencl-multibeam-beta-testing.0.html)

Claggy
Title: Re: Multiple WU's per GPU
Post by: Pepi on 23 Dec 2010, 06:42:05 am
Same thing again (with the setup you told me to use):

the first result has progress
the second does not (shows 0,000%)
Title: Re: Multiple WU's per GPU
Post by: Claggy on 23 Dec 2010, 07:09:05 am
I've just tried it on my HD5770, and it works here with count set to 0.5 (so BOINC will start two instances) and <cmdline>-period_iterations_num 2 -instances_per_device 2</cmdline> set too (so the app will allow two instances to run).

Claggy   

Edit: anyway, problems with the Beta ATI MB app should be posted in the Beta thread.
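Since both settings are needed, a small Python check can flag an app_info.xml that has only one of them, which is the count-only situation described above. This is a sketch: the check_multi_instance helper and the trimmed sample entries are hypothetical, the <type>ATI</type> coproc layout is assumed, and -instances_per_device is the ATI OpenCL MB beta app's own switch, not a BOINC setting.

```python
import xml.etree.ElementTree as ET


def check_multi_instance(app_info_xml: str, n: int) -> list:
    """List why n tasks per GPU will not happen; [] means both settings are present."""
    problems = []
    root = ET.fromstring(app_info_xml)
    for av in root.iter("app_version"):
        name = av.findtext("app_name", default="?")
        # Setting 1: BOINC only starts n tasks per GPU if each claims <= 1/n of it.
        count_text = av.findtext("coproc/count")
        if count_text is None or float(count_text) * n > 1.0 + 1e-9:
            problems.append(f"{name}: <count> is {count_text}, needs <= {1 / n:.2f}")
        # Setting 2: the app itself must be told to allow n instances per device.
        tokens = (av.findtext("cmdline") or "").split()
        value = None
        if "-instances_per_device" in tokens:
            i = tokens.index("-instances_per_device")
            if i + 1 < len(tokens):
                value = tokens[i + 1]
        if value is None or not value.isdigit() or int(value) < n:
            problems.append(f"{name}: <cmdline> needs -instances_per_device {n}")
    return problems


# Hypothetical, trimmed app_version entries for the ATI OpenCL MB app.
CONFIGURED = """<app_info><app_version>
  <app_name>setiathome_enhanced</app_name>
  <cmdline>-period_iterations_num 2 -instances_per_device 2</cmdline>
  <coproc><type>ATI</type><count>0.5</count></coproc>
</app_version></app_info>"""

COUNT_ONLY = CONFIGURED.replace(
    "<cmdline>-period_iterations_num 2 -instances_per_device 2</cmdline>", "")

if __name__ == "__main__":
    print(check_multi_instance(CONFIGURED, 2))
    print(check_multi_instance(COUNT_ONLY, 2))
```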
Title: Re: Multiple WU's per GPU
Post by: Claggy on 23 Dec 2010, 08:21:13 am
And here are the two completed WUs:

resultid=1748608140 (http://setiathome.berkeley.edu/result.php?resultid=1748608140)

<core_client_version>6.10.58</core_client_version>
<![CDATA[
<stderr_txt>
Number of period iterations for PulseFind setted to:2
Number of app instances per device setted to:2
Running on device number: 0
Priority of worker thread raised successfully
Priority of process adjusted successfully, below normal priority class used
OpenCL platform detected: NVIDIA Corporation
OpenCL platform detected: Advanced Micro Devices, Inc.
BOINC assigns 0 device, slots 0 to 1 (including) will be checked
Used slot is 0;   Info : Building Program (clBuildProgram):main kernels: OK code 0

resultid=1748608136 (http://setiathome.berkeley.edu/result.php?resultid=1748608136)

<core_client_version>6.10.58</core_client_version>
<![CDATA[
<stderr_txt>
Number of period iterations for PulseFind setted to:2
Number of app instances per device setted to:2
Running on device number: 0
Priority of worker thread raised successfully
Priority of process adjusted successfully, below normal priority class used
OpenCL platform detected: NVIDIA Corporation
OpenCL platform detected: Advanced Micro Devices, Inc.
BOINC assigns 0 device, slots 0 to 1 (including) will be checked
Used slot is 1;   Info : Building Program (clBuildProgram):main kernels: OK code 0

Please check your own results.

Claggy