Author Topic: Multiple WU's per GPU (Read 45637 times)

TouchuvGrey · « **on:** 08 Dec 2010, 09:48:03 pm »

My old brain has failed me yet again. i know i have seen the instructions
here, but cannot recall where. i have 2 video cards a GTS 250 and a GTX 460.
i would like to run 2 work units at the same time per card.
My cc_config.xml currently looks like this:

<cc_config>
<log_flags>
<sched_op_debug>1</sched_op_debug>
<work_fetch_debug>1</work_fetch_debug>
</log_flags>
<options>
<use_all_gpus>1</use_all_gpus>
</options>
</cc_config>

What do i need to change it to ?

arkayn · « **Reply #1 on:** 08 Dec 2010, 10:48:45 pm »

Quote from: TouchuvGrey on 08 Dec 2010, 09:48:03 pm

My old brain has failed me yet again. i know i have seen the instructions
here, but cannot recall where. i have 2 video cards a GTS 250 and a GTX 460.
i would like to run 2 work units at the same time per card.
My cc_config.xml currently looks like this:

<cc_config>
<log_flags>
<sched_op_debug>1</sched_op_debug>
<work_fetch_debug>1</work_fetch_debug>
</log_flags>
<options>
<use_all_gpus>1</use_all_gpus>
</options>
</cc_config>

What do i need to change it to ?

You would need to change it in the app_info.xml file, not cc_config.xml: find the line that says count and change it to 0.5.
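
For illustration, a minimal sketch of the relevant part of app_info.xml, assuming the CUDA app is registered as setiathome_enhanced version 608 (the name and version that appear in the log later in this thread). The `<type>` value and the placeholder comments are assumptions; your file will have its own entries around the `<coproc>` block, and only the `<count>` value needs to change.

```xml
<!-- Fragment of app_info.xml: edit only the count value in your own file. -->
<app_version>
    <app_name>setiathome_enhanced</app_name>
    <version_num>608</version_num>
    <!-- ... your existing entries stay as they are ... -->
    <coproc>
        <type>CUDA</type>
        <!-- 0.5 means each task claims half a GPU, so BOINC runs two per card -->
        <count>0.5</count>
    </coproc>
    <!-- ... your existing file_ref entries stay as they are ... -->
</app_version>
```

The count applies to every GPU the app runs on, so with two cards in one host both will be given two tasks each. BOINC reads app_info.xml at startup, so the client has to be restarted after the edit.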

Josef W. Segur · « **Reply #2 on:** 09 Dec 2010, 03:11:11 pm »

Quote from: TouchuvGrey on 08 Dec 2010, 09:48:03 pm

...
i have 2 video cards a GTS 250 and a GTX 460.
i would like to run 2 work units at the same time per card.
...

The 200 series cards like your GTS 250 are not capable of running more than one WU at a time, and there's no way to tell BOINC to treat two cards in one host differently. You'll have to move a card to a different host or give up the idea.
Joe

TouchuvGrey · « **Reply #3 on:** 09 Dec 2010, 08:00:31 pm »

i must be misunderstanding what i am seeing in
that case ( this is not unusual )

12/9/2010 6:18:42 PM   SETI@home   Restarting task 13ja10aa.7071.21335.12.10.234_0 using setiathome_enhanced version 608
12/9/2010 6:18:42 PM   SETI@home   Restarting task 13ja10aa.7071.21335.12.10.228_0 using setiathome_enhanced version 608
12/9/2010 6:18:42 PM   SETI@home   Restarting task 13ja10aa.7071.21335.12.10.225_0 using setiathome_enhanced version 608
12/9/2010 6:18:42 PM   SETI@home   Restarting task 13ja10aa.7071.21335.12.10.222_0 using setiathome_enhanced version 608
12/9/2010 6:18:42 PM      [wfd]: work fetch start
12/9/2010 6:18:42 PM   SETI@home   chosen: minor shortfall NVIDIA GPU: 0.00 inst, 936855.26 sec
12/9/2010 6:18:42 PM      [wfd] ------- start work fetch state -------
12/9/2010 6:18:42 PM      [wfd] target work buffer: 0.86 + 864000.00 sec
12/9/2010 6:18:42 PM      [wfd] CPU: shortfall 6848715.19 nidle 7.84 saturated 0.00 busy 0.00 RS fetchable 0.00 runnable 0.00
12/9/2010 6:18:42 PM   SETI@home   [wfd] CPU: fetch share 0.00 LTD 0.00 backoff dt 3887.85 int 86400.00
12/9/2010 6:18:42 PM      [wfd] NVIDIA GPU: shortfall 936855.26 nidle 0.00 saturated 394927.13 busy 0.00 RS fetchable 1000.00 runnable 1000.00
12/9/2010 6:18:42 PM   SETI@home   [wfd] NVIDIA GPU: fetch share 1.00 LTD 0.00 backoff dt 0.00 int 0.00
12/9/2010 6:18:42 PM   SETI@home   [wfd] overall LTD -1931022.03

it looks to me like i'm running 2 WU's on each card. If that is not the case
please enlighten me as to what i'm seeing.

Jason G · « **Reply #4 on:** 09 Dec 2010, 10:05:19 pm »

200 series (and even 8800 series like the GTS 250) *should* run 2 instances at a time fine provided you don't run out of memory. They just won't benefit from doing so directly, since context switch hardware wasn't included until Fermi. Since you have 400 series there, the likely benefit will outweigh any added cost penalty to the older card (Your mileage may vary).

Note that the operation you're seeing is AFTER many driver revisions & improvements, so I am surprised that it is working also. Joe's statements were quite correct not so long ago (though I can't pinpoint the exact dates/versions of the corrections. Too many changes too quickly).

... Cuda 3.1 was most definitely broken with mixing generations in the same host, which I reported through nVidia's registered developer program. These things are fixed in Cuda 3.2. The Cuda 3.0 build in operation should also be fine, as your host shows; just be certain to keep an eye on things.

Vyper · « **Reply #5 on:** 10 Dec 2010, 05:54:58 am »

I have done quite a thorough benchmark over at my blog, with performance data for various official/unofficial executables and the benefits when doing full WU runs.

Head over there to check..

http://vyper.kafit.se

Kind regards Vyper

Frizz · « **Reply #6 on:** 10 Dec 2010, 06:31:01 am »

Quote from: Vyper on 10 Dec 2010, 05:54:58 am

Head over there to check..

http://vyper.kafit.se

Is it rape when I click on the link? I mean ... it's in Sweden

Vyper · « **Reply #7 on:** 10 Dec 2010, 07:54:36 am »

Quote from: Frizz on 10 Dec 2010, 06:31:01 am

Quote from: Vyper on 10 Dec 2010, 05:54:58 am
Head over there to check..

http://vyper.kafit.se

Is it rape when I click on the link? I mean ... it's in Sweden

Lol!! No!

Regards Vyper

PatrickV2 · « **Reply #8 on:** 10 Dec 2010, 12:13:27 pm »

Quote from: Jason G on 09 Dec 2010, 10:05:19 pm

200 series (and even 8800 series like the GTS 250) *should* run 2 instances at a time fine provided you don't run out of memory. They just won't benefit from doing so directly, since context switch hardware wasn't included until Fermi. Since you have 400 series there, the likely benefit will outweigh any added cost penalty to the older card (Your mileage may vary).

Note that the operation you're seeing is AFTER many driver revisions & improvements, so I am surprised that it is working also. Joe's statements were quite correct not so long ago (though I can't pinpoint the exact dates/versions of the corrections. Too many changes too quickly).

... Cuda 3.1 was most definitely broken with mixing generations in the same host, which I reported through nVidia's registered developer program. These things are fixed in Cuda 3.2. The Cuda 3.0 build in operation, should also be fine as your host shows, just be certain to keep an eye on things

So, err, just so I have it spelled out to me:

My 8800GTX should also be able to run 2 WU's at the same time, but it will not make any difference in the total throughput?

So, bottom-line, the current way it crunches (1 WU at a time) is fine and dandy?

Regards,

Patrick.

skildude · « **Reply #9 on:** 10 Dec 2010, 12:55:22 pm »

The key is that newer GPUs can handle multiple apps running at once; older cards cannot.

Raistmer · « **Reply #10 on:** 10 Dec 2010, 01:20:14 pm »

More precisely, an older GPU can benefit from running multiple tasks only if the app leaves the GPU idle for long enough periods during execution. Those idle periods have to be long enough to offset the switching cost.
If not, running several instances per GPU will be counterproductive because of the switching overhead.

Pepi · « **Reply #11 on:** 22 Dec 2010, 07:25:43 pm »

Just to say (confirm) that a GTX 460 can work on three WUs at the same time. On average I got three results after 33 minutes.

Frizz · « **Reply #12 on:** 23 Dec 2010, 05:01:19 am »

Quote from: Raistmer on 10 Dec 2010, 01:20:14 pm

More precisely, an older GPU can benefit from running multiple tasks only if the app leaves the GPU idle for long enough periods during execution. Those idle periods have to be long enough to offset the switching cost.

The Astropulse OpenCL version, for example, benefits a lot from running multiple tasks. Check out my findings here: http://setiathome.berkeley.edu/forum_thread.php?id=62385&nowrap=true#1057180

Pepi · « **Reply #13 on:** 23 Dec 2010, 06:18:59 am »

Multibeam WUs:
I have a 5670 and a 5770, and neither can run 2 WUs at the same time: BOINC shows normal progress for the first WU and no progress for the second, but when the first WU finishes the second starts. Can you help me?
The 5670 has 512 MB RAM
The 5770 has 1024 MB RAM
so I think they have enough memory to process the WUs. And they are so slow compared to CUDA...

Claggy · « **Reply #14 on:** 23 Dec 2010, 06:23:07 am »

Quote from: Pepi on 23 Dec 2010, 06:18:59 am

Multibeam WUs:
I have a 5670 and a 5770, and neither can run 2 WUs at the same time: BOINC shows normal progress for the first WU and no progress for the second, but when the first WU finishes the second starts. Can you help me?
The 5670 has 512 MB RAM
The 5770 has 1024 MB RAM
so I think they have enough memory to process the WUs. And they are so slow compared to CUDA...

You do have -instances_per_device 2 set in your app_info, don't you?

Claggy
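
As a rough sketch of what that could look like in app_info.xml: this assumes the switch is passed to the ATI app through the `<cmdline>` element of its `<app_version>`, and that the `<coproc>` count is halved as well so that BOINC actually schedules two tasks per card. The placement and the `<type>` value are assumptions, not taken from a working file; keep the rest of your own entries as they are.

```xml
<!-- Fragment of app_info.xml for the ATI app; surrounding entries omitted. -->
<app_version>
    <!-- ... your existing app_name, version_num and other entries ... -->
    <!-- Assumed placement: the switch goes on the app's command line -->
    <cmdline>-instances_per_device 2</cmdline>
    <coproc>
        <type>ATI</type>
        <!-- 0.5 lets BOINC start two tasks per card -->
        <count>0.5</count>
    </coproc>
    <!-- ... your existing file_ref entries ... -->
</app_version>
```

If the count is 0.5 but the app is not told to allow two instances, the second task may simply wait for the first to finish, which would match the behaviour described above.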
