Author Topic: CUDA MB V12b for multi-GPU multicore hosts. (Read 29320 times)

Raistmer · « **Reply #30 on:** 27 Dec 2009, 02:51:56 pm »

OK, my own and Pappa's tests show that V12b is unneeded on single-GPU hosts.
That is, there is no advantage to using it on non-targeted hardware.

Raistmer · « **Reply #31 on:** 27 Dec 2009, 02:55:17 pm »

@glennaxl
PM sent.

Sutaru Tsureku · « **Reply #32 on:** 30 Dec 2009, 08:39:08 pm »

Raistmer, I have one or two questions...

SETI@home/NC subforum/'eFMer Priority' - (CUDA) app priority change- [Message 959115]

Which forum do you prefer, here or there?

Raistmer · « **Reply #33 on:** 31 Dec 2009, 04:46:15 am »

Quote from: Sutaru Tsureku on 30 Dec 2009, 08:39:08 pm

Raistmer, I have one or two questions...

SETI@home/NC subforum/'eFMer Priority' - (CUDA) app priority change- [Message 959115]

Which forum do you prefer, here or there?

I have answered most of these questions a few times already.
Once more: no, Windows is not smart enough to do priority-based scheduling across different cores.
About the differences between V12b and V12: most probably the small speed variations you see come just from the binaries being different.
If you look at the other published tests, you will see that this difference is quite variable.
In addition, limiting the number of cores available to the app will inherently lead to some small (or not so small; it strongly depends on the system load pattern) slowdown.
This was explained in another thread too. It is a kind of tradeoff: losing some speed under one set of conditions to get a boost under another.
And once more, V12b is intended to fight possible x2 (or x3, if 3 GPU processes are launched on the same core) slowdowns. If you don't see such slowdowns with V12 now, there is no reason to use V12b.
About priority: I still haven't seen your tests of V12b without eFMer's priority tool. Do you see a difference in speed or not??
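The x2/x3 figures follow from a simple toy model. This sketch assumes, for illustration only, that every GPU feeder process needs a full core all the time and that nothing else runs on that core (real feeders are not 100% CPU-bound): k processes sharing one core each get about 1/k of it, so their elapsed time stretches by up to a factor of k. The function name `worst_case_slowdown` is hypothetical.

```python
from collections import Counter


def worst_case_slowdown(core_of):
    """Toy model: core_of[i] is the CPU core GPU process i ends up running on.

    Assumes every process is fully CPU-bound, so k processes on one core
    each get ~1/k of it and their elapsed time grows by a factor of k.
    Returns the per-process slowdown factor.
    """
    load = Counter(core_of)  # number of processes per core
    return [load[core] for core in core_of]


if __name__ == "__main__":
    print(worst_case_slowdown([0, 0, 1]))  # [2, 2, 1]: two share core 0
    print(worst_case_slowdown([0, 0, 0]))  # [3, 3, 3]: all three on core 0
    print(worst_case_slowdown([0, 1, 2]))  # [1, 1, 1]: one process per core
```

Pinning instances to separate cores removes the k>1 cases, but, as noted above, it also caps how many cores each app can use, which is where the small slowdown under other load patterns comes from.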

Raistmer · « **Reply #34 on:** 16 Jan 2010, 09:31:00 am »

@glennaxl
Could you attach (zipped) logs from those 8 CPU benchmarks you ran together with 3 or 2 GPUs, for the results you posted earlier? It will give the full picture. Thanks in advance.

glennaxl · « **Reply #35 on:** 16 Jan 2010, 10:03:32 am »

Quote from: Raistmer on 16 Jan 2010, 09:31:00 am

@glennaxl
Could you attach (zipped) logs from those 8 CPU benchmarks you ran together with 3 or 2 GPUs, for the results you posted earlier? It will give the full picture. Thanks in advance.

Here they are. It's good I still have those logs, so I don't have to re-run the tests.

[attachment deleted by admin]

Raistmer · « **Reply #36 on:** 16 Jan 2010, 10:27:06 am »

Thanks a lot, I will see what I can learn from them.

Raistmer · « **Reply #37 on:** 16 Jan 2010, 02:10:16 pm »

Quote from: glennaxl on 16 Jan 2010, 10:03:32 am

Quote from: Raistmer on 16 Jan 2010, 09:31:00 am
@glennaxl
Could you attach (zipped) logs from those 8 CPU benchmarks you ran together with 3 or 2 GPUs, for the results you posted earlier? It will give the full picture. Thanks in advance.

Here they are. It's good I still have those logs, so I don't have to re-run the tests.

Unfortunately, a very old version of the KNA bench was used. It reports only elapsed time, without CPU time, and only in whole seconds.
But let's see what picture we can get from such data, at least...
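For comparison, a bench harness can record both elapsed and CPU time, with sub-second resolution, for each app it runs. This is a minimal POSIX sketch in Python, not the KNA bench's actual interface: `run_and_time` and the `cmd` argument are hypothetical, and the `resource` module is Unix-only, so on Windows the same figures would have to come from something like the Win32 `GetProcessTimes` call.

```python
import resource
import subprocess
import sys
import time


def run_and_time(cmd):
    """Run cmd to completion; return (elapsed_s, cpu_s) as floats.

    CPU time is user + system time of waited-for child processes, taken as
    the difference of RUSAGE_CHILDREN before and after the run.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(cmd, check=True)  # waits, so the child is counted below
    elapsed = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return elapsed, cpu


if __name__ == "__main__":
    # Hypothetical stand-in for a science app: a child that mostly sleeps.
    elapsed, cpu = run_and_time([sys.executable, "-c", "import time; time.sleep(1)"])
    print(f"elapsed {elapsed:.2f} s, CPU {cpu:.2f} s")
```

Having both numbers per run shows how much of a core each GPU app actually consumes, which elapsed time alone cannot tell.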

Raistmer · « **Reply #38 on:** 16 Jan 2010, 03:06:37 pm »

OK, results from one of glennaxl's hosts, the one with 2 GPUs:

What was expected: higher load on the first 4 CPUs. ~~What was unexpected: sometimes a bigger load on the CPUs with higher numbers. Here both groups are sometimes overloaded and sometimes not; it's strange~~.
EDIT:
Actually, because the CPU tasks run without affinity, the CPU assigned to a particular bench number can change from task to task. So the CPU results are completely expected! 4 cores always have a higher load than the other 4.
It would be interesting to test the x4 build on the same host. There I would expect only 2 overloaded cores instead of 4.

IMO, the elapsed times for the GPU apps don't allow choosing the best app.

If additional tests on that host are possible, here is what I would love to have:

1) The benchmark script replaced with something newer; possible samples are attached to this post. The lack of CPU times for the GPU app is very sad.
2) test-wu6 can be excluded completely. It is VLAR-killed anyway.
3) There is no need for so much work on the CPU now. The GPU was loaded for only ~350 seconds, while the CPU was loaded for ~1600 seconds in total. If the CPU were loaded slightly longer than the GPU, that would be OK for my purposes and would save time for productive crunching (although there is nothing wrong with doing all test WUs on the CPU too).
4) Slightly changed experiment conditions:
a) A single GPU0 run, with the CPU not loaded at all.
b) A single GPU0 run with the CPU fully loaded, as here.
c) A separate (this is important) run for V12 with both GPUs + all CPUs loaded.
d) Again, a separate run for V12b: both GPUs, all CPUs.
e) A separate run for V12b x4: both GPUs, all CPUs.

Is it possible to perform these additional tests?
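Point 3 comes down to simple arithmetic: keep adding test WUs to each CPU bench instance until its combined run time just covers the GPU busy time. A small sketch of that rule, assuming per-WU CPU times are known from an earlier run and that each CPU bench instance processes its WUs back to back; `pick_cpu_workload`, the WU names, the times, and the 10% `margin` are hypothetical placeholders, not glennaxl's measurements.

```python
def pick_cpu_workload(cpu_times, gpu_busy_s, margin=1.1):
    """Take WUs in the given order until their summed CPU time reaches
    gpu_busy_s * margin, i.e. the CPU stays loaded slightly longer than the GPU.

    cpu_times: list of (wu_name, seconds). Returns (chosen_names, total_seconds).
    If the list runs out first, every WU is returned.
    """
    target = gpu_busy_s * margin
    chosen, total = [], 0.0
    for name, seconds in cpu_times:
        if total >= target:
            break  # CPU load already outlasts the GPU run
        chosen.append(name)
        total += seconds
    return chosen, total


if __name__ == "__main__":
    # Hypothetical per-WU CPU times, for illustration only.
    wus = [("wu_a", 200.0), ("wu_b", 300.0), ("wu_c", 400.0)]
    print(pick_cpu_workload(wus, gpu_busy_s=350.0))  # (['wu_a', 'wu_b'], 500.0)
```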

[attachment deleted by admin]

Raistmer · « **Reply #39 on:** 16 Jan 2010, 04:07:56 pm »

And here is the data for another host.
Looks like the third GPU likes the x4 build.

If possible, the same new set of tests would be very nice to have for this host too.

glennaxl · « **Reply #40 on:** 23 Jan 2010, 10:08:14 am »

Another round of tests, same rigs as before. A 3rd rig test is coming, but somehow it's crashing on test B. The 3rd rig is a Q6600 with dual GTX 260s on a P55 chipset board.

Test Cases:
TEST A: 1 GPU, No CPU
TEST B: 1 GPU, 100% CPU
TEST C: v12 ALL GPU, 100% CPU
TEST D: v12b ALL GPU, 100% CPU
TEST E: v12b x4 ALL GPU, 100% CPU

TEST A:
1) a. GPU0 @ GTX295-CORE0 (v12 vs v12b)
b. GPU0 @ GTX295-CORE0 (v12 vs v12b x4)
2) a. GPU1 @ GTX260 (v12 vs v12b)
b. GPU1 @ GTX260 (v12 vs v12b x4)
3) a. GPU2 @ GTX295-CORE1 (v12 vs v12b)
b. GPU2 @ GTX295-CORE1 (v12 vs v12b x4)

TEST B:
CPU0-7 @i7 920 (AKv8 vs AKv8b)
1) a. GPU0 @ GTX295-CORE0 (v12 vs v12b)
b. GPU0 @ GTX295-CORE0 (v12 vs v12b x4)
2) a. GPU1 @ GTX260 (v12 vs v12b)
b. GPU1 @ GTX260 (v12 vs v12b x4)
3) a. GPU2 @ GTX295-CORE1 (v12 vs v12b)
b. GPU2 @ GTX295-CORE1 (v12 vs v12b x4)

TEST C:
1) GPU0 @ GTX295-CORE0 (stock609 vs v12)
GPU1 @ GTX260 (stock609 vs v12)
GPU2 @ GTX295-CORE1 (stock609 vs v12)
CPU0-7 @i7 920 (AKv8 vs AKv8b)

TEST D:
1) GPU0 @ GTX295-CORE0 (v12 vs v12b)
GPU1 @ GTX260 (v12 vs v12b)
GPU2 @ GTX295-CORE1 (v12 vs v12b)
CPU0-7 @i7 920 (AKv8 vs AKv8b)

TEST E:
1) GPU0 @ GTX295-CORE0 (v12 vs v12b x4)
GPU1 @ GTX260 (v12 vs v12b x4)
GPU2 @ GTX295-CORE1 (v12 vs v12b x4)
CPU0-7 @i7 920 (AKv8 vs AKv8b)

[attachment deleted by admin]

glennaxl · « **Reply #41 on:** 23 Jan 2010, 02:12:32 pm »

The 3rd rig I mentioned: upgrading the driver from 195.62 to 196.21 fixed the issue.

It seems the speedup is smaller on the Q6600 than on the i7 920.
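Comparing speedups across rigs is easier with one fixed definition. This sketch assumes speedup means the percentage reduction in elapsed time from one app version to the other on the same rig; `speedup_pct` is a hypothetical helper and the timings in the example are made up, not taken from the attached logs.

```python
def speedup_pct(t_before, t_after):
    """Percentage reduction in elapsed time going from t_before to t_after.

    Positive means the second app finished faster; negative means it was slower.
    """
    if t_before <= 0:
        raise ValueError("t_before must be positive")
    return 100.0 * (t_before - t_after) / t_before


if __name__ == "__main__":
    # Hypothetical elapsed times in seconds (V12 vs V12b) on two rigs.
    print(speedup_pct(400.0, 360.0))  # 10.0
    print(speedup_pct(400.0, 380.0))  # 5.0
```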

[attachment deleted by admin]

Raistmer · « **Reply #42 on:** 23 Jan 2010, 02:15:47 pm »

Thanks a lot!
I will look at the results.

glennaxl · « **Reply #43 on:** 23 Jan 2010, 06:25:39 pm »

These results are from Tests D and E.

Raistmer · « **Reply #44 on:** 23 Jan 2010, 06:36:25 pm »

Looks like V12b makes some sense for hosts with 3 GPUs ~~but not for hosts with only 2 GPUs.~~ [Rig 3 has only 2 GPUs too...]
V12b x4 takes only 1 CPU per instance, and on an i7 CPU that means 2 instances sit on the same physical core because of HyperThreading. That's sub-optimal, of course, so the x4 results are almost always worse.
V12b takes 2 CPUs per instance, that is, always a full i7 core, but it uses only the first 4 CPUs, so again, 3 instances will use 2 i7 cores instead of 3.
I will try to do some i7-related tuning, and maybe the results will be clearer...
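The core counting above can be checked with a small model. This is a hypothetical sketch, not V12b's actual code: it assumes each instance gets a fixed block of logical CPUs handed out round-robin from the first 4, and that Windows numbers HyperThreading siblings adjacently (logical CPUs 0/1 on physical core 0, 2/3 on core 1, and so on). `affinity_mask` builds the kind of bitmask the Win32 `SetProcessAffinityMask` call accepts; all function names here are made up for illustration.

```python
def instance_cpus(instance, cpus_per_instance, pool=4):
    """Hypothetical mapping: logical CPUs given to one instance, handed out
    round-robin in blocks of cpus_per_instance from CPUs 0..pool-1."""
    slots = pool // cpus_per_instance
    first = (instance % slots) * cpus_per_instance
    return list(range(first, first + cpus_per_instance))


def affinity_mask(cpus):
    """Bitmask with bit n set for logical CPU n."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask


def physical_cores_used(n_instances, cpus_per_instance, threads_per_core=2):
    """Set of physical cores touched, assuming HT siblings 2k and 2k+1 share core k."""
    cores = set()
    for i in range(n_instances):
        for cpu in instance_cpus(i, cpus_per_instance):
            cores.add(cpu // threads_per_core)
    return cores


if __name__ == "__main__":
    # V12b: 2 logical CPUs per instance, 3 GPU instances.
    print([instance_cpus(i, 2) for i in range(3)])  # [[0, 1], [2, 3], [0, 1]]
    print(physical_cores_used(3, 2))                # {0, 1}
    # V12b x4: 1 logical CPU per instance, 2 GPU instances.
    print([instance_cpus(i, 1) for i in range(2)])  # [[0], [1]]
    print(physical_cores_used(2, 1))                # {0}
    print(hex(affinity_mask(instance_cpus(1, 2))))  # 0xc
```

Under these assumptions the model reproduces the counts in the post: three V12b instances land on 2 physical cores, and two x4 instances share a single physical core.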
