CUDA MB V12b for multi-GPU multicore hosts.
glennaxl:
--- Quote from: Raistmer on 16 Jan 2010, 09:31:00 am ---@glennaxl
Could you attach the (zipped) logs from those 8-CPU benchmarks you ran with 3 or 2 GPUs together, for the results you posted earlier? They will give the full picture. Thanks in advance.
--- End quote ---
Here they are. It's good I still have those logs, so I don't have to re-run the benchmarks.
[attachment deleted by admin]
Raistmer:
Thanks a lot, I'll see what I can learn from them :)
Raistmer:
--- Quote from: glennaxl on 16 Jan 2010, 10:03:32 am ---
--- Quote from: Raistmer on 16 Jan 2010, 09:31:00 am ---@glennaxl
Could you attach the (zipped) logs from those 8-CPU benchmarks you ran with 3 or 2 GPUs together, for the results you posted earlier? They will give the full picture. Thanks in advance.
--- End quote ---
Here they are. It's good I still have those logs, so I don't have to re-run the benchmarks.
--- End quote ---
Unfortunately, a very old version of the KNA bench was used. It reports only elapsed time, without CPU time, and only in whole seconds.
But let's see what picture we get from this data at least...
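To illustrate the difference between the two measurements, here is a minimal Python sketch (not the KNA bench itself, which is not shown in this thread) that records both elapsed time and child CPU time for one command, with sub-second resolution. The helper name time_command is hypothetical. Note that on Windows os.times() reports zero for child processes, so the CPU figure is only meaningful on Unix-like systems.

```python
import os
import subprocess
import sys
import time


def time_command(cmd):
    """Run cmd and return (elapsed_seconds, child_cpu_seconds).

    Hypothetical helper for illustration only; not part of any
    existing benchmark script.
    """
    before = os.times()
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    elapsed = time.perf_counter() - start
    after = os.times()
    # CPU time consumed by waited-for child processes (user + system).
    cpu = (after.children_user - before.children_user) + (
        after.children_system - before.children_system
    )
    return elapsed, cpu


if __name__ == "__main__":
    # A small CPU-bound stand-in for a real benchmark application.
    elapsed, cpu = time_command([sys.executable, "-c", "sum(range(10**6))"])
    print(f"elapsed: {elapsed:.3f} s, CPU: {cpu:.3f} s")
```

A GPU app that mostly waits on the card would show a CPU time well below its elapsed time, which is exactly the information an elapsed-only log cannot give.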
Raistmer:
OK, here are the results from one of glennaxl's hosts, the one with 2 GPUs:
What was expected: higher load on the first 4 CPUs. What was unexpected: sometimes a bigger load on the higher-numbered CPUs. Both groups are sometimes overloaded and sometimes not, which is strange.
EDIT:
Actually, because the CPU tasks run without affinity, the CPU assigned to a particular bench number can change from task to task. So the CPU results are completely expected! 4 cores always have a higher load than the other 4.
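For comparison, this is roughly what running a task with affinity would look like. It is only a sketch in Python under stated assumptions: os.sched_setaffinity exists only on Linux, the helper name run_pinned is hypothetical, and the benchmark discussed here does not do this. With a pinned task, a given bench number would stay on the same core from task to task.

```python
import os


def run_pinned(task):
    """Run task() with this process pinned to one allowed CPU.

    Returns (cpu_index, result). cpu_index is None on platforms
    without os.sched_setaffinity (anything other than Linux), where
    the task simply runs unpinned. Hypothetical helper for illustration.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None, task()
    original = os.sched_getaffinity(0)  # CPUs this process may use now
    target = min(original)              # pick the lowest allowed CPU
    os.sched_setaffinity(0, {target})
    try:
        return target, task()
    finally:
        # Restore the original mask so later work is not restricted.
        os.sched_setaffinity(0, original)


if __name__ == "__main__":
    cpu, value = run_pinned(lambda: sum(range(1000)))
    print("ran on CPU", cpu, "result", value)
```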
It would be interesting to test the x4 build on the same host. There I would expect only 2 overloaded cores instead of 4.
IMO the elapsed times for the GPU apps don't allow choosing the best app.
If additional tests on that host are possible, here is what I would love to have:
1) The benchmark script replaced with something newer; possible samples are attached to this post. The lack of CPU times for the GPU app is very sad.
2) test-wu6 can be excluded completely. It gets VLAR-killed anyway.
3) No need for so much CPU work now. The GPU was loaded for only ~350 seconds and the CPU for ~1600 seconds in total. If the CPU were loaded slightly longer than the GPU, that would be OK for my purposes and would save time for productive crunching :) (although there is nothing wrong with running all test WUs on the CPU too).
4) Slightly changed experiment conditions:
a) single GPU0 run, with no CPU load at all.
b) single GPU0 run with the CPU fully loaded, as here.
c) separate (this is important) run for V12 with both GPUs and all CPUs loaded.
d) again, a separate run for V12b with both GPUs and all CPUs loaded.
e) separate run for V12b x4 with both GPUs and all CPUs loaded.
Is it possible to perform these additional tests?
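The five runs requested in item 4 can be written down as a small table, sketched here in Python purely as a checklist. The field names are hypothetical, the second GPU is assumed to be numbered 1, and the build for runs a) and b) is left as None because it is not named above.

```python
# Requested runs a) to e). "gpus" lists the device indices in use;
# index 1 for the second GPU is an assumption.
RUNS = [
    {"label": "a", "build": None, "gpus": (0,), "cpu_loaded": False},
    {"label": "b", "build": None, "gpus": (0,), "cpu_loaded": True},
    {"label": "c", "build": "V12", "gpus": (0, 1), "cpu_loaded": True},
    {"label": "d", "build": "V12b", "gpus": (0, 1), "cpu_loaded": True},
    {"label": "e", "build": "V12b x4", "gpus": (0, 1), "cpu_loaded": True},
]


def describe(run):
    """Return a one-line, human-readable summary of one requested run."""
    build = run["build"] or "build not specified"
    gpus = ", ".join(f"GPU{i}" for i in run["gpus"])
    cpu = "all CPUs loaded" if run["cpu_loaded"] else "CPUs idle"
    return f"{run['label']}) {build}: {gpus}, {cpu}"


if __name__ == "__main__":
    for run in RUNS:
        print(describe(run))
```

Each entry is meant to be a separate benchmark session, so that the logs of one build are not mixed with those of another.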
[attachment deleted by admin]
Raistmer:
And here is the data for another host.
It looks like the third GPU likes the x4 build :)
If possible, the same new set of tests would be very nice to have for this host too.