
CUDA MB V12b for multi-GPU multicore hosts.


Raistmer:
@glennaxl
Unfortunately, the final bench output contains only CPU times; no elapsed times are provided.
Could you attach the log files from the TestDatas directory too, please?
And could you describe the environment you used:
under what conditions was the bench running? Were some background CPU tasks enabled, or was only a single GPU working while the second GPU and the CPU cores sat idle during the test?

I gave some explanation of the idea behind the V12b mod on SETI main, but I will try to explain it here again. A better understanding of the idea could lead to better benchmark configs.

Here is what we can see with previous versions on hosts running 2 or more CUDA MB processes and 2, 4 or 8 (duo, quad or i7) CPU MB/AP processes (or CPU-based apps from other projects, it doesn't matter):
Windows can pair 2 CUDA MB processes on a single core, leaving all other cores for the CPU processes. This heavily increases CUDA MB initialization times and reduces performance.
One solution is to leave CPU cores idle (i.e., not running CPU-based apps at all). But this reduces host performance.
Another one, implemented in V12b, is to restrict the cores available to the GPU apps so that they reside on different cores.
With otherwise idle CPU cores this can/will reduce app performance (because when Windows uses that core for its own needs, it can't move the GPU process to another, idle core). That's what we see in a standalone test when all other cores/GPUs are idle.
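
To make the "different cores" idea concrete, here is a minimal sketch of how a one-core affinity mask per GPU device could be computed. This is not the V12b source: the function name and the mapping (device 0 on the highest-numbered core, device 1 on the next one down) are assumptions for illustration only. On Windows, a mask like this is the kind of value the Win32 call SetProcessAffinityMask accepts.

```python
def affinity_mask_for_device(device: int, n_cores: int) -> int:
    """Return a bitmask with exactly one bit set, one distinct core per GPU device.

    Hypothetical mapping (an assumption, not taken from V12b):
    device 0 -> highest-numbered core, device 1 -> the next one down,
    wrapping around if there are more devices than cores.
    """
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    if device < 0:
        raise ValueError("device must be non-negative")
    core = (n_cores - 1) - (device % n_cores)
    return 1 << core


if __name__ == "__main__":
    # An i7 with hyperthreading shows 8 logical cores.
    for dev in range(3):
        print(dev, bin(affinity_mask_for_device(dev, 8)))
```

Because each mask has a single bit set, two GPU processes can never land on the same core, but a pinned process also cannot migrate to an idle core, which is the downside described above.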
The possible advantage of V12b can be highlighted (or it will be proved that there is no benefit at all ;) ) by measuring both the elapsed and the CPU times of the GPU app in the following config:
for an i7 CPU:
8 KNA benches running with the same/different test tasks in separate directories, CPU MB app
+
2 KNA benches in separate directories running a) V12 b) V12b.
A possible difference between the timings in cases a) and b) could give valuable info.
But again, a fully loaded system is required for this test (!)

And a note: due to the variable nature of Windows scheduling decisions, a) should be run a few times, because sometimes the GPU apps (provided the number of GPUs is less than the number of cores) get paired with each other on a single core, and sometimes not.

The order of bench launches should be:
CPU apps first (!),
GPU apps second (GPU apps should be launched on an already busy system; this precisely emulates the usual BOINC state).
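
The launch order above can be sketched as a small plan builder. This is only an illustration under assumptions: the directory names (cpu_0, gpu_0, ...) and the function name are made up, and the KNA bench script itself is not reproduced here. The -device n switch is the one mentioned later in this thread for selecting a GPU.

```python
from pathlib import Path


def build_launch_plan(root, n_cpu, gpu_devices):
    """Return (kind, directory, extra_args) tuples in the order they should start.

    All CPU bench directories come first, so the GPU benches are
    started on an already loaded system.
    Directory names are hypothetical placeholders.
    """
    root = Path(root)
    plan = []
    for i in range(n_cpu):
        plan.append(("cpu", root / f"cpu_{i}", []))
    for dev in gpu_devices:
        plan.append(("gpu", root / f"gpu_{dev}", ["-device", str(dev)]))
    return plan


if __name__ == "__main__":
    # i7 case from the post: 8 CPU benches, then 2 GPU benches.
    for kind, directory, extra in build_launch_plan("bench", 8, [0, 1]):
        print(kind, directory, extra)
```

Each entry could then be started in order with something like subprocess.Popen(cmd + extra_args, cwd=directory), where cmd is whatever command runs the bench in that directory.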

glennaxl:
@Raistmer
As requested:
-created 11 KNA bench folders (8 CPU & 3 GPU)
-launched them, CPU first, then GPU
also
-modified the script to run on a specific device (-device n, where n is the device number)
-tweaked the CPU-Z reporting, using the latest version for correct info


--- Code: ---
Quick timetable for GPU0 (gtx295)
 
WU : testWU-1.wu
MB_6.08_CUDA_V12_VLARKill_FPLim2048_test.exe : 29 seconds
MB_6.08_mod_CUDA_V12b.exe : 31 seconds
Speedup: -6.90%, Ratio: 0.94 x
MB_6.08_mod_CUDA_V12b_x4.exe : 37 seconds
Speedup: -27.59%, Ratio: 0.78 x
 
WU : testWU-2.wu
MB_6.08_CUDA_V12_VLARKill_FPLim2048_test.exe : 32 seconds
MB_6.08_mod_CUDA_V12b.exe : 35 seconds
Speedup: -9.38%, Ratio: 0.91 x
MB_6.08_mod_CUDA_V12b_x4.exe : 42 seconds
Speedup: -31.25%, Ratio: 0.76 x
 
WU : testWU-3.wu
MB_6.08_CUDA_V12_VLARKill_FPLim2048_test.exe : 34 seconds
MB_6.08_mod_CUDA_V12b.exe : 39 seconds
Speedup: -14.71%, Ratio: 0.87 x
MB_6.08_mod_CUDA_V12b_x4.exe : 39 seconds
Speedup: -14.71%, Ratio: 0.87 x
 
WU : testWU-4.wu
MB_6.08_CUDA_V12_VLARKill_FPLim2048_test.exe : 30 seconds
MB_6.08_mod_CUDA_V12b.exe : 26 seconds
Speedup: 13.33%, Ratio: 1.15 x
MB_6.08_mod_CUDA_V12b_x4.exe : 24 seconds
Speedup: 20.00%, Ratio: 1.25 x
 
WU : testWU-5.wu
MB_6.08_CUDA_V12_VLARKill_FPLim2048_test.exe : 34 seconds
MB_6.08_mod_CUDA_V12b.exe : 38 seconds
Speedup: -11.76%, Ratio: 0.89 x
MB_6.08_mod_CUDA_V12b_x4.exe : 32 seconds
Speedup: 5.88%, Ratio: 1.06 x
 
WU : testWU-6.wu
MB_6.08_CUDA_V12_VLARKill_FPLim2048_test.exe : 4 seconds
MB_6.08_mod_CUDA_V12b.exe : 2 seconds
Speedup: 50.00%, Ratio: 2.00 x
MB_6.08_mod_CUDA_V12b_x4.exe : 3 seconds
Speedup: 25.00%, Ratio: 1.33 x
 
WU : testWU-7.wu
MB_6.08_CUDA_V12_VLARKill_FPLim2048_test.exe : 23 seconds
MB_6.08_mod_CUDA_V12b.exe : 23 seconds
Speedup: 0.00%, Ratio: 1.00 x
MB_6.08_mod_CUDA_V12b_x4.exe : 26 seconds
Speedup: -13.04%, Ratio: 0.88 x
--- End code ---
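
For anyone reading the table: the Speedup and Ratio figures are consistent with the usual definitions relative to the first (reference) app in each group, i.e. speedup = (ref - new) / ref * 100 and ratio = ref / new. A minimal sketch that reproduces them (the function name is just for illustration, it is not part of the bench script):

```python
def speedup_and_ratio(ref_seconds: float, new_seconds: float):
    """Return (speedup in percent, ratio) of a new app against a reference app.

    Positive speedup / ratio above 1.0 means the new app was faster.
    """
    speedup = (ref_seconds - new_seconds) / ref_seconds * 100.0
    ratio = ref_seconds / new_seconds
    return speedup, ratio


if __name__ == "__main__":
    # testWU-1: V12 took 29 s, V12b took 31 s
    s, r = speedup_and_ratio(29, 31)
    print(f"Speedup: {s:.2f}%, Ratio: {r:.2f} x")  # Speedup: -6.90%, Ratio: 0.94 x
```

Note that with times reported in whole seconds, a short task like testWU-6 (4 s vs 2 s) swings by tens of percent on a one-second difference, so those rows say little on their own.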


Pappa:
@glennaxl

When you go to post, you will notice a yellow "Additional Options"; that is where you click to upload the file. When it opens, you should see Attach and a "Browse" button that lets you find the file.

Pappa:
Okay, I have Aqua running on the X2 6000 using both cores...

Quick timetable
 
WU : 01-FMN0446.wu
setiathome_6.08_windows_intelx86__cuda.exe : 17.578 secs CPU
MB_6.08_CUDA_V12_noKill_FPLim2048.exe : 14.047 secs CPU
Speedup     : 20.09%
Ratio       : 1.25 x
MB_6.08_mod_CUDA_V12b.exe : 14.391 secs CPU
Speedup     : 18.13%
Ratio       : 1.22 x
 
WU : 02-FMN0446.wu
setiathome_6.08_windows_intelx86__cuda.exe : 17.703 secs CPU
MB_6.08_CUDA_V12_noKill_FPLim2048.exe : 13.781 secs CPU
Speedup     : 22.15%
Ratio       : 1.28 x
MB_6.08_mod_CUDA_V12b.exe : 14.250 secs CPU
Speedup     : 19.51%
Ratio       : 1.24 x
 
WU : 03-FMN0446.wu
setiathome_6.08_windows_intelx86__cuda.exe : 17.156 secs CPU
MB_6.08_CUDA_V12_noKill_FPLim2048.exe : 14.422 secs CPU
Speedup     : 15.94%
Ratio       : 1.19 x
MB_6.08_mod_CUDA_V12b.exe : 14.594 secs CPU
Speedup     : 14.93%
Ratio       : 1.18 x
 
WU : 04-FMN0446.wu
setiathome_6.08_windows_intelx86__cuda.exe : 17.375 secs CPU
MB_6.08_CUDA_V12_noKill_FPLim2048.exe : 13.422 secs CPU
Speedup     : 22.75%
Ratio       : 1.29 x
MB_6.08_mod_CUDA_V12b.exe : 14.953 secs CPU
Speedup     : 13.94%
Ratio       : 1.16 x
 
WU : 05-FMN0446.wu
setiathome_6.08_windows_intelx86__cuda.exe : 16.438 secs CPU
MB_6.08_CUDA_V12_noKill_FPLim2048.exe : 13.703 secs CPU
Speedup     : 16.64%
Ratio       : 1.20 x
MB_6.08_mod_CUDA_V12b.exe : 14.203 secs CPU
Speedup     : 13.60%
Ratio       : 1.16 x
 

Pappa:
Actually, here is the more complete test on the X2 6000,

adding in the "CUDAMB_V13noKill_ICCIPP_SSE3_AKPFTest_TK4.exe"

Quick timetable
 
WU : 01-FMN0446.wu
setiathome_6.08_windows_intelx86__cuda.exe : 16.938 secs CPU
CUDAMB_V13noKill_ICCIPP_SSE3_AKPFTest_TK4.exe : 24.297 secs CPU
Speedup     : -43.45%
Ratio       : 0.70 x
MB_6.08_CUDA_V12_noKill_FPLim2048.exe : 13.547 secs CPU
Speedup     : 20.02%
Ratio       : 1.25 x
MB_6.08_mod_CUDA_V12b.exe : 14.344 secs CPU
Speedup     : 15.31%
Ratio       : 1.18 x
 
WU : 02-FMN0446.wu
setiathome_6.08_windows_intelx86__cuda.exe : 16.703 secs CPU
CUDAMB_V13noKill_ICCIPP_SSE3_AKPFTest_TK4.exe : 24.938 secs CPU
Speedup     : -49.30%
Ratio       : 0.67 x
MB_6.08_CUDA_V12_noKill_FPLim2048.exe : 13.859 secs CPU
Speedup     : 17.03%
Ratio       : 1.21 x
MB_6.08_mod_CUDA_V12b.exe : 14.125 secs CPU
Speedup     : 15.43%
Ratio       : 1.18 x
 
WU : 03-FMN0446.wu
setiathome_6.08_windows_intelx86__cuda.exe : 16.984 secs CPU
CUDAMB_V13noKill_ICCIPP_SSE3_AKPFTest_TK4.exe : 24.359 secs CPU
Speedup     : -43.42%
Ratio       : 0.70 x
MB_6.08_CUDA_V12_noKill_FPLim2048.exe : 14.031 secs CPU
Speedup     : 17.39%
Ratio       : 1.21 x
MB_6.08_mod_CUDA_V12b.exe : 14.281 secs CPU
Speedup     : 15.91%
Ratio       : 1.19 x
 
WU : 04-FMN0446.wu
setiathome_6.08_windows_intelx86__cuda.exe : 16.609 secs CPU
CUDAMB_V13noKill_ICCIPP_SSE3_AKPFTest_TK4.exe : 25.094 secs CPU
Speedup     : -51.09%
Ratio       : 0.66 x
MB_6.08_CUDA_V12_noKill_FPLim2048.exe : 14.016 secs CPU
Speedup     : 15.61%
Ratio       : 1.19 x
MB_6.08_mod_CUDA_V12b.exe : 14.500 secs CPU
Speedup     : 12.70%
Ratio       : 1.15 x
 
WU : 05-FMN0446.wu
setiathome_6.08_windows_intelx86__cuda.exe : 17.094 secs CPU
CUDAMB_V13noKill_ICCIPP_SSE3_AKPFTest_TK4.exe : 24.984 secs CPU
Speedup     : -46.16%
Ratio       : 0.68 x
MB_6.08_CUDA_V12_noKill_FPLim2048.exe : 13.609 secs CPU
Speedup     : 20.39%
Ratio       : 1.26 x
MB_6.08_mod_CUDA_V12b.exe : 14.078 secs CPU
Speedup     : 17.64%
Ratio       : 1.21 x
 