AK V8 + CUDA MB team work mod

Raistmer:

--- Quote from: Grey Shadow on 03 Feb 2009, 01:42:20 pm ---Hi Raistmer,

Would you mind updating your mod to switch between CPU and GPU for VLAR units (as you mentioned on the first page: if the task is VLAR, stay with the CPU, else try to pass the task to the GPU)? VLAR autokill works perfectly; at last I have a stable SETI setup that uses the GPU and doesn't require constant attention. But sometimes it is a great pity to see 20 WUs, downloaded between constant "out of work" messages, go to the trash bin...

Thanks in advance :)

--- End quote ---

Well, I still can't figure out how to accomplish that without leaving the GPU idle. As you said, VLARs come in bunches (20 tasks in your example), so if the GPU refuses to do them and passes them to the CPU, the GPU will just sit idle. No gain. So for now it's much better to use V8 (not V8a): as I remember, it should process VLARs on the GPU instead of aborting them. BTW, the CUDA apps from all V8 packages are interchangeable, so you can just overwrite a single EXE (with BOINC stopped, of course) to switch between VLAR kill and VLAR processing.
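The proposed rule ("if task is VLAR - stay with CPU, else - try pass task to GPU") and Raistmer's objection to it can be sketched in a few lines of Python. This is a hypothetical illustration, not code from the mod: the `Task` type, the `route`/`gpu_share` helpers and the 0.12 angle-range cutoff are all assumptions, since the thread gives no threshold. It shows that when a whole bunch of VLARs arrives, the GPU is left with nothing to do.

```python
from dataclasses import dataclass

# Hypothetical cutoff: the thread does not state one; 0.12 is only an
# illustrative value for what counts as a "very low angle range" task.
VLAR_THRESHOLD = 0.12


@dataclass
class Task:
    name: str
    angle_range: float


def route(task: Task) -> str:
    """Proposed rule: VLAR stays on the CPU, everything else goes to the GPU."""
    return "cpu" if task.angle_range < VLAR_THRESHOLD else "gpu"


def gpu_share(queue: list[Task]) -> float:
    """Fraction of queued tasks the GPU would receive under route()."""
    if not queue:
        return 0.0
    return sum(route(t) == "gpu" for t in queue) / len(queue)


# A bunch of 20 VLARs (hypothetical angle ranges): the GPU gets no work.
batch = [Task(f"wu{i}", 0.01) for i in range(20)]
print(gpu_share(batch))  # 0.0
```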

Grey Shadow:
I understand your concern, but GPU crunching of a VLAR workunit takes at least as long as CPU crunching, so there will be almost no gain from allowing GPU crunching of VLARs. Also, when I was using the stock apps or V8, I was very worried by PC overload during VLAR crunching on the GPU: the PC becomes very, very slow and unstable, and it is almost impossible to use it for anything else during that time... I really think CPU efficiency also decreases under such conditions, so we lose any gain from GPU involvement.

Of course, the best way would be to postpone a VLAR workunit if the GPU is idle and go to the next available workunit, but as I understand it this is BOINC's job, so optimized apps can't achieve it without client support.
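Skipping a VLAR when the GPU is free and taking the next workunit instead is queue logic that would live in the BOINC client rather than in the science app. A minimal sketch of that logic alone, assuming (hypothetically) that the scheduler can see each task's angle range; the function names, the `(name, angle_range)` tuple layout and the 0.12 cutoff are illustrative only, not part of any BOINC API:

```python
from collections import deque
from typing import Optional

# Hypothetical VLAR cutoff; the thread does not give a value.
VLAR_THRESHOLD = 0.12


def next_for_gpu(queue: deque) -> Optional[tuple]:
    """Remove and return the first non-VLAR task, postponing VLARs.

    Tasks are (name, angle_range) tuples. Skipped VLARs keep their
    order in the queue so a CPU worker can take them later.
    """
    for i, (name, ar) in enumerate(queue):
        if ar >= VLAR_THRESHOLD:
            del queue[i]  # safe: we return before iterating further
            return (name, ar)
    return None  # only VLARs left: the GPU stays idle


def next_for_cpu(queue: deque) -> Optional[tuple]:
    """The CPU worker simply takes the oldest task, VLAR or not."""
    return queue.popleft() if queue else None
```

For a queue `deque([("a", 0.01), ("b", 0.02), ("c", 0.43)])`, `next_for_gpu` returns `("c", 0.43)` and leaves the two VLARs queued in order for `next_for_cpu`.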

Raistmer:

--- Quote from: Grey Shadow on 03 Feb 2009, 02:53:38 pm ---I understand your concern, but GPU-crunching of VLAR workunit requires at least the same time as CPU-crunching. So there will be almost no gain from allowing GPU-crunching of VLARs.

--- End quote ---
Well, you get one additional core instead of 2 or even 3. That's worse, of course, but not bad either ;)
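"One additional core instead of 2 or even 3" is a throughput ratio: the GPU is worth as many cores as the number of times faster it finishes a task than a single CPU core does. A small sketch of that arithmetic; it ignores the CPU time needed to feed the GPU (an assumption), and the timings in the usage note are hypothetical, not measurements from the thread:

```python
def core_equivalents(cpu_seconds: float, gpu_seconds: float) -> float:
    """How many CPU cores the GPU replaces for one task type.

    A GPU that finishes a task in gpu_seconds, while one CPU core needs
    cpu_seconds, does the work of cpu_seconds / gpu_seconds cores.
    CPU time spent feeding the GPU is deliberately ignored here.
    """
    if gpu_seconds <= 0:
        raise ValueError("gpu_seconds must be positive")
    return cpu_seconds / gpu_seconds
```

With hypothetical timings, `core_equivalents(3600, 3600)` gives 1.0 (a VLAR that is no faster on the GPU), while `core_equivalents(3600, 1200)` gives 3.0.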

--- Quote --- Also when I was using stock apps or V8 I was very worried by PC overload during VLAR crunching by GPU - PC starts working very-very slow and unstable, it is almost impossible to use it for any other purposes during that time...

--- End quote ---
I see, but because my experiment with tuning AK_v8 for different ARs failed (there was no speed increase when I did PGO on the VLAR dynamic data set only instead of the complete set), adding AR recognition is really low priority now.

--- Quote ---really I think that CPU efficiency in such conditions also decreases so we lose any gain from GPU involvement.

--- End quote ---
Not so sure. Don't forget: a sluggish user interface is one thing, and CPU app performance is completely another...

--- Quote ---Of course, the best way is to postpone VLAR workunit if GPU is idle and go to next available workunit, but as I understand this is BOINC task so optimized apps can't achieve this without client support.

--- End quote ---
Yes, interaction with the BOINC API is needed...

cyclejon:

--- Quote from: Raistmer on 28 Jan 2009, 08:02:15 pm ---As you can see, the SSE2 build performs better than the SSE3 one on SSE3-capable (early) AMD CPUs. Maybe the SSE3 situation has improved on the latest Phenom?
Could one of our pre-testers or regular users run the KWSN bench for AK_v8b_SSE3 and AK_v8b_SSE2 on a new Phenom CPU to shed light on the current quality of AMD SSE3 support?

--- End quote ---

I ran it on a Phenom 9500 with the v8b apps, and again with the regular v8 optimized apps. SSE2 was faster with the v8b apps, but the v8 results were mixed. My Athlon X2 4200 showed the same results, except when put against the x64 builds: SSE3 x64 ran faster than SSE2 x86. I'll attach the benchmark files from the Phenom.
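Deciding between two builds across several test WUs, as done here for SSE2 and SSE3, comes down to whether one build wins on every unit or the results are mixed. A sketch with a hypothetical `compare_builds` helper that takes per-WU timings in seconds; the dictionary layout is an assumption, not the KWSN bench output format:

```python
def compare_builds(times_a: dict, times_b: dict) -> str:
    """Compare two builds' per-WU timings (seconds) from benchmark runs.

    Returns "a" if build A is faster on every shared WU, "b" if build B
    is, and "mixed" otherwise (ties count as a win for neither build).
    """
    shared = times_a.keys() & times_b.keys()
    if not shared:
        raise ValueError("no common workunits to compare")
    a_wins = sum(times_a[w] < times_b[w] for w in shared)
    b_wins = sum(times_b[w] < times_a[w] for w in shared)
    if a_wins == len(shared):
        return "a"
    if b_wins == len(shared):
        return "b"
    return "mixed"
```

For example, with hypothetical timings, `compare_builds({"wu1": 100, "wu2": 200}, {"wu1": 110, "wu2": 190})` returns `"mixed"`.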

[attachment deleted by admin]

Raistmer:
WU : testWU-1.wu
AK_v8b_win_SSE3_GPU_CPU_team.exe : 199 seconds
AK_v8b_win_SSE2_GPU_CPU_team.exe : 6 seconds
Speedup: 96.98%, Ratio: 33.17 x

Strange result.
How did you prevent the call to the CUDA app?
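The Speedup and Ratio lines in the benchmark output above are consistent with speedup = (t_ref - t_test) / t_ref × 100 and ratio = t_ref / t_test; that the KWSN bench script computes them exactly this way is an assumption inferred from the numbers. A sketch that reproduces them:

```python
def bench_summary(ref_seconds: float, test_seconds: float) -> tuple[float, float]:
    """Speedup (%) and ratio of a test run against a reference run.

    speedup = (ref - test) / ref * 100
    ratio   = ref / test
    Both are rounded to two decimals, matching the output format above.
    """
    speedup = (ref_seconds - test_seconds) / ref_seconds * 100
    ratio = ref_seconds / test_seconds
    return round(speedup, 2), round(ratio, 2)


print(bench_summary(199, 6))  # (96.98, 33.17)
```

A 33x ratio between two CPU builds that differ mainly in SSE2 vs SSE3 code paths is far more than an instruction-set change would plausibly explain, which is presumably why the result looks strange and raises the question of whether the CUDA app was involved in one of the runs.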