AK V8 + CUDA MB team work mod

Raistmer:
Each time a CUDA app finishes, BOINC may decide to switch to another project. In that case the 4 (on a quad) SETI MB/AP apps will continue on the CPU, the GPU will sit idle, and another CPU-bound app (for another project) will be started.
The fewer cores a PC has, the more likely this situation is.

A solution could be for the CPU SETI MB app to check regularly whether the GPU app is running and, if it is not, exit immediately with zero status.
BOINC should treat this exit as a non-error exit. It will restart the task, allowing the CPU team app to call the CUDA app again and continue computations on the GPU.
This should decrease the amount of time the GPU stays idle.

I will try to make this modification (the CPU app could check whether the GPU is free or busy at each checkpoint interval, for example; right after making a checkpoint is a very good time to exit ;D )
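
A rough sketch of that check-at-checkpoint idea, written in Python for brevity (the real app is native code). All names here (`run_until_gpu_idle`, `do_step`, `write_checkpoint`, `gpu_app_running`) are hypothetical, and how the actual app detects a running GPU app is not described in this thread, so that part is left as a callback:

```python
from typing import Callable, Tuple

EXIT_OK = 0  # zero status, so the early exit is not reported as an error


def run_until_gpu_idle(total_steps: int,
                       checkpoint_every: int,
                       do_step: Callable[[int], None],
                       write_checkpoint: Callable[[int], None],
                       gpu_app_running: Callable[[], bool],
                       start_step: int = 0) -> Tuple[int, int]:
    """Process steps on the CPU; return (exit_status, steps_done).

    After every checkpoint the GPU is probed (hypothetical callback).
    If no GPU app is running, the function returns early so the task
    can be restarted and handed to the GPU. Progress is already saved
    at that point, so nothing is lost.
    """
    done = start_step
    while done < total_steps:
        do_step(done)
        done += 1
        if done % checkpoint_every == 0 or done == total_steps:
            write_checkpoint(done)
            if done < total_steps and not gpu_app_running():
                # Checkpoint just written: the cheapest moment to quit.
                return EXIT_OK, done
    return EXIT_OK, done
```

With `gpu_app_running` always returning `False`, the loop stops right after the first checkpoint; with it always returning `True`, the task simply runs to the end on the CPU.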

Raistmer:
OK, the app is modified and built; SSSE3 version only for now.

If anyone is interested in decreasing GPU idle time, please try it and report here. It reports a rescheduling attempt with this line in stderr:
"Idle GPU detected, trying to reschedule task to GPU, exiting..."

You need to stop BOINC, extract the AK_v8 version from the attached archive in place of the old one, and start BOINC.
Please don't forget: SSSE3 only for now.

[attachment deleted by admin]

Raistmer:
There is a situation in which VLAR autokill wastes resources:

If a VLAR task is already near completion on the CPU when re-scheduling to the GPU occurs, the almost complete task will be killed by the CUDA app autokill mod.

To avoid this, a VLAR task will not be killed but processed on the GPU if it was started before (that is, if it already has non-zero progress). If it's a fresh VLAR task, it will be aborted as before.
That way the CPU "investment" in the task is saved.
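
The rule above boils down to a small decision function. This is only a sketch in Python (the real CUDA app is native code); `vlar_action` and its `vlar_threshold` parameter are hypothetical, and the actual angle-range cutoff used by the autokill mod is not stated here:

```python
def vlar_action(angle_range: float, progress: float,
                vlar_threshold: float) -> str:
    """Decide what the GPU app does with a task.

    angle_range    -- the task's AR value
    progress       -- fraction already done (0.0 for a fresh task)
    vlar_threshold -- hypothetical cutoff below which a task counts as VLAR
    """
    if angle_range >= vlar_threshold:
        return "process"   # not a VLAR task, nothing special to do
    if progress > 0.0:
        return "continue"  # VLAR, but CPU work already invested: keep it
    return "abort"         # fresh VLAR task: autokill as before
```

For example, with an illustrative threshold of 0.05, a task with AR 0.01 and 90% progress is continued, while the same task at 0% progress is aborted.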

An update to the CUDA app is attached.
This update is suitable for any CPU SSE level.

ADDON: you can tell that it works from this line in stderr:
"VLAR WU (AR: xxxxx )detected, but task partially done already, continuing computations"

[attachment deleted by admin]

Raistmer:
And another update; this time it's a hack in the BOINC API.
This build should ignore BOINC's request to suspend execution when BOINC switches to another task.
This should reduce GPU idle time and increase total system performance.
The update is suitable for all SSE levels.

Special thanks go to Jason, who pointed me where to dig :)

Warning: I don't know whether it will work as intended, so consider this update experimental. If you have no time to watch your BOINC installation, or you don't feel able to deal with the possible consequences, please don't use it.

I would like to receive some feedback on whether it helps avoid the GPU idle state and how it works on your host in general.

[attachment deleted by admin]

Raistmer:
I noticed a bug in the current CUDA mod behavior. It's mostly an OS scheduling problem, but the net result is an idle GPU:

When I run another CPU-intensive single-threaded application (quantum chemistry calculations in this particular case), GPU temperature drops to almost its idle value and the CUDA app's CPU usage goes to zero. I use a quad, so only 1 core can be busy with the non-BOINC task => the OS could just reschedule the CUDA app onto another core... but it seems Vista doesn't understand that.
And what is the cure in this situation? I restricted the non-BOINC app to CPU#3 only and restricted the CUDA app to CPU#0-2 (all except 3). No process priorities were changed, only affinity.
After that GPU temperature recovered, %done began increasing, and the CUDA app's CPU usage was restored.

A pure example of a CPU scheduling bug in Vista, IMHO.

So I think a CUDA app modification with affinity bound to the first CPU can ease this situation (I can't re-set the CUDA app's affinity every 6 minutes, but I can exclude the first CPU for a very long-running non-BOINC app). Maybe it will also help with the other cases of CUDA hangs with 0 CPU consumption described before.
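
For reference, the affinity split described above can be written as CPU bitmasks (bit N set means CPU#N is allowed), which is the form the Windows `SetProcessAffinityMask` call accepts. The sketch below only computes the masks and does not change any process; it is in Python for brevity, and the helper names `affinity_mask` and `split_affinity` are hypothetical:

```python
from typing import Iterable, Tuple


def affinity_mask(cpus: Iterable[int]) -> int:
    """Build a bitmask with one bit set per allowed CPU index."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU index must be non-negative")
        mask |= 1 << cpu
    return mask


def split_affinity(total_cpus: int, reserved_cpu: int) -> Tuple[int, int]:
    """Return (mask for the non-BOINC app, mask for the CUDA app).

    The non-BOINC app gets only reserved_cpu; the CUDA app gets
    every other CPU.
    """
    if not 0 <= reserved_cpu < total_cpus:
        raise ValueError("reserved_cpu must be one of the existing CPUs")
    reserved = affinity_mask([reserved_cpu])
    all_cpus = (1 << total_cpus) - 1
    return reserved, all_cpus & ~reserved
```

On a quad, `split_affinity(4, 3)` gives `0b1000` for the non-BOINC app (CPU#3 only) and `0b0111` for the CUDA app (CPU#0-2). Binding the CUDA app to the first CPU only, as proposed, would be `affinity_mask([0])`, i.e. `0b0001`.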
