CUDA_V12_app
Raistmer:
On a quad or higher, V12b locks 2 cores to each app; V12bx4 locks 1 core (a different one, of course) to each app. The idea is to remove competition for the CPU between the high-priority GPU apps themselves. It will only be needed if we have a few GPU apps running, i.e. a multi-GPU host.
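For reference, a minimal sketch of what this kind of per-process pinning looks like with the Win32 API. The mask values are illustrative only, not taken from the actual V12b build:

/* Hypothetical illustration of per-process core locking on Windows.
   Each instance would be started with a different mask; 0x3 below
   (cores 0 and 1) is just an example. A second instance would use
   0xC (cores 2 and 3); the x4 variant would use 0x1, 0x2, 0x4, 0x8. */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    DWORD_PTR mask = 0x3;  /* bits 0 and 1 set: cores 0 and 1 */
    if (!SetProcessAffinityMask(GetCurrentProcess(), mask))
        fprintf(stderr, "SetProcessAffinityMask failed: %lu\n",
                (unsigned long)GetLastError());
    /* ...GPU feeding work then runs only on the reserved cores... */
    return 0;
}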
sunu:
Thanks Raistmer. So you distribute them evenly among the available cores; that is also the best approach in my thinking. I'll write a script in the Linux forum to simulate it in Linux, since we don't have the luxury of a new build.
I'll do something similar for CPU multibeam tasks too, but that is a little trickier. I'm trying to think of a "fixed" property of CPU tasks to key the affinity on; PIDs, running time of the task (older vs. newer) and anything else I can think of don't seem like good enough characteristics.
To me slot numbers seem best. They are pretty fixed; they might change only when BOINC goes into preempt mode. What do you think?
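A minimal sketch of that slot-number idea in C, assuming we have already mapped each slot number to the PID of the task running in it (e.g. by reading the BOINC slots/ directories); pin_task and the slot-to-core rule are hypothetical, not anything from BOINC itself:

/* Hypothetical: pin a BOINC task to the core given by its slot number.
   Changing another process's affinity needs suitable privileges. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int pin_task(pid_t pid, int slot)
{
    long ncores = sysconf(_SC_NPROCESSORS_ONLN);
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET((int)(slot % ncores), &set);   /* slot 0 -> core 0, etc. */
    if (sched_setaffinity(pid, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return -1;
    }
    return 0;
}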
Raistmer:
Yes, slot numbers should work. But it's worth checking whether affinity really increases throughput.
I've tried affinity locking a few times already and have always seen performance degradation for the app. Only in some special circumstances, such as an app feeding a GPU, can it bring some benefit. In most cases Windows does good thread allocation between cores (don't know about Linux).
What it (Windows) just can't do is reschedule a thread onto a different core based only on thread priority (maybe it can, but it doesn't do it when needed).
That is, 2 GPU apps on one core while 2 CPU-only apps run on another core is quite possible. The GPU app has higher priority but can't preempt the CPU app on the second core (an experimental fact with 2 early hybrid AP apps and 4 MB apps on my quad). It looks like a pretty big OS scheduler limitation; maybe it has been removed in some Windows version already, and maybe some MSDN reading on this topic would give more info...
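One quick way to check the throughput question, sketched here under my own assumptions rather than taken from any of the actual builds: time the same fixed workload free-running and then pinned to a single core.

/* Hypothetical timing harness: compare a fixed workload free-running
   vs. pinned to core 0. The busy loop is a stand-in for real work. */
#include <windows.h>
#include <stdio.h>

static double run_workload(void)
{
    LARGE_INTEGER f, t0, t1;
    volatile double x = 0.0;   /* volatile keeps the loop from being optimized away */
    QueryPerformanceFrequency(&f);
    QueryPerformanceCounter(&t0);
    for (long i = 0; i < 200000000L; i++)
        x += i * 0.5;
    QueryPerformanceCounter(&t1);
    return (double)(t1.QuadPart - t0.QuadPart) / (double)f.QuadPart;
}

int main(void)
{
    printf("free-running: %.3f s\n", run_workload());
    SetProcessAffinityMask(GetCurrentProcess(), 0x1);  /* pin to core 0 */
    printf("pinned:       %.3f s\n", run_workload());
    return 0;
}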
sunu:
--- Quote from: Raistmer on 14 Jan 2010, 10:58:56 am ---Yes, slot numbers should work. But it's worth checking whether affinity really increases throughput.
I've tried affinity locking a few times already and have always seen performance degradation for the app. Only in some special circumstances, such as an app feeding a GPU, can it bring some benefit.
--- End quote ---
The logical thing would be that four AK_v8 tasks, each fixed to its own core for its lifetime, would do better than having them jump around between cores every few seconds. You're saying that experiments have shown the opposite?
Raistmer:
--- Quote from: sunu on 14 Jan 2010, 11:28:04 am ---
--- Quote from: Raistmer on 14 Jan 2010, 10:58:56 am ---Yes, slot numbers should work. But it's worth checking whether affinity really increases throughput.
I've tried affinity locking a few times already and have always seen performance degradation for the app. Only in some special circumstances, such as an app feeding a GPU, can it bring some benefit.
--- End quote ---
The logical thing would be that four AK_v8 tasks, each fixed to its own core for its lifetime, would do better than having them jump around between cores every few seconds. You're saying that experiments have shown the opposite?
--- End quote ---
Yes, the opposite. The runtime for an affinity-locked app is longer.
For example, when one core is stalled on disk I/O or something else, it can be better to migrate a thread over from another core. In general, the number of active threads is never <= the number of cores; on Windows, especially Vista and later, the number of threads can reach thousands.
For example, my netbook under Vista currently runs 59 processes with 765 threads. Of course most of these threads are suspended, but there are only 2 CPUs to handle all the others...
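Those counts are easy to reproduce with the Win32 Toolhelp snapshot API; a small sketch (the API is standard, the program itself is just an illustration):

/* Count running processes and their threads, as in the netbook example. */
#include <windows.h>
#include <tlhelp32.h>
#include <stdio.h>

int main(void)
{
    HANDLE snap = CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0);
    PROCESSENTRY32 pe;
    int procs = 0;
    long threads = 0;
    if (snap == INVALID_HANDLE_VALUE)
        return 1;
    pe.dwSize = sizeof(pe);
    if (Process32First(snap, &pe)) {
        do {
            procs++;
            threads += pe.cntThreads;   /* per-process thread count */
        } while (Process32Next(snap, &pe));
    }
    CloseHandle(snap);
    printf("%d processes, %ld threads\n", procs, threads);
    return 0;
}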