Author Topic: CUDA_V12_app  (Read 43853 times)

Sutaru Tsureku

  • Guest
Re: CUDA_V12_app
« Reply #45 on: 13 Jan 2010, 11:51:37 am »
Thanks! :)

I thought this would also be interesting: 'Message 962882'.
Only ~200 MB of free GPU RAM? On both GTX 295s (all 4 GPUs); a Win7 problem?

What would be the easiest way?
Install WinXP?

Just curious: 'Message 962794', the same WinXP on more PCs? True or false?  :-\
This would be the easiest/cheapest way for Jon..


BTW, my browser needs ~10+ sec. to find this site before it can load it.
Is there a connection problem?
In the past it was much faster/immediate..


Offline Pappa

  • Alpha Tester
  • Knight o' The Round Table
  • ***
  • Posts: 216
Re: CUDA_V12_app
« Reply #46 on: 13 Jan 2010, 01:21:04 pm »
@Raistmer, how much further should I carry the experiment? This host has been running for quite a while, and the bulk of the errors are -12s, a -1, and VLAR-killed tasks.

Usability of the video is good... Running AQUA on the CPU.

http://setiathome.berkeley.edu/show_host_detail.php?hostid=5133086

Shortly I can switch SETI onto the CPUs if needed.


Offline Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 14349
Re: CUDA_V12_app
« Reply #47 on: 13 Jan 2010, 04:46:47 pm »
Do you run V12b? On a single GPU? At any moment :)
It's for multi-GPU, and even there the advantages are still questionable.

Offline Pappa

  • Alpha Tester
  • Knight o' The Round Table
  • ***
  • Posts: 216
Re: CUDA_V12_app
« Reply #48 on: 13 Jan 2010, 07:02:05 pm »
Do you run V12b? On a single GPU? At any moment :)
It's for multi-GPU, and even there the advantages are still questionable.

Actually I like the affinity feature, especially on an AMD. In the past I played with setting affinity and saw increased progress, but was not able to set affinity on each task 24/7. Even Crunch3r knew that if a task could get locked to a specific CPU, then on a dual-CPU machine one runs full tilt while the other handles most of the OS stuff. So to that extent, that part is a success.

Here is a LAR it just completed (2210.4 sec), at 0.21 AR: http://setiathome.berkeley.edu/result.php?resultid=1478568634

Now if we could get Sutaru to test it (live), we would have a better idea of its merit. It will not hurt his RAC, and after a month live it is "safe."

Offline sunu

  • Alpha Tester
  • Knight who says 'Ni!'
  • ***
  • Posts: 771
Re: CUDA_V12_app
« Reply #49 on: 13 Jan 2010, 07:21:03 pm »
In a multi-GPU, multi-core system there could be two possibilities: 1) all GPU tasks assigned to a single core, or 2) every GPU gets its own core. Have you considered these regarding affinity?
« Last Edit: 13 Jan 2010, 07:24:42 pm by sunu »

Offline Pappa

  • Alpha Tester
  • Knight o' The Round Table
  • ***
  • Posts: 216
Re: CUDA_V12_app
« Reply #50 on: 13 Jan 2010, 08:39:35 pm »
In a multi-GPU, multi-core system there could be two possibilities: 1) all GPU tasks assigned to a single core, or 2) every GPU gets its own core. Have you considered these regarding affinity?

It has been studied many times, and it can work to the advantage of the system administrator. Over the years there have been many utilities you could download that would "lock on launch" an application to a specific CPU. So while CPU 0 is handling Ring 0 stuff, you launch the database compression on CPU 1 with an affinity lock. So in this first experiment, Raistmer's intent was that on machines with multiple cores and multiple GPUs, the GPU tasks be assigned to a specific core. Looking at BOINC and how things are run, the apps migrate back and forth depending on what "might happen" at any given instant. With the advent of multicores, Crunch3r attempted to convince the BOINC devs that "affinity locking" might be a very good idea. He even produced a BOINC core or two to prove it.

So to actually prove it fully, all Lunatics apps would have to be CPU/core-aware (with a table set up to record what is where). Core 0 takes care of the OS and the other stuff the user plays with; the other cores, real or virtual, are then assigned to a specific app, so they run clean and uninterrupted.

In *nix you can assign certain things to CPUs, and that capability has been there for ages... I did that on a dual Pentium Pro under Slackware (back when it was only 19 1.44 MB floppies to load).
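
For reference, such a "lock on launch" utility needs very little code. Here is a minimal Win32 sketch of the idea; the target name myapp.exe and the choice of CPU 0 are just placeholders:

```c
/* "Lock on launch" sketch: start a program suspended, pin it to one CPU,
 * then let it run. myapp.exe is a hypothetical target. */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    STARTUPINFOA si = { sizeof(si) };
    PROCESS_INFORMATION pi;
    char cmd[] = "myapp.exe";   /* hypothetical target application */

    /* Start it suspended so it cannot run before we pin it. */
    if (!CreateProcessA(NULL, cmd, NULL, NULL, FALSE, CREATE_SUSPENDED,
                        NULL, NULL, &si, &pi)) {
        fprintf(stderr, "CreateProcess failed: %lu\n", GetLastError());
        return 1;
    }

    /* Bit 0 set => CPU 0 only (0x2 would mean CPU 1, and so on). */
    if (!SetProcessAffinityMask(pi.hProcess, 0x1))
        fprintf(stderr, "SetProcessAffinityMask failed: %lu\n", GetLastError());

    ResumeThread(pi.hThread);   /* now it runs, confined to CPU 0 */
    CloseHandle(pi.hThread);
    CloseHandle(pi.hProcess);
    return 0;
}
```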

Offline sunu

  • Alpha Tester
  • Knight who says 'Ni!'
  • ***
  • Posts: 771
Re: CUDA_V12_app
« Reply #51 on: 13 Jan 2010, 09:07:01 pm »
Thanks Pappa, I know all this, more or less. I'm asking how Raistmer implemented the affinity in this app: all CUDA tasks on one fixed core, or distributed among all available cores?

Offline Pappa

  • Alpha Tester
  • Knight o' The Round Table
  • ***
  • Posts: 216
Re: CUDA_V12_app
« Reply #52 on: 13 Jan 2010, 09:24:38 pm »
Thanks Pappa, I know all this, more or less. I'm asking how Raistmer implemented the affinity in this app: all CUDA tasks on one fixed core, or distributed among all available cores?

From what I read, all GPUs would use one core. But that was a few pages ago...

We wait for Raistmer   ;D

For nostalgia: this is team member #5, before it went to *nix.

http://seticlassic.ssl.berkeley.edu/stats/team/team_57956.html

Boy, how things change.

Offline sunu

  • Alpha Tester
  • Knight who says 'Ni!'
  • ***
  • Posts: 771
Re: CUDA_V12_app
« Reply #53 on: 13 Jan 2010, 10:22:39 pm »
From what I read, all GPUs would use one core. But that was a few pages ago...
We wait for Raistmer   ;D
Theoretically speaking, I think it would be better to distribute them among all cores, or not?   :-\
I'll try to do something similar in Linux through a script.

For nostalgia: this is team member #5, before it went to *nix.
http://seticlassic.ssl.berkeley.edu/stats/team/team_57956.html
Boy, how things change.

Oh, memories of SETI classic! :)
« Last Edit: 13 Jan 2010, 10:31:58 pm by sunu »

Offline Pappa

  • Alpha Tester
  • Knight o' The Round Table
  • ***
  • Posts: 216
Re: CUDA_V12_app
« Reply #54 on: 14 Jan 2010, 01:04:15 am »
From what I read, all GPUs would use one core. But that was a few pages ago...
We wait for Raistmer   ;D
Theoretically speaking, I think it would be better to distribute them among all cores, or not?   :-\
I'll try to do something similar in Linux through a script.

For nostalgia: this is team member #5, before it went to *nix.
http://seticlassic.ssl.berkeley.edu/stats/team/team_57956.html
Boy, how things change.

Oh, memories of SETI classic! :)

Actually, I think that part is what is being worked out. Silly me.  :o

Offline Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 14349
Re: CUDA_V12_app
« Reply #55 on: 14 Jan 2010, 03:16:25 am »
On a quad or higher, V12b locks 2 cores to each app; V12bx4 locks 1 core (a different one, of course) to each app. The idea is to remove competition for the CPU between the high-priority GPU apps themselves. It is only needed when a few GPU apps are running at once, i.e. on a multi-GPU host.
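
A minimal sketch of that per-instance locking scheme (not Raistmer's actual code; taking the instance index from the command line is an assumption made for the example):

```c
/* Per-instance core locking in the style Raistmer describes: instance n
 * takes its own slice of cores, 2 per app ("V12b") or 1 per app ("V12bx4"). */
#include <windows.h>
#include <stdlib.h>

static void lock_cores(int instance, int cores_per_app)
{
    DWORD_PTR mask = 0;
    for (int i = 0; i < cores_per_app; i++)
        mask |= (DWORD_PTR)1 << (instance * cores_per_app + i);
    SetProcessAffinityMask(GetCurrentProcess(), mask);
}

int main(int argc, char **argv)
{
    /* hypothetical convention: instance index passed as first argument */
    int instance = (argc > 1) ? atoi(argv[1]) : 0;
    /* V12b style on a quad: instance 0 -> cores 0,1; instance 1 -> cores 2,3 */
    lock_cores(instance, 2);
    /* ... GPU-feeding work would run here ... */
    return 0;
}
```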

Offline sunu

  • Alpha Tester
  • Knight who says 'Ni!'
  • ***
  • Posts: 771
Re: CUDA_V12_app
« Reply #56 on: 14 Jan 2010, 10:00:05 am »
Thanks Raistmer, so you distribute them evenly among the available cores; that is also the best in my thinking. I'll post a script in the Linux forum to simulate it in Linux, since we don't have the luxury of a new build.

I'll do something similar for CPU MultiBeam tasks as well, but that is a little trickier. I've tried to think of a "fixed" feature of CPU tasks to key the affinity on: PIDs, running time of the task (older vs. newer), and the like don't seem like good enough characteristics.

To me the best candidates seem to be slot numbers. They are pretty fixed; they might change only when BOINC goes into pre-empt mode. What do you think?
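
A small C sketch of that slot-to-core mapping (sunu plans a shell script; the pid/slot arguments and the core = slot % ncores rule are assumptions for illustration):

```c
/* Pin the task running in BOINC slot N to core N % ncores. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <pid> <slot>\n", argv[0]);
        return 1;
    }
    pid_t pid   = (pid_t)atoi(argv[1]);
    int   slot  = atoi(argv[2]);
    long ncores = sysconf(_SC_NPROCESSORS_ONLN);

    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(slot % ncores, &set);   /* slot number picks the core */

    if (sched_setaffinity(pid, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return 1;
    }
    return 0;
}
```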

Offline Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 14349
Re: CUDA_V12_app
« Reply #57 on: 14 Jan 2010, 10:58:56 am »
Yes, slot numbers should work. But it's worth checking whether affinity really increases throughput.
I have tested affinity locking a few times already and have always seen performance degradation for the app. Only in some special circumstances, such as an app feeding a GPU, can it bring some benefit. In most cases Windows does a good job of allocating threads between cores (I don't know about Linux).
What it (Windows) can't do is reschedule a thread onto a different core based only on thread priority (maybe it can, but it doesn't do it when needed).
That is, 2 GPU apps on one core while 2 CPU-only apps sit on another core is quite possible. The GPU app has higher priority but can't preempt the CPU app on the second core (an experimental fact with 2 early hybrid AP apps and 4 MB apps on my quad). It looks like a pretty big limitation of the OS kernel; maybe in some Windows version it has been removed already; some MSDN reading on this topic might give more info...
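
One crude way to run that check, sketched with a synthetic busy loop standing in for a real task: time the same workload with the thread free and then pinned, and compare.

```c
/* Time a fixed workload with the thread unpinned, then pinned to CPU 0. */
#include <windows.h>
#include <stdio.h>

static volatile double sink;   /* keeps the loop from being optimized away */

static double timed_work(void)
{
    LARGE_INTEGER f, t0, t1;
    QueryPerformanceFrequency(&f);
    QueryPerformanceCounter(&t0);
    double x = 0.0;
    for (long i = 0; i < 200000000L; i++)  /* synthetic workload */
        x += (double)i * 1e-9;
    sink = x;
    QueryPerformanceCounter(&t1);
    return (double)(t1.QuadPart - t0.QuadPart) / (double)f.QuadPart;
}

int main(void)
{
    printf("free   : %.3f s\n", timed_work());
    SetThreadAffinityMask(GetCurrentThread(), 0x1);  /* pin to CPU 0 */
    printf("pinned : %.3f s\n", timed_work());
    return 0;
}
```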

Offline sunu

  • Alpha Tester
  • Knight who says 'Ni!'
  • ***
  • Posts: 771
Re: CUDA_V12_app
« Reply #58 on: 14 Jan 2010, 11:28:04 am »
Yes, slot numbers should work. But it's worth checking whether affinity really increases throughput.
I have tested affinity locking a few times already and have always seen performance degradation for the app. Only in some special circumstances, such as an app feeding a GPU, can it bring some benefit.

The logical thing would be that four AK_v8 tasks fixed to their respective cores for their lifetime would do better than having them jump around the cores every few seconds. You say that experiments have shown the opposite?

Offline Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 14349
Re: CUDA_V12_app
« Reply #59 on: 14 Jan 2010, 12:55:24 pm »
Yes, slot numbers should work. But it's worth checking whether affinity really increases throughput.
I have tested affinity locking a few times already and have always seen performance degradation for the app. Only in some special circumstances, such as an app feeding a GPU, can it bring some benefit.

The logical thing would be that four AK_v8 tasks fixed to their respective cores for their lifetime would do better than having them jump around the cores every few seconds. You say that experiments have shown the opposite?
Yes, the opposite: the runtime for an affinity-locked app is longer.
For example, when one core is waiting for disk I/O or something else, it can be better to bring in a thread from another core. In general, the number of active threads is never <= the number of cores; on Windows, especially Vista and later, the number of threads can reach the thousands.
For example, my netbook under Vista currently runs 59 processes with 765 threads. Of course most of these threads are suspended, but there are only 2 CPUs to handle all the others...
