Author Topic: CUDA MB V12b for multi-GPU multicore hosts. (Read 38355 times)

Raistmer · « **on:** 20 Dec 2009, 06:20:53 pm »

The attached builds use an affinity lock plus priority class NORMAL / thread priority HIGHEST to get better CPU response times for their needs.
On my own quad with a single low-end GPU I don't see any improvement, so try them on your own hosts and see whether these builds increase the host's RAC or not.
The x4 version is designed for a quad (or better) with more than 2 GPUs installed.
The other one is for a multicore host (duo, quad or better) with only 2 GPUs installed.
But again, try both and see which works better on your particular equipment (the apps were not tested on the target hardware; these are only assumptions).
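For readers unfamiliar with the terms, here is a minimal sketch of what an affinity lock with priority class NORMAL and thread priority HIGHEST can look like through the Win32 API. It is not the code from the attached builds: the helper names `core_mask` and `lock_and_prioritise` are made up for illustration, and pinning to one caller-chosen core is an assumption, since the post does not say which cores the builds lock to. The Win32 functions and constants themselves (`SetProcessAffinityMask`, `SetPriorityClass`, `SetThreadPriority`) are the real ones.

```python
import sys

# Win32 constants as defined in the Windows SDK headers.
NORMAL_PRIORITY_CLASS = 0x00000020
THREAD_PRIORITY_HIGHEST = 2


def core_mask(core_index: int) -> int:
    """Return an affinity bitmask with only the bit for core_index set."""
    if core_index < 0:
        raise ValueError("core_index must be non-negative")
    return 1 << core_index


def lock_and_prioritise(core_index: int) -> bool:
    """Pin this process to one core and raise the calling thread's priority.

    Hypothetical helper: returns True only if all three Win32 calls
    succeeded. On non-Windows systems it changes nothing and returns False.
    """
    if sys.platform != "win32":
        return False

    import ctypes
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    # Declare signatures so 64-bit handles and masks are passed correctly.
    kernel32.GetCurrentProcess.restype = wintypes.HANDLE
    kernel32.GetCurrentThread.restype = wintypes.HANDLE
    kernel32.SetProcessAffinityMask.argtypes = (wintypes.HANDLE, ctypes.c_size_t)
    kernel32.SetProcessAffinityMask.restype = wintypes.BOOL
    kernel32.SetPriorityClass.argtypes = (wintypes.HANDLE, wintypes.DWORD)
    kernel32.SetPriorityClass.restype = wintypes.BOOL
    kernel32.SetThreadPriority.argtypes = (wintypes.HANDLE, ctypes.c_int)
    kernel32.SetThreadPriority.restype = wintypes.BOOL

    process = kernel32.GetCurrentProcess()
    thread = kernel32.GetCurrentThread()

    # Affinity lock: the scheduler may only run this process on one core.
    ok_affinity = kernel32.SetProcessAffinityMask(process, core_mask(core_index))
    # Process stays in the NORMAL class, the thread is bumped to HIGHEST.
    ok_class = kernel32.SetPriorityClass(process, NORMAL_PRIORITY_CLASS)
    ok_thread = kernel32.SetThreadPriority(thread, THREAD_PRIORITY_HIGHEST)
    return bool(ok_affinity and ok_class and ok_thread)
```

The idea is that a thread feeding a GPU wakes up promptly and is not migrated between cores, which is why any gain would be expected mainly on hosts with several fast GPUs.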

[attachment deleted by admin]

Raistmer · « **Reply #1 on:** 23 Dec 2009, 02:24:38 pm »

> 20 downloads and still no feedback?

Pappa · « **Reply #2 on:** 23 Dec 2009, 09:52:41 pm »

I have been running. I grabbed it and will install it on the 8400GS...

I have to ask: is it VLAR or NONVLAR kill?

Edit: successfully transplanted into Main on these two hosts.

http://setiathome.berkeley.edu/show_host_detail.php?hostid=5133086 which is the 9800GT. I am noticing a bit of sluggishness on this machine doing a 0.40 AR, but then it is doing Aqua on the CPUs and SETI on the GPU. Still tolerable.

http://setiathome.berkeley.edu/show_host_detail.php?hostid=2435134 which is the 8400GS (which is 4 hours and 20% into a 0.008). So while the VLAR is sluggish, it is tolerable.

Raistmer · « **Reply #3 on:** 24 Dec 2009, 03:18:38 am »

1) It should be the VLARkill version (because everyone who is aware uses the rebranding tool, and for those who are unaware it's better not to meet VLARs at all).
So I don't quite understand how you can do VLAR with this build.
2) Unfortunately, it seems you have no target hardware (neither do I). On my 9400GT this build showed worse results in the standalone test than the previous build.
That is, it's probably not suitable for single-GPU configs. The affinity lock is implemented (and needed) solely for multi-GPU (and fast-GPU) hosts, where the initial CPU-based phase should be as short as possible...

Pepi · « **Reply #4 on:** 24 Dec 2009, 09:40:10 am »

Raistmer, I tried both of your new builds on both of my machines.
I know that you did not build them for my type of computer, but I tried them anyway, and now I am back to your "normal build".

Computer 1 is a Sempron 140 with a green 9800GT. With the new builds (both of them) every WU is about 50 sec slower in 1600 sec (1650 sec versus 1600 sec with the "old build").
Computer 2 is an AMD quad with a GT240: both new builds are slower, but not by as much as in the previous case.

I disabled network access, made a rar archive and tried a set of ten results, so I think it is a good comparison method.

Best regards for the holidays
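To put the first machine's numbers in relative terms, here is a small worked calculation. It uses only the two run times quoted above (1600 sec and 1650 sec); the function name `relative_slowdown` is just an illustrative helper, not part of any SETI tool.

```python
def relative_slowdown(old_seconds: float, new_seconds: float) -> float:
    """Fractional change in run time of the new build versus the old one."""
    if old_seconds <= 0:
        raise ValueError("old_seconds must be positive")
    return (new_seconds - old_seconds) / old_seconds


# Run times reported for the Sempron host: old build vs. new builds.
slowdown = relative_slowdown(1600.0, 1650.0)
print(f"{slowdown:.3%}")  # prints 3.125%
```

So the new builds cost roughly 3% per WU on that single-GPU host, which is small but consistent across a set of ten results.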

Raistmer · « **Reply #5 on:** 24 Dec 2009, 11:29:19 am »

Nothing new, it's for multi-GPU hardware.

Pappa · « **Reply #6 on:** 24 Dec 2009, 12:11:40 pm »

On the 8400GS host it had 3 errors; not sure if it is because I have not rebooted in a couple of days (memory corruption).

AR 2.0 error in pulsefind

http://setiathome.berkeley.edu/result.php?resultid=1460163970
http://setiathome.berkeley.edu/result.php?resultid=1460163968
http://setiathome.berkeley.edu/result.php?resultid=1460163895
switching back

Raistmer · « **Reply #7 on:** 24 Dec 2009, 12:24:04 pm »

Wow, time exceeded in pulsefind. That resembles my own 9400GT experiments, but I got an unspecified error...

And did you never encounter the same error with previous builds?

Pappa · « **Reply #8 on:** 24 Dec 2009, 03:40:45 pm »

Quote from: Raistmer on 24 Dec 2009, 12:24:04 pm

> Wow, time exceeded in pulsefind. That resembles my own 9400GT experiments, but I got an unspecified error...
>
> And did you never encounter the same error with previous builds?

The 8400 was running the v12 nokill and was doing fine, other than a few inconclusives.
The 9800GT has no problems.

Raistmer · « **Reply #9 on:** 25 Dec 2009, 03:36:37 am »

Hm... interesting... I can't recall on whose PC the last released V12 was built, mine or Jason's? And which CUDA version was used at build time... This time I built with the CUDA 2.3 SDK installed.
It would be good to separate the effect of the build environment from that of the priority/affinity changes themselves.
Because nothing in the data-processing path was changed, I see no other possible reasons for such a change in behavior...
Ah, BTW, I made one more change in the last build (it reduced the executable size compared with previous builds): I dropped Volkov's FFT sources from the build because they are not used (it seems the CUDA compiler embeds kernel code into the executable even if the kernel isn't used in the program). This surely changed the alignment of the CUDA kernels. Could it be the reason for the timeout you've seen? No idea...

Jason G · « **Reply #10 on:** 25 Dec 2009, 04:17:14 am »

The release V12 in the installer & the separate NoKill download? They were built on mine using 2.2 at the time (2.3 was not in common use), with that limit thing set to 2048 as directed by yourself & Joe.

No other changes from your sources at that time, around June 20th 2009 according to the build date on the exe at my end and the svn logs (though later experiments deviate quite a lot). I think it's possible the 2.3 SDK does build larger kernels, and the 2.3 DLLs are definitely larger, produce more stress, and use more video RAM. What effect this should have on smaller cards I'm not entirely sure.

Later in the course of experimentation, as well as adding Joe's triplet kernel fixing stuff, I did introduce a constant definition in my experimental branch, called NUM_ITER which reduces the length of the pulsefinding calls. But that definition isn't in those builds.

@Al, please tell me the creation date on the exe you used that worked well on the 8400GS, so I can pinpoint which parameters were used, and corresponding svn revision.

Cheers, Jason

Raistmer · « **Reply #11 on:** 25 Dec 2009, 06:19:44 am »

Thanks, low-end GPUs are a borderline case (by amount of memory available and by length of kernel calls), so they are especially sensitive to even the smallest changes between builds. I still have to understand why my own 9400GT works just fine in the Q9450 host and fails badly and often in the Core Duo and Athlon64 hosts...

Jason G · « **Reply #12 on:** 25 Dec 2009, 06:30:51 am »

Hmm, yes, very confusing. Could you list the builds you've tried on the Athlon? (I guess from what you say none work properly...) At one time, before I went hybrid, I did lots of test builds with reduced pulse finding blocks (NUM_ITER5 IIRC); perhaps those would work here? While v13 would be interesting to try on that, I don't think it'll help pinpoint the problem, since obviously the problem is in the CUDA code or hardware somewhere. I'm thinking it's something to do with chipset/DMA transfers. I presume the mobo BIOS is up to date? Because there were some issues with PCIe on some mobos IIRC.

Pappa · « **Reply #13 on:** 25 Dec 2009, 11:59:34 am »

Quote from: Jason G on 25 Dec 2009, 04:17:14 am

> The release V12 in the installer & the separate NoKill download? They were built on mine using 2.2 at the time (2.3 was not in common use), with that limit thing set to 2048 as directed by yourself & Joe.
>
> No other changes from your sources at that time, around June 20th 2009 according to the build date on the exe at my end and the svn logs (though later experiments deviate quite a lot). I think it's possible the 2.3 SDK does build larger kernels, and the 2.3 DLLs are definitely larger, produce more stress, and use more video RAM. What effect this should have on smaller cards I'm not entirely sure.
>
> Later in the course of experimentation, as well as adding Joe's triplet kernel fixing stuff, I did introduce a constant definition in my experimental branch, called NUM_ITER which reduces the length of the pulsefinding calls. But that definition isn't in those builds.
>
> @Al, please tell me the creation date on the exe you used that worked well on the 8400GS, so I can pinpoint which parameters were used, and corresponding svn revision.
>
> Cheers, Jason

6-19-2009

See this message: SETI CUDA MB. So with this, the "Proposed 'Better?' medium term VLAR solution" started not too long after that, around the 1st of July.

Jason G · « **Reply #14 on:** 25 Dec 2009, 12:11:11 pm »

Quote from: Pappa on 25 Dec 2009, 11:59:34 am

> ....at around the 1st of July.

OK... that date matches up with my builds of 'Bog standard V12' with FPLIM 2048 applied (amongst other values tested at the time), later committed, after proof with testing & mimo's profiling, on 7th July. (Prior assertions confirmed.)

@Raistmer: That corresponds to r93 in the CudaMB_exp branch, which you might like to compare to your r89, which it is based on. I don't see significant source changes amongst the experiments between those revisions, so I guess the SDK used might be one remaining suspect.
