Author Topic: SETI MB CUDA for Linux (Read 543763 times)

Richard Haselgrove · « **Reply #75 on:** 25 Feb 2009, 05:22:30 am »

The credit difference is universal to all SETI CUDA applications - stock Windows as well. That goes back to the developers in Berkeley / nVidia - nothing to do with optimisations in general, or Linux in particular.

dtiger · « **Reply #76 on:** 26 Feb 2009, 02:52:55 am »

The credits don't matter to me. I'm interested in the technology only.

In my experience, short units with a run time of about 16 minutes write correct log info to stderr_txt. The longer units are full of "Cuda error 'GaussFit_kernel' in file './cudaAcc_gaussfit.cu' in line 506 : invalid configuration argument."

Also, short units can start one after another on the GPU, while longer units fall back to the CPU after one has completed on the GPU. This looks like a memory issue with Crunch3r's current SETI-CUDA release (setiathome-CUDA-6.08.i686.tar.bz2).

Also, BOINC starts 2 normal CPU crunchers on my C2D E4400, and SETI-CUDA additionally grabs one CPU core at 100%, so the crunchers start fighting for the second core and everything runs very slowly, including X server response time.

sunu · « **Reply #77 on:** 26 Feb 2009, 06:46:51 am »

256 MB seems borderline or not enough for the Linux CUDA app.

If the CPU app you run is Astropulse, you can force BOINC to run only one instance. In your app_info.xml, in the Astropulse section, add

   <avg_ncpus>2.0000</avg_ncpus>
   <max_ncpus>2.0000</max_ncpus>

immediately after

<version_num>500</version_num>
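
The edit described above can also be scripted. This is only a minimal sketch: the helper name `limit_to_one_instance` and the trimmed sample block (including its file name) are made up for illustration, and a real app_info.xml will contain more than this. Declaring 2 CPUs for the app on a dual-core host is what leaves BOINC room for only one instance.

```python
def limit_to_one_instance(app_info_text, version_num="500", ncpus="2.0000"):
    """Insert avg_ncpus/max_ncpus right after the first matching <version_num> tag."""
    marker = f"<version_num>{version_num}</version_num>"
    position = app_info_text.find(marker)
    if position == -1:
        raise ValueError(f"{marker} not found in app_info.xml text")
    rest = app_info_text[position + len(marker):]
    if rest.lstrip().startswith("<avg_ncpus>"):
        return app_info_text  # tags already follow the marker; change nothing
    addition = (
        f"\n    <avg_ncpus>{ncpus}</avg_ncpus>"
        f"\n    <max_ncpus>{ncpus}</max_ncpus>"
    )
    return app_info_text[:position] + marker + addition + rest


# Hypothetical, heavily trimmed Astropulse section used only as test input.
sample = """<app_version>
    <app_name>astropulse</app_name>
    <version_num>500</version_num>
    <file_ref>
        <file_name>astropulse_example_binary</file_name>
        <main_program/>
    </file_ref>
</app_version>
"""

patched = limit_to_one_instance(sample)
print(patched)
```

Back up app_info.xml and stop BOINC before changing the file by hand or by script.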

dtiger · « **Reply #78 on:** 03 Mar 2009, 04:45:33 am »

Seems like 256 MB is enough for the Windows version of SETI-CUDA. It runs fine.
Also, as I see from the workunit pages, Windows clients crunch units in 100-200 seconds on an 8800 GTS 256MB, while my 8600 GT 256MB takes about 1000-2000 seconds for the same unit. That's an abnormally large difference for similar hardware.

sunu · « **Reply #79 on:** 03 Mar 2009, 07:33:46 am »

Quote from: dtiger on 03 Mar 2009, 04:45:33 am

Also, as I see from the workunit pages, Windows clients crunch units in 100-200 seconds on an 8800 GTS 256MB, while my 8600 GT 256MB takes about 1000-2000 seconds for the same unit. That's an abnormally large difference for similar hardware.

Two things:
1. Some users with 256 MB graphics cards see some WUs fall back to CPU computation because of insufficient memory. Maybe that explains the increased time.
2. The Linux CUDA app uses a full core, so the time reported is the "real" computation time. The Windows CUDA app uses a small percentage of a single core and records only that time. The "real" computation time for Windows machines is much larger, possibly equivalent to that of Linux PCs.
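
To make the second point concrete, here is a small sketch of how reported CPU time and wall-clock time can diverge for the same amount of waiting. It does not claim to show how either CUDA app actually waits on the GPU; the two wait styles (`blocking_wait`, `polling_wait`) are assumptions chosen only to illustrate the accounting difference.

```python
import time


def measure(work):
    """Return (wall_seconds, cpu_seconds) spent running work()."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    work()
    return time.perf_counter() - wall_start, time.process_time() - cpu_start


def blocking_wait(seconds=0.3):
    # Sleeps for the whole wait: the process is off the CPU.
    time.sleep(seconds)


def polling_wait(seconds=0.3):
    # Spins until the deadline, keeping one core busy for the whole wait.
    deadline = time.perf_counter() + seconds
    while time.perf_counter() < deadline:
        pass


blocking_wall, blocking_cpu = measure(blocking_wait)
polling_wall, polling_cpu = measure(polling_wait)
print(f"blocking: wall={blocking_wall:.2f}s cpu={blocking_cpu:.2f}s")
print(f"polling:  wall={polling_wall:.2f}s cpu={polling_cpu:.2f}s")
```

Both waits take about the same wall-clock time, but only the polling one shows up as CPU time, so comparing reported CPU times between two apps says little about which one finishes a WU sooner.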

CorranHorn · « **Reply #80 on:** 13 Mar 2009, 08:39:48 am »

The Windows version is faster than the Linux version.

http://setiathome.berkeley.edu/workunit.php?wuid=423584760

Crunch3r · « **Reply #81 on:** 13 Mar 2009, 10:16:07 am »

Quote from: CorranHorn on 13 Mar 2009, 08:39:48 am

The Windows version is faster than the Linux version.

http://setiathome.berkeley.edu/workunit.php?wuid=423584760

No, it's not faster. You should read up on the reported 'CPU time' and on the difference between the Windows and Linux apps before posting such BS...

Raistmer · « **Reply #82 on:** 13 Mar 2009, 10:51:06 am »

More correctly - the Linux build uses much more CPU time than the Windows one. Why it does so - that's the question.

Jason G · « **Reply #83 on:** 13 Mar 2009, 11:17:37 am »

Quote from: Raistmer on 13 Mar 2009, 10:51:06 am

More correctly - the Linux build uses much more CPU time than the Windows one. Why it does so - that's the question.

LoL, I'm with Crunch3r on this one, because CPU time is a useless measure of GPU app performance and depends on how the OS defines CPU time. Where and how CPU time is attributed to the user program or to kernel time will vary by platform, along with the methods used for handling the GPU feeding.

When dealing with a parallel program, you can only go by wall-clock time, and only on WUs with the same AR. The scheduling and accounting semantics of the two OSes will be vastly different, and the Linux figure is likely just being 'more honest'.

Jason

Raistmer · « **Reply #84 on:** 13 Mar 2009, 11:21:22 am »

No, you missed that if the CPU is busy, it's busy.
But if the CPU is free, it can be used for something else.
It seems that on Linux the CPU is busy the whole time the CUDA app runs (I can draw conclusions only from reading posts, of course; I haven't run it on my own host).

ADDON: on Windows I studied the total run time (elapsed) on busy cores with the CUDA app, so I'm pretty confident the CPU INDEED is almost FREE while the CUDA app is running.
Windows doesn't cheat here as you suppose.

Jason G · « **Reply #85 on:** 13 Mar 2009, 11:25:41 am »

Quote from: Raistmer on 13 Mar 2009, 11:21:22 am

No, you missed that if the CPU is busy, it's busy.
But if the CPU is free, it can be used for something else.
It seems that on Linux the CPU is busy the whole time the CUDA app runs (I can draw conclusions only from reading posts, of course; I haven't run it on my own host).

You missed that time spent in a kernel driver can be attributed to the program or not. Windows doesn't attribute it.

Raistmer · « **Reply #86 on:** 13 Mar 2009, 11:43:26 am »

Again, I _measured_ elapsed times with all cores busy: the elapsed time of the CUDA app while CPU apps occupy all cores, and the elapsed time of a CPU app while the CUDA app is running and the other cores are busy too.
So, NO noticeable kernel time increase here, all fair.
Linux does something wrong here, it seems...

Jason G · « **Reply #87 on:** 13 Mar 2009, 11:45:33 am »

Watch the % CPU usage of the Deferred Procedure Calls (DPCs) process in Process Explorer, with and without the CUDA app running.

Raistmer · « **Reply #88 on:** 13 Mar 2009, 11:47:00 am »

Quote from: Jason G on 13 Mar 2009, 11:45:33 am

Watch the % CPU usage of the Deferred Procedure Calls (DPCs) process in Process Explorer, with and without the CUDA app running.

For what? Elapsed == WALL CLOCK.

Jason G · « **Reply #89 on:** 13 Mar 2009, 12:00:08 pm »

Quote from: Raistmer on 13 Mar 2009, 11:47:00 am

Quote from: Jason G on 13 Mar 2009, 11:45:33 am

Watch the % CPU usage of the Deferred Procedure Calls (DPCs) process in Process Explorer, with and without the CUDA app running.

For what? Elapsed == WALL CLOCK.

That's why I said: use only wall clock for app comparison.

Deferred procedure calls execute on another core in another process space, so they count as no extra wall-clock or CPU time for that CUDA process, even though they were made by it (and consume resources).

DPC CPU usage with no Cuda App running ~0.77%
DPC CPU usage with Cuda Running ~2.5%
(~3 x)

Which is a full ~50% of the CUDA app shunted off to another kernel process, which will not affect ELAPSED WALL-CLOCK, because it runs on another core, nor register in the app's CPU_TIME either.

Linux has no Windows-style deferred procedure calls AFAIK (could be wrong), so it cannot shunt off the CPU time to another process / core, and cops the CPU time allocation locally.

(i.e. Windows is giving extra hidden CPU time to the CUDA app; there is no magic.)

http://en.wikipedia.org/wiki/Deferred_Procedure_Call
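
For reference, the "~3 x" figure follows directly from the two DPC readings quoted above. The short sketch below only redoes that arithmetic; the variable names are illustrative, and it does not try to reproduce the ~50% estimate, which depends on the app's own reported CPU usage (not given in the post).

```python
dpc_idle = 0.77  # % CPU in DPCs with no CUDA app running (from the post)
dpc_cuda = 2.5   # % CPU in DPCs with the CUDA app running (from the post)

ratio = dpc_cuda / dpc_idle   # how many times higher the DPC load is
extra = dpc_cuda - dpc_idle   # additional DPC load in percentage points

print(f"ratio: {ratio:.2f}x, extra DPC load: {extra:.2f} percentage points")
# prints "ratio: 3.25x, extra DPC load: 1.73 percentage points"
```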
