Author Topic: x38g reports  (Read 150080 times)

Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
x38g reports
« on: 18 Jun 2011, 01:34:50 pm »
Hi all,
if you have problems and errors with x38g please post here.

heinz

~~~~~~~~~~~~~
<core_client_version>6.12.26</core_client_version>
<![CDATA[
<message>
Unzulässige Funktion. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
setiathome_CUDA: Found 1 CUDA device(s):
  Device 1: GeForce GT 540M, 961 MiB, regsPerBlock 32768
     computeCap 2.1, multiProcs 2
     clockRate = 1500000
In cudaAcc_initializeDevice(): Boinc passed DevPref 1
setiathome_CUDA: CUDA Device 1 specified, checking...
   Device 1: GeForce GT 540M is okay
SETI@home using CUDA accelerated device GeForce GT 540M
Priority of process raised successfully
Priority of worker thread raised successfully
Cuda Active: Plenty of total Global VRAM (>300MiB).
 All early cuFft plans postponed, to parallel with first chirp.

 )       _   _  _)_ o  _  _
(__ (_( ) ) (_( (_  ( (_ ( 
 not bad for a human...  _)

Multibeam x38g Preview, Cuda 3.20

Legacy setiathome_enhanced V6 mode.
Work Unit Info:
...............
WU true angle range is :  2.592522
CUFFT error in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcc_fft.cu' in line 125.
Cuda error 'cudaFree(dev_PowerSpectrumSumMax)' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcceleration.cu' in line 522 : unknown error.
Cuda error 'cudaFree(dev_outputposition)' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcceleration.cu' in line 524 : unknown error.
Cuda error 'cudaFree(dev_flagged)' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcceleration.cu' in line 526 : unknown error.
Cuda error 'cudaFree(dev_NormMaxPower)' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcceleration.cu' in line 528 : unknown error.
Cuda error 'cudaFree(dev_PoTPrefixSum)' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcceleration.cu' in line 530 : unknown error.
Cuda error 'cudaFree(dev_PoT)' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcceleration.cu' in line 532 : unknown error.
Cuda error 'cudaFree(dev_GaussFitResults)' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcceleration.cu' in line 534 : unknown error.
Cuda error 'cudaFree(dev_t_PowerSpectrum)' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcceleration.cu' in line 536 : unknown error.
Cuda error 'cudaFree(dev_PowerSpectrum)' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcceleration.cu' in line 538 : unknown error.
Cuda error 'cudaFree(dev_WorkData)' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcceleration.cu' in line 540 : unknown error.
Cuda error 'cudaFree(dev_flag)' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcceleration.cu' in line 542 : unknown error.
Cuda error 'cudaFree(dev_cx_ChirpDataArray)' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcceleration.cu' in line 546 : unknown error.
Cuda error 'cudaFree(dev_cx_DataArray)' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcceleration.cu' in line 548 : unknown error.
Cuda sync'd & freed.

</stderr_txt>
]]>
« Last Edit: 18 Jun 2011, 04:16:11 pm by _heinz »

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: x38g reports
« Reply #1 on: 18 Jun 2011, 02:04:50 pm »
How many of these have you had, Heinz?
Quote
CUFFT error in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcc_fft.cu' in line 125.

Looks similar to something that crops up on my GTX260 from time to time.  It's too early in my investigation to say for sure what causes it, as it only seems to happen sometimes & only on certain GPUs.  I'm currently digging at the chirp directly preceding those calls, and haven't come across an issue that could cause it there, but I'll keep my eyes out.
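
To answer "how many of these" across several results, the stderr text can be tallied automatically. The following is a minimal sketch, assuming the error lines keep the format quoted in this thread; the function name `summarize_stderr` and the sample text are made up for illustration and are not part of the application.

```python
import re
from collections import Counter

# Patterns modelled on the stderr lines quoted in this thread (an assumption
# about the format, not taken from the application's source).
CUFFT_RE = re.compile(r"CUFFT error in file '(?P<file>[^']+)' in line (?P<line>\d+)")
CUDA_RE = re.compile(
    r"Cuda error '(?P<call>[^']+)' in file '(?P<file>[^']+)' in line (?P<line>\d+)"
)

def summarize_stderr(text):
    """Count CUFFT error sites and failing CUDA calls in a stderr dump."""
    cufft_sites = Counter()   # keyed by (source file name, line number)
    cuda_calls = Counter()    # keyed by the failing call, e.g. 'cudaFree(dev_flag)'
    for raw in text.splitlines():
        line = raw.strip()
        m = CUFFT_RE.search(line)
        if m:
            site = (m.group("file").rsplit("/", 1)[-1], int(m.group("line")))
            cufft_sites[site] += 1
            continue
        m = CUDA_RE.search(line)
        if m:
            cuda_calls[m.group("call")] += 1
    return cufft_sites, cuda_calls

# Two lines copied from the first report, used as a hypothetical sample.
sample = """\
CUFFT error in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcc_fft.cu' in line 125.
Cuda error 'cudaFree(dev_flag)' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcceleration.cu' in line 542 : unknown error.
"""
cufft_sites, cuda_calls = summarize_stderr(sample)
print(cufft_sites)
print(cuda_calls)
```

Running it over the stderr of each failed task gives a per-host count of FFT failures, which makes it easier to see whether they cluster on particular GPUs.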
« Last Edit: 18 Jun 2011, 02:07:59 pm by Jason G »

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: x38g reports
« Reply #2 on: 18 Jun 2011, 04:12:13 pm »
Quote
Up to now it works perfectly and very fast.

On XP with a GTX 460 (266.58) I think this is 3 minutes faster than x32f.

Thanks! The 460's & 560ti's are showing to be very nice cards, looks like the choice for replacing my 260 if I need to.  I believe they can do more yet.

Offline perryjay

  • Knight Templar
  • ****
  • Posts: 427
Re: x38g reports
« Reply #3 on: 18 Jun 2011, 06:27:50 pm »

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: x38g reports
« Reply #4 on: 18 Jun 2011, 06:42:32 pm »
Thanks,
   I've been working via PM with Slavac ( http://setiathome.berkeley.edu/show_user.php?userid=9475661 , 2 x 560ti's ) to isolate how much of the problem might relate to something fixable in the application, and how much to something else ( i.e. drivers &/or hardware ).

He is reporting no downclocks with the new app & Drivers, but still some of the FFT Errors, usually a few or more per day, as with my P4 with GTX 260.  That is why I suspect a deeper issue with either the drivers, library or SDK , but am trying to isolate it to something more specific & find out if there is anything in surrounding code that could be antagonising the issue. 

Nothing yet, but I'll keep searching & probably update the app or make other recommendations if I find a way to avoid them on those cards (including my 260).

Offline perryjay

  • Knight Templar
  • ****
  • Posts: 427
Re: x38g reports
« Reply #5 on: 18 Jun 2011, 06:59:31 pm »
I'm not sure how widespread the problem is, but I have been noticing 560Tis showing up in my results as giving -9s where I and another finished clean. I hadn't looked closely enough to figure out if it was just one or two of them or spread out over the whole type. Hope you find the problem, as it looks like a really nice card otherwise. After reading that thread it seems like they have found a pretty good workaround.

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: x38g reports
« Reply #6 on: 18 Jun 2011, 07:24:09 pm »
Yeah upping the voltage slightly is a likely solution for those -9's, especially when running several tasks at once.  Probably the manufacturers kept the voltage down to meet a power spec or something.

I'll likely know more in a few days, but it does look like there are timing sensitivity issues as well, possibly to do with memory controller load.  I'm testing a build on my p4/GTX260 now that both gives more descriptive error messages, and guts & replaces a bunch of code inherited from nVidia dev work preceding the FFTs that fail.  That 260 has steadily exhibited the issue, so has proven useful for exploring how to make the app harder.

Prior to the code-gutting exercise, I did a quick test to look for the error code:
Quote
...
Multibeam x39 Preview, Cuda 3.20
 
Legacy setiathome_enhanced V6 mode.
Work Unit Info:
...............
WU true angle range is :  2.592398
...
A FFT launch failed (try 1), code CUFFT_EXEC_FAILED    = 0x6
A FFT launch failed (try 2), code CUFFT_EXEC_FAILED    = 0x6
A FFT launch failed (try 3), code CUFFT_EXEC_FAILED    = 0x6
A FFT launch failed (try 4), code CUFFT_EXEC_FAILED    = 0x6
A FFT launch failed (try 5), code CUFFT_EXEC_FAILED    = 0x6
A FFT launch failed (try 6), code CUFFT_EXEC_FAILED    = 0x6
A FFT launch failed (try 7), code CUFFT_EXEC_FAILED    = 0x6
A FFT launch failed (try 8), code CUFFT_EXEC_FAILED    = 0x6
A FFT launch failed (try 9), code CUFFT_EXEC_FAILED    = 0x6
A FFT launch failed (try 10), code CUFFT_EXEC_FAILED    = 0x6
CUFFT error in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcc_fft.cu' in line 81. code CUFFT_EXEC_FAILED    = 0x6

Googling CUFFT_EXEC_FAILED reveals common issues with this in the past on various cards, especially on GPUGrid, mostly with GTX 260's, so I have replaced this x39 on that host with a 'special' different x39 build that looks the same but changes a lot of code before the FFTs.  I'll be watching that for a day or so, then determine whether there indeed was some crankiness in the drivers etc, or whether careful code leading up to the FFTs resolves the issue.
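
The retry-and-report pattern visible in the log above can be sketched roughly as follows. This is an illustration only: the application's real retry code is not shown in this thread, and `launch_with_retries` and the stand-in launch function are hypothetical. The code-to-name table follows the `cufftResult` values in NVIDIA's cufft.h.

```python
# cuFFT result codes and their symbolic names, as defined in cufft.h.
CUFFT_RESULT_NAMES = {
    0x0: "CUFFT_SUCCESS",
    0x1: "CUFFT_INVALID_PLAN",
    0x2: "CUFFT_ALLOC_FAILED",
    0x3: "CUFFT_INVALID_TYPE",
    0x4: "CUFFT_INVALID_VALUE",
    0x5: "CUFFT_INTERNAL_ERROR",
    0x6: "CUFFT_EXEC_FAILED",
    0x7: "CUFFT_SETUP_FAILED",
    0x8: "CUFFT_INVALID_SIZE",
    0x9: "CUFFT_UNALIGNED_DATA",
}

def launch_with_retries(launch, max_tries=10, log=print):
    """Call launch() until it returns 0 (success) or max_tries is used up.

    Returns (last_code, tries_used). Each failure is logged with the
    symbolic name and hex value of the returned code.
    """
    code = 0
    for attempt in range(1, max_tries + 1):
        code = launch()
        if code == 0:
            return code, attempt
        name = CUFFT_RESULT_NAMES.get(code, "UNKNOWN")
        log(f"A FFT launch failed (try {attempt}), code {name} = {code:#x}")
    return code, max_tries

# Hypothetical stand-in for the real FFT call: fails twice, then succeeds.
results = iter([0x6, 0x6, 0x0])
messages = []
final_code, tries = launch_with_retries(lambda: next(results), log=messages.append)
print(final_code, tries)
print(messages)
```

Printing the symbolic name next to the raw value is what makes the failure searchable, which is how the GPUGrid reports turned up.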

No more errors yet, touchwood, on the 260 since I changed from 'ordinary x39 with extra descriptive errors' to 'x39 with replaced code before FFTs'.  The latest errors with x39 visible (for the moment) for that host are from the first kind of x39, so are part of attempting to track things down.  I'll be looking for any after 18 Jun 2011 | 16:52:11 UTC for further diagnosis/investigation.

Jason
« Last Edit: 18 Jun 2011, 07:31:34 pm by Jason G »

Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: x38g reports
« Reply #7 on: 19 Jun 2011, 03:01:16 am »
Hi Jason,
on my ATOM- ION x38g runs still on CPU... it has 10% after 5 hours, so ~ 50 hours runtime at end.
Should I abort it ?

Maybe there is not enough memory available while initializing the GPU.
Some standalone test with test wu's are necessary.

Have BOINC 6.10.60
NVIDIA GPU 0: ION (driver version 27533, CUDA version 4000, compute capability 1.1, 64MB, 35 GFLOPS peak)
And since I installed the driver, it shows wrong value 64MB

heinz

Offline Pepi

  • Knight o' The Realm
  • **
  • Posts: 119
Re: x38g reports
« Reply #8 on: 19 Jun 2011, 09:29:07 am »
If I may write some words :)
It looks like the 560TI is a little power-hungry beast, and some manufacturers think the card will operate well with lower voltage settings. But in the case of SETI and other BOINC projects that is not true. I have this issue with my Gigabyte 560 TI also, and until I raised the voltage to 1.0375V I was unable to get stable work. With this voltage I can do 24/7 crunching without any problems. So before you blame your app, SDK, or drivers, do one of two things: either downclock your GPU to 820 MHz ( stock freq ) or give the GPU more voltage, not below 1.025V. And then do testing.

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: x38g reports
« Reply #9 on: 19 Jun 2011, 11:00:14 am »
Quote
Hi Jason,
on my ATOM- ION x38g runs still on CPU... it has 10% after 5 hours, so ~ 50 hours runtime at end.
Should I abort it ?

We know it has more memory than that, so stop Boinc, put in a driver that reports properly & reboot.  The app should pick up properly when it sees enough RAM.  That mechanism will change in the x39 series to say 'go away' if there isn't enough in total, and to use a Boinc temporary exit ( a newer BoincApi feature; I'll have to update the BoincApi in use to get access to it ).
« Last Edit: 19 Jun 2011, 11:28:19 am by Jason G »

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: x38g reports
« Reply #10 on: 19 Jun 2011, 11:06:56 am »
Quote
If I may write some words :)

Thanks Pepi, yup we tracked down the need for a voltage tweak for those in posts & it makes sense.

I'm currently poking at a different kind of error that occurs on some GPUs *sometimes*.  As per heinz' first report of a FFT error, and noting that my 260 sees the same, I have modified some code & they seem to have gone away on mine.  That's the last major 'niggle' I've found under V6 operation so far, and my p4 with GTX 260 seems to have come good for ~24 hours of operation, which I'm keeping an eye on to see if it is really solved.

Jason

Offline perryjay

  • Knight Templar
  • ****
  • Posts: 427
Re: x38g reports
« Reply #11 on: 19 Jun 2011, 09:30:10 pm »
Hey Jason, we validated. You got the canonical result after the fourth guy got reported.   :o

http://setiathome.berkeley.edu/workunit.php?wuid=757762089


In case anyone is wondering why I posted this, as far as I know it's the first where two of us were running the new installer. Jason G was the first and I came in third. We wondered why it didn't decide then to validate instead of sending out to another wingman.
« Last Edit: 19 Jun 2011, 10:21:52 pm by perryjay »

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: x38g reports
« Reply #12 on: 19 Jun 2011, 10:21:01 pm »
Woohoo!, Yay me  ;D

Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: x38g reports
« Reply #13 on: 20 Jun 2011, 10:23:26 am »
Hi Jason,
after 30 hours the ION ended the wu resultid=1956143930

have a look at it: -177 (0xffffffffffffff4f)

heinz
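
The two forms of the exit status above are the same number: the hex value is -177 written as a 64-bit two's-complement integer. A minimal sketch of the conversion, with `to_signed` as a made-up helper name:

```python
def to_signed(value, bits=64):
    """Interpret an unsigned integer as a two's-complement signed value."""
    if value >= 1 << (bits - 1):
        value -= 1 << bits
    return value

# The hex form reported for the result, read back as a signed 64-bit number.
print(to_signed(0xffffffffffffff4f))
# And the other direction: -177 masked to 64 bits gives the hex form again.
print(hex(-177 & 0xffffffffffffffff))
```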

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: x38g reports
« Reply #14 on: 20 Jun 2011, 10:33:11 am »
Yeah that underreporting of VRAM is a problem for you for sure:

Quote
setiathome_CUDA: Found 1 CUDA device(s):
  Device 1: ION, 64 MiB, regsPerBlock 8192
     computeCap 1.1, multiProcs 2
     clockRate = 1200000
setiathome_CUDA: device 1 not have enough available global memory. Only found 67108864
In cudaAcc_initializeDevice(): Boinc passed DevPref 1
setiathome_CUDA: CUDA Device 1 specified, checking...
   Device cannot be used
SETI@home NOT using CUDA, falling back on host CPU processing

It actually did what it's supposed to, which is a surprise, as I'm still probing at the memory initialisation sequence.  Resolving why your ION only reports 64MiB should solve your issue.

Jason
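
The fallback seen in that log amounts to comparing the reported global memory against a minimum and dropping to the CPU path when it falls short. A rough sketch, assuming a simple threshold check: the function `choose_backend` and the 128 MiB threshold are invented for illustration, since the application's real limit is not stated in this thread.

```python
MIB = 1024 * 1024

def choose_backend(global_mem_bytes, min_required_mib):
    """Return 'cuda' if the device reports enough global memory, else 'cpu'."""
    reported_mib = global_mem_bytes // MIB
    return "cuda" if reported_mib >= min_required_mib else "cpu"

# 67108864 bytes is the figure from the ION log; 128 MiB is a made-up threshold.
print(67108864 // MIB)
print(choose_backend(67108864, 128))
# The GT 540M from the first report shows 961 MiB and would pass the same check.
print(choose_backend(961 * MIB, 128))
```

With the driver reporting only 67108864 bytes (64 MiB), any threshold above that sends the task to the CPU, which matches the long runtime heinz saw.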

 
