Seti@Home optimized science apps and information

Optimized Seti@Home apps => Windows => GPU crunching => Topic started by: _heinz on 18 Jun 2011, 01:34:50 pm

Title: x38g reports
Post by: _heinz on 18 Jun 2011, 01:34:50 pm
Hi all,
if you have problems and errors with x38g please post here.

heinz

~~~~~~~~~~~~~
<core_client_version>6.12.26</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
setiathome_CUDA: Found 1 CUDA device(s):
  Device 1: GeForce GT 540M, 961 MiB, regsPerBlock 32768
     computeCap 2.1, multiProcs 2
     clockRate = 1500000
In cudaAcc_initializeDevice(): Boinc passed DevPref 1
setiathome_CUDA: CUDA Device 1 specified, checking...
   Device 1: GeForce GT 540M is okay
SETI@home using CUDA accelerated device GeForce GT 540M
Priority of process raised successfully
Priority of worker thread raised successfully
Cuda Active: Plenty of total Global VRAM (>300MiB).
 All early cuFft plans postponed, to parallel with first chirp.

 )       _   _  _)_ o  _  _
(__ (_( ) ) (_( (_  ( (_ ( 
 not bad for a human...  _)

Multibeam x38g Preview, Cuda 3.20

Legacy setiathome_enhanced V6 mode.
Work Unit Info:
...............
WU true angle range is :  2.592522
CUFFT error in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcc_fft.cu' in line 125.
Cuda error 'cudaFree(dev_PowerSpectrumSumMax)' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcceleration.cu' in line 522 : unknown error.
Cuda error 'cudaFree(dev_outputposition)' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcceleration.cu' in line 524 : unknown error.
Cuda error 'cudaFree(dev_flagged)' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcceleration.cu' in line 526 : unknown error.
Cuda error 'cudaFree(dev_NormMaxPower)' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcceleration.cu' in line 528 : unknown error.
Cuda error 'cudaFree(dev_PoTPrefixSum)' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcceleration.cu' in line 530 : unknown error.
Cuda error 'cudaFree(dev_PoT)' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcceleration.cu' in line 532 : unknown error.
Cuda error 'cudaFree(dev_GaussFitResults)' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcceleration.cu' in line 534 : unknown error.
Cuda error 'cudaFree(dev_t_PowerSpectrum)' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcceleration.cu' in line 536 : unknown error.
Cuda error 'cudaFree(dev_PowerSpectrum)' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcceleration.cu' in line 538 : unknown error.
Cuda error 'cudaFree(dev_WorkData)' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcceleration.cu' in line 540 : unknown error.
Cuda error 'cudaFree(dev_flag)' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcceleration.cu' in line 542 : unknown error.
Cuda error 'cudaFree(dev_cx_ChirpDataArray)' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcceleration.cu' in line 546 : unknown error.
Cuda error 'cudaFree(dev_cx_DataArray)' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcceleration.cu' in line 548 : unknown error.
Cuda sync'd & freed.

</stderr_txt>
]]>
Title: Re: x38g reports
Post by: Jason G on 18 Jun 2011, 02:04:50 pm
How many of these have you had Heinz ?
Quote
CUFFT error in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcc_fft.cu' in line 125.

Looks similar to something that crops up on my GTX260 from time to time.  It's too early in my investigation to say for sure what causes it, as it only seems to happen sometimes & only on certain GPUs.  I'm currently digging at the chirp directly preceding those calls, and haven't come across an issue that could cause it there, but I'll keep my eyes out.
Title: Re: x38g reports
Post by: Jason G on 18 Jun 2011, 04:12:13 pm
Up to now it works perfectly and very fast

on XP with GTX 460 (266.58) I think this is 3 minutes faster than x32f

Thanks! The 460's & 560ti's are showing to be very nice cards, looks like the choice for replacing my 260 if I need to.  I believe they can do more yet.
Title: Re: x38g reports
Post by: perryjay on 18 Jun 2011, 06:27:50 pm
Oops,

http://setiathome.berkeley.edu/forum_thread.php?id=63429
Title: Re: x38g reports
Post by: Jason G on 18 Jun 2011, 06:42:32 pm
Thanks,
   I've been working via PM with Slavac ( http://setiathome.berkeley.edu/show_user.php?userid=9475661 , 2 x 560ti's )   to isolate how many of the problems might relate to something fixable in the application, and how many to something else ( i.e. drivers &/or hardware ).

He is reporting no downclocks with the new app & Drivers, but still some of the FFT Errors, usually a few or more per day, as with my P4 with GTX 260.  That is why I suspect a deeper issue with either the drivers, library or SDK , but am trying to isolate it to something more specific & find out if there is anything in surrounding code that could be antagonising the issue. 

Nothing yet, but I'll keep searching & probably update the app or make other recommendations if I find a way to avoid them on those cards (including my 260).
Title: Re: x38g reports
Post by: perryjay on 18 Jun 2011, 06:59:31 pm
I'm not sure how widespread the problem is but I have been noticing 560Tis showing up in my results as giving -9s where I and another finished clean. Hadn't looked close enough to figure out if it was just one or two of them or more spread out over the whole type. Hope you find the problem as it looks like a really nice card otherwise. After reading that thread it seems like they have found a pretty good workaround.
Title: Re: x38g reports
Post by: Jason G on 18 Jun 2011, 07:24:09 pm
Yeah upping the voltage slightly is a likely solution for those -9's, especially when running several tasks at once.  Probably the manufacturers kept the voltage down to meet a power spec or something.

I'll likely know more in a few days, but it does look like there are timing sensitivity issues as well, possibly to do with memory controller load.  I'm testing a build on my p4/GTX260 now that both gives more descriptive error messages, and guts & replaces a bunch of code inherited from nVidia dev work preceding the FFTs that fail.  That 260 has steadily exhibited the issue, so has proven useful for exploring how to harden the app.

Prior to the code-gutting exercise, I did a quick test to look for the Error code:
Quote
...
Multibeam x39 Preview, Cuda 3.20
 
Legacy setiathome_enhanced V6 mode.
Work Unit Info:
...............
WU true angle range is :  2.592398
...
A FFT launch failed (try 1), code CUFFT_EXEC_FAILED    = 0x6
A FFT launch failed (try 2), code CUFFT_EXEC_FAILED    = 0x6
A FFT launch failed (try 3), code CUFFT_EXEC_FAILED    = 0x6
A FFT launch failed (try 4), code CUFFT_EXEC_FAILED    = 0x6
A FFT launch failed (try 5), code CUFFT_EXEC_FAILED    = 0x6
A FFT launch failed (try 6), code CUFFT_EXEC_FAILED    = 0x6
A FFT launch failed (try 7), code CUFFT_EXEC_FAILED    = 0x6
A FFT launch failed (try 8), code CUFFT_EXEC_FAILED    = 0x6
A FFT launch failed (try 9), code CUFFT_EXEC_FAILED    = 0x6
A FFT launch failed (try 10), code CUFFT_EXEC_FAILED    = 0x6
CUFFT error in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcc_fft.cu' in line 81. code CUFFT_EXEC_FAILED    = 0x6

Googling CUFFT_EXEC_FAILED reveals common issues with this in the past on various cards, especially on GPUGrid, mostly with GTX 260's, so I have replaced this x39 on that host with a 'special' different x39 build that looks the same but changes a lot of code before the FFTs.  I'll be watching that for a day or so, then determine if there indeed was some crankiness in the drivers etc, or whether careful code leading up to the FFTs resolves the issue.
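
For readers following along, the retry-and-report pattern visible in the quoted log can be sketched roughly as below. This is only an illustration, not the x39 source: `launch_with_retries` and the `launch` callable are hypothetical names, and the only details taken from the log are the message format, the ten tries and the code `CUFFT_EXEC_FAILED = 0x6` (which matches the `cufftResult` enum in cufft.h).

```python
from enum import IntEnum


class CufftResult(IntEnum):
    # Subset of the cufftResult codes from cufft.h; only the two used here.
    CUFFT_SUCCESS = 0x0
    CUFFT_EXEC_FAILED = 0x6


def launch_with_retries(launch, max_tries=10, log=print):
    """Call launch() until it succeeds or max_tries is exhausted.

    `launch` is a hypothetical zero-argument callable returning a result
    code; it stands in for the real FFT call, which this sketch does not
    model. The last result code is returned either way.
    """
    result = CufftResult.CUFFT_SUCCESS
    for attempt in range(1, max_tries + 1):
        result = CufftResult(launch())
        if result == CufftResult.CUFFT_SUCCESS:
            return result
        # Name the code instead of printing a bare number.
        log(f"A FFT launch failed (try {attempt}), "
            f"code {result.name} = {hex(result)}")
    return result
```

Passing the logger in as a parameter keeps the sketch testable; a real build would presumably write to stderr.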

No more errors yet, touchwood, on the 260 since I changed from 'ordinary x39 with extra descriptive errors' to 'x39 with replaced code before FFTs'.  The latest errors with x39 visible (for the moment) for that host are from the first kind of x39, so they are part of attempting to track things down.  I'll be looking for any after 18 Jun 2011 | 16:52:11 UTC for further diagnosis/investigation.

Jason
Title: Re: x38g reports
Post by: _heinz on 19 Jun 2011, 03:01:16 am
Hi Jason,
on my ATOM- ION x38g runs still on CPU... it has 10% after 5 hours, so ~ 50 hours runtime at end.
Should I abort it ?

Maybe there is not enough memory available while initializing the GPU.
Some standalone test with test wu's are necessary.

Have BOINC 6.10.60
NVIDIA GPU 0: ION (driver version 27533, CUDA version 4000, compute capability 1.1, 64MB, 35 GFLOPS peak)
And since I installed the driver, it shows wrong value 64MB

heinz
Title: Re: x38g reports
Post by: Pepi on 19 Jun 2011, 09:29:07 am
If I may write some words :)
It looks like the 560 Ti is a power-hungry little beast, and some manufacturers think the card will operate well with lower voltage settings. But in the case of SETI and other BOINC projects that is not true. I had this issue with my Gigabyte 560 Ti as well, and until I raised the voltage to 1.0375 V I was unable to get stable work. With this voltage I can do 24/7 crunching without any problems. So before you blame your app, SDK or drivers, do one of two things: either downclock your GPU to 820 MHz (stock freq) or give the GPU more voltage, not below 1.025 V. And then do testing.
Title: Re: x38g reports
Post by: Jason G on 19 Jun 2011, 11:00:14 am
Hi Jason,
on my ATOM- ION x38g runs still on CPU... it has 10% after 5 hours, so ~ 50 hours runtime at end.
Should I abort it ?
We know it has more memory than that, so stop Boinc, install a driver that reports properly & reboot.  The app should pick up properly when it sees enough RAM.  That mechanism will change in the x39 series to say 'go away' if there isn't enough in total, and use a Boinc temporary exit ( a newer BoincApi feature; I'll have to update the BoincApi in use to have access to it ).
Title: Re: x38g reports
Post by: Jason G on 19 Jun 2011, 11:06:56 am
If I may write some words :)

Thanks Pepi, yup we tracked down the need for a voltage tweak for those in posts & it makes sense.

I'm now currently poking at a different kind of error that occurs on some GPUs *sometimes*.  As per heinz' first report of a FFT error, noting that my 260 sees the same I have modified some code & they seem to have gone away on mine.  That's the last major 'niggle' I've found under V6 operation so far, and my p4 with GTX 260 seems to have come good for ~24 hours of operation, which I'm keeping an eye on to see if it is really solved.

Jason
Title: Re: x38g reports
Post by: perryjay on 19 Jun 2011, 09:30:10 pm
Hey Jason, we validated. You got the canonical result after the fourth guy got reported.   :o

http://setiathome.berkeley.edu/workunit.php?wuid=757762089


In case anyone is wondering why I posted this, as far as I know it's the first where two of us were running the new installer. Jason G was the first and I came in third. We wondered why it didn't decide then to validate instead of sending out to another wingman.
Title: Re: x38g reports
Post by: Jason G on 19 Jun 2011, 10:21:01 pm
Woohoo!, Yay me  ;D
Title: Re: x38g reports
Post by: _heinz on 20 Jun 2011, 10:23:26 am
Hi Jason,
after 30 hours the ION ended the wu resultid=1956143930 (http://setiathome.berkeley.edu/result.php?resultid=1956143930)

have a look at it: -177 (0xffffffffffffff4f)
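
The two forms of that exit code are the same number: the hex value is simply -177 written as a 64-bit two's-complement integer. A quick way to check this, with `to_hex64` being a helper name made up for this example:

```python
def to_hex64(code: int) -> str:
    """Render a signed exit code as a 64-bit two's-complement hex string."""
    return hex(code & 0xFFFFFFFFFFFFFFFF)


print(to_hex64(-177))
```

As far as I know, BOINC's error_numbers.h lists -177 as ERR_RSC_LIMIT_EXCEEDED, which would fit a task that ran far longer than expected on the CPU.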

heinz
Title: Re: x38g reports
Post by: Jason G on 20 Jun 2011, 10:33:11 am
Yeah that underreporting of VRAM is a problem for you for sure:

Quote
setiathome_CUDA: Found 1 CUDA device(s):
  Device 1: ION, 64 MiB, regsPerBlock 8192
     computeCap 1.1, multiProcs 2
     clockRate = 1200000
setiathome_CUDA: device 1 not have enough available global memory. Only found 67108864
In cudaAcc_initializeDevice(): Boinc passed DevPref 1
setiathome_CUDA: CUDA Device 1 specified, checking...
   Device cannot be used
SETI@home NOT using CUDA, falling back on host CPU processing

It actually did what it's supposed to, which is a surprise, as I'm still probing at the memory initialisation sequence.  Resolving why your ION only reports 64MiB should solve your issue.
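
The fallback seen in that log can be pictured with a small sketch. It is not the app's code, and the real minimum is not stated anywhere in this thread: `ASSUMED_MIN_GLOBAL_MEM` is an arbitrary pick that merely lies between the 64 MiB the app rejected and the 242 MB that is reported to work later in the thread.

```python
MIB = 1024 * 1024

# Assumption: arbitrary threshold chosen for illustration only. The thread
# shows 64 MiB being rejected and 242 MB being accepted, nothing more.
ASSUMED_MIN_GLOBAL_MEM = 128 * MIB


def choose_backend(total_global_mem: int,
                   minimum: int = ASSUMED_MIN_GLOBAL_MEM) -> str:
    """Return 'cuda' if the device reports enough memory, else 'cpu'."""
    return "cuda" if total_global_mem >= minimum else "cpu"


# 67108864 bytes is exactly the 64 MiB shown in the log above.
print(choose_backend(67108864))
```

The point of the sketch is that the decision rests entirely on what the driver reports, so an under-reporting driver is enough to push a capable card onto the CPU path.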

Jason
Title: Re: x38g reports
Post by: _heinz on 20 Jun 2011, 10:55:07 am
Yeah that underreporting of VRAM is a problem for you for sure:

It actually did what it's supposed to, which is a surprise, as I'm still probing at the memory initialisation sequence.  Resolving why your ION only reports 64MiB should solve your issue.

Jason
Version: 275.33 WHQL
Release date: 2011.06.01
The latest WHQL driver (http://www.nvidia.de/object/win7-winvista-32bit-275.33-whql-driver-de.html) and the driver before it show wrong values, so I must go further back with the driver installation.

Isn't it a shame that the latest WHQL driver has such an error again? Blame nvidia.  :'(
Or is it BOINC that shows the wrong value ? ?
heinz
Title: Re: x38g reports
Post by: Jason G on 20 Jun 2011, 10:57:11 am
Isn't it a shame that the latest WHQL driver has such an error again? Blame nvidia.  :'(

Is that the verde drivers Heinz ?  I will update my ION2 (Not currently running Boinc) & see what that says.
Title: Re: x38g reports
Post by: _heinz on 20 Jun 2011, 11:02:27 am
Isn't it a shame that the latest WHQL driver has such an error again? Blame nvidia.  :'(

Is that the verde drivers Heinz ?  I will update my ION2 (Not currently running Boinc) & see what that says.
It is not the verde
verde is there --> http://www.nvidia.de/object/notebook-win7-winvista-275.33-whql-driver-de.html
As far as I know the verde is still for laptops.
My R3600 is not a laptop

heinz
Title: Re: x38g reports
Post by: Jason G on 20 Jun 2011, 11:04:18 am
My R3600 is not a laptop

try it anyway  :D

[Edit:] downloading onto my netbook now

For the desktop listing there also seems to be a newer beta, will check out the release notes for that

Update: with verde 275.33, Boinc shows 434MiB VRAM on my ION2, so a bit less than the previous driver (That said 444MiB).

Could there be some BIOS aperture size or similar setting for you Heinz, that is limiting the reported memory ?
Title: Re: x38g reports
Post by: _heinz on 20 Jun 2011, 12:36:29 pm
The verde driver did not install.
No compatible hardware found.
Now the latest 27533 is installed; in the NVIDIA control panel, "Autosearch Updates" is checked.
20.06.2011 18:24:20      NVIDIA GPU 0: ION (driver version 27533, CUDA version 4000, compute capability 1.1, 64MB, 35 GFLOPS peak)

hmm... Till now I have never looked in the BIOS of the R3600.

heinz
Edit:
BOINC 6.10.60 shows:
28.04.2011 12:45:52      NVIDIA GPU 0: ION (driver version 27051, CUDA version 4000, compute capability 1.1, 306MB, 35 GFLOPS peak)
anyhow curious 306 MB ?
but this was the last working version(27051 beta)
and this shows ~250
On the ION Boinc shows the driver
07.03.2011 16:36:18      NVIDIA GPU 0: ION (driver version 27032, CUDA version 4000, compute capability 1.1, 242MB, 35 GFLOPS peak)
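
To compare what each driver reports, the BOINC startup lines quoted in this post can be parsed mechanically. The following is a minimal sketch; the regular expression and the helper names `parse_gpu_line` and `dotted_driver` are assumptions written against the lines pasted in this thread, not a documented BOINC log format.

```python
import re

# Matches the driver field and the "<n>MB" field of a BOINC GPU startup line.
GPU_LINE = re.compile(r"driver version (?P<driver>\w+),.*?\b(?P<mem_mb>\d+)MB")


def parse_gpu_line(line):
    """Return (driver, reported_mb) from a BOINC GPU line, or None."""
    match = GPU_LINE.search(line)
    if match is None:
        return None
    return match.group("driver"), int(match.group("mem_mb"))


def dotted_driver(driver):
    """Turn '27533' into '275.33'; leave values like 'unknown' unchanged."""
    if not driver.isdigit() or len(driver) < 3:
        return driver
    return driver[:-2] + "." + driver[-2:]


# The three ION lines as pasted in this thread.
LINES = [
    "NVIDIA GPU 0: ION (driver version 27533, CUDA version 4000, compute capability 1.1, 64MB, 35 GFLOPS peak)",
    "NVIDIA GPU 0: ION (driver version 27051, CUDA version 4000, compute capability 1.1, 306MB, 35 GFLOPS peak)",
    "NVIDIA GPU 0: ION (driver version 27032, CUDA version 4000, compute capability 1.1, 242MB, 35 GFLOPS peak)",
]

for line in LINES:
    driver, mem_mb = parse_gpu_line(line)
    print(dotted_driver(driver), mem_mb, "MB")
```

Laid side by side, the same ION is reported with 64, 306 and 242 MB depending only on the driver version.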
Title: Re: x38g reports
Post by: _heinz on 20 Jun 2011, 03:20:43 pm
Hi Jason,
Querying for a CUDA device works quite differently under the Optimus technology,
 see http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/docs/CUDA_Developer_Guide_for_Optimus_Platforms.pdf
 page 3

have a look !
heinz
Title: Re: x38g reports
Post by: _heinz on 20 Jun 2011, 04:40:19 pm
hi Jason,
I installed latest nvidia beta driver on my laptop, 275.50-notebook-win7-winvista-64bit-international-beta
BOINC 6.12.26 shows:
20.06.2011 22:23:06 |  | NVIDIA GPU 0: GeForce GT 540M (driver version unknown, CUDA version 4000, compute capability 2.1, 962MB, 172 GFLOPS peak)

275.33-notebook-win7-winvista-64bit-international-whql has the same issue.
in general, there must be something wrong in the driver detection of BOINC.

Although,   I can run primegrid.

heinz
Title: Re: x38g reports
Post by: Jason G on 20 Jun 2011, 07:15:19 pm
... there must be something wrong in the driver detection of BOINC.

Good possibility.  At some stage (when I'm bored)  I'll look at how they detect that, to see if it uses driver APIs etc properly.
Title: Re: x38g reports
Post by: Claggy on 20 Jun 2011, 07:20:08 pm
... there must be something wrong in the driver detection of BOINC.

Good possibility.  At some stage (when I'm bored)  I'll look at how they detect that, to see if it uses driver APIs etc properly.
I wonder if the driver is now reporting the Real amount of RAM the Ion has, and now isn't reporting the extra system RAM the BIOS settings add,

Claggy
Title: Re: x38g reports
Post by: Jason G on 20 Jun 2011, 08:03:35 pm
I wonder if the driver is now reporting the Real amount of RAM the Ion has, and now isn't reporting the extra system RAM the BIOS settings add,

That would seem to break the WDDM driver model, which basically says you get what you're given.  I would have thought issues on one ION should appear on another.  I haven't looked if there is a BIOS setting for mine as it has 512MiB dedicated & Windows 7, so system shared amount is determined by Turbocache functionality.  I'll probably do that.

As I recall, You're good at working out complicated stuff like NewCredit  :D .  You could, if you're bored at some stage, go through the document at http://msdn.microsoft.com/en-us/windows/hardware/gg487348.aspx  , to see if there's anything related , or especially anything I missed that I might need to know when working out the difference to XP Driver Model ( regarding the performance jump at various drivers & XP-WDDM performance difference with older simpler application code, etc )

[Edit:] checked my ION Netbook, no video related settings there at all, oh well
Jason

Title: Re: x38g reports
Post by: _heinz on 21 Jun 2011, 04:19:32 am
Hi Jason,
with my ION R3600 I'm going back to driver 270.32
21.06.2011 10:06:40      NVIDIA GPU 0: ION (driver version 27032, CUDA version 4000, compute capability 1.1, 242MB, 35 GFLOPS peak)
If I look it up with "AIDA64 Extreme Edition" it shows:
Field   Value
Video Adapter Properties
Device Description   NVIDIA ION
Adapter Series   ION
BIOS Version   Version 62.79.63.0.1
Chip Type   ION
DAC Type   Integrated RAMDAC
Driver Date   20.02.2011
Driver Version   8.17.12.7032 - nVIDIA ForceWare 270.32
Driver Provider   NVIDIA
Memory Size   256 MB

Installed Drivers
nvd3dum   8.17.12.7032 - nVIDIA ForceWare 270.32
nvwgf2um   8.17.12.7032
nvwgf2um   8.17.12.7032

Video Adapter Manufacturer
Company Name   NVIDIA Corporation
Product Information   http://www.nvidia.com/page/products.html
Driver Download   http://www.nvidia.com/content/drivers/drivers.asp
Driver Update   http://www.aida64.com/driver-updates
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Now I will try x38g again
edit:
x38g is running on GPU now. Report result as soon as ready.

heinz
Title: Re: x38g reports
Post by: _heinz on 21 Jun 2011, 04:50:45 am
What we learn from this:
270.32 is not a WDDM driver and BOINC shows 242MB.
The newer WDDM drivers are not detected properly by BOINC, which does not show correct VRAM values on any of my systems (ION R3600, and i3 with GeForce GT 540M).
BOINC on I3 shows:
20.06.2011 22:23:06 |  | NVIDIA GPU 0: GeForce GT 540M (driver version unknown, CUDA version 4000, compute capability 2.1, 962MB, 172 GFLOPS peak)
~~~~~~~~~~~~~~~~~~~~~~~~~~~
"AIDA64 Extreme Edition" shows on my I3:
Field   Value
Graphics Processor Properties
Video Adapter   nVIDIA GeForce GT 540M (Medion)
GPU Code Name   GF108M
PCI Device   10DE-0DF4 / 17C0-10E2  (Rev A1)
Transistors   585 million
Process Technology   40 nm
Die Size   114 mm2
Bus Type   PCI Express 2.0 x16 @ x16
Memory Size   1 GB
GPU Clock (Geometric Domain)   750 MHz
GPU Clock (Shader Domain)   1500 MHz
RAMDAC Clock   400 MHz
Pixel Pipelines   8
Texture Mapping Units   16
Unified Shaders   96  (v5.0)
DirectX Hardware Support   DirectX v11
Pixel Fillrate   6000 MPixel/s
Texel Fillrate   12000 MTexel/s

Memory Bus Properties
Bus Type   DDR3
Bus Width   128-bit
Real Clock   450 MHz (DDR)
Effective Clock   900 MHz
Bandwidth   14.1 GB/s

Utilization
GPU   99%
Memory Controller   0%
Video Engine   0%

nVIDIA ForceWare Clocks
Standard 2D   GPU: 50 MHz, Shader: 101 MHz, Memory: 135 MHz
Low-Power 3D   GPU: 202 MHz, Shader: 405 MHz, Memory: 324 MHz
Performance 3D   GPU: 750 MHz, Shader: 1500 MHz, Memory: 900 MHz

Graphics Processor Manufacturer
Company Name   NVIDIA Corporation
Product Information   http://www.nvidia.com/page/products.html
Driver Download   http://www.nvidia.com/content/drivers/drivers.asp
Driver Update   http://www.aida64.com/driver-updates
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
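
Several figures in the listing above follow from the others by simple arithmetic, which makes for a handy sanity check of the dump. The sketch below reproduces them; the function names are made up for this example, and dividing by 1024 to get the listed 14.1 GB/s is an inference from the numbers, not documented AIDA64 behaviour. CUDA reports `clockRate` in kHz, which is how the `clockRate = 1500000` line in the first stderr corresponds to the 1500 MHz shader clock.

```python
def fill_rate(units: int, core_mhz: float) -> float:
    """Fill rate in millions per second: one result per unit per clock."""
    return units * core_mhz


def bandwidth_mb_per_s(bus_bits: int, effective_mhz: float) -> float:
    """Peak memory bandwidth in MB/s: bytes per transfer times transfers."""
    return bus_bits / 8 * effective_mhz


def khz_to_mhz(clock_khz: int) -> float:
    """Convert a CUDA clockRate value (kHz) to MHz."""
    return clock_khz / 1000


print(fill_rate(8, 750), "MPixel/s")      # 8 pixel pipelines at 750 MHz
print(fill_rate(16, 750), "MTexel/s")     # 16 texture units at 750 MHz
print(bandwidth_mb_per_s(128, 900), "MB/s")
print(round(bandwidth_mb_per_s(128, 900) / 1024, 1), "GB/s (assuming /1024)")
print(khz_to_mhz(1500000), "MHz shader clock")
```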

Title: Re: x38g reports
Post by: Pepi on 21 Jun 2011, 05:04:33 am
hi Jason,
I installed latest nvidia beta driver on my laptop, 275.50-notebook-win7-winvista-64bit-international-beta
BOINC 6.12.26 shows:
20.06.2011 22:23:06 |  | NVIDIA GPU 0: GeForce GT 540M (driver version unknown, CUDA version 4000, compute capability 2.1, 962MB, 172 GFLOPS peak)

275.33-notebook-win7-winvista-64bit-international-whql has the same issue.
in general, there must be something wrong in the driver detection of BOINC.

Although,   I can run primegrid.

heinz


On my desktop the motherboard also has a GPU. I use the motherboard GPU for everyday work, and the GTX 560 Ti only for crunching. When nothing is attached to the 560 Ti I get the same message as you: driver version unknown,
but the 560 Ti works without problems in SETI.
Title: Re: x38g reports
Post by: _heinz on 21 Jun 2011, 08:47:44 am
Hi Jason,
all 3 wu's are done now on the ION (driver 270.32)
hostid=5510631 (http://setiathome.berkeley.edu/show_host_detail.php?hostid=5510631)
resultid=1956143932 (http://setiathome.berkeley.edu/result.php?resultid=1956143932)
resultid=1956143934 (http://setiathome.berkeley.edu/result.php?resultid=1956143934)
resultid=1956143936 (http://setiathome.berkeley.edu/result.php?resultid=1956143936)

Cuda Active: All 15 paranoid early cuFft plans succeeded.

what does it mean ? all can be used ?

Wondering about this:
<core_client_version>6.10.58</core_client_version>
reports:
SETI@Home Informational message -9 result_overflow
NOTE: The number of results detected exceeds the storage space allocated.

Flopcounter: 42371081878.117691

Spike count:    30
Pulse count:    0
Triplet count:  0
Gaussian count: 0
called boinc_finish
~~~~~~~~~~~~~~~~~~~~~~~~~~~

x38g reports:
Multibeam x38g Preview, Cuda 3.20

Legacy setiathome_enhanced V6 mode.
Work Unit Info:
...............
WU true angle range is :  2.720454

Flopcounter: 10431952929543.820000

Spike count:    2
Pulse count:    0
Triplet count:  1
Gaussian count: 0
Worker preemptively acknowledging a normal exit.->
called boinc_finish
boinc_exit(): requesting safe worker shutdown ->
boinc_exit(): received safe worker shutdown acknowledge ->

So no validation will happen for me.

heinz
Title: Re: x38g reports
Post by: Mike on 21 Jun 2011, 09:31:39 am
As far as I can see your results look OK.
Your wingman is using the Fermi app and producing -9 errors.
So the unit will be sent to a third host.
Title: Re: x38g reports
Post by: perryjay on 21 Jun 2011, 09:36:21 am
Mike beat me to it. You have the same wingman on all three of those work units. He is running a 560TI and is apparently throwing out bad -9 results. Hope the next in line does better. You should get credit no problem on those.
Title: Re: x38g reports
Post by: perryjay on 21 Jun 2011, 12:21:20 pm
Got one invalid result  http://setiathome.berkeley.edu/workunit.php?wuid=761506607  Not much to it, I found a pulse the other two wingmen didn't. It was when I was running the 0.38e flavor. Thought I would mention it just in case. It's the only invalid result I've got so far.
Title: Re: x38g reports
Post by: Mike on 21 Jun 2011, 06:22:19 pm

Just keep an eye on it perryjay.
Title: Re: x38g reports
Post by: Jason G on 21 Jun 2011, 09:53:13 pm
Got one invalid result  http://setiathome.berkeley.edu/workunit.php?wuid=761506607  Not much to it, I found a pulse the other two wingmen didn't. It was when I was running the 0.38e flavor. Thought I would mention it just in case. It's the only invalid result I've got so far.

Yep, as mentioned on main, looks like the single, likely low power, pulse that you found, where the others didn't, would be simply due to the inaccurate old nVidia app chirp.  So it fits the expected pattern.  In science terms yours is 'more correct' of course, and would likely have matched a CPU app wingman strongly, but being ganged up on by 2 older apps that way is going to happen during the transition period.

Jason
Title: Re: x38g reports
Post by: Josef W. Segur on 22 Jun 2011, 12:26:41 am
Got one invalid result  http://setiathome.berkeley.edu/workunit.php?wuid=761506607  Not much to it, I found a pulse the other two wingmen didn't. It was when I was running the 0.38e flavor. Thought I would mention it just in case. It's the only invalid result I've got so far.

Yep, as mentioned on main, looks like the single, likely low power, pulse that you found, where the others didn't, would be simply due to the inaccurate old nVidia app chirp.  So it fits the expected pattern.  In science terms yours is 'more correct' of course, and would likely have matched a CPU app wingman strongly, but being ganged up on by 2 older apps that way is going to happen during the transition period.

Jason

The one reported pulse doesn't fully explain the invalid judgement, since "weakly similar" merely needs half the signals to match. The task was VHAR, so there should have been a best_gaussian with all zero values, that's a gimme match. The reported pulse would be repeated as best_pulse, and if the difference were due to it being only a tiny bit above threshold that best_pulse should match the others close enough. And finally there would be a best_spike. IOW 1 dodgy pulse could have easily had 3 acceptable best_* signals to yield weakly similar.  To get invalid 3 of the 4 must not have found a match in the other results.
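
The counting argument in that paragraph can be written down as a tiny sketch. This is not the project's validator code: `is_weakly_similar` is a hypothetical helper that only encodes the rule as stated here, namely that at least half of the signals must find a match.

```python
def is_weakly_similar(matched: int, total: int) -> bool:
    """True if at least half of the signals found a match."""
    if total <= 0 or not 0 <= matched <= total:
        raise ValueError("need 0 <= matched <= total and total > 0")
    return 2 * matched >= total


# Four signals in the result: the reported pulse plus best_gaussian,
# best_pulse and best_spike.
print(is_weakly_similar(3, 4))  # one dodgy pulse, three best_* matches
print(is_weakly_similar(1, 4))  # three of the four found no match
```

Under that rule a single odd pulse cannot sink a result on its own, which is why the invalid verdict is puzzling.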

OTOH, we have no way of knowing the result file didn't get corrupted server-side or something like that. However, I'd expect some indication from other users of similar problems in that case. It's a puzzle which cannot be solved now, just watch to see if it happens again with x38g.

The one on http://setiathome.berkeley.edu/workunit.php?wuid=762393888 is a loss as far as analysis goes, there's no stderr information from x38g.
                                                           Joe
Title: Re: x38g reports
Post by: Jason G on 22 Jun 2011, 01:41:25 am
OTOH, we have no way of knowing the result file didn't get corrupted server-side or something like that. However, I'd expect some indication from other users of similar problems in that case. It's a puzzle which cannot be solved now, just watch to see if it happens again with x38g.

Hmmm, the missing stderr information to me indicates a few possibilities.  Either the improved exit code is not functioning as designed (due to system specific issue or other problem in the code itself), there is a communication issue of some sort (I suppose the server load could have some part there), or indeed the server itself lost that information.   I've seen no indication that result files wouldn't follow the same behaviour as stderr contents.

I'm finding that as the cuda app issues get rarer, they do get harder to diagnose when they appear.  One thing that is noticeable is that users are finding their errors & inconclusives more quickly now that the web pages display in categorised form  ;D
Title: Re: x38g reports
Post by: perryjay on 22 Jun 2011, 09:59:06 am
I noticed the missing stderr not only for my result but also one of the others as well. I didn't think it would do you much good that way but from Jason's comment I guess I should have mentioned it here too.
Title: Re: x38g reports
Post by: perryjay on 23 Jun 2011, 11:01:23 am
Well, I woke up this morning to another downclocking. I had noticed last night a general sluggishness to my computer but decided not to reboot. Guess I should have. I tend to leave everything running when I quit the computer so I would guess it just built up until something had to give. I don't think it was downclocked for very long so I didn't lose too much. After a reboot everything is back and running good.


EDIT  I spoke too soon. It down clocked again. I've rebooted again and it is back up to where it is supposed to be. Guess I will see if it will hold this time. Gotta go cut the grass so I will be away for about an hour. Hope it doesn't go down in that length of time.
Title: Re: x38g reports
Post by: Jason G on 23 Jun 2011, 11:41:37 am
Can you catch a task name that's in progress when it does it next time ?  When the result is uploaded we could then see if the stderr says anything useful.
Title: Re: x38g reports
Post by: perryjay on 23 Jun 2011, 12:48:26 pm
This is one that took forever, not sure if it's the one you want.  http://setiathome.berkeley.edu/workunit.php?wuid=765100017

Here's another one that completed and validated  http://setiathome.berkeley.edu/workunit.php?wuid=765100083


It seemed to affect my CPU times too but that is hard to tell for sure. This one http://setiathome.berkeley.edu/workunit.php?wuid=766957670 seemed to be way too long. I was finishing a couple of APs at the time it happened so I don't have many CPU tasks done. Since the APs were within an hour or so of completion it didn't affect their runtime by much and I don't know exactly what time it happened the first time.
Title: Re: x38g reports
Post by: Jason G on 23 Jun 2011, 01:33:01 pm
Thanks,
   Clearly those runtimes indicate something freaked out.  Despite that, there's no visible indication in stderr apart from the excessive runtime on the task report itself, which means I'll need to instrument every kernel launch to find out what's happening.  That will take a few days to go through the whole code, then if you're agreeable I'll drag you into the dev area to pin down the exact point(s) of downclock.  I'll do so by using a build instrumented to check for kernel errors and subsequently print the brand new, presumably downclocked, clock speed after the point of failure.

Can you confirm (once again) that these are 'sticky downclocks' requiring  a reboot to clear ?

Jason
Title: Re: x38g reports
Post by: perryjay on 23 Jun 2011, 01:37:06 pm
 :o  The, the dev area????  Can I bring a gun?

Yes, they needed a reboot. Well, at least the first one did. I just went ahead and rebooted when I saw the one today. Figured it was the easiest way to get going again fast. So far this time everything is running okay again now.
Title: Re: x38g reports
Post by: Jason G on 23 Jun 2011, 01:43:05 pm
:o  The, the dev area????  Can I bring a gun?

Yes, they needed a reboot. Well, at least the first one did. I just went ahead and rebooted when I saw the one today. Figured it was the easiest way to get going again fast. So far this time everything is running okay again now.

OK, but we won't wait for it to downclock again to try something.

Please swap in the attached, deliberately slightly dialled back for diagnostic purposes, build, while I spend the next few days instrumenting the code.  If this one doesn't initiate downclocks on the card in the meantime, then it'll add some possibilities to the investigation, directing me to optimise a particular piece of code I've been hesitant to touch so far (so that part remained stock until this dialled back build). 

(x39c, dialled back build attached for diagnostic purposes)

[Edit:] Old build removed.  Please use the updated x39d build at:
http://lunatics.kwsn.net/12-gpu-crunching/x38g-reports.msg39407.html#msg39407
Title: Re: x38g reports
Post by: Jason G on 23 Jun 2011, 01:53:23 pm
FYI:  there is an easy way to swap in builds if you're confused by the app_info.
Title: Re: x38g reports
Post by: perryjay on 23 Jun 2011, 01:57:55 pm
Okay, just to be sure, where does it go? Do I just replace all instances of  <file_name>Lunatics_x38g_win32_cuda32.exe</file_name> in the app info or do I need to put it somewhere else?

Easy way? What's that? I've never seen such a thing. Nothing is easy for a n00b like me!   ;D
Title: Re: x38g reports
Post by: Jason G on 23 Jun 2011, 02:03:14 pm
Easy way? What's that? I've never seen such a thing. Nothing is easy for a n00b like me!   ;D

Easy way:
- Stop Boinc
- Drop the new exe, unzipped, into the project folder
- edit the MBCuda.aistub file in notepad
      - use the edit->replace function to replace all occurrences of x38g with x39c,
      - [change the counts too if desired]
      - save & exit notepad
- run the aimerge.cmd batch file that resides in the project directory
- start Boinc & check task manager that x39c runs.

[Edit:] added mention of counts
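
The find-and-replace step above could also be scripted. The following is only a minimal sketch, assuming the stub is a plain-text file in which the build name appears inside the executable file names. The sample text is made up for illustration (it is not the real contents of MBCuda.aistub), and `swap_build` is a hypothetical helper name.

```python
def swap_build(stub_text: str, old: str = "x38g", new: str = "x39c") -> tuple[str, int]:
    """Return the edited text and the number of occurrences that were replaced."""
    count = stub_text.count(old)
    return stub_text.replace(old, new), count

# Made-up sample standing in for the stub file contents (assumption).
sample = (
    "<file_name>Lunatics_x38g_win32_cuda32.exe</file_name>\n"
    "<file_name>Lunatics_x38g_win32_cuda32.exe</file_name>\n"
)

edited, replaced = swap_build(sample)
print(replaced)   # how many names were changed
print(edited)
```

Checking the replacement count is a cheap way to notice a stub that never mentioned the old build. The aimerge.cmd step from the list above would still need to be run afterwards.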
Title: Re: x38g reports
Post by: perryjay on 23 Jun 2011, 02:46:24 pm
Gawd I'm dumb. First I put the zipped file in, then I forgot the .exe at the end. Okay, now it's running the 39c build.
Title: Re: x38g reports
Post by: Jason G on 23 Jun 2011, 02:59:24 pm
Gawd I'm dumb. First I put the zipped file in, then I forgot the .exe at the end. Okay, now it's running the 39c build.
  Cheers.  If something happens with that one we should hopefully get a little more info... If not then it points straight to the code I dialled back for refinement.  Either way, I'll be going through the whole lot making things at least print the revised clock rate & location in the code if something detectable happens.
Title: Re: x38g reports
Post by: perryjay on 23 Jun 2011, 03:07:39 pm
I can't say how long I'm going to have to run this. Like I said, I hadn't rebooted for awhile and everything had started to slow down before it down clocked. I'll just let it run and see.
Title: Re: x38g reports
Post by: Jason G on 23 Jun 2011, 04:15:55 pm
Just noticed something, Perryjay:  the stderr task output indicates a core clock of 900MHz.  Firstly, is that correct?  And what core voltage is that set to? (assuming you have a monitor/OC tool such as MSI Afterburner installed)

Jason
Title: Re: x38g reports
Post by: perryjay on 23 Jun 2011, 04:36:08 pm
Yes, I'm OCed to 900/1800/1804  I have CPUID Hardware Monitor. The only voltage I find with that is  VINO 1.11v. Is that what you mean? I can go looking for MSI Afterburner if you want.
Title: Re: x38g reports
Post by: Jason G on 23 Jun 2011, 04:44:34 pm
Nah that's fine, thanks.  Just mostly wanted to see if the clock was reporting correctly.  Yeah 1.11V sounds like the core, and should be fine at 900MHz for that card, but it helps to have some reference if something turns up down the road.

Jason.
Title: Re: x38g reports
Post by: perryjay on 23 Jun 2011, 04:55:20 pm
Well, I got Afterburner but it doesn't show current voltage in the little window. Guess you have to move the slide to show anything and that I am not going to do.   ;D


Oh, sorry bout not mentioning the over clock. I have mentioned it so many times before I just figured you knew. Dumb move on my part again!
Title: Re: x38g reports
Post by: Jason G on 23 Jun 2011, 04:57:23 pm
LoL, give us a screenshot before moving any sliders, if you could  :D
Title: Re: x38g reports
Post by: perryjay on 24 Jun 2011, 10:27:25 am
Jason, does this new build do anything about clearing up the -12 issue? I just found this WU marked invalid, too many bugs. http://setiathome.berkeley.edu/workunit.php?wuid=764389127  I was the only one running the V 0.38g and the only one to complete it without getting -12.
Title: Re: x38g reports
Post by: Josef W. Segur on 24 Jun 2011, 12:22:09 pm
Jason, does this new build do anything about clearing up the -12 issue? I just found this WU marked invalid, too many bugs. http://setiathome.berkeley.edu/workunit.php?wuid=764389127  I was the only one running the V 0.38g and the only one to complete it without getting -12.

AFAIK x38g and x32f have the same improvements relative to triplet handling, they allow 1 more than stock before committing suicide. Of the 3 triplets found, two must have been in the same array; stock fails on that, x3xx Lunatics doesn't.
                                                           Joe
Title: Re: x38g reports
Post by: Jason G on 24 Jun 2011, 04:27:16 pm
Jason, does this new build do anything about clearing up the -12 issue? I just found this WU marked invalid, too many bugs. http://setiathome.berkeley.edu/workunit.php?wuid=764389127  I was the only one running the V 0.38g and the only one to complete it without getting -12.

AFAIK x38g and x32f have the same improvements relative to triplet handling, they allow 1 more than stock before committing suicide. Of the 3 triplets found, two must have been in the same array; stock fails on that, x3xx Lunatics doesn't.
                                                           Joe
  Yep, that's Joe's extension, which continues to serve very well.   Ultimately, I have as a goal to converge CPU & GPU results as much as possible/practical/reasonable, even though the nature of floating point arithmetic & the hardware it is executed on pretty much guarantees some amount of variation when different algorithms are used for the same set of computations.  That means several of the GPU kernels will end up being reengineered to some degree, and in the meantime can expose cross-platform limitations that were instilled in the original CPU code as well (as with spikes accuracy) due to not having foreseen that vastly different hardware would one day be trying to match results.
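
A tiny illustration of why different algorithms give slightly different floating point answers: merely changing the order of additions changes the last bits of the result. This is a generic sketch with arbitrary values, not code from any of the apps discussed here.

```python
import math

values = [0.1, 0.2, 0.3]

# Same three numbers, summed in two different orders.
left_to_right = (values[0] + values[1]) + values[2]
right_to_left = values[0] + (values[1] + values[2])

print(left_to_right == right_to_left)               # exact comparison
print(math.isclose(left_to_right, right_to_left))   # comparison with a tolerance
```

The exact comparison fails while the tolerant one passes, which is the small-scale version of two implementations that are both "right" yet not bit-identical.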

This kind of juggling is proving to have annoying side effects for the interim period, though my hope is that when seti@home V7.x is released, the intermediate pain will have proved worthwhile, even if there are still wrinkles to iron out.

One thing to keep in mind, with classical control system redundancy techniques like this used in 'real' systems like aircraft, is that the redundancy is usually specified to have different authors & hardware manufacturers, and that they must agree within accepted variation.   With the inconclusives & subsequent reissues we are seeing even between results that look pretty much the same in all externally visible features, we are seeing that validation mechanism 'working' as it should.
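
The idea of redundant results that "must agree within accepted variation" can be sketched roughly as below. This is not the project's validator: `results_agree`, the tolerance and the signal values are all made up for illustration.

```python
import math

def results_agree(a: list[float], b: list[float], rel_tol: float = 1e-2) -> bool:
    """True when both hosts report the same number of signals and every pair
    of values matches within rel_tol (an arbitrary illustrative tolerance)."""
    if len(a) != len(b):
        return False
    return all(math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(a, b))

host_a = [24.10, 3.51]         # made-up signal powers
host_b = [24.12, 3.50]         # slightly different arithmetic, same signals
host_c = [24.10, 3.51, 1.92]   # one extra signal reported

print(results_agree(host_a, host_b))
print(results_agree(host_a, host_c))
```

In this toy version a differing signal count is an immediate disagreement, which loosely mirrors the case of one host finding an extra signal.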

My current standing is that we are seeing legacy application limitations in combination with new hardware variations add up to 'a circus' of marginally close answers.    I feel that the base design change decisions for legacy work & the intent to converge cross platform results moving into V7 will prove the right direction, though I am also certain that some new architectures present further difficulties yet to be divined.

Jason
Title: Re: x38g reports
Post by: perryjay on 25 Jun 2011, 01:16:14 pm
Jason, in case you miss it in the NC forum, I've decided to go back to two at a time. Not long after I posted over there I started getting sluggish again. No down clock but everything running very slow. I shut down Firefox but no change, so I also shut down Thunderbird. Still nothing so I shut down SETI and closed BOINC manager. When I restarted BM and SETI everything came back to normal. I let it run for awhile with no problem but I get the feeling my little 450 doesn't like running three work units at a time 24/7. It seems to like to take a little break every now and again. I'll see how it likes two at a time again and let you know how it goes.
Title: Re: x38g reports
Post by: Jason G on 25 Jun 2011, 01:18:09 pm
OK, no worries.  Responded over there.  If it happens with 2 as well we might have to dig at that too, though it's probably just related to things that need to be done next anyway.
Title: Re: x38g reports
Post by: perryjay on 26 Jun 2011, 06:04:52 pm
Well over 24 hours now and everything is going along great. Guess I was just pushing the limit by running three at a time.
Title: Re: x38g reports
Post by: Jason G on 26 Jun 2011, 06:13:47 pm
Well over 24 hours now and everything is going along great. Guess I was just pushing the limit by running three at a time.

OK.  Keep an eye on things when you can.  With things running a bit more smoothly, I am currently starting a rewrite of the problem pulsefinds once and for all (i.e. VLAR & display lag related).  That's going to take time & care, but at least the experience garnered so far should see things get a lot better from this point, in terms of both reliability & performance.

Jason
Title: Re: x38g reports
Post by: Jason G on 27 Jun 2011, 03:00:58 am
For those following this thread & using the x39c diagnostic, please update to the attached build with some added diagnostic info printed on errors.

[Removed old build]
Title: Re: x38g reports
Post by: perryjay on 27 Jun 2011, 02:16:50 pm
Got it Jason, now if only SETI would cooperate. I've just started getting the can't connect to server message when trying to upload. I hope it's Hurricane Electric working on the problem. But anyway, another day of no problems, seems dropping back to two tasks has cured my problem.



Okay, finally got some reports through. Here's the link to one of the validated WUs I finished on x39d just in case you wanted to look at it.  http://setiathome.berkeley.edu/result.php?resultid=1967582734
Title: Re: x38g reports
Post by: Jason G on 27 Jun 2011, 07:03:25 pm
In the meantime, I've noticed Cuda 'freaking out' here on the 480 with newer builds ... but only when Firefox is running... weird.  No errors, but it certainly seems to stick in some funky lag-mode.

 I'm trying your solution of stepping down from 3 to 2 tasks.   If that helps I'd take it as an indication that the loading presented by the newer builds is indeed substantially higher overall.  I may have to retest which number of tasks gives the most throughput here, as 2 task loading seems to be >95% now.  I wasn't expecting that to change until later when I get a bit more optimisation in... Will see.

[Edit:]  Stepping down to 2 seems to have helped here too, will keep an eye on it for a while.

Jason
Title: Re: x38g reports
Post by: perryjay on 27 Jun 2011, 08:06:52 pm
Quote
. No Errors, but certainly seems to stick in some funky lag-mode. 

If you mean it seems to stick for a little while, I'm seeing that too. Mine seems to stick at around 96 to 98% and hold for somewhere around 30 seconds to a minute then pick up and run on to completion. I haven't tried it with firefox closed and I don't know exactly how often this happens.
Title: Re: x38g reports
Post by: Jason G on 27 Jun 2011, 08:16:17 pm
If you mean it seems to stick for a little while, I'm seeing that too.

Yeah that, & only with firefox running, when I run 3 tasks at once.  All fine so far with 2 tasks at once, but will periodically try to induce the behaviour.

  I've just now upgraded to the newer Beta drivers (just to throw a confusing change into the mix). I will satisfy myself that all is operating normally with 2 tasks & heavy Firefox usage, then try to reproduce the behaviour with 3 tasks running.  If it doesn't recur I'll pin it on something to do with 275.33 under heavy load; if it does, then on the increased load of the updated Firefox & newer apps.

[Update:]  Back up to 3 tasks at once with the 275.50 beta drivers.  No sign of weirdness yet, will thrash firefox tabs periodically & see what happens. 
[Update2:]  That didn't take long.  Poking at firefox for 5 minutes switching between tabs repeatedly did induce the behaviour.  Going back down to 2 to watch that setting again. It looks like we're creating a slightly heftier GPU load  :D  Oh well.

Jason
Title: Re: x38g reports
Post by: arkayn on 27 Jun 2011, 11:04:46 pm
Running 39d on my 460, luckily it is not my main surfing computer and mostly just crunches.
Title: Re: x38g reports
Post by: Ghost0210 on 28 Jun 2011, 03:28:18 am
In the meantime, I've noticed Cuda 'freaking out' here on the 480 with newer builds ... but only when FireFox is Running... weird. No Errors, but certainly seems to stick in some funky lag-mode. 

 

I've noticed the same with Firefox, but only with either 4 or 5. The 3.6.x versions don't seem to have any effect on the builds.
If I try to open Firefox with the builds running (from around the mid x38 builds), Firefox will hang for around 10 seconds then open, and the applications will slow for around 20 seconds while GPU utilisation drops from 93%+ down to ~87/88%.
Then as soon as I close Firefox, all is good again. Haven't seen this behaviour with either IE9 or Chrome as yet.
As I've been testing Raistmer's NV r521 build I thought it could be related to that, so I wasn't exactly sure what was causing this.
Title: Re: x38g reports
Post by: Jason G on 28 Jun 2011, 03:45:53 am
I think we should petition MS & Mozilla to stop trying to use our precious GPU resources  :P
Title: Re: x38g reports
Post by: Ghost0210 on 28 Jun 2011, 03:58:38 am
I think we should petition MS & Mozilla to stop trying to use our precious GPU resources  :P

So inconsiderate of them  ;D
Title: Re: x38g reports
Post by: Josef W. Segur on 28 Jun 2011, 12:29:08 pm
I think we should petition MS & Mozilla to stop trying to use our precious GPU resources  :P

Heresy! There's no more important use of computing resources than web browsing!

I don't know which browser started the "hardware acceleration" thing, but the best we can hope for is that they provide an option to turn it off.
                                                                   Joe
Title: Re: x38g reports
Post by: Jason G on 28 Jun 2011, 01:01:04 pm
Ahah!,

Firefox's built-in about:support page shows:

GPU Accelerated Windows: 1/1 Direct3D 10

on mine, now to find out how to disable that rubbish...
Title: Re: x38g reports
Post by: Mike on 28 Jun 2011, 01:03:10 pm
I think we should petition MS & Mozilla to stop trying to use our precious GPU resources  :P

Heresy! There's no more important use of computing resources than web browsing!

I don't know which browser started the "hardware acceleration" thing, but the best we can hope for is that they provide an option to turn it off.
                                                                   Joe

Agreed.
That's one of the reasons I stick with Firefox 3.
Title: Re: x38g reports
Post by: Jason G on 28 Jun 2011, 01:06:34 pm
Found it.  In Firefox 5.0 the setting is on the advanced options page.  After unticking 'Use hardware acceleration when available', about:support now shows:

GPU Accelerated Windows: 0/1

Cranking the 480 back up to 3 tasks  :D

[Edit:] Found an equivalent-looking setting in IE9's advanced options as well.  Ticked "Use Software Rendering Instead of GPU rendering" & restarted the browser as directed by the fine print.  Hah! Eat CPU cycles, browsers  :P
Title: Re: x38g reports
Post by: Claggy on 28 Jun 2011, 02:03:18 pm
[Edit:] Found an equivalent looking setting in IE9's advanced options as well.  ticked "Use Software Rendering Instead of GPU rendering" & restarted the browser as directed by the fine print.  Hah! eat cpu cycles browsers  :P
I switched that off on my laptop's 128MB 8400M GS almost as soon as IE9 came out, as it made the desktop very laggy when Collatz was running.
I was going to post today asking if Firefox 4 & 5 had a similar option, but an afternoon snooze got in the way,  ::)

Claggy
Title: Re: x38g reports
Post by: perryjay on 28 Jun 2011, 02:47:56 pm
Okay, switched off and back to three for me too. Be interesting to see if I can hold up at this rate.


So much for that idea. Noticed my internet slowing, then heard my fans slowing down. Checked SETI and saw the time to completion rising instead of falling. Checked EVGA Precision and saw my temp and fan speed were down, but it did not downclock. I went ahead and shut down the SETI client and BM, switched back to two at a time, and things are running smoothly again. This poor little GTS 450 1GB just can't handle three at a time.

One more little note, I did not shut down Firefox.  I just made my changes to SETI and started it back up.  Firefox is running better now too.
Title: Re: x38g reports
Post by: Jason G on 28 Jun 2011, 04:22:57 pm
This poor little GTS 450 1GB just can't handle three at a time....

Oh! The penny has dropped.  I've seen a VRAM utilisation blowout here & 3 tasks seem to be using way too much: over 1.4 GiB of VRAM  :o ::)   That was unintentional & you'll likely be able to go back to 3 once I figure out what has happened there (& fix it).  No way should we be using that much per task, and indeed a 1 GB card won't accommodate 3 strangely greedy instances.
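
The arithmetic behind that can be sketched as follows. It assumes the 1.4 GiB is split evenly across the three instances and that a nominal 1024 MiB is available on the card, both simplifications; `tasks_that_fit` and `reserved_mib` are hypothetical names, and real usage also depends on the driver and on whatever else is using the GPU.

```python
def tasks_that_fit(card_mib: float, per_task_mib: float, reserved_mib: float = 0.0) -> int:
    """Whole number of task instances whose footprint fits in the card's VRAM."""
    usable = card_mib - reserved_mib
    if usable <= 0 or per_task_mib <= 0:
        return 0
    return int(usable // per_task_mib)

observed_total_mib = 1.4 * 1024      # "over 1.4 GiB" seen with 3 tasks running
per_task = observed_total_mib / 3    # rough average per instance (assumption)

print(round(per_task))                   # approximate MiB per instance
print(tasks_that_fit(1024, per_task))    # instances that fit on a nominal 1 GiB card
```

Under those assumptions only two of the greedy instances fit, which is consistent with three at a time struggling on a 1 GB card.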
Title: Re: x38g reports
Post by: perryjay on 28 Jun 2011, 04:33:40 pm
While it ran, it ran good. I was only losing about a minute and a half over two at a time by running three. That's running shorties, I'm in the middle of the shorty storm right now. It would really be great if you find the problem and get us going again.
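
As a rough way to compare two versus three at a time: throughput is the number of concurrent tasks divided by the wall-clock time each one takes. The sketch below assumes the "minute and a half" is the extra time per task when three run together, and the 10 minute baseline is a made-up figure, so only the shape of the calculation is meaningful.

```python
def tasks_per_hour(concurrent: int, minutes_per_task: float) -> float:
    """Completed tasks per hour when `concurrent` tasks each take minutes_per_task."""
    return concurrent * 60.0 / minutes_per_task

baseline_minutes = 10.0   # hypothetical time per task when running 2 at once

two_up = tasks_per_hour(2, baseline_minutes)
three_up = tasks_per_hour(3, baseline_minutes + 1.5)   # 1.5 minutes slower per task

print(two_up, round(three_up, 1))
```

With those numbers three at a time still comes out ahead, as long as the card has the memory to run them.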
Title: Re: x38g reports
Post by: Jason G on 28 Jun 2011, 04:36:24 pm
While it ran, it ran good. I was only losing about a minute and a half over two at a time by running three. That's running shorties, I'm in the middle of the shorty storm right now. It would really be great if you find the problem and get us going again.
  Oh I'll find it alright  :D  There are some V7 issues to resolve as well, but I am a stickler for trying to shrink memory footprints, simply because I prefer computation over RAM.  RAM's Slow  ;)  The chances of this weird build running on 256MiB cards are currently zero  ;D
Title: Re: x38g reports
Post by: Ghost0210 on 28 Jun 2011, 04:36:30 pm
Here's what happened when I tried to run 3 at a time  

Quote
setiathome_CUDA: Found 1 CUDA device(s):
  Device 1: GeForce GTX 465, 993 MiB, regsPerBlock 32768
     computeCap 2.0, multiProcs 11
     clockRate = 1500000
In cudaAcc_initializeDevice(): Boinc passed DevPref 1
setiathome_CUDA: CUDA Device 1 specified, checking...
   Device 1: GeForce GTX 465 is okay
SETI@home using CUDA accelerated device GeForce GTX 465
Priority of process raised successfully
Priority of worker thread raised successfully
Cuda Active: Plenty of total Global VRAM (>300MiB).
 All early cuFft plans postponed, to parallel with first chirp.

 )       _   _  _)_ o  _  _
(__ (_( ) ) (_( (_  ( (_ (  
 not bad for a human...  _)

Multibeam x39d Preview, Cuda 3.20

Legacy setiathome_enhanced V6 mode.
Work Unit Info:
...............
WU true angle range is :  2.589599
VRAM:              cudaMalloc((void**) &dev_cx_DataArray, 1048576x       8bytes =    8388608bytes, offs256=0, rtotal=   8388608bytes
VRAM:         cudaMalloc((void**) &dev_cx_ChirpDataArray, 1179648x       8bytes =    9437184bytes, offs256=0, rtotal=  17825792bytes
VRAM:                      cudaMalloc((void**) &dev_flag,       1x       8bytes =          8bytes, offs256=0, rtotal=  17825800bytes
VRAM:                  cudaMalloc((void**) &dev_WorkData, 1179648x       8bytes =    9437184bytes, offs256=0, rtotal=  27262984bytes
VRAM:             cudaMalloc((void**) &dev_PowerSpectrum, 1048576x       4bytes =    4194304bytes, offs256=0, rtotal=  31457288bytes
VRAM:           cudaMalloc((void**) &dev_t_PowerSpectrum, 1048584x       4bytes =    1048608bytes, offs256=0, rtotal=  32505896bytes
VRAM:           cudaMalloc((void**) &dev_GaussFitResults, 1048576x      16bytes =   16777216bytes, offs256=0, rtotal=  49283112bytes
VRAM:                       cudaMalloc((void**) &dev_PoT, 1572864x       4bytes =    6291456bytes, offs256=0, rtotal=  55574568bytes
VRAM:              cudaMalloc((void**) &dev_PoTPrefixSum, 1572864x       4bytes =    6291456bytes, offs256=0, rtotal=  61866024bytes
VRAM:              cudaMalloc((void**) &dev_NormMaxPower,   16384x       4bytes =      65536bytes, offs256=0, rtotal=  61931560bytes
VRAM:                   cudaMalloc((void**) &dev_flagged, 1048576x       4bytes =    4194304bytes, offs256=0, rtotal=  66125864bytes
VRAM:            cudaMalloc((void**) &dev_outputposition, 1048576x       4bytes =    4194304bytes, offs256=0, rtotal=  70320168bytes
VRAM:       cudaMalloc((void**) &dev_PowerSpectrumSumMax,  262144x      12bytes =    3145728bytes, offs256=0, rtotal=  73465896bytes
VRAM:         cudaMallocArray( &dev_gauss_dof_lcgf_cache,       1x    8192bytes =       8192bytes, offs256=176, rtotal=  73474088bytes
VRAM:          cudaMallocArray( &dev_null_dof_lcgf_cache,       1x    8192bytes =       8192bytes, offs256=72, rtotal=  73482280bytes
VRAM:           cudaMalloc((void**) &dev_find_pulse_flag,       1x       8bytes =          8bytes, offs256=0, rtotal=  73482288bytes
VRAM:             cudaMalloc((void**) &dev_t_funct_cache, 1966081x       4bytes =    7864324bytes, offs256=0, rtotal=  81346612bytes
Thread call stack limit is: 1k
CudaThreadSetLimit() returned code

Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x00529977 read attempt to address 0x00000002

Engaging BOINC Windows Runtime Debugger...

setiathome_CUDA: Found 1 CUDA device(s):
  Device 1: GeForce GTX 465, 993 MiB, regsPerBlock 32768
     computeCap 2.0, multiProcs 11
     clockRate = 1500000
In cudaAcc_initializeDevice(): Boinc passed DevPref 1
setiathome_CUDA: CUDA Device 1 specified, checking...
   Device 1: GeForce GTX 465 is okay
SETI@home using CUDA accelerated device GeForce GTX 465
Priority of process raised successfully
Priority of worker thread raised successfully
Cuda Active: Plenty of total Global VRAM (>300MiB).
 All early cuFft plans postponed, to parallel with first chirp.

 )       _   _  _)_ o  _  _
(__ (_( ) ) (_( (_  ( (_ (  
 not bad for a human...  _)

Multibeam x39d Preview, Cuda 3.20

Legacy setiathome_enhanced V6 mode.
Work Unit Info:
...............
WU true angle range is :  2.589599
VRAM:              cudaMalloc((void**) &dev_cx_DataArray, 1048576x       8bytes =    8388608bytes, offs256=0, rtotal=   8388608bytes
VRAM:         cudaMalloc((void**) &dev_cx_ChirpDataArray, 1179648x       8bytes =    9437184bytes, offs256=0, rtotal=  17825792bytes
VRAM:                      cudaMalloc((void**) &dev_flag,       1x       8bytes =          8bytes, offs256=0, rtotal=  17825800bytes
VRAM:                  cudaMalloc((void**) &dev_WorkData, 1179648x       8bytes =    9437184bytes, offs256=0, rtotal=  27262984bytes
VRAM:             cudaMalloc((void**) &dev_PowerSpectrum, 1048576x       4bytes =    4194304bytes, offs256=0, rtotal=  31457288bytes
VRAM:           cudaMalloc((void**) &dev_t_PowerSpectrum, 1048584x       4bytes =    1048608bytes, offs256=0, rtotal=  32505896bytes
VRAM:           cudaMalloc((void**) &dev_GaussFitResults, 1048576x      16bytes =   16777216bytes, offs256=0, rtotal=  49283112bytes
VRAM:                       cudaMalloc((void**) &dev_PoT, 1572864x       4bytes =    6291456bytes, offs256=0, rtotal=  55574568bytes
VRAM:              cudaMalloc((void**) &dev_PoTPrefixSum, 1572864x       4bytes =    6291456bytes, offs256=0, rtotal=  61866024bytes
VRAM:              cudaMalloc((void**) &dev_NormMaxPower,   16384x       4bytes =      65536bytes, offs256=0, rtotal=  61931560bytes
VRAM:                   cudaMalloc((void**) &dev_flagged, 1048576x       4bytes =    4194304bytes, offs256=0, rtotal=  66125864bytes
VRAM:            cudaMalloc((void**) &dev_outputposition, 1048576x       4bytes =    4194304bytes, offs256=0, rtotal=  70320168bytes
VRAM:       cudaMalloc((void**) &dev_PowerSpectrumSumMax,  262144x      12bytes =    3145728bytes, offs256=0, rtotal=  73465896bytes
VRAM:         cudaMallocArray( &dev_gauss_dof_lcgf_cache,       1x    8192bytes =       8192bytes, offs256=176, rtotal=  73474088bytes
VRAM:          cudaMallocArray( &dev_null_dof_lcgf_cache,       1x    8192bytes =       8192bytes, offs256=72, rtotal=  73482280bytes
VRAM:           cudaMalloc((void**) &dev_find_pulse_flag,       1x       8bytes =          8bytes, offs256=0, rtotal=  73482288bytes
VRAM:             cudaMalloc((void**) &dev_t_funct_cache, 1966081x       4bytes =    7864324bytes, offs256=0, rtotal=  81346612bytes
Thread call stack limit is: 1k
Cuda Thread Limit was adjusted to 10k
boinc_exit(): requesting safe worker shutdown ->
  Worker Acknowledging exit request, spinning-> boinc_exit(): received safe worker shutdown acknowledge ->

changed it back to 2 at a time and the task picked up and looks like it will complete successfully
MSI was reading 925 MiB with the 3rd task running (or trying to) & Boinc reports my card as having 993MB
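
The VRAM lines in that log can be checked mechanically. The sketch below is not part of the app: it assumes the line layout shown above, uses the reported byte count on each line (not count x size), and confirms that the running total matches `rtotal`. The three sample lines are copied from the log with whitespace condensed.

```python
import re

# Captures the reported size and the running total from one "VRAM:" line.
LINE_RE = re.compile(r"=\s*(\d+)bytes,\s*offs256=\d+,\s*rtotal=\s*(\d+)bytes")

def check_running_total(lines):
    """Yield (size, rtotal, matches) for every line that looks like a VRAM line."""
    total = 0
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue
        size, rtotal = int(m.group(1)), int(m.group(2))
        total += size
        yield size, rtotal, total == rtotal

log = [
    "VRAM: cudaMalloc((void**) &dev_cx_DataArray, 1048576x 8bytes = 8388608bytes, offs256=0, rtotal= 8388608bytes",
    "VRAM: cudaMalloc((void**) &dev_cx_ChirpDataArray, 1179648x 8bytes = 9437184bytes, offs256=0, rtotal= 17825792bytes",
    "VRAM: cudaMalloc((void**) &dev_flag, 1x 8bytes = 8bytes, offs256=0, rtotal= 17825800bytes",
]

results = list(check_running_total(log))
for size, rtotal, ok in results:
    print(size, rtotal, ok)
```

The final rtotal in the full log is 81346612 bytes, roughly 78 MiB of explicit allocations per task, so these lines alone do not account for the usage MSI reported. The log does not show what makes up the remainder.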
Title: Re: x38g reports
Post by: Jason G on 28 Jun 2011, 04:44:24 pm
A Hah!  Thanks!  Will do some tests here  :)
Title: Re: x38g reports
Post by: Jason G on 28 Jun 2011, 05:18:17 pm
Try this one with 3 tasks, perryjay & ghost ( x39e attached, reduced footprint back to roughly normal, I hope)



[attachment deleted by admin]
Title: Re: x38g reports
Post by: Ghost0210 on 28 Jun 2011, 05:37:56 pm
Try this one with 3 tasks, perryjay & ghost ( x39e attached, reduced footprint back to roughly normal, I hope)



That's got it  ;D
Now able to run 3 tasks at a time, with memory usage now @ 801MB, which is the same as x39d used running 2 tasks.
Title: Re: x38g reports
Post by: Jason G on 28 Jun 2011, 05:42:17 pm
Sweet.  Note to self:  Lack of beer induces ID: 10t errors
Title: Re: x38g reports
Post by: arkayn on 28 Jun 2011, 07:00:23 pm
I just checked my 460 and it showed that I had used up to 740 MB and is at 710 MB right now with 2 at a time.

Changing over to x39e now.

[edit]Looks like it is down to 516 MB now[/edit]
Title: Re: x38g reports
Post by: perryjay on 28 Jun 2011, 07:11:32 pm
I'm here. Memory usage: 826MB, GPU usage: 94-99%, temp ~69 degrees. Fan sounds quieter but is running at 70%. We'll see how it goes.



Okay, after just a few minutes little has changed. Temp has gone up to 72 degrees, and memory usage has gone down to 817MB. GPU usage has gone to 92 to 97%, staying right around 95% mostly. Here's hoping.



Title: Re: x38g reports
Post by: Slavac on 29 Jun 2011, 01:36:57 am
Checking in.  Looks like I was a touch too late to snag the latest x39 build for testing.

Hoping this helps a bit, my 560ti's have been giving me fits.
Title: Re: x38g reports
Post by: Jason G on 29 Jun 2011, 02:30:24 am
Not too late.  Have just been purposely burying test builds in this thread to limit distribution while still getting some wider testing.  Will PM you link to newest (x39e) on Seti Main, to the post a few back.  [Done, relayed that x39e should be more helpful in isolating any further problem at least, so fingers crossed it shows something obvious]
Title: Re: x38g reports
Post by: _heinz on 29 Jun 2011, 03:27:58 am
Hi Jason,
I've now installed x39e for SETI main on my GT540M (1GB), but have received no work so far.
As soon as I have work, I will post again.
heinz
Title: Re: x38g reports
Post by: Jason G on 29 Jun 2011, 03:31:18 am
Cheers Heinz.  If your error pops up as before then that'll be good for further diagnosis. If it doesn't well that'll be good too.
Title: Re: x38g reports
Post by: Pepi on 29 Jun 2011, 04:20:36 am
Hi Jason !
This 39e you posted is the fastest app on my system. It can do 4 WUs in parallel without any problems, but it gets stuck when the first of the four finishes and a new one needs to start :(
On the other hand, it works with much less memory usage than any of the previous releases. Now, as always, I will crunch at least 100 WUs to see how this app works.
Good work!!!

(http://i53.tinypic.com/2ev55vm.jpg)
Title: Re: x38g reports
Post by: Jason G on 29 Jun 2011, 04:31:55 am
This 39e you posted is fastest app on my system.

That actually surprises me, since I dialled back some things for diagnosis & refinement, and am not focussed on speed at this time.  Gradually refining/fixing things I suppose may help 'real-world', as opposed to laboratory bench, speed as well, so I'll keep that in mind as things go further forward. 

The memory footprint may yet end up anywhere from a little bit to considerably smaller.  I'm not sure at this stage.  4 at a time is getting pretty eager though  :D

Jason
Title: Re: x38g reports
Post by: Pepi on 29 Jun 2011, 04:54:13 am
I don't know what you're doing, but you're doing it well :) (whatever you do with this app) :)
Title: Re: x38g reports
Post by: perryjay on 29 Jun 2011, 10:23:47 am
Going for a fourth  http://setiathome.berkeley.edu/workunit.php?wuid=766762437 

I agreed with another running x38g while a stock 6.03 found an extra gaussian.  The x38g was first, I was third. Shouldn't I have validated him?

But anyway, made it through the night with no problems at all to report.  As to the comment about this being the fastest app yet, could it just be that it seems to load faster and we don't have that snag near the end of the WU anymore?  Those two give us about a minute's advantage right there at least.
Title: Re: x38g reports
Post by: Josef W. Segur on 29 Jun 2011, 01:03:15 pm
Going for a fourth  http://setiathome.berkeley.edu/workunit.php?wuid=766762437 

I agreed with another running x38g while a stock 6.03 found an extra gaussian.  The x38g was first, I was third. Shouldn't I have validated him?

Yes, x38g on a GTX 460 and x39d on a GTS 450 really ought to be so close that an inconclusive comparison is nearly impossible. IMO the chance of one of the reported or "best" signals being at a critical level should be far too small to explain the number of inconclusives that are happening even between stock and the x3[8|9] builds.

Edit: Attaching the WU for that particular case. I have no way of comparing x38g to x39e unless someone else tests. I could do a CPU test, but won't unless CUDA testing seems to indicate it's needed.
                                                                   Joe
Title: Re: x38g reports
Post by: Jason G on 29 Jun 2011, 03:06:40 pm
As far as I'm concerned, x39d & e are different on those particular cards to x38g, and it's that family of 'newer' cards that brought us into x39 diagnostic builds, trying to locate a specific issue with those GPUs (& some others).

My current suspicions are along the lines that x38g & earlier builds, on certain cards & drivers, can have some silent failures that, while not necessarily manifesting in obvious reportable count differences, can certainly lead to differences in the best signals.

With regard to the likelihood that some such hidden error exists, with x38g it's possible, while with x39d it's highly unlikely.  In other words, while the computation codepaths are basically the same, the driver version & kernel reliability across GPUs are not, which is why we are running 'diagnostic' builds & not optimising for performance at this point.


Jason
Title: Re: x38g reports
Post by: perryjay on 29 Jun 2011, 03:23:02 pm
Hey boss, just wanted to let you know, Raistmer, Claggy and Ghost made me do it!!! They ganged up on me!   ::)

Only kidding, but I am running Raistmer's new app for APs on NVidia GPUs. Ghost said it was running okay on his with two MBs running at a time so I guess I will find out if three at a time will work. Haven't got any work for it yet but I'll let you know how it goes.
Title: Re: x38g reports
Post by: Jason G on 29 Jun 2011, 03:24:55 pm
OK.  If it pinches all the CPU from the Cuda app, starving it out, blame Raistmer.
Title: Re: x38g reports
Post by: Claggy on 29 Jun 2011, 03:49:21 pm
Going for a fourth  http://setiathome.berkeley.edu/workunit.php?wuid=766762437 

I agreed with another running x38g while a stock 6.03 found an extra gaussian.  The x38g was first, I was third. Shouldn't I have validated him?

Yes, x38g on a GTX 460 and x39d on a GTS 450 really ought to be so close that an inconclusive comparison is nearly impossible. IMO the tiny likelihood of one of the reported or "best" signals being at a critical level should be much rarer than necessary to explain the number of inconclusives that are happening even between stock and the x3[8|9] builds.

Edit: Attaching the WU for that particular case. I have no way of comparing x38g to x39e unless someone else tests. I could do a CPU test, but won't unless CUDA testing seems to indicate it's needed.
                                                                   Joe
Here's a benchrun comparing x39e to x32f and x38g,

Edit: x39e was Weakly similar against x32f, but Strongly similar,  Q= 99.96% against x38g

Edit 2: did an x39d run too, x39d was Weakly similar against x32f, but Strongly similar,  Q= 99.97% against x38g

Claggy
Title: Re: x38g reports
Post by: Jason G on 29 Jun 2011, 04:10:13 pm
Looks like the chirp difference to me altered the best gaussian, as opposed to more recent x39 changes.

I'll run that one on AKv8b for a double precision CPU chirp reference comparison.

(Barring the mentioned reliability issues we're looking for, x38g & x39d/e should have matched one another in this case)
Title: Re: x38g reports
Post by: Josef W. Segur on 29 Jun 2011, 05:03:39 pm
...
(Barring the mentioned reliability issues we're looking for, x38g & x39d/e should have matched one another in this case)

Claggy's x38g and x39e results did agree on the best_gaussian (and everything else) so can't explain why Perryjay's result didn't get strongly similar against Phud's.

I expect the x32f best_gaussian (which was one of the reported gaussians) is more likely to match CPU results, simply because it has a considerable history of few inconclusives.
                                                                Joe
Title: Re: x38g reports
Post by: Jason G on 29 Jun 2011, 05:30:57 pm
I expect the x32f best_gaussian (which was one of the reported gaussians) is more likely to match CPU results, simply because it has a considerable history of few inconclusives.

Well, I'm testing that theory by grabbing an AKv8b result to add to the collection (that's taking a while  :D).

As no 'direct' Gaussian search modifications were made in x32f through x39e, I currently call the x38g chirp & some other kernels 'unstable' on certain cards under as yet undetermined conditions. If it turns out to be something simpler then I'll be happy with that.

I haven't looked at the spikes' proximity to threshold, but given the known 6.03 limitations (which should show in my AKv8b result if a factor) then I think the 3-way circus on the live runs might go something like this:

x38g Vs 6.03 disagrees by spikes, with possible suspect chirp in x38g presenting effects
x39d Vs 6.03 disagrees by spikes
x38g Vs x39d, possible suspect x38g chirp (reliability)

So far we have seen mismatched gaussians between AKv8 & x32f with a full length test task from your FG set. I'm putting forward that in x38g the accuracy of those gaussians is repaired to match CPU by the new chirp, but that instability created an issue in the live result that isn't seen under bench, and that the majority of the remaining disagreement comes from the spikes.
Title: Re: x38g reports
Post by: Claggy on 29 Jun 2011, 05:50:51 pm
I've taken out my GTX460 and fitted my 9800GTX+ and I'm in the process of running a bench comparing x39d and x39e against x32f and x38g.

Claggy
Title: Re: x38g reports
Post by: Jason G on 29 Jun 2011, 05:56:35 pm
OK, apart from the fix for the VRAM blowout, x39e is identical to x39d, so you could shorten your test by one build if you wanted, though I suppose the extra run couldn't hurt to see if remaining stability issues show up,  despite that none seem to under bench (the frustrating part  :))
Title: Re: x38g reports
Post by: Jason G on 29 Jun 2011, 06:01:22 pm
------------
Running app : AK_v8b_win_x64_SSSE3x.exe -verb -nog
with WU     : 27fe11ac.12560.9065.8.10.100.wu
Started at  : 05:48:23.796
Ended at    : 07:24:09.576
   5745.740 secs Elapsed
   5179.405 secs CPU time
Result      : stored as ref for validation.
------------
Running app : Lunatics_x39e_win32_cuda32.exe -verb -nog
with WU     : 27fe11ac.12560.9065.8.10.100.wu
Started at  : 07:24:12.637
Ended at    : 07:30:50.101
    397.415 secs Elapsed
     50.638 secs CPU time
Speedup     : 99.02%
Ratio       : 102.28 x
ref-AK_v8b_win_x64_SSSE3x.exe-27fe11ac.12560.9065.8.10.100.wu.res:-
Result      : Strongly similar,  Q= 99.74%


Attaching bench & result files for manual comparisons.... [Done, analysing]
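
For anyone reading the bench output above: the Speedup and Ratio lines appear to be derived from the CPU time figures, not from elapsed time. A small Python sketch (the function name is made up for illustration) that recomputes them from the numbers in the log:

```python
def bench_summary(ref_cpu, test_cpu, ref_elapsed, test_elapsed):
    """Recompute speedup figures from bench timings given in seconds."""
    return {
        # Percentage of the reference CPU time that the tested app saved.
        "speedup_pct": 100.0 * (1.0 - test_cpu / ref_cpu),
        # How many times less CPU time the tested app needed.
        "cpu_ratio": ref_cpu / test_cpu,
        # Wall-clock ratio, which is what throughput actually depends on.
        "elapsed_ratio": ref_elapsed / test_elapsed,
    }


# Timings copied from the AK_v8b (reference) and x39e runs above.
summary = bench_summary(5179.405, 50.638, 5745.740, 397.415)
print("Speedup : %.2f%%" % summary["speedup_pct"])
print("Ratio   : %.2f x (CPU time)" % summary["cpu_ratio"])
print("Elapsed : %.2f x (wall clock)" % summary["elapsed_ratio"])
```

So the 102x figure mostly reflects how little CPU the GPU build uses; on wall-clock time the x39e run was roughly 14 times faster than the CPU reference.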
Title: Re: x38g reports
Post by: SciManStev on 29 Jun 2011, 06:09:21 pm
Here is x39a, x39d, and x39e V6. I have been running 3 wu's at a time with x39a live without issues. I have only had one invalid wu, out of thousands. Driver 275.33

Steve
Title: Re: x38g reports
Post by: Jason G on 29 Jun 2011, 06:11:28 pm
Here is x39a, x39d, and x39e V6. I have been running 3 wu's at a time with x39a live without issues. I have only had one invalid wu, out of thousands. Driver 275.33

Steve

Thanks Steve, yeah it's these 460's (and some others) that appear to be sensitive to something I'm abusing.  We seem to be home 'n hosed with the 480s
Title: Re: x38g reports
Post by: SciManStev on 29 Jun 2011, 06:20:01 pm
That's what I gathered by reading the threads, but I wanted to throw in a test or two myself. Is there any particular app you would like me to run live, or is there any other comparison you would like me to run?

Steve

PS. I did back down my BCLK one click and eliminated my AP invalids.
Title: Re: x38g reports
Post by: Jason G on 29 Jun 2011, 06:20:29 pm
x39e all the way  ;D
Title: Re: x38g reports
Post by: SciManStev on 29 Jun 2011, 06:25:40 pm
Done!

Steve
Title: Re: x38g reports
Post by: Jason G on 29 Jun 2011, 06:45:56 pm
Manual 27fe11ac.12560.9065.8.10.100  result cross comparison under bench conditions

Claggy's x39e (GTX 460) Vs my x39e (GTX 480): Strongly similar,  Q= 99.95%
My x39e (GTX 480) Vs AKv8b x64 SSSE3x: Strongly similar,  Q= 99.74%
Claggy's x38g (GTX 460) Vs my x39e (GTX 480): Strongly similar,  Q= 99.97%
Claggy's x32f (GTX 460) Vs AKv8b (my E8400): Weakly similar (bodgy best Gaussian)
Claggy's x32f (GTX 460) Vs my x39e (GTX 480): Weakly similar (bodgy best Gaussian)

Tentative analysis:  The known CPU app spikes issues are playing no part here.  The bodgy best Gaussian is in x32f due to the inaccurate stock nVidia chirp (~48 bit precision emulated floating point); I've kept complete documentation on how I fixed that chirp in the alpha ivory beer tower.  x32f & the stock code it came from are crap with highly chirp sensitive signals.
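
To give a feel for why chirp precision matters at all, here is a rough Python sketch that compares the chirp phase computed in double precision with the same formula rounded to single precision at every step. It is not the stock or Lunatics chirp code and does not reproduce the ~48 bit emulation; the formula (0.5 * chirp_rate * t^2, in cycles), the sample count, the sample rate and the chirp rate are all assumptions chosen for illustration.

```python
import struct


def f32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def chirp_phase_f64(i, chirp_rate, sample_rate):
    """Chirp phase in cycles at sample i, computed in double precision."""
    t = i / sample_rate
    return 0.5 * chirp_rate * t * t


def chirp_phase_f32(i, chirp_rate, sample_rate):
    """Same formula with every intermediate rounded to single precision."""
    dt = f32(1.0 / sample_rate)
    t = f32(i * dt)
    return f32(f32(0.5 * chirp_rate) * f32(t * t))


# Assumed values for illustration only: 1048576 samples at 9765.625 Hz
# and a chirp rate of 79.3237 Hz/s.
N, FS, RATE = 1048576, 9765.625, 79.3237

# Worst disagreement over the last 1000 samples, where the phase is largest.
worst = max(
    abs(chirp_phase_f32(i, RATE, FS) - chirp_phase_f64(i, RATE, FS))
    for i in range(N - 1000, N)
)
print("phase at end of data : %.1f cycles" % chirp_phase_f64(N - 1, RATE, FS))
print("worst float32 error  : %.4f cycles" % worst)
```

With these assumed numbers the phase reaches a few hundred thousand cycles by the end of the data, where adjacent single-precision values are about 0.03 cycles apart, so only the fractional part that actually matters for the rotation is being lost. That is the general reason a chirp needs more than plain single precision for signals that are sensitive to it.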

On the live runs, the x38g result likely didn't match the x39d result purely due to the known stability issues we are here to resolve on that class of card.

AKv8b found here:
Quote
Spike count:    8
Pulse count:    4
Triplet count:  0
Gaussian count: 2

with the live run CPU 6.03 guy finding:
Quote
Spike count:    8
Pulse count:    4
Triplet count:  0
Gaussian count: 3

Now a stock cuda 6.08 wingman has rocked up, STILL INCONCLUSIVE....LoL... ;D
Quote
Spike count:    8
Pulse count:    4
Triplet count:  0
Gaussian count: 2

IMO, all the results are broken in some way apart from the offline AKv8b ones & the x39d/e results.
Jason
Title: Re: x38g reports
Post by: Claggy on 29 Jun 2011, 06:59:00 pm
Here's the results of the 9800GTX+ run, all apps Strongly similar,

Claggy
Title: Re: x38g reports
Post by: Jason G on 29 Jun 2011, 07:09:10 pm
Well, unfortunately the 9800 result has the stock-x32f-style bodgy gaussian against AKv8b so there will be something to look at deeper with the pre-Fermis  (That gaussian drift against CPU results has been there for a long time though)
Title: Re: x38g reports
Post by: perryjay on 29 Jun 2011, 07:29:49 pm
And here I thought I was just going to show something kinda interesting. I didn't know I was gonna start all this!   :o
Title: Re: x38g reports
Post by: arkayn on 29 Jun 2011, 07:47:12 pm
I thought it was the 560's having problems, I have not seen any problems from my little 460.
Title: Re: x38g reports
Post by: Claggy on 29 Jun 2011, 07:50:09 pm
I thought it was the 560's having problems, I have not seen any problems from my little 460.
I think the 560's just have bigger problems,

Claggy
Title: Re: x38g reports
Post by: Jason G on 29 Jun 2011, 11:11:29 pm
I thought it was the 560's having problems, I have not seen any problems from my little 460.
I think the 560's just have bigger problems,
and not all 460's & 560's are created equal either, whereas 480's like mine & Steve's are nVidia reference & likely as close to identical as they come, so relatively predictable.  It's something that should not reflect in the results when the code is 100% 'right'. 

Since some of the newer code in play is specifically written to exploit the superscalar instruction level parallelism for maximum bandwidth on compute capability 2.1, i.e. 48 instead of 32 Cuda cores per multiprocessor with an extra warp scheduler, if something's being pushed to the limits kernel execution configuration wise then it's going to show on those cards with harder constraints first.

Jason
Title: Re: x38g reports
Post by: Josef W. Segur on 30 Jun 2011, 01:16:02 am
Hmm, I noticed the "Bodgy Best Gaussian" occurred at chirp rate ~63.9958 and the presumably correct one at ~79.3237. So while running the WU with stock 6.95 I periodically checked state.sah to see what would turn up near those rates. Before the first there was a fairly weak "best" captured at chirp -15.88619 with a score of 0.125732, then the bodgy with a score of 1.293129, and the final one had a score of 1.305961. I don't know how to evaluate the 0.98% difference between bodgy and final scores.

AK_v8b_win_SSSE3x.exe has just gotten to bodgy and calculates its score as 1.293091 which is close though I'd rather the first difference were in the 6th significant digit than the 5th.
                                                                  Joe
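
The two comparisons above can be checked with a few lines of Python. This is only a sketch with made-up helper names; the scores are the ones quoted in the post.

```python
def relative_diff_pct(a, b):
    """Difference between a and b as a percentage of the larger value."""
    return 100.0 * abs(a - b) / max(abs(a), abs(b))


def first_differing_sig_digit(a, b, digits=7):
    """1-based position of the first significant digit where a and b differ.

    Both values are formatted to `digits` significant digits first;
    returns None if they agree to that precision.
    """
    mant_a, exp_a = ("%.*e" % (digits - 1, a)).split("e")
    mant_b, exp_b = ("%.*e" % (digits - 1, b)).split("e")
    if exp_a != exp_b:
        return 1  # different magnitude, so they differ from the first digit
    digits_a = mant_a.replace(".", "").lstrip("-")
    digits_b = mant_b.replace(".", "").lstrip("-")
    for pos, (x, y) in enumerate(zip(digits_a, digits_b), start=1):
        if x != y:
            return pos
    return None


bodgy_stock, final_stock, bodgy_akv8b = 1.293129, 1.305961, 1.293091

print("bodgy vs final  : %.2f%%" % relative_diff_pct(bodgy_stock, final_stock))
print("stock vs AK_v8b : first difference at significant digit",
      first_differing_sig_digit(bodgy_stock, bodgy_akv8b))
```

The first line reproduces the 0.98% figure (taken relative to the final score), and the second confirms that the stock 6.95 and AK_v8b scores for the bodgy gaussian first differ in the 5th significant digit.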
Title: Re: x38g reports
Post by: Jason G on 30 Jun 2011, 01:36:48 am
AK_v8b_win_SSSE3x.exe has just gotten to bodgy and calculates its score as 1.293091 which is close though I'd rather the first difference were in the 6th significant digit than the 5th.

If you are too, I'm content to call this one "remaining chirp annoyances, with added yet to be divined nVidia Gaussfit implementation vagaries"
Title: Re: x38g reports
Post by: Jason G on 30 Jun 2011, 02:00:29 am
I don't know how to evaluate the 0.98% difference between bodgy and final scores.

Just a naive thought on how that could interact with slight chirp differences:  If you take a fairly 'phat' gaussian (wider bandwidth bin leakage or similar effect) then stride it in slightly different angles, you will get a slightly different fit (shape)  for very similar (but not the same) peak... could there be some substantial aliasing & could [recommending to the project] some windowed transforms improve that SNR (controlling 'lobing' in the frequency domain)?
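
As a generic illustration of the 'lobing' point (not SETI@home's actual transforms), the Python sketch below puts a tone halfway between two FFT bins and compares how much of it leaks into a distant bin with no window and with a Hann window. The frame length, tone position and choice of window are arbitrary assumptions.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Plain O(N^2) DFT, returning the magnitude of every bin."""
    n = len(samples)
    return [
        abs(sum(s * cmath.exp(-2j * math.pi * k * i / n)
                for i, s in enumerate(samples)))
        for k in range(n)
    ]


def relative_leakage(samples, far_bin):
    """Magnitude at far_bin relative to the strongest bin."""
    mags = dft_magnitudes(samples)
    return mags[far_bin] / max(mags)


N = 64
TONE = 10.5  # cycles per frame: deliberately halfway between two bins

tone = [cmath.exp(2j * math.pi * TONE * i / N) for i in range(N)]
hann = [0.5 - 0.5 * math.cos(2 * math.pi * i / N) for i in range(N)]
windowed = [w * s for w, s in zip(hann, tone)]

rect_leak = relative_leakage(tone, far_bin=30)
hann_leak = relative_leakage(windowed, far_bin=30)
print("rectangular: %.2e   Hann: %.2e" % (rect_leak, hann_leak))
```

The windowed case leaks far less into distant bins, at the cost of a wider main lobe. Whether that trade would actually help the gaussian fits here is exactly the open question.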
Title: Re: x38g reports
Post by: perryjay on 30 Jun 2011, 10:21:02 am
Here's another one you guys might like to ponder: http://setiathome.berkeley.edu/workunit.php?wuid=766762437  It has a 6.03, a 6.08, an x38g and my x39e, plus it has just been sent out to another host running optimized Linux. He should be reporting in soon.
Title: Re: x38g reports
Post by: Jason G on 30 Jun 2011, 10:49:00 am
Same one I reckon  :D
Title: Re: x38g reports
Post by: perryjay on 30 Jun 2011, 12:12:46 pm
May want to see this one too.  http://setiathome.berkeley.edu/workunit.php?wuid=771931438  The first three to run it got -12s. I completed it, and a 6.03 completed with the same count I had. It still got sent out to another.


Am I finding what you guys are looking for or wasting your time? If these don't help let me know more of what I should look for.   ;)


Here's a poor guy with a new GTX 590 throwing a bunch of -9s. http://setiathome.berkeley.edu/show_host_detail.php?hostid=6016350 He's showing 122 invalids with only 36 valid results. He's running x32f.
Title: Re: x38g reports
Post by: Josef W. Segur on 30 Jun 2011, 12:35:35 pm
AK_v8b_win_SSSE3x.exe has just gotten to bodgy and calculates its score as 1.293091 which is close though I'd rather the first difference were in the 6th significant digit than the 5th.

If you are too, I'm content to call this one "remaining chirp annoyances, with added yet to be divined nVidia Gaussfit implementation vagaries"

Agreed, and I hope Crunch3r's 64 bit Linux build will resolve it. Otherwise that WU will end up in the very rare "Too many success results" category.

...
Am I finding what you guys are looking for or wasting your time? If these don't help let me know more of what I should look for.   ;)
...

You're doing fine, we just wish the project were fully funded and had several technicians available to look into these cases.  8)
                                                             Joe
Title: Re: x38g reports
Post by: perryjay on 30 Jun 2011, 02:30:05 pm
Got the same count as this 460 running x38g but still went to inconclusive..  http://setiathome.berkeley.edu/workunit.php?wuid=771572553
Title: Re: x38g reports
Post by: Jason G on 30 Jun 2011, 09:16:02 pm
Got the same count as this 460 running x38g but still went to inconclusive..  http://setiathome.berkeley.edu/workunit.php?wuid=771572553

Yeah now that initially looks to me like x38g failing invisibly with something.  As we've seen they are numerically the same under bench conditions even with marginal results like the previous chirp/gaussfit weirdo, but no evidence appears that something went wrong.   

I think x38g pushes the pulsefinding a touch too hard for some cards like the 460 & yours, causing some undocumented driver/kernel launch failures later in the process.    x39e has that wound back a notch, and extra hardened launches with print & hard error outs before & after every Cuda call.  That should make any repeat of what happens in x38g more obvious & descriptive.  IOW: discount the x38g result as possibly bad on that wingman, and we'll have to thrash out x39e for problems so a more general update can be provided.
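
The 'hardened launch' idea can be sketched in a language-neutral way. The Python below is only an illustration of the pattern, not the x39e code: `checked`, `DeviceError` and `FakeDevice` are hypothetical names, and `last_error` stands in for something like cudaGetLastError().

```python
class DeviceError(RuntimeError):
    """Raised as soon as a device call reports a problem."""


def checked(label, call, last_error, *args):
    """Run call(*args), checking for errors before and after it.

    `last_error` is a zero-argument function returning 0 when the device
    is healthy and clearing the error once it has been read.
    """
    pending = last_error()
    if pending != 0:
        raise DeviceError("%s: error %d already pending before call" % (label, pending))
    status = call(*args)
    if status != 0:
        raise DeviceError("%s: call returned error %d" % (label, status))
    late = last_error()
    if late != 0:
        raise DeviceError("%s: error %d detected after call" % (label, late))
    return status


class FakeDevice:
    """Hypothetical device whose kernel launch fails without saying so."""

    def __init__(self):
        self.error = 0

    def last_error(self):
        err, self.error = self.error, 0
        return err

    def free(self, name):
        return 0

    def launch_chirp(self):
        self.error = 1  # the launch 'succeeds' but leaves an error behind
        return 0


dev = FakeDevice()
report = None
checked("free dev_flagged", dev.free, dev.last_error, "dev_flagged")
try:
    checked("chirp kernel launch", dev.launch_chirp, dev.last_error)
except DeviceError as exc:
    report = str(exc)
    print(report)
```

The point of checking on both sides of every call is that a silent failure gets pinned to the call that caused it, instead of surfacing later as a mysterious wrong result or as an error on some unrelated call.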

An interesting thing, if possible red herring, is that a 480 (running stock 6.10 cuda_fermi) failed with a Cuda incorrect function on that task (again without a lot of explanation).  I know from my own & Steve's 480's that errors on these cards are exceedingly rare with all current builds .... So it's possible there is something funky going on with certain tasks as well.

Jason
Title: Re: x38g reports
Post by: perryjay on 30 Jun 2011, 10:39:37 pm
Duh, right, me good x38g bad, got it!   ;)
Title: Re: x38g reports
Post by: _heinz on 01 Jul 2011, 03:06:13 am
Hi Jason,
I'm now running x39e for SETI main on my GT540M (1GB), but have had no work so far.
As soon as I have work, I will post again.
heinz
I got 5 tasks.
2 are conclusive,
one against SETI@home Enhanced v6.10 (cuda_fermi)
the other against SETI@home Enhanced v6.03

for 3 I have to wait.
hostid=6023152 (http://setiathome.berkeley.edu/results.php?hostid=6023152)

The only issue: every time a SETI WU starts, the clocks on my GT540M drop down again.
I run it together with PrimeGrid.
To get my overclocked values (750/900/1500) back I must restart the machine. EVGA Precision was not able to set the frequency again once it had dropped.
Only a restart helps.
Standard clock is (672/900/1344).
30.06.2011 20:24:17 |  | NVIDIA GPU 0: GeForce GT 540M (driver version unknown, CUDA version 4000, compute capability 2.1, 962MB, 172 GFLOPS peak)
BOINC 6.12.26(x64)
I have installed latest 275.50-notebook-win7-winvista-64bit-international-beta
Title: Re: x38g reports
Post by: Jason G on 01 Jul 2011, 03:15:06 am
...
Only issue every time when a seti wu will start the machines clock down again on my GT540M.
I run it together with primegrid....

OK heinz, I think that no other project has the application fixes for newer Cuda drivers yet.  If you suspend primegrid for a while & verify you see no more downclocks, then I'd suggest that if you want to run it you either go back to a Cuda 3.2 driver (until primegrid fixes their application) or encourage them to apply the necessary boincapi fixes in an application update.

Jason
Title: Re: x38g reports
Post by: _heinz on 01 Jul 2011, 06:53:16 am
...
Only issue every time when a seti wu will start the machines clock down again on my GT540M.
I run it together with primegrid....

OK heinz, I think that no other project has the application fixes for newer Cuda drivers yet.  If you suspend primegrid for a while & verify you see no more downclocks, then I'd suggest that if you want to run it you either go back to a Cuda 3.2 driver (until primegrid fixes their application) or encourage them to apply the necessary boincapi fixes in an application update.

Jason
Where can I download the boincapi fixes? If I get them, I can compile a new PrimeGrid version to see if that fixes the issue.
thanks
Title: Re: x38g reports
Post by: Jason G on 01 Jul 2011, 07:21:47 am
Where can I download the boincapi fixes? If I get them, I can compile a new PrimeGrid version to see if that fixes the issue.
thanks
  pretty sure I gave you that info ... Looking...

[Edit:] found it:
http://lunatics.kwsn.net/5-windows/re-ap-blanking-experiment.msg39031.html#msg39031
Title: Re: x38g reports
Post by: perryjay on 01 Jul 2011, 09:28:34 am
Quote
May want to see this one too.  http://setiathome.berkeley.edu/workunit.php?wuid=771931438  First three to run it got -12s I completed and a 6.03 completed with the same count I had. Still got sent out to another.

Last man finished. Three of us got validated. Last man also ran it as a 6.03.
Title: Re: x38g reports
Post by: Jason G on 01 Jul 2011, 12:11:03 pm
Well am back up & running with a spare 450W PSU myself & had to put in the old 9600GSO to get operational again.  The 750W one driving the GTX 480 seems to have bitten the dust, so it's out of action until I can RMA it. 

Oh well, looks like crunching, development & everything else will be in slow motion for the time being  ::)
Title: Re: x38g reports
Post by: arkayn on 01 Jul 2011, 12:23:17 pm
Ouch!!!!

I am currently using my AMD Quad as my primary desktop since the iMac is just getting old and crash happy. I use my Q8200 machine as the music supplier.
Title: Re: x38g reports
Post by: perryjay on 01 Jul 2011, 02:19:14 pm
What, you don't have a dozen or so extras laying around? Sorry to hear that Jason, hope you get going again real soon. Things are still running good here, no problems to report. I have got my first AP on GPU but will be awhile before I get to it. Hope nothing smokes here by trying to run it. I haven't changed any settings so it will have to fend for itself when it starts.
Title: Re: x38g reports
Post by: _heinz on 01 Jul 2011, 06:12:54 pm
Hi Jason,
I'm now running only SETI. It looks like the downclocking is gone.
One result looks very curious --> http://setiathome.berkeley.edu/result.php?resultid=1976573029
<core_client_version>6.12.26</core_client_version>
<![CDATA[
<stderr_txt>

</stderr_txt>
]]>
~~~~~~
nothing in stderr ?
Title: Re: x38g reports
Post by: Jason G on 01 Jul 2011, 09:38:42 pm
Yeah, I'm convinced that's Boinc server or communications related somehow, & have seen empty stderr before with different applications. The newer exit code seems to have reduced the appearances of ones where stderr is truncated, but the whole thing missing isn't one I've been able to pin down to app or boinc client side yet.    One day I would like to think of a way to locate the exact point in the system where the stderr contents (or other parts) go missing, whether it's somewhere on our end, the server, or somewhere in between.

Jason
Title: Re: x38g reports
Post by: Jason G on 01 Jul 2011, 09:54:01 pm
What, you don't have a dozen or so extras laying around? ...

LoL, yeah I'm running on the spare now, which has been a marathon juggling exercise:
- 750W PSU apparently died,
- unplugged all hardware, including the GTX 480 for a test with spare 300W PSU (success)
- checked 750W again ... nogo,
- harvested 450W PSU from lounge machine, replacing it with the 300W
- removed GTX 260 from lounge machine replacing with old 9600GSO
- fiddled with drivers on that for ages to make sure that one's running & crunching OK (seems to be)
- Installed the 450W & another 9600GSO I had lying around in main machine to get it operational
- got that crunching OK & went to sleep

Still to come: see if the 450W can manage to drive the GTX 260 harvested from the lounge machine; it's not likely that it would manage the 480.  Lots of tasks to crunch & the 9600GSO may have trouble beating deadlines, LoL.

Then I'll see if the supplier will RMA the 750W PSU.  Sheesh, best PSU I ever had (Seasonic X-750), runs stone cold & only lasts a year?  ... oh well...

Title: Re: x38g reports
Post by: BANZAI56 on 01 Jul 2011, 11:29:20 pm
Sheesh, best PSU I ever had (Seasonic X-750), runs stone cold & only last a year?  ... oh well...

Humm.  Not what I wanted to hear.       :o
A few weeks back I put together an i7-950 with the same exact PS...   
Title: Re: x38g reports
Post by: Jason G on 01 Jul 2011, 11:31:26 pm
Sheesh, best PSU I ever had (Seasonic X-750), runs stone cold & only last a year?  ... oh well...

Humm.  Not what I wanted to hear.       :o
A few weeks back I put together an i7-950 with the same exact PS...   

It's been an awesome PSU, never any sign of stress, or even getting warm.  Hopefully just a freak one-off.
Title: Re: x38g reports
Post by: Pepi on 02 Jul 2011, 03:58:42 am
It looks like a fuse problem in the PSU, not the PSU itself :) But since you will not open the PSU, because you would lose the guarantee, you will never know :)
You can always look at the bright side of life :) What if all the other components had died, but not the PSU? That would be far more damage.

P.S. A little advice:
Don't mess with a lower-quality 450W PSU and a GTX 260. It is a power hungry beast, and either the PSU or the GPU will be damaged if there is not sufficient power.
Yesterday I finally got more than 120 WU, so I can crunch normally for at least 24 hours :)
Title: Re: x38g reports
Post by: Jason G on 02 Jul 2011, 04:33:47 am
P.S little advice.
Don't  mess with 450W PSU of lower quality and GTX 260. It is power hungry beast, and or PSU or GPU will be damaged is there is no sufficient power.

LoL, I agree, but I've been trying to kill this GTX 260 & PSU (turns out now I look it's a thermaltake 470W, so not completely crap, but not enough for the 260)  for a long time so that I can justify getting a newer one for the machine it was in  ;D...
Quote
Yesterday I finally got more then 120 WU, so can start crunch normally at least 24 hours    :)
Besides, I have ~2400 tasks on this machine, if something dies trying to whittle that down I'll call it a noble sacrifice & give them a decent burial.  I'll take your advice to heart & run it underclocked though  ;)

The stupid part is that I have the skills & tools to fix the 750W unit, but, as you say, don't want to open it *sigh*
Title: Re: x38g reports
Post by: Pepi on 02 Jul 2011, 07:52:55 am
You're trying to kill the GTX 260?
It is a piece of cake: use 12V, put one cable (+ or -, it is irrelevant) on any gold contact of the PCI Express connector, and with the other touch all the other gold contacts.
That will kill it immediately :)
That is how I kill some computer parts.
A PSU is hard to kill; it shuts itself down at any voltage irregularity :)

And for all of those WUs, back up both BOINC directories regularly  :) And put the backup on some other hard disk :)
Title: Re: x38g reports
Post by: SciManStev on 02 Jul 2011, 08:12:37 am
The stupid part is that I have the skills & tools to fix the 750W unit, but, as you say, don't want to open it *sigh*
I hear you on that! If mine went belly up, I'd have a hard time not opening the case, and digging into it. I'm really sorry your supply died like that. You are right in that these supplies are the best you have ever owned. The 1200 Watt version I have is better than anything I have ever seen.

As far as x39e, or any of the other builds I have run live, I haven't experienced a single problem. These 480's are crunching away like jet engines, and have held up perfectly, even at an extreme overclock: 871 MHz vs 700 MHz stock.

Steve
Title: Re: x38g reports
Post by: _heinz on 02 Jul 2011, 08:35:50 am
As you know, I killed the second 1000W PSU with my V8-Xeon. Now in the summer it is not possible to run the machine with 3 air-cooled 470/570s; some days ago we already had 36 degrees Celsius outside, and the house where I live has no air conditioning. Only a watercooled system, and perhaps a 1200W PSU, would be able to run continuously the whole year.
I must wait till late autumn to repair the machine.  :'(

heinz
Title: Re: x38g reports
Post by: perryjay on 02 Jul 2011, 09:17:54 am
Quote
The stupid part is that I have the skills & tools to fix the 750W unit, but, as you say, don't want to open it *sigh*

My problem is I'm too impatient to wait for an RMA and don't have that many spare parts laying around that I can swap out. If something goes out on me I'm usually in it trying to fix it within an hour. That's about all the time I can manage to keep my hands out of it.  :-X
Title: Re: x38g reports
Post by: _heinz on 02 Jul 2011, 01:37:04 pm
i3, GT540M
Today I got an unknown error (http://setiathome.berkeley.edu/result.php?resultid=1977850597)
Preemptively acknowledging a safe Exit on error->
SETI@home error 1 Unknown error
(cudaAcc_CalcChirpData_kernel_sm13<<<grid, block>>>(cudaAcc_NumDataPoints, 0.5*chirp_rate, recip_sample_rate, dev_cx_DataArray, dev_cx_ChirpDataArray))
File: c:/[Projects]/X_CudaMB/client/cuda/cudaAcc_CalcChirpData_sm13.cu
Line: 89
Title: Re: x38g reports
Post by: Jason G on 02 Jul 2011, 01:44:56 pm
Thanks Heinz,  I thought I was the only one to get those  :D   I may do some further profiling & investigation into the performance of that chirp... The indication of "Error on launch" is enough to say 'something broke here' and I have a couple of ideas that might eliminate those.

How many tasks do you currently run at once on that GPU ? 
Please keep an eye out if it happens again.

Jason
Title: Re: x38g reports
Post by: _heinz on 02 Jul 2011, 01:49:49 pm
How many tasks do you currently run at once on that GPU ? 
Jason
I was running two tasks.
At the moment I have no work; several downloads are hanging around... I must be patient to get some tasks.
heinz
Title: Re: x38g reports
Post by: perryjay on 02 Jul 2011, 03:26:49 pm
I've only got two errors, both -12s. This one is interesting though  http://setiathome.berkeley.edu/workunit.php?wuid=770546468  Take a look at the one that did complete it. 23 triplets found??
Title: Re: x38g reports
Post by: Jason G on 02 Jul 2011, 03:37:52 pm
I've only got two errors, both -12s. This one is interesting though  http://setiathome.berkeley.edu/workunit.php?wuid=770546468  Take a look at the one that did complete it. 23 triplets found??

Sure!   if ||| is one triplet, |||| is 2 triplets, and ||||| is 5 triplets.  Add a few more |'s in the same 'PulsePoT' and you get to 23 pretty quickly, which is bigger than where the nVidia code commits suicide.
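
To show how quickly the count climbs, here is a deliberately simplified Python sketch. It treats a 'triplet' as any three pulses where the middle one sits exactly halfway between the other two; that rule is an assumption for illustration and the real triplet finder's thresholds and tolerances are not modelled.

```python
from itertools import combinations


def count_triplets(pulse_positions):
    """Count sets of three pulses with the middle one exactly halfway."""
    present = set(pulse_positions)
    count = 0
    for first, last in combinations(sorted(present), 2):
        gap = last - first
        # A pair forms a triplet if a pulse sits at its midpoint.
        if gap % 2 == 0 and first + gap // 2 in present:
            count += 1
    return count


for n in (3, 4, 5, 8, 11):
    print(n, "pulses ->", count_triplets(range(n)), "triplets")
```

Under this strict rule five evenly spaced pulses give 4 rather than 5, so the exact numbers depend on how the real search counts, but the growth is roughly quadratic either way: eleven evenly spaced pulses already give 25.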

I've yet to hear a good explanation of why ET might not think that's a good way to catch our attention, so do want to rewrite that to eliminate the -12s.  It's only relatively recently  that I feel my Cuda experience is getting to the point where I can consider that particular rewrite (among others), so reengineering it to match CPU is on the list, though much lower down than some other issues.

Jason
Title: Re: x38g reports
Post by: Raistmer on 03 Jul 2011, 01:05:41 pm
The approach used in the OpenCL->CUDA build should never experience this problem. Maybe it's worth incorporating it in the "pure CUDA" build too.
Title: Re: x38g reports
Post by: Slavac on 03 Jul 2011, 06:34:51 pm
We've got Jason sorted for a new PSU.
Title: Re: x38g reports
Post by: perryjay on 03 Jul 2011, 06:42:57 pm
Great news Slavac.

Get shopping Jason!!   ;D
Title: Re: x38g reports
Post by: Jason G on 04 Jul 2011, 12:17:43 am
The approach used in the OpenCL->CUDA build should never experience this problem. Maybe it's worth incorporating it in the "pure CUDA" build too.

Yeah, when I rewrite that I'll take yours into consideration & apply the 'max bandwidth' approach that's been working well for me so far as well.  As mentioned I do want that issue gone, and as errors go it's becoming more common as hardware gets faster & the work noisier.

 
We've got Jason sorted for a new PSU.

Wow looks like I end up with enough to get the 'big one'.   :)  I'll let everyone know the good news on main & issue thanks as well. 
Title: Re: x38g reports
Post by: Josef W. Segur on 04 Jul 2011, 12:45:09 am
Not unexpected, http://setiathome.berkeley.edu/workunit.php?wuid=766762437 which we looked at pretty closely ended with all 5 getting credit. The stock CUDA 6.08 got canonical so was strongly similar to CRUNCH3R's 6.01 CUDA for 64 bit Linux.
                                                            Joe
Title: Re: x38g reports
Post by: Jason G on 04 Jul 2011, 12:57:29 am
Thanks for keeping track of that one.   Yep looks like we need to at least take a look at the gaussians then.  I'm not happy with the outwardly clean looking 6.03 run being marginalised.  While we can discount the legacy Cuda builds' accuracy for a mixed bag of reasons (& the spikes in 6.03 for that matter), and the x38g one for stability & the unexplored gaussfit code, it's going to be the marginal cases that'll show us where to look.   They are going to get stranger I suppose.

Jason
Title: Re: x38g reports
Post by: Slavac on 05 Jul 2011, 09:39:27 pm
X39e's up and running on two 560ti's.  We'll see how she does.
Title: Re: x38g reports
Post by: _heinz on 06 Jul 2011, 04:55:14 am
Hi Jason,
I got one -9 result_overflow: resultid=1981104197 (http://setiathome.berkeley.edu/result.php?resultid=1981104197)
SETI@Home Informational message -9 result_overflow
NOTE: The number of results detected exceeds the storage space allocated.

I thought this was solved?
Title: Re: x38g reports
Post by: Claggy on 06 Jul 2011, 05:07:46 am
Hi Jason,
I get one -9 resultoverflow resultid=1981104197 (http://setiathome.berkeley.edu/result.php?resultid=1981104197)
SETI@Home Informational message -9 result_overflow
NOTE: The number of results detected exceeds the storage space allocated.

I thought this is solved ?
That's quite normal, as long as your wingman finds too many signals too.

You were probably thinking of the error that happens when there are too many triplets.

Claggy
Title: Re: x38g reports
Post by: Slavac on 06 Jul 2011, 05:53:31 pm
x38 was giving me quite a few invalids, thus killing my RAC.  x39 seems to have killed the invalids issues though I'm getting a few errors.  After a few days of run time I'll post what errors I'm getting.

The x39 build also seems to have reduced the frequency of the downclocks I was experiencing.

Jason should I be running a particular driver with the 560's? 
Title: Re: x38g reports
Post by: perryjay on 06 Jul 2011, 06:12:41 pm
According to Martin of Martin's lighthouse...
Quote
266.66 is the first release with 560 Ti support.
 

If that's any help.   ::)
Title: Re: x38g reports
Post by: Mike on 06 Jul 2011, 06:43:13 pm
The 275.xx drivers should be a little faster at least.
Title: Re: x38g reports
Post by: perryjay on 06 Jul 2011, 11:09:32 pm
Got another invalid  http://setiathome.berkeley.edu/workunit.php?wuid=765094268  I found 11 spikes the other two only found three.

Oh, FYI, I've had a couple of driver restarts so I've cut back just a bit on my overclock. I'm now at 883/1766/1804. I'm also only running two at a time because of running Raistmer's app for the NV AP.
Title: Re: x38g reports
Post by: Jason G on 07 Jul 2011, 01:47:00 am
Got another invalid  http://setiathome.berkeley.edu/workunit.php?wuid=765094268  I found 11 spikes the other two only found three.

Nothing weird there, just 2 CPU apps with missed spikes due to inaccuracy ganging up on you.  Looks like x39e is going well so far on that one.
Title: Re: x38g reports
Post by: Slavac on 07 Jul 2011, 03:14:01 am
So far so good.  The invalids I was getting with 38 seem to be gone completely.
Title: Re: x38g reports
Post by: perryjay on 07 Jul 2011, 10:52:39 am
Here's one that might be of interest  http://setiathome.berkeley.edu/workunit.php?wuid=763291020
Title: Re: x38g reports
Post by: Jason G on 07 Jul 2011, 11:26:47 am
Here's one that might be of interest  http://setiathome.berkeley.edu/workunit.php?wuid=763291020
  LoL.  poor old x32f, he served us well.
Title: Re: x38g reports
Post by: glennaxl on 07 Jul 2011, 10:55:41 pm
Almost a month of run-time  without any issues. Good work  ;)
Title: Re: x38g reports
Post by: Slavac on 08 Jul 2011, 06:50:16 pm
Is there any truth to the rumor that 39e runs at a lower RAC than 38?
Title: Re: x38g reports
Post by: perryjay on 08 Jul 2011, 06:55:09 pm
Here's a pretty one...  http://setiathome.berkeley.edu/workunit.php?wuid=762509253
Title: Re: x38g reports
Post by: perryjay on 08 Jul 2011, 06:59:55 pm
Is there any truth to the rumor that 39e runs at a lower RAC than 38?


I believe it runs just a little bit slower than the 38 but that is because it is trying to find some problems some of us were having. I don't know how much that is going to affect your RAC since there are so many variables. With the new credit system you really can't tell since it depends on your wingman's time too.
Title: Re: x38g reports
Post by: Slavac on 08 Jul 2011, 08:31:05 pm
Got my first invalid on the 39 build.  What I find interesting is that 38 was throwing all sorts of invalids at me, 39 is almost entirely invalid free.  Lovely.

http://setiathome.berkeley.edu/result.php?resultid=1974204985
Title: Re: x38g reports
Post by: Slavac on 08 Jul 2011, 08:32:06 pm
Is there any truth to the rumor that 39e runs at a lower RAC than 38?


I believe it runs just a little bit slower than the 38 but that is because it is trying to find some problems some of us were having. I don't know how much that is going to affect your RAC since there are so many variables. With the new credit system you really can't tell since it depends on your wingman's time too.

It seems to be running 500-700 RAC slower on my machine than 38, though IMO that's such a small figure that it's not worth even mentioning.  Was hoping to get some ammunition to counter the "39's horrible for your RAC, don't download it" garbage seen at SETI@home.
Title: Re: x38g reports
Post by: perryjay on 08 Jul 2011, 09:10:46 pm
Got my first invalid on the 39 build.  What I find interesting is that 38 was throwing all sorts of invalids at me, 39 is almost entirely invalid free.  Lovely.

http://setiathome.berkeley.edu/result.php?resultid=1974204985


You picked the wrong work unit. Your invalid was http://setiathome.berkeley.edu/result.php?resultid=1974204984  Looks like you had two gang up on you by not finding those 8 gaussians you found.
Title: Re: x38g reports
Post by: Slavac on 08 Jul 2011, 09:36:41 pm
Got my first invalid on the 39 build.  What I find interesting is that 38 was throwing all sorts of invalids at me, 39 is almost entirely invalid free.  Lovely.

http://setiathome.berkeley.edu/result.php?resultid=1974204985


You picked the wrong work unit. Your invalid was http://setiathome.berkeley.edu/result.php?resultid=1974204984  Looks like you had two gang up on you by not finding those 8 gaussians you found.

Crap how do I link WU's properly?
Title: Re: x38g reports
Post by: perryjay on 08 Jul 2011, 10:51:07 pm
You were almost there but you must have copied DCappello's work unit location by mistake. He was the second man on that string. I just open the link I want to copy and copy what's in  the address bar. http://setiathome.berkeley.edu/workunit.php?wuid=772982715 gives you all the wingmen on that work unit.
Title: Re: x38g reports
Post by: Slavac on 09 Jul 2011, 02:33:16 am
You were almost there but you must have copied DCappello's work unit location by mistake. He was the second man on that string. I just open the link I want to copy and copy what's in  the address bar. http://setiathome.berkeley.edu/workunit.php?wuid=772982715 gives you all the wingmen on that work unit.

Crud sorry about that.  Pardon the new guy.
Title: Re: x38g reports
Post by: benool on 09 Jul 2011, 04:21:16 am
Here are a couple of errors I came upon with x38g on my 8600 GTS (256 MB of memory, driver 266.58):

Quote
Find triplets return flags indicate an error (value: 1)
Last Cuda error code indicates: Success - No errors.
Cuda sync'd & freed.
Preemptively acknowledging a safe Exit on error->
SETI@home error -12 Unknown error
cudaAcc_find_triplets doesn't support more than MAX_TRIPLETS_ABOVE_THRESHOLD numBinsAboveThreshold in find_triplets_kernel
File: c:/[Projects]/X_CudaMB/client/cuda/cudaAcc_pulsefind.cu
Line: 301

full details http://setiathome.berkeley.edu/result.php?resultid=1961527829

and

Quote
Cuda error 'find_triplets_kernel' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcc_pulsefind.cu' in line 276 : unknown error.
Unknown Error.
Cuda error 'cudaMemcpy(&flags, dev_flag, sizeof(*dev_flag), cudaMemcpyDeviceToHost)' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcc_pulsefind.cu' in line 287 : unknown error.
Unknown Error.
Cuda error 'cudaMemset(dev_find_pulse_flag, 0, sizeof(*dev_find_pulse_flag))' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcc_pulsefind.cu' in line 1606 : unknown error.
Cuda error 'cudaMemcpy(&flags, dev_find_pulse_flag, sizeof(*dev_find_pulse_flag), cudaMemcpyDeviceToHost)' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcc_pulsefind.cu' in line 1614 : unknown error.
Cuda error 'cudaMemcpy(PulseResults, dev_PulseResults, 4 * (cudaAcc_NumDataPoints / AdvanceBy + 1) * sizeof(*dev_PulseResults), cudaMemcpyDeviceToHost)' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcc_pulsefind.cu' in line 1626 : unknown error.
Cuda error 'cudaAcc_transpose' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcc_transpose.cu' in line 73 : unknown error.
Cuda error 'cudaMemcpy(best_PoT, dev_tmp_pot, max_nb_of_elems * sizeof(float), cudaMemcpyDeviceToHost)' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcc_pulsefind.cu' in line 1629 : unknown error.
Cuda error 'cudaMemcpy(PowerSpectrumSumMax, dev_PowerSpectrumSumMax, (cudaAcc_NumDataPoints / fftlen) * sizeof(*dev_PowerSpectrumSumMax), cudaMemcpyDeviceToHost)' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcc_summax.cu' in line 234 : unknown error.
Cuda error 'cudaMemcpy(dev_PoT, dev_PowerSpectrum, cudaAcc_NumDataPoints * sizeof(*dev_PowerSpectrum), cudaMemcpyDeviceToDevice)' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcc_gaussfit.cu' in line 482 : unknown error.
Cuda error 'NormalizePoT_kernel' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcc_gaussfit.cu' in line 499 : unknown error.
Cuda error 'cudaMemset(dev_flag, 0, sizeof(*dev_flag))' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcc_gaussfit.cu' in line 502 : unknown error.
Cuda error 'GaussFit_kernel' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcc_gaussfit.cu' in line 509 : unknown error.
Cuda error 'GaussFit_kernel' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcc_gaussfit.cu' in line 509 : unknown error.
Cuda error 'cudaMemcpy(&flags, dev_flag, sizeof(*dev_flag), cudaMemcpyDeviceToHost)' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcc_gaussfit.cu' in line 513 : unknown error.
Cuda error 'cudaMemcpy(GaussFitResults, dev_GaussFitResults, cudaAcc_NumDataPoints * sizeof(*dev_GaussFitResults), cudaMemcpyDeviceToHost)' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcc_gaussfit.cu' in line 524 : unknown error.
Cuda error 'cudaMemcpy(tmp_PoT, dev_NormMaxPower, ul_FftLength * sizeof(*dev_NormMaxPower), cudaMemcpyDeviceToHost)' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcc_gaussfit.cu' in line 525 : unknown error.
Cuda error 'cudaAcc_transpose' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcc_transpose.cu' in line 73 : unknown error.
Cuda error 'cudaMemcpy(best_PoT, dev_t_PowerSpectrum, cudaAcc_NumDataPoints * sizeof(*dev_t_PowerSpectrum), cudaMemcpyDeviceToHost)' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcc_gaussfit.cu' in line 532 : unknown error.
Cuda error 'cudaAcc_CalcChirpData_kernel2' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcc_CalcChirpData.cu' in line 113 : unknown error.
Cuda error 'cudaMemcpy(PowerSpectrumSumMax, dev_PowerSpectrumSumMax, (cudaAcc_NumDataPoints / fftlen) * sizeof(*dev_PowerSpectrumSumMax), cudaMemcpyDeviceToHost)' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcc_summax.cu' in line 234 : unknown error.
Cuda error 'cudaFree(dev_PowerSpectrumSumMax)' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcceleration.cu' in line 522 : unknown error.
Cuda error 'cudaFree(dev_outputposition)' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcceleration.cu' in line 524 : unknown error.
Cuda error 'cudaFree(dev_flagged)' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcceleration.cu' in line 526 : unknown error.
Cuda error 'cudaFree(dev_NormMaxPower)' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcceleration.cu' in line 528 : unknown error.
Cuda error 'cudaFree(dev_PoTPrefixSum)' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcceleration.cu' in line 530 : unknown error.
Cuda error 'cudaFree(dev_PoT)' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcceleration.cu' in line 532 : unknown error.
Cuda error 'cudaFree(dev_GaussFitResults)' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcceleration.cu' in line 534 : unknown error.
Cuda error 'cudaFree(dev_t_PowerSpectrum)' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcceleration.cu' in line 536 : unknown error.
Cuda error 'cudaFree(dev_PowerSpectrum)' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcceleration.cu' in line 538 : unknown error.
Cuda error 'cudaFree(dev_WorkData)' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcceleration.cu' in line 540 : unknown error.
Cuda error 'cudaFree(dev_flag)' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcceleration.cu' in line 542 : unknown error.
Cuda error 'cudaFree(dev_sample_rate)' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcceleration.cu' in line 544 : unknown error.
Cuda error 'cudaFree(dev_cx_ChirpDataArray)' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcceleration.cu' in line 546 : unknown error.
Cuda error 'cudaFree(dev_cx_DataArray)' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcceleration.cu' in line 548 : unknown error.
Cuda sync'd & freed.
Preemptively acknowledging a safe Exit on error->
SETI@Home Informational message -9 result_overflow

http://setiathome.berkeley.edu/result.php?resultid=1961505285
Title: Re: x38g reports
Post by: Terror Australis on 09 Jul 2011, 09:51:39 am
I have an interesting problem on a new system.
Any unit that runs longer than exactly 7 minutes and 40 seconds errors out like this
Quote
Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Breakpoint Encountered (0x80000003) at address 0x7C90120E

This is a "new" system: QX9650, Gigabyte EX38-DS5 MB, 4G memory, 2 x GTX580's, XP_32, V266.58 drivers upgraded to 275.33 without improvement, new package downloaded and .exe and dll files refreshed, allowed BOINC more memory and disk space (just in case) Still no fix.

Shorties and higher AR units get through ok. It's only when a unit hits the magic 7:40 barrier the problem occurs. Unfortunately I haven't been able to download enough "long" units to see if this occurs with only one card installed
Link to the errors page (http://setiathome.berkeley.edu/results.php?hostid=6099973&offset=0&show_names=0&state=5&appid=)
Title: Re: x38g reports
Post by: Ghost0210 on 09 Jul 2011, 10:07:58 am
It's this part of the error that's important:
Quote
<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
Maximum elapsed time exceeded
</message>
<stderr_txt>

Seems that Boinc thinks that these tasks should be running a lot quicker than they are and times them out with the above error.
Have you tried Fred's Rescheduler to adjust the RSC_FPOPS_BOUND on your tasks? That should solve the problem for you.

Here's a link to Fred's site (http://www.efmer.eu/forum_tt/index.php?topic=428.0)
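
A rough sketch of why a task might die at exactly the same elapsed time every run. It assumes the client derives its time limit by dividing the task's rsc_fpops_bound by its own estimate of the application's speed; the function names and every number below are made up for illustration and are not read from a real task.

```python
# Sketch of the check behind "Maximum elapsed time exceeded".
# Assumption: limit_seconds = rsc_fpops_bound / estimated_flops.
# All values here are hypothetical, chosen to give a 7:40 limit.

def max_elapsed_seconds(rsc_fpops_bound: float, estimated_flops: float) -> float:
    """Wall-clock budget allowed before the task is aborted."""
    return rsc_fpops_bound / estimated_flops


def would_abort(elapsed_seconds: float, rsc_fpops_bound: float,
                estimated_flops: float) -> bool:
    """True once the task has run past its budget."""
    return elapsed_seconds > max_elapsed_seconds(rsc_fpops_bound, estimated_flops)


bound = 4.6e14   # hypothetical operation bound for one task
flops = 1.0e12   # hypothetical (over-optimistic) speed estimate

limit = max_elapsed_seconds(bound, flops)            # 460 s = 7 min 40 s
raised_limit = max_elapsed_seconds(bound * 10, flops)

print(f"limit: {limit:.0f} s, after raising the bound tenfold: {raised_limit:.0f} s")
```

If the speed estimate is too optimistic, every long task hits the same wall, which is why raising the bound (or fixing the estimate) makes the error go away while shorties never notice.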
Title: Re: x38g reports
Post by: Terror Australis on 09 Jul 2011, 10:35:20 am
Duuh - of course THAT's what a 177 error is  :P
Should have woken up, but I'd been chasing hardware problems in another box all day and wasn't thinking. It's a while since I had one.
Thanks Ghost

T.A.
Title: Re: x38g reports
Post by: perryjay on 09 Jul 2011, 04:58:00 pm
I made two x38gs fight it out over this one..  http://setiathome.berkeley.edu/workunit.php?wuid=774709022  I ran mine on my CPU.   :D


Another x39e and I didn't quite match on this one. Went out to a third running it as v5.28.  http://setiathome.berkeley.edu/workunit.php?wuid=772279650
Title: Re: x38g reports
Post by: Jason G on 10 Jul 2011, 04:42:06 am
Here are a couple of errors I came upon with x38g on my 8600 GTS (256 MB of memory, driver 266.58):
...
full details http://setiathome.berkeley.edu/result.php?resultid=1961527829

Looks like a regular, rarer now, -12.  Future work will get rid of those entirely, and they (two kinds of -12) originate as hard coded limitations in the original nVidia supplied stock code.

Quote
http://setiathome.berkeley.edu/result.php?resultid=1961505285

Looks like a genuine failure of some sort, though x38g doesn't have as helpful output when those occur.  x39e is somewhere in this thread... I'll look for that, or make a newer variant available as appropriate as soon as a few more things are understood about the newer build behaviour.

Jason

[Edit:] here's the link to the post containing the 7zipped main exe (only), meant to be added to an existing x38g installation, with suitable app_info edits (or aistub/aimerge).
http://lunatics.kwsn.net/12-gpu-crunching/x38g-reports.msg39472.html#msg39472
Title: Re: x38g reports
Post by: perryjay on 10 Jul 2011, 12:01:43 pm
Just wanted to let everyone see what is happening with one of my errors... http://setiathome.berkeley.edu/workunit.php?wuid=770546468   5 -12s 1 -9 overflow.

Will it ever end?   ;D


Oh, my other error shows 2 -9s and 2 -12s.
Title: Re: x38g reports
Post by: Jason G on 10 Jul 2011, 12:33:44 pm
Will it ever end?   ;D

Yes it will... The -12's will end altogether, as soon as the V7 autocorrelation code is stable, then I get time to look at & rewrite  the triplet kernels (probably using a combination of Raistmer's opencl approach & my max bandwidth kernels)

Legacy builds disappear with the transition to V7, and newer builds are hoped to converge CPU & GPU results cross platform... So that puts an end to another bunch of weak similarity issues.

Long road... But I think when Joe's work, the project's polishing & all the builds are V7 compatible, then we could be looking for new sets of problems to solve... Or better yet, optimising again instead of troubleshooting & bugfixing.

Just better ask these hardware manufacturers to stop devising new stuff so we can catch up for a while  ;)
Title: Re: x38g reports
Post by: perryjay on 10 Jul 2011, 12:39:32 pm
Oh, it seems like since I cut back on my overclock I have been getting fewer inconclusives. We may have been chasing a problem on my side.  Most all of my incons right now are caused by a wingman's 450 throwing out -9s. I PMd him but I don't know if it will do any good.
Title: Re: x38g reports
Post by: Jason G on 10 Jul 2011, 12:44:17 pm
... Most all of my incons right now are caused by a wingman's 450 throwing out -9s. I PMd him but I don't know if it will do any good.

What's he running ?
Title: Re: x38g reports
Post by: perryjay on 10 Jul 2011, 01:07:37 pm
Appears to be stock. He also has a 430 in that rig that is turning in good results.  http://setiathome.berkeley.edu/result.php?resultid=1986202415

I have been teamed up with him on quite a few work units.
Title: Re: x38g reports
Post by: Claggy on 10 Jul 2011, 05:17:44 pm
I've just come across this invalid result from one of my wingmen:

http://setiathome.berkeley.edu/result.php?resultid=1973794071

Claggy
Title: Re: x38g reports
Post by: Jason G on 10 Jul 2011, 05:26:13 pm
I've just come across this invalid result from one of my wingmen:

http://setiathome.berkeley.edu/result.php?resultid=1973794071

Claggy
  Well at least newer builds don't end up in that unhelpful cascade when things go awry.  It looks like his original failure was arbitrarily in the preceding power-spectrum, which was already fairly hardened by x38g.  Looking at his error list he may have a few other issues going on, with -177s,  Cufft failures & possibly driver crashes. 
Title: Re: x38g reports
Post by: perryjay on 11 Jul 2011, 01:37:42 pm
Here's a nice one  http://setiathome.berkeley.edu/workunit.php?wuid=777762756  the invalid one is running a GTX 570 with the old v12 mod. Will we lose these guys when we go to the new V7?
Title: Re: x38g reports
Post by: Jason G on 11 Jul 2011, 01:57:02 pm
Will we lose these guys when we go to the new V7?

Yep.
Title: Re: x38g reports
Post by: perryjay on 11 Jul 2011, 02:05:09 pm
Great, then maybe we can tell them to upgrade when they finally visit the forums to ask why they aren't getting any work!   Oh and the work units from my buddy with the bad 450 have started validating for me and kicking him out as invalid. Judging from the dozen or so times I was paired with him he must be kicking out hundreds if not thousands of those -9s. No reply from him from my PM. (Imagine that!   ::)  )
Title: Re: x38g reports
Post by: perryjay on 12 Jul 2011, 09:05:35 am
Woke up this morning to two -1 errors.  http://setiathome.berkeley.edu/workunit.php?wuid=780336597  http://setiathome.berkeley.edu/workunit.php?wuid=779923870 .  Don't know what happened but looks like my computer rebooted overnight. EVGA precision has a habit of starting after BOINC Manager has started so it doesn't catch the overclock. I have to restart BM and client to get it right again.

Well, something is happening. I just downclocked again. I'm going to set things back to .5/.5 and try again.


FYI,  I picked up and ran an AP on my GPU last night. That could have been the problem. Everything is back up to speed after rebooting but my MB tasks on GPU are running high priority. That should settle down after a couple of runs here soon. I also have a few APs waiting so we shall see what happens.
Title: Re: x38g reports
Post by: Jason G on 12 Jul 2011, 11:48:50 am
FYI,  I picked up and ran an AP on my GPU last night. That could have been the problem. Everything is back up to speed after rebooting but my MB tasks on GPU are running high priority. That should settle down after a couple run here soon. I also have a few APs waiting so we shall see what happens.

Something to watch: I have no idea if Raistmer included any boincApi fixes for modern Cuda 4.0 drivers, or if they'd be needed under OpenCL running as they most definitely are under Cuda.  One way to find out would be to repeatedly exit Boinc (shutting down the AP while in progress) (or just snooze/unsnooze etc.)  & see if it triggers a sticky downclock or not.  If so, then you'll just have to slap Raistmer around a bit to fix it. [ I mean ask nicely...  :D]

Jason
Title: Re: x38g reports
Post by: perryjay on 12 Jul 2011, 11:56:29 am
I'm over in his thread too. Seems it tried to start two APs at once a bit ago. After 10 minutes I had to suspend the second one as it hadn't even started yet. It was strange because even though I changed the count to .5 I had not changed the number of iterations. The only thing I can think of is the fact it was guesstimating the TTC as over 1600 hours and it just overrode everything else. Once I finish this first AP things should drop drastically and I will see what happens.
Title: Re: x38g reports
Post by: Claggy on 12 Jul 2011, 03:35:49 pm
FYI,  I picked up and ran an AP on my GPU last night. That could have been the problem. Everything is back up to speed after rebooting but my MB tasks on GPU are running high priority. That should settle down after a couple run here soon. I also have a few APs waiting so we shall see what happens.

Something to watch: I have no idea if Raistmer included any boincApi fixes for modern Cuda 4.0 drivers, or if they'd be needed under OpenCL running as they most definitely are under Cuda.  One way to find out would be to repeatedly exit Boinc (shutting down the AP while in progress) (or just snooze/unsnooze etc.)  & see if it triggers a sticky downclock or not.  If so, then you'll just have to slap Raistmer around a bit to fix it. [ I mean ask nicely...  :D]

Jason
Raistmer hasn't spoken about doing any API changes on any of his apps for Cuda 4 drivers. His apps also consume large amounts of CPU time when running with Cuda 4 drivers (when running on their own);
if there are other apps running, like CPU apps, they manage to claw back some of that CPU time, but I think the elapsed time of the OpenCL tasks suffers.

Claggy
Title: Re: x38g reports
Post by: Jason G on 12 Jul 2011, 05:30:45 pm
Raistmer hasn't spoken about doing any api changes on any of his apps for Cuda 4 drivers, his apps also consume large amounts of CPU time when running with Cuda 4 drivers (when running on their own),
if there are other apps running like CPU apps, they manage to claw back some of that CPU time, but i think the elapsed time of the OpenCL tasks suffers,

Hmm, Well most of the Cuda 4 driver issues with actual cuda code, so far, have turned out to be a consequence of the underlying OS/driver changes, and are remedied by changing boincApi to behave a bit more threadsafe [to get back some stability], and taking into consideration the underlying memory model changes [to get back some performance+ a bit extra in some cases]. 

 I'd expect eventually they'd apply to OpenCL as well, but didn't expect that yet.  Oh well, at least before it becomes a real problem for Raistmer on Ati/OpenCL, we should have most of the kinks ironed out & techniques improved on the Cuda side.

[Edit:] do newer Ati drivers 'appear' to be getting slower as well for old-school coding techniques ?
Title: Re: x38g reports
Post by: Mike on 12 Jul 2011, 05:39:56 pm
Quote
[Edit:] do newer Ati drivers 'appear' to be getting slower as well for old-school coding techniques ?

When the setup is working properly no.

But there are some circumstances where some GPUs don't work well on each setup.
Will try that out on my son's PC as soon as I find some time.

Title: Re: x38g reports
Post by: Claggy on 12 Jul 2011, 05:42:09 pm
Raistmer hasn't spoken about doing any api changes on any of his apps for Cuda 4 drivers, his apps also consume large amounts of CPU time
[Edit:] do newer Ati drivers 'appear' to be getting slower as well for old-school coding techniques ?
Don't know, don't actively run any Native ATI CAL projects. I used to run Collatz, but don't anymore. I've found SDK_2.4 speeds up OpenCL work on my HD5770,
but Raistmer has found slowdowns/instability with Cat 11.6/SDK2.4 on his HD69** when using his OpenCL apps.

Claggy
Title: Re: x38g reports
Post by: Jason G on 12 Jul 2011, 05:51:03 pm
OK, thanks both, they could be at some intermediate stage with those too, making things look 'weird'.   On the Cuda side I'll test/check a few things to do with the 'unified memory model' while ironing out some of the V7 code.

Jason
Title: Re: x38g reports
Post by: perryjay on 12 Jul 2011, 06:36:07 pm
Well, I've decided to give Claggy's idea a try. I've set mine to .51 for the APs and .49 for the MB CUDA tasks. I hope it works for me. Right now I'm running two MB tasks so it will have to wait until I get some more AP work.
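
For anyone wondering what the .51/.49 split buys you, here is a small sketch. It assumes the client will only run a group of tasks together on one GPU when their count values add up to no more than 1.0; that rule and the names below are my assumption for illustration, not something read out of the BOINC source.

```python
from fractions import Fraction
from itertools import combinations_with_replacement

# Assumed rule: tasks may share a GPU only if their counts sum to <= 1.
# Fractions are used so 0.51 + 0.49 compares exactly against 1.
COUNTS = {"AP": Fraction("0.51"), "MB": Fraction("0.49")}


def fits_on_one_gpu(tasks):
    """True if the given task types could run side by side on one GPU."""
    return sum(COUNTS[t] for t in tasks) <= 1


# Try every pairing of the two task types.
pairs = {
    pair: fits_on_one_gpu(pair)
    for pair in combinations_with_replacement(COUNTS, 2)
}

for pair, ok in pairs.items():
    print(" + ".join(pair), "fits" if ok else "does not fit")
```

Under that assumption two MB tasks, or one AP plus one MB, can share the card, but two APs never start together.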
Title: Re: x38g reports
Post by: Claggy on 12 Jul 2011, 06:37:48 pm
Raistmer hasn't spoken about doing any api changes on any of his apps for Cuda 4 drivers, his apps also consume large amounts of CPU time
[Edit:] do newer Ati drivers 'appear' to be getting slower as well for old-school coding techniques ?
I've grabbed a few Collatz_mini Wu's, with Cat 11.6/SDK2.4 my first Wu's completed in 6 min 15 secs & 6 min 12 secs, down from an average of ~6 min 31 secs which would have been done with Cat 10.xx over 6 months ago.

Claggy
Title: Re: x38g reports
Post by: Slavac on 14 Jul 2011, 12:12:35 am
Wish I could be more help, I'm still dealing with downclocking issues.
Title: Re: x38g reports
Post by: perryjay on 14 Jul 2011, 09:02:22 am
Starting the thread on NC should be of help. I've noticed a number of my wingmen with the 560Ti throwing out -9s and other bad results giving them an invalid or error. It would be good to see if it is something in the card itself causing it.
Title: Re: x38g reports
Post by: perryjay on 15 Jul 2011, 11:05:46 am
Here's a strange one http://setiathome.berkeley.edu/workunit.php?wuid=771323155 Check out computer 5257703. He's showing he has two GTS460s but all his GPU work is coming up no GPUs found.

Okay, looked a little closer, he's running driver version 258.96 with the x32f app. I tried to PM him but we will see if he responds.
Title: Re: x38g reports
Post by: Jason G on 15 Jul 2011, 11:08:56 am
Here's a strange one http://setiathome.berkeley.edu/workunit.php?wuid=771323155 Check out computer 5257703. He's showing he has two GTS460s but all his GPU work is coming up no GPUs found.

He probably installed Boinc as a service (protected application)
Title: Re: x38g reports
Post by: perryjay on 15 Jul 2011, 11:13:43 am
That's possible too. I didn't think of that.

I hope it's just the driver. He's running an I7 and cranking out a lot of good work on it. With those two 460s running he'll take off like a house afire.   ;D



another interesting one  http://setiathome.berkeley.edu/workunit.php?wuid=763277874  they tried three times to prove me wrong when I didn't match that first -9 and still refused to give me canonical.   ;D


Another one inconclusive  http://setiathome.berkeley.edu/workunit.php?wuid=782642742  He's running a 470 with x38g. He found 20 spikes I found 12.  Looking at his work I'd say I'll win.
Title: Re: x38g reports
Post by: Jason G on 15 Jul 2011, 11:35:34 am
another interesting one  http://setiathome.berkeley.edu/workunit.php?wuid=763277874  they tried three times to prove me wrong when I didn't match that first -9 and still refused to give me canonical.   ;D

Nice, the Cuda23 ones really dropped the ball with that task, the one that processed successfully missed a reportable gaussian the CPU apps & yourself picked up.  That pretty much correlates with the chirp + gaussian weirdnesses we've been seeing, and need to investigate & understand more deeply.  It is logical that the canonical was chosen as the AKv8 one, as it's likely 'in between' the 6.03 & yours, so representative.  I also take it as a sign that x38g was moving in the right direction to make better matches to CPU apps on some types of results, but there is still work to do.

Quote
Another one inconclusive  http://setiathome.berkeley.edu/workunit.php?wuid=782642742  He's running a 470 with v38g. He found 20 spikes I found 12.  Looking at his work I'd say I'll win.
  He could have power or cooling issues or anything, hard to say.  Once some of the more urgent issues are solved, I'll start playing with the nVidia API to see if I can extract thermal / power / clock info.  Decent info along those lines could paint a clearer picture  in some cases, if some info could be printed to stderr.

Jason
Title: Re: x38g reports
Post by: perryjay on 15 Jul 2011, 11:58:54 am
Quote
Decent info along those lines could paint a clearer picture  in some cases, if some info could be printed to stderr.

Will this be coming out in a paperback edition or just hardcover?   ::)
Title: Re: x38g reports
Post by: Jason G on 15 Jul 2011, 12:06:39 pm
Will this be coming out in a paperback edition or just hardcover?   ::)

LoL, I was thinking of something along the lines of "Your GPU appears to be broken"
Title: Re: x38g reports
Post by: Josef W. Segur on 16 Jul 2011, 08:54:23 am
Here's a strange one http://setiathome.berkeley.edu/workunit.php?wuid=771323155 Check out computer 5257703. He's showing he has two GTS460s but all his GPU work is coming up no GPUs found.

He probably installed Boinc as a service (protected application)

I thought in that case BOINC doesn't see the GPUs at all, though Windows does?
                                                              Joe
Title: Re: x38g reports
Post by: perryjay on 16 Jul 2011, 02:34:40 pm
Here's an oldie but a goody  http://setiathome.berkeley.edu/workunit.php?wuid=730523283 I know the first one was showing a couple of no heartbeats but it looks like it finished and had the same count as everybody else. Wonder why he didn't get any credit?
Title: Re: x38g reports
Post by: Jason G on 16 Jul 2011, 02:45:39 pm
I thought in that case BOINC doesn't see the GPUs at all, though Windows does?
                                                              Joe

I would have to check to be sure on that, but I believe everything appears normal under enumeration, except once in the application you can't initialise the selected device (cudaSetDevice() ) due to permissions or the device already being allocated to another (presumably the user) session somehow.   If that is really the case, then the approaching move to the boinc_temporary_exit() feature should help out, indefinitely stalling the work.

Jason
Title: Re: x38g reports
Post by: Jason G on 16 Jul 2011, 02:59:56 pm
Here's an oldie but a goody  http://setiathome.berkeley.edu/workunit.php?wuid=730523283 I know the first one was showing a couple of no heartbeats but it looks like it finished and had the same count as everybody else. Wonder why he didn't get any credit?

The count itself has value mostly for approximation & guessing some things are about right,  & other aesthetic qualities. Even the later 295's that processed successfully & became canonical with x38g look to be running on the edge.  It has error results eerily matching the 560ti's with insufficient core voltage &/or cooling (Yes, at this stage it appears the 560ti issues have been isolated as mostly attributable to those two primary factors).

The likelihood the original 6.03 result is broken is very high, given that a flaky looking 295 weakly matched your result... the final 6.03 that resolves the quorum sits 'the other side' of the 295 result from you.  That x38g & x38e didn't perfectly match one another in this case is at first surprising until you include the multiple stability influencing factors that could be at play ... Just keep your own temps down & ensure sufficient core voltage etc, so you aren't 'the bad guy' :)

[Edit:] I've just had a brainwave that it may be helpful to add some indication of the number of reportable signals close to threshold, as we did with some astropulse bench testing a while back.  I'll give it some thought.
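
As a rough idea of what such an indication could look like, here is a minimal sketch. The function name, the 1% margin and the sample powers are all hypothetical; nothing here is taken from the actual app.

```python
# Hypothetical sketch: pick out reportable signals that only just
# cleared the threshold, since those are the ones most likely to
# differ between two apps or two slightly unstable cards.

def near_threshold(powers, threshold, margin=0.01):
    """Return signals that are reportable (>= threshold) but lie within
    `margin` (a fraction of the threshold) above it."""
    upper = threshold * (1.0 + margin)
    return [p for p in powers if threshold <= p <= upper]


# Made-up spike powers against a made-up threshold of 24.0.
spike_powers = [24.05, 24.2, 31.7, 23.9, 24.0]
threshold = 24.0

marginal = near_threshold(spike_powers, threshold)
reportable = [p for p in spike_powers if p >= threshold]

print(f"{len(marginal)} of {len(reportable)} reportable spikes are within 1% of threshold")
```

A result where most of the reportable signals are marginal would be a hint that a count mismatch with a wingman is borderline rounding rather than a broken card.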
Title: Re: x38g reports
Post by: perryjay on 16 Jul 2011, 04:22:55 pm
I'm glad you got a handle on the 560Ti problem. Is there something you can do from this end or can you get the word out on how to fix it on the users end? I know I see a lot of my inconclusives coming from 560Tis so it sure would be nice for everybody concerned.
Title: Re: x38g reports
Post by: Jason G on 16 Jul 2011, 04:36:03 pm
I'm glad you got a handle on the 560Ti problem. Is there something you can do from this end or can you get the word out on how to fix it on the users end? I know I see a lot of my inconclusives coming from 560Tis so it sure would be nice for everybody concerned.

I'm currently giving it some deep thought.  There are special nVidia developer tools available with which I may be able to get temperatures & possibly voltages & clock rates.  I could in future print lots of explanation to stderr & go into a temporary exit, or at least a failsafe mode of some sort when things look really obviously bad.  I think after a long period of careful design, under certain more obvious circumstances it should be possible to choose either a hard error out to induce reissue & avoid contaminating the science database,  or under some other known conditions do a temporary exit for some short time period & try again in some predetermined time interval.  We'll see, the 560ti situation certainly raises these questions, and is no doubt a result of stock units being pushed far beyond reference nVidia specs.
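
Purely as a thinking aid, the kind of decision described above might be sketched like this. Every name and threshold is a placeholder chosen for illustration; none of it reflects existing code or real nVidia limits.

```python
from enum import Enum


class Action(Enum):
    CONTINUE = "keep crunching"
    TEMPORARY_EXIT = "back off and retry after a delay"
    HARD_ERROR = "error out so the task is reissued"


# Placeholder limits, not real specifications.
MAX_TEMP_C = 95
MIN_CLOCK_FRACTION = 0.8


def choose_action(temp_c, core_clock_mhz, expected_clock_mhz, result_looks_corrupt):
    """Pick a failsafe action from (hypothetical) health readings."""
    if result_looks_corrupt:
        # Bad data should never reach the science database.
        return Action.HARD_ERROR
    overheated = temp_c >= MAX_TEMP_C
    downclocked = core_clock_mhz < MIN_CLOCK_FRACTION * expected_clock_mhz
    if overheated or downclocked:
        # Conditions that may clear on their own: wait and try again.
        return Action.TEMPORARY_EXIT
    return Action.CONTINUE


print(choose_action(70, 822, 822, False).value)
print(choose_action(98, 822, 822, False).value)
```

The ordering matters in this sketch: a result that already looks corrupt is errored out even if the card currently reads healthy.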

Jason
Title: Re: x38g reports
Post by: perryjay on 18 Jul 2011, 12:51:27 pm
Got an invalid. I found 8 pulses the other two guys didn't. http://setiathome.berkeley.edu/workunit.php?wuid=781226722
Title: Re: x38g reports
Post by: Jason G on 18 Jul 2011, 12:58:30 pm
Got an invalid. I found 8 pulses the other two guys didn't. http://setiathome.berkeley.edu/workunit.php?wuid=781226722

I have no immediate explanation for that one.  Got a copy of the task by chance ?

[Edit:] I'll try to get my offline bench suite updated & into public downloads at some point.  Still wrestling with the fallout from juggling the new PSU etc, but should be under control soon.
Title: Re: x38g reports
Post by: perryjay on 18 Jul 2011, 01:01:45 pm
No, sorry, I just found it on my tasks page.
Title: Re: x38g reports
Post by: Jason G on 18 Jul 2011, 01:25:46 pm
No, sorry, I just found it on my tasks page.

Grab a copy while it's still there. 
http://boinc2.ssl.berkeley.edu/sah/download_fanout/a4/08mr11ai.9455.10865.12.10.23

I'll get some easy up to date bench setup organised tomorrow or so, and also run here to see if 8 pulses turn up that shouldn't, and manually see how close they are to threshold.

Jason
Title: Re: x38g reports
Post by: Raistmer on 18 Jul 2011, 02:34:54 pm
http://setiathome.berkeley.edu/forum_thread.php?id=64837&nowrap=true#1129324 Is it known behavior? x38g works slower on 26x.xx drivers indeed?
Title: Re: x38g reports
Post by: Jason G on 18 Jul 2011, 02:48:51 pm
http://setiathome.berkeley.edu/forum_thread.php?id=64837&nowrap=true#1129324 Is it known behavior? x38g works slower on 26x.xx drivers indeed?

The 275 drivers are a bit better with some kernels as I gradually apply some of the newer techniques.  That will apply to different cards to different degrees, so it becomes a your mileage may vary issue as usual, until more of the kernels get 'upgraded' and full asynch operation is enabled down the line.

[Edit:] like with perryjay's 8xtra pulses I just finished benching, the gradual accumulation of improvements adds up to quite a lot, under 275.50 beta, that I don't even care what old drivers do anymore....

Quote
Quick timetable

WU : 8XtraPulses_08mr11ai.9455.10865.12.10.23.wu
Lunatics_x32f_win32_cuda30_preview.exe :
  Elapsed 494.422 secs
      CPU 77.423 secs
Lunatics_x39f_win32_cuda32.exe :
  Elapsed 407.459 secs, speedup: 17.59%  ratio: 1.21
      CPU 53.430 secs, speedup: 30.99%  ratio: 1.45
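
For anyone checking the figures above, here is a small sketch of how they can be reproduced from the raw times. It assumes "speedup" means the percentage of time saved relative to the older build and "ratio" means old time divided by new time; the function name is made up for this example.

```python
def speedup(old_secs, new_secs):
    """Return (percent of time saved, old/new ratio), both rounded to 2 places."""
    saved_pct = (old_secs - new_secs) / old_secs * 100.0
    ratio = old_secs / new_secs
    return round(saved_pct, 2), round(ratio, 2)


# Times taken from the bench table above (x32f vs x39f).
print("Elapsed:", speedup(494.422, 407.459))
print("CPU:    ", speedup(77.423, 53.430))
```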

Still investigating this particular task to see if perryjay broke it, or something else is going on....

[Edit2:] Bad news perryjay  :(  You broke that one somehow... I get agreement with your wingmen under bench:
Quote
Spike count:    0
Pulse count:    0
Triplet count:  0
Gaussian count: 0

Now we just have to figure out what could have gone wrong with yours....
Title: Re: x38g reports
Post by: Ghost0210 on 18 Jul 2011, 03:27:10 pm
http://setiathome.berkeley.edu/forum_thread.php?id=64837&nowrap=true#1129324 Is it known behavior? x38g works slower on 26x.xx drivers indeed?

Yes, the x38 & x39 series are slower on the 267.xx drivers compared to the 275.50 drivers by some margin.
I can do a shortie in ~250 seconds on the 275.50 drivers, but this shoots up to around 290-300 seconds on the 267.xx drivers
Title: Re: x38g reports
Post by: perryjay on 18 Jul 2011, 03:44:18 pm
Can I slap ghost?  I had a great post all ready and when I hit post it gave me the message about a new post and ate my post!!!   >:(
Title: Re: x38g reports
Post by: Jason G on 18 Jul 2011, 03:46:30 pm
Can I slap ghost?  I had a great post all ready and when I hit post it gave me the message about a new post and ate my post!!!   >:(
  LoL, for next time, when that happens you can usually use your browser's back button, & copy to the clipboard, then paste it into a new post again  ;)

Slapping ghost could be difficult & potentially messy with all that ectoplasm, but fun to watch  ;D
Title: Re: x38g reports
Post by: perryjay on 18 Jul 2011, 03:51:28 pm
Now, what was I saying? Something about not worrying about a couple of bad WUs as long as they help someone else. Oh yeah, and I have tried going back to driver 267.59 to see how it does. I don't see much difference from the 275.33 driver I was using. I went back because I thought I was having trouble getting Raistmer's app and yours to play nice together but it looks like WU series # 13mr11ag 13143.8656.xxxx is what is giving me trouble. Some are running in around 25 minutes and some are taking an hour and 25 minutes. They are throwing my time to completion all over the place.
Title: Re: x38g reports
Post by: Jason G on 18 Jul 2011, 03:53:47 pm
... They are throwing my time to completion all over the place.

Hmmm, maybe it's time to think about reviving/extending my modified boinc with per Application DCFs....
Title: Re: x38g reports
Post by: Ghost0210 on 18 Jul 2011, 03:56:21 pm
Can I slap ghost?  I had a great post all ready and when I hit post it gave me the message about a new post and ate my post!!!   >:(

LoL  ;D
Title: Re: x38g reports
Post by: perryjay on 18 Jul 2011, 03:59:24 pm
This is one of the long ones http://setiathome.berkeley.edu/result.php?resultid=2000429739  and this is one of the short ones  http://setiathome.berkeley.edu/result.php?resultid=2000429745 No idea what is happening to the run time.


Oh, and all I had to do was hit the post button again. I hit my back button without checking to make sure the post had gone through.
Title: Re: x38g reports
Post by: Jason G on 18 Jul 2011, 04:03:11 pm
This is one of the long ones http://setiathome.berkeley.edu/result.php?resultid=2000429739  and this is one of the short ones  http://setiathome.berkeley.edu/result.php?resultid=2000429745 No idea what is happening to the run time.

Were you running the longer one alongside Raistmer's OpenCL APs ?  You could try using Fred's priority thingy, or Process Lasso or similar to jack up the priority on the Cuda app.  Doing other stuff on the machine ?

You get any sticky downclocks still?  How's the temperatures etc ?

Jason
Title: Re: x38g reports
Post by: perryjay on 18 Jul 2011, 04:10:42 pm
Temp is 69c, no sticky down clocks, and I was running Raistmers' app at the same time for both of them. I'm only running one MB and one AP on my GPU so it would finish like the short one then do the long one. That or a couple of short ones in a row then a long one. I've mostly been changing the unroll on his app trying to find a sweet spot but it doesn't seem to matter as to how fast or slow the MBs are.
Title: Re: x38g reports
Post by: Raistmer on 18 Jul 2011, 04:48:45 pm
I've mostly been changing the unroll on his app trying to find a sweet spot but it doesn't seem to matter as to how fast or slow the MBs are.
But it can matter. Different unrolls take different amount of GPU memory. So, at least different memory layouts for CUDA app. And with higher unrolls it can have memory shortage...
Title: Re: x38g reports
Post by: perryjay on 18 Jul 2011, 05:14:11 pm
I've been keeping an eye on the times as I make the changes. If I change while a WU is running, I watch how fast it was running compared to what it does after the change. I've also made the change before the WU runs and compared it to another run before the change. The longer WUs seemed to pause for a minute or two even though the elapsed time kept climbing. It didn't seem to have anything to do with what the AP app was doing. This was without me doing any piddling around.
Title: Re: x38g reports
Post by: perryjay on 20 Jul 2011, 10:02:21 am
This one finally decided it had enough.  http://setiathome.berkeley.edu/workunit.php?wuid=771323139


This one is interesting to me. http://setiathome.berkeley.edu/workunit.php?wuid=766554061  The first guy is running 32f but the third guy takes anonymous platform seriously.  I can't figure out what version he is running.
Title: Re: x38g reports
Post by: Claggy on 20 Jul 2011, 04:30:22 pm
This one is interesting to me. http://setiathome.berkeley.edu/workunit.php?wuid=766554061  The first guy is running 32f but the third guy takes anonymous platform seriously.  I can't figure out what version he is running.
That is Stock 6.08, but running under Anonymous Platform,

Claggy
Title: Re: x38g reports
Post by: perryjay on 20 Jul 2011, 08:22:39 pm
Another invalid for me.  http://setiathome.berkeley.edu/workunit.php?wuid=781130909
Title: Re: x38g reports
Post by: Jason G on 20 Jul 2011, 08:31:15 pm
Another invalid for me.  http://setiathome.berkeley.edu/workunit.php?wuid=781130909

Hmm, you overflowed on that with pulses for no outwardly obvious reason.  Watch those temperatures  ;).  I've decided to put some level of monitoring of that within future applications, so you'll be caught red handed cooking your GPU  :D
Title: Re: x38g reports
Post by: perryjay on 21 Jul 2011, 09:09:31 am
Couple of things going on around that time. I believe I was still running my higher over clock and playing with Raistmer's app. Possibly running one of his and two of yours or some such as that. Temps have always held pretty much to a safe range so I'm not too worried about that. Figure I pretty much did that invalid myself by fiddling around.
Title: Re: x38g reports
Post by: Jason G on 21 Jul 2011, 09:15:58 am
Cheers.  As long as we have some idea what went on with complete evident failure like that  ;).  The 560ti's market penetration, coupled with the large number running them on the knife edge without realising as such, has me considering ideas for monitoring/control.  We'll see.

Jason

[Next Day:]  For reference, moved x39e Diagnostic build to a special category

Cuda diagnostic Builds (http://lunatics.kwsn.net/index.php?module=Downloads;catd=47) located under GPU apps in public downloads.
Title: Re: x38g reports
Post by: perryjay on 24 Jul 2011, 01:26:30 pm
Here's a nice one..  http://setiathome.berkeley.edu/workunit.php?wuid=772821765  All that time and then get screwed by a 4.43 client.   >:(   Oh well.


And then there's this one..  http://setiathome.berkeley.edu/workunit.php?wuid=778238977 -12s galore !!!!
Title: Re: x38g reports
Post by: Jason G on 24 Jul 2011, 07:25:09 pm
And then there's this one..  http://setiathome.berkeley.edu/workunit.php?wuid=778238977 -12s galore !!!!

hehehe, yeah all three stockers choked, oh well.
Title: Re: x38g reports
Post by: perryjay on 25 Jul 2011, 02:05:05 pm
Another invalid http://setiathome.berkeley.edu/workunit.php?wuid=781127263
Title: Re: x38g reports
Post by: Jason G on 25 Jul 2011, 06:24:50 pm
Another invalid http://setiathome.berkeley.edu/workunit.php?wuid=781127263
  Hmm, Stiffed on pulses, would be good to grab that one for an offline run as well & see if it was your fault.  Not a chance to look for it myself just now, if it's still there a bit later I will.

Jason

[Later:] After a quick look I couldn't find it.  What are your average GPU temperatures ?
Title: Re: x38g reports
Post by: perryjay on 26 Jul 2011, 10:45:40 am
Right now sitting at 69C. High seldom goes above 71C.
Title: Re: x38g reports
Post by: perryjay on 28 Jul 2011, 09:17:59 am
One more invalid. http://setiathome.berkeley.edu/workunit.php?wuid=781608186  I found two pulses the other guys didn't.
Title: Re: x38g reports
Post by: Mike on 28 Jul 2011, 11:44:29 am
One more invalid. http://setiathome.berkeley.edu/workunit.php?wuid=781608186  I found two pulses the other guys didn't.

It's on your end then perry.

One of the wingmen running 0.38g.

Still overclocked ?
Title: Re: x38g reports
Post by: Jason G on 28 Jul 2011, 12:00:13 pm
Still overclocked ?

Course he is, LoL.  I'm going to fit in temperature monitoring further into x40 series just for perryjay....  ;)
Title: Re: x38g reports
Post by: perryjay on 28 Jul 2011, 12:23:49 pm
Still over clocked but cut back from the 900/1800 to 883/1766. Temps haven't been a problem. I'm at 73 right now but usually lower than that. Still, that's well within limits.
Title: Re: x38g reports
Post by: Jason G on 28 Jul 2011, 12:33:19 pm
Still over clocked but cut back from the 900/1800 to 883/1766. Temps haven't been a problem. I'm at 73 right now but usually lower than that. Still, that's well within limits.

How's it hold up under OCCT 1 hour artefact scan at max complexity ?
Title: Re: x38g reports
Post by: perryjay on 28 Jul 2011, 12:41:42 pm
Okay Aussie, what's that in English?   ::)  I'll have to google for OCCT whatsit scan and get back to you.



Ewwww, pretty, how do I work it? I guess I'll have to shut down BOINC while I run it huh? Okay, off to try to figure out how I can break it.
Title: Re: x38g reports
Post by: Jason G on 28 Jul 2011, 12:44:47 pm
Okay Aussie, what's that in English?   ::)  I'll have to google for OCCT whatsit scan and get back to you.

http://www.ocbase.com/perestroika_en/index.php?Download

[Edit:] Here's the settings to use... Don't forget to abort if it starts to get really hot  ;)
Title: Re: x38g reports
Post by: perryjay on 28 Jul 2011, 01:00:03 pm
Figures you would come up with a different one than I found. I downloaded the one from EVGA.   http://www.evga.com/articles/00530/Default.asp   I'll try running it first then go get yours.
Title: Re: x38g reports
Post by: Jason G on 28 Jul 2011, 01:01:31 pm
Similar purpose/usefulness. OCCT on max complexity just seems more hardcore to me..
Title: Re: x38g reports
Post by: perryjay on 28 Jul 2011, 04:10:43 pm
Well, that was interesting. I ran the EVGA scanner for an hour. Afterward I checked its log file. According to it I am running an ATI 6850 and I had artifacts out the ying yang. It also downclocked me way below anything that should be still running. I don't think I'll try that again.


And..... tried to get your version but it tells me my directx9 is not up to date and it blocks out  GPU OCCT and power supply tests.   I've got directx 11.
Title: Re: x38g reports
Post by: Jason G on 28 Jul 2011, 09:04:46 pm
And..... tried to get your version but it tells me my directx9 is not up to date and it blocks out  GPU OCCT and power supply tests.   I've got directx 11.

DirectX in your system will need an update. DirectX 9.0c update was published 18th April 2011.

http://www.microsoft.com/download/en/details.aspx?id=35
Title: Re: x38g reports
Post by: perryjay on 29 Jul 2011, 10:39:33 am
Well, that didn't take as long as I thought it would. I don't think I ever had DX9 on this machine.  Hope it doesn't mess with anything else on here.
Title: Re: x38g reports
Post by: _heinz on 29 Jul 2011, 02:57:25 pm
I tried OCCT on my laptop.
the "Monitoring" part did not open on my i3
seems program is missing some latest chipsets

heinz
Title: Re: x38g reports
Post by: perryjay on 29 Jul 2011, 04:31:47 pm
I ran it for 30 minutes at default 0 shader complexity the first time and got a ton of errors then remembered one of the FAQs that said EVGA precision would cause errors so I stopped it and ran it again for your 6 minute test and set the shader complexity to 8 with no errors.

https://picasaweb.google.com/lh/photo/uFQ_UCQC5Ra7BCZtozmVBSbSZ_Aup0-RSRejz0fueJU?feat=directlink

https://picasaweb.google.com/lh/photo/eG8e5yUOsUAFkZ-fg-BOGybSZ_Aup0-RSRejz0fueJU?feat=directlink

https://picasaweb.google.com/lh/photo/vTDakos6ogUKQLJp0zHSPCbSZ_Aup0-RSRejz0fueJU?feat=directlink

https://picasaweb.google.com/lh/photo/hU2Qv8QKDHix7nBTAEnPXibSZ_Aup0-RSRejz0fueJU?feat=directlink

https://picasaweb.google.com/lh/photo/9RYN9UQs5luTfj2CjnyhISbSZ_Aup0-RSRejz0fueJU?feat=directlink

https://picasaweb.google.com/lh/photo/0sLNi2Pdejctj_P4nYzlAibSZ_Aup0-RSRejz0fueJU?feat=directlink

https://picasaweb.google.com/lh/photo/yi4S9r03UuULaEICDGRMMibSZ_Aup0-RSRejz0fueJU?feat=directlink
Title: Re: x38g reports
Post by: Jason G on 29 Jul 2011, 04:43:04 pm
OK,
  looks like some optimised code can push things harder than previously thought for durations too short to detect with monitoring tools.  That would explain a lot coupled with overoptimistic factory or user OC's, insufficient cooling or PSU.  For now I'd advise backing off any OC if you see invalids not directly attributable to being stiffed by legacy wingmen apps.  In the long run I may need to detect signs of instability, and devise a more purpose built stability check.

Stable for 6 mins OCCT on max complexity is good.  An hour run would be more thorough, but probably get pretty warm.

Jason
Title: Re: x38g reports
Post by: perryjay on 29 Jul 2011, 04:48:03 pm
Hope I did those links right. After a half hour on the first run it only got up to 85c so temp doesn't seem to be a problem.
Title: Re: x38g reports
Post by: perryjay on 03 Aug 2011, 09:23:10 am
This one took awhile to decide.  http://setiathome.berkeley.edu/workunit.php?wuid=763346130 
Title: Re: x38g reports
Post by: glennaxl on 06 Aug 2011, 12:20:05 pm
Starting to get errors on gtx 570:
http://setiathome.berkeley.edu/result.php?resultid=2027743944
http://setiathome.berkeley.edu/result.php?resultid=2027727112
http://setiathome.berkeley.edu/result.php?resultid=2027722678
http://setiathome.berkeley.edu/result.php?resultid=2027713834
http://setiathome.berkeley.edu/result.php?resultid=2027666731
http://setiathome.berkeley.edu/result.php?resultid=2027666700

Code: [Select]
CUFFT error in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcc_fft.cu' in line 125.
Cuda error 'cudaFree(dev_PowerSpectrumSumMax)' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcceleration.cu' in line 522 : unknown error.
Cuda error 'cudaFree(dev_outputposition)' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcceleration.cu' in line 524 : unknown error.
Cuda error 'cudaFree(dev_flagged)' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcceleration.cu' in line 526 : unknown error.
Cuda error 'cudaFree(dev_NormMaxPower)' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcceleration.cu' in line 528 : unknown error.
Cuda error 'cudaFree(dev_PoTPrefixSum)' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcceleration.cu' in line 530 : unknown error.
Cuda error 'cudaFree(dev_PoT)' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcceleration.cu' in line 532 : unknown error.
Cuda error 'cudaFree(dev_GaussFitResults)' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcceleration.cu' in line 534 : unknown error.
Cuda error 'cudaFree(dev_t_PowerSpectrum)' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcceleration.cu' in line 536 : unknown error.
Cuda error 'cudaFree(dev_PowerSpectrum)' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcceleration.cu' in line 538 : unknown error.
Cuda error 'cudaFree(dev_WorkData)' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcceleration.cu' in line 540 : unknown error.
Cuda error 'cudaFree(dev_flag)' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcceleration.cu' in line 542 : unknown error.
Cuda error 'cudaFree(dev_cx_ChirpDataArray)' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcceleration.cu' in line 546 : unknown error.
Cuda error 'cudaFree(dev_cx_DataArray)' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcceleration.cu' in line 548 : unknown error.
Cuda sync'd & freed.
Title: Re: x38g reports
Post by: Jason G on 06 Aug 2011, 12:26:15 pm
Eeh gads.  If that's started recently then something's come unstuck.  Please reboot & report temperatures.  I can see your driver is 280.19.  If something let go, I'd like to find out what.

Jason
Title: Re: x38g reports
Post by: glennaxl on 06 Aug 2011, 12:54:24 pm
It all started yesterday - GPU load was 99% but temp was under 40C. I checked with GPU-Z, CUDA-Z and NVIDIA Inspector whether the GPU had down-clocked itself, but it hadn't. I rebooted, changed drivers, OC'd with EVGA Precision, OC'd through the BIOS, ran OCCT, ran EVGA OC Scanner - none of this pointed to a single error. Right now load temps are at 65C-70C. I also changed the cuda count from .31/.34 to .50/.50 but still got an error on MB, though surprisingly no error on AP.
Title: Re: x38g reports
Post by: Jason G on 06 Aug 2011, 01:24:10 pm
Please Ditch AP, aborting tasks as necessary, then after a reboot report to me  x38g behaviour only.  Thanks.
Title: Re: x38g reports
Post by: glennaxl on 07 Aug 2011, 10:55:22 am
Please Ditch AP, aborting tasks as necessary, then after a reboot report to me  x38g behaviour only.  Thanks.
After ditching all AP tasks, x38g is running smoothly.
Title: Re: x38g reports
Post by: Jason G on 07 Aug 2011, 11:08:50 am
After ditching all AP tasks, x38g is running smoothly.

Thanks for the information, and I apologise for the curtness of the recommendation.  For the AP app, current advice being given is to run either Cuda 4.0 Drivers, or the Beta AP app, but not both together, as apparently it does not include any of the (boincApi) fixes for newer drivers, and the underlying OS/driver changes they address. 

Jason
Title: Re: x38g reports
Post by: perryjay on 07 Aug 2011, 02:03:19 pm
Here's one that is interesting. Must say they tried their best to prove me wrong but finally gave up and gave me credit.  http://setiathome.berkeley.edu/workunit.php?wuid=783092275
Title: Re: x38g reports
Post by: Jason G on 07 Aug 2011, 02:06:40 pm
Here's one that is interesting. Must say they tried their best to prove me wrong but finally gave up and gave me credit.  http://setiathome.berkeley.edu/workunit.php?wuid=783092275
  You're also canonical, not that surprisingly with all those apps erroring out.

Jason
Title: Re: x38g reports
Post by: Terror Australis on 09 Aug 2011, 12:41:19 am
Using 275.33/38g & 39f I have a downclocking problem with one particular card. There is no problem using 191.07/6.09.

It's one of 2 EVGA GTX285 FTW's in the same box, the other card behaves itself perfectly. I have confirmed it's the card by swapping cards between the sockets.

Jason, as there are two identical cards, one which plays up and one which doesn't, are there any tests you would like me to run on the cards to help track down the reason for the downclocking issue in general ?

T.A.
Title: Re: x38g reports
Post by: Jason G on 09 Aug 2011, 05:03:18 am
One thing that has become apparent lately, is that some of the optimised code may be pushing harder than some OC scanning tools.  It is the 560tis that came under scrutiny first  for being close to the edge from factory, and seems to have some per manufacturer &/or per Silicon differences.   

As a result of the ongoing examination, I may need to take some of the 'hottest' running Cuda kernels, and make some more targeted scanning tool out of it. In the meantime I suggest see what happens if you back the suspect card right down to reference clocks.  If that still doesn't help there could be further issues.

If it turns out I am pushing harder than whatever the factories use to bin parts for factory OC models, then I may have to look at some sort of backoff throttle.

Jason
Title: Re: x38g reports
Post by: Terror Australis on 09 Aug 2011, 05:39:30 am
On closer examination there is a possibility it's related either to certain WU's or series of WU's.

The FTW cards are factory OC'ed to 725MHz, I have backed them off to 715MHz and it still drops to half speed with the same degree of randomness.

Two WU's in question are here (http://setiathome.berkeley.edu/result.php?resultid=2026291800) and here (http://setiathome.berkeley.edu/result.php?resultid=2026307997)

T.A.
Title: Re: x38g reports
Post by: Jason G on 09 Aug 2011, 05:48:25 am
Two WU's in question are here (http://setiathome.berkeley.edu/result.php?resultid=2026291800) and here (http://setiathome.berkeley.edu/result.php?resultid=2026307997)

Thanks, no indication of a cause in those, so we'll keep looking.

What are the temps like going at flat chat?

It's stability of my code (exhibiting apparent task dependency)  versus Factory OC's that are in question, i.e. Higher stock than nVidia reference.

nVidia reference clocks for GTX 285 are:
Core: 648 MHz
Shader: 1476 MHz
Mem: 1242 MHz

Still crook at those ?
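
To put the numbers in this exchange side by side, here is a quick sketch of how far the core clocks mentioned so far sit above the nVidia reference. The clock values are the ones quoted in these posts; the helper name is just for this example.

```python
REFERENCE_CORE_MHZ = 648.0  # nVidia reference core clock for GTX 285, as listed above


def percent_over_reference(core_mhz, reference_mhz=REFERENCE_CORE_MHZ):
    """Percentage by which a core clock exceeds the reference clock."""
    return round((core_mhz - reference_mhz) / reference_mhz * 100.0, 1)


# Core clocks reported earlier in the thread for the EVGA FTW cards.
for label, mhz in [("factory OC", 725.0), ("backed off", 715.0), ("reference", 648.0)]:
    print(f"{label:>10}: {mhz:.0f} MHz, {percent_over_reference(mhz)}% over reference")
```

So backing off from 725 to 715 MHz still leaves the card roughly 10% above reference, which may be why it made no visible difference.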
Title: Re: x38g reports
Post by: Jason G on 09 Aug 2011, 07:06:52 am
There are some further clues in your errored task list that I'm looking at, tracing some code. Back later with some beer to fuel a further analysis.

There's still quite a few things to eliminate from suspicion, but we'll isolate what's going on eventually.

Jason
Title: Re: x38g reports
Post by: Terror Australis on 10 Aug 2011, 01:50:52 pm
Hi Jason - Here (http://setiathome.berkeley.edu/result.php?resultid=2027976105) and here (http://setiathome.berkeley.edu/result.php?resultid=2027976084) are a couple more units for your perusal. For comparison THIS (http://setiathome.berkeley.edu/result.php?resultid=2027924538) is a "good" unit from the same card

In a short run of about 12 hours, reducing the card to 648MHz showed no downclocking errors (it was dropping to approx half speed before). I've put it back up to 702MHz which GPUZ claims is the "stock" speed and will report back tomorrow (The 70 Meg downclock from the EVGA factory spec was just too irritating to handle :-)

T.A.
Title: Re: x38g reports
Post by: Jason G on 10 Aug 2011, 11:24:04 pm
Thanks,
  It gives me some ammunition to approach things properly with the new 560ti in the other room, which I have yet to put under any crunching or test pieces. I intend to use it to help isolate what's going on, attempting to replicate what some others see .

If I'm pushing some code portions 'too hard' (that is, harder than what the factories are using to determine stable OC or bin parts), I'll just have to back those off, making them optional via advanced user settings somehow. (There could be a lot of them, so probably some sort of configuration file would be needed, along with stress tests to determine viable settings), as well as potentially some monitoring & failsafes.

It's likely to end up being a complicated tradeoff, whether to run faster code at a reduced clock rate, or slower code at potentially unstable factory settings, but the most stable config would have to be the default. 

Jason
Title: Re: x38g reports
Post by: perryjay on 12 Aug 2011, 10:17:26 am
Got an error http://setiathome.berkeley.edu/workunit.php?wuid=801331122  Found a triplet thrice. Sounds like a song title or something.  :D  My original wingman hasn't got to it yet and the new work hasn't gone out yet so I don't know if it was something I did wrong or not. Since it's the first one of those I've seen in awhile I doubt it's me.
Title: Re: x38g reports
Post by: Jason G on 12 Aug 2011, 10:24:29 am
LoL.  It looks like he's running stock, so will find it twice before exploding (assuming your result was all in order).  It's probably just an extraterrestrial intergalactic cruiseliner sending an SOS distress beacon in morse code.  we don't want those anyway, we're looking for extraterrestrial intelligence, not shuffleboarders.
Title: Re: x38g reports
Post by: perryjay on 12 Aug 2011, 10:33:27 am
Got it, no shuffleboarders.  If I got it thrice does that mean all my wingman will get is  S, O ?
Title: Re: x38g reports
Post by: Jason G on 12 Aug 2011, 10:39:50 am
Got it, no shuffleboarders.  If I got it thrice does that mean all my wingman will get is  S, O ?

LoL, probably more like the S plus the first 'dah' of the O  , since 4 tones would make 2 triplet detections already if you think about it, 5 tones could make [Edit: 3 triplets].  Won't be long until I remove this limit anyway.
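
The counting in that reply can be sketched as below. This assumes, purely for illustration, that one triplet is counted for each run of three consecutive, evenly spaced tones; the real detector's rules are not spelled out in this thread, and the function name is made up.

```python
def consecutive_triplets(n_tones):
    """List every window of three consecutive tones, numbered from 1."""
    return [(i, i + 1, i + 2) for i in range(1, n_tones - 1)]


for n in (3, 4, 5):
    windows = consecutive_triplets(n)
    print(n, "tones ->", len(windows), "triplets:", windows)
```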
Title: Re: x38g reports
Post by: Richard Haselgrove on 12 Aug 2011, 12:00:03 pm
It's probably just an extraterrestrial intergalactic cruiseliner sending an SOS distress beacon in morse code.  we don't want those anyway, we're looking for extraterrestrial intelligence, not shuffleboarders.

Nah, the cruiseliners look like 2035680302 (http://setiathome.berkeley.edu/result.php?resultid=2035680302) - overflow on 23 gaussians after 75 minutes. It's the wake, you know.
Title: Re: x38g reports
Post by: Pepi on 14 Sep 2011, 12:57:56 pm
What stupid credits :(
2.66 credits for 1200 seconds of work

http://setiathome.berkeley.edu/workunit.php?wuid=799356547
Title: Re: x38g reports
Post by: Jason G on 14 Sep 2011, 12:59:57 pm
Oh there's more fun to come yet  :D
Title: Re: x38g reports
Post by: Pepi on 14 Sep 2011, 01:09:23 pm
I found out what was behind those results: it looks like my 280.26 drivers are broken, so the GPU calculates for far too long. One result that finished in 14 seconds got the same credit, with the same angle range in the WU.
But this one, http://setiathome.berkeley.edu/workunit.php?wuid=800021504 , is a mystery: is my GTX 560 Ti right, or his GTX 295? :) Both use the same app :)
Title: Re: x38g reports
Post by: Jason G on 14 Sep 2011, 01:14:15 pm
His 295 looks to be cooking, (Invalids popping onto his list)
Title: Re: x38g reports
Post by: Pepi on 14 Sep 2011, 02:56:14 pm
His 295 looks to be cooking, (Invalids popping onto his list)
Yes, you are right: all the inconclusives are from the same host with the GTX 295 ( I can "see" the user of that card being happy, because his card crunches so fast ) :))
Title: Re: x38g reports
Post by: perryjay on 18 Sep 2011, 12:01:17 pm
Just to let you know I'm still around  http://setiathome.berkeley.edu/workunit.php?wuid=742974732  I thought this one was interesting.  Not much else to report, everything has been working well except for the wonderful changes Dr. A made.   ::)