Seti@Home optimized science apps and information

Optimized Seti@Home apps => Windows => GPU crunching => Topic started by: madmac on 10 May 2010, 12:44:50 pm

Title: GTX295 CUDA Issues
Post by: madmac on 10 May 2010, 12:44:50 pm
OK, I decided to splash out and spent £250 on a second-hand 295, as I was fed up watching my position slide down the tables :-(
I built a new (second-hand) rig, installed my card and the latest NVIDIA driver, turned off SLI, put the latest optimised apps on and fired her up.
It would appear that I have a problem with one of the cores: one GPU just errors out, typically after 19 seconds and never much longer than that. There have been a couple of times where it crunched to about 80% and then errored, but that has only happened with 1 or 2 units out of too many to mention, so it might be a fluke.
Reverted back to stock apps - no difference.
Dropped the memory clocks right down to 500 MHz - no difference. The card is now back at stock speeds with one GPU disabled.
This is my new PC:

http://setiathome.berkeley.edu/show_host_detail.php?hostid=5400786

And this is the task list. Could someone on here have a look at the errors and see if there is a common theme, or if there is anything I can change? I'm stumped on this one...
The error messages are not always the same.

http://setiathome.berkeley.edu/results.php?hostid=5400786

Is it hardware related or software related?

Any help gratefully received

Title: Re: GTX295 CUDA Issues
Post by: sunu on 10 May 2010, 05:06:16 pm
99.99% it's a hardware problem. All invalids and error workunits are from the first GPU (device 1).

A bit puzzling is http://setiathome.berkeley.edu/result.php?resultid=1604052193 that gives an out of memory error.
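
The per-device pattern can also be checked mechanically. Below is a minimal Python sketch, assuming each result's stderr text has been saved locally and contains the `CUDA Device N specified` line that the CUDA app prints. The function name and the sample data are hypothetical, not taken from the host above.

```python
import re
from collections import Counter

# Matches the line the CUDA app prints when it picks a GPU, e.g.
# "setiathome_CUDA: CUDA Device 1 specified, checking..."
DEVICE_RE = re.compile(r"CUDA Device (\d+) specified")


def failures_by_device(results):
    """Count failed results per CUDA device.

    `results` is an iterable of (stderr_text, succeeded) pairs.
    Failed results whose stderr names no device are counted under None.
    """
    counts = Counter()
    for stderr_text, succeeded in results:
        if succeeded:
            continue
        match = DEVICE_RE.search(stderr_text)
        device = int(match.group(1)) if match else None
        counts[device] += 1
    return counts


# Hypothetical sample data for illustration only.
sample = [
    ("setiathome_CUDA: CUDA Device 1 specified, checking...", False),
    ("setiathome_CUDA: CUDA Device 2 specified, checking...", True),
    ("setiathome_CUDA: CUDA Device 1 specified, checking...", False),
]
print(failures_by_device(sample))
```

If nearly all failures land on one device number while the other device validates normally, that points at one GPU on the card rather than at the driver or the app.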
Title: Re: GTX295 CUDA Issues
Post by: Pizzadude on 11 May 2010, 03:39:20 am
Mmmm, that's weird. I've got a GTX295 and have been suffering similar issues for about the last three weeks. I thought it was a heat issue, so I completely dismantled the GTX295, removed all the dustballs etc. and replaced the heatsink paste with Arctic Silver. Overall temperatures have dropped by about 5 to 8 degrees, but the Seti problem persisted. I assumed it might be an OS or registry issue, so I clean-installed Win7 64-bit and tried various Nvidia drivers, but still the problem persists. The errors always occur on GPU 0. I am not convinced it's a hardware issue, as all other CUDA apps work flawlessly. The GTX 295 plays intensive games without a hitch.
I have performed a burn-in test using FurMark, which took the GTX295 to 93 degrees, well within its 105 degree design limit. In case it was an issue with the GTX295 interfacing with my motherboard, I removed all overclocks from my i7 processor and memory and put everything back to stock Intel settings.
Still, GPU 0 throws an error every couple of units, or sometimes 10 in a row.

 :-\
Title: Re: GTX295 CUDA Issues
Post by: Pepi on 11 May 2010, 02:37:42 pm
weak power supply?
Title: Re: GTX295 CUDA Issues
Post by: madmac on 11 May 2010, 05:15:40 pm
Quote
I am not convinced it's a hardware issue, as all other CUDA apps work flawlessly. The GTX 295 plays intensive games without a hitch.
I have performed a burn-in test using FurMark, which took the GTX295 to 93 degrees, well within its 105 degree design limit.
Still, GPU 0 throws an error every couple of units, or sometimes 10 in a row.

 :-\

Interesting, can you clarify the 'all other CUDA apps work flawlessly' bit?
I have read over on the Seti forums that this is an issue with the older dual-PCB versions.

I have the problem that 99.99% of WUs error on one of the GPUs.
Trying to get my money back...
Title: Re: GTX295 CUDA Issues
Post by: Pizzadude on 12 May 2010, 01:01:31 pm
Quote
weak power supply?

I had suspected this, so I installed a brand new Corsair 750W nearly two weeks ago. I wired the system so that only the boot drive, motherboard, DVD writer and GTX295 were connected, and for the following 5 days the problems persisted.
Title: Re: GTX295 CUDA Issues
Post by: Pizzadude on 12 May 2010, 01:21:43 pm
Quote
Interesting, can you clarify the 'all other CUDA apps work flawlessly' bit?
I have read over on the Seti forums that this is an issue with the older dual-PCB versions.

My GTX is the dual-PCB version, and I suspect the issue you refer to is the bug in the memory controller. As far as I can make out, this is more of an urban legend than something actually confirmed by Nvidia.

As I said, other CUDA apps work with very little hitch. The ones I currently use are Badaboom, TMPGEnc, Adobe, BOINC Collatz Conjecture and GPUgrid. BOINC Collatz Conjecture throws occasional errors, but nowhere near as many as Seti.
Title: Re: GTX295 CUDA Issues
Post by: Pizzadude on 13 May 2010, 02:39:11 am
Can anybody explain this error? I'm suddenly getting lots of these now:


Name   29no06ag.1150.892.6.10.140_0
Workunit   608510479
Created   6 May 2010 8:08:47 UTC
Sent   6 May 2010 8:12:24 UTC
Received   13 May 2010 6:12:24 UTC
Server state   Over
Outcome   Client error
Client state   Compute error
Exit status   1 (0x1)
Computer ID   5180462
Report deadline   22 Jun 2010 22:42:57 UTC
Run time   989.105137
CPU time   287.4474
stderr out   

<core_client_version>6.10.17</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
setiathome_CUDA: Found 2 CUDA device(s):
   Device 1 : GeForce GTX 295
           totalGlobalMem = 919994368
           sharedMemPerBlock = 16384
           regsPerBlock = 16384
           warpSize = 32
           memPitch = 2147483647
           maxThreadsPerBlock = 512
           clockRate = 1242000
           totalConstMem = 65536
           major = 1
           minor = 3
           textureAlignment = 256
           deviceOverlap = 1
           multiProcessorCount = 30
   Device 2 : GeForce GTX 295
           totalGlobalMem = 919994368
           sharedMemPerBlock = 16384
           regsPerBlock = 16384
           warpSize = 32
           memPitch = 2147483647
           maxThreadsPerBlock = 512
           clockRate = 1242000
           totalConstMem = 65536
           major = 1
           minor = 3
           textureAlignment = 256
           deviceOverlap = 1
           multiProcessorCount = 30
setiathome_CUDA: CUDA Device 1 specified, checking...
   Device 1: GeForce GTX 295 is okay
SETI@home using CUDA accelerated device GeForce GTX 295
V10 modification by Raistmer
Priority of worker thread rised successfully
Priority of process adjusted successfully
Total GPU memory 919994368    free GPU memory 561876992
setiathome_enhanced 6.02 Visual Studio/Microsoft C++

Build features: Non-graphics   VLAR autokill enabled    FFTW   x86   
     CPUID: Intel(R) Core(TM) i7 CPU         920  @ 2.67GHz

     Cache: L1=64K L2=256K

CPU features: FPU TSC PAE CMPXCHG8B APIC SYSENTER MTRR CMOV/CCMP MMX FXSAVE/FXRSTOR SSE SSE2 HT SSE3
libboinc: 6.4.5

Work Unit Info:
...............
WU true angle range is :  0.406157
Cuda error 'cufftExecC2C' in file 'd:/BTR/SETI6/SETI_MB_CUDA/client/cuda/cudaAcc_fft.cu' in line 63 : unknown error.
Cuda error 'cudaAcc_GetPowerSpectrum_kernel' in file 'd:/BTR/SETI6/SETI_MB_CUDA/client/cuda/cudaAcc_PowerSpectrum.cu' in line 56 : unknown error.
Cuda error 'cudaAcc_GetPowerSpectrum_kernel' in file 'd:/BTR/SETI6/SETI_MB_CUDA/client/cuda/cudaAcc_PowerSpectrum.cu' in line 56 : unknown error.
Cuda error 'cudaAcc_summax32_kernel' in file 'd:/BTR/SETI6/SETI_MB_CUDA/client/cuda/cudaAcc_summax.cu' in line 147 : unknown error.
Cuda error 'cudaAcc_summax32_kernel' in file 'd:/BTR/SETI6/SETI_MB_CUDA/client/cuda/cudaAcc_summax.cu' in line 147 : unknown error.
Cuda error 'cudaMemcpy(PowerSpectrumSumMax, dev_PowerSpectrumSumMax, cudaAcc_NumDataPoints / fftlen * sizeof(*dev_PowerSpectrumSumMax), cudaMemcpyDeviceToHost)' in file 'd:/BTR/SETI6/SETI_MB_CUDA/client/cuda/cudaAcc_summax.cu' in line 160 : unknown error.

</stderr_txt>
]]>

Validate state   Invalid
Claimed credit   73.2308450799687
Granted credit   0
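
When a result fails like this, the first `Cuda error` line is usually the one worth looking at, since the later ones are likely follow-on failures after the first call went wrong. Below is a minimal Python sketch that pulls those lines out of a saved stderr and groups them by source file and line. The line format is taken from the stderr above; the function name is hypothetical.

```python
import re
from collections import Counter

# Format taken from the stderr above:
# Cuda error '<call>' in file '<path>' in line <n> : <message>
CUDA_ERROR_RE = re.compile(
    r"Cuda error '(?P<call>.+)' in file '(?P<path>[^']+)' "
    r"in line (?P<line>\d+) : (?P<message>.+)"
)


def summarize_cuda_errors(stderr_text):
    """Return (first_error, counts) for the Cuda error lines in stderr_text.

    first_error is (call, source_file, line) or None if no error was found.
    counts maps (source_file, line) to the number of times it was reported.
    """
    first = None
    counts = Counter()
    for raw in stderr_text.splitlines():
        m = CUDA_ERROR_RE.match(raw.strip())
        if not m:
            continue
        # Keep only the file name; the build path is not interesting.
        source = m.group("path").rsplit("/", 1)[-1]
        line_no = int(m.group("line"))
        counts[(source, line_no)] += 1
        if first is None:
            first = (m.group("call"), source, line_no)
    return first, counts


# Three lines copied from the result above.
log = """\
Cuda error 'cufftExecC2C' in file 'd:/BTR/SETI6/SETI_MB_CUDA/client/cuda/cudaAcc_fft.cu' in line 63 : unknown error.
Cuda error 'cudaAcc_GetPowerSpectrum_kernel' in file 'd:/BTR/SETI6/SETI_MB_CUDA/client/cuda/cudaAcc_PowerSpectrum.cu' in line 56 : unknown error.
Cuda error 'cudaAcc_GetPowerSpectrum_kernel' in file 'd:/BTR/SETI6/SETI_MB_CUDA/client/cuda/cudaAcc_PowerSpectrum.cu' in line 56 : unknown error.
"""
first, counts = summarize_cuda_errors(log)
print(first)
print(counts)
```

Running this over many failed results shows whether they all start at the same call (here the FFT) or fail at random places, which is a useful detail to include when asking for help.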
Title: Re: GTX295 CUDA Issues
Post by: sunu on 13 May 2010, 06:13:02 am
Have you recently updated drivers etc. ?
Title: Re: GTX295 CUDA Issues
Post by: Pizzadude on 13 May 2010, 03:12:44 pm
Quote
Have you recently updated drivers etc.?

I am currently running 197.45, as this is the version recommended for CUDA support in Adobe CS5.

Which version are folks using for Seti?
Title: Re: GTX295 CUDA Issues
Post by: efmer (fred) on 15 May 2010, 01:28:36 pm
These are my computers: http://setiathome.berkeley.edu/hosts_user.php?userid=8906489
XP 64 is by far the best choice. Win 7 is a lot slower and a bit buggy.

Do you have any warranty on this card?
I don't want to scare you, but these 2-PCB versions are really bad.
I got about 4 defective cards before I got 3 cards that work fine 24/7.
The defective ones were all 2-PCB cards and, lucky me, I have none left: all were replaced under full warranty, without any questions. Even as a second owner, you may want to check whether you still have any warranty left.

The 2-PCB cards get way, way too hot. It may be that under Seti the cards get a bit warmer than with other applications.

So I recognize all your problems. I had 2 systems so I could swap the cards around, and still it didn't work. The one-PCB cards worked flawlessly.

CUDA 2.3 works best for me. And driver 190.38, not the latest, works best.

Title: Re: GTX295 CUDA Issues
Post by: Pizzadude on 16 May 2010, 12:17:37 am
The warranty's out, so that's a no-go. I have shopped around looking for a single-PCB version, but they seem to be out of circulation or discontinued in the UK.

I think I will wait for the coding issues to get sorted and go the GTX480 route.
Title: Re: GTX295 CUDA Issues
Post by: efmer (fred) on 27 May 2010, 12:36:26 pm
Try the new Beta drivers.
These Win 7 / GTX 295 drivers are the first to really work.
My 2 cards work, for the first time with Win 7, without any problems.
Title: Re: GTX295 CUDA Issues
Post by: madmac on 27 May 2010, 04:53:55 pm
I ended up getting a refund in the end.

OK, I now need some advice on how to spend £300!

All I am interested in is crunching - not games.
Typical me: I was looking at a machine filled with 9800 GX2s, but I got excited and bought a 295 off eBay. It had a fault so I returned it, but I loved the output it gave me :-)
So I have to buy another card.
I know there is no hard and fast data, but what will give the best PPD?
A GTX295 or a GTX470? There isn't that much price difference now, and a 480 is still too expensive...
I know Fermi has a different design, and the 256 drivers and CUDA 3.1 are meant to improve things further, so if you had the money, what would you do?
For the best points per day, would you get a 470 or a 295?
Help a cruncher in need :-)
Title: Re: GTX295 CUDA Issues
Post by: sunu on 27 May 2010, 07:08:07 pm
A GTX295 should give a better RAC than a GTX470 or 480. You could also wait, I don't know how long, for the dual Fermi cards to come out.
Title: Re: GTX295 CUDA Issues
Post by: madmac on 28 May 2010, 01:13:25 am
Can't wait that long as my machine isn't doing anything at the moment..
Will see what I can pick up off the bay then - cheers
Title: Re: GTX295 CUDA Issues
Post by: efmer (fred) on 28 May 2010, 03:12:58 am
The GTX 295 by faaar.
Title: Re: GTX295 CUDA Issues
Post by: efmer (fred) on 28 May 2010, 03:15:07 am
Quote
A GTX295 should give a better RAC than a GTX470 or 480. You could also wait, I don't know how long, for the dual Fermi cards to come out.

The dual Fermi card looks really bad: a heavily underclocked dual 470.
Title: Re: GTX295 CUDA Issues
Post by: sunu on 28 May 2010, 04:47:59 am
We don't even know yet whether it will be a dual 470 or another Fermi iteration, for example GF104.
Title: Re: GTX295 CUDA Issues
Post by: madmac on 28 May 2010, 05:11:50 pm
Posted on Seti and got this reply...

Depends whether you are interested in instantaneous results, or whether you are planning for the long term.

A 295 is mature technology, perhaps even obsolescent. It will have peaked already - there won't be much new development for it, and in due course there will be new developments which it can't keep up with.

A 470 is new technology, and won't be giving of its best yet. But it should be the focus of development effort for longer into the future, and you will be able to gain from those efforts.

Now or later? Dead-end or progress? Your choice.

He has a point, so I'm going for a Fermi :-)
Title: Re: GTX295 CUDA Issues
Post by: Richard Haselgrove on 28 May 2010, 05:48:16 pm
Quote
Posted on Seti and got this reply...

From me. Flames to the usual address...
Title: Re: GTX295 CUDA Issues
Post by: SciManStev on 29 May 2010, 02:31:39 pm
It's somewhat comforting to know that others have had similar problems with their 295s. Mine played games great and worked perfectly with the standard apps. Every time I switched to the optimized apps, all it did was crash work units left and right. This happened on multiple motherboards, PSUs and processors. Heat was never even close to being an issue. It is nice to learn that my card was most likely an older one that was made with this error. I didn't know that before I read this thread. My two GTX 480s are crunching like mad now, and with the faulty 295 out of the picture, all my problems went away.

One other symptom I was getting was that every time I tried to use a Kill A Watt meter to measure power levels, the system crashed. I did upgrade my PSU, and it got a little better, but during one of the regular screen blankings caused by the 295, it still crashed on me again. Now, with the 480s and an OC'ed CPU 980 at 4.2339 GHz, the system is very stable with the Kill A Watt meter running all the time.

Steve
Title: Re: GTX295 CUDA Issues
Post by: Pizzadude on 31 May 2010, 03:06:29 am
Hi Steve, which Seti app are you running, and what's the average crunch time per work unit with your setup?
Title: Re: GTX295 CUDA Issues
Post by: SciManStev on 01 Jun 2010, 06:54:00 pm
I am running SSSE3, and the larger units are completing in just over 9 minutes, with shorties completing in 2 minutes and maybe a few seconds. There seem to be several intermediate units that complete in just over 7 minutes. Rarely, I will get a unit that takes 10 minutes. With the 295, it was 11 to 14 minutes, and full of errors, regardless of overclocking either GPU or CPU. Current clock speed for the CPU 980 is 4.2339 GHz, with GPU shaders at 1510 and memory at 1890. Temps are 46°C for GPU and 49°C for CPU. There is a long way I can keep going as far as overclocking is concerned.

Steve
Title: Re: GTX295 CUDA Issues
Post by: Raistmer on 01 Jun 2010, 07:01:01 pm
Looks like you are mixing up CPU and GPU apps.
There is no SSSE3-based GPU app. Akv8 SSSE3 is a CPU app, and I can hardly believe it can do a task in only 9 minutes, even on an i7.
On my Q9450, a long task takes ~2 h of CPU time, while a short one takes ~30 min.
Title: Re: GTX295 CUDA Issues
Post by: SciManStev on 01 Jun 2010, 07:28:58 pm
Sorry about that. I'm still coming up to speed and learning what I can, while still making mistakes. I tried to look at the app_info file for the answer, and as it was a copy of Todd's, I am still learning to read and decipher it. Over the next few weeks, I will make a huge effort to learn more and be more useful. My CPU WUs are completing in about 1 hour. I am running at 95% on six cores, with hyperthreading disabled. The Fermi app is setiathome_6.09_windows_intelx86__cuda_fermi.exe, as it was directly copied from Todd. I truly want to learn to edit and create files. I have a book coming that should help.

Steve
Title: Re: GTX295 CUDA Issues
Post by: Pepi on 01 Jun 2010, 07:31:30 pm
Quote
Current CPU clock speed is for the CPU 980, 4.2339 GHz, with GPU shaders at 1510, and memory at 1890. Temps are 46°C for GPU, and 49°C for CPU. There is a long way I can keep going as far as overclocking is concerned.

Steve

Lower your CPU to stock frequency, then also lower your shaders to, let's say, 1400 MHz and the memory to 1700, and then try again. I had problems with my GTX260 (the shaders were not working well). After I put the shaders back to stock frequency, it has run 15 days without any error. I used the OCCT test with error counting for GPU testing. If you get even one error in a 1-hour test, you must lower your shader clocks until it is stable. I bet that your card will fail this test and give you errors in the first few minutes.
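
The step-down procedure can be written out as a simple loop. This is only a sketch in Python: `is_stable` is a hypothetical stand-in for setting the clock by hand and running an hour-long error-counting test, and the 25 MHz step and 1200 MHz floor are made-up example values, not recommendations.

```python
def find_stable_clock(start_mhz, floor_mhz, step_mhz, is_stable):
    """Lower the clock from start_mhz in step_mhz decrements until is_stable passes.

    Returns the first clock that passes, or None if nothing at or above
    floor_mhz is stable (which would suggest a faulty card).
    """
    if step_mhz <= 0:
        raise ValueError("step_mhz must be positive")
    clock = start_mhz
    while clock >= floor_mhz:
        if is_stable(clock):
            return clock
        clock -= step_mhz
    return None


# Stand-in for a real test run: pretend anything at or below 1400 MHz is error-free.
def fake_stress_test(clock_mhz):
    return clock_mhz <= 1400


print(find_stable_clock(1510, 1200, 25, fake_stress_test))
```

If the loop reaches the floor without a single clean run, lowering clocks is not going to help and the card itself is the likely culprit.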
Title: Re: GTX295 CUDA Issues
Post by: SciManStev on 01 Jun 2010, 07:44:04 pm
Now that I am running the 480s, I have no errors, and my system is very stable. The 295 gave me nothing but headaches. I tried stock speeds, underclocking the memory, everything I could think of or read on the SETI boards. As soon as I converted to the 480s, my problems vanished.

Steve
Title: Re: GTX295 CUDA Issues
Post by: Raistmer on 01 Jun 2010, 07:48:17 pm
Quite enough data to return it to seller IMO ;D
Title: Re: GTX295 CUDA Issues
Post by: Pepi on 01 Jun 2010, 07:50:39 pm
Try the OCCT test; you have nothing to lose. If I am right, you have a bad card. Use EVGA Precision to change the shader frequency (without flashing) and test it.
Title: Re: GTX295 CUDA Issues
Post by: SciManStev on 01 Jun 2010, 07:57:15 pm
At this point, the 295 can be sold to a gamer. The card performs excellently in games. I have seen others try to return their cards, only to be met with total frustration. I will relate any past data on the 295, but I am now a Fermi man! I love the 480s, and with my dual-radiator water cooling system, I have no motivation to go back. Whatever I can test here at Lunatics, it needs to be Fermi. Just removing a 480 and going back to the 295 would take me a couple of hours and a fluid drain.

Steve
Title: Re: GTX295 CUDA Issues
Post by: Pepi on 01 Jun 2010, 07:59:54 pm
In that case: crunch with Fermi and be happy man!  :)
Title: Re: GTX295 CUDA Issues
Post by: SciManStev on 01 Jun 2010, 08:13:47 pm
I truly am happy. I want so much to learn everything I can, and contribute in any way possible. I really want to be here, and be a part of this valiant quest to constantly improve our search for ET.

Steve