Seti@Home optimized science apps and information
Optimized Seti@Home apps => Windows => GPU crunching => Topic started by: madmac on 10 May 2010, 12:44:50 pm
-
OK, I decided to splash out and spent £250 on a second-hand 295, as I was fed up watching my position slide down the tables :-(
I built a new (second-hand) rig, installed my card and the latest NVIDIA driver, turned off SLI, put the latest optimised apps on and fired her up.
It would appear that I have a problem with one of the cores, as one GPU just errors, typically after 19 seconds and never much longer than that. There have been a couple of times where it crunched to about 80% and then errored, but this has only happened with 1 or 2 units out of too many to mention, so it might be a fluke.
Reverted back to stock apps - no difference.
Dropped the memory clocks right down to 500 MHz - no difference. The card is now back at stock speeds with one GPU disabled.
This is my new pc
http://setiathome.berkeley.edu/show_host_detail.php?hostid=5400786
And this is the task list. Could someone on here have a look at the errors and see if there is a common theme, or if there is anything I can change? I'm stumped on this one...
The error messages are not always the same
http://setiathome.berkeley.edu/results.php?hostid=5400786
Is it hardware related or is it software related?
Any help gratefully received
-
99.99% it's a hardware problem. All invalids and error workunits are from the first GPU (device 1).
A bit puzzling is http://setiathome.berkeley.edu/result.php?resultid=1604052193 that gives an out of memory error.
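For anyone who wants to repeat the "which device do the failures come from" check on their own host, here is a minimal Python sketch. It assumes each task's stderr contains a line like "setiathome_CUDA: CUDA Device 1 specified, checking...", and that you have already collected the stderr text and a pass/fail flag per task by hand. The function names are made up for illustration and are not part of BOINC or the SETI apps.

```python
import re
from collections import Counter

# Assumed stderr line: "setiathome_CUDA: CUDA Device 1 specified, checking..."
DEVICE_RE = re.compile(r"CUDA Device (\d+) specified")


def device_of(stderr_text):
    """Return the device number a task ran on, or None if it is not stated."""
    match = DEVICE_RE.search(stderr_text)
    return int(match.group(1)) if match else None


def failures_per_device(results):
    """Count failed tasks per device.

    `results` is an iterable of (stderr_text, succeeded) pairs. How the
    pairs are gathered from the results page is left out of this sketch.
    """
    counts = Counter()
    for stderr_text, succeeded in results:
        if not succeeded:
            counts[device_of(stderr_text)] += 1
    return counts
```

If nearly every failure lands on one device number while the other stays clean, that points at that GPU rather than at the driver or the app.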
-
Mmmm, that's weird. I've got a GTX295 and have been suffering similar issues for about the last three weeks. I thought it was a heat issue, so I completely dismantled the GTX295, removed all the dustballs etc. and replaced the heatsink paste with Arctic Silver. Overall temperatures have reduced by about 5 to 8 degrees but the Seti problem persisted. I assumed it might be an OS or registry issue, so I clean-installed Win7 64-bit with various NVIDIA drivers, but still the problem persists. The errors always occur on GPU 0. I am not convinced it's a hardware issue as all other CUDA apps work flawlessly. The GTX 295 plays intensive games without a hitch.
I have performed a burn-in test using FurMark which took the GTX295 to 93 degrees, well within its 105 degree design limit. In case it was an issue with the GTX295 interfacing with my motherboard, I removed all overclocks from my i7 processor and memory and put everything back to stock Intel settings.
Still GPU 0 throws an error every couple of units, or sometimes 10 in a row.
:-\
-
weak power supply?
-
I am not convinced it's a hardware issue as all other CUDA apps work flawlessly. The GTX 295 plays intensive games without a hitch.
I have performed a burn-in test using FurMark which took the GTX295 to 93 degrees, well within its 105 degree design limit.
Still GPU 0 throws an error every couple units or sometimes 10 in a row.
:-\
Interesting, can you clarify the 'all other CUDA apps work flawlessly' bit?
I have read over on the SETI forums that this is an issue with the older dual-PCB versions.
I have the problem that 99.99% of WUs error on one of the GPUs.
Trying to get my money back...
-
weak power supply?
I had suspected this, so I installed a brand new Corsair 750 W nearly two weeks ago. I wired the system so that only the boot drive, motherboard, DVD writer and GTX295 were connected, and for the following 5 days the problems persisted.
-
Interesting, can you clarify the 'all other CUDA apps work flawlessly' bit?
I have read over on the SETI forums that this is an issue with the older dual-PCB versions.
My GTX is the dual-PCB version, and I suspect the issue you refer to is the bug in the memory controller. So far as I can make out, this is more of an urban legend than something actually confirmed by NVIDIA.
As I said, other CUDA apps work with very little hitch. The ones I currently use are Badaboom, TMPGEnc, Adobe, BOINC Collatz Conjecture and GPUgrid. BOINC Collatz Conjecture throws occasional errors, but nowhere near as many as Seti.
-
Can anybody explain this error? I'm suddenly getting lots of these now:
Name 29no06ag.1150.892.6.10.140_0
Workunit 608510479
Created 6 May 2010 8:08:47 UTC
Sent 6 May 2010 8:12:24 UTC
Received 13 May 2010 6:12:24 UTC
Server state Over
Outcome Client error
Client state Compute error
Exit status 1 (0x1)
Computer ID 5180462
Report deadline 22 Jun 2010 22:42:57 UTC
Run time 989.105137
CPU time 287.4474
stderr out
<core_client_version>6.10.17</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
setiathome_CUDA: Found 2 CUDA device(s):
Device 1 : GeForce GTX 295
totalGlobalMem = 919994368
sharedMemPerBlock = 16384
regsPerBlock = 16384
warpSize = 32
memPitch = 2147483647
maxThreadsPerBlock = 512
clockRate = 1242000
totalConstMem = 65536
major = 1
minor = 3
textureAlignment = 256
deviceOverlap = 1
multiProcessorCount = 30
Device 2 : GeForce GTX 295
totalGlobalMem = 919994368
sharedMemPerBlock = 16384
regsPerBlock = 16384
warpSize = 32
memPitch = 2147483647
maxThreadsPerBlock = 512
clockRate = 1242000
totalConstMem = 65536
major = 1
minor = 3
textureAlignment = 256
deviceOverlap = 1
multiProcessorCount = 30
setiathome_CUDA: CUDA Device 1 specified, checking...
Device 1: GeForce GTX 295 is okay
SETI@home using CUDA accelerated device GeForce GTX 295
V10 modification by Raistmer
Priority of worker thread rised successfully
Priority of process adjusted successfully
Total GPU memory 919994368 free GPU memory 561876992
setiathome_enhanced 6.02 Visual Studio/Microsoft C++
Build features: Non-graphics VLAR autokill enabled FFTW x86
CPUID: Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz
Cache: L1=64K L2=256K
CPU features: FPU TSC PAE CMPXCHG8B APIC SYSENTER MTRR CMOV/CCMP MMX FXSAVE/FXRSTOR SSE SSE2 HT SSE3
libboinc: 6.4.5
Work Unit Info:
...............
WU true angle range is : 0.406157
Cuda error 'cufftExecC2C' in file 'd:/BTR/SETI6/SETI_MB_CUDA/client/cuda/cudaAcc_fft.cu' in line 63 : unknown error.
Cuda error 'cudaAcc_GetPowerSpectrum_kernel' in file 'd:/BTR/SETI6/SETI_MB_CUDA/client/cuda/cudaAcc_PowerSpectrum.cu' in line 56 : unknown error.
Cuda error 'cudaAcc_GetPowerSpectrum_kernel' in file 'd:/BTR/SETI6/SETI_MB_CUDA/client/cuda/cudaAcc_PowerSpectrum.cu' in line 56 : unknown error.
Cuda error 'cudaAcc_summax32_kernel' in file 'd:/BTR/SETI6/SETI_MB_CUDA/client/cuda/cudaAcc_summax.cu' in line 147 : unknown error.
Cuda error 'cudaAcc_summax32_kernel' in file 'd:/BTR/SETI6/SETI_MB_CUDA/client/cuda/cudaAcc_summax.cu' in line 147 : unknown error.
Cuda error 'cudaMemcpy(PowerSpectrumSumMax, dev_PowerSpectrumSumMax, cudaAcc_NumDataPoints / fftlen * sizeof(*dev_PowerSpectrumSumMax), cudaMemcpyDeviceToHost)' in file 'd:/BTR/SETI6/SETI_MB_CUDA/client/cuda/cudaAcc_summax.cu' in line 160 : unknown error.
</stderr_txt>
]]>
Validate state Invalid
Claimed credit 73.2308450799687
Granted credit 0
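To compare failures across many tasks it can help to pull the "Cuda error" lines out of the stderr text. Below is a minimal Python sketch that assumes the line format seen in the log above ("Cuda error '<call>' in file '<path>' in line <n> : <message>"). The function name is hypothetical and this is not a tool shipped with the apps.

```python
import re

# Matches lines such as:
# Cuda error 'cufftExecC2C' in file '.../cudaAcc_fft.cu' in line 63 : unknown error.
CUDA_ERROR_RE = re.compile(
    r"Cuda error '(?P<call>.+?)' in file '(?P<file>[^']+)' "
    r"in line (?P<line>\d+) : (?P<message>.+)"
)


def parse_cuda_errors(stderr_text):
    """Return one dict per 'Cuda error' line, in the order they were logged."""
    errors = []
    for raw in stderr_text.splitlines():
        match = CUDA_ERROR_RE.search(raw)
        if match:
            errors.append({
                "call": match.group("call"),
                # Keep only the file name, not the build machine's path.
                "file": match.group("file").rsplit("/", 1)[-1],
                "line": int(match.group("line")),
                "message": match.group("message").strip(),
            })
    return errors
```

The first entry is usually the most interesting one. In the log above it is the cufftExecC2C call, and the errors logged after it may well be follow-on failures of that first one rather than separate faults.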
-
Have you recently updated drivers etc. ?
-
Have you recently updated drivers etc. ?
I am currently running 197.45, as this is the version recommended for CUDA support in Adobe CS5.
Which version are folks using for Seti?
-
These are my computers http://setiathome.berkeley.edu/hosts_user.php?userid=8906489
XP 64 is by far the best choice. Win 7 is a lot slower and a bit buggy.
Do you have any warranty on this card?
I don't want to scare you, but these 2-PCB versions are really bad.
I went through about 4 defective cards before I got 3 cards that work fine 24/7.
The defective ones were all 2-PCB cards and, lucky me, I have none left. All were replaced under full warranty, without any questions. Even as a second owner, you may want to check whether you still have any warranty left.
The 2-PCB cards get way, way too hot. It may be that under Seti the cards get a bit warmer than with other applications.
So I recognize all your problems. I had 2 systems so I could swap the cards around, and still it didn't work. The single-PCB cards worked flawlessly.
CUDA 2.3 works best for me, and driver 190.38, not the latest, works best.
-
The warranty's out, so that's a no-go. I have shopped around looking for a single-PCB version, but they seem to be out of circulation or discontinued in the UK.
I think I will wait for the coding issues to get sorted and go the GTX480 route.
-
Try the new Beta drivers.
These Win 7 / GTX 295 drivers are the first to really work.
My 2 cards work, for the first time with Win 7, without any problems.
-
I ended up getting a refund in the end.
Ok, I now need some advice on how to spend £300!
All I am interested in is crunching - not games.
Typical me: I was looking at a machine filled with 9800 GX2s, but I got excited and bought a 295 off eBay. It had a fault so I returned it, but I loved the output it gave me :-)
So I have to buy another card
I know there is no hard and fast data, but what will give the best ppd?
A GTX295 or a GTX470 as there isn't that much price difference now, a 480 is still too expensive...
I know Fermi has a different design and the 256 drivers and CUDA 3.1 are meant to improve things further so if you had the money, what would you do???
For the best points per day would you get a 470 or a 295?
Help a cruncher in need :-)
-
A GTX295 should give a better RAC than a GTX470 or 480. You could also wait, don't know how long, for the dual fermi cards to come out.
-
Can't wait that long as my machine isn't doing anything at the moment..
Will see what I can pick up off the bay then - cheers
-
The GTX 295 by faaar.
-
A GTX295 should give a better RAC than a GTX470 or 480. You could also wait, don't know how long, for the dual fermi cards to come out.
The dual Fermi card looks really bad: a heavily underclocked dual 470.
-
We don't even know yet if it will be a dual 470 or another Fermi iteration such as GF104.
-
Posted on Seti and got this reply...
Depends whether you are interested in instantaneous results, or whether you are planning for the long term.
A 295 is mature technology, perhaps even obsolescent. It will have peaked already - there won't be much new development for it, and in due course there will be new developments which it can't keep up with.
A 470 is new technology, and won't be giving of its best yet. But it should be the focus of development effort for longer into the future, and you will be able to gain from those efforts.
Now or later? Dead-end or progress? Your choice.
He has a point, so I'm going for a Fermi :-)
-
Posted on Seti and got this reply...
From me. Flames to the usual address......
-
It's somewhat comforting to know that others have had similar problems with their 295s. Mine played games great, and worked perfectly with the standard apps. Every time I switched to the optimized apps, all it did was crash work units left and right. This happened on multiple motherboards, PSUs and processors. Heat was never even close to being an issue. It is nice to learn that my card was most likely an older one that was made with this error. I didn't know that before I read this thread. My two GTX 480s are crunching like mad now, and with the faulty 295 out of the picture, all my problems went away.
One other symptom I was getting was that every time I tried to use a Kill A Watt meter to measure power levels, the system crashed. I did upgrade my PSU, and it got a little better, but still, during one of the regular screen blankings caused by the 295, it crashed on me again. Now, with the 480s and an OC'ed CPU 980 at 4.2339 GHz, the system is very stable with the Kill A Watt meter running all the time.
Steve
-
Hi Steve, which Seti app are you running, and what's the average crunch time per work unit with your setup?
-
I am running SSSE3, and the larger units are completing in just over 9 minutes, with shorties completing in 2 minutes and maybe a few seconds. There seem to be several interim units that are completed in just over 7 minutes. Rarely I will get a unit that takes 10 minutes. With the 295, it was 11 to 14 minutes, and full of errors, regardless of overclocking either GPU or CPU. Current clock speed for the CPU 980 is 4.2339 GHz, with GPU shaders at 1510 and memory at 1890. Temps are 46°C for GPU and 49°C for CPU. There is a long way I can keep going as far as overclocking is concerned.
Steve
-
Looks like you are mixing up CPU and GPU apps.
There is no SSSE3-based GPU app. Akv8 SSSE3 is a CPU app, and I can hardly believe it does a task in only 9 minutes, even on an i7.
On my Q9450 a long task takes ~2 h of CPU time, while a short one takes ~30 min.
-
Sorry about that. I'm still coming up to speed, and learning what I can, while still making mistakes. I tried to look at the app_info file for the answer, and as it was a copy of Todd's, I am still learning to read and decipher it. Over the next few weeks, I will make a huge effort to learn more, and be more useful. My CPU WUs are completing in about 1 hour. I am running 95% of six cores, with hyperthreading disabled. The Fermi app is setiathome_6.09_windows_intelx86__cuda_fermi.exe, as it was directly copied from Todd. I truly want to learn to edit and create files. I have a book coming that should help.
Steve
-
Current clock speed for the CPU 980 is 4.2339 GHz, with GPU shaders at 1510 and memory at 1890. Temps are 46°C for GPU and 49°C for CPU. There is a long way I can keep going as far as overclocking is concerned.
Steve
Lower your CPU to stock frequency, then also lower your shaders to, let's say, 1400 MHz and your memory to 1700, and then try. I had problems with my GTX260 (shaders not working well). After I put the shaders back to stock frequency, it has run 15 days without any error. I used the OCCT test with error counting for GPU testing. If you get even one error in a 1 hour test, you must lower your shader clocks until it is stable. I can bet that your card will fail this test and give you errors in the first few minutes.
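The "lower the shaders until the test is clean" routine can be written down as a simple loop. This is only a sketch of the procedure in Python: OCCT and the clock tools are GUI programs, so `count_errors` here is a hypothetical stand-in for "set this shader clock, run the 1 hour test, read off the error count" done by hand. The step size and floor are assumptions too.

```python
def find_stable_clock(start_mhz, floor_mhz, step_mhz, count_errors):
    """Step the shader clock down until a stress test reports zero errors.

    `count_errors(clock_mhz)` is a caller-supplied function that returns the
    number of errors seen at that clock. Returns the first clean clock, or
    None if nothing at or above `floor_mhz` passed.
    """
    clock = start_mhz
    while clock >= floor_mhz:
        if count_errors(clock) == 0:
            return clock
        clock -= step_mhz
    return None
```

If this returns None, meaning the card still throws errors at or below stock clocks, that is a strong hint the card itself is bad.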
-
Now that I am running the 480s, I have no errors, and my system is very stable. The 295 gave me nothing but headaches. I tried stock speeds, underclocking the memory, everything I could think of or read on the SETI boards. As soon as I converted to the 480s, my problems vanished.
Steve
-
Quite enough data to return it to seller IMO ;D
-
Try the OCCT test, you have nothing to lose. If I am right and you have a bad card, use EVGA Precision to change the shader frequency (without flashing) and test it.
-
At this point, the 295 can be sold to a gamer. The card does an excellent job with games. I have seen others try to return their cards, only to be met with total frustration. I will relate any past data on the 295, but I am now a Fermi man! I love the 480s, and with my dual radiator water cooling system, I have no motivation to go back. Whatever I can test here at Lunatics, it needs to be Fermi. Just for me to remove a 480 and go back to the 295 would take a couple of hours, and a fluid drain.
Steve
-
In that case: crunch with Fermi and be happy man! :)
-
I truly am happy. I want so much to learn everything I can, and contribute in any way possible. I really want to be here, and be a part of this valiant quest to constantly improve our search for ET.
Steve