x38g reports
perryjay:
I'm not sure how widespread the problem is, but I have been noticing 560 Tis showing up in my results returning -9s where I and another host finished clean. I haven't looked closely enough to tell whether it's just one or two of them or spread across the whole model. Hope you find the problem, as it looks like a really nice card otherwise. After reading that thread, it seems they have found a pretty good workaround.
Jason G:
Yeah, upping the voltage slightly is a likely solution for those -9s, especially when running several tasks at once. The manufacturers probably kept the voltage down to meet a power spec or something.
I'll likely know more in a few days, but it does look like there are timing-sensitivity issues as well, possibly related to memory controller load. I'm now testing a build on my P4/GTX 260 that both gives more descriptive error messages and guts & replaces a bunch of code, inherited from nVidia dev work, that precedes the FFTs that fail. That 260 has consistently exhibited the issue, so it has proven useful for exploring how to harden the app.
Prior to the code-gutting exercise, I did a quick test to capture the error code:
--- Quote --- ...
Multibeam x39 Preview, Cuda 3.20
Legacy setiathome_enhanced V6 mode.
Work Unit Info:
...............
WU true angle range is : 2.592398
...
A FFT launch failed (try 1), code CUFFT_EXEC_FAILED = 0x6
A FFT launch failed (try 2), code CUFFT_EXEC_FAILED = 0x6
A FFT launch failed (try 3), code CUFFT_EXEC_FAILED = 0x6
A FFT launch failed (try 4), code CUFFT_EXEC_FAILED = 0x6
A FFT launch failed (try 5), code CUFFT_EXEC_FAILED = 0x6
A FFT launch failed (try 6), code CUFFT_EXEC_FAILED = 0x6
A FFT launch failed (try 7), code CUFFT_EXEC_FAILED = 0x6
A FFT launch failed (try 8), code CUFFT_EXEC_FAILED = 0x6
A FFT launch failed (try 9), code CUFFT_EXEC_FAILED = 0x6
A FFT launch failed (try 10), code CUFFT_EXEC_FAILED = 0x6
CUFFT error in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcc_fft.cu' in line 81. code CUFFT_EXEC_FAILED = 0x6
--- End quote ---
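The log shows the app retrying the FFT launch ten times and then bailing out with the same code each time. A minimal sketch of that kind of bounded retry is below. It is not the app's actual code: `launch` is a hypothetical stand-in for the real cuFFT exec call (returning 0 on success or an error code such as CUFFT_EXEC_FAILED = 0x6), and `make_flaky_launch` is a made-up test double.

```python
# Sketch of a bounded retry around an FFT launch (illustrative only).
CUFFT_EXEC_FAILED = 0x6  # error value seen in the log above
MAX_TRIES = 10           # the log shows ten attempts before giving up


def launch_with_retries(launch, max_tries=MAX_TRIES, log=print):
    """Call launch() up to max_tries times.

    Returns 0 on the first success, otherwise the last error code returned.
    Each failed attempt is reported through `log`.
    """
    code = 0
    for attempt in range(1, max_tries + 1):
        code = launch()
        if code == 0:
            return 0
        log(f"A FFT launch failed (try {attempt}), code 0x{code:x}")
    return code


def make_flaky_launch(failures):
    """Hypothetical launcher: fails `failures` times, then succeeds."""
    remaining = [failures]

    def launch():
        if remaining[0] > 0:
            remaining[0] -= 1
            return CUFFT_EXEC_FAILED
        return 0

    return launch
```

With `make_flaky_launch(3)` the loop recovers on the fourth try; with a persistent fault it logs ten failures and returns 0x6. That matches what the log suggests: retries can mask a transient glitch, but they can't fix a launch that fails every time, so the cause has to be found before the FFT.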
Googling CUFFT_EXEC_FAILED reveals that this has been a common issue in the past on various cards, especially on GPUGrid and mostly with GTX 260s, so I have replaced x39 on that host with a 'special' x39 build that looks the same but changes a lot of the code before the FFTs. I'll watch it for a day or so, then determine whether the cause really is some crankiness in the drivers etc., or whether more careful code leading up to the FFTs resolves the issue.
No more errors yet on the 260 (touch wood) since I changed from 'ordinary x39 with extra-descriptive errors' to 'x39 with replaced code before the FFTs'. The latest x39 errors visible (for the moment) for that host are from the first kind of x39, so they are part of the attempt to track things down. I'll be looking for any after 18 Jun 2011 | 16:52:11 UTC for further diagnosis/investigation.
Jason
_heinz:
Hi Jason,
on my ATOM-ION, x38g still runs on the CPU... it's at 10% after 5 hours, so ~50 hours total runtime.
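The ~50 hour figure is a linear extrapolation: total time ≈ elapsed time / fraction done. A quick sketch, assuming progress accrues at a constant rate (the reported fraction done for a task may not be perfectly linear in time); `estimate_runtime` is a hypothetical helper name.

```python
def estimate_runtime(elapsed_hours, fraction_done):
    """Linear extrapolation of total and remaining runtime.

    Assumes progress is uniform over the task's lifetime.
    """
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    total = elapsed_hours / fraction_done
    return total, total - elapsed_hours


# 10% after 5 hours -> about 50 hours total, about 45 hours still to go
total, remaining = estimate_runtime(5.0, 0.10)
```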
Should I abort it ?
Maybe there is not enough memory available while initializing the GPU.
Some standalone tests with test WUs are necessary.
I have BOINC 6.10.60.
NVIDIA GPU 0: ION (driver version 27533, CUDA version 4000, compute capability 1.1, 64MB, 35 GFLOPS peak)
And since I installed the driver, it shows the wrong value of 64MB.
heinz
Pepi:
If I may write some words :)
It looks like the 560 Ti is a little power-hungry beast, and some manufacturers think the card will operate well with lower voltage settings. For SETI and other BOINC projects that is not true. I had this issue with my Gigabyte 560 Ti too, and until I raised the voltage to 1.0375 V I was unable to get stable work. With this voltage I can crunch 24/7 without any problems. So before you blame your app, the SDK, or the drivers, do one of two things: either downclock your GPU to 820 MHz (stock frequency) or give the GPU more voltage, not below 1.025 V. Then do your testing.
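Pepi's rule of thumb can be written as a simple check: the card is at or below stock clock, or its core voltage is at least 1.025 V. A sketch only: the thresholds are the ones from his post for his Gigabyte 560 Ti, the function name is hypothetical, and it doesn't read or change anything on the card.

```python
STOCK_CLOCK_MHZ = 820    # stock frequency quoted in the post
MIN_VOLTAGE_V = 1.025    # lowest voltage recommended when not at stock clock


def settings_follow_rule(core_clock_mhz, core_voltage_v):
    """True if the clock is at/below stock OR the voltage is at/above the floor."""
    return core_clock_mhz <= STOCK_CLOCK_MHZ or core_voltage_v >= MIN_VOLTAGE_V
```

For example, a factory-overclocked card left at a low voltage fails the check, while the same card at 1.0375 V (Pepi's stable setting) passes.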
Jason G:
--- Quote from: _heinz on 19 Jun 2011, 03:01:16 am ---Hi Jason,
on my ATOM- ION x38g runs still on CPU... it has 10% after 5 hours, so ~ 50 hours runtime at end.
Should I abort it ?
--- End quote ---
We know it has more memory than that, so stop BOINC, install a driver that reports it properly, and reboot. The app should pick up the GPU properly when it sees enough RAM. That mechanism will change in the x39 series to say 'go away' if there isn't enough in total, using a BOINC temporary exit (a newer BoincApi feature; I'll have to update the BoincApi in use to get access to it).
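The change described above amounts to a different decision when the reported GPU memory is too small: x38g falls back to the CPU, while x39 is meant to exit temporarily so BOINC retries later. A minimal sketch of that decision, assuming a hypothetical `required_mb` threshold (the real figure isn't given here); in the actual app the temporary-exit branch would go through the BOINC API's temporary-exit call (`boinc_temporary_exit()` in newer API versions) rather than return a value.

```python
from enum import Enum


class Action(Enum):
    RUN_ON_GPU = "run on GPU"
    FALL_BACK_TO_CPU = "fall back to CPU"            # x38g-style behaviour
    TEMPORARY_EXIT = "temporary exit, retry later"   # planned x39 behaviour


def choose_action(reported_total_mb, required_mb, x39_style):
    """Pick what to do based on the GPU memory the driver reports.

    required_mb is a hypothetical threshold for illustration only.
    """
    if reported_total_mb >= required_mb:
        return Action.RUN_ON_GPU
    return Action.TEMPORARY_EXIT if x39_style else Action.FALL_BACK_TO_CPU
```

This also shows why the driver matters: if it misreports the ION as having 64MB, the check trips even though the hardware has more, and the task ends up on the CPU.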