just installed Unified Installers, v0.37 for Windows
Jason G:
Thanks, well it sort of fits the theory, from what I can tell so far. 'cufftExecC2C' would have been the first kernel executed after a pulsefind on the previous cfft pair, a long one of which would have crashed the driver, or the application context etc. Everything after that was clearly hosed. I reckon in the future we can handle that better.
perryjay:
That's me Jason, low class all the way!! ;D Glad I could be of help and give you guys something to play with.
Josef W. Segur:
--- Quote from: Jason G on 02 Sep 2010, 12:29:33 pm ---...
Got a breakdown, Joe?
--- End quote ---
--- Code: ---
AR=0.39430364685758, First limit=30, Second limit=100 [ChirpRes 0.1665]
FFTLen  Stepsize  NumCfft     Spikes  Gaussians     Pulses   Triplets PoTlen
8       7.463718       27    3538944        189       1365       2835  16621
16      3.731859       53    3473408        795       6075      11925   8310
32      1.865929      107    3506176       3317      24645      49755   4155
64      0.932965      215    3522560      13545     101115     203175   2078
128     0.466482      429    3514368      54483     409575     817245   1039
256     0.233241      857    3510272     218535    1640925    3278025    519
512     0.116621     1715    3512320     876365    6568905   13145475    260
1024    0.058310     3429    3511296    3507867   26316675   52618005    130
2048    0.029155     6859    3511808   14040373  105287445  210605595     65
4096    0.014578    13719    3512064   56179305  421314075  842689575     32
8192    0.007289    27439    3512192  224752849 1685584935 3371292735     16
16384   0.003644    54879    3512256  899082657          0          0      8
32768   0.014788    13525     432800          0          0          0      4
65536   0.003697    16229     259664          0          0          0      2
131072  0.000924    64917     519336          0          0          0      1
                 -------- ---------- ---------- ---------- ----------
Totals             204399   43349464 1198730280 2247255735  199747049
--- End code ---
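The table's internal arithmetic can be checked with a short script. This is only a sketch: the 1048576 samples-per-WU figure is an assumption inferred from the Spikes column (each value equals NumCfft times 1048576 / FFTLen), and reading the Triplets total as a 32-bit wraparound is likewise an inference from the numbers, not something the output itself states.

```python
# Rows copied from the table above:
# (FFTLen, NumCfft, Spikes, Gaussians, Pulses, Triplets)
ROWS = [
    (8, 27, 3538944, 189, 1365, 2835),
    (16, 53, 3473408, 795, 6075, 11925),
    (32, 107, 3506176, 3317, 24645, 49755),
    (64, 215, 3522560, 13545, 101115, 203175),
    (128, 429, 3514368, 54483, 409575, 817245),
    (256, 857, 3510272, 218535, 1640925, 3278025),
    (512, 1715, 3512320, 876365, 6568905, 13145475),
    (1024, 3429, 3511296, 3507867, 26316675, 52618005),
    (2048, 6859, 3511808, 14040373, 105287445, 210605595),
    (4096, 13719, 3512064, 56179305, 421314075, 842689575),
    (8192, 27439, 3512192, 224752849, 1685584935, 3371292735),
    (16384, 54879, 3512256, 899082657, 0, 0),
    (32768, 13525, 432800, 0, 0, 0),
    (65536, 16229, 259664, 0, 0, 0),
    (131072, 64917, 519336, 0, 0, 0),
]

# Totals row exactly as printed in the table.
TOTALS = {
    "NumCfft": 204399,
    "Spikes": 43349464,
    "Gaussians": 1198730280,
    "Pulses": 2247255735,
    "Triplets": 199747049,
}

# Assumption: samples per WU, inferred from Spikes * FFTLen / NumCfft.
SAMPLES = 1048576

names = ["NumCfft", "Spikes", "Gaussians", "Pulses", "Triplets"]
# Sum each numeric column (skipping FFTLen, which is column 0).
sums = {name: sum(row[i + 1] for row in ROWS) for i, name in enumerate(names)}

# Does every Spikes value equal NumCfft * (SAMPLES / FFTLen)?
spikes_ok = all(row[2] == row[1] * (SAMPLES // row[0]) for row in ROWS)

# The true Triplets sum reduced modulo 2**32, to compare with the printed total.
triplets_wrapped = sums["Triplets"] % 2**32

for name in names:
    print(f"{name:10s} sum={sums[name]:>11d} printed={TOTALS[name]:>11d}")
print("Spikes formula holds:", spikes_ok)
print("Triplets sum mod 2**32:", triplets_wrapped)
```

The column sums match the Totals row for NumCfft, Spikes, Gaussians and Pulses. The Triplets column actually sums to 4494714345, which equals the printed 199747049 modulo 2^32, so that one total looks like it overflowed a 32-bit counter.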
--- Quote ---The lower multiprocessor count of the 9500GT, about half that of my old 9600GSO, would see long PulsePoTs, at fftLength 4096 and under, split pulsefind kernel execution more often to fit the hardware. That would explain the naturally longer runtime of these tasks on lower classes of GPU, while staying the same as other midrange tasks on higher-class GPUs. In addition, I did move execution of those kernels to a non-default stream (i.e. not stream 0), and tampered with the kernel launch geometry somewhat. That could explain why it runs to completion on x32f, while it suffers timeouts & driver crashes under stock.
Jason
--- End quote ---
Perryjay did say the GPU had handled other tasks with similar AR much quicker, and given the way the ALFALFA project observes I'd expect he even had at least several with AR identical to the full 14 digits supplied in the WU header. So anything unusual about this WU would have to be in the data. For Pulse finding, about the only possible cause of a slowdown would be the best_pulse threshold building up gradually, requiring a lot of data to be sent back from GPU to CPU. For Gaussian fitting the situation is similar: there might have been a gradual buildup requiring much data return to the CPU, and doing the final ChiSqr checks an unusually large number of times might even be implicated.
My CPU run actually finished quicker than I'd expected, but that's mostly my not having done many full-length tasks on the test system. The result file is very strongly similar to Perryjay's as expected.
My judgement is the WU is exonerated; it just happened to be the one being processed when something caused either a GPU slowdown or tied up the CPU so it wasn't getting the next GPU operation started promptly. There's no way to tell whether it was protracted sluggishness or a period of zero progress, of course. Whatever the cause, the task took about 3 times as long as usual for similar tasks, which is disturbing but didn't approach the ~10 times longer which would have risked a -177 error. (The AMD hang on stock CPU apps appears to be permanent unless the user takes action; otherwise it will always reach the time limit.)
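To put rough numbers on that margin, here is a small sketch. The function name `slowdown_margin` and the example runtimes are hypothetical; only the roughly 3x observed slowdown and the ~10x point at which a -177 error becomes a risk come from the discussion above.

```python
def slowdown_margin(observed_s, typical_s, limit_factor=10.0):
    """Return (slowdown ratio, fraction of the time limit used).

    limit_factor is the ~10x multiple of a typical runtime quoted above
    as the point where a -177 error becomes a risk.
    """
    if typical_s <= 0:
        raise ValueError("typical_s must be positive")
    ratio = observed_s / typical_s
    return ratio, ratio / limit_factor


# Hypothetical runtimes in seconds: a task that took 3x its usual time.
ratio, used = slowdown_margin(observed_s=3 * 6000, typical_s=6000)
print(f"slowdown {ratio:.1f}x, {used:.0%} of the way to the time limit")
```

A 3x slowdown uses about 30% of the allowance, so a task like this one finishes with room to spare, whereas a permanent hang would always run into the limit.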
Watching for any similar cases is of course called for. At this point, trying to make a special debug build without having even a vague theory of possible causes seems impractical.
Joe
Jason G:
--- Quote from: Josef W. Segur on 02 Sep 2010, 02:14:42 pm ---...
Watching for any similar cases is of course called for. At this point, trying to make a special debug build without having even a vague theory of possible causes seems impractical.
Joe
--- End quote ---
Thanks for the breakdown, I agree those cffts don't look particularly hardcore, and didn't see anything unusual in execution here.
Perhaps if caught in the act, it would warrant grabbing a HiJackThis! log or similar to look for interfering processes.
perryjay:
You're right Joe, I've finished a few of the same angle range and even very similar WU name. This one caught my eye because of the 5 hour run time. It was the first I'd seen that didn't -1 error out on me. The rest of them so far have been well within normal runtimes.
If you gentlemen are through, someone in another thread is saying something about the upload folder is full. So, I guess I should delete my stuff.