just installed Unified Installers, v0.37 for Windows

Jason G:
Thanks, well, it sort of fits the theory, from what I can tell so far. 'cufftExecC2C' would have been the first kernel executed after a pulsefind on the previous cfft pair, a long one of which crashed the driver, application context, etc. Everything after that was clearly hosed. I reckon we can handle that better in the future.

perryjay:
That's me Jason, low class all the way!!   ;D  Glad I could be of help and give you guys something to play with.

Josef W. Segur:

--- Quote from: Jason G on 02 Sep 2010, 12:29:33 pm ---...
Got a Breakdown Joe ?
--- End quote ---


--- Code: ---AR=0.39430364685758, First limit=30, Second limit=100  [ChirpRes 0.1665]

FFTLen   Stepsize  NumCfft     Spikes  Gaussians     Pulses   Triplets  PoTlen
     8   7.463718       27    3538944        189       1365       2835   16621
    16   3.731859       53    3473408        795       6075      11925    8310
    32   1.865929      107    3506176       3317      24645      49755    4155
    64   0.932965      215    3522560      13545     101115     203175    2078
   128   0.466482      429    3514368      54483     409575     817245    1039
   256   0.233241      857    3510272     218535    1640925    3278025     519
   512   0.116621     1715    3512320     876365    6568905   13145475     260
  1024   0.058310     3429    3511296    3507867   26316675   52618005     130
  2048   0.029155     6859    3511808   14040373  105287445  210605595      65
  4096   0.014578    13719    3512064   56179305  421314075  842689575      32
  8192   0.007289    27439    3512192  224752849 1685584935 3371292735      16
 16384   0.003644    54879    3512256  899082657          0          0       8
 32768   0.014788    13525     432800          0          0          0       4
 65536   0.003697    16229     259664          0          0          0       2
131072   0.000924    64917     519336          0          0          0       1
                  -------- ---------- ---------- ---------- ----------
Totals              204399   43349464 1198730280 2247255735  199747049
--- End code ---
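
As a cross-check on the totals row, here is a minimal Python sketch that re-adds the count columns from the breakdown above. The names (`ROWS`, `column_totals`) are made up for illustration and are not from the tool that produced the table. Four of the totals match a plain sum; the Triplets total matches only modulo 2^32, which would be consistent with a 32-bit counter wrapping, though that is an assumption about the tool.

```python
# Counts copied from the breakdown above:
# (FFTLen, NumCfft, Spikes, Gaussians, Pulses, Triplets)
ROWS = [
    (8, 27, 3538944, 189, 1365, 2835),
    (16, 53, 3473408, 795, 6075, 11925),
    (32, 107, 3506176, 3317, 24645, 49755),
    (64, 215, 3522560, 13545, 101115, 203175),
    (128, 429, 3514368, 54483, 409575, 817245),
    (256, 857, 3510272, 218535, 1640925, 3278025),
    (512, 1715, 3512320, 876365, 6568905, 13145475),
    (1024, 3429, 3511296, 3507867, 26316675, 52618005),
    (2048, 6859, 3511808, 14040373, 105287445, 210605595),
    (4096, 13719, 3512064, 56179305, 421314075, 842689575),
    (8192, 27439, 3512192, 224752849, 1685584935, 3371292735),
    (16384, 54879, 3512256, 899082657, 0, 0),
    (32768, 13525, 432800, 0, 0, 0),
    (65536, 16229, 259664, 0, 0, 0),
    (131072, 64917, 519336, 0, 0, 0),
]

NAMES = ("NumCfft", "Spikes", "Gaussians", "Pulses", "Triplets")


def column_totals(rows):
    """Sum each count column (everything after FFTLen) with exact integers."""
    return {name: sum(row[i + 1] for row in rows) for i, name in enumerate(NAMES)}


totals = column_totals(ROWS)
for name in NAMES:
    print(f"{name:>10}: {totals[name]}")

# Assumption: the printed Triplets total came from a 32-bit unsigned counter.
print("Triplets mod 2**32:", totals["Triplets"] % 2**32)
```

Python integers do not overflow, so the plain sum here is the exact figure.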


--- Quote ---The lower multiprocessor count of the 9500GT, about half that of my old 9600GSO, would see long PulsePoTs at fftLength 4096 and under split pulsefind kernel execution more often to fit the hardware. That would naturally explain the longer runtime of the tasks on lower classes of GPU, while staying the same as other midrange tasks on higher GPUs. In addition, I did move execution of those kernels to a non-default stream (i.e. not stream 0), and tamper with the kernel launch geometry somewhat. That could explain why it runs to completion on x32f, while it suffers timeouts & driver crashes under stock.

Jason
--- End quote ---
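
To picture the splitting Jason describes, here is a rough Python sketch. Everything in it is hypothetical: the function name, the blocks-per-multiprocessor figure, and the multiprocessor counts are illustrative only and are not taken from the application or from any real card's specifications. It only shows that halving the multiprocessor count doubles the number of launches needed for the same amount of work.

```python
import math


def launches_needed(total_blocks: int, multiprocessors: int, blocks_per_mp: int = 8) -> int:
    """Launches required if each launch is capped at what the GPU can hold at once.

    blocks_per_mp is a made-up capacity figure, used only for illustration.
    """
    capacity = multiprocessors * blocks_per_mp
    return math.ceil(total_blocks / capacity)


work = 1024  # hypothetical number of thread blocks for one long PulsePoT pass
for mp_count in (4, 8):  # hypothetical multiprocessor counts, one half the other
    print(mp_count, "multiprocessors:", launches_needed(work, mp_count), "launches")
```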

Perryjay did say the GPU had handled other tasks with similar AR much quicker, and given the way the ALFALFA project observes, I'd expect he even had at least several with AR identical to the full 14 digits supplied in the WU header. So the only place left for something unusual in this WU is the data. For Pulse finding, about the only possibility of a slowdown would be if the best_pulse threshold built up gradually, requiring a lot of data to be sent back from GPU to CPU. For Gaussian fitting the situation is similar: there might have been a gradual buildup requiring much data return to the CPU, and even doing the final ChiSqr checks an unusually large number of times might be implicated.
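
The "gradual buildup" idea can be sketched in a few lines of Python. This assumes, purely for illustration, that every new best score forces one return of data from GPU to CPU; the function name and the sample scores are hypothetical and have nothing to do with the real application code.

```python
def count_best_updates(scores):
    """Count how often a running best is replaced while scanning scores in order."""
    best = float("-inf")
    updates = 0
    for score in scores:
        if score > best:
            best = score
            updates += 1  # assumed: one GPU-to-CPU return per new best
    return updates


early_peak = [9, 1, 2, 3, 4, 5, 6, 7, 8]  # strongest signal seen first
gradual = [1, 2, 3, 4, 5, 6, 7, 8, 9]     # same values, threshold creeps up

print("early peak:", count_best_updates(early_peak))
print("gradual:   ", count_best_updates(gradual))
```

The same set of values costs one update when the strongest comes first and nine when the best creeps up step by step, which is the kind of data-dependent difference being considered here.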

My CPU run actually finished quicker than I'd expected, but that's mostly my not having done many full-length tasks on the test system. The result file is very strongly similar to Perryjay's as expected.

My judgement is that the WU is exonerated; it just happened to be the one being processed when something either caused a GPU slowdown or tied up the CPU so it wasn't getting the next GPU operation started promptly. There's no way to tell if it was protracted sluggishness or a period of zero progress, of course. Whatever the cause, the task took about 3 times as long as usual for similar tasks, which is disturbing but didn't approach the ~10 times longer which would have risked a -177 error. (The AMD hang on stock CPU apps appears to be permanent unless the user takes action; otherwise it will always reach the time limit.)
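
The margin can be put in numbers with a small Python sketch. It takes the 5 hour run time Perryjay reported as the observed figure, derives the typical run time from the "about 3 times" estimate, and uses the ~10 times factor as the limit. The function names are made up, and the factor of 10 is the rough figure from this post, not a value read from any task.

```python
def slowdown(observed_hours: float, typical_hours: float) -> float:
    """How many times longer than usual the task ran."""
    return observed_hours / typical_hours


def risks_time_limit(observed_hours: float, typical_hours: float,
                     limit_factor: float = 10.0) -> bool:
    """True if the run reaches the assumed limit factor (~10x, per the post)."""
    return slowdown(observed_hours, typical_hours) >= limit_factor


observed = 5.0           # hours, as reported for this task
typical = observed / 3   # hours, implied by "about 3 times as long as usual"

print(f"slowdown: {slowdown(observed, typical):.1f}x")
print("at risk of hitting the limit:", risks_time_limit(observed, typical))
print(f"run time that would reach the limit: about {typical * 10.0:.1f} hours")
```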

Watching for any similar cases is of course called for, but at this point trying to make a special debug build without having even a vague theory of possible causes seems impractical.
                                                                                 Joe

Jason G:

--- Quote from: Josef W. Segur on 02 Sep 2010, 02:14:42 pm ---...
Watching for any similar cases is of course called for, at this point trying to make a special debug build without having even a vague theory of possible causes seems impractical.
                                                                                 Joe

--- End quote ---

Thanks for the breakdown. I agree those cffts don't look particularly hardcore, and I didn't see anything unusual in execution here.

Perhaps, if caught in the act, it would warrant grabbing a HiJackThis! log or similar to look for interfering processes.

perryjay:
You're right Joe, I've finished a few of the same angle range and even with very similar WU names. This one caught my eye because of the 5 hour run time. It was the first I'd seen that didn't -1 error out on me. The rest of them so far have been well within normal runtimes.

If you gentlemen are through, someone in another thread is saying something about the upload folder being full. So, I guess I should delete my stuff.
