If at first you don't succeed try try again!!!
...Got a breakdown, Joe?
AR=0.39430364685758, First limit=30, Second limit=100 [ChirpRes 0.1665]

FFTLen  Stepsize  NumCfft    Spikes   Gaussians      Pulses    Triplets  PoTlen
     8  7.463718       27   3538944         189        1365        2835   16621
    16  3.731859       53   3473408         795        6075       11925    8310
    32  1.865929      107   3506176        3317       24645       49755    4155
    64  0.932965      215   3522560       13545      101115      203175    2078
   128  0.466482      429   3514368       54483      409575      817245    1039
   256  0.233241      857   3510272      218535     1640925     3278025     519
   512  0.116621     1715   3512320      876365     6568905    13145475     260
  1024  0.058310     3429   3511296     3507867    26316675    52618005     130
  2048  0.029155     6859   3511808    14040373   105287445   210605595      65
  4096  0.014578    13719   3512064    56179305   421314075   842689575      32
  8192  0.007289    27439   3512192   224752849  1685584935  3371292735      16
 16384  0.003644    54879   3512256   899082657           0           0       8
 32768  0.014788    13525    432800           0           0           0       4
 65536  0.003697    16229    259664           0           0           0       2
131072  0.000924    64917    519336           0           0           0       1
                  -------  --------  ----------  ----------  ----------
Totals             204399  43349464  1198730280  2247255735   199747049
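For anyone wanting to cross-check the Totals line of the breakdown above, here is a small Python sketch that re-adds the columns. The row values are transcribed from the posted figures; the names `ROWS`, `REPORTED` and `column_sums` are just made up for this check. It also prints each sum reduced modulo 2^32, since the Triplets total only matches that way. Reading that as a 32-bit counter wrapping around is my inference from the arithmetic, not something the tool output states.

```python
# Per-FFT-length counts transcribed from the breakdown:
# (FFTLen, NumCfft, Spikes, Gaussians, Pulses, Triplets)
ROWS = [
    (8, 27, 3538944, 189, 1365, 2835),
    (16, 53, 3473408, 795, 6075, 11925),
    (32, 107, 3506176, 3317, 24645, 49755),
    (64, 215, 3522560, 13545, 101115, 203175),
    (128, 429, 3514368, 54483, 409575, 817245),
    (256, 857, 3510272, 218535, 1640925, 3278025),
    (512, 1715, 3512320, 876365, 6568905, 13145475),
    (1024, 3429, 3511296, 3507867, 26316675, 52618005),
    (2048, 6859, 3511808, 14040373, 105287445, 210605595),
    (4096, 13719, 3512064, 56179305, 421314075, 842689575),
    (8192, 27439, 3512192, 224752849, 1685584935, 3371292735),
    (16384, 54879, 3512256, 899082657, 0, 0),
    (32768, 13525, 432800, 0, 0, 0),
    (65536, 16229, 259664, 0, 0, 0),
    (131072, 64917, 519336, 0, 0, 0),
]

COLUMNS = ["NumCfft", "Spikes", "Gaussians", "Pulses", "Triplets"]

# Totals exactly as printed on the Totals line.
REPORTED = {
    "NumCfft": 204399,
    "Spikes": 43349464,
    "Gaussians": 1198730280,
    "Pulses": 2247255735,
    "Triplets": 199747049,
}


def column_sums(rows):
    """Return exact (arbitrary-precision) sums of the five count columns."""
    sums = dict.fromkeys(COLUMNS, 0)
    for row in rows:
        # row[0] is FFTLen; the counts start at index 1.
        for name, value in zip(COLUMNS, row[1:]):
            sums[name] += value
    return sums


sums = column_sums(ROWS)
for name in COLUMNS:
    exact = sums[name]
    wrapped = exact % 2**32  # what an unsigned 32-bit accumulator would hold
    print(f"{name:10s} exact={exact:>11d} mod2^32={wrapped:>11d} "
          f"reported={REPORTED[name]:>11d}")
```

The first four totals agree with the exact sums. The Triplets column adds up to more than 2^32, and only its wrapped value equals the printed total, so that one figure is best treated as an undercount.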
The 9500GT's lower multiprocessor count, about half that of my old 9600GSO, would mean that long PulsePoTs at fftLength 4096 and under get their pulsefind kernel execution split more often to fit the hardware. That would naturally explain the longer runtime of these tasks on lower classes of GPU, while on higher-end GPUs they run about the same as other midrange tasks. In addition, I did move execution of those kernels to a non-default stream (i.e. not stream 0) and tampered with the kernel launch geometry somewhat. That could explain why it runs to completion on x32f but suffers timeouts and driver crashes under stock.
Jason
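To put rough numbers on the splitting idea, here is a toy Python sketch. None of it is taken from the x32f or stock source: the function name `launches_needed`, the per-multiprocessor capacity, the work-item count and the two multiprocessor counts are all illustrative assumptions. It only shows that with a fixed amount of pulsefind work, halving the multiprocessor count doubles the number of launches the work has to be split into.

```python
import math


def launches_needed(work_items, mp_count, items_per_mp):
    """Number of kernel launches needed if a single launch can cover at most
    mp_count * items_per_mp work items (a hypothetical capacity model)."""
    capacity = mp_count * items_per_mp
    return math.ceil(work_items / capacity)


# Illustrative values only, not measured from any real card or build.
WORK_ITEMS = 4096     # assumed amount of pulsefind work at one FFT length
ITEMS_PER_MP = 256    # assumed work one multiprocessor handles per launch

results = {}
for label, mp_count in (("smaller GPU", 4), ("larger GPU", 8)):
    results[label] = launches_needed(WORK_ITEMS, mp_count, ITEMS_PER_MP)
    print(f"{label}: {mp_count} multiprocessors -> "
          f"{results[label]} launches")
```

More launches mean more per-launch overhead on the smaller card, which fits the longer runtimes described above. Each individual launch is also shorter, which is one plausible reason a split build would be less likely to hit a driver timeout.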
...Watching for any similar cases is of course called for; at this point, trying to make a special debug build without having even a vague theory of possible causes seems impractical.
Joe
Looks like I'll have to go through for a trim session this weekend