just installed Unified Installers, v0.37 for Windows

Josef W. Segur:

--- Quote from: perryjay on 01 Sep 2010, 09:38:13 pm ---If at first you don't succeed try try again!!!   ;D
--- End quote ---

Well done, third time's the charm! ;) I have the WU running a standalone test on a system which should take between 12 and 13 hours for that AR. Maybe someone else with CUDA capability could check whether it causes any unusual effects.
Joe

Jason G:
Will be able to load it up tonight for a look.

perryjay:
Just a thought, but could whatever is causing my little 9500GT to hang on these be what's causing stock AMDs to hang? Sure would be great if my problem helped to find a cure for that. I know optimizing AMDs cures it, but if we can find that one wrong piece in the WU, maybe the boys at Berkeley could correct it in the stock WUs.

Jason G:
Have run just now under x32f, both Cuda 3 & 3.1 versions, on the 480 looking for anything unusual.  Nothing immediately obvious yet.   These builds, as usual, have the bench code disabled that causes those rare issues on stock with AMD.  ~8 minutes elapsed, ~1min CPU time.  Pretty normal processing for a Mid Angle range task here.  I don't have stock cuda_fermi on hand at the moment to see if that differs. 

Will see if I can spot anything in the result files, such as lots of closely spaced triplets or something...

[Edit:] Notes:
  -  Your result file is 'Strongly Similar' to both mine
  -  Both detected pulses seem to be at 'fairly short' FFT Lengths, (i.e. Long PulsePoTs) which can run more efficiently on Fermi hardware at this time, but prior gen can choke.  I suspect these Long PulsePoTs could explain up to around 50% increased runtime for this task, maybe more, but would need a chirp/FFT pair breakdown to know for sure.  If correct then it's a 'nasty bastard' task for older/lower capacity cards, but I'm not prepared to rule out something else interfering with the run time on that machine yet.

Got a breakdown, Joe?

The lower multiprocessor count of the 9500GT, about half that of my old 9600GSO, would mean that long PulsePoTs at fftLength 4096 and under split pulsefind kernel execution more often to fit the hardware.  That would naturally explain the longer runtime of these tasks on lower classes of GPU, while they stay the same as other midrange tasks on higher-end GPUs.  In addition, I did move execution of those kernels to a non-default stream (i.e. not stream 0), and tamper with the kernel launch geometry somewhat.  That could explain why it runs to completion on x32f, while suffering timeouts & driver crashes under stock.
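
To make the splitting argument concrete, here is a rough back-of-envelope sketch in Python. It is not the x32f code: the function name, the per-multiprocessor block limit and the total block count are all assumptions picked for illustration. The only figure taken from this thread is the 9500 GT's multiProcessorCount of 4.

```python
import math

def launches_needed(total_blocks, multiprocessor_count, blocks_per_mp=8):
    """Estimate how many kernel launches one pulsefind pass is split into.

    Assumes (hypothetically) that each launch is capped at the number of
    thread blocks the GPU can keep resident at once.
    """
    capacity = multiprocessor_count * blocks_per_mp
    return math.ceil(total_blocks / capacity)

# Hypothetical workload: 256 thread blocks for one long PulsePoT pass.
TOTAL_BLOCKS = 256

for label, mp_count in [("9500 GT (4 multiprocessors)", 4),
                        ("hypothetical card with 8 multiprocessors", 8)]:
    print(label, "->", launches_needed(TOTAL_BLOCKS, mp_count), "launches")
```

Under these assumed numbers, halving the multiprocessor count doubles the number of launches for the same work, which is the kind of effect that could stretch runtime on a smaller card.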

Jason

perryjay:
Just to show, this is the stderr from one completed back on the 21st, also a 0.39 AR and a 21ap10ag.

Oops, not completed, errored out..   ::)

Stderr output

<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
setiathome_CUDA: Found 1 CUDA device(s):
   Device 1 : GeForce 9500 GT
           totalGlobalMem = 1056505856
           sharedMemPerBlock = 16384
           regsPerBlock = 8192
           warpSize = 32
           memPitch = 2147483647
           maxThreadsPerBlock = 512
           clockRate = 1840363
           totalConstMem = 65536
           major = 1
           minor = 1
           textureAlignment = 256
           deviceOverlap = 1
           multiProcessorCount = 4
setiathome_CUDA: CUDA Device 1 specified, checking...
   Device 1: GeForce 9500 GT is okay
SETI@home using CUDA accelerated device GeForce 9500 GT
V12 modification by Raistmer
Priority of worker thread rised successfully
Priority of process adjusted successfully
Total GPU memory 1056505856    free GPU memory 983990272
setiathome_enhanced 6.02 Visual Studio/Microsoft C++

Build features: Non-graphics   CUDA    VLAR autokill enabled    FFTW   USE_SSE   x86   
     CPUID: Pentium(R) Dual-Core  CPU      E5400  @ 2.70GHz

     Cache: L1=64K L2=2048K

CPU features: FPU TSC PAE CMPXCHG8B APIC SYSENTER MTRR CMOV/CCMP MMX FXSAVE/FXRSTOR SSE SSE2 HT SSE3
libboinc: 6.3.22

Work Unit Info:
...............
WU true angle range is :  0.393971
After app init: total GPU memory 1056505856    free GPU memory 983990272
Cuda error 'cufftExecC2C' in file 'd:/BoincSeti_Prog/sinbad_repositories/LunaticsUnited/SETI_CUDA_MB_exp/client/cuda/cudaAcc_fft.cu' in line 143 : the launch timed out and was terminated.
Cuda error 'cudaAcc_GetPowerSpectrum_kernel' in file 'd:/BoincSeti_Prog/sinbad_repositories/LunaticsUnited/SETI_CUDA_MB_exp/client/cuda/cudaAcc_PowerSpectrum.cu' in line 56 : the launch timed out and was terminated.
Cuda error 'cudaAcc_GetPowerSpectrum_kernel' in file 'd:/BoincSeti_Prog/sinbad_repositories/LunaticsUnited/SETI_CUDA_MB_exp/client/cuda/cudaAcc_PowerSpectrum.cu' in line 56 : the launch timed out and was terminated.
Cuda error 'cudaAcc_summax32_kernel' in file 'd:/BoincSeti_Prog/sinbad_repositories/LunaticsUnited/SETI_CUDA_MB_exp/client/cuda/cudaAcc_summax.cu' in line 147 : the launch timed out and was terminated.
Cuda error 'cudaAcc_summax32_kernel' in file 'd:/BoincSeti_Prog/sinbad_repositories/LunaticsUnited/SETI_CUDA_MB_exp/client/cuda/cudaAcc_summax.cu' in line 147 : the launch timed out and was terminated.
Cuda error 'cudaMemcpy(PowerSpectrumSumMax, dev_PowerSpectrumSumMax, cudaAcc_NumDataPoints / fftlen * sizeof(*dev_PowerSpectrumSumMax), cudaMemcpyDeviceToHost)' in file 'd:/BoincSeti_Prog/sinbad_repositories/LunaticsUnited/SETI_CUDA_MB_exp/client/cuda/cudaAcc_summax.cu' in line 160 : the launch timed out and was terminated.

</stderr_txt>
]]>

Hope that helps.
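
For anyone wanting to compare several of these stderr dumps, a small Python sketch like the one below could tally which calls timed out. The regular expression is only an assumption based on the line format seen in this one log, and the function name is made up for the example. The sample text is three of the error lines above, copied verbatim.

```python
import re
from collections import Counter

# Pattern inferred from the error lines in this log; other builds may differ.
PATTERN = re.compile(
    r"Cuda error '(?P<call>.+?)' in file '(?P<file>.+?)' "
    r"in line (?P<line>\d+) : (?P<msg>.+)"
)

def tally_cuda_errors(stderr_text):
    """Count (call name, message) pairs found in a stderr dump."""
    counts = Counter()
    for match in PATTERN.finditer(stderr_text):
        # Keep only the function name, dropping any argument list.
        call = match.group("call").split("(")[0]
        counts[(call, match.group("msg").strip())] += 1
    return counts

sample = """\
Cuda error 'cufftExecC2C' in file 'd:/BoincSeti_Prog/sinbad_repositories/LunaticsUnited/SETI_CUDA_MB_exp/client/cuda/cudaAcc_fft.cu' in line 143 : the launch timed out and was terminated.
Cuda error 'cudaAcc_GetPowerSpectrum_kernel' in file 'd:/BoincSeti_Prog/sinbad_repositories/LunaticsUnited/SETI_CUDA_MB_exp/client/cuda/cudaAcc_PowerSpectrum.cu' in line 56 : the launch timed out and was terminated.
Cuda error 'cudaAcc_GetPowerSpectrum_kernel' in file 'd:/BoincSeti_Prog/sinbad_repositories/LunaticsUnited/SETI_CUDA_MB_exp/client/cuda/cudaAcc_PowerSpectrum.cu' in line 56 : the launch timed out and was terminated.
"""

counts = tally_cuda_errors(sample)
for (call, msg), n in counts.items():
    print(n, call, "-", msg)
```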
