Seti@Home optimized science apps and information

Optimized Seti@Home apps => Windows => GPU crunching => Topic started by: Raistmer on 27 Dec 2008, 07:56:59 pm

Title: WUs that CUDA MB can't do correctly
Post by: Raistmer on 27 Dec 2008, 07:56:59 pm
The WU, its results, the test log, and rescmpv2 (for comparison) are attached.


[attachment deleted by admin]
Title: Re: WUs that CUDA MB can't do correctly
Post by: Jason G on 28 Dec 2008, 05:57:29 am
Here's that WU run with AKv8b SSE4.1 and stock CUDA 6.05.

Same results as your AKv8 SSE3 vs. 6.06:

Quote
------------
Running app : AK_v8b_win_SSE41.exe with -verb -nog
with WU     : 03dc08ad.15767.890.15.8.213.wu
Started at  : 20:20:54.402
Ended at    : 20:56:35.543
   2141.047 secs Elapsed
   2128.016 secs CPU time
Result      : stored as ref for validation.
------------
Running app : setiathome_6.05_windows_intelx86__cuda.exe with -verb -st
with WU     : 03dc08ad.15767.890.15.8.213.wu
Started at  : 20:56:35.590
Ended at    : 21:22:04.605
   1528.969 secs Elapsed
     95.953 secs CPU time
Speedup     : 95.49%
Ratio       : 22.18 x
                ----- R1:R2 ------     ----- R2:R1 ------
                Good    Bad   Ugly     Good    Bad   Ugly
        Spike      2      0      0        2      0      0
     Gaussian      2      0      0        2      0      0
        Pulse      1      0      0        1      0      0
      Triplet      1      0      1        1      0      0
   Best Spike      1      0      0        1      0      0
Best Gaussian      1      0      0        1      0      0
   Best Pulse      1      0      0        1      0      0
 Best Triplet      0      0      1        0      0      1
                ----   ----   ----     ----   ----   ----
                   9      0      2        9      0      1

Result      : Weakly similar.

Bench file attached.  Ignore that I broke some Init_data.xml values while experimenting with something else ... no effect on (the lack of) validity of the result.

Jason
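
Editor's note: the Speedup and Ratio lines in the bench output above appear to be derived from the two CPU times rather than from the elapsed times. That is inferred from the numbers in this log, not confirmed from the bench script itself. A minimal Python sketch (the function name is made up for illustration) that reproduces the 95.49% and 22.18 x figures under that assumption:

```python
def speedup_and_ratio(ref_cpu_secs, test_cpu_secs):
    """Return (speedup in percent, ratio) of a test run against a reference run.

    Assumption: both figures are based on CPU time, which is what the
    numbers in the log above are consistent with.
    """
    ratio = ref_cpu_secs / test_cpu_secs
    speedup_pct = (1.0 - test_cpu_secs / ref_cpu_secs) * 100.0
    return speedup_pct, ratio


# CPU times of the AK_v8b SSE4.1 run and the stock CUDA 6.05 run above.
speedup_pct, ratio = speedup_and_ratio(2128.016, 95.953)
print(f"Speedup : {speedup_pct:.2f}%")
print(f"Ratio   : {ratio:.2f} x")
```

By wall clock the picture is much less dramatic: 2141.047 s against 1528.969 s is only about 1.4 times faster, because the CUDA app spends most of its elapsed time waiting on the GPU rather than using the CPU.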


[attachment deleted by admin]
Title: Re: WUs that CUDA MB can't do correctly
Post by: Raistmer on 28 Dec 2008, 06:01:16 am
Thanks!
I will look for other reproducible failures and collect them here.
Title: Re: WUs that CUDA MB can't do correctly
Post by: Raistmer on 28 Dec 2008, 08:27:03 am
Now a VHAR WU:

The online and standalone results for the CUDA app are strongly similar again, and both are invalid versus the CPU AK8 SSSE3x app result.
The data and the full log are attached.

Log excerpt:

------------
MB_6.06r380mod_CUDA.exe -verb -st / 03no08aa.5874.273823.14.11.250.wu :
Started at  : 15:49:57.593
Ended at    : 15:50:15.392
     17.737 secs Elapsed
     15.054 secs CPU time
Speedup     : 97.85%
Ratio       : 46.46 x
 
                ----- R1:R2 ------     ----- R2:R1 ------
                Good    Bad   Ugly     Good    Bad   Ugly
        Spike      0      0      1        0      0      0
     Gaussian      0      0      0        0      0      0
        Pulse      0      0      0        0      0      0
      Triplet      0      0      2        0      0     31
   Best Spike      0      0      1        0      0      0
Best Gaussian      0      0      1        0      0      0
   Best Pulse      0      0      1        0      0      0
 Best Triplet      0      0      1        0      0      0
                ----   ----   ----     ----   ----   ----
                   0      0      7        0      0     31

Result      : Different.
[ stderr ]
Can't set up shared mem: -1
Will run in standalone mode.
setiathome_CUDA: Found 1 CUDA device(s):
   Device 1 : GeForce 9600 GSO
           totalGlobalMem = 402653184
           sharedMemPerBlock = 16384
           regsPerBlock = 8192
           warpSize = 32
           memPitch = 262144
           maxThreadsPerBlock = 512
           clockRate = 1700000
           totalConstMem = 65536
           major = 1
           minor = 1
           textureAlignment = 256
           deviceOverlap = 0
           multiProcessorCount = 12
setiathome_CUDA: No device specified, determined to use CUDA device 1: GeForce 9600 GSO
SETI@home using CUDA accelerated device GeForce 9600 GSO
Rise priority modification by Raistmer based on rev380 of SETI@home sources
Priority of worker thread rised successfully
setiathome_enhanced 6.02 Visual Studio/Microsoft C++
libboinc: 6.3.22

Work Unit Info:
...............
WU true angle range is :  14.146648

[attachment deleted by admin]
Title: Re: WUs that CUDA MB can't do correctly
Post by: Raistmer on 28 Dec 2008, 10:04:27 am
A VLAR WU that also produces many CUDA errors in standalone mode:

AK_v8_win_SSSE3x.exe -verb -st / 15no08ac.10856.20256.16.8.135.wu :
Started at  : 16:34:50.293
Ended at    : 17:38:32.699
   3822.390 secs Elapsed
   3820.168 secs CPU time
 
[ stderr ]
Can't set up shared mem: -1
Will run in standalone mode.
Windows optimized S@H Enhanced application by Alex Kan
Version info: SSSE3x (Intel, Core 2-optimized v8-nographics) V5.13 by Alex Kan
SSSE3x Win32 Build 41 , Ported by : Jason G, Raistmer, JDWhale

     CPUID: Intel(R) Core(TM)2 Quad  CPU   Q9450  @ 2.66GHz
     Speed: 4 x 2655 MHz
     Cache: L1=64K L2=6144K
  Features: MMX SSE SSE2 SSE3 SSSE3
 
Work Unit Info:
...............
Credit multiplier is :  2.85
WU true angle range is :  0.013497

Flopcounter: 19390751501029.980000

Spike count:    4
Pulse count:    2
Triplet count:  0
Gaussian count: 0
called boinc_finish
[ /stderr ]
------------
MB_6.06r380mod_CUDA.exe -verb -st / 15no08ac.10856.20256.16.8.135.wu :
Started at  : 17:38:32.745
Ended at    : 17:38:53.462
     20.686 secs Elapsed
     15.163 secs CPU time
Speedup     : 99.60%
Ratio       : 251.94 x
 
                ----- R1:R2 ------     ----- R2:R1 ------
                Good    Bad   Ugly     Good    Bad   Ugly
        Spike      0      0      4        0      0     30
     Gaussian      0      0      0        0      0      0
        Pulse      0      0      2        0      0      0
      Triplet      0      0      0        0      0      0
   Best Spike      0      0      1        0      0      0
Best Gaussian      0      0      1        0      0      0
   Best Pulse      0      0      1        0      0      0
 Best Triplet      0      0      0        0      0      0
                ----   ----   ----     ----   ----   ----
                   0      0      9        0      0     30

Result      : Different.
[ stderr ]
Can't set up shared mem: -1
Will run in standalone mode.
setiathome_CUDA: Found 1 CUDA device(s):
   Device 1 : GeForce 9600 GSO
           totalGlobalMem = 402653184
           sharedMemPerBlock = 16384
           regsPerBlock = 8192
           warpSize = 32
           memPitch = 262144
           maxThreadsPerBlock = 512
           clockRate = 1700000
           totalConstMem = 65536
           major = 1
           minor = 1
           textureAlignment = 256
           deviceOverlap = 0
           multiProcessorCount = 12
setiathome_CUDA: No device specified, determined to use CUDA device 1: GeForce 9600 GSO
SETI@home using CUDA accelerated device GeForce 9600 GSO
Rise priority modification by Raistmer based on rev380 of SETI@home sources
Priority of worker thread rised successfully
setiathome_enhanced 6.02 Visual Studio/Microsoft C++
libboinc: 6.3.22

Work Unit Info:
...............
WU true angle range is :  0.013497
Optimal function choices:
-----------------------------------------------------
name               
-----------------------------------------------------
              v_BaseLineSmooth (no other)

            v_GetPowerSpectrum 0.00020 0.00000  test
            v_GetPowerSpectrum 0.00020 0.00000  choice

                   v_ChirpData 0.01300 0.00000  test
                   v_ChirpData 0.01300 0.00000  choice

                   v_Transpose 0.00550 0.00000  test
                  v_Transpose2 0.00492 0.00000  test
                  v_Transpose4 0.00313 0.00000  test
                  v_Transpose8 0.00586 0.00000  test
                  v_Transpose4 0.00313 0.00000  choice

               FPU opt folding 0.00775 0.00000  test
               FPU opt folding 0.00775 0.00000  choice

Cuda error 'find_pulse_kernel2<3, false>' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 1166 : unknown error.
Cuda error 'find_pulse_kernel2<4, true>' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 1172 : unknown error.
Cuda error 'find_pulse_kernel2<4, true>' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 1172 : unknown error.
Cuda error 'find_pulse_kernel2<5, true>' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 1178 : unknown error.
Cuda error 'find_pulse_kernel2<5, true>' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 1178 : unknown error.
Cuda error 'cudaMemcpy(&flags, dev_find_pulse_flag, sizeof(*dev_find_pulse_flag), cudaMemcpyDeviceToHost)' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 1250 : unknown error.
Cuda error 'cudaMemcpy(PulseResults, dev_PulseResults, 4 * (cudaAcc_NumDataPoints / AdvanceBy + 1) * sizeof(*dev_PulseResults), cudaMemcpyDeviceToHost)' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 1262 : unknown error.
Cuda error 'cudaAcc_transpose' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_transpose.cu' in line 74 : unknown error.
Cuda error 'cudaAcc_transpose' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_transpose.cu' in line 74 : unknown error.
Cuda error 'cudaMemcpy(best_PoT, dev_tmp_pot, max_nb_of_elems * sizeof(float), cudaMemcpyDeviceToHost)' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 1265 : unknown error.
Cuda error 'cudaAcc_transpose' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_transpose.cu' in line 74 : unknown error.
Cuda error 'cudaAcc_transpose' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_transpose.cu' in line 74 : unknown error.
Cuda error 'cudaMemcpy(tmp_PoT, dev_tmp_pot, max_nb_of_elems * sizeof(float), cudaMemcpyDeviceToHost)' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 1269 : unknown error.
Cuda error 'cufftExecC2C' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_fft.cu' in line 63 : unknown error.
Cuda error 'cudaAcc_GetPowerSpectrum_kernel' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_PowerSpectrum.cu' in line 56 : unknown error.
Cuda error 'cudaAcc_GetPowerSpectrum_kernel' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_PowerSpectrum.cu' in line 56 : unknown error.
Cuda error 'cudaAcc_summax32_kernel' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_summax.cu' in line 147 : unknown error.
Cuda error 'cudaAcc_summax32_kernel' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_summax.cu' in line 147 : unknown error.
Cuda error 'cudaMemcpy(PowerSpectrumSumMax, dev_PowerSpectrumSumMax, cudaAcc_NumDataPoints / fftlen * sizeof(*dev_PowerSpectrumSumMax), cudaMemcpyDeviceToHost)' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_summax.cu' in line 160 : unknown error.
Cuda error 'find_triplets_kernel' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 224 : unknown error.
Cuda error 'find_triplets_kernel' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 224 : unknown error.
Cuda error 'cudaMemcpy(&flags, dev_flag, sizeof(*dev_flag), cudaMemcpyDeviceToHost)' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 228 : unknown error.
Cuda error 'find_pulse_kernel2<3, false>' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 1166 : unknown error.
Cuda error 'find_pulse_kernel2<3, false>' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 1166 : unknown error.
Cuda error 'find_pulse_kernel2<4, true>' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 1172 : unknown error.
Cuda error 'find_pulse_kernel2<4, true>' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 1172 : unknown error.
Cuda error 'find_pulse_kernel2<5, true>' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 1178 : unknown error.
Cuda error 'find_pulse_kernel2<5, true>' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 1178 : unknown error.
Cuda error 'cudaMemcpy(&flags, dev_find_pulse_flag, sizeof(*dev_find_pulse_flag), cudaMemcpyDeviceToHost)' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 1250 : unknown error.
Cuda error 'cudaMemcpy(PulseResults, dev_PulseResults, 4 * (cudaAcc_NumDataPoints / AdvanceBy + 1) * sizeof(*dev_PulseResults), cudaMemcpyDeviceToHost)' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 1262 : unknown error.
Cuda error 'cudaAcc_transpose' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_transpose.cu' in line 74 : unknown error.
Cuda error 'cudaAcc_transpose' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_transpose.cu' in line 74 : unknown error.
Cuda error 'cudaMemcpy(best_PoT, dev_tmp_pot, max_nb_of_elems * sizeof(float), cudaMemcpyDeviceToHost)' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 1265 : unknown error.
Cuda error 'cudaAcc_transpose' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_transpose.cu' in line 74 : unknown error.
Cuda error 'cudaAcc_transpose' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_transpose.cu' in line 74 : unknown error.
Cuda error 'cudaMemcpy(tmp_PoT, dev_tmp_pot, max_nb_of_elems * sizeof(float), cudaMemcpyDeviceToHost)' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 1269 : unknown error.
Cuda error 'cufftExecC2C' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_fft.cu' in line 63 : unknown error.
Cuda error 'cudaAcc_GetPowerSpectrum_kernel' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_PowerSpectrum.cu' in line 56 : unknown error.
Cuda error 'cudaAcc_GetPowerSpectrum_kernel' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_PowerSpectrum.cu' in line 56 : unknown error.
Cuda error 'cudaAcc_summax32_kernel' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_summax.cu' in line 147 : unknown error.
Cuda error 'cudaAcc_summax32_kernel' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_summax.cu' in line 147 : unknown error.
Cuda error 'cudaMemcpy(PowerSpectrumSumMax, dev_PowerSpectrumSumMax, cudaAcc_NumDataPoints / fftlen * sizeof(*dev_PowerSpectrumSumMax), cudaMemcpyDeviceToHost)' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_summax.cu' in line 160 : unknown error.
SETI@Home Informational message -9 result_overflow
NOTE: The number of results detected exceeds the storage space allocated.

Flopcounter: 20886859697.456421

Spike count:    30
Pulse count:    0
Triplet count:  0
Gaussian count: 0
called boinc_finish
[ /stderr ]
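
Editor's note: with this many repeated lines it is easier to see which call sites fail by counting them. A small Python sketch, assuming every error line follows the `Cuda error '...' in file '...' in line N : ...` layout seen in the stderr above; the function name and the short sample are illustrative only:

```python
import re
from collections import Counter

# Pattern for one "Cuda error" line as it appears in the stderr above.
CUDA_ERROR_RE = re.compile(
    r"^Cuda error '(?P<call>.*)' in file '(?P<path>[^']*)' "
    r"in line (?P<line>\d+) : (?P<message>.*)$"
)


def count_cuda_errors(stderr_text):
    """Count "Cuda error" lines per (source file name, line number)."""
    counts = Counter()
    for raw in stderr_text.splitlines():
        match = CUDA_ERROR_RE.match(raw.strip())
        if match:
            file_name = match.group("path").rsplit("/", 1)[-1]
            counts[(file_name, int(match.group("line")))] += 1
    return counts


sample = """\
Cuda error 'find_pulse_kernel2<3, false>' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 1166 : unknown error.
Cuda error 'cudaAcc_transpose' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_transpose.cu' in line 74 : unknown error.
Cuda error 'cudaAcc_transpose' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_transpose.cu' in line 74 : unknown error.
Spike count:    30
"""

counts = count_cuda_errors(sample)
for (file_name, line_no), n in counts.most_common():
    print(f"{n:3d}  {file_name}:{line_no}")
```

Feeding the whole stderr section to `count_cuda_errors` gives one row per failing call site, which makes it easier to compare the logs of different WUs in this thread.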


[attachment deleted by admin]
Title: Re: WUs that CUDA MB can't do correctly
Post by: Raistmer on 29 Dec 2008, 04:05:34 pm
This is a VHAR WU with AR~14, but it finishes OK and validates against the CPU result in standalone mode. The online result is the same as the standalone one.


[attachment deleted by admin]
Title: Re: WUs that CUDA MB can't do correctly
Post by: Raistmer on 29 Dec 2008, 06:07:42 pm
Best Gaussian differs:

                ----- R1:R2 ------     ----- R2:R1 ------
                Good    Bad   Ugly     Good    Bad   Ugly
        Spike      1      0      0        1      0      0
     Gaussian      4      0      0        4      1      0
        Pulse      0      0      0        0      0      0
      Triplet      2      0      0        2      0      0
   Best Spike      1      0      0        1      0      0
Best Gaussian      0      1      0        0      1      0
   Best Pulse      1      0      0        1      0      0
 Best Triplet      1      0      0        1      0      0
                ----   ----   ----     ----   ----   ----
                  10      1      0       10      2      0

Result      : Weakly similar.
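
Editor's note: the bottom row of each comparison table is the column sum of the eight rows above it. A short Python sketch that parses the rows of the table above and recomputes those totals; the helper names are hypothetical, and nothing here reproduces how rescmpv2 decides between "Strongly similar", "Weakly similar" and "Different":

```python
CATEGORIES = ("Spike", "Gaussian", "Pulse", "Triplet",
              "Best Spike", "Best Gaussian", "Best Pulse", "Best Triplet")


def parse_rows(table_text):
    """Map each category name to its six counts (R1:R2 first, then R2:R1)."""
    rows = {}
    for raw in table_text.splitlines():
        parts = raw.split()
        if len(parts) < 7:
            continue  # header, separator and totals lines have no name field
        name = " ".join(parts[:-6])
        if name in CATEGORIES:
            rows[name] = [int(p) for p in parts[-6:]]
    return rows


def column_totals(rows):
    """Sum the six Good/Bad/Ugly columns over all categories."""
    return [sum(col) for col in zip(*rows.values())]


table = """\
        Spike      1      0      0        1      0      0
     Gaussian      4      0      0        4      1      0
        Pulse      0      0      0        0      0      0
      Triplet      2      0      0        2      0      0
   Best Spike      1      0      0        1      0      0
Best Gaussian      0      1      0        0      1      0
   Best Pulse      1      0      0        1      0      0
 Best Triplet      1      0      0        1      0      0
"""

totals = column_totals(parse_rows(table))
print(totals)
```

The recomputed totals match the `10 1 0 10 2 0` row in the table above.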


[attachment deleted by admin]
Title: Re: WUs that CUDA MB can't do correctly
Post by: Raistmer on 29 Dec 2008, 06:48:12 pm
Another VHAR task, restarted under BOINC, with no overflow.
Standalone testing without a restart gave a result strongly similar to both the online result and the CPU result.


[attachment deleted by admin]
Title: Re: WUs that CUDA MB can't do correctly
Post by: Raistmer on 30 Dec 2008, 11:13:28 am
New type of error:

AR~13 (VHAR):

SETI@home error -12 Unknown error
cudaAcc_find_triplets erroneously found a triplet twice in find_triplets_kernel
File: d:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu
Line: 235

[ /stderr ]

MB_6.06r380mod_CUDA.exe -verb -st / 03no08aa.5874.274232.14.11.84.wu :
Started at  : 18:25:58.354
Ended at    : 18:26:13.252
     14.820 secs Elapsed
     14.009 secs CPU time
Speedup     : 98.05%
Ratio       : 51.21 x
 
                ----- R1:R2 ------     ----- R2:R1 ------
                Good    Bad   Ugly     Good    Bad   Ugly
        Spike      0      0      2        0      0      0
     Gaussian      0      0      0        0      0      0
        Pulse      0      0      0        0      0      0
      Triplet      0      0      5        0      0      0
   Best Spike      0      0      1        0      0      0
Best Gaussian      0      0      1        0      0      0
   Best Pulse      0      0      1        0      0      0
 Best Triplet      0      0      1        0      0      0
                ----   ----   ----     ----   ----   ----
                   0      0     11        0      0      0

Result      : Different.

Online result: http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=5100914



[attachment deleted by admin]
Title: Re: WUs that CUDA MB can't do correctly
Post by: Raistmer on 30 Dec 2008, 11:46:18 am
Invalid overflow on a VHAR WU, AR~13.6

MB_6.06r380mod_CUDA.exe -verb -st / 03no08aa.5874.274232.14.11.89.wu :
Started at  : 19:35:07.752
Ended at    : 19:35:25.239
     17.456 secs Elapsed
     14.976 secs CPU time
Speedup     : 97.92%
Ratio       : 48.09 x
 
                ----- R1:R2 ------     ----- R2:R1 ------
                Good    Bad   Ugly     Good    Bad   Ugly
        Spike      0      0      1        0      0      0
     Gaussian      0      0      0        0      0      0
        Pulse      0      0      0        0      0      0
      Triplet      0      0      2        0      0     31
   Best Spike      0      0      1        0      0      0
Best Gaussian      0      0      1        0      0      0
   Best Pulse      0      0      1        0      0      0
 Best Triplet      0      0      1        0      0      0
                ----   ----   ----     ----   ----   ----
                   0      0      7        0      0     31

Result      : Different.

Online result: http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=5100919

[attachment deleted by admin]
Title: Re: WUs that CUDA MB can't do correctly
Post by: Raistmer on 30 Dec 2008, 12:32:06 pm
VLAR AR~0.15
Errors at CUDA mem copy, invalid results.

AK_v8_win_SSSE3x.exe -verb -st / 23no08ad.15915.22976.9.8.127.wu :
Started at  : 03:07:17.808
Ended at    : 04:05:01.401
   3463.562 secs Elapsed
   3458.355 secs CPU time
 
[ stderr ]
Can't set up shared mem: -1
Will run in standalone mode.
Windows optimized S@H Enhanced application by Alex Kan
Version info: SSSE3x (Intel, Core 2-optimized v8-nographics) V5.13 by Alex Kan
SSSE3x Win32 Build 41 , Ported by : Jason G, Raistmer, JDWhale

     CPUID: Intel(R) Core(TM)2 Quad  CPU   Q9450  @ 2.66GHz
     Speed: 4 x 2655 MHz
     Cache: L1=64K L2=6144K
  Features: MMX SSE SSE2 SSE3 SSSE3
 
Work Unit Info:
...............
Credit multiplier is :  2.85
WU true angle range is :  0.154919

Flopcounter: 27945224690130.227000

Spike count:    2
Pulse count:    5
Triplet count:  0
Gaussian count: 0
called boinc_finish
[ /stderr ]

------------
MB_6.06r380mod_CUDA.exe -verb -st / 23no08ad.15915.22976.9.8.127.wu :
Started at  : 19:48:55.425
Ended at    : 20:17:32.767
   1717.310 secs Elapsed
     70.294 secs CPU time
Speedup     : -57.06%
Ratio       : 0.64 x
 
                ----- R1:R2 ------     ----- R2:R1 ------
                Good    Bad   Ugly     Good    Bad   Ugly
        Spike      2      0      0        2      0      4
     Gaussian      0      0      0        0      0      0
        Pulse      5      0      0        5      0      2
      Triplet      0      0      0        0      0      0
   Best Spike      0      0      1        0      0      1
Best Gaussian      1      0      0        1      0      0
   Best Pulse      0      0      1        0      0      1
 Best Triplet      0      0      0        0      0      0
                ----   ----   ----     ----   ----   ----
                   8      0      2        8      0      8

Result      : Weakly similar.

Cuda error 'cudaMemcpy(best_PoT, dev_tmp_pot, max_nb_of_elems * sizeof(float), cudaMemcpyDeviceToHost)' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 1265 : unknown error.
Cuda error 'cudaMemcpy(best_PoT, dev_tmp_pot, max_nb_of_elems * sizeof(float), cudaMemcpyDeviceToHost)' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 1265 : unknown error.
Cuda error 'cudaMemcpy(best_PoT, dev_tmp_pot, max_nb_of_elems * sizeof(float), cudaMemcpyDeviceToHost)' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 1265 : unknown error.
Cuda error 'cudaMemcpy(tmp_PoT, dev_tmp_pot, max_nb_of_elems * sizeof(float), cudaMemcpyDeviceToHost)' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 1269 : unknown error.
Cuda error 'cudaMemcpy(tmp_PoT, dev_tmp_pot, max_nb_of_elems * sizeof(float), cudaMemcpyDeviceToHost)' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 1269 : unknown error.
Cuda error 'cudaMemcpy(tmp_PoT, dev_tmp_pot, max_nb_of_elems * sizeof(float), cudaMemcpyDeviceToHost)' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 1269 : unknown error.
Cuda error 'cudaMemcpy(tmp_PoT, dev_tmp_pot, max_nb_of_elems * sizeof(float), cudaMemcpyDeviceToHost)' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 1269 : unknown error.
Cuda error 'cudaMemcpy(tmp_PoT, dev_tmp_pot, max_nb_of_elems * sizeof(float), cudaMemcpyDeviceToHost)' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 1269 : unknown error.

Online result:  http://setiathome.berkeley.edu/result.php?resultid=1108406664


[attachment deleted by admin]
Title: Re: WUs that CUDA MB can't do correctly
Post by: Raistmer on 30 Dec 2008, 02:59:03 pm
AR ~0.4
Invalid result:
MB_6.06r380mod_CUDA.exe -verb -st / 03dc08ab.11550.18882.10.8.130.wu :
Started at  : 21:22:41.537
Ended at    : 21:41:03.287
   1101.656 secs Elapsed
    120.339 secs CPU time
Speedup     : 95.83%
Ratio       : 24.00 x
 
                ----- R1:R2 ------     ----- R2:R1 ------
                Good    Bad   Ugly     Good    Bad   Ugly
        Spike      0      0      0        0      0      0
     Gaussian      0      0      0        0      0      0
        Pulse      0      0      1        0      0      0
      Triplet      0      0      7        0      0      0
   Best Spike      0      0      1        0      0      0
Best Gaussian      0      0      1        0      0      0
   Best Pulse      0      0      1        0      0      0
 Best Triplet      0      0      1        0      0      0
                ----   ----   ----     ----   ----   ----
                   0      0     12        0      0      0

Result      : Different.

 with CUDA error:

SETI@home error -12 Unknown error
cudaAcc_find_triplets erroneously found a triplet twice in find_triplets_kernel
File: d:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu
Line: 235

[ /stderr ]



[attachment deleted by admin]
Title: Re: WUs that CUDA MB can't do correctly
Post by: Josef W. Segur on 30 Dec 2008, 05:43:00 pm
Quote from: Raistmer on 30 Dec 2008, 12:32:06 pm
VLAR AR~0.15
Errors at CUDA mem copy, invalid results.
Wall clock execution time greater than for CPU app.

============
AK_v8_win_SSSE3x.exe -verb -st / 23no08ad.15915.22976.9.8.127.wu :
Started at  : 19:48:10.591
Ended at    : 19:48:55.394
     44.772 secs Elapsed
     44.757 secs CPU time
...
No heartbeat from core client for 30 sec - exiting
...

I don't think the timing comparison is meaningful, though the CUDA mem copy errors obviously show a problem with that.
                                                                           Joe
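
Editor's note: the -57.06% / 0.64 x figures in the CUDA report are what you get when the 44.757 s CPU time of the truncated reference run is used as the baseline. A small Python check, assuming Speedup and Ratio are computed from CPU times (inferred from the logs in this thread, not confirmed). The figures for the complete 3458.355 s reference run are recomputed here and are not bench output:

```python
cuda_cpu_secs = 70.294        # CUDA run, CPU time
truncated_ref_secs = 44.757   # CPU app run that exited early (no heartbeat)
full_ref_secs = 3458.355      # CPU app run that went to completion

results = {}
for label, ref in (("truncated", truncated_ref_secs), ("full", full_ref_secs)):
    ratio = ref / cuda_cpu_secs
    speedup_pct = (1.0 - cuda_cpu_secs / ref) * 100.0
    results[label] = (speedup_pct, ratio)
    print(f"{label:9s} reference: speedup {speedup_pct:7.2f}%  ratio {ratio:6.2f} x")
```

So the negative speedup comes from the broken baseline, not from the CUDA app itself, which is why the timing comparison carries no weight while the memory copy errors still do.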
Title: Re: WUs that CUDA MB can't do correctly
Post by: Raistmer on 30 Dec 2008, 07:03:41 pm
Oops, will retest...
(thanks for spotting this early exit)

ADDON: report edited; the correct CPU run is there now
(and a link to the online result was added).
Title: Re: WUs that CUDA MB can't do correctly
Post by: Maik on 04 Jan 2009, 09:05:51 am
AR: 5.324874

App                                       Spike  Pulse  Triplet  Gaussian
MB_6.06r380mod_CUDA                           0      0       31         0
AK_v8_win_SSE41                               9      0        2         0
setiathome_6.06_windows_intelx86__cuda        0      0       31         0

[edit] Found 2 more of them: same AR, nearly the same results ... stock CUDA gives triplet count 31 on both.[/edit]

[attachment deleted by admin]
Title: Re: WUs that CUDA MB can't do correctly
Post by: popandbob on 04 Jan 2009, 02:16:31 pm
A new error for you all to have fun with...

cudaAcc_find_triplets doesn't support more than MAX_TRIPLETS_ABOVE_THRESHOLD numBinsAboveThreshold in find_triplets_kernel

http://setiathome.berkeley.edu/result.php?resultid=1112071921

I didn't do standalone testing because my PC locked up every time I tried to....

[attachment deleted by admin]
Title: Re: WUs that CUDA MB can't do correctly
Post by: Raistmer on 04 Jan 2009, 04:58:07 pm
This task has different online and standalone results for the CUDA app.
The standalone results for my build, stock 6.06, and the CPU app are all strongly similar; the online result gave an overflow.
So tasks do influence each other; it is not just temporal locality of bad tasks.
AR ~2.16


[attachment deleted by admin]
Title: Re: WUs that CUDA MB can't do correctly
Post by: Josef W. Segur on 04 Jan 2009, 05:05:45 pm
Quote from: popandbob on 04 Jan 2009, 02:16:31 pm
A new error for you all to have fun with...

cudaAcc_find_triplets doesn't support more than MAX_TRIPLETS_ABOVE_THRESHOLD numBinsAboveThreshold in find_triplets_kernel

http://setiathome.berkeley.edu/result.php?resultid=1112071921

I didn't do standalone testing because my PC locked up every time I tried to....

And the MAX_TRIPLETS_ABOVE_THRESHOLD constant is 10. I was wondering whether that would show up. It only allows for that many of the power samples to be above threshold in one Power over Time array, and those arrays are up to 128K samples in size. It is handled just like the case where more than one triplet is detected in one PoT array: a flag is set which causes a SETIERROR() call with the Unsupported Function code (12), which is negated by the time BOINC sees it (and because it's only defined in the SETI source, BOINC considers it an unknown error).

It's a good protection from splitter problems like what happened when Enhanced first was released on main and again shortly after we started doing work from multibeam; the triplet threshold was much too low or even negative for some WUs. OTOH, it probably ought to be classed like result_overflow and be reported as a success if that's the purpose of setting the limit that low.
                                                                              Joe
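
Editor's note: the mechanism described above can be sketched in a few lines of Python. This is not the actual CUDA source; `SetiError`, `check_pot_array` and `exit_status_seen_by_boinc` are hypothetical names, and only the limit of 10 and the error code 12 come from the post:

```python
MAX_TRIPLETS_ABOVE_THRESHOLD = 10  # value quoted in the post above
UNSUPPORTED_FUNCTION = 12          # SETI error code named in the post above


class SetiError(Exception):
    """Hypothetical stand-in for the SETIERROR() call in the real source."""

    def __init__(self, code, message):
        super().__init__(message)
        self.code = code


def check_pot_array(power_samples, threshold):
    """Return the indices of samples above threshold in one PoT array.

    Raises SetiError when more samples are above threshold than the
    fixed limit allows for.
    """
    above = [i for i, p in enumerate(power_samples) if p > threshold]
    if len(above) > MAX_TRIPLETS_ABOVE_THRESHOLD:
        raise SetiError(UNSUPPORTED_FUNCTION,
                        "more than MAX_TRIPLETS_ABOVE_THRESHOLD bins above threshold")
    return above


def exit_status_seen_by_boinc(power_samples, threshold):
    """Return 0 on success, or the negated SETI code on error."""
    try:
        check_pot_array(power_samples, threshold)
    except SetiError as err:
        return -err.code
    return 0


quiet = [1.0] * 100 + [20.0] * 3     # 3 samples above threshold
noisy = [1.0] * 100 + [20.0] * 11    # 11 samples above threshold
status_quiet = exit_status_seen_by_boinc(quiet, 10.0)
status_noisy = exit_status_seen_by_boinc(noisy, 10.0)
print(status_quiet, status_noisy)
```

The second array goes one sample over the limit and comes back as -12, the same number that shows up as "SETI@home error -12 Unknown error" in the reports earlier in this thread.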
Title: Re: WUs that CUDA MB can't do correctly
Post by: Raistmer on 04 Jan 2009, 05:11:10 pm
That would all be true if it showed up in the CPU result too. As long as we see this error on the GPU only, I tend to think it's just another CUDA app problem....
Title: Re: WUs that CUDA MB can't do correctly
Post by: Josef W. Segur on 04 Jan 2009, 05:30:41 pm
Quote from: Raistmer on 04 Jan 2009, 05:11:10 pm
That would all be true if it showed up in the CPU result too. As long as we see this error on the GPU only, I tend to think it's just another CUDA app problem....

In this case it looks like a design choice rather than an execution bug. My guess is someone did a mathematical estimation of the most "above threshold" points which should occur in pure random data and allowed for somewhat more. It looks like another case where ideal randomness isn't quite achieved...
                                                                    Joe
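
Editor's note: an estimate of the kind guessed at above could look like the following Python sketch. Everything in it beyond the array size (128K) and the limit (10) is an assumption: samples are taken to be independent, normalized power is taken to be exponentially distributed with mean 1 (as for the power of Gaussian noise), the count above threshold is approximated as Poisson, and the threshold value is hypothetical, not the one the app uses:

```python
import math

N_SAMPLES = 128 * 1024   # largest PoT array size mentioned in this thread
LIMIT = 10               # MAX_TRIPLETS_ABOVE_THRESHOLD


def expected_above(threshold, n=N_SAMPLES):
    """Expected count above threshold, assuming exponential power with mean 1."""
    return n * math.exp(-threshold)


def prob_more_than(limit, mean, terms=200):
    """Poisson tail P(X > limit), summed term by term to avoid cancellation."""
    term = math.exp(-mean)          # P(X = 0)
    tail = 0.0
    for k in range(1, limit + terms + 1):
        term *= mean / k            # P(X = k)
        if k > limit:
            tail += term
    return tail


# Hypothetical threshold chosen so that one sample per array is expected.
threshold = math.log(N_SAMPLES)
mean = expected_above(threshold)
tail = prob_more_than(LIMIT, mean)
print(f"expected above threshold: {mean:.3f}")
print(f"P(more than {LIMIT}): {tail:.3e}")
```

Under these idealized assumptions, going over the limit is extremely unlikely, so hitting it in practice would point to data that is not ideally random or to a threshold that is too low, which is the situation described in the posts above.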
Title: Re: WUs that CUDA MB can't do correctly
Post by: Raistmer on 04 Jan 2009, 06:37:17 pm
Another damaged online result with good standalone ones.
It seems an OS reboot really is required after a driver crash / "snow" screen, although there is no visible memory leak.
AR =0.437128

ADDON: added another such result.

[attachment deleted by admin]
Title: Re: WUs that CUDA MB can't do correctly
Post by: Maik on 04 Jan 2009, 07:22:48 pm
I noticed that not every WU crash leads to a graphics memory leak.
Very often, after my script has killed a stuck task, tasks run normally to 100% without any loss of memory.
It seems that a stuck WU does not indicate a memory leak.
My assumption:
There must be an instruction (sent from the CUDA app) that runs inside the shaders in a never-ending loop.
As you may know, an instruction is given with a specific task to the first shader unit; when that completes, it is passed with the next task to the next shader unit, and so on until the instruction is handled. (OK, that's for graphics calculation; I don't know how CUDA works.)
If there is a fault, maybe the instruction runs from shader unit to shader unit without completing, and that blocks the graphics card from working correctly (snow on screen, jumping pixels, graphics errors, crash of the graphics card).

I had this error once. I tried a driver reset via RivaTuner -> nothing happened. If you have this error, you have to reboot your host ... :(
Title: Re: WUs that CUDA MB can't do correctly
Post by: Raistmer on 04 Jan 2009, 08:02:04 pm
Maybe...
There is still no evidence of a memory leak.... The driver has just crashed and restarted, with "snow" on the screen, so I will try to run the task again to see whether the initial free memory is lower or not.

And here is a "classic" VLAR, AR~0.009 (edited).
It crashes the video driver with a "snow" visual effect on screen.


ADDON:
No memory leak, just the usual amount of free GPU memory.
And many CUDA errors in the log, ending with an overflow.



[attachment deleted by admin]
Title: Re: WUs that CUDA MB can't do correctly
Post by: Maik on 04 Jan 2009, 08:03:49 pm
Will do a standalone test with 23ap08aa.4504.481.10.11.187 on my host ... done

- AK_v8_win_SSE41.exe results -> equivalent to yours (3 Pulse)
- MB_6.06r380mod_CUDA.exe -> the task (in Windows Task Manager) got stuck; after 30 sec I killed it
 '--> no video errors.
Title: Re: WUs that CUDA MB can't do correctly
Post by: Maik on 04 Jan 2009, 09:37:41 pm
Quote from: Maik on 04 Jan 2009, 08:03:49 pm
- MB_6.06r380mod_CUDA.exe -> the task (in Windows Task Manager) got stuck; after 30 sec I killed it
 '--> no video errors.

I was really wrong. After restarting BM I noticed the first CUDA WU wouldn't work (stuck). -> reboot -> now it runs ... reporting back -> done
WU true angle range is :  2.722583 (Triplet count:  2); it ran through normally ... must be a driver crash from the previous task.
I've been warned now. The next WUs I get with AR below 0.00 I'll abort before BM tries to crunch them *grrr  >:(

edit: found one ... AR: 0.0078554334964187 wanna have it? -> attachment ;) (not tested yet!)
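
Editor's note: for sorting the collected results by angle range, the AR can be pulled from the `WU true angle range is :` line that the apps print to stderr. A minimal Python sketch; the function names are made up, and the cutoff is a hypothetical parameter, not a value used by BOINC or by the apps. Because it reads stderr, it works on results that have already run, not as a filter before crunching:

```python
import re

# Matches the line printed in the "Work Unit Info" section of stderr.
AR_RE = re.compile(r"WU true angle range is\s*:\s*([0-9]*\.?[0-9]+)")


def true_angle_range(stderr_text):
    """Return the angle range reported in stderr, or None if it is absent."""
    match = AR_RE.search(stderr_text)
    return float(match.group(1)) if match else None


def below_cutoff(stderr_text, cutoff):
    """True when an AR was reported and it is below the chosen cutoff."""
    ar = true_angle_range(stderr_text)
    return ar is not None and ar < cutoff


vlar_log = "Work Unit Info:\n...............\nWU true angle range is :  0.013497\n"
mid_log = "WU true angle range is :  2.722583 (Triplet count:  2)"

print(true_angle_range(vlar_log), below_cutoff(vlar_log, cutoff=0.2))
print(true_angle_range(mid_log), below_cutoff(mid_log, cutoff=0.2))
```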

[attachment deleted by admin]
Title: Re: WUs that CUDA MB can't do correctly
Post by: Raistmer on 05 Jan 2009, 07:26:01 am
One more VLAR
AR ~0.01
The online result differs from the standalone result, and both differ from the standalone CPU result.
There is an overflow in the online result; the CUDA standalone result has no overflow but contains invalid signals.
The standalone CUDA MB run didn't cause a driver crash but did cause the "snow screen" effect. There is a CUDA error in stderr.



[attachment deleted by admin]
Title: Re: WUs that CUDA MB can't do correctly
Post by: Maik on 05 Jan 2009, 07:52:56 am
Big WU crash with several restarts of the task, access violation errors, runtime debugger ...
application: mem-opt MB_6.06r380mod_CUDA.exe
WU: 01no08aa.9239.55814.9.8.194
WU true angle range is :  0.019967

error:
Cuda error 'cudaMemcpy(best_PoT, dev_tmp_pot, max_nb_of_elems * sizeof(float), cudaMemcpyDeviceToHost)' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 1265 : unknown error.
Cuda error 'cudaMemcpy(best_PoT, dev_tmp_pot, max_nb_of_elems * sizeof(float), cudaMemcpyDeviceToHost)' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 1265 : unknown error.
Cuda error 'cudaMemcpy(best_PoT, dev_tmp_pot, max_nb_of_elems * sizeof(float), cudaMemcpyDeviceToHost)' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 1265 : unknown error.
Cuda error 'cudaMemcpy(best_PoT, dev_tmp_pot, max_nb_of_elems * sizeof(float), cudaMemcpyDeviceToHost)' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 1265 : unknown error.


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x004097A4 read attempt to address 0x0980A820

Engaging BOINC Windows Runtime Debugger...


Doing a standalone run now with the stock app.

Update:
-------------------------
AK_v8_win_SSE41.exe -standalone, Elapsed time: 3715 seconds
 > Spike count:    1
 > Pulse count:    3
 > Triplet count:  0
 > Gaussian count: 0
-------------------------
setiathome_6.06_windows_intelx86__cuda.exe -standalone, Elapsed time: 32 seconds
 > Cuda error 'cudaMemcpy(best_PoT, dev_tmp_pot, max_nb_of_elems * sizeof(float), cudaMemcpyDeviceToHost)' in file 'c:/sw/gpgpu/seti/seti_boinc/client/cuda/cudaAcc_pulsefind.cu' in line 1265 : unknown error.
-------------------------
Results added as an attachment.

[attachment deleted by admin]
Title: Re: WUs that CUDA MB can't do correctly
Post by: Raistmer on 05 Jan 2009, 08:14:20 am
VLAR AR ~0.13
Driver crash in the standalone run (6.06 stock, not my build), overflow in the online result.
Both results differ from the CPU one; unknown CUDA error in stderr.



[attachment deleted by admin]
Title: Re: WUs that CUDA MB can't do correctly
Post by: Jason G on 05 Jan 2009, 08:51:32 am
Quote from: Raistmer on 05 Jan 2009, 08:14:20 am
...unknown CUDA error in stderr....

Oooh, that sounds familiar  :P
Title: Re: WUs that CUDA MB can't do correctly
Post by: Raistmer on 05 Jan 2009, 08:52:54 am
Quote from: Josef W. Segur on 04 Jan 2009, 05:30:41 pm
Quote from: Raistmer on 04 Jan 2009, 05:11:10 pm
That would all be true if it showed up in the CPU result too. As long as we see this error on the GPU only, I tend to think it's just another CUDA app problem....

In this case it looks like a design choice rather than an execution bug. My guess is someone did a mathematical estimation of the most "above threshold" points which should occur in pure random data and allowed for somewhat more. It looks like another case where ideal randomness isn't quite achieved...
                                                                    Joe

I agree that this design flaw may show up at some point in the future, but for now it's just another CUDA bug.
http://setiathome.berkeley.edu/workunit.php?wuid=390436470
The CPU didn't report this overflow (as it should be :) )
Title: Re: WUs that CUDA MB can't do correctly
Post by: Raistmer on 05 Jan 2009, 08:53:28 am
Quote from: Jason G on 05 Jan 2009, 08:51:32 am
Quote from: Raistmer on 05 Jan 2009, 08:14:20 am
...unknown CUDA error in stderr....

Oooh, that sounds familiar  :P

Sure, they used the same CUDA as I did ;)