Author Topic: WUs that CUDA MB can't do correctly  (Read 30354 times)

Offline Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 14349
WUs that CUDA MB can't do correctly
« on: 27 Dec 2008, 07:56:59 pm »
The WU, its results, the test log, and rescmpv2 (for comparison) are attached.

[attachment deleted by admin]

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: WUs that CUDA MB can't do correctly
« Reply #1 on: 28 Dec 2008, 05:57:29 am »
Here's that WU run with AKv8b SSE4.1 and stock CUDA 6.05.

Same results as your AKv8 SSE3 vs 6.06:

Running app : AK_v8b_win_SSE41.exe with -verb -nog
with WU     : 03dc08ad.15767.890.15.8.213.wu
Started at  : 20:20:54.402
Ended at    : 20:56:35.543
   2141.047 secs Elapsed
   2128.016 secs CPU time
Result      : stored as ref for validation.
Running app : setiathome_6.05_windows_intelx86__cuda.exe with -verb -st
with WU     : 03dc08ad.15767.890.15.8.213.wu
Started at  : 20:56:35.590
Ended at    : 21:22:04.605
   1528.969 secs Elapsed
     95.953 secs CPU time
Speedup     : 95.49%
Ratio       : 22.18 x
                ----- R1:R2 ------     ----- R2:R1 ------
                Good    Bad   Ugly     Good    Bad   Ugly
        Spike      2      0      0        2      0      0
     Gaussian      2      0      0        2      0      0
        Pulse      1      0      0        1      0      0
      Triplet      1      0      1        1      0      0
   Best Spike      1      0      0        1      0      0
Best Gaussian      1      0      0        1      0      0
   Best Pulse      1      0      0        1      0      0
 Best Triplet      0      0      1        0      0      1
                ----   ----   ----     ----   ----   ----
                   9      0      2        9      0      1

Result      : Weakly similar.

Bench file attached.  Ignore that I broke some Init_data.xml values while experimenting with something else ... no effect on (the lack of) validity of the result.


[attachment deleted by admin]

Offline Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 14349
Re: WUs that CUDA MB can't do correctly
« Reply #2 on: 28 Dec 2008, 06:01:16 am »
I will look for other reproducible failures and collect them here.

Offline Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 14349
Re: WUs that CUDA MB can't do correctly
« Reply #3 on: 28 Dec 2008, 08:27:03 am »

Online and standalone results for the CUDA app are strongly similar again, and both are invalid versus the CPU AK8 SSSE3x app result.
The data and the full log are attached.

Log excerpt:

MB_6.06r380mod_CUDA.exe -verb -st / 03no08aa.5874.273823.14.11.250.wu :
Started at  : 15:49:57.593
Ended at    : 15:50:15.392
     17.737 secs Elapsed
     15.054 secs CPU time
Speedup     : 97.85%
Ratio       : 46.46 x
                ----- R1:R2 ------     ----- R2:R1 ------
                Good    Bad   Ugly     Good    Bad   Ugly
        Spike      0      0      1        0      0      0
     Gaussian      0      0      0        0      0      0
        Pulse      0      0      0        0      0      0
      Triplet      0      0      2        0      0     31
   Best Spike      0      0      1        0      0      0
Best Gaussian      0      0      1        0      0      0
   Best Pulse      0      0      1        0      0      0
 Best Triplet      0      0      1        0      0      0
                ----   ----   ----     ----   ----   ----
                   0      0      7        0      0     31

Result      : Different.
[ stderr ]
Can't set up shared mem: -1
Will run in standalone mode.
setiathome_CUDA: Found 1 CUDA device(s):
   Device 1 : GeForce 9600 GSO
           totalGlobalMem = 402653184
           sharedMemPerBlock = 16384
           regsPerBlock = 8192
           warpSize = 32
           memPitch = 262144
           maxThreadsPerBlock = 512
           clockRate = 1700000
           totalConstMem = 65536
           major = 1
           minor = 1
           textureAlignment = 256
           deviceOverlap = 0
           multiProcessorCount = 12
setiathome_CUDA: No device specified, determined to use CUDA device 1: GeForce 9600 GSO
SETI@home using CUDA accelerated device GeForce 9600 GSO
Rise priority modification by Raistmer based on rev380 of SETI@home sources
Priority of worker thread rised successfully
setiathome_enhanced 6.02 Visual Studio/Microsoft C++
libboinc: 6.3.22

Work Unit Info:
WU true angle range is :  14.146648

[attachment deleted by admin]

Offline Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 14349
Re: WUs that CUDA MB can't do correctly
« Reply #4 on: 28 Dec 2008, 10:04:27 am »
A VLAR task that also produces plenty of CUDA errors in standalone mode:

AK_v8_win_SSSE3x.exe -verb -st / 15no08ac.10856.20256.16.8.135.wu :
Started at  : 16:34:50.293
Ended at    : 17:38:32.699
   3822.390 secs Elapsed
   3820.168 secs CPU time
[ stderr ]
Can't set up shared mem: -1
Will run in standalone mode.
Windows optimized S@H Enhanced application by Alex Kan
Version info: SSSE3x (Intel, Core 2-optimized v8-nographics) V5.13 by Alex Kan
SSSE3x Win32 Build 41 , Ported by : Jason G, Raistmer, JDWhale

     CPUID: Intel(R) Core(TM)2 Quad  CPU   Q9450  @ 2.66GHz
     Speed: 4 x 2655 MHz
     Cache: L1=64K L2=6144K
  Features: MMX SSE SSE2 SSE3 SSSE3
Work Unit Info:
Credit multiplier is :  2.85
WU true angle range is :  0.013497

Flopcounter: 19390751501029.980000

Spike count:    4
Pulse count:    2
Triplet count:  0
Gaussian count: 0
called boinc_finish
[ /stderr ]
MB_6.06r380mod_CUDA.exe -verb -st / 15no08ac.10856.20256.16.8.135.wu :
Started at  : 17:38:32.745
Ended at    : 17:38:53.462
     20.686 secs Elapsed
     15.163 secs CPU time
Speedup     : 99.60%
Ratio       : 251.94 x
                ----- R1:R2 ------     ----- R2:R1 ------
                Good    Bad   Ugly     Good    Bad   Ugly
        Spike      0      0      4        0      0     30
     Gaussian      0      0      0        0      0      0
        Pulse      0      0      2        0      0      0
      Triplet      0      0      0        0      0      0
   Best Spike      0      0      1        0      0      0
Best Gaussian      0      0      1        0      0      0
   Best Pulse      0      0      1        0      0      0
 Best Triplet      0      0      0        0      0      0
                ----   ----   ----     ----   ----   ----
                   0      0      9        0      0     30

Result      : Different.
[ stderr ]
Can't set up shared mem: -1
Will run in standalone mode.
setiathome_CUDA: Found 1 CUDA device(s):
   Device 1 : GeForce 9600 GSO
           totalGlobalMem = 402653184
           sharedMemPerBlock = 16384
           regsPerBlock = 8192
           warpSize = 32
           memPitch = 262144
           maxThreadsPerBlock = 512
           clockRate = 1700000
           totalConstMem = 65536
           major = 1
           minor = 1
           textureAlignment = 256
           deviceOverlap = 0
           multiProcessorCount = 12
setiathome_CUDA: No device specified, determined to use CUDA device 1: GeForce 9600 GSO
SETI@home using CUDA accelerated device GeForce 9600 GSO
Rise priority modification by Raistmer based on rev380 of SETI@home sources
Priority of worker thread rised successfully
setiathome_enhanced 6.02 Visual Studio/Microsoft C++
libboinc: 6.3.22

Work Unit Info:
WU true angle range is :  0.013497
Optimal function choices:
              v_BaseLineSmooth (no other)

            v_GetPowerSpectrum 0.00020 0.00000  test
            v_GetPowerSpectrum 0.00020 0.00000  choice

                   v_ChirpData 0.01300 0.00000  test
                   v_ChirpData 0.01300 0.00000  choice

                   v_Transpose 0.00550 0.00000  test
                  v_Transpose2 0.00492 0.00000  test
                  v_Transpose4 0.00313 0.00000  test
                  v_Transpose8 0.00586 0.00000  test
                  v_Transpose4 0.00313 0.00000  choice

               FPU opt folding 0.00775 0.00000  test
               FPU opt folding 0.00775 0.00000  choice

Cuda error 'find_pulse_kernel2<3, false>' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 1166 : unknown error.
Cuda error 'find_pulse_kernel2<4, true>' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 1172 : unknown error.
Cuda error 'find_pulse_kernel2<4, true>' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 1172 : unknown error.
Cuda error 'find_pulse_kernel2<5, true>' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 1178 : unknown error.
Cuda error 'find_pulse_kernel2<5, true>' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 1178 : unknown error.
Cuda error 'cudaMemcpy(&flags, dev_find_pulse_flag, sizeof(*dev_find_pulse_flag), cudaMemcpyDeviceToHost)' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 1250 : unknown error.
Cuda error 'cudaMemcpy(PulseResults, dev_PulseResults, 4 * (cudaAcc_NumDataPoints / AdvanceBy + 1) * sizeof(*dev_PulseResults), cudaMemcpyDeviceToHost)' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 1262 : unknown error.
Cuda error 'cudaAcc_transpose' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_transpose.cu' in line 74 : unknown error.
Cuda error 'cudaAcc_transpose' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_transpose.cu' in line 74 : unknown error.
Cuda error 'cudaMemcpy(best_PoT, dev_tmp_pot, max_nb_of_elems * sizeof(float), cudaMemcpyDeviceToHost)' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 1265 : unknown error.
Cuda error 'cudaAcc_transpose' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_transpose.cu' in line 74 : unknown error.
Cuda error 'cudaAcc_transpose' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_transpose.cu' in line 74 : unknown error.
Cuda error 'cudaMemcpy(tmp_PoT, dev_tmp_pot, max_nb_of_elems * sizeof(float), cudaMemcpyDeviceToHost)' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 1269 : unknown error.
Cuda error 'cufftExecC2C' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_fft.cu' in line 63 : unknown error.
Cuda error 'cudaAcc_GetPowerSpectrum_kernel' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_PowerSpectrum.cu' in line 56 : unknown error.
Cuda error 'cudaAcc_GetPowerSpectrum_kernel' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_PowerSpectrum.cu' in line 56 : unknown error.
Cuda error 'cudaAcc_summax32_kernel' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_summax.cu' in line 147 : unknown error.
Cuda error 'cudaAcc_summax32_kernel' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_summax.cu' in line 147 : unknown error.
Cuda error 'cudaMemcpy(PowerSpectrumSumMax, dev_PowerSpectrumSumMax, cudaAcc_NumDataPoints / fftlen * sizeof(*dev_PowerSpectrumSumMax), cudaMemcpyDeviceToHost)' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_summax.cu' in line 160 : unknown error.
Cuda error 'find_triplets_kernel' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 224 : unknown error.
Cuda error 'find_triplets_kernel' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 224 : unknown error.
Cuda error 'cudaMemcpy(&flags, dev_flag, sizeof(*dev_flag), cudaMemcpyDeviceToHost)' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 228 : unknown error.
Cuda error 'find_pulse_kernel2<3, false>' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 1166 : unknown error.
Cuda error 'find_pulse_kernel2<3, false>' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 1166 : unknown error.
Cuda error 'find_pulse_kernel2<4, true>' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 1172 : unknown error.
Cuda error 'find_pulse_kernel2<4, true>' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 1172 : unknown error.
Cuda error 'find_pulse_kernel2<5, true>' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 1178 : unknown error.
Cuda error 'find_pulse_kernel2<5, true>' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 1178 : unknown error.
Cuda error 'cudaMemcpy(&flags, dev_find_pulse_flag, sizeof(*dev_find_pulse_flag), cudaMemcpyDeviceToHost)' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 1250 : unknown error.
Cuda error 'cudaMemcpy(PulseResults, dev_PulseResults, 4 * (cudaAcc_NumDataPoints / AdvanceBy + 1) * sizeof(*dev_PulseResults), cudaMemcpyDeviceToHost)' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 1262 : unknown error.
Cuda error 'cudaAcc_transpose' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_transpose.cu' in line 74 : unknown error.
Cuda error 'cudaAcc_transpose' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_transpose.cu' in line 74 : unknown error.
Cuda error 'cudaMemcpy(best_PoT, dev_tmp_pot, max_nb_of_elems * sizeof(float), cudaMemcpyDeviceToHost)' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 1265 : unknown error.
Cuda error 'cudaAcc_transpose' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_transpose.cu' in line 74 : unknown error.
Cuda error 'cudaAcc_transpose' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_transpose.cu' in line 74 : unknown error.
Cuda error 'cudaMemcpy(tmp_PoT, dev_tmp_pot, max_nb_of_elems * sizeof(float), cudaMemcpyDeviceToHost)' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 1269 : unknown error.
Cuda error 'cufftExecC2C' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_fft.cu' in line 63 : unknown error.
Cuda error 'cudaAcc_GetPowerSpectrum_kernel' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_PowerSpectrum.cu' in line 56 : unknown error.
Cuda error 'cudaAcc_GetPowerSpectrum_kernel' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_PowerSpectrum.cu' in line 56 : unknown error.
Cuda error 'cudaAcc_summax32_kernel' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_summax.cu' in line 147 : unknown error.
Cuda error 'cudaAcc_summax32_kernel' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_summax.cu' in line 147 : unknown error.
Cuda error 'cudaMemcpy(PowerSpectrumSumMax, dev_PowerSpectrumSumMax, cudaAcc_NumDataPoints / fftlen * sizeof(*dev_PowerSpectrumSumMax), cudaMemcpyDeviceToHost)' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_summax.cu' in line 160 : unknown error.
SETI@Home Informational message -9 result_overflow
NOTE: The number of results detected exceeds the storage space allocated.

Flopcounter: 20886859697.456421

Spike count:    30
Pulse count:    0
Triplet count:  0
Gaussian count: 0
called boinc_finish
[ /stderr ]
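
With this many "unknown error" lines it is easier to see where the failure starts from a tally per source location. Below is a minimal Python sketch for that. The regular expression is inferred only from the stderr lines quoted in this post, so treat the pattern as an assumption; `tally_cuda_errors` and `SAMPLE` are hypothetical names used for illustration and are not part of any SETI@home tool.

```python
import re
from collections import Counter

# Line format inferred from the stderr quoted in this thread (an assumption;
# other builds of the app may word these messages differently).
CUDA_ERROR_RE = re.compile(
    r"^Cuda error '(?P<call>.*)' in file '(?P<path>[^']+)' "
    r"in line (?P<line>\d+) : (?P<msg>.*)$"
)


def tally_cuda_errors(stderr_text):
    """Return (Counter keyed by (file name, line), first failing site or None)."""
    counts = Counter()
    first_site = None
    for raw in stderr_text.splitlines():
        match = CUDA_ERROR_RE.match(raw.strip())
        if match is None:
            continue  # not a "Cuda error" line
        file_name = match.group("path").rsplit("/", 1)[-1]
        site = (file_name, int(match.group("line")))
        if first_site is None:
            first_site = site  # earliest reported error
        counts[site] += 1
    return counts, first_site


# A few lines copied from the log above.
SAMPLE = """\
Cuda error 'find_pulse_kernel2<3, false>' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 1166 : unknown error.
Cuda error 'find_pulse_kernel2<4, true>' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 1172 : unknown error.
Cuda error 'cudaAcc_transpose' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_transpose.cu' in line 74 : unknown error.
Cuda error 'cudaAcc_transpose' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_transpose.cu' in line 74 : unknown error.
Cuda error 'cufftExecC2C' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_fft.cu' in line 63 : unknown error.
Spike count:    30
"""

if __name__ == "__main__":
    counts, first_site = tally_cuda_errors(SAMPLE)
    print("first error at:", first_site)
    for (file_name, line_no), n in counts.most_common():
        print(f"{n:3d}  {file_name}:{line_no}")
```

The earliest reported site is usually the most interesting one, since the later lines may just be follow-on failures of the same underlying problem.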

[attachment deleted by admin]

Offline Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 14349
Re: WUs that CUDA MB can't do correctly
« Reply #5 on: 29 Dec 2008, 04:05:34 pm »
This is a VHAR task with AR~14, but it finishes OK and validates against the CPU result in standalone mode. The online result is the same as the standalone one.

[attachment deleted by admin]

Offline Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 14349
Re: WUs that CUDA MB can't do correctly
« Reply #6 on: 29 Dec 2008, 06:07:42 pm »
Best Gaussian differs:

                ----- R1:R2 ------     ----- R2:R1 ------
                Good    Bad   Ugly     Good    Bad   Ugly
        Spike      1      0      0        1      0      0
     Gaussian      4      0      0        4      1      0
        Pulse      0      0      0        0      0      0
      Triplet      2      0      0        2      0      0
   Best Spike      1      0      0        1      0      0
Best Gaussian      0      1      0        0      1      0
   Best Pulse      1      0      0        1      0      0
 Best Triplet      1      0      0        1      0      0
                ----   ----   ----     ----   ----   ----
                  10      1      0       10      2      0

Result      : Weakly similar.
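
For anyone collecting many of these comparison tables, here is a small Python sketch that reads the Good/Bad/Ugly rows and checks them against the totals row. It only assumes the column layout visible in the table above; `parse_table` and `column_sums` are hypothetical helper names, and nothing here reproduces how rescmpv2 decides between "Strongly similar", "Weakly similar" and "Different".

```python
import re

# A signal row: a label (may contain one space) followed by six integers.
ROW_RE = re.compile(
    r"^\s*([A-Za-z][A-Za-z ]*?)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$"
)
# The totals row: six integers and no label.
TOTAL_RE = re.compile(r"^\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$")


def parse_table(text):
    """Return ({label: six counts}, totals tuple or None)."""
    rows = {}
    totals = None
    for line in text.splitlines():
        match = ROW_RE.match(line)
        if match:
            rows[match.group(1)] = tuple(int(x) for x in match.groups()[1:])
            continue
        match = TOTAL_RE.match(line)
        if match:
            totals = tuple(int(x) for x in match.groups())
    return rows, totals


def column_sums(rows):
    """Sum each of the six columns over all signal rows."""
    return tuple(sum(column) for column in zip(*rows.values()))


# The table from this post, copied as-is.
TABLE = """\
                ----- R1:R2 ------     ----- R2:R1 ------
                Good    Bad   Ugly     Good    Bad   Ugly
        Spike      1      0      0        1      0      0
     Gaussian      4      0      0        4      1      0
        Pulse      0      0      0        0      0      0
      Triplet      2      0      0        2      0      0
   Best Spike      1      0      0        1      0      0
Best Gaussian      0      1      0        0      1      0
   Best Pulse      1      0      0        1      0      0
 Best Triplet      1      0      0        1      0      0
                ----   ----   ----     ----   ----   ----
                  10      1      0       10      2      0
"""

if __name__ == "__main__":
    rows, totals = parse_table(TABLE)
    print("Best Gaussian:", rows["Best Gaussian"])
    print("sums match totals:", column_sums(rows) == totals)
```

Column order is R1:R2 Good, Bad, Ugly, then R2:R1 Good, Bad, Ugly, matching the header of the table.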

[attachment deleted by admin]

Offline Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 14349
Re: WUs that CUDA MB can't do correctly
« Reply #7 on: 29 Dec 2008, 06:48:12 pm »
Another VHAR task, restarted under BOINC, with no overflow.
Standalone testing without a restart gave a result strongly similar to both the online result and the CPU result.

[attachment deleted by admin]

Offline Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 14349
Re: WUs that CUDA MB can't do correctly
« Reply #8 on: 30 Dec 2008, 11:13:28 am »
New type of error:

AR~13 (VHAR):

SETI@home error -12 Unknown error
cudaAcc_find_triplets erroneously found a triplet twice in find_triplets_kernel
File: d:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu
Line: 235

[ /stderr ]

MB_6.06r380mod_CUDA.exe -verb -st / 03no08aa.5874.274232.14.11.84.wu :
Started at  : 18:25:58.354
Ended at    : 18:26:13.252
     14.820 secs Elapsed
     14.009 secs CPU time
Speedup     : 98.05%
Ratio       : 51.21 x
                ----- R1:R2 ------     ----- R2:R1 ------
                Good    Bad   Ugly     Good    Bad   Ugly
        Spike      0      0      2        0      0      0
     Gaussian      0      0      0        0      0      0
        Pulse      0      0      0        0      0      0
      Triplet      0      0      5        0      0      0
   Best Spike      0      0      1        0      0      0
Best Gaussian      0      0      1        0      0      0
   Best Pulse      0      0      1        0      0      0
 Best Triplet      0      0      1        0      0      0
                ----   ----   ----     ----   ----   ----
                   0      0     11        0      0      0

Result      : Different.

Online result: http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=5100914

[attachment deleted by admin]

Offline Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 14349
Re: WUs that CUDA MB can't do correctly
« Reply #9 on: 30 Dec 2008, 11:46:18 am »
Invalid overflow on a VHAR task, AR~13.6

MB_6.06r380mod_CUDA.exe -verb -st / 03no08aa.5874.274232.14.11.89.wu :
Started at  : 19:35:07.752
Ended at    : 19:35:25.239
     17.456 secs Elapsed
     14.976 secs CPU time
Speedup     : 97.92%
Ratio       : 48.09 x
                ----- R1:R2 ------     ----- R2:R1 ------
                Good    Bad   Ugly     Good    Bad   Ugly
        Spike      0      0      1        0      0      0
     Gaussian      0      0      0        0      0      0
        Pulse      0      0      0        0      0      0
      Triplet      0      0      2        0      0     31
   Best Spike      0      0      1        0      0      0
Best Gaussian      0      0      1        0      0      0
   Best Pulse      0      0      1        0      0      0
 Best Triplet      0      0      1        0      0      0
                ----   ----   ----     ----   ----   ----
                   0      0      7        0      0     31

Result      : Different.

Online result: http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=5100919

[attachment deleted by admin]

Offline Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 14349
Re: WUs that CUDA MB can't do correctly
« Reply #10 on: 30 Dec 2008, 12:32:06 pm »
VLAR AR~0.15
Errors at CUDA mem copy, invalid results.

AK_v8_win_SSSE3x.exe -verb -st / 23no08ad.15915.22976.9.8.127.wu :
Started at  : 03:07:17.808
Ended at    : 04:05:01.401
   3463.562 secs Elapsed
   3458.355 secs CPU time
[ stderr ]
Can't set up shared mem: -1
Will run in standalone mode.
Windows optimized S@H Enhanced application by Alex Kan
Version info: SSSE3x (Intel, Core 2-optimized v8-nographics) V5.13 by Alex Kan
SSSE3x Win32 Build 41 , Ported by : Jason G, Raistmer, JDWhale

     CPUID: Intel(R) Core(TM)2 Quad  CPU   Q9450  @ 2.66GHz
     Speed: 4 x 2655 MHz
     Cache: L1=64K L2=6144K
  Features: MMX SSE SSE2 SSE3 SSSE3
Work Unit Info:
Credit multiplier is :  2.85
WU true angle range is :  0.154919

Flopcounter: 27945224690130.227000

Spike count:    2
Pulse count:    5
Triplet count:  0
Gaussian count: 0
called boinc_finish
[ /stderr ]

MB_6.06r380mod_CUDA.exe -verb -st / 23no08ad.15915.22976.9.8.127.wu :
Started at  : 19:48:55.425
Ended at    : 20:17:32.767
   1717.310 secs Elapsed
     70.294 secs CPU time
Speedup     : -57.06%
Ratio       : 0.64 x
                ----- R1:R2 ------     ----- R2:R1 ------
                Good    Bad   Ugly     Good    Bad   Ugly
        Spike      2      0      0        2      0      4
     Gaussian      0      0      0        0      0      0
        Pulse      5      0      0        5      0      2
      Triplet      0      0      0        0      0      0
   Best Spike      0      0      1        0      0      1
Best Gaussian      1      0      0        1      0      0
   Best Pulse      0      0      1        0      0      1
 Best Triplet      0      0      0        0      0      0
                ----   ----   ----     ----   ----   ----
                   8      0      2        8      0      8

Result      : Weakly similar.

Cuda error 'cudaMemcpy(best_PoT, dev_tmp_pot, max_nb_of_elems * sizeof(float), cudaMemcpyDeviceToHost)' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 1265 : unknown error.
Cuda error 'cudaMemcpy(best_PoT, dev_tmp_pot, max_nb_of_elems * sizeof(float), cudaMemcpyDeviceToHost)' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 1265 : unknown error.
Cuda error 'cudaMemcpy(best_PoT, dev_tmp_pot, max_nb_of_elems * sizeof(float), cudaMemcpyDeviceToHost)' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 1265 : unknown error.
Cuda error 'cudaMemcpy(tmp_PoT, dev_tmp_pot, max_nb_of_elems * sizeof(float), cudaMemcpyDeviceToHost)' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 1269 : unknown error.
Cuda error 'cudaMemcpy(tmp_PoT, dev_tmp_pot, max_nb_of_elems * sizeof(float), cudaMemcpyDeviceToHost)' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 1269 : unknown error.
Cuda error 'cudaMemcpy(tmp_PoT, dev_tmp_pot, max_nb_of_elems * sizeof(float), cudaMemcpyDeviceToHost)' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 1269 : unknown error.
Cuda error 'cudaMemcpy(tmp_PoT, dev_tmp_pot, max_nb_of_elems * sizeof(float), cudaMemcpyDeviceToHost)' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 1269 : unknown error.
Cuda error 'cudaMemcpy(tmp_PoT, dev_tmp_pot, max_nb_of_elems * sizeof(float), cudaMemcpyDeviceToHost)' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 1269 : unknown error.

Online result:  http://setiathome.berkeley.edu/result.php?resultid=1108406664

[attachment deleted by admin]
« Last Edit: 31 Dec 2008, 06:09:28 am by Raistmer »

Offline Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 14349
Re: WUs that CUDA MB can't do correctly
« Reply #11 on: 30 Dec 2008, 02:59:03 pm »
AR ~0.4
Invalid result:
MB_6.06r380mod_CUDA.exe -verb -st / 03dc08ab.11550.18882.10.8.130.wu :
Started at  : 21:22:41.537
Ended at    : 21:41:03.287
   1101.656 secs Elapsed
    120.339 secs CPU time
Speedup     : 95.83%
Ratio       : 24.00 x
                ----- R1:R2 ------     ----- R2:R1 ------
                Good    Bad   Ugly     Good    Bad   Ugly
        Spike      0      0      0        0      0      0
     Gaussian      0      0      0        0      0      0
        Pulse      0      0      1        0      0      0
      Triplet      0      0      7        0      0      0
   Best Spike      0      0      1        0      0      0
Best Gaussian      0      0      1        0      0      0
   Best Pulse      0      0      1        0      0      0
 Best Triplet      0      0      1        0      0      0
                ----   ----   ----     ----   ----   ----
                   0      0     12        0      0      0

Result      : Different.

With CUDA error:

SETI@home error -12 Unknown error
cudaAcc_find_triplets erroneously found a triplet twice in find_triplets_kernel
File: d:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu
Line: 235

[ /stderr ]

[attachment deleted by admin]

Offline Josef W. Segur

  • Janitor o' the Board
  • Knight who says 'Ni!'
  • *****
  • Posts: 3112
Re: WUs that CUDA MB can't do correctly
« Reply #12 on: 30 Dec 2008, 05:43:00 pm »
VLAR AR~0.15
Errors at CUDA mem copy, invalid results.
Wall clock execution time greater than for CPU app.

AK_v8_win_SSSE3x.exe -verb -st / 23no08ad.15915.22976.9.8.127.wu :
Started at  : 19:48:10.591
Ended at    : 19:48:55.394
     44.772 secs Elapsed
     44.757 secs CPU time
No heartbeat from core client for 30 sec - exiting

I don't think the timing comparison is meaningful, though the CUDA mem copy errors obviously show a problem with that.
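
To make the timing point concrete, here is a short Python sketch of the arithmetic. The formulas are inferred from the numbers printed in this thread (they reproduce the logged Speedup and Ratio values when fed the two CPU times), not taken from the test script itself, so treat them as an assumption; `speedup_and_ratio` is a hypothetical name.

```python
def speedup_and_ratio(ref_cpu_secs, test_cpu_secs):
    """Return (speedup in percent, ratio) from two CPU times in seconds.

    Inferred from the logs in this thread, not from the harness source:
      ratio   = reference CPU time / tested CPU time
      speedup = (1 - tested CPU time / reference CPU time) * 100
    """
    ratio = ref_cpu_secs / test_cpu_secs
    speedup_pct = (1.0 - test_cpu_secs / ref_cpu_secs) * 100.0
    return speedup_pct, ratio


if __name__ == "__main__":
    # CPU run cut short by the missing heartbeat (44.757 s) vs CUDA (70.294 s):
    # this reproduces the reported -57.06% and 0.64 x.
    print("aborted reference:  %.2f%%  %.2f x" % speedup_and_ratio(44.757, 70.294))

    # Same CUDA run against the complete CPU run (3458.355 s CPU time).
    print("complete reference: %.2f%%  %.2f x" % speedup_and_ratio(3458.355, 70.294))
```

Both figures use CPU time only, so neither reflects the wall clock time of the CUDA run (1717.310 secs elapsed).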

Offline Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 14349
Re: WUs that CUDA MB can't do correctly
« Reply #13 on: 30 Dec 2008, 07:03:41 pm »
Oops, will retest...
(thanks for spotting this early exit)

ADDON: report edited; it now contains the correct CPU run.
(and a reference to the online result has been added)
« Last Edit: 31 Dec 2008, 06:03:02 am by Raistmer »


  • Guest
Re: WUs that CUDA MB can't do correctly
« Reply #14 on: 04 Jan 2009, 09:05:51 am »
AR: 5.324874

                 MB_6.06r380mod_CUDA   AK_v8_win_SSE41   setiathome_6.06_windows_intelx86__cuda
Spike count:                       0                 9                                        0
Pulse count:                       0                 0                                        0
Triplet count:                    31                 2                                       31
Gaussian count:                    0                 0                                        0

[edit] Found 2 more of them: same AR, nearly the same results; stock CUDA gives triplet count 31 on both.[/edit]

[attachment deleted by admin]
« Last Edit: 04 Jan 2009, 09:46:11 am by Maik »

