Author Topic: Computation Error with CUDA  (Read 16738 times)

Offline Frizz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 541
Computation Error with CUDA
« on: 10 Aug 2010, 05:17:09 pm »
My GT 240 was running stable for weeks now. Suddenly I get a "Computation Error" for all my WUs. And every time after exactly the same amount of time: 03:43

I am using MB_6.08_CUDA_V12_VLARKill_FPLim2048.exe

Where can I get more information on the exact type of this "Computation Error". Are there logfiles I can look at?
Please stop using this 1366x768 glare displays: http://www.facebook.com/home.php?sk=group_153240404724993

Offline Geek@Play

  • Alpha Tester
  • Knight Templar
  • ***
  • Posts: 330
Re: Computation Error with CUDA
« Reply #1 on: 10 Aug 2010, 05:58:31 pm »
Copy "client_state.xml" to the desktop or anywhere else you like. Open it in Notepad and search for the word "error". Look a few lines above and below each match for more details about the problem.
Boinc....Boinc....Boinc....Boinc
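The manual Notepad search described above can also be scripted. This is only a minimal sketch, assuming a copy of client_state.xml sits in the current folder; the function name `find_with_context` and the three-line context window are illustrative choices, not anything provided by BOINC.

```python
from pathlib import Path


def find_with_context(lines, needle="error", context=3):
    """Return (line_number, snippet) pairs for every line containing needle.

    Matching is case-insensitive. Each snippet holds up to `context`
    lines before and after the matching line.
    """
    hits = []
    for i, line in enumerate(lines):
        if needle.lower() in line.lower():
            start = max(0, i - context)
            end = min(len(lines), i + context + 1)
            hits.append((i + 1, lines[start:end]))
    return hits


if __name__ == "__main__":
    # Assumption: a copy of client_state.xml is in the current folder.
    path = Path("client_state.xml")
    if path.exists():
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, snippet in find_with_context(text.splitlines()):
            print(f"--- match at line {number} ---")
            print("\n".join(snippet))
    else:
        print("client_state.xml not found in the current folder")
```

Working on a copy rather than the live file follows the advice in the post and avoids touching a file the BOINC client is using.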

Offline Frizz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 541
Re: Computation Error with CUDA
« Reply #2 on: 10 Aug 2010, 06:07:33 pm »
Ah! Thanks ... now how to read this?

<stderr_txt>
setiathome_CUDA: Found 1 CUDA device(s):

   Device 1 : GeForce GT 240

           totalGlobalMem = 497745920

           sharedMemPerBlock = 16384

           regsPerBlock = 16384

           warpSize = 32

           memPitch = 2147483647

           maxThreadsPerBlock = 512

           clockRate = 1750000

           totalConstMem = 65536

           major = 1

           minor = 2

           textureAlignment = 256

           deviceOverlap = 1

           multiProcessorCount = 12

setiathome_CUDA: CUDA Device 1 specified, checking...

   Device 1: GeForce GT 240 is okay

SETI@home using CUDA accelerated device GeForce GT 240

V12 modification by Raistmer

Priority of worker thread rised successfully

Priority of process adjusted successfully

Total GPU memory 497745920    free GPU memory 477425664

setiathome_enhanced 6.02 Visual Studio/Microsoft C++



Build features: Non-graphics   CUDA    VLAR autokill enabled    FFTW   USE_SSE   x86   

     CPUID: AMD Phenom(tm) II X6 1090T Processor



     Cache: L1=64K L2=512K



CPU features: FPU TSC PAE CMPXCHG8B APIC SYSENTER MTRR CMOV/CCMP MMX FXSAVE/FXRSTOR SSE SSE2 HT SSE3

libboinc: 6.3.22



Work Unit Info:

...............

WU true angle range is :  0.420137

After app init: total GPU memory 497745920    free GPU memory 477425664





Unhandled Exception Detected...



- Unhandled Exception Record -

Reason: Breakpoint Encountered (0x80000003) at address 0x75BA22A1



Engaging BOINC Windows Runtime Debugger...







********************





BOINC Windows Runtime Debugger Version 6.3.22





Dump Timestamp    : 08/10/10 21:57:21

Install Directory : C:\Program Files (x86)\BOINC\

Data Directory    : C:\ProgramData\BOINC

Project Symstore  :

LoadLibraryA( C:\Program Files (x86)\BOINC\\dbghelp.dll ): GetLastError = 126

Loaded Library    : dbghelp.dll

LoadLibraryA( C:\Program Files (x86)\BOINC\\symsrv.dll ): GetLastError = 126

LoadLibraryA( symsrv.dll ): GetLastError = 126

LoadLibraryA( C:\Program Files (x86)\BOINC\\srcsrv.dll ): GetLastError = 126

LoadLibraryA( srcsrv.dll ): GetLastError = 126

LoadLibraryA( C:\Program Files (x86)\BOINC\\version.dll ): GetLastError = 126

Loaded Library    : version.dll

Debugger Engine   : 4.0.5.0

Symbol Search Path: C:\ProgramData\BOINC\slots\0;C:\ProgramData\BOINC\projects\setiathome.berkeley.edu;srv*C:\Users\Admin\AppData\Local\Temp\symbols*http://msdl.microsoft.com/download/symbols;srv*C:\Users\Admin\AppData\Local\Temp\symbols*http://boinc.berkeley.edu/symstore





ModLoad: 00400000 00448000 C:\ProgramData\BOINC\projects\setiathome.berkeley.edu\MB_6.08_CUDA_V12_VLARKill_FPLim2048.exe (6.2.0.0) (-nosymbols- Symbols Loaded)

    Linked PDB Filename   :

    File Version          : 6.02

    Company Name          : Space Sciences Laboratory

    Product Name          : setiathome_enhanced

    Product Version       : 6.02



ModLoad: 77300000 00180000 C:\Windows\SysWOW64\ntdll.dll (6.1.7600.16385) (-exported- Symbols Loaded)

    Linked PDB Filename   : wntdll.pdb

    File Version          : 6.1.7600.16385 (win7_rtm.090713-1255)

    Company Name          : Microsoft Corporation

    Product Name          : Microsoft® Windows® Operating System

    Product Version       : 6.1.7600.16385



ModLoad: 76d10000 00100000 C:\Windows\syswow64\kernel32.dll (6.1.7600.16385) (-exported- Symbols Loaded)

    Linked PDB Filename   : wkernel32.pdb

    File Version          : 6.1.7600.16385 (win7_rtm.090713-1255)

    Company Name          : Microsoft Corporation

    Product Name          : Microsoft® Windows® Operating System

    Product Version       : 6.1.7600.16385



ModLoad: 75b90000 00046000 C:\Windows\syswow64\KERNELBASE.dll (6.1.7600.16385) (-exported- Symbols Loaded)

    Linked PDB Filename   : wkernelbase.pdb

    File Version          : 6.1.7600.16385 (win7_rtm.090713-1255)

    Company Name          : Microsoft Corporation

    Product Name          : Microsoft® Windows® Operating System

    Product Version       : 6.1.7600.16385



ModLoad: 10000000 0004a000 C:\ProgramData\BOINC\projects\setiathome.berkeley.edu\cudart.dll (6.14.11.2030) (-exported- Symbols Loaded)

    Linked PDB Filename   :

    File Version          : 6,14,11,2030

    Company Name          : NVIDIA Corporation

    Product Name          : NVIDIA CUDA 2.3 Runtime

    Product Version       : 6,14,11,2030



ModLoad: 009e0000 00845000 C:\ProgramData\BOINC\projects\setiathome.berkeley.edu\cufft.dll (6.14.11.2030) (-exported- Symbols Loaded)

    Linked PDB Filename   :

    File Version          : 6,14,11,2030

    Company Name          : NVIDIA Corporation

    Product Name          : NVIDIA Windows XP CUDA 2.3 FFT Library

    Product Version       : 6,14,11,2030



ModLoad: 01230000 00494000 C:\Windows\system32\nvcuda.dll (8.17.12.5721) (-exported- Symbols Loaded)

    Linked PDB Filename   :

    File Version          : 8.17.12.5721

    Company Name          : NVIDIA Corporation

    Product Name          : NVIDIA CUDA 3.1.1 driver

    Product Version       : 8.17.12.5721



ModLoad: 74ed0000 00100000 C:\Windows\syswow64\USER32.dll (6.1.7600.16385) (-exported- Symbols Loaded)

    Linked PDB Filename   : wuser32.pdb

    File Version          : 6.1.7600.16385 (win7_rtm.090713-1255)

    Company Name          : Microsoft Corporation

    Product Name          : Microsoft® Windows® Operating System

    Product Version       : 6.1.7600.16385



ModLoad: 75c80000 00090000 C:\Windows\syswow64\GDI32.dll (6.1.7600.16385) (-exported- Symbols Loaded)

    Linked PDB Filename   : wgdi32.pdb

    File Version          : 6.1.7600.16385 (win7_rtm.090713-1255)

    Company Name          : Microsoft Corporation

    Product Name          : Microsoft® Windows® Operating System

    Product Version       : 6.1.7600.16385



ModLoad: 75270000 0000a000 C:\Windows\syswow64\LPK.dll (6.1.7600.16385) (-exported- Symbols Loaded)

    Linked PDB Filename   : wlpk.pdb

    File Version          : 6.1.7600.16385 (win7_rtm.090713-1255)

    Company Name          : Microsoft Corporation

    Product Name          : Microsoft® Windows® Operating System

    Product Version       : 6.1.7600.16385



ModLoad: 75280000 0009d000 C:\Windows\syswow64\USP10.dll (1.626.7600.16385) (-exported- Symbols Loaded)

    Linked PDB Filename   : usp10.pdb

    File Version          : 1.0626.7600.16385 (win7_rtm.090713-1255)

    Company Name          : Microsoft Corporation

    Product Name          : Microsoft(R) Uniscribe Unicode script processor

    Product Version       : 1.0626.7600.16385



ModLoad: 76b90000 000ac000 C:\Windows\syswow64\msvcrt.dll (7.0.7600.16385) (-exported- Symbols Loaded)

    Linked PDB Filename   : msvcrt.pdb

    File Version          : 7.0.7600.16385 (win7_rtm.090713-1255)

    Company Name          : Microsoft Corporation

    Product Name          : Microsoft® Windows® Operating System

    Product Version       : 7.0.7600.16385



ModLoad: 755b0000 000a0000 C:\Windows\syswow64\ADVAPI32.dll (6.1.7600.16385) (-exported- Symbols Loaded)

    Linked PDB Filename   : advapi32.pdb

    File Version          : 6.1.7600.16385 (win7_rtm.090713-1255)

    Company Name          : Microsoft Corporation

    Product Name          : Microsoft® Windows® Operating System

    Product Version       : 6.1.7600.16385



ModLoad: 76cf0000 00019000 C:\Windows\SysWOW64\sechost.dll (6.1.7600.16385) (-exported- Symbols Loaded)

    Linked PDB Filename   : sechost.pdb

    File Version          : 6.1.7600.16385 (win7_rtm.090713-1255)

    Company Name          : Microsoft Corporation

    Product Name          : Microsoft® Windows® Operating System

    Product Version       : 6.1.7600.16385



ModLoad: 76e10000 000f0000 C:\Windows\syswow64\RPCRT4.dll (6.1.7600.16385) (-exported- Symbols Loaded)

    Linked PDB Filename   : wrpcrt4.pdb

    File Version          : 6.1.7600.16385 (win7_rtm.090713-1255)

    Company Name          : Microsoft Corporation

    Product Name          : Microsoft® Windows® Operating System

    Product Version       : 6.1.7600.16385



ModLoad: 74e70000 00060000 C:\Windows\syswow64\SspiCli.dll (6.1.7600.16385) (-exported- Symbols Loaded)

    Linked PDB Filename   : wsspicli.pdb

    File Version          : 6.1.7600.16385 (win7_rtm.090713-1255)

    Company Name          : Microsoft Corporation

    Product Name          : Microsoft® Windows® Operating System

    Product Version       : 6.1.7600.16385



ModLoad: 74e60000 0000c000 C:\Windows\syswow64\CRYPTBASE.dll (6.1.7600.16385) (-exported- Symbols Loaded)

    Linked PDB Filename   : cryptbase.pdb

    File Version          : 6.1.7600.16385 (win7_rtm.090713-1255)

    Company Name          : Microsoft Corporation

    Product Name          : Microsoft® Windows® Operating System

    Product Version       : 6.1.7600.16385



ModLoad: 759f0000 0019d000 C:\Windows\syswow64\SETUPAPI.dll (6.1.7600.16385) (-exported- Symbols Loaded)

    Linked PDB Filename   : setupapi.pdb

    File Version          : 6.1.7600.16385 (win7_rtm.090713-1255)

    Company Name          : Microsoft Corporation

    Product Name          : Microsoft® Windows® Operating System

    Product Version       : 6.1.7600.16385



ModLoad: 76c60000 00027000 C:\Windows\syswow64\CFGMGR32.dll (6.1.7600.16385) (-exported- Symbols Loaded)

    Linked PDB Filename   : cfgmgr32.pdb

    File Version          : 6.1.7600.16385 (win7_rtm.090713-1255)

    Company Name          : Microsoft Corporation

    Product Name          : Microsoft® Windows® Operating System

    Product Version       : 6.1.7600.16385



Get Product Name Failed.

ModLoad: 74fd0000 0008f000 C:\Windows\syswow64\OLEAUT32.dll (6.1.7600.16385) (-exported- Symbols Loaded)

    Linked PDB Filename   : oleaut32.pdb

    File Version          : 6.1.7600.16385

    Company Name          : Microsoft Corporation

    Product Name          :

    Product Version       : 6.1.7600.16385



ModLoad: 756b0000 0015c000 C:\Windows\syswow64\ole32.dll (6.1.7600.16385) (-exported- Symbols Loaded)

    Linked PDB Filename   : ole32.pdb

    File Version          : 6.1.7600.16385 (win7_rtm.090713-1255)

    Company Name          : Microsoft Corporation

    Product Name          : Microsoft® Windows® Operating System

    Product Version       : 6.1.7600.16385



ModLoad: 76c40000 00012000 C:\Windows\syswow64\DEVOBJ.dll (6.1.7600.16385) (-exported- Symbols Loaded)

    Linked PDB Filename   : devobj.pdb

    File Version          : 6.1.7600.16385 (win7_rtm.090713-1255)

    Company Name          : Microsoft Corporation

    Product Name          : Microsoft® Windows® Operating System

    Product Version       : 6.1.7600.16385



ModLoad: 76c90000 00060000 C:\Windows\system32\IMM32.DLL (6.1.7600.16385) (-exported- Symbols Loaded)

    Linked PDB Filename   : wimm32.pdb

    File Version          : 6.1.7600.16385 (win7_rtm.090713-1255)

    Company Name          : Microsoft Corporation

    Product Name          : Microsoft® Windows® Operating System

    Product Version       : 6.1.7600.16385



ModLoad: 75360000 000cc000 C:\Windows\syswow64\MSCTF.dll (6.1.7600.16385) (-exported- Symbols Loaded)

    Linked PDB Filename   : msctf.pdb

    File Version          : 6.1.7600.16385 (win7_rtm.090713-1255)

    Company Name          : Microsoft Corporation

    Product Name          : Microsoft® Windows® Operating System

    Product Version       : 6.1.7600.16385



ModLoad: 04910000 00198000 C:\Windows\system32\nvapi.dll (8.17.12.5721) (-exported- Symbols Loaded)

    Linked PDB Filename   : c:\dvs\p4\build\sw\rel\gpu_drv\r256\r256_stable_charlie\drivers\nvapi\_out\win7_wow64_release\nvapi.pdb

    File Version          : 8.17.12.5721

    Company Name          : NVIDIA Corporation

    Product Name          : NVIDIA Windows drivers

    Product Version       : 8.17.12.5721



ModLoad: 75650000 00057000 C:\Windows\syswow64\SHLWAPI.dll (6.1.7600.16385) (-exported- Symbols Loaded)

    Linked PDB Filename   : shlwapi.pdb

    File Version          : 6.1.7600.16385 (win7_rtm.090713-1255)

    Company Name          : Microsoft Corporation

    Product Name          : Microsoft® Windows® Operating System

    Product Version       : 6.1.7600.16385



ModLoad: 75d10000 00c49000 C:\Windows\syswow64\SHELL32.dll (6.1.7600.16385) (-exported- Symbols Loaded)

    Linked PDB Filename   : shell32.pdb

    File Version          : 6.1.7600.16385 (win7_rtm.090713-1255)

    Company Name          : Microsoft Corporation

    Product Name          : Microsoft® Windows® Operating System

    Product Version       : 6.1.7600.16385



ModLoad: 73410000 00009000 C:\Windows\system32\VERSION.dll (6.1.7600.16385) (-exported- Symbols Loaded)

    Linked PDB Filename   : version.pdb

    File Version          : 6.1.7600.16385 (win7_rtm.090713-1255)

    Company Name          : Microsoft Corporation

    Product Name          : Microsoft® Windows® Operating System

    Product Version       : 6.1.7600.16385



ModLoad: 76960000 0002d000 C:\Windows\syswow64\WINTRUST.dll (6.1.7600.16385) (-exported- Symbols Loaded)

    Linked PDB Filename   : wintrust.pdb

    File Version          : 6.1.7600.16385 (win7_rtm.090713-1255)

    Company Name          : Microsoft Corporation

    Product Name          : Microsoft® Windows® Operating System

    Product Version       : 6.1.7600.16385



ModLoad: 76a70000 0011c000 C:\Windows\syswow64\CRYPT32.dll (6.1.7600.16385) (-exported- Symbols Loaded)

    Linked PDB Filename   : crypt32.pdb

    File Version          : 6.1.7600.16385 (win7_rtm.090713-1255)

    Company Name          : Microsoft Corporation

    Product Name          : Microsoft® Windows® Operating System

    Product Version       : 6.1.7600.16385



ModLoad: 75c70000 0000c000 C:\Windows\syswow64\MSASN1.dll (6.1.7600.16385) (-exported- Symbols Loaded)

    Linked PDB Filename   : msasn1.pdb

    File Version          : 6.1.7600.16385 (win7_rtm.090713-1255)

    Company Name          : Microsoft Corporation

    Product Name          : Microsoft® Windows® Operating System

    Product Version       : 6.1.7600.16385



ModLoad: 72480000 000eb000 C:\Windows\system32\dbghelp.dll (6.1.7600.16385) (-exported- Symbols Loaded)

    Linked PDB Filename   : dbghelp.pdb

    File Version          : 6.1.7600.16385 (win7_rtm.090713-1255)

    Company Name          : Microsoft Corporation

    Product Name          : Microsoft® Windows® Operating System

    Product Version       : 6.1.7600.16385







*** Dump of the Process Statistics: ***



- I/O Operations Counters -

Read: 0, Write: 0, Other 0



- I/O Transfers Counters -

Read: 0, Write: 0, Other 0



- Paged Pool Usage -

QuotaPagedPoolUsage: 0, QuotaPeakPagedPoolUsage: 0

QuotaNonPagedPoolUsage: 0, QuotaPeakNonPagedPoolUsage: 0



- Virtual Memory Usage -

VirtualSize: 0, PeakVirtualSize: 0



- Pagefile Usage -

PagefileUsage: 0, PeakPagefileUsage: 0



- Working Set Size -

WorkingSetSize: 0, PeakWorkingSetSize: 0, PageFaultCount: 0



*** Dump of thread ID 2944 (state: Initialized): ***



- Information -

Status: Base Priority: Normal, Priority: Normal, , Kernel Time: 0.000000, User Time: 0.000000, Wait Time: 0.000000



- Unhandled Exception Record -

Reason: Breakpoint Encountered (0x80000003) at address 0x75BA22A1



- Registers -

eax=00000000 ebx=00000000 ecx=00811712 edx=036e613c esi=00000001 edi=00000000

eip=75ba22a1 esp=036efb74 ebp=036eff94

cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000246



- Callstack -

ChildEBP RetAddr  Args to Child

036eff94 77339d72 00000000 753eaf83 00000000 00000000 KERNELBASE!DebugBreak+0x0

036effd4 77339d45 0045ca50 00000000 00000000 00000000 ntdll!RtlInitializeExceptionChain+0x0

036effec 00000000 0045ca50 00000000 00000000 03f00000 ntdll!RtlInitializeExceptionChain+0x0



*** Dump of thread ID 144 (state: Initialized): ***



- Information -

Status: Base Priority: Normal, Priority: Normal, , Kernel Time: 0.000000, User Time: 0.000000, Wait Time: 0.000000



- Registers -

eax=00000000 ebx=00000000 ecx=00000000 edx=00000000 esi=00000100 edi=0018f2e4

eip=7731f871 esp=0018f29c ebp=0018f308

cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000206



- Callstack -

ChildEBP RetAddr  Args to Child

0018f308 76d21184 00000100 00000064 00000000 0018f36c ntdll!NtWaitForSingleObject+0x0

0018f320 76d21138 00000100 00000064 00000000 00000000 kernel32!WaitForSingleObjectEx+0x0

0018f334 0153b04d 00000100 00000064 04eabac0 0124bd98 kernel32!WaitForSingleObject+0x0

00000000 00000000 00000000 00000000 00000000 00000000 nvcuda!cuGraphicsD3D11RegisterResource+0x0





*** Debug Message Dump ****





*** Foreground Window Data ***

    Window Name      :

    Window Class     :

    Window Process ID: 0

    Window Thread ID : 0



Exiting...


</stderr_txt>
Please stop using this 1366x768 glare displays: http://www.facebook.com/home.php?sk=group_153240404724993

Offline Miep

  • Global Moderator
  • Knight who says 'Ni!'
  • *****
  • Posts: 964
Re: Computation Error with CUDA
« Reply #3 on: 10 Aug 2010, 06:25:08 pm »
Ah! Thanks ... now how to read this?

<stderr_txt>
setiathome_CUDA: Found 1 CUDA device(s):
   Device 1 : GeForce GT 240
           totalGlobalMem = 497745920
           sharedMemPerBlock = 16384
           regsPerBlock = 16384
           warpSize = 32
           memPitch = 2147483647
           maxThreadsPerBlock = 512
           clockRate = 1750000
           totalConstMem = 65536
           major = 1
           minor = 2
           textureAlignment = 256
           deviceOverlap = 1
           multiProcessorCount = 12
setiathome_CUDA: CUDA Device 1 specified, checking...
   Device 1: GeForce GT 240 is okay

what it's going to run on

Quote
SETI@home using CUDA accelerated device GeForce GT 240
V12 modification by Raistmer

what app is going to run
 
Quote
Priority of worker thread rised successfully
Priority of process adjusted successfully
Total GPU memory 497745920    free GPU memory 477425664
setiathome_enhanced 6.02 Visual Studio/Microsoft C++
Build features: Non-graphics   CUDA    VLAR autokill enabled    FFTW   USE_SSE   x86   
     CPUID: AMD Phenom(tm) II X6 1090T Processor
     Cache: L1=64K L2=512K
CPU features: FPU TSC PAE CMPXCHG8B APIC SYSENTER MTRR CMOV/CCMP MMX FXSAVE/FXRSTOR SSE SSE2 HT SSE3
libboinc: 6.3.22
Work Unit Info:
...............
WU true angle range is :  0.420137
After app init: total GPU memory 497745920    free GPU memory 477425664
Unhandled Exception Detected...
- Unhandled Exception Record -
Reason: Breakpoint Encountered (0x80000003) at address 0x75BA22A1
Engaging BOINC Windows Runtime Debugger...

and that bit is the error itself. the rest is a dump few people will be able to make sense of (much less go bughunting in)

haven't come across this one. one of the notorious -12 ?
The road to hell is paved with good intentions

Offline Geek@Play

  • Alpha Tester
  • Knight Templar
  • ***
  • Posts: 330
Re: Computation Error with CUDA
« Reply #4 on: 10 Aug 2010, 06:27:31 pm »
See this.......

http://boincfaq.mundayweb.com/index.php?language=1&view=480

Also more info available when Seti forums come back online.  See the forums there.

Were those work units moved from GPU to CPU???
Boinc....Boinc....Boinc....Boinc

Offline Frizz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 541
Re: Computation Error with CUDA
« Reply #5 on: 10 Aug 2010, 06:31:25 pm »
Were those work units moved from GPU to CPU???

No. Nothing rescheduled.


What I seriously don't understand is why the error happens AT THE EXACT SAME SECOND for all WUs!
« Last Edit: 10 Aug 2010, 06:33:50 pm by Frizz23 »
Please stop using this 1366x768 glare displays: http://www.facebook.com/home.php?sk=group_153240404724993

Offline Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 14349
Re: Computation Error with CUDA
« Reply #6 on: 10 Aug 2010, 06:35:08 pm »
See this.......

http://boincfaq.mundayweb.com/index.php?language=1&view=480

Also more info available when Seti forums come back online.  See the forums there.

Were those work units moved from GPU to CPU???
An abort for exceeding the elapsed time limit is usually marked in stderr by additional info lines. There is no such info here...

Offline Richard Haselgrove

  • Messenger Pigeon
  • Knight who says 'Ni!'
  • *****
  • Posts: 2819
Re: Computation Error with CUDA
« Reply #7 on: 10 Aug 2010, 06:36:05 pm »
Ah! Thanks ... now how to read this?

<stderr_txt>
setiathome_CUDA: Found 1 CUDA device(s):
... 

</stderr_txt>


Also, have a look just a couple of lines above <stderr_txt>. If Carola's right, you should see
<message>
 - exit code -12 (0xfffffff4)
</message>
and a few lines above that,
<exit_status>-12</exit_status>
If you see any other number, let us know: I must say, my first instinct on reading your initial report that they happened at the exact same second was that it might be a return of the -177s, but only you can tell until the servers are back up.
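To check which exit codes are actually present, the `<exit_status>` entries can be tallied with a short script. This is a sketch under assumptions: it scans the file as plain text with a regular expression rather than parsing the XML, and the helper names `count_exit_statuses` and `as_unsigned_hex` are made up for illustration. The second helper shows why -12 appears as 0xfffffff4 in the message: it is the same number written as an unsigned 32-bit value.

```python
import re
from collections import Counter

# Matches entries such as <exit_status>-12</exit_status>
EXIT_STATUS = re.compile(r"<exit_status>\s*(-?\d+)\s*</exit_status>")


def count_exit_statuses(xml_text):
    """Count how often each exit status value occurs in the text."""
    return Counter(int(value) for value in EXIT_STATUS.findall(xml_text))


def as_unsigned_hex(code):
    """Show a signed exit code as an unsigned 32-bit hex value."""
    return f"0x{code & 0xFFFFFFFF:08x}"


if __name__ == "__main__":
    # Sample text built from the entries quoted in this thread.
    sample = """
    <exit_status>-12</exit_status>
    <message>
     - exit code -12 (0xfffffff4)
    </message>
    """
    for code, count in count_exit_statuses(sample).items():
        print(code, as_unsigned_hex(code), "seen", count, "time(s)")
```

To run it against real data, replace `sample` with the text of a copied client_state.xml.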

Offline Miep

  • Global Moderator
  • Knight who says 'Ni!'
  • *****
  • Posts: 964
Re: Computation Error with CUDA
« Reply #8 on: 10 Aug 2010, 06:36:43 pm »
See this.......

http://boincfaq.mundayweb.com/index.php?language=1&view=480

Also more info available when Seti forums come back online.  See the forums there.

Were those work units moved from GPU to CPU???

... uh right... there's such things as standard error messages ::)
'exceeded maximum disk space' ? or a memory leak? modern systems rarely run out of space, but a quick check can't hurt. The other one - try rebooting perhaps? something stuck on the GPU?
The road to hell is paved with good intentions

Offline Miep

  • Global Moderator
  • Knight who says 'Ni!'
  • *****
  • Posts: 964
Re: Computation Error with CUDA
« Reply #9 on: 10 Aug 2010, 06:43:19 pm »
Also, have a look just a couple of lines above <stderr_txt>. If Carola's right, you should see
<message>
 - exit code -12 (0xfffffff4)
</message>
and a few lines above that,
<exit_status>-12</exit_status>
If you see any other number, let us know: I must say, my first instinct on reading your initial report that they happened at the exact same second was that it might be a return of the -177s, but only you can tell until the servers are back up.


oh right, exit codes get parsed to somewhere else...
I was just guessing, Richard, as I assumed -12 would show up as such. I've yet to come across one for myself, so I don't know what they look like.
ah and max CPU time exceeded should be -177 from the description in the faq.
« Last Edit: 10 Aug 2010, 06:47:47 pm by Miep »
The road to hell is paved with good intentions

Offline Richard Haselgrove

  • Messenger Pigeon
  • Knight who says 'Ni!'
  • *****
  • Posts: 2819
Re: Computation Error with CUDA
« Reply #10 on: 10 Aug 2010, 06:50:32 pm »
Whatever the exit code is, it'll be in there somewhere.

I knew I'd had a -12 on one machine since the servers went off, so those entries were a direct copy from client_state.

Others may appear slightly differently, but <exit_status>xxx</exit_status> will always be a key entry.

Offline Geek@Play

  • Alpha Tester
  • Knight Templar
  • ***
  • Posts: 330
Re: Computation Error with CUDA
« Reply #11 on: 10 Aug 2010, 06:53:11 pm »
During past outages I have successfully searched "client_state.xml" for the word "error".  In those searches I successfully located -12 errors and -177 errors.

My first reaction is that they are -177 since they all occurred at the same instant during processing.  That was also my experience with the -177 errors.
Boinc....Boinc....Boinc....Boinc

Offline Frizz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 541
Re: Computation Error with CUDA
« Reply #12 on: 10 Aug 2010, 07:20:38 pm »
My first reaction is that they are -177 since they all occurred at the same instant during processing.  That was also my experience with the -177 errors.

Damn it - just found some "Computation Error" for CPU units too. 5 WUs. All at the EXACT same time.
Is this an indication of -177 errors?

By the way: What exactly is this -177 error?
Please stop using this 1366x768 glare displays: http://www.facebook.com/home.php?sk=group_153240404724993

Offline Richard Haselgrove

  • Messenger Pigeon
  • Knight who says 'Ni!'
  • *****
  • Posts: 2819
Re: Computation Error with CUDA
« Reply #13 on: 10 Aug 2010, 07:27:04 pm »
My first reaction is that they are -177 since they all occurred at the same instant during processing.  That was also my experience with the -177 errors.

Damn it - just found some "Computation Error" for CPU units too. 5 WUs. All at the EXACT same time.
Is this an indication of -177 errors?

By the way: What exactly is this -177 error?
You need to be certain exactly what they are - search client_state, don't just rely on our speculation.

-177 errors arise from bad estimates issued by the server - they don't indicate a problem at your end. At this moment, I would guess they're re-issues for the ghost WUs which are timing out - look for _2, _3 or higher on the end of the task name. Don't worry about them unless you get a huge number - if you do, Fred's latest rescheduling tool can deal with them, by resetting the estimates.
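The "_2, _3 or higher" check can be sketched in a few lines. This assumes only what the post says, namely that the number after the last underscore in a task name is what marks a re-issue; the task names below are hypothetical placeholders, and `looks_like_reissue` is an illustrative helper, not a BOINC function.

```python
import re

# The trailing "_<number>" at the end of a task name.
SUFFIX = re.compile(r"_(\d+)$")


def trailing_number(task_name):
    """Return the number after the last underscore, or None if absent."""
    match = SUFFIX.search(task_name)
    return int(match.group(1)) if match else None


def looks_like_reissue(task_name, threshold=2):
    """True when the task name ends in _2, _3 or higher."""
    number = trailing_number(task_name)
    return number is not None and number >= threshold


if __name__ == "__main__":
    # Hypothetical task names, for illustration only.
    for name in ["example_task_0", "example_task_1", "example_task_3"]:
        print(name, "->", looks_like_reissue(name))
```

A name flagged this way is only a hint; the `<exit_status>` entry in client_state.xml remains the way to confirm which error a task actually hit.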

Offline Frizz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 541
Re: Computation Error with CUDA
« Reply #14 on: 10 Aug 2010, 07:35:26 pm »
You need to be certain exactly what they are - search client_state, don't just rely on our speculation.

OK found it (-177). It would be nice if the BOINC client would show this - and not just "Computation error". Especially at times when SETI is offline.

How can I prevent this -177 error?
Please stop using this 1366x768 glare displays: http://www.facebook.com/home.php?sk=group_153240404724993
