Author Topic: WUs that CUDA MB can't do correctly  (Read 30929 times)

popandbob

  • Guest
Re: WUs that CUDA MB can't do correctly
« Reply #15 on: 04 Jan 2009, 02:16:31 pm »
A new error for you all to have fun with...

cudaAcc_find_triplets doesn't support more than MAX_TRIPLETS_ABOVE_THRESHOLD numBinsAboveThreshold in find_triplets_kernel

http://setiathome.berkeley.edu/result.php?resultid=1112071921

I didn't do standalone testing because my PC locked up every time I tried to....

[attachment deleted by admin]

Offline Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 14349
Re: WUs that CUDA MB can't do correctly
« Reply #16 on: 04 Jan 2009, 04:58:07 pm »
This task gives different online and standalone results for the CUDA app.
The standalone results for my build, stock 6.06 and the CPU app are all strongly similar; the online result gave an overflow.
So tasks do influence each other; it's not just temporal locality of bad tasks.
AR ~2.16


[attachment deleted by admin]

Offline Josef W. Segur

  • Janitor o' the Board
  • Knight who says 'Ni!'
  • *****
  • Posts: 3112
Re: WUs that CUDA MB can't do correctly
« Reply #17 on: 04 Jan 2009, 05:05:45 pm »
A new error for you all to have fun with...

cudaAcc_find_triplets doesn't support more than MAX_TRIPLETS_ABOVE_THRESHOLD numBinsAboveThreshold in find_triplets_kernel

http://setiathome.berkeley.edu/result.php?resultid=1112071921

I didn't do standalone testing because my PC locked up every time I tried to....

And the MAX_TRIPLETS_ABOVE_THRESHOLD constant is 10. I was wondering whether that would show up. It only allows for that many of the power samples to be above threshold in one Power over Time array, and those arrays are up to 128K samples in size. It is handled just like the case where more than one triplet is detected in one PoT array: a flag is set which causes a SETIERROR() call with the Unsupported Function code (12), which is negated by the time BOINC sees it (and because it's only defined in the SETI source, BOINC considers it an unknown error).
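
A minimal sketch of the behaviour described above, written in Python rather than the app's CUDA code. The function name, the constant name UNSUPPORTED_FUNCTION and the return convention are assumptions made for illustration; only the limit of 10 and the error code 12 come from the description.

```python
MAX_TRIPLETS_ABOVE_THRESHOLD = 10  # limit quoted above
UNSUPPORTED_FUNCTION = 12          # error code quoted above (constant name is hypothetical)


def bins_above_threshold(pot, threshold):
    """Return (indices, error) for one Power over Time array.

    Hypothetical helper: collects the indices of samples above threshold and
    gives up with an error once more than MAX_TRIPLETS_ABOVE_THRESHOLD are found.
    """
    indices = []
    for i, power in enumerate(pot):
        if power > threshold:
            if len(indices) == MAX_TRIPLETS_ABOVE_THRESHOLD:
                # An 11th bin does not fit: flag the whole array as unsupported.
                return indices, -UNSUPPORTED_FUNCTION  # negated, as BOINC would see it
            indices.append(i)
    return indices, 0
```

With this shape, a PoT array holding exactly 10 samples above threshold still passes, and the 11th one triggers the error instead of being silently dropped.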

It's a good protection from splitter problems like what happened when Enhanced first was released on main and again shortly after we started doing work from multibeam; the triplet threshold was much too low or even negative for some WUs. OTOH, it probably ought to be classed like result_overflow and be reported as a success if that's the purpose of setting the limit that low.
                                                                              Joe

Offline Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 14349
Re: WUs that CUDA MB can't do correctly
« Reply #18 on: 04 Jan 2009, 05:11:10 pm »
That would all be true if it showed up in the CPU result too. While we see this error on the GPU only, I tend to think it's just another CUDA app problem....

Offline Josef W. Segur

  • Janitor o' the Board
  • Knight who says 'Ni!'
  • *****
  • Posts: 3112
Re: WUs that CUDA MB can't do correctly
« Reply #19 on: 04 Jan 2009, 05:30:41 pm »
That would all be true if it showed up in the CPU result too. While we see this error on the GPU only, I tend to think it's just another CUDA app problem....

In this case it looks like a design choice rather than an execution bug. My guess is someone did a mathematical estimation of the most "above threshold" points which should occur in pure random data and allowed for somewhat more. It looks like another case where ideal randomness isn't quite achieved...
                                                                    Joe
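
One way such an estimate could be made, as a rough sketch only: assume every power sample is independent noise, normalised to a mean of 1 and exponentially distributed, so the chance of one sample exceeding a threshold t is exp(-t). The threshold values in the loop are placeholders, not the values the application actually uses, and the Poisson approximation is likewise an assumption.

```python
import math

N_SAMPLES = 128 * 1024  # largest PoT array size mentioned in the thread
LIMIT = 10              # MAX_TRIPLETS_ABOVE_THRESHOLD


def expected_above(threshold, n=N_SAMPLES):
    """Expected number of samples above threshold under the exponential-noise assumption."""
    return n * math.exp(-threshold)


def prob_more_than_limit(threshold, n=N_SAMPLES, limit=LIMIT):
    """Poisson approximation of P(count > limit) for pure noise."""
    lam = expected_above(threshold, n)
    p_at_most = sum(math.exp(-lam) * lam**k / math.factorial(k)
                    for k in range(limit + 1))
    return max(0.0, 1.0 - p_at_most)  # clamp tiny negative rounding errors


for t in (10.0, 12.0, 15.0):  # placeholder thresholds, in units of mean power
    print(f"t={t}: expected {expected_above(t):.3f}, "
          f"P(>{LIMIT}) ~ {prob_more_than_limit(t):.2e}")
```

Under these assumptions the chance of overflowing the limit falls off very quickly as the threshold rises, which is why data that is not ideally random can hit a limit that looked safe on paper.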

Offline Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 14349
Re: WUs that CUDA MB can't do correctly
« Reply #20 on: 04 Jan 2009, 06:37:17 pm »
Another damaged online result with good standalone ones.
It seems an OS reboot is indeed required after a driver crash/"snow" screen, although there is no visible memory leak.
AR =0.437128

ADDON: added another such result.

[attachment deleted by admin]
« Last Edit: 05 Jan 2009, 06:57:22 am by Raistmer »

Maik

  • Guest
Re: WUs that CUDA MB can't do correctly
« Reply #21 on: 04 Jan 2009, 07:22:48 pm »
I noticed that not every WU crash leads to a graphics memory leak.
Very often, after my script has killed a stuck task, they run normally to 100% without any loss of memory.
So it seems the memory leak isn't caused by the stuck WU.
My assumption:
There must be an instruction (sent by the CUDA app) that runs inside the shaders in a never-ending loop.
As you may know, an instruction is handed with a specific task to the first shader unit; once that completes, it is passed on with the next task to the next shader unit, and so on until the instruction has been handled. (OK, that's how it works for graphics calculation; I don't know how CUDA works.)
If there is a fault, maybe the instruction runs from shader unit to shader unit without ever completing, and that blocks the graphics card from working correctly (snow on screen, jumping pixels, graphics errors, graphics card crash).

I had this error once. I tried a driver reset via RivaTuner -> nothing happened. If you get this error, you have to reboot your host ... :(
« Last Edit: 04 Jan 2009, 07:35:34 pm by Maik »

Offline Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 14349
Re: WUs that CUDA MB can't do correctly
« Reply #22 on: 04 Jan 2009, 08:02:04 pm »
Maybe...
There is still no evidence of a memory leak.... Right now the driver has crashed and restarted, with "snow" on screen, so I will run the task again to see whether the initial free memory is lower or not.

And here is a "classic" VLAR, AR ~0.009 (edited).
It crashes the video driver with a "snow" visual effect on screen.


ADDON:
No memory leak. Just the usual amount of free GPU memory.
And many CUDA errors in the log, ending with an overflow.



[attachment deleted by admin]
« Last Edit: 04 Jan 2009, 08:16:56 pm by Raistmer »

Maik

  • Guest
Re: WUs that CUDA MB can't do correctly
« Reply #23 on: 04 Jan 2009, 08:03:49 pm »
Will do a standalone test with 23ap08aa.4504.481.10.11.187 on my host ... done

-results AK_v8_win_SSE41.exe -> equivalent to yours (3 Pulse)
-MB_6.06r380mod_CUDA.exe -> task (WinTaskManager) gets stuck, after 30 sec I killed it
 '--> no video errors.
« Last Edit: 04 Jan 2009, 09:17:31 pm by Maik »

Maik

  • Guest
Re: WUs that CUDA MB can't do correctly
« Reply #24 on: 04 Jan 2009, 09:37:41 pm »
-MB_6.06r380mod_CUDA.exe -> task (WinTaskManager) gets stuck, after 30 sec I killed it
 '--> no video errors.

I was really wrong. After restarting BM I noticed that the first CUDA WU wouldn't work (stuck). -> reboot -> now it runs ... reporting back -> done
WU true angle range is :  2.722583 (Triplet count:  2), ran through normally ... so it must have been a driver crash left over from a previous task
I've been warned now. The next WUs I get with an AR of 0.00x I'll abort before BM tries to crunch them *grrr  >:(

edit: found one ... AR: 0.0078554334964187, want it? -> attachment ;) (not tested yet!)
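
A small sketch of how the angle range could be pulled out of the stderr text in the format quoted above and checked against a cutoff. The cutoff value and the function names are hypothetical, and this only works on output from a task that has already run; filtering before crunching would need the AR read from the work unit itself, which is not shown here.

```python
import re

# Matches the stderr line quoted above, e.g. "WU true angle range is :  0.019967"
AR_LINE = re.compile(r"WU true angle range is\s*:\s*([0-9]+(?:\.[0-9]+)?)")
VLAR_CUTOFF = 0.01  # hypothetical cutoff, adjust to taste


def true_angle_range(stderr_text):
    """Return the true angle range as a float, or None if the line is absent."""
    m = AR_LINE.search(stderr_text)
    return float(m.group(1)) if m else None


def looks_like_vlar(stderr_text, cutoff=VLAR_CUTOFF):
    """True when an angle range was found and it is below the cutoff."""
    ar = true_angle_range(stderr_text)
    return ar is not None and ar < cutoff
```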

[attachment deleted by admin]
« Last Edit: 04 Jan 2009, 10:00:31 pm by Maik »

Offline Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 14349
Re: WUs that CUDA MB can't do correctly
« Reply #25 on: 05 Jan 2009, 07:26:01 am »
One more VLAR
AR ~0.01
The online result differs from the standalone result, and both differ from the standalone CPU result.
There is an overflow in the online result; no overflow but invalid signals in the CUDA standalone result.
The standalone CUDA MB run didn't cause a driver crash but did cause the "snow screen" effect. CUDA error in stderr.



[attachment deleted by admin]

Maik

  • Guest
Re: WUs that CUDA MB can't do correctly
« Reply #26 on: 05 Jan 2009, 07:52:56 am »
Big WU crash with several restarts of the task, access violation errors, runtime debugger ...
application: mem-opt MB_6.06r380mod_CUDA.exe
WU: 01no08aa.9239.55814.9.8.194
WU true angle range is :  0.019967

error:
Cuda error 'cudaMemcpy(best_PoT, dev_tmp_pot, max_nb_of_elems * sizeof(float), cudaMemcpyDeviceToHost)' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 1265 : unknown error.
Cuda error 'cudaMemcpy(best_PoT, dev_tmp_pot, max_nb_of_elems * sizeof(float), cudaMemcpyDeviceToHost)' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 1265 : unknown error.
Cuda error 'cudaMemcpy(best_PoT, dev_tmp_pot, max_nb_of_elems * sizeof(float), cudaMemcpyDeviceToHost)' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 1265 : unknown error.
Cuda error 'cudaMemcpy(best_PoT, dev_tmp_pot, max_nb_of_elems * sizeof(float), cudaMemcpyDeviceToHost)' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_pulsefind.cu' in line 1265 : unknown error.


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x004097A4 read attempt to address 0x0980A820

Engaging BOINC Windows Runtime Debugger...


Doing a standalone run now with the stock app.

Update:
-------------------------
AK_v8_win_SSE41.exe -standalone, Elapsed time: 3715 seconds
 > Spike count:    1
 > Pulse count:    3
 > Triplet count:  0
 > Gaussian count: 0
-------------------------
setiathome_6.06_windows_intelx86__cuda.exe -standalone, Elapsed time: 32 seconds
 > Cuda error 'cudaMemcpy(best_PoT, dev_tmp_pot, max_nb_of_elems * sizeof(float), cudaMemcpyDeviceToHost)' in file 'c:/sw/gpgpu/seti/seti_boinc/client/cuda/cudaAcc_pulsefind.cu' in line 1265 : unknown error.
-------------------------
Results added as attachment.
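
For logs like the ones above, where the same CUDA error repeats many times, a short sketch that groups the lines by source file, line number and message can make comparison between builds easier. The regular expression is derived only from the error lines quoted in this post; the function name is hypothetical and other message formats are not handled.

```python
import re
from collections import Counter

# Format taken from the stderr lines quoted above.
CUDA_ERROR = re.compile(
    r"Cuda error '(?P<call>.+?)' in file '(?P<file>.+?)' "
    r"in line (?P<line>\d+) : (?P<msg>.*?)\.?\s*$"
)


def summarize_cuda_errors(stderr_text):
    """Count CUDA error lines, keyed by (source file name, line number, message)."""
    counts = Counter()
    for text_line in stderr_text.splitlines():
        m = CUDA_ERROR.search(text_line)
        if m:
            # Keep only the file name so different build paths compare equal.
            file_name = m.group("file").rsplit("/", 1)[-1]
            counts[(file_name, int(m.group("line")), m.group("msg"))] += 1
    return counts
```

Keying on the file name rather than the full path means the modified build and the stock 6.06 build, which were compiled from different directories, end up in the same bucket.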

[attachment deleted by admin]
« Last Edit: 05 Jan 2009, 09:05:54 am by Maik »

Offline Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 14349
Re: WUs that CUDA MB can't do correctly
« Reply #27 on: 05 Jan 2009, 08:14:20 am »
VLAR AR ~0.13
driver crash in standalone run (6.06 stock, not my build), overflow in online result
Both results differ from CPU one, unknown CUDA error in stderr.



[attachment deleted by admin]

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: WUs that CUDA MB can't do correctly
« Reply #28 on: 05 Jan 2009, 08:51:32 am »
...unknown CUDA error in stderr....

Oooh, that sounds familiar  :P

Offline Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 14349
Re: WUs that CUDA MB can't do correctly
« Reply #29 on: 05 Jan 2009, 08:52:54 am »
That would all be true if it showed up in the CPU result too. While we see this error on the GPU only, I tend to think it's just another CUDA app problem....

In this case it looks like a design choice rather than an execution bug. My guess is someone did a mathematical estimation of the most "above threshold" points which should occur in pure random data and allowed for somewhat more. It looks like another case where ideal randomness isn't quite achieved...
                                                                    Joe

I agree that this design flaw may show up at some point in the future, but for now it's just another CUDA bug.
http://setiathome.berkeley.edu/workunit.php?wuid=390436470
CPU didn't report this overflow (as it should be :) )

 
