Author Topic: Driver, application and VRAM requirement? (Read 45797 times)

Miep · « **on:** 19 Jul 2010, 11:38:28 am »

Hello world

Ok, so I'm running a high-spec notebook with an NVIDIA Quadro FX 570M, 256MB. See this host for a few more details.
From the start it has had stability issues with the graphics driver (IRQ_zero at system service bluescreens, among others). Well, I assume it's the graphics driver, since updating it has increased stability, up to 195.62, which is the most stable and the one I'm currently running.
I did upgrade to 257.21, but with the decrease in reported RAM it didn't want to run stock 6.08 any more and it was more unstable on top. (Looks like they just released 258.96, haven't tried that one...)
I downgraded again last week so I could at least run stock 6.08.
So. I tried optimized CUDA MBnokill 2.2 dll and it errored out on me. Didn't go through to report and I didn't remember to preserve the stderr, but from what I've read I assume it's memory issues - stock 6.08 runs just fine (AND I've got aero enabled ;P). I can probably free up some RAM if somebody explains how to do it (or just points to the right post).
As an interim measure I've stuck 6.08 into app_info, so at least I get some WUs to play around with. So far it hasn't errored (first one at 15% done right now), and since the errors with optimized occurred in the first two minutes or so, I hope it's playing along for the moment.
Any further suggestions?

BMaytum · « **Reply #1 on:** 19 Jul 2010, 11:57:58 am »

Miep:

Others here may be able to help, especially if you can link to the errored WU(s) and/or provide the STDERR output from when you tried the optimized NO-VLAR-kill app.

Aside: Just an FYI for MB WUs run on GPU using the optimized VLAR-Kill version: if the WU terminates after only a few elapsed seconds, it was terminated because its angle range (AR) was too low (i.e. the VLAR-Kill caused the termination). In such cases you'll see something similar to this:

Code:

Stderr output
<core_client_version>6.10.56</core_client_version>
<![CDATA[
<message>
 - exit code -6 (0xfffffffa)
</message>
<stderr_txt>

VLAR WU (AR: 0.009264 )detected... autokill initialised
SETI@home error -6 Bad workunit header

The "error -6" means VLAR-Kill did its job.
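For anyone puzzled by the two numbers in that message: the hex value in brackets is just the same exit code read as an unsigned 32-bit integer. A minimal Python sketch (the helper name `as_hex32` is made up for illustration, it is not part of BOINC):

```python
def as_hex32(exit_code: int) -> str:
    """Render a signed exit code the way it appears in brackets in the stderr."""
    # Masking with 0xFFFFFFFF yields the 32-bit two's-complement bit pattern.
    return f"0x{exit_code & 0xFFFFFFFF:08x}"


print(as_hex32(-6))  # 0xfffffffa
```

The same conversion works for any other negative exit code reported in a task's stderr.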

Raistmer · « **Reply #2 on:** 19 Jul 2010, 11:59:29 am »

Hi here

Try disabling Aero first. Also, if the 191.xx drivers run stably with the CUDA 2.3 DLLs, there's no need to upgrade to higher versions (yet).

Richard Haselgrove · « **Reply #3 on:** 20 Jul 2010, 10:38:44 am »

Quote from: Raistmer on 19 Jul 2010, 11:59:29 am

Hi here
Try disabling Aero first. Also, if the 191.xx drivers run stably with the CUDA 2.3 DLLs, there's no need to upgrade to higher versions (yet).

When Miep asked about this on the main board last week (Nvidia driver revert?), I had a scout round and found that 190/191 drivers were never released (even in Beta) for her Quadro FX 570M / Vista 32. The first available driver with CUDA 2.3 support was the 195.62 she's using now.

Miep · « **Reply #4 on:** 20 Jul 2010, 11:05:47 am »

Ah well, next iteration. Disabled aero, rerun.
I think it might just be possible to get too many debugging messages... Are there actually people who can make sense of core dumps?

So, memory, as assumed:

Code:

<stderr_out>
<![CDATA[
<message>
 - exit code -1073741819 (0xc0000005)
</message>

and then further down

Code:

After app init: total GPU memory 268435456	 free GPU memory 38637568

Cuda error 'cudaMalloc((void**) &dev_WorkData' in file 'd:/BoincSeti_Prog/sinbad_repositories/LunaticsUnited/SETI_CUDA_MB_exp/client/cuda/cudaAcceleration.cu' in line 293 : out of memory.

setiathome_CUDA: CUDA runtime ERROR in device memory allocation (Step 1 of 3). Falling back to HOST CPU processing...

Unhandled Exception Detected...
- Unhandled Exception Record -

Reason: Access Violation (0xc0000005) at address 0x726F662F read attempt to address 0x726F662F
Engaging BOINC Windows Runtime Debugger...

followed by no fewer than 7 pages of runtime debugger messages.

Kept client_state in case anybody feels like having a closer look.
Fine, back to 6.08 for now.
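To put the byte counts from the "After app init" line into more familiar units, here is a small Python sketch. The function name `parse_gpu_memory` is hypothetical, and the regular expression only assumes the line format shown above:

```python
import re

# The line exactly as it appears in the stderr above (tab included).
LINE = "After app init: total GPU memory 268435456\t free GPU memory 38637568"


def parse_gpu_memory(line: str) -> tuple:
    """Return (total_bytes, free_bytes) from an 'After app init' stderr line."""
    m = re.search(r"total GPU memory\s+(\d+)\s+free GPU memory\s+(\d+)", line)
    if m is None:
        raise ValueError("not a GPU memory line")
    return int(m.group(1)), int(m.group(2))


total, free = parse_gpu_memory(LINE)
MIB = 1024 * 1024
print(f"total {total / MIB:.1f} MiB, free {free / MIB:.1f} MiB, "
      f"in use {(total - free) / MIB:.1f} MiB")
# total 256.0 MiB, free 36.8 MiB, in use 219.2 MiB
```

So at the moment the app reported these figures, under 37 MiB of the 256 MiB card was free.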

Raistmer · « **Reply #5 on:** 20 Jul 2010, 11:17:57 am »

No need to read those dumps any further; it's clear that memory allocation failed. AFAIK the latest GPU-Z shows memory usage for NVIDIA cards (still no such feature for ATI GPUs). Maybe you can use this tool to see how much GPU memory is in use before running CUDA MB?

Miep · « **Reply #6 on:** 21 Jul 2010, 08:15:43 am »

OK.
So GPU-Z 0.4.4 reports 104MB used. Erm. Not good. A bit of tweaking (appearance to Basic and colours to 16-bit) brought that down to 19 (oh, and the Windows fix to temporarily disable .lnk and .pif; not sure if that has an impact).
Still not enough.

While I was looking at client_state I noticed

Code:

 Cuda error 'cudaMalloc((void**) &dev_GaussFitResults' in file 'c:/sw/gpgpu/seti/seti_boinc/client/cuda/cudaAcceleration.cu' in line 314 : out of memory.

setiathome_CUDA: CUDA runtime ERROR in device memory allocation (Step 1 of 3). Falling back to HOST CPU processing...

even on the WUs that had (from a user's perspective) run error-free.
?!
So, what's it doing then? It does put load on the GPU after all?
I'll run with more free memory and see if that clears up...

Raistmer · « **Reply #7 on:** 21 Jul 2010, 04:28:05 pm »

No, if you see that message it has fallen back to the CPU and is not using the GPU at all. It's actually the slowest way to do a SETI task.

Better to disable GPU processing completely then (if you can't free more memory for the app).
Another way is to fall back to older CUDA DLLs (2.2 or even older): they have a slower cuFFT but require less memory. That will still be much better than CPU fallback mode.
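Since the fallback is silent from the user's side, one way to spot it is to scan a task's stderr for the messages quoted in this thread. A rough Python sketch follows; the function names are invented for illustration, and the marker strings are taken verbatim from the output posted above:

```python
import re

FALLBACK_MARKER = "Falling back to HOST CPU processing"

# Sample text copied from the stderr quoted earlier in the thread.
SAMPLE = (
    "Cuda error 'cudaMalloc((void**) &dev_GaussFitResults' in file "
    "'c:/sw/gpgpu/seti/seti_boinc/client/cuda/cudaAcceleration.cu' "
    "in line 314 : out of memory.\n"
    "setiathome_CUDA: CUDA runtime ERROR in device memory allocation "
    "(Step 1 of 3). Falling back to HOST CPU processing...\n"
)


def ran_on_cpu_fallback(stderr_text: str) -> bool:
    """True if the task reported that it gave up on the GPU."""
    return FALLBACK_MARKER in stderr_text


def failed_allocations(stderr_text: str) -> list:
    """Names of the device buffers whose cudaMalloc ran out of memory."""
    pattern = r"cudaMalloc\(\(void\*\*\) &(\w+)'.*?out of memory"
    return re.findall(pattern, stderr_text)


print(ran_on_cpu_fallback(SAMPLE))  # True
print(failed_allocations(SAMPLE))   # ['dev_GaussFitResults']
```

A task whose stderr trips the first check was crunched on the CPU, however it looked in BOINC Manager.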

Jason G · « **Reply #8 on:** 21 Jul 2010, 04:32:41 pm »

Quote from: Raistmer on 21 Jul 2010, 04:28:05 pm

It's actually the slowest way to do a SETI task.

No it isn't. Asking my dog to process tasks is much slower. He hasn't returned a valid result yet

Miep · « **Reply #9 on:** 21 Jul 2010, 07:23:50 pm »

If it was doing just that, fine (not by crunching standards, but by 'I know what it's doing').

But how do you then explain
a) load on the GPU - judging from the heat generated, that is. I'll run with GPU-Z in the background overnight and see if it shows load.
b) crunching times - well, hard to say, actually. CPU is ak_v8b_win_ssse3x.exe and ap_5.05r409_sse.exe respectively; GPU is 6.08 stock. I'm getting a 1h50' average for shorties on GPU and 1h30' on CPU.
When I was running both on stock, GPU used to take about 1/3 of the CPU time - and besides the driver I've not consciously changed the setup.
[times are not really comparable though; I'm still fine-tuning how hot I allow CPU and GPU to get, while not having them throttled into oblivion]
c) both CPU cores are on approximately the same times as without GPU tasks. (same throughput)
d) shouldn't then a cpu core get taken over? I still get 3 tasks running.

So, from all I can tell at least part of it uses the GPU.

I don't think I've returned any of those tasks yet, so I don't know if they validate. Old setup used to, but as there was never any reason to check stderr I've no idea if it was showing the same behaviour.

Josef W. Segur · « **Reply #10 on:** 21 Jul 2010, 11:20:10 pm »

The CPU fallback in the CUDA apps happens without BOINC knowing about it, so it assumes the CUDA task it started is only using a small fraction of a CPU and will start a CPU task also. The CUDA task is running at a higher priority than the CPU task, so runs mostly uninterrupted using the minimal default set of CPU code. The task started for the CPU will probably run quite slowly since it will get much less CPU time. So not only would the task started for CUDA be running slowly on CPU, a CPU task would also be running slowly; as Raistmer said, the worst possible way to do work.
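As a back-of-envelope illustration of that argument, here is a toy model in Python. It assumes strict priority scheduling (higher-priority CPU-bound processes are served first, the rest share what is left equally), which is a simplification of what any real OS scheduler does; the function `cpu_shares` and its parameters are hypothetical:

```python
def cpu_shares(cores: int, high_priority: int, normal_priority: int) -> dict:
    """Toy strict-priority model of CPU-bound processes.

    High-priority processes are served first, each getting at most one
    full core; normal-priority processes split the remaining cores equally.
    Returns the fraction of a core each process of either class receives.
    """
    high_total = min(cores, high_priority)
    left = cores - high_total
    high_share = high_total / high_priority if high_priority else 0.0
    normal_share = min(1.0, left / normal_priority) if normal_priority else 0.0
    return {"high": high_share, "normal": normal_share}


# Two cores, two CPU tasks, and a 'GPU' task that really uses the GPU:
print(cpu_shares(cores=2, high_priority=0, normal_priority=2))
# {'high': 0.0, 'normal': 1.0}

# Same machine, but the 'GPU' task has silently fallen back to the CPU:
print(cpu_shares(cores=2, high_priority=1, normal_priority=2))
# {'high': 1.0, 'normal': 0.5}
```

Under these assumptions the fallback task takes a whole core and the CPU tasks are squeezed onto what remains, which is the double slowdown described above.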

Raistmer wrote:

Quote

Another way is to fall back to older CUDA DLLs (2.2 or even older): they have a slower cuFFT but require less memory. That will still be much better than CPU fallback mode.

I think that is the only way you'll be able to crunch with 256 MB VRAM.
Joe

Richard Haselgrove · « **Reply #11 on:** 22 Jul 2010, 02:34:10 am »

It'll be a bit difficult to work out what's happening until the outage is over, and we can see both CPU and Elapsed times reported on the website: I think BOINC does record actual CPU usage, even on a nominal GPU-allocated task.

At the moment, I'm getting confused signals. It looks as if the optimized CUDA MBnokill you started with has broken CPU fallback code, and just errors on low memory. From what you've said, the v6.08 CPU fallback mode is kicking in as designed, and at least generating a result - but at the lowest possible speed. The CPU code in that build has few, if any, of the optimisations present in even the stock v6.03 CPU application, let alone the additional ~2x speedup available with AK_v8b. Why NVidia incorporated such a crappy CPU codebase, I'll never know.

But your later observations don't bear that out. More heat from the GPU? That implies work - so what's it doing? (Unless, just possibly, merely starting the app kicks up a frequency shift, from idle to active - that sort of thing might be expected in a notebook). And if the CPU is doing the bulk of the heavy lifting for the nominal 'GPU' app, you'd expect that the elapsed timings for CPU tasks (I'm presuming those are what you're reading, from BOINC Manager) would increase significantly, even if CPU time should remain constant. Unless, again, the higher-priority GPU app is triggering a frequency increase in the CPU, from idle to performance. That's been a nasty gotcha for Linux users, which took a long time to track down the first time we came across it - the presenting problem was that the Linux app was about half as efficient as the equivalent Windows app, which didn't make sense.

It would certainly be worth running GPU-Z in the background, and watching in particular what happens to GPU speed and utilisation as a new task starts up (from scratch, obviously, rather than replacing an existing running task). It might also be worth having a look at what the power management settings are doing for the system as a whole. If the notebook is high enough spec to have a Quadro as standard, then it's probably got some good power control stuff as well. Check both the BIOS, and for any Vista power extensions. Just for experimentation and understanding what's going on, it would be good to eliminate any variability from frequency shifting you can, while you test everything else. And you might even be able to allocate more RAM to the GPU in BIOS.

Miep · « **Reply #12 on:** 22 Jul 2010, 05:27:55 am »

Quote from: Josef W. Segur on 21 Jul 2010, 11:20:10 pm

The CPU fallback in the CUDA apps happens without BOINC knowing about it, so it assumes the CUDA task it started is only using a small fraction of a CPU and will start a CPU task also. The CUDA task is running at a higher priority than the CPU task, so runs mostly uninterrupted using the minimal default set of CPU code. The task started for the CPU will probably run quite slowly since it will get much less CPU time. So not only would the task started for CUDA be running slowly on CPU, a CPU task would also be running slowly; as Raistmer said, the worst possible way to do work.

I hate it when I manage to kill my own post while writing... OK let's see...

You were right, of course

GPU memory shifts from 19 to 136 used and after a few secs down to 49 and stays there. Core clock, memory clock and shader clock all go up and stay up. (whatever that exactly is...)
I suppose Richard is right - enough to shift (and heat up) not enough to run.

We are talking 6.08 stock right now; as I understood it, optimized has higher memory requirements? So, if I can't get stock to run, there's no point in vying for opt.

Quote

Raistmer wrote:
Quote
Another way is to fall back to older CUDA DLLs (2.2 or even older): they have a slower cuFFT but require less memory. That will still be much better than CPU fallback mode.

I think that is the only way you'll be able to crunch with 256 MB VRAM.
Joe

Ah, I think you've put your finger on the sore spot.
When I downgraded the driver last week I was far too stressed to pay attention to details. I wrongly assumed the libs would follow suit. However:

Code:

 22/07/2010 09:47:08		NVIDIA GPU 0: Quadro FX 570M (driver version 19562, CUDA version 3000, compute capability 1.1, 256MB, 61 GFLOPS peak)

So Cuda 3.0 dll lib?
Well guess I have to remove a couple of files then to get back to 2.2. So, what needs removing/replacing? I doubt I can find all relevant files on my own...

Thanks a lot.

Richard Haselgrove · « **Reply #13 on:** 22 Jul 2010, 05:54:15 am »

Don't worry about the 'version 3000' from BOINC - that's just showing the maximum version you could use with that driver.
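For reference, the two numbers in that BOINC startup line can be decoded as in the Python sketch below. The CUDA value follows the usual 1000 * major + 10 * minor encoding; reading the driver value as major * 100 + minor is an inference from this thread (19562 for the 195.62 driver), not something taken from BOINC documentation. The function name is made up:

```python
import re

# Startup message as posted above (timestamp omitted).
LINE = ("NVIDIA GPU 0: Quadro FX 570M (driver version 19562, CUDA version 3000, "
        "compute capability 1.1, 256MB, 61 GFLOPS peak)")


def decode_boinc_gpu_line(line: str) -> dict:
    """Extract human-readable driver and maximum CUDA versions."""
    m = re.search(r"driver version (\d+), CUDA version (\d+)", line)
    if m is None:
        raise ValueError("no driver/CUDA version found")
    driver, cuda = int(m.group(1)), int(m.group(2))
    return {
        # Assumed encoding: 19562 -> 195.62
        "driver": f"{driver // 100}.{driver % 100:02d}",
        # CUDA encoding: 1000 * major + 10 * minor, so 3000 -> 3.0
        "cuda_max": f"{cuda // 1000}.{(cuda % 1000) // 10}",
    }


print(decode_boinc_gpu_line(LINE))
# {'driver': '195.62', 'cuda_max': '3.0'}
```

The `cuda_max` value is only the ceiling the driver supports, not the version of the DLLs actually sitting in the project directory.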

You can use lower versions, just by changing the two DLLs in the boinc\projects\sah directory.

Attaching, for simplicity, the 2.2 and 2.1 versions - straight file replace, no rename. Stop BOINC first, obviously, and try: 2.2 first, 2.1 if that fails.

Let me know when you've downloaded the files, and I'll remove the attachment - save server space.

Edit - attachments removed, served their purpose.
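For anyone who would rather script the file replace than do it by hand, a cautious Python sketch is below. Everything in it is illustrative: `swap_dlls`, the backup suffix and the directory arguments are hypothetical, and the DLL file names must be supplied by the caller since they are not listed in this thread. BOINC has to be stopped before running anything like this.

```python
import shutil
from pathlib import Path


def swap_dlls(project_dir, replacement_dir, dll_names, backup_suffix=".bak"):
    """Copy each named DLL from replacement_dir over the one in project_dir.

    The existing file is first copied to '<name><backup_suffix>' so the
    swap can be undone. File names stay the same: straight replace, no rename.
    """
    project_dir, replacement_dir = Path(project_dir), Path(replacement_dir)

    # Check everything up front so a missing file cannot leave a half-done swap.
    missing = [n for n in dll_names if not (replacement_dir / n).is_file()]
    if missing:
        raise FileNotFoundError(f"replacement files missing: {missing}")

    for name in dll_names:
        target = project_dir / name
        if target.exists():
            shutil.copy2(target, target.with_name(name + backup_suffix))
        shutil.copy2(replacement_dir / name, target)
    return [project_dir / n for n in dll_names]
```

To try the next-older set, point `replacement_dir` at that set and run it again; the backup made on the first run is overwritten, so keep a separate copy of the original files if you want to get back to them.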

Miep · « **Reply #14 on:** 22 Jul 2010, 06:08:09 am »

Ta.
Ack. That was 2.2 in there... Ok trying 2.1

As the GPU is on a 4 min wait for inactivity, no results expected before tomorrow.
