Author Topic: CUDA_V12_app (Read 60101 times)

k6xt · « **Reply #30 on:** 24 Aug 2009, 10:17:34 am »

Quote from: Lord Asmodeus on 22 Aug 2009, 02:36:54 am

[snip]
Anyway this problem is "solved" by asking for fewer WUs. Since the other day, my RAC has risen continuously and I don't really understand why. boinc.exe used to use about 12h of 1 core per day (one eighth of the CPU), which doesn't add up with the rise I see. Even my other computer sees a rise and I have not touched it for ages (server use), so I guess it has to do with the SETI crew.
[snip]

My RAC has a bit more than doubled since 30 July, after I made sure I had all the latest KWSN software configured on the PCs and shortened my queue. Maybe the project had something to do with it as well.

I saw posts from Raistmer, Joe Segur etc. about shortening the queue but did not see any info on what the "right" queue size is. Mine is now 3 days' work. What is roughly "the right size" for quad cores with CUDA video cards?

Maik · « **Reply #31 on:** 29 Aug 2009, 05:59:24 pm »

Completed, validation inconclusive

same WU

1. claimed credit: 31.47
-> mod: V12 non VLAR Kill
-> Flopcounter: 11572163551292.662000
-> Triplet count: 5
-> GPU: GTX275
2. claimed credit: 0.12
-> mod: V12 VLAR Kill
-> Flopcounter: 40177900787.577545
-> Spike count: 30
-> GPU: GT9400

greets
Maik
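
To put the two results side by side, here is a small Python sketch that only does the arithmetic on the figures quoted above. The variable names are made up for illustration; nothing here comes from the SETI validator or the apps themselves.

```python
# Figures copied from the two results above (same WU).
gtx275 = {"credit": 31.47, "flops": 11572163551292.662000}
gt9400 = {"credit": 0.12, "flops": 40177900787.577545}

# How much more work did the first host report than the second?
flop_ratio = gtx275["flops"] / gt9400["flops"]
credit_ratio = gtx275["credit"] / gt9400["credit"]

print(f"flop ratio:   {flop_ratio:.0f}x")
print(f"credit ratio: {credit_ratio:.0f}x")
```

Both ratios come out in the hundreds, so the second host reported only a small fraction of the work on the same WU. That would fit an early exit rather than a full run, though the numbers alone don't say why.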

Claggy · « **Reply #32 on:** 29 Aug 2009, 06:34:38 pm »

Don't trust the 9400GT's host: it has 42 invalid WUs completed and loads of 'Completed, validation inconclusive'. A reboot is probably needed.

Claggy

Raistmer · « **Reply #33 on:** 29 Aug 2009, 06:40:04 pm »

Quote from: Claggy on 29 Aug 2009, 06:34:38 pm

Don't trust the 9400GT: it has 42 invalid WUs completed and loads of 'Completed, validation inconclusive'. A reboot is probably needed.

Claggy

My 9400GT requires regular OS reboots.

With the latest drivers it starts falling back to the CPU. There were other weird manifestations before... crap card.

About the correct queue size: it should be determined experimentally.
Look at Task Manager. If boinc.exe takes no more than 1-2% of CPU, all is OK.
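
A rough way to turn that rule of thumb into numbers, as a Python sketch. The function names are hypothetical, and the 4-core count is an assumption; the "12h of 1 core per day" figure is the one quoted earlier in the thread, and the 2% limit is the upper end of the 1-2% mentioned above.

```python
def cpu_share_percent(cpu_seconds, wall_seconds, cores):
    """CPU time used, as a percentage of all cores over the interval."""
    if wall_seconds <= 0 or cores <= 0:
        raise ValueError("wall_seconds and cores must be positive")
    return 100.0 * cpu_seconds / (wall_seconds * cores)


def queue_looks_ok(share_percent, limit_percent=2.0):
    """Rule of thumb from the post: boinc.exe at no more than 1-2% is fine."""
    return share_percent <= limit_percent


HOUR = 3600
# 12h of one core per day, on an assumed 4-core host.
busy = cpu_share_percent(12 * HOUR, 24 * HOUR, cores=4)
print(busy, queue_looks_ok(busy))
```

On those assumptions the share works out to one eighth of the CPU, well above the 1-2% mark, which would point to a queue that is too long for that host.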

k6xt · « **Reply #34 on:** 30 Aug 2009, 09:07:11 pm »

Quote from: Raistmer on 29 Aug 2009, 06:40:04 pm

About the correct queue size: it should be determined experimentally.
Look at Task Manager. If boinc.exe takes no more than 1-2% of CPU, all is OK.

Raistmer - Excellent, thank you. None of my PCs are above 0.2 CPU. The PC I write this on, Q6600 and ASUS 9800GTX, is at 0.02 CPU. And as far as I know I've never had to reboot just because of the 9800GTX; dead reliable.
Regards
Art

Lord Asmodeus · « **Reply #35 on:** 24 Sep 2009, 09:23:33 pm »

Quote from: k6xt on 24 Aug 2009, 10:17:34 am

Quote from: Lord Asmodeus on 22 Aug 2009, 02:36:54 am
[snip]
Anyway this problem is "solved" by asking for fewer WUs. Since the other day, my RAC has risen continuously and I don't really understand why. boinc.exe used to use about 12h of 1 core per day (one eighth of the CPU), which doesn't add up with the rise I see. Even my other computer sees a rise and I have not touched it for ages (server use), so I guess it has to do with the SETI crew.
[snip]
My RAC has a bit more than doubled since 30 July, after I made sure I had all the latest KWSN software configured on the PCs and shortened my queue. Maybe the project had something to do with it as well.

I saw posts from Raistmer, Joe Segur etc. about shortening the queue but did not see any info on what the "right" queue size is. Mine is now 3 days' work. What is roughly "the right size" for quad cores with CUDA video cards?

I haven't found it yet. It's at 2+1 for the moment. SETI being offline for several days at a time doesn't help. Moreover, I don't understand how BOINC decides to ask for new WUs; it defies logic. Sometimes there are a dozen WUs left and it won't ask, other times there are hundreds and it keeps asking for more. It's quite frustrating, so now I don't even open the manager anymore. I just run reschedule once or twice a day, putting 72% of the WUs on the GPU.

My RAC also took a big jump recently, maybe a change in the credit attribution?

k6xt · « **Reply #36 on:** 24 Sep 2009, 10:18:03 pm »

Quote from: Lord Asmodeus on 24 Sep 2009, 09:23:33 pm

Quote from: k6xt on 24 Aug 2009, 10:17:34 am
Quote from: Lord Asmodeus on 22 Aug 2009, 02:36:54 am
[snip]
Anyway this problem is "solved" by asking for fewer WUs. Since the other day, my RAC has risen continuously and I don't really understand why. boinc.exe used to use about 12h of 1 core per day (one eighth of the CPU), which doesn't add up with the rise I see. Even my other computer sees a rise and I have not touched it for ages (server use), so I guess it has to do with the SETI crew.
[snip]
My RAC has a bit more than doubled since 30 July, after I made sure I had all the latest KWSN software configured on the PCs and shortened my queue. Maybe the project had something to do with it as well.

I saw posts from Raistmer, Joe Segur etc. about shortening the queue but did not see any info on what the "right" queue size is. Mine is now 3 days' work. What is roughly "the right size" for quad cores with CUDA video cards?

I haven't found it yet. It's at 2+1 for the moment. SETI being offline for several days at a time doesn't help. Moreover, I don't understand how BOINC decides to ask for new WUs; it defies logic. Sometimes there are a dozen WUs left and it won't ask, other times there are hundreds and it keeps asking for more. It's quite frustrating, so now I don't even open the manager anymore. I just run reschedule once or twice a day, putting 72% of the WUs on the GPU.

My RAC also took a big jump recently, maybe a change in the credit attribution?

Been on travel for a bit. Fast forward one month from my last post. My RAC evened out at 30,000 after the August changes. Maybe with some help from Rescheduler as I've had very few computation errors. The few I did have were due to the default 4 hour reschedule, which is not frequent enough for the newer Nvidia GPUs. One hour on a 275 works well, 2 hours on a 9800GTX. SETI reached 30K despite adding in 10 percent each for Milkyway and Einstein, reducing SETI to 80% on the 275. Only trouble with the 275, it is very noisy at 100% (MSI card) with the dual fans at full speed.

Sutaru Tsureku · « **Reply #37 on:** 11 Nov 2009, 09:35:22 am »

I'm a little bit curious...

Why is the priority of the opt._CUDA_6.08_V12_app at 'lower than normal' and not at 'normal'?

Because of the boinc.exe and System activity peaks, I don't crunch on the CPU on my GPU cruncher (4x OCed GTX260-216).
Every time these two programs are active, both CPU and GPU tasks would be affected.

If the CUDA tasks had a higher priority ('normal'), then only the CPU tasks ('low') would be affected when other programs are active.
Only once all CPU tasks are stopped would the GPU tasks be affected as well.

For example, if my 4 GPUs start new CUDA tasks (CPU preparation) while the BOINC client is also active (all 5 programs at 'normal'), would my system crash?
AMD Quad-Core CPU.
http://setiathome.berkeley.edu/show_host_detail.php?hostid=4789793

Or maybe change the CUDA tasks from 'lower than normal' to 'normal' after the CPU preparation time?

With the current stock priorities it doesn't work well to crunch on my CPU too.
The GPU calculation time would increase by x3 or so because of the big boinc.exe/System activity peaks.
Only a 3 day WU cache!

For max. GPU performance it would be good to have the CUDA tasks at 'normal' priority all the time (also during the CPU preparation time).

Pappa · « **Reply #38 on:** 11 Nov 2009, 09:19:18 pm »

Over in the development area I have an AMD X2 6000 with a 9800GT, an X2 6000 with a 8400GS. Even though both X2 6000's have a 1 meg L2 there are problems piping information when Cuda is running (Cache overrun). Currently I do not have a Cuda card in my 9550 with an L3 Cache to see how that would run (warranty issues).

It gets worse, doing AP/MB and Cuda you end up with a Large Cache overrun (can you say VLAR). I have found that using an integer based (non FFT) project on the CPU and Cuda Seti on the GPU I get the best balance. So during the discussion of events over the months, the priorities were set.

Now with the Hybrid ATI for AP, Raistmer is once again playing with priorities. Not to mention a Hybrid Cuda that we have no work to test with.

Then add that, until recently, Boinc would congest itself for users with large caches. You are just adding CPU overhead. Everything has to fit inside the box. That is the OS, the RAM, the CPU(s) and GPU(s). What users do not realize is that you only have X amount of resources. It is easy to overrun those resources.

That said, the "priorities" given to the CPU can make it easier to overrun the resources.

Regards

Raistmer · « **Reply #39 on:** 12 Nov 2009, 02:43:16 pm »

Quote from: Sutaru Tsureku on 11 Nov 2009, 09:35:22 am

Why is the priority of the opt._CUDA_6.08_V12_app at 'lower than normal' and not at 'normal'?

Because of the boinc.exe and System activity peaks, I don't crunch on the CPU on my GPU cruncher (4x OCed GTX260-216).
Every time these two programs are active, both CPU and GPU tasks would be affected.

Because BOINC's science apps were designed to use the remaining idle CPU/GPU cycles (for the GPU that's not true now; mostly you should just stop the GPU app when doing something GPU-intensive).
So their priority should be lower than normal, so as not to disturb the user's other applications.

The correct question is:
Why are the BOINC daemon and manager priorities normal and not below normal? I found that on a notebook BOINC, even with CPU-only apps, disturbs the video player.
So I had to disable BOINC computations completely to watch a movie. That's not good; it's actually bad.
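
The priority argument in this exchange can be illustrated with a toy model in Python. This is only a sketch under strong assumptions: cores are handed out strictly level by level, highest priority first, which ignores things like priority boosts and time slicing in a real scheduler. The process names and the demand figures (0.2 core to feed the GPU, a 4-core boinc.exe peak) are made up for illustration.

```python
# Priority classes, highest first. Labels follow the thread's wording.
LEVELS = ["normal", "below normal", "low"]


def share_cores(procs, cores):
    """Toy model: hand out cores level by level, highest priority first.

    procs is a list of (name, priority, cores_wanted) tuples.
    Processes on the same level split whatever is left proportionally.
    """
    left = float(cores)
    granted = {}
    for level in LEVELS:
        group = [(n, want) for n, prio, want in procs if prio == level]
        total = sum(want for _, want in group)
        scale = min(1.0, left / total) if total > 0 else 0.0
        for name, want in group:
            granted[name] = want * scale
        left = max(0.0, left - total * scale)
    return granted


def scenario(cuda_priority, boinc_demand):
    """Quad-core host: boinc.exe, one CUDA feeder, four CPU tasks."""
    procs = [("boinc.exe", "normal", boinc_demand),
             ("cuda_app", cuda_priority, 0.2)]  # CPU feed for the GPU (assumed)
    procs += [(f"cpu_task_{i}", "low", 1.0) for i in range(4)]
    return share_cores(procs, cores=4)


quiet = scenario("below normal", boinc_demand=0.1)
peak_stock = scenario("below normal", boinc_demand=4.0)
peak_raised = scenario("normal", boinc_demand=4.0)
print(quiet["cuda_app"], peak_stock["cuda_app"], peak_raised["cuda_app"])
```

In this model the CUDA feeder gets nothing during a boinc.exe peak at the stock priority, and keeps most of its share if it is raised to 'normal'. The same raise would let it compete with any other 'normal' program the user runs, which is the trade-off both posts are pointing at.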

Sutaru Tsureku · « **Reply #40 on:** 13 Jan 2010, 10:21:40 am »

Hello opt. crew!

I 'helped(?)' Jon in the SETI@home NC subforum to install opt. apps.
Core i7 920 & 2x GTX295 (old 2x PCB), Win7 64bit.
But after installation he has CUDA problems/errors.

Please, if you have time, take a quick look here: 'RAC with 2 gtx295'

IIRC, with nVIDIA_driver_191.x and everything stock, all was well.
The problems started here: 'Message 962334'.

In my messages you can find some interesting info about his system, but now I'm out of ideas.. :-\

Thanks!

Richard Haselgrove · « **Reply #41 on:** 13 Jan 2010, 10:47:53 am »

You mean when you told him to install the VLAR_kill application, without explaining what it does, how it works, and the requirement for a user of 'anonymous platform' applications to manage and maintain their own science app from that point forward?

Most of his errors are -6 "Bad workunit header". It's a VLAR WU. VLAR kills it. That's what it does.

Task 1478354101 is interesting. -6 error, but no VLAR_kill message. Raistmer?

There are also a number of "Incorrect function. (0x1) - exit code 1 (0x1)", after what would appear to be full-term runtimes, but with none of the standard data in stderr_txt. Anyone?
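
The sorting described above can be sketched as a small Python helper. This is an illustration only: the function name, the bucket labels and the `VLAR_MARKER` string are assumptions, the real wording in stderr may differ, and treating "none of the standard data" as an empty stderr is a simplification.

```python
# Assumed marker text; the real app's stderr wording may differ.
VLAR_MARKER = "VLAR"


def triage(exit_code, stderr_text):
    """Sort a finished task into one of the buckets discussed above."""
    text = stderr_text.strip()
    if exit_code == 0:
        return "ok"
    if exit_code == -6:
        if VLAR_MARKER in text:
            return "expected: VLAR WU rejected by VLAR_kill"
        return "unexplained: -6 without a VLAR_kill message"
    if exit_code == 1 and not text:
        return "unexplained: exit code 1 with empty stderr"
    return "other error"


# Made-up inputs, one per bucket.
print(triage(-6, "VLAR WU, killing"))
print(triage(-6, ""))
print(triage(1, ""))
```

Only the first bucket is expected behaviour for a host running the VLAR_kill build; the other two are the ones worth a closer look.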

Raistmer · « **Reply #42 on:** 13 Jan 2010, 10:56:49 am »

Yes, most of the errors are just VLAR rejections. But some are very similar to my own troubles with a 9400GT in a dual-GPU config on a Core2 Duo host.
Same "0" available memory readings from time to time, same "unknown error".
For my own host there was only one solution: remove the 9400GT and leave only the 9600GSO. It works perfectly now.
Also, the 9400GT works just fine in a Q9450 host.

What are the reasons for such behavior?
I see 3 possibilities:
1) overheating
2) an underpowered system
3) an overloaded PCI-E bus that corrupts bus transfers

Check them.
The last one could be checked using the bandwidth sample from nVidia's CUDA or OpenCL samples.

Raistmer · « **Reply #43 on:** 13 Jan 2010, 10:58:34 am »

Quote from: Richard Haselgrove on 13 Jan 2010, 10:47:53 am

Task 1478354101 is interesting. -6 error, but no VLAR_kill message. Raistmer?

Perhaps the task was aborted in an especially rough way and the stderr buffer was not flushed to the file.
I'm more concerned with errors like this one:
http://setiathome.berkeley.edu/result.php?resultid=1478354023

efmer (fred) · « **Reply #44 on:** 13 Jan 2010, 11:13:48 am »

Quote from: Raistmer on 13 Jan 2010, 10:56:49 am

Yes, most of the errors are just VLAR rejections. But some are very similar to my own troubles with a 9400GT in a dual-GPU config on a Core2 Duo host.
Same "0" available memory readings from time to time, same "unknown error".
For my own host there was only one solution: remove the 9400GT and leave only the 9600GSO. It works perfectly now.
Also, the 9400GT works just fine in a Q9450 host.

The system has 2 old 2-PCB 295s, probably the worst cards that nVidia made. They are not suitable for CUDA work; one maybe, but two together get way, way too hot.
I had my own experience with them, and it was not good. The newer 1-PCB version of the 295 is OK though, quite a different design.

And to top things off he uses Win 7 with drivers I couldn't get to work properly with my 2 295s. Too many driver crashes.

So this system is asking for trouble, and he sometimes OCs them as well. So not the best testbed.
