Author Topic: BOINC 6.10.x (Alpha) For Windows (Read 40424 times)

Claggy · « **Reply #15 on:** 05 Oct 2009, 05:53:58 pm »

Quote from: MarkJ on 04 Oct 2009, 05:34:21 am

That must have been the .4 and .5 iterations. We are now up to 6.10.11, which seems to be okay so far. There have been a couple of fixes since, so expect a new version any day now.

Boinc 6.10.12 is out:

boinc_6.10.12_windows_intelx86.exe

boinc_6.10.12_windows_x86_64.exe

Claggy

Edit:

Change Log:

Rom 2 October 2009
- client: fix crashing bug introduced in [18605]

Rom 5 October 2009
- client: if downloaded project list file is garbage, ignore it.

- all: accept <foo /> as an XML bool

- client: Apparently it is valid for the autoproxy to return successful API completion but a null proxy list. Check for the null instead of crashing.

- client: only support one of the ati13* plan classes at a time. A couple of users had not updated their amdcal* runtime libraries after upgrading Catalyst drivers. This was leading to crashes of the project applications when work was supplied looking for the old DLL names.

- client: fix a handle leak I just introduced. (From: Andreas a.k.a Gipsel)

- lib: Fix memory/resource leak. (From Nicolás Alvarez)

- lib: Add additional ATI descriptions.

- lib: Fix some inaccurate ATI capabilities in certain cards. (From: Andreas a.k.a Gipsel)

- lib: Fix memory/resource leak. (From Nicolás Alvarez) (reprise)

- client: restore calDeviceGetInfo(), add its info to COPROC_ATI struct (some plan class might need to know this).

- Code cleanup.

- client: better behavior if a GPU goes away:
1) if an APP_VERSION is missing a coprocessor, don't delete it and its files. (If the coprocessor returns, we won't need to re-download)

2) if a RESULT uses an app version that is missing a coprocessor, abort it (rather than deleting it). The client will report the result on the next scheduler RPC, and the server will make a new instance.

- client: fix bug where if you change project "no CPU/NVIDIA/ATI" prefs and update, the change wouldn't take effect until client restart.

- client: fix bug in enforcement of "no CPU/NVIDIA/ATI" prefs

- client: make the order of the result vector consistent with the order used to select coproc jobs

- client: improve coproc_debug messages

- client: if a task is running, uses a GPU, and the system has >1 GPU, append text to its resource string saying which GPU it's using

- manager: tweak Task properties text

- DIAG: Suspend threads right before extracting their context and then resume them afterwards. Otherwise we could end up in a deadlock state where both the main thread and a support thread are attempting to use the same system resource. In the last situation it was way down in Winsock.

- DIAG: Don't resume after the thread has been suspended, otherwise the thread stack may be trashed after extracting the context. This should still be okay though as by the time the diagnostics framework has gotten here it has already downloaded all the symbols it'll need.

Tag for 6.10.12 release, all platforms boinc_core_release_6_10_12

Richard Haselgrove · « **Reply #16 on:** 05 Oct 2009, 07:37:58 pm »

Quote from: Claggy on 05 Oct 2009, 05:53:58 pm

Claggy

- all: accept <foo /> as an XML bool

Claggy - your space - R.I.P.

Raistmer · « **Reply #17 on:** 05 Oct 2009, 10:49:30 pm »

Quote from: Claggy on 05 Oct 2009, 05:53:58 pm

2) if a RESULT uses an app version that is missing a coprocessor, abort it (rather than deleting it). The client will report the result on the next scheduler RPC, and the server will make a new instance.

Welcome to losing work in progress as official BOINC policy.

Claggy · « **Reply #18 on:** 06 Oct 2009, 01:04:43 am »

Quote from: Richard Haselgrove on 05 Oct 2009, 07:37:58 pm

Quote from: Claggy on 05 Oct 2009, 05:53:58 pm

Claggy

- all: accept <foo /> as an XML bool

Claggy - your space - R.I.P.

Yes I'd noticed Claggy's Space was no more.

Boinc 6.10.13 is released:

boinc_6.10.13_windows_intelx86.exe

boinc_6.10.13_windows_x86_64.exe

Claggy

Edit:

Change Log:

Rom 5 October 2009
- client: Fix crash that was introduced 7 months ago. (From Nicolás Alvarez)

- client: Fix a missed checkin that prevents a crash during autoproxy detection.

Tag for 6.10.13 release, all platforms boinc_core_release_6_10_13

Arnulf · « **Reply #19 on:** 10 Oct 2009, 03:47:29 am »

Hello Claggy!

Can you tell me if there is some coding that errors out WUs that run for more than 24 hours?
I have tested the 6.10.12 version, and that one reports errors when running over 86430 seconds.
When re-installing 6.10.3, the errors go away.

Arnulf

Here are the results:

http://setiathome.berkeley.edu/beta/workunit.php?wuid=2321437
http://setiathome.berkeley.edu/beta/workunit.php?wuid=2321435
http://setiathome.berkeley.edu/beta/workunit.php?wuid=2321436
http://setiathome.berkeley.edu/beta/workunit.php?wuid=2321243

Raistmer · « **Reply #20 on:** 10 Oct 2009, 04:31:37 am »

Yes, it's very annoying to lose work w/o any good reason to do that!

<core_client_version>6.10.12</core_client_version>
<![CDATA[
<message>
Maximum elapsed time exceeded
</message>

Claggy · « **Reply #21 on:** 10 Oct 2009, 07:33:36 am »

Quote from: Arnulf on 10 Oct 2009, 03:47:29 am

Hello Claggy!

Can you tell me if there is some coding that errors out WUs that run for more than 24 hours?
I have tested the 6.10.12 version, and that one reports errors when running over 86430 seconds.
When re-installing 6.10.3, the errors go away.

Arnulf

Here are the results:

http://setiathome.berkeley.edu/beta/workunit.php?wuid=2321437
http://setiathome.berkeley.edu/beta/workunit.php?wuid=2321435
http://setiathome.berkeley.edu/beta/workunit.php?wuid=2321436
http://setiathome.berkeley.edu/beta/workunit.php?wuid=2321243

Have you got FLOPS values entered in your app_info?

Boinc 6.10.5 introduced:

- client: if app_info.xml doesn't specify flops, use an estimate that takes GPUs into account.

If you do have an app_info with flops values, the flops value is too high: Boinc estimates the run time from it and aborts the WU once it has run for 10x that estimate.

If you don't have flops values in an app_info, then Boinc will take the GPU estimated flops and expect the WU to be finished very quickly, so a slow WU hits the same limit.

In changeset 19282, which should be in the next new version of Boinc:

- client: if anonymous platform description (app_info.xml) doesn't specify FLOPS for a GPU app, assume that it runs at CPU peak speed rather than GPU peak speed. Better to be conservative, otherwise job might be aborted due to time limit exceeded.

Since you seem to be running Raistmer's AstroPulse hybrid CPU/ATI GPU build for GPUs with double precision support, and a lot of it runs on the CPU,
the best way round this, if you want to run Boinc 6.10.5 to 6.10.13, is to add CPU flops values to the GPU part of the app_info.
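
To put rough numbers on that limit, here is a minimal sketch in Python. It assumes the model described above, where a task is aborted once its elapsed time passes 10x the run time predicted from the flops value. The function name, the work unit size and both speeds are made up for illustration and are not taken from the BOINC source.

```python
def max_elapsed_seconds(wu_fpops_est, app_flops, bound_factor=10.0):
    """Seconds a task may run before it is aborted, under the assumed model.

    wu_fpops_est -- estimated floating point operations in the work unit
    app_flops    -- speed the client assumes for the app (the <flops> value)
    bound_factor -- safety multiple on the predicted run time (10x per the post)
    """
    if app_flops <= 0:
        raise ValueError("app_flops must be positive")
    predicted_runtime = wu_fpops_est / app_flops
    return bound_factor * predicted_runtime


if __name__ == "__main__":
    WU_FPOPS_EST = 1.0e15     # hypothetical work unit size
    GPU_PEAK_FLOPS = 400.0e9  # hypothetical GPU peak speed
    CPU_FLOPS = 2.0e9         # hypothetical CPU benchmark speed

    for label, flops in (("GPU peak", GPU_PEAK_FLOPS), ("CPU", CPU_FLOPS)):
        limit = max_elapsed_seconds(WU_FPOPS_EST, flops)
        print(f"{label}: abort after {limit:.0f} s ({limit / 3600:.1f} h)")
```

With these made-up figures, the GPU peak value leaves roughly 7 hours before the abort, while the CPU value leaves roughly 58 days. That is why the lower flops value is the safer one for an app that does much of its work on the CPU.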

References:

Error code -171 to -180 explained.

Infinite loops

Claggy

Arnulf · « **Reply #22 on:** 10 Oct 2009, 08:29:32 am »

Thanks!

<flops>4127010920</flops>

How do I determine what number to insert? The line above I snagged from a CUDA App_info.xml.
I have also found a formula that says GFLOPS*0.2, but that's for CUDA devices, and I expect Brook to be somewhat different.

My Boinc manager says my card should deliver 428GFLOPS.

Arnulf

Claggy · « **Reply #23 on:** 10 Oct 2009, 09:10:54 am »

This post is the Bible for setting up an app_info's flops values to try and equalise DCF and work fetch between CPUs and GPUs:

app_info for AP503, AP505, MB603 and MB608

If you take your GPU flops you get:

428 000 000 000 * 0.2 = 85 600 000 000

and CPU flops:

1 824 240 000 * 2.6 = 4 743 024 000

I'd either take the CPU flops * 2.6 and use that,
or I'd take the GPU flops * 0.2 and take a digit off the end,

so

4 743 024 000

or

8 560 000 000

The first one will be more safe, the second one will give you better work fetch,
I don't know how fast Raistmer's AP/Brook app is, so it'll be a case of suck it and see,
you can always increase the flops values later.
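
The same arithmetic as a small Python sketch, in case you want to redo it for another card. The 2.6 and 0.2 factors and the "take a digit off" step are the rules of thumb from this post, not anything official, and the function names are made up for illustration.

```python
def candidate_flops(gpu_peak_flops, cpu_benchmark_flops):
    """Return (safe, aggressive) flops candidates as whole numbers.

    safe       -- CPU benchmark flops * 2.6
    aggressive -- GPU peak flops * 0.2, then divided by 10
                  ("take a digit off the end")
    Integer arithmetic is used so the results are exact.
    """
    safe = cpu_benchmark_flops * 26 // 10
    aggressive = gpu_peak_flops * 2 // 10 // 10
    return safe, aggressive


def flops_line(value):
    """Format a value as the <flops> element used in app_info.xml."""
    return f"<flops>{value}</flops>"


if __name__ == "__main__":
    # Figures quoted in this thread: 428 GFLOPS GPU peak, CPU benchmark value.
    safe, aggressive = candidate_flops(428_000_000_000, 1_824_240_000)
    print(flops_line(safe))
    print(flops_line(aggressive))
```

Start with the first line it prints and only move towards the second if tasks finish comfortably inside their time limit.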

Claggy

Raistmer · « **Reply #24 on:** 10 Oct 2009, 10:02:59 am »

Good estimate is ~24% faster in CPU time than current opt AP release.
Elapsed time difference is less though.

Richard Haselgrove · « **Reply #25 on:** 10 Oct 2009, 10:28:21 am »

Quote from: Claggy on 10 Oct 2009, 09:10:54 am

This post is the Bible for setting up an app_info's flops values to try and equalise DCF and work fetch between CPUs and GPUs:

app_info for AP503, AP505, MB603 and MB608

If you take your GPU flops you get:

428 000 000 000 * 0.2 = 85 600 000 000

...

ooooooh - er - I'd be very careful about that.

That sounds like the same as saying that my 9800GT does 508 GFlops (which it does, according to NVidia's marketing people), and basing the calculation on that.

Whatever the hype, I'd be very doubtful that ATI cards are ten times faster than CUDA cards - and before anyone calls "Milkyway Credit", please show me the technical underpinning for that one, too.

Safest advice for now is just to put the CPU floating-point benchmark value in, and work upwards from there to normalise the DCF.

Raistmer · « **Reply #26 on:** 10 Oct 2009, 10:59:16 am »

yes, using CPU value would not give too big error.

And about MW credits... well, there is no project parity in credits, as was discussed over the last few weeks on the BOINC dev mailing list... And IMO a good way to ensure such parity is unfortunately not supported by the BOINC administration.

Christoph · « **Reply #27 on:** 15 Oct 2009, 05:52:47 pm »

Does anybody know when 6.10.14 will come out? Hopefully it will correct some errors like that.

Rectifier · « **Reply #28 on:** 16 Oct 2009, 08:56:18 am »

Well, seeing as to how .13 was only released a few days ago - expect it to take a few more weeks at most. Besides, the problems discussed were mostly fixed in .13

Claggy · « **Reply #29 on:** 16 Oct 2009, 05:03:51 pm »

Boinc 6.10.14 is out:

boinc_6.10.14_windows_intelx86.exe

boinc_6.10.14_windows_x86_64.exe

Claggy

Edit:

Change log for 6.10.14

Rom 7 October 2009
- MGR: Fix the Statistics page Save/Restore project display feature.

Charlie 8 October 2009
- MGR: If aborting multiple tasks, ask "Are you sure?" only once.

Rom 16 October 2009
- client: Fix crash that was introduced 7 months ago. (From Nicolás Alvarez)

- client: remove redundant 0s in job log

- client: add --unsigned_apps_ok cmdline option and <unsigned_apps_ok> config option. This tells the client to allow unsigned apps. For testing. No file xfers or other network traffic will be allowed if set.

- client: add <exit_after_finish> option (same as cmdline flag)

- client: add <skip_cpu_benchmarks> option (same as cmdline flag)

- client: print message if abort past-deadline unstarted job

- client: improve message when have NVIDIA drivers but no GPU

- client: if anonymous platform description (app_info.xml) doesn't specify FLOPS for a GPU app, assume that it runs at CPU peak speed rather than GPU peak speed. Better to be conservative, otherwise job might be aborted due to time limit exceeded.

- client: on startup, if a coproc needed by a job is missing, set a "coproc_missing" flag rather than aborting the job. If user removes a GPU board while there's a large queue of GPU jobs, they'll stay queued (until their deadline passes).

Note: this doesn't fix the situation where user connects via Remote Desktop while GPU jobs are running or queued. We should check for Remote Desktop every minute or so, and stop GPU jobs.

- client: the get_all_projects_list() RPC doesn't require auth

- client: don't multiply checkpoint interval (i.e., "disk interval" pref) by # processors.

- actually, make it "Tasks checkpoint to disk at most every ..." and change it in the advanced prefs dialog too

- LIB: Make the is_remote_desktop compilable for all VS versions and SKUs.

- MGR: Fix initial first connection problem on startup. I'm not sure why it was only happening at startup, there might have been a few crashes because of this issue as well. The basic problem is that wxWidgets had an exception handler around the initial frame creation and when the first GUI RPC was issued to detect whether or not we were attached to an account manager during menu creation the GUI thread would go about doing idle processing while waiting for the GUI RPC thread to initialize. During this time the frame pointer is NULL and was getting dereferenced which would halt window construction and stay there until some other event was fired.

- MGR: Initial dose of code cleanup and shuffling. Order the menu functions in the order in which they are displayed in the menu.

- client: address the situation where GPUs become unusable for certain periods (e.g. when Remote Desktop is used on Win).

* add is_usable() member function to COPROC.

Currently this just calls the respective (CUDA or CAL) initialization function. We need to check whether this works and/or causes problems.

* in enforce_schedule(), check whether usability has changed for each GPU type.

If we've gone from usable to unusable, flag all jobs for that GPU as coproc_missing (so they won't get run, and will quit if they're running). If we've gone from unusable to usable, clear the flag.

This should deal with all cases except where the client is started up with GPUs unusable.

- client: bug fixes to the above. Don't fetch work for an unusable resource.

- update cal.h to current ATI code

- client/scheduler: standardize the FLOPS estimate between NVIDIA and ATI. Make them both peak FLOPS, according to the formula supplied by the manufacturer.

The impact on the client is minor:

* the startup message describing the GPU
* the weight of the resource type in computing long-term debt

On the server, I changed the example app_plan() function to assume that app FLOPS is 20% of peak FLOPS (that's about what it is for SETI@home).

- client: the weight of GPU debt in computing total debt should be (estimated throughput of all GPUs)/(estimated throughput of all CPUs) rather than the ratio of 1 GPU to 1 CPU. This change will hopefully cause ratios of granted credit to more closely match resource shares.

- client: multi-thread jobs were being given too high priority; in particular, they were preempting jobs in the middle of time slice.

Solution:
1) don't use MT in the sort order defined by more_important().
2) add a 2nd reordering in which MT jobs are moved ahead of non-MT jobs, but only if #CPUs used is < #CPUs (see promote_multi_thread_jobs())

- client: the seqno of jobs in progress but not selected was being set to zero. It should be runnable_jobs.size(). This could potentially cause wrong scheduling decisions.
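
For the GPU debt weight entry above, a rough Python sketch of the difference between the two ratios. This is only an illustration: it assumes "estimated throughput" can be approximated by one flops figure per device, and the function names and numbers are made up rather than taken from the client source.

```python
def gpu_weight_single(gpu_flops, cpu_flops):
    """Old behaviour as described: ratio of 1 GPU to 1 CPU."""
    return gpu_flops[0] / cpu_flops[0]


def gpu_weight_total(gpu_flops, cpu_flops):
    """New behaviour as described: all GPUs over all CPUs."""
    return sum(gpu_flops) / sum(cpu_flops)


if __name__ == "__main__":
    # Hypothetical host: 2 GPUs and 4 CPU cores.
    gpus = [100.0e9, 100.0e9]
    cpus = [2.0e9, 2.0e9, 2.0e9, 2.0e9]
    print("one GPU vs one CPU:", gpu_weight_single(gpus, cpus))
    print("all GPUs vs all CPUs:", gpu_weight_total(gpus, cpus))
```

On a host with more CPU cores than GPUs, the whole-host ratio comes out lower than the one-to-one ratio, so GPU debt counts for less in the total than it did before.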
