BOINC 6.10.x (Alpha) For Windows

Richard Haselgrove:

--- Quote from: Claggy on 10 Oct 2009, 09:10:54 am ---This post is the Bible for setting up an app_info's flop values to try and equalise DCF and work fetch between CPU and GPUs:

app_info for AP503, AP505, MB603 and MB608

If you take your GPU flops you get:

428 000 000 000 * 0.2 = 85 600 000 000

...

--- End quote ---

ooooooh - er - I'd be very careful about that.

That sounds like the same as saying that my 9800GT does 508 GFlops (which it does, according to NVidia's marketing people), and basing the calculation on that.

Whatever the hype, I'd be very doubtful that ATI cards are ten times faster than CUDA cards - and before anyone calls "Milkyway Credit", please show me the technical underpinning for that one, too.

Safest advice for now is just to put the CPU floating-point benchmark value in, and work upwards from there to normalise the DCF.

Raistmer:
Yes, using the CPU value would not give too big an error.

And about MW credits... well, there is no project parity in credits, as was discussed over the last few weeks on the BOINC dev mailing list... And IMO a good way to ensure such parity is unfortunately not supported by the BOINC administration.

Christoph:
Does anybody know when 6.10.14 will come out? Hopefully it will correct some errors like that.

Rectifier:
Well, seeing as .13 was only released a few days ago - expect it to take a few more weeks at most. Besides, the problems discussed were mostly fixed in .13.

Claggy:
BOINC 6.10.14 is out:

boinc_6.10.14_windows_intelx86.exe

boinc_6.10.14_windows_x86_64.exe

Claggy

Edit:

Change log for 6.10.14

Rom 7 October 2009
- MGR: Fix the Statistics page Save/Restore project display feature.

Charlie 8 October 2009
- MGR: If aborting multiple tasks, ask "Are you sure?" only once.

Rom 16 October 2009
- client: Fix crash that was introduced 7 months ago. (From Nicolás Alvarez)

- client: remove redundant 0s in job log

- client: add --unsigned_apps_ok cmdline option and <unsigned_apps_ok> config option. This tells the client to allow unsigned apps. For testing. No file xfers or other network traffic will be allowed if set.

- client: add <exit_after_finish> option (same as cmdline flag)

- client: add <skip_cpu_benchmarks> option (same as cmdline flag)

- client: print a message when aborting an unstarted job that is past its deadline

- client: improve the message shown when NVIDIA drivers are present but there is no GPU

- client: if anonymous platform description (app_info.xml) doesn't specify FLOPS for a GPU app, assume that it runs at CPU peak speed rather than GPU peak speed. Better to be conservative, otherwise job might be aborted due to time limit exceeded.
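As a rough illustration of why the conservative fallback helps, here is a minimal Python sketch. It is not the client's code: the function names are made up for this example, and it assumes the time limit is an operation bound divided by the FLOPS figure, so a larger FLOPS figure means a shorter limit.

```python
from typing import Optional


def effective_flops(declared_flops: Optional[float], cpu_peak_flops: float) -> float:
    """Pick the FLOPS figure for a GPU app version (hypothetical helper).

    If app_info.xml declared a value, use it; otherwise fall back to the
    CPU peak speed, as the changelog entry describes.
    """
    if declared_flops is not None and declared_flops > 0:
        return declared_flops
    return cpu_peak_flops


def runtime_limit_seconds(fpops_bound: float, flops: float) -> float:
    """Assumed model: time limit = operation bound / FLOPS figure."""
    return fpops_bound / flops


if __name__ == "__main__":
    cpu_peak = 3e9      # illustrative CPU benchmark value
    gpu_peak = 500e9    # illustrative GPU marketing peak
    bound = 1e15        # illustrative operation bound for one task

    # No FLOPS in app_info.xml: the CPU figure gives a much longer limit
    # than the GPU peak would, so the job is less likely to be aborted.
    print(runtime_limit_seconds(bound, effective_flops(None, cpu_peak)))
    print(runtime_limit_seconds(bound, gpu_peak))
```

The numbers are placeholders; the point is only the direction of the effect.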

- client: on startup, if a coproc needed by a job is missing, set a "coproc_missing" flag rather than aborting the job. If the user removes a GPU board while there's a large queue of GPU jobs, they'll stay queued (until their deadline passes).

Note: this doesn't fix the situation where user connects via Remote Desktop while GPU jobs are running or queued. We should check for Remote Desktop every minute or so, and stop GPU jobs.

- client: the get_all_projects_list() RPC doesn't require auth

- client: don't multiply checkpoint interval (i.e., "disk interval" pref) by # processors.

- actually, make it "Tasks checkpoint to disk at most every ..." and change it in the advanced prefs dialog too

- LIB: Make the is_remote_desktop compilable for all VS versions and SKUs.

- MGR: Fix initial first connection problem on startup. I'm not sure why it was only happening at startup; there might have been a few crashes because of this issue as well. The basic problem is that wxWidgets had an exception handler around the initial frame creation, and when the first GUI RPC was issued during menu creation to detect whether or not we were attached to an account manager, the GUI thread would go about doing idle processing while waiting for the GUI RPC thread to initialize. During this time the frame pointer is NULL and was getting dereferenced, which would halt window construction and stay there until some other event was fired.

- MGR: Initial dose of code cleanup and shuffling. Order the menu functions in the order in which they are displayed in the menu.

- client: address the situation where GPUs become unusable for certain periods (e.g. when Remote Desktop is used on Win).

* add is_usable() member function to COPROC.

Currently this just calls the respective (CUDA or CAL) initialization function. We need to check whether this works and/or causes problems.

* in enforce_schedule(), check whether usability has changed for each GPU type.

If we've gone from usable to unusable, flag all jobs for that GPU as coproc_missing (so they won't get run, and will quit if they're running). If we've gone from unusable to usable, clear the flag.

This should deal with all cases except where the client is started up with GPUs unusable.
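The usable/unusable transition described above can be sketched in a few lines of Python. This is only an illustration of the flag logic, not the client's implementation: the `Job` class, the `update_coproc_flags` name, and the dictionaries of per-type usability are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Job:
    name: str
    gpu_type: str                 # e.g. "CUDA" or "CAL"
    coproc_missing: bool = False  # set while the job's GPU type is unusable


def update_coproc_flags(jobs: List[Job],
                        was_usable: Dict[str, bool],
                        is_usable: Dict[str, bool]) -> None:
    """Flag or unflag jobs when a GPU type changes usability.

    Only transitions matter: usable -> unusable sets the flag,
    unusable -> usable clears it. A type with no previous state is
    treated as unchanged, so nothing happens for it.
    """
    for gpu_type, usable_now in is_usable.items():
        usable_before = was_usable.get(gpu_type, usable_now)
        if usable_before == usable_now:
            continue
        for job in jobs:
            if job.gpu_type == gpu_type:
                job.coproc_missing = not usable_now


if __name__ == "__main__":
    jobs = [Job("mb_1", "CUDA"), Job("mw_1", "CAL")]
    # Remote Desktop session starts: CUDA becomes unusable.
    update_coproc_flags(jobs, {"CUDA": True, "CAL": True},
                        {"CUDA": False, "CAL": True})
    print([(j.name, j.coproc_missing) for j in jobs])
```

Because the sketch only reacts to changes, it has the same blind spot the changelog mentions: if the GPUs are already unusable at startup, there is no transition to react to.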

- client: bug fixes to the above. Don't fetch work for an unusable resource.

- update cal.h to current ATI code

- client/scheduler: standardize the FLOPS estimate between NVIDIA and ATI. Make them both peak FLOPS, according to the formula supplied by the manufacturer.

The impact on the client is minor:

* the startup message describing the GPU
* the weight of the resource type in computing long-term debt

On the server, I changed the example app_plan() function to assume that app FLOPS is 20% of peak FLOPS (that's about what it is for SETI@home).
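The 20% rule is the same arithmetic as in the quoted post (428 000 000 000 * 0.2). A minimal Python sketch, with the function name and the default efficiency constant being assumptions for this example rather than anything taken from the server code:

```python
APP_EFFICIENCY = 0.2  # fraction of peak FLOPS the app is assumed to reach


def app_flops_from_peak(peak_flops: float, efficiency: float = APP_EFFICIENCY) -> float:
    """Estimate application FLOPS as a fixed fraction of the card's peak FLOPS."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return peak_flops * efficiency


if __name__ == "__main__":
    # 428 GFLOPS peak, as in the quoted post: roughly 85.6 GFLOPS for the app.
    print(f"{app_flops_from_peak(428e9) / 1e9:.1f} GFLOPS")
    # 508 GFLOPS peak (the 9800GT marketing figure mentioned above).
    print(f"{app_flops_from_peak(508e9) / 1e9:.1f} GFLOPS")
```

Whether 20% is realistic for a given card and application is exactly the point under dispute in this thread, so treat the constant as a tunable guess.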

- client: the weight of GPU debt in computing total debt should be (estimated throughput of all GPUs)/(estimated throughput of all CPUs) rather than the ratio of 1 GPU to 1 CPU. This change will hopefully cause ratios of granted credit to more closely match resource shares.
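To make the difference between the two ratios concrete, here is a small Python sketch. The function names and the throughput figures are invented for illustration; only the formula (all GPUs over all CPUs, instead of one GPU over one CPU) comes from the changelog entry.

```python
from typing import Sequence


def gpu_debt_weight(gpu_flops: Sequence[float], cpu_flops: Sequence[float]) -> float:
    """New weighting: estimated throughput of all GPUs / all CPUs."""
    total_cpu = sum(cpu_flops)
    if total_cpu <= 0:
        raise ValueError("total CPU throughput must be positive")
    return sum(gpu_flops) / total_cpu


def old_gpu_debt_weight(one_gpu_flops: float, one_cpu_flops: float) -> float:
    """Old weighting: ratio of a single GPU to a single CPU."""
    return one_gpu_flops / one_cpu_flops


if __name__ == "__main__":
    gpus = [100e9, 100e9]   # two GPUs, illustrative estimates
    cpus = [2.5e9] * 4      # four CPU cores, illustrative estimates
    print(old_gpu_debt_weight(gpus[0], cpus[0]))  # one GPU vs one core
    print(gpu_debt_weight(gpus, cpus))            # whole host
```

On a host with more CPU cores than GPUs, the per-device ratio overstates the GPU share, which is presumably why the whole-host ratio was chosen.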

- client: multi-thread jobs were being given too high priority; in particular, they were preempting jobs in the middle of time slice.

Solution:
1) don't use MT in the sort order defined by more_important().
2) add a 2nd reordering in which MT jobs are moved ahead of non-MT jobs, but only if #CPUs used is < #CPUs (see promote_multi_thread_jobs())
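One possible reading of step 2, sketched in Python. This is an interpretation of the changelog wording, not the client's promote_multi_thread_jobs(): the `Task` class and the "window" idea (only jobs reached while fewer than all CPUs are in use get reordered) are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    name: str
    ncpus: int  # number of CPUs the task uses; > 1 means multi-threaded


def promote_multi_thread_jobs(jobs: List[Task], ncpus: int) -> List[Task]:
    """Move MT jobs ahead of non-MT jobs among the jobs that would run.

    `jobs` is assumed to be already sorted by importance. Walk the list
    while the CPUs used so far are below `ncpus`; within that prefix, put
    multi-threaded jobs first (keeping relative order), and leave the rest
    of the list untouched.
    """
    cpus_used = 0
    window_end = 0
    for job in jobs:
        if cpus_used >= ncpus:
            break
        cpus_used += job.ncpus
        window_end += 1

    window = jobs[:window_end]
    mt = [j for j in window if j.ncpus > 1]
    single = [j for j in window if j.ncpus <= 1]
    return mt + single + jobs[window_end:]


if __name__ == "__main__":
    queue = [Task("a", 1), Task("b", 1), Task("c", 4), Task("d", 1)]
    print([t.name for t in promote_multi_thread_jobs(queue, ncpus=4)])
```

With 4 CPUs the multi-threaded task "c" is reached before the CPUs are full and moves to the front; with 2 CPUs the two single-threaded tasks already fill the machine and the order is left alone.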

- client: the seqno of jobs in progress but not selected was being set to zero. It should be runnable_jobs.size(). This could potentially cause wrong scheduling decisions.
