Just a suggestion: it would be nice to add a flops statement to the app_info.xml. Because BOINC still doesn't have a separate CPU/GPU scheduler, the flops statement is essential if you have a faster GPU installed. Without the flops, I see GPU tasks estimated at almost 3 hours when they actually take 15 minutes or less, and this causes the scheduler to request no GPU tasks.
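For anyone following along, a flops entry goes inside the <app_version> block of app_info.xml. A minimal sketch is below; the app name, version number, and the 5.0e10 (50 GFLOPS) figure are placeholders only, to be replaced with values matching your installation and your card's measured throughput:

```xml
<app_version>
  <app_name>setiathome_enhanced</app_name>
  <version_num>608</version_num>
  <!-- estimated speed of this app on this host, in flops;
       5.0e10 = 50 GFLOPS is only a placeholder value -->
  <flops>5.0e10</flops>
  <coproc>
    <type>CUDA</type>
    <count>1</count>
  </coproc>
</app_version>
```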
Quote from: Fred M on 06 Dec 2009, 05:13:59 am
Just a suggestion: it would be nice to add a flops statement to the app_info.xml. Because BOINC still doesn't have a separate CPU/GPU scheduler, the flops statement is essential if you have a faster GPU installed. Without the flops, I see GPU tasks estimated at almost 3 hours when they actually take 15 minutes or less, and this causes the scheduler to request no GPU tasks.

Have been thinking about *something* along those lines, so it must be a good idea. It's on a pretty long list of stuff that needs looking at... Have not managed to work out how to best determine a default entry. I'm hoping to avoid having to parse BOINC state files (as they can be problematic for a few reasons), and installer parsing capabilities aren't easy to program.

It might be best to leave that to a separate small GUI app (perhaps optionally installed by the installer and run after setup) which could be run even some time after installation (for 'recalibration' to new hardware etc.; it could evolve into a kind of control panel applet or similar). If that looks like the way to go (when I get to it), then I'll liaise with Richard Haselgrove on the latest best defaults. So far I've been preoccupied with my own 'fiddling', rather than concerned with how flops is dealt with, now that new BOINC versions seem to have multiplied the performance of my card by five for no apparent technical/practical purpose other than marketing.

Jason
The easiest way is via a GPU card list because, as you already stated, the reported values are not very accurate; they have to be as high as possible, marketing-wise. If you are interested I could help you with some things, as I have some experience. Detecting the nVidia and AMD cards is fairly easy to do.
Quote from: Jason G on 06 Dec 2009, 05:36:10 am
Have been thinking about *something* along those lines, so it must be a good idea. It's on a pretty long list of stuff that needs looking at... Have not managed to work out how to best determine a default entry. I'm hoping to avoid having to parse BOINC state files (as they can be problematic for a few reasons), and installer parsing capabilities aren't easy to program.

Agreed, parsing state files is a pain. The 'correct' way to do it within BOINC would be a GUI_RPC, but I've just done one with <get_host_info> (against v6.10.21), and it only mentions the CPU. No reference to the CUDA card at all!!

Fred, please check and confirm: they've forgotten to add any GPU information to the standard GUI_RPC calls (<get_state> says nothing either). I'll check with DA, too.

Pending that, there is no way of getting GPU information even by file parsing; it would have to be direct detection. I would suggest it would be better to query the card(s) directly for shaders/speeds/compute capability, and impute a figure from that, like the old BOINC 'est flops' figure (peak is useless), rather than working from a look-up list which would be difficult to maintain and would rapidly go out of date.

Quote from: Jason G on 06 Dec 2009, 05:36:10 am
It might be best to leave that to a separate small GUI app (perhaps optionally installed by the installer and run after setup) which could be run even some time after installation (for 'recalibration' to new hardware etc.; it could evolve into a kind of control panel applet or similar). If that looks like the way to go (when I get to it), then I'll liaise with Richard Haselgrove on the latest best defaults.

Happy to work with you on it when the time comes.
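For anyone wanting to reproduce the <get_host_info> check outside the Manager: the BOINC GUI RPC protocol is plain XML over TCP (default port 31416), with each message terminated by a 0x03 byte. A minimal sketch in Python follows; the host and timeout values are arbitrary, and unauthenticated queries like this generally only work from localhost:

```python
import socket

GUI_RPC_PORT = 31416  # default BOINC client GUI RPC port

def build_request(op: str) -> bytes:
    """Frame one GUI RPC request: XML terminated by a single 0x03 byte."""
    xml = "<boinc_gui_rpc_request>\n<%s/>\n</boinc_gui_rpc_request>\n" % op
    return xml.encode() + b"\x03"

def gui_rpc(host: str, op: str) -> str:
    """Send one unauthenticated request (e.g. get_host_info), return reply XML."""
    with socket.create_connection((host, GUI_RPC_PORT), timeout=5) as s:
        s.sendall(build_request(op))
        buf = b""
        # The client terminates its reply with the same 0x03 sentinel.
        while not buf.endswith(b"\x03"):
            chunk = s.recv(4096)
            if not chunk:
                break
            buf += chunk
        return buf.rstrip(b"\x03").decode()
```

Running `gui_rpc("localhost", "get_host_info")` against a v6.10.x client and grepping the reply is a quick way to confirm whether any GPU fields are present.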
Remember also that effective flops are strongly dependent on the version of the CUDA DLLs in use and, judging by recent reports, on the driver version as well (195.xx on Win7 is horribly slow, I read, but that may just be FUD).
I checked the source for <get_host_info> and it only has information about the CPU stored. There is some info in the message log that can be read by RPC calls, but you can only do that after stopping and restarting the BOINC client, and I don't know how accurate that info is.

Do you know of any plans to make the GPU processing self-learning, e.g. adding a separate correction factor in the BOINC client? That would be the best way to go. Otherwise there are several options; one is to read the capabilities of the card directly and make a calculated guess at the actual flops to fill into the xml file.

I have also noticed a considerable performance difference between XP and Win 7, maybe 10-20%. But that will improve once they have solved the problem of making it work at all.
Jorden has reminded us that you can get a sneak preview of the fields that should be in the RPC by looking at a project's sched_request.xml file.
There is a flops value in the file, per application, but it can be either the manually placed value or the system value, and the original system value is about a factor of 10 off from what it should be on my system. Another problem is that you have to be lucky enough that there is a CUDA task in the scheduler file.
Quote from: Fred M on 06 Dec 2009, 12:04:50 pm
There is a flops value in the file, per application, but it can be either the manually placed value or the system value, and the original system value is about a factor of 10 off from what it should be on my system. Another problem is that you have to be lucky enough that there is a CUDA task in the scheduler file.

Just thinking, another option is to obtain/calculate the values we need programmatically through the CUDA API. Not sure what BOINC uses now, but IIRC the first releases derived a value from the clocks and the number of multiprocessors. I think it may be most reliable if we calculate it ourselves and scale as required. Being independent of BOINC, it should be simpler than processing BOINC files that may be subject to change in content or backend value/meaning per BOINC version. The other appeal, to me, is that it should (at least partially) work for unknown/unlisted/unreleased cards, which might be an advantage over using lookup tables too (less maintenance, i.e. not having to make a new release every time nVidia releases/renames a card)... it should also account for OC.

Thoughts?
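To sketch what "calculate it ourselves" might look like: the inputs below (16 multiprocessors, clockRate 1890000 kHz) come from the <coproc_cuda> dump later in this thread; the constants are assumptions, namely 8 cores per multiprocessor for compute capability 1.x, and either 2 flops/clock (MAD) or the marketing-friendly 3 flops/clock (MAD plus dual-issued MUL). A real tool would read multiProcessorCount and clockRate from cudaGetDeviceProperties() instead of hard-coding them:

```python
def peak_flops(multiprocessors: int, clock_khz: int,
               cores_per_mp: int = 8, flops_per_clock: int = 2) -> float:
    """Estimate single-precision flops from raw device properties.

    cores_per_mp=8 assumes compute capability 1.x hardware;
    flops_per_clock=3 reproduces the inflated 'brochure' peak figure.
    """
    return multiprocessors * cores_per_mp * clock_khz * 1e3 * flops_per_clock

if __name__ == "__main__":
    # Values taken from the 9800 GTX+ <coproc_cuda> block in this thread.
    est = peak_flops(16, 1890000)
    print("%.1f GFLOPS" % (est / 1e9))
```

Scaling a figure like this down by an empirical factor (as the old BOINC 'est flops' did) would also work for cards that aren't in any lookup table yet, and picks up overclocking automatically since the clock is read from the device.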
<coprocs>
  <coproc_cuda>
    <count>1</count>
    <name>GeForce 9800 GTX/9800 GTX+</name>
    <req_secs>25769.711159</req_secs>
    <req_instances>0.000000</req_instances>
    <estimated_delay>0.000000</estimated_delay>
    <drvVersion>19038</drvVersion>
    <cudaVersion>2030</cudaVersion>
    <totalGlobalMem>536543232</totalGlobalMem>
    <sharedMemPerBlock>16384</sharedMemPerBlock>
    <regsPerBlock>8192</regsPerBlock>
    <warpSize>32</warpSize>
    <memPitch>262144</memPitch>
    <maxThreadsPerBlock>512</maxThreadsPerBlock>
    <maxThreadsDim>512 512 64</maxThreadsDim>
    <maxGridSize>65535 65535 1</maxGridSize>
    <totalConstMem>65536</totalConstMem>
    <major>1</major>
    <minor>1</minor>
    <clockRate>1890000</clockRate>
    <textureAlignment>256</textureAlignment>
    <deviceOverlap>1</deviceOverlap>
    <multiProcessorCount>16</multiProcessorCount>
  </coproc_cuda>
</coprocs>
...It looks like these values don't have much to do with the actual calculation speed; they're more like theoretical values out of the sales brochure....