Unified installer add flops
efmer (fred):
--- Quote from: Jason G on 06 Dec 2009, 12:29:24 pm ---
--- Quote from: Fred M on 06 Dec 2009, 12:04:50 pm ---There is a flops value in the file, per application. But it can be either the manually placed value or the system value.
And the original system value is off by about a factor of 10 from what it should be on my system.
Another problem is that you have to be lucky enough to have a CUDA task in the scheduler file.
--- End quote ---
Just thinking, another option is to obtain/calculate the values we need programmatically through the CUDA API.
Not sure what Boinc uses now, but IIRC the first releases derived a value from the clocks & number of multiprocessors.
I think it may be most reliable if we calculate it ourselves and scale as required. Being independent of Boinc, it should be simpler than processing Boinc files that may be subject to change in content or backend value/meaning per Boinc version.
The other appeal, to me, is that it should (at least partially) work for unknown/unlisted/unreleased cards, which might be an advantage over using lookup tables too (less maintenance, i.e. not having to make a new release every time nVidia releases/renames a card) ... it should also account for OC.
Thoughts ?
--- End quote ---
That's my preference, the direct approach. If we go through BOINC, everything will depend heavily on the BOINC version.
I've taken a quick look at the API and you can get the clock rate, the number of multiprocessors, etc. That may be enough to get an estimated flops value.
Richard Haselgrove:
BOINC changed from 'real' flops to 'peak' flops with http://boinc.berkeley.edu/trac/changeset/19310, so you can see both versions there - they're inline functions in trunk/boinc/lib/coproc.h
Here are the fields BOINC knows about for a CUDA card. This is taken from the information sent by the BOINC client to every project indiscriminately whenever 'update' is clicked (or a scheduler contact for any other purpose). You don't have to be requesting or reporting a CUDA task - in fact, you don't need to be talking to a CUDA-capable project at all. The only ambiguous field is 'clockRate': I've checked with GPU-Z, and the value is correct for the shader clock (as it should be). BOINC rates this card as 484 GFLOPS peak.
--- Code: ---
<coprocs>
<coproc_cuda>
<count>1</count>
<name>GeForce 9800 GTX/9800 GTX+</name>
<req_secs>25769.711159</req_secs>
<req_instances>0.000000</req_instances>
<estimated_delay>0.000000</estimated_delay>
<drvVersion>19038</drvVersion>
<cudaVersion>2030</cudaVersion>
<totalGlobalMem>536543232</totalGlobalMem>
<sharedMemPerBlock>16384</sharedMemPerBlock>
<regsPerBlock>8192</regsPerBlock>
<warpSize>32</warpSize>
<memPitch>262144</memPitch>
<maxThreadsPerBlock>512</maxThreadsPerBlock>
<maxThreadsDim>512 512 64</maxThreadsDim>
<maxGridSize>65535 65535 1</maxGridSize>
<totalConstMem>65536</totalConstMem>
<major>1</major>
<minor>1</minor>
<clockRate>1890000</clockRate>
<textureAlignment>256</textureAlignment>
<deviceOverlap>1</deviceOverlap>
<multiProcessorCount>16</multiProcessorCount>
</coproc_cuda>
</coprocs>
--- End code ---
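For what it's worth, the 484 GFLOPS figure can be reproduced from just two of those fields. A minimal Python sketch, assuming the peak formula is shader clock x multiprocessors x cores per multiprocessor x 2 ops per clock, with 8 cores per multiprocessor on a compute capability 1.x card. Those two constants are my assumptions rather than something copied from coproc.h, and `peak_gflops`, `CORES_PER_MP_CC1` and `OPS_PER_CLOCK` are made-up names. clockRate is taken to be in kHz, as the CUDA API reports it.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <coproc_cuda> block above; only the fields used here.
COPROC_XML = """
<coprocs>
<coproc_cuda>
   <name>GeForce 9800 GTX/9800 GTX+</name>
   <major>1</major>
   <minor>1</minor>
   <clockRate>1890000</clockRate>
   <multiProcessorCount>16</multiProcessorCount>
</coproc_cuda>
</coprocs>
"""

CORES_PER_MP_CC1 = 8  # assumption: cores per multiprocessor on compute capability 1.x
OPS_PER_CLOCK = 2     # assumption: one multiply-add (2 flops) per core per clock


def peak_gflops(xml_text):
    """Theoretical peak in GFLOPS from clockRate (kHz) and multiProcessorCount."""
    card = ET.fromstring(xml_text).find("coproc_cuda")
    clock_hz = int(card.findtext("clockRate")) * 1000  # kHz -> Hz
    mp_count = int(card.findtext("multiProcessorCount"))
    return clock_hz * mp_count * CORES_PER_MP_CC1 * OPS_PER_CLOCK / 1e9


print(round(peak_gflops(COPROC_XML), 2))  # 483.84
```

Under those assumptions the result rounds to the 484 GFLOPS quoted above. It is a brochure-style peak, not a measure of what a real application achieves.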
Jason G:
Good to see both versions via the changeset, cheers (bookmarking). We might use one of those estimates, scaled, initially, then try to whip up a more accurate bench later down the road if we find that would compensate better for memory speeds etc.
efmer (fred):
These values look like they come straight out of the CUDA API. But what does the BOINC client actually do with them?
My card gives 596 GFlops and the value I found to be correct is 22 GFlops. The latter is the one in the flops statement in the XML.
A bit of a difference. And as already mentioned, the values differ quite a bit between BOINC versions.
I see 62 GFlops on an older BOINC client, and that card is no worse than half the speed of the other one. I found it to be about 14 GFlops.
So 596 rated = 22 actual, and 62 rated = 14 actual.
It looks like these values don't have much to do with the actual calculation speed. More like theoretical values out of the sales brochure.
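To put numbers on that mismatch, here is a small Python sketch that works out what fraction of the rated figure each card actually reaches. The figures are the four values quoted in this post; the card labels and the `efficiency` helper are just names I made up for illustration.

```python
# (rated GFLOPS shown by BOINC, GFLOPS found to be correct) from the post above
cards = {
    "card A": (596.0, 22.0),
    "card B (older BOINC client)": (62.0, 14.0),
}


def efficiency(rated, measured):
    """Fraction of the rated figure that is actually achieved."""
    return measured / rated


ratios = {name: efficiency(rated, measured)
          for name, (rated, measured) in cards.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1%} of rated")

# How far apart the two ratios are; 1.0 would mean one scale factor fits both.
spread = ratios["card B (older BOINC client)"] / ratios["card A"]
print(f"ratios differ by a factor of {spread:.1f}")
```

The two ratios come out roughly a factor of six apart, so no single scale factor maps the rated values from both BOINC versions onto the real speeds.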
Jason G:
--- Quote from: Fred M on 06 Dec 2009, 04:13:43 pm ---...
It looks like these values don't have much to do with the actual calculation speed. More like theoretical values out of the sales brochure.
...
--- End quote ---
Absolutely.
[rant]
I actually got a little agitated when I installed Boinc 6.10.18 on my system & noticed it was claiming ~317 GFlops 'peak' ... (the words 'when hell freezes over' came to mind ;) Then I had a good laugh about it and felt much better). Of course with memory-bound algorithms like larger FFT sizes, on that hardware, real-world performance is more like 18-20 GFlops, around twice that of each of my CPU cores on the same problem. I've little doubt that kernels that do multiple redundant operations on the same data repeatedly, sitting in registers, on register-sized (very small, ~8k total IIRC) datasets could achieve that kind of throughput ... The dumb thing is that sounds more like frame-by-frame graphics processing than general-purpose computation.[/rant]
I'm fairly certain the synthetic estimates should be good enough, provided we scale the number appropriately to a realistic range... but there is the alternative of benching with real code if we find better accuracy is needed (which I doubt, but the option is there).
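As a rough illustration of what scaling to a realistic range could look like, here is a Python sketch. The scale factor is only an example, taken from the figures in this post (about 19 GFlops real, the midpoint of 18-20, against 317 GFlops claimed); `scaled_flops` and `EFFICIENCY` are hypothetical names, and a real installer would need a factor checked against more than one card.

```python
def scaled_flops(peak_gflops, efficiency):
    """Scale a synthetic peak figure down to a more realistic estimate."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return peak_gflops * efficiency


# Example factor only: ~19 GFlops real vs 317 GFlops claimed on one system.
EFFICIENCY = 19.0 / 317.0

# Applied to the card the factor came from, then to the 484 GFlops card
# mentioned earlier in the thread.
print(round(scaled_flops(317.0, EFFICIENCY), 1))
print(round(scaled_flops(484.0, EFFICIENCY), 1))
```

A single constant like this assumes every card loses the same fraction of its peak, which memory speed differences may well break; that is where benching with real code would come in.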
Jason