
[Split] PowerSpectrum Unit Test

Jason G:
Well, to get that 30-50% speedup (1.5-2x) on the small GPU, we went a bit further than what the nVidia documentation specifies for efficient reductions, and the code 'looks nice' (a good sign in engineering)... The larger sizes are still to go. We might have to send some notes back to nVidia after we finish this, to update the optimisation manual a bit  :o
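
For anyone following along, here is a minimal CPU-side sketch of the general shape of a power spectrum followed by a per-block sum/max reduction. It is not the test code itself and shows none of the optimisations discussed here; it only mimics the stride-halving tree that a GPU block would run in shared memory. The function names, the default block size and the power-of-two restriction are assumptions made for illustration.

```python
def power_spectrum(samples):
    """Power of each complex sample: re^2 + im^2."""
    return [s.real * s.real + s.imag * s.imag for s in samples]


def block_sum_max(values, threads_per_block=128):
    """Reduce each block of values to a (sum, max) pair.

    Mimics a shared-memory tree reduction: on every step the active
    'threads' (tid < stride) fold in the element one stride away.
    threads_per_block is an illustrative parameter and must be a power of two.
    """
    if threads_per_block < 1 or threads_per_block & (threads_per_block - 1):
        raise ValueError("threads_per_block must be a power of two")

    results = []
    for start in range(0, len(values), threads_per_block):
        block = list(values[start:start + threads_per_block])
        pad = threads_per_block - len(block)
        # Pad a short final block with identity elements for each operation.
        sums = block + [0.0] * pad
        maxs = block + [float("-inf")] * pad

        stride = threads_per_block // 2
        while stride > 0:
            for tid in range(stride):
                sums[tid] += sums[tid + stride]
                maxs[tid] = max(maxs[tid], maxs[tid + stride])
            stride //= 2

        results.append((sums[0], maxs[0]))
    return results
```

On a real GPU each pass of the inner loop would be one step executed by all active threads at once, with a barrier between steps; the sequential loop here only reproduces the arithmetic order.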

Jason G:

--- Quote from: _heinz on 22 Dec 2010, 05:53:11 pm ---looks like an issue?
--- End quote ---

Not 'our' problem  ;)  See what MSI Afterburner says (for memory). Maybe they confuse ION & ION2, I don't know.

Jason G:
First post updated:

--- Quote ---Update: PowerSpectrum(+summax reduction) Test #8 - 'Sanity check'
- Check of all needed reduction sizes
- Minimal changes to the larger sizes: anything larger than the selected thrds/blk is 'almost' stock (but a bit better)
- Looking for any hardware that could yield [BAD] instead of [OK] on some sizes, particularly around selected thrds/blk
- Don't need full results, just confirmation all [OK] & no Opt1 'worst case' slower than stock
- Intend to integrate FFTs next, so this is a critical sanity check.
- With all sizes included it's a longer run, and it may take several runs before a '[BAD]' manifests.

--- End quote ---
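
To make the [OK]/[BAD] idea concrete, here is a rough sketch of what a sanity check of this kind amounts to: run a candidate reduction against a trusted reference several times per size and flag any mismatch. It is not the actual test harness. The size list, tolerance, run count and function names are all assumptions, the candidate is just a CPU stand-in for the optimised path, and the timing comparison against stock is left out.

```python
import math
import random


def reference_sum_max(values):
    """Trusted reference: accurately rounded sum and plain max."""
    return math.fsum(values), max(values)


def candidate_sum_max(values):
    """Stand-in for the optimised path: pairwise (tree) reduction."""
    sums = list(values)
    maxs = list(values)
    n = len(sums)
    while n > 1:
        half = (n + 1) // 2  # an odd leftover element is carried to the next pass
        for i in range(n // 2):
            sums[i] += sums[i + half]
            maxs[i] = max(maxs[i], maxs[i + half])
        n = half
    return sums[0], maxs[0]


def check_size(size, runs=5, rel_tol=1e-6, seed=0, candidate=candidate_sum_max):
    """Return '[OK]' if every run matches the reference, else '[BAD]'."""
    rng = random.Random(seed)
    for _ in range(runs):
        values = [rng.random() for _ in range(size)]
        ref_sum, ref_max = reference_sum_max(values)
        got_sum, got_max = candidate(values)
        sum_ok = math.isclose(got_sum, ref_sum, rel_tol=rel_tol)
        if got_max != ref_max or not sum_ok:
            return "[BAD]"
    return "[OK]"


def sanity_check(sizes=(8, 16, 32, 64, 128, 256, 512, 1024)):
    """Check every size (the list is hypothetical) and collect the verdicts."""
    return {size: check_size(size) for size in sizes}
```

Calling `sanity_check()` returns a dict mapping each size to its verdict. Repeating the runs matters because an intermittent hardware fault may only show up in some of them.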

Please test repeatedly on all Cuda enabled GPUs... No posting of results please (too large for me to look through, I'll go crosseyed  ;)), just confirm all Opt1 [OK] & faster at all sizes, and alert me if you see any marked [BAD] or too slow. You may need to run several times to see if a problem appears or not.

Jason

glennaxl:
All systems are go except....

gtx 295
core 0 - 1 bad at test 1/5 under 128 size
core 1 - 1 slow at test 2/5 under 128 size

gtx 260 - 1 slow at test 4/5 under 256 size

Jason G:

--- Quote from: glennaxl on 23 Dec 2010, 06:05:33 am ---All systems are go except....

gtx 295
core 0 - 1 bad at test 1/5 under 128 size
core 1 - 1 slow at test 2/5 under 128 size

gtx 260 - 1 slow at test 4/5 under 256 size

--- End quote ---

Thanks!  On the 295, is the video memory OC'd?  I found here that Opt1 around size = #thrds/block (256 on Fermi, 128 on 2xx) can be unstable if the VRAM OC is pushed.  I had to back off my video memory OC by 80MHz for it to stabilise.

GTX260 - Please run that one a few times & see if it's consistently slower than stock at size 256.  I will be checking that code in the meantime.
[Edit:] I see you did, & got one slow out of 5 ... OK


[Edit2:] Darn, 128 is still a little unstable here too  ???  I will dial sizes 128 & 256 back & replace the test shortly (might be pushing a tad hard).

Jason
