
[Split] PowerSpectrum Unit Test

Frizz:

--- Quote from: Jason G on 08 Dec 2010, 08:51:55 am ---probably do Batman really well though  ::)

--- End quote ---

LOL

Yeah ... memory. They chopped the memory interface. I guess they did this so it doesn't get too close to the 580, and not too far ahead of the 480.

GTX570: 320bit
GTX480 & GTX580: 384bit
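
To put those bus widths in perspective, here is a rough sketch of the usual peak-bandwidth arithmetic (bytes per transfer times transfers per second). The data rate below is an assumed, illustrative figure and not a spec for any of these cards; the function and constant names are made up for this example.

```python
def peak_bandwidth_gbs(bus_width_bits, data_rate_mtps):
    """Theoretical peak bandwidth in GB/s for a given bus width and data rate."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * data_rate_mtps * 1e6 / 1e9


# Assumption: same effective data rate for every card, purely for comparison.
ASSUMED_DATA_RATE_MTPS = 4000

for name, width in (("GTX570", 320), ("GTX480 & GTX580", 384)):
    bandwidth = peak_bandwidth_gbs(width, ASSUMED_DATA_RATE_MTPS)
    print(f"{name}: {width}bit -> {bandwidth:.0f} GB/s at the assumed data rate")
```

At equal memory clocks the narrower bus gives 320/384 of the bandwidth, so roughly 17% less. The real gap depends on each card's actual memory clock.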

Jason G:
@All: In the meantime, having identified the major issues at play in these code areas, along with appropriate techniques to use, I have come up with some ideas for a major redesign of the FFT->Powerspectrum->Summax(reduction)->FindSpikes pipeline, which currently accounts for around 40%-60% of processing.
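
For anyone following along who hasn't looked at that pipeline, a tiny CPU-side sketch of the four stages may help. This is not the application's code and not the redesign: it uses a naive DFT in place of a real FFT, and the function names and the threshold value are hypothetical, chosen only to show how the stages feed each other.

```python
import cmath


def dft(samples):
    """Naive O(n^2) DFT, standing in for the FFT stage."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def power_spectrum(bins):
    """Power per bin: re^2 + im^2."""
    return [z.real * z.real + z.imag * z.imag for z in bins]


def summax(power):
    """Reduction stage: total power, peak power, and the bin holding the peak."""
    peak_bin = max(range(len(power)), key=power.__getitem__)
    return sum(power), power[peak_bin], peak_bin


def find_spike(power, threshold):
    """Report (bin, peak/mean) if the peak stands out from the mean, else None."""
    total, peak, peak_bin = summax(power)
    mean = total / len(power)
    if mean > 0 and peak / mean > threshold:
        return peak_bin, peak / mean
    return None


SPIKE_THRESHOLD = 24.0  # hypothetical value for illustration only

n = 64
tone = [cmath.exp(2j * cmath.pi * 5 * t / n) for t in range(n)]  # pure tone in bin 5
print(find_spike(power_spectrum(dft(tone)), SPIKE_THRESHOLD))
```

Note that only the summax result is needed to decide whether there is a spike, which is why the size of what gets copied back to the host matters so much on the GPU.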

I'll change the format of the next test quite a bit, and spend time tomorrow to get things underway toward #7.

Jason

Raistmer:
I would suggest testing these code samples at different GPU-to-memory frequency ratios.
SubSpace's experiment with the beta OpenCL apps showed that this is a very informative approach.
(He established that HD5 wins over the usual OpenCL MB app if the GPU engine is relatively fast and the memory relatively slow, while if the GPU clock is lowered the usual app wins.)
I think this largely explains why other testers see longer execution times on VLAR for HD5 than for the usual app: their GPUs are not so fast relative to their memory.
Memory influence can be highlighted quite clearly this way.
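
One way to picture why the winner can flip with the clock ratio is a simple roofline-style estimate, where a kernel takes as long as the slower of its compute and memory parts. Everything in this sketch is an assumption: the two workloads are invented (they are not measurements of HD5 or the usual app), and it pretends one op per core cycle and one byte per memory cycle.

```python
def kernel_time_s(ops, bytes_moved, core_mhz, mem_mhz):
    """Estimated time: whichever of compute or memory traffic is slower.

    Assumes one op per core cycle and one byte per memory cycle (hypothetical).
    """
    compute_s = ops / (core_mhz * 1e6)
    memory_s = bytes_moved / (mem_mhz * 1e6)
    return max(compute_s, memory_s)


# Invented workloads: one trades extra arithmetic for less memory traffic.
MATH_HEAVY = {"ops": 4e6, "bytes_moved": 1e6}
MEMORY_HEAVY = {"ops": 1e6, "bytes_moved": 3e6}


def faster_variant(core_mhz, mem_mhz):
    """Name of the variant with the lower estimated time at these clocks."""
    a = kernel_time_s(core_mhz=core_mhz, mem_mhz=mem_mhz, **MATH_HEAVY)
    b = kernel_time_s(core_mhz=core_mhz, mem_mhz=mem_mhz, **MEMORY_HEAVY)
    return "math-heavy" if a < b else "memory-heavy"


print(faster_variant(core_mhz=1000, mem_mhz=500))  # fast core, slow memory
print(faster_variant(core_mhz=400, mem_mhz=500))   # core clock lowered
```

Sweeping the core clock while holding the memory clock fixed (or the reverse) is the experiment being suggested; the crossover point shows where a given kernel stops being memory bound.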

Jason G:
Yes, in fact that's exactly what happened, confirming the memory-bound nature of what's going on (from yet another angle).

Steve's 480 core is clocked considerably higher than mine, yet he was initially achieving lower throughput than my card.  He tweaked his memory throughput for some improvement. 

After that, a discrepancy between throughput on XP vs Win7 was noted, somewhere around the familiar 10% difference. I added use of pinned memory for the transfers, to try to hide them. With Ghost's help, in the heavy-transfer case (worst case: full summax array copy, as with stock code) the XP-Win7 difference was narrowed to ~4% or less, while the WDDM performance proved more efficient with the raw processing in the best case (no transfers needed).

Now Steve's 480 achieves some 27% more throughput, in the worst case,  than mine does.  I take this as an indication that the transfer hiding is shifting the bottleneck around as intended, and that it's time to move on to more sophisticated code portions with the acquired tools & techniques.
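
As a rough illustration of what transfer hiding buys, here is a toy timing model comparing back-to-back kernel and copy steps against overlapping each copy with the next kernel. It is only a model with made-up timings, not the test code. In real CUDA code the overlap needs page-locked host memory (for example via `cudaMallocHost`) together with `cudaMemcpyAsync` on a stream.

```python
def serial_time(chunks, t_kernel, t_copy):
    """Each chunk runs its kernel, then waits for its copy-back to finish."""
    return chunks * (t_kernel + t_copy)


def overlapped_time(chunks, t_kernel, t_copy):
    """Copy-back of chunk i overlaps the kernel of chunk i+1.

    Only the first kernel and the last copy are fully exposed; in between,
    the slower of the two stages sets the pace.
    """
    if chunks == 0:
        return 0.0
    return t_kernel + (chunks - 1) * max(t_kernel, t_copy) + t_copy


# Made-up per-chunk timings in milliseconds, for illustration only.
chunks, t_kernel, t_copy = 10, 2.0, 1.0
print("serial:    ", serial_time(chunks, t_kernel, t_copy), "ms")
print("overlapped:", overlapped_time(chunks, t_kernel, t_copy), "ms")
```

Once the copies are hidden, total time tracks `max(t_kernel, t_copy)`, so whichever stage is slower becomes the visible bottleneck. That fits the observation that the faster-clocked card pulls ahead after the change.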

Still learning stuff every day with these things.

Jason

SciManStev:
I have kicked my 480 memory speed up to 1975 MHz, with plenty of room to go. The 480 cores are clocked at 860 MHz. I tried to increase my CPU memory, but 1774 MHz is as fast as I can get it. I was able to increase my CPU speed to 4.26 GHz with hyperthreading enabled, while maintaining about 57°C to 60°C core temps.

Steve
