[Split] PowerSpectrum Unit Test

Jason G:
@Heinz, something broke in that source you used. Investigating.

Ghost0210:
I've been playing with a couple of other driver versions (263.xx and 256.xx) as well, and neither improves on the figures from the current 260.99 WHQL release drivers.
Was worth doing this just to get an XP machine up and running again - although I'm struggling to remember where anything is.....

Jason G:

--- Quote from: Ghost on 04 Dec 2010, 04:44:04 pm ---Was worth doing this just to get an XP machine up and running again - although I'm struggling to remember where anything is.....

--- End quote ---

Yep, going back is a challenge after adapting. Now that I'm pretty confident the memory transfers are the main factor, I'm hopeful a certain 'trick' may squash the difference. We'll see.

[Edit:] Updated first post:

--- Quote ---Update: powerspectrum Test 6, pinned memory
- Does it improve the 'worst case' optimisation on WDDM versus XPDM?
- Or does it improve both OSes equally? (Or neither; Test 5 remains for comparison.)
--- End quote ---

Will use pinned memory for Opt1 on GPUs that support it.

Ghost0210:
Hi Jason,

Getting an error with the new build saying that cudart32_32_7.dll isn't present. Is this meant to be in the .7z file?

ghost

arkayn:
Just to see if it would run, I made a copy of cudart32_32_16.dll, renamed it to cudart32_32_7.dll, and then ran the test:

Device: GeForce GTX 460, 1600 MHz clock, 768 MB memory.
Compute capability 2.1
Compiled with CUDA 3020.
      PowerSpectrum+summax Unit test #6 (pinned mem)
Stock:
 PwrSpec<    64>   12.9 GFlops   51.4 GB/s   0.0ulps

 SumMax (    64)    1.0 GFlops    4.4 GB/s
Every ifft average & peak OK

 PS+SuMx(    64)    3.4 GFlops   13.6 GB/s


GetPowerSpectrum() choice for Opt1: 256 thrds/block
    256 threads:       19.4 GFlops   77.5 GB/s 121.7ulps


Opt1 (PSmod3+SM): 256 thrds/block
PowerSpectrumSumMax array pinned in host memory.
  256 threads, fftlen 64: (worst case: full summax copy)
         6.0 GFlops   24.4 GB/s 121.7ulps
Every ifft average & peak OK
  256 threads, fftlen 64: (best case, nothing to update)
         7.0 GFlops   28.2 GB/s 121.7ulps
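
A note on the ulps column in output like the above: "ulps" normally stands for units in the last place, i.e. how many representable floating-point values lie between a computed result and a reference result. The thread doesn't say how the unit test aggregates the figure; a fractional value such as 121.7 suggests an average over many outputs, but that is a guess. The Python sketch below only illustrates the general idea for single-precision values. The function names are made up for this example and are not taken from the test source.

```python
import struct

def float32_ordinal(x):
    """Map x (rounded to float32) to an integer so that adjacent
    float32 values map to adjacent integers."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    magnitude = bits & 0x7FFFFFFF
    # Sign-magnitude to signed integer: -0.0 and +0.0 both map to 0.
    return -magnitude if bits & 0x80000000 else magnitude

def ulp_distance(a, b):
    """Number of float32 steps between a and b (NaN is not handled)."""
    return abs(float32_ordinal(a) - float32_ordinal(b))

def mean_ulp_error(reference, candidate):
    """Average ULP distance over two equal-length sequences.
    Averaging is an assumption; the real test may aggregate differently."""
    if len(reference) != len(candidate):
        raise ValueError("sequences must have the same length")
    total = sum(ulp_distance(r, c) for r, c in zip(reference, candidate))
    return total / len(reference)

if __name__ == "__main__":
    one = 1.0
    next_after_one = 1.0 + 2.0 ** -23   # next float32 above 1.0
    print(ulp_distance(one, next_after_one))
    print(mean_ulp_error([1.0, 2.0], [next_after_one, 2.0]))
```

Read this way, an error of about 122 ulps on a value near 1.0 corresponds to a relative difference of roughly 122 × 2^-23, or about 1.5e-5.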
