AVX Optimized App Development

Jason G:
Similar result here on the E8400 (of course).  Darn, now I'm CPU shopping  ::)

Josef W. Segur:

--- Quote from: arkayn on 28 Apr 2011, 09:44:19 pm ---Runs fine on my Q8200
...
--- End quote ---

Thanks, that's a better basis for comparison since it includes the SSE3 chirp, which almost everyone will see. And although I'm not particularly concerned about the 13 lines of assembly code that check the CPU and OS to decide whether AVX is supported, it's good to have confirmation that Win7 SP1 by itself isn't enough.
Joe
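
For anyone following along: the 13 lines of assembly aren't posted in this thread, so the following is only a sketch of the usual AVX check, not Joe's code. It assumes GCC or Clang on x86 for the hardware query. The function names (`avx_usable_from`, `avx_usable`) and macro names are made up for illustration. The idea is that the CPU must report both AVX and OSXSAVE in CPUID leaf 1, and the OS must have enabled SSE and AVX register state in XCR0 (read with XGETBV).

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID leaf 1, ECX feature bits. */
#define CPUID1_ECX_OSXSAVE (1u << 27) /* OS has enabled XSAVE/XGETBV */
#define CPUID1_ECX_AVX     (1u << 28) /* CPU implements AVX          */
/* XCR0 bits 1 and 2: the OS saves/restores XMM and YMM state. */
#define XCR0_SSE_AVX_MASK  0x6u

/* Pure decision logic (hypothetical helper), kept separate from the
 * hardware query so it can be checked on any machine. */
bool avx_usable_from(uint32_t cpuid1_ecx, uint64_t xcr0)
{
    if (!(cpuid1_ecx & CPUID1_ECX_OSXSAVE)) return false;
    if (!(cpuid1_ecx & CPUID1_ECX_AVX))     return false;
    return (xcr0 & XCR0_SSE_AVX_MASK) == XCR0_SSE_AVX_MASK;
}

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>

/* Query the real CPU and OS. Assumes GCC/Clang on x86. */
bool avx_usable(void)
{
    unsigned eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) return false;
    /* XGETBV faults if the OS has not set OSXSAVE, so test it first. */
    if (!(ecx & CPUID1_ECX_OSXSAVE)) return false;
    uint32_t lo, hi;
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    return avx_usable_from(ecx, ((uint64_t)hi << 32) | lo);
}
#else
/* Other compilers or architectures: report AVX as unavailable. */
bool avx_usable(void) { return false; }
#endif
```

Calling `avx_usable()` once at startup and falling back to the SSE3 path when it returns false matches the behaviour seen on the Q8200 and E8400 above: Win7 SP1 supplies the OS half of the check, but those CPUs lack the AVX feature bit.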

Josef W. Segur:
Result from dnolan, sent via PM at NC, on his i7 2600 with Win7 64-bit SP1:


--- Code: ---Ftst_v7 started.
 
Optimal function choices:
-------------------------------------------------------
                            name  timing   error
-------------------------------------------------------
                v_BaseLineSmooth (no other)
 
              v_GetPowerSpectrum 0.00010 0.00000  test
             v_vGetPowerSpectrum 0.00005 0.00000  test
            v_vGetPowerSpectrum2 0.00006 0.00000  test
     v_vGetPowerSpectrumUnrolled 0.00005 0.00000  test
    v_vGetPowerSpectrumUnrolled2 0.00007 0.00000  test
           v_avxGetPowerSpectrum 0.00004 38.07197  test
     v_vGetPowerSpectrumUnrolled 0.00005 0.00000  choice
 
                     v_ChirpData 0.00444 0.00000  test
                   fpu_ChirpData 0.01053 0.00000  test
               fpu_opt_ChirpData 0.00444 0.00000  test
             v_vChirpData_x86_64 0.05060 0.00000  test
               sse1_ChirpData_ak 0.00590 0.00000  test
               sse2_ChirpData_ak 0.00567 0.00000  test
               sse3_ChirpData_ak 0.00556 0.00000  test
                 avx_ChirpData_a 0.00230 0.85637  test
                 avx_ChirpData_b 0.00231 0.85637  test
                     v_ChirpData 0.00444 0.00000  choice
 
                     v_Transpose 0.00270 0.00000  test
                    v_Transpose2 0.00292 0.00000  test
                    v_Transpose4 0.00149 0.00000  test
                    v_Transpose8 0.00271 0.00000  test
                  v_pfTranspose2 0.00161 0.00000  test
                  v_pfTranspose4 0.00149 0.00000  test
                  v_pfTranspose8 0.00313 0.00000  test
                   v_vTranspose4 0.00088 0.00000  test
                 v_vTranspose4np 0.00114 0.00000  test
                v_vTranspose4ntw 0.00716 0.00000  test
              v_vTranspose4x8ntw 0.00298 0.00000  test
             v_vTranspose4x16ntw 0.00085 0.00000  test
            v_vpfTranspose8x4ntw 0.00719 0.00000  test
            v_avxTranspose8x4ntw 0.00299 0.00000  test
            v_avxTranspose8x8ntw 0.00232 9696326.77324  test
             v_vTranspose4x16ntw 0.00085 0.00000  choice
 
                 FPU opt folding 0.00204 0.00000  test
                  AK SSE folding 0.00045 0.00000  test
                  BH SSE folding 0.00043 0.00000  test
                  BH SSE folding 0.00043 0.00000  choice
 
                   Test duration    2.53 seconds
 
Ftst_v7 completed successfully.
--- End code ---

Nice speedups on the Chirp functions, but I obviously need to rework data shuffling.
Joe
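
The thread doesn't say what is wrong in the AVX shuffling, so this is only a guess at one thing worth checking, not a diagnosis. Most 256-bit AVX shuffle and unpack instructions work separately on each 128-bit lane, so SSE shuffle code that is simply widened to 8 floats produces a different element order than a true 8-wide interleave would. Below is a portable scalar model of that in-lane behaviour for an "unpack low" of two 8-float vectors. The function name `unpacklo8_model` is hypothetical and the code uses no intrinsics, so it runs on any CPU.

```c
#include <stddef.h>

/* Scalar model of a 256-bit single-precision "unpack low":
 * each 128-bit lane (4 floats) is interleaved on its own, taking the
 * two low elements of that lane from a and from b. */
void unpacklo8_model(const float a[8], const float b[8], float out[8])
{
    for (size_t lane = 0; lane < 2; ++lane) {
        size_t base = lane * 4;       /* first element of this lane */
        out[base + 0] = a[base + 0];
        out[base + 1] = b[base + 0];
        out[base + 2] = a[base + 1];
        out[base + 3] = b[base + 1];
    }
}
```

With `a = {0..7}` and `b = {10..17}`, the model gives `{0, 10, 1, 11, 4, 14, 5, 15}` rather than `{0, 10, 1, 11, 2, 12, 3, 13}`. An 8x8 transpose built on the second assumption would put elements in the wrong places, which a reference comparison like the one in Ftst would report as a large error.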

Jason G:

--- Quote from: Josef W. Segur on 29 Apr 2011, 11:17:30 am ---Nice speedups on the Chirp functions, but I obviously need to rework data shuffling.
--- End quote ---

Numbered bottlecaps help with that for me.  Good to see some hints that with work the architecture additions may perform very well.

Jason

Claggy:

--- Quote from: Jason G on 28 Apr 2011, 10:16:04 pm ---Similar result here on the E8400 (of course).  Darn, now I'm CPU shopping  ::)


--- End quote ---
This is what an E8500 @ 4.14GHz gets with BOINC, v7 SETI Beta CPU apps, an NV SETI CUDA MB app and an ATI OpenCL SETI MB app all running (I ran it 5 times):


--- Code: ---Ftst_v7 started.

Optimal function choices:
-------------------------------------------------------
                            name  timing   error
-------------------------------------------------------
                v_BaseLineSmooth (no other)

              v_GetPowerSpectrum 0.00013 0.00000  test
             v_vGetPowerSpectrum 0.00006 0.00000  test
            v_vGetPowerSpectrum2 0.00006 0.00000  test
     v_vGetPowerSpectrumUnrolled 0.00005 0.00000  test
    v_vGetPowerSpectrumUnrolled2 0.00006 0.00000  test
     v_vGetPowerSpectrumUnrolled 0.00005 0.00000  choice

                     v_ChirpData 0.03146 0.00000  test
                   fpu_ChirpData 0.01685 0.00000  test
               fpu_opt_ChirpData 0.02659 0.00000  test
             v_vChirpData_x86_64 0.04977 0.00000  test
               sse1_ChirpData_ak 0.00881 0.00000  test
               sse2_ChirpData_ak 0.00886 0.00000  test
               sse3_ChirpData_ak 0.00829 0.00000  test
               sse3_ChirpData_ak 0.00829 0.00000  choice

                     v_Transpose 0.00389 0.00000  test
                    v_Transpose2 0.00476 0.00000  test
                    v_Transpose4 0.00464 0.00000  test
                    v_Transpose8 0.01212 0.00000  test
                  v_pfTranspose2 0.00397 0.00000  test
                  v_pfTranspose4 0.00477 0.00000  test
                  v_pfTranspose8 0.01263 0.00000  test
                   v_vTranspose4 0.00396 0.00000  test
                 v_vTranspose4np 0.00585 0.00000  test
                v_vTranspose4ntw 0.00690 0.00000  test
              v_vTranspose4x8ntw 0.00649 0.00000  test
             v_vTranspose4x16ntw 0.00532 0.00000  test
            v_vpfTranspose8x4ntw 0.00568 0.00000  test
                     v_Transpose 0.00389 0.00000  choice

                 FPU opt folding 0.00194 0.00000  test
                  AK SSE folding 0.00072 0.00000  test
                  BH SSE folding 0.00071 0.00000  test
                  BH SSE folding 0.00071 0.00000  choice

                   Test duration    4.21 seconds

Ftst_v7 completed successfully.
--- End code ---

Claggy
