Topic: AVX Optimized App Development (Read 131763 times)

Miep

  • Global Moderator
  • Knight who says 'Ni!'
  • Posts: 964
Re: AVX Optimized App Development
« Reply #45 on: 04 May 2011, 03:58:28 am »
For completeness: the first run with BOINC running, the second with BOINC suspended.
The road to hell is paved with good intentions

Claggy

  • Alpha Tester
  • Knight who says 'Ni!'
  • Posts: 3111
    • My computers at Seti Beta
Re: AVX Optimized App Development
« Reply #46 on: 04 May 2011, 04:27:32 pm »
Here are the J34 results for my Atom N450 @ 1.66 GHz (5 runs with BOINC and 2 apps running, 5 runs with BOINC shut down).

Claggy
« Last Edit: 04 May 2011, 05:45:40 pm by Claggy »

Claggy

  • Alpha Tester
  • Knight who says 'Ni!'
  • Posts: 3111
    • My computers at Seti Beta
Re: AVX Optimized App Development
« Reply #47 on: 04 May 2011, 05:07:39 pm »
Here's the J34 run on my C2D T8100 @ 2.1 GHz (5 runs with BOINC running one v7 r246 task, one AP r409 task and one collatz_mini 2.05 CUDA task, then 5 runs with BOINC and apps shut down).

Claggy
« Last Edit: 05 May 2011, 03:53:42 am by Claggy »

Josef W. Segur

  • Janitor o' the Board
  • Knight who says 'Ni!'
  • Posts: 3112
Re: AVX Optimized App Development
« Reply #48 on: 11 May 2011, 01:46:18 am »
Here's the latest version, including a fix for the AVX 8x4 transpose plus folding for AVX. The folding is as simple as I could make it, but there's no guarantee it's all correct even so. If it works I'm sure it can be improved.

I dropped the extra set of Transpose tests, there's enough data to thoroughly confuse me already. I may reactivate them later if I have some useful thoughts.
Joe
Edit: Attachment deleted, newer version in later post.
« Last Edit: 12 May 2011, 09:07:24 pm by Josef W. Segur »
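For readers following the transpose entries in the logs below: names such as v_vTranspose4x16ntw and v_avxTranspose8x4ntw suggest a blocked (tiled) transpose, with "ntw" presumably meaning non-temporal writes. The following is a minimal plain-C sketch of a tiled transpose under that assumption. The names `transpose_naive` and `transpose_tiled` and the 8x4 tile size are hypothetical illustrations, not Joe's AVX code, and it uses ordinary scalar stores rather than intrinsics or streaming stores.

```c
#include <stddef.h>

/* Reference transpose: dst (cols x rows) = transpose of src (rows x cols), row-major. */
static void transpose_naive(const float *src, float *dst, size_t rows, size_t cols)
{
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            dst[c * rows + r] = src[r * cols + c];
}

/* Hypothetical tile size; an AVX variant would hold one such tile in registers. */
#define TILE_R 8
#define TILE_C 4

/* Tiled transpose: visits the matrix in TILE_R x TILE_C blocks so each block's
   reads and writes touch only a few cache lines. A SIMD version would replace
   the two innermost loops with register shuffles. */
static void transpose_tiled(const float *src, float *dst, size_t rows, size_t cols)
{
    for (size_t r0 = 0; r0 < rows; r0 += TILE_R) {
        size_t r1 = (r0 + TILE_R < rows) ? r0 + TILE_R : rows;  /* clamp partial tile */
        for (size_t c0 = 0; c0 < cols; c0 += TILE_C) {
            size_t c1 = (c0 + TILE_C < cols) ? c0 + TILE_C : cols;
            for (size_t r = r0; r < r1; r++)
                for (size_t c = c0; c < c1; c++)
                    dst[c * rows + r] = src[r * cols + c];
        }
    }
}
```

The tiled version must produce exactly the same output as the naive one, including for dimensions that are not multiples of the tile size, which is the same kind of check the Ftst "error" column reports.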

Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • Posts: 8980
Re: AVX Optimized App Development
« Reply #49 on: 11 May 2011, 01:54:42 am »
All legacy functions... err.. functioning.

arkayn

  • Janitor o' the Board
  • Knight who says 'Ni!'
  • Posts: 1230
  • Aaaarrrrgggghhhh
    • My Little Place On The Internet
Re: AVX Optimized App Development
« Reply #50 on: 11 May 2011, 02:13:11 am »
Code:
=========================================================
Ftst_v7_J34 started.

Optimal function choices:
--------------------------------------------------------
                            name   timing   error
--------------------------------------------------------
                v_BaseLineSmooth (no other)

              v_GetPowerSpectrum 0.000480 0.00000  test
             v_vGetPowerSpectrum 0.000301 0.00000  test
            v_vGetPowerSpectrum2 0.000327 0.00000  test
     v_vGetPowerSpectrumUnrolled 0.000314 0.00000  test
    v_vGetPowerSpectrumUnrolled2 0.000294 0.00000  test
    v_vGetPowerSpectrumUnrolled2 0.000294 0.00000  choice

                     v_ChirpData 0.019478 0.00000  test
                   fpu_ChirpData 0.025356 0.00000  test
               fpu_opt_ChirpData 0.015757 0.00000  test
             v_vChirpData_x86_64 0.079464 0.00000  test
               sse1_ChirpData_ak 0.011689 0.00000  test
               sse2_ChirpData_ak 0.011893 0.00000  test
              sse2_ChirpData_ak8 0.008098 0.00000  test
               sse3_ChirpData_ak 0.011029 0.00000  test
              sse2_ChirpData_ak8 0.008098 0.00000  choice

                     v_Transpose 0.041660 0.00000  test
                    v_Transpose2 0.025839 0.00000  test
                    v_Transpose4 0.012987 0.00000  test
                    v_Transpose8 0.020351 0.00000  test
                  v_pfTranspose2 0.025092 0.00000  test
                  v_pfTranspose4 0.012726 0.00000  test
                  v_pfTranspose8 0.019991 0.00000  test
                   v_vTranspose4 0.012808 0.00000  test
                 v_vTranspose4np 0.013273 0.00000  test
                v_vTranspose4ntw 0.008225 0.00000  test
              v_vTranspose4x8ntw 0.008911 0.00000  test
             v_vTranspose4x16ntw 0.007548 0.00000  test
            v_vpfTranspose8x4ntw 0.008831 0.00000  test
             v_vTranspose4x16ntw 0.007548 0.00000  choice

                 FPU opt folding 0.003467 0.00000  test
                  AK SSE folding 0.001317 0.00000  test
                  BH SSE folding 0.001285 0.00000  test
                  BH SSE folding 0.001285 0.00000  choice

                   Test duration     6.44 seconds

Ftst_v7 completed successfully.

Miep

  • Global Moderator
  • Knight who says 'Ni!'
  • Posts: 964
Re: AVX Optimized App Development
« Reply #51 on: 11 May 2011, 04:15:50 am »
1st: BOINC running; 2nd: BOINC snoozed.

Claggy

  • Alpha Tester
  • Knight who says 'Ni!'
  • Posts: 3111
    • My computers at Seti Beta
Re: AVX Optimized App Development
« Reply #52 on: 11 May 2011, 05:48:03 am »
@arkayn your posted stderr.txt says Ftst_v7_J34 and not J37

Claggy

Claggy

  • Alpha Tester
  • Knight who says 'Ni!'
  • Posts: 3111
    • My computers at Seti Beta
Re: AVX Optimized App Development
« Reply #53 on: 11 May 2011, 05:51:12 am »
Here's a run on my C2D E8500 @ 4.14 GHz with J37 (5 runs with BOINC and apps running and 5 runs with BOINC shut down).

Claggy
« Last Edit: 11 May 2011, 01:49:17 pm by Claggy »

Fredericx51

  • Knight o' The Round Table
  • Posts: 207
  • Knight Who Says Ni N!
Re: AVX Optimized App Development
« Reply #54 on: 11 May 2011, 12:16:05 pm »
Did some reading about AVX and checked its output with this test file.

Without BOINC running, stderr.txt:

=========================================================
Ftst_v7_J34 started.

Optimal function choices:
--------------------------------------------------------
                            name   timing   error
--------------------------------------------------------
                v_BaseLineSmooth (no other)

              v_GetPowerSpectrum 0.000105 0.00000  test
             v_vGetPowerSpectrum 0.000052 0.00000  test
            v_vGetPowerSpectrum2 0.000063 0.00000  test
     v_vGetPowerSpectrumUnrolled 0.000049 0.00000  test
    v_vGetPowerSpectrumUnrolled2 0.000066 0.00000  test
           v_avxGetPowerSpectrum 0.000043 0.00000  test
           v_avxGetPowerSpectrum 0.000043 0.00000  choice

                     v_ChirpData 0.005899 0.00000  test
                   fpu_ChirpData 0.010711 0.00000  test
               fpu_opt_ChirpData 0.005305 0.00000  test
             v_vChirpData_x86_64 0.051195 0.00000  test
               sse1_ChirpData_ak 0.006250 0.00000  test
               sse2_ChirpData_ak 0.005789 0.00000  test
              sse2_ChirpData_ak8 0.003679 0.00000  test
               sse3_ChirpData_ak 0.005621 0.00000  test
                 avx_ChirpData_a 0.001884 0.00000  test
                 avx_ChirpData_b 0.002139 0.00000  test
                 avx_ChirpData_a 0.001884 0.00000  choice

                     v_Transpose 0.002753 0.00000  test
                    v_Transpose2 0.002947 0.00000  test
                    v_Transpose4 0.001516 0.00000  test
                    v_Transpose8 0.002775 0.00000  test
                  v_pfTranspose2 0.001659 0.00000  test
                  v_pfTranspose4 0.001586 0.00000  test
                  v_pfTranspose8 0.002802 0.00000  test
                   v_vTranspose4 0.000915 0.00000  test
                 v_vTranspose4np 0.001169 0.00000  test
                v_vTranspose4ntw 0.007690 0.00000  test
              v_vTranspose4x8ntw 0.003222 0.00000  test
             v_vTranspose4x16ntw 0.000900 0.00000  test
            v_vpfTranspose8x4ntw 0.007704 0.00000  test
            v_avxTranspose4x8ntw 0.003195 0.00000  test
           v_avxTranspose4x16ntw 0.000817 0.00000  test
            v_avxTranspose8x4ntw 0.007712 0.00000  test
          v_avxTranspose8x8ntw_a 0.002666 0.00000  test
          v_avxTranspose8x8ntw_b 0.003011 0.00000  test
           v_avxTranspose4x16ntw 0.000817 0.00000  choice

                 FPU opt folding 0.002047 0.00000  test
                  AK SSE folding 0.000464 0.00000  test
                  BH SSE folding 0.000451 0.00000  test
                  JS AVX folding 0.000405 0.00000  test
                  JS AVX folding 0.000405 0.00000  choice

                   Test duration     2.90 seconds

Ftst_v7 completed successfully.

With BOINC 6.10.60 x64 running (i7-2600 + 2x HD5870): 8x MB (CPU) + 4 ATI MB rev.177 or AP rev.524.

=========================================================
Ftst_v7_J34 started.

Optimal function choices:
--------------------------------------------------------
                            name   timing   error
--------------------------------------------------------
                v_BaseLineSmooth (no other)

              v_GetPowerSpectrum 0.000234 0.00000  test
             v_vGetPowerSpectrum 0.000105 0.00000  test
            v_vGetPowerSpectrum2 0.000100 0.00000  test
     v_vGetPowerSpectrumUnrolled 0.000082 0.00000  test
    v_vGetPowerSpectrumUnrolled2 0.000098 0.00000  test
           v_avxGetPowerSpectrum 0.000061 0.00000  test
           v_avxGetPowerSpectrum 0.000061 0.00000  choice

                     v_ChirpData 0.011899 0.00000  test
                   fpu_ChirpData 0.019045 0.00000  test
               fpu_opt_ChirpData 0.012640 0.00000  test
             v_vChirpData_x86_64 0.063979 0.00000  test
               sse1_ChirpData_ak 0.010132 0.00000  test
               sse2_ChirpData_ak 0.009260 0.00000  test
              sse2_ChirpData_ak8 0.006961 0.00000  test
               sse3_ChirpData_ak 0.008636 0.00000  test
                 avx_ChirpData_a 0.003490 0.00000  test
                 avx_ChirpData_b 0.003833 0.00000  test
                 avx_ChirpData_a 0.003490 0.00000  choice

                     v_Transpose 0.007700 0.00000  test
                    v_Transpose2 0.004792 0.00000  test
                    v_Transpose4 0.008537 0.00000  test
                    v_Transpose8 0.014129 0.00000  test
                  v_pfTranspose2 0.015112 0.00000  test
                  v_pfTranspose4 0.011302 0.00000  test
                  v_pfTranspose8 0.012998 0.00000  test
                   v_vTranspose4 0.002625 0.00000  test
                 v_vTranspose4np 0.005798 0.00000  test
                v_vTranspose4ntw 0.008330 0.00000  test
              v_vTranspose4x8ntw 0.004689 0.00000  test
             v_vTranspose4x16ntw 0.002755 0.00000  test
            v_vpfTranspose8x4ntw 0.008488 0.00000  test
            v_avxTranspose4x8ntw 0.003759 0.00000  test
           v_avxTranspose4x16ntw 0.002249 0.00000  test
            v_avxTranspose8x4ntw 0.008294 0.00000  test
          v_avxTranspose8x8ntw_a 0.003551 0.00000  test
          v_avxTranspose8x8ntw_b 0.004706 0.00000  test
           v_avxTranspose4x16ntw 0.002249 0.00000  choice

                 FPU opt folding 0.003407 0.00000  test
                  AK SSE folding 0.000878 0.00000  test
                  BH SSE folding 0.000816 0.00000  test
                  JS AVX folding 0.000656 0.00000  test
                  JS AVX folding 0.000656 0.00000  choice

                   Test duration     5.03 seconds

Ftst_v7 completed successfully.

« Last Edit: 11 May 2011, 12:42:40 pm by Fredericx51 »

arkayn

  • Janitor o' the Board
  • Knight who says 'Ni!'
  • Posts: 1230
  • Aaaarrrrgggghhhh
    • My Little Place On The Internet
Re: AVX Optimized App Development
« Reply #55 on: 11 May 2011, 01:14:33 pm »
Quote
@arkayn your posted stderr.txt says Ftst_v7_J34 and not J37

Claggy

So does yours and everyone else, I am guessing that Joe forgot to change that number.

Josef W. Segur

  • Janitor o' the Board
  • Knight who says 'Ni!'
  • Posts: 3112
Re: AVX Optimized App Development
« Reply #56 on: 11 May 2011, 04:50:49 pm »
Quote
@arkayn your posted stderr.txt says Ftst_v7_J34 and not J37

Claggy
So does yours and everyone else, I am guessing that Joe forgot to change that number.

'Tis true. Luckily having only one set of transposes serves well to distinguish J37 from J34.

The test outputs are appreciated, I'd hate to break something and not know it because of the limited systems I have.

Quote
Did some reading about AVX and checked its output with this Test-file.
...

Thanks very much. It seems I've reached the initial goal of having working AVX variants for all the tested categories. I'll do some tweaking to see if I can improve performance.
Joe
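The logs above show the selection rule the test tool applies: every variant in a category is timed, the error column is reported, and the fastest variant is repeated with the tag "choice" (for example v_vGetPowerSpectrumUnrolled2 at 0.000294 in arkayn's run). The following is a minimal C sketch of that timing-based dispatch pattern, assuming a function-pointer table, `clock()` timing and an error tolerance. The kernel names, `pick_fastest`, and its parameters are hypothetical; this is not the Ftst source.

```c
#include <stddef.h>
#include <time.h>

/* One kernel signature shared by all interchangeable variants. */
typedef void (*power_fn)(const float *re_im, float *power, size_t n);

/* Variant 0: power spectrum of n interleaved complex samples, p = re^2 + im^2. */
static void power_plain(const float *z, float *p, size_t n)
{
    for (size_t i = 0; i < n; i++)
        p[i] = z[2 * i] * z[2 * i] + z[2 * i + 1] * z[2 * i + 1];
}

/* Variant 1: same arithmetic, unrolled by two, with a scalar tail. */
static void power_unrolled2(const float *z, float *p, size_t n)
{
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        p[i]     = z[2 * i]     * z[2 * i]     + z[2 * i + 1] * z[2 * i + 1];
        p[i + 1] = z[2 * i + 2] * z[2 * i + 2] + z[2 * i + 3] * z[2 * i + 3];
    }
    for (; i < n; i++)
        p[i] = z[2 * i] * z[2 * i] + z[2 * i + 1] * z[2 * i + 1];
}

/* Time each candidate over `reps` calls, reject any whose output deviates from
   candidate 0 by more than max_err, and return the index of the fastest. */
static size_t pick_fastest(const power_fn *fns, size_t nfns, const float *z,
                           float *ref, float *out, size_t n, int reps, float max_err)
{
    size_t best = 0;
    double best_t = -1.0;               /* -1 means "no accepted candidate yet" */
    fns[0](z, ref, n);                  /* candidate 0 serves as the reference */
    for (size_t k = 0; k < nfns; k++) {
        clock_t t0 = clock();
        for (int r = 0; r < reps; r++)
            fns[k](z, out, n);
        double t = (double)(clock() - t0) / CLOCKS_PER_SEC;
        float err = 0.0f;               /* max absolute deviation from reference */
        for (size_t i = 0; i < n; i++) {
            float d = out[i] - ref[i];
            if (d < 0.0f) d = -d;
            if (d > err) err = d;
        }
        if (err <= max_err && (best_t < 0.0 || t < best_t)) {
            best_t = t;
            best = k;
        }
    }
    return best;
}
```

Because the choice is made at run time on the host machine, the same build can pick an SSE variant on an Atom and an AVX variant on a Sandy Bridge, which is consistent with the different "choice" lines in the posted logs.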

Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • Posts: 14349
Re: AVX Optimized App Development
« Reply #57 on: 11 May 2011, 05:30:05 pm »
Quote
Pulse finding is another area which is more limited by processing speed than memory access, though, and it also accounts for a sizable fraction of run time.

On GPU, pulse finding looks memory-bound...
Maybe GaussFit? It's a compute-intensive search.

Josef W. Segur

  • Janitor o' the Board
  • Knight who says 'Ni!'
  • Posts: 3112
Re: AVX Optimized App Development
« Reply #58 on: 12 May 2011, 09:04:54 pm »
Here's another test: I added a third AVX chirp variant, which might be marginally faster, and a second AVX folding set, ditto. AFAIK no changes to the non-AVX code other than making sure this one identifies itself correctly :D
Joe
Edit: Attachment deleted, newer version in later post.
« Last Edit: 14 May 2011, 09:01:39 pm by Josef W. Segur »
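For context on the folding category (FPU opt, AK SSE, BH SSE and JS AVX folding in the logs): as generally understood, pulse finding sums ("folds") the power series at a trial period, adding together samples that lie one period apart. The following is a minimal scalar C sketch of that idea, assuming a contiguous float array and an integer period. `fold_periods` and `fold_by_2` are hypothetical names and this is not the JS, BH or AK folding code; it only shows why the inner loop is a contiguous add that SSE or AVX can vectorize.

```c
#include <stddef.h>

/* Hypothetical generic fold: for trial period p, sum m consecutive periods so
   that out[i] = in[i] + in[i + p] + ... + in[i + (m - 1) * p], for 0 <= i < p.
   The caller must supply at least m * p input samples. */
static void fold_periods(const float *in, float *out, size_t p, size_t m)
{
    for (size_t i = 0; i < p; i++)
        out[i] = 0.0f;
    for (size_t k = 0; k < m; k++) {
        const float *period = in + k * p;  /* start of the k-th period */
        for (size_t i = 0; i < p; i++)     /* contiguous, SIMD-friendly loop */
            out[i] += period[i];
    }
}

/* A fold by exactly two periods, the kind of building block a staged fold
   would apply repeatedly. */
static void fold_by_2(const float *in, float *out, size_t p)
{
    for (size_t i = 0; i < p; i++)
        out[i] = in[i] + in[i + p];
}
```

In a vectorized version, each 8-wide AVX add replaces eight iterations of the inner loop. Whether the fold is then limited by arithmetic or by loads depends on how many periods are summed per pass.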

Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • Posts: 8980
Re: AVX Optimized App Development
« Reply #59 on: 12 May 2011, 09:28:16 pm »
Quote
... AFAIK no changes to the non-AVX code other than making sure this one identifies itself correctly ...
Checked anyway, all good, including:
Quote
Ftst_v7_J39 started.

 
