Author Topic: AVX Optimized App Development  (Read 124062 times)

Offline Fredericx51

  • Knight o' The Round Table
  • ***
  • Posts: 207
  • Knight Who Says Ni N!
Re: AVX Optimized App Development
« Reply #75 on: 18 May 2011, 10:27:44 am »
J43: 5 runs with Firefox open (4 pages), 5 with BOINC (6.10.60, 64-bit) running, and 5 with all apps and BOINC off.

I was interrupted and ended up doing 20 runs, including 2 x 5 runs with no applications running. Runtimes with BOINC running are more than doubled.




« Last Edit: 18 May 2011, 10:50:09 am by Fredericx51 »

Offline arkayn

  • Janitor o' the Board
  • Knight who says 'Ni!'
  • *****
  • Posts: 1230
  • Aaaarrrrgggghhhh
    • My Little Place On The Internet
Re: AVX Optimized App Development
« Reply #76 on: 19 May 2011, 12:07:01 pm »
Run from both of my machines.

Offline Josef W. Segur

  • Janitor o' the Board
  • Knight who says 'Ni!'
  • *****
  • Posts: 3112
Re: AVX Optimized App Development
« Reply #77 on: 21 May 2011, 02:12:17 pm »
Attaching the J45 version. I tweaked avx_ChirpData_c and added avx_ChirpData_d, which in theory should be faster (but not by a huge amount). Changes in the non-AVX code are minimal.
                                                                                                 Joe

Offline Claggy

  • Alpha Tester
  • Knight who says 'Ni!'
  • ***
  • Posts: 3111
    • My computers at Seti Beta
Re: AVX Optimized App Development
« Reply #78 on: 22 May 2011, 06:31:22 am »
Here's a run of J45 on my E8500 (5 runs with BOINC and 4 apps running)

Claggy
« Last Edit: 22 May 2011, 06:33:23 am by Claggy »

Offline Vyper

  • Alpha Tester
  • Knight Templar
  • ***
  • Posts: 376
Re: AVX Optimized App Development
« Reply #79 on: 04 Jun 2011, 01:34:20 pm »
Here are my results.

i7-2600K @ 4.2 GHz; first result with BOINC running, second idle.

//Vyper

Offline Josef W. Segur

  • Janitor o' the Board
  • Knight who says 'Ni!'
  • *****
  • Posts: 3112
Re: AVX Optimized App Development
« Reply #80 on: 04 Jun 2011, 05:26:18 pm »
The testing is appreciated. Lest anyone think I've lost interest I'll just note that there have been other things requiring my attention, and I hope to get back to the AVX stuff in ten days or so.

The functions from the J45 test have been contributed to the project, so I hope the v7 CPU application will be rebuilt to include them before release to main. For the next stage I hope to refactor at least the chirping into our AK_v8 based code, and probably some details into other functions too. I think eventually we'll be transitioning from many CPU versions to perhaps just 2 Windows versions (32 and 64 bit) with inbuilt dispatch for various capabilities, but that will take a fair amount of work. Shorter term it may be sensible to add an AVX build to the selection when we populate an installer for the v7 transition to main.
                                                                                       Joe
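
The "inbuilt dispatch" idea in the post above could look roughly like the following sketch. It is written in Python purely for illustration (the real applications are compiled code). The variant names, the feature labels and the `choose_variant` helper are all hypothetical, and how the features would actually be detected is left out.

```python
from typing import Callable, FrozenSet, List, NamedTuple, Sequence, Set


class Variant(NamedTuple):
    name: str                  # label used for logging
    requires: FrozenSet[str]   # capabilities this variant needs
    func: Callable[[List[float]], List[float]]


def _stub(data: List[float]) -> List[float]:
    # Placeholder body: every variant must give the same result and only
    # the speed differs. A real build would call optimized code here.
    return list(data)


# Hypothetical table, ordered from most to least demanding.
# "os_avx" is an assumed label for "the OS saves the AVX registers",
# which has to hold in addition to the CPU supporting AVX.
VARIANTS: Sequence[Variant] = (
    Variant("chirp_avx", frozenset({"avx", "os_avx"}), _stub),
    Variant("chirp_sse3", frozenset({"sse3"}), _stub),
    Variant("chirp_sse2", frozenset({"sse2"}), _stub),
    Variant("chirp_fpu", frozenset(), _stub),
)


def choose_variant(variants: Sequence[Variant], available: Set[str]) -> Variant:
    """Return the first variant whose requirements are all available."""
    for variant in variants:
        if variant.requires <= available:
            return variant
    raise RuntimeError("no usable variant")
```

With this ordering, an AVX-capable CPU on an OS without AVX support would fall through to the SSE3 entry, and a host with none of the listed features would get the plain FPU entry.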

Offline Claggy

  • Alpha Tester
  • Knight who says 'Ni!'
  • ***
  • Posts: 3111
    • My computers at Seti Beta
Re: AVX Optimized App Development
« Reply #81 on: 08 Jun 2011, 07:36:01 pm »
Here's the J45 run on my 32-bit XP3200/HD4650/8400 GS machine (5 runs with BOINC running plus one AKV8_P3 app, one Collatz ATI 2.09 app and one Collatz CUDA 2.05 app, and 5 runs with all of that shut down)

Claggy
« Last Edit: 08 Jun 2011, 08:04:13 pm by Claggy »

Offline Fredericx51

  • Knight o' The Round Table
  • ***
  • Posts: 207
  • Knight Who Says Ni N!
Re: AVX Optimized App Development
« Reply #82 on: 16 Jun 2011, 07:17:58 pm »
A bit off topic, but on my GTX470 rig I can add an HD5770, which I still have.

But AVX development is on 'hold' for me  ;) . I'm just curious, and very restricted at home (walking around with an IV drain and, as standard, 2 bags), which isn't the best way to try to get something done sometimes...

Just installed the 'new' v0.38 installer on my i7-2600 + 2x HD5870. First I have to look at and probably adjust some command-line parameters, and also try to run more than 1 at a time.

I did the same for Win XP64 using BOINC 6.12.26. I had trouble installing that on my Win7 box, so I reverted to 6.10.60.
Well, I'm going to try the new v0.38!

« Last Edit: 16 Jun 2011, 07:22:26 pm by Fredericx51 »

Offline Frizz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 541
Re: AVX Optimized App Development
« Reply #83 on: 17 Jul 2011, 12:05:59 pm »
Are you guys already experimenting with FFTW 3.3?

Quote from FFTW's homepage: " ... Now available is the beta1 release for FFTW 3.3, scheduled for final release on July 25, 2011. ... The main new features in 3.3 are support for 256-bit AVX instructions on x86 processors, ...".

Offline Josef W. Segur

  • Janitor o' the Board
  • Knight who says 'Ni!'
  • *****
  • Posts: 3112
Re: AVX Optimized App Development
« Reply #84 on: 17 Jul 2011, 06:22:07 pm »
Quote from: Frizz on 17 Jul 2011, 12:05:59 pm
Are you guys already experimenting with FFTW 3.3?

Quote from FFTW's homepage: " ... Now available is the beta1 release for FFTW 3.3, scheduled for final release on July 25, 2011. ... The main new features in 3.3 are support for 256-bit AVX instructions on x86 processors, ...".

Thanks, I'd missed that. There's not much I can do with it directly, since the test package doesn't use FFTs at all. I suppose I could graft something on, but the prebuilt Windows DLL downloads from http://fftw.org/install/windows.html already include a test program. If someone with a Sandy Bridge is interested, it would be possible to compare 3.3-beta1 to 3.2.2 that way. It's a command-line program with a lot of flexibility; if there's interest I'll advise.

In the release notes they say "The AVX code works with 16-byte alignment (as opposed to 32-byte alignment), so there is no ABI change compared to FFTW 3.2.2.". With the Sandy Bridge architecture it's true that 16-byte alignment is just about as fast, and the same should be true on Bulldozer. But I chose to go the other way and use full 32-byte alignment to be better prepared for future changes: I expect either Intel or AMD will graduate to full 256-bit execution units rather than paired 128-bit units within a year or two. I could be wrong; the shift from 64-bit to 128-bit took longer than that.
                                                       Joe
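
To make the 16-byte versus 32-byte alignment point in the post above concrete, here is a small arithmetic sketch in Python. The helper names `align_up` and `is_aligned` are hypothetical and the addresses are plain integers; it only illustrates the rounding involved and is not code from the test package.

```python
def is_aligned(address: int, alignment: int) -> bool:
    """True if address is a multiple of alignment."""
    return address % alignment == 0


def align_up(address: int, alignment: int) -> int:
    """Round address up to the next multiple of alignment.

    alignment must be a power of two (16 for SSE, 32 for full-width AVX).
    """
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return (address + alignment - 1) & ~(alignment - 1)


# A buffer that is only guaranteed 16-byte alignment may or may not
# also sit on a 32-byte boundary.
example_address = 48  # illustrative value, not a real pointer
print(is_aligned(example_address, 16))  # 16-byte aligned
print(is_aligned(example_address, 32))  # but not 32-byte aligned
print(align_up(example_address, 32))    # next 32-byte boundary
```

Every 32-byte-aligned address is also 16-byte aligned, so choosing the stricter alignment stays compatible with code that only assumes 16 bytes.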

Offline Josef W. Segur

  • Janitor o' the Board
  • Knight who says 'Ni!'
  • *****
  • Posts: 3112
Re: AVX Optimized App Development
« Reply #85 on: 09 Mar 2012, 03:45:02 pm »
We're now working toward releases using FFTW 3.3 and/or recent versions of IPP with AVX support for doing FFTs for both Astropulse v6 and SETI@home v7. At least some of the AVX routines developed for this thread should go in too, with some modifications.

Meanwhile some AMD Bulldozer CPUs with AVX support have been released. I'd like to see some test runs of J45 (from my earlier post) on those CPUs, since AMD's AVX implementation is likely to respond differently than Intel's. Win 7 SP1 / Windows Server 2008 R2 or later is still needed; MS will certainly not backport the changes needed for OS support to Vista or XP.

The AMD optimization manuals do hint at some of the ways their implementation differs, so I might try some further variations if the tests indicate a need.
                                                          Joe

Offline arkayn

  • Janitor o' the Board
  • Knight who says 'Ni!'
  • *****
  • Posts: 1230
  • Aaaarrrrgggghhhh
    • My Little Place On The Internet
Re: AVX Optimized App Development
« Reply #86 on: 09 Mar 2012, 04:48:41 pm »
FX-4100

=========================================================
Ftst_v7_J45 started.

Optimal function choices:
--------------------------------------------------------
                            name   timing   error
--------------------------------------------------------
                v_BaseLineSmooth (no other)

              v_GetPowerSpectrum 0.000165 0.00000  test
             v_vGetPowerSpectrum 0.000064 0.00000  test
            v_vGetPowerSpectrum2 0.000072 0.00000  test
     v_vGetPowerSpectrumUnrolled 0.000052 0.00000  test
    v_vGetPowerSpectrumUnrolled2 0.000065 0.00000  test
           v_avxGetPowerSpectrum 0.000089 0.00000  test
     v_vGetPowerSpectrumUnrolled 0.000052 0.00000  choice

                     v_ChirpData 0.009393 0.00000  test
                   fpu_ChirpData 0.017479 0.00000  test
               fpu_opt_ChirpData 0.009286 0.00000  test
             v_vChirpData_x86_64 0.053607 0.00000  test
               sse1_ChirpData_ak 0.010594 0.00000  test
             sse1_ChirpData_ak8e 0.007210 0.00000  test
             sse1_ChirpData_ak8h 0.007588 0.00000  test
               sse2_ChirpData_ak 0.007590 0.00000  test
              sse2_ChirpData_ak8 0.004558 0.00000  test
               sse3_ChirpData_ak 0.007020 0.00000  test
              sse3_ChirpData_ak8 0.004675 0.00000  test
                 avx_ChirpData_a 0.003773 0.00000  test
                 avx_ChirpData_b 0.003815 0.00000  test
                 avx_ChirpData_c 0.004179 0.00000  test
                 avx_ChirpData_d 0.003993 0.00000  test
                 avx_ChirpData_a 0.003773 0.00000  choice

                     v_Transpose 0.009390 0.00000  test
                    v_Transpose2 0.004041 0.00000  test
                    v_Transpose4 0.004842 0.00000  test
                    v_Transpose8 0.008043 0.00000  test
                  v_pfTranspose2 0.003915 0.00000  test
                  v_pfTranspose4 0.003701 0.00000  test
                  v_pfTranspose8 0.007678 0.00000  test
                   v_vTranspose4 0.002108 0.00000  test
                 v_vTranspose4np 0.002115 0.00000  test
                v_vTranspose4ntw 0.006964 0.00000  test
              v_vTranspose4x8ntw 0.003505 0.00000  test
             v_vTranspose4x16ntw 0.002523 0.00000  test
            v_vpfTranspose8x4ntw 0.006799 0.00000  test
            v_avxTranspose4x8ntw 0.003511 0.00000  test
           v_avxTranspose4x16ntw 0.002197 0.00000  test
            v_avxTranspose8x4ntw 0.006822 0.00000  test
          v_avxTranspose8x8ntw_a 0.003503 0.00000  test
          v_avxTranspose8x8ntw_b 0.003660 0.00000  test
                   v_vTranspose4 0.002108 0.00000  choice

                 FPU opt folding 0.003458 0.00000  test
                  AK SSE folding 0.000858 0.00000  test
                  BH SSE folding 0.000726 0.00000  test
                JS AVX_a folding 0.000736 0.00000  test
                JS AVX_c folding 0.000820 0.00000  test
                  BH SSE folding 0.000726 0.00000  choice

                   Test duration     4.63 seconds

Ftst_v7 completed successfully.

Offline KarVi

  • Alpha Tester
  • Knight Templar
  • ***
  • Posts: 252
Re: AVX Optimized App Development
« Reply #87 on: 11 Mar 2012, 10:14:58 am »
AMD FX-8150 @4.3GHz

=========================================================
Ftst_v7_J45 started.

Optimal function choices:
--------------------------------------------------------
                            name   timing   error
--------------------------------------------------------
                v_BaseLineSmooth (no other)

              v_GetPowerSpectrum 0.000138 0.00000  test
             v_vGetPowerSpectrum 0.000053 0.00000  test
            v_vGetPowerSpectrum2 0.000059 0.00000  test
     v_vGetPowerSpectrumUnrolled 0.000042 0.00000  test
    v_vGetPowerSpectrumUnrolled2 0.000054 0.00000  test
           v_avxGetPowerSpectrum 0.000066 0.00000  test
     v_vGetPowerSpectrumUnrolled 0.000042 0.00000  choice

                     v_ChirpData 0.007723 0.00000  test
                   fpu_ChirpData 0.014326 0.00000  test
               fpu_opt_ChirpData 0.007722 0.00000  test
             v_vChirpData_x86_64 0.043058 0.00000  test
               sse1_ChirpData_ak 0.006840 0.00000  test
             sse1_ChirpData_ak8e 0.005882 0.00000  test
             sse1_ChirpData_ak8h 0.005998 0.00000  test
               sse2_ChirpData_ak 0.006243 0.00000  test
              sse2_ChirpData_ak8 0.003763 0.00000  test
               sse3_ChirpData_ak 0.005858 0.00000  test
              sse3_ChirpData_ak8 0.003847 0.00000  test
                 avx_ChirpData_a 0.003160 0.00000  test
                 avx_ChirpData_b 0.003138 0.00000  test
                 avx_ChirpData_c 0.003387 0.00000  test
                 avx_ChirpData_d 0.003302 0.00000  test
                 avx_ChirpData_b 0.003138 0.00000  choice

                     v_Transpose 0.007775 0.00000  test
                    v_Transpose2 0.003264 0.00000  test
                    v_Transpose4 0.003892 0.00000  test
                    v_Transpose8 0.006481 0.00000  test
                  v_pfTranspose2 0.003249 0.00000  test
                  v_pfTranspose4 0.003022 0.00000  test
                  v_pfTranspose8 0.006095 0.00000  test
                   v_vTranspose4 0.001745 0.00000  test
                 v_vTranspose4np 0.001746 0.00000  test
                v_vTranspose4ntw 0.005688 0.00000  test
              v_vTranspose4x8ntw 0.003089 0.00000  test
             v_vTranspose4x16ntw 0.002110 0.00000  test
            v_vpfTranspose8x4ntw 0.005689 0.00000  test
            v_avxTranspose4x8ntw 0.003090 0.00000  test
           v_avxTranspose4x16ntw 0.001814 0.00000  test
            v_avxTranspose8x4ntw 0.005711 0.00000  test
          v_avxTranspose8x8ntw_a 0.003089 0.00000  test
          v_avxTranspose8x8ntw_b 0.003108 0.00000  test
                   v_vTranspose4 0.001745 0.00000  choice

                 FPU opt folding 0.002837 0.00000  test
                  AK SSE folding 0.000670 0.00000  test
                  BH SSE folding 0.000603 0.00000  test
                JS AVX_a folding 0.000613 0.00000  test
                JS AVX_c folding 0.000682 0.00000  test
                  BH SSE folding 0.000603 0.00000  choice

                   Test duration     3.66 seconds

Ftst_v7 completed successfully.

Offline Josef W. Segur

  • Janitor o' the Board
  • Knight who says 'Ni!'
  • *****
  • Posts: 3112
Re: AVX Optimized App Development
« Reply #88 on: 06 May 2012, 12:29:32 am »
I did sort of a survey of Beta hosts running stock S@H v7 with Bulldozer and Sandy Bridge CPUs to see which chirp variants were chosen most. Bulldozer were about 8% a, 34% b, 10% c, and 49% d. Sandy Bridge were about 13% a, 9% b, 9% c, and 69% d. That was for 277 results on Bulldozer and 296 on Sandy Bridge, so may be at least roughly meaningful.
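
As a rough check on "roughly meaningful", the sketch below estimates a 95% margin for a share observed in a sample, using the normal approximation. It assumes each result is an independent draw, which is probably optimistic because many results come from the same hosts; the helper name `share_margin` is hypothetical.

```python
from math import sqrt


def share_margin(share: float, n: int, z: float = 1.96) -> float:
    """Approximate half-width of a 95% interval for an observed share.

    Uses the normal approximation z * sqrt(p * (1 - p) / n), assuming
    independent samples.
    """
    if not 0.0 <= share <= 1.0 or n <= 0:
        raise ValueError("share must be in [0, 1] and n must be positive")
    return z * sqrt(share * (1.0 - share) / n)


# Shares and sample sizes quoted in the survey above.
bulldozer_d = share_margin(0.49, 277)
bulldozer_b = share_margin(0.34, 277)
sandy_bridge_d = share_margin(0.69, 296)

print(f"Bulldozer d: 49% +/- {bulldozer_d * 100:.1f} points")
print(f"Bulldozer b: 34% +/- {bulldozer_b * 100:.1f} points")
print(f"Sandy Bridge d: 69% +/- {sandy_bridge_d * 100:.1f} points")
```

Under that assumption the margins come out at around five to six percentage points, so the lead of d over b on Bulldozer would be larger than the sampling noise alone.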

I'll attach a J46 version of the test which has two added chirp variants (e and f) which might be even better than d, which was clearly the previous best. I left out the other kinds of functions this time, since I haven't figured out any significant improvements for those. But each run of the program now does the chirp tests 3 times.

What I'm aiming at, short term, is one best AVX chirp function which can be put into the existing Lunatics CPU code for a targeted AVX build. Hopefully we'll be able to use some dispatch functionality in future to keep the number of different builds down, but that's not ready yet.

Edit: attachment removed, see later posts for current chirp only version.
Joe
« Last Edit: 07 May 2012, 01:47:48 pm by Josef W. Segur »

Offline arkayn

  • Janitor o' the Board
  • Knight who says 'Ni!'
  • *****
  • Posts: 1230
  • Aaaarrrrgggghhhh
    • My Little Place On The Internet
Re: AVX Optimized App Development
« Reply #89 on: 06 May 2012, 12:39:50 am »
FX-4100

=========================================================
Ftst_v7_J46_Chirponly started.

Ignored: j46.txt
Optimal function choices:
--------------------------------------------------------
                            name   timing   error
--------------------------------------------------------
                     v_ChirpData 0.009781 0.00000  test
                   fpu_ChirpData 0.017815 0.00000  test
               fpu_opt_ChirpData 0.009309 0.00000  test
               sse1_ChirpData_ak 0.008972 0.00000  test
             sse1_ChirpData_ak8e 0.007175 0.00000  test
             sse1_ChirpData_ak8h 0.007744 0.00000  test
               sse2_ChirpData_ak 0.007798 0.00000  test
              sse2_ChirpData_ak8 0.004583 0.00000  test
               sse3_ChirpData_ak 0.007205 0.00000  test
              sse3_ChirpData_ak8 0.004993 0.00000  test
                 avx_ChirpData_a 0.003887 0.00000  test
                 avx_ChirpData_b 0.004019 0.00000  test
                 avx_ChirpData_c 0.004198 0.00000  test
                 avx_ChirpData_d 0.004099 0.00000  test
                 avx_ChirpData_e 0.004393 0.00000  test
                 avx_ChirpData_f 0.003727 0.00000  test
                 avx_ChirpData_f 0.003727 0.00000  choice

            Second run

                     v_ChirpData 0.009942 0.00000  test
                   fpu_ChirpData 0.017879 0.00000  test
               fpu_opt_ChirpData 0.009913 0.00000  test
               sse1_ChirpData_ak 0.008894 0.00000  test
             sse1_ChirpData_ak8e 0.007540 0.00000  test
             sse1_ChirpData_ak8h 0.007731 0.00000  test
               sse2_ChirpData_ak 0.007974 0.00000  test
              sse2_ChirpData_ak8 0.004620 0.00000  test
               sse3_ChirpData_ak 0.007186 0.00000  test
              sse3_ChirpData_ak8 0.004808 0.00000  test
                 avx_ChirpData_a 0.004083 0.00000  test
                 avx_ChirpData_b 0.003978 0.00000  test
                 avx_ChirpData_c 0.004161 0.00000  test
                 avx_ChirpData_d 0.004288 0.00000  test
                 avx_ChirpData_e 0.003972 0.00000  test
                 avx_ChirpData_f 0.003840 0.00000  test
                 avx_ChirpData_f 0.003840 0.00000  choice

            Third run

                     v_ChirpData 0.009758 0.00000  test
                   fpu_ChirpData 0.018261 0.00000  test
               fpu_opt_ChirpData 0.009494 0.00000  test
               sse1_ChirpData_ak 0.009149 0.00000  test
             sse1_ChirpData_ak8e 0.007363 0.00000  test
             sse1_ChirpData_ak8h 0.007963 0.00000  test
               sse2_ChirpData_ak 0.007715 0.00000  test
              sse2_ChirpData_ak8 0.004633 0.00000  test
               sse3_ChirpData_ak 0.007329 0.00000  test
              sse3_ChirpData_ak8 0.004750 0.00000  test
                 avx_ChirpData_a 0.004010 0.00000  test
                 avx_ChirpData_b 0.004000 0.00000  test
                 avx_ChirpData_c 0.004277 0.00000  test
                 avx_ChirpData_d 0.004212 0.00000  test
                 avx_ChirpData_e 0.004129 0.00000  test
                 avx_ChirpData_f 0.003745 0.00000  test
                 avx_ChirpData_f 0.003745 0.00000  choice

                   Test duration     9.91 seconds

Ftst_v7 completed successfully.
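
Since J46 repeats the chirp tests three times, one way to combine the runs is to compare each variant's median timing. The sketch below does that for the avx_ChirpData timings in the FX-4100 output above. The `pick_fastest` helper is hypothetical, and this is not necessarily how the test program makes its own "choice".

```python
from statistics import median
from typing import Dict, List


def pick_fastest(timings: Dict[str, List[float]]) -> str:
    """Return the variant name with the lowest median timing."""
    return min(timings, key=lambda name: median(timings[name]))


# AVX chirp timings copied from the three FX-4100 runs above
# (first, second and third run, in that order).
FX4100_AVX: Dict[str, List[float]] = {
    "avx_ChirpData_a": [0.003887, 0.004083, 0.004010],
    "avx_ChirpData_b": [0.004019, 0.003978, 0.004000],
    "avx_ChirpData_c": [0.004198, 0.004161, 0.004277],
    "avx_ChirpData_d": [0.004099, 0.004288, 0.004212],
    "avx_ChirpData_e": [0.004393, 0.003972, 0.004129],
    "avx_ChirpData_f": [0.003727, 0.003840, 0.003745],
}

for name, values in FX4100_AVX.items():
    print(f"{name}: median {median(values):.6f}")
print("fastest by median:", pick_fastest(FX4100_AVX))
```

The median is less sensitive than a single run to one-off disturbances, such as the spread between 0.004393 and 0.003972 seen for avx_ChirpData_e.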
