Author Topic: AVX Optimized App Development  (Read 131737 times)

Offline arkayn

  • Janitor o' the Board
  • Knight who says 'Ni!'
  • *****
  • Posts: 1230
  • Aaaarrrrgggghhhh
    • My Little Place On The Internet
Re: AVX Optimized App Development
« Reply #60 on: 12 May 2011, 09:40:30 pm »
From the Q8200

Offline Fredericx51

  • Knight o' The Round Table
  • ***
  • Posts: 207
  • Knight Who Says Ni N!
Re: AVX Optimized App Development
« Reply #61 on: 13 May 2011, 07:14:23 am »
And from an i7-2600, without and with BOINC (6.10.60)


Offline Claggy

  • Alpha Tester
  • Knight who says 'Ni!'
  • ***
  • Posts: 3111
    • My computers at Seti Beta
Re: AVX Optimized App Development
« Reply #62 on: 13 May 2011, 07:23:51 am »
Here's my E8500's J39 run. (5 runs with Boinc and apps running, 5 runs with Boinc and apps shut down)

Claggy

Offline Josef W. Segur

  • Janitor o' the Board
  • Knight who says 'Ni!'
  • *****
  • Posts: 3112
Re: AVX Optimized App Development
« Reply #63 on: 14 May 2011, 08:59:44 pm »
So here's J40. I modified the "_b" AVX folding, but expect it will probably still be slower than the "_a" version. For the 4-float SIMD folding it was better to use non-SIMD for the very shortest cases; for AVX it looks like it may be better just to handle all sizes as 8 floats with masking at the end. Anyhow, I reduced my guess about how small is too small to be efficient on AVX.
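
To illustrate the "8 floats with masking at the end" idea, here is a minimal sketch in plain C. It is not the code in the attachment: the function name `fold_add_masked8` is hypothetical, the fold is reduced to a simple two-array add, and the 8 lanes are emulated with a scalar loop so it compiles without AVX flags. A real AVX version would presumably use masked loads and stores (for example the `_mm256_maskload_ps` / `_mm256_maskstore_ps` intrinsics) for the last block.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 8  /* one 256-bit AVX register holds 8 floats */

/* Hypothetical helper: dst[i] = a[i] + b[i] for i in [0, n).
 * Every block is treated as 8 lanes wide; lanes past the end of the
 * data are masked off instead of falling back to a separate scalar
 * path for short or ragged lengths. */
void fold_add_masked8(float *dst, const float *a, const float *b, size_t n)
{
    assert(dst != NULL && a != NULL && b != NULL);

    for (size_t base = 0; base < n; base += LANES) {
        size_t remaining = n - base;            /* valid elements left */

        for (size_t lane = 0; lane < LANES; ++lane) {
            int active = lane < remaining;      /* per-lane mask bit */
            if (active)
                dst[base + lane] = a[base + lane] + b[base + lane];
            /* inactive lanes are neither read nor written */
        }
    }
}
```

With n = 11, for example, the first block uses all 8 lanes and the second block has only 3 active lanes; elements from index 11 onward are left untouched.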

Also added SSE3 and SSE1 modified chirping based on AKv8. There are two variants for SSE1: one uses the Estrin method for the polynomials, the other Horner. Estrin has one fewer instruction but Horner needs fewer registers. On my Pentium-M it's a wash; either one may be marginally faster for a single run. But perhaps on even older systems where SSE1 is the best capability it may make a difference, or perhaps some newer systems will also react in surprising ways.
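
As an illustration only (not the AKv8-derived chirp code), the two evaluation orders look like this for a hypothetical cubic with coefficients c[0..3], where c[k] multiplies x^k. The function names are made up for this sketch, and it is scalar C rather than SSE.

```c
#include <assert.h>
#include <stddef.h>

/* Horner: one long dependency chain, each step needs the previous
 * result. Only the running value and x have to stay live. */
float poly3_horner(const float c[4], float x)
{
    assert(c != NULL);
    float r = c[3];
    r = r * x + c[2];
    r = r * x + c[1];
    r = r * x + c[0];
    return r;
}

/* Estrin: split into independent halves that can be computed in
 * parallel, then combine with x squared. More values are live at
 * once, so it tends to need more registers. */
float poly3_estrin(const float c[4], float x)
{
    assert(c != NULL);
    float x2 = x * x;
    float lo = c[0] + c[1] * x;   /* independent of hi */
    float hi = c[2] + c[3] * x;   /* independent of lo */
    return lo + hi * x2;
}
```

Both functions compute the same polynomial; only the order of operations differs. The sketch does not reproduce the instruction counts mentioned above, which depend on the actual polynomials and on how the SSE code is scheduled.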

I've left the AVX chirping unchanged. Of the 6 tests on AVX-capable systems, "a" was chosen twice, "b" twice, and "c" twice. The largest difference between the slowest and fastest AVX version on one test was about 12%, so it's worth gathering more data.
                                                                                                        Joe
Edit: Attachment deleted, newer version in later post.
« Last Edit: 17 May 2011, 09:21:40 pm by Josef W. Segur »

Offline arkayn

  • Janitor o' the Board
  • Knight who says 'Ni!'
  • *****
  • Posts: 1230
  • Aaaarrrrgggghhhh
    • My Little Place On The Internet
Re: AVX Optimized App Development
« Reply #64 on: 14 May 2011, 10:09:06 pm »
First up the Q8200

Then the X4 630
« Last Edit: 14 May 2011, 10:12:34 pm by arkayn »

Offline Claggy

  • Alpha Tester
  • Knight who says 'Ni!'
  • ***
  • Posts: 3111
    • My computers at Seti Beta
Re: AVX Optimized App Development
« Reply #65 on: 15 May 2011, 05:21:03 am »
Here's the J40 run on my E8500 @ 4.14GHz (5 runs with Boinc and apps running, 5 runs with Boinc and apps shut down)

Edit: added Atom N450 run (5 runs with Boinc and one v7 r246 app and one AP r468 app running, and 5 runs with Boinc and apps shut down)

Edit 2: added C2D T8100 run (5 runs with Boinc and one v7 r246 app, one AP r409 app and one Collatz Cuda app running, and 5 runs with Boinc and apps shut down)

Edit 3: Dug my old XP3200 out of its box and connected it up, and did a run with J40 (just 5 runs with Boinc shut down)

Claggy
« Last Edit: 15 May 2011, 09:27:09 am by Claggy »

Offline Fredericx51

  • Knight o' The Round Table
  • ***
  • Posts: 207
  • Knight Who Says Ni N!
Re: AVX Optimized App Development
« Reply #66 on: 15 May 2011, 06:20:57 am »
And 2 J40 runs with BOINC (6.10.60) doing 12 MB WUs and 2 runs with BOINC sleeping.
CPU= i7-2600 stock frequency. (3.4GHz.)



« Last Edit: 15 May 2011, 06:25:40 am by Fredericx51 »

Offline Fredericx51

  • Knight o' The Round Table
  • ***
  • Posts: 207
  • Knight Who Says Ni N!
Re: AVX Optimized App Development
« Reply #67 on: 15 May 2011, 05:45:44 pm »
Are more tests required from the i7-2600 or any other CPU supporting AVX?

I'd be happy to test your latest fsj40 a couple more times, if its output is useful for
your eventual build, or as information for the coders.

I'm going to download the entire AVX build and C++ compiler suite (IPP+???). I'm still reading a lot of PDF files
to get some useful info; very time-consuming, though.

Will return soon  ;D


Offline Fredericx51

  • Knight o' The Round Table
  • ***
  • Posts: 207
  • Knight Who Says Ni N!
Re: AVX Optimized App Development
« Reply #68 on: 16 May 2011, 11:14:40 am »
Did a few more runs with FTST-J40: 5 with BOINC paused, leaving the app in memory, 5 with BOINC shut down, and 5 with
BOINC running 8 MB WUs on CPU (i7-2600) & 4 on GPUs (HD5870s). (Firefox's history and cache data flushed.)
OS = Win7 64-bit, 8 GByte DDR3 1333MHz, everything at stock settings. BOINC 6.10.60, 64-bit.
The 46 KByte of text, compressed as e.g. RAR, needs not even 4 KBytes! (Well, all the text is in fact the same for every run.)

Hope it is useful. If more or other extended AVX tests are needed, please ask  ::)




« Last Edit: 16 May 2011, 11:58:57 am by Fredericx51 »

Offline Josef W. Segur

  • Janitor o' the Board
  • Knight who says 'Ni!'
  • *****
  • Posts: 3112
Re: AVX Optimized App Development
« Reply #69 on: 16 May 2011, 03:56:57 pm »
Thanks for the additional runs. More data is definitely useful when there are so many things which can affect individual runs. Whether I can recognize what's significant is doubtful, but it ought to limit my really bad guesses.

I do have a few more things in mind to try, but don't know when I'll be able to actually code them.
                                                                                                 Joe

Offline Fredericx51

  • Knight o' The Round Table
  • ***
  • Posts: 207
  • Knight Who Says Ni N!
Re: AVX Optimized App Development
« Reply #70 on: 17 May 2011, 01:07:14 pm »
Well, I am (almost) daily looking at something interesting and/or new.
AVX happens to be one of them  ::)

I hope you'll be able to work out some useful AVX configuration for Gauss Fit and others like triplets, pulses and spikes.

If it's useful, I can run all 3 versions a few times; just ask, because the info might be needed(?)

And the outage @ SETI@Home began while I was posting  :-\.

Wishing all a pleasant day, Fred.
B.t.w., when doing 2 on 1 GPU, screen lag is quite heavy; sometimes all motion stops and the screen is no longer refreshed.
Is there something to change using the cmd-line parameters? (SETI Beta rev177 (or newer, will have a look!))

Oh boy, there might be some double attachments(?) [Edit by Miep - I took the second instance of stderrftst_V7_J37_W32.rar out]
« Last Edit: 17 May 2011, 02:43:05 pm by Miep »

Offline Miep

  • Global Moderator
  • Knight who says 'Ni!'
  • *****
  • Posts: 964
Re: AVX Optimized App Development
« Reply #71 on: 17 May 2011, 02:45:52 pm »
Quote from: Fredericx51
B.t.w., when doing 2 on 1 GPU, screen lag is quite heavy; sometimes all motion stops and the screen is no longer refreshed.
Is there something to change using the cmd-line parameters? (SETI Beta rev177 (or newer, will have a look!))

With OpenCL MB, increase -period_iteration_num if it's laggy or the driver restarts.
With AP, decrease -unroll and the block sizes.
I'd point you to my main post on that topic, but that's a tiny bit tricky during maintenance :)
The road to hell is paved with good intentions

Offline Josef W. Segur

  • Janitor o' the Board
  • Knight who says 'Ni!'
  • *****
  • Posts: 3112
Re: AVX Optimized App Development
« Reply #72 on: 17 May 2011, 09:19:50 pm »
Here's another test version. I've dropped the JS_AVX_b folding because that approach was a clear failure, but added JS_AVX_c folding which may do better. For the non-AVX side I did some minor cleanup, but don't expect any noticeable difference in results unless I made a typo or something.
                                                                                                  Joe
Edit: Attachment deleted, newer version in later post.
« Last Edit: 21 May 2011, 02:14:10 pm by Josef W. Segur »

Offline Miep

  • Global Moderator
  • Knight who says 'Ni!'
  • *****
  • Posts: 964
Re: AVX Optimized App Development
« Reply #73 on: 18 May 2011, 08:38:13 am »
Once with BOINC running, once without.
The road to hell is paved with good intentions

Offline Claggy

  • Alpha Tester
  • Knight who says 'Ni!'
  • ***
  • Posts: 3111
    • My computers at Seti Beta
Re: AVX Optimized App Development
« Reply #74 on: 18 May 2011, 09:34:50 am »
Here's the J43 run on my E8500 (5 runs with Boinc and apps running, 5 runs with Boinc and apps shut down)

Claggy

 
