Author Topic: optimized sources  (Read 615621 times)

ScanMan

  • Guest
Re: optimized sources
« Reply #270 on: 18 Nov 2007, 11:26:17 am »
Thanks for the heads up on my question.


Regards

ScanMan

Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: optimized sources
« Reply #271 on: 18 Nov 2007, 06:27:58 pm »
Hi Jason,
Merci for compiling my code pieces and making asm files with the Intel compiler. After a first look at the asm code, AKFCOMP and FPUCOMP perform well.  ;D

Found out why my asm output did not work in ORCAS: the Configuration was Release, but it must be Debug.  ;)

heinz

Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: optimized sources
« Reply #272 on: 21 Nov 2007, 09:41:07 am »
Hi Jason,
If you have a little time, try this with the Intel compiler and use the etimer project for measuring.
If you need anything, PM me.
------------------------------------------
------ Build started: Project: Optimizer, Configuration: Release32-NOGFX Win32 ------
Compiling...
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 15.00.20404 for 80x86
Copyright (C) Microsoft Corporation.  All rights reserved.
cl /Od /Ob2 /Oi /Ot /Oy /GT /I "../../../boinc/win_build" /I ".." /I "..\.." /I "..\..\..\boinc\lib" /I "../../../boinc/api" /I "../../db" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer" /I "C:\I\INTEL\IPP\5.2_beta\ia32\tools\staticlib" /I "C:\I\INTEL\IPP\5.2_beta\ia32\include" /D "USE_AKFCOMP" /D "USE_IPP" /D "USE_SSE2" /D "WIN32" /D "_WIN32" /D "_WINDOWS" /D "_CONSOLE" /D "_DEBUG" /D "_LIB" /D "_MT" /D "CLIENT" /D "NBOINC_APP_GRAPHICS" /D "_UNICODE" /D "UNICODE" /D "_VC80_UPGRADE=0x0710" /D "_MBCS" /GF /FD /EHsc /MTd /Zp16 /Gy /FAs /Fa"Release32-NOGFX\\" /Fo"Release32-NOGFX\\" /Fd"Release32-NOGFX\vc90.pdb" /W3 /c /Wp64 /Zi /Gd /TP /FI "win-config.h" ".\AKfoldSSE.cpp"
AKfoldSSE.cpp
-----IPP-----
-----SSE2/em-----
-----AKFCOMP-----
Build log was saved at "file://c:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer\Release32-NOGFX\BuildLog.htm"
Optimizer - 0 error(s), 0 warning(s)
========== Build: 1 succeeded, 0 failed, 0 up-to-date, 0 skipped ==========
I have already had a look at the asm file.   ;)
heinz

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: optimized sources
« Reply #273 on: 21 Nov 2007, 10:05:43 am »
Will have a look at compiling this with 'USE_AKFCOMP' defined soon, and check if I need anything else.
[was done & pm'd]

Jason
« Last Edit: 21 Nov 2007, 11:43:12 am by j_groothu »

Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: optimized sources
« Reply #274 on: 21 Nov 2007, 01:09:56 pm »
Merci,
It must be fine-tuned a little to go more parallel.
I will PM you when it is done.
heinz

Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: optimized sources
« Reply #275 on: 28 Nov 2007, 08:21:52 pm »
The auto-vectorizer runs  ;D
-----------------------------------
------ Build started: Project: Optimizer, Configuration: Release32-NOGFX Win32 ------
Compiling...
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 15.00.20404 for 80x86
Copyright (C) Microsoft Corporation.  All rights reserved.
cl /O2 /Ob2 /Oi /Ot /Oy /GT /I "../../../boinc/win_build" /I ".." /I "..\.." /I "..\..\..\boinc\lib" /I "../../../boinc/api" /I "../../db" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer" /I "C:\I\INTEL\IPP\5.2_beta\ia32\tools\staticlib" /I "C:\I\INTEL\IPP\5.2_beta\ia32\include" /D "USE_AKFSIMD" /D "USE_IPP" /D "USE_SSE2" /D "WIN32" /D "_WIN32" /D "_WINDOWS" /D "_CONSOLE" /D "_DEBUG" /D "_LIB" /D "_MT" /D "CLIENT" /D "NBOINC_APP_GRAPHICS" /D "_UNICODE" /D "UNICODE" /D "_VC80_UPGRADE=0x0710" /D "_MBCS" /GF /FD /EHsc /MTd /Zp16 /arch:SSE2 /fp:fast /FAs /Fa"Release32-NOGFX\\" /Fo"Release32-NOGFX\\" /Fd"Release32-NOGFX\vc90.pdb" /W3 /c /Wp64 /Zi /Gd /TP /FI "win-config.h" ".\AKfoldSSE.cpp"
AKfoldSSE.cpp
-----IPP-----
-----SSE2/em-----
-----AKFSIMD-----
Build log was saved at "file://c:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer\Release32-NOGFX\BuildLog.htm"
Optimizer - 0 error(s), 0 warning(s)
========== Build: 1 succeeded, 0 failed, 0 up-to-date, 0 skipped ==========

Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: optimized sources
« Reply #276 on: 30 Nov 2007, 08:08:10 pm »
Working now on a vectorized version of chirpfft.cpp
heinz  ;D

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: optimized sources
« Reply #277 on: 01 Dec 2007, 07:28:09 am »
Hi Heinz,
  Did you manage to determine any performance differences between our 'auto vectoriser friendly' folding routine (when compiled under ICC, with the pragma hints / dependency overrides) and hand vectorised code?  If you haven't had a chance I'll be able to take another look in 2 weeks (holidays  ;D)
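
For readers who have not seen the 'pragma hints / dependency overrides' mentioned above, here is a minimal sketch of the general idea. It is not the folding routine from this thread: `fold_accumulate` is a hypothetical name, and the loop body is only a stand-in. `#pragma ivdep` is the Intel compiler hint, `#pragma GCC ivdep` is the GCC counterpart, and `__restrict` is a common compiler extension; other compilers simply get the plain loop.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not the routine discussed in the thread: adds one row
// of samples into an accumulator row. __restrict promises the compiler that
// dst and src never overlap, so it need not assume a loop-carried dependency.
void fold_accumulate(float* __restrict dst, const float* __restrict src,
                     std::size_t n) {
    assert(dst != src);  // the no-overlap promise has to hold for real
#if defined(__INTEL_COMPILER)
#pragma ivdep            // ICC: ignore assumed vector dependencies
#elif defined(__GNUC__) && !defined(__clang__)
#pragma GCC ivdep        // GCC equivalent of the same hint
#endif
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] += src[i];
    }
}
```

The hint only removes an assumed dependency; if the arrays really overlap, the result is undefined, so the override is a promise from the programmer rather than something the compiler checks.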

Jason

Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: optimized sources
« Reply #278 on: 03 Dec 2007, 06:22:15 pm »
Hi Jason,
I'm waiting with this until you have holidays. I have realised some nice ideas to eliminate unnecessary code.  ::)
The autovectorizer runs great. Let yourself be surprised.
Have a nice week.
Heinz  ;D

Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: optimized sources
« Reply #279 on: 12 Dec 2007, 05:16:57 pm »
As I'm going through the code, fraction_done got my attention.
Before it is called we always find (sometimes not directly before) the following statement --->
progress = std::min( progress, 1.0 );

1. in function do_transpose
                    progress = std::min( progress, 1.0 );
                    #ifdef BOINC_APP_GRAPHICS
                        if ( !nographics() )
                            {
                            if ( gbp ) gbp->rarray.add_source_row( (float *)WorkData );
                            sah_graphics->local_progress = ( (( float ) ifft + 1) / NumFfts );
                            }
                    #endif
                    remaining = 1.0 - ( double ) ( icfft + 1 ) / num_cfft;
                    fraction_done( progress, remaining );
----------------------------------------------------------------------------------------------------------
2. in function process_data
                progress = std::min( progress, 1.0 );
                #ifdef BOINC_APP_GRAPHICS
                    if ( !nographics() )
                        {
                        if ( gbp ) gbp->rarray.add_source_row( (float *)WorkData );
                        sah_graphics->local_progress = ( (( float ) ifft + 1) / NumFfts );
                        }
                #endif
                remaining = 1.0 - ( double ) ( icfft + 1 ) / num_cfft;
                fraction_done( progress, remaining );
------------------------------------------------------------------------------------------------------
3. in analyzePoT.cpp line 246
         progress = std::min( progress, 1.0 );   // prevent display of >   100%
         fraction_done( progress, remaining );
-----------------------------------------------------------------------------------------------------------------------------------
4.  in analyzePot.cpp line 387
            progress = std::min( progress, 1.0 );   // prevent display of >   100%
            fraction_done( progress, remaining );
----------------------------------------------------------------------------------------------------------------------------------------------------
Therefore I think that if we call fraction_done( double progress, double remaining ),
it is not necessary to calculate progress again inside it ---> progress = std::min( progress, 1.0 );
because we get the same result as before. So we can comment it out.
After helping the compiler with some additional vars we get the following short, hopefully effective code --->

; 75   :     prog2 = 1.0 - remaining;

   fld1
   fsub   QWORD PTR _remaining$[esp-4]

; 76   : //   progress = std::min( progress, 1.0 ); // is alredy done before call fraction_done
; 77   : //   prog = progress * ( 1.0 - pow( prog2, PROG_POWER ) ) + prog2 * pow(prog2,PROG_POWER );//original
; 78   : //   A = pow( prog2,PROG_POWER );
; 79   : //   prog = progress * ( 1.0 - A ) +  prog2 * A ;
; 80   : //   B = 1.0 - A; C = prog2 * A;
; 81   : //   prog = progress * B + C;
; 82   : //  D = progress * B;
; 83   : //   prog = D + C;
; 84   :
; 85   :    A = pow( prog2,PROG_POWER );

   fld   QWORD PTR __real@4018000000000000
   call   __CIpow

; 86   :    B = 1.0 - A; C = prog2 * A;

   fld1
   fsubrp   ST(1), ST(0)

; 87   :    D = progress * B;
; 88   :    prog = D + C;
; 89   :     boinc_fraction_done( prog );

   sub   esp, 8
   fmul   ST(0), ST(0)
   fmul   QWORD PTR _progress$[esp+4]
   fadd   ST(0), ST(0)
   fstp   QWORD PTR [esp]
   call   _boinc_fraction_done
   add   esp, 8

; 90   :     }

   ret   0
?fraction_done@@YAXNN@Z ENDP            ; fraction_done
---------------------------------------------------------------------------------------------------------------------------------------
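
To make the algebra in the commented source lines easier to check, here is a small sketch that puts the original expression next to the factored one. Assumptions: `PROG_POWER` is taken as 6.0, which is what the constant `__real@4018000000000000` in the listing decodes to; the function names are hypothetical; and the value is returned instead of being passed to boinc_fraction_done, so the sketch stands alone.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Assumption: 6.0, read from the double constant in the asm listing.
const double PROG_POWER = 6.0;

// Original form, including the clamp inside the function.
double progress_original(double progress, double remaining) {
    double prog2 = 1.0 - remaining;
    progress = std::min(progress, 1.0);
    return progress * (1.0 - std::pow(prog2, PROG_POWER))
         + prog2 * std::pow(prog2, PROG_POWER);
}

// Factored form with the helper variables A..D and without the clamp.
// It relies on every caller having clamped progress already.
double progress_factored(double progress, double remaining) {
    assert(progress <= 1.0);  // documents the caller-side clamp
    double prog2 = 1.0 - remaining;
    double A = std::pow(prog2, PROG_POWER);  // pow is evaluated only once
    double B = 1.0 - A;
    double C = prog2 * A;
    double D = progress * B;
    return D + C;
}
```

Both forms agree as long as progress is at most 1.0. Dropping the inner std::min is therefore only safe while all four call sites shown above keep their own clamp; a new caller that forgets it would report more than 100%.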

your comments are welcome

heinz




Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: optimized sources
« Reply #280 on: 16 Dec 2007, 04:58:53 am »
Quote from: _heinz on 30 Nov 2007, 08:08:10 pm
Working now on a vectorized version of chirpfft.cpp
heinz  ;D
Hi Heinz, I'm now on holidays :D. Are you looking at this one? I am trying to get reoriented after finishing study/work for the year, and am recovering after some serious celebrations :D.  It's time to catch up!

Jason

(PS, I've been raised to code wizard, so I've been reading more of the private areas. I think some of the stuff we've been trying out to force the autovectoriser has some real relevance, and maybe we should start a thread about it there)

Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: optimized sources
« Reply #281 on: 17 Dec 2007, 05:20:03 pm »
Hi Jason,
I have not had time the last few days.... I think we should equalize our codes first. If you agree to using the new program structure, I will upload everything and PM you when it is done.
heinz

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: optimized sources
« Reply #282 on: 18 Dec 2007, 02:29:32 am »
Hi Heinz,
Sounds like a good idea, PM when ready, take your time, no rush :D. For a comparative baseline reference, I have a functional 2.4V noGFX build with xN switches now.  It was tough finding a suitable Boincapi svn revision to build against because of much restructuring of random 'utils' and gfx classes between ~august 'til now.  [Investigating some unresolved externals actually led me to posts made by Simon back about July, on Beta, regarding the same sets of unresolved externals].  I think we should decide if we want to fix at a certain Boinc API svn revision (less work but may break), or build against the HEAD (lots more work...).

One initial feeling I get from that experience is that any improvements that involve cutting out unnecessary boinc interface, and moving some of the basic string, memory and utility functions away from boincapi --> back towards OS/app, might stabilise some of those issues (as these elements seem to be in constant flux in boincapi).... Yes, I'm aware that's the exact opposite of the feeling an api library is supposed to generate [stability and solidity].  Of course a stripped down minimalist 'version' might constitute its own branch.... Just ideas. [Might be an idea to make the required utility functions in our own lib, maybe allowing us to drop some boincapi .h & .c references completely, removing dependency on the revision... e.g. 'str_util.c' & 'str_util.h'.. do we really need to use boinc's version of this?...]

Jason
« Last Edit: 18 Dec 2007, 03:24:17 am by j_groothu »

Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: optimized sources
« Reply #283 on: 29 Dec 2007, 05:19:18 pm »
I like the idea of a stripped down minimalist version, but we should eliminate unnecessary code with #ifdef directives, in connection with the use of include files for variants, as I have done with USE_PFLOOP etc., because it is important to still have one source code from which we can generate all necessary program versions for the different CPUs.
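
A minimal sketch of the single-source idea, assuming a compile-time switch like the USE_SSE2 define seen in the build logs above. The functions `fold_variant` and `fold_add` are hypothetical illustrations, not code from the project, and the variant is selected inline here rather than through separate include files.

```cpp
#include <cassert>
#include <cstddef>

#if defined(USE_SSE2)
#include <emmintrin.h>  // SSE/SSE2 intrinsics, only needed by the SIMD variant
#endif

// Hypothetical helper: reports which variant this build contains.
const char* fold_variant() {
#if defined(USE_SSE2)
    return "SSE2";
#else
    return "generic";
#endif
}

// Hypothetical helper: element-wise sum of two rows, one source for all builds.
void fold_add(float* dst, const float* a, const float* b, std::size_t n) {
    assert(n == 0 || (dst && a && b));
    std::size_t i = 0;
#if defined(USE_SSE2)
    // SIMD variant: four floats per step, unaligned loads and stores.
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(dst + i, _mm_add_ps(va, vb));
    }
#endif
    // Generic variant, also used for the leftover elements of the SIMD build.
    for (; i < n; ++i) {
        dst[i] = a[i] + b[i];
    }
}
```

Compiling the same file with or without the define (for example /D "USE_SSE2" as in the logs) then yields the different program versions, and the unused variant never reaches the binary.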

"'str_util.c' & 'str_util.h'.. do we really need to use boinc's version of this?..." is a question for Joe

surprise... we are code wizards,  ::)   who did it?

heinz

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: optimized sources
« Reply #284 on: 29 Dec 2007, 08:12:53 pm »
Yay, I feel special too.. I believe it was "The Lunatic Mods of Chickenness" calling on the Holy Powers of the "Knights Who Say Ni!"

 
