
optimized sources


_heinz:
Easy: we can now compile all 3 cases using a preprocessor definition --->
---------------------------------------------------------------------------------------------------
// USE_PFLOOP  --> preprocessor definition
// USE_PFCASE  --> preprocessor definition
#if defined( USE_PFLOOP )
   #pragma message ("-----PFLOOP-----")
   #include "pfloop.h" //use the loop-construct
#else
#if defined( USE_PFCASE )
   #pragma message ("-----PFCASE-----")
   #include "pfcase.h" //use the modified case-construct
#else
   //use original code
#endif // USE_PFCASE
#endif // USE_PFLOOP
-----------------------------------------------------------------------------------------
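The selection above can also be written as a single #if / #elif / #else chain. The following is only a minimal sketch: the PF_VARIANT_NAME macro and the pf_variant() helper are hypothetical stand-ins so the snippet compiles on its own, whereas the real code includes pfloop.h or pfcase.h at this point.

```cpp
#include <string>

// Hypothetical stand-in: the real code includes pfloop.h or pfcase.h here
// instead of defining a name.
#if defined(USE_PFLOOP)
    #define PF_VARIANT_NAME "PFLOOP"    // loop construct
#elif defined(USE_PFCASE)
    #define PF_VARIANT_NAME "PFCASE"    // modified case construct
#else
    #define PF_VARIANT_NAME "original"  // untouched original code
#endif

// Illustrative helper (not part of seti_boinc): reports which variant
// was selected at compile time.
inline std::string pf_variant() {
    return PF_VARIANT_NAME;
}
```

Built with /D "USE_PFLOOP" or /D "USE_PFCASE" the helper reports that variant; with neither define it reports the original code path. If both are defined, USE_PFLOOP wins, which matches the nesting in the post.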
------ Build started: Project: seti_boinc, Configuration: Release32-NOGFX Win32 ------
Compiling...
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 15.00.20404 for 80x86
Copyright (C) Microsoft Corporation.  All rights reserved.
cl /Od /Ob2 /Oi /Ot /Oy /GT /I "." /I "../../../boinc/api" /I "../../../boinc/client/win" /I "../../../boinc/lib" /I ".." /I "glut" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\db" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\glut" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\jpeglib" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\image_libs" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX" /I "C:\I\SC\vs90\boinc" /I "C:\I\SC\vs90\boinc\api" /I "C:\I\SC\vs90\boinc\client\win" /I "C:\I\SC\vs90\boinc\lib" /D "WIN32" /D "_WIN32" /D "_WINDOWS" /D "NBOINC_APP_GRAPHICS" /D "CLIENT" /D "_MT" /D "USE_IPP" /D "USE_SSE2" /D "_DEBUG" /D "USE_PFLOOP" /D "_VC80_UPGRADE=0x0600" /D "_MBCS" /GF /Gm /EHsc /MTd /Zp16 /Gy /Fp".\Release/seti_boinc.pch" /Fo".\Release32-NOGFX\\" /Fd".\Release32-NOGFX\vc90.pdb" /FR".\Release32-NOGFX\\" /W3 /c /Wp64 /Zi /TP "..\pulsefind.cpp"
pulsefind.cpp
-----PFLOOP-----
..\pulsefind.cpp(1487) : warning C4146: unary minus operator applied to unsigned type, result still unsigned
Build log was saved at "file://c:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\BuildLog.htm"
seti_boinc - 0 error(s), 1 warning(s)
========== Build: 1 succeeded, 0 failed, 0 up-to-date, 0 skipped ==========
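About the C4146 warning in the log: the code at pulsefind.cpp(1487) is not shown here, so the following is only an assumed illustration of what the warning means, not a fix for that line. Negating an unsigned value wraps around instead of going negative. The helper names are hypothetical.

```cpp
#include <cstdint>

// Negating an unsigned value wraps modulo 2^32; the result is never negative.
// Writing 0u - u gives the same value as -u without triggering C4146.
inline std::uint32_t negate_unsigned(std::uint32_t u) {
    return 0u - u;
}

// If a negative number is what was meant, convert to a wider signed type
// first and negate afterwards.
inline std::int64_t negate_as_signed(std::uint32_t u) {
    return -static_cast<std::int64_t>(u);
}
```

For example, negate_unsigned(5) gives 4294967291, while negate_as_signed(5) gives -5. Which of the two is intended at line 1487 would have to be checked in the source.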

regards   ;D

Jason G:

--- Quote from: seti_britta on 07 Nov 2007, 01:55:39 pm ---       have a heavily modified chirpfft.cpp which we can try too

--- End quote ---

Good, we'll do that; I think it is a very good idea. I have P4 SSE2 primary performance data (VTune) for sse2_ChirpData_ak: 10000 loops on a P4 Northwood with 512k L2 cache, which took a total execution time of 10 secs (19 runs' worth of data gathered).
(preliminary data, subject to verification with further runs)
   64k Aliasing: almost none. Accounts for 1.34% of the function workload (about 0.13 secs)
   Second Level Cache misses: account for 10.28% of the workload (about 1 second)

other statistics (preliminary, subject to verification) :
128-bit MMX instructions ~82 million (no 64-bit MMX instructions counted)
packed double precision Floating Point SSE instructions ~1.4 billion (thousand million)
packed single precision  Floating Point SSE instructions ~4 billion (thousand million)

Mispredicted Branches = 0 !!!  :o

No Machine Clear counts (Pipeline flushes), split loads or blocked store forwards at all :D

I think that's a really good function, with much better statistics than the pulse folding functions gave me. I'll have to retest those in isolation too, though, as I'm getting better at selecting the correct compiler settings and at driving VTune.

Well, I'll check a few build settings and run the primary performance measures again to verify those results, then add secondary performance indicators to see what else turns up. Then on the weekend I'll maybe fiddle with that 3-phase idea to see if it actually works. All good fun :D

Jason


_heinz:
the modified PFCASE is ready now
-----------------------------------------------
------ Build started: Project: seti_boinc, Configuration: Release32-NOGFX Win32 ------
Compiling...
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 15.00.20404 for 80x86
Copyright (C) Microsoft Corporation.  All rights reserved.
cl /Od /Ob2 /Oi /Ot /Oy /GT /I "." /I "../../../boinc/api" /I "../../../boinc/client/win" /I "../../../boinc/lib" /I ".." /I "glut" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\db" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\glut" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\jpeglib" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\image_libs" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX" /I "C:\I\SC\vs90\boinc" /I "C:\I\SC\vs90\boinc\api" /I "C:\I\SC\vs90\boinc\client\win" /I "C:\I\SC\vs90\boinc\lib" /D "WIN32" /D "_WIN32" /D "_WINDOWS" /D "NBOINC_APP_GRAPHICS" /D "CLIENT" /D "_MT" /D "USE_IPP" /D "USE_SSE2" /D "_DEBUG" /D "USE_PFCASE" /D "_VC80_UPGRADE=0x0600" /D "_MBCS" /GF /Gm /EHsc /MTd /Zp16 /Gy /Fp".\Release/seti_boinc.pch" /Fo".\Release32-NOGFX\\" /Fd".\Release32-NOGFX\vc90.pdb" /FR".\Release32-NOGFX\\" /W3 /c /Wp64 /Zi /TP "..\pulsefind.cpp"
pulsefind.cpp
-----PFCASE-----
..\pulsefind.cpp(1487) : warning C4146: unary minus operator applied to unsigned type, result still unsigned
Build log was saved at "file://c:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\BuildLog.htm"
seti_boinc - 0 error(s), 1 warning(s)
========== Build: 1 succeeded, 0 failed, 0 up-to-date, 0 skipped ==========
 ;D

_heinz:
modified PFCASE rocks

Here is how it was before --->
ar=0.435000 done. Total flop count: 108711033335.208650

PulTimB 0.5    Totals:  Ratio            Ticks
             standard:  1.000      87303043476
Plan < 512 FPU swi ! :  0.575      50201832416
 Plan < 512 AK SSE ! :  0.634      55338411648
Plan < 512 BHx SSE ! :  0.993      86661631716
 Plan < 512 BH SSE ! :  0.774      67545465584

PFCASE ---->
ar=0.435000 done. Total flop count: 108711033335.208650

PulTimB 0.5    Totals:  Ratio            Ticks
             standard:  1.000      87387438720
Plan < 512 FPU swi ! :  0.504      44014700492
 Plan < 512 AK SSE ! :  0.633      55324520388
Plan < 512 BHx SSE ! :  0.992      86681643504
 Plan < 512 BH SSE ! :  0.773      67531081560
----------------------------------------------------------------------------------------------------
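modified PFCASE ---> FPU switch row: ratio 0.575 -> 0.504, about 12% fewer ticks (about 14% faster); the SSE rows are practically unchanged     ;D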
heinz

Jason G:
Woohoo! It's the weekend! Was that function built with just the changes you made before? My guess is that the compiler vectorised some of it. I would like to look at the disassembly output: if the compiler was smart enough to combine prefetch, FPU work and streaming stores, then that IS 3-Phase :D Anything is possible. Have you compared for accuracy as well?
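To make the "prefetch plus FPU plus streaming stores" pattern concrete, here is a rough hand-written sketch of such a loop. It is an assumption about what a 3-phase loop could look like, not the pulse finding code and not what the compiler actually emitted: the kernel (a plain scale), the name scale_three_phase and the prefetch distance of 64 floats are all made up for illustration. The output pointer must be 16-byte aligned on the SSE path.

```cpp
#include <cstddef>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
    #include <xmmintrin.h>
    #define SKETCH_HAS_SSE 1
#else
    #define SKETCH_HAS_SSE 0
#endif

// Hypothetical kernel: out[i] = in[i] * scale.
// `out` must be 16-byte aligned when the SSE path is compiled in.
inline void scale_three_phase(const float* in, float* out,
                              std::size_t n, float scale) {
    std::size_t i = 0;
#if SKETCH_HAS_SSE
    const std::size_t ahead = 64;  // assumed prefetch distance, in floats
    for (; i + 4 <= n; i += 4) {
        // Phase 1: ask the cache for input needed a few iterations later.
        if (i + ahead < n) {
            _mm_prefetch(reinterpret_cast<const char*>(in + i + ahead),
                         _MM_HINT_T0);
        }
        // Phase 2: plain scalar arithmetic into a small aligned buffer.
        alignas(16) float tmp[4];
        for (int k = 0; k < 4; ++k) {
            tmp[k] = in[i + k] * scale;
        }
        // Phase 3: streaming store, writing the result around the cache.
        _mm_stream_ps(out + i, _mm_load_ps(tmp));
    }
    _mm_sfence();  // order the streaming stores before later stores
#endif
    // Remaining elements (or everything, without SSE): ordinary stores.
    for (; i < n; ++i) {
        out[i] = in[i] * scale;
    }
}
```

Whether a loop like this is actually faster would have to be measured, and the disassembly is still the only way to see what the compiler really generated.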
