Author Topic: optimized sources  (Read 548897 times)

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: optimized sources
« Reply #210 on: 26 Oct 2007, 10:14:58 am »
LOL, Here's one in seti_analyze that disappears if going to FFTW,
 
Code: [Select]
                #ifndef USE_FFTW        // FFTW now uses out of place transforms.
                    memcpy( WorkData, &ChirpedData[CurrentSub], int(fftlen * sizeof(sah_complex)) );
                #endif

I see a few of those.

Another thought: has anyone tried the FFTW codelet generator, given that only a small portion of FFTW is used? I have played with OCaml before and it didn't seem hard [but it was long enough ago that I've forgotten everything :D].

Jason
« Last Edit: 26 Oct 2007, 10:22:59 am by j_groothu »

Offline Josef W. Segur

  • Janitor o' the Board
  • Knight who says 'Ni!'
  • *****
  • Posts: 3112
Re: optimized sources
« Reply #211 on: 26 Oct 2007, 08:55:01 pm »
Quote
LOL, Here's one in seti_analyze that disappears if going to FFTW,
 
Code: [Select]
                #ifndef USE_FFTW        // FFTW now uses out of place transforms.
                    memcpy( WorkData, &ChirpedData[CurrentSub], int(fftlen * sizeof(sah_complex)) );
                #endif

I see a few of those.

Another thought: has anyone tried the FFTW codelet generator, given that only a small portion of FFTW is used? I have played with OCaml before and it didn't seem hard [but it was long enough ago that I've forgotten everything :D].

Jason

Yes, those memcpy calls could be eliminated if the IPP FFTs were switched to out of place. Testsuji made that change in the official sources after 5.15 so they aren't included in our source. I've mentioned this several times, but it would be best if someone who actually works with IPP made and tested the changes.

I've thought about codelet generation, and even downloaded OCaml, but have never done anything with it. I suspect there could be some efficiency to be gained from an FFTW function that combines the FFT and the conversion to PowerSpectrum: the final FFT stage has the values needed to save the power directly, rather than having a separate function go through the complex array and convert it. The reversibility of a complete FFT is only needed during baseline smoothing.
                                                      Joe
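A minimal sketch of the idea in the last paragraph, using a naive DFT rather than an FFTW codelet. The names `dft_bin`, `dft_then_power` and `dft_power_fused` are hypothetical and not taken from the SETI or FFTW sources. The first function stores the complex spectrum and then sweeps it to get power; the second writes the power as soon as each output bin is known, so no complex output array is needed.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<float>;

// One output bin of a naive forward DFT, accumulated in double precision.
static std::complex<double> dft_bin(const std::vector<cplx>& in, std::size_t k) {
    const std::size_t n = in.size();
    const double two_pi = 2.0 * std::acos(-1.0);
    std::complex<double> acc(0.0, 0.0);
    for (std::size_t t = 0; t < n; ++t) {
        const double ang = -two_pi * double((k * t) % n) / double(n);
        acc += std::complex<double>(in[t]) *
               std::complex<double>(std::cos(ang), std::sin(ang));
    }
    return acc;
}

// Two passes: keep the complex spectrum, then convert it to power.
std::vector<float> dft_then_power(const std::vector<cplx>& in) {
    assert(!in.empty());
    std::vector<cplx> freq(in.size());
    for (std::size_t k = 0; k < in.size(); ++k) {
        const std::complex<double> b = dft_bin(in, k);
        freq[k] = cplx(float(b.real()), float(b.imag()));
    }
    std::vector<float> power(in.size());
    for (std::size_t k = 0; k < in.size(); ++k)
        power[k] = freq[k].real() * freq[k].real() + freq[k].imag() * freq[k].imag();
    return power;
}

// Fused: the last step of each bin writes re^2 + im^2 directly.
std::vector<float> dft_power_fused(const std::vector<cplx>& in) {
    assert(!in.empty());
    std::vector<float> power(in.size());
    for (std::size_t k = 0; k < in.size(); ++k)
        power[k] = float(std::norm(dft_bin(in, k)));  // std::norm is |z|^2
    return power;
}
```

The fused form gives up the complex spectrum, which matches the remark that reversibility is only needed during baseline smoothing. Whether a real FFT's final stage can be fused this cheaply is an assumption this sketch does not test.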

Offline Jason G

« Reply #212 on: 26 Oct 2007, 11:56:33 pm »
Yes. What triggered the mention was my attempt to understand why the subsequent calls to IPP use the out-of-place versions but are called with the same source and destination, 'in-place style'. Odd  :o.
     ippsFFTInv_CToC_32fc(
                        ( Ipp32fc * ) WorkData,   //pSrcDst for inplace, pSrc for outplace
                        ( Ipp32fc * ) WorkData,    // additional parameter indicating out of place call ?
                                                                       // maybe, drop it for in place, or change for out of place proper
                                                                      // and disable preceding memcpy.
                        FftSpec[FftNum],
                        FftBuf );

Having a play with MinGW over Eclipse at the moment for other work, less vendor library oriented.  I'm liking it, a big switch as I haven't used a gnu compiler for years.  Woot, 'proper make facilities'  ;D, means I'll have to take a deeper look at FFTW sometime soon.

Jason

[ Maybe when I'm back on ICC/IPP, I'll see what breaks if I comment out the memcpy and use the out-of-place parameters just by changing the source argument (in all those places in seti_analyze). It would indeed be nice if someone more IPP-experienced and in the sources loop could look at Joe's observation, comment, test, etc.]
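A rough sketch of the two call styles being compared, with a toy transform standing in for the IPP call. `toy_transform`, `staged_copy_path` and `direct_path` are hypothetical names; the toy is a Walsh-Hadamard transform, chosen only because it is short and is written so that `src == dst` is safe. Whether the real library guarantees the same for its out-of-place entry point is something to check in its documentation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Stand-in with an out-of-place signature (src, dst); n must be a power of two >= 2.
void toy_transform(const float* src, float* dst, std::size_t n) {
    assert(n >= 2 && (n & (n - 1)) == 0);
    std::size_t h = n / 2;
    // First stage reads src and writes dst; both inputs are read before either write.
    for (std::size_t i = 0; i < h; ++i) {
        const float a = src[i], b = src[i + h];
        dst[i] = a + b;
        dst[i + h] = a - b;
    }
    // Remaining stages work in place on dst.
    for (h /= 2; h >= 1; h /= 2) {
        for (std::size_t base = 0; base < n; base += 2 * h) {
            for (std::size_t i = base; i < base + h; ++i) {
                const float a = dst[i], b = dst[i + h];
                dst[i] = a + b;
                dst[i + h] = a - b;
            }
        }
    }
}

// Current arrangement: copy into the work buffer, then pass it as both source and destination.
std::vector<float> staged_copy_path(const std::vector<float>& chirped) {
    std::vector<float> work(chirped.size());
    std::memcpy(work.data(), chirped.data(), chirped.size() * sizeof(float));
    toy_transform(work.data(), work.data(), work.size());
    return work;
}

// Proposed arrangement: no copy, read straight from the source buffer.
std::vector<float> direct_path(const std::vector<float>& chirped) {
    std::vector<float> work(chirped.size());
    toy_transform(chirped.data(), work.data(), work.size());
    return work;
}
```

Both paths leave the source untouched and produce the same output; the second simply skips one pass over the data per transform.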
« Last Edit: 27 Oct 2007, 11:23:24 am by j_groothu »

Offline Jason G

« Reply #213 on: 27 Oct 2007, 10:54:45 am »
Okay, just for kicks I managed to get the FFTW 3.1.2 (configure & make) scripts working in MinGW/MSYS.  No idea what to make of the actual configuration flags (* config.h) yet though  ;D Back to the docs!
« Last Edit: 27 Oct 2007, 11:31:07 am by j_groothu »

Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: optimized sources
« Reply #214 on: 27 Oct 2007, 03:39:09 pm »
Hi Jason,
nice that you encouraged me, thanks...

The relevant places are in process_data. Here it is, with some changes to the original code:
// ----------------------------------------------------------------------------
//   Function:   process_data
//   Type     :   void
//   Purpose  :   process data, with or without transpose
//   parameter:   none
//   last update:19.03.2007   by:seti_britta      
// ----------------------------------------------------------------------------
// Part 4.2 process data
// ----------------------------------------------------------------------------
void process_data()
   {
      extern int have_transpose;
      if (!have_transpose) ifft = 0;   // seti_britta: ifft=0, when no transpose
      for (; ifft < NumFfts; ifft++ )
            {
                CurrentSub = fftlen * ifft;
                #ifndef USE_FFTW        // FFTW now uses out of place transforms.
                    memcpy( WorkData, &ChirpedData[CurrentSub], int(fftlen * sizeof(sah_complex)) );
                #endif

               // seti_britta: the flops calculation was moved to the point where fftlen gets its value
               // flops_form1= 4 * fftlen + 5 * fftlen * log( double(fftlen) ) / log(2.0 )
               // count_flops( 4 * fftlen + 5 * fftlen * log( double(fftlen) ) / log(2.0 ) );
               count_flops(flops_form1);   // setibritta: new statement

                #if defined( USE_IPP )
                    ippsFFTInv_CToC_32fc(
                        ( Ipp32fc * ) WorkData,
                        ( Ipp32fc * ) WorkData,
                        FftSpec[FftNum],
                        FftBuf );
                #elif defined( USE_FFTWF )
                    fftwf_execute_dft(
                        analysis_plans[FftNum],
                        &ChirpedData[CurrentSub],
                        WorkData );
                #else
                    // replace time with freq - ooura FFT
               // seti_britta: multiplication by 2 hoisted out of the loop, to where fftlen gets its value
                 /*   cdft( fftlen * 2, 1, WorkData, BitRevTab[FftNum], CoeffTab[FftNum] ); */
               cdft( fftlen_m2, 1, WorkData, BitRevTab[FftNum], CoeffTab[FftNum] );
                #endif
            if (have_transpose)
               {
               // BENH: new version replace freq with power
               //      does transpose as well as puts values back
               //      into WorkData (for use by findSpikes)
               GetPowerSpectrum( WorkData, PowerSpectrum, fftlen, ifft, NumFfts);
               have_transpose = false;
               }
                // replace freq with power
            // no transpose
                else PwrSpectrumOnly( WorkData, (float *)WorkData, fftlen );

                // any ETIs ?!
                // If PoT freq bin is non-negative, we are into PoT analysis
                // for this cfft pair and need not redo spike finding.
                if ( analysis_state.PoT_freq_bin == -1 )
                    {
                    count_flops( fftlen );
                    retval = FindSpikes( (float *)WorkData, fftlen, ifft, swi );
                    progress += SpikeProgressUnits( fftlen ) * ProgressUnitSize / NumFfts;
                    if ( retval ) SETIERROR( retval, "from FindSpikes" );
                    }

                // progress = ((float)icfft)/num_cfft + ((float)ifft)/(NumFfts*num_cfft);
                progress = std::min( progress, 1.0 );
                #ifdef BOINC_APP_GRAPHICS
                    if ( !nographics() )
                        {
                        if ( gbp ) gbp->rarray.add_source_row( (float *)WorkData );
                        sah_graphics->local_progress = ( (( float ) ifft + 1) / NumFfts );
                        }
                #endif
                remaining = 1.0 - ( double ) ( icfft + 1 ) / num_cfft;
                fraction_done( progress, remaining );

            }   // end for ifft < NumFfts
   } // end part 4.2 process_data
------------------------------------------------------------------
regards heinz
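The `flops_form1` change above hoists a loop-invariant expression. A small sketch of that hoist, assuming the formula in the commented-out line (`4 * fftlen + 5 * fftlen * log2(fftlen)`); the function names `fft_flops_estimate` and `total_flops` are made up for illustration:

```cpp
#include <cassert>
#include <cmath>

// Estimated floating-point operations for one complex FFT of length fftlen,
// following the commented-out formula: 4*n + 5*n*log2(n).
double fft_flops_estimate(int fftlen) {
    assert(fftlen > 0);
    const double n = static_cast<double>(fftlen);
    return 4.0 * n + 5.0 * n * std::log2(n);
}

// Hypothetical accounting loop: the estimate depends only on fftlen,
// so it is computed once instead of once per FFT.
double total_flops(int fftlen, int num_ffts) {
    const double per_fft = fft_flops_estimate(fftlen);  // hoisted out of the loop
    double total = 0.0;
    for (int ifft = 0; ifft < num_ffts; ++ifft) {
        total += per_fft;
    }
    return total;
}
```

The saving is two `log` calls and a few multiplications per FFT, which is small next to the transform itself but costs nothing to take.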

Offline Jason G

« Reply #215 on: 28 Oct 2007, 12:11:42 am »
Yeah, I still have the clunky seti_analyze function :D , but that's the same code. I think 3 or 4 places have that arrangement (I haven't counted properly yet).  Here's what I suggest the IPP-inclined could try as a crude test (it might break other non-IPP/FFTW builds, but it is good enough to test):

Quote
    ...
                #ifndef USE_FFTW        // FFTW now uses out of place transforms.
               // Commenting out the memcpy()
              //    memcpy( WorkData, &ChirpedData[CurrentSub], int(fftlen * sizeof(sah_complex)) );
                #endif
     ...
          // Now fix the source for out of place IPP call properly
                #if defined( USE_IPP )
                    ippsFFTInv_CToC_32fc(
          //             ( Ipp32fc * ) WorkData, // changing from this source
                      ( Ipp32fc * ) &ChirpedData[CurrentSub], // to direct source for out of place
                        ( Ipp32fc * ) WorkData, // leave as same destination
                        FftSpec[FftNum],
                        FftBuf );   

Maybe there is a trick I don't know or understand to the original code, If so then  8), but I can't see it.  Maybe in the next couple of weeks I can see what happens.

Jason

[Maybe this will be nicer on Non IPP/FFTW builds
changing this:
  #ifndef USE_FFTW        // FFTW now uses out of place transforms.
to something like this:
#ifndef USE_FFTW  || USE_IPP     // FFTW & IPP now use out of place transforms.
]
« Last Edit: 28 Oct 2007, 12:36:54 am by j_groothu »

Offline Jason G

« Reply #216 on: 28 Oct 2007, 03:42:39 am »
Quote
... The reversibility of a complete FFT is only needed during baseline smoothing.
                                                      Joe

Now there's a possible can of worms.  Fresh from the IPP tutorials, I was thinking in terms of thresholding denormal data etc. to speed up IPP.  I figured the destructiveness might be a problem, so I left it there until I get a better handle on things.
But seeing as different architectures are returning similar (enough) results despite, as I understand it:
            - no or limited thresholding in place for very small/big numbers in the data (could be wrong there)
            - known architecture dependence in calculation for such boundary data (randomness)
            - significant penalties in SSE for arithmetic with these numbers (these do show in VTune profiles)
            - reduced IPP performance with this denormal data
I might have to look again.
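For reference, a minimal sketch of what "thresholding denormal data" could look like in plain C++. `flush_subnormals` is a hypothetical helper, not something in the SETI sources or IPP, and it is destructive in exactly the way described above: values below the normal range are replaced by zero.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Replace subnormal (denormal) floats with zero and report how many were changed.
std::size_t flush_subnormals(std::vector<float>& data) {
    std::size_t flushed = 0;
    for (float& v : data) {
        if (std::fpclassify(v) == FP_SUBNORMAL) {
            v = 0.0f;
            ++flushed;
        }
    }
    return flushed;
}
```

Counting the flushed values first, without modifying the data, would be a cheap way to find out whether denormals occur often enough in real work units to matter.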
« Last Edit: 28 Oct 2007, 03:47:20 am by j_groothu »

Offline _heinz

« Reply #217 on: 28 Oct 2007, 07:33:10 am »
Hi Jason,
#ifndef USE_FFTW  || USE_IPP     // FFTW & IPP now use out of place transforms
-------------------------------------------------------------------------------------------------------------------
gives warnings when I compile --->
------ Build started: Project: seti_boinc, Configuration: Release32-NOGFX Win32 ------
Compiling...
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 15.00.20404 for 80x86
Copyright (C) Microsoft Corporation.  All rights reserved.
cl /Od /Ob2 /Oi /Ot /Oy /GT /I "." /I "../../../boinc/api" /I "../../../boinc/client/win" /I "../../../boinc/lib" /I ".." /I "glut" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\db" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\glut" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\jpeglib" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\image_libs" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX" /I "C:\I\SC\vs90\boinc" /I "C:\I\SC\vs90\boinc\api" /I "C:\I\SC\vs90\boinc\client\win" /I "C:\I\SC\vs90\boinc\lib" /D "WIN32" /D "_WIN32" /D "_WINDOWS" /D "NBOINC_APP_GRAPHICS" /D "CLIENT" /D "_MT" /D "USE_IPP" /D "USE_SSE2" /D "_DEBUG" /D "_VC80_UPGRADE=0x0600" /D "_MBCS" /GF /Gm /EHsc /MTd /Zp16 /Gy /Fp".\Release/seti_boinc.pch" /Fo".\Release32-NOGFX\\" /Fd".\Release32-NOGFX\vc90.pdb" /FR".\Release32-NOGFX\\" /W3 /c /Wp64 /Zi /TP "..\analyzeFuncs.cpp"
analyzeFuncs.cpp
-----IPP-----
-----SSE2-----
..\analyzeFuncs.cpp(694) : warning C4067: unexpected tokens following preprocessor directive - expected a newline
..\analyzeFuncs.cpp(1187) : warning C4067: unexpected tokens following preprocessor directive - expected a newline
Build log was saved at "file://c:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\BuildLog.htm"
seti_boinc - 0 error(s), 2 warning(s)
========== Build: 1 succeeded, 0 failed, 0 up-to-date, 0 skipped ==========
so I will use  ---->
           #ifndef USE_FFTWF   // FFTW & IPP now use out of place transforms
      // memcopy no longer necessary
        //    memcpy( DataOutChunk, DataInChunk, int(NumPointsInChunk * sizeof(sah_complex)) );
        #endif
-----------------------------------------------------------------------------------------------------------
this block is empty so we could delete it, but I left it in for documentation
it compiles fine now without any warnings --->
analyzeFuncs.cpp
-----IPP-----
-----SSE2-----
Build log was saved at "file://c:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\BuildLog.htm"
seti_boinc - 0 error(s), 0 warning(s)
========== Build: 1 succeeded, 0 failed, 0 up-to-date, 0 skipped ==========
-----------------------------------------------------------------------------------------------------------
There are exactly 3 places in analyzeFuncs.cpp
The first is in do_transpose
// ----------------------------------------------------------------------------
//   Function:   do_transpose()
//   Type     :   void
//   Purpose  :   do transpose
//   parameter:   none
//   last update:19.03.2007   by:seti_britta      
// ----------------------------------------------------------------------------
// Part 4.2.1 do transpose, use strips of 4
// ----------------------------------------------------------------------------
void do_transpose()
{
   extern int have_transpose;
   for ( ifft = 0; ifft < NumFfts - 3; ifft += 4 )
      {
            // do transpose
             for ( int iC = 0; iC < 4; iC++ )
                 {
                    CurrentSub = fftlen * (ifft + iC);
                /*   sah_complex *WorkArea = &WorkData[iC * fftlen / 2];*/  // assume sah_complex 2 floats
            // seti_britta: do fftlen / 2 out of the loop, where fftlen get its value
               sah_complex *WorkArea = &WorkData[iC * fftlen_half];  // assume sah_complex 2 floats
               #ifndef USE_FFTW      // FFTW & IPP now use out of place transforms
               // memcopy no longer necessary,
                    //    memcpy( WorkArea, &ChirpedData[CurrentSub], int(fftlen * sizeof(sah_complex)) );
                    #endif

                    #if defined( USE_IPP )
                        ippsFFTInv_CToC_32fc(
                            ( Ipp32fc * ) &ChirpedData[CurrentSub], // to direct source for out of place
                     ( Ipp32fc * ) WorkArea, // leave as same destination
                     FftSpec[FftNum],
                     FftBuf );
                    #elif defined( USE_FFTWF )
                        fftwf_execute_dft( analysis_plans[FftNum], &ChirpedData[CurrentSub], WorkArea );
                    #else // replace time with freq - ooura FFT
               // seti_britta: multiplication by 2 hoisted out of the loop, to where fftlen gets its value
                    /*  cdft( fftlen * 2, 1, WorkArea, BitRevTab[FftNum], CoeffTab[FftNum] ); */
                  cdft( fftlen_m2, 1, WorkArea, BitRevTab[FftNum], CoeffTab[FftNum] );
                    #endif

                    // replace freq with power
                    PwrSpectrumOnly( WorkArea, (float *)WorkArea, fftlen );
               // seti_britta:move the calculation of flops out of loop, where fftlen get value
                    // count_flops( 4 * fftlen + 5 * fftlen * log( double(fftlen) ) / log(2.0 ) );
               count_flops(flops_form1);   // setibritta: new statement

                    // any ETIs ?!
                    // If PoT freq bin is non-negative, we are into PoT analysis
                    // for this cfft pair and need not redo spike finding.
                    if ( analysis_state.PoT_freq_bin == -1 )
                        {
                        count_flops( fftlen );
                        retval = FindSpikes( (float *)WorkArea, fftlen, ifft + iC, swi );
                        progress += SpikeProgressUnits( fftlen ) * ProgressUnitSize / NumFfts;
                        if ( retval ) SETIERROR( retval, "from FindSpikes" );
                        }

                    // progress = ((float)icfft)/num_cfft + ((float)ifft)/(NumFfts*num_cfft);
                    progress = std::min( progress, 1.0 );
                    #ifdef BOINC_APP_GRAPHICS
                        if ( !nographics() )
                            {
                            if ( gbp ) gbp->rarray.add_source_row( (float *)WorkData );
                            sah_graphics->local_progress = ( (( float ) ifft + 1) / NumFfts );
                            }
                    #endif
                    remaining = 1.0 - ( double ) ( icfft + 1 ) / num_cfft;
                    fraction_done( progress, remaining );

                 } // end ic < 4
                TransposeStrip(fftlen, NumFfts, ifft, (float *)WorkData, PowerSpectrum);
        } // end for ifft < NumFfts - 3
      // transpose done
      have_transpose = true;   // seti_britta: tell process_data that transpose is done
}   // end of do_transpose
-------------------------------------------------------------------------------------------------------------------
The second is in process_data
The third is in v_BaseLineSmooth
        DataInChunk = &( DataIn[TimeChunk * NumPointsInChunk] );
        #ifndef USE_FFTWF   // FFTW & IPP now use out of place transforms
      // memcopy no longer necessary
        //    memcpy( DataOutChunk, DataInChunk, int(NumPointsInChunk * sizeof(sah_complex)) );
        #endif

        // transform to freq
        #ifdef USE_IPP
            ippsFFTInv_CToC_32fc(
                // ( Ipp32fc * ) DataOutChunk,
            ( Ipp32fc * ) DataInChunk,  // to direct source for out of place
                ( Ipp32fc * ) DataOutChunk, // leave as same destination
                FftSpec,
                NULL );
--------------------------------------------------------------------------------------------------------------------------
A fourth is in benchmark.cpp,
but I have no idea whether there is anything to do there
(line 618 ff.):
   for(loops = 0; loops < 25 && (end_cyc-total_run)< MAX_CYCLES; loops++)
      {
      if(pre_test == zero_out)   memset( out_buf, 0, test_size );
      if(pre_test == fill_in)      memcpy( out_buf, workBuf, test_size );
      ramming_speed();
      cycles = cycleCount();
      switch ( bench_list[idx].token )
         {
         case _FFT:
            #if defined( USE_IPP )
               ippsFFTInv_CToC_32fc(
                  ( Ipp32fc * ) out_buf,
                  ( Ipp32fc * ) out_buf,
                  FftSpec,
                  NULL );
            #endif //seti_britta:
            #if defined( USE_FFTWF )
            fftwf_execute_dft( da_fft_plan, (sah_complex *)&in_buf[0], (sah_complex *)&out_buf );
            #endif
            break;
         case SUM2_TBL:
------------------------------------------------------------------------------------------
What do you think about this in benchmark.cpp ?
regards heinz
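On the `PwrSpectrumOnly( WorkArea, (float *)WorkArea, fftlen )` calls in the listings above: they pass the same buffer as complex input and float output. A sketch of why that can work, assuming (as the code comments do) that a complex sample is two consecutive floats. `power_in_place` is a hypothetical name and this is only one plausible reading of the idea, not the project's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// buf holds n complex samples interleaved as re0, im0, re1, im1, ...
// On return the first n floats hold the power values; the second half is stale.
void power_in_place(float* buf, std::size_t n) {
    assert(buf != nullptr);
    for (std::size_t i = 0; i < n; ++i) {
        const float re = buf[2 * i];
        const float im = buf[2 * i + 1];
        // The write index i never overtakes the read index 2*i,
        // so no sample is overwritten before it has been read.
        buf[i] = re * re + im * im;
    }
}
```

Under that reading, the power row for one FFT occupies only the first half of its complex storage, which would explain why `WorkArea` advances by `fftlen / 2` complex elements (that is, `fftlen` floats) per strip entry in do_transpose.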

Offline Jason G

« Reply #218 on: 28 Oct 2007, 08:04:58 am »
Yes that will be broken, I'll look up the correct preprocessor format, and post it here, as soon as my machine stops choking on the MinGW FFTW build (need bigger machine :( ).

[Later:  Here is the better preprocessor format, and it is FFTWF not FFTW, I think I need spectacles, a faster computer ,more practice with preprocessor directives, and maybe a beer! :D ]

#if !(defined(USE_IPP) | defined(USE_FFTWF))
   //statements to be used if neither FFTW or IPP
   //memcopy is here, this should only run for builds with no IPP or FFTW .... or if the following IPP call wasn't updated
        memcpy(.......,.......)
#endif
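Some background on the C4067 warning, as a small self-contained sketch. `#ifndef` accepts exactly one identifier, so in `#ifndef USE_FFTW || USE_IPP` only the first name is tested and the rest is reported as unexpected tokens. Combining tests needs `#if` with `defined()`. The `DEMO_` macros and `needs_staging_copy` below are hypothetical stand-ins for the real build flags; `||` is used here as the conventional logical operator, although `|` also evaluates correctly because `defined()` yields 0 or 1.

```cpp
// Hypothetical stand-ins for the build flags in the thread; flip these to experiment.
#define DEMO_USE_IPP
// #define DEMO_USE_FFTWF

// True only when neither library is selected, i.e. when the memcpy is still needed.
#if !(defined(DEMO_USE_IPP) || defined(DEMO_USE_FFTWF))
constexpr bool kNeedsStagingCopy = true;
#else
constexpr bool kNeedsStagingCopy = false;
#endif

// The same condition written the other way round (De Morgan).
#if !defined(DEMO_USE_IPP) && !defined(DEMO_USE_FFTWF)
constexpr bool kNeedsStagingCopyAlt = true;
#else
constexpr bool kNeedsStagingCopyAlt = false;
#endif

static_assert(kNeedsStagingCopy == kNeedsStagingCopyAlt, "both spellings must agree");

constexpr bool needs_staging_copy() { return kNeedsStagingCopy; }
```

With `DEMO_USE_IPP` defined as above, the copy branch is compiled out; commenting out both defines turns it back on.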
« Last Edit: 28 Oct 2007, 08:31:03 am by j_groothu »

Offline Jason G

« Reply #219 on: 28 Oct 2007, 08:34:16 am »
Quote
...
What do you think about this in benchmark.cpp ?
regards heinz

Looking at the places you show now   :)

[Later:  In Benchmark.cpp It is also an out of place IPP inverse FFT call, but with same source and destination parameters.
  ..NOTE:  I am  still wondering if they must have done it like that on purpose but still don't know why]

If it is meant to be an in-place FFT, it should be (note that IPP's in-place variant carries an _I suffix):
              ippsFFTInv_CToC_32fc_I(
                                           
                  ( Ipp32fc * ) out_buf, // This is both source and destination, so it is not passed a second time.
                                                  // It usually needs to come from a memcpy first, so the source data is not corrupted.
                                                 // benchmark.cpp is a special case because it might be zeros or a filled buffer.
                  FftSpec,
                  NULL );

If it is meant to be out of place it should be: [ I think you cannot use this for Benchmark.cpp , we might use zero fill]
              ippsFFTInv_CToC_32fc(
                  ( Ipp32fc * ) workBuf, // This is the source data, this is not overwritten
                  ( Ipp32fc * ) out_buf, // This is some other Buffer destination
                                                 // no memcpy required
                  FftSpec,
                  NULL );
« Last Edit: 28 Oct 2007, 09:34:47 am by j_groothu »

Offline _heinz

« Reply #220 on: 28 Oct 2007, 09:36:47 am »
Hi Jason,
benchmark.cpp ------>
-----------------------------------------------------------------------------------------------
      switch ( bench_list[idx].token )
         {
         case _FFT:
            #if defined( USE_IPP )
               ippsFFTInv_CToC_32fc(
                  ( Ipp32fc * ) workBuf,   // This is the source data, this is not overwritten
                  ( Ipp32fc * ) out_buf,   // This is some other Buffer destination
                                    // no memcpy required
                  FftSpec,
                  NULL );
            #endif //seti_britta:
            #if defined( USE_FFTWF )
            fftwf_execute_dft( da_fft_plan, (sah_complex *)&in_buf[0], (sah_complex *)&out_buf );
            #endif
--------------------------------------------------------------------------
it compiles well ------>
Copyright (C) Microsoft Corporation.  All rights reserved.
cl /Od /Ob2 /Oi /Ot /Oy /GT /I "../../../boinc/win_build" /I ".." /I "..\.." /I "..\..\..\boinc\lib" /I "../../../boinc/api" /I "../../db" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer" /I "C:\I\INTEL\IPP\5.2_beta\ia32\tools\staticlib" /I "C:\I\INTEL\IPP\5.2_beta\ia32\include" /D "USE_IPP" /D "USE_SSE2" /D "WIN32" /D "_WIN32" /D "_WINDOWS" /D "_CONSOLE" /D "_DEBUG" /D "_LIB" /D "_MT" /D "CLIENT" /D "NBOINC_APP_GRAPHICS" /D "_UNICODE" /D "UNICODE" /D "_VC80_UPGRADE=0x0710" /D "_MBCS" /GF /FD /EHsc /MTd /Zp16 /Gy /Fo"Release32-NOGFX\\" /Fd"Release32-NOGFX\vc90.pdb" /W3 /c /Wp64 /Zi /Gd /TP /FI "win-config.h" ".\benchmark.cpp"
benchmark.cpp
-----IPP-----
-----SSE2-----
-----ipp-----
-----sse2-----
Build log was saved at "file://c:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer\Release32-NOGFX\BuildLog.htm"
Optimizer - 0 error(s), 0 warning(s)
========== Build: 1 succeeded, 0 failed, 0 up-to-date, 0 skipped ==========
tried this as you presented it ---->
            #if !(defined(USE_IPP) | defined(USE_FFTWF))
                  //statements to be used if neither FFTW or IPP
                  //memcopy is here, this should only run for builds with no IPP or FFTW .... or if the following IPP call wasn't updated
                        memcpy( WorkArea, &ChirpedData[CurrentSub], int(fftlen * sizeof(sah_complex)) );
                    #endif
---------------------------------------------------------------------------------------
analyzeFuncs.cpp
-----IPP-----
-----SSE2-----
..\analyzeFuncs.cpp(630) : error C2065: 'WorkArea' : undeclared identifier
..\analyzeFuncs.cpp(642) : error C2065: 'WorkArea' : undeclared identifier
..\analyzeFuncs.cpp(642) : error C2065: 'WorkArea' : undeclared identifier
..\analyzeFuncs.cpp(653) : error C2065: 'WorkArea' : undeclared identifier
--------------------------------------------------------------------------------------------------------
627                     #if defined( USE_IPP )
628                         ippsFFTInv_CToC_32fc(
629                            ( Ipp32fc * ) &ChirpedData[CurrentSub], // to direct source for out of place
630                     ( Ipp32fc * ) WorkArea, // leave as same destination
                     FftSpec[FftNum],
                     FftBuf );

-------------------------------------------------------------------------------------------------------------------------------
hmm.... will use the old statement then it compiles....
heinz

Offline Jason G

« Reply #221 on: 28 Oct 2007, 09:44:44 am »
In Benchmark.cpp, I am worried that will not work in the case we use a zero fill instead of workBuf as the source.

In AnalyzeFuncs, I am looking where/ why your WorkArea has gone  :o
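The zero-fill concern can be made concrete with a toy version of the benchmark loop. Everything here is a hypothetical stand-in: `toy_running_sum` replaces the FFT call (a running sum, safe when source and destination are the same buffer), and `run_once` mirrors only the `pre_test` handling shown in the benchmark.cpp excerpt.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

enum PreTest { zero_out, fill_in };

// Stand-in for the transform; each element is read before it is written.
void toy_running_sum(const float* src, float* dst, std::size_t n) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        acc += src[i];
        dst[i] = acc;
    }
}

// Prepares out_buf as the benchmark does, then transforms either from
// out_buf itself (original call) or directly from work_buf (proposed call).
std::vector<float> run_once(PreTest pre_test, const std::vector<float>& work_buf,
                            bool direct_source) {
    assert(!work_buf.empty());
    const std::size_t bytes = work_buf.size() * sizeof(float);
    std::vector<float> out_buf(work_buf.size(), -1.0f);
    if (pre_test == zero_out) std::memset(out_buf.data(), 0, bytes);
    if (pre_test == fill_in)  std::memcpy(out_buf.data(), work_buf.data(), bytes);
    const float* src = direct_source ? work_buf.data() : out_buf.data();
    toy_running_sum(src, out_buf.data(), out_buf.size());
    return out_buf;
}
```

With `fill_in` the two call styles agree, but with `zero_out` the direct-source call transforms the contents of `work_buf` instead of zeros, so that benchmark case would no longer measure what it was set up to measure. Choosing the source according to `pre_test` is one possible way around it.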

Offline Jason G

« Reply #222 on: 28 Oct 2007, 09:53:54 am »
Are you missing this line above the   memcpy #if block in that one place?

          sah_complex *WorkArea = &WorkData[iC * fftlen / 2];  // assume sah_complex 2 floats
          #if !(defined(USE_IPP) | defined(USE_FFTWF))
                  //statements to be used if neither FFTW or IPP
                   memcpy( WorkArea, &ChirpedData[CurrentSub], int(fftlen * sizeof(sah_complex)) );
          #endif

You have different source to me! ( different line numbers! ) uh oh

[PS: If it is breaking your source to try this then I suggest to stop and reverse :D It is a nice idea that may or may not show any benefit in the long run, but needs more planning, consideration and testing before wholesale code changes are made. Baby steps are better IMO, Besides,  I break enough of my own code  ;)  ]
« Last Edit: 28 Oct 2007, 10:48:32 am by j_groothu »

Offline _heinz

« Reply #223 on: 28 Oct 2007, 11:02:40 am »
Thanks for the comments,
in this way it compiles ---->
------------------------------------------------------------------------------------------------------
                    CurrentSub = fftlen * (ifft + iC);
                /*   sah_complex *WorkArea = &WorkData[iC * fftlen / 2];*/  // assume sah_complex 2 floats
            // seti_britta: do fftlen / 2 out of the loop, where fftlen get its value
               sah_complex *WorkArea = &WorkData[iC * fftlen_half];  // assume sah_complex 2 floats
               #ifndef USE_FFTW      // FFTW & IPP now use out of place transforms
               // memcopy no longer necessary,
                    //    memcpy( WorkArea, &ChirpedData[CurrentSub], int(fftlen * sizeof(sah_complex)) );
                    #endif

                    #if defined( USE_IPP )
                        ippsFFTInv_CToC_32fc(
                            ( Ipp32fc * ) &ChirpedData[CurrentSub], // to direct source for out of place
                     ( Ipp32fc * ) WorkArea, // leave as same destination
                     FftSpec[FftNum],
                     FftBuf );
                    #elif defined( USE_FFTWF )
                        fftwf_execute_dft( analysis_plans[FftNum], &ChirpedData[CurrentSub], WorkArea );
                    #else // replace time with freq - ooura FFT
               // seti_britta: multiplication by 2 hoisted out of the loop, to where fftlen gets its value
                    /*  cdft( fftlen * 2, 1, WorkArea, BitRevTab[FftNum], CoeffTab[FftNum] ); */
                  cdft( fftlen_m2, 1, WorkArea, BitRevTab[FftNum], CoeffTab[FftNum] );
                    #endif
--------------------------------------------------------------------------------------------------------
the yellow line was there....
Yes, very different line numbers: I gave analyzeFuncs.cpp a new structure.
I will have another look at benchmark.cpp and add a trigger for the case where we use a zero fill.
heinz


Offline Jason G

« Reply #224 on: 28 Oct 2007, 11:13:25 am »
Quote
the yellow line was there....
    and WorkArea was still reported as an "undeclared" identifier? That is weird  :o

Quote
yes very different linenumbers, I give the analyzeFuncs.cpp a new structure......
I will have a look at benchmark again.... make a trigger for the case we use a zero fill.
heinz
Ahh that's right, the improved model you showed me ... that's some good stuff mmm.

I'll take a look at the WorkArea part in mine. I think it may be in an inner loop, and it may be the most important place if anything is going to show a change in tests.

Jason

 
