
optimized sources


Jason G:
Yeah, I still have the clunky seti_analyze function :D , but that's the same code. I think 3 or 4 places have that arrangement (I haven't counted properly yet). Here's what I'm suggesting might work as a crude test for those who are IPP inclined (it might break non-IPP / FFTW versions, but it's good enough to test):


--- Quote ---    ...
                #ifndef USE_FFTW        // FFTW now uses out of place transforms.
               // Commenting out the memcpy()
              //    memcpy( WorkData, &ChirpedData[CurrentSub], int(fftlen * sizeof(sah_complex)) );
                #endif
     ...
          // Now fix the source for out of place IPP call properly
                #if defined( USE_IPP )
                    ippsFFTInv_CToC_32fc(
          //             ( Ipp32fc * ) WorkData, // changing from this source
                      ( Ipp32fc * ) &ChirpedData[CurrentSub], // to direct source for out of place
                        ( Ipp32fc * ) WorkData, // leave as same destination
                        FftSpec[FftNum],
                        FftBuf );   

--- End quote ---

Maybe there is a trick to the original code that I don't know or understand. If so, then 8), but I can't see it. Maybe in the next couple of weeks I can see what happens.

Jason

[Maybe this will be nicer on Non IPP/FFTW builds
changing this:
  #ifndef USE_FFTW        // FFTW now uses out of place transforms.
to something like this:
#ifndef USE_FFTW  || USE_IPP     // FFTW & IPP now use out of place transforms.
]
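To illustrate why the memcpy() becomes redundant once the transform reads from the source and writes to a separate destination, here is a minimal sketch. It does not use IPP or FFTW: `inverse_dft_out_of_place` and `inverse_dft_in_place` are hypothetical stand-ins built on a naive DFT, and the buffer names only mirror ChirpedData and WorkData from the quote. It assumes nothing about how the real libraries behave internally.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cfloat = std::complex<float>;

// Hypothetical stand-in for an out-of-place inverse FFT (NOT the IPP API).
// Reads n points from src, writes n points to dst, and never modifies src.
// A naive O(n^2) unnormalized inverse DFT is enough for a demonstration.
void inverse_dft_out_of_place(const cfloat* src, cfloat* dst, std::size_t n) {
    assert(src != dst);  // this naive version cannot alias its buffers
    const float two_pi = 6.2831853f;
    for (std::size_t k = 0; k < n; ++k) {
        cfloat sum(0.0f, 0.0f);
        for (std::size_t t = 0; t < n; ++t) {
            const float angle =
                two_pi * static_cast<float>((k * t) % n) / static_cast<float>(n);
            sum += src[t] * cfloat(std::cos(angle), std::sin(angle));
        }
        dst[k] = sum;
    }
}

// Hypothetical stand-in for an in-place transform: the input in buf is overwritten.
void inverse_dft_in_place(cfloat* buf, std::size_t n) {
    const std::vector<cfloat> tmp(buf, buf + n);
    inverse_dft_out_of_place(tmp.data(), buf, n);
}

// Old arrangement: stage the input into the work buffer, then transform it there.
void transform_with_copy(const std::vector<cfloat>& chirped, std::vector<cfloat>& work) {
    std::copy(chirped.begin(), chirped.end(), work.begin());
    inverse_dft_in_place(work.data(), work.size());
}

// Proposed arrangement: read straight from the source, write to the work buffer.
void transform_direct(const std::vector<cfloat>& chirped, std::vector<cfloat>& work) {
    inverse_dft_out_of_place(chirped.data(), work.data(), work.size());
}

// Largest absolute difference between two equally sized buffers.
float max_abs_diff(const std::vector<cfloat>& a, const std::vector<cfloat>& b) {
    float worst = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i)
        worst = std::max(worst, std::abs(a[i] - b[i]));
    return worst;
}
```

Both arrangements leave the same result in the work buffer and neither touches the source, so the only thing the copy adds is memory traffic. Whether that holds for every call site in the real code is exactly what the crude test is meant to find out.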

Jason G:

--- Quote from: Josef W. Segur on 26 Oct 2007, 08:55:01 pm ---... The reversibility of a complete FFT is only needed during baseline smoothing.
                                                      Joe

--- End quote ---

Now there's a possible can of worms. Fresh from the IPP tutorials, I was thinking in terms of thresholding denormal data etc. to speed up IPP. I figured the destructiveness might be a problem, so I left it there until I get a better handle on things.
But seeing as different architectures are returning similar (enough) results despite, as I understand it:
            - no or limited thresholding in place for very small/big numbers in the data (could be wrong there)
            - known architecture dependence in calculation for such boundary data (randomness)
            - significant penalties in SSE for arithmetic with these numbers (these do show in VTune profiles)
            - reduced IPP performance with this denormal data
I might have to look again.
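As a rough idea of what thresholding denormal data could look like in software, here is a minimal sketch. The helper name `flush_denormals` is hypothetical and is not part of IPP or the SETI sources; it only replaces subnormal floats with zero before the data reaches the arithmetic-heavy code. It is destructive by design, which is the concern raised above.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>

// Replace every subnormal (denormal) float with zero and return how many
// values were flushed. A value is subnormal when it is non-zero but smaller
// in magnitude than the smallest normal float.
std::size_t flush_denormals(float* data, std::size_t n) {
    const float smallest_normal = std::numeric_limits<float>::min();
    std::size_t flushed = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const float magnitude = std::fabs(data[i]);
        if (magnitude != 0.0f && magnitude < smallest_normal) {
            data[i] = 0.0f;  // destructive: the original tiny value is lost
            ++flushed;
        }
    }
    return flushed;
}
```

Counting the flushed values gives a cheap way to check how much denormal data a work unit actually contains before deciding whether the thresholding is worth the change in results.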

_heinz:
Hi Jason,
#ifndef USE_FFTW  || USE_IPP     // FFTW & IPP now use out of place transforms
-------------------------------------------------------------------------------------------------------------------
gives warnings when I compile --->
------ Build started: Project: seti_boinc, Configuration: Release32-NOGFX Win32 ------
Compiling...
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 15.00.20404 for 80x86
Copyright (C) Microsoft Corporation.  All rights reserved.
cl /Od /Ob2 /Oi /Ot /Oy /GT /I "." /I "../../../boinc/api" /I "../../../boinc/client/win" /I "../../../boinc/lib" /I ".." /I "glut" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\db" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\glut" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\jpeglib" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\image_libs" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX" /I "C:\I\SC\vs90\boinc" /I "C:\I\SC\vs90\boinc\api" /I "C:\I\SC\vs90\boinc\client\win" /I "C:\I\SC\vs90\boinc\lib" /D "WIN32" /D "_WIN32" /D "_WINDOWS" /D "NBOINC_APP_GRAPHICS" /D "CLIENT" /D "_MT" /D "USE_IPP" /D "USE_SSE2" /D "_DEBUG" /D "_VC80_UPGRADE=0x0600" /D "_MBCS" /GF /Gm /EHsc /MTd /Zp16 /Gy /Fp".\Release/seti_boinc.pch" /Fo".\Release32-NOGFX\\" /Fd".\Release32-NOGFX\vc90.pdb" /FR".\Release32-NOGFX\\" /W3 /c /Wp64 /Zi /TP "..\analyzeFuncs.cpp"
analyzeFuncs.cpp
-----IPP-----
-----SSE2-----
..\analyzeFuncs.cpp(694) : warning C4067: unexpected tokens following preprocessor directive - expected a newline
..\analyzeFuncs.cpp(1187) : warning C4067: unexpected tokens following preprocessor directive - expected a newline
Build log was saved at "file://c:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\BuildLog.htm"
seti_boinc - 0 error(s), 2 warning(s)
========== Build: 1 succeeded, 0 failed, 0 up-to-date, 0 skipped ==========
so I will use  ---->
           #ifndef USE_FFTWF   // FFTW & IPP now use out of place transforms
      // memcopy no longer necessary
        //    memcpy( DataOutChunk, DataInChunk, int(NumPointsInChunk * sizeof(sah_complex)) );
        #endif
-----------------------------------------------------------------------------------------------------------
This block is now empty, so we could delete it, but I left it in for documentation.
It now compiles without any warnings --->
analyzeFuncs.cpp
-----IPP-----
-----SSE2-----
Build log was saved at "file://c:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\BuildLog.htm"
seti_boinc - 0 error(s), 0 warning(s)
========== Build: 1 succeeded, 0 failed, 0 up-to-date, 0 skipped ==========
-----------------------------------------------------------------------------------------------------------
There are exactly 3 places in analyzeFuncs.cpp.
The first is in do_transpose:
// ----------------------------------------------------------------------------
//   Function:   do_transpose()
//   Type     :   void
//   Content  :   do transpose
//   parameter:   none
//   last update:19.03.2007   by:seti_britta      
// ----------------------------------------------------------------------------
// Part 4.2.1 do transpose, use strips of 4
// ----------------------------------------------------------------------------
void do_transpose()
{
   extern int have_transpose;
   for ( ifft = 0; ifft < NumFfts - 3; ifft += 4 )
      {
            // do transpose
             for ( int iC = 0; iC < 4; iC++ )
                 {
                    CurrentSub = fftlen * (ifft + iC);
                /*   sah_complex *WorkArea = &WorkData[iC * fftlen / 2];*/  // assume sah_complex 2 floats
            // seti_britta: compute fftlen / 2 outside the loop, where fftlen gets its value
               sah_complex *WorkArea = &WorkData[iC * fftlen_half];  // assume sah_complex 2 floats
               #ifndef USE_FFTW      // FFTW & IPP now use out of place transforms
               // memcopy no longer necessary,
                    //    memcpy( WorkArea, &ChirpedData[CurrentSub], int(fftlen * sizeof(sah_complex)) );
                    #endif

                    #if defined( USE_IPP )
                        ippsFFTInv_CToC_32fc(
                            ( Ipp32fc * ) &ChirpedData[CurrentSub], // to direct source for out of place
                     ( Ipp32fc * ) WorkArea, // leave as same destination
                     FftSpec[FftNum],
                     FftBuf );
                    #elif defined( USE_FFTWF )
                        fftwf_execute_dft( analysis_plans[FftNum], &ChirpedData[CurrentSub], WorkArea );
                    #else // replace time with freq - ooura FFT
               // seti_britta: take the multiplication by 2 out of the loop, to where fftlen gets its value
                    /*  cdft( fftlen * 2, 1, WorkArea, BitRevTab[FftNum], CoeffTab[FftNum] ); */
                  cdft( fftlen_m2, 1, WorkArea, BitRevTab[FftNum], CoeffTab[FftNum] );
                    #endif

                    // replace freq with power
                    PwrSpectrumOnly( WorkArea, (float *)WorkArea, fftlen );
               // seti_britta: move the calculation of flops out of the loop, to where fftlen gets its value
                    // count_flops( 4 * fftlen + 5 * fftlen * log( double(fftlen) ) / log(2.0 ) );
               count_flops(flops_form1);   // setibritta: new statement

                    // any ETIs ?!
                    // If PoT freq bin is non-negative, we are into PoT analysis
                    // for this cfft pair and need not redo spike finding.
                    if ( analysis_state.PoT_freq_bin == -1 )
                        {
                        count_flops( fftlen );
                        retval = FindSpikes( (float *)WorkArea, fftlen, ifft + iC, swi );
                        progress += SpikeProgressUnits( fftlen ) * ProgressUnitSize / NumFfts;
                        if ( retval ) SETIERROR( retval, "from FindSpikes" );
                        }

                    // progress = ((float)icfft)/num_cfft + ((float)ifft)/(NumFfts*num_cfft);
                    progress = std::min( progress, 1.0 );
                    #ifdef BOINC_APP_GRAPHICS
                        if ( !nographics() )
                            {
                            if ( gbp ) gbp->rarray.add_source_row( (float *)WorkData );
                            sah_graphics->local_progress = ( (( float ) ifft + 1) / NumFfts );
                            }
                    #endif
                    remaining = 1.0 - ( double ) ( icfft + 1 ) / num_cfft;
                    fraction_done( progress, remaining );

                 } // end ic < 4
                TransposeStrip(fftlen, NumFfts, ifft, (float *)WorkData, PowerSpectrum);
        } // end for ifft < NumFfts - 3
      // transpose done
      have_transpose = true;   // seti_britta: tell process_data that transpose is done
}   // end of do_transpose
-------------------------------------------------------------------------------------------------------------------
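The flops calculation moved out of the loop in do_transpose can be sketched on its own. This is only an illustration under assumptions: `fft_flops` and `total_flops` are hypothetical names, and the running total merely stands in for whatever count_flops() does with the value. The formula itself is the one from the commented-out count_flops line.

```cpp
#include <cmath>

// Flop estimate taken from the commented-out line:
//   4 * fftlen + 5 * fftlen * log( double(fftlen) ) / log( 2.0 )
// std::log2 is used here in place of the log/log(2.0) quotient.
double fft_flops(int fftlen) {
    return 4.0 * fftlen + 5.0 * fftlen * std::log2(static_cast<double>(fftlen));
}

// Hypothetical driver: the estimate depends only on fftlen, so it is
// computed once and reused for every FFT instead of once per iteration.
double total_flops(int fftlen, int num_ffts) {
    const double flops_per_fft = fft_flops(fftlen);  // hoisted out of the loop
    double total = 0.0;
    for (int ifft = 0; ifft < num_ffts; ++ifft) {
        total += flops_per_fft;  // stands in for count_flops(flops_form1)
    }
    return total;
}
```

For fftlen = 8 the estimate is 4 * 8 + 5 * 8 * 3 = 152, so four FFTs of that length add up to 608. The same idea applies to fftlen_half and fftlen_m2: anything that depends only on fftlen can be set where fftlen gets its value.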
The second is in process_data
The third is in v_BaseLineSmooth
        DataInChunk = &( DataIn[TimeChunk * NumPointsInChunk] );
        #ifndef USE_FFTWF   // FFTW & IPP now use out of place transforms
      // memcopy no longer necessary
        //    memcpy( DataOutChunk, DataInChunk, int(NumPointsInChunk * sizeof(sah_complex)) );
        #endif

        // transform to freq
        #ifdef USE_IPP
            ippsFFTInv_CToC_32fc(
                // ( Ipp32fc * ) DataOutChunk,
            ( Ipp32fc * ) DataInChunk,  // to direct source for out of place
                ( Ipp32fc * ) DataOutChunk, // leave as same destination
                FftSpec,
                NULL );
--------------------------------------------------------------------------------------------------------------------------
A fourth is in benchmark.cpp, but I have no idea whether there is anything to do there.
Line 618 onward:
   for(loops = 0; loops < 25 && (end_cyc-total_run)< MAX_CYCLES; loops++)
      {
      if(pre_test == zero_out)   memset( out_buf, 0, test_size );
      if(pre_test == fill_in)      memcpy( out_buf, workBuf, test_size );
      ramming_speed();
      cycles = cycleCount();
      switch ( bench_list[idx].token )
         {
         case _FFT:
            #if defined( USE_IPP )
               ippsFFTInv_CToC_32fc(
                  ( Ipp32fc * ) out_buf,
                  ( Ipp32fc * ) out_buf,
                  FftSpec,
                  NULL );
            #endif //seti_britta:
            #if defined( USE_FFTWF )
            fftwf_execute_dft( da_fft_plan, (sah_complex *)&in_buf[0], (sah_complex *)&out_buf );
            #endif
            break;
         case SUM2_TBL:
------------------------------------------------------------------------------------------
What do you think about this in benchmark.cpp ?
regards heinz

Jason G:
Yes, that will be broken. I'll look up the correct preprocessor format and post it here as soon as my machine stops choking on the MinGW FFTW build (need a bigger machine :( ).

[Later:  Here is the better preprocessor format, and it is FFTWF, not FFTW. I think I need spectacles, a faster computer, more practice with preprocessor directives, and maybe a beer! :D ]

#if !(defined(USE_IPP) || defined(USE_FFTWF))
   // statements to be used if neither FFTW nor IPP
   // memcpy goes here; this should only run for builds with no IPP or FFTW .... or if the following IPP call wasn't updated
        memcpy(.......,.......)
#endif
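The reason the earlier form triggered warning C4067 is that #ifndef accepts exactly one identifier, so everything after USE_FFTW was ignored as extra tokens; testing several macros needs #if with defined(). Below is a small self-contained sketch of the combined test. The names `kNeedsStagingCopy` and `stage_input` are hypothetical and only show how the guard switches the copy on and off; build it with neither macro, or with USE_IPP and/or USE_FFTWF defined, to see both branches.

```cpp
#include <cstddef>
#include <cstring>

#if !(defined(USE_IPP) || defined(USE_FFTWF))
// Neither out-of-place library is selected: the fallback path transforms
// in place, so the input has to be staged into the work buffer first.
constexpr bool kNeedsStagingCopy = true;
#else
// IPP or FFTWF reads from the source directly, so no staging copy is needed.
constexpr bool kNeedsStagingCopy = false;
#endif

// Hypothetical helper: copies src into work only when the build requires it.
// Returns true if the copy was performed.
bool stage_input(float* work, const float* src, std::size_t count) {
#if !(defined(USE_IPP) || defined(USE_FFTWF))
    std::memcpy(work, src, count * sizeof(float));
    return true;
#else
    (void)work;
    (void)src;
    (void)count;
    return false;
#endif
}
```

Keeping the condition in one place like this also avoids the situation where one call site is guarded by USE_FFTW and another by USE_FFTWF.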

Jason G:

--- Quote from: seti_britta on 28 Oct 2007, 07:33:10 am ---...
What do you think about this in benchmark.cpp ?
regards heinz

--- End quote ---

Looking at the places you show now   :)

[Later:  In benchmark.cpp it is also an out of place IPP inverse FFT call, but with the same source and destination parameters.
  ..NOTE:  I am still wondering whether they did it like that on purpose, but I still don't know why]

If it is meant to be an in place FFT it should be:
              ippsFFTInv_CToC_32fc_I(   // the _I suffix selects IPP's in-place variant
                  ( Ipp32fc * ) out_buf, // This is both source and destination, no second buffer argument needed
                                                  // usually needs to come from a memcpy first to not corrupt source data.
                                                 // benchmark.cpp is a special case because it might be zeros or a filled buffer
                  FftSpec,
                  NULL );

If it is meant to be out of place it should be: [I think you cannot use this for benchmark.cpp, since the benchmark might use zero fill]
              ippsFFTInv_CToC_32fc(
                  ( Ipp32fc * ) workBuf, // This is the source data, this is not overwritten
                  ( Ipp32fc * ) out_buf, // This is some other Buffer destination
                                                 // no memcpy required
                  FftSpec,
                  NULL );
