optimized sources

_heinz:

--- Quote from: j_groothu on 28 Oct 2007, 01:22:56 pm ---
questions I have are:
- Why benchmark an array of zeroes ?
- If a zeroed array needs to be benched, why not test it 'fully' out of place (separate src/dest buffer like below)?

--- End quote ---
hmm... maybe Alex Kan or Joe has a good answer

Josef W. Segur:

--- Quote from: seti_britta on 28 Oct 2007, 02:02:56 pm ---
--- Quote from: j_groothu on 28 Oct 2007, 01:22:56 pm ---
questions I have are:
- Why benchmark an array of zeroes ?
- If a zeroed array needs to be benched, why not test it 'fully' out of place (separate src/dest buffer like below)?

--- End quote ---
hmm... maybe Alex Kan or Joe has a good answer
--- End quote ---

The 2.2B benchmark.cpp source doesn't set pre_test to zero_out anywhere. Setting pre_test = fill_in makes sense for the in-place transform, so that it always works on the same random data; that's not needed for out of place. But the FFT benchmark is timing only, and wasted time at that except in standalone runs with -bench or -verbose, since it is not used to choose a "best" variant. The lunatics.at 2.4 builds don't run the FFT benchmark test, though Crunch3r's 2.4V builds, which use IPP FFTs, do.

I don't know why Ben Herndon used the out of place form of parameters in the ippsFFTInv_CToC_32fc() calls, but he may have checked the actual code produced and determined that was slightly more efficient.
Joe

Jason G:

--- Quote from: Josef W. Segur on 29 Oct 2007, 10:39:15 am ---I don't know why Ben Herndon used the out of place form of parameters in the ippsFFTInv_CToC_32fc() calls, but he may have checked the actual code produced and determined that was slightly more efficient.
Joe

--- End quote ---
I wracked my brain about this, and ultimately came to a similar (though more convoluted and speculative) conclusion. It would make sense to me if an explicit out-of-place call could make better use of the prefetch, cache and parallelism mechanisms we have discussed in a different context. An explicit in-place call could not (so far as I can see for now) because of read/write dependencies.

After considering that, another possibility presented itself: for the same reasons, the memory copy followed by the out-of-place form call (with in-place parameters) as originally presented may simply be faster than the 'true out of place' way we're playing with ::). If so, I suspect a 'cache doubling effect' from using the same source and destination.

The flip side is that if that effect can be verified, it might even indicate that these particular calls are not using streaming writes to begin with... possibly bringing your hybridised codelet phased processing screaming to a new sense of urgency.

More speculation than hard data at the moment; I'll think about some small, simple external tests for a while and stew on it for a couple of weeks ;)

Jason

_heinz:

--- Quote from: j_groothu on 28 Oct 2007, 01:22:56 pm ---Aha, I see... now that IPP call is "in place", you can do this:

...
if(pre_test == zero_out)
{
ippsFFTInv_CToC_32fc(
// ( Ipp32fc * ) out_buf, // Commented out this to make it inplace
( Ipp32fc * ) out_buf, // This is both source and destination
FftSpec,
NULL );
}

--- End quote ---
If we do this, we get an error message:
.\benchmark.cpp(634) : error C2660: 'w7_ippsFFTInv_CToC_32fc' : function does not take 3 arguments
So I left it as it is:
            if(pre_test == zero_out)
            {
               ippsFFTInv_CToC_32fc(
                  ( Ipp32fc * ) out_buf,
                  ( Ipp32fc * ) out_buf,
                  FftSpec,
                  NULL );
            }
--------------------------------------------
so it compiles
heinz

Jason G:

--- Quote from: seti_britta on 01 Nov 2007, 05:13:26 pm ---so it compiles
heinz

--- End quote ---

Yes, as we have discovered before, I must need my eyes checked ;D And it would make sense, if it were ever used in the zero-fill context, to leave it using the same form as might occur in a real analysis anyway.

For the sake of information, here is the form of the out-of-place inverse FFT (as it exists):
IppStatus ippsFFTInv_CToC_32fc(
const Ipp32fc* pSrc,
Ipp32fc* pDst,
const IppsFFTSpec_C_32fc* pFFTSpec,
Ipp8u* pBuffer);

And here is the in-place form:
IppStatus ippsFFTInv_CToC_32fc_I(
Ipp32fc* pSrcDst,
const IppsFFTSpec_C_32fc* pFFTSpec,
Ipp8u* pBuffer);

I am currently learning a lot about what is connected to what by trying to separate out the benchmark (for exploratory purposes). Piece by piece, it connects to almost the whole codebase. There are still a few external references to track down, but I may end up with a stripped-down custom testbed for examining how different algorithms, libraries and optimised functions behave.

The main reason for this unnecessary but educational exploration is that I may want to see the actual differences between the FFT libraries, different compilers and flags, without touching my main copy of the code anymore. I am also interested to see how close to ideal the forward and inverse transforms are when a 'Maximum Length Sequence' is applied as input, rather than zeroes or random data (I hope I'll get a constant power spectrum, with no spikes etc... We'll see :D )

Jason
