optimized sources
Jason G:
This block compiles on mine (for comparison; I can see no major functional difference from yours :D):
----------
CurrentSub = fftlen * (ifft + iC);
sah_complex *WorkArea = &WorkData[iC * fftlen / 2]; // assume sah_complex 2 floats
#if !(defined(USE_IPP) || defined(USE_FFTWF)) // memcpy is compiled out when IPP or FFTW is used
memcpy( WorkArea, &ChirpedData[CurrentSub], int(fftlen * sizeof(sah_complex)) );
#endif
#if defined( USE_IPP )
ippsFFTInv_CToC_32fc(
( Ipp32fc * ) &ChirpedData[CurrentSub], // Source
( Ipp32fc * ) WorkArea, //Destination
FftSpec[FftNum],
FftBuf );
#elif defined( USE_FFTWF )
fftwf_execute_dft( analysis_plans[FftNum], &ChirpedData[CurrentSub], WorkArea );
#else // replace time with freq - ooura FFT
cdft( fftlen * 2, 1, WorkArea, BitRevTab[FftNum], CoeffTab[FftNum] );
#endif
----------
I did notice it went haywire if I missed out a ( Ipp32fc * ) typecast.
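The ( Ipp32fc * ) casts above only make sense if sah_complex and Ipp32fc describe the same bytes: two consecutive 32-bit floats, real part first. A minimal sketch of how that assumption could be checked at compile time follows. The demo_ types are hypothetical stand-ins (the real definitions live in the project and IPP headers, which are not reproduced here), and load_as_ipp is an illustrative helper, not part of either code base.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical stand-ins for the real types (assumptions, not the actual headers).
typedef float demo_sah_complex[2];            // assumed: two packed floats
struct demo_Ipp32fc { float re; float im; };  // assumed: { re, im } as 32-bit floats

// If any of these fail, casting one pointer type to the other is not safe.
static_assert(sizeof(demo_sah_complex) == 2 * sizeof(float),
              "complex sample must be exactly two floats");
static_assert(sizeof(demo_Ipp32fc) == sizeof(demo_sah_complex),
              "both complex types must have the same size");
static_assert(std::is_standard_layout<demo_Ipp32fc>::value,
              "struct view must have a predictable layout");

// Copies element i out of a demo_sah_complex array into the struct form.
// memcpy is used so the read is well defined regardless of aliasing rules.
demo_Ipp32fc load_as_ipp(demo_sah_complex *data, std::size_t i) {
    demo_Ipp32fc out;
    std::memcpy(&out, &data[i], sizeof out);
    return out;
}
```

With data holding {1, 2} and {3, 4}, load_as_ipp(data, 1) gives re = 3 and im = 4, which is the element the cast pointer would also address.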
_heinz:
Yes, it compiles on mine too --->
analyzeFuncs.cpp
-----IPP-----
-----SSE2-----
Build log was saved at "file://c:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\BuildLog.htm"
seti_boinc - 0 error(s), 0 warning(s)
----------------------------------------------------------------------------------------
heinz
Jason G:
Ahh good one ;D, I'm thinking that this
new way:
--- Using no memcpy
--- Using the IPP function as intended
is better than the old way:
--- Using a memcpy (even an optimised one, which I was looking at)
--- Using the IPP function in a weird way
Of course, only a test can show whether this makes any speed difference. It will be a while before I can look at a rebuild, as I have more schoolwork and have to give some tutoring this week. Even if it is slower I don't mind, because it has still helped me understand a little more of the code. The next step for me after testing this would probably be to look at Joe's even better suggestions. There are many now!
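To make the old way / new way comparison concrete, here is a small sketch that does not use IPP at all. A naive O(n^2) inverse DFT stands in for ippsFFTInv_CToC_32fc, and every name in it (demo_cpx, transform_old_way, transform_new_way) is made up for illustration. It only shows that copying and then transforming in place yields the same output as transforming straight from source to destination; it says nothing about speed.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstring>
#include <vector>

struct demo_cpx { float re; float im; };  // hypothetical two-float complex sample

// Naive, unnormalised inverse DFT. Reads src, writes dst; src is left untouched.
// src and dst must not overlap.
void inverse_dft_out_of_place(const demo_cpx *src, demo_cpx *dst, std::size_t n) {
    const double two_pi = 6.283185307179586;
    for (std::size_t k = 0; k < n; ++k) {
        double re = 0.0, im = 0.0;
        for (std::size_t j = 0; j < n; ++j) {
            const double angle = two_pi * double(j) * double(k) / double(n);
            const double c = std::cos(angle), s = std::sin(angle);
            // (a + bi)(c + si) = (ac - bs) + (as + bc)i
            re += src[j].re * c - src[j].im * s;
            im += src[j].re * s + src[j].im * c;
        }
        dst[k].re = float(re);
        dst[k].im = float(im);
    }
}

// Stand-in for an in-place transform: the buffer is both input and output.
void inverse_dft_in_place(demo_cpx *buf, std::size_t n) {
    std::vector<demo_cpx> scratch(buf, buf + n);  // private copy of the input
    inverse_dft_out_of_place(scratch.data(), buf, n);
}

// "Old way": memcpy the source into the work area, then transform in place.
void transform_old_way(const demo_cpx *src, demo_cpx *work, std::size_t n) {
    std::memcpy(work, src, n * sizeof(demo_cpx));
    inverse_dft_in_place(work, n);
}

// "New way": transform directly from the source into the work area, no memcpy.
void transform_new_way(const demo_cpx *src, demo_cpx *work, std::size_t n) {
    inverse_dft_out_of_place(src, work, n);
}
```

Calling both functions on the same source and comparing the two work areas element by element should show identical values, with the source left unmodified in both cases.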
Thanks for trying this and keep plugging away !
Back later in the week!
Jason
_heinz:
changed benchmark.cpp ----->
--------------------------------------------------------------------------------------------------------
for(loops = 0; loops < 25 && (end_cyc-total_run)< MAX_CYCLES; loops++)
{
if(pre_test == zero_out) memset( out_buf, 0, test_size );
if(pre_test == fill_in) memcpy( out_buf, workBuf, test_size );
ramming_speed();
cycles = cycleCount();
switch ( bench_list[idx].token )
{
case _FFT:
#if defined( USE_IPP )
if(pre_test == zero_out)
{
ippsFFTInv_CToC_32fc(
( Ipp32fc * ) out_buf,
( Ipp32fc * ) out_buf,
FftSpec,
NULL );
}
else
{
ippsFFTInv_CToC_32fc(
( Ipp32fc * ) workBuf, // This is the source data, this is not overwritten
( Ipp32fc * ) out_buf, // This is some other Buffer destination
// no memcpy required
FftSpec,
NULL );
}
#endif //seti_britta:
#if defined( USE_FFTWF )
fftwf_execute_dft( da_fft_plan, (sah_complex *)&in_buf[0], (sah_complex *)&out_buf[0] );
#endif
break;
-----------------------------------------------------------------------------------------------------------------------------
it compiles well --->
benchmark.cpp
-----IPP-----
-----SSE2-----
-----ipp-----
-----sse2-----
Build log was saved at "file://c:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer\Release32-NOGFX\BuildLog.htm"
Optimizer - 0 error(s), 0 warning(s)
-------------------------------------------------------------------------------------------------------------------------------
Will try this and see if it works well...
see you again here
regards heinz
Jason G:
Ahah, I see... now that the IPP call is "in place", you can do this:
...
if(pre_test == zero_out)
{
ippsFFTInv_CToC_32fc_I( // the _I suffix selects IPP's in-place variant, which takes a single data pointer
// ( Ipp32fc * ) out_buf, // Commented out this to make it inplace
( Ipp32fc * ) out_buf, // This is both source and destination
FftSpec,
NULL );
}
...
Whether it makes any difference is another question :D
The questions I have are:
- Why benchmark an array of zeroes?
- If a zeroed array needs to be benchmarked, why not test it 'fully' out of place (separate src/dest buffers, like below)?
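One way to compare a zeroed input against non-trivial data, both fully out of place, is sketched here. It uses std::chrono::steady_clock rather than the cycleCount() helper from benchmark.cpp, and the transform is passed in as a callable, so nothing in it depends on IPP or FFTW. All names are hypothetical, and the sketch makes no claim about which case runs faster.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

struct demo_cpx { float re; float im; };  // hypothetical two-float complex sample

// Runs transform(src, dst, n) `loops` times with separate source and
// destination buffers and returns the total elapsed time in seconds.
template <typename Transform>
double time_out_of_place(Transform transform,
                         const std::vector<demo_cpx> &src,
                         std::vector<demo_cpx> &dst,
                         int loops) {
    assert(src.size() == dst.size());
    const std::chrono::steady_clock::time_point start =
        std::chrono::steady_clock::now();
    for (int i = 0; i < loops; ++i) {
        transform(src.data(), dst.data(), src.size());
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}

// Input for the "zero_out" style test: every sample is 0 + 0i.
std::vector<demo_cpx> make_zeroed(std::size_t n) {
    return std::vector<demo_cpx>(n, demo_cpx{0.0f, 0.0f});
}

// Input for the "fill_in" style test: a simple ramp, purely illustrative data.
std::vector<demo_cpx> make_ramp(std::size_t n) {
    std::vector<demo_cpx> v(n);
    for (std::size_t i = 0; i < n; ++i) {
        v[i].re = float(i);
        v[i].im = -float(i);
    }
    return v;
}
```

Timing the same transform once with make_zeroed(n) and once with make_ramp(n), each into its own destination buffer, keeps the two cases on equal footing so that any difference comes from the data rather than from the buffer arrangement.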