
optimized sources


Jason G:

--- Quote from: _heinz on 27 Nov 2008, 06:09:38 pm ---
--- Quote from: Jason G on 27 Nov 2008, 07:34:50 am ---
Thanks Heinz,
   Could you let me know:
   - Current CPU speed at time of test
   - Cache sizes per package
   - Bus speed

--- End quote ---
CPU speed: 2398 MHz
FSB speed: 400 MHz (quad-pumped), 1600 MHz effective
Cache sizes per package... I must look that up (where can I find it in the source?)
Ah... CPU package... 12 MB

--- End quote ---

Thanks again. It looks like my single-threaded estimates hold up for your parameters. Could you try a comparison run with this bench I compiled (attached)? It's still single-threaded, but it will make sure we have a reference for future numbers.

Same parameter usage: benchf_sse_icc -opatient [same FFT lengths as before]

Jason



[attachment deleted by admin]

_heinz:
fftw-3.1.2 benchf_sse_icc(jason) started
benchf_sse_icc.exe -opatient 8 16 32 64 128 256 512 1024 2048 4096 8192 16384 32768 131072
Problem: 8, setup: 273.78 us, time: 49.65 ns, ``mflops'': 2416.8
Problem: 16, setup: 262.88 us, time: 98.21 ns, ``mflops'': 3258.2
Problem: 32, setup: 7.68 ms, time: 117.86 ns, ``mflops'': 6787.9
Problem: 64, setup: 26.83 ms, time: 222.62 ns, ``mflops'': 8624.6
Problem: 128, setup: 61.58 ms, time: 429.96 ns, ``mflops'': 10420
Problem: 256, setup: 124.30 ms, time: 925.40 ns, ``mflops'': 11066
Problem: 512, setup: 235.98 ms, time: 2.13 us, ``mflops'': 10816
Problem: 1024, setup: 401.79 ms, time: 4.50 us, ``mflops'': 11366
Problem: 2048, setup: 710.67 ms, time: 11.17 us, ``mflops'': 10080
Problem: 4096, setup: 1.39 s, time: 27.94 us, ``mflops'': 8797.1
Problem: 8192, setup: 3.08 s, time: 60.62 us, ``mflops'': 8783.6
Problem: 16384, setup: 6.91 s, time: 134.93 us, ``mflops'': 8499.6
Problem: 32768, setup: 15.86 s, time: 289.70 us, ``mflops'': 8483.2
Problem: 131072, setup: 86.42 s, time: 1.39 ms, ``mflops'': 7988.8
fftw-3.1.2 benchf_sse_icc ended.
----------------------------------------------
... great results   ;D
heinz
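For anyone comparing these numbers: FFTW's bench does not count hardware flops. Its ``mflops'' figure is the nominal 5 N log2(N) / t, with t the time for one transform in microseconds (the convention for complex transforms). A minimal Python sketch, assuming the output format pasted above, that parses a few of the lines and checks the reported figures against that convention. `parse_bench`, `fftw_complex_mflops` and `SAMPLE` are illustrative names, not part of FFTW.

```python
import math
import re

# A few lines copied verbatim from the benchf_sse_icc run above.
SAMPLE = """\
Problem: 8, setup: 273.78 us, time: 49.65 ns, ``mflops'': 2416.8
Problem: 1024, setup: 401.79 ms, time: 4.50 us, ``mflops'': 11366
Problem: 131072, setup: 86.42 s, time: 1.39 ms, ``mflops'': 7988.8
"""

# Factors converting each time unit printed by the bench into microseconds.
TO_US = {"ns": 1e-3, "us": 1.0, "ms": 1e3, "s": 1e6}

LINE_RE = re.compile(
    r"Problem: (\d+), setup: [\d.]+ \w+, time: ([\d.]+) (ns|us|ms|s), ``mflops'': ([\d.]+)"
)


def parse_bench(text):
    """Return (n, time_us, reported_mflops) tuples for each 'Problem:' line."""
    rows = []
    for m in LINE_RE.finditer(text):
        n = int(m.group(1))
        t_us = float(m.group(2)) * TO_US[m.group(3)]
        rows.append((n, t_us, float(m.group(4))))
    return rows


def fftw_complex_mflops(n, t_us):
    """Nominal FFTW figure for a complex transform: 5 n log2(n) flops per t_us microseconds."""
    return 5 * n * math.log2(n) / t_us


for n, t_us, reported in parse_bench(SAMPLE):
    print(f"N={n:>6}  reported={reported:>8.1f}  5N*log2(N)/t={fftw_complex_mflops(n, t_us):>8.1f}")
```

The small mismatches (well under 1%) come from the bench rounding the printed times. The agreement also suggests these are complex transforms, since FFTW's convention for real-data transforms uses half that count, 2.5 N log2(N).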

Jason G:
Huh... now my estimates are way off :o. That puts the cost of a complex multiply-add pair at about 1.5 cycles, and the initial startup latency at about half what it was (now 35 ns). What was your original bench? Non-SSE floats, FFTW 3.1.2? (Before, the cost estimate was 10.5 cycles per mul-add and the startup latency 60 ns.) We must be seeing the effect of SSE instruction-level parallelism, and maybe out-of-order execution hiding some of the latency.
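Jason doesn't spell out his cost model, so the following is only one plausible reading of it: treat one transform as a fixed startup latency t0 plus a cost proportional to N log2(N), fit that line by least squares to the smaller, in-cache sizes (8 to 256) from heinz's run, and convert the slope to cycles at the reported 2398 MHz. The fitted slope is in cycles per N·log2(N) unit, not necessarily per "complex multiply-add pair", so it need not reproduce the 1.5 cycles quoted above. The model, the choice of sizes, and the names `fit_line`, `DATA` and `CPU_HZ` are assumptions for illustration.

```python
import math

CPU_HZ = 2398e6  # CPU clock reported by heinz

# (N, time per transform in ns) for the smaller sizes from the benchf_sse_icc run.
DATA = [(8, 49.65), (16, 98.21), (32, 117.86), (64, 222.62), (128, 429.96), (256, 925.40)]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


# Hypothetical model: t(N) = t0 + k * N*log2(N)
xs = [n * math.log2(n) for n, _ in DATA]
ys = [t for _, t in DATA]
t0_ns, k_ns = fit_line(xs, ys)
cycles_per_unit = k_ns * 1e-9 * CPU_HZ  # ns -> s, then s -> clock cycles

print(f"startup latency t0 ~ {t0_ns:.1f} ns")
print(f"cost per N*log2(N) unit ~ {k_ns:.3f} ns ~ {cycles_per_unit:.2f} cycles")
```

Fitting all sizes instead would let the large, out-of-cache transforms dominate: their cost per N·log2(N) is higher (about 0.62 ns at N = 131072, i.e. 1.39 ms / (131072 × 17)), so the slope would rise and the intercept would say little about startup latency.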

_heinz:

--- Quote from: Jason G on 28 Nov 2008, 04:47:30 am --- What was your original bench? non-sse floats fftw 3.1.2?

--- End quote ---
Configuration: Active(Release float SSE) Platform: Active(Win32)
/I "." /I ".." /I "../libbench2" /I "../api" /I "../kernel" /I "../dft" /I "../rdft" /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "FFTW_SINGLE" /D "BENCHFFT_SINGLE" /D "HAVE_SSE" /D "_VC80_UPGRADE=0x0710" /D "_MBCS" /FD /EHsc /MT /Fp".\bench___Win32_Release_float/bench.pch" /Fo".\bench___Win32_Release_float/" /Fd".\bench___Win32_Release_float/" /W3 /nologo /c /errorReport:prompt
******************************************
/OUT:"..\benchf_sse.exe" /INCREMENTAL:NO /NOLOGO /LIBPATH:"C:\I\SC\fftw-3.1.2\libfftwf_sse.lib" /MANIFEST /MANIFESTFILE:".\bench___Win32_Release_float_SSE\benchf_sse.exe.intermediate.manifest" /PDB:".\bench___Win32_Release_float/benchf.pdb" /SUBSYSTEM:CONSOLE /MACHINE:X86 /ERRORREPORT:PROMPT ..\libfftwf_sse.lib  kernel32.lib

heinz

Jason G:
Ugh... stranger and stranger. Same build (except mine is with ICC). I guess when they say ICC builds aren't much faster, they must mean compared with GCC builds. I don't have my MinGW/GCC setup anymore to try that build, and that one managed to strangle my P4 last year. Maybe I'll have better luck this year with improved hardware.

[In any case, we now have some reference FFT speeds for the Skulltrail, thanks. Next is to come up with something that matches that but can be scaled to parallel more easily.]

Jason
