optimized sources

_heinz:

--- Quote from: Jason G on 26 Nov 2008, 11:44:44 am ---@Heinz: Do you happen to have any single- and multithreaded FFT processing times benchmarked on your Skulltrail? Times for 1, 2, 4 and 8 threads would be nice for 32k and/or 128k elements, if you have them.

I'm trying to verify/refine some efficiency calculations & have no reference but my dual core.

Jason


--- End quote ---
I compiled the FFTW project (single-threaded) as a 32-bit build with these compiler options:
 /I "." /I ".." /I "../libbench2" /I "../api" /I "../kernel" /I "../dft" /I "../rdft" /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "FFTW_SINGLE" /D "BENCHFFT_SINGLE" /D "HAVE_SSE" /D "_VC80_UPGRADE=0x0710" /D "_MBCS" /FD /EHsc /MT /Fp".\bench___Win32_Release_float/bench.pch" /Fo".\bench___Win32_Release_float/" /Fd".\bench___Win32_Release_float/" /W3 /nologo /c /errorReport:prompt

Results:
C:\Windows\system32>echo off
fftw-3.1.2 benchfsse(VS2005) started
benchf_sse.exe -opatient 8 16 32 64 128 256 512 1024 2048 4096 8192 16384 32768 131072
Problem: 8, setup: 300.32 us, time: 169.69 ns, ``mflops'': 707.16
Problem: 16, setup: 288.86 us, time: 332.84 ns, ``mflops'': 961.43
Problem: 32, setup: 7.91 ms, time: 726.79 ns, ``mflops'': 1100.7
Problem: 64, setup: 27.46 ms, time: 1.67 us, ``mflops'': 1148.4
Problem: 128, setup: 62.98 ms, time: 4.19 us, ``mflops'': 1069.1
Problem: 256, setup: 137.48 ms, time: 9.18 us, ``mflops'': 1115
Problem: 512, setup: 267.80 ms, time: 20.95 us, ``mflops'': 1099.6
Problem: 1024, setup: 575.47 ms, time: 46.10 us, ``mflops'': 1110.7
Problem: 2048, setup: 1.37 s, time: 99.17 us, ``mflops'': 1135.8
Problem: 4096, setup: 3.42 s, time: 220.42 us, ``mflops'': 1115
Problem: 8192, setup: 8.83 s, time: 530.79 us, ``mflops'': 1003.2
Problem: 16384, setup: 21.99 s, time: 1.13 ms, ``mflops'': 1014.9
Problem: 32768, setup: 53.80 s, time: 2.41 ms, ``mflops'': 1020
Problem: 131072, setup: 369.12 s, time: 9.89 ms, ``mflops'': 1126
fftw-3.1.2 benchfsse ended.
Press any key to continue . . .
----------------------------------------------------------------------------------------------------
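As a cross-check on the log above: the ``mflops'' column is consistent with the nominal figure 5 * N * log2(N) divided by the run time in microseconds, which is the usual convention for complex transforms in the FFTW benchmark. A minimal Python sketch, where the function name `nominal_mflops` is made up for illustration and the times are copied from the log:

```python
import math

def nominal_mflops(n, time_us):
    """Nominal MFLOPS for a complex FFT of size n that took time_us microseconds."""
    return 5.0 * n * math.log2(n) / time_us

# (problem size, time in microseconds) copied from the benchmark log
samples = [(8, 0.16969), (1024, 46.10), (32768, 2410.0), (131072, 9890.0)]

for n, t in samples:
    print(f"{n:>7}: {nominal_mflops(n, t):8.1f} mflops")
```

The computed values agree with the logged ones to within the rounding of the printed times, so this is a nominal operation count scaled by time, not a count of the floating-point operations actually executed.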
For the threaded variants I first need to read the documentation again.
Is this what you meant? If you want other compiler options, let me know.
Once I have installed the Intel® Parallel Composer Beta, I will recompile the project.

regards heinz

Jason G:
Thanks Heinz,
   Could you let me know:
   - Current CPU speed at time of test
   - Cache sizes per package
   - Bus speed

My single-core estimates are so far within about 10% of your numbers, but they don't account for the cache and bus overheads on large problems, so for the moment I fold those into the instruction cost.

For multithreaded FFTW (eventually), I think it would require a different package they have (an alpha?). In any case, the purpose is to refine my textbook efficiency approximations into more practical ones that can be used to assess the scalability of parallel FFT algorithms.
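For readers following along, one common textbook starting point for this kind of scalability estimate is Amdahl's law. The sketch below is not the calculation used in this thread; it only illustrates the idea. The serial fraction of 0.1 is an arbitrary assumption and the function names are hypothetical.

```python
def amdahl_speedup(threads, serial_fraction):
    """Predicted speedup on `threads` threads when `serial_fraction` of the work cannot be parallelized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / threads)

def parallel_efficiency(threads, serial_fraction):
    """Speedup divided by thread count; 1.0 would be perfect scaling."""
    return amdahl_speedup(threads, serial_fraction) / threads

SERIAL_FRACTION = 0.1  # assumed for illustration only, not a measured value

for p in (1, 2, 4, 8):
    s = amdahl_speedup(p, SERIAL_FRACTION)
    e = parallel_efficiency(p, SERIAL_FRACTION)
    print(f"{p} thread(s): speedup {s:.2f}, efficiency {e:.2f}")
```

Measured times for 1, 2, 4 and 8 threads would replace the prediction: speedup becomes T(1)/T(p), and the gap to the textbook curve shows how much overhead the simple model misses.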

_heinz:

--- Quote from: Jason G on 27 Nov 2008, 07:34:50 am ---Thanks Heinz,
   Could you let me know:
   - Current CPU speed at time of test
   - Cache sizes per package
   - Bus speed

--- End quote ---
CPU speed: 2398 MHz
FSB speed: 400 MHz (quad-pumped, 1600 MHz effective)
Cache size per package: I need to look that up (where can I find it in the source?)
Ah, the CPU package: 12 MB
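To put the 12 MB figure next to the two problem sizes asked about, here is a rough footprint estimate in Python. It assumes single-precision complex elements (8 bytes each, matching the FFTW_SINGLE build) and separate input and output arrays; twiddle tables and scratch buffers are ignored, and how much of the 12 MB a single core can actually use is not stated in this thread. The names are hypothetical.

```python
BYTES_PER_COMPLEX_FLOAT = 8  # two 32-bit floats per element (single precision)

def footprint_bytes(n, arrays=2):
    """Bytes needed for `arrays` buffers of n single-precision complex elements."""
    return n * BYTES_PER_COMPLEX_FLOAT * arrays

CACHE_BYTES = 12 * 1024 * 1024  # 12 MB per package, as reported in the post

for n in (32768, 131072):
    size = footprint_bytes(n)
    print(f"N={n}: {size // 1024} KiB, within 12 MB: {size <= CACHE_BYTES}")
```

Under these assumptions both sizes stay well inside the reported package cache (512 KiB and 2048 KiB respectively).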

Leaps-from-Shadows:
Current Nehalem CPUs (920, 940, 965) have 32 KB of L1 instruction cache per core, 32 KB of L1 data cache per core, 256 KB of L2 cache per core, and 8 MB of shared L3 cache.

_heinz:
Intel® Parallel Composer Beta is installed and running, but it does not work in the VS2005/2008 Express editions.
1>------ Build started: Project: fibonacci, Configuration: Release x64 ------
1>Compiling with Intel(R) C++ Compiler 11.1.032 [Intel(R) 64]... (Intel C++ Environment)
1>Intel(R) C++ Compiler for applications running on Intel(R) 64, Version 11.1  Beta  Build 20081112 Package ID: composer_beta_update2.032
1>Copyright (C) 1985-2008 Intel Corporation.  All rights reserved.
1>icl /c /I C:\I\INTEL\tbb21_012oss\include -D WIN64 -D NDEBUG -D _CONSOLE -D _MBCS /EHsc /MD /GS /fp:fast /FoC:\Users\heinz\AppData\Local\Temp\tbb_examples\fibonacci\x64\Release/ /W1 /nologo /Qvc9 "/Qlocation,link,C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\bin\x86_amd64" ..\Fibonacci.cpp
1>
1>Fibonacci.cpp
1>Linking... (Intel C++ Environment)
1>xilink: executing 'link'
1>Embedding manifest... (Microsoft VC++ Environment)
1>Copying tbb.dll (Microsoft VC++ Environment)
1>        1 file(s) copied.
1>Build log was saved at "file://C:\Users\heinz\AppData\Local\Temp\tbb_examples\fibonacci\x64\Release\BuildLog.htm"
1>fibonacci - 0 error(s), 0 warning(s)
========== Build: 1 succeeded, 0 failed, 0 up-to-date, 0 skipped ==========

Here are two results, both compiled with VS2008, one of them with the integrated Parallel Composer.
VS2008 TBB --> fibonacci_1000_out.txt
VS2008 TBB Parallel Composer --> fibonacciopt_1000_out.txt
The files are attached.

heinz  ;D

[attachment deleted by admin]
