optimized sources

_heinz:

--- Quote from: _heinz on 26 Nov 2008, 07:24:20 pm ---
--- Quote from: Jason G on 26 Nov 2008, 11:44:44 am ---
@Heinz: Do you happen to have any single- and multithreaded FFT processing times benched on your Skulltrail? Times for 1, 2, 4 & 8 threads would be nice for 32k and/or 128k elements, if you have them.

I'm trying to verify/refine some efficiency calculations & have no reference but my dual core.

Jason


--- End quote ---
I compiled the FFTW project (single-threaded) as 32-bit with these compiler options:
 /I "." /I ".." /I "../libbench2" /I "../api" /I "../kernel" /I "../dft" /I "../rdft" /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "FFTW_SINGLE" /D "BENCHFFT_SINGLE" /D "HAVE_SSE" /D "_VC80_UPGRADE=0x0710" /D "_MBCS" /FD /EHsc /MT /Fp".\bench___Win32_Release_float/bench.pch" /Fo".\bench___Win32_Release_float/" /Fd".\bench___Win32_Release_float/" /W3 /nologo /c /errorReport:prompt

Results:
C:\Windows\system32>echo off
fftw-3.1.2 benchfsse(VS2005) started
benchf_sse.exe -opatient 8 16 32 64 128 256 512 1024 2048 4096 8192 16384 32768 131072
Problem: 8, setup: 300.32 us, time: 169.69 ns, ``mflops'': 707.16
Problem: 16, setup: 288.86 us, time: 332.84 ns, ``mflops'': 961.43
Problem: 32, setup: 7.91 ms, time: 726.79 ns, ``mflops'': 1100.7
Problem: 64, setup: 27.46 ms, time: 1.67 us, ``mflops'': 1148.4
Problem: 128, setup: 62.98 ms, time: 4.19 us, ``mflops'': 1069.1
Problem: 256, setup: 137.48 ms, time: 9.18 us, ``mflops'': 1115
Problem: 512, setup: 267.80 ms, time: 20.95 us, ``mflops'': 1099.6
Problem: 1024, setup: 575.47 ms, time: 46.10 us, ``mflops'': 1110.7
Problem: 2048, setup: 1.37 s, time: 99.17 us, ``mflops'': 1135.8
Problem: 4096, setup: 3.42 s, time: 220.42 us, ``mflops'': 1115
Problem: 8192, setup: 8.83 s, time: 530.79 us, ``mflops'': 1003.2
Problem: 16384, setup: 21.99 s, time: 1.13 ms, ``mflops'': 1014.9
Problem: 32768, setup: 53.80 s, time: 2.41 ms, ``mflops'': 1020
Problem: 131072, setup: 369.12 s, time: 9.89 ms, ``mflops'': 1126
fftw-3.1.2 benchfsse ended.
Press any key . . .
----------------------------------------------------------------------------------------------------
For the threaded variants I need to read the documentation again first.
Is this what you meant? If you want other compiler options, let me know.
Once I have installed the Intel® Parallel Composer Beta, I will recompile the project.

Regards, heinz


--- End quote ---
The sample above was compiled with the Microsoft C compiler (VS2005).

C:\Windows\system32>echo off
compiled with Parallel Composer  Configuration(Release float SSE) Platform(Win32)
fftw-3.1.2 benchf_sse started
benchf_sse.exe -opatient 8 16 32 64 128 256 512 1024 2048 4096 8192 16384 32768 131072
Problem: 8, setup: 241.93 us, time: 49.93 ns, ``mflops'': 2403.6
Problem: 16, setup: 276.57 us, time: 94.39 ns, ``mflops'': 3390
Problem: 32, setup: 7.91 ms, time: 117.86 ns, ``mflops'': 6787.9
Problem: 64, setup: 26.76 ms, time: 219.35 ns, ``mflops'': 8753.3
Problem: 128, setup: 61.71 ms, time: 447.42 ns, ``mflops'': 10013
Problem: 256, setup: 124.16 ms, time: 855.56 ns, ``mflops'': 11969
Problem: 512, setup: 238.18 ms, time: 1.99 us, ``mflops'': 11575
Problem: 1024, setup: 403.56 ms, time: 4.47 us, ``mflops'': 11455
Problem: 2048, setup: 719.56 ms, time: 10.62 us, ``mflops'': 10611
Problem: 4096, setup: 1.41 s, time: 25.84 us, ``mflops'': 9510.4
Problem: 8192, setup: 3.14 s, time: 58.67 us, ``mflops'': 9076.4
Problem: 16384, setup: 7.01 s, time: 125.16 us, ``mflops'': 9163.6
Problem: 32768, setup: 16.08 s, time: 279.92 us, ``mflops'': 8779.5
Problem: 131072, setup: 87.35 s, time: 1.29 ms, ``mflops'': 8658.3
fftw-3.1.2 benchf_sse ended.

At 128K: 8658.3 mflops (versus 1126 with the Microsoft compiler).
Best ratio: about 1:10.
Draw your own conclusions.
heinz
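
The "about 1:10" figure can be checked per problem size. Below is a minimal sketch that divides the Intel-build mflops by the Microsoft-build mflops for each size. The numbers are copied from the two runs above; the variable names are only illustrative.

```python
# mflops per problem size, copied from the two benchmark runs above
sizes = [8, 16, 32, 64, 128, 256, 512, 1024, 2048,
         4096, 8192, 16384, 32768, 131072]
msvc = [707.16, 961.43, 1100.7, 1148.4, 1069.1, 1115, 1099.6,
        1110.7, 1135.8, 1115, 1003.2, 1014.9, 1020, 1126]
intel = [2403.6, 3390, 6787.9, 8753.3, 10013, 11969, 11575,
         11455, 10611, 9510.4, 9076.4, 9163.6, 8779.5, 8658.3]

# Speedup of the Intel build over the Microsoft build, keyed by problem size
ratios = {n: i / m for n, m, i in zip(sizes, msvc, intel)}

# Problem size with the largest speedup
best_size = max(ratios, key=ratios.get)

for n in sizes:
    print(f"{n:>7}: {ratios[n]:5.2f}x")
print(f"best: {best_size} points, {ratios[best_size]:.2f}x")
```

On these figures the largest ratio is at 256 points (roughly 10.7x), while the 128K case comes out at roughly 7.7x, so "1:10" describes the best case rather than the 128K result.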

Jason G:
Ahhh, so FFTW's warnings about the MS compiler generating incorrect SSE code for FFTW might be correct. Good to know. I'm pretty sure the stock DLL would have been built with GCC/MinGW.

Much better numbers  ;D

_heinz:
Hi Jason,
the new Intel board is available: Intel SmackOver DX58SO X58, price 228.58 € in Germany
Product type: motherboard
Form factor: ATX
Dimensions (width x depth x height): 30.5 cm x 24.4 cm
Chipset: Intel X58 Express / Intel ICH10R
Multi-core support: 4-core
Processor: 0 (1) - LGA1366 socket
Compatible processors: Core i7, Core i7 Extreme
64-bit processor compatibility: built in
RAM: 0 MB (installed) / 16 GB (max)
Supported RAM technology: DDR3 SDRAM
Supported RAM integrity check: non-ECC
Storage controller: Serial ATA-300 (RAID)
USB port configuration: 12 x USB
Storage port configuration: 6 x SATA, 2 x eSATA
FireWire port configuration: 2 x FireWire
Audio output: sound card - 7.1 channel surround
Network: network adapter - Intel 82567LM - Ethernet, Fast Ethernet, Gigabit Ethernet

Have a look: http://www.kmelektronik.de/
 
heinz

_heinz:
Happy New Year,
the new year has started with some important news.
Shortly before Christmas the latest AP came out; thanks to everyone who helped make it possible.
1. AP rev69 now takes 9-10 hours; the standard AP needs approx. 70-90 hours (measured on an Intel E8600 @ 3.6 GHz).
2. We are working on AP to make it ready for much more parallelism.
3. My test and development machine AK-V8 suffered from a bad disk, which I removed today. Now it runs again.
4. Some support requests are still open regarding the ATI 8.12 driver, which I need for the ATI developer environment.
5. Our activity will be in the closed forums, so expect a surprise from time to time.

heinz
 ;D

Crunch3r:

--- Quote from: _heinz on 28 Nov 2008, 04:02:41 pm ---
At 128K: 8658.3 mflops (versus 1126 with the Microsoft compiler).
Best ratio: about 1:10.
Draw your own conclusions.
heinz

--- End quote ---

You have to be careful with FFTW and which compiler you use. In my experience, the pre-packaged gcc builds were always faster than the icc-compiled code!
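
For comparing builds from different compilers, it can help to parse the bench output lines and recompute the mflops from the reported time. Below is a minimal sketch. It assumes the usual benchFFT convention for complex transforms, mflops = 5 N log2(N) / (time in microseconds), which matches the figures posted in this thread. The helper names `parse_line` and `mflops` are hypothetical and not part of FFTW.

```python
import math
import re

# Matches lines like:
#   Problem: 8, setup: 300.32 us, time: 169.69 ns, ``mflops'': 707.16
LINE = re.compile(
    r"Problem: (\d+), setup: ([\d.]+) (\w+), "
    r"time: ([\d.]+) (\w+), ``mflops'': ([\d.]+)"
)

# Conversion factors from the units used in the bench output to microseconds
UNIT_TO_US = {"ns": 1e-3, "us": 1.0, "ms": 1e3, "s": 1e6}


def parse_line(line):
    """Return (size, time_in_us, reported_mflops), or None if no match."""
    m = LINE.search(line)
    if m is None:
        return None
    n = int(m.group(1))
    time_us = float(m.group(4)) * UNIT_TO_US[m.group(5)]
    return n, time_us, float(m.group(6))


def mflops(n, time_us):
    """Assumed benchFFT convention for a complex FFT of n points."""
    return 5 * n * math.log2(n) / time_us


if __name__ == "__main__":
    sample = "Problem: 8, setup: 300.32 us, time: 169.69 ns, ``mflops'': 707.16"
    n, t, reported = parse_line(sample)
    print(n, round(mflops(n, t), 2), reported)
```

Because the printed times are rounded, the recomputed value can differ slightly from the reported one, so compare with a tolerance rather than for equality.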
