Ahhh, 6 meg per package (1.5 meg per core)... Okay, yep, it is 12 meg total for the 8 cores.

Compared the 32 bit ICC 10.1 / TBB 2.0 build of fibonacci, and it IS slower than the Parallel Composer 32 bit build under XP64... Will have to try that build under XP32 to confirm though. I will probably update all my ICC/IPP base packages as soon as I get time, in a few weeks.

Jason
Threads number is 2
Shared serial (mutex) - in 0.286294 msec
Shared serial (spin_mutex) - in 0.196978 msec
Shared serial (queuing_mutex) - in 0.301214 msec
Shared serial (Conc.HashTable) - in 4.313505 msec
Parallel while+for/queue - in 1.485761 msec
Parallel pipe/queue - in 1.980293 msec
Parallel reduce - in 0.523162 msec
Parallel scan - in 0.338611 msec
Parallel tasks - in 0.566134 msec
Threads number is 2
Shared serial (mutex) - in 0.279819 msec
Shared serial (spin_mutex) - in 0.208223 msec
Shared serial (queuing_mutex) - in 0.284642 msec
Shared serial (Conc.HashTable) - in 4.461598 msec
Parallel while+for/queue - in 1.718736 msec
Parallel pipe/queue - in 2.188073 msec
Parallel reduce - in 0.571781 msec
Parallel scan - in 0.357319 msec
Parallel tasks - in 0.534837 msec
Threads number is 3
Shared serial (mutex) - in 162.014407 msec
Shared serial (spin_mutex) - in 11.609819 msec
Shared serial (queuing_mutex) - in 50.960339 msec
Shared serial (Conc.HashTable) - in 401.327768 msec
Parallel while+for/queue - in 93.399315 msec
Parallel pipe/queue - in 164.994829 msec
Parallel reduce - in 27.500117 msec
Parallel scan - in 22.918168 msec
Parallel tasks - in 25.904447 msec
Threads number is 3
Shared serial (mutex) - in 76.449678 msec
Shared serial (spin_mutex) - in 13.449323 msec
Shared serial (queuing_mutex) - in 50.961819 msec
Shared serial (Conc.HashTable) - in 413.186277 msec
Parallel while+for/queue - in 93.995606 msec
Parallel pipe/queue - in 171.541281 msec
Parallel reduce - in 28.647254 msec
Parallel scan - in 27.231642 msec
Parallel tasks - in 24.389762 msec
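For anyone who wants to compare these runs without eyeballing them, here is a minimal Python sketch that parses one pasted run and expresses every timing relative to the fastest entry. The helper names (`parse_run`, `ratio_to_fastest`) are made up for illustration, and it assumes one benchmark entry per line in the `name - in X msec` shape shown above. The sample data is the second 3-thread run.

```python
import re

# Matches one benchmark line, e.g. "Shared serial (mutex) - in 0.286294 msec".
LINE_RE = re.compile(r"^(?P<name>.+?) - in (?P<msec>\d+\.\d+) msec$")

def parse_run(text):
    """Return {benchmark name: milliseconds} for one pasted run.

    Lines that do not look like a timing (e.g. "Threads number is 3")
    are skipped.
    """
    results = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            results[m.group("name")] = float(m.group("msec"))
    return results

def ratio_to_fastest(results):
    """Return {name: time / fastest time}, so the fastest entry is 1.0."""
    fastest = min(results.values())
    return {name: t / fastest for name, t in results.items()}

# Second 3-thread run, copied from the post above.
RUN_3_THREADS = """\
Threads number is 3
Shared serial (mutex) - in 76.449678 msec
Shared serial (spin_mutex) - in 13.449323 msec
Shared serial (queuing_mutex) - in 50.961819 msec
Shared serial (Conc.HashTable) - in 413.186277 msec
Parallel while+for/queue - in 93.995606 msec
Parallel pipe/queue - in 171.541281 msec
Parallel reduce - in 28.647254 msec
Parallel scan - in 27.231642 msec
Parallel tasks - in 24.389762 msec
"""

if __name__ == "__main__":
    rel = ratio_to_fastest(parse_run(RUN_3_THREADS))
    for name, r in sorted(rel.items(), key=lambda kv: kv[1]):
        print(f"{r:8.2f}x  {name}")
```

Normalising to the fastest entry is just one choice; dividing two runs entry by entry works the same way with two `parse_run` results.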
No, just used the default, which was 100... will try 1000.

[Later:] Fastest 32 bit run, built on XP32 with ICC 10.1 / TBB 2.0, now with 3 threads:

Quote
Threads number is 3

Now you know why I chose 5 .. an odd number.
We can create any number of threads: 1, 2, 3, 4 .. 128, 256, 512 etc., odd numbers too.
And we can use /QxHOST ---> best performance on the latest features of the processor supported by the compilation host.
heinz
Quote from: Jason G on 26 Nov 2008, 11:44:44 am
@Heinz: Do you happen to have any single and multithreaded FFT processing times benched on your skulltrail? Time for 1,2,4 & 8 threads would be nice for 32k element &/or 128k elements, if you have them. I'm trying to verify/refine some efficiency calculations & have no reference but my dual core.
Jason

Compiled the fftw project (single thread) as 32 bit with:
/I "." /I ".." /I "../libbench2" /I "../api" /I "../kernel" /I "../dft" /I "../rdft" /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "FFTW_SINGLE" /D "BENCHFFT_SINGLE" /D "HAVE_SSE" /D "_VC80_UPGRADE=0x0710" /D "_MBCS" /FD /EHsc /MT /Fp".\bench___Win32_Release_float/bench.pch" /Fo".\bench___Win32_Release_float/" /Fd".\bench___Win32_Release_float/" /W3 /nologo /c /errorReport:prompt

Results:
C:\Windows\system32>echo off
fftw-3.1.2 benchfsse(VS2005) started
benchf_sse.exe -opatient 8 16 32 64 128 256 512 1024 2048 4096 8192 16384 32768 131072
Problem: 8, setup: 300.32 us, time: 169.69 ns, ``mflops'': 707.16
Problem: 16, setup: 288.86 us, time: 332.84 ns, ``mflops'': 961.43
Problem: 32, setup: 7.91 ms, time: 726.79 ns, ``mflops'': 1100.7
Problem: 64, setup: 27.46 ms, time: 1.67 us, ``mflops'': 1148.4
Problem: 128, setup: 62.98 ms, time: 4.19 us, ``mflops'': 1069.1
Problem: 256, setup: 137.48 ms, time: 9.18 us, ``mflops'': 1115
Problem: 512, setup: 267.80 ms, time: 20.95 us, ``mflops'': 1099.6
Problem: 1024, setup: 575.47 ms, time: 46.10 us, ``mflops'': 1110.7
Problem: 2048, setup: 1.37 s, time: 99.17 us, ``mflops'': 1135.8
Problem: 4096, setup: 3.42 s, time: 220.42 us, ``mflops'': 1115
Problem: 8192, setup: 8.83 s, time: 530.79 us, ``mflops'': 1003.2
Problem: 16384, setup: 21.99 s, time: 1.13 ms, ``mflops'': 1014.9
Problem: 32768, setup: 53.80 s, time: 2.41 ms, ``mflops'': 1020
Problem: 131072, setup: 369.12 s, time: 9.89 ms, ``mflops'': 1126
fftw-3.1.2 benchfsse ended.
Press any key . . .
----------------------------------------------------------------------------------------------------
For the threaded variants I must first read the docs again...
Did you mean this? Or if you want some other compiler options, let me know.
Once I have installed the Intel® Parallel Composer Beta, I will recompile the project...
regards heinz
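A note on how the ``mflops'' column relates to the time column, since it is not an actual operation count. The posted numbers are consistent with the usual benchFFT convention for complex transforms, mflops = 5 N log2(N) / (time in microseconds). The Python sketch below just re-derives a few of the figures above from their times; the function name `fftw_mflops` is mine, and small differences come from the times being printed with only 3 to 5 significant digits.

```python
import math

def fftw_mflops(n, time_us):
    """benchFFT-style 'mflops' for a complex FFT of size n.

    Assumes the convention 5 * n * log2(n) / (time in microseconds),
    which is a nominal flop count, not a measured one.
    """
    return 5.0 * n * math.log2(n) / time_us

# (size, time in microseconds, reported mflops) taken from the run above.
POSTED = [
    (8, 0.16969, 707.16),
    (1024, 46.10, 1110.7),
    (32768, 2410.0, 1020.0),
    (131072, 9890.0, 1126.0),
]

if __name__ == "__main__":
    for n, t_us, reported in POSTED:
        print(f"N={n:6d}  computed={fftw_mflops(n, t_us):8.1f}  reported={reported}")
```

Because the figure is scaled by 5 N log2(N), it is comparable across sizes and across builds, which is what makes it handy for the kind of efficiency estimate Jason asked about.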
Quote from: _heinz on 26 Nov 2008, 07:24:20 pm
[snip: single-thread fftw-3.1.2 results, as posted above]

The sample above was compiled with the MSC compiler.

C:\Windows\system32>echo off
compiled with Parallel Composer Configuration(Release float SSE) Platform(Win32)
fftw-3.1.2 benchf_sse started
benchf_sse.exe -opatient 8 16 32 64 128 256 512 1024 2048 4096 8192 16384 32768 131072
Problem: 8, setup: 241.93 us, time: 49.93 ns, ``mflops'': 2403.6
Problem: 16, setup: 276.57 us, time: 94.39 ns, ``mflops'': 3390
Problem: 32, setup: 7.91 ms, time: 117.86 ns, ``mflops'': 6787.9
Problem: 64, setup: 26.76 ms, time: 219.35 ns, ``mflops'': 8753.3
Problem: 128, setup: 61.71 ms, time: 447.42 ns, ``mflops'': 10013
Problem: 256, setup: 124.16 ms, time: 855.56 ns, ``mflops'': 11969
Problem: 512, setup: 238.18 ms, time: 1.99 us, ``mflops'': 11575
Problem: 1024, setup: 403.56 ms, time: 4.47 us, ``mflops'': 11455
Problem: 2048, setup: 719.56 ms, time: 10.62 us, ``mflops'': 10611
Problem: 4096, setup: 1.41 s, time: 25.84 us, ``mflops'': 9510.4
Problem: 8192, setup: 3.14 s, time: 58.67 us, ``mflops'': 9076.4
Problem: 16384, setup: 7.01 s, time: 125.16 us, ``mflops'': 9163.6
Problem: 32768, setup: 16.08 s, time: 279.92 us, ``mflops'': 8779.5
Problem: 131072, setup: 87.35 s, time: 1.29 ms, ``mflops'': 8658.3
fftw-3.1.2 benchf_sse ended.

With 128K: 8658.3 mflops.
Best ratio ~1:10 (at 256 points: 1115 vs 11969 mflops).
Let everybody draw their own conclusions..
heinz
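To put numbers on the "~1:10" remark, here is a small Python sketch that divides the Parallel Composer mflops by the MSC mflops for each problem size. The two tables are copied from the runs posted in this thread; the names `MSC_MFLOPS`, `ICC_MFLOPS` and `speedups` are only illustrative, and nothing here says why the two builds differ.

```python
# mflops per problem size, copied from the two posted single-thread runs.
MSC_MFLOPS = {
    8: 707.16, 16: 961.43, 32: 1100.7, 64: 1148.4, 128: 1069.1,
    256: 1115.0, 512: 1099.6, 1024: 1110.7, 2048: 1135.8, 4096: 1115.0,
    8192: 1003.2, 16384: 1014.9, 32768: 1020.0, 131072: 1126.0,
}
ICC_MFLOPS = {
    8: 2403.6, 16: 3390.0, 32: 6787.9, 64: 8753.3, 128: 10013.0,
    256: 11969.0, 512: 11575.0, 1024: 11455.0, 2048: 10611.0, 4096: 9510.4,
    8192: 9076.4, 16384: 9163.6, 32768: 8779.5, 131072: 8658.3,
}

def speedups(baseline, candidate):
    """Return {size: candidate / baseline} for sizes present in both runs."""
    return {n: candidate[n] / baseline[n] for n in baseline if n in candidate}

if __name__ == "__main__":
    ratios = speedups(MSC_MFLOPS, ICC_MFLOPS)
    for n in sorted(ratios):
        print(f"N={n:6d}  ICC/MSC = {ratios[n]:5.2f}")
    best = max(ratios, key=ratios.get)
    print(f"largest ratio at N={best}")
```

On these figures the ratio peaks in the 256 to 1024 range and is lower at both the smallest and the largest sizes, so the single "1:10" number describes the best case rather than the whole curve.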
You gotta be careful with fftw and which compiler you use. In my own experience, the pre-packaged gcc builds were always faster than the icc-compiled code!