
optimized sources


_heinz:

--- Quote from: Jason G on 28 Nov 2008, 12:30:43 pm ---Ahhh, 6 meg per package (1.5 meg per core)... Okay, yep, it is 12 meg total for the 8 cores.

Compared the 32-bit ICC 10.1 / TBB 2.0 build of fibonacci, and it IS slower than the Parallel Composer 32-bit build under XP64... Will have to try that build under XP32 to confirm, though. I will probably update all my ICC/IPP base packages as soon as I get time, in a few weeks.

Jason


--- End quote ---

12 MB per chip:
BX80574E5405A, active cooler or for 1U systems, 45 nm E5405, 2.00 GHz (80 W), FSB 1333, 12 MB total
We have 2 processors, so we have 24 MB for 8 cores.
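The arithmetic behind the correction, as a small sketch. It assumes (not stated in the thread) that each E5405 package carries its 12 MB as two 6 MB L2 caches, each shared by a pair of cores, which would explain a "2 x 6 MB" style readout being mistaken for a per-package figure. The constant names are hypothetical.

```python
# Assumed layout of one Xeon E5405 package: two 6 MB L2 caches,
# each shared by a pair of cores.
L2_SLICE_MB = 6
SLICES_PER_PACKAGE = 2
CORES_PER_PACKAGE = 4
PACKAGES = 2  # two processors in this machine

per_package_mb = L2_SLICE_MB * SLICES_PER_PACKAGE  # 12 MB, the "12 MB total" spec figure
total_mb = per_package_mb * PACKAGES               # 24 MB across the 8 cores
per_core_mb = per_package_mb / CORES_PER_PACKAGE   # 3 MB average share per core

print(f"{per_package_mb} MB per package, {total_mb} MB total, "
      f"{per_core_mb:g} MB per core on average")
```

This prints "12 MB per package, 24 MB total, 3 MB per core on average"; the per-core number is only an average, since each core actually sees the whole 6 MB slice it shares with its neighbour.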

Jason G:
Err, well, CPU-Z only shows it per core, then? In any case:

Hmm, not a lot of Fibonacci difference here, but some (the fastest thread count was 2):

Built under XP32 with ICC 10.1 + TBB (run on XP32)

--- Quote ---Threads number is 2
Shared serial (mutex)           - in 0.286294 msec
Shared serial (spin_mutex)      - in 0.196978 msec
Shared serial (queuing_mutex)   - in 0.301214 msec
Shared serial (Conc.HashTable)  - in 4.313505 msec
Parallel while+for/queue        - in 1.485761 msec
Parallel pipe/queue             - in 1.980293 msec
Parallel reduce                 - in 0.523162 msec
Parallel scan                   - in 0.338611 msec
Parallel tasks                  - in 0.566134 msec
--- End quote ---

and built under XP64 with Parallel Composer Beta Update 2 + TBB 2.0 (but also run on XP32):

--- Quote ---Threads number is 2
Shared serial (mutex)           - in 0.279819 msec
Shared serial (spin_mutex)      - in 0.208223 msec
Shared serial (queuing_mutex)   - in 0.284642 msec
Shared serial (Conc.HashTable)  - in 4.461598 msec
Parallel while+for/queue        - in 1.718736 msec
Parallel pipe/queue             - in 2.188073 msec
Parallel reduce                 - in 0.571781 msec
Parallel scan                   - in 0.357319 msec
Parallel tasks                  - in 0.534837 msec
--- End quote ---

So some tests look a bit slower with the Parallel Composer build, but I will carefully consider switching to ICC 11 soon and check how our projects of interest compare.
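To make the "Shared serial" and "Parallel reduce" rows a little more concrete, here is a Python sketch. It assumes, as I understand the TBB fibonacci example, that F(n) is taken from the n-th power of the matrix Q = [[1, 1], [1, 0]] in unsigned 64-bit arithmetic. `shared_serial_fib` has all threads advance one shared product under a lock, so extra threads cannot help; `SpinLock` is a hypothetical stand-in for `spin_mutex` built on `threading.Lock.acquire(blocking=False)`; `parallel_reduce_fib` relies on matrix multiplication being associative, so each thread computes an independent partial product and the partials are joined in order. All names are hypothetical, and under CPython's GIL this shows the structure, not the timings.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MASK = (1 << 64) - 1     # wrap like unsigned 64-bit arithmetic in C++
Q = (1, 1, 1, 0)         # [[1, 1], [1, 0]] stored row-major as a flat tuple
IDENTITY = (1, 0, 0, 1)


def mat_mul(a, b):
    """2x2 matrix product modulo 2**64."""
    a00, a01, a10, a11 = a
    b00, b01, b10, b11 = b
    return ((a00 * b00 + a01 * b10) & MASK, (a00 * b01 + a01 * b11) & MASK,
            (a10 * b00 + a11 * b10) & MASK, (a10 * b01 + a11 * b11) & MASK)


class SpinLock:
    """Retries a non-blocking acquire instead of sleeping on the lock
    (loosely in the spirit of tbb::spin_mutex; illustrative only)."""

    def __init__(self):
        self._flag = threading.Lock()

    def __enter__(self):
        while not self._flag.acquire(blocking=False):
            time.sleep(0)  # yield between attempts so the holder can run
        return self

    def __exit__(self, *exc_info):
        self._flag.release()
        return False


def shared_serial_fib(n, num_threads, lock):
    """Threads take turns multiplying one shared product by Q under `lock`."""
    state = {"product": IDENTITY, "steps": 0}

    def worker():
        while True:
            with lock:  # every step is serialized: one thread progresses at a time
                if state["steps"] >= n:
                    return
                state["product"] = mat_mul(state["product"], Q)
                state["steps"] += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["product"][1]  # Q**n = [[F(n+1), F(n)], [F(n), F(n-1)]]


def power_of_q(count):
    """Serial product of `count` copies of Q."""
    p = IDENTITY
    for _ in range(count):
        p = mat_mul(p, Q)
    return p


def parallel_reduce_fib(n, num_threads):
    """Each worker multiplies its own chunk of Q factors without sharing
    a product; the partial products are then joined in order."""
    base, extra = divmod(n, num_threads)
    sizes = [base + (1 if i < extra else 0) for i in range(num_threads)]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        partials = list(pool.map(power_of_q, sizes))  # map keeps input order
    result = IDENTITY
    for p in partials:  # left-to-right join; order matters for general matrices
        result = mat_mul(result, p)
    return result[1]
```

For example, `shared_serial_fib(10, 2, threading.Lock())` and `parallel_reduce_fib(10, 3)` both give F(10) = 55. The difference in the tables comes from how much time threads spend waiting on the shared lock versus working independently.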

_heinz:
How many numbers did you generate? 1000?

Jason G:
No, I just used the default, which was 100... will try 1000.

[Later:] The fastest 32-bit run, built on XP32 with ICC 10.1 / TBB 2.0, is now at 3 threads :o:

--- Quote ---Threads number is 3
Shared serial (mutex)           - in 162.014407 msec
Shared serial (spin_mutex)      - in 11.609819 msec
Shared serial (queuing_mutex)   - in 50.960339 msec
Shared serial (Conc.HashTable)  - in 401.327768 msec
Parallel while+for/queue        - in 93.399315 msec
Parallel pipe/queue             - in 164.994829 msec
Parallel reduce                 - in 27.500117 msec
Parallel scan                   - in 22.918168 msec
Parallel tasks                  - in 25.904447 msec
--- End quote ---

The Parallel Composer build gives:

--- Quote ---Threads number is 3
Shared serial (mutex)           - in 76.449678 msec
Shared serial (spin_mutex)      - in 13.449323 msec
Shared serial (queuing_mutex)   - in 50.961819 msec
Shared serial (Conc.HashTable)  - in 413.186277 msec
Parallel while+for/queue        - in 93.995606 msec
Parallel pipe/queue             - in 171.541281 msec
Parallel reduce                 - in 28.647254 msec
Parallel scan                   - in 27.231642 msec
Parallel tasks                  - in 24.389762 msec
--- End quote ---
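The "Parallel tasks" row presumably splits the work recursively into tasks. Here is a fork-join sketch of that idea in Python, again assuming the Q = [[1, 1], [1, 0]] matrix-power formulation. Note one deliberate difference: TBB runs tasks on a fixed pool of worker threads, while this sketch starts a raw thread per fork, which is far more expensive. `q_power_fork_join`, `parallel_tasks_fib`, and the `cutoff` parameter are hypothetical.

```python
import threading

MASK = (1 << 64) - 1     # wrap like unsigned 64-bit arithmetic in C++
Q = (1, 1, 1, 0)         # [[1, 1], [1, 0]] stored row-major
IDENTITY = (1, 0, 0, 1)


def mat_mul(a, b):
    """2x2 matrix product modulo 2**64."""
    a00, a01, a10, a11 = a
    b00, b01, b10, b11 = b
    return ((a00 * b00 + a01 * b10) & MASK, (a00 * b01 + a01 * b11) & MASK,
            (a10 * b00 + a11 * b10) & MASK, (a10 * b01 + a11 * b11) & MASK)


def q_power_fork_join(count, cutoff=16):
    """Product of `count` copies of Q, forking the left half onto a new thread."""
    if count <= cutoff:  # small enough: finish serially
        p = IDENTITY
        for _ in range(count):
            p = mat_mul(p, Q)
        return p
    left_count = count // 2
    left = [None]

    def run_left():
        left[0] = q_power_fork_join(left_count, cutoff)

    child = threading.Thread(target=run_left)
    child.start()                                          # fork
    right = q_power_fork_join(count - left_count, cutoff)  # this thread takes the right half
    child.join()                                           # join before combining
    return mat_mul(left[0], right)


def parallel_tasks_fib(n, cutoff=16):
    return q_power_fork_join(n, cutoff)[1]  # top-right entry of Q**n is F(n)
```

The cutoff plays the role of a grain size: too small and the fork/join overhead dominates, too large and there is too little parallelism.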


_heinz:

--- Quote from: Jason G on 28 Nov 2008, 01:00:42 pm ---No, just used default which was 100... will try 1000

[Later:]  Fastest 32 bit run built on XP32 ICC10.1 / TBB2.0 now 3 threads  :o:

--- Quote ---Threads number is 3

--- End quote ---
--- End quote ---

Now you know why I chose 5, an odd number.
We can create any number of threads: 1, 2, 3, 4 ... 128, 256, 512, etc., odd numbers included.
And we can use /QxHOST ---> best performance using the latest processor features supported by the compilation host.
 ::)
heinz
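Since the best thread count moved from 2 to 3 when the problem size changed, it can be worth sweeping every count, odd ones included, rather than only powers of two. A minimal timing-harness sketch, assuming a hypothetical `run(threads)` callable that wraps whatever benchmark you want to measure; this is not the harness the TBB example uses.

```python
import os
import time


def time_once(run, threads):
    """Wall-clock seconds for one call of run(threads)."""
    start = time.perf_counter()
    run(threads)
    return time.perf_counter() - start


def fastest_thread_count(run, max_threads=None, repeats=3):
    """Try every thread count from 1 to max_threads (odd ones included) and
    return (best_count, best_seconds), keeping the best of `repeats` runs each."""
    max_threads = max_threads or os.cpu_count() or 1
    best = None
    for k in range(1, max_threads + 1):
        elapsed = min(time_once(run, k) for _ in range(repeats))
        if best is None or elapsed < best[1]:
            best = (k, elapsed)
    return best
```

Taking the minimum of several repeats filters out one-off interference from other processes; on a shared machine a median may be the more honest summary.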
