optimized sources
			_heinz:
			
			
--- Quote from: Jason G on 28 Nov 2008, 12:30:43 pm ---Ahhh, 6 meg per package ( 1.5 meg per core )... Okay, yep it is 12 meg total for the 8 cores.
Compared the 32-bit ICC 10.1 / TBB 2.0 build of fibonacci, and it IS slower than the Parallel Composer 32-bit build under XP64 ... Will have to try that build under XP32 to confirm though.  I will probably update all my ICC/IPP base packages as soon as I get time, in a few weeks.
Jason
--- End quote ---
12 MB per chip:
BX80574E5405A, active cooler or for 1U systems, 45 nm, E5405, 2.00 GHz (80 W), 1333, 12 MB total
We have 2 processors, so we have 24 MB for 8 cores.
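The cache arithmetic above can be written out as a small sketch. The figures (12 MB per chip, 2 processors, 8 cores) come from the post; the function names are made up for illustration, and the per-core value is only an average, not a statement about how the cache is actually partitioned between cores.

```cpp
// Hypothetical helpers, named for this sketch only.

// Total cache across all sockets, in MB.
constexpr int total_cache_mb(int sockets, int mb_per_socket) {
    return sockets * mb_per_socket;
}

// Average cache per core, in MB (a plain division, nothing more).
constexpr double cache_per_core_mb(int total_mb, int cores) {
    return static_cast<double>(total_mb) / cores;
}

// Values from the post: 2 x E5405 at 12 MB each, 8 cores in total.
static_assert(total_cache_mb(2, 12) == 24, "2 chips x 12 MB = 24 MB");
static_assert(cache_per_core_mb(24, 8) == 3.0, "24 MB / 8 cores = 3 MB");
```

So the machine has 24 MB in total, or 3 MB per core on average, rather than 12 MB for all 8 cores.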
		
			Jason G:
			
			Err, well, CPU-Z shows only per core then?  In any case:
Hmm, not a lot of difference in Fibonacci here, but some (the fastest thread count was 2).
Built under XP32 with ICC 10.1 + TBB (run on XP32):
--- Quote ---Threads number is 2
Shared serial (mutex)           - in 0.286294 msec
Shared serial (spin_mutex)      - in 0.196978 msec
Shared serial (queuing_mutex)   - in 0.301214 msec
Shared serial (Conc.HashTable)  - in 4.313505 msec
Parallel while+for/queue        - in 1.485761 msec
Parallel pipe/queue             - in 1.980293 msec
Parallel reduce                 - in 0.523162 msec
Parallel scan                   - in 0.338611 msec
Parallel tasks                  - in 0.566134 msec
--- End quote ---
And built under XP64 with Parallel Composer Beta Update 2 + TBB 2.0 (but also run on XP32):
--- Quote ---Threads number is 2
Shared serial (mutex)           - in 0.279819 msec
Shared serial (spin_mutex)      - in 0.208223 msec
Shared serial (queuing_mutex)   - in 0.284642 msec
Shared serial (Conc.HashTable)  - in 4.461598 msec
Parallel while+for/queue        - in 1.718736 msec
Parallel pipe/queue             - in 2.188073 msec
Parallel reduce                 - in 0.571781 msec
Parallel scan                   - in 0.357319 msec
Parallel tasks                  - in 0.534837 msec
--- End quote ---
So some things look a bit slower, but I will carefully consider shifting to ICC 11 soon, and check how our projects of interest compare.
		
			_heinz:
			
			How many numbers did you let it generate? 1000?
		
			Jason G:
			
			No, just used the default, which was 100... will try 1000
[Later:]  Fastest 32-bit run, built on XP32 with ICC 10.1 / TBB 2.0, is now 3 threads  :o:
--- Quote ---Threads number is 3
Shared serial (mutex)           - in 162.014407 msec
Shared serial (spin_mutex)      - in 11.609819 msec
Shared serial (queuing_mutex)   - in 50.960339 msec
Shared serial (Conc.HashTable)  - in 401.327768 msec
Parallel while+for/queue        - in 93.399315 msec
Parallel pipe/queue             - in 164.994829 msec
Parallel reduce                 - in 27.500117 msec
Parallel scan                   - in 22.918168 msec
Parallel tasks                  - in 25.904447 msec
--- End quote ---
And the Parallel Composer build data:
--- Quote ---Threads number is 3
Shared serial (mutex)           - in 76.449678 msec
Shared serial (spin_mutex)      - in 13.449323 msec
Shared serial (queuing_mutex)   - in 50.961819 msec
Shared serial (Conc.HashTable)  - in 413.186277 msec
Parallel while+for/queue        - in 93.995606 msec
Parallel pipe/queue             - in 171.541281 msec
Parallel reduce                 - in 28.647254 msec
Parallel scan                   - in 27.231642 msec
Parallel tasks                  - in 24.389762 msec
--- End quote ---
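To put numbers on the difference between the two 3-thread runs above, here is a minimal sketch that computes the relative change per test. The timings are copied from the two quoted outputs; `percent_change`, `Row`, `kRows` and `print_comparison` are names invented for this sketch and are not part of the TBB fibonacci example.

```cpp
#include <cstdio>

// Relative change of `candidate_ms` versus `baseline_ms`, in percent.
// A negative result means the candidate took less time (was faster).
double percent_change(double baseline_ms, double candidate_ms) {
    return (candidate_ms - baseline_ms) / baseline_ms * 100.0;
}

struct Row {
    const char* name;
    double icc101_ms;    // ICC 10.1 / TBB 2.0 build
    double composer_ms;  // Parallel Composer build
};

// Timings copied from the two "Threads number is 3" runs above.
const Row kRows[] = {
    {"Shared serial (mutex)",          162.014407,  76.449678},
    {"Shared serial (spin_mutex)",      11.609819,  13.449323},
    {"Shared serial (queuing_mutex)",   50.960339,  50.961819},
    {"Shared serial (Conc.HashTable)", 401.327768, 413.186277},
    {"Parallel while+for/queue",        93.399315,  93.995606},
    {"Parallel pipe/queue",            164.994829, 171.541281},
    {"Parallel reduce",                 27.500117,  28.647254},
    {"Parallel scan",                   22.918168,  27.231642},
    {"Parallel tasks",                  25.904447,  24.389762},
};

// Prints one line per test with the signed percentage difference.
void print_comparison() {
    for (const Row& r : kRows) {
        std::printf("%-32s %+7.1f%%\n", r.name,
                    percent_change(r.icc101_ms, r.composer_ms));
    }
}
```

Calling `print_comparison()` from `main` lists the differences. By my arithmetic the mutex case is roughly 53% faster in the Parallel Composer build, spin_mutex and scan are roughly 16% and 19% slower, and the rest are within about 6%. These are single runs, so some of that may be noise.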
		
			_heinz:
			
			
--- Quote from: Jason G on 28 Nov 2008, 01:00:42 pm ---No, just used the default, which was 100... will try 1000
[Later:]  Fastest 32-bit run, built on XP32 with ICC 10.1 / TBB 2.0, is now 3 threads  :o:
--- Quote ---Threads number is 3
--- End quote ---
--- End quote ---
Now you know why I chose 5 .. an odd number.
We can create any number of threads: 1, 2, 3, 4 ... 128, 256, 512, etc., odd numbers too.
And we can use /QxHOST ---> best performance using the latest processor features supported by the compilation host.
 ::)
heinz
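As an illustration of the point that the thread count need not be even or a power of two, here is a minimal sketch. It uses plain `std::thread` from standard C++ instead of TBB, purely to stay self-contained; `sum_with_threads` is a made-up helper and has nothing to do with the fibonacci example's own code.

```cpp
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical helper: sums 1..n using `num_threads` worker threads.
// Each worker handles one contiguous slice; the last worker also takes
// whatever remainder is left when n is not divisible by num_threads.
std::uint64_t sum_with_threads(std::uint64_t n, unsigned num_threads) {
    if (num_threads == 0) num_threads = 1;
    std::vector<std::uint64_t> partial(num_threads, 0);
    std::vector<std::thread> workers;
    workers.reserve(num_threads);

    const std::uint64_t chunk = n / num_threads;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::uint64_t first = t * chunk + 1;
        const std::uint64_t last =
            (t + 1 == num_threads) ? n : (t + 1) * chunk;
        // Each worker writes only to its own slot, so no lock is needed.
        workers.emplace_back([&partial, t, first, last] {
            std::uint64_t s = 0;
            for (std::uint64_t i = first; i <= last; ++i) s += i;
            partial[t] = s;
        });
    }
    for (auto& w : workers) w.join();

    std::uint64_t total = 0;
    for (std::uint64_t p : partial) total += p;
    return total;
}
```

`sum_with_threads(1000, 3)` and `sum_with_threads(1000, 5)` both return 500500, the same as with 2 or 4 threads. Which count runs fastest is a separate question that only measurement on the target machine can answer.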
		