Author Topic: optimized sources  (Read 615791 times)

Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: optimized sources
« Reply #420 on: 28 Nov 2008, 09:29:20 am »
I tried to compile the fftw project with Parallel Composer but got an error: it expects a 64-bit project, but this one is Win32.
It looks like I must additionally install the Composer for 32-bit.
Please have a look at the result files from the fibonacci project; it is a 64-bit build compiled with Composer.

heinz

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: optimized sources
« Reply #421 on: 28 Nov 2008, 09:35:47 am »
Hi Heinz, Why did it choose 5 threads instead of 8 ?

Quote
Threads number is 5
The test before that is 4 and the next is 16, so it seems a bit weird.  I have a fibonacci example project I did here with TBB a while ago.  Will see if I can dig it out.

Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: optimized sources
« Reply #422 on: 28 Nov 2008, 09:46:15 am »
Hi Heinz, Why did it choose 5 threads instead of 8 ?

Quote
Threads number is 5
The test before that is 4 and the next is 16, so it seems a bit weird.  I have a fibonacci example project I did here with TBB a while ago.  Will see if I can dig it out.

5:8 should mean 5 (step 1) to 8. Hmmm, 1:4 did work, as you can see:
@echo off
echo please wait
echo fibonacci result in fibonacciopt_1000_out.txt
rem no second parameter means standard(1:4)
fibonacci.exe 1000 >fibonacciopt_1000_out.txt
echo fibonacci 1000 5:8
fibonacci.exe 1000 5:8 >>fibonacciopt_1000_out.txt
echo fibonacci 1000 16
fibonacci.exe 1000 16 >>fibonacciopt_1000_out.txt
echo fibonacci 1000 32
fibonacci.exe 1000 32 >>fibonacciopt_1000_out.txt
echo fibonacci 1000 64
fibonacci.exe 1000 64 >>fibonacciopt_1000_out.txt
echo ready

you can compare with the other result file to see differences
heinz

Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: optimized sources
« Reply #423 on: 28 Nov 2008, 10:08:00 am »
Looked it up: this works correctly with a single line:
fibonacci.exe 1000 1:64 >fibonacciopt_1000_out.txt
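
To illustrate the `low:high` thread argument discussed above, here is a minimal C++17 sketch. `ThreadRange`, `parse_thread_range` and `thread_counts` are hypothetical names for illustration, not code from the TBB sample. The doubling step is only an assumption: it would be consistent with `5:8` running 5 threads only and `1:4` ending at 4, but it has not been checked against the sample's source.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper type (not from the TBB sample): an inclusive thread range.
struct ThreadRange {
    int low;
    int high;
};

// Parses "N" (treated as N:N) or "low:high".
// Returns std::nullopt for malformed input or an empty/invalid range.
std::optional<ThreadRange> parse_thread_range(const std::string& arg) {
    try {
        std::size_t used = 0;
        const int low = std::stoi(arg, &used);
        if (low < 1) return std::nullopt;
        if (used == arg.size()) return ThreadRange{low, low};
        if (arg[used] != ':') return std::nullopt;

        const std::string rest = arg.substr(used + 1);
        std::size_t used_high = 0;
        const int high = std::stoi(rest, &used_high);
        if (used_high != rest.size() || high < low) return std::nullopt;
        return ThreadRange{low, high};
    } catch (const std::exception&) {
        // std::stoi throws on non-numeric or out-of-range input.
        return std::nullopt;
    }
}

// Lists the thread counts visited between low and high.
// doubling == true multiplies the count by 2 each step (an assumed policy);
// doubling == false adds 1 each step.
std::vector<int> thread_counts(ThreadRange r, bool doubling) {
    assert(r.low >= 1 && r.low <= r.high);
    std::vector<int> counts;
    for (int n = r.low; n <= r.high; n = doubling ? n * 2 : n + 1) {
        counts.push_back(n);
    }
    return counts;
}
```

Under the doubling assumption, `5:8` visits only 5 (the next value, 10, is past 8), while `1:64` visits 1, 2, 4, 8, 16, 32 and 64 in a single run. With a step of 1, `5:8` would visit 5, 6, 7 and 8.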

[attachment deleted by admin]
« Last Edit: 28 Nov 2008, 11:09:28 am by _heinz »

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: optimized sources
« Reply #424 on: 28 Nov 2008, 10:22:49 am »
Downloading Parallel Composer beta now, for cooperative exploration & development.  If we can do something with FFT & FFA, that would be good IMO for Astropulse, but there are strong possibilities for the Multibeam software as well (maybe more, because it already has a higher degree of serial optimisation)... Will see.... This beta software better not mess up my ICC/IPP installation!  :o

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: optimized sources
« Reply #425 on: 28 Nov 2008, 10:48:26 am »
Change of Plan:
Quote
  Intel(R) C++ Compiler 10.1 Integration(s) in Microsoft Visual Studio* is already installed.
Installation can continue; however, you will not be able to use the Intel C++ Compiler 10.1 or 9.0 within the Visual Studio IDE

So I will switch to my 64 bit boot drive, and install both 32 & 64 bit parallel composer there instead.

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: optimized sources
« Reply #426 on: 28 Nov 2008, 11:16:22 am »
Heinz, have 64 & 32 bit Parallel composer beta (update 2) installed .... Where can I find the fibonacci sample?  (the stuff I see here looks more boring)

Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: optimized sources
« Reply #427 on: 28 Nov 2008, 11:23:20 am »
Heinz, have 64 & 32 bit Parallel composer beta (update 2) installed .... Where can I find the fibonacci sample?  (the stuff I see here looks more boring)

fibonacci is part of TBB.
If TBB is installed, you have this sample.
It is in C:\I\SC\ITBB\examples\test_all\fibonacci\vc8,
or the analogous path under whatever install directory you chose.

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: optimized sources
« Reply #428 on: 28 Nov 2008, 11:24:13 am »
Hmmm, I have TBB on my other (32-bit) drive ... maybe I can install it here, will try.

[Hmmm, so parallel composer doesn't actually have TBB in it then ....  ??? that seems a bit odd, maybe they expect you'll use pure openmp.. what about IPP, I suppose that's not there either which would make this ICC 11 ?]

32 & 64 bit fibonacci sample built & ran.  Will consider fully migrating to 64 bit platform for holidays in a few weeks.  It'll be painful, but about time probably.

Jason
« Last Edit: 28 Nov 2008, 11:46:52 am by Jason G »

Leaps-from-Shadows

  • Guest
Re: optimized sources
« Reply #429 on: 28 Nov 2008, 11:49:09 am »
Quote
ahh.. cpu package.. 12 MB
This is wrong, unless you have a not-yet-released version of the Nehalem processor.

Currently released versions have 32k L1 instruction cache per core, 32k L1 data cache per core, 256k L2 cache per core, and 8MB shared L3 cache.  They only have four physical cores, so that's 128k L1 instruction cache total, 128k L1 data cache total, 1MB L2 cache total, and 8MB L3 cache.  The four HT virtual cores aren't physical cores, so they don't have cache of their own.
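
The per-core arithmetic above can be written out as a small C++ sketch. The struct and function names are made up for illustration; the figures are the ones quoted in this post, not measured values.

```cpp
#include <cassert>

// Per-core cache sizes in KB, plus the shared L3 in KB, as quoted in the post.
struct CacheSpecKB {
    int l1_instruction_per_core;
    int l1_data_per_core;
    int l2_per_core;
    int l3_shared;
};

// Package-wide totals in KB.
struct CacheTotalsKB {
    int l1_instruction;
    int l1_data;
    int l2;
    int l3;
};

// Per-core caches scale with the number of physical cores; the shared L3 does not.
// Hyper-Threading virtual cores are not counted because they have no cache of their own.
CacheTotalsKB total_cache(const CacheSpecKB& spec, int physical_cores) {
    assert(physical_cores > 0);
    return CacheTotalsKB{
        spec.l1_instruction_per_core * physical_cores,
        spec.l1_data_per_core * physical_cores,
        spec.l2_per_core * physical_cores,
        spec.l3_shared,
    };
}
```

Calling `total_cache({32, 32, 256, 8 * 1024}, 4)` gives 128 KB of L1 instruction cache, 128 KB of L1 data cache, 1024 KB (1 MB) of L2 and 8192 KB (8 MB) of L3, matching the totals listed above.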

I don't know how much difference this would make, but I thought I'd point it out anyway...
« Last Edit: 28 Nov 2008, 11:51:38 am by Leaps-from-Shadows »

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: optimized sources
« Reply #430 on: 28 Nov 2008, 11:52:39 am »
Hi Leaps! .. Nahhh .. It's 2xXeon Quads on a Skulltrail Mobo  :) [Heinz, Please check cache size with CPU-Z]

Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: optimized sources
« Reply #431 on: 28 Nov 2008, 12:01:02 pm »
Hmmm, I have TBB on my other (32-bit) drive ... maybe I can install it here, will try.

[Hmmm, so parallel composer doesn't actually have TBB in it then ....  ??? that seems a bit odd, maybe they expect you'll use pure openmp.. what about IPP, I suppose that's not there either which would make this ICC 11 ?]

32 & 64 bit fibonacci sample built & ran.  Will consider fully migrating to 64 bit platform for holidays in a few weeks.  It'll be painful, but about time probably.

Jason

I did upload the fibonacci project to our testproject
it is in /users/heinz/heinz_projects/fibonacci/vc8

As far as I have seen, IPP will be used if it is installed, but I must read the documentation to confirm it.
Looked it up now: --->
Vectorization and Loop Optimization
Vectorization detects patterns of sequential data accesses by the same instruction and transforms the code for SIMD execution, including use of the SSE, SSE2, SSE3, SSSE3, and SSE4 instruction sets.
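
As an illustration of the kind of loop that description refers to, here is a minimal C++ sketch. `saxpy` is a generic textbook kernel chosen for illustration, not code from the SETI sources. Whether a particular compiler actually emits SSE instructions for it depends on the compiler version, flags and target, so treat it as a candidate for vectorization rather than a guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// y[i] = a * x[i] + y[i]
// The same operation is applied to consecutive elements, and no iteration
// depends on the result of another, which is the access pattern an
// auto-vectorizer looks for.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    const float* xp = x.data();
    float* yp = y.data();
    const std::size_t n = y.size();
    for (std::size_t i = 0; i < n; ++i) {
        yp[i] = a * xp[i] + yp[i];
    }
}
```

A loop that carried a dependency between iterations, for example one where each element needs the previous element's result, would generally not be transformed this way.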
heinz
Did you see the speedup between plain TBB and the Composer with TBB? My test result files are up.  ;D
heinz
« Last Edit: 28 Nov 2008, 12:28:25 pm by _heinz »

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: optimized sources
« Reply #432 on: 28 Nov 2008, 12:03:35 pm »
@heinz: I didn't check yet ... wait up.. this is fun... will compare 32 bit Fibonacci here.

@Leaps: Will PM shortly about something

Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: optimized sources
« Reply #433 on: 28 Nov 2008, 12:20:50 pm »
Hi Leaps! .. Nahhh .. It's 2xXeon Quads on a Skulltrail Mobo  :) [Heinz, Please check cache size with CPU-Z]
there you can see CPUID
and here you can see CPUZ
« Last Edit: 28 Nov 2008, 12:36:31 pm by _heinz »

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: optimized sources
« Reply #434 on: 28 Nov 2008, 12:30:43 pm »
Ahhh, 6 meg per package ( 1.5 meg per core )... Okay, yep it is 12 meg total for the 8 cores.

Compared the 32-bit ICC 10.1 / TBB 2.0 build of fibonacci, and it IS slower than the Parallel Composer 32-bit build under XP64 ... Will have to try that build under XP32 to confirm, though.  I will probably update all my ICC/IPP base packages as soon as I get time, in a few weeks.

Jason

 
