Author Topic: optimized sources  (Read 623765 times)

Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: optimized sources
« Reply #615 on: 18 Sep 2010, 02:01:09 pm »
Hi Jason,
I found the issue. Because I had already installed (automatically) several security updates for VS2008, I had to run VS2008-PatchRemovalTool-x86 before installing VS2008 SP1.
After VS2008 SP1 installed successfully, I installed Parallel_Nsight_Host_Win32_1.0.10200 (Jul 2010)
and Parallel_Nsight_Monitor_Win32_1.0.10200 (Jul 2010).
I registered and got the Standard and Professional licenses.
The Professional license expires on October 1st, 2010.
The Professional license is activated now.
That is really a short period until October 1st, 2010; we do not have a lot of time.

Heinz

Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: optimized sources
« Reply #616 on: 22 Sep 2010, 06:02:56 pm »
The pleasure was a short one. When I profiled the FFT app, a DOS window opened and a crash occurred. VS2008 could no longer be used (a hardware reset was necessary). Because of this I uninstalled Nsight and the complete dev environment. I ordered a 2.5" SEAGATE Momentus XT 500GB 7200.1 32MB to boost the performance of the R3600, and "Acronis True Image Home 2011" to work in "Try and Decide" mode. This is necessary to avoid difficulties with the different compiler packages (Parallel Studio 2011 and ICC Professional).

heinz 
 

Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: optimized sources
« Reply #617 on: 27 Sep 2010, 09:07:00 pm »
Today I installed ICC (067) and successfully compiled ap rev 443 on the R3600 Atom.

heinz

Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: optimized sources
« Reply #618 on: 28 Sep 2010, 10:22:28 am »
Performance of the new hybrid disk on the R3600:
ST950056_20AS_readtest
FW:SD23
The disk is attached externally via eSATA in a "Revo Alu Guard" case
and used as a data storage and backup system.
Later I will use it to install W7 64-bit on partition 1 (180GB).
heinz

Offline cristipurdel

  • Knight o' The Realm
  • **
  • Posts: 123
Re: optimized sources
« Reply #619 on: 28 Sep 2010, 02:52:39 pm »
Question:
Is there an app that shows which application uses the different types of optimization?
For example, when I run seti stock, I want to see what optimization is in the application, and when I run the optimized application from lunatics installer, I want to see the optimizations (e.g. SSSE3, ...)

Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: optimized sources
« Reply #620 on: 28 Sep 2010, 04:10:15 pm »
@cristipurdel
As far as I know, the Unified Installer v0.37 has a CPU detection mechanism, but you can also choose your app and the optimization (SSSE3) yourself, as you can see there -->
Unified_Installer_v0.37
Perhaps Jason can tell you more about the mechanism of the installer.
Furthermore, you can see it by looking at the stderr output of the completed WUs on your host.
heinz

Offline cristipurdel

  • Knight o' The Realm
  • **
  • Posts: 123
Re: optimized sources
« Reply #621 on: 28 Sep 2010, 04:16:17 pm »
Sorry for not being more precise. I'm interested in a general application that can detect the CPU optimizations. The Lunatics installer was just an example.

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: optimized sources
« Reply #622 on: 28 Sep 2010, 04:18:16 pm »
SSE through SSE4.2 aren't 'optimisations', they are instruction sets. As such there usually aren't outward means of determining whether a given application uses certain instructions, though we usually put the maximum instruction set level (SSE level) in the file name.

Stock Multibeam uses internal benching/dispatching mechanisms to decide which functions to use. Typically an AK or BH variant is chosen from those, and the selected function is noted in the stderr output. Those are mostly SSE.

In many cases the instructions chosen represent microarchitectural optimisation, which is one level of optimisation that applies only to specific hardware. Most optimisations that provide the greatest benefit tend to be algorithmic (general) optimisations and are not dependent on the instructions used. In those cases there are no outward indications of the hardware required.

Different instructions from different SSE levels built into the microprocessors may or may not be useful for given code, and in most cases simply telling the compiler to use those instructions doesn't do a very good job (i.e. is not optimisation!)

If you really want to 'see' what instructions were used, then the most effective means I know of would be to use a debugger that shows a disassembly of the executable code; you could then look up the instructions in the CPU manufacturer's reference materials. Short of that, looking at the source code, if available, is never a bad idea IMO if you are curious (and quite a bit easier  ;)).

Jason
« Last Edit: 28 Sep 2010, 04:21:29 pm by Jason G »
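
The benching/dispatching idea described above for stock Multibeam can be sketched roughly as follows. This is only an illustration in Python, not the actual Multibeam code: the candidate functions `sum_loop` and `sum_builtin` and the helper `pick_fastest` are hypothetical names invented for this example. Every candidate computes the same result; the dispatcher times each one on a small test input and keeps the winner.

```python
import time

def sum_loop(data):
    """Plain-loop variant (stand-in for a generic code path)."""
    total = 0.0
    for x in data:
        total += x
    return total

def sum_builtin(data):
    """Built-in variant (stand-in for a tuned code path)."""
    return float(sum(data))

def pick_fastest(candidates, test_input, repeats=5):
    """Time every candidate on test_input; return (name, function) of the fastest."""
    best_name, best_func, best_time = None, None, float("inf")
    for name, func in candidates.items():
        start = time.perf_counter()
        for _ in range(repeats):
            func(test_input)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best_name, best_func, best_time = name, func, elapsed
    return best_name, best_func

candidates = {"sum_loop": sum_loop, "sum_builtin": sum_builtin}
test_input = [float(i) for i in range(10000)]
chosen_name, chosen_func = pick_fastest(candidates, test_input)
# Report the selection, much as the real app notes its choice in stderr.
print("selected:", chosen_name)
```

Which candidate wins depends on the machine it runs on, which is the point of benching at startup instead of hard-coding a choice.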

Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: optimized sources
« Reply #623 on: 28 Sep 2010, 04:29:19 pm »
Quote from: cristipurdel
Sorry for not being more precise. I'm interested in a general application that can detect the CPU optimizations. The Lunatics installer was just an example.

If you want to see which instruction sets (SSE, SSE2, SSE3, SSSE3, etc.) your CPU supports, you can use Everest Ultimate
or, very easily, CPU-Z.
~~~~~~~~
A typical report from Everest looks like this:
Field   Value
CPU Properties   
CPU Type   Intel Atom 230, 1600 MHz (12 x 133)
CPU Alias   Diamondville-SC
CPU Stepping   C0
Instruction Set   x86, x86-64, MMX, SSE, SSE2, SSE3, SSSE3
Original Clock   1600 MHz
Min / Max CPU Multiplier   6x / 12x
Engineering Sample   No
L1 Code Cache   32 KB
L1 Data Cache   24 KB
L2 Cache   512 KB  (On-Die, ECC, ASC, Full-Speed)
   
Multi CPU   
Motherboard ID   nVidia MCP79
CPU #1   Intel(R) Atom(TM) CPU 230 @ 1.60GHz, 1600 MHz
CPU #2   Intel(R) Atom(TM) CPU 230 @ 1.60GHz, 1600 MHz
   
CPU Physical Info   
Package Type   437 Ball FC-BGA
Package Size   2.2 cm x 2.2 cm
Transistors   47 million
Process Technology   45 nm, CMOS, Cu, High-K + Metal Gate
Die Size   25 mm2
Typical Power   4 W @ 1.60 GHz
   
CPU Manufacturer   
Company Name   Intel Corporation
Product Information   http://www.intel.com/products/processor
   
CPU Utilization   
CPU #1 / HTT Unit #1   0 %
CPU #1 / HTT Unit #2   0 %


heinz
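
As a small illustration of reading such a report, the instruction-set value can be checked against the SSE level that a given build needs. This is a sketch in Python; the function names `parse_instruction_sets` and `highest_sse_level` are made up for this example, and the ordering list covers only the SSE family mentioned in this thread.

```python
# SSE family in ascending order, as named in this thread (SSE through SSE4.2).
SSE_LEVELS = ["SSE", "SSE2", "SSE3", "SSSE3", "SSE4.1", "SSE4.2"]

def parse_instruction_sets(value):
    """Split a comma-separated instruction-set string into a set of names."""
    return {item.strip() for item in value.split(",") if item.strip()}

def highest_sse_level(supported):
    """Return the highest SSE level present in `supported`, or None."""
    best = None
    for level in SSE_LEVELS:
        if level in supported:
            best = level
    return best

# Value copied from the Atom 230 report above.
report_value = "x86, x86-64, MMX, SSE, SSE2, SSE3, SSSE3"
supported = parse_instruction_sets(report_value)
print(highest_sse_level(supported))   # SSSE3
print("SSE4.1" in supported)          # False
```

Note that this only tells you what the CPU supports. It says nothing about which instructions a particular application actually executes.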

Offline Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 14349
Re: optimized sources
« Reply #624 on: 28 Sep 2010, 04:55:03 pm »
A debugger, as Jason said, plus a profiler like VTune or CodeAnalyst. They show the "optimization level" in performance terms and are actually intended for "optimization level" assessment.

Offline cristipurdel

  • Knight o' The Realm
  • **
  • Posts: 123
Re: optimized sources
« Reply #625 on: 28 Sep 2010, 05:12:13 pm »

Quote from: Jason G
Different instructions from different SSE levels built into the microprocessors may or may not be useful for given code, and in most cases simply telling the compiler to use those instructions doesn't do a very good job (i.e. is not optimisation!)

I saw that some programs require Intel MKL to 'enhance' the computing capabilities and better use the 'optimizations' inside the processor. But when I saw this http://www.agner.org/optimize/blog/read.php?i=49#121 I wondered if there was any free version which could 'enhance' the MKL on my CPU, and not cripple the performance on an AMD CPU.

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: optimized sources
« Reply #626 on: 28 Sep 2010, 07:22:22 pm »
Quote from: cristipurdel
I saw that some programs require Intel MKL to 'enhance' the computing capabilities and better use the 'optimizations' inside the processor. But when I saw this http://www.agner.org/optimize/blog/read.php?i=49#121 I wondered if there was any free version which could 'enhance' the MKL on my CPU, and not cripple the performance on an AMD CPU.

Not this old chestnut again  ::) It's getting rather tired.

The suggestion there is that Intel's MKL library should be optimised for use on AMD CPUs. That's not something I would either expect or need, mostly because we don't use MKL, so I don't really care. What should really happen is that AMD should write their own compiler & libraries, rather than play dirty marketing tricks to fool the public that don't know about coding, compilers & microarchitecture.

They (AMD/ATI) have been trying the same garbage against nVidia too, and it fails... because their investment in software development and support for developers in general is very poor compared to both Intel and nVidia.

Agner Fog is a respected expert in CPU performance and criticises certain Intel tactics with their performance libraries. Those criticisms are well established and justified in certain contexts only, namely code that is not hand optimised, where developers use the compilers & libraries without knowing what's going on inside, expecting the best performance. These involve dispatch mechanisms we don't use in our builds, since they can result in less than optimal code paths for many CPUs in our target audience. Intel compilers produce the fastest Multibeam builds under Windows on AMD chips, provided dynamic dispatch is not used ... There is no 'crippling' going on here... though I would as always invite anyone to make faster builds for any platform.

Since we don't use the Intel compiler's dynamic dispatch mechanisms (which are subject to choosing code based on processor type), the builds do not run a generic px code path for AMD chips, and only have a single code path.

Optimisation that we do here is less a function of the compiler & more a function of 'hand rolling'. Expecting a compiler alone, whatever options & libraries are used, to do the best optimisation job is naive. Agner Fog's manuals detail several strategies for ensuring the right code is generated in builds here, and of those we use several. Unfortunately, even Intel's compilers with the workarounds applied don't magically overcome hardware CPU limitations.

Jason
« Last Edit: 28 Sep 2010, 08:19:33 pm by Jason G »

Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: optimized sources
« Reply #627 on: 29 Sep 2010, 05:00:33 pm »
published 09/28/2010
CUDA Toolkit 3.2 RC (September 2010)
New and Improved CUDA Libraries
(now include Fermi architecture GPUs)
It's worth having a look there:
http://developer.nvidia.com/object/cuda_3_2_toolkit_rc.html

heinz

Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: optimized sources
« Reply #628 on: 04 Oct 2010, 03:53:10 pm »
3.2 is installed now and running


 CUDA Device Query (Runtime API) version (CUDART static linking)

There is 1 device supporting CUDA

Device 0: "ION"
  CUDA Driver Version:                           3.20
  CUDA Runtime Version:                          3.20
  CUDA Capability Major/Minor version number:    1.1
  Total amount of global memory:                 253296640 bytes
  Multiprocessors x Cores/MP = Cores:            2 (MP) x 8 (Cores/MP) = 16 (Cores)
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       16384 bytes
  Total number of registers available per block: 8192
  Warp size:                                     32
  Maximum number of threads per block:           512
  Maximum sizes of each dimension of a block:    512 x 512 x 64
  Maximum sizes of each dimension of a grid:     65535 x 65535 x 1
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             256 bytes
  Clock rate:                                    1.10 GHz
  Concurrent copy and execution:                 No
  Run time limit on kernels:                     Yes
  Integrated:                                    Yes
  Support host page-locked memory mapping:       Yes
  Compute mode:                                  Default (multiple host threads can use this device simultaneously)
  Concurrent kernel execution:                   No
  Device has ECC support enabled:                No
  Device is using TCC driver mode:               No

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 3.20, CUDA Runtime Version = 3.20, NumDevs = 1, Device = ION


PASSED
~~~~~~~~~~~~~~~~
and BOINC shows:
04.10.2010 21:26:28      NVIDIA GPU 0: ION (driver version 26061, CUDA version 3020, compute capability 1.1, 242MB, 35 GFLOPS peak)

heinz
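
The "35 GFLOPS peak" figure can be roughly reproduced from the deviceQuery output. The sketch below assumes each core retires one multiply-add (2 floating-point operations) per clock; that factor is an assumption on my part and is not stated anywhere in the output above. The function name `peak_gflops` is hypothetical.

```python
def peak_gflops(multiprocessors, cores_per_mp, clock_ghz, flops_per_clock=2):
    """Theoretical peak in GFLOPS: cores x clock x FLOPs per core per clock.

    flops_per_clock=2 assumes one multiply-add per core per clock (an assumption).
    """
    return multiprocessors * cores_per_mp * clock_ghz * flops_per_clock

# Values from the deviceQuery output: 2 MP x 8 cores/MP at 1.10 GHz.
ion_peak = peak_gflops(2, 8, 1.10)
print(f"{ion_peak:.1f} GFLOPS")

# Global memory in MB (1 MB = 1024 * 1024 bytes), for comparison with BOINC's 242MB.
memory_mb = 253296640 / (1024 * 1024)
print(f"{memory_mb:.1f} MB")
```

Under that assumption the result is about 35.2 GFLOPS, in line with what BOINC reports, and the memory figure rounds to the 242MB shown.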

Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: optimized sources
« Reply #629 on: 05 Oct 2010, 11:21:15 am »
CUDA 3.20 does not meet our expectations on this ION chipset.
ICC067: with CUDA3020 we get -3% against Composer update 6 (CUDA3000).
If we use MKL (parallel) we reach nearly the same as our reference (CUDA3000).
PS2011: the biggest speedup, +10.31%, with CUDA3010 and Parallel Studio 2011.
So the best is to wait until CUDA 3.20 is out of beta.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
========================
gcomp_u6_fft.exe
AppName: gcomp_u6_fft.exe
Started at  : 16:41:31.760
Ended at    : 16:42:37.436
     65.520 secs Elapsed
     64.584 secs CPU time
------------------------
g067_cuda32_fft.exe
AppName: g067_cuda32_fft.exe
Started at  : 16:42:37.561
Ended at    : 16:43:44.360
     66.659 secs Elapsed
     66.519 secs CPU time
Speedup     : -3.00%
Ratio       : 0.97 x
------------------------
g067_mklp_fft.exe
AppName: g067_mklp_fft.exe
Started at  : 16:43:44.672
Ended at    : 16:44:49.194
     64.381 secs Elapsed
     64.179 secs CPU time
Speedup     : 0.63%
Ratio       : 1.01 x
------------------------
g067_mkls_fft.exe
AppName: g067_mkls_fft.exe
Started at  : 16:44:49.412
Ended at    : 16:45:56.149
     66.596 secs Elapsed
     66.394 secs CPU time
Speedup     : -2.80%
Ratio       : 0.97 x
------------------------
g2011_fft.exe
AppName: g2011_fft.exe
Started at  : 16:45:56.399
Ended at    : 16:46:54.743
     58.219 secs Elapsed
     57.923 secs CPU time
Speedup     : 10.31%
Ratio       : 1.11 x
------------------------
g2011_SSSE3_fft.exe
AppName: g2011_SSSE3_fft.exe
Started at  : 16:46:54.977
Ended at    : 16:47:53.524
     58.422 secs Elapsed
     59.218 secs CPU time
Speedup     : 8.31%
Ratio       : 1.09 x
------------------------
 
Quick timetable
--------------------------------------
gcomp_u6_fft.exe : 64.584 secs CPU
Result      : stored as reference.
--------------------------------------
g067_cuda32_fft.exe : 66.519 secs CPU
Speedup     : -3.00%
Ratio       : 0.97 x
--------------------------------------
g067_mklp_fft.exe : 64.179 secs CPU
Speedup     : 0.63%
Ratio       : 1.01 x
--------------------------------------
g067_mkls_fft.exe : 66.394 secs CPU
Speedup     : -2.80%
Ratio       : 0.97 x
--------------------------------------
g2011_fft.exe : 57.923 secs CPU
Speedup     : 10.31%
Ratio       : 1.11 x
--------------------------------------
g2011_SSSE3_fft.exe : 59.218 secs CPU
Speedup     : 8.31%
Ratio       : 1.09 x
--------------------------------------
 
 
------------------------
CPU:
Number of processors   1
Number of cores      1 (max 1)
Specification      Intel(R) Atom(TM) CPU  230   @ 1.60GHz (Engineering Sample)
Codename      Silverthorne
Core Speed      1600.1 MHz (12.0 x 133.3 MHz)
Core Stepping      C0
Technology      45 nm
Stock frequency      1666 MHz
------------------------
Chipset:
Northbridge      NVIDIA ID0A82 rev. B1
Southbridge      NVIDIA ID0AAD rev. B2
------------------------
RAM:
Memory Type      
Memory Size      1792 MBytes
------------------------
OS:
Windows Version      Microsoft Windows Vista (6.1) Home Premium Edition   (Build 7600)
========================
heinz
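
The Speedup and Ratio values in the log are consistent with the following formulas applied to the CPU times, with gcomp_u6_fft.exe as the reference. This is inferred from the numbers in the log, not taken from the benchmark tool's source; `speedup_percent` and `ratio` are names chosen for this sketch.

```python
def speedup_percent(reference_secs, candidate_secs):
    """Percentage of CPU time saved relative to the reference (negative = slower)."""
    return (reference_secs - candidate_secs) / reference_secs * 100.0

def ratio(reference_secs, candidate_secs):
    """How many times faster the candidate is than the reference."""
    return reference_secs / candidate_secs

reference = 64.584  # gcomp_u6_fft.exe, CPU seconds
cpu_times = {
    "g067_cuda32_fft.exe": 66.519,
    "g067_mklp_fft.exe": 64.179,
    "g067_mkls_fft.exe": 66.394,
    "g2011_fft.exe": 57.923,
    "g2011_SSSE3_fft.exe": 59.218,
}
for name, secs in cpu_times.items():
    print(f"{name}: {speedup_percent(reference, secs):.2f}%  {ratio(reference, secs):.2f} x")
```

Note that the comparison uses CPU time, not elapsed time, which is why g2011_SSSE3_fft.exe shows a lower speedup than g2011_fft.exe even though their elapsed times are nearly identical.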

 
