Author Topic: optimized sources  (Read 548921 times)

Gecko_R7

  • Guest
Re: optimized sources
« Reply #540 on: 04 Jan 2010, 06:14:21 pm »
Full Atom run attached w/ result files in 7zip.
Strange that the ATOM switch shows "slower" on the DMH1023 WU.  :-\
Not sure I trust that result.
Also noticed a missing WU.

Gonna re-run.... ::)
 
WU : DMH1023rr_ap_21oc08ab_B2_P0_00081_20081130_08605.dat
ap_5.05r168_SSE3.exe : 2207.679 secs CPU
ap_5.05r303_ATOM_ICC_Qopt.exe : 1791.094 secs CPU
Speedup     : 18.87%
Ratio       : 1.23 x
ap_5.05r303_SSE3_ICC_Qopt.exe : 1664.967 secs CPU
Speedup     : 24.58%
Ratio       : 1.33 x
 
WU : JasonMediumrr.dat
ap_5.05r168_SSE3.exe : 11137.676 secs CPU
ap_5.05r303_ATOM_ICC_Qopt.exe : 6406.134 secs CPU
Speedup     : 42.48%
Ratio       : 1.74 x
ap_5.05r303_SSE3_ICC_Qopt.exe : 6412.452 secs CPU
Speedup     : 42.43%
Ratio       : 1.74 x
 
WU : JasonShortrr.dat
ap_5.05r168_SSE3.exe : 3823.569 secs CPU
ap_5.05r303_ATOM_ICC_Qopt.exe : 3199.768 secs CPU
Speedup     : 16.31%
Ratio       : 1.19 x
ap_5.05r303_SSE3_ICC_Qopt.exe : 3213.012 secs CPU
Speedup     : 15.97%
Ratio       : 1.19 x
 
WU : Raistmer_tinyrr.dat
ap_5.05r168_SSE3.exe : 941.263 secs CPU
ap_5.05r303_ATOM_ICC_Qopt.exe : 802.922 secs CPU
Speedup     : 14.70%
Ratio       : 1.17 x
ap_5.05r303_SSE3_ICC_Qopt.exe : 806.447 secs CPU
Speedup     : 14.32%
Ratio       : 1.17 x
 
WU : sigindrr.dat
ap_5.05r168_SSE3.exe : 5168.859 secs CPU
ap_5.05r303_ATOM_ICC_Qopt.exe : 3982.191 secs CPU
Speedup     : 22.96%
Ratio       : 1.30 x
ap_5.05r303_SSE3_ICC_Qopt.exe : 4041.658 secs CPU
Speedup     : 21.81%
Ratio       : 1.28 x

[attachment deleted by admin]
« Last Edit: 04 Jan 2010, 06:31:07 pm by Gecko_R7 »

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: optimized sources
« Reply #541 on: 05 Jan 2010, 03:43:59 am »
...
Strange that the ATOM switch shows "slower" on the DMH1023 WU.  :-\
Not sure I trust that result.
...
Well, a few ideas on that run:
  - DMH1023 is a weird one, with lots of blanking & early signals IIRC....
  - The ATOM_QOpt build, being the first run in the Science Apps folder, would likely have been generating the FFTW wisdom (which takes time of course), and subsequent builds/runs might have benefited from that one-off cost.
  - I'm still trying to work out a few things about the characteristics of the newer ICC optimisations, which suggest that targeted switches are likely not operative on the hot code regions.  Targeted platform builds ( such as /QxSSE4.1.... ) seem to perform worse than generic arch:sse3.  That could indicate a combination of hand optimisations confounding/blocking the compiler automation, and/or a need to adjust Joe's excellent hand SSE code per platform ( of which there are a few fairly straightforward parameters clearly set for P3-P3 at the moment )

Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: optimized sources
« Reply #542 on: 05 Jan 2010, 06:34:20 pm »
R3600 ATOM

100.000 credits today  ;D

First seen on 2009-11-08 06:38:13
Current Credit (based on incremental update) 100,105.20
Recent average credit RAC (projects accumulated) 1,934.37570
mostly crunched collatz on the ION chip
cpu run empty.....

see full statistic of host R3600 6187800

In summary, it crunches 50000 per month and gets a RAC of ~2000 running collatz on the ION, with the CPU running empty.
For ~4 days I used the machine and switched BOINC off.

happy crunching  ;D


Gecko_R7

  • Guest
Re: optimized sources
« Reply #543 on: 05 Jan 2010, 07:54:20 pm »
Re-run of Atom N270 results attached.
Summary below.

On this run, the 1LC25 WU was the first one and ATOM_ICC_Qopt was slower.
However, the 08605 WU was next & showed the ATOM build faster.
On my previous run, the 08605 WU was run first, and the ATOM build was slower on it, just as it is on the first WU in these results.

There does seem to be a slow-down on the first WU run which makes ATOM_ICC times longer.
So, perhaps Wisdom gen time does have a noticeable impact?  :-\


Quick timetable
 
WU : ap_18se08aa_B6_P1_00046_1LC25.dat
ap_5.05r168_SSE3.exe : 2403.913 secs CPU
ap_5.05r303_ATOM_ICC_Qopt.exe : 2163.079 secs CPU
Speedup     : 10.02%
Ratio       : 1.11 x
ap_5.05r303_SSE3_ICC_Qopt.exe : 1919.093 secs CPU
Speedup     : 20.17%
Ratio       : 1.25 x
 
WU : DMH1023rr_ap_21oc08ab_B2_P0_00081_20081130_08605.dat
ap_5.05r168_SSE3.exe : 1952.649 secs CPU
ap_5.05r303_ATOM_ICC_Qopt.exe : 1671.145 secs CPU
Speedup     : 14.42%
Ratio       : 1.17 x
ap_5.05r303_SSE3_ICC_Qopt.exe : 1675.482 secs CPU
Speedup     : 14.19%
Ratio       : 1.17 x
 
WU : JasonMediumrr.dat
ap_5.05r168_SSE3.exe : 13857.850 secs CPU
ap_5.05r303_ATOM_ICC_Qopt.exe : 6451.858 secs CPU
Speedup     : 53.44%
Ratio       : 2.15 x
ap_5.05r303_SSE3_ICC_Qopt.exe : 6548.376 secs CPU
Speedup     : 52.75%
Ratio       : 2.12 x
 
WU : JasonShortrr.dat
ap_5.05r168_SSE3.exe : 3752.620 secs CPU
ap_5.05r303_ATOM_ICC_Qopt.exe : 3227.926 secs CPU
Speedup     : 13.98%
Ratio       : 1.16 x
ap_5.05r303_SSE3_ICC_Qopt.exe : 3236.210 secs CPU
Speedup     : 13.76%
Ratio       : 1.16 x
 
WU : Raistmer_tinyrr.dat
ap_5.05r168_SSE3.exe : 1186.544 secs CPU
ap_5.05r303_ATOM_ICC_Qopt.exe : 810.191 secs CPU
Speedup     : 31.72%
Ratio       : 1.46 x
ap_5.05r303_SSE3_ICC_Qopt.exe : 813.795 secs CPU
Speedup     : 31.41%
Ratio       : 1.46 x
 
WU : sigindrr.dat
ap_5.05r168_SSE3.exe : 5153.165 secs CPU
ap_5.05r303_ATOM_ICC_Qopt.exe : 4008.071 secs CPU
Speedup     : 22.22%
Ratio       : 1.29 x
ap_5.05r303_SSE3_ICC_Qopt.exe : 4072.968 secs CPU
Speedup     : 20.96%
Ratio       : 1.27 x
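
For anyone who wants to recompute the Speedup and Ratio lines by hand: the figures in the timetable are consistent with speedup = (1 - new/old) x 100 and ratio = old/new. That is inferred from the numbers, not taken from the benchmark script itself, and the function names below are made up for illustration. A minimal Python sketch:

```python
# Sketch only: formulas inferred from the timetable figures, not from the
# actual benchmark script. Function names are hypothetical.

def speedup_percent(baseline_secs: float, candidate_secs: float) -> float:
    """Percentage of CPU time saved by the candidate build vs the baseline."""
    return (1.0 - candidate_secs / baseline_secs) * 100.0


def ratio(baseline_secs: float, candidate_secs: float) -> float:
    """How many times faster the candidate build is than the baseline."""
    return baseline_secs / candidate_secs


if __name__ == "__main__":
    # CPU seconds from the 1LC25 WU in the timetable above
    baseline = 2403.913   # ap_5.05r168_SSE3.exe
    atom = 2163.079       # ap_5.05r303_ATOM_ICC_Qopt.exe
    print(f"Speedup     : {speedup_percent(baseline, atom):.2f}%")
    print(f"Ratio       : {ratio(baseline, atom):.2f} x")
```

Note that this compares total CPU time, so any one-off wisdom generation in the first run is counted against whichever build happens to run first.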

[attachment deleted by admin]

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: optimized sources
« Reply #544 on: 05 Jan 2010, 08:01:17 pm »
...
So, perhaps Wisdom gen time does have a noticeable impact?  :-\
...
  It certainly can, and is probably the case here.  Some platforms seem to converge quite quickly on wisdom, some take longer.  I reckon it depends on how fftw arranged the heuristics in that initialisation, and on whether it finds the best codelet sequences sooner or later within the allowed time limits.

To confirm wisdom impact, take a look at the counters in stderr.txt.  The Init component will contain any wisdom generation, while the crunch time is just that.  The additional ffa counter is a subcomponent of crunching that Joe's been doing lots of work on recently.

Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: optimized sources
« Reply #545 on: 18 Jan 2010, 08:15:30 pm »
some more interesting results
Quick timetable

WU : ap_18se08aa_B6_P1_00046_1LC25.wu
ap_5.05r168_SSE3.exe : 718.266 secs CPU
ap_5.05r309_SSSE3_ICC_CSP_QxSSSE3.exe : 357.641 secs CPU
Speedup     : 50.21%
Ratio       : 2.01 x

WU : Raistmer's_tiny.wu
ap_5.05r168_SSE3.exe : 275.047 secs CPU
ap_5.05r309_SSSE3_ICC_CSP_QxSSSE3.exe : 135.047 secs CPU
Speedup     : 50.90%
Ratio       : 2.04 x

WU : sigind_v5.wu
ap_5.05r168_SSE3.exe : 1073.109 secs CPU
ap_5.05r309_SSSE3_ICC_CSP_QxSSSE3.exe : 782.500 secs CPU
Speedup     : 27.08%
Ratio       : 1.37 x
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
some more tests needed to confirm it

Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: optimized sources
« Reply #546 on: 23 Jan 2010, 08:26:26 am »
Among all the compiled apps it's sometimes difficult to choose the right one for a specific processor.
The question is always:
What's the best option for my processor?

SSE4.2
Intel® Core™ i7 Processors
Intel® Xeon® 55XX series
 
SSE4.1
Intel® Xeon® 74XX series
Quad-Core Intel® Xeon 54XX, 33XX series
Dual-Core Intel® Xeon 52XX, 31XX series
Intel® Core™ 2 Extreme 9XXX series
Intel® Core™ 2 Quad 9XXX series
Intel® Core™ 2 Duo 8XXX series
Intel® Core™ 2 Duo E7200

SSSE3
Quad-Core Intel® Xeon® 73XX, 53XX, 32XX series
Dual-Core Intel® Xeon® 72XX, 53XX, 51XX, 30XX series
Intel® Core™ 2 Extreme 7XXX, 6XXX series
Intel® Core™ 2 Quad 6XXX series
Intel® Core™ 2 Duo 7XXX (except E7200), 6XXX, 5XXX, 4XXX series
Intel® Core™ 2 Solo 2XXX series
Intel® Pentium® dual-core processor E2XXX, T23XX series
 
SSE3_ATOM
Intel® ATOM™ Processor only (not usable for any other Processor)
 
SSE3
Dual-Core Intel® Xeon® 70XX, 71XX, 50XX Series
Dual-Core Intel® Xeon® processor (ULV and LV) 1.66, 2.0, 2.16
Dual-Core Intel® Xeon® 2.8
Intel® Xeon® processors with SSE3 instruction set support
Intel® Core™ Duo
Intel® Core™ Solo
Intel® Pentium® dual-core processor T21XX, T20XX series
Intel® Pentium® processor Extreme Edition
Intel® Pentium® D
Intel® Pentium® 4 processors with SSE3 instruction set support

SSE2(default)
Intel® Xeon® processors
Intel® Pentium® 4 processors
Intel® Pentium® M
 
IA32
Intel® Pentium® III Processor
Intel® Pentium® II Processor
Intel® Pentium® Processor

------------------------------------------------------------------
Which processor is targeted by default?

On IA-32 systems running Windows* and Linux*, /arch:SSE2 is on by default.
The resulting code path should run on the Intel Pentium 4 and Intel Xeon processors with SSE2 support
and other later Intel processors or compatible non-Intel processors with SSE2 support.
Apps compiled with /arch:IA32 are special builds for the early Pentium® processors (PIII, PII, Pentium®).

You can run CPU-Z to see which instruction sets your processor supports.
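
The table above boils down to "pick the highest instruction set your CPU supports, and use the ATOM build only on an Atom". A rough Python sketch of that selection follows. The flag spellings (sse4_2, sse4_1, ssse3, pni for SSE3, sse2) are an assumption borrowed from what Linux shows in /proc/cpuinfo, and the function name and the is_atom parameter are made up for illustration; this is not part of any installer.

```python
# Sketch only: maps a set of CPU feature flags to a build name from the
# table above. Flag spellings follow Linux /proc/cpuinfo (an assumption);
# choose_build and is_atom are hypothetical names.

# Highest instruction set first, so the first match wins.
BUILD_PRIORITY = [
    ("sse4_2", "SSE4.2"),
    ("sse4_1", "SSE4.1"),
    ("ssse3", "SSSE3"),
    ("pni", "SSE3"),      # Linux reports SSE3 as "pni"
    ("sse2", "SSE2"),
]


def choose_build(cpu_flags, is_atom=False):
    """Return the build name from the table that best matches the CPU."""
    flags = {flag.lower() for flag in cpu_flags}
    # The ATOM build is listed as Atom-only, so it is checked separately.
    if is_atom and "pni" in flags:
        return "SSE3_ATOM"
    for flag, build in BUILD_PRIORITY:
        if flag in flags:
            return build
    # No SSE2 or better: fall back to the IA32 build (PIII and earlier).
    return "IA32"


if __name__ == "__main__":
    # Example: a CPU reporting SSE2, SSE3 and SSSE3 but not SSE4.x
    print(choose_build({"sse", "sse2", "pni", "ssse3"}))
```

As the Atom timetables earlier in the thread show, the "best" build on paper is not always the fastest in practice, so a quick bench run is still worth doing.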

Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: optimized sources
« Reply #547 on: 31 Jan 2010, 02:41:46 pm »
By the way, today I got 7 Mio total credit and 2 Mio collatz.   ;D
Current Credit (based on incremental update) 7,028,888.84
--> full statistic

Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: optimized sources
« Reply #548 on: 20 Feb 2010, 07:44:56 pm »
I ran an Astropulse WU on my old P4 2.6 GHz in about 110874 seconds.
My wingman, running the stock app on a P4 3.2 GHz, needed about 265779 seconds.
The WU has 0% blanking!
Now everybody can do their own calculation.
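
For those who want to do that calculation, here is a small Python sketch. The plain ratio just divides the two run times. The per-clock figure additionally scales by clock speed, which assumes run time scales linearly with GHz and ignores any differences in P4 core revision, cache or memory, so treat it as a rough indication only. The function names are made up for illustration.

```python
# Sketch only: compares the optimized run with the wingman's stock run.
# The per-clock normalization is a rough assumption (linear scaling with
# clock speed); function names are hypothetical.

def time_ratio(opt_secs: float, stock_secs: float) -> float:
    """How many times faster the optimized run finished."""
    return stock_secs / opt_secs


def per_clock_ratio(opt_secs: float, opt_ghz: float,
                    stock_secs: float, stock_ghz: float) -> float:
    """Same ratio, but comparing seconds x GHz instead of raw seconds."""
    return (stock_secs * stock_ghz) / (opt_secs * opt_ghz)


if __name__ == "__main__":
    opt_secs, opt_ghz = 110874.0, 2.6      # optimized app, P4 2.6 GHz
    stock_secs, stock_ghz = 265779.0, 3.2  # stock app, P4 3.2 GHz

    print(f"Time ratio      : {time_ratio(opt_secs, stock_secs):.2f} x")
    print(f"Per-clock ratio : "
          f"{per_clock_ratio(opt_secs, opt_ghz, stock_secs, stock_ghz):.2f} x")
```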
~~~~~~~~~~~~~~~~~~~~~~~~~~
thanks to all readers of this epic thread, we have now more than 72 000 hits.

regards  ;D



Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: optimized sources
« Reply #549 on: 21 Feb 2010, 02:02:13 pm »
wow, 8,012,203.19 total today (need 21 days for the last Mio)
have a look here
let's keep crunching and have fun  ;D

Offline sunu

  • Alpha Tester
  • Knight who says 'Ni!'
  • ***
  • Posts: 771
Re: optimized sources
« Reply #550 on: 21 Feb 2010, 02:11:01 pm »
For 10? I think you need two million. Oh, and I thought you were German.  :)
Congrats anyway.

Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: optimized sources
« Reply #551 on: 21 Feb 2010, 05:18:57 pm »
For 10? I think you need two million. Oh, and I thought you were German.  :)
Congrats anyway.
10 mio ? -->Target will be reached in 41.63 days on April 4 2010 (if i have no hardware outage)
 :)

Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: optimized sources
« Reply #552 on: 23 Feb 2010, 05:24:05 pm »
I now have "Intel Compiler Suite" and "Parallel Studio" ( Composer update5 ) installed side by side in my dev environment.
This way we can easily switch between the different compiler packages in our projects as needed.



Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: optimized sources
« Reply #553 on: 06 Mar 2010, 07:29:32 pm »
Everyday update...
I switched off my wireless network and installed some 200 Mbps CPL HomePlug adapters with a 3-port switch.
There were too many access points in my environment using the same channel as mine. This reduced the bandwidth to 34 Mbps, so I needed days for my big software downloads of several gigabytes. I was tired of looking for another free channel every day.
For fun I installed a USB HDTV stick on the R3600 to watch the Olympic events in HDTV quality. It worked great.
-----------------
Last week I installed a complete developer environment on my R3600 Atom.(no VM)
OS: Vista32
Parallel Studio (update5)
Intel Compiler Suite (update5)
A first complex project was tested successfully, which shows that the test and development environment works.
I still have to do the updates on my VMs.

I did not upgrade the R3600 to W7.
But perhaps jason can tell us if his dev environment works on W7.

I had no luck with new Astropulse WUs; I did not get any of them, so we are playing the waiting game..


Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: optimized sources
« Reply #554 on: 10 Mar 2010, 08:22:42 am »
It looks like we have now established the 50% speedup over ap_5.05r168_SSE3, as our latest tests with ap_5.05r309 show.
More tests on different hardware platforms are needed to confirm it.
 :)

 
