optimized sources
Gecko_R7:
Full Atom run attached w/ result files in 7zip.
Strange that the ATOM switch shows "slower" on the DMH1023 WU. :-\
Not sure I trust that result.
Also noticed a missing WU.
Gonna re-run.... ::)
WU : DMH1023rr_ap_21oc08ab_B2_P0_00081_20081130_08605.dat
ap_5.05r168_SSE3.exe : 2207.679 secs CPU
ap_5.05r303_ATOM_ICC_Qopt.exe : 1791.094 secs CPU
Speedup : 18.87%
Ratio : 1.23 x
ap_5.05r303_SSE3_ICC_Qopt.exe : 1664.967 secs CPU
Speedup : 24.58%
Ratio : 1.33 x
WU : JasonMediumrr.dat
ap_5.05r168_SSE3.exe : 11137.676 secs CPU
ap_5.05r303_ATOM_ICC_Qopt.exe : 6406.134 secs CPU
Speedup : 42.48%
Ratio : 1.74 x
ap_5.05r303_SSE3_ICC_Qopt.exe : 6412.452 secs CPU
Speedup : 42.43%
Ratio : 1.74 x
WU : JasonShortrr.dat
ap_5.05r168_SSE3.exe : 3823.569 secs CPU
ap_5.05r303_ATOM_ICC_Qopt.exe : 3199.768 secs CPU
Speedup : 16.31%
Ratio : 1.19 x
ap_5.05r303_SSE3_ICC_Qopt.exe : 3213.012 secs CPU
Speedup : 15.97%
Ratio : 1.19 x
WU : Raistmer_tinyrr.dat
ap_5.05r168_SSE3.exe : 941.263 secs CPU
ap_5.05r303_ATOM_ICC_Qopt.exe : 802.922 secs CPU
Speedup : 14.70%
Ratio : 1.17 x
ap_5.05r303_SSE3_ICC_Qopt.exe : 806.447 secs CPU
Speedup : 14.32%
Ratio : 1.17 x
WU : sigindrr.dat
ap_5.05r168_SSE3.exe : 5168.859 secs CPU
ap_5.05r303_ATOM_ICC_Qopt.exe : 3982.191 secs CPU
Speedup : 22.96%
Ratio : 1.30 x
ap_5.05r303_SSE3_ICC_Qopt.exe : 4041.658 secs CPU
Speedup : 21.81%
Ratio : 1.28 x
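For anyone reading the table: the Speedup and Ratio lines appear to be consistent with "fraction of baseline CPU time saved" and "baseline time divided by new time" respectively. The sketch below is an assumption inferred from the posted numbers, not the benchmark script's actual code; the function name `speedup_and_ratio` is made up for illustration.

```python
def speedup_and_ratio(baseline_secs, candidate_secs):
    """Return (speedup in percent, ratio) of a candidate build vs. a baseline.

    Assumed formulas (inferred from the posted figures, not from the script):
      speedup = (1 - candidate / baseline) * 100
      ratio   = baseline / candidate
    """
    speedup = (1.0 - candidate_secs / baseline_secs) * 100.0
    ratio = baseline_secs / candidate_secs
    return speedup, ratio


# CPU times for the DMH1023 WU, copied from the table above.
baseline = 2207.679  # ap_5.05r168_SSE3.exe
builds = {
    "ap_5.05r303_ATOM_ICC_Qopt.exe": 1791.094,
    "ap_5.05r303_SSE3_ICC_Qopt.exe": 1664.967,
}

for name, secs in builds.items():
    speedup, ratio = speedup_and_ratio(baseline, secs)
    print(f"{name} : Speedup {speedup:.2f}%  Ratio {ratio:.2f} x")
```

Under these assumed formulas the two DMH1023 rows come out at 18.87% / 1.23 x and 24.58% / 1.33 x, matching the posted values.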
[attachment deleted by admin]
Jason G:
--- Quote from: Gecko_R7 on 04 Jan 2010, 06:14:21 pm ---...
Strange that the ATOM switch shows "slower" on the DMH1023 WU. :-\
Not sure I trust that result.
...
--- End quote ---
Well, a few ideas on that run:
- DMH1023 is a weird one, with lots of blanking & early signals IIRC....
- The ATOM_QOpt build, being the first one run in the Science Apps folder, was likely generating the FFTW wisdom (which takes time, of course), and subsequent builds/runs might have benefited from that one-off cost.
- I'm still trying to work out a few things about the characteristics of the newer ICC optimisations, which suggest that targeted switches are likely not operative on the hot code regions. Targeted platform builds (such as /QxSSE4.1....) seem to perform worse than generic arch:sse3. That could indicate a combination of hand optimisations confounding/blocking the compiler automation, and/or a need to adjust Joe's excellent hand SSE code per platform (of which there are a few fairly straightforward parameters clearly set for P3-P3 at the moment).
_heinz:
R3600 ATOM
100,000 credits today ;D
First seen on 2009-11-08 06:38:13
Current Credit (based on incremental update) 100,105.20
Recent average credit RAC (projects accumulated) 1,934.37570
Mostly crunched Collatz on the ION chip;
the CPU ran idle.....
See the full statistics for host R3600 6187800.
In summary, it crunches about 50,000 credits per month and gets a RAC of ~2000 running Collatz on the ION, with the CPU idle..
For ~4 days I used the machine myself and switched BOINC off.
happy crunching ;D
Gecko_R7:
Re-run of Atom N270 results attached.
Summary below.
On this run, the 1LC25 WU was the first one, and ATOM_ICC_Qopt was slower on it.
However, the 08605 WU was next & showed the Atom build faster.
On my previous run, the 08605 WU was the first one run, and ATOM_ICC_Qopt was slower on it, just like the first WU here.
There does seem to be a slow-down on the first WU run, which makes the ATOM_ICC times longer.
So, perhaps Wisdom gen time does have a noticeable impact? :-\
--- Quote ---
Quick timetable
WU : ap_18se08aa_B6_P1_00046_1LC25.dat
ap_5.05r168_SSE3.exe : 2403.913 secs CPU
ap_5.05r303_ATOM_ICC_Qopt.exe : 2163.079 secs CPU
Speedup : 10.02%
Ratio : 1.11 x
ap_5.05r303_SSE3_ICC_Qopt.exe : 1919.093 secs CPU
Speedup : 20.17%
Ratio : 1.25 x
WU : DMH1023rr_ap_21oc08ab_B2_P0_00081_20081130_08605.dat
ap_5.05r168_SSE3.exe : 1952.649 secs CPU
ap_5.05r303_ATOM_ICC_Qopt.exe : 1671.145 secs CPU
Speedup : 14.42%
Ratio : 1.17 x
ap_5.05r303_SSE3_ICC_Qopt.exe : 1675.482 secs CPU
Speedup : 14.19%
Ratio : 1.17 x
WU : JasonMediumrr.dat
ap_5.05r168_SSE3.exe : 13857.850 secs CPU
ap_5.05r303_ATOM_ICC_Qopt.exe : 6451.858 secs CPU
Speedup : 53.44%
Ratio : 2.15 x
ap_5.05r303_SSE3_ICC_Qopt.exe : 6548.376 secs CPU
Speedup : 52.75%
Ratio : 2.12 x
WU : JasonShortrr.dat
ap_5.05r168_SSE3.exe : 3752.620 secs CPU
ap_5.05r303_ATOM_ICC_Qopt.exe : 3227.926 secs CPU
Speedup : 13.98%
Ratio : 1.16 x
ap_5.05r303_SSE3_ICC_Qopt.exe : 3236.210 secs CPU
Speedup : 13.76%
Ratio : 1.16 x
WU : Raistmer_tinyrr.dat
ap_5.05r168_SSE3.exe : 1186.544 secs CPU
ap_5.05r303_ATOM_ICC_Qopt.exe : 810.191 secs CPU
Speedup : 31.72%
Ratio : 1.46 x
ap_5.05r303_SSE3_ICC_Qopt.exe : 813.795 secs CPU
Speedup : 31.41%
Ratio : 1.46 x
WU : sigindrr.dat
ap_5.05r168_SSE3.exe : 5153.165 secs CPU
ap_5.05r303_ATOM_ICC_Qopt.exe : 4008.071 secs CPU
Speedup : 22.22%
Ratio : 1.29 x
ap_5.05r303_SSE3_ICC_Qopt.exe : 4072.968 secs CPU
Speedup : 20.96%
Ratio : 1.27 x
--- End quote ---
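A rough way to see the first-WU effect in the posted numbers is to compare the ATOM_ICC_Qopt time against the SSE3_ICC_Qopt time for each WU, split by whether the WU ran first. The sketch below only rearranges the CPU times from the two timetables; the "ran first" flags come from the description in this post, and `gap_percent` is a hypothetical helper name. It shows a correlation only and does not prove that wisdom generation is the cause.

```python
def gap_percent(atom_secs, sse3_secs):
    """How much longer (in percent) the ATOM build took than the SSE3 ICC build."""
    return (atom_secs - sse3_secs) / sse3_secs * 100.0


# (run, WU, ran first in that run?, ATOM_ICC_Qopt secs, SSE3_ICC_Qopt secs)
rows = [
    (1, "DMH1023 (08605)", True, 1791.094, 1664.967),
    (1, "JasonMedium", False, 6406.134, 6412.452),
    (1, "JasonShort", False, 3199.768, 3213.012),
    (1, "Raistmer_tiny", False, 802.922, 806.447),
    (1, "sigind", False, 3982.191, 4041.658),
    (2, "1LC25", True, 2163.079, 1919.093),
    (2, "DMH1023 (08605)", False, 1671.145, 1675.482),
    (2, "JasonMedium", False, 6451.858, 6548.376),
    (2, "JasonShort", False, 3227.926, 3236.210),
    (2, "Raistmer_tiny", False, 810.191, 813.795),
    (2, "sigind", False, 4008.071, 4072.968),
]

first_gaps = [gap_percent(a, s) for _, _, first, a, s in rows if first]
later_gaps = [gap_percent(a, s) for _, _, first, a, s in rows if not first]

for run, wu, first, atom, sse3 in rows:
    tag = "first WU" if first else "later WU"
    print(f"run {run}  {wu:<16} {tag:<9} ATOM vs SSE3: {gap_percent(atom, sse3):+.2f}%")
```

In both runs the ATOM build is several percent slower than the SSE3 ICC build only on the WU that ran first; on every later WU the two builds are within about 2% of each other.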
[attachment deleted by admin]
Jason G:
--- Quote from: Gecko_R7 on 05 Jan 2010, 07:54:20 pm ---...
So, perhaps Wisdom gen time does have noticeable impact? :-\
...
--- End quote ---
It certainly can, and is probably the case here. Some platforms seem to converge quite quickly on wisdom, some take longer. I reckon it depends on how fftw arranged the heuristics in that initialisation, and on whether it finds the best codelet sequences sooner or later within the allowed time limits.
To confirm the wisdom impact, take a look at the counters in stderr.txt. The Init component will contain any wisdom generation, while the crunch time is just that. The additional ffa counter is a subcomponent of crunching that Joe's been doing lots of work on recently.