Recent Posts

11
Discussion Forum / Re: Loading APU to the limit: performance considerations
« Last post by Raistmer on 17 Oct 2016, 04:59:04 pm »
The same data on a single graph for direct comparison.

At full CPU load the A10-5700 is more(!) than twice as slow as the i5-3470 on the PG MultiBeam set  :-\
It is hardly half the price, and it hardly draws half the power... Having 2 such APUs and only a single IvyBridge, I feel disappointed.
12
Discussion Forum / Re: Loading APU to the limit: performance considerations
« Last post by Raistmer on 17 Oct 2016, 02:52:43 pm »
And here is similar data for the Trinity AMD APU.
There is one additional dot: 3 CPU tasks + busy GPU, so one can see how strong the GPU influence is.

The situation is much worse here.
Even the CPU part alone scales very badly. With just 2 busy cores there is already a considerable deviation from linear scaling.
And 4 cores perform only slightly better than 3.
With the GPU added to the equation the APU seems overloaded. Maybe this is a result of the particular drivers (CPU load from the GPU app is unexpectedly high, much higher than for a discrete ATi GPU with the same app).
So the difference between CPU time and elapsed time becomes non-negligible (red dots are computed from elapsed time, black dots from CPU time).
Of course, dot 5 (just as in the IvyBridge case) doesn't fully reflect device performance: the GPU part's throughput is not accounted for here, only its negative influence on the CPU part is shown.
13
Discussion Forum / Re: Loading APU to the limit: performance considerations
« Last post by Raistmer on 17 Oct 2016, 02:15:28 pm »
And here are the first results for IvyBridge.
So far only the CPU part has been explored under different loads.
Performance increases quite linearly, declining only at full CPU load.
With a busy GPU part, CPU part performance drops more strongly, but not fatally.
To estimate complete device throughput in this condition, an additional measurement of GPU throughput is required.
In general, quite good scaling with load for MultiBeam.
14
Discussion Forum / Re: Loading APU to the limit: performance considerations
« Last post by Raistmer on 15 Oct 2016, 07:43:43 am »
Quite long ago I did some exploration of this topic based on the AstroPulse application.
Nowadays AP is a rare beast, so some refreshed data with MultiBeam is required.
So I decided to revive this thread.
Also, there will be some changes in methodology to make this test less invasive for crunching.

So, here is how APU performance tuning versus load can be done now:

1) Acquire the PGv8 set of shortened tasks. With the growing share of GBT data this set is biased, but a separate adjustment can be made by running some shortened GBT/blc task if needed.
2) Multiply each task. I prefer 3 copies of each task per AR to have some statistics and an error estimate.
3) Download the KWSN 2.13 benchmark.
4) Configure it not to suspend BOINC (it's important!). BOINC will provide the background load for this type of test.
5) Configure BOINC for the particular background load.
6) Run the test.
7) Sum all times, divide by 3 and take the reciprocal. This gives a mean "PGset-throughput" per second for the particular config.

Repeat this for all configs of interest. A bigger value designates a better load configuration for the particular device.
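
Step 7 can be sketched in a few lines of Python. This is only an illustration: the function name and the timings are hypothetical (they are not measurements from this thread), and it assumes each task of the set was run in 3 copies, as in step 2.

```python
def pgset_throughput(times_sec, copies=3):
    """Mean PG-set throughput (sets per second) for one config.

    times_sec: elapsed times of every bench task, all copies included.
    copies:    how many copies of each task were run (3 in the post).
    """
    if copies <= 0 or not times_sec:
        raise ValueError("need at least one copy and one timing")
    total = sum(times_sec)
    if total <= 0:
        raise ValueError("total time must be positive")
    mean_set_time = total / copies   # mean time for one pass over the set
    return 1.0 / mean_set_time       # reciprocal: sets per second


# Hypothetical timings: two tasks (two ARs), 3 copies each.
configs = {
    "A": [100.0, 102.0, 98.0, 200.0, 204.0, 196.0],
    "B": [110.0, 110.0, 110.0, 220.0, 220.0, 220.0],
}
throughputs = {name: pgset_throughput(t) for name, t in configs.items()}
best = max(throughputs, key=throughputs.get)  # bigger value = better config
print(best, throughputs)
```

The spread between the 3 copies of one task can be used for the error estimate mentioned in step 2.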

Now, what "configure BOINC" means:
for example, one wants to test how the APU will perform with 3 cores loaded + the GPU part.
In the current methodology such an estimate can be acquired in 2 steps (2 bench runs):
1) Make only 3 CPUs available to BOINC (check that GPU computations are disabled and only 3 CPU tasks are active in BOINC, by reducing the number of CPUs available to BOINC).
2) Run the bench with the GPU build (ATi or iGPU, depending on the device under investigation) and compute the GPU part's throughput (GPU_throughput).
3) Make only 2 CPUs available to BOINC (by reserving more cores) but unsuspend GPU computations. Check that BOINC runs 2 CPU + 1 GPU task.
4) Run the bench with the opt CPU app. Compute its throughput (CPU_throughput).
5) Device throughput for this config will be APU_throughput(1_core_reserved)=3*CPU_throughput+GPU_throughput.
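
As a small worked example of the formula in step 5 (a sketch only; the throughput values below are made up for illustration and the function name is hypothetical):

```python
def apu_throughput(cpu_throughput, gpu_throughput, loaded_cores=3):
    """Combined device throughput for `loaded_cores` CPU tasks plus the GPU part.

    cpu_throughput: per-core throughput measured by the CPU bench run
                    while the other cores and the GPU are kept busy by BOINC.
    gpu_throughput: throughput measured by the GPU bench run
                    while the CPU cores are kept busy by BOINC.
    """
    return loaded_cores * cpu_throughput + gpu_throughput


# Hypothetical values in PG sets per second, not measurements from this thread.
cpu_tp = 0.002
gpu_tp = 0.010
total = apu_throughput(cpu_tp, gpu_tp, loaded_cores=3)
print(f"APU_throughput(1_core_reserved) = {total:.4f} sets/s")
```

Running the same calculation for each config (2 cores + GPU, 4 cores + GPU, 4 cores without GPU and so on) gives the numbers to compare.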

All other configs can be checked similarly.
This approach keeps the sacrifice of the host's crunching performance minimal, but it implies some loss of precision when the GPU app's CPU consumption depends strongly on AR.
To solve this, one can replace BOINC's GPU load with a run of some cloned standard task in a separate bench instance (preferably with 2 estimates then: one with high CPU load and one with low CPU load).

Fortunately, CPU consumption of both the ATi and iGPU apps is low enough to skip this enhancement, at least in a first approach (it matters mostly for NV OpenCL builds).
15
Discussion Forum / Re: Better sleep on Windows - new round
« Last post by Raistmer on 26 Aug 2016, 02:02:31 pm »
And data from GT720 on busy i5-3470 (high_prec timer enabled):

MB8_win_x86_SSE3_OpenCL_NV_SoG_Sleep0.exe -verb -nog :
  Elapsed 3018.575 secs, speedup: 46.57%  ratio: 1.87x
      CPU 358.599 secs, speedup: 90.99%  ratio: 11.10x
MB8_win_x86_SSE3_OpenCL_NV_SoG_Sleep1.exe -verb -nog :
  Elapsed 3024.707 secs, speedup: 46.46%  ratio: 1.87x
      CPU 326.494 secs, speedup: 91.80%  ratio: 12.19x
MB8_win_x86_SSE3_OpenCL_NV_SoG_STT.exe -verb -nog :
  Elapsed 3034.625 secs, speedup: 46.29%  ratio: 1.86x
      CPU 334.591 secs, speedup: 91.59%  ratio: 11.89x

Sleep0:class SleepQuantum:      total=5073.9668,   N=3152,   <>=1.609761,   min=0.011221858   max=8.9496584
Sleep1:class SleepQuantum:      total=3132.7358,   N=3153,   <>=0.99357305,   min=0.85221332   max=3.1896715
STT:    class SleepQuantum:      total=15702.391,   N=2136,   <>=7.3513065,   min=0.01114194   max=16.63485

Nothing new here, just support for the previous conclusions.
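
For readers unfamiliar with the bench output: the "speedup" and "ratio" columns appear to be two views of the same comparison against a reference time. The bench's exact formula is not stated in this thread, so the sketch below is an assumption inferred from the printed numbers (ratio = reference/measured, speedup = 1 - measured/reference); the function names are hypothetical.

```python
def ratio_and_speedup(reference_sec, measured_sec):
    """Return (ratio, speedup in percent) of a measured time vs a reference time."""
    ratio = reference_sec / measured_sec
    speedup_pct = (1.0 - measured_sec / reference_sec) * 100.0
    return ratio, speedup_pct


def speedup_from_ratio(ratio):
    """Speedup in percent implied by a given ratio (same assumption as above)."""
    return (1.0 - 1.0 / ratio) * 100.0


# Ratios printed for the Sleep0 build in the post above.
print(speedup_from_ratio(11.10))  # CPU line, printed speedup is 90.99%
print(speedup_from_ratio(1.87))   # Elapsed line, printed speedup is 46.57%
```

The small mismatch on the elapsed line comes from the ratio being printed with only two decimals. A ratio below 1 gives a negative speedup under the same assumption.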

GT720 on busy i5-3470 (timer at default after host power cycle):
MB8_win_x86_SSE3_OpenCL_NV_SoG_Sleep0.exe  :
  Elapsed 3095.420 secs, speedup: 45.21%  ratio: 1.83x
      CPU 268.571 secs, speedup: 93.25%  ratio: 14.82x
MB8_win_x86_SSE3_OpenCL_NV_SoG_Sleep1.exe  :
  Elapsed 3051.709 secs, speedup: 45.98%  ratio: 1.85x
      CPU 273.017 secs, speedup: 93.14%  ratio: 14.58x
MB8_win_x86_SSE3_OpenCL_NV_SoG_STT.exe  :
  Elapsed 3014.658 secs, speedup: 46.64%  ratio: 1.87x
      CPU 319.958 secs, speedup: 91.96%  ratio: 12.44x

Sleep0:class SleepQuantum:      total=38777.035,   N=1595,   <>=24.311621,   min=3.4529493   max=51.648083
Sleep1:class SleepQuantum:      total=24096.59,   N=1575,   <>=15.299422,   min=14.763614   max=15.852066
STT:class SleepQuantum:      total=13195.877,   N=2761,   <>=4.7793832,   min=0.012540359   max=23.805403

And here the advantage of STT finally appeared. With the timer at default the sleep quantum is ~15ms, while STT stayed in the ~4-5ms range.

CPU idle:
MB8_win_x86_SSE3_OpenCL_NV_SoG_Sleep0.exe  :
  Elapsed 29005.994 secs, speedup: -413.41%  ratio: 0.19x (suspended through night)
      CPU 1725.527 secs, speedup: 56.64%  ratio: 2.31x
MB8_win_x86_SSE3_OpenCL_NV_SoG_Sleep1.exe  :
  Elapsed 3032.034 secs, speedup: 46.33%  ratio: 1.86x
      CPU 299.272 secs, speedup: 92.48%  ratio: 13.30x
MB8_win_x86_SSE3_OpenCL_NV_SoG_STT.exe  :
  Elapsed 3007.939 secs, speedup: 46.76%  ratio: 1.88x
      CPU 1729.599 secs, speedup: 56.54%  ratio: 2.30x

Sleep0:class SleepQuantum:      total=49.532238,   N=3197,   <>=0.015493349,   min=0.012275338   max=8.6543369
Sleep1:class SleepQuantum:      total=24069.621,   N=1575,   <>=15.282299,   min=4.3649426   max=15.597382
STT:    class SleepQuantum:      total=40.04808,   N=3215,   <>=0.012456635,   min=0.012152191   max=0.0152032

Approximate context switch overhead for i5-3470: 0.015493349 ms - 0.012456635 ms = 0.003036714 ms ≈ 3 µs
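
The estimate above can be reproduced from the SleepQuantum lines with a short script. This is a sketch under two assumptions: the values are in milliseconds (as in the estimate above), and `<>` is the mean, which matches total/N for the printed lines. The regular expression and function name are hypothetical, written only for this log format.

```python
import re

# Matches the counters in a "class SleepQuantum:" line as posted above.
LINE_RE = re.compile(
    r"total=(?P<total>[\d.]+),\s*N=(?P<n>\d+),\s*<>=(?P<mean>[\d.]+),"
    r"\s*min=(?P<min>[\d.]+)\s*max=(?P<max>[\d.]+)"
)


def parse_sleep_quantum(line):
    """Extract total, N, mean, min and max from one SleepQuantum line."""
    m = LINE_RE.search(line)
    if m is None:
        raise ValueError(f"not a SleepQuantum line: {line!r}")
    return {
        "total": float(m.group("total")),
        "n": int(m.group("n")),
        "mean": float(m.group("mean")),
        "min": float(m.group("min")),
        "max": float(m.group("max")),
    }


# The two "CPU idle" lines from the post.
sleep0 = parse_sleep_quantum(
    "Sleep0:class SleepQuantum:      total=49.532238,   N=3197,   "
    "<>=0.015493349,   min=0.012275338   max=8.6543369"
)
stt = parse_sleep_quantum(
    "STT:    class SleepQuantum:      total=40.04808,   N=3215,   "
    "<>=0.012456635,   min=0.012152191   max=0.0152032"
)

# Difference of the mean quanta, converted from ms to microseconds.
overhead_us = (sleep0["mean"] - stt["mean"]) * 1000.0
print(f"approx context switch overhead: {overhead_us:.2f} us")
```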
16
Discussion Forum / Re: Better sleep on Windows - new round
« Last post by Mike on 24 Aug 2016, 04:51:08 pm »
Quote
Could it be a power issue? Maybe a stronger power supply is needed?

A Corsair AX 750I should be enough.
I also tested an 850 Watt one.

I only have rock solid components.
Mobo Asus Sabertooth
Corsair AX 750i PSU
Kingston RAM
Noctua heatsink
Sapphire tricool GPU

Believe me, it's the FX.
The FX has only 4 FPUs but 8 physical CPU cores, and since the SETI app uses mostly the FPU you don't need a calculator.
There is no room left for OS-specific operations.
That's why I usually run SETI on 4 cores only, and it is fully loaded.
I can encode HD video on 8 cores for 24 hours without any issues because it's not FPU bound.
The CPU doesn't exceed 55C.
17
Discussion Forum / Re: Better sleep on Windows - new round
« Last post by Raistmer on 24 Aug 2016, 12:26:06 pm »
Host reset after 5 minutes.
FX can't use all 8 cores permanently.
Could it be a power issue? Maybe a stronger power supply is needed?

stderr attached.
Thanks. The app processed OK for some time and even found some spikes.
18
Discussion Forum / Re: Better sleep on Windows - new round
« Last post by Mike on 24 Aug 2016, 09:15:44 am »
I have to remove r3500 from this bench because it doesn't even start with all cores in use.

OK for now, there is a separate issue we just discovered...

Also, the sleep versions don't even start.
Zero CPU usage on the GPU task, so I aborted after 5 minutes.
Not even wisgen started.

Please remove all wisgen tasks, run the bench, wait ~5 mins, locate stderr.txt in the ScienceApps folder and attach it as is.

Host reset after 5 minutes.
FX can't use all 8 cores permanently.
I told you that before.

stderr attached.
19
Discussion Forum / Re: Better sleep on Windows - new round
« Last post by Raistmer on 24 Aug 2016, 08:58:42 am »
I have to remove r3500 from this bench because it doesn't even start with all cores in use.

OK for now, there is a separate issue we just discovered...

Also, the sleep versions don't even start.
Zero CPU usage on the GPU task, so I aborted after 5 minutes.
Not even wisgen started.

Please remove all wisgen tasks, run the bench, wait ~5 mins, locate stderr.txt in the ScienceApps folder and attach it as is.
20
Discussion Forum / Re: Better sleep on Windows - new round
« Last post by Mike on 24 Aug 2016, 08:56:46 am »
I have to remove r3500 from this bench because it doesn't even start with all cores in use.

OK for now, there is a separate issue we just discovered...

Also, the sleep versions don't even start.
Zero CPU usage on the GPU task, so I aborted after 5 minutes.
Not even wisgen started.