Seti@Home optimized science apps and information

Optimized Seti@Home apps => Windows => GPU crunching => Topic started by: Raistmer on 01 Jan 2012, 06:17:21 pm

Title: C-60 APU and Radeon HD6920
Post by: Raistmer on 01 Jan 2012, 06:17:21 pm
With the preinstalled OS (Win7 x64 Home Premium) there is no OpenCL support on the GPU (but DirectCompute 5.0 is supported).
I tried to find an update on the AMD site, but the highest APU listed there is the C-50, so I downloaded the "11.12 mobility drivers" for Win7 x64.
We'll see whether this driver can be used with my hardware config and whether it provides OpenCL support on this netbook.
Title: Re: C-60 APU and Radeon HD6920
Post by: Raistmer on 02 Jan 2012, 06:00:13 am
My hopes are fulfilled, this APU-based netbook can do OpenCL indeed!

WU : PG1327.wu
MB7_win_x86_SSE2_CPU_r390.exe -period_iterations_num 1 -hp :
  Elapsed 1333.506 secs
      CPU 1303.108 secs
MB7_win_x86_SSE3_OpenCL_ATi_HD5_r390.exe -period_iterations_num 1 -hp :
  Elapsed 401.514 secs
      CPU 257.604 secs
MB7_win_x86_SSE3_OpenCL_ATi_LHD4K_r390.exe -period_iterations_num 1 -hp :
  Elapsed 415.554 secs
      CPU 296.839 secs
MB7_win_x86_SSE3_OpenCL_ATi_r390.exe -period_iterations_num 1 -hp :
  Elapsed 359.705 secs
      CPU 245.873 secs
MB7_win_x86_SSSE3x_CPU_r390.exe -period_iterations_num 1 -hp :
  Elapsed 0.047 secs
      CPU 0.016 secs

On the other hand, it can't do SSSE3 (at least not Intel's flavour of it): the SSSE3x build above exits almost immediately.

Build features: SETI7   Non-graphics   OpenCL   USE_OPENCL_HD5xxx   IPP   AMD specific   USE_SSE3   x86   
     CPUID: AMD C-60 APU with Radeon(tm) HD Graphics

     Cache: L1=64K L2=512K

CPU features: FPU TSC PAE CMPXCHG8B APIC SYSENTER MTRR CMOV/CCMP MMX FXSAVE/FXRSTOR SSE SSE2 HT SSE3
CPU type 0x43
Number of OpenCL platforms:             1


 OpenCL Platform Name:                AMD Accelerated Parallel Processing
Number of devices:             1
  Max compute units:             2
  Max work group size:             256
  Max clock frequency:             275Mhz
  Max memory allocation:          175374336
  Cache type:                None
  Cache line size:             0
  Cache size:                0
  Global memory size:             268435456
  Constant buffer size:             65536
  Max number of constant args:          8
  Local memory type:             Scratchpad
  Local memory size:             32768
  Queue properties:            
    Out-of-Order:             No
  Name:                   Loveland
  Vendor:                Advanced Micro Devices, Inc.
  Driver version:             CAL 1.4.1646 (VM)
  Version:                OpenCL 1.1 AMD-APP (831.4)
  Extensions:                cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_popcnt cl_khr_d3d10_sharing
Title: Re: C-60 APU and Radeon HD6920
Post by: Raistmer on 02 Jan 2012, 03:43:15 pm
GPU clocks look strange:

(http://gpuz.techpowerup.com/12/01/02/cbt.png)
Title: Re: C-60 APU and Radeon HD6920
Post by: Raistmer on 02 Jan 2012, 03:46:15 pm
When the GPU is idle the clock rises to 400MHz, but when the GPU is busy it drops to 275MHz.

(http://gpuz.techpowerup.com/12/01/02/gr7.png)

Maybe a power usage restriction applies?
It's a netbook after all...
Title: Re: C-60 APU and Radeon HD6920
Post by: Mike on 02 Jan 2012, 05:29:33 pm
GPU temperature seems a little bit high to me.
Title: Re: C-60 APU and Radeon HD6920
Post by: Raistmer on 02 Jan 2012, 05:55:59 pm
GPU temperature seems a little bit high to me.

It reached 90 °C during the benchmark...
Title: Re: C-60 APU and Radeon HD6920
Post by: Raistmer on 03 Jan 2012, 06:29:00 am
Hm... I'm receiving very contradictory results in the total benchmark currently running on this device.
For example, the SSE3_INTEL binary runs while SSE2 fails (!).
Mike, could you run a similar total benchmark (all available AKV8b2 apps, x86+x64) on your FX CPU to compare our findings?
Title: Re: C-60 APU and Radeon HD6920
Post by: Mike on 03 Jan 2012, 09:38:33 am
Hm... I'm receiving very contradictory results in the total benchmark currently running on this device.
For example, the SSE3_INTEL binary runs while SSE2 fails (!).
Mike, could you run a similar total benchmark (all available AKV8b2 apps, x86+x64) on your FX CPU to compare our findings?


Will do when I'm back from work.
Do you want the full PG set, or is 1 WU enough?

Mike
Title: Re: C-60 APU and Radeon HD6920
Post by: Raistmer on 03 Jan 2012, 10:50:53 am
I think 1 WU would be enough for now.
Title: Re: C-60 APU and Radeon HD6920
Post by: Mike on 03 Jan 2012, 02:26:11 pm
Finished my bench run on the FX CPU with all akv8b2 versions.

I have to say I am surprised.

Only 3 versions are functioning.
SSE3_AMD, SSE3_INTEL and SSE4.1
No x64 version is working.

SSE3_INTEL is faster than the SSE3_AMD version  ::)
Will run a full PG bench with both SSE3 versions to make sure.

Mike

Title: Re: C-60 APU and Radeon HD6920
Post by: Raistmer on 03 Jan 2012, 02:58:56 pm
Fine (if it can be fine at all), you just confirmed my own observations!
Same distribution here (I tried the SSE-only build too, and it works).

EDIT: no, not exactly the same. SSE2 isn't working here either, yet it works on your CPU...
But all x64 builds fail here too...
Title: Re: C-60 APU and Radeon HD6920
Post by: Mike on 03 Jan 2012, 06:09:22 pm
Fine (if it can be fine at all), you just confirmed my own observations!
Same distribution here (I tried the SSE-only build too, and it works).

EDIT: no, not exactly the same. SSE2 isn't working here either, yet it works on your CPU...
But all x64 builds fail here too...

No, SSE2 doesn't work here either.

Quote
Only 3 versions are functioning.
SSE3_AMD, SSE3_INTEL and SSE4.1
Title: Re: C-60 APU and Radeon HD6920
Post by: Mike on 03 Jan 2012, 06:17:35 pm

Here is my full bench SSE3_AMD vs SSE3_Intel.
Intel version is faster on each angle range.  (http://angelascottage.net/phpBB3/images/smilies/stupid.gif)

AK_v8b2_win_SSE3_AMD.exe -verb -nog / PG0009.wu :
AppName: AK_v8b2_win_SSE3_AMD.exe
AppArgs: -verb -nog
TaskName: PG0009.wu
Started at  : 20:28:36.375
Ended at    : 20:35:54.376
    437.939 secs Elapsed
    431.670 secs CPU time

AK_v8b2_win_SSE3_INTEL.exe -verb -nog / PG0009.wu :
AppName: AK_v8b2_win_SSE3_INTEL.exe
AppArgs: -verb -nog
TaskName: PG0009.wu
Started at  : 20:35:58.245
Ended at    : 20:43:11.676
    433.368 secs Elapsed
    426.881 secs CPU time

AK_v8b2_win_SSE3_AMD.exe -verb -nog / PG0395.wu :
AppName: AK_v8b2_win_SSE3_AMD.exe
AppArgs: -verb -nog
TaskName: PG0395.wu
Started at  : 20:43:15.513
Ended at    : 20:50:10.863
    415.288 secs Elapsed
    409.643 secs CPU time

------------
AK_v8b2_win_SSE3_INTEL.exe -verb -nog / PG0395.wu :
AppName: AK_v8b2_win_SSE3_INTEL.exe
AppArgs: -verb -nog
TaskName: PG0395.wu
Started at  : 20:50:14.732
Ended at    : 20:56:58.086
    403.291 secs Elapsed
    397.771 secs CPU time

AK_v8b2_win_SSE3_AMD.exe -verb -nog / PG0444.wu :
AppName: AK_v8b2_win_SSE3_AMD.exe
AppArgs: -verb -nog
TaskName: PG0444.wu
Started at  : 20:57:01.892
Ended at    : 21:02:46.090
    344.136 secs Elapsed
    339.521 secs CPU time

AK_v8b2_win_SSE3_INTEL.exe -verb -nog / PG0444.wu :
AppName: AK_v8b2_win_SSE3_INTEL.exe
AppArgs: -verb -nog
TaskName: PG0444.wu
Started at  : 21:02:49.866
Ended at    : 21:08:18.339
    328.411 secs Elapsed
    324.139 secs CPU time

AK_v8b2_win_SSE3_AMD.exe -verb -nog / PG1327.wu :
AppName: AK_v8b2_win_SSE3_AMD.exe
AppArgs: -verb -nog
TaskName: PG1327.wu
Started at  : 21:08:22.177
Ended at    : 21:13:09.638
    287.399 secs Elapsed
    283.189 secs CPU time

AK_v8b2_win_SSE3_INTEL.exe -verb -nog / PG1327.wu :
AppName: AK_v8b2_win_SSE3_INTEL.exe
AppArgs: -verb -nog
TaskName: PG1327.wu
Started at  : 21:13:13.491
Ended at    : 21:17:48.753
    275.200 secs Elapsed
    270.443 secs CPU time
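
For anyone who wants to compare the two builds without reading the timings by eye, here is a minimal Python sketch that pulls the CPU times out of log text in the format posted above and reports how much less CPU time the SSE3_INTEL build used per WU. The function names and regular expressions are my own illustration, not part of the bench scripts; the numbers are copied from the post.

```python
import re

# Timings copied from the bench output above (header, elapsed, CPU time per run).
LOG = """\
AK_v8b2_win_SSE3_AMD.exe -verb -nog / PG0009.wu :
    437.939 secs Elapsed
    431.670 secs CPU time
AK_v8b2_win_SSE3_INTEL.exe -verb -nog / PG0009.wu :
    433.368 secs Elapsed
    426.881 secs CPU time
AK_v8b2_win_SSE3_AMD.exe -verb -nog / PG0395.wu :
    415.288 secs Elapsed
    409.643 secs CPU time
AK_v8b2_win_SSE3_INTEL.exe -verb -nog / PG0395.wu :
    403.291 secs Elapsed
    397.771 secs CPU time
AK_v8b2_win_SSE3_AMD.exe -verb -nog / PG0444.wu :
    344.136 secs Elapsed
    339.521 secs CPU time
AK_v8b2_win_SSE3_INTEL.exe -verb -nog / PG0444.wu :
    328.411 secs Elapsed
    324.139 secs CPU time
AK_v8b2_win_SSE3_AMD.exe -verb -nog / PG1327.wu :
    287.399 secs Elapsed
    283.189 secs CPU time
AK_v8b2_win_SSE3_INTEL.exe -verb -nog / PG1327.wu :
    275.200 secs Elapsed
    270.443 secs CPU time
"""

# "<app>.exe <args> / <task>.wu :" starts a run; "<n> secs CPU time" ends it.
HEADER = re.compile(r"^(\S+\.exe)\b.*?/\s*(\S+\.wu)\s*:\s*$")
CPU_TIME = re.compile(r"^\s*([\d.]+)\s+secs CPU time\s*$")


def parse_cpu_times(text):
    """Return {(app, wu): cpu_seconds} for every run found in the log text."""
    times = {}
    current = None
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            current = (header.group(1), header.group(2))
            continue
        cpu = CPU_TIME.match(line)
        if cpu and current is not None:
            times[current] = float(cpu.group(1))
    return times


def intel_gain(times):
    """Per WU, percent of CPU time saved by SSE3_INTEL relative to SSE3_AMD."""
    gains = {}
    for (app, wu), amd_secs in times.items():
        if app.endswith("SSE3_AMD.exe"):
            intel_app = app.replace("SSE3_AMD", "SSE3_INTEL")
            intel_secs = times.get((intel_app, wu))
            if intel_secs is not None:
                gains[wu] = (amd_secs - intel_secs) / amd_secs * 100.0
    return gains


for wu, gain in sorted(intel_gain(parse_cpu_times(LOG)).items()):
    print(f"{wu}: SSE3_INTEL uses {gain:.2f}% less CPU time than SSE3_AMD")
```

On these four WUs the gain works out to roughly 1% to 4.5%, so the difference is consistent but small.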

Title: Re: C-60 APU and Radeon HD6920
Post by: Jason G on 04 Jan 2012, 04:13:49 am
Intel version is faster on each angle range.  (http://angelascottage.net/phpBB3/images/smilies/stupid.gif)

That's great news! It probably means the caching & SSE3 implementation is improved enough in that model to negate the need for a special AMD build with the earlier libraries.  That probably means 1 less build already, if we can manage to statically link in the libs needed by the older AMD chips  :)  I think the instruction set convergence will end up making things substantially easier, so we can get back to optimisation instead of build & library juggling.  Here's hoping  ;)
Title: Re: C-60 APU and Radeon HD6920
Post by: Raistmer on 04 Jan 2012, 12:28:16 pm
I would not talk about any convergence after these tests. Divergence looks like a much more adequate term...
Look at the number of builds that DON'T run on these new AMD chips...
The whole x64 AKv8b2 set fails without even reporting an error...
Title: Re: C-60 APU and Radeon HD6920
Post by: Jason G on 04 Jan 2012, 06:37:10 pm
I would not talk about any convergence after these tests. Divergence looks like a much more adequate term...
Look at the number of builds that DON'T run on these new AMD chips...
The whole x64 AKv8b2 set fails without even reporting an error...

When Intel-specific target chip libraries & builds were used by design, and the build runs at all (let alone better in some cases), I find that surprising, since a static Intel build should run badly, if at all, on the wrong chip, even within Intel silicon, due to micro-architectural optimisation being on the heavy side.  With the instruction sets, I'm referring more to the fact that general SSE3 performs pretty well across the board on newer chips from both vendors, whereas with Core2, Athlon and Phenom I/II 'plain Intel SSE3' was not a good choice of code & libraries.

With juggling, we *should* find that a static build for each of x86 & x64, each with generically optimised SSE3, SSE2 with static IPP in both flavours & FFTW, should have a workable combination for most chips except Core2 & AVX.  Obviously, AVX availability being OS dependent, that would need to be tacked on with proper detection, so doing something similar with SSSE3 seems viable.

Jason
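
To make the "one build, several code paths" idea above concrete, here is a minimal Python sketch of picking a path from the detected feature list. The path names, the preference order and the `select_path` helper are assumptions made for illustration; the real apps would do this in C++ and might well order or gate the paths differently.

```python
# Preference order, fastest first. The names are illustrative only.
PATH_PREFERENCE = ["AVX", "SSSE3", "SSE3", "SSE2"]


def select_path(cpu_features, os_supports_avx=False):
    """Pick the most preferred code path the machine can actually run.

    cpu_features: iterable of feature names as printed in the app's
                  "CPU features:" line.
    os_supports_avx: AVX also needs OS support, so it is only considered
                     when the caller has confirmed that separately.
    """
    available = set(cpu_features)
    if not os_supports_avx:
        available.discard("AVX")
    for path in PATH_PREFERENCE:
        if path in available:
            return path
    return "generic"


# Feature line reported for the C-60 earlier in this thread.
C60_FEATURES = ("FPU TSC PAE CMPXCHG8B APIC SYSENTER MTRR CMOV/CCMP MMX "
                "FXSAVE/FXRSTOR SSE SSE2 HT SSE3").split()

print(select_path(C60_FEATURES))
```

Note that feature flags alone would not have predicted the results in this thread (the SSE2 build fails on a chip that reports SSE2), so any such table would still have to be checked against bench runs.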
Title: Re: C-60 APU and Radeon HD6920
Post by: Mike on 04 Jan 2012, 07:42:40 pm
I would really like to see how the FX performs with a 64 bit app.
I'm almost certain it would benefit more than an Intel, to be honest.

Mike
Title: Re: C-60 APU and Radeon HD6920
Post by: Jason G on 04 Jan 2012, 08:11:55 pm
I would really like to see how the FX performs with a 64 bit app.
I'm almost certain it would benefit more than an Intel, to be honest.

Mike
  I tend to agree there would be some benefit, though would expect only around 10% by just changing bittage, solely because much of the hot code is 128 bit vectorised SIMD already anyway (and will be 256 bit with AVX support additions later). The periphery CPU non-vectorised code (which would become 64 bit) only has a marginal possible impact in this kind of application.

It's working out how to approach that correctly given 64 bit Intel libraries don't contain suitable generically optimised 64 bit libraries, that becomes the technical challenge.  For a 64 bit build, at the moment it looks as though a worthy option to try will be bypassing the issues by using newest FFTW, with automatic AVX support inclusive, ironically built with Intel compiler, & use generic arch:SSE3 optimisations for the core, with a arch:SSE2 path for some earlier chips.  Where I talk about convergence is that it's looking as though i3-i7 might also work well under that arrangement, so only poking around as Raistmer's V7 updates are merged in will really point to the best methods.

All I can say with absolute certainty at this point, is that separate per chip builds for every chip, or even class of chips , would not be a sustainable approach, though it worked well in the past.  Even 'simply' rebuilding the full AKv8b2 set for the bugfix, was far too much work for that amount of subtle difference. That knowledge could & should  be embedded in one app per platform instead, as stock does.

Jason
Title: Re: C-60 APU and Radeon HD6920
Post by: Mike on 04 Jan 2012, 08:20:26 pm
10% isn't too shabby IMHO.
Title: Re: C-60 APU and Radeon HD6920
Post by: Jason G on 04 Jan 2012, 08:35:54 pm
10% isn't too shabby IMHO.

In micro-architectural optimisation terms it's a bucketload.  For more impact, it's quite possible that newer FFTW may fly on that chip compared to Intel's x86 SSE3 (Pentium 4!!!!) library that appears to work best on it now.  We'll just have to make the comparisons easy with switches or similar, then hard wire defaults to suit the findings later.

Jason
Title: Re: C-60 APU and Radeon HD6920
Post by: Urs Echternacht on 04 Jan 2012, 09:59:48 pm
For Bulldozer arch try CompilerOptQuickRef-62004200.pdf (http://developer.amd.com/Assets/CompilerOptQuickRef-62004200.pdf)
Title: Re: C-60 APU and Radeon HD6920
Post by: Jason G on 04 Jan 2012, 10:42:23 pm
For Bulldozer arch try CompilerOptQuickRef-62004200.pdf (http://developer.amd.com/Assets/CompilerOptQuickRef-62004200.pdf)

Thanks! Interesting they recommend aggressive unrolling & prefetch, which suggests long pipelines. That's opposite to Core2 onward, which use loop stream detectors, often preferring to remain rolled up.   Probably when optimising for those it'll be worthwhile cross-checking what Agner Fog says for extra insight. 

I'm open in planning to try other compilers as well, so that's some good starting info.

Jason
Title: Re: C-60 APU and Radeon HD6920
Post by: skildude on 05 Jan 2012, 12:35:44 am
I did the benchmarks for the SSE3 x64 non-AMD build.  All WUs failed to start.
No real data to report at all on that test.
The following are the results from the AMD SSE3 build on Win7 64 bit, OC'd to 3.9GHz.  The app doesn't state if it is 64 bit, but it is from the 64 bit Lunatics installer.
I think there is a dramatic speed difference from Mike's 32 bit testing.  I don't think the minimal OC can account for the speed difference.  In fact these times are substantially faster than Mike's!!!

WU : PG0009.wu
AK_v8b2_win_SSE3_AMD.exe : 326.697 secs CPU
AK_v8b2_win_SSE3_AMD.exe : 328.554 secs CPU
Speedup     : -0.57%
Ratio       : 0.99 x

WU : PG0395.wu
AK_v8b2_win_SSE3_AMD.exe : 307.104 secs CPU
AK_v8b2_win_SSE3_AMD.exe : 306.776 secs CPU
Speedup     : 0.11%
Ratio       : 1.00 x

WU : PG0444.wu
AK_v8b2_win_SSE3_AMD.exe : 249.430 secs CPU
AK_v8b2_win_SSE3_AMD.exe : 250.740 secs CPU
Speedup     : -0.53%
Ratio       : 0.99 x

WU : PG1327.wu
AK_v8b2_win_SSE3_AMD.exe : 201.584 secs CPU
AK_v8b2_win_SSE3_AMD.exe : 200.134 secs CPU
Speedup     : 0.72%
Ratio       : 1.01 x
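
The "Speedup" and "Ratio" lines in the output above are not explained anywhere in the thread. The following Python sketch shows the formulas that reproduce the posted figures, taking the first time in each pair as the reference and the second as the tested run. This is inferred from the numbers only; I have not checked it against the bench script itself.

```python
def speedup_percent(reference_secs, test_secs):
    """Percent of the reference time saved by the tested run (negative = slower)."""
    return (reference_secs - test_secs) / reference_secs * 100.0


def ratio(reference_secs, test_secs):
    """How many times faster the tested run is than the reference."""
    return reference_secs / test_secs


# (reference CPU secs, tested CPU secs) pairs copied from the post above.
RUNS = {
    "PG0009.wu": (326.697, 328.554),
    "PG0395.wu": (307.104, 306.776),
    "PG0444.wu": (249.430, 250.740),
    "PG1327.wu": (201.584, 200.134),
}

for wu, (ref, test) in RUNS.items():
    print(f"{wu}: Speedup {speedup_percent(ref, test):.2f}%  "
          f"Ratio {ratio(ref, test):.2f} x")
```

With these definitions the four WUs give -0.57%, 0.11%, -0.53% and 0.72%, matching the posted output. All of them are under 1% either way, which is within what one would expect from run-to-run noise between two runs of the same binary.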

Title: Re: C-60 APU and Radeon HD6920
Post by: Raistmer on 05 Jan 2012, 04:38:23 am
Currently I'm preparing a new build environment on the netbook. It will be an x64 one because it came with x64 Win7 on board.
I want to take the opportunity to do a major upgrade of the build environment too.
Ultimately I will use VS2010 (unfortunately, I only have access to the x86 Professional version, so I will stay with VS2008 a little longer because I have the full x64 Pro suite for it).
Looks like the Intel part should be upgraded too. Perhaps the new Intel Composer? Should it support AVX? Should VS2010 support AVX? VS2008 apparently should not? Or are some patches/service packs available?
I put the new MB7 OCL NV build online, still CUDA 3.2, but I will try CUDA 4.1RC2 on the netbook, so some more speed comparisons will be needed.
Testers, stay tuned ;)
Title: Re: C-60 APU and Radeon HD6920
Post by: Raistmer on 05 Jan 2012, 05:06:42 am
http://software.intel.com/en-us/articles/intel-ipp-70-library-release-notes/
Title: Re: C-60 APU and Radeon HD6920
Post by: Mike on 05 Jan 2012, 05:15:06 am
The FX benefits dramatically from clock speed.

Even so, in my last test I still had 6 cores running on BOINC.
It was just a speed comparison between the AMD and Intel apps, not an overall speed test.

FX @3.9 GHZ

AK_v8b2_win_SSE3_AMD.exe -verb -nog / PG0009.wu :
AppName: AK_v8b2_win_SSE3_AMD.exe
AppArgs: -verb -nog
TaskName: PG0009.wu
Started at  : 10:28:53.820
Ended at    : 10:34:18.863
    324.981 secs Elapsed
    322.875 secs CPU time
Speedup     : 8.87%
Ratio       : 1.10 x

AK_v8b2_win_SSE3_INTEL.exe -verb -nog / PG0009.wu :
AppName: AK_v8b2_win_SSE3_INTEL.exe
AppArgs: -verb -nog
TaskName: PG0009.wu
Started at  : 10:34:22.592
Ended at    : 10:39:43.889
    321.204 secs Elapsed
    319.131 secs CPU time
Speedup     : 9.93%
Ratio       : 1.11 x

AK_v8b2_win_SSE3_AMD.exe -verb -nog / PG0395.wu :
AppName: AK_v8b2_win_SSE3_AMD.exe
AppArgs: -verb -nog
TaskName: PG0395.wu
Started at  : 10:39:47.555
Ended at    : 10:44:51.209
    303.607 secs Elapsed
    301.550 secs CPU time
Speedup     : 14.89%
Ratio       : 1.17 x

AK_v8b2_win_SSE3_INTEL.exe -verb -nog / PG0395.wu :
AppName: AK_v8b2_win_SSE3_INTEL.exe
AppArgs: -verb -nog
TaskName: PG0395.wu
Started at  : 10:44:54.860
Ended at    : 10:49:49.590
    294.684 secs Elapsed
    292.673 secs CPU time
Speedup     : 17.40%
Ratio       : 1.21 x

Also, @3.9 GHz the FX is up to 20% faster than the Phenom at 3.6 GHz, instead of 28% slower.  :o

Bench attached.




Title: Re: C-60 APU and Radeon HD6920
Post by: Raistmer on 05 Jan 2012, 05:58:42 am
I think the AVX-enabled ones need at least an FFT update, if not an update of all the IPP functions used.
AFAIK Joe implemented some hand-written AVX optimisations in stock. It would be good if they were incorporated into the opt apps too... governed by a USE_AVX macro, perhaps.
Title: Re: C-60 APU and Radeon HD6920
Post by: Jason G on 05 Jan 2012, 07:29:09 am
I think the AVX-enabled ones need at least an FFT update, if not an update of all the IPP functions used.
AFAIK Joe implemented some hand-written AVX optimisations in stock. It would be good if they were incorporated into the opt apps too... governed by a USE_AVX macro, perhaps.

Yes, definitely looking at that.  Any AVX enabled app should really have the AVX path and at least one viable alternative path/library, since AVX availability must be detected at runtime, and on Windows is only supported under Win7 w/sp1 (or presumably Win8beta)
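
As a rough illustration of why AVX needs runtime detection plus OS support, here is a small Python sketch of the usual check: the CPU must report AVX and OSXSAVE in CPUID leaf 1 (ECX bits 28 and 27), and the OS must have enabled XMM and YMM state saving in XCR0 (bits 1 and 2). Python cannot read these registers itself, so the raw values are passed in; the `avx_usable` helper and the way the values are obtained are assumptions for illustration. In the real apps this would be done with compiler intrinsics.

```python
# Bit positions in CPUID leaf 1, register ECX.
CPUID_ECX_OSXSAVE = 1 << 27   # OS has enabled XSAVE/XGETBV
CPUID_ECX_AVX = 1 << 28       # CPU implements AVX

# Bits in XCR0 (read with XGETBV): which register state the OS saves.
XCR0_SSE_STATE = 1 << 1       # XMM registers
XCR0_AVX_STATE = 1 << 2       # upper halves of the YMM registers


def avx_usable(cpuid_ecx, read_xcr0):
    """Return True only if both the CPU and the OS support AVX.

    cpuid_ecx: ECX value returned by CPUID leaf 1.
    read_xcr0: callable returning XCR0; it is only called when OSXSAVE
               is set, because XGETBV faults otherwise.
    """
    if not (cpuid_ecx & CPUID_ECX_AVX):
        return False          # CPU has no AVX at all
    if not (cpuid_ecx & CPUID_ECX_OSXSAVE):
        return False          # OS never enabled XSAVE, so no YMM state saving
    needed = XCR0_SSE_STATE | XCR0_AVX_STATE
    return (read_xcr0() & needed) == needed


# Example: an AVX-capable CPU under an OS that does not save YMM state.
ecx = CPUID_ECX_AVX | CPUID_ECX_OSXSAVE
print(avx_usable(ecx, lambda: XCR0_SSE_STATE | 1))   # False: no YMM state
print(avx_usable(ecx, lambda: 0b111))                # True
```

The second condition is the reason an AVX path always needs a non-AVX fallback in the same binary: the same chip can pass or fail the check depending on the OS it runs under.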

So far I have managed to build an AVX enabled static fftw lib, both x86 & x64 (which will be useful at least in particular for AP), that uses its own internal detection, but that build is only MSVS2010sp1 so far, & I will definitely need to bench an ICC built variant soon (with the supplied bench in the fftw project at least).  So I'll probably try linking it in to AKv8 as well as slotting in your V7 updates & splitting core functions into different SSE base versions.  I'm hopeful when I get going to have that operational pretty quickly.

I haven't gotten around to test linking in several sse level static IPPs yet into the same build, but looked at how to do it & doesn't appear to be difficult.  What I'll probably do is a huge build with every kind of FFT available linked in at the same time, then switch variants with a command line parameter so the testers can tell us which library is faster for which chip  ( :D  let the testers work out the tricky mappings )

Jason
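
A minimal Python sketch of the "everything linked in, switch by command line parameter" idea might look like the following. The `--fft` switch, the backend names and the stand-in functions are all hypothetical; no such option exists in the current apps, and the real build would dispatch to the linked IPP/FFTW routines in C++.

```python
import argparse


def _make_backend(name):
    """Stand-in for a real FFT library binding; it only reports what it would do."""
    def run(samples):
        return f"{name}: transformed {len(samples)} samples"
    return run


# Hypothetical set of FFT variants linked into one big test build.
FFT_BACKENDS = {
    name: _make_backend(name)
    for name in ("ipp-sse2", "ipp-sse3", "fftw", "fftw-avx")
}
DEFAULT_BACKEND = "fftw"


def choose_backend(argv):
    """Return (name, function) for the variant requested on the command line."""
    # add_help=False so switches like -hp are not mistaken for -h.
    parser = argparse.ArgumentParser(prog="bench", add_help=False)
    parser.add_argument("--fft", choices=sorted(FFT_BACKENDS),
                        default=DEFAULT_BACKEND)
    # Other switches (e.g. -verb -nog) are left for the app's own parsing.
    args, _others = parser.parse_known_args(argv)
    return args.fft, FFT_BACKENDS[args.fft]


name, fft = choose_backend(["--fft", "ipp-sse3", "-verb", "-nog"])
print(fft([0.0] * 1024))
```

Testers would then run the same binary once per variant and report timings, and the fastest variant per chip could be hard wired as the default later.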
Title: Re: C-60 APU and Radeon HD6920
Post by: skildude on 10 Jan 2012, 08:33:21 am
It looks like PrimeGrid has an AVX app for their LLR projects for Linux and Apple, but not currently for Windows.
Title: Re: C-60 APU and Radeon HD6920
Post by: cristipurdel on 11 Jan 2012, 08:38:24 am
Are these libraries already included in the amd app?
http://developer.amd.com/libraries/appmathlibs/Pages/default.aspx
Title: Re: C-60 APU and Radeon HD6920
Post by: Raistmer on 11 Jan 2012, 10:42:01 am
Are these libraries already included in the amd app?
http://developer.amd.com/libraries/appmathlibs/Pages/default.aspx
There were attempts by Devaster and by me to use ACML at least for FFT... But no working binary was produced.
Title: Re: C-60 APU and Radeon HD6920
Post by: Mike on 14 Jan 2012, 06:38:30 pm

I encoded a video today with Handbrake because I read a test of the FX CPU that used it.
Checked the logs and it used SSSE3 and SSE4.2.
It's damn fast btw.

So I'm almost certain that the C-60 supports it as well.

Mike

Title: Re: C-60 APU and Radeon HD6920
Post by: Frizz on 15 Jan 2012, 05:29:53 am
Are these libraries already included in the amd app?
http://developer.amd.com/libraries/appmathlibs/Pages/default.aspx

About a year ago I replaced Apple's OCL FFT code with AMD's APP FFT code in the Astropulse application. AMD's code was considerably slower.

I haven't done any tests since then.