Author Topic: optimized sources  (Read 548431 times)

Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: optimized sources
« Reply #765 on: 06 Dec 2011, 12:35:05 pm »
excerpt from v0.39 installer Readme:
The ATI MB application will not work on ATI cards with workgroup size 128
(e.g. HD43xx).
HD4670 has:
CL_DEVICE_MAX_WORK_GROUP_SIZE:        128

 :'(  :'(  :'(
why  ?
I'm disappointed....

GPUZ shows: gpuz_hd4670

heinz
« Last Edit: 06 Dec 2011, 02:48:07 pm by _heinz »

Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: optimized sources
« Reply #766 on: 06 Dec 2011, 03:10:31 pm »
I have now installed:
for Astropulse
ap_5.06_win_x86_SSE2_OpenCL_ATI_r521.exe

MultiBeam
AK_v8b2_win_SSE2.exe

BOINC shows:
06.12.2011 20:58:39      ATI GPU 0: ATI Radeon HD 4600 series (R730) (CAL version 1.4.1607, 1024MB, 480 GFLOPS peak)


Hopefully I will get some work when SETI is up again.

heinz

Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: optimized sources
« Reply #767 on: 06 Dec 2011, 04:26:24 pm »
HD4670 AGP, here is what clinfo shows:
~~~~~~~~~~~~~~~~~~~~~~~~~

C:\A\clinfo>echo off
clinfo
Number of platforms:                             1
  Platform Profile:                              FULL_PROFILE
  Platform Version:                              OpenCL 1.1 AMD-APP-SDK-v2.5 (793.1)
  Platform Name:                                 AMD Accelerated Parallel Processing
  Platform Vendor:                               Advanced Micro Devices, Inc.
  Platform Extensions:                           cl_khr_icd cl_amd_event_callback cl_amd_offline_devices


  Platform Name:                                 AMD Accelerated Parallel Processing
Number of devices:                               2
  Device Type:                                   CL_DEVICE_TYPE_GPU
  Device ID:                                     4098
  Max compute units:                             8
  Max work items dimensions:                     3
    Max work items[0]:                           128
    Max work items[1]:                           128
    Max work items[2]:                           128
  Max work group size:                           128
  Preferred vector width char:                   16
  Preferred vector width short:                  8
  Preferred vector width int:                    4
  Preferred vector width long:                   2
  Preferred vector width float:                  4
  Preferred vector width double:                 0
  Max clock frequency:                           750Mhz
  Address bits:                                  32
  Max memory allocation:                         134217728
  Image support:                                 No
  Max size of kernel argument:                   1024
  Alignment (bits) of base address:              32768
  Minimum alignment (bytes) for any datatype:    128
  Single precision floating point capability
    Denorms:                                     No
    Quiet NaNs:                                  Yes
    Round to nearest even:                       Yes
    Round to zero:                               Yes
    Round to +ve and infinity:                   Yes
    IEEE754-2008 fused multiply-add:             Yes
  Cache type:                                    None
  Cache line size:                               0
  Cache size:                                    0
  Global memory size:                            536870912
  Constant buffer size:                          65536
  Max number of constant args:                   8
  Local memory type:                             Global
  Local memory size:                             16384
  Error correction support:                      0
  Profiling timer resolution:                    1
  Device endianess:                              Little
  Available:                                     Yes
  Compiler available:                            Yes
  Execution capabilities:
    Execute OpenCL kernels:                      Yes
    Execute native function:                     No
  Queue properties:
    Out-of-Order:                                No
    Profiling :                                  Yes
  Platform ID:                                   011BA4F4
  Name:                                          ATI RV730
  Vendor:                                        Advanced Micro Devices, Inc.
  Driver version:                                CAL 1.4.1607
  Profile:                                       FULL_PROFILE
  Version:                                       OpenCL 1.0 AMD-APP-SDK-v2.5 (793.1)
  Extensions:                                    cl_khr_gl_sharing cl_amd_device_attribute_query


  Device Type:                                   CL_DEVICE_TYPE_CPU
  Device ID:                                     4098
  Max compute units:                             1
  Max work items dimensions:                     3
    Max work items[0]:                           1024
    Max work items[1]:                           1024
    Max work items[2]:                           1024
  Max work group size:                           1024
  Preferred vector width char:                   16
  Preferred vector width short:                  8
  Preferred vector width int:                    4
  Preferred vector width long:                   2
  Preferred vector width float:                  4
  Preferred vector width double:                 0
  Max clock frequency:                           2672Mhz
  Address bits:                                  32
  Max memory allocation:                         1073201152
  Image support:                                 Yes
  Max number of images read arguments:           128
  Max number of images write arguments:          8
  Max image 2D width:                            8192
  Max image 2D height:                           8192
  Max image 3D width:                            2048
  Max image 3D height:                           2048
  Max image 3D depth:                            2048
  Max samplers within kernel:                    16
  Max size of kernel argument:                   4096
  Alignment (bits) of base address:              1024
  Minimum alignment (bytes) for any datatype:    128
  Single precision floating point capability
    Denorms:                                     Yes
    Quiet NaNs:                                  Yes
    Round to nearest even:                       Yes
    Round to zero:                               Yes
    Round to +ve and infinity:                   Yes
    IEEE754-2008 fused multiply-add:             No
  Cache type:                                    Read/Write
  Cache line size:                               0
  Cache size:                                    0
  Global memory size:                            1073201152
  Constant buffer size:                          65536
  Max number of constant args:                   8
  Local memory type:                             Global
  Local memory size:                             32768
  Error correction support:                      0
  Profiling timer resolution:                    279
  Device endianess:                              Little
  Available:                                     Yes
  Compiler available:                            Yes
  Execution capabilities:
    Execute OpenCL kernels:                      Yes
    Execute native function:                     Yes
  Queue properties:
    Out-of-Order:                                No
    Profiling :                                  Yes
  Platform ID:                                   011BA4F4
  Name:                                          Intel(R) Pentium(R) 4 CPU 2.66GHz
  Vendor:                                        GenuineIntel
  Driver version:                                2.0
  Profile:                                       FULL_PROFILE
  Version:                                       OpenCL 1.1 AMD-APP-SDK-v2.5 (793.1)
  Extensions:                                    cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_popcnt


Press any key to continue . . .

heinz
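
The readme restriction quoted at the start of this exchange comes down to one number in the clinfo listing above: `Max work group size`. Below is a minimal sketch of pulling that value out of saved clinfo text. The function names and the 256 threshold are assumptions made for illustration (the readme only says that cards reporting 128 will not work); nothing here is taken from the application's source.

```python
import re

# Assumption: anything above the 128 limit named in the readme is treated as
# sufficient. The real minimum the ATI MB application needs is not stated.
ASSUMED_MIN_WORK_GROUP_SIZE = 256


def max_work_group_sizes(clinfo_text):
    """Return (device_type, max_work_group_size) pairs in listing order."""
    results = []
    device_type = None
    for line in clinfo_text.splitlines():
        type_match = re.match(r"\s*Device Type:\s+(\S+)", line)
        if type_match:
            device_type = type_match.group(1)
            continue
        size_match = re.match(r"\s*Max work group size:\s+(\d+)", line)
        if size_match and device_type is not None:
            results.append((device_type, int(size_match.group(1))))
    return results


def devices_below_minimum(clinfo_text, minimum=ASSUMED_MIN_WORK_GROUP_SIZE):
    """List device types whose reported work group size is under `minimum`."""
    return [dev for dev, size in max_work_group_sizes(clinfo_text) if size < minimum]


# Two-device excerpt using the values reported for this host.
SAMPLE = """\
  Device Type:                                   CL_DEVICE_TYPE_GPU
  Max work group size:                           128
  Device Type:                                   CL_DEVICE_TYPE_CPU
  Max work group size:                           1024
"""

print(devices_below_minimum(SAMPLE))
```

On this host only the RV730 GPU entry would be flagged; the CPU device reports 1024.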

Offline Claggy

  • Alpha Tester
  • Knight who says 'Ni!'
  • ***
  • Posts: 3111
    • My computers at Seti Beta
Re: optimized sources
« Reply #768 on: 06 Dec 2011, 06:05:43 pm »
Quote from: _heinz on 06 Dec 2011, 12:35:05 pm
excerpt from v0.39 installer Readme:
The ATI MB application will not work on ATI cards with workgroup size 128
(e.g. HD43xx).
HD4670 has:
CL_DEVICE_MAX_WORK_GROUP_SIZE:        128

 :'(  :'(  :'(
why  ?
I'm disappointed....

GPUZ shows: gpuz_hd4670

heinz

heinz, you'll want to try the MB7_win_x86_SSE3_OpenCL_ATi_LHD4K_r390.exe app from the MB7 r390 sanity check thread, which is built specifically for GPUs with a max workgroup size of 128.

Claggy

Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: optimized sources
« Reply #769 on: 07 Dec 2011, 01:25:47 pm »
I ran the test case with MB7_win_x86_SSE3_OpenCL_ATi_LHD4K_r390,
but my P4 2.66 only supports instruction sets up to SSE2.
I need an SSE2 version of LHD4K.
~~~~~~~~~~~~~~~~~~~~
Field   Value
CPU Properties   
CPU Type   Intel Pentium 4, 2666 MHz (20 x 133)
CPU Alias   Northwood
CPU Stepping   C1
Instruction Set   x86, MMX, SSE, SSE2
Original Clock   2667 MHz
Min / Max CPU Multiplier   20x / 20x
Engineering Sample   No
L1 Trace Cache   12K Instructions
L1 Data Cache   8 KB
L2 Cache   512 KB  (On-Die, ECC, ATC, Full-Speed)
   
CPU Physical Info   
Package Type   478 Pin FC-PGA2
Package Size   35 mm x 35 mm
Transistors   55 million
Process Technology   6M, 0.13 um, CMOS, Cu, Low-K
Die Size   131 mm2
Core Voltage   1.475 - 1.55 V
I/O Voltage   1.475 - 1.55 V
Typical Power   38.7 - 89.0 W  (depending on clock speed)
Maximum Power   49 - 109 W  (depending on clock speed)
   
CPU Manufacturer   
Company Name   Intel Corporation
Product Information   http://ark.intel.com/search.aspx?q=Intel Pentium 4
Driver Update   http://www.aida64.com/driver-updates
   
CPU Utilization   
CPU #1   0 %

heinz
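
The mismatch described in this post can be checked mechanically: the build name asks for SSE3, while the instruction-set line in the listing above stops at SSE2. The sketch below assumes that the build's requirement can be read from the `SSEn` token in its file name and that higher SSE generations include the lower ones. The helper names are hypothetical, and the naming convention is inferred from the file names in this thread, not documented anywhere in it.

```python
import re

# Assumption: SSE generations are ordered, and a CPU that supports a given
# generation also supports all lower ones.
SSE_LEVELS = {"SSE": 1, "SSE2": 2, "SSE3": 3}


def required_sse(build_name):
    """Extract the SSE token (e.g. 'SSE3') from a build file name, if any."""
    match = re.search(r"SSE\d?", build_name)
    return match.group(0) if match else None


def highest_sse_level(instruction_sets):
    """Return the highest SSE level in a comma-separated feature list."""
    features = [item.strip() for item in instruction_sets.split(",")]
    return max((SSE_LEVELS[f] for f in features if f in SSE_LEVELS), default=0)


def can_run(build_name, instruction_sets):
    """True if the CPU feature list covers the SSE level the build name asks for."""
    needed = required_sse(build_name)
    if needed is None:
        return True  # no SSE token in the name, so nothing to check
    if needed not in SSE_LEVELS:
        raise ValueError(f"unknown SSE level in build name: {needed}")
    return SSE_LEVELS[needed] <= highest_sse_level(instruction_sets)


P4_NORTHWOOD = "x86, MMX, SSE, SSE2"  # feature list reported for this host

for build in ("MB7_win_x86_SSE3_OpenCL_ATi_LHD4K_r390.exe", "AK_v8b2_win_SSE2.exe"):
    print(build, can_run(build, P4_NORTHWOOD))
```

For this host the SSE3 LHD4K build is rejected and the SSE2 MultiBeam build is accepted, which is why an SSE2 build of LHD4K is needed.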

Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: optimized sources
« Reply #770 on: 07 Dec 2011, 06:48:25 pm »
HD4670 AGP
I can confirm that a pg WU ran successfully in 2h 35min. GPU load was ~90% and CPU load was 100%, so there must be some issue in the app or the driver; I think CPU load should be 5% at most.
Have a look at hostid=232541

Everest shows:
Device Description
AGP 8x: ATI Radeon HD 4670 AGP (RV730)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Field   Value
Graphics Processor Properties   
Video Adapter   ATI Radeon HD 4670 AGP (RV730)
BIOS Version   011.022.006.000.000000
BIOS Date   06/11/10 04:09
GPU Code Name   RV730 Pro
Part Number   113-SBRK2G02-10R-01
PCI Device   1002-9495 / 1002-0028  (Rev 00)
Transistors   514 million
Process Technology   55 nm
Die Size   146 mm2
Bus Type   AGP 8x @ 8x
Memory Size   1 GB
GPU Clock   750 MHz  (original: 750 MHz)
RAMDAC Clock   400 MHz
Pixel Pipelines   8
Texture Mapping Units   32
Unified Shaders   320  (v4.1)
DirectX Hardware Support   DirectX v10.1
Pixel Fillrate   6000 MPixel/s
Texel Fillrate   24000 MTexel/s
   
Memory Bus Properties   
Bus Type   GDDR3
Bus Width   128-bit
Real Clock   796 MHz (DDR)  (original: 800 MHz)
Effective Clock   1593 MHz
Bandwidth   24.9 GB/s
   
Utilization   
Graphics Processor (GPU)   91%
   
ATI PowerPlay (BIOS)   
State #1   GPU: 600 MHz, Memory: 750 MHz  (Boot)
State #2   GPU: 750 MHz, Memory: 800 MHz
State #3   GPU: 750 MHz, Memory: 800 MHz  (UVD)
State #4   GPU: 750 MHz, Memory: 800 MHz
   
Graphics Processor Manufacturer   
Company Name   Advanced Micro Devices, Inc.
Product Information   http://www.amd.com/us/products/desktop/graphics
Driver Download   http://sites.amd.com/us/game/downloads
Driver Update   http://www.aida64.com/driver-updates

heinz
« Last Edit: 07 Dec 2011, 07:30:46 pm by _heinz »

Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: optimized sources
« Reply #771 on: 12 Dec 2011, 03:18:29 pm »
Meanwhile I tried several ATI apps from different projects (PrimeGrid, Moo) with driver 11.11.
None of them has acceptable CPU usage: it ranges from 60% to 100%.
Nothing has changed in years: the ATI hardware is good, but driver support is catastrophic.
The 11.11 driver is not really usable for GPU calculations, since it forces 100% CPU usage.

heinz
Edit: the Collatz ATI app also crashed.
Stderr output
<core_client_version>6.10.60</core_client_version>
<![CDATA[
<message>
 - exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>

Running Collatz Conjecture (3x+1) ATI GPU application version 2.09 by Gipsel (Win32, CAL 1.4)
instructed by BOINC client to use device 0
Reading input file ... done.
Checking 824633720832 numbers starting with 2372965778048095594856

CPU: Intel(R) Pentium(R) 4 CPU 2.66GHz (1 cores/threads) 2.67271 GHz (0ms)

CAL Runtime: 1.4.1607
Found 1 CAL device

Device 0: ATI Radeon HD4600 (RV730) 1024 MB local RAM (remote 64 MB cached + 128 MB uncached)
GPU core clock: 750 MHz, memory clock: 800 MHz
320 shader units organized in 8 SIMDs with 8 VLIW units (5-issue), wavefront size 32 threads
not supporting double precision

Initializing lookup table (16384 kB) ... done
Starting WU on GPU 0
Copy lookup table to GPU memory (16384 kB)
Initialize step array on GPU (256 MB)
predicted runtime per iteration is 167 ms (33.3333 ms are allowed), dividing each iteration in 6 parts
borders of the domains at 0 688 1368 2048 2736 3416 4096
No checkpoint data found.


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x00BB7E66 read attempt to address 0x0000014C


 :'( :'(
« Last Edit: 16 Dec 2011, 04:34:57 am by _heinz »
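
Two figures in the stderr output above can be checked by hand. The decimal exit code is the signed 32-bit form of the hexadecimal status printed next to it, and the "6 parts" follows from dividing the predicted 167 ms by the allowed 33.3333 ms and rounding up. The border calculation in the sketch is only an assumption: rounding each even split point up to a multiple of 8 happens to reproduce the logged borders, but the log does not say how the application computes them. All function names are hypothetical.

```python
import math


def ntstatus_from_exit_code(exit_code):
    """Reinterpret a signed 32-bit exit code as an unsigned status value."""
    return exit_code & 0xFFFFFFFF


def parts_needed(predicted_ms, allowed_ms):
    """Smallest number of parts that keeps each part within the allowed time."""
    return max(1, math.ceil(predicted_ms / allowed_ms))


def domain_borders(total, parts, align=8):
    """Evenly spaced borders, each rounded up to a multiple of `align`.

    Assumption: this rule matches the logged borders for this work unit;
    it is not taken from the Collatz application itself.
    """
    return [-(-(i * total) // (parts * align)) * align for i in range(parts + 1)]


print(hex(ntstatus_from_exit_code(-1073741819)))
print(parts_needed(167, 33.3333))
print(domain_borders(4096, 6))
```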

Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: optimized sources
« Reply #772 on: 12 Dec 2011, 04:01:38 pm »
Back to V8-Xeon.
It is now running its third day under full load (CPU+GPU) and gets ~900,000 cr/day (2x GTX 470 + 1x GTX 570).
Pity I can't reach a million per day with this hardware configuration.
3x GTX 570 or 3x GTX 580 could do it...
Seems stable now...

heinz

Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: optimized sources
« Reply #773 on: 14 Dec 2011, 05:29:51 pm »
V8-Xeon is back among the top 20 computers on the PrimeGrid top hosts list. Number 19 today.  ;D
Its GPUs get through 384 PPS WUs per day now.
I got no work from Astropulse...

heinz


Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: optimized sources
« Reply #774 on: 16 Dec 2011, 04:53:30 am »
ATI HD4670 AGP
After running PrimeGrid for a week, we can say the production output is about 20000 points per day (see hd4670_pg_output).
This OpenCL app, or rather driver 11.11, forces 100% CPU usage.
Hopefully we will get a properly working driver update soon.

heinz

Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: optimized sources
« Reply #775 on: 16 Dec 2011, 12:01:42 pm »
V8-Xeon
Host ID   Rank   Owner    Avg. credit   Total credit   BOINC ver.
173588    15     _heinz   743,356.24    170,010,538    6.10.58

Number 15 on the PrimeGrid top hosts list today.
170 million PrimeGrid credits now.
Still possible with an optimized CUDA application.

heinz

Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: optimized sources
« Reply #776 on: 20 Dec 2011, 11:55:33 am »
New milestones:
20 December 2011
Current total credit: 220,708,713.18

Edit:
30 December 2011
My ION reached 2 million PrimeGrid credits.
Statistics show: R3600_2Mio_primegrid  ::)  ;D  ::)

New Year's Eve 2011
200Mio_primegrid
boinc_200Mio_primegrid

Happy New Year 2012  ;D

Thank you to all readers who look in here.
Happy crunching in 2012

 
« Last Edit: 31 Dec 2011, 04:48:57 pm by _heinz »

Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: optimized sources
« Reply #777 on: 05 Jan 2012, 07:12:02 pm »
6th of january
v8-Xeon pg_number_8  ;D
top_hosts


Offline Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 14349
Re: optimized sources
« Reply #778 on: 06 Jan 2012, 03:59:40 am »
Quote from: _heinz on 05 Jan 2012, 07:12:02 pm
6th of january
v8-Xeon pg_number_8  ;D
top_hosts


Congratulations, Heinz! Keep climbing the ladder!  :D

Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: optimized sources
« Reply #779 on: 07 Jan 2012, 09:35:35 am »
Now, after a month of running the HD4670 AGP, it's time for a summary.
I started crunching with the ATI OCL application and driver 12.1 on 7 December; now, a month later on 7 January, let's have a look at the results.
hd4670_600000
As we can see, the HD4670 AGP earned ~600000 in 30 days.
Here is a look at the results on this host
No errors occurred during the one-month test period.
0.6 million per month is a respectable result for this old machine from 2005 with a 2.66 GHz P4.
Now we wait for a better driver, which will hopefully reduce CPU usage to 5%.
On SETI's side, I'm waiting for Raistmer's ATI application to support workgroup size 128.
I need an SSE2 version of LHD4K.

I bought this ATI Radeon 4670 AGP for developing and testing OCL, and it does the job.

heinz

 
