Seti@Home optimized science apps and information

Optimized Seti@Home apps => Windows => Topic started by: _heinz on 05 Mar 2007, 08:42:45 pm

Title: optimized sources
Post by: _heinz on 05 Mar 2007, 08:42:45 pm
Hi Simon,

after studying the sources I found that chirpfft.cpp in the client deserves attention. I reduced the code in CalcTrigArray by using an external function, FillTrigArray, which I wrote, and in TrigArrayInit.ptt I added some hints for the compiler. That should improve the speed. Next will be analyse.cpp.
So I will go through all the other sources to find things that can be made shorter and more efficient, but it will take a little time to finish this.
Who compiles the sources? Should I do that?
Or should I send the sources back to you, Simon?
So far I do not have the complete environment at home to build a new client.
I have the Microsoft C Compiler Version 4.00 and the debugger Code View Version 1.0 to write short programs that check whether my new code is fine.
Sure, I can download all the necessary new components to install a new environment, but it only works for a month. That's a bit of a pity. Otherwise I think I must invest over 600 dollars to get it for permanent use.
Does anybody have a good idea what to do?
Regards, seti_britta



Title: Re: optimized sources
Post by: Josef W. Segur on 06 Mar 2007, 02:40:16 pm
I've shifted this to the Windows side since that matches the compiler and what you are running.

What you might do is just attach your changed source files to a post here. I'm definitely interested, one of my hosts has a Pentium-MMX CPU so can't use the vectorized chirp functions. And if you've improved the TrigArray approach enough, it might turn out to be faster than those vectorized versions even on systems with SSE, etc.  Any further optimizations will also be welcome.

Simon has the full build system with Intel compiler and Intel Performance Primitives, but I've been doing test GCC builds for Windows with DevC++/MinGW (as Eric Korpela uses for the stock Windows applications). If your changes can be built this way I'll probably try.
                                                                                 Joe
Title: Re: optimized sources
Post by: BenHer on 06 Mar 2007, 04:59:18 pm
Britta,

Regarding your other questions.

1. Final releases are compiled with Intel's C++ Compiler v9.x.  There is a free version of this for Linux and the Windows version is available for a 30 day demo install.

2. Changes that compile with Microsoft 2003 or 2005 C++ should almost always work with the Intel compiler.

3. We (the programmers) usually make a change, compile a candidate executable with that change, and then test it by crunching one of 7 available test work units.  These WUs are modified to make them run in about 1/15th the normal run time of a regular WU, but they still test all parts of the seti code.

4. Once the test is complete (we also time the test and compare the time to the latest release) we verify that it produced the correct output file (result) by using rescmp, a result comparison utility.  If that works (and the time is faster) we then post the changed source file(s) along with the new executable in a posting to one of these threads for the rest of the development/testing group to try out and validate.
Title: Re: optimized sources
Post by: _heinz on 15 Mar 2007, 10:06:28 pm
Joe,
I'm working now on analyzeFuncs.cpp. The important part, chirpfft.cpp, is now done.  Feel free to give some hints and comments.  Don't try to compile it on its own; some variables are defined outside of it.  All modifications are marked with "seti_britta:", so you can easily find them by searching.
seti_britta

[attachment deleted by admin]
Title: Re: optimized sources
Post by: Crunch3r on 16 Mar 2007, 12:48:45 am
Hello Seti_britta,

I assume, as I've seen that you joined Seti.Germany, that I can write this one in German.... (if I'm wrong please correct me ;-)

-------------------------------------------------------------------------------------------------------------------------------------------------

OK... the functions (log etc.) should be implemented with those from Intel ICC/IPP or MKL.
(log with libimf, i.e. mathimf.h)

We have the necessary licenses for that ... (for testing they are also available from Intel as a 30-day demo)

What would interest me personally is an implementation of the powerspectrum and transpose functions via Intel MKL, or powerspectrum via Intel IPP.

Can you do that?

P.S. Are you familiar with Linux, or only Windows?
Title: Re: optimized sources
Post by: _heinz on 19 Mar 2007, 10:34:13 pm
hello Joe, hello Crunch3r,

at the moment I'm very busy with analyzeFuncs.cpp, reducing code and making some optimizations in the sources. After that I will look at what to do with powerspectrum and transpose. How you know analyzeFuncs is a fat thing, not easy to understand what is going on in the code. Therefore I divided it into logical parts whose function is easy to understand. This gives me a better overview and lets me reduce code and make other logical changes. Now I'm ready to show the first result of my studies, the new program structure of seti_analyze. Hints and suggestions are welcome.

for Crunch3r --> I know Linux too and have already installed a web server with Apache and PHP, but at the moment I still have some old Win and Mac boxes and a P4 with xph; Linux is not installed
in English so everyone else can read along :-)

seti_britta

see attached file: the new structure of seti_analyze (for now only documentation, for understanding and discussion)

[attachment deleted by admin]
Title: Re: optimized sources
Post by: BenHer on 20 Mar 2007, 05:12:00 pm
Quote
How you know analyzeFuncs is a fat thing,

Britta,

We know because we have compiled and then run the seti executable under control of a "profiling" program.  After crunching an entire WU we then know that aa% of the time was spent within function abc, bb% of the time was spent within function xyz, and so on for all functions in the program.

The ones that use the most time get the most of our optimization thinking and programming attempts.
Title: Re: optimized sources
Post by: _heinz on 28 Mar 2007, 08:20:58 pm
First news:

I now use an enhanced timer that counts in timer ticks, to test pieces of code.

I used it to test the new function CalcAng:
the function writes cyclically into a 1000-element double vector,
and this runs in a loop of 10,000 iterations,
so the function is called 10 million times.
I was surprised by the result.
I tried this with 2 slightly different functions;
here you see the result:
-------------------
Timer Frequency in:
Hz  =       3579545
MHz =       3.57955
GHz =    0.00358

Start Time =     743223057648 Ticks
Stop Time  =     743224081856 Ticks

Duration in Ticks   =  1024208
Duration in seconds =  0.2861279855401
--------------------------------------
Timer Frequency in:
Hz  =       3579545
MHz =       3.57955
GHz =    0.00358

Start Time =     743224082065 Ticks
Stop Time  =     743225105999 Ticks

Duration in Ticks   =  1023934
Duration in seconds =  0.2860514394986
--------------------------------------
   P1 = 1024208
   P2 = 1023934
   dif= 274

Solution:P2 is faster than P1

 ;D
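
The measurement described above could be sketched roughly as follows. This is only an illustration under assumptions: the thread does not show CalcAng or name the timer API, so `calc_ang_placeholder`, `time_calls` and `ticks_to_seconds` are hypothetical names, and `std::chrono::steady_clock` stands in for whatever tick counter was actually used. The conversion helper reproduces the arithmetic in the printout (ticks divided by timer frequency). Note that the reported difference of 274 ticks out of about 1,024,000 is roughly 0.03 percent, so several repeated runs would probably be needed to tell it apart from noise.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert a tick count to seconds for a timer running at frequency_hz.
inline double ticks_to_seconds(std::uint64_t ticks, double frequency_hz) {
    assert(frequency_hz > 0.0);
    return static_cast<double>(ticks) / frequency_hz;
}

// Hypothetical stand-in for the function under test;
// the real CalcAng is not shown in the thread.
inline double calc_ang_placeholder(std::size_t i) {
    return static_cast<double>(i) * 0.5;
}

// Call fn outer*inner times, storing results cyclically in an
// inner-element buffer, and return the elapsed wall time in seconds.
template <typename Fn>
double time_calls(Fn fn, std::size_t inner = 1000, std::size_t outer = 10000) {
    assert(inner > 0);
    std::vector<double> sink(inner);
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t rep = 0; rep < outer; ++rep) {
        for (std::size_t i = 0; i < inner; ++i) {
            sink[i] = fn(i);
        }
    }
    const auto stop = std::chrono::steady_clock::now();
    // Read one element so the stores cannot be discarded entirely.
    volatile double keep = sink[inner - 1];
    (void)keep;
    return std::chrono::duration<double>(stop - start).count();
}
```

Calling `time_calls(calc_ang_placeholder)` once per candidate function and comparing the returned seconds mirrors the P1/P2 comparison; `ticks_to_seconds(1024208, 3579545.0)` gives the first duration shown in the printout.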

Second news:
- set up MS Visual Studio 2005 Express
- updated it with the Windows Server 2003 Platform SDK
- I am using this environment to compile the seti sources

Now I go on with further optimization of the sources.

seti_britta
Title: Re: optimized sources
Post by: _heinz on 31 Mar 2007, 05:54:44 am
- imported seti_boinc from Visual Studio 2003 to Visual Studio 2005 Express Edition  ;D
- can now compile and get object modules
- compiled analyzeFuncs
------ Build started: Project: seti_boinc, Configuration: Debug Win32 ------
Compiling...
analyzeFuncs.cpp
....some warnings
Build log was saved at "file://c:\boincstuff\kwsn-seti_boinc_1.3\seti_boinc\client\win_build\Debug\BuildLog.htm"
seti_boinc - 0 error(s), 13 warning(s)
========== Build: 1 succeeded, 0 failed, 0 up-to-date, 0 skipped ==========
 ;D

Now I can check all my changes for compiler errors  ;D

@Simon: so far I have not installed IPP and MKL, but once I do, it should be possible to compile an optimized client. I hope I did not forget anything. Simon, what do you think about it?

Title: Re: optimized sources
Post by: Simon on 31 Mar 2007, 07:04:10 am
Hi Britta,

for optimal results, you should use ICC and IPP. MKL is not necessary unless you modified the sources to use the fftw wrapper that MKL provides.

Go for it :)

Regards,
Simon.
Title: Re: optimized sources
Post by: Gecko_R7 on 31 Mar 2007, 01:45:43 pm
Hi Simon,

Are you planning to play w/ and compare the new MKL 9.1 Beta?
Do you think it has caught up with or surpassed the speed of IPP?

Title: Re: optimized sources
Post by: Josef W. Segur on 31 Mar 2007, 08:19:00 pm
It would be interesting to know if they're products of separate teams within Intel which compete, or basically the same code under the hood with different interface and focus. My assumption has been the latter, in which case whichever one has the most recent release should be "better" in some sense. But note that "better" does not always mean "faster".
                                                                             Joe
Title: Re: optimized sources
Post by: Gecko_R7 on 31 Mar 2007, 11:33:07 pm
I think I may have something close to an apples-to-apples comparison of IPP vs. MKL 9.0.

XEON 3.0 w/ IPP vs. MKL 8.1 in the first graph.
XEON 3.0 w/ new MKL 9.0 in the second.

It looks pretty close, w/ the new MKL 9.0 being slightly quicker than IPP in the 16K to 132K range.... if this is truly a level comparison.
At 16K, IPP = 12.5 Gflops vs. about 13.5 Gflops for MKL 9.0, or about 8% quicker
At 132K, IPP = 11.5 Gflops vs. about 12.25 Gflops for MKL 9.0, or about 6% quicker

I'd assume there are "other" improvements in 9.x w/ further optimization relevance as well?
Would the added trigonometric and other complex data support in the 9.0 VML also be worth a closer look?
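
For reference, the percentages quoted above follow from the ratio of the two Gflops readings. A minimal sketch of that arithmetic, assuming the values read off the graphs are accurate (`percent_faster` is an illustrative name, not something from the seti sources):

```cpp
#include <cassert>

// Relative speed advantage of `fast` over `base`, in percent.
inline double percent_faster(double fast_gflops, double base_gflops) {
    assert(base_gflops > 0.0);
    return (fast_gflops / base_gflops - 1.0) * 100.0;
}
```

With the figures from the post, `percent_faster(13.5, 12.5)` is about 8 and `percent_faster(12.25, 11.5)` is about 6.5.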









[attachment deleted by admin]

[attachment deleted by admin]
Title: Re: optimized sources
Post by: msattler on 01 Apr 2007, 12:47:15 am
Does this mean I might have some newly compiled apps to test soon?
Title: Re: optimized sources
Post by: Crunch3r on 01 Apr 2007, 02:16:20 pm
Hi,

MKL 9.0 is way faster than 8.0 and is equal to or, in some cases (depending on the ar), faster than IPP.
Some weeks ago I built an app from stock source and compared it to an old 5.12, and it was faster.

Regarding the trigonometric stuff, IMHO it is worth looking into!
But it depends on Ben and Joe whether they would like to have a closer look at it.


Title: Re: optimized sources
Post by: _heinz on 02 Apr 2007, 04:42:02 pm
Hi,

- found some interesting parts in the asm32 assembler package; we should give them a chance and implement them here.
- I am now studying Intel's IPP resources before I download it and try it. Going on with further optimization.
seti_britta  :)
Title: Re: optimized sources
Post by: Crunch3r on 02 Apr 2007, 05:00:12 pm
Hi seti_britta,

IMHO I'm not very keen on asm. To be honest I'd prefer intrinsics. Asm code depends on whether you're using Windows or Linux and on the compiler's style (gcc, the MS compiler or, preferably, ICC).

While porting the 2.2b apps to Linux we already had a headache regarding asm code.....

However, if you want to study IPP, I suggest going for IPP 5.2 (it's not public atm but I can give you the manuals if you need them).


Title: Re: optimized sources
Post by: _heinz on 02 Apr 2007, 08:14:38 pm
Hi Crunch3r,
I believe I already got it from a link you published. It's document number A24968-019US, isn't it?
Title: Re: optimized sources
Post by: _heinz on 03 Apr 2007, 04:24:57 pm
Hi Crunch3r, Simon

I have a problem with the include file
#include CMATH_LIB
Can anybody give me this file, or must I download it?

regards seti_britta

Title: Re: optimized sources
Post by: Simon on 03 Apr 2007, 04:33:20 pm
No,

it's a define. It's inside client/config.h (may not be inside win_config.h, which is probably why you're having trouble).

Quote
#ifdef __INTEL_COMPILER
#define MATH_LIB <mathimf.h>
#define CMATH_LIB <mathimf.h>
#else
#define MATH_LIB <math.h>
#define CMATH_LIB <cmath>
#endif

HTH,
Simon.
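
To illustrate why `#include CMATH_LIB` needs no separate file: the preprocessor expands the macro first and then resolves the resulting header name. The sketch below repeats the quoted fragment and adds a small hypothetical helper (`hypotenuse` is not from the seti sources) to show that the math functions are available either way; it assumes the chosen header makes `sqrt` visible after `using namespace std`.

```cpp
// Select the math header by compiler, as in the quoted config.h fragment.
#ifdef __INTEL_COMPILER
#define CMATH_LIB <mathimf.h>
#else
#define CMATH_LIB <cmath>
#endif

// The macro is expanded before the include is resolved, so on
// non-Intel compilers this line becomes `#include <cmath>`.
#include CMATH_LIB
#include <cassert>

using namespace std;  // makes sqrt visible whichever header was chosen

// Hypothetical helper, only to show that the math library is usable.
inline double hypotenuse(double a, double b) {
    assert(a >= 0.0 && b >= 0.0);
    return sqrt(a * a + b * b);
}
```

If the macro is undefined because the defining header was not included first, the compiler sees an include with no valid header name, which matches the symptom described in the question.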
Title: Re: optimized sources
Post by: _heinz on 03 Apr 2007, 07:43:16 pm
Hi Simon,
thanks.
I have some problems with MEM.calloc; it looks like the following lines:
Compiling...
chirpfft.cpp
c:\boincstuff\kwsn-seti_boinc_1.3\seti_boinc\client\chirpfft.cpp(130) : error C2065: 'MEM' : undeclared identifier
c:\boincstuff\kwsn-seti_boinc_1.3\seti_boinc\client\chirpfft.cpp(130) : warning C4002: too many actual parameters for macro 'calloc'
c:\boincstuff\kwsn-seti_boinc_1.3\seti_boinc\client\chirpfft.cpp(130) : error C2228: left of '._calloc_dbg' must have class/struct/union
        type is ''unknown-type''
c:\boincstuff\kwsn-seti_boinc_1.3\seti_boinc\client\chirpfft.cpp(226) : warning C4002: too many actual parameters for macro 'calloc'
c:\boincstuff\kwsn-seti_boinc_1.3\seti_boinc\client\chirpfft.cpp(226) : error C2228: left of '._calloc_dbg' must have class/struct/union
        type is ''unknown-type''
c:\boincstuff\kwsn-seti_boinc_1.3\seti_boinc\client\chirpfft.cpp(262) : error C2228: left of '._free_dbg' must have class/struct/union
        type is ''unknown-type''
c:\boincstuff\kwsn-seti_boinc_1.3\seti_boinc\client\chirpfft.cpp(515) : error C3861: 'count_flops': identifier not found
c:\boincstuff\kwsn-seti_boinc_1.3\seti_boinc\client\chirpfft.cpp(566) : error C2228: left of '.alloc' must have class/struct/union
        type is ''unknown-type''
c:\boincstuff\kwsn-seti_boinc_1.3\seti_boinc\client\chirpfft.cpp(567) : error C2228: left of '.alloc' must have class/struct/union
        type is ''unknown-type''
c:\boincstuff\kwsn-seti_boinc_1.3\seti_boinc\client\chirpfft.cpp(716) : error C2228: left of '._free_dbg' must have class/struct/union
        type is ''unknown-type''
c:\boincstuff\kwsn-seti_boinc_1.3\seti_boinc\client\chirpfft.cpp(719) : error C2228: left of '._free_dbg' must have class/struct/union
        type is ''unknown-type''
Build log was saved at "file://c:\boincstuff\kwsn-seti_boinc_1.3\seti_boinc\client\win_build\Debug\BuildLog.htm"
seti_boinc - 9 error(s), 2 warning(s)

Any suggestions? Any macro definitions?
Title: Re: optimized sources
Post by: _heinz on 03 Apr 2007, 09:21:27 pm
Uh, that looks like the fft pack is not available.
I believe I have the wrong config.h file.
Now I see that win-config.h has a part for fft:
/* Define to 1 if you have the `fftw' library (-lfftw). */
#define HAVE_LIBFFTW 1
It looks like win-config.h is not active:
#ifdef _WIN32
#include "win-config.h"
#else
So _WIN32 must be defined:
#define _WIN32  1
I will try again now.
Title: Re: optimized sources
Post by: Crunch3r on 04 Apr 2007, 06:14:15 am
Hi Seti_britta,

looks like you picked the wrong source pack kwsn-seti_boinc_1.3

This is the one you should have used http://lunatics.at/index.php?module=Downloads;sa=dlview;id=71

HTH

Title: Re: optimized sources
Post by: _heinz on 04 Apr 2007, 04:57:03 pm
Hello Crunch3r,
thanks for the hint.
I have just started over from scratch, but it seems I am doing something fundamentally wrong.
The current state:
- About a month ago I downloaded the sources of seti_boinc_2k3_2.2B-Ben-Joe and unpacked the directory directly into C:,
so c:\seti_boinc_2k3_2.2B-Ben-Joe
That is also where I made my source changes, but without compiling.
------------------------------------------------------------
Now that I have installed Visual Studio 8 and Visual C++ Express and then updated with the Microsoft Platform SDK for Windows Server 2003 R2, the development environment is complete.
I tried it with my own sources, and it can produce finished exe files or Windows applications that also run. So far so good.
------------------------------------------------------------------------
Then I followed the how-tos and proceeded according to Simon's instructions for the Windows client.
After downloading and unpacking we have the following:
c:\boincstuff
in it 1 directory --> kwsn-seti_boinc_1.3
in it 2 directories --> boinc and seti_boinc
Under seti_boinc --> client --> win_build you find the project with all its parts.
Now you can open seti_boinc.sln and let the project be converted.  The solution seti_boinc is created and the inherited project names are created, but they are empty. No problem: we delete them in the project explorer and in Visual C++ choose --> File --> Add --> Existing Project
--> client --> win_build --> boincglut and so on, to bring in the project parts.
After that, under the project seti_boinc --> Configuration Properties --> C/C++ --> General
 --> Additional Include Directories, the corresponding additional paths for include files have to be entered.
Now everything is ready and you can compile. If something is not found, add the corresponding include directories.
Note: in general this should work without additional include directories if you keep the directory structure under which the solution seti_boinc.sln was created (which unfortunately I do not know).

Simon/Crunch3r, please give me a hint about the directory structure once more.
---------------------------------------------------------------------
But I made my changes in c:\seti_boinc_2k3_2.2B-Ben-Joe and want to work with that.
The first thing I notice is that after unpacking seti_boinc_2k3_2.2B1-Ben-Joe.7z to c:\seti_boinc_2k3_2.2B-Ben-Joe, that directory contains no boinc directory as described above.
So when I bring c:\seti_boinc_2k3_2.2B-Ben-Joe into Visual Studio C++ and work with it, I have the following problems:
1. Include files are not found although they should already be there (probably I am using the wrong directory structure).
2. Once I have added them as additional includes, there is the following problem:
- when compiling --> #include "config.h" is processed
- in config.h --->
#ifdef _WIN32
#include "win-config.h"
- win-config.h is processed --->
#include "boinc_win.h"
but boinc_win.h does not exist in the directory c:\seti_boinc_2k3_2.2B-Ben-Joe
-------------------------------------------------------------------------------------------------------------
I would be glad of a few small hints.
Regards, seti_britta

Title: Re: optimized sources
Post by: _heinz on 05 Apr 2007, 05:03:08 am
Trying to solve the problem:
- have now installed Intel IPP 5.1.1 for Windows (evaluation)
- have now installed Intel MKL 9.0 (evaluation)
*************************************
Now trying to compile a seti client without any changes
- we will see what happens

Title: Re: optimized sources
Post by: Crunch3r on 05 Apr 2007, 05:17:41 am

Hello,

That sounds bad ...

so....

it's best to first copy the boinc directory from kwsn-seti_boinc_1.3 to c:\ , if the source "seti_boinc_2k3_2.2B-Ben-Joe" also sits directly on C:; if not, then into the same subfolder.

That should already take care of the problem with "boinc_win.h".

This is how it looks on my machine.

Code:
C:\SOURCE\32-bit>dir

 Directory of C:\SOURCE\32-bit

22.03.2007  18:59    <DIR>          .
22.03.2007  18:59    <DIR>          ..
16.01.2007  19:12    <DIR>          boinc
25.02.2007  15:57    <DIR>          seti_boinc_2k3_2.2B1-Ben-Joe

Title: Re: optimized sources
Post by: _heinz on 05 Apr 2007, 06:40:47 am

Thanks, it's going much better now; the boinc_win.h problem is solved.
I was able to compile some of the source files I changed.
chirpfft.cpp --> OK, no warnings, no errors
analyzeReport.cpp --> OK
analyzePot.cpp --> OK
In analyzeFuncs.cpp, in the part that generates the fft coefficients, it does not know a single fft variable,
see --->
Compiling...
analyzeFuncs.cpp
----ooura----
..\analyzeFuncs.cpp(381) : error C2065: 'BitRevTab' : undeclared identifier
..\analyzeFuncs.cpp(420) : error C2065: 'CoeffTab' : undeclared identifier
and so on.
I'll see, I still have to search. Do you have an idea?
Title: Re: optimized sources
Post by: Crunch3r on 05 Apr 2007, 07:09:24 am
Hello,

you forgot something... you still have to define "-DUSE_IPP" or "-DUSE_FFTWF" as a preprocessor definition, otherwise the ooura routine is used for the FFTs (far too old, nobody uses it any more)
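
The effect of these definitions can be pictured with a small sketch of compile-time backend selection. Only the macro names USE_IPP and USE_FFTWF and the ooura fallback come from this thread; the function names and returned strings below are illustrative assumptions and do not reproduce the actual seti_boinc code.

```cpp
#include <cstring>

// Report which FFT backend the preprocessor selected.
// The returned strings are illustrative labels only.
inline const char* fft_backend_name() {
#if defined(USE_IPP)
    return "ipp";      // built with -DUSE_IPP
#elif defined(USE_FFTWF)
    return "fftwf";    // built with -DUSE_FFTWF
#else
    return "ooura";    // fallback when neither macro is defined
#endif
}

// True when the build fell through to the fallback branch.
inline bool uses_fallback_fft() {
    return std::strcmp(fft_backend_name(), "ooura") == 0;
}
```

In Visual Studio the equivalent of `-DUSE_IPP` is an entry in the project's preprocessor definitions; code that is guarded by the other branches, such as variables used only by one backend, is then not compiled at all.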
Title: Re: optimized sources
Post by: _heinz on 05 Apr 2007, 08:25:08 am
Merci, you're right, that's it.
Title: Re: optimized sources
Post by: _heinz on 05 Apr 2007, 08:39:01 am
Now it worked  ;D
------ Build started: Project: seti_boinc, Configuration: Release Win32 ------
Compiling...
analyzeFuncs.cpp
Build log was saved at "file://c:\boincstuff\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release\BuildLog.htm"
seti_boinc - 0 error(s), 0 warning(s)
========== Build: 1 succeeded, 0 failed, 0 up-to-date, 0 skipped ==========
I'll check everything again; it seems so improbable to me.
Thanks again, Crunch3r
Title: Re: optimized sources
Post by: _heinz on 05 Apr 2007, 09:18:50 am
I still have a small problem when linking; otherwise it seems to be OK.
LINK : fatal error LNK1181: cannot open input file "glut32.lib"
But that file only exists in the other project ---> kwsn-seti_boinc_1.3\seti_boinc\client\win_build\Debug
----------------------------
I have to check what it needs it for.
Title: Re: optimized sources
Post by: Crunch3r on 05 Apr 2007, 09:36:22 am
That is for graphics... but you don't need it.

You have to switch the solution configuration ("Projektmappenkonfiguration") to ---> "RELEASE32-NOGFX" and compile again.
But first check once more whether "-DUSE_IPP"  ;) is also defined in that configuration.


Title: Re: optimized sources
Post by: _heinz on 05 Apr 2007, 03:18:05 pm
Good, I have done that so far ;)
But maybe something more fundamental:
after the conversion to Visual C++ 2005 I have 10 projects in the solution:
boincglut
glut
image_libs
jpeglib
libboinc
libboincapi
non_ICC
Optimizer
seti_boinc
setiboincdb
----------------------------------
non_ICC is highlighted and is the startup project.
The sources I changed are in seti_boinc and all compile without errors. So far OK.
Do I have to rebuild every project, or only seti_boinc?
When I build the seti_boinc project, I have problems with the jpeglib project:
various jpeg files are not found.
When I then rebuild the jpeglib project, all C programs are compiled, but when linking I get the following error message:
-----------------------------------
.... the preceding ones
jccolor.c
jccoefct.c
jcapistd.c
jcapimin.c
Creating library...
LIB : fatal error LNK1181: cannot open input file ".\Release32-NOGFX\jcapimin.obj"
Build log was saved at "file://c:\boincstuff\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\BuildLog.htm"
jpeglib - 1 error(s), 0 warning(s)
---------------------------------------------------
jcapimin.c was compiled, though; apparently the linker is looking in the wrong place, or the object module is somewhere it does not belong, because it must be somewhere.
In theory the object modules should be in win_build\Release32-NOGFX.
I have to check that.
Did I forget anything else?
Any other idea?

Title: Re: optimized sources
Post by: Crunch3r on 05 Apr 2007, 03:31:54 pm
You only need to compile the following projects:

Code:
libboinc
libboincapi
Optimizer
seti_boinc
setiboincdb

IMPORTANT! Right-click the seti_boinc project and check the "dependencies"; if necessary, untick:
"boincglut,glut,image_libs,jpeglib".

That should make the remaining problems disappear as well.

HTH

Title: Re: optimized sources
Post by: _heinz on 05 Apr 2007, 06:06:54 pm
I checked the dependencies :-)
libboinc worked
libboincapi still has a problem: it does not find the object modules
setiboincdb still has a problem, same as before
----------------------------------------------------------
------ Erstellen gestartet: Projekt: setiboincdb, Konfiguration: Release32-NOGFX Win32 ------
Kompilieren...
xml_util.cpp
sqlrow.cpp
sqlint8.cpp
sqlblob.cpp
Bibliothek wird erstellt...
LIB : fatal error LNK1181: Eingabedatei ".\Release32-NOGFX\sqlblob.obj" kann nicht geöffnet werden.
Das Buildprotokoll wurde unter "file://c:\boincstuff\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\BuildLog.htm" gespeichert.
setiboincdb - 1 Fehler, 0 Warnung(en)
--------------------------------------------------------
Optimizer produces 3 errors

------ Erstellen gestartet: Projekt: Optimizer, Konfiguration: Release32-NOGFX Win32 ------
Kompilieren...
opt_FPU.cpp
Kompilieren...
cl : Befehlszeile warning D9002 : Unbekannte Option "/QxK" wird ignoriert.
opt_SSE.cpp
Kompilieren...
cl : Befehlszeile warning D9002 : Unbekannte Option "/QxB" wird ignoriert.
opt_SSE2.cpp
Kompilieren...
cl : Befehlszeile warning D9002 : Unbekannte Option "/QxT" wird ignoriert.
opt_SSE3.cpp
c:\boincstuff\seti_boinc_2k3_2.2b-ben-joe\client\optimizer\opcodes_SSE3.hpp(12) : fatal error C1083: Datei (Include) kann nicht geöffnet werden: "pmmintrin.h": No such file or directory
memspeed.cpp
FoldTst.cpp
BHSSEfold.cpp
.\BHSSEfold.cpp(65) : fatal error C1083: Datei (Include) kann nicht geöffnet werden: "ipp_w7.h": No such file or directory
AKfoldSSE.cpp
.\AKfoldSSE.cpp(45) : fatal error C1083: Datei (Include) kann nicht geöffnet werden: "ipp_w7.h": No such file or directory
Das Buildprotokoll wurde unter "file://c:\boincstuff\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer\Release32-NOGFX\BuildLog.htm" gespeichert.
Optimizer - 3 Fehler, 3 Warnung(en)
------------------------------------------------------------

I don't even need to try seti_boinc yet, because it needs the others.
I have to look again at why it never finds the object modules.
For the Optimizer we will have to ask Ben; something is wrong there, I specified SSE2 (P4)
 ;)

Title: Re: optimized sources
Post by: Crunch3r on 05 Apr 2007, 06:27:06 pm
Oh, girl ....  :P  ;D

So ......


Quote
I checked the dependencies :-)
libboinc worked
libboincapi still has a problem: it does not find the object modules
setiboincdb still has a problem, same as before
----------------------------------------------------------
------ Erstellen gestartet: Projekt: setiboincdb, Konfiguration: Release32-NOGFX Win32 ------
Kompilieren...
xml_util.cpp
sqlrow.cpp
sqlint8.cpp
sqlblob.cpp
Bibliothek wird erstellt...
LIB : fatal error LNK1181: Eingabedatei ".\Release32-NOGFX\sqlblob.obj" kann nicht geöffnet werden.
Das Buildprotokoll wurde unter "file://c:\boincstuff\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\BuildLog.htm" gespeichert.
setiboincdb - 1 Fehler, 0 Warnung(en)

Try adding the output directory to the include paths; for me that would be ...
Code:
C:\SOURCE\32-bit\seti_boinc_2k3_2.2B1-Ben-Joe\client\win_build\Release32-NOGFX
----------------------------------------------------

Quote
Optimizer produces 3 errors

------ Erstellen gestartet: Projekt: Optimizer, Konfiguration: Release32-NOGFX Win32 ------
Kompilieren...
opt_FPU.cpp
Kompilieren...
cl : Befehlszeile warning D9002 : Unbekannte Option "/QxK" wird ignoriert.
opt_SSE.cpp
Kompilieren...
cl : Befehlszeile warning D9002 : Unbekannte Option "/QxB" wird ignoriert.
opt_SSE2.cpp
Kompilieren...
cl : Befehlszeile warning D9002 : Unbekannte Option "/QxT" wird ignoriert.
opt_SSE3.cpp
c:\boincstuff\seti_boinc_2k3_2.2b-ben-joe\client\optimizer\opcodes_SSE3.hpp(12) : fatal error C1083: Datei (Include) kann nicht geöffnet werden: "pmmintrin.h": No such file or directory
memspeed.cpp
FoldTst.cpp
BHSSEfold.cpp
.\BHSSEfold.cpp(65) : fatal error C1083: Datei (Include) kann nicht geöffnet werden: "ipp_w7.h": No such file or directory
AKfoldSSE.cpp
.\AKfoldSSE.cpp(45) : fatal error C1083: Datei (Include) kann nicht geöffnet werden: "ipp_w7.h": No such file or directory
Das Buildprotokoll wurde unter "file://c:\boincstuff\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer\Release32-NOGFX\BuildLog.htm" gespeichert.
Optimizer - 3 Fehler, 3 Warnung(en)
------------------------------------------------------------

That is because you are not compiling with the Intel compiler (the M$ compiler does not recognize QxK, QxW etc.), and you have not added the IPP directory path to the includes either (for you that would probably be something like C:\Programme\Intel\IPP\5.1\ia32\tools\staticlib).
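
A small sketch of how source code can tell which compiler is building it, which is handy when some files rely on Intel-only switches such as /QxK. It assumes only the predefined macros __INTEL_COMPILER and _MSC_VER; the function names are made up for illustration and are not part of the SETI sources. The Intel compiler on Windows also defines _MSC_VER, so the Intel check has to come first.

```cpp
#include <string>

// Returns a short name for the compiler building this translation unit.
// Hypothetical helper, not taken from the SETI or BOINC sources.
std::string compiler_name() {
#if defined(__INTEL_COMPILER)
    // Checked first: the Intel compiler on Windows defines _MSC_VER as well.
    return "intel";
#elif defined(_MSC_VER)
    return "msvc";
#else
    return "other";
#endif
}

// True when Intel-only switches (e.g. /QxK) can be expected to be understood.
bool intel_switches_available() {
    return compiler_name() == "intel";
}
```

If compiler_name() reports "msvc" for a project that was meant to be built with the Intel compiler, the project has most likely not been converted to the Intel project system yet.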


Quote
I don't even need to try seti_boinc yet, because it needs the others.
I have to look again at why it never finds the object modules.
For the Optimizer we will have to ask Ben; something is wrong there, I specified SSE2 (P4)
 ;)

As far as I can see, you forgot to convert the whole project to the Intel compiler --->
e.g. right-click seti_boinc and then, at the very bottom, choose "convert to use Intel C++ Project System"

After that a blue "C++" is shown in front of the project.

Do the same for all projects and add to the include paths all directories that contain header files FROM the Intel compiler AND IPP.

Then we'll see where we stand  ;)

P.S.

If none of that works, tell me the versions of your Intel compiler and IPP, and which directory you keep the source in. Then I'll set it up for you.



Title: Re: optimized sources
Post by: _heinz on 07 Apr 2007, 03:12:22 pm
Hello Crunch3r,

thanks for your valuable hints.  There are successes to report  ;D
libboincapi is OK, after some changes to the source code: obsolete declaration style, type conversions etc.
...
boinc_api.C
Code wird generiert...
Bibliothek wird erstellt...
Das Buildprotokoll wurde unter "file://c:\boincstuff\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\BuildLog.htm" gespeichert.
libboincapi - 0 Fehler, 2 Warnung(en)
------------------------------------------------------------
setiboincdb is OK; some changes were needed: declarations, types etc.
Kompilieren...
xml_util.cpp
sqlrow.cpp
sqlint8.cpp
sqlblob.cpp
Code wird generiert...
Bibliothek wird erstellt...
sqlint8.obj : warning LNK4221: Es wurden keine öffentlichen Symbole gefunden. Zugriff auf archivierten Member wird nicht möglich sein.
Das Buildprotokoll wurde unter "file://c:\boincstuff\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\BuildLog.htm" gespeichert.
setiboincdb - 0 Fehler, 1 Warnung(en)
-----------------------------------------------------------------
Regarding the Optimizer: one problem was IPP 5.1, which cannot be used because its installation did not unpack all files properly !!!
I have now installed IPP 5.2 Beta and it looks like it will work; at least the relevant includes are found. We'll see about the rest. I'm optimistic.
-----------------------------------------------------------------
The main problems in the project are:
1. Migration problems --> see http://msdn2.microsoft.com/de-de/library/ms235289(VS.80).aspx
2. Obsolete declaration style
3. Type conversions
4. Conversions when passing parameters to functions and returning values from them.
-------------------------------------------------------------------
to name just a few:
in gutil --> 27 type conversions fixed
in gutil_text: 2 errors, fixed
which looks like this:
------ Erstellen gestartet: Projekt: libboincapi, Konfiguration: Release32-NOGFX Win32 ------
Kompilieren...
gutil_text.C
..\..\..\boinc\api\gutil_text.C(335) : error C2440: 'Initialisierung': 'const char *' kann nicht in 'char *' konvertiert werden
        Durch die Konvertierung gehen Qualifizierer verloren
..\..\..\boinc\api\gutil_text.C(341) : error C2440: '=': 'const char *' kann nicht in 'char *' konvertiert werden
        Durch die Konvertierung gehen Qualifizierer verloren
Das Buildprotokoll wurde unter "file://c:\boincstuff\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\BuildLog.htm" gespeichert.
libboincapi - 2 Fehler, 0 Warnung(en)
----------------------------------------------------------
fixed and corrected  :)
boincapi is linked to boincdb, meaning a change in boincapi can entail a change in boincdb.
But I am already through with that.  It is a lot of work, so patience is called for......
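
For readers hitting the same C2440 message, here is a minimal sketch of the two usual ways to fix a 'const char *' to 'char *' conversion. I have not seen lines 335 and 341 of gutil_text.C, so font_name() and both helpers are hypothetical stand-ins, not the actual BOINC code.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for any function that returns read-only text.
const char* font_name() { return "helvetica"; }

// Rejected by the compiler (C2440), because the const qualifier would be lost:
//   char* p = font_name();

// Option 1: keep the pointer const when the text is only read.
std::size_t name_length() {
    const char* p = font_name();
    return std::strlen(p);
}

// Option 2: copy into a std::string when the text has to be modified.
std::string upper_first() {
    std::string s = font_name();
    if (!s.empty() && s[0] >= 'a' && s[0] <= 'z') {
        s[0] = static_cast<char>(s[0] - 'a' + 'A');
    }
    return s;
}
```

A const_cast would also silence the error, but it only hides the problem if the code later writes through the pointer, so the two options above are usually the safer choice.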
----------------------------------------------------------
Answer regarding the compiler:
Microsoft (R) 32-Bit c/c++ -Optimierungscompiler Version 14.00.50727.42 für 8086

---------------------------------------------------------
And now it is Easter and there is an aperitif for me  ;D
I wish everyone a happy Easter
Happy Easter








Title: Re: optimized sources
Post by: Crunch3r on 07 Apr 2007, 03:53:52 pm
Quote
Hello Crunch3r,

thanks for your valuable hints.  There are successes to report  ;D
libboincapi is OK, after some changes to the source code: obsolete declaration style, type conversions etc.
...
boinc_api.C
Code wird generiert...
Bibliothek wird erstellt...
Das Buildprotokoll wurde unter "file://c:\boincstuff\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\BuildLog.htm" gespeichert.
libboincapi - 0 Fehler, 2 Warnung(en)
------------------------------------------------------------
setiboincdb is OK; some changes were needed: declarations, types etc.
Kompilieren...
xml_util.cpp
sqlrow.cpp
sqlint8.cpp
sqlblob.cpp
Code wird generiert...
Bibliothek wird erstellt...
sqlint8.obj : warning LNK4221: Es wurden keine öffentlichen Symbole gefunden. Zugriff auf archivierten Member wird nicht möglich sein.
Das Buildprotokoll wurde unter "file://c:\boincstuff\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\BuildLog.htm" gespeichert.
setiboincdb - 0 Fehler, 1 Warnung(en)
-----------------------------------------------------------------

That looks good so far  ;D

Quote

Regarding the Optimizer: one problem was IPP 5.1, which cannot be used because its installation did not unpack all files properly !!!
I have now installed IPP 5.2 Beta and it looks like it will work; at least the relevant includes are found. We'll see about the rest. I'm optimistic.
-----------------------------------------------------------------



Man... right, I didn't think of that: with the evaluation version only the dynamic libs are installed, and the static headers are missing...

Quote
The main problems in the project are:
1. Migration problems --> see http://msdn2.microsoft.com/de-de/library/ms235289(VS.80).aspx
2. Obsolete declaration style
3. Type conversions
4. Conversions when passing parameters to functions and returning values from them.
-------------------------------------------------------------------


True, VS 2005 is somewhat picky about the code, which I think is a good thing, because it is more ANSI C conformant and people cannot just write wild code.

Quote
------ Erstellen gestartet: Projekt: libboincapi, Konfiguration: Release32-NOGFX Win32 ------
Kompilieren...
gutil_text.C
..\..\..\boinc\api\gutil_text.C(335) : error C2440: 'Initialisierung': 'const char *' kann nicht in 'char *' konvertiert werden
        Durch die Konvertierung gehen Qualifizierer verloren
..\..\..\boinc\api\gutil_text.C(341) : error C2440: '=': 'const char *' kann nicht in 'char *' konvertiert werden
        Durch die Konvertierung gehen Qualifizierer verloren
Das Buildprotokoll wurde unter "file://c:\boincstuff\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\BuildLog.htm" gespeichert.
libboincapi - 2 Fehler, 0 Warnung(en)
----------------------------------------------------------
fixed and corrected  :)
boincapi is linked to boincdb, meaning a change in boincapi can entail a change in boincdb.
But I am already through with that.  It is a lot of work, so patience is called for......
----------------------------------------------------------

You would not actually have needed gutil .... only boinc_api.c, because the rest is only for gfx

Quote
Answer regarding the compiler:
Microsoft (R) 32-Bit c/c++ -Optimierungscompiler Version 14.00.50727.42 für 8086
---------------------------------------------------------

So you have not installed the Intel compiler?

Quote
And now it is Easter and there is an aperitif for me  ;D
I wish everyone a happy Easter
Happy Easter
Joyeuses Pâques

Exactly! Cheers, and a happy Easter to you too









Title: Re: optimized sources
Post by: _heinz on 08 Apr 2007, 08:03:22 pm
@joe small question
can anybody tell me which program generates the file schema_master.cpp automatically?
-----------------------------------------
using Visual C++ 2005
I have some migration problems with constructs like const char *
in some cases this causes type conversion errors
I think some changes in the header files of the db project are necessary,
for instance: db_table.h, schema_master.h
-----------------------------------------
if I compile seti_header.cpp of seti_boinc I get something like the following:
------ Erstellen gestartet: Projekt: seti_boinc, Konfiguration: Release32-NOGFX Win32 ------
Kompilieren...
seti_header.cpp
..\seti_header.cpp(128) : error C2664: 'std::_Vector_const_iterator<_Ty,_Alloc>::_Vector_const_iterator(const std::_Vector_const_iterator<_Ty,_Alloc> &)': Konvertierung des Parameters 1 von 'int' in 'const std::_Vector_const_iterator<_Ty,_Alloc> &' nicht möglich
        with
        [
            _Ty=coordinate_t,
            _Alloc=std::allocator<coordinate_t>
        ]
        Ursache: Konvertierung von 'int' in 'const std::_Vector_const_iterator<_Ty,_Alloc>' nicht möglich
        with
        [
            _Ty=coordinate_t,
            _Alloc=std::allocator<coordinate_t>
        ]
        Quelltyp konnte von keinem Konstruktor angenommen werden, oder die Überladungsauflösung des Konstruktors ist mehrdeutig
Das Buildprotokoll wurde unter "file://c:\boincstuff\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\BuildLog.htm" gespeichert.
seti_boinc - 1 Fehler, 0 Warnung(en)
------------------------------------------------------
analysing this error ---> leads to schema_master.cpp ---> from there to schema_master.h ---> then to db_table.h
there are declarations for print_xml, parse_xml and some others like this:
    std::string print_xml(int full_subtables=1, int show_ids=0, int no_refs=1,
         char *tag=type_name) const;
/*   const char *tag=type_name) const; */

------------------------------------------------------------------------------------------------
I think the const qualifier in connection with strings is the problem, which causes errors like the one above.
In schema_master.h there are also some constructs which must be corrected.
I will make all these changes to solve the problems.  ;)
------------------------------------------------------
any other suggestions ??
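
One thing worth checking: the C2664 text above says that an 'int' could not be converted into a vector const_iterator. A minimal sketch of that kind of error and its fix follows. It is only a guess at the pattern, since line 128 of seti_header.cpp is not shown here, and this coordinate_t is a hypothetical stand-in with made-up fields.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in; the real coordinate_t lives in the SETI db headers.
struct coordinate_t {
    double ra;
    double dec;
};

// Rejected by the compiler, because an int is not an iterator:
//   coords.insert(0, c);
//   coords.erase(0);

// Insert at the front by passing a real iterator.
void push_front(std::vector<coordinate_t>& coords, const coordinate_t& c) {
    coords.insert(coords.begin(), c);
}

// Erase by index: turn the index into an iterator first.
void drop_at(std::vector<coordinate_t>& coords, std::size_t index) {
    if (index < coords.size()) {
        coords.erase(coords.begin() + static_cast<std::ptrdiff_t>(index));
    }
}
```

If line 128 passes 0 or NULL where an iterator is expected, replacing it with begin() or end(), whichever was meant, should be enough.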

Title: Re: optimized sources
Post by: Josef W. Segur on 08 Apr 2007, 11:04:03 pm
Urs Echternacht posted an attachment to this message (http://lunatics.at/windows/visual-studio-2005-compatibility-issues.msg544.html#msg544) identifying the changes he found necessary to compile the SETI 5.17 cvs sources with Visual C++ 2005 express. That might save you some time.

I don't know about schema_master.cpp, perhaps posting your question on the boinc_opt mailing list would get a reply from Eric Korpela.
                                                                                     Joe
Title: Re: optimized sources
Post by: _heinz on 11 Apr 2007, 03:59:39 pm
Hello Crunch3r,

I still have a small problem: when I enable the use of the preprocessor, the object modules are not found at link time. When I switch it off, the linker finds everything.
Do I need to make further entries in the librarian settings ??
Do you have another tip for me ?
---------------------------------------------
Buildprotokoll     Neu erstellen wurde gestartet: Projekt: "setiboincdb", Konfiguration: "Release32-NOGFX|Win32"
 Befehlszeilen     Die temporäre Datei "c:\boincstuff\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\RSP00001E36283732.rsp" wird erstellt. Inhalt:
[
/O2 /Ob2 /Oi /Ot /Oy /GT /I "C:\boincstuff\seti_boinc_2k3_2.2B-Ben-Joe\db" /I "C:\boincstuff\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX" /I ".." /I "..\.." /I "..\..\..\boinc\lib" /I "..\..\..\boinc\win_build" /D "WIN32" /D "NDEBUG" /D "_LIB" /D "_MT" /D "_WINDOWS" /D "_CONSOLE" /D "HAVE_STD_MAX" /D "HAVE_STD_MIN" /D "HAVE_STD_TRANSFORM" /D "CLIENT" /D "NBOINC_APP_GRAPHICS" /D "_VC80_UPGRADE=0x0710" /D "_MBCS" /P /GF /FD /EHsc /MT /Zp16 /GS- /Gy /Fo"Release32-NOGFX\\" /Fd"Release32-NOGFX\vc80.pdb" /W3 /c /Wp64 /TP "..\..\db\xml_util.cpp"

"..\..\db\sqlrow.cpp"

"..\..\db\sqlint8.cpp"

"..\..\db\sqlblob.cpp"
]Erstellen der Befehlszeile "cl.exe @"c:\boincstuff\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\RSP00001E36283732.rsp" /nologo /errorReport:prompt"Die temporäre Datei "c:\boincstuff\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\RSP00001F36283732.rsp" wird erstellt. Inhalt:
[
/OUT:"C:\boincstuff\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX/setiboincdb.lib" ".\Release32-NOGFX\sqlblob.obj"

".\Release32-NOGFX\sqlint8.obj"

".\Release32-NOGFX\sqlrow.obj"

".\Release32-NOGFX\xml_util.obj"
]Erstellen der Befehlszeile "lib.exe @"c:\boincstuff\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\RSP00001F36283732.rsp" /NOLOGO" Ausgabefenster     Kompilieren...
xml_util.cpp
sqlrow.cpp
sqlint8.cpp
sqlblob.cpp
Bibliothek wird erstellt...
LIB : fatal error LNK1181: Eingabedatei ".\Release32-NOGFX\sqlblob.obj" kann nicht geöffnet werden.
 Ergebnisse     Das Buildprotokoll wurde unter "file://c:\boincstuff\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\BuildLog.htm" gespeichert.
setiboincdb - 1 Fehler, 0 Warnung(en)
 
Title: Re: optimized sources
Post by: Crunch3r on 11 Apr 2007, 04:10:30 pm
Quote
Hello Crunch3r,

I still have a small problem: when I enable the use of the preprocessor, the object modules are not found at link time. When I switch it off, the linker finds everything.
Do I need to make further entries in the librarian settings ??
Do you have another tip for me ?
---------------------------------------------
Buildprotokoll     Neu erstellen wurde gestartet: Projekt: "setiboincdb", Konfiguration: "Release32-NOGFX|Win32"
 Befehlszeilen     Die temporäre Datei "c:\boincstuff\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\RSP00001E36283732.rsp" wird erstellt. Inhalt:
[
/O2 /Ob2 /Oi /Ot /Oy /GT /I "C:\boincstuff\seti_boinc_2k3_2.2B-Ben-Joe\db" /I "C:\boincstuff\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX" /I ".." /I "..\.." /I "..\..\..\boinc\lib" /I "..\..\..\boinc\win_build" /D "WIN32" /D "NDEBUG" /D "_LIB" /D "_MT" /D "_WINDOWS" /D "_CONSOLE" /D "HAVE_STD_MAX" /D "HAVE_STD_MIN" /D "HAVE_STD_TRANSFORM" /D "CLIENT" /D "NBOINC_APP_GRAPHICS" /D "_VC80_UPGRADE=0x0710" /D "_MBCS" /P /GF /FD /EHsc /MT /Zp16 /GS- /Gy /Fo"Release32-NOGFX\\" /Fd"Release32-NOGFX\vc80.pdb" /W3 /c /Wp64 /TP "..\..\db\xml_util.cpp"

"..\..\db\sqlrow.cpp"

"..\..\db\sqlint8.cpp"

"..\..\db\sqlblob.cpp"
]Erstellen der Befehlszeile "cl.exe @"c:\boincstuff\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\RSP00001E36283732.rsp" /nologo /errorReport:prompt"Die temporäre Datei "c:\boincstuff\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\RSP00001F36283732.rsp" wird erstellt. Inhalt:
[
/OUT:"C:\boincstuff\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX/setiboincdb.lib" ".\Release32-NOGFX\sqlblob.obj"

".\Release32-NOGFX\sqlint8.obj"

".\Release32-NOGFX\sqlrow.obj"

".\Release32-NOGFX\xml_util.obj"
]Erstellen der Befehlszeile "lib.exe @"c:\boincstuff\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\RSP00001F36283732.rsp" /NOLOGO" Ausgabefenster     Kompilieren...
xml_util.cpp
sqlrow.cpp
sqlint8.cpp
sqlblob.cpp
Bibliothek wird erstellt...
LIB : fatal error LNK1181: Eingabedatei ".\Release32-NOGFX\sqlblob.obj" kann nicht geöffnet werden.
 Ergebnisse     Das Buildprotokoll wurde unter "file://c:\boincstuff\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\BuildLog.htm" gespeichert.
setiboincdb - 1 Fehler, 0 Warnung(en)
 

So, in the librarian settings (I assume that is the linker), you also have to enter the output directory under "additional libraries"; in German that should be called something like "zusätzliche Bibliotheken". The ".\Release32-NOGFX\" has to go in there too, or whatever it is called on your system.

 Then it should work.
Title: Re: optimized sources
Post by: _heinz on 12 Apr 2007, 07:30:29 pm
Visual C++ 2005
-----------------------
Problems with the preprocessor
When the preprocessor is enabled, the linker does not find the object modules listed in the @command file and the library is not built.
Microsoft has this on the topic: http://support.microsoft.com/kb/839286/en-us
I did that:
Output Directory   C:\boincstuff\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX
Intermediate Directory  C:\boincstuff\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX
and for the librarian:
Output File   C:\boincstuff\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX/setiboincdb.lib
Additional Library Directories C:\boincstuff\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX
-------------------------------------------
but none of it helped, as you can see here:
Befehlszeilen
    Die temporäre Datei "C:\boincstuff\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\RSP00001B4380620.rsp" wird erstellt. Inhalt:
[
/O2 /Ob2 /Oi /Ot /Oy /GT /I "C:\boincstuff\seti_boinc_2k3_2.2B-Ben-Joe\db" /I "C:\boincstuff\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX" /I ".." /I "..\.." /I "..\..\..\boinc\lib" /I "..\..\..\boinc\win_build" /D "Release32-NOGFX" /D "WIN32" /D "NDEBUG" /D "_LIB" /D "_MT" /D "_WINDOWS" /D "_CONSOLE" /D "HAVE_STD_MAX" /D "HAVE_STD_MIN" /D "HAVE_STD_TRANSFORM" /D "CLIENT" /D "NBOINC_APP_GRAPHICS" /D "_VC80_UPGRADE=0x0710" /D "_MBCS" /EP /P /GF /FD /EHsc /MT /Zp16 /GS- /Gy /Fo"C:\boincstuff\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\\" /Fd"C:\boincstuff\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\vc80.pdb" /W3 /c /Wp64 /TP "..\..\db\xml_util.cpp"

"..\..\db\sqlrow.cpp"

"..\..\db\sqlint8.cpp"

"..\..\db\sqlblob.cpp"
]Erstellen der Befehlszeile "cl.exe @"C:\boincstuff\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\RSP00001B4380620.rsp" /nologo /errorReport:prompt"Die temporäre Datei "C:\boincstuff\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\RSP00001C4380620.rsp" wird erstellt. Inhalt:
[
/OUT:"C:\boincstuff\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX/setiboincdb.lib" /LIBPATH:"C:\boincstuff\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX"

".\Release32-NOGFX\sqlblob.obj"

".\Release32-NOGFX\sqlint8.obj"

".\Release32-NOGFX\sqlrow.obj"

".\Release32-NOGFX\xml_util.obj"
]Erstellen der Befehlszeile "lib.exe @"C:\boincstuff\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\RSP00001C4380620.rsp""

Ausgabefenster
------ Neues Erstellen gestartet: Projekt: setiboincdb, Konfiguration: Release32-NOGFX Win32 ------
Die Zwischen- und Ausgabedateien für das Projekt "setiboincdb" mit der Konfiguration "Release32-NOGFX|Win32" werden gelöscht.
Kompilieren...
xml_util.cpp
sqlrow.cpp
sqlint8.cpp
sqlblob.cpp
Bibliothek wird erstellt...
Microsoft (R) Library Manager Version 8.00.50727.42
Copyright (C) Microsoft Corporation.  All rights reserved.
"/OUT:C:\boincstuff\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX/setiboincdb.lib" "/LIBPATH:C:\boincstuff\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX"
".\Release32-NOGFX\sqlblob.obj"
".\Release32-NOGFX\sqlint8.obj"
".\Release32-NOGFX\sqlrow.obj"
".\Release32-NOGFX\xml_util.obj"
LIB : fatal error LNK1181: Eingabedatei ".\Release32-NOGFX\sqlblob.obj" kann nicht geöffnet werden.
Das Buildprotokoll wurde unter "file://C:\boincstuff\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\BuildLog.htm" gespeichert.
setiboincdb - 1 Fehler, 0 Warnung(en)
========== Alles neu erstellen: 0 erfolgreich, Fehler bei 1, 0 übersprungen ==========
----------------------------------------------------------
I once read that the @command file does not work properly with the linker and that you had to do something extra, but that was too long ago.....
I searched again just now but found nothing suitable. I'm really frustrated. :'(

Does anyone know what causes this?   Urs Echternacht ??
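
A possible explanation, offered with caution: the cl command lines in the logs contain /P (and /EP in the second one). According to the cl documentation as I know it, /E, /EP and /P only run the preprocessor and suppress compilation, so no .obj files are written, which would match the LNK1181 message. The sketch below checks a command line for these switches. The function name is made up, and the parsing is deliberately simple: it splits on whitespace and does not handle quoted arguments that contain spaces.

```cpp
#include <sstream>
#include <string>

// Returns true if a cl command line contains a preprocess-only switch.
// Hypothetical helper; a quoted path with a standalone "/P" inside it
// would be misreported, so treat the result as a hint only.
bool preprocess_only(const std::string& command_line) {
    std::istringstream in(command_line);
    std::string token;
    while (in >> token) {
        if (token == "/P" || token == "/E" || token == "/EP") {
            return true;
        }
    }
    return false;
}
```

If this returns true for the contents of the cl response file, it may be worth switching the project's preprocessing option (the one that generates a preprocessed file) back off and rebuilding before changing any librarian paths.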
Title: Re: optimised sources
Post by: Aragon Speed on 13 Apr 2007, 12:34:31 am
@ seti_britta and Crunch3r.

I started following this thread with a sense of curiosity a while back, but I don't speak or understand German so I was lost after you changed languages :) .

Can either of you give me a rough update on how this is going?
Title: Re: optimized sources
Post by: _heinz on 13 Apr 2007, 03:32:21 pm
@Aragon
as you know, I'm working on further optimization of the seti source code. So far I have made about two hundred or more changes in the source code. The main problem is the migration from 2003 to Visual C++ 2005. Many changes are necessary to compile the sources without any errors and without a lot of warnings. That is not an easy job, and many problems have to be solved along the way. That's what I'm doing at the moment. Crunch3r helped me set up the Visual C++ 2005 development system with IPP and MKL.
-------------------------------------------
for everyone else: work is ongoing
glut
image_libs
jpeglib
libboinc
libboincapi
setiboincdb
now compile without any errors or warnings at /W3
--------------------------------------------
Optimizer is 90% done,
--------------------------------
happy weekend  ;)
Title: Re: optimized sources
Post by: Simon on 13 Apr 2007, 04:26:22 pm
Hello Britta,

I searched around and found something that should make your work easier; it is attached to this post.

Someone took the trouble to get rid of all warnings and errors under VC 2005 and wrote everything up as a guide. It starts from the standard sources, but is still very helpful.

Have a nice weekend,
Simon.

---------

I found a file posted by someone (he didn't put his name in the file, and I can't recall) that details all source changes necessary to get the S@H science app to compile with VC 2005. The file is attached to this post.

Have a nice weekend,
Simon.

[attachment deleted by admin]
Title: Re: optimized sources
Post by: Urs Echternacht on 13 Apr 2007, 06:28:36 pm
Quote
Hello Britta,

I searched around and found something that should make your work easier; it is attached to this post.

Someone took the trouble to get rid of all warnings and errors under VC 2005 and wrote everything up as a guide. It starts from the standard sources, but is still very helpful.

Have a nice weekend,
Simon.

---------

I found a file posted by someone (he didn't put his name in the file, and I can't recall) that details all source changes necessary to get the S@H science app to compile with VC 2005. The file is attached to this post.

Have a nice weekend,
Simon.
Hi Simon,
those who can read have a clear advantage. Name and email are at the very bottom.
Regards, Urs
Title: Re: optimized sources
Post by: Simon on 13 Apr 2007, 06:30:05 pm
Sorry Urs,

I only searched at the beginning again, my mistake!

-> Blame Misfit!

Regards,
Simon.
Title: Re: optimized sources
Post by: Urs Echternacht on 13 Apr 2007, 06:39:35 pm
Quote
Visual C++ 2005
-----------------------
...
Does anyone know what causes this?   Urs Echternacht ??
Hi seti_britta,
sorry, I can't help you with building with the Intel Compiler package, because I have never tried that myself. The file with changes that two people pointed to earlier contains only changes applied to the 5.17 beta sources. There are some differences between these, the stock 5.15 sources and the opt. 5.15 sources (2.2B1) over here. Maybe what you see is due to those differences. Please keep trying, and keep reporting what you do to make it work. That might ease the start for others who try to compile these sources, too.

Urs
Title: Re: optimized sources
Post by: Urs Echternacht on 13 Apr 2007, 06:40:34 pm
Quote
Sorry Urs,

I only searched at the beginning again, my mistake!

-> Blame Misfit!

Regards,
Simon.
Accepted!

-> Misfit? Misfit!
Title: Re: optimized sources
Post by: _heinz on 13 Apr 2007, 07:43:14 pm
Hello Simon and Urs,
thank you very much for the hints. Joe already sent me a reply --> Reply #39 on: 08 Apr 2007, 11:04:03 pm
with a link to the file from Urs. It was helpful, but it did not cover all the changes needed in the 2.2B sources.
seti_britta
Title: Re: optimized sources
Post by: _heinz on 13 Apr 2007, 10:08:45 pm
now the Optimizer compiles without any errors or warnings  ;D
 Using the preprocessor still produces a link error at the end, as before. The linker does not handle the @files correctly  :'(
any suggestions ??
--------------------------------------------------------------------------------------
------ Neues Erstellen gestartet: Projekt: Optimizer, Konfiguration: Release32-NOGFX Win32 ------
Die Zwischen- und Ausgabedateien für das Projekt "Optimizer" mit der Konfiguration "Release32-NOGFX|Win32" werden gelöscht.
Kompilieren...
Microsoft (R) 32-Bit C/C++-Optimierungscompiler Version 14.00.50727.42 für 80x86
Copyright (C) Microsoft Corporation. Alle Rechte vorbehalten.
cl /O2 /Ob2 /Oi /Ot /Oy /GT /I "../../../boinc/win_build" /I ".." /I "..\.." /I "..\..\..\boinc\lib" /I "../../../boinc/api" /I "../../db" /I "C:\Programme\Intel\IPP\5.2_beta\ia32\tools\staticlib" /I "C:\Programme\Intel\IPP\5.2_beta\ia32\include" /I "C:\Programme\Microsoft Platform SDK for Windows Server 2003 R2\Include" /I "C:\Programme\Intel\MKL\9.0\include" /I "C:\boincstuff\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer" /D "USE_IPP" /D "USE_SSE2" /D "WIN32" /D "_WIN32" /D "_WINDOWS" /D "_CONSOLE" /D "NDEBUG" /D "_LIB" /D "_MT" /D "CLIENT" /D "NBOINC_APP_GRAPHICS" /D "_VC80_UPGRADE=0x0710" /D "_MBCS" /P /GF /FD /EHsc /MT /Zp16 /Gy /Yc"stdafx.h" /Fp"Release32-NOGFX\Optimizer.pch" /FA /Fa"Release32-NOGFX\\" /Fo"Release32-NOGFX\\" /Fd"Release32-NOGFX\vc80.pdb" /W3 /c /Wp64 /Zi /Gd /TP /FI "win-config.h" ".\opt_FPU.cpp"
opt_FPU.cpp
Kompilieren...
Microsoft (R) 32-Bit C/C++-Optimierungscompiler Version 14.00.50727.42 für 80x86
Copyright (C) Microsoft Corporation. Alle Rechte vorbehalten.
cl /O2 /Ob2 /Oi /Ot /Oy /GT /I "../../../boinc/win_build" /I ".." /I "..\.." /I "..\..\..\boinc\lib" /I "../../../boinc/api" /I "../../db" /I "C:\Programme\Intel\IPP\5.2_beta\ia32\tools\staticlib" /I "C:\Programme\Intel\IPP\5.2_beta\ia32\include" /I "C:\Programme\Microsoft Platform SDK for Windows Server 2003 R2\Include" /I "C:\Programme\Intel\MKL\9.0\include" /I "C:\boincstuff\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer" /D "USE_IPP" /D "USE_SSE2" /D "WIN32" /D "_WIN32" /D "_WINDOWS" /D "_CONSOLE" /D "NDEBUG" /D "_LIB" /D "_MT" /D "CLIENT" /D "NBOINC_APP_GRAPHICS" /D "_VC80_UPGRADE=0x0710" /D "_MBCS" /P /GF /FD /EHsc /MT /Zp16 /Gy /Yc"stdafx.h" /Fp"Release32-NOGFX\Optimizer.pch" /FA /Fa"Release32-NOGFX\\" /Fo"Release32-NOGFX\\" /Fd"Release32-NOGFX\vc80.pdb" /W3 /c /Wp64 /Zi /Gd /TP /FI "win-config.h" /fp:fast
   ".\opt_unopt.cpp"
opt_unopt.cpp
Kompilieren...
Microsoft (R) 32-Bit C/C++-Optimierungscompiler Version 14.00.50727.42 für 80x86
Copyright (C) Microsoft Corporation. Alle Rechte vorbehalten.
cl /O2 /Ob2 /Oi /Ot /Oy /GT /I "../../../boinc/win_build" /I ".." /I "..\.." /I "..\..\..\boinc\lib" /I "../../../boinc/api" /I "../../db" /I "C:\Programme\Intel\IPP\5.2_beta\ia32\tools\staticlib" /I "C:\Programme\Intel\IPP\5.2_beta\ia32\include" /I "C:\Programme\Microsoft Platform SDK for Windows Server 2003 R2\Include" /I "C:\Programme\Intel\MKL\9.0\include" /I "C:\boincstuff\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer" /D "USE_IPP" /D "USE_SSE2" /D "WIN32" /D "_WIN32" /D "_WINDOWS" /D "_CONSOLE" /D "NDEBUG" /D "_LIB" /D "_MT" /D "CLIENT" /D "NBOINC_APP_GRAPHICS" /D "_VC80_UPGRADE=0x0710" /D "_MBCS" /P /GF /FD /EHsc /MT /Zp16 /Gy /Yc"stdafx.h" /Fp"Release32-NOGFX\Optimizer.pch" /Fo"Release32-NOGFX\\" /Fd"Release32-NOGFX\vc80.pdb" /W3 /c /Wp64 /Zi /Gd /TP /FI "win-config.h" ".\opt_VIS2.cpp"
   ".\opt_VIS.cpp"
   ".\opt_SSE3.cpp"
   ".\opt_SSE2.cpp"
   ".\opt_SSE.cpp"
   ".\opt_os_interface.cpp"
   ".\opt_MMX.cpp"
   ".\opt_MDMX.cpp"
   ".\opt_altivec.cpp"
   ".\memspeed.cpp"
   ".\FoldTst.cpp"
   ".\cpuid_tbl.cpp"
   ".\cpu_x86.cpp"
   ".\BHSSEfold.cpp"
   ".\benchmark.cpp"
   ".\AKfoldSSE.cpp"
opt_VIS2.cpp
opt_VIS.cpp
opt_SSE3.cpp
opt_SSE2.cpp
opt_SSE.cpp
opt_os_interface.cpp
opt_MMX.cpp
opt_MDMX.cpp
opt_altivec.cpp
memspeed.cpp
FoldTst.cpp
cpuid_tbl.cpp
cpu_x86.cpp
BHSSEfold.cpp
benchmark.cpp
AKfoldSSE.cpp
Creating library...
Microsoft (R) Library Manager Version 8.00.50727.42
Copyright (C) Microsoft Corporation.  All rights reserved.
"/OUT:C:\boincstuff\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX/Optimizer.lib" "/LIBPATH:C:\Programme\Microsoft Platform SDK for Windows Server 2003 R2\Lib" "/LIBPATH:C:\boincstuff\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer\Release32-NOGFX"
".\Release32-NOGFX\AKfoldSSE.obj"
".\Release32-NOGFX\benchmark.obj"
".\Release32-NOGFX\BHSSEfold.obj"
".\Release32-NOGFX\cpu_x86.obj"
".\Release32-NOGFX\cpuid_tbl.obj"
".\Release32-NOGFX\FoldTst.obj"
".\Release32-NOGFX\memspeed.obj"
".\Release32-NOGFX\opt_altivec.obj"
".\Release32-NOGFX\opt_FPU.obj"
".\Release32-NOGFX\opt_MDMX.obj"
".\Release32-NOGFX\opt_MMX.obj"
".\Release32-NOGFX\opt_os_interface.obj"
".\Release32-NOGFX\opt_SSE.obj"
".\Release32-NOGFX\opt_SSE2.obj"
".\Release32-NOGFX\opt_SSE3.obj"
".\Release32-NOGFX\opt_unopt.obj"
".\Release32-NOGFX\opt_VIS.obj"
".\Release32-NOGFX\opt_VIS2.obj"
LIB : fatal error LNK1181: cannot open input file ".\Release32-NOGFX\AKfoldSSE.obj"
Build log was saved at "file://c:\boincstuff\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer\Release32-NOGFX\BuildLog.htm"
Optimizer - 1 error(s), 0 warning(s)
========== Rebuild All: 0 succeeded, 1 failed, 0 skipped ==========
Title: Re: optimized sources
Post by: _heinz on 15 Apr 2007, 10:09:43 pm
For everyone reading here, I will show some typical problems, mostly type conversion problems.
1.) Type conversion: the pointer gets truncated

------ Build started: Project: Optimizer, Configuration: Release32-NOGFX Win32 ------
Compiling...
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.42 for 80x86
Copyright (C) Microsoft Corporation.  All rights reserved.
cl /O2 /Ob2 /Oi /Ot /Oy /GT /I "../../../boinc/win_build" /I ".." /I "..\.." /I "..\..\..\boinc\lib" /I "../../../boinc/api" /I "../../db" /I "C:\Programme\Intel\IPP\5.2_beta\ia32\tools\staticlib" /I "C:\Programme\Intel\IPP\5.2_beta\ia32\include" /I "C:\Programme\Microsoft Platform SDK for Windows Server 2003 R2\Include" /I "C:\Programme\Intel\MKL\9.0\include" /I "C:\boincstuff\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer" /D "USE_IPP" /D "USE_SSE2" /D "WIN32" /D "_WIN32" /D "_WINDOWS" /D "_CONSOLE" /D "NDEBUG" /D "_LIB" /D "_MT" /D "CLIENT" /D "NBOINC_APP_GRAPHICS" /D "_UNICODE" /D "UNICODE" /D "_VC80_UPGRADE=0x0710" /D "_MBCS" /GF /FD /EHsc /MT /Zp16 /Gy /Yc"stdafx.h" /Fp"Release32-NOGFX\Optimizer.pch" /Fo"Release32-NOGFX\\" /Fd"Release32-NOGFX\vc80.pdb" /W3 /c /Wp64 /Zi /Gd /TP /FI "win-config.h" ".\opt_SSE.cpp"
opt_SSE.cpp
.\opt_SSE.cpp(146) : warning C4311: 'type cast' : pointer truncation from 'const float *__w64 ' to 'unsigned int'
Build log was saved at "file://c:\boincstuff\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer\Release32-NOGFX\BuildLog.htm"
Optimizer - 0 error(s), 1 warning(s)
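Warning C4311 is what /Wp64 reports when a pointer is cast to a 32-bit integer type, because the same cast would cut off the upper half of the address in a 64-bit build. The log does not show line 146 itself, so the following is only a sketch under the assumption that the cast is there to do integer arithmetic on the address (for example an alignment test). The helper name is_aligned16 is made up for illustration; the idea is to cast to std::uintptr_t, which is as wide as a pointer, instead of unsigned int.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the seti_boinc sources): reports whether
// p lies on a 16-byte boundary, as aligned SSE loads and stores require.
inline bool is_aligned16(const float* p) {
    assert(p != nullptr);
    // std::uintptr_t holds a full pointer value on 32-bit and 64-bit
    // targets, so no address bits are cut off by this cast.
    return (reinterpret_cast<std::uintptr_t>(p) & 15u) == 0;
}
```

If the integer really has to stay 32 bits wide, the warning is a hint that the code is only valid for Win32 builds.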
------------------------------------------------------------
2.) A typical type conversion error --->

------ Build started: Project: Optimizer, Configuration: Release32-NOGFX Win32 ------
Compiling...
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.42 for 80x86
Copyright (C) Microsoft Corporation.  All rights reserved.
cl /O2 /Ob2 /Oi /Ot /Oy /GT /I "../../../boinc/win_build" /I ".." /I "..\.." /I "..\..\..\boinc\lib" /I "../../../boinc/api" /I "../../db" /I "C:\Programme\Intel\IPP\5.2_beta\ia32\tools\staticlib" /I "C:\Programme\Intel\IPP\5.2_beta\ia32\include" /I "C:\Programme\Microsoft Platform SDK for Windows Server 2003 R2\Include" /I "C:\Programme\Intel\MKL\9.0\include" /I "C:\boincstuff\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer" /D "USE_IPP" /D "USE_SSE2" /D "WIN32" /D "_WIN32" /D "_WINDOWS" /D "_CONSOLE" /D "NDEBUG" /D "_LIB" /D "_MT" /D "CLIENT" /D "NBOINC_APP_GRAPHICS" /D "_UNICODE" /D "UNICODE" /D "_VC80_UPGRADE=0x0710" /D "_MBCS" /GF /FD /EHsc /MT /Zp16 /Gy /Yc"stdafx.h" /Fp"Release32-NOGFX\Optimizer.pch" /Fo"Release32-NOGFX\\" /Fd"Release32-NOGFX\vc80.pdb" /W3 /c /Wp64 /Zi /Gd /TP /FI "win-config.h" ".\opt_SSE2.cpp"
opt_SSE2.cpp
.\opt_SSE2.cpp(85) : error C2440: 'type cast' : cannot convert from 'VEC' to 'VEC_I'
        No constructor could take the source type, or constructor overload resolution was ambiguous
.\opt_SSE2.cpp(124) : error C2440: 'type cast' : cannot convert from 'VEC' to 'VEC_I'
        No constructor could take the source type, or constructor overload resolution was ambiguous
.\opt_SSE2.cpp(127) : error C2440: 'type cast' : cannot convert from 'VEC' to 'VEC_I'
        No constructor could take the source type, or constructor overload resolution was ambiguous
.\opt_SSE2.cpp(134) : error C2440: 'type cast' : cannot convert from 'VEC' to 'VEC_I'
        No constructor could take the source type, or constructor overload resolution was ambiguous
.\opt_SSE2.cpp(137) : error C2440: 'type cast' : cannot convert from 'VEC' to 'VEC_I'
        No constructor could take the source type, or constructor overload resolution was ambiguous
.\opt_SSE2.cpp(142) : error C2440: 'type cast' : cannot convert from 'VEC' to 'VEC_I'
        No constructor could take the source type, or constructor overload resolution was ambiguous
.\opt_SSE2.cpp(145) : error C2440: 'type cast' : cannot convert from 'VEC' to 'VEC_I'
        No constructor could take the source type, or constructor overload resolution was ambiguous
.\opt_SSE2.cpp(146) : error C2440: 'type cast' : cannot convert from 'VEC' to 'VEC_I'
        No constructor could take the source type, or constructor overload resolution was ambiguous
.\opt_SSE2.cpp(148) : error C2440: 'type cast' : cannot convert from 'VEC' to 'VEC_I'
        No constructor could take the source type, or constructor overload resolution was ambiguous
Build log was saved at "file://c:\boincstuff\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer\Release32-NOGFX\BuildLog.htm"
Optimizer - 9 error(s), 0 warning(s)
========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========
Let us analyse this:
We look at line 85 and find:
s_put1_NC(p, sum1 );  <-- there is the error
The statement is a macro. s_put1_NC is defined in opcodes_SSE2.hpp, line 45 -->
    #define s_put1_NC(ptr, aaaa)     _mm_stream_si32(ptr, s_extract_32bits(aaaa) );
s_extract_32bits is defined as -->
    #define s_extract_32bits(aaaa)   _mm_cvtsi128_si32((VEC_I) aaaa)
So line 85 is equivalent to:
_mm_stream_si32(p, _mm_cvtsi128_si32((VEC_I)sum1));
Now we look at VEC_I --> we find a typedef
    typedef __m128i VEC_I;
Now we look at __m128i ----> and find in emmintrin.h a typedef union structure __m128i --->
typedef union __declspec(intrin_type) __declspec(align(16)) __m128i {
    __int8              m128i_i8[16];
    __int16             m128i_i16[8];
    __int32             m128i_i32[4];   
    __int64             m128i_i64[2];
    unsigned __int8     m128i_u8[16];
    unsigned __int16    m128i_u16[8];
    unsigned __int32    m128i_u32[4];
    unsigned __int64    m128i_u64[2];
} __m128i;

---------------------------------------------------

Now we look at sum1 ---> we find it in opt_SSE2.cpp, line 61 --->
   VEC sum1, sum2;
Now we look at VEC and find a typedef in opcodes_SSE.hpp, line 39
typedef __m128  VEC;
Now we look at __m128 ---> and find in xmmintrin.h a typedef union structure named __m128 --->
typedef union __declspec(intrin_type) __declspec(align(16)) __m128
 {
     float               m128_f32[4];   
     unsigned __int64    m128_u64[2];      
     __int8              m128_i8[16];   
     __int16             m128_i16[8];   
     __int32             m128_i32[4];   
     __int64             m128_i64[2];
     unsigned __int8     m128_u8[16];
     unsigned __int16    m128_u16[8];
     unsigned __int32    m128_u32[4];
 } __m128;
----------------------------------------------------------
Now we can see:
__m128  ---> has 9 elements
__m128i  ---> has 8 elements
and the order is not the same !!!!
-----------------------------------------------
Therefore we cannot write:   __m128i  =  __m128
Therefore we cannot write:        VEC_I  =     VEC
Therefore VEC cannot be converted to VEC_I.
That's the problem.
-----------------------------
-----------------------------
any suggestions ???
---------------------------
First we could put both unions in the same order, like this --->
typedef union __declspec(intrin_type) __declspec(align(16)) __m128
 {
     __int8              m128_i8[16];   
     __int16             m128_i16[8];   
     __int32             m128_i32[4];   
     __int64             m128_i64[2];
     unsigned __int8     m128_u8[16];
     unsigned __int16    m128_u16[8];
     unsigned __int32    m128_u32[4];
     unsigned __int64    m128_u64[2];
     float                      m128_f32[4];   
 } __m128;
What will we do with the 9th element ???
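On the reordering idea: as far as the C++ rules go, the order and number of union members probably does not decide this. __m128 and __m128i are two distinct types, and a value cast between distinct class or union types needs a constructor or conversion operator, which neither has. The sketch below uses two made-up unions, LanesA and LanesB, with identical members in identical order, to show that the cast is still rejected while a copy of the raw bytes works. It is an illustration with hypothetical names, not code from the seti_boinc sources.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Two made-up unions with the same members in the same order.
union LanesA { int i[4]; float f[4]; };
union LanesB { int i[4]; float f[4]; };

// Same layout, yet still two unrelated types: (LanesB)a does not compile,
// for the same kind of reason C2440 is raised for (VEC_I)sum1.
static_assert(!std::is_constructible<LanesB, const LanesA&>::value,
              "no conversion exists between distinct union types");
static_assert(sizeof(LanesA) == sizeof(LanesB), "both have the same size");

// Copying the bytes is always possible when the sizes match.
inline LanesB copy_bits(const LanesA& a) {
    LanesB b;
    std::memcpy(&b, &a, sizeof b);
    return b;
}

// Usage check: the integer lane survives the byte copy unchanged.
inline void demo_copy_bits() {
    LanesA a = {{42, 0, 0, 0}};
    assert(copy_bits(a).i[0] == 42);
}
```

So editing the system headers to reorder the members would most likely not remove the error; an explicit reinterpretation of the bits is needed instead.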
------------------------------------------------------
@ben can you have a look at it ??

seti_britta  ;)
Title: Re: optimized sources
Post by: Crunch3r on 16 Apr 2007, 04:01:09 am
Hello,

Try the Intel compiler instead of the Microsoft compiler. Then all of this works.
That is definitely the only cause.  ;)

Title: Re: optimized sources
Post by: Josef W. Segur on 16 Apr 2007, 10:34:32 am
Quote
any suggestions?

Add whatever casts are needed for that paranoid compiler. Either a C style cast or the more verbose C++ reinterpret_cast will work.
                                                                              Joe
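One way to add such a cast without touching the union definitions: emmintrin.h provides the intrinsic _mm_castps_si128, which reinterprets a __m128 as a __m128i without generating any instruction. Whether the header shipped with a particular Visual C++ 2005 installation already declares it is an assumption to verify; if it does not, the pointer form *(VEC_I *)&x is the fallback. The sketch repeats the VEC and VEC_I typedefs quoted earlier in the thread; extract_32bits is a made-up function standing in for the s_extract_32bits macro.

```cpp
#include <cassert>
#include <emmintrin.h>  // SSE2 intrinsics, x86 only

typedef __m128  VEC;    // as quoted from opcodes_SSE.hpp in the thread
typedef __m128i VEC_I;

// Hypothetical stand-in for the s_extract_32bits macro: returns the bit
// pattern of the lowest float lane as a 32-bit integer.
inline int extract_32bits(VEC v) {
    // Reinterpret the 128 bits as integer data; no value conversion happens.
    return _mm_cvtsi128_si32(_mm_castps_si128(v));
}

// Usage check: 1.0f has the IEEE 754 bit pattern 0x3F800000.
inline void demo_extract_32bits() {
    assert(extract_32bits(_mm_set_ss(1.0f)) == 0x3F800000);
}
```

With something like this, line 85 could pass extract_32bits(sum1) to _mm_stream_si32, assuming p is an int pointer there.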
Title: Re: optimized sources
Post by: BenHer on 16 Apr 2007, 05:24:31 pm
Hi all...back from vacation...woot.

Anyhow,
This most recent question...casting works quite well.

   ( (__m128 *)  &some_variable )

Another solution would be to create a middle variable of type union.

typedef union { __m128 fl_128;  __m128i int_128; } __m128both;

__m128both temp;
temp.fl_128 = input;
output = temp.int_128;
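The union variant can be wrapped in a small function. This is a sketch under assumptions: the union is renamed m128both here because names starting with two underscores are reserved for the compiler, and float_bits_to_int is a made-up name. Reading a union member other than the one last written is formally undefined in standard C++, but GCC documents it as a supported extension, and the Visual C++ headers define __m128 itself as such a union.

```cpp
#include <cassert>
#include <emmintrin.h>  // SSE2 intrinsics, x86 only

// The "middle variable" union from the post, under a non-reserved name.
union m128both {
    __m128  fl_128;
    __m128i int_128;
};

// Hypothetical helper: passes the 128 bits through the union unchanged.
inline __m128i float_bits_to_int(__m128 input) {
    m128both temp;
    temp.fl_128 = input;
    return temp.int_128;
}

// Usage check: 2.0f has the IEEE 754 bit pattern 0x40000000.
inline void demo_float_bits_to_int() {
    __m128i bits = float_bits_to_int(_mm_set_ss(2.0f));
    assert(_mm_cvtsi128_si32(bits) == 0x40000000);
}
```

Compared with the pointer cast, this costs a temporary but avoids taking the address of a value that the compiler might otherwise keep in a register.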

Title: Re: optimized sources
Post by: _heinz on 17 Apr 2007, 12:34:59 pm
@Crunch3r, I would do that, but then the license for the compiler expires after 1 month, and then ... ?
That is why I use Visual C++ 2005 with the SDK 2003 and just have to battle my way through, even if surprises turn up now and then that I did not expect at all.  :)
Today there was another one: Service Pack 1 for C++ 2005 (KB926748).
I installed it and afterwards nothing compiles any more, C4003 everywhere, a catastrophe. I have already searched various forums, but I think I will have to restore the state from before the update to make it work again. Or has somebody already solved these problems with the changed header files ???
Here are the problems:
Copyright (C) Microsoft Corporation.  All rights reserved.
cl /O2 /Ob2 /Oi /Ot /Oy /GT /I "../../../boinc/win_build" /I ".." /I "..\.." /I "..\..\..\boinc\lib" /I "../../../boinc/api" /I "../../db" /I "C:\Programme\Intel\IPP\5.2_beta\ia32\tools\staticlib" /I "C:\Programme\Intel\IPP\5.2_beta\ia32\include" /I "C:\Programme\Microsoft Platform SDK for Windows Server 2003 R2\Include" /I "C:\Programme\Intel\MKL\9.0\include" /I "C:\boincstuff\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer" /D "USE_IPP" /D "USE_SSE2" /D "WIN32" /D "_WIN32" /D "_WINDOWS" /D "_CONSOLE" /D "NDEBUG" /D "_LIB" /D "_MT" /D "CLIENT" /D "NBOINC_APP_GRAPHICS" /D "_UNICODE" /D "UNICODE" /D "_VC80_UPGRADE=0x0710" /D "_MBCS" /P /GF /FD /EHsc /MT /Zp16 /Gy /Fo"Release32-NOGFX\\" /Fd"Release32-NOGFX\vc80.pdb" /W3 /c /Wp64 /Zi /Gd /TP /FI "win-config.h" ".\AKfoldSSE.cpp"
AKfoldSSE.cpp
C:\Programme\Microsoft Visual Studio 8\VC\include\string.h(135) : warning C4003: not enough actual parameters for macro '__DEFINE_CPP_OVERLOAD_STANDARD_FUNC_0_1_EX'
C:\Programme\Microsoft Visual Studio 8\VC\include\string.h(173) : warning C4003: not enough actual parameters for macro '__DEFINE_CPP_OVERLOAD_STANDARD_FUNC_0_1_EX'
C:\Programme\Microsoft Visual Studio 8\VC\include\string.h(299) : warning C4003: not enough actual parameters for macro '__DEFINE_CPP_OVERLOAD_STANDARD_FUNC_0_1_EX'
C:\Programme\Microsoft Visual Studio 8\VC\include\string.h(305) : warning C4003: not enough actual parameters for macro '__DEFINE_CPP_OVERLOAD_STANDARD_FUNC_0_1_EX'
Build log was saved at "file://c:\boincstuff\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer\Release32-NOGFX\BuildLog.htm"
Optimizer - 0 error(s), 4 warning(s)
--------------------------------------------------------
And so it continues in all programs, even with errors, although all of them compiled without warnings before!
--------------------------------------------------------
And this is what it looks like now:
Microsoft Visual Studio 2005
Version 8.0.50727.762(SP.050727-7600)
Microsoft .NET Framework
Version 2.0.50727
--------------------------------------
Does anybody have further suggestions for the service pack problem ??

Title: Re: optimized sources
Post by: Crunch3r on 17 Apr 2007, 01:25:10 pm
Quote from: seti_britta
@Crunch3r, I would do that, but then the license for the compiler expires after 1 month, and then ... ?

Hmm, then you just test a new version of the Intel compiler ... for another month. That is legal as long as you do not publish the binaries.

...................................

About the service pack for VS 2005 ... have you already tried uninstalling it?

Title: Re: optimized sources
Post by: _heinz on 17 Apr 2007, 01:43:40 pm
Quote
@Crunch3r, I would do that, but then the license for the compiler expires after 1 month, and then ... ?

Hmm, then you just test a new version of the Intel compiler ... for another month. That is legal as long as you do not publish the binaries.

...................................

About the service pack for VS 2005 ... have you already tried uninstalling it?

No, I have not done anything yet... I want to search around some more and wait until tomorrow. If no sensible way turns up by then, I will restore the state from before the installation... uninstalling will not work, according to what I read in the forum.


Title: Re: optimized sources
Post by: _heinz on 17 Apr 2007, 03:37:05 pm
Quote
Hi all...back from vacation...woot.

Anyhow,
This most recent question...casting works quite well.

   ( (__m128 *)  &some_variable )

Another solution would be to create a middle variable of type union.

typedef union { __m128 fl_128;  __m128i int_128; } __m128both;

__m128both temp;

temp.fl_128 = input;
output = temp.int_128;

Thank you Ben, I will try it now.
Title: Re: optimized sources
Post by: Urs Echternacht on 17 Apr 2007, 04:01:47 pm
Quote
Today there was another one: Service Pack 1 for C++ 2005 (KB926748).
I installed it and afterwards nothing compiles any more, C4003 everywhere, a catastrophe. I have already searched various forums, but I think I will have to restore the state from before the update to make it work again. Or has somebody already solved these problems with the changed header files ???
...
--------------------------------------------------------
And this is what it looks like now:
Microsoft Visual Studio 2005
Version 8.0.50727.762(SP.050727-7600)
Microsoft .NET Framework
Version 2.0.50727
--------------------------------------
Does anybody have further suggestions for the service pack problem ??

Hi,
I also installed SP1 for VS2005E (at the beginning of January), but could not notice any problems or changes. Maybe you simply have to try it again (uninstall, reboot, install, reboot). Good luck.
Title: Re: optimized sources
Post by: _heinz on 19 Apr 2007, 04:50:29 pm
Hi all,
I have decided to uninstall SQL and Visual Studio 8, including C++ 2005, completely and start over with a fresh install of everything. That takes a little time.
Why am I doing this?
Most problems come from long path names that contain spaces.
For instance:
C:\Programme\Microsoft SQL Server\90\SDK\Lib\x86
C:\Programme\Microsoft Visual Studio 8\VC\lib
LIB and the linker don't like that.
Short names without spaces are required.
Like:
C:\I\SQL\90\SDK\Lib\
C:\I\VS8\VC\lib
C:\I\SDK
-------------------------------------------
@urs --> I have uninstalled Service Pack 1; my lib still does not work correctly when I use the preprocessor.
Best of all ---> a fresh install
 :)


     
Title: Re: optimized sources
Post by: Simon on 19 Apr 2007, 05:51:56 pm
Britta,

you can circumvent the long file name problem. "Programme" becomes "PROGRA~1", for example.

Basically, it works like:

Chars 1-6 are used as they are, in caps. Then, a tilde and a number is appended. So, "Program Files" on the same computer would be "PROGRA~2" with the short file name.

Using this scheme, you can convert any old file or path name. You could also just use a 16-bit app (one that comes with a vcrt30.dll or vbrun300/400.dll, for example) that opens a file. It will show you the path in the above syntax.
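The scheme described above can be sketched as a small function. This is a simplification under stated assumptions: it drops spaces, keeps the first six characters in upper case, and appends a tilde plus a caller-supplied number. It ignores file extensions and names that are already short enough, and short_name_guess is a made-up name, not a Windows API.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: guesses the short (8.3) form of a long directory
// name following the scheme from the post. The index is whatever number
// Windows handed out, which depends on the names that already exist.
inline std::string short_name_guess(const std::string& long_name, int index) {
    assert(index >= 1 && index <= 9);  // this sketch handles one digit only
    std::string out;
    for (char c : long_name) {
        if (out.size() == 6) break;    // only the first six characters count
        if (c == ' ') continue;        // spaces are dropped from short names
        out += static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
    out += '~';
    out += static_cast<char>('0' + index);
    return out;
}
```

For example, short_name_guess("Programme", 1) gives "PROGRA~1". Since the number is assigned by the file system, a guess should be checked against what Windows really assigned; dir /x in a command prompt lists the short names.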

HTH,
Simon.
Title: Re: optimized sources
Post by: _heinz on 20 Apr 2007, 05:36:11 am
@Simon
Thanks for the hints....
- I have problems uninstalling SQL Server 2005; it looks like a difficult procedure.
I found this: http://support.microsoft.com/kb/909967/en-us?spid=2855&sid=699
Is there an easy way to do the uninstall? Do you have a hint?
If I try to uninstall via Control Panel --> Add or Remove Programs, I get error 2147944122, a problem with the WMI configuration.
 :'(
Title: Re: optimized sources
Post by: _heinz on 21 Apr 2007, 08:07:27 am
I have now rolled back: I took last Thursday's restore point in System Restore, uninstalled all Visual applications and the IDE, and installed some updates from Microsoft to be at the current update level. SQL Server is now running again, and so is .NET Framework 2.0 with the latest updates.
I let BOINC run. It crashes after some time. See "error message in Boinc"  :'(

Title: Re: optimized sources
Post by: Devaster on 21 Apr 2007, 12:42:47 pm
seti_britta:

maybe it is better to reinstall the whole OS .... ;)
Title: Re: optimized sources
Post by: _heinz on 21 Apr 2007, 04:19:54 pm
hi all,
After great difficulties, I have now uninstalled the damaged .NET Framework. I used a tool named dotnetfx_cleanup_tool. It was not easy to find. After restarting the PC, I was able to install .NET Framework 2.0 successfully. Seti BOINC now works without any problems.
After that I uninstalled SQL Server successfully. Now I can start setting up Visual Studio and the IDE with short path names.
------------------
 :)

Title: Re: optimized sources
Post by: _heinz on 22 Apr 2007, 06:19:40 pm
Today I uninstalled several user programs, cleaned the registry, defragmented all disks, and made a full backup, in preparation for the fresh install of SQL and Visual Studio.
The work goes on.
 ;D
Title: Re: optimized sources
Post by: _heinz on 23 Apr 2007, 09:38:12 pm
Hi all,
My newly installed development system is running. Short paths are active now:

C:\I\INTEL\IPP
C:\I\INTEL\MKL
C:\I\SC\seti              -->for the seti sources
C:\I\SC\seti\boinc
C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe
C:\I\SDK                 --> Software Development Kit
C:\I\VS8                 --> Visual Studio
------------------------------------------------------
If I don't use the preprocessor, the system works fine.
Using the preprocessor is still a problem: the linker does not find the object module.
Can one of you have a look at this, Crunch3r, Simon ?
----------------------------------------------------------------------------------
Build Log     Build started: Project: "setiboincdb", Configuration: "Release32-NOGFX|Win32"
 Command Lines     Creating temporary file "c:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\RSP0000225483688.rsp" with contents
[
/O2 /Ob2 /Oi /Ot /Oy /GT /I "C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\db" /I "C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX" /I ".." /I "..\.." /I "..\..\..\boinc\lib" /I "..\..\..\boinc\win_build" /D "Release32-NOGFX" /D "WIN32" /D "NDEBUG" /D "_LIB" /D "_MT" /D "_WINDOWS" /D "_CONSOLE" /D "HAVE_STD_MAX" /D "HAVE_STD_MIN" /D "HAVE_STD_TRANSFORM" /D "CLIENT" /D "NBOINC_APP_GRAPHICS" /D "_VC80_UPGRADE=0x0710" /D "_MBCS" /P /GF /FD /EHsc /MT /Zp16 /GS- /Gy /Fo"Release32-NOGFX\\" /Fd"Release32-NOGFX\vc80.pdb" /W3 /c /Wp64 /TP "..\..\db\xml_util.cpp"

"..\..\db\sqlrow.cpp"

"..\..\db\sqlint8.cpp"

"..\..\db\sqlblob.cpp"
]Creating command line "cl.exe @"c:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\RSP0000225483688.rsp" /errorReport:prompt"
Creating temporary file "c:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\RSP0000235483688.rsp" with contents
[
/OUT:"C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\setiboincdb.lib" /LIBPATH:"C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX" /LIBPATH:"C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build"

".\Release32-NOGFX\sqlblob.obj"

".\Release32-NOGFX\sqlint8.obj"

".\Release32-NOGFX\sqlrow.obj"

".\Release32-NOGFX\xml_util.obj"
]Creating command line "lib.exe @"c:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\RSP0000235483688.rsp""
 Output Window     Compiling...
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.42 for 80x86
Copyright (C) Microsoft Corporation.  All rights reserved.
cl /O2 /Ob2 /Oi /Ot /Oy /GT /I "C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\db" /I "C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX" /I ".." /I "..\.." /I "..\..\..\boinc\lib" /I "..\..\..\boinc\win_build" /D "Release32-NOGFX" /D "WIN32" /D "NDEBUG" /D "_LIB" /D "_MT" /D "_WINDOWS" /D "_CONSOLE" /D "HAVE_STD_MAX" /D "HAVE_STD_MIN" /D "HAVE_STD_TRANSFORM" /D "CLIENT" /D "NBOINC_APP_GRAPHICS" /D "_VC80_UPGRADE=0x0710" /D "_MBCS" /P /GF /FD /EHsc /MT /Zp16 /GS- /Gy /Fo"Release32-NOGFX\\" /Fd"Release32-NOGFX\vc80.pdb" /W3 /c /Wp64 /TP "..\..\db\xml_util.cpp"
   "..\..\db\sqlrow.cpp"
   "..\..\db\sqlint8.cpp"
   "..\..\db\sqlblob.cpp"
xml_util.cpp
sqlrow.cpp
sqlint8.cpp
sqlblob.cpp
Creating library...
Microsoft (R) Library Manager Version 8.00.50727.42
Copyright (C) Microsoft Corporation.  All rights reserved.
"/OUT:C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\setiboincdb.lib" "/LIBPATH:C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX" "/LIBPATH:C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build"
".\Release32-NOGFX\sqlblob.obj"
".\Release32-NOGFX\sqlint8.obj"
".\Release32-NOGFX\sqlrow.obj"
".\Release32-NOGFX\xml_util.obj"
LIB : fatal error LNK1181: cannot open input file ".\Release32-NOGFX\sqlblob.obj"
 Results     Build log was saved at "file://c:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\BuildLog.htm"
setiboincdb - 1 error(s), 0 warning(s)
------------------------------------------------------
greetings seti_britta


Title: Re: optimized sources
Post by: _heinz on 24 Apr 2007, 07:15:09 pm
Hi,
Today I compiled seti_boinc. All source files are free of errors now. At the end I have a small problem with the linker: it does not find an object file.
------------------------------------------------------
Build Log     Build started: Project: "seti_boinc", Configuration: "Release32-NOGFX|Win32"
 Command Lines     Creating temporary file "c:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\RSP00001F12082484.rsp" with contents
[
/OUT:".\Release/seti_boinc.exe" /INCREMENTAL:NO /LIBPATH:"C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\glut" /LIBPATH:"C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\image_libs" /LIBPATH:"C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\jpeglib" /LIBPATH:"glut" /LIBPATH:"image_libs" /LIBPATH:"jpeglib" /MANIFEST /MANIFESTFILE:".\Release32-NOGFX\seti_boinc.exe.intermediate.manifest" /PDB:".\Release/seti_boinc.pdb" /SUBSYSTEM:WINDOWS /MACHINE:X86 glut32.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib

".\Release32-NOGFX\amd64fft8g.obj"

".\Release32-NOGFX\analyzeFuncs.obj"

".\Release32-NOGFX\analyzePoT.obj"

".\Release32-NOGFX\analyzeReport.obj"

".\Release32-NOGFX\app_ipc.obj"

".\Release32-NOGFX\boinc_api.obj"

".\Release32-NOGFX\chirpfft.obj"

".\Release32-NOGFX\fft8g.obj"

".\Release32-NOGFX\filesys.obj"

".\Release32-NOGFX\gaussfit.obj"

".\Release32-NOGFX\gdata.obj"

".\Release32-NOGFX\graphics_api.obj"

".\Release32-NOGFX\graphics_data.obj"

".\Release32-NOGFX\gutil.obj"

".\Release32-NOGFX\lcgamm.obj"

".\Release32-NOGFX\main.obj"

".\Release32-NOGFX\malloc_a.obj"

".\Release32-NOGFX\parse.obj"

".\Release32-NOGFX\pulsefind.obj"

".\Release32-NOGFX\s_util.obj"

".\Release32-NOGFX\sah_gfx.obj"

".\Release32-NOGFX\sah_gfx_base.obj"

".\Release32-NOGFX\schema_master.obj"

".\Release32-NOGFX\seti.obj"

".\Release32-NOGFX\seti_header.obj"

".\Release32-NOGFX\shmem.obj"

".\Release32-NOGFX\spike.obj"

".\Release32-NOGFX\sqlblob.obj"

".\Release32-NOGFX\sqlrow.obj"

".\Release32-NOGFX\timecvt.obj"

".\Release32-NOGFX\util.obj"

".\Release32-NOGFX\version.obj"

".\Release32-NOGFX\windows_opengl.obj"

".\Release32-NOGFX\worker.obj"

".\Release32-NOGFX\tgalib.obj"

".\Release32-NOGFX\xml_util.obj"

".\Release32-NOGFX\rdtarga.obj"

".\Release32-NOGFX\jcapimin.obj"

".\Release32-NOGFX\jcapistd.obj"

".\Release32-NOGFX\jccoefct.obj"

".\Release32-NOGFX\jccolor.obj"

".\Release32-NOGFX\jcdctmgr.obj"

".\Release32-NOGFX\jchuff.obj"

".\Release32-NOGFX\jcinit.obj"

".\Release32-NOGFX\jcmainct.obj"

".\Release32-NOGFX\jcmarker.obj"

".\Release32-NOGFX\jcmaster.obj"

".\Release32-NOGFX\jcomapi.obj"

".\Release32-NOGFX\jcparam.obj"

".\Release32-NOGFX\jcphuff.obj"

".\Release32-NOGFX\jcprepct.obj"

".\Release32-NOGFX\jcsample.obj"

".\Release32-NOGFX\jctrans.obj"

".\Release32-NOGFX\jdapimin.obj"

".\Release32-NOGFX\jdapistd.obj"

".\Release32-NOGFX\jdatadst.obj"

".\Release32-NOGFX\jdatasrc.obj"

".\Release32-NOGFX\jdcoefct.obj"

".\Release32-NOGFX\jdcolor.obj"

".\Release32-NOGFX\jddctmgr.obj"

".\Release32-NOGFX\jdhuff.obj"

".\Release32-NOGFX\jdinput.obj"

".\Release32-NOGFX\jdmainct.obj"

".\Release32-NOGFX\jdmarker.obj"

".\Release32-NOGFX\jdmaster.obj"

".\Release32-NOGFX\jdmerge.obj"

".\Release32-NOGFX\jdphuff.obj"

".\Release32-NOGFX\jdpostct.obj"

".\Release32-NOGFX\jdsample.obj"

".\Release32-NOGFX\jdtrans.obj"

".\Release32-NOGFX\jerror.obj"

".\Release32-NOGFX\jfdctflt.obj"

".\Release32-NOGFX\jfdctfst.obj"

".\Release32-NOGFX\jfdctint.obj"

".\Release32-NOGFX\jidctflt.obj"

".\Release32-NOGFX\jidctfst.obj"

".\Release32-NOGFX\jidctint.obj"

".\Release32-NOGFX\jidctred.obj"

".\Release32-NOGFX\jmemmgr.obj"

".\Release32-NOGFX\jmemnobs.obj"

".\Release32-NOGFX\jquant1.obj"

".\Release32-NOGFX\jquant2.obj"

".\Release32-NOGFX\jutils.obj"

".\Release32-NOGFX\rdbmp.obj"

".\Release32-NOGFX\rdcolmap.obj"

".\Release32-NOGFX\rdgif.obj"

".\Release32-NOGFX\rdppm.obj"

".\Release32-NOGFX\rdrle.obj"

".\Release32-NOGFX\rdswitch.obj"
]Erstellen der Befehlszeile "link.exe @"c:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\RSP00001F12082484.rsp" /NOLOGO /ERRORREPORT:PROMPT" Ausgabefenster     Verknüpfen...
LINK : fatal error LNK1181: Eingabedatei ".\Release32-NOGFX\amd64fft8g.obj" kann nicht geöffnet werden.
 Ergebnisse     Das Buildprotokoll wurde unter "file://c:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\BuildLog.htm" gespeichert.
seti_boinc - 1 Fehler, 0 Warnung(en)
--------------------------------------------------------
 ;)
 
Title: Re: optimized sources
Post by: Aragon Speed on 25 Apr 2007, 05:28:40 am
@Aragon
as you know, I'm working on further optimization of the seti source code. So far I have made more than two hundred changes in the source code. The main problem is the migration from 2003 to Visual C++ 2005. Many changes are necessary to compile the sources without any errors and without a lot of warnings. That's not an easy job, and many problems have to be solved along the way. That's what I'm doing at the moment. Crunch3r helped me to set up the Visual C++ 2005 development system, using IPP and MKL.
-------------------------------------------
for all others: work is going on
glut
image_libs
jpeglib
libboinc
libboincapi
setiboincdb
now compile without any errors or warnings at /W3
--------------------------------------------
Optimizer is 90% done,
--------------------------------
happy weekend  ;)

Had a problem with my PC so it's taken me a while to get back here.... ::)
Thx for the update. ;)
Title: Re: optimized sources
Post by: _heinz on 01 May 2007, 03:33:45 pm
Hi all,
back from the weekend holiday I switched on my PC, but the power supply did not start. After I disconnected all disks, the power supply started and the board ran. I think the power supply is defective. Tomorrow I will buy a more powerful 420 W power supply and repair the PC. It is a P4 2.6 GHz with 2 IDE disks, 1 CD burner and 1 DVD burner. This machine had been running 24/7 for 2 years and 2 months.
Today I'm here with my old 200 MMX running W98 to write these messages.
It's an installation on the fly: board, PSU and disk are lying on the desk in my lab. As you can see, it works.....
Hoping to be back soon.
 :)
Title: Re: optimized sources
Post by: Simon on 01 May 2007, 03:37:45 pm
Good luck Britta!

In the past week, I've replaced two faulty PSUs that just gave up and died after running for 5+ years each. I've always got some spare PSUs lying around for this very reason - if a mission-critical system goes down, it'll be back up soon enough (newer servers all get ordered with redundant PSUs for this reason - my boss doesn't like the extra cost, but he liked the cost from downtime a whole lot less...).

Regards,
Simon.
Title: Re: optimized sources
Post by: _heinz on 03 May 2007, 04:18:48 pm
Yesterday I put the new PSU in the PC and it started successfully, but it did not load XP from the disk. So I decided to restore my last full backup from April 24th. The restore of the system disk was successful and XP starts again. I'm happy that no other hardware was damaged. Both disks are OK now and the computer works normally again.
In summary, I lost more than a complete week of work.
I must now reinstall Visual Studio including C++, the SDK, IPP and MKL to continue.
Work is going on.
-------------------------
Conclusion: it is always good to have a full backup of the system disk.
regards seti_britta.
 ;D
Title: Re: optimized sources
Post by: _heinz on 04 May 2007, 03:42:58 pm
The SDK is now reinstalled and I can compile the seti sources again. I tried to compile Optimizer; all sources compiled, but at the end LIB stopped with an error:
LIB : fatal error LNK1181: Eingabedatei ".\Release32-NOGFX\AKfoldSSE.obj" kann nicht geöffnet werden.
Simon, can you have a look at this --->  BuildLog.zip
I can't see the error today ;-(
-------------------------------------------------------------------
Today, May 4, my Intel MKL license expired.
This message is to alert you that your evaluation period for the product below will expire as of May 04, 2007.
Product Name: Intel(R) Math Kernel Library for Windows*
------------------------------------------------------------------------------
After a fresh install of the .NET Framework 2.0, BOINC (installed as a service) stopped after half an hour of work with the following message:
BOINC client
Problem signature
szAppName:boinc.exe szAppVer:5.9.0.64 szModName:msvcr80.dll szModVer:8.0.50727.42 Offset:00008890
-------------------------------------
I installed the new client again from Crunch3r's website, but without success; the same error occurred.
Any ideas??? Crunch3r, Simon??

Mfg seti_britta


[attachment deleted by admin]
Title: Re: optimized sources
Post by: _heinz on 04 May 2007, 06:46:31 pm
Last attempt: I do not need that amd file, but the linker looks for it.
-------------------
------ Neues Erstellen gestartet: Projekt: seti_boinc, Konfiguration: Release32-NOGFX Win32 ------
Die Zwischen- und Ausgabedateien für das Projekt "seti_boinc" mit der Konfiguration "Release32-NOGFX|Win32" werden gelöscht.
Kompilieren...
xml_util.cpp
tgalib.cpp
worker.cpp
windows_opengl.C
version.cpp
util.C
timecvt.cpp
sqlrow.cpp
sqlblob.cpp
spike.cpp
shmem.C
seti_header.cpp
seti.cpp
schema_master.cpp
sah_gfx_base.cpp
sah_gfx.cpp
s_util.cpp
pulsefind.cpp
parse.C
malloc_a.cpp
main.cpp
lcgamm.cpp
gutil.C
graphics_data.C
graphics_api.C
gdata.cpp
gaussfit.cpp
filesys.C
fft8g.cpp
chirpfft.cpp
boinc_api.C
app_ipc.C
analyzeReport.cpp
analyzePoT.cpp
analyzeFuncs.cpp
amd64fft8g.cpp
Verknüpfen...
LINK : fatal error LNK1181: Eingabedatei ".\Release32-NOGFX\amd64fft8g.obj" kann nicht geöffnet werden.
Browseinformationsdatei wird erstellt...
Microsoft Browse Information Maintenance-Programm Version 8.00.50727
Copyright (C) Microsoft Corporation. All rights reserved.
BSCMAKE: error BK1506 : Datei ".\Release32-NOGFX\amd64fft8g.sbr" kann nicht geöffnet werden: No such file or directory
Das Buildprotokoll wurde unter "file://c:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\BuildLog.htm" gespeichert.
seti_boinc - 2 Fehler, 0 Warnung(en)
------------------------------------------------------
see attached file


[attachment deleted by admin]
Title: Re: optimized sources
Post by: Simon on 04 May 2007, 07:10:16 pm
Hi Britta,

you can just remove that file from the project, as it's not really used anymore AFAIK.

Just right-click the file and choose "entfernen" (Remove).

HTH,
Simon.
Title: Re: optimized sources
Post by: _heinz on 12 May 2007, 11:29:26 am
Hi all,
after weeks of trouble I'm back again. I have had a lot of trouble with the .NET Framework 2.0. The latest Seti client 5.9.0.32 from Crunch3r crashes continuously after 20 minutes of work. I did not find the real reason. I have now installed the 5.8.11 client from Crunch3r. It has now been running for two days without any error.
But I think the .NET 2.0 Framework is not free of errors. I installed some updates and service packs from the net, but nothing helped.
The curious thing is that the 5.9.0.32 client worked for about 3 months without any problems.
The trouble began with the installation of Service Pack 1 for MS Visual C++ 2005 (KB926748).
------------------------------------------------------------------
found this for hotfix: http://connect.microsoft.com/VisualStudio/content/content.aspx?ContentID=3705
It is annoying to have so much trouble with this.
---------------------------------------------------------------------
Now I will go back to the sources to continue my work.
Regards seti_britta.

Title: Re: optimized sources
Post by: _heinz on 14 May 2007, 05:08:10 pm
Hello Simon,
can you help???
I keep getting error LNK1181 and don't know what else to do. Possible causes are listed here:
http://search.microsoft.com/results.aspx?mkt=de-DE&setlang=de-DE&q=LNK1181
but I can't find anything that fits my case.
Output directory: .\Release32-NOGFX
Intermediate directory: .\Release32-NOGFX
Temporary files are created, but then apparently they are not found.

---------------------------------------------------------
Buildprotokoll     Erstellen wurde gestartet: Projekt: "seti_boinc", Konfiguration: "Release32-NOGFX|Win32"
 Befehlszeilen     Die temporäre Datei "c:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\RSP00000637123916.rsp" wird erstellt. Inhalt:
[
/O2 /Ob1 /I "C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\jpeglib" /I "C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\image_libs" /I "C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\glut" /I "C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build" /I "C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer" /I "C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe" /I "C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client" /I "C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\db" /I "C:\I\SC\seti\boinc\client\win" /I "C:\I\SC\seti\boinc\lib" /I "C:\I\SC\seti\boinc\api" /I "C:\I\SC\seti\boinc" /I "." /I "../../../boinc/api" /I "../../../boinc/lib" /I ".." /I "glut" /D "WIN32" /D "_WIN32" /D "NDEBUG" /D "_WINDOWS" /D "NBOINC_APP_GRAPHICS" /D "CLIENT" /D "USE_IPP" /D "USE_SSE2" /D "_MT" /D "_VC80_UPGRADE=0x0600" /D "_MBCS" /P /GF /FD /EHsc /MT /Gy /Yu"stdafx.h" /Fp".\Release/seti_boinc.pch" /Fo".\Release32-NOGFX\\" /Fd".\Release32-NOGFX\vc80.pdb" /FR".\Release32-NOGFX\\" /W3 /c /TP "..\..\db\xml_util.cpp"

"..\worker.cpp"

"..\..\..\boinc\api\windows_opengl.C"

"..\version.cpp"

"..\..\..\boinc\lib\util.C"

"..\timecvt.cpp"

"..\..\image_libs\tgalib.cpp"

"..\..\db\sqlrow.cpp"

"..\..\db\sqlblob.cpp"

"..\spike.cpp"

"..\..\..\boinc\lib\shmem.C"

"..\seti_header.cpp"

"..\seti.cpp"

"..\..\db\schema_master.cpp"

"..\sah_gfx_base.cpp"

"..\sah_gfx.cpp"

"..\s_util.cpp"

"..\pulsefind.cpp"

"..\..\..\boinc\lib\parse.C"

"..\malloc_a.cpp"

"..\main.cpp"

"..\lcgamm.cpp"

"..\..\..\boinc\api\gutil.C"

"..\..\..\boinc\api\graphics_data.C"

"..\..\..\boinc\api\graphics_api.C"

"..\gdata.cpp"

"..\gaussfit.cpp"

"..\..\..\boinc\lib\filesys.C"

"..\fft8g.cpp"

"..\chirpfft.cpp"

"..\..\..\boinc\api\boinc_api.C"

"..\..\..\boinc\lib\app_ipc.C"

"..\analyzeReport.cpp"

"..\analyzePoT.cpp"
]Erstellen der Befehlszeile "cl.exe @"c:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\RSP00000637123916.rsp" /nologo /errorReport:prompt"Die temporäre Datei "c:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\RSP00000737123916.rsp" wird erstellt. Inhalt:
[
/O2 /Ob1 /I "C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\jpeglib" /I "C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\image_libs" /I "C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\glut" /I "C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build" /I "C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer" /I "C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe" /I "C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client" /I "C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\db" /I "C:\I\SC\seti\boinc\client\win" /I "C:\I\SC\seti\boinc\lib" /I "C:\I\SC\seti\boinc\api" /I "C:\I\SC\seti\boinc" /I "." /I "../../../boinc/api" /I "../../../boinc/lib" /I ".." /I "glut" /D "WIN32" /D "_WIN32" /D "NDEBUG" /D "_WINDOWS" /D "NBOINC_APP_GRAPHICS" /D "CLIENT" /D "USE_IPP" /D "USE_SSE2" /D "_MT" /D "_VC80_UPGRADE=0x0600" /D "_MBCS" /P /GF /FD /EHsc /MT /Gy /Yu"stdafx.h" /Fp".\Release/seti_boinc.pch" /Fo".\Release32-NOGFX\\" /Fd".\Release32-NOGFX\vc80.pdb" /FR".\Release32-NOGFX\\" /W3 /c /TP "..\analyzeFuncs.cpp"
]Erstellen der Befehlszeile "cl.exe @"c:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\RSP00000737123916.rsp" /nologo /errorReport:prompt"Die temporäre Datei "c:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\RSP00000837123916.rsp" wird erstellt. Inhalt:
[
/OUT:".\Release/seti_boinc.exe" /INCREMENTAL:NO /LIBPATH:"C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\glut" /LIBPATH:"C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\image_libs" /LIBPATH:"C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\jpeglib" /LIBPATH:"C:\I\INTEL\MKL\9.0\ia32\lib" /LIBPATH:"glut" /LIBPATH:"C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX" /MANIFEST /MANIFESTFILE:".\Release32-NOGFX\seti_boinc.exe.intermediate.manifest" /PDB:".\Release/seti_boinc.pdb" /SUBSYSTEM:WINDOWS /MACHINE:X86 glut32.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib

".\Release32-NOGFX\analyzeFuncs.obj"

".\Release32-NOGFX\analyzePoT.obj"

".\Release32-NOGFX\analyzeReport.obj"

".\Release32-NOGFX\app_ipc.obj"

".\Release32-NOGFX\boinc_api.obj"

".\Release32-NOGFX\chirpfft.obj"

".\Release32-NOGFX\fft8g.obj"

".\Release32-NOGFX\filesys.obj"

".\Release32-NOGFX\gaussfit.obj"

".\Release32-NOGFX\gdata.obj"

".\Release32-NOGFX\graphics_api.obj"

".\Release32-NOGFX\graphics_data.obj"

".\Release32-NOGFX\gutil.obj"

".\Release32-NOGFX\lcgamm.obj"

".\Release32-NOGFX\main.obj"

".\Release32-NOGFX\malloc_a.obj"

".\Release32-NOGFX\parse.obj"

".\Release32-NOGFX\pulsefind.obj"

".\Release32-NOGFX\s_util.obj"

".\Release32-NOGFX\sah_gfx.obj"

".\Release32-NOGFX\sah_gfx_base.obj"

".\Release32-NOGFX\schema_master.obj"

".\Release32-NOGFX\seti.obj"

".\Release32-NOGFX\seti_header.obj"

".\Release32-NOGFX\shmem.obj"

".\Release32-NOGFX\spike.obj"

".\Release32-NOGFX\sqlblob.obj"

".\Release32-NOGFX\sqlrow.obj"

".\Release32-NOGFX\tgalib.obj"

".\Release32-NOGFX\timecvt.obj"

".\Release32-NOGFX\util.obj"

".\Release32-NOGFX\version.obj"

".\Release32-NOGFX\windows_opengl.obj"

".\Release32-NOGFX\worker.obj"

".\Release32-NOGFX\xml_util.obj"
]Erstellen der Befehlszeile "link.exe @"c:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\RSP00000837123916.rsp" /ERRORREPORT:PROMPT"Die temporäre Datei "c:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\RSP00000937123916.rsp" wird erstellt. Inhalt:
[
/o ".\Release/seti_boinc.bsc"

".\Release32-NOGFX\analyzePoT.sbr"

".\Release32-NOGFX\analyzeReport.sbr"

".\Release32-NOGFX\app_ipc.sbr"

".\Release32-NOGFX\boinc_api.sbr"

".\Release32-NOGFX\chirpfft.sbr"

".\Release32-NOGFX\fft8g.sbr"

".\Release32-NOGFX\filesys.sbr"

".\Release32-NOGFX\gaussfit.sbr"

".\Release32-NOGFX\gdata.sbr"

".\Release32-NOGFX\graphics_api.sbr"

".\Release32-NOGFX\graphics_data.sbr"

".\Release32-NOGFX\gutil.sbr"

".\Release32-NOGFX\lcgamm.sbr"

".\Release32-NOGFX\main.sbr"

".\Release32-NOGFX\malloc_a.sbr"

".\Release32-NOGFX\parse.sbr"

".\Release32-NOGFX\pulsefind.sbr"

".\Release32-NOGFX\s_util.sbr"

".\Release32-NOGFX\sah_gfx.sbr"

".\Release32-NOGFX\sah_gfx_base.sbr"

".\Release32-NOGFX\schema_master.sbr"

".\Release32-NOGFX\seti.sbr"

".\Release32-NOGFX\seti_header.sbr"

".\Release32-NOGFX\shmem.sbr"

".\Release32-NOGFX\spike.sbr"

".\Release32-NOGFX\sqlblob.sbr"

".\Release32-NOGFX\sqlrow.sbr"

".\Release32-NOGFX\tgalib.sbr"

".\Release32-NOGFX\timecvt.sbr"

".\Release32-NOGFX\util.sbr"

".\Release32-NOGFX\version.sbr"

".\Release32-NOGFX\windows_opengl.sbr"

".\Release32-NOGFX\worker.sbr"

".\Release32-NOGFX\xml_util.sbr"

".\Release32-NOGFX\analyzeFuncs.sbr"
]Erstellen der Befehlszeile "bscmake.exe @"c:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\RSP00000937123916.rsp" /nologo" Ausgabefenster     Kompilieren...
xml_util.cpp
worker.cpp
windows_opengl.C
version.cpp
util.C
timecvt.cpp
tgalib.cpp
sqlrow.cpp
sqlblob.cpp
spike.cpp
shmem.C
seti_header.cpp
seti.cpp
schema_master.cpp
sah_gfx_base.cpp
sah_gfx.cpp
s_util.cpp
pulsefind.cpp
parse.C
malloc_a.cpp
main.cpp
lcgamm.cpp
gutil.C
graphics_data.C
graphics_api.C
gdata.cpp
gaussfit.cpp
filesys.C
fft8g.cpp
chirpfft.cpp
boinc_api.C
app_ipc.C
analyzeReport.cpp
analyzePoT.cpp
Kompilieren...
analyzeFuncs.cpp
Verknüpfen...
Microsoft (R) Incremental Linker Version 8.00.50727.762
Copyright (C) Microsoft Corporation.  All rights reserved.
"/OUT:.\Release/seti_boinc.exe" /INCREMENTAL:NO "/LIBPATH:C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\glut" "/LIBPATH:C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\image_libs" "/LIBPATH:C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\jpeglib" "/LIBPATH:C:\I\INTEL\MKL\9.0\ia32\lib" "/LIBPATH:glut" "/LIBPATH:C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX" /MANIFEST "/MANIFESTFILE:.\Release32-NOGFX\seti_boinc.exe.intermediate.manifest" "/PDB:.\Release/seti_boinc.pdb" /SUBSYSTEM:WINDOWS /MACHINE:X86 glut32.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib
".\Release32-NOGFX\analyzeFuncs.obj"
".\Release32-NOGFX\analyzePoT.obj"
".\Release32-NOGFX\analyzeReport.obj"
".\Release32-NOGFX\app_ipc.obj"
".\Release32-NOGFX\boinc_api.obj"
".\Release32-NOGFX\chirpfft.obj"
".\Release32-NOGFX\fft8g.obj"
".\Release32-NOGFX\filesys.obj"
".\Release32-NOGFX\gaussfit.obj"
".\Release32-NOGFX\gdata.obj"
".\Release32-NOGFX\graphics_api.obj"
".\Release32-NOGFX\graphics_data.obj"
".\Release32-NOGFX\gutil.obj"
".\Release32-NOGFX\lcgamm.obj"
".\Release32-NOGFX\main.obj"
".\Release32-NOGFX\malloc_a.obj"
".\Release32-NOGFX\parse.obj"
".\Release32-NOGFX\pulsefind.obj"
".\Release32-NOGFX\s_util.obj"
".\Release32-NOGFX\sah_gfx.obj"
".\Release32-NOGFX\sah_gfx_base.obj"
".\Release32-NOGFX\schema_master.obj"
".\Release32-NOGFX\seti.obj"
".\Release32-NOGFX\seti_header.obj"
".\Release32-NOGFX\shmem.obj"
".\Release32-NOGFX\spike.obj"
".\Release32-NOGFX\sqlblob.obj"
".\Release32-NOGFX\sqlrow.obj"
".\Release32-NOGFX\tgalib.obj"
".\Release32-NOGFX\timecvt.obj"
".\Release32-NOGFX\util.obj"
".\Release32-NOGFX\version.obj"
".\Release32-NOGFX\windows_opengl.obj"
".\Release32-NOGFX\worker.obj"
".\Release32-NOGFX\xml_util.obj"
LINK : fatal error LNK1181: Eingabedatei ".\Release32-NOGFX\analyzeFuncs.obj" kann nicht geöffnet werden.
Browseinformationsdatei wird erstellt...
Microsoft Browse Information Maintenance-Programm Version 8.00.50727
Copyright (C) Microsoft Corporation. All rights reserved.
BSCMAKE: error BK1506 : Datei ".\Release32-NOGFX\analyzePoT.sbr" kann nicht geöffnet werden: No such file or directory
 Ergebnisse     Das Buildprotokoll wurde unter "file://c:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\BuildLog.htm" gespeichert.
seti_boinc - 2 Fehler, 0 Warnung(en)
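Note on the log above: the cl.exe option lines contain /P. According to Microsoft's compiler documentation, /P writes the preprocessed source to a .i file and suppresses compilation, so no .obj file is produced; /E and /EP likewise only preprocess. That would be consistent with the linker not finding its inputs, although the thread itself does not confirm this as the cause. A minimal sketch for spotting such switches in an option string copied from a BuildLog; the function name preprocess_only_flags is hypothetical, and only the slash spelling of the switches is checked.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns the preprocess-only switches (/P, /E, /EP)
// found in a cl.exe option string. Tokens are split on whitespace, which is
// good enough for spotting bare switches; cl options are case-sensitive.
std::vector<std::string> preprocess_only_flags(const std::string& options)
{
    std::vector<std::string> found;
    std::istringstream tokens(options);
    std::string tok;
    while (tokens >> tok) {
        if (tok == "/P" || tok == "/E" || tok == "/EP") {
            found.push_back(tok);
        }
    }
    return found;
}
```

If the returned list is not empty, the compiler run behind that option string would not be expected to write object files.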
 
Title: Re: optimized sources
Post by: Simon on 15 May 2007, 06:21:07 am
Ah yes, I've seen something like that before.

"Incremental Linking" == EVIL!

Please switch that off (in the project configuration, in the Linker section), then some of these problems should go away.

Regards,
Simon.
Title: Re: optimized sources
Post by: _heinz on 16 May 2007, 07:26:12 pm
Hello Simon,
a problem with VS2005:
whenever the preprocessor is switched on, the sources are compiled but no object module is written;
see the small example program that prints hallo
------------------------------------------------------------------
------ Erstellen gestartet: Projekt: hallo, Konfiguration: Release Win32 ------
Kompilieren...
stdafx.cpp
Kompilieren...
hallo.cpp
Verknüpfen...
LINK : fatal error LNK1181: Eingabedatei ".\Release\stdafx.obj" kann nicht geöffnet werden.
Das Buildprotokoll wurde unter "file://c:\I\VS2005\Projects\hallo\hallo\Release\BuildLog.htm" gespeichert.
hallo - 1 Fehler, 0 Warnung(en)
========== Erstellen: 0 erfolgreich, Fehler bei 1, 0 aktuell, 0 übersprungen ==========

How and where does this have to be set correctly so that it works?
Regards, heinz ~seti_britta
Title: Re: optimized sources
Post by: Crunch3r on 17 May 2007, 09:01:01 am
Hello Simon,
a problem with VS2005:
whenever the preprocessor is switched on, the sources are compiled but no object module is written;
see the small example program that prints hallo
------------------------------------------------------------------
------ Erstellen gestartet: Projekt: hallo, Konfiguration: Release Win32 ------
Kompilieren...
stdafx.cpp
Kompilieren...
hallo.cpp
Verknüpfen...
LINK : fatal error LNK1181: Eingabedatei ".\Release\stdafx.obj" kann nicht geöffnet werden.
Das Buildprotokoll wurde unter "file://c:\I\VS2005\Projects\hallo\hallo\Release\BuildLog.htm" gespeichert.
hallo - 1 Fehler, 0 Warnung(en)
========== Erstellen: 0 erfolgreich, Fehler bei 1, 0 aktuell, 0 übersprungen ==========

How and where does this have to be set correctly so that it works?
Regards, heinz ~seti_britta


Try switching off the use of "Precompiled Headers", or set it to "Create Precompiled Headers".


Title: Re: optimized sources
Post by: _heinz on 17 May 2007, 10:54:37 am
Hello Crunch3r,
first, perhaps, the current state of things:
I have now reinstalled the IDE for the second time, partly because of the long directory names and the problem the linker has with spaces in path names.
In spite of all that, in the projects that require a preprocessor run (Optimizer, seti_boinc)
I always get LNK1181 "input file ... cannot be opened" at the end.
For this reason I have now made a small example.
Here is the source text:
--------------------------------
// hallo.cpp : Defines the entry point for the console application.
//

#include "stdafx.h"
#include <stdio.h>


int _tmain(int argc, _TCHAR* argv[])
{
#ifndef A
#define A char;
#endif
#define B char;


#ifdef A
   printf("Hallo\n");

#else
   printf("Guten Tag\n");
#endif
#ifdef B
   printf("B is defined\n");
#endif

   return 0;
}

--------------------------------------------------------------------------------------
here is the BuildLog (preprocessor off)
Buildprotokoll     Erstellen wurde gestartet: Projekt: "hallo", Konfiguration: "Release|Win32"
 Umgebungsbereich         _ACP_ATLPROV=C:\I\VS8\VC\Bin\ATLProv.dll
    _ACP_INCLUDE=C:\I\VS8\VC\include;C:\I\VS8\VC\include;C:\I\VS8\SDK\v2.0\include;C:\I\SDK\Include;C:\I\INTEL\IPP\5.2_beta\ia32\include;C:\I\INTEL\IPP\5.2_beta\ia32\tools\staticlib;C:\I\INTEL\MKL\9.0\include
    _ACP_LIB=C:\I\VS8\VC\lib;C:\I\VS8\;C:\I\VS8\lib;C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer\Release32-NOGFX;C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX;C:\I\SDK\Lib;C:\I\VS8\SDK\v2.0\Lib;C:\I\INTEL\IPP\5.2_beta\ia32\lib;C:\I\INTEL\IPP\5.2_beta\ia32\stublib;C:\I\INTEL\MKL\9.0\ia32\lib;C:\I\VS8\SDK\v2.0\lib
    _ACP_PATH=C:\I\VS8\VC\bin;C:\I\SDK\Bin;C:\I\VS8\Common7\Tools\bin;C:\I\VS8\Common7\tools;C:\I\VS8\Common7\ide;C:\Programme\HTML Help Workshop;C:\I\VS8\SDK\v2.0\bin;C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727;C:\I\VS8\;C:\I\INTEL\MKL\9.0\ia32\bin;C:\I\INTEL\IPP\5.2_beta\ia32\bin;C:\Programme\Windows Resource Kits\Tools\;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\Programme\Gemeinsame Dateien\Ulead Systems\MPEG;C:\Programme\Support Tools\;C:\Programme\Microsoft SQL Server\90\Tools\binn\
    ALLUSERSPROFILE=C:\Dokumente und Einstellungen\All Users
    APPDATA=C:\Dokumente und Einstellungen\heinz\Anwendungsdaten
    CLIENTNAME=Console
    CommonProgramFiles=C:\Programme\Gemeinsame Dateien
    COMPUTERNAME=DURSTI01
    ComSpec=C:\WINDOWS\system32\cmd.exe
    FP_NO_HOST_CHECK=NO
    HOMEDRIVE=C:
    HOMEPATH=\Dokumente und Einstellungen\heinz
    INCLUDE=C:\I\VS8\VC\include;C:\I\VS8\VC\include;C:\I\VS8\SDK\v2.0\include;C:\I\SDK\Include;C:\I\INTEL\IPP\5.2_beta\ia32\include;C:\I\INTEL\IPP\5.2_beta\ia32\tools\staticlib;C:\I\INTEL\MKL\9.0\include
    INTEL_LICENSE_FILE=C:\Programme\Gemeinsame Dateien\Intel\Licenses
    LIB=C:\I\VS8\VC\lib;C:\I\VS8\;C:\I\VS8\lib;C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer\Release32-NOGFX;C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX;C:\I\SDK\Lib;C:\I\VS8\SDK\v2.0\Lib;C:\I\INTEL\IPP\5.2_beta\ia32\lib;C:\I\INTEL\IPP\5.2_beta\ia32\stublib;C:\I\INTEL\MKL\9.0\ia32\lib;C:\I\VS8\SDK\v2.0\lib
    LIBPATH=C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727
    LOGONSERVER=\\DURSTI01
    NUMBER_OF_PROCESSORS=1
    OS=Windows_NT
    PATH=C:\I\VS8\VC\bin;C:\I\SDK\Bin;C:\I\VS8\Common7\Tools\bin;C:\I\VS8\Common7\tools;C:\I\VS8\Common7\ide;C:\Programme\HTML Help Workshop;C:\I\VS8\SDK\v2.0\bin;C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727;C:\I\VS8\;C:\I\INTEL\MKL\9.0\ia32\bin;C:\I\INTEL\IPP\5.2_beta\ia32\bin;C:\Programme\Windows Resource Kits\Tools\;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\Programme\Gemeinsame Dateien\Ulead Systems\MPEG;C:\Programme\Support Tools\;C:\Programme\Microsoft SQL Server\90\Tools\binn\
    PATHEXT=.COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH
    PROCESSOR_ARCHITECTURE=x86
    PROCESSOR_IDENTIFIER=x86 Family 15 Model 2 Stepping 7, GenuineIntel
    PROCESSOR_LEVEL=15
    PROCESSOR_REVISION=0207
    ProgramFiles=C:\Programme
    SESSIONNAME=Console
    SystemDrive=C:
    SystemRoot=C:\WINDOWS
    TEMP=C:\TEMP
    TMP=C:\TMP
    USERDOMAIN=DURSTI01
    USERNAME=heinz
    USERPROFILE=C:\Dokumente und Einstellungen\heinz
    VS80COMNTOOLS=C:\I\VS8\Common7\Tools\
    WecVersionForRosebud.BA0=2
    windir=C:\WINDOWS
 Befehlszeilen     Die temporäre Datei "c:\I\VS2005\Projects\hallo\hallo\Release\RSP00002629764420.rsp" wird erstellt. Inhalt:
[
/O2 /GL /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "_UNICODE" /D "UNICODE" /FD /EHsc /MT /Gy /Fo"Release\\" /Fd"Release\vc80.pdb" /W3 /c /Wp64 /Zi /TP .\hallo.cpp
]Erstellen der Befehlszeile "cl.exe @c:\I\VS2005\Projects\hallo\hallo\Release\RSP00002629764420.rsp /nologo /errorReport:prompt"Die temporäre Datei "c:\I\VS2005\Projects\hallo\hallo\Release\RSP00002729764420.rsp" wird erstellt. Inhalt:
[
/OUT:"C:\I\VS2005\Projects\hallo\Release\hallo.exe" /INCREMENTAL:NO /MANIFEST /MANIFESTFILE:"Release\hallo.exe.intermediate.manifest" /DEBUG /PDB:"c:\I\VS2005\Projects\hallo\release\hallo.pdb" /SUBSYSTEM:CONSOLE /OPT:REF /OPT:ICF /LTCG /MACHINE:X86 kernel32.lib

".\Release\hallo.obj"

".\Release\stdafx.obj"
]Erstellen der Befehlszeile "link.exe @c:\I\VS2005\Projects\hallo\hallo\Release\RSP00002729764420.rsp /NOLOGO /ERRORREPORT:PROMPT"Die temporäre Datei "c:\I\VS2005\Projects\hallo\hallo\Release\RSP00002829764420.rsp" wird erstellt. Inhalt:
[
/outputresource:"..\release\hallo.exe;#1" /manifest

.\Release\hallo.exe.intermediate.manifest
]Erstellen der Befehlszeile "mt.exe @c:\I\VS2005\Projects\hallo\hallo\Release\RSP00002829764420.rsp /nologo"Die temporäre Datei "c:\I\VS2005\Projects\hallo\hallo\Release\BAT00002929764420.bat" wird erstellt. Inhalt:
[
@echo, die Manifestressource wurde zuletzt um %TIME% am %DATE% aktualisiert > .\Release\mt.dep
]Erstellen der Befehlszeile "c:\I\VS2005\Projects\hallo\hallo\Release\BAT00002929764420.bat" Ausgabefenster     Kompilieren...
hallo.cpp
Verknüpfen...
Code wird generiert.
Codegenerierung ist abgeschlossen.
Das Manifest wird eingebettet...
 Ergebnisse     Das Buildprotokoll wurde unter "file://c:\I\VS2005\Projects\hallo\hallo\Release\BuildLog.htm" gespeichert.
hallo - 0 Fehler, 0 Warnung(en)
 ---------------------------------------------------
As you can see, everything is OK; A and B are defined in the program and the output is correct too. The program prints:
Hallo
B is defined
---------------------------------------
so far OK
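For anyone who wants to repeat the macro test without the Visual Studio project scaffolding (stdafx.h, _tmain), here is a small portable sketch of the same logic. It is not the original hallo.cpp; the function name hallo_output is made up, and the text is collected in a string instead of being printed. It shows that #ifdef only asks whether a name is defined, so the odd replacement text "char;" never matters.

```cpp
#include <string>

// Same macro setup as in hallo.cpp. The replacement text ("char;") is never
// expanded here; #ifdef only asks whether the name is defined at all.
#ifndef A
#define A char;
#endif
#define B char;

// Hypothetical helper: collects the lines that hallo.cpp would print.
std::string hallo_output()
{
    std::string out;
#ifdef A
    out += "Hallo\n";        // taken: A is always defined by the block above
#else
    out += "Guten Tag\n";    // never compiled in this setup
#endif
#ifdef B
    out += "B is defined\n";
#endif
    return out;
}

// Keep the short macro names from leaking into code that follows.
#undef A
#undef B
```

The returned text is "Hallo" followed by "B is defined", matching the output reported above.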
now let's switch the preprocessor on:
With Line Numbers (/P)
and rebuild:
-----------------------------------------
Buildprotokoll     Neu erstellen wurde gestartet: Projekt: "hallo", Konfiguration: "Release|Win32"
 Umgebungsbereich         _ACP_ATLPROV=C:\I\VS8\VC\Bin\ATLProv.dll
    _ACP_INCLUDE=C:\I\VS8\VC\include;C:\I\VS8\VC\include;C:\I\VS8\SDK\v2.0\include;C:\I\SDK\Include;C:\I\INTEL\IPP\5.2_beta\ia32\include;C:\I\INTEL\IPP\5.2_beta\ia32\tools\staticlib;C:\I\INTEL\MKL\9.0\include
    _ACP_LIB=C:\I\VS8\VC\lib;C:\I\VS8\;C:\I\VS8\lib;C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer\Release32-NOGFX;C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX;C:\I\SDK\Lib;C:\I\VS8\SDK\v2.0\Lib;C:\I\INTEL\IPP\5.2_beta\ia32\lib;C:\I\INTEL\IPP\5.2_beta\ia32\stublib;C:\I\INTEL\MKL\9.0\ia32\lib;C:\I\VS8\SDK\v2.0\lib
    _ACP_PATH=C:\I\VS8\VC\bin;C:\I\SDK\Bin;C:\I\VS8\Common7\Tools\bin;C:\I\VS8\Common7\tools;C:\I\VS8\Common7\ide;C:\Programme\HTML Help Workshop;C:\I\VS8\SDK\v2.0\bin;C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727;C:\I\VS8\;C:\I\INTEL\MKL\9.0\ia32\bin;C:\I\INTEL\IPP\5.2_beta\ia32\bin;C:\Programme\Windows Resource Kits\Tools\;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\Programme\Gemeinsame Dateien\Ulead Systems\MPEG;C:\Programme\Support Tools\;C:\Programme\Microsoft SQL Server\90\Tools\binn\
    ALLUSERSPROFILE=C:\Dokumente und Einstellungen\All Users
    APPDATA=C:\Dokumente und Einstellungen\heinz\Anwendungsdaten
    CLIENTNAME=Console
    CommonProgramFiles=C:\Programme\Gemeinsame Dateien
    COMPUTERNAME=DURSTI01
    ComSpec=C:\WINDOWS\system32\cmd.exe
    FP_NO_HOST_CHECK=NO
    HOMEDRIVE=C:
    HOMEPATH=\Dokumente und Einstellungen\heinz
    INCLUDE=C:\I\VS8\VC\include;C:\I\VS8\VC\include;C:\I\VS8\SDK\v2.0\include;C:\I\SDK\Include;C:\I\INTEL\IPP\5.2_beta\ia32\include;C:\I\INTEL\IPP\5.2_beta\ia32\tools\staticlib;C:\I\INTEL\MKL\9.0\include
    INTEL_LICENSE_FILE=C:\Programme\Gemeinsame Dateien\Intel\Licenses
    LIB=C:\I\VS8\VC\lib;C:\I\VS8\;C:\I\VS8\lib;C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer\Release32-NOGFX;C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX;C:\I\SDK\Lib;C:\I\VS8\SDK\v2.0\Lib;C:\I\INTEL\IPP\5.2_beta\ia32\lib;C:\I\INTEL\IPP\5.2_beta\ia32\stublib;C:\I\INTEL\MKL\9.0\ia32\lib;C:\I\VS8\SDK\v2.0\lib
    LIBPATH=C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727
    LOGONSERVER=\\DURSTI01
    NUMBER_OF_PROCESSORS=1
    OS=Windows_NT
    PATH=C:\I\VS8\VC\bin;C:\I\SDK\Bin;C:\I\VS8\Common7\Tools\bin;C:\I\VS8\Common7\tools;C:\I\VS8\Common7\ide;C:\Programme\HTML Help Workshop;C:\I\VS8\SDK\v2.0\bin;C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727;C:\I\VS8\;C:\I\INTEL\MKL\9.0\ia32\bin;C:\I\INTEL\IPP\5.2_beta\ia32\bin;C:\Programme\Windows Resource Kits\Tools\;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\Programme\Gemeinsame Dateien\Ulead Systems\MPEG;C:\Programme\Support Tools\;C:\Programme\Microsoft SQL Server\90\Tools\binn\
    PATHEXT=.COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH
    PROCESSOR_ARCHITECTURE=x86
    PROCESSOR_IDENTIFIER=x86 Family 15 Model 2 Stepping 7, GenuineIntel
    PROCESSOR_LEVEL=15
    PROCESSOR_REVISION=0207
    ProgramFiles=C:\Programme
    SESSIONNAME=Console
    SystemDrive=C:
    SystemRoot=C:\WINDOWS
    TEMP=C:\TEMP
    TMP=C:\TMP
    USERDOMAIN=DURSTI01
    USERNAME=heinz
    USERPROFILE=C:\Dokumente und Einstellungen\heinz
    VS80COMNTOOLS=C:\I\VS8\Common7\Tools\
    WecVersionForRosebud.BA0=2
    windir=C:\WINDOWS
 Command Lines
Creating temporary file "c:\I\VS2005\Projects\hallo\hallo\Release\RSP00002E29767904.rsp" with contents:
[
/O2 /GL /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "_UNICODE" /D "UNICODE" /P /FD /EHsc /MT /Gy /Fo"Release\\" /Fd"Release\vc80.pdb" /W3 /c /Wp64 /Zi /TP .\hallo.cpp
]
Creating command line "cl.exe @c:\I\VS2005\Projects\hallo\hallo\Release\RSP00002E29767904.rsp /nologo /errorReport:prompt"
Creating temporary file "c:\I\VS2005\Projects\hallo\hallo\Release\RSP00002F29767904.rsp" with contents:
[
/O2 /GL /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "_UNICODE" /D "UNICODE" /P /FD /EHsc /MT /Gy /Yc"stdafx.h" /Fp"Release\hallo.pch" /Fo"Release\\" /Fd"Release\vc80.pdb" /W3 /c /Wp64 /Zi /TP .\stdafx.cpp
]
Creating command line "cl.exe @c:\I\VS2005\Projects\hallo\hallo\Release\RSP00002F29767904.rsp /nologo /errorReport:prompt"
Creating temporary file "c:\I\VS2005\Projects\hallo\hallo\Release\RSP00003029767904.rsp" with contents:
[
/OUT:"C:\I\VS2005\Projects\hallo\Release\hallo.exe" /INCREMENTAL:NO /MANIFEST /MANIFESTFILE:"Release\hallo.exe.intermediate.manifest" /DEBUG /PDB:"c:\I\VS2005\Projects\hallo\release\hallo.pdb" /SUBSYSTEM:CONSOLE /OPT:REF /OPT:ICF /LTCG /MACHINE:X86 kernel32.lib

".\Release\hallo.obj"

".\Release\stdafx.obj"
]
Creating command line "link.exe @c:\I\VS2005\Projects\hallo\hallo\Release\RSP00003029767904.rsp /NOLOGO /ERRORREPORT:PROMPT"
 Output Window
Compiling...
stdafx.cpp
Compiling...
hallo.cpp
Linking...
LINK : fatal error LNK1181: cannot open input file ".\Release\hallo.obj"
 Results
Build log was saved at "file://c:\I\VS2005\Projects\hallo\hallo\Release\BuildLog.htm"
hallo - 1 error(s), 0 warning(s)
 
---------------------------------------------------------
If you now look into the Hallo project directory, you see 2 new files:
Hallo.i     ---> Preprocessed C/C++ Source
stdafx.i   ---> Preprocessed C/C++ Source
-------------------------------------------------------------
So far OK, that is how it should be.
But when the project is then compiled, no object file is created anywhere.  :'(
Only a few rsp files are created, and they are deleted again right at the end,
so I cannot examine them.
------------------------------------------------
That is how it is in this small example
and also in Optimizer and seti_boinc.
------------------------------------------------------
I have searched the net a lot by now, but have not yet found out how the files are handed to the linker in a preprocessor run and where that has to be specified in this user interface.
--------------------------------------------------------------------------------
What has worked so far:
everything that could be built without a preprocessor run:
glut32
image_libs
jpeglib
libboinc
libboincapi
non_ICC
setiboincdb
----------------------
Oh dear,,,, the good old days when all of this could be done with a few command lines in a batch file.

P.S. Switching precompiled headers off changed nothing.
I am grateful for any hint.
Happy holiday to all
Best regards, heinz ~seti_britta

Title: Re: optimized sources
Post by: Crunch3r on 17 May 2007, 11:06:08 am
Can you zip up your Hallo project and upload it somewhere so that we can take a look at it?

Title: Re: optimized sources
Post by: _heinz on 17 May 2007, 01:32:56 pm
Hello Crunch3r,
here it is as a 7z archive


[attachment deleted by admin]
Title: Re: optimized sources
Post by: Crunch3r on 17 May 2007, 02:07:32 pm
Hello...

I think we are somehow talking past each other.  :-\
(What do you want the preprocessor for?)

Have a look at the pictures here...  and tell me whether that is how you want it.

(http://calbe.dw70.de/1.JPG)
(http://calbe.dw70.de/2.JPG)
Title: Re: optimized sources
Post by: _heinz on 17 May 2007, 04:30:18 pm
Hello Crunch3r,

Generate Preprocessed File   ----> With Line Numbers   switched on

That was switched on for me after I took over the project.
Can you check whether it is the same on your side?
I always thought the preprocessor definitions only take effect in that case.
Otherwise everything gets compiled anyway, doesn't it?
????




Title: Re: optimized sources
Post by: Crunch3r on 17 May 2007, 04:38:14 pm
I switched that off, otherwise I get exactly the same result as you ... an error message.

The preprocessor macros, as shown in the 2nd picture where the "B" is marked, ALWAYS take effect.


Now that this is settled  ;D Happy optimizing  ;)

Regards
Crunch3r
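
Why the option matters: with "Generate Preprocessed File" switched on, cl receives /P (visible in the hallo command lines above). /P writes the preprocessed source to a .i file and suppresses compilation, so no .obj exists for the linker and LNK1181 follows. The /D definitions are applied in an ordinary compile as well. A minimal sketch of that, where selected_code_path is a made-up name and USE_SSE2 is defined inside the file only to keep the example self-contained:

```cpp
#include <cassert>
#include <string>

// In the real project this macro comes from the command line (/D "USE_SSE2").
// It is defined here only so that the sketch is self-contained.
#define USE_SSE2

// Hypothetical helper: reports which branch the preprocessor selected.
// No /P switch is needed for the #if below to be evaluated.
std::string selected_code_path() {
#if defined(USE_SSE2)
    return "SSE2";
#else
    return "generic";
#endif
}
```

Compiled normally (without /P), selected_code_path() returns "SSE2" here; removing the definition switches it to "generic".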
Title: Re: optimized sources
Post by: _heinz on 17 May 2007, 06:38:34 pm
Thanks   ;D
Title: Re: optimized sources
Post by: _heinz on 17 May 2007, 07:33:40 pm
Hello Crunch3r,
I am now through for the first time, and all parts of seti_boinc were compiled  ;D
It also links, but there are still unresolved external references.
I had the most trouble with schema_master.cpp and schema_master.h. They do say that both were generated automatically and should not be edited.
But I had to edit them, otherwise I would not have got through without errors.
Does anyone know where they are generated ??? That is really where the correction should be made.
--------------------------------------------
There is still a lot to do. I am attaching the build log.

Best regards, heinz ~seti_britta




[attachment deleted by admin]
Title: Re: optimized sources
Post by: _heinz on 18 May 2007, 04:23:05 am
Still have to get the Optimizer error-free  ;D
------------------------------------------------------------------
------ Rebuild All started: Project: Optimizer, Configuration: Release32-NOGFX Win32 ------
Deleting intermediate and output files for project "Optimizer", configuration "Release32-NOGFX|Win32".
Compiling...
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
Copyright (C) Microsoft Corporation. All rights reserved.
cl /O2 /Ob2 /Oi /Ot /Oy /GT /I "../../../boinc/win_build" /I ".." /I "..\.." /I "..\..\..\boinc\lib" /I "../../../boinc/api" /I "../../db" /I "C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer" /D "USE_IPP" /D "USE_SSE2" /D "WIN32" /D "_WIN32" /D "_WINDOWS" /D "_CONSOLE" /D "NDEBUG" /D "_LIB" /D "_MT" /D "CLIENT" /D "NBOINC_APP_GRAPHICS" /D "_UNICODE" /D "UNICODE" /D "_VC80_UPGRADE=0x0710" /D "_MBCS" /C /GF /FD /EHsc /MT /Zp16 /Gy /Fo"Release32-NOGFX\\" /Fd"Release32-NOGFX\vc80.pdb" /W3 /c /Wp64 /Zi /Gd /TP /FI "win-config.h" ".\AKfoldSSE.cpp"
   ".\opt_VIS2.cpp"
   ".\opt_VIS.cpp"
   ".\opt_SSE3.cpp"
   ".\opt_SSE2.cpp"
   ".\opt_SSE.cpp"
   ".\opt_os_interface.cpp"
   ".\opt_MMX.cpp"
   ".\opt_MDMX.cpp"
   ".\opt_altivec.cpp"
   ".\memspeed.cpp"
   ".\FoldTst.cpp"
   ".\cpuid_tbl.cpp"
   ".\cpu_x86.cpp"
   ".\BHSSEfold.cpp"
   ".\benchmark.cpp"
cl : Command line warning D9007 : "/C" requires "/E, /EP or /P"; option ignored.
AKfoldSSE.cpp
-----IPP-----
-----SSE2/em-----
opt_VIS2.cpp
opt_VIS.cpp
opt_SSE3.cpp
.\opt_SSE3.cpp(147) : error C3861: "_mm_addsub_ps": identifier not found.
.\opt_SSE3.cpp(148) : error C3861: "_mm_addsub_ps": identifier not found.
opt_SSE2.cpp
.\opt_SSE2.cpp(86) : error C2440: 'type cast': cannot convert from 'VEC' to 'VEC_I'
        No constructor could take the source type, or constructor overload resolution was ambiguous
.\opt_SSE2.cpp(126) : error C2440: 'type cast': cannot convert from 'VEC' to 'VEC_I'
        No constructor could take the source type, or constructor overload resolution was ambiguous
.\opt_SSE2.cpp(129) : error C2440: 'type cast': cannot convert from 'VEC' to 'VEC_I'
        No constructor could take the source type, or constructor overload resolution was ambiguous
.\opt_SSE2.cpp(136) : error C2440: 'type cast': cannot convert from 'VEC' to 'VEC_I'
        No constructor could take the source type, or constructor overload resolution was ambiguous
.\opt_SSE2.cpp(139) : error C2440: 'type cast': cannot convert from 'VEC' to 'VEC_I'
        No constructor could take the source type, or constructor overload resolution was ambiguous
.\opt_SSE2.cpp(144) : error C2440: 'type cast': cannot convert from 'VEC' to 'VEC_I'
        No constructor could take the source type, or constructor overload resolution was ambiguous
.\opt_SSE2.cpp(147) : error C2440: 'type cast': cannot convert from 'VEC' to 'VEC_I'
        No constructor could take the source type, or constructor overload resolution was ambiguous
.\opt_SSE2.cpp(148) : error C2440: 'type cast': cannot convert from 'VEC' to 'VEC_I'
        No constructor could take the source type, or constructor overload resolution was ambiguous
.\opt_SSE2.cpp(150) : error C2440: 'type cast': cannot convert from 'VEC' to 'VEC_I'
        No constructor could take the source type, or constructor overload resolution was ambiguous
opt_SSE.cpp
.\opt_SSE.cpp(146) : warning C4311: 'type cast': pointer truncation from 'const float *__w64 ' to 'unsigned int'
opt_os_interface.cpp
.\opt_os_interface.cpp(92) : warning C4552: '<<': operator has no effect; expected operator with side-effect
.\opt_os_interface.cpp(188) : error C2664: 'RegOpenKeyExW': cannot convert parameter 2 from 'const char [68]' to 'LPCWSTR'
        Types pointed to are unrelated; conversion requires reinterpret_cast, C-style cast or function-style cast.
.\opt_os_interface.cpp(192) : error C2664: 'RegQueryValueExW': cannot convert parameter 2 from 'const char [21]' to 'LPCWSTR'
        Types pointed to are unrelated; conversion requires reinterpret_cast, C-style cast or function-style cast.
opt_MMX.cpp
opt_MDMX.cpp
opt_altivec.cpp
memspeed.cpp
FoldTst.cpp
cpuid_tbl.cpp
cpu_x86.cpp
.\cpu_x86.cpp(548) : warning C4311: 'type cast': pointer truncation from 'void *' to 'uint32'
.\cpu_x86.cpp(549) : warning C4312: 'type cast': conversion from 'uint32' to 'void *' of greater size
BHSSEfold.cpp
-----IPP-----
-----SSE2/em-----
benchmark.cpp
-----IPP-----
-----SSE2-----
-----IPP-----
-----SSE2-----
Generating code...
c:\i\sc\seti\seti_boinc_2k3_2.2b-ben-joe\client\optimizer\memspeed.cpp(129) : warning C4700: uninitialized local variable "sum1" used.
c:\i\sc\seti\seti_boinc_2k3_2.2b-ben-joe\client\optimizer\memspeed.cpp(130) : warning C4700: uninitialized local variable "sum2" used.
c:\i\sc\seti\seti_boinc_2k3_2.2b-ben-joe\client\optimizer\memspeed.cpp(159) : warning C4700: uninitialized local variable "sum1" used.
c:\i\sc\seti\seti_boinc_2k3_2.2b-ben-joe\client\optimizer\memspeed.cpp(160) : warning C4700: uninitialized local variable "sum2" used.
c:\i\sc\seti\seti_boinc_2k3_2.2b-ben-joe\client\optimizer\memspeed.cpp(189) : warning C4700: uninitialized local variable "tmp1" used.
c:\i\sc\seti\seti_boinc_2k3_2.2b-ben-joe\client\optimizer\memspeed.cpp(190) : warning C4700: uninitialized local variable "tmp2" used.
c:\i\sc\seti\seti_boinc_2k3_2.2b-ben-joe\client\optimizer\memspeed.cpp(191) : warning C4700: uninitialized local variable "sum1" used.
c:\i\sc\seti\seti_boinc_2k3_2.2b-ben-joe\client\optimizer\memspeed.cpp(192) : warning C4700: uninitialized local variable "sum2" used.
c:\i\sc\seti\seti_boinc_2k3_2.2b-ben-joe\client\optimizer\memspeed.cpp(230) : warning C4700: uninitialized local variable "sum1" used.
c:\i\sc\seti\seti_boinc_2k3_2.2b-ben-joe\client\optimizer\memspeed.cpp(231) : warning C4700: uninitialized local variable "sum2" used.
c:\i\sc\seti\seti_boinc_2k3_2.2b-ben-joe\client\optimizer\memspeed.cpp(267) : warning C4700: uninitialized local variable "sum1" used.
c:\i\sc\seti\seti_boinc_2k3_2.2b-ben-joe\client\optimizer\memspeed.cpp(268) : warning C4700: uninitialized local variable "sum2" used.
c:\i\sc\seti\seti_boinc_2k3_2.2b-ben-joe\client\optimizer\memspeed.cpp(320) : warning C4700: uninitialized local variable "sum1" used.
c:\i\sc\seti\seti_boinc_2k3_2.2b-ben-joe\client\optimizer\memspeed.cpp(347) : warning C4700: uninitialized local variable "sum1" used.
c:\i\sc\seti\seti_boinc_2k3_2.2b-ben-joe\client\optimizer\memspeed.cpp(373) : warning C4700: uninitialized local variable "sum1" used.
c:\i\sc\seti\seti_boinc_2k3_2.2b-ben-joe\client\optimizer\memspeed.cpp(398) : warning C4700: uninitialized local variable "sum1" used.
Build log was saved at "file://c:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer\Release32-NOGFX\BuildLog.htm"
Optimizer - 13 error(s), 21 warning(s)
========== Rebuild All: 0 succeeded, 1 failed, 0 skipped ==========
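
The C4311 warnings in this log come from casting a pointer to a 32-bit integer type, which /Wp64 flags because the value would be truncated on a 64-bit target. The source lines themselves are not shown in the thread. Assuming the cast serves something like an alignment check, a portable sketch is to go through std::uintptr_t instead (is_aligned16 is a hypothetical helper name, not a function from the seti_boinc sources):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if p sits on a 16-byte boundary.
// std::uintptr_t is wide enough to hold a pointer on 32-bit and 64-bit
// targets alike, so nothing is truncated (unlike a cast to unsigned int).
inline bool is_aligned16(const void* p) {
    return (reinterpret_cast<std::uintptr_t>(p) & 15u) == 0;
}
```

For a buffer declared with alignas(16), is_aligned16(buf) holds, while a pointer one float further on fails the check.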

Title: Re: optimized sources
Post by: Crunch3r on 18 May 2007, 08:12:18 am
I had a look at the build log, and it looks as if the linker is missing the IPP libs and the optimizer lib. Are they listed in your linker settings under "Input"?

If not, it should look roughly like this:

"optimizer.lib libirc.lib ippsmerged.lib ippvmmerged.lib ippchmerged.lib ippcorel.lib svml_dispmt.lib libircmt.lib libmmt.lib delayimp.lib gdi32.lib user32.lib kernel32.lib libcmt.lib libcpmt.lib winmm.lib opengl32.lib glu32.lib oldnames.lib ole32.lib"

Title: Re: optimized sources
Post by: _heinz on 18 May 2007, 07:15:20 pm
@ Crunch3r Thanks  ;D
Problems in the Optimizer:
Optimizer
My preprocessor definitions for Optimizer:

USE_SSE2
WIN32
_WIN32
_WINDOWS
_CONSOLE
NDEBUG
_LIB
_MT
CLIENT
NBOINC_APP_GRAPHICS
_UNICODE
UNICODE
------------------------------
When the Optimizer is rebuilt, opt_SSE3.cpp is compiled as well, and various errors occur in it because pmmintrin.h is not included.
That does surprise me, because I thought that if USE_SSE3 is not passed to the preprocessor, the file would not be compiled at all.
It should also be noted that at the point where pmmintrin.h is included, __SSE3__ is not yet defined, which leads to the errors in lines 147 and 148. See the compilation of opt_SSE3.cpp on its own -->
-------------------------------------------------------------------------------------------------
Compiling...
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
Copyright (C) Microsoft Corporation. All rights reserved.
cl /O2 /Ob2 /Oi /Ot /Oy /GT /I "../../../boinc/win_build" /I ".." /I "..\.." /I "..\..\..\boinc\lib" /I "../../../boinc/api" /I "../../db" /I "C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer" /D "USE_IPP" /D "USE_SSE2" /D "WIN32" /D "_WIN32" /D "_WINDOWS" /D "_CONSOLE" /D "NDEBUG" /D "_LIB" /D "_MT" /D "CLIENT" /D "NBOINC_APP_GRAPHICS" /D "_UNICODE" /D "UNICODE" /D "_VC80_UPGRADE=0x0710" /D "_MBCS" /C /GF /FD /EHsc /MT /Zp16 /Gy /Fo"Release32-NOGFX\\" /Fd"Release32-NOGFX\vc80.pdb" /W3 /c /Wp64 /Zi /Gd /TP /FI "win-config.h" ".\opt_SSE3.cpp"
cl : Command line warning D9007 : "/C" requires "/E, /EP or /P"; option ignored.
opt_SSE3.cpp
.\opt_SSE3.cpp(147) : error C3861: "_mm_addsub_ps": identifier not found.
.\opt_SSE3.cpp(148) : error C3861: "_mm_addsub_ps": identifier not found.
Build log was saved at "file://c:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer\Release32-NOGFX\BuildLog.htm"
Optimizer - 2 error(s), 1 warning(s)
========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========
-----------------------------------------------------
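
A sketch of how such a file can be guarded so that it builds both with and without SSE3 support. The guard layout is an assumption, not the actual contents of opt_SSE3.cpp: USE_SSE3 is the project macro mentioned above, __SSE3__ is set by GCC/Clang when SSE3 code generation is enabled (the Microsoft compiler does not define it), and addsub4 and HAVE_SSE3_PATH are made-up names.

```cpp
#include <cassert>
#include <cstddef>

// Take the intrinsic path only when SSE3 was requested explicitly.
#if defined(USE_SSE3) || defined(__SSE3__)
#include <pmmintrin.h>   // declares _mm_addsub_ps
#define HAVE_SSE3_PATH 1
#endif

// out[i] = a[i] - b[i] for even i, a[i] + b[i] for odd i (4 floats),
// which is what _mm_addsub_ps computes on one 128-bit vector.
void addsub4(const float* a, const float* b, float* out) {
#if defined(HAVE_SSE3_PATH)
    _mm_storeu_ps(out, _mm_addsub_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
#else
    // Scalar fallback with the same result.
    for (std::size_t i = 0; i < 4; ++i)
        out[i] = (i % 2 == 0) ? a[i] - b[i] : a[i] + b[i];
#endif
}
```

With a = {1, 2, 3, 4} and b = {10, 20, 30, 40}, both paths give {-9, 22, -27, 44}.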

and here once more the whole Optimizer project ---->

------ Rebuild All started: Project: Optimizer, Configuration: Release32-NOGFX Win32 ------
Deleting intermediate and output files for project "Optimizer", configuration "Release32-NOGFX|Win32".
Compiling...
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
Copyright (C) Microsoft Corporation. All rights reserved.
cl /O2 /Ob2 /Oi /Ot /Oy /GT /I "../../../boinc/win_build" /I ".." /I "..\.." /I "..\..\..\boinc\lib" /I "../../../boinc/api" /I "../../db" /I "C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer" /D "USE_IPP" /D "USE_SSE2" /D "WIN32" /D "_WIN32" /D "_WINDOWS" /D "_CONSOLE" /D "NDEBUG" /D "_LIB" /D "_MT" /D "CLIENT" /D "NBOINC_APP_GRAPHICS" /D "_UNICODE" /D "UNICODE" /D "_VC80_UPGRADE=0x0710" /D "_MBCS" /C /GF /FD /EHsc /MT /Zp16 /Gy /Fo"Release32-NOGFX\\" /Fd"Release32-NOGFX\vc80.pdb" /W3 /c /Wp64 /Zi /Gd /TP /FI "win-config.h" ".\AKfoldSSE.cpp"
   ".\opt_VIS2.cpp"
   ".\opt_VIS.cpp"
   ".\opt_SSE3.cpp"
   ".\opt_SSE2.cpp"
   ".\opt_SSE.cpp"
   ".\opt_os_interface.cpp"
   ".\opt_MMX.cpp"
   ".\opt_MDMX.cpp"
   ".\opt_altivec.cpp"
   ".\memspeed.cpp"
   ".\FoldTst.cpp"
   ".\cpuid_tbl.cpp"
   ".\cpu_x86.cpp"
   ".\BHSSEfold.cpp"
   ".\benchmark.cpp"
cl : Command line warning D9007 : "/C" requires "/E, /EP or /P"; option ignored.
AKfoldSSE.cpp
-----IPP-----
-----SSE2/em-----
opt_VIS2.cpp
opt_VIS.cpp
opt_SSE3.cpp
.\opt_SSE3.cpp(147) : error C3861: "_mm_addsub_ps": identifier not found.
.\opt_SSE3.cpp(148) : error C3861: "_mm_addsub_ps": identifier not found.
opt_SSE2.cpp
.\opt_SSE2.cpp(86) : error C2440: 'type cast': cannot convert from 'VEC' to 'VEC_I'
        No constructor could take the source type, or constructor overload resolution was ambiguous
.\opt_SSE2.cpp(126) : error C2440: 'type cast': cannot convert from 'VEC' to 'VEC_I'
        No constructor could take the source type, or constructor overload resolution was ambiguous
.\opt_SSE2.cpp(129) : error C2440: 'type cast': cannot convert from 'VEC' to 'VEC_I'
        No constructor could take the source type, or constructor overload resolution was ambiguous
.\opt_SSE2.cpp(136) : error C2440: 'type cast': cannot convert from 'VEC' to 'VEC_I'
        No constructor could take the source type, or constructor overload resolution was ambiguous
.\opt_SSE2.cpp(139) : error C2440: 'type cast': cannot convert from 'VEC' to 'VEC_I'
        No constructor could take the source type, or constructor overload resolution was ambiguous
.\opt_SSE2.cpp(144) : error C2440: 'type cast': cannot convert from 'VEC' to 'VEC_I'
        No constructor could take the source type, or constructor overload resolution was ambiguous
.\opt_SSE2.cpp(147) : error C2440: 'type cast': cannot convert from 'VEC' to 'VEC_I'
        No constructor could take the source type, or constructor overload resolution was ambiguous
.\opt_SSE2.cpp(148) : error C2440: 'type cast': cannot convert from 'VEC' to 'VEC_I'
        No constructor could take the source type, or constructor overload resolution was ambiguous
.\opt_SSE2.cpp(150) : error C2440: 'type cast': cannot convert from 'VEC' to 'VEC_I'
        No constructor could take the source type, or constructor overload resolution was ambiguous
opt_SSE.cpp
.\opt_SSE.cpp(146) : warning C4311: 'type cast': pointer truncation from 'const float *__w64 ' to 'unsigned int'
opt_os_interface.cpp
.\opt_os_interface.cpp(92) : warning C4552: '<<': operator has no effect; expected operator with side-effect
.\opt_os_interface.cpp(185) : error C2664: 'RegOpenKeyExW': cannot convert parameter 2 from 'const char [68]' to 'LPCWSTR'
        Types pointed to are unrelated; conversion requires reinterpret_cast, C-style cast or function-style cast.
.\opt_os_interface.cpp(189) : error C2664: 'RegQueryValueExW': cannot convert parameter 2 from 'const char [21]' to 'LPCWSTR'
        Types pointed to are unrelated; conversion requires reinterpret_cast, C-style cast or function-style cast.
opt_MMX.cpp
opt_MDMX.cpp
opt_altivec.cpp
memspeed.cpp
FoldTst.cpp
cpuid_tbl.cpp
cpu_x86.cpp
.\cpu_x86.cpp(548) : warning C4311: 'type cast': pointer truncation from 'void *' to 'uint32'
.\cpu_x86.cpp(549) : warning C4312: 'type cast': conversion from 'uint32' to 'void *' of greater size
BHSSEfold.cpp
-----IPP-----
-----SSE2/em-----
benchmark.cpp
-----IPP-----
-----SSE2-----
-----IPP-----
-----SSE2-----
Generating code...
c:\i\sc\seti\seti_boinc_2k3_2.2b-ben-joe\client\optimizer\memspeed.cpp(131) : warning C4700: uninitialized local variable "sum1" used.
c:\i\sc\seti\seti_boinc_2k3_2.2b-ben-joe\client\optimizer\memspeed.cpp(132) : warning C4700: uninitialized local variable "sum2" used.
c:\i\sc\seti\seti_boinc_2k3_2.2b-ben-joe\client\optimizer\memspeed.cpp(161) : warning C4700: uninitialized local variable "sum1" used.
c:\i\sc\seti\seti_boinc_2k3_2.2b-ben-joe\client\optimizer\memspeed.cpp(162) : warning C4700: uninitialized local variable "sum2" used.
c:\i\sc\seti\seti_boinc_2k3_2.2b-ben-joe\client\optimizer\memspeed.cpp(191) : warning C4700: uninitialized local variable "tmp1" used.
c:\i\sc\seti\seti_boinc_2k3_2.2b-ben-joe\client\optimizer\memspeed.cpp(192) : warning C4700: uninitialized local variable "tmp2" used.
c:\i\sc\seti\seti_boinc_2k3_2.2b-ben-joe\client\optimizer\memspeed.cpp(193) : warning C4700: uninitialized local variable "sum1" used.
c:\i\sc\seti\seti_boinc_2k3_2.2b-ben-joe\client\optimizer\memspeed.cpp(194) : warning C4700: uninitialized local variable "sum2" used.
c:\i\sc\seti\seti_boinc_2k3_2.2b-ben-joe\client\optimizer\memspeed.cpp(232) : warning C4700: uninitialized local variable "sum1" used.
c:\i\sc\seti\seti_boinc_2k3_2.2b-ben-joe\client\optimizer\memspeed.cpp(233) : warning C4700: uninitialized local variable "sum2" used.
c:\i\sc\seti\seti_boinc_2k3_2.2b-ben-joe\client\optimizer\memspeed.cpp(269) : warning C4700: uninitialized local variable "sum1" used.
c:\i\sc\seti\seti_boinc_2k3_2.2b-ben-joe\client\optimizer\memspeed.cpp(270) : warning C4700: uninitialized local variable "sum2" used.
c:\i\sc\seti\seti_boinc_2k3_2.2b-ben-joe\client\optimizer\memspeed.cpp(322) : warning C4700: uninitialized local variable "sum1" used.
c:\i\sc\seti\seti_boinc_2k3_2.2b-ben-joe\client\optimizer\memspeed.cpp(349) : warning C4700: uninitialized local variable "sum1" used.
c:\i\sc\seti\seti_boinc_2k3_2.2b-ben-joe\client\optimizer\memspeed.cpp(375) : warning C4700: uninitialized local variable "sum1" used.
c:\i\sc\seti\seti_boinc_2k3_2.2b-ben-joe\client\optimizer\memspeed.cpp(400) : warning C4700: uninitialized local variable "sum1" used.
Build log was saved at "file://c:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer\Release32-NOGFX\BuildLog.htm"
Optimizer - 13 error(s), 21 warning(s)
========== Rebuild All: 0 succeeded, 1 failed, 0 skipped ==========

------------------------------------------------------------

Remarks:
1. As you can see, opt_SSE3 is compiled too. _mm_addsub_ps is declared in pmmintrin.h, but __SSE3__ is not defined.
2. The type conversion from VEC to VEC_I is a problem.
3. opt_os_interface.cpp(185) and (189) are a difficult problem.
4. memspeed.cpp ---> the warnings are annoying, but could well be left as they are. sum1=0 does not work !!!
----------------------------------------------------------------
The good news last:
the seti_boinc project compiles without errors and with only 1 warning, but it is not linked yet because the Optimizer is not finished.
-----------------------------------
Further hints are welcome at any time.
Best regards ~seti_britta
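
On remark 4: that sum1=0 is rejected suggests sum1 is not a plain scalar but a vector or struct type. This is an assumption, since the declarations in memspeed.cpp are not shown. For such a type, value-initialization with empty braces clears all lanes and silences C4700. A minimal sketch with a stand-in type (Vec4 and sum_lanes are hypothetical names; with a real __m128 the usual choice would be _mm_setzero_ps()):

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a 4-float vector type such as __m128.
struct Vec4 { float v[4]; };

// Sums n groups of 4 floats lane by lane.
Vec4 sum_lanes(const float* data, std::size_t n) {
    Vec4 sum1 = {};  // zeroes every lane; "sum1 = 0" would not compile for this type
    for (std::size_t k = 0; k < n; ++k)
        for (std::size_t lane = 0; lane < 4; ++lane)
            sum1.v[lane] += data[4 * k + lane];
    return sum1;
}
```

For the eight values 1..8 and n = 2 this yields the lane sums {6, 8, 10, 12}.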



Title: Re: optimized sources
Post by: Crunch3r on 18 May 2007, 07:34:34 pm
"USE_IPP" is missing.



You have a problem with the intrinsics because the compiler uses the Microsoft intrinsics and NOT those of ICC, the Intel compiler.
ALL your remaining problems can be traced back to that!

Install the Intel compiler and CONTACT ME VIA PM!

I would also advise against posting this here in the public forum. You can use the closed one, after all ;)




Title: Re: optimized sources
Post by: _heinz on 19 May 2007, 04:34:46 am
Sorry, I missed USE_IPP when copying.... it was in there, though. And I had once added pmmintrin.h to opt_SSE3, back when I was not yet using Intel's IPP and MKL.
 ;)
Thanks
Title: Re: optimized sources
Post by: _heinz on 19 May 2007, 06:25:26 pm
3. opt_os_interface.cpp(185) and (189) are a difficult problem ---> solved, done ;D

For everyone else reading here: work is going on  :)
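
The thread does not say how the C2664 errors were fixed. They arise because UNICODE is defined, so RegOpenKeyEx and RegQueryValueEx resolve to the W variants, which expect wide strings. Common ways out are writing the literals with an L prefix, calling the A variants (RegOpenKeyExA, RegQueryValueExA) explicitly, or widening a narrow string at run time. A portable sketch of the first and last option, where widen_ascii and kExampleKey are made-up names and the key path is only a placeholder:

```cpp
#include <cassert>
#include <string>

// A wide literal: its type decays to const wchar_t*, which is what LPCWSTR is.
const wchar_t kExampleKey[] = L"SOFTWARE\\Example";

// Hypothetical helper: widens a 7-bit ASCII string character by character.
// Other encodings need a real conversion routine instead.
std::wstring widen_ascii(const std::string& narrow) {
    std::wstring wide;
    wide.reserve(narrow.size());
    for (std::string::size_type i = 0; i < narrow.size(); ++i)
        wide.push_back(static_cast<wchar_t>(static_cast<unsigned char>(narrow[i])));
    return wide;
}
```

The result's c_str() can then be passed where a wide string is expected.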
Title: Re: optimized sources
Post by: _heinz on 20 May 2007, 04:58:57 pm
Optimizer
These files compiled successfully today:
opt_SSE3      ;D
opt_3Dnow
opt_3Dnow+
opt_SSE.joeE
opt_SSE.joeF
---------------------
MfG ~seti_britta  ;D   
Title: Re: optimized sources
Post by: Simon on 20 May 2007, 06:01:27 pm
Heinz,

the ones with an extension like .joeE, .joeF etc. can be skipped. I was just lazy and didn't clean them out - I make backup copies of files before I put in updated ones from Joe's posts.

Those two were Rev-2.1E/F. The current file is opt_SSE.cpp.

HTH,
Simon.
Title: Re: optimized sources
Post by: _heinz on 23 May 2007, 05:41:38 am
Hi all,
in remembrance of my daughter Britta (www.britta-d.de) I use her name here in the project.
MfG
heinz alias ~seti_britta  
Title: Re: optimized sources
Post by: Crunch3r on 23 May 2007, 05:44:22 am
What happened?
Title: Re: optimized sources
Post by: _heinz on 23 May 2007, 10:16:40 am
Divorced when she was seven years old; I see her seldom, mostly once a year; live now in FR.
Perhaps I should update the website, the black frame looks rather mournful  ;)
Title: Re: optimized sources
Post by: _heinz on 23 May 2007, 12:14:38 pm
I still have one error left in the Optimizer, in opt_SSE2 (VEC cannot be converted to VEC_I).
The problem seems to lie in the different structures of VEC and VEC_I.
I have written a small program to play with the structures and with assigning one structure to another, to see what happens.
It will take a little time to find out what causes the error.
I had to study the intrinsics _mm_stream_si32 and _mm_cvtsi128_si32 to understand the problem.
heinz ~seti_britta
Title: Re: optimized sources
Post by: _heinz on 24 May 2007, 06:29:27 pm
After playing with my short sample program and the intrinsics used in GetPowerSpectrum_ptt, I believe the overloads of VEC and VEC_I are the problem to be solved.
I am thinking of writing one or two include files to handle the use of intrinsics and so avoid the overloading.
Mfg   ;)
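
The definitions of VEC and VEC_I are not shown in the thread. Assuming they are two unrelated 128-bit types (one holding floats, one holding integers), a C-style cast between them fails exactly as C2440 reports, and an explicit bit-level conversion helper is one way around it. A portable sketch with stand-in types (VecF, VecI and as_int_bits are hypothetical names; where the compiler provides it, _mm_castps_si128 plays this role for __m128 and __m128i):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-ins for the float and integer vector types.
struct VecF { float        f[4]; };
struct VecI { std::int32_t i[4]; };

static_assert(sizeof(VecF) == sizeof(VecI), "bit cast needs equal sizes");

// Reinterprets the 128 bits of a float vector as four 32-bit integers.
// A plain cast (VecI)v does not compile because the types are unrelated.
inline VecI as_int_bits(const VecF& v) {
    VecI out;
    std::memcpy(&out, &v, sizeof out);  // copies the bits, no value conversion
    return out;
}
```

On an IEEE-754 machine, the float 1.0f comes out as the bit pattern 0x3F800000.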

Title: Re: optimized sources
Post by: _heinz on 27 May 2007, 01:43:41 am
It will take a little time, please be patient.
The last error is always the hardest.
 ;)
Title: Re: optimized sources
Post by: _heinz on 27 May 2007, 10:11:43 pm
problem of s_put1_NC solved  ;D ;D ;D
opt_SSE2.cpp compiled
------------------------------------------------------------

------ Build started: Project: Optimizer, Configuration: Release32-NOGFX Win32 ------
Compiling...
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
Copyright (C) Microsoft Corporation. All rights reserved.
cl /O2 /Ob2 /Oi /Ot /Oy /GT /I "../../../boinc/win_build" /I ".." /I "..\.." /I "..\..\..\boinc\lib" /I "../../../boinc/api" /I "../../db" /I "C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer" /D "USE_IPP" /D "USE_SSE2" /D "WIN32" /D "_WIN32" /D "_WINDOWS" /D "_CONSOLE" /D "NDEBUG" /D "_LIB" /D "_MT" /D "CLIENT" /D "NBOINC_APP_GRAPHICS" /D "_UNICODE" /D "UNICODE" /D "_VC80_UPGRADE=0x0710" /D "_MBCS" /GF /FD /EHsc /MT /Zp16 /Gy /Fo"Release32-NOGFX\\" /Fd"Release32-NOGFX\vc80.pdb" /W3 /c /Wp64 /Zi /Gd /TP /FI "win-config.h" ".\opt_SSE2.cpp"
opt_SSE2.cpp
c:\i\sc\seti\seti_boinc_2k3_2.2b-ben-joe\client\optimizer\opt_sse2.cpp(382) : warning C4700: uninitialized local variable "temp0" used.
c:\i\sc\seti\seti_boinc_2k3_2.2b-ben-joe\client\optimizer\opt_sse2.cpp(383) : warning C4700: uninitialized local variable "temp1" used.
c:\i\sc\seti\seti_boinc_2k3_2.2b-ben-joe\client\optimizer\opt_sse2.cpp(384) : warning C4700: uninitialized local variable "temp2" used.
c:\i\sc\seti\seti_boinc_2k3_2.2b-ben-joe\client\optimizer\opt_sse2.cpp(385) : warning C4700: uninitialized local variable "temp3" used.
Build log was saved at "file://c:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer\Release32-NOGFX\BuildLog.htm"
Optimizer - 0 error(s), 4 warning(s)
========== Build: 1 succeeded, 0 failed, 0 up-to-date, 0 skipped ==========

heinz ~seti_britta  ;D
Title: Re: optimized sources
Post by: _heinz on 28 May 2007, 04:21:21 pm
Yeah, I now have Optimizer.lib    ;D
One step closer to the client.
-----------------------------------------------------------------------------------
Optimizer - 0 error(s), 36 warning(s)
========== Rebuild All: 1 succeeded, 0 failed, 0 skipped ==========
All modules were compiled with the MSC compiler:
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86

regards seti_britta ~heinz  ;D ;D ;D
Title: Re: optimized sources
Post by: _heinz on 30 May 2007, 05:54:17 pm
I have linked the client --> got 2 unresolved external symbols.
Now I'm searching  ;)
Title: Re: optimized sources
Post by: _heinz on 02 Jun 2007, 04:52:37 am
I still have 1 unresolved external symbol  ;)
Title: Re: optimized sources
Post by: _heinz on 04 Jun 2007, 05:17:47 pm
new client created    ;D ;D ;D ----->

Linking...
Microsoft (R) Incremental Linker Version 8.00.50727.762
Copyright (C) Microsoft Corporation.  All rights reserved.
"/OUT:.\Release32-NOGFX\seti_boinc.exe" /INCREMENTAL:NO "/LIBPATH:C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX" "/LIBPATH:C:\I\INTEL\IPP\5.2_beta\ia32\lib" "/LIBPATH:C:\I\INTEL\MKL\9.0\ia32\lib" "/LIBPATH:C:\masm32\lib" "/LIBPATH:C:\I\VS8\VC\lib" "/LIBPATH:C:\I\SDK\Lib" "/LIBPATH:C:\masm32\m32lib" "/LIBPATH:C:\I\SDK\Lib\AMD64" "/LIBPATH:C:\I\SDK\Lib\IA64" /MANIFEST:NO "/PDB:c:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\seti_boinc.pdb" /SUBSYSTEM:WINDOWS /MACHINE:X86 glut32.lib glut.lib glu32.lib optimizer.lib image_libs.lib jpeglib.lib libboinc.lib libboincapi.lib non_ICC.lib setiboincdb.lib ippsmerged.lib ippvmmerged.lib ippchmerged.lib ippcorel.lib delayimp.lib libcpmt.lib WinMM.lib OpenGL32.lib Kernel32.Lib oldnames.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib
".\Release32-NOGFX\analyzeFuncs.obj"
".\Release32-NOGFX\analyzePoT.obj"
".\Release32-NOGFX\analyzeReport.obj"
".\Release32-NOGFX\app_ipc.obj"
".\Release32-NOGFX\boinc_api.obj"
".\Release32-NOGFX\chirpfft.obj"
".\Release32-NOGFX\fft8g.obj"
".\Release32-NOGFX\filesys.obj"
".\Release32-NOGFX\gaussfit.obj"
".\Release32-NOGFX\gdata.obj"
".\Release32-NOGFX\graphics_api.obj"
".\Release32-NOGFX\graphics_data.obj"
".\Release32-NOGFX\gutil.obj"
".\Release32-NOGFX\lcgamm.obj"
".\Release32-NOGFX\main.obj"
".\Release32-NOGFX\malloc_a.obj"
".\Release32-NOGFX\parse.obj"
".\Release32-NOGFX\progress.obj"
".\Release32-NOGFX\pulsefind.obj"
".\Release32-NOGFX\s_util.obj"
".\Release32-NOGFX\sah_gfx.obj"
".\Release32-NOGFX\sah_gfx_base.obj"
".\Release32-NOGFX\schema_master.obj"
".\Release32-NOGFX\seti.obj"
".\Release32-NOGFX\seti_header.obj"
".\Release32-NOGFX\shmem.obj"
".\Release32-NOGFX\spike.obj"
".\Release32-NOGFX\sqlblob.obj"
".\Release32-NOGFX\sqlrow.obj"
".\Release32-NOGFX\tgalib.obj"
".\Release32-NOGFX\timecvt.obj"
".\Release32-NOGFX\util.obj"
".\Release32-NOGFX\version.obj"
".\Release32-NOGFX\windows_opengl.obj"
".\Release32-NOGFX\worker.obj"
".\Release32-NOGFX\xml_util.obj"
Creating browse information file...
Microsoft Browse Information Maintenance Utility Version 8.00.50727
Copyright (C) Microsoft Corporation. All rights reserved.
Build log was saved at "file://c:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\BuildLog.htm"
seti_boinc - 0 error(s), 3 warning(s)
========== Rebuild All: 1 succeeded, 0 failed, 0 skipped ==========
----------------------------------------------------------------------------------------
All problems are solved, new client created  ;D ;D ;D
regards seti_britta ~heinz
Title: Re: optimized sources
Post by: Simon on 04 Jun 2007, 05:52:25 pm
Congrats :)
Title: Re: optimized sources
Post by: _heinz on 04 Jun 2007, 06:36:20 pm
Hello Simon,

I downloaded and unpacked your kwsn-test-package2.
Where do I have to unpack it so that it works?
Could you please give me a few hints on how to use it?

heinz
Title: Re: optimized sources
Post by: Simon on 04 Jun 2007, 07:47:46 pm
Uh,

no idea which one you downloaded.

I would recommend the following:

Knabench 1.43 (http://lunatics.at/index.php?module=Downloads;sa=dlview;id=54)

When you unpack it (anywhere), it creates a directory named "KWSN Knabench 1.43". Inside it you will find several more subdirectories.

The most interesting of these are Science_apps/ and TestWUs/. Each contains a "Reserve" subdirectory. Move all WUs or science apps that you do NOT want to benchmark into "Reserve". Conversely, copy all the apps/WUs you do want to test into the Science_apps or TestWUs directory.

Once you have chosen the desired apps and WUs (that is, copied/moved them), double-click "Knabench-1.43.cmd" and be prepared to wait a bit ;) The script reports when it has finished. At the end there is also a summary of all apps/WUs.

In "Readme & Licenses" you will also find a short readme on usage.

Regards,
Simon.
Title: Re: optimized sources
Post by: _heinz on 05 Jun 2007, 05:14:16 pm
I get an error: the application hangs. Stack overflow; debugging now.....
-------------------------------------------------------------------------------------------

I included 2.3.set5 in the project and have problems with memspeed.cpp. When I compile it, the compiler says 'std' is not a class or namespace.
----------------------------------------------------------------------------------------
------ Build started: Project: Optimizer, Configuration: Debug Win32 ------
Compiling...
memspeed.cpp
c:\i\sc\seti\seti_boinc_2k3_2.2b-ben-joe\client\optimizer\memspeed.cpp(47) : error C2653: 'std' : is not a class or namespace name
c:\i\sc\seti\seti_boinc_2k3_2.2b-ben-joe\client\optimizer\memspeed.cpp(47) : error C3861: 'min': identifier not found
c:\i\sc\seti\seti_boinc_2k3_2.2b-ben-joe\client\optimizer\memspeed.cpp(74) : error C2653: 'std' : is not a class or namespace name
c:\i\sc\seti\seti_boinc_2k3_2.2b-ben-joe\client\optimizer\memspeed.cpp(74) : error C3861: 'min': identifier not found

--------------------------------------------------------------------------------------
47       overhead = std::min(overhead, ticks);

74      min_ticks = std::min(min_ticks, ticks);
--------------------------------------------------------------------------------------
any suggestions ???

Title: Re: optimized sources
Post by: _heinz on 05 Jun 2007, 05:33:28 pm
Hi Crunchr,
since switching to DEBUG I have real problems. Includes are not found, errors appear in parts of the project that previously compiled without errors, and so on....

Something is wrong here!!!!
for example ------>
------ Rebuild All started: Project: libboincapi, Configuration: Debug Win32 ------
Deleting intermediate and output files for project 'libboincapi', configuration 'Debug|Win32'
Compiling...
graphics_api.C
c:\i\sc\seti\boinc\api\boinc_api.h(123) : error C2146: syntax error : missing ';' before identifier 'worker_thread_handle'
c:\i\sc\seti\boinc\api\boinc_api.h(123) : error C2377: 'HANDLE' : redefinition; typedef cannot be overloaded with any other symbol
        c:\i\sdk\include\winnt.h(334) : see declaration of 'HANDLE'
c:\i\sc\seti\boinc\api\boinc_api.h(123) : error C4430: missing type specifier - int assumed. Note: C++ does not support default-int
c:\i\sc\seti\boinc\api\graphics_api.h(64) : error C2146: syntax error : missing ';' before identifier 'hQuitEvent'
c:\i\sc\seti\boinc\api\graphics_api.h(64) : error C4430: missing type specifier - int assumed. Note: C++ does not support default-int
c:\i\sc\seti\boinc\api\graphics_api.h(64) : error C4430: missing type specifier - int assumed. Note: C++ does not support default-int
c:\i\sc\seti\boinc\api\graphics_api.h(65) : error C2146: syntax error : missing ';' before identifier 'graphics_threadh'
c:\i\sc\seti\boinc\api\graphics_api.h(65) : error C4430: missing type specifier - int assumed. Note: C++ does not support default-int
c:\i\sc\seti\boinc\api\graphics_api.h(65) : error C4430: missing type specifier - int assumed. Note: C++ does not support default-int
graphics_data.C
graphics_impl.C
c:\i\sc\seti\boinc\api\boinc_api.h(123) : error C2146: syntax error : missing ';' before identifier 'worker_thread_handle'
c:\i\sc\seti\boinc\api\boinc_api.h(123) : error C2377: 'HANDLE' : redefinition; typedef cannot be overloaded with any other symbol
        c:\i\sdk\include\winnt.h(334) : see declaration of 'HANDLE'
c:\i\sc\seti\boinc\api\boinc_api.h(123) : error C4430: missing type specifier - int assumed. Note: C++ does not support default-int
c:\i\sc\seti\boinc\api\graphics_api.h(64) : error C2146: syntax error : missing ';' before identifier 'hQuitEvent'
c:\i\sc\seti\boinc\api\graphics_api.h(64) : error C4430: missing type specifier - int assumed. Note: C++ does not support default-int
c:\i\sc\seti\boinc\api\graphics_api.h(64) : error C4430: missing type specifier - int assumed. Note: C++ does not support default-int
c:\i\sc\seti\boinc\api\graphics_api.h(65) : error C2146: syntax error : missing ';' before identifier 'graphics_threadh'
c:\i\sc\seti\boinc\api\graphics_api.h(65) : error C4430: missing type specifier - int assumed. Note: C++ does not support default-int
c:\i\sc\seti\boinc\api\graphics_api.h(65) : error C4430: missing type specifier - int assumed. Note: C++ does not support default-int
c:\i\sc\seti\boinc\api\graphics_impl.c(57) : error C2146: syntax error : missing ';' before identifier 'hQuitEvent'
c:\i\sc\seti\boinc\api\graphics_impl.c(57) : error C4430: missing type specifier - int assumed. Note: C++ does not support default-int
c:\i\sc\seti\boinc\api\graphics_impl.c(57) : error C4430: missing type specifier - int assumed. Note: C++ does not support default-int
c:\i\sc\seti\boinc\api\graphics_impl.c(57) : error C2086: 'int hQuitEvent' : redefinition
        c:\i\sc\seti\boinc\api\graphics_api.h(64) : see declaration of 'hQuitEvent'
c:\i\sc\seti\boinc\api\graphics_impl.c(98) : error C2440: '=' : cannot convert from 'HANDLE' to 'int'
        There is no context in which this conversion is possible
c:\i\sc\seti\boinc\api\graphics_impl.c(106) : error C2440: '=' : cannot convert from 'HANDLE' to 'int'
        There is no context in which this conversion is possible
c:\i\sc\seti\boinc\api\graphics_impl.c(107) : error C2664: 'ResumeThread' : cannot convert parameter 1 from 'int' to 'HANDLE'
        Conversion from integral type to pointer type requires reinterpret_cast, C-style cast or function-style cast
gutil.C
gutil_text.C
reduce_lib.C
reduce_main.C
texture.C
Generating Code...
Build log was saved at "file://c:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Debug\BuildLog.htm"
libboincapi - 25 error(s), 0 warning(s)
========== Rebuild All: 0 succeeded, 1 failed, 0 skipped ==========


What should be done here???? :'(
Title: Re: optimized sources
Post by: Urs Echternacht on 05 Jun 2007, 05:37:55 pm
...
--------------------------------------------------------------------------------------
47       overhead = std::min(overhead, ticks);

74      min_ticks = std::min(min_ticks, ticks);
--------------------------------------------------------------------------------------
any suggestions ???
I don't know whether my solution is the best or fastest, but I hope it works (just basic C):

overhead = (overhead <= ticks) ? overhead : ticks;
min_ticks = (min_ticks <= ticks) ? min_ticks : ticks;
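
Error C2653 ('std' is not a class or namespace name) usually means that no standard C++ header had been included before line 47, so including <algorithm>, which declares std::min, is another possible fix. A minimal sketch, assuming the tick counters are unsigned 64-bit values (tick_t and smaller_ticks are hypothetical names, not taken from memspeed.cpp):

```cpp
#include <algorithm>  // declares std::min and, with it, namespace std

// Hypothetical tick type; the real type used in memspeed.cpp is not shown in the thread.
typedef unsigned long long tick_t;

// Returns the smaller of the running minimum and the new measurement.
tick_t smaller_ticks(tick_t current_min, tick_t ticks) {
    // The parentheses around std::min keep a function-like min macro
    // from expanding here, should one be defined.
    return (std::min)(current_min, ticks);
}
```

The parentheses are only a precaution: if <windows.h> has been included without NOMINMAX, its min macro would otherwise expand at this call. With the helper, line 74 would read min_ticks = smaller_ticks(min_ticks, ticks);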

Title: Re: optimized sources
Post by: Crunch3r on 05 Jun 2007, 05:42:14 pm
Why switch to debug?

Just start the app in the debugger, as I described by PM. The app does not have to be compiled in debug mode for that.

The debugger will then tell you where the app hangs.

P.S.

I still can't quite understand why you put yourself through this with the VS compiler.
Nobody will ever use an app built with the MS compiler ... it is too slow, and even Eric doesn't do that.
He uses DEV C++ or whatever the thing is called.

Sorry, but that is (I think) wasted effort.

Sorry again for the harsh words, but unfortunately that's how it is.

Title: Re: optimized sources
Post by: _heinz on 05 Jun 2007, 06:00:50 pm
Hello Crunch3r, you are surely right.....
but I would be glad if you answered a question for me now and then

Regards, heinz ~seti_britta
Title: Re: optimized sources
Post by: Crunch3r on 05 Jun 2007, 06:18:13 pm
Hello Crunch3r, you are surely right.....
but I would be glad if you answered a question for me now and then

Regards, heinz ~seti_britta

But I have... haven't I?

I replied to your PM every time, and I certainly didn't write nonsense!

Of course I can't help you with all of your questions; that may be because I am NOT a programmer... (AND also had to port to LINUX for ALPHA/IA64 and FREEBSD), which I never claimed to be! (Others did that... for whatever reason)

I can, if I want to and see a use in it, change the code so that it works, and I also know quite well how to get something working... just take a look at the port of 2.2B to Linux ... I think I put a WHOLE LOT of time into that until everything worked, and I can't exactly say that I could rely on earlier "developers" or "porters" who signed up for "development duty". Nothing came of it....

 (Exceptions are, as always, Joe Segur and Hans Dorn !!!)


So now tell me which question I supposedly didn't answer?     ;)
Title: Re: optimized sources
Post by: Simon on 05 Jun 2007, 06:31:18 pm
Hold your horses, you two :)

So: on the one hand, porting the project to MSVC is of questionable value, since that compiler will almost certainly never really be used - GCC is faster (considerably), and ICC certainly is.

On the other hand, it is interesting for e.g. GPU/PS3 etc., because as far as I know those do support (only?) MSVC (2005?) (e.g. RapidMind or CUDA and similar backends).

Third: we are all volunteers, without exception! So nobody can really demand anything, even though pretty much everyone here is quite helpful (and competent too).

So, to sum up once more: please no open exchange of blows, that won't get us anywhere. Heinz, I believe Crunch3r has helped quite a bit and so doesn't need to be reprimanded, that would be a fine thing - Crunch3r, I believe Heinz perhaps didn't mean it quite the way you read it; as I see it, it was meant more for the future than about the past.

Feel free to correct me, but that's how I see it.

Regards,
Simon.
Title: Re: optimized sources
Post by: _heinz on 05 Jun 2007, 06:55:09 pm
Sorry for the problems.... :'( I have everything under control again now; I had got lost in DEBUG.
I have now switched back, and everything is normal again.

  ;D
Title: Re: optimized sources
Post by: _heinz on 05 Jun 2007, 07:10:38 pm
No rush, Crunch3r, all is well, you answered all the questions <happy>  ;)
Title: Re: optimized sources
Post by: _heinz on 05 Jun 2007, 07:38:19 pm
And here is the new client  ;D
------ Rebuild All started: Project: seti_boinc, Configuration: Release32-NOGFX Win32 ------
Deleting intermediate and output files for project 'seti_boinc', configuration 'Release32-NOGFX|Win32'
Compiling...
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
Copyright (C) Microsoft Corporation. All rights reserved.
cl /O2 /Ob2 /Oi /Ot /Oy /GT /I "C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX" /I "C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\jpeglib" /I "C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\image_libs" /I "C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\glut" /I "C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build" /I "C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer" /I "C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe" /I "C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client" /I "C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\db" /I "C:\I\SC\seti\boinc\client\win" /I "C:\I\SC\seti\boinc\lib" /I "C:\I\SC\seti\boinc\api" /I "C:\I\SC\seti\boinc" /I "." /I "../../../boinc/api" /I "../../../boinc/lib" /I ".." /I "glut" /D "WIN32" /D "_WIN32" /D "NDEBUG" /D "_WINDOWS" /D "NBOINC_APP_GRAPHICS" /D "CLIENT" /D "_MT" /D "USE_IPP" /D "USE_SSE2" /D "_VC80_UPGRADE=0x0600" /D "_MBCS" /GF /Gm /EHsc /MT /Zp16 /Gy /Fp".\Release/seti_boinc.pch" /Fo".\Release32-NOGFX\\" /Fd".\Release32-NOGFX\vc80.pdb" /FR".\Release32-NOGFX\\" /W3 /c /Wp64 /Zi /TP "..\..\db\xml_util.cpp"
   "..\worker.cpp"
   "..\..\..\boinc\api\windows_opengl.C"
   "..\version.cpp"
   "..\..\..\boinc\lib\util.C"
   "..\timecvt.cpp"
   "..\..\image_libs\tgalib.cpp"
   "..\..\db\sqlrow.cpp"
   "..\..\db\sqlblob.cpp"
   "..\spike.cpp"
   "..\..\..\boinc\lib\shmem.C"
   "..\seti_header.cpp"
   "..\seti.cpp"
   "..\..\db\schema_master.cpp"
   "..\sah_gfx_base.cpp"
   "..\sah_gfx.cpp"
   "..\s_util.cpp"
   "..\pulsefind.cpp"
   "..\progress.cpp"
   "..\..\..\boinc\lib\parse.C"
   "..\malloc_a.cpp"
   "..\main.cpp"
   "..\lcgamm.cpp"
   "..\..\..\boinc\api\gutil.C"
   "..\..\..\boinc\api\graphics_data.C"
   "..\..\..\boinc\api\graphics_api.C"
   "..\gdata.cpp"
   "..\gaussfit.cpp"
   "..\..\..\boinc\lib\filesys.C"
   "..\fft8g.cpp"
   "..\chirpfft.cpp"
   "..\..\..\boinc\api\boinc_api.C"
   "..\..\..\boinc\lib\app_ipc.C"
   "..\analyzeReport.cpp"
   "..\analyzePoT.cpp"
   "..\analyzeFuncs.cpp"
analyzeFuncs.cpp
-----IPP-----
-----SSE2-----
analyzePoT.cpp
--- BenSpectrum ---
analyzeReport.cpp
app_ipc.C
boinc_api.C
chirpfft.cpp
fft8g.cpp
filesys.C
gaussfit.cpp
gdata.cpp
graphics_api.C
graphics_data.C
gutil.C
lcgamm.cpp
main.cpp
malloc_a.cpp
parse.C
progress.cpp
pulsefind.cpp
s_util.cpp
Generating Code...
Compiling...
sah_gfx.cpp
sah_gfx_base.cpp
schema_master.cpp
seti.cpp
seti_header.cpp
..\seti_header.cpp(93) : warning C4267: 'initializing' : conversion from 'size_t' to 'int', possible loss of data
shmem.C
spike.cpp
sqlblob.cpp
sqlrow.cpp
tgalib.cpp
timecvt.cpp
util.C
version.cpp
windows_opengl.C
worker.cpp
xml_util.cpp
Generating Code...
c:\i\sc\seti\seti_boinc_2k3_2.2b-ben-joe\db\xml_util.cpp(663) : warning C4717: 'xml_match_tag' : recursive on all control paths, function will cause runtime stack overflow
c:\i\sc\seti\seti_boinc_2k3_2.2b-ben-joe\db\xml_util.cpp(689) : warning C4717: 'xml_find_tag' : recursive on all control paths, function will cause runtime stack overflow
Linking...
Microsoft (R) Incremental Linker Version 8.00.50727.762
Copyright (C) Microsoft Corporation.  All rights reserved.
"/OUT:.\Release32-NOGFX\seti_boinc.exe" /INCREMENTAL:NO "/LIBPATH:C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX" "/LIBPATH:C:\I\INTEL\IPP\5.2_beta\ia32\lib" "/LIBPATH:C:\I\INTEL\MKL\9.0\ia32\lib" "/LIBPATH:C:\masm32\lib" "/LIBPATH:C:\I\VS8\VC\lib" "/LIBPATH:C:\I\SDK\Lib" "/LIBPATH:C:\masm32\m32lib" "/LIBPATH:C:\I\SDK\Lib\AMD64" "/LIBPATH:C:\I\SDK\Lib\IA64" /MANIFEST:NO "/PDB:c:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\seti_boinc.pdb" /MAP /SUBSYSTEM:WINDOWS /MACHINE:X86 glut32.lib glut.lib glu32.lib optimizer.lib image_libs.lib jpeglib.lib libboinc.lib libboincapi.lib non_ICC.lib setiboincdb.lib ippsmerged.lib ippvmmerged.lib ippchmerged.lib ippcorel.lib delayimp.lib libcpmt.lib WinMM.lib OpenGL32.lib Kernel32.Lib oldnames.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib
".\Release32-NOGFX\analyzeFuncs.obj"
".\Release32-NOGFX\analyzePoT.obj"
".\Release32-NOGFX\analyzeReport.obj"
".\Release32-NOGFX\app_ipc.obj"
".\Release32-NOGFX\boinc_api.obj"
".\Release32-NOGFX\chirpfft.obj"
".\Release32-NOGFX\fft8g.obj"
".\Release32-NOGFX\filesys.obj"
".\Release32-NOGFX\gaussfit.obj"
".\Release32-NOGFX\gdata.obj"
".\Release32-NOGFX\graphics_api.obj"
".\Release32-NOGFX\graphics_data.obj"
".\Release32-NOGFX\gutil.obj"
".\Release32-NOGFX\lcgamm.obj"
".\Release32-NOGFX\main.obj"
".\Release32-NOGFX\malloc_a.obj"
".\Release32-NOGFX\parse.obj"
".\Release32-NOGFX\progress.obj"
".\Release32-NOGFX\pulsefind.obj"
".\Release32-NOGFX\s_util.obj"
".\Release32-NOGFX\sah_gfx.obj"
".\Release32-NOGFX\sah_gfx_base.obj"
".\Release32-NOGFX\schema_master.obj"
".\Release32-NOGFX\seti.obj"
".\Release32-NOGFX\seti_header.obj"
".\Release32-NOGFX\shmem.obj"
".\Release32-NOGFX\spike.obj"
".\Release32-NOGFX\sqlblob.obj"
".\Release32-NOGFX\sqlrow.obj"
".\Release32-NOGFX\tgalib.obj"
".\Release32-NOGFX\timecvt.obj"
".\Release32-NOGFX\util.obj"
".\Release32-NOGFX\version.obj"
".\Release32-NOGFX\windows_opengl.obj"
".\Release32-NOGFX\worker.obj"
".\Release32-NOGFX\xml_util.obj"
Creating browse information file...
Microsoft Browse Information Maintenance Utility Version 8.00.50727
Copyright (C) Microsoft Corporation. All rights reserved.
Build log was saved at "file://c:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\BuildLog.htm"
seti_boinc - 0 error(s), 3 warning(s)
========== Rebuild All: 1 succeeded, 0 failed, 0 skipped ==========
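
Warning C4717 in the log above says that xml_match_tag and xml_find_tag call themselves on every control path, which would end in a stack overflow as soon as either one is called. One common way this arises is a convenience overload that is meant to forward to a worker overload but selects itself again after overload resolution. A hedged sketch with hypothetical names (tag_matches is not taken from xml_util.cpp, and whether this is what happens there is an assumption):

```cpp
#include <cstring>
#include <string>

// Worker overload: true if `text` starts with `tag`.
bool tag_matches(const char* text, const char* tag) {
    return std::strncmp(text, tag, std::strlen(tag)) == 0;
}

// Convenience overload for std::string input.
// If the call below passed `text` itself instead of `text.c_str()`,
// overload resolution would pick this same overload again and the
// function would recurse until the stack is exhausted.
bool tag_matches(const std::string& text, const char* tag) {
    return tag_matches(text.c_str(), tag);  // explicit conversion selects the worker
}
```

Making the conversion explicit at the forwarding call is one way to make sure the intended overload is chosen; giving the worker a different name is another.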
Title: Re: optimized sources
Post by: _heinz on 06 Jun 2007, 04:31:48 am
...
--------------------------------------------------------------------------------------
47       overhead = std::min(overhead, ticks);

74      min_ticks = std::min(min_ticks, ticks);
--------------------------------------------------------------------------------------
any suggestions ???
I don't know whether my solution is the best or fastest, but I hope it works (just basic C):

overhead = (overhead <= ticks) ? overhead : ticks;
min_ticks = (min_ticks <= ticks) ? min_ticks : ticks;


Merci Urs  ;)
Title: Re: optimized sources
Post by: _heinz on 06 Jun 2007, 05:10:54 am
Thank you very much,

all of you who supported me on the long way to the client by reading my posts, answering my questions and giving me hints and tips to bring the application forward to a successfully compiled version.
Special thanks to Crunch3r, Josef Segur, Simon and Urs Echternacht.

The next step is to test, debug and work with test WUs, to show that this client is really error-free.
If it is, I will download the Intel compiler to produce a speed-optimized version of this client for everybody to use.

I hope for your further support in achieving this.

Regards seti_britta ~heinz

Title: Re: optimized sources
Post by: _heinz on 06 Jun 2007, 07:56:21 am
This is the first attempt to run the client with a test WU.
As you can see, it crashes right at the beginning. I am now looking into why this happens.
The KWSN_2.2B output below shows what it should do.
--------------------------------------------------------------------------------------------
Can't set up shared mem: -1
Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Stack Overflow (0xc00000fd) at address 0x7C9281B8

Engaging BOINC Windows Runtime Debugger...

********************

BOINC Windows Runtime Debugger Version 5.5.0

Dump Timestamp    : 06/06/07 13:13:14
LoadLibraryA( symsrv.dll ): GetLastError = 126
LoadLibraryA( srcsrv.dll ): GetLastError = 126
Can't set up shared mem: -1
--------------------------------------------------------------------------------------------
KWSN_2.2B

------- [ benchmark ] --------
               PowerSpectrum--:  14544120 x1.00        0 -- [ avg magnitude =   1.8306 (50)]
                    sse_GetTPS:  14802484 x0.98        0
                     hand_sse2:   6885508 x2.11        0
                   sse2_GetTPS:   7181316 x2.03        0
    PowerSpectrum--[hand_sse2]:   6885508 (chosen)
------------------------------
                PwrSpectOnly--:    897144 x1.00        0 -- [ avg magnitude =   1.8306 (50)]
                sse_GetPSO_npr:    783748 x1.14        0
                sse_GetPSO_p32:    856640 x1.05        0
                sse_GetPSO_p64:    850568 x1.05        0
               sse_GetPSO_p128:    844984 x1.06        0
PwrSpectOnly--[sse_GetPSO_npr]:    783748 (chosen)
------------------------------
                   Transpose--:  16412208 x1.00        0 -- [ avg magnitude =   0.9994 (50)]
                    Transpose2:   8550568 x1.92        0
                    Transpose4:   4740412 x3.46        0
                 sse_Trans4ntw:   2524936 x6.50        0
             sse_pfTrans8x4ntw:   2591696 x6.33        0
    Transpose--[sse_Trans4ntw]:   2524936 (chosen)
------------------------------
                   ChirpData--: 225408392 x1.00        0 -- [ avg magnitude =   0.9727 (10)]
                     TrigArray:  61711612 x3.65 1.6e-009
                  sse1_akChirp:  31697636 x7.11 1.1e-008
                  sse2_akChirp:  26079180 x8.64 1.1e-008
     ChirpData--[sse2_akChirp]:  26079180 (chosen)
------------------------------
                     GetPeak--:    268072 x1.00        0 -- [ avg magnitude =   0.9727 (50)]
                      hand_opt:    273164 x0.98   3e-007 t=-39719.3711 o=-39719.3594
                    sse_vector:     89852 x2.98   3e-007 t=-39719.3711 o=-39719.3594
         GetPeak--[sse_vector]:     89852 (chosen)
------------------------------
                       f_sum--:    312656 x1.00        0 -- [ avg magnitude =   0.9727 (50)]
                       unroll4:    402960 x0.78 7.5e-008 t=4044.4708 o=4044.4705
                      hand_sse:    412668 x0.76 8.7e-008 t=4044.4702 o=4044.4705
                    sse_vector:    312636 x1.00        0 t=4044.4705 o=4044.4705
           f_sum--[sse_vector]:    312636 (chosen)
------------------------------
                    GetChiSq--:     59320 x1.00        0 -- [ avg magnitude =   0.9727 (50)]
                  hoisted+abs(:     61132 x0.97 3.7e-007 t=123.9554 o=123.9554
          GetChiSq--[original]:     59320 (chosen)
------------------------------
             IPP FFT SSE2(64K):   5605736 x1.00        0 -- [ avg magnitude =  30.3415 (50)]
   IPP FFT SSE2(64K)[original]:   5605736 (chosen)
------------------------------
Bench Time: 6.95 seconds

- [ pulse fold select ] -
                      Standard:   3224668 x1.00        0
                       FPU opt:   2903952 x1.11 1.7e-010
                       ben SSE:   2476848 x1.30        0
                        AK SSE:   2198280 x1.47 2.3e-007
                        BH SSE:   2229300 x1.45        0
                        AK SSE:   2198280 (chosen)
Test Time: 0.06 seconds

Title: Re: optimized sources
Post by: _heinz on 06 Jun 2007, 11:00:30 am
Thoughts about this error

When we enter analyzeFuncs, the first thing done is ---> Allocate data array and work area arrays.

Here are the important statements:
int seti_analyze( ANALYSIS_STATE &state  )
 ......
    NumDataPoints = state.npoints; //seti_britta: problem, npoints has no init value ?
struct ANALYSIS_STATE
    {
    sah_complex *data;
    sah_complex *savedWUData;   // Save the original WU data
    int         npoints;
    int         icfft;
    int         PoT_freq_bin;   // where we are in PoT analysis for this icfft

    // ... will be -1 if no PoT analysis in progress
    int         PoT_activity;
    int         doing_pulse;
    double      FLOP_counter;
    };
.......
    PowerSpectrum = ( float * ) MEM.calloc( "PowerSpectrum", NumDataPoints, sizeof(float) );
.......
Now we go to MEM.calloc ---->
      void *calloc( const char *name, size_t size, size_t nitems, size_t alignment = MEM_ALIGN)
         {
         void *ptr = calloc_a( size, nitems, alignment );
         if ( ptr == NULL ) SETIERROR( MALLOC_FAILED, report( name ) );
//         fprintf(stderr, "(%1.1fK) %s \n", size*nitems/1024.0, name);
         return ptr;
         };
-----------------------------------------------------
Maybe nitems has no value or a wrong one.
-----------------------------------------------------
If NumDataPoints has no value, calloc fails, and as you can see there is no code to guard against this, so we get an error message from the system like this ----> Can't set up shared mem: -1, Unhandled Exception Detected...
----------------------------------------------------------
It looks like this is the error; we will see whether this is confirmed.......
 ;)
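
If an uninitialized count really is the concern, the allocation can be guarded before calloc is reached. A minimal sketch, assuming the count arrives as an int like npoints; alloc_power_spectrum is a hypothetical helper and not part of the SETI@home sources:

```cpp
#include <cstddef>
#include <cstdlib>
#include <limits>

// Hypothetical helper: returns zero-initialized storage for `count` floats,
// or nullptr when the count is not positive or the byte size would overflow.
float* alloc_power_spectrum(int count) {
    if (count <= 0) {
        return nullptr;  // rejects zero, negative and many garbage values
    }
    const std::size_t n = static_cast<std::size_t>(count);
    if (n > std::numeric_limits<std::size_t>::max() / sizeof(float)) {
        return nullptr;  // n * sizeof(float) would not fit in size_t
    }
    // std::calloc itself returns nullptr on failure, so callers must check.
    return static_cast<float*>(std::calloc(n, sizeof(float)));
}
```

The caller checks the result for nullptr and releases the buffer with std::free. A check like this cannot tell a plausible but wrong count from a correct one, so it complements initializing npoints properly rather than replacing it.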
Title: Re: optimized sources
Post by: _heinz on 06 Jun 2007, 12:42:22 pm

It looks like this is the error; we will see whether this is confirmed.......
 ;)
It is necessary to have a look at main.cpp and seti.cpp, where wrote_header writes values into the ANALYSIS_STATE structure  ;)
Title: Re: optimized sources
Post by: _heinz on 07 Jun 2007, 05:02:32 am
The critical problem is in malloc_a.h.
Two variables are undefined there ---> nitems, alignment
-------------------------------------------------------------------------------
68      void *calloc( const char *name, size_t size, size_t nitems, size_t alignment = MEM_ALIGN)
69      {
70         void *ptr = calloc_a( size, nitems, alignment  );
71         if ( ptr == NULL ) SETIERROR( MALLOC_FAILED, report( name ) );
72//         fprintf(stderr, "(%1.1fK) %s \n", size*nitems/1024.0, name);
73         return ptr;
74         };

81      void __fastcall free( void * ptr )   { __OUR_FREE( ptr ); };

------------------------------------------------------------------------------------------------------------------
c:\i\sc\seti\seti_boinc_2k3_2.2b-ben-joe\client\malloc_a.h(68) : warning C4002: too many actual parameters for macro 'calloc'
c:\i\sc\seti\seti_boinc_2k3_2.2b-ben-joe\client\malloc_a.h(68) : error C2059: syntax error : 'constant'
c:\i\sc\seti\seti_boinc_2k3_2.2b-ben-joe\client\malloc_a.h(81) : error C2059: syntax error : 'constant'
c:\i\sc\seti\seti_boinc_2k3_2.2b-ben-joe\client\malloc_a.h(70) : error C2065: 'nitems' : undeclared identifier
c:\i\sc\seti\seti_boinc_2k3_2.2b-ben-joe\client\malloc_a.h(70) : error C2065: 'alignment' : undeclared identifier
------------------------------------------------------------------------------------------------------------------------------------------------------------------
That's the reason why the application crashes on the first call of calloc,
and this happens in the first lines of main.cpp --->

int main( int argc, char **argv )
    {
    int retval = 0, i;
    FORCE_FRAME_POINTER;
    run_stage = PREGRX;
    g_argc = argc;

    if ( !(g_argv = ( char ** ) calloc( argc + 2, sizeof(char *) )) )
        {
        exit( MALLOC_FAILED );
        }
----------------------------------------------------------------------------------------------------------------------------------
seti_britta ~heinz   ;)
Title: Re: optimized sources
Post by: _heinz on 07 Jun 2007, 02:43:17 pm
quote
c:\i\sc\seti\seti_boinc_2k3_2.2b-ben-joe\client\malloc_a.h(68) : warning C4002: too many actual parameters for macro 'calloc'
--------------------------
calloc takes exactly 2 parameters when it is called:
calloc(number_of_elements, size_of_each_element)
-------------------------------------------------------------------------------
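For illustration, a minimal sketch of the two-argument form, modelled on the g_argv allocation in main.cpp. The helper name copy_args is hypothetical and not part of the SETI sources; it only shows that calloc takes the element count first and the element size second, and hands back zero-filled memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical helper (not from the SETI sources): copies the argv pointers
// into a zero-filled array with two spare slots, as main.cpp does for g_argv.
char **copy_args(int argc, char **argv)
{
    // calloc(number_of_elements, size_of_each_element)
    char **copy = static_cast<char **>(
        std::calloc(static_cast<std::size_t>(argc) + 2, sizeof(char *)));
    if (copy == NULL) return NULL;

    for (int i = 0; i < argc; i++) copy[i] = argv[i];
    return copy;   // the two spare slots keep the zero fill from calloc
}
```

The caller owns the returned array and releases it with free(); the strings themselves are not copied.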

The next problem is the multiple definition of FORCE_FRAME_POINTER.
As you can see, it is defined in s_util.h, line 133, and in seti.h, line 144,
and there is no construct in the header files to prevent this,
for instance:
#ifndef FORCE_FRAME_POINTER
#define FORCE_FRAME_POINTER
#endif
----------------------------------------------------------------
As a reminder: I already said this once in #36 --->
-----------------------------------------------------------------
The main problems in the project are:
1. Migration problems --> see http://msdn2.microsoft.com/de-de/library/ms235289(VS.80).aspx
2. Outdated declaration style
3. Type conversions
4. Conversions when passing parameters to functions and returning values from them
5. Parameters in macro definitions
6. Use of undefined variables
7. Multiple declarations (ambiguity)

to name just a few

------------------------------------------------------------------
The errors in 4. and 5. are often hard to find, because the compiler usually performs no type checking here. The programmer alone is responsible for that.
This also explains why a program that compiles without errors, as happened in our case, can crash when it is run.

seti_britta ~heinz  ;)
Title: Re: optimized sources
Post by: Josef W. Segur on 07 Jun 2007, 06:41:39 pm
...
The next problem is the multiple definition of FORCE_FRAME_POINTER.
As you can see, it is defined in s_util.h, line 133, and in seti.h, line 144,
and there is no construct in the header files to prevent this,
for instance:
#ifndef FORCE_FRAME_POINTER
#define FORCE_FRAME_POINTER
#endif
...

That seems to have crept into our sources before 2.0. I suggest changing to what the CVS sources now have: no define in seti.h and the following in s_util.h:

Code: [Select]
// The MS & intel compilers don't allow alloca in try/catch blocks
#ifndef FORCE_FRAME_POINTER
#if ( defined(HAVE_ALLOCA) || defined(_WIN32) ) && !( defined(__INTEL_COMPILER) || defined (_MSC_VER) )
#define FORCE_FRAME_POINTER alloca(16)
#else
#define FORCE_FRAME_POINTER (0)
#endif
#endif

Note: Crunch3r identified the error with the Intel compiler, and the changes were made after that. But the 5.15 code from which ours evolved predates those changes.
                                                                                        Joe
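A small sketch of why the #ifndef guard matters when two headers both try to define the same macro. DEMO_FORCE_FRAME_POINTER and demo_force_frame_pointer are made-up names for this illustration only; the two guarded blocks stand in for the definitions in s_util.h and seti.h.

```cpp
#include <cassert>

// Stand-in for the definition in the first header (hypothetical demo macro).
#ifndef DEMO_FORCE_FRAME_POINTER
#define DEMO_FORCE_FRAME_POINTER (1)
#endif

// Stand-in for a second header that also wants to define the macro.
// Because of the guard this block is skipped, so nothing is redefined.
#ifndef DEMO_FORCE_FRAME_POINTER
#define DEMO_FORCE_FRAME_POINTER (2)
#endif

// Returns the value of whichever definition was seen first.
int demo_force_frame_pointer() { return DEMO_FORCE_FRAME_POINTER; }
```

With the guards in place the first definition wins regardless of how many headers repeat it. Removing the define from one header entirely, as suggested above, avoids the question altogether.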
Title: Re: optimized sources
Post by: _heinz on 08 Jun 2007, 09:43:21 am
Merci Joe, thank you for your comment.
I use the source seti_boinc_2k3_2.2B1-Ben-Joe.7z from 02.03.2007, hoping that it is still current.
regards seti_britta ~heinz
Title: Re: optimized sources
Post by: _heinz on 08 Jun 2007, 01:56:09 pm
@Simon
Have there been any changes in malloc_a.h since the 2.2B source was published?
~heinz
Title: Re: optimized sources
Post by: Simon on 08 Jun 2007, 05:17:58 pm
I don't think so, no.

Regards,
Simon.
Title: Re: optimized sources
Post by: _heinz on 12 Jun 2007, 08:41:02 am
The difficulties in compiling malloc_a.h
----------------------------------------------------
1. To find the error, we copy the first lines of main.cpp into a new project named start_seti. We set the necessary include paths and can now compile.
Here is the short program:
---------------------------------------------------------------------------------------------------------------------------------------------------------------
// start_seti.cpp : Defines the entry point for the console application.
//
// Main program for command-line application.
// Usage: client [options]
//      -version show version info
//      -verbose print running status
//      -standalone
//      -bench
//      -show_benchmark

#include "stdafx.h"
#include "config.h"
    #include "boinc_win.h"
#include "diagnostics.h"
#include "util.h"
#include "s_util.h"
#include "boinc_api.h"
//#include "util.h"
//#include "s_util.h"
#include "analyze.h"
#include "analyzeFuncs.h"
#include "analyzePoT.h"
#include "worker.h"
#include "version.h"
#include "chirpfft.h"
#include "gaussfit.h"

#include "optimize.hpp"


// =======================================================================================
//    usage -
// =======================================================================================
void usage( void )
    {
    printf( "options:\n"
    #ifdef BOINC_APP_GRAPHICS
    " -nographics run without graphics\n"
    #endif
    " -version  show version info\n -verbose  print running status\n -standalone \n" );
    }

// =======================================================================================
//    print_error -
// =======================================================================================
void print_error( int e )
    {
    char    *p;
    p = error_string( e );
    fprintf( stderr, "%s\n", p );
    }

// =======================================================================================
//    print_version -
// =======================================================================================
void print_version( void )
    {
    printf( "SETI@home client.\nVersion: %d.%02d\n", gmajor_version, gminor_version );
    printf(
        "\nSETI@home is sponsored by individual donors around the world.\nIf you'd like to contribute to the project,\nplease visit the SETI@home web site at\nhttp://setiathome.ssl.berkeley.edu.\nThe project is also sponsored by the Planetary Society,\nthe University of California, Sun Microsystems, Paramount Pictures,\nFujifilm Computer Products, Informix, Engineering Design Team Inc,\nThe Santa Cruz Operation (SCO), Intel, Quantum Corporation,\nand the SETI Institute.\n\nSETI@home was developed by David Gedye (Founder),\nDavid Anderson (Director), Dan Werthimer (Chief Scientist),\nHiram Clawson, Jeff Cobb, Charlie Fenton,\nEric Heien, Eric Korpela, Matt Lebofsky,\nTetsuji 'Maverick' Rai and Rom Walton\n" );
    }

static int  g_argc;
static char **g_argv;
extern double           sigma_thresh;
extern double           f_PowerThresh;
extern double           f_PeakPowerThresh;
extern double           chi_sq_thresh;
bool                    nographics_flag;

int                     run_stage;
typedef seti_error      boinc_error;
extern APP_INIT_DATA    app_init_data;

//int _tmain(int argc, _TCHAR* argv[])
//{
int main( int argc, char **argv )
    {
    int retval = 0, i;
    FORCE_FRAME_POINTER;
    run_stage = PREGRX;
    g_argc = argc;

    if ( !(g_argv = ( char ** ) calloc( argc + 2, sizeof(char *) )) )
        {
        exit( MALLOC_FAILED );
        }

    setbuf( stdout, 0 );

    bool    standalone = false;
    #ifdef BOINC_APP_GRAPHICS
        nographics_flag = false;
    #else
        nographics_flag = true;
    #endif
   show_benchmark = false;
   g_argv[0] = argv[0];

    for ( i = 1; i < argc; i++ )
        {
        g_argv[i] = argv[i];

        char    *p = argv[i];
        while ( *p == '-' ) p++;
        if ( !strncmp( p, "vers", 4 ) )
            {
            print_version();
            exit( 0 );
            }
        else if ( !strncmp( p, "verb", 4 ) )
            {
            verbose = 1;
            show_benchmark = true;
            }
        else if ( !strncmp( p, "st", 2 ) )
            {
            standalone = true;
            nographics_flag = true;
            }
        else if ( !strncmp( p, "no", 2 ) )
            {
            nographics_flag = true;
            }
        else if ( !strncmp( p, "bench", 5 ) )
            {
            show_benchmark = true;
            }
        else if ( !strncmp( p, "h", 1 ) )
            {
            usage();
            }
        }


   return 0;
}
------------------------------------------------------------------------------------------------------------



As you have seen before, there were 2 undeclared identifiers, nitems and alignment. No problem, if we declare them like this:
      unsigned static int nitems, alignment;
We had to do this at the beginning of the class c_MEM. Here is the changed malloc_a.h:
-------------------------------------------------------------------------------------------------------------------------

#if !defined( MALLOC_A_H )
#define MALLOC_A_H

#define MEM_ALIGN   64

#include "s_util.h"

void    * __fastcall malloc_a( size_t size, size_t alignment );
void    * __fastcall calloc_a( size_t size, size_t nitems, size_t alignment );
void    __fastcall free_a( void *palignedMem );

#if defined( USE_FFTWF )
   #include "fftw3.h"
   #define __OUR_MALLOC(size, align) fftwf_malloc( size )
   #define __OUR_FREE( ptr ) fftwf_free( ptr );
#elif defined( HAVE_MEMALIGN )
   #define __OUR_MALLOC(size, align) memalign( align, size )
   #define __OUR_FREE( ptr ) free( ptr );
#else
   #define __OUR_MALLOC(size, align) malloc_a( size, align )
   #define __OUR_FREE( ptr ) free_a( ptr );
#endif
class c_MEM {
   public:
      c_MEM() : used(0) {};
      ~c_MEM() {}   
      unsigned static int nitems, alignment;  // seti_britta: new
      void *alloc( const char *name, size_t size, size_t alignment = MEM_ALIGN)
         {
         void *ptr = __OUR_MALLOC( size, alignment );
         if ( ptr == NULL ) SETIERROR( MALLOC_FAILED, report( name ) );
//         fprintf(stderr, "(%1.1fK) %s \n", size/1024.0, name);
         return ptr;
         };
//seti_britta: syntax error: 'constant' --> constants in function calls are not allowed!
      void *calloc( const char *name, size_t size, size_t nitems, size_t alignment = MEM_ALIGN)
         {
         void *ptr = calloc_a( size, nitems, alignment );//error C2065: 'nitems': undeclared identifier, nitems, alignment
         if ( ptr == NULL ) SETIERROR( MALLOC_FAILED, report( name ) );
//         fprintf(stderr, "(%1.1fK) %s \n", size*nitems/1024.0, name);
         return ptr;
         };
      void *callocQ( const char *name, size_t size, size_t nitems, size_t alignment = MEM_ALIGN)
         {
         void *ptr = calloc_a( size, nitems, alignment );
         if ( ptr == NULL ) SETIERROR( MALLOC_FAILED, report( name ) );
         return ptr;
         };
      //seti_britta: malloc_a.h(83) : error C2059: syntax error: 'constant'
      void __fastcall free( void * ptr )   { __OUR_FREE( ptr ); };
//      void __fastcall free_a(void * ,1 );

      const char * __fastcall report( const char *name);
      int   used;
   };

// Create an actual structure to call based on
extern class c_MEM MEM;

#endif
--------------------------------------------------------------------------------------------------------------------------------------------------
If we compile now, we get:
Deleting intermediate and output files for project 'start_seti', configuration 'Debug|Win32'
Compiling...
stdafx.cpp
Compiling...
start_seti.cpp
c:\i\sc\seti\seti_boinc_2k3_2.2b-ben-joe\client\malloc_a.h(68) : warning C4002: too many actual parameters for macro 'calloc'
c:\i\sc\seti\seti_boinc_2k3_2.2b-ben-joe\client\malloc_a.h(68) : error C2059: syntax error : 'constant'
c:\i\sc\seti\seti_boinc_2k3_2.2b-ben-joe\client\malloc_a.h(83) : error C2059: syntax error : 'constant'
Build log was saved at "file://c:\I\VS2005\Projects\start_seti\start_seti\Debug\BuildLog.htm"
start_seti - 2 error(s), 1 warning(s)
--------------------------------------------------------------------------------------------------------------------------------------------------------------
The problems are:

68       void *calloc( const char *name, size_t size, size_t nitems, size_t alignment = MEM_ALIGN)
69         {
70         void *ptr = calloc_a( size, nitems, alignment );//error C2065: 'nitems': undeclared identifier, nitems, alignment
71         if ( ptr == NULL ) SETIERROR( MALLOC_FAILED, report( name ) );
72//         fprintf(stderr, "(%1.1fK) %s \n", size*nitems/1024.0, name);
73         return ptr;
74         };
----------------------------------------------------
83      void __fastcall free( void * ptr )   { __OUR_FREE( ptr ); };
----------------------------------------------------------------------------------------------------------------------------------------------
If we now go to line 68 and look at how calloc is declared, we find in boinc_win.h --->
#define calloc(c, s)                          _calloc_dbg(c, s, _NORMAL_BLOCK, __FILE__, __LINE__)
If we then look at how calloc and _calloc_dbg are declared, we find declarations in two different files (ambiguity):
1. line 634 stdlib.h
_CRTIMP _CRT_JIT_INTRINSIC  _CRTNOALIAS _CRTRESTRICT __checkReturn __bcount_opt(_NumOfElements* _SizeOfElements)    void * __cdecl calloc(__in size_t _NumOfElements, __in size_t _SizeOfElements);

2. line 676 ff crtdbg.h
_CRTIMP __checkReturn __bcount_opt(_NumOfElements*_SizeOfElements) void * __cdecl _calloc_dbg(
        __in size_t _NumOfElements,
        __in size_t _SizeOfElements,
        __in int _BlockType,
        __in_z_opt const char * _Filename,
        __in int _LineNumber
        );
-----------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------
Let's now go to line 83 -->
83      void __fastcall free( void * ptr )   { __OUR_FREE( ptr ); };
We now look for where __OUR_FREE is defined and find it in malloc_a.h, line 53 --->
53   #define __OUR_FREE( ptr ) free_a( ptr );
We then look for where free_a is declared and find it in malloc_a.h, line 43 --->
43 void    __fastcall free_a( void *palignedMem );
What we have to do now is step into the macro to resolve it........

The multiple declaration is the problem; any suggestions to solve it are welcome.
regards seti_britta ~heinz
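A sketch of what the messages above suggest is happening. Since boinc_win.h defines calloc(c, s) as a function-like macro, the preprocessor also rewrites the member declaration in line 68: it sees four "arguments" where the macro takes two (warning C4002), and the expansion puts _NORMAL_BLOCK, __FILE__ and __LINE__ into the parameter list (error C2059, 'constant'). Line 83 would fail the same way if free is remapped by a similar macro, which is an assumption here. In the example all names (zalloc, tracked_zalloc, DemoMem) are hypothetical stand-ins, so that it compiles anywhere; it shows that a parenthesised name is not treated as a macro invocation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Stand-in for the debug remapping in boinc_win.h. All names are hypothetical:
// zalloc plays the role of calloc, tracked_zalloc the role of _calloc_dbg.
static int g_tracked_calls = 0;

static void *tracked_zalloc(std::size_t count, std::size_t size,
                            const char *file, int line)
{
    (void)file;
    (void)line;
    ++g_tracked_calls;                 // count calls that went through the macro
    return std::calloc(count, size);
}

#define zalloc(c, s) tracked_zalloc((c), (s), __FILE__, __LINE__)

struct DemoMem
{
    // Writing "void *zalloc( const char *name, ... )" here would be rewritten
    // by the preprocessor, because "zalloc(" starts a macro invocation.
    // A name in parentheses is not followed by "(", so it is left alone.
    void *(zalloc)(const char *name, std::size_t size, std::size_t nitems)
    {
        (void)name;
        return std::calloc(nitems, size);
    }
};

// Plain call: the macro is expanded and tracked_zalloc runs.
inline void *demo_macro_call() { return zalloc(4, sizeof(int)); }

// Parenthesised member call: the macro is bypassed and the member runs.
inline void *demo_member_call(DemoMem &m) { return (m.zalloc)("buf", sizeof(int), 4); }
```

Every call site would need the same parentheses, which is easy to forget. Giving the wrappers names that no macro uses is the more robust way out.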
Title: Re: optimized sources
Post by: Josef W. Segur on 12 Jun 2007, 02:17:35 pm
Overloaded functions are a feature of C++ (not of C), so the compiler distinguishes between functions with the same name by the number and type of arguments. Name mangling distinguishes them in later parts of the compilation. There are fairly complex rules which allow for automatic conversion of types, and functions can actually be called with more arguments than shown in the declaration.

So, you are trying to do a DEBUG build, and the calloc() member defined in line 68 of malloc_a.h collides with the boinc_win.h macro which substitutes _calloc_dbg() for calloc(). Conflict!

Crunch3r ran into the same thing when doing the Linux 2.2B builds so those sources have reverted all the MEM.calloc() etc. calls to calloc_a() etc. I think that's the best thing to do.
                                                                                      Joe
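To make the suggestion concrete, here is a rough sketch of free functions with the same parameter order as the calloc_a() and free_a() declarations in malloc_a.h. It is an assumption-laden illustration, not the project's malloc_a.cpp: the _sketch names are invented, and the over-allocate-and-store-the-raw-pointer scheme is just one common way to get aligned, zero-filled memory portably.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for calloc_a( size, nitems, alignment ).
// Returns zero-filled memory aligned to "alignment" (a power of two), or NULL.
void *calloc_a_sketch(std::size_t size, std::size_t nitems, std::size_t alignment)
{
    if (alignment == 0 || (alignment & (alignment - 1)) != 0) return NULL;
    if (size != 0 && nitems > SIZE_MAX / size) return NULL;   // size * nitems would overflow

    const std::size_t bytes = size * nitems;
    const std::size_t extra = sizeof(void *) + alignment - 1; // bookkeeping + alignment slack
    if (bytes > SIZE_MAX - extra) return NULL;

    unsigned char *raw = static_cast<unsigned char *>(std::malloc(bytes + extra));
    if (raw == NULL) return NULL;

    // Round up to the next aligned address, leaving room for one pointer in front.
    std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(raw) + sizeof(void *);
    addr = (addr + alignment - 1) & ~(static_cast<std::uintptr_t>(alignment) - 1);
    unsigned char *aligned = reinterpret_cast<unsigned char *>(addr);

    // Remember the pointer malloc returned, directly in front of the aligned block.
    void *raw_ptr = raw;
    std::memcpy(aligned - sizeof(void *), &raw_ptr, sizeof(void *));

    std::memset(aligned, 0, bytes);
    return aligned;
}

// Hypothetical stand-in for free_a(): releases a block from calloc_a_sketch().
void free_a_sketch(void *ptr)
{
    if (ptr == NULL) return;
    void *raw_ptr;
    std::memcpy(&raw_ptr, static_cast<unsigned char *>(ptr) - sizeof(void *), sizeof(void *));
    std::free(raw_ptr);
}
```

Because these are ordinary functions with names that no CRT macro touches, a call such as calloc_a_sketch( sizeof(float), n, 64 ) behaves the same in Debug and Release builds.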
Title: Re: optimized sources
Post by: _heinz on 12 Jun 2007, 04:25:56 pm
Thank you, Joe.
I found some hints in ---> « Reply #24 on: 11 Mar 2007, 05:24:07 pm » Porting Rev-2.x apps to Linux

I will study this complex topic and look for a workable solution.

Further hints are welcome.

regards seti_britta ~heinz
Title: Re: optimized sources
Post by: _heinz on 13 Jun 2007, 11:24:33 am
Hi all,
I use a dual monitor installation for working. My second Monitor died last night.  :'(
Today I ordered a new 22" multisync flatscreen from LG. Expected in ca. 4 days.


 
Title: Re: optimized sources
Post by: _heinz on 15 Jun 2007, 04:28:11 pm
I downloaded the Intel-Compiler 10, but with "Visual Studio 2005 Express Edition" it can only be used from the command line.  :'(    Or does anybody have different experience with it?

Title: Re: optimized sources
Post by: Crunch3r on 15 Jun 2007, 04:50:34 pm
I downloaded the Intel-Compiler 10, but with "Visual Studio 2005 Express Edition" it can only be used from the command line.  :'(    Or does anybody have different experience with it?



Sorry, I think ICC only works in GUI mode with at least VS 2005 Standard... I never used it with Express... can't help you with that one.

Title: Re: optimized sources
Post by: Simon on 15 Jun 2007, 06:16:49 pm
Well, it does work using command-line and batch files in VS 2005 express. It's just that you cannot use it like with the standard VS 2005 in a GUI.

Basically, you'll have to set a few environment variables and run the appropriate icc_env.bat and ipp_env.bat inside your compilation shell for this. When you look into ICC's or IPP's directory, you'll see a "bin" directory; that's where the batch files are.

HTH,
Simon.
Title: Re: optimized sources
Post by: _heinz on 17 Jun 2007, 09:37:46 am
Merci Crunch3r and Simon.
------------------------------------
It would be nice to use the Intel-Compiler with the GUI. Therefore I installed the new Microsoft Visual Studio Codename Orcas
Version 9.0.20404.0 Beta1
Microsoft .NET Framework
Version 2.0.50727
Installed Edition: Professional
----------------------------------------
Installation ran without problems. After setting the necessary paths I converted seti_boinc to the new development environment. No problems. After that I uninstalled the Intel-Compiler, rebooted the machine, and installed the Intel-Compiler again so that it would integrate into the new environment. Then I rebooted once more to see whether the Intel-Compiler is now available in the GUI.
I'm surprised: there is no menu item in the Solution Explorer to convert to the Intel environment.
It looks like using the Intel-Compiler in Visual Studio Orcas is not possible,
or I forgot an important step.
----------------------------------------------------
Has anybody already used the Intel-Compiler 10.0.025 in Orcas?
regards

seti_britta ~heinz
Title: Re: optimized sources
Post by: Simon on 17 Jun 2007, 10:39:24 am
Hi Heinz,

since the new VS version is still in Beta, it's not supported by the Intel tools yet AFAIK.

HTH,
Simon.
Title: Re: optimized sources
Post by: _heinz on 17 Jun 2007, 07:50:40 pm
Thanks Simon,

Some news: if you compile the same code with VS2005 and with Orcas, Orcas shows errors where VS2005 found nothing.
uuuh....
Like this:
------ Build started: Project: libboinc, Configuration: Release32-NOGFX Win32 ------
Compiling...
stackwalker_win.cpp
..\..\..\boinc\lib\stackwalker_win.cpp(614) : error C2664: 'BOOL (HANDLE,PSYM_ENUMMODULES_CALLBACK64,PVOID)' : cannot convert parameter 2 from 'overloaded-function' to 'PSYM_ENUMMODULES_CALLBACK64'
        None of the functions with this name in scope match the target type
----------------------------------------------------------------------------------------------------------
614    if (!pSEM(g_hProcess, SymEnumerateModulesProc, NULL))
615    {
616        _ftprintf(stderr, _T("SymEnumerateModules64(): GetLastError = %lu\n"), gle );
617    }
----------------------------------------------------------------------------------------------------------------------------
overloaded functions ---> it's pure stress

Simon, is the error above known to you?
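One plausible cause for a C2664 like this, offered as an assumption and not confirmed in this thread, is that the callback typedef in the newer headers uses a const-qualified string parameter while the callback in stackwalker_win.cpp still takes a non-const one. The sketch below uses invented stand-in names (module_callback, enumerate_modules_demo, count_modules) instead of the real DbgHelp types, so that it compiles without the Windows SDK. It shows a callback whose signature matches the typedef exactly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for a callback typedef such as PSYM_ENUMMODULES_CALLBACK64.
typedef bool (*module_callback)(const char *module_name,
                                unsigned long long base_of_dll,
                                void *user_context);

// Hypothetical stand-in for the enumerating API: calls the callback once per
// module and stops early if the callback returns false.
static bool enumerate_modules_demo(module_callback callback, void *user_context)
{
    static const char *const names[] = { "first.dll", "second.dll" };
    for (std::size_t i = 0; i < 2; i++)
    {
        if (!callback(names[i], 0x10000ULL * (i + 1), user_context)) return false;
    }
    return true;
}

// Matches the typedef exactly. If module_name were declared as "char *",
// passing this function to enumerate_modules_demo() would not compile,
// because a function taking "char *" does not convert to module_callback.
static bool count_modules(const char *module_name,
                          unsigned long long base_of_dll,
                          void *user_context)
{
    (void)base_of_dll;
    if (std::strlen(module_name) > 0) ++*static_cast<int *>(user_context);
    return true;
}

// Runs the enumeration and reports how many modules the callback saw.
inline int count_demo_modules()
{
    int count = 0;
    enumerate_modules_demo(count_modules, &count);
    return count;
}
```

If that is indeed the cause, adjusting the parameter types of the callback to match the typedef in the installed headers would be a less drastic fix than removing the call.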
Title: Re: optimized sources
Post by: Devaster on 17 Jun 2007, 08:31:14 pm
When I compiled it, I commented this line out and everything worked  ;D
Title: Re: optimized sources
Post by: _heinz on 17 Jun 2007, 09:24:30 pm
VS2005
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
Copyright (C) Microsoft Corporation.  All rights reserved.
cl /Od /Ob2 /Oi /Ot /Oy /GT /I "C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX" /I "C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\jpeglib" /I "C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\image_libs" /I "C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\glut" /I "C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build" /I "C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer" /I "C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe" /I "C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client" /I "C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\db" /I "C:\I\SC\seti\boinc\client\win" /I "C:\I\SC\seti\boinc\lib" /I "C:\I\SC\seti\boinc\api" /I "C:\I\SC\seti\boinc" /I "." /I "../../../boinc/api" /I "../../../boinc/lib" /I ".." /I "glut" /D "WIN32" /D "_WIN32" /D "NDEBUG" /D "_WINDOWS" /D "NBOINC_APP_GRAPHICS" /D "CLIENT" /D "_MT" /D "USE_IPP" /D "USE_SSE2" /D "_VC80_UPGRADE=0x0600" /D "_MBCS" /GF /Gm /EHsc /MT /Zp16 /Gy /Fp".\Release/seti_boinc.pch" /Fo".\Release32-NOGFX\\" /Fd".\Release32-NOGFX\vc80.pdb" /FR".\Release32-NOGFX\\" /W3 /c /Wp64 /Zi /TP "..\..\db\xml_util.cpp"
   "..\worker.cpp"
...
seti_header.cpp
..\seti_header.cpp(93) : warning C4267: 'initializing' : conversion from 'size_t' to 'int', possible loss of data

....next
....
xml_util.cpp
Generating Code...
c:\i\sc\seti\seti_boinc_2k3_2.2b-ben-joe\db\xml_util.cpp(663) : warning C4717: 'xml_match_tag' : recursive on all control paths, function will cause runtime stack overflow
c:\i\sc\seti\seti_boinc_2k3_2.2b-ben-joe\db\xml_util.cpp(689) : warning C4717: 'xml_find_tag' : recursive on all control paths, function will cause runtime stack overflow
Linking...
Microsoft (R) Incremental Linker Version 8.00.50727.762
Creating browse information file...
Microsoft Browse Information Maintenance Utility Version 8.00.50727
Copyright (C) Microsoft Corporation. All rights reserved.
Build log was saved at "file://c:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\BuildLog.htm"
seti_boinc - 0 error(s), 3 warning(s)
----------------------------------------------------------------------------------------------------------------------------------------------------
ORCAS
------ Rebuild All started: Project: seti_boinc, Configuration: Release32-NOGFX Win32 ------
Deleting intermediate and output files for project 'seti_boinc', configuration 'Release32-NOGFX|Win32'
Compiling...
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 15.00.20404 for 80x86
Copyright (C) Microsoft Corporation.  All rights reserved.
cl /Od /Ob2 /Oi /Ot /Oy /GT /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\jpeglib" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\image_libs" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\glut" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\db" /I "C:\I\SC\vs90\boinc\client\win" /I "C:\I\SC\vs90\boinc\lib" /I "C:\I\SC\vs90\boinc\api" /I "C:\I\SC\vs90\boinc" /I "." /I "../../../boinc/api" /I "../../../boinc/lib" /I ".." /I "glut" /D "WIN32" /D "_WIN32" /D "NDEBUG" /D "_WINDOWS" /D "NBOINC_APP_GRAPHICS" /D "CLIENT" /D "_MT" /D "USE_IPP" /D "USE_SSE2" /D "_VC80_UPGRADE=0x0600" /D "_MBCS" /GF /Gm /EHsc /MT /Zp16 /Gy /Fp".\Release/seti_boinc.pch" /Fo".\Release32-NOGFX\\" /Fd".\Release32-NOGFX\vc90.pdb" /FR".\Release32-NOGFX\\" /W3 /c /Wp64 /Zi /TP "..\..\db\xml_util.cpp"
   "..\worker.cpp"
   "..\..\..\boinc\api\windows_opengl.C"
   "..\version.cpp"
   "..\..\..\boinc\lib\util.C"
   "..\timecvt.cpp"
   "..\..\image_libs\tgalib.cpp"
   "..\..\db\sqlrow.cpp"
   "..\..\db\sqlblob.cpp"
   "..\spike.cpp"
   "..\..\..\boinc\lib\shmem.C"
   "..\seti_header.cpp"
   "..\seti.cpp"
   "..\..\db\schema_master.cpp"
   "..\sah_gfx_base.cpp"
   "..\sah_gfx.cpp"
   "..\s_util.cpp"
   "..\pulsefind.cpp"
   "..\progress.cpp"
   "..\..\..\boinc\lib\parse.C"
   "..\malloc_a.cpp"
   "..\main.cpp"
   "..\lcgamm.cpp"
   "..\..\..\boinc\api\gutil.C"
   "..\..\..\boinc\api\graphics_data.C"
   "..\..\..\boinc\api\graphics_api.C"
   "..\gdata.cpp"
   "..\gaussfit.cpp"
   "..\..\..\boinc\lib\filesys.C"
   "..\fft8g.cpp"
   "..\chirpfft.cpp"
   "..\..\..\boinc\api\boinc_api.C"
   "..\..\..\boinc\lib\app_ipc.C"
   "..\analyzeReport.cpp"
   "..\analyzePoT.cpp"
analyzePoT.cpp
C:\I\SC\vs90\boinc\lib\util.h(81) : warning C4996: 'std::transform': Function call with parameters that may be unsafe - this call relies on the caller to check that the passed values are correct. To disable this warning, use -D_SCL_SECURE_NO_WARNINGS. See documentation on how to use Visual C++ 'Checked Iterators'
        c:\I\VS9\VC\include\algorithm(708) : see declaration of 'std::transform'
--- BenSpectrum ---
analyzeReport.cpp
C:\I\SC\vs90\boinc\lib\util.h(81) : warning C4996: 'std::transform': Function call with parameters that may be unsafe - this call relies on the caller to check that the passed values are correct. To disable this warning, use -D_SCL_SECURE_NO_WARNINGS. See documentation on how to use Visual C++ 'Checked Iterators'
        c:\I\VS9\VC\include\algorithm(708) : see declaration of 'std::transform'
app_ipc.C
c:\i\sc\vs90\boinc\lib\util.h(81) : warning C4996: 'std::transform': Function call with parameters that may be unsafe - this call relies on the caller to check that the passed values are correct. To disable this warning, use -D_SCL_SECURE_NO_WARNINGS. See documentation on how to use Visual C++ 'Checked Iterators'
        c:\I\VS9\VC\include\algorithm(708) : see declaration of 'std::transform'
boinc_api.C
C:\I\SC\vs90\boinc\lib\util.h(81) : warning C4996: 'std::transform': Function call with parameters that may be unsafe - this call relies on the caller to check that the passed values are correct. To disable this warning, use -D_SCL_SECURE_NO_WARNINGS. See documentation on how to use Visual C++ 'Checked Iterators'
        c:\I\VS9\VC\include\algorithm(708) : see declaration of 'std::transform'
chirpfft.cpp
C:\I\SC\vs90\boinc\lib\util.h(81) : warning C4996: 'std::transform': Function call with parameters that may be unsafe - this call relies on the caller to check that the passed values are correct. To disable this warning, use -D_SCL_SECURE_NO_WARNINGS. See documentation on how to use Visual C++ 'Checked Iterators'
        c:\I\VS9\VC\include\algorithm(708) : see declaration of 'std::transform'
fft8g.cpp
filesys.C
c:\i\sc\vs90\boinc\lib\util.h(81) : warning C4996: 'std::transform': Function call with parameters that may be unsafe - this call relies on the caller to check that the passed values are correct. To disable this warning, use -D_SCL_SECURE_NO_WARNINGS. See documentation on how to use Visual C++ 'Checked Iterators'
        c:\I\VS9\VC\include\algorithm(708) : see declaration of 'std::transform'
gaussfit.cpp
gdata.cpp
graphics_api.C
graphics_data.C
gutil.C
C:\I\SC\vs90\boinc\lib\util.h(81) : warning C4996: 'std::transform': Function call with parameters that may be unsafe - this call relies on the caller to check that the passed values are correct. To disable this warning, use -D_SCL_SECURE_NO_WARNINGS. See documentation on how to use Visual C++ 'Checked Iterators'
        c:\I\VS9\VC\include\algorithm(708) : see declaration of 'std::transform'
lcgamm.cpp
main.cpp
C:\I\SC\vs90\boinc\lib\util.h(81) : warning C4996: 'std::transform': Function call with parameters that may be unsafe - this call relies on the caller to check that the passed values are correct. To disable this warning, use -D_SCL_SECURE_NO_WARNINGS. See documentation on how to use Visual C++ 'Checked Iterators'
        c:\I\VS9\VC\include\algorithm(708) : see declaration of 'std::transform'
malloc_a.cpp
parse.C
c:\i\sc\vs90\boinc\lib\util.h(81) : warning C4996: 'std::transform': Function call with parameters that may be unsafe - this call relies on the caller to check that the passed values are correct. To disable this warning, use -D_SCL_SECURE_NO_WARNINGS. See documentation on how to use Visual C++ 'Checked Iterators'
        c:\I\VS9\VC\include\algorithm(708) : see declaration of 'std::transform'
progress.cpp
C:\I\SC\vs90\boinc\lib\util.h(81) : warning C4996: 'std::transform': Function call with parameters that may be unsafe - this call relies on the caller to check that the passed values are correct. To disable this warning, use -D_SCL_SECURE_NO_WARNINGS. See documentation on how to use Visual C++ 'Checked Iterators'
        c:\I\VS9\VC\include\algorithm(708) : see declaration of 'std::transform'
pulsefind.cpp
C:\I\SC\vs90\boinc\lib\util.h(81) : warning C4996: 'std::transform': Function call with parameters that may be unsafe - this call relies on the caller to check that the passed values are correct. To disable this warning, use -D_SCL_SECURE_NO_WARNINGS. See documentation on how to use Visual C++ 'Checked Iterators'
        c:\I\VS9\VC\include\algorithm(708) : see declaration of 'std::transform'
s_util.cpp
C:\I\SC\vs90\boinc\lib\util.h(81) : warning C4996: 'std::transform': Function call with parameters that may be unsafe - this call relies on the caller to check that the passed values are correct. To disable this warning, use -D_SCL_SECURE_NO_WARNINGS. See documentation on how to use Visual C++ 'Checked Iterators'
        c:\I\VS9\VC\include\algorithm(708) : see declaration of 'std::transform'
sah_gfx.cpp
C:\I\SC\vs90\boinc\lib\util.h(81) : warning C4996: 'std::transform': Function call with parameters that may be unsafe - this call relies on the caller to check that the passed values are correct. To disable this warning, use -D_SCL_SECURE_NO_WARNINGS. See documentation on how to use Visual C++ 'Checked Iterators'
        c:\I\VS9\VC\include\algorithm(708) : see declaration of 'std::transform'
Generating Code...
Compiling...
sah_gfx_base.cpp
schema_master.cpp
seti.cpp
seti_header.cpp
C:\I\SC\vs90\boinc\lib\util.h(81) : warning C4996: 'std::transform': Function call with parameters that may be unsafe - this call relies on the caller to check that the passed values are correct. To disable this warning, use -D_SCL_SECURE_NO_WARNINGS. See documentation on how to use Visual C++ 'Checked Iterators'
        c:\I\VS9\VC\include\algorithm(708) : see declaration of 'std::transform'
..\seti_header.cpp(93) : warning C4267: 'initializing' : conversion from 'size_t' to 'int', possible loss of data
shmem.C
spike.cpp
C:\I\SC\vs90\boinc\lib\util.h(81) : warning C4996: 'std::transform': Function call with parameters that may be unsafe - this call relies on the caller to check that the passed values are correct. To disable this warning, use -D_SCL_SECURE_NO_WARNINGS. See documentation on how to use Visual C++ 'Checked Iterators'
        c:\I\VS9\VC\include\algorithm(708) : see declaration of 'std::transform'
sqlblob.cpp
sqlrow.cpp
tgalib.cpp
timecvt.cpp
C:\I\SC\vs90\boinc\lib\util.h(81) : warning C4996: 'std::transform': Function call with parameters that may be unsafe - this call relies on the caller to check that the passed values are correct. To disable this warning, use -D_SCL_SECURE_NO_WARNINGS. See documentation on how to use Visual C++ 'Checked Iterators'
        c:\I\VS9\VC\include\algorithm(708) : see declaration of 'std::transform'
util.C
c:\i\sc\vs90\boinc\lib\util.h(81) : warning C4996: 'std::transform': Function call with parameters that may be unsafe - this call relies on the caller to check that the passed values are correct. To disable this warning, use -D_SCL_SECURE_NO_WARNINGS. See documentation on how to use Visual C++ 'Checked Iterators'
        c:\I\VS9\VC\include\algorithm(708) : see declaration of 'std::transform'
version.cpp
windows_opengl.C
C:\I\SC\vs90\boinc\lib\util.h(81) : warning C4996: 'std::transform': Function call with parameters that may be unsafe - this call relies on the caller to check that the passed values are correct. To disable this warning, use -D_SCL_SECURE_NO_WARNINGS. See documentation on how to use Visual C++ 'Checked Iterators'
        c:\I\VS9\VC\include\algorithm(708) : see declaration of 'std::transform'
worker.cpp
xml_util.cpp
C:\I\SC\vs90\boinc\lib\util.h(81) : warning C4996: 'std::transform': Function call with parameters that may be unsafe - this call relies on the caller to check that the passed values are correct. To disable this warning, use -D_SCL_SECURE_NO_WARNINGS. See documentation on how to use Visual C++ 'Checked Iterators'
        c:\I\VS9\VC\include\algorithm(708) : see declaration of 'std::transform'
Generating Code...
c:\i\sc\vs90\seti_boinc_2k3_2.2b-ben-joe\db\xml_util.cpp(663) : warning C4717: 'xml_match_tag' : recursive on all control paths, function will cause runtime stack overflow
c:\i\sc\vs90\seti_boinc_2k3_2.2b-ben-joe\db\xml_util.cpp(689) : warning C4717: 'xml_find_tag' : recursive on all control paths, function will cause runtime stack overflow
Compiling...
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 15.00.20404 for 80x86
Copyright (C) Microsoft Corporation.  All rights reserved.
cl /Od /Ob2 /Oi /Ot /Oy /GT /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\jpeglib" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\image_libs" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\glut" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\db" /I "C:\I\SC\vs90\boinc\client\win" /I "C:\I\SC\vs90\boinc\lib" /I "C:\I\SC\vs90\boinc\api" /I "C:\I\SC\vs90\boinc" /I "." /I "../../../boinc/api" /I "../../../boinc/lib" /I ".." /I "glut" /D "WIN32" /D "_WIN32" /D "NDEBUG" /D "_WINDOWS" /D "NBOINC_APP_GRAPHICS" /D "CLIENT" /D "_MT" /D "USE_IPP" /D "USE_SSE2" /D "_VC80_UPGRADE=0x0600" /D "_MBCS" /GF /Gm /EHsc /MT /Zp16 /Gy /Fp".\Release/seti_boinc.pch" /Fo".\Release32-NOGFX\\" /Fd".\Release32-NOGFX\vc90.pdb" /FR".\Release32-NOGFX\\" /W3 /c /Wp64 /Zi /TP "..\analyzeFuncs.cpp"
analyzeFuncs.cpp
C:\I\SC\vs90\boinc\lib\util.h(81) : warning C4996: 'std::transform': Function call with parameters that may be unsafe - this call relies on the caller to check that the passed values are correct. To disable this warning, use -D_SCL_SECURE_NO_WARNINGS. See documentation on how to use Visual C++ 'Checked Iterators'
        c:\I\VS9\VC\include\algorithm(708) : see declaration of 'std::transform'
-----IPP-----
-----SSE2-----
Linking...
Microsoft (R) Incremental Linker Version 9.00.20404
Copyright (C) Microsoft Corporation.  All rights reserved.
"/OUT:.\Release32-NOGFX\seti_boinc.exe" /INCREMENTAL:NO "/LIBPATH:C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX" "/LIBPATH:C:\I\INTEL\IPP\5.2_beta\ia32\lib" "/LIBPATH:C:\I\INTEL\MKL\9.0\ia32\lib" "/LIBPATH:C:\masm32\lib" "/LIBPATH:C:\I\VS8\VC\lib" "/LIBPATH:C:\I\SDK\Lib" "/LIBPATH:C:\masm32\m32lib" "/LIBPATH:C:\I\SDK\Lib\AMD64" "/LIBPATH:C:\I\SDK\Lib\IA64" /MANIFEST:NO "/MANIFESTUAC:level='asInvoker' uiAccess='false'" "/PDB:c:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\seti_boinc.pdb" /MAP /SUBSYSTEM:WINDOWS /DYNAMICBASE:NO /MACHINE:X86 glut32.lib glut.lib glu32.lib optimizer.lib image_libs.lib jpeglib.lib libboinc.lib libboincapi.lib non_ICC.lib setiboincdb.lib ippsmerged.lib ippvmmerged.lib ippchmerged.lib ippcorel.lib delayimp.lib libcpmt.lib WinMM.lib OpenGL32.lib Kernel32.Lib oldnames.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib
".\Release32-NOGFX\analyzeFuncs.obj"
".\Release32-NOGFX\analyzePoT.obj"
".\Release32-NOGFX\analyzeReport.obj"
".\Release32-NOGFX\app_ipc.obj"
".\Release32-NOGFX\boinc_api.obj"
".\Release32-NOGFX\chirpfft.obj"
".\Release32-NOGFX\fft8g.obj"
".\Release32-NOGFX\filesys.obj"
".\Release32-NOGFX\gaussfit.obj"
".\Release32-NOGFX\gdata.obj"
".\Release32-NOGFX\graphics_api.obj"
".\Release32-NOGFX\graphics_data.obj"
".\Release32-NOGFX\gutil.obj"
".\Release32-NOGFX\lcgamm.obj"
".\Release32-NOGFX\main.obj"
".\Release32-NOGFX\malloc_a.obj"
".\Release32-NOGFX\parse.obj"
".\Release32-NOGFX\progress.obj"
".\Release32-NOGFX\pulsefind.obj"
".\Release32-NOGFX\s_util.obj"
".\Release32-NOGFX\sah_gfx.obj"
".\Release32-NOGFX\sah_gfx_base.obj"
".\Release32-NOGFX\schema_master.obj"
".\Release32-NOGFX\seti.obj"
".\Release32-NOGFX\seti_header.obj"
".\Release32-NOGFX\shmem.obj"
".\Release32-NOGFX\spike.obj"
".\Release32-NOGFX\sqlblob.obj"
".\Release32-NOGFX\sqlrow.obj"
".\Release32-NOGFX\tgalib.obj"
".\Release32-NOGFX\timecvt.obj"
".\Release32-NOGFX\util.obj"
".\Release32-NOGFX\version.obj"
".\Release32-NOGFX\windows_opengl.obj"
".\Release32-NOGFX\worker.obj"
".\Release32-NOGFX\xml_util.obj"
s_util.obj : error LNK2019: unresolved external symbol "public: static void __cdecl std::_Locinfo::_Locinfo_ctor(class std::_Locinfo *,class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > const &)" (?_Locinfo_ctor@_Locinfo@std@@SAXPAV12@ABV?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@2@@Z) referenced in function "public: __thiscall std::_Locinfo::_Locinfo(class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > const &)" (??0_Locinfo@std@@QAE@ABV?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@1@@Z)
schema_master.obj : error LNK2001: unresolved external symbol "public: static void __cdecl std::_Locinfo::_Locinfo_ctor(class std::_Locinfo *,class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > const &)" (?_Locinfo_ctor@_Locinfo@std@@SAXPAV12@ABV?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@2@@Z)
seti.obj : error LNK2001: unresolved external symbol "public: static void __cdecl std::_Locinfo::_Locinfo_ctor(class std::_Locinfo *,class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > const &)" (?_Locinfo_ctor@_Locinfo@std@@SAXPAV12@ABV?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@2@@Z)
xml_util.obj : error LNK2001: unresolved external symbol "public: static void __cdecl std::_Locinfo::_Locinfo_ctor(class std::_Locinfo *,class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > const &)" (?_Locinfo_ctor@_Locinfo@std@@SAXPAV12@ABV?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@2@@Z)
.\Release32-NOGFX\seti_boinc.exe : fatal error LNK1120: 1 unresolved externals
Creating browse information file...
Microsoft Browse Information Maintenance Utility Version 9.00.20404
Copyright (C) Microsoft Corporation. All rights reserved.
Build log was saved at "file://c:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\BuildLog.htm"
seti_boinc - 5 error(s), 23 warning(s)
========== Rebuild All: 0 succeeded, 1 failed, 0 skipped ==========
------------------------------------------------------------------------------------------------------------
Any hints for the warnings or for the unresolved external symbols?
heinz
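
On the C4996 warnings: the message itself names the switch, -D_SCL_SECURE_NO_WARNINGS. A minimal sketch of the same thing done in source, assuming the define is placed before the first standard header is included. The helper downcase_in_place is hypothetical and is not the code from boinc/lib/util.h; it only uses std::transform the way the warning describes.

```cpp
// Assumption: this must come before any standard header (or be passed as
// /D_SCL_SECURE_NO_WARNINGS on the command line). Other compilers ignore it.
#define _SCL_SECURE_NO_WARNINGS

#include <algorithm>
#include <cctype>
#include <string>

// Converts one character to lower case; the cast through unsigned char keeps
// std::tolower well defined for characters outside 7-bit ASCII.
static char to_lower_char(char c) {
    return static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
}

// Hypothetical helper: lower-cases a string in place with std::transform,
// the call that C4996 complains about in the log above.
static void downcase_in_place(std::string& s) {
    std::transform(s.begin(), s.end(), s.begin(), to_lower_char);
}
```

This only silences the warning; it does not address the unresolved external symbols reported by the linker.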
Title: Re: optimized sources
Post by: Simon on 18 Jun 2007, 07:53:08 am
Well - yeah - though you won't like it. Don't use ORCAS! ;) It's Beta, unsupported, and will only lead to more trouble.

Regards,
Simon.
Title: Re: optimized sources
Post by: _heinz on 18 Jun 2007, 05:21:03 pm
I have a question.

I want to compile an FFTWF client.
Is it still necessary to use USE_FFTWF alone, without IPP? It seems so, but I'm not sure. It looks like benchmark.cpp has a problem with FFTWF: some changes were necessary to compile benchmark.cpp with FFTWF.
Are there any known changes to the 2.2B benchmark.cpp since it was published?
-----------------------------------------------------------------------------------------------------------------------------------------
Kompilieren...
Microsoft (R) 32-Bit C/C++-Optimierungscompiler Version 14.00.50727.762 für 80x86
Copyright (C) Microsoft Corporation. Alle Rechte vorbehalten.
cl /Od /Ob2 /Oi /Ot /Oy /GT /I "../../../boinc/win_build" /I ".." /I "..\.." /I "..\..\..\boinc\lib" /I "../../../boinc/api" /I "../../db" /I "C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer" /I "C:\I\INTEL\MKL\9.0\include\fftw" /I "C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build" /D "USE_FFTWF" /D "WIN32" /D "_WIN32" /D "_WINDOWS" /D "_CONSOLE" /D "NDEBUG" /D "_LIB" /D "_MT" /D "CLIENT" /D "NBOINC_APP_GRAPHICS" /D "_UNICODE" /D "UNICODE" /D "_VC80_UPGRADE=0x0710" /D "_MBCS" /GF /FD /EHsc /MT /Zp16 /Gy /Fo"Release32-NOGFX\\" /Fd"Release32-NOGFX\vc80.pdb" /W3 /c /Wp64 /Zi /Gd /TP /FI "win-config.h" ".\benchmark.cpp"
benchmark.cpp
-----FFTWF-----
-----fftwf-----
.\benchmark.cpp(627) : error C2664: 'fftwf_execute_dft' : cannot convert parameter 2 from 'float *' to 'fftwf_complex (*)'
        Types pointed to are unrelated; conversion requires reinterpret_cast, C-style cast or function-style cast
Das Buildprotokoll wurde unter "file://c:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer\Release32-NOGFX\BuildLog.htm" gespeichert.
Optimizer - 1 Fehler, 0 Warnung(en)
----------------------------------------------------------------------------------------------------------------
         case _FFT:
            #if defined( USE_IPP )
               ippsFFTInv_CToC_32fc(
                  ( Ipp32fc * ) out_buf,
                  ( Ipp32fc * ) out_buf,
                  FftSpec,
                  NULL );
            #elif defined( USE_FFTWF )
627               fftwf_execute_dft( da_fft_plan, in_buf[0],   out_buf ); <---error
            #endif
            break;
---------------------------------------------------------------------------------------------------------------------------------------------
Is anything known about this?

regards heinz
Title: Re: optimized sources
Post by: Crunch3r on 18 Jun 2007, 05:25:35 pm
Change it to ---> fftwf_execute_dft( da_fft_plan, (sah_complex *)&in_buf[0], (sah_complex *)&out_buf );

It has already been discussed here on the board, so a search would have helped in the first place.  ;)

EDIT

Simply putting a "//" in front of that line would have helped too, because you don't need that line at all, nor do you need to bench FFTW3.
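
For readers following along, a minimal sketch of what error C2664 asks for, written so it runs without FFTW. It assumes the buffers are plain float arrays holding interleaved real/imaginary values, and that the complex type is an array of two floats, which is FFTW's default layout for fftwf_complex. The names complex_f, execute_dft_stub and run_on_float_buffers are hypothetical stand-ins, not part of the 2.2B sources or of FFTW.

```cpp
#include <cstddef>

// Assumption: same layout as FFTW's default single-precision complex type,
// an array of two floats (real part, imaginary part).
typedef float complex_f[2];

// Hypothetical stand-in for fftwf_execute_dft(plan, in, out): it only copies
// the input to the output so the sketch runs without linking FFTW.
static void execute_dft_stub(std::size_t n, complex_f* in, complex_f* out) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i][0] = in[i][0];
        out[i][1] = in[i][1];
    }
}

// The buffers arrive as float*, so the call site needs an explicit cast;
// the compiler will not convert float* to complex_f* on its own.
static void run_on_float_buffers(std::size_t n, float* in_buf, float* out_buf) {
    execute_dft_stub(n,
                     reinterpret_cast<complex_f*>(in_buf),
                     reinterpret_cast<complex_f*>(out_buf));
}
```

The C-style cast in the suggested fix and reinterpret_cast do the same job here; the named cast is just easier to find again when searching the sources.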





Title: Re: optimized sources
Post by: _heinz on 18 Jun 2007, 07:02:21 pm
Thanks, Crunch3r  :)
Title: Re: optimized sources
Post by: _heinz on 18 Jun 2007, 07:22:40 pm
FFTWF compiled   ;D
Kompilieren...
Microsoft (R) 32-Bit C/C++-Optimierungscompiler Version 14.00.50727.762 für 80x86
Copyright (C) Microsoft Corporation. Alle Rechte vorbehalten.
cl /Od /Ob2 /Oi /Ot /Oy /GT /I "../../../boinc/win_build" /I ".." /I "..\.." /I "..\..\..\boinc\lib" /I "../../../boinc/api" /I "../../db" /I "C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer" /I "C:\I\INTEL\MKL\9.0\include\fftw" /I "C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build" /D "USE_FFTWF" /D "WIN32" /D "_WIN32" /D "_WINDOWS" /D "_CONSOLE" /D "NDEBUG" /D "_LIB" /D "_MT" /D "CLIENT" /D "NBOINC_APP_GRAPHICS" /D "_UNICODE" /D "UNICODE" /D "_VC80_UPGRADE=0x0710" /D "_MBCS" /GF /FD /EHsc /MT /Zp16 /Gy /Fo"Release32-NOGFX\\" /Fd"Release32-NOGFX\vc80.pdb" /W3 /c /Wp64 /Zi /Gd /TP /FI "win-config.h" ".\benchmark.cpp"
benchmark.cpp
-----FFTWF-----
-----fftwf-----
Das Buildprotokoll wurde unter "file://c:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer\Release32-NOGFX\BuildLog.htm" gespeichert.
Optimizer - 0 Fehler, 0 Warnung(en)
-------------------------------------------------------------------------------------------------
regards  ;D  ;)
Title: Re: optimized sources
Post by: _heinz on 18 Jun 2007, 09:06:17 pm
I have now successfully compiled the FFTWF Optimizer.
I also compiled the FFTWF client, but the linker still shows some unresolved external references.
--------------------------------------------------------------------------------------------------------------------------

analyzeFuncs.obj : error LNK2019: unresolved external symbol _fftwf_import_wisdom_from_string referenced in function "void __cdecl load_wisdom(void)" (?load_wisdom@@YAXXZ)
analyzeFuncs.obj : error LNK2019: unresolved external symbol _fftwf_set_timelimit referenced in function "void __cdecl load_wisdom(void)" (?load_wisdom@@YAXXZ)
analyzeFuncs.obj : error LNK2019: unresolved external symbol _fftwf_export_wisdom_to_string referenced in function "void __cdecl save_wisdom(void)" (?save_wisdom@@YAXXZ)
analyzeFuncs.obj : error LNK2019: unresolved external symbol _fftwf_plan_dft_1d referenced in function "void __cdecl do_generate_fft_coeff(void)" (?do_generate_fft_coeff@@YAXXZ)
Optimizer.lib(benchmark.obj) : error LNK2001: unresolved external symbol _fftwf_plan_dft_1d
analyzeFuncs.obj : error LNK2019: unresolved external symbol _fftwf_execute_dft referenced in function "void __cdecl do_transpose(void)" (?do_transpose@@YAXXZ)
Optimizer.lib(benchmark.obj) : error LNK2001: unresolved external symbol _fftwf_execute_dft
analyzeFuncs.obj : error LNK2019: unresolved external symbol _fftwf_destroy_plan referenced in function "void __cdecl do_return_best_of_signals(void)" (?do_return_best_of_signals@@YAXXZ)
analyzeFuncs.obj : error LNK2019: unresolved external symbol _fftwf_execute referenced in function "int __cdecl v_BaseLineSmooth(float (*)[2],int,int,int)" (?v_BaseLineSmooth@@YAHPAY01MHHH@Z)
malloc_a.obj : error LNK2019: unresolved external symbol _fftwf_malloc referenced in function "void * __cdecl malloc_a(unsigned int,unsigned int)" (?malloc_a@@YAPAXII@Z)
malloc_a.obj : error LNK2019: unresolved external symbol _fftwf_free referenced in function "void __cdecl free_a(void *)" (?free_a@@YAXPAX@Z)
.\Release32-NOGFX\seti_boinc.exe : fatal error LNK1120: 9 unresolved externals
--------------------------------------------------------------------------------------------------------------------------------------------
??
Now I'm searching.    :o
----------------------------------------------------------------------------------------
IPP SSE2 compiled and linked successfully   ;D
IPP MMX compiled and linked successfully    ;D


Title: Re: optimized sources
Post by: _heinz on 21 Jun 2007, 06:19:36 pm
Does anyone know anything about this?
In the Release32-NOGFX configuration I can compile opt_FPU.cpp from the Optimizer with 0 errors.
When I change the configuration to Debug (as Simon described), I get the following:

------ Erstellen gestartet: Projekt: Optimizer, Konfiguration: Debug Win32 ------
Kompilieren...
opt_FPU.cpp
c:\i\sc\seti\boinc\api\boinc_api.h(123) : error C2146: syntax error : missing ';' before identifier 'worker_thread_handle'
c:\i\sc\seti\boinc\api\boinc_api.h(123) : error C4430: missing type specifier - int assumed. Note: C++ does not support default-int
c:\i\sc\seti\boinc\api\boinc_api.h(123) : error C4430: missing type specifier - int assumed. Note: C++ does not support default-int
c:\i\sc\seti\seti_boinc_2k3_2.2b-ben-joe\client\sincos.h(31) : error C2006: '#include' : expected a filename, found 'identifier'
c:\i\sc\seti\seti_boinc_2k3_2.2b-ben-joe\client\sincos.h(31) : fatal error C1083: Cannot open include file: '': No such file or directory
Das Buildprotokoll wurde unter "file://c:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer\Debug\BuildLog.htm" gespeichert.
Optimizer - 5 Fehler, 0 Warnung(en)
----------------------------------------------------------------------------------------------------------------------------------------
If we go to sincos.h, we see:
29 #ifndef _SINCOS_H_
30   #define _SINCOS_H_
31    #include CMATH_LIB     <----error
    #ifdef __cplusplus
      extern "C"
      {
    #endif
---------------------------------------------------------------------------------
If we now look at where CMATH_LIB is defined, we find this in win-config.h ---->
   // Harold Naparst's suggestion
   #ifdef __INTEL_COMPILER
      /* Should work but doesn't - more work required
      #undef _INC_MATH
      #define MATH_LIB <mathimf.h>
      #define CMATH_LIB <mathimf.h>
      */
      #define MATH_LIB <math.h>
      #define CMATH_LIB <cmath>
   #else
      #define MATH_LIB <math.h>             
      #define CMATH_LIB <cmath>       <----
   #endif
----------------------------------------------------------------------------------------------------------------------------
But it looks like this definition does not get inserted into the #include statement in sincos.h.

If I write it directly into the #include in sincos.h ----->
#include <cmath>
(the necessary include path, c:\I\VS8\Include, is set correctly)

then it produces a lot of new errors.
--------------------------------------------------------------------------------------------------------------------------
I checked the possible values, but found nothing.
I don't understand why it compiles under Release32-NOGFX without problems.
All other projects compile fine under Debug.

Any hints and suggestions are welcome.
regards   :'(


Title: Re: optimized sources
Post by: _heinz on 21 Jun 2007, 06:30:26 pm
Release32-NOGFX   ----->
------ Erstellen gestartet: Projekt: Optimizer, Konfiguration: Release32-NOGFX Win32 ------
Kompilieren...
Microsoft (R) 32-Bit C/C++-Optimierungscompiler Version 14.00.50727.762 für 80x86
Copyright (C) Microsoft Corporation. Alle Rechte vorbehalten.
cl /Od /Ob2 /Oi /Ot /Oy /GT /I "../../../boinc/win_build" /I ".." /I "..\.." /I "..\..\..\boinc\lib" /I "../../../boinc/api" /I "../../db" /I "C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer" /I "C:\I\INTEL\MKL\9.0\include\fftw" /I "C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build" /D "USE_IPP" /D "USE_SSE2" /D "WIN32" /D "_WIN32" /D "_WINDOWS" /D "_CONSOLE" /D "_LIB" /D "_MT" /D "CLIENT"/D "NBOINC_APP_GRAPHICS" /D "_UNICODE" /D "UNICODE" /D "DEBUG" /D "_VC80_UPGRADE=0x0710" /D "_MBCS" /GF /FD /EHsc /MT /Zp16 /Gy /Fo"Release32-NOGFX\\" /Fd"Release32-NOGFX\vc80.pdb" /W3 /c /Wp64 /Zi /Gd /TP /FI "win-config.h" ".\opt_FPU.cpp"
opt_FPU.cpp
Das Buildprotokoll wurde unter "file://c:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer\Release32-NOGFX\BuildLog.htm" gespeichert.
Optimizer - 0 Fehler, 0 Warnung(en)
========== Erstellen: 1 erfolgreich, Fehler bei 0, 0 aktuell, 0 übersprungen ==========

Nobody knows why? I'm still searching   :'(
Title: Re: optimized sources
Post by: _heinz on 21 Jun 2007, 07:12:23 pm
Uhhh, found the problem.

In the Debug configuration we must set "Force Includes" to win-config.h.

That's it, as you can see here ------>
--------------------------------------------------------------------------------------------------------------------------
------ Erstellen gestartet: Projekt: Optimizer, Konfiguration: Debug Win32 ------
Kompilieren...
opt_FPU.cpp
Das Buildprotokoll wurde unter "file://c:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer\Debug\BuildLog.htm" gespeichert.
Optimizer - 0 Fehler, 0 Warnung(en)
========== Erstellen: 1 erfolgreich, Fehler bei 0, 0 aktuell, 0 übersprungen ==========


regards seti_britta ~heinz  ;D
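
Why the forced include matters, as a minimal sketch: sincos.h uses a computed include (#include CMATH_LIB), so the macro has to be defined before that line is reached. In the real build the definition comes from win-config.h via /FI; the #ifndef fallback and the helper hypotenuse below are assumptions added only so the sketch compiles on its own.

```cpp
// Assumption: in the real build this macro comes from win-config.h, which is
// pulled in with /FI (forced include). The fallback is only for this sketch.
#ifndef CMATH_LIB
#define CMATH_LIB <cmath>
#endif

// Computed include: the preprocessor expands CMATH_LIB to <cmath> first.
// If the macro is undefined at this point, the compiler sees a bare
// identifier instead of a file name, which is the C2006 error in the log.
#include CMATH_LIB

// Hypothetical helper that just proves <cmath> was really included.
static double hypotenuse(double a, double b) {
    return std::sqrt(a * a + b * b);
}
```

A guard like the #ifndef above would also make the header robust against a configuration that forgets the forced include, at the cost of hiding the misconfiguration.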
Title: Re: optimized sources
Post by: _heinz on 22 Jun 2007, 10:23:51 am
the Debug configuration is now ready
-----------------------------------------------------
------ Neues Erstellen gestartet: Projekt: seti_boinc, Konfiguration: Debug Win32 ------
Die Zwischen- und Ausgabedateien für das Projekt "seti_boinc" mit der Konfiguration "Debug|Win32" werden gelöscht.
Kompilieren...
Microsoft (R) 32-Bit C/C++-Optimierungscompiler Version 14.00.50727.762 für 80x86
Copyright (C) Microsoft Corporation. Alle Rechte vorbehalten.
cl /Od /I "." /I "../../../boinc/api" /I "../../../boinc/client/win" /I "../../../boinc/lib" /I ".." /I "glut" /I "../../../boinc/image_libs" /I "../../../boinc/jpeglib" /I "C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe" /I "C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\db" /I "C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\glut" /I "C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer" /I "C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\jpeglib" /D "WIN32" /D "_DEBUG" /D "_WINDOWS" /D "CLIENT" /D "NBOINC_APP_GRAPHICS" /D "USE_IPP" /D "USE_SSE2" /D "_VC80_UPGRADE=0x0600" /D "_MBCS" /Gm /EHsc /RTC1 /MTd /Fp".\Debug/seti_boinc.pch" /Fo".\Debug/" /Fd".\Debug/" /FR".\Debug\\" /W3 /c /ZI /TP "..\..\db\xml_util.cpp"
   "..\worker.cpp"
   "..\..\..\boinc\api\windows_opengl.C"
   "..\version.cpp"
   "..\..\..\boinc\lib\util.C"
   "..\timecvt.cpp"
   "..\..\image_libs\tgalib.cpp"
   "..\..\db\sqlrow.cpp"
   "..\..\db\sqlblob.cpp"
   "..\spike.cpp"
   "..\..\..\boinc\lib\shmem.C"
   "..\seti_header.cpp"
   "..\seti.cpp"
   "..\..\db\schema_master.cpp"
   "..\sah_gfx_base.cpp"
   "..\sah_gfx.cpp"
   "..\s_util.cpp"
   "..\pulsefind.cpp"
   "..\progress.cpp"
   "..\..\..\boinc\lib\parse.C"
   "..\malloc_a.cpp"
   "..\main.cpp"
   "..\lcgamm.cpp"
   "..\..\..\boinc\api\gutil.C"
   "..\..\..\boinc\api\graphics_data.C"
   "..\..\..\boinc\api\graphics_api.C"
   "..\gdata.cpp"
   "..\gaussfit.cpp"
   "..\..\..\boinc\lib\filesys.C"
   "..\fft8g.cpp"
   "..\chirpfft.cpp"
   "..\..\..\boinc\api\boinc_api.C"
   "..\..\..\boinc\lib\app_ipc.C"
   "..\analyzeReport.cpp"
   "..\analyzePoT.cpp"
   "..\analyzeFuncs.cpp"
analyzeFuncs.cpp
-----IPP-----
-----SSE2-----
analyzePoT.cpp
--- BenSpectrum ---
analyzeReport.cpp
app_ipc.C
boinc_api.C
chirpfft.cpp
fft8g.cpp
filesys.C
gaussfit.cpp
gdata.cpp
graphics_api.C
graphics_data.C
gutil.C
lcgamm.cpp
main.cpp
malloc_a.cpp
parse.C
progress.cpp
pulsefind.cpp
s_util.cpp
Code wird generiert...
Kompilieren...
sah_gfx.cpp
sah_gfx_base.cpp
schema_master.cpp
seti.cpp
seti_header.cpp
shmem.C
spike.cpp
sqlblob.cpp
sqlrow.cpp
tgalib.cpp
timecvt.cpp
util.C
version.cpp
windows_opengl.C
worker.cpp
xml_util.cpp
Code wird generiert...
c:\i\sc\seti\seti_boinc_2k3_2.2b-ben-joe\db\xml_util.cpp(663) : warning C4717: 'xml_match_tag' : recursive on all control paths, function will cause runtime stack overflow
c:\i\sc\seti\seti_boinc_2k3_2.2b-ben-joe\db\xml_util.cpp(689) : warning C4717: 'xml_find_tag' : recursive on all control paths, function will cause runtime stack overflow
Verknüpfen...
LINK : warning LNK4098: defaultlib 'LIBCMT' conflicts with use of other libs; use /NODEFAULTLIB:library
LIBCMTD.lib(a_env.obj) : error LNK2019: unresolved external symbol __imp__GetEnvironmentStrings@0 referenced in function ___crtGetEnvironmentStringsA
Debug/setiathome_2.18_windows_intelx86.exe : fatal error LNK1120: 1 unresolved externals
Browseinformationsdatei wird erstellt...
Microsoft Browse Information Maintenance-Programm Version 8.00.50727
Copyright (C) Microsoft Corporation. All rights reserved.
Das Buildprotokoll wurde unter "file://c:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Debug\BuildLog.htm" gespeichert.
seti_boinc - 2 Fehler, 3 Warnung(en)
========== Alles neu erstellen: 0 erfolgreich, Fehler bei 1, 0 übersprungen ==========
------------------------------------------------------------------------------------------------------------------------------------------
Remember that I had a problem with GetEnvironmentStrings in the Optimizer's opt_os_interface.cpp?
Here is the solution: --->

// HKEY_LOCAL_MACHINE\\SYSTEM\\CurrentControlSet\\Control\\Session Manager\\Memory Management\\SecondLevelDataCache

#define CPU_REG_KEY      HKEY_LOCAL_MACHINE
#define CPU_REG_SUBKEY   "SYSTEM\\CurrentControlSet\\Control\\Session Manager\\Memory Management\\"

#define L2_DATA_CACHE      "SecondLevelDataCache"


//
// os_L2_cache_size - Get value from registry: Might be the off-cpu cache size
//
int os_L2_cache_size( void )
   {
   HKEY hKey;
/*   DWORD dwType; */
   DWORD dwSize;
   DWORD L2_size = 0;
//seti_britta:
/*   LONG status = RegOpenKeyEx(CPU_REG_KEY, CPU_REG_SUBKEY, 0, KEY_QUERY_VALUE, &hKey); */
   LONG status = RegOpenKeyEx(CPU_REG_KEY, TEXT("SYSTEM\\CurrentControlSet\\Control\\Session Manager\\Memory Management\\"), 0, KEY_QUERY_VALUE, &hKey);
   if(status != ERROR_SUCCESS)   return 0;

   dwSize = sizeof( L2_size );
//seti_britta:
/*   status = (RegQueryValueEx(hKey, L2_DATA_CACHE, NULL, NULL, (LPBYTE)&L2_size, &dwSize)); */
   status = (RegQueryValueEx(hKey, TEXT("SecondLevelDataCache"), NULL, NULL, (LPBYTE)&L2_size, &dwSize));
   RegCloseKey(hKey);

   return L2_size;
   }
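
A side note on why the TEXT() wrapper matters: the Release32-NOGFX Optimizer command lines quoted earlier in this thread define UNICODE and _UNICODE, in which case RegOpenKeyEx expands to the wide-character RegOpenKeyExW and a plain narrow string literal no longer fits. A minimal sketch of an alternative, assuming it is acceptable to call the explicit ANSI entry points RegOpenKeyExA and RegQueryValueExA. The function name l2_cache_size_sketch is hypothetical, and on non-Windows platforms the sketch simply returns 0.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Returns the SecondLevelDataCache registry value, or 0 if it cannot be read.
// On non-Windows platforms this sketch always returns 0.
static unsigned long l2_cache_size_sketch() {
#ifdef _WIN32
    HKEY key;
    // The explicit ...A functions take narrow strings whether or not
    // UNICODE is defined, so no TEXT() wrapper is needed.
    LONG status = RegOpenKeyExA(
        HKEY_LOCAL_MACHINE,
        "SYSTEM\\CurrentControlSet\\Control\\Session Manager\\Memory Management",
        0, KEY_QUERY_VALUE, &key);
    if (status != ERROR_SUCCESS) return 0;

    DWORD type = 0;
    DWORD value = 0;
    DWORD size = sizeof(value);
    status = RegQueryValueExA(key, "SecondLevelDataCache", NULL, &type,
                              reinterpret_cast<LPBYTE>(&value), &size);
    RegCloseKey(key);

    // Only trust the result if the query succeeded and the value is a DWORD.
    if (status != ERROR_SUCCESS || type != REG_DWORD) return 0;
    return value;
#else
    return 0;
#endif
}
```

Unlike the version in the post, this one also checks the status and type returned by the query before using the value.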

-----------------------------------------------------------------------------------------------------------------------------------------------------------
It compiles fine, as you can see: ---->
------ Erstellen gestartet: Projekt: Optimizer, Konfiguration: Debug Win32 ------
Kompilieren...
Microsoft (R) 32-Bit C/C++-Optimierungscompiler Version 14.00.50727.762 für 80x86
Copyright (C) Microsoft Corporation. Alle Rechte vorbehalten.
cl /Od /Oy /I "../../../boinc/win_build" /I ".." /I "..\.." /I "..\..\..\boinc\lib" /I "../../../boinc/api" /I "../../db" /I "C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer" /I "C:\I\INTEL\MKL\9.0\include\fftw" /I "C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build" /D "WIN32" /D "_WIN32" /D "_WINDOWS" /D "_CONSOLE" /D "_DEBUG" /D "_LIB" /D "_MT" /D "CLIENT" /D "USE_IPP" /D "USE_SSE2" /D "_VC80_UPGRADE=0x0710" /D "_MBCS" /Gm /EHsc /RTC1 /MT /Fo"Debug\\" /Fd"Debug\vc80.pdb" /W3 /c /Wp64 /ZI /TP /FI "win-config.h" ".\opt_os_interface.cpp"
opt_os_interface.cpp
c:\i\sc\seti\seti_boinc_2k3_2.2b-ben-joe\client\optimizer\opt_os_interface.cpp(92) : warning C4552: '<<' : operator has no effect; expected operator with side-effect
Das Buildprotokoll wurde unter "file://c:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer\Debug\BuildLog.htm" gespeichert.
Optimizer - 0 Fehler, 1 Warnung(en)
========== Erstellen: 1 erfolgreich, Fehler bei 0, 0 aktuell, 0 übersprungen ==========
-------------------------------------------------------------------------------------------------------------------------------------

What about the conflict between "LIBCMT" and LIBCMTD.lib, with 1 unresolved external reference?
I must keep searching  ::)
Any suggestions are welcome.
regards seti_britta ~heinz
Title: Re: optimized sources
Post by: _heinz on 22 Jun 2007, 04:22:21 pm
Hi Simon,
the error with LIBCMTD is probably related to the debug runtime library. "Multi-threaded Debug (/MTd)" is specified there as the option.
----------------------------------------------------
There is no error when compiling without Debug.

Could you please check what is set on your side?

Thanks, heinz
Title: Re: optimized sources
Post by: Simon on 23 Jun 2007, 07:21:44 pm
...it has been quite a while since I made a debug build. No idea, really; my compile machine currently runs 64-bit Linux, and the Windows partition is asleep.

In the release configurations I set everything to single-threaded, because the apps are meant to run on only one core (BOINC takes care of the rest, although multithreaded science apps are being discussed right now. That is still a long way off.).

Have you tried IDB yet? It comes with ICC and is a usable debugger.

Regards,
Simon.
Title: Re: optimized sources
Post by: _heinz on 23 Jun 2007, 08:43:30 pm
Hi Simon,
I can now compile with debug without errors. There was some strange character in the settings under Linker ---> General ---> Additional Library Directories.
I can now also start the debugger and debug.

Thanks, heinz
Title: Re: optimized sources
Post by: _heinz on 23 Jun 2007, 09:31:31 pm
Can't set up shared mem: -1


Unhandled Exception Detected...
---------------------------------------------------------------
That is all. It looks very familiar; I have already seen it in the forum.

Can anyone help?
Title: Re: optimized sources
Post by: Simon on 23 Jun 2007, 09:33:41 pm
No fftw DLL in the same directory? If you are not working with the IPP FFT, it is needed.

Regards,
Simon.
Title: Re: optimized sources
Post by: _heinz on 23 Jun 2007, 10:44:04 pm
Thanks, I have put it in...

With the debugger I get as far as the point where it wants to read the test file (the WU "workunit.sah").

here
    // Open the file and load the first line
    FILE *fp = fopen(virtual_name, "r");      <----- it happens here
    if (!fp) return ERR_FOPEN;

it then branches into iosfwd
439    static size_t __CLRCALL_OR_CDECL length(const _Elem *_First)
      {   // find length of null-terminated string
//      _DEBUG_POINTER(_First);
      return (::strlen(_First));          <------
      }

First-chance exception at 0x7c812a7b in setiathome_2.3S5B_windows_intelx86.exe: Microsoft C++ exception: seti_error at memory location 0x0012f158..
The thread 'Debug Exception Monitor' (0xd9c) has exited with code -5 (0xfffffffb).
The thread 'Timer' (0x7a0) has exited with code -5 (0xfffffffb).
The program '[2928] setiathome_2.3S5B_windows_intelx86.exe: Native' has exited with code -5 (0xfffffffb).
-----------------------------------------------------------------------------------

I checked: workunit.sah only has the archive bit set.
???? hmmm...
Is something still missing?
heinz

Title: Re: optimized sources
Post by: _heinz on 24 Jun 2007, 03:08:08 am
It looks as if I have a problem with MKL.... I have to check  ::)
Title: Re: optimized sources
Post by: Simon on 24 Jun 2007, 10:45:17 am
Er,

MKL is actually not used in our current sources (we use IPP's FFT implementation for the optimized apps; Berkeley uses FFTW). I would suggest taking out the MKL includes/libraries if you have them in there.

Regards,
Simon.
Title: Re: optimized sources
Post by: _heinz on 24 Jun 2007, 03:39:15 pm
Hi all,

To debug the client, I copied the 3 files (init_data.xml, stderr.txt, work_unit.sah) into the Debug directory where the client is compiled and linked. Then, in seti_boinc ---> Configuration Properties ---> Debugging, I set
----> Command: C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Debug\setiathome_2.3S5B_windows_intelx86.exe
----> Command Arguments: -bench -show_benchmark

Now set a breakpoint at line 240 of worker.cpp and start debugging:
239        retval = read_wu_state();
240       if ( retval ) SETIERROR( retval, "from read_wu_state() in worker()" );  <----breakpoint

When we reach it, we can see that read_wu_state() reports errno=2.
Normally that means: 2 ENOENT No such file or directory. A component of a specified pathname did not exist, or the pathname was an empty string.
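
A minimal sketch of how errno=2 shows up from a failed fopen(), assuming a platform where fopen() sets errno (POSIX systems and the Microsoft CRT both do). The helper try_open is hypothetical and is not part of the SETI@home sources; it only mirrors the message format used in read_wu_state().

```cpp
#include <cerrno>
#include <cstdio>
#include <cstring>

// Hypothetical helper: tries to open a file for reading and returns 0 on
// success, or the errno value captured right after the failed fopen().
static int try_open(const char* path) {
    errno = 0;
    std::FILE* f = std::fopen(path, "rb");
    if (f) {
        std::fclose(f);
        return 0;
    }
    const int saved = errno;  // save it before another call can overwrite it
    std::fprintf(stderr, "(%s) fopen failed, errno=%d: %s\n",
                 path, saved, std::strerror(saved));
    return saved;
}
```

Printing the resolved path next to the errno value, as read_wu_state() does, is usually the quickest way to see whether the file name or the working directory is the part that does not exist.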

But now the error must be interpreted by the program.
It goes to line 286 of worker.cpp:
286     catch( seti_error e )
        {
        if ( e == RESULT_OVERFLOW )
            {
            fprintf( stderr, "SETI@Home Informational message -9 result_overflow\n" );
            fprintf(
                stderr,
                "NOTE: The number of results detected exceeds the storage space allocated.\n" );
            final_report();  // add signal and flop counts to stderr.txt
            progress = 1;
            remaining = 0;
            boinc_fraction_done( progress );
            checkpoint( true ); // force a checkpoint
            boinc_fpops_cumulative( analysis_state.FLOP_counter * LOAD_STORE_ADJUSTMENT );
            boinc_finish( 0 );
            exit( 0 );          // an overflow is not an app error
            }
        else
            {
            e.print();      <---------- to here  
            exit ( static_cast< int >( e ) );
            }
        }
    } // worker()
Because it's not a result overflow, it goes to e.print()   ----> in s_util.cpp
void seti_error::print( void ) const
    {
    std::cerr << "SETI@home error " << -value << " ";
    if ( (value <= atexit_failure) && (value >= 0) )
        {
        std::cerr << message[value];
        }
    else
        {
        std::cerr << "Unknown error";
        }

    std::cerr << std::endl << data << std::endl;
    std::cerr << "File: " << file << std::endl;
    std::cerr << "Line: " << line << std::endl;
    std::cerr << std::endl;
    }
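
The control flow just described (overflow is treated as success, anything else becomes the exit code) can be sketched in isolation. Everything here is an assumption made for illustration: sketch_error is a stand-in for seti_error, the two constants are made-up values, and the handler catches by const reference where the original catches by value.

```cpp
// Hypothetical stand-in for seti_error: wraps an int and converts back to it.
struct sketch_error {
    int value;
    explicit sketch_error(int v) : value(v) {}
    operator int() const { return value; }
};

// Made-up codes for the sketch; the real values live in the SETI@home sources.
const int SKETCH_RESULT_OVERFLOW = 9;
const int SKETCH_FOPEN_FAILED = 5;

// Throws the given code and returns what the process would exit with:
// 0 for a result overflow (not an app error), the error code otherwise.
static int exit_code_for(int code) {
    try {
        throw sketch_error(code);
    } catch (const sketch_error& e) {
        if (e == SKETCH_RESULT_OVERFLOW) return 0;
        return static_cast<int>(e);
    }
}
```

With these made-up codes, exit_code_for(SKETCH_FOPEN_FAILED) takes the else branch, which corresponds to the e.print() path followed in the debugger.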
--------------------------- now to iosfwd ---->

   static size_t __CLRCALL_OR_CDECL length(const _Elem *_First)
      {   // find length of null-terminated string
//      _DEBUG_POINTER(_First);
      return (::strlen(_First));    <--------
      }
------------------------------------------
-      _First   0x00714558 "C"   const char *
and
+      _First   0x006cb5a8 ""   const char *     <---second loop
--------------------------------------
       std::basic_string<char,std::char_traits<char>,std::allocator<char> >::assign returned   "work_unit.sah"   std::basic_string<char,std::char_traits<char>,std::allocator<char> > &
-----------------------------------------------
Now we come to line 318 of app_ipc:
318    FILE *fp = fopen(virtual_name, "r");
    if (!fp) return ERR_FOPEN;
--------------------------------------------------------
then to line 129 of app_ipc:
129     FILE *fp = fopen(virtual_name, "r");
    if (!fp) return ERR_FOPEN;
-----------------------------------------------------------------
then to
static int read_wu_state( void )
    {
    FILE    *f;
    int     retval = 0;
    string  path;
    FORCE_FRAME_POINTER;

    boinc_resolve_filename_s( WU_FILENAME, path );
    f = boinc_fopen( path.c_str(), "rb" );
    if ( f )
        {
        #ifdef BOINC_APP_GRAPHICS
        if ( !nographics() ) sprintf( sah_graphics->status, "Scanning data file\n" );
        #endif
        retval = seti_parse_wu( f, analysis_state );
        fclose( f );
        if ( retval ) SETIERROR( retval, "from seti_parse_wu() in read_wu_state()" );
        }
    else
        {
        char    msg[1024];
        sprintf( msg, "(%s) in read_wu_state() errno=%d\n", path.c_str(), errno );
        SETIERROR( FOPEN_FAILED, msg );
        }

    retval = seti_init_state();
    if ( retval ) SETIERROR( retval, "from seti_init_state() in read_wu_state()" );

    #ifdef BOINC_APP_GRAPHICS
    if ( !nographics() ) sprintf( sah_graphics->status, "Scanning state file.\n" );
    #endif
    try
        {
        retval = parse_state_file( analysis_state );
        }

    catch( seti_error e )
        {

        // Failure to open the state file means that are starting a new WU.
        if ( static_cast< int >( e ) == FOPEN_FAILED )
            {
            retval = initialize_for_wu();
            if ( retval )
                SETIERROR( retval, "from initialize_for_wu() in read_wu_state()" );
            }
        else
            throw e;
        }

    boinc_fraction_done( progress * remaining + (1.0 - remaining) * (1.0 - remaining) );
    return 0;
    }
------------------------------------------------------ then to
   static size_t __CLRCALL_OR_CDECL length(const _Elem *_First)
      {   // find length of null-terminated string
//      _DEBUG_POINTER(_First);
      return (::strlen(_First));
      }
----------------------------------------------------
+      _First   0x0012f3b4 "(work_unit.sah) in read_wu_state() errno=2
"   const char *

+      this   "(work_unit.sah) in read_wu_state() errno=2
"   std::basic_string<char,std::char_traits<char>,std::allocator<char> > * const
------------------------------------------------------
+      this   0x0012f158 {value=5 file="c:\i\sc\seti\seti_boinc_2k3_2.2b-ben-joe\client\worker.cpp" line=144 ...}   seti_error * const
-------------------------------------------
then to 143 worker.cpp
143         sprintf( msg, "(%s) in read_wu_state() errno=%d\n", path.c_str(), errno );
        SETIERROR( FOPEN_FAILED, msg );
-------------------------------------------------------------
then to catch_error 286
 286   catch( seti_error e )
        {
        if ( e == RESULT_OVERFLOW )
            {
            fprintf( stderr, "SETI@Home Informational message -9 result_overflow\n" );
            fprintf(
                stderr,
                "NOTE: The number of results detected exceeds the storage space allocated.\n" );
            final_report();  // add signal and flop counts to stderr.txt
            progress = 1;
            remaining = 0;
            boinc_fraction_done( progress );
            checkpoint( true ); // force a checkpoint
            boinc_fpops_cumulative( analysis_state.FLOP_counter * LOAD_STORE_ADJUSTMENT );
            boinc_finish( 0 );
            exit( 0 );          // an overflow is not an app error
            }
        else
            {
            e.print();
            exit ( static_cast< int >( e ) );  <--------------- here the app crash
            }
        }
    } // worker()
-----------------------------------------------
+      e   {value=5 file="c:\i\sc\seti\seti_boinc_2k3_2.2b-ben-joe\client\worker.cpp" line=144 ...}   seti_error
-------------------
As you can see, e = 5
+      e   {value=5 file="c:\i\sc\seti\seti_boinc_2k3_2.2b-ben-joe\client\worker.cpp" line=144 ...}   seti_error
    if ( (value <= atexit_failure) && (value >= 0) )
      value   5   int
+      _First   0x006cf4b8 "Can't open file"   const char *
           exit ( static_cast< int >( e ) );
+      e   {value=5 file="c:\i\sc\seti\seti_boinc_2k3_2.2b-ben-joe\client\worker.cpp" line=144 ...}   seti_error
------------------------------------------------------------------------------------------------------
and now ------> the application crashes. That is not really a good way to handle this problem.

First-chance exception at 0x7c812a7b in setiathome_2.3S5B_windows_intelx86.exe: Microsoft C++ exception: seti_error at memory location 0x0012f158..
The thread 'Debug Exception Monitor' (0xd50) has exited with code -5 (0xfffffffb).
The thread 'Timer' (0x278) has exited with code -5 (0xfffffffb).
The program '[3288] setiathome_2.3S5B_windows_intelx86.exe: Native' has exited with code -5 (0xfffffffb).
-----------------------------------------------------------------------------------------------------------------

questions:
1. Why did the program not open the file? (It was the first of the 7 test WUs.) The archive bit is still set.
     Did I do something wrong with the file in the test?

2. The error is not reported well; the program should not crash here.

3. stderr.txt has been cleared and is empty.
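
Regarding question 2, a small sketch of one way to report such an error and turn it into an exit status. Note that the "first-chance exception" line in the debugger output only says that an exception was thrown; the catch block still runs afterwards. The type demo_error and the helper run_guarded are hypothetical stand-ins, loosely modelled on the value/file/line fields visible in the debugger, and are not the real seti_error class.

```cpp
#include <cstdio>
#include <string>

// Hypothetical error type with the fields seen in the debugger output.
struct demo_error {
    int value;
    std::string data;
    const char* file;
    int line;
};

// Runs a task, reports any demo_error on stderr and returns an exit status
// instead of letting the exception escape.
template <typename Task>
int run_guarded(Task task) {
    try {
        task();
        return 0;
    } catch (const demo_error& e) {   // catch by const reference: no extra copy
        std::fprintf(stderr, "error %d: %s (%s:%d)\n",
                     e.value, e.data.c_str(), e.file, e.line);
        std::fflush(stderr);          // make sure the message is written out
        return e.value;
    }
}
```

The caller can then decide in one place whether a given status should end the process with 0 or with an error code.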
-----------------------------------------------------------------------------------------------
Merci for your attention, your suggestions are welcome
regards heinz
Title: Re: optimized sources
Post by: _heinz on 24 Jun 2007, 07:52:13 pm
I have inserted some new test lines to handle the error.

        else
            {
            // seti_britta: always exit with 0
            e.print();
            if ( e == FOPEN_FAILED )
                {
                fprintf( stderr, "Can't open file" );
                exit( 0 );     <-------- here it comes back with 0 now
                }
            if ( e == READ_FAILED )
                {
                fprintf( stderr, "Can't read file" );
                exit( 0 );
                }
            exit ( static_cast< int >( e ) );
            }
Really interesting: FOPEN_FAILED is true, so it goes to the fprintf statement, runs it, then goes to exit( 0 ); in the assembler code it now comes back with 0.
I believe it is a problem with the CRT.
Here is the assembly:
--- f:\sp\vctools\crt_bld\self_x86\crt\src\crt0dat.c ---------------------------
0063C2B0  push        ebp 
0063C2B1  mov         ebp,esp
0063C2B3  push        0   
0063C2B5  push        0   
0063C2B7  mov         eax,dword ptr [code]
0063C2BA  push        eax 
0063C2BB  call        doexit (63C520h)
0063C2C0  add         esp,0Ch
0063C2C3  pop         ebp 
0063C2C4  ret   

-------------------------------------------------------------------
First-chance exception at 0x7c812a7b in setiathome_2.3S5B_windows_intelx86.exe: Microsoft C++ exception: seti_error at memory location 0x0012f158..
The thread 'Debug Exception Monitor' (0x764) has exited with code 0 (0x0).
The thread 'Timer' (0x804) has exited with code 0 (0x0).
The program '[2336] setiathome_2.3S5B_windows_intelx86.exe: Native' has exited with code 0 (0x0).
----------------------------------------------------------------------------------------------------------------------------------------------------------------     
Huh .... I took the file boinc_lockfile from the test package into the Debug directory ... is that right?

I thought that the output goes into stderr.txt, but it is always empty???

But I believe that the program being unable to read the file is a different problem...
regards heinz
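
A small sketch for checking the stderr question in isolation. It assumes, without this being shown in the thread, that the output is routed into a file with freopen(); the file name stderr_demo.txt and both helper names are invented for the example. If the file is opened by a relative name, it is created in the current working directory, which may not be the folder one is looking at.

```cpp
#include <cstdio>
#include <fstream>
#include <string>

// Redirect stderr into the given file, write one message and flush it.
// Returns false if the redirection itself failed.
bool log_to_file(const char* path, const char* message) {
    if (!std::freopen(path, "a", stderr)) return false;
    std::fprintf(stderr, "%s\n", message);
    std::fflush(stderr);
    return true;
}

// Read back the first line of a text file (empty string if it cannot be opened).
std::string first_line(const char* path) {
    std::ifstream in(path);
    std::string line;
    std::getline(in, line);
    return line;
}
```

After log_to_file( "stderr_demo.txt", "Can't open file" ), first_line( "stderr_demo.txt" ) should return the message, provided the file did not already have content.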
Title: Re: optimized sources
Post by: Josef W. Segur on 24 Jun 2007, 09:04:05 pm
Heinz,

When I used MS VC++ 4.0 heavily a few years ago, running a debug version of a program from the IDE had similar problems finding files. It turned out to be that the working directory was not where the program was (in Debug), but the directory above that where the source code was. I never figured out why MS had done it that way, just learned to put the files I wanted the program to access up one level. If they are still doing that it may account for the troubles you are having.
                                                                                    Joe
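
A quick way to check where relative file names end up is to print the working directory from inside the program. The sketch below uses C++17 std::filesystem, which is much newer than the compilers discussed here, so treat it as an illustration only; the helper names are invented.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Returns the absolute path a relative file name resolves to, i.e. where
// fopen( "work_unit.sah", ... ) would actually look.
std::string resolved_path(const std::string& relative_name) {
    return (fs::current_path() / relative_name).string();
}

// True if the file exists relative to the current working directory.
bool visible_from_cwd(const std::string& relative_name) {
    return fs::exists(fs::current_path() / relative_name);
}
```

Printing resolved_path( "work_unit.sah" ) once at startup shows immediately whether the debugger started the program in the Debug folder or one level above it.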
Title: Re: optimized sources
Post by: _heinz on 25 Jun 2007, 08:27:34 am
Thank you Joe for this important hint, will try it
Merci Heinz
Title: Re: optimized sources
Post by: _heinz on 26 Jun 2007, 06:32:16 pm
Hi Crunchr, please tell me exactly where in the IDE I must put the command line and its parameters.
set_boinc.exe -bench -show_benchmark

----------------------------------------------------------------
I did it in Solution Explorer ---> seti_boinc ---> Properties ---> Configuration Properties ---> Debugging ---> Command --->
C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Debug\setiathome_2.3S5B_windows_intelx86.exe -bench -show_benchmark

Path + exe + parameters, all on one line

The program is started but no parameters are recognized; then work_unit.sah is never found and I never get a file pointer back ---> undefined program exit

I am probably doing something fundamentally wrong  :'(
Title: Re: optimized sources
Post by: Simon on 26 Jun 2007, 06:52:34 pm
Joe's hint about moving it to the directory above did not work?

Did you try moving all necessary files to a different directory altogether, or will debugging from inside VS not work otherwise? (probably not)

Regards,
Simon.
Title: Re: optimized sources
Post by: _heinz on 26 Jun 2007, 08:02:33 pm
I have the files in all of Debug, Release32-NOGFX, and win_build.
Debugging works... but no parameters are recognized.


Title: Re: optimized sources
Post by: Simon on 27 Jun 2007, 02:11:24 pm
Heinz, sounds like you need some quotes for the debug command.

Quote
I did it in Solution Explorer ---> seti_boinc ---> Properties ---> Configuration Properties ---> Debugging ---> Command --->
C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Debug\setiathome_2.3S5B_windows_intelx86.exe -bench -show_benchmark

should be:

"C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Debug\setiathome_2.3S5B_windows_intelx86.exe -bench -show_benchmark"
instead.

Since it does recognize the command and seems to be able to find the files, this is the most logical explanation.

If the above does not work, I recommend something even simpler: write a batch file with the commandline options, and put that as debug command.

HTH,
Simon.
Title: Re: optimized sources
Post by: _heinz on 27 Jun 2007, 04:48:51 pm
Hi Simon,
thanks for your proposals
-----------------------------------------
Let's document what happens:
variant_1 Command: "C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Debug\setiathome_2.3S5B_windows_intelx86.exe -bench -show_benchmark"

The code:
int WINAPI WinMain( HINSTANCE hInst, HINSTANCE hPrevInst, LPSTR Args, int WinMode )
    {
    LPSTR   command_line;
    char    *argv[100];
    int     argc, retval;     <---- both are uninitialized at this point (undefined values)

    command_line = GetCommandLine();    <---here
    argc = parse_command_line( command_line, argv );
    retval = main( argc, argv );

    return retval;
-------------------------------------------------------------------------
from debug:
-      command_line   0x00151ee0 ""C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Debug\setiathome_2.3S5B_windows_intelx86.exe" "   char *
As we can see, no parameters are passed.

 ---------------------------------------
variant_2: without quotes
Command: C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Debug\setiathome_2.3S5B_windows_intelx86.exe -bench -show_benchmark

 +      command_line   0x00151ee0 ""C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Debug\setiathome_2.3S5B_windows_intelx86.exe" "   char *
----------------------------------------------------------------------------------------------------------------------------------------
No parameters are passed.
The quotes are interesting; I marked them red.
----------------------------------------------------------------------------------------------
variant_3
Command: C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Debug\setiathome_2.3S5B_windows_intelx86.exe
Command Arguments: "-bench" "-show_benchmark"
+      command_line   0x00151ee0 ""C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Debug\setiathome_2.3S5B_windows_intelx86.exe" "   char *
-------------------------------------------------------
No argument is passed,
and in all variants argc and retval have undefined values.
------------------------------------variant_4
code:
int WINAPI WinMain( HINSTANCE hInst, HINSTANCE hPrevInst, LPSTR Args, int WinMode )
    {
    LPSTR   command_line;
    char    *argv[100];
    int     argc, retval;
    argc=retval = 0;   <---new
    command_line = GetCommandLine();       <----------------
    argc = parse_command_line( command_line, argv );
    retval = main( argc, argv );

    return retval;
    }
-------------------------------------------------
-      command_line   0x00151ee0 ""C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Debug\setiathome_2.3S5B_windows_intelx86.exe" "   char *
-----------------------------------------------------
I think GetCommandLine() did not work right.........
I will search for the reason.

heinz
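
To see what a splitter makes of the strings above, here is a minimal sketch. It is not the application's parse_command_line(); split_command_line is an invented helper that only knows whitespace and double quotes. It shows two things: the string the debugger displays (quoted exe path followed by a space) can only ever yield one token, and wrapping the whole command including its options in one pair of quotes also makes it a single token.

```cpp
#include <string>
#include <vector>

// Minimal splitter: whitespace separates tokens, double quotes group
// characters (including spaces) into one token. Escapes are not handled.
std::vector<std::string> split_command_line(const std::string& line) {
    std::vector<std::string> args;
    std::string current;
    bool in_quotes = false;
    bool have_token = false;
    for (char c : line) {
        if (c == '"') {
            in_quotes = !in_quotes;
            have_token = true;          // "" counts as an (empty) token
        } else if ((c == ' ' || c == '\t') && !in_quotes) {
            if (have_token) {
                args.push_back(current);
                current.clear();
                have_token = false;
            }
        } else {
            current += c;
            have_token = true;
        }
    }
    if (have_token) args.push_back(current);
    return args;
}
```

So if command_line really contains nothing after the closing quote, the arguments were already lost before the program started, and the place to look is the debugger settings rather than the parsing code.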
Title: Re: optimized sources
Post by: _heinz on 27 Jun 2007, 05:25:25 pm
I should read documentation  :)
Title: Re: optimized sources
Post by: _heinz on 28 Jun 2007, 06:03:55 pm
I found the following description   ---->

Command (Local Windows Debugger)
Specifies the command that starts the program to be debugged on the local computer.

Command Arguments (Local Windows Debugger and Remote Windows Debugger)
Specifies arguments for the command listed above.

The following redirection operators can be used in this field:

< file
Reads stdin from file.

> file
Writes stdout to file.

>> file
Appends stdout to file.

2> file
Writes stderr to file.

2>> file
Appends stderr to file.

2> &1
Sends stderr (2) output to the same location as stdout (1).

1> &2
Sends stdout (1) output to the same location as stderr (2).

In most cases, these operators apply only to console applications.
------------------------
Working Directory
Specifies the working directory of the program being debugged, relative to the project directory that contains the EXE file. If you do not set a working directory, the project directory is used. For remote debugging, the project directory is on the remote server.
----------------------------
The project directory is the directory that contains the .sln file. In our case that is seti_boinc.sln in the win_build directory
C:\I\SC\seti\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build
This means the files boinc_lockfile, stderr.txt and work_unit.sah must be copied into this directory.
That is already the case on my machine.
-------------------------------
So that is cleared up as well.

heinz

Title: Re: optimized sources
Post by: _heinz on 30 Jun 2007, 09:22:54 am
reading documentation
1. int WINAPI WinMain( HINSTANCE hInst, HINSTANCE hPrevInst, LPSTR Args, int WinMode )   :o
2. command_line = GetCommandLine();  ::)

Re 1: found this --->http://msdn.microsoft.com/library/deu/default.asp?url=/library/DEU/vccore/html/_core_revising_the_winmain_function.asp
Re 2: found this --->http://msdn2.microsoft.com/en-us/library/ms683156.aspx

It looks like there is still something to do.
Be patient... work is going on.
heinz
Title: Re: optimized sources
Post by: _heinz on 03 Jul 2007, 05:57:27 pm
I made some short sample programs.... now I'm testing and debugging the application....
heinz
Title: Re: optimized sources
Post by: _heinz on 04 Jul 2007, 10:03:44 pm
The application now reads the files init_data.xml and work_unit.sah.
From init_data.xml the debugger shows ----> in host_info
      m_nbytes   134217728.00000000   double

and here in app_ipc.c
void APP_INIT_DATA::copy(const APP_INIT_DATA& a) {
    memcpy(this, &a, sizeof(APP_INIT_DATA));
    if (a.project_preferences) {
        project_preferences = strdup(a.project_preferences);  <----here
    }
}
project_preferences is not filled.... hmm?
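
One possible explanation, sketched with an invented struct: the copy first duplicates all fields bitwise and then replaces the string pointer with a private copy, but only when the source actually has a string. If the source pointer is null (for example because no preferences were supplied), the destination pointer stays null as well. InitData, dup_cstr and copy_init_data are hypothetical names, not the BOINC definitions.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for a struct that mixes plain fields with an owned C string.
struct InitData {
    int   major_version;
    char* project_preferences;   // may be null when no preferences were supplied
};

// Duplicate a C string with malloc (portable replacement for strdup); null stays null.
char* dup_cstr(const char* s) {
    if (!s) return nullptr;
    char* copy = static_cast<char*>(std::malloc(std::strlen(s) + 1));
    if (copy) std::strcpy(copy, s);
    return copy;
}

// Copy the plain fields bitwise, then give the destination its own string.
void copy_init_data(InitData& dst, const InitData& src) {
    std::memcpy(&dst, &src, sizeof(InitData));
    dst.project_preferences = dup_cstr(src.project_preferences);
}
```

Checking whether a.project_preferences is already null before the copy would tell whether the copy or the earlier parsing is responsible.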
----------------------------------
and here in boinc_api.c app_init_data is filled:
int boinc_get_init_data(APP_INIT_DATA& app_init_data) {
    app_init_data = aid;     <----- here
    return 0;
}
As we can see --->
+      app_init_data   {major_version=0 minor_version=0 release=0 ...}   APP_INIT_DATA &
      m_nbytes   134217728.00000000   double
host_info m_nbytes is there, as shown above.
------------------------------------------------
        boinc_get_init_data( app_init_data );
        if ( !nographics_flag
         &&  (app_init_data.host_info.m_nbytes != 0)
         &&  (app_init_data.host_info.m_nbytes <= ( double ) (48 * 1024 * 1024)) )
            {
            #ifdef BOINC_APP_GRAPHICS
            fprintf( stderr, "Low memory machine... Disabling graphics.\n" );
            fprintf(
                stderr,
                "%f <= %f\n",
                app_init_data.host_info.m_nbytes,
                ( double ) 48 * 1024 * 1024 );
            fflush( stderr );
            nographics_flag = 1;
            #endif
            }

        #ifdef BOINC_APP_GRAPHICS
        if ( !nographics_flag )
            {
            run_stage = GRXINIT;
                #ifdef DYNAMIC_GRAPHICS
            retval = boinc_init_graphics_lib( worker, argv[0] );
                #else
            retval = boinc_init_graphics( worker );
                #endif
            run_stage = POSTINIT;
            }
        else
            #endif
            {
            run_stage = POSTINIT;
            retval = boinc_init();       <------ here it should run into
            if ( !retval ) worker();      <------ than to worker()
            }
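
The memory check in the code above can be verified by hand: 134217728 bytes is 128 MiB, well above the 48 MiB limit, so graphics are not disabled for that reason. A tiny sketch of the same comparison follows; the constant and function names are chosen for the example only.

```cpp
// Threshold used in the check: 48 MiB expressed in bytes.
constexpr double kLowMemoryLimit = 48.0 * 1024 * 1024;      // 50331648

// Value the debugger showed for host_info.m_nbytes.
constexpr double kReportedBytes = 134217728.0;               // 128 * 1024 * 1024

// Mirrors the condition: a non-zero size at or below the limit counts as low memory.
constexpr bool is_low_memory(double m_nbytes) {
    return m_nbytes != 0 && m_nbytes <= kLowMemoryLimit;
}

static_assert(kReportedBytes == 128.0 * 1024 * 1024, "134217728 bytes is 128 MiB");
static_assert(!is_low_memory(kReportedBytes), "128 MiB is above the 48 MiB limit");
```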
---------------------
It goes to boinc_init().
boinc_lockfile is read:
+      filename   0x005d4014 "boinc_lockfile"   const char *
It now goes to ----->
    retval = boinc_parse_init_data_file();
    if (retval) {
        standalone = true;
    } else {
        retval = setup_shared_mem();
        if (retval) {
            fprintf(stderr, "Can't set up shared mem: %d\n", retval);
            standalone = true;
        }
    }
boinc_init() completes successfully and it goes to worker().
There it goes to read_wu_state() --->
       retval = read_wu_state();
The work unit file name is assigned:
+      this   "work_unit.sah"   std::basic_string<char,std::char_traits<char>,std::allocator<char> > * const
We open the file work_unit.sah:
    // Open the file and load the first line
    FILE *fp = fopen(virtual_name, "r");
    if (!fp) return ERR_FOPEN;

    char buf[512];
    fgets(buf, 512, fp);
    fclose(fp);
--------------------------------------------
Read the first line (at most 511 characters) into buf and close the file:
+      buf   0x0012f1a0 "<workunit>
"   char [512]
+      physical_name   "work_unit.sah"   std::basic_string<char,std::char_traits<char>,std::allocator<char> > &
Now we are in worker, have opened the file, and come to --->
        retval = seti_parse_wu( f, analysis_state );    <------- here
        fclose( f );
-----------------------
it goes to seti.cpp
int seti_parse_wu( FILE *f, ANALYSIS_STATE &state )
then to -->
    retval = seti_parse_wu_header( f );
we come to xstring
   bool __CLR_OR_THIS_CALL _Grow(size_type _Newsize,
      bool _Trim = false)
      {   // ensure buffer is big enough, trim to size if _Trim is true
         if (max_size() < _Newsize)
         _String_base::_Xlen();   // result too long
      if (_Myres < _Newsize)
         _Copy(_Newsize, _Mysize);   // reallocate to grow
      else if (_Trim && _Newsize < _BUF_SIZE)
         _Tidy(true,   // copy and deallocate if trimming to small string
            _Newsize < _Mysize ? _Newsize : _Mysize);
      else if (_Newsize == 0)
         _Eos(0);   // new size is zero, just null terminate
      return (0 < _Newsize);   // return true only if more work to do
      }
The new size is zero and it returns.
It now goes to seti_header.cpp ---->
int seti_parse_wu_header( FILE *f )
and there to the loops --->
    do
        {
        fgets( buf, 256, f );
        }
    while ( !feof( f ) && !xml_match_tag( buf, "<workunit_header" ) );

    buffer += buf;

    while ( fgets( buf, 256, f ) && !xml_match_tag( buf, "</workunit_header" ) )
        {
        buffer += buf;
        }

    buffer += buf;

    if ( wu ) delete wu;
    wu = new workunit( buffer );

    SETI_WU_INFO    temp( *wu );
    swi = temp;
    found = 1;
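
One thing to watch in loops like the first one above: the return value of fgets() is not checked, so if the opening tag never appears, buf keeps whatever the last successful read left in it. Below is a sketch of the same "collect everything between two tags" idea with the read result checked on every pass. It uses std::istream and plain substring matching instead of FILE* and xml_match_tag(), and read_section is an invented name.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Collect every line from the one containing open_tag up to and including
// the one containing close_tag. Returns an empty string if open_tag never appears.
std::string read_section(std::istream& in,
                         const std::string& open_tag,
                         const std::string& close_tag) {
    std::string line, section;
    bool inside = false;
    while (std::getline(in, line)) {          // loop ends cleanly at end of input
        if (!inside) {
            if (line.find(open_tag) == std::string::npos) continue;
            inside = true;
        }
        section += line;
        section += '\n';
        if (line.find(close_tag) != std::string::npos) break;
    }
    return section;
}
```

An empty result then signals clearly that no header was found, instead of passing leftover data on to the parser.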
----------------------------------
We run into the loop to fill the buffer:
+      buf   0x0012edc4 "<workunit_header>
"   char [256]
+      buffer   ""   std::basic_string<char,std::char_traits<char>,std::allocator<char> >
+      f   0x0061f590 {_ptr=0x00379aff "  <name>01mr99ab.14893.2848.703400.3.151</name>
  <group_info>
    <tape_info>
      <name>01mr99ab</name>
      <start_time>2451239.5778227</start_time>
      <last_block_time>2451239.5778227</last_block_time>
      <last_block_done>2848</last_block_done>
      <missed>0</missed>
      <tape_quality>0</tape_quality>
      <sb_id>0</sb_id>
    </tape_info>
    _iobuf *
----------------------------------------------
and fill the buffer until we find "</workunit_header" ---> which marks the end of the header.
The whole work unit header has now been read into the buffer --->
<workunit_header>
  <name>01mr99ab.14893.2848.703400.3.151
.....
<subband_desc>
  <number>151</number>
  <center>1418978879.8359</center>
  <base>1418974607.375</base>
  <sample_rate>9765.625</sample_rate>
</subband_desc>
<sb_id>0</sb_id>
</workunit_header>
----------------------------------------------------------------
we come now to --->
    if ( wu ) delete wu;
    wu = new workunit( buffer );
-----------
It goes to dbgnew.cpp ---> allocate memory block.
From there it switches to dbgheap.c
and there to ---> _nh_malloc_dbg_impl
extern "C" void * __cdecl _nh_malloc_dbg (
        size_t nSize,
        int nhFlag,
        int nBlockUse,
        const char * szFileName,
        int nLine
        )
{
        int errno_tmp = 0;
        void * pvBlk = _nh_malloc_dbg_impl(nSize, nhFlag, nBlockUse, szFileName, nLine, &errno_tmp); <--here

        if ( pvBlk == NULL && errno_tmp != 0 && _errno())
        {
            errno = errno_tmp; // recall, #define errno *_errno()
        }
        return pvBlk;
----------------------------------
and there we do the allocation
        for (;;)
        {
            /* do the allocation
             */
            pvBlk = _heap_alloc_dbg_impl(nSize, nBlockUse, szFileName, nLine, errno_tmp);

            if (pvBlk)
            {
                return pvBlk;
            }
            if (nhFlag == 0)
            {
                *errno_tmp = ENOMEM;
                return pvBlk;
            }

            /* call installed new handler */
            if (!_callnewh(nSize))
            {
                *errno_tmp = ENOMEM;
                return NULL;
            }

            /* new handler was successful -- try to allocate again */
        }
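
The control flow of that loop, reduced to its essentials as a sketch: try to allocate, and on failure ask a handler whether it could free something; repeat until the allocation works or the handler gives up. The allocator and handler are passed in as callables so the loop can be exercised without really running out of memory; allocate_with_retry is an invented name and not a CRT function.

```cpp
#include <cstddef>
#include <functional>

// Keep asking `allocate` for memory; after each failure give `handler` a chance
// to free something. Stop when allocation succeeds or the handler gives up.
void* allocate_with_retry(std::size_t size,
                          const std::function<void*(std::size_t)>& allocate,
                          const std::function<bool(std::size_t)>& handler) {
    for (;;) {
        if (void* block = allocate(size)) return block;   // success
        if (!handler(size)) return nullptr;               // handler could not help
        // handler reported success: try the allocation again
    }
}
```

A handler that always claims success while the allocator keeps failing would make this loop spin forever, which is worth keeping in mind when a program appears to hang inside an allocation.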
-----------------
we make the allocation and come back with
    void *res = _nh_malloc_dbg( cb, 1, nBlockUse, szFileName, nLine );

      res   0x0037ab20   void *
----------------------------------------
then back to seti_header.cpp
    wu = new workunit( buffer );    <------ here

    SETI_WU_INFO    temp( *wu );
    swi = temp;
    found = 1;
--------------------
debugger shows --->
       operator new returned   0x0037ab20   void *
+      buffer   "<workunit_header>
  <name>01mr99ab.14893.2848.703400.3.151</name>
  <group_info>
    <tape_info>
      <name>01mr99ab</name>
      <start_time>2451239.5778227</start_time>
      <last_block_time>2451239.5778227</last_block_time>
      <last_block_done>2848</last_block_done>
      <missed>0</missed>
      <tape_quality>0</tape_quality>
      <sb_id>0</sb_id>
    </tape_info>
    <nam   std::basic_string<char,std::char_traits<char>,std::allocator<char> >
+      wu   0x00000000 {id=??? name=0x00000010 <Bad Ptr> group_info={...} ...}   workunit_header *
+      db_table<workunit_header>   {table_name=0x005d7014 "workunit_header" me=??? _search_tag=0x005d7014 "workunit_header" ...}   db_table<workunit_header>
      id   CXX0030: Error: expression cannot be evaluated   
+      name   0x00000010 <Bad Ptr>   char [64]
+      group_info   {r={...} id=??? }   db_reference<workunit_grp,long>
+      subband_desc   {number=??? center=??? base=??? ...}   subband_description_t
      sb_id   CXX0030: Error: expression cannot be evaluated
------------------
and inside it, db_table does not look properly initialized, I think:
-      db_table<workunit_header>   {table_name=0x005d7014 "workunit_header" me=??? _search_tag=0x005d7014 "workunit_header" ...}   db_table<workunit_header>
      track_mem<workunit_header>   {...}   track_mem<workunit_header>
+      table_name   0x005d7014 "workunit_header"   char * const
      me   CXX0030: Error: expression cannot be evaluated   
+      _search_tag   0x005d7014 "workunit_header"   char *
      _nfields   5   int
+      column_names   0x005d7914 char * const * const db_table<class workunit_header>::column_names   char * const [5]
      cursor   CXX0030: Error: expression cannot be evaluated
--------------------------------------------
we come to dbgheap.c  --->
                    RTCCALLBACK(_RTC_FuncCheckSet_hook,(0));
                    pHead = (_CrtMemBlockHeader *)_heap_alloc_base(blockSize);   <---- here
and call the function
--------------------------
      blockSize   7284   unsigned int
      nSize   7248   unsigned int
+      pHead   0x00000000 {pBlockHeaderNext=??? pBlockHeaderPrev=??? szFileName=??? ...}   _CrtMemBlockHeader *
-----------------
I believe szFileName in the _CrtMemBlockHeader * has a bad value
----------------------
it goes to malloc.c   ---->
    if (__active_heap == __SYSTEM_HEAP) {
        return HeapAlloc(_crtheap, 0, size ? size : 1);
----------------
      __active_heap   1   int
      _crtheap   0x00370000   void *
      size   7284   unsigned int
   
and HeapAlloc is called --->
It runs through this more than 100 times, always with the values above.... then it crashes.

All of this happens in seti_parse_wu.
Suddenly we get an error --->
Unhandled exception at 0x7c91eddd in seti_boinc.exe: 0xC0000005: Access violation writing location 0x00030ffc.

In stderr.txt we find ----> Can't set up shared mem: -1
------------------------------------
 Still searching.
Any suggestions??
Title: Re: optimized sources
Post by: Urs Echternacht on 06 Jul 2007, 06:33:30 pm
No translation necessary.
(Up front: in my bilingual comments, please pay attention only to the German ones; I added the English ones only so that everyone here can follow.)
the application read now the files init_data.xml and work_unit.sah.
from init_data.xml debug shows ----> in hostinfo
      m_nbytes   134217728.00000000   double
...
-----------------
we make the allocation and come back with
    void *res = _nh_malloc_dbg( cb, 1, nBlockUse, szFileName, nLine );

      res   0x0037ab20   void *
----------------------------------------
0x0037ab20 seems to be the end of space for the heap. Which values do go into the _nh_malloc_dbg(...) ?
(That is probably how far the memory region extends that the heap may occupy later. Which values passed to _nh_malloc_dbg(...) lead to this result?)
Quote
then back to seti_header.cpp
...
+      subband_desc   {number=??? center=??? base=??? ...}   subband_description_t
      sb_id   CXX0030: Error: expression cannot be evaluated
------------------
The missing values seem to be filled in by templates, so this could be a debugger problem.
(Since the missing values are supposed to be filled in through templates, the debugger may have trouble recognizing what should be in there.)
Quote
...
--------------------------------------------
we come to dbgheap.c  --->
                    RTCCALLBACK(_RTC_FuncCheckSet_hook,(0));
                    pHead = (_CrtMemBlockHeader *)_heap_alloc_base(blockSize);   <---- here
and call the function
--------------------------
      blockSize   7284   unsigned int
      nSize   7248   unsigned int <----- Is that a typo (see further below)?
+      pHead   0x00000000 {pBlockHeaderNext=??? pBlockHeaderPrev=??? szFileName=??? ...}   _CrtMemBlockHeader *
-----------------
I believe _CrtMemBlockHeader *    has in szFileName a bad value
----------------------
it goes to malloc.c   ---->
    if (__active_heap == __SYSTEM_HEAP) {
        return HeapAlloc(_crtheap, 0, size ? size : 1);
----------------
      __active_heap   1   int
      _crtheap   0x00370000   void *
      size   7284   unsigned int
So, the memory for the heap starts at 0x00370000 and ends at 0x0037ab19. The rights to write at that memoryarea are given. But how does that calculate to 7284 times space for unsigned int's (debug: 1+4+1) ?
(So this is the memory region reserved for the heap. Only in this region do we have write permission. But how does all this fit with the 7284 unsigned integer values that are supposed to fit in there? Have you checked whether that works out in the normal case and in the extreme case?)
Quote
   
and HeapAlloc is called --->
it runs through it more than 100 always with the values obove.... then it crashes

all this is in seti_parse_wu
suddenly we get a eror --->
Unhandled exception at 0x7c91eddd in seti_boinc.exe: 0xC0000005: Access violation writing location 0x00030ffc.

in stderr.txt we find ----> Can't set up shared mem: -1
------------------------------------
 searching now
any suggestions ? ?
I give it a try, but i don't know if that helps much.
(I gave it a try, but unfortunately I do not know whether it helps you much.)

Try to reproduce the error with the debugger. If that is possible, take a look at the disassembly in the debugger. Check which instructions were executed before the error (at 0x7c91eddd) happened. Try to go back to the last statement executed inside the SETI code. Identify which variables and calls are involved and which one produces the bad write.
(Try using the debugger's disassembly to get "closer" to the error. Identify which statement in the SETI code led to this error message. Which variables exactly are involved and lead to this access violation?)

No translation !
(Great, of all things my "favourite error": 0xC0000005.  >:(   (I am fed up with that one!) )
Title: Re: optimized sources
Post by: _heinz on 10 Jul 2007, 06:53:13 pm
Merci Urs,
I recompiled the whole application and will see if there are any differences. I will post more about it tomorrow.

     blockSize   7284   unsigned int
      nSize   7248   unsigned int <----- Is that a typo (see further below)?

No, not a typo.
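
The two numbers can be checked with a little arithmetic. The difference between blockSize and nSize is 36 bytes, which would be consistent with the debug heap adding its own bookkeeping (a block header plus guard bytes) around each request; that reading is an assumption, the subtraction itself is not. The sizes are byte counts, and the pointer returned earlier in the thread lies 0xab20 bytes above the heap base. Constant names are invented for the example.

```cpp
#include <cstddef>

// Values copied from the debugger output earlier in the thread.
constexpr std::size_t kRequested = 7248;        // nSize
constexpr std::size_t kBlockSize = 7284;        // blockSize
constexpr std::size_t kHeapBase  = 0x00370000;  // _crtheap
constexpr std::size_t kReturned  = 0x0037ab20;  // pointer returned by operator new

constexpr std::size_t kOverhead = kBlockSize - kRequested;   // 36 bytes
constexpr std::size_t kOffset   = kReturned - kHeapBase;     // 0xab20 = 43808 bytes

static_assert(kOverhead == 36, "extra bytes per block in this debug build");
static_assert(kOffset == 43808, "offset of the block from the heap base");
```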
-----------------------
I get a stack overflow....

regards heinz
Title: Re: optimized sources
Post by: _heinz on 11 Jul 2007, 05:38:31 pm
When I compile I get 2 warnings, as you can see, in 'xml_match_tag' and 'xml_find_tag'.
Maybe this is causing the trouble.....
Generating Code...
c:\i\sc\seti\seti_boinc_2k3_2.2b-ben-joe\db\xml_util.cpp(663) : warning C4717: 'xml_match_tag' : recursive on all control paths, function will cause runtime stack overflow
c:\i\sc\seti\seti_boinc_2k3_2.2b-ben-joe\db\xml_util.cpp(689) : warning C4717: 'xml_find_tag' : recursive on all control paths, function will cause runtime stack overflow
Compiling...
------------------------------------------------------------------------------------------------------------------------
and here is the code ---->
// return true if the tag appears in the line
//
bool xml_match_tag(char* buf,  char* tag) {
    char tmp_tag[BUFSIZ]={'<',0};
    if (strlen(buf)==0) return false;
    if (tag[0] == '<') {
      strlcpy(tmp_tag,tag,BUFSIZ);
    } else {
      strlcat(tmp_tag,tag,BUFSIZ);
    }
    char *p=tmp_tag+strlen(tmp_tag);
    do {
      *(p--)=0;
    } while (isxmldelim(*p));
    while ((buf=strstr(buf,tmp_tag))) {
      if (isxmldelim(buf[strlen(tmp_tag)])) return true;
      buf++;
    }
    return false;
}


bool xml_match_tag(const std::string &s, char* tag)
{
   return xml_match_tag(s.c_str(),tag);
663 }    <----- here

size_t xml_find_tag( char* buf, char* tag) {
//       const char *buf0=buf; // OK
    char *buf0=buf;
    char tmp_tag[BUFSIZ]={'<',0};
    if (tag[0] == '<') {
      strlcpy(tmp_tag,tag,BUFSIZ);
    } else {
      strlcat(tmp_tag,tag,BUFSIZ);
    }
    char *p=tmp_tag+strlen(tmp_tag);
    do {
      *(p--)=0;
    } while (isxmldelim(*p));
    while ((buf=strstr(buf,tmp_tag))) {
      if (isxmldelim(buf[strlen(tmp_tag)])) return buf-buf0;
      buf++;
    }
    return strlen(buf0);
}

std::string::size_type xml_find_tag(const std::string &s, char* tag)
{
   std::string::size_type p=xml_find_tag(s.c_str(),tag);
   return (p!=strlen(s.c_str()))?p:(std::string::npos);
689 }      <------- here
----------------------------------------------------------------------------------------------------------
Does anybody have an idea how to prevent this?
Your suggestions are welcome....
regards heinz
Title: Re: optimized sources
Post by: Urs Echternacht on 12 Jul 2007, 04:40:02 pm
Hello heinz,
just try renaming the two functions size_t xml_find_tag( char* , char* ) and bool xml_match_tag(char* ,  char* ) to see whether the compiler is mistaken here. It almost seems to me as if it does not recognize the difference between the 'overloaded' functions. Don't forget to change the two calls inside bool xml_match_tag(const std::string & , char* ) and std::string::size_type xml_find_tag(const std::string & , char* ) as well. If the compiler still believes it sees possible infinite loops, then there really is a problem hidden at this spot. If everything is O.K. at this spot, simply change the function names back and switch off this specific warning with a #pragma directive (plus a comment explaining why it is there).

If someone has a better idea, please tell.
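
A possible explanation for warning C4717, offered as a sketch and not as a confirmed diagnosis: `s.c_str()` returns `const char*`, which cannot be passed to a `char*` parameter. If the first overload takes `char*` (the commented-out `const char *buf0` line in the posted code hints that it may once have taken `const char*`), the only viable candidate for the inner call is the `std::string` overload itself, reached through a temporary `std::string`, and the function calls itself forever. The reduced example below uses a hypothetical `match_tag` pair, not the real `xml_util.cpp` code, and shows the non-recursive form.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical stand-in for the C-string overload. The first parameter is
// const char*, so it is an exact match for the result of std::string::c_str().
bool match_tag(const char* buf, const char* tag) {
    assert(buf != nullptr && tag != nullptr);
    return std::strstr(buf, tag) != nullptr;
}

// Wrapper overload. The call below resolves to match_tag(const char*, const char*),
// so it is not recursive. If the first overload took plain char* instead,
// s.c_str() could not bind to it, this function would be the only viable
// candidate, and it would call itself on every control path.
bool match_tag(const std::string& s, const char* tag) {
    return match_tag(s.c_str(), tag);
}
```

If this is indeed the cause, renaming the functions as suggested above would turn the warning into a hard conversion error, which would confirm it.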
Title: Re: optimized sources
Post by: _heinz on 17 Jul 2007, 07:29:46 pm
I believe I should write a new function. I don't like the construct as it is written now.
Work is ongoing, please be patient.
Regards heinz
Title: Re: optimized sources
Post by: _heinz on 25 Jul 2007, 10:05:48 pm
Hi all,
I work with a dual-monitor setup. My second monitor died last night.  :'(
Today I ordered a new 22" multisync flat screen from LG. It is expected in about 4 days.

Last week, after a month of waiting, I got the monitor. I had ordered a silver one, but when I opened the package I found a black one. I installed it, but it did not work properly (the letters were completely blurred and fuzzy, every 3 cm). It was a disaster. So I sent the faulty hardware back to the shop and asked for my money back. All this trouble cost me a lot of time. Now I have installed a 10-year-old MAG MX17S CRT monitor; it works fine and the letters are clear and sharp. I have some old hardware lying around in my work area, so this was possible.
To recover from this disaster I set up a diskless Pentium 200 MHz W98 workstation to test the last MMX application of SETI. Here  (http://setiathome.berkeley.edu/forum_thread.php?id=39157) you can see it running.
----------!----
regards seti_britta ~heinz
Title: Re: optimized sources
Post by: _heinz on 08 Aug 2007, 02:27:10 pm
Hi all,
my diskless dual 200MMX machine is now crunching. It is an NT4 workstation and runs the client as a service. Two MMX clients run simultaneously. If you want to read something about it, go to dual 200MMX (http://setiathome.berkeley.edu/forum_thread.php?id=39157). Enjoy
Now it is time to continue the work on the optimized client.
regards seti_britta~heinz
Title: Re: optimized sources
Post by: _heinz on 10 Aug 2007, 04:16:46 pm
I have a problem with the CPU usage at the MMX dual machine.
Here are 2 pictures that show what I mean.
Any hints are welcome.

[attachment deleted by admin]
Title: Re: optimized sources
Post by: Urs Echternacht on 10 Aug 2007, 06:13:05 pm
I have a problem with the CPU usage at the MMX dual machine.
...
Any hints are welcome.
What do you think is wrong? The pictures show a normal distribution of CPU time and usage for a dual-CPU configuration.
Title: Re: optimized sources
Post by: _heinz on 10 Aug 2007, 06:25:41 pm
Merci Urs,
I did not know how to interpret it correctly...  :'(
Sorry about that.
heinz
Title: Re: optimized sources
Post by: _heinz on 03 Sep 2007, 09:45:28 pm
Hi all,
I am back from holidays in Austria, so we can continue.
regards to all who are reading here
seti_britta ~heinz  ;D
Title: Re: optimized sources
Post by: _heinz on 19 Sep 2007, 06:37:47 pm
I still have 1 warning in xml_util
-------------------------------------------------
------ Build started: Project: setiboincdb, Configuration: Release32-NOGFX Win32 ------
Compiling...
xml_util.cpp
c:\i\sc\vs90\seti_boinc_2k3_2.2b-ben-joe\db\xml_util.cpp(685) : warning C4717: 'xml_find_tag' : recursive on all control paths, function will cause runtime stack overflow
Build log was saved at "file://c:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\BuildLog.htm"
setiboincdb - 0 error(s), 1 warning(s)
========== Build: 1 succeeded, 0 failed, 0 up-to-date, 0 skipped ==========

I'm working on it..... it is progressing.
regards heinz ~seti_britta
Title: Re: optimized sources
Post by: _heinz on 29 Sep 2007, 06:35:28 pm
Since I came back it has been so quiet here in the forums. 
Or is all the important information being exchanged behind the scenes?
Sometimes I have the feeling that nobody is interested in further development of the s@h app.
Sure, a lot has already been done. All thanks to the development crew.
-----------------------------------------------------------------------------------------
I asked in the Pre-Release --> Windows forum (22.09.07):
Please let me know if any of you will work together with me on further optimization of the source code.
More than 40 people have read it, but nothing. No answer so far........ I can't believe it.
-------------------------------------------------------------------------------------------------------------
 :(  :'(

Title: Re: optimized sources
Post by: Crunch3r on 29 Sep 2007, 10:05:44 pm
Hello Heinz, maybe it is because you have been riding around on the 2.2b source for months while we are already at 2.4...
Why you are doing that, I don't know...

Besides, nothing is going on behind "closed doors" either... there is nothing new, and my personal interest in S@H is currently tending 100% toward ZERO.... or ... "void S@H()" :P

No idea how things are supposed to continue here, because I am not lifting a finger here anymore...

Cya... I'm gone then ....  ;)




Title: Re: optimized sources
Post by: The Grinch on 30 Sep 2007, 01:18:04 am
Uh-oh!
Sounds like there is tension in the air here?  :-\
Title: Re: optimized sources
Post by: Jason G on 30 Sep 2007, 02:05:54 am
Well, I for one have been reading what I can (I don't speak German) with interest, though I rarely have time between school and work to do much poking around in the code.  Now I am on a two-week break and thought I'd have a go at getting IPP/ICC installed on top of my Visual Studio 2005 setup.  In the meantime I've been looking at the older 1.31 sources and tutorial write-up, and reading a bit about IPP in an Intel press-release guide thingy... All good fun.
What's the deal with the 2.4 sources?
[I've spent the last 2 years or so doing assembler/C on industrial microcontrollers, setting up PLCs etc., so my C/C++ and Visual Studio skills have withered since....]

Jason
Title: Re: optimized sources
Post by: Jason G on 30 Sep 2007, 08:36:53 am
first try with 1.31 sources:
========== Build: 5 succeeded, 2 failed, 0 up-to-date, 0 skipped ==========

 5 out of 7 ... not bad I suppose  :-\

Build log digging time

Note: Found Crunch3r's link earlier in this thread to the 2.2B sources ... I'll see what they do ...

Jason

Title: Re: optimized sources
Post by: _heinz on 30 Sep 2007, 10:19:44 am
@ Crunch3r
A new client has been built, with the 2.3A source and changes incorporated.
That was on 5 June; since then I have had no other information.
That is the current state.
Can 2.3A be brought up to date? And if so, with which files as changes??
------------------------------------------------------------------------------------------------------------------------------------------------
I would still like to do that, if it is possible.

In the optimizer there were problems in opt_SSE2.cpp with -->      s_put1_NC(p, sum1 );
That always gave conversion errors: VEC cannot be converted to VEC_I.
opt_SSE2.cpp
.\opt_SSE2.cpp(85) : error C2440: 'type cast' : cannot convert from 'VEC' to 'VEC_I'
        No constructor could take the source type, or constructor overload resolution was ambiguous
It took a while until I had a solution. The problem deserves a topic of its own.
So far I have not been able to find the stack overflow problem with the debugger. Then there is still the problem with the two recursive functions that do not compile cleanly.
Then came summer and holidays... that is how time passes so quickly.

Why am I doing this??  So that the sources get cleaned up and brought up to date.
If I remember correctly, I had around 600 errors when I first compiled 2.2B.
Missing type declarations, conversion errors, and so on.
Now the source code has been cleaned up, apart from a few warnings, and partly restructured.
Now the program only has to crunch a test WU successfully. That would be it.

heinz  ;)






Title: Re: optimized sources
Post by: _heinz on 30 Sep 2007, 03:35:29 pm
The problem of s_put1_NC

Although it is already resolved, I will give you a short overview. If you try to compile opt_SSE2.cpp with the MSC compiler, it will not succeed. The problem is the statement s_put1_NC(p, sum1 );
The type definitions:
-----------------------------------------
typedef union __declspec(intrin_type) _CRT_ALIGN(16) __m128 {
     float               m128_f32[4];
     unsigned __int64    m128_u64[2];
     __int8              m128_i8[16];
     __int16             m128_i16[8];
     __int32             m128_i32[4];
     __int64             m128_i64[2];
     unsigned __int8     m128_u8[16];
     unsigned __int16    m128_u16[8];
     unsigned __int32    m128_u32[4];
 } __m128;


typedef union __declspec(intrin_type) _CRT_ALIGN(16) __m128i {
    __int8              m128i_i8[16];
    __int16             m128i_i16[8];
    __int32             m128i_i32[4];   
    __int64             m128i_i64[2];
    unsigned __int8     m128i_u8[16];
    unsigned __int16    m128i_u16[8];
    unsigned __int32    m128i_u32[4];
    unsigned __int64    m128i_u64[2];
} __m128i;

   typedef __m128 VEC;
   typedef __m128i VEC_I;







The pointer workBuf is a pointer to float and is set here to point to FreqData:

   float *workBuf = (float *)FreqData;

sum1 is defined:
      VEC sum1, sum2;
Note that no variable of type VEC_I is defined!!!! I think that is the error.

The pointer p is a pointer to int and points to PowerSpectrum[bin_off + bin]:

      int *p = (int *)(&PowerSpectrum[bin_off + bin]);



      s_put1(&workBuf, sum1);   //    workBuf = psNum;
We expand the macro:
    #define s_put1( addr, bbbb )            _mm_store_ss( addr, bbbb )
extern void _mm_store_ss(float *_V, __m128 _A);

and substitute:
      _mm_store_ss(&workBuf, sum1);   // all OK so far

      s_put1_NC(p, sum1 ); <--- error
------------------------------------------------
We expand the macro and find:
    #define s_put1_NC(ptr, aaaa)     _mm_stream_si32(ptr, s_extract_32bits(aaaa) );

and further:
    #define s_extract_32bits(aaaa)   _mm_cvtsi128_si32((VEC_I) aaaa)

The statement after expansion:
----------------------------------------------
   _mm_stream_si32(p, _mm_cvtsi128_si32((VEC_I) sum1)); <-- VEC cannot be converted to VEC_I


----------------------------------------------------------------------------------------------------------------------------------------------------------
the solution:

// ----------------------------------------------------------------------------
//   Function:   v_convert_f(int k, int *p_i, float *p_f)
//   Type     :   void
//   Content  :   convert sum1 and write it back to PowerSpectrum
//            problem of s_put1_NC solved for MSC
//   parameter:   int k, int *p_i, float *p_f
//   last update:28.05.2007         by:seti_britta ~heinz
// ----------------------------------------------------------------------------
#ifdef _MSC_VER
void v_convert_f(int k, int *p_i, float *p_f)
{
   for(k=0; k<4; k++)  // k is not a fixed value !!! to be checked
   {
      *p_i++ = (int) *p_f++; // p_i advances, because it points to PowerSpectrum[bin_off + bin]
                        // p_f advances, because it points to sum1.m128_f32[0]
   }
}
#endif

// =============================================================================
//     v_GetPowerSpectrum
// seti_britta: comments for understanding, some small changes
//            problem of s_put1_NC for MSC solved
// =============================================================================

GetPowerSpectrum_ptt( sse2_v_GetPowerSpectrum )
{
   float *workBuf = (float *)FreqData;
   register int   i, bin;   //seti_britta: hold var in register
   int *p; //seti_britta: out of the loop
   
   VEC sum1, sum2;      //seti_britta: moved to here
   sum1=sum2= ZERO;   //seti_britta: init ---> no warnings

   ALIGNED_YES( FreqData );
   ALIGNED_YES( PowerSpectrum );

#if defined( _MSC_VER )
   float *p_f1 = (float *)(sum1.m128_f32);  //seti_britta:new
   register int *p_i;
   register int k;
   k = 0;
#endif
   // seti_britta: let the loop run to the value of this_fft_len
   for   ( i   = 0, bin = 0; i < this_fft_len; i++, bin += bin_len)
   {
      p = (int *)(&PowerSpectrum[bin_off + bin]); //seti_britta: int *p out of the loop
#if defined( _MSC_VER )
      p_i = (int *)(&PowerSpectrum[bin_off + bin]); //seti_britta:new
#endif
      s_fetch( &FreqData[i+16][0] );   // get float data from FreqData
      sum1 = s_get1(&FreqData[0]);   // get FreqData[0] first row, first 4 elements to sum1
      sum1 = s_mult(sum1, sum1);      // power of sum1 and store to sum1, overwritten now
      sum2 = s_get1(&FreqData[1]);   // get FreqData[1] first row, second element to sum2
      sum2 = s_mult(sum2, sum2);      // power of sum2 and store to sum2, overwritten now
      sum1 = s_add(sum1, sum2);      // add both power values sum1 and sum2 and store to sum1
         // WARNING: !! this store overwrites FreqData[0], so loop must go bottom to top !!
         //  reusing buffer - not needed after our psNum compute.
      s_put1(&workBuf, sum1);   //    workBuf = psNum; store sum1 to workBuf
#if defined( _MSC_VER )
      v_convert_f(k, p_i, p_f1);   //seti_britta: new, convert function with write back to PowerSpectrum
#else
      s_put1_NC(p, sum1 );
#endif
   }

      // When using non-caching (non-temporal) writes, you should always force
      // the writes to be "globally visible" to possible other CPUs
   s_fence_writes();
}
-----------------------------------------------------------------------
and so on, analogously, for sum1 through sum4 .......

If any of you has a better solution, let me know.
heinz  ;)
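
One alternative worth testing, sketched under assumptions: going by the macro expansion quoted above, `s_put1_NC` copies the unchanged bit pattern of the lowest lane of `sum1` into one 32-bit slot, while `(int) *p_f` converts the float value numerically and the loop writes four elements. If the compiler's `<emmintrin.h>` provides the cast intrinsic `_mm_castps_si128` (not verified here for every MSC version), the original behaviour can be kept without the C-style cast. The helper names `put1_nc` and `store_low_lane` are hypothetical.

```cpp
#include <cassert>
#include <cstring>
#include <emmintrin.h>  // SSE2 intrinsics (pulls in the SSE header as well)

// Hypothetical replacement for the s_put1_NC macro: store the lowest 32 bits
// of v to *dst with a non-temporal store, leaving the bit pattern untouched.
inline void put1_nc(int* dst, __m128 v) {
    assert(dst != nullptr);
    __m128i bits = _mm_castps_si128(v);           // reinterpret, no value conversion
    _mm_stream_si32(dst, _mm_cvtsi128_si32(bits)); // low 32 bits, bypassing the cache
}

// Small demonstration wrapper: writes lane 0 and reads it back as a float.
float store_low_lane(__m128 v) {
    int raw = 0;
    put1_nc(&raw, v);
    _mm_sfence();                         // make the streamed write visible
    float out;
    std::memcpy(&out, &raw, sizeof out);  // read the bits back as a float
    return out;
}
```

Whether the numeric conversion or the bit copy is the intended behaviour should be checked against a test WU before either version is adopted.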
Title: Re: optimized sources
Post by: Jason G on 04 Oct 2007, 05:26:49 am
Any luck with that s_put1_NC(p, sum1) call?  I haven't looked at this code, but the types are local, so what breaks if you just change them: VEC_I sum1, sum2 and VEC_I *p? Is PowerSpectrum not aligned?
Title: Re: optimized sources
Post by: _heinz on 04 Oct 2007, 07:45:00 pm
@ Jason,
I had success: it compiled and linked successfully. But you need to do nothing here, since you have the Intel compiler.
If you look into the code you can see that PowerSpectrum is aligned --->
  ALIGNED_YES( PowerSpectrum );
I moved the variable definitions out of the block to the beginning, which reduces the prolog and epilog of the block.

heinz   ;)
Title: Re: optimized sources
Post by: _heinz on 08 Oct 2007, 08:33:05 pm
Compiler option /LTCG
How you can use it to optimize your app:
click  here  (http://msdn2.microsoft.com/en-us/library/xbf3tbeh(VS.71).aspx)

heinz
Title: Re: optimized sources
Post by: Jason G on 08 Oct 2007, 09:01:49 pm
Yeah, it works well sometimes, but you have to look carefully at the output because sometimes it does silly things .... and that is hard with link-time code generation because there is no source.
Title: Re: optimized sources
Post by: Jason G on 08 Oct 2007, 11:39:08 pm
Any tips at all on handling cache thrashing on an early (non-HT) P4? I'm waiting on my second profile set, currently on run 7 of 19  ::)
Title: Re: optimized sources
Post by: _heinz on 11 Oct 2007, 07:02:00 pm
As you all know, a well-structured data set can be handled better and faster than a poorly structured one.
Aligned data is very important for optimal performance. We can help the compiler in its work if we follow some rules for data definitions:
1.) write all structures together, one after the other
2.) write all double together, one after the other
3.) write all float together, one after the other
4.) write all int together, one after the other
5.) write all char together, one after the other
6.) write all bool together, one after the other

Don't mix definitions and pieces of code.
Don't put definitions inside loops.
Keep variables global to reduce the prolog and epilog of blocks.
Avoid type conversions.

Organize fields of the same kind as vectors and n-dimensional matrices.
Use structures for variables and data, because it is easy to align structures.

Use structured programming methods for the code.

regards heinz
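
The effect of grouping by type is easiest to see inside a structure. The sketch below is only an illustration with made-up member names (`Mixed`, `Grouped` and their fields are not from the SETI sources): the same six members are declared once in mixed order and once grouped from the largest type to the smallest, and the compiler needs less padding for the grouped layout.

```cpp
#include <cassert>
#include <cstddef>

// Members in mixed order: padding is inserted so that every member
// starts at an address suitable for its type.
struct Mixed {
    char   flag_a;
    double value;
    bool   ready;
    int    count;
    char   flag_b;
    float  scale;
};

// The same members, grouped from the largest type to the smallest.
struct Grouped {
    double value;
    float  scale;
    int    count;
    char   flag_a;
    char   flag_b;
    bool   ready;
};

// Number of bytes saved per object by the grouped layout on this platform.
std::size_t bytes_saved() {
    assert(sizeof(Grouped) <= sizeof(Mixed));
    return sizeof(Mixed) - sizeof(Grouped);
}
```

On a typical x86-64 build this comes out as 32 bytes for `Mixed` against 24 for `Grouped`; the exact figures depend on the compiler and ABI.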


Title: Re: optimized sources
Post by: Jason G on 11 Oct 2007, 07:13:43 pm
HAHAHA,  I never saw some of those bad things before this week. I know exactly what you mean.
Title: Re: optimized sources
Post by: _heinz on 11 Oct 2007, 07:53:13 pm
@jason,
sure, this is not the answer to your question... it is general advice
Title: Re: optimized sources
Post by: Jason G on 11 Oct 2007, 08:05:05 pm
Sadly true, It takes a long time to understand the code this way.
Title: Re: optimized sources
Post by: _heinz on 11 Oct 2007, 08:39:56 pm
The most important part is analyzeFuncs.cpp.
I did my best to give it a little more structure for better reading and understanding.
Here are some short comments:
// here all inline functions
// ============================================================================
// seti_britta: set inline functions direct before the main fkt
// hint:  you find the other functions  behind the closing brace of main fkt
// ----------------------------------------------------------------------------
//      order to find               used in:
//      ------------------------      -----------------------------
//      getMTFL                     do_generate_chirp_fft_pairs
//      load_wisdom                  do_generate_fft_coeff
//      save_wisdom                  do_generate_fft_coeff
//                              do_generate_chirp_fft_pairs
//      notify_user                  seti_analyze
//                              do_generate_fft_coeff
//                              do_generate_chirp_fft_pairs
//                              do_chirping_data
//                              do_return_best_of_signals
//      do_generate_fft_coeff         seti_analyze
//      do_generate_chirp_fft_pairs      seti_analyze
//      do_chirping_data            seti_analyze
//      do_transpose               seti_analyze
//      process_data               seti_analyze
//      do_analyse_pot               seti_analyze
//      do_return_best_of_signals      seti_analyze
//      stats_output               do_generate_chirp_fft_pairs
//
//
// ============================================================================
All functions now have headers like this ----->
// ----------------------------------------------------------------------------
//   Function:   getMTFL
//   Type     :   int
//   Content  :   Find maximum FFT length for which transpose of PowerSpectrum
//            is needed
//   parameter:   int maxFFTLen
//   last update:         by:
// ----------------------------------------------------------------------------

And this is my current main loop ---->
// ----------------------------------------------------------------------------
//   Function:   seti_analyze
//   Type     :   int
//   Content  :   seti_analyze
//         The main analysis function. Args: state pointer to data, # of
//         points, starting chirp/fftlen Must be called with unchirped data;
//         this function modifies (chirps) the data in place swi parsed WU header         
//   parameter:   ANALYSIS_STATE &state
//
//   last update:18.06.2007   by:seti_britta
// ----------------------------------------------------------------------------
// Part 1   allocation and init
// Part 2   generate fft coefficients, save into wisdom
// Part 3   generate chirp/fft pairs, do different calcs in preparation analyze
// Part 4   loop through chirp/fft pairs - this is the top level analysis loop.
// Part 4.1 chirping data
// Part 4.2 do transpose if needed
// Part 4.3 process data
// Part 4.4 analyze power over time (POT), set checkpoint
// Part 5.   return the "best of" signals and do the rest
//
// ----------------------------------------------------------------------------
int seti_analyze( ANALYSIS_STATE &state )
{
// Part 1   allocation and init
    bitfield    = swi.analysis_cfg.analysis_fft_lengths;
    DataIn      = state.savedWUData;
    NumDataPoints = state.npoints;
   ChirpedData   = NULL;
    WorkData      = NULL;
    PowerSpectrum = NULL;
    num_cfft = retval = 0;
    MinChirpStep  = 0.0;
    last_chirp_ind = -1 << 20;
    cputime0 = 0;
   int have_transpose = false;   // seti_britta: used in: do_transpose(); process_data();
   d_log2 = log ( 2.0 );
    #if defined( USE_IPP )
        ippStaticInit();        // initialization of IPP library
    #elif defined( USE_FFTWF )
        // plan space for fftw
        // fftwf_plan  analysis_plans[MAX_NUM_FFTS]; //now out external
    #else
        // fields need by the ooura fft logic
        int         *BitRevTab[MAX_NUM_FFTS];
        float       *CoeffTab[MAX_NUM_FFTS];
    #endif
    ChirpedData = state.data;
    PowerSpectrum = ( float * ) calloc_a(NumDataPoints, sizeof(float), MEM_ALIGN);
    if (PowerSpectrum == NULL) SETIERROR(MALLOC_FAILED, "PowerSpectrum == NULL");
    notify_user( "Choosing optimal functions" );   
    CacheChirpCalc  = optimize_init(); // choose fastest function
// end Part 1   allocation and init
   do_generate_fft_coeff();// Part 2 generate fft coefficients, save into wisdom
   do_generate_chirp_fft_pairs();   // Part 3 generate chirp/fft pairs
// Part 4   loop through chirp/fft pairs - this is the top level analysis loop.
   chirp_units = 0;
    for ( icfft = state.icfft; icfft < num_cfft; icfft++ )// the big loop
    {
      do_chirping_data();   // Part 4.1 chirping data
        if (fftlen <= MaxTransposeFftLen)
         do_transpose();   // Part 4.2 do transpose, use strips of 4
      process_data();      // Part 4.3 process data
      do_analyse_pot();   // Part 4.4 do analyze pot
   }// end loop over chirp/fftlen pairs
   do_return_best_of_signals();// Part 5 return "best of" signals and do the rest
return retval; // finish seti_analyze
}   // end of seti_analyze
// ============================================================================
// seti_britta: here after the closing brace of the main fkt are the functions
// you find it in the following order:            used in:
//      enough_ram                           not found
//      v_BaseLineSmooth                     do_generate_chirp_fft_pairs
//      GetPowerSpectrum_ptt                  not found
//      PwrSpectrumOnly_ptt                     not found
//      TransposeStrip_ptt                     not found
//      v_subTranspose                        TransposeStrip_ptt
//      TransposeStrip_ptt( orig_v_Transpose2 )      not found
//      TransposeStrip_ptt( orig_v_Transpose4 )      not found
//
//
//
//hint: functions which are not found will be used in other sourcefiles.
// ----------------------------------------------------------------------------


hoping that helps
regards heinz  ;)
Title: Re: optimized sources
Post by: Jason G on 11 Oct 2007, 08:43:59 pm
Much nicer thank you :D
Title: Re: optimized sources
Post by: Jason G on 12 Oct 2007, 11:26:09 am
Quote
// Part 1   allocation and init
// Part 2   generate fft coefficients, save into wisdom
// Part 3   generate chirp/fft pairs, do different calcs in preparation analyze
// Part 4   loop through chirp/fft pairs - this is the top level analysis loop.
// Part 4.1 chirping data
// Part 4.2 do transpose if needed
// Part 4.3 process data
// Part 4.4 analyze power over time (POT), set checkpoint
// Part 5.   return the "best of" signals and do the rest

Here you are starting to see some encapsulation of the underlying processes, within which are the optimised routines.  I am starting to think some classes will help for those,  instead of the pointer juggling table.

Just initial thoughts,
    optimalChirpFunc = chirpFuncCollection.bestChirp(...);
    cerr << "Optimal Chirping Function Chosen: " << optimalChirpFunc.name() << endl;
    ...
    optimalChirpFunc.doChirp(...);
 

Jason
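
The class idea above could look roughly like the following sketch. Everything in it is hypothetical: the class names `ChirpFunc` and `ChirpFuncCollection`, the `void(std::vector<float>&, float)` signature, and the two stand-in routines, which only scale the data and do not perform a real chirp. It shows the selection mechanism only: time each candidate on scratch data and hand back the fastest.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical wrapper around one candidate routine.
class ChirpFunc {
public:
    using Fn = std::function<void(std::vector<float>&, float)>;
    ChirpFunc(std::string name, Fn fn) : name_(std::move(name)), fn_(std::move(fn)) {}
    const std::string& name() const { return name_; }
    void doChirp(std::vector<float>& data, float rate) const { fn_(data, rate); }
private:
    std::string name_;
    Fn fn_;
};

// Hypothetical collection: times every candidate and returns the fastest.
class ChirpFuncCollection {
public:
    void add(ChirpFunc f) { funcs_.push_back(std::move(f)); }

    const ChirpFunc& bestChirp(std::size_t testLen = 4096, int repeats = 16) const {
        assert(!funcs_.empty());  // at least one candidate is required
        std::size_t best = 0;
        auto bestTime = std::chrono::steady_clock::duration::max();
        for (std::size_t i = 0; i < funcs_.size(); ++i) {
            std::vector<float> scratch(testLen, 1.0f);
            const auto start = std::chrono::steady_clock::now();
            for (int r = 0; r < repeats; ++r) funcs_[i].doChirp(scratch, 0.5f);
            const auto elapsed = std::chrono::steady_clock::now() - start;
            if (elapsed < bestTime) { bestTime = elapsed; best = i; }
        }
        return funcs_[best];
    }
private:
    std::vector<ChirpFunc> funcs_;
};

// Two stand-in candidates (simple scaling, not a real chirp).
void scaleRangeFor(std::vector<float>& d, float rate) { for (float& x : d) x *= rate; }
void scaleIndexed(std::vector<float>& d, float rate) {
    for (std::size_t i = 0; i < d.size(); ++i) d[i] *= rate;
}

ChirpFuncCollection makeDemoCollection() {
    ChirpFuncCollection c;
    c.add(ChirpFunc("range-for", scaleRangeFor));
    c.add(ChirpFunc("indexed", scaleIndexed));
    return c;
}
```

Compared with a table of raw function pointers, the name travels with the routine, so the "Optimal Chirping Function Chosen" message needs no separate lookup.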
Title: Re: optimized sources
Post by: _heinz on 22 Oct 2007, 09:08:11 pm
To better understand the benchmark and the complexity of the optimizing process, I compiled FFTW-3.1.2 for Windows using VS2005. See attachment ---> FFTW.7z
Here is a first result ---> benchf_sse.exe -opatient 64 128 256 512 1024 2048 4096
fftw-3.1.2 benchfsse started
Problem: 64, setup: 29.83 ms, time: 2.26 us, ``mflops'': 849.14
Problem: 128, setup: 68.43 ms, time: 6.43 us, ``mflops'': 697.23
Problem: 256, setup: 192.97 ms, time: 14.04 us, ``mflops'': 729.44
Problem: 512, setup: 383.39 ms, time: 30.87 us, ``mflops'': 746.36
Problem: 1024, setup: 886.80 ms, time: 70.96 us, ``mflops'': 721.55
Problem: 2048, setup: 2.18 s, time: 155.05 us, ``mflops'': 726.49
Problem: 4096, setup: 5.75 s, time: 339.71 us, ``mflops'': 723.44
fftw-3.1.2 benchfsse ended.
------------------------------------------------------------------------------------------------------------------------
If you want to know what is going on here you can read the manual there (http://www.fftw.org/fftw3_doc/)
have fun
regards heinz   ;D
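
A note on reading the ``mflops'' column: it is a scaled speed figure and not a measured operation count. Assuming the benchmark follows the usual benchFFT convention for complex transforms, the value is 5 N log2(N) divided by the time in microseconds. The listed numbers are consistent with that, for example 5 * 1024 * 10 / 70.96 = 721.53 for N = 1024. The function name `nominal_mflops` below is made up for this sketch.

```cpp
#include <cassert>
#include <cmath>

// Nominal MFLOPS for a complex FFT of length n that took time_us microseconds,
// using the conventional operation count 5 * n * log2(n).
double nominal_mflops(double n, double time_us) {
    assert(n > 1.0 && time_us > 0.0);
    return 5.0 * n * std::log2(n) / time_us;
}
```

Because the operation count is nominal, the figure is useful for comparing sizes and builds with each other, not for judging how many floating-point instructions were actually executed.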

[attachment deleted by admin]
Title: Re: optimized sources
Post by: Jason G on 26 Oct 2007, 08:16:21 am
Nice one, I'll be taking a look at that soon too, as the FFTs (using Intel IPP at the moment) and the pulse folding (which mostly selects the AK varieties in tests) are causing a lot of cache issues on my old machines.  As I have worked with custom FFTs before, it'll be interesting to poke around in there (difficult to try with IPP  ;) ) to see how far things have come. 

The more Intel Literature I'm reading is suggesting significant speedups will be possible for my old p4's.  With specific optimising techniques a possible 3+ times speedup of certain types of loops..  I'll set my goals low and settle for a 5 to 10% crunch time improvement across angle ranges :). 

The problems I've managed to identify so far in the  inner loops in the FFT and folding are p4 specific, but apparently apply to some (or all)  of the p4 based xeons as well (and of course p4 based celerons too) .  Looking at BoincStats, If that's anything to go by,  that's one heck of a lot of active machines. :o

  I am a bit surprised that Intel's own IPP is doing this, of course they've moved on to newer faster architectures, perhaps there are some implementation specific aspects of IPP I haven't come across yet, that'll all be fun to find out ....

I still will examine the costly memcopies further but may try integrating them using some of the processing methods put forward by Joe Segur, If I get the chance by christmas time.  There are limited options for further parallelisation on my old single core beasts, and they look like a good one.

Keep going, I'm still paying attention when I can :D  Even though you are working on a different platform, the approach still helps my very gradual understanding.

Jason
Title: Re: optimized sources
Post by: Jason G on 26 Oct 2007, 10:14:58 am
LOL, Here's one in seti_analyze that disappears if going to FFTW,
 
Code: [Select]
                #ifndef USE_FFTW        // FFTW now uses out of place transforms.
                    memcpy( WorkData, &ChirpedData[CurrentSub], int(fftlen * sizeof(sah_complex)) );
                #endif

I see a few of those.

Another thought. Has anyone attempted to use that FFTW codelet generator given that only a small portion of fftw is used? I have played with OCAML before, didn't seem hard.[but it was long enough ago to have forgotten everything :D]

Jason
Title: Re: optimized sources
Post by: Josef W. Segur on 26 Oct 2007, 08:55:01 pm
Quote
LOL, Here's one in seti_analyze that disappears if going to FFTW,
 
Code: [Select]
                #ifndef USE_FFTW        // FFTW now uses out of place transforms.
                    memcpy( WorkData, &ChirpedData[CurrentSub], int(fftlen * sizeof(sah_complex)) );
                #endif

I see a few of those.

Another thought. Has anyone attempted to use that FFTW codelet generator given that only a small portion of fftw is used? I have played with OCAML before, didn't seem hard.[but it was long enough ago to have forgotten everything :D]

Jason

Yes, those memcpy calls could be eliminated if the IPP FFTs were switched to out of place. Tetsuji made that change in the official sources after 5.15, so it isn't included in our source. I've mentioned this several times, but it would be best if someone who actually works with IPP made and tested the changes.

I've thought about codelet generation, even downloaded OCAML, but have never done anything. I suspect there could be some efficiency to be gained by an FFTW function which combined the FFT and conversion to PowerSpectrum; the final FFT stage has the values needed to save the power rather than having a separate function to go through the complex array and convert it. The reversibility of a complete FFT is only needed during baseline smoothing.
                                                      Joe
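
For readers following along, the separate conversion pass described above looks roughly like the sketch below. It is only an illustration under my own assumptions: `Cplx` and `power_spectrum` are hypothetical names, not the `sah_complex` type or the `GetPowerSpectrum` / `PwrSpectrumOnly` functions from the SETI sources.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a complex sample stored as two floats.
struct Cplx { float re; float im; };

// Separate pass over the FFT output: power = re^2 + im^2 for every bin.
// A fused final FFT stage could in principle write these values directly
// and skip this second walk through the array.
void power_spectrum(const Cplx* freq, float* power, std::size_t n) {
    assert(n == 0 || (freq != nullptr && power != nullptr));
    for (std::size_t i = 0; i < n; ++i) {
        power[i] = freq[i].re * freq[i].re + freq[i].im * freq[i].im;
    }
}
```

The point of the sketch is only that the power pass touches every bin once more after the transform, which is the traffic a combined function might save.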
Title: Re: optimized sources
Post by: Jason G on 26 Oct 2007, 11:56:33 pm
Yes, what triggered the mention is my attempt to understand why the subsequent calls to IPP are out of place versions, called using the same source and destination 'in place style',  odd  :o.
     ippsFFTInv_CToC_32fc(
                        ( Ipp32fc * ) WorkData,   //pSrcDst for inplace, pSrc for outplace
                        ( Ipp32fc * ) WorkData,    // additional parameter indicating out of place call ?
                                                                       // maybe, drop it for in place, or change for out of place proper
                                                                      // and disable preceding memcpy.
                        FftSpec[FftNum],
                        FftBuf );

Having a play with MinGW over Eclipse at the moment for other work, less vendor library oriented.  I'm liking it, a big switch as I haven't used a gnu compiler for years.  Woot, 'proper make facilities'  ;D, means I'll have to take a deeper look at FFTW sometime soon.

Jason

[ Maybe when I'm back on ICC/IPP, I'll see what breaks if I comment out the memcpy and use the out of place parameter, just by changing the source argument (in all those places in seti_analyze).  It would indeed be nice if someone more IPP experienced and in the sources loop could look at Joe's observation and comment, test etc.]
Title: Re: optimized sources
Post by: Jason G on 27 Oct 2007, 10:54:45 am
Okay, just for kicks I managed to get the FFTW 3.1.2 (configure & make) scripts operational in MinGW/MSYS.  No idea what to make of the actual configuration flags (* config.h) yet though  ;D back to the docs!
Title: Re: optimized sources
Post by: _heinz on 27 Oct 2007, 03:39:09 pm
Hi Jason,
nice that you encouraged me, thanks...

The points are in process_data ------> here with some changes to the original code
// ----------------------------------------------------------------------------
//   Function:   process_data
//   Type     :   void
//   Content  :   process data, with or without transpose      
//   parameter:   none
//   last update:19.03.2007   by:seti_britta      
// ----------------------------------------------------------------------------
// Part 4.2 process data
// ----------------------------------------------------------------------------
void process_data()
   {
      extern int have_transpose;
      if (!have_transpose) ifft = 0;   // seti_britta: ifft=0, when no transpose
      for (; ifft < NumFfts; ifft++ )
            {
                CurrentSub = fftlen * ifft;
                #ifndef USE_FFTW        // FFTW now uses out of place transforms.
                    memcpy( WorkData, &ChirpedData[CurrentSub], int(fftlen * sizeof(sah_complex)) );
                #endif

               // seti_britta:move the calculation of flops to the point where fftlen get value
               // flops_form1= 4 * fftlen + 5 * fftlen * log( double(fftlen) ) / log(2.0 )
               // count_flops( 4 * fftlen + 5 * fftlen * log( double(fftlen) ) / log(2.0 ) );
               count_flops(flops_form1);   // setibritta: new statement

                #if defined( USE_IPP )
                    ippsFFTInv_CToC_32fc(
                        ( Ipp32fc * ) WorkData,
                        ( Ipp32fc * ) WorkData,
                        FftSpec[FftNum],
                        FftBuf );
                #elif defined( USE_FFTWF )
                    fftwf_execute_dft(
                        analysis_plans[FftNum],
                        &ChirpedData[CurrentSub],
                        WorkData );
                #else
                    // replace time with freq - ooura FFT
               // seti_britta: take mul with 2 out of the loop, where fftlen gets its value
                 /*   cdft( fftlen * 2, 1, WorkData, BitRevTab[FftNum], CoeffTab[FftNum] ); */
               cdft( fftlen_m2, 1, WorkData, BitRevTab[FftNum], CoeffTab[FftNum] );
                #endif
            if (have_transpose)
               {
               // BENH: new version replace freq with power
               //      does transpose as well as puts values back
               //      into WorkData (for use by findSpikes)
               GetPowerSpectrum( WorkData, PowerSpectrum, fftlen, ifft, NumFfts);
               have_transpose = false;
               }
                // replace freq with power
            // no transpose
                else PwrSpectrumOnly( WorkData, (float *)WorkData, fftlen );

                // any ETIs ?!
                // If PoT freq bin is non-negative, we are into PoT analysis
                // for this cfft pair and need not redo spike finding.
                if ( analysis_state.PoT_freq_bin == -1 )
                    {
                    count_flops( fftlen );
                    retval = FindSpikes( (float *)WorkData, fftlen, ifft, swi );
                    progress += SpikeProgressUnits( fftlen ) * ProgressUnitSize / NumFfts;
                    if ( retval ) SETIERROR( retval, "from FindSpikes" );
                    }

                // progress = ((float)icfft)/num_cfft + ((float)ifft)/(NumFfts*num_cfft);
                progress = std::min( progress, 1.0 );
                #ifdef BOINC_APP_GRAPHICS
                    if ( !nographics() )
                        {
                        if ( gbp ) gbp->rarray.add_source_row( (float *)WorkData );
                        sah_graphics->local_progress = ( (( float ) ifft + 1) / NumFfts );
                        }
                #endif
                remaining = 1.0 - ( double ) ( icfft + 1 ) / num_cfft;
                fraction_done( progress, remaining );

            }   // end for ifft < NumFfts
   } // end part 4.2 process_data
------------------------------------------------------------------
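
The idea behind `flops_form1`, `fftlen_m2` and `fftlen_half` is plain loop-invariant hoisting: anything that depends only on `fftlen` is computed once where `fftlen` gets its value. A minimal standalone sketch, assuming a hypothetical `FftConsts` struct and `make_fft_consts` helper that are not in the real sources:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical container for values that depend only on fftlen.
struct FftConsts {
    int    fftlen;
    int    fftlen_m2;    // fftlen * 2, the length passed to the ooura cdft()
    int    fftlen_half;  // fftlen / 2, used to index the WorkData strips
    double flops_form1;  // 4*N + 5*N*log2(N)
};

// Compute everything once, at the point where fftlen gets its value.
FftConsts make_fft_consts(int fftlen) {
    assert(fftlen > 0);
    FftConsts c;
    c.fftlen      = fftlen;
    c.fftlen_m2   = fftlen * 2;
    c.fftlen_half = fftlen / 2;
    c.flops_form1 = 4.0 * fftlen
                  + 5.0 * fftlen * std::log(double(fftlen)) / std::log(2.0);
    return c;
}
```

Inside the loop only the stored values are read. An optimising compiler may already hoist the integer arithmetic on its own, so the `log()` expression is probably where most of the gain is, if there is any measurable gain at all.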
regards heinz
Title: Re: optimized sources
Post by: Jason G on 28 Oct 2007, 12:11:42 am
Yeah, I still have the clunky seti_analyze function :D , but that's the same code. I think 3 or 4 places have that arrangement (didn't count properly yet).  Here's what I'm suggesting for those IPP inclined might work for a crude test (might break other non IPP / FFTW versions though, but good enough to test):

Quote
    ...
                #ifndef USE_FFTW        // FFTW now uses out of place transforms.
               // Commenting out the memcpy()
               //    memcpy( WorkData, &ChirpedData[CurrentSub], int(fftlen * sizeof(sah_complex)) );
                #endif
     ...
           // Now fix the source for out of place IPP call properly
                #if defined( USE_IPP )
                    ippsFFTInv_CToC_32fc(
           //             ( Ipp32fc * ) WorkData, // changing from this source
                       ( Ipp32fc * ) &ChirpedData[CurrentSub], // to direct source for out of place
                        ( Ipp32fc * ) WorkData, // leave as same destination
                        FftSpec[FftNum],
                        FftBuf );   

Maybe there is a trick I don't know or understand to the original code, If so then  8), but I can't see it.  Maybe in the next couple of weeks I can see what happens.

Jason

[Maybe this will be nicer on Non IPP/FFTW builds
changing this:
  #ifndef USE_FFTW        // FFTW now uses out of place transforms.
to something like this:
#ifndef USE_FFTW  || USE_IPP     // FFTW & IPP now use out of place transforms.
]
Title: Re: optimized sources
Post by: Jason G on 28 Oct 2007, 03:42:39 am
Quote
... The reversibility of a complete FFT is only needed during baseline smoothing.
                                                      Joe

Now there's a possible can of worms.  Green from the IPP tutorials I was thinking in terms of thresholding denormal data etc...to speed up IPP.  I figured the destructiveness might be a problem so left it there until I get a better handle on things. 
But seeing as different architectures are returning similar (enough) results despite, as I understand it:
            - no or limited thresholding in place for very small/big numbers in the data (could be wrong there)
            - known architecture dependence in calculation for such boundary data (randomness)
            - significant penalties in SSE for arithmetic with these numbers (these do show in vtune profiles)
            - reduced IPP performance with this denormal data
I might have to look again.
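
To make the thresholding idea concrete, here is a minimal portable sketch that flushes denormal (subnormal) floats in a buffer to zero before the data goes into the expensive arithmetic. `flush_subnormals` is a hypothetical helper of mine, not an IPP call, and whether destroying those tiny values is acceptable for the science results is exactly the open question.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Replace every subnormal value in data[0..n) with zero.
// Returns how many values were flushed.
std::size_t flush_subnormals(float* data, std::size_t n) {
    assert(n == 0 || data != nullptr);
    std::size_t flushed = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (std::fpclassify(data[i]) == FP_SUBNORMAL) {
            data[i] = 0.0f;
            ++flushed;
        }
    }
    return flushed;
}
```

The return count is there so a test build could log how often such data actually turns up in a work unit before anyone decides the cleanup is worth it.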
Title: Re: optimized sources
Post by: _heinz on 28 Oct 2007, 07:33:10 am
Hi Jason,
#ifndef USE_FFTW  || USE_IPP     // FFTW & IPP now use out of place transforms
-------------------------------------------------------------------------------------------------------------------
gives warnings when I compile --->
------ Build started: Project: seti_boinc, Configuration: Release32-NOGFX Win32 ------
Compiling...
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 15.00.20404 for 80x86
Copyright (C) Microsoft Corporation.  All rights reserved.
cl /Od /Ob2 /Oi /Ot /Oy /GT /I "." /I "../../../boinc/api" /I "../../../boinc/client/win" /I "../../../boinc/lib" /I ".." /I "glut" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\db" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\glut" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\jpeglib" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\image_libs" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX" /I "C:\I\SC\vs90\boinc" /I "C:\I\SC\vs90\boinc\api" /I "C:\I\SC\vs90\boinc\client\win" /I "C:\I\SC\vs90\boinc\lib" /D "WIN32" /D "_WIN32" /D "_WINDOWS" /D "NBOINC_APP_GRAPHICS" /D "CLIENT" /D "_MT" /D "USE_IPP" /D "USE_SSE2" /D "_DEBUG" /D "_VC80_UPGRADE=0x0600" /D "_MBCS" /GF /Gm /EHsc /MTd /Zp16 /Gy /Fp".\Release/seti_boinc.pch" /Fo".\Release32-NOGFX\\" /Fd".\Release32-NOGFX\vc90.pdb" /FR".\Release32-NOGFX\\" /W3 /c /Wp64 /Zi /TP "..\analyzeFuncs.cpp"
analyzeFuncs.cpp
-----IPP-----
-----SSE2-----
..\analyzeFuncs.cpp(694) : warning C4067: unexpected tokens following preprocessor directive - expected a newline
..\analyzeFuncs.cpp(1187) : warning C4067: unexpected tokens following preprocessor directive - expected a newline
Build log was saved at "file://c:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\BuildLog.htm"
seti_boinc - 0 error(s), 2 warning(s)
========== Build: 1 succeeded, 0 failed, 0 up-to-date, 0 skipped ==========
so I will use  ---->
           #ifndef USE_FFTWF   // FFTW & IPP now use out of place transforms
      // memcopy no longer necessary
        //    memcpy( DataOutChunk, DataInChunk, int(NumPointsInChunk * sizeof(sah_complex)) );
        #endif
-----------------------------------------------------------------------------------------------------------
this block is empty so we could delete it, but I left it in there for documentation
it compiles fine now without any warnings --->
analyzeFuncs.cpp
-----IPP-----
-----SSE2-----
Build log was saved at "file://c:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\BuildLog.htm"
seti_boinc - 0 error(s), 0 warning(s)
========== Build: 1 succeeded, 0 failed, 0 up-to-date, 0 skipped ==========
-----------------------------------------------------------------------------------------------------------
There are exactly 3 places in analyzeFuncs.cpp
The first is in do_transpose
// ----------------------------------------------------------------------------
//   Function:   do_transpose()
//   Type     :   void
//   Content  :   do transpose
//   parameter:   none
//   last update:19.03.2007   by:seti_britta      
// ----------------------------------------------------------------------------
// Part 4.2.1 do transpose, use strips of 4
// ----------------------------------------------------------------------------
void do_transpose()
{
   extern int have_transpose;
   for ( ifft = 0; ifft < NumFfts - 3; ifft += 4 )
      {
            // do transpose
             for ( int iC = 0; iC < 4; iC++ )
                 {
                    CurrentSub = fftlen * (ifft + iC);
                /*   sah_complex *WorkArea = &WorkData[iC * fftlen / 2];*/  // assume sah_complex 2 floats
            // seti_britta: do fftlen / 2 out of the loop, where fftlen get its value
               sah_complex *WorkArea = &WorkData[iC * fftlen_half];  // assume sah_complex 2 floats
               #ifndef USE_FFTW      // FFTW & IPP now use out of place transforms
               // memcopy no longer necessary,
                    //    memcpy( WorkArea, &ChirpedData[CurrentSub], int(fftlen * sizeof(sah_complex)) );
                    #endif

                    #if defined( USE_IPP )
                        ippsFFTInv_CToC_32fc(
                            ( Ipp32fc * ) &ChirpedData[CurrentSub], // to direct source for out of place
                     ( Ipp32fc * ) WorkArea, // leave as same destination
                     FftSpec[FftNum],
                     FftBuf );
                    #elif defined( USE_FFTWF )
                        fftwf_execute_dft( analysis_plans[FftNum], &ChirpedData[CurrentSub], WorkArea );
                    #else // replace time with freq - ooura FFT
               // seti_britta: take mult with 2 out of the loop, where fftlen gets its value
                    /*  cdft( fftlen * 2, 1, WorkArea, BitRevTab[FftNum], CoeffTab[FftNum] ); */
                  cdft( fftlen_m2, 1, WorkArea, BitRevTab[FftNum], CoeffTab[FftNum] );
                    #endif

                    // replace freq with power
                    PwrSpectrumOnly( WorkArea, (float *)WorkArea, fftlen );
               // seti_britta:move the calculation of flops out of loop, where fftlen get value
                    // count_flops( 4 * fftlen + 5 * fftlen * log( double(fftlen) ) / log(2.0 ) );
               count_flops(flops_form1);   // setibritta: new statement

                    // any ETIs ?!
                    // If PoT freq bin is non-negative, we are into PoT analysis
                    // for this cfft pair and need not redo spike finding.
                    if ( analysis_state.PoT_freq_bin == -1 )
                        {
                        count_flops( fftlen );
                        retval = FindSpikes( (float *)WorkArea, fftlen, ifft + iC, swi );
                        progress += SpikeProgressUnits( fftlen ) * ProgressUnitSize / NumFfts;
                        if ( retval ) SETIERROR( retval, "from FindSpikes" );
                        }

                    // progress = ((float)icfft)/num_cfft + ((float)ifft)/(NumFfts*num_cfft);
                    progress = std::min( progress, 1.0 );
                    #ifdef BOINC_APP_GRAPHICS
                        if ( !nographics() )
                            {
                            if ( gbp ) gbp->rarray.add_source_row( (float *)WorkData );
                            sah_graphics->local_progress = ( (( float ) ifft + 1) / NumFfts );
                            }
                    #endif
                    remaining = 1.0 - ( double ) ( icfft + 1 ) / num_cfft;
                    fraction_done( progress, remaining );

                 } // end ic < 4
                TransposeStrip(fftlen, NumFfts, ifft, (float *)WorkData, PowerSpectrum);
        } // end for ifft < NumFfts - 3
      // transpose done
      have_transpose = true;   // seti_britta: tell process_data that transpose is done
}   // end of do_transpose
-------------------------------------------------------------------------------------------------------------------
The second is in process_data
The third is in v_BaseLineSmooth
        DataInChunk = &( DataIn[TimeChunk * NumPointsInChunk] );
        #ifndef USE_FFTWF   // FFTW & IPP now use out of place transforms
      // memcopy no longer necessary
        //    memcpy( DataOutChunk, DataInChunk, int(NumPointsInChunk * sizeof(sah_complex)) );
        #endif

        // transform to freq
        #ifdef USE_IPP
            ippsFFTInv_CToC_32fc(
                // ( Ipp32fc * ) DataOutChunk,
            ( Ipp32fc * ) DataInChunk,  // to direct source for out of place
                ( Ipp32fc * ) DataOutChunk, // leave as same destination
                FftSpec,
                NULL );
--------------------------------------------------------------------------------------------------------------------------
A fourth is in benchmark.cpp,
but I have no idea if there is anything to do there.
line 618 ff
   for(loops = 0; loops < 25 && (end_cyc-total_run)< MAX_CYCLES; loops++)
      {
      if(pre_test == zero_out)   memset( out_buf, 0, test_size );
      if(pre_test == fill_in)      memcpy( out_buf, workBuf, test_size );
      ramming_speed();
      cycles = cycleCount();
      switch ( bench_list[idx].token )
         {
         case _FFT:
            #if defined( USE_IPP )
               ippsFFTInv_CToC_32fc(
                  ( Ipp32fc * ) out_buf,
                  ( Ipp32fc * ) out_buf,
                  FftSpec,
                  NULL );
            #endif //seti_britta:
            #if defined( USE_FFTWF )
            fftwf_execute_dft( da_fft_plan, (sah_complex *)&in_buf[0], (sah_complex *)&out_buf );
            #endif
            break;
         case SUM2_TBL:
------------------------------------------------------------------------------------------
What do you think about this in benchmark.cpp ?
regards heinz
Title: Re: optimized sources
Post by: Jason G on 28 Oct 2007, 08:04:58 am
Yes that will be broken, I'll look up the correct preprocessor format, and post it here, as soon as my machine stops choking on the MinGW FFTW build (need bigger machine :( ).

[Later:  Here is the better preprocessor format, and it is FFTWF not FFTW, I think I need spectacles, a faster computer ,more practice with preprocessor directives, and maybe a beer! :D ]

#if !(defined(USE_IPP) | defined(USE_FFTWF))
   //statements to be used if neither FFTW or IPP
   //memcopy is here, this should only run for builds with no IPP or FFTW .... or if the following IPP call wasn't updated
        memcpy(.......,.......)
#endif
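
A small standalone check of the directive forms, assuming nothing beyond the standard preprocessor. `#ifndef` accepts exactly one macro name, which is why `#ifndef USE_FFTW || USE_IPP` draws warning C4067 and only tests `USE_FFTW`; combining conditions needs `#if` with `defined()`. The logical `||` is the conventional operator, although the bitwise `|` used above evaluates the same way here because `defined()` yields 0 or 1. The function name `needs_memcpy` is made up for the example.

```cpp
#include <cassert>

#define USE_IPP 1   // pretend this is an IPP build; USE_FFTWF is left undefined

// Returns true when the build has neither IPP nor FFTWF,
// i.e. when the in-place fallback path still needs the memcpy.
bool needs_memcpy() {
#if !(defined(USE_IPP) || defined(USE_FFTWF))
    return true;
#else
    return false;
#endif
}
```

With `USE_IPP` defined the function returns false; remove the `#define` and it returns true, which is the behaviour wanted for the ooura fallback.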
Title: Re: optimized sources
Post by: Jason G on 28 Oct 2007, 08:34:16 am
Quote
...
What do you think about this in benchmark.cpp ?
regards heinz

Looking at the places you show now   :)

[Later:  In Benchmark.cpp It is also an out of place IPP inverse FFT call, but with same source and destination parameters.
  ..NOTE:  I am  still wondering if they must have done it like that on purpose but still don't know why]

If it is meant to be in place FFT it should be:
              ippsFFTInv_CToC_32fc(
                                           
                  ( Ipp32fc * ) out_buf, // This is both source and destination, don't need second time
                                                  // usually  needs to come from a memcpy first to not corrupt source data.
                                                 // benchmark.cpp is special case because it might be zeros or filled buffer
                  FftSpec,
                  NULL );

If it is meant to be out of place it should be: [ I think you cannot use this for Benchmark.cpp , we might use zero fill]
              ippsFFTInv_CToC_32fc(
                  ( Ipp32fc * ) workBuf, // This is the source data, this is not overwritten
                  ( Ipp32fc * ) out_buf, // This is some other Buffer destination
                                                 // no memcpy required
                  FftSpec,
                  NULL );
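
The difference between the two calling styles can be shown without IPP at all. The sketch below uses a naive O(n^2) unscaled inverse DFT as a stand-in for the library call; `idft_out_of_place` and `idft_in_place` are hypothetical names, and this is not how IPP works internally. As far as I remember the IPP manual, the real in-place flavour is a separately named function with an `_I` suffix rather than the same name with one pointer dropped, so check the manual before relying on either form.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

typedef std::complex<float> Cf;

// Out of place: reads src, writes dst, src is left untouched.
void idft_out_of_place(const Cf* src, Cf* dst, std::size_t n) {
    assert(src != dst);  // this naive version cannot alias its input
    const float two_pi = 6.28318530717958647692f;
    for (std::size_t k = 0; k < n; ++k) {
        Cf sum(0.0f, 0.0f);
        for (std::size_t t = 0; t < n; ++t) {
            float ang = two_pi * float(k * t) / float(n);
            sum += src[t] * Cf(std::cos(ang), std::sin(ang));
        }
        dst[k] = sum;
    }
}

// In place: the caller's buffer is overwritten, so the original data
// has to be copied somewhere first if it is needed again.
void idft_in_place(Cf* buf, std::size_t n) {
    if (n == 0) return;
    std::vector<Cf> tmp(buf, buf + n);   // plays the role of the memcpy
    idft_out_of_place(tmp.data(), buf, n);
}
```

The copy inside `idft_in_place` is the cost the thread is trying to get rid of: if the source buffer can be handed to the transform directly, the copy and the extra cache traffic disappear.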
Title: Re: optimized sources
Post by: _heinz on 28 Oct 2007, 09:36:47 am
Hi Jason,
benchmark.cpp ------>
-----------------------------------------------------------------------------------------------
      switch ( bench_list[idx].token )
         {
         case _FFT:
            #if defined( USE_IPP )
               ippsFFTInv_CToC_32fc(
                  ( Ipp32fc * ) workBuf,   // This is the source data, this is not overwritten
                  ( Ipp32fc * ) out_buf,   // This is some other Buffer destination
                                    // no memcpy required
                  FftSpec,
                  NULL );
            #endif //seti_britta:
            #if defined( USE_FFTWF )
            fftwf_execute_dft( da_fft_plan, (sah_complex *)&in_buf[0], (sah_complex *)&out_buf );
            #endif
--------------------------------------------------------------------------
it compiles well ------>
Copyright (C) Microsoft Corporation.  All rights reserved.
cl /Od /Ob2 /Oi /Ot /Oy /GT /I "../../../boinc/win_build" /I ".." /I "..\.." /I "..\..\..\boinc\lib" /I "../../../boinc/api" /I "../../db" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer" /I "C:\I\INTEL\IPP\5.2_beta\ia32\tools\staticlib" /I "C:\I\INTEL\IPP\5.2_beta\ia32\include" /D "USE_IPP" /D "USE_SSE2" /D "WIN32" /D "_WIN32" /D "_WINDOWS" /D "_CONSOLE" /D "_DEBUG" /D "_LIB" /D "_MT" /D "CLIENT" /D "NBOINC_APP_GRAPHICS" /D "_UNICODE" /D "UNICODE" /D "_VC80_UPGRADE=0x0710" /D "_MBCS" /GF /FD /EHsc /MTd /Zp16 /Gy /Fo"Release32-NOGFX\\" /Fd"Release32-NOGFX\vc90.pdb" /W3 /c /Wp64 /Zi /Gd /TP /FI "win-config.h" ".\benchmark.cpp"
benchmark.cpp
-----IPP-----
-----SSE2-----
-----ipp-----
-----sse2-----
Build log was saved at "file://c:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer\Release32-NOGFX\BuildLog.htm"
Optimizer - 0 error(s), 0 warning(s)
========== Build: 1 succeeded, 0 failed, 0 up-to-date, 0 skipped ==========
tried this as you presented it ---->
            #if !(defined(USE_IPP) | defined(USE_FFTWF))
                  //statements to be used if neither FFTW or IPP
                  //memcopy is here, this should only run for builds with no IPP or FFTW .... or if the following IPP call wasn't updated
                        memcpy( WorkArea, &ChirpedData[CurrentSub], int(fftlen * sizeof(sah_complex)) );
                    #endif
---------------------------------------------------------------------------------------
analyzeFuncs.cpp
-----IPP-----
-----SSE2-----
..\analyzeFuncs.cpp(630) : error C2065: 'WorkArea' : undeclared identifier
..\analyzeFuncs.cpp(642) : error C2065: 'WorkArea' : undeclared identifier
..\analyzeFuncs.cpp(642) : error C2065: 'WorkArea' : undeclared identifier
..\analyzeFuncs.cpp(653) : error C2065: 'WorkArea' : undeclared identifier
--------------------------------------------------------------------------------------------------------
627                     #if defined( USE_IPP )
628                         ippsFFTInv_CToC_32fc(
629                            ( Ipp32fc * ) &ChirpedData[CurrentSub], // to direct source for out of place
630                     ( Ipp32fc * ) WorkArea, // leave as same destination
                     FftSpec[FftNum],
                     FftBuf );

-------------------------------------------------------------------------------------------------------------------------------
hmm.... will use the old statement then it compiles....
heinz
Title: Re: optimized sources
Post by: Jason G on 28 Oct 2007, 09:44:44 am
In Benchmark.cpp, I am worried that will not work in the case we use a zero fill instead of workBuf as the source.

In AnalyzeFuncs, I am looking where/ why your WorkArea has gone  :o
Title: Re: optimized sources
Post by: Jason G on 28 Oct 2007, 09:53:54 am
Are you missing this line above the   memcpy #if block in that one place?

           sah_complex *WorkArea = &WorkData[iC * fftlen / 2];  // assume sah_complex 2 floats
          #if !(defined(USE_IPP) | defined(USE_FFTWF))
                  //statements to be used if neither FFTW or IPP
                   memcpy( WorkArea, &ChirpedData[CurrentSub], int(fftlen * sizeof(sah_complex)) );
          #endif

You have different source to me! ( different line numbers! ) uh oh

[PS: If it is breaking your source to try this then I suggest to stop and reverse :D It is a nice idea that may or may not show any benefit in the long run, but needs more planning, consideration and testing before wholesale code changes are made. Baby steps are better IMO, Besides,  I break enough of my own code  ;)  ]
Title: Re: optimized sources
Post by: _heinz on 28 Oct 2007, 11:02:40 am
Merci for the comments,
in this way it compiles ---->
------------------------------------------------------------------------------------------------------
                    CurrentSub = fftlen * (ifft + iC);
                /*   sah_complex *WorkArea = &WorkData[iC * fftlen / 2];*/  // assume sah_complex 2 floats
            // seti_britta: do fftlen / 2 out of the loop, where fftlen get its value
               sah_complex *WorkArea = &WorkData[iC * fftlen_half];  // assume sah_complex 2 floats
               #ifndef USE_FFTW      // FFTW & IPP now use out of place transforms
               // memcopy no longer necessary,
                    //    memcpy( WorkArea, &ChirpedData[CurrentSub], int(fftlen * sizeof(sah_complex)) );
                    #endif

                    #if defined( USE_IPP )
                        ippsFFTInv_CToC_32fc(
                            ( Ipp32fc * ) &ChirpedData[CurrentSub], // to direct source for out of place
                     ( Ipp32fc * ) WorkArea, // leave as same destination
                     FftSpec[FftNum],
                     FftBuf );
                    #elif defined( USE_FFTWF )
                        fftwf_execute_dft( analysis_plans[FftNum], &ChirpedData[CurrentSub], WorkArea );
                    #else // replace time with freq - ooura FFT
               // seti_britta: take mult with 2 out of the loop, where fftlen gets its value
                    /*  cdft( fftlen * 2, 1, WorkArea, BitRevTab[FftNum], CoeffTab[FftNum] ); */
                  cdft( fftlen_m2, 1, WorkArea, BitRevTab[FftNum], CoeffTab[FftNum] );
                    #endif
--------------------------------------------------------------------------------------------------------
the yellow line was there....
yes, very different line numbers, I gave analyzeFuncs.cpp a new structure......
I will have a look at benchmark again.... make a trigger for the case we use a zero fill.
heinz

Title: Re: optimized sources
Post by: Jason G on 28 Oct 2007, 11:13:25 am
Quote
the yellow line was there....
    and still "undefined" variable WorkArea?  That is weird  :o

Quote
yes very different linenumbers, I give the analyzeFuncs.cpp a new structure......
I will have a look at benchmark again.... make a trigger for the case we use a zero fill.
heinz
Ahh that's right, the improved model you showed me ... that's some good stuff mmm.

I'll take a look in mine at the WorkArea part. I think it may be in an inner loop, and so may be the most important place if something is to show a change in tests.

Jason
Title: Re: optimized sources
Post by: Jason G on 28 Oct 2007, 11:34:50 am
This block compiles on mine: (For comparison, I can see no major functional difference to yours :D )
----------
             CurrentSub = fftlen * (ifft + iC);
             sah_complex *WorkArea = &WorkData[iC * fftlen / 2];  // assume sah_complex 2 floats
          #if !(defined(USE_IPP) | defined(USE_FFTWF)) // makes ,memcpy inactive
                       memcpy( WorkArea, &ChirpedData[CurrentSub], int(fftlen * sizeof(sah_complex)) );
          #endif

             #if defined( USE_IPP )
                        ippsFFTInv_CToC_32fc(
                  ( Ipp32fc * ) &ChirpedData[CurrentSub], // Source
                     ( Ipp32fc * ) WorkArea, //Destination
                     FftSpec[FftNum],
                     FftBuf );
             #elif defined( USE_FFTWF )
                        fftwf_execute_dft( analysis_plans[FftNum], &ChirpedData[CurrentSub], WorkArea );
             #else // replace time with freq - ooura FFT
                        cdft( fftlen * 2, 1, WorkArea, BitRevTab[FftNum], CoeffTab[FftNum] );
             #endif

----------

I did notice it went haywire if I missed out a ( Ipp32fc * ) typecast.
Title: Re: optimized sources
Post by: _heinz on 28 Oct 2007, 12:14:05 pm
yes, it compiles on mine too --->
analyzeFuncs.cpp
-----IPP-----
-----SSE2-----
Build log was saved at "file://c:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\BuildLog.htm"
seti_boinc - 0 error(s), 0 warning(s)
----------------------------------------------------------------------------------------
heinz
Title: Re: optimized sources
Post by: Jason G on 28 Oct 2007, 12:29:54 pm
Ahh good one  ;D,  I'm thinking that this
 new way:
       --- Using no memcopy
       --- Using IPP Function as intended

is better than the old way:
      --- Using a memcopy (even an optimised one, which I was looking at)
      --- Using IPP function in a weird way

Of course only a test can show if this makes any speed difference.  It will be a while before I can look at a rebuild, as I have more schoolwork and have to give some tutoring this week.  Even if it is slower I don't mind, because it has still helped me to understand a small piece more of the code. The next step for me after testing this would probably be to look at Joe's even better suggestions. There are many now!

Thanks for trying this and keep plugging away !

Back later in the week!

Jason

Title: Re: optimized sources
Post by: _heinz on 28 Oct 2007, 12:55:20 pm
changed benchmark.cpp ----->
--------------------------------------------------------------------------------------------------------
   for(loops = 0; loops < 25 && (end_cyc-total_run)< MAX_CYCLES; loops++)
      {
      if(pre_test == zero_out)   memset( out_buf, 0, test_size );
      if(pre_test == fill_in)      memcpy( out_buf, workBuf, test_size );
      ramming_speed();
      cycles = cycleCount();
      switch ( bench_list[idx].token )
         {
         case _FFT:
            #if defined( USE_IPP )
            if(pre_test == zero_out)
            {
               ippsFFTInv_CToC_32fc(
                  ( Ipp32fc * ) out_buf,
                  ( Ipp32fc * ) out_buf,
                  FftSpec,
                  NULL );
            }
            else
            {
               ippsFFTInv_CToC_32fc(
                  ( Ipp32fc * ) workBuf,   // This is the source data, this is not overwritten
                  ( Ipp32fc * ) out_buf,   // This is some other Buffer destination
                                    // no memcpy required
                  FftSpec,
                  NULL );
            }
            #endif //seti_britta:
            #if defined( USE_FFTWF )
            fftwf_execute_dft( da_fft_plan, (sah_complex *)&in_buf[0], (sah_complex *)&out_buf );
            #endif
            break;
-----------------------------------------------------------------------------------------------------------------------------
it compiles well --->
benchmark.cpp
-----IPP-----
-----SSE2-----
-----ipp-----
-----sse2-----
Build log was saved at "file://c:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer\Release32-NOGFX\BuildLog.htm"
Optimizer - 0 error(s), 0 warning(s)
-------------------------------------------------------------------------------------------------------------------------------
will try this and see if it works well....
see you again here
regards heinz
Title: Re: optimized sources
Post by: Jason G on 28 Oct 2007, 01:22:56 pm
ahah I see.... now that IPP call is "In Place"  You can do this:
   
...
       if(pre_test == zero_out)
            {
               ippsFFTInv_CToC_32fc(
             //     ( Ipp32fc * ) out_buf,  // Commented out this to make it inplace
                  ( Ipp32fc * ) out_buf, // This is both source and destination
                  FftSpec,
                  NULL );
            }
...

Whether it makes any difference is another question :D
questions I have are:
        -    Why benchmark an array of zeroes ?
        -    If zeroed array needs to be benched , why not test it 'fully' out of place (separate src/dest buffer like below)?
Title: Re: optimized sources
Post by: _heinz on 28 Oct 2007, 02:02:56 pm

questions I have are:
        -    Why benchmark an array of zeroes ?
        -    If zeroed array needs to be benched , why not test it 'fully' out of place (separate src/dest buffer like below)?

hmm... maybe Alex Kan or Joe has a good answer
Title: Re: optimized sources
Post by: Josef W. Segur on 29 Oct 2007, 10:39:15 am

questions I have are:
        -    Why benchmark an array of zeroes ?
        -    If zeroed array needs to be benched , why not test it 'fully' out of place (separate src/dest buffer like below)?

hmm... maybe Alex Kan or Joe has a good answer

The 2.2B benchmark.cpp source doesn't set pre_test to zero_out anyplace. Setting pre_test = fill_in makes sense for the in place transform so it always works on the same random data, that's not needed for out of place. But the FFT benchmark is timing only, and wasted time at that except in standalone runs with -bench or -verbose, since it is not used to choose a "best" variant. The lunatics.at 2.4 builds don't run the FFT benchmark test, though Crunch3r's 2.4V builds which use IPP FFTs do.
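
A minimal sketch of why the refill matters when timing an in place call. The kernel halve_in_place and the helper run_in_place are made-up names standing in for the FFT and the benchmark loop, nothing here is taken from benchmark.cpp:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical in-place kernel standing in for an in-place FFT:
// it overwrites its input, so repeated calls keep changing the data.
void halve_in_place(float* buf, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) buf[i] *= 0.5f;
}

// Runs the kernel `loops` times and returns the final buffer.
// With refill == true the work buffer is restored from `reference`
// before every run, so each timed iteration sees identical input.
std::vector<float> run_in_place(const std::vector<float>& reference,
                                int loops, bool refill) {
    assert(loops > 0 && !reference.empty());
    std::vector<float> work(reference);
    for (int i = 0; i < loops; ++i) {
        if (refill) {
            std::memcpy(work.data(), reference.data(),
                        reference.size() * sizeof(float));
        }
        halve_in_place(work.data(), work.size());
    }
    return work;
}
```

Without the refill, each pass works on the output of the previous pass; with it, every pass starts from the same input, which is the point of fill_in for an in place transform. An out of place call never touches its source, so it needs no refill.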

I don't know why Ben Herndon used the out of place form of  parameters in the ippsFFTInv_CToC_32fc() calls, but he may have checked the actual code produced and determined that was slightly more efficient.
                                                       Joe
Title: Re: optimized sources
Post by: Jason G on 29 Oct 2007, 11:44:32 am
I don't know why Ben Herndon used the out of place form of  parameters in the ippsFFTInv_CToC_32fc() calls, but he may have checked the actual code produced and determined that was slightly more efficient.
                                                       Joe
I wracked my brain about this, and ultimately came to a similar (though more convoluted and speculative) conclusion.  It would make sense to me if an explicit out of place call could make better use of the prefetch, cache and parallelism mechanisms we have discussed in a different context.  An explicit in place call could not (so far as I can see for now) because of read/write dependencies.

After considering that, another possibility presented itself:
    for the same reasons, as originally presented the memcopy followed by the out of place form call (with inplace parameters), may simply be faster than 'true out of place' way we're playing with ::).  If so, I suspect a 'cache doubling effect' from using same source & dest. 

The flipside is that if that effect shows verifiably then it might even  indicate the particular calls are not using streaming writes to start with... possibly bringing your hybridised codelet phased processing screaming to a new sense of urgency.

More speculation than hard data at the moment, I'll think about some small simple external tests for a while and stew on it for a couple of weeks  ;)

Jason
Title: Re: optimized sources
Post by: _heinz on 01 Nov 2007, 05:13:26 pm
ahah I see.... now that IPP call is "In Place"  You can do this:
   
...
       if(pre_test == zero_out)
            {
               ippsFFTInv_CToC_32fc(
             //     ( Ipp32fc * ) out_buf,  // Commented out this to make it inplace
                  ( Ipp32fc * ) out_buf, // This is both source and destination
                  FftSpec,
                  NULL );
            }

if we do this we get an error message ---->
.\benchmark.cpp(634) : error C2660: 'w7_ippsFFTInv_CToC_32fc' : function does not take 3 arguments
so leave it as it is --->
            if(pre_test == zero_out)
            {
               ippsFFTInv_CToC_32fc(
                  ( Ipp32fc * ) out_buf,
                  ( Ipp32fc * ) out_buf,
                  FftSpec,
                  NULL );
            }
--------------------------------------------
so it compiles
heinz
Title: Re: optimized sources
Post by: Jason G on 01 Nov 2007, 06:00:14 pm
so it compiles
heinz

Yes, as we have discovered before I must need my eyes checked  ;D and it would make sense , if it was ever used in the zero fill context, to leave it using  the same form as might occur in a real analysis anyway.

For the sake of information - Here is the form for out of place Inverse FFT  (as exists):
    IppStatus ippsFFTInv_CToC_32fc(
                 const Ipp32fc* pSrc,
                 Ipp32fc* pDst,
                 const IppsFFTSpec_C_32fc* pFFTSpec,
                 Ipp8u* pBuffer);

And Here is the form for in place :
    IppStatus ippsFFTInv_CToC_32fc_I(
                 Ipp32fc* pSrcDst,
                 const IppsFFTSpec_C_32fc* pFFTSpec,
                 Ipp8u* pBuffer);
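
To make the difference between the two forms concrete without needing IPP, here is a rough sketch using a naive inverse DFT. idft_out_of_place and idft_in_place are hypothetical stand-ins, not IPP functions, and the 1/n scaling is my own choice for the example:

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<float>;

// Hypothetical out-of-place form: reads src, writes dst.
// src is left untouched, so the caller needs no memcpy to preserve it.
void idft_out_of_place(const cplx* src, cplx* dst, std::size_t n) {
    if (n == 0) return;
    assert(src != dst);  // this naive version cannot alias its buffers
    const double two_pi = 6.283185307179586;
    for (std::size_t k = 0; k < n; ++k) {
        std::complex<double> acc(0.0, 0.0);
        for (std::size_t t = 0; t < n; ++t) {
            const double angle = two_pi * double((k * t) % n) / double(n);
            acc += std::complex<double>(src[t]) *
                   std::complex<double>(std::cos(angle), std::sin(angle));
        }
        dst[k] = cplx(acc / double(n));  // 1/n scaling, an assumption of this sketch
    }
}

// Hypothetical in-place form: one buffer is both input and output.
// The naive algorithm needs a scratch copy to get away with that.
void idft_in_place(cplx* buf, std::size_t n) {
    if (n == 0) return;
    std::vector<cplx> tmp(buf, buf + n);
    idft_out_of_place(tmp.data(), buf, n);
}
```

The sketch only shows the calling shapes: three arguments for in place, separate source and destination for out of place. It says nothing about which one IPP runs faster.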

I am currently learning much about what is connected to what by trying to separate out the benchmark (for exploratory purposes).  Piece by piece it connects to almost the whole codebase, Still a few external references to track down, but I may end up with a stripped down custom testbed for examining function of different algorithms, libraries & optimised functions.

The main reason for this unnecessary but educational exploration is, I may wish to try and see actual differences between the FFT libraries, different compilers and flags, without touching my main copy of the code anymore.  Also I am interested to see how close to ideal the forward and inverse transforms are when a 'Maximum Length Sequence' is applied as input, rather than zeroes or random data (I hope I'll get a constant power spectrum, with no spikes etc...We'll see :D )
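
A small sketch of the Maximum Length Sequence idea, assuming a 4-bit LFSR (period 15) and a naive DFT of the same length rather than any of the real FFT libraries. The function names are made up for the example. For a +1/-1 MLS of length N the power should be N+1 in every bin except bin 0:

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// 4-bit Fibonacci LFSR; this tap choice gives the maximal period 2^4 - 1 = 15.
// Output bits are mapped to +1 / -1 samples.
std::vector<double> make_mls15() {
    std::vector<double> seq;
    unsigned state = 1u;  // any non-zero seed works
    for (int i = 0; i < 15; ++i) {
        seq.push_back((state & 1u) ? 1.0 : -1.0);
        const unsigned feedback = (state ^ (state >> 1)) & 1u;
        state = (state >> 1) | (feedback << 3);
    }
    return seq;
}

// Naive DFT power spectrum |X[k]|^2 (O(n^2), for checking only).
std::vector<double> power_spectrum(const std::vector<double>& x) {
    assert(!x.empty());
    const double two_pi = 6.283185307179586;
    const std::size_t n = x.size();
    std::vector<double> power(n);
    for (std::size_t k = 0; k < n; ++k) {
        std::complex<double> acc(0.0, 0.0);
        for (std::size_t t = 0; t < n; ++t) {
            const double angle = -two_pi * double((k * t) % n) / double(n);
            acc += x[t] * std::polar(1.0, angle);
        }
        power[k] = std::norm(acc);
    }
    return power;
}
```

The flat spectrum only holds exactly for a transform of length 2^n - 1, so with power of two FFT lengths the sequence would have to be padded or truncated and the result would only be approximately flat.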

Jason
Title: Re: optimized sources
Post by: _heinz on 03 Nov 2007, 05:32:00 am
Hi Jason,
here you see the output of ET, which I use to measure code pieces of two functions p1, p2
--------------------------------------------------------------------------------------------------------------------
ET v1.0 test seti
-------------------
Timer Frequency in:
Hz  =       3579545
MHz =       3.57955
GHz =       0.00358

Start Time =    1080132967465 Ticks
Stop Time  =    1080134441029 Ticks

Duration in Ticks   =  1473564
Duration in seconds =  0.4116623760841
--------------------------------------
Start Time =    1080134443291 Ticks
Stop Time  =    1080138377735 Ticks

Duration in Ticks   =  3934444
Duration in seconds =  1.0991463998916
--------------------------------------
        P1 = 1473564
        P2 = 3934444
        dif= 2460880

Solution:P1 is faster than P2
Press the Enter Key!
------------------------------------------------------------------------------------------------
so we see the success without running a test WU....
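
The conversion behind the "Duration in seconds" lines can be sketched like this. ticks_to_seconds is a made-up helper, not part of ET; it only assumes seconds = ticks / timer frequency:

```cpp
#include <cassert>
#include <cstdint>

// Convert a tick interval to seconds, given the timer frequency in Hz.
double ticks_to_seconds(std::int64_t start, std::int64_t stop,
                        std::int64_t freq_hz) {
    assert(freq_hz > 0 && stop >= start);
    return double(stop - start) / double(freq_hz);
}
```

With the values above, ticks_to_seconds(1080132967465, 1080134441029, 3579545) reproduces the P1 duration of about 0.41 seconds.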

heinz
Title: Re: optimized sources
Post by: Jason G on 03 Nov 2007, 07:37:13 am
Cool , thanks for the links by PM.  could be quite handy for the things I intend to be looking at soon.... but LOL, where is etimer.lib file that is discussed in the intel site ? The link at the end of the etimer article is giving me some 3d transform program files INSTEAD  :o , if I can't find it I probably should let Intel know their link is broken ....

[ LOL now they fixed it ! :D, maybe they read Lunatics]
Title: Re: optimized sources
Post by: _heinz on 03 Nov 2007, 10:47:59 am
maybe....we are one of the most accessed, now more than 22 000 ...... ;D


Title: Re: optimized sources
Post by: Jason G on 03 Nov 2007, 01:34:41 pm
'Tis truly an Epical Thread  :D.... But Wait there's more! ....

Using the timers I ran some big loop math array test pieces to establish the best optimisation configurations on my old p4 Northwood::

With everything else equal:
the xW sse2 setting I've been using all along = 14.15 secs (repeated runs to make sure)
the xN sse2 setting I wanted to test properly = 12.8 secs (repeated runs to make sure)

That makes xN builds nearly 10% faster on my old clunker with looping math code!

This means that:
 The good news is  I may already have found a way to achieve my 5 to 10% speed improvement goal for this machine! (without doing much at all.... Hmmm ...Better start thinking of a new goal! )
 ;D

Bad news is that I now have to go and rebuild the seti projects with my new settings to see if it will work ... and no time this week!  :(


Surprise Surprise, a  QxN build is faster on my Northwood :P
LOL


       
Title: Re: optimized sources
Post by: _heinz on 03 Nov 2007, 02:55:37 pm
lol... make a copy of your current seti folder and set it parallel to the boinc folder...so you need not touch the old one.

Title: Re: optimized sources
Post by: _heinz on 05 Nov 2007, 11:53:36 am
Surprise Surprise, a  QxN build is faster on my Northwood :P
LOL     
have a Northwood too  --->
CPU(s)   
Number of CPUs 1
 
Name Intel Pentium 4
Code Name Northwood
Specification Intel(R) Pentium(R) 4 CPU 2.66GHz
Family / Model / Stepping F 2 7
Extended Family / Model 0 0
Brand ID 9
Package mPGA-478
Core Stepping C1
Technology 0.13 um
Supported Instructions Sets MMX, SSE, SSE2
CPU Clock Speed 2672.8 MHz
Clock multiplier x 20.0
Front Side Bus Frequency 133.6 MHz
Bus Speed 534.6 MHz
L1 Data Cache 8 KBytes, 4-way set associative, 64 Bytes line size
L1 Trace Cache 12 Kuops, 8-way set associative
L2 Cache 512 KBytes, 8-way set associative, 64 Bytes line size
L2 Speed 2672.8 MHz (Full)
L2 Location On Chip
L2 Data Prefetch Logic yes
L2 Bus Width 256 bits
-----------------------------------------------------------------------------------------
Let us speed up the old machines --->  ;D


Title: Re: optimized sources
Post by: Jason G on 05 Nov 2007, 12:21:38 pm
Boincstats Host cpus, top 10 highest number on seti@home:
Pos.,  CPU, #, Total Credit

1    Intel(R) Pentium(R) 4 CPU 3.00GHz     104,449     1,920,980,979.29    
2    Intel(R) Pentium(R) 4 CPU 2.80GHz     88,848     1,254,181,274.59    
3    Intel(R) Pentium(R) 4 CPU 2.40GHz     57,309     633,952,931.43    
4    Intel(R) Pentium(R) 4 CPU 3.20GHz     45,737     875,822,530.51    
5    AMD Athlon(tm) 64 Processor 3000+     31,878     257,872,702.50    
6    AMD Athlon(tm) 64 Processor 3200+     30,304     288,741,370.07    
7    AMD Athlon(tm) Processor                   27,726        129,774,610.58    
8    Intel(R) Pentium(R) 4 CPU 2.00GHz     21,701    197,541,843.70
9    Intel(R) Pentium(R) 4 CPU 2.66GHz     19,200     208,668,039.95    
10    AMD Athlon(tm) 64 Processor 3500+     19,049     191,994,766.55    

We're Both in the top 10 most popular :D,  I have a #8 & #4  :P [Doesn't it feel good to know you're with the 'in crowd'?]

[Must get around to try to strip mine those inner pulse folding loops for the p4 64k / 1meg aliasing problem]
Title: Re: optimized sources
Post by: _heinz on 05 Nov 2007, 10:02:53 pm
It is worth speeding them up.... ;D

Although Dr. Who is already running his code... we give the old boxes a chance

squeezed the code of pulsefind.cpp again
sum1 and sum2 are no longer needed

here the case construct --->
  switch (i) {
//    case 30:
//      sum1 = one[29] + two[29];           sum2 = one[28] + two[28];
//      sum1 += three[29];                  sum2 += three[28];
//      P->dest[29] = sum1;                 P->dest[28] = sum2;
//      if (sum1 > tmax1) tmax1 = sum1;     if (sum2 > tmax2) tmax2 = sum2;
 //seti_britta: new code:
    case 30:
      P->dest[29]= one[29] + two[29]+three[29];           P->dest[28]= one[28] + two[28]+three[28];
 //     sum1 += three[29];                  sum2 += three[28];
 //     P->dest[29] = sum1;                 P->dest[28] = sum2;
      if (P->dest[29] > tmax1) tmax1 = P->dest[29];     if (P->dest[28] > tmax2) tmax2 = P->dest[28];

and so on for all cases
----------------------------------------------------------------------------------------------------------------------------------------------------

and here the loop construct
// ----------------------------------------------------------------------------
//   Function:   sum_func_ptt( sw_sum3_t31 )
//   Type     :   float
//   Content  :   folding subroutines, FPU optimized                     
//   parameter:   sw_sum3_t31         
//   last update:23.09.2007   by:seti_britta   new function
// ----------------------------------------------------------------------------
sum_func_ptt( sw_sum3_t31 ) {
  register int i, j, k;
  float tmax2, tmax1; //seti_britta: new
  float *one   = ss[0];
  float *two   = ss[0]+P->tmp0;
  float *three = ss[0]+P->tmp1;
  tmax2 = tmax1 = (0.0f); //seti_britta: no convert !!
  i = P->di;
  if ( i & 1 )
  {
    i -= 1;
    P->dest[i] = tmax1 = one[i] + two[i] + three[i]; //seti_britta:new
  }
   for ( j = i-1, k = i-2; j > 0; j -= 2, k -= 2 )
   {
      P->dest[j]= one[j] + two[j] + three[j];           P->dest[k]= one[k] + two[k] + three[k];
      if (P->dest[j] > tmax1) tmax1 = P->dest[j];     if (P->dest[k] > tmax2) tmax2 = P->dest[k];
   }
  if (tmax1 > tmax2) return tmax1;
  return tmax2;
}
-------------------------------------------------------------------------------------------------------------------------------------------
maybe the compact loop has a chance
so far it compiles well... now we must measure to find the fastest
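
To check the loop on its own outside the client, a standalone sketch can look like this. fold3_max is a made-up name and the plain pointer arguments replace the P-> and ss[] structures of pulsefind.cpp, so it is only an approximation of the real function:

```cpp
#include <algorithm>
#include <cassert>

// Adds three folds element-wise into dest and returns the largest sum.
// Two running maxima are kept so the two sums per iteration stay independent,
// mirroring the tmax1 / tmax2 pair in the post. Both start at 0, which
// assumes the inputs are non-negative powers.
float fold3_max(const float* one, const float* two, const float* three,
                float* dest, int n) {
    assert(n >= 0);
    float tmax1 = 0.0f, tmax2 = 0.0f;
    int i = n;
    if (i & 1) {  // odd length: handle the last element alone
        i -= 1;
        dest[i] = tmax1 = one[i] + two[i] + three[i];
    }
    for (int j = i - 1, k = i - 2; j > 0; j -= 2, k -= 2) {
        dest[j] = one[j] + two[j] + three[j];
        dest[k] = one[k] + two[k] + three[k];
        if (dest[j] > tmax1) tmax1 = dest[j];
        if (dest[k] > tmax2) tmax2 = dest[k];
    }
    return std::max(tmax1, tmax2);
}
```

A small harness like this makes it easy to compare the output against the original case construct before timing anything.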
have fun
regards heinz   ;D  ;D
Title: Re: optimized sources
Post by: Jason G on 06 Nov 2007, 02:32:21 am
Yes, I think I would like to carefully go back and re-examine Joe's ideas/Posts in the other thread for incorporating 3 phase processing/ block prefetch in some places. I'll get a chance to look next weekend, and hopefully plan a methodical approach that might be able to handle striping for the p4 at the same time. 

Intel theories suggest 3 to 5 times possible improvement, in certain code by fixing those p4 problems,  And the 3 phase & prefetch techniques [ Ala AMD Paper] even more.  If it adds up to a 10 to 20% crunch time improvement I'll be happy because it would bring my p4 3.2 back over 1000 RAC :D

Title: Re: optimized sources
Post by: Jason G on 06 Nov 2007, 07:34:37 am
Progress so far,  Long way to go :D :
[Each compared against preset 2.3S9 xW SSE2 IPP build, on vs2005/ICC, p4 Northwood 2.0A@2.1GHz,NoHT, WinXP]

Tactic                                                               Type               Status         Effect
1- Better memcpy in GetFixedPot                                      Generic x86        Prelim Tests   ~0.3%
2- Out of Place FFTs / eliminating associated memcopies              Intel IPP          Initial        ~?.?%
3- Once off seti.cpp 8meg memcpy                                     Generic x86        Untested       ~0.?%
4- Chirp function Block Prefetch, memcpy++ zerocase & 3phase chirp   Generic x86        Untested       ~?.?%
5- Compiler Flags (xN SSE2 p4 Specific)                              P4 specific        Tested         ~10%
6- Strip Mined Inner loops (p4 specific, 64k & 1M variants)          P4, possible x86   Untested       ~??%
7- GaussFit Improvements                                             To be Determined

~ means approximate, my system, 'your mileage may vary'.

[Please anyone feel free to suggest additions, updates or corrections to this list: 
            either fairly generic OR p4 specific will do :D, Consider equivalent xP SSE3 builds as already on the list for later]
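
For anyone unsure what item 6 means, here is a generic sketch of strip mining. It is not taken from the pulse folding code: scaled_sum_of_squares is a made-up example, and strip_len is a tuning assumption that would have to be measured per CPU:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Two passes over the data, done strip by strip so that each strip is still
// in cache when the second pass reads it. Without strip mining, pass 1 would
// run over the whole array before pass 2 starts.
double scaled_sum_of_squares(std::vector<float>& data, float scale,
                             std::size_t strip_len) {
    assert(strip_len > 0);
    double total = 0.0;
    for (std::size_t begin = 0; begin < data.size(); begin += strip_len) {
        const std::size_t end = std::min(begin + strip_len, data.size());
        for (std::size_t i = begin; i < end; ++i) {
            data[i] *= scale;                        // pass 1 on this strip
        }
        for (std::size_t i = begin; i < end; ++i) {
            total += double(data[i]) * data[i];      // pass 2 on the same strip
        }
    }
    return total;
}
```

The result is the same for any strip length; only the memory access pattern changes, which is where a p4 specific strip size could come in.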

Jason
Title: Re: optimized sources
Post by: Jason G on 07 Nov 2007, 07:13:57 am
Quote
4- Chirp function Block Prefetch, memcpy++ zerocase & 3phase chirp                  Generic x86   Untested        ~?.?%

Took a quick look between school and work, looks like this may be easier than I thought to try.  On my configuration the consistently selected chirping function is the outstanding "sse2_ChirpData_ak".  nice one.

The structure is already there for potential 3 phase processing, though it is currently straight SSE2 rendering it vectorised SIMD as far as I can see. The existing prefetch, processing and writing sections are all SSE2, clearly laid out and exhibit the clean crystal vase like 'niceness' quality that make you reluctant to tamper :D

With few other adaptations, adjusting the prefetch, changing the processing to FPU, and suitably adjusting the streaming writes should do the trick,
  ... though for the p4 I would like to try to keep the aliasing issue in mind which might just dictate some of the block sizes and order they are processed.
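
A portable sketch of the 3 phase shape I have in mind, with plain copies standing in for the prefetch and the streaming writes. rotate_three_phase, the rotation table and block_len are all assumptions for illustration, this is not the sse2_ChirpData_ak code:

```cpp
#include <algorithm>
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<float>;

// Three phases per block:
//   1. load a block into a small staging buffer (prefetch in a real version),
//   2. compute on the staging buffer only,
//   3. write the finished block out (streaming stores in a real version).
void rotate_three_phase(const std::vector<cplx>& in, std::vector<cplx>& out,
                        const std::vector<cplx>& rotation,
                        std::size_t block_len) {
    assert(block_len > 0 && in.size() == rotation.size());
    out.resize(in.size());
    std::vector<cplx> stage(block_len);
    for (std::size_t begin = 0; begin < in.size(); begin += block_len) {
        const std::size_t len = std::min(block_len, in.size() - begin);
        for (std::size_t i = 0; i < len; ++i) stage[i] = in[begin + i];        // phase 1
        for (std::size_t i = 0; i < len; ++i) stage[i] *= rotation[begin + i]; // phase 2
        for (std::size_t i = 0; i < len; ++i) out[begin + i] = stage[i];       // phase 3
    }
}
```

The block length is where the aliasing concern would show up, since it decides how far apart the source, staging and destination addresses sit.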

Oh for the weekend :D

Title: Re: optimized sources
Post by: Jason G on 07 Nov 2007, 11:05:59 am
First run of original code [ Will need run more times for baseline though ] : ( Very Nice function already )

--------------------------------------------------------------------------------------
Testing xN SSE2 Build.

sse2_ChirpData_ak:

NumDataPoints = 1024*1024
test_points = 32768

Timer Frequency in:

Hz  =       3579545
MHz =       3.57955
GHz =    0.00358

Start Time =    1585115997106 Ticks
Stop Time  =    1585116003199 Ticks

Duration in Ticks   =  6093
Duration in seconds =  0.0017021716447

--------------------------------------------------------------------------------------

Inner loop executes 8192 times
Title: Re: optimized sources
Post by: _heinz on 07 Nov 2007, 11:47:04 am
measuring is the best way to try code and find the optimal variants.  ;D

the loop construct in pulsefind.cpp is ready now, but not measured.
Today I will squeeze the case-construct code.
I still have some good ideas to eliminate code here and there...we will see...

Title: Re: optimized sources
Post by: Jason G on 07 Nov 2007, 12:14:29 pm
measuring is the best way to try code and find the optimal variants.  ;D

the loop construct in pulsefind.cpp is ready now, but not measured.
Today I will squeeze the case-construct code.
I still have some good ideas to eliminate code here and there...we will see...



Great!, a pulsefind baseline will be good too. for underneath pulsefind  It seems my machine also selects always AK folding routines and spends much of its time in the x2AL version..  I am running vtune on the chirp one now to look for any p4 specific slowdowns, wickedly fast code though :D
Title: Re: optimized sources
Post by: _heinz on 07 Nov 2007, 01:55:39 pm



 I am running vtune on the chirp one now to look for any p4 specific slowdowns, wickedly fast code though :D

I have a heavily modified chirpfft.cpp which we can try too
Title: Re: optimized sources
Post by: _heinz on 07 Nov 2007, 04:47:27 pm
now we can easily compile all 3 cases with a preprocessor definition --->
---------------------------------------------------------------------------------------------------
// USE_PFLOOP  --> preprocessor directive
// USE_PFCASE  --> preprocessor directive
#if defined( USE_PFLOOP )
   #pragma message ("-----PFLOOP-----")
   #include "pfloop.h" //use the loop-construct
#else
#if defined( USE_PFCASE )
   #pragma message ("-----PFCASE-----")
   #include "pfcase.h" //use the modified case-construct
#else
   //use original code
#endif // USE_PFCASE
#endif // USE_PFLOOP
-----------------------------------------------------------------------------------------
------ Build started: Project: seti_boinc, Configuration: Release32-NOGFX Win32 ------
Compiling...
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 15.00.20404 for 80x86
Copyright (C) Microsoft Corporation.  All rights reserved.
cl /Od /Ob2 /Oi /Ot /Oy /GT /I "." /I "../../../boinc/api" /I "../../../boinc/client/win" /I "../../../boinc/lib" /I ".." /I "glut" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\db" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\glut" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\jpeglib" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\image_libs" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX" /I "C:\I\SC\vs90\boinc" /I "C:\I\SC\vs90\boinc\api" /I "C:\I\SC\vs90\boinc\client\win" /I "C:\I\SC\vs90\boinc\lib" /D "WIN32" /D "_WIN32" /D "_WINDOWS" /D "NBOINC_APP_GRAPHICS" /D "CLIENT" /D "_MT" /D "USE_IPP" /D "USE_SSE2" /D "_DEBUG" /D "USE_PFLOOP" /D "_VC80_UPGRADE=0x0600" /D "_MBCS" /GF /Gm /EHsc /MTd /Zp16 /Gy /Fp".\Release/seti_boinc.pch" /Fo".\Release32-NOGFX\\" /Fd".\Release32-NOGFX\vc90.pdb" /FR".\Release32-NOGFX\\" /W3 /c /Wp64 /Zi /TP "..\pulsefind.cpp"
pulsefind.cpp
-----PFLOOP-----
..\pulsefind.cpp(1487) : warning C4146: unary minus operator applied to unsigned type, result still unsigned
Build log was saved at "file://c:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\BuildLog.htm"
seti_boinc - 0 error(s), 1 warning(s)
========== Build: 1 succeeded, 0 failed, 0 up-to-date, 0 skipped ==========
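
A stripped down sketch of the same selection, usable to check which variant a given set of /D flags picks. fold_variant and require_variant are made-up helpers; the real headers pfloop.h and pfcase.h are not reproduced here:

```cpp
#include <cassert>
#include <string>

// Same priority as in the post: USE_PFLOOP wins over USE_PFCASE,
// and with neither defined the original code is used.
#if defined(USE_PFLOOP)
static std::string fold_variant() { return "pfloop"; }
#elif defined(USE_PFCASE)
static std::string fold_variant() { return "pfcase"; }
#else
static std::string fold_variant() { return "original"; }
#endif

// Aborts in a debug build if the compiled-in variant is not the expected one.
static void require_variant(const std::string& expected) {
    assert(fold_variant() == expected);
}
```

Built with /D "USE_PFLOOP" this reports "pfloop", with /D "USE_PFCASE" it reports "pfcase", and with neither it reports "original".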

regards   ;D

Title: Re: optimized sources
Post by: Jason G on 08 Nov 2007, 04:50:05 am
       I have a heavily modified chirpfft.cpp which we can try too

Good we'll do that I think it is a very good idea, I have p4 sse2  primary performance data  (vtune) for the sse2_ChirpData_ak, 10000 loops on p4 Northwood with 512k l2 cache, which took a total of 10 secs execution time: (19 runs worth of data gathered)
(preliminary data, subject to verification with further runs)
   64k Aliasing : almost none... Accounts for 1.34% of function workload (about 0.13 secs)
  Second Level Cache misses: Accounts for 10.28% of the workload (about 1 second)

other statistics (preliminary, subject to verification) :
128 bit mmx instructions ~82 million (no 64 bit MMX instructions counted)
packed double precision Floating Point SSE instructions ~1.4 billion (thousand million)
packed single precision  Floating Point SSE instructions ~4 billion (thousand million)

Mispredicted Branches = 0 !!!  :o

No Machine Clear counts (Pipeline flushes), split loads or blocked store forwards at all :D

I think that's a really good function, much better statistics than the pulse folding functions gave me, but I'll have to retest those in isolation too as I'm getting better at selecting the correct compiler settings and driving vtune too.

Well I'll check a few build settings and run primary performance measures again to verify those results, and add secondary performance indicators to see what else turns up.... Then on the weekend maybe fiddle with that 3 phase idea to see if it actually works....All good fun :D...

Jason


Title: Re: optimized sources
Post by: _heinz on 08 Nov 2007, 12:12:38 pm
the modified PFCASE is ready now
-----------------------------------------------
------ Build started: Project: seti_boinc, Configuration: Release32-NOGFX Win32 ------
Compiling...
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 15.00.20404 for 80x86
Copyright (C) Microsoft Corporation.  All rights reserved.
cl /Od /Ob2 /Oi /Ot /Oy /GT /I "." /I "../../../boinc/api" /I "../../../boinc/client/win" /I "../../../boinc/lib" /I ".." /I "glut" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\db" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\glut" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\jpeglib" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\image_libs" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX" /I "C:\I\SC\vs90\boinc" /I "C:\I\SC\vs90\boinc\api" /I "C:\I\SC\vs90\boinc\client\win" /I "C:\I\SC\vs90\boinc\lib" /D "WIN32" /D "_WIN32" /D "_WINDOWS" /D "NBOINC_APP_GRAPHICS" /D "CLIENT" /D "_MT" /D "USE_IPP" /D "USE_SSE2" /D "_DEBUG" /D "USE_PFCASE" /D "_VC80_UPGRADE=0x0600" /D "_MBCS" /GF /Gm /EHsc /MTd /Zp16 /Gy /Fp".\Release/seti_boinc.pch" /Fo".\Release32-NOGFX\\" /Fd".\Release32-NOGFX\vc90.pdb" /FR".\Release32-NOGFX\\" /W3 /c /Wp64 /Zi /TP "..\pulsefind.cpp"
pulsefind.cpp
-----PFCASE-----
..\pulsefind.cpp(1487) : warning C4146: unary minus operator applied to unsigned type, result still unsigned
Build log was saved at "file://c:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\BuildLog.htm"
seti_boinc - 0 error(s), 1 warning(s)
========== Build: 1 succeeded, 0 failed, 0 up-to-date, 0 skipped ==========
 ;D
Title: Re: optimized sources
Post by: _heinz on 08 Nov 2007, 09:50:55 pm
modified PFCASE rocks

here as it was before --->
ar=0.435000 done. Total flop count: 108711033335.208650

PulTimB 0.5    Totals:  Ratio            Ticks
             standard:  1.000      87303043476
Plan < 512 FPU swi ! :  0.575      50201832416
 Plan < 512 AK SSE ! :  0.634      55338411648
Plan < 512 BHx SSE ! :  0.993      86661631716
 Plan < 512 BH SSE ! :  0.774      67545465584

PFCASE ---->
ar=0.435000 done. Total flop count: 108711033335.208650

PulTimB 0.5    Totals:  Ratio            Ticks
             standard:  1.000      87387438720
Plan < 512 FPU swi ! :  0.504      44014700492
 Plan < 512 AK SSE ! :  0.633      55324520388
Plan < 512 BHx SSE ! :  0.992      86681643504
 Plan < 512 BH SSE ! :  0.773      67531081560
----------------------------------------------------------------------------------------------------
modified PFCASE ---> ~13% faster     ;D
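
How the Ratio column and the gain can be recomputed from the Ticks column, as a small sketch. The helper names are made up and are not part of the benchmark program:

```cpp
#include <cassert>
#include <cstdint>

// Ratio column of the table: a variant's ticks relative to the standard run.
double ratio_to_standard(std::int64_t variant_ticks,
                         std::int64_t standard_ticks) {
    assert(standard_ticks > 0);
    return double(variant_ticks) / double(standard_ticks);
}

// Relative time saved going from `before` to `after`, in percent.
double percent_saved(double before, double after) {
    assert(before > 0.0);
    return 100.0 * (before - after) / before;
}
```

For the "FPU swi" row, percent_saved(50201832416.0, 44014700492.0) gives a little over 12% less time, which as a throughput gain (0.575 / 0.504) is about 14%.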
heinz
Title: Re: optimized sources
Post by: Jason G on 09 Nov 2007, 01:45:24 am
Woohoo!, It's weekend! that function was with just the changes you made before? I'll guess that maybe the compiler did vectorise some of that,  I would like to look at disassembly output,  if the compiler was smart enough to put prefetch plus FPU plus streaming stores then that IS 3-Phase :D, anything is possible, have you compared for accuracy as well ?

Title: Re: optimized sources
Post by: _heinz on 09 Nov 2007, 10:26:39 am
Hi Jason,

The compact loop construction PFLOOP runs.
some first impressions: --->

FFTlen 8192, PulsePoTLen 24,  1048576 loops.
             Standard:     9049838772 ticks, 8630.60 per loop, 0 rpt
Plan < 512 FPU swi ! :     3892589440 ticks, 3712.26 per loop, 0 rpt
 Plan < 512 AK SSE ! :     5260680348 ticks, 5016.98 per loop, 0 rpt
Plan < 512 BHx SSE ! :    13525734128 ticks, 12899.15 per loop, 0 rpt
 Plan < 512 BH SSE ! :     9339515956 ticks, 8906.86 per loop, 0 rpt

ar=0.435000 done. Total flop count: 108711033335.208650

PulTimB 0.5    Totals:  Ratio            Ticks
             standard:  1.000      87462139372
Plan < 512 FPU swi ! :  0.609      53291444096
 Plan < 512 AK SSE ! :  0.634      55471031448
Plan < 512 BHx SSE ! :  0.990      86608697300
 Plan < 512 BH SSE ! :  0.772      67556177968

I'm surprised, I did not expect it --->
Against the standard opt. case-construct it speeds up ~12% in FFTlen 8192, PulsePoTLen 24
The loop-construct shows in FFTlen 8192, PulsePoTLen 24 ---> 3892589440 ticks, 3712.26 per loop
and the original code in FFTlen 8192, PulsePoTLen 24 ---> 4427492996 ticks, 4222.39 per loop
looks like the LOOP-construct is faster in this case, but not in the summary....
further measurement must confirm it.

All cases compiled with /Zi /Od (no compiler optimization) and the MS compiler...
so further improvement can be expected from using the Intel Compiler  ;D



Title: Re: optimized sources
Post by: Jason G on 09 Nov 2007, 11:03:35 am
Yeah, It is good now I can have a look at Pulsefind again now msvs is fixed! :D,  I think you might be finding a similar thing to what I've been seeing from a different part of the code:

     -  the code is very sensitive to certain optimisation settings,

I haven't worked out yet whether this is because the optimiser is improving some weakness in the code, or whether the code is written to take advantage of the optimisers, or perhaps [more likely] a little of both.  Time [and examining the assembly output listing  ;) ] will tell,

I would easily place that 12% in the range of the optimiser so I have learned [the hard way] to take care to use final settings for timing comparison.  That unexpected surprise might be a phantom, though it will be nice if it isn't :D

Jason
Title: Re: optimized sources
Post by: _heinz on 09 Nov 2007, 12:53:33 pm
yes, all this must be analyzed.... I will have a look at the asm code...to see what is really going on there.
As Joe already stated, shorter code is not automatically the fastest.

I have eliminated a lot of unnecessary assignments to some vars, so intermediate results are no longer stored to the vars; they are held in registers for the next operations.  ;)  fewer instructions for the same result
Keep in mind that every instruction must be loaded into the instruction register and executed by the instruction decoder, so well squeezed code can show some effect too, especially in big loops.  ;)   fewer instructions means shorter times

Optimizing by setting compiler flags ... unrolling loops, filling the cache properly etc. is another big field....

heinz
Title: Re: optimized sources
Post by: _heinz on 12 Nov 2007, 11:41:57 am
23 000 views    .... I didn't expect it..... looks like a hot thread
greetings to all who are reading here
heinz  ;)
Title: Re: optimized sources
Post by: Jason G on 12 Nov 2007, 12:34:25 pm
23 000 views    .... I didn't expect it..... looks like a hot thread
greetings to all who are reading here
heinz  ;)

I think it'll get much bigger yet, with the optimisations you are trying, and maybe a few from me too if I can consolidate a little :D (and get some more study and work done first!  ;D)
Title: Re: optimized sources
Post by: _heinz on 13 Nov 2007, 06:57:38 am
Hi Jason,

have you access to the "Pre-Release Applications" Forum ?  If not, ask the moderators to give you access rights.

heinz
Title: Re: optimized sources
Post by: Jason G on 13 Nov 2007, 09:09:09 am
Hi Jason,

have you access to the "Pre-Release Applications" Forum ?  If not, ask the moderators to give you access rights.

heinz

No, I didn't know there was such a place  ::).  Have you let the cat out of the bag again Heinz?  Who are the Moderators? (Never did pay much attention to the tags on the posts... I guess I should really know that :D)

[Is there excellent stuff worth pestering a mod about  in there I might want or need ?]

Thanks, Jason
Title: Re: optimized sources
Post by: Gecko_R7 on 13 Nov 2007, 04:03:26 pm
Hi Jason,

have you access to the "Pre-Release Applications" Forum ? If not, ask the moderators to give you access rights.


heinz

No, I didn't know there was such a place ::). Have you let the cat out of the bag again Heinz? Who are the Moderators? (Never did pay much attention to the tags on the posts... I guess I should really know that :D)

[Is there excellent stuff worth pestering a mod about in there I might want or need ?]

Thanks, Jason

Joe and I are Mods, though this is more like the Maytag repairman's job around here.
I think this can only be done by Admin rights (Simon), but have asked Joe if he is able to do it by chance as I don't see the option.  Hang tight.
Title: Re: optimized sources
Post by: Jason G on 13 Nov 2007, 11:00:42 pm
Joe and I are Mods, though this is more like the Maytag repairman's job around here.
I think this can only be done by Admin rights (Simon), but have asked Joe if he is able to do it by chance as I don't see the option.  Hang tight.
thanks Gecko_R7! I have all the time in the world, no hurry.  I have been informed by PM that some interesting/important  things ARE in there, sounds good. Keep up with the intense moderation work :D

Jason
Title: Re: optimized sources
Post by: Josef W. Segur on 14 Nov 2007, 07:59:04 am
Joe and I are Mods, though this is more like the Maytag repairman's job around here.
I think this can only be done by Admin rights (Simon), but have asked Joe if he is able to do it by chance as I don't see the option.  Hang tight.
thanks Gecko_R7! I have all the time in the world, no hurry.  I have been informed by PM that some interesting/important  things ARE in there, sounds good. Keep up with the intense moderation work :D

Jason

I also cannot find a way to grant access, I'm sure Simon will when he becomes active again.

I agree with Heinz that you'd find some of the discussions interesting, but I don't think there's anything you really need hidden there. Because the postings in threads there were made with the expectation they'd only be read by a limited group, I won't move or copy those threads to this public area. I'll just say that any of my own postings there may be quoted here, so if Heinz had something specific in mind which can be satisfied that way...
                                                      Joe
Title: Re: optimized sources
Post by: Jason G on 14 Nov 2007, 08:32:15 am
Thanks Joe, no probs :D
Title: Re: optimized sources
Post by: _heinz on 14 Nov 2007, 08:29:48 pm
Compiled with FPUCOMP and AKFCOMP now: two further compact versions of the code
-----------------------------------------------------------------------------------------------------------
------ Rebuild All started: Project: Optimizer, Configuration: Release Win32 ------
Deleting intermediate and output files for project 'Optimizer', configuration 'Release|Win32'
Compiling...
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 15.00.20404 for 80x86
Copyright (C) Microsoft Corporation.  All rights reserved.
cl /O2 /Ob2 /Oi /Ot /Oy /GT /GL /I "C:\I\SC\PulTimB_5\client\Header Files" /I "../../../boinc/win_build" /I ".." /I "..\.." /I "..\..\..\boinc\lib" /I "../../../boinc/api" /I "../../db" /I "C:\I\SC\PulTimB_5\client\vector" /I "C:\I\SC\PulTimB_5\client" /I "C:\I\SC\PulTimB_5\Optimizer" /I "C:\I\INTEL\IPP\5.2_beta\ia32\tools\staticlib" /I "C:\I\INTEL\IPP\5.2_beta\ia32\include" /D "USE_IPP" /D "USE_SSE2" /D "NDEBUG" /D "WIN32" /D "_WINDOWS" /D "_LIB" /D "_CONSOLE" /D "HAVE_SINF" /D "HAVE_COSF" /D "HAVE_ATANF" /D "USE_FPUCOMP" /D "USE_AKFCOMP" /D "USE_ASMLIB" /D "_DEBUG" /D "_WIN32" /D "_UNICODE" /D "UNICODE" /GF /Gm /EHsc /MTd /Zp16 /Gy /arch:SSE2 /fp:fast /Fo"Release\\" /Fd"Release\vc90.pdb" /W3 /c /Wp64 /Zi /TP .\AKfoldSSE.cpp
AKfoldSSE.cpp
-----AKFCOMP-----
Compiling...
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 15.00.20404 for 80x86
Copyright (C) Microsoft Corporation.  All rights reserved.
cl /Od /Ot /Oy /GT /GL /I "C:\I\SC\PulTimB_5\client\Header Files" /I "../../../boinc/win_build" /I ".." /I "..\.." /I "..\..\..\boinc\lib" /I "../../../boinc/api" /I "../../db" /I "C:\I\SC\PulTimB_5\client\vector" /I "C:\I\SC\PulTimB_5\client" /I "C:\I\SC\PulTimB_5\Optimizer" /I "C:\I\INTEL\IPP\5.2_beta\ia32\tools\staticlib" /I "C:\I\INTEL\IPP\5.2_beta\ia32\include" /D "USE_FPUCOMP" /D "USE_AKFCOMP" /D "USE_ASMLIB" /D "_DEBUG" /D "WIN32" /D "_WIN32" /D "_WINDOWS" /D "_LIB" /D "_CONSOLE" /D "HAVE_SINF" /D "HAVE_COSF" /D "HAVE_ATANF" /D "_UNICODE" /D "UNICODE" /GF /Gm /EHsc /MTd /Zp16 /Gy /fp:fast /Fo"Release\\" /Fd"Release\vc90.pdb" /W3 /c /Wp64 /Zi /TP .\opt_unopt.cpp
opt_unopt.cpp
Compiling...
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 15.00.20404 for 80x86
Copyright (C) Microsoft Corporation.  All rights reserved.
cl /O2 /Ob2 /Oi /Ot /Oy /GT /GL /I "C:\I\SC\PulTimB_5\client\Header Files" /I "../../../boinc/win_build" /I ".." /I "..\.." /I "..\..\..\boinc\lib" /I "../../../boinc/api" /I "../../db" /I "C:\I\SC\PulTimB_5\client\vector" /I "C:\I\SC\PulTimB_5\client" /I "C:\I\SC\PulTimB_5\Optimizer" /I "C:\I\INTEL\IPP\5.2_beta\ia32\tools\staticlib" /I "C:\I\INTEL\IPP\5.2_beta\ia32\include" /D "USE_IPP" /D "USE_SSE2" /D "NDEBUG" /D "WIN32" /D "_WINDOWS" /D "_LIB" /D "_CONSOLE" /D "HAVE_SINF" /D "HAVE_COSF" /D "HAVE_ATANF" /D "USE_FPUCOMP" /D "USE_AKFCOMP" /D "USE_ASMLIB" /D "_DEBUG" /D "_WIN32" /D "_UNICODE" /D "UNICODE" /GF /Gm /EHsc /MTd /Zp16 /Gy /arch:SSE /fp:fast /Fo"Release\\" /Fd"Release\vc90.pdb" /W3 /c /Wp64 /Zi /TP .\opt_SSEx.cpp
opt_SSEx.cpp
Compiling...
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 15.00.20404 for 80x86
Copyright (C) Microsoft Corporation.  All rights reserved.
cl /O2 /Ob2 /Oi /Ot /Oy /GT /GL /I "C:\I\SC\PulTimB_5\client\Header Files" /I "../../../boinc/win_build" /I ".." /I "..\.." /I "..\..\..\boinc\lib" /I "../../../boinc/api" /I "../../db" /I "C:\I\SC\PulTimB_5\client\vector" /I "C:\I\SC\PulTimB_5\client" /I "C:\I\SC\PulTimB_5\Optimizer" /I "C:\I\INTEL\IPP\5.2_beta\ia32\tools\staticlib" /I "C:\I\INTEL\IPP\5.2_beta\ia32\include" /D "USE_IPP" /D "USE_SSE" /D "NDEBUG" /D "WIN32" /D "_WINDOWS" /D "_LIB" /D "_CONSOLE" /D "HAVE_SINF" /D "HAVE_COSF" /D "HAVE_ATANF" /D "USE_FPUCOMP" /D "USE_AKFCOMP" /D "USE_ASMLIB" /D "_DEBUG" /D "_WIN32" /D "_UNICODE" /D "UNICODE" /GF /Gm /EHsc /MTd /Zp16 /Gy /arch:SSE /fp:fast /Fo"Release\\" /Fd"Release\vc90.pdb" /W3 /c /Wp64 /Zi /TP .\opt_SSE.cpp
opt_SSE.cpp
.\opt_SSE.cpp(95) : warning C4311: 'type cast' : pointer truncation from 'const float *__w64 ' to 'unsigned int'
Compiling...
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 15.00.20404 for 80x86
Copyright (C) Microsoft Corporation.  All rights reserved.
cl /Od /Ob2 /Ot /Oy /GT /GL /I "C:\I\SC\PulTimB_5\client\Header Files" /I "../../../boinc/win_build" /I ".." /I "..\.." /I "..\..\..\boinc\lib" /I "../../../boinc/api" /I "../../db" /I "C:\I\SC\PulTimB_5\client\vector" /I "C:\I\SC\PulTimB_5\client" /I "C:\I\SC\PulTimB_5\Optimizer" /I "C:\I\INTEL\IPP\5.2_beta\ia32\tools\staticlib" /I "C:\I\INTEL\IPP\5.2_beta\ia32\include" /D "USE_MMX" /D "USE_ASMLIB" /D "_DEBUG" /D "WIN32" /D "_WIN32" /D "_WINDOWS" /D "_LIB" /D "_CONSOLE" /D "HAVE_SINF" /D "HAVE_COSF" /D "HAVE_ATANF" /D "USE_FPUCOMP" /D "USE_AKFCOMP" /D "_UNICODE" /D "UNICODE" /GF /Gm /EHsc /MTd /Zp16 /Gy /fp:fast /Fo"Release\\" /Fd"Release\vc90.pdb" /W3 /c /Wp64 /Zi /TP .\opt_FPU.cpp
opt_FPU.cpp
-----FPUCOMP-----
Compiling...
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 15.00.20404 for 80x86
Copyright (C) Microsoft Corporation.  All rights reserved.
cl /O2 /Ob2 /Ot /Oy /GT /GL /I "C:\I\SC\PulTimB_5\client\Header Files" /I "../../../boinc/win_build" /I ".." /I "..\.." /I "..\..\..\boinc\lib" /I "../../../boinc/api" /I "../../db" /I "C:\I\SC\PulTimB_5\client\vector" /I "C:\I\SC\PulTimB_5\client" /I "C:\I\SC\PulTimB_5\Optimizer" /I "C:\I\INTEL\IPP\5.2_beta\ia32\tools\staticlib" /I "C:\I\INTEL\IPP\5.2_beta\ia32\include" /D "USE_FPUCOMP" /D "USE_AKFCOMP" /D "USE_ASMLIB" /D "_DEBUG" /D "WIN32" /D "_WIN32" /D "_WINDOWS" /D "_LIB" /D "_CONSOLE" /D "HAVE_SINF" /D "HAVE_COSF" /D "HAVE_ATANF" /D "_UNICODE" /D "UNICODE" /GF /Gm /EHsc /MTd /Zp16 /Gy /fp:fast /Fo"Release\\" /Fd"Release\vc90.pdb" /W3 /c /Wp64 /Zi /TP .\benchmark.cpp
benchmark.cpp
Creating library...
Build log was saved at "file://c:\I\SC\PulTimB_5\Optimizer\Release\BuildLog.htm"
Optimizer - 0 error(s), 1 warning(s)
========== Rebuild All: 1 succeeded, 0 failed, 0 skipped ==========

heinz
Title: Re: optimized sources
Post by: Jason G on 17 Nov 2007, 09:31:15 am
Okay, I'll be taking some time now to catch up on this in 'the area'; looks interesting :D

Jason
Title: Re: optimized sources
Post by: ScanMan on 17 Nov 2007, 11:54:17 am
Hi,

I am new to this forum and would like to know which app is best for my computer.

The OS is Windows 2000 Pro Service Pack 4 and the CPU is a dual Xeon 2.0 GHz. Which app would be the best for my setup?

Regards

ScanMan
Title: Re: optimized sources
Post by: Gecko_R7 on 17 Nov 2007, 03:40:47 pm
Hi,

I am new to this forum and would like to know which app is best for my computer.

The OS is Windows 2000 Pro Service Pack 4 and the CPU is a dual Xeon 2.0 GHz. Which app would be the best for my setup?

Regards

ScanMan

Welcome ScanMan.
Check here: http://calbe.dw70.de/win32.html
 
If you have a Core2-based CPU, you can use the sse3 or ssse3 based apps.  Just decide whether you want the pretty graphics version or not.

If a Prescott-based Xeon, the sse2 or sse3 will do fine.  Some folks don't see any noticeable gain w/ sse3 on Prescott cpus.   Depends on the specific cpu and other factors.  You may or may not gain much vs. the sse2 app.  Can always try them both and judge for yourself.

BTW, if you have any further questions, please post in the following on-topic thread instead, or on the main Seti number crunching forum board  :) Thanks!  The subject matter of this particular thread is development-based.

http://lunatics.at/windows/crunch3r-new-apps-in-32-xp-sp2.0.html

Cheers!
Title: Re: optimized sources
Post by: ScanMan on 18 Nov 2007, 11:26:17 am
Thanks for the heads up on my question.


Regards

ScanMan
Title: Re: optimized sources
Post by: _heinz on 18 Nov 2007, 06:27:58 pm
Hi Jason,
Thanks for compiling my code pieces and generating asm files with the Intel compiler. After a first look at the asm code, AKFCOMP and FPUCOMP perform well.  ;D

Found out why my asm output did not work in Orcas: the configuration was set to Release, but it has to be Debug.  ;)

heinz
Title: Re: optimized sources
Post by: _heinz on 21 Nov 2007, 09:41:07 am
Hi Jason,
if you have a little time, try this with the Intel compiler and use the etimer project for measuring.
If you need anything, PM me.
------------------------------------------
------ Build started: Project: Optimizer, Configuration: Release32-NOGFX Win32 ------
Compiling...
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 15.00.20404 for 80x86
Copyright (C) Microsoft Corporation.  All rights reserved.
cl /Od /Ob2 /Oi /Ot /Oy /GT /I "../../../boinc/win_build" /I ".." /I "..\.." /I "..\..\..\boinc\lib" /I "../../../boinc/api" /I "../../db" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer" /I "C:\I\INTEL\IPP\5.2_beta\ia32\tools\staticlib" /I "C:\I\INTEL\IPP\5.2_beta\ia32\include" /D "USE_AKFCOMP" /D "USE_IPP" /D "USE_SSE2" /D "WIN32" /D "_WIN32" /D "_WINDOWS" /D "_CONSOLE" /D "_DEBUG" /D "_LIB" /D "_MT" /D "CLIENT" /D "NBOINC_APP_GRAPHICS" /D "_UNICODE" /D "UNICODE" /D "_VC80_UPGRADE=0x0710" /D "_MBCS" /GF /FD /EHsc /MTd /Zp16 /Gy /FAs /Fa"Release32-NOGFX\\" /Fo"Release32-NOGFX\\" /Fd"Release32-NOGFX\vc90.pdb" /W3 /c /Wp64 /Zi /Gd /TP /FI "win-config.h" ".\AKfoldSSE.cpp"
AKfoldSSE.cpp
-----IPP-----
-----SSE2/em-----
-----AKFCOMP-----
Build log was saved at "file://c:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer\Release32-NOGFX\BuildLog.htm"
Optimizer - 0 error(s), 0 warning(s)
========== Build: 1 succeeded, 0 failed, 0 up-to-date, 0 skipped ==========
I have already had a look at the asm file.   ;)
heinz
Title: Re: optimized sources
Post by: Jason G on 21 Nov 2007, 10:05:43 am
Will have a look at compiling this with "USE_AKFCOMP" defined soon, and check if I need anything else.
[was done & pm'd]

Jason
Title: Re: optimized sources
Post by: _heinz on 21 Nov 2007, 01:09:56 pm
Thanks,
it must be fine-tuned a little to run more in parallel.
I will PM you when it is done.
heinz
Title: Re: optimized sources
Post by: _heinz on 28 Nov 2007, 08:21:52 pm
The auto-vectorizer runs  ;D
-----------------------------------
------ Build started: Project: Optimizer, Configuration: Release32-NOGFX Win32 ------
Compiling...
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 15.00.20404 for 80x86
Copyright (C) Microsoft Corporation.  All rights reserved.
cl /O2 /Ob2 /Oi /Ot /Oy /GT /I "../../../boinc/win_build" /I ".." /I "..\.." /I "..\..\..\boinc\lib" /I "../../../boinc/api" /I "../../db" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer" /I "C:\I\INTEL\IPP\5.2_beta\ia32\tools\staticlib" /I "C:\I\INTEL\IPP\5.2_beta\ia32\include" /D "USE_AKFSIMD" /D "USE_IPP" /D "USE_SSE2" /D "WIN32" /D "_WIN32" /D "_WINDOWS" /D "_CONSOLE" /D "_DEBUG" /D "_LIB" /D "_MT" /D "CLIENT" /D "NBOINC_APP_GRAPHICS" /D "_UNICODE" /D "UNICODE" /D "_VC80_UPGRADE=0x0710" /D "_MBCS" /GF /FD /EHsc /MTd /Zp16 /arch:SSE2 /fp:fast /FAs /Fa"Release32-NOGFX\\" /Fo"Release32-NOGFX\\" /Fd"Release32-NOGFX\vc90.pdb" /W3 /c /Wp64 /Zi /Gd /TP /FI "win-config.h" ".\AKfoldSSE.cpp"
AKfoldSSE.cpp
-----IPP-----
-----SSE2/em-----
-----AKFSIMD-----
Build log was saved at "file://c:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer\Release32-NOGFX\BuildLog.htm"
Optimizer - 0 error(s), 0 warning(s)
========== Build: 1 succeeded, 0 failed, 0 up-to-date, 0 skipped ==========
Title: Re: optimized sources
Post by: _heinz on 30 Nov 2007, 08:08:10 pm
Working now on a vectorized version of chirpfft.cpp
heinz  ;D
Title: Re: optimized sources
Post by: Jason G on 01 Dec 2007, 07:28:09 am
Hi Heinz,
  Did you manage to determine any performance differences between our 'auto vectoriser friendly' folding routine (when compiled under ICC, with the pragma hints / dependency overrides) and hand vectorised code?  If you haven't had a chance I'll be able to take another look in 2 weeks (holidays  ;D)
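
For readers wondering what an 'auto vectoriser friendly' loop with a pragma hint can look like, here is a small sketch. It is not the folding routine discussed in this thread: fold_in_half is a hypothetical stand-in, and #pragma ivdep is the Intel compiler's hint that the loop carries no dependence. The pragma is guarded so that other compilers simply skip it.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical fold: adds the second half of src onto the first half and
// writes the result to dst. The caller promises that dst does not overlap src.
void fold_in_half(const float* src, float* dst, std::size_t half) {
    assert(src != dst);
#if defined(__INTEL_COMPILER)
#pragma ivdep   // ICC hint: assume no loop-carried dependence via src/dst
#endif
    for (std::size_t i = 0; i < half; ++i) {
        // Simple unit-stride loads and one store per iteration, with no
        // branches, which is the shape auto-vectorisers handle best.
        dst[i] = src[i] + src[i + half];
    }
}
```

The hint only removes the compiler's worry about aliasing; if dst really did overlap src, the vectorised loop could give wrong results.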

Jason
Title: Re: optimized sources
Post by: _heinz on 03 Dec 2007, 06:22:15 pm
Hi Jason,
I'll wait with this until you are on holiday. I have implemented some nice ideas to eliminate unnecessary code.  ::)
The autovectorizer runs great. Let yourself be surprised.
Have a nice week.
Heinz  ;D
Title: Re: optimized sources
Post by: _heinz on 12 Dec 2007, 05:16:57 pm
While going through the code, fraction_done caught my attention.
Before every call to it we find (sometimes not directly before) the following statement --->
progress = std::min( progress, 1.0 );

1. in function do_transpose
                    progress = std::min( progress, 1.0 );
                    #ifdef BOINC_APP_GRAPHICS
                        if ( !nographics() )
                            {
                            if ( gbp ) gbp->rarray.add_source_row( (float *)WorkData );
                            sah_graphics->local_progress = ( (( float ) ifft + 1) / NumFfts );
                            }
                    #endif
                    remaining = 1.0 - ( double ) ( icfft + 1 ) / num_cfft;
                    fraction_done( progress, remaining );
----------------------------------------------------------------------------------------------------------
2. in function process_data
                progress = std::min( progress, 1.0 );
                #ifdef BOINC_APP_GRAPHICS
                    if ( !nographics() )
                        {
                        if ( gbp ) gbp->rarray.add_source_row( (float *)WorkData );
                        sah_graphics->local_progress = ( (( float ) ifft + 1) / NumFfts );
                        }
                #endif
                remaining = 1.0 - ( double ) ( icfft + 1 ) / num_cfft;
                fraction_done( progress, remaining );
------------------------------------------------------------------------------------------------------
3. in analyzePoT.cpp line 246
         progress = std::min( progress, 1.0 );   // prevent display of >   100%
         fraction_done( progress, remaining );
-----------------------------------------------------------------------------------------------------------------------------------
4.  in analyzePoT.cpp line 387
            progress = std::min( progress, 1.0 );   // prevent display of >   100%
            fraction_done( progress, remaining );
----------------------------------------------------------------------------------------------------------------------------------------------------
Therefore I think that inside fraction_done( double progress, double remaining )
it is not necessary to calculate progress again --->progress = std::min( progress, 1.0 );
because we get the same result as before. So we can comment it out.
After helping the compiler with some additional variables we get the following short and hopefully effective code --->

; 75   :     prog2 = 1.0 - remaining;

   fld1
   fsub   QWORD PTR _remaining$[esp-4]

; 76   : //   progress = std::min( progress, 1.0 ); // is alredy done before call fraction_done
; 77   : //   prog = progress * ( 1.0 - pow( prog2, PROG_POWER ) ) + prog2 * pow(prog2,PROG_POWER );//original
; 78   : //   A = pow( prog2,PROG_POWER );
; 79   : //   prog = progress * ( 1.0 - A ) +  prog2 * A ;
; 80   : //   B = 1.0 - A; C = prog2 * A;
; 81   : //   prog = progress * B + C;
; 82   : //  D = progress * B;
; 83   : //   prog = D + C;
; 84   :
; 85   :    A = pow( prog2,PROG_POWER );

   fld   QWORD PTR __real@4018000000000000
   call   __CIpow

; 86   :    B = 1.0 - A; C = prog2 * A;

   fld1
   fsubrp   ST(1), ST(0)

; 87   :    D = progress * B;
; 88   :    prog = D + C;
; 89   :     boinc_fraction_done( prog );

   sub   esp, 8
   fmul   ST(0), ST(0)
   fmul   QWORD PTR _progress$[esp+4]
   fadd   ST(0), ST(0)
   fstp   QWORD PTR [esp]
   call   _boinc_fraction_done
   add   esp, 8

; 90   :     }

   ret   0
?fraction_done@@YAXNN@Z ENDP            ; fraction_done
---------------------------------------------------------------------------------------------------------------------------------------
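
A minimal C++ sketch of the rewrite shown in the listing above, for checking that the split-up form gives the same value as the original one-line expression. The function names blend_original and blend_refactored are hypothetical, boinc_fraction_done is left out so the snippet stands alone, and PROG_POWER = 6.0 is an assumption read off the constant __real@4018000000000000 in the listing (the IEEE-754 bit pattern for 6.0). The refactored version relies on the caller having clamped progress already, which is what the four call sites quoted above do.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Assumed value, decoded from the constant in the asm listing.
static const double PROG_POWER = 6.0;

// Original form of the expression: clamps progress again and
// writes pow() twice.
double blend_original(double progress, double remaining) {
    double prog2 = 1.0 - remaining;
    progress = std::min(progress, 1.0);
    return progress * (1.0 - std::pow(prog2, PROG_POWER))
         + prog2 * std::pow(prog2, PROG_POWER);
}

// Refactored form: pow() is evaluated once into A, and the clamp is dropped
// because every caller is expected to have applied it already.
double blend_refactored(double progress, double remaining) {
    assert(progress <= 1.0);          // precondition: caller clamped progress
    double prog2 = 1.0 - remaining;
    double A = std::pow(prog2, PROG_POWER);
    double B = 1.0 - A;
    double C = prog2 * A;
    double D = progress * B;
    return D + C;
}
```

The assert documents the assumption that makes dropping the clamp safe; a call site that forgets the std::min would otherwise silently report more than 100%.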

your comments are welcome

heinz



Title: Re: optimized sources
Post by: Jason G on 16 Dec 2007, 04:58:53 am
Working now on a vectorized version of chirpfft.cpp
heinz  ;D
Hi Heinz, I'm now on holidays :D, Are you looking at this one? I am trying to get reoriented after finishing study/work for the year, and am recovering after some serious celebrations :D.  It's time to catch up!

Jason

(PS, I've been raised to code wizard so I've been reading more of the private areas. I think some of the stuff we've been trying out to force the autovectoriser has some real relevance, and maybe we should start a thread about it there)
Title: Re: optimized sources
Post by: _heinz on 17 Dec 2007, 05:20:03 pm
Hi Jason,
I have not had time the last few days.... I think we should synchronize our code first; if you agree to use the new program structure, I will upload everything and PM you when it is done.
heinz
Title: Re: optimized sources
Post by: Jason G on 18 Dec 2007, 02:29:32 am
Hi Heinz,
Sounds like a good idea, PM when ready, take your time, no rush :D. For a comparative baseline reference, I have a functional 2.4V noGFX build with xN switches now.  It was tough finding a suitable Boincapi svn revision to build against because of much restructuring of random 'utils' and gfx classes between ~august 'til now.  [Investigating some unresolved externals actually led me to posts made by Simon back about July, on Beta, regarding the same sets of unresolved externals].  I think we should decide if we want to fix at a certain Boinc API svn revision (less work but may break), or build against the HEAD (lots more work...).

One initial feeling I get from that experience is that any improvements that cut out unnecessary boinc interface, and move some of the basic string, memory and utility functions away from boincapi --> back towards OS/app, might stabilise some of those issues (as these elements seem to be in constant flux in boincapi).... Yes, I'm aware that's the exact opposite feeling an api library is supposed to generate [stability and solidity].  Of course a stripped down minimalist 'version' might constitute its own branch.... Just ideas. [Might be an idea to make the required utility functions in our own lib, maybe allowing us to drop some boincapi .h & .c references completely, removing dependency on the revision... e.g. 'str_util.c' & 'str_util.h'.. do we really need to use boinc's version of this?...]

Jason
Title: Re: optimized sources
Post by: _heinz on 29 Dec 2007, 05:19:18 pm
I like the idea of a stripped down minimalist version, but we should eliminate unnecessary code with #ifdef directives, combined with include files for the variants, as I have done with USE_PFLOOP etc., because it is important to still have one source code from which we can generate all necessary program versions for the different CPUs.
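
A small sketch of what "one source, several program versions" can look like with #ifdef. USE_SSE is one of the defines visible in the build logs earlier in this thread; the function sum_array and both bodies are hypothetical and not taken from the client. In a real tree each body could live in its own include file, as described above.

```cpp
#include <cassert>
#include <cstddef>

#if defined(USE_SSE)
#include <xmmintrin.h>   // SSE intrinsics

// Variant selected with /D "USE_SSE": four floats per step, scalar tail.
float sum_array(const float* p, std::size_t n) {
    assert(p != nullptr || n == 0);
    __m128 acc = _mm_setzero_ps();
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        acc = _mm_add_ps(acc, _mm_loadu_ps(p + i));
    }
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    float total = lanes[0] + lanes[1] + lanes[2] + lanes[3];
    for (; i < n; ++i) {
        total += p[i];   // leftover elements
    }
    return total;
}

#else

// Portable variant, built when no SIMD define is given.
float sum_array(const float* p, std::size_t n) {
    assert(p != nullptr || n == 0);
    float total = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        total += p[i];
    }
    return total;
}

#endif
```

Callers only ever see sum_array, so the rest of the code stays identical across builds and only the define on the compiler command line changes.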

'str_util.c' & 'str_util.h'.. do we really need to use boinc's version of this?...] is a question for Joe

surprise... we are code wizards,  ::)   who did it?   

heinz
Title: Re: optimized sources
Post by: Jason G on 29 Dec 2007, 08:12:53 pm
Yay, I feel special too.. I believe it was "The Lunatic Mods of Chickenness" calling on the Holy Powers of the "Knights Who Say Ni!"
Title: Re: optimized sources
Post by: _heinz on 05 Feb 2008, 05:33:42 pm
Hi all who are reading here...
long time no post....we are working  :)

heinz
Title: Re: optimized sources
Post by: _heinz on 04 Mar 2008, 03:33:41 pm
Hi all,
we are still working...
Today one year in the forum now  ;)

heinz
Title: Re: optimized sources
Post by: _heinz on 19 Mar 2008, 04:00:09 pm
We are still there and the work is going on.

Happy Easter
Joyeuse Paques 

heinz
Title: Re: optimized sources
Post by: _heinz on 22 Mar 2008, 12:24:15 pm
Hi,
during all the development time (one year now) I have been crunching in parallel 24/7, all on one machine, a Northwood P4 2.66 GHz.
Since August 2007 (25566 at 2007-08-11) I have been going backwards in the "World Position List", and the curve looks like a typical "bathtub curve", as you can see here (http://boincstats.com/stats/boinc_user_graph.php?pr=bo&id=5e024335320e436c4d050e073963e326/)
This means many of the new users have better machines than mine.
Title: Re: optimized sources
Post by: Jason G on 22 Mar 2008, 12:30:51 pm
Heinz, keep in mind I did pay Aus$197 for my Wolfdale, which is really a piece of sand about 0.25 cm squared.  That is more expensive, yet more useful than gold...
Title: Re: optimized sources
Post by: _heinz on 22 Mar 2008, 12:46:37 pm
At the moment the price of an Intel Xeon E5405 activ (BX80574E5405A) Harpertown Quadcore/2000MHz, L2 Cache 12MB, bus clock 1333, Socket LGA 771 Boxed(80Watt) is still 176,90 Euros; that's a very interesting alternative for building an 8-core machine.
But these dual-CPU boards are very expensive.....
heinz
Edit:
178.00 Euros = 296.34 Australian dollars 
 
Exchange rate: 1.664847 
Rate valid as of: 22/3/2008 
-----------------------------------------
Here in Europe the Wolfdale E8400 2x3.0GHz BOX 6MB at 200 Euro is more expensive than a Xeon E5405 quad-core 2GHz at 178,62 Euro
this gives me something to think about   :o
Title: Re: optimized sources
Post by: _heinz on 22 Mar 2008, 01:15:51 pm
Heinz, keep in mind I did pay Aus$197 for my Wolfdale, which is really a piece of sand about 0.25 cm squared.  That is more expensive, yet more useful than gold...

197.00 Australian dollars = 118.33 Euros 
 
Exchange rate: 0.600656 
Rate valid as of: 22/3/2008 
-----------------------------------------
Intel Core2 Duo E8400 2x3.0GHz BOX 6MB
199,90 €
here in Germany ---> http://www.kmelektronik.de/
200.00 Euros = 332.97 Australian dollars 
 
Exchange rate: 1.664847 
Rate valid as of: 22/3/2008 
-------------------------------------------

you got a very hot price in Australia   ::)


heinz
Title: Re: optimized sources
Post by: Jason G on 22 Mar 2008, 08:27:40 pm
Yeah, not anymore though, I checked at the same place I bought my E8400:

    " Intel Core 2 Duo E8400 (3.00Ghz/6MB/1333FSB/EMT64/XD/Dual Core)
**SHORT SUPPLY. NEXT SHIPMENT DUE MID TO LATE APRIL** "  $399

 :o

Q6600 Quads are now $299 at the same place
Title: Re: optimized sources
Post by: _heinz on 22 Mar 2008, 10:32:29 pm
Sometimes I'm not sure what to do: Skulltrail yes/no with  E5405 (http://www.intel.com/cd/products/services/emea/deu/processors/xeon5000/344535.htm) (178 Euro)
or better wait for  Nehalem  (http://www.intel.com/technology/architecture-silicon/next-gen/index.htm?iid=tech_arch+body_45nm_nehalem) and Next Generation Intel Microarchitecture  (http://www.intel.com/technology/architecture-silicon/next-gen/whitepaper.pdf)
or meanwhile a cheaper solution:  board XFX GeForce 7150/MCP630i (70 Euro), no graphics card necessary, with an Intel Core2 Quad Q9450 4x2.67GHz BOX (300 Euro), case, RAM, disk.. all together ca. 680 Euro for the hardware + software XP Professional (130 Euro) for testing our parallel stuff.....
Title: Re: optimized sources
Post by: Jason G on 23 Mar 2008, 08:09:19 am
Sometimes I'm not sure what to do: Skulltrail yes/no with  E5405 (http://www.intel.com/cd/products/services/emea/deu/processors/xeon5000/344535.htm) (178 Euro)
or better wait for  Nehalem  (http://www.intel.com/technology/architecture-silicon/next-gen/index.htm?iid=tech_arch+body_45nm_nehalem)
or meanwhile a cheaper solution:  board XFX GeForce 7150/MCP630i (70 Euro), no graphics card necessary, with an Intel Core2 Quad Q9450 4x2.67GHz BOX (300 Euro), case, RAM, disk.. all together ca. 680 Euro for the hardware + software XP Professional (130 Euro) for testing our parallel stuff.....

Honestly Heinz, I'd say it'd be difficult to go past the Q6600 at the moment.  I'd guess the Yorkfields are being held off 'till the stock of those clears a bit.  Then the Yorkfields will be awesome [If this wolfdale is anything to go by].  I am getting the feeling that the Nehalem architecture will be a fairly radical departure from what we're used to, and it may take some time for the software to follow.  Perhaps something like the OpenMP standard gives some insight there,  many cores with shared memory.

Jason
Title: Re: optimized sources
Post by: _heinz on 23 Mar 2008, 09:03:57 am
Hi Jason,
please read Next Generation Intel Microarchitecture  (http://www.intel.com/technology/architecture-silicon/next-gen/whitepaper.pdf) with  Intel QuickPath Architecture  (http://www.intel.com/technology/quickpath/whitepaper.pdf); it is a revolution and a step forward to the next generation.  :o
Under these circumstances it is better to wait for it and meanwhile use a cheaper solution as described before.
Please give feedback once you have read it.

heinz
Title: Re: optimized sources
Post by: Jason G on 23 Mar 2008, 09:18:50 am
That pretty much describes the reasons I chose to settle for a Wolfdale right now:
  - I needed a fairly inexpensive upgrade (well, it was cheap at the time; now they have gone too high in price here)
  - My other jobs require me to support development of SSE4.1 functionality
  - My current system has limited power & cooling considerations (45 degree C desert heat, 1 10amp power circuit @240Vac, for everything)

 So the dual core suits my needs perfectly for now, but I recognise the software infrastructure required to make better use of parallelism is improving immensely, both through fine grained and thread level approaches. So my next upgrade will probably be either Nehalem or the 32nm variant after Nehalem, depending on what my work demands.. I saw the name of that next generation but don't remember it...
Title: Re: optimized sources
Post by: Gecko_R7 on 23 Mar 2008, 12:20:27 pm
Sometimes I'm not sure what to do: Skulltrail yes/no with  E5405 (http://www.intel.com/cd/products/services/emea/deu/processors/xeon5000/344535.htm) (178 Euro)
or better wait for  Nehalem  (http://www.intel.com/technology/architecture-silicon/next-gen/index.htm?iid=tech_arch+body_45nm_nehalem)
or meanwhile a cheaper solution: board XFX GeForce 7150/MCP630i (70 Euro), no graphics card necessary, with an Intel Core2 Quad Q9450 4x2.67GHz BOX (300 Euro), case, RAM, disk.. all together ca. 680 Euro for the hardware + software XP Professional (130 Euro) for testing our parallel stuff.....

Honestly Heinz, I'd say it'd be difficult to go past the Q6600 at the moment. I'd guess the Yorkfields are being held off 'till the stock of those clears a bit. Then the Yorkfields will be awesome [If this wolfdale is anything to go by]. I am getting the feeling that the Nehalem architecture will be a fairly radical departure from what we're used to, and it may take some time for the software to follow. Perhaps something like the OpenMP standard gives some insight there, many cores with shared memory.

Jason


I think Nehalem will be a quite expensive upgrade/transition for a bit when all costs are factored.
It's LGA1366 so factor a brand new Mobo.
Also, DDR3 is almost a given to take advantage of the new arch.

Likely to be massive price gouging and very little supply initially...ltd mobo options, and buggy release bios's.  Wouldn't be surprised for it to be at least Q2 09' before we see decent pricing and availability for us mere mortals that have budgets to consider.
We should also see some nice price drops on Penryn/Yorkfield and perhaps a new stepping as Nehalem is released.

Hard to argue against the current value and mobo selection of c2d & quadcore chips.
Pretty cheap $$$ / performance ratio.

Cheers!
Title: Re: optimized sources
Post by: Jason G on 23 Mar 2008, 01:17:34 pm
Likely to be massive price gouging and very little supply initially...ltd mobo options, and buggy release bios's.  Wouldn't be surprised for it to be at least Q2 09' before we see decent pricing and availability for us mere mortals that have budgets to consider.
We should also see some nice price drops on Penryn/Yorkfield and perhaps a new stepping as Nehalem is released.

Well priorities have a way of shifting depending on need.  As you point out p4's & AMDs of SSE2 vintage are still extremely popular according to boincstats, and dominate throughput in many respects.

What the tests seem to be showing is that Alex pretty well nailed the Core2 code, and unless we decide to tackle the other end there may be little left to do there for now (Unless, that is,  some of the relaxed validation requirements that have been spoken about are put in place, then the parallelism race may be back on in force). 

Early p4's have special characteristics to do with cache that aren't necessarily all that happy with techniques used in builds targeted for the core2 architecture.  There are speed improvements showing in the p4(SSE3) I tested, but not as great as the core2 improvements.  There might be plenty of room to tweak that, and the SSE3 instructions may as well be macro encapsulated while we're there, allowing SSE2 substitution.
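
To illustrate what "macro encapsulated, allowing SSE2 substitution" could mean, here is a rough sketch built around the SSE3 horizontal add. The names HADD_PS, pairwise_sums and USE_SSE3 are hypothetical (USE_SSE2 appears in the build logs in this thread), and the scalar branch is only there so the snippet builds with no define at all. This is an assumption about the approach, not code from the actual builds.

```cpp
#include <cassert>

#if defined(USE_SSE3)
#include <pmmintrin.h>   // SSE3: _mm_hadd_ps
#define HADD_PS(a, b) _mm_hadd_ps((a), (b))
#elif defined(USE_SSE2)
#include <emmintrin.h>   // SSE2 header, also provides the SSE float intrinsics
// Substitute for the SSE3 instruction: gather the even and the odd lanes of
// a and b with two shuffles, then add them.
// Note: the macro arguments are evaluated twice.
#define HADD_PS(a, b) \
    _mm_add_ps(_mm_shuffle_ps((a), (b), _MM_SHUFFLE(2, 0, 2, 0)), \
               _mm_shuffle_ps((a), (b), _MM_SHUFFLE(3, 1, 3, 1)))
#endif

// out = (a0+a1, a2+a3, b0+b1, b2+b3); a, b and out each point to 4 floats.
void pairwise_sums(const float* a, const float* b, float* out) {
    assert(a && b && out);
#if defined(HADD_PS)
    _mm_storeu_ps(out, HADD_PS(_mm_loadu_ps(a), _mm_loadu_ps(b)));
#else
    // Plain scalar fallback when neither define is set.
    out[0] = a[0] + a[1];
    out[1] = a[2] + a[3];
    out[2] = b[0] + b[1];
    out[3] = b[2] + b[3];
#endif
}
```

The calling code uses only HADD_PS (or the wrapper), so the same source can produce an SSE3 build and an SSE2 build for older p4's and AMDs.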

There is though possibly still quite a bit more opportunity to squeeze more performance from the core2 build first.  We have spoken about profile guided optimisations, which haven't been touched yet, and in fact no profiles have even been run yet to identify possible bottlenecks or problems with the build.  That is why, in my book, it is still considered pre-alpha.  Valid results are one thing, but releasing substandard builds I'd rather leave to the software companies who have the excuse of pressure from the marketing department.

Jason
Title: Re: optimized sources
Post by: Gecko_R7 on 23 Mar 2008, 01:33:30 pm

Did you intend this response to be attached to another thread?
Title: Re: optimized sources
Post by: Jason G on 23 Mar 2008, 01:35:21 pm
Probably  ;) I'm lost, oh well, here's as good a place as any  ;D [ Ahh yes, the questions about SSE2 support and AMD chips in the development thread. Oh well, we can draw a mental link; it's gotten crazy in there so I thought I'd come out for some fresh air  8) ]
Title: Re: optimized sources
Post by: _heinz on 27 Mar 2008, 06:58:11 pm
found the  supercruncher  (http://www.tomshardware.com/de/Supercomputer-Eigenbau,news-240741.html).  :o
10 quad cpus.
The idea is not new; I thought about it 2 years ago when I started with diskless CF machines. It is possible to build 4 watercooled dual-CPU boards into a server case, installed vertically, using one power supply and a common disk array. That gives 8 physical quad-core CPUs, 32 cores in total.
I would prefer vertical installation, as in a blade server.

I like such crazy things.  ;D

heinz

Title: Re: optimized sources
Post by: _heinz on 28 Mar 2008, 07:54:16 pm
The new servers are coming
Altos G540 TT.G54E0.033
Altos G540 Xeon QC E5405 2.0GHz/1333 MHz/12MB, 2x1GB FBD 667, 1xSAS/SATA HDD cage (4 bays, second cage optional), SATA 6 ch on board, no HDD and Carrier, DVD, Dual Gigabit LAN onboard, PS/2 mouse, no keyboard, EasyBuild 7.1, 3-year warranty and on-site service Next 
 
1321,18 Euro  there  (http://www.f-m-shop.de/seiten/frame_ga_schotte.cfm?kat=gArtikel&nav=2&artnr=991388505)

2GB RAM looks a little bit small for a quad, but the price is hot  ::)

heinz
Title: Re: optimized sources
Post by: Jason G on 28 Mar 2008, 10:44:26 pm
Yeah, the prices of FB-DIMMs (and DDR3 too) are why I have stayed with a DDR2 desktop mainboard.  It is a nice step up for you, and I think there is enough processing power there to decommission your other machines. Then you will quickly save enough money on electricity, and the prices of RAM will drop far enough that you can put 4 GB in easily  ;D
Title: Re: optimized sources
Post by: _heinz on 06 Apr 2008, 08:18:03 am
.... still have to wait for the rest of the hardware (the board is expected to be available from 18.04.08), hoping to start with the V8 around the 1st of May.

At the moment I'm collecting and preparing the necessary software...

For all others who are waiting for a new app... our developer team is working hard to prepare a v8 optimized app, which is coming soon.

Meanwhile I cleaned my old servers --->  server demounting  (http://www.britta-d.de/bilder/demount/demount.htm)
The board is now equipped with 2 Pentium 200MMX.  (http://www.britta-d.de/images/dual0002.jpg)
Did you ever have a look inside your power supply after 12 years? Here you can see it.

heinz  ;D


Title: Re: optimized sources
Post by: _heinz on 06 Apr 2008, 11:15:41 am
While we are waiting, here is a first look at the parts of the Xeon-V8 (http://www.britta-d.de/bilder/server/server.htm)
Continuing as new parts arrive.

heinz   ;D
Title: Re: optimized sources
Post by: _heinz on 06 Apr 2008, 02:30:48 pm
have a look ---> Today is the LHC Open Day  (http://lhc2008.web.cern.ch/LHC2008/index-E.html) 

have fun  ;D  .. heinz
Title: Re: optimized sources
Post by: _heinz on 06 Apr 2008, 09:50:28 pm
In preparation for future tasks under VM (http://www.britta-d.de/bilder/ibmbooks/books.htm), I had a look into my private library.  :o

modify:
better have a look at the newest  Windows Server  (http://www.microsoft.com/virtualization/default.mspx)

I put a new collection of VM-optimizing manuals/docs into our testproject repository.  ;D

heinz
Title: Re: optimized sources
Post by: _heinz on 08 Apr 2008, 11:06:44 am
Today I took some photos of the power supply. Have a look at Xeon V8-Server (http://www.britta-d.de/bilder/server/server.htm).
There are 4 connectors for PCI-E (for the possible 4 graphics cards of the D5400XS)
With the processors I got the Xeon label, but I am not allowed to place it on the front bezel of the case without an extra licence from Intel, huuuh  :o

heinz
Title: Re: optimized sources
Post by: Jason G on 08 Apr 2008, 12:01:45 pm
So are you saying you can have only one Xeon sticker on the front ?  LOL, maybe if you put it there Francois will come and make you peel it off again ...  ;D  If he does this I insist that you kick his bum!
Title: Re: optimized sources
Post by: _heinz on 08 Apr 2008, 12:11:24 pm
No, every CPU package contains a Xeon-inside label (so I have 2 of them), but I have not opened the second package yet.
See  Xeon inside  (http://www.britta-d.de/bilder/server/page11.htm) and read the full text.
Title: Re: optimized sources
Post by: Jason G on 08 Apr 2008, 12:20:13 pm
LOL, I like that .. "we give you a sticker, but you can't use it unless you agree to sign an agreement, and by doing so you are agreeing not to use the sticker on any system that we don't want you to!"   LOL ::)
Title: Re: optimized sources
Post by: _heinz on 08 Apr 2008, 12:27:42 pm
uuuhhh, for the second time my download of the 3.6 GB Fedora DVD ISO crashed, at 1.35 GB after 10 hours... :'(
hmmm... perhaps I should use a newer download manager... which one is recommended?
Title: Re: optimized sources
Post by: Raistmer on 08 Apr 2008, 12:38:45 pm
try GetRight :) I really like it
Title: Re: optimized sources
Post by: _heinz on 08 Apr 2008, 01:19:57 pm
got it now, thanks
regards heinz

It has started again... download time 16 hours at 63.8k/sec, that's the maximum here... (on the next street they have 2000)   Where is my DSL? France Telecom did not install a repeater.... we have been waiting 2 years already.  :'(   :'(   :'(
Title: Re: optimized sources
Post by: _heinz on 09 Apr 2008, 01:56:30 pm
The download of Fedora finished successfully now.  ;D thanks to Raistmer
Next download is running:  Windows Server 2008 Enterprise, 12 hours downloadtime....we are waiting... ;)
Title: Re: optimized sources
Post by: Raistmer on 09 Apr 2008, 02:46:22 pm
;D
Title: Re: optimized sources
Post by: _heinz on 09 Apr 2008, 04:26:19 pm
I put the MS_Virtualization_Overview_v1.1.doc into our testproject, it is worth having a look at it ... ;)

Title: Re: optimized sources
Post by: _heinz on 09 Apr 2008, 06:19:36 pm
Hi hidden user...looking up again  ;)
Title: Re: optimized sources
Post by: Raistmer on 10 Apr 2008, 05:06:35 am
I have doubts about how useful a VM is for the purpose of tuning the app. With a VM one can see if the app will work under different OSes, but get no timing at all IMHO...
The same with the instruction set. Even with a VM I can't emulate SSE4.1 on my Venice IMHO ;)
Title: Re: optimized sources
Post by: _heinz on 10 Apr 2008, 07:22:20 am
We will try out all this Windows2008 stuff... and we always have the possibility of using a boot menu at startup to load the OS under which we want to tune the app directly.
That would be very interesting... ;D
Title: Re: optimized sources
Post by: Jason G on 10 Apr 2008, 08:07:35 am
.....Even with VM I cant emulate SSE4.1 on my Venice IMHO ;)
LOL, maybe if you clench really hard like Hiro from Heroes it will magically turn into a Wolfdale  ;D.  Although I use VM's for testing,  I think any kind of timing benchmarks would be a waste of time.  This is why I wish to run dual boot to test 64 bit for myself,  might as well get you some sse41 PGO data while I'm there, running native.

 I think in future PGO instrumented runs will be okay in VM because we have proven them to be NOT hardware bound (at least with light PGO), but definitely code coverage and data dependent (which will execute the same on any hardware that supports the build, hopefully). This *might be* one good argument against installing the code dispatching mechanisms like in 2.4 & stock .. PGO data would have to be generically targeted across builds and include all the functions, so the performance target gets 'diluted'. 

Also relative runs to compare app speeds may be okay in VM, but not too representative of native speeds except *maybe* where there is some hardware virtualisation support available.

Jason
Title: Re: optimized sources
Post by: _heinz on 10 Apr 2008, 08:41:19 am
jason,
as I have seen hardware virtualisation is available there.  ;D
Today I got the email from Redmond for the evaluation with a lot of further info.
thanks Microsoft.  ;D
Title: Re: optimized sources
Post by: Jason G on 10 Apr 2008, 08:55:02 am
I think your new server is designed with something like this in mind :
    - Run a native Linux (Fedora core with custom compiled kernel), public facing webserver (Apache/PHP)
    - dedicated MySQL database VM, allows optimal filesystem configuration for the database(s)
    - Windows Server Host, for serving asp pages & dotnet web application & intranet stuff, maybe running an exchange server and nt domain
    - preferred Host OS as desired, probably a Pro edition of Windows, or a Fluffy Linux Variant as workstation
 
Now that saves a lot of hardware if you need all those things in a small to medium business.  Lots of power saved too

Jason.

   
Title: Re: optimized sources
Post by: _heinz on 13 Apr 2008, 08:46:18 pm
Fedora is the right one...
have now some installation packages
1. Fedora
1.1 CellBE
1.2 OS/390
2.  Server2008
2.1 SDK_2008
2.2 VS2008
2.3 ATI- GPU-developer kit

a lot of work .... to find, get, and later install it

heinz
Title: Re: optimized sources
Post by: _heinz on 14 Apr 2008, 07:37:36 pm
Does anybody know how much space is needed for a normal install of Fedora? Is a 4GB CF disk big enough?
The IDE port can be equipped with 2 CF cards
thanks
modify: found this
The /usr directory holds the majority of software content on a Fedora system. For an installation of the default set of software, allocate at least 4 GB of space. If you are a software developer or plan to use your Fedora system to learn software development skills, you may want to at least double this allocation.
here are some important links:
Fedora for You (http://fedoraproject.org/get-fedora.html)
Fedora 8 Installation Guide (http://docs.fedoraproject.org/install-guide/f8/)
Fedora Forums (http://forums.fedoraforum.org/)
Fedora Wiki (http://fedoraproject.org/wiki/FAQ/)
Fedora Documentation (http://docs.fedoraproject.org/)
Linux Documentation Project (http://www.tldp.org/)
Red Hat Enterprise Linux Documentation (http://www.redhat.com/docs/manuals/enterprise/)

so we have a lot to read
 ;)
Title: Re: optimized sources
Post by: Jason G on 15 Apr 2008, 09:01:46 am
Yeah, Fedora is a bit sophisticated for me; it's about 12 years since I used Linux.   I'd imagine the size requirement, like most Linux installations, would depend on what packages you wish to install and what you want to use the OS for. If you don't need a graphical desktop it could be tiny.  I'd guess that if you plan to run a GUI, webserver, database server, development tools to rebuild a custom kernel, and a swap partition too, then 4GB might be pushing it.. But for just a single user thing to develop and run small BOINC apps it might be heaps of room. Combined 8GB might be more than enough for just about anything except a high volume server.

I will be cranking up Ubuntu in a VM next week and I am considering doing a Linux systems administration course and Windows Server certification in parallel. Not sure yet.
Title: Re: optimized sources
Post by: _heinz on 15 Apr 2008, 01:15:17 pm
wow, new 32GB Transcend CompactFlash cards are available, price 132 Euro
read/write 45 MB / 21.5 MB per second and ECC correction, PIO Mode 6, UDMA 4, ATA
That's enough to install a full XP.....
For a year I have already had 2 machines running on CF for testing.... our MMX builds
machine 1  a 200MMx with a 4GB Transcend 266x running w98
machine 2 a dual-200MMx with a 2GB Transcend 120x running NT4 Workstation
both machines run fine, and are very quick...and they are integrated in my private network...

heinz
Title: Re: optimized sources
Post by: _heinz on 16 Apr 2008, 11:31:37 am
First Windows High Performance Computing (HPC) User Group Meeting on April 22, 2008
RWTH_RZ_Aachen (http://www.rz.rwth-aachen.de/go/id/sbb/lang/en)

Microsoft HPC++ (http://www.microsoft.com/windowsserver2003/ccs/default.aspx) look at News
have a look to explore it... ;D

heinz
Title: Re: optimized sources
Post by: _heinz on 18 Apr 2008, 01:06:23 pm
while I'm waiting for the last tools I invite you to have a look where the river Sauer kissed the Rhin.. (http://www.britta-d.de/bilder/rhin/rhin.htm)  ;D
heinz
Title: Re: optimized sources
Post by: _heinz on 18 Apr 2008, 06:30:59 pm
today we have 2 Hidden Users....greetings  ;)
Title: Re: optimized sources
Post by: _heinz on 19 Apr 2008, 06:54:38 pm
Today I reorganized my workspace, added 70cm to my workbench, and retired the old Mac Quadra 800, after it had done its work as a DTP machine for more than 12 years, to make room for the new server.
Today one old Mac 7300/200 is enough for me ... for DTP (to create PostScript) if needed.......
These old Macs are very secure machines... the hardware has a very long life ... ;D

Title: Re: optimized sources
Post by: _heinz on 22 Apr 2008, 11:17:37 am
you want to know some more about the GRID.. I invite you to have a look at the Grid Café (http://gridcafe.web.cern.ch/gridcafe/)
heinz  ;)
Title: Re: optimized sources
Post by: _heinz on 22 Apr 2008, 04:44:17 pm
Hi Hidden user.. have a look again  ;)
Title: Re: optimized sources
Post by: _heinz on 24 Apr 2008, 06:20:44 pm
for all freaks of CF
new 300x CF are available:
4096MB Transcend Compact Flash Card 300x ca. 45,00 Euro
8192MB Transcend Compact Flash Card 300x ca. 82,50 Euro
16GB Transcend Compact Flash Card 300x     ca. 173,30 Euro

Speed 300X CompactFlash® Cards running Dual-Channel Ultra DMA Mode 5*
- Hardware ECC (Error Correction Code)
- Supports IDE PIO Mode 6 and Ultra DMA Mode 5
------------------------------------------------------------------------
to use it you need an adapter (http://www.britta-d.de/bilder/server/page25.htm)
 ;D

Title: Re: optimized sources
Post by: Gecko_R7 on 24 Apr 2008, 07:38:59 pm
4 of those might work REAL nice in RAID 0 in this:

http://addonics.com/products/flash_memory_reader/ad4cfprj.asp
Title: Re: optimized sources
Post by: _heinz on 24 Apr 2008, 08:55:36 pm
Excellent, this CF RAID adapter... it opens up new possibilities for us .. thanks Gecko  ;)
Title: Re: optimized sources
Post by: _heinz on 26 Apr 2008, 05:04:27 pm
Some short history:
On 10 January 2008 we decided to do no further investigation into the 2.4 / 2.2B source.
Although I had invested a lot of work in the 2.4 / 2.2B source, we started afresh with the stock v6 source.
The goal is to restructure the seti_app so that much more of its flow chart runs in parallel, and to support the newest processor architectures.
The developer team (Heinz, Jason, Raistmer, Joe) is working on this now.
But the community cried very hard for a new seti_app for their quaddies, so Jason and Raistmer ported the ak_V8 code in the meantime.
It will be published on the 1st of May.
Give them what they want..  ;)
This was, though, an interruption of our real work of going parallel with the seti_app.

Because it is not possible to test and develop parallel code without having a machine to handle it, I decided to invest (work and money) in a well-equipped new 8-core server. This new machine gives us the possibilities and testbeds we need to develop HPC parallel apps.  I'm still waiting for the board, which will be available on 5th of May (hopefully)....
So there is some time to invite you to a short walk to the ponds (http://www.britta-d.de/bilder/weiher/weiher.htm)

heinz



Title: Re: optimized sources
Post by: _heinz on 27 Apr 2008, 09:55:08 pm
you want to know some more about the GRID.. I invite you to have a look at the Grid Café (http://gridcafe.web.cern.ch/gridcafe/)
Welcome to the EGI (http://web.eu-egi.eu/)
Welcome to CERN (http://public.web.cern.ch/Public/Welcome.html)
Read the CERN Courier (http://cerncourier.com/cws/latest/cern;jsessionid=D98C32D34A5668815335E6A1BD1B210B) the International Journal of High-Energy Physics

have fun
heinz  ;)

Title: Re: optimized sources
Post by: _heinz on 30 Apr 2008, 11:49:09 am
The ak_v8 is up now...
I wish you all a nice 1 May   ;D

heinz
Title: Re: optimized sources
Post by: _heinz on 19 Jun 2008, 04:11:12 am
Hi all who are reading here,
Today, 19th of June, is the first time since the Lunatics outage that I can post here again.
Greetings to all.  ;)

What happened in the meantime:
1. The V8-Xeon is built.
2. A month ago today, on 19th of May, the V8-Xeon started to crunch. (Read the V8-Xeon Server (http://setiweb.ssl.berkeley.edu/forum_thread.php?id=47247) thread on seti)
    The results are great, the machine crunched more in 30 days than my P4 2.6GHz Northwood in 3 years and 4 months.
3. V8-Xeon runs absolutely stable.
4. The machine has a boot menu and a very handy backup system and runs Vista64Ultimate,
    till now no other OS is installed.

What is to do next:
1. Install other necessary OS for development and testing
2. Install program developer environment in the different OS's
3. Testing functionality of the developer environment
4. Start again with our development process for seti & Co.
I hope to do points 1-3 before the summer holidays start....

heinz
Title: Re: optimized sources
Post by: Jason G on 19 Jun 2008, 09:35:20 am
Hi Heinz,
   Got some holidays coming in two weeks also, so perhaps I'll have a lot more time to look at this also.  It'll be nice to finally get a chance to look a bit deeper into the code/algorithms.  Glad to see you're back.

Jason
Title: Re: optimized sources
Post by: _heinz on 20 Jun 2008, 09:34:42 am

Server2008
I downloaded the file 6001.18000.080118-1840_amd64fre_Server_de-de-KRMSXFRE_DE_DVD.iso some months ago, thinking I would get a free version of Server2008. Now when I try to install, it shows the window for entering the registration number, but I have none. If I tell it to go on without one, the next menu comes up where you must choose which server type to install, but I can't select any of them, and clicking on to the next step is impossible.
It's frustrating...
So I think we can forget it......
A real pity

heinz
Title: Re: optimized sources
Post by: Urs Echternacht on 20 Jun 2008, 04:55:52 pm
Windows Server 2008 (http://www.microsoft.com/downloads/details.aspx?FamilyID=13C7300E-935C-415A-A79C-538E933D5424&displaylang=de) and extending the evaluation period (http://support.microsoft.com/kb/948472). Don't know if that will work, but I think it's worth a try.
Title: Re: optimized sources
Post by: _heinz on 28 Jun 2008, 04:24:28 am
Thanks Urs for your hints  :)
meanwhile I installed VS2008Express under Vista64, next will be Fedora9 with Eclipse.

heinz


Title: Re: optimized sources
Post by: _heinz on 01 Jul 2008, 02:25:28 pm
Today, 1st of July, the V8-Xeon reached a RAC of 8000 and is now one of the first 25 computers in the "top list" of seti.

heinz  ;D
Title: Re: optimized sources
Post by: Raistmer on 02 Jul 2008, 08:14:16 am
Greetings!  :) Keep up !
Title: Re: optimized sources
Post by: _heinz on 05 Jul 2008, 09:54:48 am
looks like I can OC the machine up to 4GHz
Everest shows:
    Processor Properties:
      Manufacturer                                      Intel(R) Corporation
      Version                                           Intel(R) Xeon(R) CPU E5405 @ 2.00GHz
      External Clock                                    333 MHz
      Maximum Clock                                     4000 MHz
      Current Clock                                     2000 MHz
      Type                                              Central Processor
      Voltage                                           1.6 V
      Status                                            Enabled
      Upgrade                                           Socket LGA771
      Socket Designation                                J6PR

-----------------------------------------------------------------------------------------------------------------------------
 CPU Properties:
      CPU Type                                          2x QuadCore Intel Xeon E5405
      CPU Alias                                         Harpertown
      CPU Stepping                                      C0/M0
      Engineering Sample                                No
      CPUID CPU Name                                    Intel(R) Xeon(R) CPU E5405 @ 2.00GHz
      CPUID Revision                                    00010676h
      Core Voltage                                      1.038 V

    CPU Speed:
      CPU Clock                                         2398.3 MHz  (Original: [ TRIAL VERSION ] MHz, overclock: 20%)
      CPU Multiplier                                    6x
      CPU FSB                                           399.7 MHz  (Original: 333 MHz, overclock: 20%)
      Memory Bus                                        399.7 MHz
      DRAM:FSB Ratio                                    1:1

----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sensor Properties:
      Sensor Type                                       Dual ADT7490  (SMBus 2Ch, 2Eh)
      GPU Sensor Type                                   Diode  (ATI-Diode)
      Motherboard Name                                  Intel D5400XS

    Temperatures:
      CPU1                                              59 °C  (138 °F)
      CPU2                                              58 °C  (136 °F)
      CPU #1 / Core #1                                  74 °C  (165 °F)
      CPU #1 / Core #2                                  64 °C  (147 °F)
      CPU #1 / Core #3                                  71 °C  (160 °F)
      CPU #1 / Core #4                                  72 °C  (162 °F)
      CPU #2 / Core #1                                  70 °C  (158 °F)
      CPU #2 / Core #2                                  66 °C  (151 °F)
      CPU #2 / Core #3                                  68 °C  (154 °F)
      CPU #2 / Core #4                                  68 °C  (154 °F)
      DIMM                                              76 °C  (169 °F)
      GPU Diode                                         59 °C  (138 °F)
      Temperature 1                                     57 °C  (135 °F)
      Temperature 2                                     56 °C  (133 °F)
      Temperature 3                                     54 °C  (129 °F)
      FB-DIMM1                                          92 °C  (198 °F)
      FB-DIMM2                                          84 °C  (183 °F)
      Seagate ST31000340NS                              [ TRIAL VERSION ]
      Seagate ST31000340NS                              [ TRIAL VERSION ]
      Seagate ST31000340NS                              [ TRIAL VERSION ]

    Cooling Fans:
      CPU1                                              650 RPM
      CPU2                                              652 RPM
      North Bridge                                      3887 RPM
      South Bridge                                      4255 RPM
      DIMM                                              3417 RPM
      Aux                                               1176 RPM

    Voltage Values:
      CPU1 Core                                         1.21 V
      CPU2 Core                                         1.11 V
      +1.5 V                                            1.51 V
      +3.3 V                                            3.35 V
      +5 V                                              5.13 V
      +12 V                                             [ TRIAL VERSION ]
      FSB VTT                                           1.11 V
      North Bridge Core                                 1.25 V
      DIMM                                              1.80 V


--------[ CPU ]---------------------------------------------------------------------------------------------------------

    CPU Properties:
      CPU Type                                          2x QuadCore Intel Xeon E5405, 2400 MHz (6 x 400)
      CPU Alias                                         Harpertown
      CPU Stepping                                      C0/M0
      Instruction Set                                   x86, x86-64, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1
      Original Clock                                    [ TRIAL VERSION ]
      Min / Max CPU Multiplier                          6x / 6x
      Engineering Sample                                No
      L1 Code Cache                                     32 KB per core
      L1 Data Cache                                     [ TRIAL VERSION ]
      L2 Cache                                          2x 6 MB  (On-Die, ASC, Full-Speed)

    Multi CPU:
      Motherboard ID                                   
      CPU #1                                            Intel(R) Xeon(R) CPU E5405 @ 2.00GHz, 2411 MHz
      CPU #2                                            Intel(R) Xeon(R) CPU E5405 @ 2.00GHz, 2398 MHz
      CPU #3                                            Intel(R) Xeon(R) CPU E5405 @ 2.00GHz, 2398 MHz
      CPU #4                                            Intel(R) Xeon(R) CPU E5405 @ 2.00GHz, 2398 MHz
      CPU #5                                            Intel(R) Xeon(R) CPU E5405 @ 2.00GHz, 2398 MHz
      CPU #6                                            Intel(R) Xeon(R) CPU E5405 @ 2.00GHz, 2398 MHz
      CPU #7                                            Intel(R) Xeon(R) CPU E5405 @ 2.00GHz, 2398 MHz
      CPU #8                                            Intel(R) Xeon(R) CPU E5405 @ 2.00GHz, 2398 MHz

    CPU Manufacturer:
      Company Name                                      Intel Corporation
      Product Information                               http://www.intel.com/products/processor

    CPU Utilization:
      CPU #1 / Core #1                                  100 %
      CPU #1 / Core #2                                  100 %
      CPU #1 / Core #3                                  100 %
      CPU #1 / Core #4                                  100 %
      CPU #2 / Core #1                                  100 %
      CPU #2 / Core #2                                  100 %
      CPU #2 / Core #3                                  100 %
      CPU #2 / Core #4                                  100 %


--------[ CPUID ]-------------------------------------------------------------------------------------------------------

    CPUID Properties:
      CPUID Manufacturer                                GenuineIntel
      CPUID CPU Name                                    Intel(R) Xeon(R) CPU E5405 @ 2.00GHz
      CPUID Revision                                    00010676h
      IA Brand ID                                       00h  (Unknown)
      Platform ID                                       27h / MC 40h  (LGA771 DP)
      Microcode Update Revision                         60B
      HTT / CMP Units                                   0 / 4
      Tjmax Temperature                                 105 °C  (221 °F)

--------[ Motherboard ]-------------------------------------------------------------------------------------------------

    Motherboard Properties:
      Motherboard ID                                    XS54010J.86A.1140.2008.0520.2239
      Motherboard Name                                  Intel Skulltrail D5400XS

    Front Side Bus Properties:
      Bus Type                                          Intel AGTL+
      Bus Width                                         64-bit
      Real Clock                                        400 MHz (QDR)
      Effective Clock                                   1600 MHz
      Bandwidth                                         12800 MB/s
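
The clock and bandwidth figures in this report hang together with simple arithmetic, which can be sketched as below. The function names are made up for this example, and the formulas (core clock = multiplier x FSB, quad-pumped FSB = 4 transfers per clock, bandwidth = bytes per transfer x effective rate) are the usual rules of thumb, taken here as assumptions rather than anything the report states.

```c
/* Core clock in MHz from the CPU multiplier and the FSB base clock. */
double core_clock_mhz(double multiplier, double fsb_mhz)
{
    return multiplier * fsb_mhz;
}

/* Effective FSB rate in MHz, assuming a quad-pumped (QDR) bus. */
double fsb_effective_mhz(double fsb_mhz)
{
    return 4.0 * fsb_mhz;
}

/* Peak bus bandwidth in MB/s: bytes per transfer times effective rate. */
double bus_bandwidth_mbs(int bus_width_bits, double effective_mhz)
{
    return (bus_width_bits / 8) * effective_mhz;
}

/* Overclock in percent relative to the stock FSB clock. */
double overclock_percent(double actual_mhz, double stock_mhz)
{
    return (actual_mhz / stock_mhz - 1.0) * 100.0;
}
```

With the values shown here, a 6x multiplier on a 400 MHz FSB gives 2400 MHz, the QDR rate is 1600 MHz, a 64-bit bus at that rate gives 12800 MB/s, and 399.7 MHz against the stock 333 MHz works out to roughly 20% overclock.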

    Memory Bus Properties:
      Bus Type                                          Dual DDR2 SDRAM
      Bus Width                                         128-bit
      DRAM:FSB Ratio                                    1:1
      Real Clock                                        400 MHz (DDR)
      Effective Clock                                   800 MHz
      Bandwidth                                         [ TRIAL VERSION ] MB/s

    Motherboard Physical Info:
      CPU Sockets/Slots                                 2 LGA771
      Expansion Slots                                   [ TRIAL VERSION ]
      RAM Slots                                         4 FB-DIMM
      Integrated Devices                                Audio, Gigabit LAN
      Form Factor                                       Extended ATX
      Motherboard Chipset                               i5400
      Extra Features                                    [ TRIAL VERSION ]

    Motherboard Manufacturer:
      Company Name                                      Intel Corporation
      Product Information                               http://www.intel.com/products/motherboard/index.htm
      BIOS Download                                     http://downloadcenter.intel.com/default.aspx

--------[ SPD ]---------------------------------------------------------------------------------------------------------

  [ Channel0-DIMM1: Kingston (4 GB DDR2-800 ECC DDR2 SDRAM FB-DIMM) ]

    Memory Module Properties:
      Module Name                                       Kingston
      Serial Number                                     820E8BBDh (3180007042)
      Manufacture Date                                  Week 15 / 2008
      Module Size                                       4 GB (2 ranks, 8 banks)
      Module Type                                       FB-DIMM
      Memory Type                                       DDR2 SDRAM FB-DIMM
      Memory Speed                                      DDR2-800 (400 MHz)
      Module Voltage                                    1.5 V / 1.8 V
      Error Detection Method                            ECC
      Refresh Rate                                      Reduced (7.8 us)
      DRAM Manufacturer                                 Elpida

    Memory Timings:
      @ 320 MHz                                         4-4-4-15  (CL-RCD-RP-RAS) / 18-41-3-5-3-3  (RC-RFC-RRD-WR-WTR-RTP)
      @ 400 MHz                                         5-5-5-18  (CL-RCD-RP-RAS) / 22-51-3-6-3-3  (RC-RFC-RRD-WR-WTR-RTP)

    Memory Module Features:
      50 Ohm On-Die Termination                         Supported
      75 Ohm On-Die Termination                         Supported
      150 Ohm On-Die Termination                        Supported

    Memory Module Manufacturer:
      Company Name                                      Kingston Technology Company, Inc.
      Product Information                               http://www.kingston.com/products/default.asp

  [ Channel1-DIMM1: Kingston (4 GB DDR2-800 ECC DDR2 SDRAM FB-DIMM) ]

    Memory Module Properties:
      Module Name                                       Kingston
      Serial Number                                     1904442Eh (776209433)
      Manufacturing Date                                Week 14 / 2008
      Module Size                                       4 GB (2 ranks, 8 banks)
      Module Type                                       FB-DIMM
      Memory Type                                       DDR2 SDRAM FB-DIMM
      Memory Speed                                      DDR2-800 (400 MHz)
      Module Voltage                                    1.5 V / 1.8 V
      Error Detection Method                            ECC
      Refresh Rate                                      Reduced (7.8 us)
      DRAM Manufacturer                                 Elpida

    Memory Timings:
      @ 320 MHz                                         4-4-4-15  (CL-RCD-RP-RAS) / 18-41-3-5-3-3  (RC-RFC-RRD-WR-WTR-RTP)
      @ 400 MHz                                         5-5-5-18  (CL-RCD-RP-RAS) / 22-51-3-6-3-3  (RC-RFC-RRD-WR-WTR-RTP)

    Memory Module Features:
      50 Ohm On-Die Termination                         Supported
      75 Ohm On-Die Termination                         Supported
      150 Ohm On-Die Termination                        Supported

    Memory Module Manufacturer:
      Company Name                                      Kingston Technology Company, Inc.
      Product Information                               http://www.kingston.com/products/default.asp


--------[ BIOS ]--------------------------------------------------------------------------------------------------------

    BIOS Properties:
      BIOS Type                                         Intel
      BIOS Version                                      XS54010J.86A.1140.2008.0520.2239
      System BIOS Date                                  05/20/08
      Video BIOS Date                                   01/21/08

--------[ Windows Video ]-----------------------------------------------------------------------------------------------

  [ ATI Radeon HD 3870 ]

    Video Adapter Properties:
      Device Description                                ATI Radeon HD 3870
      Adapter Series                                    ATI Radeon HD 3870
      BIOS Version                                      11X-1E620A-100
      Chip Type                                         ATI Radeon Graphics Processor (0x9501)
      DAC Type                                          Internal DAC(400MHz)
      Driver Date                                       20.12.2007
      Driver Version                                    8.451.0.0
      Driver Provider                                   ATI Technologies Inc.
      Memory Size                                       512 MB

    Installed Drivers:
      atiumdag                                          7.14.10.0555 - ATI Catalyst 8.1
      atiumdva                                          7.14.10.0178
      atiumd64                                          7.14.10.0555
      atiumd6a                                          7.14.10.0178
      atitmm64                                          6.14.11.17

    Video Adapter Manufacturer:
      Company Name                                      Advanced Micro Devices, Inc.
      Product Information                               http://ati.amd.com/products/home-office.html
      Driver Download                                   http://game.amd.com/us-en/drivers_catalyst.aspx
      Driver Update                                     http://driveragent.com?ref=59

  [ ATI Radeon HD 3870 ]

    Video Adapter Properties:
      Device Description                                ATI Radeon HD 3870
      Adapter Series                                    ATI Radeon HD 3870
      BIOS Version                                      11X-1E620A-100
      Chip Type                                         ATI Radeon Graphics Processor (0x9501)
      DAC Type                                          Internal DAC(400MHz)
      Driver Date                                       20.12.2007
      Driver Version                                    8.451.0.0
      Driver Provider                                   ATI Technologies Inc.
      Memory Size                                       512 MB

    Installed Drivers:
      atiumdag                                          7.14.10.0555 - ATI Catalyst 8.1
      atiumdva                                          7.14.10.0178
      atiumd64                                          7.14.10.0555
      atiumd6a                                          7.14.10.0178
      atitmm64                                          6.14.11.17

    Video Adapter Manufacturer:
      Company Name                                      Advanced Micro Devices, Inc.
      Product Information                               http://ati.amd.com/products/home-office.html
      Driver Download                                   http://game.amd.com/us-en/drivers_catalyst.aspx
      Driver Update                                     http://driveragent.com?ref=59


--------[ PCI / AGP Video ]---------------------------------------------------------------------------------------------

    ATI Radeon HD 3870 (RV670)                                                        Video Adapter
    ATI Radeon HD 3870 (RV670)                                                        3D Accelerator


--------[ Graphics Processor (GPU) ]------------------------------------------------------------------------------------

  [ PCI Express 2.0 x16: Sapphire Radeon HD 3870 ]

    Graphics Processor Properties:
      Video Adapter                                     Sapphire Radeon HD 3870
      BIOS Version                                      010.077.000.000.000000
      BIOS Date                                         01/21/08 08:48
      GPU Code Name                                     RV670
      Part Number                                       11X-1E620A-100
      PCI Device                                        1002-9501 / 174B-E620  (Rev 00)
      Transistors                                       666 million
      Process Technology                                55 nm
      Die Size                                          192 mm2
      Bus Type                                          PCI Express 2.0 x16 @ x16
      Memory Size                                       512 MB
      GPU Clock                                         297 MHz  (Original: 776 MHz)
      RAMDAC Clock                                      400 MHz
      Pixel Pipelines                                   16
      Texture Units (TMU) / Pipeline                    1
      Unified Shaders                                   320  (v4.1)
      DirectX Hardware Support                          DirectX v10.1
      Pixel Fill Rate                                   4752 MPixel/s
      Texel Fill Rate                                   [ TRIAL VERSION ]

    Memory Bus Properties:
      Bus Type                                          GDDR4
      Bus Width                                         256 bit
      Real Clock                                        1125 MHz (DDR)  (Original: 1126 MHz)
      Effective Clock                                   2250 MHz
      Bandwidth                                         [ TRIAL VERSION ]

    Miscellaneous:
      Fan Speed                                         81%
      Utilization                                       1%

    ATI PowerPlay (BIOS):
      State #1                                          GPU: 777 MHz, Memory: 1126 MHz  (Boot)
      State #2                                          GPU: 776 MHz, Memory: 1126 MHz  (OverDrive)
      State #3                                          GPU: 776 MHz, Memory: 1126 MHz
      State #4                                          GPU: 776 MHz, Memory: 1126 MHz  (UVD)

    Graphics Processor Manufacturer:
      Company Name                                      Advanced Micro Devices, Inc.
      Product Information                               http://ati.amd.com/products/home-office.html
      Driver Download                                   http://game.amd.com/us-en/drivers_catalyst.aspx
      Driver Update                                     http://driveragent.com?ref=59

    ATI GPU Registers:
      ati-$0134                                         3ABE40E8
      ati-$0138                                         000031EE
      ati-$0600                                         21001CBC
      ati-$0610                                         24001F54
      ati-$0618                                         000176EB
      ati-$06FC                                         00000000
      ati-$0700                                         00001C5A
      ati-$0704                                         00001419
      ati-$0708                                         00001409
      ati-$070C                                         00812090
      ati-$078C                                         000241EB
      ati-$07D4                                         00000000
      ati-$07D8                                         00000000
      ati-$07F4                                         0018F239
      ati-$0808                                         00136D87
      ati-$2004                                         00002F10
      ati-$2408                                         000013A9
      ati-$5428                                         20000000
      ati-$8950                                         00000000
      ati-$8954                                         00000000
-------------------------------------------------------------------------------------------------------------------

And last but not least, we have a 3870 graphics processor with 320 stream processing units,
and I will use this resource to support SETI in our next app.
There is some news from ATI: the 3870, 3870 X2 and 4870 will support full floating-point functionality together with a new developer environment.
This opens up many more possibilities for us than NVIDIA offers at the moment.
The 4870 has 800 stream processing units... and the 4870 X2 is coming in the near future.
So we know where the development must go...
Using GPUs together with their development environments and ITBB are our goals for the next half year of our development process.
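
As a quick cross-check of the Everest report above, here is a minimal Python sketch that recomputes two of the reported peak figures from the width and clock values listed next to them. The function names are my own, and the formulas are the usual peak-rate approximations (bytes per transfer times transfers per second, and one pixel per pipeline per clock), not anything taken from Everest itself.

```python
def bus_bandwidth_mb_s(width_bits: int, effective_mhz: float) -> float:
    """Peak transfer rate in MB/s: bytes per transfer times million transfers per second."""
    return width_bits / 8 * effective_mhz


def pixel_fill_rate_mpix_s(pipelines: int, gpu_mhz: float) -> float:
    """Peak pixel fill rate in MPixel/s, assuming one pixel per pipeline per clock."""
    return pipelines * gpu_mhz


# Front side bus from the report: 64 bit wide, 400 MHz quad-pumped = 1600 MHz effective.
fsb = bus_bandwidth_mb_s(64, 1600)
# GPU from the report: 16 pixel pipelines at the currently throttled 297 MHz clock.
fill = pixel_fill_rate_mpix_s(16, 297)

print(f"FSB bandwidth:   {fsb:.0f} MB/s")
print(f"Pixel fill rate: {fill:.0f} MPixel/s")
```

Both results agree with the report (12800 MB/s and 4752 MPixel/s), which also shows that the fill rate was measured while the GPU was idling at 297 MHz rather than at its original 776 MHz.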

regards _heinz  ;D



Title: Re: optimized sources
Post by: Jason G on 05 Jul 2008, 10:06:23 am
Hi Heinz,
   Looks like you can cook some eggs for breakfast on your RAM  ;D I guess as you push the CPU's a bit more this may be the first thing that may require some compromise (perhaps on the 1:1 ratio).  If you drop the voltage on those a little do you get errors?, just curious

Jason
Title: Re: optimized sources
Post by: _heinz on 05 Jul 2008, 02:13:17 pm
Hi Heinz,
   Looks like you can cook some eggs for breakfast on your RAM  ;D I guess as you push the CPU's a bit more this may be the first thing that may require some compromise (perhaps on the 1:1 ratio).  If you drop the voltage on those a little do you get errors?, just curious

Jason
I have seen this:
      DIMM                                              76 °C  (169 °F)
      FB-DIMM1                                          92 °C  (198 °F)        ::)
      FB-DIMM2                                          84 °C  (183 °F)
Have a look at the specification:
Tstg   Storage temperature                                  min = -55 °C, max = 100 °C
Tcase  DDR2 SDRAM device operating temperature (ambient)    min = 0 °C, max = 95 °C  (1)
Tcase  AMB device operating temperature (ambient)           min = 0 °C, max = 110 °C
(1) Above 85 °C DRAM case temperature, the auto-refresh command interval has to be reduced to tREFI = 3.9 µs
------------------------------------------------------------------------------------------------------------------------------------------------------------
Current room temperature = 27.8 °C
My own FB-DIMM sensor is mounted directly on the AMB and currently shows 68 °C (left case fan on minimum speed).
If I switch both case fans on with the potentiometer at full speed, the FB-DIMM temperature is 66 °C and the CPU temperatures are min = 54 °C, max = 67 °C, but
FB-DIMM1 = 96 °C
FB-DIMM2 = 90 °C
Hmm... the temperature of the FB-DIMMs is going up towards 100 °C... so it is better to switch the case fans off at the moment.
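
To make the comparison with the specification explicit, here is a small Python sketch that checks a reading against the limits quoted above. It assumes that the FB-DIMM1/FB-DIMM2 values are DRAM case temperatures; if they are actually AMB temperatures, the 110 °C limit applies instead. The function and constant names are hypothetical, and the 7.8 µs default is the refresh rate shown in the SPD data.

```python
NORMAL_TREFI_US = 7.8          # refresh interval reported in the SPD data
REDUCED_TREFI_US = 3.9         # required above 85 C according to note (1)
REDUCED_REFRESH_ABOVE_C = 85   # threshold from note (1)
DRAM_MAX_OPERATING_C = 95      # Tcase max for the DDR2 SDRAM devices


def assess_dram(temp_c: float):
    """Return (status, required tREFI in microseconds or None if out of spec)."""
    if temp_c > DRAM_MAX_OPERATING_C:
        return "out of spec", None
    if temp_c > REDUCED_REFRESH_ABOVE_C:
        return "ok, reduced refresh", REDUCED_TREFI_US
    return "ok", NORMAL_TREFI_US


# Readings taken from the post (fans on full speed, and the earlier report).
for name, temp in [("FB-DIMM1", 96), ("FB-DIMM2", 90), ("FB-DIMM2 earlier", 84)]:
    status, trefi = assess_dram(temp)
    print(f"{name}: {temp} C -> {status}, tREFI = {trefi}")
```

Under that assumption, the 96 °C reading would already be one degree above the DRAM operating limit, while 90 °C is still allowed with the shorter refresh interval.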

To keep the temperatures down it is necessary to have a DRAM cooler, which I have mounted.
I have noticed that if I switch on the big left case fan, the warm air from the right CPU (2) is pushed down onto the DRAM cooler, and the temperatures increase because of this. I should mount a 45-degree metal sheet between the two CPUs, so that the warm air from the right CPU (2) goes upwards and not into CPU (1).
Sometimes it is better to switch the case fans off; the CPU temperatures then rise to max 74 °C, but the FB-DIMM temperatures are not so high.
Now I have the case fans off: FB-DIMM = 69 °C, DIMM1 = 96 °C, DIMM2 = 88 °C, CPUs min = 63 °C, max = 74 °C

We know the FB-DIMMs run hot... the FB-DIMM cooler runs at 3280 rpm, controlled automatically by the board...
Have you noticed that the two CPU coolers still run very slowly? CPU1 = 648 rpm, CPU2 = 651 rpm...
The system is really quiet as long as the room temperature is not over 30 °C.

Some experiments are still necessary to find the optimal setup...

heinz (the breakfast eggs are cooked now)  ;D




Title: Re: optimized sources
Post by: _heinz on 05 Jul 2008, 04:36:26 pm
A small test:
I stopped BOINC and waited 15 min for the machine to cool down.
FB-DIMM = 69 °C (my sensor)
FB-DIMM  = 72 °C
FB-DIMM1 = 67 °C
FB-DIMM2 = 62 °C
------------------------------------------------------
Conclusion:
100% full load increases the temperature
of FB-DIMM1 from 67 to 96 °C
and of FB-DIMM2 from 62 to 88 °C,
and the mounted OCZ DIMM fan speeds up from 2748 rpm to 3302 rpm under full load.
As Jason already commented:
heinz (the breakfast eggs are cooked now) LOL

 ;D
Title: Re: optimized sources
Post by: Raistmer on 05 Jul 2008, 05:02:52 pm
New approach to grid computing ;)
http://www.codeproject.com/KB/silverlight/gridcomputing.aspx
Title: Re: optimized sources
Post by: _heinz on 05 Jul 2008, 06:05:10 pm
New approach to grid computing ;)
http://www.codeproject.com/KB/silverlight/gridcomputing.aspx

Thank you very much for reminding me, I got an email too ;)
heinz
Title: Re: optimized sources
Post by: _heinz on 05 Jul 2008, 06:38:30 pm
Midnight (24:00)
Temp = 21.9 °C, humidity = 97 %
FB-DIMM = 66 °C (my sensor)
With both case fans on (potentiometer at min) I can cool the machine down a bit.
Here are the values from Everest (100% full load on all 8 cores):

Sensor Properties:
Sensor Type Dual ADT7490 (SMBus 2Ch, 2Eh)
GPU Sensor Type Diode (ATI-Diode)
Motherboard Name Intel D5400XS

Temperatures:
CPU1 48 °C (118 °F)
CPU2 52 °C (126 °F)
CPU #1 / Core #1 66 °C (151 °F)
CPU #1 / Core #2 53 °C (127 °F)
CPU #1 / Core #3 63 °C (145 °F)
CPU #1 / Core #4 63 °C (145 °F)
CPU #2 / Core #1 61 °C (142 °F)
CPU #2 / Core #2 56 °C (133 °F)
CPU #2 / Core #3 61 °C (142 °F)
CPU #2 / Core #4 61 °C (142 °F)
DIMM 69 °C (156 °F)
GPU Diode 40 °C (104 °F)
Temperature 1 48 °C (118 °F)
Temperature 2 46 °C (115 °F)
Temperature 3 48 °C (118 °F)
FB-DIMM1 90 °C (194 °F)
FB-DIMM2 84 °C (183 °F)

Cooling Fans:
CPU1 610 RPM
CPU2 637 RPM
North Bridge 2502 RPM
South Bridge 4182 RPM
DIMM 2671 RPM
Aux 566 RPM
-----------------------------------------------------------------
FB-DIMM1 is 6 degrees cooler now, and the CPU temps are around 10 degrees cooler;
it looks like the high humidity helps with cooling  :o

heinz

Title: Re: optimized sources
Post by: _heinz on 12 Aug 2008, 04:33:57 pm
The V8-Xeon server now has the highest possible Windows performance index of 5.9

performance index (http://www.britta-d.de/images/v8_leistungsindex.jpg)   ;D

heinz
Title: Re: optimized sources
Post by: Raistmer on 12 Aug 2008, 04:45:34 pm
Congratulations! :)
My quad loses on graphics. It seems 5.9 is the highest mark Vista can give? The V8 surely should have higher CPU marks than a quad...
Title: Re: optimized sources
Post by: Jason G on 12 Aug 2008, 09:47:40 pm
Hey Heinz! Time to drag out the development documentation again  ;D
Title: Re: optimized sources
Post by: _heinz on 14 Aug 2008, 06:12:24 pm
News from the V8-Xeon:
Fedora9_x86_64 is now installed on the SKT in parallel to Vista64, on the second disk.
I'm writing this from Firefox 3 Beta 5 now.
There are still some additional adjustments to do, like monitor support and resolution. I hope the fonts will then look a little bit better.

heinz

Title: Re: optimized sources
Post by: _heinz on 15 Aug 2008, 08:53:59 am
Hey Heinz! Time to drag out the development documentation again  ;D
Today I installed the latest version of TortoiseSVN. Once I have made all the necessary adjustments I will open up the development documentation "H_Performance_Doku". It consists of 1292 files, 187 MB in total.
You will get a PM when it is done.

heinz
Title: Re: optimized sources
Post by: Jason G on 15 Aug 2008, 09:04:27 am
Hey Heinz!, remember about the SVN Autoproperties thing at:
http://lunatics.kwsn.net/5-windows/stock-sah-v6-build-notes-so-far.msg6174.html#msg6174

to avoid line ending madness... well reduce it anyway  ;)

Title: Re: optimized sources
Post by: Jason G on 15 Aug 2008, 10:41:20 am
Right -
  to unhide Application Data folder in x64 I did the following

- Start a command prompt [ Change to whichever drive has the "Documents and Settings" folder ]
- cd \Documents and Settings\<your username>
- attrib -h "Application data"
(don't forget the quotes)

now it should show next time you try to browse there or do a 'DIR' at the command prompt
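
For anyone who would rather check the flag from a script: the `attrib -h` step clears the "hidden" bit in the folder's file attributes. The following is only a sketch in Python with helper names of my own; reading the real attributes through `os.stat(...).st_file_attributes` works on Windows only, so the demonstration at the bottom uses a made-up attribute value.

```python
import os
import stat


def is_hidden(attributes: int) -> bool:
    """True if the Windows 'hidden' bit is set in a file-attribute value."""
    return bool(attributes & stat.FILE_ATTRIBUTE_HIDDEN)


def without_hidden(attributes: int) -> int:
    """Return the attribute value with the 'hidden' bit cleared (what attrib -h does)."""
    return attributes & ~stat.FILE_ATTRIBUTE_HIDDEN


def folder_is_hidden(path: str) -> bool:
    """Windows only: st_file_attributes does not exist on other platforms."""
    return is_hidden(os.stat(path).st_file_attributes)


# Made-up example value: a directory that is currently hidden.
example = stat.FILE_ATTRIBUTE_DIRECTORY | stat.FILE_ATTRIBUTE_HIDDEN
print(is_hidden(example))                  # hidden before
print(is_hidden(without_hidden(example)))  # visible after clearing the bit
```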

Jason
Title: Re: optimized sources
Post by: _heinz on 15 Aug 2008, 03:15:50 pm
The performance_doku is now in our test repository. It is worth having a look at it...
You can find some important knowledge from the HPC (High Performance Computing) world.
Be careful if you work over a phone line, some files are not exactly small.
Have fun exploring it.
Edit: I made a directory structure for easier browsing and access for phone-line users.

heinz  ;D
Title: Re: optimized sources
Post by: _heinz on 17 Aug 2008, 02:59:41 pm
Hey Heinz!, remember about the SVN Autoproperties thing at:
http://lunatics.kwsn.net/5-windows/stock-sah-v6-build-notes-so-far.msg6174.html#msg6174

In Vista the directory is:
\Users\<your user name>\AppData\Roaming\Subversion

I have done this SVN Autoproperties thing,
now it works and runs the update  ;)
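
For reference, the auto-props setup lives in the Subversion `config` file in that directory. The sketch below generates the two relevant sections with Python's `configparser`. The section and option names (`[miscellany]`, `enable-auto-props`, `[auto-props]`, `svn:eol-style=native`) are standard Subversion settings, but the file patterns are only my assumption; take the real list from Jason's linked build notes.

```python
import configparser
import io


def autoprops_config(patterns):
    """Build the text of the [miscellany] and [auto-props] sections."""
    cfg = configparser.ConfigParser()
    cfg.optionxform = str  # keep patterns such as "*.H" case-sensitive
    cfg["miscellany"] = {"enable-auto-props": "yes"}
    # Assumed rule: normalise line endings for every listed text file type.
    cfg["auto-props"] = {p: "svn:eol-style=native" for p in patterns}
    out = io.StringIO()
    cfg.write(out)
    return out.getvalue()


# Hypothetical pattern list, for illustration only.
text = autoprops_config(["*.cpp", "*.h", "*.txt"])
print(text)
```

The printed sections can be merged by hand into the existing `config` file; overwriting that file wholesale would drop any other settings it contains.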

A lot of data is being transferred to run the SVN update on the V8-Xeon now.
I thought it would be easier: I copied all the files to a USB stick and then onto the V8-Xeon.
At first this looked great, it showed the green symbol, but then I ran an update... and now all the files are being transferred again...
.. but I have time... as long as the line is good... no problem...

heinz

Title: Re: optimized sources
Post by: _heinz on 19 Aug 2008, 09:22:23 am
The 4870 X2 with 2x 800 Stream Processing Units is now available in Germany, price 449.00 €
Technical data
Graphics card chipset:  Radeon HD4870 X2
Graphics card memory:  2048 MB
Memory interface:  256 bit
GPU clock in MHz:  750
Memory clock in MHz:  3600
RAMDAC:  2x 400 MHz
PCI-e x16:  Yes
Graphics card slot:  PCIe x16
Graphics card connectors:  2x DVI
TV-Out:  Yes
Max. resolution:  2560x1600
DirectX version of the graphics card:  10.1
Color depth:  32 bit
Crossfire support:  Yes
Graphics card cooling:  active
Packaging:  Retail

Further information
Technical data:

- 2x ATI HD 4870 graphics chip at 750 MHz
- 2048 MB GDDR5, 3600 MHz memory clock
- DirectX 10.1 support
- Shader Model 4.1
- PCI-Express 2.0 x16
- 2x 800 Stream Processing Units
- 2x 256-bit memory interface
- 400 MHz RAMDAC
- Dual-link DVI with HDCP / TV-out / HDMI via adapter (not included)
- CrossFireX™
- ATI Avivo™ HD
- PowerPlay™ power-saving technology
- Cooling: active dual-slot solution
- DisplayPort support
- Windows Vista™ Premium Ready
----------------------------------------------------------------------------------------------------------
It shows us where we should go in our HPC development process

heinz ;D
Title: Re: optimized sources
Post by: Jason G on 19 Aug 2008, 09:36:03 am
see post I just made in other thread  ;)
Title: Re: optimized sources
Post by: _heinz on 19 Aug 2008, 02:11:07 pm
News from the V8-Xeon:
Fedora9_x86_64 is now installed on the SKT in parallel to Vista64, on the second disk.
heinz

Today Fedora 9 ran several hundred updates, although I had installed the latest stable version.
Here is a look at the Fedora system monitor (http://www.britta-d.de/images/fedora_systemueberwachung.jpg) during the update process.
After the updates I still have to install the ATI graphics driver for the HD3870.
With the standard driver the resolution is not high enough and the fonts look blurred.
Perhaps I need a monitor file too, but I don't have one... it uses a generic one.
heinz


Title: Re: optimized sources
Post by: _heinz on 19 Aug 2008, 08:41:07 pm
What an adventure with the ATI graphics driver.
After downloading the file ati-driver-installer-8-7-x86.x86_64.run to a central location, I opened a terminal, became root with su and started it with the command
./ati-driver-installer-8-7-x86.x86_64.run
It worked and started as described here (http://www2.ati.com/drivers/linux/installernotes.html).
After I had done this (standard installation), the system said the driver was not properly installed and that I should run aticonfig from a terminal as root,
or, if the X server fails, run aticonfig --initial -f from the console.

Then I typed the commands
./aticonfig --initial=dual-head --screen-layout=above
./aticonfig --dtop=horizontal --overlay-on=1
./aticonfig --resolution=0,1600x1200,1280x1024,1024x768
./aticonfig --force-monitor=crt1,notv
./aticonfig --tv-geometry=85x90+10-10
./aticonfig --list-adapters
it shows "0"

./aticonfig --adapter=0 --initial

./startx

it says now:
logfile: "/var/log/Xorg.0.log"
using configfile: "/etc/X11/xorg.conf"
dlopen: "/usr/lib64/xorg/modules/drivers//fglrx_drv.so"  :undefined Symbol: miZeroLineScreenIndex
(EE) Failed to load "/usr/lib64/xorg/modules/drivers//fglrx_drv.so"
(EE) Failed to load module fglrx (loader failed 7 )
(EE) No drivers available
Fatal Server errors
No screens found
giving up   :'(
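
When the X server gives up like this, the lines that matter in /var/log/Xorg.0.log are the ones marked `(EE)`. Below is a minimal Python sketch that pulls them out; the function name is hypothetical. The first sample line is a typical marker legend from an Xorg log header that I added to show it is skipped, and the rest of the sample is copied from the output above.

```python
import re

# "(EE)" at the start of a line, optionally after a "[   12.345]" timestamp.
EE_LINE = re.compile(r"^\s*(?:\[\s*\d+\.\d+\]\s*)?\(EE\)")


def xorg_errors(log_text: str):
    """Return all error lines from the text of an Xorg log."""
    return [line.strip() for line in log_text.splitlines() if EE_LINE.match(line)]


SAMPLE = """\
	(WW) warning, (EE) error, (NI) not implemented, (??) unknown.
(EE) Failed to load "/usr/lib64/xorg/modules/drivers//fglrx_drv.so"
(EE) Failed to load module fglrx (loader failed 7 )
(EE) No drivers available
"""

errors = xorg_errors(SAMPLE)
for line in errors:
    print(line)
```

On the real machine the same function could be fed with `open("/var/log/Xorg.0.log").read()`. Here the first error, together with the undefined symbol reported by dlopen, suggests that the fglrx module does not match the X server shipped with Fedora 9, though that is an inference from the log and not something confirmed in this thread.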

and running aticonfig --initial -f did not help
---------------------------------------
A second install and configuration gave the same result... uhhhh, what a pain...
The system boots but startx fails, and it then stops at the console login.

It looks like I need a new installation, but then I will have the same problem: no current ATI driver and still a low resolution...
I did not expect such trouble with Fedora 9... and the ATI driver... :'( :'( :'(

heinz





Title: Re: optimized sources
Post by: _heinz on 20 Aug 2008, 03:01:21 pm
After several attempts to install the driver again, always with the same error...
a rescue of the system did not help either... startx did not start successfully.
So I started a completely new installation on the formatted disk.
After this was done and the system came up, the update process started -->
37 security updates
330 bugfix updates
29 enhancement updates.....
Download and installation ran for 5 hours...
I never thought that so many updates would be needed; my download was the newest version from 14.05.2008, when Fedora 9 was published. Better not to comment on this...
---------------------------------------------
Fedora release 9 (Sulphur), Red Hat nash version 6.0.52, is now up again.
kernel: 2.6.25-14.fc9.x86_64

The problem with the monitor resolution is not solved... I must read some forum comments. I think the system does not know the specific values of my TFT monitor, such as the maximum possible resolutions etc.; the manufacturer Medion is unknown, so it falls back to a generic monitor, and 1280x1024 and some other resolutions are not available.
What can we do? Maybe edit the monitor file...
Very angry..... :'(
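
If it comes to editing the configuration by hand, the relevant parts of /etc/X11/xorg.conf are a "Monitor" section with the sync ranges and a "Display" subsection listing the wanted modes. The Python sketch below only prints such text; the helper names are hypothetical, and the sync ranges and the identifier are placeholders that must be replaced with the values from the monitor's manual, since the real Medion values are not known here.

```python
def monitor_section(identifier, hsync_khz, vrefresh_hz):
    """Return an xorg.conf Monitor section for the given sync ranges."""
    lo_h, hi_h = hsync_khz
    lo_v, hi_v = vrefresh_hz
    return "\n".join([
        'Section "Monitor"',
        f'    Identifier  "{identifier}"',
        f'    HorizSync   {lo_h:.1f} - {hi_h:.1f}',
        f'    VertRefresh {lo_v:.1f} - {hi_v:.1f}',
        'EndSection',
    ])


def display_subsection(depth, modes):
    """Return a Display subsection (belongs inside the Screen section)."""
    mode_list = " ".join(f'"{m}"' for m in modes)
    return "\n".join([
        '    SubSection "Display"',
        f'        Depth  {depth}',
        f'        Modes  {mode_list}',
        '    EndSubSection',
    ])


# PLACEHOLDER values, not the real Medion specification.
section = monitor_section("Generic TFT", (30.0, 83.0), (56.0, 76.0))
subsection = display_subsection(24, ["1280x1024", "1024x768"])
print(section)
print(subsection)
```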

Why do I do this?? All for our testbed... and development systems....
heinz

Title: Re: optimized sources
Post by: Jason G on 20 Aug 2008, 03:10:29 pm
Hi Heinz,
   Back ~14 years ago we had to recompile the kernel just to get a network card running, and manually configure the monitor and video card files too,  and some video card manufacturers outright refused to release details / driver code. later it was found they were doing dodgy stuff like skipping frames to make the refresh appear faster. 

Is fedora close enough to any other distros that they may have a more suitable driver package etc? were all your monitor specs supplied with your monitor? maybe you can tailor configuration files from an existing model that is close.

Dunno, we used to have to do it all manually, but that was way before modern window managers, so maybe this is no longer possible?  Good luck.

Jason
Title: Re: optimized sources
Post by: _heinz on 22 Aug 2008, 02:46:27 pm
Hi Jason,
the toaster is already on in the V8-Xeon thread; FB-DIMM1 temps of 101°C have been recorded, have a look there

heinz
Title: Re: optimized sources
Post by: Jason G on 22 Aug 2008, 02:51:29 pm
Hi Heinz,
   That seems rather incredible.  Have you got some kind of thermometer to put it there so you can see if that is anywhere near accurate?
Title: Re: optimized sources
Post by: _heinz on 22 Aug 2008, 04:02:02 pm
Hi Heinz,
   That seems rather incredible.  Have you got some kind of thermometer to put it there so you can see if that is anywhere near accurate?

The only thing I have is the case sensor, which I placed between the 2 FB-DIMMs. In the last 5 minutes this sensor went down from 71 to 65°C, after I stopped measuring with the programs Everest, CPUID Hardware Monitor and Process Explorer.
A temp of 65°C looks normal to me. Room temp = 25.7°C.
Very interesting that measuring increases the FB-DIMM temp so much. What do you think of this hypothesis? Can this be?
I briefly started Everest to look it up:
FB-DIMM = 76°C
FB-DIMM1 = 94°C
FB-DIMM2 = 87°C
This looks normal, the same as I had before.
--------------------------
Looks like the hypothesis is true.
hmmm.. very interesting   ;D
Edit: I looked at the WUs; when the temp went down to 65, I had a couple of those short 15 min WUs running.
Now my case FB-DIMM sensor shows 71°C again:
FB-DIMM = 76
FB-DIMM1 = 96
FB-DIMM2 = 88
At the moment 8 WUs are running with a duration of 1h:01min,
but the values look better now, seems OK.

heinz
Title: Re: optimized sources
Post by: _heinz on 06 Sep 2008, 02:27:16 pm
I killed my multiboot setup several times by using Acronis Disk Director Suite and OS Selector.  :o
Now some support requests are open...
OS Selector quit working...
Fedora 9 always booted...
From there I edited the file bootwiz.oss ... after this was done Vista64 started again.
Now I get MBR Error 3 at startup  :'(
I must still fix it, but Vista boots again... Linux is dead... GRUB shows several screens of laughing masks  ;D rofl
Several attempts to fix GRUB had no success...
A lot of trouble with all this multiboot...
I now have the latest openSUSE 11.0 and Fedora 9 as DVDs, and Knoppix, ready to install.
heinz
Title: Re: optimized sources
Post by: _heinz on 07 Sep 2008, 05:26:58 am
Fixed the MBR Error 3.
With a rescue disk created with Acronis I was able, after some adventures, to reinstall and activate OS Selector again.
This multiboot stuff is really a very complex topic.
At the moment the following are installed and working:
disk1: Windows Vista64
disk2: Fedora9 ...
disk3: free for data
no RAID active..

heinz


Title: Re: optimized sources
Post by: _heinz on 13 Sep 2008, 03:45:45 pm
Hi Jason,
to get compatibility with several software packages I ordered an additional OS, "Windows XP Professional x64 Edition", which has a sticker "English with German and Japanese Multilingual User Interface CD".
Today I installed this OS on the V8-Xeon in a separate free 60 GB partition on the first disk. Installation ran as usual; I chose German for keyboard and language, but to my surprise it came up with an English user interface. ? uuhh...

Second: XP Prof x64 does not recognize the network adapter (http://www.britta-d.de/images/network_not_detected) of the D5400XS board.
Vista64 shows this network adapter (http://www.britta-d.de/images/vista64_networkadaptor.jpg), which is properly installed (http://www.britta-d.de/images/vista64_intel_pro_1000_pl_network_connection.jpg).
I did not expect to get this error in the network installation.
It's a pain to have the latest hardware and a system that does not install properly because some drivers are not found on the install medium.
cry... :'(
Sure, I could put in a second, better-known network card... such as 3Com etc... but that's not my thing.

Third: after the installation of XP, Vista did not start anymore (this I had expected). So I booted from the Vista DVD to repair the boot area, and to my surprise it showed:
Diagnostic and repair details:
The boot sector of the system partition is damaged.
Repair action: boot sector repair
Result: successful
Error code = 0x0
----------------------
So far OK, Vista64 starts again, but the boot menu as I know it from NT4 Server and XP is not shown anymore.

Next step is to look for drivers from Intel... then try again...
There is still some research to do to get a fully functional multiboot machine with several different Windows and several different Linux installations.

heinz


Title: Re: optimized sources
Post by: _heinz on 13 Sep 2008, 04:05:21 pm
Hi  Joe, Jason and Raistmer,
today I added some links to our test project, all simple small text files.
regards

Title: Re: optimized sources
Post by: Jason G on 13 Sep 2008, 04:17:54 pm
LoL, multiboot nightmares  ;D, reminds me of some guy installing Linux at school, he didn't follow directions properly and wiped all the other classes partitions (x8) ... he was made to reinstall all of them. 

I'm lazy so I only use VMs for Vista32 & 2 linux distros (just learning).  Curious, I would've thought if you had fedora installed before you'd be using grub for the boot menu.  I hope you can iron out these issues before Santa brings us all 16 thread Nehalems for Christmas, so we can use this thread to find out how to cope with the OS & driver issues. (well it's on my letter to Santa, but he never seems to read them and brings socks & t-shirts instead).

Jason
Title: Re: optimized sources
Post by: _heinz on 13 Sep 2008, 05:23:21 pm
Meanwhile there are several driver updates for D5400XS available... (http://downloadcenter.intel.com/Filter_Results.aspx?strOSs=All&strTypes=All&ProductID=2864&OSFullName=&lang=deu)  :o
Time to set up again...
Title: Re: optimized sources
Post by: _heinz on 13 Sep 2008, 08:41:14 pm
Got my NT4 Server domain controller running as a VM.  ;D
Title: Re: optimized sources
Post by: _heinz on 26 Oct 2008, 09:02:49 am
To resolve the problem I had with the Acronis OS-Selector, I tried to uninstall "Acronis Disk Director" completely. After this, Vista did not start anymore; some files are missing when the system starts up in its initial phase.
It looks like the uninstallation of "Acronis Disk Director" killed them.
After this I tried a repair of Vista from the install DVD, but this had no success either. Then I wrote back track 0, record 0, with the same result: Vista did not start.
Last chance before a reinstall:
I restored the whole partition that holds Vista from a September backup. That got Vista starting again.
After several updates the system is now up to date.
The problems with the OS-Selector are not solved; the support requests are still open.
My installed XP Professional is not booting now......OS-Selector is not functional anymore...

Jason, you are right...multiboot nightmares  ;D   :'( :'(

heinz
Title: Re: optimized sources
Post by: _heinz on 28 Oct 2008, 06:49:46 pm
Multiboot is now repaired again and working.
Next step: testing the developer environment under Vista64

heinz
Title: Re: optimized sources
Post by: _heinz on 29 Oct 2008, 10:19:58 pm
Installed VS2008 Express and set up the necessary environment to compile a project.
The Intel stuff (IPP_5.2beta, MKL) I copied from the XP machine and set the environment, and it works. (IPP_5.3 is available now, I know.)
I wonder why VS2008 did not find ml for MASM. It is not included, so I must install the old masm32 V10.
MASM 8.0 (http://www.microsoft.com/downloads/details.aspx?familyid=7A1C9DA0-0510-44A2-B042-7EF370530C64&displaylang=en) requires VS2005 Express and does not support Vista, only XP.   uuuh  ::)

Now I am downloading VS2005 Express to install it in parallel with VS2008.
VS2010 (https://connect.microsoft.com/VisualStudio/content/content.aspx?ContentID=9790) is available as a CTP next.
Looking around for VS2008 Professional now.
heinz
Title: Re: optimized sources
Post by: _heinz on 30 Oct 2008, 06:24:25 am
Ran into the stackwalker problem under Vista with the SDK, and the problem with GL/glaux.h.
That reminded me of the Windows SDK patch (http://boinc.berkeley.edu/trac/attachment/ticket/115/windowssdk.patch)
and googling PSYM_ENUMMODULES_CALLBACK64 (http://www.google.de/search?q=PSYM_ENUMMODULES_CALLBACK64&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:de:official&client=firefox-a) can help too.
Don't forget it when you do a fresh install.  ::)
 
heinz
Title: Re: optimized sources
Post by: _heinz on 30 Oct 2008, 10:11:40 am
To test the installation I compiled the old seti_boinc_2k3_2.2B-Ben-Joe and got a linker error:
1>------ Erstellen gestartet: Projekt: seti_boinc, Konfiguration: Release32-NOGFX Win32 ------
1>Verknüpfen...
1>Microsoft (R) Incremental Linker Version 9.00.21022.08
1>Copyright (C) Microsoft Corporation.  All rights reserved.
1>"/OUT:C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\seti_boinc.exe" "/LIBPATH:C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer\Release32-NOGFX" "/LIBPATH:C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX" "/LIBPATH:C:\I\INTEL\IPP\5.2_beta\ia32\lib" "/LIBPATH:C:\I\INTEL\MKL\9.0\ia32\lib" "/LIBPATH:C:\I\VS9\VC\lib" "/LIBPATH:C:\masm32\lib" "/LIBPATH:C:\masm32\m32lib" "/LIBPATH:C:\I\SDK\Lib" "/LIBPATH:C:\I\SDK\Lib\IA64" "/LIBPATH:C:\I\SDK\Lib\AMD64" "/LIBPATH:C:\Programme\Intel\Compiler\C++\10.0.025\IA32\lib" /MANIFEST:NO /DEBUG "/PDB:C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\seti_boinc.pdb" /MAP /SUBSYSTEM:WINDOWS /DYNAMICBASE:NO /MACHINE:X86 glut32.lib Optimizer.lib image_libs.lib jpeglib.lib libboinc.lib libboincapi.lib non_ICC.lib setiboincdb.lib ippsmerged.lib ippvmmerged.lib ippchmerged.lib ippcorel.lib delayimp.lib libcpmtd.lib WinMM.lib OpenGL32.lib Kernel32.Lib oldnames.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib
1>".\Release32-NOGFX\analyzeFuncs.obj"
1>".\Release32-NOGFX\analyzePoT.obj"
1>".\Release32-NOGFX\analyzeReport.obj"
1>".\Release32-NOGFX\app_ipc.obj"
1>".\Release32-NOGFX\boinc_api.obj"
1>".\Release32-NOGFX\chirpfft.obj"
1>".\Release32-NOGFX\fft8g.obj"
1>".\Release32-NOGFX\filesys.obj"
1>".\Release32-NOGFX\gaussfit.obj"
1>".\Release32-NOGFX\gdata.obj"
1>".\Release32-NOGFX\graphics_api.obj"
1>".\Release32-NOGFX\graphics_data.obj"
1>".\Release32-NOGFX\gutil.obj"
1>".\Release32-NOGFX\lcgamm.obj"
1>".\Release32-NOGFX\main.obj"
1>".\Release32-NOGFX\malloc_a.obj"
1>".\Release32-NOGFX\parse.obj"
1>".\Release32-NOGFX\progress.obj"
1>".\Release32-NOGFX\pulsefind.obj"
1>".\Release32-NOGFX\s_util.obj"
1>".\Release32-NOGFX\sah_gfx.obj"
1>".\Release32-NOGFX\sah_gfx_base.obj"
1>".\Release32-NOGFX\schema_master.obj"
1>".\Release32-NOGFX\seti.obj"
1>".\Release32-NOGFX\seti_header.obj"
1>".\Release32-NOGFX\shmem.obj"
1>".\Release32-NOGFX\spike.obj"
1>".\Release32-NOGFX\sqlblob.obj"
1>".\Release32-NOGFX\sqlrow.obj"
1>".\Release32-NOGFX\tgalib.obj"
1>".\Release32-NOGFX\timecvt.obj"
1>".\Release32-NOGFX\util.obj"
1>".\Release32-NOGFX\version.obj"
1>".\Release32-NOGFX\windows_opengl.obj"
1>".\Release32-NOGFX\worker.obj"
1>".\Release32-NOGFX\xml_util.obj"
1>libcpmtd.lib(iostream.obj) : fatal error LNK1112: Modul-Computertyp "IA64" steht in Konflikt mit dem Zielcomputertyp "X86".
1>Das Buildprotokoll wurde unter "file://c:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\BuildLog.htm" gespeichert.
1>seti_boinc - 1 Fehler, 0 Warnung(en)
========== Erstellen: 0 erfolgreich, Fehler bei 1, 0 aktuell, 0 übersprungen ==========
*****************************************************************************
A search for libcpmtd.lib gives the following results:
Alle suchen "libcpmtd.lib", Unterordner, Suchergebnisse: 1, "C:\I"
  C:\I\SC\ltm_client\boinc\win_build\boinc_ss.vcproj(46):            AdditionalDependencies="libcmtd.lib libcpmtd.lib wsock32.lib wininet.lib winmm.lib kernel32.lib user32.lib gdi32.lib advapi32.lib"
  C:\I\SC\ltm_client\boinc\win_build\cpdnbbc_boinc_ss.vcproj(46):            AdditionalDependencies="libcmtd.lib libcpmtd.lib wsock32.lib wininet.lib winmm.lib kernel32.lib user32.lib gdi32.lib advapi32.lib"
  C:\I\SC\ltm_client\boinc\win_build\gr_boinc_ss.vcproj(46):            AdditionalDependencies="libcmtd.lib libcpmtd.lib wsock32.lib wininet.lib winmm.lib kernel32.lib user32.lib gdi32.lib advapi32.lib"
  C:\I\SC\ltm_client\boinc\win_build\seed_boinc_ss.vcproj(46):            AdditionalDependencies="libcmtd.lib libcpmtd.lib wsock32.lib wininet.lib winmm.lib kernel32.lib user32.lib gdi32.lib advapi32.lib"
  C:\I\SC\ltm_client\boinc\win_build\installerv2\redist\Windows\src\boinccas\boinccas.vcproj(36):            AdditionalDependencies="msi.lib libcmtd.lib libcpmtd.lib delayimp.lib netapi32.lib advapi32.lib kernel32.lib user32.lib"
  C:\I\SC\ltm_client\boinc\win_build\installerv2\redist\Windows\src\boinccas\boinccas95.vcproj(36):            AdditionalDependencies="msi.lib libcmtd.lib libcpmtd.lib delayimp.lib netapi32.lib advapi32.lib kernel32.lib user32.lib"
  C:\I\SC\ltm_client\ltm_client\win_build\upper_case.vcproj(179):            AdditionalDependencies="libcmtd.lib libcpmtd.lib kernel32.lib user32.lib gdi32.lib opengl32.lib glu32.lib glaux.lib ole32.lib delayimp.lib"
  C:\I\SC\PulTimB_5\client\pultime.vcproj(72):            AdditionalDependencies="Optimizer.lib asmlibm.lib ippsmerged.lib ippchmerged.lib ippcorel.lib delayimp.lib libcpmtd.lib WinMM.lib OpenGL32.lib Kernel32.Lib oldnames.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib $(NOINHERIT)"
  C:\I\SC\PulTimB_5\client\pultime.vcproj(160):            AdditionalDependencies="Optimizer.lib asmlibm.lib ippsmerged.lib ippchmerged.lib ippcorel.lib delayimp.lib libcpmtd.lib WinMM.lib OpenGL32.lib Kernel32.Lib oldnames.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib $(NOINHERIT)"
  C:\I\SC\PulTimB_5\client\Debug\BuildLog.htm(15):/OUT:&quot;Debug\pultime.exe&quot; /INCREMENTAL /LIBPATH:&quot;C:\I\SC\PulTimB_5\client&quot; /LIBPATH:&quot;C:\I\SC\PulTimB_5\client\Debug&quot; /LIBPATH:&quot;C:\I\SC\PulTimB_5\Optimizer\Debug&quot; /LIBPATH:&quot;C:\I\SC\PulTimB_5\Optimizer&quot; /MANIFEST /MANIFESTFILE:&quot;Debug\pultime.exe.intermediate.manifest&quot; /MANIFESTUAC:&quot;level='asInvoker' uiAccess='false'&quot; /DEBUG /PDB:&quot;c:\I\SC\PulTimB_5\client\Debug\pultime.pdb&quot; /SUBSYSTEM:CONSOLE /DYNAMICBASE:NO /MACHINE:X86 Optimizer.lib asmlibm.lib ippsmerged.lib ippchmerged.lib ippcorel.lib delayimp.lib libcpmtd.lib WinMM.lib OpenGL32.lib Kernel32.Lib oldnames.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib
  C:\I\SC\PulTimB_5\client\Release\BuildLog.htm(34):/OUT:&quot;Release\pultime.exe&quot; /LIBPATH:&quot;C:\I\SC\PulTimB_5\Optimizer\Release&quot; /LIBPATH:&quot;C:\I\SC\PulTimB_5\Optimizer&quot; /LIBPATH:&quot;C:\I\SC\PulTimB_5\client\release&quot; /LIBPATH:&quot;C:\I\VS9\VC\lib&quot; /LIBPATH:&quot;C:\I\INTEL\IPP\5.2_beta\ia32\lib&quot; /LIBPATH:&quot;C:\I\INTEL\MKL\9.0\ia32\lib&quot; /LIBPATH:&quot;C:\masm32\lib&quot; /LIBPATH:&quot;C:\I\SDK\Lib&quot; /LIBPATH:&quot;C:\masm32\m32lib&quot; /LIBPATH:&quot;C:\I\SDK\Lib\IA64&quot; /LIBPATH:&quot;C:\I\SDK\Lib\AMD64&quot; /MANIFEST /MANIFESTFILE:&quot;Release\pultime.exe.intermediate.manifest&quot; /MANIFESTUAC:&quot;level='asInvoker' uiAccess='false'&quot; /DEBUG /PDB:&quot;c:\I\SC\PulTimB_5\client\release\pultime.pdb&quot; /SUBSYSTEM:CONSOLE /OPT:REF /OPT:ICF /DYNAMICBASE:NO /MACHINE:X86 /LTCG Optimizer.lib asmlibm.lib ippsmerged.lib ippchmerged.lib ippcorel.lib delayimp.lib libcpmtd.lib WinMM.lib OpenGL32.lib Kernel32.Lib oldnames.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib
  C:\I\SC\seti\boinc\win_build\boinc_ss.vcproj(46):            AdditionalDependencies="libcmtd.lib libcpmtd.lib wsock32.lib wininet.lib winmm.lib kernel32.lib user32.lib gdi32.lib advapi32.lib"
  C:\I\SC\seti\boinc\win_build\cpdnbbc_boinc_ss.vcproj(46):            AdditionalDependencies="libcmtd.lib libcpmtd.lib wsock32.lib wininet.lib winmm.lib kernel32.lib user32.lib gdi32.lib advapi32.lib"
  C:\I\SC\seti\boinc\win_build\gr_boinc_ss.vcproj(46):            AdditionalDependencies="libcmtd.lib libcpmtd.lib wsock32.lib wininet.lib winmm.lib kernel32.lib user32.lib gdi32.lib advapi32.lib"
  C:\I\SC\seti\boinc\win_build\seed_boinc_ss.vcproj(46):            AdditionalDependencies="libcmtd.lib libcpmtd.lib wsock32.lib wininet.lib winmm.lib kernel32.lib user32.lib gdi32.lib advapi32.lib"
  C:\I\SC\seti\boinc\win_build\installerv2\redist\Windows\src\boinccas\boinccas.vcproj(36):            AdditionalDependencies="msi.lib libcmtd.lib libcpmtd.lib delayimp.lib netapi32.lib advapi32.lib kernel32.lib user32.lib"
  C:\I\SC\seti\boinc\win_build\installerv2\redist\Windows\src\boinccas\boinccas95.vcproj(36):            AdditionalDependencies="msi.lib libcmtd.lib libcpmtd.lib delayimp.lib netapi32.lib advapi32.lib kernel32.lib user32.lib"
  C:\I\SC\vs90\boinc\win_build\boinc_ss.vcproj(46):            AdditionalDependencies="libcmtd.lib libcpmtd.lib wsock32.lib wininet.lib winmm.lib kernel32.lib user32.lib gdi32.lib advapi32.lib"
  C:\I\SC\vs90\boinc\win_build\cpdnbbc_boinc_ss.vcproj(46):            AdditionalDependencies="libcmtd.lib libcpmtd.lib wsock32.lib wininet.lib winmm.lib kernel32.lib user32.lib gdi32.lib advapi32.lib"
  C:\I\SC\vs90\boinc\win_build\gr_boinc_ss.vcproj(46):            AdditionalDependencies="libcmtd.lib libcpmtd.lib wsock32.lib wininet.lib winmm.lib kernel32.lib user32.lib gdi32.lib advapi32.lib"
  C:\I\SC\vs90\boinc\win_build\seed_boinc_ss.vcproj(46):            AdditionalDependencies="libcmtd.lib libcpmtd.lib wsock32.lib wininet.lib winmm.lib kernel32.lib user32.lib gdi32.lib advapi32.lib"
  C:\I\SC\vs90\boinc\win_build\installerv2\redist\Windows\src\boinccas\boinccas.vcproj(36):            AdditionalDependencies="msi.lib libcmtd.lib libcpmtd.lib delayimp.lib netapi32.lib advapi32.lib kernel32.lib user32.lib"
  C:\I\SC\vs90\boinc\win_build\installerv2\redist\Windows\src\boinccas\boinccas95.vcproj(36):            AdditionalDependencies="msi.lib libcmtd.lib libcpmtd.lib delayimp.lib netapi32.lib advapi32.lib kernel32.lib user32.lib"
  C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\seti_boinc.vcproj(91):            AdditionalDependencies="glut32.lib Optimizer.lib image_libs.lib jpeglib.lib libboinc.lib libboincapi.lib non_ICC.lib setiboincdb.lib ippsmerged.lib ippvmmerged.lib ippchmerged.lib ippcorel.lib delayimp.lib libcpmtd.lib WinMM.lib OpenGL32.lib Kernel32.Lib oldnames.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib $(NOINHERIT)"
  C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\BuildLog.htm(13):/OUT:&quot;C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\seti_boinc.exe&quot; /LIBPATH:&quot;C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer\Release32-NOGFX&quot; /LIBPATH:&quot;C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX&quot; /LIBPATH:&quot;C:\I\INTEL\IPP\5.2_beta\ia32\lib&quot; /LIBPATH:&quot;C:\I\INTEL\MKL\9.0\ia32\lib&quot; /LIBPATH:&quot;C:\I\VS9\VC\lib&quot; /LIBPATH:&quot;C:\masm32\lib&quot; /LIBPATH:&quot;C:\masm32\m32lib&quot; /LIBPATH:&quot;C:\I\SDK\Lib&quot; /LIBPATH:&quot;C:\I\SDK\Lib\IA64&quot; /LIBPATH:&quot;C:\I\SDK\Lib\AMD64&quot; /LIBPATH:&quot;C:\Programme\Intel\Compiler\C++\10.0.025\IA32\lib&quot; /MANIFEST:NO /DEBUG /PDB:&quot;C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\seti_boinc.pdb&quot; /MAP /SUBSYSTEM:WINDOWS /DYNAMICBASE:NO /MACHINE:X86 glut32.lib Optimizer.lib image_libs.lib jpeglib.lib libboinc.lib libboincapi.lib non_ICC.lib setiboincdb.lib ippsmerged.lib ippvmmerged.lib ippchmerged.lib ippcorel.lib delayimp.lib libcpmtd.lib WinMM.lib OpenGL32.lib Kernel32.Lib oldnames.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib
  C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\BuildLog.htm(90):"/OUT:C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\seti_boinc.exe" "/LIBPATH:C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer\Release32-NOGFX" "/LIBPATH:C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX" "/LIBPATH:C:\I\INTEL\IPP\5.2_beta\ia32\lib" "/LIBPATH:C:\I\INTEL\MKL\9.0\ia32\lib" "/LIBPATH:C:\I\VS9\VC\lib" "/LIBPATH:C:\masm32\lib" "/LIBPATH:C:\masm32\m32lib" "/LIBPATH:C:\I\SDK\Lib" "/LIBPATH:C:\I\SDK\Lib\IA64" "/LIBPATH:C:\I\SDK\Lib\AMD64" "/LIBPATH:C:\Programme\Intel\Compiler\C++\10.0.025\IA32\lib" /MANIFEST:NO /DEBUG "/PDB:C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\seti_boinc.pdb" /MAP /SUBSYSTEM:WINDOWS /DYNAMICBASE:NO /MACHINE:X86 glut32.lib Optimizer.lib image_libs.lib jpeglib.lib libboinc.lib libboincapi.lib non_ICC.lib setiboincdb.lib ippsmerged.lib ippvmmerged.lib ippchmerged.lib ippcorel.lib delayimp.lib libcpmtd.lib WinMM.lib OpenGL32.lib Kernel32.Lib oldnames.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib
  C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\BuildLog.htm(127):libcpmtd.lib(iostream.obj) : fatal error LNK1112: Modul-Computertyp "IA64" steht in Konflikt mit dem Zielcomputertyp "X86".
  C:\I\VS8\Microsoft Visual C++ 2005 Express Edition - DEU\Logs\VSMsiLog19E2.txt(4464):MSI (s) (38:D0) [14:36:37:328]: Executing op: ComponentRegister(ComponentId={2C4024A5-B667-44DD-BD4A-54FCA48E8907},KeyPath=C:\I\VS8\VC\lib\libcpmtd.lib,State=3,,Disk=1,SharedDllRefCount=0,BinaryType=0)
  C:\I\VS8\Microsoft Visual C++ 2005 Express Edition - DEU\Logs\VSMsiLog19E2.txt(8156):MSI (s) (38:D0) [14:36:45:390]: Executing op: FileCopy(SourceName=libcpmtd.lib|libcpmtd.lib,SourceCabKey=FL_libcpmtd_lib_____X86.3643236F_FC70_11D3_A536_0090278A1BB8,DestName=libcpmtd.lib,Attributes=0,FileSize=6703474,PerTick=32768,,VerifyMedia=1,,,,,CheckCRC=0,,,InstallMode=58982400,HashOptions=0,HashPart1=-1378151864,HashPart2=-640444595,HashPart3=-1526918150,HashPart4=1542994153,,)
  C:\I\VS8\Microsoft Visual C++ 2005 Express Edition - DEU\Logs\VSMsiLog19E2.txt(8157):MSI (s) (38:D0) [14:36:45:390]: File: C:\I\VS8\VC\lib\libcpmtd.lib;   To be installed;   Won't patch;   No existing file
  C:\I\VS8\Microsoft Visual C++ 2005 Express Edition - DEU\Logs\VSMsiLog19E2.txt(8159):MSI (s) (38:D0) [14:36:45:390]: Note: 1: 2318 2: C:\I\VS8\VC\lib\libcpmtd.lib
  Übereinstimmende Zeilen: 31    Übereinstimmende Dateien: 25    Insgesamt durchsuchte Dateien: 28802


Any ideas? It looks like I must install VS8 and SDK2003 fresh under Vista64 and not use a copy from the 32-bit XP machine....

heinz
Title: Re: optimized sources
Post by: Jason G on 30 Oct 2008, 12:22:09 pm
Heinz, 2.2 is too old. Though I can't read the German errors, I would avoid versions that old like the plague because of boincapi, compiler, and target platform changes. Please consider starting with the AK_v8 build, as we tried to make it much easier. Simply changing the ICC & IPP directories (in the project includes and linker library paths) should allow you to build against our boincapi. If you need some help let me know, as I am coming to some heavy multithreaded opportunities and would like you to be up to speed for some collaboration.

Jason
Title: Re: optimized sources
Post by: _heinz on 30 Oct 2008, 02:08:30 pm
Quote
Heinz, 2.2 is too old. Though I can't read the German errors, I would avoid versions that old like the plague because of boincapi, compiler, and target platform changes. Please consider starting with the AK_v8 build, as we tried to make it much easier. Simply changing the ICC & IPP directories (in the project includes and linker library paths) should allow you to build against our boincapi. If you need some help let me know, as I am coming to some heavy multithreaded opportunities and would like you to be up to speed for some collaboration.

Jason

On my old XP machine this older version 2.2 compiles completely without errors, so I thought it would be a good, complete test.
The problem is perhaps the following: the OS is Vista64, and I compile the program as a 32-bit program.
libcpmtd.lib(iostream.obj) : fatal error LNK1112: module machine type 'IA64' conflicts with target machine type 'X86'
--------------------------------------------------------------
In the SDK release notes I found:
Visual C++® Search Paths and Registering Environment Variables
Selecting to Register Environment Variables places the Platform SDK bin, include, and library directories at the beginning of the search paths used when building programs in the Visual Studio IDE.

Note To target a specific version of Windows, you must still define the _WIN32_WINNT, _WIN32_IE, and _WIN32_WINDOWS macros as appropriate.

To register the SDK bin, include, and library directories with Microsoft Visual Studio® version 6.0 and Visual Studio .NET, click Start, point to All Programs, point to Microsoft Platform SDK for Windows Server 2003 SP1, point to Visual Studio Registration, and then click Register PSDK Directories with Visual Studio. This registration process places the SDK bin, include, and library directories at the beginning of the search paths, which ensures that the latest headers and libraries are used when building applications in the IDE.

Note that for Visual Studio 6.0 integration to succeed, Visual Studio 6.0 must run at least once before you select Register PSDK Directories with Visual Studio. Also note that when this option is run, the IDEs should not be running.

To develop a 32-bit C/C++ application on 64-bit Windows, do not register environment variables when you install Visual C++ 6.0. Instead, open a command window and run Vcvars32.bat (from the Visual C++ \bin folder), followed by Setenv.bat (from the SDK bin folder), specifying the appropriate switches (such as /SVR32 /2000 /XP32).  
Maybe this is the problem.
Did you remember to do this, Jason?

Some other projects I tried (arprec) did not have this problem... I will try some others now too.

With AK_v8 I believe I must use IPP 5.3; I still have 5.2beta.
There is still something to install (ITBB and the latest IPP).
I would like some closer collaboration  ;D

heinz



Title: Re: optimized sources
Post by: _heinz on 30 Oct 2008, 03:15:56 pm
I looked up the link error and found:
Linker Tools Error LNK1112
The object files specified as input were compiled for different computer types.
So libcpmtd.lib(iostream.obj) is compiled for IA64, but it should be for x86.
hmm.....
Looks like an error in the environment...?
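
One thing that may be worth checking is the linker command line itself: the build log above passes /LIBPATH entries ending in \IA64 and \AMD64 together with /MACHINE:X86. A minimal sketch in Python (not from this thread; the names `mismatched_libpaths`, `ARCH_BY_DIRNAME` and `LIBPATH_RE` are made up for illustration) that flags such entries. It only looks at directory names, which is an assumption about how the folders are laid out; it does not inspect the .lib files.

```python
import re

# Directory names that usually indicate a non-x86 library folder.
# This is a naming heuristic only; it does not open any .lib file.
ARCH_BY_DIRNAME = {"ia64": "IA64", "amd64": "X64", "x64": "X64", "em64t": "X64"}

# /LIBPATH can be written as "/LIBPATH:dir", /LIBPATH:"dir" or /LIBPATH:dir
LIBPATH_RE = re.compile(
    r'"/LIBPATH:([^"]+)"|/LIBPATH:"([^"]+)"|/LIBPATH:(\S+)', re.IGNORECASE
)


def mismatched_libpaths(link_cmd):
    """Return (path, arch) pairs for /LIBPATH entries whose directory
    names hint at another architecture than the /MACHINE target."""
    m = re.search(r"/MACHINE:(\w+)", link_cmd, re.IGNORECASE)
    target = m.group(1).upper() if m else "X86"  # assume x86 if not given
    if target == "AMD64":
        target = "X64"
    hits = []
    for groups in LIBPATH_RE.findall(link_cmd):
        path = next(g for g in groups if g)  # exactly one group is non-empty
        for part in re.split(r"[\\/]", path.lower()):
            arch = ARCH_BY_DIRNAME.get(part)
            if arch and arch != target:
                hits.append((path, arch))
                break
    return hits


# Shortened version of the command line from the build log
cmd = (
    r'"/LIBPATH:C:\I\VS9\VC\lib" "/LIBPATH:C:\I\SDK\Lib" '
    r'"/LIBPATH:C:\I\SDK\Lib\IA64" "/LIBPATH:C:\I\SDK\Lib\AMD64" '
    r"/MACHINE:X86 libcpmtd.lib"
)
for path, arch in mismatched_libpaths(cmd):
    print(f"{path} looks like a {arch} directory")
```

If an entry is flagged, removing it from the 32-bit configuration's library directories would be the first thing to try. Whether that is really the cause here is not confirmed in the thread.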
----------------------------------------------------
The Platform SDK 64-bit build tools use a version of CRT that still has support for the older style iostream.h, (iostream.h is no longer supported in Visual Studio 2003.  Visual Studio 2003 supports iostream instead.) As a consequence of this, there are some samples in the PSDK, using iostream.h that build for 64-bit Windows platforms, but not for 32-bit platforms.

It looks like the old iostream headers (http://support.microsoft.com/default.aspx?scid=kb;en-us;154419) are being used here.
Maybe this is the error?

heinz
Title: Re: optimized sources
Post by: _heinz on 30 Oct 2008, 06:51:15 pm
I tried to set the environment,
but the script has no option for Vista64...
REM "Usage:    Setenv [/2000 | XP32 | /XP64 | /SRV32 | /SRV64 | /X64] [/DEBUG | /RETAIL]"
REM
REM                  /2000   - target Windows 2000 and IE 5.0
REM                  /XP32   - target Windows XP 32 (default)
REM                  /XP64   - target Windows XP 64
REM                  /SRV32  - target Windows Server 2003 32 bit
REM                  /SRV64  - target Windows Server 2003 64 bit
REM                  /X64    - target Windows for the X64 bit platform
REM                  /DEBUG  - set the environment to DEBUG
REM                  /RETAIL - set the environment to RETAIL

-----------------------------------------------------------------------------------

C:\I\SDK>SetEnv /X64
Targeting Windows Server 2003 X64 DEBUG


C:\I\SDK>SetEnv /SRV64
Targeting Windows Server 2003 IA64-bit DEBUG
---------------------------------------------------------------------------------
there is vcvarsall.bat -->
@echo off
if "%1" == "" goto x86
if not "%2" == "" goto usage

if /i %1 == x86       goto x86
if /i %1 == amd64     goto amd64
if /i %1 == x64       goto amd64
if /i %1 == ia64      goto ia64
if /i %1 == x86_amd64 goto x86_amd64
if /i %1 == x86_ia64  goto x86_ia64
goto usage

:x86
if not exist "%~dp0bin\vcvars32.bat" goto missing
call "%~dp0bin\vcvars32.bat"
goto :eof

:amd64
if not exist "%~dp0bin\amd64\vcvarsamd64.bat" goto missing
call "%~dp0bin\amd64\vcvarsamd64.bat"
goto :eof

:ia64
if not exist "%~dp0bin\ia64\vcvarsia64.bat" goto missing
call "%~dp0bin\ia64\vcvarsia64.bat"
goto :eof

:x86_amd64
if not exist "%~dp0bin\x86_amd64\vcvarsx86_amd64.bat" goto missing
call "%~dp0bin\x86_amd64\vcvarsx86_amd64.bat"
goto :eof

:x86_ia64
if not exist "%~dp0bin\x86_ia64\vcvarsx86_ia64.bat" goto missing
call "%~dp0bin\x86_ia64\vcvarsx86_ia64.bat"
goto :eof

:usage
echo Error in script usage. The correct usage is:
echo     %0 [option]
echo where [option] is: x86 ^| ia64 ^| amd64 ^| x86_amd64 ^| x86_ia64
echo:
echo For example:
echo     %0 x86_ia64
goto :eof

:missing
echo The specified configuration type is missing.  The tools for the
echo configuration might not be installed.
goto :eof

-----------------------------------------------------------------
and now call vcvarsall.bat

C:\Program Files (x86)\Microsoft Visual Studio 8\VC>vcvarsall.bat ia64
The specified configuration type is missing.  The tools for the
configuration might not be installed.
Then try this -->
C:\Program Files (x86)\Microsoft Visual Studio 8\VC>vcvarsall.bat
Setting environment for using Microsoft Visual Studio 2005 x86 tools.
hmmm....
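
The dispatch in the vcvarsall.bat listing above can be restated as a small lookup table. This is a sketch in Python, not part of Visual Studio; `VCVARS_SCRIPTS`, `resolve_vcvars` and the `exists` parameter are illustrative names. It mirrors the script: no argument means x86, the comparison ignores case, and a missing batch file corresponds to the "configuration type is missing" message.

```python
import ntpath
import os

# Option -> batch file relative to the VC directory, as in the listing above
VCVARS_SCRIPTS = {
    "x86": r"bin\vcvars32.bat",
    "amd64": r"bin\amd64\vcvarsamd64.bat",
    "x64": r"bin\amd64\vcvarsamd64.bat",
    "ia64": r"bin\ia64\vcvarsia64.bat",
    "x86_amd64": r"bin\x86_amd64\vcvarsx86_amd64.bat",
    "x86_ia64": r"bin\x86_ia64\vcvarsx86_ia64.bat",
}


def resolve_vcvars(vc_dir, option="", exists=os.path.exists):
    """Return the batch file vcvarsall.bat would call, or None if that
    file is missing. Raises ValueError for an unknown option, which is
    the script's usage branch."""
    key = option.lower() or "x86"  # empty argument falls through to x86
    if key not in VCVARS_SCRIPTS:
        raise ValueError(f"unknown option: {option!r}")
    script = ntpath.join(vc_dir, VCVARS_SCRIPTS[key])
    return script if exists(script) else None


# Simulate an installation that only ships the 32-bit tools
def only_x86(path):
    return path.lower().endswith("vcvars32.bat")


vc = r"C:\Program Files (x86)\Microsoft Visual Studio 8\VC"
print(resolve_vcvars(vc, "", only_x86))      # path to bin\vcvars32.bat
print(resolve_vcvars(vc, "ia64", only_x86))  # None: tools not installed
```

Passing `exists` as a parameter keeps the sketch testable on a machine without Visual Studio; with the default it checks the real file system.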
We will see if it works on Vista64.
The arprec project compiles successfully with VS2005 Express.
The old 2.2 shows the same link error as before with VS2008 Express....
I tried to compile a project that has asm sources, and it fails with VS2008 Express...
With VS2005 Express the asm works correctly.....hmmmm

any ideas ?

heinz


Title: Re: optimized sources
Post by: Jason G on 31 Oct 2008, 03:47:16 am
Try this one Heinz:

Quote
REM                  /X64    - target Windows for the X64 bit platform
Title: Re: optimized sources
Post by: _heinz on 31 Oct 2008, 08:34:15 pm
I had a look at the system variables of Vista64, and look what I found there:

NUMBER_OF_PROCESSORS 8
PROCESSOR_ARCHITECTURE AMD64
PROCESSOR_IDENTIFIER Intel64 Family 6 Model 23 Stepping 6, GenuineIntel
PROCESSOR_LEVEL 6
PROCESSOR_REVISION 1706
----------------------------------------------------------------------------
If I look at the box of the E5405 processor, I see:
Quad-Core
Intel Xeon Processor 5400 Series
QUAD-CORE SERVER
Intel Core Microarchitecture
Support for Two-Socket Systems
Intel 64 Architecture
Intel Virtualization Technology
--------------------------------------------------
and Everest shows:
Computer type   ACPI x64-based PC
Operating system   Microsoft Windows Vista Ultimate
.....
Motherboard   
CPU type   2x QuadCore Intel Xeon E5405, 2400 MHz (6 x 400)
Motherboard name   Intel Skulltrail D5400XS  (2 PCI, 4 PCI-E x16, 4 FB-DIMM, Audio, Gigabit LAN)
Motherboard chipset   Intel Seaburg 5400B
System memory   [ TRIAL VERSION ]
Channel0-DIMM1: Kingston   4 GB DDR2-800 ECC DDR2 SDRAM FB-DIMM  (4-4-4-15 @ 320 MHz)  (5-5-5-18 @ 400 MHz)
Channel1-DIMM1: Kingston   4 GB DDR2-800 ECC DDR2 SDRAM FB-DIMM  (4-4-4-15 @ 320 MHz)  (5-5-5-18 @ 400 MHz)
BIOS type   Intel (08/25/08)
---------------------------------------------
I can't believe it, Vista 64 shows PROCESSOR_ARCHITECTURE AMD64  :o

heinz
Title: Re: optimized sources
Post by: Jason G on 31 Oct 2008, 09:34:51 pm
LoL
Title: Re: optimized sources
Post by: Raistmer on 01 Nov 2008, 03:23:51 pm
Quote
I can't believe it, Vista 64 shows PROCESSOR_ARCHITECTURE AMD64  :o

heinz

And it's right. Cause x64 mode _IS_ AMD64 ;)
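
Windows reports AMD64 for any x86-64 processor, whether it comes from Intel or AMD, because the 64-bit extension was originally defined by AMD. A small sketch in Python (the names `native_arch`, `describe_arch` and `ARCH_NAMES` are illustrative, not from this thread) that reads the same variables. It also covers the case of a 32-bit process on 64-bit Windows, which sees PROCESSOR_ARCHITECTURE=x86 and finds the real architecture in PROCESSOR_ARCHITEW6432.

```python
import os

# Human-readable names for the values Windows puts in the environment
ARCH_NAMES = {
    "x86": "32-bit x86",
    "AMD64": "64-bit x86-64 (AMD64 / Intel 64)",
    "IA64": "64-bit Itanium",
}


def native_arch(env=None):
    """Return the architecture of the OS, not of the current process."""
    env = os.environ if env is None else env
    # Under WOW64 the native value is kept in PROCESSOR_ARCHITEW6432
    return env.get("PROCESSOR_ARCHITEW6432") or env.get(
        "PROCESSOR_ARCHITECTURE", "unknown"
    )


def describe_arch(env=None):
    arch = native_arch(env)
    return ARCH_NAMES.get(arch, arch)


# Values as shown in the post above (Xeon E5405 under Vista64)
print(describe_arch({"PROCESSOR_ARCHITECTURE": "AMD64"}))
```

So an Intel Xeon showing AMD64 is expected; IA64 would only appear on an Itanium system.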
Title: Re: optimized sources
Post by: _heinz on 03 Nov 2008, 02:48:16 pm
Nehalem is available in Germany
http://www.kmelektronik.de/
-----------------------------------------
Intel Core i7 920 4x2.67GHz BOX, listed since 23.10.2008
399,99 €
Price incl. VAT, plus shipping from 5,99 €
Store stock; delivery time from the shipping center for pre-orders: expected 3-4 weeks
-----------------------------------------------------------------------------------------------------------------------------
Intel Core i7 940 4x2.93GHz BOX, listed since 23.10.2008
699,99 €
Price incl. VAT, plus shipping from 5,99 €
Store stock; delivery time from the shipping center for pre-orders: expected 2-3 weeks
------------------------------------------------------------------------------------------------------------------------------   
Intel Core i7 965 XE 4x3.2GHz BOX, listed since 23.10.2008
1199,99 €
Store stock; delivery time from the shipping center for pre-orders: expected 3 weeks
----------------------------------------------------------------------------------------------------------------------------
@ Jason: Santa Claus is coming soon, I wish you good luck

heinz
 
Title: Re: optimized sources
Post by: Raistmer on 03 Nov 2008, 05:29:06 pm
Good news! :) But the price is still high...
Does it need DDR3 too, or do some boards with DDR2 support exist?
What will the total upgrade price be (all parts that have to be replaced)?
Title: Re: optimized sources
Post by: _heinz on 03 Nov 2008, 08:39:15 pm
VS2008 Prof (90-day trial) is now installed on V64.
The first compiled project was successful, so we can hope.
1>arprec - 0 Fehler, 3 Warnung(en)
========== Erstellen: 1 erfolgreich, Fehler bei 0, 0 aktuell, 0 übersprungen ==========

1>------ Neues Erstellen gestartet: Projekt: qd, Konfiguration: Release Win32 ------
1>Die Zwischen- und Ausgabedateien für das Projekt "qd" mit der Konfiguration "Release|Win32" werden gelöscht.
1>Kompilieren...
1>util.cpp
1>qd_real.cpp
1>qd_const.cpp
1>fpu.cpp
1>dd_real.cpp
1>dd_const.cpp
1>c_qd.cpp
1>c_dd.cpp
1>bits.cpp
1>Code wird generiert...
1>Verknüpfen...
1>   Bibliothek "C:\I\SC\vs90\qd-2.3.0-pre5-windll\Release\qd.lib" und Objekt "C:\I\SC\vs90\qd-2.3.0-pre5-windll\Release\qd.exp" werden erstellt.
1>Das Manifest wird eingebettet...
1>Das Buildprotokoll wurde unter "file://c:\I\SC\vs90\qd-2.3.0-pre5-windll\Release\qd_files\BuildLog.htm" gespeichert.
1>qd - 0 Fehler, 0 Warnung(en)
2>------ Neues Erstellen gestartet: Projekt: qd_timer, Konfiguration: Release Win32 ------
3>------ Neues Erstellen gestartet: Projekt: qd_test, Konfiguration: Release Win32 ------
2>Die Zwischen- und Ausgabedateien für das Projekt "qd_timer" mit der Konfiguration "Release|Win32" werden gelöscht.
4>------ Neues Erstellen gestartet: Projekt: pslq_test, Konfiguration: Release Win32 ------
5>------ Neues Erstellen gestartet: Projekt: quadt_test, Konfiguration: Release Win32 ------
5>Die Zwischen- und Ausgabedateien für das Projekt "quadt_test" mit der Konfiguration "Release|Win32" werden gelöscht.
4>Die Zwischen- und Ausgabedateien für das Projekt "pslq_test" mit der Konfiguration "Release|Win32" werden gelöscht.
3>Die Zwischen- und Ausgabedateien für das Projekt "qd_test" mit der Konfiguration "Release|Win32" werden gelöscht.
2>Kompilieren...
4>Kompilieren...
3>Kompilieren...
5>Kompilieren...
3>qd_test.cpp
5>tictoc.cpp
2>tictoc.cpp
4>tictoc.cpp
2>qd_timer.cpp
5>quadt_test.cpp
4>pslq_test.cpp
4>Code wird generiert...
3>Verknüpfen...
4>Verknüpfen...
2>Code wird generiert...
5>Code wird generiert...
3>Das Manifest wird eingebettet...
4>Das Manifest wird eingebettet...
2>Verknüpfen...
5>Verknüpfen...
2>Das Manifest wird eingebettet...
5>Das Manifest wird eingebettet...
3>Das Buildprotokoll wurde unter "file://c:\I\SC\vs90\qd-2.3.0-pre5-windll\Release\qd_test_files\BuildLog.htm" gespeichert.
3>qd_test - 0 Fehler, 0 Warnung(en)
4>Das Buildprotokoll wurde unter "file://c:\I\SC\vs90\qd-2.3.0-pre5-windll\Release\pslq_test_files\BuildLog.htm" gespeichert.
4>pslq_test - 0 Fehler, 0 Warnung(en)
2>Das Buildprotokoll wurde unter "file://c:\I\SC\vs90\qd-2.3.0-pre5-windll\Release\qd_timer_files\BuildLog.htm" gespeichert.
2>qd_timer - 0 Fehler, 0 Warnung(en)
5>Das Buildprotokoll wurde unter "file://c:\I\SC\vs90\qd-2.3.0-pre5-windll\Release\quadt_test_files\BuildLog.htm" gespeichert.
5>quadt_test - 0 Fehler, 0 Warnung(en)
========== Alles neu erstellen: 5 erfolgreich, Fehler bei 0, 0 übersprungen ==========

Next will be ITBB...

heinz
Title: Re: optimized sources
Post by: The Grinch on 05 Nov 2008, 01:06:05 am
Quote
Good news! :) But the price is still high...
Does it need DDR3 too, or do some boards with DDR2 support exist?
What will the total upgrade price be (all parts that have to be replaced)?

And the motherboards are still NOT stable!
If anybody buys now, they are getting beta-style hardware.
The online hardware magazines make it sound like a "banana product", one that only gets finished after you buy it.
Title: Re: optimized sources
Post by: _heinz on 06 Nov 2008, 11:33:25 am
It's hard to install a fully functional developer system in Vista64. I am fighting with IPP, MKL, TBB and ASM...
A lot of work has to be done by hand...
To set up the environment for MKL you have to go to \tools\environment and, in a DOS window, execute mklvars64 for the 64-bit environment and mklvars32 for the 32-bit environment.
You should then see something like the following --->
C:\>cd C:\I\INTEL\MKL\9.0\tools\environment

C:\I\INTEL\MKL\9.0\tools\environment>mklvars64

C:\I\INTEL\MKL\9.0\tools\environment>set lib=C:\I\INTEL\MKL\9.0\ia64\lib;

C:\I\INTEL\MKL\9.0\tools\environment>set include=C:\I\INTEL\MKL\9.0\include;

C:\I\INTEL\MKL\9.0\tools\environment>set path=C:\I\INTEL\MKL\9.0\ia64\bin;C:\Win
dows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Program Files\Intel\DMIX;C:
\Program Files (x86)\ATI Technologies\ATI.ACE\Core-Static;C:\Program Files (x86)
\Microsoft SQL Server\90\Tools\binn\;C:\Program Files\TortoiseSVN\bin

C:\I\INTEL\MKL\9.0\tools\environment>mklvars32

C:\I\INTEL\MKL\9.0\tools\environment>set lib=C:\I\INTEL\MKL\9.0\ia32\lib;C:\I\IN
TEL\MKL\9.0\ia64\lib;

C:\I\INTEL\MKL\9.0\tools\environment>set include=C:\I\INTEL\MKL\9.0\include;C:\I
\INTEL\MKL\9.0\include;

C:\I\INTEL\MKL\9.0\tools\environment>set path=C:\I\INTEL\MKL\9.0\ia32\bin;C:\I\I
NTEL\MKL\9.0\ia64\bin;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:
\Program Files\Intel\DMIX;C:\Program Files (x86)\ATI Technologies\ATI.ACE\Core-S
tatic;C:\Program Files (x86)\Microsoft SQL Server\90\Tools\binn\;C:\Program File
s\TortoiseSVN\bin

C:\I\INTEL\MKL\9.0\tools\environment>
----------------------------------------------------------------
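Note on the transcript above: running mklvars32 after mklvars64 in the same window leaves both the ia32 and the ia64 directories on lib, include and path, with ia32 first. A minimal Python sketch of that behaviour, assuming each script simply prepends its own directory to the current value (the helper name `prepend` is made up for illustration; the real batch files were not inspected):

```python
def prepend(current: str, new_dir: str) -> str:
    """Prepend new_dir to a semicolon-separated search path."""
    return new_dir + ";" + current

MKL = r"C:\I\INTEL\MKL\9.0"

lib = ""                                  # lib was empty before the first script ran
lib = prepend(lib, MKL + r"\ia64\lib")    # effect of mklvars64
print(lib)
lib = prepend(lib, MKL + r"\ia32\lib")    # effect of mklvars32 in the same window
print(lib)
```

Environment variables set this way only live in that one window, so opening a fresh window per target is probably the safer habit.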
With TBB it is much more complicated: a really good setup program is missing, so there is a lot of work by hand...uuuhhh
Download tbb21_012oss_win.zip and, in addition, msvs_plugin_20080610.zip, and unzip both into the directory where you want to install.
Set the environment variable TBB21_INSTALL_DIR to that directory. Go to the directory msvs_plugin_20080610 and read the file "README". You cannot use a normal Windows editor, so open the file with Visual Studio. You will then see the following instructions --->
------------------------------------------------------------------------
Intel(R) Threading Building Blocks Integration Plug-in - README
------------------------------------------------------------------------


Contents
--------

    - Overview
    - Installation
    - Usage
    - Integration actions
    - Advanced configuration
    - Known limitations


Overview
--------

The Intel(R) Threading Building Blocks integration plug-in provides
a simple mechanism to set up the use of Intel(R) Threading Building
Blocks (TBB) in Microsoft* Visual Studio* projects.

The TBB integration plug-in adds the following settings into
Microsoft* Visual C++* projects as required by TBB:

    - the path to the TBB header files
    - the path to the TBB libraries
    - the specific TBB libraries to link with

The plug-in can be used with C++ projects created in Microsoft*
Visual Studio* 2003, 2005 and 2008.

See http://threadingbuildingblocks.org for full TBB documentation
and software information.


Installation
------------

Unzip the msvs_plugin.zip package and open the msvs_plugin folder that
is created.

Before use, the plug-in must be registered by performing the following
steps (depending on the version of Microsoft* Visual Studio* being used):

    - Microsoft* Visual Studio* 2003

   1.  Create or open the "<My Documents>\Visual Studio 2003\Addins"
       folder.

   2.  Copy all files from msvs_plugin\vc7.1 into the above folder.

   3.  If using a 32-bit version of Microsoft* Windows*, double-click
       on the tbb_integration_vc7.1_32.reg file to add information
       about the plug-in into the registry.

       If using a 64-bit version of Microsoft* Windows*, double-click
       on the tbb_integration_vc7.1_64.reg file to add information
       about the plug-in into the registry.

   4.  Register the tbb_integration_vc7.1.dll by using the
       Microsoft* regasm utility by performing these steps:

      cd "<My Documents>\Visual Studio 2003\Addins"

      regasm tbb_integration_vc7.1.dll /codebase

       where <My Documents> is the actual path to your "My
       Documents" folder.

   5.  Reset Microsoft* Visual Studio* 2003 by performing this step:

      devenv.exe /setup

    - Microsoft* Visual Studio* 2005

   1.  Create or open the "<My Documents>\Visual Studio 2005\Addins"
       folder.

   2.  Copy all files from msvs_plugin\vc8 into the above folder.

    - Microsoft* Visual Studio* 2008

   1.  Create or open the "<My Documents>\Visual Studio 2008\Addins"
       folder.

   2.  Copy all files from msvs_plugin\vc9 into the above folder.

To ensure that the registration succeeded, select Tools -> Add-in
Manager in the main Microsoft* Visual Studio* menu and check that the
TBB integration plug-in is listed in the table.  Also, the list of
installed products, in the Microsoft* Visual Studio* Help -> About
dialog, should contain an entry for "TBB Integration".

To upgrade to a new version of the TBB integration plug-in, simply
replace the previous version of the DLL installed above by the new
version.

The plugin is pre-configured to use it with commercial Intel(R) TBB
software of versions 2.0 and 2.1. To use the plugin with open-source
TBB packages, in addition to the above steps, do the following:

   1.  Determine the full path to the root directory with the content
        of the open-source TBB package.

   2.  Set TBB20_INSTALL_DIR or TBB21_INSTALL_DIR environment variable
        to contain the above path. For instructions about how to set
        an environment variable, seek Microsoft* Windows* help system.

See also "Advanced configuration" section for additional information.


Usage
-----

To enable TBB for a given C++ project, right-click on the project item
in the Microsoft* Visual Studio* Solution Explorer, and open the "Use
Intel(R) TBB" sub-menu in the project context menu.  The sub-menu
consists of the following items:

    - The list of TBB versions the plugin is configured to use with.

   Click the desired version you want to use with your project to
   add the necessary parameters to the project settings.

    - "Remove Intel(R) TBB settings".

   Select this item to delete all TBB-specific settings from your
   project.


Integration Actions
-------------------

When TBB is enabled for a C++ project as described above, the following
C++ project settings are changed:

    - The path to the TBB header files is added to "Additional Include
      Directories".

    - The path to the TBB libraries is added to "Additional Library
      Directories".

    - The TBB libraries are added into "Additional Dependencies"
      (tbb.lib for Release configuration, and tbb_debug.lib for Debug).

    - For Microsoft* Visual Studio* 2003, a command to copy TBB dynamic
      libraries (tbb.dll / tbb_debug.dll) into the OutDir directory is
      added into the list of post-build events.

    - For Microsoft* Visual Studio* 2005 or 2008, the path to the TBB
      dynamic libraries is appended to the PATH environment variable in
      "Debugging | Environment".

The above actions are sufficient to successfully use TBB in Microsoft*
Visual Studio* C++ projects.


Advanced configuration
----------------------

The plugin takes the paths to the TBB header files and libraries from
the paths.xml file located in the directory with the plug-in DLL.
For example, the following tag structure inside paths.xml provides
the plugin with the paths for TBB 2.0 installations:
  <TBB version_name="TBB 2.0">
    <TBB_INCLUDE_DIR>$(TBB20_INSTALL_DIR)\include</TBB_INCLUDE_DIR>
    <TBB_LIB_DIR mode="32">$(TBB20_INSTALL_DIR)\ia32\vc8\lib</TBB_LIB_DIR>
    <TBB_LIB_DIR mode="64">$(TBB20_INSTALL_DIR)\em64t\vc8\lib</TBB_LIB_DIR>
    <TBB_BIN_DIR mode="32">$(TBB20_INSTALL_DIR)\ia32\vc8\bin</TBB_BIN_DIR>
    <TBB_BIN_DIR mode="64">$(TBB20_INSTALL_DIR)\em64t\vc8\bin</TBB_BIN_DIR>
  </TBB>
The $(TBB20_INSTALL_DIR) is the environment variable containing the path
to the root directory of TBB 2.0 installation; it is set automatically
when commercial TBB 2.0 package is installed. Likewise TBB21_INSTALL_DIR
is used for TBB 2.1 packages, etc.

When necessary, you can specify more TBB paths by adding new <TBB> tags
to the configuration file. An example is provided there as a comment.
The version_name attribute should contain the text to be displayed in
the TBB integration menu; if missed, the whole tag is ignored. If some
directory tag under <TBB> is missed, then the corresponding setting
will not be set during integration. If TBB_INCLUDE_DIR tag is missed,
the corresponding item will be disabled in the TBB integration menu.

If there is no paths.xml file, the TBB integration menu will only have
the option to remove TBB settings.


Known Limitations
-----------------

   - The plug-in files should not be located on a remote disk due to
     known add-in loading limitations.  See
     http://msdn2.microsoft.com/en-us/library/19dax6cz(vs.80).aspx
     for details.
---------------------------------------------------------------------------------------------------
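To make the "Advanced configuration" part of the README a bit more concrete, here is a small Python sketch that reads the paths.xml fragment quoted there and expands $(TBB20_INSTALL_DIR) for one mode. It only illustrates the tag structure; how the plug-in itself reads the file is not known here, and the function name `resolve` and the install directory C:\tbb20 are made up for the example.

```python
import xml.etree.ElementTree as ET

# The <TBB> fragment as quoted in the README.
PATHS_XML = r"""
<TBB version_name="TBB 2.0">
  <TBB_INCLUDE_DIR>$(TBB20_INSTALL_DIR)\include</TBB_INCLUDE_DIR>
  <TBB_LIB_DIR mode="32">$(TBB20_INSTALL_DIR)\ia32\vc8\lib</TBB_LIB_DIR>
  <TBB_LIB_DIR mode="64">$(TBB20_INSTALL_DIR)\em64t\vc8\lib</TBB_LIB_DIR>
  <TBB_BIN_DIR mode="32">$(TBB20_INSTALL_DIR)\ia32\vc8\bin</TBB_BIN_DIR>
  <TBB_BIN_DIR mode="64">$(TBB20_INSTALL_DIR)\em64t\vc8\bin</TBB_BIN_DIR>
</TBB>
"""

def resolve(xml_text, env, mode):
    """Return (version_name, {tag: expanded path}) for one <TBB> element."""
    root = ET.fromstring(xml_text.strip())
    paths = {}
    for child in root:
        # Tags without a mode attribute apply to both 32 and 64 bit.
        if child.get("mode") not in (None, mode):
            continue
        value = child.text
        for name, target in env.items():
            value = value.replace("$(" + name + ")", target)
        paths[child.tag] = value
    return root.get("version_name"), paths

# Hypothetical install directory, for illustration only.
env = {"TBB20_INSTALL_DIR": r"C:\tbb20"}
name, paths = resolve(PATHS_XML, env, mode="64")
print(name)
for tag, path in paths.items():
    print(tag, "=", path)
```

For the open-source package the same structure would apply with TBB21_INSTALL_DIR, as the README describes.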

Really a lot to do when we have 3 different developer environments (32, 64, em64t), 3 Visual Studio versions (VS2005 Express, VS2008 Express, VS2008 Professional) and 3 different OSes (Vista64, XP64, Linux).
And when all this is done, you have to go to the SDK and set up the environment again....uuuhh.
And it is not simple... for me, simple means I can click on a setup.exe file and everything is done....
Such a method of making TBB available is not user-friendly!

heinz
Title: Re: optimized sources
Post by: _heinz on 07 Nov 2008, 07:32:06 pm
After poking around I got TBB20 running on Vista64 with VS2005 Express Edition, VS2008 Express Edition and Visual Studio 2008 Professional Edition. With all 3 I compiled the fibonacci project successfully, although the "Use Intel(R) TBB" submenu is not available in the "Solution Explorer" (so far).  ::)
edit:
no menu "Use Intel(R) TBB" (http://www.britta-d.de/images/vs/vs2008_nomenu.jpg)
Title: Re: optimized sources
Post by: Raistmer on 25 Nov 2008, 04:35:48 pm
@
You are among the first to receive notification of the groundbreaking Intel® Parallel Composer Beta. Download this exciting new tool and get instant access to an advanced parallelism C/C++ compiler, debugger, and libraries that can change the way you develop parallel applications. @
 ;D
Will look...
Title: Re: optimized sources
Post by: Gecko_R7 on 25 Nov 2008, 06:32:29 pm
@
You are among the first to receive notification of the groundbreaking Intel® Parallel Composer Beta. Download this exciting new tool and get instant access to an advanced parallelism C/C++ compiler, debugger, and libraries that can change the way you develop parallel applications. @
 ;D
Will look...


Ohhhh......sounds cool!!!!  8)
Title: Re: optimized sources
Post by: Raistmer on 25 Nov 2008, 11:27:57 pm
"
1.2.3
Minimum System Requirements

A PC based on an IA-32 architecture processor supporting the Intel® Streaming SIMD 2 Extensions (Intel® SSE2) instructions, or a PC based on an Intel® 64 architecture processor or 64-bit AMD* Athlon* or Opteron* processor
"
It seems Intel now directly claims that it supports AMD CPUs (usually there is only an "and compatible CPUs" statement).
Title: Re: optimized sources
Post by: Jason G on 26 Nov 2008, 07:40:48 am
Well AFAICT it's a nice thing for those with MSVS projects they'd like to parallelise, but is basically repackaged ICC/IPP/TBB & Thread profiler&checker ... already have these, but might be nice to see if they've integrated things a bit better.
Title: Re: optimized sources
Post by: Jason G on 26 Nov 2008, 11:44:44 am
@Heinz: Do you happen to have any single and multithreaded FFT processing times benched on your skulltrail?  Time for 1,2,4 & 8 threads would be nice for 32k element &/or 128k elements, if you have them. 

I'm trying to verify/refine some efficiency calculations & have no reference but my dual core.

Jason
Title: Re: optimized sources
Post by: _heinz on 26 Nov 2008, 12:04:38 pm
@Heinz: Do you happen to have any single and multithreaded FFT processing times benched on your skulltrail?  Time for 1,2,4 & 8 threads would be nice for 32k element &/or 128k elements, if you have them. 

I'm trying to verify/refine some efficiency calculations & have no reference but my dual core.

Jason

Hi Jason,
I have not benched the single or multithreaded FFT yet, but I can do this in the next few days.
Time for 1,2,4 & 8 threads would be nice for 32k element &/or 128k elements, if you have them. 
I do not have a test piece; send me a PM saying where I can get one and I will run it.
heinz
Title: Re: optimized sources
Post by: Jason G on 26 Nov 2008, 12:06:02 pm
Cheers!  ;D [will see what I can work out if you have no test piece, will take some time ]
Title: Re: optimized sources
Post by: _heinz on 26 Nov 2008, 01:35:14 pm
@
You are among the first to receive notification of the groundbreaking Intel® Parallel Composer Beta. Download this exciting new tool and get instant access to an advanced parallelism C/C++ compiler, debugger, and libraries that can change the way you develop parallel applications. @
 ;D
Will look...
I'm registered and downloading now..
We will see how this works...
:-)
Title: Re: optimized sources
Post by: _heinz on 26 Nov 2008, 02:01:59 pm
Hi Raistmer,
have you seen this (http://software.intel.com/en-us/articles/note-when-installing-the-intelr-parallel-composer-beta-on-a-system-with-intelr-c-compiler)
Note when Installing the Intel(R) Parallel Composer Beta on a system with Intel(R) C++ Compiler
This is a limitation of the Intel(R) Parallel Composer beta's Integration with Microsoft Visual Studio*.

If you install the Intel Parallel Composer beta on a system that has Intel C++ Compiler 9.x, 10.x or 11.0 installed already, the IDE integration of Intel Parallel Composer will replace the existing IDE integration from Intel C++ Compiler. This causes the existing Intel C++ Compiler 9.x, 10.x or 11.0 not usable from within the Visual Studio IDE.

If you'd like to use the Intel C++ Compiler 9.x, 10.x or 11.0, please uninstall the Intel Parallel Composer, and repair the old compiler.

-------------------------
uuuuhhh... requires a VM for me to try out...or a second parallel installation of OS..
greetings... ;D
Title: Re: optimized sources
Post by: Raistmer on 26 Nov 2008, 02:20:33 pm
Hi :)
No prob, I'm not using ICC right now :)
Title: Re: optimized sources
Post by: Raistmer on 26 Nov 2008, 02:34:39 pm
Just Released: AMD Core Math Library v4.2.0


New features in v4.2.0 include:
Further optimized DGEMM for better performance, and requiring less memory bandwidth
Improved 3D Complex-Complex FFT routines with significantly reduced work space requirements
New optimized RNG base generators for 32-bit builds
Updated version of GFORTRAN to 4.3.2
And more news from AMD: the new Shanghai core
http://forums.amd.com/devblog/blogpost.cfm?threadid=103010&catid=271

"
A comprehensive suite of Fast Fourier Transforms (FFTs) in both single-, double-, single-complex and double-complex data types.
"
Always wanted to do comparison between IPP/FFTW and ACML for AMD CPUs :)

OMG it's FORTRAN library %)
http://developer.amd.com/cpu/Libraries/acml/downloads/Pages/default.aspx#downloads
And it has many flavors... Interesting, can it be used w/o any FORTRAN installation, just as simple lib-file? ....
Title: Re: optimized sources
Post by: _heinz on 26 Nov 2008, 03:47:42 pm
Just Released: AMD Core Math Library v4.2.0

OMG it's FORTRAN library %)
http://developer.amd.com/cpu/Libraries/acml/downloads/Pages/default.aspx#downloads
And it has many flavors... Interesting, can it be used w/o any FORTRAN installation, just as simple lib-file? ....
From my point of view we can link against different libs....this is often done in scientific work.
I have seen that AMD Core Math Library v4.2.0 has been published...
Thanks

heinz
Title: Re: optimized sources
Post by: _heinz on 26 Nov 2008, 07:24:20 pm
@Heinz: Do you happen to have any single and multithreaded FFT processing times benched on your skulltrail?  Time for 1,2,4 & 8 threads would be nice for 32k element &/or 128k elements, if you have them. 

I'm trying to verify/refine some efficiency calculations & have no reference but my dual core.

Jason

compiled the fftw project (single thread) as 32 bit
 /I "." /I ".." /I "../libbench2" /I "../api" /I "../kernel" /I "../dft" /I "../rdft" /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "FFTW_SINGLE" /D "BENCHFFT_SINGLE" /D "HAVE_SSE" /D "_VC80_UPGRADE=0x0710" /D "_MBCS" /FD /EHsc /MT /Fp".\bench___Win32_Release_float/bench.pch" /Fo".\bench___Win32_Release_float/" /Fd".\bench___Win32_Release_float/" /W3 /nologo /c /errorReport:prompt

Results:
C:\Windows\system32>echo off
fftw-3.1.2 benchfsse(VS2005) started
benchf_sse.exe -opatient 8 16 32 64 128 256 512 1024 2048 4096 8192 16384 32768
131072
Problem: 8, setup: 300.32 us, time: 169.69 ns, ``mflops'': 707.16
Problem: 16, setup: 288.86 us, time: 332.84 ns, ``mflops'': 961.43
Problem: 32, setup: 7.91 ms, time: 726.79 ns, ``mflops'': 1100.7
Problem: 64, setup: 27.46 ms, time: 1.67 us, ``mflops'': 1148.4
Problem: 128, setup: 62.98 ms, time: 4.19 us, ``mflops'': 1069.1
Problem: 256, setup: 137.48 ms, time: 9.18 us, ``mflops'': 1115
Problem: 512, setup: 267.80 ms, time: 20.95 us, ``mflops'': 1099.6
Problem: 1024, setup: 575.47 ms, time: 46.10 us, ``mflops'': 1110.7
Problem: 2048, setup: 1.37 s, time: 99.17 us, ``mflops'': 1135.8
Problem: 4096, setup: 3.42 s, time: 220.42 us, ``mflops'': 1115
Problem: 8192, setup: 8.83 s, time: 530.79 us, ``mflops'': 1003.2
Problem: 16384, setup: 21.99 s, time: 1.13 ms, ``mflops'': 1014.9
Problem: 32768, setup: 53.80 s, time: 2.41 ms, ``mflops'': 1020
Problem: 131072, setup: 369.12 s, time: 9.89 ms, ``mflops'': 1126
fftw-3.1.2 benchfsse ended.
Press any key to continue . . .
----------------------------------------------------------------------------------------------------
For the threaded variants I must first read the documentation again...
Is this what you meant? If you want some other compiler options, let me know..
Once I have installed the Intel® Parallel Composer Beta, I will recompile the project...

regards heinz
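
A note on reading the table: the ``mflops'' column is not a measured operation count. For complex transforms the FFTW benchmark reports a nominal figure, 5 N log2(N) divided by the time in microseconds. A short Python check of two rows from the run above (the function name is just for illustration):

```python
import math

def nominal_mflops(n: int, seconds: float) -> float:
    """Nominal MFLOPS of a size-n complex FFT: 5*n*log2(n) / time in microseconds."""
    return 5.0 * n * math.log2(n) / (seconds * 1e6)

# Times taken from the rows "Problem: 8" and "Problem: 131072" above.
print(round(nominal_mflops(8, 169.69e-9), 1))
print(round(nominal_mflops(131072, 9.89e-3), 1))
```

Both values agree with the reported 707.16 and 1126 to within the rounding of the printed times.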
Title: Re: optimized sources
Post by: Jason G on 27 Nov 2008, 07:34:50 am
Thanks Heinz,
   Could you let me know:
   - Current CPU speed at time of test
   - Cache sizes per package
   - Bus speed

My single core computations are so far within around 10% of your numbers at least, but don't allow for those overheads for large problems, so I factor them into the instruction cost at the moment. 

For multithreaded (eventually)  FFTW i think it would require a different package they have, (alpha?).  In any case the purpose is to refine my textbook efficiency approximations into more practical ones that can be used to assess scalability of parallel FFT algorithms. 
Title: Re: optimized sources
Post by: _heinz on 27 Nov 2008, 06:09:38 pm
Thanks Heinz,
   Could you let me know:
   - Current CPU speed at time of test
   - Cache sizes per package
   - Bus speed
CPU speed 2398 MHz
FSB speed 400(QP) 1600
Cache sizes per package ... I must look up ( where can I find in the source ? )
ahh.. cpu package.. 12 MB
Title: Re: optimized sources
Post by: Leaps-from-Shadows on 27 Nov 2008, 07:46:06 pm
Current Nehalem CPUs (920, 940, 965) have 32k L1 instruction cache per core, 32k L1 data cache per core, 256k L2 cache per core, and 8MB shared L3 cache.
Title: Re: optimized sources
Post by: _heinz on 27 Nov 2008, 07:47:11 pm
Intel® Parallel Composer Beta is installed and running, but not in the VS2005/2008 Express versions.
>------ Build started: Project: fibonacci, Configuration: Release x64 ------
1>Compiling with Intel(R) C++ Compiler 11.1.032 [Intel(R) 64]... (Intel C++ Environment)
1>Intel(R) C++ Compiler for applications running on Intel(R) 64, Version 11.1  Beta  Build 20081112 Package ID: composer_beta_update2.032
1>Copyright (C) 1985-2008 Intel Corporation.  All rights reserved.
1>icl /c /I C:\I\INTEL\tbb21_012oss\include -D WIN64 -D NDEBUG -D _CONSOLE -D _MBCS /EHsc /MD /GS /fp:fast /FoC:\Users\heinz\AppData\Local\Temp\tbb_examples\fibonacci\x64\Release/ /W1 /nologo /Qvc9 "/Qlocation,link,C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\bin\x86_amd64" ..\Fibonacci.cpp
1>
1>Fibonacci.cpp
1>Linking... (Intel C++ Environment)
1>xilink: executing 'link'
1>Embedding manifest... (Microsoft VC++ Environment)
1>Copying tbb.dll (Microsoft VC++ Environment)
1>        1 file(s) copied.
1>Build log was saved at "file://C:\Users\heinz\AppData\Local\Temp\tbb_examples\fibonacci\x64\Release\BuildLog.htm"
1>fibonacci - 0 error(s), 0 warning(s)
========== Build: 1 succeeded, 0 failed, 0 up-to-date, 0 skipped ==========

Here are 2 results, both compiled with VS2008, one of them with the integrated Parallel Composer.
VS2008 TBB --> fibonacci_1000_out.txt
VS2008 TBB Parallel Composer -->fibonacciopt_1000_out.txt
files attached

heinz  ;D

[attachment deleted by admin]
Title: Re: optimized sources
Post by: Jason G on 28 Nov 2008, 01:58:02 am
Thanks Heinz,
   Could you let me know:
   - Current CPU speed at time of test
   - Cache sizes per package
   - Bus speed
CPU speed 2398 MHz
FSB speed 400(QP) 1600
Cache sizes per package ... I must look up ( where can I find in the source ? )
ahh.. cpu package.. 12 MB

Thanks again, looks like my single thread estimates come good for your parameters:  Could you try a comparison run to this bench I compiled? (attached) Still Single threaded, but will make sure we have reference for future numbers.

same parameter usage: benchf_sse_icc  -opatient [same FFT lengths as before]

Jason



[attachment deleted by admin]
Title: Re: optimized sources
Post by: _heinz on 28 Nov 2008, 04:27:50 am
fftw-3.1.2 benchf_sse_icc(jason) started
benchf_sse_icc.exe -opatient 8 16 32 64 128 256 512 1024 2048 4096 8192 16384 32
768 131072
Problem: 8, setup: 273.78 us, time: 49.65 ns, ``mflops'': 2416.8
Problem: 16, setup: 262.88 us, time: 98.21 ns, ``mflops'': 3258.2
Problem: 32, setup: 7.68 ms, time: 117.86 ns, ``mflops'': 6787.9
Problem: 64, setup: 26.83 ms, time: 222.62 ns, ``mflops'': 8624.6
Problem: 128, setup: 61.58 ms, time: 429.96 ns, ``mflops'': 10420
Problem: 256, setup: 124.30 ms, time: 925.40 ns, ``mflops'': 11066
Problem: 512, setup: 235.98 ms, time: 2.13 us, ``mflops'': 10816
Problem: 1024, setup: 401.79 ms, time: 4.50 us, ``mflops'': 11366
Problem: 2048, setup: 710.67 ms, time: 11.17 us, ``mflops'': 10080
Problem: 4096, setup: 1.39 s, time: 27.94 us, ``mflops'': 8797.1
Problem: 8192, setup: 3.08 s, time: 60.62 us, ``mflops'': 8783.6
Problem: 16384, setup: 6.91 s, time: 134.93 us, ``mflops'': 8499.6
Problem: 32768, setup: 15.86 s, time: 289.70 us, ``mflops'': 8483.2
Problem: 131072, setup: 86.42 s, time: 1.39 ms, ``mflops'': 7988.8
fftw-3.1.2 benchf_sse_icc ended.
----------------------------------------------
... great results   ;D
heinz
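
To put a number on "great results", the per-transform times of this run can be divided into those of the earlier benchf_sse.exe run on the same machine. A small Python sketch using four sizes copied from the two tables (the dictionary and function names are only for illustration):

```python
# Time per transform in seconds, copied from the two runs in this thread.
vs2005 = {8: 169.69e-9, 1024: 46.10e-6, 32768: 2.41e-3, 131072: 9.89e-3}
icc = {8: 49.65e-9, 1024: 4.50e-6, 32768: 289.70e-6, 131072: 1.39e-3}

def speedups(slow, fast):
    """Ratio slow/fast for every problem size present in 'slow'."""
    return {n: slow[n] / fast[n] for n in slow}

ratios = speedups(vs2005, icc)
for n, s in ratios.items():
    print(f"N={n}: {s:.1f}x")
```

The ratio is smallest for the tiny transform and peaks around the mid sizes; these are single runs, so the exact figures should be taken loosely.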
Title: Re: optimized sources
Post by: Jason G on 28 Nov 2008, 04:47:30 am
Huh... now my estimates are way out :o, That places cost of a complex multiply-add pair about 1.5 cycles and half the initial startup latency (now 35nS).  What was your original bench? non-sse floats fftw 3.1.2? (before cost estimate was 10.5 cycles per mul-add & startup latency 60nS).  Must be seeing effect of SSE instruction level parallelism and out-of-order execution hiding some of the latency maybe.
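
The model behind these cycle figures is not spelled out in the thread. One simple reading that lands close to both numbers is to spread the measured time over N log2(N) operations at the reported 2398 MHz and ignore the startup latency. A Python sketch under that assumption (the function name and the choice of the 131072-point row are mine):

```python
import math

CPU_HZ = 2398e6  # CPU speed reported for the test machine

def cycles_per_op(n, seconds):
    """Cycles per operation, assuming n*log2(n) operations per transform."""
    return seconds * CPU_HZ / (n * math.log2(n))

print(round(cycles_per_op(131072, 9.89e-3), 2))  # first run (benchf_sse.exe)
print(round(cycles_per_op(131072, 1.39e-3), 2))  # second run (benchf_sse_icc.exe)
```

This gives roughly 10.6 and 1.5 cycles, in line with the 10.5 and 1.5 quoted in the post, although the actual estimate may have been fitted differently.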
Title: Re: optimized sources
Post by: _heinz on 28 Nov 2008, 08:45:53 am
What was your original bench? non-sse floats fftw 3.1.2?
Configuration: Active(Release float SSE) Platform: Active(Win32)
/I "." /I ".." /I "../libbench2" /I "../api" /I "../kernel" /I "../dft" /I "../rdft" /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "FFTW_SINGLE" /D "BENCHFFT_SINGLE" /D "HAVE_SSE" /D "_VC80_UPGRADE=0x0710" /D "_MBCS" /FD /EHsc /MT /Fp".\bench___Win32_Release_float/bench.pch" /Fo".\bench___Win32_Release_float/" /Fd".\bench___Win32_Release_float/" /W3 /nologo /c /errorReport:prompt
******************************************
/OUT:"..\benchf_sse.exe" /INCREMENTAL:NO /NOLOGO /LIBPATH:"C:\I\SC\fftw-3.1.2\libfftwf_sse.lib" /MANIFEST /MANIFESTFILE:".\bench___Win32_Release_float_SSE\benchf_sse.exe.intermediate.manifest" /PDB:".\bench___Win32_Release_float/benchf.pdb" /SUBSYSTEM:CONSOLE /MACHINE:X86 /ERRORREPORT:PROMPT ..\libfftwf_sse.lib  kernel32.lib

heinz
Title: Re: optimized sources
Post by: Jason G on 28 Nov 2008, 08:54:48 am
ugghhh... ever stranger...same build (except mine with ICC), I guess when they say ICC builds aren't much faster they must mean against GCC builds.  Don't have my MinGW/GCC setup anymore to try that build, and that one managed to strangle my p4 back last year. Maybe I'll have better luck this year with improved hardware.

[In any case, we have some reference FFT speeds for the skulltrail now thanks, next is to come up with something that equals that, that can be more easily scaled to parallel.]

Jason
Title: Re: optimized sources
Post by: _heinz on 28 Nov 2008, 09:29:20 am
I tried to compile the fftw project with Parallel Composer but got an error... it expects a 64-bit project, but this one is Win32.
Looks like I must additionally install the Composer for 32-bit..
Please have a look at the result files from the fibonacci project; it is compiled as 64-bit with Composer

heinz
Title: Re: optimized sources
Post by: Jason G on 28 Nov 2008, 09:35:47 am
Hi Heinz, Why did it choose 5 threads instead of 8 ?

Quote
Threads number is 5
test before that is 4, next is 16, so it seems a bit weird.  I have some fibonacci example project I did here with TBB a while ago.  Will see if I can dig it out.
Title: Re: optimized sources
Post by: _heinz on 28 Nov 2008, 09:46:15 am
Hi Heinz, Why did it choose 5 threads instead of 8 ?

Quote
Threads number is 5
test before that is 4, next is 16, so it seems a bit weird.  I have some fibonacci example project I did here with TBB a while ago.  Will see if I can dig it out.

5:8 should run from 5 to 8 (in steps of 1); hmmm, 1:4 did work, as you can see
echo off
echo please wait
echo fibonacci result in fibonacciopt_1000_out.txt
rem no second parameter means standard(1:4)
fibonacci.exe 1000 >fibonacciopt_1000_out.txt
echo fibonacci 1000 5:8
fibonacci.exe 1000 5:8 >>fibonacciopt_1000_out.txt
echo fibonacci 1000 16
fibonacci.exe 1000 16 >>fibonacciopt_1000_out.txt
echo fibonacci 1000 32
fibonacci.exe 1000 32 >>fibonacciopt_1000_out.txt
echo fibonacci 1000 64
fibonacci.exe 1000 64 >>fibonacciopt_1000_out.txt
echo ready

you can compare with the other result file to see differences
heinz
Title: Re: optimized sources
Post by: _heinz on 28 Nov 2008, 10:08:00 am
Looked it up: this works correctly with a single line:
fibonacci.exe 1000 1:64 >fibonacciopt_1000_out.txt
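
One possible reading of the "Threads number is 5" puzzle: if the example steps through the low:high range by doubling the thread count, then 1:4 visits 1, 2 and 4, while 5:8 visits only 5 because 10 is already past 8. This is an assumption that merely fits the output seen here; it has not been checked against Fibonacci.cpp. A Python sketch of that stepping rule (the function name is made up):

```python
def thread_counts(low, high):
    """Thread counts visited if the range is stepped by doubling (assumed rule)."""
    counts = []
    t = low
    while t <= high:
        counts.append(t)
        t *= 2
    return counts

print(thread_counts(1, 4))
print(thread_counts(5, 8))
print(thread_counts(1, 64))
```

Under that rule a single 1:64 run covers 1, 2, 4, 8, 16, 32 and 64 threads, which would match the one-line command above.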

[attachment deleted by admin]
Title: Re: optimized sources
Post by: Jason G on 28 Nov 2008, 10:22:49 am
Downloading Parallel Composer beta now, for cooperative exploration & development.  If we can do something with FFT & FFA that would be good IMO for astropulse,  but there are strong possibilities for Multibeam software as well (maybe more, because it has a higher degree of serial optimisation already)... Will See.... This Beta software better not mess up my ICC/IPP installation!  :o
Title: Re: optimized sources
Post by: Jason G on 28 Nov 2008, 10:48:26 am
Change of Plan:
Quote
  Intel(R) C++ Compiler 10.1 Integration(s) in Microsoft Visual Studio* is already installed.
Installation can continue; however, you will not be able to use the Intel C++ Compiler 10.1 or 9.0 within the Visual Studio IDE

So I will switch to my 64 bit boot drive, and install both 32 & 64 bit parallel composer there instead.
Title: Re: optimized sources
Post by: Jason G on 28 Nov 2008, 11:16:22 am
Heinz, have 64 & 32 bit Parallel composer beta (update 2) installed .... Where can I find the fibonacci sample?  (the stuff I see here looks more boring)
Title: Re: optimized sources
Post by: _heinz on 28 Nov 2008, 11:23:20 am
Heinz, have 64 & 32 bit Parallel composer beta (update 2) installed .... Where can I find the fibonacci sample?  (the stuff I see here looks more boring)

fibonacci is part of TBB
If TBB is installed you have this sample
On my machine it is in C:\I\SC\ITBB\examples\test_all\fibonacci\vc8;
the path will differ according to the install directory you chose.
Title: Re: optimized sources
Post by: Jason G on 28 Nov 2008, 11:24:13 am
Hmmm, I have TBB on my other (32-bit) drive ... maybe I can install it here, will try.

[Hmmm, so parallel composer doesn't actually have TBB in it then ....  ??? that seems a bit odd, maybe they expect you'll use pure openmp.. what about IPP, I suppose that's not there either which would make this ICC 11 ?]

32 & 64 bit fibonacci sample built & ran.  Will consider fully migrating to 64 bit platform for holidays in a few weeks.  It'll be painful, but about time probably.

Jason
Title: Re: optimized sources
Post by: Leaps-from-Shadows on 28 Nov 2008, 11:49:09 am
Quote
ahh.. cpu package.. 12 MB
This is wrong, unless you have a not-yet-released version of the Nehalem processor.

Currently released versions have 32k L1 instruction cache per core, 32k L1 data cache per core, 256k L2 cache per core, and 8MB shared L3 cache.  They only have four physical cores, so that's 128k L1 instruction cache total, 128k L1 data cache total, 1MB L2 cache total, and 8MB L3 cache.  The four HT virtual cores aren't physical cores, so they don't have cache of their own.

I don't know how much difference this would make, but I thought I'd point it out anyway...
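
The totals in this post follow directly from the per-core figures. A tiny Python check, using only the numbers given above (variable names are illustrative):

```python
KB = 1024
MB = 1024 * KB

cores = 4  # physical cores; the HT virtual cores add no cache of their own
per_core = {"L1 instruction": 32 * KB, "L1 data": 32 * KB, "L2": 256 * KB}
shared_l3 = 8 * MB

# Per-core caches scale with the core count; the L3 is shared and counted once.
totals = {name: size * cores for name, size in per_core.items()}
totals["L3 (shared)"] = shared_l3

for name, size in totals.items():
    print(f"{name}: {size // KB} KB")
```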
Title: Re: optimized sources
Post by: Jason G on 28 Nov 2008, 11:52:39 am
Hi Leaps! .. Nahhh .. It's 2xXeon Quads on a Skulltrail Mobo  :) [Heinz, Please check cache size with CPU-Z]
Title: Re: optimized sources
Post by: _heinz on 28 Nov 2008, 12:01:02 pm
Hmmm, I have TBB on my other (32-bit) drive ... maybe I can install it here, will try.

[Hmmm, so parallel composer doesn't actually have TBB in it then ....  ??? that seems a bit odd, maybe they expect you'll use pure openmp.. what about IPP, I suppose that's not there either which would make this ICC 11 ?]

32 & 64 bit fibonacci sample built & ran.  Will consider fully migrating to 64 bit platform for holidays in a few weeks.  It'll be painful, but about time probably.

Jason

I have uploaded the fibonacci project to our test project;
it is in /users/heinz/heinz_projects/fibonacci/vc8

As far as I have seen, IPP will be used if it is installed...but I must read the documentation to confirm it..
Looked it up now: --->
Vectorization and Loop Optimization
Vectorization detects patterns of sequential data accesses by the same instruction and transforms the code for SIMD execution, including use of the SSE, SSE2, SSE3, SSSE3, and SSE4 instruction sets.
heinz
Did you see the speedup between plain TBB and the Composer with TBB? My test result files are up.  ;D
heinz
Title: Re: optimized sources
Post by: Jason G on 28 Nov 2008, 12:03:35 pm
@heinz: I didn't check yet ... wait up.. this is fun... will compare 32 bit Fibonacci here.

@Leaps: Will PM shortly about something
Title: Re: optimized sources
Post by: _heinz on 28 Nov 2008, 12:20:50 pm
Hi Leaps! .. Nahhh .. It's 2xXeon Quads on a Skultrail Mobo  :) [Heinz, Please check cache size with CPU-Z]
there you can see CPUID (http://www.britta-d.de/bilder/server_oc/page9.htm)
and here you can see CPUZ (http://www.britta-d.de/bilder/server/page38.htm)
Title: Re: optimized sources
Post by: Jason G on 28 Nov 2008, 12:30:43 pm
Ahhh, 6 meg per package ( 1.5 meg per core )... Okay, yep it is 12 meg total for the 8 cores.

Compared 32 bit ICC 10.1 / TBB 2.0 build of fibonacci, and it IS slower than Parallel composer 32 bit build under XP64 ... Will have to try that build under XP32 to confirm though.  I will probably update all my ICC/IPP base packages as soon as I get time, in a few weeks.

Jason
Title: Re: optimized sources
Post by: _heinz on 28 Nov 2008, 12:46:23 pm
Ahhh, 6 meg per package ( 1.5 meg per core )... Okay, yep it is 12 meg total for the 8 cores.

Compared 32 bit ICC 10.1 / TBB 2.0 build of fibonacci, and it IS slower than Parallel composer 32 bit build under XP64 ... Will have to try that build under XP32 to confirm though.  I will probably update all my ICC/IPP base packages as soon as I get time, in a few weeks.

Jason


12 MB per chip (http://www.intel.com/cd/channel/reseller/emea/deu/products/server/processors/q5400/feature/index.htm)
BX80574E5405A, active cooler or for 1U systems, 45 nm, E5405, 2.00 GHz (80 W), 1333, 12 MB total
We have 2 processors, so we have 24 MB for 8 cores.
Title: Re: optimized sources
Post by: Jason G on 28 Nov 2008, 12:49:42 pm
Err well CPU-Z shows only per core then?  In any case:

Hmm, not a lot of Fibonacci difference here, but some: (fastest thread number was 2)

Built under xp32 with ICC 10.1 + TBB (run on XP 32)
Quote
Threads number is 2
Shared serial (mutex)           - in 0.286294 msec
Shared serial (spin_mutex)      - in 0.196978 msec
Shared serial (queuing_mutex)   - in 0.301214 msec
Shared serial (Conc.HashTable)  - in 4.313505 msec
Parallel while+for/queue        - in 1.485761 msec
Parallel pipe/queue             - in 1.980293 msec
Parallel reduce                 - in 0.523162 msec
Parallel scan                   - in 0.338611 msec
Parallel tasks                  - in 0.566134 msec

and Built under XP64 with Parallel Composer Beta Update 2 + TBB 2.0 ( but run on XP 32 also)
Quote
Threads number is 2
Shared serial (mutex)           - in 0.279819 msec
Shared serial (spin_mutex)      - in 0.208223 msec
Shared serial (queuing_mutex)   - in 0.284642 msec
Shared serial (Conc.HashTable)  - in 4.461598 msec
Parallel while+for/queue        - in 1.718736 msec
Parallel pipe/queue             - in 2.188073 msec
Parallel reduce                 - in 0.571781 msec
Parallel scan                   - in 0.357319 msec
Parallel tasks                  - in 0.534837 msec

So some things look a bit slower, but I will carefully consider shifting to ICC 11 soon, and check how our projects of interest compare.
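
To put a number on "a bit slower", the two 2-thread runs quoted above can be compared test by test. This is only a small sketch: the timings are copied from the two quotes, while the dictionary names and the helper `relative_change` are made up for illustration. Single runs at the sub-millisecond scale are noisy, so the percentages are indicative at best.

```python
# Timings in msec, copied from the two 2-thread runs quoted above.
icc101_tbb = {
    "Shared serial (mutex)": 0.286294,
    "Shared serial (spin_mutex)": 0.196978,
    "Shared serial (queuing_mutex)": 0.301214,
    "Shared serial (Conc.HashTable)": 4.313505,
    "Parallel while+for/queue": 1.485761,
    "Parallel pipe/queue": 1.980293,
    "Parallel reduce": 0.523162,
    "Parallel scan": 0.338611,
    "Parallel tasks": 0.566134,
}
parallel_composer = {
    "Shared serial (mutex)": 0.279819,
    "Shared serial (spin_mutex)": 0.208223,
    "Shared serial (queuing_mutex)": 0.284642,
    "Shared serial (Conc.HashTable)": 4.461598,
    "Parallel while+for/queue": 1.718736,
    "Parallel pipe/queue": 2.188073,
    "Parallel reduce": 0.571781,
    "Parallel scan": 0.357319,
    "Parallel tasks": 0.534837,
}


def relative_change(base, other):
    """Percent change of `other` relative to `base`, per test (positive = slower)."""
    return {name: 100.0 * (other[name] - base[name]) / base[name] for name in base}


changes = relative_change(icc101_tbb, parallel_composer)
for name, pct in changes.items():
    print(f"{name:32s} {pct:+6.1f} %")

# Tests where the Parallel Composer build took longer than the ICC 10.1 build.
slower = [name for name, pct in changes.items() if pct > 0]
print(len(slower), "of", len(changes), "tests slower")
```

With these figures six of the nine tests come out slower for the Parallel Composer build, and three (mutex, queuing_mutex, tasks) come out faster.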
Title: Re: optimized sources
Post by: _heinz on 28 Nov 2008, 12:58:28 pm
How many numbers did you let it generate? 1000?
Title: Re: optimized sources
Post by: Jason G on 28 Nov 2008, 01:00:42 pm
No, just used default which was 100... will try 1000

[Later:]  Fastest 32 bit run built on XP32 ICC10.1 / TBB2.0 now 3 threads  :o:
Quote
Threads number is 3
Shared serial (mutex)           - in 162.014407 msec
Shared serial (spin_mutex)      - in 11.609819 msec
Shared serial (queuing_mutex)   - in 50.960339 msec
Shared serial (Conc.HashTable)  - in 401.327768 msec
Parallel while+for/queue        - in 93.399315 msec
Parallel pipe/queue             - in 164.994829 msec
Parallel reduce                 - in 27.500117 msec
Parallel scan                   - in 22.918168 msec
Parallel tasks                  - in 25.904447 msec

Getting parallel composer build data:
Quote
Threads number is 3
Shared serial (mutex)           - in 76.449678 msec
Shared serial (spin_mutex)      - in 13.449323 msec
Shared serial (queuing_mutex)   - in 50.961819 msec
Shared serial (Conc.HashTable)  - in 413.186277 msec
Parallel while+for/queue        - in 93.995606 msec
Parallel pipe/queue             - in 171.541281 msec
Parallel reduce                 - in 28.647254 msec
Parallel scan                   - in 27.231642 msec
Parallel tasks                  - in 24.389762 msec


Title: Re: optimized sources
Post by: _heinz on 28 Nov 2008, 02:48:58 pm
No, just used default which was 100... will try 1000

[Later:]  Fastest 32 bit run built on XP32 ICC10.1 / TBB2.0 now 3 threads  :o:
Quote
Threads number is 3
Now you know why I chose 5, an odd number.
We can create any number of threads: 1, 2, 3, 4 ... 128, 256, 512 etc., odd numbers as well.
And we can use /QxHOST ---> best performance using the latest features of the processor supported by the compilation host.
 ::)
heinz
Title: Re: optimized sources
Post by: _heinz on 28 Nov 2008, 04:02:41 pm
@Heinz: Do you happen to have any single and multithreaded FFT processing times benched on your skulltrail?  Time for 1,2,4 & 8 threads would be nice for 32k element &/or 128k elements, if you have them. 

I'm trying to verify/refine some efficiency calculations & have no reference but my dual core.

Jason

compiled the fftw project (single thread) as 32 bit
 /I "." /I ".." /I "../libbench2" /I "../api" /I "../kernel" /I "../dft" /I "../rdft" /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "FFTW_SINGLE" /D "BENCHFFT_SINGLE" /D "HAVE_SSE" /D "_VC80_UPGRADE=0x0710" /D "_MBCS" /FD /EHsc /MT /Fp".\bench___Win32_Release_float/bench.pch" /Fo".\bench___Win32_Release_float/" /Fd".\bench___Win32_Release_float/" /W3 /nologo /c /errorReport:prompt

Results:
C:\Windows\system32>echo off
fftw-3.1.2 benchfsse(VS2005) started
benchf_sse.exe -opatient 8 16 32 64 128 256 512 1024 2048 4096 8192 16384 32768 131072
Problem: 8, setup: 300.32 us, time: 169.69 ns, ``mflops'': 707.16
Problem: 16, setup: 288.86 us, time: 332.84 ns, ``mflops'': 961.43
Problem: 32, setup: 7.91 ms, time: 726.79 ns, ``mflops'': 1100.7
Problem: 64, setup: 27.46 ms, time: 1.67 us, ``mflops'': 1148.4
Problem: 128, setup: 62.98 ms, time: 4.19 us, ``mflops'': 1069.1
Problem: 256, setup: 137.48 ms, time: 9.18 us, ``mflops'': 1115
Problem: 512, setup: 267.80 ms, time: 20.95 us, ``mflops'': 1099.6
Problem: 1024, setup: 575.47 ms, time: 46.10 us, ``mflops'': 1110.7
Problem: 2048, setup: 1.37 s, time: 99.17 us, ``mflops'': 1135.8
Problem: 4096, setup: 3.42 s, time: 220.42 us, ``mflops'': 1115
Problem: 8192, setup: 8.83 s, time: 530.79 us, ``mflops'': 1003.2
Problem: 16384, setup: 21.99 s, time: 1.13 ms, ``mflops'': 1014.9
Problem: 32768, setup: 53.80 s, time: 2.41 ms, ``mflops'': 1020
Problem: 131072, setup: 369.12 s, time: 9.89 ms, ``mflops'': 1126
fftw-3.1.2 benchfsse ended.
Press any key . . .
----------------------------------------------------------------------------------------------------
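
A note on the ``mflops'' column: the figures above are consistent with the usual FFT benchmark convention for complex transforms, 5 N log2(N) divided by the time in microseconds. It is a scaled speed, not a count of real floating-point operations. The function name `fftw_mflops` below is made up for this sketch; the sizes and times are taken from the listing above.

```python
import math


def fftw_mflops(n, seconds):
    """Benchmark-style 'mflops' for a complex FFT of size n.

    Defined as 5 * n * log2(n) / (time in microseconds); it is a
    normalised speed figure, not an exact operation count.
    """
    return 5.0 * n * math.log2(n) / (seconds * 1e6)


# Size 8 took 169.69 ns, size 1024 took 46.10 us in the MS-compiled run.
print(round(fftw_mflops(8, 169.69e-9), 1))
print(round(fftw_mflops(1024, 46.10e-6), 1))
```

Both values land within rounding of the 707.16 and 1110.7 printed by the benchmark; small differences come from the times being rounded in the listing.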
For the threaded variants I must first read the documentation again...
Is this what you meant? If you want some other compiler options, let me know.
Once I have installed the Intel® Parallel Composer Beta, I will recompile the project...

regards heinz

The sample above was compiled with the Microsoft C compiler (VS2005).

C:\Windows\system32>echo off
compiled with Parallel Composer  Configuration(Release float SSE) Platform(Win32)
fftw-3.1.2 benchf_sse started
benchf_sse.exe -opatient 8 16 32 64 128 256 512 1024 2048 4096 8192 16384 32768 131072
Problem: 8, setup: 241.93 us, time: 49.93 ns, ``mflops'': 2403.6
Problem: 16, setup: 276.57 us, time: 94.39 ns, ``mflops'': 3390
Problem: 32, setup: 7.91 ms, time: 117.86 ns, ``mflops'': 6787.9
Problem: 64, setup: 26.76 ms, time: 219.35 ns, ``mflops'': 8753.3
Problem: 128, setup: 61.71 ms, time: 447.42 ns, ``mflops'': 10013
Problem: 256, setup: 124.16 ms, time: 855.56 ns, ``mflops'': 11969
Problem: 512, setup: 238.18 ms, time: 1.99 us, ``mflops'': 11575
Problem: 1024, setup: 403.56 ms, time: 4.47 us, ``mflops'': 11455
Problem: 2048, setup: 719.56 ms, time: 10.62 us, ``mflops'': 10611
Problem: 4096, setup: 1.41 s, time: 25.84 us, ``mflops'': 9510.4
Problem: 8192, setup: 3.14 s, time: 58.67 us, ``mflops'': 9076.4
Problem: 16384, setup: 7.01 s, time: 125.16 us, ``mflops'': 9163.6
Problem: 32768, setup: 16.08 s, time: 279.92 us, ``mflops'': 8779.5
Problem: 131072, setup: 87.35 s, time: 1.29 ms, ``mflops'': 8658.3
fftw-3.1.2 benchf_sse ended.

With 128K: 8658.3 mflops.
Best ratio ~1:10.
Everybody can draw their own conclusions.
heinz
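
The "~1:10" can be checked directly from the two listings. The sketch below divides the Parallel Composer mflops by the MS compiler mflops for each problem size; the numbers are copied from the two runs above, the variable names are only for illustration.

```python
sizes = [8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096,
         8192, 16384, 32768, 131072]

# ``mflops'' column of the MS compiler (VS2005) build.
msvc = [707.16, 961.43, 1100.7, 1148.4, 1069.1, 1115, 1099.6,
        1110.7, 1135.8, 1115, 1003.2, 1014.9, 1020, 1126]

# ``mflops'' column of the Parallel Composer build.
composer = [2403.6, 3390, 6787.9, 8753.3, 10013, 11969, 11575,
            11455, 10611, 9510.4, 9076.4, 9163.6, 8779.5, 8658.3]

# Speedup of the Parallel Composer build over the MS build, per size.
speedup = {n: c / m for n, m, c in zip(sizes, msvc, composer)}

for n, s in speedup.items():
    print(f"{n:7d}  {s:5.2f}x")

best_size = max(speedup, key=speedup.get)
print("best:", best_size, f"{speedup[best_size]:.2f}x")
```

The largest ratio is at size 256 (about 10.7x); at 128K the ratio is about 7.7x, and for the smallest sizes it is closer to 3.5x.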
Title: Re: optimized sources
Post by: Jason G on 28 Nov 2008, 10:26:52 pm
Ahhh, so FFTW's warnings about MS compiler generating incorrect SSE code for FFTW might be correct.   Good to know.  I'm pretty sure the stock DLL would have been built with GCC/MinGW.

Much better numbers  ;D
Title: Re: optimized sources
Post by: _heinz on 29 Dec 2008, 05:26:30 am
Hi Jason,
The new Intel board is available --> Intel SmackOver DX58SO X58, price 228.58 € in Germany
Product type: Motherboard
Form factor: ATX
Dimensions (width x depth x height): 30.5 cm x 24.4 cm
Chipset: Intel X58 Express / Intel ICH10R
Multi-core support: 4-core
Processor: 0 ( 1 ) - LGA1366 socket
Compatible processors: Core i7, Core i7 Extreme
64-bit processor compatibility: built in
RAM: 0 MB (installed) / 16 GB (max)
Supported RAM technology: DDR3 SDRAM
Supported RAM integrity check: non-ECC
Storage controller: Serial ATA-300 (RAID)
USB port configuration: 12 x USB
Storage port configuration: 6 x SATA, 2 x eSATA
FireWire port configuration: 2 x FireWire
Audio output: sound card - 7.1 channel surround
Network: network adapter - Intel 82567LM - Ethernet, Fast Ethernet, Gigabit Ethernet

have a look http://www.kmelektronik.de/
 
heinz

Title: Re: optimized sources
Post by: _heinz on 05 Jan 2009, 08:10:12 am
Happy New Year,
the new year started with some serious issues.
Shortly before Christmas the latest AP was released; thanks to all who were involved in making it possible.
1. AP rev69 now takes 9 - 10 hours; the standard AP needs about 70-90 hours (measured on an Intel E8600 @ 3.6 GHz).
2. We are working on AP to make it fit for much more parallelism.
3. My test and development machine AK-V8 suffered from a bad disk, which I removed today. Now it runs again.
4. Some support requests are still open regarding the ATI 8.12 driver, which I need for the ATI developer environment.
5. Our work will happen in the closed forums, so let yourselves be surprised from time to time.

heinz
 ;D
Title: Re: optimized sources
Post by: Crunch3r on 05 Jan 2009, 01:36:35 pm
You gotta be careful with fftw and which compiler to use. From my own experience the pre-packaged gcc builds were always faster than the icc compiled code!

Title: Re: optimized sources
Post by: Jason G on 05 Jan 2009, 06:37:36 pm
You gotta be careful with fftw and which compiler to use. From my own experience the pre-packaged gcc builds were always faster than the icc compiled code!

Good tip, when I tried this back on my p4 last year, the machine didn't seem to handle the GCC build, will have to get around to trying again on newer hardware.
Title: Re: optimized sources
Post by: _heinz on 24 Feb 2009, 11:36:26 am
The new astropulse_v5 5.03 was published on February 21st; have a look at our front page http://lunatics.kwsn.net/index.php

 ;D
Title: Re: optimized sources
Post by: _heinz on 31 Mar 2009, 03:12:18 am
Running the new astropulse 5.03:
21 _heinz 13,969.73 1,857,340 GenuineIntel
Intel(R) Xeon(R) CPU E5405 @ 2.00GHz [Intel64 Family 6 Model 23 Stepping 6]
(8 processors) 
---------------------------------------------------------------------------
Now number 21 with a RAC of 13,969.73, without any GPU app.
I hope to reach 14,000 tonight.

heinz  ;D
Title: Re: optimized sources
Post by: _heinz on 01 Apr 2009, 12:19:03 pm
21 _heinz 14,405.11 1,880,786
modify:
two days later
18 _heinz 15,307.10 1,915,535
15 _heinz 15,699.14 1,924,733
V8-SK01 home 16,033.16 1,929,683
14 _heinz 16,033.16 1,929,683
Title: Re: optimized sources
Post by: Jason G on 01 Apr 2009, 12:25:46 pm
LoL, Go Heinz! My downward dive has started. Both machines cooling down ready for cleanout.  Two more weeks 'till our refactoring microscopic disassembly sessions.  Dust off your dork hat, I still need to find mine .. I left it somewhere...
Title: Re: optimized sources
Post by: _heinz on 04 Apr 2009, 07:18:03 pm
Since yesterday the runtime of ap5.03 (ap_5.03r112_SSE3.exe) has decreased from 15 hours to 10 hours.
In all these shorter WUs I now see: repetitive pulses: 30, percent blanked: 0.00

It looks like the WU finishes early in this case.
Hmmm... did anything happen on the server side?
On 4th April I crunched 19 WUs, all of which finished in around 10 hours.
Have a look at my machine: http://setiathome.berkeley.edu/result.php?resultid=1196217010

heinz
Title: Re: optimized sources
Post by: Josef W. Segur on 05 Apr 2009, 01:30:38 am
Yep, "class Astropulse::T_ffa:   total=4.37e+009,   N=1" indicates the repetitive pulse limit was reached in the first short FFA, our mod stops doing FFAs when they aren't needed any more and saves a lot of crunch time.

You got a group of WUs from the ap_28dc08aa 'tape' which had too little noise to get above the remove_radar threshold, but enough to be detected by the repetitive pulse folding. Until there's a radiotelescope on the far side of Luna or some other place away from terrestrial noise, that's going to happen sometimes.
                                                                   Joe
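
The early stop Joe describes can be pictured roughly as below. This is only an illustrative sketch, not the actual Astropulse code: the function `run_ffa_passes` and the idea of passing each FFA as a callable are hypothetical; only the limit of 30 is taken from the "repetitive pulses: 30" line in the result.

```python
REPETITIVE_PULSE_LIMIT = 30  # matches "repetitive pulses: 30" in the reported results


def run_ffa_passes(passes, limit=REPETITIVE_PULSE_LIMIT):
    """Run FFA passes in order and stop as soon as the pulse limit is reached.

    `passes` is a list of callables (hypothetical interface), each returning
    the number of repetitive pulses found by that pass.
    Returns (pulses reported, number of passes actually executed).
    """
    found = 0
    executed = 0
    for ffa in passes:
        found += ffa()
        executed += 1
        if found >= limit:
            # The result is already full; further FFAs cannot add anything.
            break
    return min(found, limit), executed


# A noisy WU: the first short FFA alone already fills the result.
noisy = [lambda: 30, lambda: 12, lambda: 7]
print(run_ffa_passes(noisy))
```

For the noisy case only one pass runs, which corresponds to the "N=1" in the T_ffa line and explains the shorter run time.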
Title: Re: optimized sources
Post by: _heinz on 12 Apr 2009, 08:19:01 am
Joyeuses Paques
Happy Easter....Happy Spring 2009 !!
 to you and yours.........
hugs
Heinz   :)
Title: Re: optimized sources
Post by: _heinz on 12 Apr 2009, 04:19:38 pm
Added a second 8GB FB-DIMM RAM Kit to the V8-Xeon.
Both kits are from Kingston and should be identical, first is assembled in USA, second in China.
My first original FB-DIMM RAM Kit
KVR800D2D4F5K2/8G
8 GB PC2-6400 CL5 ECC 240-Pin-DIMM Kit (2 pcs.)
      Assembled in USA
ELP 9965422-009.A00LF  999
---------------------------------------------------------------------
My second FB-DIMM RAM Kit
KVR800D2D4F5K2/8G
8 GB PC2-6400 CL5 ECC 240-Pin-DIMM Kit (2 pcs.)
ELP          Assembled in China
SS0192S  9965422-009.A01LF   4074
-----------------------------------------------------------------------

SLOT 1 Channel A DIMM 0:  4GB FBDIMM USA 9965422-009.A00LF
SLOT 2 Channel B DIMM 0:  4GB FBDIMM USA 9965422-009.A00LF
SLOT 3 Channel C DIMM 0:  4GB FBDIMM China 9965422-009.A01LF
SLOT 4 Channel D DIMM 0:  4GB FBDIMM China 9965422-009.A01LF
-----------------------------------------------------------------------
I found nothing in the installation documentation about pairing such as (SLOT1 & SLOT3) (SLOT2 & SLOT4).

D5400XS Jumper 23 is BIOS Configuration
1-2 Normal
2-3 Configuration
no Jumper - Recovery
-----------------------------------------------------------
I shut down the machine, switched the power off, set the jumper to 2-3 (Configuration) and put the new FB-DIMMs into SLOT 3 and SLOT 4. Then I switched the power on and started the machine; it went into setup and showed all 4 FB-DIMMs and 16 GB total RAM.
I looked through the entries, all normal, and quit with F10, save: Yes.
The machine then displayed: switch off the power and set the jumper to Normal (1-2).
I did so.
OK.
I switched the power on and started the machine; it booted Vista64. OK.
Vista64 shows in System Information:
OS name   Microsoft® Windows Vista™ Ultimate
Version   6.0.6001 Service Pack 1 Build 6001
Other OS description    Not available
OS manufacturer   Microsoft Corporation
System name   V8-SK01
System manufacturer   INTEL_
System model   D5400XS_
System type   x64-based PC
Processor   Intel(R) Xeon(R) CPU           E5405  @ 2.00GHz, 2398 MHz, 4 core(s), 4 logical processor(s)
Processor   Intel(R) Xeon(R) CPU           E5405  @ 2.00GHz, 2398 MHz, 4 core(s), 4 logical processor(s)
BIOS version/date   Intel Corp. XS54010J.86A.1175.2008.1121.1010, 21.11.2008
SMBIOS version   2.4
...
...
Installed physical memory (RAM)   16.0 GB
Total physical memory   4.00 GB
Available physical memory   5.71 GB
Total virtual memory   16.0 GB
Available virtual memory   13.7 GB
Page file size   8.29 GB
Page file   C:\pagefile.sys
-----------------------------------------------------------------------------------------------------
So we can assume Vista64 knows about the 16 GB of RAM.
Now the problem:
Some other programs installed on Vista still show 8 GB of RAM (VMware Server 2.0; the Sidebar CPU usage shows 29%, the same as when I had 8 GB; BOINC SETI still shows 8 GB; and Everest Ultimate still shows 8 GB).
---------------------------------------------------------------------------------------------------------------
I use EVEREST v5.01.1700 (latest version) --> http://www.lavalys.com/
Everest Ultimate Motherboard --> Memory
Field   Value
Physical memory   
Total   8188 MB
Used   2353 MB
Free   5834 MB
Utilization   29 %

Page file   
Total   16428 MB
Used   2453 MB
Free   13974 MB
Utilization   15 %

Virtual memory   
Total   24616 MB
Used   4807 MB
Free   19809 MB
Utilization   20 %

Page file   
Page file   C:\pagefile.sys
Current size   8488 MB

Physical Address Extension (PAE)   
Supported by the operating system   Yes
Supported by the CPU   Yes
Active   Yes
-----------------------------------------------------------------------------------------------
Field   Value   
Motherboard   
CPU type   2x QuadCore Intel Xeon E5405, 2400 MHz (6 x 400)
Motherboard name   Intel Skulltrail D5400XS  (2 PCI, 4 PCI-E x16, 4 FB-DIMM, Audio, Gigabit LAN)
Motherboard chipset   Intel Seaburg 5400B
System memory   8188 MB  (DDR2-800 Fully Buffered ECC DDR2 SDRAM)
Channel0-DIMM1: Kingston   4 GB DDR2-800 ECC DDR2 SDRAM FB-DIMM  (4-4-4-15 @ 320 MHz)  (5-5-5-18 @ 400 MHz)
Channel1-DIMM1: Kingston   4 GB DDR2-800 ECC DDR2 SDRAM FB-DIMM  (4-4-4-15 @ 320 MHz)  (5-5-5-18 @ 400 MHz)
BIOS type   Intel (11/21/08)
--------------------------------------------------------------------------------------

If I look at the sensors in EVEREST:
FB-DIMM1   80 °C  (176 °F)
FB-DIMM2   75 °C  (167 °F)
FB-DIMM3   64 °C  (147 °F)
FB-DIMM4   62 °C  (144 °F)
it shows the temperatures of all 4 FB-DIMMs.
Hmmm....

If I restart the machine (jumper 23 in Normal position) and then press F2 to look into the BIOS, still only 8 GB are shown.  :'(
Any ideas why the RAM is not shown at full size?

When I set jumper 23 to 2-3 (Configuration), it shows 16 GB again.
Your hints are welcome.

modify: 2 days later
I got all 16 GB running after disabling the overclock.
Now I have overclocked the machine again; it runs at the same speed as before, but the RAM is configured automatically.  ;D

heinz




Title: Re: optimized sources
Post by: _heinz on 13 Apr 2009, 04:53:25 am
19 _heinz 16,803.26 2,089,656
Today I get a RAC of 16803 running our optimized astropulse 5.03.  ;)
modify:
19 _heinz 17,106.68 2,102,178   ;D
Title: Re: optimized sources
Post by: _heinz on 14 Apr 2009, 01:40:47 pm
What's new after running with 16 GB?
 Memory bus properties
 1.  Bus type changed from Dual DDR2 SDRAM to Quad DDR2 SDRAM
 2.  Bus width changed from 128-bit to 256-bit
 3.  Bandwidth changed from 12800 MB/s to 25600 MB/s

North Bridge
Memory Controller
Type: Quad channel 256 bit
Active mode changed from Dual Channel (128 bit) to Quad Channel (256 bit)

Hint: it is recommended to use all 4 memory slots to run in quad channel mode!
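
The two bandwidth figures follow from simple arithmetic: transfers per second times bus width in bytes. The sketch below assumes DDR2-800 (800 million transfers per second, as in the PC2-6400 kits above); the function name is made up. These are theoretical peak values as reported by the tool, not measured throughput.

```python
def peak_bandwidth_mb_s(mega_transfers_per_s, bus_width_bits):
    """Theoretical peak bandwidth in MB/s: transfers per second x bytes per transfer."""
    return mega_transfers_per_s * bus_width_bits // 8


# DDR2-800 = 800 MT/s (assumption taken from the DDR2-800 / PC2-6400 kit labels).
print(peak_bandwidth_mb_s(800, 128))  # dual channel, 128-bit
print(peak_bandwidth_mb_s(800, 256))  # quad channel, 256-bit
```

This gives 12800 MB/s for the 128-bit dual channel mode and 25600 MB/s for the 256-bit quad channel mode, matching the values listed above.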

 ;D

Title: Re: optimized sources
Post by: _heinz on 26 Apr 2009, 05:44:17 pm
My RAC dropped like a rock. (http://www.britta-d.de/images/astropulse/rac_down.jpg)
18th April RAC=17500
26th April RAC=15400
This surprised me, since I have been crunching astropulse 5.03 24/7 the whole time.
Usually I run 12 WUs per day, sometimes a few more, depending on the blanking of the WU. It looks like I am getting a WU series with high blanking now.
My pendings increased from 146,000 up to
Pending credit: 174,653.22
----------------------------------------
Have a look at the statistics (http://boincstats.com/stats/boinc_host_graph.php?pr=bo&id=f92199c6a274d674b15745b604c432e2)
Tonight I will run the cache dry to do some OC experiments and reconfigurations.

heinz
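
Why RAC can fall while the machine crunches around the clock: as far as I know, BOINC computes RAC as an exponentially decaying average of granted credit with a half-life of one week. Credit that is still pending does not count, so when pendings grow, RAC decays toward the lower granted rate. The sketch below is a simplified model under that assumption (constant granted credit per day); the function name and parameters are made up for illustration and this is not BOINC's own code.

```python
def rac_after(rac, granted_per_day, days, half_life_days=7.0):
    """Simplified RAC model: exponential average with a one-week half-life.

    `rac` is the starting RAC, `granted_per_day` the credit actually granted
    per day (pending credit is not included), `days` the elapsed time.
    """
    weight = 0.5 ** (days / half_life_days)
    return rac * weight + granted_per_day * (1.0 - weight)


# With nothing granted for a week, RAC halves.
print(rac_after(17500, 0, 7))

# If granted credit per day equals the RAC, nothing changes.
print(rac_after(17500, 17500, 3))
```

In this model a week in which wingmen validate less than usual is enough to pull the RAC down noticeably, and it recovers once the pending credit is granted.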
Title: Re: optimized sources
Post by: Urs Echternacht on 27 Apr 2009, 04:45:58 pm

What surprises me more is that I saw a similar RAC curve (http://www.echtbaer.de/Images/RAC_P9500_090427.png) on my P9500 about a week ago.
Its RAC went up again over the last week.

Guess you had bad luck with some lazy wingmen.  ;D
Title: Re: optimized sources
Post by: _heinz on 03 May 2009, 12:13:47 pm
21 _heinz 17,539.20 2,429,461
Today number 21 again.
It goes up and down a little, as do my pendings; it depends on my wingmen.
Pending credit: 145,871.70

;-)
Title: Re: optimized sources
Post by: _heinz on 04 May 2009, 03:54:30 am
18 _heinz 18,201.13 2,449,583 GenuineIntel
Intel(R) Xeon(R) CPU E5405 @ 2.00GHz [Intel64 Family 6 Model 23 Stepping 6]
(8 processors) 
today number 18 (http://setiathome.berkeley.edu/top_hosts.php) with a RAC of 18201   ;D

modify: late in the evening
number 16 now
16 _heinz 18,919.80 2,467,167

 ;D
Title: Re: optimized sources
Post by: _heinz on 09 May 2009, 07:13:55 am
9th of may,
19 _heinz 18,964.10 2,548,575
Now nearly 19000 RAC again... thanks to our optimized astropulse app 5.03

modify: late evening
get it
18 _heinz 19,086.22 2,557,348

wow over 19000 RAC now  :o
Title: Re: optimized sources
Post by: _heinz on 10 May 2009, 05:04:36 am
Today number 16 again in the top hosts,
16 _heinz 19,537.79 2,574,871  ;D

Maybe I can get 20000 RAC .....
Title: Re: optimized sources
Post by: _heinz on 16 May 2009, 05:37:54 pm
16 _heinz 19,382.72 2,697,822
number 16 again, but rac is going up and down  ::)

I ran some VMware experiments with several VMs (XP64, NT4 with SFM [Services for Macintosh], W7 RC 64-bit, SUSE 11.0 64-bit).
The best thing about VMware is the flexibility (processors, memory, disk) to create VMs for a specific task.
With the remote client vmware-vmrc.exe you can connect to the VMs over the network from different clients and start applications on the virtual machine. Once the vmrc is connected you can switch between all your running VMs with a click, if permission is set. Really cool stuff. There is a lot to explore and many interesting experiments to start.

heinz


Title: Re: optimized sources
Post by: _heinz on 22 May 2009, 01:57:24 pm
Hi Jason,

would you be interested in having access to one of my VMs?

heinz
Title: Re: optimized sources
Post by: Jason G on 22 May 2009, 02:04:27 pm
Could be useful. How's that work? [tunnelling & vnc?] and what are they running?

[I see..., List below  ;) looks like a very useful combo for testing installers and other stuff, will have to read about it]
Title: Re: optimized sources
Post by: _heinz on 22 May 2009, 05:24:39 pm
I uploaded the VMware plugin you need to our test project. Install it on one of our 32-bit machines.
Once I have finished managing roles and permissions to give you access, I will write a short readme with further instructions in the next few days.

heinz
Title: Re: optimized sources
Post by: _heinz on 24 May 2009, 03:47:56 am
Hi Jason,
the readme is up there. Once the SETI race is over we can start the experiment. I will PM you when I'm ready to begin.

heinz
Title: Re: optimized sources
Post by: Jason G on 24 May 2009, 04:06:03 am
Cheers Heinz.  Will have a good look a bit later tonight.
Title: Re: optimized sources
Post by: _heinz on 25 May 2009, 03:15:10 am
Hi Jason
The first attempts to reach the VMs (with your account) via the vmrc client from my intranet were successful, so we can hope to start soon.
Today I will go to my neighbour and try to reach the VMs over internet from there.

heinz


Title: Re: optimized sources
Post by: Jason G on 25 May 2009, 04:29:19 am
Yeah, I didn't have a user & pass to enter, but it certainly tried  ;) I wonder if you have the same VM's running all the time, on the same subnet, so I can experiment with some clustering ideas in the future? Boinc's limitations, under most compute intensive applications, could be leveraged (with alternate means... and some minor modification) as advantages for reliability & performance.
Title: Re: optimized sources
Post by: _heinz on 25 May 2009, 05:02:50 pm
Hi Jason,
vmware plugin:
two modified readme files are up now in our testproject
have a look

heinz
Title: Re: optimized sources
Post by: _heinz on 25 May 2009, 05:36:12 pm
Today I will go to my neighbour and try to reach the VMs over internet from there.
I had some problems doing it.

heinz
Title: Re: optimized sources
Post by: Jason G on 25 May 2009, 07:59:15 pm
Hi Heinz,
  Will have to take care of some car issues today first, then some work to do, but will be able to have another look this evening.

Jason
Title: Re: optimized sources
Post by: _heinz on 27 May 2009, 05:08:02 am
Today I will go to my neighbour and try to reach the VMs over internet from there.
I had some problems doing it.
The problems are resolved now; Jason can work on my VMs.  ;D
This gives us a wide test field for solving special tasks in our work.
For many years I suffered from not having the hardware to install and try modern and some exotic operating systems.
The experts know what I mean.
This is solved now.
Greetings to all readers  ;)  of this epic thread..... 56172 views now

heinz





Title: Re: optimized sources
Post by: Gecko_R7 on 27 May 2009, 06:45:27 pm
So, am I late to the party with this?

Me thinks you guys are already on top of this dev kit  :-\

Intel Parallel Studio (http://www.dailytech.com/Intel+Parallel+Studio+Helps+Software+Developers+Go+MultiCore/article15238.htm)
Title: Re: optimized sources
Post by: Jason G on 28 May 2009, 09:02:21 am
Tried the Beta.  Hated the removal of the compiler switches I like, then went back to ICC.  They've got basically the same stuff, repackaged IMO, but targeted to a different target customer.
Title: Re: optimized sources
Post by: Gecko_R7 on 28 May 2009, 02:59:17 pm
Tried the Beta.  Hated the removal of the compiler switches I like, then went back to ICC.  They've got basically the same stuff, repackaged IMO, but targeted to a different target customer.


Lol!  I suspected you'd already played w/ it.
Makes sense re: ICC.
Title: Re: optimized sources
Post by: Jason G on 28 May 2009, 03:21:27 pm
yeah, ICC+TBB really, integrated a bit, so not a bad combo all the same.  The part I found annoying was the implication (probably due to marketing pressure) that if you recompile your program in this environment, then you get a multithreaded program.  Still little/no mention of the processes involved to achieve an effective parallel design, which remains a pencil and paper affair  ::)
Title: Re: optimized sources
Post by: _heinz on 09 Jun 2009, 05:53:33 pm
"VMware Tools" is now running on V8-VM2 openSUSE-11.0-x86_64
To activate VMware Tools, the kernel had to be recompiled, because some VMware Tools modules must be included in the kernel. This was done successfully.  ;D
In the "VMware Infrastructure Web Access" you can see that VMware Tools is running.
We now have the full functionality of VMware Tools in Linux.
The VMware Tools package provides support required for shared folders and for drag and drop operations. Other tools in the package support synchronization of time in the guest operating system with time on the host, automatic grabbing and releasing of the mouse cursor, copying and pasting between guest and host, and improved mouse performance. The SVGA driver provides significantly faster graphics performance.

heinz  ;D
Title: Re: optimized sources
Post by: _heinz on 13 Jun 2009, 09:48:48 am
Astropulse: a short history

It began with astropulse rev 358  (http://www.britta-d.de/images/astropulse/berkeley_rev358.jpg)
After some small difficulties (http://www.britta-d.de/images/astropulse/err_01.jpg) and surprises (http://www.britta-d.de/images/astropulse/err_vector.jpg) we had to check out boinc (http://www.britta-d.de/images/astropulse/checkout_boinc.jpg).
After some work and the introduction of "Visual Studio 8" we got ap 4.35 rev24b54 (http://www.britta-d.de/images/astropulse/ap_4.35_directorystructure.jpg)
Next came some important research into parallel work, such as
bench fft sse (http://www.britta-d.de/images/astropulse/benchfftsse.jpg) and bench fft sse affinity (http://www.britta-d.de/images/astropulse/benchfftsseaffinity.jpg), which show no difference.
astropulse r69 (http://www.britta-d.de/images/astropulse/ap_r69p.jpg) runs parallel with 8*AK_v8_win_x64_SSSE3x
followed by some research with fft serial (http://www.britta-d.de/images/astropulse/128_serial.jpg) and 128K fft serial (http://www.britta-d.de/images/astropulse/128_serial_all.jpg) runs on 1 core.
128K fft parallel (http://www.britta-d.de/images/astropulse/128_parallel.jpg) as you can see it runs on all 8 cores.
astropulse rev78 (http://www.britta-d.de/images/astropulse/ap_r78p.jpg) runs parallel with 8*AK_v8_win_x64SSSE3x
sysinfo (http://www.britta-d.de/images/astropulse/ap_r78pj_sysinfo.jpg) of astropulse rev78pj (parallel job)
test r78 preference job (http://www.britta-d.de/images/astropulse/ap_r78prefj.jpg)
latest free optimized astropulse 5.03 and RAC (http://www.britta-d.de/images/astropulse/rac_down2703_2604.jpg)
astropulse 5.03 on virtual machine VM1 (http://www.britta-d.de/images/astropulse/vm1ap503.jpg) runs in parallel with 8 * ap5.03 on the host V8-Xeon
influence of astropulse on RAC (http://www.britta-d.de/images/astropulse/rac_down2703_2604.jpg)
astropulse 5.03 highest RAC (http://www.britta-d.de/images/astropulse/rac_down_0905_0806.jpg)
and since astropulse has no work RAC drops like a rock (http://www.britta-d.de/images/astropulse/rac_down_1405_1306.jpg)

now astropulse 5.05 is in beta status...
Title: Re: optimized sources
Post by: _heinz on 08 Sep 2009, 03:24:58 pm
Greetings to all readers of this epic thread  ;D , more than 60600 have had a look here.  :o
After the summer break I'm back now with a fully equipped V8-VM1 for final testing.
OS: Windows64 Professional 64Bit
MS Visual Studio Professional
Intel Parallel Studio
NVIDIA GPU Computing SDK(emu mode)
ATI Stream SDK with OpenCL

To show how all this works I compiled some samples and attached the files.
Now you can have a look at it.

 

[attachment deleted by admin]
Title: Re: optimized sources
Post by: _heinz on 10 Sep 2009, 04:41:15 pm
Today I had an outage of 6 hours.
After updating the Catalyst Center and installing several updates from Microsoft, the machine shut down and did not restart. The POST display on the board showed a constant 29 and nothing more happened; POST no longer ran to completion.
So I dismantled the server, took out the memory and cleaned everything. When I had finished and everything was mounted again, I switched the power on. It hung at 29 again. Then I pressed reset, and this time POST finished. I pressed F2 to enter BIOS setup, switched off the overclocking and pressed F10 to save; after that the machine came up and loaded the OS, Vista64 Ultimate.
Then I updated the BIOS to 1353 from within the OS.
System BIOS date   07/24/09
DMI BIOS Version   XS54010J.86A.1353.2009.0724.1139
After the restart the machine loaded the OS happily.   ;D
I updated the sound and network drivers, shut down and switched the overclocking on again in the BIOS.
CPU clock   2398.3 MHz  (Original: 2000 MHz, overclock: 20%)
CPU FSB   399.7 MHz  (Original: 333 MHz, overclock: 20%)
The machine starts without any problems now.
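
The overclock percentages quoted above can be checked from the measured and stock clocks. A minimal Python sketch; the helper name overclock_percent is made up for illustration, and the numbers are the ones from this post:

```python
def overclock_percent(measured_mhz: float, stock_mhz: float) -> float:
    """Return how far the measured clock lies above the stock clock, in percent."""
    return (measured_mhz / stock_mhz - 1.0) * 100.0


# Values copied from the post above.
cpu_oc = overclock_percent(2398.3, 2000.0)
fsb_oc = overclock_percent(399.7, 333.0)

# CPU clock divided by FSB clock gives the CPU multiplier.
multiplier = 2398.3 / 399.7

print(f"CPU overclock: {cpu_oc:.1f} %")
print(f"FSB overclock: {fsb_oc:.1f} %")
print(f"CPU multiplier: {multiplier:.2f}")
```

Both values round to 20%, and the multiplier stays at about 6, so only the FSB was raised.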
Now Catalyst 09.8 (from Sapphire, not from AMD) is installed and working.

Every day brings surprises with updates,  :o I'm happy that no hardware was damaged  ;D



Title: Re: optimized sources
Post by: Raistmer on 10 Sep 2009, 04:53:27 pm
Better to downgrade Catalyst to 9.2;
it makes more sense for now.
Title: Re: optimized sources
Post by: _heinz on 25 Sep 2009, 06:03:05 pm
Now I'm running ati 9.2 & boinc 6.10.7
so far this works with the milkyway project.
elapsed 2:13 for each wu.
Some additional tests for our seti project can now be done.
In preparation for this I gave Jason access to the V8.

Title: Re: optimized sources
Post by: _heinz on 26 Sep 2009, 05:12:46 pm
Today I updated the ati-stream-SDK to v2.0-beta3  09/23/2009
on V8-VM1 and V8-SK01
some compiler and code generation issues are fixed
64-bit atomic built-ins are now recognized
*-*-*-*-*-*
Title: Re: optimized sources
Post by: _heinz on 10 Oct 2009, 04:50:18 pm
Raistmer announced:
now in beta testing:
have a look -->astropulse hybrid-cpu-ati-gpu-build-for-gpus-with-double-precisions-support (http://lunatics.kwsn.net/18-astropulse-testing/astropulse-hybrid-cpu-ati-gpu-build-for-gpus-with-double-precision-support.msg22408.html;topicseen#msg22408)
---------------------------------------------------------------------------------------------------------------------------
This picture 8akv8_2collatz_2brook_r278_nodouble (http://www.britta-d.de/images/astropulse/8akv8_2collatz_2brook_r278_nodouble.jpg)   :o
shows how  4 gpu apps are running on a single HD3870 together with 8*akv8 on my v8-Xeon



Title: Re: optimized sources
Post by: _heinz on 11 Oct 2009, 06:44:17 pm
Intel Parallel Composer
07 Oct 2009   Update 2 Revised
-----------------------------------------------
now installed on V8-vm1 and tested with the NQueens sample:
1>------ Rebuild All started: Project: Step4-Parallel-Tune, Configuration: Release x64 ------
2>------ Rebuild All started: Project: Step2-3-Parallel-Check, Configuration: Release x64 ------
2>Deleting intermediate files and output files for project 'Step2-3-Parallel-Check', configuration 'Release|x64'.
1>Deleting intermediate files and output files for project 'Step4-Parallel-Tune', configuration 'Release|x64'.
1>Compiling with Intel(R) C++ Compiler 11.1.068 [Intel(R) 64]... (Intel C++ Environment)
2>Compiling with Intel(R) C++ Compiler 11.1.068 [Intel(R) 64]... (Intel C++ Environment)
1>nq-parallelfinal.cpp
2>nq-parallelstart.cpp
2>Linking... (Intel C++ Environment)
1>Linking... (Intel C++ Environment)
1>xilink: executing 'link'
2>xilink: executing 'link'
1>Embedding manifest... (Microsoft VC++ Environment)
2>Embedding manifest... (Microsoft VC++ Environment)
2>Build log was saved at "file://C:\Documents and Settings\heinz\My Documents\Visual Studio 2008\Projects\NQueens-ParallelStudio\Step2-3-Parallel-Check\x64\Release\BuildLog.htm"
2>Step2-3-Parallel-Check - 0 error(s), 0 warning(s)
1>Build log was saved at "file://C:\Documents and Settings\heinz\My Documents\Visual Studio 2008\Projects\NQueens-ParallelStudio\Step4-Parallel-Tune\x64\Release\BuildLog.htm"
1>Step4-Parallel-Tune - 0 error(s), 0 warning(s)
3>------ Rebuild All started: Project: Step1-Serial-Hotspot, Configuration: Release x64 ------
3>Deleting intermediate files and output files for project 'Step1-Serial-Hotspot', configuration 'Release|x64'.
3>Compiling with Intel(R) C++ Compiler 11.1.068 [Intel(R) 64]... (Intel C++ Environment)
3>nq-serial.cpp
3>Linking... (Intel C++ Environment)
3>xilink: executing 'link'
3>Embedding manifest... (Microsoft VC++ Environment)
3>Build log was saved at "file://C:\Documents and Settings\heinz\My Documents\Visual Studio 2008\Projects\NQueens-ParallelStudio\Step1-Serial-Hotspot\x64\Release\BuildLog.htm"
3>Step1-Serial-Hotspot - 0 error(s), 0 warning(s)
========== Rebuild All: 3 succeeded, 0 failed, 0 skipped ==========
 :)
Title: Re: optimized sources
Post by: _heinz on 15 Oct 2009, 03:54:15 pm
Got my W7 Ultimate yesterday  ;D
Title: Re: optimized sources
Post by: Urs Echternacht on 15 Oct 2009, 04:07:38 pm
Got my W7 Ultimate yesterday  ;D
Have fun !  :D
Title: Re: optimized sources
Post by: _heinz on 19 Oct 2009, 04:09:32 pm
This morning I found the V8-Xeon down..
The 1000W AXP Power supply died....phhhh  :'(
Fighting now with RMA... 
Title: Re: optimized sources
Post by: @Home on 22 Oct 2009, 06:39:02 am
Downgraded to Catalyst 9.2.
Downloaded ap_5.05_win_x86_SSE3_BROOK_r280 and would like to start testing.
Does someone have a working app_info.xml for this purpose?
Furthermore, I read something about a BOINC Beta account. Do I need that? Is it different from my 'normal' BOINC account?
Title: Re: optimized sources
Post by: Raistmer on 22 Oct 2009, 09:23:33 am
Downgraded to Catalyst 9.2.
Downloaded ap_5.05_win_x86_SSE3_BROOK_r280 and would like to start testing.
Does someone have a working app_info.xml for this purpose?
Furthermore, I read something about a BOINC Beta account. Do I need that? Is it different from my 'normal' BOINC account?

All discussion of this app is still in the beta area.
Title: Re: optimized sources
Post by: @Home on 22 Oct 2009, 09:44:40 am
OK, I'll start looking for my answers in the Beta section...
Title: Re: optimized sources
Post by: _heinz on 28 Oct 2009, 08:23:47 pm
This morning I found the V8-Xeon down..
The 1000W AXP Power supply died....phhhh  :'(
Fighting now with RMA... 
I'm still waiting for the power supply  :'(
Title: Re: optimized sources
Post by: _heinz on 07 Nov 2009, 07:39:16 pm
The new power supply has arrived. This time I got a SuperFlower Crystal Plus 1000W.
My old one, the AXP 1000W (http://www.britta-d.de/bilder/server/page14.htm), was not available, so I got the Superflower (http://www.super-flower.de/index.php?id=46).
Tomorrow we will see if this works with my D5400XS.
Title: Re: optimized sources
Post by: _heinz on 07 Nov 2009, 08:02:46 pm
Time to have a look at my new toy, the Acer Aspire REVO R3600 N230
CPU type GenuineIntel
Intel(R) Atom(TM) CPU 230 @ 1.60GHz [x86 Family 6 Model 28 Stepping 2]
Number of processors 2
Coprocessors NVIDIA ION (256MB) driver: 19107
Operating System Microsoft Windows Vista
Home Premium x86 Edition, Service Pack 2, (06.00.6002.00)
Memory 1790.45 MB
Cache 512 KB
Swap space 3831.9 MB
Total disk space 53.74 GB
Free Disk Space 26.28 GB
Measured floating point speed 678.2 million ops/sec
Measured integer speed 1630.76 million ops/sec
Average upload rate 2.29 KB/sec
Average download rate 35.76 KB/sec
Average turnaround time 1 days
Maximum daily WU quota per CPU 5000/day
~+~+~+~+~+~+~+~+~+~+~+~
BOINC shows:
07.11.2009 09:22:17      Processor: 2 GenuineIntel Intel(R) Atom(TM) CPU  230   @ 1.60GHz [x86 Family 6 Model 28 Stepping 2]
07.11.2009 09:22:17      Processor: 512.00 KB cache
07.11.2009 09:22:17      Processor features: fpu tsc pae nx sse sse2 pni mmx
07.11.2009 09:22:18      OS: Microsoft Windows Vista: Home Premium x86 Edition, Service Pack 2, (06.00.6002.00)
07.11.2009 09:22:18      Memory: 1.75 GB physical, 3.74 GB virtual
07.11.2009 09:22:18      Disk: 53.74 GB total, 26.58 GB free
07.11.2009 09:22:18      Local time is UTC +1 hours
07.11.2009 09:22:22      NVIDIA GPU 0: ION (driver version 19107, CUDA version 2030, compute capability 1.1, 256MB, 35 GFLOPS peak)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
the cpu also has SSE3 and SSSE3, as looked up with Everest
Field   Value
CPU Properties
CPU Type   Intel Atom 230, 1600 MHz (12 x 133)
CPU Alias   Diamondville-SC
CPU Stepping   C0
Instruction Set   x86, x86-64, MMX, SSE, SSE2, SSE3, SSSE3
Original Clock   1600 MHz
Min / Max CPU Multiplier   6x / 12x
Engineering Sample   No
L1 Code Cache   32 KB
L1 Data Cache   24 KB
L2 Cache   512 KB  (On-Die, ECC, ASC, Full-Speed)

Multi CPU
Motherboard ID   nVidia MCP79
CPU #1   Intel(R) Atom(TM) CPU 230 @ 1.60GHz, 1600 MHz
CPU #2   Intel(R) Atom(TM) CPU 230 @ 1.60GHz, 1600 MHz

CPU Physical Info
Package Type   437 Ball FC-BGA
Package Size   2.2 cm x 2.2 cm
Transistors   47 million
Process Technology   45 nm, CMOS, Cu, High-K + Metal Gate
Die Size   25 mm2
Typical Power   4 W @ 1.60 GHz

CPU Manufacturer
Company Name   Intel Corporation
Product Information   http://www.intel.com/products/processor

CPU Utilization
CPU 1 / HTT Unit 1   100 %
CPU 1 / HTT Unit 2   100 %
------------------------------------------------------------
it runs milkyway on the cpu at 100% load
Title: Re: optimized sources
Post by: _heinz on 10 Nov 2009, 08:15:49 pm
first wu running on the ION is up now:
Name 06ap07ah.419.444220.12.10.60_1
Workunit 528890775
Created 10 Nov 2009 10:41:42 UTC
Sent 10 Nov 2009 11:58:23 UTC
Received 11 Nov 2009 0:53:12 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x0)
Computer ID 5173120
Report deadline 27 Dec 2009 11:00:02 UTC
Run time 16123.8202
CPU time 696.8253
stderr out <core_client_version>6.10.17</core_client_version>
<![CDATA[
<stderr_txt>
setiathome_CUDA: Found 1 CUDA device(s):
   Device 1 : ION
           totalGlobalMem = 268435456
           sharedMemPerBlock = 16384
           regsPerBlock = 8192
           warpSize = 32
           memPitch = 262144
           maxThreadsPerBlock = 512
           clockRate = 1100000
           totalConstMem = 65536
           major = 1
           minor = 1
           textureAlignment = 256
           deviceOverlap = 0
           multiProcessorCount = 2
setiathome_CUDA: CUDA Device 1 specified, checking...
   Device 1: ION is okay
SETI@home using CUDA accelerated device ION
V12 modification by Raistmer
Priority of worker thread rised successfully
Priority of process adjusted successfully
Total GPU memory 268435456    free GPU memory 225898496
setiathome_enhanced 6.02 Visual Studio/Microsoft C++

Build features: Non-graphics   CUDA    VLAR autokill enabled    FFTW   USE_SSE   x86   
     CPUID:          Intel(R) Atom(TM) CPU  230   @ 1.60GHz

     Cache: L1=64K L2=512K

CPU features: FPU TSC PAE CMPXCHG8B APIC SYSENTER MTRR CMOV/CCMP MMX FXSAVE/FXRSTOR SSE SSE2 HT SSE3
libboinc: 6.3.22

Work Unit Info:
...............
WU true angle range is :  0.419009
After app init: total GPU memory 268435456    free GPU memory 132575232

Flopcounter: 41353212868636.937000

Spike count:    4
Pulse count:    1
Triplet count:  0
Gaussian count: 1

Wall-clock time elapsed since last restart: 16115.8 seconds
class T_FFT<0>:   total=6.21e+006,   N=146818,   <>=42 (4.20e+001),   min=0 (0.00e+000)
class T_FFT<8>:   total=8.74e+002,   N=25,   <>=34 (3.40e+001),   min=31 (3.10e+001)
class T_FFT<16>:   total=1.21e+003,   N=51,   <>=23 (2.30e+001),   min=15 (1.50e+001)
class T_FFT<64>:   total=2.23e+003,   N=201,   <>=11 (1.10e+001),   min=0 (0.00e+000)
class T_FFT<256>:   total=2.18e+002,   N=807,   <>=0 (0.00e+000),   min=0 (0.00e+000)
class T_FFT<512>:   total=2.41e+004,   N=1613,   <>=14 (1.40e+001),   min=0 (0.00e+000)
class T_FFT<1024>:   total=4.15e+004,   N=3227,   <>=12 (1.20e+001),   min=0 (0.00e+000)
class T_FFT<2048>:   total=1.63e+005,   N=6455,   <>=25 (2.50e+001),   min=15 (1.50e+001)
class T_FFT<4096>:   total=3.70e+005,   N=12911,   <>=28 (2.80e+001),   min=15 (1.50e+001)
class T_FFT<8192>:   total=8.38e+005,   N=25821,   <>=32 (3.20e+001),   min=15 (1.50e+001)
called boinc_finish

</stderr_txt>
]]>
 
Validate state Valid
Claimed credit 113.332442307238
Granted credit 113.332442307238
application version Anonymous platform
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
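
A note on the T_FFT lines in the stderr output above: the "<>" column appears to be the mean per call, i.e. total divided by N and truncated to an integer. The log does not state this (or the unit), so treat it as an assumption; the small Python check below only tests whether the printed numbers are consistent with that reading. The helper name truncated_mean is made up for illustration.

```python
# (fft_size, total, calls, mean_shown_in_log), copied from the T_FFT lines above.
ROWS = [
    (0, 6.21e6, 146818, 42),
    (8, 8.74e2, 25, 34),
    (16, 1.21e3, 51, 23),
    (64, 2.23e3, 201, 11),
    (256, 2.18e2, 807, 0),
    (512, 2.41e4, 1613, 14),
    (1024, 4.15e4, 3227, 12),
    (2048, 1.63e5, 6455, 25),
    (4096, 3.70e5, 12911, 28),
    (8192, 8.38e5, 25821, 32),
]


def truncated_mean(total: float, calls: int) -> int:
    """Assumed meaning of the '<>' column: total / N, cut off to an integer."""
    return int(total / calls)


# Collect every row where the assumption does not reproduce the logged value.
mismatches = [
    (size, shown, truncated_mean(total, calls))
    for size, total, calls, shown in ROWS
    if truncated_mean(total, calls) != shown
]

for size, total, calls, shown in ROWS:
    print(f"T_FFT<{size}>: total/N = {total / calls:.2f}, log shows {shown}")
print("rows that disagree with the assumption:", mismatches)
```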
temps are moderately cool
MB 45-47
CPU 65-69
CPUDiode 80-85
MCP 70-78
Aux 53-57
GPUDiode 65-73
Disk 45-46 degrees Celsius
~~~~~~~~~~~~~~~~
 :o
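
The "35 GFLOPS peak" that BOINC reports for the ION is consistent with the device properties printed in the stderr output above, if one assumes 8 scalar cores per multiprocessor and 2 floating-point operations per core per clock. Those two factors are the usual figures for compute capability 1.x hardware but are not stated in the log, so they are assumptions here. The same sketch converts the memory figures from the log into MiB.

```python
# Device properties copied from the stderr output above.
clock_rate_khz = 1_100_000      # clockRate (CUDA reports this in kHz)
multiprocessors = 2             # multiProcessorCount
total_mem_bytes = 268_435_456   # totalGlobalMem
free_before_init = 225_898_496  # "free GPU memory" at start
free_after_init = 132_575_232   # "free GPU memory" after app init

# Assumptions, not taken from the log:
CORES_PER_MULTIPROCESSOR = 8    # typical for compute capability 1.x
FLOPS_PER_CORE_PER_CLOCK = 2    # one multiply-add counted as two operations

peak_gflops = (
    clock_rate_khz * 1e3
    * multiprocessors
    * CORES_PER_MULTIPROCESSOR
    * FLOPS_PER_CORE_PER_CLOCK
    / 1e9
)

MIB = 1024 * 1024
total_mib = total_mem_bytes / MIB
app_mib = (free_before_init - free_after_init) / MIB

print(f"estimated peak: {peak_gflops:.1f} GFLOPS")
print(f"total GPU memory: {total_mib:.0f} MiB")
print(f"memory taken during app init: {app_mib:.0f} MiB")
```

With only 256 MiB on the device, the memory taken during init leaves little room for anything else that uses the GPU.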
Title: Re: optimized sources
Post by: _heinz on 10 Nov 2009, 08:20:54 pm
The new power supply has arrived. This time I got a SuperFlower Crystal Plus 1000W.
My old one, the AXP 1000W (http://www.britta-d.de/bilder/server/page14.htm), was not available, so I got the Superflower (http://www.super-flower.de/index.php?id=46).
Tomorrow we will see if this works with my D5400XS.
Good news,
V8-Xeon is up now.
It runs at 100% full load again.
 ;D
Title: Re: optimized sources
Post by: _heinz on 11 Nov 2009, 03:02:43 pm
another test of the ION platform:
running a collatz 2.03 cuda23 wu
~~~~~~~~~~~~~~~~~
Name collatz_1257918720_11237_1
Workunit 2761219
Created 11 Nov 2009 9:20:23 UTC
Sent 11 Nov 2009 9:24:43 UTC
Received 11 Nov 2009 19:50:16 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x0)
Computer ID 8007
Report deadline 25 Nov 2009 9:24:43 UTC
Run time 31843.605201
CPU time 355.105
stderr out <core_client_version>6.10.17</core_client_version>
<![CDATA[
<stderr_txt>
Running Collatz Conjecture (3x+1) CUDA GPU application v2.01
   based on version 1.2 by Gipsel
instructed by BOINC client to use device 0
Reading input file ... done.
Checking 206158430208 numbers starting with 2361500352612221561192
No checkpoint data found.
Running Collatz Conjecture (3x+1) CUDA GPU application v2.01
   based on version 1.2 by Gipsel
instructed by BOINC client to use device 0
Reading input file ... done.
Checking 206158430208 numbers starting with 2361500352612221561192
Resuming from checkpoint ... done
Running Collatz Conjecture (3x+1) CUDA GPU application v2.01
   based on version 1.2 by Gipsel
instructed by BOINC client to use device 0
Reading input file ... done.
Checking 206158430208 numbers starting with 2361500352612221561192
Resuming from checkpoint ... done
Running Collatz Conjecture (3x+1) CUDA GPU application v2.01
   based on version 1.2 by Gipsel
instructed by BOINC client to use device 0
Reading input file ... done.
Checking 206158430208 numbers starting with 2361500352612221561192
Resuming from checkpoint ... done
Running Collatz Conjecture (3x+1) CUDA GPU application v2.01
   based on version 1.2 by Gipsel
instructed by BOINC client to use device 0
Reading input file ... done.
Checking 206158430208 numbers starting with 2361500352612221561192
Resuming from checkpoint ... done
needed 1630 steps for 2361500352664408774747
104790454282383 total executed steps for 206158430208 numbers

WU completed.
GPU time: 7793.85 seconds
Elapsed time: 7817.19
called boinc_finish

</stderr_txt>
]]>
 
Validate state Initial
Claimed credit 0.52424516813916
Granted credit 0
application version collatz v2.03 (cuda23)
~~~~~~~~~~~~~~~~~~~~~~~~~
it shows it works.
not fast, but it does it.
 ;)
Title: Re: optimized sources
Post by: Claggy on 11 Nov 2009, 04:17:48 pm
That's still faster than my laptop's 128Mb 8400M GS,  :'(

Claggy
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Name collatz_1257464231_21478_0
Workunit 2506649
Created 6 Nov 2009 13:23:30 UTC
Sent 6 Nov 2009 13:55:21 UTC
Received 11 Nov 2009 1:22:01 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x0)
Computer ID 4183
Report deadline 20 Nov 2009 13:55:21 UTC
Run time 37431.816
CPU time 195.1105
stderr out <core_client_version>6.10.18</core_client_version>
<![CDATA[
<stderr_txt>
Running Collatz Conjecture (3x+1) CUDA GPU application v2.01
   based on version 1.2 by Gipsel
instructed by BOINC client to use device 0
Reading input file ... done.
Checking 206158430208 numbers starting with 2361447870860643510632
No checkpoint data found.
needed 1661 steps for 2361447870870191297023
107168071604666 total executed steps for 206158430208 numbers

WU completed.
GPU time: 37416.6 seconds
Elapsed time: 37429
called boinc_finish

</stderr_txt>
]]>
 
Validate state Valid
Claimed credit 0.761649606321737
Granted credit 733.066365723493
application version collatz v2.03 (cuda23)
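
Both collatz results above check the same count of numbers (206158430208), so the two GPUs can be compared directly. A rough Python sketch using the "Run time" fields; this is an assumption-laden comparison, because the ION task was restarted several times and its "GPU time" line probably covers only the last session, while the run time may include pauses.

```python
NUMBERS_PER_WU = 206_158_430_208  # "Checking ... numbers" in both logs

# Run time and total executed steps copied from the two result pages above.
results = {
    "ION (heinz)": {"run_time_s": 31843.605201, "total_steps": 104_790_454_282_383},
    "8400M GS (Claggy)": {"run_time_s": 37431.816, "total_steps": 107_168_071_604_666},
}

for name, r in results.items():
    # Numbers checked per second of run time, and average steps per number.
    r["numbers_per_s"] = NUMBERS_PER_WU / r["run_time_s"]
    r["mean_steps"] = r["total_steps"] / NUMBERS_PER_WU
    print(f"{name}: {r['numbers_per_s'] / 1e6:.2f} million numbers/s, "
          f"{r['mean_steps']:.1f} steps per number on average")

speedup = (results["8400M GS (Claggy)"]["run_time_s"]
           / results["ION (heinz)"]["run_time_s"])
print(f"ION is about {speedup:.2f}x faster by run time")
```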
Title: Re: optimized sources
Post by: _heinz on 12 Nov 2009, 03:59:49 am
to explore the ION platform, the next collatz is up.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Name collatz_1257918720_12864_0
Workunit 2762846
Created 11 Nov 2009 9:51:19 UTC
Sent 11 Nov 2009 9:55:43 UTC
Received 12 Nov 2009 7:53:34 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x0)
Computer ID 8007
Report deadline 25 Nov 2009 9:55:43 UTC
Run time 30484.7952
CPU time 268.8209
stderr out <core_client_version>6.10.17</core_client_version>
<![CDATA[
<stderr_txt>
Running Collatz Conjecture (3x+1) CUDA GPU application v2.01
   based on version 1.2 by Gipsel
instructed by BOINC client to use device 0
Reading input file ... done.
Checking 206158430208 numbers starting with 2361500688031987509608
No checkpoint data found.
Running Collatz Conjecture (3x+1) CUDA GPU application v2.01
   based on version 1.2 by Gipsel
instructed by BOINC client to use device 0
Reading input file ... done.
Checking 206158430208 numbers starting with 2361500688031987509608
Resuming from checkpoint ... done
Running Collatz Conjecture (3x+1) CUDA GPU application v2.01
   based on version 1.2 by Gipsel
instructed by BOINC client to use device 0
Reading input file ... done.
Checking 206158430208 numbers starting with 2361500688031987509608
Resuming from checkpoint ... done
needed 1630 steps for 2361500688074235203339
108796353080243 total executed steps for 206158430208 numbers

WU completed.
GPU time: 25256.2 seconds
Elapsed time: 25284.1
called boinc_finish

</stderr_txt>
]]>
 
Validate state Initial
Claimed credit 0.396863062811901
Granted credit 0
application version collatz v2.03 (cuda23)
~~~~~~~~~~~~~~~~~~~~~~~~~
It runs like a charm, no problems on the machine.
no heat, no noise, you can easily work on the machine, no problem at all.
 :)
Title: Re: optimized sources
Post by: Gecko_R7 on 13 Nov 2009, 02:32:38 pm
Intel(R) Atom(TM) CPU 230 @ 1.60GHz [x86 Family 6 Model 28 Stepping 2]
Number of processors 2
Coprocessors NVIDIA ION (256MB) driver: 19107
Operating System Microsoft Windows Vista
Home Premium x86 Edition, Service Pack 2, (06.00.6002.00)
Memory 1790.45 MB
Cache 512 KB
Swap space 3831.9 MB
Total disk space 53.74 GB
Free Disk Space 26.28 GB
Measured floating point speed  678.2 million ops/sec
Measured integer speed       1630.76 million ops/sec

Average credit   202.93
CPU type   Power Macintosh
Power Macintosh [Power Macintosh Model PowerMac3,6] [AltiVec]
Number of processors   1
Operating System   Darwin
8.11.0
BOINC client version   6.6.36
Memory   1536 MB
Cache   976.56 KB
Measured floating point speed   802.04 million ops/sec
Measured integer speed          2505.06 million ops/sec

This is a PPC G4 at 1.25 GHz.
RAC w/ MB v8 app is normally @ 235

Interesting to compare the reported BOINC benchs between this & Atom.
Though, I think Atom CPU will out-produce the G4 CPU w/ better incarnation of current -doze MB app vs. the 2 year old PPC v8 version.
Would also be great to see what a 30 day diet of Cuda processing produces.

Good things come in small packages.  ;D
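
For the benchmark comparison suggested above, the ratios between the two hosts can be worked out directly from the reported BOINC figures. A minimal Python sketch; it compares only the synthetic benchmark numbers quoted in this post and says nothing about real application throughput.

```python
# BOINC benchmark figures (million ops/sec) copied from the post above.
atom_230 = {"float": 678.2, "int": 1630.76}
ppc_g4 = {"float": 802.04, "int": 2505.06}

# Ratio > 1 means the G4 scores higher than the Atom on that benchmark.
float_ratio = ppc_g4["float"] / atom_230["float"]
int_ratio = ppc_g4["int"] / atom_230["int"]

print(f"G4 / Atom floating point: {float_ratio:.2f}")
print(f"G4 / Atom integer:        {int_ratio:.2f}")
```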

Title: Re: optimized sources
Post by: _heinz on 20 Nov 2009, 04:59:51 pm
I have now started with the seti cuda app on the ION
if someone wants to look it up, the hostid is 5173120 (http://setiathome.berkeley.edu/results.php?hostid=5173120)
and here are the statistics (http://boincstats.com/stats/boinc_host_graph.php?pr=bo&id=2c331715ddea0590c336066226571b03)
an astropulse wu 531699270 (http://setiathome.berkeley.edu/workunit.php?wuid=531699270) just came in and runs on the cpu in parallel with the cuda app.
edit:
some days later: the astropulse task finished after ~78 hours
if you take a look at my wingman
(Intel(R) Core(TM)2 Quad CPU Q8200 @ 2.33GHz) 203,739.83 sec
(Intel(R) Atom(TM) CPU 230 @ 1.60GHz )             279,898.58 sec
this is a reasonable value for the small cpu
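
The two run times can be put side by side with a few lines of Python. This is only arithmetic on the figures quoted above; it assumes both times are for the same workunit and ignores that the two hosts may have run different application builds.

```python
# Run times in seconds, copied from the post above.
q8200_s = 203_739.83   # wingman, Core 2 Quad Q8200 @ 2.33 GHz
atom_s = 279_898.58    # Atom 230 @ 1.60 GHz

atom_hours = atom_s / 3600.0
slowdown = atom_s / q8200_s          # how much longer the Atom took
clock_ratio = 2.33 / 1.60            # nominal clock advantage of the Q8200

print(f"Atom run time: {atom_hours:.1f} hours")
print(f"Atom took {slowdown:.2f}x as long as the Q8200")
print(f"nominal clock ratio Q8200/Atom: {clock_ratio:.2f}")
```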
Title: Re: optimized sources
Post by: _heinz on 22 Nov 2009, 07:47:15 pm
20 active tasks (http://www.britta-d.de/images/astropulse/20_active_tasks.jpg) under BOINC on the V8-Xeon
-----------------------------------------------------------
8 x akv8
11 x astropulse_BROOK_R278_nodouble
1 x collatz64_cal_2.06
-----------------------------------------------------------
V8-Xeon still has one graphics adapter (RV670)
Title: Re: optimized sources
Post by: _heinz on 26 Nov 2009, 05:24:52 pm
no work from seti on all my machines now.
running collatz and milkyway...
time to compile something for the ION platform  :)
Title: Re: optimized sources
Post by: _heinz on 01 Dec 2009, 04:50:24 pm
new NVIDIA driver 19562 CUDA version 3000 is now available for download...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
01.12.2009 22:29:27      NVIDIA GPU 0: ION (driver version 19562, CUDA version 3000, compute capability 1.1, 256MB, 35 GFLOPS peak)

this answers a lot of questions.
Title: Re: optimized sources
Post by: _heinz on 09 Dec 2009, 11:30:18 am
ION,
today I found that some errors happened in the cuda app.
have a look at 1440732028 (http://setiathome.berkeley.edu/result.php?resultid=1440732028)
- exit code -1073741819 (0xc0000005)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Work Unit Info:
...............
WU true angle range is :  0.408909
After app init: total GPU memory 268435456    free GPU memory 126873600
Cuda error 'cudaMallocArray( &dev_gauss_dof_lcgf_cache' in file 'd:/BoincSeti_Prog/sinbad_repositories/LunaticsUnited/SETI_CUDA_MB_exp/client/cuda/cudaAcc_gaussfit.cu' in line 417 : out of memory.
setiathome_CUDA: CUDA runtime ERROR in device memory allocation (Step 2 of 3). Falling back to HOST CPU processing...


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x726F662F read attempt to address 0x726F662F

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
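
The negative exit code and the hex code in the exception record above are the same 32-bit value, once seen as signed and once as unsigned. The short Python sketch below shows the conversion. It also decodes the faulting address as little-endian bytes; that the result is printable text may hint at a pointer overwritten with string data, but that is a guess and not something the log confirms.

```python
import struct

exit_code = -1073741819          # as shown by BOINC
fault_address = 0x726F662F       # from the exception record

# Reinterpret the signed 32-bit exit code as unsigned.
unsigned_code = exit_code & 0xFFFFFFFF

# Lay the address out as it would sit in memory on x86 (little-endian).
address_bytes = struct.pack("<I", fault_address)
address_text = address_bytes.decode("ascii")

print(f"exit code as unsigned hex: {unsigned_code:#010x}")
print(f"fault address bytes in memory order: {address_bytes!r}")
```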

10 others errored out by VLAR autokill
I got 7 new wu's, hope they will run
Title: Re: optimized sources
Post by: _heinz on 09 Dec 2009, 06:07:15 pm
V8-Xeon
BOINC 6.10.18
I have some problems with the latest BOINC version. If I click on Tasks, then on Show Active Tasks and back again, BOINC stops responding and the window is closed by the OS. This happened twice today.
running on the machine:
8 x ak_v8b
2 x Milkyway
---------------------------------------------
Another temperature problem happened twice again. The FB-DIMMs show temps of 98 degrees Celsius and the FB-DIMM fan (OCZ) runs at more than 4000 rpm and makes a lot of noise. (Looks like I need a new, better fan.)
Normally the FB-DIMMs have temps of ~76 - 86 degrees Celsius.
First I removed 4 slot covers at the back for better air circulation; this way the hot air of the passively cooled graphics adapter (RV670) can now go directly out the back and no longer has to stream up to the additional rear fan.
I hope this helps a bit.
Additionally I opened the window; outside it is 5 degrees at night.
Now temps are normal.
Field   Value
Sensor Properties
Sensor Type   Dual ADT7490  (SMBus 2Ch, 2Eh)
GPU Sensor Type   Diode  (ATI-Diode)
Motherboard Name   Intel D5400XS

Temperatures
CPU1   53 °C  (127 °F)
CPU2   61 °C  (142 °F)
CPU 1 / Core 1   50 °C  (122 °F)
CPU 1 / Core 2   37 °C  (99 °F)
CPU 1 / Core 3   47 °C  (117 °F)
CPU 1 / Core 4   46 °C  (115 °F)
CPU 2 / Core 1   44 °C  (111 °F)
CPU 2 / Core 2   41 °C  (106 °F)
CPU 2 / Core 3   43 °C  (109 °F)
CPU 2 / Core 4   43 °C  (109 °F)
DIMM   78 °C  (172 °F)
GPU Diode   71 °C  (160 °F)
Temperature 1   52 °C  (126 °F)
Temperature 2   53 °C  (127 °F)
Temperature 3   53 °C  (127 °F)
FB-DIMM1   86 °C  (187 °F)
FB-DIMM2   83 °C  (181 °F)
FB-DIMM3   78 °C  (172 °F)
FB-DIMM4   76 °C  (169 °F)
Seagate ST31000340NS   42 °C  (108 °F)
Seagate ST31000340NS   42 °C  (108 °F)
Seagate ST31000340NS   44 °C  (111 °F)

Cooling Fans
CPU1   655 RPM
CPU2   649 RPM
North Bridge   4795 RPM
South Bridge   4087 RPM
DIMM   2963 RPM
Aux   983 RPM
Graphics Processor (GPU)   100%

Voltage Values
CPU1 Core   1.11 V
CPU2 Core   1.11 V
+1.5 V   1.54 V
+3.3 V   3.35 V
+5 V   5.13 V
+12 V   12.31 V
FSB VTT   1.21 V
North Bridge Core   1.25 V
DIMM   1.80 V
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
CPU 100% and GPU 90% full power

 :)
Title: Re: optimized sources
Post by: _heinz on 10 Dec 2009, 10:07:07 am
after updating Everest Ultimate to the latest beta version 5.30.1965, the cpu shows the following:
Field   Value
Processor Properties
Manufacturer   Intel(R) Corporation
Version   Intel(R) Xeon(R) CPU E5405 @ 2.00GHz
External Clock   333 MHz
Max Clock   4000 MHz
Current Clock   2000 MHz
Type   Central Processor
Voltage   1.6 V
Status   Enabled
Upgrade   Socket LGA771
Socket Designation   J6PR
---------------------------------------------------------------------------
wuhhh my cpu runs at 50% speed   :o
I had seen this already a year ago, but later it was never shown again.
looks like there is some more to explore,
but first the temp issues must be solved. (I ordered a new FB-DIMM cooler, a Corsair Dominator Airflow Fan)
We know the E5400 can run at 4 GHz, as our friend Francois Piednoel from Intel has shown us.
Title: Re: optimized sources
Post by: _heinz on 11 Dec 2009, 02:40:52 pm
The "Corsair Dominator Airflow Fan" does not fit: it sits too high over the FB-DIMMs and the mounting is not usable for me.
I must construct the mounting myself or look for another one.
An attempt to cool the FB-DIMMs with just the big 22 cm case fans does not have the effect of a small FB-DIMM fan cooler, as the following values show:
Field   Value
Sensor Properties
Sensor Type   Dual ADT7490  (SMBus 2Ch, 2Eh)
GPU Sensor Type   Diode  (ATI-Diode)
Motherboard Name   Intel D5400XS

Temperatures
CPU1   50 °C  (122 °F)
CPU2   62 °C  (144 °F)
CPU 1 / Core 1   48 °C  (118 °F)
CPU 1 / Core 2   36 °C  (97 °F)
CPU 1 / Core 3   44 °C  (111 °F)
CPU 1 / Core 4   43 °C  (109 °F)
CPU 2 / Core 1   40 °C  (104 °F)
CPU 2 / Core 2   37 °C  (99 °F)
CPU 2 / Core 3   41 °C  (106 °F)
CPU 2 / Core 4   41 °C  (106 °F)
DIMM   73 °C  (163 °F)
GPU Diode   70 °C  (158 °F)
Temperature 1   48 °C  (118 °F)
Temperature 2   46 °C  (115 °F)
Temperature 3   49 °C  (120 °F)
FB-DIMM1   95 °C  (203 °F)
FB-DIMM2   102 °C  (216 °F)
FB-DIMM3   94 °C  (201 °F)
FB-DIMM4   86 °C  (187 °F)
Seagate ST31000340NS   34 °C  (93 °F)
Seagate ST31000340NS   33 °C  (91 °F)
Seagate ST31000340NS   35 °C  (95 °F)

Cooling Fans
CPU1   643 RPM
CPU2   635 RPM
North Bridge   3492 RPM
South Bridge   4144 RPM
Aux   585 RPM
Graphics Processor (GPU)   81%

Voltage Values
CPU1 Core   1.10 V
CPU2 Core   1.11 V
+1.5 V   1.54 V
+3.3 V   3.35 V
+5 V   5.13 V
+12 V   12.31 V
FSB VTT   1.21 V
North Bridge Core   1.25 V
DIMM   1.80 V
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2 x collatz
8 x akv8b

102 degrees Celsius on an FB-DIMM is really hot...  :o
when there is no work for all 8 cores (cpu usage 1%) the temps are:
FB-DIMM1   70 °C  (158 °F)
FB-DIMM2   73 °C  (163 °F)
FB-DIMM3   66 °C  (151 °F)
FB-DIMM4   62 °C  (144 °F)
that's a lot, so we do not have much room up to the max of 85 degrees Celsius (this temp is normally seen with 8 x akv8)
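
To put numbers on the "not much room" remark, the idle and load readings from this post can be compared against the 85 degrees Celsius mark mentioned above. A minimal Python sketch; the 85 degree limit is the value named in the post, not a figure from a datasheet.

```python
LIMIT_C = 85  # upper mark named in the post above

# Idle readings (cpu usage 1%) and load readings (case fans only), in Celsius.
idle = {"FB-DIMM1": 70, "FB-DIMM2": 73, "FB-DIMM3": 66, "FB-DIMM4": 62}
load = {"FB-DIMM1": 95, "FB-DIMM2": 102, "FB-DIMM3": 94, "FB-DIMM4": 86}

# Room left below the limit at idle, and how far the load readings exceed it.
headroom_idle = {name: LIMIT_C - temp for name, temp in idle.items()}
overshoot_load = {name: temp - LIMIT_C for name, temp in load.items()}

for name in idle:
    print(f"{name}: {headroom_idle[name]} degrees of headroom at idle, "
          f"{overshoot_load[name]} degrees over the limit under load")
```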
-------------------------------------------------------------------------------------
if you have FB-DIMMs in your server and have some good ideas for cooling them down, let me know.

regards  ;D
Title: Re: optimized sources
Post by: _heinz on 11 Dec 2009, 05:34:19 pm
Would also be great to see what a 30 day diet of Cuda processing produces.

Today a month is over; for 2 days the machine was used to play, install software and show videos...

First seen on 2009-11-08 06:38:13
 
CPU Intel(R) Atom(tm) CPU 230 @ 1.60GHz
Number of CPU's (number of (virtual) cores) 1(2)
Operating System and version Microsoft Windows Vista
 
Current Credit (based on incremental update) 51,391.52  
BOINC World position based on credit (based on incremental update) 534,061
Recent average credit RAC (projects accumulated) 1,901.33990  
Recent average credit RAC (according to BOINCstats) 1,554.07922
Average credit per CPU second 0.027428
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
here is the full statistic: Host ID "6187800" (http://boincstats.com/stats/boinc_host_graph.php?pr=bo&id=2c331715ddea0590c336066226571b03)

in summary we can say it crunches ~50 000 credits per month
and gets a RAC of 1900 - 2000 depending on the project
have fun  ;D
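
As a cross-check of the monthly figure, the total credit can be divided by the time since the host was first seen. A rough Python sketch; it uses the timestamp of this post as the end point and ignores time zone differences and pending credit, so it is only an estimate.

```python
from datetime import datetime

first_seen = datetime(2009, 11, 8, 6, 38, 13)    # "First seen" from BOINCstats
reported = datetime(2009, 12, 11, 17, 34, 19)    # timestamp of this post (assumed end point)
total_credit = 51_391.52                         # "Current Credit"

days = (reported - first_seen).total_seconds() / 86_400
credit_per_day = total_credit / days
credit_per_30_days = credit_per_day * 30

print(f"days since first seen: {days:.1f}")
print(f"average credit per day: {credit_per_day:.0f}")
print(f"projected credit per 30 days: {credit_per_30_days:.0f}")
```

The daily average lands close to the RAC that BOINCstats reports for the host.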
Title: Re: optimized sources
Post by: Gecko_R7 on 11 Dec 2009, 06:44:45 pm
Would also be great to see what a 30 day diet of Cuda processing produces.

Today a month is over; for 2 days the machine was used to play, install software and show videos...

First seen on 2009-11-08 06:38:13
 
CPU Intel(R) Atom(tm) CPU 230 @ 1.60GHz
Number of CPU's (number of (virtual) cores) 1(2)
Operating System and version Microsoft Windows Vista
 
Current Credit (based on incremental update) 51,391.52  
BOINC World position based on credit (based on incremental update) 534,061
Recent average credit RAC (projects accumulated) 1,901.33990  
Recent average credit RAC (according to BOINCstats) 1,554.07922
Average credit per CPU second 0.027428
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
here is the full statistic: Host ID "6187800" (http://boincstats.com/stats/boinc_host_graph.php?pr=bo&id=2c331715ddea0590c336066226571b03)

in summary we can say it crunches ~50 000 credits per month
and gets a RAC of 1900 - 2000 depending on the project
have fun  ;D

Thanks Heinz.
That's pretty respectable performance.
Title: Re: optimized sources
Post by: _heinz on 11 Dec 2009, 08:45:12 pm
if you have FB-DIMMs in your server and have some good ideas for cooling them down, let me know.

regards  ;D
looking around, I found a very interesting review in The Inquirer: FB-DIMM with Heatpipe cooling (http://www.theinquirer.net/inquirer/news/1009915/first-inqpressions-kingston).
I have now ordered the GEIL EVO Cyclone MemoryCoolingSystem (http://www.kmelektronik.de/shop/index.php?show=product_info&ArtNr=22905)
the mounting kit looks a little better; being able to vary the height is important, as are the 5000 rpm of the fan
Title: Re: optimized sources
Post by: _heinz on 12 Dec 2009, 01:58:06 pm
ak_v8b and temps of FBDIMM
~~~~~~~~~~~~~~~~~~~
it is running:
4 docking
4 ak_v8b
2 collatz
room temp = 20 °C
~~~~~~~~~~~~~~~~
Field   Value
Sensor Properties
Sensor Type   Dual ADT7490  (SMBus 2Ch, 2Eh)
GPU Sensor Type   Diode  (ATI-Diode)
Motherboard Name   Intel D5400XS

Temperatures
CPU1   49 °C  (120 °F)
CPU2   55 °C  (131 °F)
CPU #1 / Core #1   48 °C  (118 °F)
CPU #1 / Core #2   36 °C  (97 °F)
CPU #1 / Core #3   44 °C  (111 °F)
CPU #1 / Core #4   43 °C  (109 °F)
CPU #2 / Core #1   40 °C  (104 °F)
CPU #2 / Core #2   36 °C  (97 °F)
CPU #2 / Core #3   39 °C  (102 °F)
CPU #2 / Core #4   40 °C  (104 °F)
DIMM   71 °C  (160 °F)
GPU Diode   64 °C  (147 °F)
Temperature #1   47 °C  (117 °F)
Temperature #2   46 °C  (115 °F)
Temperature #3   49 °C  (120 °F)
FB-DIMM1   76 °C  (169 °F)
FB-DIMM2   76 °C  (169 °F)
FB-DIMM3   68 °C  (154 °F)
FB-DIMM4   66 °C  (151 °F)
Seagate ST31000340NS   33 °C  (91 °F)
Seagate ST31000340NS   32 °C  (90 °F)
Seagate ST31000340NS   34 °C  (93 °F)

Cooling Fans
CPU1   641 RPM
CPU2   637 RPM
North Bridge   3056 RPM
South Bridge   4122 RPM
DIMM   2512 RPM
Aux   587 RPM
GPU   90%
   
When I run 4 docking and 4 ak_v8b apps, the FB-DIMM temps are a moderate 76 °C.
-----------------------------------------------------------------------------------------------------------------------------------
When I run 8 docking, the FB-DIMMs are at 74 °C,
but when I run 8 ak_v8b, the FB-DIMM temp jumps from 74 °C to 87-89 °C (in winter). In summer you have to add 10 °C, which gives 97 to 99 °C... as last summer and last year...
ak_v8b is a toaster  ::)....ready to cook my breakfast eggs.....  :o
Title: Re: optimized sources
Post by: _heinz on 15 Dec 2009, 05:09:19 pm
If the air cooling system works properly, we get the following values:
room temp: 21.5 °C
--------------------------------------
Field   Value
Sensor Properties
Sensor Type   Dual ADT7490  (SMBus 2Ch, 2Eh)
GPU Sensor Type   Diode  (ATI-Diode)
Motherboard Name   Intel D5400XS

Temperatures
CPU1   56 °C  (133 °F)
CPU2   60 °C  (140 °F)
CPU #1 / Core #1   52 °C  (126 °F)
CPU #1 / Core #2   43 °C  (109 °F)
CPU #1 / Core #3   49 °C  (120 °F)
CPU #1 / Core #4   49 °C  (120 °F)
CPU #2 / Core #1   40 °C  (104 °F)
CPU #2 / Core #2   37 °C  (99 °F)
CPU #2 / Core #3   41 °C  (106 °F)
CPU #2 / Core #4   41 °C  (106 °F)
DIMM   77 °C  (171 °F)
GPU Diode   70 °C  (158 °F)
Temperature #1   48 °C  (118 °F)
Temperature #2   49 °C  (120 °F)
Temperature #3   54 °C  (129 °F)
FB-DIMM1   87 °C  (189 °F)
FB-DIMM2   86 °C  (187 °F)
FB-DIMM3   78 °C  (172 °F)
FB-DIMM4   74 °C  (165 °F)
Seagate ST31000340NS   35 °C  (95 °F)
Seagate ST31000340NS   33 °C  (91 °F)
Seagate ST31000340NS   33 °C  (91 °F)

Cooling Fans
CPU1   659 RPM
CPU2   639 RPM
North Bridge   4556 RPM
South Bridge   4186 RPM
DIMM   3125 RPM
Aux   726 RPM
GPU   90%
~~~~~~~~~~~~~~~~~~~~~~~~~~~
running:
8 x akv8b
2 x Collatz
Compare this with the mixed run:
with 8 akv8b the FB-DIMMs run about 10 °C hotter.

I'm still waiting for the GEIL EVO Cyclone MemoryCoolingSystem....
Title: Re: optimized sources
Post by: _heinz on 16 Dec 2009, 05:12:59 am
A light OC of the RV670 brings us an additional 22 GFLOPS:
16.12.2009 00:36:33      ATI GPU 0: ATI Radeon HD 3800 (RV670) (CAL version 1.4.467, 512MB, 522 GFLOPS peak)
It is running 2 x collatz (together with 8 x akv8b on the CPU).
room temp = 21 °C
~~~~~~~~~~~~~~~~~~~~~~~~~~
GPU Diode   70 °C  (158 °F)
Temperatur 1   49 °C  (120 °F)
Temperatur 2   49 °C  (120 °F)
Temperatur 3   53 °C  (127 °F)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Field   Value
Graphics Processor Properties
Video Adapter   Sapphire Radeon HD 3870 512MB GDDR4
BIOS Version   010.077.000.000.000000
BIOS Date   01/21/08 08:48
GPU Code Name   RV670 XT
Part Number   11X-1E620A-100
PCI Device   1002-9501 / 174B-E620  (Rev 00)
Transistors   666 million
Process Technology   55 nm
Die Size   192 mm2
Bus Type   PCI Express 2.0 x16 @ x16
Memory Size   512 MB
GPU Clock   810 MHz  (Original: 776 MHz, overclock: 4%)
RAMDAC Clock   400 MHz
Pixel Pipelines   16
Texture Units (TMU) / Pipeline   1
Unified Shaders   320  (v4.1)
DirectX Hardware Support   DirectX v10.1
Pixel Fillrate   12960 MPixel/s

Memory Bus Properties
Bus Type   GDDR4
Bus Width   256-bit
Real Clock   1233 MHz (DDR)  (Original: 1126 MHz, overclock: 10%)
Effective Clock   2466 MHz
Bandwidth   77.1 GB/s

Miscellaneous
Utilization   97%

ATI PowerPlay (BIOS)
State #1   GPU: 777 MHz, Memory: 1126 MHz  (Boot)
State #2   GPU: 776 MHz, Memory: 1126 MHz  (OverDrive)
State #3   GPU: 776 MHz, Memory: 1126 MHz
State #4   GPU: 776 MHz, Memory: 1126 MHz  (UVD)

Graphics Processor Manufacturer
Company Name   Advanced Micro Devices, Inc.
Product Information   http://ati.amd.com/products/home-office.html
Driver Download   http://game.amd.com/us-en/drivers_catalyst.aspx
Driver Update   http://driveragent.com?ref=59
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
OK, maybe you have seen higher OC values, but this RV670 has passive cooling...so I'm happy with it.  ;)
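A rough check of the "additional 22 GFLOPS" figure. This is only a sketch: it assumes the common peak estimate of 2 floating-point operations (one multiply-add) per shader per clock, which is not stated anywhere in the BOINC log, and the function name is made up for illustration. BOINC's own "522 GFLOPS peak" number is not reproduced here.

```python
def peak_gflops(shaders: int, clock_mhz: float, flops_per_clock: int = 2) -> float:
    """Theoretical peak in GFLOPS (assumed formula: shaders * flops/clock * clock)."""
    return shaders * flops_per_clock * clock_mhz / 1000.0

# Values taken from the dump above: 320 unified shaders, 776 MHz stock, 810 MHz OC.
stock = peak_gflops(320, 776)
overclocked = peak_gflops(320, 810)
gain = overclocked - stock

# Pixel fill rate as listed in the dump: 16 pipelines at 810 MHz.
pixel_fill_mpixels = 16 * 810

print(f"stock {stock:.1f}, OC {overclocked:.1f}, gain {gain:.1f} GFLOPS")
print(f"pixel fill rate {pixel_fill_mpixels} MPixel/s")
```

Under that assumption the gain comes out just under 22 GFLOPS, and the fill rate matches the 12960 MPixel/s in the dump.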
Title: Re: optimized sources
Post by: _heinz on 16 Dec 2009, 05:07:10 pm
The "GEIL EVO Cyclone Memory Cooling System" is installed now.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Text from the package:
World's 1st Memory Cooling fan with REAL-time System Information displayed through LED!
The GEIL EVO Cyclone Memory Cooling System features the world's 1st memory cooling fan with LED display, embedded with thermal sensor, indicating surrounding temperature and real-time fan RPM.
OPTIMIZED COOLING TECHNOLOGY
Designed with airflow and heat dissipation, the metal fan bracket of the EVO Cyclone comes with dual sets of cooling fins on both sides of the fan and air ducts underneath, providing maximum cooling power for your high performance memory modules!
Features and Specification:
- Dimension [LxWxH]: 146 x 52.6 x 106 mm
- Net Weight: about 135 g
- Fan Airflow: 4.04 CFM
- Fan Speed: 3400 +-10 RPM
- Fan Diameter: 50 mm
- Fan Life Expectancy: 25000 hours continuous operation (25 °C, 65% RH)
- Connector: 3 Pin
- Fan Voltage: 12V
- Universal heat sink bracket and clip design
- Embedded thermal sensor
- Logo Fan with LED Display of: Company Name, Product Name, real time fan RPM and surrounding temperature
- 1 Year Manufacture Warranty
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
room temp = 21 °C
running:
8 x akv8b
2 x collatz
-----------------------------------
Field   Value
Sensor Properties
Sensor Type   Dual ADT7490  (SMBus 2Ch, 2Eh)
GPU Sensor Type   Diode  (ATI-Diode)
Motherboard Name   Intel D5400XS

Temperatures
CPU1   56 °C  (133 °F)
CPU2   64 °C  (147 °F)
CPU #1 / Core #1   52 °C  (126 °F)
CPU #1 / Core #2   42 °C  (108 °F)
CPU #1 / Core #3   49 °C  (120 °F)
CPU #1 / Core #4   49 °C  (120 °F)
CPU #2 / Core #1   43 °C  (109 °F)
CPU #2 / Core #2   41 °C  (106 °F)
CPU #2 / Core #3   43 °C  (109 °F)
CPU #2 / Core #4   43 °C  (109 °F)
DIMM   79 °C  (174 °F)
GPU Diode   74 °C  (165 °F)
Temperature #1   51 °C  (124 °F)
Temperature #2   51 °C  (124 °F)
Temperature #3   54 °C  (129 °F)
FB-DIMM1   94 °C  (201 °F)
FB-DIMM2   92 °C  (198 °F)
FB-DIMM3   88 °C  (190 °F)
FB-DIMM4   85 °C  (185 °F)
Seagate ST31000340NS   35 °C  (95 °F)
Seagate ST31000340NS   34 °C  (93 °F)
Seagate ST31000340NS   35 °C  (95 °F)

Cooling Fans
CPU1   657 RPM
CPU2   650 RPM
North Bridge   4808 RPM
South Bridge   4157 RPM
DIMM   4212 RPM
Aux   850 RPM
GPU   100%
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
As we can see, the temperatures of FB-DIMM1 and FB-DIMM2 are 94 / 92 °C; that's 7 °C higher than with the OCZ cooling system.
We see further that FB-DIMM3 and FB-DIMM4 are now 10 °C hotter than with the OCZ system.
This means the single 5 cm fan is not enough to cool the FB-DIMMs down, although it runs at 4212 RPM, which is 800 RPM over its specification. Very good, but it's not enough....
Further, my inner sensor, which is placed between DIMM1 and DIMM2, shows 74 °C; with the OCZ it showed 67 °C.
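The comparison can be spelled out slot by slot. A minimal sketch, assuming the 15 Dec 2009 readings (8 x akv8b, 2 x Collatz) are the OCZ baseline referred to here; the thread does not say so explicitly, and the variable names are made up for illustration.

```python
# FB-DIMM temperatures in °C, copied from the two sensor dumps.
ocz = {"FB-DIMM1": 87, "FB-DIMM2": 86, "FB-DIMM3": 78, "FB-DIMM4": 74}  # assumed OCZ baseline
evo = {"FB-DIMM1": 94, "FB-DIMM2": 92, "FB-DIMM3": 88, "FB-DIMM4": 85}  # EVO Cyclone, 16 Dec

# Per-slot difference: positive means the EVO run is hotter.
delta = {slot: evo[slot] - ocz[slot] for slot in ocz}

for slot, diff in delta.items():
    print(f"{slot}: {ocz[slot]} -> {evo[slot]} °C ({diff:+d} °C)")
```

Under that assumption the per-slot differences are 7, 6, 10 and 11 °C, in line with the rounded figures quoted above.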
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The EVO shows 41 °C and 4286 RPM on its LED display (after one and a half hours of operation).
In my view the EVO Cyclone is more a funny kit than a very good cooling solution.
Pity, it fits perfectly over the FB-DIMMs.
And now the upper x16 graphics slot is usable.
Here are some pictures:
EVO Cyclone (http://www.britta-d.de/images/evo/evo1.jpg) mounted.
EVO Cyclone with additional fin plate at the bottom (http://www.britta-d.de/images/evo/evo2.jpg) for better airflow, and so that the fan does not draw hot air from the RV670 graphics adapter.
I mounted the additional fin plate (white) at the bottom for better airflow from the fan that sits at the bottom right, in front of the big cooler of the RV670. As you can see, on the left side 4 slot plates are now open for better airflow.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Now, after 4 hours and with an additional 28 cm case fan near the front switched on, the temps are:
DIMM   79 °C  (174 °F)
FB-DIMM1   92 °C  (198 °F)
FB-DIMM2   88 °C  (190 °F)
FB-DIMM3   86 °C  (187 °F)
FB-DIMM4   81 °C  (178 °F)

CPU1   662 RPM
CPU2   638 RPM
North Bridge   4900 RPM
South Bridge   4182 RPM
DIMM   4231 RPM
Aux   722 RPM

The EVO shows 35 °C and 4136 RPM on its LED display.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
OK, I will run this EVO Cyclone cooler, but I must look for a better, more effective cooling solution.
Let everybody draw his own conclusion.  ;)

Modified EVO:
EVO with tube (http://www.britta-d.de/images/evo/evo3.jpg)
EVO, looking into the tube (http://www.britta-d.de/images/evo/evo4.jpg)
The EVO shows 24 °C and 4050 RPM on its LED display.
DIMM   79 °C  (174 °F)
FB-DIMM1   88 °C  (190 °F)
FB-DIMM2   83 °C  (181 °F)
FB-DIMM3   80 °C  (176 °F)
FB-DIMM4   80 °C  (176 °F)

CPU1   659 RPM
CPU2   652 RPM
North Bridge   4936 RPM
South Bridge   4157 RPM
DIMM   4157 RPM
Aux   935 RPM
~~~~~~~~~~~~~~~~~~~~~~~
My modification has an effect: from 92 down to 88 °C (case open).
I built a tube so that cool air from outside the case can now come in.
Title: Re: optimized sources
Post by: Raistmer on 16 Dec 2009, 06:10:17 pm
>10 °C difference between the edge memory slots?! Too big a gradient IMO to call this cooling system a good one.
Title: Re: optimized sources
Post by: _heinz on 16 Dec 2009, 06:45:22 pm
>10 °C difference between the edge memory slots?! Too big a gradient IMO to call this cooling system a good one.

You are right, it's more a funny kit than a good cooling solution.  :(
Title: Re: optimized sources
Post by: Urs Echternacht on 16 Dec 2009, 08:11:59 pm
Quote
OK, I will run this EVO Cyclone cooler, but I must look for a better, more effective cooling solution.
Let everybody draw his own conclusion.   ;)

_heinz,
on such a "hot" source a little H2O (http://shop.aquacomputer.de/index.php?language=de&cPath=7_11_18) sometimes will be up to the job.  ;)
Title: Re: optimized sources
Post by: _heinz on 19 Dec 2009, 06:57:32 pm
Hi
A big thank you to all users and anonymous readers of this epic thread. As of today we have 66,000 views, really a nice number.
@Urs, thanks for the link, very interesting...

Regards   ;)


Title: Re: optimized sources
Post by: _heinz on 20 Dec 2009, 11:09:15 am
modify: picture of the tunnel, longer tube, better values
A full tunnel (http://www.britta-d.de/images/evo/evo5.jpg) around the passive RV670 now carries the heat out of the case. (It's all papier-mâché for now, to see if it works; later I will make it from some thermally insulating plastic material.)
The tube is now 21 cm long. The case is open.
room temp 21 °C
running:
8 x akv8
2 x collatz
~~~~~~~~~~~~~~~~~~~~
Field   Value
Sensor Properties
Sensor Type   Dual ADT7490  (SMBus 2Ch, 2Eh)
GPU Sensor Type   Diode  (ATI-Diode)
Motherboard Name   Intel D5400XS

Temperatures
CPU1   55 °C  (131 °F)
CPU2   53 °C  (127 °F)
CPU #1 / Core #1   51 °C  (124 °F)
CPU #1 / Core #2   42 °C  (108 °F)
CPU #1 / Core #3   48 °C  (118 °F)
CPU #1 / Core #4   47 °C  (117 °F)
CPU #2 / Core #1   46 °C  (115 °F)
CPU #2 / Core #2   43 °C  (109 °F)
CPU #2 / Core #3   45 °C  (113 °F)
CPU #2 / Core #4   46 °C  (115 °F)
DIMM   77 °C  (171 °F)
GPU Diode   74 °C  (165 °F)
Temperature #1   53 °C  (127 °F)
Temperature #2   52 °C  (126 °F)
Temperature #3   53 °C  (127 °F)
FB-DIMM1   78 °C  (172 °F)
FB-DIMM2   72 °C  (162 °F)
FB-DIMM3   70 °C  (158 °F)
FB-DIMM4   70 °C  (158 °F)
Seagate ST31000340NS   41 °C  (106 °F)
Seagate ST31000340NS   42 °C  (108 °F)
Seagate ST31000340NS   44 °C  (111 °F)

Cooling Fans
CPU1   663 RPM
CPU2   657 RPM
North Bridge   4553 RPM
South Bridge   4160 RPM
DIMM   4115 RPM
Aux   946 RPM
GPU   100%
~~~~~~~~~~~~~~~~~~~~~~~~~   
The EVO shows 20 °C and 3972 RPM.
It now works as it should.    ;)
~~~~~~~~~~~~~~~~~~~~~~~~~
A better solution would be a cooling unit with 3 fans of 5 x 5 x 1 cm each, running at 5000 RPM, all enclosed in a flexible tube so that it can draw cool fresh air from outside the case.
Conclusion: without modification, the EVO Cyclone is not up to cooling the FB-DIMMs down.
 ;)
Title: Re: optimized sources
Post by: Pappa on 20 Dec 2009, 05:47:16 pm
The silly question I have: is the fan blowing down onto the cooling fins, or up, pulling case air through the fins?

Looking at the pictures you have posted, you have what looks like a dead space for air movement in the area of the RAM. What I think I would do is install one of the fans that covers a "blade slot" in the back of the case to exhaust from that area, and ensure that the RAM fan is pulling outside air (forcing it down onto the cooling fins). Then you are creating specific airflow for just the RAM.

Title: Re: optimized sources
Post by: _heinz on 20 Dec 2009, 06:39:35 pm

The fan in the tube blows the cool air from outside onto the FB-DIMMs. The fan does not blow on the cooling fins on the left and right. These cooling fins also run along underneath the fan. When the fan blows air on the fins under it, they cool down, but this has no big effect if the case temp is 40 °C. (That is why I have to keep the case open now.)
Where exactly the air ducts are, I could not see.
Remember that on our first try the EVO showed 41 °C and 4286 RPM on its LED display (see above).
To the right of the papier-mâché tunnel of the RV670 there is a big fan; it blows cool air from the right through the tunnel to three open blade slots at the back.
The case has 2 fans of 22 cm diameter, each with a separate switch and a variable RPM regulator.
If I close the case and switch the two fans on, a lot of cool air blows in, but the temperature of the FB-DIMMs goes up.
See our example above: when the machine runs without a special FB-DIMM fan cooler, the temp goes up to 102 °C.
It's very mysterious, but true: the two big fans disturb the inner airflow. I have always kept them off.

modify: Review from InsideHW  GEIL EVO Cyclone Memory Cooler (http://www.insidehw.com/Reviews/Cooling/GeIL-EVO-Cyclone-Memory-Cooler.html)
Title: Re: optimized sources
Post by: Pappa on 20 Dec 2009, 09:29:36 pm
The hard part is that every case is designed to "look nice" so the user can impress their friends. Other than a few, they were not designed to move air to cool an extreme machine. So it is known that there will be dead spots where air does not move well.
Otherwise we would see a BIG Honking FAN in the Front with a Grill on every machine.
Your drive cage prevents moving air Front to Back which would cool the Chipset, RAM and Hard Drives.

What I would look at is whether you have two of the 5.25" bays open at the top, to mount a 120mm fan that just pushes air into the case. If there is a mounting area for an 80mm fan under the plastic front of the case, put one there (poke holes). The idea would be to force in as much air as possible. Then fans that exhaust air out of the case will work more effectively.

One might consider mounting another fan at the exhaust of the power supply to help pull more air out of the case.
Title: Re: optimized sources
Post by: _heinz on 21 Dec 2009, 06:53:17 pm
Thanks Pappa

Nothing helped: not my tunnel for the RV670, not the tube for fresh air, not the plate for the airflow.
The EVO Cyclone cannot cool the FB-DIMMs down when I am running 8 x akv8.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
room temp: 21 °C
running:
8 x akv8
2 x collatz

Field   Value
Sensor Properties
Sensor Type   Dual ADT7490  (SMBus 2Ch, 2Eh)
GPU Sensor Type   Diode  (ATI-Diode)
Motherboard Name   Intel D5400XS

Temperatures
CPU1   57 °C  (135 °F)
CPU2   60 °C  (140 °F)
CPU #1 / Core #1   53 °C  (127 °F)
CPU #1 / Core #2   43 °C  (109 °F)
CPU #1 / Core #3   49 °C  (120 °F)
CPU #1 / Core #4   49 °C  (120 °F)
CPU #2 / Core #1   47 °C  (117 °F)
CPU #2 / Core #2   43 °C  (109 °F)
CPU #2 / Core #3   45 °C  (113 °F)
CPU #2 / Core #4   46 °C  (115 °F)
DIMM   79 °C  (174 °F)
GPU Diode   77 °C  (171 °F)
Temperature #1   54 °C  (129 °F)
Temperature #2   54 °C  (129 °F)
Temperature #3   55 °C  (131 °F)
FB-DIMM1   92 °C  (198 °F)
FB-DIMM2   86 °C  (187 °F)
FB-DIMM3   82 °C  (180 °F)
FB-DIMM4   81 °C  (178 °F)
Seagate ST31000340NS   42 °C  (108 °F)
Seagate ST31000340NS   40 °C  (104 °F)
Seagate ST31000340NS   42 °C  (108 °F)

Cooling Fans
CPU1   668 RPM
CPU2   655 RPM
North Bridge   4995 RPM
South Bridge   4182 RPM
DIMM   4285 RPM
Aux   1037 RPM
GPU   100%
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

full report with different runs is attached

[attachment deleted by admin]
Title: Re: optimized sources
Post by: Pappa on 21 Dec 2009, 11:37:08 pm
Imagine, rather than a tube, a "funnel" large enough to mount a 120mm fan, narrowing down to the memory fan.

I would almost be tempted to try a 120 mm fan with the larger airflow capacity. It is three times as much as the 55mm fan on the RAM cooler. The funnel narrowing from 120mm to 55mm should cause the air to accelerate as it passes the RAM, to carry heat away more efficiently.
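A back-of-the-envelope sketch of the funnel idea, assuming ideal incompressible flow with no losses, so that the continuity relation A1 * v1 = A2 * v2 holds. A real fan delivers less flow against the back pressure of a narrowing duct, so the result is best read as an upper bound. The function name is made up for illustration.

```python
def funnel_speedup(d_in_mm: float, d_out_mm: float) -> float:
    """Ideal ratio of outlet to inlet air speed for a round funnel.

    With constant volume flow, speed scales inversely with cross-section
    area, and area scales with the square of the diameter.
    """
    return (d_in_mm / d_out_mm) ** 2

# 120 mm inlet narrowing to the 55 mm figure mentioned in the post.
ratio = funnel_speedup(120, 55)
print(f"ideal outlet speed is about {ratio:.2f} x the inlet speed")
```

In this idealised picture the air leaves the funnel a bit under five times faster than it enters; how much of that survives in practice depends on the fan's pressure characteristics.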

Title: Re: optimized sources
Post by: _heinz on 23 Dec 2009, 06:38:03 pm
Thanks Pappa for your reply.
Until I have done that, I am running a mixed load of WUs to keep the temps down a little.
4 x akv8,
4 x docking,
2 x collatz
~~~~~~~~~~~~~~~~
DIMM   78 °C  (172 °F)
FB-DIMM1   80 °C  (176 °F)
FB-DIMM2   75 °C  (167 °F)
FB-DIMM3   72 °C  (162 °F)
FB-DIMM4   69 °C  (156 °F)
DIMM   4163 RPM
-------------------------------------
roomtemp= 22 °C
EVO shows 24 °C and 4025 RPM on its LED display
Case is open.
 ;)

Title: Re: optimized sources
Post by: _heinz on 02 Jan 2010, 07:15:03 pm
No work from SETI on any of my machines now.
Running collatz and milkyway...
Time to compile something for the ION platform  :)
1>AP SSE3ATOM Win32 (Microsoft VC++ Environment)
1>Post Build revision number extraction
1>
1>APREV IS 298
1>Renaming Output Files
1>
1>Build log was saved at "file://C:\I\SC\apwk\astropulse\client\WinBuild\ICC11_2k8\Win32\Output_ext\ap_client\AP SSE3_ATOM\Intermediate\BuildLog.htm"
1>ap_client - 0 error(s), 0 warning(s)
========== Rebuild All: 1 succeeded, 0 failed, 0 skipped ==========

Compiled a special build of Astropulse for the ATOM Processor
Title: Re: optimized sources
Post by: Raistmer on 03 Jan 2010, 10:03:07 am
Could you attach it? I have an Atom-based netbook, so I can test whether there are any speed differences.
Title: Re: optimized sources
Post by: Jason G on 03 Jan 2010, 10:12:58 am
Go to the development thread  ;)
Title: Re: optimized sources
Post by: _heinz on 03 Jan 2010, 02:41:58 pm
For all others who have no access to the developer area:

What can we expect?

Test run against our latest publicly released Astropulse, ap_5.05r168_SSE3.exe
Quick timetable
 
WU : ap_18se08aa_B6_P1_00046_1LC25.wu
ap_5.05r168_SSE3.exe : 410.609 secs CPU
ap_5.05r303_SSE3_ICC_ATOM.exe : 340.719 secs CPU
Speedup     : 17.02%
Ratio       : 1.21 x
 
WU : Raistmer's_tiny.wu
ap_5.05r168_SSE3.exe : 150.547 secs CPU
ap_5.05r303_SSE3_ICC_ATOM.exe : 144.781 secs CPU
Speedup     : 3.83%
Ratio       : 1.04 x
 
WU : sigind_v5.wu
ap_5.05r168_SSE3.exe : 912.922 secs CPU
ap_5.05r303_SSE3_ICC_ATOM.exe : 727.484 secs CPU
Speedup     : 20.31%
Ratio       : 1.25 x

All results strongly similar.
This is the basis for our further optimization process.
More in the developer area.
 ;)
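The Speedup and Ratio lines in the timetable appear to follow the usual definitions: speedup is the CPU time saved relative to the old build, ratio is old time divided by new time. A small sketch assuming those definitions (the benchmark script itself is not shown in this thread, and the function names are made up for illustration):

```python
def speedup_percent(old_secs: float, new_secs: float) -> float:
    """CPU time saved by the new build, as a percentage of the old time."""
    return (old_secs - new_secs) / old_secs * 100.0

def ratio(old_secs: float, new_secs: float) -> float:
    """How many times faster the new build is."""
    return old_secs / new_secs

# CPU seconds from the timetable above: (r168 SSE3, r303 SSE3 ICC ATOM).
runs = {
    "ap_18se08aa_B6_P1_00046_1LC25.wu": (410.609, 340.719),
    "Raistmer's_tiny.wu": (150.547, 144.781),
    "sigind_v5.wu": (912.922, 727.484),
}

results = {
    wu: (round(speedup_percent(old, new), 2), round(ratio(old, new), 2))
    for wu, (old, new) in runs.items()
}

for wu, (pct, r) in results.items():
    print(f"{wu}: speedup {pct:.2f}%, ratio {r:.2f} x")
```

With these definitions the three rows reproduce the posted percentages and ratios.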
Title: Re: optimized sources
Post by: KarVi on 03 Jan 2010, 06:34:36 pm
I have been running the various R303 SSE3 builds on my Phenom.

Strangely enough, none of the Atom builds work properly, although they are SSE3 and should be compatible.

Some quick results:

Sigind.wu

ap_5.05r168_SSE3.exe : 845.703 secs CPU
ap_5.05r293_SSE.exe : 775.766 secs CPU
ap_5.05r303_SSE3_ICC_Qopt.exe : 694.078 secs CPU
ap_5.05r303_ATOM_ICC_Qopt.exe : 17.031 secs CPU
ap_5.05r303_SSE3_ICC_ATOM.exe : 0.031 secs CPU

The first 3 give strongly similar results; the last 2 clearly don't.

But still 303_SSE3 is much faster than r168_SSE3 and 293_SSE.

[attachment deleted by admin]
Title: Re: optimized sources
Post by: Jason G on 03 Jan 2010, 06:42:58 pm
Strangely enough, none of the Atom builds work properly, although they are SSE3 and should be compatible.
...
  'Should be', though I believe ATOM has an extra instruction (MOVBE) which is available in our 45 nm Core2's (at least)... So ATOM builds are really ATOM-specific, though they should run on later Intels OK. (The SSE3 Qopt one uses the generic SSE3 options ... nice to know it works on Phenom II)
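One way to picture the compatibility problem is a lookup from CPU feature flags to the builds that are safe to run. This is purely an illustrative sketch: the requirement table is an assumption based on the MOVBE guess above, not a confirmed property of the builds, and the Phenom feature set listed here is likewise assumed. The Atom flags follow the CPUID dump posted later in this thread.

```python
# Hypothetical requirements per build; MOVBE for the ATOM builds is an assumption.
BUILD_REQUIREMENTS = {
    "ap_5.05r303_SSE3_ICC_ATOM.exe": {"SSE3", "MOVBE"},
    "ap_5.05r303_ATOM_ICC_Qopt.exe": {"SSE3", "MOVBE"},
    "ap_5.05r303_SSE3_ICC_Qopt.exe": {"SSE3"},
}

def usable_builds(cpu_features: set) -> list:
    """Return the builds whose required features are all present on the CPU."""
    return sorted(
        name for name, needed in BUILD_REQUIREMENTS.items()
        if needed <= cpu_features  # subset test
    )

atom_230 = {"MMX", "SSE", "SSE2", "SSE3", "SSSE3", "MOVBE"}
phenom = {"MMX", "SSE", "SSE2", "SSE3", "SSE4A"}  # assumed flags, no MOVBE

print("Atom 230:", usable_builds(atom_230))
print("Phenom:  ", usable_builds(phenom))
```

Under these assumptions only the generic SSE3 build is selected for the Phenom, which would match the results reported above; the real cause of the failures is not established in the thread.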
Title: Re: optimized sources
Post by: KarVi on 03 Jan 2010, 06:47:44 pm
That's OK, but then they should be marked (S)SSE3 or something to that effect, since they are not really SSE3 compatible.

But I do like the improvements of the real SSE3 build, it's all very promising :)
Title: Re: optimized sources
Post by: Jason G on 03 Jan 2010, 06:53:45 pm
That's OK, but then they should be marked (S)SSE3 or something to that effect, since they are not really SSE3 compatible.

I agree, that's why my ATOM one doesn't have SSE3 in the name  ;D
Title: Re: optimized sources
Post by: Gecko_R7 on 04 Jan 2010, 09:15:30 am
FYI, started running a full bench suite on Atom N270 last eve via both Atom compiles of 303 apps.
Taking a while, but working fine.

Will upload result file when finished today.

FWIW, the netbook runs great w/ Win7 + 2GB.  I also did the obligatory system optimizing/uninstalls and service/process pruning.
Using latest google chrome as primary browser.  Very snappy & responsive for browsing and normal activities.
Am pretty impressed with the little guy.
It's kinda cute  :P  Should work well for my folks.
Title: Re: optimized sources
Post by: Raistmer on 04 Jan 2010, 04:52:49 pm
I still have damned Vista on my netbook, so I can't say it works fast, but Chrome is a nice browser indeed. I first started using it on the netbook and have now switched to it on my home desktop too :)

And for the record: Atom is an SSSE3-compatible CPU. It lacks only x64 mode and SSE4.*.
Title: Re: optimized sources
Post by: Gecko_R7 on 04 Jan 2010, 05:12:22 pm
re: SSSE3.  Funny you mention that.  Yes, it supports it, but Intel names their Atom-specific compiler switch -xSSE3_ATOM.
It's supposed to make changes better suited to the in-order execution of the Atom.
Wonder why they attached that to the SSE3 rather than the SSSE3 instruction set?  :-\
Title: Re: optimized sources
Post by: _heinz on 04 Jan 2010, 05:40:09 pm
A quick look for all who are interested:
Field   Value
CPU Properties
CPU Type   Intel Atom 230, 1600 MHz (12 x 133)
CPU Alias   Diamondville-SC
CPU Stepping   C0
Instruction Set   x86, x86-64, MMX, SSE, SSE2, SSE3, SSSE3
Original Clock   1600 MHz
Min / Max CPU Multiplier   6x / 12x
Engineering Sample   No
L1 Code Cache   32 KB
L1 Data Cache   24 KB
L2 Cache   512 KB  (On-Die, ECC, ASC, Full-Speed)

Multi CPU
Motherboard ID   nVidia MCP79
CPU #1   Intel(R) Atom(TM) CPU 230 @ 1.60GHz, 1600 MHz
CPU #2   Intel(R) Atom(TM) CPU 230 @ 1.60GHz, 1600 MHz

CPU Physical Info
Package Type   437 Ball FC-BGA
Package Size   2.2 cm x 2.2 cm
Transistors   47 million
Process Technology   45 nm, CMOS, Cu, High-K + Metal Gate
Die Size   25 mm2
Typical Power   4 W @ 1.60 GHz

CPU Manufacturer
Company Name   Intel Corporation
Product Information   http://www.intel.com/products/processor

CPU Utilization
CPU #1 / HTT Unit #1   6 %
CPU #1 / HTT Unit #2   6 %
~~~~~~~~~~~~~~~~~~~~~~~
Field   Value
CPUID Properties
CPUID Manufacturer   GenuineIntel
CPUID CPU Name   Intel(R) Atom(TM) CPU 230 @ 1.60GHz
CPUID Revision   000106C2h
IA Brand ID   00h  (Unknown)
Platform ID   E1h / MC 04h  (FCBGA8)
Microcode Update Revision   212
HTT / CMP Units   2 / 1
Tjmax Temperature   125 °C  (257 °F)

Instruction Set
64-bit x86 Extension (AMD64, Intel64)   Supported
AMD 3DNow!   Not supported
AMD 3DNow! Professional   Not supported
AMD 3DNowPrefetch   Not supported
AMD Enhanced 3DNow!   Not supported
AMD Extended MMX   Not supported
AMD MisAligned SSE   Not supported
AMD SSE4A   Not supported
AMD SSE5   Not supported
Cyrix Extended MMX   Not supported
IA-64   Not supported
IA MMX   Supported
IA SSE   Supported
IA SSE 2   Supported
IA SSE 3   Supported
IA Supplemental SSE 3   Supported
IA SSE 4.1   Not supported
IA SSE 4.2   Not supported
IA AVX   Not supported
IA FMA   Not supported
IA AES Extensions   Not supported
VIA Alternate Instruction Set   Not supported
CLFLUSH Instruction   Supported
CMPXCHG8B Instruction   Supported
CMPXCHG16B Instruction   Supported
Conditional Move Instruction   Supported
LZCNT Instruction   Not supported
MONITOR / MWAIT Instruction   Supported
MOVBE Instruction   Supported
PCLMULQDQ Instruction   Not supported
POPCNT Instruction   Not supported
RDTSCP Instruction   Not supported
SYSCALL / SYSRET Instruction   Not supported
SYSENTER / SYSEXIT Instruction   Supported
VIA FEMMS Instruction   Not supported

Security Features
Advanced Cryptography Engine (ACE)   Not supported
Advanced Cryptography Engine 2 (ACE2)   Not supported
Data Execution Prevention (DEP, NX, EDB)   Supported
Hardware Random Number Generator (RNG)   Not supported
PadLock Hash Engine (PHE)   Not supported
PadLock Montgomery Multiplier (PMM)   Not supported
Processor Serial Number (PSN)   Not supported

Power Management Features
Automatic Clock Control   Supported
Digital Thermometer   Supported
Dynamic FSB Frequency Switching   Not supported
Enhanced Halt State (C1E)   Supported, Disabled
Enhanced SpeedStep Technology (EIST, ESS)   Not supported
Frequency ID Control   Not supported
Hardware P-State Control   Not supported
LongRun   Not supported
LongRun Table Interface   Not supported
PowerSaver 1.0   Not supported
PowerSaver 2.0   Not supported
PowerSaver 3.0   Not supported
Processor Duty Cycle Control   Supported
Software Thermal Control   Not supported
Temperature Sensing Diode   Not supported
Thermal Monitor 1   Supported
Thermal Monitor 2   Supported
Thermal Monitoring   Not supported
Thermal Trip   Not supported
Voltage ID Control   Not supported

CPUID Features
1 GB Page Size   Not supported
36-bit Page Size Extension   Not supported
Address Region Registers (ARR)   Not supported
CPL Qualified Debug Store   Supported
Debug Trace Store   Supported
Debugging Extension   Supported
Direct Cache Access   Not supported
Dynamic Acceleration Technology (IDA)   Not supported
Fast Save & Restore   Supported
Hyper-Threading Technology (HTT)   Supported, Enabled
Invariant Time Stamp Counter   Supported
L1 Context ID   Not supported
Local APIC On Chip   Supported
Machine Check Architecture (MCA)   Supported
Machine Check Exception (MCE)   Supported
Memory Configuration Registers (MCR)   Not supported
Memory Type Range Registers (MTRR)   Supported
Model Specific Registers (MSR)   Supported
Nested Paging   Not supported
Page Attribute Table (PAT)   Supported
Page Global Extension   Supported
Page Size Extension (PSE)   Supported
Pending Break Event   Supported
Physical Address Extension (PAE)   Supported
Safer Mode Extensions (SMX)   Not supported
Secure Virtual Machine Extensions (Pacifica)   Not supported
Self-Snoop   Supported
Time Stamp Counter (TSC)   Supported
Turbo Boost   Not supported
Virtual Machine Extensions (Vanderpool)   Not supported
Virtual Mode Extension   Supported
x2APIC   Not supported
XSAVE / XRSTOR Extended States   Not supported
Title: Re: optimized sources
Post by: Gecko_R7 on 04 Jan 2010, 06:14:21 pm
Full Atom run attached w/ result files in 7zip.
Strange that the ATOM switch shows "slower" on the DMH1023 WU.  :-\
Not sure I trust that result.
Also noticed a missing WU.

Gonna re-run.... ::)
 
WU : DMH1023rr_ap_21oc08ab_B2_P0_00081_20081130_08605.dat
ap_5.05r168_SSE3.exe : 2207.679 secs CPU
ap_5.05r303_ATOM_ICC_Qopt.exe : 1791.094 secs CPU
Speedup     : 18.87%
Ratio       : 1.23 x
ap_5.05r303_SSE3_ICC_Qopt.exe : 1664.967 secs CPU
Speedup     : 24.58%
Ratio       : 1.33 x
 
WU : JasonMediumrr.dat
ap_5.05r168_SSE3.exe : 11137.676 secs CPU
ap_5.05r303_ATOM_ICC_Qopt.exe : 6406.134 secs CPU
Speedup     : 42.48%
Ratio       : 1.74 x
ap_5.05r303_SSE3_ICC_Qopt.exe : 6412.452 secs CPU
Speedup     : 42.43%
Ratio       : 1.74 x
 
WU : JasonShortrr.dat
ap_5.05r168_SSE3.exe : 3823.569 secs CPU
ap_5.05r303_ATOM_ICC_Qopt.exe : 3199.768 secs CPU
Speedup     : 16.31%
Ratio       : 1.19 x
ap_5.05r303_SSE3_ICC_Qopt.exe : 3213.012 secs CPU
Speedup     : 15.97%
Ratio       : 1.19 x
 
WU : Raistmer_tinyrr.dat
ap_5.05r168_SSE3.exe : 941.263 secs CPU
ap_5.05r303_ATOM_ICC_Qopt.exe : 802.922 secs CPU
Speedup     : 14.70%
Ratio       : 1.17 x
ap_5.05r303_SSE3_ICC_Qopt.exe : 806.447 secs CPU
Speedup     : 14.32%
Ratio       : 1.17 x
 
WU : sigindrr.dat
ap_5.05r168_SSE3.exe : 5168.859 secs CPU
ap_5.05r303_ATOM_ICC_Qopt.exe : 3982.191 secs CPU
Speedup     : 22.96%
Ratio       : 1.30 x
ap_5.05r303_SSE3_ICC_Qopt.exe : 4041.658 secs CPU
Speedup     : 21.81%
Ratio       : 1.28 x
Quote

[attachment deleted by admin]
Title: Re: optimized sources
Post by: Jason G on 05 Jan 2010, 03:43:59 am
...
Strange that the ATOM switch shows "slower" on the DMH1023 WU.  :-\
Not sure I trust that result.
...
Well, a few ideas on that run:
  - DMH1023 is a weird one, with lots of blanking & early signals IIRC....
  - The ATOM_QOpt build being the first one run in the Science Apps folder would likely mean it was generating the FFTW wisdom (which takes time, of course), and subsequent builds/runs might have benefited from that one-off cost.
 - I'm still trying to work out a few things about the characteristics of the newer ICC optimisations, which suggest that targeted switches are likely not operative on the hot code regions.  Targeted platform builds (such as /QxSSE4.1....) seem to perform worse than generic arch:sse3; that could indicate a combination of hand optimisations confounding/blocking the compiler automation, and/or a need to adjust Joe's excellent hand SSE code per platform (of which there are a few fairly straightforward parameters clearly set for P3-P3 at the moment)
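The wisdom point suggests a simple guard for benchmark runs: do one untimed warm-up pass before timing anything, so that a one-off setup cost is not charged to whichever build happens to run first. The sketch below is hypothetical; it is not the bench script used in this thread, and the "apps" are stand-in Python callables that simulate a shared cache rather than real executables.

```python
import time

def bench(apps, warmup=None):
    """Time each callable once; run an optional untimed warm-up first."""
    if warmup is not None:
        warmup()  # pays any one-off cost outside the timed region
    timings = {}
    for name, run in apps.items():
        start = time.perf_counter()
        run()
        timings[name] = time.perf_counter() - start
    return timings

# Simulated shared state standing in for a wisdom file on disk.
state = {"wisdom": False, "paid_by": []}

def make_app(name):
    def run():
        if not state["wisdom"]:
            state["paid_by"].append(name)  # this run pays the one-off cost
            state["wisdom"] = True
    return run

apps = {
    "ATOM_ICC_Qopt": make_app("ATOM_ICC_Qopt"),
    "SSE3_ICC_Qopt": make_app("SSE3_ICC_Qopt"),
}

bench(apps)                      # cold start: the first timed build pays
cold_payer = list(state["paid_by"])

state.update(wisdom=False, paid_by=[])
timings = bench(apps, warmup=make_app("warm-up"))
warm_payer = list(state["paid_by"])

print("cold:", cold_payer, "| with warm-up:", warm_payer)
```

In the simulated cold run the first build in the list carries the setup cost; with the warm-up pass neither timed build does.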
Title: Re: optimized sources
Post by: _heinz on 05 Jan 2010, 06:34:20 pm
R3600 ATOM

100,000 credits today  ;D

First seen on 2009-11-08 06:38:13
Current Credit (based on incremental update) 100,105.20
Recent average credit RAC (projects accumulated) 1,934.37570
mostly crunched Collatz on the ION chip
CPU ran idle.....

see full statistic of host R3600 6187800 (http://boincstats.com/stats/boinc_host_graph.php?pr=bo&id=6187800)

In summary, it crunches about 50,000 credits per month and gets a RAC of ~2000 running Collatz on the ION, with the CPU idle..
For ~4 days I used the machine myself and switched BOINC off.

happy crunching  ;D

Title: Re: optimized sources
Post by: Gecko_R7 on 05 Jan 2010, 07:54:20 pm
Re-run of Atom N270 results attached.
Summary below.

On this run, the 1LC25 WU was the first one and ATOM_ICC_Qopt was slower.
However, the 08605 WU was next & showed the Atom build faster.
On my previous run, the 08605 WU was run first and the Atom build was slower, as with the first WU in these results.

There does seem to be a slow-down on the first WU run which makes ATOM_ICC times longer.
So, perhaps Wisdom gen time does have noticeable impact?  :-\

Quote

Quick timetable
 
WU : ap_18se08aa_B6_P1_00046_1LC25.dat
ap_5.05r168_SSE3.exe : 2403.913 secs CPU
ap_5.05r303_ATOM_ICC_Qopt.exe : 2163.079 secs CPU
Speedup     : 10.02%
Ratio       : 1.11 x
ap_5.05r303_SSE3_ICC_Qopt.exe : 1919.093 secs CPU
Speedup     : 20.17%
Ratio       : 1.25 x
 
WU : DMH1023rr_ap_21oc08ab_B2_P0_00081_20081130_08605.dat
ap_5.05r168_SSE3.exe : 1952.649 secs CPU
ap_5.05r303_ATOM_ICC_Qopt.exe : 1671.145 secs CPU
Speedup     : 14.42%
Ratio       : 1.17 x
ap_5.05r303_SSE3_ICC_Qopt.exe : 1675.482 secs CPU
Speedup     : 14.19%
Ratio       : 1.17 x
 
WU : JasonMediumrr.dat
ap_5.05r168_SSE3.exe : 13857.850 secs CPU
ap_5.05r303_ATOM_ICC_Qopt.exe : 6451.858 secs CPU
Speedup     : 53.44%
Ratio       : 2.15 x
ap_5.05r303_SSE3_ICC_Qopt.exe : 6548.376 secs CPU
Speedup     : 52.75%
Ratio       : 2.12 x
 
WU : JasonShortrr.dat
ap_5.05r168_SSE3.exe : 3752.620 secs CPU
ap_5.05r303_ATOM_ICC_Qopt.exe : 3227.926 secs CPU
Speedup     : 13.98%
Ratio       : 1.16 x
ap_5.05r303_SSE3_ICC_Qopt.exe : 3236.210 secs CPU
Speedup     : 13.76%
Ratio       : 1.16 x
 
WU : Raistmer_tinyrr.dat
ap_5.05r168_SSE3.exe : 1186.544 secs CPU
ap_5.05r303_ATOM_ICC_Qopt.exe : 810.191 secs CPU
Speedup     : 31.72%
Ratio       : 1.46 x
ap_5.05r303_SSE3_ICC_Qopt.exe : 813.795 secs CPU
Speedup     : 31.41%
Ratio       : 1.46 x
 
WU : sigindrr.dat
ap_5.05r168_SSE3.exe : 5153.165 secs CPU
ap_5.05r303_ATOM_ICC_Qopt.exe : 4008.071 secs CPU
Speedup     : 22.22%
Ratio       : 1.29 x
ap_5.05r303_SSE3_ICC_Qopt.exe : 4072.968 secs CPU
Speedup     : 20.96%
Ratio       : 1.27 x

[attachment deleted by admin]
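
The Speedup and Ratio lines in these timetables are consistent with the arithmetic below (checked against the 1LC25 figures above). This is only a minimal sketch; the function names are illustrative and not taken from the benchmark script.

```python
def speedup_percent(ref_secs: float, new_secs: float) -> float:
    """Percentage of CPU time saved relative to the reference build."""
    return (1.0 - new_secs / ref_secs) * 100.0


def ratio(ref_secs: float, new_secs: float) -> float:
    """How many times faster the new build is than the reference build."""
    return ref_secs / new_secs


# 1LC25 work unit on the Atom N270: r168 SSE3 reference vs. r303 ATOM build
ref, atom = 2403.913, 2163.079
print(f"Speedup     : {speedup_percent(ref, atom):.2f}%")
print(f"Ratio       : {ratio(ref, atom):.2f} x")
```

Because the first build run in a fresh folder may also pay the wisdom generation cost, a single pass like this can understate that build's speedup.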
Title: Re: optimized sources
Post by: Jason G on 05 Jan 2010, 08:01:17 pm
...
So, perhaps Wisdom gen time does have noticeable impact?  :-\
...
  It certainly can, and is probably the case here.  Some platforms seem to converge quite quickly on wisdom, some take longer.  I reckon it depends on how fftw arranged the heuristics in that initialisation, and on whether it finds the best codelet sequences sooner or later within the allowed time limits.

To confirm wisdom impact, take a look at the counters in stderr.txt.  The Init component will contain any wisdom generation, while the crunch time is just that.  The additional ffa counter is a subcomponent of crunching that Joe's been doing lots of work on recently.
Title: Re: optimized sources
Post by: _heinz on 18 Jan 2010, 08:15:30 pm
some more interesting results
Quick timetable

WU : ap_18se08aa_B6_P1_00046_1LC25.wu
ap_5.05r168_SSE3.exe : 718.266 secs CPU
ap_5.05r309_SSSE3_ICC_CSP_QxSSSE3.exe : 357.641 secs CPU
Speedup     : 50.21%
Ratio       : 2.01 x

WU : Raistmer's_tiny.wu
ap_5.05r168_SSE3.exe : 275.047 secs CPU
ap_5.05r309_SSSE3_ICC_CSP_QxSSSE3.exe : 135.047 secs CPU
Speedup     : 50.90%
Ratio       : 2.04 x

WU : sigind_v5.wu
ap_5.05r168_SSE3.exe : 1073.109 secs CPU
ap_5.05r309_SSSE3_ICC_CSP_QxSSSE3.exe : 782.500 secs CPU
Speedup     : 27.08%
Ratio       : 1.37 x
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
some more tests needed to confirm it
Title: Re: optimized sources
Post by: _heinz on 23 Jan 2010, 08:26:26 am
Among all the compiled apps it's sometimes difficult to choose the right one for a specific processor.
The question is always:
What's the best option for my processor?

SSE4.2
Intel® Core™ i7 Processors
Intel® Xeon® 55XX series
 
SSE4.1
Intel® Xeon® 74XX series
Quad-Core Intel® Xeon 54XX, 33XX series
Dual-Core Intel® Xeon 52XX, 31XX series
Intel® Core™ 2 Extreme 9XXX series
Intel® Core™ 2 Quad 9XXX series
Intel® Core™ 2 Duo 8XXX series
Intel® Core™ 2 Duo E7200

SSSE3
Quad-Core Intel® Xeon® 73XX, 53XX, 32XX series
Dual-Core Intel® Xeon® 72XX, 53XX, 51XX, 30XX series
Intel® Core™ 2 Extreme 7XXX, 6XXX series
Intel® Core™ 2 Quad 6XXX series
Intel® Core™ 2 Duo 7XXX (except E7200), 6XXX, 5XXX, 4XXX series
Intel® Core™ 2 Solo 2XXX series
Intel® Pentium® dual-core processor E2XXX, T23XX series
 
SSE3_ATOM
Intel® ATOM™ Processor only (not usable for any other Processor)
 
SSE3
Dual-Core Intel® Xeon® 70XX, 71XX, 50XX Series
Dual-Core Intel® Xeon® processor (ULV and LV) 1.66, 2.0, 2.16
Dual-Core Intel® Xeon® 2.8
Intel® Xeon® processors with SSE3 instruction set support
Intel® Core™ Duo
Intel® Core™ Solo
Intel® Pentium® dual-core processor T21XX, T20XX series
Intel® Pentium® processor Extreme Edition
Intel® Pentium® D
Intel® Pentium® 4 processors with SSE3 instruction set support

SSE2(default)
Intel® Xeon® processors
Intel® Pentium® 4 processors
Intel® Pentium® M
 
IA32
Intel® Pentium® III Processor
Intel® Pentium® II Processor
Intel® Pentium® Processor

------------------------------------------------------------------
Which processor is targeted by default?

On IA-32 systems running Windows* and Linux*, /arch:SSE2 is on by default.
The resulting code path should run on the Intel Pentium 4 and Intel Xeon processors with SSE2 support
and other later Intel processors or compatible non-Intel processors with SSE2 support.
Apps compiled with /arch:IA32 are special builds for the early Pentium® processors (PIII, PII, Pentium®).

You can run CPU-Z (http://www.cpuid.com/) to see which instruction sets your processor supports.
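
The selection rule implied by the list above can be sketched as follows. This is an illustration only: the lower-case flag names are assumptions about how a tool such as CPU-Z might report instruction sets, and `pick_build` is a hypothetical helper, not part of any released app.

```python
# Build families from the list above, ordered from most to least demanding.
# The flag spellings are assumed for illustration.
BUILD_ORDER = [
    ("SSE4.2", "sse4.2"),
    ("SSE4.1", "sse4.1"),
    ("SSSE3", "ssse3"),
    ("SSE3", "sse3"),
    ("SSE2", "sse2"),
]


def pick_build(cpu_flags, is_atom=False):
    """Return the most specific build family the processor can run."""
    flags = {flag.lower() for flag in cpu_flags}
    # The SSE3_ATOM build is meant for Atom processors only.
    if is_atom and "sse3" in flags:
        return "SSE3_ATOM"
    for build, needed in BUILD_ORDER:
        if needed in flags:
            return build
    # Nothing newer than the original Pentium feature set was reported.
    return "IA32"


print(pick_build(["mmx", "sse", "sse2", "sse3", "ssse3"]))
print(pick_build(["mmx", "sse", "sse2", "sse3", "ssse3"], is_atom=True))
print(pick_build(["mmx"]))
```

The benchmarks earlier in the thread suggest the most specific build is not always the fastest, so it is worth timing the neighbouring builds as well.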
Title: Re: optimized sources
Post by: _heinz on 31 Jan 2010, 02:41:46 pm
By the way, today I got 7 Mio total credit and 2 Mio collatz.   ;D
Current Credit (based on incremental update) 7,028,888.84
--> full statistic (http://boincstats.com/stats/boinc_user_graph.php?pr=bo&id=5e024335320e436c4d050e073963e326)
Title: Re: optimized sources
Post by: _heinz on 20 Feb 2010, 07:44:56 pm
I ran an Astropulse WU on my old P4 2.6 GHz in about 110874 seconds;
my wingman, running the stock app on a P4 3.2 GHz, needed about 265779 seconds.
The WU has 0% blanking!
Now everybody can do their own calculation.
~~~~~~~~~~~~~~~~~~~~~~~~~~
thanks to all readers of this epic thread, we have now more than 72 000 hits.

regards  ;D
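
One way to do that calculation is sketched below. The clock adjustment is a deliberately crude assumption (run time scaling inversely with clock speed, everything else equal), so treat the second figure as a rough indication only.

```python
def raw_ratio(stock_secs, opt_secs):
    """How many times longer the stock app ran than the optimized app."""
    return stock_secs / opt_secs


def clock_adjusted_ratio(stock_secs, stock_ghz, opt_secs, opt_ghz):
    """Ratio after crediting the optimized run for its slower clock.

    Assumes run time scales inversely with clock speed, which ignores
    memory, cache and other differences between the two hosts.
    """
    return raw_ratio(stock_secs, opt_secs) * (stock_ghz / opt_ghz)


# Figures from the post: optimized app on a P4 2.6 GHz, stock app on a P4 3.2 GHz
print(f"raw ratio           : {raw_ratio(265779, 110874):.2f} x")
print(f"clock-adjusted ratio: {clock_adjusted_ratio(265779, 3.2, 110874, 2.6):.2f} x")
```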


Title: Re: optimized sources
Post by: _heinz on 21 Feb 2010, 02:02:13 pm
wow, 8,012,203.19 total today (needed 21 days for the last Mio)
have a look here (http://www.boincstats.com/signature/user_374946.gif)
let's keep crunching and have fun  ;D
Title: Re: optimized sources
Post by: sunu on 21 Feb 2010, 02:11:01 pm
For 10? I think you need two million. Oh, and I thought you were German.  :)
Congrats anyway.
Title: Re: optimized sources
Post by: _heinz on 21 Feb 2010, 05:18:57 pm
For 10? I think you need two million. Oh, and I thought you were German.  :)
Congrats anyway.
10 mio ? -->Target will be reached in 41.63 days on April 4 2010 (if i have no hardware outage)
 :)
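
The projection quoted above implies a daily credit rate, which can be recovered from the figures in this thread. A minimal sketch, assuming the estimate is a simple linear extrapolation (how boincstats actually computes it is not stated here); the helper names are illustrative.

```python
import math
from datetime import date, timedelta


def implied_daily_rate(current, target, days):
    """Credits per day needed to close the gap in the given number of days."""
    return (target - current) / days


def projected_date(start, days):
    """Calendar day on which the target is reached, rounding up to a full day."""
    return start + timedelta(days=math.ceil(days))


# 8,012,203.19 credits on 21 Feb 2010, with 10 million projected 41.63 days later
rate = implied_daily_rate(8_012_203.19, 10_000_000, 41.63)
eta = projected_date(date(2010, 2, 21), 41.63)
print(f"{rate:,.0f} credits/day, target reached on {eta}")
```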
Title: Re: optimized sources
Post by: _heinz on 23 Feb 2010, 05:24:05 pm
I now have "Intel Compiler Suite" and "Parallel Studio" (Composer update5) installed side by side in my dev environment.
This way we can easily switch between the different compiler packages in our projects as needed.


Title: Re: optimized sources
Post by: _heinz on 06 Mar 2010, 07:29:32 pm
everyday update...
I switched off my wireless network and installed some CPL HomePlug adapters (200 Mbps) with a 3-port switch.
There were too many access points around me using the same channel as mine. This reduced the bandwidth to 34 Mbps, and I needed days for my big software downloads of several gigabytes. I was tired of looking for another free channel every day.
For fun I installed a USB HDTV stick on the R3600 to watch the Olympic events in HDTV quality. It worked great.
-----------------
Last week I installed a complete development environment on my R3600 Atom (no VM).
OS: Vista32
Parallel Studio (update5)
Intel Compiler Suite (update5)
A first complex project was tested successfully and shows that the test and development environment works.
Now I still have to do the updates on my VMs.

I did not upgrade the R3600 to W7.
But perhaps Jason can tell us if his dev environment works on W7.

I had no luck with new Astropulse WUs; I did not get any of them, so we are playing the waiting game..

Title: Re: optimized sources
Post by: _heinz on 10 Mar 2010, 08:22:42 am
It looks like we have now established the 50% speedup over ap_5.05r168_SSE3, as our latest tests with ap_5.05r309 show.
More tests on different hardware platforms must confirm it.
 :)
Title: Re: optimized sources
Post by: _heinz on 11 Mar 2010, 06:07:50 pm
For 10? I think you need two million. Oh, and I thought you were German.  :)
Congrats anyway.
10 mio ? -->Target will be reached in 41.63 days on April 4 2010 (if i have no hardware outage)
 :)
9 Mio total today
10 mio ? -->Target will be reached in 18.28 days on March 30 2010
 ;)
Title: Re: optimized sources
Post by: The Grinch on 12 Mar 2010, 06:55:16 am
Is anything productive ever going to come out of this, or is this thread just for presenting the "optimal" system?
Title: Re: optimized sources
Post by: _heinz on 12 Mar 2010, 08:50:24 am
Is anything productive ever going to come out of this, or is this thread just for presenting the "optimal" system?
Of course there is something productive: we are working on speed-optimized Astropulse apps
using the latest technologies for CPU and GPU (NVIDIA and ATI).
The goal is a 50% speedup over the standard Astropulse app.
Downloads for testing are not available here, only in the Public Release Beta Test forum:
http://lunatics.kwsn.net/14-public-release-beta-test-forum/index.0.html
Final versions will then be on the start page, as always.

@Grinch: as always, next time in English
Title: Re: optimized sources
Post by: The Grinch on 13 Mar 2010, 01:40:50 am
That may be the goal, but on most of the 36 pages of this thread
I only read about some hardware or software that sometimes works and sometimes doesn't.

And my English is unfortunately not good enough for this forum; reading works only passably.
Title: Re: optimized sources
Post by: AMpractice on 01 Apr 2010, 11:57:32 pm
Quote
on my old xp-machine this older version 2.2 compiles complete without error. So I thought it is a good complete test.
...

some other projects(arprec) I tried did not have this problem... will try some others now too

with akv8 I believe I must use IPP5.3,  have still 5.2beta.
there is still something to install (ITBB and latest IPP)

some closer collaboration would I like  ;D

heinz
Quote

Hello:
I am a newbie in installing external packages (ARPREC) in VS2008 c++ on Windows XP.  I am having trouble in configuring Visual Studio 2008 to recognize arprec lib.  After running the arprec.vcproj and compiling the projects successfully, the runtime error was:
RegSvr32 Message:

No DLL name specified.

Usage:  regsvr32[/u][/s][/n][/i[:cmdline]]dllname

/u-Unregistered server

/s-Silent; display no message boxes

/i-Call DllInstall passing it an optional [cmdline]; when used with /u calls dll uninstall

/n-do not call DllRegisterServer; this option must be used with /i

If someone can provide the directions for installing ARPREC properly in MSVS2008 C++ on WinXP, it would be greatly appreciated.   The README for the pkg may not have been specific for Visual Studio.  Sorry for the newbie question.     :o
Title: Re: optimized sources
Post by: _heinz on 02 Apr 2010, 09:44:13 am
Hi AMpractice,

configure the project as /Release.
Once you have compiled the project successfully, you must write your own batch files to execute the different programs. If you study the source, you can find which parameters each exe needs to run; supplying them is the main purpose of the batch files.
You must copy all files from /Release, including arprec.lib, arprec.dll and arprec.exp, into the test directory where the batch files are.
Writing your own batch files is a good exercise for understanding the project.
On my website (http://www.britta-d.de) you can find the arprec samples along with some other interesting math.
Ask if you have questions.

Regards
heinz
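
The copy step described above can also be scripted instead of done by hand. The sketch below uses Python rather than a batch file; `stage_release` is a hypothetical helper and the directory names in the usage note are placeholders, not paths from the ARPREC package.

```python
import shutil
from pathlib import Path


def stage_release(release_dir, test_dir):
    """Copy every file from the Release folder into the test directory.

    Returns the sorted list of copied file names so the caller can check
    that arprec.lib, arprec.dll and arprec.exp actually arrived.
    """
    src, dst = Path(release_dir), Path(test_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for item in sorted(src.iterdir()):
        if item.is_file():
            shutil.copy2(item, dst / item.name)
            copied.append(item.name)
    return copied
```

A call such as `stage_release("Release", "arprec_test")` would then be followed by running each sample exe from the test directory with the parameters found in its source.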
Title: Re: optimized sources
Post by: _heinz on 05 Apr 2010, 05:28:34 pm
For 10? I think you need two million. Oh, and I thought you were German.  :)
Congrats anyway.
10 mio ? -->Target will be reached in 41.63 days on April 4 2010 (if i have no hardware outage)
 :)
9 Mio total today
10 mio ? -->Target will be reached in 18.28 days on March 30 2010
 ;)
got the 10 Mio on 4th of April  ;)
Title: Re: optimized sources
Post by: sunu on 06 Apr 2010, 02:17:50 pm
got the 10 Mio on 4th of April  ;)

Very nice milestone!  :)
Title: Re: optimized sources
Post by: _heinz on 11 Apr 2010, 04:37:04 pm
Hi,
For nearly half a year I have had a "W7 Ultimate 64" lying around.
Now it's time to install:
- Updated my host machine to VMWare Server 2.02
- Installed now unsupported  :o "W7 Ultimate 64" as V8-VM7
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
We now have a 64-bit (V8-VM7) "Windows 7" machine for beta testing.

 ;)

Title: Re: optimized sources
Post by: _heinz on 14 Apr 2010, 10:36:18 am
Hi Jason,
I tested the beta Unified Installer on V8-VM7: no problems on W7 Ultimate 64.
Installed AK_v8b_win_x64_SSE41 and ap_5.05r339_SSE, got work for AK_v8b, and it runs.
 ;)
Title: Re: optimized sources
Post by: Jason G on 14 Apr 2010, 10:40:31 am
Cheers Heinz.  This should really have been released already (for a while now), but a hectic schedule is keeping me from giving it the full attention needed to do things right. I hope to sort things out soon so we can move on to the next stage.

Jason
Title: Re: optimized sources
Post by: _heinz on 15 Apr 2010, 08:49:50 pm
how p8 compiles:
p8 - New Optimizations for 32-bit applications on 45nm Intel® Core™2 Duo (Penryn,Nehalem,Westmere) family processors
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1>------ Rebuild All started: Project: ap_client, Configuration: AP_SSE2_CSP_QaxAVX Win32 ------
1>Deleting intermediate files and output files for project 'ap_client', configuration 'AP_SSE2_CSP_QaxAVX|Win32'.
1>AP SSE2_IPP_ICC_CSP_QaxSSE2_Qparallel_MKLP_BLANKIT_O2_Oii Win32 (Microsoft VC++ Environment)
1>Generating new BuildInfo
1>SubWCRev: 'C:\I\SC\apwk\ap_ICCIPP'
1>Last committed at revision 374
1>Updated to revision 374
1>Local modifications found
1>
1>APREV IS 374
1>Deleting old output files
1>Compiling with Intel(R) C++ 11.1.060 [IA-32]... (Intel C++ Environment)
1>ap_schema.cpp
1>Compiling with Intel(R) C++ 11.1.060 [IA-32]... (Intel C++ Environment)
1>ap_fold.cpp
1>-----USE_AVX activ-----
1>-----AVX undefined in x32, p8 used -----
1>Compiling with Intel(R) C++ 11.1.060 [IA-32]... (Intel C++ Environment)
1>intrinsics.cpp
1>-----USE_AVX activ-----
1>-----AVX undefined in x32, p8 used -----
1>Compiling with Intel(R) C++ 11.1.060 [IA-32]... (Intel C++ Environment)
1>ap_fileio.cpp
1>-----USE_AVX activ-----
1>-----AVX undefined in x32, p8 used -----
1>Compiling with Intel(R) C++ 11.1.060 [IA-32]... (Intel C++ Environment)
1>ap_science.cpp
1>-----USE_AVX activ-----
1>-----AVX undefined in x32, p8 used -----
1>-----USE_AVX activ-----
1>-----AVX undefined in x32, p8 used -----
1>Compiling with Intel(R) C++ 11.1.060 [IA-32]... (Intel C++ Environment)
1>sbtf.cpp
1>Compiling with Intel(R) C++ 11.1.060 [IA-32]... (Intel C++ Environment)
1>ap_remove_radar.cpp
1>-----USE_AVX activ-----
1>-----AVX undefined in x32, p8 used -----
1>Compiling with Intel(R) C++ 11.1.060 [IA-32]... (Intel C++ Environment)
1>ap_debug.cpp
1>Compiling with Intel(R) C++ 11.1.060 [IA-32]... (Intel C++ Environment)
1>ap_client_main.cpp
1>-----USE_AVX activ-----
1>-----AVX undefined in x32, p8 used -----
1>Compiling with Intel(R) C++ 11.1.060 [IA-32]... (Intel C++ Environment)
1>version.cpp
1>Compiling with Intel(R) C++ 11.1.060 [IA-32]... (Intel C++ Environment)
1>mtrand.cpp
1>dm_chunk_parallel.cpp
1>-----USE_AVX activ-----
1>-----AVX undefined in x32, p8 used -----
1>Compiling with Intel(R) C++ 11.1.060 [IA-32]... (Intel C++ Environment)
1>ap_gfx_main.cpp
1>-----USE_AVX activ-----
1>-----AVX undefined in x32, p8 used -----
1>Compiling with Intel(R) C++ 11.1.060 [IA-32]... (Intel C++ Environment)
1>ap_timer.cpp
1>-----USE_AVX activ-----
1>-----AVX undefined in x32, p8 used -----
1>Linking... (Intel C++ Environment)
1>xilink: executing 'link'
1>Embedding manifest... (Microsoft VC++ Environment)
1>AP SSE2_IPP_ICC_CSP_QaxSSE2_Qparallel_MKLP_BLANKIT_O2_Oi Win32 (Microsoft VC++ Environment)
1>Post Build revision number extraction
1>
1>APREV IS 374
1>Copy/Renaming Output Files
1>
1>Build log was saved at "file://C:\I\SC\apwk\ap_ICCIPP\client\WinBuild\ICC11_2k8\Win32\Output_ext\ap_client\AP_SSE2_CSP_QaxAVX\Intermediate\BuildLog.htm"
1>ap_client - 0 error(s), 0 warning(s)
========== Rebuild All: 1 succeeded, 0 failed, 0 skipped ==========
This is a build with a dispatcher; it includes optimized code paths for the selected processors,
from a minimum of SSE2 up to SSE4.2.
It's a Qax build and runs on any processor with SSE2 and above, and Pentium3, and all Athlons with at least SSE2.
Result file attached.
A full 64-bit AVX build is in preparation  ;)

[attachment deleted by admin]
Title: Re: optimized sources
Post by: _heinz on 01 May 2010, 04:59:14 pm
Hi all,

update6 for Parallel Composer and ICSP is out. A lot to do with updates.

Composer update6
1>Compiling with Intel(R) C++ Compiler 11.1.082 [IA-32]... (Intel C++ Environment)

ICSP update6
1>Compiling with Intel(R) C++ 11.1.065 [IA-32]... (Intel C++ Environment)


thanks to all readers of this epic thread...

by the way, got "Brown Belt" from Intel today  ;)

and 11 Mio today: 1st of May
Current Credit (based on incremental update) 11,016,471.91

regards
 
Title: Re: optimized sources
Post by: _heinz on 25 May 2010, 09:42:08 pm
Hi,
the new driver 197.45 WHQL adds OpenCL (Open Computing Language) 1.0 support for all GeForce 8-series and later GPUs.
Version: 197.45 WHQL
Release date: 2010.04.13
Operating system: Windows 7 64-bit
Language: German
File size: 143 MB

 ;)
http://www.nvidia.de/object/win7_winvista_64bit_197.45_whql_de.html

really great to get OCL this way  ;D
Title: Re: optimized sources
Post by: _heinz on 26 May 2010, 09:28:39 am
And that it really works you can see here:
~~~~~~~~~~~~~~~~~~~~~~~~~~
oclDeviceQuery.exe Starting...

OpenCL SW Info:

 CL_PLATFORM_NAME:      NVIDIA CUDA
 CL_PLATFORM_VERSION:   OpenCL 1.0 CUDA 3.0.1
 OpenCL SDK Revision:   5537818


OpenCL Device Info:

 1 devices found supporting OpenCL:

 ---------------------------------
 Device ION
 ---------------------------------
  CL_DEVICE_NAME:                       ION
  CL_DEVICE_VENDOR:                     NVIDIA Corporation
  CL_DRIVER_VERSION:                    197.45
  CL_DEVICE_TYPE:                       CL_DEVICE_TYPE_GPU
  CL_DEVICE_MAX_COMPUTE_UNITS:          2
  CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS:   3
  CL_DEVICE_MAX_WORK_ITEM_SIZES:        512 / 512 / 64
  CL_DEVICE_MAX_WORK_GROUP_SIZE:        512
  CL_DEVICE_MAX_CLOCK_FREQUENCY:        1100 MHz
  CL_DEVICE_ADDRESS_BITS:               32
  CL_DEVICE_MAX_MEM_ALLOC_SIZE:         128 MByte
  CL_DEVICE_GLOBAL_MEM_SIZE:            241 MByte
  CL_DEVICE_ERROR_CORRECTION_SUPPORT:   no
  CL_DEVICE_LOCAL_MEM_TYPE:             local
  CL_DEVICE_LOCAL_MEM_SIZE:             16 KByte
  CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE:   64 KByte
  CL_DEVICE_QUEUE_PROPERTIES:           CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE
  CL_DEVICE_QUEUE_PROPERTIES:           CL_QUEUE_PROFILING_ENABLE
  CL_DEVICE_IMAGE_SUPPORT:              1
  CL_DEVICE_MAX_READ_IMAGE_ARGS:        128
  CL_DEVICE_MAX_WRITE_IMAGE_ARGS:       8
  CL_DEVICE_SINGLE_FP_CONFIG:           INF-quietNaNs round-to-nearest round-to-zero round-to-inf fma

  CL_DEVICE_IMAGE <dim>                 2D_MAX_WIDTH     8192
                                        2D_MAX_HEIGHT    8192
                                        3D_MAX_WIDTH     2048
                                        3D_MAX_HEIGHT    2048
                                        3D_MAX_DEPTH     2048

  CL_DEVICE_EXTENSIONS:                 cl_khr_byte_addressable_store
                                        cl_khr_icd
                                        cl_khr_gl_sharing
                                        cl_nv_d3d9_sharing
                                        cl_nv_d3d10_sharing
                                        cl_nv_d3d11_sharing
                                        cl_nv_compiler_options
                                        cl_nv_device_attribute_query
                                        cl_nv_pragma_unroll
                                        cl_khr_global_int32_base_atomics
                                        cl_khr_global_int32_extended_atomics


  CL_DEVICE_COMPUTE_CAPABILITY_NV:      1.1
  NUMBER OF MULTIPROCESSORS:            2
  NUMBER OF CUDA CORES:                 16
  CL_DEVICE_REGISTERS_PER_BLOCK_NV:     8192
  CL_DEVICE_WARP_SIZE_NV:               32
  CL_DEVICE_GPU_OVERLAP_NV:             CL_FALSE
  CL_DEVICE_KERNEL_EXEC_TIMEOUT_NV:     CL_FALSE
  CL_DEVICE_INTEGRATED_MEMORY_NV:       CL_TRUE
  CL_DEVICE_PREFERRED_VECTOR_WIDTH_<t>  CHAR 1, SHORT 1, INT 1, LONG 1, FLOAT 1, DOUBLE 0


oclDeviceQuery, Platform Name = NVIDIA CUDA, Platform Version = OpenCL 1.0 CUDA 3.0.1, SDK Revision = 5537818, NumDevs = 1, Device = ION

System Info:

 Local Time/Date = 15:24:44, 5/26/2010
 CPU Arch: 0
 CPU Level: 6
 # of CPU processors: 2
 Windows Build: 6002
 Windows Ver: 6.0


PASSED


Press <Enter> to Quit...
-----------------------------------------------------------
Title: Re: optimized sources
Post by: _heinz on 01 Jun 2010, 09:27:50 am
GeForce GTX470
I got it running...
~~~~~~~~~~~
oclDeviceQuery.exe Starting...

OpenCL SW Info:

 CL_PLATFORM_NAME:      NVIDIA CUDA
 CL_PLATFORM_VERSION:   OpenCL 1.0 CUDA 3.0.1
 OpenCL SDK Revision:   5537818


OpenCL Device Info:

 1 devices found supporting OpenCL:

 ---------------------------------
 Device GeForce GTX 470
 ---------------------------------
  CL_DEVICE_NAME:                       GeForce GTX 470
  CL_DEVICE_VENDOR:                     NVIDIA Corporation
  CL_DRIVER_VERSION:                    197.75
  CL_DEVICE_TYPE:                       CL_DEVICE_TYPE_GPU
  CL_DEVICE_MAX_COMPUTE_UNITS:          14
  CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS:   3
  CL_DEVICE_MAX_WORK_ITEM_SIZES:        1024 / 1024 / 64
  CL_DEVICE_MAX_WORK_GROUP_SIZE:        1024
  CL_DEVICE_MAX_CLOCK_FREQUENCY:        810 MHz
  CL_DEVICE_ADDRESS_BITS:               32
  CL_DEVICE_MAX_MEM_ALLOC_SIZE:         312 MByte
  CL_DEVICE_GLOBAL_MEM_SIZE:            1248 MByte
  CL_DEVICE_ERROR_CORRECTION_SUPPORT:   no
  CL_DEVICE_LOCAL_MEM_TYPE:             local
  CL_DEVICE_LOCAL_MEM_SIZE:             48 KByte
  CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE:   64 KByte
  CL_DEVICE_QUEUE_PROPERTIES:           CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE
  CL_DEVICE_QUEUE_PROPERTIES:           CL_QUEUE_PROFILING_ENABLE
  CL_DEVICE_IMAGE_SUPPORT:              1
  CL_DEVICE_MAX_READ_IMAGE_ARGS:        128
  CL_DEVICE_MAX_WRITE_IMAGE_ARGS:       8
  CL_DEVICE_SINGLE_FP_CONFIG:           INF-quietNaNs round-to-nearest round-to-zero round-to-inf fma

  CL_DEVICE_IMAGE <dim>                 2D_MAX_WIDTH     8192
                                        2D_MAX_HEIGHT    8192
                                        3D_MAX_WIDTH     2048
                                        3D_MAX_HEIGHT    2048
                                        3D_MAX_DEPTH     2048

  CL_DEVICE_EXTENSIONS:                 cl_khr_byte_addressable_store
                                        cl_khr_icd
                                        cl_khr_gl_sharing
                                        cl_nv_d3d9_sharing
                                        cl_nv_d3d10_sharing
                                        cl_nv_d3d11_sharing
                                        cl_nv_compiler_options
                                        cl_nv_device_attribute_query
                                        cl_nv_pragma_unroll
                                        cl_khr_global_int32_base_atomics
                                        cl_khr_global_int32_extended_atomics
                                        cl_khr_local_int32_base_atomics
                                        cl_khr_local_int32_extended_atomics
                                        cl_khr_fp64


  CL_DEVICE_COMPUTE_CAPABILITY_NV:      2.0
  NUMBER OF MULTIPROCESSORS:            14
  NUMBER OF CUDA CORES:                 448
  CL_DEVICE_REGISTERS_PER_BLOCK_NV:     32768
  CL_DEVICE_WARP_SIZE_NV:               32
  CL_DEVICE_GPU_OVERLAP_NV:             CL_TRUE
  CL_DEVICE_KERNEL_EXEC_TIMEOUT_NV:     CL_FALSE
  CL_DEVICE_INTEGRATED_MEMORY_NV:       CL_FALSE
  CL_DEVICE_PREFERRED_VECTOR_WIDTH_<t>  CHAR 1, SHORT 1, INT 1, LONG 1, FLOAT 1, DOUBLE 1


oclDeviceQuery, Platform Name = NVIDIA CUDA, Platform Version = OpenCL 1.0 CUDA 3.0.1, SDK Revision = 5537818, NumDevs = 1, Device = GeForce GTX 470

System Info:

 Local Time/Date = 15:22:35, 6/1/2010
 CPU Arch: 0
 CPU Level: 6
 # of CPU processors: 8
 Windows Build: 6002
 Windows Ver: 6.0


PASSED


Press <Enter> to Quit...
-----------------------------------------------------------
regards  ;)

heinz
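
The "NUMBER OF CUDA CORES" lines in the two listings (16 for the ION, 448 for the GTX 470) follow from the multiprocessor count and the compute capability: devices of compute capability 1.x have 8 cores per multiprocessor, 2.0 devices have 32 and 2.1 devices have 48. A small sketch, with an illustrative table and function name:

```python
# CUDA cores per multiprocessor for the compute capabilities of that era.
CORES_PER_SM = {
    (1, 0): 8, (1, 1): 8, (1, 2): 8, (1, 3): 8,
    (2, 0): 32, (2, 1): 48,
}


def cuda_cores(multiprocessors, compute_capability):
    """Total CUDA cores given the multiprocessor count and a 'major.minor' string."""
    major, minor = (int(part) for part in compute_capability.split("."))
    return multiprocessors * CORES_PER_SM[(major, minor)]


print("ION            :", cuda_cores(2, "1.1"))
print("GeForce GTX 470:", cuda_cores(14, "2.0"))
```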
Title: Re: optimized sources
Post by: _heinz on 02 Jun 2010, 07:16:43 pm
02.06.2010 22:27:47      NVIDIA GPU 0: GeForce GTX 470 (driver version 25715, CUDA version 3010, compute capability 2.0, 1248MB, 726 GFLOPS peak)

Come to our beta forum to test the new SAH Fermi app.

regards  heinz
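
The "726 GFLOPS peak" in the BOINC message above is consistent with the device query figures for this card: 448 cores at the reported 810 MHz, counting 2 floating-point operations per core per clock. Whether BOINC derives its number in exactly this way is an assumption; the sketch only shows that the arithmetic matches.

```python
def peak_gflops(cores, clock_mhz, flops_per_clock=2):
    """Theoretical single-precision peak in GFLOPS.

    flops_per_clock=2 assumes one multiply-add per core per clock.
    """
    return cores * clock_mhz * flops_per_clock / 1000.0


print(round(peak_gflops(448, 810)), "GFLOPS")
```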
Title: Re: optimized sources
Post by: _heinz on 03 Jun 2010, 05:18:17 pm
If you want to see some Fermi results, have a look at my host (http://setiathome.berkeley.edu/results.php?hostid=4387433)
~12 and a half minutes per WU, against 3 hours on my Xeon.

 :o

 ;D
Title: Re: optimized sources
Post by: _heinz on 08 Jun 2010, 11:12:44 am
The Fermi application (v6.10) has become visible on the SETI applications page.
If you have a GTX470/480 you can download now and run it.
Work is not available at the moment, because the splitters are offline.
We are all waiting now.

heinz  ;)
Title: Re: optimized sources
Post by: _heinz on 28 Aug 2010, 08:49:40 am
Vacation is over now, thank you all for your patience. ;)

Hi Jason,
the ION wu is up now.
Run time 13,861.30
CPU time 508.61

http://setiathome.berkeley.edu/result.php?resultid=1693745738
<core_client_version>6.10.58</core_client_version>
<![CDATA[
<stderr_txt>
setiathome_CUDA: Found 1 CUDA device(s):
  Device 1: ION, 241 MiB, regsPerBlock 8192
     computeCap 1.1, multiProcs 2
     clockRate = 1100000
setiathome_CUDA: CUDA Device 1 specified, checking...
   Device 1: ION is okay
SETI@home using CUDA accelerated device ION
Priority of process raised successfully
Priority of worker thread raised successfully
size 8 fft, is a freaky powerspectrum
size 16 fft, is a cufft plan
size 32 fft, is a cufft plan
size 64 fft, is a cufft plan
size 128 fft, is a cufft plan
size 256 fft, is a freaky powerspectrum
size 512 fft, is a freaky powerspectrum
size 1024 fft, is a freaky powerspectrum
size 2048 fft, is a cufft plan
size 4096 fft, is a cufft plan
size 8192 fft, is a cufft plan
size 16384 fft, is a cufft plan
size 32768 fft, is a cufft plan
size 65536 fft, is a cufft plan
size 131072 fft, is a cufft plan

 )       _   _  _)_ o  _  _
(__ (_( ) ) (_( (_  ( (_ ( 
 not bad for a human...  _)

Multibeam x32f Preview, Cuda 3.0

Work Unit Info:
...............
WU true angle range is :  0.410268

Flopcounter: 34143005518374.668000

Spike count:    0
Pulse count:    0
Triplet count:  1
Gaussian count: 0
called boinc_finish

</stderr_txt>
]]>
If any of you want to download the new CUDA app, use the Unified Installer v0.37; go to: http://lunatics.kwsn.net/index.php
regards Heinz
Title: Re: optimized sources
Post by: _heinz on 29 Aug 2010, 06:02:03 am
Hi Jason,
next wu of ION is up now.
Computer ID 5510631
Deadline 15 Oct 2010 19:55:05 UTC
Run time 14,564.88
CPU time 520.29

http://setiathome.berkeley.edu/result.php?resultid=1693924875

<core_client_version>6.10.58</core_client_version>
<![CDATA[
<stderr_txt>
setiathome_CUDA: Found 1 CUDA device(s):
  Device 1: ION, 241 MiB, regsPerBlock 8192
     computeCap 1.1, multiProcs 2
     clockRate = 1100000
setiathome_CUDA: CUDA Device 1 specified, checking...
   Device 1: ION is okay
SETI@home using CUDA accelerated device ION
Priority of process raised successfully
Priority of worker thread raised successfully
size 8 fft, is a freaky powerspectrum
size 16 fft, is a cufft plan
size 32 fft, is a cufft plan
size 64 fft, is a cufft plan
size 128 fft, is a cufft plan
size 256 fft, is a freaky powerspectrum
size 512 fft, is a freaky powerspectrum
size 1024 fft, is a freaky powerspectrum
size 2048 fft, is a cufft plan
size 4096 fft, is a cufft plan
size 8192 fft, is a cufft plan
size 16384 fft, is a cufft plan
size 32768 fft, is a cufft plan
size 65536 fft, is a cufft plan
size 131072 fft, is a cufft plan

 )       _   _  _)_ o  _  _
(__ (_( ) ) (_( (_  ( (_ ( 
 not bad for a human...  _)

Multibeam x32f Preview, Cuda 3.0

Work Unit Info:
...............
WU true angle range is :  0.392020

Flopcounter: 45991923541269.156000

Spike count:    3
Pulse count:    0
Triplet count:  2
Gaussian count: 1
called boinc_finish

</stderr_txt>
]]>
______________________________
4h:02min, not bad for the ION chip,
so we know now that the app works on this chipset too.
regards Heinz
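
For comparing several results like the two above, the signal counts and the run time can be pulled out with a few lines of Python. This is a sketch under the assumption that the stderr text always uses the "Spike count:" style lines shown here; the sample string is copied from the result above and the helper names are illustrative.

```python
import re

STDERR_SAMPLE = """\
Spike count:    3
Pulse count:    0
Triplet count:  2
Gaussian count: 1
"""

COUNT_LINE = re.compile(r"^(Spike|Pulse|Triplet|Gaussian) count:\s+(\d+)\s*$",
                        re.MULTILINE)


def signal_counts(stderr_text):
    """Map each signal type to the number reported in the stderr text."""
    return {name: int(value) for name, value in COUNT_LINE.findall(stderr_text)}


def hms(seconds):
    """Format a run time in seconds as hours, minutes and seconds."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}h:{minutes:02d}min:{secs:02d}s"


print(signal_counts(STDERR_SAMPLE))
print(hms(14564.88))
```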
Title: Re: optimized sources
Post by: Jason G on 29 Aug 2010, 06:12:36 am
...
4h:02min, not bad for the ION chip,
so we know now that the app works on this chipset too.
regards Heinz

That's good news heinz.  There are ways we can speed this up for similar onboard GPUs, but knowing that it works OK for now is a good first step.

Jason
Title: Re: optimized sources
Post by: _heinz on 29 Aug 2010, 06:32:07 am
Hi Jason,
I changed the OS from Vista32 to W7 32 and had to make a clean install on this machine.
Now I have to reinstall all my compilers and development programs. I will do it in the next few days so we can explore this chipset a little bit more.
heinz  :)
Title: Re: optimized sources
Post by: Jason G on 29 Aug 2010, 08:09:51 am
Good move heinz. Also due to cheap/good hard drive availability here, I've been gradually migrating my development environment over to a performance raid 10 one, and that influences production workflow far more than I expected.  I'll soon be migrating also, gradually, to VS2008 for primary development, since the nVidia nSight stuff is made to work on that.  Might make life a bit easier if we are on similar platforms/environment.
Title: Re: optimized sources
Post by: Raistmer on 29 Aug 2010, 01:11:02 pm
I'll soon be migrating also, gradually, to VS2008 for primary development, since the nVidia nSight stuff is made to work on that.  Might make life a bit easier if we are on similar platforms/environment.
Yeah, I'm on VS2008 too already :)
Title: Re: optimized sources
Post by: Jason G on 29 Aug 2010, 01:14:30 pm
Yeah, I'm on VS2008 too already :)

Did you play with nSight already ? do Ati have something similar for you to use for openCL ? 
Title: Re: optimized sources
Post by: Raistmer on 29 Aug 2010, 01:21:43 pm
Downloaded it but haven't installed it yet (no NV GPU in the dev host right now).
ATI is far behind with debugging/profiling/support tools, as usual...
Each new release of KernelAnalyser immediately starts a new thread on the AMD forum with newly discovered bugs.
For example, in the latest one I see only 4xxx GPUs; the 5xxx ones are gone, no idea why.... Before, they never restricted me to the actually installed hardware; now only 4xxx are available (though several of them, not only the GPU that is really installed)
Title: Re: optimized sources
Post by: _heinz on 30 Aug 2010, 05:06:08 am
Hi Jason,
ION-results:  http://setiathome.berkeley.edu/results.php?hostid=5510631
<coproc>
            <type>CUDA</type>
            <count>1.0</count>
</coproc>
GTX470-results:  http://setiathome.berkeley.edu/results.php?hostid=4387433
<coproc>
            <type>CUDA</type>
            <count>0.5</count>
</coproc>
__________________________
Modified a bit later:
GTX470-results:  http://setiathome.berkeley.edu/results.php?hostid=4387433
<coproc>
            <type>CUDA</type>
            <count>0.25</count>
</coproc>
No problem running 4 WUs in parallel (times vary from 24 to 37 min)
-----------------------------------------
runs great  ;)
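
A rough sketch of what the <count> values above imply, assuming the usual BOINC reading that <count> is the fraction of one GPU each task reserves, so about 1/count tasks share the GPU. The function name tasks_per_gpu is made up for illustration and is not part of BOINC.

```python
import math
import xml.etree.ElementTree as ET


def tasks_per_gpu(coproc_xml: str) -> int:
    """Return how many tasks can share one GPU for a <coproc> fragment."""
    count = float(ET.fromstring(coproc_xml).findtext("count"))
    # Assumption: each task reserves `count` of a GPU, so floor(1/count) fit.
    # The small epsilon guards against floating-point round-off in 1/count.
    return math.floor(1.0 / count + 1e-9)


for count in ("1.0", "0.5", "0.25"):
    fragment = f"<coproc><type>CUDA</type><count>{count}</count></coproc>"
    print(count, "->", tasks_per_gpu(fragment), "task(s) per GPU")
```

With count 0.25 this gives the 4 parallel WUs reported for the GTX470.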
Title: Re: optimized sources
Post by: Gecko_R7 on 30 Aug 2010, 09:17:25 am
@ Heinz,

Is it worth revisiting updated Atom CPU builds based on more recent source?

Even 5-10% differences are worthwhile when current Atom AP is @ 80 hours.
Title: Re: optimized sources
Post by: _heinz on 30 Aug 2010, 11:02:45 am
@ Heinz,

Is it worth revisiting updated Atom CPU builds based on more recent source?

Even 5-10% differences are worthwhile when current Atom AP is @ 80 hours.

Hi Gecko,
I have several compiled apps for Atom under testing. Once I have reconfigured my Atom R3600 and have all development programs installed again, I will compile some new Atom apps based on the latest source updates. As you know I changed to W7 now, so a lot of updates are necessary.
heinz
Title: Re: optimized sources
Post by: Jason G on 30 Aug 2010, 11:18:26 am
...You know I changed to W7 now, so a lot of updates are necessary....

I am moving to VS2008 w/sp1 Heinz (for Cuda development mostly)  So I'd advise going to that config if you have the option.
Title: Re: optimized sources
Post by: _heinz on 30 Aug 2010, 11:42:23 am
...You know I changed to W7 now, so a lot of updates are necessary....

I am moving to VS2008 w/sp1 Heinz (for Cuda development mostly)  So I'd advise going to that config if you have the option.
Yes, I already have VS2008 and the latest Intel compiler (065) installed. Now I have installed SVN, but I can't find the config file where I had to make the changes for the SVN auto-props.
What's the exact name of this SVN config file, and where can I find it in W7?
Title: Re: optimized sources
Post by: Jason G on 30 Aug 2010, 02:06:04 pm
will see if I can find it

[Edit:] here:

http://lunatics.kwsn.net/5-windows/stock-sah-v6-build-notes-so-far.msg6174.html#msg6174
Title: Re: optimized sources
Post by: Jason G on 30 Aug 2010, 03:46:59 pm
Oh heinz, nearly forgot, sorry:

If you are having trouble locating the application data folder on Win7, you can open a 'My Computer' Windows Explorer window, then type %APPDATA% in the address bar & press enter ... that should take you inside the mysterious hidden bowels of Windows quickly.  Look around from there and you should find the right place (Subversion folder) fairly easily.
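
A minimal sketch of where that lands, assuming the standard Subversion layout in which the per-user runtime config file is %APPDATA%\Subversion\config. The profile path in the example is hypothetical; the function takes the environment as a plain mapping so it can be tried on any OS.

```python
import ntpath


def svn_config_path(env: dict) -> str:
    """Build the per-user Subversion config path from an environment mapping."""
    # ntpath applies Windows path rules regardless of the host OS.
    return ntpath.join(env["APPDATA"], "Subversion", "config")


# Hypothetical Windows 7 profile path, for illustration only.
example_env = {"APPDATA": r"C:\Users\heinz\AppData\Roaming"}
print(svn_config_path(example_env))
```

On a real Windows machine, passing os.environ instead of example_env should give the file to edit; the auto-props settings themselves are in the build notes linked above.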
Title: Re: optimized sources
Post by: _heinz on 30 Aug 2010, 04:34:51 pm
Oh heinz, nearly forgot, sorry:

If you are having trouble locating the application data folder on Win7, you can open a 'My Computer' Windows Explorer window, then type %APPDATA% in the address bar & press enter ... that should take you inside the mysterious hidden bowels of Windows quickly.  Look around from there and you should find the right place (Subversion folder) fairly easily.

I found it this way and made the necessary changes. Thank you very much.  ;)
I have installed TortoiseSVN-1.6.10.19898-win32-svn-1.6.12.
I still have Win7 32-bit on my Atom (because I got an upgrade from Vista 32-bit), not 64-bit, although I have installed the 64-bit components of VS2008 and the Intel compiler. So I can build a 64-bit app but not test it on this machine. No problem, I can test 64-bit apps on the V8-Xeon.
I have not installed VS2005, so I run into trouble with the common libs we placed there.
Is there any news on the 32- and 64-bit common libs you created? Where are they located now?
heinz
Title: Re: optimized sources
Post by: Raistmer on 30 Aug 2010, 04:37:07 pm
Hm.... Atom is x86-only AFAIK...
Title: Re: optimized sources
Post by: Frizz on 30 Aug 2010, 04:47:22 pm
Hm.... Atom is x86-only AFAIK...

Depends  ;)

Support for Intel 64:
- Atom 230, Atom 330
- Pineview (Atom-(N)400- and 500-Series)

No support for Intel 64:
- Atom N270, Atom N280
- Silverthorne (Atom-Z-500-Series)
Title: Re: optimized sources
Post by: _heinz on 30 Aug 2010, 05:25:31 pm
30.08.2010 21:47:59      Processor: 2 GenuineIntel          Intel(R) Atom(TM) CPU  230   @ 1.60GHz [Family 6 Model 28 Stepping 2]

Mine has 64-bit support, but I still have 32-bit W7 installed
Title: Re: optimized sources
Post by: Raistmer on 30 Aug 2010, 05:28:53 pm
Wow, good news! Then my next netbook could be x64 one ;D
Title: Re: optimized sources
Post by: Frizz on 30 Aug 2010, 05:33:56 pm
Wow, good news! Then my next netbook could be x64 one ;D

You don't wanna wait for AMD Ontario? (with APU (=CPU +GPU)) :)
http://www.semiaccurate.com/2010/07/29/early-amd-llano-and-ontario-performance-numbers-tip/
Title: Re: optimized sources
Post by: Raistmer on 30 Aug 2010, 05:44:01 pm
Actually I will wait even more :) I'm happy with my current netbook :)
Title: Re: optimized sources
Post by: Gecko_R7 on 30 Aug 2010, 06:23:18 pm
My N450 is x64 compatible.  ;D
Currently has Win7 Starter, but I did trim back all the bloatware and minimized unnecessary services.
Also increased memory to 2GB.

Have to admit that I like this little netbook.  It's kinda cool.  :P

Acer just announced a new model, Aspire One D255

Quote
The humble webcentric Netbook computer is getting a significant image boost this week following the official launch of Acer's Aspire One D255, which arrives as the very first Netbook to hit the U.S. market armed with a snazzy dual-core Intel Atom processor.
Sporting a typically pocket-friendly price of just $399 USD, the D255 comes equipped with the improved 1.5GHz Atom N550 platform (http://ark.intel.com/Product.aspx?id=50154&code=N550), 1GB of DDR3 RAM (1066MHz), a spacious 250GB hard drive (5400RPM), a 10.1-inch LCD screen with a resolution of 1024 x 600, and the Windows 7 Starter operating system.
Other contributing elements packed into Acer's latest ultra portable computer include integrated Intel GMA 3150 graphics, 802.11b/g/n Wi-Fi connectivity, 10/100 Ethernet, a trio of USB 2.0 ports, VGA out, a capable six-cell 4400mAh battery, a multi-card reader, and chassis colour choices of red, black, aquamarine or sandstone.
According to tech site Liliputing, the Aspire One D255 is presently only on sale through the Home Shopping Network, though we expect more widespread availability is likely just around the corner.

4 threads & DDR3 w/ IMC  ;D

I think my wife needs my N450 now.... lol!
Title: Re: optimized sources
Post by: Raistmer on 31 Aug 2010, 02:43:44 am
ROFL ;D
Well, too small for my eyes (I prefer the 11.6" size) and too little memory (1GB is low for anything above XP ;) ) but ... Acer is going in the right direction anyway ;D
Title: Re: optimized sources
Post by: _heinz on 01 Sep 2010, 05:57:45 pm
CUDA3.1 is now installed on W7 32
~~~~~~~~~~~~~~~~~~~~~~
OCL is working...

oclDeviceQuery.exe Starting...

OpenCL SW Info:

 CL_PLATFORM_NAME:      NVIDIA CUDA
 CL_PLATFORM_VERSION:   OpenCL 1.0 CUDA 3.1.1
 OpenCL SDK Revision:   6161726


OpenCL Device Info:

 1 devices found supporting OpenCL:

 ---------------------------------
 Device ION
 ---------------------------------
  CL_DEVICE_NAME:                       ION
  CL_DEVICE_VENDOR:                     NVIDIA Corporation
  CL_DRIVER_VERSION:                    258.96
  CL_DEVICE_TYPE:                       CL_DEVICE_TYPE_GPU
  CL_DEVICE_MAX_COMPUTE_UNITS:          2
  CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS:   3
  CL_DEVICE_MAX_WORK_ITEM_SIZES:        512 / 512 / 64
  CL_DEVICE_MAX_WORK_GROUP_SIZE:        512
  CL_DEVICE_MAX_CLOCK_FREQUENCY:        1100 MHz
  CL_DEVICE_ADDRESS_BITS:               32
  CL_DEVICE_MAX_MEM_ALLOC_SIZE:         128 MByte
  CL_DEVICE_GLOBAL_MEM_SIZE:            241 MByte
  CL_DEVICE_ERROR_CORRECTION_SUPPORT:   no
  CL_DEVICE_LOCAL_MEM_TYPE:             local
  CL_DEVICE_LOCAL_MEM_SIZE:             16 KByte
  CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE:   64 KByte
  CL_DEVICE_QUEUE_PROPERTIES:           CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE
  CL_DEVICE_QUEUE_PROPERTIES:           CL_QUEUE_PROFILING_ENABLE
  CL_DEVICE_IMAGE_SUPPORT:              1
  CL_DEVICE_MAX_READ_IMAGE_ARGS:        128
  CL_DEVICE_MAX_WRITE_IMAGE_ARGS:       8
  CL_DEVICE_SINGLE_FP_CONFIG:           INF-quietNaNs round-to-nearest round-to-zero round-to-inf fma

  CL_DEVICE_IMAGE <dim>                 2D_MAX_WIDTH     8192
                                        2D_MAX_HEIGHT    8192
                                        3D_MAX_WIDTH     2048
                                        3D_MAX_HEIGHT    2048
                                        3D_MAX_DEPTH     2048

  CL_DEVICE_EXTENSIONS:                 cl_khr_byte_addressable_store
                                        cl_khr_icd
                                        cl_khr_gl_sharing
                                        cl_nv_d3d9_sharing
                                        cl_nv_d3d10_sharing
                                        cl_khr_d3d10_sharing
                                        cl_nv_d3d11_sharing
                                        cl_nv_compiler_options
                                        cl_nv_device_attribute_query
                                        cl_nv_pragma_unroll
                                        cl_khr_global_int32_base_atomics
                                        cl_khr_global_int32_extended_atomics


  CL_DEVICE_COMPUTE_CAPABILITY_NV:      1.1
  NUMBER OF MULTIPROCESSORS:            2
  NUMBER OF CUDA CORES:                 16
  CL_DEVICE_REGISTERS_PER_BLOCK_NV:     8192
  CL_DEVICE_WARP_SIZE_NV:               32
  CL_DEVICE_GPU_OVERLAP_NV:             CL_FALSE
  CL_DEVICE_KERNEL_EXEC_TIMEOUT_NV:     CL_TRUE
  CL_DEVICE_INTEGRATED_MEMORY_NV:       CL_TRUE
  CL_DEVICE_PREFERRED_VECTOR_WIDTH_<t>  CHAR 1, SHORT 1, INT 1, LONG 1, FLOAT 1, DOUBLE 0


  ---------------------------------
  2D Image Formats Supported (71)
  ---------------------------------
  #     Channel Order   Channel Type

  1     CL_R            CL_FLOAT
  2     CL_R            CL_HALF_FLOAT
  3     CL_R            CL_UNORM_INT8
  4     CL_R            CL_UNORM_INT16
  5     CL_R            CL_SNORM_INT16
  6     CL_R            CL_SIGNED_INT8
  7     CL_R            CL_SIGNED_INT16
  8     CL_R            CL_SIGNED_INT32
  9     CL_R            CL_UNSIGNED_INT8
  10    CL_R            CL_UNSIGNED_INT16
  11    CL_R            CL_UNSIGNED_INT32
  12    CL_A            CL_FLOAT
  13    CL_A            CL_HALF_FLOAT
  14    CL_A            CL_UNORM_INT8
  15    CL_A            CL_UNORM_INT16
  16    CL_A            CL_SNORM_INT16
  17    CL_A            CL_SIGNED_INT8
  18    CL_A            CL_SIGNED_INT16
  19    CL_A            CL_SIGNED_INT32
  20    CL_A            CL_UNSIGNED_INT8
  21    CL_A            CL_UNSIGNED_INT16
  22    CL_A            CL_UNSIGNED_INT32
  23    CL_RG           CL_FLOAT
  24    CL_RG           CL_HALF_FLOAT
  25    CL_RG           CL_UNORM_INT8
  26    CL_RG           CL_UNORM_INT16
  27    CL_RG           CL_SNORM_INT16
  28    CL_RG           CL_SIGNED_INT8
  29    CL_RG           CL_SIGNED_INT16
  30    CL_RG           CL_SIGNED_INT32
  31    CL_RG           CL_UNSIGNED_INT8
  32    CL_RG           CL_UNSIGNED_INT16
  33    CL_RG           CL_UNSIGNED_INT32
  34    CL_RA           CL_FLOAT
  35    CL_RA           CL_HALF_FLOAT
  36    CL_RA           CL_UNORM_INT8
  37    CL_RA           CL_UNORM_INT16
  38    CL_RA           CL_SNORM_INT16
  39    CL_RA           CL_SIGNED_INT8
  40    CL_RA           CL_SIGNED_INT16
  41    CL_RA           CL_SIGNED_INT32
  42    CL_RA           CL_UNSIGNED_INT8
  43    CL_RA           CL_UNSIGNED_INT16
  44    CL_RA           CL_UNSIGNED_INT32
  45    CL_RGBA         CL_FLOAT
  46    CL_RGBA         CL_HALF_FLOAT
  47    CL_RGBA         CL_UNORM_INT8
  48    CL_RGBA         CL_UNORM_INT16
  49    CL_RGBA         CL_SNORM_INT16
  50    CL_RGBA         CL_SIGNED_INT8
  51    CL_RGBA         CL_SIGNED_INT16
  52    CL_RGBA         CL_SIGNED_INT32
  53    CL_RGBA         CL_UNSIGNED_INT8
  54    CL_RGBA         CL_UNSIGNED_INT16
  55    CL_RGBA         CL_UNSIGNED_INT32
  56    CL_BGRA         CL_UNORM_INT8
  57    CL_BGRA         CL_SIGNED_INT8
  58    CL_BGRA         CL_UNSIGNED_INT8
  59    CL_ARGB         CL_UNORM_INT8
  60    CL_ARGB         CL_SIGNED_INT8
  61    CL_ARGB         CL_UNSIGNED_INT8
  62    CL_INTENSITY    CL_FLOAT
  63    CL_INTENSITY    CL_HALF_FLOAT
  64    CL_INTENSITY    CL_UNORM_INT8
  65    CL_INTENSITY    CL_UNORM_INT16
  66    CL_INTENSITY    CL_SNORM_INT16
  67    CL_LUMINANCE    CL_FLOAT
  68    CL_LUMINANCE    CL_HALF_FLOAT
  69    CL_LUMINANCE    CL_UNORM_INT8
  70    CL_LUMINANCE    CL_UNORM_INT16
  71    CL_LUMINANCE    CL_SNORM_INT16

  ---------------------------------
  3D Image Formats Supported (71)
  ---------------------------------
  #     Channel Order   Channel Type

  1     CL_R            CL_FLOAT
  2     CL_R            CL_HALF_FLOAT
  3     CL_R            CL_UNORM_INT8
  4     CL_R            CL_UNORM_INT16
  5     CL_R            CL_SNORM_INT16
  6     CL_R            CL_SIGNED_INT8
  7     CL_R            CL_SIGNED_INT16
  8     CL_R            CL_SIGNED_INT32
  9     CL_R            CL_UNSIGNED_INT8
  10    CL_R            CL_UNSIGNED_INT16
  11    CL_R            CL_UNSIGNED_INT32
  12    CL_A            CL_FLOAT
  13    CL_A            CL_HALF_FLOAT
  14    CL_A            CL_UNORM_INT8
  15    CL_A            CL_UNORM_INT16
  16    CL_A            CL_SNORM_INT16
  17    CL_A            CL_SIGNED_INT8
  18    CL_A            CL_SIGNED_INT16
  19    CL_A            CL_SIGNED_INT32
  20    CL_A            CL_UNSIGNED_INT8
  21    CL_A            CL_UNSIGNED_INT16
  22    CL_A            CL_UNSIGNED_INT32
  23    CL_RG           CL_FLOAT
  24    CL_RG           CL_HALF_FLOAT
  25    CL_RG           CL_UNORM_INT8
  26    CL_RG           CL_UNORM_INT16
  27    CL_RG           CL_SNORM_INT16
  28    CL_RG           CL_SIGNED_INT8
  29    CL_RG           CL_SIGNED_INT16
  30    CL_RG           CL_SIGNED_INT32
  31    CL_RG           CL_UNSIGNED_INT8
  32    CL_RG           CL_UNSIGNED_INT16
  33    CL_RG           CL_UNSIGNED_INT32
  34    CL_RA           CL_FLOAT
  35    CL_RA           CL_HALF_FLOAT
  36    CL_RA           CL_UNORM_INT8
  37    CL_RA           CL_UNORM_INT16
  38    CL_RA           CL_SNORM_INT16
  39    CL_RA           CL_SIGNED_INT8
  40    CL_RA           CL_SIGNED_INT16
  41    CL_RA           CL_SIGNED_INT32
  42    CL_RA           CL_UNSIGNED_INT8
  43    CL_RA           CL_UNSIGNED_INT16
  44    CL_RA           CL_UNSIGNED_INT32
  45    CL_RGBA         CL_FLOAT
  46    CL_RGBA         CL_HALF_FLOAT
  47    CL_RGBA         CL_UNORM_INT8
  48    CL_RGBA         CL_UNORM_INT16
  49    CL_RGBA         CL_SNORM_INT16
  50    CL_RGBA         CL_SIGNED_INT8
  51    CL_RGBA         CL_SIGNED_INT16
  52    CL_RGBA         CL_SIGNED_INT32
  53    CL_RGBA         CL_UNSIGNED_INT8
  54    CL_RGBA         CL_UNSIGNED_INT16
  55    CL_RGBA         CL_UNSIGNED_INT32
  56    CL_BGRA         CL_UNORM_INT8
  57    CL_BGRA         CL_SIGNED_INT8
  58    CL_BGRA         CL_UNSIGNED_INT8
  59    CL_ARGB         CL_UNORM_INT8
  60    CL_ARGB         CL_SIGNED_INT8
  61    CL_ARGB         CL_UNSIGNED_INT8
  62    CL_INTENSITY    CL_FLOAT
  63    CL_INTENSITY    CL_HALF_FLOAT
  64    CL_INTENSITY    CL_UNORM_INT8
  65    CL_INTENSITY    CL_UNORM_INT16
  66    CL_INTENSITY    CL_SNORM_INT16
  67    CL_LUMINANCE    CL_FLOAT
  68    CL_LUMINANCE    CL_HALF_FLOAT
  69    CL_LUMINANCE    CL_UNORM_INT8
  70    CL_LUMINANCE    CL_UNORM_INT16
  71    CL_LUMINANCE    CL_SNORM_INT16

oclDeviceQuery, Platform Name = NVIDIA CUDA, Platform Version = OpenCL 1.0 CUDA 3.1.1, SDK Revision = 6161726, NumDevs = 1, Device = ION

System Info:

 Local Time/Date = 23:50:56, 9/1/2010
 CPU Arch: 0
 CPU Level: 6
 # of CPU processors: 2
 Windows Build: 7600
 Windows Ver: 6.1 (Windows Vista / Windows 7)


PASSED


Press <Enter> to Quit...
-----------------------------------------------------------
next step to compile CUDA app

heinz
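
As a cross-check of the device query above: compute capability 1.x GPUs have 8 CUDA cores per multiprocessor, which is where the 16 cores on 2 multiprocessors come from. A small sketch (the table and function name are illustrative only, and cover just the 1.x generation):

```python
# CUDA cores per multiprocessor, keyed by compute capability major version.
# Only the 1.x generation is listed; later generations differ per minor version.
CORES_PER_SM = {1: 8}


def cuda_cores(multiprocessors: int, compute_capability: str) -> int:
    """Total CUDA cores for a device with the given multiprocessor count."""
    major = int(compute_capability.split(".")[0])
    return multiprocessors * CORES_PER_SM[major]


# Values reported for the ION: 2 multiprocessors, compute capability 1.1.
print(cuda_cores(2, "1.1"))
```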
Title: Re: optimized sources
Post by: _heinz on 03 Sep 2010, 05:04:39 pm
Now "Parallel Studio" (Parallel Composer update6) and latest Intel and Fortran compiler 11.1.067 are installed and working.
1>Compiling with Intel(R) C++ 11.1.067 [IA-32]... (Intel C++ Environment)
1>ap_client_main.cpp
.....
So far the multi-developersystem is working now on the R3600 Atom.
heinz
Title: Re: optimized sources
Post by: _heinz on 05 Sep 2010, 06:17:02 pm
CUDA 3.1 on W7 was a hard nut to crack; the env vars were not set by the setup as expected.
After sorting it out and making the entries by hand, it is running now.
Qxh compiled
compiled with:2011 beta update 1
configuration:(Release) Platform(Win32)
g2011_Qxh_ATOM_fft  started
23:55:22.036
g2011_Qxh_ATOM_fft.exe

Device: ION, 1100 MHz clock, 242 MB memory.
Compiled with CUDA 3010.
             --------CUFFT-------  ---This prototype---  ---two way---
   N   Batch Gflop/s  GB/s  error  Gflop/s  GB/s  error  Gflop/s error
   8 1048576    4.3    4.6   1.5      6.8    7.3   1.6      6.9   3.0
  16  524288    5.7    4.6   1.7      7.1    5.7   1.4      7.0   2.3
  64  131072    8.6    4.6   1.7     10.3    5.5   2.2     10.3   3.4
 256   32768   10.0    4.0   2.0      9.4    3.8   2.0      9.5   3.5
 512   16384   10.4    3.7   2.1     12.4    4.4   2.5     12.4   4.2
1024    8192    9.0    2.9   2.5      9.1    2.9   2.4      9.1   4.5
2048    4096    8.5    2.5   2.7      8.8    2.6   3.0      8.9   5.1
4096    2048    7.0    1.9   2.4     10.1    2.7   3.3     10.2   5.4
8192    1024    6.4    1.6   2.4      9.5    2.3   3.4      9.5   5.7

Errors are supposed to be of order of 1 (ULPs).

elapsed time=64 seconds
23:56:26.683
g2011_Qxh_ATOM_fft ended
-------------------------------------------------
compiled with:MS Compiler
configuration:(Release) Platform(Win32)
gmsc_fft  started
23:30:06.184
gmsc_fft.exe

Device: ION, 1100 MHz clock, 242 MB memory.
Compiled with CUDA 3000.
             --------CUFFT-------  ---This prototype---  ---two way---
   N   Batch Gflop/s  GB/s  error  Gflop/s  GB/s  error  Gflop/s error
   8 1048576    0.5    0.5   1.8      6.9    7.3   1.6      6.9   2.1
  16  524288    1.0    0.8   2.2      7.0    5.6   1.5      7.1   1.9
  64  131072    5.8    3.1   1.7     10.3    5.5   2.4     10.3   3.0
 256   32768    8.6    3.4   1.7      9.5    3.8   1.9      9.4   2.9
 512   16384   13.2    4.7   2.1     12.5    4.4   2.5     12.4   3.7
1024    8192    9.5    3.0   2.3     10.6    3.4   2.4     10.6   3.9
2048    4096    8.6    2.5   2.6      8.8    2.6   3.0      8.9   4.5
4096    2048    7.4    2.0   2.2     10.1    2.7   3.3     10.2   4.9
8192    1024    9.1    2.3   2.8      9.6    2.4   3.4      9.5   5.2

Errors are supposed to be of order of 1 (ULPs).

elapsed time=249 seconds
23:34:15.909
gmsc_fft ended
-------------------------------------------------
The MS compiler build with CUDA 3000 needs about 4 times as long for the same job (249 vs. 64 seconds elapsed).
heinz
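
The factor can be checked from the start/end timestamps in the two logs above. A small sketch (the helper name is illustrative, and it assumes a run does not cross midnight). Note that the two builds also report different CUDA versions (3010 vs. 3000), so the gap may not come from the host compiler alone.

```python
from datetime import datetime


def elapsed_seconds(start: str, end: str) -> float:
    """Seconds between two HH:MM:SS.mmm timestamps on the same day."""
    fmt = "%H:%M:%S.%f"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds()


# Timestamps copied from the two bench logs above.
icc_run = elapsed_seconds("23:55:22.036", "23:56:26.683")  # g2011_Qxh_ATOM_fft
msc_run = elapsed_seconds("23:30:06.184", "23:34:15.909")  # gmsc_fft

print(f"ICC build: {icc_run:.3f} s")
print(f"MSC build: {msc_run:.3f} s")
print(f"MSC / ICC: {msc_run / icc_run:.2f} x")
```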
Title: Re: optimized sources
Post by: Jason G on 05 Sep 2010, 06:48:10 pm
Hi Heinz,

Numbers come out different when you change to the same data set size that Multibeam apps use ( 1*1024*1024 complex data points).
CUFFT is not very fast at the small sizes for that small amount of data.  It gets better relatively as the FFT size goes up.  I haven't optimised these custom ones (So they remain ~G80 GPU arranged), but did change the results to give in-order output.  Didn't need two-way, so made forward & inverse transforms instead.

You can see CUFFT goes pretty slowly when doing many small transforms on our smaller dataset.

Device: GeForce GTX 480, 810 MHz clock, 1503 MB memory.
Compiled with CUDA 3000.
             --------CUFFT-------  ---FFT--------------  ---IFFT------
   N   Batch Gflop/s  GB/s  error  Gflop/s  GB/s  error  Gflop/s error
   8  131072    7.1    7.6   1.4    140.0  149.4   1.1    140.5   1.1
  16   65536   16.1   12.9   1.7    183.1  146.5   1.0    183.7   1.0
  64   16384  259.2  138.2   1.4    280.0  149.4   1.4    279.7   1.4
 256    4096  352.2  140.9   1.4    352.8  141.1   1.5    352.0   1.5
 512    2048  413.3  146.9   1.8    411.8  146.4   1.8    412.2   1.8

Errors are supposed to be of order of 1 (ULPs).
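
The benchmark source is not shown in this thread, so the following is an assumption: the Gflop/s and GB/s columns are consistent with the conventional FFT flop count of 5*N*log2(N) per transform and with each single-precision complex point being read once and written once. A sketch that converts one column into the other under those assumptions:

```python
import math

BYTES_PER_POINT = 8  # one single-precision complex value (2 x 4 bytes)


def fft_flops(n: int, batch: int) -> float:
    """Nominal flop count: 5 * N * log2(N) per transform (assumed convention)."""
    return 5.0 * n * math.log2(n) * batch


def fft_bytes(n: int, batch: int) -> float:
    """Bytes moved if every point is read once and written once."""
    return 2.0 * BYTES_PER_POINT * n * batch


def gbps_from_gflops(gflops: float, n: int) -> float:
    """Convert a Gflop/s figure to GB/s; the batch size cancels out."""
    return gflops * fft_bytes(n, 1) / fft_flops(n, 1)


# Gflop/s values taken from the custom FFT column of the GTX 480 table.
for n, gflops in [(8, 140.0), (16, 183.1), (256, 352.8)]:
    print(n, round(gbps_from_gflops(gflops, n), 1))
```

The results land within rounding of the GB/s column, which also shows why the small sizes are bandwidth-bound: at N=8 there are fewer than one flop per byte moved.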


Title: Re: optimized sources
Post by: _heinz on 06 Sep 2010, 08:35:37 pm
thanks Jason, will try it as you said.

The modified bench shows how the runtime depends on the compiler and the compiler options. The reference is the MSC compiler build. The source is always the same.

CLEANUP DONE

 1 reference science app(s) found
   └─(gmsc_fft.exe)

 7 science app(s) found
   └─(g054_fft.exe)
   └─(g060_fft.exe)
   └─(g065_fft.exe)
   └─(g2011_fft.exe)
   └─(g2011_Qxh_fft.exe)
   └─(g2011_SSSE3_fft.exe)
   └─(gcomp_u6_fft.exe)

======================================

------------------------
Running app : gmsc_fft.exe
Started at  : 00:39:39.679
Ended at    : 00:41:06.070
     86.346 secs Elapsed
     85.703 secs CPU time
Result      : stored as reference.
------------------------
Running app : g054_fft.exe
Started at  : 00:41:06.137
Ended at    : 00:41:25.189
     19.008 secs Elapsed
     26.172 secs CPU time
Speedup     : 69.46%
Ratio       : 3.27 x
------------------------
Running app : g060_fft.exe
Started at  : 00:41:25.318
Ended at    : 00:41:44.833
     19.466 secs Elapsed
     18.844 secs CPU time
Speedup     : 78.01%
Ratio       : 4.55 x
------------------------
Running app : g065_fft.exe
Started at  : 00:41:44.967
Ended at    : 00:42:04.481
     19.467 secs Elapsed
     18.813 secs CPU time
Speedup     : 78.05%
Ratio       : 4.56 x
------------------------
Running app : g2011_fft.exe
Started at  : 00:42:04.602
Ended at    : 00:42:23.763
     19.117 secs Elapsed
     18.391 secs CPU time
Speedup     : 78.54%
Ratio       : 4.66 x
------------------------
Running app : g2011_Qxh_fft.exe
Started at  : 00:42:23.896
Ended at    : 00:42:42.663
     18.722 secs Elapsed
     26.109 secs CPU time
Speedup     : 69.54%
Ratio       : 3.28 x
------------------------
Running app : g2011_SSSE3_fft.exe
Started at  : 00:42:42.786
Ended at    : 00:43:01.603
     18.774 secs Elapsed
     24.594 secs CPU time
Speedup     : 71.30%
Ratio       : 3.48 x
------------------------
Running app : gcomp_u6_fft.exe
Started at  : 00:43:01.725
Ended at    : 00:43:21.231
     19.463 secs Elapsed
     19.047 secs CPU time
Speedup     : 77.78%
Ratio       : 4.50 x
------------------------

Collecting hardware / OS infos, please wait...
Sorting ...

Bench results file V8-SK01-07.09.2010-046-bench.txt
stored in .\Testdatas\ directory.

Quick timetable
--------------------------------------
gmsc_fft.exe : 85.703 secs CPU
Result      : stored as reference.
--------------------------------------
g054_fft.exe : 26.172 secs CPU
Speedup     : 69.46%
Ratio       : 3.27 x
--------------------------------------
g060_fft.exe : 18.844 secs CPU
Speedup     : 78.01%
Ratio       : 4.55 x
--------------------------------------
g065_fft.exe : 18.813 secs CPU
Speedup     : 78.05%
Ratio       : 4.56 x
--------------------------------------
g2011_fft.exe : 18.391 secs CPU
Speedup     : 78.54%
Ratio       : 4.66 x
--------------------------------------
g2011_Qxh_fft.exe : 26.109 secs CPU
Speedup     : 69.54%
Ratio       : 3.28 x
--------------------------------------
g2011_SSSE3_fft.exe : 24.594 secs CPU
Speedup     : 71.30%
Ratio       : 3.48 x
--------------------------------------
gcomp_u6_fft.exe : 19.047 secs CPU
Speedup     : 77.78%
Ratio       : 4.50 x
--------------------------------------

======================================
heinz
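
The Speedup and Ratio figures in the bench output above appear to be derived from CPU time, not elapsed time. A small sketch that reproduces them under that assumption (the function names are illustrative, not the bench script's):

```python
def speedup_percent(ref_cpu_s: float, app_cpu_s: float) -> float:
    """CPU time saved relative to the reference, in percent."""
    return (1.0 - app_cpu_s / ref_cpu_s) * 100.0


def ratio(ref_cpu_s: float, app_cpu_s: float) -> float:
    """How many times faster the app is than the reference."""
    return ref_cpu_s / app_cpu_s


REF_CPU = 85.703  # gmsc_fft.exe CPU seconds

# CPU seconds copied from the quick timetable above.
for name, cpu in [("g054_fft.exe", 26.172), ("g060_fft.exe", 18.844)]:
    print(f"{name}: {speedup_percent(REF_CPU, cpu):.2f}%  {ratio(REF_CPU, cpu):.2f} x")
```

Measured by elapsed time instead, all seven builds finish in about 19 seconds, so the differences between them are mostly in host CPU load.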
Title: Re: optimized sources
Post by: _heinz on 07 Sep 2010, 03:38:45 pm
Hi Heinz,

Numbers come out different when you change to the same data set size that Multibeam apps use ( 1*1024*1024 complex data points).
CUFFT is not very fast at the small sizes for that small amount of data.  It gets better relatively as the FFT size goes up.  I haven't optimised these custom ones (So they remain ~G80 GPU arranged), but did change the results to give in-order output.  Didn't need two-way, so made forward & inverse transforms instead.

You can see CUFFT goes pretty slowly when doing many small transforms on our smaller dataset.

Device: GeForce GTX 480, 810 MHz clock, 1503 MB memory.
Compiled with CUDA 3000.
             --------CUFFT-------  ---FFT--------------  ---IFFT------
   N   Batch Gflop/s  GB/s  error  Gflop/s  GB/s  error  Gflop/s error
   8  131072    7.1    7.6   1.4    140.0  149.4   1.1    140.5   1.1
  16   65536   16.1   12.9   1.7    183.1  146.5   1.0    183.7   1.0
  64   16384  259.2  138.2   1.4    280.0  149.4   1.4    279.7   1.4
 256    4096  352.2  140.9   1.4    352.8  141.1   1.5    352.0   1.5
 512    2048  413.3  146.9   1.8    411.8  146.4   1.8    412.2   1.8

Errors are supposed to be of order of 1 (ULPs).



Hi Jason,
compiled a g2011_QxSSE3_ATOM_fft_small with
//    int n_entries = 8*1024*1024;
   int n_entries = 1*1024*1024;
I can't see any big difference in the Gflop/s compared with the full matrix.
But to confirm this I will run a test series for all matrices from 1*1024*1024 to 8*1024*1024.
Log attached.
heinz
Title: Re: optimized sources
Post by: Jason G on 07 Sep 2010, 04:11:44 pm
Aha. Yes, there will be some consideration of how the FFTs fit on the GPU.  Of course the 480 can fit more at once, so the small dataset is underutilising the GPU.  It's pretty clear then that for the larger GPUs concurrent streams must be used with the small dataset.

It will be interesting to make a modified test to do 4 or 16 of the smaller batches of FFTs at the same time on Fermi to see if that'll bring GPU usage up.
Title: Re: optimized sources
Post by: _heinz on 07 Sep 2010, 04:52:12 pm
Hi Jason,
I ran gp_fft_1-8 on the ION (R3600 Atom).
If you would like a closer look, the result file is attached.
Edit:
I compiled 1-8 on the Xeon (GTX470) and ran it to see if there are general differences from the ION.
Have a look at the runtimes;
the result file is attached.
A bit later:
I ran GPU-Z while 1-8 were running. It shows that at the beginning, while the short 1, 2, 3, 4 are running, GPU usage starts at 95% and then slowly falls back to 40% when 8 is running.
look at the picture here (http://www.britta-d.de/images/gtx470/gtx470_fft_1-8_gpu_load.jpg)

The ION shows a completely different picture: 1-4 show about 70% at the beginning of 1, but then it turns into spikes and periods of no GPU usage. That means the necessary calculations between the batches take too much time to feed the GPU continuously.
ion_fft_1-4_gpu_load (http://www.britta-d.de/images/gtx470/ion_fft_1-4_gpu_load.jpg)
ion_fft_6-7_gpu_load (http://www.britta-d.de/images/gtx470/ion_fft_6-7_gpu_load.jpg)
heinz
Title: Re: optimized sources
Post by: Raistmer on 08 Sep 2010, 02:38:41 am
To emulate the current MB FFT situation it's worth testing FFTs in batches where num_of_ffts*size_of_fft is always == 1M points.
That is, a small FFT size means a big number of FFTs that can be done at once, whereas large FFTs come in smaller numbers.
If a few cfft pairs are unrolled the rule stays the same; just replace 1M with 1M*number_of_unrolled_cfft_pairs.
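
A minimal sketch of that rule, with illustrative names (batch_for and unrolled_pairs are not from the app source): the batch count is the total number of points divided by the FFT size, and unrolling cfft pairs just scales the total.

```python
POINTS = 1024 * 1024  # the 1M complex points used per multibeam pass


def batch_for(fft_size: int, unrolled_pairs: int = 1) -> int:
    """Number of FFTs of `fft_size` that cover the whole dataset."""
    total = POINTS * unrolled_pairs
    if total % fft_size:
        raise ValueError("fft_size must divide the total number of points")
    return total // fft_size


for n in (8, 16, 64, 256, 512):
    print(n, batch_for(n))
```

The printed batches match the Batch column of the GTX 480 table earlier in the thread, and batch_for(8, 8) gives the 1048576 used in the original 8*1024*1024 bench.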
Title: Re: optimized sources
Post by: Jason G on 08 Sep 2010, 02:55:38 am
Yes, it's interesting that Heinz's ION shows no speed difference with changes to the batch to reflect the 1M points we use in multibeam. 

The smaller ION GPU seems already filled with this smaller batch.  I have before made a modified version of this test that sticks to 1M points, but chains it with the getpowerspectrum kernel like in the app, and includes flops for that accordingly, so it better matches what we'll need for profiling / refinement.  That one I'll have to dig out from backups, due to recent OS reinstall, and it has the small size freaky powerspectrum prototypes in it against stock method.

All that is clear so far is that 1M points doesn't fill the 480 here, so concurrency at the chirp rate level may be necessary for larger cards.

Curiously, nSight measures my smaller sized FFTs at .33 occupancy, and the CuFFT ones at .17 , and I tuned those for generic compute capability 1.0 devices.  That would seem to further indicate to me that CuFFT is really meant for larger batches than our 1M points ... (Or is being used incorrectly somehow  :o)
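
For context, occupancy is the ratio of warps resident on a multiprocessor to the hardware maximum. A sketch under stated assumptions: the measurements were taken on the GTX 480 (compute capability 2.0, up to 48 resident warps per multiprocessor), and the warp counts below are back-calculated guesses, not values reported in this thread.

```python
MAX_WARPS_CC20 = 48  # resident warps per multiprocessor, compute capability 2.0


def occupancy(active_warps: int, max_warps: int = MAX_WARPS_CC20) -> float:
    """Fraction of the multiprocessor's warp slots that are in use."""
    return active_warps / max_warps


# Hypothetical warp counts that would give the reported figures.
print(round(occupancy(16), 2))  # custom FFT kernels
print(round(occupancy(8), 2))   # CUFFT kernels
```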

Title: Re: optimized sources
Post by: _heinz on 08 Sep 2010, 06:20:14 am
To make things clearer I compared the GPU load of 1 against 8.
The first compact part you see is 1; after that comes 8 in exactly 9 pieces. The maximum value in the middle is at 512.

Device: ION, 1100 MHz clock, 242 MB memory.
Compiled with CUDA 3010.
             --------CUFFT-------  ---This prototype---  ---two way---
   N   Batch Gflop/s  GB/s  error  Gflop/s  GB/s  error  Gflop/s error
   8 1048576    4.3    4.6   1.5      6.7    7.2   1.6      6.8   3.0
  16  524288    5.5    4.4   1.7      7.0    5.6   1.4      7.0   2.3
  64  131072    8.6    4.6   1.7     10.3    5.5   2.2     10.3   3.4
 256   32768   10.0    4.0   2.0      9.5    3.8   2.0      9.7   3.5
 512   16384   10.5    3.7   2.1     12.4    4.4   2.5     12.4   4.2
1024    8192    9.0    2.9   2.5      9.1    2.9   2.4      9.0   4.5
2048    4096    8.5    2.5   2.7      8.8    2.6   3.0      8.8   5.1
4096    2048    7.0    1.9   2.4      9.9    2.7   3.3     10.2   5.4
8192    1024    6.4    1.6   2.4      9.5    2.3   3.4      9.4   5.7

You can clearly see how the 9 batches show up in the GPU load on the ION:
 ion_fft_1_and_8_gpu_load (http://www.britta-d.de/images/gtx470/ion_fft_1_and_8_gpu_load.jpg)

heinz
Title: Re: optimized sources
Post by: Jason G on 08 Sep 2010, 11:09:21 am
You need nSight Heinz:

Title: Re: optimized sources
Post by: _heinz on 08 Sep 2010, 04:58:19 pm
Hi Jason,
I'm downloading nSight now. Do I need a Standard license code or a Professional one?
Btw, 2011 is installed on the R3600 Atom now.
heinz
Title: Re: optimized sources
Post by: Jason G on 08 Sep 2010, 06:16:41 pm
They are giving away a short pro license with the free standard one I think.  I entered the pro one to try out, but I'm not sure what the features difference is.

Also note that it must be used on Visual Studio 2008 with Service pack 1 for now. 
Title: Re: optimized sources
Post by: _heinz on 15 Sep 2010, 05:35:45 pm
I have now asked NVIDIA to make me a registered developer. At the beginning of CUDA I was already a registered developer, but when they reorganized their websites I lost my status and a new registration became necessary.
Now I'm waiting for an answer.

heinz
Title: Re: optimized sources
Post by: _heinz on 15 Sep 2010, 05:57:47 pm
Also note that it must be used on Visual Studio 2008 with Service pack 1 for now. 
I tried to install "Service Pack 1", but it says Service Pack 1 is already included in "VS2008 Professional" and aborts the operation.
When I then try to install "Parallel_Nsight_Host_Win32_1.0.10200 (Jul 2010)", it says Service Pack 1 is not installed and rolls back the installation.
Rather curious...
Any ideas?
Perhaps I should post this in the NVIDIA forums.
heinz
Title: Re: optimized sources
Post by: Jason G on 15 Sep 2010, 06:14:27 pm
Yeah, really weird heinz  :o  my installation was from raw VS2008, then applied the service pack, then installed nSight no problems.  Maybe they have some slight problem with SP1 integrated version.  My guess is you wouldn't be the first to see this issue and you'll find a workaround/fix on the forum already.
Title: Re: optimized sources
Post by: _heinz on 18 Sep 2010, 02:01:09 pm
Hi Jason,
I found the issue. Because several security updates for VS2008 had already been installed (automatically), I had to run VS2008-PatchRemovalTool-x86 before installing VS2008SP1.
After VS2008SP1 installed successfully, I installed Parallel_Nsight_Host_Win32_1.0.10200 (Jul 2010)
and Parallel_Nsight_Monitor_Win32_1.0.10200 (Jul 2010).
I registered and got a Standard and a Professional license.
The Professional license expires on October 1st, 2010.
The Professional license is activated now.
That is really a short period until October 1st, 2010; we don't have a lot of time.

Heinz
Title: Re: optimized sources
Post by: _heinz on 22 Sep 2010, 06:02:56 pm
The pleasure was a short one. When I profiled the FFT app, a DOS window opened and a crash occurred. No further use of VS2008 was possible (a hardware reset was necessary). Because of this I uninstalled Nsight and the complete dev environment. I ordered a 2.5" SEAGATE Momentus XT 500GB 7200.1 32MB to boost the performance of the R3600, and "Acronis True Image Home 2011" to work in "try and decide" mode. This is necessary to avoid difficulties with the different compiler packages (Parallel Studio 2011 and ICC Professional).

heinz 
 
Title: Re: optimized sources
Post by: _heinz on 27 Sep 2010, 09:07:00 pm
Today I installed ICC (067) and successfully compiled ap rev 443 on the R3600 Atom.

heinz
Title: Re: optimized sources
Post by: _heinz on 28 Sep 2010, 10:22:28 am
performance of the new hybrid disk on R3600
ST950056_20AS_readtest (http://www.britta-d.de/images/r3600/ST950056_20AS_readtest.jpg)
FW:SD23
The disk is attached externally via eSATA in a "Revo Alu Guard" case
and used as a data storage and backup system.
Later I will use it to install W7 64-bit on partition 1 (180GB).
heinz
Title: Re: optimized sources
Post by: cristipurdel on 28 Sep 2010, 02:52:39 pm
Question:
Is there an app that shows which application uses the different types of optimization?
For example, when I run seti stock, I want to see what optimization is in the application, and when I run the optimized application from lunatics installer, I want to see the optimizations (e.g. SSSE3, ...)
Title: Re: optimized sources
Post by: _heinz on 28 Sep 2010, 04:10:15 pm
@cristipurdel
As far as I know, the Unified Installer v0.37 has a CPU detection mechanism, but you can also choose your app and the optimization (SSSE3) yourself, as you can see here -->
Unified_Installer_v0.37 (http://www.britta-d.de/images/r3600/Unified_Installer_v0.37.jpg)
Perhaps Jason can tell you more about the installer's mechanism.
You can also see it by looking at the stderr output of the WUs processed on your host.
heinz
Title: Re: optimized sources
Post by: cristipurdel on 28 Sep 2010, 04:16:17 pm
Sorry, for not being more precise. I'm interested in a general application that can detect the cpu optimizations. The lunatics installer was just an example.
Title: Re: optimized sources
Post by: Jason G on 28 Sep 2010, 04:18:16 pm
SSE through SSE4.2 aren't 'optimisations', they are instruction sets.  As such there aren't 'usually' outward means of determining whether a given application uses certain instructions, though we usually put the maximum instruction set level (SSE level) as part of the file name.

Stock Multibeam uses internal benching/dispatching mechanisms to decide which functions to use.  Typically an AK or BH variant is chosen from those, and the selected function is noted in the stderr output. Those are mostly SSE.
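
As an aside, the benchmark-and-dispatch idea described here can be sketched in a few lines. This is only an illustration in Python, not the stock Multibeam code: the two candidate functions, the names pick_fastest, sum_squares_loop and sum_squares_builtin, and the sample size are all made up for the example.

```python
import time


def sum_squares_loop(data):
    """Plain loop (hypothetical stand-in for a generic code path)."""
    total = 0.0
    for x in data:
        total += x * x
    return total


def sum_squares_builtin(data):
    """Generator plus built-in sum (hypothetical stand-in for a tuned code path)."""
    return sum(x * x for x in data)


def pick_fastest(candidates, sample, repeats=3):
    """Time every candidate on the sample; return (name, function) of the quickest."""
    best_name, best_func, best_time = None, None, float("inf")
    for name, func in candidates.items():
        elapsed = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            func(sample)
            # Keep the best of several runs to reduce timing noise.
            elapsed = min(elapsed, time.perf_counter() - start)
        if elapsed < best_time:
            best_name, best_func, best_time = name, func, elapsed
    return best_name, best_func


candidates = {"loop": sum_squares_loop, "builtin": sum_squares_builtin}
sample = [float(i) for i in range(10_000)]
chosen_name, chosen_func = pick_fastest(candidates, sample)
# Roughly analogous to noting the selected function in the stderr output.
print("selected:", chosen_name)
```

Which candidate wins depends on the machine, which is the whole point of benching at run time instead of hard-wiring one choice.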

In many cases the instructions chosen represent microarchitectural optimisation, which is one level of optimisation that applies only to specific hardware.  Most optimisations that provide greatest benefit tend to be algorithmic (general) optimisations and are not dependent on the instructions used.  In those cases there are no outward indications of hardware required.

Different instructions from different SSE levels built into the microprocessors may or may not be useful for given code, and in most cases simply telling the compiler to use those instructions doesn't do a very good job (i.e. is not optimisation!)

If you really want to 'see' what instructions were used, then the most effective means I know of would be to use a debugger that shows a disassembly of the executable code; you could then look up the instructions in the CPU manufacturer's reference materials.  Short of that, looking at the source code, if available, is never a bad idea IMO if you are curious. (and quite a bit easier  ;))

Jason
Title: Re: optimized sources
Post by: _heinz on 28 Sep 2010, 04:29:19 pm
Sorry, for not being more precise. I'm interested in a general application that can detect the cpu optimizations. The lunatics installer was just an example.
If you want to see which instruction sets (SSE, SSE2, SSE3, SSSE3, etc.) your CPU supports, you can use Everest Ultimate (http://www.lavalys.com/)
or, very easily, cpuz.
~~~~~~~~
A typical report from Everest looks like this:
Field   Value
CPU Properties   
CPU Type   Intel Atom 230, 1600 MHz (12 x 133)
CPU Alias   Diamondville-SC
CPU Stepping   C0
Instruction Set   x86, x86-64, MMX, SSE, SSE2, SSE3, SSSE3
Original Clock   1600 MHz
Min / Max CPU Multiplier   6x / 12x
Engineering Sample   No
L1 Code Cache   32 KB
L1 Data Cache   24 KB
L2 Cache   512 KB  (On-Die, ECC, ASC, Full-Speed)
   
Multi CPU   
Motherboard ID   nVidia MCP79
CPU #1   Intel(R) Atom(TM) CPU 230 @ 1.60GHz, 1600 MHz
CPU #2   Intel(R) Atom(TM) CPU 230 @ 1.60GHz, 1600 MHz
   
CPU Physical Info   
Package Type   437 Ball FC-BGA
Package Size   2.2 cm x 2.2 cm
Transistors   47 million
Process Technology   45 nm, CMOS, Cu, High-K + Metal Gate
Die Size   25 mm2
Typical Power   4 W @ 1.60 GHz
   
CPU Manufacturer   
Company Name   Intel Corporation
Product Information   http://www.intel.com/products/processor
   
CPU Utilization   
CPU #1 / HTT Unit #1   0 %
CPU #1 / HTT Unit #2   0 %
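
Once you have the instruction set line from such a report, picking the highest SSE level (the one that would match an SSE level in a build's file name) is simple string handling. A minimal sketch in Python; the function name highest_sse_level and the ordering list are my own assumptions for illustration, not part of Everest, cpuz or the installer.

```python
# SSE generations in ascending order (assumed ordering for this sketch).
SSE_LEVELS = ["SSE", "SSE2", "SSE3", "SSSE3", "SSE4.1", "SSE4.2"]


def highest_sse_level(instruction_set_line):
    """Return the highest SSE level named in a comma-separated feature list."""
    features = {item.strip().upper() for item in instruction_set_line.split(",")}
    best = None
    for level in SSE_LEVELS:
        if level in features:
            best = level  # later entries in SSE_LEVELS overwrite earlier ones
    return best


# Instruction set line as shown in the Atom 230 report above.
atom_230 = "x86, x86-64, MMX, SSE, SSE2, SSE3, SSSE3"
print(highest_sse_level(atom_230))
```

Keep in mind this only tells you what the CPU supports, not which instructions a given application actually uses.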


heinz
Title: Re: optimized sources
Post by: Raistmer on 28 Sep 2010, 04:55:03 pm
A debugger, as Jason said, plus a profiler like VTune or Code Analyst. They will show the "optimization level" in some performance terms and are actually intended to be used for "optimization level" assessment.
Title: Re: optimized sources
Post by: cristipurdel on 28 Sep 2010, 05:12:13 pm

Different instructions from different SSE levels built into the microprocessors may or may not be useful for given code, and in most cases simply telling the compiler to use those instructions doesn't do a very good job (i.e. is not optimisation!)

Jason
I saw that some programs require Intel MKL to 'enhance' the computing capabilities and better use the 'optimizations' inside the processor. But when I saw this http://www.agner.org/optimize/blog/read.php?i=49#121 I wondered if there were any free version which could 'enhance' the mkl on my cpu, and not cripple the performance on an amd cpu.
Title: Re: optimized sources
Post by: Jason G on 28 Sep 2010, 07:22:22 pm
I saw that some programs require Intel MKL to 'enhance' the computing capabilities and better use the 'optimizations' inside the processor. But when I saw this http://www.agner.org/optimize/blog/read.php?i=49#121 I wondered if there were any free version which could 'enhance' the mkl on my cpu, and not cripple the performance on an amd cpu.

Not this old chestnut again  ::) It's getting rather tired.

The suggestion there is that Intel's MKL library should be optimised for use on AMD CPUs. That's not something I would either expect or need, mostly since we don't use MKL - don't really care.  What should really happen is that AMD should write their own compiler & libraries, rather than play dirty marketing tricks to fool the public that don't know about coding, compilers & microarchitecture. 

They (AMD/ATI) have been trying the same garbage against nVidia too, and it fails... because their investment in software development and support for developers in general is very poor compared to both Intel and nVidia.

Agner Fog is a respected expert in CPU performance and criticises certain Intel tactics with their performance libraries.  Those criticisms are well established and justified in certain contexts only... namely code that is not hand optimised, where developers use the compilers & libraries without knowing what's going on inside, expecting the best performance. These involve dispatch mechanisms we don't use in our builds since they can result in less than optimal code paths for many CPUs in our target audience.  Intel compilers produce the fastest multibeam builds under Windows on AMD chips, provided dynamic dispatch is not used ... There is no 'crippling' going on here... though I would as always invite anyone to make faster builds for any platform.

Since we don't use Intel compiler's dynamic dispatch mechanisms (which are subject to choosing code based on processor type) ,  the builds do not run a generic px code path for AMD chips, and only have a single code path. 

Optimisation that we do here is less a function of the compiler & more a function of 'hand rolling'.  Expecting a compiler alone, whatever options & libraries are used, to do the best optimisation job is naive.  Agner Fog's manuals detail several strategies for ensuring the right code is generated in builds here, and of those we use several.  Unfortunately even Intel's compilers with the workarounds applied don't magically overcome hardware CPU limitations.

Jason
Title: Re: optimized sources
Post by: _heinz on 29 Sep 2010, 05:00:33 pm
published 09/28/2010
CUDA Toolkit 3.2 RC (September 2010)
New and Improved CUDA Libraries
(now include Fermi architecture GPUs)
It's worth having a look there:
http://developer.nvidia.com/object/cuda_3_2_toolkit_rc.html

heinz
Title: Re: optimized sources
Post by: _heinz on 04 Oct 2010, 03:53:10 pm
3.2 is installed now and running


 CUDA Device Query (Runtime API) version (CUDART static linking)

There is 1 device supporting CUDA

Device 0: "ION"
  CUDA Driver Version:                           3.20
  CUDA Runtime Version:                          3.20
  CUDA Capability Major/Minor version number:    1.1
  Total amount of global memory:                 253296640 bytes
  Multiprocessors x Cores/MP = Cores:            2 (MP) x 8 (Cores/MP) = 16 (Cores)
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       16384 bytes
  Total number of registers available per block: 8192
  Warp size:                                     32
  Maximum number of threads per block:           512
  Maximum sizes of each dimension of a block:    512 x 512 x 64
  Maximum sizes of each dimension of a grid:     65535 x 65535 x 1
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             256 bytes
  Clock rate:                                    1.10 GHz
  Concurrent copy and execution:                 No
  Run time limit on kernels:                     Yes
  Integrated:                                    Yes
  Support host page-locked memory mapping:       Yes
  Compute mode:                                  Default (multiple host threads can use this device simultaneously)
  Concurrent kernel execution:                   No
  Device has ECC support enabled:                No
  Device is using TCC driver mode:               No

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 3.20, CUDA Runtime Version = 3.20, NumDevs = 1, Device = ION


PASSED
~~~~~~~~~~~~~~~~
and BOINC shows:
04.10.2010 21:26:28      NVIDIA GPU 0: ION (driver version 26061, CUDA version 3020, compute capability 1.1, 242MB, 35 GFLOPS peak)
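
For reference, the core count in the deviceQuery output is just multiprocessors times cores per multiprocessor. A common rule of thumb for peak single-precision throughput is cores x clock x 2 (one multiply-add per core per cycle). That rule of thumb is my assumption here, not necessarily the formula BOINC uses, and the function names are made up for this sketch; for the ION figures above it lands close to the 35 GFLOPS BOINC reports.

```python
def cuda_core_count(multiprocessors, cores_per_mp):
    """Total CUDA cores, as in the 'Multiprocessors x Cores/MP = Cores' line."""
    return multiprocessors * cores_per_mp


def peak_gflops_estimate(cores, clock_ghz, flops_per_core_per_cycle=2):
    """Back-of-envelope peak: cores x clock (GHz) x flops per core per cycle.

    The default of 2 assumes one multiply-add per cycle (an assumption).
    """
    return cores * clock_ghz * flops_per_core_per_cycle


# ION values taken from the deviceQuery output: 2 MP, 8 cores/MP, 1.10 GHz.
cores = cuda_core_count(2, 8)
estimate = peak_gflops_estimate(cores, 1.10)
print(cores, "cores, about", round(estimate, 1), "GFLOPS peak")
```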

heinz
Title: Re: optimized sources
Post by: _heinz on 05 Oct 2010, 11:21:15 am
CUDA 3.20 does not meet our expectations on this ION chipset.
ICC067: with CUDA3020 we are 3% slower than Composer update 6 (CUDA3000), our reference.
If we use MKL (parallel) we reach nearly the same result as the reference (CUDA3000).
PS2011: the biggest speedup, +10.31%, with CUDA3010 and Parallel Studio 2011.
So the best thing is to wait until CUDA 3.20 is out of beta.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
========================
gcomp_u6_fft.exe
AppName: gcomp_u6_fft.exe
Started at  : 16:41:31.760
Ended at    : 16:42:37.436
     65.520 secs Elapsed
     64.584 secs CPU time
------------------------
g067_cuda32_fft.exe
AppName: g067_cuda32_fft.exe
Started at  : 16:42:37.561
Ended at    : 16:43:44.360
     66.659 secs Elapsed
     66.519 secs CPU time
Speedup     : -3.00%
Ratio       : 0.97 x
------------------------
g067_mklp_fft.exe
AppName: g067_mklp_fft.exe
Started at  : 16:43:44.672
Ended at    : 16:44:49.194
     64.381 secs Elapsed
     64.179 secs CPU time
Speedup     : 0.63%
Ratio       : 1.01 x
------------------------
g067_mkls_fft.exe
AppName: g067_mkls_fft.exe
Started at  : 16:44:49.412
Ended at    : 16:45:56.149
     66.596 secs Elapsed
     66.394 secs CPU time
Speedup     : -2.80%
Ratio       : 0.97 x
------------------------
g2011_fft.exe
AppName: g2011_fft.exe
Started at  : 16:45:56.399
Ended at    : 16:46:54.743
     58.219 secs Elapsed
     57.923 secs CPU time
Speedup     : 10.31%
Ratio       : 1.11 x
------------------------
g2011_SSSE3_fft.exe
AppName: g2011_SSSE3_fft.exe
Started at  : 16:46:54.977
Ended at    : 16:47:53.524
     58.422 secs Elapsed
     59.218 secs CPU time
Speedup     : 8.31%
Ratio       : 1.09 x
------------------------
 
Quick timetable
--------------------------------------
gcomp_u6_fft.exe : 64.584 secs CPU
Result      : stored as reference.
--------------------------------------
g067_cuda32_fft.exe : 66.519 secs CPU
Speedup     : -3.00%
Ratio       : 0.97 x
--------------------------------------
g067_mklp_fft.exe : 64.179 secs CPU
Speedup     : 0.63%
Ratio       : 1.01 x
--------------------------------------
g067_mkls_fft.exe : 66.394 secs CPU
Speedup     : -2.80%
Ratio       : 0.97 x
--------------------------------------
g2011_fft.exe : 57.923 secs CPU
Speedup     : 10.31%
Ratio       : 1.11 x
--------------------------------------
g2011_SSSE3_fft.exe : 59.218 secs CPU
Speedup     : 8.31%
Ratio       : 1.09 x
--------------------------------------
 
 
------------------------
CPU:
Number of processors   1
Number of cores      1 (max 1)
Specification      Intel(R) Atom(TM) CPU  230   @ 1.60GHz (Engineering Sample)
Codename      Silverthorne
Core Speed      1600.1 MHz (12.0 x 133.3 MHz)
Core Stepping      C0
Technology      45 nm
Stock frequency      1666 MHz
------------------------
Chipset:
Northbridge      NVIDIA ID0A82 rev. B1
Southbridge      NVIDIA ID0AAD rev. B2
------------------------
RAM:
Memory Type      
Memory Size      1792 MBytes
------------------------
OS:
Windows Version      Microsoft Windows Vista (6.1) Home Premium Edition   (Build 7600)
========================
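
The Speedup and Ratio lines in the report can be reproduced from the CPU times. A minimal sketch in Python, assuming speedup = (reference - candidate) / reference x 100 and ratio = reference / candidate; these formulas are inferred from the numbers above rather than taken from the benchmark tool's source, and the function names are made up.

```python
def speedup_percent(reference_secs, candidate_secs):
    """Time saved relative to the reference run, in percent (negative = slower)."""
    return (reference_secs - candidate_secs) / reference_secs * 100.0


def ratio(reference_secs, candidate_secs):
    """How many times faster the candidate is than the reference."""
    return reference_secs / candidate_secs


# CPU times from the quick timetable above; gcomp_u6_fft.exe is the reference.
reference = 64.584
cpu_times = {
    "g067_cuda32_fft.exe": 66.519,
    "g067_mklp_fft.exe": 64.179,
    "g067_mkls_fft.exe": 66.394,
    "g2011_fft.exe": 57.923,
    "g2011_SSSE3_fft.exe": 59.218,
}
for name, secs in cpu_times.items():
    print(f"{name}: {speedup_percent(reference, secs):+.2f}%  "
          f"{ratio(reference, secs):.2f} x")
```

Note that the figures are based on CPU time, not elapsed time.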
heinz
Title: Re: optimized sources
Post by: _heinz on 08 Oct 2010, 06:37:31 pm
Hi Jason,
Today I cloned my system so that I can work with the different compiler products (ICSProf and PS2011) independently.
This seems the best solution for now, as long as Intel does not offer a common menu with a switch mechanism for all its compiler products (i.e. ICS067, PS2011).
For instance, a user who owns both products (ICSProf and PS2011) has a valid license for ICSProf, which includes IPP, ITBB and MKL, and a valid license for PS2011, which includes IPP and ITBB, but he cannot use MKL in PS2011 (because it is not part of the PS2011 product).
I already complained about this during the PS2011 beta, but without success.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
By the way, the 500GB hybrid disk now holds the clone as follows:
Partition1 80GB System(clone)
Partition2 55GB DATA(clone)
Partition3 158GB common data(original and clone)
Partition4 158GB Backup area (for original and clone)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Later I will install BOINC in the common data area, so that the original and the clone have the same data and BOINC can run on both systems. I hope this will work.

ICSProf - Intel Compiler Suite Professional
PS2011 - Intel Parallel Studio 2011

heinz

Title: Re: optimized sources
Post by: _heinz on 09 Oct 2010, 10:03:32 am
Today I can confirm that both the clone and the original work with BOINC installed in the common area, and each picks up the work and progress of the other.
Great, it works.
~~~~~~~~~
heinz
Title: Re: optimized sources
Post by: _heinz on 10 Oct 2010, 04:33:19 pm
By the way today 10.10.2010 I got Total Credit 14,009,856  ;)
Although I had a long downtime this summer, the V8-Xeon has now started with some good numbers.
Title: Re: optimized sources
Post by: _heinz on 11 Oct 2010, 03:33:16 pm
Today I installed PS2011 on the cloned OS of the R3600.
In addition I had to create a new directory PS2011_2k8 in our source tree to hold the solution files separately from ICC11_2k8. This is necessary because installing PS2011 makes ICC11 (067) unusable by overwriting its IDE integration. And once a project file has been converted to PS2011, there is no way back to ICC11.
It's not easy to create a fully functional multi-developer environment.

heinz

 
Title: Re: optimized sources
Post by: _heinz on 13 Oct 2010, 05:09:26 pm
How to use MKL in PS2011
Normally PS2011 does not support MKL; as you may remember, there is no menu entry for it.
I found a way to use it without any menu entry.
~~~~~~~~~~~~~~
Include
for MKL, additionally add:
;$(INTEL_DEF_IA32_INSTALL_DIR)MKL\Include;$(INTEL_DEF_IA32_INSTALL_DIR)MKL\Include\fftw

Linker section
(to use MKL sequential)
please add to
Additional Library Directories:
;$(INTEL_DEF_IA32_INSTALL_DIR)MKL\ia32\lib
Input, Additional Dependencies:
mkl_solver_sequential.lib mkl_intel_c.lib mkl_sequential.lib mkl_core.lib
~~~~~~~~~~~~~~~~~~~~~
Linker section
(to use MKL parallel)
please add to
Additional Library Directories:
$(INTEL_DEF_IA32_INSTALL_DIR)Lib\ia32;$(INTEL_DEF_IA32_INSTALL_DIR)MKL\ia32\Lib
Input, Additional Dependencies:
mkl_solver.lib mkl_intel_thread.lib mkl_intel_c.lib mkl_core.lib libiomp5md.lib
~~~~~~~~~~~~~~~~~~~~~
In Visual Studio I additionally had to add fixed paths under VC++ Directories --> Libraries:
$(INTEL_DEF_IA32_INSTALL_DIR)lib\ia32
$(INTEL_DEF_IA32_INSTALL_DIR)MKL\ia32\lib

D:\I\Intel\Compiler\11.1\067\mkl\ia32\lib
~~~~~~~~~~~~~~~~~~~~~~~~~
Regards
Title: Re: optimized sources
Post by: _heinz on 13 Oct 2010, 06:07:37 pm
Last week I bought an additional license of "Intel Parallel Studio 2011" to install it on the cloned OS. That's my small personal contribution to support SETI.  ;)
Have a look at the Emergency fund drive (http://setiathome.berkeley.edu/forum_thread.php?id=61691)
Your donations are welcome.

heinz
Title: Re: optimized sources
Post by: _heinz on 14 Oct 2010, 11:32:10 am
90,000 views... really an epic thread.
Thank you to all readers looking in here.

Kindest regards Heinz  ;)
Title: Re: optimized sources
Post by: _heinz on 15 Oct 2010, 05:29:07 pm
Hi,
Dual-Core Intel® Xeon® processor 5100 series systems. To get the best Intel MKL
performance on Dual-Core Intel® Xeon® processor 5100 series systems, enable the
Hardware DPL (streaming data) Prefetcher functionality of this processor. To configure this
functionality, use the appropriate BIOS settings, as described in your BIOS
documentation.

heinz
Title: Re: optimized sources
Post by: _heinz on 16 Oct 2010, 12:04:20 pm
Hi,
~~~~~~~~~~~~~~~~
16.10.2010 17:11:41      NVIDIA GPU 0: GeForce GTX 470 (driver version 25896, CUDA version 3010, compute capability 2.0, 1248MB, 726 GFLOPS peak)
16.10.2010 17:11:41      NVIDIA GPU 1: GeForce GTX 470 (driver version 25896, CUDA version 3010, compute capability 2.0, 1249MB, 726 GFLOPS peak)
~~~~~~~~~~~~~~~~
Got it running.
Now we have a pair of GTX470s for our test bed.

heinz
Title: Re: optimized sources
Post by: _heinz on 18 Oct 2010, 01:26:59 pm
For everyone who owns a GTX470 or GTX480 and has run out of work:
I tested the new app from Milkyway; it runs on my two GTX470s.
Have a look there (http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=1987)
You can now choose Milkyway as your backup project   ;)
heinz
Title: Re: optimized sources
Post by: _heinz on 25 Oct 2010, 06:19:24 pm
Off topic:
During this longer SETI outage (until the new servers arrive and work properly) I have switched
to Collatz. The 2 GTX470s (Colorful) produce ~230,000-250,000 per day (nothing overclocked) and the 8 CPUs are running Docking. It is no problem to feed 2 GTX470s with a real 8-core.
A typical result:
Run time 537.841553
CPU time 57.96875
stderr out <core_client_version>6.10.58</core_client_version>
<![CDATA[
<stderr_txt>
Collatz v2.05 for CUDA 3.1
Start           2366225450231397067112
Checking        206,158,430,208 numbers
Calculate       65,536 items per loop
Loop            8,192 times per reduction
Sleep           0 ms while waiting
Device          GeForce GTX 470
Memory          1,248 MB (1,199/1,248 free/total MB)
Highest Steps   1,904 for 2366225450271017930578
Total Steps     101,532,477,026,622
GPU             534.942 seconds
Total           536.445 seconds
23:55:41 (4672): called boinc_finish

</stderr_txt>
]]>
 
Validate state Valid


heinz  ;)
Title: Re: optimized sources
Post by: _heinz on 30 Oct 2010, 06:23:20 pm
Good news: France Telecom offered me 1.2 Mbit
and it works now, as you can see:
ADSL firmware version: A2pB023b.d20e
Modem status Connected
Downstream connection speed 1214 kbps
Upstream connection speed 326 kbps
VPI 8
VCI 35
~~~~~~~~~~~~~~~
Better than nothing; until now I was sitting behind a 64 Kbit line.
So I'm happy with it.
heinz
Title: Re: optimized sources
Post by: Urs Echternacht on 30 Oct 2010, 07:06:10 pm
 :) Nice, and I'm getting envious, currently having only a third of that.
Title: Re: optimized sources
Post by: Jason G on 30 Oct 2010, 07:32:40 pm
What's going on over there then ?  Not that long ago my ISP, without asking, dropped the cable price by $10/month & tripled the download speed to 20Mbps.
Title: Re: optimized sources
Post by: Urs Echternacht on 30 Oct 2010, 07:35:15 pm
 :o :'( ;) Stop it, please.
Title: Re: optimized sources
Post by: Jason G on 30 Oct 2010, 07:41:53 pm
OK  :P
(http://www.speedtest.net/result/1010937490.png) (http://www.speedtest.net)
Title: Re: optimized sources
Post by: Raistmer on 30 Oct 2010, 07:46:52 pm
For a while I had a 100Mbit/s internet link at home... ;D
But after my ISP was acquired by another one, the good days are over.
Title: Re: optimized sources
Post by: Jason G on 30 Oct 2010, 07:47:51 pm
We better stop poking the Germans & French, they get crappy internet connections  :-X
Title: Re: optimized sources
Post by: Urs Echternacht on 30 Oct 2010, 07:49:38 pm
What tortures today... ;D

Just waiting until the fibres to the home are covered by soil.  ;)
Title: Re: optimized sources
Post by: Cosmic_Ocean on 30 Oct 2010, 08:07:58 pm
Mine's not so bad.

(http://www.speedtest.net/result/1010950368.png) (http://www.speedtest.net)

That's with PowerBoost though.  After about 30 seconds in either direction, it drops off to 13/2.  Used to be 6/384 until the infrastructure got upgraded to DOCSIS 3.0.  That's all I can get with a 2.0 modem.  Waiting for the prices of the 3.0 modems to come down (I refuse to rent/lease from the ISP), and 50/10 can be had.
Title: Re: optimized sources
Post by: SubSpace on 30 Oct 2010, 08:53:09 pm
(http://www.speedtest.net/result/1010976409.png) (http://www.speedtest.net)

The channel is capped by the operator at 20 megabits for downloads. They say: "You won't get 100, your face would burst" (or your mug would crack).  ;D  ;D  ;D
Title: Re: optimized sources
Post by: Geek@Play on 30 Oct 2010, 10:40:54 pm
Slowest ISP on the planet.................that's Verizon

(http://www.speedtest.net/result/1011030691.png) (http://www.speedtest.net)
Title: Re: optimized sources
Post by: corsair on 31 Oct 2010, 05:31:16 am
(http://www.speedtest.net/result/1011263716.png)

That's mine. In Spain we have the problem of slow upload speeds with all ISPs.
Title: Re: optimized sources
Post by: Raistmer on 31 Oct 2010, 05:37:02 am
(http://www.speedtest.net/result/1011269409.png) (http://www.speedtest.net)
ROFL, looks like I lead with lowest ping so far ;D ;D ;D
Title: Re: optimized sources
Post by: perryjay on 31 Oct 2010, 09:09:07 am
Running a little slow this morning..

(http://www.speedtest.net/result/1011438565.png)
Title: Re: optimized sources
Post by: SubSpace on 31 Oct 2010, 09:10:19 am
(http://www.speedtest.net/result/1010976409.png) (http://www.speedtest.net)

The channel is capped by the operator at 20 megabits for downloads. They say: "You won't get 100, your face would burst" (or your mug would crack).  ;D  ;D  ;D

(http://www.speedtest.net/result/1011438019.png) (http://www.speedtest.net)

This is my second home service provider.
The download speed is much better, but the upload speed is worse.
And it shows the real distance to the server. The first one makes a big detour across the country, for unclear reasons, and shows a distance of 2250 miles.
A possible reason is that the IP address belongs to another region of the country.
Title: Re: optimized sources
Post by: _heinz on 31 Oct 2010, 10:23:20 am
speed like a snail  :'(
(http://www.speedtest.net/result/1011501849.png) (http://www.speedtest.net)
(http://www.speedtest.net/result/1011976723.png) (http://www.speedtest.net)
 
ADSL test - ADSL speed test / connection speed
Please do not run any other downloads in parallel with the ADSL speed test. The line must be free. Furthermore, we give no guarantee for the measured data, since they are not necessarily accurate. The result of the ADSL test should also be checked against other providers.
Last Result:
Download Speed: 609 kbps (76.1 KB/sec transfer rate)
Upload Speed: 263 kbps (32.9 KB/sec transfer rate)
Latency: 178 ms
~~~~~~~~~~~
 
I wouldn't have thought so.

heinz
Title: Re: optimized sources
Post by: Jason G on 31 Oct 2010, 01:52:56 pm
ROFL, looks like I lead with lowest ping so far ;D ;D ;D

Not if I choose the closer server @ 4am

(http://www.speedtest.net/result/1011733744.png) (http://www.speedtest.net)
Title: Re: optimized sources
Post by: Josef W. Segur on 31 Oct 2010, 02:22:01 pm
PING 192.168.0.250 (192.168.0.250): 50 data bytes
50 bytes from 192.168.0.250: icmp_seq=0 time=0 ms
50 bytes from 192.168.0.250: icmp_seq=1 time=0 ms
50 bytes from 192.168.0.250: icmp_seq=2 time=0 ms
 
PING Statistics for 192.168.0.250
3 packets transmitted, 3 packets received, 0% packet loss
round-trip (ms) min/avg/max = 0/0/0
  ;D
Title: Re: optimized sources
Post by: _heinz on 31 Oct 2010, 08:07:31 pm
VoIP Speed Test (http://whichvoip.com/voip/speed_test/ppspeed.html)
it shows for my ISP Orange.fr (http://www.orange.fr)
VoIP test statistics
--------------------
Jitter: you --> server: 0.3 ms
Jitter: server --> you: 1.3 ms
Packet loss: you --> server: 0.0 %
Packet loss: server --> you: 0.0 %
Packet discards: 0.0 %
Packets out of order: 0.0 %
Estimated MOS score: 4.1

Speed test statistics
---------------------
Download speed: 381088 bps
Upload speed: 199288 bps
Download quality of service: 56 %
Upload quality of service: 23 %
Download test type: socket
Upload test type: socket
Maximum TCP delay: 421 ms
Average download pause: 84 ms
Minimum round trip time to server: 213 ms
Average round trip time to server: 214 ms
Estimated download bandwidth: 381088bps
Route concurrency: --
Download TCP forced idle: --
Maximum route speed: 2461408bps
~~~~~~~~~~~~~~~~~~~~~~~~
Reasonable, but the sound could break up.

I feel like I'm living behind the moon  :'(
Title: Re: optimized sources
Post by: _heinz on 06 Nov 2010, 06:53:55 pm
(http://www.speedtest.net/result/1020082076.png) (http://www.speedtest.net)
(http://www.speedtest.net/result/1020097326.png) (http://www.speedtest.net)

VoIP test statistics
--------------------
Jitter: you --> server: 0.8 ms
Jitter: server --> you: 3.9 ms
Packet loss: you --> server: 0.0 %
Packet loss: server --> you: 0.0 %
Packet discards: 0.0 %
Packets out of order: 0.0 %
Estimated MOS score: 4.0

Speed test statistics
---------------------
Download speed: 1016312 bps
Upload speed: 274936 bps
Download quality of service: 98 %
Upload quality of service: 95 %
Download test type: socket
Upload test type: socket
Maximum TCP delay: 49 ms
Average download pause: 28 ms
Minimum round trip time to server: 213 ms
Average round trip time to server: 217 ms
Estimated download bandwidth: 1840000bps
Route concurrency: 1.8104676
Download TCP forced idle: 0 %
Maximum route speed: 2461408bps
~~~~~~~~~~~~~~~~~~~~~~~~
Done by modifying some ADSL parameters of the DG834PN ADSL2+ router:
PPPoA to PPPoE
use service G992_3_A
Multiplexing method: LLC-BASED
VPI: 8
VCI: 35
~~~~~~~~~~~~~
It is now near the max I can get.  ;)
heinz
Title: Re: optimized sources
Post by: _heinz on 11 Nov 2010, 07:58:03 pm
By the way, the new GTX580 is already available in Germany (479,99 Euro).
Does my GPU run CUDA? Most questions are answered there:
List of CUDA GPU's (http://www.nvidia.com/object/cuda_gpus.html)
And CUDA Toolkit 3.2 RC2 is available now.
~~~~~~~~~~~~~~~~~~~~~~~~~~
I have to do some updates now.

heinz
Title: Re: optimized sources
Post by: RottenMutt on 11 Nov 2010, 10:37:50 pm
...
By the way, the new GTX580 is already available in Germany (479,99 Euro)
...

heinz

no need to get one with seti being down...  yes nvidia, some of your business is from crunchers..
Title: Re: optimized sources
Post by: _heinz on 28 Nov 2010, 07:59:40 pm
I installed CUDA 3.2 RC2 Release64 on V8-SK01.
If you want to compile all samples from the SDK you must install the latest DXSDK (http://msdn.microsoft.com/de-de/directx/default(en-us).aspx) (June 2010).
Compiling all samples from the CUDA SDK gives:
========== Rebuild All: 92 succeeded, 1 failed, 0 skipped ==========
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error in project simpleD3D11Texture: library "D3D11Effects_vc9.lib" not found.
If we look in the project dir under ./d3d11effect there are two lib files:
D3DX11Effects_vc9_x64.lib
D3DX11Effects_vc9_x64D.lib

The lib "D3D11Effects_vc9.lib" is not there.
This is a bug in the sample project of the CUDA SDK 64-bit package.

Solution:
Go to Linker --> Additional Dependencies, remove the entry D3D11Effects_vc9.lib and add
the two libs D3DX11Effects_vc9_x64.lib D3DX11Effects_vc9_x64D.lib

Then the project simpleD3D11Texture compiles successfully (and the exe is executable as well).
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
3>simpleD3D11Texture - 0 error(s), 6 warning(s)

All 93 samples now compile successfully.
We can assume the installation and all necessary settings are correct.  ;)
Title: Re: optimized sources
Post by: _heinz on 28 Nov 2010, 08:27:40 pm
Speedup test (ap CPU version)
The IPP SSE2 build without BLANKIT, compiled on V8-VM3, shows the following results on my 3 machines
(first row per WU: speedup in %, second row: ratio)

WU        Metric      V8-VM3   P4,2.6   R3600(1.6)
----------------------------------------------------------------
1LC25     speedup %   45.48    25.60    33.72
          ratio       1.83     1.34     1.51
----------------------------------------------------------------
Rtiny     speedup %   43.95    18.25    31.27
          ratio       1.78     1.22     1.45
----------------------------------------------------------------
sigi_v5   speedup %   50.26    24.33    35.11
          ratio       2.01     1.32     1.54
----------------------------------------------------------------
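
The paired numbers in the table above are consistent with reading the first value as a speedup in percent and the second as a ratio, linked by ratio = 1 / (1 - speedup / 100). That reading is my inference from the figures, and the helper names below are made up; a small Python sketch to convert between the two:

```python
def ratio_from_speedup(speedup_percent):
    """Convert 'percent of run time saved' into 'times faster'."""
    return 1.0 / (1.0 - speedup_percent / 100.0)


def speedup_from_ratio(ratio):
    """Inverse conversion: 'times faster' back to percent of run time saved."""
    return (1.0 - 1.0 / ratio) * 100.0


# 1LC25 row of the table: V8-VM3, P4 2.6 and R3600 speedups in percent.
for speedup in (45.48, 25.60, 33.72):
    print(f"{speedup:.2f}% -> {ratio_from_speedup(speedup):.2f} x")
```

So a 50% speedup means the work is done in half the time, i.e. a ratio of 2.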

A special ATOM build is in our public beta test area for download.
http://lunatics.kwsn.net/18-astropulse-testing/beta-testing-astropulse-on-atom-cpu.0.html

heinz
Title: Re: optimized sources
Post by: _heinz on 29 Nov 2010, 03:42:05 am
ap (cpu-version)
with the data from Claggy's ATOM N450 (from our beta test) we can confirm the speedup
Quick timetable
 
WU : ap_18se08aa_B6_P1_00046_1LC25.wu
ap_5.05r168_SSE3.exe :
  Elapsed 2203.207 secs
      CPU 2192.469 secs
ap_5.05r409_SSE.exe :
  Elapsed 1995.041 secs, speedup: 9.45%  ratio: 1.10
      CPU 1985.284 secs, speedup: 9.45%  ratio: 1.10
ap_5.05r460_SSE3_ATOM_ICC_MKLS_O3.exe :
  Elapsed 1429.883 secs, speedup: 35.10%  ratio: 1.54
      CPU 1424.461 secs, speedup: 35.03%  ratio: 1.54
ap_5.05r468_SSE3_ATOM_IXE_MKLS_O3.exe :
  Elapsed 1443.502 secs, speedup: 34.48%  ratio: 1.53
      CPU 1434.445 secs, speedup: 34.57%  ratio: 1.53
 
WU : JasonShort_v5.wu
ap_5.05r168_SSE3.exe :
  Elapsed 3439.962 secs
      CPU 3425.423 secs
ap_5.05r409_SSE.exe :
  Elapsed 3251.639 secs, speedup: 5.47%  ratio: 1.06
      CPU 3236.646 secs, speedup: 5.51%  ratio: 1.06
ap_5.05r460_SSE3_ATOM_ICC_MKLS_O3.exe :
  Elapsed 2353.124 secs, speedup: 31.59%  ratio: 1.46
      CPU 2342.168 secs, speedup: 31.62%  ratio: 1.46
ap_5.05r468_SSE3_ATOM_IXE_MKLS_O3.exe :
  Elapsed 2348.225 secs, speedup: 31.74%  ratio: 1.46
      CPU 2336.895 secs, speedup: 31.78%  ratio: 1.47
 
WU : Raistmer_tinyrr.wu
ap_5.05r168_SSE3.exe :
  Elapsed 877.392 secs
      CPU 870.907 secs
ap_5.05r409_SSE.exe :
  Elapsed 877.424 secs, speedup: -0.00%  ratio: 1.00
      CPU 868.130 secs, speedup: 0.32%  ratio: 1.00
ap_5.05r460_SSE3_ATOM_ICC_MKLS_O3.exe :
  Elapsed 609.477 secs, speedup: 30.54%  ratio: 1.44
      CPU 604.395 secs, speedup: 30.60%  ratio: 1.44
ap_5.05r468_SSE3_ATOM_IXE_MKLS_O3.exe :
  Elapsed 602.364 secs, speedup: 31.35%  ratio: 1.46
      CPU 593.069 secs, speedup: 31.90%  ratio: 1.47
 
WU : short_ap_21oc08ab_B2_P0_00081_20081130_08605_v5.wu
ap_5.05r168_SSE3.exe :
  Elapsed 1819.665 secs
      CPU 1809.362 secs
ap_5.05r409_SSE.exe :
  Elapsed 1724.053 secs, speedup: 5.25%  ratio: 1.06
      CPU 1716.885 secs, speedup: 5.11%  ratio: 1.05
ap_5.05r460_SSE3_ATOM_ICC_MKLS_O3.exe :
  Elapsed 1232.933 secs, speedup: 32.24%  ratio: 1.48
      CPU 1226.059 secs, speedup: 32.24%  ratio: 1.48
ap_5.05r468_SSE3_ATOM_IXE_MKLS_O3.exe :
  Elapsed 1223.495 secs, speedup: 32.76%  ratio: 1.49
      CPU 1217.635 secs, speedup: 32.70%  ratio: 1.49
 
WU : sigind_v5.wu
ap_5.05r168_SSE3.exe :
  Elapsed 4775.746 secs
      CPU 4747.922 secs
ap_5.05r409_SSE.exe :
  Elapsed 4278.386 secs, speedup: 10.41%  ratio: 1.12
      CPU 4241.121 secs, speedup: 10.67%  ratio: 1.12
ap_5.05r460_SSE3_ATOM_ICC_MKLS_O3.exe :
  Elapsed 3097.042 secs, speedup: 35.15%  ratio: 1.54
      CPU 3084.374 secs, speedup: 35.04%  ratio: 1.54
ap_5.05r468_SSE3_ATOM_IXE_MKLS_O3.exe :
  Elapsed 3108.524 secs, speedup: 34.91%  ratio: 1.54
      CPU 3090.879 secs, speedup: 34.90%  ratio: 1.54
------------------------------------------------------------------------------
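For readers who want to reproduce the two derived columns: judging from the numbers, "speedup" appears to be the share of the reference time that was saved, and "ratio" the reference time divided by the new time, with r168 as the reference. The sketch below is only an illustration under that assumption; the names compare_runs and print_line are made up here and do not come from any benchmark tool.

```cpp
#include <cassert>
#include <cstdio>

// Result of comparing a new build against the reference build.
struct Comparison {
    double speedup_percent;  // share of the reference time that was saved
    double ratio;            // reference time divided by new time
};

// Compare two run times in seconds; both must be positive.
inline Comparison compare_runs(double reference_secs, double new_secs) {
    assert(reference_secs > 0.0 && new_secs > 0.0);
    Comparison c;
    c.speedup_percent = (reference_secs - new_secs) / reference_secs * 100.0;
    c.ratio = reference_secs / new_secs;
    return c;
}

// Print one line in a layout similar to the timetable above.
inline void print_line(const char* label, double reference_secs, double new_secs) {
    const Comparison c = compare_runs(reference_secs, new_secs);
    std::printf("%s %.3f secs, speedup: %.2f%%  ratio: %.2f\n",
                label, new_secs, c.speedup_percent, c.ratio);
}
```

For example, compare_runs(2203.207, 1995.041) gives roughly 9.45 % and 1.10, which matches the r409 line for the 1LC25 WU. A slightly slower run yields a small negative speedup, as in the Raistmer_tinyrr row.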
The ap cpu-version for the ATOM processor will be published together with the next unified installer version
heinz
Title: Re: optimized sources
Post by: Raistmer on 29 Nov 2010, 10:22:18 am
I'd say the ICC and IXE builds have the same speed within the margin of error.
Title: Re: optimized sources
Post by: _heinz on 01 Dec 2010, 06:16:40 pm
Milestones

today 25 Mio total (http://www.boincstats.com/signature/user_374946.gif)

tomorrow 1 Mio Docking (http://www.boincstats.com/signature/user_374946_project35.gif)  (all cpu work)

most done by V8-Xeon

happy crunching  ;)


Title: Re: optimized sources
Post by: _heinz on 02 Dec 2010, 02:05:51 pm
Some construction ideas for your next monster (http://www.flickr.com/photos/27920304@N06/) cruncher.
but then liquid cooled.
CUDA@MIT (http://sites.google.com/site/cudaiap2009/pictures)
Found at MIT

have fun

heinz  ;)

[Mod:] corrected unfortunate typo, that might have resulted in too many google search hits ::)
Title: Re: optimized sources
Post by: _heinz on 02 Dec 2010, 08:27:09 pm
Hi Jason,
do you know Australia's greenest Supercomputer (http://www.youtube.com/watch?v=BV5cSswg9uE)   ;)
Title: Re: optimized sources
Post by: _heinz on 07 Dec 2010, 02:48:00 pm
Surprise,
Today I'm one of the "Top Contributors" in Intel's Software Network forums (http://software.intel.com/en-us/forums/)

 ;D
Title: Re: optimized sources
Post by: _heinz on 10 Dec 2010, 03:23:12 pm
I have had some trouble with the latest CUDA3.2 package. It has some issues with different compilers, and some error reports are still open. (Sometimes, in different projects, ptxas hangs.)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The two GTX470s run fine under full load.
gtx470_geraeteeigenschaften (http://www.britta-d.de/images/gtx470/gtx470_geraeteeigenschaften.jpg)
gtx470_auslastung (http://www.britta-d.de/images/gtx470/gtx470_auslastung.jpg)
gtx470_gpuz (http://www.britta-d.de/images/gtx470/gtx470_gpuz.jpg)
gpuload (http://www.britta-d.de/images/primegrid/pg_gtx470_memory_used_98MB.jpg)
The CPU runs Docking on all 8 cores and has no problem feeding 2 GPUs
cpu_auslastung (http://www.britta-d.de/images/gtx470/E5405_cpu_auslastung.jpg)
temps (http://www.britta-d.de/images/primegrid/pg_gtx470_temp.jpg)

heinz
modify: some links added



Title: Re: optimized sources
Post by: _heinz on 20 Dec 2010, 05:51:02 pm
2 x GTX470 Colorful_stable (http://www.britta-d.de/images/primegrid/pg_gtx470_gpuz_stable.jpg) lightly overclocked, prepared for some good numbers.
GPU_load_stable (http://www.britta-d.de/images/primegrid/pg_gtx470_gpuz_stable_sensors.jpg)
To confirm the values I ran the two GTX470s for 3 days under full load (took part in PrimeGrid's "Winter Solstice Challenge", place 71).
Awesome!

Sure, there are other cards with somewhat higher core clocks... but Colorful (http://en.colorful.cn/) surprised me.

heinz

Title: Re: optimized sources
Post by: _heinz on 30 Dec 2010, 07:51:02 pm
New Year's Eve 2010,
we are not far away from the magic mark of 100 000 clicks

Time to say "thank you" to all readers of this epic thread
Kindest regards  ;D

Happy New Year 2011

heinz
Title: Re: optimized sources
Post by: _heinz on 02 Jan 2011, 07:54:48 pm
Awaiting the 50 Mio credit today
...waiting
Total  50,689,071.39
got it  :)
 
Title: Re: optimized sources
Post by: _heinz on 05 Jan 2011, 06:13:35 pm
Today, 5th of January, Premier support for "Intel C++ Studio XE/SSR" expires,
renewal for one year costs 467,10 EUR plus VAT.
Premier support for my other two "Intel Parallel Studio XE WIN/SSR/ESD/SU" licenses expired on 31.12.2010
renewal for each --> 584,10 EUR plus VAT.

In sum really a lot of money, the price of a new product itself....phhh... :'(
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
New product:
Intel Parallel Studio XE WIN/COM/ESD/SU
1.439,10 EUR

Must think about what to do
heinz
Title: Re: optimized sources
Post by: Jason G on 05 Jan 2011, 06:16:28 pm
must think about what todo
heinz

Your choice Heinz, but I stopped renewing my premier support quite a while ago, since the prices went up and support received with non-functioning advanced features in earlier versions was not satisfactory.

Don't worry too much, as careful hand coding of AVX would end up better anyway  ;)
Title: Re: optimized sources
Post by: _heinz on 05 Jan 2011, 06:39:26 pm
At the moment I'm activating my old P4 2.6GHz as a developer machine and installing all necessary software
VS2008Prof
ICC / Parallel Studio XE
...
fighting with an unreadable DVD again on this machine....

I had to connect my external DVD writer from LG; that does the job.
After installing VS2008 Service Pack 1, a lot of updates followed....
Title: Re: optimized sources
Post by: _heinz on 06 Jan 2011, 03:52:59 pm
Today the machine is ready to use.
Some first tests show it works. MSC and ICC both compile.
To prove full functionality, a more complex project must be compiled.

heinz

Title: Re: optimized sources
Post by: _heinz on 06 Jan 2011, 08:28:48 pm
To prove full functionality, a more complex project must be compiled.
Used Release_vc90.sln from CUDA3.2 SDK (it contains 93 Projects)

Using MSC  Compiler
========== Alles neu erstellen: 92 erfolgreich, Fehler bei 0, 1 übersprungen ==========

hint: 1 übersprungen (skipped) --> cudaEncode was not selected in the Configuration Manager

now we compile cudaEncode
~~~~~~~~~~~~~~~~~~
cudaEncode - 0 Fehler, 2 Warnung(en)
========== Alles neu erstellen: 1 erfolgreich, Fehler bei 0, 0 übersprungen ==========

In summary, we have now compiled all 93 cuda32 projects. That's correct.

heinz
modify: recompiled CUDA32
========== Alles neu erstellen: 93 erfolgreich, Fehler bei 0, 0 übersprungen ==========
Title: Re: optimized sources
Post by: _heinz on 07 Jan 2011, 08:20:41 pm
Intel Compiler XE 2011

step1 converting project solution
Rebuild these 93 projects to ensure all intermediate files are rebuilt with the new compiler.

Detailed update log was saved at "file://C:\CUDA32SDK\C\src\IcUpdateLog.htm"

step2 compiling with Intel compiler
...
...
Compiling with CUDA Build Rule... (Microsoft VC++ Environment)
"C:\CUDA32\v3.2\\bin\nvcc.exe"  -G0  -gencode=arch=compute_10,code=\"sm_10,compute_10\" -gencode=arch=compute_20,code=\"sm_20,compute_20\"  --machine 32 -ccbin "C:\Programme\Microsoft Visual Studio 9.0\VC\bin" -D_NEXUS_DEBUG -g   -Xcompiler "/EHsc /W3 /nologo /O2 /Zi  /MT  " -I"C:\CUDA32\v3.2\/include" -I"./" -I"../../common/inc" -I"../../../shared/inc" -I"../../shared/inc/" -I"C:\CUDA32\v3.2\\include" -maxrregcount=[Value]  --compile -o "Release/bandwidthTest.cu.obj" bandwidthTest.cu
nvcc fatal   : '[Value]': expected a number
Project bandwidthTest : error: A tool returned an error code from "Compiling with CUDA Build Rule..."
Build log was saved at "file://C:\CUDA32SDK\C\src\bandwidthTest\Release\BuildLog.htm"
bandwidthTest - 1 error(s), 0 warning(s), 0 remark(s)
------ Neues Erstellen gestartet: Projekt: asyncAPI, Konfiguration: Release Win32 ------
Deleting intermediate files and output files for project 'asyncAPI', configuration 'Release|Win32'.
Compiling with CUDA Build Rule... (Microsoft VC++ Environment)
"C:\CUDA32\v3.2\\bin\nvcc.exe"  -G0  -gencode=arch=compute_10,code=\"sm_10,compute_10\" -gencode=arch=compute_20,code=\"sm_20,compute_20\"  --machine 32 -ccbin "C:\Programme\Microsoft Visual Studio 9.0\VC\bin" -D_NEXUS_DEBUG -g   -Xcompiler "/EHsc /W3 /nologo /O2 /Zi  /MT  " -I"C:\CUDA32\v3.2\/include" -I"./" -I"../../common/inc" -I"../../../shared/inc" -I"C:\CUDA32\v3.2\\include" -maxrregcount=[Value]  --compile -o "Release/asyncAPI.cu.obj" asyncAPI.cu
nvcc fatal   : '[Value]': expected a number
Project asyncAPI : error: A tool returned an error code from "Compiling with CUDA Build Rule..."
Build log was saved at "file://C:\CUDA32SDK\C\src\asyncAPI\Release\BuildLog.htm"
asyncAPI - 1 error(s), 0 warning(s), 0 remark(s)
------ Neues Erstellen gestartet: Projekt: alignedTypes, Konfiguration: Release Win32 ------
Deleting intermediate files and output files for project 'alignedTypes', configuration 'Release|Win32'.
Compiling with CUDA Build Rule... (Microsoft VC++ Environment)
"C:\CUDA32\v3.2\\bin\nvcc.exe"  -G0  -gencode=arch=compute_10,code=\"sm_10,compute_10\" -gencode=arch=compute_20,code=\"sm_20,compute_20\"  --machine 32 -ccbin "C:\Programme\Microsoft Visual Studio 9.0\VC\bin" -D_NEXUS_DEBUG -g   -Xcompiler "/EHsc /W3 /nologo /O2 /Zi  /MT  " -I"./" -I"../../common/inc" -I"../../../shared/inc" -I"C:\CUDA32\v3.2\\include" -maxrregcount=[Value]  --compile -o "Release/alignedTypes.cu.obj" alignedTypes.cu
nvcc fatal   : '[Value]': expected a number
Project alignedTypes : error: A tool returned an error code from "Compiling with CUDA Build Rule..."
Build log was saved at "file://C:\CUDA32SDK\C\src\alignedTypes\Release\BuildLog.htm"
alignedTypes - 1 error(s), 0 warning(s), 0 remark(s)
========== Alles neu erstellen: 10 erfolgreich, Fehler bei 83, 0 übersprungen ==========

hey, that's the rules file error... maxrregcount gets no value...
phh, and I installed the latest compiler
Compiling with Intel(R) C++ Compiler XE 12.0.1.127 [IA-32]... (Intel C++ Environment)

looking up:
Intel(R) C++ Compiler XE on IA-32, version 12.0.1 Package ID: w_ccompxe_2011.1.127
got it with the latest download from 04.01.2011
It's a little bit frustrating... that this known error is still in package w_ccompxe_2011.1.127 today.  :'(

heinz
Title: Re: optimized sources
Post by: _heinz on 08 Jan 2011, 02:06:11 pm
Hi Jason,

CUDA3.2 & INTEL's Compiler
Package ID: w_ccompxe_2011.1.127
I successfully used the patch CompilerIDEPluginUpdate.zip to get rid of the rules-file error.
The rules-file error is gone.  ;)

The projects MonteCarloMultiGPU and MonteCarlo do not compile successfully with the Intel Compiler:
ptxas hangs  :'(
========== Alles neu erstellen: 91 erfolgreich, Fehler bei 2, 0 übersprungen ==========

So for now, we must wait for the next update.

heinz
Title: Re: optimized sources
Post by: _heinz on 11 Jan 2011, 07:33:58 am
Sandy Bridge is available in Germany

Intel® Core™ i5-2500K (Boxed, FC-LGA4, "Sandy Bridge") € 219,90*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The Intel® Core™ i5-2500K processor is based on the new 32 nm Sandy Bridge architecture and is a native quad-core CPU with an integrated graphics core (Intel® HD 3000). The most important new feature of this further development of the previous Intel® Core™ processors is AVX, a revised version of the SSE instructions extended to 256 bits. The integrated memory controller now officially supports DDR3 memory up to 1600 MHz, and the improved "Dynamic Turbomode" can briefly raise the clock by up to 30% above the maximum value at low system temperatures (such as right after the computer starts).
The integrated GPU runs at 850 MHz (max. 1150 with Turbo) and shares the common 6 MB L3 cache with the CPU.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Designation: Intel® Core™ i5-2500K
Number of processor cores: 4
Clock frequency: 3300 MHz
Cache Level 1: 4x 64 kB
Level 2: 4x 256 kB
Level 3: 6144 kB
Instruction sets: SSE 4.x, AVX, EIST, Intel 64, XD bit, Intel VT-x, Smart Cache, Clear Video, Turbo Boost, AES-NI
Memory controller:
Memory standards: DDR3-1066, DDR3-1333, DDR3-1600
Memory channels: 2
Processor core: Sandy Bridge, 32 nm
Note on Turbo mode: CPU up to max. 3.7 GHz; GPU clock: 850 / 1100 MHz (standard / Turbo)
Max. power consumption: 95 watts
Package: FC-LGA4
Socket: 1155
Fan: mounting for ATX mainboards
Connectors: PWM fan connector
Special features: with active CPU cooler

Further info: The K models of the Intel® Core™ processors are intended for overclockers and technology enthusiasts and offer a freely selectable multiplier
~~~~~~~

Intel® Core™ i7-2600K (Boxed, FC-LGA4, "Sandy Bridge") € 339,-*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The Intel® Core™ i7-2600K processor is based on the new 32 nm Sandy Bridge architecture and is a native quad-core CPU with an integrated graphics core (Intel® HD 3000). The most important new feature of this further development of the previous Intel® Core™ processors is AVX, a revised version of the SSE instructions extended to 256 bits. The integrated memory controller now officially supports DDR3 memory up to 1600 MHz, and the improved "Dynamic Turbomode" can briefly raise the clock by up to 30% above the maximum value at low system temperatures (such as right after the computer starts).
The integrated GPU runs at 850 MHz (max. 1350 with Turbo) and shares the common 8 MB L3 cache with the CPU.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Designation: Intel® Core™ i7-2600K
Number of processor cores: 4
Clock frequency: 3400 MHz
Cache Level 1: 4x 64 kB
Level 2: 4x 256 kB
Level 3: 8192 kB
Instruction sets: SSE 4.x, AVX, EIST, Intel 64, XD bit, Intel VT-x, Smart Cache, Clear Video, Turbo Boost, HyperThreading, AES-NI
Memory controller:
Memory standards: DDR3-1066, DDR3-1333, DDR3-1600
Memory channels: 2
Processor core: Sandy Bridge, 32 nm
Note on Turbo mode: CPU up to max. 3.8 GHz; GPU clock: 850 / 1350 MHz (standard / Turbo)
Max. power consumption: 95 watts
Package: FC-LGA4
Socket: 1155
Fan: mounting for ATX mainboards
Connectors: PWM fan connector
Special features: with active CPU cooler

Further info: The K models of the Intel® Core™ processors are intended for overclockers and technology enthusiasts and offer a freely selectable multiplier
 
~~~~~~~~~~~~~~~~~~~~~~
astropulse AVX app is in preparation.

heinz
Title: Re: optimized sources
Post by: _heinz on 20 Jan 2011, 11:03:25 am
For all of you who are interested in knowing more about IPP/AVX


/* Win32:
px - C-optimized for all IA-32 processors
a6 - Optimized for Pentium III processors remark:thru 5.3 only
w7 - Optimized for Pentium 4 processors (SSE + SSE2)
t7 - Optimized for Pentium 4 processors with Streaming SIMD Extensions 3 (SSE3)
v8 - New Optimizations for 32-bit applications on Intel® Core™ 2 and Intel® Xeon® 5100 Processors
p8 - New Optimizations for 32-bit applications on 45nm Intel® Core™2 Duo (Penryn,Nehalem,Westmere) family processors
s8 - New Optimizations for 32-bit applications on Intel® Atom™ family processors
e9 - Not in Win32
*/
/* Win64:
mx - C-optimized for all Intel 64-based Platforms
m7 - Optimized for Intel 64-Based Platforms for Pentium 4 processors with Streaming SIMD Extensions 3 (SSE3)
u8 - New Optimizations for 64-bit applications on Intel® Core™ 2 and Intel® Xeon® 5100 Processors
y8 - New Optimizations for 64-bit applications on 45nm Intel® Core™2 Duo (Penryn,Nehalem,Westmere) family processors and Intel® Core™ i7 processors (Nehalem and Westmere)
n8 - New Optimizations for 64-bit applications on Intel® Atom™ family processors
e9 - AVX required, New Optimizations for 64-bit applications on Sandy Bridge µarchitecture
*/
// Description:
// introduced USE_xxx for better prefix handling
// USE_AVX is currently available for 64-bit only.
// USE_AVX excludes all others.
// USE_AVX and USE_ATOM exclude each other.
// USE_ATOM excludes all others.
// USE_SSE42 and USE_SSE41 use the same prefixes (y8, p8)
/*
USE_IPP      Prefix
---------------------------------
             X64   X32
USE_AVX      e9    -(p8)
USE_ATOM     n8    s8
USE_SSE42    y8    p8
USE_SSE41    y8    p8
USE_SSSE3    u8    v8
USE_SSE3     m7    t7
USE_SSE2     m7    w7
USE_SSE      m7    w7
no order     mx    px
*/
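The exclusion rules from the description can be written down as a small check. This is only a sketch: the bit flags below are hypothetical stand-ins for the USE_xxx defines, and in the real sources such a check would more likely be a preprocessor #if/#error block. The rule assumed here is the one stated above: USE_AVX and USE_ATOM must each stand alone, while the other defines are not restricted.

```cpp
#include <cassert>

// Hypothetical bit flags standing in for the USE_xxx defines in the table.
enum UseFlag : unsigned {
    kUseSSE   = 1u << 0,
    kUseSSE2  = 1u << 1,
    kUseSSE3  = 1u << 2,
    kUseSSSE3 = 1u << 3,
    kUseSSE41 = 1u << 4,
    kUseSSE42 = 1u << 5,
    kUseAtom  = 1u << 6,
    kUseAvx   = 1u << 7
};

// True when zero or exactly one bit is set.
constexpr bool at_most_one_bit(unsigned v) {
    return (v & (v - 1u)) == 0u;
}

// USE_AVX and USE_ATOM each exclude all other flags;
// combinations without them are accepted as they are.
constexpr bool is_valid_selection(unsigned flags) {
    return ((flags & (kUseAvx | kUseAtom)) == 0u) || at_most_one_bit(flags);
}

// Compile-time examples of the rule.
static_assert(is_valid_selection(kUseAvx), "AVX alone is allowed");
static_assert(!is_valid_selection(kUseAvx | kUseAtom), "AVX and ATOM exclude each other");
```

The checks run at compile time, so a bad combination would stop the build before any IPP prefix is chosen.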

We use macros to handle all that IPP stuff in the right way.
The PREFIX is inserted into the different macros to generate the prefixed IPP calls.
sample: #define ippsFFTFree_C_32fc PREFIX(ippsFFTFree_C_32fc)
Furthermore, the generated object file must be linked statically.
The 64-bit libs are necessary for linking.
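How such a PREFIX macro can work is sketched below. It is not the code from the ap sources: demo_fft_free, DEMO_X64, DEMO_CAT and the two stand-in functions are invented for this illustration, and no real IPP function is called. The sketch assumes the USE_SSE3 row of the table (m7 on x64, t7 on x32) and assumes that the predefined macros _WIN64, __x86_64__ or __LP64__ identify a 64-bit target.

```cpp
#include <cassert>
#include <string>

// Demo choice; in a real build this would come from the project configuration.
#define USE_SSE3

// Assumption: one of these predefined macros marks a 64-bit target.
#if defined(_WIN64) || defined(__x86_64__) || defined(__LP64__)
#  define DEMO_X64 1
#endif

// Pick the prefix from the USE_SSE3 row: m7 (x64) or t7 (x32).
#if defined(USE_SSE3) && defined(DEMO_X64)
#  define IPP_PREFIX m7
#elif defined(USE_SSE3)
#  define IPP_PREFIX t7
#else
#  error "this sketch only handles USE_SSE3"
#endif

// Two-step pasting so that IPP_PREFIX is expanded before ## is applied.
#define DEMO_CAT2(a, b) a##_##b
#define DEMO_CAT(a, b) DEMO_CAT2(a, b)
#define PREFIX(name) DEMO_CAT(IPP_PREFIX, name)

// Stand-ins for the CPU-specific library entry points.
inline std::string m7_demo_fft_free() { return "m7_demo_fft_free"; }
inline std::string t7_demo_fft_free() { return "t7_demo_fft_free"; }

// Same pattern as the sample above: the generic name maps to the prefixed one.
#define demo_fft_free PREFIX(demo_fft_free)
```

After this, a call to demo_fft_free() resolves at compile time to m7_demo_fft_free() or t7_demo_fft_free(), so the calling code never needs to spell out the prefix.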



------ Neues Erstellen gestartet: Projekt: ap_client, Konfiguration: AP_QaxAVX_CSP x64 ------
Deleting intermediate files and output files for project 'ap_client', configuration 'AP_QaxAVX_CSP|x64'.
Compiling with Intel(R) C++ Compiler XE 12.0.1.127 [Intel(R) 64]... (Intel C++ Environment)
ap_schema.cpp
Compiling with Intel(R) C++ Compiler XE 12.0.1.127 [Intel(R) 64]... (Intel C++ Environment)
ap_fold.cpp
-----USE_AVX activ-----
-----AVX x64 e9 activ-----
Compiling with Intel(R) C++ Compiler XE 12.0.1.127 [Intel(R) 64]... (Intel C++ Environment)
dm_chunk_parallel.cpp
-----USE_AVX activ-----
-----AVX x64 e9 activ-----
mtrand.cpp
Compiling with Intel(R) C++ Compiler XE 12.0.1.127 [Intel(R) 64]... (Intel C++ Environment)
ap_timer.cpp
-----USE_AVX activ-----
-----AVX x64 e9 activ-----
Compiling with Intel(R) C++ Compiler XE 12.0.1.127 [Intel(R) 64]... (Intel C++ Environment)
ap_fileio.cpp
-----USE_AVX activ-----
-----AVX x64 e9 activ-----
Compiling with Intel(R) C++ Compiler XE 12.0.1.127 [Intel(R) 64]... (Intel C++ Environment)
ap_remove_radar.cpp
-----USE_AVX activ-----
-----AVX x64 e9 activ-----
Compiling with Intel(R) C++ Compiler XE 12.0.1.127 [Intel(R) 64]... (Intel C++ Environment)
ap_client_main.cpp
-----USE_AVX activ-----
-----AVX x64 e9 activ-----
Compiling with Intel(R) C++ Compiler XE 12.0.1.127 [Intel(R) 64]... (Intel C++ Environment)
ap_debug.cpp
Compiling with Intel(R) C++ Compiler XE 12.0.1.127 [Intel(R) 64]... (Intel C++ Environment)
version.cpp
Compiling with Intel(R) C++ Compiler XE 12.0.1.127 [Intel(R) 64]... (Intel C++ Environment)
sbtf.cpp
Compiling with Intel(R) C++ Compiler XE 12.0.1.127 [Intel(R) 64]... (Intel C++ Environment)
ap_gfx_main.cpp
-----USE_AVX activ-----
-----AVX x64 e9 activ-----
Compiling with Intel(R) C++ Compiler XE 12.0.1.127 [Intel(R) 64]... (Intel C++ Environment)
intrinsics.cpp
-----USE_AVX activ-----
-----AVX x64 e9 activ-----
Compiling with Intel(R) C++ Compiler XE 12.0.1.127 [Intel(R) 64]... (Intel C++ Environment)
ap_science.cpp
-----USE_AVX activ-----
-----AVX x64 e9 activ-----
-----USE_AVX activ-----
-----AVX x64 e9 activ-----
Linking... (Intel C++ Environment)
xilink: executing 'link'
So far we can compile; for linking we need the 64-bit libs of libboinc, libboincapi, setiboincdb

Creating the 64-bit libs (libboinc, libboincapi, setiboincdb) of the project will take some time.


heinz
Title: Re: optimized sources
Post by: _heinz on 20 Jan 2011, 05:09:22 pm
V8-Xeon
btw crunching:
The projection says: will reach 100 Mio in 30 days, on 20th February.
Since 20th December V8-Xeon has produced a constant 1 Mio per day with two well-clocked GTX470s.
RAC is now: 1,030,867
So we hope the machine runs stable next month too.
happy crunching  ;)

heinz



Title: Re: optimized sources
Post by: _heinz on 02 Feb 2011, 06:27:06 am
V8-Xeon, 2xGTX470
cr/per day       week         month         average
1,101,713   7,014,636   30,799,731   1,014,102 

RAC dropped a bit due to several driver restarts, and twice I found the machine down in the morning,
but I could still hold the average above 1 Mio.

heinz

technical news: a new dual board for XEON is available
EVGA Classified SR-2 (Super Record 2)PART NUMBER: 270-WS-W555-A2 (http://www.evga.com/products/moreInfo.asp?pn=270-WS-W555-A2&family=Motherboard%20Family&series=Intel%205520%20Series%20Family&sw=5)
You can find assembled machines there (http://www.digitalstormonline.com/)
I like this SUB-ZERO PC (http://www.digitalstormonline.com/compblackops.asp) and the CUBE (http://www.digitalstormonline.com/compblackops.asp)
have fun exploring it.
Title: Re: optimized sources
Post by: _heinz on 10 Feb 2011, 04:30:56 pm
V8-Xeon
10th february 90,095,533.50 total
average credit: 1,091,774
We still have 10 days; we are on track to reach 100 Mio total on 20th February.
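The estimate can be checked with simple arithmetic: remaining credits divided by the daily average, rounded up to whole days. The helper below is a minimal sketch under the assumption of a constant daily rate; days_to_target is a made-up name, not a BOINC function.

```cpp
#include <cassert>
#include <cmath>

// Whole days needed to reach target_total at a constant daily rate.
inline long days_to_target(double current_total, double target_total,
                           double credits_per_day) {
    assert(credits_per_day > 0.0);
    if (current_total >= target_total) {
        return 0;  // target already reached
    }
    const double remaining = target_total - current_total;
    return static_cast<long>(std::ceil(remaining / credits_per_day));
}
```

With the numbers above, days_to_target(90095533.50, 100000000.0, 1091774.0) returns 10, because about 9.9 Mio are still missing at roughly 1.09 Mio per day.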

heinz
 
Title: Re: optimized sources
Post by: _heinz on 16 Feb 2011, 01:35:51 pm
16th february: now in the first 500
Pos   Name  Summary       cr/day          cr/week     cr/month       average
496 _heinz  96,392,212   1,316,379   9,851,843   31,132,429   1,090,954 

Although I found the machine down again yesterday morning, I was able to hold the average today.

heinz

Title: Re: optimized sources
Post by: _heinz on 18 Feb 2011, 07:09:28 pm
Saturday, 19th February
I got the 100 Mio total (http://www.boincstats.com/signature/user_374946.gif) today.
100,114,951.92

number 33 in France
Pos  Name    Summary        cr/day           cr/week     cr/month      average
33   _heinz  100,114,952   2,116,920   8,843,266   32,506,661   1,191,680

have a look at the full statistic (http://de.boincstats.com/stats/boinc_user_graph.php?pr=bo&id=5e024335320e436c4d050e073963e326)

This shows that it is possible to produce a constant ~1 Mio credits/day with two well-clocked GTX470s, starting on 10th of December.
Yesterday I added a GTX570, and V8-Xeon now runs with 3 cards.
v8-xeon_2-GTX470 (http://www.britta-d.de/images/gtx470/gtx470_2_cards.jpg)
v8-xeon_3_cards (http://www.britta-d.de/images/gtx470/gtx470_gtx570_3_cards.jpg)
The card in the middle is the GTX570 and a bit longer than the GTX470.
1.6 Mio/day is the projection. We will see whether it comes true.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The upper card runs 10 °C hotter than the other two, although the case is well ventilated
slot0 GTX470  ~87 °C
slot1 GTX570  ~75 °C
slot3 GTX470  ~77 °C
room temperature = 23.5 °C
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
happy crunching
heinz
 

Title: Re: optimized sources
Post by: _heinz on 22 Feb 2011, 12:23:31 pm
1.6 Mio/day are precalculated. We will see if it will become true.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ 1,669,906.80 since then; today I got it.
It's not easy to keep the machine stable: GPU 0 in slot 0 overheated at 86 °C. It seems the card gets hot air from the main memory: the memory cooler blows air onto the memory slots, and hot air comes out left and right. GPU0 sits right under it and sucks in this hot air.
The temperature difference between GPU0 and GPU1 is ~10 °C. Must do something.
Title: Re: optimized sources
Post by: _heinz on 26 Feb 2011, 11:32:49 am
france number one on pg (http://de.boincstats.com/stats/user_stats.php?pr=pg&st=0&co=France#1)
This is still possible with optimized cuda source code.


Title: Re: optimized sources
Post by: _heinz on 02 Mar 2011, 03:54:29 pm
Primegrid Tuesday, 1 March 2011 90,000,000 (including my Atom and Laptop)
V8-Xeon
Today number 12 in the top host list of Primegrid (http://www.primegrid.com/top_hosts.php)
Nr  Name   Average         Total
12 _heinz 1,413,616.73 89,997,188

happy crunching with GPUs, the time of the CPUs is definitely over.

heinz
 
Title: Re: optimized sources
Post by: _heinz on 08 Mar 2011, 08:39:26 pm
Today 8th of March,
V8-Xeon reached 100 Mio on Primegrid today, exactly 3 months after starting on 9th of December.
 
Name V8-SK01
Erstellt 9 Dec 2010 | 17:34:23 UTC
Gesamtguthaben 100,571,962
Durchschnittliches Guthaben 1,553,764.96
V8-hostid=173588 (http://www.primegrid.com/show_host_detail.php?hostid=173588)
Today number 13 of the tophostlist of pg.
todays pg statistics:
User     cr/summary      cr/day          cr/week      cr/month       Average
_heinz  102,051,361   1,804,677   12,494,348   43,651,041   1,615,971 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Todays totalcredit summary: 130,057,400.01

Worldposition number:  389
~~~~~~~~~~~~~~~~~~~~~~~~
Statistic shows:
2011-03-08
Gesamt-Credit 130,057,400.01
Credit/Tag 1,804,677
Position 389
~~~~~~~~~~~~~~~~~~~~~~~~~
My laptop CPU Intel(R) Core(tm) i3 CPU M 390 @ 2.67GHz, GT540M (http://www.notebookcheck.net/NVIDIA-GeForce-GT-540M.41715.0.html), NVIDIA OPTIMUS Technology
hostid=12226764 (http://de.boincstats.com/stats/boinc_host_graph.php?pr=bo&id=12226764)
crunching since 2011-02-21 06:53:45 
got 1 Mio today
1,037,286.49
~~~~~~~~~~~~~~~~~~~~~~~~~
Why post this here:
It clearly shows that the time of the CPUs in the crunching area is over now.
Maybe these results will influence further CPU development.
The first steps have already been taken with the newest i3, i5 and i7, which include Intel HD Graphics directly in the CPU chip.
Newest-generation laptops already have integrated GPUs (ATI / NVIDIA) on the mainboard.
The next step should be a CPU chip with integrated INTEL HD / NVIDIA GPU.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Program development is now more GPU-oriented than ever before.
 
heinz
edit: GT540M link added
Title: Re: optimized sources
Post by: _heinz on 13 Mar 2011, 04:04:37 pm
fighting again with enormous France Telecom speed.. :'( 
I pay for Formule Plus (http://abonnez-vous.orange.fr/residentiel/forfaits/formule-tripleplay-plus.aspx?rdt=o)
I have the newest Livebox.
a shame...download of 0.32 MB/s
(http://speedtest.net/result/1199976854.png)
Title: Re: optimized sources
Post by: _heinz on 14 Mar 2011, 07:17:46 pm
Not a happy day: restarted the server 3 times, a lot of production time is lost. GPU 0, the 470 with the old BIOS, hangs more and more often...
Today number 10 of the tophostlist of pg (http://wwww.primegrid.com/top_hosts.php), but it seems I can not hold it ..
10  _heinz   1,536,210.68   109,970,756

Best would be to replace this GPU.

heinz
Title: Re: optimized sources
Post by: _heinz on 16 Mar 2011, 12:24:44 pm
new speed record from France Telecom today 17:20
(http://www.speedtest.net/result/1204944264.png)
Title: Re: optimized sources
Post by: _heinz on 22 Mar 2011, 06:17:14 am
After crunching a month:
Intel(R) Core(TM) i3 CPU M 390 @ 2.67GHz, GT540M reached 2 million
gt540m_2Mio_primegrid (http://www.britta-d.de/images/gt540m/gt540m_2Mio_primegrid.jpg)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Erstellt 20 Feb 2011 | 15:48:17 UTC
Gesamtguthaben 2,139,379
Durchschnittliches Guthaben 73,954.83
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
My P4 2,6GHz machine died,  :'(
I shut it down and now it never switches on again.
If I push the "on" switch, nothing happens;
maybe the PSU needs replacing
Title: Re: optimized sources
Post by: Miep on 22 Mar 2011, 09:00:43 am
Gesamtguthaben 2,139,379
Durchschnittliches Guthaben 73,954.83

Off topic. The more I see of the German translations, the more I wonder who did them. They lack language intuition in both English and German ::)
Title: Re: optimized sources
Post by: Raistmer on 22 Mar 2011, 09:19:51 am
Gesamtguthaben 2,139,379
Durchschnittliches Guthaben 73,954.83

Off topic. The more I see of the German translations, the more I wonder who did them. They lack language intuition in both English and German ::)
Google? ;)
Title: Re: optimized sources
Post by: _heinz on 24 Mar 2011, 03:22:55 pm
Today  number 20 in France (http://de.boincstats.com/stats/boinc_user_stats.php?pr=bo&st=0&co=8#20)
Title: Re: optimized sources
Post by: benool on 25 Mar 2011, 09:18:29 am
quite impressive!
Title: Re: optimized sources
Post by: _heinz on 09 Apr 2011, 06:29:18 pm
My personal BOINC race started on 9th of December and ended today, 9th of April, after exactly 4 months.
Temps are rising, so I must shut down.
We have already had 26 °C outside.
Let's have a short look at the statistics.
World wide:
france_number_16 (http://www.britta-d.de/images/race/france_number_16.jpg)
world_rac_number_55 (http://www.britta-d.de/images/race/world_rac_number_55.jpg)
world_boinc_number_291 (http://www.britta-d.de/images/race/world_boinc_number_291.jpg)
Primegrid
pg_france_number_1 (http://www.britta-d.de/images/race/pg_france_number_1.jpg)
pg_top_computers_number_8 (http://www.britta-d.de/images/race/pg_top_computers_number_8.jpg)
pg_top_participants_number_33 (http://www.britta-d.de/images/race/pg_top_participants_number_33.jpg)
pg_world_number_42 (http://www.britta-d.de/images/race/pg_world_number_42.jpg)
gt540m_3Mio_primegrid (http://www.britta-d.de/images/gtx540m/gt540m_3Mio_primegrid.jpg)
~~~~~~~~~~~~~~~~
pg_summary 159,081,868
pg_average 1,798,277.49
~~~~~~~~~~~~~~~~
if we are looking at the numbers:
production-average 53 Mio / month
RAC 1,791,860
~~~~~~~~~
1xGTX570
2xGTX470

heinz
Title: Re: optimized sources
Post by: _heinz on 22 Apr 2011, 01:26:38 pm
BOINC 6.10.60 does not detect a GPU gt540m (http://www.britta-d.de/images/gtx540m/gt540m_everest.jpg) on my laptop
latest CUDA-driver 270.61 is installed
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~.
22.04.2011 18:57:32      Starting BOINC client version 6.10.60 for windows_x86_64
22.04.2011 18:57:32      log flags: file_xfer, sched_ops, task
22.04.2011 18:57:32      Libraries: libcurl/7.19.7 OpenSSL/0.9.8l zlib/1.2.3
22.04.2011 18:57:32      Running as a daemon
22.04.2011 18:57:32      Data directory: C:\BOINC\DATA
22.04.2011 18:57:32      Running under account boinc_master
22.04.2011 18:57:33      Processor: 4 GenuineIntel Intel(R) Core(TM) i3 CPU       M 390  @ 2.67GHz [Family 6 Model 37 Stepping 5]
22.04.2011 18:57:33      Processor: 256.00 KB cache
22.04.2011 18:57:33      Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 syscall nx lm vmx tm2 popcnt pbe
22.04.2011 18:57:33      OS: Microsoft Windows 7: Home Premium x64 Edition, Service Pack 1, (06.01.7601.00)
22.04.2011 18:57:33      Memory: 3.80 GB physical, 7.60 GB virtual
22.04.2011 18:57:33      Disk: 546.24 GB total, 459.77 GB free
22.04.2011 18:57:33      Local time is UTC +2 hours
22.04.2011 18:57:34      No usable GPUs found
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
BOINC 6.10.58 shows
22.04.2011 19:44:40      Starting BOINC client version 6.10.58 for windows_x86_64
22.04.2011 19:44:40      log flags: file_xfer, sched_ops, task
22.04.2011 19:44:40      Libraries: libcurl/7.19.7 OpenSSL/0.9.8l zlib/1.2.3
22.04.2011 19:44:40      Data directory: C:\BOINC\DATA
22.04.2011 19:44:40      Running under account heinz
22.04.2011 19:44:40      Processor: 4 GenuineIntel Intel(R) Core(TM) i3 CPU       M 390  @ 2.67GHz [Family 6 Model 37 Stepping 5]
22.04.2011 19:44:40      Processor: 256.00 KB cache
22.04.2011 19:44:40      Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 syscall nx lm vmx tm2 popcnt pbe
22.04.2011 19:44:40      OS: Microsoft Windows 7: Home Premium x64 Edition, Service Pack 1, (06.01.7601.00)
22.04.2011 19:44:40      Memory: 3.80 GB physical, 7.60 GB virtual
22.04.2011 19:44:40      Disk: 546.24 GB total, 466.69 GB free
22.04.2011 19:44:40      Local time is UTC +2 hours
22.04.2011 19:44:41      NVIDIA GPU 0: GeForce GT 540M (driver version unknown, CUDA version 4000, compute capability 2.1, 962MB, 192 GFLOPS peak)
Title: Re: optimized sources
Post by: Claggy on 22 Apr 2011, 01:43:13 pm
Quote
22.04.2011 18:57:32      Running as a daemon

That's your problem, uninstall Boinc 6.10.60, and reinstall without ticking the 'Protected application execution' box,

the reason why is posted here: No GPUs Found (http://boinc.berkeley.edu/dev/forum_thread.php?id=5609&nowrap=true#31993)

Claggy
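
A quick way to spot this condition in a startup log can be sketched as follows. This is only an illustration: the function name `gpu_hidden_by_service` is made up here, and the check relies purely on the two message strings that appear in the logs quoted in this thread.

```python
def gpu_hidden_by_service(log_lines):
    """Return True if the log shows a service (daemon) start and no usable GPU."""
    # Both strings are copied from the BOINC startup messages quoted above.
    daemon = any("Running as a daemon" in line for line in log_lines)
    no_gpu = any("No usable GPUs found" in line for line in log_lines)
    return daemon and no_gpu


service_log = [
    "22.04.2011 18:57:32      Running as a daemon",
    "22.04.2011 18:57:34      No usable GPUs found",
]
user_log = [
    "22.04.2011 19:53:58      Running under account heinz",
    "22.04.2011 19:53:59      NVIDIA GPU 0: GeForce GT 540M (driver version unknown, "
    "CUDA version 4000, compute capability 2.1, 962MB, 192 GFLOPS peak)",
]

print(gpu_hidden_by_service(service_log))
print(gpu_hidden_by_service(user_log))
```

If the first check is true, reinstalling without the 'Protected application execution' option, as described above, is the thing to try.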
Title: Re: optimized sources
Post by: _heinz on 22 Apr 2011, 01:49:28 pm
Thanks Claggy, will try it

~~~~~~~~~~~~~~~
It works:
22.04.2011 19:53:58      Starting BOINC client version 6.10.60  for windows_x86_64
22.04.2011 19:53:58      log flags: file_xfer, sched_ops, task
22.04.2011 19:53:58      Libraries: libcurl/7.19.7 OpenSSL/0.9.8l zlib/1.2.3
22.04.2011 19:53:58      Data directory: C:\BOINC\DATA
22.04.2011 19:53:58      Running under account heinz
22.04.2011 19:53:59      Processor: 4 GenuineIntel Intel(R) Core(TM) i3 CPU       M 390  @ 2.67GHz [Family 6 Model 37 Stepping 5]
22.04.2011 19:53:59      Processor: 256.00 KB cache
22.04.2011 19:53:59      Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 syscall nx lm vmx tm2 popcnt pbe
22.04.2011 19:53:59      OS: Microsoft Windows 7: Home Premium x64 Edition, Service Pack 1, (06.01.7601.00)
22.04.2011 19:53:59      Memory: 3.80 GB physical, 7.60 GB virtual
22.04.2011 19:53:59      Disk: 546.24 GB total, 466.23 GB free
22.04.2011 19:53:59      Local time is UTC +2 hours
22.04.2011 19:53:59      NVIDIA GPU 0: GeForce GT 540M (driver version unknown, CUDA version 4000, compute capability 2.1, 962MB, 192 GFLOPS peak)
Title: Re: optimized sources
Post by: _heinz on 24 Apr 2011, 11:02:57 am
V8-Xeon is dead  :'(
After the last shutdown I restarted the server yesterday, but surprisingly the machine did not start; it hung while booting Vista64. I tried safe mode, but nothing helped. After loading a few drivers the machine switched itself off. After that, the machine no longer got past the BIOS self-test. So I decided to open the case for a general cleaning. I took out all 3 graphics adapters and the main memory.
Here is a look at the empty machine (http://www.britta-d.de/images/v8_empty_machine.jpg).
After cleaning everything I left the 2 GTX470 out.
Now only one graphics adapter, the GTX570, is inserted (http://www.britta-d.de/images/v8_gtx570.jpg)
When I switch the machine on, the display on the board counts up to 28, then shows 29, and the machine switches off.
Port 80h POST codes (http://www.intel.com/support/motherboards/desktop/sb/CS-025434.htm)
Not a good sign. It looks like defective memory or a defective chipset.
I will take the memory out and clean it again, hoping for a little more luck.
If you have any special tips, they are welcome.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Happy Easter
Joyeuses Pâques
to all readers

heinz
Title: Re: optimized sources
Post by: _heinz on 11 May 2011, 11:15:36 am
V8-Xeon:
after cleaning it again, it ran successfully through the BIOS once, up to the first disk access; then the PSU switched off.
So I can hope the board and chipset are not damaged.
It looks like I need a new PSU.
~~~~~~~~~~~~~~~~~~~~~~~~~~
btw.
 i3 GT540M got 5Mio primegrid (http://www.britta-d.de/images/gtx540m/gt540m_5Mio_primegrid.jpg)
and my ION R3600 Atom will reach 1 million in the next few days.
modify:
First seen on 2010-10-09 06:58:22
CPU Intel(R) Atom(tm) CPU 230 @ 1.60GHz
Number of CPU's (number of (virtual) cores) 1(2)
Operating System and version Microsoft Windows 7
Current Credit (based on incremental update) 1,091,539.47  
Recent average credit RAC (projects accumulated) 6,143.94147
Some time later:
now R3600 ATOM 230 ION got 1 Mio primegrid (http://www.britta-d.de/images/primegrid/pg_r3600_atom230_1Mio.jpg)
Last year, nobody would have thought that such a small machine could do it.  ;D

heinz


Title: Re: optimized sources
Post by: _heinz on 17 Jun 2011, 10:46:36 am
Hi,
here are some pictures of our latest CUDA apps running on my i3.
Installer v0.38 was used to install them.
GT540M slightly overclocked (http://www.britta-d.de/images/gtx540m/gt540m_gpuz_data.jpg)
GT540M_sensors_Lunatics_x38g_win32_cuda32_2_wus_running (http://www.britta-d.de/images/gtx540m/gt540m_gpuz_sensors.jpg)

Picture from our beta test:
running 4 akv8 and 1 x38e at the moment of downclocking (http://www.britta-d.de/images/gtx540m/gt540m_crunching_akv8b_lunatics_x38e.jpg)
The downclocking is solved now.

Thanks to the developer team, the beta testers, and all the people involved.....

Download the v0.38 installer and try the newly published apps --> download v0.38 (http://lunatics.kwsn.net/index.php?module=Downloads;catd=9)

heinz



Title: Re: optimized sources
Post by: _heinz on 18 Jun 2011, 03:43:17 pm
some more pictures from x38g
i3 gt540m crunching 3 akv8b and 2 x38g (http://www.britta-d.de/images/gtx540m/gt540m_crunching_3xakv8b_2x38g.jpg)
While running 3 akv8 VLAR tasks on the CPU, the temperature went up to 84 °C and the CPU downclocked from 2.6 GHz to 931 MHz.
Yeah, akv8 is a toaster

heinz
modify: have you noticed the CPU needs only 8.23 W?
Title: Re: optimized sources
Post by: _heinz on 26 Jun 2011, 04:40:01 pm
i3 GT540M hostid=187527 (http://www.primegrid.com/show_host_detail.php?hostid=187527) got 7 Mio  (7,006,127) pg today.  ;D
Title: Re: optimized sources
Post by: _heinz on 03 Jul 2011, 12:13:08 pm
Hi Jason,

CUDA3.2 & INTEL's Compiler
Package ID: w_ccompxe_2011.1.127
I successfully used the patch CompilerIDEPluginUpdate.zip to avoid the rules-file error.
The rules-file error is gone.  ;)

Project: MonteCarloMultiGPU and Project: MonteCarlo do not compile successfully with the Intel compiler;
ptxas hangs  :'(
========== Rebuild All: 91 succeeded, 2 failed, 0 skipped ==========

So for now we must wait for the next update.

heinz
Intel Composer XE 2011 Update 4
Meanwhile I have installed CUDA 4.0.
It now compiles the two MonteCarlo projects.
...
Compiling with Intel(R) C++ Compiler XE 12.0.4.196 [Intel(R) 64]... (Intel C++ Environment)
========== Rebuild All: 93 succeeded, 0 failed, 1 skipped ==========

I am closing the issue now.
heinz.

modify:
One issue is still open: the massive memory use during compilation makes the system sluggish, as you can see in radixsortthrust_memory_use (http://www.britta-d.de/images/2011/radixsortthrust_memory_use.jpg)
On a 32-bit machine with 2 GB RAM it hung.  :'(
Title: Re: optimized sources
Post by: _heinz on 14 Jul 2011, 01:48:47 pm
hardware:
For all AMD enthusiasts:
AMD A8-3850 (http://www.alternate.de/html/product/AMD/A8-3850/883408/) now available in Germany  (120 Euro)
AMD A8-3850 Llano APU Benchmarks (http://www.youtube.com/watch?v=iMmAn7EZi0k&NR=1)
AMD A8-3800 Llano APU & Gigabyte GA-A75-UD4H Motherboard Hands-On + Benchmarks (http://www.youtube.com/watch?v=L37q3OyZj8Q&NR=1)

a  4-core/4T  for 120 Euro, really a hot price

heinz
Title: Re: optimized sources
Post by: _heinz on 17 Jul 2011, 07:54:46 am
one is still open: the massive use of memory during compilation makes the system sluggish as you can see radixsortthrust_memory_use (http://www.britta-d.de/images/2011/radixsortthrust_memory_use.jpg)
On a 32Bit machine with 2GB RAM it hung up.  

Memory use of my R3600 ATOM during the compilation of RadixSurtThrust_x32 (http://www.britta-d.de/images/2011/2011XE_Beta_update6_RadixSurtThrust_x32_memory_use.jpg)
I can confirm the error is gone when using the beta compiler.  ;D
Now we wait for the official update from Intel.

heinz
Title: Re: optimized sources
Post by: _heinz on 22 Jul 2011, 06:14:39 am
The right hardware to support AVX and CUDA:
Laptop Erazer X6816 (http://aldi.medion.com/md97888/nord/?refPage=aldi#software_anker) now available in Germany.
with Geforce-GT-555M 2048MB VRAM (http://www.nvidia.de/object/product-geforce-gt-555m-de.html)
699 Euro, really a hot price.

Don't make the same mistake I did: I bought a laptop with an i3-390M (P6630 with GT540M). Although it was launched in Q1/2011, it is the previous generation.
See the difference between i7-2720QM, i7-2630QM and my i3-390M (http://ark.intel.com/compare/50067,52219,52955)
I bought it too early, couldn't wait.  :'(

heinz
Title: Re: optimized sources
Post by: _heinz on 02 Aug 2011, 02:17:14 am
2nd of August, more than 120 000 views... ;D
Time to say thank you to all readers who have not lost interest in this thread.
In autumn I will hopefully be back with a repaired V8-Xeon making some good numbers ....

Thanks to everyone looking in here.

heinz
Title: Re: optimized sources
Post by: _heinz on 03 Aug 2011, 11:39:44 am
As my V8-Xeon died, I bought a tablet PC "AIRIS One PAD" 10-inch-infotmic-X210-1GHz-Android-2-2-OS-Tablet-PC-With-GPS-Function like this one (http://www.lightinthebox.com/Fly-touch-3--10-inch-infotmic-X210-1GHz-Android-2-2-OS-Tablet-PC-With-GPS-Function_p193841.html?currency=EUR#have_reviews)
I equipped it with a "32GB microSD Class 10" card to hold all my data.
It's very handy and runs great, a very useful toy.
I like small form factor hardware.  ;D
Title: Re: optimized sources
Post by: _heinz on 09 Aug 2011, 01:23:27 pm
Most of us run slightly overclocked GPUs and have already been confronted with the downclocking problem.
So it's worth posting something about it here.
I had installed Nvidia's driver 275.33, later 280.19, and BOINC 6.12.33 64-bit, and ran into the downclocking trap.
That means: whenever I stopped BOINC and started it again, the GPU clocked down to 50% of its frequency and did not come up again.
And when I used EVGA Precision to set the frequency up, nothing happened. It was not possible to run at full speed again until the next restart.
Now I have gone back to driver 267.21 and BOINC 6.10.60, and the downclocking has not happened again. EVGA Precision works as it should.

heinz
Title: Re: optimized sources
Post by: _heinz on 10 Aug 2011, 08:57:44 am
downclocking - upclocking
~~~~~~~~~~~~~~~~
In the picture you can see what happens when I stop BOINC and then start it again: the card clocks down and up again as it should.
gt540m_clockdown_clockup (http://www.britta-d.de/images/gtx540m/gt540m_clockdown_clockup.jpg)

 ;D
Title: Re: optimized sources
Post by: _heinz on 10 Aug 2011, 01:51:55 pm
BOINC 6.10.60
10.08.2011 19:42:35      NVIDIA GPU 0: GeForce GT 540M (driver version unknown, CUDA version 3020, compute capability 2.1, 994MB, 200 GFLOPS  peak) gt540m_gpuz_data_ocl (http://www.britta-d.de/images/gtx540m/gt540m_gpuz_data_ocl.jpg)
Awaiting 10 Mio from pg in the next days...
Title: Re: optimized sources
Post by: Jason G on 11 Aug 2011, 01:40:21 am
...driver version unknown...

Hi Heinz,
  I am currently playing with nvApi , which is linked into Boinc.  I haven't looked in detail what interface they use to get the driver, but I suppose it's possible they use some outdated methods, or some other method than recommended.

For other purposes I made an extremely primitive test to see that I could access nVapi, 7zip archived exe and example screenshot attached.  If this shows the correct driver on your system ( instead of Unknown '...' ) then I might dig deeper to see what they might be doing wrong.

Ignore the stuff about view modes, it was just a trial to see if I could dig deeper into some system information, and it may not work for all systems.
Title: Re: optimized sources
Post by: Richard Haselgrove on 11 Aug 2011, 03:48:20 am
I wonder if it could be specific to notebooks?

09-Aug-2011 16:07:08 [---] Starting BOINC client version 6.12.33 for windows_x86_64
09-Aug-2011 16:07:08 [---] OS: Microsoft Windows 7: Home Premium x64 Edition, Service Pack 1, (06.01.7601.00)
09-Aug-2011 16:07:08 [---] NVIDIA GPU 0: GeForce GT 420M (driver version unknown, CUDA version 4000, compute capability 2.1, 994MB, 128 GFLOPS peak)
Title: Re: optimized sources
Post by: _heinz on 11 Aug 2011, 03:55:19 am
Hi Jason,
nvAPItest (http://www.britta-d.de/images/gtx540m/nvapitest.jpg) shows the correct driver version as you can see.
267.21 is the same as gpuz0.5.4 (http://www.britta-d.de/images/gtx540m/gt540m_gpuz_data_ocl.jpg) shows.
Your program does it right.  :)
Title: Re: optimized sources
Post by: _heinz on 11 Aug 2011, 04:27:06 am
My ION shows the driver correctly:
11.08.2011 10:16:27      Starting BOINC client version 6.10.60 for windows_intelx86
11.08.2011 10:16:28      NVIDIA GPU 0: ION (driver version 27032, CUDA version 4000, compute capability 1.1, 242MB, 35 GFLOPS peak)
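
For anyone comparing these startup lines across hosts, the fields can be pulled out with a small parser. This is a rough sketch, not BOINC code: `parse_gpu_line` is a hypothetical helper, the pattern only covers the line format quoted in this thread, and the decoding of the driver number (27032 read as 270.32) is an assumption about how the client packs the version. The CUDA number follows the usual major*1000 + minor*10 convention (4000 is 4.0, 3020 is 3.2).

```python
import re

# Pattern for the "NVIDIA GPU n: ..." startup line as quoted in this thread.
GPU_LINE = re.compile(
    r"NVIDIA GPU (?P<index>\d+): (?P<name>.+?) "
    r"\(driver version (?P<driver>\w+), "
    r"CUDA version (?P<cuda>\d+), "
    r"compute capability (?P<cc>\d+\.\d+), "
    r"(?P<mem_mb>\d+)MB, "
    r"(?P<gflops>\d+) GFLOPS\s+peak\)"
)


def parse_gpu_line(line):
    """Return a dict of GPU fields, or None if the line does not match."""
    match = GPU_LINE.search(line)
    if match is None:
        return None
    raw_driver = match.group("driver")
    if raw_driver.isdigit():
        # Assumption: the number is major*100 + minor, e.g. 27032 -> 270.32.
        number = int(raw_driver)
        driver = f"{number // 100}.{number % 100:02d}"
    else:
        driver = None  # the client printed "unknown"
    cuda = int(match.group("cuda"))
    return {
        "name": match.group("name"),
        "driver": driver,
        "cuda": f"{cuda // 1000}.{(cuda % 1000) // 10}",
        "compute_capability": match.group("cc"),
        "mem_mb": int(match.group("mem_mb")),
        "gflops": int(match.group("gflops")),
    }


ion = parse_gpu_line(
    "11.08.2011 10:16:28      NVIDIA GPU 0: ION (driver version 27032, "
    "CUDA version 4000, compute capability 1.1, 242MB, 35 GFLOPS peak)"
)
laptop = parse_gpu_line(
    "10.08.2011 19:42:35      NVIDIA GPU 0: GeForce GT 540M (driver version unknown, "
    "CUDA version 3020, compute capability 2.1, 994MB, 200 GFLOPS  peak)"
)
print(ion)
print(laptop)
```

A `driver` value of `None` marks exactly the hosts being discussed here, where the client could not read the driver version.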
Title: Re: optimized sources
Post by: _heinz on 11 Aug 2011, 04:42:58 am
Hi Jason,
I posted it some months ago to Ken http://www.primegrid.com/forum_thread.php?id=3144&nowrap=true#33133
Querying for a CUDA device works quite differently with Optimus technology,
 see http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/docs/CUDA_Developer_Guide_for_Optimus_Platforms.pdf
 page 3
 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 I think it would be better if the app chose the GPU, rather than juggling with profiles.
 my two cents.
 
heinz
 
Title: Re: optimized sources
Post by: Jason G on 11 Aug 2011, 05:29:09 am
I wonder if it could be specific to notebooks?

If mine works, then it's specific to Boinc  :D (i.e. they aren't using the same mechanism. I'll have a proper look at what they use a bit later. Strange they would link with nvApi and not use that.)
Title: Re: optimized sources
Post by: Jason G on 11 Aug 2011, 05:45:41 am
... I think it would be better the app choose the GPU, than jiggle with profiles.
 my two cents....
yeah optimus has differences when you need Cuda + Graphics (DX or OpenGL), which we didn't ever need to worry about.  I am curious how Boinc tells the driver version now, as it looks to sometimes give the same proper result & sometimes weird 'unknown'.  I'll look into it, as really it's likely the server needs to know the proper driver version one day, to send the right applications.
Title: Re: optimized sources
Post by: Richard Haselgrove on 11 Aug 2011, 06:04:40 am
If mine works, then it's specific to Boinc  :D (i.e. they aren't using the same mechanism. I'll have a proper look at what they use a bit later. Strange they would link with nvApi and not use that.)

Well, yes, BOINC - but it's specific to certain hosts, and thus I suspect to certain graphics cards. Looking at my host list (http://setiathome.berkeley.edu/hosts_user.php?userid=5509), it's only the 420M notebook which is missing a driver. The only other difference which would be worth considering is that the notebook is the only 64-bit host in that list - but there are enough 64-bit users around to rule that difference out. Oh, and notebook GPUs are almost by definition OEM variants - GPU-Z has problems getting memory usage from this one, for example.
Title: Re: optimized sources
Post by: Jason G on 11 Aug 2011, 06:08:55 am
Mine's 64 bit, so that's eliminated as a source.  Now that I'm curious I'll have a bit of a dig in the Boinc sources & give you another source patch... LoL
Title: Re: optimized sources
Post by: Richard Haselgrove on 11 Aug 2011, 06:12:06 am
Mine's 64 bit, so that's eliminated as a source.  Now that I'm curious I'll have a bit of a dig in the Boinc sources & give you another source patch... LoL

I'll be around for a couple of hours to run any instrumented test apps, but I've promised to go out on a call at 13:00 UTC.
Title: Re: optimized sources
Post by: Jason G on 11 Aug 2011, 06:13:27 am
I'll be around for a couple of hours to run any instrumented test apps, but I've promised to go out on a call at 13:00 UTC.
I'm off to work anyway.  Only a few hours tonight.  Heinz has just got me curious now, so I have to find out  ;)

[Later:]  Looks like some ancient nvapi.lib (both 32 & 64 bit) in the Boinc Trunk.  I'll have a look if simply bringing them up to date is easy enough, and helps (or not).

Boinc trunk 32 bit version nvapi:
- name: nvapi.lib
- size: 10.1KB
Mine: nvapi.lib, 80.9KB

Boinc trunk 64 bit version nvapi:
- name: nvapi.lib
- size: 10.7KB
Mine: nvapi64.lib, 116KB


Title: Re: optimized sources
Post by: Josef W. Segur on 11 Aug 2011, 11:21:23 am
BOINC is still trying to support Win2k, likely the most recent versions of that DLL don't. Time marches on...
                                                          Joe
Title: Re: optimized sources
Post by: Jason G on 11 Aug 2011, 11:26:40 am
BOINC is still trying to support Win2k, likely the most recent versions of that DLL don't. Time marches on...
                                                          Joe
I'll add that the nvApi interface is 'supposed' to be forward & backward compatible. I guess it seems to work for most of us, but these freaky laptop guys just never do what they are told...
Title: Re: optimized sources
Post by: _heinz on 11 Aug 2011, 08:24:22 pm
Awaiting 10 Mio from pg in the next days...
Got it gt540m_10Mio_primegrid (http://www.britta-d.de/images/gtx540m/gt540m_10Mio_primegrid.jpg)
crunching since 20 Feb 2011 | 15:48:17 UTC, including 14 days vacation, and normal use of the laptop
~~~~~~~~~~~~~~~~~~
Total 193,944,311.28
The forecast says:
200 million total <--- target will be reached in 86.68 days, on 6 November 2011.
We will see if it comes true....
heinz  ;D
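
The arithmetic behind such a forecast can be sketched as below. This is only an illustration under stated assumptions: the stats page's actual method is not known, the function names are made up, and the forum timestamp of the post (11 Aug 2011, 08:24 pm) is used as the starting point without any time zone correction.

```python
from datetime import datetime, timedelta


def implied_daily_rate(current, target, days):
    """Credit per day that the forecast must have assumed."""
    return (target - current) / days


def projected_date(start, days):
    """Calendar date reached `days` after `start`."""
    return (start + timedelta(days=days)).date()


current_total = 193_944_311.28   # total credit quoted in the post
target_total = 200_000_000       # the 200 million goal
forecast_days = 86.68            # days quoted by the forecast

rate = implied_daily_rate(current_total, target_total, forecast_days)
eta = projected_date(datetime(2011, 8, 11, 20, 24, 22), forecast_days)

print(f"implied rate: {rate:,.0f} credits/day")
print(f"projected date: {eta}")
```

Under these assumptions the forecast corresponds to just under 70,000 credits per day across all projects, and the projected date lands on 6 November 2011 as quoted.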


Title: Re: optimized sources
Post by: _heinz on 12 Aug 2011, 05:34:46 am
V8-Xeon:
The new SuperFlower 1200W has been announced in Germany, so waiting was the right decision.
Super Flower SF1200P14HE Crystal Twilight 80+ PC-Netzteil 1200 Watt ATX (http://www.amazon.de/dp/B002DQ670Q/ref=asc_df_B002DQ670Q3899267?smid=A3JWKAKR8XB7XF&tag=schottenlandd-21&linkCode=asn&creative=22494&creativeASIN=B002DQ670Q)
Usually ships in 2 to 5 weeks..... so it will be September.
Data sheet: Super Flower Crystal Twilight SF1200P14HE 1200 Watt (http://www.chip.de/preisvergleich/166796/Datenblatt-Super-Flower-Crystal-Twilight-SF1200P14HE-1200-Watt.html)
I chose this one because I have a lot of cables and accessories from my old SuperFlower Crystal Plus 1000W.

heinz
Title: Re: optimized sources
Post by: _heinz on 12 Aug 2011, 08:04:19 am
GT540m and ION (http://www.britta-d.de/images/gtx540m/gt540m_10027999_ION_1400930.jpg)
get some respectable numbers (10 million and 1.4 million respectively)... by running optimized CUDA apps, no CPU apps used.

heinz
modify:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Happy Birthday

When the computer came to the desktop

30 years of the personal computer

by Alfred Krüger

The personal computer is celebrating its birthday. On 12 August 1981 the first IBM PC came onto the market, the most successful computer platform of all time. The PC has since lost its market power. Mobile devices are conquering the market.

http://www.heute.de/ZDFheute/inhalt/13/0,3672,8318541,00.html

I have one of these old machines (IBM 5160XT, 10MB FP (http://www.homecomputermuseum.de/comp/99_de.htm)) in my collection.  ;D

homecomputermuseum (http://www.homecomputermuseum.de/cgi-bin/forum/start.pl)
Title: Re: optimized sources
Post by: _heinz on 05 Sep 2011, 01:48:13 pm
Not a good sign for getting the new 1200W Super Flower (http://www.amazon.de/dp/B002DQ670Q/ref=asc_df_B002DQ670Q3899267?smid=A3JWKAKR8XB7XF&tag=schottenlandd-21&linkCode=asn&creative=22494&creativeASIN=B002DQ670Q)

Usually ships in 1 to 2 months.   :'(

waiting.....
Title: Re: optimized sources
Post by: Jason G on 05 Sep 2011, 01:52:55 pm
waiting.....

If this takes too long, or this one blows up as well, I can recommend the Corsair AX1200 next.
Title: Re: optimized sources
Post by: _heinz on 07 Sep 2011, 04:42:41 pm
While we are waiting: R3600 ATOM ION got 1.5 million on pg today (http://www.britta-d.de/images/primegrid/pg_r3600_atom230_1500000.jpg)
Title: Re: optimized sources
Post by: _heinz on 11 Sep 2011, 01:29:45 pm
125 000 views  ::)
Thank you to all readers who have not lost interest in looking in here.

Regards  heinz

Title: Re: optimized sources
Post by: _heinz on 19 Sep 2011, 10:54:43 am
We are in the beta testing phase of MultiBeam_v7 and AstroPulse; apps for NVIDIA, ATI, OpenCL
are available in our beta area.
Collecting results from the different GPU/driver variants takes some time and is an ongoing process.
Even the small ION can run our MB_v7 CUDA app and get respectable results.

heinz
 
Title: Re: optimized sources
Post by: _heinz on 26 Sep 2011, 08:30:04 am
While we are waiting, my laptop i3 GT540M got 12 Mio Primegrid (http://www.britta-d.de/images/gtx540m/gt540m_12Mio_primegrid.jpg)  ;D

Title: Re: optimized sources
Post by: _heinz on 26 Sep 2011, 05:18:02 pm
testresults from our beta tests:
i3 GT540M 696 against x41e (http://www.britta-d.de/images/seti/696_x41e_gt540m_bench_results.jpg)
Atom 230 ION 696 against x41e (http://www.britta-d.de/images/seti/ION_696_x41e.jpg)
Title: Re: optimized sources
Post by: _heinz on 09 Oct 2011, 08:09:05 am
While we are waiting, my laptop i3 GT540M got gt540m_13Mio_primegrid (http://www.britta-d.de/images/gtx540m/gt540m_13Mio_primegrid.jpg)    ;D
Title: Re: optimized sources
Post by: _heinz on 29 Oct 2011, 05:43:26 pm
29th october  gt540m_14Mio_primegrid (http://www.britta-d.de/images/gtx540m/gt540m_14Mio_primegrid.jpg)

Working now on the 200 Mio total...maybe in a month...

modify:20th november
boinc_total_199_Mio. (http://www.britta-d.de/images/gtx540m/boinc_total_199_Mio.jpg)
Title: Re: optimized sources
Post by: _heinz on 24 Nov 2011, 11:14:45 am
waiting.....

If this takes too long, or this one blows up as well, I can recommend the Corsair AX1200 next.
ordered it today, time to repair the v8 Xeon now  ;D
Title: Re: optimized sources
Post by: _heinz on 25 Nov 2011, 06:09:04 pm
25th of November: V8-Xeon is alive again ;D
~~~~~~~~~~~~~~~~~~~~~~~~~~~
25.11.2011 22:05:26      Processor: 8 GenuineIntel Intel(R) Xeon(R) CPU           E5405  @ 2.00GHz [Family 6 Model 23 Stepping 6]
25.11.2011 22:05:26      Processor: 6.00 MB cache
25.11.2011 22:05:26      Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 syscall nx lm vmx tm2 dca pbe
25.11.2011 22:05:26      OS: Microsoft Windows Vista: Ultimate x64 Edition, Service Pack 2, (06.00.6002.00)
25.11.2011 22:05:26      Memory: 16.00 GB physical, 31.81 GB virtual
25.11.2011 22:05:26      Disk: 2.00 GB total, 1.57 GB free
25.11.2011 22:05:26      Local time is UTC +1 hours
25.11.2011 22:05:26      NVIDIA GPU 0: GeForce GTX 570 (driver version 26658, CUDA version 3020, compute capability 2.0, 1248MB, 1405 GFLOPS peak)
25.11.2011 22:05:26   SETI@home   Found app_info.xml; using anonymous platform
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It's running on the old 1000W PSU, SuperFlower Crystal Plus, Model No: SF-1000K14HE.
We will see how long it lasts.
What a story..
After a general cleaning and the following hardware changes, the machine had been powered off since Easter this year:
Using 3 trayless MobileRacks for 3.5" SATA HDDs with hot-swap support now (only one Seagate Barracuda ES.2 1000GB is inserted, the two others are out).
Took the two GTX470 out of slots 0 and 2 and moved the GTX570 from slot 1 to slot 0.
Now that I have ordered the new Corsair AX1200 PSU, I went to the server, switched the PSU on, and pressed the start button on the case to see what would happen.
And, wonder of wonders, the board ran through its internal tests, ending with 00, and the machine started Vista64 Ultimate in safe mode because it had not been shut down correctly last time. I checked all drives for errors and found some index errors, which could be repaired successfully.
This took an hour, but at last the machine came up with Vista64 normally.  ;D
Then hundreds of important updates followed, as usual after a long outage. Some are still running...
So we can hope to have the V8-Xeon back to test our new apps.

heinz


 
Title: Re: optimized sources
Post by: _heinz on 25 Nov 2011, 08:17:43 pm
Twice the machine hung during the update process.
Security update for Microsoft Visual Studio 2005 Service Pack 1 (KB2538212) 249.8 MB hung... after 12 hours I had to push the reset button and restart the machine, with all the trouble that follows....

Another big one is:
Security update for Microsoft Visual Studio VS2008 Service Pack 1 (KB2538241) 365.8 MB
Has anyone installed it already?

And don't forget I'm behind a slow connection: sneak_speed_0.55mbps (http://www.britta-d.de/images/sneak_speed_0.55mbps.jpg)

heinz
Title: Re: optimized sources
Post by: Jason G on 25 Nov 2011, 09:06:41 pm
Yeah those are pretty big major updates.  Perhaps running programs interrupted the update processes.  Great that you're up and running again.

Jason
Title: Re: optimized sources
Post by: _heinz on 27 Nov 2011, 08:54:31 am
Last night the machine suddenly powered off; it looks like the PSU is not stable.
Maybe the machine overheated: CPU2 was over 60 °C. I must do something about the airflow in the machine. I got the machine up again, but with a lot of trouble, disk problems etc.
After a successful disk and filesystem repair the machine is running again... we will see for how long...
Only GPU work is being done now; the 8 CPUs are doing nothing.

btw. the first AMD Bulldozers are available in Germany.
FX-8120 8x3.1GHz Turbo core 4.0GHz 195.99 Euro
FX-8150 8x3.6GHz Turbo core 4.2GHz 240.99 Euro

The FX-8000 series has four modules and is thus a native octa-core CPU with 8 MB of L2 and 8 MB of L3 cache. The CPU itself has many new features such as SSE4.2, AES, CLMUL, AVX, XOP, FMA4, CVT16 and Turbo Core 2.0, which allows dynamic clock adjustment depending on the load on the cores.

heinz
Title: Re: optimized sources
Post by: Claggy on 27 Nov 2011, 09:04:28 am
Last nigt the machine powered suddenly off, looks like PSU is not stable.
Maybe the machine was overheated, CPU2 was over 60 grd celsius. I must do something with the airflow in the machine. I got the machine up again, but with a lot of trouble, disk problems etc.
After sucessful  disk and filesystem-repair, the machine runs now again... we will see how long...
Still GPU work is done now, the 8 CPU's are doing nothing.
Do the heatsinks need a clean out?

Claggy
Title: Re: optimized sources
Post by: _heinz on 27 Nov 2011, 10:04:35 am
Hi Claggy,
I did a general cleaning, heatsinks too, but the airflow inside the case must be better regulated. I built an air duct from cardboard and put it between CPU1 and CPU2, so the hot air from CPU1 now goes upwards and not directly to CPU2. Some measurements with AIDA64 Extreme Edition will help a bit. It is cool outside now, so I can open the window. Room temperature is 21 °C at the moment.
It looks like my air duct did not help much.
Field   Value
Sensor properties   
Sensor type   Dual ADT7490  (SMBus 2Ch, 2Eh)
GPU sensor type   Diode, CHiL CHL8266  (NV-Diode, 46h)
Motherboard name   Intel D5400XS
   
Temperatures   
CPU1   39 °C  (102 °F)
CPU2   57 °C  (135 °F)
CPU 1 / Core 1   40 °C  (104 °F)
CPU 1 / Core 2   22 °C  (72 °F)
CPU 1 / Core 3   36 °C  (97 °F)
CPU 1 / Core 4   33 °C  (91 °F)
CPU 2 / Core 1   29 °C  (84 °F)
CPU 2 / Core 2   26 °C  (79 °F)
CPU 2 / Core 3   29 °C  (84 °F)
CPU 2 / Core 4   29 °C  (84 °F)
DIMM   74 °C  (165 °F)
GPU Diode   68 °C  (154 °F)
Temperature 1   36 °C  (97 °F)
Temperature 2   44 °C  (111 °F)
Temperature 3   45 °C  (113 °F)
FB-DIMM1   78 °C  (172 °F)
FB-DIMM2   84 °C  (183 °F)
FB-DIMM3   78 °C  (172 °F)
FB-DIMM4   70 °C  (158 °F)
ST31000340NS   35 °C  (95 °F)
   
Cooling fans   
CPU1   629 RPM
CPU2   618 RPM
North Bridge   1856 RPM
South Bridge   4278 RPM
Aux   531 RPM
Graphics processor (GPU)   2790 RPM  (70%)
   
Voltages   
CPU1 core   1.137 V
CPU2 core   1.125 V
+1.5 V   1.536 V
+3.3 V   3.352 V
+5 V   5.125 V
+12 V   12.250 V
FSB VTT   1.211 V
North Bridge core   1.250 V
DIMM   1.823 V
GPU core   0.975 V
GPU +12V   12.109 V
GPU VRM   0.939 V
   
Currents   
GPU VRM   58.50 A
   
Power   
GPU VRM   54.75 W
~~~~~~~~~~~~~~~~~~~~~
The machine is otherwise idle; only the GTX570 GPU is working
27.11.2011 15:40:37      NVIDIA GPU 0: GeForce GTX 570 (driver version 26658, CUDA version 3020, compute capability 2.0, 1248MB, 1405 GFLOPS  peak)
It completes a PrimeGrid WU (PPS Sieve) in 598 sec

heinz
Title: Re: optimized sources
Post by: Claggy on 27 Nov 2011, 10:13:57 am
Quote
Temperatures   
CPU1   39 °C  (102 °F)
CPU2   57 °C  (135 °F)
CPU 1 / Core 1   40 °C  (104 °F)
CPU 1 / Core 2   22 °C  (72 °F)
CPU 1 / Core 3   36 °C  (97 °F)
CPU 1 / Core 4   33 °C  (91 °F)
CPU 2 / Core 1   29 °C  (84 °F)
CPU 2 / Core 2   26 °C  (79 °F)
CPU 2 / Core 3   29 °C  (84 °F)
CPU 2 / Core 4   29 °C  (84 °F)
DIMM   74 °C  (165 °F)
GPU Diode   68 °C  (154 °F)
Temperature 1   36 °C  (97 °F)
Temperature 2   44 °C  (111 °F)
Temperature 3   45 °C  (113 °F)
FB-DIMM1   78 °C  (172 °F)
FB-DIMM2   84 °C  (183 °F)
FB-DIMM3   78 °C  (172 °F)
FB-DIMM4   70 °C  (158 °F)
ST31000340NS   35 °C  (95 °F)
It's strange that the core temps on CPU 2 are mostly lower than CPU 1, but CPU 2 has a higher overall temperature, perhaps that heatsink needs new thermal compound, or the sensor is inaccurate.

Claggy
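
The mismatch pointed out here can be put into numbers with a few lines. This is just a sketch over the readings quoted above; the helper name `package_core_gap` and the idea of comparing the CPU reading with the hottest core are illustrative assumptions, not something AIDA64 reports itself.

```python
# Readings in degrees Celsius, copied from the sensor dump quoted above.
readings = {
    "CPU1": {"package": 39, "cores": [40, 22, 36, 33]},
    "CPU2": {"package": 57, "cores": [29, 26, 29, 29]},
}


def package_core_gap(reading):
    """Difference between the CPU reading and its hottest core."""
    return reading["package"] - max(reading["cores"])


for name, reading in readings.items():
    print(f"{name}: CPU reading is {package_core_gap(reading):+d} °C vs. hottest core")
```

CPU1 sits within a degree of its hottest core, while CPU2 reads 28 °C above any of its cores, which is the discrepancy that makes the heatsink contact or the sensor suspect.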
Title: Re: optimized sources
Post by: _heinz on 27 Nov 2011, 12:13:16 pm
I found that the fan on the Northbridge heatsink (40 x 40 x 5mm) makes noise and the fan on the FB-DIMM heatsink is too slow and sometimes stalls, so both must be replaced in the next few days. CPU2 is the right-hand CPU when you look into the case. It seems the sensor or AIDA64 has a problem and reports a wrong value.
Or it needs new thermal compound, but I can't believe that....

heinz
Title: Re: optimized sources
Post by: _heinz on 27 Nov 2011, 06:37:43 pm
The Colorful GTX 570 runs at 840/1680/1900 with a GPU temperature of 69 °C
27.11.2011 23:39:17      NVIDIA GPU 0: GeForce GTX 570 (driver version 26658, CUDA version 3020, compute capability 2.0, 1248MB, 1613 GFLOPS  peak)

Currents   
GPU VRM   59.00 A
   
Power   
GPU VRM   55.25 W
~~~~~~~~~~~~~~
modify:
845/1690/1900
28.11.2011 01:29:07      NVIDIA GPU 0: GeForce GTX 570 (driver version 26658, CUDA version 3020, compute capability 2.0, 1248MB, 1622 GFLOPS  peak)


heinz
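
The GFLOPS figures in these log lines line up with a simple peak estimate of cores × shader clock × 2. The sketch below assumes the GTX 570 has 480 CUDA cores, that the middle number in "840/1680/1900" is the shader clock in MHz, and that each core does 2 floating-point operations per clock; the reference shader clock of 1464 MHz is also an assumption, not stated in the posts. How the BOINC client actually computes its figure is not shown in this thread.

```python
def peak_gflops(cores, shader_mhz, flops_per_clock=2):
    """Theoretical single-precision peak in GFLOPS."""
    return cores * shader_mhz * flops_per_clock / 1000.0


GTX570_CORES = 480  # assumption: CUDA core count of the GTX 570

for label, shader_mhz in [
    ("reference clock (assumed 1464 MHz)", 1464),
    ("840/1680/1900", 1680),
    ("845/1690/1900", 1690),
]:
    print(f"{label}: {round(peak_gflops(GTX570_CORES, shader_mhz))} GFLOPS peak")
```

Under these assumptions the three results round to 1405, 1613 and 1622, the same values the client printed at stock speed and at the two overclocks.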
Title: Re: optimized sources
Post by: _heinz on 28 Nov 2011, 01:31:16 pm
28th of November boinc_total_200_Mio (http://www.britta-d.de/images/gtx540m/boinc_total_200_Mio.jpg)  ;D

and my laptop i3 GT540M got 15Mio_primegrid (http://www.britta-d.de/images/gtx540m/gt540m_15Mio_primegrid.jpg)
heinz
Title: Re: optimized sources
Post by: _heinz on 29 Nov 2011, 08:35:29 pm
v8-Xeon, 3 GPU's under full load
no cpu-work running, I have no fitting RAM heatsink at the moment
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Field   Value
Sensor properties   
Sensor type   Dual ADT7490  (SMBus 2Ch, 2Eh)
GPU sensor type   Analog Devices ADT7473  (NV-I2C 2Eh)
Motherboard name   Intel D5400XS
   
Temperatures   
CPU1   39 °C  (102 °F)
CPU2   60 °C  (140 °F)
CPU 1 / Core 1   40 °C  (104 °F)
CPU 1 / Core 2   22 °C  (72 °F)
CPU 1 / Core 3   37 °C  (99 °F)
CPU 1 / Core 4   34 °C  (93 °F)
CPU 2 / Core 1   29 °C  (84 °F)
CPU 2 / Core 2   25 °C  (77 °F)
CPU 2 / Core 3   29 °C  (84 °F)
CPU 2 / Core 4   29 °C  (84 °F)
DIMM   72 °C  (162 °F)
GPU1: GPU Diode   88 °C  (190 °F)
GPU2: Graphics processor (GPU)   91 °C  (196 °F)
GPU2: GPU Diode   88 °C  (190 °F)
GPU2: GPU memory   87 °C  (189 °F)
GPU2: GPU ambient   66 °C  (151 °F)
GPU3: Graphics processor (GPU)   91 °C  (196 °F)
GPU3: GPU Diode   88 °C  (190 °F)
GPU3: GPU memory   87 °C  (189 °F)
GPU3: GPU ambient   67 °C  (153 °F)
Temperature 1   36 °C  (97 °F)
Temperature 2   48 °C  (118 °F)
Temperature 3   46 °C  (115 °F)
FB-DIMM1   79 °C  (174 °F)
FB-DIMM2   84 °C  (183 °F)
FB-DIMM3   74 °C  (165 °F)
FB-DIMM4   69 °C  (156 °F)
ST31000340NS   34 °C  (93 °F)
   
Cooling fans   
CPU1   604 RPM
CPU2   597 RPM
North Bridge   2784 RPM
South Bridge   4526 RPM
Aux   608 RPM
GPU1   3120 RPM  (74%)
GPU2   5013 RPM  (89%)
GPU3   3308 RPM  (71%)
   
Voltages   
CPU1 core   1.125 V
CPU2 core   1.137 V
+1.5 V   1.536 V
+3.3 V   3.352 V
+5 V   5.125 V
+12 V   12.188 V
FSB VTT   1.211 V
North Bridge core   1.250 V
DIMM   1.823 V
GPU1: GPU core   0.975 V
GPU1: GPU +12V   12.063 V
GPU1: GPU VRM   0.938 V
GPU2: GPU core   0.912 V
GPU2: GPU Vcc   3.364 V
GPU3: GPU core   0.962 V
GPU3: GPU Vcc   3.360 V
   
Currents   
GPU1: GPU VRM   63.50 A
   
Power   
GPU1: GPU VRM   59.50 W
~~~~~~~~~~~~~~~~~~~~~~~
It seems the PSU can handle it...
The middle GPU is problematic: with only about 2 mm of clearance it cannot get enough air.
We will see how long it runs before the driver crashes.
Title: Re: optimized sources
Post by: _heinz on 29 Nov 2011, 09:14:21 pm
It ran only a short time (1 hour) before the driver crashed... :'(
I must power off now.
Title: Re: optimized sources
Post by: _heinz on 30 Nov 2011, 06:15:09 pm
I ran the new installer v0.39 on the V8-Xeon, but I can't get any work from SETI, a pity.
At stock frequency all 3 GPUs are now running fine.
I put all 8 CPU cores under 100% load with other work.

heinz
Title: Re: optimized sources
Post by: _heinz on 01 Dec 2011, 06:19:06 pm
Today I got some work from SETI for GPU and CPU. All tasks are done now. One has a -9 error, the others look OK, and some are waiting for validation.
http://setiathome.berkeley.edu/results.php?userid=8071209

heinz
Title: Re: optimized sources
Post by: _heinz on 03 Dec 2011, 01:47:01 pm
In the last 24 hours I have had 2 driver crashes. The FB-DIMMs reached 100 °C, so I stopped all CPU work. I need a new FB-DIMM heatsink; the old one is away for repair (warranty claim).
The GPU temps are GPU1 = 89 °C at 3360 rpm, GPU2 = 94 °C at 5400 rpm, GPU3 = 91 °C at 3360 rpm.
As you can see, there is only 2-3 mm of space between the cards, really too little to get enough cool air.
V8_Xeon_2GTX470_570_inserted (http://www.britta-d.de/images/V8_Xeon_2GTX470_570_inserted.JPG)

heinz

Title: Re: optimized sources
Post by: sunu on 03 Dec 2011, 04:04:59 pm
Make it look like this. This is how I have my pc. I've tied the bottom of the fan to the chassis and the top to the cpu heatsink.

Title: Re: optimized sources
Post by: _heinz on 04 Dec 2011, 04:56:14 pm
Hi sunu,
Good idea to put an additional fan in front of the 3 GTX cards, but from the right side of the case.
As you can see, the door of the case has 2 big fans (http://www.britta-d.de/bilder/server/page2.htm), each with a potentiometer to regulate the rpm.
Best would be to construct my own heatsink for the FB-DIMMs.
Will see what I can do...

heinz
Title: Re: optimized sources
Post by: sunu on 04 Dec 2011, 06:14:18 pm
With 2 CPUs and 3 high performance GPUs it's impossible to keep low temperatures with the case closed unless you watercool all of them. And even with the case open you will still need a fan placed as I've shown you, to force air between the graphics cards. I use a 38mm thick fan for high static pressure in order to push air as hard as possible in whatever cracks are open.

Where are the FB-DIMMS located? I can't locate them clearly from the photo. Are they those things between the CPUs and the top card? Have you tried something like these http://www.computeruniverse.net/products/e90199636/corsair-dominator-airflow-cooling-fan.asp?sr=corsair+dominator+fan  http://www.computeruniverse.net/products/e90322367/corsair-dominator-airflow-fan-triple-kit.asp?sr=corsair+dominator+fan

You could also change some things in your CPUs. What thickness are the fans on the thermalrights? Those thermalrights have high density fins and they perform better with high static pressure fans, that is 38mm fans. Or use dual fans in each thermalright in a push-pull config (this will also help with the following).

Also there is a huge hot spot between the two CPUs. You'll need to get the air moving in the space between the two heatsinks. Move the fan from the left heatsink to the front of it, pushing instead of pulling air.
Title: Re: optimized sources
Post by: _heinz on 06 Dec 2011, 08:34:07 am

My P4 2.6 GHz machine died,  :'(
I shut it down and now it no longer switches on.
When I push the "on" switch nothing happens;
maybe the PSU has to be replaced.

I was able to revive the old P4 by fitting a new PSU, and I also changed the graphics adapter.
I removed the NVIDIA GeForce 4 4200Ti (NV28) and installed a HIS Radeon HD4670 (H467QS1GHA).
IceQ, AGP, 1GB DDR3, HDMI (Blu-ray + HD DVD support), 1080p Full HD 7.1 surround support, 
OpenGL 2.0, OpenCL 1.0, SM4.1, UVD 2,
native HDMI, DVI, VGA port,
ATI Avivo HD video and display technology.
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Graphics Hardware:
Primary Adapter      
Graphics Card Manufacturer   Built by AMD   
Graphics Chipset   ATI Radeon HD 4600 Series     
Device ID   9495   
Vendor   1002   

Subsystem ID   0028   
Subsystem Vendor ID   1002   

Graphics Bus Capability   AGP   
Maximum Bus Setting   AGP 8X   

BIOS Version   011.022.006.000   
BIOS Part Number   113-SBRK2G02-10R-01   
BIOS Date   2010/06/11   

Memory Size   1024 MB   
Memory Type   DDR3   

Core Clock in MHz   750 MHz   
Memory Clock in MHz   800 MHz   
Total Memory Bandwidth in GB/s   25.6 GB/s   
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Driver Packaging Version   8.911-111025a-128355E-ATI   
Catalyst™ Version   11.11   
Provider   AMD.   
2D Driver Version   6.14.10.7236   
2D Driver File Path   System/CurrentControlSet/Control/Video/{A1A805DD-AE12-4CCA-AF54-F9D8DD69306C}/0000   
Direct3D Version   6.14.10.0873   
OpenGL Version   6.14.10.11251   
Catalyst™ Control Center Version   2011.1025.2152.37348   
AIW/VIVO WDM Driver Version   6.14.10.6238   
AIW/VIVO WDM SP Driver Version   6.14.10.6238   
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Now I also have an ATI machine for testing Raistmer's ATI clients.



heinz


Title: Re: optimized sources
Post by: _heinz on 06 Dec 2011, 08:49:37 am
IceQ HD4670 AGP
~~~~~~~~~~~

If I run the OpenCL device query, it shows:
 CL_PLATFORM_NAME:      AMD Accelerated Parallel Processing
 CL_PLATFORM_VERSION:   OpenCL 1.1 AMD-APP-SDK-v2.5 (793.1)
 OpenCL SDK Revision:   7027912


OpenCL Device Info:

 2 devices found supporting OpenCL:

 ---------------------------------
 Device ATI RV730
 ---------------------------------
  CL_DEVICE_NAME:                       ATI RV730
  CL_DEVICE_VENDOR:                     Advanced Micro Devices, Inc.
  CL_DRIVER_VERSION:                    CAL 1.4.1607
  CL_DEVICE_VERSION:                    OpenCL 1.0 AMD-APP-SDK-v2.5 (793.1)
  CL_DEVICE_TYPE:                       CL_DEVICE_TYPE_GPU
  CL_DEVICE_MAX_COMPUTE_UNITS:          8
  CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS:   3
  CL_DEVICE_MAX_WORK_ITEM_SIZES:        128 / 128 / 128
  CL_DEVICE_MAX_WORK_GROUP_SIZE:        128
  CL_DEVICE_MAX_CLOCK_FREQUENCY:        750 MHz
  CL_DEVICE_ADDRESS_BITS:               32
  CL_DEVICE_MAX_MEM_ALLOC_SIZE:         128 MByte
  CL_DEVICE_GLOBAL_MEM_SIZE:            512 MByte
  CL_DEVICE_ERROR_CORRECTION_SUPPORT:   no
  CL_DEVICE_LOCAL_MEM_TYPE:             global
  CL_DEVICE_LOCAL_MEM_SIZE:             16 KByte
  CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE:   64 KByte
  CL_DEVICE_QUEUE_PROPERTIES:           CL_QUEUE_PROFILING_ENABLE
  CL_DEVICE_IMAGE_SUPPORT:              0
  CL_DEVICE_MAX_READ_IMAGE_ARGS:        0
  CL_DEVICE_MAX_WRITE_IMAGE_ARGS:       0
  CL_DEVICE_SINGLE_FP_CONFIG:           INF-quietNaNs round-to-nearest round-to-zero round-to-inf fma

  CL_DEVICE_IMAGE <dim>                 2D_MAX_WIDTH     0
                                        2D_MAX_HEIGHT    0
                                        3D_MAX_WIDTH     0
                                        3D_MAX_HEIGHT    0
                                        3D_MAX_DEPTH     0

  CL_DEVICE_EXTENSIONS:                 cl_khr_gl_sharing
                                        cl_amd_device_attribute_query

  CL_DEVICE_PREFERRED_VECTOR_WIDTH_<t>  CHAR 16, SHORT 8, INT 4, LONG 2, FLOAT 4, DOUBLE 0


 ---------------------------------

...
...

oclDeviceQuery, Platform Name = AMD Accelerated Parallel Processing, Platform Version = OpenCL 1.1 AMD-APP-SDK-v2.5 (793.1), SDK Revision = 7027912, NumDevs = 2, Device = ATI RV730, Device =               Intel(R) Pentium(R) 4 CPU 2.66GHz

System Info:

 Local Time/Date = 14:24:53, 12/6/2011
 CPU Arch: 0
 CPU Level: 15
 # of CPU processors: 1
 Windows Build: 2600
 Windows Ver: 5.1


PASSED


Press <Enter> to Quit...
-----------------------------------------------------------
This AGP card supports OpenCL.

heinz
Title: Re: optimized sources
Post by: _heinz on 06 Dec 2011, 12:35:05 pm
excerpt from v0.39 installer Readme:
The ATI MB application will not work on ATI cards with workgroup size 128
(e.g. HD43xx).
HD4670 has:
CL_DEVICE_MAX_WORK_GROUP_SIZE:        128

 :'(  :'(  :'(
why  ?
I'm disappointed....

GPUZ shows: gpuz_hd4670 (http://www.britta-d.de/images/ati/gpuz_hd4670.jpg)

heinz
Title: Re: optimized sources
Post by: _heinz on 06 Dec 2011, 03:10:31 pm
I have now installed:
for Astropulse
ap_5.06_win_x86_SSE2_OpenCL_ATI_r521.exe

MultiBeam
AK_v8b2_win_SSE2.exe

BOINC shows:
06.12.2011 20:58:39      ATI GPU 0: ATI Radeon HD 4600 series (R730) (CAL version 1.4.1607, 1024MB, 480 GFLOPS peak)


Hopefully I will get some work... when SETI is up again.

heinz
Title: Re: optimized sources
Post by: _heinz on 06 Dec 2011, 04:26:24 pm
HD4670 AGP, here is what clinfo shows:
~~~~~~~~~~~~~~~~~~~~~~~~~

C:\A\clinfo>echo off
clinfo
Number of platforms:                             1
  Platform Profile:                              FULL_PROFILE
  Platform Version:                              OpenCL 1.1 AMD-APP-SDK-v2.5 (793.1)
  Platform Name:                                 AMD Accelerated Parallel Processing
  Platform Vendor:                               Advanced Micro Devices, Inc.
  Platform Extensions:                           cl_khr_icd cl_amd_event_callback cl_amd_offline_devices


  Platform Name:                                 AMD Accelerated Parallel Processing
Number of devices:                               2
  Device Type:                                   CL_DEVICE_TYPE_GPU
  Device ID:                                     4098
  Max compute units:                             8
  Max work items dimensions:                     3
    Max work items[0]:                           128
    Max work items[1]:                           128
    Max work items[2]:                           128
  Max work group size:                           128
  Preferred vector width char:                   16
  Preferred vector width short:                  8
  Preferred vector width int:                    4
  Preferred vector width long:                   2
  Preferred vector width float:                  4
  Preferred vector width double:                 0
  Max clock frequency:                           750Mhz
  Address bits:                                  32
  Max memory allocation:                         134217728
  Image support:                                 No
  Max size of kernel argument:                   1024
  Alignment (bits) of base address:              32768
  Minimum alignment (bytes) for any datatype:    128
  Single precision floating point capability
    Denorms:                                     No
    Quiet NaNs:                                  Yes
    Round to nearest even:                       Yes
    Round to zero:                               Yes
    Round to +ve and infinity:                   Yes
    IEEE754-2008 fused multiply-add:             Yes
  Cache type:                                    None
  Cache line size:                               0
  Cache size:                                    0
  Global memory size:                            536870912
  Constant buffer size:                          65536
  Max number of constant args:                   8
  Local memory type:                             Global
  Local memory size:                             16384
  Error correction support:                      0
  Profiling timer resolution:                    1
  Device endianess:                              Little
  Available:                                     Yes
  Compiler available:                            Yes
  Execution capabilities:
    Execute OpenCL kernels:                      Yes
    Execute native function:                     No
  Queue properties:
    Out-of-Order:                                No
    Profiling :                                  Yes
  Platform ID:                                   011BA4F4
  Name:                                          ATI RV730
  Vendor:                                        Advanced Micro Devices, Inc.
  Driver version:                                CAL 1.4.1607
  Profile:                                       FULL_PROFILE
  Version:                                       OpenCL 1.0 AMD-APP-SDK-v2.5 (793.1)
  Extensions:                                    cl_khr_gl_sharing cl_amd_device_attribute_query


  Device Type:                                   CL_DEVICE_TYPE_CPU
  Device ID:                                     4098
  Max compute units:                             1
  Max work items dimensions:                     3
    Max work items[0]:                           1024
    Max work items[1]:                           1024
    Max work items[2]:                           1024
  Max work group size:                           1024
  Preferred vector width char:                   16
  Preferred vector width short:                  8
  Preferred vector width int:                    4
  Preferred vector width long:                   2
  Preferred vector width float:                  4
  Preferred vector width double:                 0
  Max clock frequency:                           2672Mhz
  Address bits:                                  32
  Max memory allocation:                         1073201152
  Image support:                                 Yes
  Max number of images read arguments:           128
  Max number of images write arguments:          8
  Max image 2D width:                            8192
  Max image 2D height:                           8192
  Max image 3D width:                            2048
  Max image 3D height:                           2048
  Max image 3D depth:                            2048
  Max samplers within kernel:                    16
  Max size of kernel argument:                   4096
  Alignment (bits) of base address:              1024
  Minimum alignment (bytes) for any datatype:    128
  Single precision floating point capability
    Denorms:                                     Yes
    Quiet NaNs:                                  Yes
    Round to nearest even:                       Yes
    Round to zero:                               Yes
    Round to +ve and infinity:                   Yes
    IEEE754-2008 fused multiply-add:             No
  Cache type:                                    Read/Write
  Cache line size:                               0
  Cache size:                                    0
  Global memory size:                            1073201152
  Constant buffer size:                          65536
  Max number of constant args:                   8
  Local memory type:                             Global
  Local memory size:                             32768
  Error correction support:                      0
  Profiling timer resolution:                    279
  Device endianess:                              Little
  Available:                                     Yes
  Compiler available:                            Yes
  Execution capabilities:
    Execute OpenCL kernels:                      Yes
    Execute native function:                     Yes
  Queue properties:
    Out-of-Order:                                No
    Profiling :                                  Yes
  Platform ID:                                   011BA4F4
  Name:                                                        Intel(R) Pentium(R) 4 CPU 2.66GHz
  Vendor:                                        GenuineIntel
  Driver version:                                2.0
  Profile:                                       FULL_PROFILE
  Version:                                       OpenCL 1.1 AMD-APP-SDK-v2.5 (793.1)
  Extensions:                                    cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_popcnt


Press any key to continue . . .

heinz
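
The installer Readme restriction on workgroup size 128 can be checked mechanically against a clinfo dump like the one above. What follows is only a minimal Python sketch. It assumes the AMD APP SDK text layout shown in this post, where a `Device Type:` line opens each device block and `Max work group size:` gives the limit. The function names and the `limit=128` default are illustrative assumptions, not something taken from the installer.

```python
import re

def device_work_group_sizes(clinfo_text):
    """Return [(device_type, max_work_group_size), ...], one entry per device."""
    devices = []
    current_type = None
    for line in clinfo_text.splitlines():
        m = re.match(r"\s*Device Type:\s*(\S+)", line)
        if m:
            current_type = m.group(1)
            continue
        m = re.match(r"\s*Max work group size:\s*(\d+)", line)
        if m and current_type is not None:
            devices.append((current_type, int(m.group(1))))
            current_type = None
    return devices

def needs_small_workgroup_build(clinfo_text, limit=128):
    """True if any GPU device reports a max work group size <= limit (assumed rule)."""
    return any(kind == "CL_DEVICE_TYPE_GPU" and size <= limit
               for kind, size in device_work_group_sizes(clinfo_text))

# Shortened excerpt of the clinfo output from this post.
SAMPLE = """\
  Device Type:                                   CL_DEVICE_TYPE_GPU
  Max compute units:                             8
    Max work items[0]:                           128
  Max work group size:                           128
  Device Type:                                   CL_DEVICE_TYPE_CPU
  Max compute units:                             1
  Max work group size:                           1024
"""

for kind, size in device_work_group_sizes(SAMPLE):
    print(kind, size)
print("needs workgroup-128 build:", needs_small_workgroup_build(SAMPLE))
```

For the RV730 in this machine the check reports a GPU with workgroup size 128, which is the case the Readme warns about.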
Title: Re: optimized sources
Post by: Claggy on 06 Dec 2011, 06:05:43 pm
heinz, you'll want to try the MB7_win_x86_SSE3_OpenCL_ATi_LHD4K_r390.exe app from the MB7 r390 sanity check thread, which is especially for GPUs with Max Workgroup size 128

Claggy
Title: Re: optimized sources
Post by: _heinz on 07 Dec 2011, 01:25:47 pm
I ran the test case with MB7_win_x86_SSE3_OpenCL_ATi_LHD4K_r390,
but my P4 2.66 only supports up to SSE2.
I need an SSE2 version of LHD4K.
~~~~~~~~~~~~~~~~~~~~
Field   Value
CPU Properties   
CPU Type   Intel Pentium 4, 2666 MHz (20 x 133)
CPU Alias   Northwood
CPU Stepping   C1
Instruction Set   x86, MMX, SSE, SSE2
Original Clock   2667 MHz
Min / Max CPU Multiplier   20x / 20x
Engineering Sample   No
L1 Trace Cache   12K Instructions
L1 Data Cache   8 KB
L2 Cache   512 KB  (On-Die, ECC, ATC, Full-Speed)

CPU Physical Info   
Package Type   478 Pin FC-PGA2
Package Size   35 mm x 35 mm
Transistors   55 million
Process Technology   6M, 0.13 um, CMOS, Cu, Low-K
Die Size   131 mm2
Core Voltage   1.475 - 1.55 V
I/O Voltage   1.475 - 1.55 V
Typical Power   38.7 - 89.0 W  (depending on clock speed)
Maximum Power   49 - 109 W  (depending on clock speed)

CPU Manufacturer   
Company Name   Intel Corporation
Product Information   http://ark.intel.com/search.aspx?q=Intel Pentium 4
Driver Update   http://www.aida64.com/driver-updates

CPU Utilization   
CPU #1   0 %

heinz
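
The SSE3/SSE2 mismatch can be spotted before a test run by comparing the build name with the instruction set list. This is a rough Python sketch. It assumes the build name carries its requirement as an `_SSE2`/`_SSE3` token, as the file names in this thread do. The helper names are hypothetical.

```python
import re

def required_sse(app_name):
    """Extract the SSE level encoded in a build name, or None if absent."""
    m = re.search(r"_(SSE\d?)(?=[_.]|$)", app_name, re.IGNORECASE)
    return m.group(1).upper() if m else None

def cpu_can_run(app_name, instruction_set):
    """instruction_set is the comma-separated list reported for the CPU."""
    need = required_sse(app_name)
    have = {item.strip().upper() for item in instruction_set.split(",")}
    return need is None or need in have

P4_NORTHWOOD = "x86, MMX, SSE, SSE2"  # instruction set list from the CPU dump above

print(cpu_can_run("MB7_win_x86_SSE3_OpenCL_ATi_LHD4K_r390", P4_NORTHWOOD))
print(cpu_can_run("AK_v8b2_win_SSE2.exe", P4_NORTHWOOD))
```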
Title: Re: optimized sources
Post by: _heinz on 07 Dec 2011, 06:48:25 pm
HD4670 AGP
I can confirm that a pg WU ran successfully in 2h 35min. GPU load was ~90% and CPU load was 100%, so there must be some issue in the app or the driver. I think CPU load should be 5% at most.
Have a look at hostid=232541 (http://www.primegrid.com/results.php?hostid=232541)

Everest shows:
Device Description
AGP 8x: ATI Radeon HD 4670 AGP (RV730)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Field   Value
Graphics Processor Properties   
Video Adapter   ATI Radeon HD 4670 AGP (RV730)
BIOS Version   011.022.006.000.000000
BIOS Date   06/11/10 04:09
GPU Code Name   RV730 Pro
Part Number   113-SBRK2G02-10R-01
PCI Device   1002-9495 / 1002-0028  (Rev 00)
Transistors   514 million
Process Technology   55 nm
Die Size   146 mm2
Bus Type   AGP 8x @ 8x
Memory Size   1 GB
GPU Clock   750 MHz  (Original: 750 MHz)
RAMDAC Clock   400 MHz
Pixel Pipelines   8
Texture Mapping Units   32
Unified Shaders   320  (v4.1)
DirectX Hardware Support   DirectX v10.1
Pixel Fillrate   6000 MPixel/s
Texel Fillrate   24000 MTexel/s

Memory Bus Properties   
Bus Type   GDDR3
Bus Width   128 Bit
Real Clock   796 MHz (DDR)  (Original: 800 MHz)
Effective Clock   1593 MHz
Bandwidth   24.9 GB/s

Utilization   
GPU   91%

ATI PowerPlay (BIOS)   
State #1   GPU: 600 MHz, Memory: 750 MHz  (Boot)
State #2   GPU: 750 MHz, Memory: 800 MHz
State #3   GPU: 750 MHz, Memory: 800 MHz  (UVD)
State #4   GPU: 750 MHz, Memory: 800 MHz

Graphics Processor Manufacturer   
Company Name   Advanced Micro Devices, Inc.
Product Information   http://www.amd.com/us/products/desktop/graphics
Driver Download   http://sites.amd.com/us/game/downloads
Driver Update   http://www.aida64.com/driver-updates

heinz
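
Most of the derived figures in these dumps are plain products of unit counts and clocks. A short Python sketch of the arithmetic follows, resting on two assumptions. The first is that peak GFLOPS counts 2 floating-point operations per shader per clock (one multiply-add), which reproduces the 480 GFLOPS that BOINC reports. The second is that Everest divides MB/s by 1024 to get "GB/s", which reproduces its 24.9 figure. Neither is confirmed by the tools themselves.

```python
def fill_rate(units, clock_mhz):
    """Pixel or texel fill rate in M/s: units times core clock."""
    return units * clock_mhz

def peak_gflops(shaders, clock_mhz, flops_per_clock=2):
    """Peak single-precision GFLOPS, assuming one multiply-add per clock."""
    return shaders * flops_per_clock * clock_mhz / 1000.0

def bandwidth_mb_s(bus_bits, real_clock_mhz, transfers_per_clock=2):
    """Memory bandwidth in MB/s for a DDR-type bus."""
    return bus_bits / 8 * real_clock_mhz * transfers_per_clock

print(fill_rate(8, 750))                    # pixel pipelines at 750 MHz
print(fill_rate(32, 750))                   # texture units at 750 MHz
print(peak_gflops(320, 750))                # unified shaders at 750 MHz
print(bandwidth_mb_s(128, 800) / 1000)      # Catalyst: nominal 800 MHz
print(round(bandwidth_mb_s(128, 796) / 1024, 1))  # Everest: measured 796 MHz
```

The difference between Catalyst's 25.6 GB/s and Everest's 24.9 GB/s then comes down to nominal versus measured clock plus the unit convention, not to a different bus.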
Title: Re: optimized sources
Post by: _heinz on 12 Dec 2011, 03:18:29 pm
Meanwhile I have tried several ATI apps from different projects (primegrid, Moo) with driver 11.11.
None of them has an acceptable CPU usage: it is between 60% and 100%.
Nothing has changed in years; ATI hardware is good, but driver support is catastrophic.
The 11.11 driver is not really usable for GPU calculations, it forces 100% CPU usage.

heinz
modify: the Collatz ATI app crashed as well
Stderr output
<core_client_version>6.10.60</core_client_version>
<![CDATA[
<message>
 - exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>

Running Collatz Conjecture (3x+1) ATI GPU application version 2.09 by Gipsel (Win32, CAL 1.4)
instructed by BOINC client to use device 0
Reading input file ... done.
Checking 824633720832 numbers starting with 2372965778048095594856

CPU: Intel(R) Pentium(R) 4 CPU 2.66GHz (1 cores/threads) 2.67271 GHz (0ms)

CAL Runtime: 1.4.1607
Found 1 CAL device

Device 0: ATI Radeon HD4600 (RV730) 1024 MB local RAM (remote 64 MB cached + 128 MB uncached)
GPU core clock: 750 MHz, memory clock: 800 MHz
320 shader units organized in 8 SIMDs with 8 VLIW units (5-issue), wavefront size 32 threads
not supporting double precision

Initializing lookup table (16384 kB) ... done
Starting WU on GPU 0
Copy lookup table to GPU memory (16384 kB)
Initialize step array on GPU (256 MB)
predicted runtime per iteration is 167 ms (33.3333 ms are allowed), dividing each iteration in 6 parts
borders of the domains at 0 688 1368 2048 2736 3416 4096
No checkpoint data found.


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x00BB7E66 read attempt to address 0x0000014C


 :'( :'(
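
Two numbers in this log can be reproduced by hand. The exit code -1073741819 is the signed 32-bit view of 0xc0000005 (access violation), and 167 ms of predicted runtime against 33.3333 ms allowed gives 6 parts. The Python sketch below shows both. The border calculation is only a guess that happens to match the logged values (each border rounded up to a multiple of 8). It is not taken from the Collatz source, and the function names are made up for illustration.

```python
import math

def as_unsigned_hex(exit_code):
    """Reinterpret a signed 32-bit exit code as an unsigned hex string."""
    return "0x%08x" % (exit_code & 0xFFFFFFFF)

def parts_needed(predicted_ms, allowed_ms):
    """Smallest number of equal parts that keeps each part within the limit."""
    return math.ceil(predicted_ms / allowed_ms)

def domain_borders(size, parts, align=8):
    """Split [0, size] into `parts` domains, rounding each border up to `align`.

    The rounding rule is an assumption inferred from the logged borders.
    """
    borders = [0]
    for i in range(1, parts + 1):
        # ceil(i * size / (parts * align)) * align, done in integer arithmetic
        borders.append(-(-(i * size) // (parts * align)) * align)
    return borders

print(as_unsigned_hex(-1073741819))
print(parts_needed(167, 33.3333))
print(domain_borders(4096, 6))
```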
Title: Re: optimized sources
Post by: _heinz on 12 Dec 2011, 04:01:38 pm
Back to V8-Xeon,
This is now the third day running under full load (CPU+GPU), and I get ~900 000 cr/day (2 GTX470 + 1 GTX570).
A pity that I can't reach a million per day with this hardware configuration.
3 GTX570 or 3 GTX580 could do it...
It seems stable now....

heinz
Title: Re: optimized sources
Post by: _heinz on 14 Dec 2011, 05:29:51 pm
V8-Xeon is back among the top 20 computers on the primegrid top list. Number 19 today.  ;D
Its GPUs now push out 384 PPS WUs per day.
I got no work from Astropulse...

heinz

Title: Re: optimized sources
Post by: _heinz on 16 Dec 2011, 04:53:30 am
ATI HD4670 AGP
After running primegrid for a week, we can say the production output is hd4670_pg_output (http://www.britta-d.de/images/ati/hd4670_pg_output.jpg) 20000 points per day.
This OpenCL app, or rather driver 11.11, forces 100% CPU load.
Hoping we will get a properly working driver update soon.

heinz
Title: Re: optimized sources
Post by: _heinz on 16 Dec 2011, 12:01:42 pm
V8-Xeon
Host ID      Rank     Owner      Avg. credit    Total credit    BOINC ver.
173588       15       _heinz     743,356.24     170,010,538     6.10.58

Number 15 of the pg tophost list (http://www.primegrid.com/top_hosts.php) today
170 Mio pg now
Still possible with optimized CUDA application.

heinz
Title: Re: optimized sources
Post by: _heinz on 20 Dec 2011, 11:55:33 am
new milestones:
20th of december 2011
Current total Credit 220,708,713.18

modify:
30th of december 2011
My ION reached 2 Mio primegrid (http://www.britta-d.de/images/seti/ION_2Mio_primegrid.jpg)
statistic shows: R3600_2Mio_primegrid (http://www.britta-d.de/images/seti/R3600_2Mio_primegrid.jpg)  ::)  ;D  ::)

New Year's Eve 2011
200Mio_primegrid (http://www.britta-d.de/images/seti/200Mio_primegrid.jpg)
boinc_200Mio_primegrid (http://www.britta-d.de/images/seti/boinc_200Mio_primegrid.jpg)

Happy New Year 2012  ;D

Thank you to all readers looking in here.
Happy crunching in 2012

 
Title: Re: optimized sources
Post by: _heinz on 05 Jan 2012, 07:12:02 pm
6th of january
v8-Xeon pg_number_8 (http://www.britta-d.de/images/seti/pg_number_8.jpg)  ;D
top_hosts (http://www.primegrid.com/top_hosts.php)

Title: Re: optimized sources
Post by: Raistmer on 06 Jan 2012, 03:59:40 am
Congratulations, Heinz! Keep climbing the ladder!  :D
Title: Re: optimized sources
Post by: _heinz on 07 Jan 2012, 09:35:35 am
Now, after running the HD4670 AGP for a month, it's time for a summary.
I started crunching with the ATI OCL application and driver 12.1 on 7th of December; now, a month later on 7th of January, let's have a look at the results.
hd4670_600000 (http://www.britta-d.de/images/ati/hd4670_600000.JPG)
As we can see, the HD4670 AGP earned ~600000 credits in 30 days, i.e. ~20000 per day.
Here is a look at the results on this host (http://www.primegrid.com/results.php?hostid=232541)
No error occurred during the test period of one month.
0.6 Mio per month is a respectable result for this old machine with a P4 2.66 GHz from the year 2005.
Now we wait for a better driver, which will hopefully reduce the CPU usage to 5%.
On SETI's side I'm waiting for Raistmer's ATI application to support workgroup size = 128.
I need an SSE2 version of LHD4K.

I bought this ATI Radeon 4670 AGP for developing and testing OCL, and it does the job.

heinz
Title: Re: optimized sources
Post by: _heinz on 09 Jan 2012, 02:53:55 pm
After installing the new driver 11-12_agp-hotfix_xp32_dd_ccc.exe,
CPU usage for primegrid's ATI app is now reduced by 50% (http://www.britta-d.de/images/ati/hd4670_AGP_cpuload_pg1.38.jpg)
Have a look at the progress..

modify some days later:
CPU usage is at 50%, but runtime increased from 13800 sec to 15000 sec, not so good

heinz
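
To put the runtime change in numbers, here is a small Python sketch. It assumes the GPU works on one WU at a time around the clock, which the post does not state explicitly, and the function names are only illustrative.

```python
SECONDS_PER_DAY = 86400

def runtime_change(old_s, new_s):
    """Relative runtime change, e.g. 0.05 means 5% slower."""
    return (new_s - old_s) / old_s

def tasks_per_day(runtime_s):
    """WUs finished per day, assuming one WU at a time, running 24 h."""
    return SECONDS_PER_DAY / runtime_s

old_s, new_s = 13800, 15000  # runtimes quoted in the post
print("runtime change: %.1f %%" % (runtime_change(old_s, new_s) * 100))
print("WUs/day before: %.2f" % tasks_per_day(old_s))
print("WUs/day after:  %.2f" % tasks_per_day(new_s))
```

Under that assumption, halving the CPU load costs roughly half a WU per day on this card.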
Title: Re: optimized sources
Post by: _heinz on 19 Jan 2012, 04:18:23 pm
HD4670 AGP awaiting 1 Mio pg on 28th january
Title: Re: optimized sources
Post by: _heinz on 20 Jan 2012, 04:52:42 pm
A short look at V8-Xeon
NVIDIA GPU 0: GeForce GTX 570 (driver version 26658, CUDA version 3020, compute capability 2.0, 1248MB, 1603 GFLOPS peak)
NVIDIA GPU 1: GeForce GTX 470 (driver version 26658, CUDA version 3020, compute capability 2.0, 1249MB, 1308 GFLOPS peak)
NVIDIA GPU 2: GeForce GTX 470 (driver version 26658, CUDA version 3020, compute capability 2.0, 1249MB, 1326 GFLOPS peak)

I have an old BIOS on one of the GTX470s, and this card shows somewhat higher temps than the other one with the newer BIOS.
Next I will flash the BIOS of my graphics adapters. Perhaps I can squeeze a bit more out of them.

btw today, 20th of January, I reached 250 Mio total (http://www.britta-d.de/images/seti/250Mio_total.jpg) credit  ;D
Current Credit (based on incremental update) 250,641,103.33

Title: Re: optimized sources
Post by: _heinz on 25 Jan 2012, 11:31:32 am
25th january, v8-Xeon today tophost_pg_number_6 (http://www.britta-d.de/images/seti/tophost_pg_number_6.jpg)
The machine is crunching on its GPUs alone; no CPU work is enabled.
Title: Re: optimized sources
Post by: _heinz on 25 Jan 2012, 04:46:16 pm
If you don't know it yet, have a look at Fermilab (http://www.fnal.gov/)  :o
Have fun exploring it

_heinz
Title: Re: optimized sources
Post by: _heinz on 28 Jan 2012, 11:12:27 am
HD4670 AGP awaiting 1 Mio pg on 28th january
I got it on 27th January: HD4670 AGP 1Mio pg (http://www.britta-d.de/images/ati/hd4670_pg_1Mio.jpg)

boinc stats shows pg_id=232541 (http://de.boincstats.com/stats/host_graph.php?pr=pg&id=232541)
In summary, we can say the HD4670 AGP can continuously do ~20000 pg credits per day.

If we now had an SSE2 OpenCL Astropulse app, we could be happy.

 ;D
 modify:
Today, 31st January, I stopped crunching on the HD4670 AGP; it is proven.
Title: Re: optimized sources
Post by: _heinz on 09 Feb 2012, 02:46:32 pm
Trouble with the internet connection for a whole day: no internet, no telephone, not enough work on V8-Xeon.
And now that the internet is back, V8-Xeon produces more results than I can upload; the upload queue is slowly growing....
(http://www.speedtest.net/result/1762814262.png)
France Telecom connection   :o

heinz
Title: Re: optimized sources
Post by: _heinz on 15 Feb 2012, 06:18:22 pm
Yesterday the V8-Xeon server got a disk error on the partition where BOINC is installed. I use a separate 2 GB partition just for BOINC. I had to reformat it and install BOINC again. A bad block is reported on \Device\Harddisk0\DR0.
---------------------------
Log Name:      System
Source:        disk
Date:          16.02.2012 01:55:56
Event ID:      7
Task Category: None
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      V8-SK01
Description:
The device, \Device\Harddisk0\DR0, has a bad block.
Event XML:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="disk" />
    <EventID Qualifiers="49156">7</EventID>
    <Level>2</Level>
    <Task>0</Task>
    <Keywords>0x80000000000000</Keywords>
    <TimeCreated SystemTime="2012-02-16T00:55:56.172Z" />
    <EventRecordID>451446</EventRecordID>
    <Channel>System</Channel>
    <Computer>V8-SK01</Computer>
    <Security />
  </System>
  <EventData>
    <Data>\Device\Harddisk0\DR0</Data>
    <Binary>030080000100000000000000070004C0000100009C0000C0000000000000000000CCDC80000000003E58010000000000FFFFFFFF00000000580000840200000000200A1240032000000000003C00000000A0621280FAFFFF1891681080FAFFFF0000000000000000609C681080FAFFFF607A441080FAFFFF666E400000000000280000406E6600000800000000000000F00003000000000B00000000000000000000000000000000</Binary>
  </EventData>
</Event>
--------------------------
Furthermore, I have some CRC sector errors on C:, so it was not possible to make an image backup. It looks like I will have to low-level format the disk in the near future.  :'(

btw. today BOINC total shows 277,331,153.42

heinz
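
When many such events pile up, it is easier to pull the key fields out of the event XML than to read it by eye. Below is a minimal Python sketch using only the standard library. The sample is a shortened copy of the event above, and the function name and the choice of fields are illustrative.

```python
import xml.etree.ElementTree as ET

# Namespace used by the Windows event XML shown above.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

SAMPLE = r"""<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="disk" />
    <EventID Qualifiers="49156">7</EventID>
    <Level>2</Level>
    <TimeCreated SystemTime="2012-02-16T00:55:56.172Z" />
    <Computer>V8-SK01</Computer>
  </System>
  <EventData>
    <Data>\Device\Harddisk0\DR0</Data>
  </EventData>
</Event>"""

def summarize_event(xml_text):
    """Return the provider, event ID, level, UTC time and device of one event."""
    root = ET.fromstring(xml_text)
    system = root.find("e:System", NS)
    return {
        "provider": system.find("e:Provider", NS).get("Name"),
        "event_id": int(system.findtext("e:EventID", namespaces=NS)),
        "level": int(system.findtext("e:Level", namespaces=NS)),
        "time_utc": system.find("e:TimeCreated", NS).get("SystemTime"),
        "device": root.findtext("e:EventData/e:Data", namespaces=NS),
    }

print(summarize_event(SAMPLE))
```

Note that `SystemTime` is stored in UTC, which is why it reads 00:55:56 while the log header shows 01:55:56 local time.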
 
Title: Re: optimized sources
Post by: _heinz on 23 Feb 2012, 11:21:28 am
Be careful when running the Cullen/Woodall (Sieve) 1.12 CPU application! You must have a very good cooling solution!
It heated the FB-DIMMs of V8-Xeon up to 118 °C.
pg_fb-dimm-118grd (http://www.britta-d.de/images/primegrid/pg_fb-dimm-118grd.jpg)
i3-390-cpu-82grd (http://www.britta-d.de/images/primegrid/pg_i3-390-hot.jpg)

 :o

heinz


Title: Re: optimized sources
Post by: _heinz on 24 Feb 2012, 09:40:31 am
Friday 24th February: shut down V8-Xeon to clean it and repair the disk.
Title: Re: optimized sources
Post by: _heinz on 28 Feb 2012, 09:41:39 am
I was surprised by the dust bunnies when I opened the case. The last cleanup was in November, not long ago. I cleaned everything, especially the two big CPU coolers and the fans. There was dust behind the fans. The FB-DIMMs looked clean, so I did not take them out.
When all the cleaning work was done, I started the server and it booted OK.
I swapped disk0 for another one on which I had installed W7 last year.
(I have 3 boot disks with different OSes: Vista Ultimate, W7 Ultimate, XP64 Professional.)
W7 had to be activated again and spent some days downloading a lot of updates, which was a pain behind my slow internet connection.
Now I have all updates, and my developer environment is installed again.
VS2008 Professional + Intel XE 2011 + CUDA 4.1

Flash BIOS of GTX470
The next step was to replace the BIOS of graphics adapter 1 (GTX470) with the newer BIOS from graphics adapter 2 (GTX470).
To do this you need GPUZ0.5.9 (http://www.techpowerup.com/downloads/SysInfo/GPU-Z/), nvflash for windows (http://www.techpowerup.com/downloads/Utilities/BIOS_Flashing/NVIDIA/) and NiBiTor v6.03 (http://www.mvktech.net/index.php?option=com_remository&func=fileinfo&id=3559).
DOWNLOAD: NiBiTor v6.03 via mvktech.net -> Klick (http://www.smartredirect.de/ad/clickGate.php?u=Ha2Rd3xX&m=1&p=9yUI69TI8b&t=klf5673z&st=&s=&splash=2&url=http%3A%2F%2Fwww.mvktech.net%2Findex.php%3Foption%3Dcom_remository%26func%3Dfileinfo%26id%3D3559&r=http%3A%2F%2Fwww.hardwareluxx.de%2Fcommunity%2Ff14%2Fnvidia-karten-bios-mit-nibitor-editieren-und-per-nvflash-flashen-634387.html)
The first step is to make backups of the original BIOS of all graphics adapters.
This can easily be done with GPUZ0.5.9 (save the file as .bin) or with nvflash itself.
You can later edit the .bin files with NiBiTor to make the necessary changes and store them as .rom.
It is best to install nvflash and NiBiTor directly in your user account directory, so that the backed-up original BIOS files end up in the same place.
I used nvflash to make the backups. I made 3 shortcuts to nvflash with added backup arguments, one for each adaptor.
Adaptor 0 GTX570 (http://www.britta-d.de/images/nvflash/adapter0_gpuz_gtx570.jpg) BIOS:70.10.17.00.03(P1261-0005) Device ID:10DE-1081 backup as orig0.rom use link"nvflash 0 backup (http://www.britta-d.de/images/nvflash/nvflash_backup_orig0.rom.jpg)"
Adaptor 1 GTX470 (http://www.britta-d.de/images/nvflash/adapter1_gpuz_gtx470.jpg) BIOS:70.00.21.00.03(P1025-0006) Device ID:10DE-06CD backup as orig1.rom use link"nvflash 1 backup (http://www.britta-d.de/images/nvflash/nvflash_backup_orig1.rom.jpg)"
Adaptor 2 GTX470 (http://www.britta-d.de/images/nvflash/adapter2_gpuz_gtx470.jpg) BIOS:70.00.1A.00.03(P1025-0006) Device ID:10DE-06CD backup as orig2.rom use link"nvflash 2 backup (http://www.britta-d.de/images/nvflash/nvflash_backup_orig2.rom.jpg)"

To make the backup for each adaptor, simply click on its backup link. Once you have done this for all three, you will find the backup files orig0.rom, orig1.rom and orig2.rom in your user account.

If we now open orig1.rom (http://www.britta-d.de/images/nvflash/nibitor_orig1_Board_ID_5E00.jpg) in NiBiTor, we see under Adv. Info:
Device ID:06CD
Sub Vendor ID:10DE
Sub System ID:079F
Board ID:5E00

If we open orig2.rom (http://www.britta-d.de/images/nvflash/nibitor_orig2_Board_ID_DD00.jpg) in NiBiTor, we see under Adv. Info:
Device ID:06CD
Sub Vendor ID:10DE
Sub System ID:079F
Board ID:DD00

If we want to use the BIOS from orig2.rom to flash adaptor1, the original Board ID of adaptor1 should not change, so we must edit Board ID:DD00 to Board ID:5E00 and save the file as 470-1mod.rom nibitor_470-1mod_changed_Board_ID_5E00 (http://www.britta-d.de/images/nvflash/nibitor_470-1mod_changed_Board_ID_5E00.jpg)
Now make a link"nvflash 1 470-1mod (http://www.britta-d.de/images/nvflash/nvflash_adaptor1_470-1mod.rom.jpg)" to nvflash.exe using adaptor1 with file 470-1mod.rom
Now make a link"nvflash 1 original (http://www.britta-d.de/images/nvflash/nvflash_adaptor1_orig1.rom.jpg)" to nvflash.exe using adaptor1 with file orig1.rom
We can now flash adaptor1 with the modified BIOS from adaptor2 by simply clicking on the link "nvflash 1 470-1mod". A DOS window opens, nvflash.exe is executed and you must confirm with y to flash. After some seconds the flash is done, the DOS window closes automatically and the graphics adapter resets. Now start GPUZ and you will see that adaptor1 has the same new BIOS as adaptor2.
In case something goes wrong, we can simply click on the link "nvflash 1 original (http://www.britta-d.de/images/nvflash/nvflash_adaptor1_orig1.rom.jpg)" to write the original BIOS back to adaptor1.

All this can be done under a fully operational W7, but BOINC should not run any GPU apps on the graphics adapter being flashed.
I can confirm that this works on my machine V8-Xeon.

And as always: you make all modifications at your own risk.

I restarted the machine and crunched for some hours under full load, but adaptor1 (GTX470) with the new BIOS ran up to 110 °C.
That is ~20 °C more than with the original BIOS. Reason unknown.....
So I flashed the original BIOS back to adaptor1.
Now adaptor1 shows evga_gpu2_90grd (http://www.britta-d.de/images/nvflash/evga_gpu2_90grd.jpg) under full load, as before.
See v8-xeon under full load (http://www.britta-d.de/images/nvflash/v8-xeon_full_load.jpg)
The interesting points are the temperatures and fan RPM of the 3 graphics adapters.
GPU1 3240 RPM (74%) temp=89 °C
GPU2 5346 RPM (92%) temp=93 °C
GPU3 3784 RPM (77%) temp=90 °C
The goal was to bring the fan RPM of GPU2 down to the level of GPU3, which runs at 77% at 90 °C. Both cards are GTX470 but have different BIOS versions.
But simply using the BIOS of GPU3 without any additional changes did not work as hoped.

Your hints and forum links are welcome.
Addendum:
downloads guru3d (http://downloads.guru3d.com/downloadget.php?id=2604&file=4&evp=a7fe3aebb6d0e971af4efb3ae87903bc) NiBiTor 6.04 and other interesting downloads

heinz
Title: Re: optimized sources
Post by: _heinz on 28 Feb 2012, 06:34:03 pm
V8-Xeon is crunching away with temps of all_gpus_88grd_roomtemp_20grd (http://www.britta-d.de/images/nvflash/evga_all_gpus_88grd_roomtemp_20grd.jpg)

Looks like my P4 with AGP HD4670 is slowly dying again. Sometimes after a longer run there is no VGA output. Then I switch the machine off. When I switch it on again, the graphics adapter shows a red line and says the additional power cable is not connected or missing. This has now happened twice. If I switch the power off completely for a longer time, the machine starts again. Maybe the new power supply is not working correctly. Now, after a 6 hour outage, the machine has started again.
Statistics show pg 1320000, all done with the AGP HD4670 OpenCL application

Tomorrow I will open the case to look for any cable contact issues...

heinz
Title: Re: optimized sources
Post by: _heinz on 29 Feb 2012, 02:46:07 pm
Yesterday the V8-Xeon server got a disk error on the partition where BOINC is installed. I use a separate 2GB partition just for BOINC. I had to reformat it and install BOINC again. A bad block is reported on harddisk 0\DR0.
--------------------------- 
Good News
Today, 29th February, I repaired the disk by deleting the partition, creating a new partition, setting it as primary and formatting it, so the error is gone.
This was drive D:, which held only BOINC and some utilities, so the loss was small.
After all was done the system started normally, I copied BOINC back from my USB stick and everything works as before. So we again have a fully operational disk with Vista Ultimate + developer environment.
I feel better now  ;D

heinz
Title: Re: optimized sources
Post by: _heinz on 01 Mar 2012, 02:34:16 pm
To test the V8-Xeon under Win7 at full load I took part in the pg Leap Day Challenge and climbed to number 23 (http://www.britta-d.de/images/primegrid/pg_number23_leap-day_challange.jpg)   ;D
So far OK, temps ~88-95 °C under full load. It's hot, but I can live with it until summer comes.

heinz
Title: Re: optimized sources
Post by: _heinz on 05 Mar 2012, 03:26:27 am
5th March boinc_300_Mio_total (http://www.britta-d.de/images/seti/boinc_300_Mio_total.jpg) and france_number_16 (http://www.britta-d.de/images/seti/france_number_16_total_credit.jpg)

some days later:
14th March pg_250_Mio (http://www.britta-d.de/images/primegrid/pg_250_Mio.jpg)   ;D
Title: Re: optimized sources
Post by: _heinz on 26 Mar 2012, 12:37:10 pm
Updated installers: v0.40 for Windows is available on the MainPage (http://lunatics.kwsn.net/index.php)
Please read Installer v0.40 release notes (http://lunatics.kwsn.net/2-windows/installer-v0-40-release-notes.msg47299.html#msg47299) before you install.
Thanks to all who were involved in any way.
We now have more than 160 000 views on this thread; thank you to all readers looking in here.

_heinz



Title: Re: optimized sources
Post by: _heinz on 26 Mar 2012, 02:06:26 pm
I always wanted to know how much energy my machines need to crunch.
I recently bought a watt meter and now have some values for my machines.

840 W, max 874 W V8-Xeon (1 GTX570, 2 GTX470)
230 W, max 270 W P4 with AGP HD4670
  66 W,  laptop i3, GT540M
  31 W,  R3600 ION
-------------------------------------
1167 W  for all my machines
= 1,167 KW, i.e. 1,167 KWh per hour
28,008 KWh per day
840,24 KWh per month (30 days)

If the price per KWh is known, you can now calculate your crunching costs.
In my case, in France, we pay 0,0567 Euro per KWh for 8 hours/day and 0,0916 Euro per KWh for 16 hours/day
280,08 KWh * 0,0567 Euro = 15,88 Euro
560,16 KWh * 0,0916 Euro = 51,31 Euro
--------------------------------------------------
Subtotal 67,19 Euro
+ 19,5% VAT (Mwst) = 13,10 Euro
+ power-connection costs/month = 7,50 Euro
--------------------------------------------------------
Total = 87,79 Euro/month, without internet connection costs

Current credit/month is 36,763,527


87,79 Euro / 36,763527 Mio cc = 2,387964571516764 Euro per Mio cc
Today I have 326,832403 Mio credit
326,832403 Mio * 2,387964571516764 Euro/Mio  = 780,4641991876895 Euro

My total crunching cost so far is ~ 780,46 Euro
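
For anyone who wants to redo this estimate with their own tariff, here is a minimal Python sketch of the calculation above. The function and parameter names are my own assumptions; the numbers are the ones from this post (30-day month, two tariff windows, 19,5% VAT, 7,50 Euro connection fee).

```python
# Sketch of the monthly cost estimate; names are illustrative assumptions,
# the default numbers are the ones quoted in the post.

def monthly_cost_eur(total_watts, days=30,
                     low_hours=8, low_rate=0.0567,
                     high_hours=16, high_rate=0.0916,
                     vat=0.195, connection_fee=7.50):
    """Return (kWh per month, cost in Euro per month) for a constant load."""
    kw = total_watts / 1000.0
    low_kwh = kw * low_hours * days      # energy used in the cheap window
    high_kwh = kw * high_hours * days    # energy used in the expensive window
    energy_cost = low_kwh * low_rate + high_kwh * high_rate
    total = energy_cost * (1.0 + vat) + connection_fee
    return low_kwh + high_kwh, total


def cost_per_mio(cost_eur, credit_per_month):
    """Euro spent per one million credits."""
    return cost_eur / (credit_per_month / 1_000_000)


if __name__ == "__main__":
    kwh, cost = monthly_cost_eur(840 + 230 + 66 + 31)
    print(f"{kwh:.2f} kWh/month, {cost:.2f} Euro/month")
    print(f"{cost_per_mio(cost, 36_763_527):.3f} Euro per Mio credits")
```

Changing the rates, the VAT or the hours per tariff window is enough to adapt it to another country.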
------------------------------------------------------------
Remark:
in Germany the price per KWh is 0,24 Euro strompreise-Deutschland (http://www.kwh-preis.de/strom/strompreise#strompreise-Deutschland)
840,24 KWh / month * 0,24 Euro = 201,66 Euro/month in Germany
-----------------------------------------------------------------------------------
201,66 Euro/month in Germany

87,79 Euro/month in France

happy crunching in France,  bleu blanc rouge   ;D

_heinz

 




 



Title: Re: optimized sources
Post by: Mike on 26 Mar 2012, 04:17:59 pm

That can't be true Heinz.

I'm consuming 10.000 KWh a year (don't ask) all in all.
I (only) pay €180 a month all together.

My FX is very power hungry even overclocked.

Title: Re: optimized sources
Post by: _heinz on 27 Mar 2012, 05:10:34 am
That can´t be true Heinz.
Hi Mike,
it all depends on how many graphics adapters are installed in your FX;
perhaps you will be surprised if you measure the power with a watt meter.
V8-Xeon has 3 graphics adapters (1 GTX570 + 2 GTX470)
GTX570 runs at 832MHz
GTX470 runs at 730MHz
GTX470 runs at 730MHz
The measured max current is 3,84 Ampere
220V * 3,84A = 844VA = 844W = 0,844KW for V8-Xeon,
From what I have seen it makes no big difference whether I suspend work on all 8 CPUs or not.
I will make some measurements in the next days to document it.

heinz
Title: Re: optimized sources
Post by: Claggy on 27 Mar 2012, 05:29:52 am
_heinz, can you do an updated AP v6 Atom app when you have a chance please, while r555 & r557 are faster than r409 on an Atom, they are nowhere near as fast as your special Atom builds,
See the FFTW 3.3.1 static library development thread for benches on my Atom N450

Claggy
Title: Re: optimized sources
Post by: _heinz on 27 Mar 2012, 06:29:35 am
_heinz, can you do an updated AP v6 Atom app when you have a chance please, while r555 & r557 are faster than r409 on an Atom, they are nowhere near as fast as your special Atom builds,
See the FFTW 3.3.1 static library development thread for benches on my Atom N450

Claggy
Hi Claggy, have seen it... be patient...have a lot around my ears at the moment.
Title: Re: optimized sources
Post by: Claggy on 27 Mar 2012, 06:37:04 am
_heinz, can you do an updated AP v6 Atom app when you have a chance please, while r555 & r557 are faster than r409 on an Atom, they are nowhere near as fast as your special Atom builds,
See the FFTW 3.3.1 static library development thread for benches on my Atom N450

Claggy
Hi Claggy, have seen it... be patient...have a lot around my ears at the moment.
O.K, thanks, i didn't know if you had seen it or not,

Claggy
Title: Re: optimized sources
Post by: _heinz on 27 Mar 2012, 03:42:27 pm
_heinz, can you do an updated AP v6 Atom app when you have a chance please, while r555 & r557 are faster than r409 on an Atom, they are nowhere near as fast as your special Atom builds,
See the FFTW 3.3.1 static library development thread for benches on my Atom N450

Claggy
done, available in the Beta download area

heinz
Title: Re: optimized sources
Post by: _heinz on 28 Mar 2012, 03:55:04 am
How this special Astropulse v6 build for ATOM CPUs runs is shown by the following quick test of ap_6.01r557_SSE3_ATOM_IXE_MKLS_O3.exe
More results in the Beta test area.
 
Bench results file R3600-20120328-0439-benchAP.txt
stored in .\Testdatas\ directory.

Quick timetable

WU : AP_single_pass.wu
ap_5.05r409_SSE.exe -verbose :
  Elapsed 2118.152 secs
      CPU 2110.569 secs
ap_5.05r468_SSE3_ATOM_IXE_MKLS_O3.exe -verbose  :
  Elapsed 1532.965 secs, speedup: 27.63%  ratio: 1.38x
      CPU 1528.560 secs, speedup: 27.58%  ratio: 1.38x
ap_6.01r548_SSE_331_noAVX.exe -verbose  :
  Elapsed 2155.741 secs, speedup: -1.77%  ratio: 0.98x
      CPU 2145.170 secs, speedup: -1.64%  ratio: 0.98x
ap_6.01r557_SSE3_ATOM_IXE_MKLS_O3.exe -verbose  :
  Elapsed 1530.656 secs, speedup: 27.74%  ratio: 1.38x
      CPU 1524.067 secs, speedup: 27.79%  ratio: 1.38x

WU : Raistmer's_tiny.wu
ap_5.05r409_SSE.exe -verbose :
  Elapsed 932.693 secs
      CPU 926.724 secs
ap_5.05r468_SSE3_ATOM_IXE_MKLS_O3.exe -verbose  :
  Elapsed 650.598 secs, speedup: 30.25%  ratio: 1.43x
      CPU 646.219 secs, speedup: 30.27%  ratio: 1.43x
ap_6.01r548_SSE_331_noAVX.exe -verbose  :
  Elapsed 850.933 secs, speedup: 8.77%  ratio: 1.10x
      CPU 846.961 secs, speedup: 8.61%  ratio: 1.09x
ap_6.01r557_SSE3_ATOM_IXE_MKLS_O3.exe -verbose  :
  Elapsed 629.632 secs, speedup: 32.49%  ratio: 1.48x
      CPU 626.048 secs, speedup: 32.45%  ratio: 1.48x

WU : sigind_v5.wu
ap_5.05r409_SSE.exe -verbose :
  Elapsed 4540.926 secs
      CPU 4508.039 secs
ap_5.05r468_SSE3_ATOM_IXE_MKLS_O3.exe -verbose  :
  Elapsed 3578.609 secs, speedup: 21.19%  ratio: 1.27x
      CPU 3539.101 secs, speedup: 21.49%  ratio: 1.27x
ap_6.01r548_SSE_331_noAVX.exe -verbose  :
  Elapsed 4875.686 secs, speedup: -7.37%  ratio: 0.93x
      CPU 4810.275 secs, speedup: -6.70%  ratio: 0.94x
ap_6.01r557_SSE3_ATOM_IXE_MKLS_O3.exe -verbose  :
  Elapsed 3355.997 secs, speedup: 26.09%  ratio: 1.35x
      CPU 3340.746 secs, speedup: 25.89%  ratio: 1.35x

======================================
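
The "speedup" and "ratio" columns above appear to follow directly from the elapsed times of the reference build and the tested build. A small Python sketch, assuming that reading (the function name is mine, it is not part of the bench script):

```python
# Assumed formulas behind the bench table:
#   speedup = (1 - new/ref) * 100 percent, ratio = ref/new

def speedup(ref_secs, new_secs):
    """Return (speedup in percent, ratio) of a build against the reference."""
    pct = (1.0 - new_secs / ref_secs) * 100.0
    ratio = ref_secs / new_secs
    return pct, ratio


if __name__ == "__main__":
    # AP_single_pass.wu, elapsed: r409 reference against r557 ATOM build
    pct, ratio = speedup(2118.152, 1530.656)
    print(f"speedup: {pct:.2f}%  ratio: {ratio:.2f}x")
```

A negative speedup simply means the tested build was slower than the reference.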
_heinz  ;)
Title: Re: optimized sources
Post by: _heinz on 29 Mar 2012, 09:56:05 am
From what I have seen it makes no big difference whether I suspend work on all 8 CPUs or not.
I will make some measurements in the next days to document it.
Some new measurements for V8-Xeon:
~300W when I switch on the machine, self-test, no OS loaded
~286W Windows is ready, logon screen
~830W crunching distributed network on all 3 graphics adapters and running PrimeGrid's Sophie Germain (LLR) on 8 CPUs
~743W work on the CPUs suspended, GPU work still running
~370W work on the GPUs suspended (waited until all fans had slowed down), CPU work allowed
This means PrimeGrid Sophie Germain (LLR) needs only ~70W-90W to run on all 8 CPUs.
The 3 crunching GPUs then need 743W - 300W = 443W, because the machine's base load is ~300W.

In summary we can say 2 GTX470 + 1 GTX570 need ~450W for crunching (GPU load 99%).

The power ratio of 3 GPUs to 8 CPUs is 443W / 70W = 6,33 : 1

Total power of the crunching V8-Xeon is ~840W (820 - 860)
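
The breakdown above is just differences between the wattmeter readings. As a sketch in Python (dictionary keys and function name are my own labels, the watt values are the measured ones):

```python
# Wattmeter readings for V8-Xeon in watts; the keys are illustrative labels.
readings_w = {
    "base": 300,         # switched on, self-test, no OS loaded
    "cpu_and_gpu": 830,  # 3 GPUs and 8 CPUs crunching
    "gpu_only": 743,     # CPU work suspended
    "cpu_only": 370,     # GPU work suspended
}


def breakdown(r):
    """Return (GPU watts, CPU watts low, CPU watts high, GPU:CPU ratio)."""
    gpu = r["gpu_only"] - r["base"]
    cpu_low = r["cpu_only"] - r["base"]          # CPUs alone above base load
    cpu_high = r["cpu_and_gpu"] - r["gpu_only"]  # CPUs added on top of GPUs
    return gpu, cpu_low, cpu_high, gpu / cpu_low


if __name__ == "__main__":
    gpu, cpu_low, cpu_high, ratio = breakdown(readings_w)
    print(f"GPUs {gpu} W, CPUs {cpu_low}-{cpu_high} W, ratio {ratio:.2f} : 1")
```

The two CPU figures differ because one is measured with idle GPUs and the other with the GPUs under load.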

_heinz


Title: Re: optimized sources
Post by: _heinz on 30 Mar 2012, 07:55:19 am
The picture behind the link shows the GTX570 of V8-Xeon crunching.
EVGA_PRECISION_X_3.0.1 (http://www.britta-d.de/images/nvflash/EVGA_PRECISION_X_3.0.1.jpg) is available for download on the EVGA-site (http://www.evga.com/precision/)
It supports the new GTX680.

_heinz
Title: Re: optimized sources
Post by: _heinz on 02 Apr 2012, 05:32:31 pm
With the new EVGA tool I turned up the frequency of the GPU adapters.
Have a look at the pictures:
GPU1_570 842MHz (http://www.britta-d.de/images/gtx470/GPU1_570.jpg)
GPU2_470 770MHz (http://www.britta-d.de/images/gtx470/GPU2_470.jpg)
GPU3_470 780MHz (http://www.britta-d.de/images/gtx470/GPU3_470.jpg)

_heinz
Title: Re: optimized sources
Post by: _heinz on 03 Apr 2012, 08:03:13 pm
While I was running PrimeGrid "The Riesel Problem LLR 6.13" it came to a v8-xeon_overheating (http://www.britta-d.de/images/primegrid/pg_v8-xeon_overheating.jpg)
CPU2 70 °C
FBDIMM2 104 °C
FBDIMM3 102 °C
FBDIMM4 100 °C

Only GPU2 ran up to 100 °C and clocked down from 770 to 405MHz
GPU1 72 °C
GPU3 80 °C

_heinz
Title: Re: optimized sources
Post by: _heinz on 22 Apr 2012, 12:17:52 pm
The new i7-3770 Ivy Bridge is available in Europe.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
INTEL Core i7-3770K @ 4 x 4,5GHz, Quad-Core "Ivy Bridge" • TDP: 77W • Process: 22nm • DMI: 5GT/s • L2 cache: 4x 256kB • L3 cache: 8MB shared • Memory controller: dual channel PC3-12800U (DDR3-1600) • Turbo Boost, Hyper-Threading, unlocked multiplier.
Intel's processors code-named Ivy Bridge are manufactured on a 22-nanometer process. They offer new security features, extended power-saving features and a new graphics core that supports DirectX 11.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
GAINWARD GeForce GTX 680 Phantom, 4GB GDDR5, 2x DVI, HDMI, DisplayPort. Core clock: 1084MHz, memory clock: 1575MHz, shader clock: 1084MHz • Chip: GK104 • Memory interface: 256-bit • Stream processors: 1536 • Texture units: 128 • Process: 28nm • Maximum power consumption: 195W • DirectX: 11.1 • Shader model: 5.0 • Form factor: triple-slot • Interface: PCIe 3.0 • Special features: factory overclocked, supports HDCP, 4-way SLI
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Have a look at this Low-Noise-Gaming_PC (http://www.hitech-gamer.com/Low-Noise-Gaming-PCs/LOW-NOISE-GAMER-PC-THE-THING-V2.html#longdescinfoicons)

_heinz  ;)
Title: Re: optimized sources
Post by: _heinz on 22 Apr 2012, 04:04:58 pm
v8-Xeon with three overclocked NV adapters

GTX570
GPU clock (Geometric Domain)   850 MHz  (Original: 732 MHz, overclock: 16%)
GPU clock (Shader Domain)   1700 MHz  (Original: 1464 MHz, overclock: 16%)

GTX470
GPU clock (Geometric Domain)   770 MHz  (Original: 607 MHz, overclock: 27%)
GPU clock (Shader Domain)   1540 MHz  (Original: 1215 MHz, overclock: 27%)

GTX470
GPU clock (Geometric Domain)   780 MHz  (Original: 607 MHz, overclock: 29%)
GPU clock (Shader Domain)   1560 MHz  (Original: 1215 MHz, overclock: 28%)

So far the old GTX470s overclock well, all air cooled
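
The overclock percentages are the relative increase over the stock clock. A tiny Python sketch to check them (the function name is my own):

```python
# Relative overclock in percent over the stock clock.

def overclock_pct(stock_mhz, actual_mhz):
    return (actual_mhz / stock_mhz - 1.0) * 100.0


if __name__ == "__main__":
    for name, stock, actual in [("GTX570", 732, 850),
                                ("GTX470 #1", 607, 770),
                                ("GTX470 #2", 607, 780)]:
        print(f"{name}: {overclock_pct(stock, actual):.1f}% over stock")
```

The small difference between 29% and 28% on the second GTX470 comes only from rounding the geometric and shader clocks separately.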

_heinz
Title: Re: optimized sources
Post by: _heinz on 23 Apr 2012, 03:32:37 am
special astropulse 6.01-build for ATOM-Processor
My long test run with a real Astropulse WU ended today
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Quick timetable

WU : #ap_genwis.dat
ap_5.05r409_SSE.exe -verbose :
  Elapsed 104.863 secs
      CPU 101.026 secs
ap_5.05r468_SSE3_ATOM_IXE_MKLS_O3.exe -verbose  :
  Elapsed 8.455 secs, speedup: 91.94%  ratio: 12.40x
      CPU 3.994 secs, speedup: 96.05%  ratio: 25.29x
ap_6.01r548_SSE_331_noAVX.exe -verbose  :
  Elapsed 98.888 secs, speedup: 5.70%  ratio: 1.06x
      CPU 95.785 secs, speedup: 5.19%  ratio: 1.05x
ap_6.01r557_SSE3_ATOM_IXE_MKLS_O3.exe -verbose  :
  Elapsed 6.209 secs, speedup: 94.08%  ratio: 16.89x
      CPU 3.588 secs, speedup: 96.45%  ratio: 28.16x
ap_6.01r557_SSE3_ATOM_IXE_MKLS_O3_libfftwf-3.3.1.exe -verbose  :
  Elapsed 104.192 secs, speedup: 0.64%  ratio: 1.01x
      CPU 100.277 secs, speedup: 0.74%  ratio: 1.01x
ap_6.01r557_SSE3_ATOM_IXE12.1.2.278_MKLS10.3.8_O3.exe -verbose  :
  Elapsed 6.474 secs, speedup: 93.83%  ratio: 16.20x
      CPU 3.869 secs, speedup: 96.17%  ratio: 26.11x

WU : ap_08mr07ag_B4_P1_00025_20100428_07060.wu
ap_5.05r409_SSE.exe -verbose :
  Elapsed 245817.702 secs
      CPU 239508.351 secs
ap_5.05r468_SSE3_ATOM_IXE_MKLS_O3.exe  -verbose  :
  Elapsed 175379.443 secs, speedup: 28.65%  ratio: 1.40x
      CPU 173293.118 secs, speedup: 27.65%  ratio: 1.38x
ap_6.01r548_SSE_331_noAVX.exe -verbose  :
  Elapsed 231145.730 secs, speedup: 5.97%  ratio: 1.06x
      CPU 230139.305 secs, speedup: 3.91%  ratio: 1.04x
ap_6.01r557_SSE3_ATOM_IXE_MKLS_O3.exe -verbose  :
  Elapsed 220803.835 secs, speedup: 10.18%  ratio: 1.11x
      CPU 218619.209 secs, speedup: 8.72%  ratio: 1.10x
ap_6.01r557_SSE3_ATOM_IXE_MKLS_O3_libfftwf-3.3.1.exe -verbose  :
  Elapsed 202800.952 secs, speedup: 17.50%  ratio: 1.21x
      CPU 201832.971 secs, speedup: 15.73%  ratio: 1.19x
ap_6.01r557_SSE3_ATOM_IXE12.1.2.278_MKLS10.3.8_O3.exe  -verbose  :
  Elapsed 173418.757 secs, speedup: 29.45%  ratio: 1.42x
      CPU 172586.200 secs, speedup: 27.94%  ratio: 1.39x


======================================

Restoring BOINC to pretest state...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Compared with ap_5.05r468 we lost no speedup with the new ap_6.01r557
We can be happy with it.
The full test result is in our beta test area.

_heinz
Title: Re: optimized sources
Post by: _heinz on 24 Apr 2012, 08:30:26 pm
Today I installed CUDA 4.2 on my machines
Using BOINC 6.10.58

R3600
25.04.2012 03:21:53 NVIDIA GPU 0: ION (driver version 30132, CUDA version 4020, compute capability 1.1, 256MB, 35 GFLOPS peak)

P6630 Laptop i3, 2.6GHz
25.04.2012 10:04:05 NVIDIA GPU 0: GeForce GT 540M (driver version unknown, CUDA version 4020, compute capability 2.1, 1024MB, 205 GFLOPS peak)

v8-Xeon
25.04.2012 09:59:25 NVIDIA GPU 0: GeForce GTX 570 (driver version 30132, CUDA version 4020, compute capability 2.0, 1280MB, 1632 GFLOPS peak)
25.04.2012 09:59:25 NVIDIA GPU 1: GeForce GTX 470 (driver version 30132, CUDA version 4020, compute capability 2.0, 1280MB, 1398 GFLOPS peak)
25.04.2012 09:59:25 NVIDIA GPU 2: GeForce GTX 470 (driver version 30132, CUDA version 4020, compute capability 2.0, 1280MB, 1380 GFLOPS peak)

_heinz
Title: Re: optimized sources
Post by: _heinz on 25 Apr 2012, 02:21:53 pm
ATI - driver for HD4670 AGP  12-1_agp-hotfix_xp32_dd_cc

BOINC 7.0.25
30.04.2012 20:19:49 |  | Processor: 1 GenuineIntel               Intel(R) Pentium(R) 4 CPU 2.66GHz [Family 15 Model 2 Stepping 7]
30.04.2012 20:19:49 |  | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pbe
30.04.2012 20:19:49 |  | OS: Microsoft Windows XP: Home x86 Edition, Service Pack 3, (05.01.2600.00)
30.04.2012 20:19:49 |  | Memory: 1023.48 MB physical, 4.90 GB virtual
30.04.2012 20:19:49 |  | Disk: 55.89 GB total, 16.03 GB free
30.04.2012 20:19:49 |  | Local time is UTC +2 hours
30.04.2012 20:19:49 |  | ATI GPU 0: ATI RV730 (CAL version 1.4.1664, 1024MB, 1012MB available, 960 GFLOPS peak)

Works fine with PrimeGrid, CPU usage now down to 7-10% (tpsieve_1.38)
The machine (P4 Northwood 2.6GHz) shows ~50% total CPU usage while crunching.
This shows: a single CPU can feed a graphics adapter like the HD4670 while it crunches

On 25th April the machine had 2.250.000 pg-credit

_heinz

Title: Re: optimized sources
Post by: _heinz on 30 Apr 2012, 10:53:37 am
It has been hot here in the Rhine valley for 2 days; we have already had outdoor temperatures over 30 °C
Room temp = 25,8 °C
GPU2_470_101_grd (http://www.britta-d.de/images/gtx470/GPU2_470_101_grd.jpg)  :'(
Temperatures over 100 °C cause me trouble with the hardware.
Looks like I need a water-cooled solution....or compressor cooling like a refrigerator.

If temps do not go down a bit, I must shut down my crunchers.

_heinz

Title: Re: optimized sources
Post by: _heinz on 03 May 2012, 11:01:52 am
MSI announces:
N690GTX Dual GPU (http://de.msi.com/news-media/news/100175.html)
N690GTX-P3D4GD5 (http://de.msi.com/product/vga/N690GTX-P3D4GD5.html)
3072 CUDA-Processors
4GB GDDR5, 6008MHz 
(TDP)  300 Watt
max temp 98 grd C
10 years ultra long lifetime (under full load).

price in germany: 999,00 Euro (http://www2.hardwareversand.de/articledetail.jsp?aid=59049&agid=1947&pvid=99ahl9c58_h1rva8cu&ref=26&lb)

happy crunching
_heinz
Title: Re: optimized sources
Post by: _heinz on 23 May 2012, 02:28:42 pm
V8-Xeon is dead again.

14 days ago I shut down the machine to go on holiday. When I came back I switched the power on and V8-Xeon did not start anymore.  The light and the fans come on for a second and then go off. Nothing is shown on the display of the board. No self-test starts. Looks like the machine has eaten another PSU..... :o

_heinz
Title: Re: optimized sources
Post by: _heinz on 19 Jun 2012, 05:59:58 pm
Ordered now a PSU tester unit to see if PSU or board is dead.
Waiting...
Title: Re: optimized sources
Post by: _heinz on 22 Jun 2012, 01:18:23 pm
On 21st June my i3-laptop-gt540m reached 7Mio_distrtgen (http://www.britta-d.de/images/gtx540m/gt540m_7Mio_distrtgen.jpg)

_heinz
Title: Re: optimized sources
Post by: _heinz on 04 Jul 2012, 07:24:04 am
4th of July, a great day for research at CERN,

Higgs within reach (http://public.web.cern.ch/public/)

Read the CERN Press Release (http://press.web.cern.ch/press/PressReleases/Releases2012/PR17.12E.html)

Austrian and German press articles
~~~~~~~~~~~~~~~~~~
Wir haben es (http://diepresse.com/home/science/1262089/Wir-haben-es-CERNDurchbruch-bei-HiggsSuche?_vl_backlink=/home/index.do)

Physiker feiern Durchbruch (http://www.spiegel.de/wissenschaft/technik/higgs-boson-cern-gibt-entdeckung-von-teilchen-am-lhc-bekannt-a-842478.html)

 ;D  ;D  ;D
_heinz
Title: Re: optimized sources
Post by: _heinz on 07 Jul 2012, 10:28:53 am
Ordered now a PSU tester unit to see if PSU or board is dead.
Waiting...
The PSU tester does not show anything: when I switch the power on, the display of the PSU tester comes on briefly, then the PSU switches the power off automatically. The PSU is definitely dead.

So I will look for a new 1200W PSU in autumn. Now in summer, when temps are above 28 °C, I can't run the air-cooled V8-Xeon.

By the way, today my i3-laptop-gt540m passed 8Mio_distrtgen (http://www.britta-d.de/images/gtx540m/gt540m_8Mio_distrtgen.jpg)

_heinz
Title: Re: optimized sources
Post by: _heinz on 12 Jul 2012, 03:43:21 pm
Today I cleaned the PSU and had a closer look inside.
PSU_open_cleaned (http://www.britta-d.de/images/psu/PSU_open_cleaned)

I found an overheated transformer.
PSU_transformer_E4220_defect (http://www.britta-d.de/images/psu/PSU_transformer_E4220_defect.jpg)
PSU_transformer_E4220_defect_closer_look (http://www.britta-d.de/images/psu/PSU_transformer_E4220_defect_closer_look.jpg)

After all the cleaning I connected the PSU tester to see if it shows something.
PSU_tester_connected (http://www.britta-d.de/images/psu/PSU_tester_connected.jpg)

And it really shows something now; wow, what an effect.
PSU_tester_shows_all_voltages (http://www.britta-d.de/images/psu/PSU_tester_shows_all_voltages.jpg)

The PSU tester says the PSU is OK.

Then I closed the PSU and put it back into V8-Xeon.
I switched the power on and the machine ran up to the logon screen, then switched itself off again. Pity..
A further attempt to start the machine was not successful.
I will order a new PSU now.

_heinz
Title: Re: optimized sources
Post by: _heinz on 18 Jul 2012, 05:02:20 am
More than 180 000 hits on this thread now
Thank you to all users who keep looking in here.  ;)
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Although we are a bit off topic, I hope you enjoy it.
Running the hardware at its limit is a great challenge.
I had not mentioned it yet: my P4 with AGP HIS4670 died as well; the PSU burnt out for the third time.
So I have only my R3600 ATOM ION and my laptop i3 GT540M left to crunch.
Watching and checking the hardware is always necessary when running at the extreme.
I have seen that the latest GPU-Z 0.6.3 does not show CUDA for the GT540M on my laptop.
I was also surprised that DirectCompute is not shown.
(Error report filed)
http://www.britta-d.de/images/gtx540m/gt540m_gpuz_0.6.3_nocuda.jpg
Have a closer look with GPU Caps viewer on this GPU:
http://www.britta-d.de/images/gtx540m/gt540m_gpu.jpg
http://www.britta-d.de/images/gtx540m/gt540m_opengl.jpg
http://www.britta-d.de/images/gtx540m/gt540m_opencl.jpg
http://www.britta-d.de/images/gtx540m/gt540m_cuda.jpg

_heinz

Title: Re: optimized sources
Post by: _heinz on 20 Jul 2012, 07:04:05 pm
V8-Xeon is dead again.
14 days ago I shut down the machine to go on holiday. When I came back I switched the power on and V8-Xeon did not start anymore.  The light and the fans come on for a second and then go off. Nothing is shown on the display of the board. No self-test starts. Looks like the machine has eaten another PSU..... :o

_heinz
(a little late, but it is worth telling and showing a picture)
When I opened the case I was really surprised by all the dust in it. The last cleaning was only two and a half months ago.
Have a closer look at the picture and you will see dust on top of the graphics adapters too. It's not light, it's dust.
v8-xeon_dustbunnies (http://www.britta-d.de/images/v8xeon/v8-xeon_dustbunnies.jpg)
Most of the dust is on the wires in front of the first CPU cooling unit. I have to think about an effective air-filter system in the near future.
Look at the small space between the graphics adapters; no wonder the temperatures get high, not enough air can get in.
This is not a masterpiece of engineering; the graphics adapters should be thinner in the part where the fans are, to let more air in.
It would be better to run only two graphics adapters on this board layout.
I could increase the performance using two adapters with dual GPUs. Maybe a mixed configuration, one from NVIDIA and the other from ATI.
One of the main problems is the high temperature of the FB-DIMMs; as you know I have already had temps over 100 °C.
On the market I did not find really good coolers for FB-DIMMs, and a lot of memory coolers do not fit.
I got a KINGSTON HyperX RAM-Cooler 2x60mm blue, but it does not fit; it's too big.
I need one that is 50mm wide and only 5 mm high to fit under the clamp of the 12cm fan of the left CPU cooler.
Looks like I must construct something myself.

_heinz
 


Title: Re: optimized sources
Post by: _heinz on 24 Jul 2012, 07:57:57 am
I had not mentioned it yet: my P4 with AGP HIS4670 died as well; the PSU burnt out for the third time.

_heinz
Got the machine repaired after dismantling it and giving it a general cleaning.  :)
23.07.2012 22:11:41 |  | ATI GPU 0: ATI RV730 (CAL version 1.4.1664, 1024MB, 1012MB available, 960 GFLOPS peak)
23.07.2012 22:11:41 |  | OpenCL: ATI GPU 0: ATI RV730 (driver version CAL 1.4.1664, device version OpenCL 1.0 AMD-APP (851.4), 1024MB, 1012MB available)
~~~~~~~~~~~~~~~~~~~~~~
hd4670_AGP_GPUZ_graphicadapter (http://www.britta-d.de/images/ati/hd4670_AGP_GPUZ_graphicadapter.jpg)
hd4670_AGP_GPUZ_crunching (http://www.britta-d.de/images/ati/hd4670_AGP_GPUZ_crunching.jpg)
We see a constant ~90% GPU load and a GPU temp of ~66 °C at a room temp of 26,2 °C.
Looks good, my workhorse is running again.

_heinz
Title: Re: optimized sources
Post by: _heinz on 10 Aug 2012, 02:44:36 pm
10th August 2012: I reached a new milestone with my small i3 laptop gt540m_10Mio_distrtgen (http://www.britta-d.de/images/gtx540m/gt540m_10Mio_distrtgen.jpg)  ;D

My comment on the FPGA:
The price is high, 1392 Euro for a PCI-Express card with XILINX™ VIRTEX-4™ FPGA and 512 MByte SO-DIMM memory module (http://www.cesys.com/produkte/kategorie/fpga-karten-virtex/produkt/pciev4base/)
See this entry in my developer book from some years ago (http://www.britta-d.de/images/seti/fpga_seti.JPG)
Nevertheless, if someone has access to all the hardware and software, it would be nice to have a SETI application on an FPGA.
See FPGA thread (http://lunatics.kwsn.net/1-discussion-forum/fpgas.msg49334.html;topicseen#msg49334)

_heinz
Title: Re: optimized sources
Post by: _heinz on 17 Aug 2012, 09:26:57 am
GT540M
installed now: 304.79-notebook-win8-win7-winvista-64bit-international-beta
BOINC shows:
17.08.2012 15:05:16      NVIDIA GPU 0: GeForce GT 540M (driver version unknown, CUDA version 5000, compute capability 2.1, 1024MB, 205 GFLOPS peak)
The CUDA 5.0 device driver is in use now.
CUDA 5.0 is available as a pre-release (DeveloperZone)
I will run some tests in the next days..

_heinz
Title: Re: optimized sources
Post by: _heinz on 21 Aug 2012, 08:05:39 pm
I have a wattmeter and measured my machines while crunching.

The question is: how many KWh do my machines need to earn 1 Mio Cobblestones?
Calculated for distrrtgen
1 WU = 8758 Points
1000000 / 8758 = 114,181, so ~115 WUs are needed.

Machine 1: Pentium 2.6GHz Northwood, ATI HIS Radeon HD4670 AGP
Duration per WU = 4h 10 min ~ 4,2h
115wu * 4,2h = 483h
Measured crunch-power = 140W
483h*140W=67620Wh ~67,62KWh
1Mio = 67,62KWh
~~~~~~~~~~~~~~~

Machine 2: Laptop i3 2.6GHz NVIDIA GT540M
Duration per wu = 2h 15min ~ 2,25h
115wu * 2,25h = 258,75h
Measured crunch-power = 62W
258,75h*62W=16042,5Wh ~16,0425KWh
1Mio = 16,0425KWh
~~~~~~~~~~~~~~~~~

Machine 3: ATOM 1,6GHz, ION GPU
Duration per WU = 17h 30min ~17,5h
115wu * 17,5h = 2012,5h
Measured crunch-power = 28W
2012,5h*28W=56350Wh ~56,35KWh
1Mio = 56,35KWh
~~~~~~~~~~~~~~~

Machine 4: V8-Xeon, 2,4GHz, NVIDIA GTX570, GTX470, GTX470
data from project primegrid:
per 24h = 1,6Mio
Measured crunch-power = 860W
24h*860W=20,64KWh / 1,6Mio = 12,9KWh per 1Mio
1Mio = 12,9KWh
~~~~~~~~~~~~~~
distrrtgen
GTX570 needs ~662s/wu, 115wu*662s=76130s/3600= 21,14h/Mio
Assuming all 3 cards run like the GTX570, we must divide by 3
21,14h/3=7,04h per Mio
860W * 7,04h=6054,4Wh ~6,054KWh per Mio
1Mio = 6,054KWh
~~~~~~~~~~~~~~~
distrrtgen
GTX680 needs 485,06sec/wu, 115wu*485,06s=55781,9s/3600s= 15,5h per Mio distrrtgen.
If we have 3 GTX680 we must divide by 3
15,5h/3=5,16h per Mio
We assume V8-Xeon still needs 860W.
860W * 5,16h=4437,6Wh ~4,4376KWh
1Mio = 4,4376KWh
~~~~~~~~~~~~~~~~
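
The per-machine calculations above all follow the same pattern, so here is a minimal Python sketch of it. The function name `kwh_per_mio` and the dictionary keys are my own labels for illustration; the input figures are the ones quoted in this post. It rounds the number of WUs up to 115 as the post does, and it assumes identical GPUs running one WU each in parallel.

```python
import math

def kwh_per_mio(credit_per_wu, hours_per_wu, watts, gpus=1):
    """Energy in kWh needed to earn 1,000,000 credits (hypothetical helper)."""
    wus = math.ceil(1_000_000 / credit_per_wu)  # 114.18 is rounded up to 115
    hours = wus * hours_per_wu / gpus           # wall-clock hours with parallel GPUs
    return hours * watts / 1000.0               # Wh -> kWh

machines = {
    "P4 + HD4670": kwh_per_mio(8758, 4.2, 140),
    "i3 + GT540M": kwh_per_mio(8758, 2.25, 62),
    "Atom + ION": kwh_per_mio(8758, 17.5, 28),
    "V8-Xeon, 3 GPUs at 662 s/WU": kwh_per_mio(8758, 662 / 3600, 860, gpus=3),
    "V8-Xeon, 3 GPUs at 485.06 s/WU": kwh_per_mio(8758, 485.06 / 3600, 860, gpus=3),
}

for name, kwh in machines.items():
    print(f"{name}: {kwh:.2f} kWh per Mio")
```

The two V8-Xeon values come out slightly higher than the hand calculation (about 6,06 and 4,44 KWh) because the code does not round the intermediate hours.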

Remark: no CPU work done, all GPU work.
I can confirm: CPU usage on all 4 machines was lower than 3% (mostly 0,5-1,5%)

Sure, we cannot directly compare primegrid with distrrtgen.
The measured results are surprising and very interesting.
If we compare all 4 machines, V8-Xeon is the most efficient.
V8-Xeon
~6,05KWh per Mio with 3 GTX570
~4,44KWh per Mio with 3 GTX680 (projected from real runtime data)

As soon as I have a new PSU, I will test v8-Xeon with distrrtgen to get real data.

Using the latest hardware is the most efficient.

_heinz
Title: Re: optimized sources
Post by: _heinz on 22 Aug 2012, 08:19:14 am
What about ATI?

The computers of james Ying are a good sample for an efficiency calculation.
userid=15715 (http://boinc.freerainbowtables.com/distrrtgen/hosts_user.php?userid=15715)
let's have a look at

hostid=23173 (http://boinc.freerainbowtables.com/distrrtgen/show_host_detail.php?hostid=23173)
Intel(R) Xeon(R) CPU E31230 @ 3.20GHz
[4] AMD AMD Radeon HD 79x0 series (Tahiti) (3072MB) driver: 1.4.1720
running (distrrtgen) v3.52 (opencl_ati_101)
runtime: 436 sec/wu
115wu * 436s = 50140s/3600s=13,93h
if we have 3 GPUs we divide by 3
13,93h/3=4,64h per Mio
if we assume the V8-Xeon draws 860W
860W * 4,64h=3990,4Wh ~3,99KWh
1Mio = 3,99KWh  (3 HD7970)


if we have 4 GPUs we divide by 4
13,93h/4=3,48h per Mio
if we assume the V8-Xeon draws 860W
860W * 3,48h=2994,47Wh ~2,994KWh
1Mio = 2,994KWh (with 4 HD7970)

This means an ATI HD7970 with the new OpenCL app (opencl_ati_101) can beat a GTX680 running (distrrtgen) v3.48 (cuda23)
runtime ratio HD7970/GTX680: 436s / 485s, difference 49 sec

Reference: HD7970 runtime=423s
hostid=39986 (http://boinc.freerainbowtables.com/distrrtgen/show_host_detail.php?hostid=39986)
Intel(R) Core(TM) i5-2500K CPU @ 3.30GHz
AMD AMD Radeon HD 79x0 series (Tahiti) (3072MB) driver: 1.4.1720

Reference: GTX590 runtime=848s (anonymous platform runs OCL version, maybe 2 at once?)
hostid=41489 (http://boinc.freerainbowtables.com/distrrtgen/show_host_detail.php?hostid=41489)
Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz [Family 6 Model 42 Stepping 7]
[2] NVIDIA GeForce GTX 590 (1535MB) driver: 304.79

_heinz
Title: Re: optimized sources
Post by: _heinz on 26 Aug 2012, 10:02:35 am
Meanwhile my laptop passed 30Mio total. All GPU work. If I run CPU work in addition, the machine overheats.
 i3_gt540m_30Mio_total (http://www.britta-d.de/images/gtx540m/i3_gt540m_30Mio_total.jpg)

1Mio = 16,0425KWh
16,0425KWh/Mio * 30Mio = 481,275 KWh
30Mio = 481,275 KWh
How much does electricity cost in France?
The contract has 2 prices depending on the time of day:
from 2:00h-8:00h and 14:30h-16:30h, in total 8h (a third of a day), price = 0,0567 Euro/KWh
rest of the day, 16h (two thirds of a day), price = 0,0916 Euro/KWh
so we can say
10 * 16,0425 KWh = 160,425 KWh * 0,0567 Euro/Kwh =   9,096 Euro
20 * 16,0425 KWh = 320,85  KWh  * 0,0916 Euro/KWh = 29,389 Euro
Total 38,486 Euro
30 Mio = ~38,49 Euro
~~~~~~~~~~~~~~
This is valid only for the laptop.
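
The two-tariff cost above can be written as a small Python sketch. The function name and parameter names are hypothetical; the prices and the 8h/16h split are the ones from this post. It assumes the machine draws constant power around the clock, so one third of the energy falls into the cheap hours.

```python
def electricity_cost(kwh, off_peak_hours=8, off_peak_price=0.0567, peak_price=0.0916):
    """Cost in Euro for a constant load split across two tariffs (hypothetical helper)."""
    off_peak_share = off_peak_hours / 24              # 8h of 24h = one third
    off_peak_cost = kwh * off_peak_share * off_peak_price
    peak_cost = kwh * (1 - off_peak_share) * peak_price
    return off_peak_cost + peak_cost

total_kwh = 30 * 16.0425                              # 30 Mio at 16,0425 KWh per Mio
cost = electricity_cost(total_kwh)
print(f"{total_kwh:.3f} kWh cost about {cost:.2f} Euro")
```

With the figures from the post this gives about 38,49 Euro for 30 Mio.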

_heinz
Title: Re: optimized sources
Post by: _heinz on 30 Aug 2012, 10:54:35 am
100 Mio distrRTgen in sight...
passed 98 Mio distrrtgen (http://www.britta-d.de/images/gtx540m/boinc_total_98_Mio_distrrtgen.jpg)
the last two Mio will be calculated by GT540M, ATI HIS4670 AGP, ION 

_heinz
Title: Re: optimized sources
Post by: _heinz on 03 Sep 2012, 10:35:17 am
back to HD4670 AGP
hd4670_distrrtgen_1Mio (http://www.britta-d.de/images/ati/hd4670_distrrtgen_1Mio.jpg) (opencl_ati_101)
It needed a month to reach this milestone. The old P4 did it... :)
edit: 14th september
hd4670_distrrtgen_1.5Mio (http://www.britta-d.de/images/ati/hd4670_distrrtgen_1.5Mio.jpg) (opencl_ati_101)
a half Mio in 14 days...
edit: 23rd september
hd4670_distrrtgen_2Mio (http://www.britta-d.de/images/ati/hd4670_distrrtgen_2Mio.jpg) (opencl_ati_101)
boinc_hd4670_2_Mio_distrrtgen (http://www.britta-d.de/images/ati/boinc_hd4670_2_Mio_distrrtgen.jpg)

_heinz
Title: Re: optimized sources
Post by: _heinz on 15 Sep 2012, 06:46:45 pm
15th september, got it...
100_Mio_distrrtgen (http://www.britta-d.de/images/gtx540m/boinc_total_100_Mio_distrrtgen.jpg)
boinc_stats_100_Mio_distrrtgen (http://www.britta-d.de/images/gtx540m/boinc_stats_100_Mio_distrrtgen.jpg)

modify:
ATOM R3600 ION_1Mio_distrrtgen (http://www.britta-d.de/images/seti/ION_1Mio_distrrtgen.jpg)
boinc_stats_ION_1_Mio_distrrtgen (http://www.britta-d.de/images/seti/boinc_ION_1_Mio_distrrtgen.jpg)
ION_sensors_during_crunching (http://www.britta-d.de/images/seti/ION_sensors_during_crunching.jpg)

_heinz
Title: Re: optimized sources
Post by: _heinz on 06 Oct 2012, 08:34:50 am
2nd october,
shut down all my machines, vacation  :)

_heinz
Title: Re: optimized sources
Post by: _heinz on 22 Nov 2012, 02:44:17 pm
vacation is over now....
22nd november,
i3 gt540m laptop reached a milestone gt540m_15Mio_distrrtgen_stat (http://www.britta-d.de/images/gtx540m/gt540m_15Mio_distrrtgen_stat.jpg) and gt540m_15Mio_distrrtgen (http://www.britta-d.de/images/gtx540m/gt540m_15Mio_distrrtgen.jpg)
remark: it's not easy to keep the GPU temp under 90 °C with a laptop; from time to time I found the machine down in the morning.
gt540m_crunching_91_grd_celsius (http://www.britta-d.de/images/gtx540m/gt540m_crunching_91_grd_celsius.jpg) room temp 22 °C
22.11.2012 19:23:11 |  | NVIDIA GPU 0: GeForce GT 540M (driver version 306.94, CUDA version 5.0, compute capability 2.1, 1024MB, 8381384MB available, 258 GFLOPS peak)


_heinz
Title: Re: optimized sources
Post by: Claggy on 22 Nov 2012, 03:55:37 pm

22.11.2012 19:23:11 |  | NVIDIA GPU 0: GeForce GT 540M (driver version 306.94, CUDA version 5.0, compute capability 2.1, 1024MB, 8381384MB available, 258 GFLOPS peak)
You're suffering from the Wacky Nvidia GPU Memory Bug. Upgrade to Boinc 7.0.36 or 7.0.38 for the partial or full fix. But note that 7.0.32 and later introduce a higher internal flops value for the GPU,
which puts existing GPU tasks on the verge of going Maximum Time Exceeded, so you should run down your GPU tasks prior to upgrading.

Claggy
Title: Re: optimized sources
Post by: _heinz on 27 Nov 2012, 04:01:43 am
Hi Claggy,
I have BOINC 7.0.28 x64, the current BOINC. Where can I find BOINC 7.0.38?
thanks in advance
_heinz
Title: Re: optimized sources
Post by: Urs Echternacht on 27 Nov 2012, 08:27:17 am
Hi Claggy,
I have BOINC 7.0.28 x64, the current BOINC. Where can I find BOINC 7.0.38?
thanks in advance
_heinz
http://boinc.berkeley.edu/dl/ should have all versions of BOINC.
Title: Re: optimized sources
Post by: Claggy on 27 Nov 2012, 08:34:00 am
Hi Claggy,
I have BOINC 7.0.28 x64, the current BOINC. Where can I find BOINC 7.0.38?
thanks in advance
_heinz
The Boinc 7 Changelog (http://boinc.berkeley.edu/dev/forum_thread.php?id=6698) thread has the changes and links for the different versions after they become available
(normally a day or so after they have appeared in the D/L directory, in case they implode our PCs)

Claggy
Title: Re: optimized sources
Post by: _heinz on 28 Nov 2012, 08:41:46 am

22.11.2012 19:23:11 |  | NVIDIA GPU 0: GeForce GT 540M (driver version 306.94, CUDA version 5.0, compute capability 2.1, 1024MB, 8381384MB available, 258 GFLOPS peak)
You're suffering from the Wacky Nvidia GPU Memory Bug. Upgrade to Boinc 7.0.36 or 7.0.38 for the partial or full fix. But note that 7.0.32 and later introduce a higher internal flops value for the GPU,
which puts existing GPU tasks on the verge of going Maximum Time Exceeded, so you should run down your GPU tasks prior to upgrading.

Claggy
I installed BOINC 7.0.38 now.
28.11.2012 09:13:48 |  | Starting BOINC client version 7.0.38 for windows_x86_64
...
28.11.2012 09:13:48 |  | NVIDIA GPU 0: GeForce GT 540M (driver version 306.94, CUDA version 5.0, compute capability 2.1, 1024MB, 968MB available, 258 GFLOPS peak)
28.11.2012 09:13:48 |  | OpenCL: NVIDIA GPU 0: GeForce GT 540M (driver version 306.94, device version OpenCL 1.1 CUDA, 1024MB, 968MB available)
..
ION Shows now:
28.11.2012 18:34:46 |  | NVIDIA GPU 0: ION (driver version 306.97, CUDA version 5.0, compute capability 1.1, 256MB, 225MB available, 53 GFLOPS peak)
28.11.2012 18:34:46 |  | OpenCL: NVIDIA GPU 0: ION (driver version 306.97, device version OpenCL 1.0 CUDA, 256MB, 225MB available)

Thank you both for your comment and help
_heinz
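
The memory bug discussed above shows up in the startup log as an "available" value far larger than the total memory of the card. A rough Python sketch of a check for that pattern follows. The helper `memory_looks_wrong` is hypothetical and not part of BOINC; it only reads the "1024MB, 968MB available" part of the log lines posted here, and it would not catch a case where the total itself is reported wrongly.

```python
import re

# Matches "<total>MB, <available>MB available" as it appears in the posted log lines.
MEMORY_FIELDS = re.compile(r"(\d+)MB, (\d+)MB\s+available")

def memory_looks_wrong(log_line):
    """Return True if available memory exceeds total, None if no memory info is found."""
    match = MEMORY_FIELDS.search(log_line)
    if match is None:
        return None
    total_mb, available_mb = int(match.group(1)), int(match.group(2))
    return available_mb > total_mb

before = ("22.11.2012 19:23:11 |  | NVIDIA GPU 0: GeForce GT 540M (driver version 306.94, "
          "CUDA version 5.0, compute capability 2.1, 1024MB, 8381384MB available, 258 GFLOPS peak)")
after = ("28.11.2012 09:13:48 |  | NVIDIA GPU 0: GeForce GT 540M (driver version 306.94, "
         "CUDA version 5.0, compute capability 2.1, 1024MB, 968MB available, 258 GFLOPS peak)")

print(memory_looks_wrong(before))  # line logged by BOINC 7.0.28
print(memory_looks_wrong(after))   # line logged by BOINC 7.0.38
```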
Title: Re: optimized sources
Post by: _heinz on 03 Dec 2012, 06:08:49 pm
3rd december,
we have some big milestones to celebrate
200540 views on this thread -->  a big thank you to all readers  :)

my i3 Gt540m laptop passed today 20Mio primegrid and the old P4 with its AGP HD4670 passed 3Mio primegrid.
gt540m_20Mio_primegrid (http://www.britta-d.de/images/gtx540m/gt540m_20Mio_primegrid.jpg)
gt540m_20Mio_pg_CAL_ATI_RV730_3Mio_pg_stat (http://www.britta-d.de/images/gtx540m/gt540m_20Mio_pg_CAL_ATI_RV730_3Mio_pg_stat.jpg)

_heinz
Title: Re: optimized sources
Post by: _heinz on 23 Dec 2012, 08:05:32 am
V8-Xeon is dead again.

14 days ago I shut down the machine to go on holiday. When I came back I switched the power on and V8-Xeon did not start anymore. The light and the fans come on for a second and then go off. Nothing is shown on the display of the board. No self-test starts. Looks like the machine ate another PSU..... :o

_heinz
V8-Xeon is back again with a LEPA 1600W PSU (http://www.lepatek.com/eng/product_content/1/1/20/#produkte), hoping it lasts a little longer than the 3 PSUs before it.
I changed only the PSU; configuration as before, 1 x GTX570, 2 x GTX470.
Measured power under full load of all GPUs = 730W (no CPU work)
Seems stable so far (12 hours), so we can hope... to have a

Merry Christmas and Happy New Year

_heinz

Title: Re: optimized sources
Post by: _heinz on 30 Dec 2012, 07:52:45 pm
New Year's Eve, passed 400 Mio total (http://www.boincstats.com/signature/user_425.gif)

Happy New Year
Title: Re: optimized sources
Post by: _heinz on 03 Jan 2013, 08:12:27 pm
Have a new toy to play with,  ;D

04.01.2013 00:02:36 |  | Processor: 8 GenuineIntel       Intel(R) Core(TM) i7-3630QM CPU @ 2.40GHz [Family 6 Model 58 Stepping 9]
04.01.2013 00:02:36 |  | Processor: 256.00 KB cache
04.01.2013 00:02:36 |  | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 syscall nx lm vmx tm2 popcnt aes pbe
04.01.2013 00:02:36 |  | OS: Microsoft Windows 8: x64 Edition, (06.02.9200.00)
04.01.2013 00:02:36 |  | Memory: 15.82 GB physical, 21.32 GB virtual
04.01.2013 00:02:36 |  | Disk: 913.70 GB total, 870.75 GB free
04.01.2013 00:02:36 |  | Local time is UTC +1 hours
04.01.2013 00:02:36 |  | NVIDIA GPU 0: GeForce GT 650M (driver version 310.70, CUDA version 5.0, compute capability 3.0, 8374272MB, 8374206MB  available, 730 GFLOPS peak)
04.01.2013 00:02:36 |  | OpenCL: NVIDIA GPU 0: GeForce GT 650M (driver version 310.70, device version OpenCL 1.1 CUDA, 2048MB, 8374206MB available)
BOINC 7.0.28 does not show 2048MB, shows wrong values

05.01.2013 17:37:42 |  | NVIDIA GPU 0: GeForce GT 650M (driver version 310.70, CUDA version 5.0, compute capability 3.0, 2048MB, 1982MB available, 730 GFLOPS peak)
05.01.2013 17:37:42 |  | OpenCL: NVIDIA GPU 0: GeForce GT 650M (driver version 310.70, device version OpenCL 1.1 CUDA, 2048MB, 1982MB available)
05.01.2013 17:37:42 |  | Config: simulate 4 CPUs
05.01.2013 17:37:42 |  | Config: use all coprocessors
05.01.2013 17:37:42 |  | Version change (7.0.28 -> 7.0.38)

Graphics processor properties
Graphics card   nVIDIA GeForce GT 650M (Acer)
GPU code name   GK107M  (PCI Express 3.0 x16 10DE / 0FD1, Rev A2)
GPU clock   950 MHz
Memory clock   900 MHz

evga_precision_gt650m (http://www.britta-d.de/images/i7/evga_precision_gt650m.jpg)
gpuz_gt650m (http://www.britta-d.de/images/i7/gpuz_gt650m.jpg)
gpuz_gt650m_sensors (http://www.britta-d.de/images/i7/gpuz_gt650m_sensors.jpg)
it's crunching 2 pg WUs at once, temps are cool... room temp = 23 °C
modify:
aida64_sensors (http://www.britta-d.de/images/i7/aida64_sensors.jpg) running 5 pg genefer 1.07, one on GT650M and 4 on i7 CPU, room temp 19,3 °C
later:
aida64_overclock_cpu (http://www.britta-d.de/images/i7/aida64_overclock_cpu.jpg) running 5 pg genefer 1.07, one on GT650M and 4 on i7 CPU, room temp 15,7 °C
under full load the CPU clock increases automatically from 2,4GHz to 3,2GHz

some later:
i7_gt650m_not_impressive (http://www.britta-d.de/images/i7/i7_gt650m_not_impressive.jpg)
i7_gt650m_genefer_50_percent (http://www.britta-d.de/images/i7/i7_gt650m_genefer_50_percent.jpg)
projected runtime GT650M ~55h
i7_genefer_20_percent (http://www.britta-d.de/images/i7/i7_genefer_20_percent.jpg)
and CPU i7 projected ~187h
not really impressed running 4 WUs on an i7-3630QM quad-core with HT
i7_gt650m_genefer_100_percent (http://www.britta-d.de/images/i7/i7_gt650m_genefer_100_percent.jpg)
runtime GT650M = 55h 06m 20s
my GTX570 needs 9h 53m for the same job
my old GTX470 does it in about 11h 20m
I will post real values in the next few days when the work is done

some days later, work is done
runtime:
CPU run 8 Jobs
49430^1048576+1 is complete. (4922006 digits) (err = 0.0029) (time = 140:37:04) 17:25:10
49412^1048576+1 is complete. (4921841 digits) (err = 0.0029) (time = 140:45:41) 17:33:46
49420^1048576+1 is complete. (4921914 digits) (err = 0.0029) (time = 140:31:12) 17:19:17
49328^1048576+1 is complete. (4921066 digits) (err = 0.0029) (time = 140:16:55) 17:05:00
84190^1048576+1 is complete. (5164510 digits) (err = 0.0098) (time = 121:15:42) 03:45:23
84186^1048576+1 is complete. (5164489 digits) (err = 0.0088) (time = 121:18:19) 03:48:00
84184^1048576+1 is complete. (5164478 digits) (err = 0.0088) (time = 121:18:14) 03:47:55
82904^1048576+1 is complete. (5157501 digits) (err = 0.0088) (time = 121:01:58) 03:31:39

GPU GT650
49330^1048576+1 is complete. (4921084 digits) (err = 0.0022) (time = 55:18:04) 03:49:29
38988^1048576+1 is complete. (4813941 digits) (err = 0.0014) (time = 54:06:20) 16:26:13
73812^1048576+1 is complete. (5104602 digits) (err = 0.0049) (time = 58:03:21) 02:29:37
91842^1048576+1 is complete. (5204127 digits) (err = 0.0081) (time = 58:30:15) 13:03:16
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

GTX570
88112^1048576+1 is complete. (5185246 digits) (err = 0.0068) (time = 8:49:50) 15:50:47
107858^1048576+1 is complete. (5277329 digits) (err = 0.0107) (time = 8:59:25) 07:00:54
31050^1048576+1 is complete. (4710268 digits) (err = 0.0009) (time = 8:01:05) 13:02:03
21984^1048576+1 is complete. (4553029 digits) (err = 0.0004) (time = 7:45:01) 14:41:51

GTX470
108002^1048576+1 is complete. (5277936 digits) (err = 0.0103) (time = 10:56:51) 22:09:30
108268^1048576+1 is complete. (5279056 digits) (err = 0.0107) (time = 10:56:16) 12:57:03
108566^1048576+1 is complete. (5280308 digits) (err = 0.0107) (time = 10:56:28) 11:12:35
86298^1048576+1 is complete. (5175772 digits) (err = 0.0066) (time = 10:43:31) 00:16:04
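
To compare the devices from the result lines above, the "(time = H:MM:SS)" field can be parsed and averaged. The following Python sketch does that for the GT650M and GTX570 lines; the names `runtime_hours`, `gt650m_lines` and `gtx570_lines` are my own. Note that the candidates differ between devices, so this is only a rough comparison, not the same job timed on both cards.

```python
import re
from statistics import mean

# Matches the "(time = H:MM:SS)" field of a genefer result line.
TIME_FIELD = re.compile(r"\(time = (\d+):(\d{2}):(\d{2})\)")

def runtime_hours(line):
    """Return the runtime in hours parsed from one result line."""
    hours, minutes, seconds = map(int, TIME_FIELD.search(line).groups())
    return hours + minutes / 60 + seconds / 3600

gt650m_lines = [
    "49330^1048576+1 is complete. (4921084 digits) (err = 0.0022) (time = 55:18:04) 03:49:29",
    "38988^1048576+1 is complete. (4813941 digits) (err = 0.0014) (time = 54:06:20) 16:26:13",
    "73812^1048576+1 is complete. (5104602 digits) (err = 0.0049) (time = 58:03:21) 02:29:37",
    "91842^1048576+1 is complete. (5204127 digits) (err = 0.0081) (time = 58:30:15) 13:03:16",
]
gtx570_lines = [
    "88112^1048576+1 is complete. (5185246 digits) (err = 0.0068) (time = 8:49:50) 15:50:47",
    "107858^1048576+1 is complete. (5277329 digits) (err = 0.0107) (time = 8:59:25) 07:00:54",
    "31050^1048576+1 is complete. (4710268 digits) (err = 0.0009) (time = 8:01:05) 13:02:03",
    "21984^1048576+1 is complete. (4553029 digits) (err = 0.0004) (time = 7:45:01) 14:41:51",
]

avg_gt650m = mean(runtime_hours(line) for line in gt650m_lines)
avg_gtx570 = mean(runtime_hours(line) for line in gtx570_lines)
ratio = avg_gt650m / avg_gtx570

print(f"GT650M average: {avg_gt650m:.2f} h")
print(f"GTX570 average: {avg_gtx570:.2f} h")
print(f"GT650M takes about {ratio:.1f} times as long")
```

On these eight lines the GT650M averages roughly 56,5h and the GTX570 roughly 8,4h per task.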

the challenge statistics show me in place 43 (http://www.primegrid.com/challenge/2013_1/top_users.html) Last update: 2013 01 17 23:30:14

43 _heinz  2603033.43
 _heinz
Title: Re: optimized sources
Post by: _heinz on 05 Jan 2013, 07:47:17 pm
My small contribution to support seti  ;)

Renewed support for my outdated 2011 license.
Have now "Version 2013" Update 1 of "Intel® C++ Composer XE for Windows*"
As always a full production license.
Support Status: Active 04 Jan 2014

_heinz
Title: Re: optimized sources
Post by: _heinz on 29 Jan 2013, 07:38:44 pm
we have some small milestones to celebrate
28th January: my old P4 2.66GHz with AGP HD4670 passed 4 Mio primegrid
agp_hd4670_4Mio_primegrid (http://www.britta-d.de/images/ati/hd4670_primegrid_4Mio.jpg)

my laptop i7-3630QM GeForce GT 650M  passed its first Mio distrrtgen
i7_gt650m_1Mio_distrrtgen (http://www.britta-d.de/images/i7/i7_gt650m_1Mio_distrrtgen.jpg)
v8_Xeon passed 100Mio distrrtgen
v8-xeon_100Mio_distrrtgen (http://www.britta-d.de/images/gtx470/v8-xeon_100Mio_distrrtgen.jpg)

_heinz
Title: Re: optimized sources
Post by: _heinz on 04 Feb 2013, 09:33:06 am
i7_gt650m_2Mio_distrrtgen (http://www.britta-d.de/images/i7/i7_gt650m_2Mio_distrrtgen.jpg)
Last but not least: not as bad as I thought. We should not forget it is only a laptop.

_heinz
Title: Re: optimized sources
Post by: _heinz on 11 Feb 2013, 08:21:49 pm
11th February,
When I came into the server room tonight, V8-Xeon was down. I pressed the reset button, then the start button.
This caused the main circuit breaker to switch the power off. Looks like the LEPA 1600 quit working.
Hoping the board is not damaged. Tomorrow I will measure the voltages of the LEPA, then we will know more.
next day: PSU is definitely dead, filled out the RMA and ordered a new one.

_heinz


 
Title: Re: optimized sources
Post by: _heinz on 13 Feb 2013, 06:01:19 pm
13th February
This morning I did a general cleaning and removed all components. I left only the motherboard in the case.
In the afternoon I bought a new LEPA 1600; the old one must be sent back to the manufacturer. I made an agreement to get my money back if LEPA confirms the PSU is defective. To get the confirmation I have to wait several weeks.  :'(
I put the new PSU into the machine and, after several restarts and repairing the OS, v8-Xeon is alive again since 21:00 h
I measured the power under full load, running primegrid on 8 CPUs and distrrtgen on 3 GPUs: 780 Watt.
Lucky that the board was not damaged.  ;D

_heinz


Title: Re: optimized sources
Post by: _heinz on 19 Feb 2013, 10:59:50 am
19th february
geforce-gtx-titan (http://www.nvidia.de/object/geforce-gtx-titan-de.html#pdpContent=2)
Maybe available in a few weeks in Germany.
edit:
ASUS GTX-TITAN-6GD5 949 Euro, available in March (http://www.alternate.de/html/listings/17193/17195)
Look at the 3 way SLI i7-3930K Gaming-Computer 4999 Euro
Happy crunching

_heinz
Title: Re: optimized sources
Post by: _heinz on 21 Feb 2013, 07:51:38 am
v8-Xeon is back among the top 100 computers at distrrtgen
v8-Xeon_best_computer_distrrtgen_number_99 (http://www.britta-d.de/images/seti/v8-Xeon_best_computer_distrrtgen_number_99.jpg)
 RAC goes slowly upwards again.. ;D

_heinz
Title: Re: optimized sources
Post by: _heinz on 22 Feb 2013, 08:47:05 am
Lunatics_x41zc_win32_cuda50.exe runs on the ION like a charm, cpu usage nearly zero
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
22.02.2013 14:07:55 |  | Processor: 2 GenuineIntel          Intel(R) Atom(TM) CPU  230   @ 1.60GHz [Family 6 Model 28 Stepping 2]
22.02.2013 14:07:55 |  | Processor: 512.00 KB cache
22.02.2013 14:07:55 |  | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 nx lm tm2 movebe pbe
22.02.2013 14:07:55 |  | OS: Microsoft Windows 7: Home Premium x86 Edition, Service Pack 1, (06.01.7601.00)
22.02.2013 14:07:55 |  | Memory: 1.75 GB physical, 3.50 GB virtual
22.02.2013 14:07:55 |  | Disk: 158.20 GB total, 123.47 GB free
22.02.2013 14:07:55 |  | Local time is UTC +1 hours
22.02.2013 14:07:55 |  | NVIDIA GPU 0: ION (driver version 314.7, CUDA version 5.0, compute capability 1.1, 256MB, 213MB available, 53 GFLOPS peak)
22.02.2013 14:07:55 |  | OpenCL: NVIDIA GPU 0: ION (driver version 314.07, device version OpenCL 1.0 CUDA, 256MB, 213MB available)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Results you can see on hostid=5510631 (http://setiathome.berkeley.edu/results.php?hostid=5510631)

_heinz
Title: Re: optimized sources
Post by: _heinz on 22 Feb 2013, 06:52:52 pm
i7-3630QM_primegrid_1Mio cpu-work (http://www.britta-d.de/images/i7/i7-3630QM_primegrid_1Mio.jpg)
it's impressive for a laptop; it runs faster than the 2.4GHz Xeon E5405, both running 8 WUs at once and distrrtgen on their GPUs.
Riesel Sieve v1.12
runtime = 12,194.19 for Intel(R) Core(TM) i7-3630QM CPU @ 2.40GHz [Family 6 Model 58 Stepping 9](8 Prozessoren)
runtime = 13,825.52 for Intel(R) Xeon(R) CPU E5405 @ 2.00GHz [Family 6 Model 23 Stepping 6](8 Prozessoren)

time goes on, V8-Xeon is nearly 4 years old, time to give it a hardware upgrade.

_heinz
 
Title: Re: optimized sources
Post by: _heinz on 26 Feb 2013, 11:21:31 am
i7_gt650m_2Mio_distrrtgen (http://www.britta-d.de/images/i7/i7_gt650m_2Mio_distrrtgen.jpg)
Last but not least: not as bad as I thought. We should not forget it is only a laptop.

_heinz
24th february
i7_gt650m_5Mio_distrrtgen (http://www.britta-d.de/images/i7/i7_gt650m_5Mio_distrrtgen.jpg)
continuous work  :)
Title: Re: optimized sources
Post by: corsair on 26 Feb 2013, 12:12:10 pm
i7_gt650m_2Mio_distrrtgen (http://www.britta-d.de/images/i7/i7_gt650m_2Mio_distrrtgen.jpg)
Last but not least: not as bad as I thought. We should not forget it is only a laptop.

_heinz
24th february
i7_gt650m_5Mio_distrrtgen (http://www.britta-d.de/images/i7/i7_gt650m_5Mio_distrrtgen.jpg)
continuous work  :)

Are you using the stock app or an optimized one?

If optimized, where can I find them for NVIDIA and AMD GPUs?
Title: Re: optimized sources
Post by: _heinz on 26 Feb 2013, 02:22:23 pm
Hi corsair, distrrtgen downloads its programs automatically onto your machine, nothing to do.
Title: Re: optimized sources
Post by: corsair on 27 Feb 2013, 03:11:47 pm
Hi corsair, distrrtgen downloads its programs automatically onto your machine, nothing to do.

Thanks a lot _heinz, I already noticed that, but I have seen somewhere that there are people compiling their own builds?
Title: Re: optimized sources
Post by: _heinz on 02 Mar 2013, 04:48:16 pm
Titan, first results on seti forum (http://setiathome.berkeley.edu/forum_thread.php?id=70767)
not as impressive as we thought  :o, the card does not use its full potential.
Surprise....

_heinz
Title: Re: optimized sources
Post by: Mike on 03 Mar 2013, 04:54:30 am
I`m not surprised at all.
It was similar with Tesla back then.
Title: Re: optimized sources
Post by: _heinz on 07 Mar 2013, 05:09:52 am
I`m not surprised at all.
It was similar with Tesla back then.

Looks like a complete redesign of the app is necessary to use the Tesla's and Titan's properties optimally.

_heinz
Title: Re: optimized sources
Post by: _heinz on 08 Mar 2013, 01:49:16 pm
i7-3630QM_primegrid_1Mio cpu-work (http://www.britta-d.de/images/i7/i7-3630QM_primegrid_1Mio.jpg)
time goes on, V8-Xeon is nearly 4 years old, time to give it a hardware upgrade.

_heinz

installed 2 GTX Titan EVGA SC SLI and GTX570
Boinc shows:
08.03.2013 18:43:27 |  | NVIDIA GPU 0: GeForce GTX TITAN (driver version 314.14, CUDA version 5.0, compute capability 3.5, 4096MB, 4096MB available, 4989 GFLOPS peak)
08.03.2013 18:43:27 |  | NVIDIA GPU 1: GeForce GTX TITAN (driver version 314.14, CUDA version 5.0, compute capability 3.5, 4096MB, 4096MB available, 4989 GFLOPS peak)
08.03.2013 18:43:27 |  | NVIDIA GPU 2: GeForce GTX 570 (driver version 314.14, CUDA version 5.0, compute capability 2.0, 1280MB, 1178MB available, 1405 GFLOPS peak)
08.03.2013 18:43:27 |  | OpenCL: NVIDIA GPU 0: GeForce GTX TITAN (driver version 314.14, device version OpenCL 1.1 CUDA, 6144MB, 4096MB available)
08.03.2013 18:43:27 |  | OpenCL: NVIDIA GPU 1: GeForce GTX TITAN (driver version 314.14, device version OpenCL 1.1 CUDA, 6144MB, 4096MB available)
08.03.2013 18:43:27 |  | OpenCL: NVIDIA GPU 2: GeForce GTX 570 (driver version 314.14, device version OpenCL 1.1 CUDA, 1280MB, 1178MB available)
GTX_Titan_SLI_ready (http://www.britta-d.de/images/titan/GTX_Titan_SLI_ready.jpg)
GTX_Titan_SLI_working (http://www.britta-d.de/images/titan/GTX_Titan_SLI_working.jpg)
GPUZ_GTX_Titan (http://www.britta-d.de/images/titan/GPUZ_GTX_Titan.jpg)
GPUZ_Sensors_GTX_Titan (http://www.britta-d.de/images/titan/GPUZ_Sensors_GTX_Titan.jpg)
Happy crunching, a lot to do now.
_heinz

Title: Re: optimized sources
Post by: William on 08 Mar 2013, 02:36:42 pm
I`m not surprised at all.
It was similar with Tesla back then.
Looks like a complete redesign of the app is necessary to use the Tesla's and Titan's properties optimally.

_heinz
Everyone knows the answer is 42.

To paraphrase Jason, He's pretty pleased that his code scales so well on new architecture. There's still a lot of optimisation potential left.
Currently the Titans underperform (when you compare to e.g. a 690) by about 30% - IOW from the specs you would expect some 30% more speed.
According to Jason, drivers for new cards take a year or so to mature - in that time there's quite a bit of speed improvement (from our POV), so I'd expect when the drivers are good, the Titans scale according to their specs. Still a lot of potential left and a lot of stuff to explore.

To me, looks like the wrong bandwagon to jump on.
Title: Re: optimized sources
Post by: Mike on 08 Mar 2013, 05:31:52 pm
I totally agree William.
Title: Re: optimized sources
Post by: _heinz on 09 Mar 2013, 04:01:28 am
I`m not surprised at all.
It was similar with Tesla back then.
Looks like a complete redesign of the app is necessary to use the Tesla's and Titan's properties optimally.

_heinz
Everyone knows the answer is 42.

 Still a lot of potential left and a lot of stuff to explore.

I know-->42 (http://en.wikipedia.org/wiki/42_%28number%29#In_The_Hitchhiker.27s_Guide_to_the_Galaxy)  ;D
greetings

_heinz
edit:
passed 500Mio_total (http://www.britta-d.de/images/seti/500Mio_total.jpg) today
last month's credit went up to 200Mio_distrrtgen_total (http://www.britta-d.de/images/titan/200Mio_distrrtgen_total.jpg)
Title: Re: optimized sources
Post by: _heinz on 18 Mar 2013, 02:45:42 pm
driver 314.21:
17.03.2013 15:42:01 |  | CUDA: NVIDIA GPU 0: GeForce GTX TITAN (driver version 314.21, CUDA version 5.0, compute capability 3.5, 4096MB, 4096MB available, 4989 GFLOPS peak)
17.03.2013 15:42:01 |  | CUDA: NVIDIA GPU 1: GeForce GTX TITAN (driver version 314.21, CUDA version 5.0, compute capability 3.5, 4096MB, 4096MB available, 4989 GFLOPS peak)
17.03.2013 15:42:01 |  | CUDA: NVIDIA GPU 2: GeForce GTX 570 (driver version 314.21, CUDA version 5.0, compute capability 2.0, 1280MB, 1178MB available, 1405 GFLOPS peak)
17.03.2013 15:42:01 |  | OpenCL: NVIDIA GPU 0: GeForce GTX TITAN (driver version 314.21, device version OpenCL 1.1 CUDA, 6144MB, 4096MB available, 4989 GFLOPS peak)
17.03.2013 15:42:01 |  | OpenCL: NVIDIA GPU 1: GeForce GTX TITAN (driver version 314.21, device version OpenCL 1.1 CUDA, 6144MB, 4096MB available, 4989 GFLOPS peak)
17.03.2013 15:42:01 |  | OpenCL: NVIDIA GPU 2: GeForce GTX 570 (driver version 314.21, device version OpenCL 1.1 CUDA, 1280MB, 1178MB available, 1405 GFLOPS peak)
17.03.2013 15:42:01 | DistrRTgen | Found app_info.xml; using anonymous platform
17.03.2013 15:42:01 |  | Config: simulate 8 CPUs
17.03.2013 15:42:01 |  | Config: use all coprocessors
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
no news, the machine running at standard frequency continuously produced ~2,1 Mio/day as before

Title: Re: optimized sources
Post by: _heinz on 28 Mar 2013, 09:16:52 pm
After testing and running Zdenek's compiled app CC2.0 for GTX570 and Titan daily output increased up to 4,3 Mio/day on 2013-03-28  ;D
first step is done.
_heinz
Title: Re: optimized sources
Post by: _heinz on 01 Apr 2013, 04:38:15 am
Joyeuses Pâques
Frohe Ostern
Happy Easter
Thank you to all readers who have not lost interest in looking in here.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Try the new astropulse OCL apps from Raistmer and Urs Echternacht

_heinz
Title: Re: optimized sources
Post by: _heinz on 03 Apr 2013, 05:45:55 pm
Because I had trouble running multiple BOINC clients with the mixed configuration GTX Titan / GTX570 to give every card its optimal client, I decided to remove the GTX570 and now run triple SLI GTX Titan
31.03.2013 19:33:12 |  | Processor: 8 GenuineIntel Intel(R) Xeon(R) CPU           E5405  @ 2.00GHz [Family 6 Model 23 Stepping 6]
31.03.2013 19:33:12 |  | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 syscall nx lm vmx tm2 dca pbe
31.03.2013 19:33:12 |  | OS: Microsoft Windows 7: Ultimate x64 Edition, Service Pack 1, (06.01.7601.00)
31.03.2013 19:33:12 |  | Memory: 16.00 GB physical, 31.99 GB virtual
31.03.2013 19:33:12 |  | Disk: 931.51 GB total, 840.36 GB free
31.03.2013 19:33:12 |  | Local time is UTC +2 hours
31.03.2013 19:33:12 |  | CUDA: NVIDIA GPU 0: GeForce GTX TITAN (driver version 314.22, CUDA version 5.0, compute capability 3.5, 4096MB, 4096MB available, 4989 GFLOPS peak)
31.03.2013 19:33:12 |  | CUDA: NVIDIA GPU 1: GeForce GTX TITAN (driver version 314.22, CUDA version 5.0, compute capability 3.5, 4096MB, 4096MB available, 4989 GFLOPS peak)
31.03.2013 19:33:12 |  | CUDA: NVIDIA GPU 2: GeForce GTX TITAN (driver version 314.22, CUDA version 5.0, compute capability 3.5, 4096MB, 4096MB available, 4989 GFLOPS peak)
31.03.2013 19:33:12 |  | OpenCL: NVIDIA GPU 0: GeForce GTX TITAN (driver version 314.22, device version OpenCL 1.1 CUDA, 6144MB, 4096MB available, 4989 GFLOPS peak)
31.03.2013 19:33:12 |  | OpenCL: NVIDIA GPU 1: GeForce GTX TITAN (driver version 314.22, device version OpenCL 1.1 CUDA, 6144MB, 4096MB available, 4989 GFLOPS peak)
31.03.2013 19:33:12 |  | OpenCL: NVIDIA GPU 2: GeForce GTX TITAN (driver version 314.22, device version OpenCL 1.1 CUDA, 6144MB, 4096MB available, 4989 GFLOPS peak)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
GTX_Titan_3SLI_working (http://www.britta-d.de/images/titan/GTX_Titan_3SLI_working.jpg)
measured power 960Watt
Production output ~4,3Mio/day continuously

_heinz
Title: Re: optimized sources
Post by: _heinz on 12 Apr 2013, 06:22:38 pm
27th march 2013, retired my old P4 2,66GHz AGP HD4670 from crunching after 5Mio on primegrid (http://www.britta-d.de/images/ati/hd4670_pg_5Mio.jpg)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
today 2013-04-12 I'm at the same boinc_place_291 (http://www.britta-d.de/images/seti/boinc_place_291.jpg) as two years ago on 2011-04-10.
RAC is still climbing..

Title: Re: optimized sources
Post by: _heinz on 15 Apr 2013, 06:09:06 pm
Although distrrtgen reduced its credit with the longer WUs, production output is now ~5,2Mio/day and RAC is still climbing.
V8-Xeon now calculates a distrrtgen WU in 929 - 1060 sec; that is as fast as an HD7970 or a little bit faster.
The differences in runtime come from the different clock speeds of the cards. The upper two cards, dev 0 and 1, are hotter and run at lower clocks than dev 2, the lowest card.
modify:
With boinc_avg_number_29 (http://www.britta-d.de/images/seti/boinc_avg_number_29.jpg) I'm now back in the first 30 worldwide.
Back to the first 10 tophosts distrrtgen_tophosts_number_9 (http://www.britta-d.de/images/seti/distrrtgen_tophosts_number_9.jpg)
One of the first 20 top user distrrtgen_topuser_number_19 (http://www.britta-d.de/images/seti/distrrtgen_topuser_number_19.jpg)
Title: Re: optimized sources
Post by: _heinz on 17 Apr 2013, 11:05:59 am
For all who are interested in the technical details of GK110, here they are:
aida64_grafikprozessor_GTX_Titan (http://www.britta-d.de/images/titan/aida64_grafikprozessor_GTX_Titan.jpg)
aida64_GPGPU_CUDA_GTX_Titan_properties_1 (http://www.britta-d.de/images/titan/aida64_GPGPU_CUDA_GTX_Titan_properties_1.jpg)
aida64_GPGPU_CUDA_GTX_Titan_properties_2 (http://www.britta-d.de/images/titan/aida64_GPGPU_CUDA_GTX_Titan_properties_2.jpg)
aida64_GPGPU_Direct3D_GTX_Titan_properties_3 (http://www.britta-d.de/images/titan/aida64_GPGPU_Direct3D_GTX_Titan_properties_3.jpg)
aida64_GPGPU_OpenCL_GTX_Titan_properties_4 (http://www.britta-d.de/images/titan/aida64_GPGPU_OpenCL_GTX_Titan_properties_4.jpg)
aida64_GPGPU_OpenCL_GTX_Titan_properties_5 (http://www.britta-d.de/images/titan/aida64_GPGPU_OpenCL_GTX_Titan_properties_5.jpg)
aida64_GPGPU_OpenCL_GTX_Titan_properties_6 (http://www.britta-d.de/images/titan/aida64_GPGPU_OpenCL_GTX_Titan_properties_6.jpg)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
My answer to: http://lunatics.kwsn.net/2-windows/optimized-sources.msg49420.html#msg49420
As soon as I have a new PSU, I will test v8-Xeon with distrrtgen to get real data.
Here they are:

The question is: how many kWh do my machines need to earn 1 million Cobblestones?
Calculated for distrrtgen
1 WU = 19825 points
1000000 / 19825 = 50.441, so ~51 WUs are needed.

Machine 4: V8-Xeon, 2.4 GHz, 3x NVIDIA GTX Titan
A GTX Titan needs ~1006 sec/WU: 51 WU * 1006 sec / 3600 = 14.25 h per million
Since we have 3 Titans, we must divide by 3:
14.25 h / 3 = 4.75 h per million
Measured crunch power = 860 W
860 W * 4.75 h = 4085 Wh ~ 4.085 kWh
1 million = 4.085 kWh
~~~~~~~~~~~~~~~~~
Machine 5: Laptop Acer Aspire V3-771G, i7-3630QM 2.4 GHz, NVIDIA GT650M
Duration per WU = 10246 sec
51 WU * 10246 sec / 3600 = 145.15 h per million
Crunch power ~60 W
60 W * 145.15 h = 8709 Wh ~ 8.709 kWh
1 million = 8.709 kWh
~~~~~~~~~~~~~~~~~
Machine 3: ATOM 1.6 GHz, ION GPU
Duration per WU = 111283 sec
51 WU * 111283 sec / 3600 = 1576.50 h per million
Measured crunch power = 28 W
1576.5 h * 28 W = 44142 Wh ~ 44.14 kWh
1 million = 44.14 kWh
~~~~~~~~~~~~~~~~
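The per-machine arithmetic above can be reproduced with a short script. This is only a sketch: the helper name kwh_per_million and the layout of the machines table are made up for illustration. The input figures (points per WU, seconds per WU, measured watts, number of GPUs) are the ones from the post, the WU count is rounded up to 51 as in the hand calculation, and it assumes all GPUs in a machine crunch in parallel at the same speed.

```python
import math

POINTS_PER_WU = 19825      # credit per distrrtgen WU (from the post)
TARGET_POINTS = 1_000_000  # 1 million Cobblestones


def kwh_per_million(sec_per_wu, watts, gpus=1):
    """Energy in kWh to earn TARGET_POINTS (hypothetical helper)."""
    wus_needed = math.ceil(TARGET_POINTS / POINTS_PER_WU)  # 50.441 -> 51
    hours = wus_needed * sec_per_wu / 3600 / gpus          # wall-clock hours
    return hours * watts / 1000                            # Wh -> kWh


# (seconds per WU, measured watts, number of GPUs), taken from the post
machines = {
    "V8-Xeon, 3x GTX Titan": (1006, 860, 3),
    "Acer V3-771G, GT650M": (10246, 60, 1),
    "ATOM, ION GPU": (111283, 28, 1),
}

for name, (sec_per_wu, watts, gpus) in machines.items():
    kwh = kwh_per_million(sec_per_wu, watts, gpus)
    print(f"{name}: {kwh:.2f} kWh per million")
```

The script does not round the intermediate hours, so its results can differ from the hand calculation in the last digit.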
Title: Re: optimized sources
Post by: _heinz on 26 Apr 2013, 07:03:02 pm
Passed boinc_combined_700Mio (http://www.britta-d.de/images/seti/boinc_combined_700Mio.jpg) and the magical 99.99%
distrrtgen_topuser_number_16 (http://www.britta-d.de/images/seti/distrrtgen_topuser_number_16.jpg)
distrrtgen_tophosts_number_4 (http://www.britta-d.de/images/seti/distrrtgen_tophosts_number_4.jpg)
RAC is still climbing.
Title: Re: optimized sources
Post by: _heinz on 05 May 2013, 06:45:47 pm
A short status update in the meantime:
BOINC World position based on RAC boinc_avg_number_20 (http://www.britta-d.de/images/seti/boinc_avg_number_20.jpg)
number 13 of distrrtgen_top_users (http://boinc.freerainbowtables.com/distrrtgen/top_users.php)
number 2 of distrrtgen_top_hosts (http://boinc.freerainbowtables.com/distrrtgen/top_hosts.php)
In about 3 weeks we will reach the maximum RAC.
Title: Re: optimized sources
Post by: _heinz on 13 May 2013, 07:50:13 am
13th of May,
V8-Xeon is now number 1 of distrrtgen_top_hosts (http://www.britta-d.de/images/seti/distrrtgen_tophosts_number_1.jpg)
Title: Re: optimized sources
Post by: _heinz on 14 May 2013, 08:08:24 pm
Lost one hour today. Shut down the machine for a short cleaning and mounted an additional fan directly in front of the triple set. RAC dropped a bit... So TUKIA had the chance to catch me.
My projection says I will reach 1,000,000,000 on the 10th of June. We will see if it comes true...
I can produce 50 million/week, as you can see at france_number_9 (http://www.britta-d.de/images/seti/france_number_9.jpg)
Title: Re: optimized sources
Post by: _heinz on 17 May 2013, 04:56:17 pm
A storm of shorties made my RAC drop rapidly...
Every WU, long (18800 points) or short (9400 points), runs ~470 sec on the Titan, so RAC depends on the WU type you get.
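To put numbers on that: since both WU types take about the same time, a long WU pays twice as much per hour of GPU time as a short one. The small sketch below uses only the figures from the post; the helper name points_per_hour is made up for illustration, and it assumes one GPU running WUs back to back with no gaps.

```python
def points_per_hour(points_per_wu, sec_per_wu):
    """Credit per hour for one GPU running WUs back to back (hypothetical helper)."""
    return 3600 * points_per_wu / sec_per_wu


long_rate = points_per_hour(18800, 470)   # long WU
short_rate = points_per_hour(9400, 470)   # short WU ("shorty")

print(f"long:  {long_rate:.0f} points/h")
print(f"short: {short_rate:.0f} points/h")
print(f"ratio: {long_rate / short_rate:.1f}")
```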
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
By the way, the Xeon Phi 5110P is now available in Germany, price ~2400 Euro...
60 x86 cores, 1053 MHz, 240 threads. 320 GB/s memory bandwidth. 1011 gigaflops (double precision).
Standard PCIe x16 form factor, passive cooling, Linux operating system, IP-addressable, 512-bit SIMD instructions,
supported by the latest Intel® software development products

Title: Re: optimized sources
Post by: _heinz on 20 May 2013, 11:20:56 am
20th May, distrrtgen_topuser_number_12 (http://www.britta-d.de/images/seti/distrrtgen_topuser_number_12.jpg)
Title: Re: optimized sources
Post by: _heinz on 23 May 2013, 05:45:09 pm
EVGA GeForce_GTX_780_SuperClocked (http://www.alternate.de/EVGA/EVGA+GeForce_GTX_780_SuperClocked,_Grafikkarte/html/product/1082707/?)   is available in Germany (€ 659,-* )
 Kepler GK110-300
 2,304 shaders
 13 SMX units
 208 TMUs
 40 ROPs
 941 MHz base clock
 Boost up to 992 MHz
 3 GB GDDR5
 384-bit interface
 :o  surprise: faster (MHz) than a Titan
~~~~~~~~~~~~~~~~~~~~~~
Titan EVGA SC
 Kepler GK110
 2,688 shaders
 14 SMX units
 224 TMUs
 48 ROPs
 876 MHz base clock
 Boost up to 928 MHz
 6 GB GDDR5
 384-bit interface
~~~~~~~~~~~~~~~~~~~~~~
One of my overclocked Titans runs stable at 1136 MHz, the other two run at 1084 MHz (room temp = 16.8 °C)
Title: Re: optimized sources
Post by: _heinz on 29 May 2013, 03:17:37 am
29th May
i3_gt540m_distrrtgen_monthly_output (http://www.britta-d.de/images/gtx540m/gt540m_distrrtgen_monthly_output.jpg)
i7_gt650m_20Mio_distrrtgen (http://www.britta-d.de/images/i7/i7_gt650m_20Mio_distrrtgen.jpg)
The i7-gt650m can produce ~5 million per month; that's the same as V8-Xeon gets per day.
GTX_Titan_distrrtgen_monthly_output (http://www.britta-d.de/images/titan/GTX_Titan_distrrtgen_monthly_output.jpg)
Title: Re: optimized sources
Post by: _heinz on 08 Jul 2013, 11:51:13 am
Intel's 4th-generation CPUs are available in Germany.
Intel® Core™ i7-4770K (http://www.alternate.de/Intel(R)/Intel(R)+Core(TM)_i7-4770K,_CPU/html/product/1063382/?click_HP=23996), CPU (FC-LGA4, "Haswell", boxed), € 299.00
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Today I got 1_Mrd_total_credit (http://www.britta-d.de/images/primegrid/1_Mrd_total_credit.jpg)
Summer in Germany, temps are up to 28 °C....
pg_v8-xeon_FBDIMM_109_grd_Celsius (http://www.britta-d.de/images/primegrid/pg_v8-xeon_FBDIMM_109_grd_Celsius.jpg)
I have now reduced ncpus to 4 to get the temps under 100 °C.
<ncpus>4</ncpus>

_heinz
Title: Re: optimized sources
Post by: _heinz on 31 Jul 2013, 05:36:17 pm
31st of July
Time to show a picture of V8-Xeon_triple_GTX_Titan (http://www.britta-d.de/images/titan/triple_GTX_Titan.JPG)
Today the machine is distrrtgen_tophosts_number_2 (http://www.britta-d.de/images/seti/distrrtgen_tophosts_number_2.jpg)
Summer is here, with temps up to 38 °C last weekend and again next weekend....
The downclocking feature of the Titan works perfectly to hold the temps below 85 °C.
I measured 880-920 W running 3 distrrtgen and 8 primegrid PPSE LLR 6.15 tasks;
with 4 primegrid tasks it shows 860-880 W,
and when idle it shows 200 W.
This evening the room temp is 25.8 °C and the case sensor shows 51 °C between the FB-DIMMs.
FBDIMM2 temp = 82 °C
~~~~~~~~~~~~~~~~~~~~~~~~
Sensor Properties
Sensor Type   Dual ADT7490  (SMBus 2Ch, 2Eh)
GPU Sensor Type   Diode  (NV-Diode)
Motherboard Name   Intel D5400XS

Temperatures
CPU1   61 °C  (142 °F)
CPU2   59 °C  (138 °F)
CPU 1 / Core 1   55 °C  (131 °F)
CPU 1 / Core 2   44 °C  (111 °F)
CPU 1 / Core 3   53 °C  (127 °F)
CPU 1 / Core 4   54 °C  (129 °F)
CPU 2 / Core 1   46 °C  (115 °F)
CPU 2 / Core 2   41 °C  (106 °F)
CPU 2 / Core 3   47 °C  (117 °F)
CPU 2 / Core 4   47 °C  (117 °F)
DIMM   78 °C  (172 °F)
GPU1: GPU Diode   87 °C  (189 °F)
GPU2: GPU Diode   87 °C  (189 °F)
GPU3: GPU Diode   83 °C  (181 °F)
Temperature 1   55 °C  (131 °F)
Temperature 2   50 °C  (122 °F)
Temperature 3   53 °C  (127 °F)
FB-DIMM1   80 °C  (176 °F)
FB-DIMM2   82 °C  (180 °F)
FB-DIMM3   73 °C  (163 °F)
FB-DIMM4   70 °C  (158 °F)
ST31000340NS   36 °C  (97 °F)

Cooling Fans
CPU1   604 RPM
CPU2   583 RPM
North Bridge   4078 RPM
South Bridge   4580 RPM
Aux   682 RPM
GPU1   4295 RPM  (85%)
GPU2   4347 RPM  (85%)
GPU3   4163 RPM  (85%)

Voltage Values
CPU1 Core   1.102 V
CPU2 Core   1.102 V
+1.5 V   1.536 V
+3.3 V   3.352 V
+5 V   5.063 V
+12 V   12.125 V
FSB VTT   1.211 V
North Bridge Core   1.250 V
DIMM   1.797 V
GPU1: GPU Core   1.087 V
GPU2: GPU Core   1.062 V
GPU3: GPU Core   1.137 V

Power Values
GPU1: GPU TDP%   85.27 %
GPU2: GPU TDP%   79.48 %
GPU3: GPU TDP%   98.60 %
~~~~~~~~~~~~~~~~~~~~
In summary, we can say the machine is stable: zero errors on any distrrtgen WU since March.

_heinz

Title: Re: optimized sources
Post by: _heinz on 10 Aug 2013, 02:13:55 pm
10th August 2013
Although I had 14 days of vacation at the end of last month, V8-Xeon is today distrrtgen_tophosts_number_1_again (http://www.britta-d.de/images/seti/distrrtgen_tophosts_number_1_again.jpg)
I modified the triple set of Titans (http://www.britta-d.de/images/titan/modified_triple_titan.jpg) by closing the air slot (red metal) on top of the cards to keep the hot air out of the case.
Additionally, I placed a 12 cm fan from a defective PSU on the right side in front of the cards.
The 9.5 cm fan above the cards cools the FB-DIMMs.
modify: pictures
original_triple_gtx_titan (http://www.britta-d.de/images/titan/original_triple_gtx_titan.jpg) (airslots open)
modified_triple_gtx_titan (http://www.britta-d.de/images/titan/modified_triple_gtx_titan.jpg) (airslots closed)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
More than 230 000 clicks ... , a big thank you to all readers.  ;D

_heinz
Title: Re: optimized sources
Post by: _heinz on 15 Aug 2013, 05:46:40 pm
The case is out of the room and hangs in the window.
It is cooler now, and with my latest modifications I get better temps.
Tonight:
air temp 20.8 °C
humidity 50%
~~~~~~~~~~~~~~~~~~~~

Field   Value
Sensor Properties
Sensor Type   Dual ADT7490  (SMBus 2Ch, 2Eh)
GPU Sensor Type   Diode  (NV-Diode)
Motherboard Name   Intel D5400XS

Temperatures
CPU1   42 °C  (108 °F)
CPU2   50 °C  (122 °F)
CPU 1 / Core 1   43 °C  (109 °F)
CPU 1 / Core 2   27 °C  (81 °F)
CPU 1 / Core 3   39 °C  (102 °F)
CPU 1 / Core 4   38 °C  (100 °F)
CPU 2 / Core 1   39 °C  (102 °F)
CPU 2 / Core 2   36 °C  (97 °F)
CPU 2 / Core 3   38 °C  (100 °F)
CPU 2 / Core 4   38 °C  (100 °F)
DIMM   70 °C  (158 °F)
GPU1: GPU Diode   85 °C  (185 °F)
GPU2: GPU Diode   85 °C  (185 °F)
GPU3: GPU Diode   68 °C  (154 °F)
Temperature 1   45 °C  (113 °F)
Temperature 2   41 °C  (106 °F)
Temperature 3   42 °C  (108 °F)
FB-DIMM1   61 °C  (142 °F)
FB-DIMM2   64 °C  (147 °F)
FB-DIMM3   58 °C  (136 °F)
FB-DIMM4   56 °C  (133 °F)
ST31000340NS   32 °C  (90 °F)

Cooling Fans
CPU1   584 RPM
CPU2   576 RPM
North Bridge   2273 RPM
South Bridge   4466 RPM
Aux   437 RPM
GPU1   4327 RPM  (85%)
GPU2   4364 RPM  (85%)
GPU3   4175 RPM  (85%)

Voltage Values
CPU1 Core   1.102 V
CPU2 Core   1.113 V
+1.5 V   1.536 V
+3.3 V   3.352 V
+5 V   5.063 V
+12 V   12.125 V
FSB VTT   1.211 V
North Bridge Core   1.250 V
DIMM   1.823 V
GPU1: GPU Core   1.125 V
GPU2: GPU Core   1.125 V
GPU3: GPU Core   1.137 V

Power Values
GPU1: GPU TDP%   90.68 %
GPU2: GPU TDP%   91.27 %
GPU3: GPU TDP%   96.68 %
~~~~~~~~~~~~~~~~~~~~~~~
_heinz
Title: Re: optimized sources
Post by: _heinz on 29 Aug 2013, 02:16:41 pm
Finding the optimal overclocking point is not easy... so I got many inconclusives and errors, and my RAC dropped...
So far, my shortest successful runtime is GTX_Titan_distrrtgen_401_sec_successful (http://www.britta-d.de/images/titan/GTX_Titan_distrrtgen_401_sec_successful.jpg)
modify:
my host is 31943
distrrtgen_paket_26876843 (http://www.britta-d.de/images/titan/distrrtgen_paket_26876843.jpg)
GTX_Titan_distrrtgen_366_sec_successful (http://www.britta-d.de/images/titan/GTX_Titan_distrrtgen_366_sec_successful.jpg)

_heinz

Title: Re: optimized sources
Post by: _heinz on 03 Sep 2013, 04:12:49 am
3rd September,
The R3600 ATOM with ION chipset reached ION_2Mio_distrrtgen (http://www.britta-d.de/images/seti/ION_2Mio_distrrtgen.jpg) today
Acer Aspire V3-771G i7-3630QM 2.4GHz i7_gt650m_34Mio_distrrtgen (http://www.britta-d.de/images/i7/i7_gt650m_34Mio_distrrtgen.jpg)
By the way:
Cecile Tseng's water-cooled hostid=70990 (http://boinc.freerainbowtables.com/distrrtgen/show_host_detail.php?hostid=70990) (i7-2600K, 2 Titans with modified BIOS settings) runs ~398 sec per distrrtgen WU
Until now I have been able to hold the number 1 spot in the distrrtgen/top_hosts (http://boinc.freerainbowtables.com/distrrtgen/top_hosts.php) list
But you know things can easily change due to thunderstorms, heat waves, power outages, internet outages, driver resets, hardware issues, vacation and so on....

_heinz


 
Title: Re: optimized sources
Post by: corsair on 04 Sep 2013, 07:25:36 pm
Is there any optimized SETI app for Intel GPUs?

e.g. the Intel HD 4000 embedded in an i5
Title: Re: optimized sources
Post by: MarkJ on 05 Sep 2013, 07:02:07 am
Quote from: corsair on 04 Sep 2013, 07:25:36 pm
Is there any optimized SETI app for Intel GPUs?

e.g. the Intel HD 4000 embedded in an i5

Raistmer had an Astropulse one that was being beta tested but it had some "precision" issue and was withdrawn. Search the message thread over at Seti in the Number Crunching forum. The last post was on the 12th of June.

Einstein have an app for the HD2500 and HD4000
Title: Re: optimized sources
Post by: _heinz on 15 Sep 2013, 04:54:43 am
Crunching at full load turns a laptop into a toaster
room temp 22 °C
i7_3630QM_under_full_load (http://www.britta-d.de/images/i7/i7_3630QM_under_full_load.jpg)  :o

_heinz
Title: Re: optimized sources
Post by: _heinz on 17 Sep 2013, 10:28:41 am
v8-xeon has now been in first place on distrrtgen for more than 3 weeks. Since yesterday distrrtgen has had no work, so RAC is dropping now.
The big race is over now...
I now run seti v7, 3 WUs at once per graphics adapter on v8-Xeon.
mbcuda.cfg
pfblockspersm = 8
pfperiodsperlaunch = 200

If you want to take a look at v8-xeon, see hostid=6944847 (http://setiathome.berkeley.edu/results.php?hostid=6944847)
Titan, 3 at once: runtime ~9 to 15 min
Laptop i7-3630QM,   GT650M see hostid=7096138 (http://setiathome.berkeley.edu/results.php?hostid=7096138)
GT650M runs two at once. Runtime ~1h 6min
seti v7 on ION see hostid=5510631 (http://setiathome.berkeley.edu/results.php?hostid=5510631)
ION runtime ~3h 5 min
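
Because the hosts run different numbers of tasks at once, the raw runtimes are not directly comparable. A rough way to compare them is to divide the runtime by the number of concurrent tasks per GPU. The sketch below does that with the runtimes from this post; the helper name and the table are made up for illustration, and the post does not say how many tasks the ION runs at once, so one task at a time is assumed.

```python
def minutes_per_task(runtime_min, tasks_at_once=1):
    """Effective GPU minutes per finished task (hypothetical helper)."""
    return runtime_min / tasks_at_once


# (runtime in minutes, concurrent tasks per GPU), from the post
hosts = {
    "GTX Titan, best case": (9, 3),
    "GTX Titan, worst case": (15, 3),
    "GT650M": (66, 2),   # 1 h 6 min, two at once
    "ION": (185, 1),     # 3 h 5 min; one task at a time is an assumption
}

for name, (runtime_min, tasks_at_once) in hosts.items():
    effective = minutes_per_task(runtime_min, tasks_at_once)
    print(f"{name}: ~{effective:.0f} min per task")
```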

Installation was easy with the new installer Lunatics_Win64_v0.41_setup.exe downloadable from arkayn's site (http://www.arkayn.us/forum/index.php?action=tpmod;dl=cat3)
Thanks to all who were involved in publishing it.

_heinz
Title: Re: optimized sources
Post by: _heinz on 31 Dec 2013, 08:29:57 pm
Happy Holidays and a Happy New Year 2014  ;D

Title: Re: optimized sources
Post by: _heinz on 23 Jan 2014, 05:48:05 pm
available soon:
NVIDIA GeForce GTX TITAN Black Edition
NVIDIA GeForce GTX 790
nvidia-launch-geforce-gtx-titan-black-edition-geforce-gtx-790 (http://videocardz.com/48533/nvidia-launch-geforce-gtx-titan-black-edition-geforce-gtx-790)

_heinz
Title: Re: optimized sources
Post by: _heinz on 07 Jul 2014, 11:29:22 am
Titan-Z is available in Germany, 15th July
GeForce-GTX-TITAN-Z-Grafikkarte (http://www.alternate.de/ASUS/GeForce-GTX-TITAN-Z-Grafikkarte/html/product/1138783?)
2799 Euro for two full GK110 "Kepler" chips on one board with 12288 MB GDDR5 and 5760 streaming processors.

By the way, my server lost 2 disks after a brief voltage drop during a thunderstorm. I had a lot of trouble bringing it back onto the net. A fresh install with an old XP licence was necessary to get the last cobblestones for my goal of 2 billion (2 Mrd) (http://www.britta-d.de\images\seti\2mrd.jpg). I reached it last Sunday, 6th July. My time crunching in France is over now. I will shut down v8-Xeon for good in the next few days and go back to Berlin. I will give a short review of V8-Xeon and its crunching history in one of the coming months.

_heinz