R-2.2 Linux apps?
Crunch3r:
--- Quote from: Simon on 04 Mar 2007, 10:26:24 pm ---Are ya :)
Well, he did send the code to me, and I did get something compiled. It doesn't fully work, yet, but we're finally seeing progress.
--- End quote ---
As of now I think I'm almost done porting the code to Linux... The whole "benchmarks" part now works 100%.
Here's the output of a "-bench" run:
--- Code: ---Can't set up shared mem: -1
------- [ benchmark ] --------
PowerSpectrum--: 11962970 x1.00 0 -- [ avg magnitude = 1.8313 (50)]
sse_GetTPS: 11868075 x1.01 0
PowerSpectrum--[sse_GetTPS]: 11868075 (chosen)
------------------------------
PwrSpectOnly--: 1788970 x1.00 0 -- [ avg magnitude = 1.8313 (50)]
sse_GetPSO_npr: 1702755 x1.05 0
sse_GetPSO_p32: 1232205 x1.45 0
sse_GetPSO_p64: 1181956 x1.51 0
sse_GetPSO_p128: 1634496 x1.09 0
PwrSpectOnly--[sse_GetPSO_p64]: 1181956 (chosen)
------------------------------
Transpose--: 10711025 x1.00 0 -- [ avg magnitude = 0.9996 (50)]
Transpose2: 5965319 x1.80 0
Transpose4: 4939761 x2.17 0
sse_Trans4ntw: 1841899 x5.82 0
sse_pfTrans8x4ntw: 1841200 x5.82 0
Transpose--[sse_pfTrans8x4ntw]: 1841200 (chosen)
------------------------------
ChirpData--: 185765588 x1.00 0 -- [ avg magnitude = 0.9735 (12)]
TrigArray: 91202502 x2.04 1.6e-09
sse1_akChirp: 27424411 x6.77 1.9e-07
ChirpData--[sse1_akChirp]: 27424411 (chosen)
------------------------------
GetPeak--: 665042 x1.00 0 -- [ avg magnitude = 0.9735 (50)]
hand_opt: 166825 x3.99 7.1e-07 t=-16605.6680 o=-16605.6562
sse_vector: 76711 x8.67 4.7e-07 t=-16605.6641 o=-16605.6562
GetPeak--[sse_vector]: 76711 (chosen)
------------------------------
f_sum--: 319228 x1.00 0 -- [ avg magnitude = 0.9735 (50)]
unroll4: 298885 x1.07 1.9e-08 t=-12129.2469 o=-12129.2467
hand_sse: 484944 x0.66 1e-08 t=-12129.2466 o=-12129.2467
sse_vector: 322472 x0.99 0 t=-12129.2467 o=-12129.2467
f_sum--[unroll4]: 298885 (chosen)
------------------------------
GetChiSq--: 49060 x1.00 0 -- [ avg magnitude = 0.9735 (50)]
hoisted+abs(: 40314 x1.22 2.5e-07 t=123.6700 o=123.6699
GetChiSq--[hoisted+abs(]: 40314 (chosen)
------------------------------
IPP FFT SSE1(64K): 13528329 x1.00 0 -- [ avg magnitude = 30.3482 (50)]
IPP FFT SSE1(64K)[original]: 13528329 (chosen)
------------------------------
Bench Time: 8.36 seconds
- [ pulse fold select ] -
Standard: 26110400 x1.00 0
FPU opt: 13988950 x1.87 4.3e-10
ben SSE: 5240973 x4.98 0
AK SSE: 6544747 x3.99 4.4e-07
BH SSE: 5523332 x4.73 0
ben SSE: 5240973 (chosen)
Test Time: 0.59 seconds
--- End code ---
malloc_a.cpp needs to be reverted to the default code that comes from Berkeley, and all calls to MEM.free, MEM.alloc etc. had to be replaced by malloc_a, free_a etc. Memory allocation now works 100% too.
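For readers following along: SSE code generally wants 16-byte-aligned buffers, which is presumably why an aligned allocator is involved here. Below is a minimal sketch of what such a pair of helpers can look like on Linux using posix_memalign. The names alloc_aligned_sketch, free_aligned_sketch and is_aligned are hypothetical and are not the malloc_a / free_a from the Berkeley source, whose signatures are not shown in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdlib.h>  // posix_memalign, free (POSIX)

// Hypothetical helper, NOT the Berkeley malloc_a.
// Returns `size` bytes aligned to `alignment`, or nullptr on failure.
// posix_memalign requires a power-of-two alignment that is a multiple
// of sizeof(void*).
void* alloc_aligned_sketch(std::size_t size, std::size_t alignment) {
    assert(alignment >= sizeof(void*));
    assert((alignment & (alignment - 1)) == 0);
    void* p = nullptr;
    if (posix_memalign(&p, alignment, size) != 0) return nullptr;
    return p;
}

// Memory from posix_memalign is released with plain free().
void free_aligned_sketch(void* p) { free(p); }

// True if p sits on an `alignment`-byte boundary.
bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

A buffer obtained with alloc_aligned_sketch(n, 16) is safe for the aligned SSE load/store opcodes; anything else needs the unaligned variants.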
ATM I guess I'm 60-70% done with porting the code.
However, CPU feature detection does not work correctly in benchmark.cpp (the "if ( our_cpu.sse3() )" ... why does it work while running the benchmark? ... Hmmmm...), but that's not a problem: replacing the "ifs" with an "#if defined(USE_SSE)" etc. and a simple recompile of optimizer.a with the matching CPU flags will do the job. (Compiling optimizer.a takes less than 20 sec.)
However... running the app in the Intel debugger pointed to a "signal SEGV":
stopped at [float sse_sum3(float**, struct PoTPlan*):403 0x081382df]
403 s_putU( &sums[i + 0], sum1 ); s_putU( &sums[i +4], sum2 );
I think the problem is in opcodes_SSE.hpp, especially the #define s_putU( ptr, aaaa ). WHAT IS THE "PTR" for?
Ben, Joe, Alex, any idea?
Josef W. Segur:
--- Quote from: Crunch3r on 08 Mar 2007, 03:37:56 pm ---...
I think the problem is in opcodes_SSE.hpp, especially the #define s_putU( ptr, aaaa ). WHAT IS THE "PTR" for?
Ben, Joe, Alex, any idea?
--- End quote ---
It's needed by the intrinsics into which s_putU() is expanded, so they'll know where to store aaaa. That's Ben's code, but he's been busy elsewhere much of the time lately.
I'll note that Simon had somewhat similar problems with some of those macro expansions when using the non-commercial ( 9.0 ) Intel compiler, so he switched to the eval version 9.1 and they went away.
Joe
BenHer:
Crunch3r,
I wrote those SIMD macros back when I was both developing 3DNow and SSE1-3 versions of functions. I used macros instead of the built in intrinsics of the compiler, because I would rather use s_add(a, b) instead of two oddly named intrinsics for 3DNow and SSE which still weren't the same as the underlying assembler opcodes (I tend to think in assembler). ;)
It also allowed me to use virtually identical functions for both 3DNow and SSE with a few tweaks here and there.
PTR is just a macro parameter that is fed a pointer value. Depending on the compiler and how it declares the prototypes of its underlying intrinsics, it should be either a pointer to a full SIMD value or a pointer to a buffer of floats. The putU part makes it use the unaligned version of the SIMD register-to-memory store opcode.
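To make the role of the pointer parameter concrete, here is a minimal sketch of an unaligned-store macro built on the standard SSE1 intrinsic _mm_storeu_ps. The macro s_putU_sketch and the function add4_unaligned are hypothetical illustrations; the real definition in opcodes_SSE.hpp may look different.

```cpp
#include <cassert>
#include <xmmintrin.h>  // SSE1 intrinsics: __m128, _mm_loadu_ps, _mm_storeu_ps

// Hypothetical stand-in for s_putU( ptr, aaaa ):
// `ptr` says where the four floats go, `v` is the SIMD value to store.
// _mm_storeu_ps does not require `ptr` to be 16-byte aligned.
#define s_putU_sketch(ptr, v) \
    _mm_storeu_ps(reinterpret_cast<float*>(ptr), (v))

// Adds four floats from a and b and writes the result to out.
// None of the pointers needs to be aligned, but each must have
// four valid floats behind it.
void add4_unaligned(const float* a, const float* b, float* out) {
    assert(a && b && out);
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    s_putU_sketch(out, _mm_add_ps(va, vb));
}
```

Note that the store always writes 16 bytes, so a call like s_putU( &sums[i + 4], sum2 ) needs sums to have at least i + 8 valid elements. Whether that is what triggers the SEGV above is only a guess.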
Regarding the CPUID code...Hans Dorn posted a working unix version of that on one of these boards (as a standalone executable). He modified the source where appropriate to get it to compile and work. I specifically wrote the source so that the core code that actually does the identifying was shared between the standalone app and the seti code (same source files). So once Hans' patches are applied I can't see why it wouldn't work, assuming similar compilers & libraries for unix (linux, *nix).
Simon:
Folks,
I've added Crunch3r to the Linux porting team (member group here, equivalent permissions as pre-release tester/coder).
Combining our efforts, I'm sure we will get further than on our own.
Crunch3r, you should now see some extra boards and download categories.
--- Quote from: Crunch3r ---However, CPU feature detection does not work correctly in benchmark.cpp (the "if ( our_cpu.sse3() )" ... why does it work while running the benchmark? ... Hmmmm...)
--- End quote ---
It checks for the current CPU's SIMD capabilities to see what benchmark functions to test. When you look at the benchmarks you posted, they are for SSE only. The CPUID code still checks whether your CPU can do SSE/2/3. When SSE2 or 3 is supported, it will benchmark them and use the quickest! For this to work, the CPUID code has to work too, though.
The benchmarks need to know what's supported so that only those functions run. I just let the Windows SSE-optimized app run on an A64; it benched and used the SSE2 functions. The differences between the apps are really only the compile switches (-xK vs. -xW etc.).
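The "detect, bench only what's supported, keep the quickest" idea can be sketched roughly as follows. This is not the app's CPUID code or its benchmark logic: CpuFeatures, Candidate, detect_cpu and choose_fastest are hypothetical names, and __builtin_cpu_supports is a GCC/Clang builtin for x86, so its availability under ICC is an assumption that would need checking.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical feature flags; the app's our_cpu object is not reproduced here.
struct CpuFeatures { bool sse, sse2, sse3; };

// Queries the running CPU. Outside GCC/Clang on x86 everything stays false.
CpuFeatures detect_cpu() {
    CpuFeatures f = {false, false, false};
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__i386__) || defined(__x86_64__))
    __builtin_cpu_init();
    f.sse  = __builtin_cpu_supports("sse")  != 0;
    f.sse2 = __builtin_cpu_supports("sse2") != 0;
    f.sse3 = __builtin_cpu_supports("sse3") != 0;
#endif
    return f;
}

// One benchmarked variant of a function.
struct Candidate {
    const char* name;
    bool supported;  // may this variant run on the current CPU?
    long ticks;      // measured cost, lower is better
};

// Returns the index of the fastest supported candidate, or -1 if none.
int choose_fastest(const Candidate* c, std::size_t n) {
    int best = -1;
    for (std::size_t i = 0; i < n; ++i) {
        if (!c[i].supported) continue;  // never run unsupported opcodes
        if (best < 0 || c[i].ticks < c[best].ticks) best = static_cast<int>(i);
    }
    return best;
}
```

Fed with the Transpose timings posted earlier, choose_fastest picks sse_pfTrans8x4ntw when SSE is supported and falls back to Transpose2 when it is not.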
I've also dug up Hans Dorn's Linux CPUID post (contains a source archive).
He stated that he got some compiler warnings related to __fastcall, same for me. We should #ifdef those, if possible (not sure how, either define __fastcall to something compatible somewhere or just omit it for Linux compiles).
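One common way to handle this, sketched under the assumption that the calling convention only matters for the Windows builds: define __fastcall to nothing wherever the compiler does not provide it. The function add_one is a hypothetical example, not something from the CPUID source.

```cpp
#include <cassert>

// On Windows compilers __fastcall is a keyword, so leave it alone there.
// Elsewhere, make it expand to nothing so the declarations compile unchanged.
// (Names starting with a double underscore are reserved, so this is a
// pragmatic porting idiom rather than strictly conforming code.)
#if !defined(_WIN32) && !defined(__fastcall)
#define __fastcall
#endif

// Hypothetical function using the keyword the way the Windows source does.
int __fastcall add_one(int x) { return x + 1; }
```

Putting the #define into one shared header keeps the function declarations themselves free of #ifdefs.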
BTW, am I imagining this, or do benchmarks on Linux show smaller errors? The only explanation I can think of: per ICC's default settings, Linux uses 80-bit extended precision (64 significant bits) for floating-point intermediates, while Windows uses 64-bit double precision (53 significant bits), i.e. -pc80 vs. -pc64.
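A quick way to check which precision a process is actually running with is to read the precision-control field of the x87 control word. The sketch below assumes glibc on x86 (it uses the glibc header fpu_control.h) and returns -1 elsewhere; x87_precision_bits is a hypothetical helper name.

```cpp
#include <cassert>
#if defined(__GLIBC__) && (defined(__i386__) || defined(__x86_64__))
#include <fpu_control.h>  // glibc: fpu_control_t, _FPU_GETCW
#define HAVE_X87_CW 1
#endif

// Returns the x87 significand width in bits (24, 53 or 64),
// or -1 if it cannot be determined on this platform.
int x87_precision_bits() {
#ifdef HAVE_X87_CW
    fpu_control_t cw;
    _FPU_GETCW(cw);             // read the x87 control word
    switch (cw & 0x300) {       // precision-control field, bits 8-9
        case 0x000: return 24;  // single precision
        case 0x200: return 53;  // double precision (what -pc64 selects)
        case 0x300: return 64;  // extended precision (what -pc80 selects)
        default:    return -1;  // 0x100 is reserved
    }
#else
    return -1;
#endif
}
```

This setting only affects x87 instructions. SSE arithmetic always rounds to the operand's own precision, so any difference would show up only in the parts of the code that still go through the x87 FPU.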
Regards,
Simon.
Crunch3r:
--- Quote from: michael37 on 04 Mar 2007, 10:24:49 pm ---Wow: look at this thread:
http://setiathome.berkeley.edu/forum_thread.php?id=38058&nowrap=true#525923
Chicken, we're waiting for your update with Crunch3rs source code (and maybe even binaries!!!) shortly!
--- End quote ---
Hi michael37,
I'd guess that we can offer the ported app by the end of the week, maybe sooner.
Do you still have your IA64 running Linux?