Author Topic: Bug report science function (Read 12308 times)

nanobyte · « **on:** 10 Jul 2008, 06:49:25 pm »

I am under the impression that the compiler has generated code that effectively ignores results in the chirp function.

Version:
Windows optimized S@H Enhanced application by Alex Kan
Version info: SSSE3x (Intel, Core 2-optimized v8-nographics) V5.13 by Alex Kan
SSSE3x Win32 Build 41 , Ported by : Jason G, Raistmer, JDWhale

I use the 'pe explorer/disassembler' and Intel VTune to analyze the assembly source. VTune names this routine 'v_vChirpData'.

When disassembled, the function at 401b0c has a section where the following instruction sequence is used.

At address: 401c77
movaps xmm0,xmm4
addpd xmm0,xmm3
subpd xmm0,xmm4
subpd xmm3,xmm0

In this sequence, xmm3 is always zero.

Based on a similar sequence, several other registers are eventually zeroed as well. This undermines the logic in the entire routine.

Could you please verify this against the source code?

nanobyte.

Jason G · « **Reply #1 on:** 11 Jul 2008, 12:50:10 am »

Hi there,
That stretch of code is in the intrinsics portion of the function, and is part of the code that just grabs the 'sign bit', so it should be +/- zero. I believe this is a necessary and intentional part of the rounding needed for the chirp angle reduction to within the range -0.5 to 0.5.

This is one of the few places in the code where extreme precision is required (so double precision is used), and a 'bug' as such would probably manifest with horrible consequences. If I am correct, then what the compiler has done there is replicate the source faithfully (from the intrinsics), and sequences of adds and subs are generally faster to use within critical loops than other possible methods. I hope that helps.

Thanks, Jason

[See Alex's much better answer]

Alex Kan · « **Reply #2 on:** 11 Jul 2008, 01:27:39 am »

Quote from: nanobyte on 10 Jul 2008, 06:49:25 pm

When disassembled, the function at 401b0c has a section where the following instruction sequence is used.

At address: 401c77
movaps xmm0,xmm4
addpd xmm0,xmm3
subpd xmm0,xmm4
subpd xmm3,xmm0

In this sequence, xmm3 is always zero.

Based on a similar sequence, several other registers are eventually zeroed as well. This undermines the logic in the entire routine.

Could you please verify this against the source code?

This is the intended behavior.

Your observation would be true if all floating-point operations were carried out with infinite precision. However, since each operation rounds off to a fixed precision, addition and subtraction are not actually associative. Despite its outward appearance, that instruction sequence does not set xmm3 to zero; it actually generates the fractional part of the value originally contained in xmm3. Specifically, the first three instructions round xmm3 to the nearest integer value using a magic number held in xmm4 (chosen to push all the fractional bits off the end of the mantissa), leaving the rounded value in xmm0. Subtracting xmm0 from xmm3 then yields the fractional part.
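To see the effect outside the disassembler, here is a minimal sketch of the same add/subtract pattern in Python, whose floats are IEEE 754 doubles. The constant `MAGIC` (1.5 * 2**52) and the name `split_round` are assumptions made for illustration; the actual build may use a different magic value, and this is not the application's source.

```python
# Assumed magic constant: for |x| < 2**51, MAGIC + x lands where the spacing
# between adjacent doubles is exactly 1.0, so the fraction bits of x are
# rounded away by the addition itself.
MAGIC = 1.5 * 2.0**52


def split_round(x):
    """Return (nearest integer, signed fraction) using only add and subtract.

    Hypothetical helper mirroring the four-instruction sequence.
    """
    t = x + MAGIC   # like addpd xmm0,xmm3: the sum is rounded to an integer
    r = t - MAGIC   # like subpd xmm0,xmm4: recovers round-to-nearest(x)
    f = x - r       # like subpd xmm3,xmm0: remainder, between -0.5 and 0.5
    return r, f


r, f = split_round(3.75)
print(r, f)  # 4.0 -0.25
```

With exact real arithmetic `r` would equal `x` and `f` would always be zero, which is the reading that makes the sequence look like a bug. The rounding in the first addition is what makes `f` the fractional part instead.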

Overzealous compiler optimization based on arithmetic associativity breaks techniques relying on floating-point rounding behavior, like Kahan summation and the code above. Fortunately, the Intel compiler has not done anything of the sort here.
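For comparison, a small sketch of Kahan summation shows the same dependence on rounding. The function names and the test data are made up for illustration. The correction term `c` is algebraically zero, so a compiler that simplified it away on the basis of associativity would silently turn this back into naive summation.

```python
def naive_sum(values):
    """Plain left-to-right summation, written out to avoid any built-in compensation."""
    total = 0.0
    for v in values:
        total += v
    return total


def kahan_sum(values):
    """Compensated summation: c carries the low-order bits lost in each add."""
    total = 0.0
    c = 0.0
    for v in values:
        y = v - c            # apply the correction from the previous step
        t = total + y        # low-order bits of y may be rounded off here
        c = (t - total) - y  # algebraically zero; actually the rounding error
        total = t
    return total


# Illustrative data: each 1e-16 is too small to change 1.0 on its own.
values = [1.0] + [1e-16] * 100_000
print(naive_sum(values))  # the small terms are lost one by one
print(kahan_sum(values))  # close to 1.0 + 1e-11
```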

nanobyte · « **Reply #3 on:** 11 Jul 2008, 02:15:34 am »

Clear explanation, thank you very much. I am relieved that this behaviour was given thought.
You did an excellent job on the code.

best regards,
nanobyte
