Current Profile Analysis and points to optimize

BenHer:
Simon,

Could you post a full WU for testing that:

a. has a typical angle_range
b. crunches for a typical amount of time on one of your systems (i.e. the average time for that system)

As Josef suggested, doing a full run with this kind of full WU would be best.

Thanks,
=Ben

Simon:
Josef is talking about the most common angle ranges - apart from the adjusted chirp_limit parameters, they are the exact same WUs.

Change the two occurrences of <chirp_limit>Value</chirp_limit> to 20 and 50, respectively, so the passage in each WU (there is an XML-style header on each) looks like this:


--- Code: ---  <chirps>
    <chirp_parameter_t>
      <chirp_limit>20</chirp_limit>
      <fft_len_flags>262136</fft_len_flags>
    </chirp_parameter_t>
    <chirp_parameter_t>
      <chirp_limit>50</chirp_limit>
      <fft_len_flags>65528</fft_len_flags>
    </chirp_parameter_t>
  </chirps>
--- End code ---

Only change the chirp_limit values please.
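
If you'd rather not hand-edit each file, the same change could be scripted. The following is only a rough sketch: set_chirp_limits is a made-up helper name, not something from the test pack, and it assumes the header contains the tags exactly as shown above.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of the testbench): replaces the text between
// the n-th <chirp_limit>...</chirp_limit> pair with new_limits[n].
// Everything else in the WU text is left untouched.
std::string set_chirp_limits(std::string wu, const std::vector<int>& new_limits) {
    const std::string open = "<chirp_limit>";
    const std::string close = "</chirp_limit>";
    std::size_t pos = 0;
    for (int limit : new_limits) {
        std::size_t start = wu.find(open, pos);
        if (start == std::string::npos) break;  // fewer tags than values
        start += open.size();
        std::size_t end = wu.find(close, start);
        if (end == std::string::npos) break;    // malformed header, stop
        const std::string value = std::to_string(limit);
        wu.replace(start, end - start, value);
        // Continue searching after the closing tag we just passed.
        pos = start + value.size() + close.size();
    }
    return wu;
}
```

Calling set_chirp_limits(wu_text, {20, 50}) on the contents of a WU file should leave the fft_len_flags lines alone and only touch the two chirp_limit values. Reading and writing the file is left out to keep the sketch short.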

testWU-2, 3 and 5 fall in the "most common ARs" category.

I've modified these WUs and attached them to this post, since I realized you may not have a test pack.

HTH,
Simon.

[attachment deleted by admin]

BenHer:
Thanks Simon,

I had downloaded those 7 WUs and the testbench already.

What I was looking for was a WU that would take the full, normal amount of time. 2, 3 and 5 are all "shorter" WUs.

Which aspect makes them short I'm not certain. Perhaps changing those chirp_limit values as you suggested would make the run time long again (so each WU runs as it would if downloaded from SETI)?

Simon:
Exactly my point :)

Those were downloaded from S@H by one of my hosts about two months ago.

The files I attached to my previous post have the chirp_limit values already reset, so please use them. The one that would run the longest would be testWU-6 (after changing chirp_limit as pointed out previously) as it is a VLAR (very low angle range) WU.

HTH,
Simon.

BenHer:
Simon,

You mentioned you and Michael weren't able to get speed improvements out of find_pulse...

Here is a section of the Intel C++ compiler's SSE2-optimized code for the loop that "folds" two tables into a third table.
 

--- Code: --- float tmp_max = 0;
for (int i=0;i<length;i++) {
    register float tmpfloat=(ptr1[i]+ptr2[i])/2;
    sums[i]=tmpfloat;
    if (tmpfloat>tmp_max) {
        tmp_max=tmpfloat;
    }
}

... becomes...

Samples  Address   Code Bytes               Source                       Line #  CPU0  CPU1
      5  0x410629  F3 0F 10 05 48 17 50 00  movss xmm0,[00501748h]            5     5
         0x41063d  89 0C 24                 mov [esp],ecx                     6
     36  0x410640  8B 4D 14                 mov ecx,[ebp+14h]                 7    17    19
         0x41064b  8B 75 10                 mov esi,[ebp+10h]                 8
     38  0x41064e  33 FF                    xor edi,edi                       9    18    20
         0x410650  F3 0F 10 14 BE           movss xmm2,[esi+edi*4]           10
     99  0x410655  F3 0F 58 14 B9           addss xmm2,[ecx+edi*4]           11    40    59
     11  0x41065a  F3 0F 59 D0              mulss xmm2,xmm0                  12     7     4
     23  0x41067c  0F 28 05 A0 16 50 00     movaps xmm0,[005016a0h]          13    10    13
         0x410699  0F 10 14 B9              movups xmm2,[ecx+edi*4]          14
    495  0x41069d  0F 58 14 BA              addps xmm2,[edx+edi*4]           15   194   301
    270  0x4106a1  0F 59 D0                 mulps xmm2,xmm0                  16   122   148
    175  0x4106ab  0F 10 4C B9 10           movups xmm1,[ecx+edi*4+10h]      17    74   101
    359  0x4106b0  0F 58 4C BA 10           addps xmm1,[edx+edi*4+10h]       18   154   205
    199  0x4106b5  0F 59 C8                 mulps xmm1,xmm0                  19    76   123
    470  0x4106cf  0F 10 24 B9              movups xmm4,[ecx+edi*4]          20   201   269
   1284  0x4106d3  0F 10 14 BA              movups xmm2,[edx+edi*4]          21   540   744
   1204  0x4106d7  0F 58 E2                 addps xmm4,xmm2                  22   532   672
    510  0x4106da  0F 59 E0                 mulps xmm4,xmm0                  23   204   306
    535  0x4106e4  0F 10 4C B9 10           movups xmm1,[ecx+edi*4+10h]      24   201   334
   1066  0x4106e9  0F 10 5C BA 10           movups xmm3,[edx+edi*4+10h]      25   461   605
   1142  0x4106ee  0F 58 CB                 addps xmm1,xmm3                  26   478   664
    539  0x4106f1  0F 59 C8                 mulps xmm1,xmm0                  27   213   326
    119  0x410726  F3 0F 10 05 48 17 50 00  movss xmm0,[00501748h]           28    49    70
    136  0x41072e  8B 45 14                 mov eax,[ebp+14h]                29    60    76
    114  0x410731  8B 55 10                 mov edx,[ebp+10h]                30    48    66
      5  0x410734  F3 0F 10 14 BA           movss xmm2,[edx+edi*4]           31     1     4
   1751  0x410739  F3 0F 58 14 B8           addss xmm2,[eax+edi*4]           32   683  1068
    143  0x41073e  F3 0F 59 D0              mulss xmm2,xmm0                  33    44    99

--- End code ---
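
For comparison, a hand-written version of the same fold using SSE intrinsics might look roughly like the sketch below. The names fold_scalar and fold_sse are made up for illustration and are not functions from the S@H source. The sketch multiplies by 0.5 instead of dividing by 2, which is what the mulss/mulps against a loaded constant in the listing appears to do, and it uses unaligned loads throughout rather than guessing at how the buffers are aligned. It also assumes the inputs contain no NaNs, since _mm_max_ps and the scalar compare treat them differently.

```cpp
#include <cstddef>
#include <xmmintrin.h>  // SSE intrinsics (_mm_loadu_ps, _mm_add_ps, ...)

// Scalar reference: the loop from the post, returning the running maximum.
float fold_scalar(const float* ptr1, const float* ptr2, float* sums, std::size_t length) {
    float tmp_max = 0.0f;
    for (std::size_t i = 0; i < length; ++i) {
        float tmpfloat = (ptr1[i] + ptr2[i]) * 0.5f;
        sums[i] = tmpfloat;
        if (tmpfloat > tmp_max) tmp_max = tmpfloat;
    }
    return tmp_max;
}

// Illustrative SSE version: four floats per iteration, scalar tail at the end.
float fold_sse(const float* ptr1, const float* ptr2, float* sums, std::size_t length) {
    const __m128 half = _mm_set1_ps(0.5f);
    __m128 vmax = _mm_setzero_ps();  // per-lane running maximum, starts at 0
    std::size_t i = 0;
    for (; i + 4 <= length; i += 4) {
        __m128 a = _mm_loadu_ps(ptr1 + i);  // unaligned loads: no alignment assumed
        __m128 b = _mm_loadu_ps(ptr2 + i);
        __m128 t = _mm_mul_ps(_mm_add_ps(a, b), half);
        _mm_storeu_ps(sums + i, t);
        vmax = _mm_max_ps(vmax, t);
    }
    // Reduce the four lane maxima to a single value.
    float lanes[4];
    _mm_storeu_ps(lanes, vmax);
    float tmp_max = lanes[0];
    for (int k = 1; k < 4; ++k) {
        if (lanes[k] > tmp_max) tmp_max = lanes[k];
    }
    // Handle the remaining 0 to 3 elements one at a time.
    for (; i < length; ++i) {
        float tmpfloat = (ptr1[i] + ptr2[i]) * 0.5f;
        sums[i] = tmpfloat;
        if (tmpfloat > tmp_max) tmp_max = tmpfloat;
    }
    return tmp_max;
}
```

Both functions should give the same sums and the same maximum for the same inputs. Whether a hand-written loop like this would beat the compiler's output is a separate question that only a timed run can answer.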
