Author Topic: Difference?  (Read 27303 times)

Offline sunu

  • Alpha Tester
  • Knight who says 'Ni!'
  • ***
  • Posts: 771
Difference?
« on: 09 Jun 2011, 04:21:45 pm »
Let's say we have some Credit and Runtime data from a few tasks and want to calculate Credit/sec. I see three possibilities:

1) sum(Credit) / sum(Runtime)
2) avg(Credit) / avg(Runtime)
3) avg(Credit / Runtime)

In the 3rd we calculate Credit/sec for each task and then we take the average of those.

Methods 1 and 2 give me the same result, but 3 does not. What is the difference?

Offline Miep

  • Global Moderator
  • Knight who says 'Ni!'
  • *****
  • Posts: 964
Re: Difference?
« Reply #1 on: 09 Jun 2011, 04:34:39 pm »
You mean, apart from the random number generator called 'Credit new'?

I'll look into the maths tomorrow.

[edit] 1 and 2 are identical because the average is the sum divided by the number of elements. As the number of elements is the same for both, it cancels out.
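
Written out, the cancellation can be sketched as follows. The symbols x_i (credit), y_i (runtime) and n (number of tasks) are labels chosen for this illustration only:

```latex
\[
\frac{\operatorname{avg}(x)}{\operatorname{avg}(y)}
  = \frac{\frac{1}{n}\sum_{i=1}^{n} x_i}{\frac{1}{n}\sum_{i=1}^{n} y_i}
  = \frac{\sum_{i=1}^{n} x_i}{\sum_{i=1}^{n} y_i}
\]
```

The factor 1/n appears in both numerator and denominator, so methods 1 and 2 agree whenever both lists have the same length.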

I need pen and paper for 3.
« Last Edit: 09 Jun 2011, 04:49:09 pm by Miep »
The road to hell is paved with good intentions

Offline sunu

  • Alpha Tester
  • Knight who says 'Ni!'
  • ***
  • Posts: 771
Re: Difference?
« Reply #2 on: 09 Jun 2011, 04:45:32 pm »
Yes, don't think about that stuff. Let's say we have x and y. Why is sum(x) / sum(y) or avg(x) / avg(y) different from avg(x / y)?

Offline Miep

  • Global Moderator
  • Knight who says 'Ni!'
  • *****
  • Posts: 964
Re: Difference?
« Reply #3 on: 09 Jun 2011, 04:54:06 pm »
Yes, sorry, I did understand it as a purely mathematical question about the formulas and why they produce different results.
It probably has to do with the way the sums are done and the order in which the, errr, operations are performed. But I'll have to write it out on paper and have a close look.

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: Difference?
« Reply #4 on: 09 Jun 2011, 04:55:54 pm »
Quote from: sunu on 09 Jun 2011, 04:45:32 pm
> Yes, don't think about that stuff. Let's say we have x and y. Why is sum(x) / sum(y) or avg(x) / avg(y) different from avg(x / y)?

How much different? With just a few numbers, is it closer than with a lot? If it's a fractional difference, there are several opportunities for accumulated summation and rounding errors, which can look like random results, and changing the order of computation like that can make a big difference.

Keeping the division to a single operation at the last step will be far more accurate if you have many results, and there are ways to further improve the accuracy by not summing long strings of numbers in one pass. Summing in blocks of SQRT(N), then summing those block results, is one way to reduce accumulated roundoff error in the sums.
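
A minimal sketch of the blocked summation idea, assuming plain Python floats. The function names `naive_sum` and `blocked_sum` and the test data are made up for illustration; `math.fsum` is used only as an accurate reference to measure the error against:

```python
import math

def naive_sum(values):
    """Add the values one after another in a single running total."""
    total = 0.0
    for v in values:
        total += v
    return total

def blocked_sum(values):
    """Sum in blocks of about sqrt(N) elements, then sum the block totals."""
    n = len(values)
    if n == 0:
        return 0.0
    block = max(1, math.isqrt(n))
    partials = [naive_sum(values[i:i + block]) for i in range(0, n, block)]
    return naive_sum(partials)

# Illustrative data only: one million copies of 0.1.
values = [0.1] * 1_000_000
exact = math.fsum(values)  # accurately rounded reference sum

print("naive error  :", abs(naive_sum(values) - exact))
print("blocked error:", abs(blocked_sum(values) - exact))
```

With this input the blocked version should land much closer to the reference than the single running total, because each partial sum stays small relative to its addends.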

[Edit:] Looking at the third equation with that in mind, it would basically maximise the accumulated summing error by adding smaller values, so the error has more effect on the average, and it also applies truncation to every element during the divisions ... So yeah, yuck

If you have trouble sleeping sometime you can read this:

"What Every Computer Scientist Should Know About Floating-Point Arithmetic", by David Goldberg
« Last Edit: 09 Jun 2011, 05:21:43 pm by Jason G »

Offline sunu

  • Alpha Tester
  • Knight who says 'Ni!'
  • ***
  • Posts: 771
Re: Difference?
« Reply #5 on: 09 Jun 2011, 05:23:03 pm »
I first found out about it looking at thousands of results and thought about rounding errors. But then I took data from ten tasks to look closely.

With 10 decimal places of accuracy for the separate credit / runtime operations, the difference between the two methods is already 0.0014 for only 10 tasks. I don't think it could be a rounding error.

Edit: Thanks for the link!
« Last Edit: 09 Jun 2011, 05:30:06 pm by sunu »

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: Difference?
« Reply #6 on: 09 Jun 2011, 05:38:09 pm »
Have a look at the section on summing error and see which answer you get if you use your first equation with Kahan summation or similar, keeping the division to a single final step. That would be the 'most right' answer, though there isn't any 'right' answer in floating point... They're all wrong!  :o  :D
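
For reference, a small sketch of Kahan (compensated) summation applied to the first equation. The names `kahan_sum` and `credit_per_sec` are hypothetical helpers written for this illustration, not part of any BOINC tool:

```python
import math

def kahan_sum(values):
    """Compensated summation: carry the rounding error of each addition forward."""
    total = 0.0
    compensation = 0.0  # running estimate of the low-order bits lost so far
    for v in values:
        y = v - compensation
        t = total + y
        compensation = (t - total) - y  # what was lost when y was added to total
        total = t
    return total

def credit_per_sec(credits, runtimes):
    """Method 1: sum(Credit) / sum(Runtime), with a single division at the end."""
    return kahan_sum(credits) / kahan_sum(runtimes)

# Illustrative check against an accurately rounded reference sum.
values = [0.1] * 1_000_000
print(abs(kahan_sum(values) - math.fsum(values)))
```

In Python specifically, `math.fsum` already gives an accurately rounded sum, so the hand-written loop is mainly useful for seeing how the compensation works.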

Offline sunu

  • Alpha Tester
  • Knight who says 'Ni!'
  • ***
  • Posts: 771
Re: Difference?
« Reply #7 on: 09 Jun 2011, 05:49:35 pm »
Ok, I took 3400 tasks. The difference is 0.0017, almost equal to the 0.0014 from 10 tasks. This can't be a rounding error.

I'll look at Kahan Summation.

Offline sunu

  • Alpha Tester
  • Knight who says 'Ni!'
  • ***
  • Posts: 771
Re: Difference?
« Reply #8 on: 09 Jun 2011, 07:17:27 pm »
I think an equivalent everyday example would be:

You drive from A to B and you want to know your average km/h. This is elementary school stuff: distance / time

The next time you drive from A to B you make 4-5 stops in between for coffee. How do you calculate your average speed now? Do you add up the distances and the times and divide them (sum(x) / sum(y)), or do you calculate your average speed for each segment and then take the average of those (avg(x / y))?

The last method now seems goofy, but why is it right or wrong? And is the difference just a rounding error, or does avg(x / y) calculate something different?
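
To put numbers on the question, here is a small sketch with a made-up trip. The segment distances and times are invented for illustration; stops are ignored and only driving time is counted:

```python
from statistics import mean

# Hypothetical trip: (distance in km, driving time in hours) for each segment.
segments = [(120.0, 1.0), (20.0, 0.5), (60.0, 1.0)]

total_distance = sum(d for d, _ in segments)
total_time = sum(t for _, t in segments)

# sum(x) / sum(y): total distance over total time.
overall_speed = total_distance / total_time

# avg(x / y): plain average of the per-segment speeds.
segment_speeds = [d / t for d, t in segments]
mean_of_speeds = mean(segment_speeds)

print(f"total distance / total time: {overall_speed:.2f} km/h")   # 80.00 km/h
print(f"average of segment speeds  : {mean_of_speeds:.2f} km/h")  # 73.33 km/h
```

The gap between the two figures is far too large to be rounding: the slow half-hour segment counts as one third of the plain average even though it takes up only one fifth of the driving time.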
« Last Edit: 09 Jun 2011, 07:21:47 pm by sunu »

Offline perryjay

  • Knight Templar
  • ****
  • Posts: 427
Re: Difference?
« Reply #9 on: 09 Jun 2011, 07:59:37 pm »
On that second drive do you also have to figure in the restroom stops?   ::)

Offline Josef W. Segur

  • Janitor o' the Board
  • Knight who says 'Ni!'
  • *****
  • Posts: 3112
Re: Difference?
« Reply #10 on: 09 Jun 2011, 09:04:51 pm »
Quote from: sunu on 09 Jun 2011, 04:45:32 pm
> Yes, don't think about that stuff. Let's say we have x and y. Why is sum(x) / sum(y) or avg(x) / avg(y) different from avg(x / y)?

Methods 1 and 2 give more weight to long-running tasks. Take two tasks, one which runs in 6 hours and gives 100 credits, another which runs in 2 hours and gives 40 credits. The 6 hours of the first task make the 2 hours of the second task only 1/4 of the total time. So you get 140 credits / 8 hours = 17.5 credits/hour, which is closer to the 16.7 c/h of the first task than the 20 c/h of the second.

But method 3 gives equal weight to the tasks no matter how quickly or slowly they run. So you get (16.667 + 20) / 2 = 18.333 c/h.

BOINC uses method 3 for its server-side averages, so a 100 hour task is weighted the same as a 1 minute task...
                                                                       Joe
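
The two-task example above can be checked with a few lines of Python. This is only a sketch of the arithmetic in the post; the variable names are chosen for illustration:

```python
import math
from statistics import mean

# The two tasks from the example: credits and runtimes in hours.
credits = [100.0, 40.0]
runtimes_h = [6.0, 2.0]

method1 = sum(credits) / sum(runtimes_h)     # sum(Credit) / sum(Runtime)
method2 = mean(credits) / mean(runtimes_h)   # avg(Credit) / avg(Runtime)
rates = [c / r for c, r in zip(credits, runtimes_h)]
method3 = mean(rates)                        # avg(Credit / Runtime)

# Weighting each task's rate by its runtime reproduces method 1.
weighted = sum(r * rate for r, rate in zip(runtimes_h, rates)) / sum(runtimes_h)

print(f"method 1: {method1:.3f} c/h")  # 17.500 c/h
print(f"method 2: {method2:.3f} c/h")  # 17.500 c/h
print(f"method 3: {method3:.3f} c/h")  # 18.333 c/h
print(math.isclose(weighted, method1)) # True
```

Methods 1 and 2 answer "how much credit per hour of machine time", while method 3 answers "what rate does a typical task achieve".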

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: Difference?
« Reply #11 on: 10 Jun 2011, 01:54:25 am »
Quote from: sunu on 09 Jun 2011, 07:17:27 pm
> The last method now seems goofy, but why is it right or wrong? And is the difference just a rounding error, or does avg(x / y) calculate something different?
Quote from: sunu on 09 Jun 2011, 04:45:32 pm
> Yes, don't think about that stuff. Let's say we have x and y. Why is sum(x) / sum(y) or avg(x) / avg(y) different from avg(x / y)?
Quote from: Josef W. Segur on 09 Jun 2011, 09:04:51 pm
> But method 3 gives equal weight to the tasks no matter how quickly or slowly they run. So you get 18.333 c/h.

That's right, they are different. Nothing is goofy (except maybe me), because the order is important, so it's a different calculation with or without precision issues.

#1: sum(x) / sum(y) simplifies to the same as #2, since the n/n cancels,
#2: avg(x) / avg(y) is the ratio of two averages, which weights each task's x/y by its y (so long-running tasks count more),
#3: avg(x / y) is the arithmetic mean of x/y, so likely the one you want,

but depending on what you want to achieve, if you want a more robust statistic you could possibly use the medians instead, or even truncated means to chuck out outliers.
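
As a rough sketch of those robust alternatives: the per-task rates below are invented, with one deliberate outlier, and `trimmed_mean` is a hypothetical helper, not a standard library function:

```python
from statistics import mean, median

def trimmed_mean(values, trim_fraction=0.1):
    """Mean after dropping the lowest and highest trim_fraction of the values."""
    if not 0 <= trim_fraction < 0.5:
        raise ValueError("trim_fraction must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction)  # number of values cut from each end
    return mean(ordered[k:len(ordered) - k])

# Made-up credit/sec values for ten tasks; the last one is an outlier.
rates = [0.18, 0.19, 0.17, 0.20, 0.18, 0.19, 0.18, 0.17, 0.19, 2.50]

print(f"mean        : {mean(rates):.3f}")          # 0.415
print(f"median      : {median(rates):.3f}")        # 0.185
print(f"trimmed mean: {trimmed_mean(rates):.3f}")  # 0.185
```

A single odd task drags the plain mean far away from the bulk of the data, while the median and the trimmed mean barely move.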

« Last Edit: 10 Jun 2011, 02:28:27 am by Jason G »

Offline sunu

  • Alpha Tester
  • Knight who says 'Ni!'
  • ***
  • Posts: 771
Re: Difference?
« Reply #12 on: 10 Jun 2011, 06:50:30 am »
Yes, "weight" seems the magic word here. After Josef's post I looked at various weighted means but still avg(x / y) doesn't look anything like them.
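
One way to see the connection is to write both methods as weighted means of the per-task ratios x_i / y_i. The weights w_i below are notation introduced for this sketch only:

```latex
\[
\frac{\sum_{i} x_i}{\sum_{i} y_i}
  = \frac{\sum_{i} y_i \,\frac{x_i}{y_i}}{\sum_{i} y_i}
  = \sum_{i} w_i \,\frac{x_i}{y_i},
  \qquad w_i = \frac{y_i}{\sum_{j} y_j}
\]
\[
\operatorname{avg}\!\left(\frac{x}{y}\right)
  = \sum_{i} \frac{1}{n} \,\frac{x_i}{y_i},
  \qquad w_i = \frac{1}{n}
\]
```

So sum(x) / sum(y) is the mean of x/y weighted by runtime y, and avg(x / y) is the same mean with every task given equal weight. The two agree only when all runtimes are equal or all ratios are equal.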

Quote from: Jason G on 10 Jun 2011, 01:54:25 am
> but depending on what you want to achieve, if you want a more robust statistic you could possibly use the medians instead, or even truncated means to chuck out outliers.

I just wanted to calculate the credit / sec output of my machine broken down to CPU, GPU, AP, MB etc. :)

As for the problem with the car above, the answer isn't as simple as I thought http://en.wikipedia.org/wiki/Harmonic_mean#In_physics
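
A short sketch of what that article describes, with made-up speeds: over segments of equal distance the overall speed is the harmonic mean of the segment speeds, and over segments of equal time it is the arithmetic mean.

```python
from statistics import harmonic_mean, mean

# Hypothetical speeds in km/h for two legs of a trip.
speeds = [60.0, 40.0]

# Case 1: both legs cover the same distance (120 km each, chosen arbitrarily).
leg_km = 120.0
time_h = sum(leg_km / s for s in speeds)           # 2 h + 3 h
equal_distance_speed = (leg_km * len(speeds)) / time_h

# Case 2: both legs last the same time (1 hour each, chosen arbitrarily).
leg_h = 1.0
distance_km = sum(s * leg_h for s in speeds)       # 60 km + 40 km
equal_time_speed = distance_km / (leg_h * len(speeds))

print(f"{equal_distance_speed:.1f} {harmonic_mean(speeds):.1f}")  # 48.0 48.0
print(f"{equal_time_speed:.1f} {mean(speeds):.1f}")               # 50.0 50.0
```

In both cases the answer is still total distance divided by total time; which named mean matches it depends on what the segments have in common.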

Well, I guess we need a professional statistician  :D

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: Difference?
« Reply #13 on: 10 Jun 2011, 06:57:01 am »
Quote from: sunu on 10 Jun 2011, 06:50:30 am
> As for the problem with the car above, the answer isn't as simple as I thought http://en.wikipedia.org/wiki/Harmonic_mean#In_physics
>
> Well, I guess we need a professional statistician  :D

Hahaha, yep. I don't know about Joe, but my statistics is certainly rusty. If you intend to process a lot of results, keep the golden rules of floating point in mind as well, since anything that compounds tiny errors in unexpected ways will also change the result.

Jason

Offline Miep

  • Global Moderator
  • Knight who says 'Ni!'
  • *****
  • Posts: 964
Re: Difference?
« Reply #14 on: 10 Jun 2011, 09:54:01 am »
I do plain linear regression, mainly to prove that 'Credit new' is not linear ;D
0.188 credit/second on beta with some flavour of x37.


 
