Author Topic: x38g reports  (Read 152151 times)

Offline perryjay

  • Knight Templar
  • ****
  • Posts: 427
Re: x38g reports
« Reply #30 on: 21 Jun 2011, 09:36:21 am »
Mike beat me to it. You have the same wingman on all three of those work units. He is running a 560TI and is apparently throwing out bad -9 results. Hope the next in line does better. You should get credit no problem on those.

Offline perryjay

  • Knight Templar
  • ****
  • Posts: 427
Re: x38g reports
« Reply #31 on: 21 Jun 2011, 12:21:20 pm »
Got one invalid result  http://setiathome.berkeley.edu/workunit.php?wuid=761506607  Not much to it, I found a pulse the other two wingmen didn't. It was when I was running the 0.38e flavor. Thought I would mention it just in case. It's the only invalid result I've got so far.

Offline Mike

  • Alpha Tester
  • Knight who says 'Ni!'
  • ***
  • Posts: 2427
Re: x38g reports
« Reply #32 on: 21 Jun 2011, 06:22:19 pm »

Just keep an eye on it perryjay.

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: x38g reports
« Reply #33 on: 21 Jun 2011, 09:53:13 pm »
Yep, as mentioned on main, it looks like the single, likely low-power pulse that you found and the others didn't is simply due to the inaccurate chirp in the old nVidia app.  So it fits the expected pattern.  In science terms yours is 'more correct' of course, and would likely have matched a CPU app wingman strongly, but being ganged up on by 2 older apps that way is going to happen during the transition period.

Jason

Offline Josef W. Segur

  • Janitor o' the Board
  • Knight who says 'Ni!'
  • *****
  • Posts: 3112
Re: x38g reports
« Reply #34 on: 22 Jun 2011, 12:26:41 am »
The one reported pulse doesn't fully explain the invalid judgement, since "weakly similar" merely needs half the signals to match. The task was VHAR, so there should have been a best_gaussian with all zero values; that's a gimme match. The reported pulse would be repeated as best_pulse, and if the difference were due to it being only a tiny bit above threshold, that best_pulse should match the others closely enough. And finally there would be a best_spike. IOW, a result with 1 dodgy pulse could easily have had 3 acceptable best_* signals to yield weakly similar. To get invalid, 3 of the 4 must not have found a match in the other results.
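
As a rough illustration of the counting argument above, here is a minimal Python sketch. It is not the project's actual validator code: the function name `classify`, the all-match rule for "strongly similar", and the reading of "half" as "at least half" are assumptions made only for the example.

```python
# Hypothetical sketch of the "half the signals must match" rule.
# NOT the real SETI@home validator; names and thresholds are assumptions.

def classify(matched_flags):
    """matched_flags maps a signal name to True if a matching signal
    was found in the other result, False otherwise."""
    total = len(matched_flags)
    matched = sum(1 for ok in matched_flags.values() if ok)
    if matched == total:
        return "strongly similar"   # assumption: every signal matched
    if matched * 2 >= total:
        return "weakly similar"     # at least half of the signals matched
    return "invalid"

# The VHAR case described above: one dodgy pulse, three best_* signals that match.
signals = {
    "best_gaussian": True,   # all-zero values, the "gimme" match
    "pulse": False,          # the extra pulse the wingmen didn't report
    "best_pulse": True,
    "best_spike": True,
}
print(classify(signals))  # weakly similar
```

With four signals, the result only drops to "invalid" once three of them fail to match, which is why a single odd pulse on its own doesn't account for the verdict.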

OTOH, we have no way of knowing the result file didn't get corrupted server-side or something like that. However, I'd expect some indication from other users of similar problems in that case. It's a puzzle which cannot be solved now, just watch to see if it happens again with x38g.

The one on http://setiathome.berkeley.edu/workunit.php?wuid=762393888 is a loss as far as analysis goes, there's no stderr information from x38g.
                                                           Joe

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: x38g reports
« Reply #35 on: 22 Jun 2011, 01:41:25 am »
Hmmm, the missing stderr information indicates a few possibilities to me.  Either the improved exit code is not functioning as designed (due to a system-specific issue or some other problem in the code itself), there is a communication issue of some sort (I suppose the server load could play some part there), or indeed the server itself lost that information.  I've seen no indication that result files wouldn't follow the same behaviour as stderr contents.

I'm finding that as the cuda app issues get rarer, they do get harder to diagnose when they appear.  One thing that is noticeable is that users are finding their errors & inconclusives more quickly now that the web pages display in categorised form  ;D

Offline perryjay

  • Knight Templar
  • ****
  • Posts: 427
Re: x38g reports
« Reply #36 on: 22 Jun 2011, 09:59:06 am »
I noticed the missing stderr not only for my result but also one of the others as well. I didn't think it would do you much good that way but from Jason's comment I guess I should have mentioned it here too.

Offline perryjay

  • Knight Templar
  • ****
  • Posts: 427
Re: x38g reports
« Reply #37 on: 23 Jun 2011, 11:01:23 am »
Well, I woke up this morning to another downclocking. I had noticed last night a general sluggishness to my computer but decided not to reboot. Guess I should have. I tend to leave everything running when I quit the computer so I would guess it just built up until something had to give. I don't think it was downclocked for very long so I didn't lose too much. After a reboot everything is back and running good.


EDIT  I spoke too soon. It downclocked again. I've rebooted again and it is back up to where it is supposed to be. Guess I will see if it will hold this time. Gotta go cut the grass so I will be away for about an hour. Hope it doesn't go down in that length of time.
« Last Edit: 23 Jun 2011, 11:40:03 am by perryjay »

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: x38g reports
« Reply #38 on: 23 Jun 2011, 11:41:37 am »
Can you catch the name of a task that's in progress the next time it does it?  When the result is uploaded we could then see if the stderr says anything useful.

Offline perryjay

  • Knight Templar
  • ****
  • Posts: 427
Re: x38g reports
« Reply #39 on: 23 Jun 2011, 12:48:26 pm »
This is one that took forever, not sure if it's the one you want.  http://setiathome.berkeley.edu/workunit.php?wuid=765100017

Here's another one that completed and validated  http://setiathome.berkeley.edu/workunit.php?wuid=765100083


It seemed to affect my CPU times too but that is hard to tell for sure. This one http://setiathome.berkeley.edu/workunit.php?wuid=766957670 seemed to be way too long. I was finishing a couple of APs at the time it happened so I don't have many CPU tasks done. Since the APs were within an hour or so of completion it didn't affect their runtime by much and I don't know exactly what time it happened the first time.
« Last Edit: 23 Jun 2011, 01:04:34 pm by perryjay »

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: x38g reports
« Reply #40 on: 23 Jun 2011, 01:33:01 pm »
Thanks,
   Clearly those runtimes indicate something freaked out.  Despite that, there's no visible indication in stderr apart from the excessive runtime on the task report itself, which means I'll need to instrument every kernel launch to find out what's happening.  That will take a few days to go through the whole code, then if you're agreeable I'll drag you into the dev area to pin down the exact point(s) of downclock.  I'll do so by using a build instrumented to check for kernel errors and subsequently print the brand new, presumably downclocked, clock speed after the point of failure.
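
The instrumentation idea (check for an error after every step, then report the clock speed once something fails) can be sketched in Python as below. This is only an analogy under stated assumptions, not the instrumented CUDA build: the stage names, `run_instrumented`, and `read_sm_clock` are hypothetical, and reading the clock through `nvidia-smi` assumes the tool is on the PATH and supports clock queries for the card in question.

```python
import subprocess

def read_sm_clock():
    """Return the current SM clock of GPU 0 as reported by nvidia-smi.
    Assumption: nvidia-smi is installed and can report clocks for this card."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=clocks.sm",
         "--format=csv,noheader,nounits", "-i", "0"],
        text=True,
    )
    return out.strip()

def run_instrumented(stages, read_clock=read_sm_clock, log=print):
    """Run each (name, fn) stage in order. On the first failure, log which
    stage failed and the clock speed read right afterwards, then stop.
    Returns the name of the failed stage, or None if every stage succeeded."""
    for name, fn in stages:
        try:
            fn()
        except Exception as exc:
            log(f"stage {name!r} failed: {exc}; clock now {read_clock()} MHz")
            return name
    return None
```

In a real CUDA application the per-stage check would be an error query after each kernel launch rather than a Python exception; the point here is only the pattern of stopping at the first failure and recording the clock at that moment.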

Can you confirm (once again) that these are 'sticky downclocks' requiring  a reboot to clear ?

Jason

Offline perryjay

  • Knight Templar
  • ****
  • Posts: 427
Re: x38g reports
« Reply #41 on: 23 Jun 2011, 01:37:06 pm »
 :o  The, the dev area????  Can I bring a gun?

Yes, they needed a reboot. Well, at least the first one did. I just went ahead and rebooted when I saw the one today. Figured it was the easiest way to get going again fast. So far this time everything is running okay again now.

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: x38g reports
« Reply #42 on: 23 Jun 2011, 01:43:05 pm »
OK, but we won't wait for it to downclock again to try something.

Please swap in the attached, deliberately slightly dialled back for diagnostic purposes, build, while I spend the next few days instrumenting the code.  If this one doesn't initiate downclocks on the card in the meantime, then it'll add some possibilities to the investigation, directing me to optimise a particular piece of code I've been hesitant to touch so far (so that part remained stock until this dialled back build). 

(x39c, dialled back build attached for diagnostic purposes)

[Edit:] Old build removed.  Please use the updated x39d build at:
http://lunatics.kwsn.net/12-gpu-crunching/x38g-reports.msg39407.html#msg39407
« Last Edit: 27 Jun 2011, 03:02:30 am by Jason G »

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: x38g reports
« Reply #43 on: 23 Jun 2011, 01:53:23 pm »
FYI:  there is an easy way to swap in builds if you're confused by the app_info.

Offline perryjay

  • Knight Templar
  • ****
  • Posts: 427
Re: x38g reports
« Reply #44 on: 23 Jun 2011, 01:57:55 pm »
Okay, just to be sure, where does it go? Do I just replace all instances of  <file_name>Lunatics_x38g_win32_cuda32.exe</file_name> in the app info or do I need to put it somewhere else?

Easy way? What's that? I've never seen such a thing. Nothing is easy for a n00b like me!   ;D
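
For what it's worth, the "replace all instances" approach asked about above could be sketched like this in Python. The thread does not confirm at this point that this is the recommended procedure, so treat it as an illustration only. `swap_build` and `NEW_BUILD.exe` are hypothetical (the file name of the attached build is not given here), and `SAMPLE` is a trimmed, illustrative fragment rather than a complete app_info.xml.

```python
# Hypothetical sketch: swap one executable name for another in app_info.xml text.
# The replacement file name is a placeholder, not the real build name.

SAMPLE = """<app_info>
  <file_info>
    <name>Lunatics_x38g_win32_cuda32.exe</name>
    <executable/>
  </file_info>
  <app_version>
    <file_ref>
      <file_name>Lunatics_x38g_win32_cuda32.exe</file_name>
      <main_program/>
    </file_ref>
  </app_version>
</app_info>
"""

def swap_build(app_info_text, old_exe, new_exe):
    """Replace every occurrence of old_exe with new_exe.
    Returns the new text and the number of occurrences replaced."""
    count = app_info_text.count(old_exe)
    return app_info_text.replace(old_exe, new_exe), count

new_text, replaced = swap_build(
    SAMPLE, "Lunatics_x38g_win32_cuda32.exe", "NEW_BUILD.exe"
)
print(replaced)
```

A plain text replacement catches the name wherever it appears, including inside `<name>` as well as `<file_name>`, which a search for `<file_name>` entries alone would miss. Keeping a copy of the original file before editing seems prudent.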

 
