Seti@Home optimized science apps and information

Optimized Seti@Home apps => Windows => GPU crunching => Topic started by: Morten on 24 Sep 2011, 05:37:28 am

Title: AP tasks with deadline 7 & 8 of October stall host (HD 5970)
Post by: Morten on 24 Sep 2011, 05:37:28 am
This host: http://setiathome.berkeley.edu/results.php?hostid=5514874&offset=0&show_names=0&state=0&appid=5

has been crunching thousands of AP tasks with the same setup (Catalyst 11.3, OpenCL 1.1 AMD-APP-SDK-v2.4-rc1 (595.9), 2xHD 5970, Win7 x64, ATI_r516.exe (upgraded to r521, but no change)), but a few days ago it simply halted.

Even with just one AP task, it loads onto the GPU and then nothing happens. This makes the host inaccessible via remote-access software. CPU tasks progress a bit further before they too halt, so the host must be rebooted to regain access to it and the GPUs.

I downloaded Milkyway tasks to check whether there were any hardware/Catalyst problems, but they all crunched like butter.

I have also upgraded to SDK 2.5 and CAL 11.9, but the behaviour has not changed.

Morten

Edit: There is some change after all with SDK 2.5 and CAL 11.9. After a long halt the computer becomes responsive again, but the AP task remains at zero progress.
Title: Re: AP tasks with deadline 7 & 8 of October stall host (HD 5970)
Post by: Josef W. Segur on 24 Sep 2011, 12:44:13 pm
> This host: http://setiathome.berkeley.edu/results.php?hostid=5514874&offset=0&show_names=0&state=0&appid=5
>
> has been crunching thousands of AP tasks with same setup (Catalyst 11.3, OpenCL 1.1 AMD-APP-SDK-v2.4-rc1 (595.9), 2xHD 5970, Win7 x64, ATI_r516.exe (upgraded to r521 but no change)), but a few days ago it simply halted.
>
> Even with just one ap task it will load to gpu, then nothing.
> ...

Morten, there are a couple of validated tasks for that host which had an Oct. 7 deadline, and too many "in progress" tasks to guess which ones specifically showed the problem. Can you provide a specific example? Ideally, compress one with 7zip, Rar, Zip or similar and put it somewhere a tester could download it. Attach it here if you don't have a better download location. At minimum, the exact WU file name or a link to the project task or workunit page would allow somebody to get the file from Berkeley, though that puts an extra burden on the download pipe.

Even with the specific WU perhaps nobody will be able to reproduce your problem, but it's the best starting point. I have no idea whether Milkyway tasks use the GPU similarly to S@H AP, so can't comment on that comparison.
                                                   Joe
Title: Re: AP tasks with deadline 7 & 8 of October stall host (HD 5970)
Post by: Morten on 25 Sep 2011, 06:38:18 am
Hey Joe,

Strangely, with the latest drivers it only hangs momentarily and the driver restarts, rather than the complete system hang I got with Catalyst 11.3 and SDK 2.4. That is an improvement, at least.

Starting all tasks at once results in none of them progressing.
Starting one task at a time leaves 2 tasks stuck at zero progress, while all the others progress normally. So 6 out of 8 tasks (2 tasks per GPU core) progress normally. I let this run for about 24 hours; no further tasks hung, and all the finished tasks were validated.

I'm going to leave the two tasks hanging until the AP cache is empty, and see what happens with them.

I suspect one GPU core is having trouble with the AP OpenCL calls, but I have no idea what the problem is, or why it should hang all tasks when they are all started simultaneously.

Morten