Seti@Home optimized science apps and information

Optimized Seti@Home apps => Windows => Topic started by: VoidPilot on 07 Jul 2010, 05:44:22 am

Title: Seemingly Silly Question
Post by: VoidPilot on 07 Jul 2010, 05:44:22 am

I was trawling through the SETI board and saw something that caught my eye at the time.  I started wondering about it much later on, and my curiosity has been piqued, so I was wondering if anyone knew the answer to this.

Is it possible to get the GPU to process two SETI WUs at the same time, and if so, does each WU take, say, 10% longer, or 50% longer, etc.?

If this can be done, how does one do it?

rgds

VP
Title: Re: Seemingly Silly Question
Post by: Jason G on 07 Jul 2010, 07:04:58 am
Is it possible to get the GPU to process two SETI WUs at the same time, and if so, does each WU take, say, 10% longer, or 50% longer, etc.?

Short Answer 'Yes and No'.

In more detail, pre-Fermi CUDA-capable cards are architected to do basically one thing at a time.  Context switches (needed when doing multiple things at once) on older drivers & hardware usually fail for memory or other resource reasons, because of the physical memory model used by XP-style drivers.  The Fermi cards have hardware devoted to context switching between applications, at least under the WDDM (Vista/Win7) device driver model, which enables multiple CUDA contexts to be run at once.

This new driver model allows each application to be isolated (ignoring driver bugs) and so, in theory, to use the full resources of the card by paging things in and out.  That's an expensive process across the PCI Express bus, so overloading the card wouldn't be advised; however, current Fermi applications don't use the whole card's resources.  That means some gains have been seen on Fermis, under the newer driver/OS, by running more instances.  For now, running two instances makes each task run slower, but total throughput seems to increase by about 50% on a GTX 480.  I would expect that figure to shrink considerably, though, as we use more resources to speed things up and take advantage of greater capabilities in the CUDA framework.

That's achieved in BOINC using the anonymous platform, setting the number of GPUs each task needs to, say, 0.5.  As I suggested, the value of doing this will likely diminish, but it is something that may (or may not) help in the short term if you have a Fermi card running with WDDM drivers.  (I doubt it will work on previous-generation hardware/drivers.)
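
As a rough sketch only (the app name, file name and version number below are placeholders, so substitute whatever CUDA build you actually run), the relevant piece of app_info.xml is the <coproc> block inside the <app_version> entry; a <count> of 0.5 tells BOINC each task needs half a GPU, so two tasks get scheduled per card:

  <app_version>
    <app_name>setiathome_enhanced</app_name>
    <version_num>610</version_num>
    <coproc>
      <type>CUDA</type>
      <count>0.5</count>   <!-- half a GPU per task, i.e. two tasks per GPU -->
    </coproc>
    <file_ref>
      <file_name>setiathome_cuda.exe</file_name>
      <main_program/>
    </file_ref>
  </app_version>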

Hope that helps, Jason
Title: Re: Seemingly Silly Question
Post by: Raistmer on 07 Jul 2010, 01:28:31 pm
And for pre-Fermi NV GPUs it was tested a few times with a negative result: total performance dropped.
Title: Re: Seemingly Silly Question
Post by: VoidPilot on 07 Jul 2010, 05:52:07 pm
J, R

thnx

VP
Title: Re: Seemingly Silly Question
Post by: hellsheep on 07 Jul 2010, 09:06:33 pm
I'm going to assume that any performance increase with the Fermis will require nVIDIA to release some new CUDA files of some sort? Maybe some work on newer drivers too?
Title: Re: Seemingly Silly Question
Post by: Jason G on 08 Jul 2010, 01:38:08 am
I'm going to assume that any performance increase with the Fermis will require nVIDIA to release some new CUDA files of some sort? Maybe some work on newer drivers too?
Not quite the full picture, but tool & SDK refinement would probably help as things go on. 

There are extra facilities in the newer CUDA libraries already that are designed to cram more concurrent processing onto the cards, and the hardware is being underutilised.  We're not using everything yet, and only some portions use sufficient threads in the traditional CUDA kernel sense.  The only difficulties so far seem to be that the existing CUDA apps are full of design flaws that need to be fixed first, which requires a deeper understanding of the multibeam algorithms than I had ever needed previously.  That takes a long time (for me anyway); along with that, the drivers, tools and techniques for programming in parallel are fundamentally more difficult, as there is less prior work to draw upon and much less experience on platforms like this.
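
To give a rough idea of what those facilities look like (a made-up sketch, not code from the multibeam apps; the kernel and buffer names are invented), Fermi-class hardware is allowed to overlap independent kernels when they are launched into separate CUDA streams:

#include <cuda_runtime.h>
#include <cstdio>

// Trivial illustrative kernel: scale every element of a buffer.
__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

int main()
{
    const int n = 1 << 20;
    float *bufA, *bufB;
    cudaMalloc((void **)&bufA, n * sizeof(float));
    cudaMalloc((void **)&bufB, n * sizeof(float));

    // Independent work goes into separate streams...
    cudaStream_t s0, s1;
    cudaStreamCreate(&s0);
    cudaStreamCreate(&s1);

    // ...so the hardware is free to run these two launches concurrently.
    scale<<<(n + 255) / 256, 256, 0, s0>>>(bufA, n, 2.0f);
    scale<<<(n + 255) / 256, 256, 0, s1>>>(bufB, n, 0.5f);

    cudaDeviceSynchronize();

    cudaStreamDestroy(s0);
    cudaStreamDestroy(s1);
    cudaFree(bufA);
    cudaFree(bufB);
    printf("done\n");
    return 0;
}

Pre-Fermi cards simply serialise launches like these, which is part of why running extra instances on that hardware just slows everything down.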

It won't require newer CUDA libraries to extract more performance, but it will take time.  Step-by-step refinement in all areas, reliability first in particular, will see things stabilise in the right direction, and we can turn to making use of those resources for speed as all the components mature.

Jason
Title: Re: Seemingly Silly Question
Post by: hellsheep on 08 Jul 2010, 04:19:15 am
Ah thanks for that Jason.

Good to know that little bit of information.

Take all the time in the world you need. :) I know a lot of people here and over at SETI are working hard and trying to do their best. Long time or short time, it doesn't matter. The point is eventually it'll be working as we desire. :)
Title: Re: Seemingly Silly Question
Post by: Josef W. Segur on 08 Jul 2010, 01:41:34 pm
...
The point is eventually it'll be working as we desire. :)

LOL, infinite speed is a target which will never be reached  ;D
Title: Re: Seemingly Silly Question
Post by: Gecko_R7 on 08 Jul 2010, 02:33:03 pm
...
The point is eventually it'll be working as we desire. :)

LOL, infinite speed is a target which will never be reached  ;D

I'll settle just for a quantum-entanglement optimized application.  :P
Anyone know how to write for qubit processing?
Title: Re: Seemingly Silly Question
Post by: hellsheep on 09 Jul 2010, 04:13:23 am
...
The point is eventually it'll be working as we desire. :)

LOL, infinite speed is a target which will never be reached  ;D

I'll settle just for a quantum-entanglement optimized application.  :P
Anyone know how to write for qubit processing?

Give me a moment, I'll just call up my Vulcan friend. :P