Ok, I revisited this problem and found out that I had incorrectly modified the INF file for the TCC driver. I now have the driver loading for my GT220 and CUDA programs running through Remote Desktop, which is fantastic.

In short, these are the modifications I had to make to NVWD.inf from the TCC package:

[NVIDIA_SetA_Devices.NTamd64.6.0]
%NVIDIA_DEV.0A20.01% = Section001, PCI\VEN_10DE&DEV_0A20

[NVIDIA_SetA_Devices.NTamd64.6.1]
%NVIDIA_DEV.0A20.01% = Section002, PCI\VEN_10DE&DEV_0A20

[Strings]
NVIDIA_DEV.0A20.01 = "NVIDIA GeForce GT 220"
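For anyone trying the same trick on another card, the three additions follow one pattern per PCI device ID. The helper below is a hypothetical sketch that only formats text: the function name is made up, the section names Section001/Section002 are copied from the GT 220 example and may not be right for other TCC packages, and you still have to look up your own card's device ID (e.g. from the hardware IDs in Device Manager).

```python
def tcc_inf_entries(dev_id: str, name: str,
                    section_vista: str = "Section001",
                    section_win7: str = "Section002") -> dict[str, str]:
    """Build the NVWD.inf lines that bind one NVIDIA PCI device ID.

    Hypothetical helper: it only formats text. The default section names are
    taken from the GT 220 example and are an assumption for other packages.
    """
    dev_id = dev_id.upper()
    token = f"NVIDIA_DEV.{dev_id}.01"        # key into the [Strings] table
    hwid = f"PCI\\VEN_10DE&DEV_{dev_id}"      # 10DE is NVIDIA's PCI vendor ID
    return {
        # NTamd64.6.0 = Windows Vista / Server 2008, x64
        "[NVIDIA_SetA_Devices.NTamd64.6.0]": f"%{token}% = {section_vista}, {hwid}",
        # NTamd64.6.1 = Windows 7 / Server 2008 R2, x64
        "[NVIDIA_SetA_Devices.NTamd64.6.1]": f"%{token}% = {section_win7}, {hwid}",
        # Friendly name shown in Device Manager
        "[Strings]": f'{token} = "{name}"',
    }


if __name__ == "__main__":
    # Reproduces the GT 220 lines from the post above.
    for section, line in tcc_inf_entries("0A20", "NVIDIA GeForce GT 220").items():
        print(section)
        print(line)
```

Each printed line goes under the matching existing section header in NVWD.inf rather than as a new section.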
Looks like someone may have got the TCC-model drivers to work with a GT220 card... may give this a go on the 465 and see what happens: http://forums.nvidia.com/index.php?showtopic=159208
C:\Program Files\NVIDIA Corporation\NVSMI>nvidia-smi.exe --driver-model=
GPU 0 is not a supported TCC device, skipping
CUDA Device Query (Runtime API) version (CUDART static linking)

There is 1 device supporting CUDA

Device 0: "GeForce GTX 480"
  CUDA Driver Version: 3.20
  CUDA Runtime Version: 3.20
  CUDA Capability Major/Minor version number: 2.0
  Total amount of global memory: 1576468480 bytes
  Multiprocessors x Cores/MP = Cores: 15 (MP) x 32 (Cores/MP) = 480 (Cores)
  Total amount of constant memory: 65536 bytes
  Total amount of shared memory per block: 49152 bytes
  Total number of registers available per block: 32768
  Warp size: 32
  Maximum number of threads per block: 1024
  Maximum sizes of each dimension of a block: 1024 x 1024 x 64
  Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
  Maximum memory pitch: 2147483647 bytes
  Texture alignment: 512 bytes
  Clock rate: 0.81 GHz
  Concurrent copy and execution: Yes
  Run time limit on kernels: No
  Integrated: No
  Support host page-locked memory mapping: Yes
  Compute mode: Default (multiple host threads can use this device simultaneously)
  Concurrent kernel execution: Yes
  Device has ECC support enabled: No
  Device is using TCC driver mode: No

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 3.20, CUDA Runtime Version = 3.20, NumDevs = 1, Device = GeForce GTX 480

PASSED
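When flipping driver models on several cards, it can help to pull the TCC flag and the core arithmetic out of a saved deviceQuery report automatically. This is a minimal sketch, assuming the report is stored with one "Label: value" pair per line; the label strings are the ones this CUDA 3.20 build prints and may be worded differently in other SDK versions, and the function names are hypothetical.

```python
import re

def parse_device_query(text: str) -> dict[str, str]:
    """Map each 'Label: value' line of a deviceQuery report to its value."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        # Split on the first colon only; values may contain further text.
        label, sep, value = line.partition(":")
        if sep and value.strip():
            fields[label.strip()] = value.strip()
    return fields

def summarize(fields: dict[str, str]) -> tuple[bool, int]:
    """Return (device is using TCC?, multiprocessors x cores per MP)."""
    tcc = fields.get("Device is using TCC driver mode", "No") == "Yes"
    # Value looks like "15 (MP) x 32 (Cores/MP) = 480 (Cores)";
    # take the first two numbers and multiply them.
    counts = re.findall(r"(\d+) \(", fields["Multiprocessors x Cores/MP = Cores"])
    mp, per_mp = int(counts[0]), int(counts[1])
    return tcc, mp * per_mp

SAMPLE = """Device 0: "GeForce GTX 480"
  Multiprocessors x Cores/MP = Cores: 15 (MP) x 32 (Cores/MP) = 480 (Cores)
  Compute mode: Default (multiple host threads can use this device simultaneously)
  Device is using TCC driver mode: No
"""

if __name__ == "__main__":
    tcc, cores = summarize(parse_device_query(SAMPLE))
    print(f"TCC: {tcc}, cores: {cores}")  # prints "TCC: False, cores: 480"
```

Splitting on the first colon keeps labels that contain "=" or "/" intact, which matters for the core-count line.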
OK, non-critical unless I make computation mistakes (I was mostly concerned here with not making the code slower...). The stock/x32f code there is doing something your GPU doesn't like, IMO.

Was that Quadro integrated and using some portion of system memory, or does it use dedicated memory?
Yeah, tried that compute mode thing too. With Fermis we want 'Normal' mode anyway, so as to allow multiple instances.

I've thought about it, and to bypass the issue altogether I'll try to get another hard drive sorted sometime soon and use it to dual-boot WinXP32. That way I can keep my snazzy Win7 dev environment, yet leave the machine crunching for extended periods under XPDM. I have XP x64 as well, but since 64-bit CUDA apps yield a small net slowdown, it seems illogical to use that copy for this.

Jason
Something must have changed; the last Test5 above shows 11.2 GFlops, 45.3 GB/s, 121.7 ulps.
Stock results on XP Pro x32, 260.99 drivers:
...
PS+SuMx( 64) 4.4 GFlops 17.7 GB/s
...
256 threads, fftlen 64: (worst case: full summax copy) 6.7 GFlops 27.2 GB/s 121.7ulps
...
256 threads, fftlen 64: (best case, nothing to update) 8.7 GFlops 35.3 GB/s 121.7ulps
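To compare result lines like these without eyeballing, the GFlops and GB/s figures can be extracted and divided against a baseline. A minimal sketch, assuming lines shaped like the ones above; the regex is written against this output only, the function names are hypothetical, and a cross-run comparison only means something if the same test ran on comparable hardware and drivers.

```python
import re
from typing import Optional, Tuple

# "<n> GFlops <n> GB/s" optionally followed by "<n>ulps"
LINE_RE = re.compile(r"([\d.]+) GFlops\s+([\d.]+) GB/s(?:\s+([\d.]+)\s*ulps)?")

def parse_result(line: str) -> Tuple[float, float, Optional[float]]:
    """Extract (GFlops, GB/s, ulps or None) from one benchmark result line."""
    m = LINE_RE.search(line)
    if m is None:
        raise ValueError(f"no GFlops/GB/s figures in: {line!r}")
    ulps = float(m.group(3)) if m.group(3) else None
    return float(m.group(1)), float(m.group(2)), ulps

def speedup(new: str, base: str) -> Tuple[float, float]:
    """Ratio of new to baseline throughput, as (GFlops ratio, GB/s ratio)."""
    g_new, b_new, _ = parse_result(new)
    g_base, b_base, _ = parse_result(base)
    return g_new / g_base, b_new / b_base

if __name__ == "__main__":
    worst = "256 threads, fftlen 64: (worst case: full summax copy) 6.7 GFlops 27.2 GB/s 121.7ulps"
    best = "256 threads, fftlen 64: (best case, nothing to update) 8.7 GFlops 35.3 GB/s 121.7ulps"
    g, b = speedup(best, worst)
    print(f"{g:.2f}x GFlops, {b:.2f}x GB/s")  # prints "1.30x GFlops, 1.30x GB/s"
```

Here the best case comes out roughly 30% ahead of the worst case on both metrics within the same stock run.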