Hi Heinz,

The numbers come out differently when you switch to the data set size that the Multibeam apps use (1*1024*1024 = 1,048,576 complex data points).

For that small amount of data, CUFFT is not very fast at the small transform sizes. It gets relatively better as the FFT size goes up. I haven't optimised these custom kernels (so they are still arranged for the G80 GPU), but I did change them to give in-order output. I didn't need two-way transforms, so I made separate forward and inverse transforms instead.

You can see that CUFFT is quite slow when doing many small transforms on our smaller data set: at N = 8 it reaches 7.1 Gflop/s, against 140.0 Gflop/s for the custom FFT.

Device: GeForce GTX 480, 810 MHz clock, 1503 MB memory.
Compiled with CUDA 3000.

                --------CUFFT--------   ---------FFT---------   -----IFFT-----
   N    Batch   Gflop/s   GB/s  error   Gflop/s   GB/s  error   Gflop/s  error
   8   131072       7.1    7.6    1.4     140.0  149.4    1.1     140.5    1.1
  16    65536      16.1   12.9    1.7     183.1  146.5    1.0     183.7    1.0
  64    16384     259.2  138.2    1.4     280.0  149.4    1.4     279.7    1.4
 256     4096     352.2  140.9    1.4     352.8  141.1    1.5     352.0    1.5
 512     2048     413.3  146.9    1.8     411.8  146.4    1.8     412.2    1.8

Errors are supposed to be of the order of 1 (in ULPs).
Also note that, for now, it must be used with Visual Studio 2008 Service Pack 1.