Seti@Home optimized science apps and information
		Optimized Seti@Home apps => Windows => GPU crunching => Topic started by: Devaster on 05 Jun 2007, 08:40:43 am
		
			
			- 
				Still in early alpha.
 
 Right now I have some problems with the analyzepot function: it seems to get stuck in an endless loop .... ???
 But I haven't made any changes in this part ...
- 
				Maybe the data structures it expects aren't arriving/are invalid/are different than it expects because of changes in code that you made up until now? 
 
 It may be helpful to compare the data arrays from a non-GPU app to yours - at least that's where I'd start.
 
 Regards,
 Simon.
- 
				A bug has been found in the RapidMind GLSL backend when accessing an array member through a random-access lookup.
 
 My code is affected too ...
- 
				Hello all,
 
 Is there any news about the GPU client, or has the project been canceled? Edit: Or are the efforts now concentrated on DX 10 and CUDA?
 
 Any information about the status would be nice, since I would really like to use my little 6600, 7600 and 7900 for Seti.  :)
 
 Greetings to everyone
 
 Karsten
- 
				Development has been partially moved to CUDA but is paused for now: there is no CUDA driver for Vista and I am too lazy to reinstall the system ....
 But I am still working on the RM version ....
- 
				good news  :)
			
- 
				Nice to hear that :)
			
- 
				Having made a fresh XP install, and getting the CUDA SDK examples to compile, I'm having problems getting the basic Seti app to compile.
 
 I can get the BOINC source to compile, and most of the Seti source compiles, but it falls down on a type that appears to be GNU C specific (or at least not implemented in Visual Studio).
 
 Can anyone give me a pointer (pun not intended) as to where I'm going wrong?
 
 Background setup info:
 Installed:
 MS Visual Studio C++ Express*
 Platform SDK
 wxWidgets
 
 BOINC source was the latest HEAD obtained through SVN, I've tried the seti source from both august 1st and 2nd nightlies.
 
 * I've read a lot about needing the Intel compiler and libraries, and compiling in other compilers. Am I on a road to nothing by using VSC++EE? If so, is there a "recommended" setup that will compile the sources?
 
 Why am I doing this? I plan to try out some of the research I did a few years ago about offloading processing onto FPGAs to see if it's valid in this context, and for personal curiosity.
- 
				Hi Christofire,
 
 Compiling S@H with VS 2005 by Urs Echternacht (http://lunatics.at/windows/visual-studio-2005-compatibility-issues.msg544.html#msg544)
 
 Compiling with VS "Orcas" (currently in Beta) (http://lunatics.at/windows/sources-with-orcas.msg4222.html;topicseen#msg4222)
 
 These may help :)
 
 Also, ICC and IPP (the compiler/lib package the KWSN apps use if available for the target platform) are available for free as non-commercial versions.
 
 Regards,
 Simon.
- 
				Simon,
 
 I apologise for not finding those with the search. My google-fu must be weak.
 
 I didn't realise the ICC and IPP were available free(ish). I'll continue trying with VS for now as I've experience of using that, and I've got the CUDA SDK samples to compile (and run).
 
 Many thanks for the help.
 
 Chris.
- 
				CUDA based client ...
 
 [attachment deleted by admin]
- 
				How do I install it?
			
- 
				only for standalone testing !!!! not for real crunching !!!!
			
- 
				Well, standalone testing shows:
 no result.sah is generated on the second and subsequent runs.
 stderr.txt contains
 SETI@home error -108 Unknown error
 from boinc_init_diagnostics()
 File: ..\main.cpp
 Line: 229
 
 SETI@home error -108 Unknown error
 from boinc_init_diagnostics()
 File: ..\main.cpp
 Line: 229
 
 SETI@home error -108 Unknown error
 from boinc_init_diagnostics()
 File: ..\main.cpp
 Line: 229
 
 
 The very first time the program was run, the situation was different. There was a result.sah (but only the header, so it could not pass validation) and stderr.txt contained:
 Can't set up shared mem: -1
 Work Unit Info:
 ...............
 WU true angle range is :  0.604884
 Can't set up shared mem: -1
 Work Unit Info:
 ...............
 WU true angle range is :  0.604884
 Can't set up shared mem: -1
 Work Unit Info:
 ...............
 WU true angle range is :  0.604884
 
 The test was performed on testWU-1.wu from the KWSN test pack. On that very first run, stdout was:
 
 Device name: GeForce 8800 GTS
 Total global memory: 639 MB
 Shared memory per block: 16 kB
 Registers per block: 8192
 Warp size: 32
 Max threads per block: 512
 Shaders clock rate: 1188 MHz
 Generated FFT plans
 Calculated FFT on GPU fftlen:8 batch size:131072
 After PowerSpectrum & pulsefind on CPU ...
 
- 
				only for standalone testing !!!! not for real crunching !!!!
 
 i know
 but....
- 
				What is needed to start the CUDA-based client?
			
- 
				http://developer.nvidia.com/object/cuda.html
 
 For Win32XP  -  http://developer.download.nvidia.com/compute/cuda/1_0/windows/toolkits/NVIDIA_CUDA_Toolkit_1.0.exe
- 
				Same thing in stderr.txt here:
 
 SETI@home error -108 Unknown error
 from boinc_init_diagnostics()
 File: ..\main.cpp
 Line: 229
 
 
- 
				First run on WinXP x86.
 No such error messages, but result.sah contains only the header. See attachment.
 
 
 [attachment deleted by admin]
- 
				WinXP 32
 
 FFT failed....
 
 Device name: GeForce 8800 GTS
 Total global memory: 319 MB
 Shared memory per block: 16 kB
 Registers per block: 8192
 Warp size: 32
 Max threads per block: 512
 Shaders clock rate: 1188 MHz
 Generated FFT plans
 Calculated FFT on GPU fftlen:8 batch size:131072
 After PowerSpectrum & pulsefind on CPU ...
 FFT failed
 
 
 
 [attachment deleted by admin]
- 
				Did not work on Vista32.
 
 stderr:
 Can't set up shared mem: -1
 There is no device supporting CUDA.
- 
				CUDA is not supported under vista ...
 
 try this under winxp32 ...
 
 [attachment deleted by admin]
- 
				CUDA is not supported under vista ...
 
 I know.
- 
				try this under winxp32 ...
 
 New version WORKED!!!!!!!!
 ;D ;D ;D
 
 [attachment deleted by admin]
- 
				This version shows the time in ms spent on the GPU: uploading data, calculating the FFT and downloading data.
 
 Please send some average numbers from the DOS box plus stderr.txt ....
 
 Thanks
 
 YOU MUST USE DRIVER VERSION 169.09 DUE TO THE DIFFERENT INTEGRATION OF CUDA IN THE DRIVER - compiled with final CUDA 1.1
 STILL NOT VALID, ONLY A TECHNOLOGY TEST !!!!
 
 [attachment deleted by admin]
- 
				New data from new version.
 
 [attachment deleted by admin]
- 
				Nice numbers: 16 msec on average against 70 msec on mine (8500GT).
			
- 
				please use new 169.21 driver for CUDA 1.1 (http://www.nvidia.com/object/winxp_169.21.html)
			
- 
				please use new 169.21 driver for CUDA 1.1 (http://www.nvidia.com/object/winxp_169.21.html)
 
 Hmm...
 I read the release notes for 169.21.
 There is no mention of CUDA...
 
 And this page (http://www.nvidia.com/object/cuda_get.html#windows) recommends the 169.09 driver for CUDA 1.1...
 
 Why 169.21 ???
- 
				It's preferred on the developer forums by the head of the CUDA team, mfatica.
			
- 
				New data with new 169.21 videodrivers.
 
 [attachment deleted by admin]
- 
				HEY YA !
 
 This version produces correct results, with small differences.
 This is from the CUDA app:
 <best_spike>
 <peak_power>22.561671618532</peak_power>
 <mean_power>1</mean_power>
 <time>2451239.5780686</time>
 <ra>8.3192064227348</ra>
 <decl>27.898178773406</decl>
 <q_pix>0</q_pix>
 <freq>1418974011.7011</freq>
 <detection_freq>1418983759.2021</detection_freq>
 <barycentric_freq>0</barycentric_freq>
 <fft_len>131072</fft_len>
 <chirp_rate>-0.90022989063243</chirp_rate>
 <rfi_checked>0</rfi_checked>
 <rfi_found>0</rfi_found>
 <reserved>0</reserved>
 </best_spike>
 And this is from the default app:
 <best_spike>
 <peak_power>22.561794281006</peak_power>
 <mean_power>1</mean_power>
 <time>2451239.5780686</time>
 <ra>8.3192064227348</ra>
 <decl>27.898178773406</decl>
 <q_pix>0</q_pix>
 <freq>1418975203.0489</freq>
 <detection_freq>1418975184.9249</detection_freq>
 <barycentric_freq>0</barycentric_freq>
 <fft_len>131072</fft_len>
 <chirp_rate>-0.90022989063243</chirp_rate>
 <rfi_checked>0</rfi_checked>
 <rfi_found>0</rfi_found>
 <reserved>0</reserved>
 </best_spike>
 As you can see, there are very small differences in the float numbers - different rounding ???
 
 All other data are the same.
 
 [attachment deleted by admin]
- 
				Are the float/rounding differences enough to cause validation problem?
			
- 
				<peak_power>22.561671618532</peak_power>
 <peak_power>22.561794281006</peak_power>
 
 <freq>1418974011.7011</freq>
 <freq>1418975203.0489</freq>
 
 <detection_freq>1418983759.2021</detection_freq>
 <detection_freq>1418975184.9249</detection_freq>
 
 Small???
 :) :) :)
- 
				Yeah, this is small compared with the totally different previous results .... ;D
 And yeah, this is a problem for validation ....
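
 To put numbers on "small", here is a minimal Python sketch that compares the three fields that differ in the two <best_spike> blocks posted above. The helper names (rel_diff, compare) are made up for this illustration, and nothing here reflects the tolerances the real SETI@home validator uses; it only measures the gap.

```python
# Hypothetical helpers (not part of sahcuda or the SETI@home validator):
# they only measure how far two result values are apart.

def rel_diff(a: float, b: float) -> float:
    """Relative difference |a - b| / max(|a|, |b|); 0.0 when both are zero."""
    scale = max(abs(a), abs(b))
    return abs(a - b) / scale if scale else 0.0


# Values copied from the two <best_spike> blocks posted above.
cuda_app = {
    "peak_power": 22.561671618532,
    "freq": 1418974011.7011,
    "detection_freq": 1418983759.2021,
}
default_app = {
    "peak_power": 22.561794281006,
    "freq": 1418975203.0489,
    "detection_freq": 1418975184.9249,
}


def compare(a: dict, b: dict) -> dict:
    """Return {field: (absolute difference, relative difference)}."""
    return {k: (abs(a[k] - b[k]), rel_diff(a[k], b[k])) for k in a}


if __name__ == "__main__":
    for field, (abs_d, rel_d) in compare(cuda_app, default_app).items():
        print(f"{field:15s} abs={abs_d:.6g} rel={rel_d:.3e}")
```

 The peak power differs only in the sixth significant digit, which would fit a rounding explanation, while freq and detection_freq are off by roughly 1.2 kHz and 8.6 kHz respectively.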
- 
				new binary : 
 
 - added a GPU PowerSpectrum, based on work by Hans Dorn ....
 - optimised grid, block and thread scheduling on the GPU for PowerSpectrum ....
 - using pre-generated FFT plans ...
 
 
 TECHNOLOGY PREVIEW !!! TESTING VERSION !!!!
 
 [attachment deleted by admin]
- 
				Already testing
			
- 
				wow!
 
 faster than previous?
- 
				okay now new code :
 
 important !!!! now its validated !!!!!!!!!!
 ------------ 
 sahcuda.exe / testWU-1.wu :
 Started at  : 17:57:49
 Ended at    : 18:17:08
 Elapsed time: 1159 seconds
 Speedup     : -114.63%
 Ratio       : 0.47 x
 
 Result      : strongly similar.
 speed is compared against last cruncher app
 
 [attachment deleted by admin]
- 
				New data with new version.
 
 [attachment deleted by admin]
- 
				okay now new code :
 
 important !!!! now its validated !!!!!!!!!!
 ------------ 
 sahcuda.exe / testWU-1.wu :
 Started at  : 17:57:49
 Ended at    : 18:17:08
 Elapsed time: 1159 seconds
 Speedup     : -114.63%
 Ratio       : 0.47 x
 
 Result      : strongly similar.
 speed is compared against last cruncher app
 
 Seriously ?
 ;D
- 
				Try it in Knabench and you will see.
			
- 
				I cannot figure out how to use Knabench.....
 Not a user-friendly package...
 
 
 = Knabench 1.43 W32-W64 02/12/2007 by Kna + Simon =
 = mods: quick timetable, stderr, speedup/ratio    =
 
 7 testWU(s) found
 └─(testWU-1.wu)
 └─(testWU-2.wu)
 └─(testWU-3.wu)
 └─(testWU-4.wu)
 └─(testWU-5.wu)
 └─(testWU-6.wu)
 └─(testWU-7.wu)
 
 1 reference science app(s) found
 └─(default-515.exe)
 
 0 science app(s) found
 
 ======================================
 
 Stopping Boinc ...
 System error 1060 has occurred.
 
 The specified service does not exist as an installed service.
 
 ------------
 Running app : default-515.exe with
 with WU     : testWU-1.wu
 Started at  : 21:20:37
 Ended at    : 21:28:14
 Elapsed time: 455 seconds
 ------------
 The system cannot find the batch label specified - NOSCAPPS
 Running app : !refapp1! with
 with WU     : !wunbr2!
 Started at  : !time_tmp!
 '!refapp1!' is not recognized as an internal or external command,
operable program or batch file.
 Ended at    : !time_tmp!
 0Elapsed time: !elapsed_time_stock! seconds
 Could Not Find C:\Documents and Settings\Oleg\Desktop\KWSN Knabench 1.43\KWSN
Knabench 1.43\science_apps\reference\*.sah
 ------------
 Running app : !refapp1! with
 with WU     : !wunbr3!
 Started at  : !time_tmp!
 '!refapp1!' is not recognized as an internal or external command,
operable program or batch file.
 Ended at    : !time_tmp!
 0Elapsed time: !elapsed_time_stock! seconds
 Could Not Find C:\Documents and Settings\Oleg\Desktop\KWSN Knabench 1.43\KWSN
Knabench 1.43\science_apps\reference\*.sah
 ------------
 Running app : !refapp1! with
 with WU     : !wunbr4!
 Started at  : !time_tmp!
 '!refapp1!' is not recognized as an internal or external command,
operable program or batch file.
 Ended at    : !time_tmp!
 0Elapsed time: !elapsed_time_stock! seconds
 Could Not Find C:\Documents and Settings\Oleg\Desktop\KWSN Knabench 1.43\KWSN
Knabench 1.43\science_apps\reference\*.sah
 ------------
 Running app : !refapp1! with
 with WU     : !wunbr5!
 Started at  : !time_tmp!
 '!refapp1!' is not recognized as an internal or external command,
operable program or batch file.
 Ended at    : !time_tmp!
 0Elapsed time: !elapsed_time_stock! seconds
 Could Not Find C:\Documents and Settings\Oleg\Desktop\KWSN Knabench 1.43\KWSN
Knabench 1.43\science_apps\reference\*.sah
 ------------
 Running app : !refapp1! with
 with WU     : !wunbr6!
 Started at  : !time_tmp!
 '!refapp1!' is not recognized as an internal or external command,
operable program or batch file.
 Ended at    : !time_tmp!
 0Elapsed time: !elapsed_time_stock! seconds
 Could Not Find C:\Documents and Settings\Oleg\Desktop\KWSN Knabench 1.43\KWSN
Knabench 1.43\science_apps\reference\*.sah
 ------------
 Running app : !refapp1! with
 with WU     : !wunbr7!
 Started at  : !time_tmp!
 '!refapp1!' is not recognized as an internal or external command,
operable program or batch file.
 Ended at    : !time_tmp!
 0Elapsed time: !elapsed_time_stock! seconds
 Could Not Find C:\Documents and Settings\Oleg\Desktop\KWSN Knabench 1.43\KWSN
Knabench 1.43\science_apps\reference\*.sah
 ------------
- 
				Since I don't have a CUDA card I can't test it out, but I do have one question... Is CPU usage still 100% for this app?
 
 ~BoB
- 
				new binary : 
 
 - added a GPU PowerSpectrum, based on work by Hans Dorn ....
 - optimised grid, block and thread scheduling on the GPU for PowerSpectrum ....
 - using pre-generated FFT plans ...
 
 TECHNOLOGY PREVIEW !!! TESTING VERSION !!!!
 
 
 Do you think you will be able to optimize for GPU use of pulse_find, GaussFit etc?
 
- 
				bob: yes, still 100% CPU usage; not everything is on the GPU and for now I don't use async access ....
 
 For now I am working on the chirp routine ....
- 
				Out of curiosity, what RAC would you expect to get from say a Geforce 8800 series card?
 
 Cheers.  ;)
 
- 
				I learned to run Knabench  :D
 
 ....but I got very, very strange results:
 
 WinXP 32. testWU-1 - testWU-7
 C2D E6600 (2.4GHz), default-515.exe, one core in knabench vs 8800GTS 320Mb and last sahcuda.exe
 
 1 - all 7 results - DIFFERENT!  :(
 2 - 8800GTS slower than one core E6600!!!  :-[
 
 Is this as it should be?
 
 
 Quick timetable
 
 WU : testWU-1.wu
 default-515.exe : 304 seconds
 sahcuda.exe : 499 seconds
 Speedup: -64.14%, Ratio: 0.61 x
 
 WU : testWU-2.wu
 default-515.exe : 496 seconds
 sahcuda.exe : 590 seconds
 Speedup: -18.95%, Ratio: 0.84 x
 
 WU : testWU-3.wu
 default-515.exe : 541 seconds
 sahcuda.exe : 657 seconds
 Speedup: -21.44%, Ratio: 0.82 x
 
 WU : testWU-4.wu
 default-515.exe : 125 seconds
 sahcuda.exe : 123 seconds
 Speedup: 1.60%, Ratio: 1.02 x
 
 WU : testWU-5.wu
 default-515.exe : 499 seconds
 sahcuda.exe : 596 seconds
 Speedup: -19.44%, Ratio: 0.84 x
 
 WU : testWU-6.wu
 default-515.exe : 823 seconds
 sahcuda.exe : 943 seconds
 Speedup: -14.58%, Ratio: 0.87 x
 
 WU : testWU-7.wu
 default-515.exe : 361 seconds
 sahcuda.exe : 376 seconds
 Speedup: -4.16%, Ratio: 0.96 x
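
 For anyone puzzled by the negative "Speedup" values: the formulas below are inferred from the timetable above (an assumption, not taken from the Knabench script itself), but they appear to reproduce all seven rows. A minimal Python sketch:

```python
# Formulas inferred from the timetable above; an assumption, not taken
# from the Knabench script itself.

def speedup_percent(ref_seconds: float, test_seconds: float) -> float:
    """Time saved relative to the reference app, in percent (negative = slower)."""
    return (ref_seconds - test_seconds) / ref_seconds * 100.0


def ratio(ref_seconds: float, test_seconds: float) -> float:
    """Reference time divided by test time (below 1.0 = slower)."""
    return ref_seconds / test_seconds


# (work unit, default-515.exe seconds, sahcuda.exe seconds) from the post above
TIMETABLE = [
    ("testWU-1.wu", 304, 499),
    ("testWU-2.wu", 496, 590),
    ("testWU-3.wu", 541, 657),
    ("testWU-4.wu", 125, 123),
    ("testWU-5.wu", 499, 596),
    ("testWU-6.wu", 823, 943),
    ("testWU-7.wu", 361, 376),
]

if __name__ == "__main__":
    for wu, ref, test in TIMETABLE:
        print(f"{wu}: Speedup: {speedup_percent(ref, test):.2f}%, "
              f"Ratio: {ratio(ref, test):.2f} x")
```

 Read this way, a speedup of -64.14% means the test app needed 64.14% more time than the reference, and a ratio of 0.61 means it ran at 0.61 times the reference speed.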
- 
				Something is wrong on your computer ...... :o
 
 See here: http://setiathome.berkeley.edu/result.php?resultid=681495948 - this is a real work unit crunched with the latest application .....
 
 And yes, it's still slower than any CPU version ....
 
- 
				Have to reinstall Windows  :(
			
- 
				lol, nice times from Knabench - for me, my 8500 gives not just a 19% but a 100% slowdown .....
			
- 
				And yes, it's still slower than any CPU version ....
 
 
 Strange....
 
 I always thought that the Nvidia 8 Series was faster than an Intel C2D
 
 http://en.wikipedia.org/wiki/FLOPS
 "As of 2007, the fastest PC processors perform over 30 GFLOPS.[8] GPUs in PCs are considerably more powerful in terms of pure FLOPS. For example, in the GeForce 8 Series the nVidia 8800 Ultra performs around 576 GFLOPS  on 128 Processing elements. This equates to around 4.5 GFLOPS per element, compared with 2.75 per core for the Blue Gene/L. It should be noted that the 8800 series performs only Single precision calculations, and that while GPUs are highly efficient at calculations they are not as flexible as a general purpose CPU."
 
 And Nvidia promises that the new card (GeForce 9800) will be even faster. 1 or 3 (!!!!) Tflops.... http://www.nordichardware.com/index.php?news=1&action=more&id=6911
 
 I understand that this performance is not reached on every kind of task...
 Perhaps the sahcuda algorithm can be optimized?
 seti_britta is a mathematician  :)
 Could that help?  ;D
- 
				lol, nice times from Knabench - for me, my 8500 gives not just a 19% but a 100% slowdown .....
 
 8500 - 16/16 processors
 8800 GTS - 96/96 processors
- 
				Something is wrong on your computer ...... :o
 
 
 Again, I launched the knabench.
 Now all results - strongly similar  :o
 
 [attachment deleted by admin]
- 
				This code is not optimized ... for example, there are a lot of memory transfers that can be avoided, and so on ... Also, the CPU and GPU code is mixed at about 95:5 .... and async access to the device is not used ....
 
 First it must be validated, then optimized.
- 
				Mimo,
 
 In what order does clock speed impact GPU performance as far as S@H is concerned?  CPU clock, memory clock, shader clock?
 Also, do I understand correctly that the G92 8800GT has 12 FPU processors in the GPU?
 Do the shaders provide any benefit?
 
 Sorry for the questions. Just trying to understand this better.
 
 
 
- 
				Yes. For example, the 8500 GPU has two multiprocessors, and every multiprocessor has 8 unified shaders; every shader can work on 4 floats in one instruction - you can imagine that your CPU has 16*4 cores ....
 Instructions are very efficient and take few clocks; for example MADD (multiply and add) takes only 4 clocks.
 The cache is extremely effective for contiguous reads/writes - this is called coalescing.
 
 Clock speed doesn't have as great an impact on GPU performance as the shader count.
- 
				Yes. For example, the 8500 GPU has two multiprocessors, and every multiprocessor has 8 unified shaders; every shader can work on 4 floats in one instruction - you can imagine that your CPU has 16*4 cores ....
 Instructions are very efficient and take few clocks; for example MADD (multiply and add) takes only 4 clocks.
 The cache is extremely effective for contiguous reads/writes - this is called coalescing.
 
 Clock speed doesn't have as great an impact on GPU performance as the shader count.
 
 
 
 Thanks Mimo! Very interesting.  So, a higher shader count and a faster shader clock will actually have a better impact on crunching speed/potential for our purposes?  In the case of a new G92-based 8800GT, that's 112 stream processors, each of which can process 4 floats in 1 instruction.  Wow!  The interest in this becomes very clear.  Are G80/G92 stream processors scalar units, not vector processors?
 
- 
				More about the theory is at the beginning of this document: http://developer.download.nvidia.com/compute/cuda/1_1/NVIDIA_CUDA_Programming_Guide_1.1.pdf
 
 no shaders are strictly vectorized, massive parallelism is implemented in hw ....
- 
				More about the theory is at the beginning of this document: http://developer.download.nvidia.com/compute/cuda/1_1/NVIDIA_CUDA_Programming_Guide_1.1.pdf
 
 no shaders are strictly vectorized, massive parallelism is implemented in hw ....
 
 
 Thanks!
 
- 
				new app....
 
 - added GPU data chirping - if chirp rate=0 then copy else calculate ....
 
 TRY IT IN KNABENCH and please post the current file from the testdatas directory (the filename is the date) ....
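
 The "if chirp rate=0 then copy else calculate" rule above can be sketched in a few lines of Python. This is only an illustration of the idea, not the sahcuda code: the function name, the sample_rate parameter and the phase convention exp(-i*pi*rate*t^2) are all assumptions, and the sign and scaling in the real app may differ.

```python
import cmath
import math


def chirp_data(samples, chirp_rate, sample_rate):
    """Return a new list with a linear frequency drift removed.

    Assumed convention (hypothetical, for illustration): the sample taken at
    t = i / sample_rate is multiplied by exp(-1j * pi * chirp_rate * t**2).
    """
    if chirp_rate == 0.0:
        # Nothing to correct: a plain copy is enough.
        return list(samples)
    out = []
    for i, s in enumerate(samples):
        t = i / sample_rate
        phase = -math.pi * chirp_rate * t * t
        out.append(s * cmath.exp(1j * phase))
    return out
```

 Every sample is corrected independently of the others, which is why this step maps well onto one GPU thread per sample.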
 
 [attachment deleted by admin]
- 
				What is data chirping?
			
- 
				read here : http://seticlassic.ssl.berkeley.edu/about_seti/about_seti_at_home_1.html
			
- 
				New data from new version....
 Hmmm...
 Result weakly similar
 Is that OK?
 
 [attachment deleted by admin]
- 
				Weakly similar here at home too....
 
 Maybe lost precision ... but I don't know where; all the math is fully single precision and all double math is done on the CPU .....
 
 :-\ >:(
- 
				Please take any theory to the next thread.
 
 Here I will post only binaries ...
- 
				OK, I'm at a loss as to how to get the client to run.  I dropped it into the folder with the WUs, and edited the app_info.xml
 
 Perhaps there is some fine point I'm missing.
- 
				OK, I'm at a loss as to how to get the client to run.  I dropped it into the folder with the WUs, and edited the app_info.xml
 
 Perhaps there is some fine point I'm missing.
 
 
 You can download KWSN Knabench 1.43 Benchmark Package from the download section of this site. Then follow the advice in readme.
 
 I tested the latest version. The results are different from those of the optimised CPU application.
 
 During the test, a huge amount of FFT timing output is displayed, and it makes the PC not very responsive. Is that intended? The GPU speed is very slow; is that due to the display?
 
- 
				OK, I'm at a loss as to how to get the client to run.  I dropped it into the folder with the WUs, and edited the app_info.xml
 
 Perhaps there is some fine point I'm missing.
 
 Copy sahcuda.exe to \KWSN Knabench 1.43\Science_apps\ and run Knabench-1.43.cmd
- 
				here i will place only binaries ...
 
 When can we expect the new version?
- 
				maybe after xmas
			
- 
				okay, here it is ...
 
 - part of find spike on gpu ...
 
 [attachment deleted by admin]
- 
				testing....
 
 Ended at    : 01:31:33
 Elapsed time: 348 seconds
 Speedup     : -32.32%
 Ratio       : 0.76 x
 Result      : weakly similar.
 ------------
 
 Collecting hardware / OS infos, please wait...
 Sorting ...
 
 Bench results file 11.01.2008-131-HOME-0C501089AC-bench.txt
 stored in Knabench\Testdatas\ directory.
 
 Quick timetable
 
 WU : testWU-1.wu
 setiathome_5.27_windows_intelx86.exe : 263 seconds
 sahcuda.exe : 348 seconds
 Speedup: -32.32%, Ratio: 0.76 x
 
 PS
 I'm starting to test all 7 WUs.
- 
				What card do you have?
 
- 
				Tried new app on 8 wu's.
 
 Asus EN8800GT 256 MB card.
 P4 3.0 ghz HT off.
 
 Testdata file enclosed.
 
 
 
 [attachment deleted by admin]
- 
				7 WUs
 8800 GTS 320
 
 [attachment deleted by admin]
- 
				new code ....
 
 - FindSpike now runs fully on the GPU - the result is now only the one best spike instead of the whole array - removed the CPU analyze loop
 
 (for info - GPU code is now 400kb big ;D)
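
 The point of returning one best spike instead of the whole array is that only a single small record has to come back from the device. A minimal CPU-side Python sketch of that reduction follows; the function name and the normalisation of the peak by the mean power are assumptions made for illustration (on the GPU this would be a parallel reduction, which is not shown here).

```python
def find_best_spike(power):
    """Return (bin_index, peak) for the strongest bin of one power spectrum.

    Assumption for illustration: the peak is reported relative to the mean
    power of the spectrum, so a flat spectrum gives a peak of 1.0.
    """
    if not power:
        raise ValueError("empty power spectrum")
    mean = sum(power) / len(power)
    best_bin = max(range(len(power)), key=lambda i: power[i])
    peak = power[best_bin] / mean if mean > 0 else 0.0
    return best_bin, peak
```

 For example, find_best_spike([1.0, 1.0, 6.0, 1.0, 1.0]) picks bin 2, whose power of 6.0 is three times the mean of 2.0.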
 
 [attachment deleted by admin]
- 
				Tried latest app.
 Performance about the same as previous app.
 Close to stock performance at some angle range values.
 
 
 [attachment deleted by admin]
- 
				New data.
 
 All results - weakly similar or different   :(
 
 [attachment deleted by admin]
- 
				Placed sahcuda.exe under KWSN Knabench 1.43\Science_apps
 Started Knabench-1.43.cmd
 Then....
 ------------
 Running app : default-515.exe with -nographics
 with WU     : testWU-1.wu
 Started at  : 02:19:54
 Access is denied.
 Ended at    : 02:25:50
 Elapsed time: 356 seconds
 Result      : stored as ref for validation.
 ------------
 Running app : sahcuda.exe with -verbose
 with WU     : testWU-1.wu
 Started at  : 02:25:50
 
 And here the fun ended with message for sahcuda.exe - Unable to Locate component:
 "The application has failed to start because cufft.dll was not found. Re-installing the application may fix this problem."
 
 I have not installed cuda 1.0 or beta 1.1 <--- do I need to do that?
 E6600
 2 GB ram
 Asus GTX 8800 with bios 60.80.08.00.44
 driver 6.14.11.6921 date 05.12.2007
 
 Devaster please beat my guess in this post: (23. mar 2008)http://setiathome.berkeley.edu/forum_thread.php?id=35342
 
 [thumbs up]
 
 q:
 1 Access is denied. ?
 2 Do I need to install cuda
 3 Anything else I'm doing wrong
- 
				
 SETI@home error -108 Unknown error
 from boinc_init_diagnostics()
 File: ..\main.cpp
 Line: 239
 
 
 Got the same error for all apps...
 
 win xp
 8600GTS
 169.29 drivers
- 
				"The application has failed to start because cufft.dll was not found. Re-installing the application may fix this problem." 
 
 I have not installed cuda 1.0 or beta 1.1 <--- do I need to do that?
 
 q:
 1 Access is denied. ?
 2 Do I need to install cuda
 3 Anything else I'm doing wrong
 
 
 Actually, all the DLLs that the CUDA runtime consists of need to be in a place that the OS searches for DLLs (simply put, in the library search path). If you use CUDA only for SETI, it is probably better to put those DLLs next to the exe file. When I tried CUDA a few months ago, the list of needed DLLs was cuda.dll, cudart.dll, cufft.dll, cutil32.dll (maybe slightly redundant).
 
 
- 
				My mistake - I didn't add cufft.dll to the package.
 
 You don't need the other libraries - they are already included in the driver ...
 
 If you want to install CUDA, use only v1.1 - the app is compiled with this version, and cufft.dll (1.1) is about 2x faster on CUDA v1.1 devices than cufft.dll from v1.0 ....
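
 Before running a test it can help to check whether cufft.dll is actually where the loader will look. The Python sketch below is a simplified stand-in for the real Windows DLL search (it only checks the application folder and then the PATH directories, which is an assumption about what matters here); find_dll is a hypothetical helper, not part of Knabench or CUDA.

```python
import os
from pathlib import Path


def find_dll(name, app_dir, path_env=None):
    """Return the first existing path to `name`, or None if it is not found.

    Simplified search order (an assumption, not the full Windows rules):
    the application's own folder first, then each directory on PATH.
    """
    if path_env is None:
        path_env = os.environ.get("PATH", "")
    candidates = [Path(app_dir)]
    candidates += [Path(p) for p in path_env.split(os.pathsep) if p]
    for directory in candidates:
        candidate = directory / name
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    # "." stands for the folder that holds sahcuda.exe.
    print(find_dll("cufft.dll", ".") or "cufft.dll not found")
```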
- 
				Installed CUDA beta 1.1 and sahcuda did run.
 The access denied message also disappeared, dunno why.
 Not sure, but could it be that Knabench needs BOINC to run Seti as a service?
 
 As seen in the bench result file, there is some debug info which I believe comes from the fact that I do not run Seti as a service.
 
 Also I got a console message '.\tools\wu_time' is not recognized as an internal or external command.
 
 If massive testing is needed, I suggest that a consistent howto should be written, and the needed DLLs should be included.
 
 Until next time - have a nice day :)
 
 [attachment deleted by admin]
- 
				Not sure but can it be that Knabench need boinc to run seti as a service?
 
 The bench doesn't start anything as a service. It simply calls, one after another, all the exe files found in 2 subdirectories and logs the run time for each exe. What it does do is stop the BOINC service at the beginning and start it again at the end of the test.
 
 Also I got a console message '.\tools\wu_time' is not recognized as an internal or external command.
 
 When I checked my PC with antivirus programs, one of them flagged this tool. Maybe your antivirus blocks its execution too? Anyway, wu_time.exe is not needed for the main objective of the bench, which is to log execution times. So you could just delete the exe and ignore all the errors that arise from that.
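
 The core of what the bench does, as described above, is "run each app in turn and log how long it took". A minimal Python sketch of that loop follows; it is not the Knabench batch script, and the function names are made up for this example.

```python
import subprocess
import sys
import time


def time_command(cmd, cwd=None):
    """Run `cmd` to completion; return (elapsed whole seconds, exit code)."""
    start = time.monotonic()
    proc = subprocess.run(cmd, cwd=cwd)
    return round(time.monotonic() - start), proc.returncode


def run_bench(commands):
    """Run each labelled command one after another; return {label: seconds}."""
    results = {}
    for label, cmd in commands:
        seconds, _exit_code = time_command(cmd)
        results[label] = seconds
    return results


if __name__ == "__main__":
    # Stand-in command; a real run would list the science app executables.
    print(run_bench([("noop", [sys.executable, "-c", "pass"])]))
```

 Nothing here depends on wu_time.exe, which matches the remark that the tool is not needed just to log execution times.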
 
- 
				new code (cufft.dll v1.1 included)...
 
 
 - PowerSpectrum transpose moved to the GPU ....
 - eliminated one GPU to CPU memory transfer ....
 
 [attachment deleted by admin]
- 
				New results
 
 [attachment deleted by admin]
- 
				Results from latest client.
 
 
 [attachment deleted by admin]
- 
				When I checked my PC with antivirus programs, one of them flagged this tool. Maybe your antivirus blocks its execution too? Anyway, wu_time.exe is not needed for the main objective of the bench, which is to log execution times. So you could just delete the exe and ignore all the errors that arise from that.
 Thx Raistmer, I will check it out, or just ignore it as you suggest :)
- 
				As Raistmer indicated, yes, my AVG does report that wu_time.exe is infected with worm/Sohanad.F, and because of that AVG blocks access to the file.
 
 I updated my GPU bios from last time. Test result from last app for Devaster attached.
 
 Result      : weakly similar.
 Result      : weakly similar.
 Result      : DIFFERENT.
 Result      : weakly similar.
 Result      : DIFFERENT.
 Result      : weakly similar.
 Result      : weakly similar.
 Speedup: -198.21%, Ratio: 0.34 x
 Speedup: 9.02%, Ratio: 1.10 x
 Speedup: 12.09%, Ratio: 1.14 x
 Speedup: -34.23%, Ratio: 0.74 x
 Speedup: 8.70%, Ratio: 1.10 x
 Speedup: 23.23%, Ratio: 1.30 x
 Speedup: 26.87%, Ratio: 1.37 x
 
 
 [attachment deleted by admin]
- 
				I see that in some cases the GPU application is better than the stock app .... :o
 
 interesting ....
- 
				I see that in some cases the GPU application is better than the stock app .... :o
 
 interesting ....
 
 
 Hm...
 This is 8800GTX  :)
- 
				There are too many variables to make an accurate assessment of the true speedup of GPU client.
 When I use default-515.exe as the reference client I also get Speedup Ratio  greater than 1.00 for some WU's.
 Devaster what source files are you using to base your GPU client on, and what optimizations are done?
 If you compiled a reference client to be used against your GPU client then most of the variables would
 be known.
 
 
- 
				In my tests at home I am using the latest optimized app from the cruncher page, and I only have an 8500GT ....
 
 Remember, for now there are no optimizations in the GPU code (shared memory usage, coalesced memory access on read/write, optimal thread/block scheduling per core). Only partial loop unrolling is used ...
- 
				Last weekend I downloaded the GUI profiler for CUDA. It showed many interesting things .... I will write more later - it will be a longer post ...
			
- 
				Hi Mimo,
 
 Just saw this recent update on the Cuda developer forum.
 
 http://forums.nvidia.com/index.php?showtopic=34241
 
 Any benefit to your efforts?
- 
				SETI@HOME uses 1d FFT and this is batched naturally in CUFFT library ....
			
- 
				SETI@HOME uses 1d FFT and this is batched naturally in CUFFT library ....
 
 
 Sorry  :-[.  My inquiry/interest was related to the batching, not the 2d fft itself since as you pointed-out, Seti uses 1d fft single precision complex ^2.
 Didn't realize the batching was already done in the CUFFT library however.
 Just ignore "the little man from behind the curtain"..... :P
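
 For readers wondering what "batched" means here: many independent 1D transforms of the same length are computed in one call, as in the "fftlen:8 batch size:131072" log line earlier in the thread, and the power spectrum is the squared magnitude of each complex output bin. The Python sketch below illustrates that with a naive O(n^2) DFT; it is not the CUFFT API and says nothing about how sahcuda calls it.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform of one block of complex samples."""
    n = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
        for j in range(n)
    ]


def batched_power_spectrum(samples, fft_len):
    """Split `samples` into blocks of fft_len, transform each block
    independently, and return one power spectrum (re^2 + im^2) per block."""
    if fft_len <= 0 or len(samples) % fft_len:
        raise ValueError("sample count must be a positive multiple of fft_len")
    spectra = []
    for start in range(0, len(samples), fft_len):
        spectrum = dft(samples[start:start + fft_len])
        spectra.append([z.real ** 2 + z.imag ** 2 for z in spectrum])
    return spectra
```

 Because the blocks do not depend on each other, a batch like this is the natural unit of work to hand to the GPU in a single call.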
 
 
 
- 
				try
 
 [attachment deleted by admin]
- 
				What has changed?
			
- 
				New data...
 
 [attachment deleted by admin]
- 
				my first run...
 
 8600GTS
 
 Cpu and vid card not OC'ed.
 
 ~BoB
 
 [attachment deleted by admin]
- 
				- changed data alignment to multiples of two (float2,float4) -some speedup
 
- 
				my first run...
 
 8600GTS
 
 Cpu and vid card not OC'ed.
 
 ~BoB
 
 
 More multiprocessors and more MHz than mine, and better results ...
- 
				My result, open attachment
 
 Result      : weakly similar.
 Result      : weakly similar.
 Result      : DIFFERENT.
 Result      : weakly similar.
 Result      : DIFFERENT.
 Result      : weakly similar.
 Result      : weakly similar.
 Speedup: 5.88%, Ratio: 1.06 x
 Speedup: 9.83%, Ratio: 1.11 x
 Speedup: 11.48%, Ratio: 1.13 x
 Speedup: -31.30%, Ratio: 0.76 x
 Speedup: 10.21%, Ratio: 1.11 x
 Speedup: 24.52%, Ratio: 1.32 x
 Speedup: 28.01%, Ratio: 1.39 x
 
 [attachment deleted by admin]
- 
				new app:
 
 - fixed the chirp rate in findspike (typo in chirpdata);
 - some host code cleanup - maybe some speedup.
 
 [attachment deleted by admin]
- 
				New result
 
 [attachment deleted by admin]
- 
				Hello
 Is it possible to provide a build for 64-bit WinXP?
 Because on this OS I always get
 "
 SETI@home error -108 Unknown error
 from boinc_init_diagnostics()
 File: ..\main.cpp
 Line: 239
 "
 
- 
				new app
 
 - fixed the detection freq in findspike
 - compiled with use_fast_math on the GPU
 
 
 x64: not for now; I haven't installed x64 WinXP and nvcc is not a cross compiler ....
 
 [attachment deleted by admin]
- 
				New result
 
 Result      : weakly similar.
 Result      : weakly similar.
 Result      : DIFFERENT.
 Result      : weakly similar.
 Result      : DIFFERENT.
 Result      : weakly similar.
 Result      : weakly similar.
 Speedup: 6.18%, Ratio: 1.07 x
 Speedup: 8.78%, Ratio: 1.10 x
 Speedup: 10.67%, Ratio: 1.12 x
 Speedup: -32.14%, Ratio: 0.76 x
 Speedup: 9.59%, Ratio: 1.11 x
 Speedup: 23.49%, Ratio: 1.31 x
 Speedup: 27.27%, Ratio: 1.38 x
 
 [attachment deleted by admin]
- 
				Q6600 3Ghz vs 8800GTS 320
 
 [attachment deleted by admin]
- 
				Hello
 Is it possible to provide a build for 64-bit WinXP?
 Because on this OS I always get
 "
 SETI@home error -108 Unknown error
 from boinc_init_diagnostics()
 File: ..\main.cpp
 Line: 239
 "
 
 
 
 Make sure you copy the reference app's init_data.xml into the folder of the GPU app. I was having the exact same error until I copied that file over.
 
 ~BoB
- 
				Ok, will test this, thanx
			
- 
				OK, I want to test this, but I need explicit instructions on how to set it up and what to test.  Can someone point me to instructions on what to do?
 
 Thanks!
 
- 
				Getting these errors.
 
 I downloaded KNABench 1.43 and replaced default 5.15.exe with the optimized client exe.  I also copied in sahcuda.exe.
 
 
 = Knabench 1.43 W32-W64 02/12/2007 by Kna + Simon =
 = mods: quick timetable, stderr, speedup/ratio    =
 
 7 testWU(s) found
 └─(testWU-1.wu)
 └─(testWU-2.wu)
 └─(testWU-3.wu)
 └─(testWU-4.wu)
 └─(testWU-5.wu)
 └─(testWU-6.wu)
 └─(testWU-7.wu)
 
 2 reference science app(s) found
 └─(KWSN_2.4_SSE3-Core2_MB.exe)
 └─(sahcuda.exe)
 
 0 science app(s) found
 
 ======================================
 
 Stopping Boinc ...
 System error 1060 has occurred.
 
 The specified service does not exist as an installed service.
 
 ------------
 Running app : KWSN_2.4_SSE3-Core2_MB.exe with -nographics
 with WU     : testWU-1.wu
 Started at  : 22:39:42
 Ended at    : 22:41:55
 Elapsed time: 133 seconds
 ------------
 Running app : sahcuda.exe with -nographics
 with WU     : testWU-1.wu
 Started at  : 22:41:55
 Ended at    : 22:41:56
 Elapsed time: 1 seconds
 ------------
 The system cannot find the batch label specified - NOSCAPPS
 Running app : !refapp1! with -nographics
 with WU     : !wunbr2!
 Started at  : !time_tmp!
 '!refapp1!' is not recognized as an internal or external command,
 operable program or batch file.
 Ended at    : !time_tmp!
 0Elapsed time: !elapsed_time_stock! seconds
 Could Not Find C:\Documents and Settings\Rich\Desktop\KWSN Knabench 1.43\KWSN Kn
 abench 1.43\science_apps\reference\*.sah
 ------------
 Running app : !refapp2! with -nographics
 with WU     : !wunbr2!
 Started at  : !time_tmp!
 '!refapp2!' is not recognized as an internal or external command,
 operable program or batch file.
 Ended at    : !time_tmp!
 0Elapsed time: !elapsed_time_stock! seconds
 Could Not Find C:\Documents and Settings\Rich\Desktop\KWSN Knabench 1.43\KWSN Kn
 abench 1.43\science_apps\reference\*.sah
 ------------
 Running app : !refapp1! with -nographics
 with WU     : !wunbr3!
 Started at  : !time_tmp!
 '!refapp1!' is not recognized as an internal or external command,
 operable program or batch file.
 Ended at    : !time_tmp!
 0Elapsed time: !elapsed_time_stock! seconds
 Could Not Find C:\Documents and Settings\Rich\Desktop\KWSN Knabench 1.43\KWSN Kn
 abench 1.43\science_apps\reference\*.sah
 ------------
 Running app : !refapp2! with -nographics
 with WU     : !wunbr3!
 Started at  : !time_tmp!
 '!refapp2!' is not recognized as an internal or external command,
 operable program or batch file.
 Ended at    : !time_tmp!
 0Elapsed time: !elapsed_time_stock! seconds
 Could Not Find C:\Documents and Settings\Rich\Desktop\KWSN Knabench 1.43\KWSN Kn
 abench 1.43\science_apps\reference\*.sah
 ------------
 Running app : !refapp1! with -nographics
 with WU     : !wunbr4!
 Started at  : !time_tmp!
 '!refapp1!' is not recognized as an internal or external command,
 operable program or batch file.
 Ended at    : !time_tmp!
 0Elapsed time: !elapsed_time_stock! seconds
 Could Not Find C:\Documents and Settings\Rich\Desktop\KWSN Knabench 1.43\KWSN Kn
 abench 1.43\science_apps\reference\*.sah
 ------------
 Running app : !refapp2! with -nographics
 with WU     : !wunbr4!
 Started at  : !time_tmp!
 '!refapp2!' is not recognized as an internal or external command,
 operable program or batch file.
 Ended at    : !time_tmp!
 0Elapsed time: !elapsed_time_stock! seconds
 Could Not Find C:\Documents and Settings\Rich\Desktop\KWSN Knabench 1.43\KWSN Kn
 abench 1.43\science_apps\reference\*.sah
 ------------
 Running app : !refapp1! with -nographics
 with WU     : !wunbr5!
 Started at  : !time_tmp!
 '!refapp1!' is not recognized as an internal or external command,
 operable program or batch file.
 Ended at    : !time_tmp!
 0Elapsed time: !elapsed_time_stock! seconds
 Could Not Find C:\Documents and Settings\Rich\Desktop\KWSN Knabench 1.43\KWSN Kn
 abench 1.43\science_apps\reference\*.sah
 ------------
 Running app : !refapp2! with -nographics
 with WU     : !wunbr5!
 Started at  : !time_tmp!
 '!refapp2!' is not recognized as an internal or external command,
 operable program or batch file.
 Ended at    : !time_tmp!
 0Elapsed time: !elapsed_time_stock! seconds
 Could Not Find C:\Documents and Settings\Rich\Desktop\KWSN Knabench 1.43\KWSN Kn
 abench 1.43\science_apps\reference\*.sah
 ------------
 Running app : !refapp1! with -nographics
 with WU     : !wunbr6!
 Started at  : !time_tmp!
 '!refapp1!' is not recognized as an internal or external command,
 operable program or batch file.
 Ended at    : !time_tmp!
 0Elapsed time: !elapsed_time_stock! seconds
 Could Not Find C:\Documents and Settings\Rich\Desktop\KWSN Knabench 1.43\KWSN Kn
 abench 1.43\science_apps\reference\*.sah
 ------------
 Running app : !refapp2! with -nographics
 with WU     : !wunbr6!
 Started at  : !time_tmp!
 '!refapp2!' is not recognized as an internal or external command,
 operable program or batch file.
 Ended at    : !time_tmp!
 0Elapsed time: !elapsed_time_stock! seconds
 Could Not Find C:\Documents and Settings\Rich\Desktop\KWSN Knabench 1.43\KWSN Kn
 abench 1.43\science_apps\reference\*.sah
 ------------
 Running app : !refapp1! with -nographics
 with WU     : !wunbr7!
 Started at  : !time_tmp!
 '!refapp1!' is not recognized as an internal or external command,
 operable program or batch file.
 Ended at    : !time_tmp!
 0Elapsed time: !elapsed_time_stock! seconds
 Could Not Find C:\Documents and Settings\Rich\Desktop\KWSN Knabench 1.43\KWSN Kn
 abench 1.43\science_apps\reference\*.sah
 ------------
 Running app : !refapp2! with -nographics
 with WU     : !wunbr7!
 Started at  : !time_tmp!
 '!refapp2!' is not recognized as an internal or external command,
 operable program or batch file.
 Ended at    : !time_tmp!
 0Elapsed time: !elapsed_time_stock! seconds
 Could Not Find C:\Documents and Settings\Rich\Desktop\KWSN Knabench 1.43\KWSN Kn
 abench 1.43\science_apps\reference\*.sah
 ------------
 
 C:\Documents and Settings\Rich\Desktop\KWSN Knabench 1.43\KWSN Knabench 1.43>z
 'z' is not recognized as an internal or external command,
 operable program or batch file.
 
 C:\Documents and Settings\Rich\Desktop\KWSN Knabench 1.43\KWSN Knabench 1.43>
 
 
 No idea what to do with these errors.  Please help.
- 
				okay, some new builds - nothing changed, only the build options, and VC++ 2008 was used ...
 
 - sahvarch10.exe - virtual architecture v1.0 - default GPU code; the GPU code is partially compiled by the driver at app start, based on the installed GPU, targeting the old architecture v1.0 (G80)
 - sahvarch11.exe - virtual architecture v1.1 - default GPU code; the GPU code is partially compiled by the driver at app start, based on the installed GPU, but targeting the new architecture v1.1 (G82,84,86,92)
 - sahgpu10.exe - GPU architecture v1.0 - GPU code is generated directly for GPU architecture v1.0 without any driver stage (binary image loaded to the GPU) (G80)
 - sahgpu11.exe - GPU architecture v1.1 - GPU code is generated directly for GPU architecture v1.1 without any driver stage (binary image loaded to the GPU) (G82,84,86,92)
 
 with a virtual architecture, pseudo code is generated and this code is finally compiled and optimized by the driver at application start; with a GPU architecture, the final binary image of the GPU code is generated ...
 
 v1.1 has some optimizations that use new features implemented in the v1.1 arch.
 
 v1.1 will not run on v1.0 !!!
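 
 For readers wondering how the four builds differ, here is a minimal sketch. It assumes (this is not confirmed in the thread) that the only difference is nvcc's `-arch`/`-code` options: a `compute_xx` code target embeds pseudo code (PTX) that the driver compiles at start, while an `sm_xx` code target embeds a ready binary image. The file name `kernels.cu` and the helper names are made up for illustration, and current CUDA toolkits no longer accept the 1.x targets.

```python
# Hypothetical mapping of the four build names to nvcc (-arch, -code) targets.
# Assumption: the builds differ only in these two options.
BUILDS = {
    "sahvarch10.exe": ("compute_10", "compute_10"),  # PTX only, finished by the driver
    "sahvarch11.exe": ("compute_11", "compute_11"),  # PTX only, needs a 1.1 device
    "sahgpu10.exe": ("compute_10", "sm_10"),         # binary image for v1.0 GPUs
    "sahgpu11.exe": ("compute_11", "sm_11"),         # binary image for v1.1 GPUs
}


def nvcc_args(build, source="kernels.cu"):
    """Return an nvcc command line (as a list) for one of the builds above."""
    arch, code = BUILDS[build]
    return ["nvcc", "-arch=" + arch, "-code=" + code, "-c", source]


def needs_driver_compile(build):
    """True if the build ships pseudo code that the driver compiles at app start."""
    _, code = BUILDS[build]
    return code.startswith("compute_")
```

 For example, `nvcc_args("sahgpu11.exe")` yields the options for a v1.1 binary image, and `needs_driver_compile("sahvarch10.exe")` is true because that build carries only pseudo code.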
 
 [attachment deleted by admin]
- 
				Getting these errors.
 
 I downloaded KNABench 1.43 and replaced default 5.15.exe with the optimized client exe.  I also copied in sahcuda.exe.
 
 
 
 = Knabench 1.43 W32-W64 02/12/2007 by Kna + Simon =
 = mods: quick timetable, stderr, speedup/ratio    =
 
 7 testWU(s) found
 └─(testWU-1.wu)
 └─(testWU-2.wu)
 └─(testWU-3.wu)
 └─(testWU-4.wu)
 └─(testWU-5.wu)
 └─(testWU-6.wu)
 └─(testWU-7.wu)
 
 2 reference science app(s) found
 └─(KWSN_2.4_SSE3-Core2_MB.exe)
 └─(sahcuda.exe)
 
 0 science app(s) found
 ...
 
 No idea what to do with these errors.  Please help.
 
 Move sahcuda.exe to the Science_apps folder. Knabench is designed to check the results of one or more Science apps against one reference app.
 Joe
- 
				Rename KWSN_2.4_SSE3-Core2_MB.exe to default 5.15.exe and replace
			
- 
				setiathome_5.27_windows_intelx86.exe vs sahcudagpu10.exe vs sahcudavarch10.exe
 
 [attachment deleted by admin]
- 
				Btw for the record.. cuda application works in X64 Xp environment if u use 32 bit cuda toolkit in Windows XP x86-64 instead of 64 bit.  Noticed that today!  ;D
			
- 
				Btw for the record.. cuda application works in X64 Xp environment if u use 32 bit cuda toolkit in Windows XP x86-64 instead of 64 bit.  Noticed that today!  ;D
 
 Hm, did you do a complete toolkit install or just use the separate DLLs needed by seticuda?
- 
				Complete!
			
- 
				Thanks for the help thus far, all.
 
 I renamed the optimized client to default-5-15.exe and moved sahcuda.exe to the Science_apps folder.  The bench ran correctly, but I ran into more errors.
 
 I've attached the log.
 
 
 
 [attachment deleted by admin]
- 
				Curious, how do i test the sahcuda.exe? 
 I installed cuda 1.0 + sdk but all i get is:
 
 SETI@home error -108 Unknown error
 from boinc_init_diagnostics()
 File: ..\main.cpp
 Line: 239
 
 Do i need to install cuda 1.1? or do i need more to run sahcuda (or the sahcudavarch11 etc)
- 
				this is not related to CUDA - CUDA errors look different ....
 
- 
				I am confused by all the info in here, so I don't really understand what I need to do to get this up and running.
 I have a Vista 64-bit machine; however, my understanding is that if I install the 32-bit CUDA software there should be no issues.
 
 I have a quad-core at 3.4 GHz
 and 3 8800 Ultras in an SLI configuration. I have to use the 169.25 drivers for this to work; however, that is a higher revision than the .11
 4 GB of memory
 
 Currently, if I run Knabench 1.43 at default settings it sets the affinity to CPU 1; if I set it to use 0-3 it still only uses 25% of the total power of the CPU.
 
 I also do not know where to put sahcuda, sahcudagpu11 and sahcudavarch11.
 
 I also see many references to XML files, but I don't know where these come from, what edits I need to make to them, or where they need to go.
 
 I moved the sahcuda exes into the science apps location.
 
 
 = Knabench 1.43 W32-W64 02/12/2007 by Kna + Simon =
 = mods: quick timetable, stderr, speedup/ratio    =
 
 7 testWU(s) found
 └─(testWU-1.wu)
 └─(testWU-2.wu)
 └─(testWU-3.wu)
 └─(testWU-4.wu)
 └─(testWU-5.wu)
 └─(testWU-6.wu)
 └─(testWU-7.wu)
 
 1 reference science app(s) found
 └─(default-515.exe)
 
 3 science app(s) found
 └─(sahcuda.exe)
 └─(sahcudagpu11.exe)
 └─(sahcudavarch11.exe)
 
 ======================================
 
 Stopping Boinc ...
 The BOINC service is not started.
 
 More help is available by typing NET HELPMSG 3521.
 
 ------------
 Running app : default-515.exe with -nographics
 with WU     : testWU-1.wu
 Started at  : 11:34:34
 Ended at    : 11:39:49
 Elapsed time: 315 seconds
 Result      : stored as ref for validation.
 ------------
 Running app : sahcuda.exe with -verbose
 with WU     : testWU-1.wu
 Started at  : 11:39:49
 Ended at    : 11:39:50
 Elapsed time: 1 seconds
 Speedup     : 99.68%
 Ratio       : 315.00 x
 Result      : DIFFERENT.
 ------------
 Running app : sahcudagpu11.exe with -verbose
 with WU     : testWU-1.wu
 Started at  : 11:39:50
 Ended at    : 11:39:51
 Elapsed time: 1 seconds
 Speedup     : 99.68%
 Ratio       : 315.00 x
 Result      : DIFFERENT.
 ------------
 Running app : sahcudavarch11.exe with -verbose
 with WU     : testWU-1.wu
 Started at  : 11:39:51
 Ended at    : 11:39:51
 Elapsed time: 0 seconds
 Speedup     : 100.00%
 Ratio       : 1.#J x
 Result      : DIFFERENT.
 ------------
 Running app : default-515.exe with -nographics
 with WU     : testWU-2.wu
 Started at  : 11:39:51
 
 
 Quick timetable
 
 WU : testWU-1.wu
 default-515.exe : 315 seconds
 sahcuda.exe : 1 seconds
 Speedup: 99.68%, Ratio: 315.00 x
 sahcudagpu11.exe : 1 seconds
 Speedup: 99.68%, Ratio: 315.00 x
 sahcudavarch11.exe : 0 seconds
 Speedup: 100.00%, Ratio: 1.#J x
 
 WU : testWU-2.wu
 default-515.exe : 365 seconds
 sahcuda.exe : 0 seconds
 Speedup: 100.00%, Ratio: 1.#J x
 sahcudagpu11.exe : 1 seconds
 Speedup: 99.73%, Ratio: 365.00 x
 sahcudavarch11.exe : 0 seconds
 Speedup: 100.00%, Ratio: 1.#J x
 
 WU : testWU-3.wu
 default-515.exe : 396 seconds
 sahcuda.exe : 1 seconds
 Speedup: 99.75%, Ratio: 396.00 x
 sahcudagpu11.exe : 0 seconds
 Speedup: 100.00%, Ratio: 1.#J x
 sahcudavarch11.exe : 1 seconds
 Speedup: 99.75%, Ratio: 396.00 x
 
 WU : testWU-4.wu
 default-515.exe : 105 seconds
 sahcuda.exe : 0 seconds
 Speedup: 100.00%, Ratio: 1.#J x
 sahcudagpu11.exe : 0 seconds
 Speedup: 100.00%, Ratio: 1.#J x
 sahcudavarch11.exe : 0 seconds
 Speedup: 100.00%, Ratio: 1.#J x
 
 WU : testWU-5.wu
 default-515.exe : 368 seconds
 sahcuda.exe : 1 seconds
 Speedup: 99.73%, Ratio: 368.00 x
 sahcudagpu11.exe : 0 seconds
 Speedup: 100.00%, Ratio: 1.#J x
 sahcudavarch11.exe : 0 seconds
 Speedup: 100.00%, Ratio: 1.#J x
 
 WU : testWU-6.wu
 default-515.exe : 580 seconds
 sahcuda.exe : 1 seconds
 Speedup: 99.83%, Ratio: 580.00 x
 sahcudagpu11.exe : 1 seconds
 Speedup: 99.83%, Ratio: 580.00 x
 sahcudavarch11.exe : 1 seconds
 Speedup: 99.83%, Ratio: 580.00 x
 
 WU : testWU-7.wu
 default-515.exe : 258 seconds
 sahcuda.exe : 1 seconds
 Speedup: 99.61%, Ratio: 258.00 x
 sahcudagpu11.exe : 1 seconds
 Speedup: 99.61%, Ratio: 258.00 x
 sahcudavarch11.exe : 1 seconds
 Speedup: 99.61%, Ratio: 258.00 x
 
 
 ======================================
 
 
 The 0-1 sec run times make me believe that it is not running at all and is getting some sort of error?
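 
 As a sanity check on the timetable above: the Speedup and Ratio figures are consistent with the two formulas sketched below. This is inferred from the numbers in the log, not taken from the Knabench script itself, and the 1% threshold in `looks_like_early_exit` is an arbitrary illustration. The `1.#J` entries are probably just how the runtime printed an infinite ratio after dividing by a 0 second run time.

```python
import math


def speedup_and_ratio(ref_seconds, app_seconds):
    """Return (speedup in percent, ratio) for a reference run and a test run."""
    speedup = (ref_seconds - app_seconds) / ref_seconds * 100.0
    # A 0 second run gives an infinite ratio instead of a division error.
    ratio = ref_seconds / app_seconds if app_seconds > 0 else math.inf
    return speedup, ratio


def looks_like_early_exit(ref_seconds, app_seconds, min_fraction=0.01):
    """Flag runs that finish in under min_fraction of the reference time.

    The threshold is hypothetical; a run this short more likely crashed
    at start-up than crunched a whole work unit.
    """
    return app_seconds < ref_seconds * min_fraction


s, r = speedup_and_ratio(315, 1)
print(f"Speedup: {s:.2f}%, Ratio: {r:.2f} x")  # Speedup: 99.68%, Ratio: 315.00 x
```

 With the testWU-1 figures (315 s reference, 1 s for sahcuda.exe) `looks_like_early_exit` returns True, which fits the "Result: DIFFERENT" lines: the app most likely stopped with an error right after starting.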
 
 
 If I try to run the files by themselves in the SETI project's location I get
 SETI@home error -108 Unknown error
 from boinc_init_diagnostics()
 File: ..\main.cpp
 Line: 239
 
 SETI@home error -108 Unknown error
 from boinc_init_diagnostics()
 File: ..\main.cpp
 Line: 239
 
 SETI@home error -108 Unknown error
 from boinc_init_diagnostics()
 File: ..\main.cpp
 Line: 239
 
 I see that someone says to move the init_data.xml file, but I don't know where it comes from, where to move it to, or what edits to make.
 
 
 OK, with BOINC I figured out that the app_info.xml file really just tells the service which exe to run. I got it working with KWSN_2.4_SSSE3_IPP_Ben-Joe; however, when I tried to set it to any of the 3 sahcuda exes it would fail to initialize and spam the message log.
 
 Also, are jobs of under 2 hours normal? Or is that only because I have just set up BOINC and it is still trying to establish a baseline?
- 
				i have a vista 64 bit machine however my understanding is that if i installed the cuda 32bit s/w then no issues
 
 ONLY Windows XP....
 http://www.nvidia.com/object/cuda_get.html#windows
- 
				i have a vista 64 bit machine however my understanding is that if i installed the cuda 32bit s/w then no issues
 
 ONLY Windows XP....
 http://www.nvidia.com/object/cuda_get.html#windows
 
 
 3-way SLI will only work under Vista.
 
 Do any of the compatibility tools work?
 
 I also have the ability to load virtual machines, or would CUDA not work under VMware?
- 
				hey all,
 
 It's been a while, but I finally got an 8000 series video card (8600 GTS).  So now I can start testing... (it's in a computer running Vista; is that going to be an issue?)
 
 I also picked up another 7800GTX card but I don't think that CUDA runs on the 7000 series card (please confirm).
 
 -citroja
- 
				to clone :
 
 try it - I don't know if it is feasible, but try ....
 
 
 I think if VMware doesn't virtualize the graphics card then it can run, but if the graphics card is virtual then no: CUDA needs direct access to the hardware through the kernel ...
 
 to citroja :
 
 no, only WinXP
- 
				ok...well it looks like I am still sitting this one out :(
 
 I will keep an eye on the posts and help as i can.
 
 Good Luck!
 
 -citroja
- 
				VMware creates a video device
 
 "VMware SVGA ll"
 
 So i guess i am out of luck unless i load xp
 
 Currently i avg about 30 min a unit per core so 8 units an hour
 
 How much faster would the 3 8800 ultra's be ?
- 
				Whatever happened to the version that works on on other cards than 8xxx cards?
 
 I am talking about this thread: http://lunatics.kwsn.net/windows/gpu-crunching-question.180.html
 
 
 
- 
				I have stopped development of this app ...
			
- 
				Hi Devaster, Sorry to hear that, Though I am sure you have your reasons.  Although I have no idea about GPU programming I appreciate the potential it may hold to make extra use of a much untapped resource.  Would you like me to preserve any code in my repository ? [Technically OUR repository, It just happens to be at my place till we find a better more permanent home  ;) ]  It would seem a shame if that work should go to waste.
 
 Jason
- 
				No no, the CUDA application is alive, but the code for other GPUs is abandoned ....
			
- 
				Ahhh , I see! no probs! As you were then  ;D
			
- 
				Any reason for abandoning the other app?
 
 When i tested it I got an error 9 as many others.
 
 Is the pre-8xxx hardware that bad for these kind of things?
 
 Just so sorry to hear as i own an Nvidia 7950 which i bought just half an year ago with my new computer.
- 
				I have time for only one project, and the code for pre-8x and 8x cards is totally different ....
 
 now working on populate PoT functions ...
- 
				new code ...
 
 
 - small change in findspike - no speedup better validation only ....
 
 [attachment deleted by admin]
- 
				new code ...
 
 
 - small change in findspike - no speedup better validation only ....
 
 
 After running for 45 min no progress had been made... (still sitting on 1st fft (size 8?))
 
 ~BoB
- 
				new code ...
 
 
 - small change in findspike - no speedup better validation only ....
 
 
 Is sahcudagpu10.zip for G80 only?
 Will not work on EN8800GT   G92.
- 
				No its for all G90+ all from 8500 and up..
 
 //Vyper
- 
				Well ran latest code sahcudagpu10.exe
 
 code runs on cpu at 100%, gpu card does nothing.
 
 This is all I get
 
 Running app : sahcudagpu10.exe with -verbose
 with WU     : testWU-1.wu
 Started at  : 13:31:12
 Device name: GeForce 8800 GT
 Device version : 1.1
 Total global memory: 255 MB
 Shared memory per block: 16 kB
 Registers per block: 8192
 Warp size: 32
 Max threads per block: 512
 Shaders clock rate: 1512 MHz
 Generated FFT plans
 mem alloc
 Copying chirped data (chirp rate=0) ...
 FFT s:8 bs:131072 > Finded spikes > Transposing
 
 Seems to hang on my system anyway.
 
 Cheers.
- 
				"The application has failed to start because cufft.dll was not found. Re-installing the application may fix this problem." 
 
 Actually, you need all the DLLs that the CUDA runtime consists of in a place where the OS searches for DLLs (simply put, in a directory on PATH). If you use CUDA only for SETI, it is probably better to put those DLLs together with the exe file. When I tried CUDA a few months ago, the list of needed DLLs was cuda.dll cudart.dll cufft.dll cutil32.dll (maybe slightly redundant).
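 
 A rough way to check this before starting the app is sketched below. It is a simplification: it looks only in the app's own folder and in the PATH directories, while Windows also searches the system directories, and it cannot tell a 32-bit DLL from a 64-bit one. Note that a variable such as CUDA_BIN_PATH does not help by itself unless that directory is also listed in PATH. The function names are made up for this example.

```python
import os

# DLL names taken from the post above; the list may be slightly redundant.
REQUIRED_DLLS = ["cuda.dll", "cudart.dll", "cufft.dll", "cutil32.dll"]


def find_dll(name, app_dir, path=None):
    """Return the first directory that contains name, or None.

    Checks app_dir first and then every directory in path
    (defaults to the PATH environment variable).
    """
    if path is None:
        path = os.environ.get("PATH", "")
    candidates = [app_dir] + [d for d in path.split(os.pathsep) if d]
    for directory in candidates:
        if os.path.isfile(os.path.join(directory, name)):
            return directory
    return None


def missing_dlls(app_dir, path=None, names=REQUIRED_DLLS):
    """Return the DLL names that were found neither in app_dir nor on path."""
    return [n for n in names if find_dll(n, app_dir, path) is None]
```

 Calling `missing_dlls(r"C:\seti_test")` (a hypothetical folder holding the exe) lists the DLLs that still have to be copied next to the exe or made reachable through PATH.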
 
 
 
 
 Hi All,
 I'm having difficulty getting the sahcuda client running due to library path problems.
 I've installed Nvidia cuda sdk v0.81 and Cuda toolkit v0.80.0000 to the same directory
 I've checked the Cuda paths in Environment variables and they seem fine.
 I installed the sahcuda.exe into the same directory but I still get a "cudart.dll was not found" error.
 
 CUDA_LIB_PATH=C:\CUDA\lib
 CUDA_BIN_PATH=C:\CUDA\bin
 CUDA_INC_PATH=C:\CUDA\include
 
 Card: Evga 8800 GTS 512
 
 
 How can I solve this problem?
 
 
- 
				I can understand why you have problems..
 
 Download the latest as found here http://www.nvidia.com/object/cuda_get.html.
 
 Cuda is now up to version 1.1 and that version u have doesn't even support large CUFFT calls.
 
 Kind regards Vyper
- 
				Hi Vyper, Thanks for the reply,
 
 The version I have installed is actually v1.1 (X86-64 for XP-64bit) . The version reported under "support information" in Add/Remove programs is incorrect.
 
 
- 
				Running for 12 hours produced this:
 
 ------------
 Running app : default-515.exe with -nographics
 with WU     : testWU-1.wu
 Started at  : 20:25:00
 '.\tools\wu_time' is not recognized as an internal or external command,
 operable program or batch file.
 Ended at    : 20:31:06
 Elapsed time: 366 seconds
 Result      : stored as ref for validation.
 ------------
 Running app : sahcudagpu10.exe with -verbose
 with WU     : testWU-1.wu
 Started at  : 20:31:06
 Device name: GeForce 8800 GTX
 Device version : 1.0
 Total global memory: 767 MB
 Shared memory per block: 16 kB
 Registers per block: 8192
 Warp size: 32
 Max threads per block: 512
 Shaders clock rate: 1350 MHz
 Generated FFT plans
 mem alloc
 Copying chirped data (chirp rate=0) ...
 FFT s:8 bs:131072 > Finded spikes > Transposing
 
 
 Nothing more happens. Same as for the others.
 
- 
				Hmm this is odd really. Have you tried the previous versions compiled by DevasteR?
 
 I've tried them out by myself and they used to work properly.
 
 You might need a VS2005 add-on of some sort, but I don't know the name of that file atm; I think it had to do with some asm functions or so?
 
 Kind Regards Vyper
- 
				last exe is broken .. sorry
			
- 
				@Devaster
 On the nVidia forums the answer was that they ship Win64 support with CUDA 1.1 regardless of whether the OS is XP or Server (as I understood that answer).
 Could you put your current SETI-CUDA sources online so they can be ported to a 64-bit environment (probably by simply recompiling to a 64-bit executable?...).
 
- 
				His Cuda app works on atleast XP64, all u have to do is to install the CUDA sdk for 32bit enviroment and it works there also.
 
 Vista X64 is another story..
 
 Kind Regards Vyper
- 
				So I need to install 
 CUDA x86 1.1 SDK and toolset under Win Server 2003 x64 and all will be fine? ;) Good! And what about drivers: which version of Detonator (or whatever it is called now) should I install on x64 Win 2003 Server in this case?
- 
				on my 9600GT the latest 176.4x drivers are broken - CUDA is non-functional .... :-[
			
- 
				Tried the CUDA app on Win2003 x64 with the 169.21_forceware_winxp_64bit_english_whql.exe driver and the x86 CUDA 1.1 Toolkit and SDK installed.
 Got error: FFT failed.
 Full stderr & stdout are in the prerelease forum thread.
 Does anybody know which version is free from FFT-related problems (maybe some earlier ones compute the FFT on the CPU ...)?
- 
				something weird - FFT is functional in any application ... 
			
- 
				Hi :) What data should I collect to help with this issue?
 
- 
				OK, got the executable, but how do I install it so SAH will run it instead of the x86 app?
			
- 
				You shouldn't run it _instead_ of the x86 app! The CUDA app is still alpha because it generates weakly similar results that could be rejected by the validator.
 You need to run it in a directory separate from BOINC, as an offline, standalone test only.
 Just put the exe file you downloaded in a separate directory, copy one of the WUs into the same dir, rename that WU to work_unit.sah, copy init_data.xml from your BOINC project into that new dir, and run the exe from there.
 You also need the CUDA SDK/tools installed, or the CUDA runtime DLLs in PATH (maybe LIBPATH).
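 
 The steps above can be sketched as a small script. This is only an illustration of the manual procedure; the function name and all paths are placeholders, and it does nothing about the CUDA DLLs, which still have to be findable.

```python
import shutil
from pathlib import Path


def prepare_standalone_test(exe, workunit, init_data, test_dir):
    """Set up a standalone test directory outside of BOINC.

    Copies the app, one work unit (renamed to work_unit.sah) and
    init_data.xml into test_dir, and returns the path of the copied exe.
    """
    test_dir = Path(test_dir)
    test_dir.mkdir(parents=True, exist_ok=True)
    exe_copy = test_dir / Path(exe).name
    shutil.copy2(exe, exe_copy)
    # The app expects its input under this fixed name.
    shutil.copy2(workunit, test_dir / "work_unit.sah")
    shutil.copy2(init_data, test_dir / "init_data.xml")
    return exe_copy
```

 After that, the exe is started from inside the new directory, for example with `subprocess.run([str(exe_copy)], cwd=test_dir)`, so that it picks up work_unit.sah and init_data.xml from its working directory.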
- 
				Hi all,
 
 thought I would check in here, where I left off some time ago.
 
 I've been playing around with CUDA and the SETI sources, but got stuck with sub-par performance of my code, and stopped working on it.
 Later on I managed to destroy my graphics card (8800GTX) while trying to clean the fans - don't ask  ;D
 
 Long story short, I'm getting another graphics card, but can't decide which one would be best for crunching SETI:
 
 I'm considering another 8800GTX or maybe a 9800GTX or even 9800GX2.
 
 Could you give me any hints w.r.t FFT performance of these cards?
 
 Regards Hans
 
 P.S: I have a gut feeling the 8800GTX still might be the best number cruncher...
 
 
 
 
 
 
 
 
 
- 
				Hi Hans
 
 Long time no see..
 
 Well, this is rather hard, but I sincerely think that you should get your hands on an 8800GT (G92), because it's still based on the good ol' G90 architecture in terms of performance, if you're going to spend around 200 EUR or so.
 I recall reading somewhere that CUDA 2.0 supports wider calculations or so, but I don't know if that's already supported in the 9xxx series. If that's the case, then a 9xxx based card is the way to go performance-wise.
 It's all there in the CUDA docs.
 
 Then there has been a release of the new Alex Kan code, which is vastly improved, if you're going to start from scratch with porting different functions.
 I suggest you have a chat with DevasteR, because he has managed to port many functions over to CUDA 2.0 and get it working with faster code than 2.4V on his 9600, and that's truly impressive.
 
 The only issue he has is that it doesn't validate (weakly similar or different status), but in which function that occurs I can't remember.
 
 You two can perhaps cook something really nice in terms of apps/hacks..
 
 Really nice to see you again Hans, one person that has reappeared to this community only one to go (Simon)..
 
 Kind Regards Vyper
- 
				velcome back Hans !
			
- 
				velcome back Hans !
 
 
 Hi there!
 
 I'm still rummaging around while trying to decide which GPU to get. The newer cards don't show up in the CUDA docs, but I guess that's not a problem.
 
 I might go with the 8800GT @ 1GB memory - that one would come handy for gaming, too  ;D
 
 Regards Hans
- 
				I am very happy with the 9600GT card - 64 stream procs, G94, arch 1.1 - this is the main advantage over older cards ...
 see the CUDA 1.1 or 2.0 manual for the differences between 1.0 and 1.1
 
 code generated for 1.1 is NOT backward compatible with 1.0 ...
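 
 The compatibility rule can be written down as a tiny check. This is a sketch of the general CUDA rule as I understand it (a binary image runs on devices with the same major version and an equal or higher minor version; pseudo code runs on any device of equal or higher version); the function name and the "gpu"/"virtual" labels are made up to match the build names used earlier in the thread. The device version is the "Device version" line the app prints at start.

```python
def can_run(build_kind, build_version, device_version):
    """Tell whether a build can run on a device.

    build_kind: "gpu" for a binary image, "virtual" for driver-compiled pseudo code.
    build_version, device_version: (major, minor) tuples such as (1, 1).
    """
    if build_kind == "virtual":
        # Pseudo code can be compiled by the driver for any newer device.
        return device_version >= build_version
    if build_kind == "gpu":
        # A binary image needs the same major and at least the same minor version.
        return (device_version[0] == build_version[0]
                and device_version[1] >= build_version[1])
    raise ValueError("build_kind must be 'gpu' or 'virtual'")
```

 So `can_run("gpu", (1, 1), (1, 0))` is False (a 1.1 build on an 8800 GTX, which reports 1.0), while `can_run("gpu", (1, 0), (1, 1))` is True (a 1.0 build on a 9600 GT).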
- 
				Cuda 2.0 support Vista32 & Vista64
 
 http://www.nvidia.com/object/get_cuda2_beta.html
- 
				Is there a new version coming? Reading thru the thread it says the last one was broken. Nothing since?
			
- 
				now that cuda 2 has been released for vista 32 and 64 bit, i still have 3 8800 ultras that i would like to get working with cuda for seti.
 
 I am reading the thread and as versions/os have changed i do not know what files i need to get this working now.  Like the previous poster stated the last exe was broken ?
 
 Is there some sort of guide to getting this going already posted somewhere?
- 
				I'm in the same situation...  :'(
			
- 
				the code is completely rewritten ...
 and the 6.3.3 BOINC lib now detects CUDA devices as a coprocessor, not as another core ...
 you still need one CPU core per GPU core ....
- 
				Is a new version available for testing?
 And how does BOINC handle a remote desktop session (no CUDA-capable GPU is detected in a remote session)? Should it run as a service?
- 
				still in early alpha .
 
 
 I'd say: Kudos to your effort, although there are precision and performance issues remaining. Excuse my curiosity - is the source code for the CUDA based GPU client publicly available at this time?
 
 Christian
 
- 
				Hi, I have an 8800 GTS (G80)  and i have installed Cuda 2.0 beta2 (my OS is Vista 64 bit).
 Which app must I use for GPU crunching, and how do I install it?
 Thanks
- 
				Is there any working client that we can download and test? (A client that would work with BOINC and crunch SETI at our own risk.)
			
- 
				here :
 
 fft and powerspectrum on GPU
 
 [attachment deleted by admin]
- 
				Hi,
 
 I receive the following error when running the executable:
 
 Popup:
 The application failed to initialize properly (0xc000007b).
 
 Windows Eventlog:
 Faulting application setiathome_6.01_windows_intelx86.exe, version 6.0.0.0, time stamp 0x4874fa46, faulting module ntdll.dll, version 6.0.6001.18000, time stamp 0x4791a783, exception code 0xc000007b, fault offset 0x0006ecfb, process id 0xe84, application start time 0x01c8e61984c85068.
 
 I run Vista x64.
 
 Morten
- 
				the application is 32-bit, so you need the 32-bit CUDA toolkit ....
 
 later today I will try to build a 64-bit version ...
- 
				here :
 
 fft and powerspectrum on GPU
 
 
 Hi
 
 Can u please tell me where I can find the necessary dll-files to run "setiathome_6.01_windows_intelx86"? Any installer? CUDA-toolkit? It's not in download-area  :-[
 
 Thanks
 Peter
- 
				http://www.nvidia.com/object/cuda_get.html
			
- 
				Hi,
 
 With regards to the client  "setiathome_6.01_windows_intelx86.rar" on this page, I set this up in BOINC as a regular optimized app and not in a separate directory running standalone, as described earlier?
 
 (http://www.boincstats.com/signature/user_108695.gif)
- 
				Using the app in standalone mode, I downloaded the 7 sample work units.  Renamed the first as work_unit.sah and ran.  The window showed a running output of "FFT=" messages from 64 to 64K.  Got a boinc_finished_called file, so I think it completed successfully.  
 
 What sort of testing should we be doing?
 
 I am running an eVGA 8800 GTS 512 on vista 32.  CUDA drivers are 177.35 and toolkit 2.0 beta 2 for vista 32.
 
 Thanks, TheMule
- 
				you may use the Knabench system for speed comparison ...
			
- 
				Ok, not what I expected. Using KNAbench and work unit 1, I got:
 
 226 sec - setiathome_6.01_windows_intelx86
 203 sec - setiathome_5.27_windows_intelx86
 
 About 23 sec slower.  Is it due to the FFT messages on the screen? Data follows:
 
 setiathome_5.27_windows_intelx86.exe -nographics / testWU-1.wu :
 Started at  : 13:53:37
 Ended at    : 13:57:00
 Elapsed time: 203 seconds
 
 [ stderr ]
 Can't set up shared mem: -1
 Will run in standalone mode.
 setiathome_enhanced 5.27 DevC++/MinGW
 
 Work Unit Info:
 ...............
 WU true angle range is :  0.604884
 Optimal function choices:
 -----------------------------------------------------
 name
 -----------------------------------------------------
 v_BaseLineSmooth (no other)
 v_vGetPowerSpectrumUnrolled 0.00006 0.00000
 sse3_ChirpData_ak 0.00899 0.00000
 v_vTranspose4 0.00143 0.00000
 AK SSE folding 0.00076 0.00000
 
 Flopcounter: 637401180238.359500
 
 Spike count:    0
 Pulse count:    0
 Triplet count:  0
 Gaussian count: 0
 [ /stderr ]
 
 
 
 setiathome_6.01_windows_intelx86.exe -nographics / testWU-1.wu :
 Started at  : 13:44:56
 Ended at    : 13:48:42
 Elapsed time: 226 seconds
 
 [ stderr ]
 Device name: GeForce 8800 GTS 512
 Device version: 1.1
 Total global memory (MB): 512
 Number of multiprocessors : 16
 Number of cores :128
 Shared memory per block (kB): 16
 Registers per block: 8192
 Warp size: 32
 Max threads per block: 512
 Shaders clock rate (MHz): 1674
 Concurrent copy and execution: No
 Can't set up shared mem: -1
 Will run in standalone mode.
 setiathome_enhanced 6.01 Visual Studio/Microsoft C++
 libboinc: 6.3.4
 
 Work Unit Info:
 ...............
 WU true angle range is :  0.604884
 
 Flopcounter: 627299330081.366820
 
 Spike count:    0
 Pulse count:    0
 Triplet count:  0
 Gaussian count: 0
 called boinc_finish
 [ /stderr ]
 ------------
 
 
 
 
- 
				okay :
 new code - now 64-bit ...
 
 as previous 32-bit build ....
 
 compiled with VS2008+VS2005 under Windows Server 2008 x64
 
 small test :
 ============ 
 setiathome_6.00S08_windows_intelx86.exe -verb -nog / testWU-4.wu :
 Started at  : 20:57:18.970
 Ended at    : 21:00:46.190
 207.126 secs Elapsed
 199.109 secs CPU time
 
 [ stderr ]
 Can't set up shared mem: -1
 Will run in standalone mode.
 setiathome_enhanced 6.00S08 DevC++/MinGW
 libboinc: 6.1.6
 
 DataIn=0x32b00c0, ChirpedData=0x2aa0040
 
 Work Unit Info:
 ...............
 WU true angle range is :  1.279649
 Optimal function choices:
 -----------------------------------------------------
 name  timing   error
 -----------------------------------------------------
 v_BaseLineSmooth (no other)
 
 v_GetPowerSpectrum 0.00079 0.00000  test
 v_vGetPowerSpectrum 0.00073 0.00000  test
 v_vGetPowerSpectrum2 0.00075 0.00000  test
 v_vGetPowerSpectrumUnrolled 0.00076 0.00000  test
 v_vGetPowerSpectrumUnrolled2 0.00075 0.00000  test
 v_vGetPowerSpectrum 0.00073 0.00000  choice
 
 v_ChirpData 0.03327 0.00000  test
 fpu_ChirpData 0.04556 0.00000  test
 v_vChirpData_x86_64 0.24693 0.00002  test
 sse1_ChirpData_ak 0.03216 0.00000  test
 sse2_ChirpData_ak 0.03455 0.00000  test
 sse3_ChirpData_ak 0.02924 0.00000  test
 sse3_ChirpData_ak 0.02924 0.00000  choice
 
 v_Transpose 0.04322 0.00000  test
 v_Transpose2 0.02599 0.00000  test
 v_Transpose4 0.01550 0.00000  test
 v_Transpose8 0.02781 0.00000  test
 v_pfTranspose2 0.02539 0.00000  test
 v_pfTranspose4 0.01571 0.00000  test
 v_pfTranspose8 0.02681 0.00000  test
 v_vTranspose4 0.01173 0.00000  test
 v_vTranspose4np 0.01197 0.00000  test
 v_vTranspose4ntw 0.01090 0.00000  test
 v_vTranspose4x8ntw 0.00758 0.00000  test
 v_vTranspose4x16ntw 0.00580 0.00000  test
 v_vpfTranspose8x4ntw 0.01072 0.00000  test
 v_vTranspose4x16ntw 0.00580 0.00000  choice
 
 FPU opt folding 0.00423 0.00000  test
 AK SSE folding 0.00220 0.00000  test
 BH SSE folding 0.00201 0.00000  test
 BH SSE folding 0.00201 0.00000  choice
 
 
 Flopcounter: 243285924139.522000
 
 Spike count:    0
 Pulse count:    0
 Triplet count:  0
 Gaussian count: 0
 called boinc_finish
 [ /stderr ]
 ------------
 setiathome_6.01_windows_intelx64.exe -verb -st / testWU-4.wu :
 Started at  : 21:00:46.346
 Ended at    : 21:03:02.643
 136.219 secs Elapsed
 128.750 secs CPU time
 Speedup     : 35.34%
 Ratio       : 1.55 x
 
 Result      : Strongly similar,  Q= 99.99%
 [ stderr ]
 Device name: GeForce 9600 GT
 Device version: 1.1
 Total global memory (MB): 512
 Number of multiprocessors : 8
 Number of cores :64
 Shared memory per block (kB): 16
 Registers per block: 8192
 Warp size: 32
 Max threads per block: 512
 Shaders clock rate (MHz): 1625
 Concurrent copy and execution: No
 Can't set up shared mem: -1
 Will run in standalone mode.
 setiathome_enhanced 6.01 Visual Studio/Microsoft C++
 libboinc: 6.3.5
 
 Work Unit Info:
 ...............
 WU true angle range is :  1.279649
 
 Flopcounter: 238022320153.522060
 
 Spike count:    0
 Pulse count:    0
 Triplet count:  0
 Gaussian count: 0
 called boinc_finish
 [ /stderr ]
 
 
 ------------
 
 Quick timetable
 
 WU : testWU-4.wu
 setiathome_6.00S08_windows_intelx86.exe : 199.109 secs CPU
 setiathome_6.01_windows_intelx64.exe : 128.750 secs CPU
 Speedup     : 35.34%
 Ratio       : 1.55 x
 
 ------------
 CPU:
 Number of processors	1
 Number of cores		1 (max 1)
 Specification		AMD Athlon(tm) 64 Processor 3000+
 Codename		Venice
 Core Speed		1005.3 MHz (5.0 x 201.1 MHz)
 Core Stepping		DH-E6
 Technology		90 nm
 Stock frequency		1800 MHz
 ------------
 Chipset:
 Northbridge		NVIDIA nForce4 rev. A3
 Southbridge		NVIDIA nForce4 MCP rev. A3
 ------------
 RAM:
 Memory Type		DDR
 Memory Size		2048 MBytes
 Memory Frequency	201.1 MHz (CPU/5)
 Max bandwidth		PC3200 (200 MHz)
 CAS#			3.0
 RAS# to CAS#		3
 RAS# Precharge		3
 Cycle Time (tRAS)	8
 DRAM Idle Timer		16
 ------------
 OS:
 Windows Version		Microsoft Windows Vista (6.0) Enterprise Edition (Full)  Service Pack 1 (Build 6001)
 ============
 
 The app was running at 100 percent almost all the time - MS has done a very good job on performance with Server 2008 ....
 
- 
				Hi,
 
 Tested x64-version and got this:
 
 ==================
 Device name: Device Emulation (CPU)
 Device version: 9999.9999
 Total global memory (MB): 4095
 Number of multiprocessors : 16
 Number of cores :128
 Shared memory per block (kB): 16
 Registers per block: 8192
 Warp size: 1
 Max threads per block: 512
 Shaders clock rate (MHz): 1350
 Concurrent copy and execution: No
 Can't set up shared mem: -1
 Will run in standalone mode.
 GPU memory allocation error (source buffer) ...
 
 ==================
 
 I'm running Cuda display driver NVIDIADisplayWinVista64(177_35)Int.exe on Geforce 8800 GT
 
 Morten
- 
				Does anyone else have the same problem?
 
 Try using the latest drivers ....
- 
				I found the cause of the problem: 
 
 I was connected to the machine using RDP/Terminal Services ("mstsc /v:computer /console"). In this session Nvidia is not available.
 
 After testing this I have some questions/comments:
 
 1: When running the executable it's using 100% CPU - shouldn't the CPU utilization be close to zero and GPU be utilized to the max? As it is now it has no practical use as I give away my CPU in order to utilize the GPU.
 2: How do I install and run it in combination with BOINC? What is your roadmap/intention on this?
 3: With the Terminal Services issue mentioned, it appears the only way to run interactively is being logged on locally/physically.
 3a: The best way to run is as a service - do you have any suggestions/plans on how to facilitate a service installation, or just use sc.exe?
 
 Morten
 
- 
				1. For now it does not use streams, and only about 10% of the code has been ported to the GPU ...
 2. This code is only a technology preview, so I don't know .....
 3. I don't know of any workaround for Terminal Services ... sorry
 4. Running as a service is managed by the BOINC core client, not by the computing app ....
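 
 The 100% CPU load asked about in question 1 is what one would expect while most of the work still runs on the CPU. A rough sketch using Amdahl's law, assuming (this is not stated in the thread) that the ported 10% of the code also accounts for about 10% of the run time; the function name and the GPU factors are made up for illustration:

```python
def overall_speedup(ported_fraction, gpu_factor):
    """Amdahl's law: whole-run speedup when only part of the run is accelerated.

    ported_fraction: share of the original run time spent in the ported code (0..1)
    gpu_factor: how many times faster the ported part runs on the GPU (> 0)
    """
    if not 0.0 <= ported_fraction <= 1.0:
        raise ValueError("ported_fraction must be between 0 and 1")
    if gpu_factor <= 0:
        raise ValueError("gpu_factor must be positive")
    return 1.0 / ((1.0 - ported_fraction) + ported_fraction / gpu_factor)


# ASSUMPTION: 10% of the run time is spent in the ported code (hypothetical).
for factor in (2, 10, 1000):
    print(f"GPU part {factor:>4}x faster -> whole run {overall_speedup(0.10, factor):.3f}x")
```

 Under that assumption the whole run cannot get more than about 1.11x faster (1 / 0.9), however fast the GPU part becomes, and the CPU stays busy with the remaining 90%.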
- 
				Hi,
 
 Thanks for clearing that up.
 
 Do you reckon it's realistic to port 100% to the GPU? Do you have an idea how much you will be able to port, and when? I think this is such an excellent idea and am really hoping you'll be able to pull it off!
 
 M
- 
				fft and powerspectrum on GPU
 
 
 Are you making use of CUFFT's batching feature? If you do, you can basically run multiple FFTs with one CUDA call, which can save some API and kernel launch overhead.
 
 
- 
				Yes, CUFFT batch mode is used ....
			
- 
				okay :
 new code - now 64-bit ...
 
 as previous 32-bit build ....
 
 compiled with VS2008+VS2005 under Windows Server 2008 x64
 
 small test :
 
 
 The app was running at 100 percent almost all the time - MS has done a very good job on performance with Server 2008 ....
 
 
 How to install the test app?
- 
				all working fine
 for wu1
 with the x64 6.01 app ----- 127 sec
 with ak v8 SSSE3.1 -----  69 sec
 with ak v8 SSE4.1 ----- 45 sec
 Best Regards
 D.Draganov
 Nvidia GeForce 8800GTX 768 MB
 Core 2 Duo E8500 @ 4.17
 Windows x64 XP Pro
 
 Just wondering whether the GPU is at 100% load :) muhahaha
 and when it will be recognized as another processor rather than a co-processor
 
 
 
 Device name: GeForce 8800 GTX
 Device version: 1.0
 Total global memory (MB): 767
 Number of multiprocessors : 16
 Number of cores :128
 Shared memory per block (kB): 16
 Registers per block: 8192
 Warp size: 32
 Max threads per block: 512
 Shaders clock rate (MHz): 1350
 Concurrent copy and execution: No
 Can't set up shared mem: -1
 Will run in standalone mode.
 setiathome_enhanced 6.01 Visual Studio/Microsoft C++
 libboinc: 6.3.5
- 
				What do you mean?
 What is your GPU timing?
- 
				What do you mean?
 What is your GPU timing?
 
 all stock
 engine 576
 shader 1350
 memory 1800
 if i understand what you are asking
- 
				Ah, no, I was asking how long the GPU version of the SETI client takes to run on your host.
 You wrote
 with the x64 6.01 app ----- 127 sec
 with ak v8 SSSE3.1 -----  69 sec
 with ak v8 SSE4.1 ----- 45 sec
 
 Are these numbers GPU app run times?
 Which GPU app version did you use?
 
- 
				Ah, no, I was asking how long the GPU version of the SETI client takes to run on your host.
 You wrote
 with the x64 6.01 app ----- 127 sec
 with ak v8 SSSE3.1 -----  69 sec
 with ak v8 SSE4.1 ----- 45 sec
 
 Are these numbers GPU app run times?
 Which GPU app version did you use?
 
 
 Yes, the 6.01 app is the one I used - setiathome_6.01_windows_intelx64.exe: 127 sec
 AK_v8.0_Win64_SSE4.1 - 45 sec
 AK_v8.0_Win64_SSSE3.1 - 69 sec
 Test wu-1
- 
				If there are any betas coming up, I'm a 100% volunteer for testing  ;D ;D
 
 Best Regards
 D.Draganov
- 
				I'm a volunteer too  ;D
			
- 
				[noise]now working on ap...[/noise]
			
- 
				any progress ???
			
- 
				Any progress ^^ ?  ;D
			
- 
				Still debugging. 
 Once the bug is found, the FFT will be done by the CUDA FFT library on the GPU device.
 (That is for the AP app. MB GPU development is frozen for some time, AFAIK.)
 
- 
				Nice, will wait to test it :)
			
- 
				hey how's the progress :( 
 
 Best Regards
 D.Draganov
- 
				The current GPU app (FFT, highpass filter, dechirp) works correctly but at first glance gives no performance boost (only at first glance, because there was no CPU load info during the test, only total timings).
 One of the aims of the GPU app, as I see it, is to free the CPU for other work. So if it does a task slower than the CPU version but frees up much of the CPU, it will still be a net benefit.
 
 Now we are trying to extend GPU processing to the FFA function too (one of the longest-running routines in AstroPulse). The more code runs on the GPU and the less data has to move between GPU and CPU, the faster the GPU app will be.
 Maybe some big architectural change to the app will be needed to better suit GPU hardware. Work in progress  8)
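 
 The "slower per task but still a net win" argument can be put in numbers. A rough sketch for a single-core host, with all figures hypothetical (none are measurements from this thread) and a made-up function name; it assumes the GPU app keeps a fixed share of the CPU core busy while the rest of the core runs the CPU app:

```python
def tasks_per_hour(cpu_task_secs, gpu_task_secs=None, gpu_cpu_share=0.0):
    """Host throughput in tasks/hour for one CPU core plus an optional GPU app.

    cpu_task_secs: time for the CPU app to finish one task on a full core
    gpu_task_secs: time for the GPU app to finish one task (None = no GPU app)
    gpu_cpu_share: fraction of the CPU core the GPU app keeps busy (0..1)
    """
    if gpu_task_secs is None:
        return 3600.0 / cpu_task_secs
    # Whatever CPU time the GPU app leaves free still runs the CPU app.
    cpu_part = (1.0 - gpu_cpu_share) * 3600.0 / cpu_task_secs
    gpu_part = 3600.0 / gpu_task_secs
    return cpu_part + gpu_part


# HYPOTHETICAL numbers: the GPU app is 1.5x slower per task than the CPU app.
print(tasks_per_hour(100.0))               # CPU app alone
print(tasks_per_hour(100.0, 150.0, 0.20))  # GPU app needs 20% of the core
print(tasks_per_hour(100.0, 150.0, 1.00))  # GPU app keeps the whole core busy
```

 In this model the GPU app pays off whenever gpu_task_secs < cpu_task_secs / gpu_cpu_share, so a slower GPU app is still a gain as long as it leaves most of the CPU free.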
- 
				wow nice thanks man keep moving !!! :)
 
 Best Regards
 D.Draganov
- 
				Will the client be compatible with ATI cards?
			
- 
				Will the client be compatible with ATI cards?
 
 
 No, sorry.  :-[ Not at first. Current efforts are centered @ Nvidia using Cuda SDKs.
 It's as much a learning process as a development process.  GPU & CPU are different beasts & this is a new, uncharted frontier in Seti development.
 Developers are having to completely rethink how to integrate the necessary tools and their understanding of Seti code into a structure that can exploit CPU & GPU strengths & dependencies.
 An ATI effort may be in the cards in the future, but there's a long way to go before a Cuda app will be sophisticated and efficient enough to merit tackling another unique architecture and SDK.  ;)
 
 Then again...maybe someone will join the effort that can lead this project w/ the ability, tools and skill necessary.  :)
 This is a welcoming group.  ;D
 
 
- 
				Ah, no, I was asking how long the GPU version of the SETI client takes to run on your host.
 You wrote
 with the x64 6.01 app ----- 127 sec
 with ak v8 SSSE3.1 -----  69 sec
 with ak v8 SSE4.1 ----- 45 sec
 
 Are these numbers GPU app run times?
 Which GPU app version did you use?
 
 
 Yes, the 6.01 app is the one I used - setiathome_6.01_windows_intelx64.exe: 127 sec
 AK_v8.0_Win64_SSE4.1 - 45 sec
 AK_v8.0_Win64_SSSE3.1 - 69 sec
 Test wu-1
 
 
 
 omg wow.  :o :o :o :o :o
 45 seconds to run one work unit? amazing! am i assuming correctly that the gpu can only do 1 at a time, or do more, but slower? can boinc even get enough work units at that speed? :D. i have an ati card and i wish there would be an app for it, hell for any video card for that matter:)
- 
				Hi
 
 
 What are the requirements for the GPU app (OS and CPU)?
 Win XP or Win Vista, 32-bit or 64-bit?
 AMD or Intel CPU?
- 
				Test wu-1 
 omg wow.  :o :o :o :o :o
 45 seconds to run one work unit? amazing!
 
 Test wu-1 is an artificially shortened WU specifically for quick tests comparing speeds of different app builds, and the 45 second run was on a non-GPU build.
 Joe
- 
				Hi
 
 
 What are the requirements for the GPU app (OS and CPU)?
 Win XP or Win Vista, 32-bit or 64-bit?
 AMD or Intel CPU?
 
 
 GeForce 8xxx and up is the target GPU, IMHO.
 Memory requirements are unclear for now, but 256 MB cards will certainly be supported eventually. On 512 MB and up, some additional speedup may be possible thanks to extensive precaching.
 
- 
				Is this you guys' work?
 
 http://setiweb.ssl.berkeley.edu/beta/   no ati :(. :'(
- 
				no ....
			
- 
				quick test ....
 Quick timetable 
 
 WU : testWU-4.wu
 AK_v8_win_x64_SSE3.exe : 123.375 secs CPU
 setiathome_6.05_windows_intelx86__cuda.exe : 25.641 secs CPU
 Speedup     : 79.22%
 Ratio       : 4.81 x
 
 WU : testWU-7.wu
 AK_v8_win_x64_SSE3.exe : 138.609 secs CPU
 setiathome_6.05_windows_intelx86__cuda.exe : 28.063 secs CPU
 Speedup     : 79.75%
 Ratio       : 4.94 x
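 
 The Speedup and Ratio lines can be reproduced from the two CPU times. A minimal sketch, assuming Speedup is the share of CPU time saved relative to the baseline and Ratio is the baseline time divided by the new time; these definitions are inferred from the numbers (they match to the printed precision), not taken from the benchmark tool itself, and the function name is made up:

```python
def compare_runs(baseline_secs, new_secs):
    """Return (speedup_percent, ratio) for two CPU times in seconds."""
    if baseline_secs <= 0 or new_secs <= 0:
        raise ValueError("times must be positive")
    speedup_percent = (baseline_secs - new_secs) / baseline_secs * 100.0
    ratio = baseline_secs / new_secs
    return speedup_percent, ratio


# CPU times copied from the timetable above.
for wu, base, new in (("testWU-4.wu", 123.375, 25.641),
                      ("testWU-7.wu", 138.609, 28.063)):
    speedup, ratio = compare_runs(base, new)
    print(f"{wu}: Speedup {speedup:.2f}%  Ratio {ratio:.2f} x")
```

 Both figures are based on CPU seconds only, so on their own they say nothing about elapsed (wall-clock) time.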
 
- 
				Aww sweet !  ;D
			
- 
				how many points is that small work unit worth?
			
- 
				Seems like you were beaten to the punch?
 
 http://setiathome.berkeley.edu/beta/cuda.php
- 
				Seems like you were beaten to the punch?
 
 http://setiathome.berkeley.edu/beta/cuda.php
 
 Well, we are working on AP GPU now, not on MB GPU ;) But it's REALLY great that Berkeley has released their own CUDA MB app; we all wanna look at the sources now ;D
 
- 
				I don't think I understand what you mean by AP vs MB GPU.  I think I'm getting my letters mixed up with different acronyms.
 
 I think it is still a beta so it's not out officially but it is cool they are working on an app.
- 
				AP = AstroPulse, MB = MultiBeam (or SETI Enhanced).
 
- 
				Hello 
 
 The new GPU CUDA SETI app is not working on my computer :(
 
 I downloaded the new beta BOINC etc. The messages show that CUDA and the adapter (G8600GT) are detected, but there is no performance gain.
 I don't know why computing is so slow. I have CUDA version 2.0 and the 178.28 driver.
 
 E.g.:
 Enhanced on the Athlon computes in ~8000 sec vs. the CUDA GPU app at ~17000 sec :o
 
 http://setiathome.berkeley.edu/result.php?resultid=1085527783 ( GPU )
 
 http://setiathome.berkeley.edu/result.php?resultid=1084236529  ( CPU opty. app for AMD )
 
 Thanks for any help
 
- 
				YaY, another person in the 8600GT graphics card stutter club with me! I got it to work *OK* after some effort, but at first I had trouble with driver crashes and it didn't work so well. I then made sure I updated to the 178.28 combined display driver with CUDA 2.0, and it worked *a little* better.  Then I installed the CUDA 2.1 beta drivers over the top, and it became slightly better again, and has been crunching ever since.
 
 The system still gets periods where there is slowdown, but reducing the priority of the control app manually *appears* to have helped a little more, so I installed 'process lasso' to automatically set it to low-priority when the application starts.  No effect on speed of the GPU crunching it seems, but the periods of lag & stutter are now infrequent.
 
 HTH
- 
				Enhanced on the Athlon computes in ~8000 sec vs. the CUDA GPU app at ~17000 sec :o
 
 http://setiathome.berkeley.edu/result.php?resultid=1085527783 ( GPU )
 
 That work unit was not done with the GPU... It was stock 6.03 CPU app
- 
				Good Spot!, it pays to look at the results you're identifying with  :)
			
- 
				Enhanced on the Athlon computes in ~8000 sec vs. the CUDA GPU app at ~17000 sec :o
 
 http://setiathome.berkeley.edu/result.php?resultid=1085527783 ( GPU )
 
 That work unit was not done with the GPU... It was stock 6.03 CPU app
 
 
 
 I know, but why is CUDA not working? :/
- 
				The cuda MB app was only released today out of beta...
			
- 
				And also, CUDA currently does NOT work with Windows x64 and BOINC Manager x64  :-\
			
- 
				And also, CUDA currently does NOT work with Windows x64 and BOINC Manager x64  :-\
 
 
 Lol  ;D
 
 At the moment I'm running SETI CUDA on Vista x64 with BOINC 6.4.5
- 
				Sorry, but I got an error message, something like
 "not for this computer" ..
- 
				Try downloading a different version of the BOINC client
 
 Best Regards
 D.Draganov
- 
				Try downloading a different version of the BOINC client
 
 Best Regards
 D.Draganov
 
 Thanks, but I'm still waiting for the end of this beta ..
 On my other machine it produced only errors, also with different versions.
 
 Maybe optimized CUDA apps will come along  :D