Author Topic: Seti@home CUDA 3.0 linux apps (Read 13405 times)

sunu · « **on:** 08 Apr 2010, 03:25:05 pm »

Crunch3r has released four new seti@home cuda 3.0 apps here.

I checked the speed of the apps and the validity of their results. The tests were done on this machine.

First of all, in case you haven't seen it, NVIDIA has released a Fermi compatibility guide, which says:

Quote

My application uses the CUDA Runtime API with CUDA Toolkit 2.1, 2.2, or 2.3.
How can I confirm that my application is ready to run on Fermi?
Answer: CUDA applications built using the CUDA Toolkit versions 2.1 through
2.3 are compatible with Fermi as long as they are built to include PTX versions of
their kernels. NVIDIA Driver versions 195.xx or newer allow the application to use
the PTX JIT code path. To test that PTX JIT is working for your application, you
can do the following:
- Go to the NVIDIA website, and install the latest R195 driver.
- Set the system environment flag CUDA_FORCE_PTX_JIT=1
- Launch your application.
When starting a CUDA application for the first time with the above environment
flag, the CUDA driver will JIT compile the PTX for each CUDA kernel that is used
into native CUBIN code. The generated CUBIN for the target GPU architecture is
cached by the CUDA driver. This cache persists across system shutdown/restart
events.
If this test passes, then your application is ready for Fermi.

So I've done the following tests:
Cuda 2.2 app
No CUDA_FORCE_PTX_JIT flag:
cuda 2.3 libs OK
cuda 3 libs OK

CUDA_FORCE_PTX_JIT flag enabled:
cuda 2.3 libs OK
cuda 3 libs FAIL

All cuda 3 apps
No CUDA_FORCE_PTX_JIT flag:
cuda 3 libs OK

CUDA_FORCE_PTX_JIT flag enabled:
cuda 3 libs FAIL

The combination of CUDA_FORCE_PTX_JIT flag and cuda 3 libraries produces garbage results (all result overflows) no matter what app was used.

Now to the tests.
I used 10 different normal workunits. Instead of their names, I list each one's AR (angle range) along with what was found: S=Spikes, P=Pulses, T=Triplets and G=Gaussians.
I used the time utility to get accurate running times. A bit of explanation:

real: elapsed real (wall clock) time used by the process
user: Total number of CPU-seconds that the process used directly (in user mode)
sys: Total number of CPU-seconds used by the system on behalf of the process (in kernel mode) e.g., executing system calls
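
For anyone who wants to reproduce the timings, here is a minimal sketch of how the three figures can be collected. The helper name timed is made up for illustration, it assumes bash is installed, and sleep merely stands in for the real application binary.

```shell
# Hypothetical helper: run any command and report real/user/sys for it.
# Relies on bash's built-in `time` keyword; -p selects the POSIX output
# format (one "name seconds" pair per line, written to stderr).
timed() {
    bash -c 'time -p "$@"' timed "$@"
}

# Stand-in workload; replace with the actual application command line.
timed sleep 1
```

In the tables in this post, real is far larger than user plus sys for every CUDA run, presumably because the process spends most of its time waiting on the graphics card rather than using the CPU.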

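Percentages are relative to a baseline, the cuda 2.2 app with cuda 2.3 libraries and no CUDA_FORCE_PTX_JIT flag, using the formula 100*(cuda2.2-app)/cuda2.2. A negative value therefore means the tested combination took longer than the baseline. The exception is the VLAR workunit (AR 0.011923), where AK_V8 was used as the baseline.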
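
As a worked example of the formula, here is a small sketch in shell. The function name pct_change is made up for illustration; the numbers are the real and user times of the first workunit (AR 0.230341) from the baseline table and from the jit-enabled table.

```shell
# Hypothetical helper implementing 100*(base-app)/base.
# $1 = baseline time in seconds, $2 = time of the combination under test.
pct_change() {
    awk -v base="$1" -v app="$2" \
        'BEGIN { printf "%.2f\n", 100 * (base - app) / base }'
}

pct_change 4956.77 4965.56   # real time: prints -0.18
pct_change 73.5 82.1         # user time: prints -11.70
```

Both values match the first row of the jit-enabled table.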

                                      AR 0.011923 0S 7P 1T 0G
                                      real                      user                     sys      
AK_V8 ssse3 64bit                  8886.5  sec              8876.21 sec                1.31 sec
CUDA 3                            10700.64 sec  -20.41%       52.13 sec  99.41%        5.82 sec -344.27%
cuda30                            10701.57 sec  -20.43%       52.77 sec  99.41%        5.86 sec -347.33%
cuda30_v0.2                       10701.18 sec  -20.42%       52.55 sec  99.41%        5.73 sec -337.40%



                              CUDA 2.2 app, CUDA 2.3 libs, no jit                                                                                            
                                      real                      user                     sys      
AR 0.230341 0S 1P 0T 18G           4956.77 sec                73.5  sec               11.71 sec
AR 0.265939 2S 0P 0T 5G            4021.79 sec                68.58 sec               10.6  sec
AR 0.309898 29S 1P 0T 0G            936.14 sec                41.94 sec                3.31 sec
AR 0.386085 3S 1P 0T 1G            2756.04 sec                48.42 sec               19.01 sec
AR 0.396978 6S 1P 2T 0G            2675.23 sec                58.76 sec                8.08 sec
AR 0.409679 2S 0P 0T 0G            2435.57 sec                57.46 sec                7.61 sec
AR 0.437709 1S 0P 1T 0G            2231.96 sec                55.85 sec                6.73 sec
AR 0.510893 4S 0P 0T 0G            2024.16 sec                53.12 sec                6.3  sec
AR 0.942199 8S 0P 0T 0G            1201.43 sec                46.57 sec                4.52 sec
                                                                                                
Total                             23239.09 sec               504.2  sec               77.87 sec
 
                                                                                               
                                                                                                
                             CUDA 2.2 app, CUDA 2.3 libs, jit enabled                                                                                            
                                      real                      user                     sys      
AR 0.230341 0S 1P 0T 18G           4965.56 sec  -0.18%        82.1  sec -11.70%       12.24 sec  -4.53%
AR 0.265939 2S 0P 0T 5G            4030.33 sec  -0.21%        77.52 sec -13.04%       10.8  sec  -1.89%
AR 0.309898 29S 1P 0T 0G            945.22 sec  -0.97%        51.26 sec -22.22%        3.4  sec  -2.72%
AR 0.386085 3S 1P 0T 1G            2763.89 sec  -0.28%        68.13 sec -40.71%        8.22 sec  56.76%
AR 0.396978 6S 1P 2T 0G            2684.3  sec  -0.34%        68.25 sec -16.15%        8.24 sec  -1.98%
AR 0.409679 2S 0P 0T 0G            2443.44 sec  -0.32%        65.97 sec -14.81%        7.67 sec  -0.79%
AR 0.437709 1S 0P 1T 0G            2240.33 sec  -0.38%        64.65 sec -15.76%        7.08 sec  -5.20%
AR 0.510893 4S 0P 0T 0G            2033.17 sec  -0.45%        62.74 sec -18.11%        6.6  sec  -4.76%
AR 0.942199 8S 0P 0T 0G            1209.93 sec  -0.71%        55.3  sec -18.75%        4.85 sec  -7.30%
                                                                                                
Total                             23316.17 sec  -0.33%       595.92 sec -18.19%       69.1  sec  11.26%
                                                                                                
                                                                                                
                                                                                                
                                 CUDA 2.2 app, CUDA 3 libs, no jit                                                                                            
                                      real                      user                     sys      
AR 0.230341 0S 1P 0T 18G           5042.03 sec  -1.72%        72    sec   2.04%       11.98 sec  -2.31%
AR 0.265939 2S 0P 0T 5G            4102.33 sec  -2.00%        67.36 sec   1.78%       10.7  sec  -0.94%
AR 0.309898 29S 1P 0T 0G            973.15 sec  -3.95%        41.91 sec   0.07%        3.18 sec   3.93%
AR 0.386085 3S 1P 0T 1G            2823.52 sec  -2.45%        58.44 sec -20.69%        7.99 sec  57.97%
AR 0.396978 6S 1P 2T 0G            2742.01 sec  -2.50%        58.02 sec   1.26%        7.88 sec   2.48%
AR 0.409679 2S 0P 0T 0G            2501.22 sec  -2.70%        56.48 sec   1.71%        7.38 sec   3.02%
AR 0.437709 1S 0P 1T 0G            2295.95 sec  -2.87%        54.51 sec   2.40%        6.97 sec  -3.57%
AR 0.510893 4S 0P 0T 0G            2084.92 sec  -3.00%        51.32 sec   3.39%        7.26 sec -15.24%
AR 0.942199 8S 0P 0T 0G            1252.34 sec  -4.24%        45.24 sec   2.86%        4.42 sec   2.21%
                                                                                                
Total                             23817.47 sec  -2.49%       505.28 sec  -0.21%       67.76 sec  12.98%
                                                                                                
                                                                                                
                                                                                                
                                  CUDA 3 app, CUDA 3 libs, no jit                                                                                            
                                      real                      user                     sys      
AR 0.230341 0S 1P 0T 18G           5282.36 sec  -6.57%        72.33 sec   1.59%       11.7  sec   0.09%
AR 0.265939 2S 0P 0T 5G            4323.51 sec  -7.50%        67.5  sec   1.57%       10.54 sec   0.57%
AR 0.309898 29S 1P 0T 0G           1023.86 sec  -9.37%        42.41 sec  -1.12%        3.24 sec   2.11%
AR 0.386085 3S 1P 0T 1G            3007.54 sec  -9.13%        58.44 sec -20.69%        7.96 sec  58.13%
AR 0.396978 6S 1P 2T 0G            2921.9  sec  -9.22%        58.06 sec   1.19%        7.71 sec   4.58%
AR 0.409679 2S 0P 0T 0G            2678.24 sec  -9.96%        56.37 sec   1.90%        7.34 sec   3.55%
AR 0.437709 1S 0P 1T 0G            2459.33 sec -10.19%        55.56 sec   0.52%        6.89 sec  -2.38%
AR 0.510893 4S 0P 0T 0G            2234.59 sec -10.40%        52    sec   2.11%        6.18 sec   1.90%
AR 0.942199 8S 0P 0T 0G            1331.61 sec -10.84%        45.4  sec   2.51%        4.46 sec   1.33%
                                                                                                
Total                             25262.94 sec  -8.71%       508.07 sec  -0.77%       66.02 sec  15.22%
                                                                                                
                                                                                                
                                                                                                
                               CUDA 3 vlarkill app, CUDA 3 libs, no jit                                                                                           
                                      real                      user                     sys      
AR 0.230341 0S 1P 0T 18G           5282.05 sec  -6.56%        71.86 sec   2.23%       11.86 sec  -1.28%
AR 0.265939 2S 0P 0T 5G            4321.99 sec  -7.46%        66.95 sec   2.38%       10.26 sec   3.21%
AR 0.309898 29S 1P 0T 0G           1024.01 sec  -9.39%        42.57 sec  -1.50%        3.16 sec   4.53%
AR 0.386085 3S 1P 0T 1G            3007.05 sec  -9.11%        58.21 sec -20.22%        7.71 sec  59.44%
AR 0.396978 6S 1P 2T 0G            2923.24 sec  -9.27%        58.72 sec   0.07%        7.8  sec   3.47%
AR 0.409679 2S 0P 0T 0G            2678.58 sec  -9.98%        56.52 sec   1.64%        7.45 sec   2.10%
AR 0.437709 1S 0P 1T 0G            2458.97 sec -10.17%        55.2  sec   1.16%        7    sec  -4.01%
AR 0.510893 4S 0P 0T 0G            2233.06 sec -10.32%        52.49 sec   1.19%        5.7  sec   9.52%
AR 0.942199 8S 0P 0T 0G            1331.66 sec -10.84%        45.38 sec   2.56%        4.43 sec   1.99%
                                                                                                
Total                             25260.61 sec  -8.70%       507.9  sec  -0.73%       65.37 sec  16.05%



                                     cuda30 app, CUDA 3 libs, no jit                                                                                          
                                      real                      user                     sys      
AR  0.230341 0S 1P 0T 18G          5282.88 sec  -6.58%        72.34 sec   1.58%       11.7  sec   0.09%
AR  0.265939 2S 0P 0T 5G           4322.89 sec  -7.49%        67.25 sec   1.94%       10.55 sec   0.47%
AR  0.309898 29S 1P 0T 0G          1023.61 sec  -9.34%        42.21 sec  -0.64%        3.22 sec   2.72%
AR  0.386085 3S 1P 0T 1G           3008.17 sec  -9.15%        58.68 sec -21.19%        8.14 sec  57.18%
AR  0.396978 6S 1P 2T 0G           2921.98 sec  -9.22%        57.88 sec   1.50%        7.67 sec   5.07%
AR  0.409679 2S 0P 0T 0G           2678.14 sec  -9.96%        56.3  sec   2.02%        7.35 sec   3.42%
AR  0.437709 1S 0P 1T 0G           2458.46 sec -10.15%        55.11 sec   1.32%        7.06 sec  -4.90%
AR  0.510893 4S 0P 0T 0G           2235.67 sec -10.45%        52.51 sec   1.15%        6.36 sec  -0.95%
AR  0.942199 8S 0P 0T 0G           1332.55 sec -10.91%        46.04 sec   1.14%        4.38 sec   3.10%
                                                                                                
Total                             25264.35 sec  -8.71%       508.32 sec  -0.82%       66.43 sec  14.69%
                                                                                                
                                                                                                
                                                                                                
                                     cuda30_v0.2 app, CUDA 3 libs, no jit                                                     
                                      real                      user                     sys      
AR  0.230341 0S 1P 0T 18G          5282.94 sec  -6.58%        72.91 sec   0.80%       11.91 sec  -1.71%
AR  0.265939 2S 0P 0T 5G           4322.73 sec  -7.48%        67.48 sec   1.60%       10.58 sec   0.19%
AR  0.309898 29S 1P 0T 0G          1023.14 sec  -9.29%        41.73 sec   0.50%        3.19 sec   3.63%
AR  0.386085 3S 1P 0T 1G           3008.15 sec  -9.15%        58.85 sec -21.54%        7.9  sec  58.44%
AR  0.396978 6S 1P 2T 0G           2922.16 sec  -9.23%        57.98 sec   1.33%        7.8  sec   3.47%
AR  0.409679 2S 0P 0T 0G           2678.51 sec  -9.97%        56.49 sec   1.69%        7.52 sec   1.18%
AR  0.437709 1S 0P 1T 0G           2458.38 sec -10.14%        55.24 sec   1.09%        6.86 sec  -1.93%
AR  0.510893 4S 0P 0T 0G           2235.82 sec -10.46%        52.67 sec   0.85%        6.3  sec   0.00%
AR  0.942199 8S 0P 0T 0G           1332.55 sec -10.91%        43.17 sec   7.30%        7.38 sec -63.27%
                                                                                                
Total                             25264.38 sec  -8.72%       506.52 sec  -0.46%       69.44 sec  10.83%

All results, in any permutation, were strongly similar.

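So the fastest combination was the cuda 2.2 app with cuda 2.3 libs and no jit flag, at least on this cuda 2.x graphics card. In total wall clock time, switching only the libs to cuda 3 was about 2.5% slower, and each of the cuda 3 apps was about 8.7% slower.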

I guess the combination of a cuda 3 app with cuda 3 libs will show its real potential only on a Fermi-based graphics card.

Pepi · « **Reply #1 on:** 20 Apr 2010, 01:22:46 pm »

Set the system environment flag CUDA_FORCE_PTX_JIT=1

Where do I put this, and what exactly do I need to write?

sunu · « **Reply #2 on:** 20 Apr 2010, 01:34:29 pm »

In the file /etc/profile, add this line:

export CUDA_FORCE_PTX_JIT=1

You'll need to restart your PC before it takes effect.
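
If you only want to try the flag without editing /etc/profile, a sketch like the following should work in any POSIX shell, assuming the application is started from that same terminal. printenv is used here only as a stand-in for the real application binary.

```shell
# Set the flag for the current shell session and everything started from it.
export CUDA_FORCE_PTX_JIT=1

# Confirm that child processes will see it.
printenv CUDA_FORCE_PTX_JIT

# Or set it for one command only, without relying on the session setting.
# printenv stands in for the real application binary.
CUDA_FORCE_PTX_JIT=1 printenv CUDA_FORCE_PTX_JIT

# Turn it off again for the session.
unset CUDA_FORCE_PTX_JIT
```

Set this way, the flag is gone once the terminal is closed, which makes it easy to switch between the jit and no jit test runs.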
