Author Topic: Seti@home CUDA 3.0 linux apps  (Read 12710 times)

Offline sunu

  • Alpha Tester
  • Knight who says 'Ni!'
  • ***
  • Posts: 771
Seti@home CUDA 3.0 linux apps
« on: 08 Apr 2010, 03:25:05 pm »
Crunch3r has released four new seti@home cuda 3.0 apps here.

I checked the speed of the apps and the validity of the results. The tests were done on this machine.

First of all, if you haven't seen it, nvidia has released a Fermi compatibility guide where it says:

My application uses the CUDA Runtime API with CUDA Toolkit 2.1, 2.2, or 2.3.
How can I confirm that my application is ready to run on Fermi?
Answer: CUDA applications built using the CUDA Toolkit versions 2.1 through
2.3 are compatible with Fermi as long as they are built to include PTX versions of
their kernels. NVIDIA Driver versions 195.xx or newer allow the application to use
the PTX JIT code path. To test that PTX JIT is working for your application, you
can do the following:
- Go to the NVIDIA website, and install the latest R195 driver.
- Set the system environment flag CUDA_FORCE_PTX_JIT=1
- Launch your application.
When starting a CUDA application for the first time with the above environment
flag, the CUDA driver will JIT compile the PTX for each CUDA kernel that is used
into native CUBIN code. The generated CUBIN for the target GPU architecture is
cached by the CUDA driver. This cache persists across system shutdown/restart.
If this test passes, then your application is ready for Fermi.

So I've done the following tests:
Cuda 2.2 app
cuda 2.3 libs OK
cuda 3 libs OK

CUDA_FORCE_PTX_JIT flag enabled:
cuda 2.3 libs OK
cuda 3 libs FAIL

All cuda 3 apps
cuda 3 libs OK

CUDA_FORCE_PTX_JIT flag enabled:
cuda 3 libs FAIL

The combination of CUDA_FORCE_PTX_JIT flag and cuda 3 libraries produces garbage results (all result overflows) no matter what app was used.

Now to the tests.
I used 10 different normal workunits. Instead of their names, I list their angle range (AR) along with what was found: S=Spikes, P=Pulses, T=Triplets and G=Gaussians.
I used the time utility to get accurate running times. A bit of explanation:

real: elapsed real (wall clock) time used by the process
user: Total number of CPU-seconds that the process used directly (in user mode)
sys: Total number of CPU-seconds used by the system on behalf of the process (in kernel mode) e.g., executing system calls
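For anyone who wants to collect the same three numbers from a script, here is a minimal Python sketch of what the time utility reports. It is not what was used for the tables in this post; `time_command` is a hypothetical helper name, and the child command is a stand-in for the real application.

```python
import resource
import subprocess
import sys
import time


def time_command(cmd):
    """Run cmd and return (real, user, sys) in seconds, similar to time(1)."""
    # CPU time already accumulated by earlier, waited-for child processes.
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    real = time.perf_counter() - start  # wall clock time
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    user = after.ru_utime - before.ru_utime  # CPU seconds in user mode
    system = after.ru_stime - before.ru_stime  # CPU seconds in kernel mode
    return real, user, system


if __name__ == "__main__":
    # Stand-in workload; a real run would pass the application's command line.
    cmd = [sys.executable, "-c", "sum(range(10**6))"]
    real, user, system = time_command(cmd)
    print(f"real {real:.2f} sec  user {user:.2f} sec  sys {system:.2f} sec")
```

Since user and sys count only CPU seconds, a process that spends most of its run waiting on something other than the CPU will show a real time far larger than user + sys.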

Percentages were calculated against a base run, the cuda 2.2 app with cuda 2.3 libraries and no CUDA_FORCE_PTX_JIT flag,
with the formula 100*(cuda2.2-app)/cuda2.2, so a negative value means slower than the base. The exception is the VLAR workunit, where AK_V8 was used as the base.
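The percentage formula can be written as a small Python sketch. `percent_vs_base` is a hypothetical name; the inputs are the "real" values taken from the tables in this post.

```python
def percent_vs_base(base, app):
    """100*(base-app)/base: positive means app took less time than base."""
    return 100.0 * (base - app) / base


if __name__ == "__main__":
    # Total real time: base run vs. "CUDA 2.2 app, CUDA 2.3 libs, jit enabled".
    print(f"{percent_vs_base(23239.09, 23316.17):.2f}%")
    # VLAR workunit, real time: AK_V8 (base) vs. CUDA 3.
    print(f"{percent_vs_base(8886.5, 10700.64):.2f}%")
```

These two calls reproduce the -0.33% and -20.41% entries in the tables.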

                                      AR 0.011923 0S 7P 1T 0G
                                      real                      user                     sys     
AK_V8 ssse3 64bit                  8886.5  sec              8876.21 sec                1.31 sec
CUDA 3                            10700.64 sec  -20.41%       52.13 sec  99.41%        5.82 sec -344.27%
cuda30                            10701.57 sec  -20.43%       52.77 sec  99.41%        5.86 sec -347.33%
cuda30_v0.2                       10701.18 sec  -20.42%       52.55 sec  99.41%        5.73 sec -337.40%

                              CUDA 2.2 app, CUDA 2.3 libs, no jit                                                                                           
                                      real                      user                     sys     
AR 0.230341 0S 1P 0T 18G           4956.77 sec                73.5  sec               11.71 sec
AR 0.265939 2S 0P 0T 5G            4021.79 sec                68.58 sec               10.6  sec
AR 0.309898 29S 1P 0T 0G            936.14 sec                41.94 sec                3.31 sec
AR 0.386085 3S 1P 0T 1G            2756.04 sec                48.42 sec               19.01 sec
AR 0.396978 6S 1P 2T 0G            2675.23 sec                58.76 sec                8.08 sec
AR 0.409679 2S 0P 0T 0G            2435.57 sec                57.46 sec                7.61 sec
AR 0.437709 1S 0P 1T 0G            2231.96 sec                55.85 sec                6.73 sec
AR 0.510893 4S 0P 0T 0G            2024.16 sec                53.12 sec                6.3  sec
AR 0.942199 8S 0P 0T 0G            1201.43 sec                46.57 sec                4.52 sec
Total                             23239.09 sec               504.2  sec               77.87 sec
                             CUDA 2.2 app, CUDA 2.3 libs, jit enabled                                                                                           
                                      real                      user                     sys     
AR 0.230341 0S 1P 0T 18G           4965.56 sec  -0.18%        82.1  sec -11.70%       12.24 sec  -4.53%
AR 0.265939 2S 0P 0T 5G            4030.33 sec  -0.21%        77.52 sec -13.04%       10.8  sec  -1.89%
AR 0.309898 29S 1P 0T 0G            945.22 sec  -0.97%        51.26 sec -22.22%        3.4  sec  -2.72%
AR 0.386085 3S 1P 0T 1G            2763.89 sec  -0.28%        68.13 sec -40.71%        8.22 sec  56.76%
AR 0.396978 6S 1P 2T 0G            2684.3  sec  -0.34%        68.25 sec -16.15%        8.24 sec  -1.98%
AR 0.409679 2S 0P 0T 0G            2443.44 sec  -0.32%        65.97 sec -14.81%        7.67 sec  -0.79%
AR 0.437709 1S 0P 1T 0G            2240.33 sec  -0.38%        64.65 sec -15.76%        7.08 sec  -5.20%
AR 0.510893 4S 0P 0T 0G            2033.17 sec  -0.45%        62.74 sec -18.11%        6.6  sec  -4.76%
AR 0.942199 8S 0P 0T 0G            1209.93 sec  -0.71%        55.3  sec -18.75%        4.85 sec  -7.30%
Total                             23316.17 sec  -0.33%       595.92 sec -18.19%       69.1  sec  11.26%
                                 CUDA 2.2 app, CUDA 3 libs, no jit                                                                                           
                                      real                      user                     sys     
AR 0.230341 0S 1P 0T 18G           5042.03 sec  -1.72%        72    sec   2.04%       11.98 sec  -2.31%
AR 0.265939 2S 0P 0T 5G            4102.33 sec  -2.00%        67.36 sec   1.78%       10.7  sec  -0.94%
AR 0.309898 29S 1P 0T 0G            973.15 sec  -3.95%        41.91 sec   0.07%        3.18 sec   3.93%
AR 0.386085 3S 1P 0T 1G            2823.52 sec  -2.45%        58.44 sec -20.69%        7.99 sec  57.97%
AR 0.396978 6S 1P 2T 0G            2742.01 sec  -2.50%        58.02 sec   1.26%        7.88 sec   2.48%
AR 0.409679 2S 0P 0T 0G            2501.22 sec  -2.70%        56.48 sec   1.71%        7.38 sec   3.02%
AR 0.437709 1S 0P 1T 0G            2295.95 sec  -2.87%        54.51 sec   2.40%        6.97 sec  -3.57%
AR 0.510893 4S 0P 0T 0G            2084.92 sec  -3.00%        51.32 sec   3.39%        7.26 sec -15.24%
AR 0.942199 8S 0P 0T 0G            1252.34 sec  -4.24%        45.24 sec   2.86%        4.42 sec   2.21%
Total                             23817.47 sec  -2.49%       505.28 sec  -0.21%       67.76 sec  12.98%
                                  CUDA 3 app, CUDA 3 libs, no jit                                                                                           
                                      real                      user                     sys     
AR 0.230341 0S 1P 0T 18G           5282.36 sec  -6.57%        72.33 sec   1.59%       11.7  sec   0.09%
AR 0.265939 2S 0P 0T 5G            4323.51 sec  -7.50%        67.5  sec   1.57%       10.54 sec   0.57%
AR 0.309898 29S 1P 0T 0G           1023.86 sec  -9.37%        42.41 sec  -1.12%        3.24 sec   2.11%
AR 0.386085 3S 1P 0T 1G            3007.54 sec  -9.13%        58.44 sec -20.69%        7.96 sec  58.13%
AR 0.396978 6S 1P 2T 0G            2921.9  sec  -9.22%        58.06 sec   1.19%        7.71 sec   4.58%
AR 0.409679 2S 0P 0T 0G            2678.24 sec  -9.96%        56.37 sec   1.90%        7.34 sec   3.55%
AR 0.437709 1S 0P 1T 0G            2459.33 sec -10.19%        55.56 sec   0.52%        6.89 sec  -2.38%
AR 0.510893 4S 0P 0T 0G            2234.59 sec -10.40%        52    sec   2.11%        6.18 sec   1.90%
AR 0.942199 8S 0P 0T 0G            1331.61 sec -10.84%        45.4  sec   2.51%        4.46 sec   1.33%
Total                             25262.94 sec  -8.71%       508.07 sec  -0.77%       66.02 sec  15.22%
                               CUDA 3 vlarkill app, CUDA 3 libs, no jit                                                                                           
                                      real                      user                     sys     
AR 0.230341 0S 1P 0T 18G           5282.05 sec  -6.56%        71.86 sec   2.23%       11.86 sec  -1.28%
AR 0.265939 2S 0P 0T 5G            4321.99 sec  -7.46%        66.95 sec   2.38%       10.26 sec   3.21%
AR 0.309898 29S 1P 0T 0G           1024.01 sec  -9.39%        42.57 sec  -1.50%        3.16 sec   4.53%
AR 0.386085 3S 1P 0T 1G            3007.05 sec  -9.11%        58.21 sec -20.22%        7.71 sec  59.44%
AR 0.396978 6S 1P 2T 0G            2923.24 sec  -9.27%        58.72 sec   0.07%        7.8  sec   3.47%
AR 0.409679 2S 0P 0T 0G            2678.58 sec  -9.98%        56.52 sec   1.64%        7.45 sec   2.10%
AR 0.437709 1S 0P 1T 0G            2458.97 sec -10.17%        55.2  sec   1.16%        7    sec  -4.01%
AR 0.510893 4S 0P 0T 0G            2233.06 sec -10.32%        52.49 sec   1.19%        5.7  sec   9.52%
AR 0.942199 8S 0P 0T 0G            1331.66 sec -10.84%        45.38 sec   2.56%        4.43 sec   1.99%
Total                             25260.61 sec  -8.70%       507.9  sec  -0.73%       65.37 sec  16.05%

                                     cuda30 app, CUDA 3 libs, no jit                                                                                         
                                      real                      user                     sys     
AR  0.230341 0S 1P 0T 18G          5282.88 sec  -6.58%        72.34 sec   1.58%       11.7  sec   0.09%
AR  0.265939 2S 0P 0T 5G           4322.89 sec  -7.49%        67.25 sec   1.94%       10.55 sec   0.47%
AR  0.309898 29S 1P 0T 0G          1023.61 sec  -9.34%        42.21 sec  -0.64%        3.22 sec   2.72%
AR  0.386085 3S 1P 0T 1G           3008.17 sec  -9.15%        58.68 sec -21.19%        8.14 sec  57.18%
AR  0.396978 6S 1P 2T 0G           2921.98 sec  -9.22%        57.88 sec   1.50%        7.67 sec   5.07%
AR  0.409679 2S 0P 0T 0G           2678.14 sec  -9.96%        56.3  sec   2.02%        7.35 sec   3.42%
AR  0.437709 1S 0P 1T 0G           2458.46 sec -10.15%        55.11 sec   1.32%        7.06 sec  -4.90%
AR  0.510893 4S 0P 0T 0G           2235.67 sec -10.45%        52.51 sec   1.15%        6.36 sec  -0.95%
AR  0.942199 8S 0P 0T 0G           1332.55 sec -10.91%        46.04 sec   1.14%        4.38 sec   3.10%
Total                             25264.35 sec  -8.71%       508.32 sec  -0.82%       66.43 sec  14.69%
                                     cuda30_v0.2 app, CUDA 3 libs, no jit                                                     
                                      real                      user                     sys     
AR  0.230341 0S 1P 0T 18G          5282.94 sec  -6.58%        72.91 sec   0.80%       11.91 sec  -1.71%
AR  0.265939 2S 0P 0T 5G           4322.73 sec  -7.48%        67.48 sec   1.60%       10.58 sec   0.19%
AR  0.309898 29S 1P 0T 0G          1023.14 sec  -9.29%        41.73 sec   0.50%        3.19 sec   3.63%
AR  0.386085 3S 1P 0T 1G           3008.15 sec  -9.15%        58.85 sec -21.54%        7.9  sec  58.44%
AR  0.396978 6S 1P 2T 0G           2922.16 sec  -9.23%        57.98 sec   1.33%        7.8  sec   3.47%
AR  0.409679 2S 0P 0T 0G           2678.51 sec  -9.97%        56.49 sec   1.69%        7.52 sec   1.18%
AR  0.437709 1S 0P 1T 0G           2458.38 sec -10.14%        55.24 sec   1.09%        6.86 sec  -1.93%
AR  0.510893 4S 0P 0T 0G           2235.82 sec -10.46%        52.67 sec   0.85%        6.3  sec   0.00%
AR  0.942199 8S 0P 0T 0G           1332.55 sec -10.91%        43.17 sec   7.30%        7.38 sec -63.27%
Total                             25264.38 sec  -8.72%       506.52 sec  -0.46%       69.44 sec  10.83%

All results, in any permutation, were strongly similar.

So the fastest combination was cuda 2.2 app, cuda 2.3 libs and no jit flag, at least for this cuda 2.x enabled graphics card.

I guess the combination of a cuda 3 app with cuda 3 libs will show its real face only on a Fermi-based graphics card.

Offline Pepi

  • Knight o' The Realm
  • **
  • Posts: 119
Re: Seti@home CUDA 3.0 linux apps
« Reply #1 on: 20 Apr 2010, 01:22:46 pm »
Set the system environment flag CUDA_FORCE_PTX_JIT=1

Where do I put it, and what exactly do I need to write?

Offline sunu

  • Alpha Tester
  • Knight who says 'Ni!'
  • ***
  • Posts: 771
Re: Seti@home CUDA 3.0 linux apps
« Reply #2 on: 20 Apr 2010, 01:34:29 pm »
In the file /etc/profile write


You'll need to restart your PC before it takes effect.
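The flag can also be set for a single process instead of system-wide. This is a minimal Python sketch of that alternative, not the /etc/profile method described in this reply; `run_with_ptx_jit` is a hypothetical helper name, and the child command only echoes the flag so the sketch runs without a CUDA application.

```python
import os
import subprocess
import sys


def run_with_ptx_jit(cmd):
    """Run cmd with CUDA_FORCE_PTX_JIT=1 set for that process only."""
    # Copy the current environment and add the flag; nothing is changed
    # for the calling shell or for other processes.
    env = dict(os.environ, CUDA_FORCE_PTX_JIT="1")
    return subprocess.run(cmd, env=env, capture_output=True, text=True)


if __name__ == "__main__":
    # Stand-in child that prints the flag it received; a real run would
    # pass the command line of the CUDA application instead.
    child = [sys.executable, "-c",
             "import os; print(os.environ['CUDA_FORCE_PTX_JIT'])"]
    print(run_with_ptx_jit(child).stdout.strip())
```

This suits standalone test runs, and no restart is needed. An application started by another program would presumably need the flag in that program's environment instead.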

