Author Topic: CUDA for prime number search  (Read 42274 times)

Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: CUDA for prime number search
« Reply #15 on: 17 Sep 2011, 02:51:38 pm »
In case someone would like to take a closer look at llr:
I have now found the llr download area.

heinz

Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: CUDA for prime number search
« Reply #16 on: 17 Sep 2011, 05:54:42 pm »
I was able to compile llr with VS2008 and CUDA 4.0:
1>llrcuda_win64 - 0 error(s), 336 warning(s)
========== Rebuild All: 1 succeeded, 0 failed, 0 skipped ==========

Using cutil.h and cutil_inline.h in the llr project under CUDA 4.0 is a bit problematic: cutil is no longer part of CUDA (since 4.0).

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: CUDA for prime number search
« Reply #17 on: 17 Sep 2011, 06:11:42 pm »
Well done, Heinz.  If you plan to do the boinc lib updates first (to fix exit conditions) and then optimisation, I can give more hints as time goes on.

Jason

Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: CUDA for prime number search
« Reply #18 on: 17 Sep 2011, 07:23:07 pm »
I ran a short test with the original llrCUDA (not my compiled version) on an i3 with a GT540M
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

C:\I\llrCUDA.0.60-win64\llrCUDA.0.60-win64>llrCUDA.exe -q"9999*2^458051+1" -d
Starting Proth prime test of 9999*2^458051+1
Using complex irrational base DWT, FFT length = 65536, a = 5

9999*2^458051+1 is prime!  Time : 487.041 sec..  Time per bit: 1.060 ms.

C:\I\llrCUDA.0.60-win64\llrCUDA.0.60-win64>llrCUDA.exe -q"1000065*2^390927-1" -d

Starting Lucas Lehmer Riesel prime test of 1000065*2^390927-1
Using real irrational base DWT, FFT length = 131072
V1 = 5 ; Computing U0...
V1 = 5 ; Computing U0...done.
Starting Lucas-Lehmer loop...
1000065*2^390927-1, iteration : 10000 / 390927 [2.55%].  Time per iteration : 1.
1000065*2^390927-1, iteration : 20000 / 390927 [5.11%].  Time per iteration : 1.
1000065*2^390927-1, iteration : 30000 / 390927 [7.67%].  Time per iteration : 1.
1000065*2^390927-1, iteration : 40000 / 390927 [10.23%].  Time per iteration : 1
...
...
1000065*2^390927-1, iteration : 190000 / 390927 [48.60%].  Time per iteration :
Iter: 192128/390926, ERROR: ROUND OFF (0.4675197601) > 0.4
Continuing from last save file.
Resuming LLR test of 1000065*2^390927-1 at iteration 2 [0.00%]
1000065*2^390927-1, iteration : 10000 / 390927 [2.55%].  Time per iteration : 1.
1000065*2^390927-1, iteration : 20000 / 390927 [5.11%].  Time per iteration : 1.
1000065*2^390927-1, iteration : 30000 / 390927 [7.67%].  Time per iteration : 1.
1000065*2^390927-1, iteration : 40000 / 390927 [10.23%].  Time per iteration : 1
..
..
1000065*2^390927-1, iteration : 380000 / 390927 [97.20%].  Time per iteration :
1000065*2^390927-1, iteration : 390000 / 390927 [99.76%].  Time per iteration :
1000065*2^390927-1 is not prime.  LLR Res64: 5704E082C8671874  Time : 721.315 sec.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
C:\I\llrCUDA.0.60-win64\llrCUDA.0.60-win64>llrCUDA.exe -q"313*2^1012240+1" -d
Starting Proth prime test of 313*2^1012240+1
Using complex irrational base DWT, FFT length = 131072, a = 3

313*2^1012240+1 is not prime.  Proth RES64: A3FC31A0497414EE  Time : 1949.425 sec.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
C:\I\llrCUDA.0.60-win64\llrCUDA.0.60-win64>llrCUDA.exe -q"192971*2^4998058-1" -d

Starting Lucas Lehmer Riesel prime test of 192971*2^4998058-1
Using real irrational base DWT, FFT length = 1048576
V1 = 4 ; Computing U0...
V1 = 4 ; Computing U0...done.
Starting Lucas-Lehmer loop...
192971*2^4998058-1, iteration : 10000 / 4998058 [0.20%].  Time per iteration : 2
192971*2^4998058-1, iteration : 20000 / 4998058 [0.40%].  Time per iteration : 1
192971*2^4998058-1, iteration : 30000 / 4998058 [0.60%].  Time per iteration : 1
192971*2^4998058-1, iteration : 40000 / 4998058 [0.80%].  Time per iteration : 1
...
...
192971*2^4998058-1, iteration : 2500000 / 4998058 [50.01%].  Time per iteration
192971*2^4998058-1, iteration : 2510000 / 4998058 [50.21%].  Time per iteration
192971*2^4998058-1, iteration : 2520000 / 4998058 [50.41%].  Time per iteration
192971*2^4998058-1, iteration : 2530000 / 4998058 [50.61%].  Time per iteration
...
...
192971*2^4998058-1, iteration : 4970000 / 4998058 [99.43%].  Time per iteration
192971*2^4998058-1, iteration : 4980000 / 4998058 [99.63%].  Time per iteration
192971*2^4998058-1, iteration : 4990000 / 4998058 [99.83%].  Time per iteration
192971*2^4998058-1 is not prime.  LLR Res64: DBBFCB63CFBA6EA2  Time : 71172.972sec.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
C:\I\llrCUDA.0.60-win64\llrCUDA.0.60-win64>llrCUDA.exe -q"3*2^7033641+1" -d
Starting Proth prime test of 3*2^7033641+1
Using complex irrational base DWT, FFT length = 1048576, a = 5

3*2^7033641+1, bit: 90000 / 7033642 [1.27%].  Time per bit: 14.932 ms.
..
3*2^7033641+1, bit: 2590000 / 7033642 [36.82%].  Time per bit: 14.932 ms.
3*2^7033641+1, bit: 2770000 / 7033642 [39.38%].  Time per bit: 14.931 ms.
3*2^7033641+1, bit: 4700000 / 7033642 [66.82%].  Time per bit: 14.932 ms. (20 hours)
...
3*2^7033641+1 is not prime.  Proth RES64: 4DDC768A04467D4E  Time : 105090.700 sec.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Uhh, one test ran for about 19 hours.
The next one looks like a long runner too; the precalculation says ~30 hours... I will see how it ends.
Remark: GPU temperature increased from 70 to 79 °C.
Done now, it was a longer test...
I will rerun the first two tasks to see the differences.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I will make a modified batch file for speed-testing variants.
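
As a rough cross-check of the ~30 hours figure, the reported time per bit can simply be multiplied by the bit count from the progress lines of the 3*2^7033641+1 run. This is only a sketch; the function name `estimate_hours` is made up for illustration and is not part of llrCUDA.

```python
def estimate_hours(total_bits: int, ms_per_bit: float) -> float:
    """Estimated wall-clock hours for a run of total_bits steps."""
    seconds = total_bits * ms_per_bit / 1000.0
    return seconds / 3600.0


# Values copied from the 3*2^7033641+1 progress lines above.
hours = estimate_hours(7033642, 14.932)
print(f"estimated runtime: {hours:.1f} h")
```

That works out to roughly 29 hours, which fits the 105090.700 sec reported at the end of that run.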
« Last Edit: 20 Sep 2011, 02:30:23 am by _heinz »

Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: CUDA for prime number search
« Reply #19 on: 18 Sep 2011, 10:22:45 am »
Well done, Heinz.  If you plan to do the boinc lib updates first (to fix exit conditions) and then optimisation, I can give more hints as time goes on.

Jason
Hi Jason
It's a good idea to make the boinc lib updates now...
some hints, links ?

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: CUDA for prime number search
« Reply #20 on: 18 Sep 2011, 10:28:28 am »
Hi Jason
It's a good idea to make the boinc lib updates now...
some hints, links ?

As a first step, you'll need to look at building some updated Boinc libs from the Boinc trunk, then making them fit into the application. That will require some app updates to the files referenced in Boinc to be compatible, and to use the newer GPU-related features.

Offline aaronhaviland

  • guinea-pig
  • Volunteer Developer
  • Knight o' The Realm
  • *****
  • Posts: 113
    • My computers at seti@home
Re: CUDA for prime number search
« Reply #21 on: 18 Sep 2011, 10:30:18 am »
Using cutil.h and cutil_inline.h in the llr project under CUDA 4.0 is a bit problematic: cutil is no longer part of CUDA (since 4.0).

It doesn't use CUTIL for much, anyway. It doesn't take much to remove this dependency. (So far as I've seen, most projects that use CUTIL only use cutilSafeCall() and/or cufftSafeCall()).

Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: CUDA for prime number search
« Reply #22 on: 18 Sep 2011, 11:28:38 am »
Hi Jason
It's a good idea to make the boinc lib updates now...
some hints, links ?

As a first step, you'll need to look at building some updated Boinc libs from the Boinc trunk, then making them fit into the application. That will require some app updates to the files referenced in Boinc to be compatible, and to use the newer GPU-related features.

C:\I\SC\pg\Ken-g6-PSieve-CUDA-a17a696_heinz\boinc
At revision: 24231
One or more files are in a conflicted state.

Offline aaronhaviland

  • guinea-pig
  • Volunteer Developer
  • Knight o' The Realm
  • *****
  • Posts: 113
    • My computers at seti@home
Re: CUDA for prime number search
« Reply #23 on: 18 Sep 2011, 09:58:41 pm »
C:\I\llrCUDA.0.60-win64\llrCUDA.0.60-win64>llrCUDA.exe -q"1000065*2^390927-1" -d
1000065*2^390927-1 is not prime.  LLR Res64: 5704E082C8671874  Time : 721.315 sec.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
C:\I\llrCUDA.0.60-win64\llrCUDA.0.60-win64>llrCUDA.exe -q"313*2^1012240+1" -d
313*2^1012240+1 is not prime.  Proth RES64: A3FC31A0497414EE  Time : 1949.425 sec.

I'm concerned, you're getting the wrong results here: "1000065*2^390927-1" should be prime! "313*2^1012240+1" should return 5FA128A9BECBCDD3.
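
For cross-checking small cases on the CPU, a Lucas-Lehmer-Riesel style test for N = k*2^n-1 can be sketched in a few lines of Python. This is an illustration only and not how llr or llrCUDA are implemented: it uses plain Python integers instead of an FFT, the helper names (`jacobi`, `lucas_v`, `llr_is_prime`) are made up here, and the order in which it searches for the starting parameter need not match the "V1 = ..." values in the logs. It assumes k is odd and k < 2^n.

```python
from math import gcd


def jacobi(a: int, n: int) -> int:
    """Jacobi symbol (a/n) for odd positive n."""
    a %= n
    result = 1
    while a != 0:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0


def lucas_v(k: int, p: int, n: int) -> int:
    """V_k(p, 1) mod n, using a binary ladder over the bits of k."""
    a, b = 2, p % n  # (V_0, V_1)
    for bit in bin(k)[2:]:
        if bit == "1":
            a, b = (a * b - p) % n, (b * b - 2) % n  # m -> 2m + 1
        else:
            a, b = (a * a - 2) % n, (a * b - p) % n  # m -> 2m
    return a


def llr_is_prime(k: int, n: int) -> bool:
    """Test N = k*2^n - 1 for primality (k odd, k < 2^n, n >= 2)."""
    if k < 1 or k % 2 == 0 or n < 2 or k >= 2**n:
        raise ValueError("need odd k with 0 < k < 2^n and n >= 2")
    N = k * 2**n - 1
    p = 3
    while True:
        # A shared factor with N proves N composite right away.
        if any(1 < gcd(x, N) < N for x in (p - 2, p + 2)):
            return False
        if jacobi(p - 2, N) == 1 and jacobi(p + 2, N) == -1:
            break
        p += 1
    u = lucas_v(k, p, N)  # starting value u_0
    for _ in range(n - 2):
        u = (u * u - 2) % N
    return u == 0


print(llr_is_prime(3, 3))   # N = 23
print(llr_is_prime(3, 5))   # N = 95
```

Pure Python like this is only practical for small exponents; numbers the size of 1000065*2^390927-1 really need the FFT-based multiplication that llr uses.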

Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: CUDA for prime number search
« Reply #24 on: 19 Sep 2011, 02:42:14 am »
C:\I\llrCUDA.0.60-win64\llrCUDA.0.60-win64>llrCUDA.exe -q"1000065*2^390927-1" -d
1000065*2^390927-1 is not prime.  LLR Res64: 5704E082C8671874  Time : 721.315 sec.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
C:\I\llrCUDA.0.60-win64\llrCUDA.0.60-win64>llrCUDA.exe -q"313*2^1012240+1" -d
313*2^1012240+1 is not prime.  Proth RES64: A3FC31A0497414EE  Time : 1949.425 sec.

I'm concerned, you're getting the wrong results here: "1000065*2^390927-1" should be prime! "313*2^1012240+1" should return 5FA128A9BECBCDD3.
Although I ran the original downloaded llrCUDA.exe, I will rerun those two (once the whole test ends) to see if I get your result.
If not we have a problem there.
heinz

Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: CUDA for prime number search
« Reply #25 on: 20 Sep 2011, 03:51:54 am »
Rerun of those two, this time with the right results:

C:\I\llrCUDA.0.60-win64\llrCUDA.0.60-win64>llrCUDA.exe -q"1000065*2^390927-1" -d

Starting Lucas Lehmer Riesel prime test of 1000065*2^390927-1
Using real irrational base DWT, FFT length = 131072
V1 = 5 ; Computing U0...
V1 = 5 ; Computing U0...done.
Starting Lucas-Lehmer loop...
1000065*2^390927-1, iteration : 10000 / 390927 [2.55%].  Time per iteration : 1.
...
1000065*2^390927-1, iteration : 390000 / 390927 [99.76%].  Time per iteration :
1000065*2^390927-1 is prime!  Time : 701.721 sec.

C:\I\llrCUDA.0.60-win64\llrCUDA.0.60-win64>llrCUDA.exe -q"313*2^1012240+1" -d
Starting Proth prime test of 313*2^1012240+1
Using complex irrational base DWT, FFT length = 131072, a = 3

313*2^1012240+1 is not prime.  Proth RES64: 5FA128A9BECBCDD3  Time : 1894.073 sec.
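
For small numbers, the Proth test shown in the log above can be reproduced on the CPU with Python's built-in modular `pow`. This is a hedged sketch, not llrCUDA's code: the names `jacobi` and `proth_is_prime` are made up for illustration, the base a is simply the smallest value with Jacobi symbol -1 (so it need not equal the "a = 3" or "a = 5" printed by llrCUDA), and it assumes k is odd with k < 2^n.

```python
from math import isqrt


def jacobi(a: int, n: int) -> int:
    """Jacobi symbol (a/n) for odd positive n."""
    a %= n
    result = 1
    while a != 0:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0


def proth_is_prime(k: int, n: int) -> bool:
    """Test N = k*2^n + 1 for primality (k odd, 0 < k < 2^n)."""
    if k < 1 or k % 2 == 0 or n < 1 or k >= 2**n:
        raise ValueError("need odd k with 0 < k < 2^n")
    N = k * 2**n + 1
    # A perfect square has no base with Jacobi symbol -1, and is composite.
    if isqrt(N) ** 2 == N:
        return False
    a = 2
    while jacobi(a, N) != -1:
        a += 1
    # Proth's theorem: N is prime iff a^((N-1)/2) == -1 (mod N) for such a.
    return pow(a, (N - 1) // 2, N) == N - 1


print(proth_is_prime(3, 2))  # N = 13
print(proth_is_prime(7, 3))  # N = 57
```

Running this on something like 313*2^1012240+1 would be far too slow in plain Python, so it is only useful for sanity checks on small k and n.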
« Last Edit: 20 Sep 2011, 04:01:17 am by _heinz »

Offline Pepi

  • Knight o' The Realm
  • **
  • Posts: 119
Re: CUDA for prime number search
« Reply #26 on: 20 Sep 2011, 12:39:36 pm »
Is this some new build or "old" one? :)

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: CUDA for prime number search
« Reply #27 on: 20 Sep 2011, 12:45:17 pm »
Is this some new build or "old" one? :)

Likely work in progress, so not even alpha yet Pepi  ;)

Offline Pepi

  • Knight o' The Realm
  • **
  • Posts: 119
Re: CUDA for prime number search
« Reply #28 on: 20 Sep 2011, 12:54:47 pm »
Alpha or Beta, everything shows progress :)
Hip hip hooray :)
It was a small step for me, but a big step for my GPU :)

Offline aaronhaviland

  • guinea-pig
  • Volunteer Developer
  • Knight o' The Realm
  • *****
  • Posts: 113
    • My computers at seti@home
Re: CUDA for prime number search
« Reply #29 on: 10 Oct 2011, 07:09:05 pm »
On a slightly related note, I've been working with CUDALucas a bit recently, as the current devs of it over at mersenneforum.org had completely broken its Linux support.
It is only for testing Mersenne primes, and is limited by memory to only being able to test primes up to around 2^290000000-1... which would currently take about 245 days on a GTX460 (an exponent which, if my calculations are correct, would take about 19 years on a 2GHz single-core CPU). The next Mersenne to win an EFF Cooperative Computing Award would be around 2^336000000-1.

My fork (only builds on 64-bit Linux currently): https://github.com/ah42/CUDALucas
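
For anyone unfamiliar with the test itself, here is a minimal CPU sketch of the Lucas-Lehmer test for Mersenne numbers, plus a helper that estimates the decimal digit count of 2^p-1. It uses plain Python integers rather than the FFT multiplication a GPU tester relies on, and the function names are invented for this example.

```python
from math import floor, log10


def lucas_lehmer(p: int) -> bool:
    """True if 2**p - 1 is prime; p is assumed to be a prime exponent."""
    if p == 2:
        return True  # 2**2 - 1 = 3
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


def decimal_digits(p: int) -> int:
    """Number of decimal digits of 2**p - 1 (floating-point estimate)."""
    return floor(p * log10(2)) + 1


print([p for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31) if lucas_lehmer(p)])
print(decimal_digits(336000000))
```

The digit helper shows why the two exponents above matter: an exponent near 290000000 stays below 100 million decimal digits, while one near 336000000 is above it.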

 
