Author Topic: OpenCL AP v7 memory consumption  (Read 18164 times)

Offline Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 14349
OpenCL AP v7 memory consumption
« on: 24 Oct 2014, 05:07:23 pm »
I logged about one day of my host's work (HD6950 running 2 AP instances, logging one of them) and got this picture.
So my host is also affected by increased memory consumption when running wild tasks (as the bottom of the picture shows, the max working set was ~750 MB, while the usual is near 100 MB). Next I will reproduce the situation with Juan's test case task in an offline test, with more detailed logging.

Offline Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 14349
Re: OpenCL AP v7 memory consumption
« Reply #1 on: 24 Oct 2014, 05:59:40 pm »
As one can see, this task (ap_26jn14aa_B6_P0_00184_20141017_17161.wu) does indeed consume more memory than usual, but far less than the reported ~1 GB (with the same settings applied, but on an ATi instead of an NV device/app).

1 - start of the offline ap_26jn14aa_B6_P0_00184_20141017_17161.wu run.
2 - usual memory consumption for a live task with different options (much lower ffa_block values).
3 - restart of the live task.

Will check the second test case now.

Offline Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 14349
Re: OpenCL AP v7 memory consumption
« Reply #2 on: 25 Oct 2014, 04:29:55 am »
The picture for the second task (ap_28jn14aa_B1_P1_00131_20141017_13023.wu) is much more interesting.

There is a huge increase in memory consumption, up to the point of an exception.

Quote
ERROR: some exception inside long FFA, probably video-driver restart, restarting app...

So the memory increase occurs during the whole task life, which implies that memory is not freed between FFA calls (EDIT: need to see whether the "whole task life" consists of a few FFA calls or only a single one for this particular task).

The next run will be with defaults.
« Last Edit: 25 Oct 2014, 05:26:57 am by Raistmer »

Offline Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 14349
Re: OpenCL AP v7 memory consumption
« Reply #3 on: 25 Oct 2014, 07:44:05 am »
The run with defaults did not crash, completed MUCH FASTER than with a big -ffa_block, and shows that the overflow occurred in the first large FFA pair.
Such an FFA pair is processed as a single entity with the TWIN_FFA mod, so it's impossible to say whether memory leaked between FFA runs or not.
The memory increase over a usual task is still very considerable (defaults for the HD6950 mean 5632 threads in flight).

Next to discover is how many FFA blocks were processed before task completion. Each FFA block should behave independently of the previous ones in terms of memory consumption...
« Last Edit: 25 Oct 2014, 07:46:16 am by Raistmer »

Offline Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 14349
Re: OpenCL AP v7 memory consumption
« Reply #4 on: 25 Oct 2014, 01:22:48 pm »
Interestingly, with signal logging enabled (-v 2 option) the memory kinetics look different, but memory consumption is roughly the same in the end.
The task run took ~3 hours instead of <30 mins, though.

And it seems there is nothing to fix. The whole enormous number of signals found and eventually replaced/discarded was found inside a single FFA block (governed by the -ffa_block param). That is, the only way to reduce the memory requirement for this kind of task is to reduce the -ffa_block value.

The only thing left to check is to decrease -ffa_block below the default value and observe the memory (and processing time!) savings for this kind of task.

The signal log (stderr) takes ~340 MB (!). Even 7-zipped it takes 17 MB, so I'll put it in the cloud instead of attaching it here directly.
EDIT: link to logs https://cloud.mail.ru/public/8259db7e0b1f%2FCHII-20141025-1841-benchAP.txt.7z
« Last Edit: 25 Oct 2014, 01:43:30 pm by Raistmer »

Offline Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 14349
Re: OpenCL AP v7 memory consumption
« Reply #5 on: 25 Oct 2014, 02:27:51 pm »
Well, something can be done. If not about peak memory consumption, then at least about how long that peak consumption lasts.
From the run I just did with a reduced -ffa_block 1024, it seems memory was not freed after the FFA finished and control returned to the main loop (flat area on the graph).

Offline Mike

  • Alpha Tester
  • Knight who says 'Ni!'
  • ***
  • Posts: 2427
Re: OpenCL AP v7 memory consumption
« Reply #6 on: 26 Oct 2014, 05:01:46 am »
I didn't assume there was something to fix.

The easiest way for those affected is to use either no ffa_block switch or low values.

oclfft_plan in conjunction with the tune param will still speed up processing.

Offline Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 14349
Re: OpenCL AP v7 memory consumption
« Reply #7 on: 26 Oct 2014, 05:34:40 am »
Well, each tuning option has its own point of application inside the code, so I don't think one option can replace the effect of another. Yes, the same (but suboptimal) performance can be reached by different option combos. The absolute peak will be at some particular combo of options, though.
Then additional considerations about practical implementation arise, of course:
1) How broad/sharp that peak is in the tuning parameter space. If it is broad enough, one can change one of the options quite strongly while staying in the area of acceptable performance (this is the case where Mike's approach will work fine).
2) How that absolute peak depends on the data processed. There is such a dependence in the current OpenCL code. The app's performance will depend, in particular, on the number of signals already found, the total number of signals, and (apparent, known from the very beginning, and considerably reduced in v7) on the blanking %. All this makes the performance peak move in the app's tuning parameter space. That per-task movement should be averaged when one estimates overall host performance, which leads to broadening of the resulting host performance peak in the app's tuning parameter space.
So, if the broadening is strong enough, we again get a broad peak.

I prefer to exclude out-of-memory crashes as much as possible, though, and with the initial params string I saw such a crash on my own hardware with the TestCase task (and I'm quite sure the same parameter string will work w/o a crash on a regular task).
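
The broadening argument can be illustrated numerically. This is only a toy sketch, not the app's real performance model: it assumes a single tuning parameter, a Gaussian-shaped per-task peak, and four hypothetical tasks whose peaks sit at different positions. The names task_perf, host_perf and half_max_width are invented for the illustration.

```python
import math

def task_perf(x, center, width=1.0):
    """Hypothetical per-task performance vs. one tuning parameter x (Gaussian peak)."""
    return math.exp(-((x - center) ** 2) / (2 * width ** 2))

def host_perf(x, centers):
    """Host-level performance: the per-task curves averaged over all tasks."""
    return sum(task_perf(x, c) for c in centers) / len(centers)

def half_max_width(curve, xs):
    """Width of the region where the curve stays at or above half of its maximum."""
    values = [curve(x) for x in xs]
    top = max(values)
    above = [x for x, v in zip(xs, values) if v >= top / 2]
    return above[-1] - above[0]

# Sample the (assumed) tuning parameter on a grid from -10 to 10.
xs = [i * 0.01 for i in range(-1000, 1001)]

# One task alone vs. the average over four tasks with shifted peaks.
single = half_max_width(lambda x: task_perf(x, 0.0), xs)
averaged = half_max_width(lambda x: host_perf(x, [-1.5, -0.5, 0.5, 1.5]), xs)

print(f"single-task peak width:   {single:.2f}")
print(f"averaged host peak width: {averaged:.2f}")
```

Under these assumptions the averaged curve is lower but wider than any single-task curve, which is the sense in which per-task movement of the peak broadens the host-level peak.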
« Last Edit: 26 Oct 2014, 05:57:15 am by Raistmer »

Offline Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 14349
Re: OpenCL AP v7 memory consumption
« Reply #8 on: 26 Oct 2014, 05:44:03 am »
And here are the first results on the path to improvement in this area.
This picture should be directly compared with the one posted here: http://lunatics.kwsn.net/12-gpu-crunching/opencl-ap-v7-memory-consumption.msg57231.html#msg57231

As one can see, the app's memory consumption now returns to an almost normal state after FFA completion. Hence, though peak memory consumption remains the same for now, the time during which a high amount of memory is allocated to the app has decreased considerably, so the probability that 2 or more tasks will demand a huge amount of memory simultaneously has decreased considerably too. Host HDD swapping and other low-system-memory effects are therefore less likely with the new build.

Thanks to all who drew my attention to this issue.
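
A rough way to see the size of this effect, as a sketch only: assume each running task is independently "at peak" for some fraction of its run time. Both the independence and the fractions below are assumptions for illustration, not measurements from the graphs, and prob_two_or_more_at_peak is a made-up helper name.

```python
def prob_two_or_more_at_peak(n_tasks, peak_fraction):
    """Probability that at least two of n independent tasks are at peak memory at once.

    peak_fraction is the (assumed) share of run time a task spends at peak.
    """
    none = (1 - peak_fraction) ** n_tasks
    exactly_one = n_tasks * peak_fraction * (1 - peak_fraction) ** (n_tasks - 1)
    return 1 - none - exactly_one

# Hypothetical numbers: two AP instances, peak held for 50% vs. 10% of the run.
before = prob_two_or_more_at_peak(2, 0.5)
after = prob_two_or_more_at_peak(2, 0.1)
print(f"before: {before:.3f}, after: {after:.3f}")
```

For two instances the result is simply the square of the peak fraction, so shortening the time spent at peak reduces the chance of both instances peaking together more than proportionally.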

« Last Edit: 26 Oct 2014, 05:46:19 am by Raistmer »

Offline Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 14349
Re: OpenCL AP v7 memory consumption
« Reply #9 on: 26 Oct 2014, 11:51:21 am »
~200 MB saved from the TestCase maximum by checking whether overflow has already been reached and not allocating more memory if so.
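
The idea can be sketched roughly as follows. This is not the actual app code: SignalStore, its chunked growth, and the limit of 30 (borrowed from the "Found 30 single pulses and 30 repeating pulses, exiting." message in AP stderr) are assumptions made for illustration.

```python
OVERFLOW_LIMIT = 30  # assumed overflow limit, taken from the "Found 30 ... pulses" stderr line

class SignalStore:
    """Hypothetical signal buffer that stops growing once overflow is reached."""

    def __init__(self, limit=OVERFLOW_LIMIT, chunk=16):
        self.limit = limit      # number of signals that counts as overflow
        self.chunk = chunk      # how many slots one (simulated) allocation adds
        self.capacity = 0       # slots allocated so far
        self.signals = []

    def overflowed(self):
        return len(self.signals) >= self.limit

    def add(self, signal):
        """Store a signal; refuse (and allocate nothing) once overflow is reached."""
        if self.overflowed():
            return False
        if len(self.signals) == self.capacity:
            self.capacity += self.chunk  # stands in for a real memory allocation
        self.signals.append(signal)
        return True

store = SignalStore()
accepted = sum(store.add(i) for i in range(1000))
print(accepted, store.capacity)
```

Without the overflow check the buffer would keep growing with every signal found; with it, the allocation stops at the first chunk boundary at or above the limit.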


Offline Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 14349
Re: OpenCL AP v7 memory consumption
« Reply #10 on: 26 Oct 2014, 01:56:59 pm »
And the final modification: variable FFA block size. Now this task consumes an almost usual amount of memory even at peak.

If this modification produces correct results for the test tasks and overflows, it will be released soon.

Offline Mike

  • Alpha Tester
  • Knight who says 'Ni!'
  • ***
  • Posts: 2427
Re: OpenCL AP v7 memory consumption
« Reply #11 on: 26 Oct 2014, 03:07:49 pm »
I ran an interesting investigation.

I installed Win 8.1 and tested the second task 30 times.



Check the GPU clock: it is downclocking and falling back to the CPU.



With -oclFFT_plan 256 8 128, memory consumption is normal even with big ffa_block values.
Also, you can see that memory is freed permanently.
« Last Edit: 26 Oct 2014, 03:10:12 pm by Mike »

Offline Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 14349
Re: OpenCL AP v7 memory consumption
« Reply #12 on: 26 Oct 2014, 03:17:00 pm »
It is quite possible that this task hides other issues to discover.

For example, I have been running it on a Q9450 with the SSE app for more than 6 hours already... With overflowed results on the GPU I would expect a CPU time of no more than 10 minutes... It is quite possible it will finish w/o overflow at all, and then we will have another TestCase to investigate why.

But excessive memory usage on overflows is a separate issue. So please compare the results with and w/o the oclfft_plan switch: are they the same?
« Last Edit: 26 Oct 2014, 03:19:18 pm by Raistmer »

Offline Mike

  • Alpha Tester
  • Knight who says 'Ni!'
  • ***
  • Posts: 2427
Re: OpenCL AP v7 memory consumption
« Reply #13 on: 26 Oct 2014, 03:21:03 pm »
Quote
It is quite possible that this task hides other issues to discover.

For example, I have been running it on a Q9450 with the SSE app for more than 6 hours already... With overflowed results on the GPU I would expect a CPU time of no more than 10 minutes... It is quite possible it will finish w/o overflow at all, and then we will have another TestCase to investigate why.

But excessive memory usage on overflows is a separate issue. So please compare the results with and w/o the oclfft_plan switch: are they the same?

Yes, it was always the same.

Only -oclFFT_plan 256 8 128 did cure it.

Offline Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 14349
Re: OpenCL AP v7 memory consumption
« Reply #14 on: 26 Oct 2014, 03:25:58 pm »
Quote
Yes, it was always the same.

Only -oclFFT_plan 256 8 128 did cure it.


That's a puzzle. I'll try with 2721 on my own host.

And regarding the CPU: it ultimately gave an overflow... but why did it take SO long?...

astropulse_7.03_windows_intelx86__sse.exe  / ap_28jn14aa_B1_P1_00131_20141017_13023.wu :
AppName: astropulse_7.03_windows_intelx86__sse.exe
AppArgs:
TaskName: ap_28jn14aa_B1_P1_00131_20141017_13023.wu
Started at  : 16:17:19.199
Ended at    : 23:17:32.194
Result      : stored as ref for validations.
  25212.541 secs Elapsed
  23937.355 secs CPU time


[ stderr ]
Not using ap_cmdline.txt-file, using commandline options.
16:17:19 (3240): Can't set up shared mem: -1. Will run in standalone mode.

Build features: Non-graphics    BLANKIT TWINDECHIRP     USE_LRINT       FFTW    USE_INCREASED_PRECISION USE_SSE x86
     CPUID: Intel(R) Core(TM)2 Quad  CPU   Q9450  @ 2.66GHz

     Cache: L1=64K L2=6144K

CPU features: FPU TSC PAE CMPXCHG8B APIC SYSENTER MTRR CMOV/CCMP MMX FXSAVE/FXRSTOR SSE SSE2 HT SSE3 SSSE3 SSE4.1
AstroPulse v7 Windows x86 rev 2603, V7 match, by Raistmer with support of Lunatics.kwsn.net team.
SSE
ffa threshold, twindechirp, lrint mods by Joe Segur
state.fold_buf_size_short=65536; state.fold_buf_size_long=262144
Found 30 single pulses and 30 repeating pulses, exiting.
  percent blanked: 4.25

 
