OpenCL AP v7 memory consumption

Raistmer:
Logged about 1 day of my host's work (HD6950 running 2 AP instances, logging one of them) and got this picture.
So my host is also affected by the memory consumption increase when running "wild" tasks (as one can see at the bottom of the picture, the max working set was ~750 MB while it is usually near 100 MB). Next I will reproduce the situation with Juan's test-case task in an offline test, with more detailed logging.
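
For anyone who wants to do the same comparison on their own logs, here is a minimal sketch of summarizing working-set samples into "usual" vs "peak". It assumes the samples are already collected as a list of byte counts; the function name, the spike_factor threshold and the sample values are made up for illustration and are not taken from the actual log.

```python
from statistics import median

MB = 1024 * 1024

def summarize_working_set(samples_bytes, spike_factor=3.0):
    """Summarize working-set samples (in bytes).

    Returns the median ("usual") and maximum ("peak") size in MB, plus the
    indices of samples larger than spike_factor times the median.
    spike_factor is an arbitrary illustrative threshold.
    """
    if not samples_bytes:
        raise ValueError("no samples given")
    typical = median(samples_bytes)
    peak = max(samples_bytes)
    spikes = [i for i, s in enumerate(samples_bytes)
              if s > spike_factor * typical]
    return {
        "typical_mb": typical / MB,
        "peak_mb": peak / MB,
        "spike_indices": spikes,
    }

# Made-up samples: mostly near 100 MB with one excursion to 750 MB.
samples = [100 * MB, 105 * MB, 98 * MB, 750 * MB, 102 * MB]
summary = summarize_working_set(samples)
print(summary)
```

The median is used instead of the mean so that a single "wild" task does not shift the baseline it is being compared against.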

Raistmer:
As one can see, this task (ap_26jn14aa_B6_P0_00184_20141017_17161.wu) does indeed consume more memory than usual, but far less than the reported ~1 GB (with the same settings applied, but on an ATi instead of an NV device/app).

1 - start of offline ap_26jn14aa_B6_P0_00184_20141017_17161.wu run.
2 - usual memory consumption for a live task with different options (much smaller ffa_block values)
3 - restart of live task.

Will check second test case now.

Raistmer:
The picture for the second task ( ap_28jn14aa_B1_P1_00131_20141017_13023.wu ) is much more interesting.

There is a huge increase in memory consumption, up to the point of an exception.


--- Quote ---ERROR: some exception inside long FFA, probably video-driver restart, restarting app...
--- End quote ---

So the memory increase continues during the whole task life, which implies memory is not freed between FFA calls (EDIT: need to see whether the "whole task life" consists of a few FFA calls or only a single one for this particular task).

Next run will be with defaults.

Raistmer:
The run with defaults did not crash, completed MUCH FASTER than with a big -ffa_block, and shows that the overflow occurred in the first large FFA pair.
Such an FFA pair is processed as a single entity in the TWIN_FFA mod, so it's impossible to say whether memory leaked between FFA runs or not.
The memory increase over a usual task is still very considerable (defaults for HD6950 mean 5632 threads in flight).

The next thing to find out is how many FFA blocks were processed before task completion. Each FFA block should behave independently of the previous ones in terms of memory consumption...

Raistmer:
Interestingly, with signal logging enabled (-v 2 option) the memory kinetics look different, but memory consumption is roughly the same in the end.
The task run took ~3 hours instead of <30 mins though.

And it seems there is nothing to fix. The whole enormous number of signals found, and eventually replaced/discarded, was found inside a single FFA block (governed by the -ffa_block param). That is, the only way to reduce the memory requirement for this kind of task is to reduce the -ffa_block value.
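
To illustrate the reasoning, here is a toy model, not the app's real code. It assumes candidate signals are buffered only for the block being processed and the buffer is released when the block ends, so the peak is set by the busiest single block. The function name and the abstract "work item" indices are hypothetical and do not reflect how the app actually lays out FFA work.

```python
def peak_buffered_signals(signal_positions, total_items, block_size):
    """Toy model: largest number of candidate signals held at one time.

    signal_positions: indices of abstract work items that produce a signal.
    total_items:      number of work items in the whole task.
    block_size:       work items processed per block (stand-in for -ffa_block).

    Assumes the signal buffer is emptied after every block, so the peak
    equals the signal count of the busiest block.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    per_block = {}
    for pos in signal_positions:
        if not 0 <= pos < total_items:
            raise ValueError("signal position outside the task")
        block = pos // block_size
        per_block[block] = per_block.get(block, 0) + 1
    return max(per_block.values(), default=0)

# Made-up noisy task: 8000 of 8192 work items each produce a candidate signal.
noisy = list(range(8000))
peak_big_block = peak_buffered_signals(noisy, 8192, 8192)
peak_small_block = peak_buffered_signals(noisy, 8192, 1024)
print(peak_big_block, peak_small_block)
```

In this model one big block has to hold every candidate at once, while smaller blocks cap the peak at the block size. It says nothing about processing time, which has to be measured.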

The only thing left to check now is decreasing -ffa_block below the default value to observe the memory (and processing time!) savings for this kind of task.

The signal log (stderr) takes ~340 MB (!). Even 7-zipped it takes 17 MB, so I'll put it in the cloud instead of attaching it here directly.
EDIT: link to logs https://cloud.mail.ru/public/8259db7e0b1f%2FCHII-20141025-1841-benchAP.txt.7z
