TDF Snapshot: File seems to be cached in memory

Hi,

I have written a small function that uses a TChain in combination with a TDF to reduce a dataset. The input is about 160 GB of data spread across many files. After the reduction, the dataset should take up approximately 2-3 GB as a single file. During processing, the memory usage of the program increases linearly with time. I wonder whether this is some kind of caching mechanism and, if so, whether it can be disabled when caching is not needed?

Cheers Thomas

I think @eguiraud can most probably help with this.

Hi, if you are running with EnableImplicitMT I think it might be this issue (can you confirm?).

If you are running single-thread and still see this problem, it would be great if you could share a reproducer – relevant part of the code and possibly a fraction of the data, e.g. on dropbox or afs.

Finally, just to gauge the problem, how large does the memory consumption become?

Cheers,
Enrico

Hi,
Yes, I can confirm that the memory consumption is stable when implicit MT is disabled. With MT, the consumption rises with each processed file. The last value I saw before my RAM filled up completely and forced a restart was about 4.5 GB. Multiple warnings also show up (even without MT):

in TChain::CopyAddresses: Could not find branch named ‘flag_bad_diff_simple_high’ in tree named ‘cal’

However, I am not sure whether this is caused by using the snapshot with a branch list.

Cheers
Thomas

Good, so it seems you hit the exact same issue.
Could you please comment on the issue stating that you are affected and the ROOT version you are using (or, if on master, the exact commit)?

We are working on a fix. It will probably take a few weeks to land in the master branch, and it will be in the first patch release after that.

The warnings are completely separate: they just state that, for whatever reason, TDataFrame could not find a branch named “flag_bad_diff_simple_high” in the tree. I am not aware of any problems in the diagnostics. If you believe the branch should definitely be found, please try to reproduce the problem in isolation (the smallest possible snippet of code that shows the bug) and let us know!

Hi,
@amadio pushed a commit to master and v6.12-patches today that should fix the memory leak.
Could you please update ROOT and confirm?

Cheers,
Enrico

Hi,

Thanks for your fast responses. Unfortunately, I did not have time to test this earlier. I am now using the ROOT version built from the following branch:
heads/v6-12-00-patches@v6-11-02-976-g59e96f66da, Dec 11 2017, 19:21:00

If this is the right version, I still see increasing memory usage when implicit MT is enabled. If this is not the version with the patch, should I use the master branch?

Probably there is still a bug on my side. I will test this a bit more and, in the next few days, build a small example that shows the issue.

Cheers
Thomas

@tquante I think the problem is that when using many threads, data is produced faster than it can be written out to disk, so it accumulates in memory. Right now, you can only adjust the number of threads so that data doesn’t accumulate in memory. We have an open issue to match data creation to the write speed of your machine, which should avoid this problem in the future. The idea is to use the callback mechanism recently introduced in TBufferMerger (a class used by TDataFrame for the snapshot action).
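
To illustrate the general idea of matching data creation to write speed, here is a minimal sketch in plain standard C++. It is not ROOT code and not how TBufferMerger works internally. The names `BoundedQueue`, `DemoResult` and `run_demo` are hypothetical and exist only for this example. Several producer threads push items into a queue with a fixed capacity, and a single writer thread drains it. Because `push` blocks while the queue is full, the amount of buffered data can never exceed the capacity, whatever the number of producers.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical bounded buffer (illustration only, not part of ROOT).
// push() blocks while the queue is full, so producers cannot run ahead
// of the writer by more than `capacity` items.
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {
        assert(capacity_ > 0);
    }

    void push(int value) {
        std::unique_lock<std::mutex> lock(mutex_);
        not_full_.wait(lock, [this] { return items_.size() < capacity_; });
        items_.push(value);
        if (items_.size() > high_water_) high_water_ = items_.size();
        not_empty_.notify_one();
    }

    // Returns false once the queue has been closed and fully drained.
    bool pop(int &value) {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return !items_.empty() || closed_; });
        if (items_.empty()) return false;
        value = items_.front();
        items_.pop();
        not_full_.notify_one();
        return true;
    }

    // Signals the consumer that no further items will arrive.
    void close() {
        std::lock_guard<std::mutex> lock(mutex_);
        closed_ = true;
        not_empty_.notify_all();
    }

    // Largest number of items that were ever buffered at the same time.
    std::size_t high_water() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return high_water_;
    }

private:
    mutable std::mutex mutex_;
    std::condition_variable not_full_;
    std::condition_variable not_empty_;
    std::queue<int> items_;
    std::size_t capacity_;
    std::size_t high_water_ = 0;
    bool closed_ = false;
};

struct DemoResult {
    std::size_t written;     // items consumed by the writer thread
    std::size_t high_water;  // peak number of buffered items
};

// Runs n_producers threads against a single writer thread.
DemoResult run_demo(unsigned n_producers, int items_per_producer,
                    std::size_t capacity) {
    BoundedQueue queue(capacity);
    std::size_t written = 0;

    std::thread writer([&queue, &written] {
        int value;
        while (queue.pop(value)) ++written;  // stand-in for a disk write
    });

    std::vector<std::thread> producers;
    for (unsigned p = 0; p < n_producers; ++p) {
        producers.emplace_back([&queue, items_per_producer] {
            for (int i = 0; i < items_per_producer; ++i) queue.push(i);
        });
    }

    for (auto &t : producers) t.join();
    queue.close();
    writer.join();
    return {written, queue.high_water()};
}
```

Calling, for example, `run_demo(4, 1000, 8)` runs four producers against one writer with at most eight buffered items. Without the capacity check in `push`, the queue would grow whenever the producers outpace the writer, which is the kind of memory growth described above.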
