Hi everybody,
I have a few general questions about memory management when working with RooWorkspaces. I am currently working with them, but I cannot seem to really understand the memory consumption, or how to free the occupied memory properly. I would really like to understand the memory usage better and keep it to the necessary minimum.
Maybe it is best if I describe what I would like to do in simple words. I have roughly 150 ROOT files, each about 33 MB in size (combined ~5 GB). Each of those files contains one RooWorkspace, which in turn contains 10 roughly equally sized, indexed RooDataSets, so each of these 10 RooDataSets should be roughly 3.3 MB. Now I loop over the indices of the RooDataSets. For each index I open all 150 files, extract the 150 RooDataSets, add them together, do some manipulation on them, store the result, free the memory, and then move on to the next index. Ideally, the memory consumption should never go much higher than roughly 150 * 3.3 MB ≈ 500 MB. However, I seem to have trouble freeing the memory: my memory consumption keeps increasing with each index instead of resetting.
I tried looking for similar questions but didn't find anything that helps me. Can you please help me? Below is a simplified version of my script.
The function to extract all the RooDataSets of a given index (please note that this is a simplified version, only to show what I do):
void AddDataToWorkspace(RooWorkspace *MetaWorkSpace, int index) {
  // I omit here how I get the name of the first file and a vector containing the names of all other files

  // Initialising the data with the first file
  TFile *StandardFile = new TFile(first_file.c_str()); // "first_file" is the very first file out of the 150
  RooWorkspace *w_first = (RooWorkspace *) StandardFile->Get("w");
  RooDataSet *Data = (RooDataSet *) w_first->data(("Data_" + to_string(index)).c_str());
  delete StandardFile;

  // Adding all consecutive files
  TFile *ToBeAddedFile;
  RooWorkspace *w_to_be_added;
  RooDataSet *Data_ToBeAdded;
  for (std::vector<std::string>::iterator t = consecutive_files.begin(); // "consecutive_files" are the remaining 149 files
       t != consecutive_files.end(); ++t) {
    ToBeAddedFile = new TFile(t->c_str());
    w_to_be_added = (RooWorkspace *) ToBeAddedFile->Get("w");
    Data_ToBeAdded = (RooDataSet *) w_to_be_added->data(("Data_" + to_string(index)).c_str());
    Data->append(*Data_ToBeAdded);
  }
  // delete Data_ToBeAdded;
  delete w_to_be_added; // can not delete an empty workspace! this will delete the container
  delete ToBeAddedFile;

  // Putting the combined dataset into the meta-workspace and returning
  MetaWorkSpace->import(*Data);
  return;
}
As you can see, my idea is to have a "meta" workspace which will hold the combined dataset. I take the first file, initialise the RooDataSet with the corresponding dataset from it, and then loop through all the other files, appending the corresponding datasets. In the process I try to free the memory with:
delete StandardFile;
// and
delete w_to_be_added;
delete ToBeAddedFile;
Is this the correct way? I already have some other questions here as well. When I call "delete w_to_be_added", does this free the memory of all constituents of the workspace? I noticed, for example, that I cannot call "delete Data_ToBeAdded" first and then delete the workspace. Strangely, I also lose the RooDataSet "Data_ToBeAdded" when I delete the RooWorkspace "w_to_be_added". So I am a bit confused about which object actually holds the memory: the RooWorkspace? The constituents of the workspace? The file? And in what order should I delete them to free the memory?
Anyway, on to my main problem: in my naive understanding, after the call to "AddDataToWorkspace" returns, only the imported dataset should be left in the meta-workspace, and all other memory occupied by a file, workspace or dataset from one of the 150 files should be free again. My whole script looks something like this (again, a simplified version for understanding only):
void Main() {
  // Create a MetaWorkspace to manage the project.
  RooWorkspace *wks = new RooWorkspace("myWS");
  // Initialise some containers for the results and other stuff...

  // Loop over the index (of the RooDataSets)
  for (int nr_index = 0; nr_index <= 9; nr_index++) {
    // Add the combined RooDataSet to the MetaWorkspace via the previous function
    AddDataToWorkspace(wks, nr_index);
    RooDataSet *Data_temp = (RooDataSet *) wks->data(("Data_" + to_string(nr_index)).c_str());

    // Make a histogram out of the combined RooDataSet
    TH1D *Histo_temp = (TH1D *) Data_temp->createHistogram(Data_temp->GetTitle(), variable, Binning("something"));

    // Now I only need the histogram, so I would like to free the memory of the combined RooDataSet
    // wks->Print("v");
    wks->RecursiveRemove(Data_temp);
    // wks->Print("v");

    // Do other stuff with the histogram, which should not take up a lot of memory...
    // Specifically, I create some RooAbsPdfs, build a model and do some fits and plots
    // -> eventually end this step in the loop
  }
}
So, after I have made a histogram out of the combined RooDataSet, I actually try to remove the whole dataset from the workspace again:
// wks->Print("v");
wks->RecursiveRemove(Data_temp);
// wks->Print("v");
Is this the right way to free the memory of the RooDataSet? I only want to delete the dataset here, because I want to keep the meta-workspace itself. Calling "wks->Print("v")" confirms that "RecursiveRemove" removes the RooDataSet from my meta-workspace. Yet even though I seem to always delete the biggest memory consumer, my memory consumption keeps increasing with the for-loop in the main function. What am I doing wrong? Am I somehow deleting objects in the wrong way, or is there a hidden dependency in the RooWorkspace that keeps a copy?
Any help is much appreciated! Thanks in advance.