Bug or feature? TRefs and merging two TFiles

Hi all.

So, I have a rather complicated problem. I’ve figured out what I need to do in order to work around this problem, but I wanted to share this information with you, and find out if this is a bug, an unintended consequence of the design of ROOT memory/file handling, or if there’s something fundamentally wrong with my code/understanding.

I have an event class (KHLAEvent) that is saved to an output TFile, with the KHLAEvent as the top-level branch. Within this event class there are two TClonesArrays – one holds objects KSingleBoloSubRecord and another holds objects KSambaSubRecord. Within KSingleBoloSubRecord there is a TRef that should point to the appropriate KSambaSubRecord within the event.

I wrote an assignment operator for my KHLAEvent class for the purpose of merging two separate files (**). I want to be able to do something like this:

TFile *fout = new TFile(“out.root”,“recreate”);
TFile *fin1 = new TFile(“input1.root”);
TFile *fin1 = new TFile(“input2.root”);

//set up the branches for KHLAEvent *eout, *ein1, *ein2

//start looping through all entries input1 and input2 and then

if(some condition){
*eout = *ein1;
TreeOut->Fill()
}
else {
*eout = *ein2;
TreeOut->Fill()
}

//write and then close the file.

The assignment operator KHLAEvent::operator= had to be a bit special in order to create a new KSambaSubRecord for each KSingleBoloSubRecord that had a pointer set in its TRef (I have yet to deal with the special case where more than one KSingleBoloSubRecords point to the same KSambaSubRecord, but never mind that.)

The above “code” wouldn’t work – the wrong KSambaSubRecord was copied to the event – unless I explicitly made sure to change the current working directory to the TFile that holds the event I want to copy and then call TTree::GetEntry(currentEntryNumber) before calling the assignment operator. This looks something like this:

TFile *fout = new TFile(“out.root”,“recreate”);
TFile *fin1 = new TFile(“input1.root”);
TFile *fin1 = new TFile(“input2.root”);

//set up the branches for KHLAEvent *eout, *ein1, *ein2

//start looping through all entries input1 and input2 and then

theCurrentEntryFile1; //retain this value
theCurrentEntryFile2; //retain this value

if(some condition){
fin1->cd();
tree_in1->GetEntry(theCurrentEntryFile1);
*eout = *ein1;
TreeOut->Fill()
}
else {
fin2->cd();
tree_in2->GetEntry(theCurrentEntryFile2);
*eout = *ein2;
TreeOut->Fill()
}

//write and then close the file.

In this case, it works, meaning that the correct KSambaSubRecord was properly copied to my *eout event and written to my TreeOut.

I have attached a fully working example. Just compile it, and then run bin/fillEventTest to see what I’m talking about. fillEventTest.cxx is currently written to produce the problem. Uncomment lines 162, 163, 173, 174, 186, 187, 199, 200 to make it work properly. The difference in the output to stdout can be seen in the final TTree::Scan call – but you’ll have to read all of the output to know what its doing.

You’ll have to read through fillEventTest.cxx to get the best idea of what this is doing. But I think the code is readable enough. Or, maybe somebody already knows why I have to call the TFile::cd() and reload the entry without needing to look at my example.

Thanks a lot for your help.
Adam Cox

(**) - We don’t have a guaranteed index variable that would allow me to use the standard TChain/TTree::Merge routines.
trefExample.zip (139 KB)

Adam,

It will take some time for us to go through your system. I do not understand why a change of directory can modify the TRef behavior. Meanwhile could you make the following test setting the directory after tree->GetEntry as shown below

Rene

[quote]if(some condition){
tree_in1->GetEntry(theCurrentEntryFile1);
fin1->cd();
*eout = *ein1;
TreeOut->Fill()
}
else {
tree_in2->GetEntry(theCurrentEntryFile2);
fin2->cd();
*eout = *ein2;
TreeOut->Fill()
}[/quote]

Rene,

Thanks again for looking at this issue, even though its a bit complicated. It resulted in corrupting our data (obviously) and I thought it might be good to know why.

So, I ran my program with your suggestion - changing the order of GetEntry and cd(). This seems to work properly. I get the expected sub records in my output tree.

Adam

Here’s a random thought. My KDataReader class makes a new TFile
TFile *myFile = new TFile(“filename.root”).

Should I use TFile::Open instead? Does this do something special?

Adam

No the 2 forms are identical for a local file.

so from your previous mail it looks like there is a dependency on the current directory in your assignment operator.

Rene

Well, there’s absolutely nothing written into the assignment operator that deals with the current directory. However, clearly, because I need to call TFile::cd() there is some dependency.

Also, I just made the test where I remove the tree_in->GetEntry(theCurrentEntry). This does NOT work. I get the same problem as before.

BUT… I’m sorry for not testing this earlier, if I just call tree_in->GetEntry(theCurrentEntry) it DOES work.

So, the following appears to work:

if(some condition){
tree_in1->GetEntry(theCurrentEntryFile1);
*eout = *ein1;
TreeOut->Fill()
}
else {
tree_in2->GetEntry(theCurrentEntryFile2);
*eout = *ein2;
TreeOut->Fill()
}

of course you need the call to GetEntry.
so I consider this problem as resolved.

Rene

Oh, no, there’s a bit of a misunderstanding. Its clearly not that simple!

Let me go back to my original email, where I wasn’t totally clear.

TFile *fout = new TFile("out.root","recreate");
TFile *fin1 = new TFile("input1.root");
TFile *fin2 = new TFile("input2.root");

//set up the branches for KHLAEvent *eout, *ein1, *ein2
//Here, I initialize separate memory space for each of these event objects.

fin1->GetEntry(0); fin2->GetEntry(0);

//start while looping until end of the files through all entries in input1 and input2 and then
//here I call fin1->GetEntry(entry1); and fin2->GetEntry(entry2);
//so, I've already done this in the looping. Then based upon information 
//in the events ein1 and ein2, I decide which event to next place in my output.
//Basically, I am sorting two files into an output file that is chronologically ordered. 

theCurrentEntryFile1 = entry1; //retain this value
theCurrentEntryFile2 = entry2; //retain this value

if(some condition){
fin1->cd();
tree_in1->GetEntry(theCurrentEntryFile1);  //this has essentially already been called above!
*eout = *ein1;
TreeOut->Fill()
}
else {
fin2->cd();
tree_in2->GetEntry(theCurrentEntryFile2);
*eout = *ein2;
TreeOut->Fill()
}

entry1++;
entry2++;

I’ve just realized and tested that its the order in which I call tree_in->GetEntry(). For example, if my loop has the order
tree_in1->GetEntry(entry1)
tree_in2->GetEntry(entry2)

and then my condition requires that I want to save the event in tree 1, then I next call
*eout = *ein1.
This is where I get the problem.

One cannot call tree_in2->GetEntry(entry2) before calling
*eout = *ein1.

This order somehow corrupts the data in ein1 and the values stored in eout will not be what you expect.

This implies that ein1 and ein2 are somehow sharing memory. I do not have a static single instance of TClonesArrays. I have separate instantiations for each KHLAEvent. Would the TClonesArrays somehow automatically share their memory space because they are holding objects with the same name?

Adam

Hi,

Note that the value of TBranchElement::GetObject is not guarantee not to change after a call to GetEntry. I suspect your problem would be fix by using: fr1->GetEntry(entry1) KHLAEvent *e1 = (KHLAEvent *)fr1->GetEvent(); fr2->GetEntry(entry2) KHLAEvent *e2 = (KHLAEvent *)fr2->GetEvent();
or better yet, use the SetBranchAddress technique.

Philippe.

Hi Philippe

Thanks for your reply.

In my KDataReader class, I create an instance of a KHLAEvent and then pass the address of the pointer to that object into the SetBranchAddress.

See line 107 of KDataReader.cxx.

Sorry for having all of these classes in my example. I might be able to pare down my example a bit and get rid of the KDataReader/Writer/FileIO classes. But basically, these classes create local instances of the KHLAEvent object, open a file and then call SetBranchAddress (for the Reader class) or Branch (for the Writer class).

Adam

So, I guess you’re referring to the following lines in my KDataReader class:


KEvent* KDataReader::GetEvent(void)
{
	KEvent* event = 0;
	if(fTree !=0){
		if(!fTree->IsZombie())
		{
			TBranchElement *fB = (TBranchElement *)fTree->GetBranch(GetBranchName().c_str());
			event = (KEvent *)fB->GetObject();
		}
	}
	
	return event;
}

[quote]So, I guess you’re referring to the following lines in my KDataReader class:[/quote]Yes … this is what I thought your were using in the failing case …

Hi again.

So, I changed my KDataReader::GetEvent object just to return a pointer to fLocalEvent. I’ll attach the .h and .cxx files if you want to put them in the previous code that I attached and recompile.

The result, however, is the same. My events seem to have the wrong data in them unless I call GetEntry in the proper order.

Adam
KDataReader.h (1.31 KB)
KDataReader.cxx (4.01 KB)

Hi,

Can you send a new zip file containing the code for your latest problem, I have no clue where you do ‘I’ve just realized and tested that its the order in which I call tree_in->GetEntry(). For example, if my loop has the order’. Also can you include (if possible) a log file of a ‘good’ session.

Cheers,
Philippe.

Hi Philippe - I was just about to do this.

I’ve removed the KDataReader stuff and pared it down to something simpler. It is currently set up to fail – which means that the wrong values in the TClonesArrays are written to file. If you uncomment lines 176, 188, 200, and 212, then it works as expected.

When you run bin/fillEventTest it outputs TTree:Scan in the following order

  1. input file 1 – this is an input file i create that i will merge into test1 and test2.
  2. input file 2 – the other input file to be merged.
  3. test1 - this combines input file1 and input file2 using the assignment operator that copies every object in the Event’s TClonesArrays for KSingleBoloSubRecord and KSambaSubRecord without dealing with TRefs.
  4. input file 1 (redundant with 1. because of the lazy way I wrote this test program)
  5. input file 2 (redundant with 2.)
  6. test2 - this combines input file1 and input file2 by making a copy of every KSingleBoloSubRecord in the TClonesArray. Then, if an instance of the KSingleBoloSubRecord object points to a KSambaSubRecord with its TRef, it adds a KSambaSubRecord object to the output file and sets the TRef. (This is easier to understand if you look at the assignment operator in KHLAEvent.cxx – line 128, which calls CopyClonesArrays on line 145).

I hope this helps.

Adam
trefExample 2.zip (128 KB)

Here’s an output from fillEventTest when those particular lines are uncommented. You’ll notice that the very last TTree::Scan looks like the 3rd TTree::Scan output.

If you run fillEventTest as I have it written in my example that I attached in the previous message, the final TTree::Scan output doesn’t have the correct values in the right order.

Compare these two.
correct:
TTree::Scan for test2.root
Should be in order first 3 events from input1, one event from input2, next 3 events from input1, next 3 events from input2


  • Row * Instance * fEnergyHe * fDetector * fSambaEve * fSambaDAQ *

  •    0 *        0 *         1 *         1 *         0 *         7 *
    
  •    1 *        0 *        11 *         1 *         1 *         7 *
    
  •    2 *        0 *        21 *         1 *         2 *         7 *
    
  •    3 *        0 *       201 *         2 *         0 *        17 *
    
  •    4 *        0 *        31 *         1 *         3 *         7 *
    
  •    5 *        0 *        41 *         1 *         4 *         7 *
    
  •    6 *        0 *        51 *         1 *         5 *         7 *
    
  •    7 *        0 *       211 *         2 *         1 *        17 *
    
  •    8 *        0 *       221 *         2 *         2 *        17 *
    
  •    9 *        0 *       231 *         2 *         3 *        17 *
    

incorrect:

TTree::Scan for test2.root
Should be in order first 3 events from input1, one event from input2, next 3 events from input1, next 3 events from input2


  • Row * Instance * fEnergyHe * fDetector * fSambaEve * fSambaDAQ *

  •    0 *        0 *        71 *         1 *         7 *        17 *
    
  •    1 *        0 *         1 *         1 *         0 *         7 *
    
  •    2 *        0 *        11 *         1 *         1 *         7 *
    
  •    3 *        0 *       271 *         2 *         2 *         7 *
    
  •    4 *        0 *        21 *         1 *         0 *        17 *
    
  •    5 *        0 *        31 *         1 *         3 *         7 *
    
  •    6 *        0 *        41 *         1 *         4 *         7 *
    
  •    7 *        0 *       201 *         2 *         5 *         7 *
    
  •    8 *        0 *       211 *         2 *         1 *        17 *
    
  •    9 *        0 *       221 *         2 *         2 *        17 *
    

I hope this helps.

Adam

[quote=“gadamc”]Should I use TFile::Open instead? Does this do something special?
[/quote]

TFile::Open can open all sorts of special files (e.g., starting with dcap://. castor://) where as new TFile() can not. For a local file, (as Rene pointed out) it doesn’t make any difference, but I always use TFile::Open()…

Cheers,
Charles

Hi Charles. Thanks for your message. I was aware that TFile::Open handlies different file ‘types’, but I wasn’t sure if it somehow created a directory structure or something special that would fix my problem.

So… I thought about how I might be able to clarify and focus this problem for you.

Here is the KHLAEvent::CopyClonesArray method, which is called by the assignment operator for the KHLAEvent.

void KHLAEvent::CopyClonesArrays(const KHLAEvent &anEvent)
{
	//
	//ClonesArray assignment doesn't appear to work in the following way
	//*fSamba = *anEvent.GetSambaSubRecords();
	//*fBolo = *anEvent.GetBoloSubRecords();


	ClearArrays("C");
	
	Int_t ObjectNumber = TProcessID::GetObjectCount();
	
	//Using this option, the subrecords are properly copied
	//but they are not linked via TRefs!

	if(fAssignmentOptionNoTref){

                 //The order in which I call TTree::GetEntry for the files that I'm merging does not
                // affect the assignement operator if i copy the clones arrays like this. 

                 //copy all of the samba sub records

		for(Int_t i = 0; i < anEvent.GetNumSambas(); i++){
			KSambaSubRecord *s = AddSamba();
			KSambaSubRecord *sO = anEvent.GetSamba(i);
			if(s != 0 && sO != 0) 
				*s = *sO;
			else
				cout << "KHLAEvent::operator= Invalid Samba Pointer" << endl;
		}
		
                //copy all of the bolo sub records and don't look 
                //if the bolo sub record has a tref that points to a samba sub record
		for(Int_t i = 0; i < anEvent.GetNumBolos(); i++){
			KSingleBoloSubRecord *bolo = AddBolo();
			KSingleBoloSubRecord *boloO = anEvent.GetBolo(i);
			
			if(bolo != 0 && boloO != 0) 
				*bolo = *boloO;
			else
				cout << "KHLAEvent::operator= Invalid SingleBolo Pointer" << endl;
		}
		
	}
	else {
		//THIS IS THE WAY IN WHICH I WANT THIS TO WORK. FOR EACH BOLO SUBRECORD
		//I ADD A SAMBA SUB RECORD WHEN IT POINTS TO ONE. 
                 // However, this does not work properly unless I call TTree::GetEntry in the correct order
                  // for the anEvent object that I want to assign to "this"
	
		for(Int_t i = 0; i < anEvent.GetNumBolos(); i++){
			KSingleBoloSubRecord *bolo = AddBolo();
			KSingleBoloSubRecord *boloO = anEvent.GetBolo(i);
			KSambaSubRecord *sambaO = boloO->GetSambaRecord();
			KSambaSubRecord *samba = 0;
			
			if(bolo != 0 && boloO != 0) 
				*bolo = *boloO;
			else
				cout << "KHLAEvent::operator= Invalid SingleBolo Pointer" << endl;
			
			if(sambaO != 0) {
				samba = AddSamba();
				*samba = *sambaO;
				if(samba) bolo->SetSambaRecord(samba);
			}
			else
				cout << "KHLAEvent::operator= No Samba Pointer for bolo record" << endl;
		}
	}
	
	
	//Restore Object count                                                                                                     
	//To save space in the table keeping track of all referenced objects 
	//and computation time,
	//we assume that our events DO NOT address each other. We reset the                                                        
	//object count to what it was at the beginning of the event.                                                               
	TProcessID::SetObjectCount(ObjectNumber);
	
}

You can see there are two ways that I assign the objects in the TClonesArrays to “this”. One way where I don’t consider the TRefs (if fAssignmentOptionNoTref == true) and the other way where I only make a samba sub record if I get a valid pointer to one from the boloSubRecord (when fAssignmentOptionNoTref != true).

When I don’t use the TRefs, the order in which I call TTree::GetEntry doesn’t matter. It always works. But, If I’m using the TRefs, I need to make sure that I call TTree::GetEntry in the proper order for the two files that I’m trying to merge together.

Does this make sense? Let me know if you have any questions, of course.

Adam

I don’t think this has anything to do with TRefs. I’ve removed the whole TRef business and replaced it with just a number that tells me which element of the TClonesArray that I should access. But, I still get this problem unless I call TTree::GetEntry in the proper order.

There’s also something strange, perhaps, when I call TTree::Scan, because the behavior changes when I remove that call!

Adam

[quote] If you uncomment lines 176, 188, 200, and 212, then it works as expected. [/quote]line 176 reads://t1->GetEntry(entry1); //I've already called this!and indeed I see the following code before hand: t1->GetEntry(0); ... t1->Scan("fEnergyHeat:fDetectorNumber:fSambaEventNumber:fSambaDAQNumber","","colsize = 30"); ... Int_t entry1 = 0; ... and thus line 176 must indeed be uncommented because the TTree::Scan itself call GetEntry from 0
to t1->GetEntries() and thus at the end of the scan, your object is pointing to the last entry of the TTree (that is what I see in your output too).

Similarly, line 188 must be uncommented since the t2->Scan called t2->GetEntry from 0 to t2->GetEntries().

For line 200 and 212, I think the problem (look at the value of GetUniqueID on your object) is due to the fact that all your objects have the same unique id, this means in practice that the last object read (which ever file it comes from is the one being referred to). This problem goes away if you comment out the lines TProcessID::SetObjectCount(ObjectNumber);

Cheers,
Philippe.

[code]root [0]
Attaching file input2.root as _file0…
Warning in TClass::TClass: no dictionary for class KHLAEvent is available
Warning in TClass::TClass: no dictionary for class KEvent is available
Warning in TClass::TClass: no dictionary for class KSambaSubRecord is available
Warning in TClass::TClass: no dictionary for class KSingleBoloSubRecord is available
root [1] t->Scan(“fSamba.fUniqueID”);


  • Row * Instance * fSamba.fU *

  •    0 *        0 *         1 *
    
  •    1 *        0 *         1 *
    
  •    2 *        0 *         1 *
    
  •    3 *        0 *         1 *
    
  •    4 *        0 *         1 *
    
  •    5 *        0 *         1 *
    
  •    6 *        0 *         1 *
    
  •    7 *        0 *         1 *
    

[/code]