GetEntryWithIndex not working properly with friend trees

Dear experts,

I’ve been having a problem working with friend trees and GetEntryWithIndex.

I have a class, CellsProfiler, that I wrote in C++. Basically, it loops over lots of trees and creates histograms. I compiled it into a library that can be called from a Python file.
The class takes as input, for example, 2 trees. The main tree has approx. 15 branches that I need to read, while from the second one I’m only interested in 3.

The second tree has almost the same events as the first one, but it has these 3 extra branches that I need. The two trees match on the event number, so I activate these branches:

m_tree_main->SetBranchStatus("EventInfo.eventNumber", 1);
m_tree_cell->SetBranchStatus("EventNumber", 1);

and set as index of the second tree (m_tree_cells) the branch "EventNumber", and then I add them as friends:

m_tree_cell->BuildIndex("EventNumber");
m_tree_main->AddFriend(m_tree_cell);

In order to loop over these trees I created a TTreeReader object

TTreeReader * m_reader = nullptr; //In CellsProfiler.h
m_reader = new TTreeReader(m_tree_main); //In CellsProfiler.cxx

and set the branches I want to read with TTreeReaderValue<type>. For example, for the event number branches of both trees I did:

m_eventNumber_main = new TTreeReaderValue<int> (*m_reader, "EventInfo.eventNumber");
m_eventNumber_cell = new TTreeReaderValue<int> (*m_reader, "EventNumber");

So far so good: it loops over the events of both trees, as long as **m_eventNumber_cell == **m_eventNumber_main. However, if they don’t match, I need to look in the second tree for the entry that satisfies that condition. To do so, in my main loop I call the function getSameEntry:

while (m_reader->Next()) {
	if (!getSameEntry()) {
		no_cells_info ++;
		continue;
	}
	if (!isHealthyCluster()) {
		not_health_cluster ++;
		continue;
	}
	...
}

And the function is:

bool getSameEntry()
{
	/*Check if both trees have the same event with same event-number*/
	// First get event numbers
	int event_number_main = **m_eventNumber_main;
	int event_number_cell = **m_eventNumber_cell;

	// cout << "Enum main="<<event_number_main << " Enum cell="<<event_number_cell << endl;
	// vector<float> tmp_E = **m_ph_ClusterCells_7x11_L2_E;
	// cout<< "Energy38before="<<tmp_E[38]<<endl;

	// Check if they are the same. If they are not, look for the entry in m_tree_cell that
	// has the same event number as in m_tree_main
	if (event_number_main != event_number_cell) {
		int st = m_tree_cell->GetEntryWithIndex(event_number_main);
		// cout << "st="<<st<<endl;
		if (st<=0) {
			return false; // This is for the case it doesn't find any entry in m_tree_cell
		}
	}
	// Recompute cell event number, because it may be changed from the previous if-clause
	// with GetEntryWithIndex()
	event_number_cell = **m_eventNumber_cell;
	// cout << "Enum main="<<**m_eventNumber_main << " Enum cell="<<event_number_cell << endl;
	// tmp_E = **m_ph_ClusterCells_7x11_L2_E;
	// cout<< "Energy38after="<<tmp_E[38]<<endl;
	// Just a check that it's really the same event number
	if (event_number_main != event_number_cell) {
		return false;
	}
	return true;
}

It basically checks if both event numbers are the same, and if they aren’t, it looks into m_tree_cell and searches for the entry that satisfies the previous condition.

The problem is the following. If the two trees are not on the same event (equal event number) at the start, GetEntryWithIndex should change the entry that m_reader points to in m_tree_cell, while leaving the entry it points to in m_tree_main intact. This happens ONLY INSIDE getSameEntry and in the function that called it (i.e. the function containing the main while loop). However, once I enter another function, m_reader points to the old entry in m_tree_cell again. I checked this by looking at the value of one of the m_tree_cell branches inside another function, for example isHealthyCluster, and the values were indeed the old ones.

To check that this is the case, I implemented the same class in Python (although it is super slow) and it works correctly (I can verify it with other methods external to these functions, after the whole loop is finished). The function in Python is:

def getSameEntry(self, tree_Main, tree_Cell, entry):
	tree_Main.GetEntry(entry)
	event_number_main  = getattr(tree_Main, 'EventInfo.eventNumber')
	event_number_cells = tree_Cell.EventNumber
	if (event_number_main != event_number_cells):
		st = tree_Cell.GetEntryWithIndex(int(event_number_main))
		if st <= 0:
			return False
	event_number_cells = tree_Cell.EventNumber
	if event_number_main != event_number_cells:
		return False
	# If none of previous if-statements were entered->both entries have the same event number
	return True

Can anyone guess what the problem is here? I’ve been struggling with this for a whole day and I can’t figure out why this problem appears or how to fix it, and I don’t want to loop in Python (it’s more than 3x slower)…

Thank you very much for your help, and sorry for the long post; I tried to be as detailed as possible…

Cheers,
Francisco


ROOT Version: root 6.20.06-x86_64-centos7-gcc8-op
Platform: lxplus


Hi @fsili ,
TTreeReader had some issues with indexed friend trees until recently. Can you try a nightly build and check whether the problem is solved there? If not, I think we’ll need a minimal reproducer that we can debug on our side.

Cheers,
Enrico

Hi @eguiraud , thank you for your answer.
I tried with a nightly build but the problem persists.

In addition to that, I found some very weird behaviour. I added printouts for the events that were being dropped because of this problem. Basically I did 2 tests.

  1. I print the event number of the current event in m_tree_main (using **m_eventNumber_main) and in m_tree_cell (using **m_eventNumber_cell), labeled EMain and ECell respectively, and also one variable from m_tree_cell that is used for the selection, **m_ph_ClusterSize_7x11_L2, labeled nCells. The printouts are located:
    • Before any function is entered.
    • Inside getSameEntry but before entering the first if clause, that is, before changing the entry that m_reader points to in m_tree_cell.
    • After changing the event in m_tree_cell.
    • After getSameEntry, in the main loop, once the event has been accepted by getSameEntry, i.e. the event in m_tree_cell has the same event number as the current event in m_tree_main.
    • Inside isHealthyCluster.
    • After isHealthyCluster has returned false.
      This is the output:
NEWENTRY:         EMain=310898  ECell=319487  nCells=71  // before entering any function
getSameEntry:     EMain=310898  ECell=319487  nCells=71  // inside getSameEntry
st=588    // <----Here we change the event at which m_reader points in m_tree_cell
getSameEntry:     EMain=310898  ECell=310898  nCells=70  // inside getSameEntry after changing event
mainLoop:         EMain=310898  ECell=310898  nCells=70  // in main loop
isHealthyCluster: EMain=310898  ECell=310898  nCells=70  // inside isHealthyCluster
mainLoopCond:     EMain=310898  ECell=310898  nCells=70  // after isHealthyCluster returns false
  2. The second test prints only after isHealthyCluster returns false, i.e. for an event that is being dropped. This is the output for the same event shown before:
EMain=310898  ECell=310898  nCells=71

As you can see, in this second case nCells takes the value it had before the event in m_tree_cell was changed from 319487 to 310898, so **m_ph_ClusterSize_7x11_L2 is pointing to the previous event (wrongly!). These 2 tests differ only in where I placed the cout statements (and, of course, in recompiling the files), yet they give different results. It’s behaving very strangely (like the uncertainty principle :laughing: ): when I print information inside the other functions and in the main loop it gives one result, but when I change the printing it behaves completely differently. Why, and more importantly how, is this happening?

I’m trying to make a minimal working example of this but I’m having difficulties because it’s very strange and I don’t really know why this behaves differently… However as soon as I have something I’ll post it.

Thank you very much!

Cheers,
Francisco

Hi @eguiraud, I managed to reproduce the same problem…

Here’s a macro that you can even run locally.
Looper.cxx (4.4 KB)

The macro can create new trees, or, even better, you can use the ones I generated, attached here:
treefriend.root (51.0 KB)
treeparent.root (121.5 KB)

To see which events are the problem, first execute the macro as it is. You’ll get a lot of output (only for the first 100 events), with the same information as in my previous reply.
Next, comment out lines 83, 87, 93, 103, 144, 151 and 159, and you’ll see discrepancies with respect to the previous run.


For example, when running for the first time, in event 94 we get

NEWENTRY:         EnumMain=94  EnumCell=108  nCells=0
getSameEntry:     EnumMain=94  EnumCell=108  nCells=0
st=8
getSameEntry:     EnumMain=94  EnumCell=94  nCells=1
mainLoop:         EnumMain=94  EnumCell=94  nCells=1
isHealthyCluster: EnumMain=94  EnumCell=94  nCells=1
    EVENT PASSED

We start with event 94 in Main and 108 in Cell, but then it finds event 94 in the Cell ntuple, which has nCells=1, so it passes the selection.

However, when we run for the second time with the lines I mentioned above commented out, event 94 is printed, when it shouldn’t be, because the first test showed that nCells=1… but:

mainLoopCond:     EnumMain=94  EnumCell=94  nCells=0

with nCells=0.


So here we can see that this is behaving very arbitrarily: the value of nCells changes depending on whether those lines are commented out or not…

Thank you very much!!
Cheers,
Francisco

I’ll take a look as soon as possible!

Hi,
if I understand correctly, your code is looping over the trees m_T and m_TF using TTreeReader and at the same time it is also calling GetEntryWithIndex on m_TF. I don’t think that is supported: TTreeReader does not expect the state of the trees it is looping over to change under its feet.

If you see wrong data being read when only looping over m_T and m_TF with TTreeReader, then that’s a bad bug (please let us know if this is the case).
Otherwise you should probably open the input files again and create m_T2 and m_TF2 (completely separate copies of the input TTrees) so you can read some entries while TTreeReader is also looping over the data.

Cheers,
Enrico

Hi,
Thank you for your answer. What I wanted to do is loop over m_T and m_TF and also call GetEntryWithIndex, because the friends are not being linked correctly.
I don’t really know why the linking goes wrong, but this may give a hint. In this case I have two trees, m_T and m_TF, the latter being a subset of m_T with additional branches. In order to match them I build an index on the branch event_cell in m_TF. When I do this, there’s a runtime error:

Error in <TTreeFormula::Compile>:  Bad numerical expression : "event_cell"

and therefore I think they are not being linked because of this. Is it because the branches I use to link the two trees are named differently?

Thank you very much
Cheers,
Francisco

I guess we need to fix that then :slight_smile:

Is a branch called event_cell present in both trees?

Yes I think that is the main problem really…

Well, not really:

As you can see from the script, m_T has 4 branches: Event, x, y and z. Then I generated another tree (m_TF) which contains a subset of m_T; those events are selected with z<10 (for now, this is the same as in this example).

The branches I copy from m_T to m_TF are Event and z. In m_TF, then, I added another branch: cells_number which has extra information that m_T doesn’t.

Finally, in m_TF I changed the name of branch Event (the one I copied from m_T) to event_cell. This is because I want to recreate the same scenario as in my real problem: the branch in m_TF I want to build index with has a different name from the one in m_T.

Thank you!

Cheers,
Francisco

It’s not super easy for me to follow, but code speaks more than a thousand words :smiley: Could you please strip down Looper.cxx so it is the simplest possible reproducer for this problem with event_cell?

Cheers,
Enrico

Hi! Sorry for the delay…
Ok, so after a lot of testing, I now understand why this isn’t working. It isn’t related to TTreeReader but rather to AddFriend and to my trees (the ones I actually use for my calculations, not the generated ones).

I attach a short Python file that reproduces the problem. The trees it uses are a 100-entry subset of the ones I use, so it reproduces both the problem and my situation correctly. Simply executing it gives this error at runtime:

Error in <TTreeFormula::Compile>:  Bad numerical expression : "EventInfo.eventNumber"

and then, when comparing entry numbers (the last few printed), you can see that they don’t match, even though I called BuildIndex on T_m with EventInfo.eventNumber. So my understanding is that these trees are not actually linked as friends, and that messes up my calculation, as I showed in my first post.

Any help here will be greatly appreciated! :slight_smile:

Thank you very much for your patience with me…

Cheers,
Francisco

test_forum.py (653 Bytes)
tree_main.root (104.1 KB) tree_cell.root (327.6 KB)

Alright, thank you for the reproducer, I will take a look at it as soon as possible.

Cheers,
Enrico

Ok so your main tree has a branch called EventInfo.eventNumber and the friend tree has a branch called EventNumber, and for each event in the main tree, you want to retrieve the event in the friend tree for which main.EventInfo.eventNumber == friend.EventNumber? (i.e. you want a SQL join on main.EventInfo.eventNumber == friend.EventNumber?)

Yes, that’s what I want. But, adding tree_cell as a friend should do that automatically, right?

Yes, with a correctly-built TTreeIndex (and this last part is what goes wrong).

I see a few issues in the code you shared, but more importantly there is a problem in the data you shared: not all values of EventInfo.eventNumber are present in EventNumber, and vice versa!

When that is fixed, this should do what you want:

# you only need to call BuildIndex on the indexed friend
# the main tree will index into the friend using the indexed values
friend.BuildIndex("EventNumber")
# there is a problem though: the main tree does not have a branch called EventNumber!
# so we need this trick to make the main tree look into EventInfo.eventNumber
# when EventNumber is queried:
main.SetAlias("EventNumber", "EventInfo.eventNumber")
# now, add the indexed friend as a friend of the main tree (it was vice-versa in the code you shared)
main.AddFriend(friend)
# and loop over the main tree
for event in main:
 ...
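
For readers who want to see the join semantics in isolation, here is a minimal pure-Python sketch. It does not use ROOT at all: the two lists of dicts are made-up stand-ins for the main and friend trees, and the helper names build_index and join_on_event_number are hypothetical. It only illustrates the idea of indexing the friend by EventNumber and looking up each main entry by EventInfo.eventNumber; it is not meant to describe how ROOT treats unmatched entries internally.

```python
# Pure-Python model of an index-based join. No ROOT involved:
# the lists below are made-up stand-ins for the two trees.
main_entries = [
    {"EventInfo.eventNumber": 94, "x": 1.5},
    {"EventInfo.eventNumber": 95, "x": 2.5},
    {"EventInfo.eventNumber": 96, "x": 3.5},
]
friend_entries = [
    {"EventNumber": 96, "nCells": 3},
    {"EventNumber": 94, "nCells": 1},
]


def build_index(entries, key):
    """Map each key value to the position of its entry (hypothetical helper)."""
    return {entry[key]: position for position, entry in enumerate(entries)}


def join_on_event_number(main, friend):
    """Pair each main entry with the friend entry that has the same event number."""
    index = build_index(friend, "EventNumber")
    joined = []
    for entry in main:
        position = index.get(entry["EventInfo.eventNumber"])
        if position is None:
            # In this sketch, main entries without a partner are simply skipped.
            continue
        joined.append((entry, friend[position]))
    return joined


pairs = join_on_event_number(main_entries, friend_entries)
for main_entry, friend_entry in pairs:
    print(main_entry["EventInfo.eventNumber"], friend_entry["nCells"])
```

Note that the friend entries are deliberately stored in a different order than the main entries: the lookup goes through the index, not through the entry position.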

Cheers,
Enrico

Hi Enrico,

Thank you!

Mhh is there any way to circumvent this? I know there are values of EventInfo.eventNumber not present in the friend tree, and I actually want to skip those, I’m not interested in them. That’s why I changed tree_main to be the friend tree, because there are some extra events there that I’d never access if that’s the friend…
But I’ll try to correct it and I’ll let you know

Thank you very much for your help! :smiley:
Cheers,
Francisco

I can’t think of any other way to do this than a first pass over the trees that builds a list of the event numbers you want to skip (you can do this once per dataset created, you don’t have to do it every time you run the analysis), and then adding an if eventnumber in event_numbers_to_ignore: continue to your event loop. @pcanal might have a better suggestion.
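
A rough sketch of that two-pass idea, in plain Python with made-up event numbers instead of real trees (in practice the two lists would be filled by reading the event-number branch of each tree once). The function name find_event_numbers_to_ignore is hypothetical:

```python
def find_event_numbers_to_ignore(main_numbers, friend_numbers):
    """Return the event numbers present in the main tree but missing in the friend."""
    return set(main_numbers) - set(friend_numbers)


# Made-up event numbers standing in for the two trees' event-number branches.
main_numbers = [94, 95, 96, 97]
friend_numbers = [96, 94]

# First pass: done once per dataset; the result could be saved and reused.
event_numbers_to_ignore = find_event_numbers_to_ignore(main_numbers, friend_numbers)

# Second pass: the actual event loop, skipping events without a partner.
processed = []
for eventnumber in main_numbers:
    if eventnumber in event_numbers_to_ignore:
        continue
    processed.append(eventnumber)

print(sorted(event_numbers_to_ignore), processed)
```

Using a set keeps the membership test in the event loop cheap even when the list of event numbers to skip is long.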

Cheers,
Enrico

What about this?:
I consider tree_cell as the main tree and add tree_main as a friend to tree_cell (remembering that tree_main has more events than tree_cell). Then I loop over tree_cell and it’ll only see those events present in that tree, and it’ll never enter any extra event in tree_main… (Sorry for the confusing names :confused: )
Is that correct?

Cheers,
Francisco

Yes, but tree_cell also has some values of EventNumber that do not appear in EventInfo.eventNumber I think :confused:

Hi Enrico,

I think the extra events you saw in tree_cell that were not present in tree_main are the ones that didn’t make it into the first 100 entries of tree_main.
I ran tests interchanging tree_main and tree_cell, and it does not go over those extra events; it now works fine with the MWE I provided before.

However, if I do the same using TTreeReader (I need to run this loop in C++ code and I’m trying to use TTreeReader), then when running over the whole tree it loops infinitely, reporting 200, 300, 400 % progress. I tested it, and it loops infinitely on the last entry. I can add a line that breaks the loop, but this is quite strange. Do you know why this is happening?

I leave the cxx script below, as well as the whole trees.

Thank you very much!
Cheers,
Fran

ROOT Macro: Looper2.cxx (3.0 KB)
and the trees you can find them here: https://drive.google.com/drive/folders/1I1hbdBD9SUgqYdpuikjUnocC-R3brAaQ?usp=sharing