Accessing array-like branches with the type "Float_t"

Dear experts,

I am writing because I have a naive question about how to access array-like branches with the type “float”.

What I am trying to do is read a file (in CMS nanoAOD format) and get the pt distributions of the leading and subleading jets in each event. Jet pt values are stored in a branch, and I believe the structure of the data stored in it should be similar to a std::vector or a normal C++ dynamic array (so that all the jet pts for each event can be stored). However, when I tried to use TTree::SetBranchAddress to link the branch to a local std::vector, the script returned an error:

“The pointer type given “vector” does not correspond to the type needed “Float_t” (5) by the branch: Jet_pt”

I also printed the tree and found that the type of the data stored in that branch is indeed Float_t


*Br 214 :Jet_pt : Float_t
*Entries : 4579 : Total Size= 20527 bytes File Size = 1866 *
*Baskets : 2 : Basket Size= 34816 bytes Compression= 10.66 *

I am quite confused, as it is very clear that this branch is not a scalar-like branch but an array-like branch. And I have no idea why the type expected by this branch would be “Float_t”, rather than “Float_t *” or similar…

I checked the file using root object browser and verified that this branch is indeed array-like (more values stored than the number of events, so multiple values per event)

Do you know what is happening? And do you have suggestions on how to properly access this branch?

Thanks a lot for your kind help!

Best regards,
Sebastian
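
The observation above (more values stored than events, so several values per event) can be pictured with a small plain C++ model. This is only a conceptual sketch, not ROOT's on-disk layout: the struct and function names are hypothetical, and it assumes a per-event counter (playing the role of nJet) next to one flat list of values (playing the role of Jet_pt).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Conceptual model only (hypothetical names, not ROOT's on-disk layout):
// one counter per event plus one flat list of values for all events.
struct JaggedColumn {
    std::vector<unsigned> counts;  // plays the role of nJet
    std::vector<float> values;     // plays the role of Jet_pt, all events concatenated
};

// Copy out the values that belong to one event.
std::vector<float> eventValues(const JaggedColumn& col, std::size_t ievent) {
    assert(ievent < col.counts.size());
    // The slice for this event starts after the values of all earlier events.
    std::size_t offset = 0;
    for (std::size_t i = 0; i < ievent; ++i) offset += col.counts[i];
    assert(offset + col.counts[ievent] <= col.values.size());
    return std::vector<float>(col.values.begin() + offset,
                              col.values.begin() + offset + col.counts[ievent]);
}
```

In this picture the element type of the column is still a plain float, and the number of values per event comes from the separate counter.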

Hi Sebastian,
indeed the TTree::Print output is misleading in this case (@pcanal ping!), but RDataFrame should provide a more human-friendly output:

ROOT::RDataFrame df("Events", "nanoaodfile.root");
std::cout << df.GetColumnType("Jet_pt") << std::endl;

I would suggest using RDataFrame to produce the histograms you want as well: it offers a nicer, higher-level syntax than raw TTree. These would pretty much be all the lines you need:

auto histo_leading = df.Filter("Jet_pt.size() > 0")
                       .Define("leading_jet_pt", "Jet_pt[0]")
                       .Histo1D("leading_jet_pt");

and similarly for the sub-leading jet.

As for why SetBranchAddress did not work for you, I think with TTree you need to read float arrays into raw arrays, i.e. something like Float_t *arr = nullptr; tree.SetBranchAddress("Jet_pt", &arr);, but @pcanal can comment with more authority.

Cheers,
Enrico
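
The per-event logic behind the suggestion above (require enough jets, then pick the leading and sub-leading pt) can be sketched in plain C++ without ROOT. The helper name is hypothetical. It picks the two largest values instead of relying on the jets being pt-ordered in the file, since that ordering is an assumption that should be checked for the sample at hand.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not part of ROOT): finds the two largest pt values of
// one event. Returns false if the event has fewer than two jets, in which
// case the output arguments are left untouched.
bool leadingAndSubleading(const std::vector<float>& jetPt,
                          float& leading, float& subleading) {
    if (jetPt.size() < 2) return false;
    float first = jetPt[0];
    float second = jetPt[1];
    if (second > first) std::swap(first, second);
    // Scan the remaining jets, keeping the two largest values seen so far.
    for (std::size_t i = 2; i < jetPt.size(); ++i) {
        if (jetPt[i] > first) {
            second = first;
            first = jetPt[i];
        } else if (jetPt[i] > second) {
            second = jetPt[i];
        }
    }
    leading = first;
    subleading = second;
    return true;
}
```

If the jets are known to be pt-ordered, indexing elements 0 and 1 after a size check gives the same result.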

Dear Enrico,

Thanks a lot for your quick reply!

Yes! I just checked out the RDataFrame, and indeed the type of that branch is ROOT::VecOps::RVec<Float_t>. I will now try to link it to a local ROOT::VecOps::RVec<Float_t> and this time it should work.

And thanks for pointing out the possibility of using RDataFrame to handle the data quickly; this sounds like a columnar method. Unfortunately, I have some other operations that I need to do inside each event, and a loop still seems to be the most straightforward way to deal with them :)

Thanks again for your kind help!

Best regards,
Sebastian

Ah, I think RDataFrame is hiding the fact that this is just a Float_t* raw array (that RDataFrame reads as an RVec<Float_t> to be helpful). Using a raw C-style array is still your best bet, I think.

(As an aside, note that RDataFrame has zero problems with operations on nested collections.)

Dear Enrico,
Hmm, you are right, I think after loading the tree into RDataFrame, the branch is transformed into ROOT::VecOps::RVec<Float_t>…

I’ve tried a bit, and here is what I found:

TTree *tree = (TTree*)file->Get("Events");
Float_t *pt = NULL;
tree->SetBranchAddress("Jet_pt",&pt);
tree->GetEntry(0);
cout<<pt->at(0)<<endl;
Resulted in: error: member reference base type ‘Float_t’ (aka ‘float’) is not a structure or union

TTree *tree = (TTree*)file->Get("Events");
Float_t pt;
tree->SetBranchAddress("Jet_pt",&pt);
tree->GetEntry(0);
cout<<pt<<endl;
Resulted in a single value: 227.625

but no matter how I try, I still cannot get the corresponding array…

Thanks a lot!
Sebastian

From the “tree->Print()” output in your original post above (for a variable size array, you should see there something like “Jet_pt[nJet]”) and then from your very last trial (the one that returned 227.625), it is clear that the “Jet_pt” is a single ordinary “Float_t” value.

If you don’t know how to deal with your tree, see how various flavours of automatically generated “analysis skeletons” deal with it.

Jet_pt in NanoAODs should really be an array, and RDF agrees. I think you can read the first element of a C-style array as if it were a simple float.

EDIT:
as a test, could you please try running the following:

ROOT::RDataFrame df("Events", "nanoaodfile.root");
df.Filter("cout << Jet_pt.size() << endl; return true;").Count().GetValue();

It should print the array sizes to screen. You can do the same to print the values.

Dear Enrico,

Yes, I did get array sizes like these:
14
10
10
7
So Jet_pt branch is indeed array-like. I tried to access them using:

Float_t jet_pt_arr;
eventTree->SetBranchAddress("Jet_pt", &jet_pt_arr);
eventTree->GetEntry(ientry);
cout<<(&jet_pt_arr)[i]<<endl;

It seemed to be working at first (it printed several reasonable values), but later on, when it went to higher indices (still within nJet), it started printing near-zero float values (0 in the float case). I also tried to histogram the pt of all the jets, and the resulting plot is different from what I got from the ROOT Object Browser. So I guess &jet_pt_arr does not point to the correct beginning of the Jet_pt array… Very confusing…

Thanks a lot!

Best,
Sebastian

Alright. The automatically generated analysis skeletons suggested by @Wile_E_Coyote might show the right syntax to access those branches with raw TTree.

Otherwise we need @pcanal.

Cheers,
Enrico

Dear Enrico,

I tried another recipe which works:

Float_t *jet_pt_arr = new Float_t;
eventTree->SetBranchAddress("Jet_pt", jet_pt_arr);
eventTree->GetEntry(ientry);
cout<<jet_pt_arr[i]<<endl;

However, it seems the declaration of

Float_t *jet_pt_arr = new Float_t;

resulted in some weird behavior of ROOT.

Here is a working example, if you could find a small CMS nanoAOD file (nanoAOD.root) to use:

void debug() {
	UInt_t count0=0;
	
	//=================== Chunk 1 ==============================
	TString outfilename = TString("study.root");
	TFile *outFile = new TFile(outfilename,"RECREATE"); 
	TTree *outTree = new TTree("Events","Events");
	//=========================================================
	
	UInt_t njet = -999;
	Float_t *jet_pt_arr = new Float_t;
	
	// loop through files
	TTree* eventTree = 0;
	std::vector<TString> file_list;
	file_list.push_back("nanoAOD.root");
	for(int ifile=0; ifile<file_list.size(); ifile++){

		TFile *infile = TFile::Open(file_list.at(ifile),"READ");
		assert(infile);

		// Access Event Tree
		eventTree = (TTree*)infile->Get("Events");
		assert(eventTree);
		
		eventTree->SetBranchAddress("nJet", &njet);
		eventTree->SetBranchAddress("Jet_pt", jet_pt_arr);
		
		for(UInt_t ientry=0; ientry<eventTree->GetEntries(); ientry++){
		// for(UInt_t ientry=0; ientry<100; ientry++){
			count0++;
			eventTree->GetEntry(ientry);
			cout<<jet_pt_arr[0]<<endl; // print the first jet pt of this entry
		}// End of event loop
		infile->Close();
	}// End of file loop
	cout<<"Number of events processed: "<<count0<<endl;
	
	// =================== Chunk 2 ==============================
	outFile->cd();
	outFile->Write();
	outFile->Close();
	// =========================================================
}

So occasionally (around 6-7 times out of 10 runs with the same code and sample nanoAOD.root), when quitting from ROOT, I got

*** Error in `JetAna/bin/root.exe': double free or corruption (out): 0x000056383c64abc0 ***

or segment violation. However, if we comment out this line:

eventTree->GetEntry(ientry);

or comment out chunk 1 and chunk 2 labeled in the example, the script would work without any problem…

I am a bit confused since I do not see why the declaration of the output file and output tree would interfere with eventTree->GetEntry(ientry); and screw up the process.

I am sorry if this is straying from my original question, but I would still like to ask for some advice here :)

Thanks a lot for your kind help!

Best regards,
Sebastian

Analyze files produced by: eventTree->MakeClass();

That’s broken, you are allocating a single float and ROOT then tries to write several of them starting at that address.
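
To illustrate the point above without ROOT, here is a minimal stand-in for what happens when an entry is read: every value of the entry is copied to the address that was registered. The function name, the capacity argument and kMaxJets are all hypothetical. A real reader has no capacity check and would simply write past the end of a single allocated float, which is undefined behaviour and would be consistent with the corruption seen at exit.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for reading one entry (hypothetical, not ROOT code): copies every
// value of the entry to the address it was given. Unlike a real reader, it
// refuses to write when the destination buffer is too small.
bool fillEntry(float* dest, std::size_t capacity, const std::vector<float>& entry) {
    assert(dest != nullptr);
    if (entry.size() > capacity) return false;  // would overrun the buffer
    for (std::size_t i = 0; i < entry.size(); ++i) dest[i] = entry[i];
    return true;
}

// Assumed upper bound on the number of jets per event; it must be checked
// against the largest nJet actually present in the data.
constexpr std::size_t kMaxJets = 64;
```

With a raw TTree, the analogous pattern would presumably be to declare Float_t jet_pt_arr[kMaxJets]; and pass jet_pt_arr itself (not its address) to SetBranchAddress, then read only the first nJet elements after each GetEntry. As far as I know, this is also what the headers generated by MakeClass do for such branches.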

Dear Enrico, and @Wile_E_Coyote

Thanks a lot for your suggestions!

So it turned out that the TTreeReader is much easier to use in this case, as being used in the method of eventTree->MakeClass() :)

Thanks again for your kind help!

Best regards,
Sebastian
