ROOT Trees Workflow

Hi,

I have never worked with trees, but I want to start. I am quite confused about what the best approach would be, as there are different options to create and write to trees.

Maybe a specific example: Suppose I have a device with a dozen channels. An external Trigger common to all channels triggers the data acquisition. So we would have Event 1 and the corresponding data in Ch1, Ch2, Ch3 and so on…

Question 1: Would it be best to write a tree for every channel or put everything in one tree regarding the flexibility of further analysis afterwards?

My approach would be to create one tree and write a struct having the members Event, Channel, Data for example to fill the tree.

Question 2: The number of datapoints recorded per event depends on the configuration of the device before the measurement and could vary from case to case. Would there be a way to account for that when defining the struct, i.e. have a variable sized array for the data member? Or would one just define it in the struct with the size of the maximum number of datapoints possible?

Thanks for any general thoughts and input!

Hello,

I think trees are a very good match for the problem you describe! I would recommend putting all the data in a single tree. With the default settings, the tree will structure the data internally such that when you read it, only the accessed channels need to be read from disk.

You can create an Event struct with a layout that is “natural” to your problem. Trees can handle variable sized members, e.g. std::vector<float> or std::vector<Channel> for another user-defined class “Channel” (if that makes sense).

In order to use your own classes with the TTree I/O, you’d need to generate so-called dictionaries for the classes. I suggest the documentation on adding your own class to a tree and the tree4 tutorial to get started.
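
A minimal sketch of what such a "natural" layout could look like, in plain C++ without any ROOT headers. The names Channel, Event and makeEvent, and the choice of members, are assumptions made for illustration and are not taken from the thread or from ROOT. To store these types in a tree, the dictionaries mentioned above would still be needed.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical layout: one Channel object per device channel.
struct Channel {
    int id = 0;                  // channel number, e.g. 1..12
    std::vector<float> samples;  // variable number of datapoints
};

// Hypothetical layout: one Event object per trigger.
struct Event {
    int number = 0;                 // event counter
    std::vector<Channel> channels;  // all channels recorded for this trigger
};

// Build one event with nChannels channels, each holding recordLength samples.
Event makeEvent(int number, int nChannels, std::size_t recordLength) {
    Event ev;
    ev.number = number;
    for (int ch = 1; ch <= nChannels; ++ch) {
        Channel c;
        c.id = ch;
        c.samples.assign(recordLength, 0.0f);  // placeholder values
        ev.channels.push_back(c);
    }
    return ev;
}
```

Because the record length is only a run-time argument here, the same types cover measurements with different device configurations.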

Cheers,
Jakob

Thanks for the reply! How do I set up a struct with variable-sized arrays? I’d like something like this:


record_length = (this could change)

struct event_STRUCT {
		Int_t Event_NO;
		Int_t Channel;
		Float_t datapoints[record_length];
	};

event_STRUCT sEvent;

TTree *tree = new TTree(); 

and so on......

I know this does not work like that. Is there a way, or is another approach necessary? Thank you!

You could use

struct event_STRUCT {
   Int_t Event_NO;
   Int_t Channel;
   std::vector<Float_t> datapoints;
};

OK, thank you! But how can I populate the vector? Sorry, I have close to zero experience with C++.

As an example I know that I could do something like this:

struct event_STRUCT {
	Int_t Event_NO;
	Int_t Channel;
	Float_t datapoints[100];
};

event_STRUCT sEvent;

TTree *tree = new TTree("tree", "tree");

tree->Branch("Event", &sEvent.Event_NO, "Event/I");
tree->Branch("Channel", &sEvent.Channel, "Channel/I");
tree->Branch("datapoints", sEvent.datapoints, "datapoints[100]/F");

sEvent.Event_NO = 1;
sEvent.Channel = 1;

for (int i = 0; i < 100; ++i)
{
	sEvent.datapoints[i] = i;
}

tree->Fill();

TFile *output = new TFile("tree.root", "recreate");
tree->Write();
output->Close();
	

But how does it work with the vector? I tried some things inside the for loop with push_back() but could not make it work.

What didn’t work, exactly? push_back should work to “fill” vectors.
As a side note, and just my two cents: if you need vectors (or other variable-size elements), I would avoid structs in trees. ROOT will expect contiguous memory for consecutive members of the struct, and the variable number of elements could cause trouble depending on the type (not necessarily, but something to be careful about). You can instead use a separate branch for each variable (Event, Channel, …). Each one will then need its own SetBranchAddress, but this is generally safer, I think.
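
A minimal sketch of filling a vector once per event, in plain C++ without ROOT. The function name fillDatapoints and the dummy values are assumptions for illustration. The vector is cleared first because the same object is reused for every event, and push_back would otherwise keep appending to the previous event's data. When the vector is a branch of its own, it is typically registered by passing its address, as in tree->Branch("datapoints", &datapoints), with no "/F" leaf-list string.

```cpp
#include <cstddef>
#include <vector>

// Refill the (reused) vector for a new event.
// recordLength is only known at run time and may differ between measurements.
void fillDatapoints(std::vector<float>& datapoints, std::size_t recordLength) {
    datapoints.clear();                // drop values left over from the previous event
    datapoints.reserve(recordLength);  // optional: avoids repeated reallocation
    for (std::size_t i = 0; i < recordLength; ++i) {
        datapoints.push_back(static_cast<float>(i));  // dummy data
    }
    // With ROOT, tree->Fill() would be called after this, once per event.
}
```

Calling the function with 100 and then with 10 leaves the vector with 10 elements, not 110, which is the behaviour wanted for one tree entry per event.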

I tried


	struct event_STRUCT {
		Int_t Event;
		Int_t Channel;
		std::vector<Float_t> datapoints;
	};

event_STRUCT sEvent;
....
....

tree->Branch("datapoints", sEvent.datapoints, "datapoints/F");

for (int i = 0; i < 100; ++i)
{
	sEvent.datapoints.push_back(i);
}

tree->Fill();

and other different forms of that but it doesn’t seem to work like that. What am I doing wrong?

Concerning separating the branches for each variable: would it provide enough flexibility and ease of use for further analysis, like filtering the tree to get only the data from a specific channel, for example?
Could you explain further how SetBranchAddress would work for my example, or give me a reference to the documentation? I have a hard time understanding and finding what I need.

https://root.cern/doc/master/tree1_8C_source.html
The example shows how to create, fill, save and read a tree and plot histograms from it.

Thank you! Ok, this is a really simple example.

One question:

In the example they loop over all entries of “px”, for example, and fill them into a histogram. How would I subdivide/group my entries? In my example I’d have a branch for “Channel” and one for “Datapoints”. How would I then select only the datapoints corresponding to channel 1, for example? Would I have to somehow split my Channel branch first?

Once you have done SetBranchAddress for the branches you want to read (you don’t have to do it for all branches if you don’t need them), every time you do GetEntry or GetEvent all of them are “read” for the same event. So you can do any processing, selection, etc. on any of them, exactly as you would when iterating over arrays or vectors, where at each iteration you are accessing the same index for all variables. For example, inside the reader loop, after GetEntry(i), you can just do “if Channel == 1 then do this…” or whatever you need. Of course, if you have vectors inside branches, you also have to iterate over these vectors’ entries, inside each event.
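
The selection logic described above can be sketched in plain C++, with a std::vector of records standing in for the tree. The names Entry and selectChannel are assumptions for illustration and not part of ROOT. In real code the outer loop would run over tree->GetEntry(i), and the variables would be the ones connected with SetBranchAddress.

```cpp
#include <cstddef>
#include <vector>

// Stand-in for one tree entry (hypothetical layout).
struct Entry {
    int event;
    int channel;
    std::vector<float> datapoints;
};

// Collect the datapoints of every entry that belongs to wantedChannel.
std::vector<float> selectChannel(const std::vector<Entry>& entries, int wantedChannel) {
    std::vector<float> selected;
    for (std::size_t i = 0; i < entries.size(); ++i) {  // plays the role of the GetEntry(i) loop
        const Entry& e = entries[i];
        if (e.channel != wantedChannel) {
            continue;  // skip entries from other channels
        }
        for (float x : e.datapoints) {  // inner loop over the vector inside the entry
            selected.push_back(x);
        }
    }
    return selected;
}
```

No splitting of the Channel branch is needed: the channel number is simply tested for each entry before its datapoints are used.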