I have a table of data with 10 million rows and 100 columns. I am storing this data in a ROOT tree (TTree). This is transaction-level data that cannot be used directly in my analysis.
In order to prepare this data for analysis, I have to summarize this data. In SQL, my summary request would be something like:
SELECT SUM(column1), SUM(column2), SUM(column3)
GROUP BY columnA, columnB, columnC, columnD, columnE ;
Can such a summary be created from existing ROOT functionality? If so, how?
Eventually, I would like to create a method CreateSummary() with a syntax as shown above. This method could then be used to summarize any tree into another tree. The SUM() method could be replaced with any other user-written aggregation method such as MIN(), MAX(), MEDIAN(), WeightedAverage(), etc. Do you think the ROOT tree structure lends itself to this kind of application?
[quote]Can such a summary be created from existing ROOT functionality?[/quote]I assume you need to create a TTree (a histogram would be somewhat easier, as histograms sum the data inherently). If so, this is going to be a bit challenging: the TTree structure is write once / read many times, so you need to compute the full sum for each unique value of (columnA, columnB, columnC, columnD, columnE) before you can write its entry. So essentially you need to do ‘by hand’:
find the list of unique quintuplets (columnA, columnB, columnC, columnD, columnE)
for each of those unique quintuplets
sum the 3 columns over the whole input, using only the entries matching the quintuplet
write the new entry