I have a general question about the best way to deal with time series (e.g. the temperature for each day of the year).
The alternatives I see are:
1. put the time series in a histogram
2. put the time series in a tree, one entry for each time
3. something smarter than (1) or (2)
(1) is fine, but each bin holds just one number (the temperature). One cannot, for instance, combine the temperature and a second variable (pressure) in the same histogram.
(2) is better in the sense that one can make a class ‘weather’ which combines the temperature and the pressure, for instance. However, in an analysis you can only access one entry at a time, so rebinning the data is obviously more complex than in (1), and so is an FFT analysis.
There are many other pros and cons to both approaches.
Other ROOT users must certainly have thought about this problem and found a good compromise. It would be interesting if they could share their experience.
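To make the rebinning point concrete, here is a minimal sketch in plain C++ (no ROOT calls) of what rebinning looks like once each sample is a small record rather than a histogram bin. The `Weather` struct and the `rebin` function are hypothetical names made up for this illustration, and averaging within each bin is only one possible choice.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical record type: one sample of the time series.
struct Weather {
    double time;         // e.g. day of the year
    double temperature;
    double pressure;
};

// Average the samples into nbins fixed-width time bins starting at t0.
// Samples outside [t0, t0 + nbins * width) are ignored and empty bins are
// skipped. Each output record carries the bin centre as its time.
std::vector<Weather> rebin(const std::vector<Weather>& samples,
                           double t0, double width, std::size_t nbins)
{
    std::vector<Weather> sums(nbins, Weather{0.0, 0.0, 0.0});
    std::vector<std::size_t> counts(nbins, 0);

    for (const Weather& s : samples) {
        if (s.time < t0) continue;
        const double idx = std::floor((s.time - t0) / width);
        if (idx >= static_cast<double>(nbins)) continue;
        const std::size_t bin = static_cast<std::size_t>(idx);
        sums[bin].temperature += s.temperature;
        sums[bin].pressure += s.pressure;
        ++counts[bin];
    }

    std::vector<Weather> out;
    for (std::size_t i = 0; i < nbins; ++i) {
        if (counts[i] == 0) continue;
        const double n = static_cast<double>(counts[i]);
        out.push_back(Weather{t0 + (static_cast<double>(i) + 0.5) * width,
                              sums[i].temperature / n,
                              sums[i].pressure / n});
    }
    return out;
}
```

With daily samples, calling `rebin(samples, 0.0, 7.0, 53)` would give weekly averages of both variables at once. The price compared to (1) is that one has to loop over all entries by hand.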
What about option 4): store the different temperatures, e.g. at the stock markets in New York, Tokyo, Frankfurt, London, etc., in one TTree, each entry representing a location, and have one TTree per point in time. That way you can loop over all TTrees and keep as many of them in memory as you need for your analysis, without loading the full data set into memory, only those locations that you really need.
But then again, I have never worked with temperatures at stock market locations.
Yes, that would allow maximum flexibility, but it sounds kind of cumbersome. One tree per day might be OK for stock markets, but if the data is taken at a higher rate, one could easily end up with a huge number of trees this way. In a histogram, by contrast, a million samples is no problem.
Actually, one million samples should not be much of a problem for a disk-resident TTree either. TTree is designed to handle very large amounts of data.