I hunted around here, but I failed to find this - which means I must have missed it.
We have some 2D arrays in our current ntuple that are declared as “myname[njet][nele]/F” in a call to TTree::Branch. The problem is that most events have 2 jets but every now and then there are 8 or 20 jets. And the same for the number of electrons. Mostly you have small numbers, but we need to deal with large multiplicities.
So the question is - how can we change things without a major disruption to our code? A few things:
- We generate code infrastructure with MakeClass or MakeSelector.
- All our variables are declared using this Branch method currently, so anything we do can’t kill that.
The best way, I would think, would be to do something like vector<vector<float>> or similar… Is the I/O speed of that reasonable? What about the dictionaries for something like vector<vector<float>>? Will that break the way all of us use MakeClass?
P.S. In about 6 months we will probably move to another root-tuple, so we don’t want to invest a huge amount of time redeveloping it…
what’s the problem and what’s your goal? Reduce CPU time, reduce memory usage, reduce file size?
Sorry - for some reason I never got an email notification, so I didn’t realize this thread had been replied to.
The main goal is to reduce filesize without a huge cost in the other parameters:
- Small increase in CPU is ok
- Major hassle making MakeClass work is not (having to build dictionaries, etc. every time).
It looks like in 5.27 the dictionary for vector<float> is included (I assume vector<vector<float>> too?). Is this also the case in 5.26? (I’ll get around to testing this later.) At any rate, I was thinking this was the way to go.
did you ever measure the file size difference between keeping it as [njet][nele] and artificially restricting it to, say, two jets, i.e. [2][nele], to see whether optimizing the C-style array away is actually worth it? My guess is it’s not worth it as soon as you turn compression on. Or try [nele][njet] (I always forget which way it’s sorted; I think the first index “iterates first” in memory?). Either way: 2000 zeroes can be compressed amazingly well.
But yes, you can alternatively use a vector<vector<float> >. 5.27 should automatically create the dictionary if it doesn’t exist (yes, we added even more magic to ROOT; a summer student wrote it, actually).
Thanks. I’ve now got a general bit of source code to help test these sorts of things. What I see is that if you have a 2D array and make the first index fixed (info[ny]), you get almost a 10x reduction in file size. This is with the default TTree and TFile ctors, so whatever compression comes with those. Because less data is written, you also save real time writing and reading the file, of course.
I’m still testing the vector<vector<float>> things. For one thing, it looks like you lose the ability to do CINT processing, as it can’t deal with nested templates. Where is that LLVM?
fixing one dimension to 20 gives 20 times less file size than leaving it unfixed? Or the other way around?
Do you initialize the unused array entries to 0? You should, to get high compression.
Nested templates should work with CINT. And LLVM is not needed by the experiments
Ah, vector<vector<float>> caused some sort of horrible syntax error - but at the time I’d also fallen into that trap of accidentally putting a " " in my file path (that still isn’t fixed!? ). So I’ll go back and do a more careful check.
I did a comparison of zeroing and not zeroing - holy cow! It makes a huge difference. A large array padded with zeros is only about 5% or 10% bigger than a correctly sized array. So ROOT is doing an excellent job of compression.
Too bad - looked like a neat idea, even if the father of C++ was pretty pessimistic!
[quote] but at the time I’d also fallen into that trap of accidentally putting a " " in my file path (that still isn’t fixed!? )[/quote]Hmm … this should have been fixed in v5.22/00 … How did it fail? Does it also fail with v5.27/04?