Hi,
I hunted around here, but I failed to find this - which means I must have missed it.
We have some 2D arrays in our current ntuple that are declared as “myname[njet][nele]/F” in a call to TTree::Branch. The problem is that most events have 2 jets but every now and then there are 8 or 20 jets. And the same for the number of electrons. Mostly you have small numbers, but we need to deal with large multiplicities.
So the question is - how can we change things without a major disruption to our code? A few things:
We generate code infrastructure with MakeClass or MakeSelector.
All our variables are declared using this Branch method currently, so anything we do can’t kill that.
The best way, I would think, would be to do something like vector<vector> or similar… Is the speed of that reasonable for i/o? What about the dictionaries for something like vector<vector>? Will that break the way all of us use MakeClass?
Many thanks!
Cheers, Gordon.
P.S. In about 6 months we will probably move to another root-tuple, so we don’t want to invest a huge amount of time redeveloping it…
Sorry - for some reason I never got an email notification, so I didn’t realize this guy had been replied to.
The main goal is to reduce filesize without a huge cost in the other parameters:
Small increase in CPU is ok
Major hassle making MakeClass work is not (have to build dictionaries, etc. every time).
It looks like in 5.27 the dictionary for vector<vector> is included (I assume vector<vector> too?). Is this also the case in 5.26? (I’ll get around to testing this later.) At any rate, I was thinking this was the way to go.
Did you ever measure the file size difference between keeping it as [njet][nele] and artificially restricting it to, say, two jets, i.e. [2][nele], to see whether optimizing the C-style array away is actually worth it? My guess is it’s not worth it as soon as you turn compression on. Or try [nele][njet] (I always forget which way it’s sorted; I think the first index “iterates first” in memory?). Either way: 2000 zeroes can be compressed amazingly well.
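On the "which index iterates first" question, here is a small ROOT-free sketch for plain C++ arrays. The sizes kNJet and kNEle and both function names are made up for illustration. In standard C++ a 2D array is stored row by row, so it is the last index that moves fastest in memory; how TTree then writes the leaf is a separate matter that this sketch does not test.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical array sizes, chosen only for illustration.
constexpr std::size_t kNJet = 3;
constexpr std::size_t kNEle = 4;

// Row-major rule for an array declared as a[kNJet][kNEle]:
// stepping iele by one moves one float, stepping ijet moves kNEle floats.
constexpr std::size_t flat_index(std::size_t ijet, std::size_t iele) {
    return ijet * kNEle + iele;
}

// Measure where a[ijet][iele] really sits, counted in floats from the
// start of the array, by comparing byte addresses.
std::size_t measured_index(std::size_t ijet, std::size_t iele) {
    assert(ijet < kNJet && iele < kNEle);
    static float a[kNJet][kNEle] = {};
    const char* base = reinterpret_cast<const char*>(&a);
    const char* elem = reinterpret_cast<const char*>(&a[ijet][iele]);
    return static_cast<std::size_t>(elem - base) / sizeof(float);
}

// Compile-time checks of the rule itself.
static_assert(flat_index(0, 1) == 1, "last index is contiguous");
static_assert(flat_index(1, 0) == kNEle, "first index strides by a whole row");
```

So with [njet][nele], all electrons of jet 0 come first, then all electrons of jet 1, and so on. If only the first few jets are filled, the unused tail of the buffer is one contiguous block.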
But yes, you can alternatively use a vector<vector>. 5.27 should automatically create the dictionary if it doesn’t exist (yes, we added even more magic to ROOT; a summer student did, actually).
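To make the trade-off concrete, here is a rough ROOT-free sketch of what the two layouts hold for one event. Everything in it is an assumption for illustration: the float element type (guessed from the /F in the original leaflist), the caps kMaxJet and kMaxEle, and all function names. In ROOT itself the nested vector would be handed to TTree::Branch by address; that part is left out so the sketch compiles with a plain compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-event table: one row per jet, one value per electron.
using JetEleTable = std::vector<std::vector<float>>;

// Hypothetical caps for the fixed-size alternative.
constexpr std::size_t kMaxJet = 20;
constexpr std::size_t kMaxEle = 20;

// Build a table for an event with njet jets and nele electrons.
JetEleTable make_table(std::size_t njet, std::size_t nele) {
    return JetEleTable(njet, std::vector<float>(nele, 1.0f));
}

// Number of floats the nested vector actually holds.
std::size_t exact_count(const JetEleTable& t) {
    std::size_t n = 0;
    for (const auto& row : t) n += row.size();
    return n;
}

// Copy the table into a zero-padded [kMaxJet][kMaxEle] buffer,
// which always has the same size regardless of the multiplicities.
std::vector<float> to_padded(const JetEleTable& t) {
    assert(t.size() <= kMaxJet);
    std::vector<float> buf(kMaxJet * kMaxEle, 0.0f);
    for (std::size_t i = 0; i < t.size(); ++i) {
        assert(t[i].size() <= kMaxEle);
        for (std::size_t j = 0; j < t[i].size(); ++j) {
            buf[i * kMaxEle + j] = t[i][j];
        }
    }
    return buf;
}
```

For a typical event with 2 jets and 3 electrons, the nested vector holds 6 floats while the padded buffer holds 400, nearly all of them zero. Whether that difference survives compression is exactly what needs measuring.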
Hi Axel,
Thanks. I’ve got a general bit of source code now to help out with testing these sorts of things. So, what I see is that if you have a 2D array, make the first index fixed (info[20][ny]) and that is an almost x10 reduction in the size of the file. This is with default TTree and TFile ctor - so whatever compression you get with that. Because there is less data written you also save on realtime processing and reading the file, of course.
I’m still testing the vector<vector> things. For one thing, it looks like you lose the ability to do CINT processing, as it can’t deal with nested templates. Where is that LLVM?
Hi,
Ah, vector<vector> caused some sort of horrible syntax error - but at the time I’d also fallen into that trap of accidentally putting a " " in my file path (that still isn’t fixed!?). So I’ll go back and do a more careful check.
I did a comparison of zeroing and not zeroing - holy-cow! It makes a huge difference. A large array with zeros is about 5% or 10% bigger than a correctly sized array. So ROOT is doing an excellent job of compression.
Too bad - looked like a neat idea, even if the father of C++ was pretty pessimistic!
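For intuition on why the zero-padded array costs so little after compression, here is a toy sketch. It uses a simple run-length encoder as a stand-in; this is NOT the compressor ROOT uses, and both function names are invented for the example. The point is only that a long tail of identical zeros collapses to almost nothing in any scheme that exploits repetition.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy run-length encoder: stores (value, repeat count) pairs.
// A stand-in for illustration only, not ROOT's compression algorithm.
std::vector<std::pair<float, std::size_t>>
run_length_encode(const std::vector<float>& data) {
    std::vector<std::pair<float, std::size_t>> runs;
    for (float x : data) {
        if (!runs.empty() && runs.back().first == x) {
            ++runs.back().second;               // extend the current run
        } else {
            runs.emplace_back(x, std::size_t{1});  // start a new run
        }
    }
    return runs;
}

// Hypothetical padded event: nfilled distinct nonzero values,
// followed by zeros up to the fixed capacity.
std::vector<float> padded_event(std::size_t nfilled, std::size_t capacity) {
    assert(nfilled <= capacity);
    std::vector<float> v(capacity, 0.0f);
    for (std::size_t i = 0; i < nfilled; ++i) {
        v[i] = 1.0f + static_cast<float>(i);
    }
    return v;
}
```

With 4 real values in a 400-slot buffer, the encoder emits 5 runs: one per real value plus a single run covering all 396 zeros. A correctly sized 4-slot buffer gives 4 runs, so the padding costs one extra entry in this toy model.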
[quote]but at the time I’d also fallen into that trap of accidentally putting a " " in my file path (that still isn’t fixed!?)[/quote]
Hmm … this should have been fixed in v5.22/00 … How did it fail? Does it also fail with v5.27/04?