Is capacity of std containers serialized to ROOT files?

Dear ROOTers,

std::vector and most other STL containers also expose a capacity(). Does the Streamer() of these classes save this extra field? And in particular, what happens when they are saved in a TTree, where they will probably just be read out and never modified?

To give an example: would a std::vector containing only one element be serialized as 3x8 = 24 bytes (size + capacity + first and only element)?

Thanks for the clarification,
Matteo

Hi Matteo,

the size is serialised indeed.

Cheers,
D

Actually, a vector also contains a pointer to the data. So here (64 bit): sizeof(vector<Double_t>) = sizeof(v.capacity()) + sizeof(v.size()) + sizeof(v.data()) = 24, and then add the real data. I have no clue how the pointer is serialized in ROOT; it seems like black magic to me :slight_smile:
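For what it's worth, that footprint is easy to check directly. A minimal sketch (plain C++, function name made up), assuming a typical 64-bit implementation where the vector object is three 8-byte words (commonly begin/end/capacity pointers):

#include <cstdio>
#include <vector>

void vecfootprint() {
    std::vector<double> v{42.0};  // one element

    // The vector object itself: typically 24 bytes on 64-bit,
    // regardless of how many elements it holds.
    std::printf("sizeof(vector) = %zu bytes\n", sizeof(v));

    // The heap payload actually holding the elements: 8 bytes here.
    std::printf("payload        = %zu bytes\n", v.size() * sizeof(double));
}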

So how much more efficient is a branch a_length/I plus a branch a[a_length]/D in a real-world example?

Hi Danilo,

do you mean “size” or “capacity”?

Hi Behrenhoff,

I would be really surprised if the address (which could not be used to reconstruct the data anyway) were serialized too. At least, thinking about how custom classes with pointers to arrays are serialized, I do not think that a meaningless address is saved together with the data.

Please experts correct me if wrong
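(For reference, a minimal sketch, not from this thread, of the convention ROOT uses for such members: the //[fN] comment on the pointer member tells the streamer how many elements to write, and only fN and those elements end up in the file, never the pointer value.)

#include <TObject.h>

class MyData : public TObject {
public:
    Int_t     fN    = 0;        // number of elements
    Double_t *fData = nullptr;  //[fN] streamed as fN elements, not as an address

    ClassDef(MyData, 1);
};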

Disclaimer: I am not an expert. Just testing and looking at the results:

1M events with 1 Double_t element in a vector (content = random). Result: the file with the vector is approx. 10% larger than the one with the 2 branches size + array (the exact difference depends on compression level and type).

Also, it seems capacity is not stored. In the following script, the file size produced with random capacity versus constant capacity (comment out the v.reserve call) is identical; constant values should compress better than non-constant ones, so a difference would show up if the capacity were stored.

Test script:

#include <Compression.h>
#include <TFile.h>
#include <TTree.h>
#include <memory>
#include <random>

void testvec() {
    std::mt19937 rngVal(0);
    std::mt19937 rngCap(0);

    auto f1 = std::unique_ptr<TFile>(TFile::Open("testvec.root", "RECREATE"));
    //f1->SetCompressionAlgorithm(ROOT::ECompressionAlgorithm::kLZMA);
    //f1->SetCompressionLevel(8);
    auto t1 = new TTree("t", "t");

    auto f2 = std::unique_ptr<TFile>(TFile::Open("testarr.root", "RECREATE"));
    auto t2 = new TTree("t", "t");

    Int_t v_size = 1;
    std::vector<Double_t> v;

    t1->Branch("v", &v);

    t2->Branch("v_size", &v_size);
    auto arrbr = t2->Branch("a", v.data(), "a[v_size]/D");

    for (size_t i = 0; i < 1000000; ++i) {
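        // Swap with an empty vector to drop the old allocation, so the
        // reserve() below sets a fresh, random capacity for this entry.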
        std::vector<Double_t>{}.swap(v);
        v.reserve(std::uniform_int_distribution<size_t>(0,10000)(rngCap));
        v.push_back(std::uniform_real_distribution<Double_t>(0,500)(rngVal));
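        // push_back may have reallocated, so re-point the array branch
        // at the vector's current buffer before filling.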
        arrbr->SetAddress(v.data());

        t1->Fill();
        t2->Fill();
    }

    f1->Write();
    f2->Write();
}

The capacity is indeed not stored, just the size of the data. The non-compressed size of vector vs size+array should be similar (the difference being one vs two branches) but indeed the compression would be different.

When the vector is stored in a TTree, the maximum size is recorded, and upon reading, the vector's capacity ought to be that value.
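A minimal read-back sketch to see this, assuming the testvec.root file written by the script above:

#include <TFile.h>
#include <TTree.h>
#include <cstdio>
#include <vector>

void readvec() {
    auto f = TFile::Open("testvec.root");
    TTree *t = nullptr;
    f->GetObject("t", t);

    std::vector<Double_t> *v = nullptr;
    t->SetBranchAddress("v", &v);

    t->GetEntry(0);

    // size() is exactly what was written for this entry; capacity() is
    // whatever ROOT allocated when reading, not the capacity at write time.
    std::printf("size = %zu, capacity = %zu\n", v->size(), v->capacity());

    f->Close();
}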

Cheers,
Philippe.

Dear all,

thanks, my question is fully answered.
