RDataFrame misinterprets column containing bitsets

Is there a way to store an array of words of bits in a TTree, and access it with RDataFrame?

If I try (root -b -q hello_word.C+)

#include <TChain.h>
#include "ROOT/RDataFrame.hxx"
#include "ROOT/RVec.hxx"
#include <array>
#include <bitset>
#include <iostream>
#include <TInterpreter.h>

const int N_BITS = 128;
const int N_WORDS = 20;

void hello_word(void) {
    gInterpreter->GenerateDictionary("bitset<128>;array<bitset<128>, 20>", "array;vector;ROOT/RVec.hxx");

    std::bitset<N_BITS> word;
    std::bitset<N_BITS> words[N_WORDS];
    std::array<std::bitset<N_BITS>, N_WORDS> words2;
    
    std::cout << "Defining the tree" << std::endl;
    auto f = new TFile("test.root", "RECREATE");
    auto t = new TTree("t", "a test tree");
    t->Branch("word", &word);
    t->Branch("words", &words, "words[20]");
    t->Branch("words2", &words2, "words2[20]");

    for (Int_t i = 0; i < N_WORDS; i++) {
        word.flip();
        words[i].set();
        words2[i].set();
        t->Fill();
    }
    f->Write();
    f->Close();
    SafeDelete(f);

    std::cout << std::endl << "Reading it back" << std::endl;
    auto chain = TChain("t");
    chain.Add("test.root");
    ROOT::RDataFrame dset(chain);

    for (auto var: dset.GetColumnNames()) {
        std::cout << var << " is " << dset.GetColumnType(var) <<std::endl;
    }
}

returns

Defining the tree
Error in <TTree::Branch>: The class requested (bitset<128>) for the branch "word" is an instance of an stl collection and does not have a compiled CollectionProxy. Please generate the dictionary for this collection (bitset<128>) to avoid to write corrupted data.

Reading it back
words is ROOT::VecOps::RVec<Float_t>
words2 is ROOT::VecOps::RVec<Float_t>

The actual use case has a new column which contains a struct with a few arrays of bitsets (one per sub-detector) - the motivation for using bitset is probably obvious, while that of using arrays instead of vectors is performance-related (the number of sub-detectors is of course fixed).


Please read tips for efficient and successful posting and posting code

Please fill also the fields below. Note that root -b -q will tell you this info, and starting from 6.28/06 upwards, you can call .forum bug from the ROOT prompt to pre-populate a topic.

ROOT Version: 6.28/10
Platform: linuxx8664gcc
Compiler: g++ (GCC) 13.1.0


Dear Rene,

I think @mczurylo is the right person to ask.

Cheers,
Danilo

Hi @rene,

Is there a way to store an array of words of bits in a TTree, and access it with RDataFrame?

I think this is pretty much what you are doing already? You are first filling branches of your TTree and then reading it from the created RDF.

In order to remove the error, instead of gInterpreter->GenerateDictionary call ( especially the part:
gInterpreter->GenerateDictionary("array<bitset<128>, 20>", "ROOT/RVec.hxx");
generating the following error: Error: It is not necessary to explicitly select class array<bitset<128>,20>. I/O is supported for it transparently.), you need to add a new file “LinkDef.h” in which you do:

#include "bitset"
#ifdef __ROOTCLING__
#pragma link C++ class bitset<128>+;
#endif

and then run:
rootcling -f myDict.cxx LinkDef.h
and
g++ -shared -o libmydict.so mydict.cxx `root-config --cflags --libs

Finally, you can just run your code as before.

I get the following output:

Defining the tree

Reading it back
word is bitset<128>
words is ROOT::VecOps::RVec<Float_t>
words2 is ROOT::VecOps::RVec<Float_t>

If you have some other questions, let us know!

Cheers,
Marta

Hi @mczurylo ,

why would the column type be identified as [quote=“mczurylo, post:3, topic:57460”]
ROOT::VecOps::RVec<Float_t>
[/quote] though?

Hi @rene,

I guess I didn’t fully get your question earlier.

I believe the easiest way is to use RVec’s so the ROOT’s std::vector-like collection: ROOT: ROOT::VecOps::RVec< T > Class Template Reference instead of the arrays (bitset is not a supported type for the arrays in the Branches: ROOT: TTree Class Reference).

In order to use RVecs of bitsets your LinkDef.h file should be:

#include "bitset"
#include "ROOT/RVec.hxx"
#ifdef __ROOTCLING__
#pragma link C++ class bitset<128>+;
#pragma link C++ class ROOT::VecOps::RVec<bitset<128>>+;
#endif

while instead of arrays you should do:

ROOT::VecOps::RVec<bitset<128>> words(20);
ROOT::VecOps::RVec<bitset<128>> words2(20);

while your branches will be:

t->Branch("words", &words); 
t->Branch("words2", &words2);

and then you’ll obtain the types of columns as:

word is bitset<128>
words is ROOT::VecOps::RVec<bitset<128> >
words2 is ROOT::VecOps::RVec<bitset<128> >

Cheers,
Marta

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.