NB: this question is similar to the thread “RDataFrame column types reading in a csv file”, but in that case the problem could be worked around by tricking ROOT into expecting a string by adding quotation marks. Here I don’t know how to tell ROOT that it will lose precision if it forces a column to an integer type, even if the first row makes that look safe.
I’m reading in csv data with:
auto df = ROOT::RDF::MakeCsvDataFrame(filename_in);
and it looks like ROOT sets the type of each column from the first entry in the CSV. Sometimes, though, for a large number such as a momentum, the first entry looks like a long integer when the column is really Double_t if you look at the other rows. Is there a way to force ROOT to use the ‘correct’ types for the columns?
ROOT Version: 6.26/10 Platform: Arch Compiler: Not Provided
I worked around this problem, since I also control the CSV contents: I changed the precision of what I write to the CSV so that the column always contains a “.”, which forces RDataFrame to read it in as a double.
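For anyone who wants to do the same, here is a minimal sketch of that workaround in plain C++, assuming the CSV is written through standard streams. The helper names formatAsDouble and makeCsvRow are hypothetical, not part of ROOT; the point is only that std::fixed with at least one decimal always prints a “.”, even for whole numbers.

```cpp
#include <cassert>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper (not part of ROOT): format a value so that the text
// always contains a decimal point, e.g. 209231 -> "209231.000".
std::string formatAsDouble(double value, int decimals = 3)
{
   // With zero decimals, std::fixed would print "209231" and the "." would be lost.
   assert(decimals >= 1);
   std::ostringstream out;
   out << std::fixed << std::setprecision(decimals) << value;
   return out.str();
}

// Hypothetical helper: build one comma-separated line from an integer id
// and a momentum value that must be read back as a floating-point column.
std::string makeCsvRow(int id, double momentum)
{
   return std::to_string(id) + "," + formatAsDouble(momentum);
}
```

With this, makeCsvRow(2, 209231.0) gives "2,209231.000", so the first row of the column no longer looks like an integer.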
Incidentally, I’m only writing to CSV because I have a problem with race conditions when writing to separate TTree/TFiles in a ForEachSlot of an RDataFrame. When I write to CSV instead of TFile I get around it easily…
One way around it is to first read the csv file into a tree, which does support specifying the type of each column (see TTree::ReadFile). When you then read the tree with an RDataFrame, it takes the column types from the tree.
$ cat z.txt
2 209231
3 345
7 25435
1 6732645
$ cat z.C
void z() {
   TFile f("z.root","RECREATE");
   TTree *T=new TTree("T","tree");
   // Branch descriptor: column a as Int_t, column b as Double_t
   T->ReadFile("z.txt","a/I:b/D");
   T->Write("T");
   f.Close();
}
$ root -l -b -q z.C
(...)
root [0] ROOT::RDataFrame d("T", "z.root")
(ROOT::RDataFrame &) A data frame built on top of the T dataset.
root [1] d.GetColumnType("a")
(std::string) "Int_t"
root [2] d.GetColumnType("b")
(std::string) "Double_t"