.dat to .root


Hi.

This topic is about converting a .dat file to a .root file.
I found the basic2.C program in the tutorials and I use it for the conversion.

A few of the entries in my ‘txtdata.dat’ file are given in the file below:
txtdata.txt (314 Bytes)

Following is the txt2root.C program that I use:

#include "Riostream.h"
void txt2root() {
   TString dir = gROOT->GetTutorialDir();
   dir.Append("/tree/");
   dir.ReplaceAll("/./","/");

   TFile *f = new TFile("output.root","RECREATE");
   /*TH1F *h1 = new TH1F("h1","x distribution",100,-4,4);*/
   TTree *T = new TTree("ntuple","data from ascii file");
   ULong64_t nlines = T->ReadFile(Form("%stxtdata.dat",dir.Data()),"x:y:z");
   printf(" found %llu points\n",nlines);
   /*T->Draw("x","z>2");*/
   T->Write();
}

I run this program in the terminal and get the following:

$ root -l txt2root.C
root [0]
Processing txt2root.C...
x=20014208721000, y=1, z=645
x=20014209165640, y=1, z=6263
x=20014209480080, y=1, z=848
x=20014209657920, y=2, z=503
x=20014209790760, y=1, z=4993
 found 18966287 points

root [1] ntuple->Scan("x:y:z","","colsize=30")
***************************************************************************************************************
*    Row   *                              x *                              y *                              z *
***************************************************************************************************************
*        0 *                 20014207860736 *                              1 *                            645 *
*        1 *                 20014209957888 *                              1 *                           6263 *
*        2 *                 20014209957888 *                              1 *                            848 *
*        3 *                 20014209957888 *                              2 *                            503 *
*        4 *                 20014209957888 *                              1 *                           4993 *
*        5 *                 20014209957888 *                              2 *                           5016 *
*        6 *                 20014209957888 *                              2 *                           5019 *
*        7 *                 20014212055040 *                              2 *                            803 *
*        8 *                 20014212055040 *                              2 *                            558 *
*        9 *                 20014212055040 *                              1 *                            259 *
*       10 *                 20014212055040 *                              2 *                           4015 *
*       11 *                 20014212055040 *                              1 *                           3142 *
*       12 *                 20014212055040 *                              1 *                           2224 *
*       13 *                 20014214152192 *                              2 *                           5006 *

Problem:
Why are only the ‘x’ values different from those in my ‘txtdata.dat’ file?
The ‘y’ and ‘z’ values are stored correctly in the output.root file.

Note:

  1. The original ‘txtdata.dat’ is a big file of size ~980 MB.
  2. I also used the file of size ~314 bytes (which I have attached above), but got the same result.

Any comments or an alternative way to solve this?

ROOT Version: 6.26/10
Platform: Ubuntu 22.04.2 LTS

Thanks,
Newbieeee


Hello,

Thanks for the reproducer.
If I take your input file and convert it like this (your code above):

   TFile *f = new TFile("output.root","RECREATE");
   TTree *T = new TTree("ntuple","data from ascii file");
   ULong64_t nlines = T->ReadFile("txtdata.txt","x:y:z");
   printf(" found %llu points\n",nlines);
   T->Write();

I see exactly the content of the txt file in the TTree:

cat txtdata.txt 
20014208721000 1 645
20014209165640 1 6263
20014209480080 1 848
20014209657920 2 503
20014209790760 1 4993
20014210162560 2 5016
20014210921720 2 5019
20014211039040 2 803
20014211166040 2 558
20014211197040 1 259
20014212153080 2 4015
20014212455920 1 3142
20014212712840 1 2224
20014213219800 2 5006

and

ntuple->Scan("x:y:z","","colsize=14")
***************************************************************
*    Row   *              x *              y *              z *
***************************************************************
*        0 * 20014207860736 *              1 *            645 *
*        1 * 20014209957888 *              1 *           6263 *
*        2 * 20014209957888 *              1 *            848 *
*        3 * 20014209957888 *              2 *            503 *
*        4 * 20014209957888 *              1 *           4993 *
*        5 * 20014209957888 *              2 *           5016 *
*        6 * 20014209957888 *              2 *           5019 *
*        7 * 20014212055040 *              2 *            803 *
*        8 * 20014212055040 *              2 *            558 *
*        9 * 20014212055040 *              1 *            259 *
*       10 * 20014212055040 *              2 *           4015 *
*       11 * 20014212055040 *              1 *           3142 *
*       12 * 20014212055040 *              1 *           2224 *
*       13 * 20014214152192 *              2 *           5006 *
***************************************************************

Cheers,
D

Sorry, but your TTree output isn’t the same as the txt file!

You are fully right! Let me see…

Hi,

check this out: ROOT: TTree Class Reference

In particular,

  • If the type of the first variable is not specified, it is assumed to be “/F”
  • If the type of any other variable is not specified, the type of the previous variable is assumed. eg
      • x:y:z (all variables are assumed of type “F”)

And 20014208721000 is too large to be represented exactly as a 32-bit float:

root [0] float x1 = 20014208721000;
ROOT_prompt_0:1:12: warning: implicit conversion from 'long' to 'float' changes value from 20014208721000 to 20014207860736 [-Wimplicit-const-int-float-conversion]
float x1 = 20014208721000;
      ~~   ^~~~~~~~~~~~~~
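
The same rounding can be reproduced outside ROOT. Below is a minimal sketch in standard C++ (the helper name throughFloat is made up for illustration, it is not a ROOT function). It assumes an IEEE 754 32-bit float and the default round-to-nearest mode. Such a float has a 24-bit significand, so between 2^44 and 2^45 only multiples of 2^21 = 2097152 can be represented:

```cpp
#include <cstdint>

// Round-trip an unsigned 64-bit integer through a 32-bit float.
// This mimics what happens to a value stored in a branch of type /F.
// Hypothetical helper for illustration only, not part of ROOT.
inline std::uint64_t throughFloat(std::uint64_t v) {
   const float f = static_cast<float>(v);   // precision is lost here
   return static_cast<std::uint64_t>(f);    // the float value itself is an integer
}
```

With this helper, throughFloat(20014208721000ULL) gives 20014207860736 and throughFloat(20014209165640ULL) gives 20014209957888, which are rows 0 and 1 of the Scan output above.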

Hi,

Please refer to this code and apologies for the mistake above!

   auto fileName = "txtdata.txt";
   auto df = ROOT::RDF::FromCSV(fileName, false, ' ');
   auto d = df.Display();
   df.Snapshot("ntuple", "rdfoutput.root");
   
   TFile f("rdfoutput.root");
   auto t = f.Get<TTree>("ntuple");
   t->Scan("Col0:Col1:Col2","","colsize=14");

This gives

***************************************************************
*    Row   *           Col0 *           Col1 *           Col2 *
***************************************************************
*        0 * 20014208721000 *              1 *            645 *
*        1 * 20014209165640 *              1 *           6263 *
*        2 * 20014209480080 *              1 *            848 *
*        3 * 20014209657920 *              2 *            503 *
*        4 * 20014209790760 *              1 *           4993 *
*        5 * 20014210162560 *              2 *           5016 *
*        6 * 20014210921720 *              2 *           5019 *
*        7 * 20014211039040 *              2 *            803 *
*        8 * 20014211166040 *              2 *            558 *
*        9 * 20014211197040 *              1 *            259 *
*       10 * 20014212153080 *              2 *           4015 *
*       11 * 20014212455920 *              1 *           3142 *
*       12 * 20014212712840 *              1 *           2224 *
*       13 * 20014213219800 *              2 *           5006 *
***************************************************************

which should be correct.

Cheers,
D

Just to complement the answer… Using RDataFrame has many advantages, including the possibility of manipulating your data before dumping as a TTree, for example applying selections or adding columns.

Cheers,
D

Although I don’t understand RDataFrame yet, the code you provided worked.
Thank you very much for the solution!

Also, if I can add this question here: how can one decrease the size of the .root file?
The ‘rdfoutput.root’ created is ~324M when I use my original ~980MB ‘txtdata.dat’ file.
Can we modify the code that you sent above to decrease the file size?

Thanks again.

Thanks for the reply. I did look into something like that.

Is there a data type for the large number ‘x’ that I can use in the line below:

 ULong64_t nlines = T->ReadFile("txtdata.txt","x/?:y/F:z/F");

Thanks.

Based on your txtdata.txt I would assume

ULong64_t nlines = T->ReadFile("txtdata.txt","x/l:y/s:z/i"); // unsigned long64, unsigned short16, and unsigned int32

see ROOT: TTree Class Reference for details

Thanks.
This also solved the problem.

Below is the correct code.

#include "Riostream.h"
void txt2root() {
   TString dir = gROOT->GetTutorialDir();
   dir.Append("/tree/");
   dir.ReplaceAll("/./","/");

   TFile *f = new TFile("output.root","RECREATE");
   /*TH1F *h1 = new TH1F("h1","x distribution",100,-4,4);*/
   TTree *T = new TTree("ntuple","data from ascii file");
   ULong64_t nlines = T->ReadFile(Form("%stxtdata.dat",dir.Data()),"x/l:y/F:z/F");
   printf(" found %llu points\n",nlines);
   /*T->Draw("x","z>2");*/
   T->Write();
}

To add a note:
This produces the root file of size ~314M with my original txt file of ~980M.

this is great!

Maybe one last thing. It’s probably not a requirement for the current situation, but using RDataFrame makes the code future proof. ROOT is working on the successor of TTree, RNTuple, and RDataFrame will be able to write out datasets in that format without changes in user code.

Cheers,
D

Try using

ULong64_t nlines = T->ReadFile(Form("%stxtdata.dat",dir.Data()),"x/l:y/s:z/i");

instead of

ULong64_t nlines = T->ReadFile(Form("%stxtdata.dat",dir.Data()),"x/l:y/F:z/F");

It will probably decrease the file size.
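
As a rough guide to why, here is a small sketch in standard C++ that adds up the in-memory width of one row for a given branch descriptor. The helpers leafWidth and rowBytes are made up for this post (they are not ROOT functions) and only cover the fixed-width type codes mentioned in this thread. The result is the uncompressed payload per row; the size on disk also depends on compression, so treat it as an estimate only:

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>

// Width in bytes of one value for a TTree leaf type code.
// Hypothetical helper; only the codes discussed in this thread are handled.
inline std::size_t leafWidth(char code) {
   switch (code) {
      case 'B': case 'b': return 1;            // 8-bit signed / unsigned integer
      case 'S': case 's': return 2;            // 16-bit signed / unsigned integer
      case 'I': case 'i': case 'F': return 4;  // 32-bit integers and float
      case 'L': case 'l': case 'D': return 8;  // 64-bit integers and double
      default:
         throw std::invalid_argument(std::string("unsupported type code: ") + code);
   }
}

// Sum of the widths for a descriptor such as "x/l:y/s:z/i".
// A variable without a type takes the previous type; the first defaults to F.
inline std::size_t rowBytes(const std::string &descriptor) {
   std::istringstream in(descriptor);
   std::string field;
   char current = 'F';
   std::size_t total = 0;
   while (std::getline(in, field, ':')) {
      const auto slash = field.find('/');
      if (slash != std::string::npos && slash + 1 < field.size())
         current = field[slash + 1];
      total += leafWidth(current);
   }
   return total;
}
```

For example, rowBytes("x/l:y/F:z/F") is 16 bytes per row, while rowBytes("x/l:y/s:z/i") is 14.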

Yes. It reduced to 303M.

"x/l:y/b:z/s"
Note: make sure that your original data file always satisfies “x >= 0”, “0 <= y <= 255”, and “0 <= z <= 65535”.
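
If you want to check these limits before converting, something along the following lines could be used on the parsed values. This is only a sketch in standard C++; the function name fitsCompactLayout is made up, and it assumes the values were first read into signed 64-bit integers:

```cpp
#include <cstdint>
#include <limits>

// True if (x, y, z) can be stored without truncation in branches
// declared as x/l (64-bit unsigned), y/b (8-bit unsigned), z/s (16-bit unsigned).
// Hypothetical helper for illustration only, not part of ROOT.
inline bool fitsCompactLayout(long long x, long long y, long long z) {
   return x >= 0 &&
          y >= 0 && y <= std::numeric_limits<std::uint8_t>::max() &&    // 0..255
          z >= 0 && z <= std::numeric_limits<std::uint16_t>::max();     // 0..65535
}
```

Any row for which this returns false would need a wider type for the offending column.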

Thanks. It reduced to 283M.

Assuming “x” is at most 15 decimal digits long (in your original data file), try also (maybe you’ll get better compression): "x/D:y/b:z/s"
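
The 15-digit limit comes from the 53-bit significand of an IEEE 754 double: every integer up to 2^53 = 9007199254740992 is stored exactly, and the largest 15-digit number is below that. A minimal sketch in standard C++ to check a given value (the names kMaxExactInDouble and survivesDouble are made up for illustration):

```cpp
#include <cstdint>

// Every integer in [0, 2^53] is exactly representable as an IEEE 754 double.
constexpr std::uint64_t kMaxExactInDouble = 1ULL << 53;  // 9007199254740992

// True if v keeps its value after a round-trip through double,
// as would happen for a branch of type /D.
// Hypothetical helper for illustration only, not part of ROOT.
inline bool survivesDouble(std::uint64_t v) {
   // Values of 2^63 and above are conservatively reported as unsafe,
   // so that the conversion back to an integer is always well defined.
   if (v >= (1ULL << 63)) return false;
   const double d = static_cast<double>(v);
   return static_cast<std::uint64_t>(d) == v;
}
```

The values in txtdata.txt have 14 digits, so they pass this check.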

Thanks for the reply, but it increased to 288M.

Of course.

Coming back to our original example:

   auto fileName = "txtdata.txt";
   auto df = ROOT::RDF::FromCSV(fileName, false, ' ');
   auto d = df.Display();
   ROOT::RDF::RSnapshotOptions opts("RECREATE", ROOT::RCompressionSetting::EAlgorithm::kLZMA, 9, 0, 99, false);
   df.Snapshot("ntuple", "rdfoutput.root", ".*", opts);
   
   TFile f("rdfoutput.root");
   auto t = f.Get<TTree>("ntuple");
   t->Scan("Col0:Col1:Col2","","colsize=14");

That will compress your data with LZMA (see ROOT: ROOT::RDF::RSnapshotOptions Struct Reference)

Cheers,
D

Thank you for the reply.

This gave me an error (a few lines are pasted below). There seems to be a problem with the 2nd argument.
I looked at the documentation, but sorry, I couldn’t understand it.

 error: no matching constructor for initialization of 'ROOT::RDF::RSnapshotOptions'
   ROOT::RDF::RSnapshotOptions opts("RECREATE", ROOT::RCompressionSetting::EAlgorithm::kLZMA, 9, 0, 99, false);
                               ^    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/root/include/ROOT/RSnapshotOptions.hxx:27:4: note: candidate constructor not viable: no known conversion from 'ROOT::RCompressionSetting::EAlgorithm::EValues' to 'ROOT::RDF::RSnapshotOptions::ECAlgo' (aka 'ROOT::ECompressionAlgorithm') for 2nd argument
   RSnapshotOptions(std::string_view mode, ECAlgo comprAlgo, int comprLevel, int autoFlush, int splitLevel, bool lazy,

/root/include/ROOT/RSnapshotOptions.hxx:25:4: note: candidate constructor not viable: requires 1 argument, but 6 were provided
   RSnapshotOptions(const RSnapshotOptions &) = default;