Add a String Filter in RDataFrame

ROOT Version: 6.24/02
Platform: Scientific Linux 7.3 (Nitrogen)
Compiler: gcc 4.8.5(Red Hat 4.8.5-28)

I am using RDataFrame to analyze some data in a ROOT file, and this involves filtering on string data, e.g.:

auto Si = d.Filter("ParticleName == \"proton\"").Histo1D({"Si","Si",4000,0,400},"particleEnergy");

That is, I want to select only protons and histogram their energy.
However, this filter fails with the following error:

/opt/root62402/include/ROOT/RVec.hxx:609:23: error: comparison between pointer and integer ('cha
RVEC_LOGICAL_OPERATOR(==)
~~~~~~~~~~~~~~~~~~~~~~^~~
/opt/root62402/include/ROOT/RVec.hxx:579:49: note: expanded from macro 'RVEC_LOGICAL_OPERATOR'
   auto op = [y](const T0 &x) -> int { return x OP y; };                       \
                                              ~ ^  ~
input_line_32:2:65: note: in instantiation of function template specialization 'ROOT::VecOps::op
auto lambda0 = [](ROOT::VecOps::RVec<Char_t>& var0){return var0 == "proton"
                                                                ^
terminate called after throwing an instance of 'std::runtime_error'
  what():  
RDataFrame: An error occurred during just-in-time compilation. The lines above might indicate th
 All RDF objects that have not run an event loop yet should be considered in an invalid state.

Here is part of the content of my ROOT file:

************************************************************************
*    Row   * eventID.e * trackID.t * particleI * ParticleName * particleEnergy *
************************************************************************
*        0 *  37724355 *         2 * 32.914987 *    proton * 16.426902 *
*        1 *  52427664 *         1 * 3.8289047 *   neutron * 1.8401179 *
*        2 *  90015881 *         2 * 2.1190153 *    proton * 1.0689786 *
*        3 * 113350944 *         2 * 1.8670489 *    proton * 0.7781645 *
*        4 * 113388706 *         1 * 0.1019912 *   neutron * 0.0501000 *
*        5 * 201286952 *         1 * 19.599530 *   neutron * 18.641971 *
*        6 * 297711814 *         2 * 2.8497625 *       C12 * 0.2642544 *

How can I make this work?
Thanks a lot
han

Hi @CY_Han ,
and welcome to the ROOT forum!
What type is ParticleName stored as? (i.e., what type does TTree::Print report for the branch?)
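The problem is that RDF reads the branch as an RVec<char> instead of, e.g., a std::string, so == is applied element by element, comparing each char with the pointer "proton" (hence the "comparison between pointer and integer" error).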

This should be a workaround:

d.Filter("std::string s(ParticleName.begin(), ParticleName.end()); return s == \"proton\";")

Cheers,
Enrico

Hi Enrico,
thanks for your reply!
ParticleName is stored as a char array (leaf type /C), as you can see below.

root [2] tree->Print()
******************************************************************************
*Tree    :tree      : Stepping Data                                          *
*Entries :       18 : Total =            4168 bytes  File  Size =       1721 *
*        :          : Tree compression factor =   1.23                       *
******************************************************************************
*Br    0 :eventID   : eventID/L                                              *
*Entries :       18 : Total  Size=        723 bytes  File Size  =        194 *
*Baskets :        1 : Basket Size=      32000 bytes  Compression=   1.12     *
*............................................................................*
*Br    1 :trackID   : trackID/L                                              *
*Entries :       18 : Total  Size=        723 bytes  File Size  =        120 *
*Baskets :        1 : Basket Size=      32000 bytes  Compression=   1.82     *
*............................................................................*
*Br    2 :particleINEnergy : particleINEnergy/D                              *
*Entries :       18 : Total  Size=        768 bytes  File Size  =        227 *
*Baskets :        1 : Basket Size=      32000 bytes  Compression=   1.00     *
*............................................................................*
*Br    3 :ParticleName : ParticleName/C                                      *
*Entries :       18 : Total  Size=        807 bytes  File Size  =        191 *
*Baskets :        1 : Basket Size=      32000 bytes  Compression=   1.52     *
*............................................................................*
*Br    4 :particleEnergy : particleEnergy/D                                  *
*Entries :       18 : Total  Size=        758 bytes  File Size  =        225 *
*Baskets :        1 : Basket Size=      32000 bytes  Compression=   1.00     *
*............................................................................*

The code you suggested runs without errors, but I do not get the data I want: it seems that no entries pass the filter, which I cannot understand, since the data clearly contain entries that meet the condition.
To better illustrate the problem, I have uploaded a data file and the code I used to process it. Could you please help me find out where the problem is?

Thanks a lot
han
plot_multi.cpp (1.1 KB)
data.root (7.1 KB)

Hi Han,
thanks a lot for the complete reproducer; that always makes debugging easier!

It's C-string shenanigans. It seems that ParticleName is always an array of 8 characters, padded with \0 if the actual particle name is shorter. So we were comparing s, which could be e.g. proton\0\0, with "proton": std::string's == also compares the length (including the trailing \0 characters), so the comparison was always false. I found out by adding std::cout << s << ' ' << s.size() << '\\n'; in the middle of the Filter expression.
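
A minimal standalone sketch of the padding effect described above, written in plain C++ so it compiles without ROOT. Assumptions: std::vector<char> stands in for ROOT::VecOps::RVec<char>, and the hypothetical helpers PaddedName and FullBufferString mimic an 8-character, '\0'-padded buffer and the string construction used in the first workaround.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: builds a fixed-width char buffer for `name`,
// mimicking the '\0'-padded 8-char arrays observed in the thread.
// std::vector<char> is an assumed stand-in for ROOT::VecOps::RVec<char>.
std::vector<char> PaddedName(const std::string &name, std::size_t width = 8)
{
   std::vector<char> buf(name.begin(), name.end());
   buf.resize(width, '\0'); // resize to `width`, filling new slots with '\0'
   return buf;
}

// Hypothetical helper mirroring the first workaround:
// copies the *whole* buffer into a std::string, '\0' padding included.
std::string FullBufferString(const std::vector<char> &buf)
{
   return std::string(buf.begin(), buf.end());
}
```

With such a buffer, FullBufferString(PaddedName("proton")) has size 8, so comparing it with "proton" is false even though printing it shows proton, while its substr(0, 6) does compare equal to "proton".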

This is a workaround:

auto Si = d.Filter(" std::string s(ParticleName.begin(), ParticleName.end()); return s.substr(0, 6) == \"proton\";")
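
The substr(0, 6) workaround hard-codes the length of "proton", so the same expression cannot select "neutron" or "C12", and it would also accept any longer name that starts with "proton". A possibly more general variant, sketched here in plain C++ with std::vector<char> again standing in for RVec<char> (an assumption), cuts the string at the first '\0' instead. TrimmedName and CountName are hypothetical helper names, not ROOT API.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: converts a '\0'-padded char buffer (std::vector<char>
// standing in for ROOT::VecOps::RVec<char>) into a std::string that ends at
// the first '\0'. If the buffer has no '\0', the whole buffer is kept,
// because find() then returns npos and substr(0, npos) keeps everything.
std::string TrimmedName(const std::vector<char> &buf)
{
   std::string s(buf.begin(), buf.end());
   return s.substr(0, s.find('\0'));
}

// Hypothetical helper: counts entries whose trimmed name equals `wanted`,
// i.e. what a Filter on the particle name would let through.
int CountName(const std::vector<std::vector<char>> &names, const std::string &wanted)
{
   int n = 0;
   for (const auto &buf : names)
      if (TrimmedName(buf) == wanted)
         ++n;
   return n;
}
```

Inside the Filter string, the equivalent logic (not tested here against ROOT 6.24) would be std::string s(ParticleName.begin(), ParticleName.end()); return s.substr(0, s.find('\\0')) == \"proton\"; where the doubled backslash is needed because the expression is itself written inside a C++ string literal.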

An alternative is to store std::strings instead of arrays of characters.

Cheers,
Enrico

Cool!
Thank you very much for solving my problem!

