I am working with simulated files in the EDM4hep event data format, which is based on podio.
I know that, for example, FCCAnalysis tries to combine EDM4hep and RDataFrame, and so do I, because I like RDataFrame's performance.
The TTree structure of an EDM4hep file looks something like this.
The main trouble comes when working with branches that relate to other branches, like in my example, where the branch “person” indicates from which branch to take the “present”.
In my TTree:
Ecal*, Hcal*, LCAL, LHCAL, and MUON are collections of CalorimeterHits from the different subdetectors in the event (RVec<edm4hep::CalorimeterHitData>).
PandoraClusters is the collection of clusters in the event (RVec<edm4hep::ClusterData>).
Each cluster has associated CalorimeterHits from the above-mentioned collections.
_PandoraClusters_hits (RVec<podio::ObjectID>) stores the information about these cluster hits: a collectionID, which maps one-to-one to the name of a collection, and the index of the cluster hit within that collection.
I would like to collect all the hits related to each PandoraCluster in a single collection.
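To make the layout concrete, here is a minimal sketch of how one cluster selects its slice of _PandoraClusters_hits. MockObjectID and MockCluster are simplified stand-ins made up for illustration, not the real podio::ObjectID and edm4hep::ClusterData, and cluster_hit_ids is a hypothetical helper. Only the hits_begin/hits_end range and the collectionID/index pair are taken from the actual data.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-in types: simplified mock-ups, NOT the real podio/EDM4hep structs.
struct MockObjectID {
    std::uint32_t collectionID;  // identifies which hit collection is meant
    int index;                   // position of the hit inside that collection
};

struct MockCluster {
    unsigned hits_begin;  // first entry of the ObjectID vector owned by this cluster
    unsigned hits_end;    // one past the last entry owned by this cluster
};

// Return the slice [hits_begin, hits_end) of ObjectIDs that belongs to one cluster.
std::vector<MockObjectID> cluster_hit_ids(const MockCluster& cluster,
                                          const std::vector<MockObjectID>& hit_ids) {
    // The range must be well-formed and lie inside the ObjectID vector.
    assert(cluster.hits_begin <= cluster.hits_end && cluster.hits_end <= hit_ids.size());
    return std::vector<MockObjectID>(hit_ids.begin() + cluster.hits_begin,
                                     hit_ids.begin() + cluster.hits_end);
}
```

Each returned entry still has to be resolved to an actual hit by looking up the collection that its collectionID refers to, and that is the step that causes the trouble.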
I managed to get what I wanted, but it is very ugly, because a) all collection names are hardcoded; b) the position of all input arguments is hardcoded:
// Assumes collection_id2name (a map from collectionID to collection name) is defined elsewhere.
using HitVec = ROOT::VecOps::RVec<edm4hep::CalorimeterHitData>;

HitVec get_cluster_hits(const edm4hep::ClusterData& cluster,
                        const ROOT::VecOps::RVec<podio::ObjectID>& hit_ids_col,
                        const HitVec& lhcal_hits_col,
                        const HitVec& lcal_hits_col,
                        const HitVec& muon_hits_col,
                        const HitVec& ecal_barrel_hits_col,
                        const HitVec& ecal_barrel_gap_hits_col,
                        const HitVec& ecal_endcap_hits_col,
                        const HitVec& ecal_endcap_gap_hits_col,
                        const HitVec& ecal_endcap_ring_hits_col,
                        const HitVec& hcal_barrel_hits_col,
                        const HitVec& hcal_endcap_hits_col,
                        const HitVec& hcal_endcap_ring_hits_col) {
    // Return the calorimeter hits associated with the given cluster.
    HitVec result;
    for (auto i = cluster.hits_begin; i != cluster.hits_end; ++i) {
        const auto& objID = hit_ids_col[i];
        const auto& col_name = collection_id2name[objID.collectionID];
        // Point at the matching collection instead of copying it for every hit.
        const HitVec* hits_col = nullptr;
        if (col_name == "LHCAL") hits_col = &lhcal_hits_col;
        else if (col_name == "LCAL") hits_col = &lcal_hits_col;
        else if (col_name == "MUON") hits_col = &muon_hits_col;
        else if (col_name == "EcalBarrelCollectionRec") hits_col = &ecal_barrel_hits_col;
        else if (col_name == "EcalBarrelCollectionGapHits") hits_col = &ecal_barrel_gap_hits_col;
        else if (col_name == "EcalEndcapsCollectionRec") hits_col = &ecal_endcap_hits_col;
        else if (col_name == "EcalEndcapsCollectionGapHits") hits_col = &ecal_endcap_gap_hits_col;
        else if (col_name == "EcalEndcapRingCollectionRec") hits_col = &ecal_endcap_ring_hits_col;
        else if (col_name == "HcalBarrelCollectionRec") hits_col = &hcal_barrel_hits_col;
        else if (col_name == "HcalEndcapsCollectionRec") hits_col = &hcal_endcap_hits_col;
        else if (col_name == "HcalEndcapRingCollectionRec") hits_col = &hcal_endcap_ring_hits_col;
        if (!hits_col) continue;  // unknown collection: skip rather than index an empty vector
        result.push_back((*hits_col)[objID.index]);
    }
    return result;
}
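One direction that might shorten the if/else chain, sketched here under assumptions, is to resolve the collectionID through a lookup table instead of comparing names. HitRef, MockHit, HitTable and resolve_hits are hypothetical names built on std::vector stand-ins. None of them is ROOT, podio or EDM4hep API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Stand-in types: simplified mock-ups, NOT the real podio/EDM4hep structs.
struct HitRef { std::uint32_t collectionID; int index; };  // plays the role of an ObjectID
struct MockHit { float energy; };                          // plays the role of a calorimeter hit
using HitVec = std::vector<MockHit>;

// Hypothetical lookup table: collection ID -> pointer to that collection's hits.
using HitTable = std::unordered_map<std::uint32_t, const HitVec*>;

// Resolve the references in [begin, end) to hits through the table.
// References to unknown collections or out-of-range indices are skipped.
HitVec resolve_hits(unsigned begin, unsigned end,
                    const std::vector<HitRef>& hit_ids, const HitTable& table) {
    assert(begin <= end && end <= hit_ids.size());
    HitVec result;
    result.reserve(end - begin);
    for (unsigned i = begin; i != end; ++i) {
        const HitRef& id = hit_ids[i];
        const auto it = table.find(id.collectionID);
        if (it == table.end()) continue;  // collection not registered in the table
        const HitVec& hits = *it->second;
        if (id.index < 0 || static_cast<std::size_t>(id.index) >= hits.size()) continue;
        result.push_back(hits[id.index]);
    }
    return result;
}
```

The table still has to be filled by hand for every event, so this only moves the hardcoding to one place rather than removing it, and it does not solve the problem of passing all the collections into a single RDataFrame callable.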
In EDM4hep this is a common way to link many kinds of related information:
- Calorimeter Cluster ↔ Calorimeter Cluster Hits
- Track ↔ Track Hits
- Reconstructed Particle ↔ Its tracks/clusters
- MCParticle ↔ ReconstructedParticle/Track/Cluster
- SimHits ↔ RecoHits
So, currently, using RDataFrame with EDM4hep relations is very inconvenient and requires a lot of hardcoding…
I am wondering if there are already existing tools in ROOT, which I am not aware of, that could improve my ugly hardcoded example above.