I have data reconstruction code that currently runs on multiple cores using PROOF-lite. I would like to replace this with a more sophisticated PCA-MDF method, using TPrincipal and TMultiDimFit. I’m wondering if keeping my PROOF-ready architecture can help with this.
The PCA-MDF code I am adapting seems to have five main parts:
- Fill training-set events as rows of the PCA matrix with TPrincipal::AddRow()
- Compute the principal components with TPrincipal::MakePrincipals()
- Fill the PCA-transformed training data into the MDF matrix with TMultiDimFit::AddRow()
- Find the parameterization with TMultiDimFit::FindParameterization()
- Reconstruct real data (not the training set) using the function TMultiDimFit produces
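For reference, here is how I understand the five steps map onto the ROOT API — a minimal single-core sketch, not my actual code; the number of variables, the max powers, and the choice of fitting the last principal component from the others are placeholders:

```cpp
// Minimal single-core sketch of the PCA-MDF workflow (placeholder values).
#include "TPrincipal.h"
#include "TMultiDimFit.h"

void pcaMdfSketch()
{
   const Int_t nVars = 3;

   // 1) Fill the PCA matrix with training events.
   TPrincipal pca(nVars, "ND");              // N: normalize, D: store data
   Double_t row[nVars];
   // ... loop over the training TTree, filling row[] for each event ...
   pca.AddRow(row);

   // 2) Compute the principal components.
   pca.MakePrincipals();

   // 3) Fill the PCA-transformed data into the MDF object.
   TMultiDimFit mdf(nVars - 1, TMultiDimFit::kMonomials);
   Int_t maxPowers[nVars - 1] = {2, 2};      // placeholder powers
   mdf.SetMaxPowers(maxPowers);
   Double_t p[nVars];                        // event in principal-component space
   pca.X2P(row, p);                          // transform one event
   mdf.AddRow(p, /*D=*/p[nVars - 1]);        // e.g. fit last component from the rest
   // ... repeat for every training event ...

   // 4) Find the parameterization.
   mdf.FindParameterization();

   // 5) Evaluate the fitted function on (PCA-transformed) real data.
   Double_t reco = mdf.Eval(p);
   (void)reco;
}
```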
Which of these parts can/should be parallelized? #5 seems like a natural choice. #1 and #3 involve looping through a TTree, which is a standard case for PROOF parallelization, but I don’t know if the matrices being filled play nice with parallelization.
#2 and #4 seem like steps requiring a large amount of computing for large training sets. Is it desirable to parallelize them? Is it possible?
Thank you for your help!