Hi,
I am trying to get a simple mapreduce job running with the Hadoop Pipes API that uses Root 5.32 to store some vectors in a TTree. The job fails at the point where Root closes the output file, and I would appreciate any input on resolving the problem.
The mapreduce task receives binary data (essentially vectors), performs a transform on each vector, and writes the resulting vector to a Root TTree stored in a file. The Hadoop+Root job crashes shortly after the "Close" method of the TFile is called.
This same code completes when it is not run in the mapreduce framework.
It looks like there is some memory clobbering at the point where Root does its cleanup after the file close.
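For context, the mapper itself is thin. It is roughly the following shape (simplified from the real job, with the binary decoding inlined and the class names illustrative); it just calls the ProcessFile code shown further down:
[code]#include <cstring>
#include <string>
#include <vector>

#include "hadoop/Pipes.hh"
#include "hadoop/TemplateFactory.hh"
#include "ProcessFile.h"   // assumed header name for the ProcessFile class shown below

// Illustrative mapper: each input value carries one serialized row of floats.
class VectorMapper : public HadoopPipes::Mapper {
public:
  VectorMapper(HadoopPipes::TaskContext& context) {}

  void map(HadoopPipes::MapContext& context) {
    const std::string& value = context.getInputValue();
    // Decode the binary value into a float array (the real job's decoding is more involved)
    std::vector<float> row(value.size() / sizeof(float));
    std::memcpy(&row[0], value.data(), row.size() * sizeof(float));

    // Transform the row and write it out as a Root TTree
    ProcessFile processor;
    processor.transformRow(static_cast<int>(row.size()), &row[0],
                           "/hadoop-distro/rootFile_2.root");
    context.emit(context.getInputKey(), "done");
  }
};

// Trivial reducer; the mapper does all the work
class VectorReducer : public HadoopPipes::Reducer {
public:
  VectorReducer(HadoopPipes::TaskContext& context) {}
  void reduce(HadoopPipes::ReduceContext& context) {
    while (context.nextValue()) {}   // drain values, nothing to aggregate
  }
};

int main(int argc, char* argv[]) {
  return HadoopPipes::runTask(
      HadoopPipes::TemplateFactory<VectorMapper, VectorReducer>());
}[/code]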
I have the stack trace and code details below. The alternatives I can see are:
- Run the Root I/O and computation in a separate process that is fork/exec-ed; this separation might remove the memory conflicts that seem to be happening between the two packages (see the sketch after this list).
- Perhaps this is a "known" problem associated with an improperly configured build of Root? This is currently Ubuntu 11.10 with Root 5.32; perhaps a re-compile of Root would work?
- What else?
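To make the first option concrete, the kind of separation I have in mind is roughly the following sketch (minimal error handling; ProcessFile.h is the assumed header for the class shown below):
[code]#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include "ProcessFile.h"   // assumed header name for the ProcessFile class shown below

// Sketch: do the Root write in a forked child so its heap and cleanup
// are isolated from the Hadoop Pipes runtime in the parent process.
bool writeRowInChild(int arraySize, float* row, const char* rootFileName) {
  pid_t pid = fork();
  if (pid < 0) return false;                 // fork failed
  if (pid == 0) {
    // Child: perform the Root I/O (could also exec a small helper binary here)
    ProcessFile processor;
    processor.transformRow(arraySize, row, rootFileName);
    _exit(0);                                // exit child without parent-side cleanup
  }
  int status = 0;
  waitpid(pid, &status, 0);                  // Parent: wait for the child and check status
  return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}[/code]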
Here is the Hadoop mapreduce debug trace:
A. Stack trace.
[code]attempt_201202230611_0003_m_000000_1: #5 <signal handler called>
attempt_201202230611_0003_m_000000_1: #6 0x00007f4ff0989d49 in free () from /lib/x86_64-linux-gnu/libc.so.6
attempt_201202230611_0003_m_000000_1: #7 0x00007f4ff2061d44 in TTree::~TTree() () from /root-5.32.00/lib/libTree.so
attempt_201202230611_0003_m_000000_1: #8 0x00007f4ff2061ed9 in TTree::~TTree() () from /root-5.32.00/lib/libTree.so
attempt_201202230611_0003_m_000000_1: #9 0x00007f4ff17dbd99 in TCollection::GarbageCollect(TObject*) () from /root-5.32.00/lib/libCore.so
attempt_201202230611_0003_m_000000_1: #10 0x00007f4ff17df765 in TList::Delete(char const*) () from /root-5.32.00/lib/libCore.so
attempt_201202230611_0003_m_000000_1: #11 0x00007f4ff1234711 in TDirectoryFile::Close(char const*) () from /root-5.32.00/lib/libRIO.so
attempt_201202230611_0003_m_000000_1: #12 0x00007f4ff1246883 in TFile::Close(char const*) () from /root-5.32.00/lib/libRIO.so
attempt_201202230611_0003_m_000000_1: #13 0x0000000000427c33 in ProcessFile::transformRow (this=0x20442d0, arraySize=12, row=0x7fffac1a1160, rootFileName=0x2044268 "/hadoop-distro/rootFile_2.root") at ProcessFile.cpp:64[/code]
B. Code.
The above line ProcessFile.cpp:64 is at the end of the following method:
[code]void ProcessFile::transformRow(int arraySize, float row[], const char* rootFileName){
    std::cout << "ProcessFile::transformRow creating matrixes and vectors " << std::endl;
    VectorXf v(arraySize), o(arraySize);
    createdMatrix = MatrixXf::Random(arraySize,arraySize);
    eventArray = new Float_t(arraySize);
    std::cout << "ProcessFile::transformRow creating local root file " << std::endl;
    TFile local(rootFileName,"recreate");
    std::cout << "ProcessFile::transformRow creating root structures " << std::endl;
    aTree = new TTree("test_tree","simple_event_tree");
    copyDoubleIntoArray(row,v);
    // Using Eigen math library perform multiply that models the transform
    o = createdMatrix*v;
    std::cout << "ProcessFile::transformRow copying into Root structures " << std::endl;
    aTree->Branch("arraySize",&arraySize,"arraySize/I");
    aTree->Branch("arrays",eventArray,"eventArray[arraySize]/F");
    std::cout << "ProcessFile::transformRow copying into Eigen structure and do transform " << std::endl;
    copyValueToRootFile(o,aTree,eventArray);
    // ** Looks like this is where the problem happens **
    local.Close();
}[/code]
The final line, "local.Close()", triggers the error. Further, if I comment out the call to "copyValueToRootFile(o,aTree,eventArray);", then the crash does not occur.
C. The copyValueToRootFile method:
[code]void ProcessFile::copyValueToRootFile(VectorXf& v, TTree* aTree, Float_t* eventArray){
    std::cout << "ProcessFile::copyValueToRootFile: Verifying values for Vector" << std::endl;
    for (int i=0; i<v.size(); i++)
    {
        std::cout << "O(" << i << "): " << v(i) << std::endl;
        eventArray[i] = v(i);
    }
    // Save the data to disk
    aTree->Fill();
    aTree->FlushBaskets();
    aTree->Reset();
}[/code]
As long as there is no assignment to the eventArray elements, there is no crash. That is, if I comment out the line
eventArray[i] = v(i);
then the mapreduce task completes. So it seems fair to say that there is some memory or I/O issue associated with the write to the data attached to aTree.
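For reference, the standalone run that completes outside the mapreduce framework is driven by essentially just this (the values are arbitrary and the header name is assumed):
[code]#include "ProcessFile.h"   // assumed header name for the class shown above

int main() {
  // Same call the mapper makes, but from a plain main(); this version
  // runs to completion and writes the Root file without crashing.
  float row[12] = {0.1f, 0.2f, 0.3f, 0.4f, 0.5f, 0.6f,
                   0.7f, 0.8f, 0.9f, 1.0f, 1.1f, 1.2f};
  ProcessFile processor;
  processor.transformRow(12, row, "/tmp/rootFile_test.root");
  return 0;
}[/code]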