Seeking help with a very weird dictionary error

Dear All,

This will be a very complicated issue, but please bear with me. I’m desperate for some expert help…

In ATLAS we have a pretty complicated “vector type” called DataVector. Among other things, it allows us to set up containers such that, if Muon inherits from Particle, DataVector<Muon> ends up inheriting from DataVector<Particle>. One public place where you can look at the class is here:

http://acode-browser1.usatlas.bnl.gov/lxr/source/athena/Control/AthContainers/AthContainers/DataVector.h?v=21.2

Now… We’re in the process of writing some new “algorithms” for ATLAS analyses. We instantiate these algorithms in our lightweight framework through their dictionaries, which in general works fine. But we’re having endless problems with the classes defined in one of our libraries, and I just cannot figure out why. For reference, it’s this package/library:

http://acode-browser1.usatlas.bnl.gov/lxr/source/athena/PhysicsAnalysis/Algorithms/JetAnalysisAlgorithms/?v=21.2

The problem is that when our framework tries to instantiate any of the algorithms defined in this package/library, we get errors like these:

In file included from libJetAnalysisAlgorithmsDict dictionary payload:50:
In file included from /home/krasznaa/projects/AnaAlg/build/x86_64-slc6-gcc62-opt/include/JetAnalysisAlgorithms/JetCalibrationAlg.h:12:
In file included from /home/krasznaa/projects/AnaAlg/build/x86_64-slc6-gcc62-opt/include/JetCalibTools/IJetCalibrationTool.h:19:
In file included from /home/krasznaa/projects/AnaAlg/build/x86_64-slc6-gcc62-opt/include/JetInterface/IJetModifier.h:17:
In file included from /home/krasznaa/projects/AnaAlg/build/x86_64-slc6-gcc62-opt/include/xAODJet/JetContainer.h:12:
In file included from /home/krasznaa/projects/AnaAlg/build/x86_64-slc6-gcc62-opt/include/xAODJet/Jet.h:12:
/home/krasznaa/projects/AnaAlg/build/x86_64-slc6-gcc62-opt/include/xAODJet/versions/Jet_v1.h:376:1: error: explicit specialization of 'DataVectorBase<xAOD::Jet_v1>' after instantiation
DATAVECTOR_BASE( xAOD::Jet_v1, xAOD::IParticle );
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/krasznaa/projects/AnaAlg/build/x86_64-slc6-gcc62-opt/include/AthContainers/DataVector.h:615:43: note: expanded from macro 'DATAVECTOR_BASE'
#define DATAVECTOR_BASE(T, BASE)          \
                                          ^
/home/krasznaa/projects/AnaAlg/build/x86_64-slc6-gcc62-opt/include/AthContainers/DataVector.h:625:20: note: expanded from macro '\
DATAVECTOR_BASE_FWD'
template <> struct DataVectorBase<T>      \
                   ^~~~~~~~~~~~~~~~~
/home/krasznaa/projects/AnaAlg/build/x86_64-slc6-gcc62-opt/include/AthContainers/DataVector.h:721:42: note: implicit instantiation first required here
template <class T, class BASE = typename DataVectorBase<T>::Base>
                                         ^
TInterpreter::AutoParse   ERROR   Error parsing payload code for class CP::JetUncertaintiesAlg with content:

#line 1 "libJetAnalysisAlgorithmsDict dictionary payload"

#ifndef G__VECTOR_HAS_CLASS_ITERATOR
  #define G__VECTOR_HAS_CLASS_ITERATOR 1
#endif
#ifndef HAVE_PRETTY_FUNCTION
  #define HAVE_PRETTY_FUNCTION 1
#endif
#ifndef HAVE_64_BITS
  #define HAVE_64_BITS 1
#endif
#ifndef __IDENTIFIER_64BIT__
  #define __IDENTIFIER_64BIT__ 1
#endif
#ifndef ATLAS
  #define ATLAS 1
#endif
#ifndef ROOTCORE
  #define ROOTCORE 1
#endif
#ifndef XAOD_STANDALONE
  #define XAOD_STANDALONE 1
#endif
#ifndef XAOD_ANALYSIS
  #define XAOD_ANALYSIS 1
#endif
#ifndef ROOTCORE_RELEASE_SERIES
  #define ROOTCORE_RELEASE_SERIES 25
#endif
#ifndef PACKAGE_VERSION
  #define PACKAGE_VERSION "JetAnalysisAlgorithms-00-00-00"
#endif
#ifndef PACKAGE_VERSION_UQ
  #define PACKAGE_VERSION_UQ JetAnalysisAlgorithms-00-00-00
#endif
#ifndef EIGEN_DONT_VECTORIZE
  #define EIGEN_DONT_VECTORIZE 1
#endif

#define _BACKWARD_BACKWARD_WARNING_H
/*
  Copyright (C) 2002-2018 CERN for the benefit of the ATLAS collaboration
*/

/// @author Nils Krumnack


#ifndef JET_ANALYSIS_ALGORITHMS__JET_ANALYSIS_ALGORITHMS_DICT_H
#define JET_ANALYSIS_ALGORITHMS__JET_ANALYSIS_ALGORITHMS_DICT_H

#include <JetAnalysisAlgorithms/JetCalibrationAlg.h>
#include <JetAnalysisAlgorithms/JetSelectionAlg.h>
#include <JetAnalysisAlgorithms/JetSmearingAlg.h>
#include <JetAnalysisAlgorithms/JetUncertaintiesAlg.h>
#include <JetAnalysisAlgorithms/JvtEfficiencyAlg.h>
#include <JetAnalysisAlgorithms/JvtUpdateAlg.h>

#endif

#undef  _BACKWARD_BACKWARD_WARNING_H

input_line_255:2:38: error: allocation of incomplete type 'CP::JetUncertaintiesAlg'
 dynamic_cast<EL::AnaAlgorithm*>(new CP::JetUncertaintiesAlg ("JetUncertaintiesAlg", nullptr))
                                     ^~~~~~~~~~~~~~~~~~~~~~~
libJetAnalysisAlgorithmsDict dictionary forward declarations' payload:8:109: note: forward declaration of 'CP::JetUncertaintiesAlg'
namespace CP{class __attribute__((annotate("$clingAutoload$JetAnalysisAlgorithms/JetUncertaintiesAlg.h")))  JetU...
                                                                                                            ^
EventLoopComp_Algorith...ERROR   /home/krasznaa/projects/AnaAlg/athena/PhysicsAnalysis/D3PDTools/AnaAlgorithm/Root/AnaAlgorithmConfig.cxx:231 (StatusCode EL::AnaAlgorithmConfig::makeAlgorithm(std::unique_ptr<EL::AnaAlgorithm>&) const): failed to create algorithm of type CP::JetUncertaintiesAlg
EventLoopComp_Algorith...ERROR   /home/krasznaa/projects/AnaAlg/athena/PhysicsAnalysis/D3PDTools/AnaAlgorithm/Root/AnaAlgorithmConfig.cxx:232 (StatusCode EL::AnaAlgorithmConfig::makeAlgorithm(std::unique_ptr<EL::AnaAlgorithm>&) const): make sure you created a dictionary for your algorithm

Now, as far as I understand, this sort of error should happen when:

  • We declare a class, in this case xAOD::Jet_v1;
  • We instantiate a DataVector<xAOD::Jet_v1> object;
  • After all of this, we use the DATAVECTOR_BASE macro to specify that DataVector<xAOD::Jet_v1> should inherit from DataVector<xAOD::IParticle>.

I.e. we call the DATAVECTOR_BASE macro “too late”. But I just cannot figure out how this could happen. :frowning:

The situation is made much weirder by the fact that I can happily instantiate the algorithm from PyROOT for instance. It’s only when our framework tries to instantiate this algorithm that things break like this.

Also note that our other algorithms, which use containers very similar to xAOD::JetContainer, work just fine.

Does anyone have any ideas how I could debug this? I’ve been looking at this issue for a long time already, and I’m really out of ideas by now. How could I ask cling how it encountered the “unspecialised” form of that DataVectorBase struct? Unfortunately, an error message that tells me on which line cling encountered it, without a full backtrace of how it reached that line, is not enough help…

Any help is very much welcomed on this one.

Cheers,
Attila

Hi @Attila_Krasznahorkay,
apparently you “implicitly” instantiate a DataVectorBase<xAOD::Jet_v1> at DataVector.h:721, but you declare the specialization afterwards, at Jet_v1.h:376.

Could this be an inclusion order issue?
I don’t think the macro is the culprit, it’s just that the compiler sees the specialization after it has already been used.

Cheers,
Enrico

EDIT:
As for how to debug this: if you feel like going down the rabbit hole, templight is a clang plugin that lets you step into clang’s template instantiation process, similarly to what you would do in gdb.

Another idea would be to add #pragma message debug printouts to the headers involved, so you can see in which order the compiler goes through them.

Hi Enrico,

I’ll give #pragma a try, maybe it will help. Though instrumenting too many headers like that will be painful…

At first I thought that I understood the problem. Currently in our repository the DATAVECTOR_BASE call is done here:

http://acode-browser1.usatlas.bnl.gov/lxr/source/athena/Event/xAOD/xAODJet/xAODJet/versions/JetContainer_v1.h?v=21.2#0018

This means that it is possible to arrive at the very same error with code like:

#include "AthContainers/DataVector.h"
#include "xAODJet/Jet.h"

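DataVector< xAOD::Jet > globalJetContainer; // instantiated before DATAVECTOR_BASE has been seen

#include "xAODJet/JetContainer.h" // the DATAVECTOR_BASE call only comes in via this header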

int main() {

   xAOD::JetContainer localJetContainer;

   return 0;
}

Which produces:

[ 50%] Building CXX object CMakeFiles/jetTest.dir/jetTest.cxx.o
In file included from /Users/krasznaa/ATLAS/sw/projects/permanent/dvbase/source/jetTest.cxx:7:
In file included from /Users/krasznaa/ATLAS/Software/AnalysisBase/21.2.24/InstallArea/x86_64-mac1013-clang90-opt/src/Event/xAOD/xAODJet/xAODJet/JetContainer.h:13:
/Users/krasznaa/ATLAS/Software/AnalysisBase/21.2.24/InstallArea/x86_64-mac1013-clang90-opt/src/Event/xAOD/xAODJet/xAODJet/versions/JetContainer_v1.h:18:1: error: 
      explicit specialization of 'DataVectorBase<xAOD::Jet_v1>' after instantiation
DATAVECTOR_BASE( xAOD::Jet_v1, xAOD::IParticle );
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/krasznaa/ATLAS/Software/AnalysisBase/21.2.24/InstallArea/x86_64-mac1013-clang90-opt/src/Control/AthContainers/AthContainers/DataVector.h:615:43: note: 
      expanded from macro 'DATAVECTOR_BASE'
#define DATAVECTOR_BASE(T, BASE)          \
                                          ^
/Users/krasznaa/ATLAS/Software/AnalysisBase/21.2.24/InstallArea/x86_64-mac1013-clang90-opt/src/Control/AthContainers/AthContainers/DataVector.h:625:20: note: 
      expanded from macro '\
DATAVECTOR_BASE_FWD'
template <> struct DataVectorBase<T>      \
                   ^~~~~~~~~~~~~~~~~
/Users/krasznaa/ATLAS/Software/AnalysisBase/21.2.24/InstallArea/x86_64-mac1013-clang90-opt/src/Control/AthContainers/AthContainers/DataVector.h:721:42: note: 
      implicit instantiation first required here
template <class T, class BASE = typename DataVectorBase<T>::Base>
                                         ^
1 error generated.
make[2]: *** [CMakeFiles/jetTest.dir/jetTest.cxx.o] Error 1
make[1]: *** [CMakeFiles/jetTest.dir/all] Error 2
make: *** [all] Error 2

So, as a first step, I moved the DATAVECTOR_BASE call to just after the definition of the xAOD::Jet_v1 class, in the Jet_v1.h header, as is done for the muons, for instance:

http://acode-browser1.usatlas.bnl.gov/lxr/source/athena/Event/xAOD/xAODMuon/xAODMuon/versions/Muon_v1.h?v=21.2#0428

At that point I was convinced that this would fix the issue. But even after rebuilding our full analysis release from scratch with this modification included, I still got this error from cling, even though by then I no longer knew how to trigger the error with plain clang…

This is why I’m so stuck by now. My only guess is that cling may actually be making some subtle mistake here…

Cheers,
Attila

I’ve also searched our repository for any occurrence of DataVector<xAOD::Jet, but didn’t find anything that was not supposed to be there. So it must be cling/the dictionary instantiating that type “too early” somehow.

Yes, it can be one of two things:

  • either cling somehow instantiates DataVector<xAOD::Jet> before you include "JetContainer.h" when building dictionaries
  • or cling parses some extra headers (i.e. more than clang would), correctly instantiating DataVector<xAOD::Jet> when it parses them, but too early

If I’m not mistaken, the second case can be excluded by merging the two headers "Jet.h" and "JetContainer.h", so the parsing order can only be "DataVector.h" first and then "JetAndContainer.h", and inside the latter you can make sure DataVector<xAOD::Jet> is not instantiated before the specialization is declared.

At the end of the day, the question is what cling means by:

DataVector.h:721:42: note: 
      implicit instantiation first required here
template <class T, class BASE = typename DataVectorBase<T>::Base>

i.e. who was instantiating a DataVector<xAOD::Jet> (and therefore a DataVectorBase<xAOD::Jet>) before the template specialization was defined, and why.
I don’t know a graceful way to have cling answer this question.

edit: maybe @Axel or @pcanal do :slight_smile:

Just a little update, in case it helps: this is the minimal situation in which one gets the clang diagnostic you see:

template <typename T>
struct B {};

template <typename T, typename Base = B<T>>
struct D : Base {};

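void f(D<int>) {}   // defining f needs a complete D<int>, hence a complete B<int>

template <>
struct B<int>;      // error: explicit specialization of 'B<int>' after instantiation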

int main() {
   return 0;
}

Unfortunately I can’t find a way for vanilla clang (without templight) or gcc to print out who requested the first instantiation of B<int>, but it seems that you are instantiating the derived class, which in turn instantiates the base class, and the message just refers to the derived class’ definition.

So it’s a spurious instantiation of DataVector<xAOD::Jet> that you are looking for and not DataVectorBase<xAOD::Jet>.

My suggestion would be to put a #pragma message everywhere a DataVector<xAOD::Jet> could be instantiated – I understand this is pretty invasive :sweat_smile:

I’m just completely baffled by all of this…

By now my Jet_v1.h header file looks like this:

...
namespace xAOD {
   ...
   class Jet_v1 : public IParticle {
   ...
   }; // class Jet_v1

} // namespace xAOD

// Declare IParticle as the base class of Jet_v1:
#include "AthContainers/DataVector.h"
DATAVECTOR_BASE( xAOD::Jet_v1, xAOD::IParticle );

// Include the template implementation(s):
#include "Jet_v1.icc"

I recompiled all of our code with this setup, and I still get this error from cling at runtime. I just cannot imagine how it could end up instantiating DataVector<xAOD::Jet_v1> at this point without having seen the DATAVECTOR_BASE(...) declaration already.

I can only hope that I’ll get some new ideas by next week, because I’m completely out of them by now… :frowning:

Attila

Just to add: the dictionary generated for this package/library refers to DataVector<xAOD::Jet_v1> directly in many places, like:

...
   // Function generating the singleton type initializer
   static TGenericClassInfo *GenerateInitInstanceLocal(const ::DataLink<DataVector<xAOD::Jet_v1> >*)
   {
      ::DataLink<DataVector<xAOD::Jet_v1> > *ptr = 0;
      static ::TVirtualIsAProxy* isa_proxy = new ::TIsAProxy(typeid(::DataLink<DataVector<xAOD::Jet_v1> >));
...

But that’s the same for all of our other types as well. And yet we only see problems with the jets at the moment…

Okay, I’m giving up for now… :stuck_out_tongue:

Hi Attila,

It would help to see the updated error messages, after your changes to Jet_v1.h.

And FYI, the compilation of the dictionary sources doesn’t interfere with cling runtime errors.

Axel.

Hi Axel,

The only thing that changed in the error messages after the update was the exact backtrace they show.

I’ll prepare a recipe for you to reproduce this error once I’m back from my current sick leave…

Cheers,
Attila

And I only noticed it just now: the backtrace with DATAVECTOR_BASE already in Jet_v1.h is the one that I posted in the opening message of the thread. The backtrace we get with the code currently in our repository is a little different.

Attila

Dear All,

Unfortunately the following is only useful information for people with CERN accounts, but I guess that should be okay at this point…

I’ve pushed the modified code that I was last playing with into this branch:

https://gitlab.cern.ch/akraszna/athena/tree/JetDictionaryFixes-21.2-20180406

You can clone it simply with:

git clone -b JetDictionaryFixes-21.2-20180406 https://:@gitlab.cern.ch:8443/akraszna/athena.git

(You need some form of “CERN recognised” authentication to get the code.)

At this point, to build the analysis release from scratch, you can execute the following in an lxplus-like environment:

source /cvmfs/sft.cern.ch/lcg/contrib/gcc/6.2/x86_64-slc6/setup.sh
export PATH=/cvmfs/sft.cern.ch/lcg/contrib/CMake/3.8.1/Linux-x86_64/bin:$PATH
export MAKEFLAGS=<something reasonable>
./athena/Projects/AnalysisBase/build_externals.sh -c
./athena/Projects/AnalysisBase/build.sh -acmi

This would put the compiled projects (2 of them) under ./build/install/. You can then launch the problematic job like:

# Set up the environment from scratch:
source /cvmfs/sft.cern.ch/lcg/contrib/gcc/6.2/x86_64-slc6/setup.sh
source ./build/install/AnalysisBase/21.2.25/InstallArea/x86_64-slc6-gcc62-opt/setup.sh
# Run the job. This script is in $PATH at this point...
JetAnalysisAlgorithmsTest_eljob.py

Now, if you want to use your own build of ROOT for this test, you can do that. If you don’t have an “acceptable version” of ROOT in your environment when executing the build_externals.sh script, then ROOT 6.12/06 gets built for you. But if you do, the build assumes that you want to build the analysis release against that version of ROOT, and doesn’t build ROOT itself.

The analysis release can be built on a number of different platforms as well. This version should build successfully on Ubuntu 16.04, for instance. Unfortunately, it won’t build with the very latest updates of Xcode. (We haven’t yet pushed all the changes needed for a successful full build with the latest version of clang into our repository.)

Cheers,
Attila

And just to reiterate: ROOT is able to instantiate an object of this type in principle.

[bash][pcadp02]:run-root > root -b
   ------------------------------------------------------------
  | Welcome to ROOT 6.12/06                http://root.cern.ch |
  |                               (c) 1995-2017, The ROOT Team |
  | Built for linuxx8664gcc                                    |
  | From tag v6-12-06, 9 February 2018                         |
  | Try '.help', '.demo', '.license', '.credits', '.quit'/'.q' |
   ------------------------------------------------------------

root [0] CP::JetCalibrationAlg alg( "JetCalib", 0 );
root [1]

It’s only when we run our full-blown analysis job that things break down…

Attila

Okay, I still don’t understand what’s going on exactly, but I think I’m a little closer to understanding the issue.

I believe the confusion comes from the failing job opening one of our xAOD files before trying to instantiate the class in question. The input file (/cvmfs/atlas-nightlies.cern.ch/repo/data/data-art/CommonInputs/DAOD_PHYSVAL/data16_13TeV.00311321.physics_Main.DAOD_PHYSVAL.r9264_AthDerivation-21.2.1.0.root) has branches of type xAOD::JetContainer in it of course. It seems that as ROOT is opening the file, and loading the dictionaries for all the types that are stored in the file, the DataVector<xAOD::Jet_v1> type gets interpreted without seeing the DATAVECTOR_BASE statement.

Unfortunately I’m unable to reproduce this behaviour with a simple executable like the following:

// ROOT include(s):
#include <TInterpreter.h>
#include <TFile.h>
#include <TError.h>

// ATLAS include(s):
#include "xAODRootAccess/Init.h"
#include "AnaAlgorithm/AnaAlgorithm.h"

int main() {

   // Set up the runtime environment:
   xAOD::Init().ignore();

   // Open an xAOD file:
   TFile* f = TFile::Open( "/cvmfs/atlas-nightlies.cern.ch/repo/data/data-art/"
                           "CommonInputs/DAOD_PHYSVAL/data16_13TeV.00311321."
                           "physics_Main.DAOD_PHYSVAL.r9264_AthDerivation-21.2.1.0.root",
                           "READ" );
   if( f ) {
      Info( "jetAlgError", "File opened!" );
   }

   // (Try to) Instantiate the object:
   EL::AnaAlgorithm* alg = ( EL::AnaAlgorithm* )
      gInterpreter->Calc( "dynamic_cast<EL::AnaAlgorithm*>(new CP::JetCalibrationAlg(\"JetCalib\",0))" );
   if( alg ) {
      Info( "jetAlgError", "Yay!" );
   } else {
      Error( "jetAlgError", "Boo!" );
      return 1;
   }

   // Return gracefully:
   return 0;
}

This test succeeds. However, if I force-instantiate this type in our top-most PyROOT script before any file gets opened, the dictionary error disappears. It’s enough to add a line like:

dummyAlg = ROOT.CP.JetCalibrationAlg( "JetCalib", 0 )

at line 21 of:

https://gitlab.cern.ch/akraszna/athena/blob/JetDictionaryFixes-21.2-20180406/PhysicsAnalysis/Algorithms/JetAnalysisAlgorithms/share/JetAnalysisAlgorithmsTest_eljob.py

So… this at least gives us some handle on the issue. But why would reading a ROOT file get ROOT into such a confused state?

Cheers,
Attila

Hi @Attila_Krasznahorkay - I bet https://github.com/root-project/root/pull/1897 is fixing it. Could you confirm, please?

Cheers, Axel.

It will take some effort to set up that test… I’ll try to do it next week. (I’ll need a build of our full analysis release on top of your modified version of ROOT’s master branch, which is not something that we do regularly…)

Cheers,
Attila

That’s okay, then we’ll just wait until you try 6.14 or its release candidate, at the beginning of May?
