Persisting Vc types with different vectorization techniques (SSE, AVX)

Hello,

I am working on persisting the SIMD data types of the Vc library (which also ships with ROOT). Since ROOT doesn’t natively support I/O of Vc types (due to over-alignment), I have discussed with @pcanal and @Axel how to do this anyway. They proposed using a persistent data type (such as std::vector) as an intermediary buffer: the Vc values are copied into it before writing and copied back out of it after reading.

I have been able to do so, but only for SSE-compiled programs. When I compile the same code with AVX (i.e. add “-mavx” to the compiler flags), I get strange I/O behavior: sometimes I/O works, but sometimes it segfaults with the following trace:

#5  0x0000000000404c72 in ROOT::new_VcContainer(void*) ()
#6  0x00007f8019db7bc9 in TClass::New(TClass::ENewType, bool) const () from /home/ahmad/Desktop/root_v6.08.06_prebuilt/root/lib/libCore.so
#7  0x00007f80198161a5 in TKey::ReadObjectAny(TClass const*) () from /home/ahmad/Desktop/root_v6.08.06_prebuilt/root/lib/libRIO.so
#8  0x00007f801973d5a5 in TDirectoryFile::GetObjectChecked(char const*, TClass const*) () from /home/ahmad/Desktop/root_v6.08.06_prebuilt/root/lib/libRIO.so
#9  0x00000000004048bc in main ()

My version of ROOT is v6.08.06. The binaries were SSE-compiled. I also built ROOT with the “-mavx” flag enabled to see whether the vectorization technique ROOT is built with matters, but the problem remains.

Please find the reproducer below, consisting of vc_persistency.cc, vc_persistency.h, Makefile and LinkDef.h. When compiling, please point the CXXFLAGS, LIBPATH and LIBS variables of the Makefile to your Vc installation. Run ./avxRun multiple times to see the strange I/O behavior I mentioned earlier (assuming your machine supports AVX).

// vc_persistency.cc

#include <cstdio>  // remove()

#include "vc_persistency.h"

int main(void) {
  VcContainer *vcc = new VcContainer();
  Vc::double_v v_set(42);
  vcc->set_private(v_set);

  vcc->PersistVc();

  TFile *f = new TFile ("myfile.root", "RECREATE");
  f->WriteObject(vcc, "VcObj");
  f->Close();

  VcContainer *vcc_r = nullptr;  // filled in by GetObject below

  TFile *g = TFile::Open("myfile.root");
  g->GetObject("VcObj", vcc_r);
  g->Close();

  vcc_r->LoadVc();

  vcc_r->print_private();
  remove("myfile.root");
}

// vc_persistency.h

#include <iostream>
#include <fstream>
#include <vector>

#include <Rtypes.h>
#include <TFile.h>

#include <Vc/Vc>

class VcContainer {
 public:
  VcContainer(TRootIOCtor*) {}  // ROOT I/O constructor
  VcContainer() {}

  void print_private() { std::cout << svc_d_ << std::endl; }

  Vc::double_v& get_private() { return svc_d_; }

  void set_private(Vc::double_v foo) {
    for (int j = 0; j < static_cast<int>(Vc::double_v::Size); j++)
      svc_d_[j] = foo[j];
  }

  virtual ~VcContainer() {}

  void PersistVc() {
    svc_d_pst_.clear();

    for (size_t i = 0; i < Vc::double_v::Size; i++) {
      svc_d_pst_.push_back(svc_d_[i]);
    }
  }

  void LoadVc() {
    for (size_t i = 0; i < Vc::double_v::Size; i++) {
      svc_d_[i] = svc_d_pst_[i];
    }
  }

  using pst_vector = std::vector<Vc::double_v::value_type>;

 private:
  int index_ = 0;
  Vc::double_v svc_d_;  //!
  pst_vector svc_d_pst_;
  ClassDef(VcContainer, 1);
};

# Makefile

ROOTCFLAGS    				= `root-config --cflags`
ROOTLIBS      				= `root-config --libs`
ROOTGLIBS     				= `root-config --glibs`

CXX           				= g++
CXXFLAGS      				= -I/opt/Vc/include -I$(ROOTSYS)/include -O -Wall -fPIC -Wno-reorder
FAVX                        = -mavx -fabi-version=6
FSSE                        = -msse -fabi-version=6
LD            				= g++
LDFLAGS       				= -g
SOFLAGS       				= -shared
LIBPATH                     = -L/opt/Vc/lib/
LIBS                        = -lVc

SSE							= sseRun
AVX							= avxRun
SOURCES						= vc_persistency.cc
HEADERS 					= vc_persistency.h
LINKDEF 					= $(wildcard *LinkDef.h *Linkdef.h)

CXXFLAGS 				   += $(ROOTCFLAGS)
GLIBS 						= $(ROOTGLIBS)

all: $(SSE) $(AVX)

clean:
	@rm *.o *.pcm *.d *.so $(SSE) $(AVX) sseDict.cc avxDict.cc *.root 2>/dev/null || true

reset:
	@rm *.root 2>/dev/null || true

# SSE Target
sseDict.cc: $(HEADERS) $(LINKDEF)
	@rootcling -f $@ -c $(CXXFLAGS) -msse $(HEADERS) $(LINKDEF) 2>/dev/null || true
sseDict.o: sseDict.cc
	@$(CXX) -c sseDict.cc $(CXXFLAGS) $(FSSE)
	@echo "\033[95mBuilt sseDict.\033[0m"
$(SSE): $(SOURCES) sseDict.o
	@$(CXX) $(SOURCES) $(LIBPATH) $(LIBS) -o $(SSE) sseDict.o $(CXXFLAGS) $(FSSE) $(GLIBS)
	@echo "\033[95mBuilt executable $(SSE).\033[0m"

# AVX Target
avxDict.cc: $(HEADERS) $(LINKDEF)
	@rootcling -f $@ -c $(CXXFLAGS) -mavx $(HEADERS) $(LINKDEF) 2>/dev/null || true
avxDict.o: avxDict.cc
	@$(CXX) -c avxDict.cc $(CXXFLAGS) $(FAVX)
	@echo "\033[95mBuilt avxDict.\033[0m"
$(AVX): $(SOURCES) avxDict.o
	@$(CXX) $(SOURCES) $(LIBPATH) $(LIBS) -o $(AVX) avxDict.o $(CXXFLAGS) $(FAVX) $(GLIBS)
	@echo "\033[95mBuilt executable $(AVX).\033[0m"

// LinkDef.h
#ifdef __ROOTCLING__

#pragma link off all globals;
#pragma link off all classes;
#pragma link off all functions;

#pragma link C++ nestedclasses;
#pragma link C++ nestedtypedef;

#pragma link C++ class VcContainer+;

#endif

Any help on this matter would be greatly appreciated.

  • Ahmad

Hi,

Maybe @Danilo could help here (he has some experience with Vc).

Cheers, Bertrand.

Can you run the failing example with valgrind?

As a side note,

void LoadVc() {
    for (size_t i = 0; i < Vc::double_v::Size; i++) {
      svc_d_[i] = svc_d_pst_[i];
    }
  }

might fail if svc_d_pst_ does not have the required size.

Result of: valgrind --suppressions=$ROOTSYS/etc/valgrind-root.supp --track-origins=yes ./avxCounter :
(NB the segfault occurs arbitrarily)

==59278== Memcheck, a memory error detector
==59278== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==59278== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info
==59278== Command: ./avxCounter
==59278== 

 *** Break *** segmentation violation
#0  0x0000000038109edc in ?? ()
#1  0x0000000000000008 in ?? ()
#2  0x0000000808b99dd0 in ?? ()
#3  0x0000000808b99d90 in ?? ()
#4  0x0000000039c47a10 in ?? ()
#5  0x000000000000003d in ?? ()
#6  0x0000000039c47a00 in ?? ()
#7  0x000000003a031588 in ?? ()
#8  0x00000000000000b8 in ?? ()
#9  0x000000000000003d in ?? ()
#10 0x0000000000000001 in ?? ()
#11 0x000000003a0314f8 in ?? ()
#12 0x00000000380b086b in ?? ()
#13 0x00000000380ad183 in ?? ()
#14 0x00000000380ae637 in ?? ()
#15 0x00000000380bdcbd in ?? ()
#16 0x0000000000000000 in ?? ()
==59278== 
==59278== HEAP SUMMARY:
==59278==     in use at exit: 27,685,639 bytes in 53,741 blocks
==59278==   total heap usage: 247,830 allocs, 194,089 frees, 196,240,856 bytes allocated
==59278== 
==59278== LEAK SUMMARY:
==59278==    definitely lost: 0 bytes in 0 blocks
==59278==    indirectly lost: 0 bytes in 0 blocks
==59278==      possibly lost: 59,867 bytes in 861 blocks
==59278==    still reachable: 27,427,767 bytes in 51,005 blocks
==59278==         suppressed: 198,005 bytes in 1,875 blocks
==59278== Rerun with --leak-check=full to see details of leaked memory
==59278== 
==59278== For counts of detected and suppressed errors, rerun with: -v
==59278== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 531 from 88)

(There was one error before, but that was just me forgetting to initialize a primitive member, which was not the reason for the random segfaults. I have edited the code in the first post to initialize it.)

Valgrinding with --leak-check=full gives 6k lines of ROOT-related errors that don’t seem related to my problem. For some reason I never managed to get these errors suppressed with the ROOT suppression file… Here is the output: http://pasted.co/d6272ba1

@pcanal, your side note about the LoadVc() function is true when mixing SSE and AVX, but it shouldn’t matter if only SSE or only AVX is used for both persisting and loading the Vc types, right? This example is indeed not portable, but its purpose was merely to isolate the segfault I’m having when compiling with AVX flags.

Not exactly. My point is that the following sequence leads to out-of-bounds memory reads:

VcContainer *vcc = new VcContainer();
vcc->LoadVc();

Running with --leak-check=full is not needed to find this kind of error.

*** Break *** segmentation violation
#0 0x0000000038109edc in ?? ()
#1 0x0000000000000008 in ?? ()
#2 0x0000000808b99dd0 in ?? ()

Can you try again with a debug build?

So you mean that if no initialized VcContainer object is found in myfile.root when looking for VcObj, this will lead to out-of-bounds memory reads? Isn’t this guaranteed to work once I call vcc->PersistVc() and write the object to myfile.root?

Hmm, it seems that with debugging info enabled valgrind doesn’t pick up the segfault anymore. Without valgrind it still segfaults, though. I do get a longer trace now when it segfaults (without valgrind):

#0  0x00007f0b88d8f54c in __libc_waitpid (pid=61658, stat_loc=stat_loc@entry=0x7ffc5d2b33c0, options=options@entry=0) at ../sysdeps/unix/sysv/linux/waitpid.c:31
#1  0x00007f0b88d11232 in do_system (line=<optimised out>) at ../sysdeps/posix/system.c:148
#2  0x00007f0b89d10193 in TUnixSystem::StackTrace() () from /home/ahmad/Desktop/buildroot_avx/lib/libCore.so
#3  0x00007f0b89d12bec in TUnixSystem::DispatchSignals(ESignals) () from /home/ahmad/Desktop/buildroot_avx/lib/libCore.so
#4  <signal handler called>
#5  0x0000000000404cf0 in Storage (this=0x370d110) at /opt/Vc/include/Vc/avx/../sse/../common/storage.h:263
#6  Vector (this=0x370d110) at /opt/Vc/include/Vc/avx/../common/generalinterface.h:31
#7  VcContainer (this=0x370d0f0) at vc_counter.h:11
#8  ROOT::new_VcContainer (p=<optimised out>) at avxDict.cc:124
#9  0x00007f0b89cc2799 in TClass::New(TClass::ENewType, bool) const () from /home/ahmad/Desktop/buildroot_avx/lib/libCore.so
#10 0x00007f0b896b20f5 in TKey::ReadObjectAny(TClass const*) () from /home/ahmad/Desktop/buildroot_avx/lib/libRIO.so
#11 0x00007f0b896797c5 in TDirectoryFile::GetObjectChecked(char const*, TClass const*) () from /home/ahmad/Desktop/buildroot_avx/lib/libRIO.so
#12 0x0000000000404911 in GetObject<VcContainer> (ptr=<synthetic pointer>, namecycle=0x405df5 "VcObj", this=0x370bcc0) at /home/ahmad/Desktop/buildroot_avx/include/TDirectoryFile.h:82
#13 main () at vc_counter.cc:18

Also, with a debug build the segfault never shows up when running avxRun under gdb. At /opt/Vc/include/Vc/common/storage.h:263 I find the following:

Vc_INTRINSIC Storage() : data() { assertCorrectAlignment(&data); }

where data is, I think, the doubles that I store in svc_d_. From what I understand, when ROOT reads the VcContainer object back into a newly allocated VcContainer(TRootIOCtor*) object, the private member svc_d_ is not correctly aligned. But then I don’t get why the SSE-compiled program does not have this problem…
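
One plausible explanation for the SSE/AVX difference, which nobody in this thread confirms: plain heap allocation only guarantees the alignment of std::max_align_t (16 bytes on typical x86-64 Linux systems). That happens to satisfy a 16-byte SSE vector but not a 32-byte AVX one, so whether a given object lands on a 32-byte boundary would depend on the state of the heap, which would match the intermittent crashes. The sketch below uses only the standard library; is_aligned and count_aligned_blocks are hypothetical helper names, not part of Vc or ROOT.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// True if address p is a multiple of `alignment` (which must be a power of two).
inline bool is_aligned(const void* p, std::size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Allocate up to `n` blocks of `block_size` bytes with plain malloc and count
// how many of them happen to be aligned to `alignment`.
// All blocks are freed before returning.
inline std::size_t count_aligned_blocks(std::size_t n, std::size_t block_size,
                                        std::size_t alignment) {
  std::vector<void*> blocks;
  std::size_t aligned = 0;
  for (std::size_t i = 0; i < n; ++i) {
    void* p = std::malloc(block_size);
    if (p == nullptr) break;  // out of memory: stop early
    blocks.push_back(p);
    if (is_aligned(p, alignment)) ++aligned;
  }
  for (void* p : blocks) std::free(p);
  return aligned;
}
```

Calling count_aligned_blocks(64, 40, alignof(std::max_align_t)) should report every block as aligned, whereas asking for 32-byte alignment will typically report only some of them, depending on the allocator.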

Yes. However, nothing in your interface guarantees that the code will be used that way. A user of your code might (wrongly) use the calling sequence I mentioned, and the result will be random. To improve the code, you need to add the proper checks at the start of LoadVc (and/or make it a private function).
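
A minimal sketch of such a check at the start of LoadVc, using a plain std::array as a stand-in for Vc::double_v so that it compiles without Vc. kLanes, Lanes and load_lanes are hypothetical names; in the real class the lane count would be Vc::double_v::Size and the buffer would be svc_d_pst_.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Stand-in for Vc::double_v: a fixed number of double lanes.
// Assumption: 4 lanes, as with AVX; with SSE this would be 2.
constexpr std::size_t kLanes = 4;
using Lanes = std::array<double, kLanes>;

// Copy the persisted buffer into the lanes only if its size matches.
// Returns false and leaves `out` untouched otherwise, e.g. when nothing has
// been persisted yet or the data was written with a different vector width.
inline bool load_lanes(const std::vector<double>& persisted, Lanes& out) {
  if (persisted.size() != out.size()) return false;
  for (std::size_t i = 0; i < out.size(); ++i) out[i] = persisted[i];
  return true;
}
```

With this shape, calling the load function on a freshly constructed object reports a failure instead of reading past the end of an empty vector.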

I do get a longer trace now when it segfaults (without valgrind): …

Okay, so it is now clear that it is an alignment problem. It is not clear how it can happen, though.

Things you can try

  • See if the problem still appears after removing the (currently unused): int index_ = 0;
  • See if having VcContainer inherit from Vc::VectorAlignedBase improves the behavior.
  • Try to figure out a way to print the address (of the Vc part and of the VcContainer itself) at the point of failure.
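
The address printing in the last item could be sketched as follows. Probe, Payload and misalignment are hypothetical stand-ins: the real code would log from the VcContainer constructor and take the address of svc_d_, and the 32-byte figure assumes AVX.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Distance in bytes from p back to the previous multiple of `alignment`;
// 0 means p is correctly aligned.
inline std::size_t misalignment(const void* p, std::size_t alignment) {
  return static_cast<std::size_t>(reinterpret_cast<std::uintptr_t>(p) % alignment);
}

// Stand-in for an AVX vector: 4 doubles that require 32-byte alignment.
struct alignas(32) Payload {
  double lanes[4];
};

// Stand-in for VcContainer: logs where it and its member were constructed.
struct Probe {
  int index = 0;
  Payload payload{};

  Probe() {
    std::fprintf(stderr,
                 "Probe at %p (mod 32 = %zu), payload at %p (mod 32 = %zu)\n",
                 static_cast<const void*>(this), misalignment(this, 32),
                 static_cast<const void*>(&payload), misalignment(&payload, 32));
  }
};
```

Printing to stderr before the vector member is touched means the addresses are still visible when the crash follows immediately afterwards.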

Cheers,
Philippe.

Thanks for your suggestions Philippe!

Your second suggestion seems to have stopped the random behavior. This required me to persist* the Vc::VectorAlignedBase class, which is an alias for Vc::AlignedBase<MAX>, where MAX is the largest alignment of all the available Vc::Vector types. This means multiple listings of Vc::AlignedBase<MAX> in the LinkDef.h file (i.e. MAX = 16 for SSE, MAX = 32 for AVX).

*I changed the following in $Vc/src/Vc/common/alignedbase.h:

template <std::size_t Alignment> struct alignas(Alignment) AlignedBase
{
    Vc_FREE_STORE_OPERATORS_ALIGNED(Alignment);
};

into:

template <std::size_t Alignment> struct alignas(Alignment) AlignedBase
{
    Vc_FREE_STORE_OPERATORS_ALIGNED(Alignment);
    virtual ~AlignedBase() {}

private:
    ClassDef(AlignedBase, 1);
};

Is there a way you think that this can be done without modifying Vc files? It seems like something that should be picked up automatically when switching between SSE, AVX, etc…

My bad. I don’t even need to change the Vc::AlignedBase class! I can just inherit from Vc::VectorAlignedBase and list the template instances of Vc::AlignedBase in LinkDef.h.
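
To illustrate why inheriting from an aligned base can help, here is a stand-in that mimics the idea: a base class that carries the alignment requirement and supplies class-level operator new/delete that honour it, so that a plain `new` of the derived type returns suitably aligned memory. This is not Vc's actual code; AlignedTo, Container and is_well_aligned are hypothetical names, the 32-byte alignment assumes AVX, and posix_memalign assumes a POSIX system.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical stand-in for an aligned base class: every single-object heap
// allocation of a derived type goes through these operators.
template <std::size_t Alignment>
struct alignas(Alignment) AlignedTo {
  static void* operator new(std::size_t size) {
    void* p = nullptr;
    if (posix_memalign(&p, Alignment, size) != 0) throw std::bad_alloc();
    return p;
  }
  static void operator delete(void* p) noexcept { std::free(p); }
};

// Stand-in for VcContainer with a 32-byte-aligned member.
struct Container : AlignedTo<32> {
  int index = 0;
  alignas(32) double lanes[4] = {0.0, 0.0, 0.0, 0.0};
};

// True if the object's address is a multiple of its required alignment.
inline bool is_well_aligned(const Container* c) {
  assert(c != nullptr);
  return reinterpret_cast<std::uintptr_t>(c) % alignof(Container) == 0;
}
```

Because the operators live in the base class, any code that allocates the derived type with an ordinary new expression picks them up without further changes. The sketch covers single objects only, not array or placement forms.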
