Decouple PyROOT from ROOT

Is there a reason PyROOT can’t be decoupled from ROOT as a PIP installable package?

The non-ROOT-specific part of PyROOT depends only on libCore and libCling. We are working on decoupling that. A future change might decouple PyROOT even further, making it depend only on libCling, but that will not happen in 2018.

It’s not the dependency that I’m talking about. Right now, PyROOT is part of the ROOT package and built with ROOT. So if you switch your Python executable (e.g. because you’re using Pyenv), ROOT has to be rebuilt to install PyROOT into the new environment. If PyROOT could be installed from PIP linking to the existing ROOT installation, this wouldn’t be an issue.

Thanks! We are aware of this lack of cooperation on our build system’s side.

So that’s also something we’re working on. The first step is to build PyROOT on top of an existing ROOT build. This might still arrive this year!

In the meantime, there are a couple of alternatives:

  • uproot: a pure-Python package that can read a wide range of ROOT files (see the sketch right after this list)
  • go-hep/rootio (a pure-Go package that can read a wide range of ROOT files; the next release, due before September, should be able to write them as well) + go-python/gopy (which allows wrapping (almost) any Go package as a Py2 or Py3 module)
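
for a flavour of the uproot route, here’s a minimal sketch; the file, tree and branch names are made up, and the calls follow the uproot API as of 2018:

    import uproot

    # hypothetical file/tree/branch names, just to show the shape of the API
    events = uproot.open("events.root")["Events"]  # open a file, grab a TTree
    pt = events.array("pt")                        # read one branch into a numpy array
    print(pt[:5])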

If those are “alternatives,” you might as well install cppyy directly from PyPI. It comes with manylinux1 wheels for fast installation and still contains ROOT I/O out of the box. You can also point LD_LIBRARY_PATH at an existing ROOT installation to pick that up instead, as long as the ROOT version matches (or you can recompile the tiny cppyy-backend package from PyPI from source for a different ROOT).
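
To give a flavour of what “out of the box” means here, a minimal sketch (the C++ function is made up for illustration):

    import cppyy

    # JIT-compile a tiny C++ function with the bundled Cling and call it
    # from Python; no separate ROOT installation is needed for this part.
    cppyy.cppdef("int add(int a, int b) { return a + b; }")
    print(cppyy.gbl.add(20, 22))  # -> 42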

All you’re missing out on are the PyROOT-specific pythonizations, but those don’t exist in uproot etc. either.

Does this mean cppyy is sufficient for root_numpy / root_pandas to work?

I presume rootpy on the other hand actually requires PyROOT.

@sbinet uproot would eventually be an alternative for me once root_pandas supports it, but I don’t see how getting Go involved is an improvement when the problem was that PyROOT is a little hard to build.

@beojan building Go stuff is pretty easy (and waaay faster than anything C++-based).
gopy pythonizes Go APIs and tries to expose objects that implement the (python) buffer protocol.
pandas (and anything numpy-based) makes heavy use of that protocol. so, it’s a net win.
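
to illustrate that point, a tiny sketch, with array.array standing in (hypothetically) for a gopy-exposed Go slice:

    import array
    import numpy as np

    # any object implementing the buffer protocol can be viewed by numpy
    # without copying the data
    buf = array.array("d", [1.0, 2.0, 3.0])
    view = np.frombuffer(buf)  # zero-copy view onto the same memory
    print(view.sum())          # -> 6.0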

my reasoning behind mentioning go-hep/rootio+gopy, besides just throwing it in as a bonus, is that you get a python module built on top of a Go package, completely independent of ROOT/C++ (and all the compilation trouble, toolchain headaches and general slowness of such a C++ beast.)

another alternative I forgot to mention is Rust/root-io + PyO3 (the Rust equivalent of pybind11).

Docker (which is written in Go) is the only piece of software I have tried, failed, and given up on building. Go’s build tools are OK if you actually want to write Go (though it does seem to have copied some of NPM’s shortcomings), but if you just want to install something, it’s a massive faff compared to cmake ..; make; make install.

Code written in C++ tends to use a few large libraries instead of lots of small ones, and tends to bundle any small, uncommon libraries, which (perhaps surprisingly) greatly reduces the hassle compared to a language with a package system. Then you have all the GOPATH nonsense.

ROOT is definitely one of the harder pieces of C++ software to build (along with Qt) but frankly, it’s easier than Go if you’re not already familiar with Go.

it’s definitely veering a bit off topic but, in my experience, any pure-Go package I’ve tried is just a simple:

$> go get github.com/foo/bar

away. I am not including Go packages that need to use C libraries, because then you get the worst of both the C and Go worlds.
docker falls (kind of) into that category.

I’d be interested in getting more details (perhaps in a PM) about what you meant by "it does seem to have copied some of NPM’s shortcomings".

<shameless-plug>
also: https://sbinet.github.io/posts/2018-07-31-go-hep-manifesto/
</shameless-plug>

in my experience, any pure-Go package I’ve tried is just a simple:

Once you’ve set up a Go environment, sure (otherwise, does it automatically create ~/go? I don’t really like things cluttering up my home directory with non-hidden directories like this). Not if you only want to build one package.

it does seem to have copied some of NPM’s shortcomings

I suppose PyPI also suffers from this really, though in practice it’s less of a problem, at least for scientific packages. I’m talking about it going off and automatically downloading and installing large numbers of dependencies which haven’t been vetted.

Because everything is statically linked, if a bug is found in a library you can’t just update that library; everything that uses it has to be rebuilt. This (along with making plugin architectures possible) is the main reason shared libraries are used in the C and C++ worlds (though with libraries like Qt or ROOT, they’re so big that compiling is going to take a long time no matter what language it’s written in, so recompiling for every application isn’t feasible). It’s not unlike Node, where your code has a local copy of all its dependencies.

yes, starting with Go-1.8 (circa 2017), ~/go is created automatically if it doesn’t exist yet.
but it’s no different from python’s $PYTHONPATH (~/go is the default value for $GOPATH), where all the python code lives.
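
a quick Python-side illustration of the analogy:

    import os
    import sys

    # directories listed in $PYTHONPATH end up near the front of sys.path,
    # which is where Python searches for modules, much like $GOPATH tells
    # the (pre-modules) Go toolchain where packages live
    print(os.environ.get("PYTHONPATH", "<unset>"))
    print(sys.path[:3])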

you’ll probably be happy to know that with Go-1.11 (which should be released before Sept. 2018), $GOPATH will become completely irrelevant, with the advent of the so-called Go modules, which are self-contained, versioned (with semver semantics, checksums of all deps, etc…) packages.
Go modules are built w/o requiring a $GOPATH, and everything will be compiled under $TMP.
All the dependencies (versions+checksums) will be (automatically) described by a go.mod file.

Go modules essentially enable completely reproducible builds, across time and space.

They also make updating a package b/c of a security/bug fix pretty easy. (that was already the case, but now, with the explicit list of dependencies and their versions, one can quickly decide whether a package needs to be updated b/c of a security fix that must be applied to one of its dependencies. transitively.)
rebuilding a Go package is so fast there’s no excuse not to rebuild it: go get. and voila.

I don’t see how this would be different in the remote chance that C++ were to gain a real, cross-platform, easy CPAN-, CRAN-, cargo- or PyPI-like package manager.
dependencies, whether C++ ones, automatically downloaded ones, or otherwise, still need to be vetted if the package/library user is to apply due diligence.
At least with NPM, PyPI and go get, you automatically download+install said dependencies.
Usually, with a CMake/make/make-install build, you have to chase all of these manually, install their dependencies manually (recursively), learn their build system, etc…

I am not saying go get is a perfect tool. (it’s not.)
but it’s light-years away and above the typical experience of somebody trying to install a C++ application that’s not packaged for their favorite distribution.
and it’s fast.
and it’s simple. (the number of switches and options is minimal.)

(also, go get automatically downloads, compiles and installs everything in one go. but you can go get -d to only automatically download everything, and go install to automatically compile and install everything, with no network access. something rather interesting for build reproducibility :P)

I don’t see how this would be different in the remote chance that C++ were to gain a real, cross-platform, easy CPAN-, CRAN-, cargo- or PyPI-like package manager.

That’s why I’m strongly opposed to C++ gaining such a package manager, though sometimes I feel like a lone voice. The lack of one means people think carefully about adding dependencies, vendoring them in the case of small ones (spdlog, range-v3), or using Boost or Qt (which is likely to already be installed) where possible.

The right way to implement automatic package management is with system vendors (distributions) providing vetted repositories, not with an ad-hoc anything goes system like most languages have. These language specific systems generally also don’t play nice with the system package manager.

Pkg-config and CMake Find files take almost all of the pain out of building C++ applications. It would be nice if CMake or pkg-config could query the system package manager and tell you which packages to install to get missing libraries, but in practice I’ve only rarely run into this situation. Generally, if I build something from source, it only needs libraries I’ve already got installed. If a C++ application isn’t packaged for your distribution and uses more than a couple (recursively) of unpackaged, unvendored dependencies, I’d say that’s a good sign that either the application author is too cavalier, or you’ve made a bad choice of distribution.
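
As a concrete illustration of what pkg-config provides (queried from Python here just to stay in one language; zlib is purely an example of an installed library):

    import subprocess

    # Ask pkg-config for the flags needed to compile and link against an
    # installed library; this is essentially what CMake Find modules and
    # hand-written Makefiles do under the hood.
    flags = subprocess.run(
        ["pkg-config", "--cflags", "--libs", "zlib"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    print(flags)  # typically just '-lz' for zlib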

Not sure where this discussion is go-ing, but as mentioned, cppyy comes with wheels, so there’s no compilation happening, and virtualenv/pip will deal cleanly with paths.

To come back to the original point, and this extension of the question (whether cppyy is sufficient for root_numpy / root_pandas to work, and whether rootpy requires PyROOT):

I’m not familiar with root_pandas, but I’ve looked at root_numpy in great detail when reviewing it, and I’d say yes: if you replace the “import ROOT” by “import cppyy” and “ROOT.” by “cppyy.gbl.”, then that’ll work fine. (For testing purposes, you can do this transparently by mucking with the import hook.) For the graph/hist objects you’ll need to provide the libraries from a (python-less) ROOT installation. Further, to the extent that root_numpy would use cppyy.gbl, the ancient cppyy in PyROOT will do as well, so nothing is lost if you do not install cppyy but use vanilla ROOT with PyROOT instead. I.e. such a code change should be acceptable to whoever maintains root_numpy today.
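
A minimal sketch of that transparent replacement, going through the module cache (sys.modules) rather than a full import hook, which is cruder but enough for testing; the vector bit is just a smoke test:

    import sys
    import cppyy

    # Alias the module name before anything does “import ROOT”, so that
    # downstream code such as root_numpy picks up cppyy.gbl instead.
    sys.modules["ROOT"] = cppyy.gbl

    import ROOT  # now resolves to cppyy.gbl

    v = ROOT.std.vector["int"]()  # STL comes with cppyy out of the box
    v.push_back(42)
    print(len(v))  # -> 1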

Note that I’ve seen numpy/TTree compatibility features fly by for PyROOT, too, so maybe root_numpy is by now obsolete anyway if not used for standalone purposes.

Yes, I suspect so, but at the same time, rootpy isn’t going to be much use without all of ROOT (as opposed to I/O per se, which I thought was the objective if uproot were a potential alternative). Also, there hasn’t been any activity on the rootpy GitHub repository (rootpy: a pythonic interface for the ROOT libraries on top of the PyROOT bindings) for close to a year, so even to the extent that it could, I’m not sure who would pick up such a task.

At some level though, it is what it is. I saw a lot of complaints during PyHEP of PyROOT not playing nice with the Python eco-system. The humorous part of those complaints is of course that PyROOT, having seen life as PythonROOT in 2002, predates pretty much the whole of that. This was one of the considerations for me to fork PyROOT as cppyy and fully restructure it when moving it into the Python eco-system. And even then, having had a clean start, it took more than a year for PyPI wheels to appear. Don’t even get me started on (ana)conda.

Anyway, those are the cards you get to play with today. As Axel said, effort is underway to have more and better cards, maybe even before the end of the year. When Enric finishes the PyROOT pythonizations on top of cppyy master, repackaging that as a PyPA package (simply by providing a setup.py within ROOT, a la cppyy-backend, which is on PyPI and builds against $ROOTSYS if provided) would be a mere formality, even if the normal install would still be as part of ROOT proper.
