Decouple PyROOT from ROOT

Is there a reason PyROOT can’t be decoupled from ROOT as a PIP installable package?

The non-ROOT-specific part of PyROOT depends only on libCore and libCling. We are working on decoupling that. A future change might decouple PyROOT even further, making it depend only on libCling, but that will not happen in 2018.

It’s not the dependency that I’m talking about. Right now, PyROOT is part of the ROOT package and built with ROOT. So if you switch your Python executable (e.g. because you’re using Pyenv), ROOT has to be rebuilt to install PyROOT into the new environment. If PyROOT could be installed from PIP linking to the existing ROOT installation, this wouldn’t be an issue.

Thanks! We are aware of this lack of cooperation on our build system’s side.

So that’s also something we’re working on. The first step is to build PyROOT on top of an existing ROOT build. This might still arrive this year!

In the meantime, there are a couple of alternatives:

  • uproot: a pure-Python package that can read a wide range of ROOT files (see the sketch right after this list)
  • go-hep/rootio (a pure-Go package that can read a wide range of ROOT files; the next release, due before September, should be able to write them as well) + go-python/gopy (which allows wrapping (almost) any Go package as a Py2 or Py3 module)
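
for a flavour of the uproot route, here’s a minimal sketch; the file, tree and branch names are made up, and the calls follow the uproot API as of 2018:

    import uproot

    # hypothetical file/tree/branch names, just to show the shape of the API
    events = uproot.open("events.root")["Events"]  # open a file, grab a TTree
    pt = events.array("pt")                        # read one branch into a numpy array
    print(pt[:5])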

If those are “alternatives,” you might as well install cppyy directly from PyPI. It comes with manylinux1 wheels for fast installation and still contains ROOT I/O out of the box. You can also point LD_LIBRARY_PATH at an existing ROOT installation to pick that up instead, as long as the ROOT version matches (or you can recompile the tiny cppyy-backend package from PyPI from source for a different ROOT).
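
To give a flavour of what “out of the box” means here, a minimal sketch (the C++ function is made up for illustration):

    import cppyy

    # JIT-compile a tiny C++ function with the bundled Cling and call it
    # from Python; no separate ROOT installation is needed for this part.
    cppyy.cppdef("int add(int a, int b) { return a + b; }")
    print(cppyy.gbl.add(20, 22))  # -> 42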

All you’re missing out on are the PyROOT-specific pythonizations, but those don’t exist in uproot etc. either.

Does this mean cppyy is sufficient for root_numpy / root_pandas to work?

I presume rootpy on the other hand actually requires PyROOT.

@sbinet uproot would eventually be an alternative for me once root_pandas supports it, but I don’t see how getting Go involved is an improvement when the problem was that PyROOT is a little hard to build.

@beojan building Go stuff is pretty easy (and waaay faster than anything C++-based).
gopy pythonizes Go APIs and tries to expose objects that implement the (python) buffer protocol.
pandas (and anything numpy-based) makes heavy use of that protocol. so, it’s a net win.
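
to illustrate that point, a tiny sketch, with array.array standing in (hypothetically) for a gopy-exposed Go slice:

    import array
    import numpy as np

    # any object implementing the buffer protocol can be viewed by numpy
    # without copying the data
    buf = array.array("d", [1.0, 2.0, 3.0])
    view = np.frombuffer(buf)  # zero-copy view onto the same memory
    print(view.sum())          # -> 6.0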

my reasoning behind mentioning go-hep/rootio+gopy, besides just throwing it in as a bonus, is that you get a python module built on top of a Go package, completely independent of ROOT/C++ (and all the compilation trouble, toolchain headaches and general slowness of such a C++ beast.)

another alternative I forgot to mention is Rust/root-io + PyO3 (the Rust equivalent of pybind11).

Docker (which is written in Go) is the only piece of software I have tried, failed, and given up on building. Go’s build tools are OK if you actually want to write Go (though it does seem to have copied some of NPM’s shortcomings), but if you just want to install something, it’s a massive faff compared to cmake ..; make; make install.

Code written in C++ tends to use a few large libraries instead of lots of small ones, and tends to bundle any small, uncommon libraries, which (perhaps surprisingly) greatly reduces the hassle compared to a language with a package system. Then you have all the GOPATH nonsense.

ROOT is definitely one of the harder pieces of C++ software to build (along with Qt) but frankly, it’s easier than Go if you’re not already familiar with Go.

it’s definitely veering a bit off topic but, in my experience, any pure-Go package I’ve tried is just a simple:

$> go get github.com/foo/bar

away. I am not including Go packages that need to use C libraries, because then you get the worst of both the C and Go worlds.
docker falls (kind of) into that category.

I’d be interested in getting more details (perhaps in a PM) about what you meant by "it does seem to have copied some of NPM’s shortcomings".

<shameless-plug>
also: https://sbinet.github.io/posts/2018-07-31-go-hep-manifesto/
</shameless-plug>

in my experience, any pure-Go package I’ve tried is just a simple:

Once you’ve set up a Go environment, sure (otherwise, does it automatically create ~/go? I don’t really like things cluttering up my home directory with non-hidden directories like this). Not if you only want to build one package.

it does seem to have copied some of NPM’s shortcomings

I suppose PyPI also suffers from this really, though in practice it’s less of a problem, at least for scientific packages. I’m talking about it going off and automatically downloading and installing large numbers of dependencies which haven’t been vetted.

Because everything is statically linked, if a bug is found in a library you can’t just update that library; everything that uses it has to be rebuilt. This (along with making plugin architectures possible) is the main reason shared libraries are used in the C and C++ worlds (though with libraries like Qt or ROOT, they’re so big that compiling is going to take a long time no matter what language it’s written in, so recompiling for every application isn’t feasible). It’s not unlike Node, where your code has a local copy of all its dependencies.

yes, starting with Go-1.8 (circa 2017), ~/go is created automatically if it doesn’t exist yet.
but it’s no different from python’s $PYTHONPATH (~/go is the default value for $GOPATH), where all the python code lives.
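
a quick Python-side illustration of the analogy:

    import os
    import sys

    # directories listed in $PYTHONPATH end up near the front of sys.path,
    # which is where Python searches for modules, much like $GOPATH tells
    # the (pre-modules) Go toolchain where packages live
    print(os.environ.get("PYTHONPATH", "<unset>"))
    print(sys.path[:3])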

you’ll probably be happy to know that with Go-1.11 (which should be released before Sept. 2018), $GOPATH will become completely irrelevant, with the advent of the so-called Go modules, which are self-contained, versioned (with semver semantics, checksums of all deps, etc…) packages.
Go modules are built w/o requiring a $GOPATH, and everything will be compiled under $TMP.
All the dependencies (versions+checksums) will be (automatically) described by a go.mod file.

Go modules essentially enable completely reproducible builds, across time and space.

They also make updating a package b/c of a security/bug fix pretty easy. (that was already the case, but now, with the explicit list of dependencies and their versions, one can quickly decide whether a package needs to be updated b/c of a security fix that must be applied to one of its dependencies. transitively.)
rebuilding a Go package is so fast there’s no excuse not to rebuild it: go get. and voila.

I don’t see how this would be different in the remote chance that C++ were to gain a real, cross-platform, easy CPAN-, CRAN-, cargo- or PyPI-like package manager.
dependencies, whether C++ ones, automatically downloaded ones, or otherwise, still need to be vetted if the package/library user is to apply due diligence.
At least with NPM, PyPI and go get, you automatically download+install said dependencies.
Usually, with a CMake/make/make-install build, you have to chase all of these manually, install their dependencies manually (recursively), learn their build system, etc…

I am not saying go get is a perfect tool. (it’s not.)
but it’s light-years away and above the typical experience of somebody trying to install a C++ application that’s not packaged for their favorite distribution.
and it’s fast.
and it’s simple. (the number of switches and options is minimal.)

(also, go get automatically downloads, compiles and installs everything in one go. but you can go get -d to only automatically download everything, and go install to automatically compile and install everything, with no network access. something rather interesting for build reproducibility :P)

I don’t see how this would be different in the remote chance that C++ were to gain a real, cross-platform, easy CPAN-, CRAN-, cargo- or PyPI-like package manager.

That’s why I’m strongly opposed to C++ gaining such a package manager, though sometimes I feel like a lone voice. The lack of one means people think carefully about adding dependencies, vendoring them in the case of small ones (spdlog, range-v3), or using Boost or Qt (which is likely to already be installed) where possible.

The right way to implement automatic package management is with system vendors (distributions) providing vetted repositories, not with an ad-hoc anything goes system like most languages have. These language specific systems generally also don’t play nice with the system package manager.

Pkg-config and CMake Find files take almost all of the pain out of building C++ applications. It would be nice if CMake or pkg-config could query the system package manager and tell you which packages to install to get missing libraries, but in practice I’ve only rarely run into this situation. Generally, if I build something from source, it only needs libraries I’ve already got installed. If a C++ application isn’t packaged for your distribution and uses more than a couple (recursively) of unpackaged, unvendored dependencies, I’d say that’s a good sign that either the application author is too cavalier, or you’ve made a bad choice of distribution.
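
As a concrete illustration of what pkg-config provides (queried from Python here just to stay in one language; zlib is purely an example of an installed library):

    import subprocess

    # Ask pkg-config for the flags needed to compile and link against an
    # installed library; this is essentially what CMake Find modules and
    # hand-written Makefiles do under the hood.
    flags = subprocess.run(
        ["pkg-config", "--cflags", "--libs", "zlib"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    print(flags)  # typically just '-lz' for zlib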

Not sure where this discussion is go-ing, but as mentioned, cppyy comes with wheels, so there’s no compilation happening, and virtualenv/pip will deal cleanly with paths.

To come back to the original point, and this extension of the question (whether cppyy is sufficient for root_numpy / root_pandas to work, and whether rootpy requires PyROOT):

I’m not familiar with root_pandas, but I’ve looked at root_numpy in great detail when reviewing it, and I’d say yes: if you replace the “import ROOT” by “import cppyy” and “ROOT.” by “cppyy.gbl.”, then that’ll work fine. (For testing purposes, you can do this transparently by mucking with the import hook.) For the graph/hist objects you’ll need to provide the libraries from a (python-less) ROOT installation. Further, to the extent that root_numpy would use cppyy.gbl, the ancient cppyy in PyROOT will do as well, so nothing is lost if you do not install cppyy but use vanilla ROOT with PyROOT instead. I.e. such a code change should be acceptable to whoever maintains root_numpy today.
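
A minimal sketch of that transparent replacement, going through the module cache (sys.modules) rather than a full import hook, which is cruder but enough for testing; the vector bit is just a smoke test:

    import sys
    import cppyy

    # Alias the module name before anything does “import ROOT”, so that
    # downstream code such as root_numpy picks up cppyy.gbl instead.
    sys.modules["ROOT"] = cppyy.gbl

    import ROOT  # now resolves to cppyy.gbl

    v = ROOT.std.vector["int"]()  # STL comes with cppyy out of the box
    v.push_back(42)
    print(len(v))  # -> 1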

Note that I’ve seen numpy/TTree compatibility features fly by for PyROOT, too, so maybe root_numpy is by now obsolete anyway if not used for standalone purposes.

Yes, I suspect so, but at the same time, rootpy isn’t going to be much use without all of ROOT (as opposed to I/O per se, which I thought was the objective if uproot were a potential alternative). Also, there hasn’t been any activity on the rootpy GitHub repository (rootpy: a pythonic interface for the ROOT libraries on top of the PyROOT bindings) for close to a year, so even to the extent that it could, I’m not sure who would pick up such a task.

At some level though, it is what it is. I saw a lot of complaints during PyHEP of PyROOT not playing nice with the Python eco-system. The humorous part of those complaints is of course that PyROOT, having seen life as PythonROOT in 2002, predates pretty much the whole of that. This was one of the considerations for me to fork PyROOT as cppyy and fully restructure it when moving it into the Python eco-system. And even then, having had a clean start, it took more than a year for PyPI wheels to appear. Don’t even get me started on (ana)conda.

Anyway, those are the cards you get to play with today. As Axel said, effort is underway to have more and better cards, maybe even before the end of the year. When Enric finishes the PyROOT pythonizations on top of cppyy master, repackaging that as a PyPA package (simply by providing a setup.py within ROOT, a la cppyy-backend, which is on PyPI and builds against $ROOTSYS if provided) would be a mere formality, even if the normal install would still be as part of ROOT proper.
