Can't append array to vector in PyROOT if it is created with ProcessLine()


Maybe I am missing some obvious initialisation, but please consider the following code:

import ROOT
ROOT.gROOT.ProcessLine("vector<double> *v = new vector<double>();")
v1 = ROOT.vector("double")()

If the commented line is uncommented, I get a NotImplementedError on it. However, initialising another vector directly from PyROOT seems to fix this error, and the += operation on the vector created in C++ starts to work. Is this a bug, or am I missing something here?

I just want to store NumPy arrays in vectors inside a C++ class, so I need the += operator to work on those vectors.

ROOT Version: 6.22.06
Platform: Fedora 33
Compiler: Not Provided

Hello @LeWhoo,

I have reproduced the issue. I think that the preferred way of declaring a vector is via ROOT.vector(). Anyway, this seems like a bug.

I can also confirm that declaring the vector as follows results in the same behavior.

ROOT.gROOT.ProcessLine("vector<double> v;")

Unfortunately, we will have to wait until @etejedor is back (in April) to confirm this.


Thanks! I know the preferred Pythonic way; however, since Python classes can’t be nicely serialised to a TTree, the “ProcessLine” version is sometimes a must… I do it another way now, and here comes what is probably another bug, or rather a missing feature.

Both v += np.array and ROOT.vector("double")(np.array) are very slow. I suspect a Python loop may be involved. When I pass the numpy.array as a double * into my class constructor and use vector::assign() inside the constructor, it is much faster. I didn’t measure how much faster, but probably 10 or even 100 times.

Thanks for reporting, @LeWhoo. It would be great if you could attach the code that you are using in the second case. We will look into this as soon as @etejedor is back.


Actually, I found out that the slow-down was caused mainly by another issue: I was filling the vector from an HDF5 dataset, which is very slow. If I put the dataset inside np.array(), it becomes much faster. Even if it is a ROOT issue, I am not sure it is worth investigating.
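The fix described above amounts to materialising the whole dataset as one contiguous float64 NumPy array before handing it to ROOT, rather than passing the HDF5 dataset object itself. A minimal sketch, assuming h5py is used for file access; the helper name `as_double_buffer` and the file and dataset names in the comment are hypothetical:

```python
import numpy as np

def as_double_buffer(data):
    """Return `data` as a C-contiguous float64 NumPy array.

    np.asarray() reads the whole object in one call (for an h5py
    Dataset this is a single bulk read via its __array__ method), and
    ascontiguousarray() guarantees the flat memory layout that a C++
    `double*` expects, copying only if needed.
    """
    return np.ascontiguousarray(np.asarray(data), dtype=np.float64)

# Hypothetical usage with h5py (file and dataset names are placeholders):
#   with h5py.File("data.h5", "r") as f:
#       a = as_double_buffer(f["signal"])
#   v = ROOT.vector("double")(a)

x = as_double_buffer([1, 2, 3])
print(x.dtype, x.flags["C_CONTIGUOUS"])  # float64 True
```

A strided view such as `np.arange(10.0)[::2]` is copied into a fresh contiguous buffer, while an array that is already contiguous float64 is returned without a copy.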

After fixing that, the difference between assign and the other methods is much smaller, but it depends on the benchmarking environment. Here is the code:

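import ROOT
import numpy as np
import time

a = np.random.rand(10000)
print(a.dtype, a.size)

# 1) construct the vector directly from the NumPy array
ts = time.process_time()
v = ROOT.vector("double")(a)
print("() init:", time.process_time() - ts)

# 2) empty vector filled with assign() (exact form of the call assumed)
ts = time.process_time()
v1 = ROOT.vector("double")()
v1.assign(a)
print("assign:", time.process_time() - ts)

# 3) empty vector filled with +=
ts = time.process_time()
v2 = ROOT.vector("double")()
v2 += a
print("+=:", time.process_time() - ts)

print(a[0], v[0], v1[0], v2[0])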

The result on my command line is:
So () init of the vector is ~6 times slower than assign, and += is ~2 times slower.

However, if I run the same code in Jupyter notebooks, I get:

The differences are smaller, and += is slower than (). Not sure why…

Still, perhaps at least () init should default to .assign().

After running your code excerpt, I can confirm that assign() is the fastest (on my machine, it is actually one order of magnitude faster than passing the NumPy array to the constructor).

Maybe @etejedor finds it interesting to investigate these differences when he is back.

You’re looking at noise. First, use perf_counter instead of process_time; second, jack up the size of a by 100x. Then the results are repeatable and make sense (to me anyway).
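A minimal sketch of that advice in plain Python, with nothing from ROOT: each call is timed with perf_counter, the best of several repeats is kept, and the workload runs at two sizes 100x apart. `list(a)` and `np.copy` are hypothetical stand-ins for an element-by-element fill and a bulk copy; the actual vector-filling calls would be swapped in for them.

```python
import time
import numpy as np

def best_time(fn, arg, repeat=5):
    """Smallest wall-clock duration of fn(arg) over `repeat` calls.

    perf_counter measures elapsed time, so CPU time burned by other
    threads in the process (e.g. a GUI thread) is not summed in.
    """
    timings = []
    for _ in range(repeat):
        t0 = time.perf_counter()
        fn(arg)
        timings.append(time.perf_counter() - t0)
    return min(timings)

def compare(sizes=(10_000, 1_000_000)):
    """Time two stand-in fill strategies at each array size."""
    results = {}
    for n in sizes:
        a = np.random.rand(n)
        results[n] = {
            "per-element (list)": best_time(list, a),  # element-by-element iteration
            "bulk (np.copy)": best_time(np.copy, a),   # single contiguous copy
        }
    return results

for n, r in compare().items():
    print(n, {k: f"{v * 1e3:.3f} ms" for k, v in r.items()})
```

Taking the minimum rather than the mean discards runs that were slowed down by unrelated activity, which is most of what shows up as noise at small sizes.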

The results were repeatable here, but you are right. I’ve increased the array size 1000 times, and now the += method is the slowest.

With process_time:

With perf_counter:

I’m new to perf_counter vs process_time, but from what I was able to find, process_time is the one advised for benchmarking, as it excludes sleep and slowdowns of the process caused by other system activity.

Sure, but the code in your example isn’t sleeping. The other difference is that process_time accumulates the CPU time of all threads. Thus, if you just import ROOT, then (unless things have changed that I’m not aware of) you also have the graphics thread doing whatever it does, and all its cycles are counted in the total, too. That will vary quite a bit from run to run and (again, unless things have changed) it has startup work to do, hitting the first loop harder than the others.
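The sleep part of the distinction is easy to check directly: process_time counts only CPU time consumed by the process, whereas perf_counter measures elapsed wall-clock time. A small sketch; the helper name `measure` and the 0.2 s duration are arbitrary choices for illustration:

```python
import time

def measure(fn):
    """Return (wall-clock seconds, process CPU seconds) spent in fn()."""
    w0, c0 = time.perf_counter(), time.process_time()
    fn()
    return time.perf_counter() - w0, time.process_time() - c0

wall, cpu = measure(lambda: time.sleep(0.2))
# Sleeping uses almost no CPU: wall is about 0.2 s, cpu is close to 0.
print(f"perf_counter: {wall:.3f} s, process_time: {cpu:.3f} s")
```

For a busy loop that never sleeps, the two clocks diverge instead through the thread accounting described above, since process_time adds up CPU time from every thread in the process.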

(For that matter, given that cppyy and Cling have lazy initialization running on the first call, such as creating the wrappers and deserializing all necessary IR from the PCH or PCMs, you may want to run a warmup round regardless.)

Anyway, see the script below as an example of what I mean: perf_counter stays constant as the number of threads grows, while process_time accumulates.

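import cppyy
import time

# Declare the C++ helpers to Cling; the original excerpt lost the
# cppdef wrapper, includes, closing braces and the join() call.
cppyy.cppdef("""
#include <cmath>
#include <thread>
#include <vector>

double calc(size_t sz) {
    double res = 0.;
    for (size_t i = 0; i < sz; ++i)
        res += std::atan(i);   // pure CPU work
    return res;
}

void multi(int n, size_t sz) {
    std::vector<std::thread> workers(n);
    for (int i = 0; i < n; i++)
        workers[i] = std::thread(calc, sz);

    for (auto& w: workers)
        w.join();              // wait for all threads to finish
}
""")

N  = 4
SZ = 100000000

# Wall-clock time: roughly constant with the number of threads
for i in range(N):
    ts = time.perf_counter()
    cppyy.gbl.multi(i, SZ)
    ts = time.perf_counter()-ts
    print("perf:", i, ts)

# CPU time summed over all threads: grows with the number of threads
for i in range(N):
    ts = time.process_time()
    cppyy.gbl.multi(i, SZ)
    ts = time.process_time()-ts
    print("proc:", i, ts)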
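A warmup round, as suggested above, can be folded into a timing helper: the first call, which pays for any lazy initialisation, is timed separately and kept out of the steady-state numbers. A minimal sketch; `time_with_warmup` and `lazy_fn` are hypothetical names, and `lazy_fn` merely simulates a one-off setup cost with a short sleep rather than reproducing anything cppyy or Cling does internally.

```python
import time

def time_with_warmup(fn, repeat=5):
    """Time fn() once as a warmup, then `repeat` more times.

    Returns (first_call_seconds, list_of_steady_state_seconds).
    """
    t0 = time.perf_counter()
    fn()
    first = time.perf_counter() - t0

    steady = []
    for _ in range(repeat):
        t0 = time.perf_counter()
        fn()
        steady.append(time.perf_counter() - t0)
    return first, steady

_initialised = False

def lazy_fn():
    """Hypothetical workload with a one-off setup cost on first use."""
    global _initialised
    if not _initialised:
        time.sleep(0.05)  # simulated lazy initialisation
        _initialised = True
    return sum(range(1000))

first, steady = time_with_warmup(lazy_fn)
print(f"first call: {first * 1e3:.2f} ms, best steady: {min(steady) * 1e3:.3f} ms")
```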
