I have a large DataFrame df. When I retrieve the values of a column like this
auto values = df.Take<double>("column");
and then do a simple operation on them (the sum is only an illustration; I actually need to do something else**, and I know RDF has its own mechanism for sums):
double sum = 0;
for (const auto v : values) sum += v;
Then, I get the following error:
RDataFrame::Run: event loop was interrupted
Error in <TRint::HandleTermInput()>: std::bad_alloc caught: std::bad_alloc
If I reduce the number of entries in df using df.Range(0, 200000000), it runs smoothly. The total number of entries in my data frame is above 1500M.
That is about 12 GB of memory required (8 bytes per double). Do you think it is a memory issue?
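The estimate can be checked with a small helper. This is only a sketch: `ColumnBytes` is a hypothetical name, not part of ROOT, and it assumes the column is stored as one contiguous array of 8-byte doubles. A vector that grows while it is being filled can transiently need more than this during reallocation.

```cpp
#include <cstdint>

// Hypothetical helper (not a ROOT function): bytes needed to hold
// n_entries values of type T in one contiguous buffer.
template <typename T>
constexpr std::uint64_t ColumnBytes(std::uint64_t n_entries) {
    return n_entries * sizeof(T);
}
```

For 1500M entries, `ColumnBytes<double>(1500000000ULL)` gives 12000000000 bytes on platforms where a double is 8 bytes, which matches the figure above.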
Thank you for your response!
In that case, is there a smarter way to do something with the column data?
**What I want to do is build a vector of unique elements, i.e. remove all repeated elements and get the unique ones back as a std::vector.
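One possible approach, sketched below, is to avoid materializing the whole column and instead collect distinct values as they stream by, so memory scales with the number of distinct values rather than the number of entries. `UniqueCollector` is a hypothetical helper written for illustration, not a ROOT class, and it only helps if the column has far fewer distinct values than entries. The commented usage assumes a single-threaded event loop and a column of type double named "column".

```cpp
#include <algorithm>
#include <unordered_set>
#include <vector>

// Hypothetical helper: accumulates distinct values one at a time.
// Note: NaN never compares equal to itself, so every NaN would be
// stored as a separate element.
class UniqueCollector {
public:
    void Add(double v) { seen_.insert(v); }

    // Returns the distinct values seen so far, in ascending order.
    std::vector<double> Sorted() const {
        std::vector<double> out(seen_.begin(), seen_.end());
        std::sort(out.begin(), out.end());
        return out;
    }

private:
    std::unordered_set<double> seen_;
};

// Assumed usage with RDataFrame (single-threaded only, since Add is
// not thread-safe):
//   UniqueCollector uniq;
//   df.Foreach([&uniq](double v) { uniq.Add(v); }, {"column"});
//   std::vector<double> result = uniq.Sorted();
```

If implicit multi-threading is enabled, one collector per processing slot, merged after the loop, would be needed instead.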
Thanks for your post, and sorry for the slight delay in replying. Before going further, it would be great if you could share a fully working reproducer of your problem (including the data) so that we can test it ourselves. You can also share it via email if you feel more comfortable with that.