Subtracting bad events using RDataFrame

Hi, I would like to remove some unwanted events and recreate a leaf without them, using:

	auto new_events = df.Define("unwanted_events", "Take(myleaf1, myleaf1.size() > myleaf2.size())") // exclude these events
	                    .Define("total_events", "myleaf1")
	                    .Define("new_events", "total_events - unwanted_events"); // the problem is here

	auto h = new_events.Histo1D("new_events");
	h->DrawCopy();

I am getting the error:

Cannot call operator - on vectors of different sizes.

I understand that the vectors total_events and unwanted_events have different sizes, but what other way can I use to subtract them?

Thanks.

EDIT: example of data:
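[Screenshot "Capture": scan of the data, with the event number in the first column and the particle index in the second]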

Notice event 4037. I want to remove the whole event or, preferably, only particle [3].

I’m not sure I understand the use of Take(). To me it looks like you’d just like to filter out some events, like this:

df.Define("isBadEvent", "myleaf1.size() > myleaf2.size()")
  .Filter("!isBadEvent")
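To illustrate the logic of the snippet above without ROOT, here is a minimal sketch in plain C++. The `Event` struct and the function names are hypothetical stand-ins, not part of RDataFrame; the point is only that the predicate yields one bool per event, and events for which it is true are dropped.

```cpp
#include <vector>

// Hypothetical stand-in for one event: two per-particle leaves.
struct Event {
    std::vector<unsigned int> myleaf1;
    std::vector<unsigned int> myleaf2;
};

// Event-level predicate: a single bool per event.
bool isBadEvent(const Event& e) {
    return e.myleaf1.size() > e.myleaf2.size();
}

// Keep only the events for which the predicate is false.
std::vector<Event> dropBadEvents(const std::vector<Event>& events) {
    std::vector<Event> good;
    for (const Event& e : events) {
        if (!isBadEvent(e)) {
            good.push_back(e);
        }
    }
    return good;
}
```

An event whose myleaf1 has four entries while myleaf2 has three would be removed as a whole; all other events pass through unchanged.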

Hi @jblomer

The idea is to define the bad events and then filter them out.

I tried:

auto c1 = df.Define("unwanted_events", "Take(myleaf1, myleaf1.size() > myleaf2.size())") // exclude these events
            .Define("total_events", "myleaf1") // original leaf
            .Filter("!unwanted_events"); // filter bad events out

auto h = c1.Histo1D("new_events"); // draw original leaf (bad events excluded)
h->DrawCopy();

But I am getting an error:

error: no viable conversion from returned value of type 'RVec<unsigned int>' to
      function return type 'bool'

EDIT: Actually, it’s one bad particle located at index [3] of some events. So if I could somehow say "exclude myleaf1[3] and return myleaf1", that would be it. I managed to define those indices using the first line of the code, but so far I am unable to filter them out.

If I understand correctly, you should be able to do something like this:

df.Define("cleaned_myleaf1", "myleaf1[ myleaf1 < 42 ]")

This creates a new vector column “cleaned_myleaf1” that contains only those elements of “myleaf1” for which the condition element < 42 holds. Of course, you’d need to replace the condition myleaf1 < 42 with whatever defines a “good particle”.
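For readers less familiar with this kind of element-wise selection, here is a rough ROOT-free sketch of the same idea using std::vector. The function name `keepBelow` and the `cut` parameter are made up for illustration; the value 42 is just the placeholder condition from above.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Return a new vector holding only the elements that are below `cut`,
// in their original order. This mimics the intent of "myleaf1[myleaf1 < 42]".
std::vector<unsigned int> keepBelow(const std::vector<unsigned int>& leaf,
                                    unsigned int cut) {
    std::vector<unsigned int> cleaned;
    std::copy_if(leaf.begin(), leaf.end(), std::back_inserter(cleaned),
                 [cut](unsigned int x) { return x < cut; });
    return cleaned;
}
```

Note that the selection is made on the values of the elements, not on their position in the vector.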

Thanks for your answer.

Correct, but this acts on the first column, which is the events (see photo above). What I would like to do is select on the index (column 2). So maybe something like:

df.Define("cleaned_myleaf1", "myleaf1[3].empty()")

The idea is that this takes out index 3 everywhere and returns the column. Unfortunately, it does not work.

I hope it is clearer now what I want to do :)
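The request above, dropping the element at one position and returning the rest of the vector, can be pictured with the following plain C++ sketch. It does not use ROOT, and `dropIndex` is a hypothetical helper name. It assumes the leaf is a vector of unsigned int, as the error message earlier in the thread suggests.

```cpp
#include <cstddef>
#include <vector>

// Return a copy of `leaf` without the element at position `idx`.
// If `idx` is past the end, the vector is returned unchanged.
std::vector<unsigned int> dropIndex(const std::vector<unsigned int>& leaf,
                                    std::size_t idx) {
    std::vector<unsigned int> cleaned;
    cleaned.reserve(leaf.size());
    for (std::size_t i = 0; i < leaf.size(); ++i) {
        if (i != idx) {
            cleaned.push_back(leaf[i]);
        }
    }
    return cleaned;
}
```

Events with fewer than four particles are left untouched by `dropIndex(leaf, 3)`, so only the events that actually contain a particle [3] change.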

Oh, now I also understand the use of Take. So this should work:

df.Define("cleaned_myleaf1", "Take(myleaf1, myleaf2.size())");

I.e. cleaned_myleaf1 keeps only the first N elements of myleaf1, where N is the size of myleaf2.
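As a ROOT-free illustration of "keep the first N elements", here is a small sketch with std::vector. The name `takeFirst` is hypothetical. Clamping N to the length of the vector is a choice made in this sketch only; Take itself may behave differently when asked for more elements than the vector holds.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Return the first `n` elements of `leaf`.
// Assumption of this sketch: if `n` exceeds the size, return the whole vector.
std::vector<unsigned int> takeFirst(const std::vector<unsigned int>& leaf,
                                    std::size_t n) {
    const std::size_t count = std::min(n, leaf.size());
    return std::vector<unsigned int>(
        leaf.begin(), leaf.begin() + static_cast<std::ptrdiff_t>(count));
}
```

With myleaf1 holding four entries and myleaf2 holding three, `takeFirst(myleaf1, myleaf2.size())` would return the first three entries, which drops exactly the extra particle at index [3].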

Actually it is my fault. I was defining bad events. Instead, I could define good events and see what to do with them.

Apologies and thanks for your help.