How to do sPlot with weighted events

Hi,

I would like to perform sPlots on the Monte Carlo sample that I fit with RooFit.
The problem is that I have to weight my different categories of events (signal, background) to normalize them to a common luminosity. The TSPlot constructor available in ROOT takes only one tree as input, so it does not seem possible to specify anything concerning the weights.
Furthermore, as TSPlot performs the fit itself, and cannot be fed with a covariance matrix computed from elsewhere, I cannot hope to just compute the sweights and reweight them afterwards by the luminosity weights to get the correct fraction of each category of events.
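To make the idea above concrete, here is a minimal sketch of the standard sPlot weight formula evaluated outside ROOT with NumPy. It is not TSPlot code. It assumes the per-event pdf values and the fitted yields are already available from the fit; the function name `sweights`, the toy Gaussian-plus-flat model and the yields are hypothetical. A covariance matrix from an external fit can be passed in as `cov`; otherwise it is rebuilt from the pdf values.

```python
import numpy as np

def sweights(pdf_values, yields, cov=None):
    """Per-event sWeights for each species (hypothetical helper, not a ROOT API).

    pdf_values: (n_events, n_species) pdf of each species at each event
    yields:     (n_species,) fitted yields N_k
    cov:        optional (n_species, n_species) covariance of the yields
    """
    pdf_values = np.asarray(pdf_values, dtype=float)
    yields = np.asarray(yields, dtype=float)
    # Denominator: sum_k N_k f_k(y_e) for every event e
    denom = pdf_values @ yields
    if cov is None:
        # Inverse covariance: V^-1_nj = sum_e f_n f_j / (sum_k N_k f_k)^2
        inv_cov = (pdf_values / denom[:, None] ** 2).T @ pdf_values
        cov = np.linalg.inv(inv_cov)
    # sW_n(e) = sum_j V_nj f_j(y_e) / sum_k N_k f_k(y_e)
    return (pdf_values @ cov) / denom[:, None]

# Toy example (assumed model): Gaussian signal on a flat background in [0, 1]
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.5, 0.05, 300), rng.uniform(0.0, 1.0, 700)])
mu, sigma = 0.5, 0.05
f_sig = np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
f_bkg = np.ones_like(x)
pdf_values = np.column_stack([f_sig, f_bkg])
yields = np.array([300.0, 700.0])  # assumed fit result, not an actual fit

sw = sweights(pdf_values, yields)
```

The resulting per-event weights could then be multiplied by the luminosity weights, but whether that yields statistically correct distributions is exactly the open question of this post.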
I could artificially build a tree composed of a given fraction of each category of events and make the sPlots on that sample, but it does not seem logical to me to test the fit with sPlots on a sample different from the one used to perform the fit.
Could anybody tell me how to handle this problem?
Thanks in advance for your help,
Cheers,
Emmanuel

Hi Emmanuel,

I just wanted to let you know that we (Muriel Pivk and I) are thinking about the best way to add weighting to the current implementation, and hopefully we’ll find a way soon enough.

Cheers,
Anna

Hi Anna and Muriel,

Thank you for your reply. As I am still stuck with this, I was considering writing to one of you to ask whether a solution was available. Do you have a time estimate (a week, a month, more?) for adding this feature, or do you have no idea?
Thanks a lot again.
Cheers,
Emmanuel

What happened here? Is there anyone in charge of this software? Why do questions get asked and go unanswered for months or even years? My advice:

  1. Go and talk to the CERN management and ask for funding.
  2. Open software development positions that pay somewhere between 100K and 200K a year and are permanent. Competent developers cost money and won’t work for peanuts on temporary contracts.

Otherwise no one seems to be in charge of this and you are telling us:

“We do not know, you are on your own, fix it yourself”

And we are too busy doing the physics research part to go through the ROOT source code and fix things ourselves that should be fixed centrally by the ROOT people.

BTW, this is one of the reasons no one in the real world uses or cares about ROOT.

Cheers.

Hi rooter_03,

More ROOTers? Always! But CERN tells us it doesn’t have an infinite budget, and so we need to prioritize, from beam to buildings to personnel.

So what you’re saying is that you’re unhappy with how we invest in ROOT. Because your activity here tells me that you’re well aware that we invest. (And no, sadly the argument “and that’s why the world doesn’t use ROOT” isn’t working: there’s tons of super crucial software with no support, and vice versa.)

Maybe it’s a bit brutal of you to resurrect a post from 15 years ago that hasn’t seen an answer; maybe it’s unfair of us to do development even though ROOT has bugs. Whatever it is: we are trying our best to keep you, one of our physicists, productive. Apologies if this failed here. Shall we try to address the issue you have? Maybe in a new topic, where you describe what issue you’d like us to address?

Cheers, Axel.

Hi @Axel ,

OK, I am going to go through all of this in detail and carefully.

I do not know how CERN spends its budget in detail. What I know is this. A student makes around 20-30K a year. A postdoc makes around 50K a year. If you have bad software, that student and postdoc will spend half of their time doing work that is not needed. How many students and postdocs work at CERN in a given year? Let’s say 200 postdocs and 400 students: that’s at least 5 million dollars wasted because of bad software. I am pretty sure that improving the software will cost less than that in manpower, and I bet CERN’s calculations are not taking into account the amount of energy and effort that we waste.

We physicists are constantly faced with money issues. In order to get grants, we need to communicate the importance of the problem and the urgency to solve it to the people with the authority to provide funding. I think the key words here are communicating, convincing, urgency.

It is urgent to communicate the urgency of improving the software in order to allow physicists to perform and waste fewer resources.

Of course, I am not the one having to deal with management. They probably are stubborn and no argument, regardless of how well built, might get to them.

Oh yea, I am definitely not happy. About the way you invest in ROOT? No, I did not say that. I think I did not specify why I am unhappy. I am unhappy because ROOT is badly designed, badly documented, and it just requires far too much work to use. Let’s put it this way: I expect ROOT to be 1% of my life. The rest of my work already is 30% of my life. I know that for the developers it is 10%, maybe even 30% of their lives. However, the idea is that you have to deal with building good software, so that we do not have to deal with it. If I am a carpenter and my hammer is well built, I will never think of the hammer, I will think of the chair, because the chair is my job, not the hammer. I should not even notice that the hammer exists: it’s always there, it always works, it is easy to use and I could not live without it.

Also, there are plenty of questions that required attention and did not get it. However, I do accept that there are places where support is far better than others. For example, I have got plenty of support when asking questions about RDataFrame. On the other hand, questions dealing with RooFit and RooStats mostly go unanswered or are answered late. I can ask a question now, but if you are going to reply in 3 weeks, that’s not good for me. By then I will have fixed it myself with some nasty and time-consuming hack.

Regarding the ROOT usage part. Well, if there is software that is crucial and no one maintains, that’s not a valid argument. Saying “ROOT does not have good support, but then what? XXX does not have good support either” is unacceptable; we won’t be mediocre because others are. OK, another piece of advice here:

To PhD students: Steer away from ROOT as much as possible, although it won’t always be possible. You want to be faculty? Of course you do, I do too, most of us want to be professor Dr XXX. But guess what? Only the very best will make it, and sadly you probably aren’t among the very best. In my experience, most people end up quitting and doing Data Science, which is not bad, you will live a decent life. However, ROOT will be useless for you there. I have never seen (and feel free to check on Glassdoor) a job ad for a data scientist position that requires the use of ROOT, TMVA, etc. Eventually, you might need to learn to use Python and libraries like TensorFlow, numpy, scipy, etc. Start now. Try to make this PhD experience also a learning experience that will serve you in your future career. For example, you have to train a BDT? Do not use TMVA, use TensorFlow. Do you have to make plots quickly? Try using matplotlib, not ROOT.

Back to you, ROOT people. This is another important point. For physicists doing data analysis, C++ is a really bad language. How much raw talent will we waste seeing students trying to use C++ to do their analysis instead of Python? You need at least 3 years to be good at C++, and most students have little to no previous knowledge of C++ when they start their PhD. Maybe you @Axel think C++ is easier than it is, because you have worked with it for years. However, it is not easy, it is hard, and these students are not learning to be developers; they want to get plots of distributions and results to publish and put in their theses. I think we, the people with this knowledge, should discourage students from using C++ as much as possible. The algorithms can be written in C++, by an expert in C++. That algorithm should be available to the user in Python. And given what I have already said about the realities of the job market and academia (no faculty jobs + the job market does not care about ROOT) we should encourage students to become more familiar with Python.

I understand your point, you think I cherry-picked an old post. This is what happened:

  1. Work on a bug, a problem I do not understand and I cannot solve.
  2. It’s been hours, maybe 5-7 hours working on the same thing.
  3. I finally found a way to implement the test but I need to be able to do X.
  4. Check the documentation but cannot find a way to do it.
  5. Check the forum and find 3 or 4 posts about it, all of which ended up unanswered. One of them seems promising, but it also went unanswered, years ago.
  6. Get angry and rant about it.
  7. Spend time writing functions that I should not have to write myself, because X should have been implemented years ago.

My problem “X” is exposed here:

It is fair to say that it is a 15-year-old post and that ROOT is not like that anymore. I do believe that ROOT is not like that everywhere, but it is still like that in some places, like RooFit and RooStats, e.g.:

And it took me just one minute to find. These are places where the original developers are not in charge of the code anymore, and people who had no knowledge of the libraries had to join and learn about them. I understand that for those people, learning how RooFit and RooStats work must be challenging. Now my point is relevant. A library as central to physicists as RooFit cannot be developed by someone who will disappear and not show up anymore to provide support. This work has to be done by high quality, well paid, full time, permanent developers, ideally two of them, so that if one leaves, the other will be around to train a replacement and provide support.

I saw your reply days ago; however, I waited to have a clear mind before replying to you. Am I being unfair with my comments? Maybe. I do not claim to know everything, but I do believe that there is some truth in what I have said, and I have never heard anyone say this before. So I hope you do not take my words as an insult, but as the thoughts of a heavy ROOT user that have grown throughout the years.

Cheers.

Thanks a lot, @rooter_03, for your thorough feedback. It raises points that we share, and that we need to continue to communicate (what’s the use of a fantastic detector if in the end you waste physics reach because we don’t invest enough in software?) - and parts that we, as in ROOT, need to address.

We have decided to call out parts that are unmaintained. We have three options:

  • make them maintained. With limited resources, the bulk of the community moving away from features, and code of legacy parts that doesn’t invite contributions this option isn’t exactly obvious.
  • remove unmaintained packages. This would immediately improve ROOT’s quality: unmaintained code is usually code following previous best practices, code that we don’t invest in. But we know we have users of this code, and we find the argument “you are too few to matter” unacceptable, given the resources we need to invest to keep these packages alive.
  • clearly signal that a package is not maintained anymore: use at your own (hopefully legacy) risk, and call out what to use instead. That’s the option we are going for. For first examples, see ROOT: TLorentzVector Class Reference and, well, ROOT: TSPlot Class Reference

Regarding physicists spending too much time on using code of insufficient quality: we diagnosed the same issue a couple of years ago. With ROOT 6 and cling we now have the tools at hand to fix this. We recommend “ROOT as a Python module” just like “ROOT as a C++ library”. The modern interfaces we provide have been developed with good documentation, thorough tests, targeting robustness and least surprise. Take a look at RDataFrame; more like these are coming up, for instance the TTree successor RNTuple, and for instance a new histogramming and graphing library that’s under development.

We are also working on increasing the funding base for ROOT, encouraging collaboration and contributions, and reaching out into other fields. RNTuple is a nice example here: we know where we beat Parquet (considerably, for the HEP case), and those characteristics are not HEP specific. cling, ROOT’s C++ interpreter, is getting a life of its own, with ongoing integration into the llvm ecosystem.

If you are interested in converting your ideas into reality, for instance by contributing: let us know! We have plenty of areas where we can use brains and hands. If you cannot commit (yourself or resources) then we welcome your feedback, especially on the new ROOT: please try things out and criticize us!

Best regards,
Axel

rooter_03,

By pure accident I came across this posting (actually not completely coincidental: I was looking at what Axel was up to).

I read this email and I see the usual stuff, and that is all fine, but you crossed a line when you started advising students as an anonymous person. What is your experience? Where did you work?

It is actually amusing that you put 10 years of your life (although a bit lazy at 30%) into higher education just to end up at a job that requires TensorFlow, matplotlib … You could have acquired that knowledge at a fraction of the cost/time at a local community college.

I hope that a science student devotes time to his/her study/research to learn a framework for approaching new future problems. (Anybody who is going to use that difficult special-relativity coursework at a future industry job, raise your hand…)

ROOT has been an open-source project driven by its users and given continuity by core development members. For more than 25 years it has evolved and still is proving its relevance. There are not so many other large-scale software projects that can make this claim.

In general, the more the question relates to physics, the more time consuming the answer will be. Questions related to storage issues (TTree, RDataFrame) will have a quicker turnaround than those related to statistical data analysis (RooFit). Actually, the core team might not even have the knowledge to answer the more involved ones; here the community should step in.

I disagree with the solution of high-quality, well paid, full time, permanent developers. For me it is actually quite the opposite: academic research depends on constant insertion of fresh new talent. Who wants to make a career out of writing/maintaining a histogramming package for the next 25 years? And it has to be that long, because after that experience you are unemployable.

Case in point, just look at the members of the ROOT team and graph their number of responses as a function of time, taking out anomalies like Rene and Wile_E_Coyote.

Your advice about not learning C++ but Python is odd too. You are telling me that after 10 years of higher education, and for sure some exposure to a programming language, you cannot pick up a new language within a couple of weeks? That really sounds promising to those who will be finished in 10 years, by which time for sure a new language is in fashion.

Not anonymous Eddy