[RooFit] How can I keep weight errors when splitting a RooDataSet?

Hello RooFit experts,

When I split a combined dataset by category, the weight errors do not seem to be passed to the subsets correctly.

Here is my code:

void repro()
{
    using namespace RooFit;

    std::cout << "ROOT version: " << gROOT->GetVersion() << std::endl;
    // Add data1
    RooRealVar x1{"x1", "x1", 0, 10};
    RooCategory cat1{"cat1", "cat1", {{"sample_1", 0}}};
    RooRealVar weight1{"weight1", "weight1", 1.0};
    RooDataSet data1{"data1", "data1", {x1, cat1, weight1}, WeightVar(weight1), StoreError(weight1)};
    data1.add({x1, cat1}, 2.0, 0.3);

    // Add data2
    RooRealVar x2{"x2", "x2", 0, 10};
    RooCategory cat2{"cat2", "cat2", {{"sample_2", 1}}};
    RooRealVar weight2{"weight2", "weight2", 1.0};
    RooDataSet data2{"data2", "data2", {x2, cat2, weight2}, WeightVar(weight2), StoreError(weight2)};
    data2.add({x2, cat2}, 3.0, 0.4);

    // combine data1 and data2 into combData
    RooRealVar weightComb{"weightComb", "weightComb", 1.0};
    RooCategory catComb{"catComb", "catComb"};
    catComb.defineType("sample_1");
    catComb.defineType("sample_2");
    RooArgSet args;
    args.add(x1);
    args.add(x2);
    args.add(catComb);
    args.add(weightComb);
    args.add(weight1);
    args.add(weight2);

    RooDataSet combData{"combData", "Combined data", args, Index(catComb), Import({{"sample_1", &data1}, {"sample_2", &data2}}), WeightVar(weightComb), StoreError({weight1,weight2,weightComb})};

    // split combData into data1Comb and data2Comb
    std::unique_ptr<TList> dataList{combData.split(catComb, true)};
    auto& data1Comb = static_cast<RooDataSet&>(*dataList->At(0));
    auto& data2Comb = static_cast<RooDataSet&>(*dataList->At(1));

    // Check original data1 and data2 weight errors
    data1.get(0);
    data1Comb.get(0);
    std::cout << "data1 weightError: " << data1.weightError() << ", data1Comb weightError: " << data1Comb.weightError() << std::endl;
    data2.get(0);
    data2Comb.get(0);
    std::cout << "data2 weightError: " << data2.weightError() << ", data2Comb weightError: " << data2Comb.weightError() << std::endl;

    // debug:print all datasets
    std::cout << "\nCombined data:" << std::endl;
    combData.Print("v");
    combData.weightVar()->Print();
    std::cout << "\ndata1:" << std::endl;
    data1.Print("v");
    data1.weightVar()->Print();
    std::cout << "\ndata2:" << std::endl;
    data2.Print("v");
    data2.weightVar()->Print();
    std::cout << "\ndata1Comb:" << std::endl;
    data1Comb.Print("v");
    data1Comb.weightVar()->Print();
    std::cout << "\ndata2Comb:" << std::endl;
    data2Comb.Print("v");
    data2Comb.weightVar()->Print();

}

And the output is:

   ------------------------------------------------------------------
  | Welcome to ROOT 6.36.02                        https://root.cern |
  | (c) 1995-2025, The ROOT Team; conception: R. Brun, F. Rademakers |
  | Built for linuxx8664gcc on Jul 10 2025, 20:02:19                 |
  | From tags/6-36-02@6-36-02                                        |
  | With g++ (GCC) 13.1.0                                            |
  | Try '.help'/'.?', '.demo', '.license', '.credits', '.quit'/'.q'  |
   ------------------------------------------------------------------


Processing repro.cpp...
ROOT version: 6.36.02
data1 weightError: 0.3, data1Comb weightError: 0
data2 weightError: 0.4, data2Comb weightError: 0

Combined data:
DataStore combData (data2)
  Contains 2 entries
  Observables: 
    1)       x1 = 5  L(0 - 10)  "x1"
    2)       x2 = 5  L(0 - 10)  "x2"
    3)  catComb = sample_2(idx = 1)
  "catComb"
    4)  weight1 = 2 +/- 0.3 C  L(-INF - +INF)  "weight1"
    5)  weight2 = 3 +/- 0.4 C  L(-INF - +INF)  "weight2"
  Dataset variable "weightComb" is interpreted as the event weight
RooRealVar::weightComb = 3 C  L(-INF - +INF) 

data1:
DataStore data1 (data1)
  Contains 1 entries
  Observables: 
    1)    x1 = 5  L(0 - 10)  "x1"
    2)  cat1 = sample_1(idx = 0)
  "cat1"
  Dataset variable "weight1" is interpreted as the event weight
RooRealVar::weight1 = 2 +/- 0.3 C  L(-INF - +INF) 

data2:
DataStore data2 (data2)
  Contains 1 entries
  Observables: 
    1)    x2 = 5  L(0 - 10)  "x2"
    2)  cat2 = sample_2(idx = 1)
  "cat2"
  Dataset variable "weight2" is interpreted as the event weight
RooRealVar::weight1 = 2 +/- 0.3 C  L(-INF - +INF) 

data1Comb:
DataStore sample_1 (sample_1)
  Contains 1 entries
  Observables: 
    1)       x1 = 5  L(0 - 10)  "x1"
    2)       x2 = 5  L(0 - 10)  "x2"
    3)  weight1 = 2 +/- 0.3 C  L(-INF - +INF)  "weight1"
    4)  weight2 = 1 C  L(-INF - +INF)  "weight2"
  Dataset variable "weight" is interpreted as the event weight
RooRealVar::weight = 2 C  L(-INF - +INF) 

data2Comb:
DataStore sample_2 (sample_2)
  Contains 1 entries
  Observables: 
    1)       x1 = 5  L(0 - 10)  "x1"
    2)       x2 = 5  L(0 - 10)  "x2"
    3)  weight1 = 2 +/- 0.3 C  L(-INF - +INF)  "weight1"
    4)  weight2 = 3 +/- 0.4 C  L(-INF - +INF)  "weight2"
  Dataset variable "weight" is interpreted as the event weight
RooRealVar::weight = 3 C  L(-INF - +INF) 

From these two lines of the output

data1 weightError: 0.3, data1Comb weightError: 0
data2 weightError: 0.4, data2Comb weightError: 0

the weight errors appear to be lost in the split subsets: both data1Comb and data2Comb report a weight error of 0.

The printout also says, for both data1Comb and data2Comb, that Dataset variable "weight" is interpreted as the event weight. I don't even know where the variable weight comes from: none of my datasets defines it.

Thanks for your attention! I would appreciate it if you could give me some suggestions.

I am not sure whether it is related to this part of RooAbsData::split() (ROOT: roofit/roofitcore/src/RooAbsData.cxx Source File):

std::vector<std::unique_ptr<RooAbsData>>
RooAbsData::split(const RooAbsCategory &splitCat, bool createEmptyDataSets) const
{
   SplittingSetup setup = initSplit(*this, splitCat);
 
   // Something went wrong
   if (!setup.cloneCat)
      throw std::runtime_error("runtime error in RooAbsData::split");
 
   auto createEmptyData = [&](const char *label) -> std::unique_ptr<RooAbsData> {
      return std::unique_ptr<RooAbsData>{
         emptyClone(label, label, &setup.subsetVars, setup.addWeightVar ? "weight" : nullptr)};
   };
 
   return splitImpl(*this, *setup.cloneCat, createEmptyDataSets, createEmptyData);
}

Here the weight variable of each empty subset is given the hard-coded name “weight”.

In fact, I also tried the reproducer from [RF] Weight errors are wrong when splitting RooDataSet and RooDataHist with weight errors · Issue #12453 · root-project/root · GitHub:

void repro()
{
    using namespace RooFit;

    RooRealVar x{"x", "x", 0, 10};
    RooCategory cat{"cat", "cat", {{"sample_0", 0}}};
    RooRealVar weight{"weight", "weight", 1.0};
    RooRealVar wt{"wt", "wt", 1.0};

    RooDataSet data1{"data1", "data1", {x, cat, weight}, WeightVar(weight), StoreError(weight)};
    RooDataSet data2{"data2", "data2", {x, cat, wt}, WeightVar(wt), StoreError(wt)};

    data1.add({x, cat}, 2.0, 0.3);
    data2.add({x, cat}, 3.0, 0.4);

    std::unique_ptr<TList> data1List{data1.split(cat, true)};
    auto& data1p = static_cast<RooDataSet&>(*data1List->At(0));
    std::unique_ptr<TList> data2List{data2.split(cat, true)};
    auto& data2p = static_cast<RooDataSet&>(*data2List->At(0));

    data1.get(0);
    data1p.get(0);
    std::cout << "data1 weightError: " << data1.weightError() << ", data1p weightError: " << data1p.weightError() << std::endl;
    data2.get(0);
    data2p.get(0);
    std::cout << "data2 weightError: " << data2.weightError() << ", data2p weightError: " << data2p.weightError() << std::endl;

}

The output shows that if the weight variable is not named “weight”, the weight errors in the split dataset are not correct:

Processing repro.cpp...
[#0] ERROR:DataHandling -- An event weight error was passed to the RooDataSet 'sample_0', but the weight variable 'weight' does not store errors. Check `StoreError` in the RooDataSet constructor.
data1 weightError: 0.3, data1p weightError: 0.3
data2 weightError: 0.4, data2p weightError: 0

The weight error is preserved for data1 (weight variable named “weight”) but lost for data2 (weight variable named “wt”).

Hi Yanqi,
Thank you for your question.
@jonas could you please take a look?