Regression with constraints

Dear experts,
is it possible to add a constraint when using a regression?
I would like to add an invariant mass constraint when estimating a kinematic quantity.
Best, Marianna.

I guess @moneta can help you.

Hi,
You will have to add the constraint as a penalty function to the overall function you are minimizing in your regression problem. There is no automatic way of doing this in ROOT.
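
As a rough illustration of the penalty idea, here is a minimal sketch in plain Python (not ROOT or TMVA code). It assumes a toy linear model, made-up data points, and a hypothetical linear constraint g(w) = w0 + w1 - 1 = 0 standing in for the real invariant mass condition. The names `DATA`, `constraint`, `fit` and the penalty weight `lam` are illustrative choices, not part of any library.

```python
# Toy data: (x0, x1) -> y. The numbers are made up for illustration only.
DATA = [((1.0, 0.0), 0.9), ((0.0, 1.0), 0.5), ((1.0, 1.0), 1.3), ((2.0, 1.0), 2.1)]


def constraint(w):
    """Hypothetical constraint g(w) = 0, a stand-in for the physics constraint."""
    return w[0] + w[1] - 1.0


def loss(w, lam):
    """Sum of squared residuals plus a quadratic penalty lam * g(w)^2."""
    chi2 = sum((w[0] * x[0] + w[1] * x[1] - y) ** 2 for x, y in DATA)
    return chi2 + lam * constraint(w) ** 2


def gradient(w, lam):
    """Analytic gradient of loss() with respect to (w0, w1)."""
    g0 = g1 = 0.0
    for x, y in DATA:
        residual = w[0] * x[0] + w[1] * x[1] - y
        g0 += 2.0 * residual * x[0]
        g1 += 2.0 * residual * x[1]
    # Penalty term: d/dw of lam * g^2, with dg/dw0 = dg/dw1 = 1.
    penalty = 2.0 * lam * constraint(w)
    return g0 + penalty, g1 + penalty


def fit(lam, lr=0.002, steps=20000):
    """Plain gradient descent; lr is chosen small enough for lam up to about 100."""
    w = [0.0, 0.0]
    for _ in range(steps):
        g0, g1 = gradient(w, lam)
        w[0] -= lr * g0
        w[1] -= lr * g1
    return w


w_free = fit(lam=0.0)    # ordinary least squares, constraint ignored
w_pen = fit(lam=100.0)   # constraint enforced softly through the penalty

print("unconstrained:", w_free, "g(w) =", constraint(w_free))
print("penalized:    ", w_pen, "g(w) =", constraint(w_pen))
```

The penalty only enforces the constraint approximately: a larger `lam` pulls g(w) closer to zero at the cost of a worse fit to the data, and it also makes the minimization stiffer, so the step size may need to shrink.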

Cheers

Lorenzo

Hi Lorenzo,
with the TMVA-based regression, is it possible to change the function being minimized?

Alternatively, I’ve tried to put the constraint as an additional boolean target in the TMVA-based MLP regression. It seems to give reasonable physics results using the produced weights. Could you confirm this is safe?

However, at the end of the run, in “Evaluation results ranked by smallest RMS on training sample”, I get an error that is absent when running without the constraint:
root.exe(1455,0x7fff7595d000) malloc: *** error for object 0x7ff3cb6be3a8: incorrect checksum for freed object - object was probably modified after being freed.
*** set a breakpoint in malloc_error_break to debug.
Because of this error, the TestTree and TrainTree are missing from the TMVAReg.root output file.

Hi,
I don’t think you can modify the loss function to add a constraint term in the TMVA MLP, or in the TMVA DNN either.
Your procedure might work, but I would test it carefully.

Your crash could be due to the fact that the problem is now a multi-target regression. I am not sure how well this is supported. If possible, can you please post the macro and your data so that I can verify it?
Thanks

Lorenzo

Hi Lorenzo,
thanks for your feedback.
The crash that occurs on the Mac did not occur on Linux.
On the Mac, I used a definition for the long target formula, and it worked there as well.
One last question: for the constraint, is it safer to use a boolean target or a float target with min and max values? Is there any difference in the statistical treatment?
Thanks, Marianna.

Hi,
I think a boolean target is more of a classification problem than a regression one, and for that it would be better not to use the least-squares loss function that is used in regression. So I would think a floating-point target is better. However, I don’t know the details of your problem, so I cannot give you a definite answer.
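
To make the difference between the two loss functions concrete, here is a small sketch in plain Python (again, not TMVA code). It compares the squared error used in regression with the binary cross-entropy typically used for classification, for a true target of 1 and a few predicted values. The function names are illustrative, and predictions are assumed to lie strictly between 0 and 1.

```python
import math


def squared_error(target, pred):
    """Least-squares loss for a single prediction."""
    return (target - pred) ** 2


def cross_entropy(target, pred):
    """Binary cross-entropy for a single prediction; requires 0 < pred < 1."""
    return -(target * math.log(pred) + (1.0 - target) * math.log(1.0 - pred))


# True target is 1 (constraint satisfied); predictions get progressively worse.
for pred in (0.9, 0.5, 0.1, 0.01):
    print(f"pred={pred:<5} squared_error={squared_error(1.0, pred):.4f} "
          f"cross_entropy={cross_entropy(1.0, pred):.4f}")
```

For a 0/1 target the squared error can never exceed 1, whereas the cross-entropy grows without bound as the prediction approaches the wrong class. That is one reason a boolean target fits a classification loss better than a least-squares one.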

Lorenzo