\section{Model fitting}
\label{sec:modelfitting}
\label{sec:theory}

In HEP, observations can be quantified by mathematical models which originate 
from hypotheses or theories and make assumptions about the 
underlying behaviour of nature. Often, these models have free parameters that 
we want to measure. A single model may describe only parts of the 
observations, and combinations and compositions of models may be needed to build 
a model that describes the full data sample. Creating these models in a 
convenient and correct way and 
finding the values of the parameters that 
maximise the agreement with the data is what ``model fitting'' refers 
to.

\subsection{Maximum Likelihood}

At the very heart of model fitting is the need to quantify the agreement, or 
rather the disagreement, of a model with the data. This function of the 
parameters and the data is known as the loss. It is the very definition of the 
problem and mathematically fully defines the solution. In HEP analyses, losses 
are mostly based on the likelihood of the data under the model, where the model 
typically depends on free parameters.
In the following, an 
introduction to the method of maximum likelihood is given. A more detailed 
explanation and derivation can be found in Appendix \ref{appendix:likelihood}.


A likelihood can be defined as follows: given a model parametrised by 
$\theta$ and a dataset $x$, the likelihood describes the probability of 
observing $x$ under $\theta$, viewed as a function of $\theta$
\begin{equation}
\label{eq:likelihood}
\mathcal{L}(\theta) = P(x | \theta).
\end{equation}

The likelihood as shown in Eq. \ref{eq:likelihood} is the quantity to be 
maximised; under the assumption of a uniform prior on $\theta$, this also 
yields the maximal $P(\theta | x)$. To build this likelihood, we need the 
model $f_{\theta} (x)$ to be a 
probability 
density function 
(PDF), i.e. it is normalised to $1$. Especially in HEP, it is often the case 
that 
the PDF is zero 
outside of certain boundaries, for example because points outside a
specified domain are
removed, in which case
\begin{equation}
\label{eq:pdf}
\int_{l}^{u} f_{\theta} (x) \mathrm{d}x = 1,
\end{equation}
where $l$ and $u$ define the lower and upper boundaries of the domain, 
respectively. This 
also extends to higher dimensions. It follows directly that any function 
$g_{\theta} (x)$\footnote{This statement concerns the subset of modelling 
functions 
commonly used in 
physics 
\textit{without} claiming mathematical correctness in full generality. 
This includes
functions $f: \mathbb{R}^n \rightarrow \mathbb{R} $ that are positive, 
$L^1$-integrable 
and (piecewise) $C^1$.} can be normalised and therefore used as a PDF 
$f_{\theta} (x)$
\begin{equation}
\label{eq:pdf from func}
f_{\theta} (x) = \frac{g_{\theta} (x)}{\int_{l}^{u} g_{\theta} (x) 
\mathrm{d}x}.
\end{equation}
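
As an illustration of Eq. \ref{eq:pdf from func}, a minimal numerical sketch 
is given below, assuming a one-dimensional, Gaussian-like shape for 
$g_{\theta}$ and using \texttt{scipy} for the integration; the function and 
variable names are purely illustrative and do not refer to any particular 
fitting library.
\begin{verbatim}
import numpy as np
from scipy.integrate import quad

def g(x, mu, sigma):
    # unnormalised, positive shape; any positive, integrable g works
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2)

l, u = -5.0, 5.0        # lower and upper boundary of the domain
mu, sigma = 0.2, 1.1    # example parameter values theta

norm_const, _ = quad(g, l, u, args=(mu, sigma))

def pdf(x):
    # f_theta(x) = g_theta(x) / integral of g_theta over [l, u]
    return g(x, mu, sigma) / norm_const

print(quad(pdf, l, u)[0])   # integrates to ~1 over the domain
\end{verbatim}
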
A likelihood can be a product of likelihoods of independent events
\begin{equation}
\label{eq:likelihood_from_products}
\mathcal{L} = \prod_{i} \mathcal{L}_i,
\end{equation}
and therefore the likelihood of dataset $x$ can be written as the joint 
probability of each event
\begin{equation}
\label{eq:likelihood joint probability}
\mathcal{L}(x | \theta) = \prod_{i} f_{\theta} (x_i),
\end{equation}
with $x_i$ a single event from the dataset $x$.

The calculation of $\mathcal{L}(x | \theta)$ involves the product of many small 
numbers, which quickly underflows the limited floating-point precision of a 
computer. To solve this 
issue, a logarithm 
transformation can be applied, turning the product into a sum. In addition, 
the log-likelihood is usually negated, 
thus
changing the target from finding a maximum to finding a minimum and ending up 
with a negative log-likelihood (NLL).

A maximum likelihood estimate using the transformation above is therefore given 
by finding the minimum of the NLL
\begin{align}
\label{eq:nll}
\mathrm{NLL} = - \sum_{i} \ln(f_{\theta}(x_i)).
\end{align}

Minimising the NLL therefore maximises the agreement between data and model, 
i.e. the \textit{probability of the model given the data} under the assumption 
of a uniform prior.
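
To make the procedure concrete, a minimal sketch of such an NLL minimisation 
is shown below, assuming a Gaussian PDF fitted to toy data and using 
\texttt{numpy} and \texttt{scipy}; all names and numbers are illustrative and 
do not refer to a specific analysis or fitting library.
\begin{verbatim}
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(loc=0.5, scale=1.2, size=1000)   # toy dataset

def nll(params):
    mu, sigma = params
    if sigma <= 0:
        return np.inf
    # sum of negative log-densities instead of a product of densities
    return -np.sum(norm.logpdf(x, loc=mu, scale=sigma))

result = minimize(nll, x0=[0.0, 1.0], method="Nelder-Mead")
mu_hat, sigma_hat = result.x   # maximum likelihood estimates
\end{verbatim}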

As seen in Eq. \ref{eq:likelihood_from_products}, the combination of 
likelihoods is 
quite versatile and not limited to a single model shape matching a single 
dataset. 
Often, a combination of several of the following likelihood types is built 
(a schematic sketch of such a combined NLL is given after the list)

\begin{description}
	\item[Simultaneous] Multiple models can share parameters. To fit them 
	simultaneously to different datasets, their likelihoods can be combined 
	(multiplied, or equivalently their NLLs summed).
	\item[Extended] While a PDF is normalised, we can add an absolute scale as 
	an additional term to the likelihood to reflect the number of events 
	contained in this model. Given the data, we know the observed number of 
	events and can add a Poisson term to account for them.
	\item[Prior] For some parameters, a prior distribution is known. This 
	describes the knowledge obtained from other measurements and influences the 
	likelihood if the parameter's spread is of the same order of magnitude as 
	the sensitivity of the fit to the parameter. A prior, or constraint, is a 
	probability depending 
	directly on 
	the parameter value and can also be included in the likelihood.
\end{description} 
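
The sketch below illustrates, under the same simplifying assumptions as 
before, how these three types can enter a single NLL: two toy datasets are 
fitted simultaneously with a shared mean, an extended Poisson term is added 
for the yield of the first dataset, and a Gaussian constraint on the shared 
mean acts as a prior. All names and numbers are again purely illustrative.
\begin{verbatim}
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(1)
data_a = rng.normal(0.4, 1.0, size=800)   # channel A
data_b = rng.normal(0.4, 2.0, size=300)   # channel B, shared mean

def combined_nll(params):
    mu, sigma_a, sigma_b, yield_a = params
    if sigma_a <= 0 or sigma_b <= 0 or yield_a <= 0:
        return np.inf
    # simultaneous: NLLs of the two channels are summed
    nll = -np.sum(norm.logpdf(data_a, mu, sigma_a))
    nll += -np.sum(norm.logpdf(data_b, mu, sigma_b))
    # extended: Poisson term for the observed number of events in channel A
    # (the constant ln N! is dropped, it does not depend on the parameters)
    n_a = len(data_a)
    nll += yield_a - n_a * np.log(yield_a)
    # prior/constraint: Gaussian constraint on mu from an external measurement
    nll += -norm.logpdf(mu, loc=0.5, scale=0.1)
    return nll

result = minimize(combined_nll, x0=[0.0, 1.0, 1.0, 700.0],
                  method="Nelder-Mead")
\end{verbatim}
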
Regardless of the complexity of the model, we end up with a single number, the 
loss, that 
can be used to compare the agreement of different models or 
parametrisations with the data. When fitting a model, the loss is minimised by 
adjusting the parameters. While the absolute value of the loss is usually not 
meaningful, 
the ratio of likelihoods from different models can often be useful in further 
statistical tests.
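In terms of the NLL, such a ratio of likelihoods simply becomes a difference,
\begin{equation*}
-\ln \frac{\mathcal{L}_1}{\mathcal{L}_2} = \mathrm{NLL}_1 - \mathrm{NLL}_2,
\end{equation*}
which is the quantity that typically enters likelihood-ratio based tests.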