\subsection{Minimisation} \label{sec:minimisation}

The optimisation of functions is a large topic in itself and many implementations of different algorithms exist. They usually require a function that returns a value, such as the loss, depending on the parameter values passed as arguments. Since the computation of the loss is efficiently implemented in \zfit{} thanks to TF, and since this is usually the heavy part of the minimisation, in practice any minimiser can be wrapped easily. In \zfit{}, several minimiser algorithms are implemented by wrapping existing libraries and giving them a common API. An important part of the design is that the creation of a minimiser object neither executes any minimisation nor ties the object to a specific loss; it is simply the configuration of the minimiser. This implies that the minimiser is stateless and that, to actually invoke it, the \pyth{minimise} method needs to be called. The information about the minimisation procedure and the parameter values is collected in a \pyth{FitResult} and returned by the minimiser. A minimal usage sketch is given at the end of this section.

\subsubsection{Different optimisations}

While a whole variety of algorithms exists, not all of them are equally well suited for model fitting as done in \zfit{}. Several distinctions influence the choice of a minimiser.

\begin{description}
  \item[Derivative] Some optimisers use derivatives and others do not. In general, using the derivative provides a large advantage, since it carries information about the local shape of the function. This requires, however, that the function to be minimised is differentiable, which is not always the case. For model fitting, the functions are differentiable and, thanks to TF, \zfit{} uses analytic expressions for their derivatives.
  \item[Global/Local] Some optimisers are better at finding a global minimum, for example by performing a variation of a large grid search. Others focus on a local minimum by starting from an initial point and following a path towards the nearest minimum. The latter is more accurate and faster \textit{if} a good initial parameter estimate is given, so that the minimiser does not start far from the true minimum. In model fitting, a reasonable estimate can often be made.
  \item[Dimensionality] The number of parameters that have to be tweaked in order to minimise the loss has an impact on the strategy. While the Hessian matrix can help greatly with the minimisation, it has $n_{\text{params}}^2$ entries. This renders its use impossible for the hundreds of thousands of parameters found in deep learning. For model fitting, with at most a few hundred and typically around a dozen parameters, using the Hessian is feasible.
\end{description}

In HEP model fitting, mostly local, second order derivative minimisers are used. An important algorithm is Newton's method of minimisation: leaving the mathematical details aside, the algorithm performs a second order Taylor approximation of the function using the exact Hessian. Assuming that the target is a stationary point with the derivative equal to zero, the resulting equation is solved and the algorithm jumps to the estimated minimum. Since this solution would only be the true minimum if the second order approximation were exact, the step is repeated until some convergence criteria are fulfilled. Newton's method is often not used directly, since the computation of the Hessian and of its inverse can be computationally expensive. Instead, an approximation of the inverse Hessian is calculated and updated on every minimisation step, giving rise to a family of methods called Quasi-Newton methods. Since these updates leave some ambiguity in higher dimensions, different methods use different assumptions for the update. A prominent example is the BFGS algorithm and its low-memory variant L-BFGS, which only stores the most recent steps. A Quasi-Newton method of this kind also underlies Minuit, the most commonly used minimiser in frameworks like \roofit or \root.
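For concreteness, the update underlying this family of methods can be written down explicitly, with notation introduced here only for illustration: let $\theta_k$ denote the current parameter values, $g_k$ the gradient and $H_k$ the Hessian of the loss $L$ at $\theta_k$. Newton's method minimises the second order Taylor approximation
\[
  L(\theta) \approx L(\theta_k) + g_k^{T}(\theta - \theta_k) + \frac{1}{2}(\theta - \theta_k)^{T} H_k (\theta - \theta_k),
\]
whose derivative vanishes at the update point
\[
  \theta_{k+1} = \theta_k - H_k^{-1} g_k.
\]
Quasi-Newton methods such as BFGS replace the exact inverse Hessian $H_k^{-1}$ in this update by an approximation that is refined at every step from the observed changes of the gradient and of the parameters.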
Another important point is the stopping criterion, which decides when the minimisation is considered converged. While Minuit comes with its own stopping criteria and good criteria in general have to be found through experience, this tuning is still ongoing work in \zfit{} for the other minimisers. It is, however, straightforward to define and change the stopping criteria of a minimiser.

A special mention has to be made of the TF optimisers. As discussed above, deep learning relies on first order algorithms based on the so-called ``gradient descent'' technique, which simply follows the negative gradient iteratively. Advanced variations alter the step size based on heuristics, but for low dimensional problems they remain largely inferior to Quasi-Newton methods. A wrapper for any TF optimiser is available in \zfit{}; an example is provided using the implementation of the Adam~\cite{kingma2014adam} optimiser. Since simple benchmarks show an increase of an order of magnitude in the number of minimisation steps compared to Quasi-Newton methods, these optimisers are not competitive.
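To illustrate the plain gradient descent technique itself (not the \zfit{} wrapper), the following sketch minimises a toy two-parameter function with TF by repeatedly stepping along the negative gradient; the toy function, the variable names and the learning rate are chosen purely for illustration.
\begin{verbatim}
import tensorflow as tf

# Toy two-parameter "loss" with its minimum at (2, 3); purely illustrative.
params = tf.Variable([0.0, 0.0])

def loss():
    return (params[0] - 2.0) ** 2 + 10.0 * (params[1] - 3.0) ** 2

learning_rate = 0.02
for _ in range(500):
    with tf.GradientTape() as tape:  # records operations for automatic differentiation
        value = loss()
    grad = tape.gradient(value, params)
    params.assign_sub(learning_rate * grad)  # step along the negative gradient

print(params.numpy())  # close to [2., 3.]

# An adaptive variant such as Adam can replace the manual update, e.g.
#   opt = tf.keras.optimizers.Adam(learning_rate=0.1)
#   opt.minimize(loss, var_list=[params])
\end{verbatim}
Each iteration only uses first order information, which is what lets such methods scale to huge numbers of parameters, but it is also the reason for the much larger number of steps needed in low dimensional fits.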
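Finally, to put the pieces of this section together, the sketch below shows the minimisation workflow described at the beginning: the minimiser object is pure configuration, the minimisation is invoked explicitly on a loss and a \pyth{FitResult} is returned. The model, data and loss construction follow the interfaces introduced in the previous sections; the exact spellings and argument names correspond to the public \zfit{} interface and may differ in detail.
\begin{verbatim}
import numpy as np
import zfit

# Observable, model and data, built as introduced in the previous sections.
obs = zfit.Space("x", limits=(-5, 5))
mu = zfit.Parameter("mu", 0.5, -3, 3)
sigma = zfit.Parameter("sigma", 1.2, 0.1, 5)
gauss = zfit.pdf.Gauss(mu=mu, sigma=sigma, obs=obs)
data = zfit.Data.from_numpy(obs=obs, array=np.random.normal(0.0, 1.0, 10000))

# Unbinned negative log-likelihood loss.
nll = zfit.loss.UnbinnedNLL(model=gauss, data=data)

# Creating the minimiser is configuration only: nothing is executed here
# and the object is not tied to any particular loss.
minimizer = zfit.minimize.Minuit()

# The actual minimisation is invoked explicitly and returns a FitResult
# holding the parameter values and information about the procedure.
result = minimizer.minimize(nll)
print(result.params)
\end{verbatim}
Since all minimisers share this API, the Minuit minimiser above could be exchanged for another one, such as the wrapped Adam optimiser, without changing the rest of the code.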