\section{Conclusion and outlook}
\label{sec:conclusion}

\zfit{} is a versatile library that fills the gap of model fitting in Python 
for HEP. Built on top of the deep learning framework TensorFlow, it offers 
significant advantages, most notably a remarkable speedup through 
parallelisation. The formalisation into five loosely coupled parts and an 
extensive base class for custom models extend its scope far beyond the usual 
feature set of HEP fitting libraries.
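
As a brief illustration of the latter, the following minimal sketch shows how 
a custom one-dimensional shape is implemented by subclassing: only the 
unnormalised shape has to be provided, while normalisation, integration and 
sampling are handled by the base class.

\begin{verbatim}
import zfit
from zfit import z

class MyGauss(zfit.pdf.ZPDF):
    """Unnormalised Gaussian shape; the base class
    takes care of the normalisation."""
    _N_OBS = 1                 # one-dimensional PDF
    _PARAMS = ['mu', 'sigma']  # parameter names

    def _unnormalized_pdf(self, x):
        data = z.unstack_x(x)  # extract the data column
        mu = self.params['mu']
        sigma = self.params['sigma']
        return z.exp(-0.5 * ((data - mu) / sigma) ** 2)
\end{verbatim}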

The project has been successful so far and is being used in several 
high-impact analyses. In addition, it has been shown how the \zfit{} design 
makes it possible to build extremely complex analyses, namely amplitude 
analyses, in a reasonably simple way, unlike most general fitting libraries. 
However, considerable work remains to be done to establish a stable, reliable 
fitting library. The main features to be improved in the near future are:

\begin{description}
	\item[Binned fits] Fits are sometimes performed on binned data, mostly 
	to speed up the computation. Furthermore, the shape of a model can be too 
	complicated to be described analytically and has to be deduced from 
	simulation or data samples by creating a template PDF. Currently, there is 
	no native support for binned fits or template PDFs; this is under active 
	development and will be added to \zfit{} in the future. A sketch of how a 
	template PDF can already be built on top of the custom model base class is 
	shown after this list.
	
	\item[Optimisation] Model fitting can be a numbers game: to estimate the 
	uncertainties of parameters or to study the sensitivity of a fit with 
	toys, a large number of repeated fits has to be performed. To keep this 
	feasible in terms of time and computing resources, performance matters. 
	There are still various places where the computation can be optimised. 
	This includes the caching of computations and more efficient numerical 
	integration using advanced Monte Carlo techniques or other numerical 
	methods; one such technique, importance sampling, is sketched after this 
	list.
	
	\item[Serialisation] Models are currently built within a script. Often, a 
	model needs to be stored and used again later, possibly in a modified 
	version, which is not well achieved by simply dumping the code. In most 
	cases, no code is needed to define a model; a configuration file with the 
	model description is sufficient. This makes it possible to change certain 
	parts of a model and allows rather inexperienced users to safely build 
	one. Therefore, a complete serialisation of a model into a human-readable 
	format is planned for \zfit{}; a hypothetical example of such a 
	description is given after this list.
	
	\item[Content] In HEP, models are built from a large variety of shapes 
	and combinations thereof in order to describe the observables correctly. 
	This includes angles, masses, the incorporation of smearing effects and 
	more. \zfit{} and its extensions currently do not contain a large number 
	of different models or losses. The essential parts are included, and more 
	are planned for the future; they are expected to be added continuously, 
	also depending on the needs that arise. Furthermore, with \zphys{} a 
	repository created especially for content and simple community 
	contributions is available.
	
	\item[Large scale] Fits in HEP can be large, both in the size of the data 
	and in the complexity of the fitting model. With future experiment 
	upgrades, an increased amount of data is expected, and a fitting library 
	has to cope with that. Complex models and more precise measurements also 
	increase the need for a reliable normalisation, achieved by a larger 
	number of random samples drawn for the integration. While scalability to 
	medium scales is already available with \zfit{}, the software should not 
	be the limit in terms of scaling; the computing infrastructure should be. 
	This requires that the normalisation can be computed on the fly. The 
	extension to huge data samples with out-of-core computations, as well as 
	the use of multiple nodes and GPUs, is also a requirement. TensorFlow 
	supports this well, since it was designed for exactly that, but the 
	explicit implementation inside \zfit{} does not exist yet.
\end{description}
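
As mentioned in the binned fits item, template PDFs are not yet natively 
supported. The following sketch shows how a piecewise-constant template built 
from histogram counts could nevertheless be implemented today on top of the 
custom model base class; the class and its constructor arguments are purely 
illustrative and not part of \zfit{}.

\begin{verbatim}
import tensorflow as tf
import zfit
from zfit import z

class HistTemplatePDF(zfit.pdf.BasePDF):
    """Illustrative piecewise-constant template PDF
    built from histogram counts (not part of zfit)."""

    def __init__(self, counts, edges, obs, name="HistTemplatePDF"):
        super().__init__(obs=obs, params={}, name=name)
        self._counts = z.constant(counts)  # bin contents
        self._edges = z.constant(edges)    # bin edges, len(counts) + 1

    def _unnormalized_pdf(self, x):
        data = z.unstack_x(x)
        # find the bin of each event and return its content
        idx = tf.searchsorted(self._edges, data) - 1
        idx = tf.clip_by_value(idx, 0, tf.shape(self._counts)[0] - 1)
        return tf.gather(self._counts, idx)
\end{verbatim}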
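
The gains from more advanced integration methods mentioned in the 
optimisation item can be illustrated with importance sampling: instead of 
sampling uniformly, samples are drawn from a proposal density that resembles 
the integrand, which reduces the variance of the estimate. The following 
standalone comparison is a generic sketch and not code from \zfit{}.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)

def f(x):
    # narrow peak inside [0, 1], hard for uniform sampling
    return np.exp(-0.5 * ((x - 0.5) / 0.05) ** 2)

n = 100_000

# plain Monte Carlo: uniform samples, volume of [0, 1] is 1
plain = f(rng.uniform(0.0, 1.0, n)).mean()

# importance sampling: proposal concentrated around the peak
x = rng.normal(0.5, 0.1, n)
q = np.exp(-0.5 * ((x - 0.5) / 0.1) ** 2) / (0.1 * np.sqrt(2 * np.pi))
inside = (x >= 0.0) & (x <= 1.0)  # restrict to the region
importance = np.where(inside, f(x) / q, 0.0).mean()

# both estimate the same integral, the second with smaller variance
print(plain, importance)
\end{verbatim}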
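
Finally, for the serialisation item, a human-readable model description could 
look roughly like the following. The schema is purely hypothetical, since the 
actual format has not been defined yet; it merely illustrates that a 
configuration file can capture a composed model without any code.

\begin{verbatim}
# Hypothetical model description; the actual schema is not defined yet.
obs:
  name: mass
  limits: [5000, 5600]
model:
  type: SumPDF
  fracs: [frac_sig]
  pdfs:
    - type: Gauss
      mu:    {value: 5279.0, lower: 5250.0, upper: 5310.0}
      sigma: {value: 25.0, lower: 1.0, upper: 100.0}
    - type: Exponential
      lambda: {value: -0.002}
\end{verbatim}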

Most of these current shortcomings have been foreseen and fit into the idea 
of \zfit{} becoming a stable library: a clean implementation with minimal 
maintenance effort is preferred over sheer quantity of content. Additionally, 
the flexibility and the available base classes allow the user to add these 
features on top of \zfit{} as they are required.

Another future challenge comes with a significant change of the backend. 
TensorFlow 2.0 is currently in its beta stage and expected to be released 
during summer or fall of 2019. It brings a complete restructuring of the 
backend library, including a substantial clean-up. While much will remain the 
same, some work will be needed to adapt \zfit{} to it.

In summary, \zfit{} not only starts to fill an open gap in the HEP Python 
ecosystem but, through its formalisation and flexibility, also extends its 
functionality far beyond what traditional fitting frameworks are able to do. 
While still under heavy development, the current library is already well 
suited for a diversity of analyses, from simple to advanced.