% Master_thesis/thesis/data.tex at commit 2617c12f6519c000939f9b11709357bf8c693919
% Sascha Liechti, 30 Sep 2019 (Minor updates)
\subsection{Data handling}
\label{sec:data}

Data to be fitted can come from different sources, but it should be handled 
uniformly inside \zfit{}. To this end, the \zdata{} object is responsible for 
loading, ordering and simple preprocessing of the data, which can optionally 
carry weights. Furthermore, the abstraction layer provided by \zdata{} 
potentially enables more advanced use cases, such as a batched, out-of-core 
computation of the likelihood.
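
Why a batched computation is possible can be seen from the weighted unbinned negative log-likelihood, $-\ln L = -\sum_i w_i \ln f(x_i)$. It is a plain sum over events, so it can be accumulated chunk by chunk, and in principle only one chunk of the data needs to be in memory at a time. The following Python sketch illustrates this under stated assumptions. It uses only the standard library, a Gaussian stands in for the model $f$, and the helpers \pyth{iter_batches} and \pyth{weighted_nll} are hypothetical names that are not part of \zfit{}.

```python
import math
from typing import Iterator, List, Sequence, Tuple


def gaussian_pdf(x: float, mu: float, sigma: float) -> float:
    """Normalised Gaussian density, used here only as a stand-in model."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def iter_batches(values: Sequence[float], weights: Sequence[float],
                 batch_size: int) -> Iterator[Tuple[List[float], List[float]]]:
    """Yield (values, weights) chunks; with real data these could be read lazily from disk."""
    if len(values) != len(weights):
        raise ValueError("one weight per event is required")
    for start in range(0, len(values), batch_size):
        yield list(values[start:start + batch_size]), list(weights[start:start + batch_size])


def weighted_nll(batches: Iterator[Tuple[List[float], List[float]]],
                 mu: float, sigma: float) -> float:
    """Accumulate -sum_i w_i * ln f(x_i) one batch at a time."""
    total = 0.0
    for xs, ws in batches:
        total -= sum(w * math.log(gaussian_pdf(x, mu, sigma)) for x, w in zip(xs, ws))
    return total


# Usage: the result does not depend on how the events are split into batches
events = [0.1, -0.4, 1.3, 0.7, -1.1]
event_weights = [1.0, 0.5, 2.0, 1.0, 0.8]
full = weighted_nll(iter_batches(events, event_weights, len(events)), mu=0.0, sigma=1.0)
chunked = weighted_nll(iter_batches(events, event_weights, 2), mu=0.0, sigma=1.0)
print(math.isclose(full, chunked))  # True
```

The sum is only split at batch boundaries. The batched and unbatched results therefore agree up to floating-point rounding.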

The \zdata{} class supports a variety of data files and structures. While 
adding further loading capabilities is not difficult, the focus is on the 
following formats: \root{}, the default file format in HEP\footnote{Even though 
\root{} files are supported, the \root{} library itself is not needed thanks to 
the uproot package, as explained in Appendix~\ref{appendix:data formats}.}, 
NumPy arrays, Pandas DataFrames and pure Tensors. More details can be found in 
Appendix~\ref{appendix:data formats}.
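
The sketch below shows what handling these inputs uniformly can mean. A DataFrame-like mapping of named columns and an array-like sequence of rows are both converted into one internal layout: named columns ordered by the observables. It is plain Python for illustration only. The function \pyth{load_columns} is hypothetical and does not reproduce the \zfit{} loaders. \root{} files and Tensors are left out.

```python
from __future__ import annotations

from collections.abc import Mapping, Sequence
from typing import Dict, List, Union


def load_columns(source: Union[Mapping[str, Sequence[float]], Sequence[Sequence[float]]],
                 obs: Sequence[str]) -> Dict[str, List[float]]:
    """Convert DataFrame-like or array-like input into named columns ordered by `obs`."""
    if isinstance(source, Mapping):
        # DataFrame-like: columns already carry names, pick them in the order of obs
        missing = [o for o in obs if o not in source]
        if missing:
            raise KeyError(f"columns not found: {missing}")
        return {o: [float(v) for v in source[o]] for o in obs}
    # Array-like: rows of values, columns are matched to obs by position
    rows = [tuple(row) for row in source]
    if any(len(row) != len(obs) for row in rows):
        raise ValueError("each row must have one value per observable")
    return {o: [float(row[i]) for row in rows] for i, o in enumerate(obs)}


# Usage: both inputs end up in the same layout
from_rows = load_columns([(1.0, 10.0), (2.0, 20.0)], obs=["x", "y"])
from_frame = load_columns({"y": [10.0, 20.0], "x": [1.0, 2.0]}, obs=["x", "y"])
print(from_rows == from_frame)  # True
```

Once every source has been brought into this layout, later steps such as ordering or weighting no longer need to know where the data came from.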

Each \zdata{} is defined in a \zspace{}, whose observables assign a name to 
each column of the data. The observables therefore act like the column names 
of a spreadsheet or DataFrame. This makes it possible to retrieve a subset of 
the columns of a \zdata{}, or a different ordering of them, by explicitly 
specifying the observables in the method that returns the data as a Tensor.
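
The role of the observables as column names can be sketched with a small, hypothetical stand-in class. The names \pyth{NamedColumns} and \pyth{value} are chosen for illustration and are not claimed to match the \zfit{} API. The data is stored column-wise in plain Python lists instead of a Tensor.

```python
from typing import Dict, List, Optional, Sequence, Tuple


class NamedColumns:
    """Hypothetical minimal stand-in for a data object defined in a space of observables."""

    def __init__(self, obs: Sequence[str], rows: Sequence[Sequence[float]]):
        if len(set(obs)) != len(obs):
            raise ValueError("observables must be unique")
        if any(len(row) != len(obs) for row in rows):
            raise ValueError("each row must have one value per observable")
        self.obs = tuple(obs)
        # Store column-wise: one list of values per observable
        self._columns: Dict[str, List[float]] = {
            o: [float(row[i]) for row in rows] for i, o in enumerate(self.obs)
        }

    def value(self, obs: Optional[Sequence[str]] = None) -> List[Tuple[float, ...]]:
        """Return the data row-wise, restricted to and ordered by `obs`."""
        selected = self.obs if obs is None else tuple(obs)
        unknown = [o for o in selected if o not in self._columns]
        if unknown:
            raise KeyError(f"unknown observables: {unknown}")
        return list(zip(*(self._columns[o] for o in selected)))


# Usage: select a subset of the columns in a different order
data = NamedColumns(obs=["x", "y", "z"], rows=[(1, 2, 3), (4, 5, 6)])
print(data.value(obs=["z", "x"]))  # [(3.0, 1.0), (6.0, 4.0)]
```

Requesting a subset or a new ordering only changes what is returned. The stored columns and \pyth{data.obs} are left untouched.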

Once instantiated, a \zdata{} object behaves like a lightweight wrapper around 
a Tensor and can be used directly as one. Therefore, any operation that 
accepts a pure Tensor can also be applied to a \zdata{} object. While this is 
convenient in contexts where the correct ordering of the columns is 
guaranteed, such as inside a model, the preferred way of using a \zdata{} is to 
access the columns by name with the \pyth{unstack_x} method. The \zdata{} class 
can also handle data that is generated on the fly and does not fit into 
memory; see Appendix~\ref{appendix:data batching}.
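
The difference between positional use and access by name can be illustrated with another hypothetical stand-in. Here the Tensor is replaced by a row-wise list, and the Python sequence protocol plays the role of ``usable as a Tensor''. Only the method name \pyth{unstack_x} is taken from the text. Its behaviour here is a simplified assumption.

```python
from typing import Dict, List, Optional, Sequence, Tuple


class TensorLikeData:
    """Hypothetical wrapper: behaves like its row-wise payload but also knows its observables."""

    def __init__(self, obs: Sequence[str], rows: Sequence[Sequence[float]]):
        self.obs = tuple(obs)
        self._rows: List[Tuple[float, ...]] = [tuple(float(v) for v in row) for row in rows]
        if any(len(row) != len(self.obs) for row in self._rows):
            raise ValueError("each row must have one value per observable")

    # Sequence protocol: code written for the plain payload also accepts the wrapper
    def __len__(self) -> int:
        return len(self._rows)

    def __getitem__(self, index: int) -> Tuple[float, ...]:
        return self._rows[index]

    def unstack_x(self, obs: Optional[Sequence[str]] = None) -> List[List[float]]:
        """Return one column per requested observable, looked up by name."""
        selected = self.obs if obs is None else tuple(obs)
        position: Dict[str, int] = {o: i for i, o in enumerate(self.obs)}
        return [[row[position[o]] for row in self._rows] for o in selected]


# Usage: the columns are stored in the order (y, x)
data = TensorLikeData(obs=["y", "x"], rows=[(10.0, 1.0), (20.0, 2.0)])
print([row[0] for row in data])  # [10.0, 20.0]: positional access silently picks y
x_values, = data.unstack_x(["x"])
print(x_values)  # [1.0, 2.0]: access by name, independent of the column order
```

Reading \pyth{row[0]} depends on the column order of the wrapper. \pyth{unstack_x} looks each column up by its observable and returns the same values whatever that order is.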

In contrast to a \zspace{}, whose reordering returns a new instance, a \zdata{} 
object can be reordered in place. This is used extensively inside models 
together with a context manager: if a \zdata{} is given as an argument, the 
order of its observables is matched to the order of the model's observables 
within the context.
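
The interplay between the immutable \zspace{} and the in-place reordering of \zdata{} can be sketched as follows. All class and method names (\pyth{Space}, \pyth{with_obs}, \pyth{ReorderableData}, \pyth{sorted_by_obs}) are hypothetical. Restoring the original order on exit is an assumed design choice for the context manager, not a documented property of \zfit{}.

```python
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Iterator, List, Sequence, Tuple


@dataclass(frozen=True)
class Space:
    """Immutable stand-in for a space: reordering returns a new instance."""
    obs: Tuple[str, ...]

    def with_obs(self, obs: Sequence[str]) -> "Space":
        if sorted(obs) != sorted(self.obs):
            raise ValueError("obs must be a permutation of the current observables")
        return Space(tuple(obs))


class ReorderableData:
    """Mutable stand-in for a data object: its columns are permuted in place."""

    def __init__(self, obs: Sequence[str], rows: Sequence[Sequence[float]]):
        self.space = Space(tuple(obs))
        self.rows: List[Tuple[float, ...]] = [tuple(float(v) for v in row) for row in rows]

    @property
    def obs(self) -> Tuple[str, ...]:
        return self.space.obs

    def reorder(self, obs: Sequence[str]) -> None:
        """Permute the columns in place so that they follow `obs`."""
        new_space = self.space.with_obs(obs)  # validates obs, leaves self.space unchanged
        perm = [self.obs.index(o) for o in new_space.obs]
        self.rows = [tuple(row[i] for i in perm) for row in self.rows]
        self.space = new_space

    @contextmanager
    def sorted_by_obs(self, obs: Sequence[str]) -> Iterator["ReorderableData"]:
        """Temporarily reorder in place; the original order is restored on exit."""
        original = self.obs
        self.reorder(obs)
        try:
            yield self
        finally:
            self.reorder(original)


# Usage: a model with observables (y, x) receives data ordered as (x, y)
data = ReorderableData(obs=["x", "y"], rows=[(1.0, 10.0), (2.0, 20.0)])
with data.sorted_by_obs(["y", "x"]):
    print(data.rows)  # [(10.0, 1.0), (20.0, 2.0)]
print(data.rows)  # [(1.0, 10.0), (2.0, 20.0)]
```

The original order is restored in a \pyth{finally} clause. This leaves the caller's \zdata{} as it was, even if an exception is raised inside the model.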