\section{RNNs used}

\subsection{RNN for track prediction}

The first RNN had the task of predicting the positions of the 4 hits of the recurling track. As input, the 4 hits of the outgoing particle are used.

\begin{figure}[h]
\begin{center}
\includegraphics[width=1\textwidth]{img/RNN-Pred-Arch.png}
\caption{RNN prediction architecture}
\label{RNN_pr_arch}
\end{center}
\end{figure}

\newpage

Figure \ref{RNN_pr_arch} shows the architecture used for RNN track prediction. It is a unidirectional RNN with the following layer layout:

\begin{itemize}
\item[1. Layer:] 50 LSTM cells
\item[2. Layer:] 50 LSTM cells
\item[3. Layer:] Dense layer (50 cells)\footnote{Dense layer cells are basically just basic NN cells, as explained in section \ref{ML_Intro}.}
\item[4. Layer:] Dense layer (12 cells)
\end{itemize}

The optimal number of layers, number of cells and cell type was found by systematically comparing RNNs that are identical except for one property (e.g. using GRU cells instead of LSTM cells). All activation functions were chosen to be SELUs.\\
The loss and metric function used was the mean squared error (MSE), as it most closely resembles a Euclidean distance. The model itself was trained with the Adam optimizer.\\
The output is a 12-dimensional vector of the form $(x_5, y_5, z_5, x_6, y_6, z_6, \dots, z_8)$. Note that the numbering starts at 5, as the 5$^\text{th}$ hit of the track is the first one to be predicted. A minimal code sketch of this architecture is given at the end of this section.

\subsection{RNN for classification of tracks}

The second RNN was used as a classifier to identify the true tracks. As already described in section \ref{dataset2}, the input data was of shape $(\text{batch size}, 4, 4)$, with $(\Delta x_i, \Delta y_i, \Delta z_i, \chi^2)$ at step $i$, where:

\begin{itemize}
\item $\Delta x_i = x_{i,\text{preselected}} - x_{i,\text{predicted}}$ is the difference between the track preselected by the original tracking algorithm and the track predicted by the RNN
\item $\Delta y_i$ and $\Delta z_i$ are defined analogously to $\Delta x_i$
\item $\chi^2$ is the value of the $\chi^2$ fit
\end{itemize}

The output was a one-dimensional vector, where $1$ stands for a true track and $0$ stands for a false track. The RNN itself predicts a number between $0$ and $1$, which can be interpreted as its confidence that the track is a true one.

\begin{figure}[H]
\begin{center}
\includegraphics[width=0.75\textwidth]{img/RNN-Classifier-Arch.png}
\caption{RNN classifier architecture}
\label{RNN_cl_arch}
\end{center}
\end{figure}

The classification RNN was chosen to be bidirectional and, as in the previous RNN, LSTM cells were used. Here, a tanh was used for all activation functions except the last one. The last layer uses a sigmoid activation function\footnote{Similar to a tanh but bounded between $[0,1]$.}. As the tanh activation, unlike the SELU, is not self-normalizing, a batch normalization layer was added between every layer of cells.\\
The layer layout was as follows:

\begin{itemize}
\item[1. Layer:] 30 LSTM cells (bidirectional, batch normalization)
\item[2. Layer:] 30 LSTM cells (bidirectional, batch normalization)
\item[3. Layer:] 30 LSTM cells (bidirectional, batch normalization)
\item[4. Layer:] Dense layer (50 cells, batch normalization)
\item[5. Layer:] Dense layer (1 cell, sigmoid activation function)
\end{itemize}

The optimal number of layers, number of cells and cell type was again found by systematically comparing different RNN architectures; sketches of the classifier and of its input construction follow below as well. It is also important to note that the second RNN depends directly on the first one: when changing the first RNN, one would also have to retrain the second.
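
To make the prediction layout concrete, the following is a minimal sketch of the first RNN, assuming a Keras implementation. The input shape $(4, 3)$ (4 hits with 3 coordinates each) follows from the description above; everything not stated there, such as the exact optimizer settings, is an assumption.

\begin{verbatim}
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Minimal sketch of the prediction RNN (layout as in the list above).
model = Sequential([
    # input: 4 hits of the outgoing particle, 3 coordinates (x, y, z) each
    LSTM(50, activation='selu', return_sequences=True, input_shape=(4, 3)),
    LSTM(50, activation='selu'),   # last recurrent layer returns one vector
    Dense(50, activation='selu'),
    # 12 outputs: (x5, y5, z5, ..., z8); selu on the output follows the
    # text, although a linear output would be the more common choice
    Dense(12, activation='selu'),
])

# MSE as loss and metric (closest to a Euclidean distance), Adam optimizer
model.compile(optimizer='adam', loss='mse', metrics=['mse'])
\end{verbatim}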
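
The classifier input of shape $(4, 4)$ can be built as sketched below. The helper function and its signature are illustrative and do not appear in the source; repeating a single $\chi^2$ value at every step is an assumption, as the source leaves open whether the $\chi^2$ varies per step.

\begin{verbatim}
import numpy as np

def classifier_input(preselected, predicted, chi2):
    """Build one classifier input of shape (4, 4):
    (dx_i, dy_i, dz_i, chi2) at each of the 4 predicted steps.

    preselected, predicted: arrays of shape (4, 3) with the (x, y, z)
    positions of hits 5-8; chi2: value of the chi^2 fit.
    """
    delta = preselected - predicted                   # (4, 3): dx, dy, dz
    chi2_col = np.full((4, 1), chi2)                  # chi^2 repeated per step
    return np.concatenate([delta, chi2_col], axis=1)  # (4, 4)
\end{verbatim}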
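
Analogously, a minimal Keras sketch of the classifier architecture. The loss function and optimizer are not stated in the source; binary cross-entropy and Adam are assumptions here.

\begin{verbatim}
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (LSTM, Dense, Bidirectional,
                                     BatchNormalization)

model = Sequential([
    # 3 bidirectional LSTM layers with batch normalization in between
    Bidirectional(LSTM(30, activation='tanh', return_sequences=True),
                  input_shape=(4, 4)),
    BatchNormalization(),
    Bidirectional(LSTM(30, activation='tanh', return_sequences=True)),
    BatchNormalization(),
    Bidirectional(LSTM(30, activation='tanh')),  # returns a single vector
    BatchNormalization(),
    Dense(50, activation='tanh'),
    BatchNormalization(),
    Dense(1, activation='sigmoid'),  # confidence that the track is true
])

# loss and optimizer are assumptions (not given in the text)
model.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=['accuracy'])
\end{verbatim}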