\section{Results}

\subsection{Best $\chi^2$}

The simplest way to classify which of the preselected paths is the true one would be to just take the path with the smallest $\chi^2$. This way, we would choose the path that agrees best with the track reconstruction algorithm that produces our preselection. However, as already mentioned, in dataset 2 only around $75\%$ of the events even have the true track among the ones preselected by the reconstruction\footnote{E.g. because the true track does not have all 8 hits as a result of the finite detector efficiency (the preselection searches for 8-hit tracks).}. For such events, all the preselected tracks would have to be labelled as false tracks. By simply choosing the best $\chi^2$ we do not account for this at all. So, even if the true track were always the one with the best $\chi^2$, the maximum achievable accuracy would only be around $75\%$.\\

It turns out that the accuracy of this method is only $52.01\%$, so there is a clear need for better algorithms for this classification problem.
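
As a rough illustration, a minimal sketch of how such a baseline could be evaluated is shown below. The array names (\texttt{event\_id}, \texttt{chi2}, \texttt{is\_true\_track}) are hypothetical placeholders for the corresponding columns of the preselection data, not the actual names used in the analysis code.

\begin{verbatim}
import numpy as np

def chi2_baseline_accuracy(event_id, chi2, is_true_track):
    """For every event, pick the preselected track with the smallest chi2
    and count how often that pick is the true track."""
    correct = 0
    events = np.unique(event_id)
    for ev in events:
        mask = (event_id == ev)
        best = np.argmin(chi2[mask])        # candidate with the smallest chi2
        correct += bool(is_true_track[mask][best])
    return correct / len(events)
\end{verbatim}

Events in which the true track is not among the preselected candidates are counted as wrong by construction, which is exactly the limitation discussed above.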

\subsection{RNN classifier with RNN track prediction input}

The RNNs that we put in sequence (first track prediction, then classification) form a much more complex model. Once trained, they were able to label the tracks correctly with an accuracy of around $87.63\%$. Note that this exceeds the $75\%$ limit of any method that always chooses exactly one track per event\footnote{Usually the one that is considered the best by the corresponding algorithm.}.\\

\begin{figure}[H]
\begin{center}
\begin{subfigure}{0.8\textwidth}
\includegraphics[width=1\textwidth]{img/RNN_tf-ft_hist.png}
\caption{Number of false positives and false negatives depending on the cut}
\label{RNN_tp_fp_hist}
\end{subfigure}
\begin{subfigure}{0.8\textwidth}
\includegraphics[width=1\textwidth]{img/RNN_ROC-curve.png}
\caption{ROC curve for the RNN model}
\label{RNN_ROC}
\end{subfigure}
\caption{RNN classifier figures}
\end{center}
\end{figure}

As shown in figure \ref{RNN_tp_fp_hist}, the number of false positives and false negatives changes depending on where we apply the cut. In figure \ref{RNN_tp_fp_hist}, the blue bins are false positives and the orange bins are false negatives. The cut can be chosen depending on what is more important for the experiment\footnote{E.g. all positives have to be correct $\rightarrow$ increase the cut.}. One can also judge the performance qualitatively here: in the optimal case, all the false positives would gather in the region where the cut goes to $0$, and analogously all the false negatives would gather around a cut of $1$. Here we see that this is fulfilled really well, so already from this graph we expect the system to perform well.\\
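
A minimal sketch of how the numbers behind such a histogram can be obtained is given below. The names \texttt{y\_true} (true labels) and \texttt{y\_score} (RNN output probabilities) are placeholders, not the variable names of the actual analysis code.

\begin{verbatim}
import numpy as np

def fp_fn_vs_cut(y_true, y_score, n_cuts=50):
    """Count false positives and false negatives for a range of cut
    values applied to the classifier output."""
    cuts = np.linspace(0.0, 1.0, n_cuts)
    false_pos = np.array([np.sum((y_score >= c) & (y_true == 0))
                          for c in cuts])
    false_neg = np.array([np.sum((y_score < c) & (y_true == 1))
                          for c in cuts])
    return cuts, false_pos, false_neg
\end{verbatim}

Raising the cut trades false positives for false negatives, which is exactly the freedom mentioned in the footnote above.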

Figure \ref{RNN_ROC} shows the ROC curve \cite{ML:ROC_AUC:Bradley:1997:UAU:1746432.1746434} of the RNN classifier. Generally, the larger the area under the ROC curve, the better the classifier. In the perfect case, where everything gets labelled correctly, the area under the curve (ROC AUC) would be $1$, while random guessing would give around $0.5$. Here, we have an area of $0.93$, which is already really close to the optimal case.
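
Such a curve can be produced directly from the classifier scores, for example with \texttt{scikit-learn}. The following sketch again uses the placeholder arrays \texttt{y\_true} and \texttt{y\_score}; it is not the exact plotting code used for figure \ref{RNN_ROC}.

\begin{verbatim}
from sklearn.metrics import roc_curve, roc_auc_score
import matplotlib.pyplot as plt

def plot_roc(y_true, y_score, label="classifier"):
    """Plot the ROC curve of one classifier and return its ROC AUC."""
    fpr, tpr, _ = roc_curve(y_true, y_score)
    auc = roc_auc_score(y_true, y_score)
    plt.plot(fpr, tpr, label="%s (AUC = %.2f)" % (label, auc))
    plt.plot([0, 1], [0, 1], "--", label="random guessing (AUC = 0.5)")
    plt.xlabel("False positive rate")
    plt.ylabel("True positive rate")
    plt.legend()
    return auc
\end{verbatim}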

\subsection{XGBoost}

An XGBoost classifier\footnote{Depth $= 3$ and number of estimators $= 3$.} was also implemented and trained to provide an additional comparison for the performance of our RNN classification. XGBoost models train much faster than neural networks and are often a serious competitor to them, as they frequently reach similar performance. For this reason, they are commonly used as baselines, and an RNN classifier is considered good if it surpasses the XGBoost model. The input of the XGBoost model was the same as for the RNN classification. The accuracy of this classifier in labelling the tracks was $80.74\%$ with a cut applied at $0.5$. Note that it also exceeded the $75\%$ limit, although with a lower accuracy than the RNN.
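
A minimal sketch of such a baseline with the quoted hyperparameters, using the \texttt{xgboost} Python package, is shown below. The function and the data arrays (\texttt{X\_train}, \texttt{y\_train}, \ldots) are illustrative placeholders; the exact training setup may differ.

\begin{verbatim}
import numpy as np
from xgboost import XGBClassifier

def train_xgb_baseline(X_train, y_train, X_test, y_test, cut=0.5):
    """Train the XGBoost baseline (depth 3, 3 estimators) and return its
    accuracy for a given cut on the predicted probability."""
    model = XGBClassifier(max_depth=3, n_estimators=3)
    model.fit(X_train, y_train)
    y_score = model.predict_proba(X_test)[:, 1]  # prob. of 'true track'
    accuracy = float(np.mean((y_score >= cut) == y_test))
    return accuracy, y_score
\end{verbatim}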

\begin{figure}[H]
\begin{center}
\begin{subfigure}{0.8\textwidth}
\includegraphics[width=1\textwidth]{img/XGB_tf-ft_hist.png}
\caption{Number of false positives and false negatives depending on the cut}
\label{XGB_tp_fp_hist}
\end{subfigure}
\begin{subfigure}{0.8\textwidth}
\includegraphics[width=1\textwidth]{img/XGB_ROC-curve.png}
\caption{ROC curve for the XGBoost model}
\label{XGB_ROC}
\end{subfigure}
\caption{XGBoost classifier figures}
\end{center}
\end{figure}

In figure \ref{XGB_tp_fp_hist} the blue bins are false positives and the orange bins are false negatives. Here we see that the bins are spread more evenly and gather less at the edges. So, already qualitatively, we can expect this model to perform worse than our RNNs.\\

Figure \ref{XGB_ROC} shows the ROC curve of the XGBoost classifier. As before, the closer the area under the curve is to $1$, the better the classifier. Here we have an area of $0.88$.\\

\subsection{Comparison in performance of the RNN and XGBoost}

The RNN classifier performs with almost $7\%$ better accuracy than the XGBoost classifier. Also, by comparing the ROC curves in figure \ref{ROC_RNN_XGB}, one can clearly see that the area under the RNN ROC curve is bigger; in numbers, the RNN model has around $0.05$ more area under the curve. The RNN classifier therefore performs significantly better at labelling the 8-hit tracks than the XGBoost model.
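
The comparison figure itself can be produced by overlaying the ROC curves of both classifiers, evaluated on the same validation labels. A self-contained sketch (again with placeholder names, not the actual code behind figure \ref{ROC_RNN_XGB}) could look as follows:

\begin{verbatim}
from sklearn.metrics import roc_curve, roc_auc_score
import matplotlib.pyplot as plt

def compare_roc(y_true, scores_by_model):
    """Overlay the ROC curves of several classifiers, e.g.
    scores_by_model = {"RNN": rnn_scores, "XGBoost": xgb_scores}."""
    for name, scores in scores_by_model.items():
        fpr, tpr, _ = roc_curve(y_true, scores)
        auc = roc_auc_score(y_true, scores)
        plt.plot(fpr, tpr, label="%s (AUC = %.2f)" % (name, auc))
    plt.plot([0, 1], [0, 1], "--", label="random guessing")
    plt.xlabel("False positive rate")
    plt.ylabel("True positive rate")
    plt.legend()
\end{verbatim}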

\begin{figure}[H]
\begin{center}
\includegraphics[width=0.8\textwidth]{img/RNN-XGB_ROC-curve_comparison.png}
\caption{Comparison of the ROC curves of the RNN and XGBoost models}
\label{ROC_RNN_XGB}
\end{center}
\end{figure}
\newpage

\section{Conclusion}

\subsection{Results}

The RNN models perform significantly better at labelling the 8-hit tracks than all the other classifiers and methods considered.\\

\begin{tabular}{c | c c}
Model & Accuracy with cut at $0.5$ $[\%]$ & ROC AUC \\ \hline
Best $\chi^2$ & $52.01$ & / \\
XGBoost & $80.74$ & $0.88$ \\
RNN & $87.63$ & $0.93$
\end{tabular}\\

Using this system of RNNs proves to be a viable solution to this problem and brings a big jump in accuracy, also compared to other machine learning approaches.

\subsection{Outlook and potential}

Where do we want to go from here? One way to improve the algorithm would be, for example, to connect the two networks into one fully connected neural network \cite{gent1992special}. By doing this, both RNNs would be connected and would train as a unit. This would have the positive effect of not having to retrain the classifying RNN whenever the first one gets changed.\\
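
As an illustration only, a minimal Keras sketch of how the two RNNs could be connected into one end-to-end trainable model is given below. The layer sizes and the assumption that the first RNN predicts the remaining four hits of a track (three coordinates each) from its first four hits are hypothetical and do not reflect the exact architecture used in this thesis.

\begin{verbatim}
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical input shapes: 8-hit tracks with 3 coordinates per hit.
first_hits = keras.Input(shape=(4, 3), name="first_four_hits")
last_hits = keras.Input(shape=(4, 3), name="preselected_last_four_hits")

# Track-prediction RNN: extrapolate the last four hits from the first four.
x = layers.GRU(50)(first_hits)
predicted = layers.Reshape((4, 3))(layers.Dense(4 * 3)(x))

# Classification RNN: compare the prediction with the preselected candidate.
pair = layers.Concatenate(axis=-1)([predicted, last_hits])
y = layers.GRU(50)(pair)
label = layers.Dense(1, activation="sigmoid", name="is_true_track")(y)

# Both parts now train as one unit on the classification loss.
model = keras.Model([first_hits, last_hits], label)
model.compile(optimizer="adam", loss="binary_crossentropy")
\end{verbatim}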
Another goal could be to make this type of RNN applicable to more types of problems. For example, instead of being restricted to tracks of a specific length (here eight hits), one could generalize it to deal with tracks of arbitrary length. This would be especially useful for this experiment, as a lot of particles do not just recurl once but many times over (in the central station), thereby creating a lot of background; minimizing this background is crucial to reach our desired sensitivity of $10^{-16}$.\\
The ultimate goal, however, would be to replace the current track reconstruction algorithm altogether and put an RNN in its place. This could, for example, be done by an RNN performing beam search\footnote{Both inside out and outside in.} \cite{graves2013speech} to find the true track of a particle. In other areas, beam search has proven to be a powerful tool, and there is a lot of potential for this sort of algorithm in physics as well, especially in track reconstruction.
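
A generic sketch of how such a beam search over hit candidates could look is given below. The scoring function \texttt{score\_next}, which in this proposal would be provided by an RNN, and all other names are placeholders.

\begin{verbatim}
def beam_search(start_track, candidate_hits, score_next,
                track_length=8, beam_width=5):
    """Generic beam search: extend partial tracks hit by hit, keeping only
    the beam_width best-scoring partial tracks at every step.
    score_next(track, hit) returns the (log-)score of appending hit."""
    beam = [(0.0, list(start_track))]
    while len(beam[0][1]) < track_length:
        extended = []
        for score, track in beam:
            for hit in candidate_hits:
                if hit in track:
                    continue
                extended.append((score + score_next(track, hit),
                                 track + [hit]))
        if not extended:
            break  # no more hits available to extend the tracks
        extended.sort(key=lambda pair: pair[0], reverse=True)
        beam = extended[:beam_width]
    return beam[0][1]  # hits of the highest-scoring track found
\end{verbatim}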