\begin{frame}[c]
$\tau \to \mu\mu\mu$ in LHCb 
			Marcin Chrząszcz



%		\footnotesize\textcolor{gray}{With N. Serra, B. Storaci\\Thanks to the theory support from M. Shaposhnikov, D. Gorbunov}\normalsize\\
	Kaggle Seminar, San Francisco
August 21, 2015
\begin{frame}\frametitle{Lepton Flavour/Number Violation}
 Lepton Flavour Violation (LFV):


\begin{itemize}
\item After the $\Pmuon$ was discovered (1936), it was natural to think of it as an excited $\Pelectron$.
\item Expected: $\mathcal{B}(\mu\to\Pe\gamma) \approx 10^{-4}$,
\item Unless there is another $\Pnu$.
\end{itemize}


``Who ordered that?'' (I.~I.~Rabi)


\begin{itemize}
\item To this day, charged LFV is searched for in various decay modes.
\item LFV has already been observed in the neutrino sector (oscillations).
\end{itemize}


 Lepton Number Violation (LNV) %(see J. Harrison \href{}{\color{blue}talk})

\begin{itemize}
\item Even with LFV, lepton number can be a conserved quantity.
\item Many New Physics (NP) models predict LNV (e.g.\ Majorana neutrinos).
\item LNV is searched for in so-called neutrinoless double $\beta$ decays.
\end{itemize}




  % \textref{M.Chrz\k{a}szcz 2014}

        \frametitle{Status of searches for $\color{white} \tau \to \mu \mu \mu$}


            \begin{itemize}
                \item Charged Lepton Flavour Violation process.
                \item The Standard Model contribution proceeds through a penguin diagram with neutrino oscillation.
              %  \item SM prediction is beyond experimental reach~$O(10^{-40})$.
            \end{itemize}
            \begin{alertblock}{Current limits ($ \color{white} 90\,\%$ CL)}
            \begin{itemize}
                \item[BaBar] $3.3\times 10^{-8}$
                \item[Belle] $2.1\times 10^{-8}$
            \end{itemize}
            \end{alertblock}
            Predictions:
            \begin{itemize}
                \item[SM] $ O(10^{-40})$
                \item[var.\ SUSY] $10^{-10}$
                \item[non-universal $\PZprime$] $10^{-8}$
                \item[mSUGRA+seesaw] $10^{-9}$
                \item[and many more\ldots]
            \end{itemize}

        \frametitle{$\tau$ production}
          \begin{itemize}
          \item $\Ptau$ leptons at LHCb come from five main sources:
          \end{itemize}
\begin{tabular}{| c | c | c | }
  \hline
  Production mode & $7~\TeV$ & $8~\TeV$ \\ \hline
  Prompt $\PDs\to\Ptau$  & $71.1\pm3.0\,\%$ & $72.4\pm2.7\,\%$ \\
  Prompt $\PDplus\to\Ptau$  & $4.1\pm0.8\,\%$  & $4.2\pm0.7\,\%$ \\
  Non-prompt $\PDs\to\Ptau$ & $9.0\pm2.0\,\%$ & $8.5\pm1.7\,\%$ \\
  Non-prompt $\PDplus\to\Ptau$ &  $0.18\pm0.04\,\%$  & $0.17\pm0.04\,\%$ \\
  $X_{\Pbottom}\to\Ptau$   & $15.5\pm2.7\,\%$  & $14.7\pm2.3\,\%$ \\ \hline
\end{tabular}

          \begin{itemize}
          \item \textsc{Pythia} produces them in the wrong proportions.
          \item Each channel was therefore generated separately, and the samples were combined in the proportions given in the table.
          \end{itemize}
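Combining separately generated channels in fixed proportions can be sketched with per-event weights. This is a minimal illustration, assuming weighting (rather than resampling) is used; the sample sizes are hypothetical, and only the $8~\TeV$ fractions come from the table.

```python
# 8 TeV production fractions (in %) copied from the table above
FRACTIONS_8TEV = {
    "prompt Ds": 72.4,
    "prompt D+": 4.2,
    "non-prompt Ds": 8.5,
    "non-prompt D+": 0.17,
    "X_b": 14.7,
}


def mixing_weights(fractions, n_generated):
    """Per-event weight for each channel so the combined sample follows `fractions`.

    The table entries sum to 99.97 %, so the fractions are renormalised to 1.
    Weights are scaled so that the weighted total equals the generated total.
    """
    total_frac = sum(fractions.values())
    n_total = sum(n_generated.values())
    return {ch: (fractions[ch] / total_frac) * n_total / n_generated[ch]
            for ch in fractions}


# Hypothetical sample sizes: each channel generated separately with 100k events
n_gen = {ch: 100_000 for ch in FRACTIONS_8TEV}
weights = mixing_weights(FRACTIONS_8TEV, n_gen)
for ch, w in weights.items():
    print(f"{ch:>14s}: weight {w:.4f}")
```

Weighting keeps every generated event, while resampling each channel down to its target share would discard statistics in the dominant prompt $\PDs$ channel.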


            \begin{itemize}
            \item There is no measurement of $\mathcal{B}(\PDplus\to\Ptau\Pnut)$.
            \item One can calculate it from $\mathcal{B}(\PDplus\to\Pmu\Pnum)$, correcting for helicity suppression and phase space, \texttt{hep-ex:0604043}.
            \item $\mathcal{B}(\PDplus\to\Ptau\Pnut)=(1.0\pm0.1) \times10^{-3}$.
            \end{itemize}
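The helicity-suppression and phase-space scaling can be checked numerically with the tree-level width $\Gamma(P\to\ell\nu)\propto m_\ell^2\,(1-m_\ell^2/m_P^2)^2$. This sketch assumes approximate PDG masses and an assumed input $\mathcal{B}(\PDplus\to\Pmu\Pnum)\approx3.8\times10^{-4}$; it is not the exact procedure of the cited reference.

```python
# Approximate PDG masses in MeV
M_TAU = 1776.86
M_MU = 105.658
M_DPLUS = 1869.66


def leptonic_ratio(m_lep, m_ref, m_meson):
    """Gamma(P -> l nu) / Gamma(P -> l' nu) at tree level.

    Helicity suppression gives a factor m_l^2, two-body phase space (1 - m_l^2/m_P^2)^2;
    decay constant, CKM element and G_F cancel in the ratio.
    """
    def width_factor(m):
        return m ** 2 * (1.0 - (m / m_meson) ** 2) ** 2
    return width_factor(m_lep) / width_factor(m_ref)


# Assumed input: B(D+ -> mu nu) of roughly 3.8e-4
B_MUNU = 3.8e-4
ratio = leptonic_ratio(M_TAU, M_MU, M_DPLUS)
b_taunu = B_MUNU * ratio
print(f"ratio = {ratio:.2f}, B(D+ -> tau nu) ~ {b_taunu:.2e}")
```

The $\Ptau$ mass enhancement ($\approx 283$) is almost cancelled by the tiny phase space left in $\PDplus\to\Ptau\Pnut$, giving a ratio of about $2.7$ and a value consistent with $(1.0\pm0.1)\times10^{-3}$.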

        \frametitle{Signal and background discrimination}
       \begin{itemize}
       \item Two multivariate classifiers, $\mathcal{M}_\text{3body}$ and $\mathcal{M}_\text{PID}$.
  		\item $\mathcal{M}_\text{3body}$ is trained using vertex and track fit quality, vertex displacement, vertex pointing, vertex isolation and the $\Ptau$ $p_T$.
  		\item Uses the blending technique (see the next slide).
   %     \includegraphics[width=.95\textwidth]{m3body_2012.pdf}
\item Trained on signal and background MC.
\item Calibrated on the $\PDs \to \Pphi(\mu\mu) \Ppi$ sample.
       \end{itemize}

  \frametitle{Blending technique}

\begin{itemize}
\item Each $\Ptau$ production channel has a different signature in the kinematic distributions.
\item The signal blending technique improves the discriminating power by $6~\%$.
\end{itemize}


          \begin{itemize}
          \item Assume all differences between $\Ptau\to\Pmu\Pmu\Pmu$ and $\PDs\to\Pphi\Ppi$ come from kinematics (mass, resonance, decay time), which are correctly modelled in MC.
          \item Derive the $\PDs \Longrightarrow \Ptau$ correction from MC.
          \item Apply the correction to $\PDs\to\Pphi\Ppi$ in data.
          \item Publication in preparation.
          \end{itemize}
          \begin{itemize}
              \item The $\PDs\to\Pphi\Ppi$ decay is well modelled in MC.
                        %       \item[$\rightarrow$] i.e.\ also badly pointing non-prompt $\PDs$
          \end{itemize}
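One simple way to picture the correction step is a per-bin transfer: the $\PDs\to\Pphi\Ppi$ classifier-response histogram in data is multiplied by the MC ratio $\Ptau/\PDs$ in each bin. This is a hypothetical sketch that assumes shared binning and a purely bin-by-bin correction; the function names and toy histograms are illustrative, not the analysis code.

```python
def _normalise(h):
    """Scale a histogram so that its bins sum to one."""
    total = float(sum(h))
    return [x / total for x in h]


def calibrate_signal_shape(data_ds, mc_ds, mc_tau):
    """Hypothetical per-bin transfer: data_Ds * (MC_tau / MC_Ds), normalised to unit sum.

    All inputs are histograms of the classifier response with identical binning.
    """
    if not (len(data_ds) == len(mc_ds) == len(mc_tau)):
        raise ValueError("histograms must share the same binning")
    # Compare MC shapes, not yields
    mc_ds_n = _normalise(mc_ds)
    mc_tau_n = _normalise(mc_tau)
    corrected = []
    for d, m_ds, m_tau in zip(data_ds, mc_ds_n, mc_tau_n):
        # An empty MC D_s bin gives no correction, so the bin is dropped
        corrected.append(d * m_tau / m_ds if m_ds > 0 else 0.0)
    return _normalise(corrected)


# Toy histograms: data and MC D_s shapes agree, so the result is the MC tau shape
shape = calibrate_signal_shape(data_ds=[5, 10, 15, 20],
                               mc_ds=[10, 20, 30, 40],
                               mc_tau=[4, 3, 2, 1])
print(shape)
```

When data and MC agree for $\PDs\to\Pphi\Ppi$, the output reduces to the MC $\Ptau$ shape; any data/MC difference in the control channel propagates bin by bin into the calibrated signal shape.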


       \frametitle{Relative normalisation}
       $\boxed{\mathcal{B}(\Ptau\to\Pmu\Pmu\Pmu) = \frac{\mathcal{B}(\PDs\to\Pphi\Ppi)}{\mathcal{B}(\PDs\to\Ptau\Pnut)} \times f_{\PDs}^{\Ptau} \times \frac{\varepsilon_\text{norm}    }{\varepsilon_\text{sig}     }  \times \frac{N_\text{sig}}{N_\text{norm}} = \alpha\times N_\text{sig}}$
           \begin{itemize}
           \item where $\varepsilon$ stands for the combined trigger, reconstruction and selection efficiency.
          \item $f_{\PDs}^{\Ptau}$ is the fraction of $\Ptau$ leptons coming from $\PDs$ decays, i.e.\ $(83\pm3)\,\%$ for 2012 data.
           \item $\text{norm}$ = normalisation channel $\PDs\to\Pphi\Ppi$.
           \end{itemize}
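The boxed formula can be evaluated directly to obtain the single-event sensitivity $\alpha$. In this sketch every input except $f_{\PDs}^{\Ptau}=0.83$ is a hypothetical placeholder chosen only to make the code run; none of them are analysis values.

```python
def normalisation_factor(b_ds_to_phipi, b_ds_to_tau, f_ds_tau,
                         eff_norm, eff_sig, n_norm):
    """alpha such that B(tau -> 3mu) = alpha * N_sig, following the boxed formula."""
    return (b_ds_to_phipi / b_ds_to_tau) * f_ds_tau * (eff_norm / eff_sig) / n_norm


# Placeholder inputs for illustration only (NOT the analysis values),
# except f = 0.83, the 2012 fraction quoted on this slide.
alpha = normalisation_factor(
    b_ds_to_phipi=1.3e-5,  # hypothetical B(Ds -> phi(mumu) pi)
    b_ds_to_tau=0.055,     # hypothetical B(Ds -> tau nu)
    f_ds_tau=0.83,
    eff_norm=0.010,        # hypothetical efficiencies
    eff_sig=0.012,
    n_norm=50_000,         # hypothetical fitted normalisation yield
)
n_sig = 3  # hypothetical signal yield
print(f"alpha = {alpha:.2e}, B(tau -> 3mu) = {alpha * n_sig:.2e}")
```

Since $\alpha\propto 1/N_\text{norm}$, halving the normalisation yield doubles $\alpha$, which is why losses in $N_\text{norm}$ translate directly into worse sensitivity.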

   \begin{frame}   \frametitle{Remaining backgrounds}
            \begin{itemize}
            \item Fit an exponential to the invariant-mass spectrum in each likelihood bin.
            \item Exclude the $\pm 30~\MeV$ signal window around the $\Ptau$ mass from the fit.
           % \item[$\rightarrow$] Compatible results blinding only $\pm \unit{20}{\MeV}$\footnote{partially used in classifier development}
            \end{itemize}
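A sideband fit of this kind can be sketched as an unbinned maximum-likelihood fit of $e^{-\lambda x}$ restricted to the regions outside the blinded window, followed by an extrapolation into the window. This is a minimal pure-Python illustration on a toy sample; the mass range, shift $x=m-1677~\MeV$, slope and event count are assumptions, not the analysis configuration.

```python
import math
import random


def _integrals(lam, intervals, steps=2000):
    """Midpoint-rule integrals of exp(-lam*x) and x*exp(-lam*x) over the intervals."""
    norm = first = 0.0
    for lo, hi in intervals:
        h = (hi - lo) / steps
        for k in range(steps):
            x = lo + (k + 0.5) * h
            w = math.exp(-lam * x) * h
            norm += w
            first += x * w
    return norm, first


def fit_exponential_slope(masses, intervals, lam_lo=-0.1, lam_hi=0.1, iters=50):
    """Unbinned ML slope of exp(-lam*x) using only the sideband intervals.

    dlnL/dlam = 0 means: model mean over the sidebands == sample mean.
    The model mean falls monotonically with lam, so bisection finds the root.
    """
    target = sum(masses) / len(masses)
    for _ in range(iters):
        mid = 0.5 * (lam_lo + lam_hi)
        norm, first = _integrals(mid, intervals)
        if first / norm > target:
            lam_lo = mid  # model mean too large: slope must be steeper
        else:
            lam_hi = mid
    return 0.5 * (lam_lo + lam_hi)


def expected_in_window(n_sidebands, lam, sidebands, window):
    """Extrapolate the sideband yield into the blinded window with the fitted shape."""
    return n_sidebands * _integrals(lam, [window])[0] / _integrals(lam, sidebands)[0]


# Toy: x = m - 1677 MeV on [0, 200], tau mass at x = 100, blind |x - 100| < 30 MeV
random.seed(1)
TRUE_LAM, X_MAX = 0.01, 200.0
toy = [-math.log(1.0 - random.random() * (1.0 - math.exp(-TRUE_LAM * X_MAX))) / TRUE_LAM
       for _ in range(5000)]
SIDEBANDS, WINDOW = [(0.0, 70.0), (130.0, X_MAX)], (70.0, 130.0)
side = [x for x in toy if not WINDOW[0] <= x <= WINDOW[1]]
lam_hat = fit_exponential_slope(side, SIDEBANDS)
n_exp = expected_in_window(len(side), lam_hat, SIDEBANDS, WINDOW)
print(f"fitted slope {lam_hat:.4f}/MeV, expected background in window {n_exp:.0f}")
```

Shifting the mass leaves the slope unchanged and keeps the exponentials numerically tame; a real fit would also propagate the slope uncertainty into the background estimate.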
          Examples of the most sensitive regions in 2011 and 2012:




Observed (expected) upper limits on $\mathcal{B}(\Ptau\to\Pmu\Pmu\Pmu)$:\\
$\color{red}4.6~(5.0)\times 10^{-8}$ at $90\,\%$ CL\\
$\color{pink}5.6~(6.1)\times 10^{-8}$ at $95\,\%$ CL\\


        \frametitle{Why are we not putting the mass in the classifier?}

$\Rrightarrow$ Why don't we put mass in the classifier?\\
$\rightrightarrows$ Many reasons:\\
\begin{itemize}
\item Our normalisation channel is in a different mass range!
\item The mass resolution is mismodelled in MC.
\item Easier to interpret:
\end{itemize}

\includegraphics[angle=-90, width=0.95\textwidth]{{images/10500_11000_y_bin_2_4.5}.pdf}




        \frametitle{Data agreement check, why do we bother?}
$\Rrightarrow$ It all boils down to our equation:
 $\boxed{\mathcal{B}(\Ptau\to\Pmu\Pmu\Pmu) = \frac{\mathcal{B}(\PDs\to\Pphi\Ppi)}{\mathcal{B}(\PDs\to\Ptau\Pnut)} \times f_{\PDs}^{\Ptau} \times \frac{\varepsilon_\text{norm}    }{\varepsilon_\text{sig}     }  \times \frac{N_\text{sig}}{N_\text{norm}} = \alpha\times N_\text{sig}}$\\{~}\\
There are three quantities that we need to determine: $\varepsilon_\text{sig}$, $\varepsilon_\text{norm}$ and $N_\text{norm}$.
\begin{itemize}
\item $\varepsilon_\text{norm}$: determined from data with a cut-and-count method.
\item $N_\text{norm}$: determined from data with a simple fit.
\item $\varepsilon_\text{sig}$: calibrated on data,
\[
\varepsilon_\text{sig}=\varepsilon_\text{sig}^\text{MC}\,\frac{\varepsilon_\text{norm}^\text{DATA}}{\varepsilon_\text{norm}^\text{MC}}.
\]
\end{itemize}
The catch here: $\varepsilon_\text{sig}$ is still correct, but $N_\text{norm}$ is smaller, so $\alpha$ is bigger $\Rightarrow$ worse sensitivity.
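The two data-driven steps above, a cut-and-count efficiency and the calibration of $\varepsilon_\text{sig}$, can be sketched as follows. The counts and MC efficiencies are hypothetical, and the binomial error is the simplest choice, assumed here for illustration.

```python
import math


def cut_and_count(n_pass, n_total):
    """Efficiency of a cut by counting events, with its binomial uncertainty."""
    eff = n_pass / n_total
    return eff, math.sqrt(eff * (1.0 - eff) / n_total)


def calibrated_signal_efficiency(eff_sig_mc, eff_norm_data, eff_norm_mc):
    """eps_sig = eps_sig^MC * eps_norm^DATA / eps_norm^MC, as on the slide."""
    return eff_sig_mc * eff_norm_data / eff_norm_mc


# Hypothetical counts and MC efficiencies, for illustration only
eff_norm_data, err = cut_and_count(n_pass=4_500, n_total=6_000)
eff_sig = calibrated_signal_efficiency(eff_sig_mc=0.40,
                                       eff_norm_data=eff_norm_data,
                                       eff_norm_mc=0.80)
print(f"eps_norm(data) = {eff_norm_data:.3f} +- {err:.3f}, eps_sig = {eff_sig:.3f}")
```

The data/MC ratio of the normalisation channel rescales the MC signal efficiency, so a mismodelled variable is corrected in $\varepsilon_\text{sig}$ but still costs normalisation events.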


\begin{frame}\frametitle{Wrap up}

\begin{itemize}
\item Physics applies ML differently from computer science.
\item What you use has physics consequences!
\item Blindly using all variables is a bad solution.
\end{itemize}







