
Study of the H->ZZ->4l final state for Run III of CMS

PICCO, VALENTINA
2020/2021

Abstract

The H→ZZ→4l channel was one of the main contributors to the Higgs boson discovery and has been fundamental for the measurement of its properties, such as mass, width and spin-parity. Even with the full LHC Run I and Run II datasets, these measurements are limited by the available statistics, which is why it is so important to continue the analysis with the Run III data. The larger dataset expected from Run III will allow the Higgs boson properties to be measured with better precision, which is of particular interest since deviations from SM predictions would constitute evidence of new physics. The goal of this work is to develop machine learning algorithms to improve the efficiency and background suppression of lepton identification in the H→4l decay channel.

The background in this final state is of two kinds. Background events containing four genuine leptons, such as those deriving from non-resonant ZZ production, are called "irreducible" in the sense that they cannot be suppressed by lepton identification alone. Events where the 4l final state is "faked" by in-flight decays of light mesons within jets, heavy-flavour hadron decays, or charged hadrons overlapping with neutral-pion decays and misidentified as leptons can be suppressed with proper lepton identification criteria (ID), and are therefore referred to as "reducible". While the amount of irreducible background passing a given selection can be estimated from Monte Carlo samples, this cannot be done efficiently and reliably for the reducible background, since fakes arise in different ways and in a multitude of processes whose cross sections are huge compared to the H→4l one. The reducible background is therefore estimated from the data itself, with complex techniques that rely on extrapolation from control regions orthogonal to the signal selection. One difficulty in improving lepton identification to suppress the reducible background is that repeating these data-driven estimates for every identification criterion to be tested is complex and time consuming.

During my master thesis I focused on this problem and developed a simpler method to compare the relative, per-4l-event performance of different lepton identification criteria in terms of efficiency and background reduction. The idea behind this technique is that the reducible background is expected to appear as an excess of data over the yield predicted by the irreducible-background Monte Carlo samples alone. This excess can be estimated from the sidebands around the Higgs boson peak in the 4l invariant mass distribution, so that it is not influenced by the presence of the signal. As a use case, I applied this new approach to different muon identification criteria available in CMS. One striking result of these studies is that multivariate discriminators (MVA) significantly outperform the cut-based ID criteria used so far in the H→4l analysis, with a reduction of the reducible background of up to 45% for the same signal efficiency. These studies, based on MVAs developed for different physics cases, suggest that a very significant improvement in signal purity would be achievable in H→4l with a dedicated multivariate discriminant for lepton identification.
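
To make the sideband estimate concrete, the following is a minimal Python/NumPy sketch of the idea described above: count data events in the 4l invariant-mass sidebands, subtract the irreducible-background Monte Carlo prediction in the same region, and treat the remainder as the reducible-background yield surviving a given ID selection. The mass windows, array names and function names are illustrative assumptions, not the code or values used in the thesis.

import numpy as np

def sideband_mask(m4l, peak=(118.0, 130.0), window=(70.0, 170.0)):
    # Events inside a broad 4l-mass window but outside the Higgs peak region.
    # Window edges here are assumed for illustration only.
    in_window = (m4l > window[0]) & (m4l < window[1])
    in_peak = (m4l > peak[0]) & (m4l < peak[1])
    return in_window & ~in_peak

def reducible_excess(m4l_data, m4l_irr_mc, w_irr_mc):
    # Reducible-background estimate: data yield in the sidebands minus the
    # irreducible-background MC prediction (sum of event weights) there.
    n_data = np.count_nonzero(sideband_mask(m4l_data))
    n_irr = np.sum(w_irr_mc[sideband_mask(m4l_irr_mc)])
    return n_data - n_irr

def relative_background(m4l_data, pass_old, pass_new, m4l_irr, w_irr, mc_old, mc_new):
    # Compare two ID criteria on the same events: the ratio of surviving
    # reducible excesses measures the relative background rejection.
    old = reducible_excess(m4l_data[pass_old], m4l_irr[mc_old], w_irr[mc_old])
    new = reducible_excess(m4l_data[pass_new], m4l_irr[mc_new], w_irr[mc_new])
    return new / old  # e.g. ~0.55 would correspond to a ~45% background reduction

Since both criteria are evaluated on the same events, such a ratio compares their background rejection (and, with the analogous ratio on signal MC, their efficiency) without repeating the full data-driven background estimate for each ID working point.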

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14240/69964