Explorations: The UC Davis Undergraduate Research Journal

Resolution of Boosted Top Quark Structure by Fitting to Distributions of Particle Energy

David Nisson

Abstract

The Particle Flow algorithm developed by the Compact Muon Solenoid (CMS) collaboration reconstructs the individual decay products in a particle physics event recorded in the CMS detector at the Large Hadron Collider (LHC). This collider may produce a highly massive particle, with a mass on the order of 1 TeV or more, that decays to a top-antitop quark pair; each top quark has a mass of about 0.17 TeV. Because the mass of these quarks is low compared with the mass of the original particle, their decay products would overlap and be difficult to distinguish from ordinary dijet events by conventional means. Here we present a novel approach based on hypothesis testing and fitting to the energy distribution of particles, and show the result of applying this approach to the Particle Flow output for Z’ bosons decaying to top quarks. The approach appears to resolve the locations of the jets for boosted top events to within 0.03 in ΔR space; however, it misses approximately 10-20% of the energy on average, has a relative energy uncertainty above 10% for boosted top events, and does not seem to resolve the W boson mass in the top events. Nevertheless, it shows promise for competitive performance if the energy can be corrected.

Introduction

The Large Hadron Collider is, at the time of writing, the highest-energy collider in the world, having successfully achieved collisions with a center-of-mass energy of 7 TeV (1). The LHC collides the beams head-on, which means that the invariant mass of the system of colliding particles is about 7 TeV. Such collisions would therefore theoretically be able to produce extremely massive particles, on the order of 1-5 TeV. Such masses are hypothesized for a number of exotic new particles, including the Z’ boson predicted to couple to top quarks in various theories (2). The top-antitop events produced by such decays differ from ordinary top quark production in that the mass of the particle producing the quarks is much greater than the top mass of 170 GeV (3).

$$ E^2 = p^2 c^2 + m^2 c^4 \qquad (1) $$

The energy of a particle is related to its momentum and mass by Equation 1. Thus, a 2 TeV Z’ at rest will have enough energy to decay to two 170 GeV top quarks, each having a momentum of about 1 TeV. Because of this high momentum-to-mass ratio, when the top quark decays, the products will be very close together in angle, as illustrated in Figure 1. This opens up the possibility that two of these jets will overlap, so that they cannot be resolved as easily by algorithms such as the Cambridge-Aachen clustering algorithm. A novel approach is needed to resolve such jets. The top quark decays to three different products (see Equation 2).

$$ t \to W^{+}\, b \to q\, \bar{q}'\, b \qquad (2) $$
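As a rough check of the numbers quoted above, the following worked example applies Equation 1 in natural units (c = 1). It assumes, as the text does, that the Z’ decays at rest, so that each top quark carries half of the 2 TeV; the symbols E_t, p_t and m_t are introduced here only for illustration.

```latex
E_t = \frac{m_{Z'}}{2} = 1000\ \text{GeV}, \qquad
p_t = \sqrt{E_t^2 - m_t^2}
    = \sqrt{1000^2 - 170^2}\ \text{GeV}
    \approx 985\ \text{GeV}, \qquad
\gamma = \frac{E_t}{m_t} \approx 5.9
```

The momentum is therefore about 1 TeV, roughly six times the top mass, which is why the decay products are so strongly collimated.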

Figure 1.  A highly massive Z’ decaying to boosted top quarks, with the jets overlapping when they reach the detectors. This is the type of event that we are interested in trying to distinguish from a dijet event and to detect the constituents of, as the top quarks further decay into three other quarks which are very close together.

At the Compact Muon Solenoid experiment at the LHC, an algorithm has been developed that uses the raw detector information to determine the “particle flow”, the final state particles that produced the signals in the detector for that event (4). Normally, top quarks are identified by applying clustering algorithms such as the Cambridge-Aachen algorithm to the particle flow output.

When the jets are very close together and are not resolved by clustering, computer algorithms will often tag the three jets of the boosted top as a single jet. However, any time protons collide they produce events called QCD events, in which the collision causes production of a quark and its antiquark (a dijet event). Sometimes one of the quarks will even radiate a gluon, producing a third jet (a multijet event). These events are also tagged as having one jet. With a collision energy of 7 TeV, it is possible for the invariant mass of the two quarks to match that of a Z’, so this presents a difficulty when tagging boosted top quarks by conventional clustering means.

The approach we take in the Jet Flow algorithm is to use the shape of the energy distribution to distinguish the jets without trying to discern which particle belongs to which jet. Our approach is based on surface fitting and hypothesis testing. Instead of trying to group particles into jets, we test different models, with increasing numbers of jets, to see if they have only the expected deviation in energy. If they deviate more than expected, additional jets are added to the model. This process is repeated until the model is a “good” fit for the energy distribution. The concept is illustrated in Figure 2.

Figure 2. Illustration of the concept of fitting to jets. Initially, a given energy distribution is tested by fitting to one jet. A chi-square function is calculated from the deviations of each data point from the energy distribution predicted by the model with one jet. If the deviations are too large, we then add a second jet and see if that brings the deviations within the expected range. In this case, the test with one Gaussian failed, but the two-Gaussian model succeeded, resulting in the resolution of two jets which would otherwise have been difficult to distinguish.

The reason behind this approach is that conventional clustering algorithms, when they see events like the one above, necessarily group the particles near the larger jet together, even if they actually “belong” to the smaller jet. The smaller jet is thus left with too little energy, and its reconstructed location is “pushed” away from the true location because the particles closest to the larger jet are assigned to that jet instead. A fitting algorithm does not have this problem; it does not care about which particle belongs to which jet. It is simply interested in the shape of the distribution. Therefore, it is expected to resolve the jets better than conventional clustering algorithms.

How do we quantify how “good” each of these models is? We use a function called a “chi-square” function. This function is essentially a weighted sum of the squares of the deviations in the data points:

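$$ \chi^2 = \sum_i \frac{(y_i - f_i)^2}{\sigma_i^2} \qquad (3) $$

where y_i is the measured value of data point i and f_i is the value predicted by the model for that point.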

Here σ is the expected standard deviation of the data point from the modeled value. Since σ² is simply the average squared deviation, each term of the sum is a number divided by its expected value, so for a good fit the sum amounts to adding 1 to itself a certain number of times. This number of times is called the number of “degrees of freedom”. If none of the parameters can be varied, then adding 1 once per data point simply gives the number of data points, so the chi-square should approximate that number. However, if we can vary one of the parameters in the model and we minimize the chi-square with respect to it, we will always get zero if we only have one data point. Thus, our number of degrees of freedom in that case will be equal to (number of data points – 1). In general, degrees of freedom = data points – free parameters.

Once we have the number of degrees of freedom, we can use the chi-square to find out how good the fit is. In our case, we minimize the chi-square with respect to the energy of each jet and its two angles: theta, the angle with respect to the beam axis, and phi, the angle with respect to the line pointing to the floor. We also minimize with respect to the width of the jet, so that we are certain that our deviations are not just due to a mismatched jet size. (Actually, in our case we use the pseudorapidity, eta, to represent the angle to the beam, but the idea is still the same.) This totals four free parameters per jet.

To obtain the energy distribution, we put the energy of the Particle Flow particles into “bins” of a two-dimensional histogram in eta and phi. Our data points are thus the energies in the bins, and the number of data points will be the number of bins. Our number of degrees of freedom will therefore be (number of bins – 4*number of jets), if we use all of the bins in the histogram. The chi-square of the fit to all of the bins should be on the order of the number of degrees of freedom. Therefore, if we have the number of bins and the number of jets in the model being tested, we have enough information to determine whether the fit has more deviation than expected.
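The bookkeeping described above can be sketched in a few lines of Python. This is a minimal illustration and not the code used in the study; the function names `chi_square` and `degrees_of_freedom` and the toy numbers are assumptions made for the example.

```python
def chi_square(data, model, sigma):
    """Sum of squared deviations, each divided by the expected variance."""
    return sum(((d - m) / s) ** 2 for d, m, s in zip(data, model, sigma))


def degrees_of_freedom(n_bins, n_jets, params_per_jet=4):
    """Number of bins minus free parameters (energy, eta, phi, width per jet)."""
    return n_bins - params_per_jet * n_jets


# Toy example: four bins that each deviate by exactly one sigma.
data = [10.0, 12.0, 9.0, 11.0]
model = [11.0, 11.0, 10.0, 10.0]
sigma = [1.0, 1.0, 1.0, 1.0]
toy_chi2 = chi_square(data, model, sigma)

# A 30x30 histogram fitted with three jets, using every bin.
ndf = degrees_of_freedom(30 * 30, 3)
```

In the toy example each bin contributes 1, so the chi-square equals the number of bins, which is the behavior expected of a good fit with no free parameters.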

The goal of this algorithm is to resolve jets to the point where the algorithm can improve efficiency in tagging boosted top events. This is important because such events would be indicators of a highly massive particle that would certainly be exotic, as no particles have yet been found with masses greater than 170 GeV; the most massive particle found as of yet is the top quark. Such highly massive exotic particles could support theories based on the idea of space-time having more than three spatial dimensions and one temporal dimension.

Overview of Algorithm

Figure 3. Flowchart of the main Jet Flow algorithmic procedure, including loops over events and over clustered jets found in the event. The algorithm resolves jets by fitting models to them, to reduce the amount by which the reconstructed jet deviates from the true jet, and splits jets by testing the model to see how well it fits.

The tagging algorithm we present in Figure 3, which we will call “Jet Flow”, is based on hypothesis testing by fitting mathematical models to data. It systematically tests hypothetical models for a given energy distribution, based on increasing numbers of jets, until a “good” fit is found. Jet Flow starts by filling a two-dimensional histogram in eta, phi space with the directions of the particles, weighted by the energy. Jet Flow then tests models of increasing numbers of jets, by fitting them to this histogram. For a given model, the chi-square is then used to judge whether the modeled energy distribution is likely to represent the real one. If it is, the parameters of the fit are then the direction and energy of the reconstructed jets. If not, then additional Gaussians are added as needed until the correct number of jets is found. Thus, even if the jets are too close together to resolve by clustering, the algorithm can still resolve the jets by the shape of the distribution.

Histogram Filling

If the histogram is filled in the conventional way, where the energy is simply added to the appropriate bin, the result is a distribution with scattered large “spikes” of varying energy where particles are located and almost no energy between them. This is a rather difficult type of distribution to fit to a smooth model, such as the Gaussian model we are using, as the locations of the particles are not predictable. For this reason, our filling actually consists, for each particle, of filling every bin with a Gaussian distribution of energy centered on the particle. Although this process destroys the information about the energy and location of each individual particle, it does not destroy the overall shape as long as the Gaussians added to the histogram are not much wider than the actual jets. However, it makes the distribution continuous, enabling the fitting of a continuous multi-Gaussian model.
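A minimal sketch of this smeared filling is given below. It assumes particles are given as (eta, phi, energy) relative to the center of the histogram, uses the 30x30 bins, 1.0x1.0 size and 0.05 smearing quoted in this report as defaults, and ignores the wrap-around of phi; the function name `fill_smeared` is illustrative only.

```python
import math


def fill_smeared(particles, n_bins=30, size=1.0, smear=0.05):
    """Fill an n_bins x n_bins histogram, spreading each particle's energy
    over all bins as a two-dimensional Gaussian of width `smear`."""
    width = size / n_bins
    centers = [-size / 2 + (i + 0.5) * width for i in range(n_bins)]
    hist = [[0.0] * n_bins for _ in range(n_bins)]
    # Gaussian density times bin area, so each particle deposits
    # approximately its full energy when it lies well inside the histogram.
    norm = width * width / (2.0 * math.pi * smear ** 2)
    for eta, phi, energy in particles:
        for i, c_eta in enumerate(centers):
            for j, c_phi in enumerate(centers):
                r2 = (c_eta - eta) ** 2 + (c_phi - phi) ** 2
                hist[i][j] += energy * norm * math.exp(-r2 / (2.0 * smear ** 2))
    return hist


smeared = fill_smeared([(0.0, 0.0, 100.0)])
total_energy = sum(sum(row) for row in smeared)
```

For a particle at the center of the histogram the summed bin contents reproduce its energy to good accuracy, so the smearing changes the shape of the distribution but not the total energy.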

Fitting to the energy distribution

The most important part of this algorithm comes after the histogram is made: the fitting. First, an initial model is chosen by finding the peaks of the distribution. The locations of these peaks are used to initialize the jet locations, and each amplitude is converted to a proportional energy estimate that is used to initialize the jet energy. A numerical minimization method, such as the Nelder-Mead method or the MIGRAD method, is then used to minimize a chi-square function, which depends on the differences between the bins in the model and the bins in the actual histogram of jet energies. This chi-square function is used to estimate the probability that the model actually represents the data.

If the chi-square indicates a low probability, the model is subtracted, bin by bin, from the original histogram, and the parameters for new Gaussians are initialized by the peaks in this “difference” histogram. The model is then extended to include these new Gaussians and initialized with the previous fit parameters for the old Gaussians and the initial guess for the new Gaussian. The minimization is then repeated, and the process of subtraction and minimization is repeated until enough Gaussians are found to make a “good” chi-square. Once a good fit is found, the parameters of this final fit are then the jet parameters. If one Gaussian is subtracted from a distribution of two Gaussians, the resulting distribution will be peaked near the second Gaussian, so we distinguish jets even when the energy does not have a significant peak.
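The subtraction step can be illustrated in one dimension. In the sketch below two overlapping Gaussians produce a distribution with a single maximum, and subtracting the first Gaussian exposes the second. This is an idealized example: the first Gaussian is subtracted with its true parameters rather than fitted ones, and the names `gaussian` and `residual_peak` are assumptions for illustration.

```python
import math


def gaussian(x, amp, mean, width):
    """Unnormalized one-dimensional Gaussian."""
    return amp * math.exp(-0.5 * ((x - mean) / width) ** 2)


def residual_peak(hist, centers, fitted):
    """Subtract the fitted Gaussians bin by bin and return the location
    and height of the largest remaining bin (the seed for a new Gaussian)."""
    residual = [h - sum(gaussian(x, *g) for g in fitted)
                for x, h in zip(centers, hist)]
    k = max(range(len(residual)), key=residual.__getitem__)
    return centers[k], residual[k]


centers = [-0.5 + (i + 0.5) / 30 for i in range(30)]
jets = [(10.0, -0.05, 0.05), (6.0, 0.05, 0.05)]  # (amplitude, mean, width)
hist = [sum(gaussian(x, *g) for g in jets) for x in centers]

# The summed distribution has its only maximum at the first jet...
main_peak = centers[max(range(30), key=hist.__getitem__)]
# ...but the difference histogram peaks at the second jet.
seed_x, seed_amp = residual_peak(hist, centers, fitted=[jets[0]])
```

The seed found this way would then be passed to the minimizer together with the previous fit parameters.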

Initializing multiple jets based on peaks

For the purposes of this study, the model is often initialized with multiple Gaussians rather than just one. The initial Gaussians are found with the TSpectrum peak finder, which looks not only for the absolute maximum energy but also for local maxima in the data above a certain percentage of the absolute maximum. This peak finder in itself could be used to find jets, but the chi-square and subtraction are still helpful in that they find jets that are so close together that they are not distinguishable simply by their maxima. The problem with using TSpectrum is that it may also find a background fluctuation, such as a radiated photon, which may then be falsely tagged as a jet. This would produce more jets than are actually in the event, as our study reveals. Nevertheless, when tagging boosted tops, this scheme is expected to improve performance.

Parameterization

The Jet Flow algorithm has four tunable parameters. The first is the smearing of the histogram (here set to 0.05). Next, when we calculate the probability of a fit being “good”, we need to decide when to add another Gaussian. The maximum probability value at which to do this is here set to about 0.1. Third, we have the size of the histograms (here 1.0x1.0), and finally the number of histogram bins (here 30x30). The algorithm as it is parameterized in this study does a good job of finding the location in eta, phi space of the jets, but it does not resolve the energies of the jets as clearly. Thus, when determining the deviations from the true partons, we match the jets by choosing the matches that produce the shortest distances in eta, phi space. This is expected to be the most effective way of matching reconstructed partons to true partons at this point, and therefore it should produce the most accurate results possible for the parton resolutions. Our jets seem to have widths near 0.05, so this is the smearing we chose. The probability cutoff is the most important choice, as it determines whether to add a new Gaussian. We need to do a more thorough study of the probabilities, but we feel that a cutoff of 0.1 is a good choice, since for the most part it would reject the events in which we are uninterested. We have chosen 1.0x1.0 histograms in eta, phi space, which is comparable to the CA cutoff that we are using, and 30x30 bins, chosen primarily so that computation time does not limit this study.
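The matching of reconstructed jets to true partons by distance in eta, phi space can be sketched as follows. The report does not say whether the matches are chosen greedily or by minimizing the total distance; this example assumes the latter, and the names `delta_r` and `match_jets` are illustrative.

```python
import math
from itertools import permutations


def delta_r(a, b):
    """Distance in (eta, phi) space, with the phi difference wrapped
    into the range [-pi, pi)."""
    d_eta = a[0] - b[0]
    d_phi = (a[1] - b[1] + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(d_eta, d_phi)


def match_jets(reco, true):
    """Return a tuple m where m[i] is the index of the reconstructed jet
    matched to true parton i, chosen to minimize the summed distance.
    Assumes there are at least as many reconstructed jets as partons."""
    return min(
        permutations(range(len(reco)), len(true)),
        key=lambda p: sum(delta_r(reco[j], true[i]) for i, j in enumerate(p)),
    )


true_partons = [(0.0, 0.0), (0.3, 0.2), (-0.2, 0.1)]        # (eta, phi)
reco_jets = [(0.31, 0.19), (-0.21, 0.11), (0.01, -0.01)]    # (eta, phi)
matches = match_jets(reco_jets, true_partons)
```

Trying every assignment is affordable here because the number of Gaussians per histogram is small.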

Chi-square Function   

To fit a model to the distribution of energy, one needs some sort of chi-square function to be minimized by the algorithm. The chi-square also tests whether the model fits the actual energy in the histogram. In the standard chi-square test, the function is a sum of the ratios of the squared bin deviations to the square of a standard deviation, which is the deviation that can be anticipated in the energy of that particular bin. We obtained this standard deviation empirically by computing the deviations of the model from the true histograms in several intervals of energy. The chi-square function that we minimize is:

In principle we could sum over all bins of the histogram. However, as discussed in the section on goodness of fit, the bins well outside the Gaussian “cone” seem to have a lot of particles, probably due to electromagnetic radiation, that make large contributions to the chi-square function in the linear region, which would render the contributions of the actual jets negligible. For this reason, an attempt is made to ignore such cells by not including bins with less than a certain energy value, say 0.1 times the maximum energy. Future research will focus on the best level at which to apply such a cut, or whether ignoring bins is the best solution to the problem.
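The bin cut described above can be sketched as a simple selection. The default fraction of 0.1 is the example value mentioned in the text, and the function name `bins_above_threshold` is an assumption made for illustration.

```python
def bins_above_threshold(hist, fraction=0.1):
    """Return the (row, column) indices of bins whose energy is at least
    `fraction` times the maximum bin energy; all other bins are ignored."""
    peak = max(max(row) for row in hist)
    cut = fraction * peak
    return [(i, j)
            for i, row in enumerate(hist)
            for j, energy in enumerate(row)
            if energy >= cut]


toy_hist = [
    [0.0, 1.0, 0.5],
    [2.0, 10.0, 0.9],
    [0.1, 3.0, 0.0],
]
kept = bins_above_threshold(toy_hist)
```

Only the bins in `kept` would enter the chi-square sum, so the number of degrees of freedom becomes the number of kept bins minus the number of free parameters and differs from histogram to histogram.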

Obtaining the Sigma Function

The sigma function we use was obtained by minimizing an unweighted summed-square error initially and then calculating the deviations in bin energy for each bin in each histogram. The bin energy to maximum bin ratio in the interval between 0 and 1 was then sliced into 25 smaller subintervals. For each of these subintervals, the standard deviation was calculated. A mathematical function was then found that matches the data closely. This is the sigma function we used for the results in this report. The fit is simply an analytical function designed to approximate the data points and use them for our study; it does not represent any theoretical model of the deviations.
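The slicing procedure can be sketched as follows. The example assumes the population standard deviation is used in each subinterval (the report does not say which estimator was used), and `sigma_by_slice` is an illustrative name; the analytical function fitted to these points is not reproduced here.

```python
from statistics import pstdev


def sigma_by_slice(ratios, deviations, n_slices=25):
    """Group bin deviations by the ratio (bin energy / maximum bin energy)
    and return the standard deviation in each of `n_slices` equal
    subintervals of [0, 1]. Slices with fewer than two entries give None."""
    slices = [[] for _ in range(n_slices)]
    for ratio, dev in zip(ratios, deviations):
        k = min(int(ratio * n_slices), n_slices - 1)  # ratio == 1 goes in the last slice
        slices[k].append(dev)
    return [pstdev(s) if len(s) >= 2 else None for s in slices]


toy_ratios = [0.01, 0.02, 0.03, 0.99, 1.0]
toy_deviations = [1.0, -1.0, 0.0, 2.0, 4.0]
sigmas = sigma_by_slice(toy_ratios, toy_deviations)
```

A smooth function fitted through the non-empty slices would then play the role of the sigma function in the chi-square.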

Because the bins deviate by the above function of energy, a good fit would mean that the standard deviation of each point is equal to the weighting function, so that the contributions of each point to the chi-square will average near 1. For a “good” fit, the chi-square should thus approach the number of degrees of freedom. Normally, one would then use the standard chi-square probability function to get the exact probability, but in this case the deviations are not normally distributed about the modeled energy. Thus, the chi-square probability is obtained numerically from the number of degrees of freedom.

Distribution of Number of Jets

Figure 4. Distribution of the number of jets found in the final fit, for QCD dijets and for boosted tops. The dijet events peak at one, as expected, because there should only be one jet per histogram. The boosted top events peak at three, indicating that the fitter is correctly distinguishing the jets. Future explorations will include methods to make these distributions more sharply peaked.

The distribution of the number of jets found is shown in Figure 4, for QCD dijet events and for boosted top events with a 3000 GeV Z’ boson. The number of jets shown is the total number of jets that produced the good fit. If the initial fit was good, this is the total number of jets found by the local maxima in the distribution; the advantage of fitting in this case is to resolve more accurately the locations of the jets. There is a very clear distinction between the dijet and ttbar numbers of jets, with the dijet distribution peaking at 1 jet and the ttbar peaking at 3 jets, indicating that the algorithm is correctly resolving the substructure of the jets. The dijet events are expected to contain only one jet per histogram. The large number of histograms with two Gaussians is probably due to sensitivity to particles of radiation.

Parton and W and Top Mass Resolution before Jet Correction

Figure 5. Distribution of masses of pairs of jets found for 500 boosted top events. The mass here is the invariant mass of the pair of jets, assuming the masses of each individual jet to be zero. Since the decay sequence involves a real W boson which decays to two jets, the mass should have a fairly large spike at 81 GeV, the mass of the W boson. Instead, we see only a small bump there, which appears to be due to poor energy resolution.

The histogram in Figure 5 is the distribution of the invariant mass of each pair of jets found, regardless of the final number of jets that produced a good fit. Since the W boson decays to a pair of quarks, these are expected to produce a pair of jets. Thus, there should be a relatively large number of pair masses around 81 GeV, the mass of the W boson. Instead, we see only a small (but not insignificant) bump there. We also see an unexpectedly large number of invariant masses near zero. The most likely explanation for this is that two Gaussians represent the same jet. To resolve this problem, a future exploration may add a “recombination step”, in which pairs of jets with less than a certain invariant mass are combined.
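The pair mass for massless jets can be computed directly from the fitted energy, eta and phi of each jet. The sketch below uses the standard relation m² = 2 pT1 pT2 (cosh Δη − cos Δφ) with pT = E / cosh(eta); the tuple layout (energy, eta, phi) and the function names are assumptions for illustration.

```python
import math
from itertools import combinations


def pair_mass(jet1, jet2):
    """Invariant mass of two massless jets given as (energy, eta, phi)."""
    pt1 = jet1[0] / math.cosh(jet1[1])
    pt2 = jet2[0] / math.cosh(jet2[1])
    m2 = 2.0 * pt1 * pt2 * (math.cosh(jet1[1] - jet2[1])
                            - math.cos(jet1[2] - jet2[2]))
    return math.sqrt(max(m2, 0.0))  # guard against tiny negative round-off


def all_pair_masses(jets):
    """Invariant mass of every pair of jets found in one histogram."""
    return [pair_mass(a, b) for a, b in combinations(jets, 2)]


back_to_back = pair_mass((50.0, 0.0, 0.0), (50.0, 0.0, math.pi))
same_jet = pair_mass((50.0, 0.2, 0.1), (50.0, 0.2, 0.1))
```

Two Gaussians at the same location give a pair mass of zero, which is consistent with the excess near zero mass being caused by two Gaussians describing a single jet.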

Figure 6. Distribution of invariant masses of groups of three reconstructed jets, from a sample of 500 boosted top events. The jets are again assumed to have zero individual mass. Again, the mass resolution is poor, probably for the same reason the W resolution is poor:  the large uncertainty and systematic bias in energy.

The top mass distribution in Figure 6 is the mass of each group of three Gaussians in each histogram, regardless of the number of jets found for that histogram. Because the top decays to a bottom and a W boson, the total number of final partons should be three, so there should be a large number of such “triplets” with three Gaussians. Again, we see a very large spread in the top mass. For this reason, and because of the poor W boson resolution, we explored the resolutions of the parton energy, eta, and phi. It turns out that, while the Jet Flow algorithm does well in resolving the locations of the jet in angular space, it does not resolve the total energy of the jet. The reason for this is unknown at the time of this writing. However, we may be able to correct for the initial poor energy resolution by investigating the dependence of the systematic bias on many different variables. Our current attempts to parameterize the bias in terms of eta and the reconstructed energy have thus far failed to reconstruct the W mass, but there are known ways in which they can be improved.

Energy Resolution


Figure 7. Deviations in the true parton energy for 500 boosted top events (top) and for 10,000 dijet events (bottom). The true partons are obtained from the generator level information and matched to the reconstructed partons by the distance in eta, phi space of the reconstructed partons from the true partons. Note the larger uncertainty in energy for boosted tops than for QCD events; this may be related to differences in the parton energy.

Figure 7 shows the resolution of the parton energy for boosted top events with a 3000 GeV Z’ boson that decays to boosted tops, and for QCD dijet events with pT between 170 and 230 GeV. In both cases, a large amount of energy appears to be missing from the jets. The standard deviation is also very large, explaining the small probability of resolving the W mass against the background of pairs involving the bottom jet. Because the invariant mass is determined from the energies, determining the W mass requires these to be well resolved. Our relative uncertainty in energy is around 20%, and the accuracy of our mass measurements, which are based in part on the energy, requires that this figure be reduced.

Location resolution

 

Figure 8. Deviation from the true values of parton eta and phi of the reconstructed values, obtained from 500 boosted top events. There is a finite standard deviation of about 0.01, but it is not enough to account for our mass uncertainty. Nevertheless, it is a major advantage of our algorithm over other algorithms; these results are for a 3000 GeV Z’.

The main advantage of this algorithm over conventional clustering algorithms such as Cambridge-Aachen is that it can resolve the true locations of the constituent jets in eta, phi space even when they overlap. This can be seen in the histograms of Figure 8; they are sharply peaked, with few entries outside a very small range. The Jet Flow algorithm resolves the eta and phi locations of each jet too well to account for the poor mass resolution. Since a larger angle means a larger mass, an angle uncertainty of this size should not produce a large uncertainty in the mass. This is why we have decided to pursue correction of the jet energy: if our algorithm can more accurately find the correct energies, it should tag boosted tops better than conventional methods.

After current jet correction function

Figure 9. Pair masses (which represent the W mass resolution) after applying our current jet correction function. Note that the resolution is actually worse than before, with masses now skewed high, probably because the jet correction function overcompensates as a result of the QCD radiation in the sample used to generate it. Again, this is from 500 boosted top events from a 3000 GeV Z’.

When our current jet correction function is applied to the jet energies found by fitting, we still cannot resolve the W mass. Our jet correction function actually overcompensates for the current bias. The reason is unknown. However, a function for which the true parton matching requires jets to be within 0.1 in ΔR space of the true jets should correct the bias; in this way, any radiation detected is more likely to be counted with the reconstructed jet rather than as a separate jet.

The plot in Figure 9 was made without including the subtraction step, thus eliminating the large number of events at zero mass. However, note that the masses are actually biased higher than they should be; the maximum should again be at 81 GeV. This result is probably because of the overcompensation for the bias in the energy deviation. In addition, the overcompensation will produce a high bias for the top mass. We are still searching for a correction function that works for boosted top jets.

The preliminary correction function we applied overcompensates for the loss in energy, as can be seen in Figure 10. Therefore, this function does not seem to help resolve the W mass. In fact, the mass resolution for the W is now even worse, as masses seem to be biased high as a result of the high energy bias. Note that the peak, the location of the mean bias, is now well above 20 GeV, which is worse than our previous attempts. Future efforts will be directed toward a jet correction function that will increase the fit energy just enough to compensate for the bias, and not more.

Figure 10. Parton energy resolution after applying jet correction. Note that the function, by overcompensating, has made the bias opposite to the original bias.

Correction to energy

In order to correct for the missing energy discussed previously, we need to find the parameters that account for the missing energy. To do this, we generate a jet correction function, which is the ratio of true to reconstructed energy averaged over “bins” of reconstructed energy and width. We then multiply the energies of the jets by this ratio. This was done for a 30,000 event sample for this thesis; more events will be used in the future. When this function is applied to the mass resolution of the W, it appears to overcompensate for the missing energy. Work to resolve this problem is underway.
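A binned correction of this kind can be sketched as a lookup table. The example assumes the table stores the mean ratio of true to reconstructed energy in each (energy, width) cell and that empty cells apply no correction; the function names and the bin edges in the usage example are hypothetical.

```python
from bisect import bisect_right


def find_cell(value, edges):
    """Index of the cell containing `value`, clamped to the outermost cells."""
    return min(max(bisect_right(edges, value) - 1, 0), len(edges) - 2)


def build_correction(samples, e_edges, w_edges):
    """samples: (reconstructed energy, width, true energy) for matched jets.
    Returns table[i][j] = mean of true/reconstructed energy in each cell,
    or 1.0 (no correction) for cells with no entries."""
    n_e, n_w = len(e_edges) - 1, len(w_edges) - 1
    total = [[0.0] * n_w for _ in range(n_e)]
    count = [[0] * n_w for _ in range(n_e)]
    for reco_e, width, true_e in samples:
        i, j = find_cell(reco_e, e_edges), find_cell(width, w_edges)
        total[i][j] += true_e / reco_e
        count[i][j] += 1
    return [[total[i][j] / count[i][j] if count[i][j] else 1.0
             for j in range(n_w)] for i in range(n_e)]


def corrected_energy(reco_e, width, table, e_edges, w_edges):
    """Scale a reconstructed energy by the correction for its cell."""
    return reco_e * table[find_cell(reco_e, e_edges)][find_cell(width, w_edges)]


e_edges = [0.0, 100.0, 200.0]   # hypothetical energy bin edges in GeV
w_edges = [0.0, 0.05, 0.1]      # hypothetical width bin edges
table = build_correction([(80.0, 0.03, 100.0), (90.0, 0.04, 100.0)],
                         e_edges, w_edges)
fixed = corrected_energy(80.0, 0.03, table, e_edges, w_edges)
```

Because the table is built from one sample and applied to another, any difference between the two samples, such as overlapping jets, carries over directly into the corrected energies.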

Discussion

The conventional algorithms for jet resolution are based on grouping of particles into jets and therefore do not necessarily locate the jets correctly. By fitting we seem to be able to resolve the problem of jet locations. The algorithm also does a very good job at determining the number of partons, and therefore can tag boosted tops. However, the algorithm does not reconstruct the energy of the original parton as fully as other algorithms do. For this reason, the W mass and top mass resolutions are actually poorer than those of other algorithms, and therefore the Jet Flow algorithm does not yet compete with other algorithms in tagging tops. However, the excellent eta and phi resolutions show promise for outperforming the algorithms currently in use, if we can get the energy resolution to within 5% of the true energies.

Accuracy of chi-square sigma function

The chi-square sigma function we use in this report is based on fits using an unweighted function of the sum of the squares of the errors. Thus, it may not be the best representation of the actual deviations. With more time, we could use our chi-square function to obtain a new chi-square function, and then repeat this process until we converged on a good function to use, which should be more or less constant with respect to these iterations. This poor chi-square function is probably the reason for at least some of the uncertainty in the energy resolution. In the future, we will investigate better chi-square functions.

Chi-square distribution and radiation

The behavior of the chi-square as a function of the number of degrees of freedom was rather unexpected. Since the sigma function is designed so that each bin of a good fit contributes about 1 to the chi-square, the chi-square should have been roughly proportional to the number of degrees of freedom. Instead it blows up at large numbers of degrees of freedom. This is probably due to bins with very low energies outside the Gaussian jet. If we were not ignoring bins below a certain energy value, the contributions of these bins would add up and the contributions from the real jet errors would be insignificant. Here, the number of degrees of freedom is related to the number of bins used in the histogram, and a larger number of bins has a greater probability of erroneously including bins outside of the jet. Thus, these contributions start adding to the chi-square at large numbers of degrees of freedom. In the future, we may add a constant term in the sigma function rather than ignoring bins and creating different degrees of freedom for each histogram.

Distinguishing jets

Despite the mass resolution and energy setbacks, the Jet Flow algorithm seems to do a fairly good job in distinguishing the jets. The dijet number distribution peaks at one, as expected, and the boosted top distribution peaks at three, as expected. Because new jets are added based on probability, the number of dijet events with 2, 3, and 4 jets should have decreased exponentially with the number of jets found. This would have been due to the probability of a “good” fit being regarded incorrectly as bad. However, we actually saw a very large number of dijet events with two Gaussians, in comparison with the single-jet histograms. This is probably because the software used to find the maxima uses a fixed threshold relative to the maximum to decide whether a peak is high enough to be included in the list of maxima found. There could be some background QED radiation or other fluctuations in the outside region of the histogram that could produce “false” jets.

It is also possible that a hard QCD radiation gluon could genuinely produce a second Gaussian in the histogram. In such a case, the number of jets is correct. Both this reason and the one discussed previously could contribute to the large number of dijet events observed with two Gaussians. However, the number of dijet histograms with three and four Gaussians is unexpectedly large. The three-Gaussian dijets may be a combination of hard QCD radiation and false maxima, but this does not account for the four-Gaussian dijets. One possible explanation for the four-Gaussian dijets is that two-Gaussian hard QCD jets register as bad fits and TSpectrum is finding multiple maxima after subtraction. Since the number of Gaussians is limited to four, events that would need more Gaussians than this limit will show up as four in Figure 4. To resolve these problems, a step may be included in the algorithm that determines whether a maximum found is significant compared with the expected deviations in the bin contents, and discards it if it is not.

Energy and Mass Resolution and Bias

The poor W mass resolution appears to be related to the poor energy resolution, since our eta and phi resolution is too good to account for it. The poor energy resolution was unexpected at first, since mere clustering seems to produce better results (5). However, we may still be able to correct for the missing energy using a jet correction function to obtain a good W peak, and the correct resolution of the number of jets shows promise for this algorithm in tagging boosted top events against dijet events.

Our jet correction function in this investigation worked poorly as well. The exact reason for this is unknown, but it may be related to the fact that the jets in the boosted top events overlap, whereas the dijet sample used to generate the correction function does not contain overlapping jets. Alternatively, there may be an unrecognized parameter, such as the number of reconstructed particles, that is not currently represented in the function and should be considered. The pseudorapidity of the boosted top events is rather small by comparison, so the jets are less likely to be elongated for the boosted top events than for the dijet events we have been studying. The jet correction function is based on a relatively small number of events, about 30,000; for an average of 2 jets per event, this means about 60,000 jets. With 25 cells, this means an average of about 2,400 jets per cell. This implies a relatively large statistical fluctuation, on the order of 1/√2400 ≈ 0.02, in the ratio of reconstructed to true energy. In the future, we may use a sample of more events (probably 100,000,000).
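The estimate above can be reproduced with a short calculation. This is only a sketch of the arithmetic: the function name `ratio_fluctuation` is hypothetical, and it assumes the jets are spread evenly over the cells and that the relative fluctuation scales as one over the square root of the number of jets per cell.

```python
import math


def ratio_fluctuation(n_events, jets_per_event, n_cells):
    """Return (jets per cell, approximate relative fluctuation 1/sqrt(N))."""
    # Assumption: jets are distributed evenly over the correction-function cells.
    jets_per_cell = n_events * jets_per_event / n_cells
    return jets_per_cell, 1.0 / math.sqrt(jets_per_cell)


jets, fluct = ratio_fluctuation(30_000, 2, 25)
print(f"{jets:.0f} jets per cell, relative fluctuation about {fluct:.3f}")
```

Under the same assumptions, calling `ratio_fluctuation(100_000_000, 2, 25)` gives a fluctuation of roughly 0.0004, which indicates how much a larger sample could reduce the statistical contribution.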

It is remarkable that the bias for dijet events, 20 GeV, is higher than the bias for boosted top events, about 5 GeV, even though the uncertainty for boosted top events is much higher. The reason for this is an area for future investigation, but it may be related to the large overlap between jets from boosted top decays compared with jets from QCD events. Throughout this study we have assumed a Gaussian distribution of energy for each jet. In reality the jets may not have a true Gaussian shape, so if one jet overlaps with another, the fit might put too much energy into one of the jets even as it puts the right amount into the other. In the future, we will study the shapes of the jets.

Conclusion

The Jet Flow algorithm shows promise for tagging boosted top events. It is good at resolving the correct number of jets, and the eta and phi resolution is within 0.03. The energy resolution is less impressive, with an uncertainty greater than 50 GeV for boosted top type events. However, it is likely that we will succeed in getting the energy resolution down to a level where we can resolve the W boson in boosted top events. This algorithm is quite promising for boosted objects in general, such as events caused by the hypothetical “t’” partner to the top quark, which decays to a boosted W boson. Once we get a good energy resolution with the fitting approach, the Jet Flow algorithm may open up new channels of research into exotic new physics at the Large Hadron Collider.

Acknowledgements

This research is performed in collaboration with Dr. John Stephen Conway at the University of California, Davis, for the Davis Boosted Top Group division of the UC Davis CMS Group.

Works Cited

1. Yes, we did it! The CERN Bulletin. 2010, 14-15/2010.

2. CDF Collaboration. Search for Z' → e^+e^- Using Dielectron Mass and Angular Dist. Phys. Rev. Lett., 96, 211801 <arXiv:hep-ex/0602045v1>.

3. Amsler, C. et al. (Particle Data Group), PL B667, 1 (2008) and 2009 partial update for the 2010 edition. (URL: http://pdg.lbl.gov)

4. CMS Collaboration. Particle-Flow Event Reconstruction in CMS and Performance for Jets, Taus and E_T^miss. [CERN CDS information server, CMS Physics Analysis Summary] Geneva, Switzerland: CMS Collaboration, 2009. CMS PAS PFT-09/001.

5. UC Davis Boosted Top Group. Boosted Top Meeting, April 2010.
