Working with Gradient Boosting Machine (GBM)
A closer look at the data set
Cleaning up the data set
Results after cleanup
Next steps
Each tree is grown using information from the previous ones.
After evaluating the first tree, increase the weights of observations that are hard to classify and lower the weights of those that are easy to classify.
In other words, given the current model, fit a tree to the current residuals rather than to the outcome.
This method slowly converts weak learners into a strong one.
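The residual-fitting idea can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the GBM implementation used for the results in these slides: it uses one-split trees (stumps) on a single input, a squared-error loss, and toy data invented for the example. The names fit_stump, boost and predict are hypothetical.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree to the residuals by least squares."""
    best = None
    # Dropping the largest value keeps both sides of every split non-empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left_mean, right_mean)


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def boost(xs, ys, n_trees=200, shrinkage=0.1):
    """Grow trees one at a time, each fitted to the current residuals."""
    base = sum(ys) / len(ys)          # initial model: the mean outcome
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Shrinkage scales down the contribution of every new tree.
        preds = [p + shrinkage * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
    return {"base": base, "shrinkage": shrinkage, "stumps": stumps}


def predict(model, x, n_trees=None):
    """Predict with the first n_trees trees (all trees if n_trees is None)."""
    used = model["stumps"][:n_trees]
    return model["base"] + model["shrinkage"] * sum(
        predict_stump(s, x) for s in used)


def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


# Toy data (an assumption for illustration, not the track sample).
xs = [i / 10 for i in range(50)]
ys = [x * x for x in xs]
model = boost(xs, ys)
mse_0 = mse(ys, [predict(model, x, n_trees=0) for x in xs])
mse_10 = mse(ys, [predict(model, x, n_trees=10) for x in xs])
mse_200 = mse(ys, [predict(model, x) for x in xs])
```

On the training data the error can only go down as trees are added, which is why the number of trees has to be chosen on a separate test sample.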
Sample of 10K tracks
Training phase: 7K tracks
  Inputs for GBM: x_1, x_2, x_3, x_4, y_1, y_2, y_3, y_4, z_1, z_2, z_3, z_4
  Number of trees = 200
  Shrinkage = 0.1 (10%)
  Interaction depth = 10
  GBM distribution = gaussian
Testing phase: 3K tracks
  39 values of the number of trees, from 10 to 200 (in steps of 5)
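For reference, the setup above can be written down as a small configuration plus a 7K/3K split. This is only a sketch: the slides do not say how the tracks were divided, so the seeded random shuffle is an assumption, and the names GBM_CONFIG, FEATURES and split_tracks are hypothetical.

```python
import random

# Settings quoted on the slide; the key names are illustrative only.
GBM_CONFIG = {
    "n_trees": 200,
    "shrinkage": 0.1,
    "interaction_depth": 10,
    "distribution": "gaussian",
}

# Input columns: coordinates of the first four hits of each track.
FEATURES = [f"{axis}_{i}" for axis in "xyz" for i in range(1, 5)]


def split_tracks(tracks, n_train, seed=0):
    """Shuffle reproducibly, then cut into training and testing parts.

    Assumption: a random split; the original split method is not stated.
    """
    shuffled = list(tracks)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


tracks = list(range(10_000))  # stand-in track IDs, not real data
train, test = split_tracks(tracks, n_train=7_000)
```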
Trying to predict the 5th tracking hit
Compute the mean squared error (MSE) for each number of trees \(n_{tree}\) and look for the minimum
Considering 39 values of \(n_{tree}\) between 10 and 200 (in steps of 5)
N = 3K test tracks
\[MSE_{n_{tree}} = \frac{\sum_{i=1}^{N} (x^{Test}_{i} - x^{Predicted}_{i})^2}{N}\]
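The scan over the number of trees can be sketched as follows. The MSE function follows the formula above; everything else is an assumption for illustration: predict_with stands in for whatever returns the model's predictions when only the first n_tree trees are used, and the toy predictor is invented so that the example runs on its own.

```python
def mean_squared_error(x_test, x_predicted):
    """MSE between the true and the predicted 5th-hit coordinate."""
    if not x_test or len(x_test) != len(x_predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(x_test, x_predicted)) / len(x_test)


# 10, 15, ..., 200: the 39 tree counts considered.
N_TREE_GRID = list(range(10, 201, 5))


def scan_tree_counts(x_test, predict_with, grid=N_TREE_GRID):
    """Return {n_tree: MSE} for every tree count in the grid.

    predict_with(n_tree) is a hypothetical callable that returns the
    predictions obtained with the first n_tree trees.
    """
    return {n: mean_squared_error(x_test, predict_with(n)) for n in grid}


def best_tree_count(mse_by_n):
    """Tree count with the smallest test MSE."""
    return min(mse_by_n, key=mse_by_n.get)


# Toy stand-in for a trained model: the error is smallest at 120 trees.
x_test = [1.0, 2.0, 3.0]


def toy_predict(n_tree):
    offset = abs(n_tree - 120) / 100.0
    return [x + offset for x in x_test]


scores = scan_tree_counts(x_test, toy_predict)
best = best_tree_count(scores)
```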
Looking at the central region of the detector
The distance between plates seems to be greater than 20 mm
So a cut at 20 mm looks fine
What about the horizontal points in a straight line?
Do they seem reasonable?
Actually, it is pure physics!
Tune GBM parameters
Try to predict all the other hits in the track
Find the nearest neighbor of each predicted point
Counting out more parameters
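The nearest-neighbor step listed in the next steps could look like the sketch below. It assumes hits are (x, y, z) tuples and that plain Euclidean distance is the right measure; neither is stated in the slides, and nearest_hit is a hypothetical name. A brute-force search like this one is fine for a handful of candidates per plate; a spatial index would be worth considering for large hit collections.

```python
import math


def nearest_hit(predicted, hits):
    """Return the measured hit closest to the predicted point.

    Assumption: Euclidean distance in (x, y, z).
    """
    if not hits:
        raise ValueError("no candidate hits given")
    return min(hits, key=lambda hit: math.dist(predicted, hit))


# Toy coordinates, not real detector data.
candidates = [(1.0, 0.0, 0.0), (0.0, 0.5, 0.0), (3.0, 3.0, 3.0)]
closest = nearest_hit((0.0, 0.0, 0.0), candidates)
```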