### Predicting QB Success in the NFL

Last year I wrote and submitted a paper for the MIT Sloan Sports Analytics Conference. While my abstract was accepted, my paper was not. The paper was titled "Reducing Risk in the NFL Draft: Using Machine Learning Algorithms to Predict Success in the NFL." You can read the full paper here.

In it, I describe a decision tree model that predicts a college QB's success in the NFL. To train the model, I used over 40 variables, including college stats, school competitiveness, combine performance, and text mining of pro scouting reports. The final model used only four of them: college win %, body mass index (BMI), college games started per season, and age. It was 88% accurate in predicting whether a college player would be a success or a bust in the NFL, so it can be used to predict whether the top prospects in this year's draft will succeed.
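The R code isn't included here, so below is a minimal, hypothetical Python sketch (standard library only) of how a CART-style decision tree could pick splits on the four final variables by minimizing Gini impurity. It is not the model from the paper: the feature names (`win_pct`, `bmi`, `starts_per_season`, `age`), the toy prospects and their labels, and the `max_depth=3` limit are all invented for illustration. The only external fact it relies on is the standard imperial BMI formula, 703 × weight (lb) / height (in)².

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical names for the four final variables described in the post.
FEATURES = ["win_pct", "bmi", "starts_per_season", "age"]


def bmi_from_imperial(weight_lb: float, height_in: float) -> float:
    """Standard BMI conversion for pounds and inches: 703 * lb / in^2."""
    return 703.0 * weight_lb / height_in ** 2


def gini(labels: list) -> float:
    """Gini impurity of 0/1 labels (1 = success, 0 = bust)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


@dataclass
class Node:
    prediction: int                  # majority label of the rows reaching this node
    feature: Optional[str] = None    # None means this node is a leaf
    threshold: float = 0.0
    left: Optional["Node"] = None    # rows with feature <= threshold
    right: Optional["Node"] = None   # rows with feature > threshold


def best_split(rows, labels):
    """Try every feature and midpoint threshold; return the split with the
    lowest size-weighted Gini impurity as (feature, threshold, score)."""
    best = None
    n = len(rows)
    for f in FEATURES:
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (f, t, score)
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Grow the tree greedily until nodes are pure, too deep, or unsplittable."""
    node = Node(prediction=int(2 * sum(labels) >= len(labels)))
    if depth >= max_depth or gini(labels) == 0.0:
        return node
    split = best_split(rows, labels)
    if split is None or split[2] >= gini(labels):
        return node  # no split improves purity, so keep this node as a leaf
    node.feature, node.threshold, _ = split
    go_left = [r[node.feature] <= node.threshold for r in rows]
    node.left = build_tree([r for r, g in zip(rows, go_left) if g],
                           [y for y, g in zip(labels, go_left) if g], depth + 1, max_depth)
    node.right = build_tree([r for r, g in zip(rows, go_left) if not g],
                            [y for y, g in zip(labels, go_left) if not g], depth + 1, max_depth)
    return node


def predict(node, row):
    """Follow the splits from the root down to a leaf and return its label."""
    while node.feature is not None:
        node = node.left if row[node.feature] <= node.threshold else node.right
    return node.prediction


# Invented toy prospects for illustration only; NOT real players or the paper's data.
rows = [
    {"win_pct": 0.80, "bmi": bmi_from_imperial(225, 76), "starts_per_season": 12, "age": 21},
    {"win_pct": 0.75, "bmi": bmi_from_imperial(220, 75), "starts_per_season": 11, "age": 22},
    {"win_pct": 0.45, "bmi": bmi_from_imperial(215, 74), "starts_per_season": 10, "age": 23},
    {"win_pct": 0.70, "bmi": bmi_from_imperial(230, 77), "starts_per_season": 6, "age": 22},
    {"win_pct": 0.55, "bmi": bmi_from_imperial(210, 73), "starts_per_season": 7, "age": 24},
    {"win_pct": 0.85, "bmi": bmi_from_imperial(228, 76), "starts_per_season": 13, "age": 21},
]
labels = [1, 1, 0, 0, 0, 1]  # 1 = success, 0 = bust (made-up labels)

tree = build_tree(rows, labels)
print("root split:", tree.feature, round(tree.threshold, 3))
```

Because these toy rows are perfectly separable, the sketch stops after a single split on `win_pct`; with real data the tree would grow deeper and would normally be pruned. R's `rpart` package implements the same CART family of algorithms.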

Below is an interactive version of that final QB model.

1. Where would Ryan Leaf and Peyton Manning fall in the decision tree? That would be a useful out-of-sample validation.

1. Good idea. I just ran them. Peyton was predicted a success: the right win %, starts/season, and age. Leaf was actually predicted a bust: not enough starts/season. Thanks.


3. Very cool. I work with researchers at Georgia Tech who did something in a similar vein: http://www.cc.gatech.edu/gvu/ii/sportvis/nfldraft/run/

4. How is Romo an actual bust? How is Griffin not a bust?

1. Good point.

Romo was classified as a bust. Bust status is based on approximate value (AV) and starts/season. Romo had a high AV but few starts: 9.8 games per season. He was actually right on the edge.

Subjectively, it's debatable whether Romo was a success given his injury history. Great player, but he didn't play enough.

RG3's classification is wrong. The data is two seasons old, since I built the model 1.5 years ago. Today RG3 would be classified as a bust.

See more details in the paper. Thanks.

Was the actual decision tree built in R? Is it possible to post the R code here?

1. It was built in R.

2. Would it be possible to post the R code?


4. Hi, if possible, I'm interested in the R code too. I'm from Brazil and I'm doing research on this same subject using R. Thank you, and congrats on this wonderful study.


7. What does body mass have to do with success, and how did you determine that it was one of the top four traits to look at?

1. The decision tree model selected BMI as one of its splitting variables. You can read more in the paper.

8. The fact that Mahomes is labeled a bust makes this study a joke.

1. My model was obviously wrong. Mahomes is an outlier in many ways. He didn't have a single winning season in college, and no pro QB had been successful in the NFL after a losing record in college.

Remember: all models are wrong, but some are useful. Also, it's really easy to chime in anonymously nearly two years after my paper was published and after Mahomes had an MVP season.
