Man Vs. DataRobot
Updated: Sep 20, 2018
In the spirit of Cleartelligence's recent partnership with DataRobot, I decided to put the robot to the test and compete against it in ESPN’s Capital One Bowl Mania competition. The goal of the game is to accurately predict the outcome of each College Football Bowl game.
What is DataRobot?
DataRobot is an automated machine learning platform that enables users of all skill levels to build and deploy highly accurate machine learning models in a fraction of the time of traditional modeling methods. Incorporating a library of hundreds of the most powerful open source machine learning algorithms, the DataRobot platform automates, trains and evaluates predictive models in parallel, delivering more accurate predictions at scale.
There are 41 Bowl Games and you will select the winner of each game. In addition to selecting the winner, you must assign a point value to each of your selections; starting with 1 point for the pick you are least sure about, all the way up to 41 points for the pick that you want to be worth the most. Each number between 1 and 41 must be used once and only once. The higher the number you assign, the more points you’ll receive for being correct.
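To make the scoring rule concrete, here is a minimal sketch of how an entry could be validated and scored. The function names, data layout, and team names are hypothetical, chosen only for illustration; this is not ESPN's implementation.

```python
def validate_entry(picks, n_games=41):
    """Check that each point value from 1 to n_games is used exactly once.

    picks maps a game name to a (picked_team, points) tuple.
    This layout is an assumption made for this sketch.
    """
    points = sorted(p for _, p in picks.values())
    if points != list(range(1, n_games + 1)):
        raise ValueError("each value from 1 to n_games must be used exactly once")


def score_entry(picks, winners):
    """Sum the points assigned to every pick that matches the actual winner.

    winners maps a game name to the winning team; games not yet played
    are simply absent and earn nothing.
    """
    return sum(
        points
        for game, (team, points) in picks.items()
        if winners.get(game) == team
    )


# Toy example with three made-up games instead of 41.
picks = {
    "Bowl A": ("Team 1", 3),  # most confident pick
    "Bowl B": ("Team 4", 1),  # least confident pick
    "Bowl C": ("Team 5", 2),
}
winners = {"Bowl A": "Team 1", "Bowl B": "Team 3", "Bowl C": "Team 5"}

validate_entry(picks, n_games=3)
total = score_entry(picks, winners)  # 3 + 2 = 5; the Bowl B pick was wrong
```

Because the high-value picks count for so much, getting the confidence ordering right matters almost as much as picking the winners.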
I just went with my gut and simply selected which team I thought would win each game and did my best to rank them in terms of confidence; no science here!
The first step in enabling DataRobot to make predictions is to create a dataset that the platform can use to train and evaluate its hundreds of machine learning algorithms. Using Alteryx, I scraped data from sports-reference.com and compiled a dataset with results and statistics from every game for every team in FBS college football. The dataset that I uploaded to DataRobot included a total of 125 features (i.e., variables or columns). The feature list included team and opponent name, conference, strength of schedule, and a number of offensive and defensive statistics such as average pass yards, rush yards, touchdowns, and turnovers.
Once I was happy with my dataset, I simply uploaded it to DataRobot and let it run on autopilot. DataRobot’s autopilot feature, in layman’s terms, runs all of the machine learning algorithms included in the platform and ranks the resulting models by how well they fit the data. It then blends some of the best-fitting models to create new models that fit the data even better, and ranks those as well.
Now that I had the best-fit model (which, for anyone interested, was a blend of an Elastic-Net Regressor, Vowpal Wabbit Regressor, and Ridge Regressor), the next step was to create a dataset covering each of the 41 bowl games, with the same features as the dataset I used to train the models. I then uploaded that dataset to DataRobot and selected the best-fit model identified by DataRobot to predict the outcome of each game. Finally, I ranked the outcomes based on the confidence of each prediction and entered the picks; let the games begin!
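The ranking step can be sketched in a few lines of Python. This is only an illustration under a stated assumption: it treats each prediction as a predicted point margin and uses its absolute size as the confidence measure. How the confidence was actually derived from the DataRobot output may have differed, and the function name and data layout here are hypothetical.

```python
def assign_confidence_points(predictions):
    """Turn per-game predictions into picks with confidence points.

    predictions is a list of (game, team_a, team_b, predicted_margin)
    tuples, where a positive margin means team_a is predicted to win.
    Treating |margin| as confidence is an assumption of this sketch.
    """
    # Least confident first, so it receives 1 point; most confident gets N.
    ranked = sorted(predictions, key=lambda p: abs(p[3]))
    entry = {}
    for points, (game, team_a, team_b, margin) in enumerate(ranked, start=1):
        pick = team_a if margin > 0 else team_b
        entry[game] = (pick, points)
    return entry


# Toy example with three made-up games and margins.
predictions = [
    ("Bowl A", "Team 1", "Team 2", 3.5),
    ("Bowl B", "Team 3", "Team 4", -10.0),
    ("Bowl C", "Team 5", "Team 6", 0.5),
]
entry = assign_confidence_points(predictions)
```

In this toy run the lopsided Bowl B prediction gets the highest point value and the near toss-up in Bowl C gets the lowest.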
I created the dashboard above in Tableau to track the results of each game as they occurred. The Crowd is the consensus pick of everyone who played the game on ESPN; this was my control. The top of the dashboard shows the total score, as well as which team each competitor selected for the next upcoming game and how many points are at stake. The donut charts on the left show the percentage of correct picks made by each competitor. The horizontal bars in the middle show the total points earned in the darker color and the total potential points remaining in the lighter color. On the right is a calculation of each competitor's chance of winning the competition: the higher the percentage, the more of the trophy is filled in.
And the Winner is...
As you can see, I was no match for DataRobot in the end, which outscored me by nearly 200 points. DataRobot performed extremely well overall, as evidenced by how much the model outscored the general consensus, and it finished in the top 7% of all entries on ESPN.
The chart above displays the total score by game to show how the competition played out:
The competition between me and DataRobot was tight for about 17 games, before DataRobot ran away with it. The Crowd hung in a little longer and kept it close for about 27 games, but in the end neither of us was much competition for DataRobot.