This March Madness, we’re using machine learning to predict upsets

“Beware the Ides of March.” Yes, it’s finally that time of year again: when the emperors of college basketball must watch their backs, lest the lowly bottom seeds of the tournament strike.

Before March 15, millions around the world will fill out their March Madness brackets. In 2017, ESPN received a record 18.8 million brackets.

The first step to a perfect bracket is correctly choosing the first round. Unfortunately, most of us can’t predict the future. Last year, only 164 of the submitted brackets were perfect through the first round – less than 0.001 percent.

Many brackets are busted when a lower-seeded team upsets the favored higher seed. Since the field expanded to 64 teams in 1985, an average of at least eight upsets has occurred each year. If you want to win your bracket pool, you’d better pick at least a few upsets.

We’re two math Ph.D. candidates at the Ohio State University who have a passion for data science and basketball. This year, we decided it would be fun to build a computer program that uses a mathematical approach to predict first-round upsets. If we’re right, a bracket picked using our program should perform better through the first round than the average bracket.

Fallible humans

It’s not easy to identify which of the first-round games will result in an upset.

Say you have to decide between the No. 10 seed and the No. 7 seed. The No. 10 seed has pulled off upsets in its past three tournament appearances, once even making the Final Four. The No. 7 seed is a team that’s received little to no national coverage; the casual fan has probably never heard of them. Which would you choose?

If you chose the No. 10 seed in 2017, you would have gone with Virginia Commonwealth University over Saint Mary’s of California – and you would have been wrong. Thanks to a decision-making fallacy called recency bias, humans can be tricked into giving too much weight to their most recent observations when making a decision.

Recency bias is just one of many biases that can infiltrate someone’s picking process. Maybe you’re biased toward your home team, or maybe you identify with a player and desperately want him or her to succeed. All of this influences your bracket in a potentially negative way. Even seasoned professionals fall into these traps.

Modeling upsets

Machine learning can defend against these pitfalls.

In machine learning, statisticians, mathematicians and computer scientists train a machine to make predictions by letting it “learn” from past data. This approach has been used in many diverse fields, including marketing, medicine and sports.
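
To make “learning from past data” concrete, here is a minimal sketch in Python. It is not our actual model: it fits a plain logistic regression, one common technique, to synthetic games generated by the script itself. The single input, a “rating gap” between the favorite and the underdog, and every function name are assumptions made up for illustration.

```python
import math
import random


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(features, labels, lr=0.5, epochs=1000):
    """Fit weights and a bias by gradient descent on the average log loss."""
    n = len(features)
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y  # gradient of the log loss with respect to the score
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict_upset_probability(weights, bias, x):
    """Estimated probability that the lower seed wins."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# SYNTHETIC data, not real tournament results. Each "game" has one
# hypothetical feature: the rating gap (favorite minus underdog).
# By construction, a smaller gap makes an upset more likely.
rng = random.Random(0)
features, labels = [], []
for _ in range(400):
    gap = rng.uniform(0.0, 2.0)
    p_upset = sigmoid(1.0 - 2.0 * gap)
    features.append([gap])
    labels.append(1 if rng.random() < p_upset else 0)  # 1 = upset

weights, bias = train_logistic(features, labels)

# Ask the trained model about two new, unseen matchups.
close_game = predict_upset_probability(weights, bias, [0.2])
mismatch = predict_upset_probability(weights, bias, [1.8])
print(f"close game: {close_game:.2f}, mismatch: {mismatch:.2f}")
```

The program is never told that close matchups produce more upsets. It infers that pattern from the labeled examples, which is the sense in which a machine “learns” from past data.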