Statistical Prediction or Human Intuition? The Story of Moneyball

One could reasonably come away from Michael Lewis’ fantastic Moneyball with the impression that it is a drama between stat nerds and old baseball scouts. This impression is strongest in passages where Billy Beane, the general manager of the Oakland Athletics, debates his seasoned scouts about drafting Jeremy Brown, a senior catcher from the University of Alabama. The scouts were less than enthused about Brown because his 5-foot-10-inch, 226-pound frame did not fit the ideal image of a baseball athlete. Indeed, one of the scouts says of Brown:

“ ‘This kid wears a large pair of underwear’,…..

‘Okay’, says Billy.

‘It’s soft body,’ says the most vocal old scout. ‘A fleshy kind of a body.’

‘Oh, you mean like Babe Ruth?’ Says Billy…..

‘I don’t know’, says the scout. ‘A body like that can be low energy.’

‘Sometimes low energy is just being cool,’ says Billy.

‘Yeah,’ says the scout. ‘Well, in this case low energy is because when he walks, his thighs stick together.’

‘I repeat: we’re not selling jeans here,’ says Billy.’ ”

The exchange between Beane and the scouts is humorous, but it also suggests something important about decision-making, whether for a baseball team or any other business. Lewis was not aware of it when he wrote Moneyball, but Richard Thaler, a leading behavioral economist, and the law professor Cass Sunstein observed that the scouts were basing their judgments of a player’s performance on a simple rule of thumb known as the availability heuristic. When scouts think of a successful baseball player, the image of someone tall and lean comes readily to mind. The availability of this image led the scouts to believe that players with athletic bodies are more successful than those with less athletic ones.

Rule-of-thumb thinking is understandable because people have neither the cognitive capacity nor the time to assess every detail before making a decision, but Beane realized that this way of thinking was no longer viable. The Oakland A’s had lost three of their best players and were competing against teams, such as the New York Yankees, with triple their budget. To stand a chance, Beane relied heavily on the statistical mind of Paul DePodesta, a Harvard economics graduate. Although DePodesta had no experience playing professional baseball, his strong background in statistics and his knowledge of earlier work by baseball analysts such as Bill James proved to be a game-changer.

Building on James’ insights, DePodesta used regression analysis, a common method in econometrics that data scientists classify as a supervised learning technique, to determine which statistics (batting average, slugging percentage, etc.) predict offensive success. He found that on-base percentage was by far the strongest predictor. His analysis suggested that Jeremy Brown was likely to post a high on-base percentage, and indeed Brown had an on-base percentage of 0.451 in the 2002 season, considerably higher than the major league average. Using DePodesta’s analysis, Beane transformed how the Oakland A’s recruited players, and the team won 103 games, matching the Yankees and beating its own record from the previous year.
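To make the regression idea concrete, here is a minimal sketch in Python. It is not DePodesta’s actual model, which the book does not spell out in this detail. It only shows the standard on-base percentage formula and a one-variable least-squares fit. The function names and the team numbers are invented for illustration and say nothing about real baseball.

```python
from statistics import mean


def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sac_flies):
    """Standard OBP formula: times on base over the plate appearances that count."""
    return (hits + walks + hit_by_pitch) / (at_bats + walks + hit_by_pitch + sac_flies)


def fit_line(xs, ys):
    """Ordinary least squares with one predictor.

    Returns (intercept, slope, r_squared).
    """
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return intercept, slope, 1 - ss_res / ss_tot


# Hypothetical team-season numbers, made up purely to show the mechanics.
team_obp = [0.310, 0.320, 0.330, 0.340, 0.350]
team_runs = [650, 700, 745, 800, 850]

intercept, slope, r_squared = fit_line(team_obp, team_runs)
print(f"runs ~ {intercept:.0f} + {slope:.0f} * OBP (R^2 = {r_squared:.3f})")
```

Running the same fit with batting average or slugging percentage as the predictor and comparing the R-squared values is the simplest version of asking which statistic best predicts runs scored. A real analysis would use many seasons of actual data and several predictors at once.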

Although DePodesta’s statistical analysis revolutionized how major league baseball teams recruited players, it would be incorrect to conclude that baseball teams (or firms generally) should rely solely on statistical analysis or that information from scouts is always inferior. In The Signal and the Noise, Nate Silver argues that the real lesson of Moneyball is that optimal decision-making occurs when a team has an effective process in which all information is collected and weighed appropriately. For one thing, Beane has increased the scouting budget since the 2002 season. He realized that relentless information gathering is the key to a baseball team’s success and should not be limited by the type of information, whether quantitative or qualitative. Some information cannot be easily measured but is still relevant. As Silver notes,

“If Prospect A is hitting 0.300 with twenty home runs and works at a soup kitchen during his off days, and Prospect B is hitting 0.300 with twenty home runs but hits up night clubs and snorts coke during his free time, there is probably no way to quantify this distinction. But you’d sure as hell want to take it into account”.

The other limitation of statistical analysis is that, in most cases, statistical predictions still require human judgment. The economists Ajay Agrawal, Joshua Gans and Avi Goldfarb argue in Prediction Machines that since no prediction is 100% accurate, decision makers in a firm (or baseball team) still have to weigh the benefits and costs of each action. Though statistics on offensive success are fairly reliable, Silver points out that statistics on defensive success are less so because defense is difficult to quantify. Ignoring a player’s defensive capabilities, however, is costly. Silver estimated that the Oakland A’s defective defense cost them 8 to 10 wins per season in the mid-1990s. Thus, the general manager, analysts, and scouts have to gather all relevant information and deliberate on how much weight to put on a player’s offensive and defensive capabilities.
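The split between prediction and judgment can be sketched as a small expected-value calculation. This is an illustration of the general idea and not a method taken from Prediction Machines: the statistical model supplies a probability, and people supply the payoffs attached to each outcome. The function names, probabilities, and payoffs below are all hypothetical.

```python
def expected_value(p_success, payoff_success, payoff_failure):
    """Weigh each outcome's payoff by how likely the prediction says it is."""
    return p_success * payoff_success + (1 - p_success) * payoff_failure


def best_action(actions):
    """Pick the action with the highest expected value.

    actions maps a name to (p_success, payoff_success, payoff_failure).
    """
    return max(actions, key=lambda name: expected_value(*actions[name]))


# Hypothetical numbers: the probability comes from a model, while the
# payoffs (in arbitrary units) come from human judgment.
options = {
    "draft_player": (0.30, 10.0, -2.0),
    "pass": (1.00, 0.0, 0.0),
}

for name, inputs in options.items():
    print(name, round(expected_value(*inputs), 2))
print("choose:", best_action(options))
```

The same prediction can support different decisions. If judgment puts the cost of a failed pick at -8.0 instead of -2.0, passing becomes the better choice even though the predicted probability has not changed.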

It is neither by accident nor by luck that the most successful baseball teams are the ones that incorporate all information. Even Silver’s statistical predictions of individual player performance were outperformed by the predictions in Baseball America, because the magazine incorporates information from both scouts and analysts. As in baseball, the most successful firms will likely be the ones that learn to integrate quantitative and qualitative information and that build processes in which relevant stakeholders can objectively weigh the benefits and costs of each action. This, of course, is easier said than done, but as Billy Beane, played by Brad Pitt in the film Moneyball, tells the scout who does not like how the team is changing its recruitment strategy, “Adapt or die”.
