ANALYZING A WINE TASTING STATISTICALLY
(Wherein we rigorously analyze the famous 1976 Paris tasteoff!)
Orley Ashenfelter and Richard E. Quandt
Wine evaluation has been shrouded in snobbishness for so long that it has become a major source of comedy. As Fran Lebowitz wrote, "Intellectuals talk about ideas; ordinary people talk about things; but boring people talk about wine!" In fact, reports about the evaluation of wine appear in most major newspapers and there are important publications, like the Wine Spectator, that are devoted to nothing much else. Surprisingly, the reported evaluations of wine, which are subjective in the extreme, are rarely subjected to any serious appraisal. The results of any wine tasting will be solemnly reported, with a winner and a loser being declared, without any concern for whether the results would be replicated the next day (they usually aren't) or for whether there is enough agreement among the evaluators so that any meaningful conclusion may be drawn. After all, if the same wine were served blind in five glasses, and the judges did not know this, there would no doubt be a first place winner, a last place winner, and everything in between--but the correct conclusion would only be that the judges are variable, not that the wines are different!
The evaluation of wine by a set of appraisers raises many questions about how best to aggregate reported preferences. In fact, many of these issues can and should be raised in other, and to some extent more important, contexts. For example, grant proposals are appraised in subjective peer reviews, diving competitions are subjectively evaluated by judges, and guilt is determined subjectively by juries. Wine appraisal provides a natural, easily replicated, and amusing way to raise and study some important issues.
We relay some of our own experience with wine tasting evaluations in what follows, and we use this experience to analyze what is arguably the most famous wine tasting ever staged: Steven Spurrier's competition of French versus California wines, conducted on French soil using mostly French judges. Time magazine reported that a California wine had beaten some of France's most famous wines at this tasting, and the California wine industry has never been the same. But does a careful analysis verify this result?
Do Your Own Wine Tasting (and Circulate the Results!)
After a decade of experience, we think there are three key commandments that a wine tasting event should observe to be informative.
(1) Taste the wines blind. As any experienced taster will admit, identifying wines blind is an incredibly difficult thing to do. As a result, there is no doubt that tasting wines blind is a humbling experience. Perhaps this is why it is resisted. But the failure to taste wines blind leads to terrible biases. Indeed, one of the primary purposes of an independent wine tasting is to test whether common perceptions are really correct. Doing this requires that extraneous information that reflects the opinions of others be kept from biasing the tasters. Otherwise, what is the point of creating the wine tasting event? You might just as well read the score a wine has received in a wine publication and parrot it to everyone who will listen (something which, in our experience, happens all too often!).
(2) Try to keep the tasters' opinions independent. Wine tasting is a very subjective experience. As a result, even when wines are served blind, the opinions of others often serve as focal points for agreement. For example, a very noticeable feature in many large wine tasting events is the presence of "table effects." What seems to happen is that one or two individuals have strong opinions at a table, and this crystallizes the opinions of others. Move the same people to a different table and they may have a completely different opinion!
To combat the problem of dependence among tasters, some very professional groups do not permit anyone to speak about the wines until after they have written down their ranking of them. In other groups, including one that we participate in regularly, independence does not seem to require such extreme measures--probably because the tasters in our group revel in disagreement--but even we exercise some discretion in what we say; for example, somebody might say "I think one of the wines is slightly oxidized" rather than "wine C is slightly oxidized."
(3) Analyze the results of the wine tasting systematically. In their pioneering book, Wines: Their Sensory Evaluation, Amerine and Roessler set out the details of how one should summarize the results of a wine tasting and draw conclusions from it. The basic scientific presumption from which they start is that human behavior is not perfectly predictable. Among those with considerable experience with wine tasting, this is hardly a controversial assumption. As Steven Spurrier stated, in reference to the famous tasting of French vs. California wines he hosted in 1976, "The results of a blind tasting cannot be predicted and will not even be reproduced the next day by the same panel tasting the same wines." The primary goal in the analysis of a wine tasting is to determine the extent to which the conclusions that have been drawn are likely to be reproduced on another occasion.
Although there has often been broad agreement with the goals of Amerine and Roessler, the technical expertise required to implement their approach is far too formidable for most wine lovers, especially when it has to be implemented during a wine tasting event. (There is alcohol in wine!) So here is what we have done: we have created a software package that will compile the preferences expressed at a wine tasting, analyze the results, and then create a written and comprehensible permanent report of the event. We have been using this software at wine tasting events over the past few years and it has become an invaluable tool and record keeping device. In order to show how it works, we have re-analyzed the most famous wine tasting event of all time--and with some surprising results.
Analyzing the Results of the Paris Tasteoff of 1976
(Did the California cabernet really win?)
The often discussed French versus California wine challenge took place in Paris in May of 1976. As luck would have it, a Time magazine reporter was present to broadcast the results to an eager US audience celebrating the 200th anniversary of the founding of their nation. And the results were shocking: In the presence of distinguished competition, on French soil, and in a blind tasting, the French judges had voted that a California cabernet had defeated its Bordeaux challengers. And a California chardonnay had defeated its Burgundy challengers too. See the results!
The Judges. The complete details of the cabernet tasting, including the scores awarded by each judge to each wine, are contained in the associated table, which is, in turn, cut out of the report written by our LIQUID ASSETS Winetaster software. (We transcribed the original data from the Connoisseurs Guide To California Wine, July 1976.) As we shall see, not everything reported by the press about this event has been accurate.
The wines were marked against a maximum score of 20. As is obvious from the table, the judges were a distinguished group. Apart from Steven Spurrier and Patricia Gallagher, whose l'Academie du Vin sponsored the event, it included the late Odette Kahn, editor of the Revue du Vin de France, the distinguished Jean-Claude Vrinat of the Restaurant Taillevent, the late Raymond Oliver of the restaurant Le Grand Vefour, the sommelier Christian Vanneque of Tour D'Argent, Aubert de Villaine of the Domaine de la Romanee-Conti, Pierre Tari of Chateau Giscours, Pierre Brejoux of the Institute of Appellations of Origin, Michel Dovaz of the Wine Institute of France, and Christian Millau of the eponymous restaurant guide.
The Results. The first thing to notice about this event is that, despite what is usually reported, not all of the judges were French! The scores of Englishman Steven Spurrier and his partner Patricia Gallagher were, in fact, counted in arriving at the results.
The second thing to notice is that the scoring is based on a simple averaging of the numerical grades. As Steven Spurrier acknowledged in Decanter magazine in August 1996, he tallied the winners by "adding the judges marks and dividing this by nine (which I was told later was statistically meaningless)." The problem with this approach is, of course, that it may give greater weight to judges who put a great deal of scatter into their numerical scores and thus express strong preferences by numerical differences. It is for precisely this reason that, in a typical athletic competition with multiple judges, the judges' numerical scores are converted to ranks before the winners are tallied. Converting the grades to ranks guarantees that each judge has the same influence on the outcome. Absent this, the judge who grades wines from, say, 1 to 20 will have a far greater influence on the outcome than a judge who grades the wines on the same scale but uses only the scores 19 and 20.
To see the problem, suppose there were two wines, A and B, to be scored by two tasters. Suppose the first taster scored wine A with a 1 and wine B with a 20, but that the second taster scored the same wines 20 and 19. The average score of wine A would be 10.5 and the average score of wine B would be 19.5, so wine B would win comfortably on averages. In fact, however, wine A was preferred by the second taster, while wine B was preferred by the first taster, so there is no clear group preference.
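The two-wine example can be checked in a few lines of Python. This is only a sketch of the arithmetic described above, not code from the Winetaster software; the variable and function names are ours, and the simple ranking function assumes a taster never gives two wines the same score.

```python
# Scores from the two-wine example in the text (scale of 1 to 20).
scores = {
    "taster 1": {"A": 1, "B": 20},
    "taster 2": {"A": 20, "B": 19},
}
wines = ["A", "B"]

# Method 1: average the raw scores.
mean_score = {w: sum(s[w] for s in scores.values()) / len(scores) for w in wines}

# Method 2: convert each taster's scores to ranks (1 = favorite), then sum.
# Assumption: no taster gives two wines the same score.
def ranks(taster_scores):
    ordered = sorted(taster_scores, key=taster_scores.get, reverse=True)
    return {wine: position for position, wine in enumerate(ordered, start=1)}

rank_sum = {w: sum(ranks(s)[w] for s in scores.values()) for w in wines}

print(mean_score)  # {'A': 10.5, 'B': 19.5}
print(rank_sum)    # {'A': 3, 'B': 3}
```

On averages wine B wins by nine points, entirely because the first taster spread the scores widely; on ranks the two wines are tied.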
In the table we have also shown the conversion of the judges' scores to ranks and we also provide the group ranking. The method recommended by Amerine and Roessler for computing the group ranking is to count the "points against." This is done by simply summing the ranks that the judges assigned to each wine. Since there were 11 judges, the best score obtainable would then be 11 first place votes, or 11 "points against." Since there were 10 wines in total, the worst score obtainable would be 11 tenth place votes, or 110 "points against." As the table indicates, the best score achieved was actually 41 (for the 1973 Stag's Leap Wine Cellars California cabernet). So, it was no mistake for Steven Spurrier to declare the California cabernet the winner. (Whew!) However, the worst score (of 79.5 points against) was for the 1972 Clos du Val California cabernet, and this is not the wine that was placed last using the average of the judges' numerical grades. As the table indicates, there is a loose agreement between the ranking of the wines using the average grade and the average rank awarded by the judges, but it is far from perfect.
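The "points against" tally can be sketched as follows. The scores below are hypothetical toy data (three judges, four wines), not the Paris scores, and the function names are ours. We assume that tied wines share the average of the places they occupy, which is one standard convention and would explain a half-point total such as 79.5.

```python
def average_ranks(scores):
    """Rank one judge's scores (1 = best); tied wines share the mean of their places."""
    ranks = []
    for s in scores:
        higher = sum(1 for other in scores if other > s)
        tied = sum(1 for other in scores if other == s)  # includes the wine itself
        ranks.append(higher + (tied + 1) / 2)
    return ranks

def points_against(score_table):
    """Sum each wine's ranks over all judges; the lowest total is the group favorite."""
    per_judge = [average_ranks(row) for row in score_table]
    return [sum(column) for column in zip(*per_judge)]

# Hypothetical scores out of 20: three judges (rows) by four wines (columns).
toy_scores = [
    [14, 16, 12, 16],
    [15, 13, 10, 17],
    [12, 18, 11, 14],
]
totals = points_against(toy_scores)
print(totals)  # [8.0, 5.5, 12.0, 4.5]

# Bounds quoted in the text for 11 judges and 10 wines.
n_judges, n_wines = 11, 10
best_possible, worst_possible = n_judges * 1, n_judges * n_wines
print(best_possible, worst_possible)  # 11 110
```

In the toy data the first judge scored two wines equally, so each receives rank 1.5, and the fourth wine ends up as the group favorite with 4.5 points against.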
The fact that the most preferred wine did not attain the lowest possible "points against" (11, against the 41 it actually received) is a result of the fact that there was considerable disagreement on the ranking of the wines by the individual judges. This is common in virtually all carefully conducted wine tastings. In fact, to most experienced wine tasters complete agreement is a suspicious sign of collusion!
Despite the disagreement among the judges there is also considerable evidence of concordance. Using a common statistical scheme, our software package established that there is enough concordance among the tasters that it makes sense to believe that the resulting ranking is not just a product of random chance. A loose grouping of the wines by this statistical criterion suggests that the wines may be grouped into three categories. At the top are the 1973 Stag's Leap cabernet and the 1970 Montrose. The second group contains most of the remaining wines. It may be noted, in particular, that the software analysis lumps Stag's Leap and Montrose together in the "best" category, but excludes Ch. Mouton from this top group, even though it is not far behind. This is a consequence of the particular statistical test employed and we would not quibble if someone argued that Mouton belongs in the top group as well.
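One widely used measure of agreement among judges is Kendall's coefficient of concordance, W, which runs from 0 (no agreement) to 1 (identical rankings). The text above does not name the statistic the software uses, so treating it as Kendall's W is our assumption for illustration. The ranks below are hypothetical, and the formula shown is the basic one without a correction for ties.

```python
def kendall_w(rank_table):
    """Kendall's W for a table of ranks: rows are judges, columns are wines.

    Uses the untied formula W = 12 * S / (m**2 * (n**3 - n)), where S is the
    sum of squared deviations of the wines' rank sums from their mean.
    """
    m = len(rank_table)          # number of judges
    n = len(rank_table[0])       # number of wines
    rank_sums = [sum(column) for column in zip(*rank_table)]
    mean_sum = m * (n + 1) / 2
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))

# Hypothetical ranks: three judges (rows) by four wines (columns), no ties.
toy_ranks = [
    [1, 2, 3, 4],
    [2, 1, 3, 4],
    [1, 3, 2, 4],
]
w = kendall_w(toy_ranks)

# Under the hypothesis of purely random rankings, m * (n - 1) * W is
# approximately chi-square distributed with n - 1 degrees of freedom.
chi_square = len(toy_ranks) * (len(toy_ranks[0]) - 1) * w
print(round(w, 3), round(chi_square, 3))  # 0.778 7.0
```

A value of W near 1 with a large chi-square statistic suggests that the group ranking reflects more than chance; with only three judges and four wines, as here, the approximation is rough at best.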
Judging the judges. It is also useful to consider how successful the judges were in appraising the wines. One measure of the success of a judge is the extent to which an individual judge's ranking is a good predictor of the group's ranking (where the group's ranking excludes the particular judge in question). By this measure the judges would be ordered as follows (from best predictor to worst): A. de Villaine (.70 correlation), J.-C. Vrinat (.65), Ch. Millau (.61), Steven Spurrier (.47), Pierre Brejoux (.46), Ch. Vanneque (.42), Odette Kahn (.29), and Raymond Oliver (.25). Ironically, the preferences of the remaining judges (Dovaz, Gallagher, and Tari), two of whom were French, are unrelated to the group preference.
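The leave-one-out comparison described above can be sketched as follows. The ranks are hypothetical, the function names are ours, and the choice of the ordinary (Pearson) correlation between a judge's ranks and the summed ranks of the other judges is an assumption; the text does not say which correlation the software computes.

```python
from math import sqrt

def pearson(x, y):
    """Ordinary correlation coefficient; assumes neither list is constant."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def leave_one_out_correlations(rank_table):
    """Correlate each judge's ranks with the rank sums of all the other judges."""
    results = []
    for i, own_ranks in enumerate(rank_table):
        others = [row for j, row in enumerate(rank_table) if j != i]
        rest_of_group = [sum(column) for column in zip(*others)]
        results.append(pearson(own_ranks, rest_of_group))
    return results

# Hypothetical ranks: four judges (rows) by four wines (columns).
# The last judge ranks the wines in exactly the reverse order of the first.
toy_ranks = [
    [1, 2, 3, 4],
    [2, 1, 3, 4],
    [1, 3, 2, 4],
    [4, 3, 2, 1],
]
correlations = leave_one_out_correlations(toy_ranks)
for judge, r in enumerate(correlations, start=1):
    print(f"judge {judge}: {r:+.2f}")
```

Excluding the judge being evaluated matters: otherwise every judge would be correlated with a group ranking that he or she helped to create. In the toy data the first three judges correlate positively with the rest of the group, while the contrarian fourth judge correlates negatively.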
If you taste wines systematically in a group we hope you will consider circulating the results to others. If you would like to see a newsletter that uses good analytical practice in reporting the results of its wine tastings we suggest you take a look at the California Grapevine (P.O. Box 22152, San Diego, CA 92192-9973, tel. 619-457-4818; $32 per year, or ask for a sample copy). Edited by engineer N. Ponomareff, this newsletter is primarily a carefully annotated list of California wine tasting results analyzed by the methods available in the LIQUID ASSETS Winetaster software. There should be more publications just like it. We would like to see informal newsletters devoted to both mature and immature wines of the Rhone, Italy, Australia, and many other regions.
An indispensable analytical book for anyone seriously interested in the sensory evaluation of wine is Maynard Amerine and Edward Roessler, Wines: Their Sensory Evaluation, W.H. Freeman, 1983. Although this book can be tough going, it is an extremely rewarding discussion of many of the most important issues raised in wine tasting evaluations. Other fascinating papers include a full Bayesian analysis of wine tasting applied to the 1976 French tasting data in Dennis Lindley's "The Analysis of a Wine Tasting," and frequentist Richard Quandt's "Measurement and Inference in Wine Tasting," prepared for presentation at the Meetings of the Vineyard Data Quantification Society in Ajaccio, Corsica on October 2-3, 1998. Both papers, and details on the Winetaster software, will soon be available at the website www.liquidasset.com.