Championship Chessmetrics Analysis
By Jeff Sonas.
Jeff is a statistical chess analyst who has written dozens of articles
since 1999 for the Kasparov Chess website. In recent months he has invented
a new rating system and used it to generate 150 years of historical chess
ratings for thousands of players. You can explore these ratings on his
Chessmetrics website.
He is also V.P. of Engineering for Ninaza, providing web-based medical
software for the health care industry.
INTRODUCTION
We are cursed to live in "interesting times" in the chess world.
We have two different organizations sponsoring their own versions of the World
Championship, and the top-rated player in the world wants no part of either
championship. Countless proposals and unification plans have been suggested,
and rejected, and there is no end in sight.
I am a relatively weak chess player, but a very strong computer programmer
and statistician. Most of all, I am a big fan of chess, and I want to help.
I have little to contribute in the arenas of business plans, organizational
details, or negotiations, but nevertheless I do have something quite useful
to offer. I have developed some very sophisticated statistical tools that enable
me to objectively explore various chess topics, and in recent weeks I have devoted
considerable time to analyzing thousands of different world championship formats.
I would like to share the results of that analysis.
I am not affiliated with any chess organization, and I have no particular agenda
to promote. What I do have, instead, is the distinct impression that people
are making important decisions about the world championship, without adequate
information. One possible explanation is that the decision-makers are simply
unaware that it is often possible to use statistics to draw reasonably sound
conclusions about some of these topics. Or maybe they don't even care about
objective truth, and simply want to promote their own agendas or improve their
own situations. I'm going to adopt the role of the optimist here, and assume
that many people would love to have an objective analysis of the various options
available for the world chess championship, but that nobody ever thought to
ask for one. Well, here's your analysis...
Based on my calculations, I can now tell you whether one world championship
format is "objectively better" than another one, and I can explain
why. If you describe a typical world championship format to me, I can tell you,
with reasonably good accuracy, the average percentage chance of the strongest
player in the world winning the championship cycle. I call that percentage the
"effectiveness" of a world championship format.
For instance, it turns out that the 128-player FIDE World Championship has
an "effectiveness" of 38%, which means that 38% of the time, it will
be won by the strongest player in the world (assuming no boycotts). In other
words, five times out of eight the strongest player will fail to win the tournament.
The Einstein Group's world championship cycle (which will debut in July in Dortmund,
Germany) has a much better effectiveness of 50%, which still means that the
best player will be champion only half of the time. By comparison, a slightly
modified version of Yasser Seirawan's "Fresh Start" proposal is extremely
effective, at 67%. In fact, none of the 13,000 formats under consideration managed
to break the 70% barrier, so Yasser's proposal is almost maximally effective.
Through statistical analysis combined with random simulation, I have analyzed
13,000 different world championship formats in great detail, including Swiss
tournaments, knockout tournaments, long matches, short matches, round-robin
tournaments of various types, qualifier tournaments, and much more. I have tried
to include all of the formats which have been used historically or which are
currently under consideration, as well as many experimental formats. Out of
those 13,000 formats, the FIDE World Championship format is ranked #12,671,
which means that it is in the bottom 5% in effectiveness. Although the Einstein
Group format is clearly better, a 50% effectiveness is still not very good:
it ranks #10,945 on my list. The modified Seirawan proposal, by comparison,
is way up at #181.
After that introduction, you might be chomping at the bit to learn what format
is #1 on my list. However, I'm not going to tell you just yet, because "effectiveness"
is not the only important factor. Without giving those other factors their due
consideration, it doesn't make sense to talk yet about what is "best"
or even "objectively best".
THE FOUR IDEAL CHARACTERISTICS OF A WORLD CHAMPIONSHIP
In evaluating various world championship formats, I believe there are four
important characteristics to consider. I want to introduce a little bit of terminology
here, in an attempt to make all this easier to talk about. An ideal world championship
format would be "practical", "effective", "inclusive",
and "unbiased". Let me briefly cover what I mean with each of those
four words.
(1) "Practical" – The top players must be willing to
participate, the sponsors must be willing to sponsor the tournaments and/or
matches, and the playing sites must be available. Thus, World Championship formats
that include relatively shorter events, or just one event, would be more "practical"
than multi-stage formats or formats with very long matches or tournaments. And,
of course, World Championship formats with greater prize money will also be
more attractive to the players, although there are other important considerations
for most players.
(2) "Effective" – The overall purpose of the World Championship
is to allow the strongest player (whoever that may be) to demonstrate their
superiority by winning the championship. For instance, World Championship formats
with inadequate length or inefficient structure will frequently be won by weaker
players, whereas more effective formats would provide that strongest player
sufficient maneuvering space (even if they lose a game or two) to demonstrate
their superiority by winning the championship.
(3) "Inclusive" – It is easy to tell which players have
been the most successful in the recent past; just consult the rating list. However,
ratings are known to be somewhat inaccurate as measures of players' actual strength,
and it is quite conceivable that the strongest player is not actually the highest-rated
player. Thus it is typically a good idea to include several players in the World
Championship cycle, to give more people an option to demonstrate their ability.
However, the tricky part is that many super-inclusive formats, such as the FIDE
championships, are extremely ineffective at determining the strongest player.
Nevertheless, it is still possible (though challenging) to be both "inclusive"
and "effective" simultaneously.
(4) "Unbiased" – Traditionally, specific players in the
world championship cycle have been given certain advantages, due to their past
accomplishments. For instance, the defending champion might be seeded directly
into the final match, or a recent semifinalist might automatically qualify as
a Candidate without needing to play in an Interzonal. Other advantages have
included "draw odds", and the champion's right to an automatic rematch,
and first-round byes for high-rated players (as in the earlier 100-player FIDE
knockout tournaments). These "biases" are often perceived as being
unfair to everyone else, and should be avoided when possible. However, a "bias"
is not inherently bad; it is simply an advantage granted to a particular player.
It can be one way to make an event more "effective" without having
to make it impractically long.
In all fairness to the FIDE and Einstein Group approaches, they do have their
important advantages. The FIDE approach is extremely inclusive and unbiased,
and reasonably practical (as long as there is sufficient funding for such an
event). The Einstein Group's format is not particularly inclusive, though it
has the large practical advantage that it bears some resemblance to the traditional
way the championship has been run, and thus its winner might indeed be more
accepted by the public, as a legitimate champion, than the FIDE champion often
has been.
THE FIDE CHAMPIONSHIPS
The FIDE championship format takes a mere 22 days of play to reduce 128 competitors
down to one champion. It is very inclusive, and has no biases in favor of any
specific participant. For comparison, I identified 72 other formats that are
22 days or shorter, and also have no biases. Out of these possibilities, the
FIDE format (38% effectiveness) is right in the middle, ranked 37th out of 73.
Most options are in the 30%-40% range, and only one format managed to finish
above 50%. If FIDE were to invite just the eight top-rated players to its knockout
tournament, with two rounds of 6-game matches and then a 10-game final (22 playing
days), it would have a 52% chance to be won by the strongest player in the world,
slightly better than the Einstein Group approach. Another good unbiased and
practical option would be to have four simultaneous single-round-robin tournaments
with 10 players each (9 playing days), with the four winners advancing to two
rounds of knockout matches (4-game semifinal matches and then an 8-game final
match). That approach would be significantly more inclusive and only slightly
less effective (46% effectiveness) than the eight-player knockout.
When there are no biases introduced (i.e., nobody gets automatically seeded
into any later stage, and everyone is treated equally), a knockout event seems
to be far better than a Swiss. For instance, the options to take the top two
or four finishers from a 13-round Swiss tournament and then play short matches
between those top finishers, turn out to be very ineffective, often lower than
20%. However, as you will see in a little while, a format based on a Swiss qualifier
can actually be considerably more effective than a comparable format with a
knockout qualifier. This discovery greatly surprised me, and I will go into
more detail further down, when I discuss the Fresh Start proposal. However,
first let's finish talking about the FIDE and Einstein Group approaches.
The major criticism of the FIDE championship, of course, is that the individual
matches are too short. A single loss can mean almost certain elimination. Everyone
loses a game now and then, so it seems an overly drastic punishment to be eliminated
because you happened to have a minus score over the span of two games. The 2002
tournament made a half-hearted attempt to address this by lengthening the final
match from 6 games to 8 games. As I mentioned before the tournament, that is
hardly much of an improvement (it raised the effectiveness by 0.2%). It would
have been better (39% effectiveness) to use those extra two days to make the
quarterfinal round 4 games long, instead, although of course even better would
be a 4-game quarterfinal AND an 8-game final (41% effectiveness).
Another obvious option would be to change all of the 2-game matches into 4-game
matches. Of course, this would have the unfortunate result of adding at least
10 days to the length of the event if it stayed a 128-player tournament. To
compensate, the number of players could be reduced from 128 down to 64. Thus
with 4-game matches throughout, leading to an 8-game final (32 playing days),
the effectiveness would rise to 42%.
Unsurprisingly, the knockout tournament would become more and more effective,
as we make it less and less inclusive and lengthen various rounds. If we were
to halve the number of players again, a reasonably inclusive knockout tournament
(32 players) could still be held, with four-game matches throughout, leaving
room for either an 8-game final match (43% effectiveness) or a longer 14-game
match (44% effectiveness). With sixteen players, the effectiveness could be
improved to 46% by 4-game matches and a 14-game final. Finally, as I already
mentioned, the most effective unbiased tournament would be an eight-player knockout
tournament with six-game quarterfinal and semifinal matches, with a ten-game
final, an overall effectiveness of 52%.
THE EINSTEIN GROUP CHAMPIONSHIPS
Now let us turn to the Einstein Group championship format. This is an amazing
attempt to compress an entire Candidates Cycle and World Championship match
into a mere 30 days of play. The format has come under severe criticism because
the round-robin preliminaries and the subsequent two rounds of four-game matches
are perilously short. In its current state, the only significant bias involved
is that the defending champion gets to play in the final. So, I considered all
of my formats lasting 30 or fewer playing days, with the single bias that the
champion is seeded into the final automatically (assuming rapid tiebreaks throughout).
There were 208 different formats, and the Einstein Group approach (50% effectiveness)
ranked 148th, placing it in the bottom third.
The most effective approach (62% effectiveness), within these constraints,
would be to only invite the four top-rated players (other than the champion).
They would then play two rounds of six-game knockout matches to get from four
players down to one, and the winner would play the defending champion in an
18-game match. Even just a 14-game match would still be a 61% effectiveness,
and better than any other approach (given the constraints). If it were necessary
to include eight candidates, plus the champion (as is the case in Dortmund),
the 30 days would be better spent in three rounds of 4-game knockout matches,
followed by an 18-game match against the defending champion (60% effectiveness).
If it were desirable to be even more inclusive (for instance so that a "wildcard"
local participant like Christopher Lutz could be chosen, without impacting the
odds too significantly), you could have two simultaneous 10-player single-round-robins,
where the two winners play each other in a 4-game match, and the winner plays
the defending champion in a 16-game match (56% effectiveness). Or you could
even go the super-inclusive route, with a 196-player 13-round Swiss like Yasser
Seirawan suggests. The two top finishers could play each other in a 4-game match,
and the winner challenges the defending champion in an 8-game match. That would
only last 25 days, and would still have an effectiveness of 55%. All of these
options are significantly more effective than the actual format chosen by the
Einstein Group, while still lasting no more than 30 playing days.
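The day counts quoted above are simple sums, and a short Python sketch makes
them easy to check. It assumes one game per playing day and no rest days or
separate tiebreak days, which is how the totals in this article appear to add
up. The helper names (round_robin_rounds, playing_days) are illustrative only.

```python
def round_robin_rounds(players, cycles=1):
    """Rounds needed for every player to meet every other player `cycles` times."""
    per_cycle = players - 1 if players % 2 == 0 else players
    return per_cycle * cycles


def playing_days(qualifier_rounds, match_lengths):
    """Total playing days, assuming one game per day and no rest or tiebreak days."""
    return qualifier_rounds + sum(match_lengths)


# Formats discussed above, as (qualifier rounds, knockout match lengths).
formats = {
    "Dortmund: two 4-player double round-robins, then 4 + 4 + 16":
        (round_robin_rounds(4, cycles=2), [4, 4, 16]),
    "Four candidates: 6 + 6, then 18 against the champion":
        (0, [6, 6, 18]),
    "Eight candidates: 4 + 4 + 4, then 18 against the champion":
        (0, [4, 4, 4, 18]),
    "Two 10-player single round-robins, then 4 + 16":
        (round_robin_rounds(10), [4, 16]),
    "13-round Swiss, then 4 + 8":
        (13, [4, 8]),
}

for name, (rounds, matches) in formats.items():
    print(f"{name}: {playing_days(rounds, matches)} playing days")
```

Under those assumptions the five formats come to 30, 30, 30, 29, and 25 playing
days respectively, so each of them fits within the 30-day limit.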
Of course, none of those options resemble the format that will actually happen
in Dortmund. Are there less significant changes that would still greatly improve
the effectiveness? Absolutely. For instance, the pair of 4-game knockout matches
is hazardous. Even in a four-game match, it is very difficult to recover from
a loss. How about getting rid of one of those matches? Instead of picking the
top two players from each preliminary round-robin, you could just pick the top
finisher from each round-robin. Then a single four-game match between those
two winners, followed by the same 16-game final against the defending champion,
would make the event four days shorter, and it would raise the effectiveness
from 50% to 56%. It would be even better (60% effectiveness) to make use of
the whole 30 days by playing matches of 10 and then 14 games (rather than 4
and then 16).
Finally, there was a way to be even more effective, within the 30-day constraint,
although it did involve introducing another bias into the world championship
cycle. It always helps the effectiveness of a format if you allow the highest-rated
player to automatically bypass the qualifier event. For instance, you could
have the highest-rated player compete in a 10-game match against the winner
of a 4-player double-round-robin, and the winner would challenge the defending
champion in a 14-game match. That would be a 64% effectiveness, and it seems
likely that Garry Kasparov would have been more amenable to that option, although
of course I have no idea what went on with the negotiations. I should point
out that all of these numbers assume that nobody declines an invitation. With
neither Kasparov, Viswanathan Anand, nor Ruslan Ponomariov participating, the
effectiveness of the Einstein Group approach, in this particular cycle, will
of course be way lower than 50%. It will probably be more like 20% or 25%, since
it is reasonably likely that the best player in the world is either Kasparov,
Anand, or Ponomariov, and there is less than a 50-50 chance that the best player
in the world is actually participating in the championship cycle at all.
WHERE THESE NUMBERS COME FROM
I don't expect you to blindly accept all of these numbers. If you're still
paying attention by this point, you might be wondering whether I'm just making
up the numbers to serve my own purposes, or if I actually calculated them somehow.
I don't want to bog you down with all of the gory details, but here is a brief
summary of what I did.
I didn't want my conclusions to be skewed by any special characteristics of
the current rating list, such as an unusually large gap between #2 and #3, or
between #3 and #4. So I decided that my calculations would be based upon a "representative
rating list", rather than an actual one. I did some analysis of rating
list trends over the past few decades, and came up with a way to randomly simulate
millions of "typical" rating lists. Thus sometimes there is a huge
gap between #1 and #2, and sometimes it's very crowded at the top, with no clear
leader. Sometimes the champion isn't even the top-rated player.
However, it is also important to acknowledge that ratings are inaccurate. They
are merely estimates of players' true strengths, and those estimates have errors
associated with them (a standard deviation of about 50, if you're interested).
Somebody might have a rating of 2700, but their true strength could easily be
2580 or 2780. So, for each random rating list, I had to simulate a "true
strength" for each player. The one player with the highest "true strength"
is that elusive "strongest player in the world", whom we are trying
to identify through the use of an effective world championship format. Thus
sometimes the "strongest player in the world" might not be the world
champion or the top-rated player; they might even be rated #8 or #10 or #20
in the world, though it's unlikely. That is why it is important to be inclusive
with your world championship cycle; if you just use the top two or three players,
you might easily leave out the strongest player.
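To make the idea of a simulated "true strength" concrete, here is a minimal
Python sketch. It is not the code behind the numbers in this article. It assumes
that the rating error is normally distributed with a standard deviation of 50,
and the ten-player rating list is invented purely for illustration.

```python
import random


def simulate_true_strengths(ratings, sd=50.0, rng=None):
    """Draw one hypothetical 'true strength' per player: rating plus Gaussian error."""
    if rng is None:
        rng = random.Random()
    return [r + rng.gauss(0.0, sd) for r in ratings]


def rank_of_strongest(ratings, strengths):
    """1-based position on the rating list of the player with the highest true strength."""
    strongest = max(range(len(ratings)), key=lambda i: strengths[i])
    order = sorted(range(len(ratings)), key=lambda i: -ratings[i])
    return order.index(strongest) + 1


def share_top_rated_is_strongest(ratings, trials=20000, sd=50.0, seed=1):
    """Fraction of simulated lists in which the #1-rated player is also the strongest."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        strengths = simulate_true_strengths(ratings, sd=sd, rng=rng)
        if rank_of_strongest(ratings, strengths) == 1:
            hits += 1
    return hits / trials


# Invented rating list (an assumption, not real data).
example_ratings = [2800, 2770, 2750, 2740, 2730, 2720, 2710, 2700, 2690, 2680]
print(share_top_rated_is_strongest(example_ratings))
```

With a crowded list like this one, the top-rated player is the strongest player
in well under 100% of the simulated lists, which is the reason for inviting more
than the top two or three names.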
Armed with the ratings and true strengths of everyone on a simulated rating
list, I could then proceed to simulate a world championship cycle. I tried various
types of qualifier formats, different numbers of simultaneous qualifying tournaments,
allowing the top-rated one or two players to bypass the qualifier, different
ways of resolving tied matches, and/or allowing the champion to enter the cycle
at various stages. The breakthrough was my realization that all popular world
championship formats could in fact be expressed as an "Interzonal"
qualifier followed by a series of knockout matches. This allowed me to tackle
the problem systematically, rather than just trying a few options which I thought
might be "ideal". For instance, the FIDE championships were treated
as eight different qualifier tournaments (each of which was a 16-player knockout
event won by a single player) and then a series of knockout matches among the
final eight players. The Einstein Group championships were treated as two simultaneous
qualifier tournaments (each of which was a 4-player double-round-robin tournament
that qualified two players), and then there were three rounds of knockout matches,
with the champion entering the cycle in the third and final round of knockouts.
And so on. For each simulated championship cycle, I could see whether the "strongest
player" actually won, and over an average of many thousands of iterations
for each format, that would tell me the "effectiveness" of each world
championship format.
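The procedure can be sketched in a few lines of Python. This is only an
illustration under simplifying assumptions, not the actual simulation: it covers
a pure knockout with no qualifier stage, uses the standard Elo expected-score
formula as a per-game win probability, ignores draws, settles level matches with
sudden-death games at the same odds, and pairs top seed against bottom seed. The
rating list and all function names are invented for the example.

```python
import random


def win_prob(strength_a, strength_b):
    """Elo-style expected score, used here as a per-game win probability (no draws)."""
    return 1.0 / (1.0 + 10.0 ** ((strength_b - strength_a) / 400.0))


def play_match(a, b, games, strengths, rng):
    """Return the match winner; a level score is settled by sudden-death games."""
    p = win_prob(strengths[a], strengths[b])
    score_a = sum(1 for _ in range(games) if rng.random() < p)
    score_b = games - score_a
    while score_a == score_b:
        if rng.random() < p:
            score_a += 1
        else:
            score_b += 1
    return a if score_a > score_b else b


def play_knockout(seeds, round_lengths, strengths, rng):
    """seeds: player indices in seeding order; needs 2 ** len(round_lengths) players."""
    alive = list(seeds)
    for games in round_lengths:
        half = len(alive) // 2
        # Top remaining seed meets bottom remaining seed, and so on inward.
        alive = [play_match(alive[i], alive[-1 - i], games, strengths, rng)
                 for i in range(half)]
    return alive[0]


def effectiveness(ratings, round_lengths, trials=2000, sd=50.0, seed=1):
    """Share of simulated cycles won by the player with the highest true strength."""
    rng = random.Random(seed)
    seeds = sorted(range(len(ratings)), key=lambda i: -ratings[i])
    wins = 0
    for _ in range(trials):
        strengths = [r + rng.gauss(0.0, sd) for r in ratings]
        strongest = max(range(len(ratings)), key=lambda i: strengths[i])
        if play_knockout(seeds, round_lengths, strengths, rng) == strongest:
            wins += 1
    return wins / trials


# Invented eight-player rating list; rounds of 6, 6, and 10 games.
example_ratings = [2800, 2780, 2760, 2740, 2720, 2700, 2680, 2660]
print(effectiveness(example_ratings, [6, 6, 10]))
```

Because the game model here is so crude, the percentage it prints should not be
expected to match the figures quoted in this article. The sketch only shows the
shape of the calculation: simulate strengths, play the cycle, count how often
the strongest player wins.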
YASSER SEIRAWAN'S "FRESH START" PROPOSAL
I have to admit that I expected my analysis to reveal a searing criticism of
Yasser Seirawan's "Fresh Start" proposal, with its Swiss qualifier.
Swiss tournaments are generally perceived to be very ineffective, especially
compared to knockout tournaments of comparable size. I expected that I would
have to conclude that "it's all well and nice to play three rounds of long
matches at the end of your world championship cycle, but what good is that when
the majority of Candidates were chosen in a lottery?"
I was even advised to save myself the effort of trying to program Swiss tournaments
in my simulations, since they were obviously so ineffective. A very prominent
arbiter told me, "You do not need that for your simulation. It is perfectly
obvious, if you want to obtain a winner who has the highest rating prior to
the event, then the current FIDE knockout system is best." However, I really
wanted to compare the FIDE and Einstein approaches against Yasser's proposal
(which is based upon a Swiss qualifier), so I ultimately decided to include
the Swiss qualifiers in my analysis.
Well, guess what? Out of the 13,000 world championship formats I evaluated,
number TWO on the list, with an effectiveness of 69.4%, was the following structure:
The world champion and the two highest-rated players (other than the world champion)
bypass the qualifier and automatically become Candidates. They are joined by
the top five finishers from a 196-player 13-round Swiss. Those eight players
then play three rounds of knockout matches (16-game quarterfinal, 20-game semifinal,
and 20-game final).
Does that sound familiar? It's almost exactly what Yasser Seirawan suggests
for the next world championship cycle. He actually suggests a 10-game quarterfinal,
a 14-game semifinal, and a 20-game final, and that shorter format (67% effectiveness)
shows up at #181 on my list (still in the top 2% of formats). And there are
details in his proposal about tiebreaks that were not included in my overall
analysis (though I do cover them further down); I assumed rapid tiebreaks everywhere
for the eight-player candidate cycles, since otherwise the calculations would
have taken months to run all the possibilities! And Yasser doesn't actually
say that it should be the two highest-rated players who bypass the qualifier;
he specifically names Garry Kasparov and Ruslan Ponomariov as the two players.
The number one format on my list, with an effectiveness of 69.5%, was actually
very similar to number two. In this scenario, only the top finisher from that
same Swiss tournament qualifies, to play the #1-rated player in a 20-game match.
The winner then plays the defending world champion in a 20-game match for the
title. That is the single most effective world championship that I could find,
but unfortunately it includes two biases: the world champion gets automatically
seeded into the final round, AND the top-rated player doesn't have to play in
the Swiss. Yasser's proposal would be somewhat less biased, as it is less of
an advantage to be an "automatic Candidate" when there are eight Candidates
rather than two, and of course in his proposal the defending world champion
does not get automatically seeded into the final match.
Since we're on the topic, I should point out that the #3 format on my list
has actually been tried, sort of, in the world championship. In 1959 Mikhail
Tal won an eight-player quadruple-round-robin tournament in Yugoslavia, allowing
him to play a 24-game match against the defending champion. In 1962 Tigran Petrosian
won an identical format in Curacao. And that same format is #3 on my list, with
an effectiveness of 69.3%, although it says that the winner of the round-robin
should face the top-rated player rather than the defending champion. Thus if
the defending champion was not the top-rated player, the champion would have
to play in (and win) the round-robin tournament for the opportunity to play
a championship match against the top-rated player. Also, it's not strictly like
the 1959 and 1962 Candidates tournaments, because back then the eight players
came from Interzonals, whereas this format recommends just taking the players
from the top of the rating list. Presumably the bias in favor of the top-rated
player is too much to make this format acceptable, although it is clearly very
effective.
Of course, there is no real difference between 69.3% and 69.5%. The point is
not so much that nine of the top twelve formats happened to have Swiss qualifiers.
The real dazzler is that a Swiss qualifier can with any seriousness be called
"optimal". Conventional wisdom tells us that knockout tournaments
are more effective than Swiss tournaments of comparable length. It says that
knockout tournaments work better, because the strongest players are in control
of their own destiny, and nobody can finish ahead of you unless you are actually
knocked out by someone. By contrast, in a Swiss you might do well but someone
else might happen to do even better.
Why is conventional wisdom wrong? Well, I have two possible explanations. One
has to do with information theory. In a multi-stage event such as a knockout
tournament, it only matters if you make it to the next stage, whether that be
from a 2-0 whitewash or a 3-3 standoff where somebody advances from a sudden-death
game. After each round, the slate is wiped clean and all remaining players start
with the same score. Obviously, that means discarding a considerable amount
of information about how players have been performing. When the whole point
is to identify the strongest player, it seems unwise to discard so much information.
By contrast, in a Swiss tournament, your total score reflects the whole of your
performance in the event. Of course, this "additional information"
has to be balanced against the fact that players face different levels of opposition
in a Swiss tournament, so a score of +2 might sometimes be more impressive than
a score of +4. But there are obviously ways to address that by optimizing the
pairings and/or scoring method, though that lies outside the scope of my analysis...
for now.
To understand my other explanation, consider an alteration to Yasser's proposal.
Rather than a large Swiss which generates five Candidates, you could instead
have five different simultaneous 16-player knockout tournaments (2-game matches
throughout), where the winner of each knockout tournament becomes a Candidate.
That approach would be good (62%) but not as good as the Swiss approach (67%).
With the knockout approach, you are basically splitting your field into five
subgroups, and deciding to take the single top-performing player from each subgroup.
If the strongest player in the world happens to be playing in the same subgroup
as another player who is almost as strong, then it becomes reasonably likely
(in the knockout approach) that the strongest player would lose a two-game match
to the slightly weaker player. You can't qualify both players and resolve their
differences later in a long match, since you are required to take exactly one
player from each subgroup (i.e., the one who wins each knockout tournament).
The numbers (62% vs. 67%) suggest that it would work much better to have all
of the players intermingled in one big tournament, so the five strongest performances
can advance, independent of who would have been in which subgroup.
However, the Swiss tournament is not some magical solution that should be used
everywhere; it is very easy to use it poorly. The Swiss only works well if the
highest-rated players bypass it and automatically become Candidates. Thus the
Swiss is best viewed as a super-inclusive way to sort through the rabble and
find the rare player who is extremely under-rated (literally) and actually very
strong. If we already know that a player is very strong (the defending champion,
or one of the two top-rated players in the world), it is far better to allow
them to bypass a Swiss where they might potentially lose a couple of games and
fail to qualify. For instance, if you had everyone (including the defending
champion) play in the Swiss, and picked the top eight finishers as your candidates,
then the effectiveness would only be 17%. If you automatically qualified the
defending champion, but the other seven qualifiers had to come from the Swiss,
the effectiveness would only be 53%, barely better than the Dortmund style.
The most important thing is to include at least the highest-rated player automatically,
along with the defending champion. If the two automatic qualifiers are the defending
champion and the (remaining) highest-rated player, the effectiveness jumps up
to 64%. And as we've seen already, if the second-rated player is also allowed
to bypass the qualifier, the effectiveness is a nearly-ideal 67%.
Another interesting question is whether the qualifier tournament becomes more
effective if you make it more inclusive. We have seen earlier, in the discussion
about the FIDE format, that a knockout steadily loses effectiveness when
you double the number of players. In the case of a Swiss, however, the inclusion
of extra players actually helps, rather than hurts, the effectiveness. For instance,
if you modify the Seirawan proposal to only include 64 players, the effectiveness
is 61%, but doubling the field of players, for a total of 128, raises the effectiveness
to 65%, and tripling the field (to Yasser's suggested 196-player level) leads
to the best effectiveness, the 67% already mentioned. Presumably this is because
the weaker players don't get in the way as much in a Swiss, after the first
round or two.
In a 128-player knockout, you have a large number of players who clearly are
not the strongest players in the tournament, but who can have a huge impact
on the outcome through the chance elimination of a top seed. We almost saw the
extreme example of that in Moscow, where a single loss to the bottom seed just
about resulted in the first-round elimination of #1 seed Viswanathan Anand.
On the other hand, by having such an inclusive field in the large Swiss, you
give yourself the possibility of identifying an extremely underrated player
who actually deserves to play in the Candidate section.
If you're trying to get a feel for what level of player would typically finish
in the top five in the 196-player Swiss tournament, I can tell you that an average
set of five qualifiers would have ratings ranging from 2600 to 2780. A very
strong set of five qualifiers (which would happen one time out of every ten)
might be something like: Michael Adams, Alexei Shirov, Peter Leko, Alexander
Morozevich, and Judit Polgar. A much weaker set of five qualifiers (which would
also happen one time out of every ten) would be like: Viswanathan Anand,
Zoltan Almasi, Konstantin Sakaev, Giorgi Giorgadze, and Xie Jun. On average,
out of the five top Swiss finishers, there would be two or three players rated
above 2700, and two or three players rated below 2700. Once every 25 or 30 tournaments,
all five qualifiers would be rated below 2700, and once every 40 or 45 tournaments,
all five qualifiers would be rated above 2700. About 45% of the time, at least
one qualifier would be a sub-2600 player.
RAPID TIEBREAKS
One controversial issue is whether rapid games are a good way to break ties.
This only matters, of course, if a tie actually occurs, so it is a more significant
factor when there are short events (such as the FIDE championships or the Dortmund
qualifier), and it wouldn't matter as much in the Seirawan proposal (though
of course it still could happen). There is a general perception that rapid and
blitz games are more "random" than classical games. This is undoubtedly
true, since time trouble always introduces an element of randomness into the
outcome of a game. However, I recently analyzed the results of several thousand
games played at various time controls over the past few years, and (statistically
speaking) this issue doesn't seem to be a particularly significant one. The
higher-rated player still manages about the expected percentage score, whether
the game is played at classical, rapid, or blitz controls. Here is a picture
to illustrate what I am talking about.
In this graph, we see the well-known trend that as the white player's rating
advantage gets bigger and bigger, White tends to score a higher and higher percentage.
If the two players have the same rating, then White scores 55%. If White has
a rating advantage of 200 points, then White would score almost 70%. The blue
line represents this relationship at classical time controls.
Now look at the red line, which represents rapid games. If rapid time controls
really did make the game a lot more random, then the higher-rated player would
tend to score closer to 50% than predicted, with either color. That means we
would see the red line being flatter, more horizontal, than the blue line. This
is true to a certain degree, especially on the right side of the graph, in those
scenarios where White has a large rating advantage. This means that rapid games
do indeed turn out more randomly when White is the big favorite; White is not
able to score as high a percentage as the ratings would suggest. For instance,
with a +300 rating point advantage, White would score 75% in classical games
but only 72% in rapid games. However, when Black is the favorite by more than
100 rating points (the left side of the graph), the rapid results are exactly
the same as classical. Thus, when outrated by 300 points, White scores an identical
33% whether it be classical or rapid. So, the conclusion to be drawn is that
the advantage of the white pieces is not as large in rapid games as in classical
games, especially when White is the higher-rated player. But the higher-rated
player should do just about as well in rapid as in classical. Perhaps the real
"randomness" comes from the fact that rapid matches are typically
only two games long, rather than four or six.
The blitz data (the white line on my graph) is a little more suspect, because
there are fewer results available to analyze. However, there is no compelling
evidence that blitz games are "more random" than rapid or even classical
games; the white line is not any more horizontal than the blue line. You can
see a distinctive bend in the middle of the white line, suggesting that the
advantage of the white pieces is magnified when the two players are of similar
strength. For instance, when the two players have the same rating, White scores
58% in blitz but only 55% in classical. As I just mentioned, the advantage of
the white pieces is not as large in rapid chess as it is in classical chess,
so in rapid games, when the players have identical ratings, White only manages
to score 53%. But again, I see no real evidence that the faster time controls
are diminishing or obscuring the rating difference between the two players in
blitz. Thus it seems that rapid games, or even blitz games if need be, are a
reasonably effective way to resolve ties.
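To make the comparison concrete, here is a minimal Python sketch built only from the percentages quoted above for classical and rapid play. Drawing straight lines between those quoted points is an assumption made purely for illustration (the real curves on the graph bend, so the +200 figure of almost 70% is not reproduced exactly), and the function names are hypothetical.

```python
# White's expected percentage score, keyed by White's rating advantage.
# The numbers are the ones quoted in the text; straight-line interpolation
# between them is an illustrative assumption, not the actual fitted curve.
CLASSICAL = [(-300, 33.0), (0, 55.0), (300, 75.0)]
RAPID = [(-300, 33.0), (0, 53.0), (300, 72.0)]


def white_score(points, advantage):
    """Linearly interpolate White's expected percentage score."""
    lo, hi = points[0][0], points[-1][0]
    if not lo <= advantage <= hi:
        raise ValueError("advantage outside the quoted range")
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= advantage <= x1:
            return y0 + (y1 - y0) * (advantage - x0) / (x1 - x0)


def favorite_score(points, gap):
    """Average score of the higher-rated player over both colors."""
    as_white = white_score(points, gap)
    as_black = 100.0 - white_score(points, -gap)
    return (as_white + as_black) / 2.0


print(favorite_score(CLASSICAL, 300), favorite_score(RAPID, 300))
```

Averaged over both colors, a 300-point favorite scores 71% in classical and 69.5% in rapid under this sketch, which is the sense in which the higher-rated player does just about as well at the faster time control.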
Now, it is certainly true that we see a lot more decisive results in the faster
time controls, particularly in blitz. What do I mean by "a lot more"?
Well, switching the time controls from classical to rapid has about the same
effect (on the frequency of draws) as changing one of the players from Peter
Leko to either Veselin Topalov or Alexei Shirov, or changing the opening from
a 1.d4 game to a Sicilian Dragon. Further, switching the time controls from
classical to blitz has about the same effect (on the frequency of draws) as
changing a Peter Leko-Anatoly Karpov matchup into an Alexander Morozevich-Alexei
Fedorov matchup, or changing a Petroff's Defense into a King's Gambit. This
will indeed make the results slightly more random, which (as I said) could be
addressed by making the rapid tiebreaks longer. I hate to sound like a broken
record, but I should again point out that this exact approach (using 4-game
matches if a rapid tiebreak becomes necessary) was already suggested by Yasser
Seirawan in his "Fresh Start" proposal.
For instance, let's take a very simple unbiased case, where two simultaneous
10-player single-round-robin tournaments are held, and the winners play each
other in a title match. First let's consider the case where the final match
is only six games long. If a drawn match is to be resolved by the spin of a
roulette wheel, the effectiveness of this format is 37.3%. Obviously, it would
be better to actually play games to resolve the tie, since the stronger player
would have a better-than-even chance to win the tiebreak. So if we use the rapid-blitz
progression used in the FIDE championships, the effectiveness goes up to 39.2%.
Since blitz games are more random, if we simply played a long set of 2-game
rapid matches, it would be slightly better (39.3%). Finally, Yasser's suggestion
of a rapid match which would be four games long (rather than two), is the most
effective tiebreak method (39.5% effectiveness).
You can see from those numbers that the tiebreak method doesn't matter too
much, even for a mere six-game match; the effectiveness ranged from 37.3% (random)
to 39.5% (4-game rapid match). Of course, as the match length is increased,
the tiebreak method becomes less and less of a factor; for a 16-game match,
the random option has an effectiveness of 41.1% and the other options are all
41.6% or 41.7%. And for a 24-game match, the random tiebreak has an effectiveness
of 42.1% and all other options are tied at 42.3%. A drawn match is just too
unlikely.
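The reason the tiebreak method fades in importance can be sketched with a small exact calculation. The Python below does not reproduce the effectiveness figures above (those come from simulating whole championship formats); it only looks at a single match between two players. The per-game odds and the 60% tiebreak win chance are hypothetical values chosen for illustration.

```python
def net_score_distribution(games, p_win, p_draw):
    """Exact distribution of (wins - losses) for the stronger player."""
    p_loss = 1.0 - p_win - p_draw
    dist = {0: 1.0}
    for _ in range(games):
        nxt = {}
        for net, prob in dist.items():
            for delta, p in ((1, p_win), (0, p_draw), (-1, p_loss)):
                nxt[net + delta] = nxt.get(net + delta, 0.0) + prob * p
        dist = nxt
    return dist


def match_win_chance(games, p_win, p_draw, tiebreak_win):
    """Chance the stronger player takes the match, given their chance
    of winning whatever tiebreak follows a drawn match."""
    dist = net_score_distribution(games, p_win, p_draw)
    outright = sum(p for net, p in dist.items() if net > 0)
    return outright + dist.get(0, 0.0) * tiebreak_win


# Hypothetical per-game odds for the stronger player: 25% win, 55% draw.
P_WIN, P_DRAW = 0.25, 0.55
for games in (6, 16, 24):
    tie = net_score_distribution(games, P_WIN, P_DRAW)[0]
    coin = match_win_chance(games, P_WIN, P_DRAW, 0.50)   # roulette wheel
    played = match_win_chance(games, P_WIN, P_DRAW, 0.60)  # played tiebreak
    print(games, round(tie, 3), round(played - coin, 3))
```

The gap between a random tiebreak and a played one is simply the chance of a drawn match times the stronger player's edge in the tiebreak, so it shrinks as the match gets longer and a drawn match becomes less likely.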
However, sometimes this issue does not even arise. Specifically, if one of
the players has been granted "draw odds" in a particular match, that
player is automatically declared the winner in the event of a drawn match. Usually,
the defending champion is granted draw odds in their match, and this is obviously
a key part of Yasser's proposal, since it acknowledges two champions, and there
are also the curious provisions about "inheriting" draw odds if you
overcome them in your quarterfinal match. Generally, draw odds are not a good
way to resolve ties. They are better than a roulette wheel (since on average
the defending champion will be stronger than the challenger), but slightly less
effective than any other tiebreak method. The main benefit of draw odds is that
they provide an incentive for a defending champion to actually participate in
a world championship cycle, since the draw odds are a bias that favors the defending
champion.
Everything that I have said to this point applies to chess world championships
in general. The conclusions would have been identical a decade ago, or fifteen
years in the future, even with a completely different set of top players. However,
at this point I must leave off my attempts to be "generic", because
there is one final issue I want to cover, which must be handled "specifically".
I want to discuss the topic of who would be favored by the various biases in
the "Fresh Start" proposal, and in order to do that we must start
talking about "Vladimir Kramnik" and "Garry Kasparov" and
"Ruslan Ponomariov", rather than just "the defending champion"
or "the highest-rated player".
WHO IS FAVORED BY THE FRESH START PROPOSAL?
The "Fresh Start" proposal has an interesting set of biases. Kramnik,
Kasparov, and Ponomariov are all "rewarded" by being allowed to bypass
the qualifier, but each in turn is "punished" by the fact that the
other two players are also bypassing the qualifier. Ponomariov would presumably
be happy to avoid the qualifier, but sad that Kasparov and Kramnik (probably
his two strongest potential opponents) were guaranteed to qualify. Further,
as champions of their respective organizations, Kramnik and Ponomariov are additionally
granted another bias: draw odds in their quarterfinal and semifinal matches.
Finally, Kasparov is "punished" by the fact that he will have to overcome
draw odds in his semifinal match, whoever the opponent. So clearly Kramnik and
Ponomariov would benefit from the match structure, and Kasparov would probably
not benefit, but how big of a deal is this? What are the magnitudes of each
player's advantages and disadvantages? This is an extremely important question,
perhaps THE most important question about the relative merits of Yasser's proposal.
First of all, let's once again draw an important distinction between the meaning
of "highest-rated player" and "strongest player". Ratings
are inexact, and so the player with the highest rating might not actually be
the strongest player. There is no way to exactly measure who the strongest player
is; all we can do is talk about the "likelihood" that each player
really is the strongest in the world. The rating list tells us (with great accuracy)
who has been most successful recently, and gives us some idea of who will do
best in the near future, but we should always remember that no rating difference
is ever 100% conclusive; you have to deal with probabilities rather than absolutes.
By the way, I want to applaud the decision of the Einstein Group to use an
average of the FIDE and Professional ratings for the invitations and seedings
in their Dortmund qualifier. I had already mentioned a year ago that a simple
average of the two ratings did an excellent job of masking the limitations of
each individual one, so I think it was a great decision. To keep things consistent,
I have done the same thing in the following analysis (using the April 1st 2002
rating lists), although I had to add 50 points to each Professional rating to
make the numbers similar to the FIDE ratings. With these ratings, we can apply
some simple statistics and calculate each player's likelihood of being the strongest
in the world.
Unsurprisingly, it's probably either Garry Kasparov or Vladimir Kramnik. Kasparov
(average FIDE/Prof rating 2842) has a 49% chance of being the strongest player,
whereas Kramnik (2827) has a 34% chance. Veselin Topalov (2758), Ruslan Ponomariov
(2751), and Viswanathan Anand (2751) each have about a 3% chance, and the rest
of the world (2740 and below) has a combined 8% chance. In a perfect world championship
format, whenever Kasparov was indeed the strongest player, he would win the
championship. And likewise for Kramnik. Thus, in a perfect format, Kasparov
would have a 49% chance overall to win the championship, and Kramnik would have
a 34% chance, and so on.
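The exact calculation behind these likelihoods is not spelled out here, but the general idea can be sketched with a small simulation. The Python below assumes each player's true strength is normally distributed around the quoted average rating with a made-up uncertainty (sigma) of 40 points, and it leaves out the rest of the world, so it will not reproduce the 49% and 34% figures. The function name and parameters are hypothetical.

```python
import random

# Averaged FIDE/Professional ratings quoted in the text.
RATINGS = {
    "Kasparov": 2842,
    "Kramnik": 2827,
    "Topalov": 2758,
    "Ponomariov": 2751,
    "Anand": 2751,
}


def strongest_chances(ratings, sigma, trials=20000, seed=0):
    """Estimate how often each player has the highest true strength,
    assuming true strength ~ Normal(rating, sigma). Sigma is an assumption."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    wins = {name: 0 for name in ratings}
    for _ in range(trials):
        # Draw one plausible "true strength" per player and record the leader.
        best = max(ratings, key=lambda n: rng.gauss(ratings[n], sigma))
        wins[best] += 1
    return {name: count / trials for name, count in wins.items()}


chances = strongest_chances(RATINGS, sigma=40.0)
for name, chance in sorted(chances.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {chance:.1%}")
```

A larger sigma spreads the chances more evenly across the field, while a smaller one concentrates them on the top-rated player, which is why a rating lead of 15 points is far from conclusive.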
However, the "perfect world championship" is only a myth. We've already
seen (above) that no known world championship format is even 70% effective,
so even in the best case, a third of the time the championship will be won by
somebody who is not the strongest player. We have to keep the matches down to
a reasonable and practical length, and sometimes that just isn't long enough
for the strongest player to demonstrate their superiority over another very
strong player.
I have spent several hours analyzing the statistical effect of draw odds, and
I can state very confidently that the actual selection of Candidates is far
more important than the question of who gets draw odds in a 10-game (or longer)
match. For instance, even if there were no draw odds, Kasparov and Kramnik would
still be "punished" by the fact that they have to play fairly short
matches against players who are certainly weaker, but nevertheless have some
chance to eliminate them. As I just mentioned, we can be 83% sure
that either Kasparov or Kramnik is the strongest player in the world, but even
after they bypassed the qualifier, there would still be more than a 25% chance
that someone else would actually win the championship.
Ruslan Ponomariov is clearly the beneficiary of the most significant biases
in the "Fresh Start" proposal. Although his combined rating of 2751
puts Ponomariov in a virtual tie for fourth in the world with Viswanathan Anand,
he still has less than a 3% chance of actually being the strongest player in
the world. Nevertheless, Ponomariov would have a 10.4% chance to actually win
the championship. It turns out that if Ponomariov's rating were actually 2783
(rather than 2751), then the numbers would claim that Ponomariov did in fact
have a 10.4% chance of being the strongest player. Thus we can say that the
specific Fresh Start proposal "awards" Ponomariov 32 rating points,
in effect.
This is a very large bias in favor of Ponomariov. To try and put that bias
in more concrete terms, let's envision a fantasy scenario where Kasparov and
Kramnik are the only two players who bypass the qualifier, so Ponomariov has
to finish in the top six in the Swiss qualifier like anyone else. However, in
this fantasy, Ponomariov gets a special advantage (in the Swiss and in the final
rounds of matches): he receives the white pieces in five games out of every
six, instead of one game out of every two. According to my calculations, that
fantasy scenario gives Ponomariov about the same advantage that the actual Fresh
Start proposal gives him. Is that an unfair advantage? Or is it commensurate
with his position as FIDE World Champion? That is for someone else to decide,
I suppose.
It would be tempting to say that +32 rating points is way too many to "award"
Ponomariov, and that he should be granted an automatic place but not given draw
odds. Well, that doesn't really help very much, because the lion's share of
his advantage lies in his automatic Candidate status. Here is how the various
biases are measured by my technique:
(1) Being an automatic qualifier for the three rounds of matches (10/14/20
games): Kasparov -14 rating points, Kramnik -7 rating points, and Ponomariov
+22 rating points.
(2) Draw odds given to Kramnik and Ponomariov in the quarterfinal: Kramnik
+4 rating points, Ponomariov +6 rating points.
(3) Draw odds given to Kramnik and Ponomariov in the semifinal: Kramnik +4
rating points, Ponomariov +4 rating points.
(4) Any player who eliminates Kramnik or Ponomariov in the quarterfinal, inherits
draw odds for the semifinal: Kasparov -2 rating points.
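As a quick consistency check, the four items can be added up per player. The short Python sketch below uses only the figures listed in items (1) to (4); treating the biases as simply additive is an assumption, and the names are hypothetical.

```python
# Rating-point adjustments quoted in items (1) to (4) above.
BIASES = {
    "automatic qualifier": {"Kasparov": -14, "Kramnik": -7, "Ponomariov": 22},
    "quarterfinal draw odds": {"Kramnik": 4, "Ponomariov": 6},
    "semifinal draw odds": {"Kramnik": 4, "Ponomariov": 4},
    "inherited draw odds": {"Kasparov": -2},
}


def net_bias(player):
    """Sum every listed adjustment for one player (0 where none is listed)."""
    return sum(row.get(player, 0) for row in BIASES.values())


for player in ("Kasparov", "Kramnik", "Ponomariov"):
    print(player, net_bias(player))
```

Ponomariov's items sum to +32, matching the 32 rating points mentioned earlier, while Kramnik nets +1 and Kasparov nets -16.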
Interestingly enough, this collection of small advantages for Kramnik, and
small disadvantages for Kasparov, is sufficient to make Kramnik the statistical
favorite if the Fresh Start proposal were to actually happen. Kramnik would
have a 38% chance to win the championship, Kasparov would have a 36% chance
to win the championship, and (as I've already said) Ponomariov would have just
over a 10% chance to win the championship. Nevertheless, that is only because
Kasparov and Kramnik are already so close together. In the bigger picture, this
draw odds issue does not seem to merit the attention it gets. A +4 rating point
advantage, across the entire world championship cycle, is less important statistically
than the total advantage you would get from your opponent blundering a pawn
in one single game, sometime during the cycle. Probably this is more of a prestige
issue than anything else, or perhaps there is a huge psychological issue I am
ignoring with my statistics (like the feeling that you are battling uphill from
the start, if the other person has draw odds).
CONCLUSION
As I said way back at the beginning, I have no particular agenda to promote.
However, I have had to re-examine many of my assumptions about chess, as a result
of this analysis, and I hope that will happen for you as well. Among other things,
I now have a much greater respect for Swiss tournaments than before, along with
a greater respect for Yasser Seirawan's judgment and intuition about what makes
a good tournament format! Perhaps some deeply-held beliefs about the "randomness"
of rapid chess will also be challenged as a result of my analysis, but possibly
that is too much to expect. Likewise for the "draw odds" debate, I
suppose...
This essay is the culmination of many, many late-night hours of effort. However,
I hope that it will prove to be a beginning, rather than an end. There are many
problems with the current state of the chess world, and statistics will never
be the only answer to any of them. Statistics are merely a tool, a source of
information, to assist people in finding a better answer to some of their problems.
There has been so much debate, and yet so little objective exploration of the
facts, and so I hope that this will be the beginning of a new effort, a new
kind of debate. I invite you to send me e-mail at jeff@chessmetrics.com, and
if there is enough interest perhaps I will publish a follow-up analysis which
incorporates feedback from all of you.
I would like to conclude with a quote from baseball analyst Bill James: "It
has always been my experience that if you can present a good argument and back
up what you are saying, there are people who will be persuaded. It is sometimes
possible to change the tenor of the debate by injecting information into the
discussion." I hope, very much, that he is correct.
Thank you for taking the time to read this.
Jeff Sonas
Links