How strong are the top chess programs?
In recent months I have been devoting considerable effort to gathering and
analyzing data about computer chess programs, with the assistance of David Levy.
Later on, I will be publishing my results on the ICGA and ChessBase websites, but
in view of the recent performances by Shredder in Argentina and Brutus in
Germany, I would like to give a quick evaluation while our memory of those
tournaments is still fresh.
In July I would have said that there was one computer program at 2800 strength
(Junior), three programs at 2700-2740 strength (Shredder, The King, and Fritz),
and five more (Brutus, Rebel, Chess Tiger, Hiarcs, and Yace) in the 2630-2690
range. In the past couple of weeks, Shredder had identical 8.5/10 scores in
the GM and IM divisions of a tournament in Argentina, while Brutus just finished
off an impressive 9/11 score at the Lippstadt GM tournament. How do those results
change my view of the top programs?
Well, Shredder's 8.5/10 score in the GM tournament was certainly very impressive.
With an average opposition rating of 2445, that's about a 2740 performance rating.
However, you have to balance that against the IM event at the same location.
Against opposition averaging 120 points lower, Shredder still gave up 3 draws
to reach the same 8.5/10 score. Any way you look at it, that's a performance
rating 120 points lower (2620) in that event.
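
For readers who want to check the arithmetic, here is a rough sketch of a performance rating calculation. It assumes the standard logistic Elo curve, which is not necessarily the exact formula behind the figures above, and the function name `performance_rating` is purely illustrative. Under that assumption the results land within about ten points of the 2740 and 2620 quoted here, and the gap between the two events is exactly the 120-point difference in opposition.

```python
import math

def performance_rating(score, games, avg_opponent_rating):
    """Approximate performance rating from a tournament score.

    Assumes the logistic Elo curve (an assumption, not necessarily the
    article's own formula). Undefined for 0% or 100% scores.
    """
    fraction = score / games
    if not 0.0 < fraction < 1.0:
        raise ValueError("performance rating is undefined for 0% or 100% scores")
    # Rating gap implied by the percentage score under the logistic curve.
    rating_gap = 400.0 * math.log10(fraction / (1.0 - fraction))
    return avg_opponent_rating + rating_gap

gm_event = performance_rating(8.5, 10, 2445)  # GM division, average opposition 2445
im_event = performance_rating(8.5, 10, 2325)  # IM division, opposition 120 points lower
print(round(gm_event), round(im_event))
```

Because the percentage score is the same in both events, the implied rating gap is the same too, so the performance rating simply shifts with the strength of the field.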
It seems weird to talk about an 85% score being at all disappointing, but that's
just how it goes when you are playing against such "weaker" opposition.
Based on its pre-match rating, Shredder should have scored at least +9 if not
a perfect 10 in the weaker event. My revised estimates show Shredder, The King,
and Fritz all together in the 2700-2720 range. Here is a summary of Shredder's
tournament results over the past 20 months:
I should point out that the math gets a little unclear when you are a 300+
rating point favorite. Shredder would have to play in a much stronger human
tournament before we could make a confident claim as to whether its strength is
2700 or 2750 or 2800. Hopefully it will get that opportunity very soon!
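
To see why a heavy favorite's results say so little, here is a small sketch that compares the expected ten-game score of a 2700, 2750, and 2800 player against a field averaging 2445 (the GM event's average). It again assumes the standard logistic Elo expectation, which may differ from the model used for the estimates in this article, and `expected_score` is an illustrative name.

```python
def expected_score(rating, opponent_rating):
    """Expected per-game score under the logistic Elo curve (an assumption)."""
    return 1.0 / (1.0 + 10.0 ** ((opponent_rating - rating) / 400.0))

FIELD_AVERAGE = 2445  # average opposition in the GM event, from the article
GAMES = 10

# Expected points over ten games for each candidate strength.
expected_points = {
    strength: GAMES * expected_score(strength, FIELD_AVERAGE)
    for strength in (2700, 2750, 2800)
}
for strength, points in expected_points.items():
    print(f"{strength}: about {points:.1f} points expected out of {GAMES}")

# Difference in expected points between the strongest and weakest candidate.
spread = expected_points[2800] - expected_points[2700]
```

Under this curve the three candidate strengths are separated by less than one point over ten games, so a single draw more or less is enough to move the estimate across the whole 2700-2800 range.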
I have also uncovered some very interesting evidence suggesting two things:
- The strongest computers do progressively worse than expected against humans
when the computer has a rating advantage of more than 100 points.
- The strongest computers do better than expected against 2600+ players, and
this is increasingly true against progressively stronger humans.
I'm still in the research phase, but that effect certainly could be exaggerating
the perceived gap between Junior and Shredder, since Junior's human opponents,
on average, have been almost 300 points stronger than Shredder's (over the past
year and a half).
Let's move on to Brutus. Its performance at Lippstadt was very impressive and
also very informative. Before the tournament, I had zero games in my database
between Brutus and FIDE-rated humans. This made my #5 ranking of Brutus somewhat
suspect. My analysis has shown that games against humans are twice as significant
as games against computers, when you are trying to figure out how strong a computer
is. Thus we were really missing two-thirds of the story about Brutus because
we didn't know how it does against humans.
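
One way to read the "two-thirds" figure is as a simple 2:1 weighting between the two sources of evidence. The sketch below assumes exactly that; the article does not spell out the actual formula, the helper names are hypothetical, and the inputs in the usage line are made-up numbers rather than data from my database.

```python
HUMAN_WEIGHT = 2.0     # games against humans count double, per the article
COMPUTER_WEIGHT = 1.0  # games against other computers

def human_share(human_weight=HUMAN_WEIGHT, computer_weight=COMPUTER_WEIGHT):
    """Fraction of the total evidence carried by games against humans."""
    return human_weight / (human_weight + computer_weight)

def blended_estimate(human_perf, computer_perf):
    """Weighted average of two performance figures (hypothetical helper)."""
    total = HUMAN_WEIGHT + COMPUTER_WEIGHT
    return (HUMAN_WEIGHT * human_perf + COMPUTER_WEIGHT * computer_perf) / total

print(f"share of evidence from human games: {human_share():.2f}")
# Made-up inputs for illustration only, not results from the article.
print(blended_estimate(2700, 2640))
```

With no human games at all, the part of the estimate carrying two-thirds of the weight is simply absent, which is why the earlier ranking rested on the remaining third.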
Well, now we have a much better idea of the overall strength of Brutus. Despite
a pretournament expectation of a +3 or +4 score, Brutus managed a +7 score (82%).
That's a lower percentage than Shredder managed, but remember that Brutus faced
a much stronger field than either of the Argentina events; the tournament field
at Lippstadt was just short of a 2500 average. It works out to a performance
rating of about 2765, clearly the best result of its career:
Of course, there is a big difference between a performance rating and an actual
rating. For example, nobody was saying that Junior was suddenly 2950-strength,
just because it scored 3.5/4 in a match last year against Mikhail Gurevich.
Nevertheless, 11 games is a pretty good sample. My computer rankings place a
lot of emphasis on very recent results, and a lot of emphasis on games against
humans. These results suggest that Brutus now deserves to be ranked #2 in the
world among computer chess players, trailing only Junior, with an approximate
strength of 2730-2740.
Now that we finally have a good amount of data on how Shredder and Brutus have
done against human players, here is a graphical comparison, summarizing the
performance ratings of the top computers over the past 20 months, including
only verified games against FIDE-rated humans:
As I mentioned earlier, I have a lot more to say about computers in chess,
but for now you'll have to be satisfied with these graphs. However, please feel
free to send me e-mail at jeff@chessmetrics.com if you have any questions,
comments, or suggestions.
Jeff Sonas is a statistical chess analyst who has written
dozens of articles since 1999 for the Kasparov Chess website. He has invented
a new rating system and used it to generate 150 years of historical chess
ratings for thousands of players. You can explore these ratings on his
Chessmetrics website. Jeff is also V.P. of Engineering for Ninaza,
providing web-based medical software for the health care industry.