SUMMARY REPORT OF QC COUNCIL MEETING IN ATHENS
1-2 June 2013
QC consultant Jeff Sonas has developed a "Retroactive Rating Calculator" so that FIDE can investigate hypothetical changes to the rating regulations. This allows us to determine what would have happened to all ratings calculated since 2008 if those hypothetical regulations had already been in effect, starting in 2008. This essentially provides a "beta-test" on proposed changes, to ensure the intended positive consequences would actually have occurred, and to identify any unintended negative consequences resulting from the changes.
The QC has used this "RRC" to investigate a number of possible changes to rating regulations, considering their impact on three main areas:
- absolute magnitude of ratings (i.e. inflation/deflation)
- the total number of rated players
- correlation between ratings and future results (i.e. predictive ability of ratings), both for the entire pool and for targeted sub-groups.
There seems no justification for modifying the K=10 coefficient currently used, as increasing this coefficient would push top players' ratings even higher than they currently are, and would reduce the accuracy of top ratings. However, increasing the other two K-factors by 33% (K=15 becomes 20, K=30 becomes 40) will increase the accuracy of ratings throughout the entire rating pool without substantially increasing the absolute magnitude of ratings. A smaller increase of 20% (K=15 becoming 18, K=30 becoming 36) would improve accuracy slightly more still, but the 20/40 coefficients were preferred in order to preserve the simplicity of manual rating change calculations.
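To make the effect of the proposed K-factors concrete, the following is a minimal sketch of a single-game rating change under the old and new coefficients. It assumes the standard logistic Elo expected-score formula; the official FIDE calculation may differ in detail, and the function names and sample ratings are hypothetical, chosen only for illustration.

```python
def expected_score(rating, opponent_rating):
    """Logistic Elo expected score (assumed formula, for illustration only)."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))


def rating_change(k, rating, opponent_rating, score):
    """Rating change for one game: K * (actual score - expected score)."""
    return k * (score - expected_score(rating, opponent_rating))


# Hypothetical example: a win (score 1.0) against an equally rated opponent.
for old_k, new_k in [(15, 20), (30, 40)]:
    old = rating_change(old_k, 2200, 2200, 1.0)
    new = rating_change(new_k, 2200, 2200, 1.0)
    print(f"K={old_k}: {old:+.1f}   K={new_k}: {new:+.1f}")
```

With whole-number K-factors such as 20 and 40, a change of this kind stays easy to work out by hand, which is the simplicity argument given above for preferring 20/40 over 18/36.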
(2) Sean Hewitt Proposal (maintaining most dynamic K-factor for players
Sean Hewitt of the English Chess Federation suggested this simple modification, which would allow freer rating change for young developing players, especially those who play frequently at a young age and thereby hit 30 games early in their development. Recent changes to the rating regulations (including the move to a monthly list, as well as a larger population of low rated opponents so that players surpass 30 rated games sooner) make this rule change very appealing, and it has a very significant positive impact on the accuracy of ratings for all players, even for players out to 100+ games played.
(3) For unrated players, only 5 games required for first rating, and all events and all games vs rated opponents count toward first rating, starting with first event where player scores >0% against rated opponents
This recommendation comes from the retroactive analysis of rating accuracy since 2008, and is a major change in the philosophy of the rating system. Prof. Elo’s principles maintained a system that blocks new players from achieving a first rating in many different ways. Jeff Sonas showed us that it is much better to provide a first rating more easily, and then allow players to regulate their rating by playing chess.
We had previously expected that ratings would be made more accurate by requiring even more than 9 games for the first rating, and assumed it would be perilous to allow first ratings based on fewer than 9 games, as that might threaten the integrity of all ratings. Of course we liked the idea of increasing the rating pool more rapidly, by making it easier to get an initial rating, but feared such a change because of the tradeoff of having less accurate ratings.
However, the simulation indicates that this is not an issue. In fact, requiring more than 9 games for the first rating does not improve its accuracy, and making it easier to get a rating will increase the size of the rating pool (by an additional 6,000 rated players each year) while actually improving the accuracy of all ratings. Simulations reveal a great benefit to providing a first rating as soon as possible, even after only 5 games. This generates more rated players sooner, so that they can serve as rated opponents for others more quickly, and it gives the new players more rated games during their period of high K-factor, allowing their ratings to find a proper level.
(4) For direct titles because of rating, the thresholds remain unchanged but a minimum of 27 rated games (including the ones for first rating calculation) is proposed
This is in order to avoid those "rockets" who play 9 games, obtain a first rating over 2300 and immediately apply for FM. Also note that although the ratings of the highest rated players are gradually increasing over time, the reverse is happening a bit lower in the rating list, outside of the top 100. This means that fewer players are expected to reach the rating requirement for the FM and IM titles, and as a result there is a real obstacle for players who wish to acquire titles. The increased K-factors (20 and 40) will also add further downward pressure in this region around 2300-2400 points.
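The proposed rule can be sketched as a simple eligibility check. This is only an illustration under stated assumptions: the 2300 figure for FM is taken from the "rockets" example above, the 27-game minimum is the proposed value, and the function name, the constant names and the use of "at least" for the threshold comparison are hypothetical.

```python
MIN_RATED_GAMES = 27  # proposed minimum, including games used for the first rating
FM_THRESHOLD = 2300   # FM figure mentioned in the text; comparison rule is assumed


def eligible_for_direct_title(rating, rated_games,
                              threshold=FM_THRESHOLD,
                              min_games=MIN_RATED_GAMES):
    """Return True if both the rating and the games-played conditions are met."""
    return rating >= threshold and rated_games >= min_games


# A "rocket": high first rating after only 9 games, so not yet eligible.
print(eligible_for_direct_title(2310, 9))
# The same rating once 27 rated games have been played.
print(eligible_for_direct_title(2310, 27))
```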
NOTE ABOUT INFLATION
The QC has also been analyzing the FIDE historical database of ratings and game results in order to better understand "rating inflation" in the FIDE rating pool.
Although a quick glance at the top of the rating list in recent years suggests a clear overall "inflationary" trend, the actual situation is much more complex. Among the players rated 2000 or higher, only the ratings of the 2700+ group are increasing, whereas on average, all players rated 2000-2700 are actually losing rating points each year! For instance there are fewer active players rated 2200+ each year.
In fact the differences in playing strength among the strongest players appear to have been gradually increasing, a fairly clear trend over the past 20 years - the ratings of the top two or three thousand players are stretching further and further apart. So for instance we see that the difference in strength between #50 and #500 on the rating list is gradually increasing, as is the difference between #100 and #1,000, between #500 and #5,000, etc. This can be seen both from inspection of the rating list and from direct measurement of head-to-head games over time between players of comparable ranks on the rating list. Thus the ever-increasing ratings of the top 100 should perhaps be viewed, not as an undesirable artifact of the rating calculation, but rather as a desirable reaction of the rating system to this overall change in the distribution of top player strengths.