I am convinced that the ITF and USTA have provided their organizational leaders with “talking points” for staying “on message” when pitching the World Tennis Number (WTN). Additionally, I am pretty sure that one of the primary items on that list is that the WTN is “very accurate.” That assertion is prominently repeated in online media, in presentations, and even during my casual conversations with organization insiders.
Quite frankly, an effective rating system has to be highly accurate. At the same time, tennis is a game of high volatility. Measuring and predicting tennis performance is fraught with peril. Most of the reservations expressed to me about WTN from the competitive tennis community center around uncertainty over the accuracy. Consequently, it is not surprising that the participating organizations are directly addressing that concern.
However, those statements are rarely accompanied by additional detail that illuminates why the WTN is believed to be accurate. One slight exception came during the Intercollegiate Tennis Association (ITA) WTN coaches webinar. Even better, the additional information indirectly came from Victor Enciso, who is a Senior Data Scientist at the ITF.
As an aside, when I was Google stalking Enciso for insight into his background, I naturally ran across his own WTN rating. He isn’t a strong tennis player. In fact, his rating is well below my indicated WTN “Game Zone.” In short, the WTN data suggests that I would crush him in a match. This is a person who was hired not for his tennis ability but for the skills needed to do his job. The tennis world needs more of that.
In answering questions about variance and outliers raised about the algorithm, Enciso pointed to a Frequently Asked Question (FAQ) document. That resource was recently updated with additional information about the methodology. Specifically, I now know that WTN uses the Glicko-2 rating system.
Coincidentally, this is not my first brush with Glicko because it was originally developed for chess ratings. I happened to coach scholastic chess for about 8 years when my kids were in elementary and middle school. That sparked some deeper explorations of the various systems at the time.
The data associated with each player in the Glicko-2 system includes a rating (“r”), a rating deviation (“RD”), and volatility (“σ”). The rating deviation expresses how uncertain the rating is. Volatility is the degree of expected fluctuation in a player’s rating and can come from erratic results. The Glicko-2 system also summarizes a player’s strength as a range rather than a single number, which the WTN has branded as the “Game Zone.”
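As a rough illustration of those three values and the range summary, here is a minimal Python sketch. It follows the convention in Glickman's Glicko-2 write-up of reporting strength as the rating plus or minus twice the RD (roughly a 95% interval). The class name `GlickoPlayer` and the method `strength_interval` are hypothetical, the numbers are on Glicko's native scale rather than the WTN scale, and nothing here is confirmed as the way the WTN actually derives its “Game Zone.”

```python
from dataclasses import dataclass


@dataclass
class GlickoPlayer:
    """Hypothetical container for the three Glicko-2 values per player."""
    rating: float      # "r": the point estimate of playing strength
    rd: float          # "RD": rating deviation, the uncertainty in r
    volatility: float  # "sigma": expected fluctuation in the rating

    def strength_interval(self, width: float = 2.0) -> tuple[float, float]:
        # Glickman's suggested summary: rating minus/plus twice the RD.
        # Assumption: the WTN "Game Zone" may be computed differently.
        return (self.rating - width * self.rd, self.rating + width * self.rd)


# Example values taken from Glickman's own illustration of the interval.
player = GlickoPlayer(rating=1850, rd=50, volatility=0.06)
low, high = player.strength_interval()
print(f"Strength is likely between {low:.0f} and {high:.0f}")
```

A smaller RD narrows the range, which is why a player with many recent results gets a tighter summary than a newcomer.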
For me, the knowledge that WTN is using the Glicko-2 system was reassuring because the metric wasn’t recently conjured up specifically for application to tennis. Additionally, it was developed by Dr. Mark E. Glickman, who is a Fellow of the American Statistical Association and a Senior Lecturer on Statistics at Harvard University. He has a long involvement with the US Chess rating committee that he currently chairs.
In any high-variance game such as tennis, a rating system cannot be held to a standard of “perfection” that is measured by the absence of unexpected results. This is what makes sports great. The side that should win “on paper” doesn’t always prevail. There will always be outliers and upsets. That is why we actually play the matches in the first place.
For me, this part of my exploration of the WTN has increased my confidence in the validity of the metric. I think that it will prove to be an effective tool with many valid use cases. It potentially enables transformative approaches for tennis both in the United States and internationally.
The only question now is if, and how, the various organizations will actually leverage the opportunity.
- ITA x ITF World Tennis Number Coach Webinar, YouTube Unlisted Video, Recorded January 17, 2023.
- The Science Behind ITF World Tennis Number, ITF World Tennis Number News Post, September 5, 2022.
- World Tennis Number Frequently Asked Questions, ITF Hosted Web Page, viewed March 25, 2023.
- Example of the Glicko-2 system, Mark E. Glickman, March 22, 2022.
The Glicko-2 rating system was originally developed for chess-like contests that have win-lose outcomes rather than a “score”. This detracts from its applicability and validity when applied to a sport such as tennis. For example, take two new players, A and B, who have just two recorded results each, all against the same established player C, who has an ITF WTN of 25 with 100% confidence.
Player A narrowly lost both matches (5-7, 6-7 and 6-7, 6-7). Player B lost more decisively (0-6, 1-6 and 0-6, 0-6). To the average observer, a reasonably good assessment of A and B’s levels relative to each other and relative to C is apparent. A is much stronger than B and slightly weaker than C. The Glicko-2 algorithm, however, just sees both new players as exactly the same, each having now lost 4 sets to the same well-established opponent. That should scream “problematic” to anyone thinking of using this as a meaningful rating system.
Sorry, I clicked too many times on the Post Comment button.
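The scenario in this comment can be sketched in Python using the update steps from Glickman's published Glicko-2 description. Several things are assumptions made only for illustration: ratings are on Glicko's native scale (not the WTN scale), player C is given a hypothetical rating of 1700 with an RD of 30, both newcomers start at Glickman's suggested defaults, and each set is encoded as a plain win or loss. How the WTN actually feeds results into its algorithm is not described in the sources above, so this shows the behavior of unmodified Glicko-2, not necessarily of the WTN.

```python
import math

SCALE = 173.7178  # conversion constant between the Glicko and Glicko-2 scales


def glicko2_update(r, rd, sigma, results, tau=0.5, eps=1e-6):
    """One rating-period update. results holds (opp_r, opp_rd, score)
    tuples, where score is 1.0 for a win, 0.5 for a draw, 0.0 for a loss."""
    mu, phi = (r - 1500) / SCALE, rd / SCALE
    terms = []
    for opp_r, opp_rd, s in results:
        mu_j, phi_j = (opp_r - 1500) / SCALE, opp_rd / SCALE
        g = 1 / math.sqrt(1 + 3 * phi_j ** 2 / math.pi ** 2)
        e = 1 / (1 + math.exp(-g * (mu - mu_j)))  # expected score
        terms.append((g, e, s))
    v = 1 / sum(g * g * e * (1 - e) for g, e, _ in terms)
    improvement = sum(g * (s - e) for g, e, s in terms)
    delta = v * improvement

    # Solve for the new volatility with the iteration from Glickman's paper.
    a = math.log(sigma ** 2)

    def f(x):
        ex = math.exp(x)
        top = ex * (delta ** 2 - phi ** 2 - v - ex)
        return top / (2 * (phi ** 2 + v + ex) ** 2) - (x - a) / tau ** 2

    lo = a
    if delta ** 2 > phi ** 2 + v:
        hi = math.log(delta ** 2 - phi ** 2 - v)
    else:
        k = 1
        while f(a - k * tau) < 0:
            k += 1
        hi = a - k * tau
    f_lo, f_hi = f(lo), f(hi)
    while abs(hi - lo) > eps:
        mid = lo + (lo - hi) * f_lo / (f_hi - f_lo)
        f_mid = f(mid)
        if f_mid * f_hi <= 0:
            lo, f_lo = hi, f_hi
        else:
            f_lo /= 2
        hi, f_hi = mid, f_mid
    new_sigma = math.exp(lo / 2)

    # Fold the new volatility into the deviation, then update the rating.
    phi_star = math.sqrt(phi ** 2 + new_sigma ** 2)
    new_phi = 1 / math.sqrt(1 / phi_star ** 2 + 1 / v)
    new_mu = mu + new_phi ** 2 * improvement
    return SCALE * new_mu + 1500, SCALE * new_phi, new_sigma


def set_outcomes(set_scores):
    """Reduce (games_won, games_lost) per set to win/loss; margin is dropped."""
    return [1.0 if won > lost else 0.0 for won, lost in set_scores]


PLAYER_C = (1700, 30)          # hypothetical established opponent
NEW_PLAYER = (1500, 350, 0.06)  # Glickman's suggested starting values
player_a_sets = [(5, 7), (6, 7), (6, 7), (6, 7)]  # narrow losses
player_b_sets = [(0, 6), (1, 6), (0, 6), (0, 6)]  # heavy losses

rated_a = glicko2_update(
    *NEW_PLAYER, [(*PLAYER_C, s) for s in set_outcomes(player_a_sets)])
rated_b = glicko2_update(
    *NEW_PLAYER, [(*PLAYER_C, s) for s in set_outcomes(player_b_sets)])
print(rated_a == rated_b)  # True: the game margins never reach the update
```

Because `set_outcomes` discards the game counts before the update runs, A and B end up with identical ratings, deviations, and volatilities. Any system that wants margin of victory to matter has to change what is passed in as the score, which is outside the algorithm as Glickman published it.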
In the sixth paragraph, second sentence, “I happened coached” you can leave out “happened” or change to “I happened to coach”.