The Gotham Gulf and the 8 True WFTDA “Divisions”

Three divisions were not nearly enough. Using Gotham's rank score dominance as a guide, let's try to more accurately separate WFTDA teams.

The WFTDA ranking system, and the competitive structure it supports, has been in a constant state of flux ever since it switched to an algorithmic system in 2013, replacing regional voting as the way to determine playoff entrants.

Over the last several months, two major changes were made to the formula. The first switched to relative strength factors instead of absolute rank difference to calculate ranking points earned in a game. The second eliminated the playoff bonus multipliers, making all games equally weighted.

These changes have made the ranks and rank scores of a number of teams volatile. However, there is one aspect of the ranking system that you’ve been able to set your watch by over the last two years: Gotham Girls Roller Derby, a league in a class of its own, always winds up with an enormous ranking point lead over the second-ranked team in the WFTDA.

It doesn’t matter how the ranks are calculated or if they are viewed at a relative or absolute scale. Gotham has always been #1, with a bullet.

On an absolute scale, Gotham’s lead over the second-ranked team appears to be growing, even after the recent changes to the ranking algorithm…
…but as everyone’s ranking points have increased due to those changes, the relative gap between #1 and #2, when set to scale, is not as large. (It’s still pretty friggin’ large, though.)

From the start, Gotham’s lead has been huge. In the first rankings release, Denver started behind and quickly fell further back. Bay Area got as close as anyone in the rankings, but slipped a bit as playoff games cycled in and out of the calculations.

Now, Rose City sits in the number two slot behind Gotham. Portland’s ranking point total (620.51) and corresponding strength factor (4.89) are a testament to how well the team has played together in 2014, and to how very close it came to slaying the dragon this year.

But then, there’s the dragon: Gotham’s ranking score (711.48) and strength factor (5.60) are significantly higher than Rose City’s, despite the two teams clearly showing themselves to be equals on the track on one particular day in early November.1

The large multiplier bonuses previously given to teams that advance far in the playoffs, something Gotham always does, contributed to the past boost in its ranking score. That didn’t apply during WFTDA Championships this year, however, since that aspect of the algorithm was removed in October.

Still, with the ranking system starting to flush out games not directly affected by the improvements to the algorithm, and with more teams below Gotham getting closer and closer to it on the scoreboard on a semi-regular basis, you have to wonder whether the rank score gap between Gotham and Rose City, or #3 Bay Area (614.81 points), will ever shrink to a level resembling the differences between neighboring teams a few steps down the totem pole.

That leads to a question I often ask myself as I see the ranking scores and gaps keep climbing higher at the top of the table: What do the differences between rank scores signify, really?

I try to picture a difference of 91 ranking points between Gotham and Rose City, and what that translates to in terms of the relative strength of the two teams, based on their play over the previous 12 months. I struggle to do it, and not just because the relative weight of ranks has changed so much this year. Does having 12% fewer ranking points make you a 12% worse team? Is Gotham really better than Rose City today by the same margin that it was better than Denver two years ago? Is that even a correct way of thinking about it?2
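For what it’s worth, the percentages behind that question are easy to check from the figures quoted earlier. This is a back-of-the-envelope sketch, not anything the WFTDA publishes, and `percent_below` is a hypothetical helper name. It puts Rose City’s rank score about 12.8% below Gotham’s and its strength factor about 12.7% below, so for these two teams, at least, the two measures move almost in lockstep.

```python
# Figures quoted in the text (November 2014 WFTDA rankings).
GOTHAM_SCORE, ROSE_CITY_SCORE = 711.48, 620.51
GOTHAM_SF, ROSE_CITY_SF = 5.60, 4.89

def percent_below(top: float, other: float) -> float:
    """How far `other` sits below `top`, as a percentage of `top`."""
    return (top - other) / top * 100

score_gap = GOTHAM_SCORE - ROSE_CITY_SCORE
print(f"Rank score gap: {score_gap:.2f} points")
print(f"Rose City rank score: {percent_below(GOTHAM_SCORE, ROSE_CITY_SCORE):.1f}% below Gotham")
print(f"Rose City strength factor: {percent_below(GOTHAM_SF, ROSE_CITY_SF):.1f}% below Gotham")
```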

Always looking at every aspect of roller derby from multiple angles, I found inspiration for better visualizing the Gotham ranking gulf as it stands today, courtesy of the WFTDA and the newest change to its ranking structure.

Along with the release of the November 2014 rankings, the WFTDA announced that it would be dissolving the three-tier division system and corresponding divisional gameplay requirements it has had in place since developing the rankings algorithm.

The requirements, if you’ll recall, asked teams wanting to go to the playoffs to play more games against same-division opponents, depending on what division they were assigned to following the previous playoff season. This was meant to ensure that teams were playing against opponents that were within the same general skill level, and playing against them more frequently, to help produce more accurate rankings for playoff seeding.

That the WFTDA is getting rid of divisions after only two years of service makes it seem, in retrospect, that they were more vanity plate than an at-a-glance guide to actual team strength. The ranking algorithm and the differences in rank score did the real work of guiding teams in choosing opponents, since beating up a team ranked far below you delivered little benefit compared with the risk of falling flat against it and taking a big hit to your ranking point average.

This is probably why the WFTDA nixed divisions and compulsory scheduling, which created an imbalance at the divisional borders. A low-ranked Division 1 team looking to return to the playoffs had to play three D1 teams above it, but could add only one team below it from D2 to finish its checklist. Teams at the top of D2, meanwhile, could play three games against pretty much anyone they wanted in the D1-D2 crossover zone, which made their scheduling job much easier.

Now that playoff qualification requirements are the same for everyone (four sanctioned games against opponents of any rank), every team can essentially play within its own custom division. Teams ranked in the low 30s can, for example, go up against any combination of four or more teams in the #20-60 rank bubble around them (and, importantly, the geographic bubble around them) without being denied the same chance to qualify for postseason play as teams a rank-chunk above or below them. That’s definitely a better way of doing things.

Still, the original concept of divisions, that which groups together teams of relatively similar merit, isn’t one that should go away. Today, competitive WFTDA roller derby amounts to a 220-team international superleague, with more teams joining every month as more and more sanctioned games are played. That’s a lot!

From the start, the three divisions created by the WFTDA were never going to be enough to properly partition everyone. Even the differences between teams within the former Division 1 were far too great to put them all under the same 40-team umbrella, if the significant rank score gaps between them are any indication.3

By the same measure, it may be a stretch to put Gotham and Rose City under the same umbrella, too.

Yes, that sounds ridiculous. (Because it is.) However, WFTDA rankings haven’t quite caught up to reality yet. That massive rank score gap between the #1 and #2 teams is still there, and it doesn’t seem like it should be as large as it is, particularly when compared to the rank score differences between equal teams further down the list.

This thought led me to experiment with a better-divided division system, one that looks at WFTDA rankings on a different scale, and then to check whether the rankings and divisions it creates accurately reflect the differences between teams.

All of this is based on the gap between the top two teams—90.97 points—and what that difference may or may not represent in the real world.

Starting with the rank scores of Gotham (711.48) and Rose City (620.51), I continued to subtract 90.97 ranking points (529.54, 438.57, etc.) and used the resulting values as boundaries for new divisions. This created eight divisions altogether, the top seven of which were spaced equally apart.

Next, since one of the goals here is to visualize the calculated difference between Gotham and Rose City, I put Gotham alone in Division 1 (d1) and Rose City at the top of Division 2 (d2).4 I then slotted the remaining teams into their new fake-divisions, using their actual rank scores from the November 2014 ranking release.
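The boundary arithmetic and the slotting step can be sketched in a few lines of Python. This is a reconstruction from the description above, not the code used to build the table; the function names are hypothetical, and only the three rank scores quoted in this post are filled in. `Decimal` keeps the two-decimal rank scores exact, so repeatedly subtracting 90.97 doesn’t pick up floating-point drift. A score that lands exactly on a boundary drops into the division below, which is the choice that puts Rose City at the top of d2 instead of in d1 with Gotham.

```python
from decimal import Decimal

# Rank scores quoted in the text (November 2014 release).
GOTHAM = Decimal("711.48")
ROSE_CITY = Decimal("620.51")
BAY_AREA = Decimal("614.81")

def division_ceilings(top: Decimal, second: Decimal, count: int = 8) -> list[Decimal]:
    """Inclusive tops of divisions 2..count, stepping down by the #1-#2 gap.

    Division 1 is everything above the first ceiling. The last division
    has no floor, which is why only the top seven are equally spaced.
    """
    gap = top - second
    return [second - gap * k for k in range(count - 1)]

def fake_division(score: Decimal, ceilings: list[Decimal]) -> int:
    """1-based division: one more than the number of ceilings at or above the score."""
    return 1 + sum(1 for c in ceilings if score <= c)

ceilings = division_ceilings(GOTHAM, ROSE_CITY)
print([str(c) for c in ceilings])
for name, score in [("Gotham", GOTHAM), ("Rose City", ROSE_CITY), ("Bay Area", BAY_AREA)]:
    print(name, fake_division(score, ceilings))
```

Every other team in the November 2014 release would go through the same `fake_division` call with its own rank score.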

The resulting table is below. In making rank score the dividing factor, instead of actual rank, you get an interesting take on the current competitive landscape in the WFTDA.

The table is set to the scale of rank score, with each division spanning the ranking point difference between Gotham and Rose City.

As you can see here, four teams join Rose City in d2, and six teams each populate the new d3 and d4. Falling into d5, more teams have rank scores nearer each other, a trend that doubles and re-doubles in d6 and d7. Once you hit d8, the last and lowliest fake-division, almost all of its teams are so weak or so new that they’re assigned the minimum strength factor of 0.50.

By coincidence, the fake-division gaps created here separated the top 12 teams—those that played at the real WFTDA Championships—into three chunks: d1, d2, and d3. Teams that would rank high enough to make the real Division 1 playoff field, those from d4 and d5, were also nicely separated from the rest of the pack. Also by coincidence, the teams in d6 (rank #41-90) almost match up with the just-dissolved WFTDA Division 2 (rank #41-100).

That last part is something I’m still trying to wrap my head around. If Gotham and Rose City have been more equal than their 91-point ranking score difference would suggest, then why wasn’t the real Division 2, more than 80% of which can be enveloped within the same difference, much more competitive from top to bottom?

We can say with certainty that the Division 2 playoffs had a lot of closer games this year, with 14 of 36 finishing within 25 points and only eight resulting in a triple-digit blowout. On the other hand, that was in a tournament that had 20 teams within only 40 ranking points of each other. Cramming that many similar teams into the same tournament should always produce a result like that.
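As a quick check on those Division 2 numbers, the sketch below turns the quoted totals into shares. The thresholds come from the text (“within 25 points” for a close game, a triple-digit margin for a blowout); treating 25 itself as close, and calling everything in between a “decent” game, are assumptions made for this sketch, and the function names are hypothetical.

```python
from collections import Counter

CLOSE_MAX = 25      # "within 25 points"; counting exactly 25 as close is an assumption
BLOWOUT_MIN = 100   # "triple-digit blowout"

def classify_margin(margin: int) -> str:
    """Bucket a final-score margin (in points) as close, decent, or blowout."""
    if margin <= CLOSE_MAX:
        return "close"
    if margin >= BLOWOUT_MIN:
        return "blowout"
    return "decent"

def shares(tally: Counter) -> dict[str, float]:
    """Percentage of games in each bucket, rounded to one decimal place."""
    total = sum(tally.values())
    return {bucket: round(count / total * 100, 1) for bucket, count in tally.items()}

# 2014 Division 2 playoff totals quoted in the text: 36 games, 14 close, 8 blowouts.
d2_playoffs = Counter(close=14, blowout=8, decent=36 - 14 - 8)
print(shares(d2_playoffs))
```

With game-by-game margins in hand, `Counter(classify_margin(m) for m in margins)` would build the same kind of tally directly.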

If you expanded the D2 playoff field to 50 teams, covering the same rank score range seen between Gotham and Rose City, chances are you wouldn’t get the same rate of close games and non-blowouts as the Division 1 playoffs did, and those needed more than twice as many games (and five times the blowouts) to produce the same number of heart-stopping finishes as D2.5

It seems to me that there is a difference in value between 90 rank points at the top of the table, and 90 rank points nearer the middle and bottom. To try and find out the scale of that difference, let’s go back to our eight fake WFTDA divisions and make a few observations based on the real results of the 2014 WFTDA playoffs.

Working under the assumption that current WFTDA rankings are a good gauge of the strength of teams that played during the past playoff season, we will use the results of the playoff games as a retroactive check on how reliable the fake-division placement of those teams is. If games between same-division opponents are reliably close, or at least reliably not blowouts, then those teams are well grouped together.

This is a roundabout way of demonstrating that teams with a small rank score difference will play close games more often than teams with bigger rank score gaps. That much is obvious, but there’s still the question of what counts as a “small” or “big” rank score gap in terms of actual ranking points.

What’s the WFTDA ranking point threshold for when a close game or blowout is probable? Again, our fake-divisions do a pretty good job of drawing those lines.

If two top-40 teams within a 91-point rank score difference (effectively the same “division”) played each other right now, their playoff results suggest a better than 3-in-4 chance of the game being competitive, and a nearly 1-in-3 chance that the final score would be super-close. Nice!

If the teams come from neighboring divisions, between 91 and 182 ranking points apart, a close game becomes much rarer. You will still see a decent game twice as often as a blowout, though. Those aren’t the best odds, but they aren’t bad ones.

Once you hit a two-division gap of more than 180 ranking points, however, things go south in a hurry. At this rank score difference, virtually every game is a triple-digit blowout, and a rather large one at that.6 A gap of this magnitude is a clear “do not play” warning to top teams that want to avoid scheduling a boring public bout. That’s helpful information, albeit not very useful for determining how the scale of rank score differences changes up and down the WFTDA rankings.
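The three gap bands can be written as a small lookup. This is a hypothetical helper, not part of the WFTDA algorithm; it compares the raw rank score gap between two teams against the 90.97-point fake-division width, so it ignores which fake divisions the teams actually sit in. That distinction shows up right away: Bay Area sits in d2 next to Rose City, yet its 96.67-point gap to Gotham already lands in the neighboring-divisions band. The gap is rounded to two decimals before comparing, because in floating point 711.48 - 620.51 comes out a hair above 90.97.

```python
FAKE_DIVISION_WIDTH = 90.97  # Gotham (711.48) minus Rose City (620.51)

def gap_band(score_a: float, score_b: float, width: float = FAKE_DIVISION_WIDTH) -> str:
    """Sort a matchup into one of the rank-score-gap bands discussed in the text."""
    # Round to hundredths: floating-point subtraction of two-decimal scores can
    # land a tiny amount above or below the exact difference.
    gap = round(abs(score_a - score_b), 2)
    if gap <= width:
        return "same division"
    if gap <= 2 * width:
        return "neighboring divisions"
    return "two or more divisions"

# Rank scores quoted in the text (November 2014).
scores = {"Gotham": 711.48, "Rose City": 620.51, "Bay Area": 614.81}
for a, b in [("Gotham", "Rose City"), ("Gotham", "Bay Area"), ("Rose City", "Bay Area")]:
    print(f"{a} vs {b}: {gap_band(scores[a], scores[b])}")
```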

To get to the bottom of this, take a gander at this final table, which compares the 40 D1 playoff games played between teams within the same 90-point rank score “divisions” against all 36 games from the Division 2 playoffs, where 20 teams covered about half of that rank score range.7 If the scale of ranking points were the same, we’d expect to see fewer close games and more blowouts in D1, given the wider range of same-division teams.

Instead, we get another coincidence in an analysis jam-packed with them.

The maximum rank score gap of the Division 1 games listed here is 90 points. The maximum gap during the Division 2 playoffs is around 40 points. Yet the results of the games are pretty much the same.

Remarkably, the rate of hits and misses is the same. If you separate the top 40 teams into five equal divisions based on their ranking points score and restrict their playoff games to other teams within the same division only, you get the same ratio of close games and blowouts as you would in a tournament that takes the top 40% of teams within what would work out to be the sixth equal division.

If the sixth division really were “equal” to the other five, we would expect the same rate of good and bad games across the entire d6 range, not just within the top part of it. This is strong evidence that the lower divisions are not quite so equal after all, adding to the argument that WFTDA ranking points are not all created equal.

Let’s wrap things up with what we can learn from all of this.

– The WFTDA was right to eliminate playoff multiplier bonuses from its ranking algorithm. More ranking points were easier to get higher up the rankings, and the artificially high rank scores aided near-playoff teams that were capable of competing against opponents that had previously received the playoff bonuses. Removing the bonuses is likely why a lot of mid-range teams suddenly zipped up the rankings from out of nowhere in October, since their true strength was no longer being cancelled out by the “fake” strength added by the multiplier.

– There are (still) wide skill gaps between the top teams in the WFTDA, but not as wide as the ranking scores would suggest. The impetus for this analysis was the 90.97-point ranking score gap between Gotham and Rose City, and what that meant. Well, I still don’t know what it means. But I am fairly certain that the number is too large, given what those two teams have shown themselves capable of during the playoffs, and given how differently lower-ranked teams separated by a similar gap would likely perform against one another.8 In my mind, these teams and many others at the top are closer together than their rank scores would indicate, and I hope the next few WFTDA rankings updates will start showing that.

– There is still room for improvement in the WFTDA ranking algorithm and the data it uses for its calculations. Ever heard the phrase “garbage in, garbage out”? Blowouts are still very common in WFTDA roller derby between equal teams, and bad blowouts are guaranteed against unequal ones. Those massive relative score differences are being fed into the same calculation as close games, and I wonder whether putting these two types of games into the same dataset is the best way of going about things. A cap or scaling penalty on how much can be gained from an X-point blowout might be a useful algorithm tweak at some point in the future, since fully rewarding a team for beating an opponent it has no business playing against may be further inflating the rank scores of top teams.9

– There needs to be higher granularity within the competitive WFTDA landscape. Two WFTDA regions grew to four WFTDA regions, then those were scrapped in favor of three WFTDA divisions, which themselves have now been dropped. This analysis suggests that even eight divisions may not be enough to properly separate teams from a competitive standpoint, yet the WFTDA has chosen to go back to where it started: one big rat king where everyone is stuck together.

The difference between then and now, however, is several hundred more WFTDA teams. Perhaps we’re reaching a point where it’s time for the WFTDA to go back to a regional system (or a regional-divisional hybrid system) to give more teams a chance to compete for something more meaningful than a chance to move up the rankings. With as many teams as there are in the WFTDA, there are lots of options and many possibilities.

For instance, you could have four regional champions from the (fake) Division 6-8 collective and pit them against each other alongside the finalists from the (real) Division 1 and 2 at WFTDA Championships. The WFTDA European Tournament (WET) looks to be a first step toward rebooting the European (and Canadian) division that the WFTDA indicated it wanted to create way back in 2010—before scrapping that, too.
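Going back to the third point above, here is one way a cap or scaling penalty on blowouts could look. This is a hypothetical sketch, not the WFTDA’s formula: it works on raw score margins, the 100-point cap is an arbitrary choice, and the tanh version borrows the idea a commenter below recalls from Flat Track Stats without claiming to match how that site actually does it. Another reader points out below that the real algorithm works from score ratios rather than margins; the same clamping could be applied to a ratio instead.

```python
import math

MARGIN_CAP = 100.0  # hypothetical: no extra credit for winning by more than 100

def hard_capped(margin: float, cap: float = MARGIN_CAP) -> float:
    """Clamp a signed score margin (winner positive) to the range [-cap, cap]."""
    return max(-cap, min(cap, margin))

def soft_capped(margin: float, cap: float = MARGIN_CAP) -> float:
    """Scale a signed margin with tanh: nearly linear for small margins,
    flattening out toward +/-cap as blowouts get bigger."""
    return cap * math.tanh(margin / cap)

for margin in (10, 50, 100, 300, 700):
    print(f"{margin:>4}  hard={hard_capped(margin):6.1f}  soft={soft_capped(margin):6.1f}")
```

A hard cap erases any difference between a 150-point win and a 700-point one; the tanh version still ranks the bigger win higher, just by a sliver, which is a gentler way to keep lopsided games from dominating a team’s average.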

But that’s a future conversation. As things stand now with the WFTDA and its formulaic ranking system, it’s doing a pretty fair job of gauging team strength and accurately positioning teams. The recent tweaks to the system will certainly improve things further.

Eventually, teams will start battling Gotham on an even playing field, if they haven’t started doing so already. Once the ranking gulf comes down and the top teams are consistently closer together, maybe we won’t need multiple “divisions” to best separate them.

  • nocklebeast

    I vaguely recall reading on Flat Track Stats several years ago that they use(d?) a function (perhaps the hyperbolic tangent) to treat all bouts with more than a 100 point difference as the same for their ranking system. Also, the difference between Gotham and Rose appears to be only 1-2% in their ranking system.

  • captainlouelbammo

    “X-point blowout” is an irrelevant term when discussing the ranking algorithm. The algorithm uses score ratios, which means there is not much benefit in beating a team 800-5 versus 400-5 or even 200-5 or 100-5. The real test for the winning team is to keep your opponents’ score low so that you can keep the ratio high.