# NCAA Basketball - Rankings for Selection Sunday

## Introduction

These pages contain details and results of an approach for ranking US college basketball teams, aimed at guiding the selection of teams for the annual NCAA national tournament. The Background section below describes the selection process.

Our approach is to quantify each team's strength of schedule in terms of a benchmark number of wins. This enables us to compare schedules for different teams, in terms of how many more wins we would expect a team to get if they played one schedule rather than another. To calculate this strength of schedule we estimate the average number of wins a fixed team, comparable in ability to the last team in the national tournament, would get if they played that schedule. If a team wins more games than their strength of schedule, then that is evidence that they warrant a bid to the national tournament.

Interpreting the results is relatively straightforward. In 2008/09 Creighton had a win-loss record of 26-7 and Boston College one of 22-11. While Creighton won 4 more matches than Boston College, Creighton's strength of schedule is 26.5 wins and Boston College's is 21.5 wins. Thus Creighton's schedule is 5 wins easier than Boston College's -- and so we rank Boston College as having the better record, by one win.
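The comparison above can be sketched numerically. A team's rating is simply its actual wins minus its strength of schedule; the SOS figures below are the 2008/09 values quoted above, while the `rating` helper is our own illustrative function, not part of any published implementation.

```python
# Rating = actual wins minus strength of schedule (SOS).
# SOS figures are the 2008/09 values quoted in the text; `rating`
# is an illustrative helper only.

def rating(wins: int, sos: float) -> float:
    """Wins above (or below) the benchmark expected for this schedule."""
    return wins - sos

creighton = rating(26, 26.5)       # -0.5: half a win below the benchmark
boston_college = rating(22, 21.5)  # +0.5: half a win above the benchmark

# Boston College's record comes out one win better once schedules
# are taken into account.
print(boston_college - creighton)  # 1.0
```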

The strength of schedules are calculated by fitting a statistical model to all regular season results to date. This is used to estimate the ability of each team, and thus the likelihood of winning a match against that team. We also give a margin of error, which gives an indication of how accurate the resulting estimates of the strength of schedule are. This can be used to highlight the teams for which there is uncertainty about whether their record deserves a bid to the national tournament.

Below is a Frequently Asked Questions section, which contains more detail on the ranking procedure and the interpretation of the results.

Full details of the method can be found in the academic paper *Calculating Strength of Schedule, and Choosing Teams for March Madness*.

## Background

There are around 340 division I US college basketball teams, almost all of which belong to one of 31 conferences. Each March there is a national basketball tournament (known as March Madness). This is a knockout tournament, in which (from 2011) 68 teams take part. The field of 68 consists of 31 conference champions, and a further 37 teams that are picked by committee. The announcement of these 37 teams, and the draw for the tournament, is made on Selection Sunday.

The difficulty with selecting the 37 teams for the tournament is that the strength of schedule of each team varies greatly. For example in 2009, Arizona were given entry to the national tournament, with a Win-Loss record of 19-13, whereas Creighton, with a 26-7 record, were not. Picking the 37 teams with the best win-loss record is not appropriate.

## Frequently Asked Questions

• How are the rankings calculated?
We compare a team's actual win-loss record to that given by their strength of schedule. To account for differences in the number of games played, the comparison is based on the percentage of wins. Formally we calculate a team's win percentage minus their SOS win percentage, and rank on this.
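As a sketch of this metric, and assuming both teams played 33 regular-season games (consistent with the 26-7 and 22-11 records quoted earlier), the ranking might be computed as:

```python
# Ranking metric: win percentage minus SOS win percentage.
# Games played are inferred from the win-loss records quoted earlier;
# the data structure and function names are our own.

def rank_metric(wins: int, games: int, sos_wins: float) -> float:
    return wins / games - sos_wins / games

teams = {
    "Creighton": (26, 33, 26.5),       # 26-7 record, SOS 26.5 wins
    "Boston College": (22, 33, 21.5),  # 22-11 record, SOS 21.5 wins
}

ranked = sorted(teams, key=lambda t: rank_metric(*teams[t]), reverse=True)
print(ranked)  # ['Boston College', 'Creighton']
```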
• How do the rankings differ from using the Rating Percentage Index (RPI)?
The RPI aims to rank teams based on their win-loss record after compensating for the strength of each team's opponents. This is done by calculating a weighted average of a team's win percentage, their opponents' average win percentage, and their opponents' opponents' win percentage. To account for home advantage, more weight is given to away wins and home losses in calculating the first of these.
There are a number of issues with the RPI. For example, there are academic papers which suggest the RPI is biased against teams from stronger conferences. Also there are some strange artefacts of how the RPI is calculated. Firstly the weighting for away vs home wins/losses treats the effect of home advantage as equal for all matches, whereas home advantage is particularly important when teams play other teams of similar ability, and has little effect on the result when a team plays an opponent who is either much stronger or much weaker than they are. In our calculation of strength of schedule, the effect of home advantage correctly takes account of the strength of the opponent.
Secondly, a team's RPI can be decreased (often substantially) by playing very weak teams, even if the team wins all those matches; or conversely can be increased by losses to very strong teams. This appears to have been a factor in Arizona State not being selected for the NCAA tournament in 2008 (see this paper for more discussion). By comparison, our approach correctly accounts for the information from these results. For example, a match against a very weak team will increase a team's SOS by (almost) 1 expected win. Thus providing the team wins that match, it will have negligible effect on a team's rating as its number of wins less its expected number of wins (from the SOS) will be unchanged.
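The effect described above can be illustrated by treating the SOS as a sum of per-match win probabilities for the benchmark team; all the probabilities below are invented for the sketch.

```python
# SOS viewed as a sum of the benchmark team's win probabilities, one per
# match. Adding a near-certain win over a very weak opponent raises the
# SOS by almost 1, so winning that match leaves (wins - SOS) essentially
# unchanged. All probabilities here are invented for illustration.

schedule = [0.7, 0.5, 0.6]   # benchmark win probabilities vs each opponent
sos = sum(schedule)          # 1.8 expected wins
wins = 2

schedule2 = schedule + [0.99]  # same schedule plus a very weak opponent
sos2 = sum(schedule2)          # 2.79 expected wins
wins2 = wins + 1               # the team beats the weak opponent

print(round(wins - sos, 2))    # 0.2
print(round(wins2 - sos2, 2))  # 0.21: the rating barely moves
```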
• How do the rankings differ from other rankings, such as Jeff Sagarin's Computer Ratings?
Evidence suggests that the most accurate computer rankings take into account winning margin, and not just the result of each match. An example is Jeff Sagarin's Pure Points system. The problem with using such ranking systems to choose who receives bids for the national tournament is that if two teams have comparable schedules it will be the team with the better net point difference that is ranked higher, and not necessarily the team with the most wins. This goes against the idea that what primarily matters is who wins a match, not by how much. It could also mean that the best tactic for a team that is losing is to play conservatively and not try to win, rather than take risks in an attempt to win but risk losing more heavily.
Our approach is based on ranking teams based on their win/loss record against an expected number of wins (the SOS) for their schedule. It uses computer modelling, which takes account of winning margins, in order to most accurately estimate the ability of a team's opponents and hence that team's SOS. Thus it takes advantage of the extra information that can be obtained from using winning margins, whilst still ranking the teams based on how many matches they win, rather than by how much they win or lose by.
• What are a team's win percentage and SOS win percentage?
A team's win percentage is calculated as the number of wins divided by the number of matches played for that team. The SOS win percentage is the SOS number of wins divided by the number of matches played for that team.
• What is the margin of error?
To calculate each team's strength of schedule we need to quantify the strength of each of their opponents. This can be done using statistical methods based on the results of matches within the current season. However this only gives us an estimate of the strength of each team, and there is corresponding uncertainty in these estimates. (This uncertainty is particularly large towards the beginning of the season.) The margin of error quantifies the corresponding uncertainty in the estimates of the strength of schedule. We define the margin of error so that the chance of the SOS being wrong by more than the margin of error in a particular direction is about 5%.
Consider Creighton in 2008/09. Their strength of schedule is estimated as 26.5, but the margin of error is 0.6. Thus their actual strength of schedule could lie anywhere from 25.9 to 27.1 (i.e. 26.5 plus or minus 0.6), though it is most likely to be close to 26.5.
As Creighton only won 26 games, the strength of schedule suggests they did not do enough to warrant a bid. However, the margin of error suggests they are worthy of further consideration, as the difference between actual wins and estimated strength of schedule is less than the margin of error.
By comparison Baylor won 19 games with a strength of schedule of 19.7. This time the difference is greater than the margin of error, so there is very strong evidence that Baylor have not won enough games to warrant a bid.
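A minimal sketch of how the margin of error might be used to triage teams, using the Creighton and Baylor figures above; the `verdict` function and its wording are our own.

```python
# Compare (wins - SOS) against the margin of error. The figures are the
# 2008/09 values quoted in the text; the function is an illustrative helper.

def verdict(wins: int, sos: float, margin: float) -> str:
    diff = wins - sos
    if diff >= 0:
        return "record supports a bid"
    if -diff <= margin:
        return "borderline: within margin of error"
    return "record does not support a bid"

print(verdict(26, 26.5, 0.6))  # Creighton: borderline, within margin of error
print(verdict(19, 19.7, 0.6))  # Baylor: record does not support a bid
```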
• Why are the expected number of wins not whole numbers?
Whilst the actual number of wins a team has must be a whole number, the expected number of wins need not be. For example, if we consider a single match, and say it is equally likely the team would win or lose, then the expected number of wins for this match is 0.5.
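In code, an expected number of wins is just a sum of per-match win probabilities, so it is rarely a whole number; the probabilities below are invented for illustration.

```python
# Expected wins = sum of per-match win probabilities for a schedule,
# so the total need not be a whole number. Invented probabilities.
probs = [0.5, 0.8, 0.25]
expected_wins = sum(probs)
print(round(expected_wins, 2))  # 1.55
```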