More on Ranking Law Schools, and What Can Be Learned from Ranking of Sports Teams: Part Two in a Series

In my last column, Part One in this series, I offered some observations about the increasingly loud and frequent criticisms of the US News rankings systems for law (and medical) schools that have been voiced in recent months by some of the nation’s best-known (and generally most highly rated) professional schools themselves. I explained that while some of the grounds for complaint are quite powerful, others may be self-contradictory and perhaps also self-serving. Some of the problem, I suggested, arises not just from the US News methodology in particular, but also from the fact that most every ranked institution feels underappreciated and thus will have qualms about any ranking that is not tailor-made to its own preferences. But I also pointed out that rankings controversies and criticisms (in a variety of arenas) are nothing new, and for that reason the professional-school-ranking world may benefit from looking at how rankings methods are evolving in the world of sports. In the space below, I suggest at least four ways in which rankings of academic institutions can borrow from innovations in college sports rankings.

To set the stage for these potential lessons, let’s begin by noting that both academic-institutional rankings and sports rankings make significant use of surveys or polls of (presumably) knowledgeable “experts” to evaluate and compare the overall strength of competing institutions. But such surveys of experts (in all popular domains of rankings) suffer from many flaws, including the fact that human perceptions (even perceptions of presumed experts) suffer from feedback loops, anchoring effects, and recency bias, as well as from the inability of each expert to know a lot about all the institutions being ranked. Take, for example, US News law rankings’ reputational surveys conducted among deans and professors of other law schools, as well as among a small number of lawyers and judges throughout the country. Are the several hundred academics who receive and return their surveys really the most knowledgeable folks about all 200 or so ABA-approved schools? Do lawyers and judges in some parts of the country really know much about very many law schools in other regions? And when survey respondents don’t really know much, are they inclined to over-rely on the bottom-line rankings from the previous year, creating a self-fulfilling (and often unrealistic) ordinal sequence where schools’ rankings (overall) generally remain static over time, despite significant changes that the schools might have made over a several-year period? And, relatedly, do those rankings, in turn, help perpetuate the (relatively) static ordinal ranking by encouraging well-qualified prospective students and faculty members to choose to attend or work at the same-old group of highly ranked schools?

These same kinds of flaws plague college sports rankings as well. How many basketball games can each of the 62 Associated Press voters, whose rankings come out weekly, really watch, especially when each of these voters is a journalist who is busy cranking out content about the local team s/he covers? It is undeniable that many voters lack deep knowledge about many of the teams they evaluate. And such AP voters are undoubtedly overly influenced by last year’s (or last week’s) outcomes when they vote, even as rosters of college teams change tremendously between years and even within each year.

So what can be done to address these problems? In the college sports world, journalist voters have increasingly been introduced to and encouraged to make use of numerical, analytic metrics to help in the assessment, especially of teams that a voter likely has not had a chance to observe (in person or even on TV) over a large sample of games. Prominent metrics systems—which focus on statistics rather than human perceptions—in college basketball include the NET rankings, KenPom rankings, and Sagarin rankings. These systems rank teams (and sometimes individual players) based on many categories of offensive and defensive efficiency (e.g., how many points-per-possession a team has scored or given up over a large number of games), the frequency with which a team gets rebounds of its own missed shots enabling the possibility of so-called second-chance points, the margins of victory and loss of teams rather than just win-loss records, and the like. And all these statistics purport to take account of the quality of the opponents (again, as measured by statistics) against whom each team has competed, and whether the statistics in each game were achieved at a team’s home venue (where crowd support and perhaps more friendly refereeing help), on the road at opponents’ arenas (whose hostility itself varies by venue and may be taken into account), or in so-called neutral sites.

Even for journalists/voters who make use of metrics, there is a tendency for experts to think they can “beat the market.” Here’s one (admittedly anecdotal) illustration of how analytic metrics may warrant even more attention by analysts/journalists/AP voters: Here in Champaign, Illinois, a local newspaper (whose beat writer is one of the 62 AP voters for college basketball) previews and predicts the outcome and score of each game for the University of Illinois men’s team (which as of now has an overall record of 19-10). Putting aside predictions of actual scores (which are almost impossible to predict accurately), the outcome record, as of the drafting of this column, by our local paper here this year (yes, an admittedly limited sample size of just one year) is 16-13. That is, of the 29 games, the local expert predicted the correct outcome for the Illinois team 16 times. If you exclude the six home games against very weak opponents from much less competitive conferences (games in which the Illini were favored by double digits and in which getting the outcome correct in favor of Illinois takes very little knowledge), the record is 10-13. If, instead, one had in each Illinois contest simply picked the team that the ESPN Power Index predictor (yet a fourth analytics ranking system) said, based on its number-crunching algorithm, had a 50+% chance of winning, one would have a prediction record of 23-6 (or 17-6 if you exclude the truly non-competitive matches).
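The arithmetic behind that comparison can be laid out in a few lines. The sketch below simply converts the prediction records quoted above into accuracy rates; the function name and the labels are illustrative choices of mine, not part of any published methodology.

```python
def accuracy(correct: int, wrong: int) -> float:
    """Share of games whose outcome was predicted correctly."""
    return correct / (correct + wrong)

# Prediction records quoted in the paragraph above, as (correct, wrong).
records = {
    "local paper, all 29 games": (16, 13),
    "local paper, excluding 6 mismatches": (10, 13),
    "ESPN predictor, all 29 games": (23, 6),
    "ESPN predictor, excluding 6 mismatches": (17, 6),
}

rates = {label: accuracy(c, w) for label, (c, w) in records.items()}

for label, rate in rates.items():
    print(f"{label}: {rate:.1%}")
```

On these numbers the local picks were right about 55% of the time (about 43% once the mismatches are set aside), against roughly 79% (about 74%) for the algorithm. With a sample of a single season, the gap is suggestive rather than conclusive.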

To be sure, aspects of the various metrics methodologies themselves are open to criticism, and the algorithms are being refined (and hopefully improved) year by year. But the trend regarding their importance and use is quite clear. Even if the so-called “eyeball test” remains important in sports rankings, and even though individual voters are not yet replaceable by automated assessment systems (in part because each team doesn’t play all other teams each year and teams get better or worse during each season, such that fully controlling for quality of opponent is impossible), more and more decisions that matter—e.g., the selection of the field of 68 March Madness teams—are deeply influenced by the numbers.

So my first suggestion is that academic-institutional ratings should make better use of numerical data as well, and that the “voters” (those who fill out academic reputational surveys) should consult such data with greater frequency and sophistication when casting ballots. But just as controlling for things like strength of schedule in sports rankings is hard, so too comparing numerical assessments of academic-institutional performance can be challenging. Two examples drawn from the law-school world are job placement numbers and bar passage numbers. The ABA collects, and US News weighs somewhat heavily, the percentage of a law school’s graduating class that is employed in full-time, long-term (that is, slated to last a year or more) jobs that require or benefit greatly from having a law degree. Seems fair enough; law schools ought to be launching not just good careers but distinctively legal careers. (While some small number of graduates for personal reasons may prefer to work part-time after graduation, there is no reason to believe that percentage won’t be pretty similar among all law schools, and thus become a non-factor.) But what about law graduates who want to continue pursuing formal education (say, a PhD or an advanced degree in a particular field of law) rather than do legal work right after graduation? The percentage of these folks does vary greatly across law schools. For many years up until the current one, US News had been counting such folks as not being fully employed. Rightly (from my perspective), US News has seen fit, in the coming rankings, to eliminate that discrimination against graduate students. US News also recently decided to stop discriminating against jobs that were funded by a graduate’s own law school or parent university. Here too that change makes some sense; if a school has the resources to provide good jobs that last a year or more and that offer additional on-the-job training in specific fields (like public interest law) for recent graduates, why should such graduates’ jobs count for less?

The difficulty here, of course, is whether these jobs all do in fact provide good work/training opportunities (and a fair wage) rather than make-weight tasks (and a pittance) that are fashioned merely to boost a school’s employment numbers. If a law school funds very low-pay, low-training-value positions for its graduates, should such jobs be counted the same way as all other employment? Probably not. But this observation reveals a much bigger problem (and one US News has not yet addressed) that comes from simply counting how many people are employed without looking more carefully at the types of jobs the graduates have. There is a big difference between working at the most sought-after law firms or in the most coveted judicial clerkships in cities where the supply of highly regarded law graduates exceeds demand, on the one hand, and working in jobs that offer less sophisticated work (and much lower pay) in much less sought-after cities and institutions, on the other. Of course, some graduates would rather work in Peoria, Illinois, than Chicago, but, in all honesty, it is much harder to secure a good job in the latter most of the time. For this reason, schools that are located in states (like California) where highly pedigreed graduates from all over the country are vying for jobs in tight markets (like San Francisco) are going to have lower placement rates than schools in less populated states where job seekers are not competing against nearly as many top-performing law graduates. Trying to account for these differences in markets is not an easy thing to do, but not doing it makes meaningful comparison hard too.

Or consider bar passage rates (another criterion on which US News compares schools). Different states have bar exams that vary in difficulty; the so-called “cut score” (or score needed to pass) differs by state. Until recently US News compared each school’s bar passage rate only by looking at the state in which the plurality of a school’s graduates sat for the exam. So if 70% of a school’s graduating class sat for, say, the Missouri bar, and had a high pass rate (in part because Missouri has among the lowest cut scores), that school would look very good even if the other 30% of its graduates had a much tougher time passing bar exams in other states. Happily, US News now looks at where each graduate takes a bar and assesses performance against the backdrop of each state’s overall pass rate. But even that correction doesn’t level the playing field. Why? Because not only do pass rates differ among states; so too does the academic strength of the pool of bar takers. So, for example, California has not only a high cut score (and thus a lower pass rate on that account, something US News has now controlled for), but it also has a pool of test takers that is much stronger than the national average (because many of the most ambitious and talented graduates around the country want to live there). As of now, US News does not account for that latter factor, and so schools that have a large number of graduates who take the California bar are at a disadvantage (both as to the bar-exam and the placement-rate aspects of US News).
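To make the difference between the two approaches concrete, here is a minimal sketch. It assumes one plausible reading of "assesses performance against the backdrop of each state’s overall pass rate": a taker-weighted average of the gap between a school’s pass rate and the state’s pass rate in each jurisdiction. That formula, the function and field names, and all of the numbers are my own illustrative assumptions, not US News’s published method or any school’s real data.

```python
from dataclasses import dataclass

@dataclass
class JurisdictionResult:
    state: str
    takers: int             # the school's graduates who sat for this bar
    passed: int             # how many of those graduates passed
    state_pass_rate: float  # overall pass rate among all takers in the state

def relative_bar_performance(results: list[JurisdictionResult]) -> float:
    """Taker-weighted average of (school pass rate - state pass rate)."""
    total_takers = sum(r.takers for r in results)
    if total_takers == 0:
        raise ValueError("no bar takers")
    weighted_gap = sum(
        r.takers * (r.passed / r.takers - r.state_pass_rate)
        for r in results
        if r.takers > 0
    )
    return weighted_gap / total_takers

# Hypothetical school: 70 graduates sit in a low-cut-score state,
# 30 sit in a state with a harder exam.
school = [
    JurisdictionResult("State A", takers=70, passed=63, state_pass_rate=0.85),
    JurisdictionResult("State B", takers=30, passed=15, state_pass_rate=0.55),
]

# Old approach: look only at the jurisdiction with the most takers.
plurality = max(school, key=lambda r: r.takers)
plurality_only = plurality.passed / plurality.takers - plurality.state_pass_rate

# Newer approach: every jurisdiction counts, weighted by number of takers.
all_jurisdictions = relative_bar_performance(school)

print(f"plurality state only: {plurality_only:+.3f}")
print(f"all jurisdictions:    {all_jurisdictions:+.3f}")
```

In this made-up case the school beats its main state’s pass rate by about five points but beats the blended benchmark by only about two. Note that neither figure adjusts for how strong the pool of test takers is in each state, which is the remaining gap described above.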

The second way in which academic rankings can learn from sports rankings concerns the timing of rankings. The College Football Playoff Rankings (which in recent years have determined which four teams play in the two semifinal games that lead to a national championship game) do not come out until the second half of the season. This is because the rankers understand that until there is a body of actual evidence about how good each team is in a given year, it is counterproductive to rank teams, especially when doing so would be unduly influenced by the previous year (and by media hype going into each season), and would also risk unfair stickiness that rewards the teams that were expected to be good but that may not be delivering on expectations. What does this mean for law school rankings? Perhaps that they shouldn’t be done every year. How much real change occurs year to year anyway? Perhaps ranking schools every three or five years (using data averages drawn from the whole three- or five-year period) would be more sound. US News doesn’t rank most of a university’s academic departments each year. It does so only for professional schools, but there is no reason to think professional school (relative) quality changes more quickly than does the relative quality of other departments. (I understand that ranking less frequently may result in less revenue for US News, but it might also bolster the publication’s credibility; even moving to every-other-year rather than every-year rankings would be an improvement.)

That leads me to a third suggestion: if averaging rankings criteria over a time period (see above) makes sense, so too does averaging rankings across different methodologies. In college basketball, for example, the March Madness Tournament selection committee makes use of multiple analytic systems (and is understandably somewhat guarded about its own processes) as well as “eyeball” tests like AP rankings. Just as, in politics, a poll of polls is often more accurate than most individual polls, so too the answer to dissatisfaction about US News rankings perhaps should be the support of various other rankings so that no one rankings system dominates.
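One simple way to build such a "poll of polls" is to average each school’s ordinal position across several independent rankings. The sketch below is only an illustration of that idea: the system names and school labels are placeholders, and a plain unweighted mean is an assumption of mine rather than a method used by any selection committee or publisher.

```python
def composite_ranking(rankings: dict[str, list[str]]) -> list[tuple[str, float]]:
    """Average each school's rank (1 = best) across all ranking systems.

    Only schools that appear in every system are included, so that each
    average is taken over the same number of rankings.
    """
    common = set.intersection(*(set(order) for order in rankings.values()))
    averages = {}
    for school in common:
        positions = [order.index(school) + 1 for order in rankings.values()]
        averages[school] = sum(positions) / len(positions)
    # Sort by average rank; break ties alphabetically for a stable result.
    return sorted(averages.items(), key=lambda item: (item[1], item[0]))

# Hypothetical orderings from three different ranking systems.
rankings = {
    "system_1": ["A", "B", "C", "D"],
    "system_2": ["B", "A", "D", "C"],
    "system_3": ["A", "C", "B", "D"],
}

composite = composite_ranking(rankings)
for school, avg_rank in composite:
    print(f"{school}: average rank {avg_rank:.2f}")
```

Because each system’s idiosyncrasies are diluted by the others, no single methodology decides the outcome. A more careful version might weight systems differently or average underlying scores rather than ordinal positions.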

All of that leads me to my final suggestion, also drawn from college basketball. One of the great things about KenPom and other analytic rankings of college hoops is that each consumer can, with a push of a button, rank teams based on the criteria that they find most important. That is, they can create a personalized ranking. One big drawback of US News is not just that some of its component factors might be flawed, but also that the weighting of the various factors (while perhaps defensible) is somewhat arbitrary. How many students really care how many volumes are in a law school’s library? And yet that traditionally was a factor in the rankings. If a prospective student cares about not just overall job placement rates but placement rates among large firms, or in prominent public interest organizations, or in clerkships at federal courts, shouldn’t the student be easily able to adjust the relative weight of different factors? Or if a prospective student (or faculty member) cares about the frequency with which faculty at a given school are cited in legal periodicals or in judicial opinions or are downloaded on SSRN, or about the diversity of the student body (things US News currently does not include at all), shouldn’t it be easy for that consumer to adjust the weights of the competing variables? Just as the answer to bad speech in America should usually be more and better speech, so too the answer to bad rankings might be more (not fewer), better, and more closely tailored rankings.
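A personalized ranking of this kind is not hard to sketch. The example below assumes each factor is one where higher is better, rescales every factor to a 0-to-1 range so that different units can be combined, and then applies whatever weights the user supplies. The factor names, the school data, and the min-max scaling choice are all hypothetical illustrations, not a description of how KenPom or US News actually computes anything.

```python
def personalized_ranking(
    schools: dict[str, dict[str, float]],
    weights: dict[str, float],
) -> list[tuple[str, float]]:
    """Rank schools by a weighted sum of min-max scaled factor values."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    scores = {name: 0.0 for name in schools}
    for factor, weight in weights.items():
        values = [data[factor] for data in schools.values()]
        low, spread = min(values), max(values) - min(values)
        for name, data in schools.items():
            # Scale to [0, 1]; a factor with no variation contributes nothing.
            scaled = (data[factor] - low) / spread if spread else 0.0
            scores[name] += (weight / total) * scaled
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical data for three schools.
schools = {
    "School X": {"employment_rate": 0.95, "clerkship_rate": 0.05, "citations_per_faculty": 40},
    "School Y": {"employment_rate": 0.88, "clerkship_rate": 0.15, "citations_per_faculty": 90},
    "School Z": {"employment_rate": 0.91, "clerkship_rate": 0.10, "citations_per_faculty": 60},
}

# A student focused on jobs versus a reader focused on scholarship.
jobs_first = personalized_ranking(
    schools, {"employment_rate": 3, "clerkship_rate": 1, "citations_per_faculty": 0}
)
scholarship_first = personalized_ranking(
    schools, {"employment_rate": 1, "clerkship_rate": 1, "citations_per_faculty": 3}
)

print([name for name, _ in jobs_first])
print([name for name, _ in scholarship_first])
```

With the same underlying data, the jobs-focused weights put School X first and School Y last, while the scholarship-focused weights reverse that order. That is the point: the data can be shared while the weighting is left to the person who will actually rely on the ranking.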