Citation as DNA: Who’s Your Data’s Daddy?
Case law database services are a central and highly competitive segment of the multibillion-dollar legal information market. It’s where Westlaw and Lexis began and where, a decade or so later, they faced competition from LoisLaw and others taking advantage of CD-ROM publication and the Internet. More recently, Bloomberg Law stepped into the main ring. Over the past decade, the lower cost, bar-benefit services Fastcase and Casemaker have also moved aggressively to claim market share. Innovative start-ups like CourtListener and Ravel Law continue to pop up. As conspicuously as these services compete on price, quantity, quality of their supplementary material (briefs, citators, pertinent commentary), currency, completeness, and user experience, little is said or known about the source and quality of their core content. Do you know where your service of choice gets its case law? I undertook to find out.
Imagine you could look under the hood at Westlaw, LexisNexis, and Bloomberg Law. What do you think you would find? One of these, let’s call it “A” for now, appears to conform its case law data consistently to final official reports in those U.S. jurisdictions where they exist, either in print or electronic form. Another, “B” for now, regularly makes editorial changes in precedential opinions without any signal to distinguish its additions from the court’s original text. A third, “C”, loads slip opinions as released by the court and adds to their texts volume numbers and page breaks drawn from both official and unofficial reports but, with the exception of a few jurisdictions, never conforms those texts in other respects to final, official data. Can you say which of these is Bloomberg Law, which is Lexis, and which, Westlaw? If you have used Google Scholar’s potentially disruptive service, have you pondered its approach to case data acquisition? What about Casemaker and Fastcase?
Assuming that the source of case law is important to lawyers, judges, scholars, and librarians, this column explores how citation format can be used to expose the true parentage of a case law collection. (By parentage I mean the ultimate source of the opinion texts it offers, not the intermediary from which the database host may have acquired them in digital form or the citation data it may have added to them.) Since any case law collection of professional value includes both contemporary decisions and cases first disseminated during the print era, the possibility of different sources for different periods of time is quite real.
The Range of Options
Let us begin with the dominant source of benchmark texts for current decisions, the print reports known collectively as the “National Reporter System” (NRS), published by Thomson Reuters. Westlaw is the direct digital descendant of those print reports. Today’s Westlaw includes cases that have yet to appear in the NRS as well as many cases that Thomson Reuters will never publish in print, but the online versions of the decisions the company does publish in the NRS are ultimately conformed to their print counterparts. In the sense I am using the phrase here, the NRS is Westlaw’s data source. The NRS is not conformed to final, official opinion texts issued by the courts themselves. In fact, when asserting copyright in its versions of both federal and state court decisions the company has argued that the extent of its editorial work within opinion texts (and intermixed indistinguishably with them) meets the Copyright Act’s “originality” test. Furthermore, in some jurisdictions, the final NRS version is published well before official publication of an opinion and therefore cannot be conformed to it. Westlaw is “B”.
Where do the other legal research services get their case law? In far too many U.S. jurisdictions, the NRS version is, by default or explicit policy, the “official” version. It replaces the version of the decision initially released by the court (the so-called “slip opinion”) in at least two important respects. First, it contains the volume and page numbers needed to cite the case or any portion of its text. Second, it may contain subsequent revisions of the opinion by its authors or by Thomson Reuters editors with the approval or acquiescence of the court. For jurisdictions taking this approach to case law dissemination, the NRS version of a decision, not the version initially released by the deciding court, is the final, citable version. Many court web sites offer years of back decisions, but only in their original “slip opinion” form. Some even warn users not to rely on them. The Florida Supreme Court’s cautionary note, referring to some 14 years of back decision files, reads: “These opinions are . . . subject to formal revision before publication in the Southern Reporter, 2nd Series.” Since the final, revised opinions appearing in the NRS are not substituted for the initial versions held at the Florida site, this approach gives Thomson Reuters and Westlaw unique access to their final “official” text and forces all others to conform their case data to the NRS if they aim to have opinions that carry full citation information as well as all textual revisions occurring after initial release.
While a majority of U.S. jurisdictions rely entirely on the NRS, the U.S. Supreme Court and the appellate courts of perhaps a third of the states retain control of case law dissemination all the way through to a final, official version. Some jurisdictions continue to issue a set of official print reports. Others, having ceased print publication, now release their appellate decisions digitally in a final, citable form.
In sourcing the case law of jurisdictions that are dependent on the NRS, legal database services have several options. The competitive need to offer an up-to-date collection will leave them little choice but to load temporary “slip opinion” versions of cases as soon as they appear at a public web site. Once decisions have been subjected to the NRS editorial process, a competing service can merely draw and associate the resulting NRS volume and initial page numbers with the original text files without adding internal page breaks or worrying about intervening editorial changes. At additional effort and expense, it can add NRS page breaks to original decision texts. This is the approach labeled “C” above, and it appears to be the Lexis model. A final option is to substitute the text exactly as it appears in the NRS, along with all embedded citation information, for the original “slip.” Whether to take either of the latter steps without a license from Thomson Reuters depends on an entity’s taste for litigation risk and the terms on which such a license is available. Thomson Reuters appears not to have conceded that the copyright it takes in NRS volumes extends only to the headnotes and similar identifiable editorial matter added to them.
Jurisdictions that produce their own “official” case reports allow an alternative approach, namely drawing decision texts (and citation data) from those reports rather than from the proprietary NRS. If the jurisdiction is one of the few that have moved to releasing decisions digitally in final, citable form, the question reduces to whether to retain that version or to replace it with the NRS text of a case once the latter becomes available. If the “official” version is only released later, in print, it still provides an alternative to the NRS. On the other hand, a service can respond to the need to cite using the official report system simply by inserting official report volume and page numbers in the opinion text as originally released. Alternatively, it can insert that additional citation data in the text as drawn from the NRS. Lastly, it can substitute the text exactly as it appears in the official report, the “A” approach followed by Bloomberg Law. How can one tell which approach a given service has employed? Citation analysis. Like DNA, under the right conditions citation format within a decision’s text can provide nearly conclusive evidence of parentage.
Citation Format as DNA
Among the editorial “improvements” made by the Thomson Reuters employees preparing cases for Westlaw and the books of the NRS is the addition of parallel NRS citations to all citations that lack them. With decisions from jurisdictions that do not publish their own case law, that addition is unnecessary; the NRS cite is all there is. It is also unnecessary for decisions handed down by courts that routinely include parallel NRS citations themselves. But the standard citation format used by a number of states includes only the official cite for a case, either throughout or following a first full parallel citation, and that is a traceable marker. It allows one to determine whether a given database has been drawn from the state sponsored “official” source or from NRS-based data.
To apply this approach, I first looked at decisions from Illinois and New Mexico, two jurisdictions that use universal citation to release opinions digitally in final citable form, opinions that themselves cite the states’ own precedent using official system citations without parallel references to the NRS.
On April 25, 2013, the Illinois Supreme Court decided Palm v. 2800 Lake Shore Drive Condominium Ass’n, 2013 IL 110505. The court’s web site has the final version of the text, complete with numbered paragraphs that enable pinpoint citation. A week earlier the New Mexico Supreme Court decided Sunnyland Farms, Inc. v. Central New Mexico Elec. Co-op., Inc. That decision, with its official, print-independent cite, 2013-NMSC-017, and paragraph numbering is also available in final form at a state site. Authenticity is assured by digital signature. Subsequently, the NRS published a version of both these decisions, with internal pagination and the standard NRS editorial treatment of case citations. While the official version of the Illinois decision cites in-state precedent using the official format alone and the New Mexico decision, following an initial full parallel reference to the decision below, employs only the state’s print-independent citation format, in the NRS those and other citations are given full parallel references to the NRS reports. That simple difference provides a DNA-like marker. The format of the citations appearing within those two decisions on a legal research service reveals the service’s data source for current Illinois and New Mexico case law.
Of the big three database services, only Bloomberg Law consistently conforms to the format of the official version. That is true with these two decisions and with decisions from jurisdictions like Massachusetts, Michigan, and New York that publish their own official reports in print and cite only to those reports. The decision of the Michigan Supreme Court in the case of Titan Ins. Co. v. Hyten, released on June 15, 2012, illustrates. When it was published on the court website, it carried neither official nor NRS citation information. With dispatch the “slip” decision was added to the Michigan case law collections of Westlaw, Lexis, Bloomberg Law, LoisLaw, Casemaker, Fastcase, and Google Scholar. Later, the opinion was published in the Michigan Reports (491 Mich. 547) and the NRS North Western Reporter (817 N.W.2d 562). The source of the current version of the case offered by those same services, as determined by citation analysis, is shown in the following table.
|Service||Source of current text||Pagination and notes|
|Lexis||Original version released by the court||Lexis has, however, added both official report and NRS pagination to the original version|
|Bloomberg Law||Official report version||Shows both NRS and official report pagination|
|LoisLaw||Official report version||Shows NRS case cite but not NRS internal pagination|
|Casemaker||NRS||Shows both NRS and official report pagination|
|Fastcase||NRS||Shows both NRS and official report pagination|
|Google Scholar||NRS||Shows only NRS pagination. With recent Massachusetts decisions, however, Google Scholar exhibits reliance on the official report text.|
Note that Lexis follows a unique course. While it draws and inserts pagination data from both the state published official report and the NRS, it retains the original “slip opinion” text. How can one be sure that Lexis has not substituted the official report version for the “slip” and then inserted NRS pagination, the approach of Bloomberg Law? The evidence is to be found in cases that contain internal cross references. Typically this occurs when a majority opinion responds to a point made by a dissenter or vice versa. In jurisdictions that do not number paragraphs, the initial version of such references must either use slip opinion pagination or leave the page number blank, it being left to others to substitute a permanent page number. A public law reporter fills in the reference with official report pagination; the Thomson Reuters editorial staff substitute NRS pagination. Which one appears in an online database reveals its data source.
A decision with this telltale feature is Commonwealth v. Woodbine, handed down by the Massachusetts Supreme Judicial Court on March 28, 2012. The first, second, and third paragraphs of a lengthy dissent refer to specific pages of the majority opinion. In the slip opinion these cross references are blank (e.g., “Ante at __”). On Lexis they remain blank, revealing its retention of the slip version despite the addition of volume and page breaks from the official report and NRS. On Bloomberg Law, LoisLaw, and Google Scholar the blank cross references are filled with page references drawn, presumably along with the rest of the decision text, from the official report. Casemaker and Fastcase contain the parallel page references inserted by Thomson Reuters. The same approach (searching for cross references within cases signaled by the words “ante” and “post”) reveals that Lexis generally relies on the original opinion text rather than either the NRS or subsequent official report version across U.S. jurisdictions, including, importantly, the federal Courts of Appeals. One notable exception is California, a state with which Lexis holds the official report publication contract. In similar fashion, Thomson Reuters alters its normal practice and provides official report page numbers for cross references within decisions from two states for which it publishes official reports, Massachusetts and New York. In the case of New York the publishing contract requires it to do so.
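For readers who want to run this cross-reference test over decision texts in bulk, here is a minimal Python sketch. It is an illustration under stated assumptions, not a tool used in preparing this column: the function names are my own, the page ranges must be supplied by the user from the official report and NRS citations, and the sample text and ranges below are hypothetical rather than the actual Woodbine pagination.

```python
import re

# Matches internal cross references such as "Ante at __" or "post at 730".
CROSS_REF = re.compile(r"\b(?:ante|post)\s+at\s+(_+|\d+)", re.IGNORECASE)


def in_range(page, pages):
    """True when page falls inside the (first, last) page range."""
    first, last = pages
    return first <= page <= last


def classify_cross_references(text, official_pages, nrs_pages):
    """Label each 'ante'/'post' cross reference by its apparent source.

    official_pages and nrs_pages are (first, last) page ranges for the
    decision in the official report and the NRS reporter. Both ranges are
    caller-supplied assumptions; nothing here looks them up.
    """
    labels = []
    for match in CROSS_REF.finditer(text):
        target = match.group(1)
        if target.startswith("_"):
            labels.append("slip")  # page number left blank, as in the slip opinion
            continue
        page = int(target)
        official = in_range(page, official_pages)
        nrs = in_range(page, nrs_pages)
        if official and nrs:
            labels.append("ambiguous")  # overlapping ranges cannot be told apart
        elif official:
            labels.append("official")
        elif nrs:
            labels.append("nrs")
        else:
            labels.append("unknown")
    return labels


# Hypothetical text and page ranges, for illustration only.
sample = "Ante at __. See post at 730. Ante at 960."
print(classify_cross_references(sample, (720, 745), (956, 975)))
```

A result dominated by "slip" points to retention of the original opinion text, while "official" or "nrs" points to the corresponding report. As noted above for Massachusetts and New York, Thomson Reuters sometimes fills cross references with official report pages, so the label alone is not conclusive for those states.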
For reasons less obvious Google Scholar, which mostly draws current case data from NRS, is not totally consistent. With Massachusetts, at least, Google Scholar relies on the state’s official report.
Decisions From Pre-Digital Days
A competitive case law database must not only continuously add and further process current cases; it requires a sizable retrospective collection. Before the Google Scholar case database launched, Google had to acquire back-files for all fifty states and the federal courts, reaching back, if not all the way to their earliest case reports, then well into the 20th century at a minimum. In states that no longer published their own case law (Florida, Indiana, and Missouri, to name three) there was only one option, the NRS regional reporter, at least for the years after the state’s cessation of public law report publication: 1948, 1981, and 1956, respectively. For any state still publishing a set of official reports (Michigan, say, or Kansas or New Hampshire), an alternative source was available. Convenience and consistency might have argued for using the NRS as the source for all fifty states; other factors favored using “official” state-published reports where they existed. Citation analysis reveals that Google Scholar used the official, state published decisions for Kansas, Michigan, and New Hampshire. Its copies of decisions from the 1980s for those jurisdictions contain pagination from those states’ official reports even as decisions from Florida, Indiana, and Missouri show page breaks drawn from Thomson Reuters’ regional reports.
Bloomberg Law and LoisLaw also opted for the official source, though Bloomberg requires closer analysis on this point. Unlike Google Scholar and LoisLaw, Bloomberg shows both official report and NRS page breaks in decisions that have both. How can one tell from which source it drew the decision texts? The case citations appearing in decisions from official report states like Michigan, Massachusetts, and New York provide the telltale marker. They do not include parallel references to the NRS, while the NRS versions of the same decisions do. In other words, when a Michigan decision rendered in 1956 cites prior authority, the citation appears in the official report as “Chase v. Clinton County, 241 Mich. 478.” In the regional reporter that citation has become “Chase v. Clinton County, 241 Mich. 478, 217 N.W. 565.” The Bloomberg Law version of this 1956 decision contains the citation to Chase without the NRS addition. One can observe the same in the Bloomberg Law versions of decisions from Massachusetts and New York.
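The parallel-citation marker lends itself to a simple automated check. The Python sketch below counts official report citations in a text and how many are immediately followed by an NRS regional reporter cite. It is a rough illustration only: the function name, the list of reporter abbreviations, and the regular expressions are my own assumptions, and they will miss variants such as citations with pinpoint pages between the two cites.

```python
import re

# An NRS regional reporter cite directly after an official cite,
# e.g. ", 217 N.W. 565" or ", 817 N.W.2d 562". The abbreviation list is
# an illustrative assumption, not an exhaustive one.
NRS_PARALLEL = re.compile(
    r"\s*,\s*\d+\s+(?:N\.W\.|N\.E\.|S\.W\.|S\.E\.|So\.|A\.|P\.)(?:\s?[23]d)?\s+\d+"
)


def official_cite_pattern(reporter):
    """Pattern for an official report cite such as '241 Mich. 478'."""
    return re.compile(r"\b\d+\s+" + re.escape(reporter) + r"\s+\d+")


def parallel_share(text, reporter):
    """Return (official_cites_found, cites_followed_by_an_nrs_parallel)."""
    total = 0
    with_parallel = 0
    for match in official_cite_pattern(reporter).finditer(text):
        total += 1
        # Look for an NRS cite starting exactly where the official cite ends.
        if NRS_PARALLEL.match(text, match.end()):
            with_parallel += 1
    return total, with_parallel


# The two forms of the Chase citation quoted above.
official_form = "Chase v. Clinton County, 241 Mich. 478."
nrs_form = "Chase v. Clinton County, 241 Mich. 478, 217 N.W. 565."
print(parallel_share(official_form, "Mich."))  # (1, 0)
print(parallel_share(nrs_form, "Mich."))       # (1, 1)
```

Run over a full decision from an official report state, a count in which few or none of the official cites carry parallels suggests the official report as the source, while parallels throughout suggest NRS-derived text. It is only a hint where the court itself routinely includes parallel citations.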
Applying these methods (and the same limited sample of decisions and states) to the retrospective collections of other legal research services yields the following conclusions about their data sources.
|Lexis||Official report version||Official report version||Official report version|
|Casemaker||NRS version||NRS version||NRS version|
|Fastcase||NRS version||NRS version||NRS version|
A comprehensive review of where the major legal research services have drawn their case data would require a much broader sample of jurisdictions and decisions than I have examined. There are many reasons a particular service may treat jurisdictions differently. For example, despite its tight genetic link to the NRS, Westlaw offers New York appellate decisions in their official format, as an option. It holds the publication contract for the state’s official reports and, as previously noted, the contract’s terms require it to do so. Similarly, Lexis appears to conform decision texts to official reports in at least some of the states for which it holds the publication contract. Both Casemaker and Fastcase have important relationships with state bar associations. It is quite plausible that those relationships may have affected or might in future affect their treatment of particular states’ case law. I invite others to pursue the point.
My principal aim has been to investigate and draw attention to the existence of differences among the major database services—differences both in how they deal with current decisions and in how their back files were assembled. The three top-of-the-line services do appear to represent three distinctive approaches. Westlaw is NRS. Bloomberg Law draws its case data from official reports in those jurisdictions that release them in electronic form or where they still exist in print. Lexis generally retains opinions in the form released by the deciding court, merely inserting NRS and official report page breaks as they become available. Are these differences important? Do users care? While lawyers, judges, and librarians, especially, pay lip service to the importance of document “authenticity” and a preference for “official” sources, it seems quite possible that for most, when it comes to case law, “good enough” and “generally reliable” carry the day. However, if knowing your data’s daddy is important to you, applying the techniques of citation analysis I’ve demonstrated here should reveal the identity. Might this “citation as DNA” approach lead to erroneous attributions of parentage? I would welcome debate over any of the conclusions reached here.
As print law reports disappear and online case law sources continue to proliferate, the origin and quality of their data warrant serious attention. It is far too easy to lose sight of that and be drawn by the flash of a new user interface, the enticement of lower cost, or the comfort of a familiar brand.