Forensic firearms identification involves linking evidence collected from crime scenes — namely, fired cartridge casings and bullets — to a particular firearm. Two assumptions underlie this identification process: First, firearms impart unique toolmarks on bullets and cartridge cases; and second, trained examiners can spot these marks and reliably determine that they were created by the same gun. Such testimony has played a central role in criminal trials for more than a century, connecting specific guns to specific crimes.1 But in recent times, the technique has faced increasing scientific and judicial scrutiny. This scrutiny is likely to increase as proposed amendments to Federal Rule of Evidence 702 — which governs expert testimony in federal courts — are set to take effect this December.
This essay summarizes a comprehensive review of more than 100 years of firearms comparison evidence case law. We first describe how judges’ initial skepticism of the new methodology quickly transformed into near-universal acceptance, largely because confident experts displayed dazzling new technology, terminology, and techniques. But after decades of rote acceptance of the assumptions underlying firearms comparison evidence, judicial engagement with and skepticism of the technique have surged. Of the judicial rulings discussing this kind of evidence that we reviewed for our comprehensive online database,2 more than half were penned after 2010. Several factors are associated with this uptick, including the release of scathing reports by the National Academy of Sciences (NAS) and the President’s Council of Advisors on Science & Technology (PCAST), as well as the findings of new empirical studies that call into question the validity of firearm identification.
For the first time since 2000, the Federal Advisory Committee on Evidence Rules has proposed a set of amendments to Rule 702.3 The revisions specifically make clear that: (1) the proponent of an expert must show, by a preponderance standard, that various reliability requirements are met; and (2) an expert’s opinions must be supported by a reliable application of trustworthy methods to data.4 These amendments stem from the committee’s growing concerns over courts’ failure to properly vet expert scientific evidence in criminal cases. The Advisory Committee notes emphasize that these revisions are “especially pertinent” to forensic evidence.5 Further, for forensic pattern-comparison methods, like firearms evidence, the committee noted that opinions “must be limited to those inferences that can reasonably be drawn from a reliable application of the principles and methods.”6 The Rules Committee’s proposed amendments to Rule 702 will take effect on December 1, 2023, unless Congress acts.7
The committee’s amendments are likely to affect a wide range of types of expert evidence. By clarifying the burden on the party seeking to introduce an expert and highlighting the need to assure reliable use of methods to reach conclusions, the committee’s concerns are particularly important in the context of forensic methods, like firearms evidence, that have grown out of the experience of practitioners but have never been carefully scientifically validated or subject to robust empirical testing. As we will describe, the rule change targets the two main concerns that judges have raised in that context — (1) methodological reliability and (2) the overstatement of opinions in conclusions reached by experts — in firearms evidence cases.
Judges have increasingly assessed firearms examiners’ evidence as part of admissibility challenges. The dramatic rise in judicial engagement with the scientific limitations of firearms comparisons illustrates that scientific input matters to judges when they apply Daubert v. Merrell Dow Pharmaceuticals Inc. and Rule 702. Over time, more robust oversight from both scientific and legal stakeholders is likely to promote enhanced accuracy of evidentiary methods. We conclude by examining lessons regarding the gradual judicial shift toward a more scientific approach.
An unfired cartridge case contains three basic components: (1) a primer (which is located at the head of the cartridge case); (2) propellant (i.e., gunpowder); and (3) a bullet. When someone pulls a gun’s trigger, the gun’s firing pin strikes the primer. This strike creates a spark, which ignites the propellant. The propellant’s ignition then forces the bullet to detach from the cartridge case and exit the firearm’s barrel.
This firing process can impart marks — called toolmarks — on the cartridge case and/or bullet.8 Critically, different types of guns create different types of toolmarks. For example, because manufacturers produce firing pins of different shapes, these pins can leave distinct indentations on a primer. Similarly, each gun’s barrel is lined with grooves that impart a spiral spin on a bullet. These grooves vary by number and direction, so they also create their own marks. Practitioners call these types of general features “class characteristics.”9 The ammunition’s size is also a class characteristic. Class characteristics are a useful first step in firearm examination because practitioners can seek to rule out certain guns with clearly different class characteristics. Alternatively, observing similar class characteristics indicates that a particular firearm cannot be ruled out and thus warrants further examination.
But mere agreement of class characteristics is not enough to determine that bullets or cartridge cases were fired by a particular gun. To draw that inference, examiners must identify and evaluate “individual characteristics,” which are defined by the Association of Firearm and Toolmark Examiners (AFTE) as:
Marks produced by the random imperfections or irregularities of tool surfaces. These random imperfections or irregularities are produced incidental to manufacture and/or caused by use, corrosion, or damage. They are unique to that tool to the practical exclusion of all other tools.10
Examiners rely on training and experience to assess whether marks are these so-called “individual characteristics.”11
Examiners following the AFTE Theory of Identification — the process of toolmark identification used by most professional firearms examiners12 — will compare the individual characteristics of casings or bullets recovered at a crime scene to casing or bullet exemplars fired by a particular gun. Based on this comparison, examiners can reach one of several conclusions: identification, elimination, or inconclusive.13 There are no numeric thresholds for how many individual characteristics must be observed before the examiner can declare an identification or match. Rather, the AFTE Theory states that an identification can be reached “when the unique surface contours of two toolmarks are in ‘sufficient agreement.’”14 As defined by AFTE:
The statement that “sufficient agreement” exists between two toolmarks means that the agreement of individual characteristics is of a quantity and quality that the likelihood another tool could have made the mark is so remote as to be considered a practical impossibility.15
The first reported opinions discussing firearms comparison evidence date back to the 1870s and reflect mixed perspectives on the admissibility of expert testimony.18 One of the earliest opinions on the topic was written in 1902 by none other than Oliver Wendell Holmes, then the chief justice of the Massachusetts Supreme Judicial Court. The defendant in that case — Commonwealth v. Best19 — was convicted of murder. On appeal, Best argued that firearms comparison evidence was erroneously offered at the trial.20 In his quintessentially succinct style, Justice Holmes swiftly disposed of these arguments, concluding that “the sources of error suggested were trifling.”21 Despite this being one of the first published opinions on the admissibility of firearms toolmark evidence, Justice Holmes found “no reason to doubt that the testimony was properly admitted.”22
In a 1923 case, however, the Illinois Supreme Court powerfully rejected expert firearms comparison evidence.23 That court reversed the conviction at issue for multiple reasons,24 but it particularly took issue with the state’s use of a police officer as an expert. The officer was “asked to examine the Colt automatic 32” in evidence, and at trial he testified that the gun “was the identical revolver from which the bullet introduced in evidence was fired on the night [the victim] was shot.”25 The court disagreed:
The evidence of this officer is clearly absurd, besides not being based upon any known rule that would make it admissible. If the real facts were brought out, it would undoubtedly show that all Colt revolvers of the same model and of the same caliber are rifled precisely in the same manner, and the statement that one can know that a certain bullet was fired out of a 32-caliber revolver, when there are hundreds and perhaps thousands of others rifled in precisely the same manner and of precisely the same character, is preposterous.26
By the late 1920s, however, the Illinois Supreme Court’s skepticism marked the exception, not the rule. The courts were particularly wooed by the work of Major Calvin Goddard. Goddard had founded a private crime laboratory, the Bureau of Forensic Ballistics, and published a seminal article on firearm evidence for the U.S. Army.27 In an especially influential case, Evans v. Commonwealth,28 Goddard testified during trial that he “only required one single test to identify the bullet in evidence as having been fired through the Evans pistol.”29 The Kentucky Supreme Court concluded that Goddard’s opinion was admissible as expert testimony.30
Beginning in the 1930s, judicial acceptance of firearms comparison testimony spread nationally. Judges appeared powerfully influenced by Evans, which became one of the lodestar cases for admitting firearms comparison evidence. Soon, use of toolmark evidence in criminal prosecutions became “accepted” and “well-recognized.” By the 1970s and 1980s, courts routinely admitted firearms expert testimony, often citing to Evans without further discussion.31 Courts did, however, expect examiners to possess specialized training and credentials. But judges did not question the methodology itself.
Following the U.S. Supreme Court’s Daubert ruling, federal courts might have been expected to begin to more carefully scrutinize firearms evidence, and some did so.32 With this increased scrutiny, defendants’ objections began to shift away from concerns about specific experts’ qualifications to concerns about the reliability of the underlying methodology itself.33
But this change did not immediately follow the 1993 landmark case of Daubert. Instead, as Figure 1 illustrates, each decade through the 1990s saw 20 or fewer judicial rulings regarding firearms comparison evidence. By the 2000s, these rulings began to increase in number, and the larger increase began after 2010.
An initial turning point may have been the District of Massachusetts’s 2005 ruling in United States v. Green.34 There, the government sought to introduce expert testimony that the individual characteristics of six shell casings matched a recovered firearm “to the exclusion of every other firearm in the world.”35 Then-Judge Nancy Gertner called this conclusion “extraordinary.”36 She emphasized that in “distinguishing class and sub-class characteristics from individual ones,” the examiner “conceded, over and over again, that he relied mainly on his subjective judgment. There were no reference materials of any specificity, no national or even local database on which he relied.”37 Despite these concerns, Judge Gertner candidly acknowledged that “the problem for the defense is that every single court post-Daubert has admitted this testimony, sometimes without any searching review, much less a hearing.”38 Judge Gertner thus admitted the testimony, but she did not “allow [the expert] to conclude that the match he found . . . permit[ted] ‘the exclusion of all other guns’ as the source of the shell casings.”39 Slowly, other federal courts began to follow Judge Gertner’s approach, and law enforcement agencies, such as the FBI, ultimately disavowed conclusions of a match “to the exclusion of every other firearm in the world.”40
Over half of the rulings in our database occurred after 2009, when the NAS released a groundbreaking report, Strengthening Forensic Science in the United States.41 The 2009 report contains a scientific assessment of a variety of forensic science disciplines, along with recommendations for improvements.42 The report critiques “the lack of a precisely defined process” for firearms evidence.43 Because the AFTE methodology “does not even consider, let alone address, questions regarding variability, reliability, repeatability, or the number of correlations needed to achieve a given degree of confidence,”44 firearms examiners are “not able to specify how many points of similarity are necessary for a given level of confidence in the result.”45
Building on this work, PCAST published a 2016 report evaluating commonly used forensic science techniques in criminal proceedings. PCAST evaluated all of the existing scientific studies that tested the validity of firearms identification. It concluded that, with a single exception, the studies were not appropriately designed to truly test firearms examiners’ accuracy. Specifically, the test used in the vast majority of studies was a sorting task that allowed examiners to use a process of elimination to make identifications. PCAST analogized the tests to a Sudoku puzzle, making the tests (and their incredible results) totally unlike real-world comparison work.46 The sole study that PCAST deemed appropriately designed came from the Ames National Laboratory in Iowa (Ames I). That study reported a 1.01 percent false positive error rate (in other words, in about one out of 100 comparisons, the examiner incorrectly reported a match). However, this error rate ignores the substantial number of inconclusive results provided by examiners.47 PCAST concluded that “[b]ecause there has been only a single appropriately designed study, the current evidence falls short of the scientific criteria for foundational validity.”48 Much like the NAS report that preceded it, PCAST pointed to the need for additional, appropriately designed studies to test the validity of firearm examination.49
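To illustrate why the treatment of inconclusive results matters to a reported error rate, the following is a minimal sketch in Python. The counts are purely hypothetical and are not drawn from Ames I or any other study; the function name and the three labeled conventions are our own illustrative assumptions about how inconclusive responses to different-source comparisons might be scored.

```python
def false_positive_rates(false_ids, inconclusives, eliminations):
    """Compute a false positive rate for different-source comparisons
    under three hypothetical conventions for scoring inconclusives."""
    total = false_ids + inconclusives + eliminations
    conclusive = false_ids + eliminations
    return {
        # Inconclusives stay in the denominator and count as non-errors.
        "inconclusive_counted_as_correct": false_ids / total,
        # Inconclusives are dropped from the calculation entirely.
        "inconclusive_excluded": false_ids / conclusive,
        # Inconclusives are scored as errors (the examiner failed to eliminate).
        "inconclusive_counted_as_error": (false_ids + inconclusives) / total,
    }


# Hypothetical counts for illustration only, NOT data from any study.
rates = false_positive_rates(false_ids=20, inconclusives=700, eliminations=1280)
for name, rate in rates.items():
    print(f"{name}: {rate:.2%}")
```

With these made-up inputs, the same set of examiner responses yields an error rate anywhere from about 1 percent to 36 percent, depending solely on the scoring convention chosen. That spread is why a single headline figure can understate the uncertainty when many responses are inconclusive.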
Admissibility challenges to firearms identification evidence surged following the PCAST report’s release. These challenges were largely unsuccessful. But in 2019, Judge Todd Edelman in the D.C. Superior Court conducted an extensive admissibility hearing. In that hearing, the court considered these reports, as well as testimony from mainstream research scientists, who explained the principles of scientific testing and why, contrary to the claims of firearms examiners, the studies do not actually show low error rates.50 Following the hearing, the judge issued a ruling that “precluded the government from eliciting testimony identifying the recovered firearm as the source of the recovered cartridge casing.” Instead, the Court ruled that the government’s expert witness must limit his testimony to a conclusion that the firearm “cannot be excluded as the source of the casing.”51
Other courts around the country began to follow suit.52 One court noted, however, that the FBI and the Ames Laboratory were “currently conducting a second black box study on the AFTE Theory,”53 and the results of that study could potentially change the trajectory of recent opinions.
That study — the FBI/Ames Laboratory study (Ames II) — has become a modern mystery. A detailed report of the study was first posted online in early 2021 and admitted into evidence in several trials.54 But the report seems to have been subsequently scrubbed from the internet. A portion of the report was recently published in a peer-reviewed journal, though all of the research scientists from the Ames Laboratory “declined authorship and individual acknowledgment.”55 The published study reports false positive error rates of less than 1 percent, though scientists have raised serious questions about those estimates’ veracity.56 Still, the portion of the study that was not published is a rather stinging indictment of forensic firearms identification: When an examiner analyzed bullets or cartridge cases a second time, she reached a different conclusion 21–38 percent of the time.57 Even worse, when two different examiners analyzed the same bullets or cartridge cases, they reached different conclusions 32–69 percent of the time.58
|Examples of Court-ordered Conclusion Language||Citations From Selected Examples|
|“more likely than not”||United States v. Glynn, 578 F. Supp. 2d 567 (S.D.N.Y. 2008)|
|“reasonable degree of ballistic certainty”||United States v. Monteiro, 407 F. Supp. 2d 351 (D. Mass. 2006)|
|“consistent with”||United States v. Sutton, No. 2018 CF1 009709 (D.C. Super. Ct. May 9, 2022)|
|“a complete restriction on the characterization of certainty”||United States v. Willock, 696 F. Supp. 2d 536 (D. Md. 2010)|
|“the recovered firearm cannot be excluded as the source of the cartridge casing found on the scene of the alleged shooting”||United States v. Tibbs, No. 2016 CF1 19431, 2019 WL 4359486 (D.C. Super. Ct. 2019); Missouri v. Goodwin-Bey, No. 1531-CR00555-01 (Cir. Ct. Greene County, Mo., Dec. 16, 2016)|
|“qualitative opinions” can only be offered on the significance of “class characteristics”||People v. Ross, 129 N.Y.S.3d 629 (N.Y. Sup. Ct. 2020)|
While courts have mostly continued to admit firearms examiner testimony, many now admit the testimony “only under limiting instruction[s] restricting the degree of certainty” to which experts may express their identifications.59 The resulting case law is diverse, sometimes inconsistent, and reflects a gradual evolution. Some of the initial decisions that followed the 2009 NAS report held that an examiner could testify only with a milder degree of certainty, forbidding aggressive statements like “to the exclusion of all other firearms in the world,”60 and instead imposing a more cautious formulation of confidence, such as a “reasonable degree of ballistic certainty.”61 Other courts have taken a different approach, using more familiar standards of proof as a frame of reference. For example, courts have ruled that examiners could only opine that it was more likely than not that a bullet recovered from the crime scene came from the defendant’s firearm.62 The table above summarizes some of the main approaches that courts have taken toward limiting testimonial conclusions about whether a bullet found at the scene came from the firearm in question.
While the consensus approach in the early 2000s adopted the formulation of “a reasonable degree of ballistic certainty,” it is not clear what level of confidence actually constitutes a “reasonable degree of certainty.” Starting in 2020, the U.S. Department of Justice (DOJ) therefore barred examiners in federal cases from using that or similar terminology.63 DOJ also prohibited examiners from making assertions of a “zero error rate” or “infallibility.”64 Some judges have likewise begun to scrutinize experts’ probabilistic claims and limited experts’ ability to claim infallibility or a lack of error rate.65
A growing group of judges also offer intermediate approaches. For example, a District of Columbia judge held that an expert could testify that the ammunition in question was “consistent with” being fired from a particular firearm.66 Another district court ordered that an expert could offer a statement of consistency but “may not testify, to any degree of certainty, that the recovered firearm is the source of the recovered bullet fragment or the recovered shell casing.”67 In more recent cases, judges have barred experts from making any certainty-based conclusions whatsoever. For example, one court ruled that the examiner could not offer any probability that the firearm in question was the source of a cartridge. Instead, the examiner could testify only that “the recovered firearm cannot be excluded as the source of the cartridge casing found on the scene.”68 In yet another case, the district judge ordered “a complete restriction on the characterization of certainty.”69 The Maryland Supreme Court recently ruled that an expert can only opine on whether spent bullets or cartridges are “consistent or inconsistent” with those known to have been fired by a particular weapon.70
i. Limiting Non-Class-Based Opinions
Going even further, some courts have limited firearms testimony to opinions offered on class characteristics only.71 That is, an expert could explain that a certain type of gun fired the relevant bullets or cartridge cases, but the expert could not testify that the same gun fired two bullets or cartridge cases. Courts have reasoned that descriptions of class characteristics are objective and measurable, while linking bullets to a particular gun is not “the product of a scientific inquiry.”72
ii. Qualification and Proficiency Rulings
Judges have also focused on Rule 702’s preliminary question: whether the proffered expert has sufficient “knowledge, skill, experience, training, or education” to offer conclusions.73 In United States v. Cloud,74 for example, in finding one of the two examiners not qualified to testify, the judge emphasized that the examiner had failed a proficiency test.75 Typically, proficiency tests for forensic examiners are administered by commercial test providers, and accredited labs are required to administer such tests annually.76 While these tests present their own concerns of reliability and consistency, they nonetheless highlight the types of errors that practitioners may make. As one of the authors and Gregory Mitchell have argued, a careful inquiry into objective proficiency of the witness should be an integral part of the question of whether a person should be qualified as an expert.77
iii. “As Applied” Challenges
Still other opinions have focused on Rule 702(d), which provides that qualified expert testimony is admissible only when “the expert has reliably applied the principles and methods to the facts of the case.”78 These “as applied” challenges focus on the expert’s actual work. They examine not just whether the expert followed the right steps, but also whether the expert’s casework was actually supported by a valid method.79 Opposing parties have focused on, for example, firearms experts’ lack of documentation and the way they applied their methods to a particular case.80 Some courts have found the presence of some documentation, such as “notes, worksheets, and photographs,” to be sufficient to admit the expert evidence.81
The arc of judicial review of firearms evidence follows a pattern familiar in forensics generally. After initial rulings that predated modern scientific methods, judges responded to more recent scientific critiques by limiting firearms evidence in a range of ways. Although “an overwhelming acceptance” of firearms identification persists,82 long-entrenched judicial acceptance has eroded in recent years.
Indeed, in perhaps a sign of things to come, a trial judge in Cook County, Ill., recently excluded firearms expert testimony entirely, based on scientific concerns with reliability. There, the judge concluded that the probative value of the evidence was a “big zero” and raised the concern of “yet another wrongful conviction” based on such evidence if the jurors viewed “[t]he combination of scary weapons, spent bullets, and death pictures without even a minimal connection” to expertise that is repeatable and reproducible.83
The proposed amendments to Federal Rule of Evidence 702 reflect yet another step in this direction. They encourage judges to more carefully ask whether the proponent of an expert has met the rule’s reliability requirements and whether the expert’s opinions are themselves scientifically supported.84 As noted above, the Advisory Committee notes additionally emphasize that expert opinions must be supported by reliable principles and methods:
Expert opinion testimony regarding the weight of feature comparison evidence (i.e., evidence that a set of features corresponds between two examined items) must be limited to those inferences that can reasonably be drawn from a reliable application of the principles and methods.85
Further, the committee has emphasized that how broadly or narrowly opinions are expressed by experts should be informed by research. The committee explained that: “In deciding whether to admit forensic expert testimony, the judge should (where possible) receive an estimate of the known or potential rate of error of the methodology employed, based (where appropriate) on studies that reflect how often the method produces accurate results.”86
This guidance tracks the approach in more recent judicial rulings regarding firearms evidence, rulings in which judges have examined evidence regarding error rates and have limited testimony, in part or entirely, based on what can be drawn from the methods at issue. The years to come may see increased litigation of these issues, particularly where the 702 amendments, drafted with forensic pattern evidence in mind, serve to highlight each of the main questions to be addressed. This history of firearms evidence suggests how the slow, but perhaps steady, reception of science may continue to inform our halls of justice.
Brandon L. Garrett is the L. Neil Williams Professor of Law at Duke Law School, founder and faculty director of the Wilson Center for Science and Justice at Duke, and a member of the leadership team at the Center for Statistics and Applications in Forensic Science (CSAFE).
Eric Tucker is a law clerk at the U.S. Court of Appeals for the Second Circuit. He recently clerked at the U.S. District Court for the District of Delaware.
Nicholas Scurich is chair of the Department of Psychological Science at UC–Irvine with joint appointments in the Psychological Science and Criminology, Law & Society departments. His research has been funded by state and federal agencies and he has received numerous scholarly awards.
Hannah Bloom is a third-year student at Duke Law School, pursuing a career in civil rights litigation.