Bolch Judicial Institute
Duke Law School
by Thomas D. Albright and Jed S. Rakoff
Vol. 104 No. 1 (2020) | A Clearer View
Six years ago, the U.S. National Academy of Sciences (NAS) convened a panel of experts to consider the problem of eyewitness identification. Eyewitnesses have long played a significant role in criminal investigations and prosecutions. Despite this history of valued testimony, mounting evidence from both science1 and conviction records of DNA-based exonerees2 indicates that eyewitnesses easily and unwittingly identify innocent suspects.
The societal problem caused by eyewitness misidentification — the conviction of innocent people — stems from a panoply of dysfunction, ranging from scientific naiveté and investigative bias to prosecutorial disregard and judicial ignorance, as well as a natural human tendency to trust what people say they saw.3 The NAS eyewitness panel was thus composed of experts representing a variety of fields, including the scientific study of human visual perception and memory, sociology, statistics, law, and law enforcement. After reviewing evidence from many sources, the panel released a comprehensive report in the fall of 2014,4 which identified factors that commonly lead to erroneous conviction and made substantial recommendations for reform.
The recommendations of the NAS eyewitness panel covered three topic areas: scientific understanding; law enforcement practice; and use of eyewitness evidence in the courtroom. We are pleased to report that these recommendations have been embraced by both scientific and legal communities and have had, in a few short years, a significant positive impact on the field. The NAS report itself, which is freely available from the National Academies Press website,5 has been downloaded more than 13,000 times and has elicited rich discussion, new scientific research, and improved legal practice. In what follows we summarize some of these developments.
The overarching scientific development stimulated by the NAS report has been the engagement of a research community with expertise in sensory and cognitive processes. This development addresses a longstanding weakness, which is that reputed fact and operational strategies in the field of eyewitness identification have often come from applied studies6 that have little grounding in a principled mechanistic understanding of how people see, remember, and make decisions.
One product of this engagement is a recasting of the problem.7 Traditionally, factors that affect eyewitness performance have been identified symptomatically and classified as “estimator” or “system” variables, which are taxonomic distinctions based on time of influence and the degree to which the criminal justice system has control over the outcome.8 Estimator variables characterize the viewing conditions and perceptual/cognitive state of the witness at the time of the crime. These variables (e.g., lighting, viewing distance, stress, and fear) should be considered when assessing the validity of testimony, but they cannot be changed. System variables (e.g., the manner in which a lineup is conducted), by contrast, may influence identification accuracy after the crime and can thus be controlled to improve the likelihood of correct identification. Although insights from this classification scheme have indeed led to improvements, it has promoted a palliative approach to eyewitness performance — we can simply recognize and/or tweak the state of these variables to get the best outcome — at the expense of understanding and mitigating the root causes of eyewitness failure.9
Accurate identification requires that the witness correctly perceive and remember the events of a crime. Although this task is uncommon for most people, we regularly engage in analogous behaviors that require recognition of a previously seen and remembered target, such as finding the luggage on the carousel, the tumor in the lung, or the car in the parking lot. In all such cases, the observer is an instrument for information measurement, classification, and storage. The root causes of misidentification are thus tied to the operating characteristics of human observers — sensitivity, storage capacity, and susceptibility to interference or bias — which have been studied extensively by the science community for decades. The product of this research — a principled mechanistic understanding of how vision and memory work — suggests ways to optimize human performance of visual recognition and avoid conditions in which people are likely to fail.10
At the most basic level, we know that three factors bear on the performance of an eyewitness: uncertainty, bias, and confidence. Uncertainty results from “noise,” or unpredictable perturbation of otherwise meaningful signals. Noise is ever-present in our sensory and mnemonic worlds and constrains the information that can be acquired by an observer. Such constraints naturally reduce the accuracy and utility of observations and thus, to the extent that uncertainty can be quantified, place useful upper bounds on the probability that a witness’s identification is correct. Usually unbeknownst to the observer, bias quietly fills informational gaps left by uncertainty — we see what we expect to see and are none the wiser. Confidence is the degree to which the identifier feels certain that his observation is correct; its most common manifestation in this context is overconfidence, or a special form of bias in which the observer implicitly rates the certainty of the experience to be greater than it warrants. This is commonly the result of external forces, such as other evidence or opinions,11 which drive the observer’s certainty in line with a larger story.12 Overconfidence may be the most pernicious problem with eyewitness testimony, since even very poor-quality information can influence decision and action if communicated to others with certainty.13
This human information-processing perspective on eyewitness identification had already begun to take shape at the time of the NAS report.14 One important issue was growing awareness by the eyewitness research community that a lineup identification is the unique product of two unknown variables: the strength of the observer’s recognition memory and the criterion used by the observer to decide. Memory strength is affected by uncertainty, and the decision criterion is determined by various biases as well as overconfidence; but, at the end of the day, it can be difficult or impossible to know whether a suspect identification reflects strong memory or a lax decision criterion.15 This difference matters, since the former is much more likely to yield an accurate identification.
In an effort to overcome the underlying ambiguity of identification decisions, much recent eyewitness research has focused on two specific questions:16 (1) What can be done to improve the ability of an eyewitness to access his or her memory of events from the crime scene, so that the witness is better able to discriminate the face of the perpetrator from the faces of innocent people? (2) Is it possible to determine, in the face of manifold forms of uncertainty and bias, the likelihood that any given witness has identified the right person?
Most research bearing on the first of these questions has been inspired by specific hypotheses about how the presentation of a set of choice stimuli (lineup faces, in this case) influences memory-based discriminability.17 Because the question itself is so fundamental, and stimulus presentation is one of the few tractable variables, no topic in recent eyewitness identification research has stirred as much activity, interest, and controversy as the manner in which a lineup is conducted.
Until a few decades ago, lineups were always performed such that all faces were visible at the same time (initially live and later largely with photographs). Laboratory studies were designed to quantify the average performance of observers under these “simultaneous lineup” conditions using a simple metric — termed the “diagnosticity ratio” — defined as the probability of correctly identifying the culprit relative to the probability of incorrectly identifying an innocent suspect. Motivated by the hypothesis that identification errors result from relative rather than absolute comparisons of lineup faces, eyewitness researchers designed an alternative to the simultaneous lineup that was intended to curtail relative judgments. This “sequential lineup,” in which faces are presented one at a time, was predicted to yield a greater ratio of correct to incorrect identifications.18 This hypothesis was upheld in early studies, and a number of law enforcement jurisdictions subsequently switched from simultaneous to sequential lineup procedures.19
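To make the metric concrete, the following Python sketch computes a diagnosticity ratio from laboratory-style counts. The function name and all counts are hypothetical; we assume culprit identifications are tallied in target-present lineups and innocent-suspect identifications in target-absent lineups.

```python
def diagnosticity_ratio(culprit_ids, target_present_lineups,
                        innocent_ids, target_absent_lineups):
    """Culprit-ID rate divided by innocent-suspect-ID rate.

    Computed as one cross-multiplied fraction so that whole-number
    counts give an exact result where possible.
    """
    if innocent_ids == 0:
        return float("inf")  # no misidentifications observed
    return (culprit_ids * target_absent_lineups) / (innocent_ids * target_present_lineups)


# Hypothetical tallies from 100 target-present and 100 target-absent lineups.
procedure_a = diagnosticity_ratio(60, 100, 10, 100)  # hit rate .60, false-ID rate .10
procedure_b = diagnosticity_ratio(40, 100, 5, 100)   # hit rate .40, false-ID rate .05
print(procedure_a, procedure_b)  # 6.0 8.0
```

With these illustrative numbers, procedure B earns the higher ratio while producing fewer correct identifications of the culprit, the kind of pattern that made a higher ratio look like an unambiguous improvement.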
On the surface of things, the move from simultaneous to sequential lineups seemed sensible and appealing, given the relative reduction of misidentifications. The logic motivating the change was deeply flawed, however, as noted in the NAS report, because it failed to consider the fact that a lineup identification confounds the effects of recognition memory and decision criterion. In other words, the cause of the reduction of misidentifications was ambiguous — it was not possible to know whether it had been due to witnesses being better at accessing memory from the crime scene or witnesses simply being more conservative about pointing the finger.20 As further research showed, sequential lineups encourage witnesses to adopt a more conservative criterion — they simply make fewer identifications of any sort — but there has been no significant evidence that sequential lineups elicit greater discriminability of lineup participants based on recognition memory for the culprit, which is the desired outcome.
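The ambiguity can be illustrated with the equal-variance Gaussian model of signal detection theory, a textbook simplification that treats a lineup decision as a single yes/no judgment (our assumption; real lineups involve several faces). In this model, discriminability is d′ and the decision criterion is c. The sketch generates hit and false-identification rates for one fixed d′ at two hypothetical criteria, then recovers both parameters from those rates.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def rates_from_sdt(d_prime, c):
    """Hit and false-ID rates implied by discriminability d' and criterion c
    under an equal-variance Gaussian model (larger c = more conservative)."""
    hit = STANDARD_NORMAL.cdf(d_prime / 2 - c)
    false_id = STANDARD_NORMAL.cdf(-d_prime / 2 - c)
    return hit, false_id


def sdt_from_rates(hit, false_id):
    """Recover (d', c) from observed hit and false-ID rates."""
    z_hit = STANDARD_NORMAL.inv_cdf(hit)
    z_false = STANDARD_NORMAL.inv_cdf(false_id)
    return z_hit - z_false, -(z_hit + z_false) / 2


# Same memory strength (d' = 1.5), two hypothetical decision criteria.
for label, c in [("lax criterion", 0.0), ("conservative criterion", 0.5)]:
    hit, false_id = rates_from_sdt(1.5, c)
    d_prime, criterion = sdt_from_rates(hit, false_id)
    print(f"{label}: hit={hit:.3f} false ID={false_id:.3f} "
          f"ratio={hit / false_id:.2f} d'={d_prime:.2f} c={criterion:.2f}")
```

Both settings recover the same d′ of 1.5, yet the conservative criterion raises the hit-to-false-identification ratio from roughly 3.4 to roughly 5.7. A higher ratio, on its own, therefore says nothing about better memory.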
Simultaneous vs. Sequential Lineups: Which Is Better, and Why?
The sequential-simultaneous debate was in full swing when the NAS eyewitness panel was convened. By this time, too, some investigators had adopted a signal detection approach known as receiver operating characteristic (ROC) analysis21 to evaluate witness performance in a way that disentangles memory strength from decision criterion. Doing so requires holding either memory strength or decision criterion constant at known values. While the criterion that an eyewitness uses to decide may be inscrutable, it is correlated with the witness’s confidence in his or her decision.22 On average, confident witnesses are more likely to be selective and identify only those faces that meet a stringent criterion. It follows that the probability of correct identification, measured for a known set of confidence-based decision criteria, is proportional to the strength of recognition memory.
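The confidence-based ROC idea can be sketched in a few lines of Python. The counts are hypothetical, the function names are ours, and we assume each list runs from the highest to the lowest confidence level. Each successively laxer criterion admits one more confidence level and traces out one more point on the curve. Procedures are then commonly compared by the area under such curves, often a partial area up to a shared false-identification rate.

```python
def confidence_roc(culprit_ids_by_conf, innocent_ids_by_conf,
                   n_target_present, n_target_absent):
    """Cumulative (false-ID rate, hit rate) points, strictest criterion first.

    Both count lists are ordered from highest to lowest confidence.
    """
    points = [(0.0, 0.0)]
    hits = false_ids = 0
    for h, f in zip(culprit_ids_by_conf, innocent_ids_by_conf):
        hits += h          # admit IDs made at this confidence level
        false_ids += f
        points.append((false_ids / n_target_absent, hits / n_target_present))
    return points


def partial_auc(points):
    """Trapezoidal area under the piecewise-linear ROC, up to its last point."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical suspect IDs at high, medium, and low confidence.
points = confidence_roc([30, 15, 10], [2, 5, 8], 100, 100)
print(points)  # (false-ID rate, hit rate) pairs
print(partial_auc(points))
```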
Getting a handle on one of the two critical variables for identification, the decision criterion, helps to overcome the fundamental ambiguity of the identification process. This approach, which has dominated eyewitness studies of the past few years, has yielded many important insights into the factors that influence accurate identifications.23 Although these analyses lent tentative support to simultaneous lineups, as studies suggested improved discriminability with that procedure, the NAS panel felt that a recommendation in support of either simultaneous or sequential lineups was premature. The panel nonetheless urged caution when considering a change of lineup procedures.
The NAS report prompted a surge of studies designed to evaluate performance as a function of lineup type.24 Considered together with the earlier simultaneous/sequential comparisons cited in the NAS report, these recent analyses indicate that, on average, witnesses are better able to optimize sensitivity to their memories — that is, they manifest better discriminability — when simultaneous lineups are used.25
Beyond Traditional Lineups?
The relative discriminability afforded by simultaneous or sequential lineups was only made clear by the adoption of a more sophisticated approach to data analysis, but these two basic lineup procedures have been in use for decades. Although both methods are simple to apply in practical settings by law enforcement, and the outcomes are easy to intuit (perhaps deceptively so) by triers of fact, there are good scientific reasons to break outside this box and explore new ways to improve eyewitness performance. For example, the method used to evaluate the merits of traditional lineups measures the ratio of correct-to-false identifications, which is assessed for a known set of confidence-based decision criteria (such as expressed confidence). That method, of course, depends on precise and accurate measurements of witness confidence. As we have seen, confidence can veer off the rails when witnesses are exposed to other sources of information.26 The alternative would be to develop an approach that nails down the other key variable that underlies lineup identification: strength of recognition memory.
Memory strengths are not directly accessible because they live only within the mind of the observer. But they can be estimated using experimental techniques that have long been part of the repertoire of basic scientific studies of human information processing. These techniques, known as “perceptual scaling,” map the relationship between a set of physical stimuli and the corresponding responses of an observer’s perceptual system.27 In the case of eyewitness identification, scaling techniques can be used to quantify perceived similarity of each lineup face to a remembered target. Perceived similarity, in turn, is an estimate of the recognition memory signal elicited by each face. Because it affords a view beneath the surface of a categorical identification, scaling of recognition memory signals elegantly reduces the identification problem to one of statistical inference,28 in which lineup faces can be classified probabilistically as perpetrator or innocent suspect based on the estimated memory signals.29 This new approach holds much promise as a means to overcome the fundamental ambiguity of traditional eyewitness identifications.30
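As one illustration of perceptual scaling, the sketch below applies Thurstone’s Case V model to hypothetical paired-comparison tallies. Case V is a classical scaling method chosen here for simplicity; it is not necessarily the procedure used in the cited work. Each face’s scale value is the average z-score of the proportion of trials on which it was judged more similar to the remembered perpetrator than each other face, and higher values stand in for stronger estimated memory signals.

```python
from statistics import NormalDist


def thurstone_case_v(wins, eps=0.01):
    """Case V scale values from a square paired-comparison tally.

    wins[i][j] = number of trials on which face i was judged more similar
    to the remembered perpetrator than face j. Returns one value per face;
    the values average to zero, and larger means "more like the perpetrator".
    """
    z = NormalDist().inv_cdf
    n = len(wins)
    scale = []
    for i in range(n):
        total = 0.0
        for j in range(n):
            trials = wins[i][j] + wins[j][i]
            if i == j or trials == 0:
                continue  # self-comparison (or untested pair) contributes z(0.5) = 0
            p = min(max(wins[i][j] / trials, eps), 1 - eps)  # keep z finite
            total += z(p)
        scale.append(total / n)  # average over all n faces, self included
    return scale


# Hypothetical tallies for three faces, 10 judgments per pair.
tallies = [[0, 8, 9],
           [2, 0, 6],
           [1, 4, 0]]
print([round(v, 2) for v in thurstone_case_v(tallies)])
```

In this invented example the scale values come out at roughly 0.71, -0.20, and -0.51, placing face 0 well above the others. Converting such values into the probability that a given face is the perpetrator would require a further statistical model, which this sketch does not attempt.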
The aforementioned studies of lineup type shed valuable light on the contributions of recognition memory and decision criteria in lineup identification. In doing so, they pinpoint lineup conditions that yield, on average, the best ability of a witness to discriminate the culprit from innocent suspects. These discoveries are an indispensable basis for policy decisions about the type of lineup to use in actual criminal cases. This approach reveals nothing, however, about the probability that a given witness identification is correct, which is of course what the trier of fact really needs to know.
Probability of correctness, or “accuracy,” is distinct from discriminability and is defined as the proportion of reported suspect identifications, correct or incorrect, that are correct identifications of the culprit.31 Accuracy is surely impacted by various forms of uncertainty and bias. For example, if witnesses cannot easily see actors and events of the crime because of dim lighting or distance, they face uncertainty and are less likely to be correct in their identifications. One might suppose that witnesses who are uncertain are less confident, and thus expressions of confidence might usefully predict accuracy. For many years, however, studies revealed little correlation between confidence and accuracy in recognition memory tasks,32 and legal standards for the use of witness confidence judgments in eyewitness identification naturally followed that scientific foundation.33
This seemingly inoperative relationship between confidence and accuracy has been challenged by two recent observations. First, as noted above, confidence is empirically correlated with the decision criterion for identification.34 Second, as observers become more conservative in their decision criterion — selecting a target only when absolutely certain — they are expected to become more accurate.35 It follows that confidence should predict accuracy. Indeed, one of the most important discoveries since the release of the NAS report confirms this prediction: On average, highly confident36 witnesses are in fact highly accurate in their identifications.37 This is true even in the presence of significant uncertainty and bias, which reduce overall accuracy and correspondingly reduce the overall likelihood that a witness will report a high confidence identification. The bottom line is that a high-confidence identification, when it occurs at the time of the lineup, is likely to be a correct identification, and thus witness confidence is of great probative value for the trier of fact.
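Accuracy, tabulated separately for each confidence level, can be sketched as follows. The function name and counts are hypothetical, and the counts assume a design with equal numbers of culprit-present and culprit-absent lineups. That equal base rate is an assumption: the accuracy of real-world identifications also depends on how often police lineups actually contain the culprit.

```python
def accuracy_by_confidence(culprit_ids, innocent_ids):
    """Share of suspect identifications that were correct, per confidence level.

    culprit_ids[level]  = correct identifications of the culprit
    innocent_ids[level] = identifications of an innocent suspect
    """
    accuracy = {}
    for level, correct in culprit_ids.items():
        total = correct + innocent_ids[level]
        accuracy[level] = correct / total if total else float("nan")
    return accuracy


# Hypothetical counts, assuming equal numbers of culprit-present
# and culprit-absent lineups.
print(accuracy_by_confidence({"high": 30, "medium": 15, "low": 10},
                             {"high": 2, "medium": 5, "low": 8}))
```

Here accuracy falls from about 94 percent at high confidence to 75 percent at medium and about 56 percent at low confidence.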
Further support for this view comes from a novel procedure for estimating the strength of recognition memory signals based on confidence judgments.38 In this case, rather than rendering a categorical identification, the witness assigns a confidence rating to each lineup participant, which is presumed to reflect the corresponding strength of recognition memory. The accuracy of suspect identifications was found to be correlated with confidence in those identifications. The strength of that correlation was greatest when confidence ratings for the fillers (lineup participants known to be innocent) were low and thus not competing with recognition memory for the suspect. In other words, estimating memory strength for every lineup participant further increases the informational value of a high-confidence suspect identification.
Another new approach to the accuracy question, which stems from the perceptual scaling method highlighted above, also seeks to estimate the recognition memory signals that underlie an eyewitness identification. This method involves the presentation of all possible pairs of a set of lineup faces. Witnesses are asked to make relative, not absolute, judgments: Which face of each pair looks more like the perpetrator?39 The consistency of such judgments across different face-pair presentations serves as an objective quantitative index of certainty for any individual witness. This index is witness-specific and sidesteps the criterion dependence of confidence statements: A witness is asked to choose between two alternatives rather than to offer a more abstract statement of confidence in an absolute judgment, which is notoriously difficult to quantify. The resulting index of certainty provides a statistical basis for triage of uncertain witnesses and may prove useful for predicting the correctness of identifications.
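One classical way to quantify such consistency, offered here as an assumption rather than as the measure used in the cited study, is Kendall and Babington Smith’s coefficient of consistence. It counts circular triads (A judged closer than B, B closer than C, yet C closer than A) in one witness’s set of paired choices. The function name and data layout are hypothetical.

```python
from itertools import combinations


def coefficient_of_consistence(choices, n_faces):
    """Kendall-Babington Smith coefficient for one witness (n_faces >= 3).

    choices maps every pair (i, j) with i < j to whichever face, i or j,
    the witness judged more similar to the perpetrator.
    Returns 1.0 for fully transitive judgments and 0.0 when the number
    of circular triads is as large as possible.
    """
    wins = [0] * n_faces
    for pair in combinations(range(n_faces), 2):
        wins[choices[pair]] += 1
    # Circular triads = all triads minus those with one face beating both others.
    all_triads = n_faces * (n_faces - 1) * (n_faces - 2) // 6
    circular = all_triads - sum(w * (w - 1) // 2 for w in wins)
    if n_faces % 2:
        max_circular = (n_faces ** 3 - n_faces) // 24
    else:
        max_circular = (n_faces ** 3 - 4 * n_faces) // 24
    return 1 - circular / max_circular


# Perfectly ordered choices: the lower-numbered face is always preferred.
ordered = {pair: pair[0] for pair in combinations(range(6), 2)}
print(coefficient_of_consistence(ordered, 6))  # 1.0

# A three-face cycle: 0 beats 1, 1 beats 2, 2 beats 0.
cycle = {(0, 1): 0, (1, 2): 1, (0, 2): 2}
print(coefficient_of_consistence(cycle, 3))  # 0.0
```

For a six-face lineup a witness judges 15 pairs, and at most 8 circular triads are possible.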
In addition to the emergence of a broader information-processing perspective and the aforementioned advancements pertaining to eyewitness discriminability and accuracy, there have been a number of more specific scientific developments stimulated by the NAS report. These cover a range of factors that affect eyewitness uncertainty, bias, and confidence.40 We mention two of these factors here because of their promise for improving eyewitness performance: (1) lineup filler selection and (2) use of other visual cues for lineup identification.
Fillers are lineup participants known to be innocent, who serve as lures to challenge recognition memory. The choice of fillers has long been known to markedly influence eyewitness performance.41 To understand why this is true, it helps to consider eyewitness identification — and object recognition more generally — as a process of statistical inference. People recognize objects probabilistically based on the degree to which they elicit a memory signal corresponding to a particular target previously seen. It naturally follows that similar objects are more likely to elicit the same memory signal and thus have similar likelihoods of being recognized as the target. The similarity of fillers to the suspect is thus an important variable that affects both eyewitness discriminability and accuracy.
Lineups composed of fillers who are all of roughly the same degree of physical or perceived similarity to the suspect are termed “fair.” In other words, to be fair, lineup fillers must look like each other as well as like the suspect himself. Conversely, lineups composed of fillers that possess differing degrees of similarity to the suspect are termed “unfair” or “biased.” An unfair lineup, in which one filler is closer in similarity to the perpetrator, reduces uncertainty by lessening the number of sensible choices and oversimplifies the witness’s statistical inference. The eyewitness may be essentially dealing with a two-choice problem rather than a six-choice problem. This has the effect of simultaneously increasing the likelihood of identifying the culprit and the likelihood of misidentifying someone who looks like the culprit — thereby decreasing both discriminability and accuracy.
Despite the well understood and potentially disastrous consequences of unfair lineups, by the time of the NAS report, there had been very few serious attempts to systematize the process of filler selection. Published guidelines for filler selection stated that lineups should be constructed to ensure that “the suspect does not unduly stand out” and should “avoid using fillers that so closely resemble the suspect that a person familiar with the suspect might find it difficult to distinguish the suspect from the fillers.”42 This guidance — fillers should be similar to the suspect but not too much so — is clearly open to interpretation and is often applied by different agents in different ways.
More recently, and for a variety of reasons having nothing specifically to do with eyewitness identification,43 many studies of face recognition have focused on metrics of face similarity. As applied to eyewitness identification, the goal would be to employ these metrics to create lineups in which fillers are all of known similarity to the suspect. There are two kinds of approaches to this problem, one of which uses physical parameters of the face — such as distance between the eyes, height of the forehead and width of the mouth — to define similarity.44 The other kind of approach defines face similarity perceptually, based on human judgments.
In both approaches, we begin with a library of faces, ideally representative of the demographic of interest. The physical similarity approach exploits the fact that faces commonly differ from one another along multiple physical dimensions. Thus, the characteristics of any face can be quantified and described by a unique point in some high-dimensional “face space.” Collapsing this high-dimensional space onto three manageable dimensions allows one to compute the Euclidean distance between any two points, which serves as a measure of physical similarity between the respective faces.
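The geometry just described can be sketched with NumPy. Principal component analysis is our assumption for how the space is collapsed, since no particular method is named. The measurement matrix is also hypothetical: one row per face and one column per physical measurement, such as eye spacing, forehead height, or mouth width.

```python
import numpy as np


def face_space_distances(measurements, n_dims=3):
    """Pairwise Euclidean distances between faces in a reduced face space.

    measurements: array of shape (n_faces, n_measurements).
    Returns an (n_faces, n_faces) symmetric matrix with zeros on the diagonal.
    """
    X = np.asarray(measurements, dtype=float)
    spread = X.std(axis=0)
    spread[spread == 0] = 1.0                  # leave constant columns unscaled
    X = (X - X.mean(axis=0)) / spread          # z-score each measurement
    # Rows of vt are the principal axes of the standardized data.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    coords = X @ vt[:n_dims].T                 # each face as a point in n_dims
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


# Hypothetical library: 20 faces, 8 physical measurements each.
rng = np.random.default_rng(seed=0)
library = rng.normal(size=(20, 8))
distances = face_space_distances(library)
print(distances.shape)  # (20, 20)
```

Reducing to three dimensions discards some variation, so distances in the reduced space approximate rather than reproduce distances among the full set of measurements; how many dimensions to keep is itself a design choice.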
The alternative perceptual approach measures face similarity directly from reports of human observers. Numerous attempts have been made to do this by asking people to rate face similarity,45 but these are plagued by the criterion-dependence of subjective ratings and the effort required to apply this method to large face libraries. Perceptual scaling methods, such as the paired comparison procedure described above, avoid the criterion problem and are a natural choice to produce similarity measures for any given pair of faces. The desired product of both physical and perceptual approaches is a set of similarity measures for all possible pairs of faces in the library.46 When drawing from this similarity-indexed library, it should be possible to customize filler selection for a given suspect by specifying both the average face similarity distance between suspect and fillers, and the variance of face similarity amongst the fillers. Such a system could then be used to empirically determine the parameters of lineup face similarity that yield the best eyewitness performance.
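A similarity-indexed library lends itself to a simple search. In the hypothetical sketch below, each candidate face carries a precomputed distance to the suspect, and a random search looks for a set of fillers whose mean distance and variance of distances come closest to requested values. Reading “variance of face similarity amongst the fillers” as the spread of the fillers’ distances to the suspect is our interpretation, and random search is only one of many possible optimizers.

```python
import random
from statistics import fmean, pvariance


def select_fillers(dist_to_suspect, k, target_mean, target_var,
                   n_samples=5000, seed=0):
    """Pick k library faces whose distances to the suspect best match a
    target mean and variance (random search; a sketch, not an optimizer).

    dist_to_suspect maps face id -> similarity distance to the suspect.
    Returns (sorted filler ids, squared-error score of that choice).
    """
    rng = random.Random(seed)
    ids = list(dist_to_suspect)
    best, best_score = None, float("inf")
    for _ in range(n_samples):
        candidate = rng.sample(ids, k)
        d = [dist_to_suspect[f] for f in candidate]
        score = (fmean(d) - target_mean) ** 2 + (pvariance(d) - target_var) ** 2
        if score < best_score:
            best, best_score = sorted(candidate), score
    return best, best_score


# Hypothetical library of 30 faces at distances 0.1, 0.2, ..., 3.0 from the suspect.
library = {f"face{i:02d}": i / 10 for i in range(1, 31)}
fillers, score = select_fillers(library, k=5, target_mean=1.0, target_var=0.02)
print(fillers, [library[f] for f in fillers])
```

Varying `target_mean` and `target_var` systematically is one way such a system might be used to explore which similarity parameters yield the best eyewitness performance.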
Other Visual Cues for Lineup Identification
Lineups today rarely employ live participants, and the move to still photographs has brought a significant reduction of visual information that might be used for recognition. Lineup photographs are en face; they lack the stereoscopic and motion cues that might reveal three-dimensional structure. They omit whole-body information, such as posture and gait, and they are often monochromatic. At the same time, it has become increasingly clear from studies of visual object recognition that performance is better when more information-bearing cues are available to the observer.47
A remarkable recent study along these lines identified the specific pieces of information from facial images that were used by observers to perceptually encode three-dimensional shape and texture.48 These coding rules were then used to “reverse engineer” new faces, which were found to be perceptually similar to those originally observed. The significance of this for lineup design is that the empirically determined coding principles highlight the types of visual information that would be beneficial for eyewitness identification. Although simplistic studies involving video images for lineup identification have failed to find much utility,49 a principled approach in which visual presentations of lineup participants convey information that matches the ways in which people encode and remember faces is likely to be of great value for improving eyewitness performance.
In addition to kickstarting a lot of new and interesting science that may ultimately improve the ability of eyewitnesses to identify the culprit, the NAS report made recommendations aimed at enhancing and standardizing practices employed by law enforcement and at strengthening the use of eyewitness evidence in the courts. These recommendations have led to specific reforms, which we highlight in the following sections.
Better Lineup Procedures
As reflected above, improving lineup procedures and, more generally, police techniques regarding eyewitness identifications, can only go so far toward avoiding inaccurate identifications, which are sometimes caused by factors unrelated to such practices. Nevertheless, misleading police practices are a material factor in a number of misidentifications. Accordingly, the NAS panel made five recommendations to improve police practices, especially in connection with lineups and photo arrays. These recommendations were: (1) training all law enforcement officers on the variables that can affect eyewitness identifications; (2) adopting “blind” lineup and photo array procedures (such as having the procedure administered by an officer who is not involved in the underlying investigation); (3) providing the officers who do administer the procedures with standardized witness instructions designed to avoid suggestiveness and contamination; (4) documenting the witness’s stated level of confidence at the time of an identification; and (5) videotaping the witness identification process.
The good news is that in the few years since the NAS panel issued its report, no fewer than 19 states have passed legislation or have adopted rules requiring the reforms set forth in recommendations 2, 3, and 4, i.e., blind procedures, standardized instructions, and recording of confidence levels. These states are California, Colorado, Florida, Georgia, Hawaii, Illinois, Kansas, Louisiana, Maryland, Massachusetts, Montana, Nebraska, Nevada, New Hampshire, New Mexico, New York, Oklahoma, Utah, and West Virginia. Some of these states have also adopted recommendation 5 (videotaping), though others have simply chosen to recommend it where feasible. Finally, while only a few states have adopted recommendation 1 (universal training), implementation of the other reforms has presumably served to sensitize police officers to some of the attendant issues and problems.
In addition to these statewide legislative and regulatory reforms, two criminal justice organizations with broad jurisdiction have recently weighed in on police practices for eyewitness identification. In September 2016, the International Association of Chiefs of Police released a new “Model Policy” for the conduct of lineups.50 Similarly, in January 2017, the U.S. Department of Justice released new “Procedures for Conducting Photo Arrays,”51 the first revision of DOJ policies since 1999. In both cases, the procedures adopted were drawn directly from the recommendations of the NAS report.
The bad news is that 31 states still have not acted. While local police authorities in several of these states had already adopted some or all of these best practices even prior to the NAS report, it can only be hoped that the remaining states and municipalities will follow the lead of the states that have now adopted most of the report’s recommendations.
Strengthening Eyewitness Evidence in the Courts
The NAS report also made four recommendations designed to strengthen the value of eyewitness identification evidence in court. These were: (6) making more frequent pre-trial judicial inquiries into the adequacy of the eyewitness testimony proposed to be offered; (7) making juries aware of the circumstances and confidence of the eyewitness’s prior identifications; (8) allowing expert witnesses to educate juries about the problems with eyewitness testimony; and (9) alternatively, using jury instructions to convey this information.
Here, the record of improvements has been spottier. Following the earlier lead of New Jersey, Massachusetts has now issued a set of jury instructions to be given before or after the testimony of an eyewitness to alert juries to some of the potential issues, and the Supreme Court of Utah has now approved a new rule allowing judges to conduct pre-trial suppression hearings to determine if an eyewitness’s identification is too problematic to be presented to a jury. But, with these exceptions, the main effect of the NAS report on the courts, thus far, has been to sensitize some (though by no means all) judges to some of the problems.
Nevertheless, anecdotal evidence indicates that, since the issuance of the NAS report, defense counsel have become more assiduous in raising with the courts issues regarding eyewitness identification. For example, in several jurisdictions, appellants have raised as an issue on appeal the failure of trial courts to provide funds for indigent defendants to retain eyewitness experts. While it does not appear that any of these cases has yet provided a definitive answer as to whether such funds should be made available, it may be inferred that issues regarding the problems with eyewitness identifications are at least becoming more salient in the minds of judges.
In short, there has been some progress in the courts, but not an overwhelming response. It is not clear why. After all, the 19 states that have enacted lineup reforms represent a fairly broad cross-section of America, so there does not appear to be an ideological “split” that prevents improvements elsewhere. Perhaps it is just a matter of inertia. But given the stakes involved — i.e., the wrongful conviction of innocent persons — one may well hope for better in the future.
The unfortunate products of eyewitness misidentification — frivolous investigations, errant prosecutions, and wrongful convictions — have persisted for many years, largely due to a lack of understanding. The NAS eyewitness report, however, has provided a wealth of information, offering a detailed and coherent roadmap for change by highlighting the associated primary problems and underlying causes. As we have summarized herein, reform is now underway on multiple fronts, fueled by passion, principles, and good ideas. Indeed, this is shaping up to be one of the great success stories at the intersection of science and law. We look forward to the next five years.