On lie detection

To understand why “lie detection” is controversial, we need to look at its history, survey the different ideas in circulation, and examine iBorderCtrl specifically. This text and its footnotes give an overview of the “lie detection” field and outline the gross lack of scientific basis for many of the developments in it.


History of "Lie Detection"

There has never been a fool-proof method to show whether someone is telling the truth or lying. Historically, we “solved” this problem with torture. From antiquity through the Middle Ages, trial by ordeal was used as if surviving water or fire proved innocence.1)

In the 20th century, American scientists tried to develop ways of measuring physiology during questioning that could prove truthfulness or deception. Special devices (dubbed “polygraphs”) were built to measure breathing and heart rates as well as sweating.2) But the courts were quick to reject these new methods for their lack of acceptance in the relevant scientific communities. Instead, the courts decided that scientists first had to agree that a method of gathering evidence was actually scientific before it could be accepted as evidence.3) Three generations of refuted claims4) later, the state of “lie detection” is still that we're asked to believe experts who often have a huge financial stake in the technology.

That hasn't stopped a 3-4 billion dollar industry5) from developing in the US. “Officially licensed polygraphers” make people pay money for polygraph tests that they can try to use in courts or other proceedings (naturally only if they like the results) and the US intelligence and security apparatus makes extensive (and controversial) use of lie detection to try and find moles, leakers, terrorists, and other criminals.6)

“Lie detection” never did achieve that broad legal or scientific acceptance, because no lie response was ever discovered that uniquely identifies when people are being deceptive or truthful. Rather, most psychophysiologists and psychologists think lie detection is not theoretically sound.7) They doubt that the high accuracy claims in circulation are supported by evidence, and they believe polygraphs can be easily beaten with countermeasures8).

In the early 2000s, Congress asked leading scientists at the National Academy of Sciences to evaluate the scientific evidence behind polygraphs.9) The scientists issued a report warning that using polygraphs for mass security screenings would harm national security by wrongly implicating large numbers of innocent people as deceptive while also failing to identify a non-negligible proportion of guilty people.10) And because most of the available research on lie detection was of such poor quality, they refused to publish an average accuracy rate or range at all11).

"Lie Detection" in Europe

The American polygraph industry currently dominates globally, but in Europe lie detection has traditionally been associated with the Soviets.12) The Stasi in East Germany also used Soviet polygraph interrogation techniques as part of its intimidation and “touchless torture” repertoire.13) Police in Eastern European countries such as Poland, Romania, Slovenia, Latvia and Lithuania commonly use polygraphs in police investigations and personnel security screenings.14)

"Next generation lie detection"

In recent years, new techniques for lie detection have been proposed. Instead of measuring the crude physiological parameters of the old polygraphs, relatively new approaches generally suggest measuring pupil dilation, small eye and facial movements, facial thermography, signs of stress in the voice, posture, and linguistic analysis. The current holy grail appears to be the use of machine learning to study the data from ever-cheaper EEG headsets, directly measuring the electrical activity in the brain itself—when questions can be structured in terms of concealed information rather than deception per se.15) Scholars are increasingly concerned about emerging and possible abuses of these EEG-based brain measurements.16)

iBorderCtrl

Border security is the main growth area for lie detection in Europe. The European security market is growing and set to be worth 128 billion euros by 2020.17) In October 2018, the European Commission announced that as part of the €80 billion Horizon 2020 EU Research and Innovation initiative, it was funding iBorderCtrl (the Intelligent Portable Border Control System) to the tune of €4.5 million. iBorderCtrl includes a lie detection component called ADDS (Automatic Deception Detection System) from a company called “Silent Talker”, a startup staffed by scientists from the Manchester Metropolitan University.

The iBorderCtrl developers claim to be using micro-expressions to detect deception, even going so far as to claim that their system will identify and classify people based on “biomarkers of deceit.” The self-reported accuracy numbers look good at first glance. (But even if we were to believe them, they are still problematic in a mass-screening context.18))

The problem is that there is absolutely no scientific basis for the assertion that unique “biomarkers of deceit” exist, or are about to be discovered after fruitless centuries of pursuit. Rather, the solid scientific consensus on deception detection is that we can't do it. “Lie detection” doesn't exist, because there is no unique lie response to detect. Dr. Vera Wilde wrote a longer blog post called “Biomarkers of Scientific Deceit” on the iBorderCtrl.no blog to talk about the state of the art in micro-expressions research.

Apart from this being pseudoscience, serious problems are to be expected from the biases that arise when AI is used to classify behaviours among such different groups of people. The iBorderCtrl consortium makes all sorts of handwaving statements as if that situation is under control, so Dr. Wilde wrote another blog post titled "iBorderCtrl Automates Discrimination (and no amount of handwaving makes that go away)" to address that.

Beyond iBorderCtrl

Technology gets cheaper and cheaper, and more and more raw sensor data from our lives is processed by larger and larger tech giants all hoping to squeeze out just a little more information about us than the others. Processing data about us for clues about our internal state may well be a gold mine of the future. Even if lie detection as such will remain difficult for some time to come, data about (say) whether we are nervous or relaxed may well predict (say) what the next products are that we are likely to buy. Dr. Wilde's blog post “Protecting Sacred Space: Real "Lie Detection" Would Threaten Human Rights As We Know Them” examines in depth some current or possible near-future technological advances and their implications for human rights.

In short

  • iBorderCtrl wants to roll out pseudoscientific technology to intimidate travelers.19)
  • The way it's set up, the proposed “lie detection” is likely to discriminate against various subgroups20).
  • Even if there somehow exists a magic way to look inside people's heads, mandatory use of such technology on the population would raise a host of other issues.21)
  • Introduction of iBorderCtrl in its proposed form would be the largest rollout of “lie detection” in the world, and it would mark the first time (we know of) that parts of the general population would be subjected to lie detection en masse.
  • As far as we know, the studies done on this Automatic Deception Detection System have been done by insiders, who stand to become very rich if their system is adopted along the European borders. The progress reporting to the EU is all confidential. This system needs to be studied out in the open, by independent scientists.
  • We need informed debate on lie detection and its potential uses and abuses.
  • “Lie detection” (of any kind) is way, way, way too controversial (ethically and scientifically) to be put into general use by mid-level bureaucrats without proper oversight.
  • This technology, and the drive behind what it is meant to accomplish, are scary not only when used by authorities or at borders. We will need to keep an eye on science dealing with inferring or altering our internal state, and on who processes sensor data about us and what techniques they are using to process it.
1)
Historical scholarship on so-called judicial torture starting in Roman antiquity and running through the Khmer Rouge here, in Roman antiquity here, and medieval through eighteenth century history here.
2)
The development of polygraphs in American policing was part of a larger movement to make policing appear more scientific and professional in response to public outrage over widespread police use of torture. Its criminal justice origins predicted its later widespread adoption in American policing and intelligence contexts, in spite of academic, civil, legislative and judicial efforts to limit its uses and abuses. But such widespread use has remained mostly limited to the American security apparatus, due to insufficient scientific basis for broader use.
3)
In the landmark polygraph (in)admissibility ruling Frye v. United States, 293 F. 1013 (D.C. Cir. 1923), Justice Van Orsdel wrote:
… while courts will go a long way in admitting expert testimony deduced from a well-recognized scientific principle or discovery, the thing from which the deduction is made must be sufficiently established to have gained general acceptance in the particular field in which it belongs.
The so-called Frye standard of general acceptance kept polygraph “test” results out. It also governed scientific evidence admissibility in the US for most of the 20th century, and still governs in several states. Elsewhere it has been combined with or superseded by the 1993 Daubert and hybrid Frye-Daubert standards. Polygraph “tests” generally remain inadmissible in criminal courts due to concerns about poor quality methodology in existing studies of polygraph validity, and unpredictable accuracy. But roughly half of US states currently admit stipulated polygraph tests as evidence, New Mexico admits them per se, and polygraph proponents often try to argue the evidence for admissibility under the Frye and Daubert standards. However, in another landmark case, the Supreme Court later ruled in US v. Scheffer, 523 U.S. 303 (1998) that scientific concerns about polygraphs' accuracy, reliability, and susceptibility to countermeasures had rightly prevented an airman from presenting polygraph “evidence” in his defense in court-martial proceedings. In 2005, the heads of a National Academy of Sciences committee to review the scientific evidence on the polygraph wrote in their peer-reviewed summary of that work:
We believe that the courts have been justified in casting a skeptical eye on the relevance and suitability of polygraph test results as legal evidence. Generalizing from the available scientific evidence to the circumstances of a particular polygraph examination is fraught with difficulty. Further, the courts should extend their reluctance to rely upon the polygraph to the many quasiforensic uses that are emerging, such as in sex offender management programs (see the discussion in [4]). The courts and the legal system should not act as if there is a scientific basis for many, if any, of these uses. They need to hear the truth about lie detection.
4)
… And three damning government reports spanning over thirty years during which neither the scientific consensus nor the government's curious affinity for polygraphs substantially changed…
5)
The American polygraph industry's worth is estimated at 3.6 billion dollars annually. Next-generation “lie detection” adds some revenue as well.
6)
In some well-documented cases, federal US polygraphers interrogated recruits or employees about protected and irrelevant subjects including childhood sexual victimization, adult sexual victimization and sexual identity, and religion. This is not a new problem: It was inappropriate polygraph questioning of a sexual nature, among other things, that back in the '60s resulted in one of the earlier reports on government use of polygraphs (p. 3). In other cases, statements were attributed to people during polygraph interrogations that they never made, or examiners generated fraudulent polygraph “test” results. Five CIA polygraphers were fired over such frauds as a result of Operation Bad Apple. Short of internal surveillance, non-transparency makes such frauds impossible to detect or rebut for all but the most privileged elites. Secrecy is a problem for researchers, too. Nonetheless, we know from documents obtained under the Freedom of Information Act that polygraphers themselves expressed significant concerns about the value of polygraphs as they have been used overseas, including in Iraq and Afghanistan, on deployed personnel and detainees alike.
8)
… Because they're not, and they can.
9)
Backstory: There was a spy scare during which nuclear scientist and Los Alamos National Laboratory employee Wen Ho Lee was polygraphed. Three polygraphers read his “test” results as indicating innocence. The FBI later concluded he had “failed” the same polygraph, underscoring inter-rater reliability concerns. Although the case against Lee eventually fell apart, the public furor and organizational fear around it led the security establishment to successfully pressure Congress to require Department of Energy (DOE)-wide polygraph screening, including throughout the National Labs. The Federal Register announcement (64 Federal Register 242, p. 70961, 17 December 1999) of how the DOE would interpret the new law, Section 3135 of H.R. 5408, the National Defense Authorization Act for FY 2001, triggered major resistance from Lab employees. That resistance resulted in multiple public hearings, a majority of Los Alamos X-Division employees endorsing a petition asking then-DOE Secretary Bill Richardson to reconsider the polygraph policy change, a quiet Congressional roll-back of the polygraph provision, and a report from a panel chaired by former CIA counterintelligence chief Paul Redmond to the House Permanent Select Committee on Intelligence on Lab employees’ antipathy towards polygraphy as unscientific. A Congressional request followed, asking the National Academy of Sciences (NAS) to evaluate the scientific evidence on polygraph security screenings at the Labs, and potential alternatives for deception detection. When the NAS initially submitted their report, the DOE replied in April that they would not change their policy. Then, as committee Chairman Stephen Fienberg was getting on a plane to testify before Congress, he was told that the DOE had changed its mind… Their testimony the next day wouldn’t be what they had told him, after all… The DOE would pull back its polygraph program expansion in response to the scientists’ report.
And the DOE did indeed testify to Congress the next day that they would do that. But then they backtracked, later claiming that the study suggested polygraphs could be effective as part of detailed investigations—an allegation Fienberg publicly disputed as “clearly taken out of context and distorted.” Rather, he reiterated, “There is virtually no science behind the screening process on which they rely.” Years later, the NAS committee chair Stephen Fienberg and study director Paul Stern summarized their work and its impact in the journal Statistical Science. At the end, they note:
As this paper was going to press in January 2005, the Department of Energy finally announced its proposed revised polygraph rules in the Federal Register [2]. They provide a detailed plan for implementing the plan outlined in Deputy Secretary McSlarrow’s September 2003 testimony. But no other federal agency has stepped forward with a plan to curb the use of polygraphs. All of them have heard the truth about polygraphs as we know it, but they have failed to acknowledge it by action.
Ultimately, in 2006, the DOE did issue a regulation limiting polygraphs of its employees to certain cases with specific cause, as opposed to the mass screening program they (repeatedly) tried to implement following the Wen Ho Lee scandal.
10)
As Al Zelicoff, former senior scientist with the Center for National Security and Arms Control at Sandia National Laboratories, wrote for The Washington Post, the National Academy of Sciences report:
determined that the polygraph was not a worthless tool – indeed, that it was much worse than worthless. The report said that “available evidence indicates that polygraph testing as currently used has extremely serious limitations . . . if the intent is both to identify security risks and protect valued employees.” The NAS panel, made up of internationally respected psychologists and statisticians, further determined that the test was so nonspecific that even if the polygraphers managed to finally uncover their first spy, at least 100 innocent laboratory employees would have their clearances yanked because of the “false positives” inherent in the test. The NAS concluded: “Polygraph testing yields an unacceptable choice . . . between too many loyal employees falsely judged deceptive and too many major security threats left undetected. Its accuracy . . . is insufficient to justify reliance on its use in employee security screening in federal agencies.” It doesn't get much clearer than that.
Despite having cleared this op-ed before publication, Zelicoff was subsequently forced to resign in retaliation for his opposition to polygraphs.
11)
Actually, they published a table featuring a hypothetical accuracy rate of 80%, higher than any plausible polygraph accuracy rate according to available evidence (see Table S-1). They did this in order to apply Bayes’ Rule to show how polygraph screening would hurt national security by wrongly implicating 1,598 innocent Lab employees and missing 2 spies in a hypothetical sample of 10,000 Lab employees with 10 spies. So even with a higher-than-likely accuracy rate plus a higher-than-likely base rate of spying among Lab employees, the scientists were convinced that polygraph security screenings would harm the Labs. Their report and testimony convinced Congress of this, too. They just couldn’t then get intelligence agencies with entrenched polygraph programs to give them, or the evidence, the time of day. That non-response of the government bureaucracy to scientific criticism of the polygraph is a continuation of a pattern of such non-response. In 1983, the Congressional Office of Technology Assessment published its Scientific Validity of Polygraph Testing: A Research Review and Evaluation—A Technical Memorandum, concluding (p. 100):
OTA recognizes that the administration as well as NSA, CIA, and DOD believe that the polygraph is a useful screening tool. However, OTA concluded that the available research evidence does not establish the scientific validity of the polygraph for this purpose.
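The base-rate arithmetic behind the Table S-1 figures cited in this footnote can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population (10,000), the number of spies (10), the 80% detection rate and the 1,598 false positives come from the text above, while the false-positive rate of roughly 16% is not stated there and is simply back-computed from those figures. The names `ScreeningOutcome` and `screen` are hypothetical helpers, not anything from the NAS report.

```python
from dataclasses import dataclass


@dataclass
class ScreeningOutcome:
    true_positives: int   # guilty people flagged as deceptive
    false_negatives: int  # guilty people who pass the test
    false_positives: int  # innocent people flagged as deceptive
    true_negatives: int   # innocent people who pass the test

    @property
    def flagged(self) -> int:
        return self.true_positives + self.false_positives

    @property
    def positive_predictive_value(self) -> float:
        # Bayes' Rule in count form: P(guilty | flagged)
        return self.true_positives / self.flagged


def screen(population: int, guilty: int, sensitivity: float,
           false_positive_rate: float) -> ScreeningOutcome:
    """Expected outcome of screening a whole population once."""
    innocent = population - guilty
    tp = round(guilty * sensitivity)
    fp = round(innocent * false_positive_rate)
    return ScreeningOutcome(tp, guilty - tp, fp, innocent - fp)


# Figures taken from the footnote above.
POPULATION = 10_000
SPIES = 10
SENSITIVITY = 0.80
# Assumption: implied false-positive rate, back-computed from the
# footnote's 1,598 wrongly implicated employees out of 9,990 innocents.
FALSE_POSITIVE_RATE = 1_598 / (POPULATION - SPIES)

outcome = screen(POPULATION, SPIES, SENSITIVITY, FALSE_POSITIVE_RATE)
print(outcome)
print(f"Share of flagged employees who are spies: "
      f"{outcome.positive_predictive_value:.2%}")
```

Under these assumptions, 1,606 employees are flagged and only 8 of them are spies, so about half a percent of those who “fail” are actually guilty. The rarer the thing being screened for, the more the innocent dominate the flagged group.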
12)
… And then of course the Russians and Belarusians, in literature generally unknown in the West.
13)
At one point, the cash-strapped East Germans were so militarized and desperate for coffee that they traded polygraph machines for beans.
14)
See, e.g., this news article. This pattern is exactly what you would expect to see in Europe based on geopolitical power dynamics and the overall pattern of US polygraph export to client states. As previously noted:
there is a large and consistently expanding global network of U.S. polygraph training and equipment recipients in the form of other governments… polygraph programs are required as part of U.S. sponsored anti-corruption programs such as Plan Colombia, the Mérida Initiative in Mexico, and others in the Bahamas, Bolivia, Guatemala, Honduras, and Iraq (U.S. Government Accountability Office 2010; U.S. Department of State 2010).
15)
When the National Academy of Sciences reported in 2003 on their review of deception detection research, they observed that brain function-measuring classes of “lie detection” technologies are:
attractive on grounds of basic psychophysiology because of the possibility that appropriately selected brain measures might get closer than any autonomic measures to psychological processes that are closely tied to deception. Brain activity can be measured with modern functional imaging techniques such as positron emission tomography (PET) and magnetic resonance imaging (MRI, often referred to as functional MRI or fMRI when used to relate brain function to behavior), as well as by recording event-related potentials, characteristics of brain electrical activity following specific discrete stimuli or “events”…
(p. 154-5). The scientists went on to identify several problems with fMRI for deception detection (p. 159-160).
18)
Although lack of transparency prevents independent scientists from knowing what is really going on here, iBorderCtrl scientists have repeatedly stated that their results show a 76% accuracy rate. That is concerning if you consider that the National Academy of Sciences convinced Congress that polygraph screening would harm national security even if it had a hypothetical (and higher than plausible) accuracy rate of 80%. Applied to a larger sample of 200 million hypothetical refugees with an identical accuracy rate of 80% and relevant criminal base rate of 1/1,000, such a next-generation lie detector would unfairly implicate almost 2 million innocent people while still letting 300 terrorists through. See Table 4.
19)
The National Academy of Sciences (NAS), reporting in 2003 on their review of the scientific evidence on the polygraph as well as on next-generation lie detection tools, concluded:
Various techniques for detecting deception have been suggested or might be used as substitutes for or supplements to the polygraph. None of them has received as much research attention as the polygraph in the context of detecting deception, so evidence on accuracy is only minimal for most of the techniques. Some of the potential alternatives show promise, but none has yet been shown to outperform the polygraph. None shows any promise of supplanting the polygraph for screening purposes in the near term. Our conclusions are based on basic scientific knowledge and available information about accuracy.
21)
E.g., Ienca and Andorno suggest new neurotechnology applications will require new legal frameworks to defend the basis for old human rights including freedom of thought. They imagine such a regime defending the right to cognitive liberty, the right to mental privacy, the right to mental integrity, and the right to psychological continuity. Lavazza broadens the definition of mental integrity to address both privacy and cognitive freedom—the power to keep one's thoughts to oneself, and to know that the internal world is unsurveilled, and therefore can be as free from external pressures as we can make it ourselves.