How the “Surveillance AI Pipeline” Literally Objectifies Human Beings

A preprint study analyzing more than 20,000 computer vision papers and patents over three decades reveals “The Surveillance AI Pipeline,” and why the field needs more interdisciplinary research.

The vast majority of computer vision research leads to technology that surveils human beings, a new preprint study that analyzed more than 20,000 computer vision papers and 11,000 patents spanning three decades has found. Crucially, the study found that computer vision papers often refer to human beings as “objects,” a convention that both obfuscates how common surveillance of humans is in the field, and objectifies humans by definition.

“The studies presented in this paper ultimately reveal that the field of computer vision is not merely a neutral pursuit of knowledge; it is a foundational layer for a paradigm of surveillance,” the study’s authors wrote. The study, which has not been peer-reviewed yet, describes what the researchers call “The Surveillance AI Pipeline,” which is also the title of the paper.

The study’s lead author Pratyusha Ria Kalluri told 404 Media on a call that she and her co-authors manually annotated 100 computer vision papers and 100 patents that cited those papers. During this process, the study found that 90 percent of the papers and patents extracted data about humans, and 68 percent reported that they specifically enable extracting data about human bodies and body parts. Only 1 percent of the papers and patents stated they target only non-humans.

From the study "The Surveillance AI Pipeline."

To analyze the rest of the papers and patents spanning three decades, the study used an automated system that scanned the documents for a lexicon of surveillance words compiled by the study’s authors. This analysis found not only that the majority of computer vision research enables technology and leads to patents that surveil humans, but also that this has accelerated over the years. “Comparing the 1990s to the 2010s, the number of papers with downstream surveillance patents increased more than five-fold,” the study’s authors wrote.
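A lexicon-based scan of this kind boils down to checking each document’s text against a list of flagged phrases. Here is a minimal sketch of the general approach; the example terms below are hypothetical stand-ins, since the study’s actual surveillance lexicon is not reproduced in this article.

```python
# Hypothetical surveillance lexicon; the study's real word list is not shown here.
SURVEILLANCE_LEXICON = {
    "pedestrian detection",
    "face recognition",
    "person re-identification",
    "crowd monitoring",
}

def matches_lexicon(text: str, lexicon: set[str]) -> bool:
    """Return True if any lexicon phrase appears in the text (case-insensitive)."""
    lowered = text.lower()
    return any(term in lowered for term in lexicon)

# Example: flag one abstract and pass over another.
print(matches_lexicon("A fast Pedestrian Detection model for urban scenes.",
                      SURVEILLANCE_LEXICON))  # True
print(matches_lexicon("Depth estimation for indoor 3D reconstruction.",
                      SURVEILLANCE_LEXICON))  # False
```

At the scale of tens of thousands of papers, a simple substring match like this is cheap enough to run over every abstract, which is presumably why the authors reserved manual annotation for a 100-paper sample.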

“In addition to body part and facial recognition, these technologies were frequently aimed at mass analyzing datasets of humans in the midst of everyday movement and activity (shopping, walking down the street, sports events) for purposes such as security monitoring, people counting, action recognition, pedestrian detection, for instance in the context of automated cars, or unnamed purposes,” the authors wrote.

As co-author Luca Soldaini said on a call with 404 Media, even in the seemingly benign context of computer vision enabled cameras on self-driving cars, which are ostensibly there to detect and prevent collision with human beings, computer vision is often eventually used for surveillance.

“The way I see it is that even benign applications like that, because data that involves humans is collected by an automatic car, even if you're doing this for object detection, you're gonna have images of humans, of pedestrians, or people inside the car—in practice collecting data from folks without their consent,” Soldaini said.

Soldaini also pointed to instances when this data was eventually used for surveillance, like police requesting self-driving car footage for video evidence.

From the study "The Surveillance AI Pipeline."

Part of the reason the authors painstakingly manually annotated 100 papers and patents is that the field of computer vision often obfuscates surveillance of human beings with jargon. Computer vision’s ability to identify anything in an image is often referred to as “object detection,” but computer vision technology that is able to, or specifically designed to, surveil human beings also refers to those humans as “objects.”

“I think it's dehumanizing people quite literally,” co-author Myra Cheng told 404 Media on a call. “You're literally objectifying people. I think that makes it easier to divorce whatever application or paper or method you're building from the very real consequences that it might have on people and especially marginalized populations.”

“This idea that object detection, that you can say object and mean human, was not a default phenomenon,” Kalluri said. “Even in the law literature when we were talking to patent lawyers, there was a concerted effort to decide and follow up and make it legal precedent, to make it a convention in the field to say object to mean human. That's already something that now we think of as normal, but it's actually very strange.”

Computer vision, machine learning, and AI more broadly are understandably primarily the domain of computer scientists. But what we’ve seen in recent years is that when computer scientists build the future without input from experts in history, ethics, and the humanities, we get a future where facial recognition is deployed even though it can incriminate innocent people, where self-driving cars clog the roads, and where generative AI tools steal from human creators to generate non-consensual content.

“The Surveillance AI Pipeline” doesn’t only highlight an important reality about computer vision technology. Its team of authors, who also have backgrounds in ethics, critical data studies, abolitionist community organizing, history, and feminist studies, shows the value of an interdisciplinary approach in a field where it is so painfully lacking.

Using “object” language, Kalluri said, “allows this continuation of the narrative of the field as this neutral thing that deals with abstraction and computer science has a long history of getting to use that word to put up some resistance to a lot of critique. I think that's one problem. I think another is the history of using ‘object’ language to describe people is not new, right? That entire history is about a history of oppression and repression.”