Kenyan Workers 'Psychologically Traumatized' by Graphic Text to Make AI Chatbots Work


ChatGPT has impressed millions with its ability to string together coherent and sometimes accurate sentences, blurbs, and scripts. To write like a human, the bot was trained with machine learning algorithms on a vast catalog of material scavenged from the web. But not all of ChatGPT's development was automated: human labor was required to prevent ChatGPT from falling into the same trap as its predecessor, GPT-3, which could make inappropriate and sometimes racist comments.

According to a recent investigation by Time, OpenAI, the creator of ChatGPT, outsourced this unpleasant data processing task to Kenyan workers.

ChatGPT, like image generation tools such as DALL-E (also run by OpenAI), Stable Diffusion, and Midjourney, is trained on datasets too large to be meticulously curated by hand. Without that training, ChatGPT would not work at all, but not all of the text that can be found on the Internet leads to the kinds of comments we want AI bots to make.

The outsourced work involved labeling examples of the offensive text that might appear in the training material. A collection of these labeled text samples was then fed to another AI, which was trained to notice and remove similar offensive text from ChatGPT's responses to users.

By training the AI to avoid inappropriate language and themes, ChatGPT is kept cleaner and less likely to be used to generate disturbing content. However, in this effort to improve the bots, OpenAI has exposed Kenya's low-wage workers to some of the worst material on the web.
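To make that pipeline concrete, here is a minimal sketch of the general technique the article describes: human-labeled text samples are used to train a classifier that can later flag similar content. It uses scikit-learn and invented example data, and is an illustration of the approach only, not OpenAI's actual moderation system.

```python
# Illustrative sketch: train a simple text classifier on human-provided labels,
# then use it to screen new text. The examples and labels below are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labeled samples: 1 = flagged by human annotators, 0 = benign.
texts = [
    "a harmless passage about the weather",
    "another benign passage about cooking",
    "an example passage a labeler marked as violent",
    "an example passage a labeler marked as abusive",
]
labels = [0, 0, 1, 1]

# Fit a basic text-classification pipeline on the labeled examples.
classifier = make_pipeline(TfidfVectorizer(), LogisticRegression())
classifier.fit(texts, labels)

# At generation time, candidate outputs could be screened before release.
candidate = "a new passage produced by the model"
if classifier.predict([candidate])[0] == 1:
    print("flagged for review")
else:
    print("passed the filter")
```

In practice the filtering models are far larger and trained on far more labeled data, but the basic idea is the same: the quality of the filter depends on humans reading and labeling the offensive material first.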

"To get these labels, OpenAI sent tens of thousands of text fragments to Kenyan outsourcing companies starting in November 2021," Time magazine reported.

"Many of the texts appear to have been pulled from the darkest depths of the Internet. Some of them described in graphic detail situations of child sexual abuse, bestiality, murder, suicide, torture, self-mutilation, and incest.

Time reports that one worker began to have recurring hallucinations as a result of content he encountered on the job. All four workers interviewed by the magazine said they had suffered psychological trauma as a result of their work.

Approximately 36 workers were hired to perform tasks on behalf of OpenAI, and each was reportedly expected to "read and label 150 to 250 texts per nine-hour shift."

The outsourcing was handled by Sama, a San Francisco-based company with workers in Kenya, Uganda, and India. According to Time, OpenAI signed three contracts with the company for labeling work in late 2021, worth a total of about $200,000.

Sama says its employees can access individual and group sessions with professional mental health therapists at any time. However, employees interviewed by Time said they only had access to group sessions.

"Our mission is to ensure that artificial intelligence benefits all of humanity, and we are working to build a safe and useful AI system that limits bias and harmful content," an OpenAI spokesperson told Time magazine regarding the outsourced data processing work. " Harmful (text and image) classification and filtering is a necessary step in creating a tool that can detect harmful content while minimizing the amount of violent and sexual content in the training data."

According to Time, the nature of Sama's work for OpenAI took a different turn in February 2022, when it began collecting "sexually violent images," some of which are considered illegal in the US. OpenAI stated that labeling harmful images was a "necessary step" toward making its tools safe to use, but that it had never intended for the most extreme categories of images to be collected and that this was the result of a miscommunication.

Sama ultimately terminated its contract with OpenAI early. According to the report, Sama's team raised concerns about the content of the images, and the contract between the two companies fell apart. In the aftermath, some of Sama's employees were moved to lower-paying contracts or had their positions terminated altogether. Time's full report goes into more detail about OpenAI's relationship with Sama.

OpenAI is currently valued at billions of dollars. Microsoft is reportedly looking to put more money into the AI company despite recent mass layoffs and has announced plans to integrate OpenAI's technology into its services.

Moderation work has long been associated with some degree of human suffering: a 2019 report on the mental wellbeing of employees on moderation teams used by Facebook described long-term trauma symptoms as a result of the work.

The need for OpenAI's labeling work is one aspect of a larger ethical crisis growing at the heart of AI research. Machines cannot learn to behave like humans without human-made material, but not everyone wants their work fed into algorithms. Last year, artists began labeling their work "No AI" to ward off companies collecting training data for image generators. Now there is the opposite problem: material that bot makers do not want influencing their AI. Here again, the job of training respectable AI bots is left to people, in this case workers paid to read the web's most disturbing content.
