Exploitation: Training ChatGPT for Less than $2 an Hour.

Humans Remain the Source, AI Brings Progress but Not Kindness

Researchers from the University of Pennsylvania and OpenAI recently published a report estimating that, in the US labor market, about 80% of workers will have at least 10% of their job tasks sped up or even fully automated by AI.

AI is expected to take over tedious basic labor and free workers for more fulfilling, more human tasks. Yet the startups selling the latest AI models at multibillion-dollar valuations, OpenAI among them, are built on a mass of monotonous, repetitive, low-skilled labor.

A joint investigation by New York Magazine and The Verge recently revealed the grim supply chain behind generative AI models.

Labeling a few seconds of self-driving car footage takes 8 hours and pays only $10.

Take the example of Joe, a recent university graduate from Nairobi, Kenya. He applied for a job as an annotator, which involves labeling and categorizing data for AI to learn from. Initially, he annotated footage captured by self-driving cars so that the AI could recognize vehicles, pedestrians, and trees. It took Joe about 8 hours to organize and label a few seconds of training footage, and he earned a meager $10 for his efforts.

Later, Joe joined a large company called Remotasks, which paid him four times as much but required him to work in isolation for long hours. His tasks included classifying different types of clothing in photos and working out which room a cleaning robot was in from its camera footage. He was also prohibited from discussing ongoing projects with his colleagues.

In fact, Remotasks is a subsidiary of Scale AI, whose clients include OpenAI and the US military. Silicon Valley has other companies like Scale AI that supply pre-organized, categorized data to tech giants such as OpenAI, Microsoft, and Google, allowing them to train their AI models more quickly. These data providers, however, mostly rely on cheap labor from developing countries.

Tagging gruesome and violent content, suffering mental distress, and earning less than $2 an hour.

According to a report by “Noema Magazine”, unlike AI researchers in Silicon Valley who earn six-figure salaries, the data annotators who help build AI’s basic cognition usually come from economically disadvantaged countries. For example, Venezuela is a major source of training data for the visual recognition systems of self-driving cars, while in Bulgaria, many Syrian refugees label the data that facial recognition systems need. Most of them earn less than $2 an hour.

Beyond the low wages, some data labeling work can also harm workers’ physical and mental health.

Earlier this year, a report by “Time” revealed that since 2021 OpenAI has used a large number of Kenyan workers, hired through the outsourcing company Sama, to provide data for training ChatGPT to recognize inappropriate content. The annotators had to go through large volumes of material depicting sexual abuse, torture, and bloody violence every day. Although the company offered mental health counseling, many employees still suffered psychological trauma.

Some people in the industry believe this is only a transitional phase. Sonam Jindal, head of the non-profit organization Partnership on AI, which researches AI practices, pointed out that many companies regard annotation as work needed only while a model is first being built. “Once the model is completed, there is no need for such a large amount of repetitive work, so why care about it?”

Are exploited annotators merely a product of a transitional phase? Expert: There will always be new situations that require annotation.

However, annotating for AI may not be so simple.

Milagros Miceli, a German data researcher, pointed out that the annotation industry as a whole struggles with inconsistent standards. “While humans can understand what a ‘shirt’ is with just a few examples, machines need tens of thousands of examples.” And those tens of thousands of examples usually have to be divided among hundreds of workers to be labeled.

Some annotators will label a shirt reflected in a mirror, while others will not label a shirt that is neatly folded. There are many other edge cases, such as shirts hanging on hangers, shirts made of raincoat material, or shirts on fire, where different people apply different labeling rules. These inconsistencies make it hard for the model to learn to recognize shirts reliably.
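The disagreement described above can be made concrete with a small sketch. The item names and votes below are invented for illustration and do not come from any real annotation project; the code only shows one simple way to flag the items on which annotators disagree so that a labeling rule can be written for them.

```python
from collections import Counter

# Hypothetical votes: did each of three annotators tag the item as a "shirt"?
labels = {
    "shirt_on_person": [True, True, True],
    "shirt_in_mirror": [True, False, True],
    "folded_shirt": [False, True, False],
    "shirt_on_hanger": [True, True, False],
}


def agreement(votes):
    """Share of annotators who chose the most common label for one item."""
    most_common_count = Counter(votes).most_common(1)[0][1]
    return most_common_count / len(votes)


def contested_items(items, threshold=1.0):
    """Names of items whose agreement falls below the threshold, sorted."""
    return sorted(
        name for name, votes in items.items() if agreement(votes) < threshold
    )


if __name__ == "__main__":
    for name in contested_items(labels):
        print(f"{name}: agreement {agreement(labels[name]):.2f}")
```

In this made-up sample only the shirt worn by a person gets a unanimous label; the other three items would need an explicit rule in the labeling manual.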

In 2018, an Uber self-driving test car struck and killed a woman because it had been set to avoid cyclists and pedestrians but did not know how to handle a person walking a bicycle across the road. Of course, whoever distributes the labeling tasks can write a thick manual telling annotators what to label and when, but there will always be special cases that no manual can cover exhaustively, and humans will have to set the standards again.

Moving beyond menial labor: more professionals are expected to join and lift the annotation industry.

Erik Duhaime, the CEO of Centaur Labs, a medical data annotation company, has said that annotation work is like another industrial revolution for AI: a massive workflow is broken down into small, simple tasks that are repeated in large numbers along a production line, with some steps done by machines and some by humans, just like in Charlie Chaplin’s “Modern Times.”

For the foreseeable future, AI models will probably still rely on the labor of annotators, but the work itself has the potential to keep improving.

Currently, annotators employed by Remotasks in the US typically earn between $10 and $25 per hour, and pay is higher for specialized topics or for experts. An annotator named Anna described the process of training Sparrow, the chatbot developed by Google DeepMind. Anna spends her entire day chatting with the machine and enjoys it. Sometimes they discuss science fiction novels or share jokes and TV shows: “sometimes the bot’s response makes me laugh and vice versa.” She earns about $14 per hour, which is better than the local minimum wage.

Edwin Chen, who founded the annotation start-up Surge in 2020, believes the industry needs to move away from its old low-skilled labor model.

“If we want AI’s thinking patterns to be more diverse, we must use more professional annotation methods to enable AI to capture the unique creativity and thought values of humans.”

Recently, market demand for more advanced and complex annotation work has been growing.

In May, Scale AI began listing more advanced annotation jobs on its website, recruiting professionals from almost every field, such as finance, nutrition, law, and literature, to teach AI specialized knowledge. You might teach AI to recognize legal provisions for $45 per hour, or to write poetry for $25 per hour.

Now, other leading companies in the AI field, such as Anthropic and Meta, have even started using GPT-4 to generate training data, attempting to eliminate the need for manual annotation. Sam Altman, CEO of OpenAI, believes that as AI advances, the demand for human annotation data will decrease.