The Dictator Myth That Refuses to Die

The Atlantic

https://www.theatlantic.com/international/archive/2023/07/authoritarianism-dictatorship-effectiveness-china/674820

Last week, at a Fox News town hall (where else?), former President Donald Trump called China’s despot, Xi Jinping, a “brilliant” guy who “runs 1.4 billion people with an iron fist.” Lest anyone doubt his admiration, Trump added that Xi is “smart, brilliant, everything perfect. There’s nobody in Hollywood like this guy.”

Trump is not alone. Many in the United States and around the globe see the allure of a dictator who gets things done and makes the trains run on time, no matter the rules or laws that stand in the way. According to repeated polling, roughly one in four Americans agrees with the statement that a “strong leader who doesn’t have to bother with Congress and elections” is desirable. A much higher proportion of citizens agrees with that sentiment elsewhere, including in some of the most populous democracies: 55 percent of Indians, 52 percent of Indonesians, 38 percent of Nigerians, and 31 percent of Japanese.

This grass-is-greener view of authoritarian rule tends to emerge most often where governments are failing to meet popular expectations. When democracy delivers, dictatorship doesn’t seem like a rosy alternative. Only 6 percent of Germans and 9 percent of Swedes are seduced by strongmen.

[Brian Klaas: Democracy has a customer-service problem]

Admiration for autocracy is built on a pernicious lie that I call the “myth of benevolent dictatorship.” The myth rests on three flimsy pillars: first, that dictators produce stronger economic growth than their democratic counterparts; second, that dictators, unswayed by volatile public opinion, are strategic long-term thinkers; and third, that dictators bring stability, whereas divided democracies produce chaos.

Two decades ago, the United States and its Western allies became embroiled in Iraq and later blundered into the financial crisis, leading think tanks to begin praising the “Beijing Consensus,” or the “China Model,” as an alternative to liberal democracy. Critiques of democracy surged in popularity in the era of Trump and Brexit. In the United States, intellectual publications ran articles arguing that the problem was too much democracy. In 2018, The Times of London published a column titled “Our Timid Leaders Can Learn From Strongmen.” China’s state media, capitalizing on the West’s democratic woes, argued that democracy is a “scary” system that produces self-inflicted wounds.

But events and new research in the past several years have taken a wrecking ball to the long-standing myth of benevolent dictatorship. All three pillars of the lie are crumbling. Every fresh data point proves Winston Churchill right: “Democracy is the worst form of Government, except for all those other forms that have been tried from time to time.”

Let’s start with the myth that dictatorships produce stronger growth. This falsehood arose from a few well-known, cherry-picked examples in which despots oversaw astonishing transformations of their national economies. Starting in the late 1950s, Lee Kuan Yew helped transform Singapore from a poor, opium-filled backwater into a wealthy economic powerhouse. And in China, per capita GDP rose from roughly $318 in 1990 to more than $12,500 today. Those successes are eye-popping.

But a systematic evaluation of the overall data reveals another reality. Even with these outliers of strong growth, most rigorous studies have found limited or no evidence that authoritarian regimes produce better economic growth than democratic ones. Some researchers, such as the political economists Daron Acemoglu and James Robinson, have found compelling evidence that the inclusive political institutions of democracy are among the strongest factors in producing stable, long-term growth.

When authoritarian regimes do succeed economically, they often do so at a cost, because even booming dictatorships are prone to catastrophic busts. As the political scientist Jacob Nyrup has written: “China has within a 50-year time frame both experienced a famine, where 20-45 million people died, and an economic boom, where hundreds of millions of people were lifted out of poverty.” The rosiest interpretation of the authoritarian economic data, then, is that autocrats may sometimes preside over marginally higher growth, but with a much greater risk of economic collapse. That’s not a wise trade-off.

However, the myth of strongmen as economic gurus has an even bigger problem. Dictators turn out to have manipulated their economic data for decades. For a long time, they’ve fooled us. But now we have proof: The reason their numbers sometimes seem too good to be true is that they are.

Every government has an incentive to fudge its economic data. But democracies have institutions that provide oversight and block politicians from acting on that impulse, helping keep official figures accurate. No such checks exist in dictatorships.

That difference led Luis Martinez, an economist at the University of Chicago, to test whether despots were overstating their growth rates. He did so with an ingenious method. Previous studies have verified a strong, reliable correlation between the amount of nighttime light captured by satellites and overall economic activity. When economies grow, they emit more nighttime light (which is why you can clearly pick out cities on a nighttime satellite image, and why the density of light is so much lower in Africa than, say, in Europe or on the American East Coast). High-resolution images allow researchers to track changes in nighttime illumination over time, and the detailed, granular data these images produce are nearly impossible to manipulate. Martinez discovered an astonishing disparity, suggesting that dictators have been overstating their GDP growth by about 35 percent.
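To make the logic concrete, here is a minimal, hypothetical sketch of that kind of check, not Martinez’s actual specification, with invented data and column names: regress officially reported GDP growth on growth in satellite-measured night lights, and test whether autocracies report more growth per unit of observed light.

```python
# Toy illustration (not Martinez's model) of the night-lights check.
# If autocracies report growth honestly, the relationship between reported
# GDP growth and light growth should not depend on regime type.
# The CSV and column names below are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("country_panel.csv")  # hypothetical country-year panel

# gdp_growth:    officially reported GDP growth
# lights_growth: growth in nighttime luminosity from satellite imagery
# autocracy:     1 if the country-year is classified as a dictatorship, else 0
model = smf.ols(
    "gdp_growth ~ lights_growth + lights_growth:autocracy + autocracy",
    data=df,
).fit(cov_type="cluster", cov_kwds={"groups": df["country"]})

print(model.summary())
# A positive, significant coefficient on lights_growth:autocracy would mean
# autocracies report more GDP growth per unit of observed light growth --
# the pattern interpreted as overstatement.
```

The intuition is simple: light growth is hard to fake, so if reported GDP consistently outpaces it in dictatorships but not in democracies, the gap points to inflated official statistics.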

And the more the numbers are checked, the more manipulation is exposed. In Rwanda, where The New York Times has named President Paul Kagame “the global elite’s favorite strongman” because of his apparently brilliant record of economic growth, the government claimed that it had decreased poverty by 6 percent from 2010 to 2014. Researchers found that the inverse was true: Poverty had actually surged by 5 to 7 percent. Fittingly, the notion that Benito Mussolini made the trains run on time was a lie; he built ornate stations and invested in train lines used by elites, but the commuting masses got left behind.

[Read: The undoing of China’s economic miracle]

Even China, the apparent authoritarian economic miracle, is showing signs of slowing down, its growth model no longer so well matched to the global economy. Such cracks in growth are an innate feature of autocracy. Because dictatorships criminalize dissent, normal mechanisms of economic feedback are broken, and the system doesn’t self-correct when blundering into economic mistakes. Beijing’s quixotic quest to maintain perpetual “zero COVID” was a case in point. Autocrats are adept at building ports and roads and mines. But thriving modern economies are sustained less by open mines than by open minds, of which dictatorships, by design, have a limited supply.

Advocates for the myth of benevolent dictatorship conveniently ignore a crucial fact, which is that much of the growth in autocracies comes either from manufacturing products that were invented in the more open societies of the democratic West, or from exporting goods to rich democracies. (The top destinations for Chinese exports are the United States, Japan, and South Korea.) In that way, even the outliers of autocratic growth depend for their success on the innovation and consumer wealth of democracies. Would China have lifted millions out of poverty through export-led growth quite so fast if democratic America hadn’t become an economic powerhouse first?

The myth’s second pillar turns out to be no less rickety than the first. It holds that dictators are more strategic long-term thinkers than democrats because they’re not beholden to fickle public opinion. But this lie is believable only if you don’t understand how most dictatorships actually work.

Over more than a decade, I’ve studied and interviewed despots and the henchmen who surround them. One conclusion I’ve drawn is that making decisions based on bad information is an intrinsic feature of the systems dictators run. The longer despots cling to power, the more likely they are to fall into what I call “the dictator trap,” in which they crush dissent, purge anyone who challenges them, and construct their own reality through propaganda, all to maintain control. Speaking truth to power in such a system can literally be deadly. As a result, dictators are told only what they want to hear, not what is true, and they begin to believe their own lies. Vladimir Putin’s catastrophic war in Ukraine is a tragic illustration of the dictator trap: Putin got high on his own supply, and innocent Ukrainians are the victims of his power trip.

Despots often use their power not for long-term planning, but for short-term self-glorification, as no end of examples can attest. Turkmenistan’s former dictator Saparmurat Niyazov blew millions to build, in his own honor, a golden statue that would rotate to always face the sun. In another stroke of genius, he closed all rural hospitals so that the sick could have the privilege of being treated in his pristine marble capital of Ashgabat. Most of the population lived outside the city, and countless thousands likely died because they couldn’t reach a hospital in time. His successor erected an enormous golden statue of his favorite breed of dog. Thankfully, democracies have checks and balances to suppress such narcissistic whims.

The most persistent pillar of the myth, however, is the one that holds that dictators produce stability. Some dictators have hung on to power for decades. Before his death, Muammar Qaddafi ruled Libya for 42 years. Paul Biya of Cameroon, an 89-year-old despot who had no idea where he was during a recent event, took office during the Vietnam War. Putin has been in power for more than two decades; Xi has ruled for only one so far, but he appears prepared to retain his position indefinitely.

To stay in power, authoritarian leaders face constant trade-offs. If they strengthen military or paramilitary leaders, they face the risk of a coup d’état. But if they weaken their men under arms, then they can’t protect themselves from external invasion. To keep their elites happy, despots need to make them rich through corruption—usually at the expense of the population. But a ruling class awash in ill-gotten gains could inspire a revolution, or a wild card: assassination. Autocrats appear stable, but they’re not. They’re constantly vulnerable, forced to make every decision based on what will stave off threats to their hold on power.

The stability that does exist in autocracies is, ironically, derived partially from the trappings of democracy. Recent research has made clear that dictators have developed mechanisms to “mimic democracy to prolong autocracy.” Most authoritarian leaders now hold elections, but rig them. Some use parliaments or courts to enact unpopular decisions while avoiding blame.

[From the December 2021 issue: The bad guys are winning]

Eventually, though, dictatorships tend to fall apart. And when they collapse, they really collapse. Elections in democracies change governments, not regimes. Personalist dictatorships, by contrast, often implode. When Qaddafi was killed, Libya disintegrated. He had deliberately designed the political system to function only with him at its center. The same could be true of Putin’s Russia. When he is toppled or dies, the country won’t have a smooth, peaceful transition.

The often-disastrous demise of autocrats creates a vicious cycle. Nearly seven in 10 leaders of personalist dictatorships end up jailed, exiled, or killed once they lose power. While in power, many despots are aware of this grim fact, and so they use violence to cling to office, often growing more extreme as they lurch toward their downfall. The effect can hardly be called “stability,” even if the same person occupies the palace for decades.

For anyone who still clings to the illusion that dictatorships are likely to be prosperous, strategically wise, or internally stable, I propose a simple test. Imagine that someone wrote down the names of all the countries in the world on little slips of paper and then separated them into two hats: one for democracies, one for dictatorships. You would select one of the two hats, draw a slip of paper from it, look at the name, and then spend the rest of your life living in that country. Who knows, maybe you’d get lucky and end up in an authoritarian regime that seems stable and is producing steady growth. But I know which hat I would choose. And even if you fantasize about finding the unicorn that is a benevolent strongman, I suspect you do too.

America Already Has an AI Underclass

The Atlantic

https://www.theatlantic.com/technology/archive/2023/07/ai-chatbot-human-evaluator-feedback/674805

On weekdays, between homeschooling her two children, Michelle Curtis logs on to her computer to squeeze in a few hours of work. Her screen flashes with Google Search results, the writings of a Google chatbot, and the outputs of other algorithms, and she has a few minutes to respond to each—judging the usefulness of the blue links she’s been provided, checking the accuracy of an AI’s description of a praying mantis, or deciding which of two chatbot-written birthday poems is better. She never knows in advance what she will have to assess, and for the AI-related tasks, which have formed the bulk of her work since February, she says she has little guidance and not enough time to do a thorough job.

Curtis is an AI rater. She works for the data company Appen, which is subcontracted by Google to evaluate the outputs of the tech giant’s AI products and search algorithm. Countless people do similar work around the world for Google; the ChatGPT-maker, OpenAI; and other tech firms. Their human feedback plays a crucial role in developing chatbots, search engines, social-media feeds, and targeted-advertising systems—the most important parts of the digital economy.

Curtis told me that the job is grueling, underpaid, and poorly defined. Whereas Google has a 176-page guide for search evaluations, the instructions for AI tasks are relatively sparse, she said. For every task that involves rating AI outputs, she is given a few sentences or paragraphs of vague, even convoluted instructions and sometimes just a few minutes to absorb them before the time allotted to complete the task runs out. Unlike a page of Google results, chatbots promise authoritative answers—offering the final, rather than first, step of inquiry—which Curtis said makes her feel a heightened moral responsibility to assess AI responses as accurately as possible. She dreads these timed tasks for the very same reason: “It’s just not humanly possible to do in the amount of time that we’re given.” On Sundays, she works a full eight hours. “Those long days can really wear on you,” she said.

Armughan Ahmad, Appen’s CEO, told me through a spokesperson that the company “complies with minimum wages” and is investing in improved training and benefits for its workers; a Google spokesperson said Appen is solely responsible for raters’ working conditions and job training. For Google to mention these people at all is notable. Despite their importance to the generative-AI boom and tech economy more generally, these workers are almost never referenced in tech companies’ prophecies about the ascendance of intelligent machines. AI moguls describe their products as forces akin to electricity or nuclear fission, like facts of nature waiting to be discovered, and speak of “maximally curious” machines that learn and grow on their own, like children. The human side of sculpting algorithms tends to be relegated to opaque descriptions of “human annotations” and “quality tests,” evacuated of the time and energy powering those annotations.

[Read: Google’s new search tool could eat the internet alive]

The tech industry has a history of veiling the difficult, exploitative, and sometimes dangerous work needed to clean up its platforms and programs. But as AI rapidly infiltrates our daily lives, tensions between tech companies framing their software as self-propelling and the AI raters and other people actually pushing those products along have started to surface. In 2021, Appen raters began organizing with the Alphabet Workers Union-Communications Workers of America to push for greater recognition and compensation; Curtis joined its ranks last year. At the center of the fight is a big question: In the coming era of AI, can the people doing the tech industry’s grunt work ever be seen and treated not as tireless machines but simply as what they are—human?  

The technical name for the use of such ratings to improve AI models is reinforcement learning with human feedback, or RLHF. OpenAI, Google, Anthropic, and other companies all use the technique. After a chatbot has processed massive amounts of text, human feedback helps fine-tune it. ChatGPT is impressive because using it feels like chatting with a human, but that pastiche does not naturally arise through ingesting data from something like the entire internet, an amalgam of recipes and patents and blogs and novels. Although AI programs are set up to be effective at pattern detection, they “don’t have any sense of contextual understanding, no ability to parse whether AI-generated text looks more or less like what a human would have written,” Sarah Myers West, the managing director of the AI Now Institute, an independent research organization, told me. Only an actual person can make that call.

The program might write multiple recipes for chocolate cake, which a rater ranks and edits. Those evaluations and examples will inform the chatbot’s statistical model of language and next-word predictions, which should make the program better at writing recipes in the style of a human, for chocolate cake and beyond. A person might check a chatbot’s response for factual accuracy, rate how well it fits the prompt, or flag toxic outputs; subject experts can be particularly helpful, and they tend to be paid more.
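To give a sense of the mechanics, here is a minimal, hypothetical sketch of the pairwise step behind that kind of training: a reward model learns to score the response raters preferred higher than the one they rejected. This is an illustration in PyTorch with invented data, not Google’s or OpenAI’s actual code; real systems score prompt-and-response pairs with a large language model rather than a tiny network.

```python
# Minimal sketch of pairwise reward modeling, the step that turns human
# preference rankings into a training signal. Hypothetical model and data.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    def __init__(self, embed_dim: int = 128):
        super().__init__()
        # Map an embedding of (prompt + response) to a single scalar score.
        self.scorer = nn.Sequential(
            nn.Linear(embed_dim, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.scorer(x).squeeze(-1)

model = RewardModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Stand-ins for embeddings of two chatbot answers to the same prompt,
# where raters preferred the first ("chosen") over the second ("rejected").
chosen = torch.randn(32, 128)
rejected = torch.randn(32, 128)

# Pairwise (Bradley-Terry-style) loss: push the chosen answer's score
# above the rejected answer's score.
loss = -torch.nn.functional.logsigmoid(model(chosen) - model(rejected)).mean()
loss.backward()
optimizer.step()
```

A reward model trained this way is then used to steer the chatbot toward responses people rate highly, which is why the consistency and care of the underlying human judgments matter so much.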

Using human evaluations to improve algorithmic products is a fairly old practice at this point: Google and Facebook have been using them for almost a decade, if not more, to develop search engines, targeted ads, and other products, Sasha Luccioni, an AI researcher at the machine-learning company Hugging Face, told me. The extent to which human ratings have shaped today’s algorithms depends on who you ask, however. Major tech companies that design and profit from search engines, chatbots, and other algorithmic products tend to characterize the raters’ work as only one among many important aspects of building cutting-edge AI products. Courtenay Mencini, a Google spokesperson, told me that “ratings do not directly impact or solely train our algorithms. Rather, they’re one data point … taken in aggregate with extensive internal development and testing.” OpenAI has emphasized that training on huge amounts of text, rather than RLHF, accounts for most of GPT-4’s capabilities.

[From the September 2023 issue: Does Sam Altman know what he’s creating?]

AI experts I spoke with outside these companies took a different stance. Targeted human feedback has been “the single most impactful change that made [current] AI models as good as they are,” allowing the leap from GPT-2’s half-baked emails to GPT-4’s convincing essays, Luccioni said. She and others argue that tech companies intentionally downplay the importance of human feedback. Such obfuscation “sockets away some of the most unseemly elements of these technologies,” such as hateful content and misinformation that humans have to identify, Myers West told me—not to mention the conditions the people work under. Even setting aside those elements, describing the extent of human intervention would risk dispelling the magical and marketable illusion of intelligent machines—a “Wizard of Oz effect,” Luccioni said.

Despite tech companies’ stated positions, digging into their own press statements and research papers about AI reveals that they frequently do acknowledge the value of this human labor, if in broad terms. A Google blog post promoting a new chatbot last year, for instance, said that “to create safer dialogue agents, we need to be able to learn from human feedback.” Google has similarly described human evaluations as necessary to its search engine. The company touts RLHF as “particularly useful” for applying its AI services to industries such as health care and finance. Two lead researchers at OpenAI similarly described human evaluations as vital to training ChatGPT in an interview with MIT Technology Review. The company stated elsewhere that GPT-4 exhibited “large improvements” in accuracy after RLHF training and that human feedback was crucial to fine-tuning it. Meta’s most recent language model, released this week, relies on “over 1 million new human annotations,” according to the company.

To some extent, the significance of humans’ AI ratings is evident in the money pouring into them. One company that hires people to do RLHF and data annotation was valued at more than $7 billion in 2021, and its CEO recently predicted that AI companies will soon spend billions of dollars on RLHF, similar to their investment in computing power. The global market for labeling data used to train these models (such as tagging an image of a cat with the label “cat”), another part of the “ghost work” powering AI, could reach nearly $14 billion by 2030, according to an estimate from April 2022, months before the ChatGPT gold rush began.

All of that money, however, rarely seems to be reaching the actual people doing the ghostly labor. The contours of the work are starting to materialize, and the few public investigations into it are alarming: Workers in Africa are paid as little as $1.50 an hour to check outputs for disturbing content that has reportedly left some of them with PTSD. Some contractors in the U.S. can earn only a couple of dollars above the minimum wage for repetitive, exhausting, and rudderless work. The pattern is similar to that of social-media content moderators, who can be paid a tenth as much as software engineers to scan traumatic content for hours every day. “The poor working conditions directly impact data quality,” Krystal Kauffman, a fellow at the Distributed AI Research Institute and an organizer of raters and data labelers on Amazon Mechanical Turk, a crowdsourcing platform, told me.

Stress, low pay, minimal instructions, inconsistent tasks, and tight deadlines—the sheer volume of data needed to train AI models almost necessitates a rush job—are a recipe for human error, according to Appen raters affiliated with the Alphabet Workers Union-Communications Workers of America and multiple independent experts. Documents obtained by Bloomberg, for instance, show that AI raters at Google have as little as three minutes to complete some tasks, and that they evaluate high-stakes responses, such as how to safely dose medication. Even OpenAI has written, in the technical report accompanying GPT-4, that “undesired behaviors [in AI systems] can arise when instructions to labelers were underspecified” during RLHF.

Tech companies have at times responded to these issues by stating that ratings are not the only way they check accuracy, that humans doing those ratings are paid adequately based on their location and afforded proper training, and that viewing traumatic materials is not a typical experience. Mencini, the Google spokesperson, told me that Google’s wages and benefits standards for contractors do not apply to raters, because they “work part-time from home, can be assigned to multiple companies’ accounts at a time, and do not have access to Google’s systems or campuses.” In response to allegations of raters seeing offensive materials, she said that workers “select to opt into reviewing sensitive content, and can opt out freely at any time.” The companies also tend to shift blame to their vendors—Mencini, for instance, told me that “Google is simply not the employer of any Appen workers.”  

[Read: The coming humanist renaissance]

Appen’s raters told me that their working conditions do not align with various tech companies’ assurances—and that they hold Appen and Google responsible, because both profit from their work. Over the past year, Michelle Curtis and other raters have demanded more time to complete AI evaluations, benefits, better compensation, and the right to organize. The job’s flexibility does have advantages, they told me. Curtis has been able to navigate her children’s medical issues; another Appen rater I spoke with, Ed Stackhouse, said the adjustable hours afford him time to deal with a heart condition. But flexibility does not justify low pay and a lack of benefits, Shannon Wait, an organizer with the AWU-CWA, told me; there’s nothing flexible about precarity.

The group made headway at the start of the year, when Curtis and her fellow raters received their first-ever raise. She now makes $14.50 an hour, up from $12.75—still below the minimum of $15 an hour that Google has promised to its vendors, temporary staff, and contractors. The union continued raising concerns about working conditions; Stackhouse wrote a letter to Congress about these issues in May. Then, just over two weeks later, Curtis, Stackhouse, and several other raters received an email from Appen stating, “Your employment is being terminated due to business conditions.”

The AWU-CWA suspected that Appen and Google were punishing the raters for speaking out.  “The raters that were let go all had one thing in common, which was that they were vocal about working conditions or involved in organizing,” Stackhouse told me. Although Appen did suffer a drop in revenue during the broader tech downturn last year, the company also had, and has, open job postings. Four weeks before the termination, Appen had sent an email offering cash incentives to work more hours and meet “a significant spike in jobs available since the beginning of year,” when the generative-AI boom was in full swing; just six days before the layoffs, Appen sent another email lauding “record-high production levels” and re-upping the bonus-pay offer. On June 14, the union filed a complaint with the National Labor Relations Board alleging that Appen and Google had retaliated against raters “by terminating six employees who were engaged in protected [labor] activity.”

Less than two weeks after the complaint was filed, Appen reversed its decision to fire Curtis, Stackhouse, and the others; their positions were reinstated with back pay. Ahmad, Appen’s CEO, told me in an email that his company bases “employment decisions on business requirements” and is “happy that our business needs changed and we were able to hire back the laid off contributors.” He added, “Our policy is not to discriminate against employees due to any protected labor activities,” and that “we’ve been actively investing in workplace enhancements like smarter training, and improved benefits.”

Mencini, the Google spokesperson, told me that “only Appen, as the employer, determines their employees’ working conditions,” and that “Appen provides job training for their employees.” As with compensation and training, Mencini deflected responsibility for the treatment of organizing workers as well: “We, of course, respect the labor rights of Appen employees to join a union, but it’s a matter between them and their employer, Appen.”

That AI purveyors would obscure the human labor undergirding their products is predictable. Much of the data that trains AI models is labeled by people making poverty wages, many of them located in the global South. Amazon deliveries are cheap in part because working conditions in the company’s warehouses subsidize them. Social media is usable and desirable because of armies of content moderators, also largely in the global South. “Cloud” computing, a cornerstone of Amazon’s and Microsoft’s businesses, takes place in giant data centers.

AI raters might be understood as an extension of that cloud, treated not as laborers with human needs so much as productive units, carbon transistors on a series of fleshly microchips—objects, not people. Yet even microchips take up space; they require not just electricity but also ventilation to keep from overheating. The Appen raters’ termination and reinstatement is part of “a more generalized pattern within the tech industry of engaging in very swift retaliation against workers” when they organize for better pay or against ethical concerns about the products they work on, Myers West, of the AI Now Institute, told me.

Ironically, one crucial bit of human labor that AI programs have proved unable to automate is their own training. Human subjectivity and prejudice have long made their way into algorithms, and those flaws mean machines may not be able to perfect themselves. Various attempts to train AI models with other AI models have bred further bias and worsened performance, though a few have shown limited success. “I can’t imagine that we will be able to replicate [human intervention] with current AI approaches,” Hugging Face’s Luccioni told me in an email; Ahmad said that “using AI to train AI can have dire consequences as it pertains to the viability and credibility of this technology.” The tech industry has so far failed to purge the ghosts haunting its many other machines and services—the people organizing on warehouse floors, walking out of corporate headquarters, unionizing overseas, and leaking classified documents. Appen’s raters are proving that, even amid the generative-AI boom, humanity may not be so easily exorcized.