AI and the Transformation of Capitalism (Talk)

Dr. Anders Sandberg: Counterfeit Fitness Signals: Reward Hacking in Brains, AI, Markets and Societies

In the early 1950s, scientists understood little about human behavior, reasoning, and motivation. In a famous 1954 experiment, however, James Olds and Peter Milner uncovered some of the earliest evidence for a pleasure center in the brain. Olds and Milner implanted electrodes deep within the brains of rats and allowed the rats to press a lever that delivered tiny electric jolts through the electrodes. The scientists quickly discovered that when the electrodes were placed in specific areas of the brain, the rats would press the lever up to 2,000 times per hour, preferring it over most other stimulation. This act of hacking the brain’s reward system via electrical stimulation was later dubbed “wireheading.”

The phenomenon of hacking an agent’s reward system, however, is not unique to neuroscience. As early as the 1700s, philosophers such as Jean-Jacques Rousseau worried about societies “pleasuring themselves to death” by overindulging in luxuries. During the Victorian era, people feared that the rise of artificial environments would make mankind parasitic and entirely dependent on technology. And in the modern world, human desires have arguably shaped the rise of civilizations. From Facebook likes to fast food, we’ve built our environments to offer instant gratification at the expense of our long-term mental and physical health.

Reward hacking similarly appears in the development of artificial intelligence. In reinforcement learning, agents have discovered loopholes in their reward functions, optimizing for behaviors other than the researchers’ intended goals. In the game CoastRunners 7, for example, a reinforcement learning agent learned to turn in a large circle, repeatedly knocking over targets and accumulating points indefinitely instead of finishing the course. Numerous other studies have observed similar behaviors in which AI systems exploit loopholes in their objective functions to maximize reward.
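To make the failure mode concrete, here is a minimal, self-contained sketch. It is not the CoastRunners environment; every state, reward value, and hyperparameter below is an invented assumption. A tabular Q-learning agent on a short one-dimensional track is paid a proxy reward: a small bonus each time it lands on a respawning tile and a modest bonus for finishing. The learned policy typically shuttles back and forth over the bonus tile instead of completing the course.

```python
# Toy illustration of reward hacking in reinforcement learning.
# A Q-learning agent on a short 1-D "race track" (states 0..6) is paid a
# proxy reward: +1 every time it lands on a respawning bonus tile, plus a
# modest +3 for crossing the finish line.  All numbers are illustrative.

import random

N_STATES   = 7          # positions 0..6; 0 = start, 6 = finish
BONUS_TILE = 2          # pays +1 on every visit (it "respawns")
FINISH_R   = 3.0        # finishing pays less than endless bonus farming
ACTIONS    = (-1, +1)   # move left / move right
EPISODE_LEN, EPISODES = 50, 3000
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1

Q = [[0.0, 0.0] for _ in range(N_STATES)]

def step(state, action):
    """Return (next_state, proxy_reward, done)."""
    nxt = min(max(state + ACTIONS[action], 0), N_STATES - 1)
    if nxt == N_STATES - 1:
        return nxt, FINISH_R, True            # intended goal: finish the course
    return nxt, (1.0 if nxt == BONUS_TILE else 0.0), False

for _ in range(EPISODES):
    s = 0
    for _ in range(EPISODE_LEN):
        if random.random() < EPSILON:
            a = random.randrange(2)
        else:
            a = max((0, 1), key=lambda i: Q[s][i])
        nxt, r, done = step(s, a)
        target = r + (0.0 if done else GAMMA * max(Q[nxt]))
        Q[s][a] += ALPHA * (target - Q[s][a])
        s = nxt
        if done:
            break

# Greedy rollout: the learned policy typically shuttles back and forth over
# the bonus tile instead of finishing -- the proxy reward has been hacked.
s, trajectory = 0, []
for _ in range(20):
    a = max((0, 1), key=lambda i: Q[s][i])
    s, _, done = step(s, a)
    trajectory.append(s)
    if done:
        break
print("greedy trajectory:", trajectory)
print("finished course:", trajectory and trajectory[-1] == N_STATES - 1)
```

The point of the toy is only that a perfectly ordinary learning rule, given a slightly misspecified reward, converges on behavior its designer never intended.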

This leads to a question: does reward hacking exist in capitalism? Many companies and states alter their financial accounts to look more profitable, and markets themselves create incentives to make products more addictive or more profitable in the short term. Furthermore, whether you believe that capitalism maximizes profit or that profit is a proxy for utility, both metrics can be short-circuited, neglecting other considerations.

A common failure mode of complex adaptive systems, whether biological, social, or economic, is their reliance on signals that indicate success: those signals can be counterfeited. This poses an interesting set of questions:

  • How might approaches to mitigating reward hacking in economic, political, or biological systems apply to the development of artificial intelligence? Could they, in turn, give us ideas about how to improve markets?

  • How do we stabilize complex, adaptive, self-ramming systems?

  • As we build more and more flexible systems, can we prevent them from going haywire because of their reward functions?

Discussion

Dr. Sandberg’s presentation sparked an interesting discussion about the consequences of narrow optimization. Metrics that correlate with wellbeing, such as GDP per capita or shareholder value, break down as proxies in increasingly complex systems. There are always gaps between what we optimize for and what society actually cares about. Consequently, the biggest problems affecting society are alignment problems. From climate change to wealth inequality, optimizing for metrics such as profit or GDP fails to account for humanity’s other considerations.

Given the prevalence of reward hacking in AI, markets, and societies, the group proposed three alternatives to narrow optimization. First, we could define better quantitative metrics. If we could infer causal models that link easily observed metrics to what we actually care about, then we could optimize for the right set of metrics. These metrics would also hopefully be more resilient to reward hacking.

Second, we could stop relying on quantitative metrics and optimize for broader qualitative goals. For example, inverse reinforcement learning and cooperative reinforcement learning start with human preferences as a primitive. As the AI safety literature shifts away from relying on explicit objective functions, could markets do the same? Could we find better ways to aggregate human preferences and align markets with those preferences?
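One concrete way to treat preferences as the primitive is to learn a score for each outcome directly from pairwise human judgments. The sketch below fits a simple Bradley-Terry-style model by gradient ascent; the outcomes, judgments, and hyperparameters are invented for illustration, and this is a simplified stand-in for the preference-learning step in inverse or cooperative RL rather than any particular published method.

```python
# Minimal Bradley-Terry-style preference learning: infer a scalar score for
# each outcome from pairwise human judgments instead of hand-writing an
# objective function.  Outcomes, judgments, and hyperparameters are invented.

import math

outcomes = ["shorter commutes", "cheaper housing", "cleaner air", "more parks"]
# Each pair (w, l) records a respondent preferring outcome w over outcome l.
judgments = [(1, 0), (1, 2), (2, 0), (2, 3), (1, 3), (0, 3), (2, 1), (3, 0), (0, 2)]

scores = [0.0] * len(outcomes)
LR, STEPS, L2 = 0.1, 2000, 0.01

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

for _ in range(STEPS):
    grad = [-L2 * s for s in scores]          # small L2 penalty keeps scores finite
    for w, l in judgments:
        p = sigmoid(scores[w] - scores[l])    # model's P(w preferred over l)
        grad[w] += 1.0 - p                    # push scores to match the judgments
        grad[l] -= 1.0 - p
    scores = [s + LR * g for s, g in zip(scores, grad)]

for name, s in sorted(zip(outcomes, scores), key=lambda t: -t[1]):
    print(f"{s:+.2f}  {name}")
```

In a full pipeline the scored objects would be trajectories, policies, or market outcomes rather than short phrases, and the learned scores would feed back into whatever system is being aligned; the sketch shows only the aggregation step.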

Third, we could get rid of fixed objectives and reimagine what healthy markets would look like. In other complex systems, such as ecosystems, there is no single optimization point; there is only balance. It is well established that the health of an ecosystem relies on its complexity (i.e., its biodiversity), and that simpler ecosystems are more fragile and less healthy. Applying this idea to markets, we could measure economic balance and informational complexity as indicators of the economy’s health.
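One crude way to make the ecological analogy operational is to borrow the Shannon diversity index that ecologists apply to species abundances and apply it instead to, say, the sectoral composition of output. The sketch below does exactly that; the sectors and shares are invented, and a serious measure of economic "balance" would obviously need far more than one number.

```python
# Shannon diversity of an economy's sectoral output shares, by analogy with
# the index ecologists use for species abundance.  Shares below are invented.

import math

sector_shares = {
    "agriculture": 0.05,
    "manufacturing": 0.20,
    "services": 0.55,
    "energy": 0.10,
    "construction": 0.10,
}

def shannon_diversity(shares):
    """H = -sum(p * ln p); higher means output is spread more evenly."""
    return -sum(p * math.log(p) for p in shares if p > 0)

H = shannon_diversity(sector_shares.values())
H_max = math.log(len(sector_shares))          # maximum possible for this many sectors
print(f"diversity H = {H:.3f} (max {H_max:.3f}), evenness = {H / H_max:.2f}")
```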

Each of the three approaches, however, presents its own set of complications and questions. Better quantitative metrics may still be vulnerable to reward hacking. Optimizing for more abstract goals such as human preferences assumes that satisfying people’s preferences leads to their flourishing. Lastly, operationalizing “health” or “balance” requires defining what a healthy or balanced economy looks like and might lead us back to relying on narrow metrics. 

Peter Eckersley: AI & The Transformation of Capitalism

Fears about artificial intelligence are pervasive. From the automation of low-wage labor to how authoritarian states utilize AI, concerns about AI cut across diverse segments of society. In particular, researchers working on the development of AI worry about the consequences of building large-scale optimization systems that are misaligned with human values. However, one such powerful system already exists: capitalism. Capitalism is a form of artificial intelligence. 

Capitalism and artificial intelligence are both powerful optimization systems, but their relationship is more than metaphorical. New research shows that they share a deeper mathematical relationship: markets are a type of neural network that performs gradient descent by backpropagation. More specifically, supply chains and competitive markets learn through backpropagation.
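The formal correspondence is the talk’s own claim and is not reproduced here, but a much weaker observation conveys the flavor: a firm that nudges its output in whichever direction raises profit is performing gradient ascent on its profit function, using only local information. The toy sketch below, with an invented demand curve and cost function, shows that local rule converging to the profit-maximizing quantity.

```python
# Toy illustration (not the talk's formal result): a single firm that adjusts
# output in proportion to marginal profit is performing gradient ascent.
# The demand curve and cost function below are invented for illustration.

def price(q):             # inverse demand: price falls as quantity rises
    return 100.0 - 2.0 * q

def cost(q):              # convex production cost
    return 20.0 * q + 0.5 * q ** 2

def profit(q):
    return price(q) * q - cost(q)

def marginal_profit(q, eps=1e-4):
    # Finite-difference gradient; the firm only needs local information.
    return (profit(q + eps) - profit(q - eps)) / (2 * eps)

q, lr = 1.0, 0.05
for _ in range(200):
    q += lr * marginal_profit(q)      # gradient ascent on profit

# Analytic optimum: d/dq [(100 - 2q)q - 20q - 0.5q^2] = 80 - 5q = 0  =>  q* = 16
print(f"learned q = {q:.2f}, profit = {profit(q):.2f} (optimum q* = 16)")
```

The full analogy in the talk concerns how such local adjustments compose along supply chains; this sketch covers only the single-firm step.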

Capitalism and AI have another property in common: they have the wrong objective function. From economic theory, it is clear that markets fail when it comes to providing public goods, reducing inequality, planning for the long term, managing tail risks, and accounting for externalities. Policies can sometimes correct these failures through taxation, subsidies, or other legal restrictions, but such policies frequently take decades rather than months to implement. If capitalism had better objectives, many of these problems would not arise in the first place.

The AI Objectives Institute therefore proposes the creation of a new institution focused on aligning capitalism as the first case of powerful AI. For artificial intelligence, markets and humanity, our objective is to create better objectives. 

Brittney Gallagher: Conversational AI for Adjusting Capitalism

While artificial intelligence and capitalism share similar mathematical properties, AI could also help us solve alignment problems in markets. Natural language processing—the field of AI research focused on helping computers understand human language—offers a wealth of opportunities to better aggregate human values and experiences. 

One example of this is conversational AI. Over the past few decades, conversational AIs have become better and better at imitating and understanding human language. Many users of the conversational AI chatbot Replika, for example, develop deep, even romantic, relationships with the bot. The potential applications of conversational AI, however, extend beyond creating romantic or platonic relationships between AIs and humans: conversational AI could transform the nature of economic research and policy. With advanced language models such as GPT-3 or WuDao 2.0, we could potentially aggregate qualitative feedback from millions of people about their preferences and quality of life. The AI Objectives Institute is consequently motivated by the following questions:

  • What are the most impactful and scalable ways we could use conversational AI?

  • Can we get language models to understand the lived human experience? 

  • Can we produce actionable, democratic feedback for economic policy with them?

Given these questions, the institute has three preliminary ideas for how to use language models. First, we could build a volunteer chatbot for the institute. Second, we could co-parent an evolutionary AI, teaching it human values and building relationships between humans and AIs. Lastly, we could scale qualitative social science research to better inform policy decisions and create a “statistic of stories.”
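As a rough sketch of what the third idea could mean mechanically, the snippet below clusters a handful of invented free-text responses using TF-IDF features and k-means, then prints the most characteristic terms of each cluster. A real “statistic of stories” pipeline would use far larger corpora and language-model embeddings rather than bag-of-words features; everything here, from the responses to the cluster count, is an assumption for illustration.

```python
# Rough sketch of aggregating free-text feedback into themes: TF-IDF features
# plus k-means clustering.  Responses and cluster count are invented; a real
# pipeline would use large corpora and richer language-model embeddings.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

responses = [
    "Rent takes most of my paycheck and keeps rising.",
    "I can't find housing near my job that I can afford.",
    "The bus is unreliable, so my commute eats two hours a day.",
    "Traffic and long commutes leave me exhausted.",
    "My landlord raised the rent again this year.",
    "Public transit stops running before my night shift ends.",
]

vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(responses)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
terms = vectorizer.get_feature_names_out()

for c in range(kmeans.n_clusters):
    # Terms closest to each cluster center summarize its theme.
    top = kmeans.cluster_centers_[c].argsort()[::-1][:4]
    members = [r for r, label in zip(responses, kmeans.labels_) if label == c]
    print(f"cluster {c}: {[terms[i] for i in top]}  ({len(members)} responses)")
```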
