Modeling incentives at scale using LLMs
Originally posted on LessWrong.
Bruno Marnette, Philipp Zahn
The goal of this post is to collect feedback on a new project idea. This work would be a collaboration between the AI Objectives Institute (AOI), MetaGov and other partner institutions.
Project PIs: Bruno Marnette (AOI) and Philipp Zahn (MetaGov, 20squares).
Special thanks to Colleen McKenzie, Matija Franklin, Timothy Telleen-Lawton, Gaia Dempsay, Ping Yee, Justin Stimatze, Cleo Nardo, Tushant Jha, Deger Turan and others for feedback and review.
Motivations
Humans routinely model what the incentives of other humans may be. Historians look at the incentives of powerful individuals to explain why they made specific decisions. Journalists look at the incentives of large companies to uncover collusion and conflicts of interest. Economists use incentive models to understand markets. Lawmakers look into existing incentives before introducing new ones.
There is, however, a natural limit to how much modeling and analysis a single person or small team can do, and this is where we see an opportunity to leverage LLMs. Instead of relying mostly on human labor to build incentive models, we can now delegate more of the work to machines to reduce cost and increase scale. There are two main advantages:
LLM-powered incentive models could process larger amounts of information, thus making them more complete, more balanced, and more accurate.
The automation and flexibility of an LLM pipeline allows for rapid iteration and refinement. We can adapt our models by simply adjusting prompts.
Working definitions
We define "incentives" as "factors that may encourage an entity to take a particular action or engage in a specific behavior". The word "factors" can point to different types of motivations, forces, pressures, or rewards and punishments. Depending on the application domain, we may be interested in capturing financial, legal, social, cultural, ideological, or psychological factors.
Research questions
There is already a large body of evidence confirming that LLMs can perform in-depth analysis of documents, as long as they are prompted in the right way and provided with the right context. For example, an MIT paper has illustrated how LLMs can be used to produce precise, formal models using a probabilistic programming language. An Anthropic paper has shown how LLMs with large context windows can extract insights from large datasets, even when the datasets consist of human opinions and subjective, hard-to-interpret statements. Likewise, research conducted at AOI has shown that frontier LLMs can identify and synthesize valuable clusters of information from a range of complex sources and input formats (including e.g. video interviews in various languages). And even if today's LLMs are fairly noisy and produce results of variable quality, it is fair to assume that the next generation of LLMs will be more capable and more reliable than what we have today.
That's why we believe the interesting questions (for this project) are less about the raw capabilities of foundational LLMs, and more about how to make the best use of LLMs when extracting and aggregating information related to incentives. More specifically, we're planning to address the following:
Data and copyrights – What are the best datasets to use as sources of potential incentives? Can LLMs be used to evaluate the reliability of different sources? What biases will data availability introduce? How much detail can we extract from a particular source while still remaining within the bounds of Fair Use?
Modeling – How should we model the different incentives of different agents? Should we try to model concrete goals or more abstract motivations? How precise and granular do we want to be? Do we want to assign probability estimates to different scenarios? What is the right level of sophistication that we can reasonably aim for? Should we produce knowledge graphs or should we produce code? What would be processes that can identify which modeling approach is most successful in providing value to a human observer?
Prompting – Which prompts and which prompt engineering techniques should we use? How can we address the known limits and biases of current LLMs? Should we instruct LLMs to adopt specific principles or reason in specific ways (e.g. using Chain of Thought)? Which tools and processes should we leverage to come up with good prompts and test their performance?
Product – What are the benefits of different types of models when it comes to applications? What sort of interface would be most convenient for researchers and/or the public at large? What features will different use cases require?
Impact – How might the release of the aforementioned models and tools impact society at large? How would different actors react to having their incentives made clearer to other parties? Should we be worried about feedback loops and self-fulfilling prophecies? How might the models be misused for propaganda? Will reactions to these models improve future modeling? Could wrong conclusions reinforce similar errors in future modeling?
We're planning to assemble a team of collaborators and advisors coming from various disciplines to make progress on all five domain areas and then apply our learnings to multiple application areas.
Scope and methodology
One of our main ideas is to start by decomposing incentive problems into different components (e.g. motives, plans, or goals) and then decide which of these components are the most important for a given domain.
To be more precise, we plan to break down the modeling process into two phases:
First, we'll try to build explanatory qualitative models by extracting a finite list of entities and a finite list of goals/scenarios of interest for a specific domain. This will provide us with a common vocabulary and a fixed ontology to map the information that will be extracted from many different sources. These models will also include directional information of the form "source S claims that entity E is likely to endorse (or disavow) the pursuit of goal G", but they won't include numerical estimates. Our models may also include graph edges and other structural elements, but the only information attached to matrix cells and graph edges will be text (quotes extracted from the sources), not numbers.
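For concreteness, here is a minimal sketch of how such a phase-1 model could be stored. The field names are placeholders of our own, not a committed schema:

```python
# Illustrative only: one possible shape for a phase-1 qualitative model.
# Field names are placeholders, not a committed schema.
from dataclasses import dataclass, field

@dataclass
class Claim:
    source_id: str   # document the quote was extracted from
    quote: str       # verbatim supporting text (no numbers at this stage)
    stance: str      # "endorses" or "disavows"

@dataclass
class QualitativeModel:
    entities: list[str]   # e.g. Wikipedia identifiers
    goals: list[str]      # fixed ontology of goals/scenarios
    # matrix cell (entity, goal) -> list of textual claims from different sources
    cells: dict[tuple[str, str], list[Claim]] = field(default_factory=dict)

    def add_claim(self, entity: str, goal: str, claim: Claim) -> None:
        self.cells.setdefault((entity, goal), []).append(claim)
```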
As a second and separate step, when the data contain sufficient signal, we'll also try to use LLMs to turn our qualitative models into rich predictive numerical models that quantify levels of influence and the strength of causal relations between entities and goals.
We prefer not to make too many assumptions about what types of formalisms will work best in different domains. Instead, we plan to test a relatively large number of possible formalisms. If the modeling work were done by hand, this approach would sound prohibitively inefficient. But since a lot of the work will be driven by LLMs, we believe it will be fairly easy for us to change and compare formalisms by making small changes to the prompts.
When working on qualitative models (Phase 1) we will optimize for balance, completeness and credible neutrality by choosing our sources based on simple and transparent criteria. When working on producing numerical models (in Phase 2), it may become more difficult for us to remain as neutral as we would like, because producing a number to represent a relationship will sometimes require resolving conflicts between contradictory sources, and it will sometimes require higher-quality data than is available for a given domain. We will, however, do our best to give users sufficient control to compare and contrast information from different sources.
Explanatory models
Here is an overview of how we envision producing explanatory qualitative models.
For each document added to our pipeline, we'll start by asking an LLM to extract the list of entities, individuals, organizations and institutions of interest in the document, and then we will use Wikipedia's API to match these entities to unique identifiers. We've already implemented this part and confirmed that it works well in many cases. In some instances the Wikipedia API may return several options, but we can then ask an LLM to choose the best page based on its description, and so far this seems to resolve most ambiguities.
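A minimal sketch of this entity-linking step is shown below. The `llm` argument stands in for any chat-completion call (it is not a real library), and the endpoint shown is the standard MediaWiki search API:

```python
# Sketch of the entity-linking step: an LLM extracts entity names, the MediaWiki
# search API proposes candidate pages, and an LLM picks the best match.
import requests

WIKI_API = "https://en.wikipedia.org/w/api.php"

def wikipedia_candidates(name: str, limit: int = 5) -> list[dict]:
    """Return candidate Wikipedia pages (title + snippet) for an entity name."""
    params = {
        "action": "query", "list": "search", "srsearch": name,
        "srlimit": limit, "format": "json",
    }
    results = requests.get(WIKI_API, params=params, timeout=10).json()
    return results["query"]["search"]  # each item has "title" and "snippet"

def link_entities(document: str, llm) -> dict[str, str]:
    """Map entity names found in the document to Wikipedia page titles."""
    names = llm("List the individuals, organizations and institutions "
                "mentioned in the following text, one per line:\n" + document)
    linked = {}
    for name in (n.strip() for n in names.splitlines() if n.strip()):
        candidates = wikipedia_candidates(name)
        if not candidates:
            continue
        # Ask the LLM to disambiguate when several pages could match.
        choice = llm(f"Which of these Wikipedia pages best matches '{name}'? "
                     "Answer with the exact title.\n"
                     + "\n".join(f"- {c['title']}: {c['snippet']}" for c in candidates))
        linked[name] = choice.strip()
    return linked
```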
After modeling the list of entities mentioned in each document, we will ask LLMs a long series of questions, all more or less directly related to incentives, for instance (a prompt-template sketch follows this list):
What are the stated goals of {entity}?
What does {entity} seem to be optimizing for?
What seem to be {entity}'s main motivations?
Which actions may {entity} be able to take?
Which scenarios would {entity} consider good and favorable to them?
Which scenarios would {entity} consider bad and unfavorable to them?
What may be {entity}'s main fears and concerns?
What resources does {entity} have at their disposal?
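In practice, these questions would be turned into a simple prompt-template loop, roughly as follows (the `llm` helper and the prompt wording are ours, shown for illustration):

```python
# Sketch: run the same incentive-related questions for every linked entity.
# QUESTIONS mirrors the list above; `llm` is a stand-in for any completion call.
QUESTIONS = [
    "What are the stated goals of {entity}?",
    "What does {entity} seem to be optimizing for?",
    "What seem to be {entity}'s main motivations?",
    "Which actions may {entity} be able to take?",
    "Which scenarios would {entity} consider good and favorable to them?",
    "Which scenarios would {entity} consider bad and unfavorable to them?",
    "What may be {entity}'s main fears and concerns?",
    "What resources does {entity} have at their disposal?",
]

def probe_entity(document: str, entity: str, llm) -> dict[str, str]:
    """Return one answer per question, grounded in the given document."""
    answers = {}
    for template in QUESTIONS:
        question = template.format(entity=entity)
        prompt = ("Using only the document below, answer the question.\n"
                  f"Question: {question}\n---\n{document}")
        answers[question] = llm(prompt)
    return answers
```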
We will then use clustering techniques to merge similar answers, focus on the questions that produced consistent sets of answers, and use these sets of answers to start building ontologies. The experiments that we have run so far suggest that different questions work better in different situations.
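One possible way to merge similar answers is shown below, assuming sentence-transformers embeddings and scikit-learn agglomerative clustering; the model name and distance threshold are arbitrary choices for illustration, not a settled design:

```python
# Sketch: cluster free-text answers so near-duplicates are merged into one node
# of the emerging ontology. Model name and threshold are arbitrary placeholders.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import AgglomerativeClustering

def cluster_answers(answers: list[str], threshold: float = 0.35) -> dict[int, list[str]]:
    embeddings = SentenceTransformer("all-MiniLM-L6-v2").encode(answers)
    clustering = AgglomerativeClustering(
        n_clusters=None,              # let the distance threshold decide
        distance_threshold=threshold,
        metric="cosine",              # named `affinity` in older scikit-learn versions
        linkage="average",
    ).fit(embeddings)
    clusters: dict[int, list[str]] = {}
    for label, answer in zip(clustering.labels_, answers):
        clusters.setdefault(int(label), []).append(answer)
    return clusters
```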
For a geopolitical domain, we may want to extract the possible goals of different entities (e.g. conquering a specific territory or negotiating a ceasefire).
For more abstract domains, we may decide to extract the high-level objectives that entities optimize for (e.g. one entity may be optimizing for "safety" and another for "freedom").
For some domains, it may be important to consider the mood and psychology of the entities involved, while others would be best analyzed using material (e.g. financial) incentives.
We will then select a fixed ontology (meaning a fixed list of goals, or favored scenarios, or something else…), and we will run our LLM pipeline again to extract a clean dataset mapping uniquely identified entities to uniquely identified goals from the pre-computed list. (We have observed in previous experiments that LLMs perform much better when provided with a clear target ontology).
For each cell of the entity/goal matrix, we will collect a set of reasons and explanations provided by different sources that propose a link between an entity and a goal. We will also use LLMs to provide summaries and aggregated scores for each cell.
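A sketch of this second extraction pass, assuming the entity list and goal ontology produced in the earlier steps; the prompt wording and JSON shape are illustrative:

```python
# Sketch: with a fixed ontology, ask the LLM to fill entity/goal cells with
# supporting quotes only (no numbers at this stage). JSON shape is our own.
import json

def extract_claims(document: str, source_id: str,
                   entities: list[str], goals: list[str], llm) -> list[dict]:
    prompt = (
        "For each (entity, goal) pair below, say whether the document suggests the "
        "entity endorses or disavows the goal, and quote the supporting passage. "
        "Skip pairs the document says nothing about. Respond as a JSON list of "
        '{"entity": ..., "goal": ..., "stance": ..., "quote": ...} objects.\n'
        f"Entities: {entities}\nGoals: {goals}\n---\n{document}"
    )
    claims = json.loads(llm(prompt))
    for claim in claims:
        claim["source_id"] = source_id  # keep provenance for each matrix cell
    return claims
```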
We then plan to produce a graph or square matrix over entities representing possible links and influences. For instance, if an individual happens to be a major shareholder of a specific company, we may want to track that this individual has significant influence over that company, and this type of information is often publicly available. There are many different notions of "influence" to consider here, but we will try to experiment with different prompts/definitions and select the one that seems to extract the most interesting and meaningful insights.
Likewise, we plan to produce a graph or square matrix over goals representing causal links between the goals. Depending on the exact type of goals considered, we may again be interested in different definitions of "causal link". For instance, if the goals represent different potential scenarios or events, we may be interested in conditional forecasting (e.g. if P(B|A) is considered significantly higher than P(B), then we may draw an edge from A to B).
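The edge rule described above could be as simple as the following sketch, where the "significantly higher" margin is a free parameter we would have to tune:

```python
# Sketch: draw a directed edge A -> B between goal-scenarios when the estimated
# conditional probability P(B|A) clearly exceeds the baseline P(B).
def causal_edges(goals: list[str],
                 p: dict[str, float],                    # P(goal)
                 p_given: dict[tuple[str, str], float],  # P(B|A) keyed by (A, B)
                 margin: float = 0.1) -> list[tuple[str, str]]:
    edges = []
    for a in goals:
        for b in goals:
            if a == b:
                continue
            cond = p_given.get((a, b))
            if cond is not None and cond > p[b] + margin:
                edges.append((a, b))
    return edges
```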
Predictive models
As discussed in the methodology section, we plan to experiment with different types of numerical models. For instance, when our ontology of goals can be mapped to potential scenarios or events, we may look into conditional forecasting techniques or dynamic Bayesian networks to model how different events may influence each other's probability. Other classical modeling techniques to consider may include the estimation of agents' revealed preferences. To build such models, we would typically start by asking LLMs to find (conditional) probability estimates in the source data itself or to make guesstimates when not found. To improve the quality of these guesstimates, we will leverage best practices in prompt engineering (a prompt sketch follows this list), including:
encouraging structured reasoning (Chain of Thought or Tree of Thoughts);
instructing the LLMs to follow well-established methodologies (e.g. simulating the Delphi method or the Analytic Hierarchy Process); and
providing high-quality worked examples as part of the prompts (we will seek help from professional forecasters to produce them).
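Here is what an elicitation prompt combining these techniques could look like. The worked example in FEW_SHOT would in practice be written with professional forecasters; the one shown here is a placeholder:

```python
# Sketch of a probability-elicitation prompt combining the techniques above.
# The few-shot example and the parsing convention are placeholders.
FEW_SHOT = """Example:
Question: If goal A ("ceasefire is negotiated") happens, how likely is goal B
("border reopens within a year")?
Reasoning: step 1 ... step 2 ... step 3 ...
Answer: P(B|A) = 0.6
"""

def estimate_conditional(goal_a: str, goal_b: str, evidence: str, llm) -> float:
    prompt = (
        f"{FEW_SHOT}\n"
        "Follow the same format. Think step by step, consider arguments for and "
        "against (as a panel of independent experts would in a Delphi round), then "
        "give a single number between 0 and 1 on the last line as 'Answer: P(B|A) = x'.\n"
        f"Goal A: {goal_a}\nGoal B: {goal_b}\nEvidence:\n{evidence}"
    )
    reply = llm(prompt)
    return float(reply.rsplit("=", 1)[-1].strip())
```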
We believe it should also be possible to turn qualitative models into game-theoretic models by asking LLMs to generate the possible consequences for the agents’ available actions and to combine them with estimated payoff functions. A key benefit of this approach is that it would allow us to reason about (and predict) which actions various entities may decide to take to increase the likelihood of different outcomes. This extends research that is currently being conducted by AI research labs on multi-agent coevolution systems and hybrid causal models combining influence diagrams and structural causal model frameworks.
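As a toy illustration of this game-theoretic step, LLM-estimated payoffs for two entities could be fed to a standard solver such as nashpy; all the numbers and action labels below are made up:

```python
# Toy sketch: turn LLM-estimated payoffs for two entities into a normal-form
# game and enumerate its Nash equilibria. All numbers are fabricated examples.
import numpy as np
import nashpy as nash

# Rows: entity 1 chooses "regulate" or "wait"; columns: entity 2 chooses "comply" or "lobby".
payoffs_entity_1 = np.array([[3, 1],
                             [2, 2]])
payoffs_entity_2 = np.array([[2, 4],
                             [3, 1]])

game = nash.Game(payoffs_entity_1, payoffs_entity_2)
for eq_row, eq_col in game.support_enumeration():
    print("equilibrium:", eq_row, eq_col)
```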
We're not planning (yet) to invent new mathematical modeling frameworks for the first phase of this project, but if resources permit, we will try to instruct LLMs to use the formalisms defined in different papers and see which ones yield the most accurate and useful models. In fact, instead of giving the LLM a single formalism, we could provide an entire modeling cookbook and let it suggest which model to use. Such a cookbook would easily fit within the context window of modern models such as gpt-4-turbo.
On the more ambitious side of the spectrum, we would also like to try generating code in expressive languages, for instance a probabilistic programming language. Generating arbitrary code comes with additional challenges (e.g. the code might not terminate) but it could be much more precise and expressive.
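To make this concrete, the kind of code we would ask an LLM to generate might look like the following PyMC snippet; the variables, priors and observations are invented purely for illustration:

```python
# Illustrative only: a tiny probabilistic program of the kind an LLM could be
# asked to generate, linking a latent incentive strength to observed actions.
import pymc as pm

with pm.Model() as incentive_model:
    # Latent strength of the financial incentive to lobby against regulation.
    incentive_strength = pm.Beta("incentive_strength", alpha=2, beta=2)
    # Probability of observing lobbying activity given that incentive.
    lobbying_observed = pm.Bernoulli("lobbying_observed",
                                     p=incentive_strength,
                                     observed=[1, 1, 0, 1])
    trace = pm.sample(1000, progressbar=False)
```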
Products and application domains
Education
The next generation of students is likely to spend less time reading history books and more time asking history questions to an intelligent chatbot. But there is also a danger that the chatbot will provide them with oversimplified, sometimes biased narratives. High-quality incentive models could be a building block for more sophisticated and reliable education tools. Consider for instance a student learning about the French Revolution. The tool could reveal the complex web of economic, social, and political drivers that influenced the actions of groups such as the monarchy, nobility, clergy, bourgeoisie, and peasantry by analyzing historical texts and records through the lens of incentive modeling.
Social networks
Incentive models could be used to automatically generate Community Notes on social media platforms. For instance, when Marc Andreessen tweeted about his techno-optimist manifesto, an educational note could have been added to bring relevant context to people unfamiliar with his position, e.g. highlighting that Marc Andreessen's VC firm has invested billions in AI and is incentivized to push against any sort of AI regulation. This would be obvious to most of his followers, but perhaps important for people less familiar with the space to understand the dynamics at play. We would however need to avoid implying that people are only motivated by material incentives when they are not, so such notes should be positioned as "context" (as opposed to "warning" or anything implying wrongdoing).
Depolarization
When dealing with polarizing topics, we expect sources to disagree on the incentives of different entities. In such situations, it may be particularly interesting to compare how opposing sources paint different pictures. This is something we could easily do by generating two different models from two different sets of sources. As long as the two models are relatively self-consistent and grounded in relatable human incentives, looking at them may help an observer develop empathy and respect for both sides. Likewise, a good incentive model could demonstrate that no conspiracy, nefarious plan, or evil scheme is necessary to explain what people do most of the time, even if it seems inexplicable at first.
Forecasting
Large incentive models could provide valuable inputs for people who are in the business of predicting the future. We're thinking in particular about forecasting tools and platforms such as Metaculus. Forecasters could use information in the entity/goal matrix, as well as the matrices detailing influence and causal links between goals to better understand the potential actions and reactions of different entities. They could also look at different incentive models from different data sources to produce multiple projections. In turn, those predictions could be tracked over time for accuracy so that the most predictive models visibly accrue trust.
Existential risks
We believe another natural application domain would be risk analysis and prevention. Think for instance about climate change. A philanthropic investor looking at a series of projects may need to forecast the chance of success of each project before deciding to invest. The chances of success of a typical ESG project often depend on whether the different entities involved are sufficiently aligned, and looking at the incentive model of each project could help estimate this alignment.
Policymaking
Consider for instance the domain of social media regulation or AI regulation. In both cases, there are many entities involved (governments, tech companies, content providers, platform users, troll farms…) and many conflicting incentives to consider. When an academic group or a think tank tries to model such domains by hand, they are often unable to represent all the entities and all the factors that they would have wanted to represent. Using LLMs will help make the models more complete and closer to the real world, potentially making them more credible and more likely to influence policies.
Coordination
The crux of many societal challenges is not value misalignment (most people would prefer global peace) but a lack of shared understanding about the incentives of others and the alternative Nash equilibria available to us. Expansive incentive models could significantly boost our collective epistemic understanding, clarify the decision space, and make it more tractable for collective intelligence to navigate toward optimal outcomes. More generally, our intuition is that incentive models could enhance the kind of deliberation tools that we have been developing at the AI Objectives Institute.
Quality and safety
Evaluation
To check the quality of the models, we plan to run evaluations on samples of statements extracted by our models. We will seek feedback from all users on the plausibility of incentives to get a sense of how much trust can be put in the models. For technical subjects, we will also work with domain experts to get high-quality feedback. We may also use the Prolific platform to recruit diverse participants and will ask them to verify the accuracy of the information extracted by LLMs. To scale this review process further, we plan to use LLMs to surface situations where incentives may be misrepresented or misassigned. We will be particularly cautious about the release of numerical models. When numerical predictions do not seem sufficiently reliable, we will only release a qualitative model.
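One option we are considering for scaling this review is a simple LLM-as-judge pass over sampled claims, along these lines (prompt wording, sample size and the claim fields are placeholders from the sketches above):

```python
# Sketch: sample extracted claims and ask a second LLM whether the quote really
# supports the stated entity/goal link; low-scoring claims go to human review.
import random

def flag_for_review(claims: list[dict], llm, sample_size: int = 50) -> list[dict]:
    flagged = []
    for claim in random.sample(claims, min(sample_size, len(claims))):
        verdict = llm(
            "Does the quote below genuinely support the statement that "
            f"'{claim['entity']}' {claim['stance']} the goal '{claim['goal']}'? "
            "Answer YES or NO, then one sentence of justification.\n"
            f"Quote: {claim['quote']}"
        )
        if verdict.strip().upper().startswith("NO"):
            flagged.append(claim)
    return flagged
```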
High-stakes and delicate domains
Our approach will vary by domain due to different risks. For example, detailed models on geopolitical conflicts might pose national security risks and could be exploited by adversaries. Therefore, we plan to avoid recent data on active conflicts and sensitive topics where AI's current capabilities may not ensure the necessary accuracy and sensitivity. Initially, we'll steer clear of complex cultural issues, ensuring any future models in these areas undergo rigorous quality checks to prevent the propagation of stereotypes or polarizing narratives.
Misuse for propaganda
There is always a risk that our models may be used for automated influence operations. A thorough report by OpenAI on this topic concluded that there are no silver bullets to solve the broader problem, but we still plan to conduct product-specific tests to assess how our model's information might be misused. This may include evaluating the perceived credibility and bias of various model-generated snippets, some randomly created and others deliberately skewed by an independent red team. If testers can easily identify manipulated narratives, it might indicate a lower risk of misuse. Over time, we aim to partially automate this testing process.
On balance, we believe that it is beneficial for the public at large to have easy access to the information we plan to aggregate in these models. Indeed, all the information in our models will come from sources that a nefarious actor would already have access to, especially a sophisticated bad actor like a state-sponsored troll farm. It's in fact the potential victims of misinformation and propaganda who would get the most value from incentive models, because these models would help them identify the possible motivations of the propagandists.
Tentative roadmap
We are currently (Q4, 2023) in the ideation phase of this project. We're actively seeking feedback and advice from experts and potential users to refine our plan and avoid falling into rabbit holes.
The next phase (Q1, 2024) will be the formalization phase. By the end of March 2024, we hope to have a clearly defined structure and appropriate resources to execute. Our default assumption is that this project will be co-hosted by two non-profits, the AOI and MetaGov, but we are open to collaborating with more institutions. While a few people have already offered to collaborate as volunteers, we also plan to raise funds to recruit and compensate more contributors. We're also considering strategic, domain-specific sources of funds. For instance, if an organization focused on climate change would like to sponsor a case study on a climate-related topic, we would be very interested in exploring this, especially if this organization were also able to provide access to data and/or domain experts.
Discussions with potential partners and sponsors will also be an occasion for us to build some first prototypes and demos (also in Q1, 2024). We plan to build these using relatively small datasets and iterate fast on the UI to figure out early what may be the most interesting features in the eyes of potential users and partners.
The following phase (Q2 & Q3, 2024) will be focused on a few case studies. Rather than immediately trying to build a one-size-fits-all product, we want to go deep on a few chosen example domains. In contrast with the previous phase, there will be a strong focus on quality and accuracy. We will work closely with domain experts to assess this quality and will make sure to have users with concrete goals lined up. We want these future users to serve as co-designers for the final models and interfaces.
Finally, the last phase of next year (from Q4 2024) will be dedicated to productizing our work. This will require us to consolidate what we learned from different case studies into a single AI pipeline and a single feature set. Our ultimate goal is to release a free open-source tool—we hope by the end of 2024—that anyone could use off the shelf to produce their own large incentive models. Then we would focus the following years on distributing and further improving this tool to make it more broadly relevant and impactful.