Research Brief: Beneficial Deployment of Transformative Technologies

Introduction to AOI's Research

The advent of large-scale generative models with a clear path to transformative AI brings our society to an inflection point. Determining what sorts of systems will be built with these technologies, and for what purposes, will play a crucial role in shoring up human flourishing. The AI Objectives Institute (AOI) is a nonprofit research incubator for differential technological development1 to ensure that algorithmic intelligence is used in service of human autonomy, coherence and well-being, while promoting existential security and civilizational resilience. Our goal is to build the foundation for collaborative development of systems that achieve these goals.

While the emerging capabilities of large-scale generative models hold immense potential for human welfare and progress, they also mark our first real-time encounter with the dangers of advanced AI. Today’s large-scale generative models already confront us with concrete, specific risks of harm from misuse or irresponsible deployment, and future advances pose existential threats of agentic and systemic failures, and of social and civilizational degradation or lock-ins. 

We believe that humanity’s predicament, faced with such events, is similar to the predicament of players in an escape room: We have limited time to find our way out of the situation, and our search for clues – for ways to align AI systems – depends on maintaining the integrity of our collective deliberation under pressure. To make it out of an escape room, we must effectively work through the puzzles while communicating with our teammates and keeping our wits about us in a strange and overstimulating world. AOI's three research avenues map roughly to the requirements of problem-solving capacity, cooperation, and autonomy necessary for success, and we aim to make strategic, theoretical, and applied progress in each:

  • Sociotechnical alignment: promoting AI development paradigms that favor governability of institutions and AI systems, including proofs of concept for alternative AI development paradigms. 
  • Scalable coordination: promoting ways to make cooperation and deliberation more scalable, including demonstrations of socially beneficial applications of AI for improving collective agency.
  • Human autonomy: research on preserving and enhancing civilizational health and agency, including experiments with resilience-enhancing applications of existing AI capabilities.

As a research incubator, AOI hosts a wide range of applied projects informed by our working groups of internal & affiliated researchers. This fusion of theory and practice not only lets us support a variety of different approaches, but also creates feedback loops that help us learn about the problem-spaces we are working in. Our current research focuses include assessment models for the societal impacts of AI, as well as frameworks for researching sociotechnical alignment cruxes like responsible deployment, deliberative processes, and assistive vs deceptive behavior.

Theory of Impact

Differential Technological Development

The principle of differential technological development stems from the recognition that the order in which technologies are developed will influence their societal impacts.2 The use of cars before seat belts, for example, or of nuclear weapons before electronic locks, carried far greater downside risk than it did once those safety technologies were introduced. The strategy suggests that weighing not only the potential harms and benefits of new technologies, but also the ways they can interact to reduce overall downside risk, yields the best overall outcomes from technological advancement.

AOI intends to incorporate differential development as a core principle of our approach – pursuing research to model the results of various sequences of possible technologies, and building proofs of concept of those we believe will have the most beneficial impact on future advances.

Areas of Impact

We believe that working towards differential progress along the three research directions above is valuable in multiple but closely interacting ways:

1. Governance matters, and improving societal capacity for governance is one of the best ways to address existential risk. Reducing existential risk is a universal public good, while failure to address existential risk is a failure of social coordination. We expect improvements to public deliberation and the governance of sociotechnical institutions to reduce existential risk. 

2. AI could be massively beneficial, and figuring out how to realize the potential of transformative AI in line with human flourishing should be a high priority. The challenge of making transformative AI good for humanity goes beyond solving technical alignment. We believe that deliberations about how to deploy transformative AI must involve serious inquiry into the role advanced AI can play in enhancing human autonomy, well-being, and security. 

3. Better governance and use of AI will require experimentation with new AI engineering paradigms. The work of ensuring that AI development remains in service of human flourishing will involve directly confronting engineering challenges that mainstream, large-scale AI research may defer or sideline. This makes AOI well-positioned to explore new AI engineering and governance paradigms and influence the broader field.

4. Present-day work on socially beneficial AI will have a significant impact on the sociotechnical development of future AI. We believe that immediately socially beneficial interventions in AI have natural continuity with the long-term alignment of AI to human flourishing, but the relationship between the two can be complex and unexpected. Systematic thinking about principles of civilizational health and agency can help secure the long-term benefits of our interventions.

Current Portfolio of Projects

Sociotechnical Alignment

Open Agency Architecture

The Open Agency Architecture (OAA)3 proposes a framework for how institutions can adopt highly capable AI. OAA composes bounded, verified components – which can be either human or AI systems – to handle separate parts of the problem-solving process, with human oversight at the points of connection. Building alignment in at this structural level gives OAA systems the flexibility to incorporate new components, enabling much broader collaboration on building them while limiting the misalignment risk each new component introduces.
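To make the structural idea concrete, here is a minimal sketch of composing bounded components with a human checkpoint at each point of connection. The `Proposal` type, the component names, and their interfaces are illustrative assumptions, not part of the OAA specification.

```python
from dataclasses import dataclass
from typing import Callable, List

# Illustrative only: the Proposal type, component names, and interfaces are
# hypothetical and not taken from the OAA specification.

@dataclass
class Proposal:
    content: str
    provenance: List[str]  # which components have touched this proposal

Component = Callable[[Proposal], Proposal]

def human_checkpoint(proposal: Proposal) -> Proposal:
    """Human oversight at the connection point between components."""
    answer = input(f"Approve output from {proposal.provenance[-1]}? [y/n] ")
    if answer.strip().lower() != "y":
        raise RuntimeError("Proposal rejected at human checkpoint")
    return proposal

def compose(components: List[Component], initial: Proposal) -> Proposal:
    """Run bounded components in sequence, pausing for oversight between them."""
    proposal = initial
    for component in components:
        proposal = component(proposal)
        proposal = human_checkpoint(proposal)
    return proposal

# Example bounded components; behind these interfaces could sit humans or AI systems.
def world_modeler(p: Proposal) -> Proposal:
    return Proposal(p.content + " | model of consequences", p.provenance + ["world_modeler"])

def plan_generator(p: Proposal) -> Proposal:
    return Proposal(p.content + " | candidate plan", p.provenance + ["plan_generator"])

if __name__ == "__main__":
    result = compose([world_modeler, plan_generator],
                     Proposal("reduce commute emissions", ["operator"]))
    print(result)
```

In a real deployment the checkpoints would involve far richer review than a yes/no prompt; the point here is only the shape of the composition, which is what lets new components be swapped in without bypassing oversight.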

AOI is implementing a proof-of-concept OAA prototype, to create a foundation for coordinating concrete alignment work – integrating input from researchers and practitioners across all areas of governance, sociotechnical alignment, and AI capabilities research. We will start by demonstrating that OAA's modular architecture can achieve parity with – or even outcompete – the current generation of monolithic models in bounded learning environments. Our overall goal is to develop an open-source blueprint for creating institutions that can evolve to incorporate continuously revised and improved processes, including not just transformative AI but also AI-assisted innovations in governance.

Scalable Coordination

Talk to the City

Talk to the City is an LLM interface for improving collective deliberation and decision-making by analyzing detailed, qualitative responses to questionnaires. It aggregates those responses, clusters distinct viewpoints, and represents each with an LLM chat interface – yielding a simulation of what a citizens' assembly for that group might look like.
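As a rough illustration of the aggregation and clustering step, the sketch below groups free-text responses with TF-IDF features and k-means. These are placeholder choices made for the example; the production pipeline may rely on LLM embeddings and different clustering methods.

```python
# A minimal sketch of the aggregate-and-cluster step, assuming TF-IDF features
# and k-means; the production pipeline may use LLM embeddings instead.
from collections import defaultdict

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

responses = [
    "We need more protected bike lanes downtown.",
    "Bike lanes slow down traffic and hurt local business.",
    "Public transit should be free during rush hour.",
    "Free transit would be great, fund it with congestion pricing.",
]

# Embed free-text questionnaire responses.
vectors = TfidfVectorizer().fit_transform(responses)

# Cluster into distinct viewpoints (k chosen arbitrarily for illustration).
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)

viewpoints = defaultdict(list)
for response, label in zip(responses, labels):
    viewpoints[label].append(response)

for label, members in viewpoints.items():
    # In the full system each cluster would back an LLM chat persona;
    # here we just print the grouped responses.
    print(f"Viewpoint {label}: {members}")
```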

Applying recent advances in AI to problems of group coordination – and ensuring that these technologies are deployed safely and in true alignment with the people they serve – requires technical research into a wide array of questions. Building on a 2022 DeepMind paper on LLMs for collective deliberation,4 we ask how LLM fine-tuning can be leveraged for the following tasks (a rough sketch follows the list):

  1. Finding key disagreement within groups,
  2. Surfacing mutually beneficial possibilities and policies between deliberating parties,
  3. Approaching common understanding,
  4. Identifying confusion and miscommunications between perspectives.
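As a sketch of the first task, the snippet below frames disagreement-finding as a single prompt over two viewpoint clusters. The `call_llm` helper is a hypothetical stand-in for whatever model endpoint is used, and the DeepMind work relies on fine-tuning rather than prompting, so this illustrates the task rather than any particular method.

```python
# Hypothetical sketch: `call_llm` is a placeholder, and the prompt framing is
# an assumption for illustration, not AOI's or DeepMind's actual method.

def call_llm(prompt: str) -> str:
    """Placeholder for a call to a hosted or fine-tuned language model."""
    raise NotImplementedError("wire this up to an actual model endpoint")

def find_disagreements(viewpoint_a: list[str], viewpoint_b: list[str]) -> str:
    prompt = (
        "Below are statements from two groups in a deliberation.\n\n"
        "Group A:\n" + "\n".join(f"- {s}" for s in viewpoint_a) + "\n\n"
        "Group B:\n" + "\n".join(f"- {s}" for s in viewpoint_b) + "\n\n"
        "List the substantive points on which the groups disagree, and note "
        "which disagreements look like misunderstandings rather than genuine "
        "differences in values."
    )
    return call_llm(prompt)
```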

The detail and interactivity our extended LLM-based interface provides can help policymakers uncover misunderstandings, key debates, and areas of common ground in complex public discourse. Use cases range from union decision-making, to determining the needs of recipients in refugee camps, to a collaboration with Metaculus on LLM-assisted deliberations on AI predictions.

Human Autonomy

Moral Mirror

Moral Mirror is a proof-of-concept prototype that explores whether current-generation AI can assist humans in their search for wisdom: can we make chat-based AI assistants that promote our autonomy and sovereignty rather than subvert them?5 We are currently building an interactive LLM-based journaling tool that leads users through structured self-discovery, drawing on processes used by behavioral health professionals, coaches, and psychologists, to guide people in exploring their values, past experiences, and ideals for how to live.
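The sketch below shows one way such a structured journaling loop might be wired up. The question sequence, the prompt wording, and the `call_llm` helper are illustrative assumptions, not the Moral Mirror prototype itself.

```python
# Illustrative only: the stages, prompts, and `call_llm` helper are assumptions,
# not the actual Moral Mirror design.

def call_llm(prompt: str) -> str:
    """Placeholder for a call to a language model endpoint."""
    raise NotImplementedError("wire this up to an actual model endpoint")

REFLECTION_STAGES = [
    "What values felt most important to you today?",
    "Describe a recent decision you are proud of. What does it say about you?",
    "Where did your actions diverge from how you want to live?",
]

def journaling_session() -> list[dict]:
    """Lead the user through structured self-reflection, one stage at a time."""
    transcript = []
    for question in REFLECTION_STAGES:
        answer = input(question + "\n> ")
        followup = call_llm(
            "You are a reflective journaling assistant. Do not advise or judge; "
            "ask one open question that helps the user examine their own answer.\n"
            f"Question: {question}\nAnswer: {answer}"
        )
        transcript.append({"question": question, "answer": answer, "followup": followup})
    return transcript
```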

While RL and recommender systems are routinely optimized for user engagement, it is far less straightforward to tune algorithms toward respecting human growth and development. To take up this challenge, we are developing a philosophically informed formal theory of ML systems that help individuals align their behavior to their own stated goals and values.

Lucid Lens

Lucid Lens aims to help people understand the source and intent behind the content they encounter, so that they can make conscious decisions about what content to trust. We believe that such AI-based defensive measures for attentional and epistemic security will become increasingly important as generative AI increases in its persuasive capabilities. As manipulative content grows in volume, power, and variety, it becomes critical to understand how we are influenced and why.

Our initial prototype (designed to work through phone and browser plugins) uses a model trained on existing persuasive and AI-generated content to notify consumers when content is likely to be persuasive, distorted, or highly targeted. We aim to increase the transparency of the systems and motivations underlying content on the internet – not just to help people choose what content they interact with, but to encourage the creation of content and systems more aligned to people's true wants and needs.
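As an illustration of the kind of detector involved, the sketch below trains a simple logistic-regression baseline over TF-IDF features on a toy labeled set. The prototype's actual model, labels, and training data are not described here; everything in the example is invented.

```python
# Illustrative sketch of a persuasion/targeting classifier, assuming a simple
# logistic-regression baseline over TF-IDF features; the real prototype's
# model, labels, and training data are not specified here.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny invented dataset: 1 = flagged as persuasive/targeted, 0 = not flagged.
texts = [
    "Act now! Everyone in your neighborhood already switched -- don't be left behind.",
    "The city council meets Tuesday at 7pm to discuss the zoning proposal.",
    "You deserve better. They are lying to you, and only we will tell the truth.",
    "The library extended its weekend hours starting next month.",
]
labels = [1, 0, 1, 0]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

# In the plugin, a score above a chosen threshold would trigger a notification.
score = model.predict_proba(["Hurry, this exclusive offer disappears at midnight!"])[0, 1]
print(f"Persuasion-likelihood score: {score:.2f}")
```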

Foundational & Strategic Research

Markets as Learning Optimizers

Comparative study of optimization in AI and in social systems such as markets and corporations can be a rich source of insight into the dynamics of each. Many optimization and alignment questions in one domain have counterparts in the other: markets clear through a process akin to gradient descent, and Goodhart's Law has a close analogue in model overfitting.
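A toy example of the first analogy: tâtonnement price adjustment, in which prices move in proportion to excess demand, has the same form as a gradient step. The two-good economy below is invented for illustration and makes no claim about the formal result discussed next.

```python
# Toy illustration: price adjustment proportional to excess demand looks like a
# gradient step. The demand and supply curves here are invented for the example.
import numpy as np

def excess_demand(prices: np.ndarray) -> np.ndarray:
    """Demand minus supply for two goods; plays the role of a gradient signal."""
    demand = np.array([10.0, 8.0]) / prices   # simple inverse-price demand
    supply = np.array([2.0, 1.5]) * prices    # supply rises with price
    return demand - supply

prices = np.array([1.0, 1.0])
step_size = 0.05                              # analogous to a learning rate

for _ in range(200):
    # Gradient-descent-like update: move prices in the direction of excess demand.
    prices = prices + step_size * excess_demand(prices)

print("Approximate clearing prices:", np.round(prices, 3))
```

The loop settles where excess demand vanishes, just as gradient descent settles where the gradient does.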

Anders Sandberg (FHI, AOI) and Aanjaneya Kumar (PhD IISER Pune, AOI Fellow) are working on a preliminary mathematical result demonstrating that, under reasonable assumptions, market economies are structurally very similar to artificial neural networks.6 We expect that this work will identify abstractions that hold true for optimizers across different contexts and scales, and will suggest alignment strategies that generalize.

Position Paper: Perspectives on Safe AI

AOI’s working paper maps out the relationship between two major viewpoints on AI risk: the responsible AI view, which critiques the ethical integrity of present-day ML applications (e.g. encoded bias in judicial and hiring systems, problems of alienation and accountability in AI-assisted decision making), and the AGI risk view, which focuses on future harms from more advanced AI (e.g. agentic AI systems gaining control of technical and economic infrastructure). We argue that even though these perspectives are explicitly concerned with different timescales – and differ strongly in their culture – they have significant common ground on substance. Our paper seeks to articulate their shared concerns, as well as research directions for models of AI safety that satisfy both groups. 

Cause Prioritization Modeling

Much of AOI's research focuses on differential technological development of AI capabilities: determining the best order in which to develop new technology, to ensure total benefits outweigh negative effects. In this paper we will investigate how we might model the decision process for that differential development as an iterated game. In each round of such a game, civilization can exercise some capabilities that will either deplete or enhance its autonomy in subsequent rounds, affecting its ability to use future capacities responsibly.
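The sketch below renders that game as a toy simulation. The capability options, payoffs, and autonomy effects are invented purely for illustration; the point is the structure, in which depleting autonomy erodes the value of every later round.

```python
# Toy rendering of the iterated game: all numbers and capability names are
# invented for illustration.

CAPABILITIES = {
    "deploy_untested_persuasion_tech": {"benefit": 5.0, "autonomy_delta": -0.20},
    "invest_in_epistemic_security":    {"benefit": 1.0, "autonomy_delta": +0.10},
    "scale_deliberation_tools":        {"benefit": 2.0, "autonomy_delta": +0.05},
}

def play(policy, rounds: int = 20) -> float:
    """Run the game: autonomy scales how much benefit each round actually delivers."""
    autonomy, welfare = 1.0, 0.0
    for _ in range(rounds):
        option = CAPABILITIES[policy(autonomy)]
        welfare += option["benefit"] * autonomy  # realized benefit shrinks as autonomy is depleted
        autonomy = max(0.0, autonomy + option["autonomy_delta"])
    return welfare

greedy = lambda a: "deploy_untested_persuasion_tech"
patient = lambda a: "invest_in_epistemic_security" if a < 1.0 else "scale_deliberation_tools"

print("Greedy policy welfare: ", round(play(greedy), 2))
print("Patient policy welfare:", round(play(patient), 2))
```

Under these invented numbers the patient policy's welfare compounds, while the greedy policy's stalls once autonomy is exhausted.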

Our aim is to use this model in developing a cause prioritization strategy that specifies properties of the world which, if maintained, would keep the world on a better trajectory. One such property would be confidence that algorithmic speech is usually truthful7 – with that confidence, we could continue distinguishing true from false information, resulting in less pollution of the information ecology. What other such features, we ask, would be most important in prioritizing differential AI research and deployment?

Frequently Asked Questions

Why is AOI focusing on these problems, instead of the difficult technical problems of aligning machine intelligence?

AOI agrees that research into concrete problems of technical alignment – determining how to define an AI's objectives clearly enough that it remains aligned as it increases in capability – is a difficult and essential part of ensuring sociotechnical alignment.

But we believe improving human coordination is just as essential, and that it complements research on hard technical problems in the following ways:

  • Improving human coordination will lay groundwork for efficient, effective implementation of technical alignment insights.

  • Current AI systems already pose questions of how best to integrate technology into society – we don't want problems of disinformation, polarization, and encoded bias to weaken collective decision-making in advance of more capable AI.

  • Collaborating to find shared objectives among groups of human actors is an essential first step in determining what objectives we want more capable technology to work towards.

Why does AOI support the use of current-generation LLMs, which are as yet unreliable and unintelligible, to build critical social infrastructure?

We recognize that there are risks from misunderstanding or over-relying on the capabilities of current-generation LLMs. But we also believe that if used with a clear understanding of their limitations, language models may be a valuable tool for analyzing and simulating discourse at scale. Our intention is to explore this possibility first through small, bounded experiments – which will shed light on the capabilities and the limits of these models, and may reveal the ways in which over-reliance could be problematic.

Is AOI advancing AI capabilities?

AOI does not research possible paths to more advanced AI, so in our view it's unlikely we'll contribute to progress on capabilities. Our aim in developing new technology is to find applications of current-generation AI to problems of alignment, governance, and individual resilience – using models developed by companies like OpenAI and Anthropic, not improvements of our own – and build proofs of concept to help people make use of increasing AI capabilities developed elsewhere.

Notes

  1. Differential technological development: leverage risk-reducing interactions between technologies by affecting their relative timing. See Theory of Impact section above for more detail. ↩︎

  2. See Differential Technology Development: A Responsible Innovation Principle for Navigating Technology Risks (Sandbrink et al., 2022) for a full discussion of this strategy. ↩︎

  3. See the OAA overview post on the Alignment Forum. ↩︎

  4. Fine-tuning language models to find agreement among humans with diverse preferences, https://www.deepmind.com/publications/fine-tuning-language-models-to-find-agreement-among-humans-with-diverse-preferences ↩︎

  5. Joel Lehman's work on Machine Love informs this project: https://arxiv.org/abs/2302.09248 ↩︎

  6. See discussion in the "Direct Isomorphisms between Neural Networks and Markets" section of our whitepaper. ↩︎

  7. See section 6.4, "Why now," in Evans et al., "Truthful AI: Developing and governing AI that does not lie": https://arxiv.org/abs/2110.06674 ↩︎
