Mapping the Discourse on AI Safety & Ethics
Değer Turan, Colleen McKenzie, Oliver Klingefjord
Using our prototype discourse visualization tool Talk to the City, we mapped Twitter conversations about the impacts of AI, and identified six distinct perspectives:
1. Ethical approaches to tackling current harms
2. AI is Inscrutable and Difficult to Control
3. Calls for Independent & Democratic Oversight
4. Dangers of Rapid Development
5. Human Interests are Difficult to Formalize
6. Optimistic AI Futures
Each points to specific risks and courses of action, but we found significant overlap between the groups, and calls for more collaboration from all points of view.
We plan to continue the project of making these conversations more visible, and to create contexts that support richer, multi-threaded approaches to discussing problems of AI safety – and we hope others in the space will join us in doing so.
Background
In the past few weeks, conversations about the future of artificial intelligence and its societal effects have grown exponentially, in response to new product announcements and calls for safe adoption of new technology (such as the Future of Life Institute open letter and many opinion pieces). These discussions incorporate a variety of concerns – about AI ethics, AI safety, long-term risks, and societal goals – in a conversation of increasing importance not just for the people working on these issues directly, but also for the global population whose lives they will affect. But for effective collective action, we believe we need to bring visibility to each perspective and its interactions with the others, to capture the nuance of each viewpoint and prevent majority views from obscuring the full extent of the discussion.
Last week, we started to analyze and visualize this conversation with a collective deliberation tool we're building called Talk to the City (more detail in our launch announcement). At AI Objectives Institute (AOI), a nonprofit research incubator, we're building tools for collective sensemaking and scalable coordination, with the goal of using AI safely to help institutions and societies build resilience.
We used Talk to the City to map this space of discourse: capturing conversations happening on Twitter and other platforms, and grouping tweets by topic, to help us create clusters of similar viewpoints on what can and should happen in the face of rapidly advancing technology. Our objective was to distill what each cluster believes to be the most important facets of this conversation, and which approaches they believe will result in the best outcomes.
Early Prototype, Preliminary Results
This experiment in mapping Twitter conversations is far from an in-depth summary: it is a first attempt at collecting data and categorizing relevant views, using a prototype tool we're still iterating on. We're doing this now because we expect AI safety discourse will continue to grow in scale and complexity. At AOI, we think that tools for collective discourse need to capture the nuance and diversity of views in large-scale conversations – and we believe our ongoing work towards that goal will benefit from feedback on early results. In parallel, we're integrating content from outbound Twitter links and from collective deliberation platforms like Pol.is, with the eventual goal of capturing input from a wide variety of platforms with relevant discussion. Ultimately, we aim to turn Talk to the City into a platform for more effective deliberation and policy development, by giving policymakers better visibility into continuous discourse at unprecedented depth and scale – but this is our first test of the platform as an early prototype.
If you notice important perspectives we've missed, please point them out to us! If you find specific tweets you don't see reflected in our analysis, retweet them and mention @AIObjectives, and we'll investigate. We'll be updating this project continually, to incorporate more of the conversation and reflect how it morphs over time. If you're interested in the thinking behind tools like Talk to the City, check out our whitepaper. If you think we've made mistakes and want to help us improve, get in touch at hello@objective.is. Our goal is to offer ever-evolving prototypes that help create visibility into collective discourse at scale, and to use that visibility to improve the tools we make.
How it works
Prototype at talktothecity.com/twitter
Talk to the City offers a simple UI for visualizing collective discourse at scale. It works with a variety of underlying data structures, including collections of longform responses as well as structured data like tweets.
Here's how we created the Twitter mapping (a rough code sketch of the pipeline follows these steps):
We started with a seed population of 94 Twitter users who participate frequently in conversations about the intersection of AI and society.
We added all users followed by that seed population, for a total of 35,416 unique Twitter users, and collected all original tweets (not retweets) they had posted in the last three weeks (March 9 – April 6).
We used GPT-4 to classify tweets as either relevant or not, by asking whether each tweet was about "AI ethics, AI safety, AI alignment, or the responsible use of AI". 3981 tweets from 245 users were marked relevant.
Talk to the City clusters tweets by their text content (using UMAP for dimensionality reduction and HDBSCAN for clustering), and uses GPT-4 to label each cluster. Many of the LLM-generated labels were similar to each other, so we manually relabeled just over half of them for clarity.
For the purposes of grouping opinions on the societal impacts of AI, we excluded clusters that were:
Descriptions of object-level news about AI capabilities (the "Capabilities of current AI" and "Reasoning about current capabilities" clusters)
Meta-level reflections on the discourse itself (the "External discourse" cluster)
For more details about this analysis, see the Methods & Caveats section below.
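To make these steps concrete, here is a minimal sketch of this kind of pipeline – not the actual Talk to the City implementation. The embedding model, prompt wording, and UMAP/HDBSCAN parameters below are illustrative assumptions; only the overall shape (a GPT-4 relevance filter, then embedding, UMAP reduction, and HDBSCAN clustering) follows the steps described above.

```python
# Illustrative sketch only: model names, prompt text, and clustering
# parameters are assumptions, not the Talk to the City source code.
from openai import OpenAI  # pip install openai
import numpy as np
import umap                # pip install umap-learn
import hdbscan             # pip install hdbscan

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def is_relevant(tweet: str) -> bool:
    """Ask GPT-4 whether a tweet is about AI ethics/safety/alignment."""
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": (
                "Is this tweet about AI ethics, AI safety, AI alignment, "
                "or the responsible use of AI? Answer yes or no.\n\n" + tweet
            ),
        }],
    )
    return resp.choices[0].message.content.strip().lower().startswith("yes")

def cluster_tweets(tweets: list[str]) -> np.ndarray:
    """Embed tweets, reduce dimensionality with UMAP, group with HDBSCAN."""
    emb = client.embeddings.create(model="text-embedding-ada-002", input=tweets)
    vectors = np.array([d.embedding for d in emb.data])
    # UMAP projects the embeddings into a low-dimensional space suitable
    # for both visualization and density-based clustering.
    reduced = umap.UMAP(n_components=2, metric="cosine").fit_transform(vectors)
    # HDBSCAN assigns a cluster label per tweet; -1 marks outliers.
    return hdbscan.HDBSCAN(min_cluster_size=15).fit_predict(reduced)
```

From there, GPT-4 is prompted with a sample of tweets from each cluster to produce labels, which we then partially relabeled by hand as noted above.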
Six Perspectives on AI Ethics & Safety
Our visualization groups tweets by topic of discussion. While many of these clusters capture the participants' judgments of the topics considered, our categorization below combines and revises some of these clusters, to summarize the main themes surfaced by the Talk to the City platform.
The six perspectives we've seen emerge are as follows:
1. Ethical approaches to tackling current harms
Talk to the City clusters: AI Ethics; LLM safety & technical investigations; Red teaming AI
Concerns about the impact of advanced algorithms on society are not future problems: present-day issues such as misinformation, algorithmic bias, and disproportionate impacts of job automation already have clear negative impacts.
The risks here are not just from AGI—which I think remains distant—but also from LLMs, which are here now, and already dangerous. This just-out report from @Europol raises multiple, immediate disturbing scenarios: https://t.co/Hk9tFV2muX
— Gary Marcus (@GaryMarcus) March 28, 2023
Members of this group call for action on these present questions rather than on hypothetical problems of future, more powerful systems – and in some cases, they call out the focus on far-future concerns as harmful to work on present issues:
Yes, we need regulation. But as we said: "We should focus on the very real and very present exploitative practices of the companies claiming to build them, who are rapidly centralizing power and increasing social inequities."https://t.co/YsuDm8AHUs
— @emilymbender@dair-community.social on Mastodon (@emilymbender) April 3, 2023
I did not sign the FLI letter. The intention is good, but focus on future more powerful AI systems detracts much needed attention to the current systems. Also, rather than pausing training, we should focus on responsible deployment and use. #AIethics #ResponsibleAI
— Francesca Rossi (@frossi_t) March 30, 2023
these are overshadowed by fearmongering and AI hype, which steers the discourse to the risks of imagined "powerful digital minds" with "human-competitive intelligence." Those hypothetical risks are the focus of a dangerous ideology called longtermism
— @timnitGebru@dair-community.social on Mastodon (@timnitGebru) March 31, 2023
We also include here people concerned with unaddressed security vulnerabilities in current systems:
I keep thinking about the early days of the mainstream Internet, when worms caused massive data loss every few weeks. It took decades of infosec research, development, and culture change to get out of that mess.
Now we're building an Internet of hackable, wormable LLM agents.
— Arvind Narayanan (@random_walker) April 5, 2023
The views of this group are closely related to movements like FAT ML and the ACM FAccT conference, and institutes such as DAIR, whose statement in response to the FLI letter offers an explanation of one of the perspectives in this cluster:
Tl;dr: The harms from so-called AI are real and present and follow from the acts of people and corporations deploying automated systems. Regulatory efforts should focus on transparency, accountability and preventing exploitative labor practices.
2. AI is Inscrutable and Difficult to Control
Talk to the City clusters: Difficulty of aligning superintelligence; Why future alignment is hard; parts of: Reflections on optimism & pessimism, Long-term concerns, Predictions about superintelligence, and LLM safety & technical investigations
A more future-facing group of perspectives is that of people pointing out the difficulty of wrapping our minds around what current and future AI systems can do, and the difficulty this poses for creating systems that serve our best interests.
As a concise summary:
The world is on track to deploy millions of superhuman AIs. But our current training techniques would give them (direct and indirect) incentives to deceive and manipulate humans. And we can't predict how they'd do this because they often develop surprising new capabilities.
— Richard Ngo (@RichardMCNgo) March 29, 2023
The difficulty of determining how an AI will behave is not an entirely foreign problem, at least where deception is involved:
It is hard to end up trusting a superhuman AI for some of the same reasons it is hard to end up trusting a human dictator - you can ask them questions, but they can lie; you can observe their behavior on small temptations, but what about large ones? That said...
— Eliezer Yudkowsky (@ESYudkowsky) April 2, 2023
But current and future systems are fundamentally different from the human minds we are familiar with…
The alien-ness of the shoggoth comes from:
(1) only a tiny subset of human cognition is (noisily) tracked
(2) many other, inhuman cognition added (e.g. group interaction dynamics ≠ the dynamics of internal thoughts)
(3) different architecture
(4) different 'shaping' process pic.twitter.com/2dRY3KOaCE
— w̸͕͂͂a̷͔̗͐t̴̙͗e̵̬̔̕r̴̰̓̊m̵͙͖̓̽a̵̢̗̓͒r̸̲̽ķ̷͔́͝ (@anthrupad) March 18, 2023
…which makes them unprecedentedly hard to predict, and possibly volatile in destructive ways:
I'm also worried that we won't be able to come up with a good training regime, and that even if we do we won't know how to solve inner alignment, and that we won't know whether we've solved inner alignment because the systems are opaque and hard to red team sufficiently
— Jeffrey Ladish (@JeffLadish) March 28, 2023
A core intuition I have about deep neural networks is that they are complex adaptive systems. This creates a number of control difficulties that are different from traditional engineering challenges: https://t.co/TQVv1iHQ4v
— Jacob Steinhardt (@JacobSteinhardt) April 4, 2023
Members of this group call not only for recognition of the difficulty of these problems, but also for improving interpretability of the systems in question:
Stuart Russell's argument that optimizers tend to set irrelevant parameters to extremes is another form of this argument, and predicts that a world with drastically increased optimization power will have much more extreme settings, risky. That seems enough for me as an argument.
— Anders Sandberg (@anderssandberg) April 4, 2023
I'd strongly support the idea of a Manhattan Project of intense research to make machines more trustworthy and interpretable (regardless of, or in parallel with a moratorium.) The premature super-investment in non-interpretable technologies is the core of our problems. https://t.co/CJ9Q4e3vlw
— Judea Pearl (@yudapearl) April 1, 2023
3. Calls for Independent & Democratic Oversight
Talk to the City cluster: AI Governance & Regulation
Regardless of the type and time-scale of their concerns, many people believe the most pressing need for AI safety is independent regulation and oversight of the development of these new technologies.
People overwhelmingly agree that we need democratic oversight, regulation, transparency, and independent civil society initiatives to lead inclusive, democratic conversations about AI. pic.twitter.com/Air6z4EviA
— Collective Intelligence Project (@collect_intel) April 2, 2023
How these decisions get made is arguably the bigger issue than whether OpenAI might the right decision in this case. I think codifying best practices and establishing third-party oversight will be essential - there's some great work on this but more is urgently needed
— Jess Whittlestone (@jesswhittles) March 15, 2023
Calls for such regulation point to the need for transparency in how these systems are built and how they behave, and most suggest that such oversight must be democratic.
Bengio: "There is an urgent need to regulate these systems by aiming for more transparency and oversight of AI systems to protect society." "the risks and uncertainty have reached such a level that it requires an acceleration also in the development of our governance mechanisms." https://t.co/MYwrkoVCGT
— Allan Dafoe (@AllanDafoe) April 5, 2023
In parallel to interest in democratic processes for AI governance, some people point out that new technology can be built in ways that work in favor of democratic processes, or in ways that favor authoritarian structures:
Worrisome paper showing something many suspected. I think beside standard AI alignment/safety there is a need to work on tools and institutions to resist AI-empowered authoritarianism. https://t.co/NdQu1SGmX9
— Anders Sandberg (@anderssandberg) March 15, 2023
Another common theme among these perspectives is the need for new modes of governance that keep pace with cutting-edge technology, ensuring that reflective deliberation about its use keeps up with the development of new capabilities.
How can we have "effective governance [of AI] at the rate of technological change"?
Well, there are new ways we can rapidly govern global tech, which have been explored first with social media... https://t.co/N6WcF26Peu
(I hope to share more about using this to govern AI soon!)
— Aviv Ovadya 🥦 (@metaviv) March 18, 2023
4. Dangers of Rapid Development
Talk to the City clusters: Pausing training of powerful AI systems; parts of LLM safety & technical investigations
Another specific concern raised by many voices – many of them adjacent to the previous cluster, in our graph – is the speed of development of new technology, compared to how quickly we are coming to understand its effects.
Before we scramble to deeply integrate LLMs everywhere in the economy, can we pause and think whether it is wise to do so?
This is quite immature technology and we don't understand how it works.
If we're not careful we're setting ourselves up for a lot of correlated failures.
— Jan Leike (@janleike) March 17, 2023
In particular, people in this group point to the effects of race dynamics on all other issues faced by the field. Many signatories of the FLI letter cited this incentive for speed, and the need to counter it, as motivating factors:
I have signed this letter as well.
The current race towards ever more powerful AI is insanely irresponsible and not sustainable.
While I don't fully agree with how everything in the letter is framed, this is still a good reality check. https://t.co/f2gxeqZqQW
— Connor Leahy (@NPCollapse) March 29, 2023
My trust in the large AI labs has decreased over time. AFAICT they're starting to engage in exactly the kind of dangerous arms race dynamics they explicitly warned us against from the start.
— Arram Sabeti (@arram) April 2, 2023
Others felt that the FLI letter's call for a six-month pause in development did not go far enough:
Eliezer Yudkowsky wrote an article in Time today calling for an indefinite moratorium on AGI development. While I don't agree with every detail, I think it's basically right, and if there were an open letter proposing what Eliezer is proposing here, I would sign it
— Jeffrey Ladish (@JeffLadish) March 30, 2023
The US pausing all AI R&D for 6 months would not give China enough time to catch up in AI. However, I can't see what would happen in 6 months (esp. w/o AI R&D) that would make the FLI and their chums OK with restarting fast-paced advanced AI research 6 months later.
— Ben Goertzel (@bengoertzel) April 1, 2023
This group had a variety of views on how best to handle development races between countries, but its members were largely in agreement about the importance of achieving as much international cooperation as possible towards slowing down the development of advanced AI.
5. Human Interests are Difficult to Formalize
Talk to the City clusters: Why future alignment is hard; parts of Comparing machine & human intelligence
Separate from the difficulty of ensuring present and future AI systems work towards human interests is the question of what, exactly, those human interests are. Human morality is complex, full of nuance, and subjective – making it difficult to describe our values and goals clearly enough for use in formal systems.
No one knows how to describe human values. When we write laws we do our best, but in the end the only reason they work at all is because their meaning is interpreted by other humans with 99.9% identical genes implementing the same basic emotions and cognitive architecture.
— Arram Sabeti (@arram) April 2, 2023
I agree that too few people are working on AI alignment. But there's absolutely no reason to think that 'AI alignment with human values is solvable', and there are good reasons to think it isn't (given the diversity, complexity, animosity, & hypocrisy of human values).
— Geoffrey Miller (@primalpoly) March 29, 2023
People in this group point out that many of our existing systems are misaligned, and that current approaches to the alignment of new technology will have similar issues:
That's mainly how people pursue AI alignment. Doesn't work.
"Does what those training it want"
— amplifies existing misaligned dynamics from finance and geopolitics
— is led by short-sighted values (and so—in the eyes of an AI which learns better values—a moral error)
— leads to… https://t.co/nEwJcHGQwj
— Joe Edelman (@edelwax) March 30, 2023
And in some sense, specifying our intent even for simple software systems has been a problem throughout the history of the field:
This is quite possible; though, just a thought: in a sense, work on alignment started with looking for the very first bug in the very first program - it's been always hard to make programs behave the way we intended them to (i.e. "align" them with our goals), no? https://t.co/UlToFX73y9
— Irina Rish (@irinarish) March 19, 2023
As an aside, this current twitter dataset includes fewer voices from this perspective than we expect to hear from in the future, as more people from relevant non-technical fields (e.g. ethics, psychology, and moral philosophy) join the conversation.
6. Optimistic AI Futures
Talk to the City cluster: AI optimism
Most distinct from the other groups in this discourse are the people suggesting reasons to be optimistic about the integration of cutting-edge AI – from expectations that safety will be a solvable problem, to excitement about amazing new experiences it could bring about for humanity.
Optimistic discussions of AI safety point to similar problems with past technologies which we now generally recognize as safe:
Some folks say "I'm scared of AGI"
Are they scared of flying?
No!
Not because airplanes can't crash.
But because engineers have made airliners very safe.
Why would AI be any different?
Why should AI engineers be more scared of AI than aircraft engineers were scared of flying?
— Yann LeCun (@ylecun) April 3, 2023
Others expect that as AI improves, it will pose fewer risks to human interests, and that advances in AI safety will correlate with technological progress:
I genuinely think that above a certain level of agency, systems will be self aligning, that leaving alignment to one of the present coalitions of political AI ethicists, regulators or xriskers would have bad results, and that AGI will have better solutions for our future than us.
— Joscha Bach (@Plinz) April 1, 2023
Still others make less specific claims about outcomes, reflecting on how humanity's development of tools has brought us abundance in the past – and, in some cases, suggesting that our uncertainty about the future means that unimaginably positive outcomes are also possible:
"We should take the plunge. We already have taken the plunge. We designed/tolerated our decentralized society so we could take the plunge." https://t.co/mrHY3xhcuk, "See you all on the other side."
— tylercowen (@tylercowen) March 27, 2023
I want as many of my brothers and sisters, all humans who desire to, to augment their capabilities at will, with no externally imposed limit. Only we can save us.
— Alexandros Marinos 🏴☠️ (@alexandrosM) March 29, 2023
Conclusions & Further Questions
The complexity, impact, and global scale of this discussion result in a diversity of approaches and motivations. Our goal here was to group perspectives by what they consider the most important facets of the problem and which actions they believe should be taken to address it. As these perspectives come into contact with one another, they sometimes generate interesting syntheses and novel ideas – but they also lead to detailed arguments about the best paths forward.
One common theme we noticed from people across these categories was an interest in more coordination between these different approaches to the problem of AI safety. While the groups outlined above have different priorities and suggestions for how to ensure safe adoption of AI, many participants note that these differences are outweighed by their common goals.
The current AI debate seems to neatly got everybody divided: AI safety vs ethics, doomers vs non-doomers, sceptics vs believers, experts vs experts, near vs far risk, regulate vs self-regulate vs laissez faire... plus a helping of uncharitable interpretation of arguments. pic.twitter.com/KM0jHf88kG
— Anders Sandberg (@anderssandberg) April 2, 2023
We desperately need to stop dichotomizing. AI poses serious risks BOTH short-term AND long-term.
You don't need to think superintelligence is remotely imminent to be deeply worried.
How @geoffreyhinton, @elonmusk & I wound up sharing some concerns. https://t.co/pDmDe29UMv
— Gary Marcus (@GaryMarcus) March 28, 2023
These conversations are difficult to capture in sufficient nuance and depth, especially on platforms like Twitter that require concise, discrete statements of beliefs. We plan to create contexts that support richer, multi-threaded approaches to discussing problems of AI safety, and hope others in the space will join us in doing so. This initial report is intended as a jumping-off point for further analysis: the clusters we've identified will inevitably morph over time, as groups deliberate with and learn from each other. Other approaches to analyzing these conversations may even move away from clustering algorithms entirely, towards more granular analysis of the questions under consideration.
It's unproductive to put researchers into 'groups' and politicise issues unnecessarily.
AI concerns are dozens of smaller topics, and those topics do not group along political lines.
There are no 'sides' to pick - it's multi-dimensional - find common ground & avoid extremes🙏 pic.twitter.com/a0xqoDNNPQ
— Emma Bluemke (@emmabluemke) April 3, 2023
With a platform that can articulate the overlap and divergence of different viewpoints, and source agreements and disagreements within groups efficiently and comprehensively, we believe we can develop policy and deliberate more effectively. We hope this initial report offers a starting point for more reflection on the landscape of ideas in this discourse as we continue our work towards clarifying large-scale conversations, with the aim of capturing as much nuance and variety of perspectives as possible. Look for our ongoing work towards that end on our Substack and Twitter.
For comments or suggestions on this project, talk to us at @AIObjectives. If you want to help us build more tools for collective deliberation, get in touch at hello@objective.is. For deeper discussion of the ideas and theory of impact behind all AOI's projects, including Talk to the City, check out our whitepaper. We welcome collaboration with anyone who wants to get involved with the work we're doing – reach out if you want to join us on our mission to use these rapidly advancing technologies to create the best possible futures.
Methods & Caveats
Our approach let us quickly capture discussions unfolding among large groups, but it has a number of drawbacks:
The time-limited nature of the data means we don't capture the clearest, most precise, or most accurate articulations of people's views. Many of the people making comments have previously explained their views in much more nuance and depth elsewhere – often in longform newsletters or articles not quoted anywhere on Twitter – and make only brief allusions to them in tweets. The data here capture the most common and recently relevant aspects of the interlocutors' views, but more targeted surveys would be necessary to capture those views in high fidelity.
Our tweet-by-tweet analysis makes it difficult to capture more complex dialogue between multiple parties: brief replies that were clear in the context of a broader discussion are hard to parse, and we don't evaluate threads as a whole even when they outline a single perspective. This influenced our approach of categorizing tweets by their core goals, values, and assessments – as opposed to their evaluations of other perspectives.
Using an LLM to gauge relevance is an imperfect measure. For this project, we were most concerned with missing tweets relevant to the conversation, and the classifier performed better than expected on that front – in a manual review, only 13 of the tweets it discarded were deemed relevant – but it also included a number of tweets that were unrelated to the discussion or low-content (e.g. single-emoji responses).
Our choice of which voices to include in this discussion is only as diverse as the follow graphs of our initial users. We hope that as this conversation unfolds, it will expand to include more people and ideas relevant to questions of alignment – including ethicists, economists, complex systems theorists, psychologists, and others with insight to bring.
Many thanks to Alex Gray, Emma Bluemke, Gaia Dempsey, and Anna Yelizarova for their input and feedback on this piece.