Using AI to Give People a Voice, a Case Study in Michigan

Brittney Gallagher, Colleen McKenzie, Bruno Marnette, Deger Turan

About
Initial Findings
Final Results
Areas for Improvement
Next Steps
Discussion
Methods
Demo Video
Acknowledgments

At the AI Objectives Institute, we have been exploring how modern AI technologies could assist under-resourced communities. In this post, we share what we learned from a recent case study, in which we used our AI-powered analysis platform, Talk to the City, to help formerly incarcerated individuals raise awareness of the challenges they face when reintegrating into society.

Talk to the City’s Heal Michigan report is where advocacy meets legislation to create change
— Shawanna Vaughn, Founder and Director of Silent Cry

About

In collaboration with Silent Cry, a non-profit serving formerly incarcerated people, we used Talk to the City (TttC) in an interview pilot project called Heal Michigan to explore perspectives on reentry challenges for these returning citizens. Using AI to analyze video interviews, the project captures the nuanced experiences and challenges faced by formerly incarcerated individuals, highlighting common issues such as digital literacy, job access, and housing discrimination. This approach not only surfaces general trends and individual stories, but can facilitate direct communication of these challenges to local lawmakers, advocating for meaningful change.

The implementation of AI technologies in this context is not without its challenges. Issues such as accurately categorizing claims, avoiding repetition, and maintaining context in personal narratives have been identified as areas for improvement. Despite these hurdles, the project underscores the potential of AI-driven tools like TttC to enhance advocacy efforts for underrepresented groups.

At its core, the Heal Michigan case study and the broader TttC project represent a promising step towards leveraging technology to bridge the gap between under-resourced communities and decision-makers. By refining and expanding these tools, there is an opportunity to empower these communities, enabling them to influence governance and policy decisions directly.

How to read the report

The report below is interactive: clicking on a topic takes you to that topic and its subtopics. Clicking on a claim plays the quote and video associated with that claim.

Initial Findings

Correctly identifying topics and subtopics

In early iterations, LLMs struggled to correctly create topics and subtopics. In many instances, topics were nearly identical:

  • “Reintegration Challenges for Returning Citizens”

  • “Challenges Faced by Incarcerated and Formerly Incarcerated Individuals”

  • “Reintegration Challenges for Formerly Incarcerated Individuals”

  • “Rehabilitation and Reintegration of Incarcerated Individuals”

As we moved to GPT-4 Turbo, this was resolved. Our best results came from an iterative process, where we used initial results to inform our own opinions on the ideal set of top-level topics, and specified those topics in the prompt.

The final Cluster Extraction prompt: 

I will give you a long transcript extracted from different video interviews on the topic of "what challenges are you and the community facing?"

I want you to propose a way to break down the information contained in these comments and arrange these comments into topics and subtopics of interest. Many of the participants are formerly incarcerated (also referred to as returning citizens) or related to someone who is formerly incarcerated and much of the issues relate to that, but not all.

Not all of these are challenges, within many of these interviews were examples of successes, try and include some of those. 

Keep the topic and subtopic names very concise and use the short description to explain what the topic is about.

There should be between 3 and 8 topics. Some ideas for topics are Re-entry challenges, Systemic Issues, Success Stories, Community Support. Topics like education for children would fit under Systemic Issues.
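
To make this concrete, here is a minimal sketch of how a prompt like the one above can be sent to a chat model. This is our illustration rather than Talk to the City's actual code; the function name, the abbreviated prompt text, and the model choice (gpt-4-turbo, per the pipeline overview later in this post) are assumptions.

```python
# A minimal sketch (not TttC's actual code) of sending the cluster
# extraction prompt to a chat model. Requires the `openai` package and
# an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

# The full prompt quoted above would go here (abbreviated for brevity).
CLUSTER_PROMPT = """I will give you a long transcript extracted from
different video interviews ... There should be between 3 and 8 topics."""

def extract_clusters(transcripts: str, model: str = "gpt-4-turbo") -> str:
    """Ask the model to propose topics and subtopics for the transcripts."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": CLUSTER_PROMPT},
            {"role": "user", "content": transcripts},
        ],
    )
    return response.choices[0].message.content
```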

Early versions used dehumanizing language that was not in the dataset to describe clusters

Initially, both GPT-4 and Claude 2 performed better than expected, but many manual changes were still needed. In one example, an early version produced a cluster name containing the term ‘Ex-Convicts’. Neither ‘ex-convicts’ nor ‘convicts’ appeared in any of the transcripts; the model introduced the term on its own. Words like ‘convict’ and ‘inmate’ are considered dehumanizing when referring to formerly incarcerated people.

In the final iteration, using GPT-4 Turbo, this was no longer the case: there were no instances of the model using dehumanizing language unless that language had been used by the participant.

The power of seeing/hearing people tell their stories

We demoed an early version of the report at Black August, an event hosted by Silent Cry. Many guests were encountering a report of this nature for the first time and found interacting with it both novel and compelling. One participant enjoyed seeing themselves on screen and being able to explore the report and see the person behind each claim.

My involvement with the Heal Michigan project has been profoundly insightful. I deeply appreciate the team’s commitment to maintaining a consistent message and a comprehensive understanding throughout our work. The project’s ambition to grasp the complex seasons of change and to leverage technology as a transformative tool is commendable. This innovative approach not only shifts the narrative but also broadens and deepens the conversation around the experiences of those who have been formerly incarcerated. By focusing on providing nuanced insights into their journeys, the Heal Michigan project endeavors to enact positive change, influencing both individual experiences and societal perceptions. I am truly grateful for the opportunity to contribute to this initiative, which seeks to understand and share the intricate stories of these individuals’ lives, fostering a greater understanding and driving meaningful reform.
— Cozine Welch, program coordinator for Michigan Collaborative To End Mass Incarceration and statewide organizer for the Michigan Criminal Justice Program at AFSC

Final Results

GPT-4 Turbo's larger context window and the ability to edit prompts enabled a well-organized report with compelling stories from the interviews. In the final version, the topics were specified in the prompt as suggestions, along with examples of how to categorize ideas.

Claims were flagged by participants in three categories:

  • Inaccurate: The claim was inaccurate; many of these were manually updated.

  • Miscategorized: The claim was filed under the incorrect topic/subtopic; all of these were moved to the correct topic/subtopic.

  • Removed: Something was fundamentally wrong with the claim, or it was a duplicate, and the participant asked for it to be removed from the final report.

A total of 529 claims¹ were generated. Results of the participant review are listed in Table 1.

Initially, 4.91%² of the claims were flagged as inaccurate, miscategorized, or removed. Inaccurate and miscategorized claims or quotes were manually updated and are indicated by a “*” in the report. After correcting these claims, the Revised Error Rate (i.e., the proportion of removed claims) was 2.84%.

Table 1. Results of participant review of the final report

Participant | Claims | Removed | Inaccurate | Miscategorized | Error Rate³ | Corrected | Revised Error Rate
1 | 76 | 4 | 2 | 0 | 7.89% | 2 | 5.26%
2 | 35 | 0 | 0 | 0 | 0.00% | 0 | 0.00%
3 | 37 | 5 | 0 | 0 | 13.51% | 0 | 13.51%
4 | 34 | 0 | 0 | 3 | 8.82% | 3 | 0.00%
5 | 38 | 0 | 1 | 0 | 2.63% | 1 | 0.00%
6 | 40 | 3 | 0 | 0 | 7.50% | 0 | 7.50%
7 | 79 | 3 | 4 | 1 | 10.13% | 5 | 3.80%
8 | 35 | 0 | 0 | 0 | 0.00% | 0 | 0.00%
9 | 51 | 0 | 0 | 0 | 0.00% | 0 | 0.00%
10 | 42 | 0 | 0 | 0 | 0.00% | 0 | 0.00%
11 | 32 | 0 | 0 | 0 | 0.00% | 0 | 0.00%
12 | 30 | 0 | 0 | 0 | 0.00% | 0 | 0.00%
Total | 529 | 15 | 7 | 4 | 4.91% | 11 | 2.84%
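
The two rates can be reproduced from the raw counts above; here is a quick sketch (the function and variable names are ours; see notes 2 and 3 for the definitions):

```python
# Recompute Table 1's rates from the raw counts (sketch; names are ours).
def error_rates(claims, removed, inaccurate, miscategorized):
    error_rate = (removed + inaccurate + miscategorized) / claims
    # Inaccurate and miscategorized claims were corrected in place,
    # so only removed claims count toward the revised rate.
    revised_error_rate = removed / claims
    return error_rate, revised_error_rate

print(error_rates(76, 4, 2, 0))    # participant 1 -> (~0.0789, ~0.0526)
print(error_rates(529, 15, 7, 4))  # totals        -> (~0.0491, ~0.0284)
```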

Claims that were removed from the report

Fifteen claims were removed from the report, for a variety of reasons. Most were removed because they were too similar to another claim made by the same participant; others were removed because context was missing. In each case, the participant requested that the claim be removed.

Some examples:

Table 2. Examples of claims that were removed from the dataset

Claim | Quote | Reasoning⁴
"Access to basic needs like food is a critical first step in the reentry process." | "It might include a cheeseburger. Cuz if I don't get a cheeseburger in you, then we can't talk about anything else." | Though this is true I don't think it is the context I was providing. The cheeseburger was an example of something very simple that may need to be addressed before getting to deeper issues.
"Formerly incarcerated individuals face challenges obtaining a driver's license due to educational barriers." | "A lot of people...don't have a high school education. So can they even pass the test? Is there somebody that interprets and read for them?" | It lacks nuance about what was discussed just before.

Areas for Improvement

Interesting versus obvious

Out-of-the-box LLMs (GPT-4) struggled to pull out what is interesting, especially stories that humanize an issue. On this dataset, the model did a fairly good job of identifying overarching themes, but much of the stories' detailed content was lost in early iterations. For example, in stories about access to housing for formerly incarcerated individuals, the humanizing aspect of an individual's experience is lost when the argument is reduced to “there needs to be better access to housing.”

One participant discusses being released from prison and being unable to find housing after doing all the right things. They chose to live in an abandoned house rather than be on the streets, because that is where they felt safest. Over the years, they fixed up the house, and eventually the owner signed it over to them. In all versions of the report, this story was not included in the list of quotes supporting the importance of better access to housing. Even with changes to the prompt, some of these humanizing stories were left out.

Another participant's experience was also presented without its compelling details. His wife applied for a lease on a new apartment and was approved; when the landlords learned he would also be living there, they asked for his name to be added to the lease and then revoked their approval. In later iterations, this story was included under the claim, “Returning citizens face housing discrimination even with good financial standing.”

While the definition of “interesting” will vary between projects, our long-term plan is to compile a library of prompt templates that allow us to capture the most appropriate types of stories.

Duplicate claims

The version of Talk to the City used for this project did not yet include an automated deduplication process, and the generated reports often included several near-duplicate claims. Duplication was particularly problematic here because the interviews were long and participants often repeated the same idea several times. Future versions of Talk to the City will leverage LLMs to remove unnecessary duplicates coming from the same participant, but we will still keep one quote from each participant who made the same claim, so as to give everyone an equal chance to be heard.
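
One way such a filter might work (a sketch under our own assumptions, not a committed design) is to embed each claim and drop claims from the same participant whose embeddings are nearly identical; the embedding model and similarity threshold below are illustrative:

```python
# Sketch: drop near-duplicate claims from the same participant using
# cosine similarity of text embeddings. Model name and threshold are
# illustrative assumptions, not TttC's implementation.
import numpy as np
from openai import OpenAI

client = OpenAI()

def deduplicate(claims: list[dict], threshold: float = 0.92) -> list[dict]:
    """claims: [{"participant": ..., "text": ...}, ...] in report order."""
    resp = client.embeddings.create(
        model="text-embedding-3-small",
        input=[c["text"] for c in claims],
    )
    vecs = np.array([d.embedding for d in resp.data])
    vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)  # unit-normalize
    kept: list[int] = []
    for i, claim in enumerate(claims):
        is_duplicate = any(
            claims[j]["participant"] == claim["participant"]
            and float(vecs[i] @ vecs[j]) > threshold
            for j in kept
        )
        if not is_duplicate:
            kept.append(i)
    return [claims[i] for i in kept]
```

Note that the check is scoped to a single participant: two participants making the same claim both keep their quotes, preserving the equal chance to be heard described above.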

Explaining high-context references

Many of the ideas discussed may not be clear to readers outside the community. In this report, some readers may lack context on terms like “prison gerrymandering” or “good time credits”. The ability to explain these references within the report would be beneficial.

Additionally, it would be helpful to have links to the numerous organizations and pending legislation mentioned by the participants.

We have experimented with various AI services, including OpenAI's GPT-4 (with the ability to search the web), and it seems that the improvements above could be automated in the near future, although we would prefer such AI services to become more reliable than they are today, to make sure that we only add factually accurate context.
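
As an illustration of what that automation might look like (a sketch only, given the reliability caveat above), a model could be asked to draft a short gloss for each high-context term, with a human checking it before publication:

```python
# Sketch: draft a short explanation of a high-context reference.
# Illustrative only; outputs would need human fact-checking before
# being added to a report.
from openai import OpenAI

client = OpenAI()

def explain_reference(term: str, context_quote: str) -> str:
    prompt = (
        f'In two or three sentences, explain the term "{term}" for a reader '
        f"with no background in criminal-justice policy. The term appeared "
        f"in this quote: {context_quote!r}. If you are unsure of any fact, "
        f"say so instead of guessing."
    )
    response = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```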

Other improvements

Hard-to-find errors

The modern LLMs we used to produce the reports performed very well on average, but they still got things wrong sometimes, and it took significant time and attention to manually look for errors such as miscategorized claims or inaccurate summaries.

In future versions of Talk to the City, we plan to apply several methods to identify issues more automatically. For instance, we plan to instruct a second LLM to review the report and flag possible miscategorizations or inaccuracies, which can then be reviewed by hand.
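
A minimal sketch of what that reviewer pass might look like (our illustration; the prompt wording and function signature are assumptions, not the project's code):

```python
# Sketch: ask a second LLM to check whether a claim matches its quote
# and its assigned topic. Flagged items go to a human reviewer.
from openai import OpenAI

client = OpenAI()

REVIEW_PROMPT = (
    "You are reviewing claims extracted from interview transcripts. "
    "Given a claim, its supporting quote, and the topic/subtopic it was "
    "filed under, answer 'OK' if the claim accurately summarizes the quote "
    "and fits the topic. Otherwise answer 'FLAG:' with a one-line reason."
)

def review_claim(claim: str, quote: str, topic: str, subtopic: str) -> str:
    item = f"Claim: {claim}\nQuote: {quote}\nTopic: {topic} / {subtopic}"
    response = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[
            {"role": "system", "content": REVIEW_PROMPT},
            {"role": "user", "content": item},
        ],
    )
    return response.choices[0].message.content  # "OK" or "FLAG: ..."
```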

Video alignment and continuation

There are some instances where the timestamps are off by a few seconds, and it can be a little jarring when the video starts in the middle of the quote.

Because videos are uploaded to Vimeo in their entirety, playback continues after the quote has been said. Often this is fine, because the participant is going deeper into the topic, but when the quote falls near a transition in the conversation, the continued playback can be awkward.

Both issues could, however, be mitigated in future iterations by using a different transcription service (one able to produce more precise timestamps) and a different video platform (one able to stop playback automatically at the end of a given segment).
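
One possible mitigation, sketched below under our own assumptions (this is not the current pipeline), is to pre-cut each quote into its own clip using the transcript timestamps, so playback starts and stops exactly where the quote does. The file names and timestamps are hypothetical:

```python
# Sketch: cut one clip per quote with ffmpeg, using transcript timestamps.
# Stream copy (-c copy) is fast but cuts at keyframes; re-encoding would
# give frame-accurate starts at the cost of speed.
import subprocess

def cut_clip(video: str, start: float, end: float, out: str) -> None:
    """Extract the [start, end] span (in seconds) of `video` into `out`."""
    subprocess.run(
        ["ffmpeg", "-ss", str(start), "-i", video,
         "-t", str(end - start), "-c", "copy", out],
        check=True,
    )

# Hypothetical example: a quote spanning 12:34.2 to 13:11.8 of interview 3.
cut_clip("interview_03.mp4", 754.2, 791.8, "claim_042.mp4")
```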

Next Steps

Going forward, our top priority is to make Talk to the City easier for communities to experiment with, and to improve the clarity and utility of the reports it generates. We want to support a wide variety of communities in running their own instances of Talk to the City, to create a clear collective source of truth that community members can refer to, and help members understand points of convergence, divergence, and need for further discussion.

We believe current methods, such as multiple choice voting and short form text surveys, cannot capture nuanced discourse in high fidelity—resulting in a lack of detailed understanding of people's perspectives, and even discouraging people from participating at all. 

We hope that by building tools that can incorporate rich data and engaging interactive media, we can help transform mainstream expectations of the quality of collective discourse and coordination. In addition to our inclusion of long-form text and video data, we are working on:

  • WhatsApp integrations for smart surveying tools

  • Integrating geographic and demographic metadata

  • Iterative discussions to track how respondents' views change in response to seeing Talk to the City reports

  • Interactive elicitation tools (as discussed in a previous article)

Get in touch [hello@objective.is] if you are interested in developing or using any of these tools in an open-source context.

Discussion

The Heal Michigan case study demonstrated the potential of leveraging AI technologies to amplify the voices of under-resourced communities. By extracting key claims, topics, and stories directly from video interviews, the report allowed users to explore the lived experiences and perspectives of formerly incarcerated individuals through both general trends in their views and the details of their individual perspectives. Even with a small dataset of 12 participants, struggles around digital literacy, access to jobs, housing discrimination and other barriers to reentry emerged as common ground among participants. The AOI team continues to work with Silent Cry and similar organizations to present these challenges faced by returning citizens directly to local lawmakers, using the TttC report.

Many people and communities feel unable to influence the governance processes that directly affect them, even at local scales. This feeling reflects institutions' increasing reliance on large-scale aggregate data to inform decisions that affect large populations, even though these aggregates often fail to capture the nuance of individual perspectives. But we see potential for new AI technology to present this large-scale data in combination with the detail of individual and small-group perspectives—empowering those groups and amplifying their voices. These tools may help us escape cycles of declining trust in institutions by helping us consult the public—especially marginalized groups—in larger numbers, at a faster pace, and in more transparent ways than existing methods.

While the AI systems used showed promise in automatically surfacing common themes and arguments, the process also highlighted areas for improvement. Challenges included correctly categorizing claims, avoiding repetition, maintaining context around impactful personal stories, and using appropriate language free of dehumanizing labels. Additionally, seamlessly integrating video clips and providing links to external references could further enhance the user experience.

Despite these limitations, the project demonstrated that AI can be used to create reports that put human narratives at the forefront. As language models and other AI capabilities continue to advance, tools like Talk to the City could become powerful platforms for advocacy—allowing under-resourced communities to raise awareness, find common ground, and directly influence decisions that impact them. Expanding and refining this approach holds great potential for giving an authentic voice to underrepresented groups on a wider range of important issues.

Methods

Data collection process

Heal Michigan comprises video interviews with 12 participants (about 8 hours of video), conducted between July 8th and August 7th, 2023. Each participant was asked about challenges their community was facing and whether they had ideas for solutions (e.g., policy measures) to those problems. All interviews were conducted by Brittney Gallagher (one of the authors of this post). The participants were based in Michigan, and 10 were formerly incarcerated individuals, also referred to as returning citizens.

Participants were introduced to us through Silent Cry's network. Each participant was interviewed over Google Meet. The video interviews were transcribed using Descript and edited to remove filler words (such as "uh" and "um"); the interviewer was also edited out. The edited videos were uploaded to Vimeo.

Consent

Each participant was asked whether they consented to their name and interview being available online and to the transcript being anonymously fed into an LLM. In each interview, we explained what an LLM was in non-technical terms and clarified that the transcripts would be uploaded to OpenAI's servers (we used GPT-4).

As this was one of the early demonstrations of TttC and the participants were members of an under-resourced community, we decided to have the participants review the claims generated by the model. Once the report was generated, each participant reviewed it and flagged any claims that were inaccurate, miscategorized, or did not belong in the dataset. After an internal review of the flagged claims, they were either edited or removed from the final report. Any claims that were changed by a human are indicated in the report by “*”. One participant elected to have the interviewer evaluate their claims.

Pipeline overview 

In order to generate the final reports, we started from the original video interviews and applied a series of transformations leveraging different AI services; a sketch of the first steps appears after the list below.

  • We used a speech-to-text AI service called Whisper to extract video transcripts with precise timestamps.

  • We then used an LLM with a large context window (gpt-4-turbo) to identify topics and subtopics in the data.

  • We then processed each interview transcript again with an LLM to extract the key claims (arguments or opinions) expressed by each interviewee.

  • We also used LLMs to match each claim to the appropriate topic and subtopic.

  • For each claim, we kept track of the original quote and used the timestamps in the transcript to map the quote to an exact position in the video.
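
Below is a sketch of the first two steps of this pipeline, reconstructed from the list above. It is illustrative rather than the project's published code; the function names are ours, and the claim-extraction prompt is paraphrased:

```python
# Sketch of the pipeline's first steps: timestamped transcription with
# Whisper, then claim extraction with an LLM. Illustrative only.
import whisper
from openai import OpenAI

client = OpenAI()

def transcribe(video_path: str) -> list[dict]:
    """Return transcript segments with start/end timestamps in seconds."""
    model = whisper.load_model("medium")  # model size is an assumption
    result = model.transcribe(video_path)
    return [
        {"start": s["start"], "end": s["end"], "text": s["text"]}
        for s in result["segments"]
    ]

def extract_claims(transcript: str, taxonomy: str) -> str:
    """Ask an LLM for the key claims, each tied to a quote and a topic."""
    system = (
        "Extract the key claims (arguments or opinions) from this interview "
        "transcript. For each claim, include the exact supporting quote and "
        "the best-matching topic/subtopic from this list: " + taxonomy
    )
    response = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": transcript},
        ],
    )
    return response.choices[0].message.content
```

The quote-to-video mapping in the final step then reduces to looking up each quote's segment in the transcript and using its start timestamp.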

Demo Video

Acknowledgments

Notes

  1. Initially, the total number of claims was 530, but one was removed prior to the count because the participant referenced an employer they were no longer affiliated with.

  2. There was one outlier where a participant flagged nearly half of their claims because they did not like how the quotes were written out; because this was not an issue with the claims themselves, these were not counted as Inaccurate.

  3. Error Rate was calculated as the sum of removed, inaccurate, and miscategorized claims divided by the total number of claims for that participant.

  4. This was the reasoning given by the participant.

 
