NotebookLM's podcast tool: A (potential) game changer for impact data storytelling
NotebookLM Post 1 of 3: My initial experiments
TL;DR: The data utilization gap in nonprofits isn't just about collecting better data – it's about communicating it in ways that drive action. As AI tools like NotebookLM evolve, they offer exciting possibilities for bridging this gap through innovative storytelling. But with great power comes great responsibility. Here's the first in a three-part deep dive into the implications, challenges, and future possibilities of AI-powered impact storytelling in the nonprofit sector. This post documents my initial experiments with NotebookLM's podcast feature and my reflections on them.
In nonprofit evaluation, we're constantly seeking ways to make our data sing. But what if it could actually speak? Enter NotebookLM's new podcast feature – a tool that can potentially transform how we communicate complex information. I recently played around with NotebookLM and conducted a few experiments to understand its potential. I was blown away by the ease and initial accuracy of the tool and am cautiously optimistic that it can develop into something very powerful for anyone who focuses on storytelling to communicate impact, complexity, and systems change. I’ll explore this over a series of three posts. This first one charts my experiments and initial reflections. The next two dive a bit deeper into what this means for evaluation and social impact.

Discovering NotebookLM
I’d been using NotebookLM for a few weeks on a straightforward research project and found it incredibly useful for synthesis across sources. (If you haven’t played with it yet, I recommend starting here.) I also liked that it seemed more ‘transparent’ and flexible than projects in Claude because of the way it cites sources within its answers and lets you toggle sources on and off for different queries. But then I listened to an episode of The AI Daily Brief in which the host used the new podcast feature to present a dual-host, AI-generated discussion of the longshoremen’s strike. Something clicked as I listened to the AI-generated conversation about a complex labor issue. I realized after the discussion was over that I understood more about the topic from that brief audio snippet than I might have from reading the original article (let’s face it, reading on the phone is often filled with distractions!). The possibilities for making dense information accessible suddenly seemed endless.
From Paper to Podcast: An Experiment in Accessibility
Intrigued, I decided to test NotebookLM with one of my own papers – a complex piece on social change that I had written a few years ago. With a mix of excitement and trepidation, I fed the paper into the tool and hit 'generate'. (It really is that easy – the push of a single button. I’ll get to the limitations of that later, along with my hopes for the future of the tool.)
The result? Pretty darn awesome.
Two AI hosts enthusiastically dove into a conversation about my paper, distilling key points and even prioritizing the real-world examples I'd tucked into sidebars, which I thought was an interesting editorial choice. It wasn't just a regurgitation of facts; it was a genuine dialogue that made my ideas more approachable and engaging.
But was it accurate? What were the tradeoffs? I needed to know more.
Under the Microscope: AI Analyzing AI
An AI-generated podcast being presented on The AI Daily Brief – itself a podcast – was a nice meta moment, so I decided to create my own. For my meta twist, I turned to another AI tool (my beloved Claude 3.5 Sonnet) to analyze the transformation from paper to podcast. I gave Claude my original paper, used Otter to transcribe the NotebookLM podcast, and gave Claude that transcript as well. Then I asked it to analyze the results (full prompt in the footnotes below).
The insights were reassuring. According to Claude (and from my listening, I fully agree), here’s how the podcast from NotebookLM compared to my original paper.
The good:
Increased Accessibility: The conversational format made complex concepts more digestible. My original paper was academic (shocker); the podcast simplified it considerably.
Better Engagement: The dialogue style was more engaging than formal academic prose. Research supports this!
Examples Emphasized: The AI hosts cleverly used my case studies to anchor the discussion, whereas they were more of a sidebar in the original paper. This makes sense from a narrative standpoint, but I was still surprised to see it do this.
The challenging:
Depth Dilemma: Some nuances were lost in the translation to a more casual format.
Citation Conundrum: The podcast format made it difficult to include proper citations. My paper had citations a reader could easily flip to; no equivalent exists in the podcast. This is part of why much of the research on podcasts suggests pairing them with visuals or text to support multi-modal learning.
Tone Shift: The informal conversation style, while engaging, was a significant departure from the original academic tone. From my point of view, this was a great thing. But it won’t work for all stakeholders.
These insights sparked a deeper reflection on the tool's potential and limitations. If this is how it handled a theoretical paper, what would it do with more typical evaluation data?
From Theory to Practice: Tackling Real Evaluation Data
Intrigued by my initial success with the feature, I continued experimenting. Could NotebookLM handle raw evaluation data? I took a recent project – a global nonprofit's AI literacy training initiative – and put it to the test. Beyond an initial check for test responses and the like, I did little to clean or prepare the data; I wanted to see what my AI tools were capable of.
I created two sources for my NotebookLM notebook. First, I used a Qualtrics results PDF export (that's it: two clicks to the Results tab and export as PDF) and gave that to Claude, asking it to summarize the data (prompt in the footnotes below). I then gave NotebookLM both Claude's summary and the original Qualtrics export. In my first experiment, I also gave NotebookLM the original session design and delivery script, but the resulting podcast overemphasized the design rather than the evaluation data. Even after I removed that source, it still featured heavily when I regenerated the podcast. This seems to be a glitch, so I ended up creating a new notebook with just the two data-centric sources I wanted it to focus on.
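For the curious, the Claude step in this pipeline can also be scripted. Here's a minimal sketch, assuming the Anthropic Python SDK and the pypdf library; the file names, model ID, and prompt wording are illustrative, and in my own experiments I simply used the Claude web interface:

```python
# Hypothetical sketch: summarize a Qualtrics PDF export with Claude, producing
# a text summary you can then upload to NotebookLM as a second source.
# Assumes `pip install anthropic pypdf` and an ANTHROPIC_API_KEY env variable.
from pypdf import PdfReader
import anthropic

# Extract the text layer from the Qualtrics results export (path is illustrative).
reader = PdfReader("qualtrics_results_export.pdf")
survey_text = "\n".join(page.extract_text() or "" for page in reader.pages)

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
message = client.messages.create(
    model="claude-3-5-sonnet-20240620",  # model ID current as of this writing
    max_tokens=2000,
    messages=[{
        "role": "user",
        "content": (
            "I'd like your help reviewing end-of-session surveys from a recent "
            "training. Identify the overall reaction, key strengths, areas for "
            "improvement, and representative quotes for themes in the "
            "open-ended data.\n\n" + survey_text
        ),
    }],
)

# Save the summary as a plain-text file to add to the NotebookLM notebook.
with open("claude_summary.txt", "w") as f:
    f.write(message.content[0].text)
```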
The result? An 8.5-minute audio piece that not only presented the data clearly but also suggested next steps for the organization. It was concise, engaging, and actionable – everything we strive for in evaluation reports.
The client team's reaction was even more exciting. One colleague called it "the best breakdown and advertisement for this initiative" they'd heard. The podcast soon made its rounds in the organization, sparking discussions and, more importantly, understanding.
Reflecting on AI's Role in Impact Storytelling
As I reflected on these experiments, the potential of AI-generated podcasts in evaluation and nonprofit communication became clear to me. As this tool and others like it are further developed, I see some really promising possibilities.
Enhanced Accessibility: Complex data becomes more approachable through conversation.
Increased Engagement: Stakeholders might actually listen to a podcast when they'd skip a report. How many board members or leadership teams are inundated with data and documents they don’t have time to dive into?
Actionable Insights: The format naturally lends itself to discussing implications and next steps. I was excited that the podcast went there on its own in its analysis of the data; it's where we aim to take consumers of evaluation data, but we aren't always successful at the 'what's next' part.
Efficient Communication: Deliver key findings in a format busy leaders can consume on the go. It's no substitute for meaning-making sessions and spending time with the data, but if it tees up a more meaningful conversation and offers an entry point for everyone, that's a great first step.
But we must also navigate challenges. Here’s what I hope to see in the tool as it develops, or from tools that decide to compete with it:
Accuracy Oversight: Human review remains crucial to ensure the AI doesn't misinterpret data. Right now, there's no way to correct something the podcast gets wrong. For example, it continually mispronounced the name of the initiative (it doesn't handle acronyms well) and mischaracterized the tool's training data, and I had no way to fix either.
Customization Constraints: Current limitations on controlling length and tone need addressing. Right now, you have no idea how long your podcast will be, and you can't control the output. I imagine small toggles are forthcoming that will let you set a target length and adjust the tone; that would make the tool exponentially more useful.
Representation in Voices: The current AI hosts may not reflect the diversity of our stakeholders. The two hosts, while sounding of different genders, sounded stereotypically white, and there was no way to adjust this or select different voices. Many other platforms can do this, and while I know NotebookLM is experimental, having different ages, races, and accents represented is an important part of effective communication with stakeholders.
Development of these tools is moving incredibly fast. Between the time I drafted this post last week and the time it left my queue, NotebookLM rolled out updates to the podcast feature. There is now a 'customization' box (image below) where you can guide the hosts toward the things you'd like them to focus on. It's not quite the simple toggles I described above, but it's certainly a step toward the customization features that will make this tool truly useful.
AI Storytelling: Ready to Play?
Ready to dip your toes into the world of AI-powered data narratives using NotebookLM? After you’ve accessed NotebookLM, here are three micro-moves to get you started:
Podcast Your Last Report: Take a recent evaluation report and create an AI podcast version. Compare the two (both yourself and using AI!) and reflect on the differences.
Stakeholder Listening Party: Share both written and podcast versions with a small group. Gather feedback on comprehension and engagement.
Complexity Challenge: Use NotebookLM to explain a complex, systems-level evaluation. See how the conversational format handles nuanced information.
The Future of Data Storytelling: Why I’m Excited
NotebookLM's podcast feature is just the beginning. It challenges us to rethink how we share our findings, engage our stakeholders, and, ultimately, drive change with our data. I invite you to join me in this exploration, to experiment, reflect, and shape the future of data storytelling in the nonprofit sector.
In my next post, I’ll dive deeper into specific use cases for AI-generated podcasts in nonprofit communication. From board reports to donor updates, we'll explore how this tool can revolutionize the way we share our impact. Stay tuned, and in the meantime, I'd love to hear about your experiments with AI in evaluation. Have you tried similar tools? What challenges do you foresee? I want to hear about it!
Footnotes:
Prompt for Claude to analyze NotebookLM’s podcast feature conversion of my original paper: “Good morning, Claude. I recently worked with software to convert a paper of mine into a podcast-style dialogue. I would like you to perform an analysis of how successful this was, what might be concerning about this, and what might be promising. I'm curious for you to determine how accurate the effort was to move from paper to podcast. I would like your reflection on the overall style and tone of the two pieces, how they compare, and/or how they complement one another. I'm also open to other things that you might notice. I plan to share this analysis as an exploration of how we can leverage podcast conversion tools for storytelling in the social impact sector, but I want to be sure to speak to the advantages and disadvantages. I'll give you the original paper and the podcast transcript for your analysis. Thank you!”
Prompt for Claude to analyze the Qualtrics results export: “I'd like your help reviewing end of session surveys from a recent training I created. For the training, I designed a 45 minute session to introduce people to AI, specifically a local installation called [name]. I then trained staff on how to run the sessions, and they trained over 300 fellow colleagues. This data is from the survey given out at the end of the sessions. I will include the session structure/plan and the survey results. Your analysis should be thorough, identifying the overall reaction to the session, key strengths, areas of improvement, and representative quotes for themes in the open-ended data.”