TL;DR: Google Senior Product Manager Joel Yawili shares his insights on the Lyria 3 and Lyria 3 Pro music generation models, discussing the interdisciplinary team behind them, the breakthrough in generating music from 30 seconds to 3 minutes, the multimodal collaboration workflow with Gemini, and considerations regarding artist rights and social impact.
## The Fusion of Technology and Music: Combining Personal Passion with Professional Career
Host Rachid Fine opens this episode of the Made by Google podcast with its topic: **Lyria 3** and **Lyria 3 Pro**, Google’s music generation models, which are available on multiple platforms, including the Gemini app. To dive deeper, he interviews Senior Product Manager **Joel Yawili**.
Joel has worked in the tech industry for years and now combines technology with music. He noted that this is one of the rare projects where his professional passion (product management) directly overlaps with his personal love (music), which he finds incredibly exciting. When asked what instrument he would play if he weren’t in tech, Joel said that, inspired by Carlos Santana’s song "Maria Maria," he has been trying to learn guitar for the past three years. His skills are still beginner-level, but it is the instrument of his dreams.
## Lyria 3: Empowering Diverse Creative Expression
For users who have never heard of Lyria, Joel describes it as a tool designed to unleash creativity. Creative expression takes many forms:
* **Personal Level**: Sending personalized messages or songs to friends.
* **Business Level**: Small businesses building highly customized music marketing campaigns.
* **Content Creation**: YouTube creators customizing background music for their videos.
The core is "creative expression," with the specific form depending on the user’s use case. Joel observed that since the release of Lyria 3, the most common usage has been creating personalized songs for others. The team has witnessed impressive cases, such as users inputting **Terms of Service** or **meeting notes** into the model to generate fun 30-second songs.
One of the most impactful stories involved a user who had not been in touch with a friend for years and could never find the right words to reconnect. Using the earlier short-duration model, he created a song that expressed emotions he had found difficult to articulate, and the music ultimately helped repair the friendship.
## "It Takes a Village": The Philosophy of Building an Interdisciplinary Team
In building Lyria 3, the team structure reflects the diversity of music audiences. Joel pointed out that the spectrum of music users is broad, ranging from ordinary listeners with no musical training, to professionally trained musicians, to intermediate groups like himself who love music but are not formally trained.
Therefore, the team combines highly technical researchers and engineers with people in non-technical roles. Joel cited an African proverb, "**It takes a village**," emphasizing that this project requires everyone’s involvement. It is essential to combine people who understand music with those who do not, because the final product needs to serve all types of users.
In the working environment at Google DeepMind, music is everywhere. Joel admitted that he has generated so many tracks that colleagues even reminded him to "slow down" because of the computational resources he was using. The team’s chat groups are filled with inside jokes, test results, and cool use cases, and the entire team is constantly "playing" with the tool.
## From 30 Seconds to 3 Minutes: The Technological Leap of Lyria 3 Pro
Lyria 3 initially supported generating 30-second clips, while the newly launched **Lyria 3 Pro** supports generating songs up to **3 minutes** long. Joel emphasized that this is not just an extension of duration but includes the following key improvements:
1. **Improved Music and Lyric Quality**: Both overall sound quality and the quality of generated lyrics have improved.
2. **Structure Awareness**: Users can very specifically dictate song structure, including intro length, the presence of bridges, choruses, or verses.
3. **Fine-grained Control**: Users can dive into details, such as specifying the vocalist for specific time segments. Joel gave an example where he requested the first 46 seconds to be sung by a male voice, followed by a transition to female-only vocals, and finally a duet. The Pro model accurately executed these complex instructions.
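The kind of time-segmented instruction Joel describes can be written out as a plain-text prompt. The sketch below is only an illustration of how a user might organize such a request before pasting it into the app. It does not call any Lyria or Gemini API, and the names `Segment`, `fmt`, and `build_prompt`, the "upbeat pop" style, and the segment boundaries after the first 46 seconds are all hypothetical choices made for this example.

```python
from dataclasses import dataclass

MAX_SECONDS = 180  # the 3-minute upper bound stated for Lyria 3 Pro


@dataclass(frozen=True)
class Segment:
    start: int   # seconds from the start of the song
    end: int     # seconds, exclusive
    vocals: str  # free-text description of who sings


def fmt(seconds: int) -> str:
    """Render a number of seconds as m:ss."""
    return f"{seconds // 60}:{seconds % 60:02d}"


def build_prompt(style: str, segments: list[Segment]) -> str:
    """Compose a plain-text prompt from contiguous vocal segments.

    Raises ValueError if segments leave gaps, overlap, or run past 3 minutes.
    """
    expected_start = 0
    lines = [f"Style: {style}"]
    for seg in segments:
        if seg.start != expected_start or seg.end <= seg.start:
            raise ValueError(f"segment {seg} is not contiguous")
        lines.append(f"{fmt(seg.start)}-{fmt(seg.end)}: {seg.vocals}")
        expected_start = seg.end
    if expected_start > MAX_SECONDS:
        raise ValueError("song exceeds 3 minutes")
    return "\n".join(lines)


# Only the 46-second mark comes from Joel's example; the other
# boundaries and the style are assumptions for illustration.
prompt = build_prompt(
    "upbeat pop",
    [
        Segment(0, 46, "male voice only"),
        Segment(46, 120, "female voice only"),
        Segment(120, 180, "duet"),
    ],
)
print(prompt)
```

Keeping the segments as data rather than free text makes it easy to check that they cover the song without gaps and stay within the 3-minute limit before the prompt is submitted.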
Joel shared a nostalgic moment: when he entered a prompt requesting a song from his birthplace, **Kinshasa**, the model not only sang fluently in the local language, **Lingala**, but also captured the nuances of Kinshasa music, such as its characteristic pattern in which the music stops, a voice speaks, and the music then resumes. This moment made him realize the profound impact this technology could have on the general public. He also created a song for his mother about memories of plantains, as an expression of gratitude.
## The Collaborative Workflow of Gemini and Lyria
When handling lyric generation, Joel noted that while the external workflow appears unified, there are detailed distinctions internally for different genres (such as hip-hop vs. country), requiring significant effort to ensure lyrics align with the nuances of specific genres.
**Gemini** plays an important role in this process:
* **Multimodality and World Knowledge**: Gemini possesses macro-level understanding of the world, helping the model interpret the cultural nuances behind complex prompts like "Kinshasa style."
* **Iterating Lyrics**: Joel suggested that advanced creators could first use Gemini’s powerful creativity and thinking-partner capabilities to **iterate and refine lyrics**, and then input the final text into Lyria for music generation. This approach is particularly suitable for users with specific ideas in mind.
Additionally, Lyria supports **multimodal input**. Users can upload images (such as of pets, friends, or objects), allowing the model to interpret the image content and create tailored songs.
## Art Cover Generation and Considerations for Social Impact
Regarding song covers, Google leveraged the capabilities of **Nano Banana**, its image generation model. In the second version of Lyria 3 Pro, in addition to providing default covers, a **custom cover** feature was added. Users can upload their own images, and the system will combine these images to create exclusive covers that match the song’s theme.
Addressing public concerns about whether "AI-generated music will replace human artists," Joel expressed a cautious attitude:
* **Developed with Artists**: Lyria 3 was developed in collaboration with artists, indicating that Google values artist rights and is not developing products in a vacuum.
* **Balancing Fear and Empowerment**: Acknowledging that technology can be frightening for some and empowering for others, Google is committed to finding a balance between the two, reflected in product design and communication.
Ultimately, Google’s goal is to build a tool that respects the existing artistic ecosystem while stimulating public creativity.
Source: How AI Helps You Express Your Vibe | Made by Google Podcast S9E3 (https://www.youtube.com/watch?v=7gnybANIBws)