PhonoByte
AudioFoundationModels
Evaluation of Large Audio-Language Models (LALMs)
Recent advancements have seen LALMs – multimodal LLMs capable of processing and generating auditory and/or textual input – expand their potential beyond…
Jun 2 • Sankar Mukherjee
Architecting Real-time Speech Interaction: Step-Audio's Approach to Seamless Tool Calling
Building truly intelligent real-time speech interaction systems presents significant technical hurdles.
May 21 • Sankar Mukherjee
VITA-Audio: Real-Time Speech Generation, Redefined
Introducing VITA-Audio, a groundbreaking end-to-end speech model that dramatically reduces latency in real-time speech applications.
May 14 • Sankar Mukherjee
How Voila Speaks with Nuance: The Power of Structured Interleaved Alignment
Voila is a new family of voice-language foundation models.
May 12 • Sankar Mukherjee
A New Era in Audio AI: Introducing Kimi-Audio
The shift in audio AI is underway: from narrow, task-specific systems to unified, foundation-level architectures.
Apr 29 • Sankar Mukherjee