Recent advances have seen large audio-language models (LALMs), multimodal LLMs that can process and generate audio and/or text, expand beyond basic tasks such as speech recognition to complex capabilities such as audio-grounded reasoning and interactive dialogue.
Evaluation of Large Audio-Language Models…