AI Researcher · Audio & Speech

I'm Umberto Cappellazzo, a Research Associate in the Department of Computing at Imperial College London, UK. I'm a member of the iBUG group led by Maja Pantic and am fortunate to be advised by Stavros Petridis. I currently work on self-supervised audio representation learning, speech tokenizers, and multimodal LLMs. My main focus has been advancing audio-visual speech recognition (AVSR) with Large Language Models, in close collaboration with Meta AI, and I've published several papers in this direction (IEEE ICASSP ×3, Interspeech ×3, IEEE ASRU, NeurIPS). Previously, I obtained my PhD in Information Engineering and Computer Science from the University of Trento, Italy.

RESEARCH

🗣️

Audio-Visual Speech Recognition

LLM-based AVSR that reads lips and listens at once — state-of-the-art on LRS2/LRS3 via modality-aware compression and LoRA.

Llama-AVSR · LRS3 · Multimodal LLM
🪆

Elastic & Matryoshka Models

One model, many granularities. Matryoshka representation learning and Mixture-of-Experts for adaptive inference.

MoME · Omni-AVSR · Elastic
🎛️

Parameter-Efficient Fine-Tuning

Adapters, LoRA, prompt-tuning, and soft Mixture-of-Adapters — matching full fine-tuning at a fraction of the cost.

PEFT · Soft-MoA · AST
🌊

Self-Supervised Audio Learning

Large-scale self-supervised audio pre-training via next-embedding auto-regressive objectives in latent space.

SSL · Spectrograms · Tokenizers
🔍

Interpreting Multimodal LLMs

Probing attention sinks, massive activations, and modality contributions via Shapley attribution.

Shapley Values · Attention Sinks · Massive Activations
♻️

Continual Learning for Speech

Learning sequentially without forgetting — rehearsal, distillation, and contrastive objectives for spoken language understanding (SLU).

Continual Learning · SLU · Distillation

BIO

Mar 2025 — present

Research Associate · Imperial College London (iBUG team)

Advised by Stavros Petridis in the group led by Maja Pantic. Focus on multimodal LLMs and self-supervised audio representation learning.

Feb 2024 — Nov 2024

Visiting Researcher · Imperial College London

Nine-month visit with iBUG exploring LLMs for AVSR, advised by Stavros Petridis — the work behind Llama-AVSR.

Summer 2023

JSALT 2023 · Le Mans, France

Member of the "Finite-state methods with modern neural architectures" group, working on early-exit techniques for CTC/MMI.

Nov 2021 — Jan 2025

PhD · University of Trento

"Efficient Knowledge Transfer and Adaptation for Speech and Beyond." Defended cum laude, Jan 2025. Supervised by Daniele Falavigna and Alessio Brutti.

2016 — 2019

M.S. Telecommunication Engineering · University of Padova

Thesis: deep-learning-based ECG delineator. Supervised by Michele Rossi and Matteo Gadaleta.

2013 — 2016

B.S. Information Engineering · University of Padova

Thesis: message authentication over an ideal or noisy channel. Supervised by Nicola Laurenti.

PUBLICATIONS

Interspeech '26 [Long Paper Track]

Dr. SHAP-AV: Decoding Relative Modality Contributions via Shapley Attribution in AVSR

U. Cappellazzo, S. Petridis, M. Pantic

Interspeech '26

VIB-AVSR: Variational Information Bottleneck for Noise-Robust LLM-Based AVSR

P. Arora, N. Singh, U. Cappellazzo, S. Petridis, M. Pantic

Interspeech '26

MambAdapter: Lightweight Mamba-Based Adapters for PEFT in Speech and Audio

S. Ali, U. Cappellazzo, M. Ravanelli

ICASSP '26

Omni-AVSR: Towards Unified Multimodal Speech Recognition with LLMs

U. Cappellazzo, X. Liu, P. Ma, S. Petridis, M. Pantic

ICASSP '26

Mitigating Attention Sinks and Massive Activations in AVSR with LLMs

Anand, U. Cappellazzo, S. Petridis, M. Pantic

NeurIPS '25

MoME: Mixture of Matryoshka Experts for Audio-Visual Speech Recognition

U. Cappellazzo, M. Kim, P. Ma, H. Chen, X. Liu, S. Petridis, M. Pantic

ASRU '25

Adaptive Audio-Visual Speech Recognition via Matryoshka-Based Multimodal LLMs

U. Cappellazzo, M. Kim, S. Petridis

Interspeech '25

Scaling and Enhancing LLM-based AVSR: A Sparse Mixture of Projectors Approach

U. Cappellazzo, M. Kim, S. Petridis, D. Falavigna, A. Brutti

ICASSP '25

Large Language Models Are Strong Audio-Visual Speech Recognition Learners

U. Cappellazzo, M. Kim, H. Chen, P. Ma, S. Petridis, D. Falavigna, A. Brutti, M. Pantic

MLSP '24

Parameter-Efficient Transfer Learning of Audio Spectrogram Transformers

U. Cappellazzo, D. Falavigna, A. Brutti, M. Ravanelli

ACL Findings '24

Continual Contrastive Spoken Language Understanding

U. Cappellazzo, E. Fini, M. Yang, D. Falavigna, A. Brutti, B. Raj

Interspeech '24

Efficient Fine-tuning of Audio Spectrogram Transformers via Soft Mixture of Adapters

U. Cappellazzo, D. Falavigna, A. Brutti

Interspeech '24

Evaluating and Improving Continual Learning in Spoken Language Understanding

M. Yang, X. Li, U. Cappellazzo, S. Watanabe, B. Raj

ICASSP '24

Improving Continual Learning of Acoustic Scene Classification via Mutual Information Optimization

M. Yang, U. Cappellazzo, X. Li, S. Watanabe, B. Raj

ICASSP '24 WS

Training Dynamic Models using Early Exits for ASR on Resource-Constrained Devices

G. A. Wright, U. Cappellazzo, S. Zaiem, D. Raj, L. Ondel Yang, D. Falavigna, M. Ali, A. Brutti

Interspeech '23

Sequence-Level Knowledge Distillation for Class-Incremental End-to-End SLU

U. Cappellazzo, M. Yang, D. Falavigna, A. Brutti

Interspeech '23

An Investigation of the Combination of Rehearsal and Knowledge Distillation in Continual Learning for SLU

U. Cappellazzo, D. Falavigna, A. Brutti

News

LATEST UPDATES

05 Jun '26
3/3 papers accepted to Interspeech 2026 (1 long, 2 regular): Dr. SHAP-AV, VIB-AVSR, MambAdapter. See you in Sydney! 🇦🇺
13 Mar '26
New paper Dr. SHAP-AV — first comprehensive study of modality contributions in AVSR at scale. Project · Paper · Code
17 Jan '26
Two papers accepted to IEEE ICASSP 2026: Omni-AVSR and a study on attention sinks & massive activations in audio-visual LLMs.
19 Sep '25
MoME accepted to NeurIPS 2025 — unifying Matryoshka representation learning with sparse Mixture-of-Experts.
07 Aug '25
Llama-MTSK accepted to IEEE ASRU 2025. See you in Honolulu! 🌺
22 May '25
Llama-SMoP accepted to Interspeech 2025 — a sparse Mixture of Projectors for LLM-based AVSR.
11 Mar '25
Joined Imperial College London (iBUG) as a Research Associate, advised by Stavros Petridis.
15 Jan '25
Defended my PhD cum laude at the University of Trento. Dissertation · Slides

Get in touch

LET'S COLLABORATE

Always happy to discuss audio & speech representation learning, multimodal LLMs, or collaborations. Reach me at umbertocappellazzo [at] gmail [dot] com.