From Dialects to Peptides: Scalable and Efficient AI for People
This talk presents a unified AI framework for decoding complex human and biological signals, from African and Arabic dialects to proteomics, by prioritizing rigorous measurement, cultural competence, and computational efficiency to ensure global scalability and accessibility.
Overview
We argue that real-world impact at global scale rests on three pillars: rigorous quantification of what is missing, explicit modeling of culture rather than mere linguistic form, and strict efficiency constraints that make deployment feasible. Centered on speech and language technologies for approximately 1.7 billion people in Africa and the Arabic-speaking world, we introduce a unified toolkit for decoding complex, noisy signals in low-resource settings.
We begin by addressing linguistic diversity through measurement-first evaluation. We consolidate fragmented resources into coherent, comparable assessment protocols that stress-test systems under dialectal variation, code-switching, informal registers, and domain-specific conditions such as health, education, and governance. We then extend beyond surface-level correctness to cultural competence, examining whether models respect community norms and values and handle figurative meaning and contextual appropriateness.
Throughout, efficiency is treated as a first-class scientific objective: we develop training and inference strategies that preserve capability while reducing computational cost, latency, and data demands, thereby widening access beyond high-resource environments. Finally, we demonstrate the broader methodological reach of this agenda in proteomics. Treating mass spectra as structured evidence, we build fast, accurate sequencing methods that incorporate model self-correction. Together, these threads yield a unified approach to decoding the complex languages of both humans and biology.
Presenters
Muhammad Abdul-Mageed, Canada Research Chair in Natural Language Processing and Machine Learning; Associate Professor, School of Information and Department of Linguistics, The University of British Columbia
Brief Biography
Muhammad Abdul-Mageed is the Canada Research Chair in Natural Language Processing and Machine Learning and an Associate Professor at the University of British Columbia. As Director of the UBC Deep Learning & NLP Group, Co-Director of the SSHRC I Trust Artificial Intelligence partnership, and Co-Lead of the SSHRC Ensuring Full Literacy initiative, he develops multilingual, multimodal, and cross-cultural large language models that are culturally sensitive, equitable, efficient, and socially aware. These models advance applications across speech, language, and vision—supporting improved human health, more engaging learning, safer social networking, and reduced information overload.
He has secured more than $20 million in research funding, with support from the Gates Foundation (through Clear Global), NSERC, and the Canada Foundation for Innovation, and additional contributions from Google, AMD, and Amazon. A recipient of the 2025 Abdul-Hameed Shoman Award for AI and Arabic and of more than 10 best paper awards, Dr. Abdul-Mageed has authored more than 200 peer-reviewed publications. He has advised the Government of Canada on generative AI policy and delivered invited lectures, keynotes, and panel presentations in more than 25 countries. His work has been featured in outlets such as MIT Technology Review, The Globe and Mail, Euronews, and Libération.