I am a Machine Learning Researcher in Apple's Machine Learning Research team, and I am based in Copenhagen, Denmark.
My work sits at the intersection of machine learning, language acquisition, and cognitive science, with a focus on understanding how humans learn and use language, and using those insights to build more data-efficient and human-aligned models.
I completed my PhD in Machine Learning and Cognitive Science at the École Normale Supérieure, under the supervision of Emmanuel Dupoux and Guillaume Wisniewski. I was also an AI Research Scientist Intern at Meta.
Before that, I earned an MSc in Speech and Language Processing from the University of Edinburgh, and a BSc in Psychology from City, University of London. I also worked as a speech recognition engineer, focusing on acoustic modelling, prior to starting my PhD (see my full CV here).
My research explores, among other things:
Multilingual speech and language modeling
Representation learning with limited or realistic data
Evaluation for speech and interpreting systems
Bridging cognitive science and machine learning
Recent work spans multilingual representation learning, multimodal approaches to speech modeling, and the development of evaluation frameworks for speech and interpreting systems. I have also designed cognitively inspired evaluation tasks and contributed to benchmark initiatives such as the Zero Resource Speech Challenge 2021.
My work is grounded in insights from bilingual language acquisition viewed through a reverse-engineering lens. During my PhD, I built unsupervised speech models trained on child-like input to study how language structure can emerge without supervision, and to formulate hypotheses about learning strategies in early bilingual settings. This perspective continues to inform my current research, where insights from language development help guide the design and evaluation of multilingual and speech models.
I am a strong advocate for building ties between the machine learning and cognitive science communities, as I believe both fields benefit from shared insights. Check out the slides from our Interspeech Tutorial on the topic if you're interested!
07/11/2025 - [🏆 AWARD] We received an SAC Highlight Award for our paper "Discriminating Form and Meaning in Multilingual Models with Minimal-Pair ABX Tasks" at EMNLP 2025 in Suzhou!
24/09/2025 - [🤞 PREPRINT] New preprint out: "Leveraging Audio-Visual Data to Reduce the Multilingual Gap in Self-Supervised Speech Models". In this work, led by María Andrea Cruz Blandón during her internship at Apple, we show that visual grounding helps reduce the multilingual gap present in self-supervised speech models.
19/09/2025 - [📄 PAPERS] We’ll be presenting three papers at EMNLP 2025! As a highlight, our paper "Discriminating Form and Meaning in Multilingual Models with Minimal-Pair ABX Tasks" was nominated for the SAC Highlight Awards. Fingers crossed!
16/08/2025 - [🎥 SLIDES] The slides for our Interspeech tutorial on Language Acquisition and Speech Technology are available online: https://zenodo.org/records/17018214
16/08/2025 - [🌎 TRAVEL] I’ll be in Rotterdam for Interspeech 2025, where I’m co-presenting a tutorial on Language Acquisition and Speech Technology (details below).
27/07/2025 - [🌎 TRAVEL] Heading to Vienna for ACL 2025! I’ll be around for the conference and the IWSLT workshop; happy to meet up if you’re attending.
22/05/2025 - [🤞 PREPRINT] New preprint out: "Discriminating Form and Meaning in Multilingual Models with Minimal-Pair ABX Tasks". We apply ABX tasks to multilingual text models to understand how form and meaning representations are organised.
22/04/2025 - [🧑🏫 TUTORIAL] I will be presenting, along with Emmanuel Dupoux and Okko Räsänen, a tutorial at Interspeech 2025 on the topic: Speech Technology Meets Early Language Acquisition: How Interdisciplinary Efforts Benefit Both Fields. Looking forward to seeing many of you!
21/04/2025 - [📄 PAPER] Our paper with Ansgar Endress, entitled "The specificity of sequential statistical learning: Statistical learning accumulates predictive information from unstructured input but is dissociable from (declarative) memory for words", was just published in Cognition. It features some of my very early work as a psychology undergraduate about 10 years ago; nice to see it out!
06/01/2025 - [📄 PAPER] New paper out in Developmental Science: "Simulating Early Phonetic and Word Learning Without Linguistic Categories". This work, a significant part of both Marvin Lavechin's and my PhD research, simulates early language acquisition with self-supervised (SSL) speech models from a cognitive perspective.