Projects

Here are some of the latest projects I've been working on. They fall into two main lines of research: (1) Multilingual and Speech Representation Learning, and (2) Speech Evaluation and Human-Aligned Metrics.

Axis 1: Multilingual and Speech Representation Learning

I study how multilingual models acquire and represent linguistic structure, across both speech and text.

I examine how languages interact in shared representation spaces, how to reduce cross-lingual interference, and how multimodal grounding can support more robust and data-efficient learning.

This line of work is influenced by bilingual language acquisition and explores language discrimination, representation disentanglement, and cross-lingual generalization.

Key publications & projects include:

Discriminating Form and Meaning in Multilingual Models (text) — EMNLP 2025 [pdf]
Multimodal grounding for multilingual speech learning (speech + vision) — manuscript under review [preprint]
PhD work on bilingual learning simulations and emergent structure from unsupervised speech input [pdf] (see chapter 4)

Axis 2: Speech Evaluation and Human-Aligned Metrics

I study and develop evaluation methods that reflect human communicative goals, with a focus on speech, multilingual systems, and interpreting. This includes principled speech evaluation taxonomies, cognitive-inspired assessment tasks, and contributions to community benchmarks such as the Zero Speech Challenge.

Key publications & projects include:

Toward Machine Interpreting — EMNLP 2025 [pdf]
Speech evaluation taxonomy — manuscript under review [preprint]
Prosody evaluation benchmarks (ProsAudit, EmphAssess) [pdf ProsAudit] [pdf EmphAssess]
Committee member, Zero Speech Challenge 2021

PhD & Earlier Projects

Below is an overview of earlier projects that laid the foundation for my current research on multilingual learning and human-aligned evaluation.

STELA - Learning Simulation of Language Acquisition

In close collaboration with fellow PhD student Marvin Lavechin.

STELA (STatistical Learning of Early Language Acquisition) is a learning simulation that models language acquisition under the statistical learning hypothesis.

The approach relies on self-supervised models that learn language from raw speech. The STELA framework also makes it possible to generate comparable developmental learning curves at the phonetic and lexical levels.

More info on the project coming soon.

Overview of how learning simulations (like STELA) and infants compare.

paper (1)

paper (2)

video

slides

Modelling bilingual language acquisition

This is the main focus of my thesis and a project that is still ongoing. In short, using the same approach as in the STELA project, I model developmental learning curves for monolingual and bilingual models. These curves can then be compared with existing experimental and observational results.

Thesis (see Chapter 4)

Prosodic Evaluation and Benchmarks

I have recently started looking into how to evaluate speech models on their prosodic capabilities. As part of this effort, I've published two benchmarks, ProsAudit and EmphAssess, which focus on prosodic breaks and emphasis, respectively.

ProsAudit has been integrated into the Zero Resource Speech Challenge language modeling track.

EmphAssess focuses on the transfer of emphasis in Speech-to-Speech models.

EmphAssess

ProsAudit

Automatic Language Similarity

I am interested in the effect that language similarity can have on speech models. Will a model trained on one language perform better on, or transfer better to, languages that are similar to the seed language? What happens if the model is trained on multiple languages?

From there stems another question: what is language similarity? Can models capture it automatically? And what kind of typology will be captured?

I presented a paper at Speech Prosody 2022 in which we ran a pilot study on capturing language typology using i-vectors. I am also looking at the effect of language similarity when modelling various speech-related cognitive processes (language discrimination and separation, the language familiarity effect, language learning, etc.).
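To give a rough idea of what clustering languages from utterance-level vectors can look like, here is a minimal toy sketch. It is not the method from the paper: the three-dimensional vectors, the language labels (lang_a to lang_d), the cosine distance, and the average-linkage merging are all assumptions chosen for illustration, and real i-vectors would come from a trained extractor and have far more dimensions.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def mean_vector(vectors):
    """Average a list of equal-length vectors, dimension by dimension."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cluster_languages(language_vectors, n_clusters):
    """Group languages by average-linkage agglomerative clustering.

    language_vectors maps a language label to a list of utterance-level
    vectors (hypothetical stand-ins for i-vectors). Each language is
    summarised by its mean vector before clustering.
    """
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")

    centroids = {lang: mean_vector(vecs) for lang, vecs in language_vectors.items()}
    clusters = [[lang] for lang in sorted(centroids)]

    def linkage(c1, c2):
        # Average pairwise distance between the members of two clusters.
        dists = [cosine_distance(centroids[a], centroids[b]) for a in c1 for b in c2]
        return sum(dists) / len(dists)

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]

    return sorted(clusters)


# Toy data only: made-up vectors, not real i-vectors or real languages.
TOY_VECTORS = {
    "lang_a": [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1]],
    "lang_b": [[0.8, 0.3, 0.0], [1.0, 0.2, 0.0]],
    "lang_c": [[0.0, 0.1, 1.0], [0.1, 0.0, 0.9]],
    "lang_d": [[0.1, 0.2, 1.0], [0.0, 0.1, 0.8]],
}

print(cluster_languages(TOY_VECTORS, n_clusters=2))
```

The interesting question in practice is which typological properties (if any) the resulting groups line up with, which a toy example like this cannot answer.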

Speech Prosody paper

Automatic clustering of languages typologies using i-vectors

Zero Resource Speech Challenge 2021

I was a co-organiser of the 2021 edition of the Zero Speech Challenge, where, among other things, I developed the semantic and syntactic benchmarks.

ZeroSpeech 2021 is a challenge aimed at Spoken Language Modelling from raw speech. The task consists of learning language models directly from raw audio in an unknown language, without any annotation or text.

For more info, check out the website (the challenge is still open for new submissions!).

website

PDF

video

Get in touch

maureen.deseyssel (at) gmail . com
