Decoding Speech From Human Brain Activity Using Diffusion Models
Issue Date
2023-05-15
Language
en
Abstract
This thesis investigated the use of diffusion models for decoding speech from brain activity, which can
enable the development of brain-computer interfaces to restore communication in severely paralyzed
individuals. Although the field has seen significant progress, existing approaches display several limitations:
they often require recordings of isolated utterances with multiple repetitions, or use brain
data from multiple individuals to generate intelligible speech. To address these limitations, we employed a two-stage
training framework: First, we pre-trained a diffusion-based speech generator on a large speech
corpus, and then utilized the speech generator to develop models that generate speech from brain
activity. We worked on brain data recorded from a single subject during a book reading task and
trained our models to generate speech from single instances of words in the brain data. Our results
showed that our models can generate naturalistic, intelligible speech by mapping brain data to speech
fragments from the pre-training dataset. We conclude that diffusion models are a promising choice for
generating speech from brain activity, and are robust enough to work on the brain activity of a single
subject, without repetitions of utterances. This has the potential to advance the field of speech BCIs
for severely paralyzed individuals.
Faculty
Faculteit der Sociale Wetenschappen
