Sustainable and Accessible AI in Healthcare: Efficient LLM Development
Issue Date
2024-10-17
Language
en
Abstract
This thesis investigates RetNet, a recently introduced architecture for
large language models (LLMs). The study focuses on three aspects:
inference efficiency, training efficiency, and performance. We first
perform a large set of small-scale experiments to explore RetNet's
characteristics and its different representations. We then use these
insights to train a large-scale clinical LLM. RetNet exhibits significant
advantages during inference, especially for long sequences and large
contexts, due to its recurrent representation. During training, RetNet
requires fewer steps than transformers to reach a similar loss, making it
more data-efficient, which is particularly valuable for the data-scarce
medical domain. Additionally, the chunkwise representation in RetNet
allows for efficient training with large block sizes. Large-scale
experiments revealed a performance gap between RetNet and transformers,
with transformers achieving lower loss during pretraining and better
downstream performance on a clinical task. The study also explored
knowledge distillation (KD) to improve the performance of smaller RetNet
models, but KD proved ineffective due to increased training time and
information-sparse softmax values.
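To illustrate the recurrent representation the abstract refers to, the following is a minimal, hypothetical single-head retention sketch in NumPy. It is not the thesis implementation: it omits RetNet's xPos rotation, group normalization, gating, and multi-scale decay, and the function names and decay value gamma = 0.9 are illustrative assumptions. It only shows why the parallel form (full n x n product) and the recurrent form (constant-size state, constant cost per generated token) compute the same output, which is the source of the inference advantage for long sequences.

# Minimal single-head retention sketch (NumPy). Simplified: no xPos
# rotation, no group norm, no gating -- illustrative assumptions only.
import numpy as np

def retention_parallel(Q, K, V, gamma):
    """Parallel form: full (n x n) decay-masked product, used for training."""
    n = Q.shape[0]
    idx = np.arange(n)
    # D[i, j] = gamma**(i - j) for j <= i, else 0 (causal decay mask)
    D = np.where(idx[:, None] >= idx[None, :],
                 gamma ** (idx[:, None] - idx[None, :]), 0.0)
    return (Q @ K.T * D) @ V

def retention_recurrent(Q, K, V, gamma):
    """Recurrent form: one constant-size state update per token."""
    d_k, d_v = K.shape[1], V.shape[1]
    S = np.zeros((d_k, d_v))
    outputs = []
    for q_t, k_t, v_t in zip(Q, K, V):
        S = gamma * S + np.outer(k_t, v_t)   # decayed state update
        outputs.append(q_t @ S)               # O(d_k * d_v) per step
    return np.stack(outputs)

rng = np.random.default_rng(0)
n, d = 6, 4
Q, K, V = rng.normal(size=(3, n, d))
assert np.allclose(retention_parallel(Q, K, V, 0.9),
                   retention_recurrent(Q, K, V, 0.9))

The chunkwise representation mentioned for training combines the two forms: the parallel computation is applied within each block, while a recurrent state is carried across block boundaries, which is what allows training with large block sizes.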
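The knowledge-distillation experiments mentioned at the end of the abstract concern logit-based distillation; the sketch below is a generic, hypothetical version of that loss (temperature-softened teacher softmax combined with hard-label cross-entropy), with the temperature T and weighting alpha chosen arbitrarily rather than taken from the thesis. When the teacher's softmax output is very peaked ("information-sparse"), the soft-target term adds little beyond the hard labels, which is the failure mode the abstract describes.

# Generic logit-based knowledge-distillation loss (PyTorch sketch).
# T and alpha are illustrative assumptions, not the thesis's settings.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, targets, T=2.0, alpha=0.5):
    """Blend soft-target KL divergence with hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                      # rescale to keep gradient magnitude comparable
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1 - alpha) * hard

# Example shapes: (batch, vocab) logits and integer target tokens.
student_logits = torch.randn(8, 32000)
teacher_logits = torch.randn(8, 32000)
targets = torch.randint(0, 32000, (8,))
loss = kd_loss(student_logits, teacher_logits, targets)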
Faculty
Faculteit der Sociale Wetenschappen