Sustainable and Accessible AI in Healthcare: Efficient LLM Development
Issue Date
2024-10-17
Language
en
Abstract
This thesis investigates RetNet, a recently introduced architecture for
large language models (LLMs). The study focuses on three aspects:
inference efficiency, training efficiency, and performance. We first
perform a large set of small-scale experiments to explore RetNet's
characteristics and its different representations. We then use these
insights to train a large-scale clinical LLM. RetNet exhibits significant
advantages during inference, especially for long sequences and large
contexts, due to its recurrent representation. During training, RetNet
requires fewer steps than transformers to reach a similar loss, making it
more data-efficient, which is particularly valuable for the data-scarce
medical domain. Additionally, the chunkwise representation in RetNet
allows for efficient training with large block sizes. Large-scale
experiments revealed a performance gap between RetNet and transformers,
with transformers achieving lower loss during pretraining and better
downstream performance on a clinical task. The study also explored
knowledge distillation (KD) to improve the performance of smaller RetNet
models, but KD proved ineffective due to increased training time and
information-sparse softmax values.
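To illustrate the recurrent representation the abstract refers to, the following is a minimal, hypothetical single-head retention sketch in NumPy. It is not the thesis implementation: it omits RetNet's xPos rotation, group normalization, gating, and multi-scale decay, and the function names and decay value gamma = 0.9 are illustrative assumptions. It only shows why the parallel form (full n x n product) and the recurrent form (constant-size state, constant cost per generated token) compute the same output, which is the source of the inference advantage for long sequences.

# Minimal single-head retention sketch (NumPy). Simplified: no xPos
# rotation, no group norm, no gating -- illustrative assumptions only.
import numpy as np

def retention_parallel(Q, K, V, gamma):
    """Parallel form: full (n x n) decay-masked product, used for training."""
    n = Q.shape[0]
    idx = np.arange(n)
    # D[i, j] = gamma**(i - j) for j <= i, else 0 (causal decay mask)
    D = np.where(idx[:, None] >= idx[None, :],
                 gamma ** (idx[:, None] - idx[None, :]), 0.0)
    return (Q @ K.T * D) @ V

def retention_recurrent(Q, K, V, gamma):
    """Recurrent form: one constant-size state update per token."""
    d_k, d_v = K.shape[1], V.shape[1]
    S = np.zeros((d_k, d_v))
    outputs = []
    for q_t, k_t, v_t in zip(Q, K, V):
        S = gamma * S + np.outer(k_t, v_t)   # decayed state update
        outputs.append(q_t @ S)               # O(d_k * d_v) per step
    return np.stack(outputs)

rng = np.random.default_rng(0)
n, d = 6, 4
Q, K, V = rng.normal(size=(3, n, d))
assert np.allclose(retention_parallel(Q, K, V, 0.9),
                   retention_recurrent(Q, K, V, 0.9))

The chunkwise representation mentioned for training combines the two forms: the parallel computation is applied within each block, while a recurrent state is carried across block boundaries, which is what allows training with large block sizes.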
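The knowledge-distillation experiments mentioned at the end of the abstract concern logit-based distillation; the sketch below is a generic, hypothetical version of that loss (temperature-softened teacher softmax combined with hard-label cross-entropy), with the temperature T and weighting alpha chosen arbitrarily rather than taken from the thesis. When the teacher's softmax output is very peaked ("information-sparse"), the soft-target term adds little beyond the hard labels, which is the failure mode the abstract describes.

# Generic logit-based knowledge-distillation loss (PyTorch sketch).
# T and alpha are illustrative assumptions, not the thesis's settings.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, targets, T=2.0, alpha=0.5):
    """Blend soft-target KL divergence with hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                      # rescale to keep gradient magnitude comparable
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1 - alpha) * hard

# Example shapes: (batch, vocab) logits and integer target tokens.
student_logits = torch.randn(8, 32000)
teacher_logits = torch.randn(8, 32000)
targets = torch.randint(0, 32000, (8,))
loss = kd_loss(student_logits, teacher_logits, targets)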
Faculty
Faculteit der Sociale Wetenschappen