Sustainable and Accessible AI in Healthcare: Efficient LLM Development

Issue Date

2024-10-17

Language

en

Abstract

This thesis investigates RetNet, a recently introduced architecture for large language models (LLMs). The study focuses on three aspects: inference efficiency, training efficiency, and performance. We first perform a large set of small-scale experiments to explore RetNet’s characteristics and its different representations. Afterwards, we use the gained insights to train a large-scale clinical LLM. RetNet exhibits significant advantages during inference, especially for long sequences and large contexts, due to its recurrent representation. During training, RetNet requires fewer steps than transformers to reach a similar loss, proving more data-efficient, which is particularly valuable for the data-scarce medical domain. Additionally, RetNet’s chunkwise representation allows efficient training with large block sizes. Large-scale experiments revealed a performance gap between RetNet and transformers, with transformers achieving lower loss during pretraining and better downstream performance on a clinical task. The study also explored knowledge distillation (KD) to improve the performance of smaller RetNet models, but KD proved ineffective due to increased training time and information-sparse softmax values.
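
The abstract attributes RetNet's inference advantage to its recurrent representation. As a rough illustration of that mechanism (not code from the thesis), the sketch below implements single-head retention in its recurrent form as described in the original RetNet paper (Sun et al., 2023): a fixed-size state S is decayed by a factor gamma and updated with the outer product of the current key and value, so per-token compute and memory stay constant in the sequence length instead of growing like a transformer's KV cache. The function name, the decay value, and the shared dimension d for queries, keys, and values are illustrative assumptions.

```python
import numpy as np

def recurrent_retention(Q, K, V, gamma=0.97):
    """Hedged sketch of single-head retention in recurrent form.

    Q, K, V: arrays of shape (seq_len, d).
    Returns outputs of shape (seq_len, d).
    """
    seq_len, d = Q.shape
    S = np.zeros((d, d))               # fixed-size recurrent state
    outputs = np.zeros_like(V)
    for n in range(seq_len):
        # State update: decay the old state, add the current key-value outer product.
        S = gamma * S + np.outer(K[n], V[n])
        # Read-out for token n: constant cost per token, independent of sequence length.
        outputs[n] = Q[n] @ S
    return outputs

# Toy usage: random inputs with assumed shapes (seq_len=8, d=4).
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((8, 4)) for _ in range(3))
print(recurrent_retention(Q, K, V).shape)  # (8, 4)
```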

Faculty

Faculteit der Sociale Wetenschappen