Changing the U-net in a diffusion model

Issue Date
2023-01-27
Language
en
Abstract
Diffusion models are an important development in image generation: they outperform the previous state of the art, big GANs, while also being easier to train and implement. As a result, diffusion has replaced GANs as the current state of the art and is used in projects such as DALL-E and Google's Imagen. However, diffusion is still very new, so much remains unexplored, and most diffusion models use similar implementations. In particular, many of them rely on U-nets that differ little from one implementation to the next. Exploring and modifying these U-nets is therefore important, either to discover potential improvements or, at the least, to explain why certain components of the U-net are crucial. Three implementations were attempted: a baseline with no changes; a U-net with the time embeddings removed; and a U-net on top of a U-net. Each was used to generate images from the Stanford Cars dataset. The baseline performed decently, producing a mostly complete picture of a car at 100 epochs. The variant without time embeddings produced only noise, but this result is probably not representative and was most likely caused by a programming error. Finally, the U-net on U-net was far too slow to produce anything, taking about three hours to reach five epochs.
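To illustrate what removing the time embeddings means, here is a minimal sketch in plain Python. It assumes the sinusoidal timestep embedding that is common in diffusion U-nets; the abstract does not say which embedding the thesis used. The names `timestep_embedding`, `condition`, and `use_time_embedding` are hypothetical and chosen only for this illustration.

```python
import math


def timestep_embedding(t, dim, max_period=10000.0):
    """Sinusoidal embedding of diffusion step t into `dim` values (dim must be even).

    Assumption: this is the common sinusoidal formulation, not necessarily
    the one used in the thesis.
    """
    half = dim // 2
    # Geometrically spaced frequencies, from 1 down towards 1 / max_period.
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    return [math.sin(t * f) for f in freqs] + [math.cos(t * f) for f in freqs]


def condition(features, t, use_time_embedding=True):
    """Add the step embedding to a feature vector, or skip it (the ablation)."""
    if not use_time_embedding:
        # Hypothetical ablation switch: the features carry no information about t.
        return list(features)
    emb = timestep_embedding(t, len(features))
    return [x + e for x, e in zip(features, emb)]


features = [1.0, 2.0, 3.0, 4.0]
early = condition(features, t=10)
late = condition(features, t=500)
ablated_early = condition(features, t=10, use_time_embedding=False)
ablated_late = condition(features, t=500, use_time_embedding=False)
```

With the embedding enabled, `early` and `late` differ, so later layers can tell a lightly noised input from a heavily noised one. With it disabled, `ablated_early` and `ablated_late` are identical, and the network has to infer the noise level from the image alone.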
Faculty
Faculteit der Sociale Wetenschappen