Beyond Mean-Squared-Error, Examining the utility of training generative models of images with a loss function based on Mahalanobis distance
Generative models of high-dimensional data, such as images or audio, are an active field of research in machine learning. Traditionally, modelling images has been treated as a regression task, in which models are optimized to minimize the Mean-Squared-Error (MSE) loss function, that is, the average squared Euclidean distance between model outputs and data samples. A significant shortcoming of MSE is that it assumes that pixels are independent of one another, whereas natural images contain strong covariance patterns between pixels (edges, textures, etc.). We examine the utility of learning the parameters of a loss function based on Mahalanobis distance, which takes pixel covariance into account, as an alternative to MSE. The Mahalanobis-based loss function is implemented using a Gaussian mixture model (GMM) with full covariance matrices, which is fitted to the MNIST dataset of hand-written digits. To better isolate the covariance structures learned by the GMM, the eigenvectors of each component’s covariance matrix are extracted. Additional mixture models are then learned over the extracted eigenvectors so that the learned covariance structures can be sampled directly, which somewhat improves the perceptual quality of generated images. A quantitative comparison of the base and modified GMMs indicates that the proposed modifications reduce over-fitting (measured by the difference between train-set and test-set log-likelihood); however, additional research is needed to further improve sample quality and the model’s fit to the data.
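To make the contrast between the two distances concrete, the following is a minimal sketch rather than the thesis's implementation. It assumes NumPy and a hypothetical two-pixel "image" with a hand-picked covariance matrix; the function names `squared_euclidean` and `squared_mahalanobis` and all numeric values are illustrative assumptions. It compares the squared Euclidean distance (the quantity MSE averages) with the squared Mahalanobis distance for two deviations of equal length, and extracts the eigenvectors of the covariance matrix, as the abstract describes doing for each GMM component.

```python
import numpy as np


def squared_euclidean(x, mu):
    """Squared Euclidean distance; ignores any covariance between pixels."""
    d = x - mu
    return float(d @ d)


def squared_mahalanobis(x, mu, cov):
    """Squared Mahalanobis distance d^T cov^{-1} d for a full covariance matrix."""
    d = x - mu
    # Solve cov @ z = d instead of forming the explicit inverse.
    return float(d @ np.linalg.solve(cov, d))


# Hypothetical toy data: two pixels that are strongly positively correlated.
mu = np.array([0.0, 0.0])
cov = np.array([[1.0, 0.9],
                [0.9, 1.0]])

along = np.array([1.0, 1.0])     # both pixels brighten together (typical)
against = np.array([1.0, -1.0])  # pixels move in opposite directions (atypical)

# Both deviations have the same Euclidean length, so MSE cannot tell them apart.
euclid_along = squared_euclidean(along, mu)
euclid_against = squared_euclidean(against, mu)

# The Mahalanobis distance penalizes the deviation that violates the correlation.
maha_along = squared_mahalanobis(along, mu, cov)
maha_against = squared_mahalanobis(against, mu, cov)

# Eigenvectors of the covariance matrix give the principal directions of
# pixel co-variation; eigh returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(cov)

print("squared Euclidean:", euclid_along, euclid_against)
print("squared Mahalanobis:", maha_along, maha_against)
print("eigenvalues:", eigvals)
```

In this toy setting both deviations have squared Euclidean distance 2, while the squared Mahalanobis distance is roughly 1.05 for the deviation that follows the correlation and 20 for the one that opposes it. Up to an additive constant, half the squared Mahalanobis distance is the negative log-likelihood of a Gaussian with that mean and covariance, which is why a full-covariance GMM can serve as a covariance-aware loss.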
Faculteit der Sociale Wetenschappen