Daytime Population Mapping: Predictive modelling of daytime population in European countries

Understanding the fine-grained spatial and temporal distribution of the population is essential for fields such as disaster risk management, health-risk analysis, and infrastructure planning. Population dynamics are complex because many factors influence them and because the population's size, density and distribution change continuously. This complexity makes it difficult and expensive to create maps for different temporal and spatial scales, so many studies focus on the residential population (i.e., where people live). One way to obtain more informative population maps is to study the daytime population (i.e., where people are during the day). Especially in urban areas, many factors, such as the mobility of commuters and the distribution of leisure activities, influence the daytime population. For this reason, most studies focus on a single study area and estimate the daytime population in that area after modelling some of these relationships. The advantage of focusing on a single study area is that the feature space of the predictor variables is known. However, transferring the modelled relationships to make predictions in a new area remains a problem, because the feature spaces may differ. This project explores the possibility of creating a data-driven model trained on data from one European country and transferring the model to other, similar European countries. With this goal in mind, multiple open data sets available for all European countries are explored as predictor variables to improve the predictive mapping of the daytime population. The predictor variables used for training were points of interest, residential population, land cover type, road length, and elevation. Since the performance of methods can vary significantly across datasets, three methods of varying complexity were compared: SGD linear SVR, XGBoost, and an Artificial Neural Network (ANN). Baseline models were trained on engineered features.
Then, hyperparameter tuning was performed as an optimization step. After the models had been optimized for their original study area, they were transferred to another European country to compare their extrapolation performance. R², RMSE, and MAE were used as performance indicators for model comparison. A PCA dimensionality reduction step was implemented to compare the similarity between the training data of the study areas. This technique was intended to provide an estimate of the uncertainty of predictions during extrapolation. However, the resulting extrapolation ratios were not correlated with the error residuals during extrapolation, so this measure could not be used to approximate the models' prediction uncertainties. Nevertheless, the PCA feature spaces were visually compared between the study areas to explore the variability within their data and to explain some of the differences in performance when training models in different study areas. Final results, averaged over the two study areas, suggest that XGBoost (R²=0.52) and ANN (R²=0.52) perform similarly when predicting within their training area, with SGD linear SVR (R²=0.47) lagging behind. However, SGD linear SVR performed best during extrapolation, closely followed by the XGBoost model. The ANN models had poor extrapolation performance. Given the complexity of training ANNs and their inferior performance during extrapolation, XGBoost seems the preferred method for the current task.
Keywords: Daytime population, predictive mapping, SGD
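As an illustration of the three performance indicators named in the abstract, the following minimal sketch computes R², RMSE, and MAE for one in-area and one transferred prediction. The function name `regression_metrics` and all numbers are hypothetical and are not taken from the thesis; they only show how a drop in extrapolation performance would appear in these indicators.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R², RMSE and MAE for two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)               # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    # R² is undefined when the observations have no variance.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n
    return {"r2": r2, "rmse": rmse, "mae": mae}


# Hypothetical daytime population counts for four grid cells (made-up values).
observed = [120.0, 80.0, 200.0, 50.0]
in_area_pred = [110.0, 90.0, 190.0, 60.0]    # model applied in its training area
transfer_pred = [150.0, 60.0, 150.0, 90.0]   # model transferred to another area

in_area = regression_metrics(observed, in_area_pred)
transfer = regression_metrics(observed, transfer_pred)

for name, scores in (("in-area", in_area), ("transfer", transfer)):
    print(f"{name}: R2={scores['r2']:.2f} "
          f"RMSE={scores['rmse']:.1f} MAE={scores['mae']:.1f}")
```

RMSE and MAE are expressed in the same unit as the target (here, people per grid cell), while R² is unitless, which is why the three are usually reported together.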
Faculteit der Sociale Wetenschappen