Daytime Population Mapping: Predictive modelling of daytime population in European countries
Authors
Issue Date
2022-10-06
Language
en
Abstract
Understanding the population’s fine-grained spatial and temporal distribution is essential
for fields such as disaster risk management, health-risk analysis, and infrastructure
planning. Population dynamics are complex due to many influencing factors and the
population’s continuously changing size, density and distribution. These complicated
population dynamics make it difficult and expensive to create maps for different temporal
and spatial scales, leaving many studies to focus on the residential population (i.e.,
where people live).
One way to obtain more informative population maps is to study the daytime
population (i.e., where people are during the day). Especially in urban areas, many
factors, such as the mobility of commuters and the distribution of leisure activities, influence
the daytime population. For this reason, most studies focus on a single study
area and estimate the daytime population in that area after modelling some of its relationships.
The advantage of focusing on a single study area is that the feature space
of the predictor variables is known. However, the transferability of modelled relationships
to create predictions in a new area remains a problem due to potentially differing
feature spaces. This project explores the possibility of creating a data-driven model
trained on data from one European country and transferring the model to other similar
European countries. With this goal in mind, multiple open data sets available for all European
countries are explored as predictor variables to improve the predictive mapping
of the daytime population. Predictor variables used for training were points of interest,
residential population, land cover type, road length, and elevation.
Since the performance of methods can vary significantly for diverse datasets, three
methods of varying complexity were compared: SGD linear SVR, XGBoost, and an
Artificial Neural Network (ANN). Baseline models were trained on engineered features.
Then, hyperparameter tuning was performed as an optimization step. After the models
had been optimized for their original study area, they were transferred to another European
country to compare their extrapolation performance. R², RMSE, and MAE were
used as performance indicators for model comparison.
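The three indicators named above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the thesis code; the function name `regression_metrics` and the sample values are hypothetical and chosen only to show the arithmetic.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return R^2, RMSE and MAE for paired observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    mean_true = sum(y_true) / n
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)              # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
    }

# Hypothetical daytime population counts per grid cell (not from the thesis).
observed = [120.0, 80.0, 300.0, 45.0]
predicted = [100.0, 90.0, 280.0, 60.0]
scores = regression_metrics(observed, predicted)
print(scores)
```

Computing the same three scores once on held-out cells from the training country and once on cells from the transfer country gives a direct view of how much performance is lost during extrapolation.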
A PCA dimensionality reduction step was implemented to compare the similarity
between the training data of the study areas. The aim of this technique was to
provide an estimate of the uncertainty of predictions during extrapolation. However, the
resulting extrapolation ratios were not correlated with the error residuals during extrapolation.
Thus, this measure could not be used to approximate the models’ prediction
uncertainties. Nevertheless, the PCA feature spaces were visually compared between
the study areas to explore the variability within their data and to help explain why
model performance differed depending on the study area used for training.
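The idea of comparing feature spaces through PCA can be illustrated with a simplified sketch. The abstract does not define how the extrapolation ratio was computed, so the measure below is an assumption made for illustration: the share of target-area samples whose projection onto the first principal component of the training data falls outside the range seen during training. All function names and sample values are hypothetical, and the code uses only the standard library; a real workflow would more likely use a library PCA with several components.

```python
import math

def standardize(rows, means=None, stds=None):
    """Z-score each column; reuse training means/stds when they are given."""
    cols = list(zip(*rows))
    if means is None:
        means = [sum(c) / len(c) for c in cols]
        # A zero standard deviation is replaced by 1.0 to avoid division by zero.
        stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
                for c, m in zip(cols, means)]
    scaled = [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]
    return scaled, means, stds

def first_component(rows, iterations=200):
    """Leading principal axis of centred data, found by power iteration."""
    n, d = len(rows), len(rows[0])
    cov = [[sum(r[i] * r[j] for r in rows) / n for j in range(d)] for i in range(d)]
    vec = [1.0] * d  # assumes this start is not orthogonal to the leading axis
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    return vec

def share_outside_training_range(train_rows, target_rows):
    """Assumed proxy: fraction of target samples beyond the training PC1 range."""
    train_z, means, stds = standardize(train_rows)
    target_z, _, _ = standardize(target_rows, means, stds)
    axis = first_component(train_z)

    def project(row):
        return sum(a * v for a, v in zip(axis, row))

    train_scores = [project(r) for r in train_z]
    low, high = min(train_scores), max(train_scores)
    outside = sum(1 for r in target_z if not low <= project(r) <= high)
    return outside / len(target_rows)

# Hypothetical features per cell: [residential population, road length in km].
train = [[100, 1.0], [200, 2.1], [300, 2.9], [400, 4.2]]
target = [[150, 1.5], [250, 2.5], [900, 9.0], [1000, 10.5]]
share = share_outside_training_range(train, target)
print(share)
```

The target data is scaled with the training means and standard deviations on purpose, so that both areas are expressed in the same coordinate system before they are compared.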
Final results, averaged over the two study areas, suggest that XGBoost (R² = 0.52)
and ANN (R² = 0.52) perform similarly when predicting within their training
area, with SGD linear SVR (R² = 0.47) lagging behind. However, SGD
linear SVR performed best during extrapolation, closely followed by the XGBoost model.
The ANN models had poor extrapolation performance. Due to the complexity of training
ANNs and their inferior performance during extrapolation, XGBoost seems to be the preferred
method for the current task.
Keywords: Daytime population, predictive mapping, SGD
Faculty
Faculteit der Sociale Wetenschappen