Abstract:
Fog forecasts are still inaccurate in predicting fog formation, dissipation and duration. For several years, machine learning (ML) algorithms have increasingly been used alongside numerical fog forecasts, because they are computationally fast and can learn non-linear interactions between variables. Because these models are black boxes, careful and accurate training and evaluation are vital to avoid insufficiently trained models or meaningless scores. Three points that are particularly important for fog prediction are explained below.
1. Fog forecasting datasets consist of autocorrelated variables. In most cases, this causes information leakage between the training data set and the test data set used to evaluate model performance. This leakage can affect the performance scores, because the stronger the information flow between the two sets, the easier it is for the model to score well by memorization.
2. Fog forecasting datasets have a temporal order. To draw conclusions about the performance of an operational model, this temporal order should already be reproduced during model training and evaluation, because in operation the training data points are always older than the data points to be predicted. Commonly used training methods neglect this.
3. Time series used for fog forecasting are usually strongly imbalanced: the fog class occurs far less often than the non-fog class. This imbalance interacts unfavorably with the widely used meteorological scores that are based on the confusion matrix.
If these points are not considered, the resulting forecast can be inadequate without this even being noticed.
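Points 1 and 2 can be illustrated with a small sketch that does not reproduce the study's data or models. Assuming hourly observations indexed 0 to n-1, it contrasts a randomly shuffled split with an expanding-window, time-ordered split and counts how many test points have a directly neighbouring hour in the training set, a rough proxy for leakage through autocorrelation. All function names are hypothetical.

```python
import random


def random_split(n, test_fraction, seed=0):
    """Shuffle all time indices and cut off a test set (ignores temporal order)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(n * test_fraction)
    return sorted(idx[n_test:]), sorted(idx[:n_test])  # (train, test)


def expanding_window_splits(n, n_folds):
    """Yield (train, test) index lists in which every training index precedes
    every test index, mimicking an operational model retrained over time."""
    fold_size = n // (n_folds + 1)
    for k in range(1, n_folds + 1):
        train_end = k * fold_size
        test_end = min(train_end + fold_size, n)
        yield list(range(train_end)), list(range(train_end, test_end))


def neighbour_leakage(train, test):
    """Fraction of test indices whose previous or next time step is in training."""
    train_set = set(train)
    leaked = sum(1 for t in test if t - 1 in train_set or t + 1 in train_set)
    return leaked / len(test)


if __name__ == "__main__":
    n = 1000
    tr, te = random_split(n, test_fraction=0.2)
    print("random split leakage:", neighbour_leakage(tr, te))
    for tr, te in expanding_window_splits(n, n_folds=4):
        assert max(tr) < min(te)  # training data are always older
        print("time-ordered fold leakage:", neighbour_leakage(tr, te))
```

With a random split, most test hours have an adjacent hour in training; with the time-ordered split, only the first test hour of each fold does. Leaving a gap of a few hours between the training and test windows would remove that boundary neighbour as well; a suitable gap length would depend on the autocorrelation of the actual data.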
Therefore, the negative influence of two commonly used training methods that neglect these points on the model scores will be shown using an XGBoost model and a logistic regression model. For comparison, a training and evaluation method that maintains the temporal order, and thus simulates the performance of an operational model, is evaluated. It will also be shown that common meteorological scores, because they are computed from a confusion matrix, share a weakness when the data set is imbalanced: persistence behavior, i.e. a model that largely repeats the most recently observed fog state, remains undetected.
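The persistence weakness can be sketched with a synthetic, hypothetical binary fog series (1 = fog, 0 = no fog) in which fog is rare but, once present, lasts several hours; the Markov-chain parameters are assumptions chosen for illustration only. A persistence forecast simply repeats the previous observation. The contingency-table scores POD, FAR and CSI use common textbook definitions, not necessarily the exact set used in the study. They look respectable here, even though persistence by construction misses every fog onset and issues a false alarm at every dissipation. The hypothetical helper `onset_hit_rate` scores onsets separately and exposes this.

```python
import random


def synthetic_fog_series(n, p_onset=0.01, p_dissipate=0.15, seed=1):
    """Two-state Markov chain (assumed parameters): fog is rare but persistent."""
    rng = random.Random(seed)
    obs, state = [], 0
    for _ in range(n):
        if state == 0 and rng.random() < p_onset:
            state = 1
        elif state == 1 and rng.random() < p_dissipate:
            state = 0
        obs.append(state)
    return obs


def contingency(obs, fc):
    """Return hits, misses, false alarms and correct negatives."""
    hits = sum(1 for o, f in zip(obs, fc) if o == 1 and f == 1)
    misses = sum(1 for o, f in zip(obs, fc) if o == 1 and f == 0)
    false_alarms = sum(1 for o, f in zip(obs, fc) if o == 0 and f == 1)
    correct_neg = sum(1 for o, f in zip(obs, fc) if o == 0 and f == 0)
    return hits, misses, false_alarms, correct_neg


def scores(obs, fc):
    """Common confusion-matrix based verification scores."""
    h, m, fa, cn = contingency(obs, fc)
    return {
        "POD": h / (h + m),          # probability of detection
        "FAR": fa / (h + fa),        # false alarm ratio
        "CSI": h / (h + m + fa),     # critical success index
        "accuracy": (h + cn) / len(obs),
    }


def onset_hit_rate(obs, fc):
    """Fraction of fog onsets (0 -> 1 transitions) that the forecast captured."""
    onsets = [t for t in range(1, len(obs)) if obs[t - 1] == 0 and obs[t] == 1]
    return sum(fc[t] for t in onsets) / len(onsets)


obs = synthetic_fog_series(20000)
# Persistence: the forecast for hour t is the observation at hour t-1.
persistence = [0] + obs[:-1]
print(scores(obs, persistence))
print("onset hit rate:", onset_hit_rate(obs, persistence))  # 0.0: persistence never forecasts an onset
```

Because fog hours in the middle of long events dominate the fog class, the aggregate scores reward repeating the last state; evaluating onset and dissipation separately is one way to reveal this behavior.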
The study is funded by the DFG research project “FOG-ML FOrecasting radiation foG by combining station and satellite data using Machine Learning”.