1) Select the MODEL:
A Linear Regression is used to investigate the presence of linear relations between power consumption and weather conditions.
A Nearest Neighbour approach highlights a similarity relation between weather conditions and power consumption on different days.
A Gaussian Process Regression tries to identify the correlation among power consumption values on the basis of the similarity of weather conditions.
An Artificial Neural Network builds highly non-linear relations between weather conditions and power consumption, at the cost of a drastically higher over-fitting risk.
2) Select the SPECS:
Thanks to the strong theoretical foundations of the Linear Model, it is possible to identify which weather features are statistically significant for the regression. A complete model, which uses all the features, can therefore be reduced to a simplified one, in which the non-significant regressors are dropped.
The aim of this reduction is to keep the same predictive power while decreasing the amount of information used, moving in this case from seven to three features.
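The reduction from a complete to a simplified model can be sketched as follows. This is only an illustration under stated assumptions: the data are synthetic, the function names are hypothetical, and the rule "keep a regressor when its |t| statistic exceeds 2" is one common convention, not necessarily the test used here.

```python
import numpy as np

def ols_t_statistics(X, y):
    """Fit y = b0 + X b by least squares; return coefficients and t-statistics."""
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])        # add the intercept column
    xtx_inv = np.linalg.inv(A.T @ A)
    beta = xtx_inv @ A.T @ y
    residuals = y - A @ beta
    dof = n - A.shape[1]                        # degrees of freedom
    sigma2 = residuals @ residuals / dof        # noise variance estimate
    std_err = np.sqrt(sigma2 * np.diag(xtx_inv))
    return beta, beta / std_err

def significant_features(X, y, threshold=2.0):
    """Indices of the columns of X whose |t| exceeds the threshold (assumed rule)."""
    _, t = ols_t_statistics(X, y)
    return [j for j in range(X.shape[1]) if abs(t[j + 1]) > threshold]

# Synthetic data (assumption): seven features, only the first three matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 7))
y = 3.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + 1.0 * X[:, 2] + rng.normal(scale=0.5, size=500)

kept = significant_features(X, y)
print("features kept in the simplified model:", kept)
```

The simplified model is then refitted on `X[:, kept]` only, and its prediction error can be compared with that of the complete model.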
2) Select the SPECS:
A regression through the Nearest Neighbour approach looks for a strong similarity between past and future weather conditions, and gives as power consumption prediction a combination of past consumption values. However, weather conditions alone might not be enough: an unexpectedly warm, summer-like day may occur during spring, while power consumption remains spring-like.
Hence, a complete model takes this possibility into account, unlike one that does not use the relative position of the days in the calendar.
Moreover, it is necessary to choose the neighbourhood size, i.e. how many past similar days to use for prediction.
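A minimal sketch of the two variants, with and without the calendar term, is given below. The records, the plain average of the neighbours, the `calendar_weight` parameter and all function names are assumptions made for illustration.

```python
import math

def calendar_distance(day_a, day_b, period=365):
    """Circular distance in days between two day-of-year values."""
    diff = abs(day_a - day_b) % period
    return min(diff, period - diff)

def knn_predict(history, query_weather, query_day, k=3, calendar_weight=0.0):
    """Average the consumption of the k most similar past days.

    history: list of (weather_vector, day_of_year, consumption) tuples.
    calendar_weight: 0 ignores the calendar; larger values penalise
    past days that are far from the query day within the year.
    """
    def distance(record):
        weather, day, _ = record
        weather_dist = math.dist(weather, query_weather)
        return weather_dist + calendar_weight * calendar_distance(day, query_day)

    neighbours = sorted(history, key=distance)[:k]
    return sum(consumption for _, _, consumption in neighbours) / len(neighbours)

# Hypothetical records: (temperature in degrees C,), day of year, consumption.
history = [
    ((12.0,), 100, 50.0),   # spring days
    ((14.0,), 105, 52.0),
    ((26.0,), 200, 30.0),   # summer days
    ((27.0,), 205, 28.0),
]

# An unexpectedly warm spring day: 25 degrees on day 110.
weather_only = knn_predict(history, (25.0,), 110, k=2)
with_calendar = knn_predict(history, (25.0,), 110, k=2, calendar_weight=1.0)
print(weather_only, with_calendar)
```

With weather alone the two summer days are selected; once the calendar term is added, the two spring days become the nearest neighbours. The neighbourhood size `k` and the weight of the calendar term would both have to be tuned on real data.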
2) Select the SPECS:
A Gaussian Process Regression arises as a generalization of a Linear Model whose coefficients have prior distributions. As a consequence, the power consumption is distributed as a Gaussian vector whose covariance matrix depends directly on the weather conditions of the corresponding days.
The covariance is larger the closer two days are (in terms of weather conditions) and decreases as their distance grows. The correlation decay commonly follows the exponential of the negative squared distance. To increase the number of past days involved in the prediction of future consumption, it is possible to soften the decay by using the exponential of the negative distance instead.
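The difference between the two decay laws can be seen in a short numerical comparison. The function names and the `length_scale` parameter are assumptions; note also that some conventions place a factor 1/2 in the exponent of the squared-exponential kernel, which is omitted here.

```python
import math

def squared_exponential(distance, length_scale=1.0):
    """Correlation exp(-(d / l)^2): fast decay with distance."""
    return math.exp(-(distance / length_scale) ** 2)

def exponential(distance, length_scale=1.0):
    """Correlation exp(-d / l): slower decay at large distances."""
    return math.exp(-distance / length_scale)

# Compare the two kernels at a few distances (in units of the length scale).
for d in (0.0, 0.5, 1.0, 2.0, 3.0):
    print(f"d={d:.1f}  squared-exp={squared_exponential(d):.4f}  "
          f"exp={exponential(d):.4f}")
```

Beyond one length scale the exponential kernel assigns a larger correlation than the squared-exponential one, so more distant days keep a non-negligible weight in the prediction.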
2) Select the SPECS:
To obtain meaningful results, an Artificial Neural Network has to be monitored during the training session so that it does not over-fit the training data.
This risk grows quickly with the number of nodes (or neurons) activated. One of the most common ways to counter it is to split the calibration dataset into training and validation sets, while using the Levenberg-Marquardt iterative updating procedure. Alternatively, similarly to what is commonly done for linear regressions, it is possible to use an adaptive regularization technique, which penalizes neurons that adapt excessively to the data and so promotes generalization of the Artificial Neural Network. This approach is known as Bayesian Regularization, and it is highly sensitive to input and output data: if the weather data are not correctly normalized and the consumption is not deseasonalized, the neural network is not able to learn.
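The data preparation and the stopping rule behind the training/validation approach might look like the sketch below. It does not implement Levenberg-Marquardt or Bayesian Regularization: the training loop itself is left out, and the chronological split, the 20% validation share, the `patience` rule and all function names are assumptions.

```python
import statistics

def split_calibration(samples, validation_fraction=0.2):
    """Split chronologically: earliest samples train, latest ones validate."""
    cut = len(samples) - int(len(samples) * validation_fraction)
    return samples[:cut], samples[cut:]

def standardise(train, other):
    """Scale both sets with the mean and standard deviation of the training set."""
    mean = statistics.fmean(train)
    std = statistics.pstdev(train)
    def scale(values):
        return [(v - mean) / std for v in values]
    return scale(train), scale(other)

def best_epoch(validation_losses, patience=3):
    """Early stopping: epoch with the lowest validation loss seen before
    `patience` consecutive epochs pass without improvement."""
    best, best_loss, waited = 0, float("inf"), 0
    for epoch, loss in enumerate(validation_losses):
        if loss < best_loss:
            best, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best

# Hypothetical daily temperatures and validation losses.
temperatures = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0, 24.0, 26.0, 28.0]
train, valid = split_calibration(temperatures)
train_scaled, valid_scaled = standardise(train, valid)
stop_at = best_epoch([1.0, 0.8, 0.7, 0.75, 0.9, 1.1, 0.6])
print(len(train), len(valid), stop_at)
```

The validation set is scaled with the training statistics so that no information from the validation period leaks into training. The weights retained are those of the epoch returned by `best_epoch`.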
3) Set a SEASONALITY:
Power consumption varies strongly with the seasons. In particular, minima are reached during summer and maxima during winter. This produces a general sinusoidal pattern with yearly periodicity.
Moreover, weekends and holidays can be expected to show a different power consumption from weekdays.
This pattern has to be combined with an increase in volatility during winter (which can be taken into account by using a logarithmic transformation) and with the presence of auto-regressive effects on a daily basis.
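The ingredients listed above can be collected into a set of regressors, as in the following sketch. It is one possible encoding, not necessarily the one used here: a single yearly harmonic, a weekend flag, the logarithm of consumption as target and the previous day's log consumption as auto-regressive term. Holidays are left out because they require a calendar that is not specified; all names and figures are hypothetical.

```python
import math
from datetime import date, timedelta

def seasonal_features(day):
    """Regressors for one calendar day: yearly harmonic plus weekend flag."""
    angle = 2.0 * math.pi * day.timetuple().tm_yday / 365.25
    return {
        "cos_year": math.cos(angle),
        "sin_year": math.sin(angle),
        "is_weekend": 1.0 if day.weekday() >= 5 else 0.0,
    }

def design_rows(start, consumption):
    """Pair each day (from the second one on) with its regressors and target.

    The target is the log of consumption, so that swings proportional to
    the consumption level become additive.
    """
    logs = [math.log(c) for c in consumption]
    rows = []
    for i in range(1, len(logs)):
        features = seasonal_features(start + timedelta(days=i))
        features["previous_log"] = logs[i - 1]   # daily auto-regressive term
        rows.append((features, logs[i]))
    return rows

# Hypothetical consumption for three consecutive days starting on 2023-01-01.
rows = design_rows(date(2023, 1, 1), [100.0, 110.0, 120.0])
for features, target in rows:
    print(features, round(target, 4))
```

A regression of the target on these features gives the seasonal component, and the residuals are the deseasonalized series that the models can then work on.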
