Projects:2016s1-101 Predicting Power Outages from Weather Patterns

Project Members

Augustus Okoye, Siti Nabilah Mohd Razib

Supervisor  : Dr. Brian Ng
Co-supervisor: Dr. Hong Gunn Chew
SAPN Team  : Frank Crisci, Ali Walsh, Ashley Niebling and Surya Gamini

Project Motivation

Power outages on electric lines are a major inconvenience for both the public and SA Power Networks (SAPN). Machine learning algorithms, a branch of artificial intelligence, have been applied successfully to statistical problems such as image and speech recognition. With so many algorithms at our disposal, is it possible to predict power outages on electric lines?

Project Aim

The aim is to use machine learning to make sense of the weather and its impact on the electricity grid. The intended outcome of this project is a machine that can predict outage occurrence from a weather forecast.

Project Overview


As the overview diagram shows, data records are the input to the artificial intelligence (AI). The AI analyses the data and extracts its patterns, which are then used to predict the output.
The data comprises weather parameters (temperature, humidity, wind speed, wind direction) and outage records, from which the AI learns how weather conditions affect outages. In the outage prediction stage, the learned properties of outage and non-outage cases are used to predict future outages from a weather forecast. The focus of this project is the AI algorithm and the testing of its performance.

The Artificial Intelligence

Among the many tree-based supervised machine learning models, the random forest method combines several decision trees to improve prediction accuracy and reduce variance. The variables used by each tree in the ensemble are a randomly selected subset of all candidate predictors. The random forest takes a majority vote over all trees to determine the final prediction.

This is a single tree from the random forest after training. The machine is given a data set without being told which weather properties distinguish outage from non-outage cases. At each node it asks a question that splits the data into smaller classes, and further questions are asked at each split until the smallest groups are reached at the leaves (the bottom level), where each group is classified as either yes-outage or no-outage. However, a single tree does not give high accuracy, especially when a large number of observations are given, because it tends to overfit.
Therefore a random forest, which is a collection of trees, is used, as shown. The final decision of the forest is the majority vote of the individual decision tree results.
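The bootstrap, random-feature and majority-vote steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's implementation: each "tree" is reduced to a single question (a decision stump), the function names are hypothetical, and the toy weather values are invented for the example.

```python
import random
from collections import Counter

def majority(labels):
    """Return the most common label in a list."""
    return Counter(labels).most_common(1)[0][0]

def train_stump(rows, labels, features):
    """Fit a one-question tree: 'is feature f <= threshold?'."""
    best = None
    for f in features:
        for threshold in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[f] > threshold]
            if not left or not right:
                continue  # not a real split
            l_vote, r_vote = majority(left), majority(right)
            errors = (sum(y != l_vote for y in left)
                      + sum(y != r_vote for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, threshold, l_vote, r_vote)
    if best is None:  # no valid split: always predict the majority class
        vote = majority(labels)
        return (features[0], float("inf"), vote, vote)
    return best[1:]

def predict_stump(stump, row):
    f, threshold, l_vote, r_vote = stump
    return l_vote if row[f] <= threshold else r_vote

def train_forest(rows, labels, n_trees=25, n_features=2, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap: draw as many rows as the data set has, with replacement.
        idx = [rng.randrange(len(rows)) for _ in rows]
        # Each tree only sees a random subset of the candidate predictors.
        features = rng.sample(range(len(rows[0])), n_features)
        forest.append(train_stump([rows[i] for i in idx],
                                  [labels[i] for i in idx], features))
    return forest

def predict_forest(forest, row):
    """Final decision: majority vote over all trees."""
    return majority([predict_stump(s, row) for s in forest])

# Invented toy data: (wind speed, temperature, relative humidity).
rows = [(5, 15, 80), (8, 18, 75), (10, 16, 85), (12, 20, 70),
        (40, 35, 30), (45, 38, 25), (50, 33, 20), (55, 36, 35)]
labels = [False, False, False, False, True, True, True, True]  # True = outage
forest = train_forest(rows, labels)
```

Calling `predict_forest(forest, row)` on a new weather record returns the class that most of the 25 stumps vote for. A practical forest would use full-depth trees rather than stumps.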


Real life records of outage events and historical weather data were provided by SAPN.

The geographical distribution of some Automatic Weather Stations (AWS) across SA. A single AWS holds the weather data for multiple electric feeder lines. We focused on the Parafield AWS because its feeders hold the most outage events.
For a given outage event A or B, a window of size 3 is back-sampled from the weather data. This approach is used because an outage at time t_x cannot be affected by a weather record at time t_x′ for x′ > x.
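The back-sampling step can be illustrated with a short Python sketch. It assumes hourly weather records stored as (timestamp, record) pairs. The function name `back_sample`, the record layout and the toy values are hypothetical and are not taken from the SAPN data.

```python
from datetime import datetime, timedelta

def back_sample(weather, event_time, window=3):
    """Return the last `window` weather records stamped at or before event_time.

    Records after the event are never used, since they cannot have
    contributed to the outage.
    """
    earlier = [(t, rec) for t, rec in weather if t <= event_time]
    earlier.sort(key=lambda item: item[0])
    if len(earlier) < window:
        raise ValueError("not enough weather history before the event")
    return earlier[-window:]

# Invented hourly records starting at midnight.
start = datetime(2016, 1, 1, 0, 0)
weather = [(start + timedelta(hours=h), {"wind_speed": 10 + h}) for h in range(6)]

event = datetime(2016, 1, 1, 4, 30)          # outage reported at 04:30
window_records = back_sample(weather, event)  # records at 02:00, 03:00, 04:00
hours = [t.hour for t, _ in window_records]
```

A larger `window` value would give the longer windows used elsewhere in the project.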

The number of generated no-outage observations equals the number of outage observations.
Each outage or no-outage observation point is assigned a predictive set consisting of a finite window of hourly weather data, selected according to its time of occurrence.
The binary observation points together with their weather data form the utility data.
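One possible way to assemble such balanced utility data is sketched below. How the no-outage observations were actually generated is not stated here, so drawing them at random from hours without a recorded outage is an assumption. The function names and the synthetic weather values are likewise hypothetical.

```python
import random

def window_features(weather, hour, window):
    """Flatten the `window` hourly records ending at `hour` into one feature list."""
    flat = []
    for record in weather[hour - window + 1 : hour + 1]:
        flat.extend(record)
    return flat

def build_utility_data(weather, outage_hours, window=3, seed=0):
    """Pair every outage observation with one randomly drawn no-outage observation."""
    rng = random.Random(seed)
    first_usable = window - 1  # earlier hours lack a full window of history
    outages = [h for h in outage_hours if h >= first_usable]
    outage_set = set(outage_hours)
    # Assumption: any hour without a recorded outage may serve as a no-outage case.
    candidates = [h for h in range(first_usable, len(weather))
                  if h not in outage_set]
    no_outages = rng.sample(candidates, len(outages))
    data = [(window_features(weather, h, window), True) for h in outages]
    data += [(window_features(weather, h, window), False) for h in no_outages]
    return data

# Synthetic hourly records: (wind speed, wind direction, temperature, humidity).
weather = [(h % 7, (h * 15) % 360, 20 + h % 5, 50 + h % 10) for h in range(48)]
utility = build_utility_data(weather, outage_hours=[5, 17, 30])
```

Each entry of `utility` is a (features, label) pair whose feature list has window × 4 values, so the 3-hour window here gives 12 predictors per observation.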


There are 4 weather parameters in total: wind speed, wind direction, temperature and relative humidity. Bars marked with a negative sign indicate that the parameter is excluded. OOB is the out-of-bag cross-validation within the algorithm, while Test is the external test performed on the trained algorithm. This diagram shows that the weather parameters collectively determine prediction accuracy: none contributes more than the others, and each parameter holds almost sufficient predictive features.
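The out-of-bag idea can be shown with a brief sketch. It assumes the standard bootstrap used by random forests: each tree is trained on rows drawn with replacement, and the rows that were never drawn are "out of bag" for that tree and can be used to score it. The function name is hypothetical.

```python
import random

def bootstrap_split(n_rows, rng):
    """Draw a bootstrap sample and report which rows were left out of it."""
    in_bag = [rng.randrange(n_rows) for _ in range(n_rows)]
    out_of_bag = sorted(set(range(n_rows)) - set(in_bag))
    return in_bag, out_of_bag

rng = random.Random(0)
n_rows = 1000
in_bag, out_of_bag = bootstrap_split(n_rows, rng)

# Fraction of rows this tree never saw during training.
oob_fraction = len(out_of_bag) / n_rows
# Probability that a given row is missed by all n_rows draws (about 1/e).
expected = (1 - 1 / n_rows) ** n_rows
```

On average roughly a third of the rows are out of bag for any one tree, which is why the OOB score can be obtained without setting aside a separate validation set. The external Test figure still needs data that no tree has seen.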

Due to some low-level design constraints, very little utility data was available for training and testing the algorithm (fewer than 100 observations in some cases). Nevertheless, the algorithm showed over 60% accuracy in most experiments. Ideally, more than 10,000 observations would be used for such an application.
It was found that:

  • The number of outage and no-outage data points must be at a fixed ratio.
  • A 24-hour window of weather data should be used for every observation.
  • The algorithm should be trained on one AWS.

Gini importance measure of the weather parameters 4 hours prior to regular outages. On the top is the FALSE importance, indicating the contribution of the parameters towards no-outage occurrence; on the bottom is the contribution towards outage events. Temperature and relative humidity are shown to be the main contributors in determining no-outage events, while wind speed and direction are most closely associated with outages.
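As background to the Gini importance measure, the sketch below shows the quantity it is built from: the drop in Gini impurity produced by one split. In the usual definition, a parameter's Gini importance accumulates these drops over every node that splits on that parameter, averaged over the trees. How the per-class breakdown in the figure was computed is not stated here, and the function names are hypothetical.

```python
def gini_impurity(labels):
    """Gini impurity of a binary node: 1 - p^2 - (1 - p)^2 = 2p(1 - p)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(1 for y in labels if y) / n  # share of outage cases in the node
    return 2 * p * (1 - p)

def impurity_decrease(parent, left, right):
    """Impurity removed by splitting `parent` into `left` and `right`."""
    n = len(parent)
    return (gini_impurity(parent)
            - len(left) / n * gini_impurity(left)
            - len(right) / n * gini_impurity(right))

parent = [True, True, False, False]  # True = outage

# A question that separates the classes perfectly removes all impurity.
perfect = impurity_decrease(parent, [True, True], [False, False])

# A question that leaves both children as mixed as the parent removes none.
useless = impurity_decrease(parent, [True, False], [True, False])
```

A parameter that often yields splits like the first one ends up with a high importance, while one that only yields splits like the second contributes nothing.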
A 24-hour window of 4 weather parameters gives a total of 96 feature predictors. Principal component analysis is a popular analytic tool that explicitly describes the correlation between variables in a feature space, expressing the total variance in the data as uncorrelated components of the same dimension. Here the analysis shows that only a portion of the data holds the total variance. Since learning machines seek variation in data, this means the majority of the data is redundant and acts as noise.
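The redundancy argument can be made concrete with a two-feature example, which is small enough to solve in closed form. The sketch assumes two predictors that are strongly correlated, such as the temperature in two consecutive hours. The values are invented and the function names are hypothetical.

```python
import math

def covariance(xs, ys):
    """Sample covariance of two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

def pca_2d_variances(xs, ys):
    """Variances along the two principal components of a 2-feature data set.

    These are the eigenvalues of the 2x2 covariance matrix [[a, b], [b, c]].
    """
    a, b, c = covariance(xs, xs), covariance(xs, ys), covariance(ys, ys)
    centre = (a + c) / 2
    radius = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    return centre + radius, centre - radius

# Invented temperatures (degrees C) in two consecutive hours, five observations.
temp_hour_1 = [20.0, 22.0, 24.0, 26.0, 28.0]
temp_hour_2 = [21.0, 23.0, 24.0, 27.0, 29.0]

first, second = pca_2d_variances(temp_hour_1, temp_hour_2)
explained_by_first = first / (first + second)  # share of total variance

n_features = 24 * 4  # 24 hourly records x 4 weather parameters
```

Here almost all of the variance lies along the first component, so the second feature adds very little new information. The same effect across 96 hourly predictors would leave only a few components carrying most of the variance.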


A 24-hour sample of weather data was found to contain sufficient predictive features, but also a large number of seemingly inseparable redundant elements. Overall, the experiments show a high probability of success; however, this study has not revealed an applicable method for extracting the purely predictive components of a given data set.

