Both the duration and the frequency of power outages have increased markedly over the last decade. Due to aging infrastructure, the US has more outages than any other industrialized country. Climate change and rising demand for electricity have added pressure to an already stressed system. With each outage costing millions of dollars, any measure that helps prevent one could spare utilities and businesses considerable financial strain.

To prevent outages, it is logical to start by analyzing their causes. A quick glance at recent news gives an overview of some common ones.

Equipment failures:

Equipment failures due to age or overuse can lead to disruption of service in large portions of the network.

Transformer failure in Colorado

An electrical transformer went out causing a major outage in Glenwood Springs, CO.

Underground cable explosion

An underground feeder cable explosion and fire caused electricity to be cut off to a quarter of downtown St. Louis.

Broken electrical pole

A broken electrical pole in New Brunswick caused loss of power to 15,000 people.



Animals:

Snakes in South Carolina

A snake interfering with substation equipment caused 15,000 people to lose power in South Carolina.  

Monkeys in Kenya

A similar event involving a monkey caused an outage throughout Kenya.  

A Bird in Texas

A bird caused a brief power loss in Texas.

Squirrels Almost Everywhere

@cybersquirrel1 created a map of animal-related disruptions.


Weather-related events:

Storms, ice, and lightning strikes can lead to equipment failure and blackouts.

Lightning strikes in Wyoming

A lightning strike left more than 9,000 customers without power in Wyoming.


Human activity:

Construction, car accidents, and other human activities can also disrupt the electrical system.

Car accident in Virginia

An outage was caused when a van struck an electrical pole in Virginia.

A protest causes an electrical shutdown in California

A protester climbing an electrical pole caused an outage in California.



Trees:

Tree limbs falling on aerial lines or roots growing into feeder cables are common causes of equipment failures.

A tree downs an aerial cable in Canada

A limb hitting an aerial line caused a widespread outage in Saskatchewan.


To tackle the problem of predicting and ultimately preventing outages, it is necessary to consider the extent to which the major causes of an outage are predictable. The failure of a piece of equipment that has had past maintenance issues might be highly predictable, while a lightning bolt striking a new transformer might not be. In this first installment, we’ll look at the problem of predicting mechanical failures.

Traditional power utility maintenance frequently relies on a run-to-failure approach, in which equipment is replaced only after a failure occurs. Where allowing equipment to fail is not acceptable, scheduling replacement based on age is a common method. However, age-based replacement is far from an ideal solution, as it may lead to the replacement of expensive equipment that still has decades of life left while doing little to limit catastrophic failures. The goal of GridCure’s preventative maintenance program is to provide a more sophisticated diagnosis of which equipment is likely to fail, so that maintenance and replacement efforts can focus on the equipment that actually needs it.

In some cases, machine learning can be used to augment existing maintenance procedures.

Among the most expensive and critical pieces of equipment in the electrical system are the large power transformers at electrical substations, which can weigh up to 400 tons and cost millions of dollars. Substation transformers step the voltage of an incoming line up or down. A transformer malfunction can be a catastrophic event involving explosions or fires and damage to surrounding equipment. Because of the safety risks that a malfunctioning transformer poses, it is desirable to repair or replace transformers before a failure actually occurs.

Transformer malfunction is generally detected using dissolved gas analysis. Large transformers contain oil, which serves as a cooling agent. Electrical discharge events or overheating cause various chemical reactions in the insulation and oil, producing gases that dissolve in the oil. There are a number of criteria for assessing the health of the transformer from the concentrations of these gases and the ratios between them. One of the most commonly used is Duval’s triangle, a method that compares the relative concentrations of methane, ethylene, and acetylene to classify what type of fault has occurred.
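As a rough illustration of the first step in a Duval-style analysis, the sketch below converts three gas concentrations into the relative percentages that serve as coordinates on the triangle. The function name and the sample reading are hypothetical, and the sketch deliberately stops short of mapping the point to a fault zone, since the zone boundaries should be taken from the published method rather than from this example.

```python
def duval_percentages(ch4_ppm, c2h4_ppm, c2h2_ppm):
    """Return the relative share (in percent) of methane, ethylene, and acetylene.

    The three percentages sum to 100 and locate a sample on the triangle.
    """
    gases = (ch4_ppm, c2h4_ppm, c2h2_ppm)
    if any(gas < 0 for gas in gases):
        raise ValueError("gas concentrations cannot be negative")
    total = sum(gases)
    if total == 0:
        raise ValueError("at least one gas concentration must be positive")
    return tuple(100.0 * gas / total for gas in gases)


# Hypothetical reading in parts per million (illustrative, not measured data).
pct_ch4, pct_c2h4, pct_c2h2 = duval_percentages(ch4_ppm=60.0, c2h4_ppm=30.0, c2h2_ppm=10.0)
print(pct_ch4, pct_c2h4, pct_c2h2)
```

Because only the relative shares matter, two samples with very different absolute concentrations can land on the same point, which is one reason the triangle is usually applied only once gas levels are already considered abnormal.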

Duval’s triangle and similar methods are useful in determining whether a transformer has already undergone a fault, but the method can be unreliable near the boundaries of the graph and may not catch a fault in its early stages. It is also a fairly simple method that doesn’t take into account other features such as the progression of the readings over time, the load that the transformer has experienced, or the maintenance history of the transformer. It would be desirable to develop more sophisticated methods for determining whether a transformer is likely to fail.
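One way to capture the progression of readings over time is to turn a series of gas measurements into a trend feature. The following is a minimal sketch, assuming periodic samples of a single gas; the function name, the sampling days, and the readings are all hypothetical.

```python
def trend_per_day(days, readings_ppm):
    """Least-squares slope of gas concentration against time, in ppm per day."""
    n = len(days)
    if n != len(readings_ppm) or n < 2:
        raise ValueError("need at least two paired samples")
    mean_t = sum(days) / n
    mean_r = sum(readings_ppm) / n
    # Spread of the sampling times around their mean.
    sxx = sum((t - mean_t) ** 2 for t in days)
    if sxx == 0:
        raise ValueError("samples must be taken at different times")
    # Co-variation of time and concentration.
    sxy = sum((t - mean_t) * (r - mean_r) for t, r in zip(days, readings_ppm))
    return sxy / sxx


# Hypothetical acetylene readings taken every 30 days (illustrative values).
days = [0, 30, 60, 90]
acetylene_ppm = [1.0, 2.0, 3.0, 4.0]
slope = trend_per_day(days, acetylene_ppm)
print(slope)  # ppm gained per day
```

A slope like this could sit alongside the latest reading, the load history, and the maintenance record as one more input to a predictive model.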

Statistically, the simplest way to determine how likely a transformer is to fail would be to find a large number of transformers of the same age, brand, dissolved gas analysis results, and level of wear and tear, and to see what fraction of them failed. Of course, in reality, finding a large number of transformers that are qualitatively the same would be quite difficult. It’s likely that one wouldn’t find enough “similar transformers” to get good statistics. Instead of looking for transformers that are exactly the same in all qualities, it makes sense to think about which features of the transformer are most important. Which transformers should be in a comparison group? Transformers of the same age? Transformers of the same brand? Transformers with the same usage level?

One fairly simple way to determine which features are most important is to use a decision tree such as the one below. For the transformer fault problem, a decision tree repeatedly finds the variable most useful in separating groups of transformers that are more likely to fail from those that are less likely to fail. It does this by minimizing a loss function at each split.

In the diagram below, the best criteria for grouping similar transformers are age and brand, but more complicated decision trees could take many factors into account. The tree can be read by following the decisions from the root down to a leaf. A transformer less than 25 years old with more than would have a probability of failure of approximately 679/2679 or 25.3%, while a transformer that was more than 25 years old and had brand A would have an approximate failure rate of 179/291 or 61.5%.
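The split-finding step can be sketched in a few lines. The example below assumes Gini impurity as the loss function (the text does not say which loss is used) and runs on a tiny made-up dataset; the feature names and values are hypothetical. It also reproduces the leaf failure rates quoted above from their counts.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 failure labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(rows, labels):
    """Return (feature, threshold, loss) with the lowest weighted Gini impurity.

    rows is a list of dicts mapping feature names to numbers; a row goes to
    the left group when row[feature] < threshold.
    """
    best = None
    for feature in rows[0]:
        for threshold in sorted({row[feature] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[feature] < threshold]
            right = [y for row, y in zip(rows, labels) if row[feature] >= threshold]
            if not left or not right:
                continue  # a split must put something on both sides
            loss = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or loss < best[2]:
                best = (feature, threshold, loss)
    return best


# Made-up transformers: 1 means the unit failed, 0 means it did not.
rows = [
    {"age_years": 10, "load_pct": 80},
    {"age_years": 15, "load_pct": 40},
    {"age_years": 20, "load_pct": 90},
    {"age_years": 30, "load_pct": 50},
    {"age_years": 35, "load_pct": 85},
    {"age_years": 40, "load_pct": 45},
]
labels = [0, 0, 0, 1, 1, 1]
split = best_split(rows, labels)
print(split)  # age separates the two groups perfectly in this toy data

# A leaf's failure probability is failures divided by transformers in the leaf.
young_rate = 679 / 2679  # about 25.3%
old_brand_a_rate = 179 / 291  # about 61.5%
```

A full tree would apply `best_split` again to each resulting group until the groups become too small or too pure to split further.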


Decision trees are simple to understand, but other methods give better results in practice. Random forests and gradient boosting machines are two different ways of combining large numbers of decision trees to produce more refined probability predictions. Other methods, such as neural networks and support vector machines, can create more complicated nonlinear combinations of variables as a criterion for failure, which are likely to produce better results than simpler linear rules of thumb such as Duval’s triangle.
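To give a feel for how many trees can be combined, here is a minimal sketch of bagging: each one-split tree is trained on a bootstrap resample of the data, and the predicted failure probabilities are averaged. This is only the averaging idea behind random forests, not a full implementation (there is no random feature selection, and the trees have a single split). The data and all function names are hypothetical.

```python
import random


def fit_stump(xs, ys):
    """Fit a one-split tree on one numeric feature.

    Returns (threshold, rate_below, rate_at_or_above), choosing the threshold
    that minimizes the squared error of the two leaf failure rates.
    """
    overall = sum(ys) / len(ys)
    best = (float("-inf"), overall, overall)  # fallback when no split is possible
    best_loss = float("inf")
    for threshold in sorted(set(xs)):
        below = [y for x, y in zip(xs, ys) if x < threshold]
        above = [y for x, y in zip(xs, ys) if x >= threshold]
        if not below or not above:
            continue
        rate_below = sum(below) / len(below)
        rate_above = sum(above) / len(above)
        loss = sum((y - rate_below) ** 2 for y in below)
        loss += sum((y - rate_above) ** 2 for y in above)
        if loss < best_loss:
            best_loss, best = loss, (threshold, rate_below, rate_above)
    return best


def fit_bagged_stumps(xs, ys, n_trees=25, seed=0):
    """Train n_trees stumps, each on a bootstrap resample of the data."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return stumps


def predict_failure_probability(stumps, x):
    """Average the leaf failure rates that each stump assigns to x."""
    votes = [below if x < threshold else above for threshold, below, above in stumps]
    return sum(votes) / len(votes)


# Made-up data: transformer age in years and whether the unit failed.
ages = [5, 10, 15, 20, 30, 35, 40, 45]
failed = [0, 0, 0, 0, 1, 1, 1, 1]
stumps = fit_bagged_stumps(ages, failed)
p_young = predict_failure_probability(stumps, 10)
p_old = predict_failure_probability(stumps, 40)
print(p_young, p_old)
```

Averaging over resampled trees smooths out the hard yes-or-no boundary of a single tree, which is part of why ensembles tend to give better-calibrated probabilities.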

Using a variety of machine learning methods, GridCure is developing sophisticated ways of combining all available information about transformers and other pieces of key equipment to determine which equipment may be approaching a failure well before a dangerous situation occurs.