How Synthetic Data Can Help Oil and Energy Companies Better Monitor Expansive Infrastructure

The vast, sprawling infrastructure of the oil and energy sector makes manual monitoring for corrosion, damage, and other potential issues practically impossible. Machine learning algorithms offer an ideal solution, but their adoption is hindered by a lack of training data. Synthetic data not only resolves this issue but can also spare the industry the heavy financial burden and clean-up costs incurred when infrastructure fails.

Critical infrastructure, such as pipelines and power lines, zigzags across every country around the globe, almost completely unnoticed. Yet tens of thousands of engineers and surveyors are employed to inspect this infrastructure for corrosion, leaks, plant encroachment, third-party intrusion, and anything else that might cause it to fail. The ultimate goal is to prevent the outages, expensive repairs, and environmental disasters triggered by weaknesses discovered too late.

The Downside of Manual Monitoring 

The major flaw in relying on human resources for the preventative maintenance of critical infrastructure is that it is simply not possible to physically travel along every pipeline or power line to manually assess its condition.

There are 2.6 million miles of oil and gas pipelines in the U.S. alone, with an average age of 20 years, and two-thirds of Americans live within 600 feet of one. Monitoring and maintaining them is therefore extremely important.

So, to prevent disasters, we need to constantly be on the lookout for deteriorating and aging infrastructure. 

Rusted pipes alone cost the industry $1.4B per year and are one of the key causes of failure. But while all major infrastructure is at risk without preventative maintenance, the scale involved in the oil and energy sector makes this an immensely difficult challenge. 

There is, however, a growing recognition that the manual inspection of assets needs to be replaced with technology. Machine learning algorithms that draw on multiple data sources to automatically analyze large volumes of imagery are transforming our ability to monitor vast infrastructure.

Use of Machine Learning Algorithms 

Using algorithms, a giant rust patch on a bolt or joint can be easy to spot, but only if you have sufficient data to train your algorithm. Finding that raw training data is the trickiest part of the challenge. The conditions that matter most in the oil and energy sector, the ones responsible for major disasters, are by nature rare and often not captured in geospatial imagery.

So, how can you train an algorithm to recognize clearly hazardous conditions that you are unable to collect enough imagery on to properly train your ML algorithm? 

This is exactly the pain point synthetic data solves. 

It is not possible to manually collect and annotate the volume of data, including the rare cases, needed to build an effective training set. Synthetic training data overcomes that bottleneck.

Synthetic Data is Vital for Effective Training 

Automatically generating synthetic training data ensures a vast supply of randomized imagery covering every variation. This is vital, because the training set needs to be as wide and varied as possible.

Synthetic data gives us full control over multiple critical parameters: the sensors, the objects, the scenes in which they are placed, the weather conditions, the relationships between objects, and so on. In short, synthetic data can capture whatever parameters best meet our needs.
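This kind of parameter control is often implemented as domain randomization: before rendering each synthetic image, every scene attribute is sampled from a distribution. A minimal Python sketch of the idea, in which all parameter names, categories, and value ranges are illustrative assumptions rather than any real generator's API:

```python
import random

# Illustrative parameter spaces for a hypothetical scene generator.
SENSORS = ["rgb_drone", "thermal", "satellite_pan"]
WEATHER = ["clear", "overcast", "rain", "fog"]
OBJECTS = ["pipeline_joint", "valve", "support_bracket", "power_line_tower"]

def sample_scene(rng: random.Random) -> dict:
    """Draw one randomized scene specification.

    Each call varies the sensor, weather, lighting, object mix, and
    defect severity, so a large batch of renders covers far more
    combinations than any collection of real photographs could.
    """
    return {
        "sensor": rng.choice(SENSORS),
        "weather": rng.choice(WEATHER),
        "sun_elevation_deg": rng.uniform(5, 85),
        "objects": [
            {
                "type": rng.choice(OBJECTS),
                "rust_severity": rng.uniform(0.0, 1.0),   # 0 = clean, 1 = heavy corrosion
                "vegetation_overlap": rng.uniform(0.0, 0.5),
            }
            for _ in range(rng.randint(1, 5))
        ],
    }

# A seeded generator makes the sampled dataset reproducible.
rng = random.Random(42)
scenes = [sample_scene(rng) for _ in range(1000)]
```

Each dictionary in `scenes` would then be handed to a renderer; the point is that rare combinations (heavy rust in fog at low sun, say) appear in the dataset by construction rather than by luck.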

For instance, synthetic data can produce hundreds of different shades or patterns of rust, even if we've only come across a couple of dozen in real images. We can simulate natural-looking plant-encroachment patterns we've never actually observed, or feed our algorithm unusual permutations of shadows or water that change how objects appear.

But the benefits of synthetic training data go beyond the ability to simply create immersive, hypothetical scenarios. A particularly expensive, time-consuming stage of creating training data is manual tagging or annotation. Subject matter experts need to initially review thousands of images, annotating them with indications of the specific infrastructure problems the system is learning to spot. 

Fast, Cost Effective and More Accurate 

Automatic tagging is an integral part of generating synthetic data: each image is created with all of its metadata embedded from the start. For companies paying seasoned professionals, who are hard to find, often distributed worldwide, and whose work remains prone to human error, this is a game-changer.
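Because the generator itself decides where every object sits and how damaged it is, it can emit exact labels alongside each image instead of paying annotators to draw them afterward. A minimal sketch of that idea, using a hypothetical scene spec and a loosely COCO-style annotation record; all field names here are illustrative assumptions:

```python
def annotations_for_scene(scene: dict, image_id: int) -> list[dict]:
    """Build labels directly from the scene specification.

    The generator placed every object, so bounding boxes and defect
    attributes are exact by construction; no human ever has to review
    or annotate the rendered image.
    """
    return [
        {
            "image_id": image_id,
            "id": i,
            "category": obj["type"],
            "bbox": obj["bbox"],  # [x, y, width, height] in pixels
            "attributes": {"rust_severity": obj["rust_severity"]},
        }
        for i, obj in enumerate(scene["objects"])
    ]

# Hypothetical scene spec as a renderer might record it while drawing.
scene = {
    "objects": [
        {"type": "pipeline_joint", "bbox": [120, 80, 64, 48], "rust_severity": 0.7},
        {"type": "valve", "bbox": [300, 150, 40, 40], "rust_severity": 0.1},
    ]
}
labels = annotations_for_scene(scene, image_id=0)
```

The contrast with manual annotation is the whole point: the labels fall out of the rendering step for free, at pixel accuracy, for every image generated.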

When using synthetic training data, OneView's customers are able to increase model accuracy by up to 25%, significantly reduce time to market, cover rare and edge cases, and achieve major savings on their data acquisition and preprocessing operations.

Indeed, synthetic data is the only solution that can properly train algorithms, quickly and cost effectively, by supplying a broad enough range of edge cases to identify failing or damaged infrastructure. This is why synthetic data has a key role to play in helping the oil and energy sector avoid the next big disaster.