Five Key Insights into Synthetic Data for Geospatial Imagery

In the recent Ai4 conference presentation How Synthetic Data Presents Opportunities for Airbus Customers, Airbus Senior Innovation Manager Jeff Faudi joined OneView’s CEO and co-founder Omri Greenberg to present their joint research into the impact of training ML algorithms with synthetic data. Here are the five key takeaways from their presentation.

1. The Amount of Geospatial Data is Rapidly Expanding

Airbus collects imagery of almost 6 million square kilometers of land each day using its current optical sensors. It produces 100 times more imagery than it did 10 years ago, and its output will soon reach 50 terabytes per day. The geospatial imagery sector as a whole is projected to grow significantly over the coming years, driven by investment in satellite technologies, demand for location-based data and the increased adoption of drones.

2. This Growth is Increasing the Data Bottleneck

Every time Airbus clients and resellers want to train a new algorithm, they need to manually annotate a dataset, which can mean identifying and labeling hundreds of thousands of objects by hand. In some edge cases the data itself is difficult to acquire, so training cannot even start because the relevant imagery is not available in sufficient quantities. Collecting and annotating data therefore remains a major barrier to the rollout of machine learning algorithms.

3. Synthetic Data Solves the Problem 

Synthetic data lets Airbus and its clients generate automatically annotated imagery, even for edge cases, to improve the accuracy of their algorithms. This slashes the time needed to collect data and dramatically cuts annotation costs. It also shortens the turnaround time for training a new algorithm.

4. Mix of Synthetic and Real Data Proves Optimal 

In a proof-of-concept study, Airbus and OneView set out to validate the use of synthetic data for training machine learning algorithms by comparing the performance of algorithms trained on synthetic data, real data and a mix of both. The synthetic data produced excellent results, but a mixed data set of 95% synthetic and 5% real data improved algorithm performance by up to 20%.
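As a rough illustration of what a 95% synthetic and 5% real mix could look like when a training set is assembled, here is a minimal Python sketch. It is not the method used in the Airbus and OneView study: the function name, its parameters and the file names are all hypothetical, and the sampling strategy (random draws from each pool, then a shuffle) is just one simple assumption.

```python
import random


def build_mixed_training_set(synthetic, real, total, synthetic_fraction=0.95, seed=0):
    """Draw `total` samples so that about `synthetic_fraction` of them are synthetic.

    `synthetic` and `real` are lists of sample identifiers (hypothetical file names here).
    Returns a shuffled list of (sample, source) tuples.
    """
    if not 0.0 <= synthetic_fraction <= 1.0:
        raise ValueError("synthetic_fraction must be between 0 and 1")

    # Split the requested size between the two pools.
    n_synthetic = round(total * synthetic_fraction)
    n_real = total - n_synthetic
    if n_synthetic > len(synthetic) or n_real > len(real):
        raise ValueError("not enough samples in one of the pools for this mix")

    rng = random.Random(seed)  # fixed seed so the mix is reproducible
    mixed = [(s, "synthetic") for s in rng.sample(synthetic, n_synthetic)]
    mixed += [(r, "real") for r in rng.sample(real, n_real)]
    rng.shuffle(mixed)  # interleave real and synthetic samples
    return mixed


# Hypothetical pools: plenty of synthetic tiles, only a few annotated real ones.
synthetic_tiles = [f"synthetic_{i}.tif" for i in range(2000)]
real_tiles = [f"real_{i}.tif" for i in range(100)]

training_set = build_mixed_training_set(synthetic_tiles, real_tiles, total=1000)
```

With these assumed pools, a 1,000-sample training set would contain 950 synthetic and 50 real images, so only a small number of real images would need manual annotation.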

5. Synthetic Data Will Be the Mainstay of Training Data Sets

There will be a huge increase in the use of synthetic data to train machine learning algorithms. Synthetic data is expected to make up between 90% and 95% of training data, resulting in better accuracy. It will also become standard in training algorithms to detect edge cases where real-world data is lacking.

To learn how your organization can benefit from OneView’s synthetic data, click here to request a demo.