Achieving safe autonomous driving requires almost endless hours of training software on every situation that could arise before putting a vehicle on the road. Historically, autonomy companies have collected troves of real-world data with which to train their algorithms, but it’s impossible to teach a system to handle every edge case from real-world data alone. Not only that, it takes a long time just to collect, classify, and label all that data in the first place.
Most autonomous vehicle companies, such as Cruise, Waymo, and Waabi, use synthetic data to train and test perception models with a speed and level of control that is impossible with data collected from the real world. Parallel Domain, a startup that has built a data generation platform for companies developing autonomous systems, says synthetic data is a critical component for scaling the AI that powers vision and perception systems and preparing them for the unpredictability of the physical world.
The startup just closed a $30 million Series B led by March Capital, with participation from return investors Costanoa Ventures, Foundry Group, Calibrate Ventures and Ubiquity Ventures. Parallel Domain has focused on the automotive market, providing synthetic data both to major OEMs building advanced driver assistance systems and to companies building more advanced autonomous driving systems. Now, Parallel Domain is ready to expand into drones and mobile computer vision, according to co-founder and CEO Kevin McNamara.
“We’re really doubling down on generative AI approaches to content generation as well,” McNamara told TechCrunch. “How can we use some of the advances in generative AI to bring a much broader diversity of things, people, and behaviors to our worlds? Because again, the hard part here is really, once you have a physically accurate renderer, how are you going to build the million different scenarios that a car will need to encounter?”
The startup also wants to hire a team to support its growing customer base in North America, Europe and Asia, according to McNamara.
Construction of virtual worlds
When Parallel Domain was founded in 2017, the startup was heavily focused on creating virtual worlds based on real-world map data. Over the past five years, Parallel Domain has built on that world generation, filling those worlds with cars, people, different times of day, weather, and the full range of behaviors that make them interesting. This allows customers, among which Parallel Domain counts Google, Continental, Woven Planet and Toyota Research Institute, to generate the dynamic camera, radar and lidar data they need to train and test their vision and perception systems, McNamara said.
The Parallel Domain synthetic data platform consists of two modes: training and testing. For training, customers describe high-level parameters they want to train their model on (for example, highway driving with 50% rain, 20% nighttime, and an ambulance in each sequence), and the system generates hundreds of thousands of examples that meet those parameters.
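Parallel Domain has not published the details of this interface, but the idea of a parameter-driven generator can be sketched as follows. Everything here (the function name, the parameter names, the scenario fields) is hypothetical, purely to illustrate how high-level fractions like "50% rain, 20% night" turn into a large batch of concrete scenarios:

```python
import random

def sample_scenarios(n, rain_frac=0.5, night_frac=0.2, seed=0):
    """Hypothetical sketch, not Parallel Domain's real API: draw n
    concrete scenario descriptions matching target fractions for
    weather and time of day, with an ambulance in every sequence."""
    rng = random.Random(seed)  # seeded for reproducible batches
    scenarios = []
    for _ in range(n):
        scenarios.append({
            "environment": "highway",
            "weather": "rain" if rng.random() < rain_frac else "clear",
            "time_of_day": "night" if rng.random() < night_frac else "day",
            "special_agents": ["ambulance"],  # one ambulance per sequence
        })
    return scenarios

batch = sample_scenarios(1000)
rainy = sum(s["weather"] == "rain" for s in batch) / len(batch)
```

Each dictionary in `batch` would then drive one rendered sequence, so the aggregate statistics of the training set match the requested distribution.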
As for testing, Parallel Domain offers an API that lets the customer control the location of dynamic agents in the world, which can then be plugged into their simulator to test specific scenarios.
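In test mode the customer pins agents to exact positions rather than sampling them. Again as a hypothetical sketch (these class and method names are invented for illustration, not Parallel Domain's actual API), a scene-placement interface might look like:

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    kind: str                 # e.g. "pedestrian", "cyclist"
    position: tuple           # (x, y) in metres, map frame
    heading_deg: float = 0.0  # heading relative to map east

@dataclass
class TestScene:
    """Hypothetical test-mode scene description that a customer's
    simulator could consume to replay one specific scenario."""
    map_id: str
    agents: list = field(default_factory=list)

    def place(self, kind, position, heading_deg=0.0):
        # Chainable so a scenario reads as one declarative statement.
        self.agents.append(Agent(kind, position, heading_deg))
        return self

# A pedestrian mid-crossing and a cyclist crossing the lane at 90 degrees.
scene = (TestScene("sf_downtown")
         .place("pedestrian", (12.0, 3.5))
         .place("cyclist", (20.0, -1.0), heading_deg=90.0))
```

Because the scene is fully specified, the same edge case can be replayed against every new build of the perception stack.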
Waymo, for example, is particularly interested in using synthetic data to test different weather conditions, the company told TechCrunch. (Disclaimer: Waymo is not a confirmed Parallel Domain customer.) Waymo sees weather as a new lens it can apply to all the miles it has driven in the real world and in simulation, since it would be impossible to recapture all of those drives under arbitrary weather conditions.
Whether it’s testing or training, every time Parallel Domain software creates a simulation, it can automatically generate labels to correspond with each simulated agent. This helps machine learning teams conduct supervised learning and testing without having to go through the painstaking process of labeling the data themselves.
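The reason labels come for free is that the simulator already knows the class, identity, and pose of every agent it places, so ground truth is read out of the scene rather than annotated by hand. A minimal sketch of that idea (the data layout here is an assumption, not Parallel Domain's actual label format):

```python
def auto_label(frame_agents):
    """Hypothetical sketch: derive per-frame supervised-learning labels
    directly from the simulator's own scene state. No human annotation
    pass is needed because every field is known exactly to the engine."""
    return [
        {
            "track_id": agent["id"],          # stable identity across frames
            "category": agent["category"],    # known at placement time
            "bbox": agent["bbox"],            # exact, from the renderer
        }
        for agent in frame_agents
    ]

# Agents the engine placed in one rendered frame.
frame_agents = [
    {"id": 1, "category": "car", "bbox": (100, 200, 180, 260)},
    {"id": 2, "category": "pedestrian", "bbox": (300, 210, 330, 290)},
]
labels = auto_label(frame_agents)
```

Pairing each rendered frame with labels produced this way is what lets machine learning teams skip the manual labeling step entirely.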
Parallel Domain envisions a world in which self-driving companies use synthetic data for most, if not all, of their training and testing needs. Today, the ratio of real-world to synthetic data varies from company to company. More established companies, with the historical resources to have collected a lot of data, use synthetic data for about 20% to 40% of their needs, while companies earlier in their product development rely on roughly 80% synthetic data versus 20% real-world data, according to McNamara.
Julia Klein, a partner at March Capital and now a board member at Parallel Domain, said she believes synthetic data will play a critical role in the future of machine learning.
“Getting the real-world data you need to train computer vision models is often a hurdle, and there are delays in terms of being able to get that data in, label it, and prepare it to a point where it can actually be used,” Klein told TechCrunch. “What we’ve seen with Parallel Domain is that they’re speeding up that process considerably, and they’re also addressing things that you might not even get in real-world data sets.”