Synthetic data startup raises $6M on the promise of ending data scarcity


A lack of available data can paralyze a company and its efforts to bring software-centric products and services to market. To solve this issue, a two-year-old data startup is generating synthetic data for the satellite, medical, robotics and automotive industries.

At its broadest, synthetic data is data that is manufactured rather than gathered from the real world. “When we use the term synthetic data what we really mean is engineered simulated datasets, and in particular, we focus on a physics-based simulation,” CEO Nathan Kundtz explained in a recent interview with TechCrunch.

Kundtz received his PhD in physics from Duke University and cut his teeth in the space industry, heading the satellite antenna developer Kymeta Corporation. After leaving that company, he started working with other small space companies, when he noticed what he called a “chicken and egg” problem.

For example, imagine a company develops a new kind of sensor for a satellite and is looking for funding to commercialize it. The company would need to demonstrate to investors that the sensor could generate useful insights. But to generate those insights, the company would first need to launch a constellation and start collecting a large amount of data, which itself requires funding.

“This lack of access to data was hindering artificial intelligence,” he said.

Investor interest

The company’s approach to opening up that access has caught the attention of investors. It has raised a $6 million seed round led by Space Capital, with participation from Tectonic Ventures, Congruent Ventures, Union Labs and Uncorrelated Ventures.

Using a physics-based approach distinguishes the company from some of its competitors, which use purely generative methods to create synthetic data. That means these competitors are taking an existing dataset and engineering more of it. Generally, this is accomplished using generative adversarial networks (GANs), an AI technique that uses competing neural networks to simulate and refine synthetic data. According to Kundtz, that’s of limited utility to emerging industries, which often have very little or no data to start with.

There are other factors that can affect a company’s ability to get data. Collecting it can be a costly, difficult and time-consuming process. These issues get worse with non-RGB images, like those generated by synthetic aperture radar.

So how does physics solve this problem of generating new information? “We can introduce new information to the process of creating these algorithms through our knowledge of physics, through the equations that govern, for instance, how light interacts with things,” Kundtz said. “So we can simulate what things will look like under different scenarios and then use that to generate datasets.”
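To make the idea concrete, here is a minimal sketch of physics-based data generation. It is not the company's method or code; it is an illustrative toy that assumes a single unit sphere, a distant light source and Lambert's cosine law for diffuse reflection. The function names `render_sphere` and `make_dataset`, the image size and the albedo value are all hypothetical choices made for this example.

```python
import math
import random


def render_sphere(light_dir, size=16, albedo=0.8):
    """Render a unit sphere lit by a distant light using Lambert's cosine law.

    Illustrative assumption: brightness = albedo * max(0, normal . light).
    Returns a size x size grid of floats; background pixels are 0.0.
    """
    lx, ly, lz = light_dir
    norm = math.sqrt(lx * lx + ly * ly + lz * lz)
    lx, ly, lz = lx / norm, ly / norm, lz / norm  # normalize the light direction

    image = []
    for row in range(size):
        y = 1.0 - 2.0 * (row + 0.5) / size  # pixel center mapped to [-1, 1]
        line = []
        for col in range(size):
            x = 2.0 * (col + 0.5) / size - 1.0
            r2 = x * x + y * y
            if r2 > 1.0:
                line.append(0.0)  # ray misses the sphere
                continue
            z = math.sqrt(1.0 - r2)  # on a unit sphere the normal is (x, y, z)
            cos_theta = x * lx + y * ly + z * lz
            line.append(albedo * max(0.0, cos_theta))
        image.append(line)
    return image


def make_dataset(n_samples, seed=0):
    """Generate labeled images by sampling a different light direction each time."""
    rng = random.Random(seed)
    dataset = []
    for _ in range(n_samples):
        azimuth = rng.uniform(0.0, 2.0 * math.pi)
        elevation = rng.uniform(0.0, math.pi / 2.0)
        light = (
            math.cos(elevation) * math.cos(azimuth),
            math.cos(elevation) * math.sin(azimuth),
            math.sin(elevation),
        )
        dataset.append({
            "image": render_sphere(light),
            "label": {"azimuth": azimuth, "elevation": elevation},
        })
    return dataset


dataset = make_dataset(5)
```

Each sample pairs an image with the exact lighting parameters that produced it, so the labels come for free from the simulation rather than from manual annotation. No real images are needed as input, which is the contrast with purely generative methods that Kundtz draws.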

A toolkit for developers

The company has developed a platform that includes a no-code configuration tool and APIs that let customers engineer and tweak the parameters of a dataset, along with a set of tools for dataset introspection and data analysis. The company also provides some starter code for specific applications that customers are interested in, like satellite imagery. It calls this Platform as a Service.

While a customer does need a certain amount of expertise to use the system, Kundtz said that amount is decreasing each day; some of the funding will go toward continuing to lower the skill set required to use the platform.

“What we’re pushing towards is, anybody who can click a button in a browser can generate synthetic data, and not just synthetic data but can really control the types of synthetic data that they want and can introduce that into the rest of a machine learning workflow.”

But you don’t know what you don’t know: a company wouldn’t necessarily know in advance the parameters needed to make a synthetic dataset effective or an algorithm function. The company takes an iterative approach and emphasizes the interactivity of its platform as a way for customers to identify the gaps in their algorithms or better understand their blind spots.

Kundtz says he doesn’t think synthetic data will completely replace real-world data, but that it will come to fill an increasingly important gap for artificial intelligence applications. It also has the potential to take even a smidge of power from companies like Google, which have proprietary access to trillions of images and mountains of datasets. The company has already brought a handful of customers onto its platform, but it’s still essentially in beta, so the funding will be used to expand access to the platform as well as to invest in particular types of data for specific verticals.


Trending AI/ML Article Identified & Digested via Granola by Ramsey Elbasheer; a Machine-Driven RSS Bot
