How to build AI data engines that use the right data at the right time


Machine learning (ML) has many applications, and supervised ML, in particular, has taken off in recent years.

Thus, it is critical that organizations build data engines that use the right data at the right stage of their project lifecycle, Manu Sharma told the audience at VentureBeat's Transform 2022 event.

The founder and CEO of Labelbox explained that the "fundamental premise" of supervised ML is the creation of annotated or labeled data. It involves applying semantic annotations to any unstructured information, such as text and video. The key is to do this precisely so that the annotations or labels reflect an understanding of the business logic or business application, Sharma explained.

The labeled data is then fed into neural networks, which are trained to reproduce the behavior captured in that data.

The Labelbox platform supports data labeling in any modality (images, video or text) and in any configuration. The company's catalog offering brings all unstructured data together in one place and lets teams segment and slice data for a variety of applications, Sharma said. The company's tools also prepare data for model training, as well as for model testing and evaluation.

Iteration Cycle Bottleneck
Sharma described a "fundamental bottleneck" in the iteration cycles for developing artificial intelligence (AI) systems. At 90% of companies, each iteration can take months, and that time to deployment adds up when you consider that a single model can go through 50 to 100 iterations, he said.

"It's really difficult to convert labeled data into production AI models," Sharma said. "It is easy to create prototypes, but it is very difficult to convert these models into production."

Some Labelbox customers have been able to deploy models in three to six months, although Sharma pointed out that not all use cases are the same. "Some of the use cases are really challenging and amazing cases that teams are continuing to research," he said.

However, generally speaking, companies are thinking at higher levels and gaining an understanding of how to use the right technologies and products to iterate their models faster and get them into production.

"Over the years, all areas of engineering have benefited from faster iteration," Sharma said. As examples, he mentioned biotechnology, self-driving cars and rockets. "The best companies in these segments are those that have been able to get their products on board quickly and bring them to market, especially (the companies) that are highly innovative."

Nevertheless, while speed of implementation may be essential, it must be carefully balanced with customer needs and general security and privacy concerns (especially with self-driving cars or banking apps, for example).

"There definitely need to be checks and balances put in place for teams to make sure they can test their models before they go into production," Sharma said.

Accelerate Data Engine Flywheel
Sharma outlined four "major milestones" in the modern data engine workflow.

The first is creating data and identifying "good data" to increase model performance.

The second is data labeling, which includes both human and programmatic labeling. D...

Business, Jul 26, 2022