Upleveling Data Labeling with Sama's Jerome Pasquero

As a Senior Product Manager at Sama, Jerome Pasquero understands the power of data, and he joins us today to share a wealth of knowledge on how better annotation ensures better models.

Key Points From This Episode:

Jerome’s background, interest in AI, and how he landed his role at Sama.
Social initiatives, training data, and what attracted Jerome to Sama.
The shift from focusing on AI models to the importance of data quality.
Why academia requires the use of a foundational dataset to compare models.
The reason for the early focus on building new AI models.
Whether datasets will become open source in the future as models have.
The role of annotation in making data meaningful and useful.
Challenges of annotating data and different approaches to doing so.
The three components of data annotation: models, filtering, and the annotation pipeline.
How to home in on goals for filtering data into valuable subsets that align with your desired outcomes.
How to measure a model’s accuracy by focusing on user experience and more.
What data drift is and how to prevent it by keeping track of it and retraining models where necessary.
How to know that your training data is close enough to your production data.
What excites Jerome most about the world of data and annotation.

Tweetables:

“Most of the successful model architectures are now open source. You can get them anywhere on the web easily, but the one thing that a company is guarding with its life is its data.” - Jerome Pasquero
“If you consider that we now know that a model can be highly sensitive to the quality of the data that are used to train it, there is this natural shift to try to feed models with the best data possible and data quality becomes of paramount importance.” - Jerome Pasquero
“The point of this whole system is that, once you have these three components in place, you can drive your filtering strategy.” - Jerome Pasquero
“You can always get more data later. What you want to avoid is getting yourself into a situation where the data that you are annotating is useless.” - Jerome Pasquero
“A model is like a living thing. You need to take care of it otherwise it is going to degrade, not because it’s degrading internally, but because the data that it is used to seeing has changed.” - Jerome Pasquero

Links Mentioned in Today’s Episode:

Jerome Pasquero on LinkedIn
Jerome Pasquero Blog: Top 10 Data Labeling FAQs
Sama