As a Senior Product Manager at Sama, Jerome Pasquero understands the power of data, and he joins us today to share a wealth of knowledge on how better annotation ensures better models.
Key Points From This Episode:
- Jerome’s background, interest in AI, and how he landed his role at Sama.
- Social initiatives, training data, and what attracted Jerome to Sama.
- The shift from focusing on AI models to the importance of data quality.
- Why academia needs a foundational dataset to compare models.
- The reason for the early focus on building new AI models.
- Whether datasets will become open source in the future as models have.
- The role of annotation in making data meaningful and useful.
- Challenges of annotating data and different approaches to doing so.
- The three components of data annotation: models, filtering, and the annotation pipeline.
- How to home in on goals for filtering data into valuable subsets that align with your desired outcomes.
- How to measure a model’s accuracy by focusing on user experience and more.
- What data drift is and how to keep it in check by tracking it and retraining models where necessary.
- How to know that your training data is close enough to your production data.
- What excites Jerome most about the world of data and annotation.
Stream the full episode below, or head here to select your favorite listening app and view the full transcript.