Not all retailers have the resources or know-how to effectively turn the data they have into a competitive edge. As ML models for common retail use cases become increasingly "off the shelf," the real competitive advantage lies in your data and what you do with it.
Data and annotations are equally important at every stage of the AI model lifecycle, whether you’re training your model, testing it, or monitoring it in production.
As you progress through the AI model lifecycle, it’s likely that you’ll ask some of the following questions (and if you aren’t, then you probably should be):
- Where do I start with my data?
- How much data do I need, and when?
- How do I ensure the data sent for annotation is representative of production data?
- How do I make sure there is no inherent bias in my data?
- Which parts of my training data should I get annotated first?
- What do I do with special cases?
- What kind of labels do I need, and what quality?
- How do I capture edge cases?
- How do I manage ambiguity?
- How much do errors in my data hurt model performance?
- How can I make sure my model in production is still performing as expected?
- Who can I trust to annotate my training data?
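To make one of the questions above concrete, here is a minimal sketch of how you might check whether a batch of data sent for annotation is representative of production. The retail categories, the sample sizes, and the 0.1 drift threshold are all illustrative assumptions, not recommendations; the sketch simply compares label distributions using total variation distance.

```python
from collections import Counter

def label_distribution(labels):
    """Return each label's share of the total."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

def distribution_drift(batch_labels, prod_labels):
    """Total variation distance between two label distributions.

    0.0 means the distributions are identical; 1.0 means they share no labels.
    """
    p = label_distribution(batch_labels)
    q = label_distribution(prod_labels)
    all_labels = set(p) | set(q)
    return 0.5 * sum(abs(p.get(l, 0.0) - q.get(l, 0.0)) for l in all_labels)

# Hypothetical retail categories, purely for illustration.
annotation_batch = ["shirt"] * 70 + ["shoes"] * 20 + ["bag"] * 10
production_sample = ["shirt"] * 40 + ["shoes"] * 40 + ["bag"] * 20

drift = distribution_drift(annotation_batch, production_sample)
if drift > 0.1:  # threshold is an assumption; tune it for your use case
    print(f"Warning: annotation batch may not be representative (drift={drift:.2f})")
```

The same comparison, run periodically against fresh production samples, also gives a crude signal for the monitoring question: if the distribution your model sees in production drifts away from what it was trained on, it is time to revisit your data.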
At Sama, we’ve helped hundreds of retailers answer these data labeling questions and more. We’ve compiled our advice in this ebook, along with recommendations for approaching your data annotation strategy holistically and sustainably.