Sama's Gold Tasks: ML Training Data with Gold-Standard Quality

Abha Laddha

March 23, 2021

3 Minute Read

Several methods exist to help companies define and measure data quality. To define the correct annotation of given data, you want to start by creating annotation guidelines. On top of proposing a multi-level quality checks system, Sama’s experts have built unique know-how in efficiently designing annotation guidelines that enhance data quality.

Cue Gold Tasks; referring to tasks that have been annotated “perfectly” or meet the “gold standard”. Such tasks are often used by the client to communicate their expectations around precision and quality used as examples during training. At Sama, we use Gold Tasks in two more ways: During training, to assess annotators and identify those who are ready to move into production and during production to generate an automated metric on quality.

Gold Assessments

Every project launch at Sama is accompanied with a period of training, where annotators focus on requirements for the specific workflow, familiarizing themselves with the taxonomy, accuracy levels needed, and edge cases. Gold tasks come in as they move from classroom training to practice tasks.

A gold set is created that is representative of the overall complexity of the dataset, ensuring a healthy mix of the edge cases. annotators practice on this set of which we already have “gold” answers. As each task is submitted we are able to compare the annotator answers with the gold task, generating custom metrics and error tags. The metrics are determined by the type of workflow and tool used, for e.g. in a semantic segmentation workflow, we focus on IoU calculations per label and depending on the client rubric each label may be weighed differently.

These metrics are then used to create trends and analyze each annotator’s performance individually, and provide relevant feedback. We are able to analyze trends at an asset label (which type of images are more difficult than others?), annotator level (which annotator is struggling exactly where?), and the impact of time (was today better than yesterday?).

Gold assessments, therefore, help us accelerate training by providing customized feedback to each annotator early on and allowing us to track their improvement over time. This enables Sama to quickly identify doubts, find edge cases, and have high confidence that annotators are ready to move on to production. Lastly, this allows us to calibrate and train the manual QAs on the specifics of this particular workflow, ensuring that nothing is missed.

Gold Metrics

Similar to gold assessments, gold metrics compare an annotator’s tasks against a known, completed task. These tasks, however, are interspersed within the production queue with the annotators unaware that these are gold tasks. These tasks then act as tests for the annotators, generating similar metrics as mentioned above. This allows the team to report upon the annotator’s performance against the gold tasks that increase insight into quality and help to further tailor training and coaching.

Gold metrics are most useful for clients looking to automate the quality loop on their side. Given that their Sama project team consistently samples and approves only high quality tasks, it is a neat way for them to save time and capacity. 

Because no two AI projects are alike, you need to make sure that your quality assurance (QA) process is designed to meet the unique needs of your particular project. Learn more on how to supercharge your data quality with Sama's Automated Quality Accelerators.