Sama staff attended ICCV 2023, where we gained valuable insights, traded ideas, and made connections with like-minded people who are shaping the future of ML, AI, and CV. The sense of community and collaboration among such passionate individuals was energizing and made our experience all the more memorable.
Our two papers received a lot of positive feedback during the poster sessions. You can read the summaries here, and we’ll post the full papers soon! We also had great conversations about our focus on complex workflows, quality annotations at scale, our GenAI for foundation models, and the importance of QA in future ML implementations.
While we wish our team could have attended every session, the sheer volume of content meant we had to pick and choose. Of everything we learned and saw, here is what we’re still musing on.
Angus Leigh, ML Developer:
The importance of high-quality labels and the existence of label impurities and ambiguities in common datasets. These two sessions were interesting:
An open-source self-driving dataset with a commercially friendly, permissive license. It’s the first one I’ve seen that does not exclude commercial use, and I’m excited to dig in more.
The big boost in LLM performance in the past year. It makes me wonder which computer-vision problems could be posed in an LLM framework to take advantage of those proven, effective solvers.
Pat Steves, Technical Product Manager:
I saw a lot of research on combining automated labeling with mining for rare cases to find where manual annotation is needed. Waymo research gave a talk about this.
Self-supervision was also a popular topic. This was a great workshop about why manual labels are problematic: they simplify the problem too much, encourage the model to memorize the fixed set of labels, and make us rely on fixed datasets and fixed objectives.
Mark Bakker, Account Executive and Data Scientist:
Mark held down the fort at our booth and shared this insight from conversations held throughout the event:
“Everyone agreed that humans are essential to annotate edge cases that cannot be predicted. Synthetic data is the next step in automating the development of scarce and PII data, but humans should be in the loop to prevent models from overfitting.”
Learn more about how Sama can annotate data for computer vision use cases with high accuracy while meeting the challenges of scale.