ICCV 2023: 5 sessions and a dataset we’re still thinking about

Sama staff attended ICCV 2023, where they gained valuable insights, traded ideas, and made connections with like-minded people who are shaping the future of ML, AI, and CV. The sense of community among so many passionate practitioners was energizing and made the experience all the more memorable.

Our two papers received plenty of positive feedback during the poster sessions. You can read the summaries here, and we’ll post the full papers soon! We also had great conversations about our focus on complex workflows, quality annotation at scale, our GenAI work for foundation models, and the importance of QA in future ML implementations.

While we wish our team could have attended every session, the sheer volume of content meant we had to pick and choose. Of everything we learned and saw, here is what we’re still musing on.

Sama staff smile back at you from our bright green booth at ICCV 2023

Angus Leigh, ML Developer:

The importance of high-quality labels, and the prevalence of label impurities and ambiguities in common datasets. Two sessions stood out:

  1. Improving Visual Representations by Circumventing Text Feature Learning
  2. SparseDet: Improving Sparsely Annotated Object Detection with Pseudo-positive Mining

An open-source self-driving dataset with a permissive, commercially friendly license. It’s the first one I’ve seen that doesn’t exclude commercial use, and I’m excited to dig in.

Zenseact Open Dataset

The big boost in LLM performance over the past year. It makes me wonder which computer-vision problems can be posed in the LLM framework to take advantage of those proven, effective solvers.

MotionLM: Multi-Agent Motion Forecasting as Language Modeling

Pat Steves, Technical Product Manager:

I saw lots of research on automated labeling in conjunction with mining for rare cases to find where manual annotation is needed. Waymo researchers gave a talk about this.

Self-supervision was also a popular topic. One workshop made a great case for why manual labels are problematic: they oversimplify the problem, encourage the model to memorize a fixed set of labels, and tie us to fixed datasets and fixed objectives.

Mark Bakker, Account Executive and Data Scientist:

Mark held down the fort at our booth, and shared this insight from conversations held throughout the event:

“Everyone agreed that humans are essential to annotate edge cases that cannot be predicted. Synthetic data is the next step in automating the development of scarce and PII data, but humans should be in the loop to prevent models from overfitting.”


Learn more about how Sama can annotate data for computer vision use cases with high accuracy while meeting the challenges of scale.

Related Resources

New Ebook: How to Get Quality Ground Truth Labels for All Autonomous Driving Applications – Without Busting the Bank


The State of Data Annotation in Computer Vision


In-House vs Outsourcing Data Annotation for ML: Pros & Cons
