Key Ingredients for More Accurate Computer Vision Data Labeling

No matter how much processing power you have at your fingertips, any computer vision model you build is only as good as the data used to train it. And the beating heart of all that data is the way you define it. The initial semantic knowledge that’s used to orient any dataset (or point cloud, in this case) has to be provided by an actual human. For whatever reason, most humans simply won’t make gross misclassification errors.

That said, while humans wouldn’t mistake a horse-drawn carriage for a tractor-trailer, more fine-grained or subtle differences are another matter.

Consider, for example, just how much education you’d need to distinguish between images of tumors. Specialized knowledge of that kind varies from person to person. It’s possible to overcome individual variation by using multiple independent annotators who validate each other’s work (see our post on annotation review), and to train a team of annotators to correctly annotate highly complex raw data. Even so, having the right rubric and the right annotation tools for your own experimentation is the crucial first step.
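To make the idea of independent annotators validating each other’s work concrete, here is a minimal Python sketch of a majority-vote check. The function name `consensus_label` and the `min_agreement` value of 0.6 are illustrative assumptions, not part of any particular annotation tool.

```python
from collections import Counter


def consensus_label(labels, min_agreement=0.6):
    """Majority vote over independent annotators' labels for one item.

    Returns (label, agreement). The label is None when the most common
    answer falls below min_agreement (a hypothetical threshold), which
    signals that the item should go to review.
    """
    if not labels:
        raise ValueError("labels must not be empty")
    top_label, top_count = Counter(labels).most_common(1)[0]
    agreement = top_count / len(labels)
    return (top_label if agreement >= min_agreement else None), agreement


# Three annotators label the same image; two of them agree.
label, agreement = consensus_label(["malignant", "malignant", "benign"])
print(label, round(agreement, 2))
```

Items that come back with no consensus label are good candidates for expert review, or a sign that the rubric needs a clearer example.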

No one knows your research objectives better than you do. So for any computer vision project you undertake, no matter how small or large it eventually becomes, it’s important that you create the rubric for segmenting the data. The first, underlying source of truth has to come from you.

How to create a robust 3D rubric

When you build a rubric, here are the best practices you should observe:

  1. Define the limits of your (data collection) systems:
    1. How are your cameras and/or other sensors calibrated? 
    2. How accurately does your system represent localization? 
  2. Define what “quality” means:
    1. Develop visual examples of what a properly annotated item looks like.
    2. Define the different error types that annotators will likely generate.
    3. Create a scoring scheme.
  3. Create “gold tasks”, i.e., pre-scored tasks (whole tasks, not just individual items) that future tasks can be judged against.
  4. Test and iterate using a small subset of your data so that you’re sure the steps you took above actually yield results in line with your expectations. Keep going until your rubric enables independent annotators to perform at a threshold you deem acceptable. When they do, you know you can start scaling your project.
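The scoring scheme and gold tasks described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: annotations are axis-aligned 3D boxes, the error types (`missed`, `wrong_label`, `loose_box`, `extra`) and the 0.7 IoU threshold are hypothetical choices, and a real rubric would define its own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box3D:
    """Axis-aligned 3D box; corners are (x, y, z) tuples. Hypothetical type."""
    label: str
    min_corner: tuple
    max_corner: tuple


def volume(box):
    v = 1.0
    for lo, hi in zip(box.min_corner, box.max_corner):
        v *= max(0.0, hi - lo)
    return v


def iou_3d(a, b):
    """Intersection over union of two axis-aligned boxes."""
    inter = 1.0
    for a_lo, a_hi, b_lo, b_hi in zip(a.min_corner, a.max_corner,
                                      b.min_corner, b.max_corner):
        inter *= max(0.0, min(a_hi, b_hi) - max(a_lo, b_lo))
    union = volume(a) + volume(b) - inter
    return inter / union if union > 0 else 0.0


def score_against_gold(submitted, gold, iou_threshold=0.7):
    """Score one annotator's task against a pre-scored gold task.

    Greedily matches each gold box to the best-overlapping submitted box,
    counts errors by type, and returns (score in [0, 1], error counts).
    """
    errors = {"missed": 0, "wrong_label": 0, "loose_box": 0, "extra": 0}
    unmatched = list(submitted)
    for g in gold:
        best = max(unmatched, key=lambda s: iou_3d(s, g), default=None)
        if best is None or iou_3d(best, g) == 0.0:
            errors["missed"] += 1
            continue
        unmatched.remove(best)
        if best.label != g.label:
            errors["wrong_label"] += 1
        elif iou_3d(best, g) < iou_threshold:
            errors["loose_box"] += 1
    errors["extra"] = len(unmatched)
    score = max(0.0, 1.0 - sum(errors.values()) / max(len(gold), 1))
    return score, errors


gold = [Box3D("car", (0, 0, 0), (2, 2, 2)),
        Box3D("pedestrian", (5, 5, 0), (6, 6, 2))]
submitted = [Box3D("car", (0, 0, 0), (2, 2, 2))]  # the pedestrian was missed
print(score_against_gold(submitted, gold))
```

Running the same function over a small pilot batch gives you a per-annotator score to compare against whatever threshold you set before scaling.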

So that’s the plan for developing a robust rubric. But you probably noticed that those steps depend on one key ingredient: a platform that lets you test out the rubric in the first place.

A prerequisite for 3D quality rubrics

Currently, data scientists largely have to either create their own annotation software from scratch (or stitch together a handful of disparate tools to curate, annotate, and gather training data) or go straight to hiring managed services to annotate point cloud data. The first option burns up a lot of your own time; the second burns up a lot of money.

The ideal solution is to find a single platform that has all the same 3D annotation features used by professional 3D annotation companies:

  • Data validation/model prediction tools;
  • An easy-to-use user interface;
  • Compatibility with all major point cloud file formats:
    • .bpf
    • .csd
    • .ept
    • .e57
    • .gdal
    • .geowave
    • .i3s
    • .ilvis2
    • .las
    • .matlab
    • .mbio
    • .mrsid
    • .nitf
    • .npy
    • .pcd
    • .ply
    • .pts
    • .qfit
    • .txt
  • Configurability of metadata types;
  • A very quick startup time;
  • Quickly adaptable settings; and
  • Ease of scaling.

Luckily, that’s where Sama Go comes in. We here at Sama built a platform for our own expert 3D annotators to use, and we’ve now opened up that platform to you.

Sama Go was built by data scientists for data scientists because, frankly, we simply couldn’t find the tools we needed to make the best computer vision models possible. So you can be sure that whatever data annotation needs you have, we’ve experienced them too and built a tool that meets them.

Sign up for Sama Go today; it’s free to use!
