Your machine learning model is only as good as the data it’s trained on. And with 80% of AI project time being spent training the large volume of data necessary to train a model, efficiency improvements early on in the process are sure to have compounding effects.
At Sama, we have a dedicated Machine Learning team working at the forefront of AI research to identify optimization opportunities just like this, so we can develop advanced annotation tools to smooth the path to production for our clients. One of the papers the team presented at last year’s CVPR—Human-Centric Efficiency Improvements in Image Annotation for Autonomous Driving—shared an approach to speeding up polygonal instance segmentation using ML.
Today, this technology has been incorporated into our platform to make our clients’ labeling process more efficient.
We call it ML Assisted Annotation powered by MicroModels™, and it’s already helping clients predictably get higher quality training data in half the time.
Read on for an overview of ML Assisted Annotation powered by MicroModels™: how it can help you develop models that are more scalable, robust and accurate – and can be brought into production more quickly.
ML Assisted Annotation (MAA) powered by MicroModels is an architecture that allows Sama to expedite the labeling process by drawing from a library of models trained on specific use cases. MAA can be used to generate high-quality pre-labeled annotations, which annotators validate to help them continuously improve over time.
This powerful combination of skilled annotators and an AI-powered platform allows us to deliver a high standard of label quality to our customers every time, along with efficiency improvements and quicker time to market.
In order to understand how MAA works, we first need to discuss the DEXTR model. DEXTR, or “Deep Extreme Cut,” is a publicly available object segmentation model for images and videos.
Many ML methods like DEXTR have been suggested to speed up the process of instance segmentation, but these are not typically tested in a high-scale production environment, nor are ML outputs easily edited by human annotators. This makes it difficult to confidently reach the label quality standards required to run a model in production.
MAA combines the well-known DEXTR approach with a raster-to-polygon algorithm to make results easily editable by a human in the loop. We’ve found that this approach—which pairs skilled annotators with ML-powered automation—significantly increases labeling efficiency and quality.
Let’s see what that looks like in practice, using an example from the Autonomous Vehicle industry.
When an annotator logs into the Sama annotation platform, they are presented with this workspace. In this example, the workspace is customized to allow the annotator to draw instance segmentation polygons around each of these vehicles:
You’ll notice that there are several vehicles in this image. In a manual context, it could take a human several hours to deliver high-quality annotations of every single vehicle:
This process is significantly accelerated with Machine-Assisted Polygon Annotation.
The model allows the annotator to use a crosshair tool to identify only four extreme points: left, right, top and bottom boundaries. These four clicks are the only inputs needed to create a heat map that is then sent to the inference server, returning an accurate prediction of a raster mask.
A polygon prediction can then be further refined by an annotator by switching into editing mode. This enables annotators to label precisely and ensure that high-quality requirements are met without compromise.
This mode also enables annotators to use more than four extreme points in order to produce even more accurate predictions. A fifth user input point can easily be added, with the model immediately incorporating the new input to update its prediction.
If an ML model struggles to identify specific shapes, annotators can add a few more inference points to help result in a more accurate prediction, and then refine that prediction manually to ensure high-quality labels.
Our clients are already seeing impressive results from MAA powered by MicroModels:
What’s more: increasing the efficiency of this labor-intensive manual data annotation process reduces the barrier to entry for more ML teams... and not just those with large R&D budgets. Technology like this can also help democratize data labeling by driving down cost, so we can see even more deserving companies leverage AI to drive value for their business.
Small teams who are getting started with labeling may not have yet defined what type of annotations they need, or how much data they need to be successful. MAA can help them iterate more quickly, developing models in short increments rather than in large, cumbersome workstreams. The end result is a quicker time to value, and ultimately, to market — for organizations of all shapes and sizes.
Currently a Director of Product Management at Sama, Saul is passionate about the intersection of technology and social impact. He manages Sama’s data labelling products to ensure high quality training data efficiently and reliably reaches our customers. Experienced in both product and professional services, Saul is a proven leader who takes a data driven approach to expanding Sama’s capabilities and features. When not at work, you can usually find Saul enjoying the outdoors and spending time with his family.