Keep it Secret, Keep it Safe: Announcing the PII Data Anonymizer

Audrey Boguchwal

March 26, 2020


Sama is excited to launch the PII Data Anonymizer as part of our platform for video training data. This technology obscures sensitive, personally identifying information (PII) in training data.


In light of new laws like GDPR and CCPA, it’s important for companies building AI and ML technologies to carefully manage data that contains PII. Obscuring PII helps Sama and our customers work to protect privacy.

Sama’s PII Data Anonymizer helps make more data available to train AI by keeping personally identifying information safe across a variety of data sources: camera images of people in retail spaces and public places, street-level images of people and license plates captured by vehicles, smart city applications on public transit, and more.

Applications for anonymization include autonomous transportation, detailed customer demographics, customer data such as clothing and emotion, people counters, and security.

This deep learning pre-annotation technology allows Sama to obscure faces and vehicle license plates that appear in data without the need for any human intervention. That means that private information remains private and is never seen by another person.

When Sama receives customer data, it can be run through our anonymizer service before any labeling occurs. The service automatically detects faces and license plates and blurs them so they are not recognizable.

Alternatively, it can replace faces and license plates with realistic computer-generated avatars. This AI-generated content creates training data that looks like real-world data when people and vehicles are the primary objects of interest for the algorithm.
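To make the obscuring step concrete, here is a minimal sketch in Python. It is not Sama’s implementation: it assumes a detector has already produced bounding boxes for faces and license plates, and it stands in for blurring by flattening each box to its average pixel value. The names `pixelate_region`, `anonymize`, `Image`, and `Box` are hypothetical and exist only for this illustration.

```python
from typing import List, Tuple

Image = List[List[int]]          # grayscale pixels (0-255), stored row by row
Box = Tuple[int, int, int, int]  # (left, top, right, bottom); right/bottom exclusive


def pixelate_region(image: Image, box: Box) -> None:
    """Overwrite every pixel inside `box` with the region's mean value, in place."""
    if not image or not image[0]:
        return  # nothing to obscure in an empty image
    height, width = len(image), len(image[0])
    left, top, right, bottom = box
    # Clamp the box to the image so detections that spill over the edge still work.
    left, top = max(0, left), max(0, top)
    right, bottom = min(width, right), min(height, bottom)
    if left >= right or top >= bottom:
        return  # box is empty after clamping
    pixels = [image[y][x] for y in range(top, bottom) for x in range(left, right)]
    mean = sum(pixels) // len(pixels)
    for y in range(top, bottom):
        for x in range(left, right):
            image[y][x] = mean


def anonymize(image: Image, boxes: List[Box]) -> Image:
    """Return a copy of `image` with every box obscured; the input is not modified."""
    result = [row[:] for row in image]
    for box in boxes:
        pixelate_region(result, box)
    return result


# Toy 6x4 frame and one assumed detection; a real pipeline would get boxes from a model.
frame = [[10 * (x + y) for x in range(6)] for y in range(4)]
detections = [(1, 1, 3, 3)]
clean = anonymize(frame, detections)
```

Flattening a region to a single value discards the detail inside the box, which is the property that matters for privacy here; a production system would more likely apply a Gaussian blur or, as described above, swap in a synthetic replacement.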

Unlike manual blurring, Sama’s PII Data Anonymizer runs without a human examining the data, which helps keep PII private. It is built on deep learning and runs within our technology platform, so customer data never leaves Sama’s secure cloud environment.

From pilots to multi-year projects, Sama securely trains and validates computer vision and NLP models. Our use cases range from e-commerce to autonomous transportation, manufacturing, navigation, retail, AR/VR, and biotech. If your goal is to quickly build smarter AI, contact our team to discuss your training data needs.