Data-Driven AI for Good with Neil Sahota from IBM and the UN

Most of what we read about artificial intelligence (AI) discusses the speed, accuracy, and “stamina” of well-trained algorithms in assisting humans. Yet even IBM’s Watson artificial intelligence platform has proven that machine learning (ML) tools are only as effective as the training data that goes into them.

IBM is recognized as an industry leader in data, analytics, ML, and AI. Yet on Jeopardy! several years ago, Watson gave puzzling, incorrect responses to some trivia questions. It won the tournament, but its missteps echoed the challenges many organizations have experienced with their own AI initiatives.

For this episode of the How AI Happens podcast, we hear from Neil Sahota: AI Advisor to the United Nations, co-founder of the UN's AI for Good initiative, IBM Master Inventor, and author of Own the AI Revolution.


Here are some key takeaways:

Data collection and storage are critical AI ingredients

The concept of AI has been around for decades, yet it only became reality when companies like IBM developed tools for enterprise application integration, high-volume data collection, and unprecedented data-processing speeds. The Watson platform is recognized for its ability to collect data from many accessible sources, organize it, identify relationships between data points, and analyze those relationships to answer future queries.

Collection may involve ingesting and aggregating data into a central database from various feeds. Or it might call for creating and training an indexing crawler to explore various accessible data repositories, which either scrapes data for collection or identifies relevant data points and the relationships between them. Finally, it "bookmarks" information for on-demand discovery.
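The two collection paths described above can be sketched as a toy pipeline. This is not Watson's implementation; the `Document` fields, the `fetch` callback, and the index structure are all hypothetical placeholders, assumed here only to make the ingest-vs-crawl distinction concrete.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    source: str
    text: str
    links: list = field(default_factory=list)


class Collector:
    """Toy index: either ingest feeds wholesale, or crawl and bookmark."""

    def __init__(self):
        self.index = {}      # source -> text, "bookmarked" for on-demand discovery
        self.relations = []  # (source, linked_source) pairs between data points

    def ingest(self, docs):
        # Path 1: aggregate data from known feeds into a central store.
        for doc in docs:
            self.index[doc.source] = doc.text

    def crawl(self, seed, fetch):
        # Path 2: explore linked repositories, recording the relationships
        # between data points as well as the data itself.
        seen, frontier = set(), [seed]
        while frontier:
            src = frontier.pop()
            if src in seen:
                continue
            seen.add(src)
            doc = fetch(src)  # fetch() stands in for any repository access
            self.index[doc.source] = doc.text
            for link in doc.links:
                self.relations.append((src, link))
                frontier.append(link)
```

In this sketch, `ingest` copies everything into one place, while `crawl` follows links and keeps a relationship graph alongside the bookmarked content, which is the distinction the paragraph above draws.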

In the AI and cloud era, engineers need to understand how algorithms access and reason through oceans of information based on their training and programming. A couple of decades ago, computers and software could only present the data they had stored, based on a user-prepared reporting structure. Now, AI-enhanced applications respond to complex, real-time, natural language queries. Just as humans learn and improve with experience, so can ML bots.

AI confidence levels are important for training data

In healthcare, the closer a query result can get to absolute certainty, the better. Consider a surgeon querying thousands of patient cases to find the right course of treatment for a given set of symptoms.

Even for Watson, there are often many possibilities when diagnosing and treating patients. ML algorithms learn by ingesting and processing training and production data, building certainty in their response relevance and accuracy.

A confidence score is typically presented either as a figure between zero and one or as a percentage. In healthcare, a virtual medical assistant may be 95% confident in a diagnosis based on symptoms and the case histories it has learned. Large, authentic data sets help train algorithms to make the best possible recommendations based on available cases and outcomes.
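To make the idea concrete, here is one common way (not necessarily Watson's) that raw candidate scores from a model are normalized into confidences that sum to one and read as percentages. The diagnosis names and scores below are invented for illustration.

```python
import math


def confidences(scores):
    """Normalize raw candidate scores into confidences summing to 1 (softmax)."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical diagnosis candidates with raw match scores from a model.
candidates = {"influenza": 4.1, "common cold": 1.2, "strep throat": 0.5}
scores = confidences(list(candidates.values()))
for name, score in zip(candidates, scores):
    print(f"{name}: {score:.0%}")  # each candidate shown as a percentage
```

Whether shown as 0.92 or 92%, the number carries the same meaning: how strongly the available training cases support one answer over the alternatives.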

Why 99.9% Query Result Confidence is Unrealistic

Many organizations have a wealth of data to train their algorithms, but near-certainty is unrealistic and potentially dangerous. An organization may not have enough training data, or its algorithm's confidence may rest on excessive bias.

Uncommonly high confidence levels may encourage users to ignore alternatives. And as users follow the same recommendation repeatedly, the results it produces may change; Sahota cited the Law of Diminishing Returns to demonstrate this point.

Synthetic testing data improves query response accuracy confidence

Sahota described how, in "closed ecosystems" like controlled financial services environments, synthetic data can help improve confidence when detecting instances of fraud or money laundering. Yet when there are many variables, as when studying the natural world, it is far less effective.
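In a closed ecosystem like a payments ledger, the variables are few enough that synthetic examples can be generated to rebalance rare fraud cases. This is a minimal sketch of that idea, not any specific vendor's tooling; the field names, amount ranges, and outlier pattern are all assumptions.

```python
import random


def synth_transactions(n, fraud_rate=0.05, seed=42):
    """Generate synthetic transactions; fraud cases get outlier amounts."""
    rng = random.Random(seed)  # seeded for reproducible synthetic batches
    rows = []
    for i in range(n):
        is_fraud = rng.random() < fraud_rate
        amount = rng.uniform(5, 200)
        if is_fraud:
            amount *= rng.uniform(20, 50)  # inject an outlier pattern to learn
        rows.append({"id": i, "amount": round(amount, 2), "fraud": is_fraud})
    return rows


data = synth_transactions(1000)
frauds = [r for r in data if r["fraud"]]
print(len(frauds), "synthetic fraud cases to augment scarce real examples")
```

The sketch works precisely because the environment is closed: every variable is controlled by the generator. In an open domain like the natural world, no simple generator can enumerate the variables, which is why Sahota finds synthetic data far less effective there.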

In the AI and ML sectors, human intervention and coaching are often required for training algorithms. In some cases of high variability and low confidence, human intuition and abstract thinking still make the better decisions. For example, AI algorithms used for court cases in the criminal, civil, and international realms could be influenced by unfair biases; without human abstract thought and empathy, penalties could be too harsh or too lenient.


The Future of AI for Good

AI is implemented by many organizations to augment humans in roles like customer service. Sahota believes AI algorithms and robotics will enhance our creativity, productivity, and even our abilities to see, hear, and walk. His work with global organizations like IBM and the UN provides him with unique perspectives on what is happening in AI and ML today, and what the future holds.

Technologies like computer vision, modern neuroscience, the metaverse, and AI were the stuff of science fiction only decades ago. Sahota’s views on opportunities like human-machine integration and sustainable mining may seem far-fetched, yet are likely closer to reality than one might think.

To listen in on more conversations like this one, visit our podcast page.
