Machine Learning 101

Sama Team

June 21, 2018

If you’ve kept up with today’s tech news, then you’ve probably read some pieces about machine learning. Unfortunately, many of those articles target expert audiences who already know how to code and design algorithms. What is machine learning, anyway, and where can you turn to get up to speed on the basics?

In this post, we’ll present a simple overview of machine learning and how it helps computers solve complex problems. Even if you're a complete novice, you'll learn something new from the information below.

What is Machine Learning?

Insider blog TechEmergence compiled a definition of machine learning that aggregates definitions from several leading experts in industry and academia:

“Machine Learning is the science of getting computers to learn and act like humans do, and improve their learning over time in autonomous fashion, by feeding them data and information in the form of observations and real-world interactions.” - TechEmergence

The key difference between machine learning and traditional programming is that a machine learning algorithm does not have to be told explicitly how to get from the input data to the output data. Instead, the algorithm is given examples of inputs along with the expected outputs, and it learns the rules itself. As the algorithm is presented with more input examples and their expected outputs, it can improve its decision-making performance over time.
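To make that contrast concrete, here is a minimal Python sketch. Everything in it is made up for illustration: the task (deciding from a temperature reading whether a fan should be on), the example data, and the function name `learn_threshold`. Nobody writes the rule "switch on at 27 degrees" by hand; the program picks the cut-off that best matches the labeled examples.

```python
def learn_threshold(examples):
    """Pick the cut-off that classifies the most labeled examples correctly.

    examples: list of (reading, expected_output) pairs, where expected_output
    is True or False. Both the task and the data are hypothetical.
    """
    best_cut, best_correct = None, -1
    for cut in sorted(reading for reading, _ in examples):
        # Count how many examples the rule "reading >= cut" gets right.
        correct = sum((reading >= cut) == expected for reading, expected in examples)
        if correct > best_correct:
            best_cut, best_correct = cut, correct
    return best_cut


# Input examples paired with their expected output (fan on or off).
examples = [(18, False), (20, False), (22, False), (27, True), (29, True), (31, True)]

cut = learn_threshold(examples)


def fan_should_be_on(reading):
    """Apply the learned rule to a reading the program has not seen before."""
    return reading >= cut
```

With more examples, the learned cut-off would typically settle closer to the real boundary, which is the "improves over time" behavior described above.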

For example, if you played chess against software built with machine learning, the software powering your computer opponent would study the results of your moves, its own moves, and its strategies to become a better player. Given enough games, it could learn to beat you consistently, even in unfamiliar positions it had not seen and analyzed before. Even if you're a chess master, the computer may well learn to play better than you.

You can apply the chess example to any type of information. For instance, machine learning could help software identify people trespassing on property, predict stock market trends, navigate autonomous vehicles, identify farming pests and more. As long as the software has access to useful data and a reliable algorithm, it can learn.

The Three Types of Machine Learning

Not surprisingly, a cutting-edge computer science topic like machine learning can get very complicated. Most machine learning work can be grouped into three categories: supervised learning, unsupervised learning, and reinforcement learning.

SUPERVISED LEARNING

Supervised learning means that software is trained on data that has been labeled. For instance, you might feed 500 images labeled "cow" and another 500 images labeled "human" into the algorithm. After analyzing the images, the program could tell a picture of a human from a picture of a cow based on the arrangement, color, and shape of the pixels. Reasonably accurate computer vision programs require large quantities of training images, and they can make amusing mistakes if inadvertently trained to notice something else, such as the grass in the background.

Labeling data makes it considerably easier for computers to learn. This principle isn't surprising when you think about how you learn. Imagine if someone handed you a page full of numbers with no explanation. You probably wouldn't know what the numbers mean, and so you wouldn’t know how to process that data right away. However, if you were then handed the same page of data with the label “phone numbers,” the numbers would suddenly make more sense.
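The cow-or-human setup can be sketched in a few lines of Python. This is a toy stand-in, not how a real computer vision system works: instead of pixels, each example is a pair of invented features (number of legs, body mass in hundreds of kilograms), and the learner is a simple nearest-centroid classifier. The names `train` and `predict` and all of the data are assumptions made for this illustration.

```python
import math
from collections import defaultdict


def train(labeled_examples):
    """Average the feature vectors of each label into one centroid per label."""
    sums = {}
    counts = defaultdict(int)
    for features, label in labeled_examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(centroids, features):
    """Return the label whose centroid is closest to the new example."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


# Hypothetical labeled training data: ([legs, mass in 100 kg], label).
training_data = [
    ([4, 6.0], "cow"), ([4, 7.5], "cow"), ([4, 5.5], "cow"),
    ([2, 0.7], "human"), ([2, 0.9], "human"), ([2, 0.6], "human"),
]

centroids = train(training_data)
guess_a = predict(centroids, [4, 6.5])  # an unseen, cow-like example
guess_b = predict(centroids, [2, 0.8])  # an unseen, human-like example
```

The labels do the heavy lifting here: without them, `train` would have no way to know which examples belong together.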

UNSUPERVISED LEARNING

In unsupervised machine learning, the data used do not have any labels. Without labels, successful machine learning usually requires more data before it can generate useful outputs. The algorithm can try to detect similarities and differences in the input data and start to group them based on those characteristics. With enough examples, the groupings can become very meaningful.

Returning to the scenario above, the first three digits of the phone numbers (assuming they include area codes) would vary far less than the seven digits that follow. The unsupervised learning algorithm could start to group the phone numbers based on their shared area codes, and correctly assign a newly discovered phone number to the appropriate area code group.

Of course, the algorithm doesn’t even know what an area code is, but it has learned something important about patterns that it can apply to sorting future samples. (And now you perhaps have an inkling of how Netflix can recommend movies based on your previous viewing choices.)
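Here is one way the phone number grouping might look in Python. It is a deliberately simple heuristic written for this post, not a standard clustering algorithm: it looks for the longest leading run of digits that still puts every number in a group with at least one neighbor. The phone numbers are fictional, and the function names and the `min_group_size` parameter are assumptions.

```python
from collections import defaultdict


def group_by_prefix(numbers, length):
    """Group phone numbers by their first `length` digits."""
    groups = defaultdict(list)
    for number in numbers:
        groups[number[:length]].append(number)
    return dict(groups)


def discover_prefix_length(numbers, min_group_size=2):
    """Find the longest prefix for which no group is smaller than min_group_size."""
    best = 0
    for length in range(1, len(numbers[0]) + 1):
        groups = group_by_prefix(numbers, length)
        if min(len(group) for group in groups.values()) < min_group_size:
            break  # a longer prefix would split the numbers into lone samples
        best = length
    return best


# Unlabeled, made-up data: the code is never told what an area code is.
phone_numbers = [
    "4155550132", "4152340198", "4159870111",
    "2125550147", "2128760123", "2123450190",
    "6175550100", "6172220155",
]

prefix_length = discover_prefix_length(phone_numbers)
groups = group_by_prefix(phone_numbers, prefix_length)

# A newly discovered number is assigned to the group that shares its prefix.
new_number = "2129990000"
assigned_group = new_number[:prefix_length]
```

On this small sample the heuristic settles on a three-digit prefix, so the groups line up with area codes even though nothing in the code mentions them.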

REINFORCEMENT LEARNING

Reinforcement learning conceptually splits the difference between supervised and unsupervised approaches by learning through trial and error. In the chess-playing example, you might have an algorithm that can attempt any move, and a grader that tells the algorithm whether each move is legal (for example, if it tries to move a pawn six spaces forward, the grader says, “nope!”). Through trial and error, the algorithm would eventually learn how each chess piece should move. Similarly, as it plays through more games, it would learn what it means to win or lose, and how to win more often.
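The pawn example can be sketched as a tiny trial-and-error loop in Python. This is a toy under stated assumptions, and nowhere near what a real chess or Go program does: the learner only considers a pawn's first move on an otherwise empty file, it tries moves at random, and the grader answers with +1 for a legal move or -1 for an illegal one. The names `grader` and `learn_by_trial_and_error`, the step size, and the number of trials are all invented for this sketch.

```python
import random

# Assumption for this sketch: on its first move a pawn may advance 1 or 2 squares.
LEGAL_FIRST_MOVES = {1, 2}


def grader(squares_forward):
    """Reward a legal move with +1 and answer "nope!" (-1) to an illegal one."""
    return 1 if squares_forward in LEGAL_FIRST_MOVES else -1


def learn_by_trial_and_error(trials=500, step=0.5, seed=0):
    """Try random moves and keep a running value estimate for each one."""
    rng = random.Random(seed)
    actions = [1, 2, 3, 4, 5, 6]
    value = {action: 0.0 for action in actions}
    for _ in range(trials):
        action = rng.choice(actions)  # explore: the learner knows no rules yet
        reward = grader(action)
        # Nudge the estimate for this move toward the reward it just received.
        value[action] += step * (reward - value[action])
    return value


values = learn_by_trial_and_error()
learned_legal = sorted(action for action, v in values.items() if v > 0)
```

After enough trials, the moves with positive value estimates in `learned_legal` should be exactly the ones the grader accepts, even though the learner was never shown the rule itself.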

In fact, just these kinds of techniques allowed the AlphaGo and AlphaGo Zero programs to very rapidly become world-class Go players.

Clean Data is Necessary (But Hard to Get)

Machine learning relies on clean data. Without reliable data, software can't learn the right lessons or become better at usefully automating tasks. It might learn from the noise instead of the signal.

Unfortunately, it's difficult for data scientists to provide the most advanced learning algorithms with good, clean data. Some of the reasons include:

  • Insufficient people to label a mountain of raw data;

  • Irrelevant data that gets mixed in with desired data;

  • Incomplete or partially labeled data; and

  • Human error in labeling data

These challenges could mean that your machine learning algorithm uses corrupted training data, which could lead to poor learning results that get repeated and amplified. The software, in other words, doesn't learn the right lessons to do its job well.
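Some of these problems can be caught with a simple audit before training begins. The Python sketch below is one hypothetical way to do that: it flags items with no label, items whose label is irrelevant to the task, and items that different labelers tagged differently. The record format, the `ALLOWED_LABELS` set, and the function name `audit_labels` are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical task: every image should be labeled "cow" or "human".
ALLOWED_LABELS = {"cow", "human"}


def audit_labels(records):
    """Flag common data-quality problems in (item_id, label) records.

    A label of None means the item was never labeled.
    """
    labels_by_item = defaultdict(set)
    issues = {"missing": [], "irrelevant": [], "conflicting": []}
    for item_id, label in records:
        if label is None:
            issues["missing"].append(item_id)     # incomplete labeling
        elif label not in ALLOWED_LABELS:
            issues["irrelevant"].append(item_id)  # data that does not belong
        else:
            labels_by_item[item_id].add(label)
    # The same item carrying two different valid labels suggests human error.
    issues["conflicting"] = sorted(
        item_id for item_id, labels in labels_by_item.items() if len(labels) > 1
    )
    return issues


records = [
    ("img1", "cow"), ("img2", "human"), ("img3", None),
    ("img4", "tractor"), ("img5", "cow"), ("img5", "human"),
]

report = audit_labels(records)
```

A check like this cannot tell you which of two conflicting labels is correct; it only points a human reviewer at the records most likely to corrupt training.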

Working with a partner that understands the most effective ways to source and identify clean data will give you an advantage over competitors. You can learn more about data enrichment by reaching out to Sama. Our training data work is trusted by the world’s leading technology teams working on AI and machine learning across industries, from self-driving cars to robotics for advanced surgery.