“You don’t put a team together with a computer, Billy.”
Well, as Billy Beane showed in the 2011 movie Moneyball, you do. And, with help from statistics, Beane put together a winning one.
But today’s sports analytics are on a different scale. Games are massive data-generating machines, ripe for analysis of every move and strategy. The sheer volume of data, combined with the efficiencies needed to come out on top (a second shaved off can be a win), makes sports a prime candidate for the use of artificial intelligence.
How sports organizations are already scoring with AI
Most reliable uses of AI today are based on machine learning: large volumes of varied data are aggregated from a large population, and the model finds patterns in that data and delivers recommendations based on them. The field of sports analytics is no exception.
Recommendation algorithms fed by behavioral data
In sports, AI can drive ticket and merchandise sales by studying purchase patterns from a large number of fans. When fans use a team’s app to buy tickets or check the latest statistics, with every click, they are sharing data that can be aggregated and analyzed. Much like Netflix recommends a movie based on what you (and viewers like you) have already watched, sports franchises can use fan interactions to suggest purchases — a Cristiano Ronaldo jersey if a fan keeps checking the player’s stats — and ultimately, boost revenue.
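As a rough illustration of the idea, here is a minimal co-occurrence recommender in Python. It is a sketch under stated assumptions, not how any particular franchise or streaming service actually works: the fan histories, item names, and the functions build_cooccurrence and recommend are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

def build_cooccurrence(histories):
    """Count how often two items show up in the same fan's history."""
    co = defaultdict(Counter)
    for items in histories.values():
        for a, b in combinations(sorted(set(items)), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co

def recommend(fan_items, co, top_n=3):
    """Suggest items that often co-occur with what this fan already viewed."""
    seen = set(fan_items)
    scores = Counter()
    for item in seen:
        for other, count in co.get(item, {}).items():
            if other not in seen:
                scores[other] += count
    return [item for item, _ in scores.most_common(top_n)]

# Hypothetical interaction data (item names are made up for illustration).
histories = {
    "fan_a": ["ronaldo_stats", "ronaldo_jersey", "match_ticket"],
    "fan_b": ["ronaldo_stats", "ronaldo_jersey"],
    "fan_c": ["ronaldo_stats", "scarf"],
    "fan_d": ["scarf", "match_ticket"],
}
co = build_cooccurrence(histories)
print(recommend(["ronaldo_stats"], co, top_n=1))  # ['ronaldo_jersey']
```

Real systems typically weight recent activity, handle far larger catalogs, and use learned models rather than raw counts, but the underlying pattern of "fans like you also bought" is the same.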
Improving performance by studying patterns in players
AI is used in sports both on and off the pitch, enhancing audience engagement and player performance. Camera feeds from previous matches, along with data on body angles when pitching or kicking collected from players’ wearables, all funnel into algorithms. The programs analyze large banks of data to understand what it takes to win: for example, what body stance a player should adopt or how often a team needs to be in possession of the ball.
Teams can use this insight to compare their performance against the ideal and improve their approach in the process. Using video to improve performance and strategy is not new, but AI increases both the kinds of data available (in tennis, for example, fastest shot, height of shot, average shot depth and spread) and the volume of data that can be used to perfect these strategies.
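To make "body angles" concrete, the sketch below computes the angle at a joint from three 2D keypoints, such as shoulder, elbow, and wrist. It assumes the keypoints are already available as (x, y) coordinates from a pose-estimation model or wearable; the function name joint_angle and the sample coordinates are hypothetical.

```python
import math

def joint_angle(a, b, c):
    """Return the angle in degrees at point b between segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        raise ValueError("keypoints must not coincide with the joint")
    cos_theta = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    # Clamp to guard against floating-point drift outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))

# Hypothetical shoulder, elbow, and wrist positions in image coordinates.
shoulder, elbow, wrist = (0.0, 1.0), (0.0, 0.0), (1.0, 0.0)
print(round(joint_angle(shoulder, elbow, wrist), 1))  # 90.0
```

Tracking an angle like this frame by frame is one simple way to compare a player's stance against a reference motion.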
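The tennis metrics mentioned above reduce to simple aggregates once each shot has been measured. The following sketch assumes each shot is a dictionary with hypothetical fields speed_kmh, height_m, and depth_m, and uses the population standard deviation as one possible definition of "spread".

```python
from statistics import mean, pstdev

def summarize_shots(shots):
    """Aggregate per-shot measurements into a few headline metrics."""
    if not shots:
        raise ValueError("need at least one shot")
    depths = [s["depth_m"] for s in shots]
    return {
        "fastest_kmh": max(s["speed_kmh"] for s in shots),
        "max_height_m": max(s["height_m"] for s in shots),
        "avg_depth_m": mean(depths),
        "depth_spread_m": pstdev(depths),  # assumed definition of spread
    }

# Made-up measurements for illustration only.
shots = [
    {"speed_kmh": 120.0, "height_m": 1.2, "depth_m": 2.0},
    {"speed_kmh": 135.0, "height_m": 0.9, "depth_m": 4.0},
]
print(summarize_shots(shots))
```

The hard part in practice is extracting reliable per-shot measurements from video, not the aggregation itself.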
In the future, expect such AI recommendations to fine-tune player performance in real-time.
Computer vision data labeling challenges in sports analytics
To create robust AI, we need to train algorithms on large banks of data that, as much as possible, account for all the situations and edge cases the model will observe in production.
For any single player, computer vision models must be trained to recognize an almost countless variety of poses, body postures, and positions.
In addition to this, semantic information must be added in order to create context on what actions the players are performing: are they passing the ball, running with it, intercepting, defending, and so forth? How are their actions changing over time?
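One way to picture this semantic layer is as a per-frame action label for each player, which can then be collapsed into segments to show how actions change over time. The schema below is a hypothetical sketch (the FrameLabel class, its fields, and the action names are assumptions), not a standard annotation format.

```python
from dataclasses import dataclass
from itertools import groupby

@dataclass(frozen=True)
class FrameLabel:
    frame: int       # frame index in the video
    player_id: str   # tracking identifier for the player
    action: str      # e.g. "running", "passing", "intercepting"

def action_segments(labels):
    """Collapse one player's per-frame labels into (action, first, last) runs."""
    ordered = sorted(labels, key=lambda lab: lab.frame)
    segments = []
    for action, group in groupby(ordered, key=lambda lab: lab.action):
        frames = [lab.frame for lab in group]
        segments.append((action, frames[0], frames[-1]))
    return segments

labels = [
    FrameLabel(0, "p7", "running"),
    FrameLabel(1, "p7", "running"),
    FrameLabel(2, "p7", "passing"),
    FrameLabel(3, "p7", "running"),
]
print(action_segments(labels))
# [('running', 0, 1), ('passing', 2, 2), ('running', 3, 3)]
```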
Multiply this by the number of players, and the number of video frames in any given sports game, and you can imagine that the amount of data that needs to be labeled adds up rather quickly.
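A back-of-the-envelope calculation shows the scale. The figures used here (a 90-minute match, 25 frames per second, 22 players) are illustrative assumptions, and player_instances is a hypothetical helper.

```python
def player_instances(duration_min, fps, players, sample_every_n=1):
    """Estimate how many player annotations one game could require."""
    total_frames = duration_min * 60 * fps
    # Keep every n-th frame, starting with the first one.
    sampled_frames = (total_frames + sample_every_n - 1) // sample_every_n
    return sampled_frames * players

print(player_instances(90, 25, 22))                    # 2970000
print(player_instances(90, 25, 22, sample_every_n=5))  # 594000
```

Even when labeling only every fifth frame, a single game under these assumptions yields roughly 600,000 player instances, before adding action labels or the ball.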
Beyond the sheer volume of data, there is also the challenge of labeling videos and images in varying lighting and weather conditions. Additionally, close interactions and equipment can result in partial or full occlusion of players or the ball.
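Occlusion can be approximated from bounding boxes: the fraction of one player's box covered by another box gives a rough signal for frames that need extra care from annotators. This is a simplified sketch that assumes axis-aligned boxes in (x_min, y_min, x_max, y_max) form; overlap_fraction is a hypothetical name.

```python
def overlap_fraction(target, other):
    """Fraction of the target box's area that is covered by the other box."""
    width = max(0.0, min(target[2], other[2]) - max(target[0], other[0]))
    height = max(0.0, min(target[3], other[3]) - max(target[1], other[1]))
    area = (target[2] - target[0]) * (target[3] - target[1])
    if area <= 0:
        raise ValueError("target box must have positive area")
    return (width * height) / area

player = (0.0, 0.0, 10.0, 10.0)
defender = (5.0, 0.0, 15.0, 10.0)
print(overlap_fraction(player, defender))  # 0.5
```

Box overlap does not say which player is in front, so it flags candidate occlusions rather than confirming them.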
Finally, some sports involve fast, erratic movements, or players who wear helmets and can easily be mistaken for one another.
In short, there is a nearly unlimited number of edge cases that must be accounted for when labeling data for sports analytics.
Take for example “noisy” data such as footage of a field partially obscured by a backstop.
How high-quality annotated data can net you an AI win
With the above challenges in mind, what can you do to ensure that you are fueling your algorithms with the highest-quality labels possible?
Given that datasets for sports analytics are necessarily large, be strategic about what data you send to be labeled first. Data curation and filtering tools can help prioritize labels that will have the biggest impact on your model’s performance.
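One common prioritization heuristic is uncertainty sampling: send the frames your current model is least sure about to annotators first. The sketch below ranks frames by the entropy of predicted class probabilities. It illustrates the general idea rather than any specific curation tool, and the frame IDs, probabilities, and function names are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy of a probability distribution (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def prioritize(predictions, budget):
    """Return the IDs of the `budget` frames with the most uncertain predictions."""
    ranked = sorted(predictions, key=lambda fid: entropy(predictions[fid]), reverse=True)
    return ranked[:budget]

# Made-up model outputs: probabilities over three action classes per frame.
predictions = {
    "frame_001": [0.98, 0.01, 0.01],  # confident
    "frame_002": [0.40, 0.35, 0.25],  # very uncertain
    "frame_003": [0.70, 0.20, 0.10],  # somewhat uncertain
}
print(prioritize(predictions, budget=2))  # ['frame_002', 'frame_003']
```

Uncertainty is only one signal; a real curation step might also balance for diversity across lighting, weather, and camera angles.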
And while it may be tempting to try and predict every edge case that your model may observe in production, assume that there’s a lot you don’t know you don’t know. Put in place a plan to uncover and quickly address edge cases as they arise. This includes setting aside budget and time to catch and resolve these outliers.
A crucial component of this is ensuring that your labeling process is iterative, with tight feedback loops. Whether you have an in-house annotator who is your resident specialist on the rules of the game, or you’re upskilling an external data annotation team to become experts on your data, communication is key to high-quality annotated data at scale.
In Moneyball, Billy Beane’s mantra was “Adapt or die.” A similar recommendation holds true with artificial intelligence. Using AI in a variety of use cases both on and off the pitch can help sports companies score big. Those that are data-centric in their approach will race ahead of the competition, while those that aren’t risk being left behind.