Did you know that the automatic sports broadcasting system that tracked players in real time by moving a robotic camera to capture their movements was created by the brilliant Dr Patrick Lucey? He is the Chief Scientist at Stats Perform, whose mantra is “We are the DNA of Sports.” Brilliant, right?
Just like a corporate ERP system, sports have a plethora of data, tracking players’ movements, speed, agility and more. So, how can AI help coaches, athletes and analysts achieve the end goal: to WIN?
As a leader in data collection for sports, Stats Perform offers a wide range of sports predictions and insights using AI solutions. These bad boys have been in business for over 40 years and have collected the world’s deepest sports data. This includes covering over 27,000 live-streamed events worldwide and a total of 501,000 matches annually from 3,900 competitions. That translates into billions of unique event and tracking data points stored in their immense sports databases. To make use of this invaluable dataset, Stats Perform created an AI Innovation Centre, which has hired more than 300 developers and 50 data scientists to build a series of AI products with the goal of measuring what was once immeasurable in sport. (Source)
So what are the different types of sports data? Whether the sport is football, water polo, basketball or tennis, Stats Perform has boiled it down to three main types of data.
Let’s start with Box Score:
High-level box-score statistics (half-time score, full-time score, goal scorers, time of goals, yellow cards, etc.) can summarize a 90-minute soccer (football) match and give insight into how the game was played in just a few moments. Box-score statistics tell you who won the match, which team took the lead first, when the goals were scored and how close the score was. They provide a decent snapshot of a game and a decent level of match reconstruction.
We have all heard announcers discuss the total number of shots a basketball team has taken, the number a single player has missed and the team’s free-throw percentage. I have always been amazed by announcers’ ability to recall player stats and to know that the hockey goalie saved 10 of 12 shots. This data can now be analyzed and massaged to predict player and team behavior. Now, let’s take it a level deeper into Event Data.
The second facet is Event Data:
Event data, or play-by-play data, provides more detail than box-score statistics by identifying and tracking key moments during a match. For example, play-by-play commentary can offer a textual description of what occurred in every minute of the match. Similarly, spatial data (i.e. the location of players on the pitch) can provide visual reconstructions of key events, such as how a particular goal was scored. While it is not the same as watching the video, it is a quick, digitised view of the real-world play that can be reconstructed in seconds. The next question becomes: how do we make sense of all this data?
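To show how little raw data a box score needs, here is a minimal Python sketch that rebuilds the half-time and full-time score, the winner, who led first and the final margin from a list of goal events alone. The `Goal` class, the `box_score` function and the team and player names are hypothetical, and it assumes first-half stoppage-time goals are recorded as minute 45.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Goal:
    team: str    # team credited with the goal
    scorer: str  # player who scored
    minute: int  # match minute (first-half stoppage time assumed recorded as 45)

def box_score(home: str, away: str, goals: list[Goal]) -> dict:
    """Summarise a match using only its goal events (hypothetical helper)."""
    goals = sorted(goals, key=lambda g: g.minute)
    half_time = {home: 0, away: 0}
    full_time = {home: 0, away: 0}
    for g in goals:
        full_time[g.team] += 1
        if g.minute <= 45:
            half_time[g.team] += 1
    winner: Optional[str] = None  # None means a draw
    if full_time[home] != full_time[away]:
        winner = home if full_time[home] > full_time[away] else away
    return {
        "half_time": half_time,
        "full_time": full_time,
        "winner": winner,
        # starting from 0-0, whoever scores first is the first team to lead
        "first_lead": goals[0].team if goals else None,
        "margin": abs(full_time[home] - full_time[away]),
        "goal_minutes": [g.minute for g in goals],
    }

goals = [
    Goal("Rovers", "A. Smith", 12),
    Goal("United", "B. Jones", 58),
    Goal("Rovers", "C. Brown", 81),
]
summary = box_score("Rovers", "United", goals)
print(summary["full_time"], summary["winner"])  # {'Rovers': 2, 'United': 1} Rovers
```

Three short records are enough to answer every question in the paragraph above, which is exactly why box scores are such a compact (if shallow) summary.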
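A play-by-play feed is essentially a time-ordered list of records, each with a time, a team, a player, an event type and a pitch location. The minimal sketch below (all names, coordinates and the 105 m x 68 m pitch are hypothetical) reconstructs how a goal was scored by walking back through the feed while the scoring team kept the ball, treating consecutive events by the same team as one possession, which is a simplification.

```python
from dataclasses import dataclass

@dataclass
class Event:
    minute: int
    team: str
    player: str
    kind: str   # e.g. "pass", "interception", "shot", "goal"
    x: float    # metres along the pitch (assumed 105 m long)
    y: float    # metres across the pitch (assumed 68 m wide)

def build_up(events: list[Event], goal_index: int) -> list[Event]:
    """Walk back from a goal while consecutive events belong to the scoring team."""
    team = events[goal_index].team
    start = goal_index
    while start > 0 and events[start - 1].team == team:
        start -= 1
    return events[start:goal_index + 1]

# Hypothetical play-by-play feed, in chronological order.
events = [
    Event(23, "United", "B. Jones", "pass", 40.0, 30.0),
    Event(23, "Rovers", "D. Green", "interception", 45.0, 32.0),
    Event(23, "Rovers", "D. Green", "pass", 50.0, 34.0),
    Event(24, "Rovers", "E. White", "pass", 80.0, 20.0),
    Event(24, "Rovers", "A. Smith", "goal", 96.0, 36.0),
]
goal_index = next(i for i, e in enumerate(events) if e.kind == "goal")
for e in build_up(events, goal_index):
    print(f"{e.minute}' {e.player}: {e.kind} at ({e.x}, {e.y})")
```

Plotting those (x, y) points on a pitch diagram gives the quick digitised reconstruction of the goal described above.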
The third facet is Tracking Data:
Details are everything. Tracking data goes a level beyond the Box Score and Event Data (as defined by Stats Perform) and is the most detailed of the three types of data captured in sports. Could you imagine if a coach knew when and how a QB would throw a football, and in which trajectory, 80% of the time? Analyzing this data makes it possible to project the location of all players and the ball onto a diagram of the pitch that best reconstructs a match from its raw video footage. In high school and college, we would have video reviews, and our coaches would spend countless hours stalking our competition to know when a basketball player would drive to the basket or pass down low for the score. A digital representation of all players across the entire pitch lets analysts run far richer queries than a video feed, which only shows a subsection of the pitch.
While we sit here in the 2020s and live in a data space, we have to pay homage to Bill James. In 1981 (I was 1), he created Project Scoresheet, which aimed to build a network of fans to collect and distribute baseball information, and his work paved the way for efforts like Daryl Morey’s integration of advanced statistical analysis with the Houston Rockets in 2007. Bill literally set the tone for using data in sports to track movement and wins.
Now, we are in a new era of data (one that definitely includes sports) that builds on the value of traditional box-score and event data by complementing it with deeper tracking data. The AI revolution in sports is here, it’s exciting, and it is a game changer.
I have been an athlete since I was 8 years old, and we would show up to games knowing little about the competition until we had played them two or three times in a season. Our “AI” was locked in our brains: our competitors’ athleticism, whether they were right-handed, whether they had long strides in cross-country, etc. Now, we have the following:
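Tracking data is typically stored as a sequence of frames, each holding the (x, y) position of every player and the ball at a fixed sampling rate. The sketch below assumes 25 frames per second and uses hypothetical player ids; it derives distance covered and speed, and answers a whole-pitch query (who is closest to the ball) that a broadcast view showing only part of the pitch cannot always answer.

```python
import math

FRAME_RATE_HZ = 25  # assumed sampling rate; real tracking systems vary

# One frame maps an object id ("ball" or a player id) to its (x, y) in metres.
Frame = dict[str, tuple[float, float]]

def distance_covered(frames: list[Frame], obj: str) -> float:
    """Total distance an object travels across consecutive frames."""
    return sum(math.dist(a[obj], b[obj]) for a, b in zip(frames, frames[1:]))

def speeds(frames: list[Frame], obj: str) -> list[float]:
    """Speed (m/s) between each pair of consecutive frames."""
    dt = 1.0 / FRAME_RATE_HZ
    return [math.dist(a[obj], b[obj]) / dt for a, b in zip(frames, frames[1:])]

def nearest_to_ball(frame: Frame) -> str:
    """Whole-pitch query: which player is closest to the ball in this frame?"""
    players = [k for k in frame if k != "ball"]
    return min(players, key=lambda k: math.dist(frame[k], frame["ball"]))

# Hypothetical one second of tracking: "p7" runs along x at 5 m/s,
# while "p4" stands still next to the ball.
frames = [
    {"p7": (10.0 + 0.2 * i, 30.0), "p4": (48.0, 33.0), "ball": (50.0, 34.0)}
    for i in range(FRAME_RATE_HZ + 1)
]
print(round(distance_covered(frames, "p7"), 2))  # 5.0
print(round(max(speeds(frames, "p7")), 2))       # 5.0
print(nearest_to_ball(frames[0]))                # p4
```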
2. Broadcasting Footage:
To overcome the limited coverage of in-venue systems, Stats Perform is focusing its AI efforts on capturing tracking data directly from broadcast video, through an initiative called AutoStats. It leverages the fact that for every sports game being played, there should be at least one video recording of that event, which may also be broadcast. The best way to get full coverage of tracking data is to capture it directly from broadcast footage.
3. Converting Pixels To Dots
Converting video pixels to dots refers to the process of taking the video footage of the game and digitally mapping each player’s movement to a trajectory that can be displayed on a diagram of the pitch in the form of dots. The main advantage of this method is compression: a handful of (x, y) positions per frame replaces millions of pixel values. An uncompressed raw frame of a game at 1920x1080px from a single camera angle holds about 6MB of data (1920 × 1080 pixels × 3 bytes per RGB pixel), and a video is a long sequence of such frames. If instead of one camera angle you have 6 different camera angles, the data multiplies to around 37MB per frame. This is an incredibly high amount of high-dimensional data, but not all of it is useful for sports analysis.
The conversion from pixels to dots happens via supervised learning, where a model learns from labelled examples to map the input pixels to the desired output of dots. A number of computer vision techniques can be applied to achieve this goal. To step away from football for a second, the image below shows how pixels are converted to dots on a basketball court.
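The compression argument is easy to check with a little arithmetic. The sketch below assumes uncompressed 8-bit RGB frames and stores each dot as a pair of 32-bit floats; the helper names are hypothetical.

```python
BYTES_PER_PIXEL = 3  # uncompressed 8-bit RGB: one byte each for red, green, blue

def raw_frame_bytes(width: int, height: int, cameras: int = 1) -> int:
    """Size of one uncompressed frame across all camera angles."""
    return width * height * BYTES_PER_PIXEL * cameras

def dots_frame_bytes(objects: int, bytes_per_coord: int = 4) -> int:
    """Size of one frame of dots: an (x, y) pair per object as 32-bit floats."""
    return objects * 2 * bytes_per_coord

raw = raw_frame_bytes(1920, 1080, cameras=6)
dots = dots_frame_bytes(objects=23)  # 22 football players plus the ball
print(raw_frame_bytes(1920, 1080))  # 6220800 bytes, about 6.2 MB
print(raw)                          # 37324800 bytes, about 37 MB
print(dots)                         # 184 bytes
print(raw // dots)                  # 202852 (times smaller)
```

Even before any video codec is involved, the dots are roughly five orders of magnitude smaller than the pixels they summarise.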
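One concrete piece of such a pipeline can be sketched. Assuming a detector (where most of the supervised learning happens) has already found each player’s feet in the frame, a standard computer vision step for turning those pixels into dots is a planar homography: a 3x3 projective mapping fitted from labelled pixel/court point pairs, such as the court corners. This is a textbook technique swapped in for illustration, not a description of Stats Perform’s system; the function names, the 28 m x 15 m court and the pixel coordinates are hypothetical, and plain Python keeps it self-contained (a library such as OpenCV would normally do the fitting).

```python
def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x

def fit_homography(pixels, court):
    """Fit a 3x3 homography H (with H[2][2] = 1) from four labelled pixel -> court pairs."""
    a, b = [], []
    for (x, y), (u, v) in zip(pixels, court):
        a.append([x, y, 1, 0, 0, 0, -x * u, -y * u])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -x * v, -y * v])
        b.append(v)
    h = solve(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]

def pixel_to_dot(h, x, y):
    """Project an image pixel (e.g. a detected player's feet) onto court coordinates."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    u = (h[0][0] * x + h[0][1] * y + h[0][2]) / w
    v = (h[1][0] * x + h[1][1] * y + h[1][2]) / w
    return u, v

# Hypothetical labels: where the corners of a 28 m x 15 m court appear in a 1920x1080 frame.
pixels = [(100, 1000), (1820, 1000), (1400, 300), (520, 300)]
court = [(0, 0), (28, 0), (28, 15), (0, 15)]
H = fit_homography(pixels, court)
print(pixel_to_dot(H, 960, 650))  # a player's feet in the middle of the frame
```

Because the pixel layout is mirror-symmetric around x = 960, a player detected on that vertical line lands on the court’s half-way line (u = 14 m), which is a handy sanity check.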
4. Mapping Dots to Events
Once the video has been converted into dots, the movements of the dots over specific timeframes can be mapped to particular events. For example, in basketball, you can start mapping these dots in the tracking data to basketball-related events that describe how certain outcomes occur in terms of tactical themes, such as the pick and roll, the type of coverage on a pick and roll, whether the player drove or posted up, off-ball screens, hand-offs, close-outs, etc. The dot trajectories are mapped to the semantics of a basketball play, and the players involved in that play, by a machine learning model trained on pre-labelled data.
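The idea of a model trained on pre-labelled data can be illustrated with a deliberately tiny stand-in: a k-nearest-neighbour classifier that labels a ball-handler’s trajectory as a drive, a post-up or perimeter play from two hand-made features (metres gained toward the basket and mean speed). Real systems use far richer features and models; the basket location, frame rate, labelled examples and function names here are all hypothetical.

```python
import math
from collections import Counter

BASKET = (0.0, 0.0)  # assumed basket position in court coordinates (metres)
FRAME_RATE_HZ = 25   # assumed tracking frame rate

def features(trajectory: list[tuple[float, float]]) -> tuple[float, float]:
    """(metres gained toward the basket, mean speed in m/s) for one possession."""
    gained = math.dist(trajectory[0], BASKET) - math.dist(trajectory[-1], BASKET)
    path = sum(math.dist(a, b) for a, b in zip(trajectory, trajectory[1:]))
    seconds = (len(trajectory) - 1) / FRAME_RATE_HZ
    return gained, path / seconds

# Hypothetical pre-labelled examples: (metres gained, mean speed) -> play type.
LABELLED = [
    ((5.0, 4.5), "drive"), ((6.0, 5.0), "drive"), ((4.5, 4.0), "drive"),
    ((1.0, 0.8), "post_up"), ((1.5, 1.0), "post_up"), ((0.5, 0.6), "post_up"),
    ((0.0, 2.0), "perimeter"), ((-1.0, 2.5), "perimeter"), ((0.5, 1.8), "perimeter"),
]

def classify(trajectory: list[tuple[float, float]], k: int = 3) -> str:
    """Majority vote among the k labelled examples closest in feature space."""
    f = features(trajectory)
    nearest = sorted(LABELLED, key=lambda ex: math.dist(ex[0], f))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# One second of tracking: the ball-handler attacks from 7 m to 2 m from the basket.
drive = [(7.0 - 5.0 * i / 25, 0.0) for i in range(26)]
print(classify(drive))  # drive
```

The features are left unscaled because both happen to live on similar ranges here; with more features, standardising them before measuring distances would matter.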
5. Mapping Events to Metrics
Expected metrics explain the quality of execution of certain events. The labels assigned to events are often not informative enough on their own. Instead of keeping an outcome label of 0 or 1 (goal or no goal), expected metrics use machine learning to assign a probability between 0 and 100%. For example, a shot that goes in is labelled a success and a shot that hits the post is labelled a failure, yet the shot off the post might have been a 70% chance that simply did not end up in the goal. Regardless of the final outcome of an event, expected metrics help evaluate whether it was more likely to be unsuccessful (0%), successful (100%) or somewhere in the middle (e.g. 55%). This concept of expected metrics is the basis of the Expected Goals (xG) metric in football. The same idea can be extended to passes to calculate the likelihood of a pass reaching a certain teammate on the pitch.
Expected metrics provide an additional degree of context to each situation. For example, in basketball they use Expected Field Goal percentage (EFG): if a player misses a 3-point shot, rather than simply classifying it as a miss, we can assess the likelihood that an average league player would have made that shot from a similar situation. Comparing a player against this league-average baseline provides a measure of their talent and better contextualises their performance.
The conclusion? The true value lies in highly granular data and a data-driven approach to sports analysis. Tennis pros, basketball coaches and college scouts are using new types and volumes of data to predict, analyze and scout athletes.
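To make the difference between a 0-or-1 label and a probability concrete, here is a minimal xG sketch in Python. Logistic regression on shot distance and shooting angle is a common choice for xG but is an assumption here, and the coefficients are made up rather than fitted; a real model would learn them from many historical shots labelled goal (1) or no goal (0).

```python
import math

GOAL_WIDTH = 7.32  # metres between the posts in football

# Hypothetical coefficients; a real xG model is fitted on many shots labelled 0 or 1.
INTERCEPT = -0.5
COEF_DISTANCE = -0.15  # per metre from the centre of the goal
COEF_ANGLE = 1.5       # per radian of goal mouth visible to the shooter

def shot_features(x: float, y: float) -> tuple[float, float]:
    """x: metres out from the goal line (> 0); y: metres left/right of goal centre."""
    distance = math.hypot(x, y)
    angle = abs(math.atan2(y + GOAL_WIDTH / 2, x) - math.atan2(y - GOAL_WIDTH / 2, x))
    return distance, angle

def xg(x: float, y: float) -> float:
    """Logistic model: turns a shot's situation into a scoring probability in (0, 1)."""
    distance, angle = shot_features(x, y)
    z = INTERCEPT + COEF_DISTANCE * distance + COEF_ANGLE * angle
    return 1.0 / (1.0 + math.exp(-z))

# Only the situation enters the calculation, not whether the shot went in or hit the post.
shots = [(6.0, 1.0), (11.0, 0.0), (25.0, -8.0)]  # hypothetical shot locations
for x, y in shots:
    print(f"shot from ({x}, {y}): xG = {xg(x, y):.2f}")
print(f"team xG = {sum(xg(x, y) for x, y in shots):.2f}")
```

Summing xG over a team’s shots gives the familiar “expected goals” figure quoted after matches.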
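Once a model supplies the expected probability for each shot, the comparison described above is simple arithmetic. The sketch below is hypothetical (function name, shots and probabilities are illustrative, and the model producing the probabilities is assumed to exist); it compares a player’s actual field-goal percentage with the league-average expectation for the same shots.

```python
def shooting_over_expected(shots: list[tuple[bool, float]]) -> tuple[float, float, float]:
    """shots: (made, expected) pairs, where `expected` is the probability an average
    league player makes that shot from a similar situation.
    Returns (actual FG%, expected FG%, difference) as fractions."""
    if not shots:
        raise ValueError("need at least one shot")
    actual = sum(made for made, _ in shots) / len(shots)  # True counts as 1
    expected = sum(p for _, p in shots) / len(shots)
    return actual, expected, actual - expected

# Hypothetical night: two makes and two misses, including a missed 3 rated at 35%.
shots = [(True, 0.40), (False, 0.35), (True, 0.50), (False, 0.30)]
actual, expected, diff = shooting_over_expected(shots)
print(f"actual {actual:.1%}, expected {expected:.1%}, over expected {diff:+.1%}")
```

A positive difference means the player converted more than an average league player would have from the same looks, even though individual misses still count as misses.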