New artificial intelligence can detect offsides much faster than past technology, which took over a minute on average.
FIFA is using new artificial intelligence to help referees call offsides in this year's World Cup.
How does it work?
The system is called semi-automated offside technology (SAOT) and uses 12 cameras attached to the roof of the stadium to track the ball and each player's movements.
SAOT uses artificial intelligence to recognize and track players and the ball, calculating their positions 50 times per second.
A sensor inside the official Qatar 2022 World Cup ball, called Al Rihla (Arabic for "the journey"), allows SAOT to match the exact moment the ball was kicked against the positions of the defending team's last defender and the attacking team's striker.
This level of precision is key in very tight situations where it is difficult for referees to call offsides quickly. A goal, and sometimes the result of the entire match, can hinge on such a call.
Whenever SAOT detects an offside, an alert is sent to the video match officials. They inform the referee, who ultimately has the final say. That's why the system is considered "semi-automated."
An upgrade in offside detection
In typical soccer matches, video assistant referee (VAR) systems are used. They take around 70 seconds to detect an offside, much longer than SAOT. With VAR, officials had to find the exact moment the ball was kicked and draw the offside line themselves. With SAOT, all they have to do is confirm the offside suggested by the system.
The new process "happens within a few seconds and means that offside decisions can be made faster and more accurately," according to FIFA's website.
If the referee confirms SAOT's suggestion, the system generates a 3D animation of the offside that is broadcast on a large screen in the stadium, allowing fans to see why the call was made.
The SAOT system was tested for three years and is now "the most accurate offside support system available to video match officials," according to FIFA.
Detecting objects: A complex task
The task of making sense of video footage, and extracting valuable information from it, is called video analysis. The artificial intelligence subfield that deals with it is computer vision.
Imagine that you are a computer, and you can't see the same way humans do. Your eyes are replaced with digital cameras that receive light and transform that information into data. The data tells you how every pixel at every frame appears — for example, how much green, red and blue each pixel has.
This data usually appears as a gigantic table of values. A 1080p frame, for example, is a grid of 1920 x 1080 pixels: each row holds 1920 pixels, each column 1080, and every pixel carries its own red, green and blue values.
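The pixel grid described above can be sketched in a few lines of code. This is a toy stand-in, not anything from SAOT: the color values and the tiny 4 x 3 frame size are invented for illustration, where a real 1080p frame would be 1920 x 1080.

```python
# A minimal sketch of how one video frame looks to a computer:
# a grid of pixels, each pixel a (red, green, blue) triple of 0-255 values.
WIDTH, HEIGHT = 4, 3  # toy size; a real 1080p frame is 1920 x 1080

# Fill the frame with a single "grass green" color (made-up values).
grass_green = (50, 150, 60)
frame = [[grass_green for _ in range(WIDTH)] for _ in range(HEIGHT)]

print(len(frame))     # number of rows: 3
print(len(frame[0]))  # pixels per row: 4
print(frame[0][0])    # one pixel's (R, G, B) values: (50, 150, 60)
```

A real system would hold these values in a dense numeric array rather than nested lists, but the structure is the same: rows of pixels, three color values per pixel, one such grid per frame.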
How do you make sense out of this? Well, that's one of the hottest topics in artificial intelligence — object detection and tracking.
How computers recognize people and things
Data scientists have developed different techniques to approach this problem. One is called a convolutional neural network (CNN). You can see how this process looks on this website made by Adam Harley, a postdoctoral researcher at Stanford University in the US.
CNNs work by detecting objects layer-by-layer. One way to understand how this works is to think about the process of trying to discern the identity of an object in a pitch-black room.
Using your hands to feel the object, you will ask a series of questions that become increasingly specific. First you may wonder: "Is it rigid or soft?"
You press the object and realize it's hard in some places and softer in others. This movement transforms your understanding of the object: Now you have enough information to know it's something with both soft and hard features. This knowledge would represent the first "layer" of detection. In CNNs, this would be considered the "convolution."
After identifying the first layer, you will ask more questions — like what texture the object has, how big it is or what type of shape it holds. As each of these questions is answered, another layer forms, increasing your overall understanding of what's in front of you. This is close to how CNNs work.
At some point, you will gather enough information to start guessing what the object is. You've gathered that the object is furry, has four legs and ears that stick up from the top of its head. Is it a cat? By now, the CNN would be asking: Is it a player? Or a ball?
Once enough information has been gathered to make guesses, a classification process will be used to cross-check the computer's hypothesis with known objects.
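The layer-plus-classification idea above can be shown with a toy convolution. Nothing here is FIFA's actual pipeline: the image, the kernel and the "classes" are invented for illustration, and a real CNN would stack many learned layers instead of one hand-written filter.

```python
def convolve(image, kernel):
    """Slide a small square kernel over a 2D grayscale image (no padding)."""
    k = len(kernel)
    out = []
    for i in range(len(image) - k + 1):
        row = []
        for j in range(len(image[0]) - k + 1):
            total = sum(
                image[i + a][j + b] * kernel[a][b]
                for a in range(k) for b in range(k)
            )
            row.append(total)
        out.append(row)
    return out

# A 5x5 "image" with a bright vertical stripe in the middle column.
image = [[0, 0, 9, 0, 0] for _ in range(5)]

# A hand-written kernel that responds to vertical edges
# (in a real CNN, these weights would be learned from training data).
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = convolve(image, vertical_edge)  # one "layer" of detection

# Crude stand-in for the classification step: threshold the strongest response.
strongest = max(abs(v) for row in feature_map for v in row)
label = "edge-like object" if strongest > 10 else "flat background"
print(label)  # -> edge-like object
```

Each layer of a real CNN applies many such filters at once, and later layers combine their responses into increasingly abstract questions, exactly the "furry, four legs, pointed ears" progression described above, until a final classifier picks the most likely object.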
Artificial intelligence like SAOT is normally trained on huge video databases full of objects that have already been identified by humans — in this case, for example, soccer players on the field. This is how the artificial intelligence learns what players look like. After intensive training, this technology can easily and quickly detect and track players.