Definition of Computer Vision
According to DQLab, Computer Vision is a technology that allows computers to see and recognize objects around them, much as humans do, using cameras and Artificial Intelligence.
As humans, we observe things around us using our optic nerves, retinas, and visual cortices. We can distinguish between objects, estimate their distance from us, and judge their speed. Similarly, computers use Computer Vision to recognize objects.
However, unlike humans, computers do not get tired. This allows us to use them to analyze a large number of objects in a short time, and to detect product defects that are not visible to the human eye.
Computer Vision requires a large dataset to function effectively, because a model needs a lot of information to cover every possible outcome of the given task. For example, a computer trained to recognize healthy plants needs to ‘see’ thousands of visual reference inputs of plants, farmland, animals, and other related objects. Only then can the computer effectively recognize various types of healthy plants, distinguish them from unhealthy plants, measure the quality of farmland, detect pests and other animals among the plants, and so on.
The Two Main Technologies that Power Computer Vision:
1. Deep Learning (A type of Machine Learning)
Deep Learning, a type of Machine Learning, uses algorithm-based models that allow computers to learn context by analyzing visual data. Once enough data has been provided, the model is able to ‘see the big picture’ and distinguish between visual inputs. Instead of being explicitly programmed to recognize and differentiate between images, the machine uses AI algorithms to learn on its own.
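The idea of learning from examples instead of hand-written rules can be sketched in a few lines of Python. The sketch below is not Deep Learning: it is a much simpler nearest-centroid classifier, used here only as an illustration. The striped 8x8 images, the names make_image, fit_centroids, and predict, and the noise level are all assumptions made up for this example.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

SIZE = 8  # hypothetical image side length, in pixels


def make_image(kind):
    """Return a noisy SIZE x SIZE grayscale image, flattened to a list,
    with either vertical or horizontal stripes."""
    pixels = []
    for row in range(SIZE):
        for col in range(SIZE):
            # Vertical stripes alternate by column, horizontal ones by row.
            base = col % 2 if kind == "vertical" else row % 2
            pixels.append(base + random.gauss(0.0, 0.2))
    return pixels


def fit_centroids(images, labels):
    """'Training': average the images that share a label, pixel by pixel."""
    centroids = {}
    for label in set(labels):
        members = [img for img, lab in zip(images, labels) if lab == label]
        centroids[label] = [sum(px) / len(members) for px in zip(*members)]
    return centroids


def predict(centroids, image):
    """Return the label whose average image is closest to the input."""
    def squared_distance(label):
        return sum((a - b) ** 2 for a, b in zip(image, centroids[label]))
    return min(centroids, key=squared_distance)


# Learn from 100 labeled examples; no stripe-detection rule is written by hand.
train_labels = ["vertical", "horizontal"] * 50
train_images = [make_image(label) for label in train_labels]
centroids = fit_centroids(train_images, train_labels)

# Check the learned model on 20 images it has not seen before.
test_labels = ["vertical", "horizontal"] * 10
correct = sum(predict(centroids, make_image(label)) == label
              for label in test_labels)
accuracy = correct / len(test_labels)
print(f"accuracy on new images: {accuracy:.2f}")
```

Nothing in the code says what a vertical stripe looks like. The program only sees labeled examples, which is the same principle, on a toy scale, that lets a real model distinguish between visual inputs after it has been given enough data.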
2. Convolutional Neural Network
A Convolutional Neural Network (CNN) helps Machine Learning models see by breaking images down into pixels, each of which is given a label or tag. These labels are then used together to perform convolution, a mathematical process that combines two functions to produce a third function. Through this process, the Convolutional Neural Network can process visual inputs.
To see images as humans do, the neural network performs convolutions and checks the accuracy of its output over many iterations. Much as a human makes out an object at a distance, the Convolutional Neural Network starts by identifying basic shapes and hard edges. It then fills in the gaps in its data and runs further iterations of its output. This goes on until the output accurately ‘predicts’ what the image shows.
While a Convolutional Neural Network is used to understand single images, a Recurrent Neural Network (RNN) processes video input, allowing the computer to ‘learn’ how the images in a series relate to each other.
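To make the convolution step described in this section more concrete, here is a minimal sketch in plain Python. It slides a small grid of weights (a kernel) over an image and records a weighted sum at each position. The 6x6 image and the function name convolve2d are assumptions made up for this example. The kernel is the well-known Sobel kernel, which is hand-designed; in a real Convolutional Neural Network the kernel values are learned from data.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the
    map of weighted sums. Like most deep learning libraries, this does not
    flip the kernel, so strictly speaking it is cross-correlation."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            total = 0
            # Multiply each kernel weight by the pixel beneath it and add up.
            for i in range(k_rows):
                for j in range(k_cols):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# Hypothetical 6x6 grayscale image: dark (0) on the left, bright (1) on the right.
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]

# Sobel kernel that responds to left-to-right changes in brightness.
sobel_x = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]

edge_map = convolve2d(image, sobel_x)
for row in edge_map:
    print(row)
```

Each row of the result is [0, 4, 4, 0]: the output is zero where the image is uniformly dark or uniformly bright, and large where the kernel straddles the boundary between the two. This is the sense in which a network ‘starts by identifying hard edges’.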
Examples of Computer Vision
The following are examples of company products that apply Computer Vision, showcasing the potential of AI to help industries:
1. Google Translate
In 2015, Google launched a translation feature that uses Computer Vision through a smartphone camera. When we open the Google Translate application, we are given a Scan Text menu. The camera detects the text in view and translates it into the language we choose.
2. Facebook 3D Photo
Launched in 2018, Facebook 3D Photo transforms ordinary two-dimensional photos into 3D images. Users can rotate, tilt, or swipe on their smartphones to view these images from different perspectives. Machine learning is used to extrapolate the 3D shape of the objects depicted in the image. Through this process, a realistic-looking 3D effect is applied to the image.
3. FaceApp
FaceApp is a popular image manipulation application that modifies images of human faces to change gender, age, and other features. This is achieved through a deep convolutional generative adversarial network, a type of neural network used in computer vision. FaceApp works by collecting sample data from the smartphones of several users and feeding it into the deep neural network. This allows the system to ‘learn’ every little detail of human facial appearance. This learning is then used to improve the predictive ability of the application and enable it to simulate wrinkles, modify hairlines, and make other realistic changes to images of human faces.
4. SentioScope
SentioScope is a fitness and sports tracking system developed by Sentio. It is mainly used as a player-tracking system for soccer, processing real-time visual input from live games. SentioScope relies on a 4K camera setup to capture this input, which it then processes to detect players and gain real-time insights into their movements and behavior. The recorded data is uploaded to a cloud-based analytics platform.