Artificial intelligence and deep learning have flourished over the last decade, giving rise to state-of-the-art computer vision techniques and encouraging the invention of many useful gadgets. We have developed cameras with jaw-dropping shutter speeds and color quality, and we have managed to shrink them dramatically. We can even fit them in pens!
Despite that, cameras still lack the basic 'intelligence' to distinguish simple things, such as a plain circle from a soccer ball. Fei-Fei Li is a pioneer in this field, known for her immense contributions to image recognition and computer vision. She explains developments in the field and how scientists are working to make cameras understand the pictures they capture.
You might wonder: if cameras can capture detailed surfaces of Mars and the Moon, what's the next big thing in this field? As mentioned above, cameras need to understand what is going on in the pictures they take. A camera without artificial intelligence cannot tell the difference between a completely black frame and a scenic landscape. This is very similar to blindness, since the camera cannot draw any perception from imagery.
Machine learning engineers have been devising ways to train computers to identify objects by developing complex mathematical models and algorithms. However, Fei-Fei Li observed that instead of focusing solely on algorithms, we should also pay attention to the kind of image data on which a model is trained.
In 2009, around 50,000 workers across the globe started working on Fei-Fei Li's idea of a large, comprehensive dataset: the world-famous ImageNet project, which contains 15 million images spanning 22,000 object classes. The aim of this huge, versatile dataset was to approximate the flood of visual examples a three-year-old child has already seen, so that training a neural network on it loosely mimics the way an infant's mind learns.
Researchers began training a convolutional neural network (CNN) on this data, following an approach suggested by Geoffrey Hinton back in the 1980s. Like the human brain, a CNN consists of nodes arranged in stacked layers through which the data flows over many iterations. The CNN trained on ImageNet had 24 million nodes and 15 billion connections.
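To make the idea of stacked layers concrete, here is a toy sketch in pure Python (not Fei-Fei Li's actual model, and with arbitrarily chosen values): a small filter slides over an image grid, a non-linearity is applied, and the result can be fed into another such layer.

```python
def conv2d(image, kernel):
    """Valid-mode 2D convolution over a grid (lists of lists of numbers)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply the kernel against the patch under it and sum.
            s = sum(image[i + di][j + dj] * kernel[di][dj]
                    for di in range(kh) for dj in range(kw))
            row.append(s)
        out.append(row)
    return out

def relu(grid):
    """Element-wise ReLU: the simple non-linearity between layers."""
    return [[max(0, v) for v in row] for row in grid]

# A 4x4 "image" with a vertical edge, and a 2x2 edge-detecting filter.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
kernel = [[-1, 1],
          [-1, 1]]

layer1 = relu(conv2d(image, kernel))   # first layer's activations
layer2 = relu(conv2d(layer1, kernel))  # stacking a second layer on top
print(layer1)  # strong responses where the vertical edge sits
```

A real CNN stacks many such layers, each with many learned filters, and adjusts the filter values during training instead of fixing them by hand.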
Surprisingly, this model was very accurate at classifying multiple objects, even in complex images. At the time, it was a state-of-the-art model that won many competitions. Best of all, the general public can download ImageNet free of cost!
Google's Street View imagery has been combined with ImageNet-trained models to gain deep insights. For instance, researchers have drawn correlations between the cars visible in an area and its prices, crime rates, zip codes, and so on. However, this still covers only object detection and recognition.
Computer scientists have long used CNNs for image recognition; now they are combining them with recurrent neural networks (RNNs) to perform natural language processing tasks. As a result, we see applications across many domains, such as image captioning, robotics, drones, and a wide range of educational and security gadgets.
This field has made significant progress, but computers are still prone to mistakes. Even though we have come very far in the last few years, there is still a long way to go.