Making computers detect and distinguish objects has never been easier. Machine vision algorithms have undergone remarkable development over the past few years, with accuracy and speed reaching levels sufficient for a wide variety of practical applications.
Furthermore, many groundbreaking algorithms have been implemented and released as open source for anyone interested in the field. To a large extent, this success can be attributed to the significant advances in machine learning, which has gained a lot of traction. Researchers in many fields – from modeling in bio-engineering to forecasting in quantitative finance – are trying to employ machine learning techniques and hope to replicate the remarkable results achieved in machine vision. But what makes machine learning so successful in computer vision in particular?

Successful data modeling and forecasting requires three main characteristics of the problem at hand:
1) the nature of the problem and the circumstances around it are sufficiently stable, i.e., the relationships that the model captures are preserved over time;
2) the data required for extracting these relationships can be observed reliably; and
3) there is a good understanding of the essence of the problem and of the data features that are fundamental for successful modeling.
In machine vision, these conditions are met:
1) the main features of most common objects don’t change;
2) high-resolution devices can provide very high-quality images with sufficient information for precise detection; and
3) the important features that distinguish each object can be defined. The last point has probably been the biggest challenge for earlier machine detection algorithms, which relied on edge detection, clustering, and hand-defined hierarchies. With these approaches, the features largely have to be specified individually for each object of interest, which makes the task complex and labor-intensive.
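To make that limitation concrete, here is a minimal sketch of the older, hand-engineered route: a classical Canny edge detector. OpenCV is not mentioned in the article and is used here only for illustration; the file name, blur kernel, and thresholds are illustrative values that typically have to be re-tuned for each object and lighting condition.

```python
# Classical, hand-engineered feature extraction: Canny edge detection.
# The blur kernel and thresholds below are illustrative and usually need
# manual tuning per object and per lighting condition -- exactly the
# per-problem effort that deep learning removes.
import cv2

image = cv2.imread("example.jpg", cv2.IMREAD_GRAYSCALE)    # hypothetical input file
blurred = cv2.GaussianBlur(image, (5, 5), 1.4)              # suppress noise before edge detection
edges = cv2.Canny(blurred, threshold1=50, threshold2=150)   # hand-picked thresholds
cv2.imwrite("example_edges.jpg", edges)                     # save the resulting edge map
```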
Deep learning is the technique that provided a viable solution. Features no longer have to be analyzed and predefined by hand; instead, they are extracted by the multiple layers of so-called convolutional neural networks (CNNs), which are themselves trained from data. A state-of-the-art approach for feature extraction is the deep residual neural network (ResNet) introduced by Microsoft [1]. With it, hundreds of feature maps can be generated from a single image, each capturing different features that are important for detection – one captures edges, another captures curves, and so on. These feature maps allow very elaborate relationships within the image to be picked up. State-of-the-art detection and segmentation algorithms include Fast R-CNN [2], Faster R-CNN [3], Mask R-CNN [4], and U-Net [5]. They can detect and segment multiple objects within a single image at sub-second speeds, which opens up numerous possibilities for automating many practical tasks.

A big enabler in the field has been the development of open source libraries for building deep learning neural networks, such as Python's TensorFlow, Keras, PyTorch, and fastai. Even implementations of the R-CNN and U-Net algorithms are available as open source to anyone interested in the field. A common concern with deep learning is the large computational cost of training, since typical models can have hundreds of thousands or even millions of parameters to be estimated from training data. This demand for computational power can be met by commercial cloud computing offerings; some of the more prominent platforms include Google Cloud, Google Colaboratory, Microsoft Azure, Salamander, Amazon Web Services, and Amazon SageMaker.
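As a taste of how accessible these tools have become, the sketch below loads a pretrained Mask R-CNN with a ResNet-50 backbone and runs detection and segmentation on a single image. It assumes the torchvision add-on to PyTorch (not named in the article); the image path and the 0.5 confidence threshold are illustrative choices, and newer torchvision releases replace `pretrained=True` with a `weights=` argument.

```python
# A minimal sketch: multi-object detection and segmentation with a pretrained
# Mask R-CNN (ResNet-50 backbone) from torchvision.
import torch
import torchvision
from torchvision import transforms
from PIL import Image

model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
model.eval()  # switch to inference mode

image = Image.open("street_scene.jpg").convert("RGB")  # hypothetical input image
tensor = transforms.ToTensor()(image)                  # PIL image -> CxHxW float tensor in [0, 1]

with torch.no_grad():
    prediction = model([tensor])[0]  # the model takes a list of image tensors

# Keep only confident detections; each entry has a bounding box, a class label,
# a confidence score and a segmentation mask.
keep = prediction["scores"] > 0.5
print(prediction["boxes"][keep])
print(prediction["labels"][keep])
```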
If you feel the need to add machine vision to your products, now is the best time. The groundwork for building practical machine vision applications has already been laid, and it is easier than ever even for hobbyists to engage successfully in this area. Imagine how many more possibilities open up for skilled practitioners who no longer have to set everything up from scratch each time they want to build a new practical application.
We at ConSenSo can make this application for you. Just give us a shout!
Sources:
[1] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, “Deep Residual Learning for Image Recognition”, https://arxiv.org/abs/1512.03385
[2] Ross Girshick, “Fast R-CNN”, in ICCV 2015, https://arxiv.org/abs/1504.08083
[3] Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun, “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks”, https://arxiv.org/abs/1506.01497
[4] Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross Girshick, “Mask R-CNN”, https://arxiv.org/abs/1703.06870
[5] Olaf Ronneberger, Philipp Fischer, Thomas Brox, “U-Net: Convolutional Networks for Biomedical Image Segmentation”, in MICCAI 2015, https://arxiv.org/abs/1505.04597