The Viola-Jones algorithm for real-time object detection was proposed by Paul Viola and Michael Jones in 2001. It is mainly used for face detection, where the goal is to separate faces from non-faces. Its main merits are speed and accuracy.
1. Selection of Haar-like features
If we look at human faces, we can see that they share some common features: a darker part below the eyes, cheeks that are brighter than the eye region, a lighter part below the eyebrows, a darker part between the two lips, and so on. Every human face follows a similar pattern of light and dark regions, and this pattern helps to detect a face.
In short, the task is to tell brighter regions from darker ones. This can be done by summing the pixel values in each region and comparing the sums: the sum over a darker region is smaller than the sum over a lighter region of the same size. Haar-like features do exactly this.
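As a rough illustration of this comparison, the sketch below computes a two-rectangle Haar-like feature on a toy patch. The function name `two_rectangle_feature`, the patch values, and the choice of "upper half dark, lower half bright" are assumptions made for the example, not part of the original algorithm description.

```python
import numpy as np

def two_rectangle_feature(window, top, left, height, width):
    """Response of a vertical two-rectangle Haar-like feature.

    The upper half is the region expected to be dark (e.g. eyes) and the
    lower half the region expected to be bright (e.g. cheeks). The response
    is the bright sum minus the dark sum. (Hypothetical helper.)
    """
    half = height // 2
    upper = window[top:top + half, left:left + width].astype(np.int64)
    lower = window[top + half:top + 2 * half, left:left + width].astype(np.int64)
    return int(lower.sum() - upper.sum())

# Toy 4x4 patch (made-up values): a dark band above a bright band.
patch = np.array([
    [10, 10, 10, 10],
    [10, 10, 10, 10],
    [200, 200, 200, 200],
    [200, 200, 200, 200],
], dtype=np.uint8)

response = two_rectangle_feature(patch, top=0, left=0, height=4, width=4)
print(response)  # 1600 - 80 = 1520
```

A large positive response means the patch matches the dark-over-bright pattern; a flat patch gives a response of zero.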
2. Creation of an Integral image
The integral image is built by progressively adding pixel intensities along both the horizontal and vertical axes, so that each entry holds the sum of all pixels above and to the left of it, itself included.
Horizontal axis => 4 + 1 = 5, 5 + 2 = 7, 7 + 2 = 9 ….
Vertical axis => 4 + 0 = 4, 4 + 3 = 7, 7 + 2 = 9 ….
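An integral image makes it fast to compute the sum of the pixels in any rectangular region of an image: four lookups are enough, whatever the size of the rectangle. This gives Haar-like features a considerable speed advantage over more sophisticated alternative features. Because each feature's rectangular area is always adjacent to at least one other rectangle, neighboring rectangles share corners, so a two-rectangle feature needs only six lookups instead of eight.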
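A minimal sketch of the idea, using NumPy's `cumsum`. The first row and first column of the toy image match the horizontal and vertical sums shown above; the remaining pixel values, the zero-padding convention, and the helper names `integral_image` and `rect_sum` are assumptions made for illustration.

```python
import numpy as np

def integral_image(img):
    """Zero-padded integral image: ii[y, x] is the sum of img[:y, :x]."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum of the pixels in a rectangle, using four lookups."""
    bottom, right = top + height, left + width
    return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]

# First row and first column follow the example above; the rest is made up.
img = np.array([
    [4, 1, 2, 2],
    [0, 4, 1, 3],
    [3, 1, 0, 4],
    [2, 1, 3, 2],
])

ii = integral_image(img)
print(ii[1, 1:])   # running sums along the first row: [4 5 7 9]
print(ii[1:, 1])   # running sums down the first column: [4 4 7 9]
print(rect_sum(ii, top=1, left=1, height=2, width=2))  # 4 + 1 + 1 + 0 = 6
```

The padding row and column of zeros avoid special cases for rectangles that touch the top or left edge of the image.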
3. Running AdaBoost training
AdaBoost training is used to cut down the amount of computation. For example, a 24×24 image can contain more than 160,000 possible features, and evaluating all of them when testing an image is far too expensive. It is therefore necessary to select a few important features that are useful for detecting a face or other object. AdaBoost training does this selection.
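The selection step can be sketched as follows, assuming the feature values have already been computed for every training sample. Each weak classifier is a threshold on a single feature, and each round keeps the feature with the lowest weighted error. The function names, the tiny data set, and the exhaustive threshold search are assumptions chosen for readability, not an efficient or complete training procedure.

```python
import numpy as np

def best_stump(values, labels, weights):
    """Lowest weighted error, threshold and polarity for one feature."""
    best = (np.inf, 0.0, 1)
    for thr in np.unique(values):
        for polarity in (1, -1):
            # Predict "face" (1) when polarity * value < polarity * threshold.
            pred = np.where(polarity * values < polarity * thr, 1, 0)
            err = weights[pred != labels].sum()
            if err < best[0]:
                best = (err, thr, polarity)
    return best

def adaboost_select(features, labels, rounds):
    """Pick one feature per round; returns (index, threshold, polarity, alpha)."""
    n_samples, n_features = features.shape
    weights = np.full(n_samples, 1.0 / n_samples)
    chosen = []
    for _ in range(rounds):
        weights = weights / weights.sum()
        stumps = [best_stump(features[:, j], labels, weights)
                  for j in range(n_features)]
        j = int(np.argmin([s[0] for s in stumps]))
        err, thr, polarity = stumps[j]
        err = max(err, 1e-10)              # avoid division by zero
        beta = err / (1.0 - err)
        pred = np.where(polarity * features[:, j] < polarity * thr, 1, 0)
        # Shrink the weights of correctly classified samples.
        weights = weights * np.where(pred == labels, beta, 1.0)
        chosen.append((j, thr, polarity, float(np.log(1.0 / beta))))
    return chosen

# Made-up feature values: rows are samples, columns are features.
features = np.array([
    [5, 8, 3],
    [1, 9, 3],
    [4, 7, 9],
    [5, 1, 2],
    [2, 2, 8],
    [4, 3, 2],
])
labels = np.array([1, 1, 1, 0, 0, 0])  # 1 = face, 0 = non-face

chosen = adaboost_select(features, labels, rounds=1)
print(chosen[0][0])  # index of the selected feature: 1
```

In this toy data only the middle feature separates the two classes perfectly, so it is the one selected in the first round.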
4. Creation of Classifier Cascade
The objective of the cascade is to discard non-faces as early as possible. Each stage of the cascade is a strong classifier made of multiple weak classifiers.
Let’s try to understand this from the figure above:
- Pass the image into the scanning window
- The image goes into cascade(1)
- Case 1
  - Cascade(1) accepts the image, meaning it thinks it may be a face
  - Pass the image to cascade(2)
- Case 2
  - Cascade(1) rejects the image, meaning it thinks it is not a face
  - Pass the image to the reject window and declare it a non-face
- After passing every cascade stage, the image is confirmed to be a face and is ready for further processing
For good results, the most reliable classifiers must be used in the early stages. A rejection is final, so a mistake at an early stage keeps a real face from ever reaching the later stages.
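The accept/reject flow described above can be sketched in a few lines. Here each stage is a weighted vote of weak classifiers compared against a threshold. The helper names `strong_classifier` and `run_cascade`, the three-number "window", and all weights and thresholds are hypothetical, chosen only to show the control flow.

```python
def strong_classifier(weak_classifiers, threshold):
    """Build one stage: a weighted vote of weak classifiers (each returns 0 or 1)."""
    def stage(window):
        score = sum(alpha * h(window) for alpha, h in weak_classifiers)
        return score >= threshold
    return stage

def run_cascade(window, stages):
    """Return True only if every stage accepts the window."""
    for stage in stages:
        if not stage(window):
            return False  # rejected: later stages never see this window
    return True

# Hypothetical stages working on a toy "window" of three numbers.
stages = [
    strong_classifier([(1.0, lambda w: int(w[0] < w[1]))], threshold=1.0),
    strong_classifier([(0.6, lambda w: int(w[2] > 100)),
                       (0.4, lambda w: int(w[1] > 50))], threshold=0.5),
]

face_like = [20, 180, 150]
background = [200, 50, 30]

print(run_cascade(face_like, stages))   # True: accepted by both stages
print(run_cascade(background, stages))  # False: rejected by the first stage
```

Because most scanning windows in a typical image are background, rejecting them in the first, cheapest stages is what makes the detector fast.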