Recognition of Partially Occluded, Expression Variant Faces using Elastic Bunch Graph Matching
Aug 2nd, 2007 by admin
Recognition of Partially Occluded, Expression Variant Faces using Elastic Bunch Graph Matching
![]() |
ABSTRACT-The aim of the paper is to develop a system able to process a large number of new faces by introducing a novel data structure, namely face graph (FG), which requires less space for storing important information of face. The occluded image and thereafter the occluded part of the face is detected using Elastic Bunch Graph Matching method. This procedure is applicable for different views of faces (i.e. recognizing a half-profile from a database of frontals). The performance of the proposed system is compared with earlier methods.
KEY WORDS: Face Recognition, Occlusion, Face graph, Elastic Bunch Graph Matching, Fiducial point, Jets.
-
INTRODUCTION
Recognition of human faces from single image per person is made difficult by variations in face position, size, illumination, expression, and occlusion and poses (front, profile, etc.). We propose a semi-automatic system able to recognize human faces as well as the occluded part on the basis of single gray-level mug shots matched against a large data set including one image per person (100-250 persons). The system performance and robustness is assessed and compared with other systems.
To be able to process a large number of new faces, we have introduced a data structure, namely face graph (FG), which requires less space for storing important information of face. It is constructed using a representative set (typically 70) of model graphs having same poses, (e.g. frontal) and hence the same graph structure.
For new images, graphs are generated and compared with the stored graphs using a similarity function. This similarity function accounts for spatial distortion and the example with the highest average similarity is taken as the desired person. When mismatch occurs, the vertices are identified by elastic bunch graph matching (EBGM) algorithm [3][16] and the occluded part of the face is detected using the corresponding nodes of the face graph. This procedure is applicable for different views (i.e. recognizing a half-profile from a database of frontals).
The complexities of face recognition mainly lie because of the constantly changing appearance of the human faces such as variations in occlusion, illumination and expression. One way to overcome these difficulties is to explain the variations by explicitly modeling them as free parameters [4, 5].
One image per person problem can be traced back to the early period when geometric-based methods were popular, where various configural features such as the distance between two eyes are manually extracted from the single face image [8]. Recently, several researchers [11], [18-20] have begun to pay attention to this classical problem within the template based framework due to the needs of the applications and its potential significance of solving problem. Recently, Martinez [18, 21, 22] has partially tackled the previous problems using a local probabilistic method with separate Gaussian distribution. After that Xiaoyang Tan, Songcan Chen, Zhi-Hua Zhou and Fuyan Zang extended the work by proposing an alternative way of representing the face subspace using self-organizing maps (SOMs) [24][25]. The paper proposes a data structure for storing facial information of FG and identifies the fiducial points by EBGM. One of the main motivations of such an extension is that even for small occluded images the occluded parts of the face are identified. In addition, all the significant information of local feature vectors are extracted due to the algorithm’s unsupervised and nonparametric characteristic, while eliminating possible faults like outliers or missing values. In this way the compact and robust representation of subspace can be reliably learned.
The rest of the paper is organized as follows. Section II describes EBGM. The proposed method is described in Section III. The experiments are reported in Section IV. Section V concludes the paper.
-
BACKGROUND
A. LANDMARK LOCATIONS AND FEATURES
Landmarks are parts of the face that are easily located and have similar structure across all faces. Some obvious examples of landmarks are the eyes, nose, and mouth. Each of these landmarks is well defined, common to all faces, and has distinct representations in image space. Typically landmarks are defined such that their location has a very small error tolerance. Instead of defining a landmark as the “nose”, the landmark is defined as the “nose tip”. The nose tip can be located to within a few pixels. The nose in its entirety is a large structure, and so difficult to select a single point to represent its location. There are other parts of the face that would serve as poor landmarks. Some examples of these would be the forehead or cheeks. Although these are common structures of the face, it is very difficult to determine an exact location for these structures. Even if one could define such a point, such as a cheekbone, it is very difficult to locate it to a high degree of accuracy because there may be nothing in an image that cues the algorithm to such a point.
There are times when it may be necessary to define a point that is difficult to locate, such as a point on the edge of the head. In the algorithm it is important to find such points to determine the border of the face.
However, finding an exact point is difficult. If you consider a point of the side of the face near the ear, it is easy to define the horizontal coordinate of that point. Determining the vertical coordinate is much more difficult because the point is valid anywhere along the side of the face.
The EBGM algorithm uses local frequency information on facial structures to perform classification. Facial structures are typically referred to as landmarks. The concept of a landmark splits into two distinct parts: landmark locations and landmark jets.
Landmark defines the geometry of the face and location is represented by the coordinate of corresponding pixel. An example of this is the nose tip, which has a well-defined location. Landmark features are defined by the frequency information of the local regions that surround the landmark locations.
B. JETS
Jet is a condensed and robust representation of a local gray value distribution. It is based on a Gabor wavelet transform, which is a convolution with a family of complex Gabor wavelets having the shape of plane waves restricted by a Gaussian envelope function. The wavelets are similar in the sense that they can all be generated from a mother wavelet by rotation and scaling. All complex coefficients of the transform taken at one image location form a jet.
Figure 1. The graph representation of a face is based on the Gabor wavelet transform, a convolution with a set of wavelet kernels.
Since the Gabor kernels are wave-like, the coefficients vary with about the main frequency of the kernel, see imaginary part of fig. 1. This causes problem when comparing jets, because a small displacement may lead to very different coefficients. The magnitudes, however, vary slowly and can directly be used for comparison. One can also compensate for the fast coefficient variations and use them for an accurate displacement estimation. Jets are robust with respect to illumination variations, scaling, translation, and distortion.
C. ELASTIC BUNCH GRAPH MATCHING
Elastic bunch graph matching is a face recognition algorithm. The algorithm recognizes novel faces by first localizing a set of landmark features and then measuring similarity between these features. Both localization and comparison uses Gabor jets extracted at landmark positions. In localization, jets are extracted from novel images and matched to jets extracted from a set of training/model jets. Similarity between novel images is expressed as function of similarity between localized Gabor jets corresponding to facial landmarks. The overall performance of the algorithm subject to changes in the number of training/model images, choice of specific wavelet encoding, displacement estimation technique and Gabor jet similarity measure is explored in a series of independent tests. Several findings were particularly striking, including results suggesting that landmark localization is less reliable than might be expected. However, it is also striking that this did not appear to greatly degrade recognition performance.
-
PROPOSED METHOD
To enhance the quality of images and to correct distortion, images are filtered, transformed, thinned and normalized. In several pattern recognition systems, it is required that input images have the same dimensions, which can be achieved through size normalization. Here the size was fixed to 128×128 pixels for each image. The algorithm has been described with reference to Fig. 2.
Step1: Display the image and select 7 fiducial points on face (forehead, left eye, right eye, nose tip, left side of the lip, right side of the lip, chick) using landmark locations. Collect the coordinates and gray value of each point.
Step2: Store the coordinate and gray value of the fiducial points corresponding to each image into FG, say B(i) for the ith image. The structure of the FG is given below:
|
P1 |
P2 |
P3 |
P4 |
P5 |
P6 |
P7 |
|
|
P1 |
G(1) |
D12 |
D13 |
D14 |
.. |
.. |
.. |
|
P2 |
D21 |
G(2) |
D23 |
D24 |
.. |
.. |
.. |
|
P3 |
D31 |
D32 |
G(3) |
D34 |
.. |
.. |
.. |
|
P4 |
D41 |
D42 |
D43 |
G(4) |
.. |
.. |
.. |
|
P5 |
D51 |
D52 |
D53 |
D54 |
.. |
.. |
.. |
|
P6 |
D61 |
D62 |
D63 |
D64 |
.. |
.. |
.. |
|
P7 |
D71 |
D72 |
D73 |
D74 |
.. |
Where P1, P2,…. are used to indicate the fiducial points in following sequence
P1 - point selected on forehead.
P2 - point selected on left eye.
P3 - point selected on right eye.
P4 - point selected on nose tip.
P5 - point selected on left side of lip.
P6 - point selected on right side of lip.
P7 - point selected on chick.
Dij indicates the distance between the point i and j, calculated as
Dij =?((xi-xj)2 + (yi-yj)2)
Gi indicates the gray value of point i. .
Step3: Repeat Step1 & Step2 until all the images of the training set are covered.
Step4: Input the test image to be recognized.
Step5: Prepare the FG of test image using Step1 & Step2.
For each training image do:-
Step6: Calculate point-to-point distance between the training and test image fiducial points. And compute the average distance. Store the value into an array called avg(i). (avg(i) holds the value of the average value of the distance with the i-th training and test image.)
Step7: Select minimum of avg(i), which implies selection of that image having little difference with the test image compare to the other training images.
Step8: Find the maximum of avg(i), which determines the occluded image.
Step9: To identify occluded portion, extract the nodes from the corresponding FG using EBGM algorithm for which mismatch occurs.
Step10: Stop.
Figure 2: Flow of the Algorithm
-
EXPERIMENTS
To verify the performance of the proposed method we have conducted various experiments using face image AR Database.
Figure 3. Input training image: File name: a1.bmp
After supplying the image as training image the corresponding FG has been formed, shown in Table 1.
In this way we create FG for the face images of different person taking single image per person. Several images are shown below taken from the AR database for experiments.
|
P1 |
P2 |
P3 |
P4 |
P5 |
P6 |
P7 |
|
|
P1 |
83 |
11.768 |
335.39 |
? |
? |
? |
? |
|
P2 |
11.768 |
80 |
347.11 |
318.44 |
? |
? |
? |
|
P3 |
335.39 |
347.11 |
29 |
38.431 |
? |
? |
? |
|
P4 |
? |
318.44 |
38.431 |
209 |
181.66 |
158.16 |
? |
|
P5 |
? |
? |
? |
181.66 |
118 |
? |
149.83 |
|
P6 |
? |
? |
? |
158.16 |
? |
116 |
30.237 |
|
P7 |
? |
? |
? |
? |
149.83 |
30.237 |
113 |
Table 1
Performance:
To verify the performance of the proposed method, we have conducted various experiments on two well-know face image databases. The first face image databases are employed to test the performance of the SOM – face algorithm under the conditions when partial occlusion and expression variant are involved. The second database is used to explore some practical properties of the proposed algorithm, such as the selection of the size of the sub block. Except when stated otherwise all the experiment reported here were conducted as follows. In the localizing step, the training images were partitioned into non-overlapping sub-blocks with equal size. Then a single SOM network was trained in batch mode using the sub-blocks. The training process was divided into two phases as recommended by a rough training phase (to adjust the topological order of the weight vectors) and a fine adjustment phase to find tune the feature map so as to provide an accurate statistical quantification of the input space. The initial weights of all neurons were set to the greatest eigen vectors of the training data, and the neighborhood of the neurons converge exponentially to ‘1’ with the increase of training time. The soft k-NN [9] ensemble classifier was used for final classification decision.
Finally, the experimental result of SOM algorithm is compared to our method, which gives better rate of recognition. The result of SOM is given below
1. The test image is in the train images
The percentage of matching is 96.69
2. The test image is partially occluded and that test image without occlusion is in train
The percentage of matching is 49.6
3. The test image that is quite similar with any one-train image
The percentage of matching is 76.00
4. The test image that is totally different from all train
The percentage of matching is 26
5. The test image that is similar with train but in different mode
The percentage of matching is 59.60
V. CONCLUSION:
In the paper, we introduce a very simple but effective method of face recognition with one training image per person. The images are allowed to vary in expressions and have partial occlusion. The proposed method achieves comparable or better performance than the previous methods with no single extra virtual samples needed to be generated. Secondly, it shows higher robustness against expression variance and partial occlusions.
However, it is worth mentioning that the proposed method assumes that the system knows what is occluded and what is not occluded in advance. It is natural to us the questions how do decide the location and size of the occlusion. In a general case, this is a very difficult problem. Although experiment shows that our method demonstrates highly robust performance against occlusion even when the occlusion side and location is unknown to the system, the possible need of the manual effort of the user can be regarded as a drawback of the proposed system. Thus, further studies will be the focus of our future work.
Acknowledgment:
The authors would like to thank Dr. A.M. Martinez for providing the AR Database.
REFERENCES
[1]. Turk, M., Pentland, A.: Eigenfaces for recognition. Journal of Cognitive Neuroscience 3(1991) 71-86
[2]. Belhumeur, P., Hespanha, J., Kriegman, D.: Eigenfaces vs. Fisherfaces: recognition using class specific linear projection. IEEE Trans. Pattern
Analysis and Machine Intelligence 19
(1997) 711-720
[3]. Wu, J., Zhou, Z.-H.: Face recognition with one training image per person. Pattern Recognition Letters 23 (2002) 1711-1719
[4]. Chen, S.C., Zhang, D.Q, Zhou, Z.-H.: Enhanced (PC) for face recognition with one training image per person. Pattern Recognition Letters, in press
[5]. Tan, X.Y, Chen, S.C., Zhou Z.-H., Zhang, F.Y.: Recognizing seriously occluded, expression variant faces from single training image per person with SOM-based kNN ensemble. Technical Report, Computer Science & Engineering Department, Nanjing University of Aeronautics and Astronautics, Nanjing, China, Dec. 2003
[6]. Raytchev, B., Murase, H.: Unsupervised face recognition by associative chaining. Pattern Recognition 36 (2003) 245-257
[7]. Pang, S., Kim, D., Bang, S.Y.: Membership authentication in the dynamic group by face classification using SVM ensemble. Pattern Recognition Letters 24 (2003) 215-225
[8]. Lu, J., Plataniotis, K.N., Venetsanopoulos, A.N.: Face recognition using kernel direct discriminant analysis algorithms. IEEE Trans. Neural Networks 14 (2003) 117-126
[9]. Kohonen, T., Self-Organizing Map. 2nd edition. Springer-Verlag, Berlin (1997)
[10]. Pan, Z.S., Chen, S.C., Zhang, D.Q.: A kernel-based SOM classification in input space. Acta Electronica Sinica 32 (2004) 227-231 (in Chinese)
[11].Andras, P.: Kernel-kohonen networks. International Journal of Neural Systems, 12 (2002) 117-135
[12].Pentland, A., Moghaddam, B., Starner, T.: View-based and modular eigenspaces for face recognition. In: Proceedings of the IEEE International Conference on Computer Vision and
Pattern Recognition, Seattle, WA, (1994) 84-91
[13].Kuncheva, L.I, Whitaker, C.J.: Feature subsets for classifier combination: an enumerative experiment. In: Kittler, J., Roli, F. (eds.): Lecture Notes in Computer Science, Vol. 2096. Springer, Berlin (2001) 228-237
[14].Phillips, P.J., Wechsler, H., Huang, J., Rauss, P.J.: The FERET database and evaluation procedure for face recognition algorithms. Image and Vision Computing 16 (1998) 295-306
[15] M. Lades, J.C. Vorbriiggen, J. Buhmann, J. Lange, C. von der Malsburg, R.P. Wurtz, and W. Konen, "Distortion Invariant Ob- ject Recognition in the Dynamic Link Architecture," IEEE Trans. Computers, vol. 42, no. 3, pp. 300-311,1993.
[16] L. Wiskott, J.-M. Fellous, N. Kruger, and C. von der Malsburg, "Face Recognition by Elastic Bunch Graph Matching," Technical Report IR-INI 96-08, Institut fur Neuroinformatik, Ruhr- Universitat Bochum, D-44780 Bochum, Germany, 1996.
[17] J.G. Daugman, "Complete Discrete 2D Gabor Transform by Neu- ral Networks for Image Analysis and Compression," IEEE Trans. Acoustics, Speech, and Signal Processing, vol. 36, pp. 1,169-1,179, July 1988.
[18] W.M. Theimer and H.A. Mallot, "Phase-Based Binocular Ver- gence Control and Depth Reconstruction Using Active Vision," Proc. CVGIP: Image Understanding, vol. 60, pp. 343-358, Nov. 1994.
[19] N. Kruger, M. P6tzsch, and C. von der Malsburg, "Estimation of Face Position and Pose With Labeled Graphs," Proc. British Ma- chine Vision ConI: (BMVC96). pp. 735-743, 1996.
[20] P.J. Rauss, J. Phillips, M.K. Hamilton, and A.T. DePersia, "FERET (Face-Recognition Technology) Recognition Algorithms," Proc. Filth Automatic Target Recognizer System and Technology Symp., 1996.
[21] P.J. Phillips, P.J. Rauss, and S.Z. Der, "FERET (Face Recognition Technology) Recognition Algorithm Development and Test Re- port," Technical Report ARL-TR-995, U.S. Army Research Labo- ratory, 2800 Powder Mill Road, Adelphi, Md., Oct. 1996.
[22] N. Kruger, "An Algorithm for the Learning of Weights in Dis- crimination Functions Using A Priori Constraints," IEEE Trans. Pattern Analysis and Machine Intelligence. vol. 19, no. 7, July 1997.
[23] T. Maurer and C. von der Malsburg, "Linear Feature Transforma- tions to Recognize Faces Rotated in Depth," Proc. Int’l ConI: Artifi- cial Neural Networks, ICANN’95, Paris. pp. 353-358, Oct. 1995.
[24] L. Wiskott, "Phantom Faces for Face Analysis," Pattern Recogni- tion, vol. 30, no. 6, pp. 837-846, 1996.
[25] B.S. Manjunath, R. Chellappa, and C. von der Malsburg, "A Fea- ture-Based Approach to Face Recognition," Technical Report CAR-TR-604 or CS-TR-2834. Computer Vision Laboratory, Univ. of Maryland, College Park, Md.. 1992.
[26] T. Maurer and C. von der Malsburg, "Tracking and Learning Graphs on Image Sequences of Faces," Proc. ICANN 1996, C. von der Malsburg, W. von Seelen, J.C. Vorbruggen, and B. Sendhoff, eds., Bochum. pp. 323-328. Springer Verlag, July 1996.
Attached Files:



Loading ...


Microsoft blames WGA meltdown on…
Microsoft blames WGA meltdown on human errorReseller News, New Zealand -Aug 29, 2007By Gregg Keizer, Framingham | Thursday, August 30 2007 Microsoft…
Disciples launch higher education platform…
The Community Disciples’ youth group is taking on a new goal: raising the number of black Caldwell County students who…
2dfc3bb2b938289d2de9…
2dfc3bb2b938…
Next-gen digital downloads: Wii sets…
In a console cycle where the three offerings are so wildly different, one tie that bonds the Wii, PS3, and…
Sony Ericsson Z750 Live Photo…
Gallery by Brad Kellett on Tuesday March 27, 2007. Note: Sponsored advertising links, if any, are in green. The Sony…
Agatha Christie: An English Mystery…
Agatha Christie: An English MysteryTimes Online, UK -Sep 15, 2007Bizarrely, this eminently sensible explanation is prefaced by the breathy declaration that,…
N Korea and Burma strike…
TWO of the most secretive nations, military-run Burma and Stalinist North Korea, agreed yesterday to restore diplomatic ties in a…
Hilton turns to King for…
Hilton turns to King for post-jail interviewThe Spokesman Review, WA -14 hours agoNEW YORK â?? After being spurned by the broadcast networks,…
hbmzrriw…
hbmzrriw…
Cybertary(TM), Inc. Helps Entrepreneurs and…
Valerie Dow and Tina Angell are the Cybertary regional franchise owners in Sacramento, California. “In just six months we…
Children find fun in snow…
I brought a homemade white cake, a party mix, and fresh vegetables and dip. After the meal, games were played…
Boldly going - Minneapolis Star…
Boldly goingMinneapolis Star Tribune (subscription), MN -Jan 30, 2008The producer and writer, who moved to Los Angeles from the Twin Cities eight…