REAL TIME MULTIPLE FACE SEGMENTATION IN VIDEO SEQUENCES
Aug 2nd, 2007 by admin
Human face analysis and recognition are crucial parts of
intelligent systems. However, there are several
challenges before robust and reliable face analysis
systems can be deployed in real-world environments.
The proposed method segments human faces in a real-time
video sequence. A skin color model is built to
capture the inherent chrominance of human skin
color. For each frame captured from a streaming video, a
skin likelihood image is obtained using the Gaussian-fitted
skin color model. Skin color segmentation is then
performed on the skin likelihood image using adaptive
thresholding and optimal thresholding. The proposed
method is a PC-based system that is capable of
segmenting faces and keeping them in view, given an arbitrary
real-time video feed, regardless of the size, orientation
and viewpoint of the faces. Steps are also built into the
system to segment new faces as they enter the video. The
proposed method supports automatic segmentation with
no need for manual re-initialization and achieves
satisfying performance in terms of segmentation quality
and computational complexity. This approach is
computationally inexpensive and does not require any
specialized hardware.
(Note: This paper was presented at ICSIP. Signalspot: please download the paper
for proper formatting, images, equations and symbols.)
KEY WORDS
Face segmentation, Skin color model, Gaussian model,
Face detection
1. Introduction
Recent security concerns have made the need for robust
face segmentation more imperative, especially in the
areas of security and surveillance. In addition, face
segmentation is also crucial in applications such as
teleconferencing, face and facial gesture recognition,
telecommunications, robotics and human-computer
interaction. Segmentation of moving faces
from video is a very important area in image sequence
analysis with direct applications to face recognition.
Methods based on analysis of difference images,
discontinuities in flow fields using clustering, line
processes or Markov random fields are presented in [1],
[2], [3].
For single-face segmentation, various methods have
been proposed in the literature. Important techniques
include the top-down model-based approach, bottom-up
feature-based approach, texture-based approach, neural
network-based approach, motion-based approach and
depth-based approach [4], [5], [6]. Though most of these
techniques are effective, they are computationally
expensive for real-time applications. Parametric skin
color modelling and face region segmentation based on
connected component analysis for a single face in real
time is discussed in [7].
The proposed method efficiently segments multiple
faces in a real-time video sequence. A reliable skin
color model was developed by collecting skin samples
from people of different ethnicities. The skin color model
is then fitted to a Gaussian distribution, which is used to
compute the skin likelihood image of each original color
image from a real-time video sequence. The skin color
segmented image is obtained from the skin likelihood image
by using adaptive and optimal thresholding. Face
segmentation is achieved using Euler’s method and a set of
heuristics that we have developed and tested, which
segment multiple faces in the skin
color segmented image.
The rest of the paper is organized as follows.
The development of the skin color model and the skin likelihood
image obtained from the Gaussian-fitted skin color model are
discussed in Section 2. Skin color segmentation
of the skin likelihood image using adaptive
thresholding and optimal thresholding is discussed in
Section 3. Euler’s method and the development of
heuristics are discussed in Section 4 and Section 5,
respectively.
2. Skin color model
The YCbCr model is used to avoid the luminance
problem and to obtain the chromatic components. The
YCbCr components are computed from the RGB (red, green,
blue) components by Equation 1.
Y  = 0.257R + 0.50G + 0.096B + 16
Cb = -0.147R - 0.2907G + 0.438B + 128        (1)
Cr = 0.438R - 0.368G - 0.07B + 128
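As an illustration only, Equation 1 can be applied per pixel as in the following Python sketch. The function names are hypothetical, the coefficients are copied as printed in Equation 1, and the inputs are assumed to be 8-bit RGB values; this is not the authors' VC++ implementation.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one RGB pixel (assumed 0-255) to (Y, Cb, Cr) using Equation 1."""
    y = 0.257 * r + 0.50 * g + 0.096 * b + 16
    cb = -0.147 * r - 0.2907 * g + 0.438 * b + 128
    cr = 0.438 * r - 0.368 * g - 0.07 * b + 128
    return y, cb, cr


def chroma_pair(r, g, b):
    """Drop the luminance Y and keep only the chromatic pair (Cr, Cb)."""
    _, cb, cr = rgb_to_ycbcr(r, g, b)
    return cr, cb


if __name__ == "__main__":
    # A pure black pixel maps to the offsets of Equation 1.
    print(rgb_to_ycbcr(0, 0, 0))
    print(chroma_pair(200, 150, 120))
```

Only the chromatic pair is carried forward into the skin color model, which is why the luminance value is discarded in the second helper.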
Chromatic colors have been used effectively to segment
color images in many applications [8], [9]. They are also
well suited in this case to segmenting skin regions from
non-skin regions. The color distribution of skin colors
of different people was found to be clustered in a small
area of the chromatic color space. Although skin colors
of different people appear to vary over a wide range,
they differ much less in color than in brightness. In
other words, skin colors of different people are very
close, but they differ mainly in intensities [8], [9]. With
this finding, we proceed to develop a skin-color model
in the chromatic color space. To develop the skin
color model, 20 skin samples were collected from different
color images. The samples were taken from persons of
different ethnicities: American, Asian and African. They
were used to determine the color distribution of human skin
in the chromatic color space. The skin samples were then
filtered using a low-pass filter to reduce the effect of
noise in the samples. The color histogram revealed that
the skin-color distribution of different people is
clustered in the chromatic color space, so the skin color
distribution can be represented by a Gaussian model N(μ,
cov), where:
$$
\text{Mean: } \mu = E\{x\}, \qquad \text{where } x = [C_r \;\; C_b]^T
\tag{2}
$$
$$
\text{Covariance: } \operatorname{cov}(x_1, x_2) = E\{(x_1 - \mu_1)(x_2 - \mu_2)\}
$$
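A minimal sketch of how the mean and covariance of Equation 2 might be estimated from a set of chromatic skin samples is given below. The function name and the sample values are hypothetical, the expectation is approximated by a plain average over the samples (an assumption, since the paper does not state the estimator), and the low-pass filtering step is omitted.

```python
def fit_skin_gaussian(samples):
    """Estimate (mean, cov) from a list of (Cr, Cb) skin samples.

    mean is a pair (mu_r, mu_b); cov is a 2x2 nested tuple.
    The expectation E{.} is approximated by the sample average.
    """
    n = len(samples)
    if n == 0:
        raise ValueError("at least one sample is required")
    mu_r = sum(cr for cr, _ in samples) / n
    mu_b = sum(cb for _, cb in samples) / n
    # Second-order central moments, matching cov(x1, x2) = E{(x1-mu1)(x2-mu2)}.
    s_rr = sum((cr - mu_r) ** 2 for cr, _ in samples) / n
    s_bb = sum((cb - mu_b) ** 2 for _, cb in samples) / n
    s_rb = sum((cr - mu_r) * (cb - mu_b) for cr, cb in samples) / n
    return (mu_r, mu_b), ((s_rr, s_rb), (s_rb, s_bb))


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    demo = [(150, 110), (154, 110), (150, 114), (154, 114)]
    print(fit_skin_gaussian(demo))
```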
2.1. Skin likelihood image
With this Gaussian fitted skin color model, we now
obtain the likelihood of skin for any pixel of an image.
Therefore, once a pixel has been transformed from RGB
color space to chromatic color space (giving a
chromatic pair value of (Cr, Cb)), the likelihood of skin
for this pixel is computed as follows [10], [11]:
$$
\text{Likelihood} = P(C_r, C_b) = \exp\left[-0.5\,(x - \mu)^T \operatorname{cov}^{-1} (x - \mu)\right],
\qquad \text{where } x = [C_r \;\; C_b]^T
\tag{3}
$$
Hence, this skin color model transforms a color
image into a grayscale image such that the gray value
at each pixel shows the likelihood of the pixel belonging
to the skin.
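The likelihood computation of Equation 3 can be sketched as follows. The function names and the numeric mean and covariance are hypothetical placeholders, not values from the paper, and the 2x2 covariance matrix is inverted in closed form on the assumption that it is non-singular.

```python
import math


def skin_likelihood(cr, cb, mean, cov):
    """Evaluate exp[-0.5 (x - mu)^T cov^-1 (x - mu)] for x = [Cr, Cb]^T."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    # Closed-form inverse of a 2x2 matrix.
    inv = ((d / det, -b / det), (-c / det, a / det))
    dr, db = cr - mean[0], cb - mean[1]
    quad = (dr * (inv[0][0] * dr + inv[0][1] * db)
            + db * (inv[1][0] * dr + inv[1][1] * db))
    return math.exp(-0.5 * quad)


def likelihood_image(chroma_rows, mean, cov):
    """Map a 2D list of (Cr, Cb) pairs to a 2D list of skin likelihoods."""
    return [[skin_likelihood(cr, cb, mean, cov) for cr, cb in row]
            for row in chroma_rows]


if __name__ == "__main__":
    mean = (152.0, 112.0)            # placeholder model parameters
    cov = ((4.0, 0.0), (0.0, 4.0))   # placeholder model parameters
    print(likelihood_image([[(152, 112), (154, 112)]], mean, cov))
```

With this unnormalized form the likelihood is 1 at the model mean and falls toward 0 for chromatic values far from it, so the result can be viewed directly as a grayscale image.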
3. Adaptive and optimal thresholding
Since the skin regions are brighter than the other parts
of the likelihood image, they can be segmented from the
rest of the image through a thresholding process [12].
No single fixed threshold value works for
different images of different people with
different skin tones. Since people with different skin tones
have different likelihoods, an adaptive thresholding process is
required to find the optimal threshold value for each
run.
The adaptive thresholding is based on the observation
that stepping the threshold value down
increases the segmented region. The increase in
segmented region gradually diminishes (as the percentage
of skin color regions detected approaches 100%), but
then rises sharply once the threshold value becomes
so small that non-skin regions get
included. The threshold value at which the minimum
increase in region size is observed while stepping down
the threshold value is taken as the optimal threshold. The
threshold value is decremented from 0.55 to 0.05 in
steps of 0.1. The skin color regions are then segmented
from the rest of the image with this optimal threshold
value (optimal thresholding). Using this technique of
adaptive thresholding and optimal thresholding, many
frames yield good results; the skin-colored regions are
effectively segmented from the non-skin-colored
regions. The original color frame, taken from a real-time
video sequence containing multiple faces, is shown in Figure
1 (a). The skin likelihood image of Figure 1 (a) is
shown in Figure 1 (b). The skin color segmented image
of Figure 1 (b) is shown in Figure 1 (c).
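The threshold search described above can be sketched as follows. This is one possible reading, not the authors' code: the function names are hypothetical, the segmented region size is measured as the number of pixels above the threshold, the threshold returned is the lower value of the step with the smallest increase, and ties are resolved in favor of the first such step. The paper does not specify these details.

```python
def region_size(likelihood, threshold):
    """Number of pixels whose skin likelihood exceeds the threshold."""
    return sum(1 for row in likelihood for p in row if p > threshold)


def optimal_threshold(likelihood, start=0.55, stop=0.05, step=0.1):
    """Step the threshold down and return the value reached by the step
    that produced the smallest increase in segmented region size."""
    thresholds = []
    k = 0
    while start - k * step >= stop - 1e-9:
        thresholds.append(round(start - k * step, 10))
        k += 1
    sizes = [region_size(likelihood, t) for t in thresholds]
    best_index, best_increase = None, None
    for i in range(1, len(thresholds)):
        increase = sizes[i] - sizes[i - 1]
        if best_increase is None or increase < best_increase:
            best_index, best_increase = i, increase
    return thresholds[best_index]


def segment_skin(likelihood, threshold):
    """Binary skin mask: 1 where the likelihood exceeds the threshold."""
    return [[1 if p > threshold else 0 for p in row] for row in likelihood]


if __name__ == "__main__":
    demo = [[0.9, 0.9, 0.5, 0.1], [0.9, 0.3, 0.1, 0.02]]  # made-up values
    t = optimal_threshold(demo)
    print(t, segment_skin(demo, t))
```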
It was found that not all detected skin regions contain
faces. Some correspond to the hands, arms and other
exposed parts of the body, while some correspond to
background objects with colors similar to those of the
skin.
4. Euler’s Method
A face region is defined as a closed region in the image
that has one or more holes inside it. These holes
are due to the eyes, nose and mouth. In the binary image,
the face region is represented by pixels with value one.
A face region is also a set of connected
components within an image. All holes in a binary
image have a pixel value of zero (black).
After experimenting with several images, it was found
that a face region should have at least one hole inside
it. Therefore, regions that have no holes
are eliminated. To determine the number of holes inside
a region, we compute the Euler number [5] of the
region, defined as:
E=C-H (4)
where E is the Euler number, C is the number of
connected components and H is the number of holes in
the region.
The number of connected components is set to one, since
each face region is considered individually. The
number of holes is then:
H=1-E (5)
where H is the number of holes in the region and E is the
Euler number.
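Equations 4 and 5 can be illustrated with the sketch below, which counts components and holes by flood fill. The function names are hypothetical, and the connectivity convention (8-connected foreground, 4-connected background) is an assumption; the paper does not state which convention or which algorithm its implementation uses.

```python
from collections import deque

N4 = ((1, 0), (-1, 0), (0, 1), (0, -1))
N8 = N4 + ((1, 1), (1, -1), (-1, 1), (-1, -1))


def count_components(grid, value, neighbors):
    """Count connected groups of cells equal to `value` by flood fill."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != value or seen[r][c]:
                continue
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in neighbors:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and grid[ny][nx] == value):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count


def euler_number(binary):
    """E = C - H for a non-empty binary image given as a 2D list of 0/1."""
    cols = len(binary[0])
    # Pad with zeros so the outer background forms a single component.
    padded = ([[0] * (cols + 2)]
              + [[0] + list(row) + [0] for row in binary]
              + [[0] * (cols + 2)])
    components = count_components(padded, 1, N8)
    holes = count_components(padded, 0, N4) - 1  # exclude outer background
    return components - holes


def count_holes(region):
    """H = 1 - E, valid when `region` holds exactly one connected component."""
    return 1 - euler_number(region)


if __name__ == "__main__":
    face = [[1, 1, 1, 1, 1],
            [1, 0, 1, 0, 1],
            [1, 1, 1, 1, 1],
            [1, 0, 0, 0, 1],
            [1, 1, 1, 1, 1]]
    print(euler_number(face), count_holes(face))
```

A region for which `count_holes` returns zero would be discarded under the rule that a face must contain at least one hole.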
5. Development of heuristics
To study a region, its area and center are
determined. Before finding the area and center, the holes in
the region are filled. There are many methods to
compute the area and center of a region. One efficient
way is to compute the center of mass (centroid) of the
region [8]. The center of area in binary images is the
same as the center of mass and it is computed as
shown below:
$$
\bar{x} = \frac{1}{A} \sum_{i=1}^{n} \sum_{j=1}^{m} j\,B[i,j], \qquad
\bar{y} = \frac{1}{A} \sum_{i=1}^{n} \sum_{j=1}^{m} i\,B[i,j]
\tag{6}
$$
where B is the [n x m] matrix representation of
the region and A is the area of the region in pixels.
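As a small worked sketch of Equation 6, the area and centroid of a binary region can be accumulated in a single pass. The function name is hypothetical, indices are 1-based to match the equation, and the holes are assumed to have been filled already.

```python
def area_and_centroid(B):
    """Return (A, x_bar, y_bar) for a binary region B given as a 2D list.

    i is the 1-based row index and j the 1-based column index, as in
    Equation 6; A is the number of pixels with value one.
    """
    area = 0
    sum_j = 0
    sum_i = 0
    for i, row in enumerate(B, start=1):
        for j, value in enumerate(row, start=1):
            if value:
                area += 1
                sum_j += j
                sum_i += i
    if area == 0:
        raise ValueError("region contains no foreground pixels")
    return area, sum_j / area, sum_i / area


if __name__ == "__main__":
    region = [[0, 1, 1],
              [0, 1, 1]]
    print(area_and_centroid(region))
```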
The human face is normally vertically oriented,
although some faces are slightly inclined. An
elongated object has a unique orientation
[13]: the orientation of the region is given by the
orientation of its axis of elongation, the axis about
which the inertia is minimum. The axis is computed
by finding the line for which the sum of the squared
distances between the region points and the line is
minimum. In other words, a least-squares fit of a line to
the region points in the image is computed [8]. At
the end of the process, the angle of inclination (θ) is
given by [14, 15]:
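$$
\theta = \frac{1}{2} \arctan\left(\frac{b}{a - c}\right)
\tag{7}
$$
where:
$$
a = \sum_{i=1}^{n} \sum_{j=1}^{m} (x'_{ij})^2 \, B[i,j], \qquad
b = 2 \sum_{i=1}^{n} \sum_{j=1}^{m} x'_{ij} \, y'_{ij} \, B[i,j], \qquad
c = \sum_{i=1}^{n} \sum_{j=1}^{m} (y'_{ij})^2 \, B[i,j]
$$
and:
$$
x' = x - \bar{x}, \qquad y' = y - \bar{y}
$$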
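The angle of inclination can be sketched from the second-order moments a, b and c as shown below. The function name is hypothetical. As a deliberate substitution, `math.atan2(b, a - c)` is used instead of the plain arctangent of b / (a - c) so that the case a = c does not divide by zero; x is taken as the column index and y as the row index, so the angle is measured in image coordinates with y pointing down.

```python
import math


def inclination_angle(B):
    """Angle (radians) of the axis of least inertia of a binary region B."""
    points = [(j, i)
              for i, row in enumerate(B, start=1)
              for j, value in enumerate(row, start=1) if value]
    if not points:
        raise ValueError("region contains no foreground pixels")
    x_bar = sum(x for x, _ in points) / len(points)
    y_bar = sum(y for _, y in points) / len(points)
    a = b = c = 0.0
    for x, y in points:
        xp, yp = x - x_bar, y - y_bar   # coordinates relative to the centroid
        a += xp * xp
        b += 2.0 * xp * yp
        c += yp * yp
    return 0.5 * math.atan2(b, a - c)


if __name__ == "__main__":
    diagonal = [[1, 0, 0],
                [0, 1, 0],
                [0, 0, 1]]
    print(math.degrees(inclination_angle(diagonal)))
```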
After several experiments, it was found that the
height-to-width ratio of a human face lies
between 0.8 and 1.6; this range is used as a heuristic.
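The ratio heuristic can be sketched as a simple filter. The function names are hypothetical, and the height and width are measured here on the axis-aligned bounding box of the region, which is an assumption: the paper does not say how the two dimensions are measured or whether the region is first rotated by its angle of inclination.

```python
def height_to_width_ratio(region):
    """Ratio of bounding-box height to width for a binary region (2D list)."""
    rows = [i for i, row in enumerate(region) if any(row)]
    cols = [j for row in region for j, value in enumerate(row) if value]
    if not rows:
        raise ValueError("region contains no foreground pixels")
    height = rows[-1] - rows[0] + 1
    width = max(cols) - min(cols) + 1
    return height / width


def passes_ratio_heuristic(region, low=0.8, high=1.6):
    """True when the ratio falls inside the range reported in the paper."""
    return low <= height_to_width_ratio(region) <= high


if __name__ == "__main__":
    tall = [[1] * 5 for _ in range(6)]   # 6 rows by 5 columns
    wide = [[1] * 6 for _ in range(2)]   # 2 rows by 6 columns
    print(passes_ratio_heuristic(tall), passes_ratio_heuristic(wide))
```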
These heuristics segment the face regions from other skin and
skin-colored regions. The face segmented image of
Figure 1 (c) is shown in Figure 2 (a). The holes in the
segmented image are filled and it is multiplied with
the grayscale image in Figure 2 (b); the result
is an image containing only the face regions, as shown in Figure 2 (c).
(a) Original color image (b) Skin likelihood image (c) Skin color segmented image.
Figure 1. Skin color segmentation in presence of multiple faces
(a) Face segmented image (b) Gray scale image (c) Just face image
Figure 2. Face segmentation of multiple faces
6. Experimental Results
The system was comprehensively tested with 35 male
and 5 female test subjects. Testing was performed
under variations in illumination, distance, face,
pose, occlusion and orientation.
The results obtained are shown in Figure 3. The proposed
method segments multiple faces in a real-time video
sequence across these variations in illumination,
distance, face, pose, occlusion and orientation. This
approach segments new faces as they enter the video
without manual re-initialization. A segmentation rate of
87.5% was obtained. The frames captured from a real-time
video sequence are shown in Figure 4 and the
corresponding segmented frames are shown in Figure 5.
The segmentation rate for different face orientations is
shown in Table 1.
7. Conclusion
The contribution of the proposed method is an optimal
blend of skin color modelling, Euler’s method and heuristics to
achieve high performance in real time. The proposed
method is a novel real-time multiple face segmentation
system that captures an image sequence from a camera
and efficiently segments the faces in it. It performs
efficient real-time multiple face
segmentation under illumination, pose, face and scale
variation. The entire algorithm for multiple face
segmentation is implemented in VC++ on a 2.4 GHz P4
machine and takes an average of 6 ms per frame to
segment. The system can be further developed to
incorporate the following: (i) real-time multiple face
detection, recognition and tracking in a commercial
security system; (ii) real-time multiple object
detection, recognition and tracking; (iii) a
videoconferencing system that allows the
conferencing participants to move freely around the
room while the system keeps the participants in view.
(a) Less illumination (b) More illumination
(c) Distance variation (d) Occlusions
(e) Orientations
Figure 3. Face segmentation with different testing parameters
Figure 4. Frames captured from a real-time video sequence
Figure 5. Face segmented frames of Figure 4
Table 1. Segmentation rate for different face orientations
Face orientation              Multiple face segmentation rate (in %)
Front profile (0°)            97.50
Left profile (25°)            90.00
Right profile (25°)           85.00
Tilted 90° to the left        65.00
Tilted 90° to the right       72.50
Tilted upwards (10°)          92.50
Tilted downwards (10°)        80.00
References
[1] W.N. Martin and J.K. Aggarwal, “Dynamic scene
analysis: A survey”, Computer Vision, Graphics and
Image Processing, Vol. 7, 1978, pp. 356-374.
[2] J.K. Aggarwal and N. Nandhakumar, “On the
computation of motion from sequences of images”,
Proc. IEEE, Vol. 76, 1988, pp. 917-935.
[3] R. Chellappa, C.L. Wilson and S. Sirohey, “Human
and Machine Recognition of Faces: A Survey”, Proc. of
the IEEE, Vol. 83(5), 1995, pp. 705-740.
[4] K. Sobottka and I. Pitas, “A novel method for
automatic face segmentation, facial feature extraction
and tracking”, Signal Processing: Image
Communication, 12, 1998, pp. 263-281.
[5] S. Spors and R. Rabenstein, “A Real-Time Face
Tracker For Color Video”, IEEE Int. Conf. on
Acoustics, Speech & Signal Processing, Utah, USA,
May 2001.
[6] K.C. Yow and R. Cipolla, “Feature-based human face
detection”, Image and Vision Computing, no. 15, 1997,
pp. 713-735.
[7] R. Srikantaswamy and R.D. Sudhaker Samuel, “A
Novel High-performance Real Time Face Recognition
Engine using Radial Basis Function (RBF) Neural
Networks in a Video Sequence”, International
Conference on Human Machine Interface, 20-23 Dec.
2004, I.I.Sc. Bangalore, India.
[8] R. Jain, R. Kasturi and B. Schunck, Machine
Vision, McGraw Hill, New York, 1995, pp. 31-51.
[9] D. Comaniciu, V. Ramesh and P. Meer, “Kernel-based
object tracking”, IEEE Transactions on Pattern
Analysis and Machine Intelligence, 25(5), 2003,
pp. 564-577.
[10] Yi-Tsung Chien, Yea-Shuan Huang, Sheng-Wen
Jeng, Yao-Hong Tsai and Hung-Xin Zhao, “Real-Time
Surveillance System by Use of the Face Understanding
Technologies”, Proc. VIIth Digital Image Computing:
Techniques and Applications, Sun C., Talbot H.,
Ourselin S. and Adriaansen T. (Eds.), 10-12 Dec. 2003,
Sydney.
[11] J. Strom, T. Jebara, S. Basu and A. Pentland, “Real
Time Tracking and Modelling of Faces”, Proceedings
of the Modelling People Workshop at ICCV’99.
[12] Rafael C. Gonzalez and Richard E. Woods,
Digital Image Processing, Pearson Education, 2nd
Edition, 2002.
[13] L. Goldmann, M. Krinidis, N. Nikolaidis, S.
Asteriadi and T. Sikora, “An Integrated System For
Face Detection and Tracking”, IEEE Proc. of ICFG,
1995, pp. 176-181.
[14] N. Malasne, F. Yang and M. Paindavoine, “Real Time
Implementation of a Face Tracking”, IEEE Transactions
on Neural Networks, Vol. 7, No. 5, 1996, pp. 1121-1138.
[15] Kohsia S. Huang and Mohan M. Trivedi, “Robust
Real-Time Detection, Tracking, and Pose Estimation of
Faces in Video Streams”, 17th International Conference
on Pattern Recognition, Cambridge, United Kingdom,
Aug. 23-26, 2004.