DESIGN AND IMPLEMENTATION OF A SMART CAMERA SURVEILLANCE SYSTEM
Jul 27th, 2007 by admin
ABSTRACT
This paper describes a Smart Camera Surveillance System that detects a person in an indoor scene and works in real time under various disturbances, such as changes in lighting conditions and noise in the images. It does not require prior knowledge of the background and is almost invariant to background color. It combines change detection with human detection to detect a person robustly. The work proposes a very low-cost surveillance method that uses a single static web camera instead of an expensive CCD camera. Anyone with a system connected to the LAN can view the detected person in real time. The system is designed to: 1) detect any change that occurs in the scene, 2) determine whether the change is caused by a person in the scene, 3) transmit live video to a server if a person is detected, and 4) give anyone connected to the LAN real-time access to the live images. The system was tested in real time in a laboratory room. A dedicated Pentium IV machine running Windows XP captured live images and performed human detection. A second dedicated Pentium IV machine, also running Windows XP and connected to the LAN from another building, acts as the server. It receives the video stream online in real time whenever a person is detected and broadcasts the stream to all other systems on the LAN. The Java Media Framework (JMF) API is used to transmit the video stream in real time.
KEY WORDS
Video Surveillance, Human Detection, Change and Skin Color Detection
1. Introduction
Smart Camera Surveillance Systems (SCSS) are among the most important ongoing research topics worldwide. The field comprises many aspects and has immense applications in security. An SCSS continuously monitors the area under surveillance, detects changes that are of interest and generates a signal accordingly. Surveillance systems can be indoor or outdoor. The monitored area usually contains several objects. An image of the area at a given instant is the projection of the part of the area covered by the camera at that instant, which depends on the orientation and position of the camera [1]. CMU’s Video Surveillance and Monitoring project and MIT AI Lab’s Forest of Sensors project are some recent efforts in the field. Interest has been further strengthened by commercial applications, such as surveillance of office buildings and airports, as well as by military applications [2].
Conventional surveillance systems have many limitations that make them unsuitable for many applications. Simply recording the surveillance video is of limited use, because the recording can only provide proof after a security breach has occurred. The second conventional approach is to employ a security worker to watch the live video continuously. This is expensive and prone to human error for various reasons. Even when a security person is watching, the conventional method only provides the video feed, without any assistance to the person on duty.
An SCSS overcomes these limitations by detecting suspicious activity and raising an alarm for the security person. It uses one dedicated computer to monitor the scene and detect any unusual activity. It combines change detection, human detection and detection of human motion. This paper presents a robust, reliable and autonomous system for detecting a person in the scene and transmitting the video stream over the LAN to the person concerned. It uses temporal subtraction together with an adaptive background subtraction method [3] to detect changed objects, which may be human. Plain background subtraction is of little use when a new object is added to the background, but the adaptive filter solves that problem effectively. The method also does not require a prior image of the background.
Connected component analysis and blob analysis are then used to filter out useless objects. These may be caused by noise, since the low-cost web camera used to acquire the images is highly prone to noise. The system then combines shape analysis of the human body with color modeling to detect the presence of a person in the change-detected image. This makes it more robust to variations in luminosity. Because color modeling is performed after change detection, the system is largely invariant to background color. A Sobel edge detection filter extracts the structure of every blob that could be human. Structural analysis then matches that structure against a human structure. The system also calculates the aspect ratio of each detected blob. A person's height is normally greater than his or her width, so the aspect ratio is defined as height divided by width, and all blobs that satisfy this criterion are extracted. The details of the implemented system, the results obtained and the conclusions drawn are explained in the following sections.
2. Theoretical Description
Change detection
Detecting changes between two successive frames is a very important step in many dynamic vision applications. Any perceptible motion in a scene results in some change in the sequence of frames of the scene, and motion characteristics can be analyzed once such changes are detected. When motion is restricted to a plane parallel to the image plane, these changes can give a good quantitative estimate of the motion components of an object. The most obvious way to detect changes between two frames is to compare their corresponding pixels directly and determine whether they are the same [1]. The two frames F(x, y, i+1) and F(x, y, i) are therefore obtained from the web camera. Here x and y are the pixel coordinates, and i and i+1 index the ith and (i+1)th video frames. An adaptive filter is used to build a background image, which is then used together with the temporal approach for change detection [3]. The adaptive filter adjusts its pixel values at run time. It stores all the pixels that are ignored by the change detection and human detection processes. These pixels may be small changes or a new object added to the background that is not human. When a person is detected in the scene, the adaptive filter stores the frame preceding the one in which the person was detected. This image is then used for subsequent change and human detection.
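The direct pixel comparison above can be written as a thresholded absolute difference. The symbols D_i and M_i and the threshold T are notation introduced here for illustration; the paper does not state the threshold value.

```latex
D_i(x,y) = \left| F(x,y,i+1) - F(x,y,i) \right|, \qquad
M_i(x,y) =
\begin{cases}
1, & D_i(x,y) > T,\\
0, & \text{otherwise,}
\end{cases}
```

A value M_i(x, y) = 1 marks pixel (x, y) as changed between frames i and i+1.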
Human detection
To determine connected components and sets of blobs, an 8-connected pixel filter is applied (see Fig. 1). The filter finds the number of blobs and the number of pixels in each blob.
Fig. 1. Eight-connected pixel filter
The eight-connected filter checks whether each pixel is connected to any of its eight horizontal, vertical or diagonal neighbors. Connected pixels are assigned the same component number, and the filter also counts the total number of pixels in each connected component.
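Below is a minimal sketch of 8-connected labeling. It assumes the change mask is a list of rows of 0/1 values. It uses a breadth-first flood fill, which is one standard way to implement the filter described above and is not necessarily the authors' MATLAB code. The function name is hypothetical.

```python
from collections import deque
from typing import Dict, List, Tuple

# The eight neighbour offsets: horizontal, vertical and diagonal.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
              (0, 1), (1, -1), (1, 0), (1, 1)]


def label_8_connected(mask: List[List[int]]) -> Tuple[List[List[int]], Dict[int, int]]:
    """Label foreground pixels into 8-connected components.

    Returns a label image (0 = background, 1..n = component number)
    and a dict mapping each component number to its pixel count.
    """
    h = len(mask)
    w = len(mask[0]) if h else 0
    labels = [[0] * w for _ in range(h)]
    sizes: Dict[int, int] = {}
    next_label = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and labels[y][x] == 0:
                # Unlabelled foreground pixel: start a new component.
                next_label += 1
                labels[y][x] = next_label
                queue = deque([(y, x)])
                count = 0
                while queue:
                    cy, cx = queue.popleft()
                    count += 1
                    for dy, dx in NEIGHBOURS:
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and labels[ny][nx] == 0):
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
                sizes[next_label] = count
    return labels, sizes
```

With 4-connectivity, three diagonally touching pixels would form three separate blobs. With 8-connectivity they form one blob of size 3, which keeps thin or slanted limbs attached to the body blob.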
Next, skin color modeling is performed. Many researchers use skin color modeling [4] for thresholding and segmentation, and it works well in indoor environments. In our case it also increases the probability of finding a human in the extracted blobs. Skin color is searched for only after all preprocessing and change detection, so the system becomes almost invariant to background color. Small skin-colored blobs, which can cause false alarms in earlier work based on skin modeling, are also removed. To build the skin model, 120 skin images of different skin tones, each 32 × 32 pixels, were taken with their RGB values. The means of the red and green components over all pixels were then calculated. The chromaticity of skin color depends mostly on the red and green components. The blue contribution is very small and is therefore neglected. Once the mean red and green values over all images are obtained, the variance and standard deviation are calculated. These values are used to calculate the probability that each pixel has skin color [4].
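The skin model can be sketched as follows. The sketch makes two assumptions that the paper does not spell out. First, colors are converted to normalized rg chromaticity (r = R/(R+G+B), g = G/(R+G+B)). Second, the red and green components are treated as independent Gaussians, so the resulting "probability" is an unnormalized Gaussian score. All function names are hypothetical.

```python
import math
from typing import Iterable, Tuple

RGB = Tuple[int, int, int]
SkinModel = Tuple[float, float, float, float]  # (mean_r, mean_g, std_r, std_g)


def chromaticity(pixel: RGB) -> Tuple[float, float]:
    """Normalized red and green chromaticity; blue is implied and not modeled."""
    r, g, b = pixel
    total = r + g + b
    if total == 0:
        return 0.0, 0.0
    return r / total, g / total


def fit_skin_model(samples: Iterable[RGB]) -> SkinModel:
    """Mean and standard deviation of r and g over all training pixels."""
    rs, gs = [], []
    for pixel in samples:
        r, g = chromaticity(pixel)
        rs.append(r)
        gs.append(g)
    n = len(rs)
    mean_r, mean_g = sum(rs) / n, sum(gs) / n
    # Population standard deviation; the small floor avoids division by zero.
    std_r = max(math.sqrt(sum((v - mean_r) ** 2 for v in rs) / n), 1e-6)
    std_g = max(math.sqrt(sum((v - mean_g) ** 2 for v in gs) / n), 1e-6)
    return mean_r, mean_g, std_r, std_g


def skin_likelihood(pixel: RGB, model: SkinModel) -> float:
    """Unnormalized Gaussian score in [0, 1]; 1.0 at the mean skin chromaticity."""
    mean_r, mean_g, std_r, std_g = model
    r, g = chromaticity(pixel)
    z = ((r - mean_r) / std_r) ** 2 + ((g - mean_g) / std_g) ** 2
    return math.exp(-0.5 * z)
```

In the paper's setting the training samples would be the pixels of the 120 skin images. A pixel would then be accepted as skin when its score exceeds some threshold, which the paper does not give.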
A Sobel edge detection filter is applied to determine the silhouettes of all the blobs. The Sobel method [5] finds edges using the Sobel approximation to the derivative: it returns edges at the points where the gradient of the intensity I is maximum. Applying the Sobel filter yields the silhouettes of all the blobs. Structural analysis is then performed, which further increases the probability of detecting a human in the scene. Each blob is checked for its aspect ratio, calculated as the height of the blob divided by its width, and matched against the aspect ratio of a human. The system assumes that the person is not sitting but may be in any other position in which height > width.
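The following sketches the Sobel gradient and the aspect-ratio test. It assumes images are lists of rows. It also assumes that a blob's height and width are taken from its bounding box, since the paper only says "height divided by width". The 3 × 3 kernels are the standard Sobel kernels [5]; the helper names are hypothetical.

```python
import math
from typing import List

Image = List[List[float]]

# Standard Sobel kernels for the horizontal (x) and vertical (y) derivative.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(img: Image) -> Image:
    """Gradient magnitude sqrt(gx^2 + gy^2); border pixels are left at 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0.0
            for ky in range(3):
                for kx in range(3):
                    v = img[y + ky - 1][x + kx - 1]
                    gx += SOBEL_X[ky][kx] * v
                    gy += SOBEL_Y[ky][kx] * v
            out[y][x] = math.hypot(gx, gy)
    return out


def aspect_ratio(labels: List[List[int]], label: int) -> float:
    """Height / width of the bounding box of all pixels carrying `label`."""
    ys = [y for y, row in enumerate(labels) for v in row if v == label]
    xs = [x for row in labels for x, v in enumerate(row) if v == label]
    return (max(ys) - min(ys) + 1) / (max(xs) - min(xs) + 1)


def passes_human_aspect(labels: List[List[int]], label: int) -> bool:
    """Paper's criterion: a person who is not sitting is taller than wide."""
    return aspect_ratio(labels, label) > 1.0
```

The label image would come from the 8-connected labeling step. As the paper describes, the edge map would be computed on the binary blob image rather than on the raw frame.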
3. Details of Proposed Scheme
The main task of the Smart Camera Surveillance System is to detect a person in the scene and transmit the video stream, which anyone on the LAN can then access. The main steps, shown in Fig. 2, are explained below.
Fig. 2. Steps for SCSS
Image Grabbing
This module captures images in real time and stores the current video frame to a specified memory location once every second. This frame is then used for change detection and human detection. Once a person is detected in the current frame, the video stream is transmitted to the server, from where anyone on the LAN can access it. Frames are obtained from a single low-cost static web camera attached to the system.
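The once-per-second sampling can be sketched against a stream of (timestamp, frame) pairs, so it can be tested without a camera. How the timestamps are obtained is an assumption (for example, from time.monotonic() in a capture loop), and the function name is hypothetical.

```python
from typing import Iterable, Iterator, Tuple, TypeVar

Frame = TypeVar("Frame")


def sample_every(frames: Iterable[Tuple[float, Frame]],
                 interval: float = 1.0) -> Iterator[Tuple[float, Frame]]:
    """Yield (timestamp, frame) pairs, keeping a frame only when at least
    `interval` seconds have passed since the previously kept frame."""
    next_due = None
    for timestamp, frame in frames:
        if next_due is None or timestamp >= next_due:
            yield timestamp, frame
            next_due = timestamp + interval
```

Measuring the interval from the last kept frame means that a late frame (2.1 s instead of 2.0 s) shifts the next deadline. This avoids bursts of back-to-back frames after a stall.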
Change Detection
This module detects any change occurring in the current scene. The steps for change detection are shown in Fig. 3. Change detection is performed in a dynamic indoor environment with a static camera. It uses temporal subtraction together with an adaptive filter and background subtraction. The background does not need to be initialized, since the adaptive method builds it from the images obtained in real time. Temporal subtraction makes the system robust to lighting changes. Background subtraction makes it more reliable at detecting slow changes and also makes it independent of background color.
Fig. 3. Steps for Change Detection (input: output of the Image Grabbing module; output: input to the Human Detection module)
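The following is a minimal sketch of how temporal subtraction, background subtraction and the adaptive background update might fit together, assuming grayscale frames stored as lists of rows. The OR combination of the two masks and the threshold of 25 are assumptions, not values from the paper. The update rule follows the adaptive-filter description in Section 2: pixels not attributed to a person are absorbed into the background. Function names are hypothetical.

```python
from typing import List

Image = List[List[int]]   # grayscale frame, rows of 0-255 intensities
Mask = List[List[bool]]


def diff_mask(a: Image, b: Image, threshold: int) -> Mask:
    """True wherever |a - b| exceeds the threshold."""
    return [[abs(pa - pb) > threshold for pa, pb in zip(ra, rb)]
            for ra, rb in zip(a, b)]


def change_mask(prev: Image, curr: Image, background: Image,
                threshold: int = 25) -> Mask:
    """Assumed combination: a pixel is changed if it differs from the previous
    frame (fast change) OR from the adaptive background (slow change)."""
    temporal = diff_mask(curr, prev, threshold)
    against_bg = diff_mask(curr, background, threshold)
    return [[t or g for t, g in zip(rt, rg)]
            for rt, rg in zip(temporal, against_bg)]


def update_background(background: Image, curr: Image, human_mask: Mask) -> Image:
    """Absorb every pixel not attributed to a person into the background;
    pixels covered by a detected person keep their previous background value."""
    return [[b if is_human else c
             for b, c, is_human in zip(rb, rc, rh)]
            for rb, rc, rh in zip(background, curr, human_mask)]
```

A moved chair is therefore absorbed on the next update, while a standing person remains visible against the stored background even after he or she stops moving.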
The system further improves temporal subtraction by analyzing changes over a sequence of frames rather than only the most recent image. This improves the detection of slow changes in the scene and makes the system more robust. Temporal subtraction is very simple to implement but is prone to noise, which can be caused by changes in illumination. In addition, electronic noise from the camera may lead to false alarms. Combining the above methods reduces these problems to a large extent. Pixel thresholding is applied next. Once thresholding is applied and a change is detected, the human detection module is invoked. The human detection module in turn makes change detection more robust. It helps the change detection module adapt when it detects a new object that is not human and therefore not of interest.
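One possible way to analyze changes over a sequence of frames is to compare each new frame with the frame several steps earlier. A slow change that stays below the threshold between consecutive frames then still accumulates past it. This is a swapped-in illustration, not the authors' exact method; the class name, the span of 3 frames and the threshold of 25 are hypothetical.

```python
from collections import deque
from typing import Deque, List, Optional

Image = List[List[int]]


class MultiFrameDetector:
    """Threshold the difference between each new frame and the frame
    `span` steps earlier, so gradual changes are not missed."""

    def __init__(self, span: int = 3, threshold: int = 25) -> None:
        self.history: Deque[Image] = deque(maxlen=span)
        self.threshold = threshold

    def update(self, frame: Image) -> Optional[List[List[bool]]]:
        """Return a thresholded change mask, or None until `span` frames are buffered."""
        mask = None
        if len(self.history) == self.history.maxlen:
            oldest = self.history[0]  # the frame exactly `span` steps earlier
            mask = [[abs(p - q) > self.threshold for p, q in zip(r, s)]
                    for r, s in zip(frame, oldest)]
        self.history.append(frame)  # deque drops the oldest frame automatically
        return mask
```

This sketch compares against a single older frame rather than summing consecutive differences. That choice keeps random per-frame noise, such as camera flicker, from accumulating into a false alarm.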
Human Detection
Once a change is detected, the human detection module searches for its cause. As shown in Fig. 4, this module detects a person in the image and tracks the person in subsequent images of the scene. Detection is invariant to the color of the person's clothing and to the person's position in the image. Even if part of the body is not visible because of the viewing angle or occlusion, the module detects the person in most circumstances. The change detection and human detection modules thus assist each other functionally.
Fig. 4. Steps for Human Detection (input: output of the Change Detection module; on detection, the Transmit Video Stream module is called)
The human detection module uses blob analysis to search for connected components larger than a certain threshold. If no such component is found, the module updates the adaptive filter of the change detection module; otherwise, skin color modeling is applied to the blobs. Each blob is then matched against the aspect ratio of a human. Shape analysis is carried out by applying an edge detection filter to the binary image. The module assumes that the width of a person, even with arms stretched out, is less than the person's height. The features of all these methods are combined to detect people robustly under a wide range of conditions. This increases the computational cost but gives more satisfactory results.
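The decision flow of this module can be sketched as below. The Blob record, the function name and both thresholds (500 pixels, 5% skin pixels) are hypothetical. The feature values are assumed to come from the labeling, skin-model and aspect-ratio steps.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Blob:
    pixel_count: int      # size from connected-component labeling
    skin_fraction: float  # share of pixels that passed the skin-color test
    height: int           # bounding-box height in pixels
    width: int            # bounding-box width in pixels


def classify_change(blobs: List[Blob], min_pixels: int = 500,
                    min_skin_fraction: float = 0.05) -> str:
    """Return "human" if any blob passes all tests, else "adapt_background"."""
    large = [b for b in blobs if b.pixel_count > min_pixels]
    if not large:
        # Only small (noise) blobs: let the adaptive filter absorb them.
        return "adapt_background"
    for blob in large:
        if blob.skin_fraction >= min_skin_fraction and blob.height > blob.width:
            return "human"
    # Large but non-human change (e.g. a moved chair): adapt as well.
    return "adapt_background"
```

Both non-human outcomes feed back into the adaptive filter. This is how the human detection module helps change detection adapt to objects that are not of interest.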
Transmit Video Stream
Once a person is detected, the video is transmitted to the server in real time using the Java Media Framework (JMF), a Java API for transmitting video streams in real time to a remote site. The JMF API [6] uses the Real-time Transport Protocol (RTP) to transmit the video stream. As soon as the above modules detect a person, the client initiates the transfer. The server continuously polls for an incoming video stream and starts receiving it in real time as soon as the client initiates the transfer. The server can be placed anywhere on the LAN.
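JMF's RTP stack (package javax.media.rtp) builds packets internally, so application code does not normally do this by hand. To illustrate what is sent, the sketch below packs the 12-byte RTP fixed header defined in RFC 3550. Payload type 26 (JPEG, RFC 3551) is an assumption, since the paper does not say which video format was streamed. The function name is hypothetical.

```python
import struct


def rtp_header(seq: int, timestamp: int, ssrc: int,
               payload_type: int = 26, marker: bool = False) -> bytes:
    """Pack the 12-byte RTP fixed header (RFC 3550): version 2,
    no padding, no header extension, no CSRC entries."""
    byte0 = 2 << 6                                        # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)    # M bit + 7-bit PT
    # "!" = network byte order with no alignment padding: 1+1+2+4+4 = 12 bytes.
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
```

For video payloads such as JPEG, RFC 3551 specifies a 90 kHz RTP timestamp clock. Two frames sent one second apart would therefore differ by 90000 in the timestamp field.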
The Java Media Framework API (JMF) enables audio, video and other time-based media to be added to applications and applets built on Java technology. This optional package, which can capture, playback, stream, and transcode multiple media formats, extends the Java 2 Platform, Standard Edition (J2SE) for multimedia developers by providing a powerful toolkit to develop scalable, cross-platform technology.
Accessing Video Stream
Once the video stream has been transferred to the server, anyone connected to the LAN can access the video in real time. Any machine on the LAN can request the server's service at any time, but it receives video only once a person is detected in the monitored scene. The server performs two tasks: it receives the images from the camera site and then transfers them to the multiple clients attached to it. JMF also performs the transfer of the video stream to computers on the LAN using RTP, with the server broadcasting the stream on the LAN. If the server has a static IP address, anyone on the Internet can access it by running the client receive module.
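The server's relay role can be sketched with plain UDP sockets from Python's standard library. This is a stand-in for the JMF/RTP transport the paper uses, not its implementation. The server receives each datagram from the camera site and re-sends it, either to a multicast group (one send reaches every subscribed LAN host) or to a list of unicast clients. Function names, ports and addresses are hypothetical.

```python
import socket
from typing import List, Tuple

Address = Tuple[str, int]


def make_multicast_sender(ttl: int = 1) -> socket.socket:
    """UDP socket whose multicast datagrams stay on the local network (TTL 1)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    return sock


def relay_once(inbound, outbound, destinations: List[Address],
               bufsize: int = 65535) -> int:
    """Receive one datagram from the camera site and forward it unchanged to
    every destination; returns the number of bytes relayed."""
    data, _source = inbound.recvfrom(bufsize)
    for addr in destinations:
        outbound.sendto(data, addr)
    return len(data)
```

A server loop might bind an inbound socket (for example, to the hypothetical port 5004). It would then call relay_once(inbound, make_multicast_sender(), [("239.1.1.1", 5006)]) repeatedly. 239.0.0.0/8 is the administratively scoped multicast range, and a TTL of 1 keeps the packets on the local subnet.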
4. Implementation and Results
The image processing for change detection and human detection is done in MATLAB 6. Transmission and reception of the live video stream are implemented in Java using JMF. Two dedicated Pentium IV systems are used. One system is kept at the site where images are grabbed and detection is performed. The second system is used as the server, which receives video frames in real time whenever a person is detected. It keeps polling for an incoming video stream from the first system. As soon as it receives the stream, it multicasts it so that anyone on the LAN can access it. The results of the change detection and human detection modules are shown for two examples, in Figures 5 to 7 and Figures 8 to 11, respectively. In the first example the object is a human being; in the second it is a chair.
5. Conclusions and Future Scope
Smart camera surveillance has become a topic of great interest to many research organizations in recent years, especially after the September 11 attacks in the USA. The need for it is growing because conventional surveillance systems suffer from persistent human errors arising for various reasons. The proposed system provides a cost-effective surveillance solution. The implemented SCSS works very well in most circumstances. However, it needs some processing time when a person appears in the scene. Experiments showed that combining all the above algorithms gives better results than using any of them alone, since the drawbacks of one method are overcome by another. Performing change detection followed by human detection works better for a surveillance system. The system uses temporal subtraction with adaptive background subtraction to detect changed objects that may be human. This makes it robust to changes in luminosity and to background color. It uses connected component analysis and blob analysis to filter out useless objects. It then combines structural analysis of the human body, skin color modeling and the aspect ratio to detect a person in the scene. Even if part of the body is occluded, the system is still able to detect the person. However, if the person is at the edge of the scene, the probability of detecting the person is reduced significantly.
There is still scope for improvement. The present system uses blob detection for shape analysis; other algorithms, such as correlation-based methods, may be tried alongside it to increase the robustness of the system. A new module may also be added, if needed, to maintain a database of images whenever a person is detected.
Acknowledgements
The authors express their sincere gratitude to Dr. M. D. Tiwari, Director, I.I.I.T. Allahabad, for his encouragement and support throughout the course of this work. They also thank him for providing the technical and literature facilities required for its development.
References
[1] R. Jain, R. Kasturi, & B.G. Schunck, Machine Vision (McGraw-Hill, 1995).
[2] Wayne Wolf & I. Burak Ozer, A Smart Camera for Real Time Human Activity Recognition. Proc. of IEEE Workshop on Signal Processing Systems, 26-28 Sept. 2001, 217-224.
[3] Stefan Huwer & Heinrich Niemann, Adaptive Change Detection for Real-Time Surveillance Applications. Proc. of 3rd IEEE International Workshop on Visual Surveillance, July 2000, 37-46.
[4] Vladimir Vezhnevets, Vassili Sazonov, & Alla Andreeva, A Survey on Pixel-Based Skin Color Detection Techniques. Proc. of Graphicon-2003, Moscow, Russia, September 2003, 85-92.
[5] R.C. Gonzalez & R.E. Woods, Digital Image Processing, Second Ed. (Pearson Education, 2002).
[6] Sun Microsystems, Java Media Framework API (JMF), http://java.sun.com/products/java-media/jmf/.
(Note: This paper was presented at ICSIP.)