Turn Videos in Images into Stories in Texts
The following example shows the working space for city traffic built into CamSeer.
A working space is the context for CamSeer and tells CamSeer the following things:
1. Where it is living.
2. What to look at.
3. What to report.
4. The inner parameters (the user does not need to take care of these).
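As a rough illustration, the four items above could be held in a small data structure. The sketch below is only an assumption made for this page, not CamSeer's actual format: the class name `WorkingSpace` and every field name and value in it are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class WorkingSpace:
    """Hypothetical container for the four things a working space tells CamSeer."""
    environment: str      # 1. where it is living
    targets: List[str]    # 2. what to look at
    reports: List[str]    # 3. what to report
    # 4. inner parameters; left empty by default so the user can ignore them
    inner_parameters: Dict[str, float] = field(default_factory=dict)


# Illustrative values only, taken from the city traffic example on this page.
city_traffic = WorkingSpace(
    environment="city",
    targets=["vehicle", "pedestrian"],
    reports=["reverse driving", "breakdown vehicle"],
)
print(city_traffic)
```

Giving the inner parameters a default value mirrors the point of item 4: the user fills in the first three fields at most and leaves the rest alone.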
For example, when you have lots of surveillance video for the traffic system in a city,
you can just tell CamSeer that it is living in a city and will look at the traffic.
Then CamSeer will automatically load a working space called “City Traffic” as the starting
point for its work. The user doesn't need to worry about the following things:
1. Should I tell CamSeer where the traffic lanes are?
2. Should I tell CamSeer to observe vehicles and pedestrians?
3. Should I tell CamSeer that a reverse driving pattern is dangerous and should
be reported to me?
4. Should I tell CamSeer to observe breakdown vehicles?
No, you just input the freeform text “city traffic” and CamSeer will do all the rest
for you by:
1. Learning the lane configurations of the road automatically.
2. Constructing the traffic flows for you automatically.
3. Reporting to you unusual driving patterns such as reverse driving,
breakdown vehicles, crossing the road at a dangerous zone, etc.
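One way to picture the step from freeform text to a loaded workspace is a simple lookup. This is a minimal sketch under stated assumptions: the registry `WORKSPACES`, the function `load_workspace`, and the dictionary keys are all hypothetical names invented for illustration, and CamSeer's real text matching may be far more flexible.

```python
# Hypothetical registry: workspace name -> what that workspace asks CamSeer to do.
WORKSPACES = {
    "city traffic": {
        "learn": ["lane configuration"],
        "construct": ["traffic flows"],
        "report": ["reverse driving", "breakdown vehicle",
                   "crossing at a dangerous zone"],
    },
}


def load_workspace(freeform_text):
    """Return the workspace matching the user's text, ignoring case and extra spaces."""
    key = " ".join(freeform_text.lower().split())
    if key not in WORKSPACES:
        raise KeyError(f"no workspace named {key!r}")
    return WORKSPACES[key]


workspace = load_workspace("  City   Traffic ")
print(workspace["report"])
```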
Let us input “city traffic”; the following workspace will be loaded by CamSeer.
The left panel shows the things that CamSeer needs to recognize and the actions it needs to take. It also shows the working environment, such as the ty[...]. However, since there is no human expert to tell CamSeer what to do, CamSeer needs to learn everything from the raw video data. In this case, the first two things CamSeer needs to learn are the configuration of the traffic lanes and the background. Let us load the first video clip and see how CamSeer copes with the unknown traffic of an unknown city road.
This is a city road with three lanes in each direction, plus a bike/motorcycle route and a sidewalk on each side of the road. The first challenge for CamSeer is to find all these lanes, routes and sidewalks by itself. The following screenshot shows the preliminary lane structure results based on the first few video frames.
After a while, CamSeer figures out that there is a main traffic lane, a sidewalk and
a bike lane for each traffic direction. As shown in the following screenshot,
when CamSeer feels confident in its learning results, it issues a “lane stable”
signal to indicate that the lane structure has been figured out from the video frames,
and a “background stable” signal to indicate that it understands the background and
the foreground for motion understanding purposes. Note that the red and green colors
represent the different directions of the traffic flows.
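How CamSeer decides that its results are stable is not described here. A common and simple approach, shown below purely as an assumed stand-in, is to raise the signal once an estimate stops changing for several consecutive frames. The class `StabilityMonitor`, its thresholds, and the sample lane-boundary numbers are all hypothetical.

```python
class StabilityMonitor:
    """Raises a 'stable' signal once an estimate stops changing between frames."""

    def __init__(self, tolerance=0.01, required_frames=3):
        self.tolerance = tolerance              # hypothetical threshold
        self.required_frames = required_frames  # hypothetical patience
        self.previous = None
        self.calm_frames = 0

    def update(self, estimate):
        """Feed the latest estimate (a list of floats); return True when stable."""
        if self.previous is not None:
            change = max(abs(a - b) for a, b in zip(estimate, self.previous))
            if change <= self.tolerance:
                self.calm_frames += 1
            else:
                self.calm_frames = 0   # any large change restarts the count
        self.previous = list(estimate)
        return self.calm_frames >= self.required_frames


monitor = StabilityMonitor()
# Made-up lane-boundary positions (fractions of the image width), one list per frame.
frames = [[0.20, 0.50], [0.26, 0.55], [0.30, 0.60],
          [0.30, 0.60], [0.30, 0.60], [0.30, 0.60]]
for index, estimate in enumerate(frames):
    if monitor.update(estimate):
        print(f"lane stable at frame {index}")
        break
```

The same monitor could be fed a background estimate instead of lane boundaries to produce a separate “background stable” signal.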
After learning all these basic settings all by itself, CamSeer is now ready to perform
its duties specified in the workspace. The actions that CamSeer can take are:
Issue an alarm when any person or vehicle goes the wrong way;
Observe breakdown cars, etc.
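The wrong-way check can be sketched as a comparison between an object's motion and the learned flow direction of its lane. This is a minimal illustration, assuming a tracker that supplies object positions over time; the function `is_wrong_way`, its parameters, and the pixel threshold are hypothetical and not taken from CamSeer.

```python
def is_wrong_way(track, lane_direction, min_distance=5.0):
    """Return True if an object's net motion opposes the lane's flow direction.

    track: list of (x, y) positions over time (hypothetical tracker output).
    lane_direction: (dx, dy) vector of the learned traffic flow.
    min_distance: ignore objects that have barely moved (hypothetical, in pixels).
    """
    (x0, y0), (x1, y1) = track[0], track[-1]
    move = (x1 - x0, y1 - y0)
    if (move[0] ** 2 + move[1] ** 2) ** 0.5 < min_distance:
        return False
    # A negative dot product means the motion points against the flow.
    dot = move[0] * lane_direction[0] + move[1] * lane_direction[1]
    return dot < 0


flow_down_the_image = (0, 1)
print(is_wrong_way([(100, 200), (100, 150)], flow_down_the_image))
print(is_wrong_way([(100, 200), (100, 260)], flow_down_the_image))
```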
The following screenshot shows that CamSeer issued an alarm when two girls walked along the motorcycle route, heading in the direction of the traffic flow.
The following screenshot shows that CamSeer issued an alarm when a person parked a motorcycle and stayed in the motorcycle route. At first, CamSeer issued the alarm “Alarm breakdown and/or objects block traffic flow”; it then enlarged the regions of the objects and recognized them. In this case, CamSeer recognized that the object staying in the motorcycle route was a pedestrian.
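The first stage of this alarm, noticing that something has stopped inside a lane, can be sketched as a check for an object that stays almost still within the lane region for many frames. The function `is_blocking`, the box representation of the lane, and both thresholds are assumptions made for illustration; the later enlarge-and-recognize stage is not covered.

```python
def is_blocking(track, lane_box, max_drift=3.0, min_frames=50):
    """Return True if an object has stayed almost still inside a lane region.

    track: list of (x, y) positions, one per frame (hypothetical tracker output).
    lane_box: (left, top, right, bottom) of the lane region in pixels.
    max_drift, min_frames: hypothetical thresholds.
    """
    if len(track) < min_frames:
        return False
    recent = track[-min_frames:]
    left, top, right, bottom = lane_box
    inside = all(left <= x <= right and top <= y <= bottom for x, y in recent)
    xs = [x for x, _ in recent]
    ys = [y for _, y in recent]
    still = (max(xs) - min(xs)) <= max_drift and (max(ys) - min(ys)) <= max_drift
    return inside and still


motorcycle_route = (0, 0, 40, 400)    # made-up lane region
parked = [(20, 120)] * 60             # object that does not move for 60 frames
print(is_blocking(parked, motorcycle_route))
```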
Armed with the advanced and powerful "look-into-image" ability of
Yang's Cognitive Image Search Engine,
CamSeer can offer even more powerful capabilities for city traffic monitoring
with the next generation of high-resolution CCTV cameras.
Some examples of such applications are:
Find and locate suspect vehicles by using the vehicle detection and face recognition abilities of Yang’s Cognitive Image Search Engine.
For homeland security applications, we can develop special workspaces for
CamSeer to monitor suspicious behaviors, such as leaving a package on the
Golden Gate Bridge, by using
the landmark recognition abilities built into Yang’s Cognitive Image Search Engine.
What else can we do? The list goes on and on, because once we have built the CamSeer image understanding toolbox, everything we can search for by using PicSeer can also be located in any video stream. For example, we can easily understand human activities from videos, as shown in [here].


