Smart mobility has been a hot topic for decades. Narrowed to our daily life, it translates to intelligent vehicles. The main goal of making vehicles intelligent is to improve the convenience and quality of human life. To make intelligent vehicles part of our daily life, we need to incorporate ‘cognition’ and ‘perception’ into scene understanding algorithms to close the gap between machine vision and human vision. Scene understanding is the interpretation of the objects, events, and actions that comprise an input scene. It adds cognition to computer vision algorithms and helps the machine understand the ‘why’ within a scene. Present-day algorithms can detect, segment, and recognize a collection of objects within a scene and build a hierarchical interpretation from those results. However, such an approach has the following pitfalls:
Currently, scene understanding is driven either by top-down template-matching tasks or by bottom-up hierarchical approaches. A top-down approach outputs different semantics, such as category labels and within-category attributes or states. However, such a task-dependent recognition strategy does not take into account the perceptual properties of object recognition in human vision.
A hierarchical approach to scene understanding ignores the intrinsic structure of the data and limits itself to the results generated by bottom-up grouping.
Current algorithms work only in well-controlled environments; they easily fail in real-world cases, where a variety of conditional changes complicates scene understanding.
Treating scene understanding as a bottom-up approach means running detection and recognition algorithms at the frame level and then combining their outputs temporally for analysis. Yet a temporal component throughout the network is essential for perception.
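To make the frame-based pattern concrete, here is a minimal sketch of per-frame predictions fused after the fact by a sliding majority vote. The `detect_frame` function and the voting scheme are illustrative assumptions, not a specific system from the literature; the point is that temporal context enters only as post-hoc smoothing, never inside the recognizer itself.

```python
from collections import Counter

def detect_frame(frame):
    # Hypothetical stand-in for a per-frame recognizer; a real
    # pipeline would run a detector/classifier on one image here.
    return frame["label"]

def temporal_vote(frames, window=3):
    """Fuse independent per-frame predictions with a sliding
    majority vote -- the 'bottom-up, then temporal' pattern."""
    labels = [detect_frame(f) for f in frames]
    smoothed = []
    for i in range(len(labels)):
        lo = max(0, i - window + 1)
        votes = Counter(labels[lo:i + 1])
        smoothed.append(votes.most_common(1)[0][0])
    return smoothed

frames = [{"label": l} for l in ["walk", "walk", "run", "walk", "walk"]]
print(temporal_vote(frames))  # the isolated 'run' frame is voted away
```

The vote can hide a genuinely transient event just as easily as it removes noise, which is one reason post-hoc fusion is a weak substitute for temporal modeling throughout the network.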
An intuitive solution is to integrate ‘cognition’ into networks by adding ‘perception’. Some potential applications are listed below, followed by a course of action that integrates perception with learning and tests its robustness.
Pedestrian action recognition: supports decision making for autonomous vehicles.
Vehicle action detection: detect the actions of neighboring vehicles, helping drivers decide how to react.
Emergency detection: emergencies may come from any direction (e.g. the blind spot).
Road condition classification: vehicles may need to reduce speed on rainy or snowy days.
Lane prediction: what if the lane markings are not clear? Humans still know how to drive because we can imagine the lanes from experience (‘perception’).
Robustness: during a thunderstorm or in foggy conditions, when visibility is low, a human driver can still recognize objects and generally drive safely. In its present state, a deep learning algorithm would need copious amounts of bad-weather data to segment video frames robustly. A perceptually capable learning algorithm can change that.