Publications

Thesis



Scene Understanding with Multi-view Geometry and Semantics

Cosimo Rubino

Abstract

Inferring a generic 3D scene using multi-view methods has been extensively investigated since the early days of Computer Vision research. However, performance is generally low when the observed scene is complex: strong shading variations and illumination changes can heavily affect the final estimate, especially if the scene is composed of moving objects with smooth, untextured surfaces. Urban street scenes, for example, exhibit all these difficulties, and classical approaches based solely on geometrical cues can give poor results. One strategy to overcome these limitations is to exploit semantic information to improve the robustness of geometric approaches. Semantics can also be used to localise objects inside the scene and to separate them from the surrounding environment. This thesis proposes novel approaches for scene understanding using RGB images, in particular for the motion segmentation and object localisation problems.
For motion segmentation, two novel frameworks are described: a pair-wise consensus approach and an n-view optimisation-based approach. Both employ a state-of-the-art object detector to derive the semantics. The pair-wise method adopts a RANSAC strategy for fitting the motions, where the selection of the samples is driven by a semantic confidence score. The n-view framework uses geometrical constraints and known object classes associated with the urban street-level scenario to over-constrain the problem and to better separate long-term trajectories belonging to the background or to objects, reducing the effect of noise.
The object localisation task is performed by a multi-view technique that uses the bounding boxes provided by the object detector to estimate the volume occupied by the objects. The method is geometric and has been formulated in closed form for both the perspective and orthographic camera models. An extensive campaign of experiments has been performed for all the techniques, showing that including high-level reasoning in geometrical approaches leads to better results, especially when dealing with realistic scenarios.


[pdf]