Coherent Scene Understanding with 3D Geometric Reasoning

Jiyan Pan
doctoral dissertation, tech. report CMU-RI-TR-14-06, Robotics Institute, Carnegie Mellon University, April, 2014


When looking at a single 2D image of a scene, humans can effortlessly understand the 3D world behind it, even though stereo and motion cues are not available. Motivated by this remarkable human capability, one of the ultimate goals of computer vision is to enable machines to automatically infer the 3D structure of a scene from a single 2D image. This dissertation proposes methods that produce a geometrically and semantically coherent 3D interpretation of urban scenes from a single image, and shows the benefits of reasoning in 3D when analyzing 2D images.

In this dissertation, we model an urban scene using three types of elements. The first type is global geometry, such as the ground plane and the gravity direction. The second type is objects, such as cars and pedestrians, that have definite shapes and extents. The third type is vertical surfaces, such as building facades, that do not have definite shapes and extents. This model allows for a richer characterization of an urban scene than existing work.
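As a rough illustration of the three element types described above, the model could be organized as a small set of record types. This is only a sketch for orientation: every class name, field name, and parameterization here (for example, representing a plane as a normal plus an offset) is an assumption made for the example and is not taken from the dissertation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class GlobalGeometry:
    """Scene-wide geometry (hypothetical parameterization)."""
    ground_normal: Vec3   # normal n of the ground plane n . x + d = 0
    ground_offset: float  # offset d of the ground plane
    gravity: Vec3         # gravity direction


@dataclass
class SceneObject:
    """An object with a definite shape and extent, e.g. a car or a pedestrian."""
    label: str
    center: Vec3  # 3D position of the object
    size: Vec3    # width, height, length of its bounding box
    yaw: float    # orientation about the vertical axis, in radians


@dataclass
class FacadePlane:
    """A vertical surface; only its supporting plane is stored, not an extent."""
    normal: Vec3
    offset: float


@dataclass
class UrbanScene:
    """Container tying the three element types together."""
    geometry: GlobalGeometry
    objects: List[SceneObject] = field(default_factory=list)
    facades: List[FacadePlane] = field(default_factory=list)


# Example: a flat ground plane y = 0, one car, and one facade at x = 10.
scene = UrbanScene(
    geometry=GlobalGeometry((0.0, 1.0, 0.0), 0.0, (0.0, -1.0, 0.0)),
    objects=[SceneObject("car", (2.0, 0.75, 8.0), (1.8, 1.5, 4.5), 0.0)],
    facades=[FacadePlane((1.0, 0.0, 0.0), -10.0)],
)
```

Keeping facades as bare planes without a bounding box mirrors the distinction drawn in the text between elements with and without definite extents.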

To tackle the inherent ambiguity involved in recovering the 3D structure from a single 2D image, we systematically identify geometric constraints among the three types of elements in our model, and encode these constraints in a Conditional Random Field (CRF). For objects, we consider both their global geometric compatibility with the ground plane and gravity direction, and the local geometric compatibility between adjacent objects. For building facades, we decompose them into a set of continuously-oriented planes mutually related by 3D geometric relationships, and constrained by nearby objects in 3D. We also propose a generalized RANSAC algorithm to make inference in the model tractable. We show that performing 3D geometric reasoning using our model benefits individual tasks such as object detection, viewpoint estimation, and facade layout recovery. In addition, it yields a more informative interpretation of the 3D scene behind the image.
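To make the notion of global geometric compatibility more concrete, the sketch below scores how well a single object hypothesis agrees with a ground plane and a gravity direction. It is not the potential function used in the dissertation: the Gaussian form, the function name `ground_compatibility`, and the tolerances `sigma_h` and `sigma_a` are all assumptions chosen for illustration.

```python
import math


def dot(a, b):
    """Dot product of two 3-vectors given as sequences."""
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    """Euclidean length of a 3-vector."""
    return math.sqrt(dot(a, a))


def ground_compatibility(bottom, up, n, d, gravity,
                         sigma_h=0.2, sigma_a=math.radians(10.0)):
    """Score in (0, 1] for an object hypothesis (illustrative only).

    bottom  : 3D point at the base of the object
    up      : the object's own up axis
    n, d    : ground plane n . x + d = 0
    gravity : gravity direction (pointing down)
    sigma_h : assumed tolerance on height above the ground, in metres
    sigma_a : assumed tolerance on tilt, in radians
    """
    # Signed distance from the object's base to the ground plane.
    height = (dot(n, bottom) + d) / norm(n)
    # Angle between the object's up axis and the direction opposite to gravity.
    cos_a = -dot(up, gravity) / (norm(up) * norm(gravity))
    angle = math.acos(max(-1.0, min(1.0, cos_a)))
    # Product of two Gaussian-shaped terms: on the ground, and upright.
    return (math.exp(-0.5 * (height / sigma_h) ** 2)
            * math.exp(-0.5 * (angle / sigma_a) ** 2))


# Ground plane y = 0 with gravity along -y.
n, d, g = (0.0, 1.0, 0.0), 0.0, (0.0, -1.0, 0.0)
on_ground = ground_compatibility((1.0, 0.0, 5.0), (0.0, 1.0, 0.0), n, d, g)
floating = ground_compatibility((1.0, 1.0, 5.0), (0.0, 1.0, 0.0), n, d, g)
tilted = ground_compatibility((1.0, 0.0, 5.0), (1.0, 1.0, 0.0), n, d, g)
```

Under these assumptions, an upright object resting on the plane scores highest, while a hypothesis that floats one metre above the ground or leans 45 degrees is strongly penalized. In a CRF, a term of this kind would act as one factor linking an object variable to the global geometry variables.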


Text Reference
Jiyan Pan, "Coherent Scene Understanding with 3D Geometric Reasoning," doctoral dissertation, tech. report CMU-RI-TR-14-06, Robotics Institute, Carnegie Mellon University, April, 2014

BibTeX Reference
   author = "Jiyan Pan",
   title = "Coherent Scene Understanding with 3D Geometric Reasoning",
   booktitle = "",
   school = "Robotics Institute, Carnegie Mellon University",
   month = "April",
   year = "2014",
   number= "CMU-RI-TR-14-06",
   address= "Pittsburgh, PA",