Scene reconstruction from video sequences has become a prominent computer vision research area in recent years, due to its large number of applications in fields such as security, robotics and virtual reality. Despite recent progress in this field, there are still a number of issues that manifest as incomplete, incorrect or computationally-expensive reconstructions. The engine behind achieving reconstruction is the matching of features between images, where common conditions such as occlusions, lighting changes and texture-less regions can all affect matching accuracy. Subsequent processes that rely on matching accuracy, such as camera parameter estimation, structure computation and non-linear parameter optimization, are also vulnerable to additional sources of error, such as degeneracies and mathematical instability. Detection and
correction of errors, along with robustness in parameter solvers, are a must in order to achieve a very accurate final scene reconstruction. However, error detection is in general difficult due to the lack of ground-truth information about the given scene, such as the absolute position of scene points or GPS/IMU coordinates for the camera(s) viewing the scene.
In this dissertation, methods are presented for the detection, factorization and correction of error sources present in all stages of a scene reconstruction pipeline from video, in the absence of ground-truth knowledge. Two main applications are discussed. The first set of algorithms derive total structural error measurements after an initial scene structure computation and factorize errors into those related to the underlying feature matching process and those related to camera parameter estimation. A brute-force local correction of inaccurate feature matches is presented, as well as an improved conditioning scheme for non-linear parameter optimization which applies weights on input parameters in proportion to estimated camera parameter errors. Another application is in reconstruction pre-processing, where an algorithm detects and discards frames that would lead to inaccurate feature matching, camera pose estimation degeneracies or mathematical instability in structure computation based on a residual error comparison between two different match motion models. The presented algorithms were designed for aerial video but have been proven to work across different scene types and camera motions, and for both real and synthetic scenes.