Date of Award
Fall 11-24-2021
Degree Type
Thesis
Degree Name
Master of Science (MS)
Department
Computer Science
First Advisor
Shihao Ji
Abstract
Current stereo matching techniques are challenged by a restricted search space, occluded regions, and sheer size. While monocular depth estimation is spared these challenges and can achieve satisfactory results with monocular cues, the lack of a stereoscopic relationship renders monocular prediction less reliable on its own, especially in highly dynamic or cluttered environments. To address these issues in both scenarios, an optic-chiasm-inspired self-supervised binocular depth estimation method is proposed in this thesis, wherein a vision transformer with a gated positional cross-attention layer is designed to enable feature-sensitive pattern retrieval between views while retaining the extensive context information aggregated through self-attention. This crossover design is biologically analogous to the optic chiasm structure in the human visual system, hence the name ChiTransformer. It leverages the strengths of both monocular and binocular approaches. Our experiments show that this architecture yields a substantial 15% improvement over self-supervised stereo approaches and can be used on both rectilinear and fisheye images.
DOI
https://doi.org/10.57709/26632434
Recommended Citation
Su, Qing, "ChiTransformer: Towards Reliable Stereo from Cues." Thesis, Georgia State University, 2021.
doi: https://doi.org/10.57709/26632434