Author

Qing Su

Date of Award

Fall 11-24-2021

Degree Type

Thesis

Degree Name

Master of Science (MS)

Department

Computer Science

First Advisor

Shihao Ji

Abstract

Current stereo matching techniques are challenged by a restricted search space, occluded regions, and sheer size. Monocular depth estimation is spared these challenges and can achieve satisfactory results from monocular cues, but the lack of a stereoscopic relationship makes monocular prediction less reliable on its own, especially in highly dynamic or cluttered environments. To address these issues in both scenarios, this thesis proposes an optic-chiasm-inspired self-supervised binocular depth estimation method, in which a vision transformer with a gated positional cross-attention layer is designed to enable feature-sensitive pattern retrieval between views while retaining the extensive context information aggregated through self-attention. This crossover design is biologically analogous to the optic chiasm structure in the human visual system, hence the name ChiTransformer. It leverages the strengths of both monocular and binocular approaches. Our experiments show that this architecture improves on self-supervised stereo approaches by 15% and can be used on both rectilinear and fisheye images.

DOI

https://doi.org/10.57709/26632434
