Date of Award
5-1-2024
Degree Type
Thesis
Degree Name
Master of Science (MS)
Department
Computer Science
First Advisor
Jonathan Shihao Ji
Abstract
In this work, we present a novel hierarchical navigation policy for object navigation that leverages both object detection models and large language models (LLMs) to improve how an agent interprets and interacts with complex indoor environments. Our approach uses object detection to assess the surrounding space and a layout reconstruction strategy to model the environment’s structure. By defining the navigation strategy hierarchically, we separate decision-making into long-term and short-term goals, building on the existing concept of “frontier-based goal selection.” We refine this method by representing each frontier as a series of observations that object detection models transform into language. Each frontier is then scored by an LLM, allowing a reasoned selection of the most promising navigational target. Our framework is simple yet effective: it suits dynamic and unknown environments and surpasses existing baselines in efficiency and accuracy, advancing the field of robotic navigation. Code can be found at https://github.com/weizhenFrank/ObjNav.
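The frontier-scoring step described in the abstract can be illustrated with a minimal sketch. It is not taken from the thesis code: the names `Frontier`, `describe`, `select_frontier`, and `toy_score` are hypothetical, and `toy_score` is a keyword-matching stand-in for the LLM call, used only so the example runs without a model.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Frontier:
    """A candidate long-term goal on the boundary of the explored map (hypothetical type)."""
    position: tuple[float, float]      # map coordinates; the layout is an assumption
    detected_objects: tuple[str, ...]  # labels produced by an object detector


def describe(frontier: Frontier) -> str:
    """Turn the detections near a frontier into a short text description."""
    if not frontier.detected_objects:
        return "Nothing was detected near this frontier."
    return "Near this frontier the robot sees: " + ", ".join(frontier.detected_objects) + "."


def select_frontier(
    frontiers: Sequence[Frontier],
    target: str,
    score_fn: Callable[[str, str], float],
) -> Frontier:
    """Score every frontier description against the target and return the best one."""
    if not frontiers:
        raise ValueError("no frontiers to choose from")
    return max(frontiers, key=lambda f: score_fn(describe(f), target))


def toy_score(description: str, target: str) -> float:
    """Placeholder for an LLM scorer: counts objects that hint at the target.

    The hint table is invented for illustration only.
    """
    hints = {
        "bed": ("pillow", "nightstand", "wardrobe"),
        "toilet": ("sink", "towel"),
    }
    return float(sum(word in description for word in hints.get(target, ())))


frontiers = [
    Frontier(position=(1.0, 2.0), detected_objects=("sofa", "tv")),
    Frontier(position=(4.0, 0.5), detected_objects=("pillow", "nightstand")),
]
best = select_frontier(frontiers, target="bed", score_fn=toy_score)
```

Passing the scorer as a function keeps the selection logic independent of any particular model; in a real system `score_fn` would wrap a prompt to an LLM and parse a numeric score from its reply.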
DOI
https://doi.org/10.57709/36972831
Recommended Citation
Liu, Weizhen, "Towards Vision and Language Models Aided Object Navigation." Thesis, Georgia State University, 2024.
doi: https://doi.org/10.57709/36972831