
Towards Vision and Language Models Aided Object Navigation

Liu, Weizhen
Abstract

In this work, we present a novel hierarchical navigation policy for object navigation that leverages both object detection models and large language models (LLMs) to improve the interpretation of, and interaction with, complex indoor environments. Our approach uses object detection to assess the surrounding space and a layout reconstruction strategy to model the environment’s structure. By defining the navigation strategy hierarchically, we separate decision-making into long-term and short-term goals, building on the existing concept of "frontier-based goal selection". We refine this method by representing each frontier through a series of observations that are converted into language by object detection models. Each frontier is then scored by an LLM, allowing a reasoned selection of the most promising navigation targets. Our framework is simple yet effective: it suits dynamic and unknown environments and surpasses existing baselines in efficiency and accuracy. Code can be found at https://github.com/weizhenFrank/ObjNav.
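
The frontier-scoring step described above can be illustrated with a minimal sketch. This is not the code from the linked repository: the `Frontier` structure, the function names, and the `toy_score` stand-in for an LLM call are all hypothetical, chosen only to show the idea of turning detections near each frontier into text, scoring that text against the target object, and picking the highest-scoring frontier as the long-term goal.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Frontier:
    """Candidate long-term goal on the edge of explored space (hypothetical structure)."""
    position: Tuple[float, float]   # map coordinates of the frontier
    detected_objects: List[str]     # labels an object detector reported near it


def describe_frontier(frontier: Frontier) -> str:
    """Turn detector labels into a short language description."""
    if not frontier.detected_objects:
        return "No objects are visible near this frontier."
    labels = ", ".join(sorted(set(frontier.detected_objects)))
    return f"Objects visible near this frontier: {labels}."


def select_frontier(
    frontiers: List[Frontier],
    target: str,
    score_fn: Callable[[str, str], float],
) -> Frontier:
    """Score every frontier description and return the highest-scoring frontier."""
    if not frontiers:
        raise ValueError("no frontiers to choose from")
    return max(frontiers, key=lambda f: score_fn(target, describe_frontier(f)))


def toy_score(target: str, description: str) -> float:
    """Assumed stand-in for an LLM query: counts hand-picked related labels."""
    related = {"toilet": {"sink", "bathtub", "towel"}, "bed": {"pillow", "nightstand"}}
    return float(sum(word in description for word in related.get(target, set())))


frontiers = [
    Frontier((1.0, 4.0), ["sofa", "tv"]),
    Frontier((6.5, 2.0), ["sink", "towel"]),
]
best = select_frontier(frontiers, "toilet", toy_score)
print(best.position)  # (6.5, 2.0)
```

In a real system, `score_fn` would wrap a prompt to an LLM and parse a numeric score from its reply; passing it in as a callable keeps the selection logic independent of any particular model or API.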

Date
2024-05-01
Keywords
Object navigation, LLMs, Frontier representation, Decision-making