
ABC-VLM: Attending & Bridging Context in Vision-Language Modeling for Medical Image Analysis

Teja Krishna Cherukuri
Abstract

Accurate medical image analysis and reporting in retinal imaging is essential for the early detection of vision-threatening conditions like diabetic retinopathy, enabling timely diagnosis and effective treatment decisions. However, inherent challenges (variability in lesion appearance, scarcity of labeled data, and the complexity of integrating visual and textual information) hinder the effectiveness of conventional approaches. This research introduces a novel approach, hypothesizing that attending to and bridging context through guided mechanisms can enhance the accuracy and interpretability of retinal image analysis, enabling precise retinopathy diagnosis and comprehensive medical report generation. We propose ABC-VLM: Attending & Bridging Context in Vision-Language Modeling for Medical Image Analysis. The Attending Context is realized through the Guided Context Gating (GCG) mechanism, which systematically attends to global context, spatial correlations, and localized lesion details by integrating Context Formulation, Channel Correlation, and Guided Gating. This approach enhances diagnostic precision, achieving a significant 6.53% improvement in severity classification accuracy compared to state-of-the-art attention models. The Bridging Context is achieved via the Guided Context Self-Attention based Multi-modal Medical Vision Language Transformer (GCS-M3VLT), which bridges visual and textual modalities by aligning fine-grained lesion features with clinical context, yielding a 0.023 BLEU@4 boost in medical report generation on the DeepEyeNet dataset. By explicitly attending to and bridging context, ABC-VLM addresses key limitations in retinal image analysis and medical reporting, offering a transformative approach that sets a new benchmark for data-limited and resource-constrained medical applications.
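The abstract names three stages of the Guided Context Gating mechanism (Context Formulation, Channel Correlation, Guided Gating) but does not give implementation details. Purely as an illustrative, hypothetical sketch of how such stages might compose, not the thesis's actual architecture, the flow could look like the following, with all weight matrices as random stand-ins:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def guided_context_gating(features, rng=None):
    """Hypothetical sketch of a guided-gating attention step.

    features: (H*W, C) array of spatial feature vectors from an image encoder.
    The projection W_g below is a random stand-in; the thesis's real
    parameterization is not specified in this abstract.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    n, c = features.shape

    # 1) Context formulation: pool a global context vector over spatial positions.
    context = features.mean(axis=0)                              # (C,)

    # 2) Channel correlation: re-weight channels by their affinity with context.
    channel_weights = softmax(features.T @ features @ context / n)  # (C,)
    recalibrated = features * channel_weights                    # (H*W, C)

    # 3) Guided gating: a context-conditioned sigmoid gate selects local detail.
    W_g = rng.standard_normal((c, c)) * 0.01                     # stand-in weights
    gate = 1.0 / (1.0 + np.exp(-(recalibrated @ W_g @ context)))  # (H*W,)
    return recalibrated * gate[:, None]                          # gated features
```

The intent of such a design is that globally pooled context steers which channels and spatial locations (e.g. subtle lesion regions) survive the gate, rather than attending uniformly.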

Date
2025-04-01
Keywords
Retinal Image Analysis, Diabetic Retinopathy, Medical Report Generation, Vision Language Modeling, Guided Context Gating, Guided Context Self Attention
Citation
Teja Krishna Cherukuri (2025). "ABC-VLM: Attending & Bridging Context in Vision-Language Modeling for Medical Image Analysis." Thesis, Georgia State University. https://doi.org/10.57709/6vh1-eq58
Embargo Lift Date
2025-04-01