Large vision language model: enhanced-RSCLIP with exemplar-image prompting for uncommon object detection in satellite imagery
Name:
electronics-14-03071-v2.pdf
Size:
4.276Mb
Format:
PDF
Description:
final published version
Abstract
Large Vision Language Models (LVLMs) have shown promise in remote sensing applications, yet struggle with “uncommon” objects that lack sufficient public labeled data. This paper presents Enhanced-RSCLIP, a novel dual-prompt architecture that combines text prompting with exemplar-image processing for cattle herd detection in satellite imagery. Our approach introduces a key innovation where an exemplar-image preprocessing module using crop-based or attention-based algorithms extracts focused object features which are fed as a dual stream to a contrastive learning framework that fuses textual descriptions with visual exemplar embeddings. We evaluated our method on a custom dataset of 260 satellite images across UK and Nigerian regions. Enhanced-RSCLIP with crop-based exemplar processing achieved 72% accuracy in cattle detection and 56.2% overall accuracy on cross-domain transfer tasks, significantly outperforming text-only CLIP (31% overall accuracy). The dual-prompt architecture enables effective few-shot learning and cross-regional transfer from data-rich (UK) to data-sparse (Nigeria) environments, demonstrating a 41% improvement over baseline approaches for uncommon object detection in satellite imagery.
Citation
Efunogbon T, Efunogbon A, Liu E, Li D, Qiu R (2025) 'Large vision language model: enhanced-RSCLIP with exemplar-image prompting for uncommon object detection in satellite imagery', Electronics, 14 (15) 3071
Publisher
MDPI
Journal
Electronics
Additional Links
https://www.mdpi.com/2079-9292/14/15/3071
Type
Article
Language
en
ISSN
2079-9292
Sponsors
This research received no external funding.
DOI
10.3390/electronics14153071
The following license files are associated with this item:
- Creative Commons
Except where otherwise noted, this item's license is described as Green - can archive pre-print and post-print or publisher's version/PDF


