LHRS-Align

LHRS‑Align is a large‑scale, semantically rich and feature‑diverse remote‑sensing (RS) image‑text alignment dataset. It combines volunteered geographic information (VGI) from OpenStreetMap with RS images from Google Earth and contains 1.15 million high‑quality RS image‑text pairs.

Source

GitHub

Created

Feb 4, 2024

Updated

Jul 16, 2024

Overview

LHRS‑Bot Dataset Overview

Dataset Introduction

LHRS‑Bot is a multimodal large language model (MLLM) built on volunteered geographic information (VGI) and globally available remote‑sensing (RS) images. The model shows a strong understanding of RS images and can perform complex reasoning in the RS domain.

Dataset Release Information

15 July 2024: Updated paper available on arXiv.
9 July 2024: Evaluation benchmark LHRS‑Bench released.
2 July 2024: Paper accepted at ECCV 2024; training scripts and data open‑sourced.
7 Feb 2024: Model weights available on Google Drive and Baidu Disk.
2 Feb 2024: Code and checkpoint released.

Dataset Preparation

Installation

Clone the repository:

```
git clone git@github.com:NJU-LHRS/LHRS-Bot.git
cd LHRS-Bot
```

Create a virtual environment:

```
conda create -n lhrs python=3.10
conda activate lhrs
```

Install dependencies:
```
pip install -e .
```

Checkpoints

LLaMA2‑7B‑Chat:
- Downloaded automatically when a Hugging Face access token is provided.
- Can be downloaded manually from the provided links if needed.

LHRS‑Bot checkpoints (stages 1‑3) are hosted on Baidu Disk and Google Drive. Make sure the TextLoRA folder and FINAL.pt reside in the same directory.
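
The layout requirement above can be checked before launching anything. The following is a minimal sketch, not part of the LHRS‑Bot code base: the helper name `check_checkpoint_dir` is hypothetical, and it assumes only what the text states, namely that a folder called `TextLoRA` and a file called `FINAL.pt` should sit side by side.

```python
from pathlib import Path


def check_checkpoint_dir(ckpt_dir):
    """Return a list of layout problems for a checkpoint directory.

    Hypothetical helper: it only checks that the TextLoRA folder and
    FINAL.pt are both present directly inside ckpt_dir.
    """
    root = Path(ckpt_dir)
    problems = []
    if not (root / "TextLoRA").is_dir():
        problems.append("missing TextLoRA folder")
    if not (root / "FINAL.pt").is_file():
        problems.append("missing FINAL.pt")
    return problems


if __name__ == "__main__":
    # "checkpoints/stage3" is a placeholder path, not a location from the repository.
    for problem in check_checkpoint_dir("checkpoints/stage3"):
        print(problem)
```

An empty list means both items were found. The check does not inspect the contents of either item.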

Training

Data preparation and formatting follow the instructions in DataPrepare/README.md. Training stages 1‑3 use distinct scripts and directories.

Demonstration

Web UI (Gradio):

```
python lhrs_webui.py -c Config/multi_modal_eval.yaml \
    --checkpoint-path ${PathToCheckpoint}.pt \
    --server-port 8000 \
    --server-name 127.0.0.1 \
    --share
```

CLI:

```
python cli_qa.py -c Config/multi_modal_eval.yaml \
    --model-path ${PathToCheckpoint}.pt \
    --image-file ${TheImagePathYouWantToChat} \
    --accelerator "gpu" \
    --temperature 0.4 \
    --max-new-tokens 512
```
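
If the CLI call needs to be assembled from a script, for example to switch checkpoints or images without retyping the flags, the command can be built as an argument list. This is a rough sketch under stated assumptions: `build_cli_command` is a hypothetical helper, it uses only the flags shown in the command above, and the paths in the usage example are placeholders. Whether `cli_qa.py` runs non-interactively is not stated here, so the sketch only prints the command rather than launching it.

```python
import shlex
import sys


def build_cli_command(checkpoint, image,
                      config="Config/multi_modal_eval.yaml",
                      accelerator="gpu", temperature=0.4, max_new_tokens=512):
    """Assemble the cli_qa.py argument list from the flags shown above.

    Hypothetical helper; defaults mirror the values in the example command.
    """
    return [
        sys.executable, "cli_qa.py",
        "-c", config,
        "--model-path", str(checkpoint),
        "--image-file", str(image),
        "--accelerator", accelerator,
        "--temperature", str(temperature),
        "--max-new-tokens", str(max_new_tokens),
    ]


if __name__ == "__main__":
    # Placeholder paths; replace with a real checkpoint and image.
    cmd = build_cli_command("checkpoints/FINAL.pt", "demo.png")
    print(shlex.join(cmd))
    # To launch from the repository root, one option is:
    # subprocess.run(cmd, check=True)
```

Passing the command as a list avoids shell quoting problems when paths contain spaces.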

Acknowledgements

We thank the following repositories for their excellent work:

Citation

If you find our work useful, please star the GitHub repository and consider citing our paper:

@misc{2402.02544,
Author = {Dilxat Muhtar and Zhenshi Li and Feng Gu and Xueliang Zhang and Pengfeng Xiao},
Title = {LHRS-Bot: Empowering Remote Sensing with VGI-Enhanced Large Multimodal Language Model},
Year = {2024},
Eprint = {arXiv:2402.02544},
}

License: Apache
