LHRS-Align

LHRS‑Align is a large‑scale, semantically rich and feature‑diverse remote‑sensing (RS) image‑text alignment dataset. It pairs volunteered geographic information (VGI) from OpenStreetMap with remote‑sensing images from Google Earth and contains 1.15 million high‑quality RS image‑text pairs.

Updated 7/16/2024

Description

LHRS‑Bot Dataset Overview

Dataset Introduction

LHRS‑Bot is a multimodal large language model (MLLM) built on globally available volunteered geographic information (VGI) and remote‑sensing (RS) images. The model demonstrates a deep understanding of RS images and can perform complex reasoning in the RS domain.

Dataset Release Information

  • 15 July 2024: Updated paper available on arXiv.
  • 9 July 2024: Evaluation benchmark LHRS‑Bench released.
  • 2 July 2024: Paper accepted at ECCV 2024; training scripts and data open‑sourced.
  • 7 Feb 2024: Model weights available on Google Drive and Baidu Disk.
  • 2 Feb 2024: Code and checkpoint released.

Dataset Preparation

Installation

  1. Clone the repository:
    git clone git@github.com:NJU-LHRS/LHRS-Bot.git
    cd LHRS-Bot
    
  2. Create a virtual environment:
    conda create -n lhrs python=3.10
    conda activate lhrs
    
  3. Install dependencies:
    pip install -e .
    

Checkpoints

  • LLaMA2‑7B‑Chat:
    • Automatic download via Hugging Face token.
    • Manual download from the provided links if needed.
  • LHRS‑Bot checkpoints (stages 1‑3) are hosted on Baidu Disk and Google Drive; ensure the TextLoRA folder and FINAL.pt reside in the same directory.
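
The layout requirement above (the TextLoRA folder and FINAL.pt side by side) can be checked before launching anything. The following is a minimal sketch, not part of the LHRS‑Bot repository: the function name `check_checkpoint_layout` is hypothetical, and it only tests that the two names mentioned above exist in one directory, not that their contents are valid.

```python
from pathlib import Path


def check_checkpoint_layout(checkpoint_dir):
    """Return a list of layout problems (an empty list means the layout looks right).

    Hypothetical helper: it only checks that FINAL.pt and the TextLoRA
    folder sit in the same directory, as the instructions above require.
    """
    root = Path(checkpoint_dir)
    problems = []
    if not (root / "FINAL.pt").is_file():
        problems.append("missing file: FINAL.pt")
    if not (root / "TextLoRA").is_dir():
        problems.append("missing folder: TextLoRA")
    return problems
```

Calling `check_checkpoint_layout("path/to/stage3")` (the path is a placeholder) returns an empty list when both entries are present, so the result can be used directly in an `if` test before starting the demo.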

Training

Data preparation and formatting follow the instructions in DataPrepare/README.md. Training stages 1‑3 use distinct scripts and directories.

Demonstration

  • Web UI (Gradio):
    python lhrs_webui.py -c Config/multi_modal_eval.yaml \
        --checkpoint-path ${PathToCheckpoint}.pt \
        --server-port 8000 \
        --server-name 127.0.0.1 \
        --share
    
  • CLI:
    python cli_qa.py -c Config/multi_modal_eval.yaml \
        --model-path ${PathToCheckpoint}.pt \
        --image-file ${TheImagePathYouWantToChat} \
        --accelerator "gpu" \
        --temperature 0.4 \
        --max-new-tokens 512
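
When the CLI is launched from another Python script, it can be convenient to assemble the same command as an argument list. The sketch below assumes only the flags shown in the command above; the helper name `build_cli_command` and the example paths are hypothetical, and the defaults simply mirror the values used above.

```python
import shlex


def build_cli_command(checkpoint_path, image_file, temperature=0.4, max_new_tokens=512):
    """Build the cli_qa.py invocation shown above as an argument list."""
    return [
        "python", "cli_qa.py",
        "-c", "Config/multi_modal_eval.yaml",
        "--model-path", str(checkpoint_path),
        "--image-file", str(image_file),
        "--accelerator", "gpu",
        "--temperature", str(temperature),
        "--max-new-tokens", str(max_new_tokens),
    ]


# Placeholder paths, for illustration only.
cmd = build_cli_command("checkpoints/FINAL.pt", "images/example.png")
print(shlex.join(cmd))  # shell-quoted form of the command, for inspection
```

The list can be passed to `subprocess.run(cmd)` from the repository root; keeping the arguments as a list avoids shell-quoting problems with paths that contain spaces.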
    

Acknowledgements

We thank the related open‑source repositories for their excellent work.

Citation

If you find our work useful, please star the GitHub repository and consider citing our paper:

@misc{2402.02544,
Author = {Dilxat Muhtar and Zhenshi Li and Feng Gu and Xueliang Zhang and Pengfeng Xiao},
Title = {LHRS‑Bot: Empowering Remote Sensing with VGI‑Enhanced Large Multimodal Language Model},
Year = {2024},
Eprint = {arXiv:2402.02544},
}

License: Apache


Topics

Remote Sensing Technology
Image‑Text Alignment

Source

Organization: github

Created: 2/4/2024
