VT-MOT

The VT‑MOT dataset was created by the Key Laboratory of Intelligent Computing and Signal Processing, Ministry of Education, at Anhui University. It is a large‑scale visible‑light and thermal‑infrared video benchmark for multi‑object tracking, containing 582 video pairs (401 k frame pairs) captured from UAVs, surveillance cameras, and handheld devices, with precise spatio‑temporal alignment and 3.99 million high‑quality bounding boxes. The dataset was produced through meticulous frame‑by‑frame alignment and double‑checked annotation, ensuring high quality and density. VT‑MOT is intended for multi‑object tracking in challenging environments, leveraging the complementary strengths of visible and thermal modalities.

Updated 8/2/2024
arXiv

Description

PFTrack Dataset Overview

Dataset Introduction

VT‑MOT is a large‑scale visible‑light and thermal‑infrared video dataset for multi‑object tracking, released together with the PFTrack tracking framework. Its main characteristics are:

  1. Scale and Diversity: 582 video pairs, 401 k frame pairs, captured from surveillance, UAV, and handheld platforms.
  2. High‑Precision Cross‑Modal Alignment: Frame‑level spatial and temporal alignment performed by professionals.
  3. Dense High‑Quality Annotations: 3.99 million annotated boxes, manually verified, covering occlusions and re‑identification challenges.
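
To make the scale figures above concrete, the following minimal sketch derives two rough averages from the totals quoted in the list. The variable names are illustrative, and the inputs are the rounded values given here (401 k, 3.99 million), so the results are approximations rather than official statistics. The source does not say whether boxes are counted per modality or per frame pair; the sketch assumes per frame pair.

```python
# Rounded totals as quoted in the dataset description.
video_pairs = 582
frame_pairs = 401_000   # "401 k frame pairs"
boxes = 3_990_000       # "3.99 million annotated boxes"

# Average sequence length and annotation density (approximate).
frames_per_video = frame_pairs / video_pairs
boxes_per_frame_pair = boxes / frame_pairs

print(f"{frames_per_video:.0f} frame pairs per video on average")
print(f"{boxes_per_frame_pair:.2f} boxes per frame pair on average")
```

With these inputs the averages come out at roughly 689 frame pairs per video and about 9.95 boxes per frame pair.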

Contributions

  • Constructed VT‑MOT, a large‑scale visible‑light and thermal‑infrared multi‑object tracking dataset for all‑weather, around‑the‑clock research.
  • Performed manual spatio‑temporal alignment for all video sequences, providing high‑quality aligned data and dense annotations.
  • Proposed a simple yet effective progressive fusion tracking framework that efficiently fuses temporal and complementary information from both modalities.

Dataset Structure

The dataset is organized as follows:

${PFTrack_ROOT}
|-- data
|   `-- VTMOT
|       |-- train
|       |   |-- video1
|       |   |   |-- visible
|       |   |   |   |-- 0000001.jpg
|       |   |   |   |-- 0000002.jpg
|       |   |   |   `-- ...
|       |   |   |-- infrared
|       |   |   |   |-- 0000001.jpg
|       |   |   |   |-- 0000002.jpg
|       |   |   |   `-- ...
|       |   |   |-- gt
|       |   |   |   `-- gt.txt
|       |   |   `-- seqinfo
|       |   |-- video2
|       |   `-- ...
|       |-- test
|       |   `-- ...
|       `-- annotations
|           |-- train.json
|           `-- test.json
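
As an illustration of how this layout might be consumed, the sketch below pairs visible and infrared frames of one sequence by file name and parses a single ground‑truth line. The helper names `list_frame_pairs` and `parse_gt_line` are hypothetical and not part of the PFTrack code. The field order assumed for gt.txt (frame, track id, left, top, width, height, then further fields) follows the common MOTChallenge convention; the source does not document the format, so check it against the actual files before relying on it.

```python
from pathlib import Path


def list_frame_pairs(seq_dir):
    """Return (visible, infrared) path pairs that share a file name.

    Assumes the sequence directory contains 'visible' and 'infrared'
    subfolders with identically named .jpg frames, as in the tree above.
    """
    seq_dir = Path(seq_dir)
    visible = {p.name: p for p in (seq_dir / "visible").glob("*.jpg")}
    infrared = {p.name: p for p in (seq_dir / "infrared").glob("*.jpg")}
    # Keep only frames present in both modalities, in frame order.
    shared = sorted(visible.keys() & infrared.keys())
    return [(visible[name], infrared[name]) for name in shared]


def parse_gt_line(line):
    """Parse one comma-separated annotation line.

    ASSUMPTION: MOTChallenge-style columns
    frame, id, left, top, width, height, ... (extra fields are ignored).
    """
    fields = line.strip().split(",")
    frame, track_id = int(fields[0]), int(fields[1])
    left, top, width, height = (float(v) for v in fields[2:6])
    return {"frame": frame, "id": track_id, "box_ltwh": (left, top, width, height)}
```

A call such as `list_frame_pairs("data/VTMOT/train/video1")` would then yield the aligned frame pairs for that sequence. Frames missing from either modality are skipped rather than raising an error.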

Usage

Training

python -u main.py tracking --modal RGB-T --save_all --exp_id VTMOT_PFTrack \
    --dataset mot_rgbt --dataset_version mot_rgbt \
    --load_model "./exp/tracking/VTMOT_RGBT/***.pth" \
    --batch_size 12 --pre_hm --ltrb_amodal --same_aug \
    --hm_disturb 0.05 --lost_disturb 0.4 --fp_disturb 0.1 \
    --gpus 0

Testing

python test_rgbt.py tracking --modal RGB-T --test_mot_rgbt True \
    --exp_id VTMOT_PFTrack --dataset mot_rgbt --dataset_version mot_rgbt \
    --pre_hm --ltrb_amodal --track_thresh 0.4 --pre_thresh 0.5 \
    --load_model ./exp/tracking/VTMOT_RGBT/model.pth

Evaluation

cd trackeval
python run_mot_challenge.py

Topics

Multi‑Object Tracking
Video Analysis

Source

Organization: arXiv

Created: 8/2/2024
