Dataset asset · Open Source Community · Video Analysis · Multi-Object Tracking

VT-MOT

The VT‑MOT dataset was created by the Key Laboratory of Intelligent Computing and Signal Processing, Ministry of Education, at Anhui University. It is a large‑scale visible‑light and thermal‑infrared video benchmark for multi‑object tracking, containing 582 video pairs (401 k frame pairs) captured from UAVs, surveillance cameras, and handheld devices, with precise spatio‑temporal alignment and 3.99 million high‑quality bounding boxes. The dataset was produced through meticulous frame‑by‑frame alignment and double‑checked annotation, ensuring high quality and density. VT‑MOT is intended for multi‑object tracking in challenging environments, leveraging the complementary strengths of visible and thermal modalities.

Source
arXiv
Created
Aug 2, 2024
Updated
Aug 2, 2024
Overview

Dataset description and usage context

PFTrack Dataset Overview

Dataset Introduction

VT-MOT is a large-scale visible-light and thermal-infrared multi-object tracking video dataset; PFTrack is the progressive fusion tracking baseline released alongside it. The dataset's main characteristics are:

  1. Scale and Diversity: 582 video pairs, 401 k frame pairs, captured from surveillance, UAV, and handheld platforms.
  2. High‑Precision Cross‑Modal Alignment: Frame‑level spatial and temporal alignment performed by professionals.
  3. Dense High‑Quality Annotations: 3.99 million annotated boxes, manually verified, covering occlusions and re‑identification challenges.

Contributions

  • Constructed VT-MOT, a large-scale visible-light and thermal-infrared multi-object tracking dataset for all-weather, day-and-night research.
  • Performed manual spatio‑temporal alignment for all video sequences, providing high‑quality aligned data and dense annotations.
  • Proposed a simple yet effective progressive fusion tracking framework that efficiently fuses temporal and complementary information from both modalities.
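The progressive fusion framework itself is not described on this page, but the underlying idea of combining complementary modality features can be illustrated with a minimal sketch. Note this is a toy convex combination in plain NumPy, not PFTrack's actual learned fusion mechanism; the function name and fixed weight are illustrative assumptions:

```python
import numpy as np

def fuse_modalities(feat_rgb, feat_ir, alpha=0.5):
    """Toy complementary fusion: a convex combination of per-modality
    feature maps. Real progressive fusion is learned end to end; the
    fixed weight alpha here is purely illustrative."""
    assert feat_rgb.shape == feat_ir.shape
    return alpha * feat_rgb + (1.0 - alpha) * feat_ir

# Example: two 4x4 single-channel "feature maps"
rgb = np.ones((4, 4))
ir = np.zeros((4, 4))
fused = fuse_modalities(rgb, ir, alpha=0.7)
print(fused[0, 0])  # 0.7
```

In practice, the weighting would be predicted per location from the inputs (e.g. thermal dominating at night, visible dominating in daylight), which is the complementarity the dataset is designed to exercise.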

Dataset Structure

The dataset is organized as follows:

${PFTrack_ROOT}
|-- data
|   `-- VTMOT
|       |-- train
|       |   |-- video1
|       |   |   |-- visible
|       |   |   |   |-- 0000001.jpg
|       |   |   |   |-- 0000002.jpg
|       |   |   |   ...
|       |   |   |-- infrared
|       |   |   |   |-- 0000001.jpg
|       |   |   |   |-- 0000002.jpg
|       |   |   |   ...
|       |   |   |-- gt
|       |   |   |   `-- gt.txt
|       |   |   `-- seqinfo
|       |   |-- video2
|       |   ...
|       |-- test
|       |   ...
|       `-- annotations
|           |-- train.json
|           `-- test.json

Usage

Training

python -u main.py tracking --modal RGB-T --save_all --exp_id VTMOT_PFTrack \
    --dataset mot_rgbt --dataset_version mot_rgbt \
    --load_model "./exp/tracking/VTMOT_RGBT/***.pth" \
    --batch_size 12 --pre_hm --ltrb_amodal --same_aug \
    --hm_disturb 0.05 --lost_disturb 0.4 --fp_disturb 0.1 \
    --gpus 0

Testing

python test_rgbt.py tracking --modal RGB-T --test_mot_rgbt True \
    --exp_id VTMOT_PFTrack --dataset mot_rgbt --dataset_version mot_rgbt \
    --pre_hm --ltrb_amodal --track_thresh 0.4 --pre_thresh 0.5 \
    --load_model ./exp/tracking/VTMOT_RGBT/model.pth

Evaluation

cd trackeval
python run_mot_challenge.py
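TrackEval reports the CLEAR metrics among others. As a quick sanity check on its output, MOTA can be computed by hand from the error counts using the standard CLEAR-MOT definition (this is the general formula, not anything specific to this repo):

```python
def mota(num_fn, num_fp, num_idsw, num_gt):
    """CLEAR-MOT accuracy: 1 - (FN + FP + IDSW) / GT, where GT is the
    total number of ground-truth boxes across all frames."""
    return 1.0 - (num_fn + num_fp + num_idsw) / num_gt

# Example: 100 GT boxes, 5 misses, 3 false positives, 2 ID switches
print(mota(5, 3, 2, 100))  # 0.9
```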