VidSTG
VidSTG is a spatio-temporal video grounding dataset built on the video relation dataset VidOR. It is designed to handle multi-form sentences, i.e., both declarative and interrogative sentences. It includes video partition files and sentence annotation files that record each video's ID, frame count, frame rate, and dimensions, together with object, relation, and temporal ground-truth annotations.
Source: GitHub
Created: Mar 24, 2020
Updated: Apr 22, 2024
Dataset Overview
Source
- VidSTG: Constructed from the video relation dataset VidOR.
Composition
- VidOR (original): 7,000 training, 835 validation, and 2,165 test videos. Test annotations are unavailable, so the test split is omitted.
- VidSTG: 10% of the VidOR training videos are held out as the validation set, and the original VidOR validation set serves as the test set.
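The split scheme above can be sketched in Python. This is a minimal illustration, assuming random sampling with a fixed seed; the helper name `make_vidstg_splits`, the seed, and the placeholder IDs are hypothetical, and the released partition files, not this code, define the actual VidSTG splits.

```python
import random


def make_vidstg_splits(vidor_train_ids, vidor_val_ids, val_fraction=0.1, seed=0):
    """Illustrate the VidSTG split scheme (hypothetical helper).

    - `val_fraction` of the VidOR training videos become VidSTG validation videos;
    - the remaining VidOR training videos stay as VidSTG training videos;
    - the VidOR validation videos become the VidSTG test set.
    The sampling method and seed are assumptions, not the official procedure.
    """
    ids = sorted(vidor_train_ids)          # sort so the result is reproducible
    rng = random.Random(seed)              # local RNG, leaves global state untouched
    n_val = round(len(ids) * val_fraction)
    val = set(rng.sample(ids, n_val))      # sample without replacement
    train = [v for v in ids if v not in val]
    return {
        "train": train,
        "val": sorted(val),
        "test": sorted(vidor_val_ids),
    }


# Placeholder IDs sized like the VidOR training and validation splits.
splits = make_vidstg_splits(
    [f"train_{i:04d}" for i in range(7000)],
    [f"val_{i:03d}" for i in range(835)],
)
print({name: len(ids) for name, ids in splits.items()})
# {'train': 6300, 'val': 700, 'test': 835}
```

The printed counts follow only from the placeholder inputs (10% of 7,000 is 700) and should not be read as the official VidSTG split sizes.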
Contents
- Video Partition Files: train_files.json, val_files.json, and test_files.json, containing the video IDs for each split.
- Sentence Annotation Files: train_annotations.json, val_annotations.json, and test_annotations.json.
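A minimal loader for these six files might look like the following. It assumes, hypothetically, that all six files sit in one directory and are plain JSON; `load_vidstg` and the `video_ids`/`annotations` keys of the returned dictionary are illustrative names, and the internal layout of each file is left unchecked.

```python
import json
from pathlib import Path

SPLITS = ("train", "val", "test")


def load_vidstg(root):
    """Load the VidSTG partition and annotation files for every split.

    ASSUMPTION: the files live directly under `root` with the names listed
    in the Contents section; their JSON contents are returned as-is.
    """
    root = Path(root)
    data = {}
    for split in SPLITS:
        # Partition file: the video IDs belonging to this split.
        with open(root / f"{split}_files.json", encoding="utf-8") as f:
            video_ids = json.load(f)
        # Sentence annotation file for the same split.
        with open(root / f"{split}_annotations.json", encoding="utf-8") as f:
            annotations = json.load(f)
        data[split] = {"video_ids": video_ids, "annotations": annotations}
    return data
```

For example, `load_vidstg("path/to/vidstg")["val"]["video_ids"]` would return whatever the validation partition file contains, where the path is a placeholder.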
Annotation Structure
- Video ID: Unique identifier.
- Frame Count and Frame Rate: Number of frames and frames per second.
- Resolution: Width and height.
- Subject/Object List: IDs and categories.
- Temporal Segment: The frame range of the video used for annotation.
- Relations: Subject ID, object ID, predicate, and frame range.
- Temporal Ground‑Truth: Time span of each relation.
- Caption: Declarative sentence describing the video content.
- Question: Interrogative sentence about the video.
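The annotation fields listed above can be mapped onto a typed record. The sketch below assumes JSON key names (`vid`, `frame_count`, `fps`, `width`, `height`, `subject/objects`, `tid`, `category`, `used_segment`, `relations`, `subject_tid`, `object_tid`, `predicate`, `temporal_gt`, `begin_fid`, `end_fid`, `captions`, `questions`, `description`) that are modeled on the field list but not confirmed here; check them against the actual files before use. It also shows how the frame rate converts a frame range into seconds.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class FrameRange:
    begin_fid: int  # first frame index
    end_fid: int    # last frame index (inclusive/exclusive convention not verified)

    def to_seconds(self, fps: float) -> Tuple[float, float]:
        # Convert frame indices to timestamps using the video's frame rate.
        return self.begin_fid / fps, self.end_fid / fps


@dataclass
class VidSTGEntry:
    vid: str
    frame_count: int
    fps: float
    width: int
    height: int
    objects: Dict[int, str]                            # object ID -> category
    used_segment: FrameRange
    relations: List[Tuple[int, int, str, FrameRange]]  # (subject ID, object ID, predicate, span)
    temporal_gt: FrameRange
    captions: List[str]
    questions: List[str]


def _frame_range(d: dict) -> FrameRange:
    return FrameRange(int(d["begin_fid"]), int(d["end_fid"]))


def parse_entry(rec: dict) -> VidSTGEntry:
    """Parse one annotation record. ASSUMPTION: every key name here is hypothetical."""
    return VidSTGEntry(
        vid=str(rec["vid"]),
        frame_count=int(rec["frame_count"]),
        fps=float(rec["fps"]),
        width=int(rec["width"]),
        height=int(rec["height"]),
        objects={o["tid"]: o["category"] for o in rec["subject/objects"]},
        used_segment=_frame_range(rec["used_segment"]),
        relations=[
            (r["subject_tid"], r["object_tid"], r["predicate"], _frame_range(r))
            for r in rec["relations"]
        ],
        temporal_gt=_frame_range(rec["temporal_gt"]),
        captions=[c["description"] for c in rec.get("captions", [])],
        questions=[q["description"] for q in rec.get("questions", [])],
    )


# Synthetic record for illustration only; it is not taken from VidSTG.
sample = {
    "vid": "0000000000",
    "frame_count": 300, "fps": 30.0, "width": 640, "height": 480,
    "subject/objects": [{"tid": 0, "category": "adult"}, {"tid": 1, "category": "bicycle"}],
    "used_segment": {"begin_fid": 0, "end_fid": 299},
    "relations": [{"subject_tid": 0, "object_tid": 1, "predicate": "ride",
                   "begin_fid": 30, "end_fid": 90}],
    "temporal_gt": {"begin_fid": 30, "end_fid": 90},
    "captions": [{"description": "An adult rides a bicycle."}],
    "questions": [{"description": "Who rides the bicycle?"}],
}
entry = parse_entry(sample)
print(entry.temporal_gt.to_seconds(entry.fps))  # (1.0, 3.0)
```

Keeping frame ranges as integers and converting to seconds only on demand avoids accumulating floating-point error when the frame rate is not a whole number.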
Citation
If you use this dataset, please cite:
- VidSTG paper: Zhang, Zhu, et al. "Where Does It Exist: Spatio-Temporal Video Grounding for Multi-Form Sentences." Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
- VidOR paper: Shang, Xindi, et al. "Annotating Objects and Relations in User-Generated Videos." International Conference on Multimedia Retrieval (ICMR), 2019.