Dataset asset: Open Source Community, Video Analysis, Spatio‑Temporal Localization

VidSTG

The VidSTG dataset is built on the video relation dataset VidOR and targets spatio‑temporal video grounding with multi‑form sentences, that is, both descriptive captions and questions. It ships video partition files and sentence annotation files. The annotations record each video's ID, frame count, frame rate, and dimensions, together with object, relation, and temporal ground‑truth annotations.

Source
github
Created
Mar 24, 2020
Updated
Apr 22, 2024
Dataset Overview

Source

  • VidSTG: Constructed from the video relation dataset VidOR.

Composition

  • Original VidOR: 7,000 training videos, 835 validation videos, and 2,165 test videos. The VidOR test annotations are unavailable, so the test videos are omitted.
  • VidSTG: 10 % of the VidOR training videos are held out as validation data, and the original VidOR validation set serves as the test set.

Contents

  • Video Partition Files: train_files.json, val_files.json, test_files.json containing video IDs for each split.
  • Sentence Annotation Files: train_annotations.json, val_annotations.json, test_annotations.json.
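The partition files can be read with a few lines of Python. The following is a minimal sketch, not the dataset authors' loader: it assumes each `*_files.json` is a JSON array of video ID strings and that all files sit in one directory. The helper names `load_video_ids`, `load_annotations`, and `splits_are_disjoint` are hypothetical.

```python
import json
from pathlib import Path

SPLITS = ("train", "val", "test")


def _read_json(root, name):
    """Load one JSON file from the dataset directory."""
    with open(Path(root) / name, encoding="utf-8") as f:
        return json.load(f)


def load_video_ids(root, split):
    """Return the video IDs listed in <split>_files.json.

    Assumption: the file holds a JSON array of video IDs.
    """
    if split not in SPLITS:
        raise ValueError(f"unknown split: {split!r}")
    return [str(v) for v in _read_json(root, f"{split}_files.json")]


def load_annotations(root, split):
    """Return the parsed content of <split>_annotations.json as-is."""
    if split not in SPLITS:
        raise ValueError(f"unknown split: {split!r}")
    return _read_json(root, f"{split}_annotations.json")


def splits_are_disjoint(root):
    """Check that no video ID appears in more than one split."""
    seen = set()
    for split in SPLITS:
        ids = set(load_video_ids(root, split))
        if seen & ids:
            return False
        seen |= ids
    return True
```

Running `splits_are_disjoint` once after download is a cheap sanity check before training, since a video shared between splits would leak evaluation data.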

Annotation Structure

  • Video ID: Unique identifier.
  • Frame Count: Number of frames.
  • Frame Rate: Frames per second.
  • Resolution: Width and height in pixels.
  • Subject/Object List: IDs and categories.
  • Temporal Segment: Frame range used.
  • Relations: Subject ID, object ID, predicate, and frame range.
  • Temporal Ground‑Truth: Time span of each relation.
  • Caption: Descriptive sentence.
  • Question: Query sentence about the video.
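The fields above can be modelled as small typed records. The sketch below is illustrative only: the class and attribute names (`VideoMeta`, `Relation`, `begin_frame`, and so on) are hypothetical and do not reproduce the JSON keys in the released files. It also assumes that frame ranges are counted from frame 0. It shows how a relation's frame range converts to seconds once the video's frame rate is known.

```python
from dataclasses import dataclass


@dataclass
class VideoMeta:
    """Per-video metadata (hypothetical field names)."""
    video_id: str
    frame_count: int
    fps: float
    width: int
    height: int


@dataclass
class Relation:
    """One subject-predicate-object relation over a frame range."""
    subject_id: int
    object_id: int
    predicate: str
    begin_frame: int
    end_frame: int


def frames_to_seconds(begin_frame, end_frame, fps):
    """Convert a frame range to a (start, end) time span in seconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    if begin_frame > end_frame:
        raise ValueError("begin_frame must not exceed end_frame")
    return begin_frame / fps, end_frame / fps


def relation_time_span(relation, meta):
    """Time span of a relation, using the frame rate of its video."""
    return frames_to_seconds(relation.begin_frame, relation.end_frame, meta.fps)
```

For example, a relation covering frames 30 to 90 in a 30 fps video spans seconds 1.0 to 3.0.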

Citation

If you use this dataset, please cite:

  • VidSTG paper: Zhang, Zhu et al. "Where Does It Exist: Spatio‑Temporal Video Grounding for Multi‑Form Sentences". CVPR, 2020.
  • VidOR paper: Shang, Xindi et al. "Annotating Objects and Relations in User‑Generated Videos". International Conference on Multimedia Retrieval, 2019.