VidSTG
Video Analysis · Spatio‑Temporal Localization
VidSTG is a dataset for spatio‑temporal video grounding, built on the video relation dataset VidOR and designed in particular to handle multi‑form sentences. It provides video partition files and sentence annotation files. The annotations record each video's ID, frame count, frame rate, and dimensions, along with object, relation, and temporal ground‑truth annotations.
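As a rough illustration of how the fields listed above might fit together, the sketch below models one annotation record and converts its temporal ground truth from frame indices to seconds. All class names, field names, and sample values here are hypothetical and chosen for readability; they are not the keys used in the actual VidSTG files, so check the dataset's own documentation before relying on them.

```python
from dataclasses import dataclass


@dataclass
class TemporalGroundTruth:
    # Hypothetical field names: first and last frame of the grounded segment.
    begin_frame: int
    end_frame: int


@dataclass
class VideoAnnotation:
    # Hypothetical record layout mirroring the fields named in the description.
    video_id: str
    frame_count: int
    fps: float
    width: int
    height: int
    subject: str
    relation: str
    target: str
    temporal_gt: TemporalGroundTruth


def segment_seconds(ann: VideoAnnotation) -> tuple[float, float]:
    """Convert the temporal ground truth from frame indices to seconds."""
    gt = ann.temporal_gt
    # Reject segments that fall outside the video or are reversed.
    if not (0 <= gt.begin_frame <= gt.end_frame < ann.frame_count):
        raise ValueError("temporal ground truth lies outside the video")
    return gt.begin_frame / ann.fps, gt.end_frame / ann.fps


# Made-up sample record, not taken from the dataset.
example = VideoAnnotation(
    video_id="example_0001",
    frame_count=900,
    fps=30.0,
    width=640,
    height=360,
    subject="child",
    relation="hold",
    target="toy",
    temporal_gt=TemporalGroundTruth(begin_frame=150, end_frame=450),
)

start, end = segment_seconds(example)
print(f"{example.video_id}: {start:.1f}s to {end:.1f}s")
```

Keeping the frame rate next to the frame indices makes the conversion to seconds a single division, which is convenient when comparing predicted and ground-truth segments.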
Source: GitHub · Updated Apr 22, 2024 · 339 views · Linked