Tags: Open Source Community, Education Research, News Video

NewsVideoDataset

The dataset contains 2,883 news videos with a total duration of 151,474 seconds, 13,431 tags, 3,302 sentences, and 9,179 unique tokens. It is intended for educational and research purposes, and the videos are sourced from the AFP news agency's YouTube channel.

Source: GitHub
Created: Aug 26, 2018
Updated: Nov 14, 2023

Dataset Overview

Dataset Name

  • Name: NewsVideoDataset

Dataset Statistics

  • Total Videos: 2,883
  • Total Duration: 151,474 seconds
  • Number of Tags: 13,431
  • Number of Sentences: 3,302
  • Number of Unique Tokens: 9,179
  • Average Duration per Video: 52.5 seconds
  • Average Number of Tags per Video: 4.7
  • Average Number of Sentences per Video: 1.1
  • Unique Tokens per Video (9,179 ÷ 2,883, i.e. corpus vocabulary size divided by video count): 3.2
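The per-video averages are simply each corpus-level total divided by the 2,883 videos. A minimal sketch that reproduces them from the totals listed above (the variable and function names are illustrative, not part of the dataset release):

```python
# Corpus-level totals as listed in the dataset statistics.
TOTALS = {
    "duration_s": 151_474,
    "tags": 13_431,
    "sentences": 3_302,
    "unique_tokens": 9_179,
}
NUM_VIDEOS = 2_883


def per_video_averages(totals: dict[str, int], num_videos: int) -> dict[str, float]:
    """Divide each corpus-level total by the number of videos."""
    return {name: value / num_videos for name, value in totals.items()}


if __name__ == "__main__":
    for name, avg in per_video_averages(TOTALS, NUM_VIDEOS).items():
        print(f"{name}: {avg:.2f}")
```

Printed to two decimals, this gives roughly 52.54 seconds, 4.66 tags, 1.15 sentences, and 3.18 unique tokens per video. Note that the last figure divides the corpus vocabulary by the video count, so it does not measure how many distinct tokens an individual video's description contains.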

Dataset Acquisition Method

  • Video Download: Use youtube-dl to batch-download the videos and their metadata from the URLs listed in urls.txt.
  • Metadata Processing: Run the pack_data.py Python script to clean and pack the metadata.
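A minimal sketch of the two acquisition steps, assuming youtube-dl is installed and urls.txt lists one YouTube URL per line. The `--batch-file`, `--ignore-errors`, `--write-info-json`, and `--output` flags are standard youtube-dl options. The `videos/` directory, the `packed_metadata.json` file name, and the `pack_info_files` helper are hypothetical: the actual cleaning logic of pack_data.py is not described here, so the packing step below only illustrates the general idea of merging youtube-dl's per-video `.info.json` files.

```python
import json
import shutil
import subprocess
import sys
from pathlib import Path


def build_download_command(url_file: str, out_dir: str) -> list[str]:
    """Assemble a youtube-dl call that downloads every URL listed in url_file."""
    return [
        "youtube-dl",
        "--batch-file", url_file,   # one video URL per line
        "--ignore-errors",          # keep going if some videos are no longer available
        "--write-info-json",        # save each video's metadata as <id>.info.json
        "--output", f"{out_dir}/%(id)s.%(ext)s",  # name files by YouTube video ID
    ]


def pack_info_files(video_dir: str, out_file: str) -> int:
    """Hypothetical stand-in for pack_data.py: merge *.info.json into one JSON list.

    The fields kept and the whitespace cleanup are illustrative assumptions,
    not the behavior of the real script.
    """
    records = []
    for path in sorted(Path(video_dir).glob("*.info.json")):
        info = json.loads(path.read_text(encoding="utf-8"))
        records.append({
            "id": info.get("id"),
            "title": info.get("title"),
            "duration": info.get("duration"),  # seconds
            "tags": info.get("tags") or [],
            "description": (info.get("description") or "").strip(),
        })
    Path(out_file).write_text(
        json.dumps(records, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return len(records)


def main() -> None:
    # Only run the real download when the tool and the URL list are present.
    if shutil.which("youtube-dl") is None or not Path("urls.txt").exists():
        print("need youtube-dl on PATH and urls.txt in the working directory", file=sys.stderr)
        return
    subprocess.run(build_download_command("urls.txt", "videos"), check=True)
    count = pack_info_files("videos", "packed_metadata.json")
    print(f"packed {count} videos into packed_metadata.json")


if __name__ == "__main__":
    main()
```

Because the URLs date from 2018, some videos may have been removed from YouTube; `--ignore-errors` lets the batch continue past them, so the packed count can be lower than 2,883.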

Citation Information

  • Citation Format:

@inproceedings{whitehead2018KaVD,
  author    = {Whitehead, Spencer and Ji, Heng and Bansal, Mohit and Chang, Shih-Fu and Voss, Clare R.},
  title     = {Incorporating Background Knowledge into Video Description Generation},
  booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  year      = {2018},
  month     = {November},
  publisher = {Association for Computational Linguistics},
  location  = {Brussels, Belgium}
}
