NewsVideoDataset

The dataset contains 2,883 news videos with a total duration of 151,474 seconds (about 42 hours), 13,431 labels (tags), 3,302 sentences, and 9,179 unique tokens. It is intended for educational and research use. The videos come from the AFP news agency's YouTube channel.

Updated 11/14/2023

Description

Dataset Overview

Dataset Name

  • Name: NewsVideoDataset

Dataset Statistics

  • Total Videos: 2,883
  • Total Duration: 151,474 seconds
  • Number of Labels: 13,431
  • Number of Sentences: 3,302
  • Number of Unique Tokens: 9,179
  • Average Duration per Video: 52.5 seconds
  • Average Number of Labels per Video: 4.7
  • Average Number of Sentences per Video: 1.2
  • Average Number of Unique Tokens per Video: 3.2
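The per-video averages can be rechecked from the totals listed above. The following is a minimal sketch of that arithmetic, not part of the dataset's tooling; the dictionary keys and the function name are illustrative assumptions. Recomputing from the listed totals gives 1.1 sentences per video rather than the listed 1.2, which may reflect a different counting or rounding convention in the source.

```python
# Totals copied from the statistics list above.
NUM_VIDEOS = 2_883
TOTALS = {
    "duration_seconds": 151_474,
    "labels": 13_431,
    "sentences": 3_302,
    "unique_tokens": 9_179,
}


def per_video_averages(totals, num_videos):
    """Divide each total by the number of videos, rounded to one decimal."""
    return {name: round(value / num_videos, 1) for name, value in totals.items()}


averages = per_video_averages(TOTALS, NUM_VIDEOS)
for name, value in averages.items():
    print(f"{name}: {value}")
# duration_seconds: 52.5
# labels: 4.7
# sentences: 1.1
# unique_tokens: 3.2
```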

Dataset Acquisition Method

  • Video Download: Run youtube-dl with urls.txt as its batch file to download the videos and their metadata.
  • Metadata Processing: Run the pack_data.py script (python pack_data.py) to clean and pack the metadata.
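The download step might be scripted roughly as follows. This is a sketch under stated assumptions, not the repository's own procedure: the output directory name, the output filename template, and the choice of flags are illustrative, and the helper function names are hypothetical. The youtube-dl options used (--batch-file, --write-info-json, --ignore-errors, --output) are standard youtube-dl flags. The arguments accepted by pack_data.py are not described here, so that step is left out.

```python
import shutil
import subprocess
from pathlib import Path


def build_download_command(url_file="urls.txt", out_dir="videos"):
    """Build a youtube-dl command that reads video URLs from a batch file.

    The out_dir default and the filename template are assumptions.
    """
    return [
        "youtube-dl",
        "--batch-file", str(url_file),  # text file with one URL per line
        "--write-info-json",            # save each video's metadata as .info.json
        "--ignore-errors",              # continue past videos that fail to download
        "--output", str(Path(out_dir) / "%(id)s.%(ext)s"),
    ]


def download(url_file="urls.txt", out_dir="videos"):
    """Run youtube-dl and return its exit code."""
    if shutil.which("youtube-dl") is None:
        raise RuntimeError("youtube-dl is not installed or not on PATH")
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    return subprocess.run(build_download_command(url_file, out_dir)).returncode
```

Calling download() from the directory that contains urls.txt would place the videos and their .info.json metadata files under videos/. Some source videos may no longer be available on YouTube, which is why the sketch continues past failed downloads.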

Citation Information

  • Citation Format:

@inproceedings{whitehead2018KaVD,
  author    = {Whitehead, Spencer and Ji, Heng and Bansal, Mohit and Chang, Shih-Fu and Voss, Clare R.},
  title     = {Incorporating Background Knowledge into Video Description Generation},
  booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  year      = {2018},
  month     = {November},
  publisher = {Association for Computational Linguistics},
  location  = {Brussels, Belgium}
}

Topics

News Video
Education Research

Source

Organization: github

Created: 8/26/2018
