NewsVideoDataset
The dataset contains 2,883 news videos with a total duration of 151,474 seconds, 13,431 labels (tags), 3,302 sentences, and 9,179 unique tokens. It is intended for educational and research purposes, and the videos are sourced from the AFP news agency's YouTube channel.
Description
Dataset Overview
Dataset Name
- Name: NewsVideoDataset
Dataset Statistics
- Total Videos: 2,883
- Total Duration: 151,474 seconds
- Number of Labels: 13,431
- Number of Sentences: 3,302
- Number of Unique Tokens: 9,179
- Average Duration per Video: 52.5 seconds
- Average Number of Labels per Video: 4.7
- Average Number of Sentences per Video: 1.1 (3,302 sentences / 2,883 videos)
- Average Number of Unique Tokens per Video: 3.2
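The per-video figures can be checked against the totals with a few lines of arithmetic. The sketch below assumes each average is simply the corresponding total divided by the number of videos; the source does not state how the averages were computed, and the names `TOTALS`, `TOTAL_VIDEOS`, and `per_video_averages` are illustrative only.

```python
# Totals copied from the statistics listed above.
TOTALS = {
    "duration_seconds": 151_474,
    "labels": 13_431,
    "sentences": 3_302,
    "unique_tokens": 9_179,
}
TOTAL_VIDEOS = 2_883


def per_video_averages(totals, n_videos):
    """Return total / n_videos for each statistic (assumed definition)."""
    return {name: value / n_videos for name, value in totals.items()}


averages = per_video_averages(TOTALS, TOTAL_VIDEOS)
for name, value in averages.items():
    print(f"{name}: {value:.2f} per video")
```

If the unique-token count is a corpus-wide vocabulary size, the last ratio is vocabulary size per video rather than the mean number of distinct tokens in a single video, so it should be read with that caveat.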
Dataset Acquisition Method
- Video Download: Use the youtube-dl program to batch download videos and their metadata via the urls.txt file.
- Metadata Processing: Use the python pack_data.py script to clean and pack the metadata.
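The download step might be scripted roughly as follows. This is a minimal sketch, not the dataset authors' own procedure: the output directory `videos`, the filename template, and the helper names `build_download_command` and `download` are assumptions, and only standard youtube-dl options (`--batch-file`, `--write-info-json`, `--ignore-errors`, `-o`) are used.

```python
import shutil
import subprocess
from pathlib import Path


def build_download_command(url_file="urls.txt", out_dir="videos"):
    """Build a youtube-dl command that reads one URL per line from url_file."""
    return [
        "youtube-dl",
        "--batch-file", str(url_file),  # list of video URLs, one per line
        "--write-info-json",            # save each video's metadata as .info.json
        "--ignore-errors",              # keep going if a video is unavailable
        "-o", str(Path(out_dir) / "%(id)s.%(ext)s"),  # assumed naming scheme
    ]


def download(url_file="urls.txt", out_dir="videos"):
    """Run the batch download and return youtube-dl's exit code."""
    if shutil.which("youtube-dl") is None:
        raise RuntimeError("youtube-dl was not found on PATH")
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    # A non-zero exit code is possible when some videos were skipped,
    # so the return code is reported instead of raising an exception.
    return subprocess.run(build_download_command(url_file, out_dir)).returncode
```

Calling `download()` from the directory containing urls.txt would fetch the videos and metadata; the metadata step is then run separately with python pack_data.py, whose arguments and expected input layout are not documented here.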
Citation Information
- Citation Format:
@inproceedings{whitehead2018KaVD,
  author    = {Whitehead, Spencer and Ji, Heng and Bansal, Mohit and Chang, Shih-Fu and Voss, Clare R.},
  title     = {Incorporating Background Knowledge into Video Description Generation},
  booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  year      = {2018},
  month     = {November},
  publisher = {Association for Computational Linguistics},
  location  = {Brussels, Belgium}
}
Source
Organization: github
Created: 8/26/2018