NewsVideoDataset
The dataset contains 2,883 news videos with a total duration of 151,474 seconds, 13,431 tags, 3,302 sentences, and 9,179 unique tokens. It is intended for educational and research purposes, and the videos are sourced from the AFP news agency on YouTube.
Dataset Overview
Dataset Name
- Name: NewsVideoDataset
Dataset Statistics
- Total Videos: 2,883
- Total Duration: 151,474 seconds
- Number of Labels: 13,431
- Number of Sentences: 3,302
- Number of Unique Tokens: 9,179
- Average Duration per Video: 52.5 seconds
- Average Number of Labels per Video: 4.7
- Average Number of Sentences per Video: 1.1
- Unique Tokens per Video (total unique tokens divided by total videos): 3.2
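The per-video figures follow directly from the totals listed above, so they can be rechecked with a few lines of Python. This is a minimal sketch: the dictionary keys and the function name are illustrative and not part of the dataset's own tooling.

```python
# Totals as reported in the dataset statistics.
TOTALS = {
    "videos": 2883,
    "duration_seconds": 151474,
    "labels": 13431,
    "sentences": 3302,
    "unique_tokens": 9179,
}


def per_video_averages(totals):
    """Divide every total by the number of videos."""
    n_videos = totals["videos"]
    return {
        name: value / n_videos
        for name, value in totals.items()
        if name != "videos"
    }


averages = per_video_averages(TOTALS)
for name, value in averages.items():
    # Two decimals make the rounding of each figure easy to inspect.
    print(f"{name}: {value:.2f} per video")
```

Note that the unique-token figure is the corpus vocabulary size divided by the video count, not a count of distinct tokens measured inside each video.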
Dataset Acquisition Method
- Video Download: Use the youtube-dl program to batch download the videos and their metadata from the URLs listed in the urls.txt file.
- Metadata Processing: Run python pack_data.py to clean and pack the metadata.
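The two acquisition steps could be chained from Python roughly as follows. This is a sketch under assumptions, not the dataset authors' procedure: the source names only youtube-dl, urls.txt, and pack_data.py, so the specific youtube-dl options chosen here (--batch-file, --write-info-json, --ignore-errors, --output), the videos output directory, and the helper function names are all illustrative. It also assumes pack_data.py takes no arguments and sits in the working directory.

```python
import subprocess
import sys
from pathlib import Path


def build_download_command(url_file="urls.txt", out_dir="videos"):
    """Build a youtube-dl command that reads URLs from a batch file.

    The flags are an assumption; the dataset description does not say
    which options were used.
    """
    return [
        "youtube-dl",
        "--batch-file", str(url_file),   # one URL per line
        "--write-info-json",             # save metadata next to each video
        "--ignore-errors",               # keep going if a video is unavailable
        "--output", str(Path(out_dir) / "%(id)s.%(ext)s"),
    ]


def build_pack_command(script="pack_data.py"):
    """Build the command for the metadata cleaning and packing script."""
    return [sys.executable, script]


def acquire(url_file="urls.txt", out_dir="videos"):
    """Download the videos, then clean and pack the metadata."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    # youtube-dl can exit non-zero when individual videos fail even with
    # --ignore-errors, so the download step is not treated as fatal.
    result = subprocess.run(build_download_command(url_file, out_dir))
    if result.returncode != 0:
        print("Some downloads may have failed; continuing with packing.")
    subprocess.run(build_pack_command(), check=True)
```

Calling acquire() from the directory that holds urls.txt and pack_data.py would run both steps in order. Some videos may no longer be available on YouTube, which is why the download step tolerates partial failure.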
Citation Information
- Citation Format:
@inproceedings{whitehead2018KaVD,
  author = {Whitehead, Spencer and Ji, Heng and Bansal, Mohit and Chang, Shih-Fu and Voss, Clare R.},
  title = {Incorporating Background Knowledge into Video Description Generation},
  booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  year = {2018},
  month = {November},
  publisher = {Association for Computational Linguistics},
  location = {Brussels, Belgium}
}