How2 is a multimodal dataset of approximately 80,000 instructional videos (about 2,000 hours) with English subtitles and summaries. About 300 hours of the videos have been crowd-translated into Portuguese and were used in the JSALT 2018 workshop. The dataset can be used for speech recognition, speech summarization, text summarization, and their multimodal extensions.