VideoRetrieval
The dataset has three configurations: corpus, default, and queries, each with a single dev split. The corpus configuration holds document IDs, text, and titles (100,930 samples, 8,580,491 bytes). The default configuration holds query IDs, corpus (document) IDs, and integer scores (1,000 samples, 27,968 bytes). The queries configuration holds query IDs and text (1,000 samples, 34,156 bytes).
Description
Dataset Overview
Dataset Configurations
Configuration Name: corpus
- Features:
  - _id: string
  - text: string
  - title: string
- Split:
  - dev:
    - Bytes: 8,580,491
    - Samples: 100,930
- Download Size: 7,277,662 bytes
- Dataset Size: 8,580,491 bytes
- Data Files:
  - dev: corpus/dev-*
Configuration Name: default
- Features:
  - query-id: string
  - corpus-id: string
  - score: int64
- Split:
  - dev:
    - Bytes: 27,968
    - Samples: 1,000
- Download Size: 17,445 bytes
- Dataset Size: 27,968 bytes
- Data Files:
  - dev: data/dev-*
Configuration Name: queries
- Features:
  - _id: string
  - text: string
- Split:
  - dev:
    - Bytes: 34,156
    - Samples: 1,000
- Download Size: 29,116 bytes
- Dataset Size: 34,156 bytes
- Data Files:
  - dev: queries/dev-*
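The three configurations share IDs: each row in default points at a query via query-id and at a document via corpus-id. The following is a minimal sketch of how such rows might be joined, assuming each configuration has already been loaded as a list of dictionaries with the field names listed above. The sample rows and the helper names build_qrels and resolve_pairs are hypothetical and for illustration only; they are not taken from the dataset.

```python
from collections import defaultdict

# Hypothetical sample rows that mirror the documented schemas.
# Real rows would come from the dev split of each configuration.
corpus_rows = [
    {"_id": "d1", "text": "Funny cats compilation", "title": ""},
    {"_id": "d2", "text": "How to cook noodles", "title": ""},
]
queries_rows = [
    {"_id": "q1", "text": "cat videos"},
    {"_id": "q2", "text": "noodle recipe"},
]
default_rows = [
    {"query-id": "q1", "corpus-id": "d1", "score": 1},
    {"query-id": "q2", "corpus-id": "d2", "score": 1},
]


def build_qrels(rows):
    """Group judgment rows as {query-id: {corpus-id: score}}."""
    qrels = defaultdict(dict)
    for row in rows:
        qrels[row["query-id"]][row["corpus-id"]] = row["score"]
    return dict(qrels)


def resolve_pairs(qrels, queries, corpus):
    """Replace IDs with text, yielding (query text, document text, score)."""
    query_text = {row["_id"]: row["text"] for row in queries}
    doc_text = {row["_id"]: row["text"] for row in corpus}
    pairs = []
    for query_id, judged in qrels.items():
        for corpus_id, score in judged.items():
            pairs.append((query_text[query_id], doc_text[corpus_id], score))
    return pairs


qrels = build_qrels(default_rows)
pairs = resolve_pairs(qrels, queries_rows, corpus_rows)
for query, document, score in pairs:
    print(f"{query!r} -> {document!r} (score={score})")
```

The nested dictionary keeps lookups by query cheap, which is convenient when scoring a ranked list of documents one query at a time.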
Source
Organization: huggingface
Created: 11/28/2024