Dataset asset, Open Source Community, Video Retrieval
VideoRetrieval
VideoRetrieval provides three configurations: corpus, default, and queries, each with a single dev split. The corpus configuration holds document IDs, text, and titles (100,930 samples, 8,580,491 bytes). The default configuration pairs query IDs with corpus IDs and an integer score (1,000 samples, 27,968 bytes). The queries configuration holds query IDs and text (1,000 samples, 34,156 bytes).
Source: huggingface
Created: Nov 28, 2024
Updated: Dec 1, 2024
Dataset Overview
Dataset Configurations
Configuration Name: corpus
- Features:
  - _id: string
  - text: string
  - title: string
- Split:
  - dev:
    - Bytes: 8,580,491
    - Samples: 100,930
- Download Size: 7,277,662 bytes
- Dataset Size: 8,580,491 bytes
- Data Files:
  - dev: corpus/dev-*
Configuration Name: default
- Features:
  - query-id: string
  - corpus-id: string
  - score: int64
- Split:
  - dev:
    - Bytes: 27,968
    - Samples: 1,000
- Download Size: 17,445 bytes
- Dataset Size: 27,968 bytes
- Data Files:
  - dev: data/dev-*
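The `default` configuration reads like a set of relevance judgments: each row links a `query-id` to a `corpus-id` with an integer `score`. A minimal sketch of grouping such rows into a per-query lookup follows, assuming rows arrive as plain dictionaries keyed by the feature names above. The helper name `build_qrels` and the sample IDs are hypothetical, and treating a positive score as "relevant" is an assumption rather than something this page states.

```python
from collections import defaultdict


def build_qrels(rows):
    """Group (query-id, corpus-id, score) rows into {query_id: {corpus_id: score}}."""
    qrels = defaultdict(dict)
    for row in rows:
        qrels[row["query-id"]][row["corpus-id"]] = int(row["score"])
    return dict(qrels)


# Made-up rows for illustration only; real IDs come from the dev split.
sample_rows = [
    {"query-id": "q1", "corpus-id": "d10", "score": 1},
    {"query-id": "q1", "corpus-id": "d11", "score": 0},
    {"query-id": "q2", "corpus-id": "d20", "score": 1},
]

qrels = build_qrels(sample_rows)

# Assumption: a positive score marks a relevant document.
relevant = {q: [d for d, s in docs.items() if s > 0] for q, docs in qrels.items()}
print(relevant)
```

A nested dictionary keeps the lookup for any single query cheap, which may matter little at 1,000 rows but keeps the shape familiar for retrieval evaluation code.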
Configuration Name: queries
- Features:
  - _id: string
  - text: string
- Split:
  - dev:
    - Bytes: 34,156
    - Samples: 1,000
- Download Size: 29,116 bytes
- Dataset Size: 34,156 bytes
- Data Files:
  - dev: queries/dev-*
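Since the page lists huggingface as the source, the three configurations can presumably be loaded with the `datasets` library. The sketch below assumes that library is installed; `repo_id` is left as a parameter because this page does not state the repository path, and the helper names `load_all` and `resolve_pairs` are hypothetical. `resolve_pairs` joins the `default` rows to query and document text through the `_id` fields, which is assumed to be how the IDs correspond.

```python
def load_all(repo_id):
    """Load the dev split of each configuration (needs network access)."""
    # Imported lazily so the join helper below works without the library.
    from datasets import load_dataset

    return {
        name: load_dataset(repo_id, name, split="dev")
        for name in ("corpus", "default", "queries")
    }


def resolve_pairs(corpus, judgments, queries):
    """Yield (query text, document title, document text, score) per judgment row."""
    docs = {row["_id"]: row for row in corpus}
    query_text = {row["_id"]: row["text"] for row in queries}
    for row in judgments:
        doc = docs[row["corpus-id"]]
        yield query_text[row["query-id"]], doc["title"], doc["text"], row["score"]


# Made-up rows standing in for the real data.
corpus = [{"_id": "d1", "text": "example document", "title": ""}]
queries = [{"_id": "q1", "text": "example query"}]
judgments = [{"query-id": "q1", "corpus-id": "d1", "score": 1}]

pairs = list(resolve_pairs(corpus, judgments, queries))
```

With real data, the same join would be called as `resolve_pairs(data["corpus"], data["default"], data["queries"])` after `data = load_all(repo_id)`.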