VideoRetrieval

The dataset includes three configurations: corpus, default, and queries. Each has a single dev split. The corpus configuration contains document IDs, text, and titles (100,930 samples, 8,580,491 bytes). The default configuration contains query IDs, corpus document IDs, and integer scores (1,000 samples, 27,968 bytes). The queries configuration contains query IDs and text (1,000 samples, 34,156 bytes).

Updated 12/1/2024
huggingface

Description

Dataset Overview

Dataset Configurations

Configuration Name: corpus

  • Features:
    • _id: string
    • text: string
    • title: string
  • Split:
    • dev:
      • Bytes: 8,580,491
      • Samples: 100,930
  • Download Size: 7,277,662 bytes
  • Dataset Size: 8,580,491 bytes
  • Data Files:
    • dev: corpus/dev-*

Configuration Name: default

  • Features:
    • query-id: string
    • corpus-id: string
    • score: int64
  • Split:
    • dev:
      • Bytes: 27,968
      • Samples: 1,000
  • Download Size: 17,445 bytes
  • Dataset Size: 27,968 bytes
  • Data Files:
    • dev: data/dev-*
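
The three columns of the default configuration (query-id, corpus-id, score) have the shape of relevance judgments linking a query to a corpus document. The following is a minimal sketch of grouping such rows per query. It assumes rows arrive as plain dictionaries keyed by those column names, however they were loaded, and that a score greater than zero marks a relevant document; the source confirms neither. The function name `build_qrels` and the sample IDs are hypothetical.

```python
from collections import defaultdict


def build_qrels(rows):
    """Group (query-id, corpus-id, score) rows into {query_id: {corpus_id: score}}."""
    qrels = defaultdict(dict)
    for row in rows:
        qrels[row["query-id"]][row["corpus-id"]] = int(row["score"])
    return dict(qrels)


# Hypothetical rows that mirror the schema of the default configuration.
sample_rows = [
    {"query-id": "q1", "corpus-id": "d7", "score": 1},
    {"query-id": "q1", "corpus-id": "d9", "score": 0},
    {"query-id": "q2", "corpus-id": "d3", "score": 1},
]

qrels = build_qrels(sample_rows)

# Assumption: score > 0 means the document is relevant to the query.
relevant_for_q1 = [doc for doc, score in qrels["q1"].items() if score > 0]
print(relevant_for_q1)
```

A nested dictionary keeps the lookup of all judged documents for a query to a single key access, which suits per-query evaluation loops.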

Configuration Name: queries

  • Features:
    • _id: string
    • text: string
  • Split:
    • dev:
      • Bytes: 34,156
      • Samples: 1,000
  • Download Size: 29,116 bytes
  • Dataset Size: 34,156 bytes
  • Data Files:
    • dev: queries/dev-*
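
The field names suggest that the three configurations join on IDs: query-id in default against _id in queries, and corpus-id in default against _id in corpus. This is an inference from the schema, not something the source states. Under that assumption, the sketch below resolves each judged pair to its query text and document fields. It works on hypothetical in-memory rows, and the helper names `index_by_id` and `resolve_pairs` are illustrative only.

```python
def index_by_id(rows):
    """Map each row's `_id` to the row itself for constant-time lookup."""
    return {row["_id"]: row for row in rows}


def resolve_pairs(judgments, queries, corpus):
    """Yield (query text, document title, document text, score) per judgment.

    Judgments that reference an ID missing from queries or corpus are skipped.
    """
    query_by_id = index_by_id(queries)
    doc_by_id = index_by_id(corpus)
    for row in judgments:
        query = query_by_id.get(row["query-id"])
        doc = doc_by_id.get(row["corpus-id"])
        if query is None or doc is None:
            continue
        yield query["text"], doc["title"], doc["text"], row["score"]


# Hypothetical rows that mirror the three schemas listed above.
queries = [{"_id": "q1", "text": "cooking tutorial"}]
corpus = [
    {"_id": "d1", "title": "Pasta basics", "text": "How to boil pasta."},
    {"_id": "d2", "title": "Bike repair", "text": "Fixing a flat tire."},
]
judgments = [
    {"query-id": "q1", "corpus-id": "d1", "score": 1},
    {"query-id": "q1", "corpus-id": "d404", "score": 1},  # dangling ID, skipped
]

resolved = list(resolve_pairs(judgments, queries, corpus))
print(resolved)
```

Indexing both sides once up front avoids rescanning the corpus rows for every judgment, which matters when the corpus is far larger than the query set, as it is here.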

Topics

Video Retrieval

Source

Organization: huggingface

Created: 11/28/2024
