viking-education
This dataset is for Chinese text retrieval tasks, consisting of two parts: a document corpus and a query set. The corpus contains 10,574 documents, each with a unique ID, content, and image information. The query set contains 100 queries, each with a unique ID, content, type, and IDs of relevant documents.
Updated 12/20/2024
huggingface
Description
Dataset Overview
Language
- Chinese
Multilinguality
- Monolingual
Task Category
- Text Retrieval
Task ID
- Document Retrieval
Configuration Names
- corpus
- queries
Tags
- Text Retrieval
Dataset Information
Configuration Name: corpus
- Features
  - Corpus_id: string
  - Corpus_content: string
  - Corpus_image: string
- Split
  - corpus: 10,574 samples
Configuration Name: queries
- Features
  - Query_id: string
  - Query_content: string
  - Query_type: string
  - Corpus_id: string
- Split
  - queries: 100 samples
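Because each query record carries the ID of a relevant document, the two configurations can be joined to score a retriever. The following is a minimal sketch, not part of the dataset itself. It assumes the record keys are spelled exactly as listed above (`Query_id`, `Corpus_id`) and that `Corpus_id` holds a single ID string per query record. The `ranked` mapping (query ID to an ordered list of retrieved document IDs) and both function names are hypothetical. The metric is micro-averaged recall@k.

```python
def build_qrels(queries):
    """Map each Query_id to the set of its relevant Corpus_id values.

    Assumes each query record has string fields "Query_id" and "Corpus_id".
    """
    qrels = {}
    for q in queries:
        qrels.setdefault(q["Query_id"], set()).add(q["Corpus_id"])
    return qrels


def recall_at_k(qrels, ranked, k):
    """Micro-averaged recall@k: relevant documents found in the top k,
    divided by the total number of relevant documents over all queries."""
    hits = 0
    total = 0
    for qid, relevant in qrels.items():
        top_k = set(ranked.get(qid, [])[:k])  # a query with no results counts as zero hits
        hits += len(relevant & top_k)
        total += len(relevant)
    return hits / total if total else 0.0
```

If a query turns out to list several relevant documents across multiple records, `build_qrels` collects them into one set, so the recall computation does not need to change.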
Configurations
Configuration Name: corpus
- Data Files
  - Split: corpus
  - Path: corpus.jsonl
Configuration Name: queries
- Data Files
  - Split: queries
  - Path: queries.jsonl
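Both configurations are stored as JSON Lines files, so they can be read with the Python standard library alone. The sketch below assumes the files have already been downloaded locally, that each non-empty line is one JSON object, and that the files are UTF-8 encoded. The helper name `read_jsonl` is hypothetical.

```python
import json


def read_jsonl(path):
    """Read a JSON Lines file into a list of dicts, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records
```

With the files in the working directory, `read_jsonl("corpus.jsonl")` and `read_jsonl("queries.jsonl")` would be expected to return 10,574 and 100 records respectively, per the split sizes listed above.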
Topics
Text Retrieval
Chinese Language Processing
Source
Organization: huggingface
Created: 12/12/2024