Dataset asset, Open Source Community, Text Retrieval, Chinese Language Processing
viking-education
This dataset supports Chinese text retrieval tasks and consists of two parts: a document corpus and a query set. The corpus contains 10,574 documents, each with a unique ID, content, and image information. The query set contains 100 queries, each with a unique ID, content, type, and the IDs of its relevant documents.
Source
huggingface
Created
Dec 12, 2024
Updated
Dec 20, 2024
Dataset Overview
Language
- Chinese
Multilinguality
- Monolingual
Task Category
- Text Retrieval
Task ID
- Document Retrieval
Configuration Names
- corpus
- queries
Tags
- Text Retrieval
Dataset Information
Configuration Name: corpus
- Features
  - Corpus_id: string
  - Corpus_content: string
  - Corpus_image: string
- Split
  - corpus: 10,574 samples
Configuration Name: queries
- Features
  - Query_id: string
  - Query_content: string
  - Query_type: string
  - Corpus_id: string
- Split
  - queries: 100 samples
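The two feature lists above can be modeled as plain record types, with the queries' Corpus_id field used as the relevance label. The following is a minimal sketch, not an official loader: the class names, `build_qrels`, and `recall_at_k` are hypothetical helpers, the field spellings are copied from this card, and each query row is assumed to carry a single relevant Corpus_id as a string.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class CorpusRecord:
    # Field names follow the "corpus" configuration listed on this card.
    Corpus_id: str
    Corpus_content: str
    Corpus_image: str


@dataclass(frozen=True)
class QueryRecord:
    # Field names follow the "queries" configuration listed on this card.
    Query_id: str
    Query_content: str
    Query_type: str
    Corpus_id: str  # assumption: one relevant corpus document ID per row


def build_qrels(queries: Iterable[QueryRecord]) -> Dict[str, Set[str]]:
    """Map each query ID to the set of corpus IDs marked relevant for it."""
    qrels: Dict[str, Set[str]] = {}
    for q in queries:
        qrels.setdefault(q.Query_id, set()).add(q.Corpus_id)
    return qrels


def recall_at_k(ranked_ids: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k ranked IDs."""
    if not relevant:
        return 0.0
    hits = len(set(ranked_ids[:k]) & relevant)
    return hits / len(relevant)
```

If a query turns out to list several relevant documents across multiple rows, `build_qrels` already merges them into one set per Query_id.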
Configurations
Configuration Name: corpus
- Data Files
  - Split: corpus
  - Path: corpus.jsonl
Configuration Name: queries
- Data Files
  - Split: queries
  - Path: queries.jsonl
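Since each configuration maps to a single JSON Lines file, both can be read with the Python standard library once the files are downloaded. This is a hedged sketch: `read_jsonl` and `load_config` are hypothetical helper names, `data_dir` is assumed to be a local folder holding corpus.jsonl and queries.jsonl, and the files are assumed to be UTF-8 encoded.

```python
import json
from pathlib import Path
from typing import Dict, Iterator, Union


def read_jsonl(path: Union[str, Path]) -> Iterator[Dict[str, object]]:
    """Yield one parsed object per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{line_no}: invalid JSON") from exc


def load_config(data_dir: Union[str, Path], name: str) -> list:
    """Load the 'corpus' or 'queries' configuration from <data_dir>/<name>.jsonl."""
    if name not in ("corpus", "queries"):
        raise ValueError(f"unknown configuration: {name!r}")
    return list(read_jsonl(Path(data_dir) / f"{name}.jsonl"))
```

A quick sanity check after loading is to compare `len(load_config(data_dir, "corpus"))` and `len(load_config(data_dir, "queries"))` against the 10,574 and 100 samples stated on this card.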