Dataset asset | Open Source Community | Text Retrieval | Chinese Language Processing

viking-education

This dataset supports Chinese text retrieval tasks and consists of two parts: a document corpus and a query set. The corpus contains 10,574 documents, each with a unique ID, content, and image information. The query set contains 100 queries, each with a unique ID, content, a type, and the IDs of relevant documents.

Source: Hugging Face
Created: Dec 12, 2024
Updated: Dec 20, 2024
Signals: 90 views
Availability: Linked source ready

Dataset Overview

Language

  • Chinese

Multilinguality

  • Monolingual

Task Category

  • Text Retrieval

Task ID

  • Document Retrieval

Configuration Names

  • corpus
  • queries

Tags

  • Text Retrieval

Dataset Information

Configuration Name: corpus

  • Features
    • Corpus_id: string
    • Corpus_content: string
    • Corpus_image: string
  • Split
    • corpus: 10,574 samples

Configuration Name: queries

  • Features
    • Query_id: string
    • Query_content: string
    • Query_type: string
    • Corpus_id: string
  • Split
    • queries: 100 samples
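
The two feature lists above can be mirrored as typed records, which makes the link between a query and its relevant document explicit. The following is a minimal sketch, not an official loader: the class names and the `link_queries` helper are hypothetical, and it assumes the `Corpus_id` field of a query holds a single document ID (the card does not say how multiple IDs would be encoded).

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class CorpusRecord:
    # Field names copied from the "corpus" feature list.
    Corpus_id: str
    Corpus_content: str
    Corpus_image: str


@dataclass(frozen=True)
class QueryRecord:
    # Field names copied from the "queries" feature list.
    Query_id: str
    Query_content: str
    Query_type: str
    Corpus_id: str  # Assumption: one relevant document ID per query.


def link_queries(
    corpus: Iterable[CorpusRecord], queries: Iterable[QueryRecord]
) -> Dict[str, Optional[CorpusRecord]]:
    """Map each Query_id to its relevant document, or None if the ID is unknown."""
    by_id = {doc.Corpus_id: doc for doc in corpus}
    return {query.Query_id: by_id.get(query.Corpus_id) for query in queries}


if __name__ == "__main__":
    # Placeholder records for illustration only; not taken from the dataset.
    docs = [CorpusRecord("doc-1", "placeholder text", "placeholder image value")]
    asks = [QueryRecord("q-1", "placeholder question", "placeholder type", "doc-1")]
    linked = link_queries(docs, asks)
    print(linked["q-1"].Corpus_content)
```

Returning `None` for an unmatched ID, rather than raising, keeps it easy to count how many queries point at documents missing from the corpus.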

Configurations

Configuration Name: corpus

  • Data Files
    • Split: corpus
    • Path: corpus.jsonl

Configuration Name: queries

  • Data Files
    • Split: queries
    • Path: queries.jsonl
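
Since both configurations point at JSON Lines files, they can be read with the Python standard library alone once downloaded. The sketch below assumes `corpus.jsonl` and `queries.jsonl` sit in the working directory and that each line is one JSON object; `read_jsonl` is a hypothetical helper name, not part of any dataset tooling.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def read_jsonl(path: str) -> List[Dict[str, Any]]:
    """Read a JSON Lines file: one JSON object per non-empty line."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # Skip blank lines instead of failing on them.
                records.append(json.loads(line))
    return records


if __name__ == "__main__":
    # Assumed local file names, taken from the "Path" entries above.
    for name in ("corpus.jsonl", "queries.jsonl"):
        if Path(name).exists():
            print(name, len(read_jsonl(name)), "records")
        else:
            print(name, "not found in the working directory")
```

If the files are intact, the record counts should match the split sizes listed in the dataset information (10,574 and 100).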