Dataset asset · Open Source Community · Language Models · Information Retrieval

wangrongsheng/RerankerLLM-Dataset

This dataset supports training and testing of large language models (LLMs) used for re-ranking. It contains queries sampled from the MS MARCO dataset, along with rankings predicted by ChatGPT. The files include a 10K-query set and a 100K-query set, each with its corresponding ChatGPT predictions.

Source
hugging_face
Created
Nov 28, 2025
Updated
Apr 5, 2024
Overview

Dataset description and usage context

Dataset Overview

License

  • License Type: Apache-2.0

Task Category

  • Task Category: Summarization

Dataset Size

  • Dataset Size Range: 10K < n < 100K

Dataset Files

File Name                      | Description
marco-train-10k.jsonl          | Contains 10K queries sampled from MS MARCO
marco-train-10k-gpt3.5.json    | ChatGPT-predicted rankings for the 10K queries
marco-train-100k.jsonl         | Contains 100K queries sampled from MS MARCO
marco-train-100k-gpt3.5.json   | ChatGPT-predicted rankings for the 100K queries
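
The query files use the .jsonl extension, which normally means one JSON object per line. The sketch below shows one way to inspect the first few records once a file has been downloaded locally. It assumes the JSON Lines convention holds for these files and makes no assumption about field names, since the record schema is not documented here. The helper names iter_jsonl and peek are hypothetical, chosen only for this illustration.

```python
import json
from typing import Any, Dict, Iterator, List


def iter_jsonl(path: str) -> Iterator[Dict[str, Any]]:
    """Yield one parsed record per non-empty line of a JSON Lines file.

    Assumption: each line holds one JSON object, as the .jsonl
    extension suggests. The field names are not known in advance.
    """
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


def peek(path: str, n: int = 3) -> List[Dict[str, Any]]:
    """Return the first n records so the schema can be inspected."""
    records: List[Dict[str, Any]] = []
    for record in iter_jsonl(path):
        records.append(record)
        if len(records) >= n:
            break
    return records
```

For example, peek("marco-train-10k.jsonl") returns up to three records from a local copy of the 10K file, and printing the keys of the first record reveals the actual field names. The prediction files end in .json rather than .jsonl, so they may need json.load on the whole file instead; check this against the files themselves.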