Dataset asset, Open Source Community, Language Models, Information Retrieval
wangrongsheng/RerankerLLM-Dataset
This dataset supports training and testing of large language models (LLMs) used for re-ranking. It contains queries sampled from the MS MARCO dataset, along with rankings predicted by ChatGPT (the file names indicate GPT-3.5). It ships in two sizes, a 10K-query set and a 100K-query set, each with a matching file of ChatGPT predictions.
Source
hugging_face
Created
Nov 28, 2025
Updated
Apr 5, 2024
Dataset Overview
License
- License Type: Apache-2.0
Task Category
- Task Category: Summarization
Dataset Size
- Dataset Size Range: 10K < n < 100K
Dataset Files
| File Name | Description |
|---|---|
| marco-train-10k.jsonl | Contains 10K queries sampled from MS MARCO |
| marco-train-10k-gpt3.5.json | ChatGPT‑predicted rankings for the 10K queries |
| marco-train-100k.jsonl | Contains 100K queries sampled from MS MARCO |
| marco-train-100k-gpt3.5.json | ChatGPT‑predicted rankings for the 100K queries |
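The files listed above come in two formats: `.jsonl` (one JSON value per line) and `.json` (a single JSON document). A minimal loading sketch follows, assuming the files have already been downloaded to the working directory. The helper names `read_jsonl` and `read_json` are hypothetical, and because the record schema is not documented here, the code makes no assumption about field names.

```python
import json
from typing import Any, Iterator


def read_jsonl(path: str) -> Iterator[Any]:
    """Yield one parsed JSON value per non-empty line of a JSONL file."""
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


def read_json(path: str) -> Any:
    """Parse a file that holds a single JSON document."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


# Example usage (assumes the files from the table are in the current directory):
#   queries = list(read_jsonl("marco-train-10k.jsonl"))
#   predictions = read_json("marco-train-10k-gpt3.5.json")
#   print(len(queries), type(predictions).__name__)
```

Inspecting the first record of each file before writing any training code is a reasonable first step, since the pairing between queries and predictions (by position, by ID, or otherwise) is not stated on this page.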