wangrongsheng/RerankerLLM-Dataset
This dataset supports training and evaluating large language models (LLMs) for re-ranking. It contains queries sampled from the MS MARCO dataset, together with rankings predicted by ChatGPT (GPT-3.5, per the file names). The files comprise 10K and 100K query sets and the corresponding ChatGPT predictions.
Updated 4/5/2024
hugging_face
Description
Dataset Overview
License
- License Type: Apache-2.0
Task Category
- Task Category: Summarization
Dataset Size
- Dataset Size Range: 10K < n < 100K
Dataset Files
| File Name | Description |
|---|---|
| marco-train-10k.jsonl | Contains 10K queries sampled from MS MARCO |
| marco-train-10k-gpt3.5.json | ChatGPT‑predicted rankings for the 10K queries |
| marco-train-100k.jsonl | Contains 100K queries sampled from MS MARCO |
| marco-train-100k-gpt3.5.json | ChatGPT‑predicted rankings for the 100K queries |
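The record schema of these files is not documented here, so a reasonable first step is to inspect a few records before writing any training code. The following is a minimal sketch, assuming the `.jsonl` files follow the usual JSON Lines convention (one JSON object per line) and have been downloaded to the working directory. The helper names `iter_jsonl` and `peek_keys` are hypothetical and not part of the dataset.

```python
import json
from typing import Iterator


def iter_jsonl(path: str) -> Iterator[object]:
    """Yield one parsed JSON value per non-empty line of a JSON Lines file."""
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


def peek_keys(path: str, limit: int = 5) -> set:
    """Collect the top-level keys seen in the first `limit` records.

    Field names are not assumed; this only reports what the file contains.
    """
    keys = set()
    for index, record in enumerate(iter_jsonl(path)):
        if index >= limit:
            break
        if isinstance(record, dict):
            keys.update(record.keys())
    return keys
```

For example, `peek_keys("marco-train-10k.jsonl")` would list the field names found in the first five records. The `*-gpt3.5.json` files use a `.json` extension, so they may be single JSON documents rather than JSON Lines; if so, `json.load` on the open file would be the appropriate call instead.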
Topics
Information Retrieval
Language Models
Source
Organization: hugging_face
Created: Unknown