wangrongsheng/RerankerLLM-Dataset

This dataset supports training and testing of re-ranking large language models (LLMs). It contains queries sampled from the MS MARCO dataset, along with rankings predicted by ChatGPT. The files include query sets at 10K and 100K scale and the corresponding ChatGPT predictions for each set.

Updated 4/5/2024

hugging_face

Description

Dataset Overview

License

License Type: Apache-2.0

Task Category

Task Category: Summarization

Dataset Size

Dataset Size Range: 10K < n < 100K

Dataset Files

File Name	Description
marco-train-10k.jsonl	Contains 10K queries sampled from MS MARCO
marco-train-10k-gpt3.5.json	ChatGPT‑predicted rankings for the 10K queries
marco-train-100k.jsonl	Contains 100K queries sampled from MS MARCO
marco-train-100k-gpt3.5.json	ChatGPT‑predicted rankings for the 100K queries
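The file table lists JSON Lines (`.jsonl`) query files and `.json` prediction files, but this page does not document their fields. The following is a minimal sketch that assumes the files have been downloaded locally under the names shown. It reads either format and reports how many records a file holds and which top-level keys appear, so you can inspect the schema before writing training code. The helper names `iter_jsonl`, `load_json`, and `summarize` are hypothetical and are not part of any dataset tooling.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any, Iterable, Iterator


def iter_jsonl(path: str | Path) -> Iterator[Any]:
    """Yield one parsed record per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines instead of failing on them
                yield json.loads(line)


def load_json(path: str | Path) -> Any:
    """Load a whole JSON document into memory (top-level structure not assumed)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def summarize(records: Iterable[Any]) -> tuple[int, set[str]]:
    """Count records and collect every top-level key seen in dict records.

    The dataset's field names are not documented here, so this inspects
    the data rather than assuming a schema.
    """
    count = 0
    keys: set[str] = set()
    for record in records:
        count += 1
        if isinstance(record, dict):
            keys.update(record.keys())
    return count, keys
```

For example, `summarize(iter_jsonl("marco-train-10k.jsonl"))` streams the query file one line at a time. `load_json` reads the entire prediction file at once, which may be memory-heavy for the 100K file. If its top level turns out to be a list of records (an assumption, not confirmed by this page), you can pass the result straight to `summarize`.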

Topics

Information Retrieval

Language Models

Source

Organization: hugging_face

Created: Unknown
