wangrongsheng/RerankerLLM-Dataset

This dataset supports training and testing large language models (LLMs) for re-ranking. It contains queries sampled from the MS MARCO dataset along with rankings predicted by ChatGPT. The files comprise a 10K-query set and a 100K-query set, each paired with its corresponding ChatGPT predictions.

Updated 4/5/2024
hugging_face

Description

Dataset Overview

License

  • License Type: Apache-2.0

Task Category

  • Task Category: Summarization

Dataset Size

  • Dataset Size Range: 10K < n < 100K

Dataset Files

File Name                      Description
marco-train-10k.jsonl          Contains 10K queries sampled from MS MARCO
marco-train-10k-gpt3.5.json    ChatGPT-predicted rankings for the 10K queries
marco-train-100k.jsonl         Contains 100K queries sampled from MS MARCO
marco-train-100k-gpt3.5.json   ChatGPT-predicted rankings for the 100K queries
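
The files listed above can be inspected with a few lines of Python once they have been downloaded locally. The following is a minimal sketch, not an official loader: it assumes the .jsonl files hold one JSON object per line and that the .json files are single JSON documents, as their extensions suggest. The record schema is not documented here, so the sketch only reports the keys it finds. The helper names (read_jsonl, read_json, first_record_keys) are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Iterator


def read_jsonl(path: str) -> Iterator[dict]:
    """Yield one parsed object per non-empty line of a JSON Lines file."""
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines rather than failing on them
                yield json.loads(line)


def read_json(path: str) -> Any:
    """Load a file that is assumed to be a single JSON document."""
    with Path(path).open(encoding="utf-8") as handle:
        return json.load(handle)


def first_record_keys(path: str) -> list:
    """Return the sorted keys of the first record, or [] for an empty file."""
    for record in read_jsonl(path):
        return sorted(record)
    return []


if __name__ == "__main__":
    # Assumes the file sits in the current working directory.
    print(first_record_keys("marco-train-10k.jsonl"))
```

Reading the .jsonl files lazily, one line at a time, keeps memory use low for the 100K set. If a .json prediction file turns out to be line-delimited as well, read_jsonl can be used on it instead of read_json.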

Topics

Information Retrieval
Language Models

Source

Organization: hugging_face

Created: Unknown