irds/msmarco-document-v2_trec-dl-2019
The msmarco-document-v2/trec-dl-2019 dataset, provided by the ir-datasets package, focuses on text retrieval tasks. It contains 200 queries and 13,940 relevance judgments (qrels) for evaluating document retrieval systems. The data can be loaded and processed with Hugging Face's datasets library in Python.
Dataset description and usage context
Dataset Overview
Dataset Name
msmarco-document-v2/trec-dl-2019
Data Source
- Source dataset:
irds/msmarco-document-v2
Task Category
- Text Retrieval
Data Content
- queries (query topics): count = 200
- qrels (relevance judgments): count = 13,940
- docs: use the irds/msmarco-document-v2 dataset
Usage
from datasets import load_dataset

queries = load_dataset("irds/msmarco-document-v2_trec-dl-2019", "queries")
for record in queries:
    record  # {"query_id": ..., "text": ...}

qrels = load_dataset("irds/msmarco-document-v2_trec-dl-2019", "qrels")
for record in qrels:
    record  # {"query_id": ..., "doc_id": ..., "relevance": ..., "iteration": ...}
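Once loaded, qrels records are commonly grouped by query before evaluation. A minimal sketch using the record schema shown above (the sample records below are illustrative placeholders, not real dataset rows):

```python
from collections import defaultdict

# Illustrative qrels records in the schema above (not actual dataset rows)
qrels_records = [
    {"query_id": "q1", "doc_id": "d1", "relevance": 3, "iteration": "0"},
    {"query_id": "q1", "doc_id": "d2", "relevance": 0, "iteration": "0"},
    {"query_id": "q2", "doc_id": "d3", "relevance": 2, "iteration": "0"},
]

# Group judged documents per query: query_id -> {doc_id: relevance}
qrels_by_query = defaultdict(dict)
for rec in qrels_records:
    qrels_by_query[rec["query_id"]][rec["doc_id"]] = rec["relevance"]

# Example: count documents judged relevant (relevance > 0) per query
relevant_counts = {
    qid: sum(1 for rel in docs.values() if rel > 0)
    for qid, docs in qrels_by_query.items()
}
```

The same grouping pattern works unchanged when iterating over the real qrels split, since each record exposes the `query_id`, `doc_id`, and `relevance` fields.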
Citation Information
@inproceedings{Craswell2019TrecDl,
  title={Overview of the TREC 2019 deep learning track},
  author={Nick Craswell and Bhaskar Mitra and Emine Yilmaz and Daniel Campos and Ellen Voorhees},
  booktitle={TREC 2019},
  year={2019}
}
@inproceedings{Bajaj2016Msmarco,
  title={MS MARCO: A Human Generated MAchine Reading COmprehension Dataset},
  author={Payal Bajaj and Daniel Campos and Nick Craswell and Li Deng and Jianfeng Gao and Xiaodong Liu and Rangan Majumder and Andrew McNamara and Bhaskar Mitra and Tri Nguyen and Mir Rosenberg and Xia Song and Alina Stoica and Saurabh Tiwary and Tong Wang},
  booktitle={InCoCo@NIPS},
  year={2016}
}