irds/nfcorpus

The nfcorpus dataset is the document corpus of a text retrieval collection for medical information retrieval. It contains 5,371 documents, each with a document ID, URL, title, and abstract. The collection was introduced by Vera Boteva et al. at the 2016 European Conference on Information Retrieval (ECIR), and this corpus is shared by several related datasets such as `nfcorpus_dev` and `nfcorpus_test`.

Updated 1/5/2023
hugging_face

Description

Dataset Overview

Dataset Name

nfcorpus

Source

Provided by the ir-datasets package.

Content

  • Data type: docs (documents, i.e., corpus)
  • Number of documents: 5,371

Use Cases

This corpus is used by multiple related datasets, including:

  • nfcorpus_dev
  • nfcorpus_dev_nontopic
  • nfcorpus_dev_video
  • nfcorpus_test
  • nfcorpus_test_nontopic
  • nfcorpus_test_video
  • nfcorpus_train
  • nfcorpus_train_nontopic
  • nfcorpus_train_video

Example Usage

    from datasets import load_dataset

    docs = load_dataset('irds/nfcorpus', 'docs')
    for record in docs:
        record # {'doc_id': ..., 'url': ..., 'title': ..., 'abstract': ...}
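Retrieval experiments usually need to look a document up by its ID after ranking, so a common next step is to index the records by `doc_id`. The following is a minimal sketch of that step, not part of the dataset's own tooling: the helper name `index_by_doc_id` and the two sample records are hypothetical placeholders, and only the field names (`doc_id`, `url`, `title`, `abstract`) come from the description above.

```python
from typing import Dict, Iterable, Mapping


def index_by_doc_id(
    records: Iterable[Mapping[str, str]],
) -> Dict[str, Mapping[str, str]]:
    """Build a doc_id -> record lookup table, rejecting duplicate IDs."""
    index: Dict[str, Mapping[str, str]] = {}
    for record in records:
        doc_id = record["doc_id"]
        if doc_id in index:
            raise ValueError(f"duplicate doc_id: {doc_id}")
        index[doc_id] = record
    return index


# Hypothetical placeholder records (NOT real corpus entries); in practice
# the records would come from the loader shown in the usage example.
sample_records = [
    {
        "doc_id": "doc-a",
        "url": "http://example.org/a",
        "title": "Placeholder title A",
        "abstract": "Placeholder abstract A.",
    },
    {
        "doc_id": "doc-b",
        "url": "http://example.org/b",
        "title": "Placeholder title B",
        "abstract": "Placeholder abstract B.",
    },
]

index = index_by_doc_id(sample_records)
print(index["doc-b"]["title"])
```

Because the helper accepts any iterable of mappings, the same function could be fed the records from the loop above, assuming each record exposes the four fields as string keys.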

Citation

@inproceedings{Boteva2016Nfcorpus,
  title = "A Full-Text Learning to Rank Dataset for Medical Information Retrieval",
  author = "Vera Boteva and Demian Gholipour and Artem Sokolov and Stefan Riezler",
  booktitle = "Proceedings of the European Conference on Information Retrieval ({ECIR})",
  location = "Padova, Italy",
  publisher = "Springer",
  year = 2016
}

Topics

Natural Language Processing
Information Retrieval

Source

Organization: hugging_face

Created: Unknown
