irds/nfcorpus

The nfcorpus dataset is a medical information retrieval corpus of 5,371 documents. Each document has a document ID, URL, title, and abstract. The dataset was introduced by Vera Boteva et al. at the 2016 European Conference on Information Retrieval (ECIR) and serves as the document collection for several related datasets, such as `nfcorpus_dev` and `nfcorpus_test`.

Updated 1/5/2023


Dataset Overview

Dataset Name

nfcorpus

Source

Provided by the ir-datasets package.

Content

Data type: docs (documents, i.e., corpus)
Number of documents: 5,371
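Each record carries the four fields listed above. The following is a minimal sketch of that schema as a Python dataclass. It assumes records arrive as plain dicts with exactly those keys, as in the Example Usage output. `CorpusDoc`, `from_record`, and `check_count` are hypothetical helpers for illustration and are not part of the `datasets` or `ir-datasets` API.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Sequence

EXPECTED_DOC_COUNT = 5371  # document count stated in this card


@dataclass(frozen=True)
class CorpusDoc:
    """Hypothetical typed view of one nfcorpus document record."""
    doc_id: str
    url: str
    title: str
    abstract: str

    @classmethod
    def from_record(cls, record: Mapping[str, Any]) -> "CorpusDoc":
        # Raises KeyError if any of the four expected fields is missing.
        return cls(
            doc_id=record["doc_id"],
            url=record["url"],
            title=record["title"],
            abstract=record["abstract"],
        )


def check_count(docs: Sequence[CorpusDoc]) -> bool:
    """Return True if the number of loaded documents matches the card's stated count."""
    return len(docs) == EXPECTED_DOC_COUNT
```

Converting records this way fails fast on a missing field, and the frozen dataclass keeps loaded documents immutable.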

Use Cases

Used in multiple related datasets, including:

nfcorpus_dev
nfcorpus_dev_nontopic
nfcorpus_dev_video
nfcorpus_test
nfcorpus_test_nontopic
nfcorpus_test_video
nfcorpus_train
nfcorpus_train_nontopic
nfcorpus_train_video
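The nine related names above follow a regular pattern: each of the splits dev, test, and train appears once unsuffixed and once each with a `nontopic` or `video` suffix. The sketch below rebuilds the list from that pattern. The conversion to slash-separated IDs such as `nfcorpus/dev/video` assumes that ir-datasets names these subsets by replacing `_` with `/`. `related_dataset_names` and `to_ir_datasets_id` are hypothetical helpers.

```python
from itertools import product
from typing import List

SPLITS = ("dev", "test", "train")
VARIANTS = ("", "nontopic", "video")  # "" means no suffix


def related_dataset_names() -> List[str]:
    """Build the related dataset names as split x variant combinations."""
    names = []
    for split, variant in product(SPLITS, VARIANTS):
        parts = ["nfcorpus", split] + ([variant] if variant else [])
        names.append("_".join(parts))
    return names


def to_ir_datasets_id(name: str) -> str:
    """Assumed mapping from an underscore name to a slash-separated ir_datasets ID."""
    return name.replace("_", "/")
```

For example, `to_ir_datasets_id("nfcorpus_dev_video")` returns `"nfcorpus/dev/video"`.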

Example Usage

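```python
from datasets import load_dataset

# Without split=..., load_dataset returns a DatasetDict keyed by split name.
docs = load_dataset('irds/nfcorpus', 'docs')
for split in docs.values():
    for record in split:
        print(record)  # {'doc_id': ..., 'url': ..., 'title': ..., 'abstract': ...}
```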
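The corpus is provided by the ir-datasets package, so it can also be read directly without the Hugging Face wrapper. The following is a minimal sketch that assumes the `ir-datasets` package is installed (`pip install ir-datasets`) and that `nfcorpus` is the matching ir_datasets ID. `iter_nfcorpus_docs` is a hypothetical helper name. `ir_datasets.load` and `docs_iter()` are part of the package's public API.

```python
from itertools import islice
from typing import Iterator, Optional, Tuple


def iter_nfcorpus_docs(limit: Optional[int] = None) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (doc_id, url, title, abstract) tuples from ir_datasets.

    Assumes the ir-datasets package is installed; the corpus may be
    downloaded on first use.
    """
    # Imported lazily so this module loads even without the package.
    import ir_datasets

    dataset = ir_datasets.load("nfcorpus")
    # islice with limit=None yields every document.
    for doc in islice(dataset.docs_iter(), limit):
        yield doc.doc_id, doc.url, doc.title, doc.abstract
```

For example, `for doc_id, url, title, abstract in iter_nfcorpus_docs(limit=3): print(doc_id, title)` prints the IDs and titles of the first three documents. Because the function is a generator, nothing is loaded until iteration starts.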

Citation

@inproceedings{Boteva2016Nfcorpus,
  title = "A Full-Text Learning to Rank Dataset for Medical Information Retrieval",
  author = "Vera Boteva and Demian Gholipour and Artem Sokolov and Stefan Riezler",
  booktitle = "Proceedings of the European Conference on Information Retrieval ({ECIR})",
  location = "Padova, Italy",
  publisher = "Springer",
  year = 2016
}
