irds/nfcorpus
The nfcorpus dataset is a text collection for medical information retrieval, consisting of 5,371 documents. Each document includes a document ID, URL, title, and abstract. The dataset was introduced by Vera Boteva et al. at the 2016 European Conference on Information Retrieval, and its documents are used by several related sets, such as `nfcorpus_dev` and `nfcorpus_test`.
Description
Dataset Overview
Dataset Name
nfcorpus
Source
Provided by the ir-datasets package.
Content
- Data type: docs (documents, i.e., the corpus)
- Number of documents: 5,371
Use Cases
Used in multiple related datasets, including:
- nfcorpus_dev
- nfcorpus_dev_nontopic
- nfcorpus_dev_video
- nfcorpus_test
- nfcorpus_test_nontopic
- nfcorpus_test_video
- nfcorpus_train
- nfcorpus_train_nontopic
- nfcorpus_train_video
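The related set names above appear to follow the pattern `nfcorpus_<split>[_<variant>]`. The sketch below groups them by split under that assumption. The helper names `parse_related_name` and `group_by_split`, and the label `"full"` for names without a variant suffix, are made up for illustration and are not part of any library.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Names copied from the list of related datasets above.
RELATED_SETS = [
    "nfcorpus_dev", "nfcorpus_dev_nontopic", "nfcorpus_dev_video",
    "nfcorpus_test", "nfcorpus_test_nontopic", "nfcorpus_test_video",
    "nfcorpus_train", "nfcorpus_train_nontopic", "nfcorpus_train_video",
]


def parse_related_name(name: str) -> Tuple[str, str]:
    """Split 'nfcorpus_<split>[_<variant>]' into (split, variant)."""
    prefix, _, rest = name.partition("_")
    if prefix != "nfcorpus" or not rest:
        raise ValueError(f"unexpected related set name: {name!r}")
    split, _, variant = rest.partition("_")
    # "full" is a hypothetical label for names with no variant suffix.
    return split, variant or "full"


def group_by_split(names: List[str]) -> Dict[str, List[str]]:
    """Collect the variants available for each split, in input order."""
    grouped = defaultdict(list)
    for name in names:
        split, variant = parse_related_name(name)
        grouped[split].append(variant)
    return dict(grouped)


related_by_split = group_by_split(RELATED_SETS)
```

Grouping this way makes it easy to see that each of the three splits comes in the same three variants.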
Example Usage
```python
from datasets import load_dataset

# Load the "docs" configuration of the corpus from the Hugging Face Hub.
docs = load_dataset('irds/nfcorpus', 'docs')
for record in docs:
    record  # {'doc_id': ..., 'url': ..., 'title': ..., 'abstract': ...}
```
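Once records are available, a common next step is to key them by `doc_id` so that other data can refer back to the corpus. The following is a minimal sketch, assuming each record is a mapping with the four fields described above (`doc_id`, `url`, `title`, `abstract`). The function name `build_doc_index` and the sample records are hypothetical and stand in for real data so the block runs without a download.

```python
from typing import Dict, Iterable, Mapping


def build_doc_index(records: Iterable[Mapping[str, str]]) -> Dict[str, str]:
    """Map each doc_id to one searchable string (title followed by abstract)."""
    index: Dict[str, str] = {}
    for record in records:
        text = f"{record['title']} {record['abstract']}".strip()
        index[record["doc_id"]] = text
    return index


# Hypothetical records with the same fields as the corpus; not real data.
sample_records = [
    {
        "doc_id": "EXAMPLE-1",
        "url": "http://example.com/1",
        "title": "Dietary fiber and health",
        "abstract": "A made-up abstract used only for illustration.",
    },
    {
        "doc_id": "EXAMPLE-2",
        "url": "http://example.com/2",
        "title": "",
        "abstract": "A record with an empty title.",
    },
]

doc_index = build_doc_index(sample_records)
```

Concatenating title and abstract is only one possible choice; keeping the fields separate may suit retrieval models that weight them differently.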
Citation
@inproceedings{Boteva2016Nfcorpus,
  title = "A Full-Text Learning to Rank Dataset for Medical Information Retrieval",
  author = "Vera Boteva and Demian Gholipour and Artem Sokolov and Stefan Riezler",
  booktitle = "Proceedings of the European Conference on Information Retrieval ({ECIR})",
  location = "Padova, Italy",
  publisher = "Springer",
  year = 2016
}