HybridQA is a large-scale question-answering dataset that requires reasoning over heterogeneous information. Each question is linked to a Wikipedia table and to multiple free-form passages aligned with entities in that table. The questions are designed to need both tabular and textual information: if either source is missing, the question cannot be answered. The dataset comprises a training set (62,682 instances), a validation set (3,466), and a test set (3,463). The data is in English and is released under a CC-BY-4.0 license.
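To make the table-plus-passages structure concrete, here is a minimal Python sketch. The class `HybridExample`, its field names, and the helper functions are hypothetical and chosen for illustration; they are not the dataset's official schema, and the toy record is made up rather than taken from HybridQA. Only the split sizes come from the description above.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Split sizes as stated in the dataset description.
SPLIT_SIZES = {"train": 62_682, "validation": 3_466, "test": 3_463}


@dataclass
class HybridExample:
    """Hypothetical record layout (assumed field names, not the official schema)."""
    question: str
    header: List[str]
    rows: List[List[str]]
    # Maps the text of a table cell (an entity) to its linked free-form passage.
    passages: Dict[str, str] = field(default_factory=dict)


def total_instances(sizes: Dict[str, int]) -> int:
    """Sum the instance counts across all splits."""
    return sum(sizes.values())


def linked_passages(example: HybridExample, row_index: int) -> Dict[str, str]:
    """Return the passages linked to the cells of one table row."""
    row = example.rows[row_index]
    return {cell: example.passages[cell] for cell in row if cell in example.passages}


# A made-up toy record: answering needs the table (which city belongs to
# France) and the passage (which river that city lies on).
toy = HybridExample(
    question="Which river runs through the city listed for France?",
    header=["City", "Country"],
    rows=[["Lyon", "France"], ["Porto", "Portugal"]],
    passages={"Lyon": "Lyon is a city in France on the Rhone river."},
)

print(total_instances(SPLIT_SIZES))  # 69611
print(linked_passages(toy, 0))
```

In this sketch the table alone identifies the relevant row and the passage alone names the river, so neither source is sufficient on its own, which mirrors the design goal described above.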