The qa_wikipedia dataset is a question‑answering dataset containing multiple documents extracted from Wikipedia along with associated questions. Features include document ID, title, context, question, answer start position, answer text, and the full article. The dataset is split into training, test, and validation subsets for different modeling stages.
The dataset is built from Korean Wikipedia data and converted into a question‑answer format. The goal is to produce it with code rather than with a language model, and new processing ideas will be uploaded as new versions.
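As a rough illustration of how the listed features fit together, the sketch below models one record and checks that the answer text actually appears in the context at the stated start position. The field names, the assumption that the start position is a character offset into the context, and the sample record are all hypothetical; the dataset's real column names and contents may differ.

```python
from dataclasses import dataclass


@dataclass
class QARecord:
    # Hypothetical field names mirroring the features described above.
    doc_id: str
    title: str
    context: str
    question: str
    answer_start: int  # assumed to be a character offset into `context`
    answer_text: str
    document: str      # the full article


def answer_span_matches(record: QARecord) -> bool:
    """Return True if `answer_text` occurs in `context` at `answer_start`."""
    if record.answer_start < 0:
        return False
    end = record.answer_start + len(record.answer_text)
    return record.context[record.answer_start:end] == record.answer_text


# Made-up record for illustration only; not taken from the dataset.
example = QARecord(
    doc_id="doc-0",
    title="서울",
    context="서울은 대한민국의 수도이다.",
    question="대한민국의 수도는 어디인가?",
    answer_start=0,
    answer_text="서울",
    document="서울은 대한민국의 수도이다.",
)

print(answer_span_matches(example))
```

A check like this is one way to validate code-generated question‑answer pairs before training, since a misaligned start position silently corrupts extractive QA labels.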