Explore high-quality datasets for your AI and machine learning projects.
The dataset collects 728,321 Wikipedia biographies, intended to evaluate text generation algorithms. Each article provides the opening paragraph and infobox (both tokenized). It is used to assess algorithms that generate text from structured data, particularly in the biography domain.