The GovReport dataset consists of reports written by U.S. government research agencies, such as the Congressional Research Service and the Government Accountability Office, paired with their abstracts. Compared with other long-document summarization datasets, GovReport has longer documents and longer abstracts, so a model needs more context to cover the key points of a summary. The dataset provides three configurations, each corresponding to a different data format: plain_text (the default), plain_text_with_recommendations, and structure. Language: English. Size: between 10K and 100K examples. License: CC BY 4.0.