The Cambridge English Write & Improve (W&I) + LOCNESS dataset is an English corpus for grammatical error correction (GEC). Write & Improve is an online platform that helps non-native English learners with their writing. Students submit essays and other texts, and the system returns instant feedback. W&I annotators have also manually annotated a subset of these submissions and assigned each one a CEFR level. The LOCNESS corpus contains essays written by native English students. W&I annotators annotated a subset of it as well, so that researchers can evaluate their systems across the full range of English proficiency levels.

The dataset covers grammatical, lexical, and spelling errors. It provides two configurations, wi and locness, one for each data source.
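A GEC corpus like this pairs each original essay with its corrections. Turning those corrections into a corrected essay is a common first step when building training pairs. The following is a minimal sketch that assumes each correction is a character-offset edit (start, end, replacement text) over the original essay. That record layout, along with the `Edit` and `apply_edits` names, is a hypothetical illustration, not a documented schema for this dataset. The sketch applies edits from right to left, so the offsets of the edits not yet applied still point into the original text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Edit:
    """Hypothetical correction record: replace text[start:end] with `replacement`.

    A replacement of None is treated as a deletion (an assumption, not a
    documented convention of the dataset).
    """
    start: int
    end: int
    replacement: Optional[str]


def apply_edits(text: str, edits: List[Edit]) -> str:
    """Return the corrected text after applying non-overlapping character-offset edits."""
    corrected = text
    # Start of the most recently applied edit; the first edit has no right neighbour.
    last_start = len(text) + 1
    # Process edits from the rightmost span to the leftmost so earlier offsets stay valid.
    for edit in sorted(edits, key=lambda e: (e.start, e.end), reverse=True):
        if not (0 <= edit.start <= edit.end <= len(text)):
            raise ValueError(f"edit span out of range: {edit}")
        if edit.end > last_start:
            raise ValueError(f"overlapping edits: {edit}")
        replacement = edit.replacement or ""
        corrected = corrected[:edit.start] + replacement + corrected[edit.end:]
        last_start = edit.start
    return corrected


if __name__ == "__main__":
    essay = "She go to school yesterday and buyed a book."
    fixes = [Edit(4, 6, "went"), Edit(31, 36, "bought")]
    print(apply_edits(essay, fixes))
```

Running the script prints "She went to school yesterday and bought a book." Rejecting overlapping spans keeps the sketch simple. A real pipeline may need to handle or merge conflicting corrections differently.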