NETEMVocabulary
The 2024 National Master's Graduate Entrance Exam English (I) Syllabus Vocabulary List contains 5,530 required words. The list was ranked by frequency of occurrence across approximately 200 test papers from CET‑4/6, graduate English, and specialized English exams, using a lemmatization strategy. The top 2,444 words each appear more than 40 times, i.e., roughly once every five test papers, and are considered true high‑frequency words. Definitions were manually cross‑checked, and alternative spellings are listed to improve data accuracy.
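The lemmatized frequency ranking described above can be sketched as follows. This is a minimal illustration, not the dataset author's actual pipeline: the small `LEMMAS` table is a hypothetical stand-in for a real lemmatizer (such as spaCy or NLTK), and `lemmatize` and `rank_by_frequency` are illustrative names. Each token is mapped to its lemma, only syllabus words are counted, and every syllabus word stays in the output even with zero occurrences.

```python
import re

# Hypothetical lemma table standing in for a real lemmatizer (e.g. spaCy or NLTK).
LEMMAS = {
    "studies": "study",
    "studied": "study",
    "studying": "study",
    "analyzed": "analyze",
    "analyzing": "analyze",
    "is": "be",
    "are": "be",
    "was": "be",
}


def lemmatize(token: str) -> str:
    """Map an inflected form to its lemma; unknown tokens pass through unchanged."""
    return LEMMAS.get(token, token)


def rank_by_frequency(papers: list[str], vocabulary: set[str]) -> list[tuple[str, int]]:
    """Count lemmatized occurrences of syllabus words across all papers, most frequent first."""
    # Start every syllabus word at zero so words never seen in the papers are still ranked.
    counts = {word: 0 for word in vocabulary}
    for text in papers:
        for token in re.findall(r"[a-z]+", text.lower()):
            lemma = lemmatize(token)
            if lemma in counts:
                counts[lemma] += 1
    # Sort by descending count; ties are broken alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    papers = ["The studies analyzed data.", "She is studying; analysis is hard."]
    vocabulary = {"study", "analyze", "analysis", "data", "be", "zebra"}
    print(rank_by_frequency(papers, vocabulary))
```

With these toy inputs the result is `[('be', 2), ('study', 2), ('analysis', 1), ('analyze', 1), ('data', 1), ('zebra', 0)]`: "studies" and "studying" both count toward "study", which is why a lemmatized ranking can differ from raw surface-form counts.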
Source: GitHub
Created: Oct 3, 2022
Updated: May 20, 2024
Graduate Entrance Exam Vocabulary Frequency Ranking Dataset Overview
Dataset Description
- Vocabulary Source: The 2024 National Master's Graduate Entrance Exam English (I) Syllabus Vocabulary List, containing 5,530 entries.
- Frequency Statistics: Frequency ranking of the vocabulary list based on approximately 200 test paper texts from CET‑4/6, graduate English exams, and specialized English exams.
- Ranking Method: Uses a lemmatization strategy (inflected forms are counted under their base word), so the ranking may differ slightly from how words actually appear in the exams.
- High‑Frequency Vocabulary: The top 2,444 words each appear more than 40 times, i.e., on average at least once every five test papers.
- Data Accuracy: Definitions have undergone preliminary manual verification, and alternate spellings are included for each word.
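The high-frequency cutoff is simple arithmetic: with about 200 papers, 40 occurrences works out to one occurrence per 200 / 40 = 5 papers, so any word above the cutoff appears at least that often on average. A minimal sketch, assuming the ranked list is a sequence of `(word, count)` pairs; `high_frequency` and `papers_per_occurrence` are hypothetical helper names, and the counts in the example are made up for illustration.

```python
PAPER_COUNT = 200  # approximate number of test papers in the corpus
THRESHOLD = 40     # a high-frequency word must occur MORE than this many times


def high_frequency(ranked: list[tuple[str, int]], threshold: int = THRESHOLD) -> list[tuple[str, int]]:
    """Keep only entries whose count strictly exceeds the threshold."""
    return [(word, count) for word, count in ranked if count > threshold]


def papers_per_occurrence(count: int, papers: int = PAPER_COUNT) -> float:
    """Average number of papers per occurrence of a word with the given total count."""
    return papers / count


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    ranked = [("alpha", 120), ("beta", 41), ("gamma", 40), ("delta", 3)]
    print(high_frequency(ranked))     # [('alpha', 120), ('beta', 41)]
    print(papers_per_occurrence(40))  # 5.0
```

The comparison is strict (`>`), matching "more than 40 times": a word with exactly 40 occurrences falls just outside the high-frequency set.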
Data Storage
- Data Files: `netem_full_list.json` stores all data and has also been converted into a `netem_full_list.sql` file.
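A minimal sketch of loading the JSON data into SQLite with Python's standard `json` and `sqlite3` modules. The record keys (`word`, `frequency`, `definition`), the `vocabulary` table layout, and the sample entries are all assumptions for illustration; check the actual structure of `netem_full_list.json` (or use the provided `netem_full_list.sql`) before relying on this.

```python
import json
import sqlite3

# Hypothetical record layout; the real netem_full_list.json may use different keys.
SAMPLE_JSON = """
[
  {"word": "example", "frequency": 120, "definition": "n. a hypothetical entry"},
  {"word": "sample", "frequency": 15, "definition": "n. another hypothetical entry"}
]
"""


def load_into_sqlite(raw_json: str) -> sqlite3.Connection:
    """Parse a JSON array of records and copy them into an in-memory SQLite table."""
    records = json.loads(raw_json)
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE vocabulary (word TEXT PRIMARY KEY, frequency INTEGER, definition TEXT)"
    )
    # Named placeholders let each dict record be inserted directly.
    conn.executemany(
        "INSERT INTO vocabulary (word, frequency, definition) "
        "VALUES (:word, :frequency, :definition)",
        records,
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = load_into_sqlite(SAMPLE_JSON)
    # List words above the 40-occurrence cutoff, most frequent first.
    query = "SELECT word, frequency FROM vocabulary WHERE frequency > 40 ORDER BY frequency DESC"
    for word, freq in conn.execute(query):
        print(word, freq)
```

To try it on the real file, pass `open("netem_full_list.json", encoding="utf-8").read()` in place of `SAMPLE_JSON`, after adjusting the column names to the file's actual keys.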
Dataset Usage
- Dataset License: Shared under CC BY‑NC‑SA 4.0.