matthewfranglen/aste-v2
Aspect Sentiment Triplet Extraction v2 is a dataset for extracting triplets consisting of a target entity (the aspect), the opinion span that explains the sentiment, and the sentiment itself. It targets aspect-based sentiment analysis (ABSA): identifying the aspects of target entities and the polarity expressed toward each aspect. The data are derived from the SemEval 2014, 2015, and 2016 datasets, pre-processed with spell correction and tokenization. The dataset provides training, validation, and test splits. Each record contains an index, the text, start and end indices for the aspect and opinion spans, the aspect and opinion terms, and the sentiment class.
Description
Dataset Overview
Dataset Name
Aspect Sentiment Triplet Extraction v2
Language
- English
Related Papers
- 2107.12214
- 2010.02609
- 1911.01616
Dataset Scale
- 1K < n < 10K
Task Types
- Token Classification
- Text Classification
Configuration Details
2014-laptop-sem-eval
- Training: data/2014/laptop/sem-eval/train.gz.parquet
- Validation: data/2014/laptop/sem-eval/valid.gz.parquet
- Test: data/2014/laptop/sem-eval/test.gz.parquet
2014-laptop-aste-v2
- Training: data/2014/laptop/aste/train.gz.parquet
- Validation: data/2014/laptop/aste/valid.gz.parquet
- Test: data/2014/laptop/aste/test.gz.parquet
... (remaining configuration sections omitted for brevity) ...
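As a rough sketch of how the configurations relate to the files, the helper below maps a configuration name (written with plain ASCII hyphens) to its three split paths. It assumes every configuration follows the same naming pattern as the two listed above, which this page does not confirm for the omitted ones. `config_data_files` and `load_config` are hypothetical helper names; `load_config` relies on the `datasets` library's `load_dataset` function and needs network access.

```python
REPO_ID = "matthewfranglen/aste-v2"

# Suffix of the configuration name -> directory used in the listed paths.
VARIANT_DIRS = {"aste-v2": "aste", "sem-eval": "sem-eval"}


def config_data_files(config: str) -> dict:
    """Map a name such as '2014-laptop-aste-v2' to its split files.

    Assumption: all configurations follow the pattern of the two listed ones.
    """
    year, domain, variant = config.split("-", 2)
    base = f"data/{year}/{domain}/{VARIANT_DIRS[variant]}"
    return {
        "train": f"{base}/train.gz.parquet",
        "validation": f"{base}/valid.gz.parquet",
        "test": f"{base}/test.gz.parquet",
    }


def load_config(config: str):
    """Load one configuration from the Hub (requires `datasets` and network)."""
    # Imported lazily so config_data_files works without the dependency.
    from datasets import load_dataset

    return load_dataset(REPO_ID, config)


print(config_data_files("2014-laptop-aste-v2")["train"])
```

Calling `load_config("2014-laptop-aste-v2")` would then return the splits for that configuration, provided the name matches one published in the repository.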
Dataset Description
Task Overview
The Aspect Sentiment Triplet Extraction (ASTE) task aims to extract triplets from text, each consisting of an aspect term (the target), the opinion term that describes it, and the sentiment expressed. For example, given the sentence:
The screen is very large and crystal clear with amazing colors and resolution.
the goal is to extract triplets such as:
[(screen, large, Positive), (screen, clear, Positive), (colors, amazing, Positive), (resolution, amazing, Positive)]
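To make the structure of the target output concrete, the example triplets can be written down as a small data structure. This is only an illustrative sketch: the `Triplet` type and the `opinions_by_aspect` helper are hypothetical names introduced here, not part of the dataset or any library.

```python
from collections import defaultdict
from typing import NamedTuple


class Triplet(NamedTuple):
    aspect: str     # the target term, e.g. "screen"
    opinion: str    # the word or span expressing the opinion
    sentiment: str  # polarity label


sentence = (
    "The screen is very large and crystal clear "
    "with amazing colors and resolution."
)

triplets = [
    Triplet("screen", "large", "Positive"),
    Triplet("screen", "clear", "Positive"),
    Triplet("colors", "amazing", "Positive"),
    Triplet("resolution", "amazing", "Positive"),
]


def opinions_by_aspect(items):
    """Group (opinion, sentiment) pairs under the aspect they describe."""
    grouped = defaultdict(list)
    for t in items:
        grouped[t.aspect].append((t.opinion, t.sentiment))
    return dict(grouped)


# Every aspect and opinion term is a span taken from the sentence itself.
assert all(t.aspect in sentence and t.opinion in sentence for t in triplets)

print(opinions_by_aspect(triplets))
```

The grouping shows that one aspect can carry several opinions ("screen") and one opinion word can apply to several aspects ("amazing"), which is why the task is framed as triplet extraction and not as one label per sentence.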
Data Source
The dataset is based on SemEval 2014, 2015, and 2016 datasets with additional preprocessing.
Dataset Details
Columns include index, text, aspect_start_index, aspect_end_index, aspect_term, opinion_start_index, opinion_end_index, opinion_term, and sentiment (negative, neutral, positive).
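This page does not say whether the start and end indices are character or token offsets, or whether the end index is inclusive. A quick check along the following lines can settle that against real rows. The row below is hypothetical: it uses the column names listed above, but its values are made up for illustration, and `span_convention` is a helper name introduced here.

```python
def span_convention(text: str, start: int, end: int, term: str):
    """Return the first indexing convention that reproduces `term`, else None."""
    tokens = text.split()
    candidates = {
        "char, end exclusive": text[start:end],
        "char, end inclusive": text[start:end + 1],
        "token, end exclusive": " ".join(tokens[start:end]),
        "token, end inclusive": " ".join(tokens[start:end + 1]),
    }
    for name, value in candidates.items():
        if value == term:
            return name
    return None


# Hypothetical row: column names from the list above, values invented.
row = {
    "index": 0,
    "text": "The screen is very large",
    "aspect_start_index": 4,
    "aspect_end_index": 10,
    "aspect_term": "screen",
    "opinion_start_index": 19,
    "opinion_end_index": 24,
    "opinion_term": "large",
    "sentiment": "positive",
}

for kind in ("aspect", "opinion"):
    convention = span_convention(
        row["text"],
        row[f"{kind}_start_index"],
        row[f"{kind}_end_index"],
        row[f"{kind}_term"],
    )
    print(kind, "->", convention)
```

Running the same check over a sample of actual rows, and confirming that a single convention matches all of them, is a cheap way to verify the indexing before training a span-based model on it.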
Pre‑processing
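Pre-processing includes spell correction and tokenization. For example, the following sentence contains a misspelling ("easy" written as "wasy") of the kind that spell correction targets:
Keyboard good sized and wasy to use.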
Pre‑processed text may contain extra spaces, e.g.:
It s just as fast with one program open as it is with sixteen open.
Two dataset variants are provided: configurations ending in -aste-v2 contain the pre-processed text, while those ending in -sem-eval contain the original SemEval text.
Citation
@misc{xu2021learning,
  title={Learning Span-Level Interactions for Aspect Sentiment Triplet Extraction},
  author={Lu Xu and Yew Ken Chia and Lidong Bing},
  year={2021},
  eprint={2107.12214},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
... (additional citation entries omitted for brevity)
Source
Organization: hugging_face
Created: Unknown