
matthewfranglen/aste-v2

Aspect Sentiment Triplet Extraction v2 is designed for extracting triplets consisting of a target entity (aspect), its associated sentiment, and the opinion span that explains the sentiment. It belongs to aspect-based sentiment analysis (ABSA), which identifies the aspects of target entities and the polarity expressed toward each aspect. The data are derived from the SemEval 2014, 2015, and 2016 datasets and pre-processed with spell correction and tokenization. The dataset includes training, validation, and test splits; each row contains an index, the text, the start and end indices of the aspect and opinion spans, the aspect and opinion terms, and the sentiment class.

Updated 10/9/2023
hugging_face

Description

Dataset Overview

Dataset Name

Aspect Sentiment Triplet Extraction v2

Language

  • English

Related Papers

  • arXiv:2107.12214
  • arXiv:2010.02609
  • arXiv:1911.01616

Dataset Scale

  • 1K < n < 10K

Task Types

  • Token Classification
  • Text Classification

Configuration Details

2014-laptop-sem-eval

  • Training: data/2014/laptop/sem-eval/train.gz.parquet
  • Validation: data/2014/laptop/sem-eval/valid.gz.parquet
  • Test: data/2014/laptop/sem-eval/test.gz.parquet

2014-laptop-aste-v2

  • Training: data/2014/laptop/aste/train.gz.parquet
  • Validation: data/2014/laptop/aste/valid.gz.parquet
  • Test: data/2014/laptop/aste/test.gz.parquet

... (remaining configuration sections omitted for brevity) ...
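The two configurations listed above follow a visible naming pattern, year-domain-variant, with the aste-v2 variant stored under an aste/ directory and the sem-eval variant under sem-eval/. The sketch below derives the parquet paths from a configuration name. It assumes the omitted configurations follow the same pattern; config_paths and VARIANT_DIRS are hypothetical names, not part of the dataset.

```python
from typing import Dict

# Directory used by each variant suffix, as seen in the two listed configurations.
# Assumption: the omitted configurations use the same mapping.
VARIANT_DIRS = {"aste-v2": "aste", "sem-eval": "sem-eval"}
SPLITS = ("train", "valid", "test")


def config_paths(config: str) -> Dict[str, str]:
    """Map a name such as '2014-laptop-aste-v2' to its split files (hypothetical helper)."""
    # Split only twice so the variant keeps its own hyphen ("aste-v2", "sem-eval").
    year, domain, variant = config.split("-", 2)
    if variant not in VARIANT_DIRS:
        raise ValueError(f"unknown variant: {variant!r}")
    base = f"data/{year}/{domain}/{VARIANT_DIRS[variant]}"
    return {split: f"{base}/{split}.gz.parquet" for split in SPLITS}


print(config_paths("2014-laptop-aste-v2")["train"])
# data/2014/laptop/aste/train.gz.parquet
```

If these names are also the Hugging Face Hub configuration names (not confirmed here), the same string could presumably be passed as the name argument of datasets.load_dataset("matthewfranglen/aste-v2", "2014-laptop-aste-v2").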

Dataset Description

Task Overview

The Aspect Sentiment Triplet Extraction (ASTE) task aims to extract triplets made of an aspect (target entity), the opinion words that express a view about it, and the resulting sentiment. For example, given the sentence:

The screen is very large and crystal clear with amazing colors and resolution.

the goal is to extract the (aspect, opinion, sentiment) triplets:

[(screen, large, Positive), (screen, clear, Positive), (colors, amazing, Positive), (resolution, amazing, Positive)]
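In this example one aspect pairs with several opinion terms (screen with large and clear) and one opinion term describes several aspects (amazing with colors and resolution), so the output is naturally a list of triplets rather than a one-to-one mapping. A minimal Python sketch of that structure using the example triplets; Triplet and opinions_by_aspect are illustrative names, not part of the dataset.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Triplet(NamedTuple):
    aspect: str     # target entity, e.g. "screen"
    opinion: str    # opinion span explaining the sentiment, e.g. "large"
    sentiment: str  # polarity label


# Triplets for: "The screen is very large and crystal clear with amazing colors and resolution."
EXAMPLE = [
    Triplet("screen", "large", "Positive"),
    Triplet("screen", "clear", "Positive"),
    Triplet("colors", "amazing", "Positive"),
    Triplet("resolution", "amazing", "Positive"),
]


def opinions_by_aspect(triplets: List[Triplet]) -> Dict[str, List[str]]:
    """Collect every opinion term attached to each aspect, preserving order."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for t in triplets:
        grouped[t.aspect].append(t.opinion)
    return dict(grouped)


print(opinions_by_aspect(EXAMPLE))
# {'screen': ['large', 'clear'], 'colors': ['amazing'], 'resolution': ['amazing']}
```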

Data Source

The dataset is based on SemEval 2014, 2015, and 2016 datasets with additional preprocessing.

Dataset Details

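Each row contains the following columns:

  • index
  • text
  • aspect_start_index, aspect_end_index: start and end of the aspect span
  • aspect_term
  • opinion_start_index, opinion_end_index: start and end of the opinion span
  • opinion_term
  • sentiment: one of negative, neutral, positive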
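Because each row carries a single aspect term, opinion term, and sentiment, a sentence with several triplets (like the screen example) presumably appears as several rows. The sketch below regroups such rows into per-sentence triplet lists. It assumes, without confirmation from this page, that rows sharing the same index come from the same sentence and that sentiment is stored as one of the lowercase strings listed above. group_by_sentence is a hypothetical helper, and the sample rows are made up from the task example, with the span-index columns omitted.

```python
from typing import Any, Dict, List

SENTIMENTS = {"negative", "neutral", "positive"}


def group_by_sentence(rows: List[Dict[str, Any]]) -> Dict[Any, Dict[str, Any]]:
    """Regroup one-triplet-per-row records into {index: {"text", "triplets"}}.

    Assumption: rows with the same `index` belong to the same sentence.
    """
    sentences: Dict[Any, Dict[str, Any]] = {}
    for row in rows:
        sentiment = row["sentiment"]
        if sentiment not in SENTIMENTS:
            raise ValueError(f"unexpected sentiment: {sentiment!r}")
        # Create the sentence entry on first sight, then append this row's triplet.
        entry = sentences.setdefault(row["index"], {"text": row["text"], "triplets": []})
        entry["triplets"].append((row["aspect_term"], row["opinion_term"], sentiment))
    return sentences


# Hypothetical rows for illustration only (span-index columns omitted).
text = "The screen is very large and crystal clear with amazing colors and resolution."
rows = [
    {"index": 0, "text": text, "aspect_term": "screen", "opinion_term": "large", "sentiment": "positive"},
    {"index": 0, "text": text, "aspect_term": "colors", "opinion_term": "amazing", "sentiment": "positive"},
]
print(group_by_sentence(rows)[0]["triplets"])
# [('screen', 'large', 'positive'), ('colors', 'amazing', 'positive')]
```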

Pre‑processing

Pre-processing includes spell correction and tokenization. For example, the original sentence

Keyboard good sized and wasy to use.

contains "wasy", a misspelling of "easy".

Pre‑processed text may contain extra spaces, e.g.:

It s just as fast with one program open as it is with sixteen open.

Two dataset variants are provided: configurations ending in -aste-v2 contain the pre-processed text, while configurations ending in -sem-eval contain the original SemEval text.
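Since tokenization changes the spacing of the pre-processed text, one rough way to check whether a pre-processed sentence matches its original SemEval counterpart is to compare the two strings with all whitespace removed. This is a hypothetical heuristic, not something the dataset provides: spell corrections will still prevent a match, and the strings below are made up for illustration.

```python
def strip_whitespace(text: str) -> str:
    """Remove every whitespace character so tokenization spacing is ignored."""
    return "".join(text.split())


def same_modulo_spacing(preprocessed: str, original: str) -> bool:
    """Heuristic: True if the two strings differ only in whitespace."""
    return strip_whitespace(preprocessed) == strip_whitespace(original)


# Made-up strings for illustration.
print(same_modulo_spacing("The battery lasts long .", "The battery lasts long."))  # True
print(same_modulo_spacing("Great value .", "Grate value."))  # False: spelling differs
```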

Citation

@misc{xu2021learning,
  title={Learning Span-Level Interactions for Aspect Sentiment Triplet Extraction},
  author={Lu Xu and Yew Ken Chia and Lidong Bing},
  year={2021},
  eprint={2107.12214},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}

... (additional citation entries omitted for brevity)


Topics

Sentiment Analysis
Natural Language Processing

Source

Organization: hugging_face

Created: Unknown
