takala/financial_phrasebank

The FinancialPhrasebank is a dataset of sentences from financial news, labeled for sentiment classification. It contains 4,840 English sentences, each labeled by 5–8 annotators. The dataset is provided in four configurations based on the annotators' agreement rate (50%, 66%, 75%, and 100%). It was created to address the lack of high‑quality training data for financial sentiment analysis. The sentences were annotated by 16 people with background knowledge of financial markets, including researchers and master's students. Use of the dataset is governed by the Creative Commons Attribution‑NonCommercial‑ShareAlike 3.0 Unported License.

Updated 1/18/2024
Source: hugging_face

Description

Dataset Overview

  • Name: FinancialPhrasebank
  • Language: English
  • License: Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License
  • Multilinguality: Monolingual
  • Size: 1K<n<10K
  • Source Dataset: Original Data
  • Task Category: Text Classification
  • Task IDs: Multi‑class Classification, Sentiment Classification
  • Annotation Creators: Expert-generated
  • Language Creators: Found (sentences taken from existing news text)

Dataset Structure

Data Instance

```json
{
  "sentence": "Pharmaceuticals group Orion Corp reported a fall in its third-quarter earnings that were hit by larger expenditures on R&D and marketing .",
  "label": "negative"
}
```

Data Fields

  • sentence: a tokenized sentence from the dataset (string).
  • label: the sentiment class of the sentence (categorical), one of negative, neutral, or positive.
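
A minimal sketch of reading one record in this shape, assuming records arrive as JSON objects like the instance above. The `LABEL_NAMES` order (negative, neutral, positive) follows the category list given here and is an assumption about how integer ids would be assigned; check the `label` feature of the copy you actually load before relying on it. `PhraseRecord` and `parse_record` are hypothetical helpers, not part of any library.

```python
import json
from dataclasses import dataclass

# Assumed class order; verify against the loaded dataset's label feature.
LABEL_NAMES = ("negative", "neutral", "positive")
LABEL_TO_ID = {name: i for i, name in enumerate(LABEL_NAMES)}


@dataclass(frozen=True)
class PhraseRecord:
    sentence: str  # tokenized sentence text
    label: str     # one of LABEL_NAMES

    @property
    def label_id(self) -> int:
        # Integer id under the assumed LABEL_NAMES order.
        return LABEL_TO_ID[self.label]


def parse_record(raw: str) -> PhraseRecord:
    """Parse one JSON record and reject unknown labels (hypothetical helper)."""
    obj = json.loads(raw)
    sentence, label = obj["sentence"], obj["label"]
    if not isinstance(sentence, str):
        raise TypeError("sentence must be a string")
    if label not in LABEL_TO_ID:
        raise ValueError(f"unknown label: {label!r}")
    return PhraseRecord(sentence=sentence, label=label)


example = (
    '{"sentence": "Pharmaceuticals group Orion Corp reported a fall in its '
    'third-quarter earnings that were hit by larger expenditures on R&D and '
    'marketing .", "label": "negative"}'
)
record = parse_record(example)
print(record.label, record.label_id)  # negative 0
```

Validating labels at parse time means a typo or an unexpected class surfaces immediately instead of silently producing a wrong integer id later.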

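Data Splits

There is no predefined train/validation/test split; instead, the dataset is offered in four configurations that differ in the minimum share of annotators who agreed on each sentence's label.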

  • sentences_allagree: 100% annotator agreement (all annotators agree), 2,264 instances.
  • sentences_75agree: >=75% annotator agreement, 3,453 instances.
  • sentences_66agree: >=66% annotator agreement, 4,217 instances.
  • sentences_50agree: >=50% annotator agreement, 4,846 instances.
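
Because each threshold is a minimum (>=), a sentence that meets a stricter threshold also meets every looser one. Assuming all four configurations draw on the same pool of sentences, they are nested rather than disjoint, and subtracting consecutive instance counts gives how many sentences fall into each agreement band. The sketch below does that arithmetic with the counts listed above; `CONFIG_COUNTS` and `band_sizes` are illustrative names.

```python
# Instance counts per configuration, copied from the list above,
# ordered from strictest to loosest agreement threshold.
CONFIG_COUNTS = [
    ("sentences_allagree", 2264),  # 100% agreement
    ("sentences_75agree", 3453),   # >= 75%
    ("sentences_66agree", 4217),   # >= 66%
    ("sentences_50agree", 4846),   # >= 50%
]


def band_sizes(counts):
    """Sentences that first appear at each threshold, assuming nested configs.

    The value for a configuration is the number of sentences whose agreement
    meets its threshold but not the next stricter one.
    """
    bands = {}
    previous = 0
    for name, total in counts:
        if total < previous:
            raise ValueError("counts must grow as the threshold loosens")
        bands[name] = total - previous
        previous = total
    return bands


for name, size in band_sizes(CONFIG_COUNTS).items():
    print(f"{name}: {size}")
# sentences_allagree: 2264
# sentences_75agree: 1189
# sentences_66agree: 764
# sentences_50agree: 629
```

Under this reading, only 629 sentences (4,846 − 4,217) sit between 50% and 66% agreement, so moving from `sentences_50agree` to `sentences_66agree` trades about 13% of the data for less ambiguous labels.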

Dataset Creation

Source Data

  • Initial Data Collection and Normalization: English financial news was downloaded from the LexisNexis database; a random subset of 10,000 articles was selected, and filtering followed by random sampling yielded approximately 5,000 sentences.
  • Source Language Producers: Multiple financial journalists.
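
The collection steps above can be sketched as a two-stage random sample: pick articles, split and filter their sentences, then sample sentences from what remains. This is a hypothetical outline, not the authors' pipeline. `sample_sentences`, the `split_sentences` and `keep` callbacks (stand-ins for the sentence splitter and filtering criterion, which this page does not specify), the toy data, and the fixed seed are all illustrative assumptions.

```python
import random
from typing import Callable, List


def sample_sentences(
    articles: List[str],
    split_sentences: Callable[[str], List[str]],  # hypothetical splitter
    keep: Callable[[str], bool],                  # hypothetical filter
    n_articles: int = 10_000,
    n_sentences: int = 5_000,
    seed: int = 0,
) -> List[str]:
    """Randomly pick articles, filter their sentences, then sample sentences."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    chosen = rng.sample(articles, min(n_articles, len(articles)))
    candidates = [s for article in chosen for s in split_sentences(article) if keep(s)]
    return rng.sample(candidates, min(n_sentences, len(candidates)))


# Toy usage with hypothetical data: one sentence per line, keep lines mentioning profit.
toy_articles = [
    f"Company {i} raised its profit forecast .\nThe meeting was held on Monday ."
    for i in range(20)
]
picked = sample_sentences(
    toy_articles,
    split_sentences=lambda text: text.splitlines(),
    keep=lambda sentence: "profit" in sentence,
    n_articles=10,
    n_sentences=5,
)
print(len(picked))  # 5
```

Passing the splitter and filter as callbacks keeps the sampling logic independent of whichever sentence segmentation and filtering rule a real pipeline would use.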

Annotation

  • Annotation Process: 4,840 sentences were annotated by 16 individuals with financial background knowledge.
  • Annotators: Three researchers and thirteen master's students from Aalto University's School of Business, primarily specializing in finance, accounting, and economics.
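
This page does not spell out how a sentence's agreement rate maps to a configuration. One plausible reading, sketched below purely as an assumption, is that the agreement rate is the share of annotators who chose the majority label, and a sentence belongs to every configuration whose threshold that share meets. The helper names, treating "66%" as the fraction 66/100, and excluding sentences whose top labels are tied are all hypothetical choices.

```python
from collections import Counter
from fractions import Fraction
from typing import List, Optional, Tuple

# Minimum agreement for each configuration, strictest first.
THRESHOLDS = [
    ("sentences_allagree", Fraction(1)),
    ("sentences_75agree", Fraction(75, 100)),
    ("sentences_66agree", Fraction(66, 100)),
    ("sentences_50agree", Fraction(50, 100)),
]


def majority_and_agreement(votes: List[str]) -> Tuple[Optional[str], Fraction]:
    """Return the majority label (None on a tie) and its share of the votes."""
    if not votes:
        raise ValueError("need at least one vote")
    counts = Counter(votes).most_common()
    top_label, top_count = counts[0]
    tied = len(counts) > 1 and counts[1][1] == top_count
    return (None if tied else top_label), Fraction(top_count, len(votes))


def configs_for(votes: List[str]) -> List[str]:
    """Configurations whose minimum agreement this sentence meets."""
    label, agreement = majority_and_agreement(votes)
    if label is None:
        return []  # assumption: sentences with tied top labels are left out
    return [name for name, minimum in THRESHOLDS if agreement >= minimum]


votes = ["positive"] * 4 + ["neutral"] * 2  # six annotators, 4 of 6 agree
print(majority_and_agreement(votes))  # ('positive', Fraction(2, 3))
print(configs_for(votes))  # ['sentences_66agree', 'sentences_50agree']
```

Using `Fraction` instead of floats keeps comparisons such as 3/4 >= 75/100 exact, which matters because with only 5–8 annotators many agreement rates land exactly on a threshold.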

Dataset Usage Considerations

  • Bias Discussion: All annotators come from the same institution, so inter-annotator agreement should be interpreted with this in mind.

Topics

Finance
Sentiment Analysis
