takala/financial_phrasebank

FinancialPhrasebank is a dataset of English financial news sentences for sentiment classification. It contains 4,840 sentences, each labeled negative, neutral, or positive by 5–8 annotators. The dataset is provided in four configurations based on the level of annotator agreement (50%, 66%, 75%, and 100%). It was created to address the lack of high-quality training data for financial sentiment analysis. The annotations were made by 16 individuals with background knowledge of financial markets, including researchers and master's students. Use of the dataset is governed by the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

Updated 1/18/2024

Description

Dataset Overview

Name: FinancialPhrasebank
Language: English
License: Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License
Multilinguality: Monolingual
Size: 1K<n<10K
Source Dataset: Original Data
Task Category: Text Classification
Task ID: Multi‑class Classification, Sentiment Classification
Label Creator: Expert Generated
Language Creator: Found

Dataset Structure

Data Instance

{
  "sentence": "Pharmaceuticals group Orion Corp reported a fall in its third-quarter earnings that were hit by larger expenditures on R&D and marketing .",
  "label": "negative"
}

Data Fields

sentence: A tokenized sentence from the dataset (string).
label: The sentiment class of the sentence (categorical label); one of negative, neutral, or positive.
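
As a minimal sketch of how a record with these two fields might be validated and encoded for a classifier, the snippet below maps the three label names to integer ids. The id order (negative = 0, neutral = 1, positive = 2) simply follows the order in which the categories are listed above and is an assumption; `LABEL_NAMES`, `LABEL_TO_ID`, and `encode_record` are hypothetical names, not part of any library.

```python
from typing import Tuple

# Category order as listed in the field description; the integer ids are an assumption.
LABEL_NAMES = ["negative", "neutral", "positive"]
LABEL_TO_ID = {name: idx for idx, name in enumerate(LABEL_NAMES)}


def encode_record(record: dict) -> Tuple[str, int]:
    """Validate one {"sentence", "label"} record and return (sentence, label_id)."""
    sentence = record["sentence"]
    label = record["label"]
    if not isinstance(sentence, str):
        raise TypeError("sentence must be a string")
    if label not in LABEL_TO_ID:
        raise ValueError(f"unknown label: {label!r}")
    return sentence, LABEL_TO_ID[label]


# The data instance shown earlier in this card.
example = {
    "sentence": "Pharmaceuticals group Orion Corp reported a fall in its "
    "third-quarter earnings that were hit by larger expenditures on R&D and marketing .",
    "label": "negative",
}
sentence, label_id = encode_record(example)
print(label_id, LABEL_NAMES[label_id])
```

Rejecting unknown labels early keeps a mislabeled or malformed row from silently becoming a fourth class during training.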

Data Split

sentences_allagree: All annotators (100% agreement), 2,264 instances.
sentences_75agree: >=75% annotator agreement, 3,453 instances.
sentences_66agree: >=66% annotator agreement, 4,217 instances.
sentences_50agree: >=50% annotator agreement, 4,846 instances.
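
The agreement thresholds above can be read as a filter on the share of annotators who chose a sentence's majority label. The sketch below illustrates that reading. It is an assumption about how the configurations relate, not the authors' code, and `THRESHOLDS` and `configurations_for` are hypothetical names. Ties for the majority label are not handled specially.

```python
from collections import Counter
from fractions import Fraction

# Minimum majority-label share for each configuration, taken from the list above.
THRESHOLDS = {
    "sentences_50agree": Fraction(50, 100),
    "sentences_66agree": Fraction(66, 100),
    "sentences_75agree": Fraction(75, 100),
    "sentences_allagree": Fraction(100, 100),
}


def configurations_for(annotations):
    """Return (majority_label, names of configurations whose threshold is met)."""
    if not annotations:
        raise ValueError("at least one annotation is required")
    # most_common(1) yields the label with the highest vote count.
    label, votes = Counter(annotations).most_common(1)[0]
    share = Fraction(votes, len(annotations))
    names = [name for name, minimum in THRESHOLDS.items() if share >= minimum]
    return label, names


# Six hypothetical annotators, four of whom chose "negative" (share = 2/3).
label, names = configurations_for(
    ["negative", "negative", "negative", "negative", "neutral", "neutral"]
)
print(label, names)
```

Under this reading the configurations are nested, which is consistent with the instance counts growing from 2,264 to 4,846 as the threshold is relaxed.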

Dataset Creation

Source Data

Initial Data Collection and Normalization: English financial news was downloaded from the LexisNexis database. A random subset of 10,000 articles was selected, and filtering reduced it to a sample of approximately 5,000 sentences.
Source Language Producers: Multiple financial journalists.

Annotation

Annotation Process: 4,840 sentences were annotated by 16 individuals with financial background knowledge.
Annotators: Three researchers and thirteen master's students from Aalto University's School of Business, primarily specializing in finance, accounting, and economics.

Dataset Usage Considerations

Bias Discussion: All annotators are from the same institution, so inter-annotator agreement should be interpreted with this in mind.

Topics

Finance

Sentiment Analysis

Source

Organization: hugging_face

Created: Unknown
