FinancialDatasets

The SmoothNLP Financial Text Dataset comprises multiple sub‑datasets covering corporate business information, financial news, column articles, investment institution data, investment events, and 36Kr news, suitable for NLP research.

Updated 5/23/2024
github

Description

Dataset Overview

Dataset Name

  • SmoothNLP Financial Text Dataset (Public)

Dataset Content

| Dataset Name | Fields | Samples | Total Rows | Download Link |
|---|---|---|---|---|
| Corporate Business Info | name, company_name, company_intro, business, address, registration_id, established_date, legal_representative, registered_capital, credit_code, website | 10 k | 500 k | Download |
| Financial News | title (news headline), content (news body), pub_ts (publication date) | 20 k | 2.1 M | Download |
| Column Articles | title (news headline), content (news body), pub_ts (publication date) | 10 k | 580 k | Download |
| Investment Institutions | institution_name, introduction, industry, size, round | 1 k | 30 k | Download |
| Investment Events | event_info, investor, funded_company, funding_event, round, amount | 2 k | 70 k | Download |
| 36Kr News | title (news headline), content (news body), url (web address) | 10 k | 110 k | Download |
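
As a rough illustration of working with one of these sub-datasets, the sketch below reads Financial News records and checks that the expected columns are present. The column names (`title`, `content`, `pub_ts`) come from the listing above. Everything else is an assumption: the listing does not state the file format, so the example assumes a CSV file with a header row, and `read_news_rows` and the sample rows are hypothetical.

```python
import csv
import io

# Column names are taken from the dataset listing; the CSV layout is an assumption.
NEWS_FIELDS = ("title", "content", "pub_ts")


def read_news_rows(handle, required=NEWS_FIELDS):
    """Yield one dict per row, skipping rows with an empty title or content."""
    reader = csv.DictReader(handle)
    missing = [name for name in required if name not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing expected columns: {missing}")
    for row in reader:
        cleaned = {name: (row.get(name) or "").strip() for name in required}
        if cleaned["title"] and cleaned["content"]:
            yield cleaned


# Made-up sample standing in for a real download (hypothetical data).
SAMPLE = io.StringIO(
    "title,content,pub_ts\n"
    "Example headline,Example body text,2019-05-27\n"
    ",Row without a title,2019-05-28\n"
)

rows = list(read_news_rows(SAMPLE))
print(len(rows), rows[0]["title"])
```

With a real file, the `io.StringIO` sample would be replaced by `open(path, encoding="utf-8", newline="")`, assuming the download is UTF-8 encoded.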

Recommended Research Directions

  • Embedding (Word2Vec, BERT, etc.)
  • Entity Recognition – NER
  • Unsupervised Clustering: Cluster companies based on description information
  • Industry Classification of Enterprises
  • Title Summarization – Text Summary
  • Sequence Classification
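
To make the clustering direction more concrete, here is a minimal sketch that groups company descriptions by character-bigram cosine similarity. It is not a method the dataset authors prescribe. Character bigrams are used only because they need no word segmenter, which is convenient for Chinese text. The function names, the greedy single-pass grouping, the `0.3` threshold, and the sample descriptions are all illustrative assumptions.

```python
import math
from collections import Counter


def char_bigrams(text):
    """Count overlapping character bigrams, ignoring whitespace."""
    compact = "".join(text.split())
    return Counter(compact[i:i + 2] for i in range(len(compact) - 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def greedy_clusters(texts, threshold=0.3):
    """Assign each text to the first cluster whose seed text is similar enough."""
    vectors = [char_bigrams(t) for t in texts]
    seeds, labels = [], []
    for vec in vectors:
        for label, seed in enumerate(seeds):
            if cosine(vec, vectors[seed]) >= threshold:
                labels.append(label)
                break
        else:
            # No existing seed matched: this text starts a new cluster.
            seeds.append(len(labels))
            labels.append(len(seeds) - 1)
    return labels


# Made-up descriptions standing in for the company_intro field (hypothetical data).
descriptions = [
    "online payment platform for small merchants",
    "mobile payment platform for online merchants",
    "cold chain logistics and warehouse services",
]
labels = greedy_clusters(descriptions)
print(labels)
```

For the full 500 k rows, a vectorizer and clustering algorithm from an established library would be a more realistic choice. The sketch only shows the shape of the task.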

Data Showcase

  • Investment Institutions
  • Investment Events
  • Corporate Business Info
  • Financial News
  • Column Articles
  • 36Kr News

Topics

Finance
NLP

Source

Organization: github

Created: 5/27/2019