Tags: Open Source Community, Finance, NLP

FinancialDatasets

The SmoothNLP Financial Text Dataset comprises multiple sub‑datasets covering corporate business information, financial news, column articles, investment institution data, investment events, and 36Kr news, suitable for NLP research.

Source: GitHub
Created: May 27, 2019
Updated: May 23, 2024

Dataset Overview

Dataset Name

  • SmoothNLP Financial Text Dataset (Public)

Dataset Content

| Dataset Name | Fields | Samples | Total Rows | Download Link |
|---|---|---|---|---|
| Corporate Business Info | name, company_name, company_intro, business, address, registration_id, established_date, legal_representative, registered_capital, credit_code, website | 10 k | 500 k | Download |
| Financial News | title (news title), content (news content), pub_ts (publication date) | 20 k | 2.1 M | Download |
| Column Articles | title (news title), content (news content), pub_ts (publication date) | 10 k | 580 k | Download |
| Investment Institutions | institution_name, introduction, industry, size, round | 1 k | 30 k | Download |
| Investment Events | event_info, investor, funded_company, funding_event, round, amount | 2 k | 70 k | Download |
| 36Kr News | title (news title), content (news content), url (URL) | 10 k | 110 k | Download |
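
The field lists in the table above can be captured as a small schema for checking a downloaded file before use. The following is a minimal sketch, assuming the files are distributed as CSV with a header row that matches the field names (the source does not state the file format). `SCHEMAS`, its keys, `missing_fields`, and the sample text are hypothetical names and placeholder data introduced for illustration only.

```python
import csv
import io

# Field names are taken from the table above; the dictionary keys are
# hypothetical short identifiers, not official file names.
SCHEMAS = {
    "corporate_business_info": [
        "name", "company_name", "company_intro", "business", "address",
        "registration_id", "established_date", "legal_representative",
        "registered_capital", "credit_code", "website",
    ],
    "financial_news": ["title", "content", "pub_ts"],
    "column_articles": ["title", "content", "pub_ts"],
    "investment_institutions": [
        "institution_name", "introduction", "industry", "size", "round",
    ],
    "investment_events": [
        "event_info", "investor", "funded_company", "funding_event",
        "round", "amount",
    ],
    "36kr_news": ["title", "content", "url"],
}


def missing_fields(dataset, csv_text):
    """Return the expected fields that are absent from the CSV header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    return [f for f in SCHEMAS[dataset] if f not in header]


# Placeholder CSV text (not real data) standing in for a downloaded file.
sample = "title,content\nplaceholder title,placeholder content\n"
print(missing_fields("financial_news", sample))  # ['pub_ts']
```

Running a check like this on each file before training catches renamed or missing columns early; if the actual files use a different format or header spelling, the schema would need to be adjusted accordingly.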

Recommended Research Directions

  • Embedding (Word2Vec, BERT, etc.)
  • Entity Recognition – NER
  • Unsupervised Clustering: cluster companies by their description text
  • Industry Classification of Enterprises
  • Title Summarization (Text Summarization)
  • Sequence Classification
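
As a starting point for the clustering direction above, company descriptions can be grouped by text similarity. The following is a minimal sketch using single-pass (leader) clustering over bag-of-words cosine similarity, chosen only because it needs no third-party libraries; it is a simplified stand-in, not a method prescribed by the dataset. The `threshold` value, the function names, and the placeholder descriptions are assumptions, and real Chinese text would first need word segmentation rather than whitespace splitting.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Count lowercased whitespace tokens into a sparse vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two Counter vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def leader_cluster(descriptions, threshold=0.5):
    """Assign each text to the first cluster whose leader is similar enough."""
    leaders = []  # vector of the first member of each cluster
    labels = []
    for text in descriptions:
        vec = bag_of_words(text)
        for idx, leader in enumerate(leaders):
            if cosine(vec, leader) >= threshold:
                labels.append(idx)
                break
        else:
            # No existing leader is close enough: start a new cluster.
            leaders.append(vec)
            labels.append(len(leaders) - 1)
    return labels


# Placeholder descriptions (not taken from the dataset).
docs = [
    "online payment platform for merchants",
    "mobile payment platform for small merchants",
    "industrial robot manufacturer",
]
print(leader_cluster(docs))  # [0, 0, 1]
```

The result depends on input order and on the threshold, so this is best treated as a quick baseline before moving to embeddings (Word2Vec, BERT) with a standard clustering algorithm.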

Data Showcase

  • Investment Institutions
  • Investment Events
  • Corporate Business Info
  • Financial News
  • Column Articles
  • 36Kr News