Dataset asset, Open Source Community, Finance, NLP
FinancialDatasets
The SmoothNLP Financial Text Dataset comprises multiple sub‑datasets covering corporate business information, financial news, column articles, investment institution data, investment events, and 36Kr news, suitable for NLP research.
Source: GitHub
Created: May 27, 2019
Updated: May 23, 2024
Dataset Overview
Dataset Name
- SmoothNLP Financial Text Dataset (Public)
Dataset Content
| Dataset Name | Fields | Samples | Total Rows | Download Link |
|---|---|---|---|---|
| Corporate Business Info | name,company_name,company_intro,business,address,registration_id,established_date,legal_representative,registered_capital,credit_code,website | 10 k | 500 k | Download |
| Financial News | title (news title), content (news body), pub_ts (publication date) | 20 k | 2.1 M | Download |
| Column Articles | title (news title), content (news body), pub_ts (publication date) | 10 k | 580 k | Download |
| Investment Institutions | institution_name,introduction,industry,size,round | 1 k | 30 k | Download |
| Investment Events | event_info,investor,funded_company,funding_event,round,amount | 2 k | 70 k | Download |
| 36Kr News | title (news title), content (news body), url (web address) | 10 k | 110 k | Download |
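
As a minimal sketch of working with one sub-dataset, the snippet below reads Financial News records using the fields listed in the table (`title`, `content`, `pub_ts`). This description does not state the download file format, so the CSV layout, the inline sample rows, the date format, and the `load_news` helper are all assumptions made for illustration; adapt the reader to whatever format the downloaded file actually uses.

```python
import csv
import io

# Hypothetical sample in CSV form. The real download format is not stated
# in this description, and these rows are invented placeholders.
SAMPLE_CSV = """title,content,pub_ts
Example headline A,Example body text A,2019-05-01
Example headline B,,2019-05-02
"""

# Field names taken from the Financial News row of the table above.
EXPECTED_FIELDS = ("title", "content", "pub_ts")


def load_news(stream):
    """Read news rows, keeping only records with a non-empty title and content."""
    reader = csv.DictReader(stream)
    missing = [f for f in EXPECTED_FIELDS if f not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing expected columns: {missing}")
    rows = []
    for record in reader:
        if record["title"].strip() and record["content"].strip():
            rows.append({f: record[f].strip() for f in EXPECTED_FIELDS})
    return rows


news = load_news(io.StringIO(SAMPLE_CSV))
print(len(news), "usable rows")
```

Checking the column names up front makes a format mismatch fail early instead of surfacing later as a `KeyError` in downstream code.
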
Recommended Research Directions
- Embedding (Word2Vec, BERT, etc.)
- Named Entity Recognition (NER)
- Unsupervised Clustering: cluster companies by their description text
- Industry Classification of Enterprises
- Headline Generation (Text Summarization)
- Sequence Classification
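
To make the clustering direction above concrete, here is a rough sketch that groups short company descriptions by character-bigram cosine similarity, using a simple single-pass "leader" clustering. It is a standard-library stand-in, not the method used by the dataset authors; the sample descriptions, the `threshold` value, and all function names are assumptions, and a real experiment would more likely use learned embeddings with a library clustering algorithm.

```python
import math
from collections import Counter


def bigram_vector(text):
    """Count overlapping character bigrams, ignoring whitespace and case."""
    chars = "".join(text.lower().split())
    return Counter(chars[i:i + 2] for i in range(len(chars) - 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def leader_cluster(texts, threshold=0.3):
    """Assign each text to the first cluster whose leader is similar enough."""
    leaders, labels = [], []
    for text in texts:
        vec = bigram_vector(text)
        for idx, leader in enumerate(leaders):
            if cosine(vec, leader) >= threshold:
                labels.append(idx)
                break
        else:
            # No existing leader is close enough, so this text starts a new cluster.
            leaders.append(vec)
            labels.append(len(leaders) - 1)
    return labels


# Invented placeholder descriptions standing in for the company_intro field.
descriptions = [
    "online payment platform for merchants",
    "mobile payment platform for small merchants",
    "cold chain logistics and warehousing",
    "warehousing and cold chain logistics services",
]
labels = leader_cluster(descriptions)
print(labels)
```

Character bigrams need no word segmentation, which keeps the sketch usable on unsegmented text. Leader clustering depends on input order and on the chosen threshold, so treat the resulting labels as a quick baseline only.
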