FinancialDatasets
The SmoothNLP Financial Text Dataset comprises multiple sub‑datasets covering corporate business information, financial news, column articles, investment institution data, investment events, and 36Kr news, suitable for NLP research.
Updated 5/23/2024
Description
Dataset Overview
Dataset Name
- SmoothNLP Financial Text Dataset (Public)
Dataset Content
| Dataset Name | Fields | Samples | Total Rows | Download Link |
|---|---|---|---|---|
| Corporate Business Info | name,company_name,company_intro,business,address,registration_id,established_date,legal_representative,registered_capital,credit_code,website | 10 k | 500 k | Download |
| Financial News | title (news headline),content (news body),pub_ts (publication date) | 20 k | 2.1 M | Download |
| Column Articles | title (news headline),content (news body),pub_ts (publication date) | 10 k | 580 k | Download |
| Investment Institutions | institution_name,introduction,industry,size,round | 1 k | 30 k | Download |
| Investment Events | event_info,investor,funded_company,funding_event,round,amount | 2 k | 70 k | Download |
| 36Kr News | title (news headline),content (news body),url (web address) | 10 k | 110 k | Download |
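
As a minimal sketch of how one of these sub-datasets might be read, the snippet below parses Financial News records with Python's standard `csv` module. The CSV format, the header row, the date format, and the two sample rows are assumptions for illustration only. The table above documents just the field names `title`, `content`, and `pub_ts`, and `read_news` is a hypothetical helper.

```python
import csv
import io

# Hypothetical sample in CSV form; the real files may use another format.
SAMPLE_CSV = """title,content,pub_ts
Example headline A,Example body text A,2019-05-01
Example headline B,Example body text B,2019-05-02
"""

NEWS_FIELDS = ("title", "content", "pub_ts")  # field names from the table


def read_news(handle):
    """Yield one dict per row, keeping only the documented fields."""
    reader = csv.DictReader(handle)
    missing = [f for f in NEWS_FIELDS if f not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing expected columns: {missing}")
    for row in reader:
        # Short rows yield None for absent cells, so fall back to "".
        yield {field: (row[field] or "").strip() for field in NEWS_FIELDS}


rows = list(read_news(io.StringIO(SAMPLE_CSV)))
print(len(rows), rows[0]["title"])
```

Checking the header before iterating makes a mismatched file fail early instead of producing rows with silently missing fields.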
Recommended Research Directions
- Embedding (Word2Vec, BERT, etc.)
- Named Entity Recognition (NER)
- Unsupervised Clustering: Cluster companies based on description information
- Industry Classification of Enterprises
- Title Summarization (Text Summarization)
- Sequence Classification
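
To make the company clustering direction more concrete, here is a rough sketch of the similarity step such a method could build on: bag-of-words vectors compared by cosine similarity. It is not a clustering algorithm in itself, and it is not a method prescribed by the dataset. The company names and `company_intro` texts are invented examples, the helper functions are hypothetical, and the tokenizer only handles Latin-script words, so Chinese descriptions would need a proper word segmenter.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; Chinese text would need a real word segmenter."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical company_intro values, not taken from the dataset.
intros = {
    "company_a": "online payment platform for small merchants",
    "company_b": "mobile payment platform for merchants",
    "company_c": "cold chain logistics for fresh food",
}
vectors = {name: Counter(tokenize(text)) for name, text in intros.items()}


def most_similar(name):
    """Return the other company whose description is closest to `name`."""
    others = (n for n in vectors if n != name)
    return max(others, key=lambda n: cosine(vectors[name], vectors[n]))


print(most_similar("company_a"))
```

In practice the raw counts would likely be replaced by TF-IDF weights or learned embeddings, with the pairwise similarities passed to a clustering algorithm.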
Topics
Finance
NLP
Source
Organization: github
Created: 5/27/2019